IBM InfoSphere DataStage vs StreamSets comparison

Cancel
You must select at least 2 products to compare!
IBM Logo
10,952 views|9,105 comparisons
82% willing to recommend
StreamSets Logo
4,200 views|2,349 comparisons
100% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between IBM InfoSphere DataStage and StreamSets based on real PeerSpot user reviews.

Find out in this report how the two Data Integration solutions compare in terms of features, pricing, service and support, easy of deployment, and ROI.
To learn more, read our detailed IBM InfoSphere DataStage vs. StreamSets Report (Updated: May 2024).
771,212 professionals have used our research since 2012.
Featured Review
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
"It is quite useful and powerful.""Highly customizable: Allowing you to handle multiple data latencies (scheduled batch, on-demand, and real-time) in the same job.""The product is a stable and powerful data management solution that can run in parallel mode for enhanced speed.""It works with multiple servers and offers high availability.""The solution has improved the time it takes to perform tasks related to batch applications.""The most valuable feature of the solution is the ability to incorporate very complex business rules in Data Stage.""The data lineage report can be filtered for reporting. The reports are user-friendly and take less time to find what you need.""It's a robust solution."

More IBM InfoSphere DataStage Pros →

"I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks.""The scheduling within the data engineering pipeline is very much appreciated, and it has a wide range of connectors for connecting to any data sources like SQL Server, AWS, Azure, etc. We have used it with Kafka, Hadoop, and Azure Data Factory Datasets. Connecting to these systems with StreamSets is very easy.""I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally.""The ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems.""The entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth.""It's very easy to integrate. It integrates with Snowflake, AWS, Google Cloud, and Azure. It's very helpful for DevOps, DataOps, and data engineering because it provides a comprehensive solution, and it's not complicated.""The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up.""The UI is user-friendly, it doesn't require any technical know-how and we can navigate to social media or use it more easily."

More StreamSets Pros →

Cons
"The initial setup can be complex.""It doesn't have any big data connections. It would be good to have them because most of the systems are moving towards big data. There should also be a user-friendly way to interact with the cloud. Its loading process is very slow. It takes a lot of time for around 5 or 6 million records, and we are not able to provide real-time data to the vendors due to this delay. Its performance needs to be improved. It is also like a legacy system. It is not updated much. In higher versions, they only do small changes. We would like to have new features and new technologies.""The template mapping could be easier.""The solution can be a bit more user-friendly, similar to Informatica.""The solution should be more user-friendly.""The interface needs improvement. It is really too technical. That is the main problem.""DataStage is quite expensive. It is too hard to find a consultant using DataStage in Turkey.""There could be more customization options for the product."

More IBM InfoSphere DataStage Cons →

"The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time.""We've seen a couple of cases where it appears to have a memory leak or a similar problem.""The execution engine could be improved. When I was at their session, they were using some obscure platform to run. There is a controller, which controls what happens on that, but you should be able to easily do this at any of the cloud services, such as Google Cloud. You shouldn't have any issues in terms of how to run it with their online development platform or design platform, basically their execution engine. There are issues with that.""We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back.""The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date.""Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful.""StreamSet works great for batch processing but we are looking for something that is more real-time. We need latency in numbers below milliseconds.""Sometimes, when we have large amounts of data that is very efficiently stored in Hadoop or Kafka, it is not very efficient to run it through StreamSets, due to the lack of efficiency or the resources that StreamSets is using."

More StreamSets Cons →

Pricing and Cost Advice
  • "High-cost of ownership: They could take a page from open source software."
  • "Pricing varies based on use, and it is not as costly as some competing enterprise solutions."
  • "Small and medium-sized companies cannot afford to pay for this solution."
  • "The cost is too high."
  • "It's very expensive."
  • "Our internal team takes care of group licensing and cost. We don't have individual licenses. We have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side. For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one-hour execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on the Db2 mainframe database as well. All this comes out to be around $6,000. We, however, would like to have some cost reduction."
  • "The price is expensive but there are no licensing fees."
  • "It is quite expensive."
  • More IBM InfoSphere DataStage Pricing and Cost Advice →

  • "We are running the community version right now, which can be used free of charge."
  • "StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
  • "It has a CPU core-based licensing, which works for us and is quite good."
  • "There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it. The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets."
  • "The pricing is good, but not the best. They have some customized plans you can opt for."
  • "We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
  • "The overall cost for small and mid-size organizations needs to be better."
  • "There are two editions, Professional and Enterprise, and there is a free trial. We're using the Professional edition and it is competitively priced."
  • More StreamSets Pricing and Cost Advice →

    report
    Use our free recommendation engine to learn which Data Integration solutions are best for your needs.
    771,212 professionals have used our research since 2012.
    Questions from the Community
    Top Answer: My company currently uses the free version of the product, and we are definitely switching to a paid one. We needed a tool that can help us not only integrate our data but use it effectively. For the… more »
    Top Answer: I think the tool may cause some difficulties if you have not used other data integration solutions before. I have worked at companies that used different tools for data integration, and they work… more »
    Top Answer:IBM Cloud Paks makes a big difference in your data integration. My company has been using it alongside IBM InfoSphere DataStage and while the main product is good on its own, this one truly expands… more »
    Top Answer:The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize… more »
    Top Answer:We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was… more »
    Top Answer:StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data is… more »
    Ranking
    7th
    out of 101 in Data Integration
    Views
    10,952
    Comparisons
    9,105
    Reviews
    16
    Average Words per Review
    467
    Rating
    7.9
    8th
    out of 101 in Data Integration
    Views
    4,200
    Comparisons
    2,349
    Reviews
    22
    Average Words per Review
    1,306
    Rating
    8.4
    Comparisons
    Learn More
    StreamSets
    Video Not Available
    Overview

    IBM InfoSphere DataStage is a high-quality data integration tool that aims to design, develop, and run jobs that move and transform data for organizations of different sizes. The product works by integrating data across multiple systems through a high-performance parallel framework. It supports extended metadata management, enterprise connectivity, and integration of all types of data.

    The solution is the data integration component of IBM InfoSphere Information Server, providing a graphical framework for moving data from source systems to target systems. IBM InfoSphere DataStage can deliver data to data warehouses, data marts, operational data sources, and other enterprise applications. The tool works with various types of patterns - extract, transform and load (ETL), and extract, load, and transform (ELT). The scalability of the platform is achieved by using parallel processing and enterprise connectivity.

    The solution has various versions, catering to different types of companies, which include the Server Edition, the Enterprise Edition, and the MVS Edition. Depending on which version a company has bought, different goals can be achieved. They include the following:

    • Designing data flows to extract information from multiple sources, transform the data, and deliver it to target databases or applications.

    • Delivery of relevant and accurate data through direct connections to enterprise applications.

    • Reduction of development time and improvement of consistency through prebuilt functions.

    • Utilization of InfoSphere Information Server tools for accelerating the project delivery cycle.

    IBM InfoSphere DataStage can be deployed in various ways, including:

    • As a service: The tool can be accessed from a subscription model, where its capabilities are a part of IBM DataStage on IBM Cloud Park for Data as a Service. This option offers full management on IBM Cloud.

    • On premises or in any cloud: The two editions - IBM DataStage Enterprise and IBM DataStage Enterprise Plus - can run workloads on premises or in any cloud when added to IBM DataStage on IBM Cloud Pak for Data as a Service.

    • On premises: The basic jobs of the tool can be run on premises using IBM DataStage.

    IBM InfoSphere DataStage Features

    The tool has various features through which users can integrate and utilize their data effectively. The components of IBM InfoSphere DataStage include:

    • AI services: The tool offers services such as data science, event messaging, data warehousing, and data virtualization. It accelerates processes through artificial intelligence (AI) and offers a connection with IBM Cloud Paks - the cloud-native insight platform of the solution.

    • Parallel engine: Through this feature, ETL performance can be optimized to process data at scale. This is achieved through parallel engine and load balancing, which maximizes throughput.

    • Metadata support: This feature of the product uses the IBM Watson Knowledge Catalog to protect companies' sensitive data and monitor who can access it and at what levels.

    • Automated delivery pipelines: IBM InfoSphere DataStage reduces costs by automating continuous integration and delivery of pipelines.

    • Prebuilt connectors: The feature for prebuilt connectivity and stages allows users to move data between multiple cloud sources and data warehouses, including IBM native products.

    • IBM DataStage Flow Designer: This feature offers assistance through machine learning design. The product offers its clients a user-friendly interface which facilitates the work process.

    • IBM InfoSphere QualityStage: The tool provides a feature that automatically resolves data quality issues and increases the reliability of the delivered data.

    • Automated failure detection: Through this feature, companies can reduce infrastructure management efforts, relying on the automated detection that the tool offers.

    • Distributed data processing: Cloud runtimes can be executed remotely through this feature while maintaining its sovereignty and decreasing costs.

    IBM InfoSphere DataStage Benefits

    This solution offers many benefits for the companies that utilize it for data integration. Some of these benefits include:

    • Increased speed of workload execution due to better balancing and a parallel engine.

    • Reduction of data movement costs through integrations and seamless design of jobs.

    • Modernization of data integration by extending the capabilities of companies' data.

    • Delivery of reliable data through IBM Cloud Pak for Data.

    • Utilization of a drag-and-drop interface which assists in the delivery of data without the need for code.

    • Effective data manipulation allows data to be merged before being mapped and transformed.

    • Creating easier access of users to their data by providing visual maps of the process and the delivered data.

    Reviews from Real Users

    A data/solution architect at a computer software company says the product is robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data.

    Tirthankar Roy Chowdhury, team leader at Tata Consultancy Services, feels the tool is user-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features.

    StreamSets is a data integration platform that enables organizations to efficiently move and process data across various systems. It offers a user-friendly interface for designing, deploying, and managing data pipelines, allowing users to easily connect to various data sources and destinations. StreamSets also provides real-time monitoring and alerting capabilities, ensuring that data is flowing smoothly and any issues are quickly addressed.

    Sample Customers
    Dubai Statistics Center, Etisalat Egypt
    Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
    Top Industries
    REVIEWERS
    Computer Software Company50%
    Insurance Company14%
    Transportation Company7%
    Healthcare Company7%
    VISITORS READING REVIEWS
    Financial Services Firm26%
    Manufacturing Company11%
    Computer Software Company10%
    Insurance Company7%
    REVIEWERS
    Financial Services Firm21%
    Energy/Utilities Company21%
    Comms Service Provider14%
    Computer Software Company14%
    VISITORS READING REVIEWS
    Financial Services Firm17%
    Computer Software Company13%
    Manufacturing Company8%
    Government7%
    Company Size
    REVIEWERS
    Small Business45%
    Midsize Enterprise6%
    Large Enterprise49%
    VISITORS READING REVIEWS
    Small Business16%
    Midsize Enterprise9%
    Large Enterprise74%
    REVIEWERS
    Small Business40%
    Midsize Enterprise12%
    Large Enterprise48%
    VISITORS READING REVIEWS
    Small Business16%
    Midsize Enterprise11%
    Large Enterprise73%
    Buyer's Guide
    IBM InfoSphere DataStage vs. StreamSets
    May 2024
    Find out what your peers are saying about IBM InfoSphere DataStage vs. StreamSets and other solutions. Updated: May 2024.
    771,212 professionals have used our research since 2012.

    IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews while StreamSets is ranked 8th in Data Integration with 24 reviews. IBM InfoSphere DataStage is rated 7.8, while StreamSets is rated 8.4. The top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". On the other hand, the top reviewer of StreamSets writes "We no longer need to hire highly skilled data engineers to create and monitor data pipelines". IBM InfoSphere DataStage is most compared with IBM Cloud Pak for Data, SSIS, Azure Data Factory, Talend Open Studio and Alteryx Designer, whereas StreamSets is most compared with Fivetran, Azure Data Factory, Informatica PowerCenter, SSIS and Oracle GoldenGate. See our IBM InfoSphere DataStage vs. StreamSets report.

    See our list of best Data Integration vendors.

    We monitor all Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.