We performed a comparison between Apache Spark and Cloudera Distribution for Hadoop based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"There's a lot of functionality."
"The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it."
"The product's deployment phase is easy."
"One of Apache Spark's most valuable features is that it supports in-memory processing, the execution of jobs compared to traditional tools is very fast."
"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."
"It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance."
"The scalability has been the most valuable aspect of the solution."
"The features we find most valuable are the machine learning, data learning, and Spark Analytics."
"It is helpful to gather and process data."
"In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues."
"The most valuable feature is that I can use CDH for almost all use cases across all industries, including the financial sector, public sector, private retailers, and so on."
"CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools."
"Very good end-to-end security features."
"The feature I find most valuable is that the solution is easy to install and to work with. It starts with the installation, and from there on the management is very simple and centralized."
"Customer service and support were able to fix whatever the issue was."
"The main advantage is the storage is less expensive."
"It's not easy to install."
"I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
"There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance."
"The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive."
"The initial setup was not easy."
"The graphical user interface (UI) could be a bit more clear. It's very hard to figure out the execution logs and understand how long it takes to send everything. If an execution is lost, it's not so easy to understand why or where it went. I have to manually drill down on the data processes which takes a lot of time. Maybe there could be like a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate."
"When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data."
"Technical expertise from an engineer is required to deploy and run high-tech tools, like Informatica, on Apache Spark, making it an area where improvements are required to make the process easier for users."
"There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."
"The initial setup of Cloudera is difficult."
"The user infrastructure and user interface needs to be improved, as well as the performance. The GUI needs to be better."
"While the deployed product is generally functional, there are instances where it presents difficulties."
"It could be faster and more user-friendly."
"They should focus on upgrading their technical capabilities in the market."
"The procedure for operations could be simplified."
"There are better solutions out there that have more features than this one."
Apache Spark is ranked 1st in Hadoop with 60 reviews while Cloudera Distribution for Hadoop is ranked 2nd in Hadoop with 47 reviews. Apache Spark is rated 8.4, while Cloudera Distribution for Hadoop is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Cloudera Distribution for Hadoop writes "Good end-to-end security features and we like that it's cloud independent". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and AWS Lambda, whereas Cloudera Distribution for Hadoop is most compared with Amazon EMR, HPE Ezmeral Data Fabric, MongoDB, Cassandra and ScyllaDB.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.