We compared Apache Flink and Databricks based on real PeerSpot user reviews.
Find out in this report how the two Streaming Analytics solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"Apache Flink is meant for low-latency applications. You take one event and hold it if you want to maintain a certain state. When another event comes and you want to associate those events together, in-memory state management was a key feature for us."
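The reviewer above describes holding one event in memory until a related event arrives, then associating the two. The sketch below is a toy Python model of that idea, not Flink's API. The `Event` fields, the `PairingOperator` class and its `process` method are hypothetical names chosen for illustration. In Flink itself this pattern is usually written with keyed state (for example `ValueState`) inside a `KeyedProcessFunction`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Event:
    key: str     # hypothetical correlation key, e.g. an order ID
    kind: str    # hypothetical event type, e.g. "order" or "payment"
    value: float


class PairingOperator:
    """Toy model (not Flink's API): keeps at most one pending event per key in memory."""

    def __init__(self) -> None:
        self.pending: Dict[str, Event] = {}

    def process(self, event: Event) -> Optional[Tuple[Event, Event]]:
        first = self.pending.pop(event.key, None)
        if first is None:
            # No state for this key yet: remember the event and wait for its partner.
            self.pending[event.key] = event
            return None
        # State exists: associate the stored event with the new one; the state is now cleared.
        return (first, event)


op = PairingOperator()
for e in [Event("a1", "order", 10.0), Event("b2", "order", 5.0), Event("a1", "payment", 10.0)]:
    pair = op.process(e)
    if pair is not None:
        print(pair[0].key, pair[0].kind, "->", pair[1].kind)  # a1 order -> payment
```

Keeping the pending event in a per-key dictionary mirrors why low latency depends on in-memory state: matching a new event is a single lookup rather than a query against external storage.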
"It allows us to process batch data, stream in real time, and build pipelines."
"It provides us the flexibility to deploy it on any cluster without being constrained by cloud-based limitations."
"Apache Flink's best feature is its data streaming tool."
"Apache Flink allows you to reduce latency and process data in real-time, making it ideal for such scenarios."
"The top feature of Apache Flink is its low latency for fast, real-time data. Another great feature is the real-time indicators and alerts, which make a big difference when it comes to data processing and analysis."
"It is user-friendly and the reporting is good."
"Another feature is how Flink handles its resiliency. It has something called the checkpointing concept. You're dealing with billions and billions of requests, so at that scale your system is going to fail at some point. Flink handles this with checkpointing and savepointing, where it writes the aggregated state to separate storage. So in case of failure, you can recover from that state and resume."
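The checkpointing idea in the quote above (periodically write the aggregated state to separate storage, then recover from it after a failure) can be illustrated with a small Python toy model. This is not Flink code. `CheckpointedCounter`, `run_with_failure`, the `durable` dictionary standing in for separate storage, and the simulated crash are all hypothetical. Flink's real implementation coordinates snapshots across parallel operators using checkpoint barriers and is far more involved.

```python
import copy
from typing import Dict, List, Optional, Tuple


class CheckpointedCounter:
    """Toy stateful operator: running sum per key, plus the input position reached."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.offset = 0  # number of input records already applied to the state

    def process(self, record: Tuple[str, int]) -> None:
        key, amount = record
        self.state[key] = self.state.get(key, 0) + amount
        self.offset += 1

    def snapshot(self) -> dict:
        # Save state and input position together so they stay consistent.
        return {"state": copy.deepcopy(self.state), "offset": self.offset}

    @classmethod
    def restore(cls, snap: dict) -> "CheckpointedCounter":
        op = cls()
        op.state = copy.deepcopy(snap["state"])
        op.offset = snap["offset"]
        return op


def run_with_failure(stream: List[Tuple[str, int]], checkpoint_every: int,
                     fail_at: Optional[int]) -> Dict[str, int]:
    durable = {"state": {}, "offset": 0}  # stands in for separate, durable storage
    op = CheckpointedCounter()
    failed = False
    while op.offset < len(stream):
        if not failed and op.offset == fail_at:
            # Simulated crash: in-memory state is lost, so rebuild it from the last
            # checkpoint and replay the input from the checkpointed offset.
            failed = True
            op = CheckpointedCounter.restore(durable)
            continue
        op.process(stream[op.offset])
        if op.offset % checkpoint_every == 0:
            durable = op.snapshot()
    return op.state


stream = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]
print(run_with_failure(stream, checkpoint_every=2, fail_at=3))  # {'a': 9, 'b': 6}
```

Because the snapshot stores the input offset alongside the state, records processed after the last checkpoint are replayed exactly once after recovery, so the result matches a run without failures.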
"The time travel feature is the solution's most valuable aspect."
"One of the features provides nice interactive clusters, or compute instances that you don't really need to manage often."
"The most valuable features of Databricks are the notebook, data factory, and ease of use."
"The built-in optimization recommendations cut our query times in half and allowed us to reach decision points and deliver insights very quickly."
"It's very simple to use Apache Spark on Databricks."
"The most valuable feature of Databricks is the integration with Microsoft Azure."
"It's easy to increase performance as required."
"Databricks is a unified solution that we can use for streaming. It supports open-source languages, which are cloud-agnostic. When I do database coding, if another tool uses a similar language, such as Excel formulas or SQL, I can reuse the same knowledge, limiting the need to learn new things. It supports a lot of Python libraries that I can use very easily."
"In a future release, they could make the error descriptions clearer."
"Apache Flink should improve its data capability and data migration."
"In terms of improvement, there should be better reporting. You can integrate with reporting solutions but Flink doesn't offer it themselves."
"One way to improve Flink would be to enhance integration between different ecosystems. For example, there could be more integration with other big data vendors and platforms similar in scope to how Apache Flink works with Cloudera. Apache Flink is a part of the same ecosystem as Cloudera, and for batch processing it's actually very useful but for real-time processing there could be more development with regards to the big data capabilities amongst the various ecosystems out there."
"The solution could be more user-friendly."
"PyFlink is not as fully featured as Python itself, so there are some limitations to what you can do with it."
"We have a machine learning team that works with Python, but Apache Flink does not have full support for the language."
"In terms of stability with Flink, it is something that you have to deal with every time. Stability is the number one problem that we have seen with Flink, and it really depends on the kind of problem that you're trying to solve."
"The query plan is not easy to work with at the Databricks job level. If I want to tune any of the code, guidance is not easily available in blogs either."
"The solution could be improved by adding a feature that would make it more user-friendly for our team. The feature is simple, but it would be useful. Currently, our team is more familiar with the R language, but Databricks requires the use of Jupyter Notebooks, which primarily support Python. We have tried using RStudio, but it is not a fully integrated solution. To fully utilize Databricks, we have to use the Jupyter interface. One feature that would make it easier for our team to adopt the Jupyter interface would be the ability to select a specific variable or line of code and execute it within a cell. This feature is available in other Jupyter Notebooks outside of Databricks and in our own IDE, but it is not currently available within Databricks. If this feature were added, it would make the transition to Databricks much smoother for our team."
"Databricks doesn't offer the use of Python scripts by itself and is not connected to GitHub repositories or anything similar. This is something that is missing. If they could integrate with Git tools, it would be an advantage."
"Scalability is an area with certain shortcomings. The solution's scalability needs improvement."
"It would be helpful to have additional licensing options."
"The integration and query capabilities can be improved."
"A lot of people are required to manage this solution."
"CI/CD needs additional leverage and support."
Apache Flink is ranked 5th in Streaming Analytics with 15 reviews while Databricks is ranked 2nd in Streaming Analytics with 78 reviews. Apache Flink is rated 7.6, while Databricks is rated 8.2. The top reviewer of Apache Flink writes "A great solution with an intricate system and allows for batch data processing". On the other hand, the top reviewer of Databricks writes "A nice interface with good features for turning off clusters to save on computing". Apache Flink is most compared with Spring Cloud Data Flow, Amazon Kinesis, Azure Stream Analytics, Apache Pulsar and Google Cloud Dataflow, whereas Databricks is most compared with Amazon SageMaker, Informatica PowerCenter, Dataiku, Dremio and Domino Data Science Platform. See our Apache Flink vs. Databricks report.
We monitor all Streaming Analytics reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.