Fast data isn’t really fast; it’s just data that we decide we want to engage with faster than some other bits of data.
What fast data really is… is real-time (or near real-time) data that requires instant awareness, faster decision-making and immediate action.
That hasn’t stopped the industry attempting to coin the phrase itself and talking about the need to build ‘real-time data pipelines’ to channel this so-called fast data through towards the point of analysis, action or exposure to deeper algorithmic logic.
For you, blue
One of the players in this market is BlueData; the firm’s latest software is focused on real-time data pipelines built with Spark Streaming, Kafka and Cassandra.
This company claims to have produced a new turnkey offering designed for developing and testing applications that analyse fast data.
Use cases of fast data rising
BlueData contends that fast data use cases are emerging in ‘almost every industry’ — ranging from fraud detection for financial transactions; to Internet of Things (IoT) monitoring with sensor-generated data; to marketing campaign optimisation and real-time bidding in advertising technology.
Other cases include real-time analysis of high-velocity data streams from financial markets, sensors, machine logs, social media, mobile applications and so on.
The idea is that fast data is perishable and may lose its operational value in a very short time frame.
Speed is of the essence — hence the name.
For data scientists and developers working with real-time pipelines, the Spark-Kafka-Cassandra stack has (BlueData argues) emerged as a good place to start.
The suggestion here is that this trinity of open source systems delivers on the key requirements for fast data:
Spark: a fast in-memory data processing engine and one of the fastest growing Apache open source technologies — Spark Streaming is an extension of the core Spark API; it allows integration of real-time data from disparate event streams.
Kafka: a messaging system to capture and publish streams of data. With Spark you can ingest data from Kafka, filter that stream down to a smaller data set, augment the data and then push that refined data set to a persistent data store.
Cassandra: this data needs to be written to a scalable and resilient operational database like Cassandra for persistence, easy application development and real-time analytics.
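The ingest-filter-augment-persist flow described above can be sketched in plain Python, independent of any cluster. This is a minimal illustration of the transformation logic a Spark Streaming micro-batch would apply between Kafka and Cassandra; the event fields, threshold values and a fraud-detection-style use case are assumptions for the example, not details from BlueData.

```python
import json

def parse_event(raw):
    """Deserialize one raw message, as it might arrive from a Kafka topic."""
    return json.loads(raw)

def is_relevant(event, min_amount=1000):
    """Filter: keep only events worth persisting (threshold is illustrative)."""
    return event.get("amount", 0) >= min_amount

def augment(event):
    """Augment: derive an extra field before writing to the operational store."""
    event["flagged"] = event["amount"] >= 10000
    return event

def process_batch(raw_messages):
    """One micro-batch: parse -> filter -> augment.

    A streaming job would then push this refined set to a persistent
    store such as Cassandra; here we simply return it.
    """
    parsed = (parse_event(m) for m in raw_messages)
    return [augment(e) for e in parsed if is_relevant(e)]

batch = ['{"id": 1, "amount": 50}', '{"id": 2, "amount": 12000}']
refined = process_batch(batch)  # only the second event survives, now flagged
```

The point of the sketch is the shape of the pipeline: the raw stream is filtered down to a smaller data set and enriched before anything touches the database, which is what keeps the persistence layer fast enough for real-time analytics.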
“Batch processing of large datasets was the start for many Big Data analytics initiatives. But now there’s growing demand from organisations analysing real-time ‘data in motion’ in addition to the more traditional batch-oriented ‘data at rest’ use cases,” said Kumar Sreekanti, CEO of BlueData.
“For real-time data pipelines, we’ve seen Spark Streaming together with Kafka and Cassandra emerge as a popular stack. BlueData makes it easy for enterprises to get started quickly with these new tools and technologies in a turnkey on-premises lab environment,” added Sreekanti.