Success with streaming data and events requires a more holistic approach to managing and governing data in motion and data at rest. Streaming data and event processing have been part of the data landscape for many decades. For much of that time, however, data streaming was a niche activity, with standalone data streaming and event-processing projects run in parallel with existing batch-processing initiatives on operational and analytic data platforms. As I have previously noted, there has been an increased focus on unified approaches that enable the holistic management and governance of data in motion alongside data at rest. One example is the recent emergence of streaming databases, which are designed to combine the incremental processing capabilities of stream-processing engines with the SQL-based analysis and persistence capabilities of traditional databases.
Ventana Research’s Streaming Data Dynamic Insights enables organizations to assess their relative maturity in achieving value from streaming data.
Stream-processing systems have, to date, primarily been used for streaming data ingestion and streaming analytics. Streaming data ingestion enables event data to be cleaned and transformed as it is ingested, while streaming analytics allows users to query event data in flight, enabling low-latency, continuous analysis of data as it is generated. In both cases, once processed, the historical event data can be stored in an external relational or non-relational data platform for batch processing and analysis, as well as integration with transactional data.

Streaming analytics provides a real-time view, and batch processing provides a historical view. If an organization is to gain a complete picture, the two need to be combined. A prime example is combining transactional and user-behavior data to fully understand customer behavior in an online retail environment. Our research leads us to assert that, by 2025, more than 7 in 10 organizations’ standard information architectures will include streaming data and event processing, allowing organizations to be more responsive and provide better customer experiences.

The emergence of a new breed of streaming databases could reduce the operational inefficiencies of approaches that rely on separate platforms for stream and batch processing. Streaming databases, including the likes of Confluent ksqlDB, DeltaStream, Materialize, RisingWave and Timeplus, are designed to continually process streams of event data using SQL queries and real-time materialized views, and also to persist historical event data for further analysis. Unlike stream-processing engines that persist data in an external database, streaming databases are designed to provide native processing and persistence.
As such, a single streaming database could be used as an alternative to a combination of (for example) Apache Flink and Apache Cassandra, reducing deployment, configuration, integration and management complexity.
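To make the materialized-view idea concrete, here is a minimal Python sketch of the pattern streaming databases automate. The class, event shape and field names are hypothetical illustrations, not any vendor's API; the point is that each arriving event updates the view's state incrementally, so queries read continuously maintained results rather than triggering a batch rescan of history.

```python
from collections import defaultdict


class MaterializedView:
    """Toy incrementally maintained view: running revenue per customer.

    A hypothetical stand-in for the state a streaming database would
    maintain for a query such as:
        SELECT customer, SUM(amount) FROM orders GROUP BY customer;
    """

    def __init__(self):
        self.totals = defaultdict(float)

    def apply(self, event):
        # Incremental update: O(1) work per event, no rescan of history.
        self.totals[event["customer"]] += event["amount"]

    def query(self, customer):
        # Reads are served directly from the continuously updated state.
        return self.totals[customer]


view = MaterializedView()
for event in [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": 5.0},
    {"customer": "a", "amount": 2.5},
]:
    view.apply(event)

print(view.query("a"))  # 12.5
```

In a separate stream-plus-database architecture, the update logic would run in a stream-processing engine and the queryable state would live in an external store; a streaming database collapses both roles into one system.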
I recently wrote about how real-time analytic data platforms could be used to develop and support data-intensive operational applications requiring sub-second latency. Streaming databases serve a similar purpose, with sub-second latency a core requirement.
Stream and batch processing both have advantages that lend themselves to specific use cases with different performance requirements, and as such they will continue to co-exist. There are, however, also potential advantages to be gained from unification. Organizations should evaluate vendors on their ability to deliver a combination of stream- and batch-processing capabilities, whether as one product or several. I recommend that the potential use cases for streaming databases be part of that evaluation process.
Regards,
Matt Aslett