I have written on multiple occasions about the increasing proportion of enterprises embracing the processing of streaming data and events alongside traditional batch-based data processing. I assert that, by 2026, more than three-quarters of enterprises’ standard information architectures will include streaming data and event processing, allowing enterprises to be more responsive and provide better customer experiences.
Although many enterprises have adopted products to store and process data in motion, often those systems are deployed in parallel to those used to store and process data at rest rather than enabling a holistic view of all
Confluent was founded in 2014 by the creators of the open-source Apache Kafka distributed event streaming platform. Based on a publish-and-subscribe messaging model for communicating events and event streams, Apache Kafka was originally developed at LinkedIn to store and process data related to member activity as well as logs and metrics.
Apache Kafka has been widely adopted by thousands of enterprises to support real-time data processing by capturing event data from sensors, applications and databases, then processing and analyzing it in real time as it flows through the organization. It also forms the basis of Confluent’s product portfolio, which includes the Confluent Platform distribution of Apache Kafka for self-managed deployment on-premises and in the cloud, as well as the Confluent Cloud managed service.
The company reported total revenue of $777 million in fiscal year 2023, up 33% from $586 million the previous year, and forecasts total revenue of approximately $950 million in fiscal 2024. Revenue in the first quarter of 2024 was $217 million, up 25%. In addition to benefiting from increased adoption of streaming data and event processing in general, as well as expansion and greater maturity among existing customers, the company has expanded its addressable market by developing capabilities for streaming data governance. It has also added stream processing and analytics capabilities through the early 2023 acquisition of Immerok, one of the primary companies behind the Apache Flink stream processing engine.
As I previously described, Confluent Cloud is more than just a hosted version of Apache Kafka. The company’s Kora engine was designed to provide a cloud-native experience for Kafka, including support for tiered storage, elastic scaling, high availability and improved performance. Additionally, the company has invested in security and governance capabilities, with its Stream Governance suite providing capabilities for schema management and data quality as well as self-service data discovery and classification and stream lineage. The acquisition of Immerok further extended the differentiation of Confluent Cloud with the addition of the Confluent Cloud for Apache Flink serverless stream processing service, which provides an engine for performing stateful computations on unbounded and bounded streams of events.
Confluent Cloud automatically interprets Apache Kafka topics as Apache Flink tables, enabling users to create applications for SQL-based streaming and batch analytics of event data, including filtering, joining and enriching data streams without specialist Flink expertise. Confluent Cloud for Apache Flink provides the SQL Workspaces user interface for workers to write SQL statements executed against a serverless managed Apache Flink compute pool.
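To make the idea of filtering, joining and enriching a stream concrete, here is a minimal sketch in plain Python. It is not Confluent's or Apache Flink's API, and it is not how either product is implemented. The event fields, the `CUSTOMERS` lookup table and the `enrich_large_orders` function are all hypothetical names chosen for illustration, with a list standing in for events arriving on a topic.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reference data, standing in for a second table to join against.
CUSTOMERS: Dict[str, str] = {"c1": "Acme", "c2": "Globex"}


def enrich_large_orders(
    orders: Iterable[dict], min_amount: float = 100.0
) -> Iterator[dict]:
    """Filter an order stream and join each surviving event to CUSTOMERS."""
    for order in orders:
        # Filter: drop events below the threshold.
        if order["amount"] < min_amount:
            continue
        # Join: skip events with no matching customer (inner-join semantics).
        name = CUSTOMERS.get(order["customer_id"])
        if name is None:
            continue
        # Enrich: emit the event with the joined field added.
        yield {**order, "customer_name": name}


# Hypothetical events, as they might arrive on a topic.
events = [
    {"order_id": 1, "customer_id": "c1", "amount": 250.0},
    {"order_id": 2, "customer_id": "c2", "amount": 40.0},
    {"order_id": 3, "customer_id": "c9", "amount": 500.0},
]
result = list(enrich_large_orders(events))
```

Because the function is a generator, it handles one event at a time and would work equally well on an unbounded source. In a SQL-based service the same logic would typically be expressed declaratively as a `WHERE` clause and a `JOIN` rather than as a loop.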
The general availability of Confluent Cloud for Apache Flink was announced at the Kafka Summit London in March of this year, where Confluent also unveiled Tableflow. A new feature on Confluent Cloud, Tableflow automatically materializes Apache Kafka topics and schemas as Parquet files to be persisted in a data warehouse, data lake or cloud storage using the Apache Iceberg open table format. Tableflow ensures Iceberg tables are continuously updated with the latest streaming data and enables batch processing of historical event data using Iceberg-compatible SQL analytics engines. The streaming data stored as Iceberg tables can also be consumed as Kafka topics.
More recently, Confluent announced the addition of AI Model Inference to Confluent Cloud for Apache Flink, enabling enterprises to incorporate machine learning into streaming data pipelines. Now available for early access, AI Model Inference is designed to allow users to create SQL statements in Confluent Cloud for Apache Flink that make calls to external artificial intelligence services, including Amazon SageMaker, Google Cloud Vertex, Microsoft Azure and OpenAI, to coordinate data processing and AI workflows and ensure that AI models have access to streaming data as it is updated in real time.
As I previously stated, the execution of business events has always occurred in real time. Batch processing is an artificial construct driven by the limitations of traditional data processing capabilities that require enterprises
Capabilities such as Confluent Cloud for Apache Flink and Tableflow are changing assumptions about event and stream data processing. I would encourage enterprises evaluating data architecture to consider streaming data platforms and Confluent Cloud alongside more traditional data platforms to provide a holistic view of all data—in motion and at rest.
Regards,
Matt Aslett