I have written recently about increased demand for data-intensive applications infused with the results of analytic processes, such as personalization and artificial intelligence (AI)-driven recommendations. Almost one-quarter of respondents (22%) to Ventana Research’s Analytics and Data Benchmark Research are currently analyzing data in real time, with an additional 10% analyzing data every hour. There are multiple data platform approaches to delivering real-time data processing and analytics and more agile data pipelines. These include the use of streaming and event data processing, as well as the use of hybrid data processing to enable analytics to be performed on application data within operational data platforms. Another approach, favored by a group of emerging vendors such as Rockset, is to develop these data-intensive applications on a specialist, real-time analytic data platform specifically designed to meet the performance and agility requirements of data-intensive applications.
Rockset was founded in 2016 with the goal of enabling organizations to develop and deploy real-time data-driven applications. Before founding Rockset, the company’s founders worked in data engineering at Facebook and played a key role in the development of the RocksDB open-source embedded database storage engine. RocksDB is also under the hood of Rockset’s cloud-based real-time analytics database service, where it supports real-time analytics alongside the company’s Converged Index approach and serverless auto-scaling architecture. The Rockset managed cloud service is available on Amazon Web Services and was launched in 2018, when the company also announced $21.5 million in seed and Series A funding from Greylock Partners and Sequoia Capital. That was followed by $40 million in Series B funding from the same investors, announced in October 2020. Rockset has also attracted a set of customers that includes Internet of Things connectivity specialist 1NCE, construction logistics firm Command Alkon, gaming company eGoGames, health startup Ritual, wellness platform provider Rumble, and e-learning provider Seesaw. Rockset provides these and other customers with the ability to perform low-latency complex queries on high volumes of data to support high-concurrency applications. Example applications and use cases include personalization in retail, media and ad-tech, as well as customer experience in retail, manufacturing and customer support. Other key use cases include A/B testing, logistics and gaming.
I assert that through 2026, and despite increased demand for hybrid operational and analytic processing, more than three-quarters of data platform use cases will have functional requirements that encourage the use of specialized analytic or operational data platforms. Rockset’s database is an analytic data platform, as opposed to an operational database designed to support transactional workloads. However, it is not a data warehouse-style analytic database designed to support business intelligence (BI) dashboards and tools. Rather than being targeted at BI and analytics users, it is aimed at developers to support the creation of new real-time analytics applications. Rockset’s managed cloud service is designed to serve real-time applications by ingesting, indexing and querying operational data in real time, acting as an external secondary index that accelerates application-driven analytic queries of that data. It does so by providing an index of data from multiple sources that real-time applications can query via REST API by executing pre-built and tested SQL queries. Rockset is based on an Aggregator-Leaf-Tailer (ALT) architecture in which disaggregated, independently scalable microservices are responsible for the ingestion (Tailer), indexing (Leaf), and querying (Aggregator) of data. Tailer microservices support data ingestion via native connectors for several operational data sources including NoSQL databases (MongoDB and Amazon DynamoDB), streaming data (Apache Kafka and Amazon Kinesis) and cloud storage (Amazon S3 and Google Cloud Storage). Data is ingested into Rockset in its native format (such as JSON, XML, Avro, and Parquet), indexed by Leaf microservices and persisted using a document data model without the need for a predefined schema. The RocksDB-Cloud embedded storage engine automatically persists hot data on SSDs and cold data in cloud object storage (such as Amazon S3).
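To make the separation of ingestion, indexing and querying concrete, the following is a minimal in-memory sketch of the Aggregator-Leaf-Tailer pattern. It is purely illustrative and is not Rockset's implementation or API: the class names, the round-robin placement of documents and the `count_where` method are all assumptions made for this example, and the toy index assumes scalar (hashable) field values.

```python
from collections import defaultdict
from typing import Any, Iterable

Document = dict[str, Any]


class Leaf:
    """Indexing tier (hypothetical): stores schemaless documents plus a toy inverted index."""

    def __init__(self) -> None:
        self.documents: dict[int, Document] = {}
        self.inverted: dict[tuple[str, Any], set[int]] = defaultdict(set)

    def index(self, doc: Document) -> int:
        doc_id = len(self.documents)
        self.documents[doc_id] = doc
        # No predefined schema: every field present in the document is indexed.
        for field, value in doc.items():
            self.inverted[(field, value)].add(doc_id)
        return doc_id

    def lookup(self, field: str, value: Any) -> list[Document]:
        ids = self.inverted.get((field, value), set())
        return [self.documents[i] for i in sorted(ids)]


class Tailer:
    """Ingestion tier (hypothetical): reads documents from a source and hands them to leaves."""

    def __init__(self, leaves: list[Leaf]) -> None:
        self.leaves = leaves
        self._next = 0

    def ingest(self, source: Iterable[Document]) -> int:
        count = 0
        for doc in source:
            # Round-robin placement is an assumption made to keep the sketch short.
            self.leaves[self._next % len(self.leaves)].index(doc)
            self._next += 1
            count += 1
        return count


class Aggregator:
    """Query tier (hypothetical): fans a query out to every leaf and merges partial results."""

    def __init__(self, leaves: list[Leaf]) -> None:
        self.leaves = leaves

    def count_where(self, field: str, value: Any) -> int:
        return sum(len(leaf.lookup(field, value)) for leaf in self.leaves)


# The three tiers share only the list of leaves, so each could be scaled independently.
leaves = [Leaf(), Leaf()]
tailer = Tailer(leaves)
aggregator = Aggregator(leaves)

events = [
    {"user": "a", "action": "click"},
    {"user": "b", "action": "view", "page": "/home"},
    {"user": "a", "action": "view"},
]
tailer.ingest(events)
print(aggregator.count_where("user", "a"))
```

The point of the sketch is the division of labor: adding leaves increases indexing and storage capacity without changing the ingestion or query code, which is the property the disaggregated architecture is meant to provide.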
The data is stored in a Converged Index, which enables Rockset to support SQL-based querying of the document data model. Specifically, the Converged Index combines an inverted index, a columnar index and a row index of the data, enabling optimization for multiple access patterns, with the optimizer choosing among the indexes based on the nature of the query. As data is ingested into Rockset, it is written to all three indexes, with RocksDB-Cloud data compaction capabilities used to alleviate the data duplication and storage volume challenges associated with maintaining multiple indexes. Aggregator microservices are responsible for query planning, optimization and execution, along with aggregations, filters, joins, sorting and grouping. Rockset also provides support for Rollups, which aggregate data from multiple documents as they are ingested from Apache Kafka and Amazon Kinesis streams to reduce storage requirements and accelerate query performance. Query performance can also be accelerated using Views (saved SQL queries) and Query Lambdas (parameterized SQL queries that can be executed from a dedicated REST endpoint).
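The idea of indexing each document three ways can be illustrated with a toy in-memory structure. This is a conceptual sketch only, not Rockset's Converged Index: the `ToyConvergedIndex` class and its method names are hypothetical, the caller picks the access path by choosing a method (whereas a real optimizer would choose based on the query), and field values are assumed to be scalar.

```python
from collections import defaultdict
from typing import Any


class ToyConvergedIndex:
    """Hypothetical illustration: every document is written to three indexes at ingest time."""

    def __init__(self) -> None:
        self.row: dict[int, dict[str, Any]] = {}                 # doc_id -> whole document
        self.columnar: dict[str, dict[int, Any]] = defaultdict(dict)  # field -> {doc_id: value}
        self.inverted: dict[tuple[str, Any], set[int]] = defaultdict(set)  # (field, value) -> doc_ids

    def add(self, doc_id: int, doc: dict[str, Any]) -> None:
        self.row[doc_id] = doc
        for field, value in doc.items():
            self.columnar[field][doc_id] = value
            self.inverted[(field, value)].add(doc_id)

    def get(self, doc_id: int) -> dict[str, Any]:
        # Row index: best when the query needs every field of a known document.
        return self.row[doc_id]

    def find(self, field: str, value: Any) -> list[int]:
        # Inverted index: best for highly selective equality filters.
        return sorted(self.inverted.get((field, value), set()))

    def total(self, field: str) -> Any:
        # Columnar index: best for aggregating one field across many documents.
        return sum(self.columnar.get(field, {}).values())


idx = ToyConvergedIndex()
idx.add(1, {"sku": "A", "qty": 2, "region": "eu"})
idx.add(2, {"sku": "B", "qty": 5, "region": "us"})
idx.add(3, {"sku": "A", "qty": 1, "region": "us"})

print(idx.find("sku", "A"))     # selective lookup via the inverted index
print(idx.total("qty"))         # aggregation via the columnar index
print(idx.get(2)["region"])     # full-document fetch via the row index
```

The sketch also shows the cost the text mentions: each value is stored three times, which is why compaction matters when several indexes are maintained over the same data.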
There are multiple approaches to delivering real-time data processing. To gain more mainstream adoption, Rockset will therefore need to articulate not only how it compares with other real-time analytics databases, but also the business and technological reasons why organizations should consider investing in a dedicated real-time analytics database as opposed to an operational database with hybrid data processing capabilities, or a high-performance data warehouse. Nevertheless, I recommend that all organizations evaluating data platforms to support the development of real-time data-driven applications include Rockset in their investigations.
Regards,
Matt Aslett