I previously wrote about the potential for rapid adoption of the data lakehouse concept as enterprises combined the benefits of data lakes based on low-cost cloud object storage with the structured data processing functionality normally associated with data warehousing. By layering support for table formats, metadata management, and transactional updates and deletes, as well as query engine and data orchestration functionality, on top of low-cost storage of both structured and unstructured data, the data lakehouse enables enterprises not only to store and process data from multiple applications but also to make it available for analysis by multiple users in multiple departments for many purposes, including business intelligence and artificial intelligence. Vendors such as Dremio have added capabilities to the core concept to better equip enterprises to rely on data lakehouses for self-service analytics and AI.
Dremio was founded in 2015 to build a business around the Apache Arrow in-memory columnar data format, which was developed to enable high-performance analysis of large volumes of data. Apache Arrow underpins the company’s SQL Query Engine, which is designed to deliver high-performance BI and interactive analytics directly on the data stored in a data lake or other data platforms across cloud, on-premises or hybrid environments. The SQL Query Engine is one of three core components of Dremio’s Unified Lakehouse Platform, alongside governed self-service analytics and data lakehouse management based on Dremio’s data catalog for the Apache Iceberg table format. In combination, these capabilities are designed to enable enterprises to connect and govern data in on-premises and cloud data lakes as well as other data sources across the database estate and make it available to data analysts and business users to access and analyze on a self-service basis.
With customers in a variety of industries, including financial services, healthcare, retail, manufacturing and consumer packaged goods, Dremio has raised more than $400 million in funding from the likes of Adams Street Partners, Cisco Investments, Insight Partners, Lightspeed Venture Partners, Norwest Venture Partners and Sapphire Ventures. Most recently, Dremio raised a $160 million Series E funding round in January 2022, which valued the company at over $2 billion.
It is common for enterprises to create data lake environments to persist structured and unstructured data in object storage, either on-premises or in the cloud. More than one-half (53%) of participants in Ventana Research’s
Meanwhile, data lakehouse vendors have integrated the functionality associated with data warehousing into the data lake itself. This includes distributed SQL query engines; support for atomic, consistent, isolated and durable transactions; updates and deletes; concurrency control; metadata management; data indexing; data caching; schema enforcement and evolution; query acceleration; semantic models; data governance; version control; access control and auditing.
Dremio’s Unified Lakehouse Platform is available as software for deployment on-premises and in the cloud as well as a cloud service. It is made up of three core sets of capabilities addressing SQL query processing and acceleration, lakehouse management and unified analytics. For SQL query processing and acceleration, the platform’s SQL Query Engine enables the processing and transformation of data in cloud data lakes as well as federated querying of metastores and databases on-premises and in the cloud. The SQL Query Engine enables users to create virtual tables, known as Views, from the source data for query acceleration. It also offers pre-computed data summaries, known as Reflections, that accelerate complex aggregations and other operations, and it uses Columnar Cloud Cache for in-memory data processing.
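To illustrate the general idea behind virtual tables and pre-computed summaries, here is a minimal sketch in Python. It is not Dremio's implementation or API: the `SALES` rows and the `sales_view`, `build_summary` and `total_by_region` functions are hypothetical names invented for this example. It only shows why answering an aggregation from a summary built ahead of time can avoid rescanning the source data.

```python
from collections import defaultdict

# Hypothetical source rows: (region, product, amount). Illustrative data only.
SALES = [
    ("EMEA", "widget", 120.0),
    ("EMEA", "gadget", 80.0),
    ("APAC", "widget", 200.0),
    ("APAC", "widget", 50.0),
]


def sales_view(rows):
    """A 'view' in the generic sense: a virtual table evaluated on read."""
    for region, product, amount in rows:
        yield {"region": region, "product": product, "amount": amount}


def build_summary(rows):
    """Pre-compute total amount per region once, ahead of query time."""
    totals = defaultdict(float)
    for row in sales_view(rows):
        totals[row["region"]] += row["amount"]
    return dict(totals)


def total_by_region(region, rows, summary=None):
    """Answer from the summary when it covers the region, else scan the source."""
    if summary is not None and region in summary:
        return summary[region]  # single lookup, no scan
    return sum(r["amount"] for r in sales_view(rows) if r["region"] == region)


SUMMARY = build_summary(SALES)
```

Calling `total_by_region("APAC", SALES)` scans every row, while `total_by_region("APAC", SALES, SUMMARY)` returns the same total from a single lookup. The trade-off, in this simplified model, is that the summary must be rebuilt when the source data changes.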
Dremio’s lakehouse management capabilities provide a data catalog based on the Apache Iceberg table format that can be accessed using SQL Query Engine as well as other query engines such as Apache Spark or Apache Flink.
While Dremio has always offered features and functionality of value to data engineers, the recent addition of lakehouse management capabilities enables the company to articulate a larger value proposition for technology decision-makers that addresses the advantages of self-service analytics and AI. I anticipate further investment in generative AI capabilities, such as vector search and automated semantic data modeling. I recommend that any organization considering the data lakehouse approach include Dremio’s Unified Lakehouse Platform when evaluating options, to take advantage of its combination of query acceleration and data management.
Regards,
Matt Aslett