Data integration, transformation and preparation account for a significant proportion of the time and effort required in any analytics project. Traditionally, operational data platforms are designed to store, manage, and process data to support worker-, customer- and partner-facing operational applications. Data is then extracted, transformed, and loaded (or “ETLed”) into a separate analytic data platform, which is designed to store, manage, process, and analyze data. More than two-thirds (69%) of participants in Ventana Research’s Analytics and Data Benchmark Research cite preparing data for analysis as the most time-consuming aspect of the analytics process. Reducing the time and effort spent on data integration and preparation can significantly accelerate time to business insight. In that context, it is no surprise that the concept of zero-ETL integration has generated a lot of interest among enterprises in recent years.
As highlighted by the 2023 Ventana Research Buyers Guide for Data Pipelines, the development, testing and deployment of data pipelines is essential to generating
The claim is that zero-ETL makes operational data available instantly for real-time analytics, which could be useful for including artificial intelligence and machine learning (AI/ML)
Enterprises concerned about vendor lock-in should also be aware that the zero-ETL offerings introduced to date provide only point-to-point integration between specific data platforms, which cannot be replicated with alternative databases. The concept of zero-ETL integration was popularized by Amazon Web Services, which introduced continuous replication of data between its Amazon Aurora MySQL operational database and its Amazon Redshift analytic database at its re:Invent customer event in 2022. The company followed that up at re:Invent 2023 by announcing zero-ETL integration to Amazon Redshift from Amazon Aurora PostgreSQL, Amazon DynamoDB, and Amazon RDS for MySQL, as well as zero-ETL integration between Amazon DynamoDB and Amazon OpenSearch Service. Other vendors have also embraced zero-ETL. Salesforce introduced zero-ETL integration between Salesforce Data Cloud and both Snowflake and Databricks in September 2023, while the introduction of the Couchbase Capella columnar service promised zero-ETL integration between operational and analytic nodes. Additionally, Google described the ability to federate queries to Google Cloud Bigtable from Google Cloud BigQuery as zero-ETL. Many other vendors offer products that automatically replicate data from source to target without the need for up-front transformation, or that enable virtual or federated queries to be performed on data in external data platforms. Given the level of interest in zero-ETL and the industry’s love of buzzwords, we expect many of these products to be rebranded as providing zero-ETL integration.
Any technology that reduces the time and effort spent on data integration and preparation can accelerate time to business value from analytic projects. The implied advantages of zero-ETL make it an attractive proposition for accelerating real-time analytics on operational data in support of intelligent operational applications. I recommend that enterprises include zero-ETL capabilities in their evaluations, while remaining aware that the approach has significant limitations that make it unsuitable for other use cases with broader analytics, data management and data governance requirements.
Regards,
Matt Aslett