The 2023 Ventana Research Buyers Guide for Data Orchestration enables me to provide observations about how the market has advanced.
Data orchestration is a concept that has been growing in popularity in the past five years amid the rise of DataOps, which describes more agile approaches to data integration and
At the highest level of abstraction, data orchestration covers three key capabilities: collection (including data ingestion, preparation and cleansing); transformation (additionally including integration and enrichment); and activation (making the results available to compute engines, analytics and data science tools, or operational applications).
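As a rough illustration of how these three capabilities fit together, the following minimal Python sketch chains a collection step, a transformation step and an activation step. The record fields, function names and sample data are hypothetical and chosen only for illustration; real orchestration products expose these stages through their own interfaces.

```python
# Hypothetical raw records; the field names are illustrative only.
RAW_ROWS = [
    {"customer": " Ada ", "amount": "120.50"},
    {"customer": "Grace", "amount": "not-a-number"},
    {"customer": "ada", "amount": "30"},
]


def collect(rows):
    """Collection: ingest rows, clean them, and drop any that fail validation."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # discard rows whose amount cannot be parsed
        cleaned.append({"customer": row["customer"].strip().lower(), "amount": amount})
    return cleaned


def transform(rows):
    """Transformation: integrate the cleaned rows into one total per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


def activate(totals):
    """Activation: shape the results for a downstream consumer such as a dashboard."""
    return [{"customer": name, "total": total} for name, total in sorted(totals.items())]


result = activate(transform(collect(RAW_ROWS)))
print(result)
```

Each stage takes the previous stage's output as its only input, which is what allows an orchestration tool to schedule, retry and monitor the stages independently.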
This may sound very much like the tasks that data management practitioners have been undertaking for decades. As such, it is fair to ask what separates data orchestration from traditional approaches to data management.
Key to understanding why data orchestration is different, and necessary, is viewing data management challenges through the lens of modern data-processing requirements. Data-driven organizations stand to gain competitive advantage, responding faster to worker and customer demands for more innovative, data-rich applications and personalized experiences.
Being data-driven requires a combination of people, processes, information and technology improvements involving data culture, data literacy, data democracy, and data curiosity. Encouraging employees to discover and experiment with data is a key aspect of being data-driven that requires new, agile approaches to data management.
Meanwhile, the increasing reliance on real-time data processing is driving requirements for more agile, continuous data processing. Additionally, the rapid adoption of cloud computing has fragmented where data is accessed or consolidated, with data increasingly spread across multiple data centers and cloud providers.
Traditional approaches to data management are rooted in point-to-point batch data processing, whereby data is extracted from its source, transformed for a specific purpose, and loaded into a target environment for analysis. These approaches are unsuitable for the demands of modern analytics environments, which instead require agile data pipelines that can traverse multiple data-processing locations and can evolve in response to changing data sources and business requirements.
Given the increasing complexity of evolving data sources and requirements, there is a need to enable the flow of data across the organization through new approaches to the creation, scheduling, automation and monitoring of workflows. This is the realm of data orchestration, although the key capabilities of data orchestration will be familiar to existing data practitioners. Specific tasks related to these capabilities have traditionally been addressed with a variety of tools as well as manual effort, hand-coded scripts and expertise.
In comparison, data orchestration tools are designed to automate and coordinate the sequential or parallel execution of a complete set of tasks via data pipelines, typically based on directed acyclic graphs (DAGs) that represent the relationships and dependencies between the tasks. The capabilities delivered by data orchestration fall under three categories: pipeline monitoring, pipeline management, and workflow management.
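To make the DAG idea concrete, here is a minimal sketch that uses Python's standard-library graphlib module (Python 3.9 or later) to derive an execution order from task dependencies. The task names and the execution_waves helper are hypothetical, and this is not how any particular orchestration product is implemented; it only shows how a dependency graph determines which tasks must run sequentially and which could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "clean_orders": {"ingest_orders"},
    "join_datasets": {"clean_orders", "ingest_customers"},
    "publish_report": {"join_datasets"},
}


def execution_waves(graph):
    """Group tasks into waves; tasks within one wave have all of their
    dependencies satisfied and could therefore run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave complete to unlock dependents
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In this sketch the two ingestion tasks land in the same wave because neither depends on the other, while the remaining tasks run one after another because each waits on the output of its predecessor.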
As is often the case with new approaches to data and analytics, the requirements for data orchestration were first experienced by digital-native brands at the forefront of data-driven business strategies. One of the most prominent data orchestration tools, Apache Airflow, began as an internal development project within Airbnb, becoming an Apache Software Foundation project in 2016; workflow automation platform Flyte was originally created and subsequently open-sourced by Lyft; and Metaflow was developed and open-sourced by Netflix.
Data orchestration is not just for digital natives, however, and a variety of vendors have sprung up with offerings based around these open-source projects, as well as other development initiatives, to bring the benefits of data orchestration to the masses.
In addition to stand-alone data orchestration software products and cloud services, data orchestration capabilities are also being built into larger data-engineering platforms addressing broader data management requirements, including data observability, often in the context of data fabric and data mesh.
Whether stand-alone or embedded in larger data-engineering platforms, data orchestration has the potential to drive improved efficiency and agility in data and analytics projects. Data orchestration addresses one of the most significant impediments to generating value from data. More than two-thirds (69%) of participants in Ventana Research’s Analytics and Data Benchmark Research cite preparing data for analysis as the most time-consuming task in analyzing data.
Adoption of data orchestration is still in the early stages and is closely linked to larger data transformation efforts that introduce greater agility and flexibility. However, I assert that by 2026, more than one-half of organizations will adopt data orchestration technologies to automate and coordinate data workflows and increase efficiency and agility in data and analytics projects.
If an organization’s data processes and skills remain rooted in traditional products and manual intervention, then data orchestration is unlikely to be a quick fix. However, alongside the cultural and organizational changes involved in people, processes, and information improvements, data orchestration has the potential to play a key role in the technological improvement involved in becoming more data-driven. I recommend that all organizations investigate the potential advantages of data orchestration with a view to improving their use of data and analytics.
This research evaluates the following vendors, whose products address key elements of data orchestration as we define it: Alteryx, AWS, Astronomer, BMC, Databricks, DataKitchen, Google, Hitachi Vantara, IBM, Infoworks.io, Matillion, Microsoft, Prefect, Rivery, Saagie, SAP, Stonebranch, StreamSets and Y42.
You can find more details on our site as well as in the Buyers Guide Market Report.
Regards,
Matt Aslett