ISG Software Research Analyst Perspectives

Acceldata Enables Data Observability

Written by Matt Aslett | Jan 10, 2023 11:00:00 AM

Data observability is a hot topic. I have written about the importance of data observability for ensuring healthy data pipelines, and have covered multiple vendors with data observability capabilities, offered both as standalone products and as part of a larger data engineering system. Data observability software provides an environment that takes advantage of machine learning and DataOps to automate the monitoring of data quality and reliability. The term has been adopted by multiple vendors across the industry, and while they all have key functionality in common – including collecting and measuring metrics related to data quality and data lineage – there is also room for differentiation. A prime example is Acceldata, which takes the position that data observability requires monitoring not only data and data pipelines but also the underlying data processing compute infrastructure as well as data access and usage.

Acceldata was founded in 2018 by former executives and engineers of Apache Hadoop-specialist Hortonworks. The founders identified an opportunity to help organizations monitor and manage the reliability of data pipelines and data infrastructure by developing a product to address data infrastructure scaling and performance issues. Acceldata’s research and development has been fueled by $43.5 million in funding, including an $8.5 million Series A round and, most recently, a $35 million Series B round provided by Insight Partners, March Capital, Lightspeed, Sorenson Ventures and Emergent Ventures.

Acceldata’s focus on data infrastructure scaling and performance issues is a differentiator among data observability specialists, many of which are focused specifically on automating data quality and lineage monitoring. Acceldata’s software also manages the cost and performance of data projects, and monitors the quality and reliability of the data itself.

Data observability as a product segment is still nascent but is attracting standalone data observability software specialists as well as the inclusion of data observability functionality in wider data-operations platforms. I assert that, through 2025, data observability will continue to be a priority for the evolution of data operations products as vendors deliver more automated approaches to data engineering and to improving trust in enterprise data.

The general availability of Acceldata Data Observability Cloud was announced in August 2022 and provides a platform for monitoring data compute infrastructure, reliability, pipelines and users. It offers alerts, audits and reports for data platforms, including Databricks, Hadoop, Kafka and Snowflake. It provides monitoring for anomalies, such as missing, late or erroneous data as well as information related to cost controls and predictions.
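
To make the idea of monitoring for missing, late or erroneous data more concrete, here is a minimal Python sketch. It is not Acceldata's implementation or API. The `Batch` shape, the `classify` function, the 15-minute grace period and the 1% invalid-row limit are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Batch:
    """One expected delivery of data from a pipeline (hypothetical shape)."""
    name: str
    due_at: datetime                 # when the batch was expected
    arrived_at: Optional[datetime]   # None if it has not arrived
    row_count: int = 0
    invalid_rows: int = 0            # rows that failed validation upstream


def classify(batch: Batch, now: datetime,
             grace: timedelta = timedelta(minutes=15),
             max_invalid_ratio: float = 0.01) -> str:
    """Return 'pending', 'missing', 'late', 'erroneous' or 'ok' for a batch.

    The thresholds are illustrative assumptions, not vendor defaults.
    """
    deadline = batch.due_at + grace
    if batch.arrived_at is None:
        # Nothing has arrived: it only counts as missing after the deadline.
        return "missing" if now > deadline else "pending"
    if batch.arrived_at > deadline:
        return "late"
    if batch.row_count == 0:
        # An on-time but empty batch is treated as erroneous.
        return "erroneous"
    if batch.invalid_rows / batch.row_count > max_invalid_ratio:
        return "erroneous"
    return "ok"
```

In practice a check like this would run on a schedule against pipeline metadata, and any result other than "ok" or "pending" would raise an alert.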

While Data Observability Cloud is available as a cloud managed service, Acceldata recently announced a new open source version of its data platform, as well as six tools and utilities used to support its data observability functionality, enabling organizations to begin building data products without upfront licensing or subscription costs. The company also offers two self-managed software services: Pulse and Torch. Pulse provides standalone data infrastructure monitoring for Hadoop environments, delivering utilization, scheduling and capacity planning capabilities as well as performance optimization recommendations for Apache Spark jobs and Apache Hive queries. Torch is a standalone product for data profiling, automated data quality management and data pipeline monitoring that provides data reliability observability, including data reconciliation, data drift detection and anomaly detection. Torch delivers automated alerts based on user-defined data quality policies as well as artificial intelligence-based recommendations to remediate data quality issues. Providing both data infrastructure and data reliability observability enables Acceldata to correlate data for improved root-cause analysis and also expands the applicability of Acceldata beyond data engineers and IT professionals to include system resource engineers and data architects.
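
As a rough illustration of what data drift detection can involve, the Python sketch below compares the mean of a new batch of values with a historical baseline and flags a shift of more than a chosen number of baseline standard deviations. This is a generic statistical check, not a description of how Torch works. The function names and the threshold of 3.0 are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def drift_score(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Shift of the current mean, measured in baseline standard deviations."""
    if len(baseline) < 2 or not current:
        raise ValueError("need at least two baseline values and one current value")
    spread = stdev(baseline)
    shift = abs(mean(current) - mean(baseline))
    if spread == 0:
        # A constant baseline: any change at all is an unbounded shift.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def has_drifted(baseline: Sequence[float], current: Sequence[float],
                threshold: float = 3.0) -> bool:
    """True when the shift exceeds the (assumed) alerting threshold."""
    return drift_score(baseline, current) > threshold
```

A user-defined policy could then be as simple as a column name paired with a threshold, with an alert raised whenever `has_drifted` returns true for that column.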

As organizations strive to become more data-driven, the ability to rely on data used to make business decisions is more important than ever. Building trust in data requires the orchestration of data pipelines to automate and accelerate the flow of data from multiple sources to support analytics initiatives and drive business value. Data observability ensures that data used for analytics and governance projects is fit for its purpose. The monitoring of data quality and data lineage is already well-established as a data management discipline to ensure that data used for business decision-making is reliable.

Traditionally, data quality software has provided users an environment to manually check and correct data quality issues. This can be time-consuming, delaying time to insight. Almost two-thirds (64%) of participants in our Analytics and Data Benchmark Research cited reviewing data for quality issues as being one of the most time-consuming aspects of analytics initiatives, second only to preparing data for analysis. In contrast, data observability takes advantage of machine learning and DataOps to automate the monitoring of data used for analytics projects to ensure that it is complete, valid and consistent as well as relevant and free from duplication.
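
The kinds of checks that can be automated are straightforward to sketch. The Python example below counts records that are incomplete, invalid or duplicated. It is a simplified illustration under stated assumptions: the `quality_report` function, the field names and the validation rules are hypothetical, and consistency and relevance checks are left out.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


def _is_missing(value: object) -> bool:
    """Treat None and the empty string as missing values."""
    return value is None or value == ""


def quality_report(records: List[Record],
                   required: Iterable[str],
                   validators: Dict[str, Callable[[object], bool]],
                   key: Tuple[str, ...]) -> Dict[str, int]:
    """Count incomplete, invalid and duplicate records."""
    required = list(required)
    incomplete = invalid = duplicates = 0
    seen = set()
    for rec in records:
        # Completeness: every required field must be present and non-empty.
        if any(_is_missing(rec.get(field)) for field in required):
            incomplete += 1
        # Validity: present values must pass their field's rule.
        if any(not _is_missing(rec.get(field)) and not check(rec[field])
               for field, check in validators.items()):
            invalid += 1
        # Duplication: the key fields should identify a record uniquely.
        identity = tuple(rec.get(field) for field in key)
        if identity in seen:
            duplicates += 1
        else:
            seen.add(identity)
    return {"rows": len(records), "incomplete": incomplete,
            "invalid": invalid, "duplicates": duplicates}
```

Tracking these counts for each batch over time, rather than reviewing records by hand, is the shift from manual data quality work to automated monitoring described above.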

Adoption of data observability software remains nascent, but it has been a hot topic during 2022 due to the importance of ensuring the quality of data used in analytics projects as well as the reliability of data infrastructure and data pipelines. These factors are likely to become increasingly important to businesses as data volumes continue to grow and organizations become increasingly reliant on data-driven decision-making. The launch of Acceldata’s Open Source Data Platform also has the potential to fuel adoption of data observability in general, and the Data Observability Cloud managed service in particular. I recommend that organizations exploring approaches to improve the reliability of data infrastructure and trust in data evaluate Acceldata.

Regards,

Matt Aslett