Earlier this year, I wrote about the increasing importance of data observability, an emerging product category that takes advantage of machine learning (ML) and Data Operations (DataOps) to automate the monitoring of data used for analytics projects to ensure its quality and lineage. Monitoring the quality and lineage of data is nothing new. Manual tools exist to ensure that data is complete, valid and consistent, as well as relevant and free from duplication. Data observability vendors, including Monte Carlo Data, have emerged in recent years with the goal of increasing the productivity of data teams and improving organizations’ trust in data through automation and artificial intelligence and machine learning (AI/ML).
Monte Carlo was founded in 2019 by CEO Barr Moses and Chief Technology Officer Lior Gavish, who were previously VP of customer operations at Gainsight and SVP of engineering at Barracuda, respectively. In those
Data observability may be a new term, but the benefits of automating data quality mean that it is unlikely to be a passing fad. I assert that through 2025, data observability will continue to be a priority for the evolution of DataOps
Monte Carlo’s Data Observability Platform is delivered as a cloud-managed service and is targeted at data engineers, providing them with the visibility they need to detect, resolve and prevent data-quality and data-lineage issues. More broadly, it is designed to give organizations a higher level of trust in data-driven decision-making: data owners gain greater visibility into how their data is used across the organization, and data users gain confidence in the integrity of the data used to make decisions. At the heart of the platform is ML-powered monitoring, anomaly detection and notification, which automatically assesses fields and tables based on known issues or business rules to detect and alert on data freshness, volume and schema changes. Incident resolution is addressed by automated field-level lineage, root cause analysis and workflow tools, which can also be used to proactively make changes to data assets to prevent data-quality issues. Additionally, identifying fields, tables and queries that are unused or used inefficiently can help to proactively manage compute and storage costs. Monte Carlo uses a Data Collector deployed in a customer’s Amazon Web Services (AWS) environment to extract metadata, logs and statistics from analytic data platforms and business intelligence (BI) tools. It also integrates with data orchestration tools, and notifications can be sent to productivity tools and notification systems.
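To make the three monitored signals concrete, the freshness, volume and schema checks described above can be illustrated with a minimal Python sketch. This is not Monte Carlo's implementation or API: the function names, thresholds and sample values are all hypothetical, and a simple z-score rule stands in for the ML-based anomaly detection that a real data observability platform would apply.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_stale(last_updated, now, max_age):
    """Freshness check: True if the table was not updated within max_age."""
    return now - last_updated > max_age


def volume_is_anomalous(history, latest, threshold=3.0):
    """Volume check: flag the latest row count if it lies more than
    `threshold` sample standard deviations from the historical mean.
    (Assumption: a z-score rule as a stand-in for ML-based detection.)"""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # any change from a constant history is flagged
    return abs(latest - mu) / sigma > threshold


def schema_changes(previous, current):
    """Schema check: compare two {column_name: type_name} snapshots."""
    shared = set(previous) & set(current)
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "retyped": sorted(c for c in shared if previous[c] != current[c]),
    }


if __name__ == "__main__":
    # Hypothetical metadata of the kind a collector might gather.
    now = datetime(2022, 1, 2, 12, 0)
    print(is_stale(datetime(2022, 1, 1, 0, 0), now, timedelta(hours=24)))
    print(volume_is_anomalous([1000, 1020, 980, 1010, 990], 400))
    print(schema_changes(
        {"id": "int", "amount": "float"},
        {"id": "int", "amount": "str", "region": "str"},
    ))
```

Each check relies only on metadata (update timestamps, row counts and column definitions) rather than the data itself, which is consistent with the metadata-, log- and statistics-based collection described above.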
Data observability is a new approach to an established problem, but it is by no means a matter of slapping a new label on existing data-quality products. Automation and intelligence are critical to data observability platforms, both to cope with the expanding volume of data to be monitored and to improve efficiency compared with manual techniques. These factors are likely to become increasingly important as data volumes continue to grow and businesses become increasingly reliant on DataOps and the orchestration of data pipelines to support data-driven decision-making. I recommend that organizations exploring approaches to improving trust in data evaluate the emerging group of data observability providers, including Monte Carlo.
Regards,
Matt Aslett