Metadata management has played a role in data governance and analytics for many years. It wasn’t until the emergence of the data catalog as a product category just over a decade ago that enterprises had a platform for metadata-driven data management that could span multiple departments and use cases across an entire enterprise.
ISG defines a data catalog as an inventory of data assets that surfaces metadata from data platforms, tools and applications that can be used to understand how and when data is produced and consumed. In theory, a single data catalog can provide an inventory of data that spans multiple departments, use cases and initiatives, and ISG’s Data Governance Benchmark Research
Data catalog functionality has been incorporated into numerous data management, data governance and data platform products to the extent that enterprises have multiple catalogs of data across numerous domains and repositories, perhaps with a “catalog of catalogs” providing higher-level insight. There is a danger that enterprises with multiple catalogs are repeating the past mistake of creating data silos. However, it is important to recognize that the term data catalog applies to various products that serve discrete use cases and user roles. As such, it may be desirable for an enterprise to have multiple connected catalogs, or at least numerous interfaces to a catalog, that serve the requirements of different users.
In our view, there are four main types of data catalogs, including:
Additionally, we sometimes see the term data catalog used in relation to a variety of other products, including analytics catalogs used to provide a portal for business analysts to discover and access business intelligence reports and dashboards; metrics stores used to provide a consistent layer of metrics for consumption by business intelligence reports and dashboards; feature stores used to store curated features for developing machine learning applications; and model catalogs or model gardens that store curated models for the development of generative AI applications. These capabilities could be addressed by standalone products but could also be incorporated into data intelligence catalogs.
ISG’s 2024 Buyers Guide for Data Intelligence illustrates the relative maturity of the functionality available across the various data catalog products. Almost all (96%) of the software providers evaluated were graded at A- or above for data source connectivity, creating and maintaining profiles of data assets and creating a search-based inventory of data assets. While this core data catalog functionality has become pervasive, a second category of capabilities has also become widely adopted by data catalog providers. For example, more than three-quarters (78%) of the software providers evaluated were graded at A- or above for both viewing and managing technical data lineage and creating, modifying and deleting a business glossary, while 77% achieved A- or above for automated generation of metadata from data sources and 74% for self-service data discovery by data consumers.
I assert that through 2027, enterprises will increase strategic focus on data catalogs as the intersection of data production and consumption, enabling the self-service creation and sharing of data products based on trusted and governed data sources.
Additionally, the use of artificial intelligence for data cataloging continues to provide a significant opportunity for differentiation. Fewer than one-third (30%) of the software providers evaluated in ISG’s Buyers Guide for Data Intelligence were graded at A- or above for the use of AI for data asset descriptions. Only 26% achieved that grade for the use of AI for metadata generation and only 17% for the use of AI for data usage recommendations. I recommend that all enterprises evaluate their current use of data catalogs against the requirements of their various users and assess potential providers with a view to ensuring governed data sharing across the organization and a holistic, business-level view of data production and consumption.
Regards,
Matt Aslett