Data Catalogs Serve Multiple Roles and Use Cases

Written by Matt Aslett | Jan 29, 2025 11:00:00 AM

Metadata management has played a role in data governance and analytics for many years. It wasn’t until the emergence of the data catalog as a product category just over a decade ago that enterprises had a platform for metadata-driven data management that could span multiple departments and use cases across an entire enterprise.

ISG defines a data catalog as an inventory of data assets that surfaces metadata from data platforms, tools and applications that can be used to understand how and when data is produced and consumed. In theory, a single data catalog can provide an inventory of data that spans multiple departments, use cases and initiatives, and ISG’s Data Governance Benchmark Research shows a correlation between a higher proportion of data catalog users and improved confidence in the organization’s ability to govern and manage data across the business.

Data catalog functionality has been incorporated into numerous data management, data governance and data platform products to the extent that enterprises have multiple catalogs of data across numerous domains and repositories, perhaps with a “catalog of catalogs” providing higher-level insight. There is a danger that enterprises with multiple catalogs are repeating past mistakes of creating data silos. However, it is important to recognize that the term data catalog applies to various products that serve discrete use cases and user roles. As such, it may be desirable for an enterprise to have multiple connected catalogs, or at least numerous interfaces to a catalog, that serve the requirements of different users.

In our view, there are four main types of data catalogs:

Technical data catalogs, representing the original and fundamental data catalog functionality of a metadata repository that scans the enterprise’s data estate and extracts technical metadata to provide an inventory of the data’s location, structure and schema. This inventory can be used by data administrators and data engineers to discover, manage and optimize the data while also providing insights on data usage, data lineage and data quality, as well as security and access control. While there are standalone technical data catalog products, this technology also forms the base layer of functionality used by other types of data catalogs and represents the core capabilities embedded in other data management products.
Business data catalogs, providing functionality that relies upon and builds on the underlying technical metadata and access control capabilities with an additional layer that provides business metadata related to the context, meaning and relevance of the data to business domains and applications. This layer of capability enables self-service discovery and access to data by business users and data analysts using natural language search. In addition to technical metadata, business data catalogs also provide business metadata, such as glossaries, descriptions and classification, as well as information related to data lineage and data quality.
Data governance catalogs, providing an interface for data stewards, data quality and data governance professionals focused on ensuring the enterprise fulfills its data governance and regulatory requirements. In addition to the data usage, data lineage, data quality, data security and access control capabilities of the underlying technical catalog, these users require additional functionality to define and manage data usage policies, view and manage data profiles, determine and administer data quality rules and define and administer data models and master data definitions. While there are standalone data governance catalog products, this functionality is also delivered by providers of technical and business data catalogs with dedicated interfaces for data stewards, data quality and data governance professionals.
Data intelligence catalogs, representing the evolution of business data catalogs and combining technical metadata, business metadata and data governance capabilities with knowledge graph functionality to deliver a holistic, business-level view of data production and consumption. Data intelligence catalogs help data administrators understand the use of data in reports and dashboards and provide chief analytics and chief data officers with key metrics on data production and consumption, including the value generated by data initiatives. Data intelligence catalogs also increasingly provide functionality to enable the development, sharing and management of data products.
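
To make the layering concrete, the following is a minimal sketch in Python of how a business layer can sit on top of a technical catalog entry. It is an illustration only and does not reflect any particular product: the class names (TechnicalEntry, BusinessEntry), the search function, the sample dataset and the warehouse:// location string are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalEntry:
    """Technical metadata: where the data lives and how it is structured."""
    name: str
    location: str
    schema: dict[str, str]  # column name -> data type


@dataclass
class BusinessEntry:
    """Business layer: context and meaning added on top of a technical entry."""
    technical: TechnicalEntry
    description: str = ""
    glossary_terms: list[str] = field(default_factory=list)
    classification: str = "unclassified"


def search(catalog: list[BusinessEntry], query: str) -> list[str]:
    """Return names of entries whose business metadata mentions the query."""
    needle = query.lower()
    hits = []
    for entry in catalog:
        # Search only the business-facing text, not the technical schema.
        text = " ".join([entry.description, *entry.glossary_terms]).lower()
        if needle in text:
            hits.append(entry.technical.name)
    return hits


# Hypothetical sample data for illustration.
orders = TechnicalEntry(
    name="sales.orders",
    location="warehouse://analytics/sales/orders",
    schema={"order_id": "bigint", "customer_id": "bigint", "total": "decimal"},
)
catalog = [
    BusinessEntry(
        technical=orders,
        description="One row per confirmed customer order.",
        glossary_terms=["Order", "Revenue"],
        classification="internal",
    )
]

print(search(catalog, "revenue"))
```

In this sketch the business entry wraps the technical entry rather than copying it, which mirrors the point that business data catalogs rely upon and build on the underlying technical metadata. Governance policies or knowledge graph relationships could be added as further layers in the same way.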

Additionally, we sometimes see the term data catalog used in relation to a variety of other products, including analytics catalogs used to provide a portal for business analysts to discover and access business intelligence reports and dashboards; metrics stores used to provide a consistent layer of metrics for consumption by business intelligence reports and dashboards; feature stores used to store curated features for developing machine learning applications; and model catalogs or model gardens that store curated models for the development of Generative AI applications. These capabilities could be addressed by standalone products but could also be incorporated into data intelligence catalogs.

ISG’s 2024 Buyers Guide for Data Intelligence illustrates the relative maturity of the functionality available across the various data catalog products. Almost all (96%) software providers evaluated were graded at A- or above for data source connectivity, creating and maintaining profiles of data assets and creating a search-based inventory of data assets. While this core data catalog functionality has become pervasive, a second category of capabilities has also become widely adopted by data catalog providers. For example, more than three-quarters (78%) of software providers evaluated were graded at A- or above for both viewing and managing technical data lineage and creating, modifying and deleting a business glossary, while 77% achieved A- or above for automated generation of metadata from data sources and 74% for self-service data discovery by data consumers.

I assert that through 2027, enterprises will increase strategic focus on data catalogs as the intersection of data production and consumption, enabling the self-service creation and sharing of data products based on trusted and governed data sources. As a result, capabilities that enable the creation of data products and a more holistic view of data production and consumption will be in high demand. ISG’s Buyers Guide for Data Intelligence also highlighted that these capabilities provide significant opportunities for differentiation as data catalogs evolve towards providing data intelligence. For example, only 57% of software providers evaluated were graded at A- or above for applying product thinking to the creation of data products, 52% for metrics and scorecards related to data usage and 44% for the incorporation of knowledge graph capabilities to map the relationships between data and entities.

Additionally, the use of artificial intelligence for data cataloging continues to provide a significant opportunity for differentiation. Fewer than one-third (30%) of the software providers evaluated in ISG’s Buyers Guide for Data Intelligence were graded at A- or above for the use of AI for data asset descriptions. Only 26% achieved that grade for the use of AI for metadata generation and only 17% for the use of AI for data usage recommendations. I recommend that all enterprises evaluate the current use of data catalogs based on the various user requirements and assess potential providers with a view to ensuring governed data sharing across the organization and a holistic, business-level view of data production and consumption.

Regards,

Matt Aslett
