I have previously written about the importance of data democratization as a key element of a data-driven agenda. Removing barriers that prevent or delay users from gaining access to data enables it to be treated as a product that is generated and consumed, either internally by employees or externally by partners and customers. This is particularly important for organizations adopting the data mesh approach to data ownership, access and governance. Data mesh is an organizational and cultural approach to data, rather than a technology platform. Nevertheless, multiple vendors are increasingly focused on providing products that facilitate adoption of data mesh and promote data democratization. Amazon Web Services is one such vendor, thanks to the launch of Amazon DataZone, one of the headline analytics and data announcements made during the company’s recent re:Invent customer event.
Few trends have had a bigger impact on the computing landscape in recent decades than the emergence of cloud computing. Amazon Web Services pioneered the concept. It was initially created to support the e-commerce requirements of Amazon.com but began making its services available to other organizations in 2006. It now offers more than 200 cloud services to millions of customers from data centers spanning 99 availability zones in 31 regions around the globe. Analytics and data form an integral part of the AWS cloud services portfolio, with the company providing multiple services addressing data platform, data management, analytics and machine learning (ML) requirements. AWS has always emphasized the benefits of customer choice: It has 11 different databases in its cloud databases portfolio as well as more than 20 data processing, data management, analytics and ML services in its analytics portfolio. Integrating and governing data dispersed across multiple services, availability zones and regions can be a challenge, however. During the company’s re:Invent technology event in late 2022, there was an increased emphasis on unifying capabilities to reduce operational complexity, such as easier integration between Amazon Aurora and Amazon Redshift, improved integration with Apache Spark for Amazon Athena and Amazon Redshift, and the launch of Amazon DataZone.
The Amazon DataZone data catalog is designed to enable the management of, and self-service access to, data throughout an organization. That includes not only AWS’ cloud services but also other cloud and
Amazon DataZone is well-aligned with the four key principles of the data mesh concept: domain-oriented ownership, data as a product, self-serve data infrastructure and federated governance. Amazon DataZone
The large number of data and analytics services in the AWS portfolio gives customers a broad range of choices but can also result in a complex web of services that is difficult to manage holistically. The introduction of Amazon DataZone is important because it provides a unifying environment for data access and management. It should help AWS customers govern and manage data across the business and improve access to data through data mesh initiatives. I recommend that all AWS customers using multiple data services evaluate the potential benefits of Amazon DataZone alongside other data catalog products and services.
Regards,
Matt Aslett