Enterprises looking to adopt cloud-based data processing and analytics face a disorienting array of data storage, data processing, data management and analytics offerings. Departmental autonomy, shadow IT, mergers and acquisitions, and strategic choices mean that most enterprises now need to manage data across multiple locations, while each of the major cloud providers and data and analytics vendors has a portfolio of offerings that may or may not be available in any given location. As such, the ability to manage and process data across multiple clouds and data centers is a growing concern for large and small enterprises alike. Almost one-half (49%) of respondents to Ventana Research’s Analytics and Data Benchmark Research study are using cloud computing for analytics and data, and 42% of those are currently using more than one cloud provider.
Google Cloud has a broad portfolio of data and analytics offerings that spans data storage and processing, data management and data governance, as well as analytics and machine learning (ML). Recently, the company has sought to provide a more coherent message by describing this portfolio as a unified data cloud. Google has also notably strengthened its multi-cloud and hybrid architecture credentials. At its recent Next ’21 customer event, the company announced the general availability of several new offerings that lend credence to Google Cloud’s ability to both deliver a single unified data cloud and address hybrid and multi-cloud requirements. As my colleague asserts, by 2024, over two-thirds of organizations will operate across multiple public cloud computing environments, requiring a unified data platform to virtualize access for business continuity.
While BigQuery can be thought of as the primary engine of a unified data cloud, another important announcement at Next ’21 was the general availability of Google Cloud Dataplex, which is designed to deliver a unified data fabric that supports integrated data discovery, security, cataloging and analysis across data lakes, data warehouses and data marts. Google Cloud Dataplex provides templates for ingesting data into Google Cloud Storage and BigQuery using multiple approaches, such as Dataflow (for stream and batch processing), Data Fusion (ETL/ELT), Dataproc (big data) and Pub/Sub (event processing).
Dataplex automatically harvests the related metadata into a metastore that is used for search and discovery, and the metadata can also be published to BigQuery, Dataproc, and Data Catalog. Dataplex also provides integration with Google Cloud Vertex AI for training and deploying ML models, and with Looker for business intelligence (BI). Next ’21 also saw a variety of new analytics announcements, including the preview of the Vertex AI Workbench, a serverless notebook that offers native integration with BigQuery, Dataproc, and Apache Spark. Google also announced the preview release of a new serverless Spark service that can be used via BigQuery, Dataproc, Dataplex, and Vertex AI.
While these announcements demonstrated the interwoven nature of Google’s data cloud portfolio, the preview release of Google Distributed Cloud was also significant given the need for enterprises to process and manage data across multiple data centers and cloud providers. Google Distributed Cloud provides a portfolio of managed hardware and software, based on Anthos, for deployment in customer data centers or at edge locations. From a data perspective, Google Distributed Cloud will be particularly relevant to data sovereignty, data security and data privacy requirements, which are among the primary reasons for enterprises to retain data workloads on-premises and/or in specific regional data centers.
Google Cloud’s embrace of hybrid IT and multi-cloud in recent years is motivated, in part, by the need to differentiate itself from the company’s major cloud rivals. However, it is also a response to customer demand, given that a variety of workloads are not suitable for migration to, or across, clouds, often because of data security, data privacy, data sovereignty and latency requirements. Google Distributed Cloud is only in preview at this stage, and potential adopters will be looking for greater detail from the company about which Google Cloud data services it will support. Nevertheless, I recommend that enterprises seeking a unified approach to data management that spans multiple data centers and cloud providers include Google Cloud’s capabilities in their evaluation.
Regards,
Matt Aslett