ISG Software Research Analyst Perspectives

Cloudera Enables Hybrid Cloud Data and AI

Written by Matt Aslett | Jan 15, 2025 11:00:00 AM

Too often, enterprises find that data is distributed across multiple silos on-premises and in the cloud. More than two-thirds of participants in ISG’s Market Lens Cloud Study are using a hybrid architecture involving both on-premises and cloud infrastructure for analytics and artificial intelligence deployments. Unifying data to achieve operational and analytic objectives requires complex data integration and management processes. Fulfilling these processes requires a smorgasbord of tools aimed at professionals in a variety of roles with diverse skill sets, further increasing the cost and complexity of analytics and AI initiatives.

I previously explained how Cloudera was positioning itself and its Cloudera Data Platform (CDP) as an enabler of versatile enterprise data strategies, thanks to its ability to support a variety of workloads, deployment locations and architectural approaches. The provider has recently accelerated that strategy through a combination of acquisitions and product development.

Cloudera was founded in 2008 to build a business around the Apache Hadoop data-processing framework. It enjoyed a rapid rise thanks to high levels of interest in the Hadoop project and big data, establishing itself as a primary data platform provider for Fortune 500 companies in industries such as financial services, retail, healthcare, telecommunications, manufacturing and energy/utilities along with government. Cloudera made its debut on the New York Stock Exchange in 2017 before merging with fellow Hadoop provider Hortonworks in 2019 amid market consolidation and a shift toward object storage as the persistence layer for data processing both on-premises and in the cloud.

Cloudera was acquired by investment firms Clayton, Dubilier & Rice and KKR for $5.3 billion in June 2021, enabling the software provider to avoid the glare of public markets as it transitioned its customer base away from established products to the combined Cloudera Data Platform on public and private cloud. The provider cites more than 25 exabytes of data under management using CDP, and points to its ability to support data, analytics and AI workloads across a hybrid architecture as a key differentiator. CDP can serve as an operational and analytic data platform, with functionality to address data engineering, streaming data and analytics as well as machine learning and AI, including generative AI.

CDP is available for deployment on both private and public cloud infrastructure. Cloudera Private Cloud can be deployed on virtual private cloud infrastructure and provides services to address data engineering, data warehouse and AI workloads. CDP Public Cloud is available on AWS, Microsoft Azure and Google Cloud Platform and, in addition to data engineering, data warehouse and AI workloads, also provides services to address data movement, stream processing, data hub and operational database workloads. Although CDP is typically deployed as a data lakehouse, Cloudera’s security, governance and management capabilities also provide support to form a key element of a data fabric or data mesh approach that enables unification of data from across a distributed data estate.

One of the key technologies that Cloudera uses to enable that strategy is the Apache Iceberg table format. Cloudera first adopted Iceberg as a table format in CDP in 2022 and now sees it as the unifying layer for analyzing data using multiple engines. Integration with the Iceberg REST Catalog was unveiled in August 2024 to enable interoperability with external data platforms that also implement Iceberg. In October 2024, Cloudera announced a partnership with Snowflake that enables Snowflake customers to use the Apache Iceberg REST Catalog to gain access to Cloudera’s Data Lakehouse. That same month, Cloudera also introduced the technical preview of its Cloudera Lakehouse Optimizer to automate Iceberg table maintenance.

Cloudera supports widespread adoption of Apache Iceberg to commoditize data persistence and create opportunities to deliver differentiated value through metadata that helps enterprises better understand how data is used across the organization. This perspective matches my assertion that through 2027, three-quarters of enterprises will be engaged in data intelligence initiatives to increase trust in their data by leveraging metadata to understand how, when and where data is used in the organization and by whom. Cloudera has long provided data security, governance and metadata management capabilities through the Shared Data Experience (SDX) layer that underpins CDP. In August 2024, the provider also announced its intention to make the SDX capabilities available as a standalone product for the first time. Cloudera plans to deliver a cloud-native, containerized product that will enable users to manage and govern data across the enterprise estate in both CDP and external data platforms. Cloudera’s metadata management capabilities have also been boosted by the November 2024 acquisition of Octopai and its data discovery, data lineage and data catalog capabilities.

The acquisition of Octopai was Cloudera’s second acquisition of the year, following its purchase of operational AI specialist Verta in June 2024. The acquisition of Verta added model catalog, model development, model monitoring and AI governance capabilities to Cloudera’s portfolio. The capabilities are available as part of the Cloudera AI offering (formerly known as Cloudera Machine Learning), which was also recently boosted by the launch of Cloudera AI Inference, based on NVIDIA NIM microservices, to accelerate the development of applications, agents and assistants based on GenAI models. The Cloudera AI portfolio also includes Accelerators for ML Projects (AMPs), which are sample ML projects designed to accelerate AI development. In September 2024, Cloudera added new AMPs for fine-tuning, prompting and grounding AI models, as well as chat-based document query. Additionally, Cloudera unveiled the Cloudera Copilot for Cloudera AI in November 2024 to facilitate AI development. Cloudera Copilot for Cloudera AI complements the previously released SQL AI Assistant and AI Chatbot within Cloudera Data Visualization.

Cloudera remains best known as a data platforms provider and was designated as Innovative in ISG’s 2024 Data Platforms Buyers Guide. The software provider was designated as a Provider of Merit in ISG’s 2024 AI Platforms Buyers Guide and 2024 MLOps Buyers Guide. Cloudera has made great strides to improve its capabilities in relation to AI in recent months, and although its plans are nascent, the acquisition of Octopai combined with the SDX capabilities should see it stake a claim to be a provider of data intelligence. I recommend that enterprises evaluating options for strategic data providers to facilitate a versatile and hybrid cloud approach to data management include Cloudera in their evaluations.

Regards,

Matt Aslett