Too often, enterprises find that data is distributed across multiple silos on-premises and in the cloud. More than two-thirds of participants in ISG’s Market Lens Cloud Study are using a hybrid architecture involving both on-premises and cloud infrastructure for analytics and artificial intelligence deployments. Unifying data to achieve operational and analytic objectives requires complex data integration and management processes. Fulfilling these processes requires a smorgasbord of tools aimed at professionals in a variety of roles with diverse skill sets, further increasing the cost and complexity of analytics and AI initiatives.
I previously explained how Cloudera was positioning itself and its Cloudera Data Platform as an enabler of versatile enterprise data strategies, thanks to its ability to support a variety of workloads, deployment locations and architectural approaches.
Cloudera was founded in 2008 to build a business around the Apache Hadoop data-processing framework. It enjoyed a rapid rise thanks to high levels of interest in the Hadoop project and big data, establishing itself as a primary data platform provider for Fortune 500 companies in industries such as financial services, retail, healthcare, telecommunications, manufacturing and energy/utilities along with government. Cloudera made its debut on the New York Stock Exchange in 2017 before merging with fellow Hadoop provider Hortonworks in 2019 amid market consolidation and a shift toward object storage as the persistence layer for data processing both on-premises and in the cloud.
Cloudera was acquired by investment firms Clayton, Dubilier & Rice and KKR for $5.3 billion in June 2021, enabling the software provider to avoid the glare of public markets as it transitioned its customer base away from established products to the combined Cloudera Data Platform on public and private cloud. The provider cites more than 25 exabytes of data under management using CDP and its ability to support data, analytics and AI workloads across a hybrid architecture as a key differentiator. CDP can serve as an operational and analytic data platform, with functionality to address data engineering, streaming data and analytics as well as machine learning and AI, including generative AI.
CDP is available for deployment on both private and public cloud infrastructure. Cloudera Private Cloud can be deployed on virtual private cloud infrastructure and provides services to address data engineering, data warehouse and AI workloads. CDP Public Cloud is available on AWS, Microsoft Azure and Google Cloud Platform and, in addition to data engineering, data warehouse and AI workloads, also provides services to address data movement, stream processing, data hub and operational database workloads. Although CDP is typically deployed as a data lakehouse, Cloudera’s security, governance and management capabilities also allow it to serve as a key element of a data fabric or data mesh approach that unifies data from across a distributed data estate.
One of the key technologies that Cloudera uses to enable that strategy is the Apache Iceberg table format. Cloudera first adopted Iceberg as a table format in CDP back in 2022 and now sees it as the unifying layer for analyzing data using multiple engines. Integration with the Iceberg REST Catalog was unveiled in August to enable interoperability with external data platforms that also implement Iceberg. In October 2024, Cloudera announced a partnership with Snowflake that enables Snowflake customers to use Apache Iceberg REST Catalog to gain access to Cloudera’s Data Lakehouse. That same month, Cloudera also introduced the technical preview of its Cloudera Lakehouse Optimizer to automate Iceberg table maintenance.
Cloudera supports widespread adoption of Apache Iceberg because it commoditizes data persistence. That, in turn, creates opportunities to deliver differentiated value through metadata that helps enterprises better understand how data is used across the organization.
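To illustrate why a shared REST catalog helps with interoperability, here is a minimal sketch of how any engine could address the same Iceberg table through a catalog endpoint. It assumes the path layout of the open Iceberg REST catalog specification as I understand it (a `v1` root, an optional prefix, and multi-level namespaces joined with the unit separator character). The host name, namespace and table name are hypothetical and are not taken from Cloudera or Snowflake documentation, and the helper `table_endpoint` is my own illustration rather than part of any product API.

```python
from typing import Optional, Sequence
from urllib.parse import quote

# Assumption: the Iceberg REST spec joins multi-level namespace parts
# with the unit separator (0x1F) before URL-encoding them.
NAMESPACE_SEPARATOR = "\x1f"


def table_endpoint(
    base_url: str,
    namespace: Sequence[str],
    table: str,
    prefix: Optional[str] = None,
) -> str:
    """Build the URL a client would use to load a table's metadata.

    Hypothetical helper for illustration only; real clients normally
    rely on a library rather than assembling paths by hand.
    """
    parts = [base_url.rstrip("/"), "v1"]
    if prefix:
        # Some catalogs return a prefix from their config endpoint.
        parts.append(quote(prefix, safe=""))
    encoded_namespace = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    parts += ["namespaces", encoded_namespace, "tables", quote(table, safe="")]
    return "/".join(parts)


# Hypothetical catalog host and table, used only to show the shape of the URL.
url = table_endpoint("https://catalog.example.com/", ["sales", "raw"], "orders")
print(url)
```

Because every engine resolves the table through the same catalog endpoint, each one reads the same table metadata rather than keeping its own copy, which is the practical basis for the multi-engine access described above.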
The acquisition of Octopai was Cloudera’s second acquisition of the year, following its purchase of operational AI specialist Verta in June 2024. The acquisition of Verta added model catalog, model development, model monitoring and AI governance capabilities to Cloudera’s portfolio. The capabilities are available as part of the Cloudera AI offering (formerly known as Cloudera Machine Learning), which was also recently boosted by the launch of Cloudera AI Inference, based on NVIDIA NIM microservices, to accelerate the development of applications, agents and assistants based on GenAI models. The Cloudera AI portfolio also includes Accelerators for ML Projects (AMPs), which are sample ML projects designed to accelerate AI development. In September 2024, Cloudera added new AMPs for fine-tuning, prompting and grounding AI models, as well as chat-based document query. Additionally, Cloudera unveiled the Cloudera Copilot for Cloudera AI in November 2024 to facilitate AI development. Cloudera Copilot for Cloudera AI complements the previously released SQL AI Assistant and AI Chatbot within Cloudera Data Visualization.
Cloudera remains best known as a data platforms provider and was designated as Innovative in ISG’s 2024 Data Platforms Buyers Guide. The software provider was designated as a Provider of Merit in ISG’s 2024 AI Platforms Buyers Guide and 2024 MLOps Buyers Guide. Cloudera has made great strides to improve its capabilities in relation to AI in recent months, and although its plans are nascent, the acquisition of Octopai combined with its Shared Data Experience (SDX) capabilities should see it stake a claim to be a provider of data intelligence. I recommend that enterprises evaluating options for strategic data providers to facilitate a versatile and hybrid cloud approach to data management include Cloudera in their evaluations.
Regards,
Matt Aslett