I previously wrote about the challenge distributed SQL database providers face in avoiding being pigeonholed as suitable only for a niche set of requirements. Performance, reliability, security and scalability give new vendors a focal point for differentiating themselves from established providers and getting a foot in the door with customer accounts. Expanding and retaining those accounts is not necessarily easy, however, especially as general-purpose data platform providers evolve their products in response to competitive threats. The same issue affects vendors in the analytic data platform market. While Snowflake has been hugely successful in helping to drive adoption of cloud-based analytic databases, the established providers are evolving to respond. To maintain and expand its importance to customers, Snowflake is itself evolving, aiming to provide not just a cloud-based data warehouse but a cloud-based platform for a wider analytic ecosystem.
Snowflake was founded in 2012 to build a business around its cloud-based data warehouse with in-built data-sharing capabilities. Now described as a data cloud, Snowflake has
Snowflake is not resting on its laurels, and at its recent Snowflake Summit customer event, it announced a slew of new product features, functions and packaging designed to cement its importance to existing customers and drive expansion into new accounts. One of the core tenets of the company’s positioning is that it provides an elastic multi-cluster cloud compute platform that runs on optimized storage to support multiple data and analytics workloads. Increasingly, these workloads include not only those running on Snowflake’s native data processing engine, but also an ecosystem of applications and cloud services provided by partners. Key announcements at Snowflake Summit 2023 included the public preview of the Snowflake Native App Framework and the private preview of Snowpark Container Services, which enables customers to run their own choice of third-party software, including programming languages, data science libraries and generative AI models, on Snowflake Data Cloud.
Snowflake is still most often used as a data warehouse for SQL-based analysis of structured data. However, the company has demonstrated that it has bigger plans, expanding its addressable market with capabilities for data engineering and data science, as well as the analysis of semi-structured and unstructured data. The ability to deploy and process non-SQL code is enabled by the Snowpark developer environment, which was introduced in 2020 and is designed to enable data engineers, data scientists and developers to execute custom Python, Java and Scala code, as well as utilize the embedded Anaconda repository, for advanced analytics workloads, including trained machine learning (ML) models. Snowpark also includes integration with Streamlit, the Python-based rapid application development and iteration environment, which was acquired by Snowflake in 2022. In addition to Snowflake’s warehouse engine, Snowpark can now utilize Snowpark Container Services, enabling users to deploy languages and libraries not already supported by Snowflake on Data Cloud using Docker containers. The company also announced that several partners (including Alteryx, Amplitude, Astronomer, Dataiku, H2O.ai, Pinecone, SAS Institute and Weights & Biases) are delivering products and services with Snowpark Container Services.
Snowflake also announced a partnership with NVIDIA to make NVIDIA AI Enterprise available with Snowpark Container Services, along with plans to host NVIDIA’s NeMo platform for developing large language models (LLMs) and to collaborate on support for NVIDIA GPU-accelerated computing. Separately, Snowflake launched its own Document AI LLM for extracting information from unstructured documents and converting it into structured data. Document AI is based on the generative AI technology Snowflake acquired with Applica in 2022. Snowflake’s search experience is also due to get a boost from generative AI thanks to the recent acquisition of Neeva.
Snowflake also announced the public preview of its Snowflake Native App Framework, which enables developers to build and test applications that run natively on Snowflake Data Cloud
I assert that by 2026, three-quarters of organizations will use cloud-based products and services as their primary analytic data platform, making it easier to adopt and scale operations as necessary. This will provide ample opportunity for Snowflake to continue to grow, even as it faces stiffer competition from both established vendors and emerging startups. The company’s plans for generative AI are comparatively nascent given the high levels of excitement we’ve seen in relation to LLMs, although it has made some interesting acquisitions; the partnership with NVIDIA will stand it in good stead. I recommend that all organizations considering their options for analytic data platforms include Snowflake in their evaluations.
Regards,
Matt Aslett