ISG Software Research Analyst Perspectives

The Arguments For, and Against, In-Database Machine Learning

Written by Matt Aslett | Nov 23, 2022 11:00:00 AM

Almost all organizations are investing in data science, or planning to, as they seek to encourage experimentation and exploration to identify new business challenges and opportunities as part of the drive toward creating a more data-driven culture. My colleague, David Menninger, has written about how organizations using artificial intelligence and machine learning (AI/ML) report gaining competitive advantage, improving customer experiences, responding faster to opportunities and threats, and improving the bottom line with increased sales and lower costs. One-quarter of participants (25%) in Ventana Research’s Analytics and Data Benchmark Research are already using AI/ML, while more than one-third (34%) plan to do so in the next year, and more than one-quarter (28%) plan to do so eventually.

As organizations adopt data science and expand their analytics initiatives, they face no shortage of options for AI/ML capabilities, and understanding which approach is most appropriate could be the difference between success and failure. The cloud providers all offer services, ranging from general-purpose ML environments to dedicated services for specific use cases, such as image detection or language translation. Software vendors also provide a range of products, both on-premises and in the cloud, including general-purpose ML platforms and specialist applications. Meanwhile, analytic data platform providers are increasingly adding ML capabilities to their offerings to provide additional value to customers and differentiate themselves from their competitors. There is no simple answer as to which is the best approach, but it is worth weighing the relative benefits and challenges. Looking at the options from the perspective of our analytic data platform expertise, the key choice is between AI/ML capabilities provided on a standalone basis or integrated into a larger data platform.

In-database analytics is nothing new. For many years, analytic data platform vendors have incorporated analytics capabilities into their data platforms, while also continuing to partner with analytics specialists. In recent years, this practice has extended to data science with the addition of key ML algorithms, functions, tools and associated capabilities to the data platform. This looks set to continue: I assert that through 2024, analytic data platform vendors will continue to accelerate the delivery of actionable insight by integrating native data integration, data management, analytics and ML with their core data persistence and processing functionality. As a result, even organizations that have not invested in AI/ML tooling and expertise are likely to have AI/ML capabilities at their disposal, courtesy of their incumbent analytic data platform provider.

One of the primary arguments in favor of in-database ML functionality, rather than a separate AI/ML platform, is avoiding complex and costly data movement. Training ML models requires large volumes of data, and thanks to data warehousing and data lake initiatives, many organizations will have already consolidated that data into an analytic data platform. Consolidating it was no small task, so there may be resistance among data management professionals to copying or moving that data into yet another platform to develop and test ML models. Assuming the necessary functionality is available in the analytic data platform, there is a solid argument in favor of bringing the model to the data, rather than taking the data to the model, given the complexity of data movement and the potential for data duplication and data drift.

Of course, the necessary functionality may not be available in the analytic data platform. One of the primary arguments for standalone AI/ML platforms is the breadth and depth of specialist functionality on offer, which could mean support for a greater range of ML and deep learning models. Clearly, this is a fundamentally important consideration: if the functionality to support a required algorithm is not available in an analytic data platform, data scientists will have no option but to use a separate AI/ML platform. Even if the algorithm is supported, a data scientist’s preference for a specific tool or framework may lead them to conclude that it is worth extracting the data from the analytic data platform for preparation, modeling and analysis elsewhere. Even with the complexity and cost of data movement, there could be arguments in favor of this approach if it results in happier data scientists and greater productivity and efficiency in the model development and testing stages. As David recently discussed, however, putting AI/ML into production requires more than just developing a model. Other key capabilities include data engineering (such as data preparation and feature discovery), MLOps (for model deployment, monitoring and management) and AI governance (to ensure that models are ethical, explainable, trusted and compliant with organizational policies or regulatory requirements). These capabilities are being incorporated into analytic data platforms alongside core AI/ML functionality, but may or may not match the maturity or depth of functionality offered by specialist AI/ML platform vendors.

Most organizations will employ multiple approaches to AI/ML, utilizing in-database and standalone functionality, both on-premises and in the cloud, depending on the specific use case. Indeed, it is likely that multiple platforms and approaches will be used even within a single AI/ML initiative. As data platform providers continue to add ML capabilities to their products, those products are likely to become suitable for a growing range of use cases. The need for standalone AI/ML platforms will continue, but I recommend that all organizations evaluate the AI/ML functionality available from their preferred analytic data platform providers on an ongoing basis and keep themselves informed of the capabilities they have at their disposal.

Regards,

Matt Aslett