ISG Software Research Analyst Perspectives

Neo4j Expands Data Science Focus with New Managed Service

Written by Matt Aslett | Aug 3, 2022 10:30:00 AM

I recently explained how emerging application requirements were expanding the range of use cases for NoSQL databases, increasing adoption based on the availability of enhanced functionality. These intelligent applications require a close relationship between operational data platforms and the output of data science and machine learning projects. This ensures that machine learning and predictive analytics initiatives are not only developed and trained based on the relationships inherent in operational applications, but also that the resulting intelligence is incorporated into the operational application in real time to support capabilities such as personalization, recommendations and fraud detection. Graph databases already support operational use cases such as social media, fraud detection, customer experience management and recommendation engines. Graph database vendors such as Neo4j are increasingly focused on the role that graph databases can play in supporting data scientists, enabling them to develop, train and run algorithms and machine learning models on graph data in the graph database, rather than extracting it into a separate environment.

Neo4j was founded in 2007 to build a business around the open-source graph database of the same name. The company’s Neo4j Graph Database enables native storage of data using a property graph model in which entities, the relationships between them and their values are stored as nodes, edges and attributes, respectively. The graph data model is inherently more suitable than the established relational data model for high-performance identification and navigation of relationships between entities and values, and Neo4j enjoyed early success for use cases including network management, master data management, social media, fraud detection and navigation systems.
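To make the property graph model concrete, here is a minimal in-memory sketch in Python. It is an illustration only and does not reflect how Neo4j stores data internally. The class names (Node, Edge, PropertyGraph) and the sample customer, account and device records are hypothetical, chosen purely for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with a label and key/value attributes (properties)."""
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed relationship between two nodes, with its own properties."""
    source: str
    target: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical toy container; not Neo4j's storage format."""

    def __init__(self):
        self.nodes = {}
        self.outgoing = {}  # node_id -> list of Edge objects leaving that node

    def add_node(self, node):
        self.nodes[node.node_id] = node
        self.outgoing.setdefault(node.node_id, [])

    def add_edge(self, edge):
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.outgoing[edge.source].append(edge)

    def neighbors(self, node_id, rel_type=None):
        """Follow a node's outgoing edges, optionally filtered by relationship type."""
        return [
            self.nodes[e.target]
            for e in self.outgoing[node_id]
            if rel_type is None or e.rel_type == rel_type
        ]


# Sample data (invented for illustration)
graph = PropertyGraph()
graph.add_node(Node("c1", "Customer", {"name": "Alice"}))
graph.add_node(Node("a1", "Account", {"status": "open"}))
graph.add_node(Node("d1", "Device", {"os": "Android"}))
graph.add_edge(Edge("c1", "a1", "OWNS", {"since": 2019}))
graph.add_edge(Edge("c1", "d1", "USES"))

owned = [n.node_id for n in graph.neighbors("c1", "OWNS")]
```

Because each node keeps a direct list of its relationships, navigating from a customer to the accounts it owns is a lookup on that node rather than a join across tables, which is the property the paragraph above attributes to the graph data model.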

The Neo4j Graph Database also provides immediate consistency, supporting ACID (atomicity, consistency, isolation, durability) compliance. In addition to developing the core graph database, Neo4j has invested in the development of associated tools and applications including the Cypher Query Language, the Bloom graph data visualization environment, desktop and browser-based developer tools, and an application programming interface developer library based on GraphQL. In 2020, the company expanded its purview to data scientists with the introduction of Neo4j Graph Data Science.

Mainstream adoption of graph databases is still in its early stages. Fewer than 1 in 6 participants (15%) in Ventana Research’s Analytics and Data Benchmark Research are in production with graph databases today, although 11% plan to use them within 12 months, and another 9% within two years. In 2021, Neo4j introduced Neo4j AuraDB, a managed cloud service designed to facilitate adoption. Currently available on Google Cloud and Amazon Web Services, Neo4j AuraDB enables developers to provision the graph database and associated tools and interfaces without any upfront infrastructure investment or ongoing management requirements. Earlier this year the company also launched Neo4j AuraDS, a fully managed version of Neo4j Graph Data Science. The company has more than 950 customers including Airbus, Allianz, Daimler, eBay, Marriott, Verizon and UBS. It has also attracted the interest of investors, including a $325 million series F funding round announced in June 2021 that valued the company at over $2 billion.

Some Neo4j customers were already using the company’s graph database to support data science prior to the launch of Neo4j Graph Data Science. The native representation of relationships can be particularly useful in surfacing “features” for use in machine learning modeling. The delivery of Neo4j Graph Data Science was designed to ensure that data scientists could use the data stored in graph databases to facilitate the development of predictive and prescriptive analytics models without needing to extract it into an external data science platform. Neo4j Graph Data Science represents a unified workspace for data scientists that combines the graph database and associated query, visualization and developer tools with a native graph API client and a library of 65 pre-tuned graph algorithms, including algorithms to detect and analyze centrality, community, similarity and path finding.
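As a rough illustration of how a graph relationship can surface a “feature” for machine learning, the Python sketch below computes normalized degree centrality, one of the simplest centrality measures, from an edge list. It is a standalone approximation of the idea, not the Neo4j Graph Data Science library or its API, and the account names and edges are invented for the example.

```python
from collections import defaultdict

# Hypothetical edge list: pairs of accounts that have transacted with each other
edges = [
    ("acct_a", "acct_b"),
    ("acct_a", "acct_c"),
    ("acct_a", "acct_d"),
    ("acct_b", "acct_c"),
    ("acct_d", "acct_e"),
]


def degree_centrality(edge_list):
    """Return each node's degree divided by (n - 1), treating edges as undirected."""
    neighbors = defaultdict(set)
    for u, v in edge_list:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


scores = degree_centrality(edges)

# One row per node, ready to be joined onto a tabular training set
features = [
    {"node": node, "degree_centrality": score}
    for node, score in sorted(scores.items())
]
```

In this toy data, acct_a is connected to three of the four other accounts, so it receives the highest score. A score like this could be added as a column alongside conventional attributes when training a fraud or recommendation model.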

While Neo4j Graph Data Science is designed to bring data scientists to graph data, various approaches are offered to bring external data to Neo4j Graph Data Science. Data can be loaded as CSV files or via API using the Awesome Procedures on Cypher (APOC) library or via a variety of extract, transform and load tools. The Neo4j Connector for Apache Spark also enables Neo4j to read data from Spark DataFrames.

Key use cases for Neo4j Graph Data Science include anomaly and fraud detection in network management, insurance and security; entity resolution and recommendation engines in retail and ecommerce; medicine and therapy recommendation systems in pharmaceuticals; and optimized routing in the logistics sector. I assert that through 2026, operational data platform providers will continue to invest in hybrid operational and analytic processing capabilities to support growing demand for intelligent operational applications infused with personalization and artificial intelligence-driven recommendations.

Neo4j’s cloud offerings are relatively new and are being utilized to ease adoption by new customers as well as transition existing customers based on the benefits of cloud consumption. As is the case with AuraDB compared to the self-hosted and managed Neo4j Graph Database, the advantage of AuraDS over Neo4j Graph Data Science is the ability to consume the functionality as a managed service with elastic scalability and automated provisioning, upgrades, updates and backups. All data stored in AuraDB and AuraDS is encrypted, while the Enterprise editions, designed for large-scale deployments, run within a dedicated virtual private cloud for isolation.

Neo4j’s cloud services and dedicated data science offerings are both relatively new but provide the company with opportunities to expand its addressable market and accelerate demand for graph-native data science. Adoption will likely be driven initially by small-scale projects, and there are opportunities for the company to expand its capabilities for enterprise-grade production. The company’s graph expertise and growing data science portfolio mean it is building on its strengths. I recommend that organizations consider Neo4j when evaluating potential use cases for the graph data model as well as graph-based machine learning.

Regards,

Matt Aslett