There is always space for innovation in the data platforms sector, and new vendors continue to emerge at regular intervals with new approaches designed to serve specialist data storage and processing requirements. Factors including performance, reliability, security and scalability provide a focal point for new vendors to differentiate themselves from established vendors, especially for the most demanding operational or analytic data platform requirements. It is never easy, however, for developers of new data platform products to gain significant market traction, given the dominance of the established relational database vendors and cloud providers. Targeting requirements that are not well-served by general purpose data platforms can help new vendors get a foot in the door of customer accounts. The challenge in gaining further market traction is for new vendors to avoid having their products pigeonholed as suitable only for a niche set of requirements. This is precisely the problem facing the various distributed SQL database providers.
Distributed SQL is a term that has been adopted by several vendors to describe operational data platform products designed to combine the benefits of the relational database model with native support for distributed cloud architecture, including resilience that spans multiple data centers and/or cloud regions. Vendors of products that meet this description include Cockroach Labs, MariaDB, PingCAP, PlanetScale and Yugabyte, as well as Amazon Web Services, Google Cloud and Microsoft Azure. Distributed SQL database vendors provide relational data platforms that replicate data across multiple servers, forming a single, logical database. Replicating data across multiple servers could mean spanning multiple nodes in a single data center, multiple nodes across multiple data centers or even multiple nodes across multiple cloud providers in multiple geographic regions. Distributed SQL databases are primarily designed to support the operational workloads used to run the business and should not be confused with distributed SQL query engines such as Presto and Trino, which are used to accelerate analysis of data in multiple data platforms across a distributed architecture. Distributed SQL databases and distributed SQL query engines are complementary and can be used in conjunction.
There are multiple reasons why organizations might be interested in distributed SQL databases. A significant potential benefit is business continuity. Distributed SQL databases are specifically designed to provide scalability and resiliency that extend beyond a single data center or cloud instance. Support for elastic scalability can be utilized to deliver automated cloud expansion and contraction in response to evolving capacity requirements, while the ability to span multiple geographic regions – or even cloud providers – can be used to support disaster recovery and high availability. Multi-region support could also be used to segregate data processing to support data sovereignty requirements. Latency is another consideration: where data is not subject to regulatory limitations, replicating it close to users in multiple regions minimizes network delays between the data platform and those users, supporting local performance requirements.
All of these are potential advantages of distributed SQL databases and are driving adoption by organizations that have identified use cases that require active-active cross data center data replication. However, the same was also true of the first generation of distributed relational database vendors such as GenieDB, NuoDB, TransLattice and Xeround, which emerged around 2010 only to be acquired or closed within a few years without having made a material impact on the data platforms landscape. These vendors, sometimes referred to by the collective term NewSQL, provide a cautionary tale for distributed SQL providers. Since active-active cross data center data replication functionality was relatively new, organizations needed time to understand the potential use cases and develop applications to take advantage of that functionality. Customer demand for NewSQL databases was by no means non-existent, but it did not materialize fast enough to keep the various vendors in business long enough to become more than niche providers.
The prospects for distributed SQL vendors seem brighter. Enterprise understanding of, and demand for, active-active cross data center data replication, while still not mainstream, have benefited from a decade
Another potential benefit of distributed SQL is avoidance of complexity, compared to the use of database sharding and in-memory caching layers that are used to scale existing relational databases. Developer agility
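To illustrate the kind of application-side logic that manual sharding tends to impose, here is a minimal Python sketch. It assumes a hypothetical set of three shards keyed by customer ID; the shard names and function names are illustrative only and are not drawn from any vendor's product.

```python
import hashlib

# Hypothetical shard names, for illustration only.
SHARDS = ["orders_db_0", "orders_db_1", "orders_db_2"]


def shard_for(customer_id: str, shards: list[str] = SHARDS) -> str:
    """Route a key to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def fan_out_plan(customer_ids: list[str]) -> dict[str, list[str]]:
    """Group keys by shard: a query touching many customers must be
    issued once per shard and merged by the application."""
    plan: dict[str, list[str]] = {}
    for cid in customer_ids:
        plan.setdefault(shard_for(cid), []).append(cid)
    return plan


def keys_moved_by_resharding(customer_ids: list[str], new_shard: str) -> int:
    """Count keys whose shard changes when one shard is added with
    simple modulo routing, i.e. rows that would need to be migrated."""
    grown = SHARDS + [new_shard]
    return sum(shard_for(c) != shard_for(c, grown) for c in customer_ids)
```

In this sketch the routing, the cross-shard fan-out and the data movement caused by adding a shard are all the application's responsibility. A distributed SQL database is intended to handle the equivalent work inside the data platform, behind a single logical database.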
Another key factor in lowering barriers to developer adoption is compatibility with existing applications, frameworks, drivers and tools. The various distributed SQL vendors have delivered compatibility with popular open-source databases such as MySQL or PostgreSQL to enable developers to use familiar tools and techniques when developing new applications to run on distributed SQL databases. Migrating existing applications to distributed SQL databases is more complex given that existing, single-node applications have not been developed to take advantage of a distributed architecture, and many distributed SQL databases are not designed to support single-node applications.
If distributed SQL vendors are to avoid the pitfalls that befell NewSQL vendors, they need to provide additional value to customers by supporting the migration of existing software as well as the development of new applications. Ironically, support for single-node applications may be critical to proving the value of distributed SQL. While this functionality is a work in progress for the various distributed SQL vendors, I nevertheless recommend that organizations evaluate the potential benefits of distributed SQL when considering options for new operational data platforms.
Regards,
Matt Aslett