by Hari Kiran
August 22, 2023
Data replication is an essential aspect of modern database management systems, ensuring data availability, fault tolerance, and scalability. In the PostgreSQL world, a groundbreaking extension called Spock has emerged, transforming the way multi-active replication is handled. Spock, based on the pglogical logical replication tool, brings in a host of new features, including conflict resolution and avoidance, asynchronous replication, and more. In this blog post, we'll explore the powerful capabilities that Spock, part of the pgEdge Platform, offers and how it addresses the challenges faced by developers and database administrators. We'll also delve into Spock’s new architecture, multi-master capabilities, security features, and the promise of ultrahigh availability.
To set the groundwork, let's review a few of the top features of pglogical:
- Logical Replication: Allows selective replication of specific tables, enabling more flexible and efficient data synchronization between databases.
- Bi-Directional Replication (BDR): Unlike physical streaming replication (PSR), pglogical supports bi-directional replication, allowing changes to flow in both directions between source and target databases. However, without conflict resolution, this can be highly error-prone and can put data integrity at risk.
- Replication Filtering: This allows you to apply filtering rules to determine which data changes should be replicated, providing flexibility in data synchronization.
- No Dependency on Physical Replication: Operates independently of the physical replication mechanisms, allowing greater flexibility and compatibility with different PostgreSQL setups and versions.
pgEdge Spock - A Leap Forward for pglogical and Multi-Active Replication for Postgres
Asynchronous Multi-Active Replication:
One of the key features that set Spock apart is its support for asynchronous multi-active replication. Unlike pglogical, Spock allows multiple nodes to accept writes simultaneously. This feature boosts performance and enhances fault tolerance and scalability, providing an optimal solution for demanding environments.
Conflict-Free Delta-Apply Columns:
Handling conflicts is a crucial aspect of multi-active replication. Spock introduces conflict-free delta-apply columns, a mechanism for smooth and efficient conflict resolution on columns that hold numeric information. With this approach, Spock resolves to the true numeric value, significantly reducing the chances of conflicts arising in the first place and thereby enhancing data consistency and integrity. As I alluded to earlier, this is one of the features dearly missed in pglogical replication.
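To make the idea concrete, here is a minimal Python sketch of why applying a delta converges where a plain overwrite does not. It is a conceptual model only, not Spock's implementation: the `Update` shape and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """A replicated change to one numeric column (hypothetical shape)."""
    old_value: int  # value the origin node saw before its write
    new_value: int  # value the origin node wrote


def apply_overwrite(local_value: int, incoming: Update) -> int:
    """Plain overwrite: the incoming value replaces the local one."""
    return incoming.new_value


def apply_delta(local_value: int, incoming: Update) -> int:
    """Delta-apply: add the origin's change (new - old) to the local value."""
    return local_value + (incoming.new_value - incoming.old_value)


# Two nodes start with a balance of 100 and accept concurrent writes.
node_a_write = Update(old_value=100, new_value=130)  # +30 on node A
node_b_write = Update(old_value=100, new_value=90)   # -10 on node B

# Node A already holds 130 locally, then receives node B's change.
overwrite_on_a = apply_overwrite(130, node_b_write)  # node A's +30 is lost
delta_on_a = apply_delta(130, node_b_write)

# Node B already holds 90 locally, then receives node A's change.
delta_on_b = apply_delta(90, node_a_write)
```

With the overwrite, node A's increment disappears. With the delta, both nodes arrive at the same total regardless of the order in which the changes reach them, which is the property that makes such columns conflict-free.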
Advanced Conflict Resolution with Better Error Handling:
In scenarios where conflicts do occur, Spock doesn't disappoint. It offers a robust conflict resolution mechanism that intelligently resolves conflicts without compromising data quality. Moreover, Spock comes with improved error handling, making it easier for developers and administrators to identify and address any issues that might arise during the replication process.
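This post does not spell out the resolution algorithm, so the following Python sketch shows one widely used strategy, last-update-wins by commit timestamp, purely as an illustration. The `RowVersion` type, its fields, and the tie-break on node name are assumptions, not Spock's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RowVersion:
    """One node's version of a row (hypothetical shape)."""
    node: str               # name of the node that committed this version
    committed_at: datetime  # commit timestamp of the change
    payload: dict           # column values


def resolve_last_update_wins(local: RowVersion, remote: RowVersion) -> RowVersion:
    """Keep whichever version was committed later.

    Ties are broken on node name so that every node picks the same winner.
    """
    if (remote.committed_at, remote.node) > (local.committed_at, local.node):
        return remote
    return local


version_a = RowVersion(
    "node_a",
    datetime(2023, 8, 22, 12, 0, 0, tzinfo=timezone.utc),
    {"status": "shipped"},
)
version_b = RowVersion(
    "node_b",
    datetime(2023, 8, 22, 12, 0, 1, tzinfo=timezone.utc),
    {"status": "cancelled"},
)

# Each node resolves the same conflict from its own point of view.
winner_on_a = resolve_last_update_wins(local=version_a, remote=version_b)
winner_on_b = resolve_last_update_wins(local=version_b, remote=version_a)
```

The important property is determinism: both nodes must reach the same answer independently, otherwise the replicas diverge.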
Enhanced Management, Monitoring Stats, and Integration:
Managing a replicated database system can be challenging. Spock simplifies this process by providing enhanced management and monitoring statistics.
For reference, the following Spock metadata tables are used for real-time conflict tracking by pgEdge Cloud, a fully managed cloud service running in multiple regions across AWS, Azure, or Google Cloud.
- spock.conflict_tracker
- spock.resolutions
- spock.local_sync_status
- spock.queue
- spock.lag_tracker
One of our early-stage customers operates a highly scalable web platform with users from the US and EU regions. To provide a seamless experience, they have deployed a multi-active replication architecture using PostgreSQL with Spock. This allows them to distribute read and write operations across multiple database nodes, ensuring high availability and optimal performance. The deployment also leverages Spock's advanced conflict resolution capabilities: Spock identifies conflicting changes and applies sophisticated algorithms to resolve them automatically.
These detailed insights into replication status and performance help their analysts make informed decisions, ensuring a smooth and efficient operation. Additionally, Spock integrates with existing monitoring tools like Prometheus, enhancing the overall management experience.
Performance, Stability, and Networking Stress Testing:
Spock has been undergoing rigorous performance, stability, and networking stress testing, making it a robust and reliable solution for critical deployments. Spock boasts efficient streaming of large transactions and efficient handling of distributed transactions.
Replication of Partitioned Tables for Geo-Sharding Support:
With the increasing demand for geographically distributed applications, Spock's support for the replication of partitioned tables comes as a game-changer. This feature enables developers to implement geo-sharding, distributing data across different geographical locations while maintaining data consistency and minimizing latency.
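As a rough mental model of geo-sharding, the sketch below maps each partition of a table to the set of nodes that should hold it. Everything here is hypothetical: the `region` partition key, the node names, and the mapping itself are assumptions for illustration and are not Spock configuration.

```python
# Hypothetical mapping from partition (keyed by region) to the nodes holding it.
PARTITION_NODES = {
    "us": {"node_us_east", "node_us_west"},
    "eu": {"node_eu_central"},
}


def partition_for(row: dict) -> str:
    """Pick the partition from the row's region column (the assumed partition key)."""
    return row["region"]


def target_nodes(row: dict) -> set:
    """Return the nodes that should receive this row."""
    region = partition_for(row)
    if region not in PARTITION_NODES:
        raise ValueError(f"no partition configured for region {region!r}")
    return PARTITION_NODES[region]


eu_order = {"id": 1, "region": "eu", "total": 42}
eu_targets = target_nodes(eu_order)
```

In a setup like this, rows for EU users stay on EU nodes, which keeps reads and writes close to the users who generate them.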
Linking Database to a Country of Residence with Configurable PII Rules:
For businesses dealing with sensitive data and privacy regulations, Spock offers a unique advantage. Users can link specific databases to a country of residence, ensuring that Personally Identifiable Information (PII) is kept within the specified region to comply with data residency requirements. Configurable PII rules (part of the `spock.pii` metadata table) provide additional flexibility to tailor data storage policies to meet compliance needs.
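The exact rule format is not described in this post, so the following Python sketch only illustrates one possible policy: blanking PII columns when a row leaves its country of residence. The `PII_COLUMNS` rules, the table and column names, and the redaction behavior are all assumptions, not the contents or semantics of `spock.pii`.

```python
# Hypothetical PII rules: table name -> columns treated as PII.
PII_COLUMNS = {
    "customers": {"email", "phone"},
}


def redact_for_replication(table: str, row: dict,
                           home_country: str, target_country: str) -> dict:
    """Return the row as it would be sent to a node in target_country.

    Inside the home country the row is sent unchanged; elsewhere, columns
    listed as PII for this table are replaced with None.
    """
    if home_country == target_country:
        return dict(row)
    pii = PII_COLUMNS.get(table, set())
    return {col: (None if col in pii else val) for col, val in row.items()}


customer = {"id": 1, "email": "a@example.com", "plan": "pro"}
sent_abroad = redact_for_replication("customers", customer, "DE", "US")
sent_home = redact_for_replication("customers", customer, "DE", "DE")
```

A real deployment might instead skip replication of such rows entirely; the point is only that the rule is driven by configuration rather than hard-coded in the application.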
Conclusion and where to learn more
Spock's introduction is a pglogical renaissance and elevates PostgreSQL availability to 99.99%, the "four nines" that are now a de facto requirement. It also heralds a new era in multi-active replication for PostgreSQL. With its support for asynchronous replication, advanced conflict resolution, and enhanced monitoring capabilities, Spock empowers developers and administrators to build highly available and scalable database architectures. Spock's stress-tested performance and support for partitioned tables offer a reliable solution for modern applications with geographically distributed data requirements. Moreover, Spock's compliance features, such as linking databases to countries of residence and configurable PII rules, help ensure data privacy and regulatory compliance. For anyone seeking to elevate their database replication capabilities, Spock is undoubtedly a leap forward in the Postgres world. Check out a feature comparison here.
To learn more and see a live demo, join the webinar Enhancing pgLogical with Multi-Master Features on August 30th at 11 AM ET, featuring database experts and Postgres veterans Ahsan Hadi and Cady Motyka.
About the Author
Hari Kiran is a seasoned Database Engineer with nearly 17 years of experience in multiple domains of the IT industry, including healthcare, banking, project & portfolio management, and CRM. He is passionate about PostgreSQL and has helped customers across various geographies with database administration, enterprise implementations, security and hardening, backup and recovery, and performance tuning. Hari has worked at companies such as GE, EDB, Oracle, Optum, and 2ndQuadrant. He is also a regular speaker at PostgreSQL conferences like FOSSASIA Summit, PGConf India/ASIA and PGConf Down Under in Australia.