MySQL-PostgreSQL Sync Tools Compared: Best Options for 2025

Step-by-Step Guide to Setting Up MySQL → PostgreSQL Synchronization

This guide walks through planning, configuring, and operating a reliable synchronization pipeline from MySQL to PostgreSQL. It covers tools, schema mapping, change data capture (CDC), initial data load, conflict handling, monitoring, and scaling considerations. Follow the steps and examples below to build a production-ready sync that keeps PostgreSQL updated with MySQL changes.


Why synchronize MySQL to PostgreSQL?

  • Use PostgreSQL features (advanced indexing, JSONB, extensions) while keeping MySQL as the primary OLTP source.
  • Migrate gradually: keep MySQL running while moving services to PostgreSQL.
  • Analytics & reporting: maintain a near-real-time replica in PostgreSQL for analytical workloads without taxing MySQL.

Overview of approaches

Common approaches to sync MySQL → PostgreSQL:

  • Logical replication / CDC using binlog readers (Debezium, Maxwell's Daemon, and similar tools).
  • Periodic dump plus incremental updates (batch extracts keyed by timestamps or auto-increment IDs).
  • Trigger-based replication (triggers in MySQL write changes to an intermediary table/queue).
  • ETL/ELT tools (Airbyte, Fivetran, Singer, custom scripts) that support CDC.

Choice depends on latency, complexity, schema differences, and transactional guarantees. For near-real-time, CDC via binlog is recommended.


Prerequisites

  • MySQL server (5.7+ recommended) with replication/row-based binlog enabled.
  • PostgreSQL server (11+ recommended).
  • A Linux host for running sync tools (Docker recommended for portability).
  • Sufficient permissions: MySQL user with REPLICATION SLAVE/CLIENT and SELECT; PostgreSQL user with INSERT/UPDATE/DELETE privileges (and optionally CREATE for schema creation).
  • Network connectivity and secure credentials management (Vault/secret manager).

Step 1 — Plan schema mapping

MySQL and PostgreSQL have different datatypes and behavior.

Key mappings:

  • VARCHAR/TEXT → TEXT or VARCHAR(n)
  • INT/SMALLINT → INTEGER/SMALLINT
  • BIGINT → BIGINT
  • DATETIME/TIMESTAMP → TIMESTAMP WITHOUT TIME ZONE (or WITH if you need tz)
  • TINYINT(1) → BOOLEAN (the most common mapping)
  • JSON → JSONB (PostgreSQL)
  • AUTO_INCREMENT → SERIAL/GENERATED AS IDENTITY

Decide how to handle:

  • Primary keys and unique constraints — keep consistent schemas to avoid conflicts.
  • Default expressions and functions — rewrite MySQL functions to Postgres equivalents.
  • Character sets/collations — ensure UTF-8 compatibility; prefer utf8mb4 in MySQL and UTF8 in Postgres.
  • ENUMs — map to a Postgres ENUM type, or to TEXT with a CHECK constraint or domain type.

Make a migration mapping document listing each table, column, and target datatype.
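
For example, a hypothetical MySQL users table could map to the PostgreSQL DDL below. This is a sketch to illustrate the mapping document, not your schema; adapt names, types, and defaults to your own tables:

-- MySQL source (for reference):
--   id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
--   name VARCHAR(255) NOT NULL,
--   is_active TINYINT(1) NOT NULL DEFAULT 1,
--   profile JSON,
--   created_at DATETIME DEFAULT CURRENT_TIMESTAMP

-- PostgreSQL target:
CREATE TABLE users (
  id          BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  name        TEXT NOT NULL,
  is_active   BOOLEAN NOT NULL DEFAULT TRUE,
  profile     JSONB,
  created_at  TIMESTAMP WITHOUT TIME ZONE DEFAULT now()
);

GENERATED BY DEFAULT (rather than ALWAYS) lets the sync pipeline insert the explicit id values coming from MySQL.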


Step 2 — Prepare MySQL for CDC

Enable binary logging and set row-based format:

  1. Edit MySQL config (my.cnf):
    
     [mysqld]
     server-id=1
     log_bin=mysql-bin
     binlog_format=ROW
     binlog_row_image=FULL
     expire_logs_days=7
     gtid_mode=ON
     enforce_gtid_consistency=ON
  2. Restart MySQL.
  3. Create a replication user:
    
     CREATE USER 'replicator'@'%' IDENTIFIED BY 'strongpassword';
     GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'replicator'@'%';
     FLUSH PRIVILEGES;
  4. Note current binlog position or GTID for initial snapshot:
    
    SHOW MASTER STATUS; 

If using older MySQL without GTID, record File and Position for the CDC tool.


Step 3 — Prepare PostgreSQL

  • Create target database and user:

    
     CREATE USER sync_user WITH PASSWORD 'strongpassword';
     CREATE DATABASE analytics OWNER sync_user;

  • Configure Postgres for the expected load (tune wal_level only if you also use logical decoding from Postgres for other purposes). Usually no special settings are needed for inbound writes from the CDC tool.

  • Create schemas/tables matching the mapping document, or let the sync tool create tables remotely if supported. For production, prefer creating and validating schemas manually to control indexes and constraints.
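
A minimal sketch of manual target-schema setup, assuming the tables from the Step 1 mapping live in a dedicated app_db schema; the point is to add indexes deliberately rather than letting a tool guess:

CREATE SCHEMA app_db AUTHORIZATION sync_user;

-- Create the target tables from the mapping document inside this schema, then
-- add only the indexes the replica needs for reads; every extra index slows
-- down applied writes from the CDC stream.
CREATE INDEX IF NOT EXISTS users_created_at_idx ON app_db.users (created_at);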


Step 4 — Choose a sync tool

Recommended tools for MySQL→Postgres CDC:

  • Debezium (run via Kafka Connect, or standalone with Debezium Server) — robust, supports schema history, works well in distributed systems.
  • Airbyte — simpler UI-driven, supports CDC connectors.
  • Maxwell’s Daemon — lightweight binlog reader that emits JSON to Kafka/HTTP.
  • pg_chameleon — Python-based tool specifically for MySQL→Postgres replication.
  • Custom scripts using mysqlbinlog + logical apply (for small/simple use cases).

This guide uses Debezium (with Kafka Connect) for examples because it’s production-grade and widely used.


Step 5 — Initial data snapshot

There are two common options:

  • Take a consistent snapshot first (mysqldump or tool-managed snapshot), then start CDC from the saved binlog position.
  • Let the CDC tool perform the snapshot (many tools can take an online snapshot with minimal locking).

Example: use mysqldump to create a snapshot:

mysqldump --single-transaction --master-data=2 --set-gtid-purged=OFF --routines --triggers --databases app_db > app_db.sql 

Load into PostgreSQL after adjusting schema SQL for Postgres types (mysqldump output needs conversion).
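
Once the dump has been converted to Postgres-compatible SQL (the converted file name below is hypothetical), load it with psql:

# run the converted dump in one transaction so a failure rolls back cleanly
psql --single-transaction -h postgres-host -U sync_user -d analytics -f app_db_postgres.sql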

Alternatively, the Debezium connector can perform a snapshot automatically and continue from the binlog; verify the connector snapshot mode and ensure it records offsets.


Step 6 — Configure Debezium (example)

Run Kafka + Zookeeper + Kafka Connect + Debezium (Docker Compose recommended). Minimal Debezium MySQL connector config (JSON POST to Connect REST API):

{   "name": "mysql-connector",   "config": {     "connector.class": "io.debezium.connector.mysql.MySqlConnector",     "tasks.max": "1",     "database.hostname": "mysql-host",     "database.port": "3306",     "database.user": "replicator",     "database.password": "strongpassword",     "database.server.id": "184054",     "database.server.name": "mydbserver",     "database.history.kafka.bootstrap.servers": "kafka:9092",     "database.history.kafka.topic": "schema-changes.mydb",     "include.schema.changes": "true",     "database.history.producer.bootstrap.servers": "kafka:9092",     "database.history.consumer.bootstrap.servers": "kafka:9092",     "snapshot.mode": "initial"   } } 

Debezium will emit change events to Kafka topics named like mydbserver.app_db.table.

To move changes from Kafka to PostgreSQL, use Kafka Connect sink connectors (JDBC sink) or a consumer application that applies changes to Postgres respecting ordering and transactions. Kafka Connect JDBC Sink can be used, but it may not handle complex upserts or deletes without configuration.


Step 7 — Applying changes to PostgreSQL

Options:

  • Kafka Connect JDBC Sink connector (simple, may need SMTs for key handling).
  • Use ksqlDB, a custom consumer, or an outbox-style consumer that reads events and applies idempotent SQL to Postgres.
  • Use a transformation layer (ksqlDB, Kafka Streams, or dbt downstream) to flatten the Debezium envelope into plain records.

Key concerns:

  • Preserve ordering per primary key and per transaction.
  • Apply DELETE/UPDATE/INSERT operations correctly. Debezium events contain before/after states — consumer must translate to SQL statements: INSERT for create, UPDATE for update, DELETE for delete.
  • Idempotency: use upserts (INSERT … ON CONFLICT DO UPDATE) to handle retries.

Example PostgreSQL upsert:

INSERT INTO users (id, name, email, updated_at)
VALUES ($id, $name, $email, $updated_at)
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name,
    email = EXCLUDED.email,
    updated_at = EXCLUDED.updated_at;

For deletes:

DELETE FROM users WHERE id = $id; 

If using JDBC Sink connector, configure pk.mode=record_key and pk.fields to ensure upserts.
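
A minimal sink sketch, assuming the Confluent JDBC Sink connector plus Debezium's ExtractNewRecordState SMT to flatten the change envelope; topic, table, and credential values are illustrative:

{
  "name": "postgres-sink-users",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "mydbserver.app_db.users",
    "connection.url": "jdbc:postgresql://postgres-host:5432/analytics",
    "connection.user": "sync_user",
    "connection.password": "strongpassword",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "pk.fields": "id",
    "delete.enabled": "true",
    "auto.create": "false",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false"
  }
}

Keeping tombstones (drop.tombstones=false) lets the sink turn Debezium delete events into DELETE statements, and auto.create is left off so the manually created schema stays authoritative.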


Step 8 — Handle schema changes

Debezium records schema change events. Strategies:

  • Allow automatic schema evolution: sink connector updates columns (risky).
  • Manage schema changes manually: apply ALTER TABLE in PostgreSQL first, then allow CDC to populate new columns.
  • Use a schema registry to manage Avro/Protobuf schemas if using Kafka Connect.

Test schema changes on staging before production.
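
For the manual strategy, the new column lands in PostgreSQL before it appears in MySQL, so incoming change events always have a destination. A hypothetical example:

-- 1. Apply in PostgreSQL first:
ALTER TABLE users ADD COLUMN phone TEXT;
-- 2. Then add the equivalent column in MySQL; subsequent CDC events
--    carrying the new field will populate it.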


Step 9 — Conflict resolution and data direction

If MySQL is the single source of truth and writes to Postgres are not permitted, configure Postgres to be downstream-only to avoid write conflicts. If bi-directional sync is needed, introduce conflict resolution rules (last-writer-wins, version vectors) and consider using a purpose-built multi-master system.


Step 10 — Monitoring, testing, and validation

Monitor:

  • Connector health (Debezium/Kafka Connect metrics).
  • Lag between the MySQL binlog position read by the connector and the latest change applied in Postgres.
  • Error topics in Kafka and failed records in sink connector.
  • Data drift: periodically run checksums between MySQL and PostgreSQL tables (pt-table-checksum style or custom queries).

Testing:

  • Simulate schema changes, high write loads, and network partitions in staging.
  • Test recovery from connector restarts and connector/worker failures.

Validation example: row counts, checksums, and sample primary-key comparisons.
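
A sketch of simple validation queries against the hypothetical users table; run the MySQL equivalents (COUNT, MAX, GROUP_CONCAT with MD5) on the source side and compare results:

-- Row count and freshness check
SELECT COUNT(*) AS row_count, MAX(updated_at) AS latest_change FROM users;

-- PostgreSQL-side checksum over a primary-key range (chunk large tables)
SELECT md5(string_agg(id::text || '|' || name || '|' || email, ',' ORDER BY id)) AS chunk_checksum
FROM users
WHERE id BETWEEN 1 AND 10000;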


Step 11 — Performance and scaling tips

  • Batch writes to PostgreSQL to reduce transaction overhead.
  • Tune Postgres parameters: wal_level, checkpoint_timeout, max_wal_size, maintenance_work_mem, and effective_cache_size as appropriate (illustrative starting points follow this list).
  • Use partitioning and indexes carefully — too many indexes slow down writes.
  • Scale Kafka (or message bus) to handle throughput; use topic partitioning keyed by primary key to preserve ordering.
  • For very large initial loads, consider chunked snapshotting and parallel apply workers.
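
Illustrative postgresql.conf starting points only; suitable values depend entirely on hardware, table sizes, and write volume:

# postgresql.conf -- example starting points, not recommendations
max_wal_size = 4GB              # fewer forced checkpoints under heavy apply load
checkpoint_timeout = 15min
maintenance_work_mem = 1GB      # speeds up index builds during the initial load
effective_cache_size = 12GB     # roughly 50-75% of RAM on a dedicated host
# wal_level = logical           # only if Postgres itself feeds downstream consumers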

Troubleshooting common issues

  • Missing rows: check snapshot completeness and connector offsets.
  • Duplicate rows: ensure primary keys and idempotent upserts.
  • Schema mismatch errors: update mapping and re-run schema migration.
  • Connector crashing: check logs for OutOfMemory or network/authentication errors; increase the JVM heap or fix credentials accordingly.

Alternative: Using Airbyte or pg_chameleon

  • Airbyte: simpler UI, built-in connectors for MySQL CDC → Postgres, easier to set up for teams without Kafka.
  • pg_chameleon: designed specifically for MySQL→Postgres replication, handles snapshots and replication; good for migrations.

Evaluate trade-offs: Debezium + Kafka is more robust and extensible; Airbyte is faster to bootstrap.


Security considerations

  • Use TLS for MySQL/Postgres connections.
  • Restrict replication user privileges.
  • Rotate credentials and store in a secrets manager.
  • Secure Kafka and connectors with ACLs if used.

Example end-to-end checklist

  • [ ] Plan schema mappings.
  • [ ] Enable MySQL binlog & create replicator user.
  • [ ] Create target Postgres schemas/tables.
  • [ ] Take initial snapshot and load into Postgres.
  • [ ] Deploy Debezium/MySQL connector.
  • [ ] Deploy sink (Kafka Connect JDBC or consumer) to apply changes to Postgres.
  • [ ] Validate data and set up monitoring.
  • [ ] Test failover and recovery scenarios.
  • [ ] Harden security and rotate credentials.

Natural next steps beyond this guide:

  • A Docker Compose example for Debezium + Kafka + Connect + Postgres.
  • Converting mysqldump output to Postgres-compatible DDL for a specific schema.
  • A Kafka Connect sink config tuned for upserts to Postgres (Step 7 shows a starting sketch).
