@tchivs tchivs commented Jan 9, 2026

Summary

Add partition.tables configuration option to support PostgreSQL 10+ partition table routing, significantly improving performance for databases with many partitions.

JIRA: https://issues.apache.org/jira/browse/FLINK-38881

Background

Related to Discussion #4079 and PR #2571.

As noted in PR #2571: "It's very time-consuming for postgres to refresh schema if there are many tables to read. According to our testing, refreshing 400 tables takes 15 minutes."

This PR addresses similar performance issues specifically for PostgreSQL partition tables, where hundreds of child partitions can cause severe performance degradation.

New Features

PostgreSQL 10+ Partition Table Routing

  • Child-to-parent routing: WAL events from child partitions are automatically routed to the parent table
  • Unified table ID: All partition data appears under the parent table ID in downstream systems
  • Pattern-based matching: Regex patterns select which child partition tables belong to a parent
  • Primary key inheritance: The parent table automatically inherits its primary key from a representative child partition
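
The child-to-parent routing described above can be illustrated with a minimal sketch. It assumes each rule pairs a parent table ID with a regex that a child ID must match in full; the class and method names (PartitionRoutingSketch, addRule, route) are hypothetical and are not the classes added by this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of child-to-parent routing; not the connector's actual router. */
final class PartitionRoutingSketch {
    // Parent table ID -> compiled pattern that a child partition ID must fully match.
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    void addRule(String parentTableId, String childRegex) {
        rules.put(parentTableId, Pattern.compile(childRegex));
    }

    /** Returns the parent ID for a matching child partition, otherwise the ID unchanged. */
    String route(String tableId) {
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            // matches() requires the whole ID to match, so partial hits are not routed.
            if (rule.getValue().matcher(tableId).matches()) {
                return rule.getKey();
            }
        }
        return tableId;
    }
}
```

With the rule public.orders -> public\.orders_\d{6}, an event for public.orders_202601 would be reported under public.orders, while tables that match no rule keep their own ID.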

Performance Improvements

Before (Problems)

  • Frequent schema refresh: Schema was refreshed on every table access
  • Full schema load: Even when requesting single table schema, all tables were loaded
  • Excessive CreateTableEvents: Hundreds of partition tables generated massive CreateTableEvent storms
  • High memory/DB pressure: Loading schema for each partition table caused memory bloat and database load
  • Checkpoint timeout risk: Schema loading took long enough that it could easily trigger Flink checkpoint timeouts

After (Optimizations)

  • Lazy schema loading: Schema loaded only when needed
  • Single table fetch: readSchema() now fetches only the requested table
  • Event consolidation: Child partition events routed to parent table, eliminating duplicate CreateTableEvents
  • Reduced DB queries: Parent table schema cached and reused for all child partitions
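
As a rough illustration of the lazy loading and parent-schema caching listed above, the sketch below loads a schema only on first request and reuses it afterwards. The class LazySchemaCacheSketch and its loader function are assumptions made for this example; the PR's readSchema() implementation may be structured differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of a per-parent schema cache; not the connector's actual code. */
final class LazySchemaCacheSketch<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> loader; // stands in for a single-table schema query
    private final AtomicInteger loadCount = new AtomicInteger();

    LazySchemaCacheSketch(Function<String, S> loader) {
        this.loader = loader;
    }

    /** Loads the schema on first access only; later calls reuse the cached value. */
    S schemaFor(String parentTableId) {
        return cache.computeIfAbsent(parentTableId, id -> {
            loadCount.incrementAndGet();
            return loader.apply(id);
        });
    }

    /** Number of times the loader was actually invoked. */
    int loadCount() {
        return loadCount.get();
    }
}
```

If every child partition event is first routed to its parent ID, hundreds of partitions share one cache entry, so the loader runs once per parent rather than once per partition.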

Configuration

Flink SQL:

CREATE TABLE pg_source (...) WITH (
  'connector' = 'postgres-cdc',
  'partition.tables' = 'public.orders:public\.orders_\d{6}'
);

Pipeline YAML:

source:
  type: postgres
  partition.tables: 'public.orders:public\.orders_\d{6}'

Format: parent:child_regex (comma-separated for multiple entries)
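
A minimal sketch of parsing this format follows. It assumes entries are split naively on commas and that the first colon separates parent from regex, so a child regex that itself contains a comma (for example a {1,3} quantifier) would not survive this simplified version. PartitionRulesParserSketch is a hypothetical name, not the PR's PostgresPartitionRules class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical parser for "parent:child_regex[,parent:child_regex...]". */
final class PartitionRulesParserSketch {
    static Map<String, Pattern> parse(String value) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return rules; // option not set: no routing rules
        }
        // Assumption: a plain comma split; regexes containing commas are not supported here.
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            int colon = trimmed.indexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                        "Expected parent:child_regex but got: " + trimmed);
            }
            String parent = trimmed.substring(0, colon).trim();
            String childRegex = trimmed.substring(colon + 1).trim();
            rules.put(parent, Pattern.compile(childRegex));
        }
        return rules;
    }
}
```

Splitting on the first colon keeps the parent ID simple while leaving the rest of the entry to the regex; an entry with no colon, or with nothing on one side of it, is rejected early instead of being silently ignored.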

Test Plan

  • PostgresPartitionRulesTest - Rule parsing tests
  • PostgresPartitionRouterTest - Routing logic tests
  • PostgresPartitionConnectorConfigTest - Config and filter tests
  • PostgreSQLPartitionTablesConfigITCase - End-to-end integration test

@tchivs tchivs changed the title from "[FLINK-XXXXX][postgres] Support partition table routing for PostgreSQL CDC" to "[FLINK-38881][postgres] Support partition table routing for PostgreSQL CDC" on Jan 9, 2026
@tchivs tchivs force-pushed the perf/postgres branch 4 times, most recently from a72da61 to 10e19a0, on January 16, 2026 02:21
…L CDC

This PR adds partition.tables configuration to support PostgreSQL 10+
partition table routing, significantly improving performance for databases
with many partitions.

New Features:
- Child-to-parent routing for partition tables
- Pattern-based matching for child partitions
- Primary key inheritance from representative child

Performance Improvements:
- Fix frequent schema refresh on every table access
- Fix full schema load when requesting single table
- Consolidate child partition events to parent table
- Cache parent table schema to reduce DB queries