[FLINK-38742][cdc-pipeline/postgres] Fix timestamp with time zone #4181
base: master
@Fluder-Paradyne Thanks for your contribution. cc @lvyanquan Do you agree with me? |
Can you describe in detail how I can reproduce this problem? |
|
This table works but this one doesn't |
|
column_type_test#time_types has a 'TIMESTAMP WITH TIME ZONE' unit test, but I have a problem with the e2e test; let me check again. |
loserwang1024
left a comment
I have left some minor comments.
...in/java/org/apache/flink/cdc/connectors/postgres/source/PostgresSchemaDataTypeInference.java
Outdated
...um/src/main/java/org/apache/flink/cdc/debezium/event/DebeziumEventDeserializationSchema.java
Outdated
…TIMESTAMP_LTZ

PostgreSQL TIMESTAMPTZ columns were causing NumberFormatException because PostgresTypeUtils incorrectly mapped them to ZonedTimestampType (TIMESTAMP_WITH_TIME_ZONE). PostgreSQL TIMESTAMPTZ stores values internally in UTC and converts on display based on session timezone, which semantically matches TIMESTAMP_LTZ, not TIMESTAMP_WITH_TIME_ZONE.

Additionally, Debezium's PostgreSQL connector always converts TIMESTAMPTZ values to UTC format (e.g., '2025-12-30T05:59:50.724893Z') before serialization. The type mismatch caused sinks to call getZonedTimestamp() on LocalZonedTimestampData, resulting in NumberFormatException when trying to parse binary data as a string.

This commit fixes PostgresTypeUtils to correctly map TIMESTAMPTZ and TIMESTAMPTZ_ARRAY to TIMESTAMP_LTZ type, matching both PostgreSQL semantics and Debezium's serialization format. The E2E test is updated to verify the fix.
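To make the serialization format concrete, here is a minimal sketch of how a UTC string like the one quoted above can be turned into an instant and then split into epoch milliseconds plus the remaining nanoseconds. It uses only `java.time`; the class and method names are hypothetical and this is not the connector's actual deserialization code.

```java
import java.time.Instant;

// Hypothetical helper, for illustration only; not part of Flink CDC or Debezium.
public class TimestamptzParseSketch {

    /** Parses a UTC ISO-8601 string (trailing 'Z') into an Instant. */
    public static Instant parseUtc(String utcValue) {
        return Instant.parse(utcValue);
    }

    /** Returns the nanoseconds left over after whole milliseconds are removed. */
    public static int nanoOfMilli(Instant instant) {
        return instant.getNano() % 1_000_000;
    }

    public static void main(String[] args) {
        // Sample value taken from the commit message above.
        Instant parsed = parseUtc("2025-12-30T05:59:50.724893Z");
        System.out.println("epoch millis:  " + parsed.toEpochMilli());
        System.out.println("nano of milli: " + nanoOfMilli(parsed));
    }
}
```

An instant carries no zone of its own, which is the property that a local-zoned timestamp type relies on.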
Hey @Mrart, thanks for the review. I dug around a bit, and yes, TIMESTAMP_LTZ is more reasonable as it is close to Postgres semantics and sinks also support it. |
|
@Fluder-Paradyne @Mrart, I have seen your reproduction steps.
I don't think so. TIMESTAMP_LTZ will replace the time zone of the source with the local time zone of Flink CDC. Some sinks (such as Iceberg and Kafka) want to preserve the time zone from the source. If the data from the source is 08:00+8 while the time zone of Flink CDC is UTC+0, the sink will write 00:00+0. (If it's written into Postgres, that's fine, since it will be stored as the same long value. However, if it's written into Kafka, it will be stored as a string.) @lvyanquan, WDYT? |
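The concern can be sketched with plain `java.time`: the epoch value is identical regardless of zone, but the rendered string differs. The date `2025-01-01` and the class and method names are assumptions made for illustration; only the `08:00+8` and `00:00+0` times come from the comment.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical illustration; does not use any Flink CDC or sink API.
public class ZoneLossSketch {

    /** Renders an instant in the given offset, as a string-based sink might. */
    public static String render(Instant instant, ZoneOffset offset) {
        return instant.atOffset(offset).toString();
    }

    public static void main(String[] args) {
        // Source value at 08:00 in UTC+8 (the date is an assumed example).
        Instant instant = OffsetDateTime.parse("2025-01-01T08:00:00+08:00").toInstant();

        // A long-based sink stores the same number either way.
        System.out.println(instant.toEpochMilli());

        // A string-based sink sees different text depending on the zone applied.
        System.out.println(render(instant, ZoneOffset.UTC));        // 2025-01-01T00:00Z
        System.out.println(render(instant, ZoneOffset.ofHours(8))); // 2025-01-01T08:00+08:00
    }
}
```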
|
@loserwang1024 for a PG TIMESTAMPTZ column, the moment you insert/update, the value is converted to UTC and stored as UTC time, and then when you select it, it gets converted based on the time zone of the Postgres session. Since the source itself does not have the original timezone info, shouldn't TIMESTAMP_LTZ be fine? |
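A rough model of that behavior, assuming storage keeps only the UTC instant: two literals written with different offsets collapse to the same stored value, so the original offset cannot be recovered downstream. The literals and names below are hypothetical, and `java.time` stands in for Postgres here.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

// Hypothetical model of "normalize to UTC on write"; not Postgres or Debezium code.
public class TimestamptzNormalizationSketch {

    /** Models an insert: the offset is used to compute the instant, then discarded. */
    public static Instant store(String literalWithOffset) {
        return OffsetDateTime.parse(literalWithOffset).toInstant();
    }

    public static void main(String[] args) {
        Instant fromPlus8 = store("2025-01-01T08:00:00+08:00");
        Instant fromPlus1 = store("2025-01-01T01:00:00+01:00");

        // Both writes denote the same moment, so the stored values are indistinguishable.
        System.out.println(fromPlus8.equals(fromPlus1)); // true
    }
}
```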
I agree with @loserwang1024's opinion. We can configure this information for the source through server-time-zone; refer to Line 67 in 533072b
It seems that we are missing this section in the documentation (maybe you can add it together). |
|
Hey guys, I am a bit confused. This is my current understanding of how TIMESTAMPTZ is treated: TIMESTAMPTZ in PG --> DEBEZIUM. Since Debezium also gives only a UTC-converted string, I don't think there is a way to retain timezone information to the sink other than UTC. https://www.postgresql.org/docs/14/datatype-datetime.html
|
Pull request overview
This PR fixes an issue where Flink CDC pipeline failed when processing PostgreSQL tables with TIMESTAMPTZ (timestamp with time zone) columns. The fix involves changing the type mapping from ZonedTimestampType to TIMESTAMP_LTZ (LocalZonedTimestampType), which correctly represents PostgreSQL's TIMESTAMPTZ semantics.
Changes:
- Updated type mapping for TIMESTAMPTZ columns in PostgreSQL connector
- Added comprehensive E2E test coverage with TIMESTAMPTZ column
- Updated test expectations to validate the fix
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| PostgresTypeUtils.java | Changed TIMESTAMPTZ mapping from ZonedTimestampType to TIMESTAMP_LTZ for both scalar and array types, removed unused import |
| postgres_inventory.sql | Added created_at TIMESTAMPTZ column to products table with test data |
| PostgresE2eITCase.java | Updated test expectations to include new TIMESTAMPTZ column in schema and data change events |
In Postgres, when a table has a timestamp with time zone column, the CDC pipeline fails.