[FLINK-38889][pipeline][kafka] Support serializing complex types (MAP, ARRAY, ROW) to JSON (Debezium / Canal) #4221
Conversation
yuxiqian left a comment
Thanks Skyler for the quick fix! Just left some trivial comments.
Resolved review threads on:
.../java/org/apache/flink/cdc/connectors/kafka/json/canal/CanalJsonSerializationSchemaTest.java
...afka/src/test/java/org/apache/flink/cdc/connectors/kafka/json/ComplexTypesEdgeCasesTest.java
...onnector-kafka/src/main/java/org/apache/flink/cdc/connectors/kafka/json/TableSchemaInfo.java
Force-pushed "… ARRAY, ROW) to JSON (Debezium / Canal)" from 701a1c7 to 8ab67e5
Thanks @yuxiqian for the review! I have made the changes as suggested. Since the suggestions focused on code formatting, I force-pushed the code to make it clearer. PTAL.

Thanks for the quick response! Just pushed another commit to simplify the IT case and docs style. Would @lvyanquan like to take another look?
Pull request overview
This PR adds support for serializing complex types (MAP, ARRAY, ROW) to JSON format in the Kafka sink connector for both Debezium and Canal JSON formats. Previously, only the Kafka SQL connector supported these complex types, while the YAML-configured Kafka sink connector would fail when encountering them.
Changes:
- Refactored type conversion logic from TableSchemaInfo into a new RecordDataConverter utility class
- Added conversion support for ARRAY, MAP, and ROW types, with recursive handling for nested structures (the recursion pattern is sketched after this list)
- Added comprehensive test coverage including unit tests and integration tests for various complex type scenarios
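To make the recursive handling concrete, here is a minimal, self-contained sketch of the dispatch-and-recurse pattern described above. All names in it (TypeRoot, TypeInfo, convert) are simplified illustrations, not the actual RecordDataConverter API, which operates on Flink CDC's RecordData and DataType classes and produces Flink's RowData equivalents.

```java
// Simplified stand-in types; the real converter works on Flink CDC's
// RecordData / DataType, not plain Java collections.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecursiveConvertSketch {

    enum TypeRoot { PRIMITIVE, ARRAY, MAP, ROW }

    /** ARRAY has one child (element), MAP has two (key, value), ROW has one per field. */
    record TypeInfo(TypeRoot root, List<TypeInfo> children) {
        static TypeInfo primitive() { return new TypeInfo(TypeRoot.PRIMITIVE, List.of()); }
    }

    /** Converts a value according to its type; nested ARRAY/MAP/ROW recurse. */
    @SuppressWarnings("unchecked")
    static Object convert(Object value, TypeInfo type) {
        if (value == null) {
            return null; // nulls propagate unchanged at every nesting level
        }
        switch (type.root()) {
            case ARRAY: {
                List<Object> out = new ArrayList<>();
                for (Object element : (List<Object>) value) {
                    out.add(convert(element, type.children().get(0)));
                }
                return out;
            }
            case MAP: {
                Map<Object, Object> out = new HashMap<>();
                ((Map<Object, Object>) value).forEach((k, v) ->
                        out.put(convert(k, type.children().get(0)),
                                convert(v, type.children().get(1))));
                return out;
            }
            case ROW: {
                List<Object> fields = (List<Object>) value;
                List<Object> out = new ArrayList<>();
                for (int i = 0; i < fields.size(); i++) {
                    out.add(convert(fields.get(i), type.children().get(i)));
                }
                return out;
            }
            default:
                return value; // primitives map 1:1 (e.g. String -> StringData in the real code)
        }
    }

    public static void main(String[] args) {
        // ARRAY<ROW<INT, STRING>> -- the nested case the new tests exercise
        TypeInfo row = new TypeInfo(TypeRoot.ROW,
                List.of(TypeInfo.primitive(), TypeInfo.primitive()));
        TypeInfo arrayOfRow = new TypeInfo(TypeRoot.ARRAY, List.of(row));
        System.out.println(convert(List.of(List.of(1, "a"), List.of(2, "b")), arrayOfRow));
        // prints: [[1, a], [2, b]]
    }
}
```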
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Summary per file:
| File | Description |
|---|---|
| RecordDataConverter.java | New utility class that handles conversion of CDC RecordData to Flink SQL RowData, including support for complex types (ARRAY, MAP, ROW) with recursive nesting |
| TableSchemaInfo.java | Refactored to delegate field getter creation to RecordDataConverter, removing duplicate conversion logic |
| TableSchemaInfoTest.java | Added test for nested ROW types within ARRAY to verify complex type conversion |
| DebeziumJsonSerializationSchemaTest.java | Added test to verify Debezium JSON serialization of complex types |
| CanalJsonSerializationSchemaTest.java | Added test to verify Canal JSON serialization of complex types |
| KafkaDataSinkITCase.java | Added comprehensive integration tests covering basic complex types, nested arrays, maps with array values, null/empty collections, and deeply nested structures |
Resolved review threads on ...afka/src/main/java/org/apache/flink/cdc/connectors/kafka/json/utils/RecordDataConverter.java
Hi @lvyanquan, I have modified the code as requested by Copilot. Please take another look if you have time.
This closes FLINK-38889.
Purpose
This PR fixes the issue where the YAML-configured Kafka sink connector does not support serializing complex types (MAP, ARRAY, ROW) to JSON format (Debezium / Canal), while the Kafka SQL connector handles them without problem.
Root Cause
The issue was in the TableSchemaInfo class, which is responsible for converting CDC's RecordData format to Flink's RowData format before JSON serialization. The createFieldGetter() method lacked the necessary conversion logic for complex types.
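The shape of the fix can be sketched abstractly. The snippet below is a hypothetical simplification (CdcRecord and the String type key are stand-ins, not Flink CDC APIs): the missing piece was a branch for complex type roots that delegates to a recursive converter instead of passing values through unconverted.

```java
// Hypothetical, simplified shapes; the real code dispatches on Flink CDC's
// DataType inside TableSchemaInfo/RecordDataConverter, not on a String key.
import java.util.List;
import java.util.function.Function;

public class FieldGetterSketch {

    /** Stand-in for a CDC record with positional field access. */
    interface CdcRecord {
        Object getField(int pos);
    }

    /**
     * Returns a getter that extracts field {@code pos} and converts it for the sink.
     * Before this PR, no branch handled complex types, so ARRAY/MAP/ROW columns failed;
     * the fix delegates them to the recursive converter.
     */
    static Function<CdcRecord, Object> createFieldGetter(String typeRoot, int pos) {
        switch (typeRoot) {
            case "ARRAY":
            case "MAP":
            case "ROW":
                return record -> convertComplex(record.getField(pos), typeRoot);
            default:
                return record -> record.getField(pos); // primitives pass straight through
        }
    }

    /** Placeholder for the recursive conversion sketched earlier in this thread. */
    static Object convertComplex(Object value, String typeRoot) {
        return value;
    }

    public static void main(String[] args) {
        CdcRecord record = pos -> List.of(1, 2, 3); // every field is an ARRAY for the demo
        System.out.println(createFieldGetter("ARRAY", 0).apply(record)); // [1, 2, 3]
    }
}
```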
Changes
- Extracted the RecordData-to-RowData conversion logic from TableSchemaInfo into a new RecordDataConverter utility class.
- Added recursive conversion support for ARRAY, MAP, and ROW types, including nested structures.
Testing
- Unit tests in TableSchemaInfoTest, DebeziumJsonSerializationSchemaTest, and CanalJsonSerializationSchemaTest covering complex-type conversion and serialization.
- Integration tests in KafkaDataSinkITCase covering basic complex types, nested arrays, maps with array values, null/empty collections, and deeply nested structures.