Description
Motivation
- Tests source parsing and reading.
- We have recently encountered various bugs in the source parser; this framework will help us catch such bugs early.
Basic Requirements
- Able to test basic source read of various row formats.
- Able to test basic source parse of various row formats.
(Thanks @waruto210 for clarifying this with me.)
Full Requirements
- Able to generate complex nested schema.
- Able to configure nesting depth.
- Able to generate mix of data types, including array.
- Main use is to test for correctness, but perhaps it can be used for performance testing in the future?
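The requirements above could be approached with a small recursive generator. The following is only a sketch under assumptions: `gen_schema`, `gen_value`, `depth`, the `SCALAR_TYPES` list, and the dictionary-based schema representation are all hypothetical names chosen for illustration, not part of the existing codebase. It shows one way to make nesting depth configurable and to mix scalar, array, and struct types.

```python
import random

# Hypothetical scalar types; a real framework would use the system's own types.
SCALAR_TYPES = ["int32", "int64", "float64", "bool", "string"]


def gen_schema(rng, max_depth, max_fields=3):
    """Return a random schema tree nested at most `max_depth` levels deep."""
    if max_depth <= 0:
        return {"kind": "scalar", "type": rng.choice(SCALAR_TYPES)}
    kind = rng.choice(["scalar", "array", "struct"])
    if kind == "scalar":
        return {"kind": "scalar", "type": rng.choice(SCALAR_TYPES)}
    if kind == "array":
        return {"kind": "array", "item": gen_schema(rng, max_depth - 1, max_fields)}
    n_fields = rng.randint(1, max_fields)
    fields = {f"f{i}": gen_schema(rng, max_depth - 1, max_fields) for i in range(n_fields)}
    return {"kind": "struct", "fields": fields}


def gen_value(rng, schema):
    """Generate one random value that conforms to `schema`."""
    kind = schema["kind"]
    if kind == "struct":
        return {name: gen_value(rng, sub) for name, sub in schema["fields"].items()}
    if kind == "array":
        return [gen_value(rng, schema["item"]) for _ in range(rng.randint(0, 3))]
    scalar_type = schema["type"]
    if scalar_type in ("int32", "int64"):
        return rng.randint(-1000, 1000)
    if scalar_type == "float64":
        return rng.random()
    if scalar_type == "bool":
        return rng.random() < 0.5
    return f"s{rng.randint(0, 999)}"


def depth(schema):
    """Count nesting levels; a bare scalar has depth 0."""
    if schema["kind"] == "scalar":
        return 0
    if schema["kind"] == "array":
        return 1 + depth(schema["item"])
    return 1 + max(depth(sub) for sub in schema["fields"].values())


# Seeding the generator keeps a failing case reproducible.
schema = gen_schema(random.Random(0), max_depth=3)
row = gen_value(random.Random(0), schema)
```

Passing an explicit `random.Random` instance rather than using the global generator is a deliberate choice here: a failing test can then be replayed from its seed.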
Background
Originally posted in #5164 by @neverchanje:
Source reading/parsing
The parsing part will be more deterministic than the rest. We need to generate random data in a specific format, with some probability of generating invalid data (which the parser is expected to drop), and verify the correctness of the parsed output.
- MockSource x Protobuf
- MockSource x JSON (test: a test framework for source parsing #5512)
- MockSource x Debezium JSON
- MockSource x Avro
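As an illustration of the idea for the JSON case, here is a minimal sketch. It assumes a hypothetical `mock_json_source` generator and a stand-in `parse_rows` function in place of the real parser; the row fields and the way a payload is corrupted (truncating the closing brace) are also assumptions. The test generates rows, corrupts some with a given probability, and checks that exactly the valid rows survive parsing.

```python
import json
import random


def mock_json_source(rng, n_rows, bad_ratio):
    """Yield (payload, expected_row); expected_row is None for corrupted payloads."""
    for i in range(n_rows):
        row = {"id": i, "name": f"user_{rng.randint(0, 99)}", "score": rng.randint(0, 100)}
        payload = json.dumps(row)
        if rng.random() < bad_ratio:
            # Drop the closing brace so the payload is no longer valid JSON.
            yield payload[:-1], None
        else:
            yield payload, row


def parse_rows(payloads):
    """Stand-in for the parser under test: keep rows that decode, drop the rest."""
    parsed = []
    for payload in payloads:
        try:
            parsed.append(json.loads(payload))
        except json.JSONDecodeError:
            continue  # Expected behavior for invalid data: drop it.
    return parsed


# Generate 100 rows, roughly 20% of them invalid, and compare against expectations.
samples = list(mock_json_source(random.Random(42), n_rows=100, bad_ratio=0.2))
expected = [row for _, row in samples if row is not None]
assert parse_rows(payload for payload, _ in samples) == expected
```

Because the generator records the expected row alongside each payload, the check does not depend on how many rows happened to be corrupted in a given run.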
Offline discussion with @neverchanje:
The goal is to stabilize parsing of protobuf and avro, which typically have complex nested schemas. For testing, we need to ensure that a protobuf file with multiple nested levels and a mix of various data types (including arrays) can be parsed correctly.