forked from greenplum-db/pxf-archive
[ADBDEV-5595] - Add support for new types in PXF #100
Merged
- Refactor read and write parquet process
- Support time, decimal, uuid, interval and their respective array types
- Support list of lists for reading
- Fix integration tests
- Add filtering
RomaZe
reviewed
Jun 13, 2024
server/pxf-api/src/main/java/org/greenplum/pxf/api/filter/SupportedOperatorPruner.java
Outdated
server/pxf-api/src/main/java/org/greenplum/pxf/api/GreenplumDateTime.java
server/pxf-s3/src/main/java/org/greenplum/pxf/plugins/s3/S3SelectQueryBuilder.java
- Fix review comments
RomaZe
previously approved these changes
Jun 14, 2024
- Add tests - Add binary various parsing - Add bson to dependencies
Refactor ListConstToStr function

The original implementation of list_const_to_str (ListConstToStr) extracted values from array constants and formatted them into a string buffer, supporting int2[], int4[], int8[], and text[]. It checked for null constants, logged exceptions, extracted the array, and then ran a separate code block per supported type to deconstruct the array, convert each element to a string, and append the formatted data to the buffer. This duplicated logic across types and made the function hard to read and maintain.

The refactored function retrieves type information and deconstructs arrays through a single, generalized procedure. getTypeOutputInfo looks up the type's output function in the catalog, and OidOutputFunctionCall applies it to each element, so conversion is consistent across all data types. Element handling is centralized and the switch statement is reduced, so adding a new array type now requires minimal changes.
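The consolidation described in this commit message can be sketched outside PostgreSQL as follows. This is a minimal illustration, not the PXF code: DemoType, DemoDatum, demo_output_fn and demo_list_to_str are hypothetical stand-ins for the type OID, Datum, the getTypeOutputInfo lookup and the refactored function, and the comma-separated output is an arbitrary choice for the sketch rather than the real filter encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a type OID and a Datum; not PostgreSQL headers. */
typedef enum { DEMO_INT4, DEMO_BOOL, DEMO_TEXT } DemoType;

typedef union {
    int         i;
    bool        b;
    const char *s;
} DemoDatum;

/* An "output function" renders one element and returns the snprintf result. */
typedef int (*DemoOutFn)(DemoDatum value, char *buf, size_t len);

static int out_int4(DemoDatum v, char *buf, size_t len) { return snprintf(buf, len, "%d", v.i); }
static int out_bool(DemoDatum v, char *buf, size_t len) { return snprintf(buf, len, "%s", v.b ? "true" : "false"); }
static int out_text(DemoDatum v, char *buf, size_t len) { return snprintf(buf, len, "%s", v.s); }

/* Plays the role of the catalog lookup: map a type to its output function. */
static DemoOutFn demo_output_fn(DemoType type)
{
    switch (type) {
        case DEMO_INT4: return out_int4;
        case DEMO_BOOL: return out_bool;
        case DEMO_TEXT: return out_text;
    }
    return NULL;
}

/*
 * One loop serves every element type; supporting a new type only means
 * registering its output function. Returns 0 on success, -1 if the type
 * is unknown or the buffer is too small.
 */
int demo_list_to_str(DemoType type, const DemoDatum *elems, size_t n,
                     char *buf, size_t len)
{
    DemoOutFn out = demo_output_fn(type);
    size_t    used = 0;

    assert(buf != NULL);
    if (out == NULL || len == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            if (used + 1 >= len)
                return -1;
            buf[used++] = ',';
            buf[used] = '\0';
        }
        int w = out(elems[i], buf + used, len - used);
        if (w < 0 || (size_t) w >= len - used)
            return -1;              /* output was truncated */
        used += (size_t) w;
    }
    return 0;
}
```

Calling demo_list_to_str(DEMO_INT4, ...) on the three elements 1, 2, 3 fills the buffer with "1,2,3"; the same call path handles bool and text elements without any per-type branch in the loop.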
- Fix handling of arrays. Currently they are not supported at the Parquet library level: a predicate such as <array-column> = array[val1] matches 0 rows.
iamlapa
reviewed
Jun 28, 2024
...f-hdfs/src/main/java/org/greenplum/pxf/plugins/hdfs/parquet/ParquetTypeConverterFactory.java
Outdated
.../main/java/org/greenplum/pxf/plugins/hdfs/parquet/converters/BinaryParquetTypeConverter.java
Outdated
- Fix review comments
iamlapa
previously approved these changes
Jul 3, 2024
RomaZe
reviewed
Jul 10, 2024
.../pxf-hdfs/src/main/java/org/greenplum/pxf/plugins/hdfs/parquet/ParquetIntervalUtilities.java
server/pxf-api/src/main/java/org/greenplum/pxf/api/filter/SupportedOperatorPruner.java
...xf-hdfs/src/main/java/org/greenplum/pxf/plugins/hdfs/parquet/ParquetRecordFilterBuilder.java
Outdated
- Optimize imports
RomaZe
previously approved these changes
Jul 12, 2024
# Conflicts: # automation/arenadata/Dockerfile
- Fix bug with time without annotation
- Fix bug with json array
- Fix unit tests
Extend the list of types for predicate pushdown

The recent changes to pxffilters.c in the external-table codebase and to pxf_filter.c in the fdw codebase improve filter handling, mainly by extending the range of supported data types. Here is a summary of the key updates:

Array comparison support in the filters: The operator map pxf_supported_opr_op_expr now includes generic array comparison operators such as ARRAY_EQ_OP, which extends the filter capabilities for array types. The <>, IN and IS NULL operators are supported. If the array's elements do not support any scalar operator, an error is thrown up front to spare PXF unnecessary work (otherwise GPDB would reject the result and throw an error only after PXF had processed and returned the data). A new OID list pxf_supported_array_types[] and a new function supported_array_type(Oid type) check whether a given array type is supported for filtering. They are used to process arrays as list constants inside opexpr_to_pxffilter() (OpExprToPxfFilter()). pxf_serialize_filter_list() (PxfSerializeFilterList()) is updated to handle the case where one filter operand is an attribute and the other is a list constant.

List constant handling: list_const_to_str() (ListConstToStr()) now takes an additional bool parameter, with_nulls, that controls how NULL values in array constants are handled. For IN operators (scalar_array_op_expr_to_pxffilter()), a NULL inside the enumeration is meaningless (a comparison with NULL cannot be performed), so list_const_to_str() is called with with_nulls set to false. For array comparison (such as a = ARRAY[TRUE, NULL]), list_const_to_str() is called with with_nulls set to true.

BOOLARRAY type introduction: The OID BOOLARRAYOID is defined as a new macro wrapped in #ifndef, because GPDB 6 does not have a definition for this type. The patch adds a BooleanEqualOperator entry to pxf_supported_opr_scalar_array_op_expr to handle the IN operator for the BOOL type. list_const_to_str() (ListConstToStr()) also uses predefined string constants to encode bool values in the final filter representation.

Other types are handled by extending the pxf_supported_opr_op_expr[], pxf_supported_types[], pxf_supported_array_types[] and pxf_supported_opr_scalar_array_op_expr[] lists and the corresponding switch statements inside scalar_const_to_str() and list_const_to_str(). The following data types have been added, or extended with array types:

- BYTEA, BYTEAARRAY
- FLOAT4, FLOAT8, FLOAT4ARRAY, FLOAT8ARRAY
- BPCHARARRAY
- VARCHARARRAYOID (this required modifying scalar_array_op_expr_to_pxffilter() because RelabelType occurs in queries like SELECT * from test_varchar where t in ('aaa'::varchar(10), 'bbb'::varchar(10));)
- DATEARRAY
- TIME, TIMEARRAY
- TIMESTAMPARRAY
- TIMESTAMPTZ, TIMESTAMPTZARRAY
- INTERVAL, INTERVALARRAYOID
- NUMERICARRAY
- UUID, UUIDARRAY
- JSONB, JSONBARRAY
- JSON, JSONARRAY (scalar operators and IN are not supported)

To run the regression tests, UserDataVerifyAccessor.java has been extended to generate more fields of the newly supported types. The checkFilterPushdown automation tests were modified and extended to cover most of the newly added types.
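The two mechanisms this commit message leans on, the supported-array-type check and the with_nulls switch, might look roughly like the sketch below. It is an illustration under assumptions, not the code from pxffilters.c: every demo_ name is hypothetical, elements are modelled as already-rendered strings with a NULL pointer standing in for SQL NULL, and the "NULL" token and comma separator are placeholders for whatever the real filter encoding uses. The three OIDs in the table are the pg_type values for _bool, _int4 and _text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int DemoOid;

/* pg_type OIDs of _bool, _int4 and _text; the real list is much longer. */
static const DemoOid demo_supported_array_types[] = { 1000, 1007, 1009 };

/* Linear scan over the OID list, in the spirit of supported_array_type(). */
bool demo_supported_array_type(DemoOid type)
{
    size_t n = sizeof(demo_supported_array_types) /
               sizeof(demo_supported_array_types[0]);

    for (size_t i = 0; i < n; i++)
        if (demo_supported_array_types[i] == type)
            return true;
    return false;
}

/* Placeholder for the string that encodes a NULL element (assumption). */
#define DEMO_NULL_TOKEN "NULL"

/*
 * Format a list constant. A NULL pointer in elems models a SQL NULL.
 * with_nulls = false: NULLs are dropped (IN lists, where they never match).
 * with_nulls = true:  NULLs are kept (array comparison such as a = ARRAY[...]).
 * Returns 0 on success, -1 if the buffer is too small.
 */
int demo_list_const_to_str(const char *const *elems, size_t n, bool with_nulls,
                           char *buf, size_t len)
{
    size_t used = 0;
    bool   first = true;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        const char *text = elems[i];

        if (text == NULL) {
            if (!with_nulls)
                continue;
            text = DEMO_NULL_TOKEN;
        }
        int w = snprintf(buf + used, len - used, "%s%s", first ? "" : ",", text);
        if (w < 0 || (size_t) w >= len - used)
            return -1;              /* output was truncated */
        used += (size_t) w;
        first = false;
    }
    return 0;
}
```

For the elements "true", NULL, "false", the IN-style call (with_nulls = false) yields "true,false", while the array-comparison call (with_nulls = true) yields "true,NULL,false".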
GPDB 7 does not have macros for Float84EqualOperator and UuidEqualOperator. Patch 58c933c did not take this into account, which led to compilation errors on version 7. This patch replaces the macros with literal OID values to fix compilation of the extension.
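A rough sketch of the two portability techniques mentioned in these commit messages: guarding a macro that one server version lacks, and using a literal OID where no macro exists. The demo_ names and labels are hypothetical. The operator OIDs 1130 (float8 = float4) and 2972 (uuid = uuid) are what pg_operator is believed to contain and should be verified against the target catalog before being relied on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int DemoOid;

/* Define the macro only when the server headers do not already provide it. */
#ifndef BOOLARRAYOID
#define BOOLARRAYOID 1000           /* _bool in pg_type */
#endif

/* Literal operator OIDs used in place of version-specific macros.
 * Assumed values; check pg_operator on the server being targeted. */
#define DEMO_FLOAT84_EQ_OPR 1130    /* float8 = float4 */
#define DEMO_UUID_EQ_OPR    2972    /* uuid = uuid */

typedef struct {
    DemoOid     opr;
    const char *label;
} DemoOperatorEntry;

/* Hypothetical operator map; it compiles without any per-version macro. */
static const DemoOperatorEntry demo_supported_operators[] = {
    { DEMO_FLOAT84_EQ_OPR, "float8 = float4" },
    { DEMO_UUID_EQ_OPR,    "uuid = uuid" },
};

/* Returns true and sets *label when the operator is in the map. */
bool demo_lookup_operator(DemoOid opr, const char **label)
{
    size_t n = sizeof(demo_supported_operators) /
               sizeof(demo_supported_operators[0]);

    assert(label != NULL);
    *label = NULL;
    for (size_t i = 0; i < n; i++) {
        if (demo_supported_operators[i].opr == opr) {
            *label = demo_supported_operators[i].label;
            return true;
        }
    }
    return false;
}
```

The literal values trade readability for portability: the table no longer depends on which macros a given GPDB release happens to export, at the cost of a comment being the only link back to the catalog entry.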
iamlapa
approved these changes
Jul 30, 2024