[ADBDEV-5595] - Add support for new types in PXF by xardazzzzzz · Pull Request #100 · arenadata/pxf · GitHub



Merged: 41 commits merged on Jul 30, 2024


xardazzzzzz
  • Refactor read and write parquet process
  • Support time, decimal, uuid, interval and their respective array types
  • Support list of lists for reading

xardazzzzzz requested a review from RomaZe June 10, 2024 11:09
RomaZe previously approved these changes Jun 14, 2024
- Add tests
- Add parsing of various binary values
- Add bson to dependencies
xardazzzzzz and others added 3 commits June 19, 2024 14:25
Refactor ListConstToStr function

The original implementation of the list_const_to_str (ListConstToStr) function
was intended to extract and format values from array constants into a string
buffer, supporting data types like int2[], int4[], int8[], and text[]. It
checked for null constants, logged exceptions, extracted the array, processed
each supported type with specific code blocks to deconstruct the array,
converted each element to a string, and appended the formatted data to a buffer.
However, the approach led to code duplication, maintenance challenges, and
reduced readability due to scattered, repetitive logic.

The refactored function consolidates the common logic into unified, reusable
steps. It retrieves type information and deconstructs arrays with a single
generalized procedure, eliminating the repetitive code. This generic array
processing relies on getTypeOutputInfo, which looks up the element type's
output function in the catalog, so that values of every data type are
converted consistently with OidOutputFunctionCall.

Additionally, the refactored function centralizes the handling of array
elements and reduces the complexity of the switch statement. This improves
readability and maintainability: adding a new array type now requires only
minimal changes, which makes the code less error-prone.
- Fix handling of arrays: they are currently not supported at the Parquet library level, so a filter such as <array-column> = array[val1] matches 0 rows
iamlapa previously approved these changes Jul 3, 2024
RomaZe previously approved these changes Jul 12, 2024
# Conflicts:
#	automation/arenadata/Dockerfile
iamlapa and others added 19 commits July 12, 2024 12:21
- Fix bug with time without annotation
Extend the list of types for predicate pushdown

The recent changes to pxffilters.c in the external-table codebase and to
pxf_filter.c in the fdw codebase introduce several enhancements and
adjustments that improve filter handling, in particular by extending the range
of supported data types.
Here is a summary of the key updates:

Array comparison support in the filters:
The operator map pxf_supported_opr_op_expr now includes generic array
comparison operators such as ARRAY_EQ_OP, extending filter support to array
types. The <>, IN, and IS NULL operators are also supported. If the array's
element type has no supported scalar operator, an error is thrown early to
avoid unnecessary work in PXF (otherwise GPDB would reject the result and
raise an error only after PXF had processed and returned the data).
A new OID list, pxf_supported_array_types[], and a new function,
supported_array_type(Oid type), check whether a given array type is supported
for filtering. The function is used in opexpr_to_pxffilter()
(OpExprToPxfFilter()) to process arrays as list constants.
pxf_serialize_filter_list() (PxfSerializeFilterList()) is updated to handle
the case where one filter operand is an attribute and the other is a list
constant.
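A minimal sketch of the membership check, assuming a plain linear scan over an OID list. The values are PostgreSQL's array type OIDs, but the list is only a subset chosen for illustration, not the actual contents of pxf_supported_array_types[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Illustrative subset of supported array types (PostgreSQL pg_type OIDs). */
static const Oid supported_array_types[] = {
    1000, /* _bool   */
    1005, /* _int2   */
    1007, /* _int4   */
    1009, /* _text   */
    1016, /* _int8   */
    1021, /* _float4 */
    1022, /* _float8 */
    2951, /* _uuid   */
};

/* Returns true if an array constant of this type may be pushed down as a list constant. */
static bool supported_array_type(Oid type)
{
    for (size_t i = 0; i < sizeof(supported_array_types) / sizeof(supported_array_types[0]); i++)
        if (supported_array_types[i] == type)
            return true;
    return false;
}
```

A linear scan is adequate for a list of a few dozen entries; a sorted list with bsearch() would be an alternative if the list grew large.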

List constant handling:
The function list_const_to_str() (ListConstToStr()) now takes an additional
bool parameter, with_nulls, which controls how NULL values in array constants
are handled. For IN operators (scalar_array_op_expr_to_pxffilter()), a NULL
inside the value list is meaningless, because a comparison with NULL never
evaluates to true, so list_const_to_str() is called with with_nulls set to
false. For array comparisons (like a = ARRAY[TRUE, NULL]), list_const_to_str()
is called with with_nulls set to true.
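The effect of with_nulls can be illustrated with a small, self-contained C function over int elements. The name int_list_to_str(), the NULL token, and the comma separator are assumptions for illustration; the real list_const_to_str() works on PostgreSQL Datum arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Serializes nullable ints. with_nulls == false drops NULL elements (IN lists);
 * with_nulls == true keeps them as the token "NULL" (array comparison).
 * Returns 0 on success, -1 if buf is too small. */
static int int_list_to_str(const int *values, const bool *isnull, size_t n,
                           bool with_nulls, char *buf, size_t buflen)
{
    size_t used = 0;
    bool first = true;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++)
    {
        char item[16];
        int len;

        if (isnull[i])
        {
            if (!with_nulls)
                continue;          /* IN: a NULL element can never produce a match */
            snprintf(item, sizeof(item), "NULL");
        }
        else
            snprintf(item, sizeof(item), "%d", values[i]);

        len = snprintf(buf + used, buflen - used, "%s%s", first ? "" : ",", item);
        if (len < 0 || (size_t) len >= buflen - used)
            return -1;
        used += (size_t) len;
        first = false;
    }
    return 0;
}
```

For the values {1, NULL, 3}, calling it with with_nulls set to false yields "1,3", while with_nulls set to true yields "1,NULL,3", so an array comparison keeps the NULL position intact.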

BOOLARRAY type introduction:
A new BOOLARRAYOID macro is defined inside an #ifndef guard, because GPDB 6
does not define it.
The patch adds a BooleanEqualOperator entry to
pxf_supported_opr_scalar_array_op_expr to handle the IN operator for the BOOL
type. list_const_to_str() (ListConstToStr()) also uses predefined string
constants to encode bool values in the final filter representation.
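The #ifndef guard and the fixed bool tokens can be sketched as follows. 1000 is the OID of PostgreSQL's bool array type (_bool); the FILTER_BOOL_TRUE and FILTER_BOOL_FALSE names and their "true"/"false" values are hypothetical, since the constants actually used by list_const_to_str() are not given here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Define the OID only when the server headers do not (e.g. GPDB 6);
 * where the headers already define it, this block is skipped. */
#ifndef BOOLARRAYOID
#define BOOLARRAYOID 1000
#endif

/* Hypothetical tokens for bool values in the serialized filter. */
#define FILTER_BOOL_TRUE  "true"
#define FILTER_BOOL_FALSE "false"

/* Maps a bool element to its fixed filter token. */
static const char *bool_to_filter_str(bool value)
{
    return value ? FILTER_BOOL_TRUE : FILTER_BOOL_FALSE;
}
```

Because the macro is only defined when missing, the same source builds against headers that already provide BOOLARRAYOID and against GPDB 6 headers that do not.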

Other types are handled by extending the pxf_supported_opr_op_expr[],
pxf_supported_types[], pxf_supported_array_types[], and
pxf_supported_opr_scalar_array_op_expr[] lists and the corresponding switch
statements inside scalar_const_to_str() and list_const_to_str().

The following data types have been added, or extended with support for their
array types:

BYTEA, BYTEAARRAY;
FLOAT4, FLOAT8, FLOAT4ARRAY, FLOAT8ARRAY;
BPCHARARRAY;
VARCHARARRAYOID (this required changing scalar_array_op_expr_to_pxffilter(),
because a RelabelType node occurs in queries like SELECT * from
test_varchar where t in ('aaa'::varchar(10), 'bbb'::varchar(10)));
DATEARRAY;
TIME, TIMEARRAY;
TIMESTAMPARRAY;
TIMESTAMPTZ, TIMESTAMPTZARRAY;
INTERVAL, INTERVALARRAYOID;
NUMERICARRAY;
UUID, UUIDARRAY;
JSONB, JSONBARRAY;
JSON, JSONARRAY (scalar operators and IN are not supported).
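Why a RelabelType matters can be shown with a hypothetical miniature of PostgreSQL's expression nodes; these structs are simplified stand-ins, not the real primnodes.h definitions. A binary-compatible cast, such as varchar to text, wraps an operand in a RelabelType node, so code that expects a bare column reference (Var) has to unwrap it first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified node model. */
typedef enum { T_Var, T_Const, T_RelabelType } NodeTag;

typedef struct Node { NodeTag type; } Node;
typedef struct { Node node; int varattno; } Var;      /* column reference */
typedef struct { Node node; Node *arg; } RelabelType; /* binary-compatible cast */

/* Peels off any RelabelType wrappers around an expression. */
static Node *strip_relabel(Node *expr)
{
    while (expr != NULL && expr->type == T_RelabelType)
        expr = ((RelabelType *) expr)->arg;
    return expr;
}

/* Returns the column number if expr is a (possibly relabeled) column, else -1. */
static int column_attno(Node *expr)
{
    expr = strip_relabel(expr);
    if (expr != NULL && expr->type == T_Var)
        return ((Var *) expr)->varattno;
    return -1;
}
```

Without the unwrapping step, a check such as "is this operand a Var?" would fail for the relabeled column and the IN clause would not be pushed down.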

To run the regression tests, UserDataVerifyAccessor.java has been extended to
generate fields of the newly supported types. The checkFilterPushdown
automation tests were modified and extended to cover most of the newly added
types.

GPDB 7 does not define the Float84EqualOperator and UuidEqualOperator macros.
Commit 58c933c did not take this into account, which caused compilation errors
on GPDB 7. This patch replaces the macros with literal OID values to fix the
extension build.
iamlapa merged commit 1743548 into pxf-6.x on Jul 30, 2024
1 check passed
iamlapa deleted the feature/ADBDEV-5595 branch July 30, 2024 07:19
4 participants