Parquet nullable strings converted to JSON · Issue #17126 · duckdb/duckdb · GitHub
Parquet nullable strings converted to JSON #17126
Open
@OneCyrus

Description

What happens?

With DuckDB 1.3.0 we run into issues when reading multiple Parquet files, because columns that contain only null values are automatically converted to JSON columns. Basically, some Parquet files end up with a JSON column where other files have a string column, simply because the column is empty in that file. When trying DuckDB 1.3.0 we get the following error:

the column "memberDn" has type VARCHAR, but we are trying to read it as type JSON.
  This can happen when reading multiple Parquet files. The schema information is taken from the first Parquet file by default.

In DuckDB 1.2.2 this results in a JSON column without an error.

To Reproduce

test1.json

[
    {
        "groupDn":  "TEST 1",
        "memberDn":  null
    },
    {
        "groupDn":  "TEST 2",
        "memberDn":  null
    },
    {
        "groupDn":  "TEST 3",
        "memberDn":  null
    }
]

test2.json

[
    {
        "groupDn": "TEST 1",
        "memberDn": "a"
    },
    {
        "groupDn": "TEST 2",
        "memberDn": "b"
    },
    {
        "groupDn": "TEST 3",
        "memberDn": "c"
    }
]
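
-- Convert both JSON files to Parquet.
copy (select * from 'test1.json') to 'test1.parquet';
copy (select * from 'test2.json') to 'test2.parquet';
-- Reading both files together raises the error in 1.3.0.
select * from read_parquet('test*.parquet');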
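
One possible workaround, not confirmed anywhere in this issue, is to cast all-null columns to VARCHAR before writing the Parquet files, so that every file carries the same column type. The sketch below uses only the Python standard library: it recreates the two JSON files from above in a temporary directory, finds the columns that are null in every row, and prints a COPY statement with an explicit cast for those columns. The helper names `all_null_columns` and `copy_statement` are hypothetical, and VARCHAR as the intended type for `memberDn` is an assumption.

```python
import json
import tempfile
from pathlib import Path

# Fixture data mirroring test1.json and test2.json from this report.
TEST1 = [{"groupDn": f"TEST {i}", "memberDn": None} for i in (1, 2, 3)]
TEST2 = [{"groupDn": f"TEST {i}", "memberDn": m} for i, m in zip((1, 2, 3), "abc")]


def all_null_columns(json_path):
    """Return the columns that are null (or missing) in every row of the file."""
    rows = json.loads(Path(json_path).read_text())
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)  # keep first-seen column order
    return [c for c in columns if all(row.get(c) is None for row in rows)]


def copy_statement(json_name, parquet_name, null_columns):
    """Build a DuckDB COPY statement that casts the given columns to VARCHAR.

    Assumption: VARCHAR is the intended type for every all-null column.
    """
    if not null_columns:
        return f"COPY (SELECT * FROM '{json_name}') TO '{parquet_name}';"
    casts = ", ".join(f'CAST("{c}" AS VARCHAR) AS "{c}"' for c in null_columns)
    return (
        f"COPY (SELECT * REPLACE ({casts}) FROM '{json_name}') "
        f"TO '{parquet_name}';"
    )


null_report = {}
statements = []
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in (("test1", TEST1), ("test2", TEST2)):
        path = Path(tmp) / f"{name}.json"
        path.write_text(json.dumps(rows))
        nulls = all_null_columns(path)
        null_report[path.name] = nulls
        statements.append(copy_statement(path.name, f"{name}.parquet", nulls))

for sql in statements:
    print(sql)
```

For the files in this report, only `memberDn` in test1.json is reported as all-null, so only the first COPY statement gets a cast. The printed statements would replace the two COPY statements above; whether the subsequent `read_parquet('test*.parquet')` then succeeds in 1.3.0 has not been verified here.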

OS:

Ubuntu 22.04

DuckDB Version:

1.3.0 previews from the last couple of weeks (latest as well)

DuckDB Client:

Python and CLI

Hardware:

No response

Full Name:

Daniel Gut

Affiliation:

Aveniq

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a nightly build

Did you include all relevant data sets for reproducing the issue?

No - I cannot share the data sets because they are confidential

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
