OOM when reading Parquet file · Issue #3969 · duckdb/duckdb · GitHub
OOM when reading Parquet file #3969
Closed
@alexey-milovidov

Description

What happens?

While loading the Parquet file, DuckDB uses all available memory and the process is terminated by the OOM killer.

To Reproduce

Allocate a machine with 32 GB RAM, like c6a.4xlarge on AWS, with Ubuntu 22.04.
SSH into that machine.
Run the following commands:

sudo apt-get update
sudo apt-get install python3-pip
pip install duckdb
wget 'https://datasets.clickhouse.com/hits_compatible/hits.parquet'

Create the following run.py file:

#!/usr/bin/env python3

import duckdb
import timeit

con = duckdb.connect(database='my-db.duckdb', read_only=False)

print("Will load the data")

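# timeit.default_timer() returns a timestamp; timeit.timeit() with no
# arguments would instead time a no-op statement, not the load below.
start = timeit.default_timer()
con.execute("CREATE TABLE hits AS SELECT * FROM parquet_scan('hits.parquet')")
end = timeit.default_timer()
print(end - start)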

Make it executable:

chmod +x run.py

Run it:

./run.py

Wait around 10 minutes...

Will load the data
Killed
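
To see how memory grows before the kernel steps in, it may help to record wall time and peak resident memory around a workload. The following is a minimal sketch using only the Python standard library; the helper name measure is hypothetical and not part of DuckDB, and it assumes Linux, where ru_maxrss is reported in kilobytes.

```python
import resource
import timeit


def measure(fn):
    """Run fn() once and return (elapsed_seconds, peak_rss_kib, result)."""
    start = timeit.default_timer()
    result = fn()
    elapsed = timeit.default_timer() - start
    # On Linux, ru_maxrss is the peak resident set size of this process
    # in kilobytes (other platforms may use different units).
    peak_rss_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return elapsed, peak_rss_kib, result


if __name__ == "__main__":
    # Stand-in workload; in run.py this would wrap the con.execute(...) call.
    elapsed, peak, _ = measure(lambda: sum(range(1_000_000)))
    print(f"elapsed: {elapsed:.3f} s, peak RSS: {peak} KiB")
```

Since the numbers are printed only if the process survives, this is mainly useful on a smaller input or on a machine with more RAM, where the load completes.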

Environment (please complete the following information):

  • OS: Ubuntu 22.04
  • DuckDB Version: 0.4.0
  • DuckDB Client: Python

Identity Disclosure:

  • Full Name: Alexey Milovidov
  • Affiliation: ClickHouse, Inc

Because of the OOM, DuckDB cannot qualify for the ClickHouse benchmark.

Before Submitting

  • Have you tried this on the latest master branch? No.
  • Python: pip install duckdb --upgrade --pre (it installs the same version, 0.4.0).
  • R: I don't use R.
  • Other Platforms: I don't use other platforms.
  • Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there? Yes.
