Regression: reading multiple Parquet files · Issue #1015 · duckdb/duckdb-r · GitHub

Regression: reading multiple Parquet files #1015

Closed
krlmlr opened this issue Jan 20, 2025 · 1 comment · Fixed by #1024
Comments

krlmlr (Collaborator) commented Jan 20, 2025

Introduced in c52bef5.

if (!file.exists("1.parquet")) {
  duckplyr::compute_parquet(data.frame(a = 1, b = 2), "1.parquet")
}
#> Loading duckplyr
#> # A duckplyr data frame: 2 variables
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2
if (!file.exists("2.parquet")) {
  duckplyr::compute_parquet(data.frame(a = 2, b = 3), "2.parquet")
}
#> # A duckplyr data frame: 2 variables
#>       a     b
#>   <dbl> <dbl>
#> 1     2     3

urls <- c("1.parquet", "2.parquet")

con <- DBI::dbConnect(duckdb::duckdb())
duckdb:::rel_from_table_function(con, "read_parquet", list(list(urls)))
#> Error:
#> ! {"exception_type":"Invalid Input","exception_message":"Failed to cast value: Unimplemented type for cast (VARCHAR -> r_string)"}

Created on 2025-01-20 with reprex v2.1.1

For comparison, the same code succeeds with duckdb 1.1.3-1:

if (!file.exists("1.parquet")) {
  duckplyr::compute_parquet(data.frame(a = 1, b = 2), "1.parquet")
}
#> Loading duckplyr
#> # A duckplyr data frame: 2 variables
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2
if (!file.exists("2.parquet")) {
  duckplyr::compute_parquet(data.frame(a = 2, b = 3), "2.parquet")
}
#> # A duckplyr data frame: 2 variables
#>       a     b
#>   <dbl> <dbl>
#> 1     2     3

urls <- c("1.parquet", "2.parquet")

con <- DBI::dbConnect(duckdb::duckdb())
duckdb:::rel_from_table_function(con, "read_parquet", list(list(urls)))
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> read_parquet([1.parquet, 2.parquet])
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)

Created on 2025-01-20 with reprex v2.1.1

krlmlr (Collaborator, Author) commented Jan 20, 2025

Specifically, 721d611 is the trigger -- motivated by the upstream change, but not caused by it. Interesting.
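
For comparison with the relational API call above, here is a minimal sketch that reads the same pair of files through plain SQL via DBI. It assumes the SQL path does not go through the R-value-to-DuckDB-value conversion that fails in the reprex; that assumption is not verified in this issue. The temporary file locations and the `files_sql` helper variable are illustrative and not part of the original report.

```r
library(DBI)

con <- dbConnect(duckdb::duckdb())

# Illustrative file locations (the reprex above uses the working directory)
urls <- file.path(tempdir(), c("1.parquet", "2.parquet"))

# Write two one-row Parquet files through SQL, so the sketch does not need duckplyr
dbExecute(con, paste0(
  "COPY (SELECT 1::DOUBLE AS a, 2::DOUBLE AS b) TO ",
  dbQuoteString(con, urls[[1]]), " (FORMAT PARQUET)"
))
dbExecute(con, paste0(
  "COPY (SELECT 2::DOUBLE AS a, 3::DOUBLE AS b) TO ",
  dbQuoteString(con, urls[[2]]), " (FORMAT PARQUET)"
))

# Build a quoted SQL list literal of the form ['...', '...']
files_sql <- paste0("[", paste(dbQuoteString(con, urls), collapse = ", "), "]")

# read_parquet() accepts a list of files in DuckDB SQL
res <- dbGetQuery(
  con,
  paste0("SELECT * FROM read_parquet(", files_sql, ") ORDER BY a")
)

dbDisconnect(con, shutdown = TRUE)
```

If this succeeds on a build where `rel_from_table_function()` fails, that would be consistent with the error coming from the conversion of the R character vector rather than from the Parquet reader itself.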
