Should collect 'fully' materialize a duckplyr_df? · Issue #724 · tidyverse/duckplyr · GitHub


Open
TimTaylor opened this issue May 29, 2025 · 3 comments

Comments

@TimTaylor
Contributor

Calling str() on a collected duckplyr_df is very slow on first use.

I assumed collect fully materialized the object. Is this a misunderstanding on my part?

library(duckplyr)

packageVersion("duckplyr")
#> [1] '1.1.0.9000'
base_url <- "https://blobs.duckdb.org/flight-data-partitioned/Year=2024/data_0.parquet"
flights_parquet <- read_parquet_duckdb(base_url)
x <- collect(flights_parquet)
# First call to str() on the collected object is slow
system.time(str(x))
#>    user  system elapsed 
#>   4.898   0.623   5.539
# Second call is fast
system.time(str(x))
#>    user  system elapsed 
#>   0.014   0.000   0.014
@TimTaylor
Contributor Author

I'm assuming it is loading some columns as ALTREP vectors. If so, do you think it is worth forcing their materialisation too?
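
One way to check this guess would be to time a first and a second full pass over each column and see which columns are slow only the first time. The following is a minimal sketch in base R, not part of duckplyr: `time_column_access()` and the `demo` data frame are hypothetical names used for illustration, and `length(unique(col))` is only assumed to be a cheap operation that reads every element.

```r
# Hypothetical helper (not part of duckplyr): time a first and a second
# full pass over every column of a data frame.
time_column_access <- function(df) {
  # Assumption: unique() has to read every element of the column.
  touch <- function(col) invisible(length(unique(col)))
  elapsed <- function(col) system.time(touch(col))[["elapsed"]]

  first <- unname(vapply(df, elapsed, numeric(1)))   # first pass
  second <- unname(vapply(df, elapsed, numeric(1)))  # repeat pass

  data.frame(column = names(df), first = first, second = second)
}

# Small stand-in data frame; with the reprex above this would be
# `x <- collect(flights_parquet)`.
demo <- data.frame(
  id = 1:5,
  carrier = c("AA", "DL", "UA", "AA", "DL")
)

timings <- time_column_access(demo)
print(timings)
```

Columns whose `first` time is much larger than their `second` time would be candidates for on-demand materialization. On the small stand-in data both timings will be close to zero.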

@krlmlr
Member
krlmlr commented May 29, 2025

Thanks, interesting. This could be strings that are allocated (in the R world) on demand.

Why is it important to "fully" materialize?

@TimTaylor
Contributor Author

> Why is it important to "fully" materialize?

A good question to which I don't have a good answer! Just a sense that something is a little 'not quite right'. Perhaps a little additional commentary in the documentation is sufficient.
