`sum()` R <-> duckplyr inconsistency when all column entries are NA · Issue #627 · tidyverse/duckplyr
When working with DuckDB, I noticed behavior that is unusual for R users: SUM(x) is NA if all entries in x are NA.
I was curious how duckplyr handles this, and it seems to show the same difference from R:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

duckplyr::duckdb_tibble(x = NA_real_) |>
  mutate(
    y = sum(x, na.rm = TRUE)
  )
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, data will be collected but no data will
#> be uploaded.
#> ℹ Automatic fallback uploading is not controlled and therefore disabled, see
#>   `?duckplyr::fallback()`.
#> ✔ Number of reports ready for upload: 2.
#> → Review with `duckplyr::fallback_review()`, upload with
#>   `duckplyr::fallback_upload()`.
#> ℹ Configure automatic uploading with `duckplyr::fallback_config()`.
#> # A duckplyr data frame: 2 variables
#>       x     y
#>   <dbl> <dbl>
#> 1    NA    NA

tibble(x = NA_real_) |>
  mutate(
    y = sum(x, na.rm = TRUE)
  )
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1    NA     0
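
For context, the `0` on the dplyr side follows from base R semantics: with `na.rm = TRUE` the `NA` entries are dropped first, and the sum of an empty vector is `0`. SQL's `SUM`, by contrast, returns NULL when there are no non-NULL inputs. A minimal base R check of the R side (no duckplyr involved; the variable names are only for illustration):

```r
# na.rm = TRUE removes the NAs, leaving an empty vector to sum
all_na <- c(NA_real_, NA_real_)
r_result <- sum(all_na, na.rm = TRUE)

# The sum of an empty numeric vector is 0 in R
empty_result <- sum(numeric(0))

# Without na.rm, the NA propagates as usual
na_result <- sum(all_na)

print(c(r_result, empty_result, na_result))
```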
Looks very similar to #571, but if the behavior is different in the grouped case, we need to revisit at least the documentation and the priority of that ticket.
This is the behavior with groups. It matches what I would have expected based on the first result.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

df <-
  tibble(
    x = c(NA_real_, NA_real_, NA_real_, 0, 0, 0),
    group = c(1L, 1L, 2L, 2L, 3L, 3L)
  )

f <- function(x) {
  x |>
    mutate(
      y = sum(x, na.rm = TRUE),
      .by = group
    )
}

duckplyr::as_duckdb_tibble(df) |>
  f()
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, data will be collected but no data will
#> be uploaded.
#> ℹ Automatic fallback uploading is not controlled and therefore disabled, see
#>   `?duckplyr::fallback()`.
#> ✔ Number of reports ready for upload: 4.
#> → Review with `duckplyr::fallback_review()`, upload with
#>   `duckplyr::fallback_upload()`.
#> ℹ Configure automatic uploading with `duckplyr::fallback_config()`.
#> # A duckplyr data frame: 3 variables
#>       x group     y
#>   <dbl> <int> <dbl>
#> 1    NA     1    NA
#> 2    NA     1    NA
#> 3    NA     2     0
#> 4     0     2     0
#> 5     0     3     0
#> 6     0     3     0

df |>
  f()
#> # A tibble: 6 × 3
#>       x group     y
#>   <dbl> <int> <dbl>
#> 1    NA     1     0
#> 2    NA     1     0
#> 3    NA     2     0
#> 4     0     2     0
#> 5     0     3     0
#> 6     0     3     0
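
The grouped duckplyr output is consistent with SQL-style aggregation applied per group: only the group whose values are all `NA` yields `NA`. As a rough sketch, this can be reproduced in base R with a hypothetical helper (`sum_sql` is not part of duckplyr or dplyr; it only imitates the SQL rule that `SUM` over no non-NULL values is NULL):

```r
# Hypothetical helper: mimics SQL SUM(), which yields NULL (NA here)
# when a group has no non-missing values
sum_sql <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

df <- data.frame(
  x = c(NA_real_, NA_real_, NA_real_, 0, 0, 0),
  group = c(1L, 1L, 2L, 2L, 3L, 3L)
)

# ave() applies the function per group and recycles the result to each row
df$y_sql <- ave(df$x, df$group, FUN = sum_sql)
df$y_r <- ave(df$x, df$group, FUN = function(v) sum(v, na.rm = TRUE))

print(df)
```

Here `y_sql` should line up with the duckplyr column `y` above and `y_r` with the dplyr one, assuming the difference is fully explained by the all-`NA` group rule.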
Created on 2025-02-25 with reprex v2.1.1
(y = 0 with R, while y = NA with duckplyr)
Related to, or even equivalent to, #571?