`sum()` R <-> duckplyr inconsistency when all column entries are NA · Issue #627 · tidyverse/duckplyr · GitHub
Open
lschneiderbauer opened this issue Feb 25, 2025 · 2 comments
@lschneiderbauer

When working with duckdb, I noticed behavior that is unusual for R users: SUM(x) returns NA if all entries in x are NA.
I was curious how duckplyr handles this, and it shows the same difference from R:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

duckplyr::duckdb_tibble(x = NA_real_) |> 
  mutate(
    y = sum(x, na.rm = TRUE)
  )
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, data will be collected but no data will
#> be uploaded.
#> ℹ Automatic fallback uploading is not controlled and therefore disabled, see
#>   `?duckplyr::fallback()`.
#> ✔ Number of reports ready for upload: 2.
#> → Review with `duckplyr::fallback_review()`, upload with
#>   `duckplyr::fallback_upload()`.
#> ℹ Configure automatic uploading with `duckplyr::fallback_config()`.
#> # A duckplyr data frame: 2 variables
#>       x     y
#>   <dbl> <dbl>
#> 1    NA    NA

tibble(x = NA_real_) |> 
  mutate(
    y = sum(x, na.rm = TRUE)
  )
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1    NA     0

Created on 2025-02-25 with reprex v2.1.1

(y = 0 with R, while y = NA with duckplyr)
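The difference above can be illustrated in base R alone. The following is a minimal sketch, not duckplyr's or DuckDB's actual implementation: `sql_like_sum()` is a hypothetical helper introduced here only to mimic the SQL-style rule that an aggregate over no non-missing values is itself missing.

```r
# Base R: na.rm = TRUE drops every NA, leaving an empty vector,
# and the sum of an empty vector is defined as 0.
x <- NA_real_
r_result <- sum(x, na.rm = TRUE)

# Hypothetical helper (not part of duckplyr or DuckDB) that mimics the
# SQL-style rule: if no non-missing value remains, the result is missing.
sql_like_sum <- function(x) {
  if (all(is.na(x))) {
    return(NA_real_)
  }
  sum(x, na.rm = TRUE)
}

sql_result <- sql_like_sum(x)

print(r_result)    # 0
print(sql_result)  # NA
```

As soon as at least one non-missing value is present, both conventions agree, which is why the discrepancy only shows up for all-NA input.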

Is this related to, or even equivalent to, #571?

@krlmlr
Member
krlmlr commented Feb 25, 2025

Interesting. What happens in the grouped case?

Looks very similar to #571, but if the behavior is different in the grouped case, we need to revisit at least the documentation and the priority of that ticket.

@lschneiderbauer
Author

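This is the behavior in the grouped case. It is what I would have expected based on the first result: the group whose entries are all NA (group 1) gives NA with duckplyr and 0 with dplyr, while groups containing at least one non-NA value agree.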

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <-
  tibble(
    x = c(NA_real_, NA_real_, NA_real_, 0, 0, 0),
    group = c(1L, 1L, 2L, 2L, 3L, 3L)
  )

# Grouped sum; the argument is named `data` so it is not confused with column `x`.
f <- function(data) {
  data |>
    mutate(
      y = sum(x, na.rm = TRUE),
      .by = group
    )
}

duckplyr::as_duckdb_tibble(df) |>
  f()
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, data will be collected but no data will
#> be uploaded.
#> ℹ Automatic fallback uploading is not controlled and therefore disabled, see
#>   `?duckplyr::fallback()`.
#> ✔ Number of reports ready for upload: 4.
#> → Review with `duckplyr::fallback_review()`, upload with
#>   `duckplyr::fallback_upload()`.
#> ℹ Configure automatic uploading with `duckplyr::fallback_config()`.
#> # A duckplyr data frame: 3 variables
#>       x group     y
#>   <dbl> <int> <dbl>
#> 1    NA     1    NA
#> 2    NA     1    NA
#> 3    NA     2     0
#> 4     0     2     0
#> 5     0     3     0
#> 6     0     3     0

df |>
  f()
#> # A tibble: 6 × 3
#>       x group     y
#>   <dbl> <int> <dbl>
#> 1    NA     1     0
#> 2    NA     1     0
#> 3    NA     2     0
#> 4     0     2     0
#> 5     0     3     0
#> 6     0     3     0

Created on 2025-02-27 with reprex v2.1.1
