y2clerk exists to quickly create formatted frequencies tables. It
leverages the tidyverse, allowing the user to %>%
in data frames,
select()
down to lists of variables, group_by()
others, and include
weights for the frequencies. There are two main functions in y2clerk:
freqs()
- creates a frequency table including both ns and percentages for each level of a variable or list of variables.freqs()
is flexible and can also be used for calculating means, medians, or other quartiles while including weighting variables.verbatims_y2()
- creates a table with similar formatting to freqs. However, this function is used for string variables and creates a table with all responses given.cross_freqs()
- creates a frequency table similar to cross tabs. Each group_var given in the function acts as its own unique banner for every variable listed in the freqs. Whereas group_by %>% freqs produces a set of frequencies grouped by a single variable, cross_freqs is designed to produce a set of frequencies grouped by multiple different grouping variables one after another and then combines these results into a single dataframe.
You can install the most updated package version from GitHub with:
# install.packages("devtools")
devtools::install_github("y2analytics/y2clerk")
Below you will find a few basic examples which show you how to quickly
get a frequencies table with freqs()
:
library(y2clerk)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect
7EFB
, setdiff, setequal, union
df <- data.frame(
a = c(1, 2, 2, 3, 4, 2, NA),
b = c(1, 2, 2, 3, 4, 1, NA),
weights = c(0.9, 0.9, 1.1, 1.1, 1, 1, 1)
)
freqs(df, a, b)
#> variable value label n stat result
#> 1 a 1 1 1 percent 0.14
#> 2 a 2 2 3 percent 0.43
#> 3 a 3 3 1 percent 0.14
#> 4 a 4 4 1 percent 0.14
#> 5 a <NA> <NA> 1 percent 0.14
#> 6 b 1 1 2 percent 0.29
#> 7 b 2 2 2 percent 0.29
#> 8 b 3 3 1 percent 0.14
#> 9 b 4 4 1 percent 0.14
#> 10 b <NA> <NA> 1 percent 0.14
df %>% freqs(a, b, wt = weights)
#> variable value label n stat result
#> 1 a 1 1 0.9 percent 0.13
#> 2 a 2 2 3.0 percent 0.43
#> 3 a 3 3 1.1 percent 0.16
#> 4 a 4 4 1.0 percent 0.14
#> 5 a <NA> <NA> 1.0 percent 0.14
#> 6 b 1 1 1.9 percent 0.27
#> 7 b 2 2 2.0 percent 0.29
#> 8 b 3 3 1.1 percent 0.16
#> 9 b 4 4 1.0 percent 0.14
#> 10 b <NA> <NA> 1.0 percent 0.14
freqs(df, stat = 'mean', nas = FALSE)
#> # A tibble: 3 x 6
#> variable value label n stat result
#> <chr> <chr> <chr> <dbl> <chr> <dbl>
#> 1 a "" "" 6 mean 2.33
#> 2 b "" "" 6 mean 2.17
#> 3 weights "" "" 7 mean 1
freqs(df, stat = 'mean', nas = FALSE, wt = weights)
#> # A tibble: 2 x 6
#> variable value label n stat result
#> <chr> <chr> <chr> <dbl> <chr> <dbl>
#> 1 a "" "" 6 mean - weighted 2.37
#> 2 b "" "" 6 mean - weighted 2.2
df %>% group_by(a) %>% freqs(b, stat = 'mean', nas = FALSE, wt = weights)
#> Adding missing grouping variables: `a`
#> Adding missing grouping variables: `a`
#> # A tibble: 4 x 7
#> group_var variable value label n stat result
#> <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl>
#> 1 1 b "" "" 0.9 mean - weighted 1
#> 2 2 b "" "" 3 mean - weighted 1.67
#> 3 3 b "" "" 1.1 mean - weighted 3
#> 4 4 b "" "" 1 mean - weighted 4
If you have issues using y2clerk, please post your issue on GitHub along with a minimal reproducible example. We will do our best to address your issues and get them fixed for the next version of y2clerk.