vignettes/Checking-ranges-and-legal-values.Rmd
Checking that data are within an acceptable or plausible range is an important basic check to apply to quantitative data. Checking that data are recorded with appropriate legal values or codes is an important basic check to apply to categorical data.
We will use the dataset rl.ex01 that is included in the nipnTK package.
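A minimal sketch of loading the data and listing its first rows, assuming the nipnTK package is already installed. The object name `svy` is an arbitrary choice made here for illustration:

```r
# Assumes nipnTK is installed; rl.ex01 is the example dataset named above
library(nipnTK)

# Work on a copy (the name svy is arbitrary) and show the first six rows
svy <- rl.ex01
head(svy)
```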
#> age sex weight height muac oedema
#> 1 12 2 6.7 68.5 148 2
#> 2 6 1 6.4 65.0 125 2
#> 3 6 2 6.5 65.6 125 2
#> 4 8 1 7.2 68.4 144 2
#> 5 12 M 6.1 65.4 114 2
#> 6 8 1 7.7 66.5 146 2
The rl.ex01 dataset contains anthropometry data from a SMART survey in Angola.
We can use the summary() function to examine the range (and other summary statistics) of a quantitative variable:
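For example, the call might look like the sketch below. It assumes the variable being examined is `muac`, which is consistent with the minimum of 11.1 and maximum of 999.0 in the output that follows:

```r
# Assumes nipnTK is installed; rl.ex01 is the example dataset named above
library(nipnTK)

# Five-number summary plus the mean for the muac column
summary(rl.ex01$muac)
```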
This returns:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 11.1 128.0 139.0 140.3 148.0 999.0
A graphical examination can also be made:
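One way to draw such a plot is with base R's boxplot() function. The horizontal layout and the axis label are illustrative choices made here, not necessarily those of the original figure:

```r
# Assumes nipnTK is installed; rl.ex01 is the example dataset named above
library(nipnTK)

# Horizontal boxplot of muac; boxplot() invisibly returns the plotted statistics
bp <- boxplot(rl.ex01$muac, horizontal = TRUE, xlab = "MUAC")

# Points drawn beyond the whiskers are stored in bp$out
bp$out
```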

The “whiskers” on the boxplot extend to the most extreme data points that lie within 1.5 times the interquartile range of the ends of the box (i.e., the lower and upper quartiles). The limits at 1.5 times the interquartile range beyond the quartiles are known as the inner fences. Data points that are outside the inner fences are considered to be mild outliers. The NiPN data quality toolkit provides an R language function outliersUV() that uses the same method to identify outliers:
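A sketch of how the function might be called on the `muac` variable. It assumes that outliersUV() returns the positions of the flagged values, so that its result can be used to subset the data frame, as the row listing that follows suggests:

```r
# Assumes nipnTK is installed; rl.ex01 is the example dataset named above
library(nipnTK)

# Assumption: outliersUV() returns the positions of values outside the fences
flagged <- outliersUV(rl.ex01$muac)

# Show the full records for the flagged values
rl.ex01[flagged, ]
```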
This returns:
#>
#> Univariate outliers : Lower fence = 98, Upper fence = 178
#> age sex weight height muac oedema
#> 33 24 1 9.8 74.5 180.0 2
#> 93 12 2 6.7 67.0 96.0 1
#> 126 16 2 9.0 74.6 999.0 2
#> 135 18 2 8.5 74.5 999.0 2
#> 194 24 M 7.0 75.0 95.0 2
#> 227 8 M 6.2 66.0 11.1 2
#> 253 35 2 7.6 75.6 97.0 2
#> 381 24 1 10.8 82.8 12.4 2
#> 501 36 2 15.5 93.4 185.0 2
#> 594 21 2 9.8 76.5 13.2 2
#> 714 59 2 18.9 98.5 180.0 2
#> 752 48 2 15.6 102.2 999.0 2
#> 756 59 1 19.4 101.1 180.0 2
#> 873 59 1 20.6 109.4 179.0 2
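The fences reported above can be checked by hand from the quartiles in the earlier summary() output (1st Qu. = 128, 3rd Qu. = 148). The sketch below uses base R only. The helper `outside_fence()` is hypothetical and written for illustration; it is not part of nipnTK:

```r
# Quartiles of muac as reported by summary() earlier in this document
q1 <- 128
q3 <- 148

iqr <- q3 - q1                  # interquartile range: 20
lower_fence <- q1 - 1.5 * iqr   # 128 - 30 = 98
upper_fence <- q3 + 1.5 * iqr   # 148 + 30 = 178

# Hypothetical helper (not from nipnTK): TRUE for values outside the fences
outside_fence <- function(x, lower, upper) {
  !is.na(x) & (x < lower | x > upper)
}

# A few muac values copied from the rows printed in this document
muac <- c(148, 125, 180, 96, 999, 11.1, 179)
muac[outside_fence(muac, lower_fence, upper_fence)]
```

The two values inside the fences (148 and 125) are dropped. The remaining five, including 179, which lies just above the upper fence of 178, are flagged.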