library(nipnTK)

Checking that data are within an acceptable or plausible range is an important basic check to apply to quantitative data. Checking that data are recorded with appropriate legal values or codes is an important basic check to apply to categorical data.

Checking quantitative data

We will use the rl.ex01 dataset that is included in the nipnTK package:

svy <- rl.ex01
head(svy)

This returns:

#>   age sex weight height muac oedema
#> 1  12   2    6.7   68.5  148      2
#> 2   6   1    6.4   65.0  125      2
#> 3   6   2    6.5   65.6  125      2
#> 4   8   1    7.2   68.4  144      2
#> 5  12   M    6.1   65.4  114      2
#> 6   8   1    7.7   66.5  146      2

The rl.ex01 dataset contains anthropometry data from a SMART survey in Angola.

We can use the summary() function to examine the range (and other summary statistics) of a quantitative variable:

summary(svy$muac)

This returns:

#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    11.1   128.0   139.0   140.3   148.0   999.0
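One way to follow up on a summary like this is to list the records whose values fall outside a range judged plausible. The sketch below is not part of nipnTK. The data frame dat, its values, and the limits of 50 mm and 250 mm are assumptions made for illustration only; suitable limits depend on the survey population and protocol.

```r
# Small hypothetical data frame; values are invented for illustration
dat <- data.frame(
  id   = 1:6,
  muac = c(148, 125, 11.1, 999, NA, 144)
)

# Assumed plausible limits for MUAC in mm (not taken from nipnTK)
muac_min <- 50
muac_max <- 250

# TRUE where a value is present but lies outside the assumed range;
# missing values are left unflagged rather than treated as out of range
out_of_range <- !is.na(dat$muac) &
  (dat$muac < muac_min | dat$muac > muac_max)

# Records to review
dat[out_of_range, ]
```

With these invented values, the records with id 3 and 4 are listed. Handling missing values explicitly keeps NA from propagating into the row index.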

A graphical examination can also be made:

boxplot(svy$muac, horizontal = TRUE, xlab = "MUAC (mm)", frame.plot = FALSE)

The box in the boxplot spans the lower and upper quartiles. The “whiskers” extend to the most extreme data points that lie no more than 1.5 times the interquartile range (IQR) beyond the ends of the box. These limits are known as the inner fences. Data points outside the inner fences are considered to be mild outliers and are plotted individually. The NiPN data quality toolkit provides an R function, outliersUV(), that uses the same method to identify outliers:

svy[outliersUV(svy$muac), ]

This returns:

#> 
#> Univariate outliers : Lower fence = 98, Upper fence = 178
#>     age sex weight height  muac oedema
#> 33   24   1    9.8   74.5 180.0      2
#> 93   12   2    6.7   67.0  96.0      1
#> 126  16   2    9.0   74.6 999.0      2
#> 135  18   2    8.5   74.5 999.0      2
#> 194  24   M    7.0   75.0  95.0      2
#> 227   8   M    6.2   66.0  11.1      2
#> 253  35   2    7.6   75.6  97.0      2
#> 381  24   1   10.8   82.8  12.4      2
#> 501  36   2   15.5   93.4 185.0      2
#> 594  21   2    9.8   76.5  13.2      2
#> 714  59   2   18.9   98.5 180.0      2
#> 752  48   2   15.6  102.2 999.0      2
#> 756  59   1   19.4  101.1 180.0      2
#> 873  59   1   20.6  109.4 179.0      2
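
The fences reported above can be reproduced by hand from the quartiles shown in the summary() output (128 and 148). The following is a minimal sketch, not the implementation of outliersUV(): the helper inner_fences() is hypothetical, and whether outliersUV() computes its quartiles in exactly the same way as summary() is not confirmed here, although the results agree for these data.

```r
# Hypothetical helper (not part of nipnTK): inner fences from two quartiles
inner_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles of MUAC as reported by summary(svy$muac)
fences <- inner_fences(q1 = 128, q3 = 148)
fences  # lower = 98, upper = 178

# Illustrative vector: four MUAC values from the outlier listing,
# plus the median (139) and the upper quartile (148)
x <- c(180, 96, 999, 11.1, 139, 148)

# Keep only the values that fall outside the inner fences
flagged <- x[x < fences[["lower"]] | x > fences[["upper"]]]
flagged  # 180, 96, 999, 11.1
```

Note that boxplot() draws its box from the hinges returned by fivenum(), which can differ slightly from the quartiles returned by quantile() or summary() for some sample sizes.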