A simple R script to merge, clean, and summarize the Human Activity Recognition Using Smartphones data set.
Written for the April 2014 Getting and Cleaning Data course offered by Johns Hopkins University through Coursera.
Requires the data.table (version 1.9.2) and reshape2 R packages.
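Before running the script, it can help to confirm that both packages are available. The snippet below is a minimal sketch, not part of run_analysis.R; the variable names `required` and `notInstalled` are illustrative, and it does not check the data.table version.

```r
# Packages named in this README.
required <- c("data.table", "reshape2")

# requireNamespace() returns FALSE (instead of raising an error) when a
# package is not installed.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
notInstalled <- required[!installed]

if (length(notInstalled) > 0) {
  message("Not installed: ", paste(notInstalled, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any packages reported as missing can be installed with install.packages().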
- Get run_analysis.R on your local machine using whatever method suits you.
- In R, set your working directory to the directory that contains run_analysis.R.
- Download the data set.
- Extract the "UCI HAR Dataset" directory into the same directory as run_analysis.R.
- Your working directory should contain both run_analysis.R and the UCI HAR Dataset directory.
- Execute the script from the R command line with source("run_analysis.R").
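The directory layout described in the steps above can be checked from R before sourcing the script. This is a rough sketch under the assumption that the file and directory names are exactly those given in this README; `check_layout` is a hypothetical helper, not something run_analysis.R provides.

```r
# Hypothetical helper: reports whether the expected script and data set
# directory are present in `dir` (defaults to the working directory).
check_layout <- function(dir = getwd()) {
  c(
    script  = file.exists(file.path(dir, "run_analysis.R")),
    dataset = dir.exists(file.path(dir, "UCI HAR Dataset"))
  )
}

layout <- check_layout()
if (all(layout)) {
  message("Layout looks right; the script should find its inputs.")
} else {
  message("Missing: ", paste(names(layout)[!layout], collapse = ", "))
}
```

If anything is reported missing, setwd() to the correct directory or re-extract the data set, then run the check again.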
The script produces the following:
- mergedData: a data.table containing the merged and cleaned data set.
- tidyData: a data.table with the average (mean) value of the mean and standard deviation of each measurement, for each subject and activity.
- tidy.txt: a text file containing tidyData.
The script performs the steps below to produce a tidy data set with the mean of each std() and mean() feature for each activity and subject, and writes that to the file tidy.txt.
After running, the merged data can be referenced through the mergedData variable, and the summary data through the tidyData variable, both of which are of type data.table.
- Combines the training and test feature data (X_train.txt and X_test.txt) from the UCI HAR Dataset directory into one data.table, mergedData.
  - The Inertial Signals data is not used.
- Applies the names in features.txt to the columns of mergedData.
- Discards the columns that do not contain mean() or std() in their name.
  - Note: meanFreq() columns are not kept.
- Adds two columns to mergedData:
  - activity, from the y_train.txt and y_test.txt files.
  - subject.id, from the subject_train.txt and subject_test.txt files.
- Replaces the activity column values with the corresponding labels defined in activity_labels.txt.
- Melts mergedData using the activity and subject.id columns as id variables.
- Casts the molten data by activity and subject.id, using mean as the aggregate function. The result of this cast operation is stored in the data.table variable tidyData.
- Writes the cast summary data to tidy.txt in the current working directory.
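The column filter, label replacement, and melt/cast steps above can be sketched on a small toy table. This is an illustration only, not the code in run_analysis.R: the measurement values are made up, the regular expression is one plausible way to keep mean() and std() while excluding meanFreq(), and a plain data.frame stands in for the data.table so that only reshape2 is needed.

```r
library(reshape2)

# Column filter: the parentheses are escaped so that only the literal
# strings "mean()" and "std()" match; "meanFreq()" does not.
featureNames <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                  "fBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")
keep <- grepl("mean\\(\\)|std\\(\\)", featureNames)
keptNames <- featureNames[keep]

# Activity labels in the order they appear in activity_labels.txt.
activityLabels <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                    "SITTING", "STANDING", "LAYING")
activityCodes <- c(1, 1, 1, 4)

# Toy stand-in for mergedData (measurement values are invented).
toy <- data.frame(
  activity   = activityLabels[activityCodes],  # codes replaced by labels
  subject.id = c(1, 1, 2, 2),
  `tBodyAcc-mean()-X` = c(0.2, 0.4, 0.1, 0.3),
  `tBodyAcc-std()-X`  = c(-0.9, -0.7, -0.5, -0.3),
  check.names = FALSE
)

# Melt: one row per (activity, subject.id, variable) measurement.
molten <- melt(toy, id.vars = c("activity", "subject.id"))

# Cast: average each variable within every activity/subject pair.
toyTidy <- dcast(molten, activity + subject.id ~ variable,
                 fun.aggregate = mean)
print(toyTidy)
```

The result has one row per activity and subject combination and one column per retained feature, which is the shape described for tidyData.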