A principled strategy for training and performing inference on the observed data only, without imputing missing values or dropping incomplete rows.
We test the properties of the latent space model on a synthetic dataset and benchmark the method on real-world datasets.
We illustrate the properties of the latent space using a synthetic binary-classification spiral dataset augmented with two additional variables: a pure-noise variable containing no information about the outcome, and a signal variable carrying some information. We show that the noise variable is embedded very close to the point of maximal uncertainty in the latent space, whereas the signal variable lies further away, in 'informative' space.
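For concreteness, a minimal sketch of such an augmented dataset is below. The generator and its parameter names are illustrative choices for this example, not the repository's exact code.

```python
# Illustrative generator (not the repository's exact code): a two-arm spiral
# classification problem with two appended columns, one pure noise and one
# weakly informative.
import numpy as np

def make_augmented_spirals(n=1000, noise_sd=0.2, signal_sd=1.0, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)                  # binary class label
    theta = rng.uniform(0.0, 3.0 * np.pi, size=n)   # angle along the spiral
    r = theta                                       # radius grows with angle
    flip = np.where(y == 0, 1.0, -1.0)              # mirror one arm
    x1 = flip * r * np.cos(theta) + rng.normal(0.0, noise_sd, n)
    x2 = flip * r * np.sin(theta) + rng.normal(0.0, noise_sd, n)
    noise_var = rng.normal(0.0, 1.0, n)             # no information about y
    signal_var = y + rng.normal(0.0, signal_sd, n)  # a noisy copy of y
    return np.column_stack([x1, x2, noise_var, signal_var]), y

X, y = make_augmented_spirals()
```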
We further explore the latent space and the feature importance representation in the presence of missingness. We show that the latent space distance between an informative variable and an uninformative one decreases as missingness in the informative variable increases. We also show that the feature importance, as defined by the concrete dropout layer, decreases as missingness increases.
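The masking protocol behind this experiment can be sketched as follows, continuing from the generator above. Refitting the model and reading off latent distances depends on the repository's own API, so those steps appear only as comments; `mask_mcar` is a helper written for this example.

```python
# Sketch of the missingness sweep: corrupt the informative column completely
# at random at increasing rates.
import numpy as np

def mask_mcar(X, col, rate, seed=0):
    """Return a copy of X with a fraction `rate` of column `col` set to NaN."""
    out = X.astype(float).copy()
    rng = np.random.default_rng(seed)
    out[rng.random(out.shape[0]) < rate, col] = np.nan
    return out

X, y = make_augmented_spirals()  # from the sketch above
signal_col = 3                   # the weakly informative column
for rate in (0.0, 0.25, 0.5, 0.75):
    X_miss = mask_mcar(X, signal_col, rate)
    observed = 1.0 - np.isnan(X_miss[:, signal_col]).mean()
    print(f"target rate {rate:.2f} -> observed fraction {observed:.2f}")
    # At each rate one would refit the latent space model on X_miss and record
    # (a) the latent distance between the signal and noise embeddings, and
    # (b) the signal column's concrete dropout feature importance.
```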
We benchmark the performance of this approach on OpenML benchmark datasets in two ways. First, we test performance on datasets with complete data, and on the same datasets corrupted with three missingness patterns (MCAR, MAR, and MNAR). Second, we test performance on datasets with genuinely incomplete data and an unknown missingness pattern. As a comparison, we use the popular, high-performing LightGBM, which handles missingness out of the box. We also compare out-of-the-box missingness handling against an impute-and-regress strategy, with three imputation strategies: simple imputation, multivariate imputation, and multiple imputation with random forests.
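A condensed version of one arm of this benchmark is sketched below using scikit-learn and LightGBM. The dataset name, corruption rate, and metric are illustrative choices, not the exact benchmark configuration.

```python
# Illustrative benchmark arm: fetch an OpenML dataset, corrupt it MCAR, then
# compare LightGBM's native NaN handling against impute-and-regress baselines.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.datasets import fetch_openml
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

X, y = fetch_openml("credit-g", version=1, return_X_y=True, as_frame=False)
y = LabelEncoder().fit_transform(y)
X = X.astype(float)
X[np.random.default_rng(0).random(X.shape) < 0.3] = np.nan  # 30% MCAR corruption
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Out-of-the-box handling: LightGBM routes NaNs through its trees natively.
native = LGBMClassifier().fit(X_tr, y_tr)
print("native:", roc_auc_score(y_te, native.predict_proba(X_te)[:, 1]))

# Impute-and-regress: fill in values first, then fit the same model.
# (A missForest-style variant would pass a RandomForestRegressor as the
# IterativeImputer's estimator.)
for imputer in (SimpleImputer(strategy="mean"), IterativeImputer(random_state=0)):
    model = LGBMClassifier().fit(imputer.fit_transform(X_tr), y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(imputer.transform(X_te))[:, 1])
    print(type(imputer).__name__, auc)
```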
Please install a version of JAX appropriate for your system (e.g. GPU-enabled), then install the remaining dependencies with `pip install -r requirements.txt`.