Staff Research Scientist at Stanford University with expertise in data science, model building and big data engineering. Experienced in survey, behavioral and biological data analysis. Specialized in building scalable data pipelines, measurement assessment, statistical models and automated workflows for 100+ TB datasets.
Languages: Python, R, Bash
Statistical Methods: EFA, PCA, SEM, Linear/Logistic/Ordinal/Multinomial/Hierarchical Regression, Time Series Analysis, Dimensionality Reduction, A/B Testing, Predictive Modeling
Python Stack: pandas, numpy, scipy, scikit-learn, statsmodels, matplotlib, seaborn, etc.
R Stack: tidyverse, ggplot2, lmer, lavaan, lm/glm, emmeans, psych, etc.
Cloud & Infrastructure: AWS (S3, EC2, CLI), Docker, uv, HPC clusters, distributed computing
Data Engineering: ETL pipelines, Git/GitHub, automated workflows, data validation/quality control
PyReliMRI - Python package for statistical reliability analysis in large-scale datasets.
OpenNeuro GLM FitLins - Automated analysis pipeline making 500+ task fMRI datasets more accessible, reducing manual resource costs by 70%+ through containerized, simplified downloading/filtering, data reshaping, statistical model building and cloud computing.
ABCD-BIDS E-Prime Processor - Automated workflow for the largest consortium-led study in the United States, processing behavioral data from 20,000+ subjects across 20+ sites. Converts E-Prime files to fMRI-ready format with comprehensive quality control at the subject and group levels.
HCP-YA Preprocessing - End-to-end processing workflow for behavioral and fMRI data from one of the foundational MRI studies in the US. Processes 1000+ subjects (28 TB dataset), generates BIDS-compliant descriptives, and fits an HCP and an alternative statistical model to the task-based timeseries data.
- Built end-to-end statistical pipelines processing and analyzing 30+ TB datasets on AWS/HPC
- Led collaborative teams of researchers, statisticians and analysts
- Published 35+ peer-reviewed research products
- Created 5+ open-source packages with openly distributed code
- Contributed data engineering and statistical expertise to a large-scale ($500M+) NIH-funded study
For more details, check out my personal webpage