Here are some interesting topics I worked on:
- Currently researching quasi-linear models that offer interpretability while performing on par with neural networks
- Extracted facial landmark keypoints from images with emotion labels and built a linear model that matches the accuracy of an existing deep learning model
- Investigated the impact of each facial keypoint on emotion prediction using logistic regression
- Trying to reduce bias in the emotion prediction model by taking into account three-dimensional face rotation (in degrees)
- As this is part of ongoing research, the code notebook is messy.
- These are non-confidential parts of my work at CharacTour (a film recommendation engine based on user and character personality matches; they were popular on TikTok!)
- Used SQL and pandas to analyze the personality traits of service users, with findings such as ... of all the films in the CharacTour database, Clueless ranks in the 97th percentile for having chatty fans.
- Wrote code with pandas and spaCy to preprocess 1000+ film scripts in a data pipeline before AI analysis
I implemented 2D convolution, Gaussian filtering, smoothing and downsampling, and Sobel gradients using only NumPy. Below are the original picture, then my implementation of Sobel gradients in the x and y directions.
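As an illustration of the approach, here is a minimal NumPy-only sketch of a valid-mode 2D convolution and Sobel gradients. It is not the notebook's actual code: the function names (`conv2d`, `sobel_gradients`), the loop-based implementation, and the toy step-edge image are assumptions chosen for readability.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (kernel is flipped), using only NumPy."""
    kernel = np.flipud(np.fliplr(kernel))
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Elementwise product of the window and the flipped kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) changes
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_gradients(image):
    """Return the (x, y) Sobel responses of a 2D grayscale image."""
    return conv2d(image, SOBEL_X), conv2d(image, SOBEL_Y)

# Toy image (assumption): a vertical step edge, dark on the left, bright on the right
image = np.zeros((5, 6))
image[:, 3:] = 1.0
gx, gy = sobel_gradients(image)
```

On this toy image only the x response is nonzero, because the intensity changes across columns and is constant down each column. The sign of the response depends on whether the kernel is flipped (true convolution, as here) or not (cross-correlation).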
Referencing the CycleGAN video, I wrote CycleGAN code in PyTorch that transforms human faces into Simpsons characters and vice versa. Training and testing used 1000 Simpsons faces and 1000 GAN-generated human faces. After about 100 epochs of training, the image transformations looked plausible (although the Simpsons-to-human results were still very creepy).
Part 1: CDC IL life expectancy data
- Analysis of a linear regression on CDC IL life expectancy data suggested that the percentage of Black or African American population was an influential factor in determining positive and negative outliers.
- Component and component-plus-residual plots showed that the predictors '% households that earn $75000 or more' and '% households without social security income' have a nonlinear relationship to life expectancy.
Part 2: Household firearm ownership scores and rates of mortality by firearms from the CDC
- There was a potential nonlinear relationship between the two variables of interest, since spline regression outperformed linear regression.
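To illustrate the kind of comparison described for Part 2, the sketch below fits a straight line and a piecewise-linear spline by least squares and compares held-out mean squared error. It uses synthetic data, not the CDC firearm data, and the hinge basis, knot locations, and train/test split are all assumptions made for the example.

```python
import numpy as np

def linear_design(x):
    """Design matrix for a straight line: intercept and slope."""
    return np.column_stack([np.ones_like(x), x])

def spline_design(x, knots):
    """Piecewise-linear spline basis: a line plus one hinge max(x - k, 0) per knot."""
    hinges = [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack([np.ones_like(x), x] + hinges)

def held_out_mse(X_train, y_train, X_test, y_test):
    """Fit by ordinary least squares, then score on held-out points."""
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return float(np.mean((X_test @ coef - y_test) ** 2))

# Synthetic, clearly nonlinear data (assumption; stands in for the real variables)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0 * np.pi, 200)
y = np.sin(x) + rng.normal(0.0, 0.1, 200)

train, test = np.arange(150), np.arange(150, 200)
knots = [1.5, 3.0, 4.5]  # hypothetical knot locations

linear_mse = held_out_mse(linear_design(x[train]), y[train],
                          linear_design(x[test]), y[test])
spline_mse = held_out_mse(spline_design(x[train], knots), y[train],
                          spline_design(x[test], knots), y[test])
print(f"linear MSE: {linear_mse:.3f}, spline MSE: {spline_mse:.3f}")
```

Scoring on held-out points matters here: the spline basis contains the linear model as a special case, so its in-sample fit can never be worse, and only out-of-sample error says anything about a genuinely nonlinear relationship.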
Part 3: Food Access Research Atlas data
- Logistic regression on the predictors 'MedianFamilyIncome', 'PrcntSNAP', 'PrcntAA', 'PrcntNoVehicle', and 'PrcntHispanic' performed better on urban tracts than on non-urban ones.
Using Markov chain Monte Carlo methods, I deciphered a paragraph of substitution-cipher text, "m it vbeh yjmbl. qbl lgb tfwlgo ... ". By scoring each candidate decryption with a log score based on English bigram counts (Google Corpus Data), I identified the ciphertext as a paragraph from chapter 12 of the novel All Quiet on the Western Front.
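The idea can be sketched as a Metropolis search over substitution keys, where each proposal swaps two letters of the key and is accepted according to the change in bigram log score. This is a toy version under stated assumptions, not the project code: the bigram counts come from a short made-up reference string rather than Google Corpus Data, and the function names, smoothing, and step count are illustrative.

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def bigram_log_probs(reference_text):
    """Return a function giving a smoothed log probability for a character pair."""
    allowed = ALPHABET + " "
    text = "".join(c for c in reference_text.lower() if c in allowed)
    counts = Counter(zip(text, text[1:]))
    total = sum(counts.values())
    n_pairs = len(allowed) ** 2
    # Add-one smoothing so unseen pairs get a small, nonzero probability
    return lambda pair: math.log((counts.get(pair, 0) + 1) / (total + n_pairs))

def log_score(text, log_prob):
    """Sum of bigram log probabilities over consecutive characters."""
    return sum(log_prob(pair) for pair in zip(text, text[1:]))

def decrypt(ciphertext, key):
    """key[i] is the plaintext letter assigned to cipher letter ALPHABET[i]."""
    return ciphertext.translate(str.maketrans(ALPHABET, key))

def metropolis_decipher(ciphertext, log_prob, steps=2000, seed=0):
    rng = random.Random(seed)
    key = list(ALPHABET)  # start from the identity key
    current = log_score(decrypt(ciphertext, "".join(key)), log_prob)
    best_key, best = key[:], current
    for _ in range(steps):
        i, j = rng.sample(range(26), 2)
        key[i], key[j] = key[j], key[i]  # propose swapping two letters
        proposed = log_score(decrypt(ciphertext, "".join(key)), log_prob)
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
            if current > best:
                best_key, best = key[:], current
        else:
            key[i], key[j] = key[j], key[i]  # reject: undo the swap
    return "".join(best_key), best

# Toy setup (assumption): tiny reference text and a short made-up plaintext
reference = ("the quick brown fox jumps over the lazy dog and then the dog "
             "chases the fox over the hill and through the quiet town")
plaintext = "the dog chases the fox through the town"
secret = list(ALPHABET)
random.Random(1).shuffle(secret)
ciphertext = plaintext.translate(str.maketrans(ALPHABET, "".join(secret)))

log_prob = bigram_log_probs(reference)
key, score = metropolis_decipher(ciphertext, log_prob)
guess = decrypt(ciphertext, key)
```

With such a small reference text and short ciphertext, the best-scoring key is not guaranteed to recover the plaintext; a large bigram table and a full paragraph of ciphertext make the score far more informative.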
With similar methods, I also solved the knapsack problem (packing a set of items with given values and weights into a knapsack with a maximum capacity) and generated five- or six-letter strings that sound like English names. This project references code presented in UChicago DATA 21300 (Models in Data Science).
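For the knapsack part, one way such a sampler might look is a Metropolis walk over include/exclude choices: each step flips one item, infeasible states are rejected, and moves that lose value are accepted with a probability that shrinks as the loss grows. The sketch below is an assumption about the general approach, not the course or project code; the function name, temperature, and item data are hypothetical.

```python
import math
import random

def knapsack_mcmc(values, weights, capacity, steps=5000, temperature=1.0, seed=0):
    """Metropolis search over 0/1 item selections; returns the best feasible state seen."""
    rng = random.Random(seed)
    n = len(values)
    state = [0] * n  # start with an empty knapsack
    value = 0
    weight = 0
    best_state, best_value = state[:], 0
    for _ in range(steps):
        i = rng.randrange(n)             # propose flipping one item
        sign = -1 if state[i] else 1     # remove it if present, add it otherwise
        new_weight = weight + sign * weights[i]
        if new_weight > capacity:
            continue                     # reject states that exceed capacity
        delta = sign * values[i]
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            state[i] = 1 - state[i]
            value += delta
            weight = new_weight
            if value > best_value:
                best_state, best_value = state[:], value
    return best_state, best_value

# Hypothetical items for illustration
values = [10, 13, 7, 8, 4]
weights = [3, 4, 2, 3, 1]
capacity = 7

best_state, best_value = knapsack_mcmc(values, weights, capacity)
chosen = [i for i, used in enumerate(best_state) if used]
print("chosen items:", chosen, "total value:", best_value)
```

The sampler is a heuristic: it always returns a feasible selection, but it is not guaranteed to find the optimum, and the temperature controls how readily it gives up value to escape a local maximum.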