- Brisbane, Australia
Statistically answered 8 research questions using Multiple Factor Analysis (MFA), Principal Component Analysis (PCA), Multiple Linear Regression, Welch's t-test, Wilcoxon signed-rank test, and Long…
r statistical-analysis principal-component-analysis unsupervised-machine-learning multiple-linear-regression mixed-effects-models longitudinal-analysis
HTML Updated Aug 20, 2022
Human-Resource-Data-Mining Public
Completed 5 analytical tasks using VAT-validated Gower-PAM clustering, Correspondence Analysis (CA), Asym-Biplot, Multiple Correspondence Analysis (MCA), Chi-Squared test, Regression, and…
R Updated Jul 6, 2022
Among 5 clustering algorithms, the VEV model from Mclust performed best and detected 8 distinct groups of users. Data was cleaned, standardized and feature-selected, PCA’s biplot, Ggplot, Radar…
R Updated May 27, 2022
This project applies multiple correspondence analysis (MCA), using scree plots, variable plots, individual plots, biplots, squared cosine (cos2) and contribution statistics (contrib) to …
Updated May 8, 2022
-
Solved 7 business tasks and identified statistically important variables related to loan applications. Many plots were produced during EDA and machine learning. Models built include Logistic regres…
R Updated Feb 9, 2022
320k obs and 11 vars cleaned and manipulated for EDA and mapping (choropleth, cluster, points) to find a new home for a Brisbane family.
-
Data manipulation, imputation, feature engineering, and machine learning algorithms (K-nearest neighbour, random forest, and extreme gradient boosting) were applied to clean the dataset. A final, …
HTML Updated Oct 26, 2021
Solved 9 analysis tasks and identified the most important variables driving the success of clothes sales, using 22 plots, multiple linear regression and random forest.
R Updated Oct 21, 2021
Extracted statistical relationships between house prices and many factors, and turned the 90% R2 Random Forest model, which outperformed MLR, Lasso, PLS, KNN, and DT, into a production application.
R Updated Oct 10, 2021
SimpleTalkDemo_R Public
Forked from SQLSuperGuru/SimpleTalkDemo_R
Demo data and R script for Simple Talk article
R Updated Oct 5, 2021
Dirty-Data-Challenge- Public
Clean, manipulate, transform, and join 4 messy datasets
R Updated Oct 3, 2021
18k obs & 14 vars cleaned and manipulated for EDA, assumption tests, PP, WO, Ljung-Box, and forecasting (ETS & ARIMA) for avocado prices in the US and Houston.
-
Marketing_Analytics Public
Solved 9 business tasks with 18 graphs and 10 statistical methods, including dummy data partitioning (RMSE & R2), stepwise model selection, multicollinearity (correlation, VIF), MLR, GLM for logistic regress…
-
Built an ML API that recommends crop classes with 99.5% accuracy; trained 13 models, including discriminant analyses, KNN, SVMs, Naive Bayes, Decision Tree, Random Forest (RF), and Boosted RF.
-
Bike-Share_Big_Data_Analysis Public
12 datasets, 3.7 million obs, & 13 vars were cleaned and manipulated to produce 6 graphs, a dynamic map, and statistics on converting casual riders into members.
-
A factorial Split-plot system analysed by Shapiro-Wilk test, Levene’s test, Q-Q plot, CI plot, Mixed-Effect Model, ANOVA, and Tukey test.
-
ResortHotel_versus_CityHotel Public
119k obs & 32 vars cleaned and manipulated to create 14 distinct graphs and statistic tables for an extensive EDA to draw insights.
-
A CRD system (8 treatments & 3 harvests) analysed by Shapiro-Wilk test, Q-Q plot, Levene’s test, Kruskal-Wallis test, and Dunn’s Post-hoc test.
-
A multi-environment Latin Square designed trial analysed by ANOVA, Two-way ANOVA, Fully Random Model, Mixed Effect Model, and Tukey test.