Reference: The second model for this project draws heavily on the concepts described in Rodriguez et al. (2008) on The Nested Dirichlet Process.
We would also like to thank Alessandro Carminati and Alessandra Ragni for their help and guidance.
This project aims to cluster nations based on their educational performance using data from the OECD Programme for International Student Assessment (PISA). The dataset includes school and student questionnaires, providing a rich basis for probabilistic modeling.
Two primary approaches are considered:
- Bayesian Semiparametric Approach
- Nested Dirichlet Process
Contributed to:
@InProceedings{10.1007/978-3-031-96303-2_58,
author="Carminati, Alessandro and Guglielmi, Alessandra and Ragni, Alessandra",
editor="di Bella, Enrico and Gioia, Vincenzo and Lagazio, Corrado and Zaccarin, Susanna",
title="Bayesian Nonparametric Clustering of Schools and Countries Based on Mathematics Proficiency",
booktitle="Statistics for Innovation II",
year="2025",
publisher="Springer Nature Switzerland",
address="Cham",
pages="356--361",
abstract="Analyzing educational performance across countries is essential for understanding the strengths and weaknesses of different school systems, ultimately leading to their improvement. In this paper, we analyze data from the Programme for International Student Assessment's 2018 survey to cluster countries and schools based on students' mathematical proficiency within schools. We adopt a Bayesian nonparametric model-based clustering approach, specifically the nested Dirichlet process mixture model, which allows for incorporating school-specific covariates and clustering at both the school and country levels. Our findings reveal patterns in educational outcomes that can inform targeted policy interventions.",
isbn="978-3-031-96303-2"
}
Model
Where:
- $M$ : Precision parameter, controlling the variability of the Dirichlet process
- $T_{it}$ : Number of students in school $t$ of country $i$
- $\log(T_{it})$ : Offset term to normalize the count data
- $y_{it}$ : Number of low-achieving students
- $b_i$ : Clustering component from the Dirichlet process, shared by subjects in the same cluster
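The symbols listed above are consistent with a count model whose mean is log-linear, with offset $\log(T_{it})$ and a country-level term $b_i$ drawn from a Dirichlet process. The model equation itself is not reproduced in this text, so the Python sketch below only illustrates that reading and is not the project's implementation. The truncation level, the normal base measure, the school sizes, and every function and variable name are hypothetical.

```python
import math
import random


def stick_breaking_weights(M, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process with precision M."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, M)   # fraction of the remaining stick to break off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    total = sum(weights)              # renormalize because the stick is truncated
    return [w / total for w in weights]


rng = random.Random(0)

# Assumed base measure: atoms on the log-rate scale drawn from N(-1.5, 0.5^2).
weights = stick_breaking_weights(M=1.0, n_atoms=50, rng=rng)
atoms = [rng.gauss(-1.5, 0.5) for _ in weights]

# Each country i picks one atom; countries that pick the same atom share b_i,
# which is what makes them members of the same cluster.
n_countries = 6
labels = rng.choices(range(len(atoms)), weights=weights, k=n_countries)
b = [atoms[k] for k in labels]

# T[i][t]: number of students in school t of country i (made-up sizes).
T = [[rng.randint(20, 200) for _ in range(4)] for _ in range(n_countries)]

# With the offset, log(mu_it) = log(T_it) + b_i, so mu_it = T_it * exp(b_i):
# exp(b_i) acts as the expected share of low-achieving students per school.
expected_low = [
    [math.exp(math.log(T_it) + b[i]) for T_it in T[i]]
    for i in range(n_countries)
]
```

Under this reading, the offset removes the effect of school size: two schools in the same country have the same expected proportion of low-achieving students, and a smaller $M$ concentrates the weights on fewer atoms, producing fewer clusters.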
Model
Where
The file create_sorted_map.R generates a world map where different clusters are represented using a gradient of colors. A greener color indicates higher educational performance.
In contrast, map.ipynb is a simplified version of the previous file. Instead of using a gradient, it assigns the same color to all countries within the same cluster.
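The difference between the two coloring schemes can be sketched in a few lines of Python. This is an illustration only and is not taken from create_sorted_map.R or map.ipynb: the country codes, cluster labels, scores, palette, and the red-to-green endpoints of the gradient are all assumptions.

```python
def hex_color(r, g, b):
    """Format an RGB triple (0-255 each) as a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


def gradient_colors(cluster_scores):
    """Rank clusters by score and shade them from red (lowest) to green (highest)."""
    ranked = sorted(cluster_scores, key=cluster_scores.get)
    n = len(ranked)
    colors = {}
    for rank, cluster in enumerate(ranked):
        frac = rank / (n - 1) if n > 1 else 1.0
        colors[cluster] = hex_color(round(255 * (1 - frac)), round(255 * frac), 0)
    return colors


def flat_colors(cluster_ids, palette):
    """Give each cluster one fixed palette color, with no ordering implied."""
    return {c: palette[k % len(palette)] for k, c in enumerate(sorted(cluster_ids))}


# Hypothetical clustering output: placeholder country codes and cluster labels.
country_cluster = {"AAA": 1, "BBB": 2, "CCC": 1, "DDD": 3}
# Hypothetical performance summary per cluster (higher is better).
cluster_scores = {1: 0.2, 2: 0.8, 3: 0.5}

by_gradient = gradient_colors(cluster_scores)
by_cluster = flat_colors(set(country_cluster.values()),
                         ["#1f77b4", "#ff7f0e", "#2ca02c"])

# Countries in the same cluster always receive the same color in both schemes.
gradient_map = {c: by_gradient[k] for c, k in country_cluster.items()}
flat_map = {c: by_cluster[k] for c, k in country_cluster.items()}
```

The gradient variant needs a per-cluster performance summary to order the clusters, whereas the flat variant only needs the cluster labels.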
On macOS, if you have Homebrew installed, run:
brew install --cask julia
Otherwise, run:
curl -fsSL https://install.julialang.org | sh
On Windows, run:
winget install julia -s msstore
On Linux, run:
curl -fsSL https://install.julialang.org | sh
Once Julia is installed, open a terminal, navigate to the project root, start the Julia REPL by running julia, and enter:
using Pkg
Pkg.activate("julia_environment")
Pkg.instantiate()
Exit the REPL via exit().
Navigate to the project root and run:
julia <fileName>.jl
Alternatively, you can run the files from VS Code or another IDE of your choice.