This repository contains two datasets that track student performance in Math and Portuguese courses. The datasets are provided in CSV format:
- student-mat.csv: Contains data for the Math course.
- student-por.csv: Contains data for the Portuguese course.
Both datasets share the same structure: 33 attributes describing personal, academic, and social factors that may influence a student's performance. The attributes are:
- school: Student's school (binary: "GP" = Gabriel Pereira, "MS" = Mousinho da Silveira).
- sex: Student's gender (binary: "F" = female, "M" = male).
- age: Student's age (numeric: 15-22 years).
- address: Home address type (binary: "U" = urban, "R" = rural).
- famsize: Family size (binary: "LE3" = ≤3, "GT3" = >3).
- Pstatus: Parents' cohabitation status (binary: "T" = living together, "A" = apart).
- Medu: Mother's education (numeric: 0 = none, 1 = primary (4th grade), 2 = 5th-9th grade, 3 = secondary education, 4 = higher education).
- Fedu: Father's education (numeric: same as Medu).
- Mjob: Mother's job (nominal: "teacher", "health", "services", "at_home", "other").
- Fjob: Father's job (nominal: same as Mjob).
- reason: Reason for choosing the school (nominal: "home" = close to home, "reputation" = school reputation, "course" = course preference, "other").
- guardian: Guardian (nominal: "mother", "father", "other").
- traveltime: Home to school travel time (numeric: 1 = <15 min., 2 = 15-30 min., 3 = 30 min.-1 hr, 4 = >1 hr).
- studytime: Weekly study time (numeric: 1 = <2 hrs, 2 = 2-5 hrs, 3 = 5-10 hrs, 4 = >10 hrs).
- failures: Number of past class failures (numeric: n if 1 ≤ n < 3, otherwise 4).
- schoolsup: Extra educational support (binary: "yes", "no").
- famsup: Family educational support (binary: "yes", "no").
- paid: Extra paid classes (binary: "yes", "no").
- activities: Participation in extracurricular activities (binary: "yes", "no").
- nursery: Attended nursery school (binary: "yes", "no").
- higher: Plans for higher education (binary: "yes", "no").
- internet: Internet access at home (binary: "yes", "no").
- romantic: In a romantic relationship (binary: "yes", "no").
- famrel: Quality of family relationships (numeric: from 1 = very bad to 5 = excellent).
- freetime: Free time after school (numeric: from 1 = very low to 5 = very high).
- goout: Frequency of going out with friends (numeric: from 1 = very low to 5 = very high).
- Dalc: Workday alcohol consumption (numeric: from 1 = very low to 5 = very high).
- Walc: Weekend alcohol consumption (numeric: from 1 = very low to 5 = very high).
- health: Current health status (numeric: from 1 = very bad to 5 = very good).
- absences: Number of school absences (numeric: 0-93).
- G1: First period grade (numeric: 0-20).
- G2: Second period grade (numeric: 0-20).
- G3: Final grade (numeric: 0-20; this is the output target).
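The documented ranges above can double as a quick sanity check on a loaded file. The following is a minimal sketch, not part of the repository: `NUMERIC_RANGES` and `find_out_of_range` are hypothetical names, only a subset of the numeric attributes is covered, and the sample rows are made up for illustration.

```python
import pandas as pd

# Ranges copied from the attribute list; only some numeric columns are checked.
NUMERIC_RANGES = {
    "age": (15, 22),
    "Medu": (0, 4),
    "Fedu": (0, 4),
    "traveltime": (1, 4),
    "studytime": (1, 4),
    "famrel": (1, 5),
    "absences": (0, 93),
    "G1": (0, 20),
    "G2": (0, 20),
    "G3": (0, 20),
}


def find_out_of_range(df: pd.DataFrame) -> dict:
    """Return {column: count of values outside the documented range}.

    Hypothetical helper: columns missing from df are skipped, and missing
    values (NaN) are counted as out of range.
    """
    problems = {}
    for col, (low, high) in NUMERIC_RANGES.items():
        if col not in df.columns:
            continue
        bad = ~df[col].between(low, high)  # between() is inclusive on both ends
        if bad.any():
            problems[col] = int(bad.sum())
    return problems


# Made-up rows for illustration only, not real records from the datasets.
sample = pd.DataFrame({"age": [15, 18, 30], "G3": [10, 21, 0], "studytime": [1, 2, 4]})
print(find_out_of_range(sample))  # {'age': 1, 'G3': 1}
```

An empty dictionary means every checked column stays within its documented range.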
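There are 382 students present in both datasets. Neither file contains a student identifier, so these students are matched by finding rows with identical values for the attributes that describe the student rather than the course (such as school, sex, age, and address). Matching them allows analysis across both subjects.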
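One way to do this matching is an inner join on the student-level columns. The sketch below is an illustration under assumptions: the `MATCH_KEYS` list and the `match_students` helper are hypothetical choices rather than something this repository defines, and the demo rows are made up.

```python
import pandas as pd

# Assumed matching keys: attributes that describe the student, not the course.
# Adjust this list if your analysis calls for a different notion of "same student".
MATCH_KEYS = [
    "school", "sex", "age", "address", "famsize", "Pstatus",
    "Medu", "Fedu", "Mjob", "Fjob", "reason", "nursery", "internet",
]


def match_students(math_df: pd.DataFrame, por_df: pd.DataFrame, keys=MATCH_KEYS) -> pd.DataFrame:
    """Inner-join the two frames on the key columns.

    Columns that exist in both frames but are not keys (for example G3)
    get the suffixes _mat and _por.
    """
    return math_df.merge(por_df, on=list(keys), how="inner", suffixes=("_mat", "_por"))


# Made-up rows for illustration only.
base = {
    "school": "GP", "sex": "F", "age": 16, "address": "U", "famsize": "GT3",
    "Pstatus": "T", "Medu": 4, "Fedu": 3, "Mjob": "teacher", "Fjob": "services",
    "reason": "course", "nursery": "yes", "internet": "yes",
}
math_demo = pd.DataFrame([{**base, "G3": 12}, {**base, "age": 17, "G3": 8}])
por_demo = pd.DataFrame([{**base, "G3": 14}, {**base, "sex": "M", "G3": 10}])

matched = match_students(math_demo, por_demo)
print(matched[["G3_mat", "G3_por"]])  # one matched student: 12 in Math, 14 in Portuguese
```

Because the keys are not guaranteed to be unique per student, two different students with identical key values would be paired with each other, so it is worth checking the merged row count before relying on it.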
To load and inspect the datasets, you can use the following Python code:
import pandas as pd
# Load datasets
math_df = pd.read_csv('student-mat.csv')
por_df = pd.read_csv('student-por.csv')
# Display the first few rows
print(math_df.head())
print(por_df.head())
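A possible preprocessing step after loading is to turn the text-valued attributes into numbers. The sketch below is one option among several, not a method prescribed by this repository: it maps the "yes"/"no" columns to 1/0 and one-hot encodes the nominal columns with `pd.get_dummies`. The helper name `encode_features` and the demo rows are hypothetical.

```python
import pandas as pd

# Column groups taken from the attribute list.
BINARY_YES_NO = ["schoolsup", "famsup", "paid", "activities",
                 "nursery", "higher", "internet", "romantic"]
NOMINAL = ["Mjob", "Fjob", "reason", "guardian"]


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with yes/no columns as 1/0 and nominal columns one-hot encoded."""
    out = df.copy()
    for col in BINARY_YES_NO:
        if col in out.columns:
            out[col] = out[col].map({"yes": 1, "no": 0})
    present = [col for col in NOMINAL if col in out.columns]
    if present:
        # Each nominal column is replaced by one indicator column per category.
        out = pd.get_dummies(out, columns=present)
    return out


# Made-up rows for illustration only.
demo = pd.DataFrame({
    "higher": ["yes", "no"],
    "Mjob": ["teacher", "other"],
    "G3": [15, 9],
})
encoded = encode_features(demo)
print(encoded.columns.tolist())
```

The other two-valued columns (school, sex, address, famsize, Pstatus) are left untouched here; they could be mapped in the same way with their own value dictionaries.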