This repository contains the NBA positions dataset. The main dataset is
designed to replace the iris dataset and contains basic statistics about
150 NBA Centers, Point Guards and Shooting Guards from 2017. There is
also an expanded dataset (nba_positions_full) that includes the same
statistics for all NBA players in 2017.
The source file for creating this dataset was obtained from Kaggle at https://www.kaggle.com/drgilermo/nba-players-stats/data?select=Seasons_Stats.csv and was originally scraped from Basketball Reference.
If you use this dataset, please acknowledge the Kaggle dataset and Basketball Reference (following the instructions here).
Here is a brief description of how the dataset was created; see nba_positions.R for details:
- Read the Kaggle dataset
- Select rows where Year == 2017
- Select columns containing player, team, games played, positions, turnover percentage, rebound percentage, assist percentage and field goal percentage
- Scale field goal percentage to the range 0-100 to match other statistics (this is the nba_positions_full dataset)
- Summarise statistics for players who played on multiple teams in 2017
- Select Centers, Point Guards and Shooting Guards who played more than 10 games
- Cluster the players using k-means with three clusters
- Select 50 Centers that are most like their cluster, 50 random Point Guards and 50 random Shooting Guards
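
The first few steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a port of nba_positions.R: the header names (Pos, Tm, G, TOV%, TRB%, AST%, FG%), the position codes (C, PG, SG) and every sample row are made up for the example and have not been checked against the Kaggle file. The multi-team summary, the k-means clustering and the final selection of 150 players are left out.

```python
import csv
import io

# Made-up rows in an assumed layout. The header names are assumptions and
# the real Kaggle file contains many more columns.
SAMPLE_CSV = """Year,Player,Pos,Tm,G,TOV%,TRB%,AST%,FG%
2016,Player A,C,AAA,70,12.0,18.5,8.0,0.550
2017,Player B,C,BBB,65,13.5,19.0,7.5,0.520
2017,Player C,PG,CCC,8,15.0,5.0,30.0,0.410
2017,Player D,SG,DDD,72,10.0,6.5,12.0,0.445
2017,Player E,PF,EEE,60,11.0,14.0,9.0,0.480
"""

KEEP_POSITIONS = {"C", "PG", "SG"}  # assumed position codes


def prepare(rows):
    """Keep 2017 rows, select the columns of interest, scale FG% to 0-100."""
    out = []
    for row in rows:
        if int(float(row["Year"])) != 2017:
            continue
        out.append({
            "Player": row["Player"],
            "Team": row["Tm"],
            "Games": int(row["G"]),
            "Position": row["Pos"],
            "TurnoverPct": float(row["TOV%"]),
            "ReboundPct": float(row["TRB%"]),
            "AssistPct": float(row["AST%"]),
            # Stored as a fraction in the sample, so multiply to match the
            # 0-100 scale of the other statistics.
            "FieldGoalPct": float(row["FG%"]) * 100,
        })
    return out


def select_candidates(players, min_games=10):
    """Keep the three positions of interest with more than min_games games."""
    return [
        p for p in players
        if p["Position"] in KEEP_POSITIONS and p["Games"] > min_games
    ]


full = prepare(csv.DictReader(io.StringIO(SAMPLE_CSV)))
candidates = select_candidates(full)
print([p["Player"] for p in candidates])
```

With the sample rows, only the 2017 Center and Shooting Guard with more than 10 games remain as candidates.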
There are two TSV files. The nba_positions.tsv file contains the
dataset most similar to the iris data.
- nba_positions.tsv: 150 selected NBA players with these columns:
  - Position - playing position (“Center”, “PointGuard” or “ShootingGuard”)
  - TurnoverPct - turnover percentage
  - ReboundPct - rebound percentage
  - AssistPct - assist percentage
  - FieldGoalPct - field goal percentage
- nba_positions_full.tsv: all NBA players in 2017 with these additional columns:
  - Player - player name
  - Team - team name
  - Games - games played
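
As a rough illustration of working with these files, the sketch below reads tab-separated rows and averages one statistic per position using only the Python standard library. It assumes the TSV files start with a header row using the column names listed above; the values in the sample are made up and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Made-up values laid out with the columns listed above. To read the real
# file, replace the StringIO with open("nba_positions.tsv", newline="").
SAMPLE_TSV = (
    "Position\tTurnoverPct\tReboundPct\tAssistPct\tFieldGoalPct\n"
    "Center\t13.0\t18.0\t8.0\t55.0\n"
    "Center\t15.0\t20.0\t6.0\t51.0\n"
    "PointGuard\t16.0\t5.0\t30.0\t42.0\n"
    "ShootingGuard\t10.0\t6.0\t12.0\t44.0\n"
)


def mean_by_position(rows, column):
    """Return the mean of one numeric column for each playing position."""
    totals = defaultdict(lambda: [0.0, 0])  # position -> [sum, count]
    for row in rows:
        entry = totals[row["Position"]]
        entry[0] += float(row[column])
        entry[1] += 1
    return {pos: total / n for pos, (total, n) in totals.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
means = mean_by_position(rows, "ReboundPct")
print(means)
```

The same function works for any of the four percentage columns by changing the column argument.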
Example plots showing distributions of the different statistics.