Comparative Analysis of Machine Learning Models for DDoS Attack Detection

Authors:
Jeremiah Pitts, Betim Hodza, Ilhan Gelle, and Abinash Bastola
The University of Texas at Arlington, Team Bytewise

Abstract

The global connectivity of the Internet demands robust network security, which makes Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) crucial. Traditional IDS/IPS often struggle with real-time threat detection because they rely on predefined rules and produce high false-positive rates. Machine learning (ML) offers a promising alternative by enabling real-time detection and classification of malicious traffic. This project evaluates ML models on the UNSW-NB15 dataset, which contains diverse real-world traffic characteristics and multiple attack categories. Data preprocessing techniques, such as feature selection, normalization, and handling of class imbalance, are applied to improve model performance. The goal is to assess how well various classification algorithms differentiate between normal and malicious traffic.

Overview

This repository contains a Python-based machine learning pipeline for detecting DDoS attacks. The code preprocesses the UNSW-NB15 dataset, balances classes with SMOTE, and trains four models (Decision Tree, Random Forest, Extra Trees, and XGBoost). A stacking classifier is built from these base models, and performance is evaluated using accuracy and weighted F1 scores. In addition, live network traffic can be monitored with PyShark, and, if enabled, suspicious IPs are automatically blocked using OS-specific commands.
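
As a rough illustration of the stacking step described above, the following minimal sketch trains three of the tree-based base models and a stacking classifier with scikit-learn, then reports accuracy and weighted F1. It is not the code in main.py: the synthetic data, the hyperparameters, and the logistic-regression final estimator are assumptions, and XGBoost and SMOTE are left out to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed UNSW-NB15 features (assumption).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Fit the scaler on the training split only, then reuse it for the test split.
scaler = RobustScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Base models; hyperparameters here are illustrative, not tuned values.
base_models = [
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=25, random_state=0)),
]

# The final estimator is an assumption; the README does not name one.
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=3)
stack.fit(X_train_s, y_train)

pred = stack.predict(X_test_s)
accuracy = accuracy_score(y_test, pred)
weighted_f1 = f1_score(y_test, pred, average="weighted")
print(f"accuracy={accuracy:.3f} weighted_f1={weighted_f1:.3f}")
```

Fitting the scaler on the training split alone avoids leaking test-set statistics into the model, which matters when the reported scores are used to compare classifiers.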

Furthermore, the pipeline exports several CSV files (feature distributions, class distributions, hyperparameter tuning results, classifier performance metrics, and confusion matrix data) for external visualization using SAS.
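
To give a sense of what one of these exports might look like, here is a small sketch that writes a class distribution to CSV using only the standard library. The function name and the column headers ("class", "count") are hypothetical; the actual layout produced by main.py may differ.

```python
import csv
from collections import Counter


def export_class_distribution(labels, path):
    """Write one row per class label with its count (hypothetical layout)."""
    counts = Counter(labels)
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["class", "count"])  # assumed header names
        for label, count in sorted(counts.items()):
            writer.writerow([label, count])
    return counts
```

A call such as export_class_distribution(y_labels, "class_distribution.csv"), where y_labels is any iterable of label strings, would produce a file that SAS can import with a header row.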

File Structure

- main.py: The main pipeline code. It contains functions for data preprocessing, model training/evaluation, live DDoS detection/prevention, and CSV export routines for SAS visualization.
- trained_feature_names.pkl: Pickle file storing the list of features used by the model (exported from the training pipeline).
- trained_stacking_model.pkl: Pickle file for the final stacking classifier.
- trained_scaler.pkl: Pickle file for the RobustScaler used to scale the features.
- trained_label_encoder.pkl: Pickle file for the LabelEncoder used for the target column.
- CSV files for SAS visualization, generated when running the training pipeline with the CSV export functions:
  - feature_distribution.csv
  - class_distribution.csv
  - hyperparameter_tuning.csv
  - classifier_performance.csv
  - confusion_matrix.csv
- generate_figures.sas: A SAS script (using relative paths) that imports the above CSV files to generate figures (histograms, bar charts, heatmaps, etc.) and save them as an HTML file.
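
The saved artifacts could be reloaded for inference along the following lines. This is a sketch under the assumption that the .pkl files were written with joblib (which appears in the requirements); the helper name load_artifacts and the dictionary keys are hypothetical.

```python
from pathlib import Path

import joblib

# File names as listed in this README; the dictionary keys are illustrative.
ARTIFACT_FILES = {
    "feature_names": "trained_feature_names.pkl",
    "model": "trained_stacking_model.pkl",
    "scaler": "trained_scaler.pkl",
    "label_encoder": "trained_label_encoder.pkl",
}


def load_artifacts(model_dir="."):
    """Load the four training artifacts from model_dir into a dict."""
    base = Path(model_dir)
    return {name: joblib.load(base / filename)
            for name, filename in ARTIFACT_FILES.items()}
```

With the artifacts loaded, a typical scikit-learn flow would order the input columns by the stored feature names, apply the scaler's transform, call the model's predict, and map predictions back to labels with the encoder's inverse_transform. Only load pickle files from sources you trust, since unpickling can execute arbitrary code.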

Requirements

Python 3.x
Packages:
- pandas
- numpy
- scikit-learn
- xgboost
- imbalanced-learn
- pyshark
- joblib
- matplotlib
- seaborn
A SAS environment (SAS Studio, Enterprise Guide, or similar) to run the SAS visualization script.

Installation

Clone the repository or download the files.

Install the required Python packages using pip:

pip install pandas numpy scikit-learn xgboost imbalanced-learn pyshark joblib matplotlib seaborn

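Place the UNSW-NB15 dataset CSV files (UNSW_NB15_training-set.csv and UNSW_NB15_testing-set.csv) where the pipeline can read them. The training command below expects them under ./datasets/; adjust --train_file and --test_file if you store them elsewhere.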

Usage

Training and CSV Export

To train the model and generate the CSV files for SAS visualization, run:

python main.py --action train --train_file ./datasets/UNSW_NB15_training-set.csv --test_file ./datasets/UNSW_NB15_testing-set.csv

This command will:

Preprocess the data.
Train the base models and a stacking classifier.
Evaluate the models (printing accuracy, weighted F1 scores, and confusion matrices).
Export the following CSV files to the working directory:
- feature_distribution.csv
- class_distribution.csv
- hyperparameter_tuning.csv
- classifier_performance.csv
- confusion_matrix.csv
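
The accuracy and weighted F1 scores printed during evaluation can both be derived from a confusion matrix. The sketch below shows the arithmetic in plain Python, assuming the scikit-learn convention that rows are true classes and columns are predicted classes; the example matrix is made up and is not a result from this project.

```python
def accuracy_from_cm(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def weighted_f1_from_cm(cm):
    """Per-class F1 averaged with weights equal to each class's support."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    score = 0.0
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])                      # true samples of class k
        predicted = sum(cm[i][k] for i in range(n))  # samples predicted as k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * support
    return score / total


# Illustrative two-class matrix (rows: true, columns: predicted).
example_cm = [[90, 10],
              [5, 95]]
print(accuracy_from_cm(example_cm))
print(weighted_f1_from_cm(example_cm))
```

Weighting by support means a class with more test samples contributes more to the final score, which is why weighted F1 can differ noticeably from macro F1 on imbalanced data.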

Live Monitoring / Prevention

To run the live network monitoring (detection or prevention mode), use the following command (replace Wi-Fi with your network interface name if needed):

python main.py --action monitor --ddos_mode prevention --interface "Wi-Fi" --duration 30 --port 8000
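
In prevention mode, suspicious IPs are blocked with OS-specific commands. The README does not list those commands, so the sketch below is only a plausible illustration: it assumes iptables on Linux and netsh advfirewall on Windows, and the function name build_block_command is hypothetical. It builds the command as an argument list without executing it.

```python
import ipaddress
import platform


def build_block_command(ip, system=None):
    """Return an argument list that would block inbound traffic from ip.

    The commands are assumptions, not taken from main.py.
    """
    # ip_address raises ValueError for malformed input, which keeps
    # untrusted strings out of the command line.
    addr = str(ipaddress.ip_address(ip))
    system = system or platform.system()
    if system == "Linux":
        return ["iptables", "-A", "INPUT", "-s", addr, "-j", "DROP"]
    if system == "Windows":
        return ["netsh", "advfirewall", "firewall", "add", "rule",
                f"name=Block {addr}", "dir=in", "action=block",
                f"remoteip={addr}"]
    raise NotImplementedError(f"no blocking command sketched for {system!r}")


print(build_block_command("203.0.113.7", system="Linux"))
```

Passing the resulting list to subprocess.run (without shell=True) would apply the rule; both commands require administrator or root privileges, so the monitor would need to run with elevated rights for blocking to take effect.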

SAS Visualization

After the CSV files are generated by the training pipeline, run the SAS script to generate figures:

Ensure the CSV files and generate_figures.sas are in the same directory.
Open your SAS environment (e.g., SAS Studio or Enterprise Guide).
Open the generate_figures.sas script.
Run the script. It will produce an HTML file (Figures.html) with all the generated figures.

Research Paper

This code supports the research paper titled:

Comparative Analysis of Machine Learning Models for DDoS Attack Detection
Jeremiah Pitts, Betim Hodza, Ilhan Gelle, and Abinash Bastola
The University of Texas at Arlington, Team Bytewise

The paper details the challenges of traditional IDS/IPS, the methodology used (including data preprocessing, model training, and hyperparameter tuning), and the experimental results comparing multiple ML models using the UNSW-NB15 dataset. Please refer to the paper for detailed analysis, tables, and figures that summarize our findings.

Contact Information

Jeremiah Pitts: jnp2934@mavs.uta.edu
Ilhan Gelle: ilhan.gelle@mavs.uta.edu
Betim Hodza: bxh8702@mavs.uta.edu
Abinash Bastola: axb9775@mavs.uta.edu

Additional contact details are available upon request.

License

This project is licensed under the MIT License.
