xchar08/curiosity-cup


Comparative Analysis of Machine Learning Models for DDoS Attack Detection

Authors:
Jeremiah Pitts, Betim Hodza, Ilhan Gelle, and Abinash Bastola
The University of Texas at Arlington, Team Bytewise


Abstract

The global connectivity of the Internet demands robust network security, which makes Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) crucial. Traditional IDS/IPS often struggle with real-time threat detection because they rely on predefined rules and produce many false positives. Machine learning (ML) offers a promising alternative by enabling real-time detection and classification of malicious traffic. This project evaluates ML models on the UNSW-NB15 dataset, which contains diverse real-world traffic characteristics and multiple attack categories. Data preprocessing techniques (such as feature selection, normalization, and handling of class imbalance) are applied to improve model performance. The goal is to assess how well various classification algorithms can differentiate between normal and malicious traffic.
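To make the class-imbalance step concrete, here is a minimal pure-Python sketch of the interpolation idea behind SMOTE, the oversampling method named in the Overview. It is an illustration only: the pipeline itself lists imbalanced-learn as a dependency, and the function name `smote_like_oversample` and its parameters are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself is excluded)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_oversample(minority_samples, n_new=6)
print(len(new_points), "synthetic minority samples created")
```

Because every synthetic point lies on a segment between two existing minority samples, the new data stays inside the region the minority class already occupies rather than duplicating rows verbatim.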


Overview

This repository contains a Python-based machine learning pipeline for detecting DDoS attacks. The code preprocesses the UNSW-NB15 dataset, balances classes with SMOTE, and trains four models (Decision Tree, Random Forest, Extra Trees, and XGBoost). A stacking classifier is built from these base models, and performance is evaluated using accuracy and weighted F1 scores. In addition, live network traffic can be monitored with PyShark, and, if enabled, suspicious IPs are automatically blocked using OS-specific commands.
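The stacking step can be sketched roughly as follows with scikit-learn. This is not the code from main.py: it uses synthetic data instead of UNSW-NB15, leaves out SMOTE and the XGBoost base model to stay short, and the logistic-regression meta-learner and all hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed dataset (assumption, not UNSW-NB15).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Three of the four base models named above; an xgboost.XGBClassifier
# could be appended to this list as the fourth.
base_models = [
    ("decision_tree", DecisionTreeClassifier(random_state=0)),
    ("random_forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("extra_trees", ExtraTreesClassifier(n_estimators=50, random_state=0)),
]

# The meta-learner is trained on cross-validated predictions of the base models.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
    cv=3,
)
stack.fit(X_train, y_train)

predictions = stack.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
f1_weighted = f1_score(y_test, predictions, average="weighted")
print(f"accuracy={accuracy:.3f} weighted_f1={f1_weighted:.3f}")
```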

Furthermore, the pipeline exports several CSV files (feature distributions, class distributions, hyperparameter tuning results, classifier performance metrics, and confusion matrix data) for external visualization using SAS.
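As an illustration of what such an export might look like, the sketch below writes confusion-matrix counts to a CSV file using only the standard library. The long-format column layout (actual, predicted, count) and the function name `export_confusion_matrix` are assumptions; the files produced by main.py may be laid out differently.

```python
import csv
from collections import Counter


def export_confusion_matrix(y_true, y_pred, path="confusion_matrix.csv"):
    """Write one row per (actual, predicted) label pair with its count."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    # Include zero-count pairs so a heatmap gets a complete grid.
    rows = [(a, p, counts[(a, p)]) for a in labels for p in labels]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["actual", "predicted", "count"])  # assumed header names
        writer.writerows(rows)
    return rows
```

A long format with one row per cell tends to be convenient for heatmap procedures, since each row maps directly to one tile.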


File Structure

  • main.py
    The main pipeline code. It contains functions for data preprocessing, model training/evaluation, live DDoS detection/prevention, and CSV export routines for SAS visualization.

  • trained_feature_names.pkl
    Pickle file storing the list of features used by the model (exported from the training pipeline).

  • trained_stacking_model.pkl
    Pickle file for the final stacking classifier.

  • trained_scaler.pkl
    Pickle file for the RobustScaler used to scale the features.

  • trained_label_encoder.pkl
    Pickle file for the LabelEncoder used for the target column.

  • CSV Files for SAS Visualization:
    These files are generated when running the training pipeline with the CSV export functions:

    • feature_distribution.csv
    • class_distribution.csv
    • hyperparameter_tuning.csv
    • classifier_performance.csv
    • confusion_matrix.csv

  • generate_figures.sas
    A SAS script (using relative paths) that imports the above CSV files to generate figures (histograms, bar charts, heatmaps, etc.) and save them as an HTML file.
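The four .pkl files are meant to be used together at inference time. The sketch below shows one plausible way to combine them; it assumes the files were written with joblib.dump (joblib is listed in the requirements, but the README does not state how they were saved), and the helper names `load_artifacts` and `predict_labels` are hypothetical.

```python
from pathlib import Path

ARTIFACTS = {
    "feature_names": "trained_feature_names.pkl",
    "scaler": "trained_scaler.pkl",
    "model": "trained_stacking_model.pkl",
    "label_encoder": "trained_label_encoder.pkl",
}


def load_artifacts(directory="."):
    """Load the four artifacts; assumes they were saved with joblib.dump."""
    import joblib  # imported lazily so predict_labels has no third-party dependency

    return {key: joblib.load(Path(directory) / name) for key, name in ARTIFACTS.items()}


def predict_labels(records, feature_names, scaler, model, label_encoder):
    """Classify records given as a list of dicts keyed by feature name."""
    # Arrange each record in the column order used during training.
    # Missing features default to 0.0, which is an assumption of this sketch.
    matrix = [[float(r.get(name, 0.0)) for name in feature_names] for r in records]
    scaled = scaler.transform(matrix)
    encoded = model.predict(scaled)
    return list(label_encoder.inverse_transform(encoded))
```

With the real files in place, `predict_labels(records, **load_artifacts("."))` would return the decoded class label for each record. Only load pickle files from sources you trust, since unpickling can execute arbitrary code.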


Requirements

  • Python 3.x
  • Packages:
    • pandas
    • numpy
    • scikit-learn
    • xgboost
    • imbalanced-learn
    • pyshark
    • joblib
    • matplotlib
    • seaborn
  • A SAS environment (SAS Studio, Enterprise Guide, or similar) to run the SAS visualization script.

Installation

  1. Clone the repository or download the files.

  2. Install the required Python packages using pip:

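    pip install pandas numpy scikit-learn xgboost imbalanced-learn pyshark joblib matplotlib seaborn

  3. Ensure the UNSW-NB15 dataset CSV files (UNSW_NB15_training-set.csv and UNSW_NB15_testing-set.csv) are available locally. The training command in the Usage section expects them under ./datasets/; if they are stored elsewhere, pass their paths with --train_file and --test_file.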


Usage

Training and CSV Export

To train the model and generate the CSV files for SAS visualization, run:

python main.py --action train --train_file ./datasets/UNSW_NB15_training-set.csv --test_file ./datasets/UNSW_NB15_testing-set.csv

This command will:

  • Preprocess the data.
  • Train the base models and a stacking classifier.
  • Evaluate the models (printing accuracy, weighted F1 scores, and confusion matrices).
  • Export the following CSV files to the working directory:
    • feature_distribution.csv
    • class_distribution.csv
    • hyperparameter_tuning.csv
    • classifier_performance.csv
    • confusion_matrix.csv
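For readers unfamiliar with the weighted F1 score reported in the evaluation step, the following standard-library sketch computes it by hand: each class's F1 is weighted by its share of the true labels. The function name `weighted_f1` is illustrative; the pipeline itself presumably relies on scikit-learn's metrics.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1 scores, weighting each class by its support."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label in set(y_true) | set(y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        score += f1 * support[label] / total
    return score


# A classifier that predicts "normal" for everything on a 9:1 imbalanced sample.
y_true = ["normal"] * 9 + ["attack"]
y_pred = ["normal"] * 10
print(round(weighted_f1(y_true, y_pred), 4))
```

In this example accuracy is 0.90 while the weighted F1 is about 0.85, because the missed attack class contributes an F1 of zero. This is why the metric is reported alongside accuracy on imbalanced traffic data.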

Live Monitoring / Prevention

To run live network monitoring in detection or prevention mode, use the following command, replacing "Wi-Fi" with the name of your network interface if it differs:

python main.py --action monitor --ddos_mode prevention --interface "Wi-Fi" --duration 30 --port 8000
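In prevention mode, suspicious IPs are blocked with OS-specific commands. The README does not show which commands main.py issues, so the sketch below is only one plausible approach: it builds (but does not run) a `netsh advfirewall` rule on Windows or an `iptables` rule on Linux. The function name `build_block_command` and the rule naming are assumptions.

```python
import ipaddress
import platform


def build_block_command(ip, system=None):
    """Return an argument list for a firewall command that drops traffic from ip."""
    # Validating the address first keeps arbitrary strings out of the command.
    address = str(ipaddress.IPv4Address(ip))
    system = system or platform.system()
    if system == "Windows":
        return [
            "netsh", "advfirewall", "firewall", "add", "rule",
            f"name=Block {address}", "dir=in", "action=block",
            f"remoteip={address}",
        ]
    if system == "Linux":
        return ["iptables", "-A", "INPUT", "-s", address, "-j", "DROP"]
    raise NotImplementedError(f"no blocking command sketched for {system!r}")


print(build_block_command("192.0.2.10", system="Linux"))
```

The returned list could be passed to `subprocess.run(command, check=True)` without a shell. Both commands require administrator or root privileges, so prevention mode would typically need an elevated terminal.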

SAS Visualization

After the CSV files are generated by the training pipeline, run the SAS script to generate figures:

  1. Ensure the CSV files and generate_figures.sas are in the same directory.
  2. Open your SAS environment (e.g., SAS Studio or Enterprise Guide).
  3. Open the generate_figures.sas script.
  4. Run the script. It will produce an HTML file (Figures.html) with all the generated figures.

Research Paper

This code supports the research paper titled:

Comparative Analysis of Machine Learning Models for DDoS Attack Detection
Jeremiah Pitts, Betim Hodza, Ilhan Gelle, and Abinash Bastola
The University of Texas at Arlington, Team Bytewise

The paper details the challenges of traditional IDS/IPS, the methodology used (including data preprocessing, model training, and hyperparameter tuning), and the experimental results comparing multiple ML models using the UNSW-NB15 dataset. Please refer to the paper for detailed analysis, tables, and figures that summarize our findings.


Contact Information

Additional contact details are available upon request.


License

This project is licensed under the MIT License.
