Resume-Classification-

Resume Classifier Project

This project develops machine learning models that classify resumes into job categories. It compares several algorithms: Support Vector Machine (SVM), Multinomial Naive Bayes, Random Forest, and DistilBERT.

Installation

Clone the repository:

```
git clone https://github.com/YourUsername/Resume-Classification.git
cd Resume-Classification
```

Create a virtual environment:

```
python3 -m venv venv
source venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Data

The dataset for this project was collected by scraping resume data from LiveCareer.com. It contains approximately 8,350 resumes across 10 job categories, including Python Developer, Java Developer, Web Developer, Database Administrator, Security Analyst, Project Manager, Frontend Developer, Network Administrator, and Software Developer.

Data Collection Process

Resumes are scraped using Selenium with the Chrome WebDriver. The process involves navigating to specific URLs constructed for each job category and extracting links to individual resumes. Here’s a brief outline of the steps involved in the scraping process:

Set up Selenium WebDriver: Configure Selenium with ChromeDriver to interact with web pages.

Navigate through job categories: For each category, generate URLs to navigate through pages of listings on LiveCareer.com.

Extract resume links: Collect the href attribute of each resume link on the listing pages.

Visit each resume link: For each extracted link, navigate to the corresponding page to access the full resume.

Extract resume text: Parse the HTML content of each resume page to extract the textual data.

Store data: Save the collected resume texts and their corresponding categories to a DataFrame, then export to a CSV file named Resume.csv.

This scraping process yields a diverse and extensive dataset that represents various sectors of the job market and is suitable for training our classification models.
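The link and text extraction steps in the outline above can be sketched offline as follows. This is a minimal illustration, not the project's actual scraper: it parses HTML with Python's standard `html.parser` instead of Selenium element lookups, and the function names, the sample HTML, and the `/resume/` link filter are all assumptions made for the example. `fetch_html` accepts any Selenium-style driver that exposes `get()` and `page_source`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (the 'extract resume links' step)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def fetch_html(driver, url):
    """Load a page with a Selenium-style driver and return its HTML."""
    driver.get(url)
    return driver.page_source


def extract_links(html, must_contain="/resume/"):
    # The "/resume/" filter is a hypothetical pattern, not LiveCareer's real URL scheme.
    parser = LinkCollector()
    parser.feed(html)
    return [link for link in parser.links if must_contain in link]


def extract_text(html):
    parser = TextCollector()
    parser.feed(html)
    return " ".join(parser.chunks)


# Invented sample pages standing in for a listing page and a resume page.
listing = '<a href="/resume/123">Python Dev</a><a href="/about">About</a>'
page = "<html><style>p{}</style><body><h1>Jane Doe</h1><p>Python developer</p></body></html>"
print(extract_links(listing))
print(extract_text(page))
```

In a real run, the HTML strings would come from `fetch_html(driver, url)` with a Chrome WebDriver instance, and each extracted text would be paired with the category of the listing it came from.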

Data Structure

The resulting dataset comprises columns for Resume text and the corresponding Category. It's stored in a CSV file to facilitate easy access and manipulation for training machine learning models.
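A quick sanity check of this two-column layout might look like the sketch below. The column names `Resume` and `Category` are assumed from the description above, and the sample rows are invented for illustration. The standard `csv` module is used so the snippet runs without extra dependencies.

```python
import csv
import io
from collections import Counter


def category_counts(csv_file):
    """Count resumes per category; rows with an empty resume text are skipped."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        if row["Resume"].strip():
            counts[row["Category"]] += 1
    return counts


# Invented sample standing in for Resume.csv.
sample = io.StringIO(
    "Resume,Category\n"
    '"Built REST APIs with Flask, Django",Python Developer\n'
    '"Managed Oracle backups",Database Administrator\n'
    '"Wrote ETL jobs in pandas",Python Developer\n'
)
counts = category_counts(sample)
print(counts.most_common())
```

For the real file, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points at wherever Resume.csv was saved. Uneven counts across categories are worth knowing about before training, since they affect how accuracy should be read.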

Usage

Prepare the dataset by placing it in the data/ directory.
Run the preprocessing script:
```
python preprocess.py
```
Train the model:
```
python train.py
```

Make predictions:
```
python predict.py --resume path/to/resume.pdf
```

Model Training

The model is trained using scikit-learn and TensorFlow.
The train.py script can be configured to adjust hyperparameters.

This project includes several classification models:

Support Vector Machine (SVM): A margin-based classifier that tends to perform well on high-dimensional, sparse text features.

Multinomial Naive Bayes: Effective for word counts or frequency data.

Random Forest: An ensemble of decision trees that serves as a solid baseline for complex classification tasks.

DistilBERT: Not implemented in the project's current scope but recommended for future scaling to handle contextual embeddings from text data.
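To make the word-count idea behind Multinomial Naive Bayes concrete, here is a small pure-Python sketch with Laplace (add-one) smoothing. It is an illustration only: the project trains its models with scikit-learn, and the class name `TinyMultinomialNB` and the four training snippets are invented for this example.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes on word counts with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        # Words never seen during training carry no information and are dropped.
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(class) + sum of log P(word | class)
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data; real resumes are far longer and noisier.
train_texts = [
    "python django flask pandas",
    "python scripts automation pytest",
    "java spring hibernate maven",
    "java jvm spring microservices",
]
train_labels = ["Python Developer", "Python Developer", "Java Developer", "Java Developer"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
print(model.predict("Experienced with Python and Flask"))
print(model.predict("Spring and Maven on the JVM"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, which matters for documents as long as resumes.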

Performance

The models are evaluated on accuracy, precision, recall, and F1-score. DistilBERT reached the highest accuracy at 93%, followed by SVM at 88%, Random Forest at 83%, and Multinomial Naive Bayes at 79%.
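For reference, the four metrics can be computed from true and predicted labels as in the sketch below. It is a plain-Python illustration with invented labels, not the project's evaluation code; in a scikit-learn workflow, `sklearn.metrics.classification_report` reports the same per-class quantities.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented labels: one Python resume is misclassified as Java.
y_true = ["Python", "Python", "Java", "Java", "Web"]
y_pred = ["Python", "Java", "Java", "Java", "Web"]

print(accuracy(y_true, y_pred))
for label, (p, r, f1) in per_class_scores(y_true, y_pred).items():
    print(label, round(p, 2), round(r, 2), round(f1, 2))
```

In this toy case the single error lowers recall for Python and precision for Java, which is why per-class precision and recall are reported alongside overall accuracy.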

Contributing

Contributions are welcome! Please follow these steps:

Fork the repository.
Create a new branch.
Submit a pull request.
