Natural Language Processing (NLP) Projects

Welcome to my repository for ECE 467: Natural Language Processing. Here, you'll find my projects exploring various NLP techniques and algorithms.

Project: Text Categorization using Naïve Bayes

In this project, I've implemented a text categorization system using the Naïve Bayes algorithm. The system tokenizes each article, removes stopwords, and applies stemming. From the training data it then estimates a prior probability for each category and a likelihood for each word given a category. These probabilities are used to assign each test article to one of the predefined categories.
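
As a sketch of the decision rule these probabilities feed into (this is the standard multinomial Naïve Bayes formulation, and not necessarily the exact form used in the script), an article with tokens $w_1, \dots, w_n$ is assigned the category:

```latex
\hat{c} = \arg\max_{c \in C} \left[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \right]
```

Here $P(c)$ is the prior of category $c$, $P(w_i \mid c)$ is the likelihood of token $w_i$ under that category, and $C$ is the set of predefined categories. Summing log-probabilities instead of multiplying raw probabilities is a common way to avoid floating-point underflow on long articles.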

Features:

Tokenization: Splits text into individual words, removing punctuation.
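Stopword Removal: Eliminates common words (such as "the" and "of") that carry little information about a text's category.
Stemming: Reduces words to their stems so that inflected forms of a word are counted together.
Smoothing Techniques: Implements Laplace (additive) and Jelinek-Mercer smoothing so that words unseen in a category's training data do not receive zero probability.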
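
The preprocessing steps listed above can be sketched roughly as follows. This is an illustration only, not the code in CHI_naive_bayes.py: the stopword list, the suffix list, the lowercasing step, and all function names are hypothetical, and the toy stemmer stands in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on"}

# Hypothetical suffixes for a toy stemmer (a stand-in for a real stemmer such as Porter's).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and keep runs of letters, which drops punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stopwords, then stem, in that order."""
    return [toy_stem(t) for t in remove_stopwords(tokenize(text))]


print(preprocess("The markets are rising, and traders cheered."))
# prints ['market', 'ris', 'trader', 'cheer']
```

Note that stems such as "ris" need not be real words; what matters for classification is that related forms map to the same token.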

Performance:

The system was tested on three different corpora, and accuracy varied from corpus to corpus. Laplace smoothing with a constant alpha of 0.058 was chosen based on its overall performance across them.
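
To make the two smoothing options concrete, here is a minimal sketch of how each could estimate P(word | category) from raw counts. Only the alpha value of 0.058 comes from this README; the function names, the toy counts, and the Jelinek-Mercer weight `lam` are assumptions made for illustration.

```python
from collections import Counter

ALPHA = 0.058  # smoothing constant reported in this README


def laplace_likelihood(word, class_counts, vocab_size, alpha=ALPHA):
    """Additive smoothing: add alpha to every word count in the category."""
    total = sum(class_counts.values())
    return (class_counts[word] + alpha) / (total + alpha * vocab_size)


def jelinek_mercer_likelihood(word, class_counts, corpus_counts, lam=0.5):
    """Interpolate the category estimate with a corpus-wide estimate.

    lam is a hypothetical mixing weight, not a value from the project.
    """
    class_total = sum(class_counts.values())
    corpus_total = sum(corpus_counts.values())
    p_class = class_counts[word] / class_total if class_total else 0.0
    p_corpus = corpus_counts[word] / corpus_total if corpus_total else 0.0
    return lam * p_class + (1 - lam) * p_corpus


# Toy word counts per category (made up for this example).
sports = Counter({"goal": 3, "team": 2})
politics = Counter({"vote": 4, "team": 1})
corpus = sports + politics
vocab_size = len(corpus)

# "vote" never occurs in the sports counts, yet both estimates stay above zero.
print(laplace_likelihood("vote", sports, vocab_size))
print(jelinek_mercer_likelihood("vote", sports, corpus))
```

A small alpha such as 0.058 shifts far less probability mass toward unseen words than the classic add-one variant does.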

Usage:

To use this text categorization system, follow these steps:

Place CHI_naive_bayes.py in the /TC_provided directory of your project.
Run the script and, when prompted, enter the names of the training and test files.
The program writes a file of predicted labels, which can be compared against the true labels to measure accuracy.
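
The final comparison step could be done with a small helper like the one below. It is a sketch under one stated assumption: each line of both label files holds a document identifier followed by whitespace and a label. The function names and the sample lines are hypothetical, so adjust the parsing if the actual files differ.

```python
def read_labels(lines):
    """Map each document identifier to its label.

    Assumes each non-empty line looks like '<document> <label>'.
    """
    labels = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            labels[parts[0]] = parts[-1]
    return labels


def accuracy(predicted_lines, true_lines):
    """Fraction of documents whose predicted label matches the true label."""
    predicted = read_labels(predicted_lines)
    truth = read_labels(true_lines)
    if not truth:
        return 0.0
    correct = sum(1 for doc, label in truth.items() if predicted.get(doc) == label)
    return correct / len(truth)


# Made-up sample lines standing in for the contents of the two files.
predicted = ["doc1.txt Sports", "doc2.txt Politics", "doc3.txt Sports"]
truth = ["doc1.txt Sports", "doc2.txt Sports", "doc3.txt Sports"]
print(f"accuracy = {accuracy(predicted, truth):.3f}")
# prints accuracy = 0.667
```

Because both functions accept any iterable of lines, open file objects can be passed in directly in place of the sample lists.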

For more detailed instructions and the performance results, please refer to the project write-up included in the repository (Text Categorization project write up.pdf).

raymond-chii/NLP-Text-Categorization
