Topic Modeling Exploration of Reddit Dog Communities

Goal

The goal of this project is to explore Reddit dog communities using natural language processing and unsupervised learning. Using one year of data from the seven most-subscribed Reddit dog breed communities, I applied NLP and topic modeling techniques to identify the topics most commonly discussed across these subreddits, and derived "doggolingo" terms from the corpus. Finally, I built an app that lets users explore the meaning of different doggolingo terms.

Methodologies

Pulled all 2019 post and comment data for the seven most-subscribed dog breed subreddits from Google's BigQuery, covering roughly 80K Reddit posts in total.
Used spaCy pipelines to preprocess the text corpus.
Ran topic modeling on the corpus using Count Vectorizer, TF-IDF, LSA, NMF, LDA, and CorEx.
Used a different spaCy pipeline to preprocess the corpus and derive "doggolingo" words.
Built a doggolingo exploration app using Streamlit.

Outline of Files

all_breeds_topic_modeling notebook: data preprocessing and topic modeling.
doggolingo notebook: data processing to derive "doggolingo" terms and generate word clouds, as well as data cleaning and preparation for the Streamlit app.
presentation PDF: final presentation for the Metis program.
See the doggolingo explained repo for the code and final data for the Streamlit app.

