GitHub - labb0t/dog-communities-nlp: topic modeling of different dog breed subreddits

labb0t/dog-communities-nlp


Topic Modeling Exploration of Reddit Dog Communities

Goal

The goal of this project is to explore Reddit dog communities using natural language processing (NLP) and unsupervised learning. Using one year of data from the seven dog breed subreddits with the most subscribers, I applied NLP and topic modeling techniques to identify the topics most commonly discussed in these communities, and derived "doggolingo" terms from the corpus. Finally, I built an app that lets users explore the meaning of different doggolingo terms.

Methodologies

  1. Pulled all 2019 post and comment data from the seven most-subscribed dog breed subreddits from Google's BigQuery, covering roughly 80K Reddit posts in total.
  2. Used spaCy pipelines to preprocess the text corpus.
  3. Vectorized the corpus with CountVectorizer and TF-IDF, then ran topic modeling with LSA, NMF, LDA, and CorEx.
  4. Used a different spaCy pipeline to preprocess the corpus for deriving "doggolingo" words.
  5. Built a doggolingo exploration app using Streamlit.
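To make step 3 concrete, here is a minimal, dependency-free sketch of one of the listed combinations: TF-IDF weighting followed by NMF topic modeling. It is not the project's code, and several pieces are assumptions. The four example posts are invented. The regex tokenizer and short stopword list are stand-ins for the spaCy preprocessing. The IDF smoothing mirrors scikit-learn's default rather than anything confirmed for this project. The NMF uses Lee-Seung multiplicative updates for the Frobenius loss.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stand-in for the spaCy preprocessing step:
# lowercase, keep alphabetic tokens, drop a tiny stopword list.
STOPWORDS = {"the", "a", "my", "is", "and", "to", "of", "he", "she", "so", "with", "for", "in"}

def tokenize(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_matrix(docs):
    """Return (vocab, rows), where rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    # Smoothed IDF (assumption, matching scikit-learn's default): ln((1+n)/(1+df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])  # L2-normalize each document
    return vocab, rows

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor nonnegative V (docs x terms) as W (docs x k) times H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)] for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)] for wr, nr, dr in zip(W, num, den)]
    return W, H

def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return math.sqrt(sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)))

def top_terms(H, vocab, n=3):
    """Highest-weighted vocabulary terms for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]] for row in H]

# Invented example posts, for illustration only.
docs = [
    "my corgi loves the beach and zoomies at the beach",
    "corgi zoomies after bath time",
    "husky shedding season brush brush brush",
    "husky blowing coat shedding everywhere",
]
vocab, V = tfidf_matrix(docs)
W, H = nmf(V, k=2)
print(top_terms(H, vocab))
```

Each row of `H` is a topic, and `top_terms` lists its highest-weighted words. Each row of `W` gives one post's mix of topics. On a corpus of roughly 80K posts, a library implementation such as scikit-learn's `TfidfVectorizer` and `NMF` would be the practical choice; the pure-Python loops here only make the update rules easy to read.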

Outline of Files
