stylianipantela/texttiling: Implementation of the TextTiling algorithm for CS187

  • Kevin Mu, Jonathan Miller, Stella Pantela, and Dianna Hu
  • CS187 - Computational Linguistics
  • Final Project (TextTiling) - Group Implementation
  • README

Setup Instructions


  1. If nltk is already installed, skip to step 5.

  2. Run "python ez_setup.py"

  3. Run (sudo) "easy_install pip"

  4. Run (sudo) "pip install -U nltk"

  5. Run "python", then type "import nltk"

  6. Type "nltk.download()". A new window should open, showing the nltk Downloader.

  7. Click the "corpora" tab.

  8. Select "Stopwords Corpus" (stopwords) and "WordNet" (wordnet), and click Download.

  9. Close the nltk downloader and exit python.
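If the Downloader window from steps 6 to 9 is not available (for example, on a machine without a display), the same two corpora can likely be fetched from code instead. The sketch below assumes Python 3 and relies on nltk.download(), which accepts a corpus identifier; the helper name download_corpora and its download parameter are hypothetical and only here for illustration.

```python
# Corpus identifiers as shown in the NLTK Downloader:
# "Stopwords Corpus" (stopwords) and "WordNet" (wordnet).
REQUIRED_CORPORA = ("stopwords", "wordnet")


def download_corpora(names=REQUIRED_CORPORA, download=None):
    """Download each named NLTK corpus and return {name: succeeded}.

    `download` is a hypothetical hook (useful for dry runs); when it is
    omitted, nltk.download is used.
    """
    if download is None:
        # Imported lazily so this file loads even before nltk is installed.
        import nltk
        download = nltk.download
    return {name: bool(download(name)) for name in names}
```

Calling download_corpora() from a Python prompt after step 5 should leave both corpora installed; any entry mapped to False in the returned dictionary points at a corpus that did not download.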


Running Instructions


  1. cd into the project directory

  2. Run: python texttiling.py <scores_outfile>

    a) scores_outfile is the file where you want the results to be written

    e.g., python texttiling.py outfile.txt
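For readers unfamiliar with what a TextTiling run computes, here is a minimal sketch of the algorithm's core scoring steps in Python 3. It is not the code in texttiling.py and makes no claim about the format of the scores file. The function names, the parameters w (tokens per sequence) and k (sequences per block), and the regex tokenizer are all assumptions made for illustration, and stopword removal and stemming are left out for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def gap_scores(tokens, w=20, k=2):
    """Score each gap between w-token sequences by comparing the block of
    up to k sequences before the gap with the block of up to k after it."""
    seqs = [tokens[i:i + w] for i in range(0, len(tokens), w)]
    scores = []
    for gap in range(1, len(seqs)):
        left = Counter(t for s in seqs[max(0, gap - k):gap] for t in s)
        right = Counter(t for s in seqs[gap:gap + k] for t in s)
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Depth of each gap: how far its score sits below the nearest peak
    on the left plus the nearest peak on the right."""
    depths = []
    for i, s in enumerate(scores):
        left = s
        for j in range(i - 1, -1, -1):  # climb leftwards while scores rise
            if scores[j] < left:
                break
            left = scores[j]
        right = s
        for j in range(i + 1, len(scores)):  # climb rightwards likewise
            if scores[j] < right:
                break
            right = scores[j]
        depths.append((left - s) + (right - s))
    return depths
```

Gaps with the largest depth scores are the candidate topic boundaries. A typical call would be depth_scores(gap_scores(tokenize(text))), with a cutoff on the depths chosen by the caller.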


Scraping Articles


  1. To scrape other articles with scraper.py, first install BeautifulSoup.
  2. Then change the 'seed' value in the main() function of scraper.py.
  3. You can also adjust the number of articles to scrape (N).
  4. Run: python scraper.py
  5. Verify that the articles were correctly scraped and placed in the articles folder.
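The check in step 5 can be done by hand, but a small script makes it repeatable. The sketch below assumes Python 3 and that scraper.py writes one file per article into the articles folder. The helper name check_articles is hypothetical, and treating an empty file as a failed scrape is an assumption rather than something the project defines.

```python
from pathlib import Path


def check_articles(folder="articles"):
    """Return (file_count, empty_file_names) for the given folder."""
    path = Path(folder)
    if not path.is_dir():
        raise FileNotFoundError(f"no such folder: {path}")
    # Only regular files are counted; subfolders are ignored.
    files = sorted(p for p in path.iterdir() if p.is_file())
    # Assumption: a zero-byte file means the article was not scraped properly.
    empty = [p.name for p in files if p.stat().st_size == 0]
    return len(files), empty
```

Run from the project directory, count, empty = check_articles() should report a count matching N and an empty list if every article was written with content.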
