GitHub - pgorla-zz/million-books: Search and analysis into the Million Books n-gram corpus using Cassandra, Hadoop, and Solr.

pgorla-zz/million-books

Million Books

Search and analysis of books from Wikipedia using Cassandra, Hadoop, and Solr.

Setup

Install the Python requirements with pip.

# cd app
# pip install -r requirements.txt

Download Data

Download the persondata and geo_coordinates datasets from dbpedia.org.

Index into Cassandra

Index the data into Cassandra by running the processor:

$ python process.py

Notes

This cassandra.yaml is symlinked to /etc/dse/cassandra/cassandra.yaml to keep production up to date.
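
As a rough illustration of the symlink described above, the following sketch creates such a link. It assumes the repository copy is the source of truth and that the path under /etc/dse is the link; the note does not say which way the link points, and the helper name `link_cassandra_yaml` is hypothetical.

```shell
# Hypothetical helper: make DEST a symbolic link that points at SRC.
# Assumption: SRC is the cassandra.yaml in the repository checkout and
# DEST is /etc/dse/cassandra/cassandra.yaml.
link_cassandra_yaml() {
    src="$1"   # file the link should point at
    dest="$2"  # location of the link itself
    # -s creates a symbolic link; ln refuses to overwrite an existing DEST,
    # so an existing production config is not clobbered silently.
    ln -s "$src" "$dest"
}
```

Calling `link_cassandra_yaml "$PWD/cassandra.yaml" /etc/dse/cassandra/cassandra.yaml` (likely as root) would create the link, after any existing file at the destination has been moved aside.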

Starting the dse service may fail with the error Cannot determine CASSANDRA_CONF.

If it does, append CASSANDRA_CONF="/etc/dse/cassandra" to /etc/init.d/dse.
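
The fix above can be sketched as a small shell function that appends the line only when it is missing, so running it twice does not duplicate the assignment. The function name `ensure_conf_line` is hypothetical; the line and the file path are the ones given in the note.

```shell
# The exact line from the note above.
CONF_LINE='CASSANDRA_CONF="/etc/dse/cassandra"'

# Hypothetical helper: append CONF_LINE to the file given as $1
# unless an identical line is already present.
ensure_conf_line() {
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq "$CONF_LINE" "$1"; then
        printf '%s\n' "$CONF_LINE" >> "$1"
    fi
}
```

Running `ensure_conf_line /etc/init.d/dse` as root would apply the change to the init script named in the note.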
