The purpose of this demo is to show that doing aggregations in-application (app), to work around the lack of joins/aggregates within Cassandra, can work at scale
This demo is built to work with DataStax Entereprise 4.8+ and the Python Cassandra Driver 3.0+ (for the Object Mapper)
- Python PIP
- Python Virtual Environment
- Python Cassandra-driver
These steps are based on using virtualenv.
It is highly recommended to use virtualenv as it keeps packages separate between apps.
Here's a good intro.
Install virtualenv
sudo pip install virtualenv
Enter the directory of your application and create the virtualenv in the app directory (the name env is the standard)
git clone https://github.com/atourkow/at-InAppScoring.git
# Create a virtual environment in the `env` directory
virtualenv env
source env/bin/activate
# If you're using fish shell (like I am):
source env/bin/activate.fish
You are now in the virtualenv, your prompt should reflect this, and are ready to install other python packages.
Type deactivate
to exit the active virtualenv.
#Install Python Requirements: (This can take a while)
pip install -r setup/requirements.txt
# If not started, Start DSE in search mode
dse cassandra -s
# Create the keyspace - We'll create the tables in the generator
deactivate
cqlsh -f setup/setup.cql
source env/bin/activate
# Update _config.py to use your server credentials
vim _config.py
# Get and parse Geo Location data which outputs to GeoLocationsUS.delim.txt
cd setup
wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCity-latest.zip
unzip GeoLiteCity-latest.zip
python parse_cities_into_geo.py GeoLiteCity_20160105/GeoLiteCity-Location.csv
cd ..
# Create and populate the tables
01.generate_data.py n_experts_start n_experts_stop n_topics_per_expert
You can get a list of topics in setup/topics.txt
- Run the Ratings Generator Feed
python 02.search_data.py n_num_of_total "Comma, Separated, Topics"
- Example
python 02.search_data.py 5 "SoftLayer, GSM, GitHub, Python"