This repository contains a few phonetic search / indexing algorithms implemented in Python.
Unless otherwise noted, these are all (C) Copyright 2015, Mads Olsgaard, released under the BSD 3-Clause license.
The repository also contains two corpus files:
- names.csv
- badwords.csv
names.csv
is a list of first and last names collected from the 1990 US census. It contains 155,947 unique names.
Source: http://www.census.gov/topics/population/genealogy/data/1990_census/1990_census_namefiles.html
badwords.csv
is a collection of English swear words gathered online. The words have not been checked for offensiveness or correctness.
Sources: mostly noswearing.com and Google's official list of bad words.
Both corpora are considered public domain and are free to use.
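
The corpus files can be read with Python's standard csv module. The snippet below is a minimal sketch, not part of this repository: `load_corpus` is a hypothetical helper, and it assumes each file is UTF-8 (or plain ASCII) text with one entry per row in the first column and no header row. Check the actual file layout and adjust `column` if it differs.

```python
import csv


def load_corpus(path, column=0):
    """Return the set of non-empty, lower-cased entries from one CSV column.

    Hypothetical helper: assumes one entry per row in `column`, no header row.
    """
    entries = set()
    # newline="" is what the csv module documentation recommends for files.
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Skip blank rows and rows too short to hold the requested column.
            if len(row) > column and row[column].strip():
                entries.add(row[column].strip().lower())
    return entries
```

Calling `load_corpus("names.csv")` or `load_corpus("badwords.csv")` would then give a set suitable for membership tests or as input to the phonetic algorithms. Lower-casing is a design choice for case-insensitive lookup; drop it if the original casing matters.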