This project is the culmination of work I did for my computer science senior project this past spring. For a more thorough description of the project, see this repo.
This repo contains the code for a Twitter bot, operating for now with the handle @FullerDisclosur (thanks to Twitter character limits), that uses the modeling work from my senior project to evaluate whether a given tweet (retweet or original) from a member of Congress discusses industries from which that member has received significant amounts of money. The bot flags, in its own timeline, any tweets that exhibit these conflicts of interest.
In the spirit of escaping perfectionism (and because I'm starting a new job tomorrow, and will be short on time for a while), I'm publishing this project a little before it's ready. The model is one that I trained in the spring, and could use more work. This means that there will be false positives (i.e. flagged tweets that are not actually conflicts of interest), and there will certainly be many false negatives, because I have set the threshold fairly high. In fact, the bot as it currently functions may tweet very rarely. When I have time over the coming weeks, I hope to work more on the modeling side of things and try to improve performance. Expect updates to this repo and to the bot's performance.
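To make the threshold trade-off concrete, here is a minimal sketch of the flagging step. It is not the code in this repo: it assumes the model produces a score between 0 and 1 per industry for each tweet, and the function name `find_conflicts`, the data shapes, the industry names, and the 0.9 default threshold are all hypothetical.

```python
def find_conflicts(industry_scores, donor_industries, threshold=0.9):
    """Return industries a tweet appears to discuss that also fund its author.

    industry_scores: dict mapping industry name -> model score in [0, 1]
                     (hypothetical shape for the model's output).
    donor_industries: set of industries that gave the member significant money
                      (hypothetical; in practice derived from donation data).
    threshold: minimum score for an industry to count as "discussed".
               This default is an illustrative assumption.
    """
    return sorted(
        industry
        for industry, score in industry_scores.items()
        if score >= threshold and industry in donor_industries
    )


# Made-up example values, for illustration only.
scores = {"Oil & Gas": 0.95, "Pharmaceuticals": 0.40}
donors = {"Oil & Gas", "Insurance"}

flagged = find_conflicts(scores, donors)                       # one industry passes
flagged_strict = find_conflicts(scores, donors, threshold=0.99)  # nothing passes
```

Raising the threshold trades false positives for false negatives: with the stricter value above, the same tweet is no longer flagged at all, which is why a high threshold makes the bot tweet rarely.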
- My advisor Professor Dragomir Radev for his support and insights while advising this project.
- Greenhouse, for inspiring this project.
- The Center for Responsive Politics’ excellent OpenSecrets.org for the data regarding donations.
- Twitter, Google Finance, Bloomberg, Reuters, the U.S. Securities and Exchange Commission’s EDGAR, and Nasdaq, all for hosting data relevant to this project.
- Tweepy, BeautifulSoup, Natural Language Toolkit, pandas, Requests, NumPy, tqdm, scikit-learn, and of course, Jupyter Notebook for making Python so excellent to work with.