This project scrapes tweets from Twitter (now X) based on specific hashtags and keywords. The code uses the Selenium browser-automation library in Python. The first script, `KeywordScraperV1`, extracts tweets matching the given keywords and saves them to a JSON file named using the keywords and the current timestamp.
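As a rough illustration (the exact naming scheme in the script may differ), a keyword-plus-timestamp filename can be built like this:

```python
from datetime import datetime

# Hypothetical example of a keyword-plus-timestamp output filename;
# the actual format used by KeywordScraperV1 may differ.
keywords = ["python", "webscraping"]
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
filename = f"{'_'.join(keywords)}_{timestamp}.json"
print(filename)  # e.g. python_webscraping_2024-01-15_10-30-00.json
```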
Before running the code, make sure you have the following:
- The required Python libraries (listed in `requirements.txt`)
- Your Twitter authentication token (not the API key)
To find your authentication token:
- Log in to Twitter.
- Press F12 to open Developer Tools.
- Navigate to Application > Cookies > twitter.com (or x.com) and copy the value of the `auth_token` cookie.
- Clone the repository or download the project files.
- Install the required Python libraries by running `pip install -r requirements.txt`.
- Open the `config.py` file and replace the placeholder with your actual token: set `TWITTER_AUTH_TOKEN` to the `auth_token` cookie value you copied earlier (not an API key). See the sketch below.
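Below is a minimal sketch of what `config.py` and the cookie-based login might look like. The `TWITTER_AUTH_TOKEN` name comes from the project; the surrounding Selenium code is only an illustrative assumption about how the token is used, not the project's exact implementation.

```python
# config.py: keep this file out of version control if it contains a real token
TWITTER_AUTH_TOKEN = "paste_your_auth_token_cookie_value_here"  # placeholder
```

```python
# Illustrative only: one common way a Selenium scraper authenticates with the cookie.
from selenium import webdriver

from config import TWITTER_AUTH_TOKEN

driver = webdriver.Chrome()
driver.get("https://twitter.com")  # the domain must be open before its cookie can be set
driver.add_cookie({"name": "auth_token", "value": TWITTER_AUTH_TOKEN, "domain": ".twitter.com"})
driver.get("https://twitter.com/home")  # the session is now logged in
```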
To extract tweets for a set of keywords and save them to a JSON file:
- Open `KeywordScraperV1.py`.
- Set the keywords you want to scrape in the `keywords` list inside the `main` function (see the sketch after this list).
- Specify the start and end dates for the data range (in YYYY-MM-DD format). Ensure that `start_date` is earlier than `end_date`.
- Run the script from your IDE or terminal by executing `python KeywordScraperV1.py`.

The script will fetch the matching tweets and save them to a JSON file named after the keywords and the timestamp.
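The exact contents of `main` depend on the script, but the settings it asks for would look roughly like this (everything beyond the `keywords`, `start_date`, and `end_date` names is illustrative):

```python
# Illustrative sketch of the values configured inside main();
# the real function in KeywordScraperV1.py may be organised differently.
def main():
    keywords = ["machine learning", "#python"]  # search terms and/or hashtags
    start_date = "2024-01-01"  # YYYY-MM-DD
    end_date = "2024-01-31"    # must be later than start_date

    # Lexicographic comparison works for zero-padded YYYY-MM-DD strings.
    assert start_date < end_date, "start_date must be earlier than end_date"
    # ... the script's scraping logic then uses these values ...
```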
If you also want to store the data in a local MySQL database (in addition to the JSON file), use `KeywordScraperV2.py`:
- Install the MySQL connector: `pip install mysql-connector-python`.
- In the `search_tweets` function, set your MySQL connection parameters: the user (e.g., `root`), password, database, and table (see the sketch after this list).
- Run the script from your IDE or terminal by executing `python KeywordScraperV2.py`.

This will scrape tweets and save them both to a JSON file and to the local MySQL database.
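As a rough illustration of what those connection parameters look like with `mysql-connector-python` (the database name, table name, and columns below are assumptions, not the project's actual schema):

```python
import mysql.connector

# Illustrative connection settings; adjust to your local MySQL setup.
# In the project, these parameters live inside the search_tweets function.
conn = mysql.connector.connect(
    host="localhost",
    user="root",               # example user
    password="your_password",  # placeholder
    database="tweets_db",      # hypothetical database name
)
cursor = conn.cursor()

# Hypothetical insert; the real table name and columns may differ.
cursor.execute(
    "INSERT INTO tweets (author, text, created_at) VALUES (%s, %s, %s)",
    ("someuser", "example tweet text", "2024-01-15 10:30:00"),
)
conn.commit()
cursor.close()
conn.close()
```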
The JSON format makes it easy to read and write the data, enabling quick initial data analysis. You can extend the functionality to perform more advanced analyses or store the data in different formats.
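For example, a finished scrape could be loaded for a quick first look like this (the filename and field names are placeholders, since the exact output schema depends on the script):

```python
import json

# Placeholder filename; use the file the scraper actually produced.
with open("python_webscraping_2024-01-15_10-30-00.json", encoding="utf-8") as f:
    tweets = json.load(f)  # assuming the file holds a list of tweet objects

print(f"Loaded {len(tweets)} tweets")
for tweet in tweets[:5]:
    # "text" is an assumed field name; inspect the JSON to see the real keys.
    print(tweet.get("text", ""))
```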
This is an individual project aimed at gaining hands-on familiarity with web-scraping tools and libraries, specifically Selenium. Selenium is one of the most accessible and widely used browser-automation libraries in Python, which makes it a valuable tool for many data-extraction tasks. The project was created to build practical experience with common web-scraping challenges, using automation to extract meaningful data from social media platforms like Twitter.
Personal email: k.anagnostou200328@gmail.com