This is a crawler that scrapes population and energy-related data from https://www.worldometers.info/ using Scrapy.
Feel free to use it, modify it, or suggest changes.
-
Python 3.6+
-
pip install scrapy
-
cd <project directory>
-
For the population, energy, or co2_emissions spiders:
scrapy crawl <spider name> -o <output file name>.<json/csv>
-
Fossil fuels (oil, natural gas, coal) spider:
scrapy crawl <spider name> -a category=<fossil fuel name> -o <output file name>.<json/csv>
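The two command patterns above can also be assembled from Python, which may be handy for running every spider in one go. The sketch below only builds and prints the commands. The helper `build_command` and the `JOBS` list are hypothetical names, the output naming follows the `<name>_dataset.json` pattern used in this README, and the `gas` and `coal` category values are assumptions.

```python
# Spider names and the "oil" category come from this README.
# The "gas" and "coal" category values are assumptions.
JOBS = [
    ("population", None),
    ("energy", None),
    ("fossil_fuel", "oil"),
    ("fossil_fuel", "gas"),    # assumed category value
    ("fossil_fuel", "coal"),   # assumed category value
    ("co2_emissions", None),
]


def build_command(spider, category=None, fmt="json"):
    """Return the `scrapy crawl` argument list for one spider run."""
    # Name the output after the category when there is one, else after the spider.
    stem = category if category else spider
    cmd = ["scrapy", "crawl", spider]
    if category:
        # -a passes a spider argument, here the fossil fuel category.
        cmd += ["-a", "category={}".format(category)]
    cmd += ["-o", "{}_dataset.{}".format(stem, fmt)]
    return cmd


commands = [build_command(spider, category) for spider, category in JOBS]
for cmd in commands:
    print(" ".join(cmd))
```

To actually run them, each list could be passed to `subprocess.run(cmd, check=True)` from the project directory.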
The project contains four spiders:
- population: scrapes population data by country and year.
- energy: scrapes energy consumption by country.
- fossil_fuel (oil, natural gas, coal): scrapes energy data by fossil fuel type and country.
- co2_emissions: scrapes carbon dioxide (CO2) emissions by country.
-
population.py spider:
scrapy crawl population -o population_dataset.json
-
energy.py spider:
scrapy crawl energy -o energy_dataset.json
-
fossil_fuel spider, example with category=oil (same for gas and coal):
scrapy crawl fossil_fuel -a category=oil -o oil_dataset.json
-
co2_emissions.py spider:
scrapy crawl co2_emissions -o co2_emissions_dataset.json
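A `.json` file written by `-o` is a single JSON array with one object per scraped item, so it can be loaded with the standard library. The sketch below is one possible way to check an output file. The helpers `load_items` and `field_coverage` are hypothetical, and the field names in the sample are placeholders, since the real ones depend on each spider.

```python
import json
import os
import tempfile
from collections import Counter


def load_items(path):
    """Load a Scrapy JSON feed: one JSON array holding one object per item."""
    with open(path, encoding="utf-8") as fh:
        items = json.load(fh)
    if not isinstance(items, list):
        raise ValueError("{}: expected a JSON array of items".format(path))
    return items


def field_coverage(items):
    """Count how many items carry a non-empty value for each field."""
    counts = Counter()
    for item in items:
        for key, value in item.items():
            if value not in (None, ""):
                counts[key] += 1
    return dict(counts)


# Hypothetical items standing in for real spider output.
sample = [
    {"country": "A", "year": 2020, "population": 100},
    {"country": "B", "year": 2020, "population": None},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "population_dataset.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sample, fh)
    items = load_items(path)
    coverage = field_coverage(items)

print(len(items), coverage)
```

Note that `-o` appends to an existing file, which leaves a `.json` file unparseable after a second run, so it is safest to delete the old output before crawling again.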
-
clean_dataset.ipynb:
Notebook used for data munging. It generates the final .csv file, ready to use. NaN means no data is available for that field.
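Before using the final .csv, it can help to see how much data is missing per column. The sketch below uses only the standard library. The helper `count_missing`, the column names, and the sample rows are hypothetical, and the set of tokens treated as missing is an assumption, since this README does not say whether NaN is written as the literal text or as an empty cell.

```python
import csv
import io

# Assumed spellings of a missing value, compared case-insensitively.
MISSING = {"", "nan", "na", "n/a", "null"}


def count_missing(rows):
    """Return {column: number of rows where that column has no data}."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header row; ignore them.
                continue
            is_missing = value is None or value.strip().lower() in MISSING
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts


# Hypothetical excerpt; the real columns come from clean_dataset.ipynb.
sample_csv = io.StringIO(
    "country,year,population,co2_emissions\n"
    "A,2016,100,NaN\n"
    "B,2016,,5.5\n"
)

missing = count_missing(csv.DictReader(sample_csv))
print(missing)
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass `csv.DictReader(fh)` to `count_missing` in the same way.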