Noeti/worldometers-crawler

Worldometers Scraper Project

This crawler scrapes population, energy, and CO2 emissions data from https://www.worldometers.info/ using Scrapy.

Feel free to use it, modify it, or suggest changes.

Requirements:

Usage:

  1. cd <project directory>

  2. For the population, energy, or co2_emissions spiders:

    scrapy crawl <spider name> -o <output file name>.<json/csv>

  3. For the fossil_fuel spider (oil, natural gas, coal), pass the fuel type as the category argument:

    scrapy crawl <spider name> -a category=<fossil fuel name> -o <output file name>.<json/csv>

Worldometers website crawler

The project contains four spiders:

  • population: scrapes population data by country and year.
  • energy: scrapes energy consumption by country.
  • fossil_fuel (oil, natural gas, coal): scrapes energy data by fossil fuel type and country.
  • co2_emissions: scrapes carbon dioxide (CO2) emissions by country.

Usage:

  • population.py spider:

    scrapy crawl population -o population_dataset.json

  • energy.py spider:

    scrapy crawl energy -o energy_dataset.json

  • fossil_fuel spider, example for category=oil (the same applies to gas and coal):

    scrapy crawl fossil_fuel -a category=oil -o oil_dataset.json

  • co2_emissions.py spider:

    scrapy crawl co2_emissions -o co2_emissions_dataset.json
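The four commands above can also be assembled from a small script, which may be convenient when all datasets need refreshing in one go. The following is only a sketch: the spider names, the category argument, and the output file names come from the usage examples, while JOBS, build_crawl_command, and run_all are hypothetical names introduced here, and run_all assumes scrapy is installed and available on PATH.

```python
import subprocess

# (spider, category, output file), copied from the usage examples.
JOBS = [
    ("population", None, "population_dataset.json"),
    ("energy", None, "energy_dataset.json"),
    ("fossil_fuel", "oil", "oil_dataset.json"),
    ("co2_emissions", None, "co2_emissions_dataset.json"),
]


def build_crawl_command(spider, output, category=None):
    """Return the argument list for one `scrapy crawl` invocation."""
    command = ["scrapy", "crawl", spider]
    if category is not None:
        # -a passes a spider argument; only fossil_fuel takes a category.
        command += ["-a", f"category={category}"]
    command += ["-o", output]
    return command


def run_all(jobs=JOBS, project_dir="."):
    """Run each crawl in turn from the project directory.

    Assumes scrapy is installed; stops at the first failing crawl.
    """
    for spider, category, output in jobs:
        subprocess.run(
            build_crawl_command(spider, output, category),
            cwd=project_dir,
            check=True,
        )


# Print the commands without running them, as a dry run.
for spider, category, output in JOBS:
    print(" ".join(build_crawl_command(spider, output, category)))
```

Calling run_all() from the project directory would then execute the crawls one after another. Building the command as a list rather than a single string avoids shell quoting problems if an output path contains spaces.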

Notes:

  • clean_dataset.ipynb:

    Notebook used for data munging. It generates the final, ready-to-use .csv file. A NaN value means no data is available for that field.
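To get a quick sense of how much data is missing in the final file, the empty fields can be counted per column. This is a minimal sketch using only the standard library. It assumes that missing values show up in the .csv as empty cells or as the literal text NaN, which depends on how the notebook writes the file. The function name count_missing is hypothetical.

```python
import csv
from collections import Counter

# Assumption: a missing value is an empty cell or the text "NaN" (any case).
MISSING_MARKERS = {"", "nan"}


def count_missing(lines):
    """Count missing cells per column.

    `lines` is any iterable of CSV lines with a header row,
    for example an open file object.
    """
    missing = Counter()
    for row in csv.DictReader(lines):
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header; ignore them.
                continue
            # Short rows are padded with None by DictReader.
            if value is None or value.strip().lower() in MISSING_MARKERS:
                missing[column] += 1
    return dict(missing)
```

To use it on a file, open the .csv with `open(path, newline="", encoding="utf-8")` and pass the file object to count_missing. Columns with no missing values are left out of the result.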
