GitHub - migue1neto/Idealista: Web scraping and data analysis of all Idealista listings in Portugal.

migue1neto/Idealista

Idealista

Idealista Web Scraping & Data Analysis

Scraping the website

The project began with scraping the Idealista website. This was done with Selenium (to handle pagination) and Beautiful Soup (to parse the HTML and extract the listing data).

A dictionary accompanies the scraper and lists all of the districts, municipalities and locations to be scraped. The important constraint is that no single link to be scraped may cover more than 1,950 ads (30 ads per page * 65 pages). Even when a district, municipality or location contains more than 1,950 ads, Idealista only shows ads up to around page 65, so anything beyond that point cannot be reached from that link.
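The limit above can be checked before a scrape is started. The following is a minimal sketch, not the repository's code: the names `ADS_PER_PAGE`, `MAX_PAGES`, `pages_needed` and `links_over_limit` are hypothetical, and the ad counts are made-up placeholders that would have to come from the site itself.

```python
import math

# Figures taken from the README: 30 ads per page, about 65 reachable pages.
ADS_PER_PAGE = 30
MAX_PAGES = 65
MAX_ADS_PER_LINK = ADS_PER_PAGE * MAX_PAGES  # 1,950


def pages_needed(ad_count):
    """Number of result pages required to list `ad_count` ads."""
    return math.ceil(ad_count / ADS_PER_PAGE)


def links_over_limit(ad_counts):
    """Return the entries whose ad count exceeds what one link can expose."""
    return {
        name: count
        for name, count in ad_counts.items()
        if count > MAX_ADS_PER_LINK
    }


# Hypothetical ad counts per entry of the scraping dictionary.
example_counts = {
    "district-a": 5200,       # too large: should be split into smaller areas
    "municipality-b": 1950,   # exactly at the limit: still fully reachable
    "location-c": 640,
}

too_large = links_over_limit(example_counts)
```

Any entry returned by `links_over_limit` would need to be replaced by smaller areas (for example, a district split into its municipalities) so that each link stays within the limit.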

Data Analysis

The data analysis began by cleaning and treating the data, and then exploring the housing market through different lenses.

Importantly, listings that were considered duplicates were removed. Two listings were treated as duplicates when their Listing_Area, Price, Location_1 and Floor values were exactly the same. This is not a perfect solution, but it is a simple one: distinct properties that happen to share all four values are also removed, and reposted ads with a changed price are not caught.
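The duplicate rule can be illustrated with a short sketch. It is not the repository's implementation: it uses plain Python dictionaries, assumes the first occurrence of each duplicate group is kept, and the function name `drop_duplicate_listings`, the `Title` field and the sample rows are all hypothetical.

```python
# Columns named in the README as the duplicate criteria.
DEDUP_KEYS = ("Listing_Area", "Price", "Location_1", "Floor")


def drop_duplicate_listings(listings):
    """Keep the first listing for each (area, price, location, floor) combination."""
    seen = set()
    unique = []
    for listing in listings:
        key = tuple(listing.get(column) for column in DEDUP_KEYS)
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique


# Hypothetical sample rows; the second repeats the first on all four key columns.
sample = [
    {"Title": "T2 apartment", "Listing_Area": 85, "Price": 250000,
     "Location_1": "Lisboa", "Floor": "3"},
    {"Title": "T2 apartment (agency repost)", "Listing_Area": 85, "Price": 250000,
     "Location_1": "Lisboa", "Floor": "3"},
    {"Title": "T3 apartment", "Listing_Area": 110, "Price": 320000,
     "Location_1": "Porto", "Floor": "1"},
]

cleaned = drop_duplicate_listings(sample)
```

If the data is held in a pandas DataFrame, `df.drop_duplicates(subset=["Listing_Area", "Price", "Location_1", "Floor"])` applies the same rule and also keeps the first occurrence by default.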
