This project is a website copier that lets users download and save the complete structure of a website, including HTML, CSS, JavaScript, and media files. It is built with Selenium, BeautifulSoup, and Requests.
- Download full website structure (HTML, CSS, JavaScript, and assets)
- Handle dynamic content loaded via JavaScript
- Save pages locally with original structure
- Multi-threaded downloading for efficiency
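As a rough illustration of how "original structure" and multi-threaded downloading can fit together, here is a minimal sketch. It is not taken from copier.py: the names `local_path_for` and `download_all`, and the `fetch` callable, are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlsplit


def local_path_for(url, output_dir="output"):
    """Map a URL to a local file path that mirrors the site's layout."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Treat directory-style URLs as index pages.
    if path.endswith("/"):
        path += "index.html"
    # Drop empty, "." and ".." segments so the result stays inside output_dir.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    # Replace ":" (from an explicit port) so the host is a valid folder name.
    host = parts.netloc.replace(":", "_")
    return Path(output_dir, host, *segments)


def download_all(urls, fetch, output_dir="output", max_workers=8):
    """Fetch URLs concurrently and write each body to its mirrored path.

    `fetch` is any callable that takes a URL and returns bytes (assumed
    interface, so the sketch can run without network access).
    """
    def save(url):
        target = local_path_for(url, output_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(url))
        return target

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input URLs.
        return list(pool.map(save, urls))
```

With Requests installed, `fetch` could be something like `lambda url: requests.get(url, timeout=10).content`. Pages that depend on JavaScript would need their HTML taken from the Selenium-driven browser instead.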
- Python 3.x
- Selenium
- BeautifulSoup
- Requests
- ChromeDriver or Edge WebDriver (whichever matches your browser)
- Clone the repository:
git clone https://github.com/nematovN/website_copier.git
cd website_copier
- Install dependencies:
pip install -r requirements.txt
- Download and set up the appropriate WebDriver for your browser (Chrome or Edge).
- Run the script with the target website URL:
python copier.py --url "https://example.com"
- The copied website will be saved in the output/ directory.
You can modify the config.json file to customize:
- User-Agent headers
- Output directory
- Exclusion rules
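The schema of config.json is not documented here, so the following is only a sketch of how such settings might be loaded and applied. The key names `user_agent`, `output_dir`, and `exclude`, the default values, and the glob-style exclusion patterns are all assumptions made for illustration; check the config.json in the repository for the real keys.

```python
import json
from fnmatch import fnmatchcase
from pathlib import Path

# Hypothetical defaults; the real config.json keys may differ.
DEFAULTS = {
    "user_agent": "Mozilla/5.0 (compatible; website-copier)",
    "output_dir": "output",
    "exclude": [],
}


def load_config(path="config.json"):
    """Return the defaults overlaid with any values found in the file."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        config.update(json.loads(file.read_text(encoding="utf-8")))
    return config


def is_excluded(url, patterns):
    """Return True if the URL matches any glob-style exclusion pattern."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)
```

Under these assumptions, a config.json containing `{"output_dir": "mirror", "exclude": ["*/admin/*"]}` would write files to mirror/ and skip any URL with an /admin/ path segment.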
- Make sure the website you are copying allows scraping (check robots.txt)
- Avoid excessive requests to prevent being blocked
- Do not use this tool for unauthorized or unethical purposes
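The first two points above can be checked in code. The sketch below uses Python's standard `urllib.robotparser` to honor robots.txt and pauses between requests. The helper names `robots_for` and `polite_urls` and the one-second default delay are assumptions for this example, not part of the tool.

```python
import time
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_for(site_url):
    """Download and parse the site's robots.txt (makes a network request)."""
    parser = RobotFileParser()
    parser.set_url(urljoin(site_url, "/robots.txt"))
    parser.read()
    return parser


def polite_urls(urls, parser, user_agent="*", default_delay=1.0, sleep=time.sleep):
    """Yield only the URLs robots.txt allows, pausing between them.

    The pause is the site's Crawl-delay if it declares one, otherwise
    `default_delay` seconds (an arbitrary value chosen for this sketch).
    """
    delay = parser.crawl_delay(user_agent) or default_delay
    first = True
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        if not first:
            sleep(delay)
        first = False
        yield url
```

A crawl loop could then iterate over `polite_urls(urls, robots_for("https://example.com"))` and download each URL it yields. Note that robots.txt only expresses the site operator's crawling preferences; it does not replace checking the site's terms of use.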
MIT License