A tool for downloading web pages and Wayback Machine snapshots and packaging them into a ZIM file.
Warning
Still in heavy development; use at your own risk.
- Downloads web pages and their resources (see the wget sketch after this list)
- Integrates with the Internet Archive's Wayback Machine
- Supports recursive downloading with configurable depth
- Preserves page structure and converts links for offline viewing
- Creates timestamped output directories
- Handles both HTTP and HTTPS URLs
- Creates a ZIM file of the archive
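The downloading features above build on wget (listed under the requirements below). Here is a minimal sketch of how a wget-based mirror might be driven from Go; every flag shown is a real wget option, but the exact set the tool uses is an assumption:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// mirror downloads a page and its resources with wget, converting
// links for offline viewing. The flag set here is an assumption;
// the tool's actual wget invocation may differ.
func mirror(url string, depth int, outDir string) error {
	cmd := exec.Command("wget",
		"--recursive",                  // follow links
		"--level", strconv.Itoa(depth), // recursion depth
		"--page-requisites",            // fetch CSS, JS, images
		"--convert-links",              // rewrite links for local viewing
		"--adjust-extension",           // add .html where needed
		"--no-parent",                  // stay under the start URL
		"--directory-prefix", outDir,
		url,
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	if err := mirror("https://example.com", 1, "downloads"); err != nil {
		fmt.Fprintln(os.Stderr, "download failed:", err)
		os.Exit(1)
	}
}
```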
- Install with Go:
  go install github.com/Sudo-Ivan/website-archiver@latest
- Download a binary from the releases page.
- Use with Docker:
  docker run -it -v ./archive:/app/archive ghcr.io/sudo-ivan/website-archiver:latest [options] <url1> [url2] [url3] ... [depth]
website-archiver [--zim|-z] [--all-snapshots|-as] [--snapshot|-s YYYYMMDDHHMMSS] <url1> [url2] [url3] ... [depth]
Download a single page:
website-archiver https://example.com
Download with ZIM file creation:
website-archiver --zim https://example.com
Download all available snapshots:
website-archiver --all-snapshots https://example.com
Download a specific snapshot:
website-archiver --snapshot 20230101000000 https://example.com
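Snapshot listing for --all-snapshots relies on the Wayback Machine, whose public CDX API enumerates the captures of a URL. A sketch of such a query in Go follows; the endpoint and parameters are the documented CDX ones, but whether the tool queries it exactly this way is an assumption:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// listSnapshots queries the Wayback Machine CDX API for the capture
// timestamps (YYYYMMDDHHMMSS) of a target URL. Illustrative only: the
// tool's actual request may use different fields or filters.
func listSnapshots(target string) ([]string, error) {
	endpoint := "https://web.archive.org/cdx/search/cdx?output=json&fl=timestamp&url=" +
		url.QueryEscape(target)
	resp, err := http.Get(endpoint)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// The JSON output is an array of rows; the first row is a header.
	var rows [][]string
	if err := json.NewDecoder(resp.Body).Decode(&rows); err != nil {
		return nil, err
	}
	if len(rows) < 2 {
		return nil, nil // no captures recorded
	}
	stamps := make([]string, 0, len(rows)-1)
	for _, row := range rows[1:] {
		stamps = append(stamps, row[0])
	}
	return stamps, nil
}

func main() {
	stamps, err := listSnapshots("example.com")
	if err != nil {
		fmt.Println("CDX query failed:", err)
		return
	}
	for _, ts := range stamps {
		fmt.Println(ts) // e.g. 20230101000000, usable with --snapshot
	}
}
```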
- wget
- ImageMagick (for ZIM file creation)
- zim-tools (for ZIM file creation)
- Go 1.24 or higher
At runtime the archiver invokes the wget command-line tool for downloads and the zimwriterfs command-line tool (from zim-tools) for ZIM file creation; a dependency-check sketch follows.
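A small sketch of verifying these external dependencies up front with Go's exec.LookPath; the binary names checked (wget, zimwriterfs, and ImageMagick's convert) are assumptions drawn from the requirements list:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Binaries assumed from the requirements list; zimwriterfs and
	// convert (ImageMagick) are only needed for ZIM creation.
	for _, bin := range []string{"wget", "zimwriterfs", "convert"} {
		if _, err := exec.LookPath(bin); err != nil {
			fmt.Fprintf(os.Stderr, "missing dependency: %s\n", bin)
			os.Exit(1)
		}
	}
	fmt.Println("all dependencies found")
}
```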
The tool creates a directory named downloads/<domain>_<timestamp> containing the downloaded files. The timestamp format is YYYYMMDD_HHMMSS.
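That naming scheme can be reproduced in Go with time.Now().Format and Go's reference-time layout; a minimal sketch, assuming the domain is taken from the parsed URL's hostname:

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"time"
)

// outputDir builds downloads/<domain>_<timestamp> for a target URL.
func outputDir(target string) (string, error) {
	u, err := url.Parse(target)
	if err != nil {
		return "", err
	}
	// Go's reference time 2006-01-02 15:04:05 encodes YYYYMMDD_HHMMSS.
	stamp := time.Now().Format("20060102_150405")
	return filepath.Join("downloads", u.Hostname()+"_"+stamp), nil
}

func main() {
	dir, err := outputDir("https://example.com/page")
	if err != nil {
		panic(err)
	}
	fmt.Println(dir) // e.g. downloads/example.com_20240101_120000
}
```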
- Invalid URLs are rejected
- Failed downloads trigger cleanup of partial downloads
- Wayback Machine integration failures fall back to direct downloads (see the sketch after this list)
- Invalid depth values are rejected
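One plausible shape for the cleanup and fallback behavior above, sketched in Go; downloadFromWayback and downloadDirect are hypothetical helpers standing in for the tool's internals:

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical helpers standing in for the tool's real download logic.
func downloadFromWayback(url, dir string) error { return fmt.Errorf("wayback unavailable") }
func downloadDirect(url, dir string) error      { return nil }

// archive tries the Wayback Machine first, falls back to a direct
// download, and removes the partial output directory on failure.
func archive(url, dir string) (err error) {
	defer func() {
		if err != nil {
			os.RemoveAll(dir) // clean up partial downloads
		}
	}()
	if err = downloadFromWayback(url, dir); err == nil {
		return nil
	}
	return downloadDirect(url, dir)
}

func main() {
	if err := archive("https://example.com", "downloads/example.com_20240101_120000"); err != nil {
		fmt.Fprintln(os.Stderr, "archive failed:", err)
	}
}
```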
Licensed under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.