Epsilon is a full-featured web crawler and indexer with a PostgreSQL database and a documented API to interact with the data. It was made for the Epsilon Search Engine.
Epsilon is organized as a monorepo with the following workspaces:
- workspaces/api - Public API to search the database and access analytics
- workspaces/cli - Main CLI to launch and manage services
- workspaces/crawler - Website crawler
- workspaces/database - SQL models and database functions
- workspaces/favicons - Favicon downloader
- workspaces/indexer - Page indexer
- workspaces/monitor - System/database monitoring and analytics
- workspaces/utils - Shared utility functions
📚 The API documentation is available at http://localhost:<port>/docs
Warning
Epsilon is currently in development. All workspaces still require extensive testing. Known issues include:
- favicons: Does not download all favicons consistently.
- indexer: Already fast, but needs multi-threading for improved performance.
- crawler: Memory usage sometimes increases dramatically until all available memory is used. Crawling speed is not constant because of the domain cooldown (it can drop from 100 pages/s to 5 pages/s). Database deadlocks can also occur.
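To illustrate why a domain cooldown makes crawling speed uneven, here is a minimal sketch of a per-domain cooldown tracker. It is not taken from the crawler workspace: the `DomainCooldown` type, the `try_acquire` method, and the one-second delay used in the usage note are all assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical tracker: remembers when each domain was last fetched and
/// refuses a new fetch until `cooldown` has elapsed for that domain.
struct DomainCooldown {
    cooldown: Duration,
    last_fetch: HashMap<String, Instant>,
}

impl DomainCooldown {
    fn new(cooldown: Duration) -> Self {
        Self {
            cooldown,
            last_fetch: HashMap::new(),
        }
    }

    /// Returns `true` and records the fetch if `domain` is off cooldown at `now`.
    /// Returns `false` without recording anything if the domain is still cooling down.
    fn try_acquire(&mut self, domain: &str, now: Instant) -> bool {
        let ready = self
            .last_fetch
            .get(domain)
            .map_or(true, |&last| now.duration_since(last) >= self.cooldown);
        if ready {
            self.last_fetch.insert(domain.to_string(), now);
        }
        ready
    }
}
```

With a cooldown of one second, a queue spread over many domains lets almost every `try_acquire` call succeed, while a queue dominated by a single domain is limited to roughly one page per second for that domain. Under these assumptions, throughput depends on how varied the pending URLs are, which would be consistent with the speed swings described above.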
Note
Refer to .env.example for environment variable documentation.
- cargo run -> Start all services
- cargo run -- api indexer -> Start only the API and the indexer
- cargo run -- - api -> Start all services except the API
Available services: api, crawler, favicons, indexer, monitor
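The selection rules above (no arguments starts everything, a list starts only the named services, a leading `-` starts everything except the named services) can be sketched as follows. This is an illustrative sketch, not the code in workspaces/cli: the `select_services` helper and its error message are hypothetical, and the real CLI may parse its arguments differently.

```rust
use std::collections::BTreeSet;

/// Service names as listed in this README.
const SERVICES: [&str; 5] = ["api", "crawler", "favicons", "indexer", "monitor"];

/// Hypothetical helper: resolves the arguments that follow `--` to the set
/// of services to start.
fn select_services(args: &[&str]) -> Result<BTreeSet<&'static str>, String> {
    // A leading "-" switches from "only these" to "all except these".
    let (exclude, names) = match args.split_first() {
        Some((&"-", rest)) => (true, rest),
        _ => (false, args),
    };

    // Validate every name against the known service list.
    let mut named = BTreeSet::new();
    for name in names {
        match SERVICES.iter().find(|s| *s == name) {
            Some(service) => {
                named.insert(*service);
            }
            None => return Err(format!("unknown service: {}", name)),
        }
    }

    let all: BTreeSet<&'static str> = SERVICES.iter().copied().collect();
    Ok(if exclude {
        // Everything except the named services.
        all.difference(&named).copied().collect::<BTreeSet<_>>()
    } else if named.is_empty() {
        // No arguments: start every service.
        all
    } else {
        // Only the named services.
        named
    })
}
```

For example, `select_services(&["-", "api"])` yields crawler, favicons, indexer, and monitor, mirroring `cargo run -- - api`. Rejecting unknown names is a design choice of this sketch, so that a typo fails instead of silently starting nothing.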
Run all tests with:
cargo test
- cargo build --release -> Build the app
- cargo run --release -- api -> Launch the API and submit URLs via POST /api/request-url
- cargo run --release -- monitor crawler -> Start crawling the web
- cargo run --release -- monitor indexer favicons -> Index pages and download favicons
- cargo run --release -- monitor api -> Run the search engine
Epsilon uses Diesel for database management.
- diesel database reset -> Drop and recreate the database
- diesel migration run -> Run all migrations
- diesel migration redo -> Redo the last migration
You will need libpq-dev (PostgreSQL) and Rust (via rustup):
sudo apt update
sudo apt install libpq-dev
curl https://sh.rustup.rs -sSf | sh
- Log all requests to a file
If you like the project, please support us at https://patreon.com/sodiumlabs. You can also join our Discord at https://discord.gg/8PDXWSHH7k.
See our other projects at https://sodiumlabs.xyz.