GitHub - sisig-ai/doctor: Doctor is a tool for discovering, crawling, and indexing web sites to be exposed as an MCP server for LLM agents.

🩺 Doctor

A tool for discovering, crawling, and indexing web sites and exposing them through an MCP server, so that LLM agents can reason and generate code from better, more up-to-date material.

🔍 Overview

Doctor provides a complete stack for:

Crawling web pages using crawl4ai
Chunking text with LangChain
Creating embeddings with OpenAI via litellm
Storing data in DuckDB with vector search support
Exposing search functionality via a FastAPI web service
Making these capabilities available to LLMs through an MCP server
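To make the middle of that pipeline concrete, here is a minimal, dependency-free sketch of chunking text and ranking chunks by vector similarity. It is not Doctor's implementation: `chunk_text`, `toy_embed`, `cosine`, and `search` are hypothetical names, the letter-count vector is only a stand-in for the OpenAI embeddings the project requests via litellm, and the in-memory list stands in for DuckDB.

```python
import math


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str) -> list[float]:
    """Stand-in embedding (assumption): a 26-dim vector of letter counts."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Rank chunks by similarity to the query, best first."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.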

🏗️ Core Infrastructure

🗄️ DuckDB

Database for storing document data and embeddings with vector search capabilities
Managed by a unified Database class

📨 Redis

Message broker for asynchronous task processing

🕸️ Crawl Worker

Processes crawl jobs
Chunks text
Creates embeddings

🌐 Web Server

FastAPI service exposing endpoints for:
- Fetching, searching, and viewing data
- Exposing the MCP server

💻 Setup

⚙️ Prerequisites

Docker and Docker Compose
Python 3.10+
uv (Python package manager)
OpenAI API key

📦 Installation

Clone this repository
Set up environment variables:
```
export OPENAI_API_KEY=your-openai-key
```
Run the stack:
```
docker compose up
```

👁 Usage

Go to http://localhost:9111/docs to see the OpenAPI docs
Look for the /fetch_url endpoint and start a crawl job by providing a URL
Use /job_progress to see the current job status
Configure your editor to use http://localhost:9111/mcp as an MCP server
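When scripting the steps above, the "start a job, then watch /job_progress" part usually becomes a polling loop. The helper below is a generic sketch, not part of Doctor: `poll_until` is a hypothetical name, and because the shape of the /job_progress response is not described here, the caller supplies both the function that fetches the status and the predicate that decides when the job is finished.

```python
import time
from typing import Any, Callable


def poll_until(
    check: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call check() repeatedly until is_done(result) is true or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

In practice `check` would wrap a GET request to /job_progress; consult the OpenAPI docs at /docs for the actual parameters and response fields before writing `is_done`.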

☁️ Web API

POST /fetch_url: Start crawling a URL
GET /search_docs: Search indexed documents
GET /job_progress: Check crawl job progress
GET /list_doc_pages: List indexed pages
GET /get_doc_page: Get full text of a page
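The endpoints can be called from any HTTP client. The sketch below only builds the requests with the standard library, so it runs without the stack being up. The helper names `build_post` and `build_get` are hypothetical, and the field names `url` and `query`, as well as the use of a JSON body for the POST, are assumptions; check the OpenAPI docs at http://localhost:9111/docs for the real schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:9111"


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> Request:
    """Build a POST request with a JSON body (body format is an assumption)."""
    return Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_get(path: str, params: dict, base_url: str = BASE_URL) -> Request:
    """Build a GET request with URL-encoded query parameters."""
    query = urlencode(params)
    url = base_url + path + ("?" + query if query else "")
    return Request(url, method="GET")


# Field names below are illustrative assumptions, not the documented schema.
crawl = build_post("/fetch_url", {"url": "https://example.com/docs"})
search = build_get("/search_docs", {"query": "vector search"})
```

With the Docker Compose stack running, either request can be sent with `urllib.request.urlopen(crawl)`.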

🔧 MCP Integration

Ensure that your Docker Compose stack is up, and then add to your Cursor or VSCode MCP Servers configuration:

"doctor": {
    "type": "sse",
    "url": "http://localhost:9111/mcp"
}
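If your editor already has other MCP servers configured, the entry above has to be merged into the existing servers object rather than replacing it. The following is a small sketch of that merge. `add_server` is a hypothetical helper, and the top-level key `mcpServers` is an assumption: the key name varies between editors, so check your editor's MCP documentation.

```python
import json

# The entry from this README.
DOCTOR_ENTRY = {"type": "sse", "url": "http://localhost:9111/mcp"}


def add_server(config: dict, name: str, entry: dict,
               servers_key: str = "mcpServers") -> dict:
    """Return a copy of config with entry added under servers_key (assumed key)."""
    merged = dict(config)
    servers = dict(merged.get(servers_key, {}))
    servers[name] = entry
    merged[servers_key] = servers
    return merged


# An illustrative existing configuration with one unrelated server.
existing = {"mcpServers": {"other": {"type": "sse", "url": "http://localhost:9000/mcp"}}}
updated = add_server(existing, "doctor", DOCTOR_ENTRY)
print(json.dumps(updated, indent=2))
```

The function copies rather than mutates, so the original configuration stays available if the result needs to be discarded.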

🧪 Testing

Running Tests

To run all tests:

# Run all tests with coverage report
pytest

To run specific test categories:

# Run only unit tests
pytest -m unit

# Run only async tests
pytest -m async_test

# Run tests for a specific component
pytest tests/lib/test_crawler.py

Test Coverage

The project is configured to generate coverage reports automatically:

# Run tests with detailed coverage report
pytest --cov=src --cov-report=term-missing

Test Structure

tests/conftest.py: Common fixtures for all tests
tests/lib/: Tests for library components
- test_crawler.py: Tests for the crawler module
- test_chunker.py: Tests for the chunker module
- test_embedder.py: Tests for the embedder module
- test_database.py: Tests for the unified Database class
tests/common/: Tests for common modules
tests/services/: Tests for service layer
tests/api/: Tests for API endpoints

🐞 Code Quality

Pre-commit Hooks

The project is configured with pre-commit hooks that run automatically before each commit:

ruff check --fix: Lints code and automatically fixes issues
ruff format: Formats code according to project style
Trailing whitespace removal
End-of-file fixing
YAML validation
Large file checks

Setup Pre-commit

To set up pre-commit hooks:

# Install pre-commit
uv pip install pre-commit

# Install the git hooks
pre-commit install

Running Pre-commit Manually

You can run the pre-commit hooks manually on all files:

# Run all pre-commit hooks
pre-commit run --all-files

Or on staged files only:

# Run on staged files
pre-commit run

⚖️ License

This project is licensed under the MIT License - see the LICENSE.md file for details.
