A professional-grade phishing detection system featuring multiple machine learning algorithms accessible via FastMCP servers. Built with clean architecture principles and trained on 160,000+ real spam/phishing emails.
All processing is done locally on your machine. No email content, subjects, or sender information is ever sent to external services, APIs, or cloud providers.
# Install the package
pip install -e ".[dev]"
# Start a server (e.g., Neural Network with 96.6% accuracy)
make serve-nn
# Or use the CLI
inbox-sentinel server start neural-network
# Check available models
inbox-sentinel models list
inbox-sentinel/
├── inbox_sentinel/          # Main package
│   ├── core/                # Base classes, types, exceptions
│   ├── ml/                  # Machine learning components
│   │   ├── models/          # Model implementations
│   │   ├── preprocessing/   # Feature extraction
│   │   └── training/        # Training utilities
│   ├── servers/             # MCP server implementations
│   │   ├── base/            # Base server class
│   │   └── mcp/             # FastMCP servers
│   ├── config/              # Configuration management
│   ├── utils/               # Utilities
│   └── scripts/             # CLI scripts
├── data/                    # Data directory
│   ├── models/              # Trained models (*.pkl)
│   └── datasets/            # Training datasets
├── tests/                   # Test suite
└── docs/                    # Documentation
| Model | Algorithm | Test Accuracy | Key Features |
|---|---|---|---|
| `naive-bayes` | Multinomial Naive Bayes | 96.25% | Fast, interpretable, great for text |
| `svm` | Support Vector Machine | 95.75% | RBF kernel, 3,882 support vectors |
| `random-forest` | Random Forest | 93.95% | 100 trees, feature importance |
| `logistic-regression` | Logistic Regression | 95.75% | Linear, highly interpretable |
| `neural-network` | Neural Network (MLP) | 96.60% | 3-layer architecture, best accuracy |
- Pre-trained Models: All models trained on real spam datasets
- Feature Engineering: TF-IDF + 15 manual features (URLs, keywords, patterns); see the sketch below
- Ensemble Methods: 5 consensus strategies for combining predictions
- Real-time Analysis: Fast inference (<100ms per email)
- Explainable AI: Feature importance and confidence scores
- LLM Orchestration: Use local LLMs to intelligently coordinate multiple models
- Forwarded Email Support: Automatically parse and analyze Gmail forwarded emails
- Clean Architecture: Separation of concerns, SOLID principles
- Type Safety: Full type hints with custom types
- Configuration Management: Pydantic settings with env support
- Testing: Comprehensive test suite with pytest
- CLI Tools: Rich CLI interface for all operations
- Documentation: Complete API and usage docs
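To make the feature-engineering bullet above concrete, here is a rough sketch of combining TF-IDF with hand-crafted signals via scikit-learn's `FeatureUnion`. It is illustrative only: the actual 15 features and transformer classes live in `inbox_sentinel/ml/preprocessing/` and may differ.

```python
# Illustrative sketch only: TF-IDF text features combined with a few
# hand-crafted signals (URL count, suspicious keywords, caps ratio).
# The class and feature names here are hypothetical, not the package's API.
import re

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion


class ManualFeatures(BaseEstimator, TransformerMixin):
    KEYWORDS = ("urgent", "verify", "suspended", "click", "reward")

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        rows = []
        for text in X:
            lower = text.lower()
            rows.append([
                len(re.findall(r"https?://", lower)),                 # URL count
                sum(kw in lower for kw in self.KEYWORDS),             # keyword hits
                sum(c.isupper() for c in text) / max(len(text), 1),   # caps ratio
            ])
        return np.array(rows)


features = FeatureUnion([
    ("tfidf", TfidfVectorizer(max_features=5000)),
    ("manual", ManualFeatures()),
])

X = features.fit_transform(["URGENT: verify your account at http://paypal-verify.tk"])
print(X.shape)  # (1, n_tfidf_terms + 3)
```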
Models trained on 161,640 emails from 6 datasets:
- SpamAssassin (5,809 emails)
- Enron Spam (29,767 emails)
- Ling Spam (2,859 emails)
- CEAS 2008 (39,154 emails)
- Nazario Phishing (1,565 emails)
- Phishing Email Dataset (82,486 emails)
Distribution: 51% spam/phishing, 49% legitimate
The orchestration feature runs multiple ML models in parallel and combines their results for more accurate detection:
# Simple consensus-based orchestration (no dependencies)
inbox-sentinel orchestrate -F email.txt --forwarded
# LLM-powered orchestration with Ollama (requires setup)
inbox-sentinel orchestrate -F email.txt --forwarded --llm-provider ollama --model-name llama2
Two Orchestration Modes:
- Simple Consensus (Default)
  - Runs all 5 ML models in parallel
  - Uses majority voting (e.g., 4/5 models = spam)
  - Calculates average confidence scores
  - No additional dependencies required
  - Fast and reliable
- LLM-Powered (Advanced)
  - Uses a local LLM to coordinate analysis
  - The LLM selects which models to query
  - Provides natural language explanations
  - Can adapt strategy based on results
  - Requires Ollama + LangChain setup
How It Works:
- Each MCP server (Naive Bayes, SVM, Random Forest, Logistic Regression, Neural Network) is wrapped as a tool
- In simple mode: All tools are called and results are combined
- In LLM mode: The AI agent decides which tools to use and interprets results
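In code, the simple-consensus strategy amounts to a majority vote plus an averaged confidence. The sketch below is illustrative (`ModelVote` and `simple_consensus` are not part of the package API); the numbers mirror the example output shown later in this README.

```python
# Minimal sketch of simple consensus: majority vote + average confidence.
# The names here are illustrative, not the package's actual API.
from dataclasses import dataclass


@dataclass
class ModelVote:
    model: str
    is_spam: bool
    confidence: float  # 0.0 - 1.0


def simple_consensus(votes: list[ModelVote]) -> dict:
    spam_votes = [v for v in votes if v.is_spam]
    return {
        "is_spam": len(spam_votes) > len(votes) / 2,  # majority voting
        "consensus": f"{len(spam_votes)}/{len(votes)} models detected spam",
        "average_confidence": sum(v.confidence for v in votes) / len(votes),
    }


votes = [
    ModelVote("naive_bayes", False, 0.167),
    ModelVote("svm", True, 0.535),
    ModelVote("random_forest", True, 0.284),
    ModelVote("logistic_regression", True, 0.999),
    ModelVote("neural_network", True, 0.955),
]
print(simple_consensus(votes))  # 4/5 models -> spam, average confidence ~58.8%
```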
# View available models and their status
inbox-sentinel models list
# Train all models
inbox-sentinel models train
# Verify trained models
inbox-sentinel models verify
# Analyze an email
inbox-sentinel analyze -c "Email content" -s "Subject" -f "sender@email.com"
# Analyze a forwarded Gmail email
inbox-sentinel analyze -F forwarded_email.txt --forwarded
# Orchestrate multiple models with consensus
inbox-sentinel orchestrate -F email.txt --forwarded
# Start a specific MCP server
inbox-sentinel server start neural-network
Each server provides these tools:
- `analyze_email` - Analyze an email for spam/phishing
- `train_model` - Train with new data
- `initialize_model` - Initialize/load a pre-trained model
- `get_model_info` - Get model information
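Because each server speaks standard MCP, these tools can be invoked from any MCP client. The sketch below uses the official `mcp` Python SDK over stdio; the tool argument names (and the assumption that the server runs on stdio transport) are illustrative, so check the server's tool schema for the exact parameters.

```python
# Hypothetical MCP client sketch: the analyze_email argument names are
# assumptions; inspect the server's tool schema for the real signature.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    params = StdioServerParameters(
        command="inbox-sentinel", args=["server", "start", "neural-network"]
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # lists the tools above

            result = await session.call_tool(
                "analyze_email",
                arguments={
                    "content": "Your account will be suspended...",
                    "subject": "Urgent Security Alert",
                    "sender": "security@paypal-verify.tk",
                },
            )
            print(result.content)


asyncio.run(main())
```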
For advanced analysis using a local LLM to orchestrate multiple detection models:
Windows:
# 1. Download and install from https://ollama.ai/download/windows
# 2. Start Ollama server (in a separate terminal)
ollama serve
# 3. Pull a model (in your main terminal)
ollama pull llama2 # 7B parameters, balanced
# Or use smaller/faster models:
ollama pull phi # 2.7B parameters, very fast
ollama pull mistral # 7B parameters, fast
macOS/Linux:
# 1. Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# 2. Start Ollama server
ollama serve
# 3. Pull a model
ollama pull llama2
# Option 1: Install LangChain dependencies as an extra
pip install -e ".[langchain]"
# Option 2: Install LangChain dependencies separately
pip install langchain langchain-community langchain-openai
# Option 3: Use the requirements file
pip install -r requirements-langchain.txt
# Analyze forwarded email with LLM orchestration
inbox-sentinel orchestrate -F email.txt --forwarded --llm-provider ollama --model-name llama2
# Or use simple consensus-based orchestration (no LLM required)
inbox-sentinel orchestrate -F email.txt --forwarded --llm-provider simple
Recommended Models for Tool Use:
# Mistral - Better at following tool-use instructions
ollama pull mistral
inbox-sentinel orchestrate -F email.txt --forwarded --llm-provider ollama --model-name mistral
# Mixtral - Excellent at structured outputs
ollama pull mixtral
inbox-sentinel orchestrate -F email.txt --forwarded --llm-provider ollama --model-name mixtral
Troubleshooting LLM Orchestration:
If the LLM gets stuck or doesn't use tools correctly:
- Try a different model - Mistral and Mixtral are better at tool use than Llama2
- Check Ollama is running - `curl http://localhost:11434/api/tags`
- Use simple orchestration - works reliably without an LLM: `--llm-provider simple`
- Install dependencies - `pip install langchain langchain-community nest-asyncio`
The LLM orchestration provides:
- Intelligent tool selection based on email characteristics
- Natural language explanations of decisions
- Adaptive analysis strategies
- Context-aware reasoning about phishing patterns
Note: Some models (like Llama2) may struggle with the structured format required for tool use. If you experience issues, the simple consensus-based orchestration provides excellent results without requiring an LLM.
Orchestrated Email Analysis
Subject: Claim Your Merlin Chain Early Users Reward Now
Sender: hello@merlinteamnews.blog
Using consensus-based orchestration
✓ Initialized all 5 models
╭───────── Orchestrated Analysis Result ──────────╮
│ SPAM/PHISHING                                    │
│                                                  │
│ Consensus: 4/5 models detected spam              │
│ Average Confidence: 58.8%                        │
│                                                  │
│ Individual Results:                              │
│ • naive_bayes: LEGITIMATE (16.7%)                │
│ • svm: SPAM (53.5%)                              │
│ • random_forest: SPAM (28.4%)                    │
│ • logistic_regression: SPAM (99.9%)              │
│ • neural_network: SPAM (95.5%)                   │
│                                                  │
│ Recommendation: DO NOT trust this email.         │
╰──────────────────────────────────────────────────╯
import asyncio

from inbox_sentinel.ml.models import NeuralNetworkDetector
from inbox_sentinel.core.types import Email


async def main():
    # Initialize the detector with the pre-trained model
    detector = NeuralNetworkDetector()
    await detector.initialize(use_pretrained=True)

    # Analyze an email
    email = Email(
        content="Your account will be suspended...",
        subject="Urgent Security Alert",
        sender="security@paypal-verify.tk",
    )
    result = await detector.analyze(email)
    print(f"Is Spam: {result.is_spam}")
    print(f"Confidence: {result.confidence:.1%}")


asyncio.run(main())
# Clone the repository
git clone https://github.com/your-org/inbox-sentinel.git
cd inbox-sentinel
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
make install-dev
# Run tests
make test
# Format code
make format
make help # Show all available commands
make serve-nn # Start Neural Network server
make serve-svm # Start SVM server
make train # Train all models
make test # Run test suite
make lint # Run code quality checks
make format # Format code with black/isort
make clean # Clean build artifacts
# Run all tests
pytest
# Run with coverage
pytest --cov=inbox_sentinel
# Run specific test file
pytest tests/unit/test_detectors.py
- `Email`: Email data structure
- `PredictionResult`: Single model prediction
- `EnsembleResult`: Combined prediction from multiple models
- `ConsensusStrategy`: Enum for ensemble strategies

- `BaseDetector`: Abstract base for all detectors
- `BaseMCPServer`: Base class for MCP servers
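For orientation, the core types roughly correspond to the shapes below, as inferred from the usage examples in this README; the real definitions in `inbox_sentinel.core.types` may carry additional fields.

```python
# Approximate shapes only, inferred from the README examples; the actual
# definitions in inbox_sentinel.core.types may differ.
from dataclasses import dataclass


@dataclass
class Email:
    content: str
    subject: str
    sender: str


@dataclass
class PredictionResult:
    is_spam: bool
    confidence: float  # e.g. 0.966 -> 96.6%
```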
- Environment variables via `.env` file
- Pydantic settings for type-safe configuration
- Model-specific configurations in `config/model_config.py`
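As a minimal sketch of what such a Pydantic settings class typically looks like (the field names below are illustrative, not the package's actual options):

```python
# Illustrative only: field names are hypothetical; the real options live in
# inbox_sentinel/config/ (see config/model_config.py for model settings).
from pathlib import Path

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="INBOX_SENTINEL_", env_file=".env")

    models_dir: Path = Path("data/models")
    datasets_dir: Path = Path("data/datasets")
    default_model: str = "neural-network"


settings = Settings()  # values can be overridden via the environment or a .env file
```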
- Full Documentation
- Architecture Guide
- Training Details
- FastMCP Documentation
- Model Context Protocol
We welcome contributions! Please see our Contributing Guide for details.
- Additional ML algorithms (XGBoost, LightGBM)
- Deep learning models (BERT, Transformers)
- Real-time learning capabilities
- Email header analysis
- Attachment scanning
- Multi-language support
MIT License - See LICENSE file for details.
This project is for educational and defensive security purposes only.