AI Distiller (`aid`)

Note: This is the very first version of this tool. We would be very grateful for any feedback in the form of a discussion or by creating an issue on GitHub. Thank you!

🚀 MCP Server Available: Install the Model Context Protocol server for AI Distiller from NPM: @janreges/ai-distiller-mcp - seamlessly integrate with Claude, Cursor, and other MCP-compatible AI tools!

🤔 Why AI Distiller?

Do you work with large-scale projects that have thousands of files and functions? Do you struggle with AI tools like Claude Code, Gemini, Copilot, or Cursor frequently "hallucinating" and generating code that looks correct at first glance but is actually incompatible with your project?

The problem is context. AI models have a limited context window and cannot comprehend your entire codebase. Instead, AI agents search files, "grep" for keywords, look at a few lines before and after the found term, and try (often, but not always) to guess the interface of your classes and functions. The result? Code full of errors that guesses parameters, returns incorrect data types, and ignores the existing architecture. If you are a sophisticated user of AI agents (vibe coder), you know that you can help yourself by instructing the AI agent to consistently write and run tests, using static code analysis, pre-commit hooks, etc. - the AI agent will usually fix the code itself, but in the meantime it will take 20 steps and 5 minutes. On the other hand, it must be admitted that if you pay for each AI request (and large context is an expensive factor) and are not "playing for time", you may not mind this limited context approach.

AI Distiller (or aid for short) helps solve this problem. Its main function is code "distillation" – a process where it extracts only the most essential information from the entire project (ideally from the main source folder, or a specific module subdirectory for extremely large projects) that the AI needs to write code correctly on the first try. This distillation usually generates a context that is only 5-20% of the original source code volume, allowing AI tools to include it in their context. As a result, the AI uses the existing code exactly as it was designed, not by trial and error.

Very simply, it can be said that aid, within the distillation process, will leave only the public parts of the interface, input and output data types, but in the default state it will discard method implementations and non-public structures. But everything is configurable via CLI Options.

🤔 Why AI Distiller?
✨ Key Features
🎯 How It Works
🚀 Quick Start
- One-Line Installation
- Basic Usage
📖 Example Output
📖 Guides & Examples
- Deep Code Analysis Prompt Generation
- 🤖 Use with Claude Desktop (MCP)
📖 Complete CLI Reference
🛠️ Advanced Usage
- 🚫 Ignoring Files with .aidignore
- 🎯 Git History Analysis Mode
⚠️ Limitations
🔒 Security Considerations
❓ FAQ
🤝 Contributing
📄 License
🙏 Acknowledgments

✨ Key Features

Feature	Description
🚀 Extreme Speed	Processes tens of megabytes of code in hundreds of milliseconds. By default, it uses 80% of available CPU cores, but can be configured, e.g., with `--workers=1` to use only a single CPU core.
🧠 Intelligent Distillation	Understands 12+ programming languages and extracts only public APIs (methods, properties, types).
⚙️ High Configurability	Allows including private, protected, and internal members, implementation, or comments.
🤖 AI Prompt Generation	Generates ready-to-use prompts with distilled code for AI analysis. The tool creates files with prompts that AI agents can then execute for security audits, refactoring, etc. See `--ai-action` switch.
📋 Analysis Automation	Creates a complete checklist and directory structure for AI agents, who can then systematically analyze the entire project. See the flow-for-* actions for the `--ai-action` switch.
📜 Git Analysis	Processes commit history and prepares data for in-depth analysis of development quality and team dynamics.
💻 Multi-platform	A single binary file with no dependencies for Windows, Linux, and macOS (x64 & ARM).
🔌 Integration via MCP	Can be integrated into tools like Claude Code, VS Code, Cursor, Windsurf and others thanks to the included MCP server.

🎯 Intelligent Filtering

Control exactly what to include with our new granular flag system:

Visibility Control:

--public=1 (default) - Include public members
--protected=0 (default) - Exclude protected members
--internal=0 (default) - Exclude internal/package-private
--private=0 (default) - Exclude private members

Content Control:

--comments=0 (default) - Exclude comments
--docstrings=1 (default) - Include documentation
--implementation=0 (default) - Exclude function/methods bodies
--imports=1 (default) - Include import/use statements

Default behavior: Shows only public API signatures with basic documentation - perfect for AI understanding while maintaining maximum compression.

🤖 AI-Powered Analysis Prompt Generation

AI Distiller generates specialized prompts combined with distilled code for AI-driven analysis:

--ai-action=flow-for-deep-file-to-file-analysis - Generates task lists and prompts for systematic file-by-file analysis
--ai-action=flow-for-multi-file-docs - Creates documentation workflow prompts with code structure
Output to files - Prompts are saved to .aid/ directory (or use --stdout for small codebases)
Ready for AI execution - Generated files contain both the analysis prompt and distilled code
AI agent instructions - Output includes guidance for AI agents to read and process the generated files
Gemini advantage - 1M token context window perfect for larger codebase analysis

Note: AI Distiller doesn't perform the analysis itself - it prepares optimized prompts that AI agents (Claude, Gemini, ChatGPT) then execute. Users often need to explicitly ask their AI agent to process the generated file or copy its contents to web-based AI tools.

📝 Multiple Output Formats

Text (--format text) - Ultra-compact for AI consumption (default)
Markdown (--format md) - Clean, structured Markdown
JSON Structured (--format json-structured) - Rich semantic data for tools
JSONL (--format jsonl) - Streaming format
XML (--format xml) - Legacy system compatible

📊 Smart Summary Output

After each distillation, AI Distiller displays a summary showing compression efficiency and processing speed:

# Default: Visual progress bar for interactive terminals (green dots = saved, red dots = remaining)
✨ Distilled 970 files [░░░░░░░░░░░░░░░] 98% (10M → 256K) in 231ms 💰 ~2.4M tokens saved (~64k remaining)

# Choose your preferred format with --summary-type
aid ./src --summary-type=stock-ticker
📊 AID 97.6% ▲ │ SIZE: 10M→256K │ TIME: 231ms │ EST: ~2.4M tokens saved

# JSON output
aid ./src --summary-type=json

{
  "original_bytes": 70020,
  "distilled_bytes": 8244,
  "savings_pct": 88.22622107969151,
  "duration_ms": 6,
  "tokens_before": 17505,
  "tokens_after": 2061,
  "tokens_saved": 15444,
  "token_savings_pct": 88.22622107969151,
  "file_count": 9,
  "output_path": "/home/user/project/.aid/aid.processor.txt",
  "tokenizer": "cl100k_base"
}

Available formats:

visual-progress-bar (default) - Shows compression as a progress bar
stock-ticker - Compact stock market style display
speedometer-dashboard - Multi-line dashboard with metrics
minimalist-sparkline - Single line with all essential info
ci-friendly - Clean format for CI/CD pipelines
json - Machine-readable JSON output
off - Disable summary output

Use --no-emoji to remove emojis from any format.

📁 Smart Project Root Detection

AI Distiller automatically detects your project root and centralizes all outputs in a .aid/ directory:

Automatic detection: Searches upward for .aidrc, go.mod, package.json, .git, etc.
Consistent location: All outputs go to <project-root>/.aid/ regardless of where you run aid
Cache management: MCP cache stored in .aid/cache/ for better organization
Easy cleanup: Add .aid/ to .gitignore to keep outputs out of version control

Detection priority:

.aidrc file - Create this empty file to explicitly mark your project root
Language markers - go.mod, package.json, pyproject.toml, etc.
Version control - .git directory
Environment variable - AID_PROJECT_ROOT (fallback if no markers found)
Current directory - Final fallback with warning

# Mark a specific directory as project root (recommended)
touch /my/project/.aidrc

# Run from anywhere in your project - outputs always go to project root
cd deep/nested/directory
aid ../../../src  # Output: <project-root>/.aid/aid.src.txt

# Use environment variable as fallback (useful for CI/CD)
AID_PROJECT_ROOT=/build/workspace aid src/

🌍 Language Support

Currently supports 12 languages via tree-sitter:

Full Support: Python, Go, JavaScript, PHP, Ruby
Beta: TypeScript, Java, C#, Rust, Kotlin, Swift, C++
Coming Soon: Zig, Scala, Clojure

Language-Specific Documentation:

C++ - C++11/14/17/20 support with templates, namespaces, modern features
C# - Complete C# 12 support with records, nullable reference types, pattern matching
Go - Full Go support with interfaces, goroutines, generics (1.18+)
Java - Java 8-21 support with records, sealed classes, pattern matching
JavaScript - ES6+ support with classes, modules, async/await
Kotlin - Kotlin 1.x support with coroutines, data classes, sealed classes
PHP - PHP 7.4+ with PHP 8.x features (attributes, union types, enums)
Python - Full Python 3.x support with type hints, async/await, decorators
Ruby - Ruby 2.x/3.x support with blocks, modules, metaprogramming
Rust - Rust 2018/2021 editions with traits, lifetimes, async
Swift - Swift 5.x support with protocols, extensions, property wrappers
TypeScript - TypeScript 4.x/5.x with generics, decorators, type system

🎯 How It Works

Scans your codebase recursively for supported file types (10+ languages)
Parses each file using language-specific tree-sitter parsers (all bundled, no dependencies)
Extracts only what you need: public APIs, type signatures, class hierarchies
Outputs in your preferred format: compact text, markdown, or structured JSON

All tree-sitter grammars are compiled into the aid binary - zero external dependencies!

🚀 Transform Massive Codebases Into AI-Friendly Context

The Problem: Modern codebases contain thousands of files with millions of lines. But for AI to understand your code architecture, suggest improvements, or help with development, it doesn't need to see every implementation detail - it needs the structure and public interfaces.

The Solution: AI Distiller extracts only what matters - public APIs, types, and signatures - reducing codebase size by 90-98% while preserving all essential information for AI comprehension.

< F438 td>🐘 laravel

Project	Files	Original Tokens	Distilled Tokens	Fits in Context¹	Speed²
⚛️ `react`	1,781	~5.5M	250K (-95%)	✅ Gemini³	2,875 files/s
🎨 `vscode`	4,768	~22.5M	2M (-91%)	⚠️ Needs chunking	5,072 files/s
🐍 `django`	970	~10M	256K (-97%)	✅ Gemini³	4,199 files/s
📦 `prometheus`	685	~8.5M	154K (-98%)	✅ Claude/Gemini	3,071 files/s
🦀 `rust-analyzer`	1,275	~5.5M	172K (-97%)	✅ Claude/Gemini	10,451 files/s
🚀 `astro`	1,058	~10.5M	149K (-99%)	✅ Claude/Gemini	5,212 files/s
💎 `rails`	394	~1M	104K (-90%)	✅ ChatGPT-4o	4,864 files/s
1,443	~3M	238K (-92%)	✅ Gemini³	4,613 files/s
⚡ `nestjs`	802	~1.5M	107K (-93%)	✅ ChatGPT-4o	8,813 files/s
👻 `ghost`	2,184	~8M	235K (-97%)	✅ Gemini³	4,719 files/s

_{¹ Context windows: ChatGPT-4o (128K), Claude (200K), Gemini (1M). ✅ = fits completely, ⚠️ = needs splitting}
_{² Processing speed with 12 parallel workers on AMD Ryzen 7945HX. Use -w 1 for serial mode or -w N for custom workers.}
_{³ These frameworks exceed 200K tokens and work only with Gemini due to its larger 1M token context window.}

🎯 Why This Matters for AI-Assisted Development

Large codebases are overwhelming for AI models. A typical web framework like Django has ~10 million tokens of source code. Even with Claude's 200K context window, you'd need to split it into 50+ chunks, losing coherence and relationships between components.

But here's the good news: Most real-world projects that teams have invested hundreds to thousands of hours developing are much smaller. Thanks to AI Distiller, the vast majority of typical business applications, SaaS products, and internal tools can fit entirely within AI context windows, enabling unprecedented AI assistance quality.

⚠️ The Hidden Problem with AI Coding Tools

Most AI agents and IDEs are "context misers" - they try to save tokens at the expense of actual codebase knowledge. They rely on:

🔍 Grep/search to find relevant code snippets
📄 Limited context showing only 10-50 lines around matches
🎲 Guessing interfaces based on partial information

This is why AI-generated code often fails on first attempts - the AI is literally guessing method signatures, parameter types, and return values because it can't see the full picture.

AI Distiller changes the game by giving AI complete knowledge of:

✅ Exact interfaces of all classes, methods, and functions
✅ All parameter types and their expected values
✅ Return types and data structures
✅ Full inheritance hierarchies and relationships

Instead of playing "code roulette", AI can now write correct code from the start.

Result: Django's 10M tokens compress to just 256K tokens - suddenly the entire framework fits in a single AI conversation, leading to:

🎯 More accurate suggestions - AI sees all available APIs at once
🚀 Less hallucination - No more inventing methods that don't exist
💡 Better architecture advice - AI understands the full system design
⚡ Faster development - Especially for "vibe coding" with AI assistance
💰 40x cost reduction - Pay for 256K tokens instead of 10M on API calls

🔧 Flexible for Different Use Cases

# Process entire codebase (default: public APIs only)
aid ./my-project

# Process specific directory or module
aid ./my-project/src/auth
aid ./my-project/src/api

# Process a directory
aid ./my-project/core/

# Process individual file
aid src/main.py

# Include protected/private for deeper analysis
aid ./my-project --private=1 --protected=1

# Include implementations for small projects
aid ./my-small-lib --implementation=1

# Everything for complete understanding
aid ./micro-service --private=1 --protected=1 --implementation=1

Granular Control: Process your entire codebase, specific modules, directories, or even individual files. Perfect for focusing AI on the exact context it needs - whether that's understanding the whole system architecture or diving deep into a specific authentication module.

📈 Full benchmark details | 🧪 Reproduce these results

🚀 Quick Start

One-Line Installation

macOS / Linux / WSL:

# Install to ~/.aid/bin (recommended, no sudo required)
curl -sSL https://raw.githubusercontent.com/janreges/ai-distiller/main/install.sh | bash

# Install to /usr/local/bin (requires sudo)
curl -sSL https://raw.githubusercontent.com/janreges/ai-distiller/main/install.sh | bash -s -- --sudo

Windows PowerShell:

iwr https://raw.githubusercontent.com/janreges/ai-distiller/main/install.ps1 -useb | iex

The installer will:

Detect your OS and architecture automatically
Download the appropriate pre-built binary
Install to ~/.aid/bin/aid by default (no sudo required)
Or to /usr/local/bin/aid with --sudo flag
Guide you through PATH configuration if needed

Basic Usage

# Basic usage
aid .                                   # Current directory, output is saved to file in ./aid
aid . --stdout                          # Current directory, output is printed to STDOUT
aid src/                                # Specific directory
aid main.py                             # Specific file

📖 Example Output

Python Class Example

Input (car.py):

class Car:
    """A car with basic attributes and methods."""
    
    def __init__(self, make: str, model: str):
        self.make = make
        self.model = model
        self._mileage = 0  # Private
    
    def drive(self, distance: int) -> None:
        """Drive the car."""
        if distance > 0:
            self._mileage += distance

Output (aid car.py --format text --implementation=0):

<file path="car.py">
class Car:
    +def __init__(self, make: str, model: str)
    +def drive(self, distance: int) -> None
</file>

TypeScript Interface Example

Input (api.ts):

export interface User {
  id: number;
  name: string;
  email?: string;
}

export class UserService {
  private cache = new Map<number, User>();
  
  async getUser(id: number): Promise<User | null> {
    return this.cache.get(id) || null;
  }
}

Output (aid api.ts --format text --implementation=0):

<file path="api.ts">
export interface User {
  id: number;
  name: string;
  email?: string;
}

export class UserService {
  +async getUser(id: number): Promise<User | null>
}
</file>

📖 Guides & Examples

Deep Code Analysis Prompt Generation

AI Distiller generates sophisticated analysis prompts that AI assistants can execute for comprehensive codebase understanding:

aid internal \
   --private=1 --protected=1 --implementation=1 \
   --ai-action=flow-for-deep-file-to-file-analysis

✅ AI Analysis Task List generated successfully!
📋 Task List: .aid/ANALYSIS-TASK-LIST.internal.2025-06-20.md
📊 Summary File: .aid/ANALYSIS-SUMMARY.internal.2025-06-20.md
📁 Analysis Reports Directory: .aid/analysis.internal/2025-06-20
🤖 Ready for AI-driven analysis workflow!
📂 Files to analyze: 158

💡 If you are an AI agent, please read the Task List above and carefully follow all instructions to systematically analyze each file.

What AI Distiller generates:

📋 Task list prompt - A structured checklist for AI to follow (.aid/ANALYSIS-TASK-LIST.PROJECT.DATE.md)
🎯 Analysis instructions - Detailed prompts guiding AI through security, performance, and quality checks
📊 Code structure - Distilled code included in the prompt files for AI to analyze
📁 Directory structure - Pre-created folders where AI agents can save their analysis results

How to use the generated prompts:

For AI agents: Direct the agent to read the generated task list file and follow instructions
For web AI tools: Copy the content of generated files and paste into Gemini (best for large codebases due to 1M context)
For small codebases: Use --stdout to get prompt directly without saving to file

Note: The analysis dimensions (Security, Performance, Maintainability, Readability) are part of the prompts that guide the AI - AI Distiller itself doesn't perform any analysis.

🤖 Use with Claude Code/Desktop (MCP)

AI Distiller now integrates seamlessly with Claude Code/Desktop through the Model Context Protocol (MCP), enabling AI agents to analyze and understand codebases directly within conversations.

# One-line installation
claude mcp add aid -- npx -y @janreges/ai-distiller-mcp

📦 NPM Package: @janreges/ai-distiller-mcp - Full documentation and examples available

Available MCP Tools

🔍 Code Structure Tools:

distill_file - Extract structure from a single file
distill_directory - Extract structure from entire directories
list_files - Browse directories with file statistics
get_capabilities - Get info about AI Distiller capabilities

🎯 Specialized AI Analysis Tools:

aid_hunt_bugs - Generate bug-hunting prompts with distilled code
aid_suggest_refactoring - Create refactoring analysis prompts
aid_generate_diagram - Produce diagram generation prompts (Mermaid)
aid_analyze_security - Generate security audit prompts (OWASP Top 10)
aid_generate_docs - Create documentation generation prompts
aid_deep_file_analysis - Systematic file-by-file analysis workflow
aid_multi_file_docs - Multi-file documentation workflow
aid_complex_analysis - Enterprise-grade analysis prompts
aid_performance_analysis - Performance optimization prompts
aid_best_practices - Code quality and best practices prompts

🔧 Core Analysis Engine:

aid_analyze - Direct access to all AI actions for custom workflows

Important: AI Distiller generates analysis prompts with distilled code - it does NOT perform the actual analysis! The output is a specialized prompt + distilled code that AI agents (like Claude) then execute. For large codebases, you can copy the output to tools like Gemini 2.0 with 1M context window.

Smart Context Management: AI agents can analyze your entire project for understanding the big picture, then zoom into specific modules (auth, API, database) for detailed work. No more overwhelming AI with irrelevant code!

📖 Complete CLI Reference

Command Synopsis

aid <path> [OPTIONS]

Core Arguments and Options

🎯 Primary Arguments

Argument	Type	Default	Description
`<path>`	String	(required)	Path to source file or directory to analyze. Use `.git` for git history mode, `-` (or empty) for stdin input

📁 Output Options

Option	Type	Default	Description
`-o, --output`	String	`.aid/<dirname>.[options].txt`	Output file path. Auto-generated based on input directory basename and options if not specified
`--stdout`	Flag	`false`	Print output to stdout in addition to file. When used alone, no file is created
`--format`	String	`text`	Output format: `text` (ultra-compact), `md` (clean Markdown), `jsonl` (one JSON per file), `json-structured` (rich semantic data), `xml` (structured XML)

🤖 AI Actions

Option	Type	Default	Description
`--ai-action`	String	(none)	Generate pre-configured prompts with distilled code for AI analysis. See AI Actions section below
`--ai-output`	String	`./.aid/<action>.<timestamp>.<dirname>.md`	Custom output path for generated AI prompt files

👁️ Visibility Filtering

Option	Type	Default	Description
`--public`	0\|1	`1`	Include public members (methods, functions, classes)
`--protected`	0\|1	`0`	Include protected members
`--internal`	0\|1	`0`	Include internal/package-private members
`--private`	0\|1	`0`	Include private members

📝 Content Filtering

Option	Type	Default	Description
`--comments`	0\|1	`0`	Include inline and block comments
`--docstrings`	0\|1	`1`	Include documentation comments (docstrings, JSDoc, etc.)
`--implementation`	0\|1	`0`	Include function/method bodies (implementation details)
`--imports`	0\|1	`1`	Include import/require statements
`--annotations`	0\|1	`1`	Include decorators and annotations

🎛️ Alternative Filtering Syntax

Option	Type	Default	Description
`--include-only`	String	(none)	Include ONLY these categories (comma-separated: `public,protected,imports`)
`--exclude-items`	String	(none)	Exclude these categories (comma-separated: `private,comments,implementation`)

📂 File Selection

Option	Type	Default	Description
`--include`	String	(all files)	Include file patterns (comma-separated: `.go,.py` or multiple: `--include ".go" --include ".py"`)
`--exclude`	String	(none)	Exclude file patterns (comma-separated: `test,.json` or multiple: `--exclude "test" --exclude "vendor/*"`)
`-r, --recursive`	0\|1	`1`	Process directories recursively. Set to 0 to process only immediate directory contents

🔧 Processing Options

Option	Type	Default	Description
`--raw`	Flag	`false`	Process all text files without language parsing. Overrides all content filters
`--lang`	String	`auto`	Force language detection: `auto`, `python`, `typescript`, `javascript`, `go`, `rust`, `java`, `csharp`, `kotlin`, `cpp`, `php`, `ruby`, `swift`

📍 Path Control

Option	Type	Default	Description
`--file-path-type`	String	`relative`	Path format in output: `relative` or `absolute`
`--relative-path-prefix`	String	(empty)	Custom prefix for relative paths (e.g., `module/` → `module/src/file.go`)

⚡ Performance Options

Option	Type	Default	Description
`-w, --workers`	Integer	`0`	Number of parallel workers. `0` = auto (80% of CPU cores), `1` = serial processing, `2+` = specific worker count

📊 Summary Output Options

Option	Type	Default	Description
`--summary-type`	String	`visual-progress-bar`	Summary format after processing. See Summary Types below
`--no-emoji`	Flag	`false`	Disable emojis in summary output for plain text terminals

📜 Git Mode Options (when path is `.git`)

Option	Type	Default	Description
`--git-limit`	Integer	`200`	Number of commits to analyze. Use `0` for all commits
`--with-analysis-prompt`	Flag	`false`	Add comprehensive AI analysis prompt for commit quality, patterns, and insights

🐛 Diagnostic Options

Option	Type	Default	Description
`-v, --verbose`	Count	`0`	Verbose output. Use `-vv` for detailed info, `-vvv` for full trace with data dumps
`--version`	Flag	`false`	Show version information and exit
`--help`	Flag	`false`	Show help message
`--help-extended`	Flag	`false`	Show complete documentation (man page style)
`--cheat`	Flag	`false`	Show quick reference card

AI Actions Detailed

AI actions generate pre-configured prompts combined with distilled code that AI agents can then execute for specific analysis tasks:

Action	Generated Prompt Type	AI Agent Will
`prompt-for-refactoring-suggestion`	Refactoring analysis prompt with distilled code	Analyze code for improvements, technical debt, effort sizing
`prompt-for-complex-codebase-analysis`	Enterprise-grade analysis prompt with full codebase	Generate architecture diagrams, compliance checks, findings
`prompt-for-security-analysis`	Security audit prompt with OWASP Top 10 guidelines	Detect vulnerabilities, suggest remediation steps
`prompt-for-performance-analysis`	Performance optimization prompt with complexity focus	Identify bottlenecks, analyze scalability issues
`prompt-for-best-practices-analysis`	Code quality prompt with industry standards	Assess code quality, suggest improvements
`prompt-for-bug-hunting`	Bug detection prompt with pattern analysis	Find bugs, analyze quality metrics
`prompt-for-single-file-docs`	Documentation generation prompt for single file	Create comprehensive API documentation
`prompt-for-diagrams`	Diagram generation prompt with Mermaid syntax	Generate 10+ architecture and process diagrams
`flow-for-deep-file-to-file-analysis`	Systematic analysis task list with directory structure	Perform file-by-file deep analysis
`flow-for-multi-file-docs`	Documentation workflow with file relationships	Create interconnected documentation

Summary Types

Type	Description	Example Output
`visual-progress-bar`	Default. Shows compression progress bar with colors	`✅ Distilled 150 files [████████░░] 85% (5MB → 750KB)`
`stock-ticker`	Compact stock market style	`📊 AID 97.5% ▲ \| 5MB→128KB \| ~1.2M tokens saved`
`speedometer-dashboard`	Multi-line dashboard with detailed metrics	Shows files, size, tokens, processing time in box format
`minimalist-sparkline`	Single line with sparkline visualization	`▁▃▅▇█ 150 files → 97.5% reduction (750KB) ✓`
`ci-friendly`	Clean format for CI/CD pipelines	`[aid] ✓ 85.9% saved \| 21 kB → 2.9 kB \| 4ms`
`json`	Machine-readable JSON output	`{"original_bytes":5242880,"distilled_bytes":131072,...}`
`off`	Disable summary output	No summary displayed

Exit Codes

Code	Meaning
`0`	Success
`1`	General error (file not found, parse error, etc.)
`2`	Invalid arguments or conflicting options

Examples

# Basic usage - distill with default settings (public APIs only)
aid ./src

# Include all visibility levels and implementation
aid ./src --private=1 --protected=1 --internal=1 --implementation=1

# Generate security analysis prompt (AI agent will execute the analysis)
aid --ai-action prompt-for-security-analysis ./api --private=1

# Process only Python and Go files, exclude tests
aid --include "*.py,*.go" --exclude "*test*,*spec*" ./

# Git history analysis with AI insights
aid .git --with-analysis-prompt --git-limit=500

# Raw text processing for documentation
aid ./docs --raw

# Force single-threaded processing for debugging (-v, -vv, -vvv)
aid ./complex-code -w 1 -vv

# Custom output with absolute paths
aid ./lib --output=/tmp/analysis.txt --file-path-type=absolute

# CI/CD integration with clean output
aid ./internal --summary-type=ci-friendly --no-emoji

⚠️ Limitations

Syntax Errors: Files with syntax errors may be skipped or partially processed
Dynamic Features: Runtime-determined types/interfaces in dynamic languages are not resolved
Macro Expansion: Complex macros (Rust, C++) show pre-expansion source
Generated Code: Consider using .aidignore to skip generated files

🔒 Security Considerations

⚠️ Important: AI Distiller extracts code structure which may include:

Function and variable names that could reveal business logic (e.g., processPayment, calculateTaxEvasion)
API endpoints and internal routes (e.g., /api/v1/internal/user-data)
Type information and data structures
Comments and docstrings (unless stripped)
File paths revealing project structure or codenames

Recommendations:

Always review output before sending to external services
Use --comments=0 to remove potentially sensitive documentation
Consider running a secrets scanner on your codebase first
For maximum security, run AI Distiller in an isolated environment
Future: We're exploring an --obfuscate flag to anonymize sensitive identifiers

🛠️ Advanced Usage

⚡ Parallel Processing

AI Distiller now supports parallel processing for significantly faster analysis of large codebases:

# Use default parallel processing (80% of CPU cores)
aid ./src

# Force serial processing (original behavior)
aid ./src -w 1

# Use specific number of workers
aid ./src -w 16

# Check performance with verbose output
aid ./src -v  # Shows: "Using 25 parallel workers (32 CPU cores available)"

Performance Benefits:

React packages: 3.5s → 0.5s (7x faster)
Large codebases: Near-linear speedup with CPU cores
Maintains identical output order as serial processing

Processing from stdin

AI Distiller can process code directly from stdin, perfect for:

Quick code snippet analysis
Pipeline integration
Testing without creating files
Dynamic code generation workflows

# Auto-detect language from stdin
echo 'class User { getName() { return this.name; } }' | aid --format text

# Explicit language specification
cat mycode.php | aid --lang php --private=0 --protected=0

# Use "-" to explicitly read from stdin
aid - --lang python < snippet.py

# Pipeline example: extract structure from generated code
generate-code.sh | aid --lang typescript --format json

Language Detection: When using stdin without --lang, AI Distiller automatically detects the language based on syntax patterns. Supported languages for auto-detection: python, typescript, javascript, go, ruby, swift, rust, java, c#, kotlin, c++, php.

Integration with AI Tools

# Create a context file for Claude or GPT
aid ./src --format text --implementation=0 > context.txt

# Generate a codebase summary for RAG systems
aid . --format json | jq -r '.files[].symbols[].name' > symbols.txt

# Extract API surface for documentation
aid ./api --comments=0 --implementation=0 --format md > api-ref.md

🚫 Ignoring Files with .aidignore

AI Distiller respects .aidignore files for excluding files and directories from processing. The syntax is similar to .gitignore.

Important: What AI Distiller Processes

AI Distiller only processes source code files with these extensions:

Python: .py, .pyw, .pyi
JavaScript: .js, .mjs, .cjs, .jsx
TypeScript: .ts, .tsx, .d.ts
Go: .go
Rust: .rs
Ruby: .rb, .rake, .gemspec
Java: .java
C#: .cs
Kotlin: .kt, .kts
C++: .cpp, .cc, .cxx, .c++, .h, .hpp, .hh, .hxx, .h++
PHP: .php, .phtml, .php3, .php4, .php5, .php7, .phps, .inc
Swift: .swift

Note: Files like .log, .txt, .md, images, PDFs, and other non-source files are automatically ignored by AI Distiller, so you don't need to add them to .aidignore.

Default Ignored Directories

AI Distiller automatically ignores these common dependency and build directories:

node_modules/ - npm packages
vendor/ - Go and PHP dependencies
target/ - Rust build output
build/, dist/ - Common build directories
__pycache__/, .pytest_cache/, venv/, .venv/, env/, .env/ - Python
.gradle/, gradle/ - Java/Kotlin
Pods/ - Swift/iOS dependencies
.bundle/ - Ruby bundler
bin/, obj/ - Compiled binaries
.vs/, .idea/, .vscode/ - IDE directories
coverage/, .nyc_output/ - Test coverage
bower_components/ - Legacy JavaScript
.terraform/ - Terraform
.git/, .svn/, .hg/ - Version control

You can override these defaults using ! patterns in .aidignore (see Advanced Usage below).

Basic Syntax

Create a .aidignore file in your project root or any subdirectory:

# Comments start with hash
*.test.js          # Ignore test files
*.spec.ts          # Ignore spec files
temp/              # Ignore temp directory
build/             # Ignore build directory
/secrets.py        # Ignore secrets.py only in root
node_modules/      # Ignore node_modules everywhere
**/*.bak           # Ignore .bak files in any directory
src/test_*         # Ignore test_* files in src/
!important.test.js # Don't ignore important.test.js (negation)

How It Works

.aidignore files work recursively - place them in any directory
Patterns are relative to the directory containing the .aidignore file
Use / prefix for patterns relative to the .aidignore location
Use ** for recursive matching
Directory patterns should end with /
Use ! prefix to negate a pattern (re-include previously ignored files)

Examples

# .aidignore in project root
node_modules/       # Excludes all node_modules directories
*.test.js          # Excludes all test files
*.spec.ts          # Excludes all spec files
dist/              # Excludes dist directory
.env.py            # Excludes environment config files
vendor/            # Excludes vendor directory

# More specific patterns
src/**/test_*.py   # Test files in src subdirectories
!src/test_utils.py # But include this specific test file
/config/*.local.py # Local config files in root config dir
**/*_generated.go  # Generated Go files anywhere

Advanced Usage: Including Normally Ignored Content

Include Default-Ignored Directories

Use ! patterns to include directories that are ignored by default:

# Include vendor directory for analysis
!vendor/

# Include specific node_modules package
!node_modules/my-local-package/

# Include Python virtual environment
!venv/

Include Non-Source Files

You can also include files that AI Distiller normally doesn't process:

# Include all markdown files
!*.md
!**/*.md

# Include configuration files
!*.yaml
!*.json
!.env

# Include specific documentation
!docs/**/*.txt
!README.md
!CHANGELOG.md

When you include non-source files with ! patterns, AI Distiller will include their raw content in the output.

Nested .aidignore Files

You can place .aidignore files in subdirectories for more specific control:

# project/.aidignore
*.test.py
!vendor/            # Include vendor in this project

# project/src/.aidignore
test_*.go
*.mock.ts
!test_helpers.ts   # Exception: include test_helpers.ts

🎯 Git History Analysis Mode

AI Distiller includes a special mode for analyzing git repositories. When you pass a .git directory, it switches to git log mode:

# View formatted git history
aid .git

# Limit to recent commits (default is 200)
aid .git --git-limit=500

# Include AI analysis prompt for comprehensive insights
aid .git --git-limit=1000 --with-analysis-prompt

The --with-analysis-prompt flag adds a sophisticated prompt combined with git history that AI agents can use to generate:

Contributor statistics with expertise areas and collaboration patterns
Timeline analysis with development phases and activity visualization
Functional categorization of commits (features, fixes, refactoring)
Codebase evolution insights including technology shifts
Actionable recommendations based on discovered patterns

The output file contains both the analysis prompt and formatted git history, ready for AI agents to process. Perfect for understanding project history, identifying knowledge silos, or generating impressive development reports.

❓ FAQ

How accurate are the token counts?

Token counts are estimated using OpenAI's cl100k_base tokenizer (1 token ≈ 4 characters). Actual token usage varies by model - Claude and GPT-4 use similar tokenizers, while others may differ by ±10%.

Can AI Distiller handle very large repositories?

Yes! We've tested on repositories with 50,000+ files. The parallel processing mode (-w flag) scales linearly with CPU cores. Memory usage is bounded - large files are processed in streaming chunks.

What about generated code and vendor directories?

Create a .aidignore file (same syntax as .gitignore) to exclude generated files, vendor directories, or any paths you don't want processed.

What happens with unsupported file types?

Files with unknown or unsupported extensions are automatically skipped - no errors, no interruption. AI Distiller only processes files it has parsers for, ensuring clean and relevant output. This means you can safely run it on mixed repositories containing documentation, images, configs, etc.

Is my code sent anywhere?

No! AI Distiller runs 100% locally. It only extracts and formats your code structure - you decide what to do with the output. The tool itself makes no network connections.

Which programming languages are supported?

Currently 12+ languages via tree-sitter: Python, TypeScript, JavaScript, Go, Java, C#, Rust, Ruby, Swift, Kotlin, PHP, C++. All parsers are bundled in the binary - no external dependencies needed.

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Development Setup

# Clone and setup
git clone https://github.com/janreges/ai-distiller
cd ai-distiller
make dev-init    # Initialize development environment

# Run tests
make test         # Unit tests
make test-integration  # Integration tests

# Build binary
make build        # Build for current platform

Building Release Binaries

AI Distiller requires CGO for full language support via tree-sitter parsers. To build release binaries for all supported platforms:

Prerequisites

Ubuntu/Debian:

# Install cross-compilation toolchains
sudo apt-get update
sudo apt-get install -y gcc-aarch64-linux-gnu gcc-mingw-w64-x86-64

# For macOS cross-compilation, you need osxcross:
# 1. Clone osxcross: git clone https://github.com/tpoechtrager/osxcross tools/osxcross
# 2. Obtain macOS SDK (see https://github.com/tpoechtrager/osxcross#packaging-the-sdk)
# 3. Place SDK in tools/osxcross/tarballs/
# 4. Build osxcross: cd tools/osxcross && ./build.sh

Build All Platforms

# Build release archives for all platforms
./scripts/build-releases.sh

# This creates:
# - aid-linux-amd64.tar.gz    (Linux 64-bit)
# - aid-linux-arm64.tar.gz    (Linux ARM64)
# - aid-darwin-amd64.tar.gz   (macOS Intel)
# - aid-darwin-arm64.tar.gz   (macOS Apple Silicon)
# - aid-windows-amd64.zip     (Windows 64-bit)

The script automatically detects available toolchains and builds for all possible platforms. Each archive contains the aid binary (or aid.exe for Windows) with full language support.

Note: Without proper toolchains, only the native platform will be built.

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

Built on tree-sitter for accurate parsing
Inspired by the need for better AI-code interaction
Created with ❤️ by Claude Code & Ján Regeš from SiteOne (Czech Republic).

Name		Name	Last commit message	Last commit date
Latest commit History 248 Commits
.github/workflows		.github/workflows
cmd		cmd
docs		docs
internal		internal
mcp-npm		mcp-npm
scripts		scripts
testdata		testdata
.claude.json		.claude.json
.dockerignore		.dockerignore
.gitignore		.gitignore
.gitmodules		.gitmodules
.golangci.yml		.golangci.yml
BUILD.md		BUILD.md
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
VERSION		VERSION
checksums.txt.template		checksums.txt.template
go.mod		go.mod
go.sum		go.sum
install.ps1		install.ps1
install.sh		install.sh

License

janreges/ai-distiller

Folders and files

Latest commit

History

Repository files navigation

AI Distiller (aid)

Table of Contents

✨ Key Features

🎯 Intelligent Filtering

🤖 AI-Powered Analysis Prompt Generation

📝 Multiple Output Formats

📊 Smart Summary Output

📁 Smart Project Root Detection

🌍 Language Support

Language-Specific Documentation:

🎯 How It Works

🚀 Transform Massive Codebases Into AI-Friendly Context

🎯 Why This Matters for AI-Assisted Development

⚠️ The Hidden Problem with AI Coding Tools

🔧 Flexible for Different Use Cases

🚀 Quick Start

One-Line Installation

Basic Usage

📖 Example Output

📖 Guides & Examples

Deep Code Analysis Prompt Generation

🤖 Use with Claude Code/Desktop (MCP)

Available MCP Tools

📖 Complete CLI Reference

Command Synopsis

Core Arguments and Options

🎯 Primary Arguments

📁 Output Options

🤖 AI Actions

👁️ Visibility Filtering

📝 Content Filtering

🎛️ Alternative Filtering Syntax

📂 File Selection

🔧 Processing Options

📍 Path Control

⚡ Performance Options

📊 Summary Output Options

📜 Git Mode Options (when path is .git)

🐛 Diagnostic Options

AI Actions Detailed

Summary Types

Exit Codes

Examples

⚠️ Limitations

🔒 Security Considerations

🛠️ Advanced Usage

⚡ Parallel Processing

Processing from stdin

Integration with AI Tools

🚫 Ignoring Files with .aidignore

Important: What AI Distiller Processes

Default Ignored Directories

Basic Syntax

How It Works

Examples

Advanced Usage: Including Normally Ignored Content

Include Default-Ignored Directories

Include Non-Source Files

Nested .aidignore Files

🎯 Git History Analysis Mode

❓ FAQ

🤝 Contributing

Development Setup

Building Release Binaries

Prerequisites

Build All Platforms

📄 License

🙏 Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

AI Distiller (`aid`)

📜 Git Mode Options (when path is `.git`)

Packages