Fire Enrich - AI-Powered Data Enrichment Tool

Turn a simple list of emails into a rich dataset with company profiles, funding data, tech stacks, and more. Powered by Firecrawl and a multi-agent AI system.

Technologies

Firecrawl: Web scraping and content aggregation
OpenAI: Intelligent data extraction and synthesis
Next.js 15: Modern React framework with App Router

Setup

Required API Keys

Service	Purpose	Get Key
Firecrawl	Web scraping and content aggregation	firecrawl.dev/app/api-keys
OpenAI	Intelligent data extraction	platform.openai.com/api-keys

Quick Start

Clone this repository

Create a .env.local file with your API keys:

FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key

Install dependencies: npm install or yarn install
Run the development server: npm run dev or yarn dev
Open http://localhost:3000

Example Enrichment

Before:

{
  "email": "erez@wiz.io"
}

After:

{
  "email": "erez@wiz.io",
  "companyName": "Wiz",
  "industry": "Cybersecurity",
  "employeeCount": "1001-5000",
  "yearFounded": 2020,
  "headquarters": "New York, NY",
  "fundingStage": "Series D",
  "totalRaised": "$900M",
  "website": "https://www.wiz.io",
  "sources": [
    "https://www.wiz.io/about",
    "https://techcrunch.com/2023/02/27/wiz-confirms-300m-at-a-10b-valuation-to-build-out-its-cloud-security-platform/"
  ]
}

How It Works

Architecture Overview: Following "ericciarla@firecrawl.dev" Through the System

Let's see exactly how Fire Enrich processes a real example - enriching data for the email ericciarla@firecrawl.dev.

graph TD
    Start["Input: ericciarla@firecrawl.dev - Industry, CEO, Funding Stage, Tech Stack"]:::primary
    
    Start -->|1. Extract Domain| Domain["Domain: firecrawl.dev - Corporate email detected"]:::primary
    
    Domain -->|2. Start Orchestration| Orchestrator["Agent Orchestrator - Executes agents in optimized sequence - Each phase builds on previous data"]:::synthesis

    %% Phase 1: Discovery
    Orchestrator -->|Phase 1| Discovery["Discovery Agent - Finds basic company info first"]:::agent
    
    Discovery -->|Parallel searches| DiscSearch["Parallel Searches: Firecrawl company, firecrawl.dev, What is Firecrawl"]:::search
    
    DiscSearch -->|Firecrawl API| DiscFC["3 concurrent API calls - Returns company website and basic information"]:::firecrawl
    
    DiscFC -->|Extracts| DiscData["Company: Firecrawl - Website: firecrawl.dev - Type: B2B SaaS"]:::source

    %% Phase 2: Company Profile
    DiscData -->|Phase 2| Profile["Company Profile Agent - Uses company name from Phase 1 to find industry details"]:::agent
    
    Profile -->|Parallel searches| ProfSearch["Parallel Searches: Firecrawl industry classification, Firecrawl web scraping API, Developer tools Firecrawl"]:::search
    
    ProfSearch -->|Firecrawl API| ProfFC["3 concurrent API calls - Searches industry-specific sources"]:::firecrawl
    
    ProfFC -->|Extracts| ProfData["Industry: Developer Tools - Sub-category: Web Scraping APIs - Market: B2B SaaS"]:::source

    %% Phase 3: Financial
    ProfData -->|Phase 3| Funding["Financial Intel Agent - Searches for funding using company and industry context"]:::agent
    
    Funding -->|Parallel searches| FundSearch["Parallel Searches: Firecrawl funding rounds, Mendable AI acquisition Firecrawl, Firecrawl investors crunchbase"]:::search
    
    FundSearch -->|Firecrawl API| FundFC["3 concurrent API calls - Checks TechCrunch, Crunchbase, venture news sites"]:::firecrawl
    
    FundFC -->|Extracts| FundData["Funding: Seed Stage - Part of Mendable AI - YC-backed company"]:::source

    %% Phase 4: Tech Stack
    FundData -->|Phase 4| Tech["Tech Stack Agent - Analyzes GitHub and tech docs - HTML source analysis"]:::agent
    
    Tech -->|Parallel searches| TechSearch["Parallel Searches: github.com/mendableai/firecrawl, Firecrawl API documentation, Direct HTML analysis"]:::search
    
    TechSearch -->|Firecrawl API| TechFC["3 concurrent API calls - HTML meta tag analysis - GitHub repo scan"]:::firecrawl
    
    TechFC -->|Extracts| TechData["Tech Stack: Node.js, Python, Redis, Playwright, Kubernetes"]:::source

    %% Phase 5: General
    TechData -->|Phase 5| General["General Purpose Agent - Handles custom field CEO - Uses all previous context"]:::agent
    
    General -->|Targeted search| GenSearch["Focused Search: Firecrawl CEO founder Eric, Eric Ciarla Firecrawl, LinkedIn company search"]:::search
    
    GenSearch -->|Firecrawl API| GenFC["3 concurrent API calls - Cross-references multiple sources"]:::firecrawl
    
    GenFC -->|Extracts| GenData["CEO: Eric Ciarla - Co-founder and CEO of Firecrawl - Previously at Mendable AI"]:::source

    %% Final Synthesis
    DiscData --> Synthesis
    ProfData --> Synthesis
    FundData --> Synthesis
    TechData --> Synthesis
    GenData --> Synthesis
    
    Synthesis["GPT-4o Final Synthesis - Combines all agent findings - Resolves conflicts, validates data"]:::synthesis
    
    Synthesis -->|Outputs| Results
    
    subgraph Results[Enriched Data]
        R1["Industry: Developer Tools / Web Scraping - Source: firecrawl.dev/about"]:::good
        R2["CEO: Eric Ciarla Co-founder and CEO - Source: linkedin.com/company/firecrawl"]:::good
        R3["Funding: Seed Part of Mendable AI - Source: crunchbase.com"]:::good
        R4["Tech Stack: Node.js, Python, Redis, K8s - Source: github.com/mendableai/firecrawl"]:::good
    end
    
    Results -->|Final Output| Output["Updated CSV Row: ericciarla@firecrawl.dev - Complete profile with 4 new data points and sources"]:::answer
    
    classDef primary fill:#ff8c42,stroke:#ff6b1a,stroke-width:2px,color:#fff
    classDef agent fill:#9c27b0,stroke:#7b1fa2,stroke-width:2px,color:#fff
    classDef search fill:#e8e8e8,stroke:#999,stroke-width:2px,color:#333
    classDef firecrawl fill:#ff6b1a,stroke:#ff4500,stroke-width:3px,color:#fff
    classDef source fill:#ffa726,stroke:#ff8c42,stroke-width:2px,color:#000
    classDef synthesis fill:#ff8c42,stroke:#ff6b1a,stroke-width:3px,color:#fff
    classDef good fill:#f5f5f5,stroke:#666,stroke-width:1px,color:#000
    classDef answer fill:#333,stroke:#000,stroke-width:3px,color:#fff

How Each Agent Works

Behind the scenes, each agent is a specialized module with its own expertise, search strategies, and type-safe output schema:

Discovery Agent (Phase 1)
- Establishes company basics: official name, website, type of business
- Essential first step that provides the foundation for all other agents
- Returns: Company name, website URL, business type
- Schema: DiscoveryResult with fields like companyName, website, domain

Company Profile Agent (Phase 2)
- Uses verified company name to search for industry and market positioning
- Builds on Discovery data to ensure accurate industry classification
- Returns: Industry, sub-category, business model, market segment
- Schema: ProfileResult with industry, headquarters, yearFounded, companyType

Financial Intel Agent (Phase 3)
- Leverages company name + industry context for targeted funding searches
- Knowing the industry helps identify relevant investor databases
- Returns: Funding stage, total raised, key investors, valuation
- Schema: FundingResult with fundingStage, totalRaised, lastRoundAmount, investors

Tech Stack Agent (Phase 4)
- Analyzes technology with context of company type and funding stage
- HTML analysis, GitHub repos, and technical documentation
- Returns: Programming languages, frameworks, infrastructure, tools
- Uses: Direct EnrichmentResult schema for flexible tech stack extraction

General Purpose Agent (Phase 5)
- Handles custom fields (like CEO, competitors, etc.) with full context
- Benefits from all previous data to make targeted searches
- Returns: Any custom field requested by the user
- Uses: Dynamic EnrichmentResult schema based on user-defined fields

Why Sequential Execution?

The agents execute in a carefully designed sequence where each phase builds upon the previous one:

Context Building: Each agent adds context that makes subsequent searches more accurate. For example, knowing a company's industry helps the funding agent search in the right venture databases.
Data Validation: Later agents can validate and refine data from earlier phases.
Efficiency: Prevents redundant searches by sharing discovered information across phases.
Parallel Searches Within Phases: While agents run sequentially, each agent performs multiple searches in parallel, maximizing speed.

This architecture balances accuracy with performance. All agents could run in parallel, but the sequential approach lets each agent use the context gathered before it, which produces significantly better results.
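The pattern described here, sequential phases with parallel searches inside each phase, can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not Fire Enrich's actual code: the Agent interface, runPipeline, the stub agents, and echoSearch are all hypothetical names, and the stub search echoes its query instead of calling the Firecrawl API.

```typescript
// Hypothetical types for illustration; not the project's actual interfaces.
type Context = Record<string, string>;
type SearchFn = (query: string) => Promise<string>;

interface Agent {
  name: string;
  // Builds search queries from everything earlier phases discovered.
  buildQueries(ctx: Context): string[];
  // Turns raw search results into new context fields.
  extract(results: string[], ctx: Context): Context;
}

// Agents run one after another; each agent's searches run concurrently.
async function runPipeline(
  agents: Agent[],
  search: SearchFn,
  initial: Context
): Promise<Context> {
  let ctx: Context = { ...initial };
  for (const agent of agents) {
    const queries = agent.buildQueries(ctx);
    // All searches of one phase are started together and awaited as a group.
    const results = await Promise.all(queries.map((q) => search(q)));
    // Merge this phase's findings so the next agent can use them.
    ctx = { ...ctx, ...agent.extract(results, ctx) };
  }
  return ctx;
}

// Stub agents: the second depends on the company name found by the first.
const discovery: Agent = {
  name: "discovery",
  buildQueries: (ctx) => [`${ctx.domain} company`, `What is ${ctx.domain}`],
  extract: () => ({ companyName: "Firecrawl" }),
};

const profile: Agent = {
  name: "profile",
  buildQueries: (ctx) => [`${ctx.companyName} industry classification`],
  extract: (results) => ({
    industry: "Developer Tools",
    profileQuery: results.join(" | "),
  }),
};

// Stub search that echoes the query instead of calling a real API.
const echoSearch: SearchFn = async (query) => query;
```

Calling runPipeline([discovery, profile], echoSearch, { domain: "firecrawl.dev" }) resolves to a context in which the profile agent's query already contains the company name found in the discovery phase, which is the point of running the phases in order.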

Extensibility Through Type-Safe Schemas

Each agent uses Zod schemas to ensure type safety and make the system easily extensible:

// Example: Adding a new field to the FundingAgent
const FundingResult = z.object({
  fundingStage: z.string().optional(),
  totalRaised: z.string().optional(),
  lastRoundAmount: z.string().optional(),
  investors: z.array(z.string()).optional(),
  // Add your new field here:
  debtFinancing: z.string().optional(),
});

To extend Fire Enrich with new data extraction capabilities:

Add to existing agent: Modify the Zod schema in /lib/agent-architecture/agents/[agent-name].ts
Create a new agent: Define a new schema and implement the AgentBase interface
Update the orchestrator: Add routing logic to direct fields to your new agent
Use custom fields: The General Agent handles any field not covered by specialized agents

The field routing system automatically categorizes user requests:

Fields with "industry" or "headquarter" → Company Profile Agent
Fields with "fund" or "invest" → Financial Intel Agent
Fields with "employee" or "revenue" → Metrics Agent
Fields with "tech" and "stack" → Tech Stack Agent
Everything else → General Purpose Agent

This design allows Fire Enrich to grow with your needs while maintaining type safety and predictable behavior.
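As a rough sketch, the keyword routing rules listed above could be written as a single function. The routeField name, the agent labels, the case-insensitive matching, and the order in which rules are checked are assumptions for illustration; the actual orchestrator may differ.

```typescript
// Hypothetical agent labels; the real orchestrator may use different names.
type AgentName =
  | "company-profile"
  | "financial-intel"
  | "metrics"
  | "tech-stack"
  | "general";

// Routes a user-requested field name to an agent using the keyword rules
// listed above. Matching is case-insensitive (an assumption).
function routeField(fieldName: string): AgentName {
  const field = fieldName.toLowerCase();
  const has = (keyword: string): boolean => field.indexOf(keyword) !== -1;

  if (has("industry") || has("headquarter")) return "company-profile";
  if (has("fund") || has("invest")) return "financial-intel";
  if (has("employee") || has("revenue")) return "metrics";
  if (has("tech") && has("stack")) return "tech-stack";
  return "general"; // everything else
}
```

With these rules, "Funding Stage" goes to the financial agent, "Tech Stack" to the tech stack agent, and a custom field such as "CEO" falls through to the general purpose agent.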

Process Flow

Upload & Parse: Upload a CSV with emails. The system extracts the company domain from each email.
Field Selection: Choose the data points you need, from company descriptions to funding stages.
Sequential Agent Execution: Agents activate in phases, each building on previous discoveries for maximum accuracy.
Parallel Searches Per Phase: Within each phase, multiple searches run concurrently using the Firecrawl API.
AI Synthesis: GPT-4o analyzes all findings, resolves conflicts, and extracts structured data.
Real-time Results: Your table populates in real-time, complete with enriched data and source citations.
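The Upload & Parse step boils down to pulling the domain out of each email address. Below is a minimal sketch, assuming a hypothetical parseEmail helper and a deliberately short list of personal providers (Gmail and Yahoo are the two the feature list names); the project's real parsing and provider list may be more thorough.

```typescript
// Illustrative list only; extend as needed.
const PERSONAL_PROVIDERS = new Set<string>(["gmail.com", "yahoo.com"]);

interface ParsedEmail {
  email: string;
  domain: string;
  // True when the domain belongs to a personal mailbox provider,
  // in which case there is no company to enrich.
  isPersonal: boolean;
}

// Returns null when the input does not look like an email address.
function parseEmail(raw: string): ParsedEmail | null {
  const email = raw.trim().toLowerCase();
  const at = email.lastIndexOf("@");
  // Require something before and after the "@".
  if (at <= 0 || at === email.length - 1) return null;

  const domain = email.slice(at + 1);
  // A usable company domain needs at least one dot.
  if (domain.indexOf(".") === -1) return null;

  return { email, domain, isPersonal: PERSONAL_PROVIDERS.has(domain) };
}
```

For example, parseEmail("ericciarla@firecrawl.dev") yields the domain firecrawl.dev with isPersonal set to false, while an address at gmail.com is flagged as personal and can be skipped before any API call is made.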

The Multi-Agent System

Fire Enrich employs a sophisticated orchestration system that coordinates specialized extraction modules. These aren't autonomous AI agents, but rather purpose-built components that work together intelligently:

Discovery Phase: Establishes the foundation by identifying the company and its digital presence
Profile Extraction: Specialized logic for industry classification and business model analysis
Financial Intelligence: Targeted searches across venture databases and news sources
Technical Analysis: Deep inspection including HTML parsing and repository analysis
Custom Field Handler: Flexible extraction for any user-defined data points

Each module uses GPT-4o for intelligent data extraction, but follows deterministic search patterns optimized through extensive testing. This hybrid approach combines the reliability of structured programming with the flexibility of AI-powered comprehension.

Key Features

Phased Extraction System: Sequential modules that build context for increasingly accurate results.
Drag & Drop CSV: Simple, intuitive interface to get started in seconds.
Customizable Fields: Choose from a list of common data points or generate your own with natural language.
Real-time Streaming: Watch your data get enriched row-by-row via Server-Sent Events.
Full Source Citations: Every piece of data is linked back to the URL it was found on, ensuring complete transparency.
Skip Common Providers: Automatically skips personal emails (Gmail, Yahoo, etc.) to save on API calls and focus on company data.
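To make the real-time streaming feature concrete, here is a small sketch of how one enriched cell might be serialized as a Server-Sent Events frame. The frame layout (an event line, a data line, then a blank line) follows the SSE wire format; the RowUpdate shape, the toSseFrame helper, and the "row-update" event name are assumptions for illustration, not the project's actual payload.

```typescript
// Hypothetical payload: one enriched cell plus the URL it was found on.
interface RowUpdate {
  rowIndex: number;
  field: string;
  value: string;
  sourceUrl: string;
}

// Serializes one update as an SSE frame. JSON.stringify produces a single
// line, so the payload fits in one "data:" line; the trailing blank line
// terminates the event.
function toSseFrame(eventName: string, payload: RowUpdate): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

const exampleFrame = toSseFrame("row-update", {
  rowIndex: 0,
  field: "industry",
  value: "Developer Tools",
  sourceUrl: "https://firecrawl.dev/about",
});
```

On the browser side, an EventSource listener for the same event name would parse the data line with JSON.parse and fill in the matching table cell along with its source link.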

Configuration & Unlimited Mode

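When you clone this repository and run it locally in development mode (NODE_ENV=development, the default for the Next.js dev server), Fire Enrich automatically enables Unlimited Mode. This removes the row and column restrictions of the public demo and raises the per-enrichment field cap from 10 to 50. You can also enable it explicitly by setting FIRE_ENRICH_UNLIMITED=true. These limits are configured in app/fire-enrich/config.ts: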

const isUnlimitedMode = process.env.FIRE_ENRICH_UNLIMITED === 'true' || 
                       process.env.NODE_ENV === 'development';

export const FIRE_ENRICH_CONFIG = {
  CSV_LIMITS: {
    MAX_ROWS: isUnlimitedMode ? Infinity : 15,
    MAX_COLUMNS: isUnlimitedMode ? Infinity : 5,
  },
  REQUEST_LIMITS: {
    MAX_FIELDS_PER_ENRICHMENT: isUnlimitedMode ? 50 : 10,
  },
} as const;
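One way such limits could be enforced on an uploaded CSV is sketched below. The buildLimits and checkUpload helpers are hypothetical; they reuse the numbers from the config above but take the mode as a parameter rather than reading environment variables, so the sketch stays self-contained.

```typescript
// Same numbers as the config above, with the mode passed in explicitly.
function buildLimits(isUnlimitedMode: boolean) {
  return {
    maxRows: isUnlimitedMode ? Infinity : 15,
    maxColumns: isUnlimitedMode ? Infinity : 5,
    maxFields: isUnlimitedMode ? 50 : 10,
  };
}

// Returns a list of human-readable problems; an empty list means the
// upload fits within the limits for the given mode.
function checkUpload(
  rows: number,
  columns: number,
  fields: number,
  isUnlimitedMode: boolean
): string[] {
  const limits = buildLimits(isUnlimitedMode);
  const errors: string[] = [];
  if (rows > limits.maxRows) {
    errors.push(`Too many rows: ${rows} (max ${limits.maxRows})`);
  }
  if (columns > limits.maxColumns) {
    errors.push(`Too many columns: ${columns} (max ${limits.maxColumns})`);
  }
  if (fields > limits.maxFields) {
    errors.push(`Too many fields: ${fields} (max ${limits.maxFields})`);
  }
  return errors;
}
```

Note that even in Unlimited Mode the field count stays capped at 50, so checkUpload(100, 8, 60, true) still reports one problem while checkUpload(100, 8, 12, true) reports none.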

Our Open Source Philosophy

Let's be blunt: professional data enrichment services are expensive for a reason. Our goal with Fire Enrich isn't to replicate every feature of mature platforms overnight. Instead, we want to build a powerful, open-source foundation that anyone can use, understand, and contribute to.

This is just the start. By open-sourcing it, we're inviting you to join us on this journey.

Add a new agent? Fork the repo and show us what you've got.
Improve a data extraction prompt? Open a pull request.
Have a new feature idea? Start a discussion in the issues.

We believe that by building in public, we can create a tool that is more accessible, affordable, and adaptable, thanks to the collective intelligence of the open-source community.

License

MIT License - see LICENSE file for details.

Contributing

We welcome contributions! Please feel free to submit a Pull Request.

Support

For questions and issues, please open an issue in this repository.
