8000 GitHub - metabase/dataset-generator: AI Dataset Generator – Create realistic datasets for demos, learning, and dashboards
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

metabase/dataset-generator

Repository files navigation

AI Dataset Generator

Generate realistic datasets for demos, learning, and dashboards. Instantly preview data, export as CSV or SQL, and explore with Metabase.

Features:

  • Conversational prompt builder: choose business type, schema, row count, and more
  • Real-time data preview in the browser
  • Export as CSV (single file or multi-table ZIP) or as SQL inserts
  • One-click Metabase launch for data exploration

Prerequisites

Stack

  • Next.js (App Router, TypeScript)
  • Tailwind CSS + ShadCN UI (modern, dark-themed UI)
  • OpenAI API (GPT-4o for data generation)
  • Metabase (Dockerized, launched on demand)

Getting Started

  1. Clone the repo:

    git clone <your-repo-url>
    cd dataset-generator
  2. Create your .env file:

    Copy the example file and fill in your OpenAI API key:

    cp .env.example .env.local

    Then edit .env.local and add your OpenAI API key after the = sign.

  3. Start the Next.js app:

    npm install
    npm run dev
  4. Generate a dataset:

    • Use the prompt builder to define your dataset.
    • Click "Preview Data" to see a sample.
  5. Export or Explore:

    • Download your dataset as CSV or SQL Inserts.
    • Click "Start Metabase" to spin up Metabase in Docker.
    • Once Metabase is ready, click "Open Metabase" to explore your data.
    • When done, click "Stop Metabase" to shut down and clean up Docker containers.

Project Structure

  • /app/page.tsx – Main UI and prompt builder
  • /app/api/generate/route.ts – Synthetic data generator (OpenAI)
  • /app/api/metabase/start|stop|status/route.ts – Docker orchestration for Metabase
  • /lib/export/ – CSV/SQL export logic
  • /docker-compose.yml – Used only for Metabase, not for the app itself

Using Metabase

When you click "Start Metabase", it will launch Metabase in a Docker container. Once ready:

  1. Click "Open Metabase" to access the Metabase interface
  2. Follow Metabase's setup process
  3. To analyze your generated data:

Cost & Data Generation Summary

Action Calls OpenAI? Cost? Uses LLM? Uses Faker? Row Count
Preview Yes ~$0.05 Yes Yes 10
Download CSV No $0 No Yes 100+
Download SQL No $0 No Yes 100+

Key Points:

  • You only pay for the preview/spec generation (~$0.05 per preview)
  • All downloads use the same columns/spec, just with more rows, and are free

How It Works

  • When you preview a dataset, the app uses OpenAI to generate a detailed data spec (schema, business rules, event logic) for your chosen business type and parameters.
  • All actual data rows are generated locally using Faker, based on the LLM-generated spec.
  • Downloading or exporting data never calls OpenAI again—it's instant and free.

Usage Flow

  1. Select your business type, schema, and other parameters.
  2. Click "Preview Data" to generate a 10-row sample (incurs a small OpenAI cost).
  3. Download CSV/SQL for as many rows as you want—no extra cost, always uses the same schema/columns as the preview.

Schema Options

  • One Big Table (OBT): A single, denormalized table with all relevant columns.
  • Star Schema: Multiple tables (fact + dimension) for more advanced analytics. The LLM spec guides the structure, and the generator outputs all tables locally.

Extending/Contributing

  • To add new business types, rules, or schema logic, edit lib/spec-prompts.ts

About

AI Dataset Generator – Create realistic datasets for demos, learning, and dashboards

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

0