🔍 Your AI-powered companion to explore, query, and make sense of Indonesia’s public data — all in one tool.
MainData.id is a modern web-based platform that helps people search, understand, query, and analyze open government data from Indonesia — without needing to download, clean, or manually write SQL.
Despite many government open data initiatives, the current user journey is still fragmented:
- Search across portals to find the right dataset.
- Download multiple CSV/Excel files.
- Learn the structure and metadata manually.
- Load into a spreadsheet or SQL engine.
- Write custom queries and build charts.
MainData.id makes all that just one step: Ask your question, get instant SQL & answers.
- 📊 Civil servants working with public dashboards and policy research.
- 🎓 Students & educators looking to teach or learn data literacy.
- 🔬 Researchers doing data-driven investigations.
- 🧑💻 Developers building tools using public data.
-
🔍 Ask in Natural Language
“Berapa jumlah penduduk DKI Jakarta tiap tahun?” → auto-generated SQL. -
📚 Smart Dataset Catalog
All datasets are indexed in a vector database so the app knows which data to use — and why. -
🧠 RAG + LLM Engine
We use a Retrieval-Augmented Generation approach powered by LiteLLM — so you can plug in OpenAI, Mistral, Anthropic, etc. -
⚡ In-browser SQL Engine
DuckDB-WASM runs entirely in your browser. No data leaves your computer. Blazing fast. -
🪶 Lightweight by design
No sign-up, no chat history. Just ask, explore, and move forward.
- User inputs a natural language question.
- Sends the question to the backend.
- Receives generated SQL and runs it using DuckDB-WASM.
- Displays results and charts.
- Receives user input.
- Embeds the query and performs semantic search (RAG) in Supabase.
- Builds a context-aware prompt.
- Calls LLM (via LiteLLM) to generate SQL.
- Returns generated SQL back to frontend.
- Hosts the vectorized dataset catalog.
- Enables RAG for dataset understanding.
opendata-lab/
├── frontend/ # Next.js UI, DuckDB, user interaction
├── backend/ # FastAPI server, RAG engine, LiteLLM
├── docs/ # projects documentation
cd frontend
pnpm install
pnpm dev
cd backend
uv sync
# copy & edit .env.example into .env
cp .env.example .env
# run db migrations
uv run alembic upgrade head
# seeding dataset catalog with dummy data
uv run scripts/seed_example_data.py
# fetching dataset catalog
uv run scripts/fetch_datasets.py
# for dev
uv run fastapi dev --host 0.0.0.0
# alternative if above doesn't work
uv run uvicorn main:app --host 0.0.0.0 --port 8000 --reload
# for prod
uv run uvicorn main:app --host 0.0.0.0 --port 8000
# run test
uv run pytest
# run test with watch mode (with pytest-watcher package)
uv run ptw .
- MVP with NL → SQL + DuckDB
- RAG-enhanced query understanding
- stream based response to frontend
- basic user registration & login, and history management
- Chart suggestion + auto-viz
- One-click CSV upload for user datasets
- Dataset versioning and provenance info
We currently doesn't accept pull requests, but we are very open for ideas & suggestions! We would also really appreaciate if you spread the words about this project.
GNU AFFERO GENERAL PUBLIC LICENSE ver 3
We believe public data should be truly accessible — not just downloadable. Let’s make data a common good, not a technical barrier.