feat(server): support parquet databases by ezyang · Pull Request #156 · ezyang/scubaduck · GitHub

feat(server): support parquet databases #156

Merged: 2 commits, merged May 23, 2025
README.md: 2 changes (1 addition, 1 deletion)
@@ -55,7 +55,7 @@ uv sync --frozen
 SCUBADUCK_DB=/path/to/foo.sqlite flask --app scubaduck.server run --debug
 ```
 
-DuckDB databases work too. Omit to get a simple test dataset, or
+DuckDB databases and Parquet files work too. Omit to get a simple test dataset, or
 `SCUBADUCK_DB=TEST` for a more complicated test dataset.
 
 ## How to use it
scubaduck/server.py: 5 changes (5 additions, 0 deletions)
@@ -62,6 +62,11 @@ def _load_database(path: Path) -> duckdb.DuckDBPyConnection:
         con.execute(
             f"CREATE TABLE events AS SELECT * FROM read_csv_auto('{path.as_posix()}')"
         )
+    elif ext in {".parquet", ".parq"}:
+        con = duckdb.connect()
+        con.execute(
+            f"CREATE TABLE events AS SELECT * FROM read_parquet('{path.as_posix()}')"
+        )
     elif ext in {".db", ".sqlite"}:
         con = duckdb.connect()
         con.execute("LOAD sqlite")
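The new `elif` branch selects DuckDB's `read_parquet` table function by file extension, just as the CSV branch selects `read_csv_auto`. A minimal stdlib-only sketch of that dispatch written as a lookup table follows. `load_statement`, `sql_string_literal`, and `_READERS` are hypothetical names, not scubaduck code, and lowercasing the suffix is an assumption because the hunk does not show how `ext` is computed. Unlike the f-strings in the diff, the sketch doubles embedded single quotes, so a path such as `o'brien.parquet` still produces a valid SQL string literal.

```python
from pathlib import Path

# Hypothetical helper (not part of scubaduck): maps a file suffix to the
# DuckDB table function used to load it, mirroring the CSV and Parquet
# branches of _load_database.
_READERS = {
    ".csv": "read_csv_auto",
    ".parquet": "read_parquet",
    ".parq": "read_parquet",
}


def sql_string_literal(text: str) -> str:
    """Quote text as a SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def load_statement(path: Path) -> str:
    """Build the CREATE TABLE statement for a CSV or Parquet file."""
    ext = path.suffix.lower()  # assumption: suffix matching is case-insensitive
    reader = _READERS.get(ext)
    if reader is None:
        raise ValueError(f"unsupported file type: {path}")
    return (
        "CREATE TABLE events AS SELECT * FROM "
        f"{reader}({sql_string_literal(path.as_posix())})"
    )


print(load_statement(Path("/data/events.parquet")))
# CREATE TABLE events AS SELECT * FROM read_parquet('/data/events.parquet')
print(load_statement(Path("/data/o'brien.PARQ")))
# CREATE TABLE events AS SELECT * FROM read_parquet('/data/o''brien.PARQ')
```

Keeping the extension table in one place means another file format would need one new entry rather than another `elif` branch. SQLite files go through a different path (`LOAD sqlite`), so they are deliberately left out of this table.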
tests/test_server_db_types.py: 20 changes (20 additions, 0 deletions)
@@ -227,6 +227,26 @@ def test_envvar_db(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None:
     assert len(rows) == 1
 
 
+def test_envvar_parquet(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None:
+    parquet_file = tmp_path / "events.parquet"
+    con = duckdb.connect()
+    csv_path = Path("scubaduck/sample.csv").resolve()
+    con.execute(
+        f"COPY (SELECT * FROM read_csv_auto('{csv_path.as_posix()}')) TO '{parquet_file.as_posix()}' (FORMAT PARQUET)"
+    )
+    con.close()  # pyright: ignore[reportUnknownMemberType, reportAttributeAccessIssue]
+
+    monkeypatch.setenv("SCUBADUCK_DB", str(parquet_file))
+    app = server.create_app()
+    client = app.test_client()
+    payload = _make_payload()
+    rv = client.post(
+        "/api/query", data=json.dumps(payload), content_type="application/json"
+    )
+    rows = rv.get_json()["rows"]
+    assert len(rows) == 3
+
+
 def test_envvar_db_missing(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None:
     missing = tmp_path / "missing.sqlite"
     monkeypatch.setenv("SCUBADUCK_DB", str(missing))
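`test_envvar_parquet` converts `scubaduck/sample.csv` to Parquet with DuckDB's `COPY ... (FORMAT PARQUET)` and then checks only the row count returned by `/api/query`. To also confirm that the file handed to `SCUBADUCK_DB` really is Parquet, a stdlib-only look at the format's magic bytes is enough. This is a hedged sketch: `looks_like_parquet` is a hypothetical helper, not part of scubaduck or the test suite. It relies on the Parquet format rule that a file starts and ends with the 4-byte magic `PAR1`, with a 4-byte footer length just before the trailing magic. It is a cheap sanity check, not a full validation.

```python
import os
from pathlib import Path

# Parquet files begin and end with this 4-byte magic (Apache Parquet format).
PARQUET_MAGIC = b"PAR1"
# Smallest structurally possible file: leading magic + 4-byte footer length + trailing magic.
MIN_PARQUET_SIZE = 12


def looks_like_parquet(path: Path) -> bool:
    """Return True if the file has Parquet's leading and trailing magic bytes."""
    if path.stat().st_size < MIN_PARQUET_SIZE:
        return False
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last 4 bytes of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

In the test above, a line such as `assert looks_like_parquet(parquet_file)` placed after `con.close()` would fail early if the `COPY` step ever wrote something other than Parquet, before the server is involved at all.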