You're on the right track, and what you're describing is a modern, reproducible, cost-aware data analytics pipeline. Let's refine and structure it step by step to ensure it's robust, automatable, and production-ready.

## 🧠 Your High-Level Plan (Refined)
Here's your concept, revised into a clean, reliable pipeline:
1. **Daily Extraction: BigQuery → Parquet or DuckDB**
   - Run daily query jobs to extract data from BigQuery (see the sketch below)
   - Save to compressed Parquet (e.g., gzip, snappy) or `.duckdb`
   - Store locally (for dev) and/or push to:
     - Google Cloud Storage (for backup + reproducibility)
     - Optionally commit small files to GitHub (via DVC or Git LFS)
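A minimal sketch of this extraction step, assuming a placeholder table `my_project.my_dataset.events` with an `event_date` column (requires `google-cloud-bigquery` plus pandas/pyarrow for the Parquet write):

```python
# Minimal sketch: daily BigQuery -> Parquet extract.
# Table and column names are placeholders.
from datetime import date, timedelta

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder for pipeline state: in practice, read the last extracted
# date from a manifest or the previous output instead of hardcoding it.
last_day = date.today() - timedelta(days=2)

query = """
    SELECT *
    FROM `my_project.my_dataset.events`
    WHERE event_date > @last_day  -- incremental filter
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("last_day", "DATE", last_day),
    ]
)

df = client.query(query, job_config=job_config).to_dataframe()
df.to_parquet(f"data/events_{date.today()}.parquet", compression="snappy")
```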
2. **Versioned Storage & Tracking: DVC**
   - Use DVC to (commands sketched below):
     - Track changes to the data files
     - Push data to GCS remote storage
     - Link data versions to specific code/report commits
   - Avoid storing large files directly in GitHub
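A sketch of the corresponding DVC workflow, as a one-time setup plus the daily tracking commands (the bucket name and file name are placeholders):

```bash
# One-time setup: initialize DVC and point it at a GCS remote
# (requires dvc[gs]); bucket name is a placeholder
dvc init
dvc remote add -d gcs_remote gs://my-analytics-bucket/dvc

# Daily: track the new extract; Git keeps only the small .dvc metadata
dvc add data/events_2025-01-01.parquet
git add data/events_2025-01-01.parquet.dvc data/.gitignore
git commit -m "Track daily extract with DVC"
dvc push  # uploads the actual data to the GCS remote
```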
3. **Local Analysis with DuckDB**
   - Load Parquet or `.duckdb` files into DuckDB
   - Join, filter, and explore locally without re-querying BigQuery (example session below)
   - Optionally define logic in dbt-duckdb if models become complex
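For example, a quick local session over the DVC-tracked snapshots might look like this (paths and column names are assumptions carried over from the sketch above):

```python
# Minimal sketch: local analysis without touching BigQuery.
import duckdb

con = duckdb.connect("data/analytics.duckdb")  # placeholder database file

# Query the daily Parquet snapshots directly
daily_counts = con.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM read_parquet('data/events_*.parquet')
    GROUP BY event_date
    ORDER BY event_date
""").df()
print(daily_counts.head())
```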
4. **Reporting with Evidence**
   - Connect Evidence to the `.duckdb` file or query `read_parquet()` directly
   - Build SQL + Markdown reports with variables and filters (page sketch below)
   - Version all reports and data models in Git
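As a sketch of what an Evidence page could look like, reusing the placeholder files from above (the page, query, and column names are assumptions; the exact source setup depends on your Evidence configuration):

````markdown
<!-- pages/daily_events.md -- hypothetical page -->

```sql daily_events
select event_date, count(*) as events
from read_parquet('data/events_*.parquet')
group by 1
order by 1
```

<LineChart data={daily_events} x=event_date y=events />
````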
5. **Deployment to GCP (Optional Automation/Serving)**
   - Trigger daily jobs (BQ → Parquet → GCS) via one of (entry-point sketch below):
     - Cloud Functions
     - Cloud Scheduler + Cloud Run / Composer
   - Host Evidence:
     - Deploy to Cloud Run (good for lightweight services)
     - Store static reports in Cloud Storage + CDN (if pre-rendered)
     - Use GitHub Actions to regenerate + deploy reports on push
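As a sketch, the scheduled job can be a tiny HTTP-triggered function that Cloud Scheduler invokes once a day; this assumes the `bq_to_parquet.py` script mentioned later exposes a `main()` entry point:

```python
# Minimal sketch of an HTTP entry point for Cloud Functions / Cloud Run,
# suitable as a Cloud Scheduler target.
import functions_framework

@functions_framework.http
def run_daily_extract(request):
    # Assumption: bq_to_parquet.py exposes a main() that runs the extract
    from bq_to_parquet import main
    main()
    return ("ok", 200)
```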
## ✅ Revised Pipeline (with Tech Choices)

### 🔄 Ingestion + Transformation

| Step | Tool | Notes |
|---|---|---|
| Query BigQuery | `google-cloud-bigquery`, SQL | Use parameterized queries, incremental filters (e.g., `WHERE date > @last_day`) |
| Save Locally | `pandas.to_parquet()` or `duckdb.register()` | Save to `data/` folder |
| Version Data | DVC | Store metadata in Git, push data to GCS |

### 🔍 Analysis

| Step | Tool | Notes |
|---|---|---|
| Explore / Join / Filter | DuckDB | Fast, local SQL |
| Optional Modeling | dbt-duckdb | Structured SQL transformations, testable |

### 📊 Reporting

| Step | Tool | Notes |
|---|---|---|
| Report Development | Evidence.dev | Markdown + SQL, supports DuckDB |
| Data Source | `.duckdb` file or `read_parquet()` | Load from `data/` directory tracked by DVC |

### ☁️ Deployment

| Step | Tool | Notes |
|---|---|---|
| Automate Daily Jobs | Cloud Scheduler → Cloud Run / Function | Call Python script to re-query and re-save |
| Upload to GCS | `gsutil`, `dvc push` | Store `.parquet` or `.duckdb` snapshots |
| Deploy Reports | Cloud Run / Static Hosting | Deploy full Evidence app or static HTML exports |
| CI/CD | GitHub Actions | Auto-run pipeline, push reports, update storage |

## 🧩 Optional Enhancements
- Dagster or Prefect instead of scripts for orchestration (nicer logs, retries, visual flow) — sketch below
- Great Expectations or Evidently to validate output before reporting
- GCS Lifecycle Policies to auto-delete old data
- FastAPI + Evidence hybrid for interactive dashboards
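For the orchestration option, a minimal Prefect sketch (task bodies elided; all names are placeholders):

```python
# Minimal sketch: the daily job as a Prefect flow instead of a bare
# script, gaining retries, logging, and a visual run history.
from prefect import flow, task

@task(retries=3, retry_delay_seconds=60)
def extract_from_bigquery():
    ...  # the BigQuery -> Parquet logic from step 1

@task
def push_data():
    ...  # dvc push / upload to GCS

@flow(name="daily-bq-extract")
def daily_pipeline():
    extract_from_bigquery()
    push_data()

if __name__ == "__main__":
    daily_pipeline()
```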
## 🚦 Deployment Diagram (Simplified)

```
[BigQuery]
    ↓ (daily query)
[Python script or Cloud Function]
    ↓
[.parquet or .duckdb files] → [GCS + DVC remote]
    ↓
[DuckDB Analysis] ↔ [Evidence Reports]
    ↓                    ↓
[Push to GitHub]   [Deploy to GCP (Cloud Run / Static HTML)]
```
## 🧰 Would You Like…

- A ready-made project template with:
  - `extract_config.yaml`
  - `bq_to_parquet.py`
  - `evidence/settings.yaml`
  - a `dvc.yaml` stage definition?
- A minimal `cloudbuild.yaml` or GitHub Actions CI to deploy Evidence?
- An example GCS setup with a DVC remote?
Let me know what you'd like next!
```bash
# Evidence local development workflow
npm install
npm run sources
npm run dev
```
### Running Repository Recommendations
To generate repository recommendations based on your starred repos and upload them to your Google Sheet, run the following command:
```bash
python scripts/repo_recommender.py
```
This script will:
- Read your starred repositories from the "starred" sheet in your configured Google Sheet.
- Analyze your preferred topics and languages.
- Search GitHub for similar repositories with at least 1000 stars.
- Score and sort these recommendations (a toy sketch of the idea follows below).
- Write the results to a new sheet named "recommendations" in the same Google Sheet workbook.
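Purely as an illustration of the scoring idea described above (the weights and function below are invented for this sketch and are not the script's actual logic):

```python
# Illustrative sketch only -- NOT the real logic in scripts/repo_recommender.py.
import math

def score_repo(topics, language, preferred_topics, preferred_languages, stars):
    """Toy score: topical fit weighted over log-dampened popularity."""
    topic_overlap = len(set(topics) & set(preferred_topics))
    language_match = 1 if language in preferred_languages else 0
    return 3 * topic_overlap + 2 * language_match + math.log10(max(stars, 1))
```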
pros:
- total version control - everything lives in the repo. no need for BQ views (probably), and the dashboard is plain markdown (vs Tableau, which keeps no history)
- working in GitLab, working in Cursor = huge potential for productivity gains. over time, with the docs fed in, an LLM could crank out practically all the reports by itself
- full freedom in report design - half the documentation describes frontend components. still to be tested, but it looks like you could build anything from dashboards to articles
- SQL heavy - the thing the team is best at. calculated fields will practically write themselves
- cost reduction - we'd drop the …
- very fast - to be confirmed, but the description suggests the engine is very good (DuckDB is definitely strong)
cons:
- a completely new take on BI - certainly unintuitive to a large degree. instead of a UI where you shuffle tiles around, you have to write and stare at code.
- deployment & access - the big unknown. of the listed compatible deployment targets, only Cloudflare Pages seems to match. needs verifying with devops, but it's probably the same gateway we use for e.g. bistro, parking, whatif. even if it worked, wrestling with devops on a daily basis sounds like torture. they'd likely have to grant some kind of admin access, otherwise we're totally screwed