This project presents an end-to-end data pipeline that monitors and visualizes the rise of electric vehicles (EVs) in India. Using a cloud-native stack, it scrapes, stores, transforms, and visualizes vehicle registration trends across different fuel types and vehicle categories.
fuelnfleet-trends/
│
├── terraform/ # Infrastructure as Code (Terraform)
│ ├── main.tf # GCP resources (VM, permissions)
│ └── variables.tf # Variables for project and region
│
├── kestra/ # Workflow orchestration (Kestra flows)
│ ├── docker-compose.yml # Docker setup to deploy Kestra
│ ├── gcp_kv.yml # GCP auth via Kestra secrets
│ ├── gcp_setup.yml # Upload flow to GCS/BigQuery
│ ├── scrape_vahan_registrations.yml
│ └── scrape_vahan_registrations_scheduled.yml # Scheduled monthly ingestion
│
├── dbt/ # Data transformation using dbt
│ ├── dbt_project.yml
│ ├── profiles.yml
│ ├── models/ # dbt models: cleaning, formatting, fact table
│ │ ├── staging/
│ │ │ ├── schema.yml
│ │ │ ├── stg_fuel_type_data.sql
│ │ │ └── stg_vehicle_category_data.sql
│ │ └── core/
│ │ ├── schema.yml
│ │ └── fact_registrations.sql
│ └── macros/
│ ├── macros_properties.yml
│ ├── add_date.sql
│ └── titlecase.sql
│
└── README.md # Project overview & instructions
India is undergoing a major shift toward electric mobility. However, comprehensive insights into vehicle registrations by fuel type (EV, Fossil, Hybrid) remain fragmented. This project automates data collection, transformation, and visualization to enable policy makers, researchers, and industry professionals to track this transition seamlessly.
Component | Tool |
---|---|
Cloud Provider | Google Cloud Platform (GCP) |
Workflow Orchestration | Kestra |
Infrastructure as Code | Terraform |
Data Lake | Google Cloud Storage (GCS) |
Data Warehouse | BigQuery |
Transformation Layer | dbt |
Dashboard | Looker Studio |
To provision the required cloud resources on Google Cloud Platform (GCP), Terraform is used to automate the creation of:
- A Google Cloud Storage (GCS) bucket to act as the data lake
- A BigQuery dataset to act as the data warehouse
main.tf
: Defines the GCP provider and resources (bucket & dataset)variables.tf
: Contains all the configurable variables such asproject_id
,bucket_name
,dataset_name
, etc.
GCS Bucket
:fuel_n_fleet_bucket
- Class:
STANDARD
- Location:
US
- Auto-aborts incomplete multipart uploads older than 1 day
- Class:
BigQuery Dataset
:fuel_n_fleet
- Location:
US
- Location:
Before deploying, ensure you have:
- Enabled the necessary GCP APIs
- A service account JSON key with permissions (storage admin, bigquery admin, etc.)
my-creds.json
file at the root or updatevariables.tf
accordingly
Run the following commands to deploy:
terraform init
terraform plan
terraform apply
The ingestion pipeline is built using Kestra, an open-source data orchestration engine. The pipeline is deployed on a VM in Google Cloud and orchestrated through Docker.
The Kestra server is deployed using Docker Compose:
- Service 1:
postgres
– used as the metadata repository for Kestra - Service 2:
kestra
– main workflow engine
See docker-compose.yml
in the kestra/
directory for details.
-
gcp_kv.yml
– Defines and sets global variables (project ID, bucket, dataset) as key-value pairs. -
gcp_setup.yml
– Provisions GCS bucket and BigQuery dataset using the GCP plugin within Kestra itself (IaC within orchestration). -
scrape_vahan_registrations.yml
– Manually triggered flow for scraping registration data from the Vahan Dashboard.- Accepts dynamic inputs (year, month, category, and method)
- Allows both sequential and parallel scraping using Playwright + Python
- Uploads CSV data to GCS
- Stages data into external BigQuery tables
- Merges clean data into warehouse tables with deduplication
-
scrape_vahan_registrations_scheduled.yml
– A scheduled monthly pipeline- Triggered automatically on the 15th of every month for both vehicle category and fuel types
- Fully automates scrape → GCS → BigQuery pipeline
Each flow follows these stages:
- Scrape dynamic web data with Playwright
- Clean and convert to tabular format with Pandas
- Upload to GCS as
.csv
- Load as external table in BigQuery
- Transform and merge into curated fact tables (
fuel_type_data
,vehicle_category_data
)
Bonus: The ingestion is parameterized and modular, making it extendable to new data categories easily.
The scheduled flow (scrape_vahan_registrations_scheduled.yml
) includes two CRON triggers:
- Fuel type scraping → every 15th at 7 AM
- Vehicle category scraping → every 15th at 6 AM
The raw scraped data is cleaned, standardized, and modeled using dbt and loaded into BigQuery.
models/staging/
: Fix inconsistencies in scraped data (column misalignments, formatting)models/core/
: Unified fact tablefact_registrations
joining fuel & vehicle categoriesmacros/
: Custom logic for date parsing and title casingtests/
: Data quality checks on key fields (e.g., uniqueness, not-null)
- Standardizes raw
fuel_type_data
andvehicle_category_data
- Adds a derived
reg_date
field via macro - Ensures all
type
fields are title-cased - Combines two domains (fuel + vehicle) into a unified fact model
- Includes data quality tests:
- Not-null tests
- Unique constraints
- Accepted values for key fields
To run the dbt project locally:
# Run all models
dbt run
# Run tests
dbt test
# Generate and view docs
dbt docs generate
dbt docs serve
Ensure your profiles.yml is correctly set up to point to BigQuery using a service account key.
fuel_n_fleet:
outputs:
dev:
type: bigquery
method: service-account
project: dtc-de-project-454809
dataset: fuel_n_fleet
location: US
threads: 4
keyfile: /absolute/path/to/your/my-creds.json
target: dev
Test it out by running dbt test
The final insights are presented in an interactive Looker Studio dashboard, highlighting India's transition to Electric Vehicles (EVs) over time.
Dashboard Link: FueL & Fleet Trends – Looker Studio
-
EV vs Fossil vs Hybrid – Yearly Registration Breakdown
- Displays annual vehicle registration counts segmented by fuel type.
- Highlights the growing share of EVs year-over-year.
-
Registration Share by Fuel Type
- A categorical distribution showing percentage share of registrations across fuel types for a selected year/month.
-
Fleet Composition by Category
- Breaks down the fleet by vehicle category: Transport vs Non-Transport.
- Useful for understanding segment-wise adoption trends (e.g., EVs in commercial use).
- Clear trendline toward increased EV adoption in recent years.
- Hybrid vehicles are still a small percentage but slowly growing.
- Transport vehicles have more consistent fuel-type trends, while Non-Transport shows quicker EV uptake.
Dashboard is powered by clean data from BigQuery, transformed via dbt, and refreshed through a monthly Kestra pipeline.