This project automates the collection, processing, and analysis of CAISO's interconnection queue data.

## Features

- Automated weekly data collection from the CAISO website
- Historical data tracking with dated snapshots
- Comprehensive analysis and KPI generation
- GitHub Actions automation for consistent data updates
- Email notifications for pipeline status
## Quick Start

- Ensure Docker and pipenv are installed
- Clone the repository
- Build the image:

  ```
  docker build -t caiso-queue:latest .
  ```

- Run the complete pipeline:

  ```
  docker run --rm ^
    -v %CD%/reports:/app/reports ^
    -v %CD%/raw:/app/raw ^
    -e SMTP_HOST=... ^
    -e SMTP_USER=... ^
    -e SMTP_PASS=... ^
    -e NOTIFICATION_EMAIL=... ^
    caiso-queue:latest ^
    sh -c "python scripts/run_pipeline.py && python scripts/analyze_queue.py && python scripts/cleanup_raw.py"
  ```
## GitHub Actions Automation

The project includes a GitHub Actions workflow that:

- Downloads the latest CAISO queue report every Monday
- Processes and analyzes the data
- Generates updated reports
- Commits changes back to the repository
For setup instructions, refer to the Quick Start section at the top of this document.
## Generated Reports

The analysis generates the following KPIs in the `reports/` directory (a computation sketch for two of them follows the list):

- **Capacity by Fuel Type** (`capacity_by_fuel.csv`): Aggregated capacity in MW for each fuel type combination in the queue.
- **Project Count by Status** (`project_count_by_status.csv`): Number of projects and total MW capacity, grouped by application status.
- **Top 5 ISO Zones** (`top5_iso_zones.csv`): The 5 ISO zones with the highest active capacity.
- **Weekly Queue Growth** (`weekly_queue_growth.csv`): Weekly growth in MW capacity added to the queue.
- **Cancellation Rate** (`cancellation_rate.csv`): Ratio of withdrawn projects to active projects, measured in MW.
- **Average Lead Time** (`average_lead_time.csv`): Average number of days between receipt of the interconnection request and the queue date.
- **Top Projects by Net MW** (`top_projects_by_net_mw.csv`): The 10 largest projects by net MW contribution to the grid, including project name, location, fuel type, and status.
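As an illustration, here is a minimal sketch of how two of these KPIs might be computed with pandas. The table name (`queue`) and column names (`status`, `net_mw`, `request_date`, `queue_date`) are assumptions, not the project's actual schema:

```python
# Illustrative KPI computation; table and column names are assumptions.
import sqlite3

import pandas as pd

with sqlite3.connect("data/caiso_queue.db") as conn:
    df = pd.read_sql_query("SELECT * FROM queue", conn)

# Cancellation rate: withdrawn MW relative to active MW.
withdrawn_mw = df.loc[df["status"] == "WITHDRAWN", "net_mw"].sum()
active_mw = df.loc[df["status"] == "ACTIVE", "net_mw"].sum()
cancellation_rate = withdrawn_mw / active_mw

# Average lead time: days between request receipt and queue date.
lead_days = (
    pd.to_datetime(df["queue_date"]) - pd.to_datetime(df["request_date"])
).dt.days

print(f"cancellation rate: {cancellation_rate:.2%}")
print(f"average lead time: {lead_days.mean():.1f} days")
```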
## Pipeline Components

- **Data Collection** (`data_collection.py`)
  - Downloads the latest queue report from the CAISO website
  - Saves it with a date suffix for historical tracking
  - Maintains a standard filename for compatibility
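A minimal sketch of this step, assuming the `requests` library; the report URL and filenames are placeholders, not the real CAISO endpoint:

```python
# Sketch of the collection step; REPORT_URL is a placeholder.
import shutil
from datetime import date
from pathlib import Path

import requests

REPORT_URL = "https://example.com/caiso-queue-report.xlsx"  # placeholder

raw_dir = Path("raw")
raw_dir.mkdir(exist_ok=True)

# Dated snapshot for historical tracking.
snapshot = raw_dir / f"caiso_queue_{date.today():%Y%m%d}.xlsx"
response = requests.get(REPORT_URL, timeout=60)
response.raise_for_status()
snapshot.write_bytes(response.content)

# Stable filename so downstream steps need no date logic.
shutil.copyfile(snapshot, raw_dir / "caiso_queue_latest.xlsx")
```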
- **Data Processing** (`parse_queue.py`)
  - Parses the multi-sheet Excel workbook
  - Handles complex header structures
  - Loads data into a SQLite database
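A sketch of the parsing approach with pandas; the two-row header, the sheet layout, and the `queue` table name are assumptions about the workbook:

```python
# Sketch of the parsing step; header depth and table name are assumptions.
import sqlite3

import pandas as pd

# Read every sheet of the workbook into a dict of DataFrames.
sheets = pd.read_excel(
    "raw/caiso_queue_latest.xlsx",
    sheet_name=None,  # None -> {sheet_name: DataFrame}
    header=[0, 1],    # assume a two-row header
)

frames = []
for name, frame in sheets.items():
    # Flatten the two-row header into single column names.
    frame.columns = [
        " ".join(str(part).strip() for part in col).strip()
        for col in frame.columns
    ]
    frames.append(frame)

with sqlite3.connect("data/caiso_queue.db") as conn:
    pd.concat(frames, ignore_index=True).to_sql(
        "queue", conn, if_exists="replace", index=False
    )
```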
- **Analysis** (`analyze_queue.py`)
  - Generates standardized reports and KPIs
  - Outputs CSV files for further analysis
  - Tracks changes over time
- **Maintenance** (`cleanup_raw.py`)
  - Manages historical data retention
  - Cleans up old raw files
  - Maintains optimal storage usage
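A sketch of the retention logic; the 90-day window and the filename pattern are assumptions:

```python
# Sketch of the cleanup step; retention window and pattern are assumptions.
from datetime import datetime, timedelta
from pathlib import Path

RETENTION_DAYS = 90
cutoff = datetime.now() - timedelta(days=RETENTION_DAYS)

for path in Path("raw").glob("caiso_queue_*.xlsx"):
    stamp = path.stem.removeprefix("caiso_queue_")
    try:
        snapshot_date = datetime.strptime(stamp, "%Y%m%d")
    except ValueError:
        continue  # skip files without a date suffix (e.g. the "latest" copy)
    if snapshot_date < cutoff:
        path.unlink()
```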
## Configuration

The pipeline emails a status notification after each run. To configure this for the GitHub Actions workflow (a sketch of how these secrets might be used follows the list):

- Go to your repository on GitHub
- Navigate to Settings → Secrets and variables → Actions
- Add the following secrets:
  - `SMTP_HOST`: Your SMTP server address
  - `SMTP_USER`: SMTP username
  - `SMTP_PASS`: SMTP password
  - `NOTIFICATION_EMAIL`: Notification recipient
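As an illustration, a minimal sketch of sending the status email with these values, assuming Python's standard `smtplib` with STARTTLS on port 587 (the port and message text are assumptions):

```python
# Sketch: send the pipeline status email using the secrets above.
# Port 587/STARTTLS and the message text are assumptions.
import os
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "CAISO queue pipeline: run complete"
msg["From"] = os.environ["SMTP_USER"]
msg["To"] = os.environ["NOTIFICATION_EMAIL"]
msg.set_content("Pipeline finished. Updated KPIs are in reports/.")

with smtplib.SMTP(os.environ["SMTP_HOST"], 587) as smtp:
    smtp.starttls()
    smtp.login(os.environ["SMTP_USER"], os.environ["SMTP_PASS"])
    smtp.send_message(msg)
```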
For local runs:

- Create a `.env` file:

  ```
  SMTP_HOST=smtp.example.com
  SMTP_USER=your_user
  SMTP_PASS=your_pass
  NOTIFICATION_EMAIL=you@example.com
  ```

- Run with the environment file:

  ```
  docker run --rm ^
    --env-file .env ^
    -v %CD%/reports:/app/reports ^
    -v %CD%/raw:/app/raw ^
    caiso-queue:latest ^
    sh -c "python scripts/run_pipeline.py"
  ```
## Testing

- Place a sample XLSX file in the `raw/` directory
- Run the pipeline locally to verify:
  - Data collection and parsing
  - Report generation
  - Database updates
- Check `data/caiso_queue.db` for processed data
- Verify the reports in the `reports/` directory
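For a quick sanity check that the database was populated, something like the following works (the `queue` table name is an assumption about the schema):

```python
# Sketch: verify the pipeline wrote rows to the SQLite database.
# The table name "queue" is an assumption.
import sqlite3

with sqlite3.connect("data/caiso_queue.db") as conn:
    (count,) = conn.execute("SELECT COUNT(*) FROM queue").fetchone()
print(f"queue table contains {count} rows")
```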