Super Bowl Advertisement Analysis for Rogue Ridge

Project Overview

A comprehensive data-driven analysis of Super Bowl commercials to develop a strategic advertising approach for Forge & Field's new Rogue Ridge personal care product line.

Business Context

Client: Forge & Field Brands
Product: Rogue Ridge (Personal Care Line for American Men)
Investment: $11M total advertising budget
Key Objective: Develop a high-impact 30-second Super Bowl commercial

Project Status

Last Updated: June 2025
Current Phase: Deliverable 2 - Model Training and Preliminary Insights

Data Collection Overview

Data Sources and Volume

YouTube: 1,181 Super Bowl ad videos (2000-2025)
- 470 with metadata and comments (~50,000 comments)
Reddit: ~10,000 posts/comments (2020-2024)
News Articles: ~500 articles linked from Reddit
Video Files: 1,181 downloaded MP4s
Multimodal Content per Ad:
- Audio (MP3), Subtitle (TXT), Keyframes (JPG)
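The per-ad audio and frame files listed above could be produced with FFmpeg, which the pipeline names as its video tool. The following is a minimal sketch that assumes `ffmpeg` is on the PATH. The helper names, the output naming scheme, and the 1-frame-per-second rate are all illustrative assumptions. Note that the `fps` filter samples frames at a fixed rate rather than extracting true codec keyframes.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_commands(video_path, out_dir, frames_per_second=1.0):
    """Hypothetical helper: return FFmpeg argument lists for MP3 audio and JPG frames."""
    video = Path(video_path)
    out = Path(out_dir)
    stem = video.stem
    audio_cmd = [
        "ffmpeg", "-y", "-i", str(video),
        "-vn",                    # drop the video stream
        "-acodec", "libmp3lame",  # encode the audio track as MP3
        "-q:a", "2",              # LAME VBR quality (0 = best, 9 = worst)
        str(out / f"{stem}.mp3"),
    ]
    frames_cmd = [
        "ffmpeg", "-y", "-i", str(video),
        "-vf", f"fps={frames_per_second}",  # sample frames at a fixed rate
        str(out / f"{stem}_%04d.jpg"),      # numbered JPG files
    ]
    return audio_cmd, frames_cmd


def extract_media(video_path, out_dir, frames_per_second=1.0):
    """Hypothetical helper: run both commands; raises CalledProcessError on failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_ffmpeg_commands(video_path, out_dir, frames_per_second):
        subprocess.run(cmd, check=True, capture_output=True)
```

Building the argument lists separately from running them means the commands can be inspected and tested without FFmpeg installed.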

Data Completeness

Video Metadata: >95% complete
Reddit & News: Used to fill gaps where YouTube discussion is missing
Comment Richness: Multi-source, sentiment-scored
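Comments are cleaned first and then sentiment-scored by majority vote across the five models listed in the pipeline below. The following is a minimal sketch of both steps. It assumes each model's raw score has already been mapped to one of the three labels. The emoji ranges, the filler phrases, and the fall-back-to-Neutral tie rule are illustrative assumptions, not the project's documented behavior.

```python
import re
from collections import Counter

# Broad emoji/pictograph ranges (an assumption, not an exhaustive list).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
    "\uFE0F\u200D"           # variation selector-16 and zero-width joiner
    "]+"
)
# Hypothetical filler list; the project's actual list is not documented.
FILLER_PATTERN = re.compile(r"\b(?:lol|lmao|first!)(?!\w)", re.IGNORECASE)
LABELS = ("Positive", "Neutral", "Negative")


def clean_comments(comments):
    """Strip emojis and filler, collapse whitespace, drop empty and duplicate comments."""
    seen, cleaned = set(), []
    for raw in comments:
        text = FILLER_PATTERN.sub("", EMOJI_PATTERN.sub("", raw))
        text = re.sub(r"\s+", " ", text).strip()
        key = text.lower()  # case-insensitive de-duplication
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


def majority_vote(votes, tie_label="Neutral"):
    """Combine per-model labels; a tie for the top count falls back to tie_label."""
    votes = list(votes)
    unexpected = [v for v in votes if v not in LABELS]
    if unexpected:
        raise ValueError(f"unexpected labels: {unexpected}")
    ranked = Counter(votes).most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return tie_label
    return ranked[0][0]
```

With five voters and three labels, a 2-2-1 split is possible, so the vote needs some explicit tie rule. Falling back to Neutral is one conservative choice.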

Key Scripts

extract_youtube_id_list.py: Scrape YouTube IDs from superbowl-ads.com
youtube_info.py: Fetch metadata & comments
reddit_updata.py: Search Reddit discussions using PRAW
superbowldownload.py: MP4 download fallback for non-YouTube videos
whisper_audio_process.py: Transcribe audio using Whisper

AI/ML Workflow Summary

Step-by-Step Pipeline

Preprocessing:
- Clean comments, subtitles, descriptions
- Remove emojis, filler text, duplicates
Sentiment Classification:
- Models: TextBlob, VADER, RoBERTa, FinBERT, BART
- Output: Positive / Neutral / Negative (via majority vote)
Multimodal Feature Extraction:
- Tools: Whisper (audio), FFmpeg (video), Gemini API
- Extracted: Mood, Emotion, Pacing, Slogan, Tone, Symbols
Feature Engineering:
- LabelEncoder for categorical features
- StandardScaler for numerical values
- PCA to 20 components
Model Training:
- Models: Logistic Regression, Random Forest, SVM, Naive Bayes, KNN, MLP, CatBoost
- Best Model: CatBoost with 87.3% test accuracy
Validation:
- GridSearchCV + 3-fold CV
- EarlyStopping & max_depth to prevent overfitting
- Feature importance visualization
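The feature engineering and validation steps above can be sketched with scikit-learn. The sketch rests on several assumptions:

- `RandomForestClassifier` stands in for CatBoost so the example needs only scikit-learn.
- The `encode_tags` and `tune_and_evaluate` helpers are hypothetical.
- The grid values are illustrative.

Scaling and PCA are placed inside a `Pipeline` so they are refit on each cross-validation fold instead of on the full dataset.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler


def encode_tags(df, categorical_cols, numeric_cols):
    """Hypothetical helper: label-encode categorical tags and keep numeric columns as floats."""
    encoded = pd.DataFrame(index=df.index)
    for col in categorical_cols:
        encoded[col] = LabelEncoder().fit_transform(df[col].astype(str))
    for col in numeric_cols:
        encoded[col] = df[col].astype(float)
    return encoded


def tune_and_evaluate(X, y, n_components=20, random_state=42):
    """Hypothetical helper: 3-fold grid search on the training split, then score the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    pipeline = Pipeline([
        ("scale", StandardScaler()),                # standardize every column
        ("pca", PCA(n_components=n_components)),    # project to 20 components
        ("model", RandomForestClassifier(random_state=random_state)),
    ])
    search = GridSearchCV(
        pipeline,
        param_grid={"model__max_depth": [3, 5, 8], "model__n_estimators": [50, 100]},
        cv=3,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    importances = search.best_estimator_.named_steps["model"].feature_importances_
    ranking = np.argsort(importances)[::-1]  # PCA component indices, most important first
    return search.best_params_, search.score(X_test, y_test), ranking
```

There are three caveats to this sketch:

- `LabelEncoder` assigns arbitrary integer codes, so one-hot encoding may suit unordered tags better.
- Early stopping applies to boosted models such as CatBoost, so only the `max_depth` limit is shown here.
- The importance ranking refers to PCA components, not the original tags.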

Gemini Output Feature Tags

Categories Extracted:

Visual Style: Color_Tone, Lighting, Composition, Style_Tag
Mood/Narrative: Emotional_Tone, Structure, Pacing, Twist
Semantics: Setting, Product Visibility, Masculine Symbols
Audio/Text: Slogan, Narration Style, Humor Use
Audience Profile: Gender, Age, Culture, Lifestyle
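The Gemini tags above are easier to model once they are flattened into one column per tag. The following is a minimal sketch that assumes Gemini is prompted to return a JSON object keyed by category. The snake_case keys and the `flatten_gemini_tags` helper are hypothetical and do not reflect the project's actual prompt schema. The regex extracts the outermost braces in case the JSON arrives wrapped in a Markdown fence.

```python
import json
import re

# Hypothetical schema mirroring the categories listed above.
EXPECTED_TAGS = {
    "visual_style": ["color_tone", "lighting", "composition", "style_tag"],
    "mood_narrative": ["emotional_tone", "structure", "pacing", "twist"],
    "semantics": ["setting", "product_visibility", "masculine_symbols"],
    "audio_text": ["slogan", "narration_style", "humor_use"],
    "audience_profile": ["gender", "age", "culture", "lifestyle"],
}


def flatten_gemini_tags(raw_text, missing="unknown"):
    """Parse one model response into a flat {"category.tag": value} row."""
    match = re.search(r"\{.*\}", raw_text, re.DOTALL)  # outermost JSON object
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    row = {}
    for category, keys in EXPECTED_TAGS.items():
        section = data.get(category) or {}
        for key in keys:
            value = section.get(key, missing)  # fill absent tags with a placeholder
            if isinstance(value, list):        # e.g. several masculine symbols
                value = "|".join(map(str, value))
            row[f"{category}.{key}"] = value
    return row
```

Because every row ends up with the same set of keys, the rows can be loaded straight into a DataFrame for encoding.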

Next Steps

Task	Status	Action
Add 2025 Reddit data	⏳	Use `reddit_updata.py` with year filtering
Multi-label classification	✅	Implement sentiment scoring vector
Prompt Engineering	✅	Refine Gemini instructions for clarity
Resonance Modeling	❌	Design alignment test between ad tone and audience segment
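For the pending 2025 Reddit task, note that PRAW's search `time_filter` is relative to the current date (for example, "year" means the past year), not a calendar year. One option is to filter results client-side on each item's `created_utc` Unix timestamp. The following is a minimal sketch. `filter_by_year` is a hypothetical helper, not a function from `reddit_updata.py`, and `SimpleNamespace` objects stand in for PRAW submissions.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def filter_by_year(items, year):
    """Hypothetical helper: keep items whose created_utc falls in the given UTC calendar year."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc).timestamp()
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc).timestamp()
    return [item for item in items if start <= item.created_utc < end]


# Stand-ins for PRAW submissions (only created_utc is used).
posts = [
    SimpleNamespace(id="a", created_utc=datetime(2024, 12, 31, 23, 59, tzinfo=timezone.utc).timestamp()),
    SimpleNamespace(id="b", created_utc=datetime(2025, 2, 9, 23, 0, tzinfo=timezone.utc).timestamp()),
]
posts_2025 = filter_by_year(posts, 2025)
```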

Repository Structure

superbowlproject/
├── config/                  # API keys, prompt templates
├── database/                # Processed data & backups
├── deliverable-2-appendix/  # Attachments and examples
├── logs/                    # Logging outputs
├── models/                  # ML models and scripts
├── notebooks/               # Jupyter notebooks for exploration
├── scripts/                 # Pipeline entry points (data collection, training, insights)
├── src/                     # Core modules (scraping, processing, analysis)
├── tests/                   # Unit & integration tests
├── requirements.txt         # Python dependencies
└── README.md                # Documentation

Installation & Quick Start

Requirements

Python 3.8+
Reddit API credentials
YouTube Data API key
Google Cloud (for Gemini)

Setup

git clone https://github.com/SiyuSun341/SuperBowlProject.git
cd SuperBowlProject
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env       # Fill in your API keys

Run Example

python scripts/run_data_collection.py
python scripts/train_models.py
python scripts/generate_insights.py

Documentation

File	Description
`docs/api_documentation.md`	API usage reference
`docs/model_documentation.md`	ML model descriptions
`docs/setup_guide.md`	Environment & dependencies
`docs/user_guide.md`	Analysis walkthrough
`docs/superbowl_python_setup.md`	Python environment setup for the project

Success Metrics

Data

1,181 total ads (1,205 total videos)
95% completeness
500K total comments

Models

CatBoost: 87.3% test accuracy
PCA-reduced: 20 components, interpretable
Cross-validation: stable across folds

Strategic Insights

Precisely identify 5-7 key success factors for Super Bowl advertisements
Tailored advertising strategy for the Rogue Ridge personal care product line
Provide data-driven advertising design recommendations

Business Value

Strategic decision support for $11 million advertising investment
Actionable creative guidance for a 30-second commercial
Mitigate commercial failure risks
Help Forge & Field precisely target the intended audience (prototypical American men)

Key Deliverables

Comprehensive technical analysis report
Client-facing advertising design recommendations document
Pre-launch advertisement testing and optimization framework

Contact

Author: Siyu Sun
Email: sunsiyu.suzy@gmail.com
GitHub: SiyuSun341

License

This repository is part of an academic project at Purdue University (MBT Program). Do not distribute without permission.
