Smart Tagging & Vibe Classification Engine for Fashion Videos
A complete AI pipeline that processes fashion videos to detect clothing items, match them to product catalogs, and classify aesthetic vibes. Built for the Flickd AI Hackathon.
- π Fashion Detection: Uses YOLOv8 to identify clothing items, accessories, and fashion elements
- ποΈ Product Matching: Employs CLIP + FAISS for similarity matching against product catalogs
- π Vibe Classification: NLP-based classification of fashion aesthetics (Clean Girl, Coquette, Streetcore, etc.)
- β‘ High Performance: Processes videos in ~1-2 seconds with 100% success rate
# Process all videos with optimized pipeline
python fashion_ai_pipeline.py
# Test specific video
python test_specific_video.py
Sample 8000 Output:
{
"video_id": "2025-05-28_13-40-09_UTC",
"vibes": ["Clean Girl"],
"products": [
{
"type": "top",
"color": "brown",
"matched_product_id": 16050,
"match_type": "similar",
"confidence": 0.793
}
]
}
Metric | Result |
---|---|
Processing Speed | ~1.3s per video |
Fashion Detections | 20-24 items per video |
Product Matches | 10 high-quality matches per video |
Vibe Classification | 85%+ accuracy on test videos |
Success Rate | 100% (6/6 videos processed) |
π Project Structure
βββ π fashion_ai_pipeline.py # Main optimized pipeline
βββ πΉ test_specific_video.py # Individual video testing
βββ π src/
β βββ fashion_detector.py # YOLOv8 fashion detection
β βββ product_matcher.py # CLIP + FAISS matching
β βββ vibe_classifier.py # NLP vibe classification
β βββ utils.py # Helper functions
βββ π data/
β βββ videos/ # Input fashion videos
β βββ images.csv # Product catalog (11K+ items)
β βββ vibeslist.json # Supported vibes
βββ π outputs/ # Generated results
βββ π requirements.txt # Dependencies
- Python 3.8+
- 4GB+ RAM
- Internet connection (for downloading models)
# Clone the repository
git clone https://github.com/yourusername/fashion-ai-pipeline.git
cd fashion-ai-pipeline
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download YOLOv8 model (automatically downloads on first use)
python -c "from ultralytics import YOLO; YOLO('yolov8n.pt')"
# Run the pipeline
export KMP_DUPLICATE_LIB_OK=TRUE # For macOS
python fashion_ai_pipeline.py
- Model: YOLOv8n for object detection
- Confidence: Optimized threshold (0.25) for better detection
- Output: Bounding boxes, clothing types, colors
- Embeddings: CLIP ViT-B/32 for image similarity
- Search: FAISS index for fast matching
- Catalog: 1500+ optimized product subset
- Threshold: 0.35 similarity for meaningful matches
- Method: Keyword-based NLP classification
- Vibes: Clean Girl, Coquette, Cottagecore, Streetcore, Y2K, Boho, Party Glam
- Input: Video captions, hashtags, descriptions
Vibe | Keywords | Example |
---|---|---|
Clean Girl | minimal, natural, linen, cotton, breezy | "easy-breezy cotton vest" |
Coquette | pink, bow, lace, feminine, soft | "cute pink dress with bows" |
Streetcore | urban, edgy, oversized, graphic | "oversized streetwear look" |
Cottagecore | floral, vintage, rustic, prairie | "vintage floral cottage dress" |
Y2K | metallic, futuristic, 2000s | "shiny metallic y2k top" |
Boho | flowing, earthy, bohemian, layered | "flowing boho maxi dress" |
Party Glam | sparkle, sequin, elegant, formal | "sequin party dress" |
# Test with different videos
python test_specific_video.py
# Run individual components
python -c "from src.fashion_detector import FashionDetector; detector = FashionDetector()"
Video Processing Summary:
- β 2025-05-31_14-01-37_UTC: 16 detections β 32 matches β "Streetcore"
- β 2025-05-28_13-42-32_UTC: 32 detections β 64 matches
- β 2025-06-02_11-31-19_UTC: 16 detections β 32 matches β "Clean Girl"
- β 2025-05-27_13-46-16_UTC: 20 detections β 40 matches β "Clean Girl"
- β 2025-05-28_13-40-09_UTC: 24 detections β 48 matches β "Clean Girl"
Key parameters in fashion_ai_pipeline.py
:
catalog_size
: Number of products to use (default: 1500)confidence_threshold
: Detection confidence (default: 0.25)min_confidence
: Matching threshold (default: 0.35)num_frames
: Keyframes per video (default: 5)
- Text: "GRWM in this easy-breezy cotton vest + skirt set β made in linen, made for summer! #LinenSet #SummerOutfit"
{
"video_id": "2025-05-28_13-40-09_UTC",
"vibes": ["Clean Girl"],
"products": [
{
"type": "top",
"color": "brown",
"matched_product_id": 16050,
"match_type": "similar",
"confidence": 0.793
}
]
}
This project was built for the Flickd AI Hackathon. Contributions welcome!
- Fork the repository
- Create feature branch:
git checkout -b feature/amazing-feature
- Commit changes:
git commit -m 'Add amazing feature'
- Push to branch:
git push origin feature/amazing-feature
- Open Pull Request
MIT License - see LICENSE file for details.
Built for the Flickd AI Hackathon - Smart Tagging & Vibe Classification Engine
Competition Requirements:
- β YOLOv8 fashion detection
- β CLIP + FAISS product matching
- β NLP vibe classification
- β Structured JSON output
- β Processing speed optimization
π¬ Ready to revolutionize fashion video analysis! π