Automated music information retrieval using deep learning models
---

Clone the repository:

```bash
git clone https://github.com/jasonmokk/mir-workflow.git
cd mir-workflow
```

Or pull the most recent changes:

```bash
git pull
```

Run this command to ensure your terminal always uses the correct Node path:

```bash
echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
```
---
Install dependencies:

```bash
npm install
npx playwright@1.52.0 install
```
---
Run the analysis:

- Add your audio files to the `data/` directory.
- Start the process:

  ```bash
  npm start
  ```

Results are automatically saved to `results/music_analysis_results_*.csv`.
Key features:

- Automated batch processing of entire music collections
- Multi-dimensional mood analysis using 7 distinct mood models (happy, sad, relaxed, aggressive, electronic, acoustic, party)
- Genre classification with 9-category multi-class prediction (alternative, blues, electronic, folk/country, funk/soul/R&B, jazz, pop, rap/hip-hop, rock)
- Musical feature extraction including tempo (BPM), key detection, and danceability scoring
- Research-ready CSV output with 20+ analysis dimensions for statistical analysis and data visualization
- Audio format support for MP3 and WAV (other formats may have limited browser compatibility)
- Memory-optimized processing for handling large datasets efficiently
This tool is designed for large-scale music analysis in academic research. It processes entire music collections to extract interpretable features including mood, rhythm, and harmony, making it a valuable asset for computational musicology, music psychology, and data-driven music research.
| Feature | Description | Output Range |
|---|---|---|
| Danceability | Rhythmic suitability for dancing | 0-1.000 |
| Mood - Happy | Positive emotional valence | 0-1.000 |
| Mood - Sad | Negative emotional valence | 0-1.000 |
| Mood - Relaxed | Low-energy, calm characteristics | 0-1.000 |
| Mood - Aggressive | High-energy, intense characteristics | 0-1.000 |
| Mood - Electronic | Electronic/synthetic music characteristics | 0-1.000 |
| Mood - Acoustic | Acoustic/organic music characteristics | 0-1.000 |
| Mood - Party | High-energy, celebratory characteristics | 0-1.000 |
| Genre Classification | Multi-label genre probabilities (9 genres) | 0-1.000 each |
| BPM | Beats per minute (tempo) | Numeric |
| Key | Detected musical key | String |
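Each mood model scores one dimension on a 0-1.000 scale, which can be read as a probability and, where a hard label is needed, thresholded. A minimal sketch, assuming a row object keyed by the CSV column names; the 0.5 cutoff is a common convention, not something this project prescribes:

```js
// Convert 0-1 mood probabilities into hard yes/no labels.
// The 0.5 cutoff is an assumption; tune it for your research question.
const MOOD_THRESHOLD = 0.5;
const MOODS = ['happy', 'sad', 'relaxed', 'aggressive', 'electronic', 'acoustic', 'party'];

function moodLabels(row) {
  return Object.fromEntries(
    MOODS.map((m) => [m, Number(row[`mood_${m}`]) >= MOOD_THRESHOLD])
  );
}

// e.g. moodLabels({ mood_happy: '0.852', mood_sad: '0.123', /* ... */ })
//      -> { happy: true, sad: false, ... }
```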
The system classifies music into 9 genre categories using the Dortmund genre dataset:
- Alternative - Alternative rock and indie music
- Blues - Traditional and contemporary blues
- Electronic - Electronic dance music and synthesized genres
- Folk/Country - Folk, country, and Americana
- Funk/Soul/R&B - Funk, soul, and rhythm & blues
- Jazz - Jazz and jazz fusion
- Pop - Popular music and mainstream genres
- Rap/Hip-Hop - Hip-hop, rap, and related genres
- Rock - Rock music and subgenres
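Since the classifier emits one probability per genre, downstream analysis often reduces a row to its single most likely genre. A hypothetical helper, using the column names verbatim from the sample CSV output below:

```js
// Pick the genre with the highest probability from one result row.
// Column names are copied exactly from the CSV header shown below.
const GENRE_COLUMNS = [
  'genre_alternative', 'genre_blues', 'genre_electronic_genre',
  'genre_folkcountry', 'genre_funksoulrnb', 'genre_jazz',
  'genre_pop', 'genre_raphiphop', 'genre_rock',
];

function topGenre(row) {
  return GENRE_COLUMNS
    .map((col) => ({ genre: col.replace(/^genre_/, ''), p: Number(row[col]) }))
    .reduce((best, cur) => (cur.p > best.p ? cur : best));
}

// e.g. topGenre(song1Row) -> { genre: 'pop', p: 0.645 }
```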
Sample CSV output:

```csv
filename,bpm,key,mood_happy,mood_sad,mood_relaxed,mood_aggressive,mood_electronic,mood_acoustic,mood_party,genre_alternative,genre_blues,genre_electronic_genre,genre_folkcountry,genre_funksoulrnb,genre_jazz,genre_pop,genre_raphiphop,genre_rock,danceability
song1.mp3,128,C major,0.852,0.123,0.456,0.238,0.342,0.789,0.567,0.123,0.045,0.234,0.089,0.156,0.067,0.645,0.078,0.234,0.852
song2.wav,95,A minor,0.342,0.678,0.789,0.081,0.156,0.823,0.234,0.089,0.123,0.067,0.456,0.234,0.178,0.345,0.045,0.567,0.674
```
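For statistical work the CSV loads with any parser; a minimal dependency-free sketch in Node (the results filename here is a placeholder matching the pattern above):

```js
// Load an analysis CSV into an array of row objects.
// Plain string splitting is enough here because no field contains commas.
const fs = require('fs');

function loadResults(path) {
  const [header, ...lines] = fs.readFileSync(path, 'utf8').trim().split('\n');
  const cols = header.split(',');
  return lines.map((line) => {
    const values = line.split(',');
    return Object.fromEntries(cols.map((c, i) => [c, values[i]]));
  });
}

// Example: mean danceability across the collection.
const rows = loadResults('results/music_analysis_results_2024.csv'); // hypothetical filename
const meanDance = rows.reduce((s, r) => s + Number(r.danceability), 0) / rows.length;
console.log(`Mean danceability: ${meanDance.toFixed(3)}`);
```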
| Collection Size | Estimated Time |
|---|---|
| 10-50 files | 10-15 minutes |
| 100-200 files | 45-60 minutes |
| 400+ files | 2-3 hours |
Note: MP3 files generally process fastest. Files larger than 100MB are automatically skipped.
Architecture:

- Frontend: Essentia.js web application with TensorFlow.js model inference
- Backend: Express.js server with Playwright browser automation
- Processing: WebAssembly-optimized batch processing with memory management
- Models: MusiCNN architecture trained on Million Song Dataset (MSD-2)
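As an illustration of the Playwright side of this architecture, the sketch below drives a headless browser against the local web app. The URL and the page-side function are placeholders, not the project's actual code:

```js
// Sketch of the headless-browser pattern used for batch processing.
// `window.analyzeLoadedFile` is a hypothetical page-side hook;
// see the repository source for the real automation.
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('http://localhost:3000'); // the Express-served web app

  // Run the in-browser analysis and hand the feature object back to Node.
  const features = await page.evaluate(() => window.analyzeLoadedFile?.());
  console.log(features);

  await browser.close();
})();
```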
Processing pipeline:

- Audio Loading - Batch file processing with format validation
- Feature Extraction - Essentia.js spectral and temporal analysis
- Model Inference - CNN-based classification for mood and rhythm
- Harmonic Analysis - Key detection using pitch class profiles
- Beat Tracking - BPM extraction through onset detection
- Data Export - Structured CSV output with validation
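For the Harmonic Analysis and Beat Tracking stages above, Essentia.js exposes algorithms such as `KeyExtractor` and `PercivalBpmEstimator`. A standalone sketch, assuming already-decoded mono audio at the algorithms' default 44.1 kHz sample rate (the project's own pipeline wires this up differently):

```js
// Standalone key and tempo extraction with Essentia.js.
// `audioData` must be a mono Float32Array of decoded samples;
// decoding (e.g. via the Web Audio API or ffmpeg) is omitted here.
const esPkg = require('essentia.js');
const essentia = new esPkg.Essentia(esPkg.EssentiaWASM);

function keyAndTempo(audioData) {
  const signal = essentia.arrayToVector(audioData);
  const { key, scale } = essentia.KeyExtractor(signal);   // pitch-class-profile based
  const { bpm } = essentia.PercivalBpmEstimator(signal);  // onset-based tempo estimate
  signal.delete(); // release the WASM-side vector
  return { key: `${key} ${scale}`, bpm };
}
```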
The analysis uses pre-trained MusiCNN models:
- Training Datasets: Million Song Dataset (MSD-2) and MagnaTagATune Dataset (MTT-2)
- Architecture: Deep Convolutional Neural Network optimized for music
- Model Count: 9 specialized models for comprehensive analysis
- Inference: Real-time processing via WebAssembly and TensorFlow.js
- Output Types: Binary mood classification, multi-class genre classification, tempo/key analysis, and danceability scoring
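At inference time each model runs as a TensorFlow.js graph model fed with mel-spectrogram patches. The generic pattern looks like the sketch below; the model path, the [1, 187, 96] input shape, and the class ordering are assumptions based on MusiCNN-style inputs, not values verified against this repository:

```js
// Generic TF.js inference pattern for one classifier model (sketch only).
const tf = require('@tensorflow/tfjs');

async function classify(melPatch /* Float32Array, length 187 * 96 */) {
  // Hypothetical model URL; the input shape is a MusiCNN-style assumption.
  const model = await tf.loadGraphModel('http://localhost:3000/models/mood_happy/model.json');
  const input = tf.tensor(melPatch, [1, 187, 96]);
  const output = model.predict(input);
  const scores = await output.data(); // softmax scores; class order comes from the model's metadata
  input.dispose();
  output.dispose();
  return scores;
}
```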
Start the web interface for individual file analysis:

```bash
npm run server
```

Access at http://localhost:3000
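Under the hood this is the Express backend serving the Essentia.js web app; a minimal sketch of that serving pattern (the `public/` directory name is an assumption, not the repository's confirmed layout):

```js
// Minimal Express static server, the pattern behind `npm run server`.
// The `public/` directory name is illustrative; see the repo for the real layout.
const express = require('express');
const app = express();

app.use(express.static('public')); // serves the Essentia.js web app
app.listen(3000, () => console.log('Web interface at http://localhost:3000'));
```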
Troubleshooting:

| Issue | Solution |
|---|---|
| Node.js not found | Install Node.js 16+ from nodejs.org |
| No audio files found | Add audio files to the `data/` directory |
| Browser launch failed | Run `npx playwright@1.52.0 install` |
| Analysis hangs | Try with fewer files first and check for corrupted audio files |
MIT License - Free for academic and commercial use.