A mid-size record label needed to identify which music genres to prioritize for next quarter's signings. No internal data was available to guide the decision, so we built the answer from scratch using 114K+ Spotify tracks from Kaggle.
Phase 1: Hypothesis Formation. Defined 5 assumptions before touching the data, including that pop and hip-hop would dominate, that shorter songs would win, and that high energy would equal popularity. Writing these out first helps guard against confirmation bias.
Phase 2: Data Preparation. Sourced 114K Spotify tracks, then removed roughly 16K zero-popularity tracks and 1 row with missing data. Result: 97,980 usable tracks across 114 genres.
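The cleaning step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the field names (`track_name`, `popularity`, `track_genre`) and the four sample rows are assumptions made up for the example.

```python
# Toy rows for illustration only; field names and values are assumptions,
# not taken from the real dataset.
rows = [
    {"track_name": "A", "popularity": 72, "track_genre": "pop"},
    {"track_name": "B", "popularity": 0, "track_genre": "pop"},
    {"track_name": None, "popularity": 55, "track_genre": "k-pop"},
    {"track_name": "D", "popularity": 41, "track_genre": "jazz"},
]


def clean_tracks(tracks):
    """Drop rows with any missing value, then drop zero-popularity rows."""
    complete = [t for t in tracks if all(v is not None for v in t.values())]
    return [t for t in complete if t["popularity"] > 0]


cleaned = clean_tracks(rows)
genres = {t["track_genre"] for t in cleaned}
print(len(cleaned), "tracks across", len(genres), "genres")
```

Checking for missing values before filtering on popularity keeps the comparison `popularity > 0` from ever running against a missing value.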
Phase 3: Rigorous Testing. Tested each assumption against the data using correlations, distributions, and segment comparisons. Documented which assumptions held up and which didn't.
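As a rough sketch of what two of these checks might look like, the snippet below computes a Pearson correlation (for the "high energy equals popularity" assumption) and a segment comparison (for the "shorter songs win" assumption). The field names, the 3-minute cutoff, and the toy values are all assumptions chosen for illustration; they are not results from the analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def segment_means(tracks, cutoff_ms=180_000):
    """Mean popularity of tracks shorter than cutoff_ms vs. the rest.

    The 3-minute cutoff is a hypothetical choice, not the one used in the analysis.
    """
    short = [t["popularity"] for t in tracks if t["duration_ms"] < cutoff_ms]
    long_ = [t["popularity"] for t in tracks if t["duration_ms"] >= cutoff_ms]
    return {"short": sum(short) / len(short), "long": sum(long_) / len(long_)}


# Made-up toy values, for illustration only.
tracks = [
    {"energy": 0.9, "duration_ms": 150_000, "popularity": 60},
    {"energy": 0.4, "duration_ms": 240_000, "popularity": 50},
    {"energy": 0.7, "duration_ms": 170_000, "popularity": 40},
    {"energy": 0.2, "duration_ms": 300_000, "popularity": 30},
]

r = pearson([t["energy"] for t in tracks], [t["popularity"] for t in tracks])
means = segment_means(tracks)
print(f"energy vs popularity: r = {r:.2f}")
print("mean popularity by length:", means)
```

A correlation near zero would count against the energy assumption, while a clear gap between the two segment means would support the song-length assumption. Either result only describes association in the sample, not cause.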
Phase 4: Visualization. Created 3 simple, clear charts — not dashboards. Each tells one story in under 5 seconds. Ready for stakeholder presentation.
Phase 5: Strategy. Built a tier-based recommendation system, documented its caveats and limitations, and outlined next steps for validation.
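The write-up does not spell out how tiers were assigned, so the following is only one plausible sketch: rank genres by mean popularity and refuse to rank genres with too few tracks. The function name, the thresholds (50 and 35), and the minimum sample size are all hypothetical.

```python
def assign_tier(mean_popularity, track_count, min_tracks=50):
    """Map a genre's mean popularity to a tier.

    All thresholds here are hypothetical placeholders, not the analysis's rules.
    """
    if track_count < min_tracks:
        return "Insufficient data"  # too few tracks to rank with confidence
    if mean_popularity >= 50:
        return "Tier 1"
    if mean_popularity >= 35:
        return "Tier 2"
    return "Tier 3"


# Made-up genre summaries: (mean popularity, number of tracks).
genre_stats = {
    "genre_a": (58.0, 900),
    "genre_b": (41.5, 850),
    "genre_c": (22.0, 700),
    "genre_d": (63.0, 12),
}

tiers = {g: assign_tier(mean, n) for g, (mean, n) in genre_stats.items()}
for genre, tier in tiers.items():
    print(genre, "->", tier)
```

The explicit "Insufficient data" outcome bakes a caveat into the recommendation itself: a genre with a high average but only a handful of tracks is flagged instead of being promoted to the top tier.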
I deliberately didn't overstate the findings. What this analysis can't tell you: revenue impact (popularity is not profit), cost to sign artists, audience demographics, or whether these trends will persist. Caveats build trust. This is why the analysis includes next steps — validate with internal data before making major budget decisions.