## Applicant Information
- **Name:** Wei Lun Hsiao
- **Affiliation:** National Taiwan University, Department of Computer Science and Information Engineering
- **Email:** b13902049@csie.ntu.edu.tw
- **GitHub:** [AlenHsiaoWeiLun](https://github.com/AlenHsiaoWeiLun/gsoc_music_agent)
- **Location:** Taipei, Taiwan (GMT+8)
---
## 1. Motivation & Vision
### Bridging Emotion and Code
I’ve always believed that technology should not just be functional—it should *feel*. As a computer science student at National Taiwan University, my journey began with algorithms and abstractions. But over time, I realized that what drives me isn’t just solving problems—it’s building systems that understand people.
Music, for me, has always been more than background noise. It’s an emotional language—one that heals, energizes, and connects. That’s why this project resonates deeply. A music agent that listens, understands, and adapts to your emotional state? That’s not just technically exciting—it’s human.
### Long-Term Vision
This assistant isn’t just about music. It’s a prototype for a new kind of AI—one that’s emotionally aware, conversationally fluent, and designed for humans, not just power users.
In the long run, I plan to evolve this project into both a **research publication** and a **product platform**:
- As research, I aim to study *trajectory-based music recommendation*, *emotion-conditioned semantic reranking*, and *conversational feedback loops*, and turn these insights into a paper on human-centered AI.
- As a product, I envision integrating **real-time biofeedback** via wearable sensors (e.g., heart rate, breathing, GSR) to inform mood inference and dynamically adjust playlists. Imagine calming playlists that **respond to your anxiety levels**, or energizing transitions during workouts—emotion-driven, real-time, and deeply personal.
This is not just code. It’s **an interface between emotion and interaction**—a living agent that learns how to support you, musically and emotionally.
---
## 2. Technical Feasibility & System Design
### Proof-of-Concept Demonstrated
To validate the technical feasibility and reduce implementation risk, I have already developed several working proof-of-concept (PoC) components:
- ✅ **LLM-powered CLI dialogue agent** (Module 5): Uses the locally hosted Mistral-7B-Instruct model to interpret user moods and generate empathetic responses.
- ✅ **Spotify OAuth2 Token Manager** (Module 0): Fully functional token flow, including local `token.json` caching and automatic token refresh.
- ✅ **Fallback playlist engine** (Module 3): Returns mood-based track recommendations when Spotify's `/v1/recommendations` fails.
- ✅ **Feedback-aware reranker** (Module 4): Uses liked/disliked track history to re-score and reorder track recommendations dynamically.
- ✅ **Trajectory-based playlist generator** (Module 2): Builds multi-phase playlists that reflect emotional journeys (e.g., tired → energized), supporting dynamic filtering and audio feature interpolation.
These modules were tested on a local Debian workstation with real-time Spotify API interactions, offline LLM inference, and persistent user preference storage.
### System Architecture
SymMuse is composed of several key modules:
1. **Conversational Frontend Interface**: Web-based UI (React + Tailwind) styled as a chatbot for playlist interaction.
2. **NLP Emotion & Intent Understanding**: Lightweight sentiment and intent detection model using DistilBERT or LLM.
3. **Personalized Playlist Generator**: Emotion-driven generation using Spotify’s audio features and feedback refinement.
4. **Spotify API Integration Layer**: Secure OAuth2 token handling and playlist management.
5. **Privacy-Friendly Exploratory Mode**: Offline playlist recommendations using public datasets.
6. **Feedback & Refinement System**: Enables interactive playlist edits (e.g., “more upbeat”).
7. **Backend API Service**: RESTful interface for frontend, AI inference, and Spotify sync.
8. **Documentation & Deployment Tools**: For future contributors and deployment readiness.
9. **Proof-of-Concept Demo**: Video walkthrough to showcase system capabilities.
---
## 3. Innovation and Originality
SymMuse introduces a novel fusion of real-time speech-based emotion detection, semantic parsing, and intelligent music curation. Unlike existing platforms that rely solely on user history or static preferences, SymMuse responds to transient emotional cues and adapts in the moment.
Key innovations:
- **Emotion trajectory modeling** based on speech and conversation
- **Explainable recommendations** via LLM-generated feedback
- **Edge deployability** with privacy-preserving architecture
This is not a repackaged toolchain—it is a ground-up system tailored to enhance digital empathy.
---
## 4. Expected Impact
### Commercial Potential
- Add-on module for streaming platforms (Spotify, KKBOX)
- Integration with voice assistants or wearables (e.g., Apple Watch)
- Emotion-aware BGM for automotive, meditation, or mental health apps
### Social Value
- Promotes emotional self-awareness and resilience
- Provides gentle, ambient mental health support
- Sparks interdisciplinary innovation between music, psychology, and AI
SymMuse reflects a future where technology doesn’t just compute, but listens and feels.
---
## 5. Development Plan
### Current Progress
- SER model trained and deployed locally
- NLP module integrated with Mistral-7B
- CLI-based prototype generates real Spotify playlists
### Next Steps
- Front-end voice interface
- Heart-rate based emotion fusion
- Field testing with users
### Prototype UI Overview
The proposed conversational frontend interface follows a chatbot-style layout and emphasizes emotional awareness, real-time refinement, and explainable suggestions.

---
## Deliverables
1. **Conversational Frontend Interface**
- A responsive web-based UI built with React + Tailwind, styled like a chatbot (e.g., ChatGPT).
- Enables users to enter natural language queries (e.g., “play something chill”), receive music recommendations, and interact with playlists.
- Includes quick buttons for modifying mood, energy, genre, and toggling privacy mode.
2. **NLP Emotion & Intent Understanding Module**
- Lightweight NLP component capable of detecting user mood, intent (e.g., “add jazz”), and context (e.g., “for studying”).
- Powered by DistilBERT or an external LLM via API for prompt-based classification.
- Supports multilingual inputs (extendable).
3. **Personalized Playlist Generator**
- Generates mood-aligned playlists by mapping emotional cues to Spotify audio features (valence, energy, tempo, etc.).
- Uses Spotify’s recommendation engine, user’s listening history, and genre taxonomy.
- Supports multi-stage playlist arcs (e.g., emotional trajectory modeling).
- Re-ranks tracks based on implicit or explicit feedback (e.g., skipped or liked songs).
- Adjustable with follow-up inputs like “make it more energetic”.
4. **Spotify API Integration Layer**
- Handles OAuth2 authentication and secure token management.
- Provides access to user metadata (liked tracks, playlists, top artists).
- Enables playlist creation, modification, and real-time update.
5. **Exploratory (Privacy) Mode**
- Allows users to use the recommender without logging into Spotify.
- Uses open datasets (e.g., Moodify) and Spotify’s generic genre/mood seed data.
- Ensures full functionality without accessing personal data.
6. **Real-time Feedback & Refinement System**
- Supports interactive commands like “more upbeat”, “add lofi”, “remove vocals”.
- Refines current playlist using updated audio feature parameters.
- Prevents full regeneration unless explicitly requested by user.
7. **Modular Backend API Service**
- RESTful API endpoints to handle frontend queries, AI inference, playlist logic, and Spotify communication.
- Designed to be stateless and scalable.
- Includes internal logging and user session handling.
8. **Developer Documentation & Deployment Guide**
- Setup instructions, module overviews, and API usage examples.
- Includes annotated code, architecture diagram, environment setup, and example queries.
- Helps contributors extend or deploy the system easily.
9. **Proof-of-Concept Demo & Video Walkthrough**
- A short recorded demo showing real-time interaction with the agent, playlist generation, and refinement flow.
- Serves as a milestone check and onboarding aid for community contributors.
---
## Implementation Modules
### System Architecture

---
### Module 0: Spotify OAuth2 Token Manager
#### Goal
To securely and automatically manage access to the Spotify Web API by implementing a robust OAuth2 token handling system. This module enables the rest of the application (playlist generation, refinement, metadata retrieval) to interact with Spotify without manual token refresh or reauthentication.
#### Problem
Spotify's access tokens expire every 3600 seconds (1 hour). If not refreshed in time, API calls will fail with authorization errors. Manually updating access tokens is error-prone, disrupts workflow, and breaks automated systems. Moreover, hardcoding tokens poses security and privacy risks.
#### Solution & Implementation
This module uses the full Spotify OAuth2 authorization code flow with refresh tokens and automatic expiry checking. It:
- Loads client credentials from a `.env` file
- Launches a local Flask server to receive the OAuth callback
- Exchanges authorization code for access & refresh tokens
- Saves token information in a local `token.json`
- Automatically refreshes the token when expired
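A minimal sketch of the expiry check and refresh step described above, assuming an illustrative `token.json` layout with an `expires_at` field (the actual PoC schema may differ):

```python
# Sketch: automatic token refresh. Field names ("expires_at", "refresh_token")
# and the placeholder credentials are illustrative, not the exact PoC schema.
import json, time, requests

def refresh_if_expired(token_path="spotify_auth/token.json",
                       client_id="<client-id>", client_secret="<client-secret>"):
    with open(token_path) as f:
        token = json.load(f)
    # Refresh a minute early to stay clear of the 3600 s expiry window
    if time.time() < token["expires_at"] - 60:
        return token["access_token"]
    res = requests.post(
        "https://accounts.spotify.com/api/token",
        data={"grant_type": "refresh_token",
              "refresh_token": token["refresh_token"]},
        auth=(client_id, client_secret),
    )
    res.raise_for_status()
    fresh = res.json()
    token["access_token"] = fresh["access_token"]
    token["expires_at"] = time.time() + fresh.get("expires_in", 3600)
    with open(token_path, "w") as f:
        json.dump(token, f, indent=2)
    return token["access_token"]
```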
#### File Structure
```
~/gsoc_spotify_agent/
├── spotify_auth/
│ ├── token_manager.py # Token logic (get, refresh, save)
│ ├── secrets.env # Contains client ID, secret, redirect URI
│ └── token.json # Auto-generated on first login
└── token_test.py # Usage example and verification
```
#### Code Snippet
```python
from spotify_auth.token_manager import SpotifyAuth
auth = SpotifyAuth()
token = auth.get_access_token()
print("Access Token:", token[:50], "...")
```
#### Authorization Flow
1. On first run, opens browser for user to log in via Spotify
2. Redirects to `http://localhost:8888/callback`
3. Flask app intercepts the code and completes the handshake
4. Writes access + refresh token to disk
5. Subsequent runs auto-refresh expired tokens
#### Example Output
```
Access Token: BQCJv9kiaDKPTtHBMuY0hnn6Tzu5M8jU2UVBUnMVlEsduRvONJ ...
```
#### Extensibility Plan
- Token expiration fallback for API calls (auto retry)
- Multi-user support by storing tokens under user ID
- Token encryption for safer storage
- GUI login flow for desktop use
#### Summary
The token manager is the backbone of all authenticated Spotify API operations in this project. It eliminates friction, ensures API reliability, and aligns with OAuth2 best practices. Every module that talks to Spotify depends on this layer, and it has been built with extensibility, security, and automation in mind.
### Module 1: Mood & Sentiment Detection Engine
#### Goal
To interpret user input and infer both affective sentiment (e.g., positive, neutral, negative) and deeper emotional states (e.g., joy, sadness, anxiety, fatigue) through a cognitively-informed NLP pipeline. This module serves as the emotion comprehension layer of the conversational music agent—mirroring how humans infer others' emotions based on language, context, and tone. It provides the foundation for empathy-aligned playlist recommendations.
Inspired by appraisal theory in cognitive science, the goal is not only to classify emotion, but also to interpret implicit expressions of mood, even when users do not explicitly name their feelings.
#### Problem
Emotion detection from text is inherently ambiguous and context-dependent. Users often express emotional states indirectly or with mixed emotional valence:
- "I just want to lie down" may imply tiredness or sadness.
- "That movie was exhausting, but incredible" conveys emotional contradiction.
A purely sentiment-based classifier is insufficient. We need a low-latency model that integrates both sentiment and mood cues, understands implicit emotion, and is robust against hallucinated labels, especially in ambiguous or conversational settings.
#### Solution & Implementation
We begin with a lightweight transformer-based sentiment model: `cardiffnlp/twitter-roberta-base-sentiment`, which is well-suited for casual, emotionally rich language.
```python
# sentiment_api.py
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from scipy.special import softmax
import torch

app = FastAPI()

MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

class TextRequest(BaseModel):
    text: str

@app.post("/sentiment")
def get_sentiment(req: TextRequest):
    encoded = tokenizer(req.text, return_tensors="pt")
    logits = model(**encoded).logits
    probs = softmax(logits.detach().numpy()[0])
    label = ["Negative", "Neutral", "Positive"][probs.argmax()]
    confidence = float(probs.max())
    return {
        "text": req.text,
        "sentiment": label,
        "confidence": round(confidence, 3)
    }
```
#### Example Output (via Swagger UI)
```json
POST /sentiment
{
"text": "I'm so happy I could cry!"
}
Response:
{
"text": "I'm so happy I could cry!",
"sentiment": "Positive",
"confidence": 0.99
}
```
#### Extensibility Plan
To better reflect real-world emotion, this module is designed with the following upgrades in mind:
- Add fine-grained emotion classification (`joy`, `anger`, `fear`, etc.) using the `go_emotions` dataset or `m3e` embeddings + KNN clustering
- Integrate voice-based emotional prosody analysis for future multimodal input
- Incorporate multi-turn context tracking to capture temporal emotion shifts, enabling richer cognitive modeling (e.g., frustration buildup or mood transitions)
- Explore emotion commonsense reasoning models (e.g., `COMET` or `EmoCause`) for implicit cause-effect emotion detection
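As a first step toward the fine-grained upgrade above, a GoEmotions-finetuned checkpoint can be dropped into the same pipeline. The sketch below assumes the publicly available `SamLowe/roberta-base-go_emotions` model; any GoEmotions-finetuned text-classification checkpoint would work the same way:

```python
# Sketch: fine-grained emotion tagging. The checkpoint name is an assumption;
# swap in any GoEmotions-finetuned text-classification model.
from transformers import pipeline

emotion_classifier = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=None,  # return scores for all 28 GoEmotions labels
)

def detect_emotions(text: str, threshold: float = 0.3):
    scores = emotion_classifier([text])[0]
    return [s for s in scores if s["score"] >= threshold]

print(detect_emotions("I just want to lie down."))
# -> a list of {'label', 'score'} dicts, typically surfacing labels such as
#    'sadness' or 'disappointment' here (model-dependent)
```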
#### Summary
This module forms the first layer of affective intelligence for the Spotify music agent. Beyond mere sentiment tagging, it introduces a cognitively grounded approach to emotion detection—modeling how humans interpret affect through language. It is API-ready, real-time, and designed for modular integration with playlist generation and user feedback loops.
By anchoring this component in both NLP and cognitive science, we ensure that future playlist recommendations are not just technically relevant, but emotionally resonant and contextually appropriate.
---
### Module 2: Playlist Generator with Spotify API (Advanced Strategy)
#### Goal
To generate highly personalized and emotionally aligned Spotify playlists using inputs from the NLP engine (Module 1). This module constructs not just a list of songs, but a curated music journey tailored to the user's mood, energy, and contextual needs.
#### Problem
Users typically require 2–3 hours of music in one session (~50+ tracks), but Spotify’s recommendation API returns a limited number of suggestions with diminishing quality. A single batch of recommendations rarely suffices to ensure emotional accuracy, musical diversity, and satisfaction.
Additionally, static or shallow queries (e.g., based solely on valence and energy) cannot fully capture the complexity of emotional states, context (like weather, activity, or time), or personal taste.
#### Solution & Implementation
We treat Spotify’s API not as a final recommender, but as a music pool. Our approach consists of multi-stage sampling, mood trajectory modeling, intelligent reranking, and robust playlist assembly.
##### Key Components
1. **Mood Trajectory Modeling**
- Use user emotion input to generate 2–4 mood stages (e.g., "tired but want to feel energetic" → low valence/energy → mid → high).
- Each stage has its own valence/energy targets and contributes ~15–20 tracks (see the interpolation sketch after this list).
2. **Multi-Round Genre Sampling**
- For each stage, query Spotify `/v1/recommendations` multiple times with varied `seed_genres`, `seed_artists`, and valence/energy.
- Introduce ~20% "surprise genres" to promote diversity.
- Aim to collect a candidate pool of 100–150 songs.
3. **User Profile Matching**
- Retrieve user top artists/tracks using `GET /v1/me/top/{type}`.
- Score recommendations based on acoustic similarity (e.g., danceability, tempo, mood tags) or embedding proximity.
4. **Context-Aware Filtering & Reranking**
- Filter tracks using context keywords (e.g., remove vocals for "deep work").
- Prioritize songs that align with the current activity or emotional goal.
- Deduplicate artists, avoid recent repeats, and ensure smooth transitions.
5. **Playlist Assembly**
- Create a new playlist via `POST /v1/users/{user_id}/playlists`.
- Add top 50–70 filtered and scored tracks using `POST /v1/playlists/{playlist_id}/tracks`.
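To make component 1 concrete, the interpolation sketch referenced above could look like this (the linear spacing and rounding are illustrative):

```python
# Sketch: derive 2–4 mood stages by linearly interpolating valence/energy
# between the user's current state and the desired state.
def build_mood_trajectory(current, target, n_stages=3):
    """current/target are dicts like {"valence": 0.3, "energy": 0.2}."""
    stages = []
    for i in range(n_stages):
        t = i / (n_stages - 1) if n_stages > 1 else 1.0
        stages.append({
            "valence": round(current["valence"] + t * (target["valence"] - current["valence"]), 2),
            "energy": round(current["energy"] + t * (target["energy"] - current["energy"]), 2),
        })
    return stages

# "tired but want to feel energetic" → low → mid → high
print(build_mood_trajectory({"valence": 0.3, "energy": 0.2},
                            {"valence": 0.8, "energy": 0.7}))
```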
#### API Endpoint Availability Verification
| Endpoint | Purpose | Verified | Notes |
|----------|---------|----------|-------|
| `GET /v1/recommendations` | Base music recommendation | ✅ Yes | Used multiple times with different seeds per stage. |
| `POST /v1/users/{user_id}/playlists` | Create playlist | ✅ Yes | Named dynamically (e.g., "Mood Journey – Apr 5"). |
| `POST /v1/playlists/{playlist_id}/tracks` | Add tracks | ✅ Yes | Supports batch updates. |
| `GET /v1/audio-features/{id}` | Track features | ✅ Yes | Used for similarity scoring and filtering. |
| `GET /v1/available-genre-seeds` | Validate genres | ✅ Yes | Filters invalid genre tags. |
| `GET /v1/me/top/{type}` | User top artists/tracks | ✅ Yes | Powers personalization layer. |
#### Example Usage Snippet
```python
uris = []
for mood in mood_trajectory:
    for genre in seed_genres + surprise_genres:
        batch = get_recommendations(seed_genres=[genre], **mood)
        uris.extend(batch)

scored = rerank_with_profile(uris, user_profile, context)
playlist = scored[:60]
create_playlist("Mood Journey", playlist)
```
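`rerank_with_profile` is not part of the validated PoC that follows; one possible scoring scheme, assuming each candidate has already been joined with its audio features from `GET /v1/audio-features`, is sketched here (the weights and the context rule are illustrative):

```python
# Sketch: score candidates by audio-feature proximity to the user's profile
# plus a simple context rule. Weights are illustrative, not tuned.
def rerank_with_profile(candidates, user_profile, context):
    """candidates: dicts with 'uri' plus audio features (valence, energy, ...)."""
    def score(track):
        s = 0.0
        # Closer to the user's average valence/energy/danceability -> higher score
        for feat in ("valence", "energy", "danceability"):
            s -= abs(track.get(feat, 0.5) - user_profile.get(feat, 0.5))
        # Context rule: prefer instrumental tracks for deep-work sessions
        if "deep work" in context and track.get("instrumentalness", 0) > 0.7:
            s += 0.5
        return s
    return sorted(candidates, key=score, reverse=True)
```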
#### Full PoC Script (Validated)
The following script demonstrates a full working implementation of the advanced playlist strategy. It generates a 60-track playlist using mood trajectory modeling, multi-round genre sampling, and dynamic playlist creation via the Spotify API.
```python
from spotify_auth.token_manager import SpotifyAuth
import requests
import random

ACCESS_TOKEN = SpotifyAuth().get_access_token()
USER_ID = "your_user_id"
HEADERS = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json"
}

def get_recommendations(seed_genres, valence, energy, limit=10):
    url = "https://api.spotify.com/v1/recommendations"
    params = {
        "seed_genres": ",".join(seed_genres),
        "target_valence": valence,
        "target_energy": energy,
        "limit": limit
    }
    res = requests.get(url, headers=HEADERS, params=params)
    return [track["uri"] for track in res.json()["tracks"]]

def create_playlist(name, track_uris):
    create_url = f"https://api.spotify.com/v1/users/{USER_ID}/playlists"
    payload = {"name": name, "description": "Auto-generated by MeowWave", "public": False}
    res = requests.post(create_url, headers=HEADERS, json=payload)
    playlist_id = res.json()["id"]
    # Batch add
    add_url = f"https://api.spotify.com/v1/playlists/{playlist_id}/tracks"
    for i in range(0, len(track_uris), 100):
        chunk = {"uris": track_uris[i:i+100]}
        requests.post(add_url, headers=HEADERS, json=chunk)
    return f"https://open.spotify.com/playlist/{playlist_id}"

# ----- Full Example Flow -----
mood_trajectory = [
    {"valence": 0.3, "energy": 0.2},
    {"valence": 0.5, "energy": 0.5},
    {"valence": 0.8, "energy": 0.7}
]
seed_genres = ["ambient", "acoustic"]
surprise_genres = random.sample(["indie", "folk", "jazz"], 1)

all_uris = []
for mood in mood_trajectory:
    for genre in seed_genres + surprise_genres:
        tracks = get_recommendations([genre], mood["valence"], mood["energy"], limit=10)
        all_uris.extend(tracks)

playlist_url = create_playlist("Mood Journey – Apr 5", all_uris[:60])
print("✅ Playlist created:", playlist_url)
```
#### Inputs
- `seed_genres`, `seed_artists`, `seed_tracks`: Inferred from NLP module and user history.
- `valence`, `energy`: Per stage of mood trajectory.
- `context_keywords`: Affect ranking and filtering logic.
#### Output
- A Spotify playlist (new or updated) with ~50 emotionally coherent and contextually appropriate songs.
#### Extensibility Plan
- Use embeddings or vector similarity for deeper personalization.
- Incorporate Module 4 feedback for active playlist evolution.
- Allow real-time regeneration via conversational agent (Module 5).
- Explore integration with local LLMs.
#### Summary
Unlike naive playlist generators, this module models musical journeys, collects from diverse sources, filters and scores candidates, and delivers long-form playlists that reflect a user’s emotional context. It combines Spotify’s engine with custom logic and future extensibility hooks, forming the core of a deeply personalized music assistant.
Future work will explore replacing Spotify’s API with a fully self-trained recommender using custom embeddings and collaborative filtering based on anonymized user taste graphs.
---
### Module 3: Smart Recommendation Fallback Engine
#### Goal
To ensure the music agent can always provide recommendations, even when Spotify's official `/v1/recommendations` API fails due to insufficient listening history, region lock, or cold-start limitations. This module introduces a fallback mechanism based on predefined mood-tagged tracks.
#### Problem
Spotify’s recommendation API can unpredictably return 404 errors when there is insufficient listening data, or for newly created or low-activity accounts. This failure breaks the user experience and interrupts the emotion-to-music pipeline.
#### Solution & Proof of Concept (PoC)
We define a hard-coded track pool categorized by mood (e.g., "happy", "sad", "chill"). If the API call to Spotify fails, we randomly sample from the corresponding category and populate a playlist via Spotify’s playlist management API. This guarantees uninterrupted functionality and user satisfaction.
#### Fallback Track Pool
```python
MOCK_TRACKS = {
    "happy": [
        "5W3cjX2J3tjhG8zb6u0qHn",  # Ed Sheeran - Shape of You
        "3AhXZa8sUQht0UEdBJgpGc",  # Pharrell Williams - Happy
        "7y7w4M3zP28X4PjB0KukLx",  # Justin Timberlake - Can't Stop The Feeling!
    ],
    "sad": [
        "2dLLR6qlu5UJ5gk0dKz0h3",  # Adele - Someone Like You
        "4JpKVNYnVcJ8tuMKjAj50A",  # Sam Smith - Too Good At Goodbyes
        "1rqqCSm0Qe4I9rUvWncaom",  # Lewis Capaldi - Someone You Loved
    ],
    "chill": [
        "3Zwu2K0Qa5sT6teCCHPShP",  # Billie Eilish - ocean eyes
        "2eBnhLqmuM0r8C3O1aYJEa",  # Khalid - Location
        "0rCYPc082fS0P8U7EfUwCk",  # Mac Miller - Good News
    ]
}
```
#### PoC Code Snippet
```python
from spotify_auth.token_manager import SpotifyAuth
import requests
import random

# Return fallback tracks based on mood
def recommend_by_mood(mood="chill", n=3):
    track_pool = MOCK_TRACKS.get(mood, MOCK_TRACKS["chill"])
    return random.sample(track_pool, k=min(n, len(track_pool)))

# Add selected tracks to Spotify playlist
def add_tracks_to_playlist(track_ids, playlist_id):
    token = SpotifyAuth().get_access_token()
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    url = f"https://api.spotify.com/v1/playlists/{playlist_id}/tracks"
    payload = {"uris": [f"spotify:track:{tid}" for tid in track_ids]}
    res = requests.post(url, headers=headers, json=payload)
    return res.status_code == 201

# PoC test run
if __name__ == "__main__":
    mood = "happy"
    playlist_id = "1cPzpV6Mt5UvBplFTad5rh"
    track_ids = recommend_by_mood(mood)
    print("🎧 Tracks to add:", track_ids)
    if add_tracks_to_playlist(track_ids, playlist_id):
        print("✅ Added fallback recommendations to playlist")
    else:
        print("❌ Failed to add tracks")
```
#### Extensibility Plan
- Move fallback tracks to `tracks.json` or `tracks.csv` for maintainability
- Integrate fuzzy matching for mood inputs (e.g., "relaxed" → "chill"), as sketched below
- Add metadata caching for song titles and cover art
- Connect fallback system to sentiment detection module
- Support automatic playlist naming: e.g., “Your Happy Mix – March 29”
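The fuzzy-matching idea can be prototyped with the standard library alone; the synonym table below is illustrative:

```python
# Sketch: map free-form mood words onto the fallback pool's keys.
from difflib import get_close_matches

MOOD_SYNONYMS = {"relaxed": "chill", "down": "sad", "cheerful": "happy"}  # illustrative

def normalize_mood(user_mood: str, known_moods=("happy", "sad", "chill")) -> str:
    mood = user_mood.lower().strip()
    if mood in MOOD_SYNONYMS:
        return MOOD_SYNONYMS[mood]
    match = get_close_matches(mood, known_moods, n=1, cutoff=0.6)
    return match[0] if match else "chill"  # default to the safest pool

print(normalize_mood("Happpy"))   # -> "happy"
print(normalize_mood("relaxed"))  # -> "chill"
```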
#### Summary
This fallback module guarantees playlist generation regardless of Spotify’s internal recommendation availability. It builds trust with users and provides a foundation for more advanced hybrid recommendation systems that mix AI, NLP, and curated datasets.
---
### Module 4: Playlist Feedback & Personalization Engine
#### Goal
To close the loop between the AI music agent and the user by enabling dynamic feedback, preference learning, and real-time playlist personalization. This module empowers users to like/dislike specific tracks or request changes via natural language, thereby improving future recommendations and enabling adaptive listening experiences.
#### Problem
Spotify's API does not provide a built-in mechanism for learning from user feedback. While initial recommendations may be useful, users often want to guide the music selection based on subjective experience. Without a feedback loop, the system cannot improve or adapt to user preferences over time.
Additionally, not all users are comfortable with full personalization; thus, the system must support opt-in learning and an alternative "exploration mode."
#### Solution & Implementation
The system introduces a feedback mechanism that captures user preferences (e.g., "like" or "dislike") either through UI interactions or conversational feedback. These preferences are stored locally as JSON-based user profiles.
A feedback-aware playlist generator uses these preferences to:
- Prioritize previously liked songs or similar tracks
- Avoid disliked or skipped songs
- Adjust valence/energy/genre weights based on user preference clusters
If a user opts out of personalization, the system will use an exploratory mode based on mood, genre, and random fallback selections.
#### Proof of Concept (PoC)
We simulate user feedback on track IDs and update a local preference store (`preferences.json`):
```json
{
"likes": ["5W3cjX2J3tjhG8zb6u0qHn", "3Zwu2K0Qa5sT6teCCHPShP"],
"dislikes": ["2dLLR6qlu5UJ5gk0dKz0h3"]
}
```
This preference store can be loaded by the playlist generator to reweight or re-rank recommendations in future sessions.
#### PoC Python Script
```python
# module4_feedback_test.py
from typing import Dict, List
import json
import os

PROFILE_PATH = "user_data/preferences.json"

FEEDBACK_EVENTS = [
    {"track_id": "5W3cjX2J3tjhG8zb6u0qHn", "liked": True},
    {"track_id": "2dLLR6qlu5UJ5gk0dKz0h3", "liked": False},
    {"track_id": "3Zwu2K0Qa5sT6teCCHPShP", "liked": True}
]

os.makedirs("user_data", exist_ok=True)
if not os.path.exists(PROFILE_PATH):
    with open(PROFILE_PATH, "w") as f:
        json.dump({"likes": [], "dislikes": []}, f)

def update_preferences(events: List[Dict]):
    with open(PROFILE_PATH, "r") as f:
        prefs = json.load(f)
    for event in events:
        tid = event["track_id"]
        if event["liked"] and tid not in prefs["likes"]:
            prefs["likes"].append(tid)
            if tid in prefs["dislikes"]:
                prefs["dislikes"].remove(tid)
        elif not event["liked"] and tid not in prefs["dislikes"]:
            prefs["dislikes"].append(tid)
            if tid in prefs["likes"]:
                prefs["likes"].remove(tid)
    with open(PROFILE_PATH, "w") as f:
        json.dump(prefs, f, indent=2)

if __name__ == "__main__":
    update_preferences(FEEDBACK_EVENTS)
    print("Preferences updated. Current profile:")
    with open(PROFILE_PATH, "r") as f:
        print(json.dumps(json.load(f), indent=2))
```
#### Feedback-Based Reranking
We add a reranking step that prioritizes recommended tracks based on the feedback profile. Each track is scored based on its valence plus a weight determined by whether it has been liked or disliked in the past.
```python
# feedback_reranker.py
import json
from typing import List, Dict

PREF_PATH = "user_data/preferences.json"

RECOMMENDED_TRACKS = [
    {"id": "5W3cjX2J3tjhG8zb6u0qHn", "valence": 0.9},
    {"id": "2dLLR6qlu5UJ5gk0dKz0h3", "valence": 0.3},
    {"id": "1rqqCSm0Qe4I9rUvWncaom", "valence": 0.5},
    {"id": "3Zwu2K0Qa5sT6teCCHPShP", "valence": 0.6},
    {"id": "7y7w4M3zP28X4PjB0KukLx", "valence": 0.7}
]

LIKE_BOOST = 1.0
DISLIKE_PENALTY = -1.0

def rerank_with_feedback(tracks: List[Dict], preferences: Dict) -> List[Dict]:
    for t in tracks:
        tid = t["id"]
        score = t["valence"]
        if tid in preferences["likes"]:
            score += LIKE_BOOST
        elif tid in preferences["dislikes"]:
            score += DISLIKE_PENALTY
        t["score"] = round(score, 4)
    return sorted(tracks, key=lambda x: x["score"], reverse=True)

if __name__ == "__main__":
    with open(PREF_PATH, "r") as f:
        prefs = json.load(f)
    reranked = rerank_with_feedback(RECOMMENDED_TRACKS, prefs)
    print("Reranked Recommendations (Top First):")
    for t in reranked:
        print(f"{t['id']} (score={t['score']})")
```
#### Extensibility Plan
- Add conversation-based feedback interpretation (e.g., "this one is too slow" → adjust tempo), as sketched below
- Sync preferences with cloud (for persistence across devices)
- Use clustering/embedding-based similarity search to expand liked songs
- Add support for opt-out toggle in UI or config
- Integrate scoring into backend API responses
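For the conversation-based feedback item above, a small keyword table is enough for a first pass (an LLM could replace it later); the phrases and deltas below are illustrative:

```python
# Sketch: translate common feedback phrases into audio-feature adjustments.
FEEDBACK_RULES = {
    "too slow": {"target_tempo": +15},   # BPM nudge, illustrative
    "too fast": {"target_tempo": -15},
    "more upbeat": {"target_valence": +0.2, "target_energy": +0.2},
    "remove vocals": {"target_instrumentalness": +0.4},
}

def interpret_feedback(message: str, params: dict) -> dict:
    """Apply matching rules to the current recommendation parameters."""
    updated = dict(params)
    for phrase, deltas in FEEDBACK_RULES.items():
        if phrase in message.lower():
            for key, delta in deltas.items():
                updated[key] = updated.get(key, 0.5) + delta
    return updated

print(interpret_feedback("This one is too slow", {"target_tempo": 100}))
# -> {'target_tempo': 115}
```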
#### Summary
This module provides the memory and learning layer of the AI agent. It enhances the user experience over time by adapting to personal tastes and enabling feedback-driven refinement. It also supports privacy-conscious users by offering a non-personalized exploratory mode.
This is a key differentiator from static recommendation agents and allows the system to evolve with each interaction.
---
### Module 5: Conversational Interface & Dialogue Controller
#### Goal
To enable emotionally aware, real-time, and intuitive interaction between users and the AI music agent through a chatbot-style interface. This module combines two LLM-powered components: one that generates warm, human-like responses, and another that parses actionable intents for playlist control.
#### Problem
Users often prefer expressing music preferences in free-form natural language ("I'm sad but want to feel better") rather than technical parameters (valence, energy, etc.). Traditional keyword-based systems fail to capture subtle intent, mood shifts, or emotional nuance.
#### Solution & Implementation
This module employs a **two-layer conversational architecture**:
1. **LLM Response Generator** – Generates empathetic, mood-matching replies using a locally deployed open-source LLM (e.g., `Mistral-7B-Instruct`).
2. **Intent & Emotion Extractor** – Parses the user's message into structured data for downstream modules (e.g., mood, energy, genre hints) using prompt engineering or fine-tuned classifiers.
Both layers operate asynchronously or can be fused into one when using a multi-turn capable model.
#### CLI PoC: Empathetic Response Generator
```python
# dialogue_agent.py
import os, torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.1"
CACHE_DIR = "/tmp/mistral_cache"
HF_TOKEN = os.environ.get("HF_TOKEN")

# Load model
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, cache_dir=CACHE_DIR, token=HF_TOKEN,
    torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, cache_dir=CACHE_DIR, token=HF_TOKEN)

def generate_response(user_input: str) -> str:
    prompt = f"""<s>[INST] You are an empathetic AI music assistant named MeowWave. Your task is to understand the user's emotional state and suggest music that matches their mood. Keep your response warm, concise, and avoid repeating the input. Always offer a genre or vibe that fits. Now respond to this message:\n\"{user_input}\" [/INST]"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.7)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return decoded.split("[/INST]")[-1].strip()
```
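The transcripts below come from a simple REPL wrapper around `generate_response`; a minimal reconstruction of that loop (not shown in the snippet above) looks like this:

```python
# Sketch: CLI loop that produced the transcripts below (reconstructed).
if __name__ == "__main__":
    print("🎧 Mistral MusicBot is ready! Type 'exit' to quit.")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "exit":
            break
        print("Agent:", generate_response(user_input))
```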
#### Example A: Emotional Support Interaction
```
🎧 Mistral MusicBot is ready! Type 'exit' to quit.
You: I’m feeling heartbroken and don’t want to think too much.
Agent: I'm sorry to hear that you're feeling heartbroken. It's important to take time for yourself and give yourself space to heal. Music can be a powerful tool in helping you through this difficult time. I suggest listening to some soft, soulful ballads in the pop or singer-songwriter genre. These genres often feature lyrics that express emotions of heartbreak and loss, and the gentle melodies can be soothing and comforting.
```
#### Example B: Productivity Context Interaction
```
🎧 Mistral MusicBot is ready! Type 'exit' to quit.
You: I'm starting a deep work session. Any good tunes?
Agent: Sure thing! For a productive work session, I recommend some instrumental music. Jazz, classical, or ambient electronic music would be great options. Would you like me to play some of that for you?
```
#### Intent & Emotion Extraction (Structured Layer)
Example output from a separate intent parser LLM (or future fine-tuned classifier):
```json
{
"mood": "heartbroken",
"valence": 0.2,
"energy": 0.3,
"context": ["emotional support"],
"recommended_genres": ["singer-songwriter", "ballads"]
}
```
This can be passed directly to Module 2's playlist generation pipeline.
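One lightweight way to obtain this structured output is to prompt the same local model for JSON and parse it defensively. The sketch below reuses the `model`/`tokenizer` loaded in `dialogue_agent.py`; the prompt wording and fallback defaults are illustrative:

```python
# Sketch: prompt-based intent/emotion extraction. Prompt text and defaults
# are illustrative; a fine-tuned classifier could replace this later.
import json, re
from dialogue_agent import model, tokenizer

DEFAULT = {"mood": "neutral", "valence": 0.5, "energy": 0.5,
           "context": [], "recommended_genres": []}

def extract_intent(user_input: str) -> dict:
    prompt = (
        "<s>[INST] Extract mood, valence (0-1), energy (0-1), context tags and "
        "recommended_genres from the message below. Reply with one JSON object only.\n"
        f"Message: {user_input} [/INST]"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=120, do_sample=False)
    raw = tokenizer.decode(outputs[0], skip_special_tokens=True).split("[/INST]")[-1]
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # tolerate prose around the JSON
    try:
        return json.loads(match.group(0)) if match else dict(DEFAULT)
    except json.JSONDecodeError:
        return dict(DEFAULT)
```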
#### Extensibility Plan
- Introduce memory/state (e.g., recent feedback, persistent mood)
- Add LLM role customization: "DJ Chillwave" vs "Motivation Coach"
- Enable GUI toggles for personality, tone, and recommendation style
- Multi-turn context tracking for evolving dialogue sessions
#### Summary
This module transforms the system from a static recommender to a truly conversational music agent. With both warm, human-like dialogue and actionable data extraction, it becomes a compelling emotional assistant and playlist navigator. The dual-layer design ensures flexibility, modularity, and the ability to evolve toward more sophisticated use cases.
---
### Module 6: Explainable Recommendation & Transparency Engine
#### Goal
To increase user trust, transparency, and engagement by providing human-readable explanations for why specific tracks are recommended. This module bridges the gap between black-box recommendation logic and interpretable, user-facing feedback. It draws on cognitive theories of explanation, transparency, and emotional alignment to enhance human-agent understanding.
Informed by affective computing and human-centered AI design, the module aims to deliver justifications that feel emotionally resonant, personalized, and context-aware.
#### Problem
Users often feel disconnected from recommendation systems due to their opaque nature. Without transparency, users are less likely to trust or engage meaningfully with the system. The lack of interpretability limits opportunities for feedback, learning, and affective bonding.
Psychological studies indicate that users are more likely to trust and follow AI decisions when they understand the rationale behind them. For emotionally driven use cases like music, explanations aligned with a user’s current state and preferences can significantly boost satisfaction and perceived empathy.
#### Solution & Implementation
We implement a dual-layer explanation engine that provides:
- Textual justifications based on user mood, preferences, and track features
- Visual cues, such as radar charts and affective icons, to intuitively convey audio profiles
These explanations are derived from a combination of:
- User mood/sentiment (from Module 1)
- Preferences (likes/dislikes, from Module 4)
- Track-level features (valence, energy, genre, acousticness, etc.)
- Similarity to known liked tracks (embedding-based or rule-based)
Explanations are generated through structured templates and optionally enhanced via large language models (LLMs), with future extensions into real-time adaptive narrative generation.
#### Example: Text-Based Explanation
"This track was selected because it aligns with your current mood of 'heartbroken' through low valence and tempo. It also resembles your liked artist Adele, and you’ve previously enjoyed similar ballads."
*Energy: 0.24 | Valence: 0.18 | Acousticness: High | Lyrics: Emotional breakup theme*
#### Example: Visual Radar Chart Comparison
Compare the track with your historical preferences across key dimensions.
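A throwaway matplotlib sketch of such a chart (the feature values below are made up for illustration):

```python
# Sketch: radar chart comparing a candidate track with the user's average
# profile across audio features. All values are illustrative.
import numpy as np
import matplotlib.pyplot as plt

features = ["valence", "energy", "danceability", "acousticness", "tempo (norm.)"]
track = [0.18, 0.24, 0.35, 0.80, 0.40]
user_avg = [0.55, 0.50, 0.60, 0.45, 0.55]

angles = np.linspace(0, 2 * np.pi, len(features), endpoint=False).tolist()
angles += angles[:1]  # close the polygon

fig, ax = plt.subplots(subplot_kw={"polar": True})
for values, label in [(track, "Recommended track"), (user_avg, "Your average")]:
    vals = values + values[:1]
    ax.plot(angles, vals, label=label)
    ax.fill(angles, vals, alpha=0.15)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(features)
ax.set_ylim(0, 1)
ax.legend(loc="upper right")
plt.savefig("radar_explanation.png", bbox_inches="tight")
```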

#### PoC Implementation (Rule-Based Explanation Generator)
```python
# explain_track.py
def explain(track_id, features, user_profile):
    mood = user_profile.get("mood", "neutral")
    energy = user_profile.get("energy", 0.5)
    genre = user_profile.get("genre", "pop")
    explanation = f"This track matches your recent mood ({mood})"
    if features.get("energy") is not None:  # treat 0.0 as a valid energy value
        if features["energy"] > energy:
            explanation += ", and has slightly more energy to elevate your experience."
        else:
            explanation += ", and has calmer energy to match your preference."
    if features.get("genre") == genre:
        explanation += f" It also belongs to your preferred genre: {genre}."
    if features.get("similar_to_liked"):
        explanation += " Similar to songs you’ve liked before."
    return explanation

# Example call
sample_features = {"energy": 0.7, "genre": "pop", "similar_to_liked": True}
user_profile = {"mood": "happy", "energy": 0.6, "genre": "pop"}
print(explain("track123", sample_features, user_profile))
```
#### Integration in UI
Each recommended track can include a collapsible explanation panel:
```
Track Title – Artist Name
"This track aligns with your mood (calm), has low energy, and fits your preferred genre: jazz."
Acousticness: High | Energy: Low | Genre Match: Yes
Explain: [Expand]
```
#### Extensibility Plan
- Integrate cosine similarity with liked-song embeddings for personalized traceability
- Support real-time explanation generation using transformer-based LLMs
- Add visual explanation components (radar chart, color-coded mood tags)
- UI toggles for "Why was this recommended?" to encourage curiosity-driven interaction
#### Summary
This module transforms the music agent from a black-box predictor into a transparent, explainable system. By surfacing structured, emotionally grounded explanations, it builds trust, supports reflective feedback, and deepens user engagement.
Explanations serve as cognitive anchors for users to understand and influence the agent's behavior—making the system not only smarter, but also more relatable and human-aligned.