# Dr. Lu and Benjamin Meeting notes
## 02/07
### Questions:
Dr. Lu, please add any questions you think of that you'd like me to answer in this meeting here.
I would like to change the direction of my ACM MM paper.
Originally, my plan was to solve the complex problem of creating a system capable of detecting errors in musical performances. I often find myself doubting that I can solve this problem. After hearing you talk about the halting problem today, I realized that there is also a fundamental problem with my current solution: primarily the lack of data on incorrect performances and the ambitious nature of training a single model to do this task. It is a bit "sketchy", like you said.
I now wish to refocus on a smaller aspect: multi-note (polyphonic) pitch detection. Improving polyphonic pitch detection with multimodality is a better-defined goal with clearer ways to evaluate the system. This task is crucial for most music error detection systems and involves accurately identifying the timing and pitch of multiple notes sounding simultaneously.
I propose to enhance this detection with my current model by integrating video inputs with audio, a step that I hypothesize can significantly boost the accuracy of our pitch detection.
This shift in focus splits the problem into a sequence of more solvable tasks. The motivation for the research remains unchanged. Also, by refining polyphonic pitch detection through multimodal inputs, we would make a contribution to the field that I have not yet seen elsewhere. I would like to hear your thoughts on this.
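One of the "clearer ways to evaluate the system" could be standard frame-level precision/recall/F-measure for multi-pitch estimation: each frame is the set of pitches judged active, compared against a reference. A minimal sketch (the function name and example frames are illustrative assumptions, not part of the current system):

```python
# Frame-level evaluation sketch for polyphonic pitch detection:
# each frame is a set of active MIDI pitches; we accumulate true/false
# positives and misses over all frames.

def frame_f_measure(reference, estimate):
    """reference, estimate: lists of per-frame sets of active pitches."""
    tp = fp = fn = 0
    for ref, est in zip(reference, estimate):
        tp += len(ref & est)   # correctly detected pitches
        fp += len(est - ref)   # spurious detections
        fn += len(ref - est)   # missed pitches
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Toy example: two frames of a C-major chord, one pitch missed, one extra.
ref_frames = [{60, 64, 67}, {60, 64, 67}]
est_frames = [{60, 64}, {60, 64, 67, 72}]
p, r, f = frame_f_measure(ref_frames, est_frames)
```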

## 01/31
### Main meeting notes:
We talked about the model inputs and discussed why we convert audio to score. We then talked more about how we plan to synchronize the inputs.
### Questions:
* What is my contribution?
Mainly a model that fuses multiple types of inputs that can handle comparison tasks.
* Why do you convert to audio?
Because my model works better with audio. Additionally, audio is considerably less lossy data.
* Where is the research?
We frame the performance evaluation task as a sequence-to-sequence translation task: we train a model to go from sequences of audio and video to sequences of errors in time.
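In that framing, the target side would be a time-ordered sequence of error tokens that the model learns to emit. A minimal sketch of such a target representation (the token fields and error names are illustrative assumptions, not a fixed design):

```python
# Hypothetical target-side representation for the seq2seq framing:
# a performance's errors become a time-ordered token sequence.

from dataclasses import dataclass

@dataclass(frozen=True)
class ErrorToken:
    time: float   # seconds into the performance
    kind: str     # e.g. "wrong_pitch", "missed_note", "extra_note"

def to_target_sequence(errors):
    """Sort detected errors by time to form the decoder target."""
    return sorted(errors, key=lambda e: e.time)

seq = to_target_sequence([
    ErrorToken(3.2, "missed_note"),
    ErrorToken(1.5, "wrong_pitch"),
])
```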
### How to Synchronize?
Ways to teach my model to synchronize:
1. I synchronize before feeding into the model → DTW
2. The model synchronizes by itself → feed in unsynchronized data
3. Synchronize after the model outputs → detect the notes, then do string matching
Which is the least work?
I think option 2.
## Archive:
