# 꿈꾸는 AI (Dreaming AI) 2020 DREAM_AI Open Challenge

## Topic

- Automatic generation/retrieval of drama background music

## Problem Definition

> A single drama episode uses roughly 10 to 20 pieces of background music. Every drama has a music director who inserts the background music, but composing every piece is impractical, so copyrighted music is used. However, because the copyright status of every song in the world cannot be tracked, deciding whether a given track may be used is very difficult and costly. We therefore aim to develop technology that inserts royalty-free AI background music into each scene of a drama. (※ AI composition is allowed, as is searching for license-free tracks.)

- Freely available web drama files (2 provided)
- Evaluated on 10 scenes in total, 0 to 10 points per scene, for a maximum of 100 points.
- Deliverables
    1) A description of the algorithm structure (free format)
    2) Music files output in mid format

## Development

1. [Audio source classification](#audio-source-classification) **(예찬)**
    - Classify audio into dialogue, BGM, and noise
2. Extract information about the classified BGM
    - Title, genre, age group, and mood of the BGM
3. Extract dialogue features **(지민)**
    - Extract tempo and emotion
4. [Video feature extraction](#video-feature-extraction)
    - Mood information of the video
5. Identify the relationship between features and BGM
    - Find the correlation between every feature and the BGM
6. Find datasets **(everyone)**
    - Find free-license songs
    - Look for labeled data
        - Data in which the BGM is annotated with information such as emotion and mood

## Brainstorming

- Matching the flow of the music to the flow of the dialogue
- Cases where several emotions appear within a single scene
- Whether the degree of scene change can be measured from the video
- Whether the tempo of the video can be extracted from the optical flow of objects in the video

## Questions for the Organizers

- Roughly how long are the web drama files?
- Are there bonus points for generating the music with AI?
- Is the dialogue provided as text?
- Is it acceptable not to release the data used in the project?
- Request a detailed description of the output data format

---

## Audio Source Classification

- [Music Source Separation on MUSDB18](https://paperswithcode.com/sota/music-source-separation-on-musdb18?p=open-unmix-a-reference-implementation-for)
- Demucs and Conv-Tasnet in facebook research
    - MIT License
    - [github site](https://github.com/facebookresearch/demucs)
    - [MusDB](https://sigsep.github.io/datasets/musdb.html)
    - [Paper](https://hal.archives-ouvertes.fr/hal-02379796/document)
    - [Result](https://ai.honu.io/papers/demucs/index.html)
    - my result
        - [myresult](https://github.com/fbdp1202/fbdp1202.github.io/tree/master/assets/wav/itw)
- Wave-U-Net-Pytorch
    - [github site](https://github.com/f90/Wave-U-Net-Pytorch)
    - Linux-based
    - CUDA 10.1
    - [Paper](https://arxiv.org/pdf/1806.03185.pdf)
- OpenUnmix
    - MIT License
    - [github site](https://github.com/sigsep/open-unmix-pytorch)

## Melody Extraction

- [github - Melody-extraction-with-melodic-segnet](https://github.com/bill317996/Melody-extraction-with-melodic-segnet)
    - MIT License
    - [Paper](https://arxiv.org/pdf/1810.12947.pdf)
    - pytorch 0.4.1
    - Example Result
      ![](https://i.imgur.com/7kOJ2zF.png)
- [github - melodyExtraction_JDC](https://github.com/keums/melodyExtraction_JDC)
    - MIT License
    - keras
    - Frequency values are extracted at 0.1-second intervals.
    - itw results
        - Example Result (test_audio_file.mp4)
          ![](https://i.imgur.com/eN2pBRn.png)
        - Data format
          ![](https://i.imgur.com/VLlgfvd.png)
        - melody (others) (000002lab2_melody.wav)
          ![](https://i.imgur.com/FuvJIkS.png)
        - Human voice (vocals: song lyrics + dialogue) (000002lab2_vocal.wav)
          ![](https://i.imgur.com/iv8MELt.png)
- [github - Vocal-Melody-Extraction](https://github.com/s603122001/Vocal-Melody-Extraction)
    - MIT License
    - keras

## Finding Music

- Shazam API
    - https://rapidapi.com/apidojo/api/shazam/endpoints
    - result
      ![](https://i.imgur.com/XFzdV3N.png)
      ![](https://i.imgur.com/RHojYQG.png)
- ACRCloud
    - [HomePage](https://www.acrcloud.com/?utm_source=chrome&utm_medium=extension)
    - [SDK Page](https://console-v2.acrcloud.com/avr?region=eu-west-1#/dashboard)
        - ![](https://i.imgur.com/TQ9gUtW.png)
        - ![](https://i.imgur.com/1XquPn5.png)
    - [github SDK](https://github.com/acrcloud/acrcloud_sdk_python#functions)
        - Keeps raising DLL errors on Windows
        - Works fine on Linux
          ![](https://i.imgur.com/aPzI7sb.png)

## Music Generation

- [~~Foley Music: Learning to Generate Music from Videos~~](https://arxiv.org/pdf/2007.10984.pdf)

## Video Feature Extraction

- [Multi-label Emotion Classification in Music Videos Using Ensembles of Audio and Video Features](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8995364)
- [github - maelfabien/Multimodal-Emotion-Recognition](https://github.com/maelfabien/Multimodal-Emotion-Recognition)

## Music Mood Classification

- [https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7973014](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7973014)
- [https://asistdl.onlinelibrary.wiley.com/doi/pdf/10.1002/asi.23649](https://asistdl.onlinelibrary.wiley.com/doi/pdf/10.1002/asi.23649)
- [https://www.aclweb.org/anthology/C16-1186.pdf](https://www.aclweb.org/anthology/C16-1186.pdf)
- [DB - http://millionsongdataset.com/](http://millionsongdataset.com/)
- [github - https://github.com/neokt/audio-music-mood-classification](https://github.com/neokt/audio-music-mood-classification)

## Video Generation Based on Video Mood

- [Paper - https://www.koreascience.or.kr/article/JAKO201928862523828.pdf](https://www.koreascience.or.kr/article/JAKO201928862523828.pdf)

## Music mood classification

- http://millionsongdataset.com/pages/contact-us/
- https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7973014
- https://asistdl.onlinelibrary.wiley.com/doi/pdf/10.1002/asi.23649
- https://res.mdpi.com/d_attachment/electronics/electronics-08-00164/article_deploy/electronics-08-00164.pdf
- https://www.researchgate.net/profile/Alexander_Schindler/publication/313895558_Parallel_Convolutional_Neural_Networks_for_Music_Genre_and_Mood_Classification/links/58aead3645851503be9203b8/Parallel-Convolutional-Neural-Networks-for-Music-Genre-and-Mood-Classification.pdf
- https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8279468
- https://openaccess.thecvf.com/content_cvpr_2017_workshops/w41/papers/Roy_DeepSpace_Mood-Based_Image_CVPR_2017_paper.pdf

## Music Recommendation

- [Melon music recommendation](https://tech.kakao.com/2020/04/29/kakaoarena-3rd-part1/)

## Finding Datasets

- [https://www.bensound.com/](https://www.bensound.com/)
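
Returning to the melody extraction notes above: melodyExtraction_JDC yields one frequency value per 0.1 seconds, while the deliverable must be a mid file, so the frequencies would have to be turned into note events at some point. The following is a minimal sketch of that intermediate step, not the project's actual pipeline. It assumes the extractor output has already been loaded as a plain list of frequencies in Hz, with 0 marking frames that have no pitch (an assumption, not the tool's documented format). The names `hz_to_midi` and `frames_to_notes` are hypothetical.

```python
import math

FRAME_SEC = 0.1  # frame interval noted for melodyExtraction_JDC above


def hz_to_midi(freq_hz):
    """Return the nearest MIDI note number, or None for an unpitched frame."""
    if freq_hz <= 0:
        return None
    # Equal-temperament mapping: A4 = 440 Hz = MIDI note 69.
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))


def frames_to_notes(freqs_hz, frame_sec=FRAME_SEC):
    """Merge consecutive frames of equal pitch into (note, start_sec, duration_sec)."""
    notes = []
    current, start = None, 0
    # A trailing 0.0 acts as a sentinel so the last note is flushed.
    for i, freq in enumerate(list(freqs_hz) + [0.0]):
        pitch = hz_to_midi(freq)
        if pitch != current:
            if current is not None:
                notes.append((current,
                              round(start * frame_sec, 3),
                              round((i - start) * frame_sec, 3)))
            current, start = pitch, i
    return notes


if __name__ == "__main__":
    # Hypothetical frames: 0.3 s near A4, 0.1 s of silence, 0.2 s of C5.
    frames = [440.0, 440.0, 441.0, 0.0, 523.25, 523.25]
    print(frames_to_notes(frames))  # [(69, 0.0, 0.3), (72, 0.4, 0.2)]
```

Rounding to the nearest semitone absorbs small pitch jitter (441 Hz still maps to note 69), which keeps a held note from being split into many short events. Writing the resulting events to a mid file would require a MIDI library and is left out of this sketch.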