# Intro to AI Final Project Midterm Report -- Acoustic Fingerprint Recognition Using Bark

## Abstract

This project addresses the problem that training an acoustic fingerprint recognition model usually requires a large quantity of data. We propose a method that needs only a small amount of data for model training. For the classification part, we combine a linear classifier with a CNN/DNN implementation.

## Dataset source

Our data are obtained from a human-voice generation model, Bark. Bark produces voice clips from text paragraphs and base voice prompts provided as input. In our implementation, we start from 10 base voices, man 1 to 5 and woman 1 to 5, which serve as the basis for further generation. Due to hardware limitations, we produce 30 clips for each speaker.

## Model building

The model is built as follows:

### Model

This is a customizable model, with Judge as its component and a parameter by which the user may designate the total number of fingerprints to be recognized. A softmax function is applied for comparison and yields the final matching distribution.

### Judge

This is a fully connected CNN/DNN sub-model and is not customizable. It consists of linear layers at the head and tail, with convolutional layers in between. Concatenating the parts mentioned above, we obtain a model that judges the similarity between voices, so the training procedure below can be applied.

### Training method

For the dataset, we read in the voices generated by Bark. In each training epoch, we randomly sample a clip of fixed duration (3 seconds in our case) as the input. The sampling process is implemented in the class VoiceDataset, and the duration may be modified. So far, we have trained the model to recognize acoustic fingerprints belonging to 7 people.
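The random fixed-duration sampling described in the training method can be sketched as follows. This is a minimal illustration, not the project's actual VoiceDataset code: the function name `sample_clip`, the zero-padding of short recordings, and the 24 kHz sample rate used in the usage note are assumptions.

```python
import numpy as np


def sample_clip(waveform: np.ndarray, sample_rate: int, clip_seconds: float,
                rng: np.random.Generator) -> np.ndarray:
    """Randomly crop a fixed-duration clip from a longer waveform.

    Hypothetical helper mirroring what a VoiceDataset __getitem__ might do.
    """
    clip_len = int(sample_rate * clip_seconds)
    if len(waveform) <= clip_len:
        # Assumed handling: pad short recordings with trailing zeros.
        return np.pad(waveform, (0, clip_len - len(waveform)))
    # Pick a random start so every 3-second window is equally likely.
    start = int(rng.integers(0, len(waveform) - clip_len + 1))
    return waveform[start:start + clip_len]
```

For example, cropping a 5-second waveform at 24 kHz with `clip_seconds=3.0` returns an array of 72,000 samples; re-sampling each epoch gives the model a different window of the same recording.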
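The softmax comparison step in the Model section, which turns per-fingerprint similarity scores into a matching distribution, can be illustrated as below. The function name and the example scores are made up for illustration; only the use of softmax comes from the report.

```python
import numpy as np


def matching_distribution(scores: np.ndarray) -> np.ndarray:
    """Convert raw similarity scores (one per enrolled fingerprint)
    into a probability distribution via softmax."""
    shifted = scores - scores.max()  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()
```

With hypothetical scores for three enrolled speakers, say `[2.0, 1.0, 0.1]`, the output sums to 1 and the most similar fingerprint receives the highest probability.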
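The Judge layout, linear layers at the head and tail with convolutional layers in between, can be sketched in PyTorch as follows. All layer widths, kernel sizes, and the input length here are assumptions; the report fixes only the overall head/conv/tail structure.

```python
import torch
import torch.nn as nn


class Judge(nn.Module):
    """Sketch of the Judge sub-model: linear head, conv middle, linear tail.

    Hypothetical sizes throughout; only the layer ordering follows the report.
    """

    def __init__(self, in_features: int = 72000, hidden: int = 256, embed: int = 64):
        super().__init__()
        self.head = nn.Linear(in_features, hidden)
        self.conv = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(8, 1, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.tail = nn.Linear(hidden, embed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features) raw waveform of the sampled clip
        h = torch.relu(self.head(x))   # (batch, hidden)
        h = self.conv(h.unsqueeze(1))  # treat hidden vector as a 1-channel signal
        return self.tail(h.squeeze(1))  # (batch, embed) similarity embedding
```

A wrapper Model could then compare these embeddings across the user-designated number of fingerprints before the softmax step.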