# Intro to AI Final Project Midterm Report -- Acoustic Fingerprint Recognition Using Bark
## Abstract
This project addresses the problem that training an acoustic fingerprint recognition model well usually requires a large amount of data. We propose a method that trains such a model with only a small amount of data. For the classification stage, we combine a linear classifier with a CNN/DNN.
## Dataset source
The data we use are generated by Bark, a text-to-speech model that synthesizes voice clips from text prompts and a chosen base voice. We start from 10 base voices, five male (man 1 to 5) and five female (woman 1 to 5), which serve as the basis for further generation. Due to hardware limitations, we generate 30 clips for each speaker.
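The generation loop above (10 base speakers, 30 clips each) could be sketched as follows. This is a minimal illustration, not the project's actual script: the `bark` package API (`preload_models`, `generate_audio`, `SAMPLE_RATE`), the speaker-preset names, the prompt text, and the `data/<speaker>/clip_NNN.wav` layout are all assumptions; the report only specifies the 10-speaker, 30-clip setup.

```python
"""Sketch of Bark-based dataset generation (assumed API and file layout)."""

try:
    # The bark package may not be installed; fall back to a stub so the
    # bookkeeping below still runs. The preset names are illustrative.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    HAVE_BARK = True
except ImportError:
    SAMPLE_RATE = 24_000
    HAVE_BARK = False

# 10 base voices: man 1-5 and woman 1-5, as in the report.
SPEAKERS = [f"man_{k}" for k in range(1, 6)] + [f"woman_{k}" for k in range(1, 6)]
CLIPS_PER_SPEAKER = 30  # limited by hardware, as noted above


def clip_paths(speakers, clips_per_speaker, out_dir="data"):
    """Enumerate output filenames like data/man_1/clip_003.wav."""
    return [
        f"{out_dir}/{spk}/clip_{i:03d}.wav"
        for spk in speakers
        for i in range(clips_per_speaker)
    ]


def generate_all(prompt="Hello, this is a sample sentence for voice cloning."):
    """Generate every clip with Bark (only runs when bark is available)."""
    if not HAVE_BARK:
        return
    preload_models()
    for spk in SPEAKERS:
        for i in range(CLIPS_PER_SPEAKER):
            # history_prompt selects the base voice; preset name is assumed.
            audio = generate_audio(prompt, history_prompt=f"v2/en_speaker_{i % 10}")
            # ...write `audio` (sampled at SAMPLE_RATE) to the path above...
```

With 10 speakers and 30 clips each, `clip_paths(SPEAKERS, CLIPS_PER_SPEAKER)` enumerates 300 files in total.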
## Model building
The model is built as follows:
### Model
This is a customizable model, with Judge as its component and the total number of fingerprints to be recognized as a user-designated parameter. A softmax function is applied to the Judge's scores to produce the final matching distribution.
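The wrapper described above can be sketched in plain Python: one Judge score per enrolled fingerprint, normalized by a softmax into a matching distribution. The `Model` class and the idea of passing the Judge as a callable are illustrative assumptions; the report only states that the fingerprint count is a parameter and that softmax produces the final distribution.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


class Model:
    """Customizable wrapper: Judge is a component, the fingerprint count
    is the user-designated parameter (names are illustrative)."""

    def __init__(self, judge, num_fingerprints):
        self.judge = judge  # callable: (clip, reference) -> similarity score
        self.num_fingerprints = num_fingerprints

    def forward(self, clip, enrolled):
        """Score the clip against every enrolled fingerprint, then softmax."""
        assert len(enrolled) == self.num_fingerprints
        scores = [self.judge(clip, ref) for ref in enrolled]
        return softmax(scores)
```

A dummy Judge (e.g. a dot product between embeddings) is enough to exercise the wrapper; the real Judge is the CNN/DNN sub-model described next.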
### Judge
This is a CNN/DNN sub-model combining fully connected and convolutional layers, and it is not customizable. It consists of a linear layer at the head and tail, with convolutional layers in between.
Concatenating the parts above yields a model that judges the similarity between voices, on which the training procedure below operates.
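The Judge's linear-head / convolutional-middle / linear-tail layout could be sketched in PyTorch as below. The layer sizes, the single-channel `Conv1d` stack, and the choice to compare two embeddings via their absolute difference are all assumptions for illustration; the report does not specify dimensions or how the two inputs are combined.

```python
import torch
import torch.nn as nn


class Judge(nn.Module):
    """Sketch of the non-customizable Judge sub-model: linear head and tail,
    convolutional layers in between (sizes are assumed)."""

    def __init__(self, in_dim=128, hidden=64):
        super().__init__()
        self.head = nn.Linear(in_dim, hidden)  # linear layer at the head
        self.conv = nn.Sequential(             # convolutional layers within
            nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(8, 1, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.tail = nn.Linear(hidden, 1)       # linear layer at the tail

    def forward(self, a, b):
        # Assumed pairing: compare two fixed-length voice embeddings
        # via their element-wise absolute difference.
        x = self.head(torch.abs(a - b))        # (batch, hidden)
        x = self.conv(x.unsqueeze(1)).squeeze(1)  # run conv over 1 channel
        return self.tail(x).squeeze(-1)        # one similarity score per pair
```

A forward pass on a batch of embedding pairs returns one scalar score per pair, which the wrapper above feeds into the softmax.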
### Training method
For the dataset, we read in the voices generated by Bark. In each training epoch, we randomly sample a window of fixed duration (3 seconds in our case) from each clip as the input. The sampling is implemented in the `VoiceDataset` class, and the duration can be modified.
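The random fixed-duration sampling could look like the sketch below. The function name, the zero-padding of clips shorter than the window, and the list-of-samples waveform representation are assumptions; the report only states that `VoiceDataset` samples a configurable duration (3 seconds here) at random.

```python
import random


def random_crop(waveform, sample_rate, duration_sec=3.0):
    """Randomly sample a fixed-duration window from a clip, as the
    VoiceDataset sampling step does (duration is configurable)."""
    n = int(duration_sec * sample_rate)
    if len(waveform) <= n:
        # Assumed behavior: zero-pad clips shorter than the window.
        return list(waveform) + [0.0] * (n - len(waveform))
    start = random.randrange(len(waveform) - n + 1)
    return waveform[start:start + n]
```

Because the crop position is re-drawn every epoch, each clip yields many distinct 3-second training inputs, which is what lets a small dataset go a long way.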
So far, we have trained the model to recognize the acoustic fingerprints of 7 people.