# Report Exercise 3
David Penz (11703497), Giulio Pace (11835706), Moritz Leidinger (11722966)
## Contents
- [How to run it](#how-to-run-it)
- [Architecture](#architecture)
- [Segmentation](#segmentation)
- [Scoring](#scoring)
- [Ranking](#ranking)
- [Possible improvements](#possible-improvements)
- [Final considerations](#final-considerations)
- [Problems we had](#problems-we-had)
- [Feedback](#feedback)
# How to run it
0. (Use Python 3.7)
1. Clone contents of `Exercise 3` folder from [this repo](https://github.com/tuwien-information-retrieval/air-2019-19).
2. Download the zip from [this GoogleDrive link](https://drive.google.com/file/d/1asOpgf8QWbVfXqoxoI9wSlTtxKmM05Sg/view?usp=sharing), extract it and put the resulting `ethn_all` folder into `Exercise3/data`.
3. Run `pip install -r requirements.txt` from inside `Exercise3` to install the required modules.
4. Move to the `src` folder with `cd src`.
5. Run `python main.py` to start the program.
6. A GUI will open. Select a file to use as a query and/or select to rebuild the index from the files in `Exercise3/data/ethn_all`. Rebuilding the index takes about 20 minutes.
7. Click `start` to run the query.
8. After processing, the query results will be shown in a table.
On some Linux distributions installing the `Gooey` module can cause problems. Pointers on how the issue can be fixed are given in [this](https://github.com/chriskiehl/Gooey/issues/243) and [this](https://github.com/wxWidgets/Phoenix/issues/465) GitHub issue. Specifically, the problem is caused by an error when installing `wxPython`.
In any case, the program can be run from the command line by running `python main.py --ignore-gooey -q <path_to_query_file>` in the `src` folder. To also rebuild the index from the files in `Exercise3/data/ethn_all`, pass the `-r` flag.
# Architecture
## Segmentation
For the segmentation we split every file into 0.5-second segments and then use the pretrained model from SeFiRe to label them. We exclude music files that consist mostly of spoken text from the database (which is provided for download in the ["How to run it"](#how-to-run-it) section).
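Below is a minimal sketch of the exclusion step, assuming the per-segment labels (one every 0.5 seconds) have already been produced by the SeFiRe model; the label names, the 50% threshold and the file paths are illustrative assumptions, not taken from the SeFiRe documentation.
```python
def is_mostly_speech(segment_labels, threshold=0.5):
    # True if more than `threshold` of the 0.5-second segments were labelled as speech
    speech = sum(1 for label in segment_labels if label == "speech")
    return speech / len(segment_labels) > threshold

# Hypothetical per-file segment labels, as produced by the SeFiRe model
labels_per_file = {
    "../data/ethn_all/track_01.mp3": ["music", "music", "speech", "music"],
    "../data/ethn_all/track_02.mp3": ["speech", "speech", "speech", "music"],
}
# Files that are mostly spoken text are dropped before indexing
kept = [path for path, labels in labels_per_file.items() if not is_mostly_speech(labels)]
```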
## Scoring
For the scoring we create an index that stores the features of the songs in our database. There are two indices for the features:
- one is scaled and is used when the query file is already present in the index; in this case, no feature extraction needs to be done anymore.
- the second index is not scaled and is needed for new query files. After extracting the query's features, they are appended to the index, which is subsequently scaled.
The index is a DataFrame whose row index is the path to a file in the database; every row contains the features extracted from that file with the librosa library. The features include: chromagram, spectral centroid, spectral bandwidth, spectral rolloff, zero crossing rate and mel-frequency cepstral coefficients (MFCCs).
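The following is a rough sketch of how such an index can be built with librosa; the exact feature set, the aggregation into 25 values and the file paths are assumptions, not the actual code in `src`.
```python
import librosa
import numpy as np
import pandas as pd

def extract_features(path):
    # Load the audio and summarise each librosa feature by its mean over time
    y, sr = librosa.load(path)
    features = [
        np.mean(librosa.feature.chroma_stft(y=y, sr=sr)),
        np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)),
        np.mean(librosa.feature.spectral_bandwidth(y=y, sr=sr)),
        np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr)),
        np.mean(librosa.feature.zero_crossing_rate(y)),
    ]
    # One mean value per MFCC coefficient (20 by default), 25 features in total
    features.extend(np.mean(librosa.feature.mfcc(y=y, sr=sr), axis=1))
    return features

# Build the unscaled index: file paths as row index, one feature per column
paths = ["../data/ethn_all/track_01.mp3", "../data/ethn_all/track_02.mp3"]  # hypothetical
index = pd.DataFrame([extract_features(p) for p in paths], index=paths)
```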
When a query is posed, we extract these features from the query file and calculate the euclidean distance between every feature of the query and the corresponding feature of every file in the index. We sum the distances per file, which gives us the final score for each file. These scores are stored in a Series indexed by file path.
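A rough sketch of this step is shown below; it reuses the hypothetical `extract_features` helper from above and scikit-learn's `MinMaxScaler` for the scaling, both of which are assumptions rather than the exact code in `src`.
```python
from sklearn.preprocessing import MinMaxScaler

def score(query_path, index):
    # Append the query's features to the unscaled index, then scale every
    # feature column to [0, 1] so each feature contributes at most 1 to the score
    full = index.copy()
    full.loc[query_path] = extract_features(query_path)
    scaled = pd.DataFrame(MinMaxScaler().fit_transform(full),
                          index=full.index, columns=full.columns)
    query = scaled.loc[query_path]
    # Per-feature distance to the query, summed per file (0 = identical, 25 = worst case)
    return (scaled.drop(query_path) - query).abs().sum(axis=1)
```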
## Ranking
To rank the elements, we sort the Series with the scores in ascending order. The element with the lowest score is the one with the closest similarity to the query. We then return the first 10 elements.
If the query is an exact copy of an element in the database, that element scores 0. This is never shown, however, as the file is filtered out of the index first. The worst possible score is 25, since there are 25 features in total. Fairly different songs usually score around 10; two songs that score less than 3 points are likely very similar.
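For illustration, the ranking step is then a one-liner over the score Series; the `score` helper and the query path are the assumptions introduced above.
```python
# Lower score means more similar; keep the ten closest files
scores = score("../data/ethn_all/query.mp3", index)  # hypothetical query path
top_10 = scores.sort_values().head(10)
print(top_10)
```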
# Possible improvements
The feature index could be improved, as the response time for a query is not particularly fast (usually under 10 s, but up to 20-30 s for a long query). Additionally, the scoring could be improved by weighting the features when computing the distances, to give more influence to the more important ones. Finally, the GUI could be polished and improved.
# Final considerations
The program works as intended. As discussed in the previous chapter, there are some possible improvements that can be made, but overall we are pretty content with the final result.
For a bit of testing, we created an index of songs that we know well and that belong to very different musical genres. On this index, we performed some queries. Generally, all results produced by the scoring and ranking were reasonable and in line with our expectations.
# Problems we had
It is not documented how the SeFiRe tool decides which label to assign to each segment (the library only returns the prediction for each segment). We had to manually check several tracks and guess the meaning of the labels.
Additionally, it was hard to understand the meaning of each specific feature, which made it almost impossible to predict or evaluate the results while writing the code, before the whole project was running.
# Feedback
The formulation of the exercise description was quite open, which made it a bit unclear for us and left a lot of room for interpretation. That said, the topic of the project was extremely interesting for all of us, since we rarely get to work with anything other than text files or numeric values!