# InfQuery Module
**slides:** [https://hackmd.io/@sonik/ByukrOckq](https://hackmd.io/@sonik/ByukrOckq)
> [name=Hamilakis Nicolas]
---
**TODO**: brief description
---
## Experiment format
Experiments are described in a YAML file, which is easy for humans to edit and
easy for programs to parse. Schema validation guarantees that all information
is correctly entered before the experiment is run.
```yaml=
settings: ...
experiments:
- kwargs: ...
  input: ...
  output: ...
  models: ...
  task: ...
```
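The validation step could look roughly like the sketch below. It is not the module's actual schema: it assumes the configuration has already been loaded into a Python dictionary, that all five experiment keys are mandatory, and the name `validate_config` is hypothetical.

```python
# Assumption: every experiment must define all five keys shown above.
REQUIRED_EXPERIMENT_KEYS = {"kwargs", "input", "output", "models", "task"}


def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means the config passed."""
    errors = []
    if not isinstance(config.get("settings"), dict):
        errors.append("'settings' must be a mapping")
    experiments = config.get("experiments")
    if not isinstance(experiments, list) or not experiments:
        errors.append("'experiments' must be a non-empty list")
        return errors
    for index, experiment in enumerate(experiments):
        if not isinstance(experiment, dict):
            errors.append(f"experiments[{index}] must be a mapping")
            continue
        # Report each missing key separately so the file is easy to fix.
        for key in sorted(REQUIRED_EXPERIMENT_KEYS - experiment.keys()):
            errors.append(f"experiments[{index}] is missing '{key}'")
    return errors
```

Collecting every problem before reporting, rather than stopping at the first one, lets the author fix the whole file in one pass.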
---
### Settings
A set of global settings and parameters to take into account when running experiments.
```yaml=
settings:
  sbatch_mode: array
  debug: false
  hostname: oberon
```
---
### experiments
A list of experiments to run.
```yaml=
experiments:
- ...
- ...
```
---
#### kwargs
Settings and parameters of an individual experiment.
```yaml=
kwargs:
  quantize_options:
    max_size_seq: 10200
    batch_size: 16
    strict: false
    convert_audio_to: ".wav"
  pseudo_probabilities_options:
    batch_size: 256
    decoding_span_size: 8
    inner_batch_size: 64
```
---
#### input
A way to specify the experiment's input files (audio, CSV, etc.).
```yaml=
input:
  audio:
    ext: .wav
    method: glob
    root_dir: data/wav
  lexical:
    csv: gold.csv
    columns: 'id,filename,voice,frequency,word,phones,length,correct'
```
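As an illustration of how an `audio` entry with `method: glob` might be resolved into files, here is a minimal sketch. The function name `collect_audio_files` is hypothetical, and searching `root_dir` recursively is an assumption; the real loader may only look at the top level.

```python
from pathlib import Path


def collect_audio_files(audio_spec: dict) -> list:
    """Resolve an `audio` input entry into a sorted list of matching paths."""
    method = audio_spec.get("method")
    if method != "glob":
        raise ValueError(f"unsupported method: {method!r}")
    root = Path(audio_spec["root_dir"])
    # Assumption: sub-directories of root_dir are searched as well.
    return sorted(root.rglob(f"*{audio_spec['ext']}"))
```

With the example above, `collect_audio_files({"ext": ".wav", "method": "glob", "root_dir": "data/wav"})` would return every `.wav` file found under `data/wav`.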
---
#### output
A way to specify the formatting of output files.
```yaml=
output:
  quantization:
    keep: false
  pseudo_proba:
    keep: false
  lexical:
    keep: true
    action: merge_by_family
    format: csv
    by: length
```
---
#### models
A way to specify which models the experiment runs on (InfQuery Selector).
```yaml=
models:
  model_root: /data/infTrain/models
  selector:
    en:
    - "baby50"
    - "baby100"
    en|fr:
    - "baby200"
    fr:
    - "baby3600"
    de: "all"
```
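One possible reading of the `selector` mapping is sketched below: each key is treated as an opaque language tag, each value is either a list of model names or the string `"all"`, and `"all"` expands to every model known for that tag. The function `expand_selector` and the `available` catalog (which in practice might be built by listing `model_root`) are hypothetical and not part of the InfQuery API.

```python
def expand_selector(selector: dict, available: dict) -> list:
    """Expand a selector into (language, model_name) pairs, in config order."""
    pairs = []
    for language, wanted in selector.items():
        known = available.get(language, [])
        if wanted == "all":
            names = list(known)  # assumption: "all" means every known model
        elif isinstance(wanted, str):
            names = [wanted]  # tolerate a single name written without a list
        else:
            names = list(wanted)
        for name in names:
            if name not in known:
                raise KeyError(f"model {name!r} not found for {language!r}")
            pairs.append((language, name))
    return pairs
```

Failing on an unknown model name at expansion time surfaces typos before any job is submitted.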
---
#### task
A way to specify which task to run (quantization, pseudo_probability, abx,
lexical, etc.).
```yaml=
task: "quantize | proba | lexical"
```
---
## Data types
Each data type (e.g. audio, CSV files) is paired with a dataloader so that it
can be integrated into, and interact with, the tasks.
---
## Task Types
A task takes a model and a dataset (a list of data types) and produces a
dataset as output.
Task list:
- audio quantizations (CPC/Kmeans)
- pseudo probabilities (LSTM/BERT)
- comparative evaluation (lexical, semantic, etc.)
- distance evaluation
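Because a task maps a model and a dataset to a new dataset, tasks can in principle be chained. The sketch below illustrates that contract only; the type aliases, `run_tasks`, and the two toy tasks are hypothetical stand-ins, not the real quantization or probability code.

```python
from typing import Any, Callable, List

Dataset = List[Any]  # a dataset is modelled here as a plain list of items
Task = Callable[[Any, Dataset], Dataset]  # (model, dataset) -> dataset


def run_tasks(model: Any, dataset: Dataset, tasks: List[Task]) -> Dataset:
    """Apply each task in order, feeding every output into the next task."""
    for task in tasks:
        dataset = task(model, dataset)
    return dataset


def toy_quantize(model: Any, dataset: Dataset) -> Dataset:
    """Toy stand-in: round each value to the nearest integer unit."""
    return [round(value) for value in dataset]


def toy_score(model: Any, dataset: Dataset) -> Dataset:
    """Toy stand-in: scale each unit by a weight stored in the model."""
    return [model["weight"] * unit for unit in dataset]
```

For example, `run_tasks({"weight": 2}, [0.2, 1.7], [toy_quantize, toy_score])` first rounds the values and then scales them.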
---
## Task Run example

---
## Task parallelization
Various issues arise from adding parallelization:
- identifying bottlenecks (which sections to parallelize)
- memory management
- process management
- GPU allocation
---
Parallelization happens at the model level, optionally via an sbatch array or
another resource manager: one process (sbatch or other) is created for each
model, and the task sequence within each process runs sequentially.
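With a Slurm job array, one way to realize "one process per model" is to let each array task pick its model from its array index. The sketch below assumes the list of selected models is known in every process; the helper names are hypothetical, while `--array` and `SLURM_ARRAY_TASK_ID` are standard Slurm features.

```python
import os


def sbatch_array_option(models: list) -> str:
    """Build the sbatch option that starts one array task per model."""
    if not models:
        raise ValueError("at least one model is required")
    return f"--array=0-{len(models) - 1}"


def model_for_this_process(models: list, environ=os.environ) -> str:
    """Pick the model assigned to the current Slurm array task."""
    # Slurm sets SLURM_ARRAY_TASK_ID in the environment of each array task.
    index = int(environ["SLURM_ARRAY_TASK_ID"])
    return models[index]
```

Each array task would then run its task sequence for that single model, one task after the other.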
---