# Label Studio review
## Initial questions
- How to deploy the app?
- Is the app communicating externally?
- Is there any data collection from the software producer?
- How to add a model for pre-annotation? Where does it take place in the code?
## Modules
There are 2 main modules:
- label-studio : https://github.com/heartexlabs/label-studio
- label-studio-ml-backend : https://github.com/heartexlabs/label-studio-ml-backend
but also:
- label-studio-converter : https://github.com/heartexlabs/label-studio-converter
  - it helps convert annotated data exports from label-studio to the desired machine learning data format (e.g. YOLO, COCO)
  - it might be useful later in the project
### Label-studio module
It is a data annotation tool containing:
- backend
  - Django and Flask
  - server of the application
  - stores data
  - manages user accounts
  - processes requests
- frontend
  - ReactJS
  - user interface:
    - managing labeling
    - creating projects
    - importing data
    - etc.
- database
  - PostgreSQL
  - stores data
  - low-level CRUD operations
  - transparent to the user
### Label-studio-ml-backend module
This is a tool for deploying and training machine learning models inside label-studio.
- You can import models for a lot of tasks: object detection, text classification, image classification, etc.
- The model is wrapped in a web server API.
  - A web service is an application that exposes a standardized programming interface (API) so other applications can communicate with it over the network; here label-studio communicates with label-studio-ml.
- Model wrapping is done with a Python initialization script containing predict() and fit() functions.
  - These functions handle the conversion between label-studio's format and the model framework's (e.g. PyTorch, TensorFlow).
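As a minimal sketch of that contract (illustrative only: a real backend subclasses `label_studio_ml.model.LabelStudioMLBase`, and the class name, label names and scoring below are made up), the wrapper exposes the two functions like this:

```python
# Sketch of the predict()/fit() contract of a label-studio-ml wrapper.
# A real backend inherits from label_studio_ml.model.LabelStudioMLBase;
# DummyTextClassifier and its labels are hypothetical stand-ins.
class DummyTextClassifier:
    """Wraps a model behind the two functions label-studio-ml expects."""

    LABELS = ["Positive", "Negative"]

    def predict(self, tasks, **kwargs):
        """Convert each label-studio task into a prediction payload."""
        predictions = []
        for task in tasks:
            text = task["data"]["text"]
            # Stand-in for a real model call (e.g. PyTorch/TensorFlow inference).
            label = self.LABELS[0] if "good" in text else self.LABELS[1]
            predictions.append({
                "result": [{
                    "from_name": "sentiment",  # must match the labeling config
                    "to_name": "text",
                    "type": "choices",
                    "value": {"choices": [label]},
                }],
                "score": 0.5,  # model confidence attached to the prediction
            })
        return predictions

    def fit(self, annotations, **kwargs):
        """Convert completed annotations back into training data."""
        train_set = [(a["data"]["text"], a["label"]) for a in annotations]
        # ...train the underlying model on train_set here...
        return {"n_samples": len(train_set)}
```

The conversion work lives entirely in these two functions: predict() maps the framework's output into label-studio's result format, and fit() maps annotations back into whatever the framework needs.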
## How to deploy?
### Label-studio
```bash
# clone the repository
git clone https://github.com/heartexlabs/label-studio.git
cd label-studio
# install all package dependencies
pip install -e .
# run database migrations
python label_studio/manage.py migrate
python label_studio/manage.py collectstatic
# start the server in development mode at http://localhost:8080
python label_studio/manage.py runserver
```
### Label-studio-ml-backend
```bash
git clone https://github.com/heartexlabs/label-studio-ml-backend
cd label-studio-ml-backend
pip install -U -e .
pip install -r label_studio_ml/examples/requirements.txt
# this command creates a folder my_ml_backend using the simple_text_classifier.py script as entry point.
# the script contains functions to train, predict and evaluate the model.
# the init argument creates the files necessary to launch the model as a web service.
label-studio-ml init my_ml_backend --script label_studio_ml/examples/simple_text_classifier/simple_text_classifier.py
# launch the service, on localhost:9090 by default
label-studio-ml start my_ml_backend
```
### Linking both servers
A prerequisite is to have a project created.
- go to "Settings" in the project interface.
- click on the "Machine Learning" tab.
- add a model.
- give it a name and a description.
- in the URL field, enter the ML-backend server address.
  - you can find it in the command-line output
  - or in the config file of the model
  - it is localhost:9090 by default
### Notes
This deployment is for development only: concretely, there is only 1 worker with 8 threads, and the application cannot handle the workload of several users at the same time.
This setup is currently just a quick fix to deploy the application for personal use.
## Wrapping a model
There is a list of script examples available in label-studio-ml-backend; they cover tasks like OCR, text classification or image object detection.
## Active Learning
Active learning is a machine learning approach in which the algorithm selects the most informative examples from a large pool of unlabeled data and presents them to a human expert for labeling, in order to improve the algorithm's performance.
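The selection step can be sketched as uncertainty sampling: pick the unlabeled examples whose predicted probability is closest to 0.5 (the function name and scores below are made up for illustration):

```python
def select_most_informative(scores, k=2):
    """Return the indices of the k unlabeled samples the model is least sure about.

    `scores` maps sample index -> predicted probability of the positive class;
    uncertainty is highest when that probability is near 0.5.
    """
    by_uncertainty = sorted(scores, key=lambda i: abs(scores[i] - 0.5))
    return by_uncertainty[:k]

# Hypothetical model outputs on an unlabeled pool.
pool = {0: 0.97, 1: 0.52, 2: 0.10, 3: 0.45, 4: 0.88}
print(select_most_informative(pool))  # → [1, 3], the two samples closest to 0.5
```

These selected samples are the ones sent to the human annotator; the confident ones (0.97, 0.10) are left for the model to pre-annotate.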

### Learning loop
The learning loop allows training the model automatically after a batch of annotations, in order to pre-annotate the data more quickly and more accurately.

Sequence diagram:
```
+--------------+                 +------------------+
| Label Studio |                 |  Label Studio ML |
+--------------+                 +------------------+
       |                                   |
       | Create Labeling Project           |
       | Create ML Backend                 |
       | Link ML Backend                   |
       |---------------------------------->|
       |                       Train Model |
       |<----------------------------------|
       | Predict with Model                |
       |---------------------------------->|
       |                      View Results |
       |<----------------------------------|
       | Update Model                      |
       |---------------------------------->|
```
## Community vs Enterprise versions
See the comparison table here: https://labelstud.io/guide/label_studio_compare.html
The main difference is that the paid version offers technical support from Heartex, the company behind this software. The paid version also offers more features to manage teams of annotators (performance statistics, accounts, etc.). These functionalities are not useful for our purpose.
### Useful functionalities we don't have
- learning loop:
  - the model does not retrain with each new annotation
  - you have to retrain it and reload the backend with the new model
- sorting data by score:
  - this is very useful to correct the most erroneous annotations first and increase the performance of the model more quickly.
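For reference, that score-sorting feature amounts to ordering tasks by the confidence the ml-backend attached to its predictions. A minimal sketch, assuming each task carries a `score` field from the prediction payload (the helper name is ours):

```python
def tasks_by_ascending_confidence(tasks):
    """Order tasks so the least confident predictions come first for review.

    Assumes each task carries the 'score' attached by the ml-backend
    prediction; tasks without a score are reviewed last.
    """
    return sorted(tasks, key=lambda t: t.get("score", float("inf")))

# Hypothetical tasks with model confidences.
queue = tasks_by_ascending_confidence([
    {"id": 1, "score": 0.91},
    {"id": 2, "score": 0.12},
    {"id": 3},
])
print([t["id"] for t in queue])  # → [2, 1, 3]: least confident first
```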
## External requests
### According to label-studio
"*Label Studio collects anonymous usage statistics about the number of page visits and data types being used in labeling configurations that you set up. No sensitive information is included in the information we collect. The information we collect helps us improve the experience of labeling data in Label Studio and helps us plan future data types and labeling configurations to support.*"
Link here: https://labelstud.io/guide/get_started.html#:~:text=Label%20Studio%20is%20an%20open,exploring%20multiple%20types%20of%20data.
### According to my tests
#### 4 tests
- Running the apps without an internet connection.
- Stack trace from the command line.
- Network tab in the Chrome browser inspector.
- Wireshark packet tracing.
  - This is very low-level information, difficult to understand and filter.
  - I need to investigate further to get the desired information.
#### Conclusion
No external requests regarding the **data**.
The apps run fine without internet access.
The web dev inspector and the stack trace show that connections are attempted for JavaScript libraries.
web dev inspector:

stack trace:

Failure to load this external content does not interrupt the software or cause any disturbance.
**but**
I recommend working offline for now because unsolicited requests remain possible (after a certain time? when loading the app? when shutting down?). Some requests may not be logged in the stack trace nor in the web dev inspector.
The explanation above is not sufficient to understand the application because it doesn't capture the details of the deployment; one should try the 2 following examples to get deeper into the code.
## Project example : annotating fish videos
This example project doesn't involve the ml-backend to pre-annotate data. It is a 'hack' to accelerate the annotation of images for an object detection task, using the very good video annotation tool.
Indeed, why annotate images one by one when they are sequential? We should rather use video annotation.
#### Video annotation tool
- There is a good demo here : https://www.youtube.com/watch?v=Grp6UB_zB0Y&t=1872s
- The principle is to annotate at different points of the video called 'keyframes'.
  - The frames in between will be automatically annotated in a coherent way.
  - Label-studio implements a linear interpolation of bboxes from one keyframe to another.
- Video formats accepted by label-studio: mpeg4/H.264, webp, webm.
#### Data
- https://alzayats.github.io/DeepFish/
  - A dataset consisting of a series of images of fish in natural environments.
- I made a quick jpg-to-mp4 conversion script below. It should be reworked to parametrize the output name; currently it is just 'out.mp4'.
```bash
#!/bin/bash
# Set the path to the first-level directory
FIRST_LEVEL="/Users/benjamin/Developments/label_studio/label-studio-ml-backend-master/fish_dataset/DeepFish/Classification"
# Iterate through each second-level directory
for dir in "${FIRST_LEVEL}"/*/; do
    if [ -d "$dir" ]; then
        echo "Processing ${dir}"
        # Iterate through each third-level directory
        for dir2 in "${dir}"*/; do
            if [ -d "$dir2" ]; then
                echo "Processing ${dir2}"
                # Convert all jpg files to an mpeg-4/h.264 video at 25 fps with ffmpeg
                ffmpeg -framerate 25 -pattern_type glob -i "${dir2}*.jpg" -c:v libx264 -pix_fmt yuv420p "${dir2}out.mp4"
            fi
        done
    fi
done
```
- Be careful to use a web browser compatible with the chosen video format: https://caniuse.com/?search=video%20format
#### Launch label-studio
```bash
python label_studio/manage.py runserver
# or
label-studio start
# now reach http://localhost:8080 in your web browser
```
#### Sign in
- Create an account.
- The account is stored locally in the PostgreSQL database.
- Credentials, project information and data are stored locally.
#### Project creation
- click on the 'Create' button
- give the project a name
- data import: import the recently made video
- labeling setup:
  - open 'Templates'
  - choose 'Videos'
  - choose 'Video Object Tracking'
  - add labels: write the labels separated by line breaks
  - switch to 'Code' mode and, in the 'video' tag, set the 'framerate' attribute to the same value as the one chosen for the ffmpeg conversion (25 for example).
  - save
#### Annotation
- 'Label all tasks' button: pops up unannotated videos in random order.
- Otherwise, choose manually which videos to annotate.
- Define 'keyframes', i.e. frames on which we place the bboxes and the label.
- From these keyframes label-studio does a linear interpolation.
- It results in bboxes which 'follow' the object from frame to frame.
#### Results
The results are exported as JSON; here is an example:
```json
[
  {
    "id": 47,
    "annotations": [
      {
        "id": 10,
        "completed_by": 2,
        "result": [
          {
            "value": {
              "framesCount": 1142,
              "duration": 47.541667,
              "sequence": [
                {
                  "frame": 1,
                  "enabled": true,
                  "rotation": 0,
                  "x": 16.587677725118482,
                  "y": 9.557661927330173,
                  "width": 3.554502369668245,
                  "height": 8.636124275934703,
                  "time": 0.041666666666666664
                },
                {
                  "x": 16.113744075829384,
                  "y": 10.400210637177457,
                  "width": 3.554502369668245,
                  "height": 8.636124275934703,
                  "rotation": 0,
                  "frame": 2,
                  "enabled": true,
                  "time": 0.08333333333333333
                },
                ...
              ],
              "labels": [
                "poisson1"
              ]
            },
            "id": "0TUS4LF4hB",
            "from_name": "box",
            "to_name": "video",
            "type": "videorectangle",
            "origin": "manual"
          }
        ],
        ...
      }
    ],
    "file_upload": "e1917952-out.mp4",
    "drafts": [],
    "predictions": [],
    "data": {
      "video": "\/data\/upload\/9\/e1917952-out.mp4"
    },
    "meta": {},
    "created_at": "2023-05-02T14:55:46.864882Z",
    "updated_at": "2023-05-02T15:03:10.980673Z",
    "inner_id": 1,
    "total_annotations": 1,
    "cancelled_annotations": 0,
    "total_predictions": 0,
    "comment_count": 0,
    "unresolved_comment_count": 0,
    "last_comment_updated_at": null,
    "project": 9,
    "updated_by": 2,
    "comment_authors": []
  }
]
```
- We can find the coordinates of the bboxes (x, y, w, h) as well as the label of each bbox, but **only for keyframes**.
- In order to label all the frames it would be interesting to write a Python script with:
  - Input: the JSON
  - Output: the JSON augmented with bbox and label data for each frame.
- Algorithm to label all frames:
  - take two neighboring keyframes: keyframe n and keyframe n+x
  - for each one, take the values x, y, w, h
  - measure the difference between the two keyframes for each value.
  - divide by the number of intermediate frames: we obtain an alpha value.
  - increment each value by its alpha to reconstruct the intermediate frames.
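The steps above can be sketched in Python (field names follow the `sequence` entries of the export JSON; the helper name is ours):

```python
def interpolate_keyframes(kf1, kf2):
    """Linearly interpolate bbox fields between two keyframes of the export.

    kf1/kf2 are dicts shaped like the 'sequence' entries of the JSON export
    (with 'frame', 'x', 'y', 'width', 'height'); returns the reconstructed
    intermediate frames.
    """
    n_gap = kf2["frame"] - kf1["frame"]  # number of steps between keyframes
    fields = ("x", "y", "width", "height")
    # Per-frame increment (the 'alpha' value) for each coordinate.
    alpha = {f: (kf2[f] - kf1[f]) / n_gap for f in fields}
    frames = []
    for step in range(1, n_gap):
        frame = {f: kf1[f] + alpha[f] * step for f in fields}
        frame["frame"] = kf1["frame"] + step
        frames.append(frame)
    return frames

# Two keyframes 4 frames apart: 3 intermediate frames are reconstructed.
k1 = {"frame": 1, "x": 0.0, "y": 0.0, "width": 10.0, "height": 10.0}
k2 = {"frame": 5, "x": 8.0, "y": 4.0, "width": 10.0, "height": 10.0}
print(interpolate_keyframes(k1, k2))
```

A full script would apply this to each consecutive pair of keyframes in the `sequence` array and copy the `labels` over to every reconstructed frame.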
## Example project: image object detection
- Install MMDetection: https://mmdetection.readthedocs.io/en/v1.2.0/INSTALL.html
- Run in the CLI (adapt the path to yours):
```bash
label-studio-ml init coco-detector --from /Users/benjamin/Developments/label_studio/label-studio-ml-backend-master/label_studio_ml/examples/mmdetection-3/mmdetection.py
```
- Download the model weights here: https://github.com/open-mmlab/mmdetection/tree/main/configs/faster_rcnn
- Save them for example in the mmdetection folder:
  /Users/benjamin/Developments/label_studio/mmdetection/mmdetection/checkpoint/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
- The config file is in the recently installed mmdetection folder: mmdetection/mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py
- Now you can launch the ml-backend server from the CLI:
```bash
label-studio-ml start coco-detector --with \
config_file=/Users/benjamin/Developments/label_studio/mmdetection/mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
checkpoint_file=/Users/benjamin/Developments/label_studio/mmdetection/mmdetection/checkpoint/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
device=cpu \
--port 8003
```
- Download example data:
```bash
cd path/to/coco-detector
mkdir data && cd data
wget https://download.openmmlab.com/mmyolo/data/cat_dataset.zip && unzip cat_dataset.zip
```
- Launch label-studio:
```bash
label-studio start
```
- Create a project
- Choose template in Computer Vision > Object Detection with Bounding Boxes
- In the labeling configuration, paste all the COCO dataset labels (the Faster R-CNN model you downloaded was trained on them):
airplane
apple
backpack
banana
baseball_bat
baseball_glove
bear
bed
bench
bicycle
bird
boat
book
bottle
bowl
broccoli
bus
cake
car
carrot
cat
cell_phone
chair
clock
couch
cow
cup
dining_table
dog
donut
elephant
fire_hydrant
fork
frisbee
giraffe
hair_drier
handbag
horse
hot_dog
keyboard
kite
knife
laptop
microwave
motorcycle
mouse
orange
oven
parking_meter
person
pizza
potted_plant
refrigerator
remote
sandwich
scissors
sheep
sink
skateboard
skis
snowboard
spoon
sports_ball
stop_sign
suitcase
surfboard
teddy_bear
tennis_racket
tie
toaster
toilet
toothbrush
traffic_light
train
truck
tv
umbrella
vase
wine_glass
zebra
- import the example data
- connect label-studio with the ml-backend:
  - settings
  - add model
  - enter the backend URL
  - save
- Now you can run the annotation loop:
  - click on 'Label All Tasks': the model will predict bboxes and labels on all images.
  - correct the annotations.
  - re-train the model.
  - relaunch the ml-backend.
  - etc.
  - until reaching good-quality data labelling.
## Possible future work
- Implementing Kilian's model.
- External requests:
  - Check the external requests more deeply with Wireshark.
  - Identify the code responsible.
  - Try to remove it without breaking the app.
  - Potentially time-consuming.
- Implement a learning loop.
- Add a score-sorting functionality.
- Adapt the application for production.
- Deploy on a server.