# Labeling Tool Product Design
The labeling tool is used to manage models and data for machine learning classification.
## Terminology
- **Model**: a model is a machine learning algorithm paired with a set of parameters. These parameters are usually learned/trained from a labeled dataset. Given an input, the model can predict the corresponding output even if the input does not appear in the labeled training dataset.
- **Conflicted Example**: a conflicted example is an example whose labeled output is not the same as the predicted output. This may be due to either a model error or a labeling error. A conflicted example can be resolved when 1) its label is corrected and becomes the same as the prediction; or 2) the model is improved and the prediction becomes the same as the label.
- **Unlabeled Dataset**: an unlabeled dataset is a set of examples. Each example only has an input (e.g., a sentence), but does not have an output.
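To make the terminology concrete, here is a minimal sketch of how an example and the conflict check could be represented in Python; the field and function names are illustrative assumptions, not part of this spec.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Example:
    """An example always has an input (e.g., a sentence); labeled examples also have an output."""
    input_text: str
    label: Optional[str] = None       # set once an editor has labeled the example
    prediction: Optional[str] = None  # set once a model has predicted on the example

def is_conflicted(example: Example) -> bool:
    """A conflicted example has both a label and a prediction, and they disagree.
    It is resolved by correcting the label or by improving the model so the two agree."""
    return (
        example.label is not None
        and example.prediction is not None
        and example.label != example.prediction
    )
```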
## Editor Account
- An editor* has an account
- An editor is able to register to create an account
- An editor is able to log in to or log out of the account
\* Editor: an editor is a person (engineer or non-engineer) who can log in to the labeling tool system and do data and model management work;
## Editor User Profile
- An editor can find the list of campaigns* she is working on
- An editor can find her daily work progress: how many examples** she labels each day (see the sketch at the end of this section)
- An editor can find other editors' work progress (TBD)
\* Campaign: a campaign is associated with a list of categories the editor wants to label. One campaign can have one or more battles.
\** Example: an example contains at least an input (e.g., a sentence). An unlabeled example has only an input; a labeled example has both an input and an output (e.g., the category of the sentence).
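A small sketch of how the daily progress number could be computed, assuming a hypothetical log of labeling events; the record shape and function name are assumptions.

```python
from collections import Counter
from datetime import date

def daily_progress(labeling_log: list[tuple[str, date]], editor: str) -> Counter:
    """Count how many examples an editor labeled on each day.
    `labeling_log` is an assumed list of (editor_name, labeling_date) records."""
    return Counter(day for name, day in labeling_log if name == editor)

# usage: daily_progress(log, "alice") returns a Counter mapping each day to a count
```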
## Leaderboard
- The leaderboard displays the overall performance of the system, including:
- the number of categories that perform better than a list of thresholds
- the performance of each category
- the number of available examples* for each category
- the existing live campaigns** associated with each category
- An editor is able to select a list of categories from the leaderboard to start a campaign
\* available examples: once there is a model, the model runs prediction on a random sample of the unlabeled dataset; the examples predicted as a particular category are the available examples for that category (see the sketch below);
\** live campaign: after a campaign starts, an editor can stop it. A live campaign is a campaign that has not been stopped yet.
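A sketch of how the leaderboard numbers could be assembled, assuming per-category scores from the latest evaluation and predictions on a random sample of the unlabeled dataset; all names and input shapes here are assumptions.

```python
from collections import Counter

def leaderboard_rows(per_category_f1: dict[str, float],
                     sampled_predictions: list[str],
                     thresholds: list[float]) -> dict:
    """Illustrative leaderboard aggregation.

    per_category_f1: a performance score per category from the latest evaluation
    sampled_predictions: the predicted category for each example in a random
        sample of the unlabeled dataset (used to count available examples)
    """
    available = Counter(sampled_predictions)  # available examples per category
    above_threshold = {
        t: sum(1 for score in per_category_f1.values() if score > t) for t in thresholds
    }
    return {
        "categories_above_threshold": above_threshold,  # one count per threshold
        "per_category": [
            {"category": c, "f1": score, "available_examples": available.get(c, 0)}
            for c, score in per_category_f1.items()
        ],
    }
```

The live campaigns associated with each category would be joined in from the campaign records rather than computed here.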
## Campaign
- creation
- An editor can create a campaign by selecting a list of categories she wants to associate with this campaign.
- access
- An editor can find a campaign (live or stopped) by searching for its campaign ID
- An editor can click the link on the leaderboard to find a live campaign associated with a category.
- display: campaign display page should include the following information
- meta: campaign ID, owner, start/stop time
- number of categories whose performance is better than thresholds
- performance per category
- battles* in this campaign
- interaction:
- an editor can stop a campaign
- an editor can deactivate/activate categories in a campaign (once a category is deactivated, it is temporarily not associated with the campaign; an editor might do this once the performance of that category is good enough)
- an editor can start a battle in a campaign
\* Battle: a battle is associated with a set of examples. When an editor starts a battle, the system prepares a list of unlabeled examples. The editor then labels these examples. After the labeling, all these unlabeled examples become labeled examples.
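One way the campaign record and its interactions (stop, activate/deactivate categories) could be modeled; this is a sketch under assumed field names, not the actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Campaign:
    """Illustrative campaign record."""
    campaign_id: str
    owner: str
    categories: dict[str, bool]  # category name -> active flag
    started_at: datetime = field(default_factory=datetime.now)
    stopped_at: Optional[datetime] = None

    @property
    def is_live(self) -> bool:
        """A live campaign has not been stopped yet."""
        return self.stopped_at is None

    def stop(self) -> None:
        self.stopped_at = datetime.now()

    def set_category_active(self, category: str, active: bool) -> None:
        """Deactivate a category (e.g., once its performance is good enough) or re-activate it."""
        self.categories[category] = active

    @property
    def active_categories(self) -> list[str]:
        return [name for name, active in self.categories.items() if active]
```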
## Battle
- creation
- an editor can start a battle from a campaign page
- the system finds the examples predicted as the active categories in the campaign and adds them into the battle
- the editor can specify the following parameters for the battle (see the sketch below)
- the maximal number of examples to be labeled in the battle
- the maximal number of examples displayed on one page
- the maximal number of suggested categories (options) provided for each example
- an editor can start a battle from a list of failed examples*: the system finds examples similar to the failed examples and adds them to the battle
- access
- an editor can find a battle by directly searching for its ID
- an editor can click the battle link on the campaign display page to find a battle
- display: the battle page contains the following information
- battle meta: battle ID, editor, start/end time, number of examples, etc.
- overall performance
- performance per category
- examples to be labeled (if available)
- example label display
- it can show one or more examples to be labeled on one page
- it can show one or more questions (suggested categories) for each example
- for each question, it provides a few options for the editor to label; for example, the options can be "correct", "incorrect", and "not sure"
- if none of the questions (suggested categories) is correct, the editor is able to add a new category. When the editor adds a new category, she should see autocomplete suggestions matching the partial category name typed (see the autocomplete sketch below).
- when the editor submits, the labels are saved and the unlabeled examples become labeled examples
\* Failed Example: a failed example is an example whose prediction is inconsistent with its label, and whose label has been confirmed by an editor. A failed example can be recovered when the model is improved and the prediction is the same as the label.
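A sketch of the battle-creation parameters and of selecting examples predicted as the campaign's active categories; the names (`BattleConfig`, `select_battle_examples`) and the input shapes are assumptions.

```python
from dataclasses import dataclass

@dataclass
class BattleConfig:
    """Parameters an editor can set when starting a battle; defaults are illustrative."""
    max_examples: int = 500            # maximal number of examples to be labeled in the battle
    examples_per_page: int = 10        # maximal number of examples shown on one page
    max_suggested_categories: int = 5  # maximal number of suggested categories per example

def select_battle_examples(predicted_category: dict[str, str],
                           active_categories: set[str],
                           config: BattleConfig) -> list[str]:
    """Pick unlabeled examples whose prediction falls in the campaign's active
    categories, up to the configured maximum.  `predicted_category` is an assumed
    mapping from example id to its predicted category."""
    selected = [ex_id for ex_id, cat in predicted_category.items() if cat in active_categories]
    return selected[: config.max_examples]
```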
[example UI in spreadsheet](https://docs.google.com/spreadsheets/d/1pFFYTnoQh4ktEKbLtsueIOuM_P_WC36y6JLhNKr-OpA/edit?usp=sharing)
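For the new-category autocomplete, a simple case-insensitive substring match would behave roughly as described; the actual matching strategy is not specified here, so this is only a sketch.

```python
def autocomplete_categories(partial: str, existing_categories: list[str], limit: int = 10) -> list[str]:
    """Suggest existing categories that match the partial name typed so far."""
    needle = partial.lower()
    return [c for c in existing_categories if needle in c.lower()][:limit]
```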
## Data Cleaning Feature
- Goal: users are able to navigate/search conflicts between label and prediction on the training and validation datasets;
### Data Cleaning: Main UI
- show overall conflicts information on the training data
- total number of training examples
- total number of conflicts
- category level stats in table including columns
- name of the category
- number of examples
- number of conflicts
- number of conflicts that are not resolved (not added into queries to be corrected yet)
- percentage of all conflicts
- precision/recall/f1
- all the categories can be sorted by one of the following
- percentage of all conflicts
- f1 score
- an editor can select one category for cleanup
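A sketch of how the per-category stats table could be computed; the record shape (`label`, `prediction`, `resolved`) is an assumption, and precision/recall/F1 are assumed to come from the model evaluation step.

```python
def category_conflict_stats(rows: list[dict]) -> list[dict]:
    """Per-category conflict stats for the data cleaning main UI.
    Each row is an assumed training-example record:
      {"label": str, "prediction": str, "resolved": bool}
    and examples are grouped by their labeled category."""
    total_conflicts = max(sum(1 for r in rows if r["label"] != r["prediction"]), 1)
    stats: dict[str, dict] = {}
    for r in rows:
        s = stats.setdefault(r["label"], {"examples": 0, "conflicts": 0, "unresolved": 0})
        s["examples"] += 1
        if r["label"] != r["prediction"]:
            s["conflicts"] += 1
            if not r["resolved"]:
                s["unresolved"] += 1
    table = [
        {"category": c, **s, "pct_of_all_conflicts": s["conflicts"] / total_conflicts}
        for c, s in stats.items()
    ]
    # the UI can sort by percentage of conflicts (below) or by F1 score
    return sorted(table, key=lambda row: row["pct_of_all_conflicts"], reverse=True)
```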
### Data Cleaning: Example List UI
- show a list of editable examples (with pagination);
- each example includes information
- sentence
- predicted categories
- label categories
- editor of the label
- editor timestamp
- you can delete this example if the input of the example is too ambiguous or confusing
- you can add correct categories from the label or prediction column: if the correct categories are inconsistent with the prediction, this query is added to the failed query list
- show top n (e.g., 10) conflicted categories and number of not-resolved conflicts
- once you click on a conflicted category pair, only the examples of that pair are displayed
- there is a tab where you can see all examples that are added to failed query list;
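The top conflicted category pairs could be computed along these lines, reusing the assumed record shape from the stats sketch above.

```python
from collections import Counter

def top_conflicted_pairs(rows: list[dict], n: int = 10) -> list[tuple[tuple[str, str], int]]:
    """Count unresolved conflicts per (label category, predicted category) pair
    and return the top n pairs."""
    pairs = Counter(
        (r["label"], r["prediction"])
        for r in rows
        if r["label"] != r["prediction"] and not r["resolved"]
    )
    return pairs.most_common(n)
```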
## Model Correction
- Goal: users are able to
- add new examples to be corrected
- correct the examples from the data cleaning
### Acknowledged Query Navigation UI
- show stats of failed queries
- the total number of failed queries: fixed, not fixed
- show per category stats of failed queries
- category name
- number/percentage of failed queries
- clicking on the category name, you can get a list of failed queries associated with this category
- you can search to get the failed queries by
- text search
- label category name
- prediction category name
- editor
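A sketch of the failed query search, combining the four filters; the record fields (`sentence`, `label`, `prediction`, `editor`) are assumptions.

```python
def search_failed_queries(failed_queries: list[dict],
                          text: str = "",
                          label_category: str = "",
                          prediction_category: str = "",
                          editor: str = "") -> list[dict]:
    """Filter failed queries by text, label category, prediction category, and editor.
    An empty filter value means "match everything" for that field."""
    def matches(q: dict) -> bool:
        return (
            (not text or text.lower() in q["sentence"].lower())
            and (not label_category or q["label"] == label_category)
            and (not prediction_category or q["prediction"] == prediction_category)
            and (not editor or q["editor"] == editor)
        )
    return [q for q in failed_queries if matches(q)]
```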
### Failed Query Example List UI
- when you click on a category from the per-category stats or run a failed query search, you get a list of failed queries
- there is a checkbox for each failed query, and you can select these queries to start a battle
- the backend is going to run a round of related query acquisition
- the battle will show up on the user's profile page
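A sketch of starting a battle from the checked failed queries; `find_related` stands in for the backend's related-query acquisition step and is an assumed interface, not the actual API.

```python
from typing import Callable

def start_battle_from_failed_queries(
    selected_query_ids: list[str],
    find_related: Callable[[str], list[str]],
) -> dict:
    """Collect related unlabeled examples for each selected failed query.
    `find_related` is assumed to map a failed query id to ids of similar examples."""
    example_ids: list[str] = []
    for query_id in selected_query_ids:
        example_ids.extend(find_related(query_id))
    # the resulting battle then shows up on the editor's profile page
    return {"source_failed_queries": selected_query_ids, "examples": example_ids}
```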