---
tags: Book
---
# Human Evaluation Playbook
## Preparation
Required roles:
- Microservice Operator
- Human Evaluators
- Stream Coordinator (or a delegate)
Required parameters:
- Number of Human Evaluators (default: 15)
- Number of samples per evaluator (default: 30)
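
As a minimal illustration, these defaults could be captured in a small configuration mapping. The key names follow the `N_samples`/`N_evaluators` example used in the process steps below, but they are not necessarily the microservices' actual parameter names.

```python
# Default human evaluation parameters (key names are illustrative only).
EVALUATION_DEFAULTS = {
    "N_evaluators": 15,  # Number of Human Evaluators
    "N_samples": 30,     # Number of samples per evaluator
}
```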
## Evaluation Process
It is currently expected (as of 31 August 2021) that the human evaluation process will run on a seven (7) day cycle:
* Three (3) days of data collection and processing cycles
* Three (3) days for the Human Evaluators to complete their evaluations
* One (1) day for a human spot/sanity check on the human evaluations to prevent errors propagating through the pipeline
### Process
1. The Operator invokes the 'Predict Evaluation' and 'Prepare Human Evaluation Sheet' microservices sequentially.
    - The 'Prepare Human Evaluation Sheet' microservice must be invoked with the appropriate number of samples and human evaluators (e.g. N_samples=30, N_evaluators=15)
    - Output: a Python object processed into a Google Sheet, split into tabs of 30 subjects each (see the sheet-preparation sketch after this list)
2. The Operator passes the output of (1) to the Stream Coordinator
    - This hand-off must include the link to the evaluation spreadsheet
    - The Stream Coordinator coordinates the Human Evaluators and the assignment of tabs
    - (Backlog) have the 'Prepare Human Evaluation Sheet' microservice post an automated ping to a channel
3. Human Evaluators perform a row-by-row evaluation of whether they deem a given user handle suspicious or not
    - They must fill in all relevant columns for each row, e.g. `sybilness score` and `is_sybil`
    - [Spreadsheet instructions](/BkiLTm2WY)
4. Each Human Evaluator pings the Stream Coordinator individually when they are done
5. When all Human Evaluations are complete, the Stream Coordinator instructs the Operator to invoke the 'Retrieve Human Labels' microservice
    - A human spot/sanity check should be made on the retrieved labels to make sure no problems creep into the pipeline (see the spot-check sketch after this list).
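
To make step 1 more concrete, the sketch below shows one way the 'Prepare Human Evaluation Sheet' output could be assembled in Python: the candidate handles are split into tabs of 30 subjects, one per evaluator, with empty `sybilness score` and `is_sybil` columns for the evaluators to fill in. This is a minimal sketch under assumed column names and tab layout, not the microservice's actual implementation; pushing the tabs into a real Google Sheet (e.g. with a client library such as gspread) is omitted.

```python
import pandas as pd

def prepare_evaluation_tabs(samples: pd.DataFrame,
                            n_evaluators: int = 15,
                            n_samples: int = 30) -> dict:
    """Split candidate handles into one tab per evaluator (assumed layout).

    `samples` is assumed to contain a `user_handle` column; the tab naming
    and extra columns are illustrative, not the microservice's real schema.
    """
    tabs = {}
    for i in range(n_evaluators):
        chunk = samples.iloc[i * n_samples:(i + 1) * n_samples].copy()
        if chunk.empty:
            break
        # Columns the Human Evaluators are expected to fill in (see step 3).
        chunk["sybilness score"] = None
        chunk["is_sybil"] = None
        tabs[f"Evaluator {i + 1}"] = chunk
    return tabs

# Example: 450 candidate handles -> 15 tabs of 30 subjects each.
samples = pd.DataFrame({"user_handle": [f"user_{k}" for k in range(450)]})
tabs = prepare_evaluation_tabs(samples)
print(len(tabs), len(tabs["Evaluator 1"]))  # -> 15 30
```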
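
As a minimal example of the spot/sanity check in step 5, the snippet below flags rows where the evaluator-filled columns are missing or not interpretable. The DataFrame layout and the validity rules are assumptions for illustration, not the actual output format of the 'Retrieve Human Labels' microservice.

```python
import pandas as pd

def spot_check(labels: pd.DataFrame) -> pd.DataFrame:
    """Return rows that look problematic and need a human second look.

    Assumes `labels` has `sybilness score` and `is_sybil` columns; what
    counts as a valid value here is an illustrative guess.
    """
    problems = (
        labels["sybilness score"].isna()           # score missing
        | labels["is_sybil"].isna()                # verdict missing
        | ~labels["is_sybil"].isin([True, False])  # verdict not boolean-like
    )
    return labels[problems]

labels = pd.DataFrame({
    "user_handle": ["user_1", "user_2", "user_3"],
    "sybilness score": [0.9, None, 0.2],
    "is_sybil": [True, False, "maybe"],
})
print(spot_check(labels))  # flags user_2 (missing score) and user_3 (bad verdict)
```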