---
tags: Book
---

# Human Evaluation Playbook

## Preparation

Required roles:

- Microservice Operator
- Human Evaluators
- Stream Coordinator (or delegate)

Required parameters:

- Number of Human Evaluators (default: 15)
- Number of samples per evaluator (default: 30)

## Evaluation Process

It is currently expected (as of 31 August 2021) that the human evaluation process will run on a seven (7) day cycle:

* Three (3) days of data collection and processing cycles
* Three (3) days for the Human Evaluators to complete their evaluations
* One (1) day for a human spot/sanity check on the evaluations, to prevent errors from propagating through the pipeline

### Process

1. The Operator invokes the 'Predict Evaluation' and 'Prepare Human Evaluation Sheet' Microservices sequentially.
    - The 'Prepare Human Evaluation Sheet' Microservice must be given the appropriate number of samples and human evaluators (e.g. N_samples=30, N_evaluators=15).
    - Output: a Python object processed into a Google Sheet, with one tab of 30 subjects per evaluator (see the tab-splitting sketch after this list).
2. The Operator passes the output of (1) to the Stream Coordinator.
    - This handoff must include the link to the evaluation spreadsheet.
    - The Stream Coordinator coordinates the Human Evaluators and the assignment of tabs.
    - (Backlog) Extend the 'Prepare Human Evaluation Sheet' Microservice to send an automated ping on a channel.
3. Human Evaluators perform a row-by-row evaluation of whether they deem a given user handle to be suspicious or not.
    - They must fill in all relevant columns for each row, such as `sybilness score` and `is_sybil`.
    - [Spreadsheet instructions](/BkiLTm2WY)
4. Each Human Evaluator pings the Stream Coordinator individually when they are done.
5. When the Human Evaluations are done, the Stream Coordinator instructs the Operator to invoke the 'Retrieve Human Labels' Microservice.
    - A human spot/sanity check should be performed on the retrieved labels to make sure no problems creep into the pipeline (see the sanity-check sketch after this list).
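
The exact interface of the 'Prepare Human Evaluation Sheet' Microservice is not documented here. The following is a minimal sketch of the tab-splitting step referenced in Process step 1, assuming the sampled subjects arrive as a pandas DataFrame with a `user_handle` column; the function name `split_into_evaluator_tabs` and the tab naming scheme are hypothetical, and the actual upload to Google Sheets is handled elsewhere.

```python
import pandas as pd


def split_into_evaluator_tabs(samples: pd.DataFrame,
                              n_evaluators: int = 15,
                              n_samples: int = 30) -> dict[str, pd.DataFrame]:
    """Split a frame of sampled subjects into one tab per Human Evaluator.

    Each tab receives n_samples rows, so the frame must contain at least
    n_evaluators * n_samples rows.
    """
    required = n_evaluators * n_samples
    if len(samples) < required:
        raise ValueError(f"Need at least {required} samples, got {len(samples)}")

    tabs = {}
    for i in range(n_evaluators):
        chunk = samples.iloc[i * n_samples:(i + 1) * n_samples].copy()
        # Empty columns for the evaluators to fill in row by row.
        chunk["sybilness score"] = ""
        chunk["is_sybil"] = ""
        tabs[f"evaluator_{i + 1:02d}"] = chunk
    return tabs


if __name__ == "__main__":
    # Hypothetical input: 450 user handles produced by 'Predict Evaluation'.
    samples = pd.DataFrame({"user_handle": [f"user_{j}" for j in range(450)]})
    tabs = split_into_evaluator_tabs(samples)
    print(len(tabs), "tabs of", len(next(iter(tabs.values()))), "subjects each")
```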
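
The spot/sanity check in Process step 5 can likewise be partially automated. This is a sketch only, assuming each retrieved tab comes back as a pandas DataFrame with the `sybilness score` and `is_sybil` columns named above; the accepted `is_sybil` values and the helper name `sanity_check_labels` are assumptions, not part of the documented pipeline.

```python
import pandas as pd


def sanity_check_labels(tab: pd.DataFrame, expected_rows: int = 30) -> list[str]:
    """Return a list of problems found in one retrieved evaluation tab."""
    problems = []
    if len(tab) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(tab)}")

    # Every relevant column must exist and be filled for every row.
    for col in ("sybilness score", "is_sybil"):
        if col not in tab.columns:
            problems.append(f"missing column '{col}'")
        elif tab[col].isna().any() or (tab[col].astype(str).str.strip() == "").any():
            problems.append(f"column '{col}' has unfilled cells")

    # Flag unexpected labels (assumed boolean-like values).
    if "is_sybil" in tab.columns:
        valid = {"TRUE", "FALSE", "True", "False", "1", "0"}
        bad = {v for v in tab["is_sybil"].astype(str).str.strip()
               if v and v != "nan" and v not in valid}
        if bad:
            problems.append(f"unexpected is_sybil values: {sorted(bad)}")
    return problems
```

A human should still review a sample of rows by eye; the check above only catches structural problems (missing cells, wrong row counts, malformed labels), not mistaken judgements.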