# Sybil Users CSV Spec
## Columns
- `handle` (id, non-nullable)
  - GitHub / Gitcoin user handle. Not exhaustive.
- `aggregate_score` (float, non-nullable)
  - Suggestion of whether to mark the user as sybil, based on a prioritization logic.
  - Either 0.0 or 1.0. See the description below.
- `prediction_score` (float, non-nullable)
  - Represents the ML confidence that a given user is sybil.
  - Real number between 0 and 1.
- `evaluation_score` (float, **nullable**)
  - Represents the normalized sybilness score from the human evaluations.
  - Real number between 0 and 1. Empty if no data is available.
- `heuristic_score` (float, **nullable**)
  - Represents whether a user is sybil according to SME heuristics.
  - Either 0.0 or 1.0. Empty if no data is available.
- `feature_1` (float, non-nullable)
- `feature_2` (float, non-nullable)
- ...
- `feature_n` (float, non-nullable)
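
The column rules above can be checked mechanically when reading the file. The following is a minimal sketch, not a reference implementation: the helper names (`parse_row`, `load_rows`), the sample handles, and the assumption that every column other than the five named ones is a non-nullable float feature are all illustrative and not part of the spec.

```python
import csv
import io
from typing import Dict, List

# Column groups taken from the spec above.
REQUIRED_COLUMNS = ("handle", "aggregate_score", "prediction_score",
                    "evaluation_score", "heuristic_score")
NULLABLE_COLUMNS = ("evaluation_score", "heuristic_score")
BINARY_COLUMNS = ("aggregate_score", "heuristic_score")        # 0.0 or 1.0
UNIT_INTERVAL_COLUMNS = ("prediction_score", "evaluation_score")  # in [0, 1]


def parse_row(row: Dict[str, str]) -> Dict[str, object]:
    """Convert one CSV row to typed values; empty nullable cells become None."""
    handle = (row["handle"] or "").strip()
    if not handle:
        raise ValueError("handle must not be empty")
    parsed: Dict[str, object] = {"handle": handle}
    for name, raw in row.items():
        if name == "handle":
            continue
        raw = (raw or "").strip()
        if raw == "":
            if name not in NULLABLE_COLUMNS:
                raise ValueError(f"{name} must not be empty")
            parsed[name] = None
            continue
        value = float(raw)  # assumption: all non-handle columns are floats
        if name in BINARY_COLUMNS and value not in (0.0, 1.0):
            raise ValueError(f"{name} must be 0.0 or 1.0, got {value}")
        if name in UNIT_INTERVAL_COLUMNS and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")
        parsed[name] = value
    return parsed


def load_rows(stream) -> List[Dict[str, object]]:
    """Read all rows from a file-like object holding the CSV."""
    reader = csv.DictReader(stream)
    missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [parse_row(row) for row in reader]


# Made-up sample data, for illustration only.
SAMPLE = """handle,aggregate_score,prediction_score,evaluation_score,heuristic_score,feature_1
alice,1.0,0.91,,1.0,0.3
bob,0.0,0.12,0.333,,1.7
"""

rows = load_rows(io.StringIO(SAMPLE))
```

Empty cells in the two nullable columns are mapped to `None` here; a consumer using another representation for missing values (for example NaN) would adapt that branch.
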
## Metrics
### `evaluation_score`
| Score | Is Sybil | Confidence |
| -------- | -------- | -------- |
| 0.0 | F | high |
| 0.333 | F | low |
| 1.0 | T | high |
### `aggregate_score`
Assuming that a user should be classified as sybil based on specific thresholds, and that the scores have an order of importance (`evaluation_score` > `heuristic_score` > `prediction_score`), the following algorithm is proposed for computing the aggregate score.
Pseudo-code:
- If `evaluation_score` is null:
  - if `heuristic_score` is not null, then `aggregate_score` = `heuristic_score`
  - else:
    - if `prediction_score` >= ML_THRESHOLD, then 1.0
    - else 0.0
- Else:
  - if `evaluation_score` >= EVAL_THRESHOLD, then 1.0
  - else 0.0
Currently, `ML_THRESHOLD = 0.5` (accuracy maximization) and `EVAL_THRESHOLD = 0.8` (is_sybil=True & confidence >= so-so).
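
The prioritization logic can be sketched in Python as follows. This is one possible reading of the pseudo-code, using the current threshold values; the function name and signature are illustrative, and null scores are represented as `None`, which is an assumption about how empty CSV cells are loaded.

```python
from typing import Optional

# Current threshold values as stated in this spec.
ML_THRESHOLD = 0.5
EVAL_THRESHOLD = 0.8


def aggregate_score(
    prediction_score: float,
    evaluation_score: Optional[float] = None,
    heuristic_score: Optional[float] = None,
) -> float:
    """Return 1.0 (mark as sybil) or 0.0, preferring the most trusted score."""
    # Human evaluations take precedence whenever they exist.
    if evaluation_score is not None:
        return 1.0 if evaluation_score >= EVAL_THRESHOLD else 0.0
    # Next, SME heuristics, which are already 0.0 or 1.0.
    if heuristic_score is not None:
        return heuristic_score
    # Fall back to the ML prediction.
    return 1.0 if prediction_score >= ML_THRESHOLD else 0.0
```

For example, `aggregate_score(0.9, evaluation_score=0.333, heuristic_score=1.0)` yields 0.0, because a low human evaluation overrides both the heuristic and the ML prediction.
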