# Workshop 1 Agenda

## Running the SAD Workflows

Command: `python -m gtc_sad WORKFLOW`

### `prepare_predict`

Retrieves data from GitHub and PSQL and generates a feature dataset. All of it is stored on the data lake.

### `prepare_training`

Retrieves data from the human evaluation sheet and generates labels based on it plus any heuristics on the data lake.

### `fit`

Trains an ML model based on the features and label set, and stores the pickled model on the data lake.

### `predict`

Generates a prediction dataset based on the fitted model and the existing features.

### `prepare_human_eval`

Generates human evaluation sheets based on the existing dataset of predictions and labels.

### `push_endpoint`

Generates a "clean" dataset for marking users as sybil / non-sybil on the Gitcoin dashboard.

## Workflows output

(just browse the GCS bucket for R12 / R13)

## Setting up the environment for SAD

### Creating the data sinks: GCS bucket and human evaluation sheet

- Create a GCS bucket
- Clone a Human Evaluation Sheet

### Getting credentials: GCP and GitHub

- Create a GCP service account (`config/gcp_credentials.json`)
- Enable the Google Sheets API (`config/credentials_human_eval_sheets.json`)
- Create a GitHub token (`config/github_token.txt`)
- Set up `config/params.json`
- Set up `gtc_sad/cloud/definitions.py`
- Connect to a secure VPN (`config/psql_conn_string.txt`)

### Testing the SAD

- Visual Studio Code

## Homework

### Required

Without these items, it will not be possible to run workshop #2 unless we deactivate certain features.

- Get access to a trusted static IP for running the SAD (quick solution: PureVPN)
- Get PostgreSQL credentials

### Desirable

Having these items will make workshop #2 run more smoothly.

- A clean human evaluation sheet
- GitHub token
- GCP project
- GCS bucket

## Pointers:

- R13 sheet: https://docs.google.com/spreadsheets/d/1P-plAeFmChHgKwnS2hk3TxpjZNH6xmncu3hCt9WUHfs
- GCS buckets: https://console.cloud.google.com/storage/browser?authuser=1&project=gitcoin-322518&prefix=
- R13 bucket: https://console.cloud.google.com/storage/browser/round_13;tab=objects?forceOnBucketsSortingFiltering=false&authuser=1&project=gitcoin-322518&prefix=&forceOnObjectsSortingFiltering=false
- Cloud IAM: https://console.cloud.google.com/iam-admin/iam?authuser=1&project=gitcoin-322518
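
## Appendix: checking the setup files

The credential and configuration files named in the environment setup can be checked before a workflow is run. The following is a minimal sketch, not part of `gtc_sad`: the helper name `missing_setup_files` is hypothetical, and it assumes the paths from the agenda are relative to the repository root.

```python
from pathlib import Path

# Paths copied from the environment setup list in this agenda.
# Assumption: they are all relative to the repository root.
REQUIRED_FILES = [
    "config/gcp_credentials.json",
    "config/credentials_human_eval_sheets.json",
    "config/github_token.txt",
    "config/params.json",
    "config/psql_conn_string.txt",
    "gtc_sad/cloud/definitions.py",
]


def missing_setup_files(repo_root="."):
    """Return the required files that do not exist under repo_root.

    Hypothetical helper for illustration; it only checks that each file
    exists, not that its contents are valid.
    """
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_setup_files()
    if missing:
        print("Missing setup files:")
        for name in missing:
            print(f"  - {name}")
    else:
        print("All setup files found.")
```

Running the script from the repository root lists any file that still needs to be created before `python -m gtc_sad WORKFLOW` is attempted.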