# Scoring infrastructure
## UW API
GET https://underwrite.rociapi.com/score/123 → `{ "creditScore": 10, "id": 123, "timestamp": 1662455521 }`
**creditScore**: 1..10, plus the special values 101 and 102
**id** aka **NFCS_ID**: an integer that maps to a bundle of wallets [wallet1, ..., walletN], e.g. ["0xA44CceF6D966d74f7d91B67796e5EFf861F43EEC", "0x9402F038CcCb9259Abb3d51a44f0EaC0D5241236"]
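A minimal sketch of a client call, using the standard `requests` library; treating the special 101/102 values as errors is an assumption:

```python
import requests

UW_BASE = "https://underwrite.rociapi.com"

def fetch_credit_score(nfcs_id: int) -> dict:
    """Fetch the credit score for an NFCS id from the UW API."""
    resp = requests.get(f"{UW_BASE}/score/{nfcs_id}", timeout=10)
    resp.raise_for_status()
    payload = resp.json()  # {"creditScore": ..., "id": ..., "timestamp": ...}
    if payload["creditScore"] > 10:
        # 101/102 fall outside 1..10; raising here is an assumption.
        raise ValueError(f"special creditScore value: {payload['creditScore']}")
    return payload
```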
## Credit score model
- Model: linear regression and random forest.
- Scraper scripts run on a GCE VM (`scrapper.rociapi.com`): https://github.com/RociFi/Scraper-Scripts/tree/feature/DE-403-aaveV3-polygon/lending
- Run manually via `run_all_flow.py`, monthly.
- Re-training process: https://docs.google.com/document/d/1uJcSl64Usb8vV4gdwHsG4pjRt1aOuIaSO8I-8blw_xE/edit
- Output goes to the folder `https://console.cloud.google.com/storage/browser/protocol/credit_score?authuser=1&cloudshell=false&pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false` (~40 GB per month).
- A DS script merges the data from all of the folders, transforming the raw bucket data into the format the model expects (a sketch follows): https://github.com/RociFi/mvp-data-analytics/blob/main/data/dataAggregator.py
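A rough sketch of what the merge step amounts to, assuming the bucket holds monthly folders of CSV dumps with a shared schema (folder layout and file format are assumptions; `dataAggregator.py` is the source of truth):

```python
from pathlib import Path

import pandas as pd

def merge_monthly_dumps(root: Path) -> pd.DataFrame:
    """Concatenate raw monthly scraper dumps into one frame.

    Assumes one subfolder per month, each holding CSVs with the
    same columns; the real logic lives in dataAggregator.py.
    """
    frames = [pd.read_csv(path) for path in sorted(root.glob("*/*.csv"))]
    merged = pd.concat(frames, ignore_index=True)
    # Months can overlap at the edges; drop exact duplicate rows.
    return merged.drop_duplicates()
```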
Open questions:
- Where the features come from
- Where the changes in https://github.com/RociFi/CreditRisk-Service/pull/43/files#diff-be128e51bb0a21c72290632d809e580ecaa72d8a6e18e84a2057935fdb359a43 come from
**Features** (all `X_to_Y` ratios; a derivation sketch follows):
`count_repays_to_count_borrows`, `avg_repay_to_avg_borrow`, `net_outstanding_to_total_borrowed`, `net_outstanding_to_total_repaid`, `count_redeems_to_count_deposits`, `total_redeemed_to_total_deposits`, `avg_redeem_to_avg_deposit`, `net_deposits_to_total_deposits`, `net_deposits_to_total_redeemed`
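Judging by the names, every feature is a ratio of per-wallet aggregates. A sketch of the derivation for a few of them; the input column names (`count_repays`, `total_borrowed`, etc.) are assumptions:

```python
import numpy as np
import pandas as pd

def ratio(num: pd.Series, den: pd.Series) -> pd.Series:
    """Elementwise num/den, with 0 where the denominator is 0."""
    return (num / den.replace(0, np.nan)).fillna(0.0)

def build_features(agg: pd.DataFrame) -> pd.DataFrame:
    """agg: one row per wallet with raw counts/sums (assumed columns)."""
    out = pd.DataFrame(index=agg.index)
    out["count_repays_to_count_borrows"] = ratio(agg["count_repays"], agg["count_borrows"])
    out["avg_repay_to_avg_borrow"] = ratio(agg["avg_repay"], agg["avg_borrow"])
    out["net_outstanding_to_total_borrowed"] = ratio(
        agg["total_borrowed"] - agg["total_repaid"], agg["total_borrowed"]
    )
    # ...the remaining six ratios follow the same pattern.
    return out
```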
Data validation: https://docs.google.com/document/d/1ayP8y7sm7_5R48A-zjWdMkdAECl4zCL-fjoXdHPVoYg/edit
Live model (https://github.com/RociFi/RociFi-microservices/):
- step1: retrieve lending tx data from the data sources (subgraphs)
- step2: n/a
- step3: retrieve dex txs
- step4: aggregate the data from step1-step3 independently of the particular lending and dex data providers, e.g. `count_repays_to_count_borrows: 12`
- the FakeDate param allows skipping fresh data to simulate the training-data period (sketch below)
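A sketch of step4 with the FakeDate cutoff applied before aggregation; the function shape, tx schema, and parameter spelling are all assumptions:

```python
from datetime import datetime

Tx = dict  # assumed shape: {"type": "borrow" | "repay" | ..., "timestamp": int}

def aggregate_step4(lending_txs: list[Tx], dex_txs: list[Tx],
                    fake_date: datetime | None = None) -> dict:
    """step4: provider-independent aggregation over step1/step3 output.

    fake_date (assumed reading of the FakeDate param): drop txs newer
    than the cutoff so a live run reproduces the training-data period.
    """
    txs = lending_txs + dex_txs
    if fake_date is not None:
        cutoff = fake_date.timestamp()
        txs = [t for t in txs if t["timestamp"] <= cutoff]
    borrows = [t for t in txs if t["type"] == "borrow"]
    repays = [t for t in txs if t["type"] == "repay"]
    return {"count_repays_to_count_borrows": len(repays) / max(len(borrows), 1)}
```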
Inconsistencies:
- code differences between step1..step4 and the scraper scripts
- different chain inputs for the scraper scripts
- different chain inputs in the DS script
APIs used:
- The Graph hosted service (full list: https://docs.google.com/document/d/1js0PFUfzb-LrtZ4d4_4yiCLfgan5FgI8-3lCJw9yb1w/edit); example query below
- Etherscan-family explorers
- Bitquery (not in use)
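A sketch of pulling lending txs from a hosted subgraph; the subgraph name, entity, and fields are hypothetical (the doc above has the actual list):

```python
import requests

SUBGRAPH_URL = "https://api.thegraph.com/subgraphs/name/aave/protocol-v2"  # example

QUERY = """
query ($user: String!) {
  borrows(first: 100, where: {user: $user}) { amount timestamp }
}
"""  # hypothetical entity/fields

def fetch_borrows(wallet: str) -> list[dict]:
    """POST a GraphQL query to the hosted service."""
    resp = requests.post(
        SUBGRAPH_URL,
        json={"query": QUERY, "variables": {"user": wallet.lower()}},
        timeout=30,
    )
    resp.raise_for_status()
    body = resp.json()
    if "errors" in body:
        raise RuntimeError(body["errors"])
    return body["data"]["borrows"]
```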
Requirements:
- Re-train automatically
- Validate results against a predefined set (sketch below)
- Use the same pipeline for training and live data
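For the validation requirement, a sketch of a regression check against a frozen wallet → expected-score set; the file format and the mismatch threshold are assumptions:

```python
import json
from typing import Callable

def validate_model(predict: Callable[[str], int], golden_path: str,
                   max_mismatch: float = 0.05) -> bool:
    """Compare model output against a predefined validation set.

    golden_path: JSON like {"0xA44C...": 7, ...} (assumed format);
    passes if at most max_mismatch of the wallets disagree.
    """
    with open(golden_path) as fh:
        golden = json.load(fh)
    mismatches = sum(1 for wallet, expected in golden.items()
                     if predict(wallet) != expected)
    return mismatches / len(golden) <= max_mismatch
```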
Problems:
- Training and live data mismatch
- Missing auto-tests
## Fraud model
Re-training process: https://docs.google.com/document/u/1/d/14OcpXROOek4WTdD8Haq7VOL2n6IQ22yw6o741GysCqA/edit
Components:
- Fraud data adapter
- Fraud API
## Coin prices
https://github.com/RociFi/coin-price-loader
- A bunch of Java scripts that transform the data
- CoinGecko free plan (via proxies) → MySQL (sketch below)
- Daily granularity
- Read-only; many customers
- Runs manually now; should run on cron in the future
- 800 GB of data
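A Python sketch of one loader iteration (the scripts themselves are described above as Java): CoinGecko's `/coins/{id}/history` endpoint is real, while the proxy rotation, MySQL schema, and credentials are assumptions:

```python
import requests
import mysql.connector  # assumption: loader writes straight into MySQL

def load_daily_price(coin_id: str, date: str, proxy: str | None = None) -> None:
    """Fetch one day's USD price from CoinGecko and upsert it into MySQL.

    date is dd-mm-yyyy, as the /history endpoint expects; rotating
    proxies work around the free-plan rate limit.
    """
    resp = requests.get(
        f"https://api.coingecko.com/api/v3/coins/{coin_id}/history",
        params={"date": date},
        proxies={"https": proxy} if proxy else None,
        timeout=30,
    )
    resp.raise_for_status()
    usd = resp.json()["market_data"]["current_price"]["usd"]
    conn = mysql.connector.connect(  # assumed DSN and table schema
        host="localhost", database="prices", user="loader", password="..."
    )
    cur = conn.cursor()
    cur.execute(
        "REPLACE INTO coin_prices (coin_id, day, usd) VALUES (%s, %s, %s)",
        (coin_id, date, usd),
    )
    conn.commit()
    cur.close()
    conn.close()
```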
TODO: research upgrade plan
TODO: deprecate Bitquery
TODO: negative prices in old JSON files
TODO: 1 day shift in coin prices
TODO: Stablecoin prices in step1