# 2021-06-22 ML Solutions Meeting
## Attendees
- Quansight
- Kim
- Fatma
- Eric
- Eskild
- Datum
- Brian
- Jeremy
- Matthew
## Client Needs
- Voter registration
- Using Datum's data alongside the client's data to discern which party a user is more closely associated with
- Create lookalike audiences
- Use known voters' addresses --> device IDs
- Beeswax data
- Row data with device IDs and bid data
- Auction for ad placement
- Bid logs and win logs
- Data on 16 million Texas voters
- Perhaps start on a smaller scale - Austin area?
- If the model is to be used statewide, sample from statewide data
- To avoid sampling bias
- Stratification based on how strongly Democratic a voter is (DDD, DD, D)
- Could we differentiate between the three groups?
- Classification models
- XGBoost
- Tree-based method
- Multiclass classification models
- Binary classification models would be simpler
- Things to remember:
- Data needs to be cleaned up
- Requires labeled data
- Use base URL and one-hot encoding
- Start simple!!
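
A rough sketch of two of the preprocessing ideas noted above: reducing URLs to a base URL and one-hot encoding them, and sampling the same fraction from each stratum (DDD, DD, D). Everything here is an assumption for illustration only: the function names `base_url`, `one_hot`, and `stratified_sample` are hypothetical, the records are made up, and "base URL" is taken to mean the host part of the URL. It uses only the Python standard library and is not tied to the actual Beeswax or voter data schema.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def base_url(url):
    """Return the lowercased host of a URL (assumed meaning of 'base URL')."""
    # urlsplit only recognizes the host when the string starts with a
    # scheme or '//', so prefix bare hosts such as 'example.org/page'.
    if not url.startswith(("http://", "https://", "//")):
        url = "//" + url
    return (urlsplit(url).hostname or "").lower()


def one_hot(values):
    """One-hot encode a list of categorical values.

    Returns (categories, rows); each row holds a single 1 in the column
    that corresponds to its category.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by records[key]."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)
    sample = []
    for label in sorted(strata):
        group = strata[label]
        # Keep at least one record per stratum so no group disappears.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Made-up example data, not real bid logs or voter records.
urls = [
    "https://News.Example.com/a?b=1",
    "example.org/page",
    "http://news.example.com/other",
]
hosts = [base_url(u) for u in urls]
categories, encoded = one_hot(hosts)

voters = (
    [{"id": i, "lean": "D"} for i in range(10)]
    + [{"id": i, "lean": "DD"} for i in range(10, 30)]
    + [{"id": i, "lean": "DDD"} for i in range(30, 60)]
)
subset = stratified_sample(voters, key="lean", fraction=0.5)
```

Sampling per stratum keeps the D / DD / DDD proportions of the subset the same as in the full set, which is one simple way to limit sampling bias when starting on a smaller scale.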