# 2021 06 22 ML Solutions Meeting

## Attendees

- Quansight
  - Kim
  - Fatma
  - Eric
  - Eskild
- Datum
  - Brian
  - Jeremy
  - Matthew

## Client Needs

- Voter registration
  - Using Datum's data alongside the client's data to discern which party the user is more associated with
- Create lookalike audiences
  - Use known voters' addresses --> device IDs
- Beeswax data
  - Row data with device IDs and bid data
  - Auction for ad placement
  - Bid logs and win logs
- 16 million Texas voter records
  - Perhaps start on a smaller scale
    - Austin area?
  - If the model is to be used statewide, sample from statewide data
    - To avoid sampling bias
- Stratification based on how strongly Democratic (DDD, DD, D)
  - Could we differentiate between the three groups?
- Classification models
  - XGBoost
    - Tree-based method
  - Multiclass classification models
  - Binary classification models would be simpler
- Things to remember:
  - Data needs to be cleaned up
  - Requires labeled data
  - Use base URL and one-hot encoding
  - Start simple!!
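
A rough sketch of two of the points above, "use base URL and one-hot encoding" and stratified sampling across the DDD / DD / D groups, using only the Python standard library. The record fields (`device_id`, `url`, `label`), the example URLs, and the helper names are all made up for illustration; the real Beeswax and voter data will look different, and a production version would more likely use pandas or scikit-learn.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def base_url(url):
    """Reduce a full URL to its lowercase host name."""
    # urlsplit only fills in the host when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    return urlsplit(url).hostname or ""


def one_hot(values):
    """Return (sorted vocabulary, 0/1 rows) with one column per distinct value."""
    vocab = sorted(set(values))
    index = {v: i for i, v in enumerate(vocab)}
    rows = []
    for v in values:
        row = [0] * len(vocab)
        row[index[v]] = 1
        rows.append(row)
    return vocab, rows


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum so each group keeps its share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)
    sample = []
    for label in sorted(strata):
        group = strata[label]
        k = max(1, round(len(group) * fraction))  # keep at least one per group
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical toy records: 4 DDD, 4 DD, 2 D.
records = [
    {"device_id": f"dev{i}", "url": url, "label": label}
    for i, (url, label) in enumerate(
        [("https://news.example.com/politics?id=1", "DDD")] * 4
        + [("http://shop.example.org/cart", "DD")] * 4
        + [("sports.example.net/scores", "D")] * 2
    )
]

sample = stratified_sample(records, key="label", fraction=0.5)
vocab, rows = one_hot([base_url(r["url"]) for r in sample])

print(vocab)
for rec, row in zip(sample, rows):
    print(rec["label"], row)
```

Sampling per group rather than from the whole table keeps the smaller D group from being dropped by chance, which is the sampling-bias concern raised above. The one-hot rows could then feed a tree-based classifier such as XGBoost once labeled data is available.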