# Dataset Shift and Model Adaptation

## Overview

This workshop aims to equip participants with the knowledge to understand, detect, and potentially resolve dataset shift problems. Participants will learn to recognize different cases of dataset shift, apply methodologies to detect it, and choose a possible solution to a dataset shift problem. This workshop is suitable for professionals in all industries.

## Attendees

**Finance**

- Scotiabank
- CIBC
  - https://vectorinstitute.ai/2021/08/17/cibcs-techniques-to-keep-ai-models-accurate-when-the-data-changes/
- RBC
- Sun Life

**Telecom**

- Telus
  - want to optimize the time at which they send out emails/text messages for marketing campaigns (data keeps changing and shifting)
- KT Corporation
- Thomson Reuters
- Dataraction Inc.
- Nautical Crime Investigation Services
- Skywatch

**Health**

- Roche
- https://www.youtube.com/watch?v=iVk-ouzXiXA
- https://www.youtube.com/watch?v=LmiNQio4db0
- https://www.youtube.com/watch?v=t5MSNf8600U&list=PLlcxuf1qTrwAri2DeaKL0fyMedK8-Rsa2&index=31

Antonio is a data scientist working on fraud detection at a fintech company. As part of his job, he checks for data distribution shifts across the training, test, and current datasets. When a model's performance degrades, he has to identify upstream problems manually. Comparing multiple historical and current distributions is cumbersome, and there is no standardized way to do it. As a result, troubleshooting features is reactive rather than proactive.
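
The feature-by-feature comparison of a reference sample against current data described above might look like the following minimal sketch. It assumes numeric features and uses SciPy's two-sample Kolmogorov-Smirnov test (`scipy.stats.ks_2samp`). The function name `detect_feature_shift`, the feature names `amount` and `hour`, the Bonferroni correction, and the synthetic data are illustrative assumptions, not part of the workshop material.

```python
import numpy as np
from scipy.stats import ks_2samp


def detect_feature_shift(reference, current, feature_names, alpha=0.05):
    """Run a two-sample KS test on each feature column.

    A feature is flagged as shifted when its p-value falls below a
    Bonferroni-corrected threshold (alpha divided by the number of features).
    """
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    threshold = alpha / len(feature_names)

    report = {}
    for j, name in enumerate(feature_names):
        result = ks_2samp(reference[:, j], current[:, j])
        report[name] = {
            "statistic": float(result.statistic),
            "p_value": float(result.pvalue),
            "shifted": bool(result.pvalue < threshold),
        }
    return report


# Synthetic data (assumption): the first feature drifts, the second does not.
rng = np.random.default_rng(0)
reference = rng.normal(size=(2000, 2))
current = np.column_stack([
    rng.normal(loc=1.0, size=2000),  # mean moved by one standard deviation
    rng.normal(size=2000),           # same distribution as the reference
])

report = detect_feature_shift(reference, current, ["amount", "hour"])
for name, row in report.items():
    print(name, row)
```

Running the same function on a schedule against a fixed reference window is one possible way to make the check proactive; the KS test only covers one feature at a time, so shifts in the joint distribution would need a multivariate method.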
## Dataset for assignments

- cross-sectional or time series (no image data)
  - Iowa House Sale Prices (Kaggle)
  - Predict Future Sales (Kaggle)
- prepare data for experiments
- apply dataset shift analysis algorithms
- identify potential shifts
- use shift adaptation techniques
  - transfer learning and adaptive learning
  - few-shot learning
- analyze model performance

## Day 1: Characterizing dataset shift

### Questions

-

### Objectives

- terminology and taxonomy
- overview of different types and causes of dataset shift
- Covariate shift adaptation using sample re-weighting
- Label shift adaptation using black-box predictors
- Concept shift adaptation
- understand different cases of dataset shift
- learn methods to detect dataset shift
- choose appropriate solution for different dataset shift problems
- Transfer learning
- Active learning
- Techniques for dataset shift detection and adaptation
- Practical use of available packages in dataset shift
- theory and classification of dataset shift problems
- introduction to data shift in ML
- domain adaptation
- types of shift
  - covariate
  - concept
  - label

### Assignment 1

- long answer questions on concepts

## Day 2: Detecting and correcting dataset shift

- identifying and correcting dataset shift
- detecting dataset shift
  - dimensionality reduction
  - statistical tests
- algorithms for shift correction

### Assignment 2

- find dataset

## Day 3: Advanced topics

- transfer learning
- active learning
- practical examples

## Showcase on Active/Transfer Learning

## Showcase on cyclops

### Capstone

## Case Study

The project is specifically aimed at adjusting AI models trained on historical data, because the COVID-19 pandemic has caused behavioural and economic patterns to shift so drastically that the historical training data no longer reflects actual conditions.
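
One way to adjust a model trained on historical data is the covariate shift adaptation by sample re-weighting listed in the Day 1 objectives. The sketch below is a minimal illustration under stated assumptions, not the method used in the case study: it trains a domain classifier with scikit-learn's `LogisticRegression` to separate historical from current inputs and converts its probabilities into importance weights. The function name `covariate_shift_weights`, the clipping bounds, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def covariate_shift_weights(X_source, X_target):
    """Estimate importance weights p_target(x) / p_source(x) for source rows.

    A classifier is trained to tell source rows (label 0) from target rows
    (label 1). Its odds, rescaled by the sample-size ratio, approximate the
    density ratio between the two input distributions.
    """
    X = np.vstack([X_source, X_target])
    domain = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])

    clf = LogisticRegression(max_iter=1000).fit(X, domain)

    # Column 1 holds P(domain = target | x); clip to avoid division by zero.
    p_target = np.clip(clf.predict_proba(X_source)[:, 1], 1e-6, 1 - 1e-6)
    ratio = p_target / (1.0 - p_target) * (len(X_source) / len(X_target))

    # Normalize so the weights average to 1 over the source sample.
    return ratio / ratio.mean()


# Synthetic data (assumption): current inputs are shifted to the right.
rng = np.random.default_rng(0)
X_source = rng.normal(loc=0.0, size=(1000, 1))  # historical inputs
X_target = rng.normal(loc=1.0, size=(1000, 1))  # current inputs

weights = covariate_shift_weights(X_source, X_target)
print(weights.min(), weights.max())
```

Historical rows that resemble current data receive larger weights. Many scikit-learn estimators accept such weights through the `sample_weight` argument of `fit`, so the re-weighted model can be trained without changing the historical dataset itself. This only addresses shifts in the input distribution; if the relationship between inputs and labels has changed (concept shift), re-weighting alone would not be expected to help.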