Practical Data Science on AWS
Instructor: Heiwad Osman (heiwad@amazon.com)

Morning Task List
1. Sign up for an account at https://aws.qwiklabs.com for access to the lab environment. Please use the same email you used in your registration profile.
2. Sign up for an account at https://online.vitalsource.com. You may use a personal email if you prefer. I will email out a code later in the morning (from no-reply@gilmore.ca) that you can use to claim your course eBooks. This is the only way to receive the ‘presentation’.
3. (Recommended) Get the VitalSource Bookshelf app so you can download your eBooks.

Notes
We’ll be running this class on Central Time! Expect roughly 9 AM to 4 PM, with a lunch break from 12 to 1 PM Central Time.

Synopsis:
This class is an introduction to both the data science process and basic SageMaker functionality.

Agenda:
- Machine Learning intro/review
- Introduction to Amazon SageMaker
- Data Visualization and Analysis (in SageMaker)
- Training & Evaluating Models with SageMaker
- Tuning Model Hyperparameters
- Deploying Models to SageMaker Endpoints
- Additional topics & Features
- Questions?

Resources:

How to prepare for AWS ML Exam?
https://aws.amazon.com/certification/certification-prep/
https://www.aws.training/Details/eLearning?id=42183
https://developers.google.com/machine-learning/crash-course
https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html

How to use the trained model artifact locally?
Download model.tar.gz from S3, extract it, and unpickle the model file. This assumes a pickled XGBoost artifact like the one from the class labs; other container versions may save the model in a different format.
    import pickle as pkl
    import tarfile
    import xgboost as xgb  # must be installed so the pickled Booster can be loaded

    with tarfile.open('model.tar.gz', 'r:gz') as t:
        t.extractall()

    # Assumption: the extracted file is named 'xgboost-model'; check the archive and adjust.
    model_file_path = 'xgboost-model'
    with open(model_file_path, 'rb') as f:
        model = pkl.load(f)

    # prediction with test data; dtest is an xgb.DMatrix built from your test features
    pred = model.predict(dtest)

Recommendations for how to view/output which features the tuned model is using for predictions?
Load the model locally, then plot a single XGBoost decision tree (this needs graphviz installed). For a per-feature summary, xgb.plot_importance(model) is also worth a look.
    fig, ax = plt.subplots(figsize=(30, 30))  # needs: import matplotlib.pyplot as plt
    xgb.plot_tree(model, num_trees=4, ax=ax)
https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html

Can you recommend a good source/training that goes through the Python code we are working with?
Kaggle has great intro tutorials for Python and the libraries we used in the class:
https://www.kaggle.com/learn/python
https://www.kaggle.com/learn/data-visualization
https://www.kaggle.com/learn/intro-to-machine-learning
This book is great for Python developers without an ML background:
https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow-dp-1492032646/dp/1492032646/ref=dp_ob_title_bk

How to keep learning about Amazon SageMaker?
Try our free course on edx.org:
https://www.edx.org/course/simplifying-machine-learning-app-development-with-amazon-sagemaker
Also, find free machine learning courses on aws.training:
https://aws.amazon.com/training/learning-paths/machine-learning/

Can we use a built-in algorithm for semantic segmentation of an image?
Yes, see the example:
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/semantic_segmentation_pascalvoc/semantic_segmentation_pascalvoc.ipynb

How can I keep my data inside the VPC?
Use VPC endpoints, which are available for services such as Rekognition and SageMaker.

Can I process sensitive information with AWS AI services?
Check which services are eligible to process HIPAA-protected information. See, for example:
https://aws.amazon.com/blogs/machine-learning/aws-expands-hipaa-eligible-machine-learning-services-for-healthcare-customers/
https://aws.amazon.com/about-aws/whats-new/2018/05/amazon-rekognition-achieves-hipaa-eligibility/

What do the different instance types for SageMaker cost?
https://aws.amazon.com/sagemaker/pricing/instance-types/

Do you have algorithms that can be trained ‘online’?
Some of the built-in SageMaker algorithms support incremental training. Otherwise, you can bring your own algorithm that starts from pretrained weights instead of from scratch. See https://docs.aws.amazon.com/sagemaker/latest/dg/incremental-training.html

How to get the latest docker image for an algorithm?
Add repo_version="latest" as an argument to the call that looks up the algorithm's container image (for example, get_image_uri in version 1 of the SageMaker Python SDK).

More labs and classes?
https://www.edx.org/course/amazon-sagemaker-simplifying-machine-learning-appl
Amazon.qwiklabs.com

I’m still having trouble understanding Bias vs Variance intuitively. What do you have?
- See the ML Cheat Sheet for a good diagram.
- This discussion has a good answer: “What is the meaning of term variance in machine learning model?”
- Our AWS documentation has some simple heuristics at “Model fit: underfitting vs. overfitting”.

I want to understand the math for bias-variance decomposition. How is it calculated and for which algorithms does it apply?
The MLxtend Python library has some functions that try to calculate it, and its documentation describes its bias-variance decomposition method pretty well. The Wikipedia article “Bias–variance tradeoff” provides some math derivations.

How do I view the coefficients of my linear learner model?
The SageMaker linear learner model is saved as an MXNet model file in S3. Download model.tar.gz, untar it, and then unzip the model_algo-1 file inside. Then load the model with MXNet, as described on the AWS forums and on Stack Overflow. See the code example below.

    import os
    import mxnet as mx
    import boto3

    bucket = "<your_bucket>"
    key = "<your_model_prefix>"  # full S3 key of the model.tar.gz object
    boto3.resource('s3').Bucket(bucket).download_file(key, 'model.tar.gz')
    os.system('tar -zxvf model.tar.gz')

    # Linear learner model is itself a zip file, containing an MXNet model and other metadata.
    # First unzip the model.
    os.system('unzip model_algo-1')

    # Load the MXNet module
    mod = mx.module.Module.load("mx-mod", 0)

    # model's weights
    weights = mod._arg_params['fc0_weight'].asnumpy().flatten()

    # model bias
    bias = mod._arg_params['fc0_bias'].asnumpy().flatten()

    # Using the model for prediction
    # First create an MXNet data iterator:
    # https://mxnet.incubator.apache.org/tutorials/basic/data.html#reading-data-in-memory
    # https://mxnet.incubator.apache.org/tutorials/basic/data.html#reading-data-from-csv-files
    data_iter = create_data_iter()  # placeholder: write this helper to return your own iterator

    # Next bind the module with the data shapes.
    mod.bind(data_shapes=data_iter.provide_data)

    # Predict
    mod.predict(data_iter)

Why do we oversample the minority class when we have a class imbalance for classification?
The learning algorithm needs to see enough minority-class examples for them to meaningfully influence the weight updates during optimization. Otherwise the model can reach a low loss by mostly predicting the majority class. Oversampling is one technique for trying to rectify class imbalance. See the following example notebook for more:
https://www.kaggle.com/tanlikesmath/oversampling-mnist-with-fastai

How do I use batch predictions instead of real-time endpoints?
See the example:
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_batch_transform/introduction_to_batch_transform/batch_transform_pca_dbscan_movie_clusters.ipynb

Where can I learn more about Deep Learning on AWS?
We have a training offering that introduces deep learning models. Here is the description:
https://aws.amazon.com/training/course-descriptions/deep-learning/
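
Going back to the class-imbalance question above, here is a minimal sketch of random oversampling using only the Python standard library. The function name oversample_minority and the toy rows/labels are illustrative assumptions, not code from the class notebooks, and duplicating examples at random is only one of several ways to oversample.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate examples until every class matches the largest class.

    `rows` and `labels` are parallel lists (hypothetical names for this sketch).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Group example indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Keep every original example, then top up the smaller classes by
    # sampling with replacement from their own examples.
    chosen = list(range(len(labels)))
    for label, indices in by_class.items():
        shortfall = target - len(indices)
        chosen.extend(rng.choices(indices, k=shortfall))

    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


# Toy data: 8 negatives and 2 positives.
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = oversample_minority(rows, labels)
balanced_counts = Counter(balanced_labels)
print(balanced_counts)
```

Apply this to the training split only, after the train/test split, so that duplicated examples do not leak into the evaluation data.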