# Challenge 1: WHERE IS THE PICTURE OF MOUNT RUSHMORE?

## Overview

## Business Problem

Welcome to this challenge! You have been recently hired by Anycorp (a major news broadcaster) as a data scientist. Your boss (non-technical) tells you that the company has built a large image archive over the years. Unfortunately, there is little to no metadata associated with the images, which makes it difficult for Anycorp journalists and news anchors to find relevant ones. Your boss points out that most of the images contain text that is important for understanding the context of the image. For your first project, your boss asks you to build a smart cloud solution to automatically extract metadata from the images.

## Learning Objectives

In this challenge, you are going to extract various types of information from the images to make it easier for the Anycorp journalists to find relevant images. To accomplish this goal, you are going to leverage AWS image (Amazon Rekognition) and NLP (Amazon Comprehend) AI services, which make advanced machine learning accessible without the need to train any model!

## Task 1: Extract Text from Images using Amazon Rekognition

### Background

See the Business Problem above: the archive has little to no metadata, most images contain text, and your boss wants a smart cloud solution that extracts that metadata automatically.

### Your Task

Your manager is writing an article on the American Civil War and is looking for an image of the Abraham Lincoln statue. He mentions that although the word 'Lincoln' is not carved on the statue directly, it is engraved on the wall behind him. You have little time to find the image!

In this first task, you are going to extract text from images stored in S3 using the Amazon Rekognition 'detect_text' API. Please note that Task #2 and Task #3 depend on correct completion of Task #1.

More information on the Amazon Rekognition 'detect_text' API: https://docs.aws.amazon.com/rekognition/latest/dg/text-detecting-text-procedure.html

To complete this task, you need to:

### Step #1: Complete the "extract_text_from_image" function

This function returns the text in an image stored in S3 by using the Amazon Rekognition 'detect_text' API. Remember, you need to extract text tokens from the response returned by Rekognition and then join them with a single space (included in the code). The function should return a string containing all the text lines found in the image, joined by single spaces. The portion of the function you write shouldn't be more than a few lines of code.

Please note that the API response from Amazon Rekognition is a Python dictionary. It detects both "LINE" and "WORD" items in the image, which are linked together through a "ParentId" field on the "WORD" tokens. So, to extract and combine the text from the JSON response, you just need to first extract the list of tokens with "Type: LINE" (text_tokens) and then join them with a space (written for you). Just iterate over response['TextDetections'] and check whether item['Type'] == 'LINE'. If yes, item['DetectedText'] gives you the text in that line. Join all of these line tokens with a space as shown in the code.
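For reference, here is a minimal sketch of the completed function, assuming the notebook creates a boto3 Rekognition client and passes in the bucket name and image key (your notebook's exact signature and client setup may differ):

```python
import boto3

# Use the region shown in the top right of your console (e.g. us-east-1).
rekognition = boto3.client('rekognition', region_name='us-east-1')

def extract_text_from_image(bucket, image_key):
    """Return all LINE-level text found in an S3-hosted image, joined by spaces."""
    response = rekognition.detect_text(
        Image={'S3Object': {'Bucket': bucket, 'Name': image_key}}
    )
    # Keep only LINE tokens; WORD tokens repeat the same text at word level.
    text_tokens = [item['DetectedText']
                   for item in response['TextDetections']
                   if item['Type'] == 'LINE']
    return ' '.join(text_tokens)
```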
### Step #2: Run the "extract_text_from_image" function on all the images

Let's run the "extract_text_from_image" function you just completed on all the images. It is going to extract text from all images stored in S3 and populate the "text_in_image" column of results_df. You don't need to do anything (no coding) other than running the code block under the Task 1 Step 2 subsection. Please make sure the "extract_text_from_image" function in Step #1 works as expected, since this step can take a few minutes to complete and repeating it would waste your time. After it is done, take a look at the "results_df" dataframe to make sure "text_in_image" is populated. Other columns should be empty at this point.

### Step #3: Find the image file name with the term "Lincoln" (case insensitive) in it

To do that, you can iterate over the rows of the "results_df" dataframe and check whether the term "Lincoln" exists in the "text_in_image" field. You can then use either the dataframe index or the 'image_key' column to extract the image file name. Once you find that image, assign the image file name to the "task1_answer" variable. Your response should be a string in the form of 'xxx.jpg'. Run the 'upload_answer_to_s3' line to upload your answer as JSON to S3. You are done! You'll receive credit in a few minutes if your answer is correct. Remember, you have to complete Step #1 and Step #2 before Step #3 or you will not get credit!

### Getting Started

The entire challenge runs from a SageMaker Notebook already provisioned for you. Once in the AWS Console, type "SageMaker" in the search bar on top or select SageMaker from the "Services" menu (top left). On the SageMaker homepage, select SageMaker Notebook from the left menu. There should be a running SageMaker Notebook instance ready for you. Click on Jupyter and click on "Where_is_the_Picture_of_Mount_Rushmore_.ipynb". You need to complete this notebook, which is broken down into five tasks.

For this challenge, you have access to about 100 images. The images are uploaded to S3 and also to the SageMaker Notebook instance at '/home/ec2-user/SageMaker/images/'. Navigate to the AWS console once more and this time click on S3. There should be a bucket provisioned for you. Open the bucket; you should see about 100 images there. You need the bucket name and the region you are in for running the SageMaker notebook. Your region is listed in the top right corner of the console. Click on the drop-down menu to see your region code (e.g. us-east-1).

### Inventory

1. SageMaker Notebook Instance
2. Partially filled Jupyter Notebook
3. S3 Bucket

### Services you use

1. Amazon Rekognition
2. S3

### Task Validation

The task automatically completes once you extract text for all images, find the image with the term 'Lincoln', and submit your answer using 'submit_your_answer' (provided in the notebook).

#### Clue 1: Calling the Amazon Rekognition 'detect_text' API
#### Clue 2: Extracting text from the Amazon Rekognition API Response
#### Clue 3: Putting it All Together!

## Task 2: Extract Dominant Language from Text using Amazon Comprehend

### Background

Your manager is happy with your progress but tells you that some of the images contain languages other than English. It is important to know which ones, as it impacts image selection. So, she wonders if it is possible to detect the dominant language automatically.

### Your Task

Your task is first to find the dominant language in the text of all images. Then, out of all images containing German text (not all images), find the one with the highest dominant language confidence score.

To do this, you are going to leverage the detect_dominant_language API of the Amazon Comprehend AI service.

More about the 'detect_dominant_language' API: https://docs.aws.amazon.com/comprehend/latest/dg/get-started-api-dominant-language.html

Boto3 Amazon Comprehend 'detect_dominant_language' API reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/comprehend.html#Comprehend.Client.detect_dominant_language

### Step #1 (YOUR INPUT NEEDED): Complete the "extract_dominant_language" function

This function uses the Amazon Comprehend detect_dominant_language API to detect the dominant language in the text. Once you receive the raw response from Amazon Comprehend, analyze it to see where you can find the dominant language code and the dominant language score. Return both the dominant language code and the dominant language score (detection confidence). The portion of the function you write shouldn't be more than a few lines of code.
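A hedged sketch of what the completed function might look like, assuming the notebook expects a (language_code, score) tuple back; check your notebook's expected return values:

```python
import boto3

comprehend = boto3.client('comprehend', region_name='us-east-1')  # use your region

def extract_dominant_language(text):
    """Return (language_code, confidence_score) for the dominant language in text."""
    response = comprehend.detect_dominant_language(Text=text)
    # 'Languages' is a list of candidate languages; take the highest-scoring one.
    top = max(response['Languages'], key=lambda lang: lang['Score'])
    return top['LanguageCode'], top['Score']
```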
### Step #2: Run the "extract_dominant_language" function on the text of all the images

Let's run the "extract_dominant_language" function you just completed on the text of all the images. It is going to extract the dominant language and confidence score from the 'text_in_image' column of 'results_df' and fill the 'dominant_language_code' and 'dominant_language_score' columns. You don't need to do anything (no coding) other than running the code block under the Task 2 Step 2 subsection. Please make sure the "extract_dominant_language" function in Step #1 works as expected before running this step. After it is done, take a look at the "results_df" dataframe to make sure 'dominant_language_code' and 'dominant_language_score' are populated. The columns for later tasks should still be empty at this point.

### Step #3 (YOUR INPUT NEEDED): Find the German image with the highest confidence score

Out of all images with German text (language code: 'de'), which one has the highest 'dominant_language_score'? Code is provided to filter "results_df" to rows with German text. You can then use either the 'index' or the 'image_key' column to extract the image file name. Once you find that image, populate "task2_answer" with the image file name. Your response should be a string in the form of 'xxx.jpg'. Run the 'upload_answer_to_s3' line to upload your answer as JSON to S3. You are done! You'll receive credit in a few minutes if your answer is correct.

### Inventory

1. SageMaker Notebook Instance
2. Partially filled Jupyter Notebook
3. S3 Bucket

### Services you use

1. Amazon Comprehend
2. S3

### Task Validation

The task automatically completes once you find the dominant language and confidence scores for the text in all images, find the image with the highest language confidence score (out of images with German text), and submit your answer using 'submit_your_answer' (provided in the notebook).

#### Clue 1: Calling the Amazon Comprehend 'detect_dominant_language' API
#### Clue 2: Extracting dominant language and score from the Amazon Comprehend API Response
#### Clue 3: Putting it All Together!

## Task 3: Sentiment Analysis Using the Amazon Comprehend AI Service!

### Background

You wonder if you can do automated sentiment analysis and further enrich the image metadata. To do this, you are going to leverage the 'detect_sentiment' API of the Amazon Comprehend AI service.

More about the 'detect_sentiment' API: https://docs.aws.amazon.com/comprehend/latest/dg/how-sentiment.html

Boto3 Amazon Comprehend 'detect_sentiment' API reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/comprehend.html#Comprehend.Client.detect_sentiment

### Your Task

Your task is to extract the sentiment and associated scores from the text found on each of the images, and find the image with the most negative sentiment among those with English text.

### Step #1: Complete the 'sentiment_detection' function

This function uses the Amazon Comprehend 'detect_sentiment' API to extract text sentiment. Extract both the sentiment and the sentiment scores for each of the [POSITIVE | NEGATIVE | NEUTRAL | MIXED] sentiment possibilities. The portion of the function you write shouldn't be more than a few lines of code. First call Amazon Comprehend 'detect_sentiment' and get a raw response. Analyze the response to see how you can extract both the sentiment and the sentiment score. sentiment_score is going to be a Python dictionary. You can complete this in less than 5 lines of code!
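Note that 'detect_sentiment' requires a LanguageCode parameter. The sketch below assumes English ('en') as a default, though your notebook may instead pass in the dominant language code detected in Task 2:

```python
import boto3

comprehend = boto3.client('comprehend', region_name='us-east-1')  # use your region

def sentiment_detection(text, language_code='en'):
    """Return (sentiment, sentiment_score) for the given text."""
    response = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
    sentiment = response['Sentiment']             # e.g. 'POSITIVE' or 'NEGATIVE'
    sentiment_score = response['SentimentScore']  # dict: Positive/Negative/Neutral/Mixed
    return sentiment, sentiment_score
```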
### Step #2: Run the 'sentiment_detection' function on the text of all images

Let's run the 'sentiment_detection' function on the text of all images. It is going to extract the sentiment and sentiment scores from the 'text_in_image' column of 'results_df' and fill the 'sentiment' column and the four 'sentiment_score' columns. You don't need to do anything (no coding) other than running the code block under the Task 3 Step 2 subsection. Please make sure the "sentiment_detection" function in Step #1 works as expected before running this step. After it is done, take a look at the "results_df" dataframe to make sure the 'sentiment' column and the four 'sentiment_score' columns are populated. The columns for later tasks should still be empty at this point. Note that in some cases it is not possible to extract the sentiment (maybe there is not enough text). Don't worry if you see those cases!

### Step #3: Find the image with the most negative English text sentiment

Based on the 'sentiment_negative_score' and 'dominant_language_code' columns of results_df, which image has the most negative English text sentiment in it? Code is provided to sort the dataframe by 'sentiment_negative_score' in descending order. You can then use either the 'index' or the 'image_key' column to extract the image file name. Once you find that image, populate "task3_answer" with the image file name. Your response should be a string in the form of 'xxx.jpg'. Run the 'upload_answer_to_s3' line to upload your answer as JSON to S3. You are done! You'll receive credit in a few minutes if your answer is correct.

### Inventory

1. SageMaker Notebook Instance
2. Partially filled Jupyter Notebook
3. S3 Bucket

### Services you use

1. Amazon Comprehend
2. S3

### Task Validation

The task automatically completes once you extract the text sentiment and associated scores for all the images, find the image with the most negative English text sentiment, and submit your answer using 'submit_your_answer' (provided in the notebook).

## Task 4: Count Faces in the Image using Amazon Rekognition!

### Background

Your manager explains that privacy policies differ for images containing faces of people. Unfortunately, this information is not recorded for each image. Your task is to extract the number of faces in each image using Amazon Rekognition and also to find a photo of the "Mount Rushmore National Memorial". Your boss needs it quickly, and as far as he remembers there is no text in that image. He gives you a little history lesson about the memorial and the fact that the sculpture features the heads of four US presidents.

Your job is to leverage the Amazon Rekognition 'detect_faces' API to quickly count the number of faces in each image!

More about the 'detect_faces' API: https://docs.aws.amazon.com/rekognition/latest/dg/API_DetectFaces.html

Boto3 Amazon Rekognition 'detect_faces' API reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/rekognition.html#Rekognition.Client.detect_faces

### Step #1 (YOUR INPUT IS NEEDED): Complete the "extract_num_faces" function

This function uses the Amazon Rekognition "detect_faces" API to extract faces from an image. Once you have the response from the "detect_faces" API, count the number of items under the "FaceDetails" key. The function should return the number of faces in an image. The portion of the function you write shouldn't be more than a few lines of code.
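A minimal sketch of the completed function, again assuming the notebook passes in the bucket name and image key (its exact signature may differ):

```python
import boto3

rekognition = boto3.client('rekognition', region_name='us-east-1')  # use your region

def extract_num_faces(bucket, image_key):
    """Return the number of faces Rekognition detects in an S3-hosted image."""
    response = rekognition.detect_faces(
        Image={'S3Object': {'Bucket': bucket, 'Name': image_key}}
    )
    # Each detected face is one entry in the 'FaceDetails' list.
    return len(response['FaceDetails'])
```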
### Step #2: Run the "extract_num_faces" function on all images

Let's run the "extract_num_faces" function on all images. It is going to extract the number of faces in all images and fill the 'n_faces' column. You don't need to do anything (no coding) other than running the code block under the Task 4 Step 2 subsection. Please make sure the "extract_num_faces" function in Step #1 works as expected before running this step. After it is done, take a look at the "results_df" dataframe to make sure 'n_faces' is populated. The columns for Task 5 should still be empty at this point.

### Step #3 (YOUR INPUT IS NEEDED): Find the image with four faces

Based on the 'n_faces' column of results_df, which image has four faces? You can use the provided code to sort the dataframe by 'n_faces' in descending order. You can then use either the 'index' or the 'image_key' column to extract the image file name. Once you find that image, populate "task4_answer" with the image file name. Your response should be a string in the form of 'xxx.jpg'. Run the 'upload_answer_to_s3' line to upload your answer as JSON to S3. You are done! You'll receive credit in a few minutes if your answer is correct.

### Inventory

1. SageMaker Notebook Instance
2. Partially filled Jupyter Notebook
3. S3 Bucket

### Services you use

1. Amazon Rekognition
2. S3

### Task Validation

The task automatically completes once you extract the number of faces in each image, find the image with four faces, and submit your answer using 'submit_your_answer' (provided in the notebook).

#### Clue 1: Calling the Amazon Rekognition 'detect_faces' API
#### Clue 2: Extracting the number of faces from the Amazon Rekognition API Response
#### Clue 3: Putting it All Together!

## Task 5: Label Images using the Amazon Rekognition 'detect_labels' API

### Background

Now it is your turn to surprise your manager. You think she will be impressed if you can extract labels for all the images automatically!

### Your Task

Label all the images using the Amazon Rekognition "detect_labels" API. This API analyzes an image and returns labels along with their associated confidence scores.

More about the 'detect_labels' API: https://docs.aws.amazon.com/rekognition/latest/dg/labels-detect-labels-image.html

Boto3 Amazon Rekognition 'detect_labels' API reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/rekognition.html#Rekognition.Client.detect_labels

### Step #1 (YOUR INPUT IS NEEDED): Complete the 'extract_image_labels' function

This function uses the Amazon Rekognition 'detect_labels' API to extract labels from an image. Once you have the response from the 'detect_labels' API, extract the top label and its associated confidence score (the top label is the one with the highest confidence score). The function should return these two values. The portion of the function you write shouldn't be more than a few lines of code. Examine the raw response from 'detect_labels' to understand how to extract the top label and the associated confidence score.
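A minimal sketch under the same assumptions as the earlier functions (bucket and key passed in; your notebook's signature may differ):

```python
import boto3

rekognition = boto3.client('rekognition', region_name='us-east-1')  # use your region

def extract_image_labels(bucket, image_key):
    """Return (top_label, confidence) for an S3-hosted image."""
    response = rekognition.detect_labels(
        Image={'S3Object': {'Bucket': bucket, 'Name': image_key}}
    )
    # Pick the label with the highest confidence score.
    top = max(response['Labels'], key=lambda label: label['Confidence'])
    return top['Name'], top['Confidence']
```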
### Step #2: Run the "extract_image_labels" function on all images

Let's run the "extract_image_labels" function on all images. It is going to extract the label and confidence score for all images and fill the 'label' and 'label_conf' columns. You don't need to do anything (no coding) other than running the code block under the Task 5 Step 2 subsection. Please make sure the "extract_image_labels" function in Step #1 works as expected before running this step. After it is done, take a look at the "results_df" dataframe to make sure the 'label' and 'label_conf' columns are populated. All columns should be populated at this point.

### Step #3 (YOUR INPUT IS NEEDED): Find the image containing a bird

Based on the 'label' column of results_df, which image is labeled as "bird" or "animal"? To help, code is provided to sort results_df based on 'label'. You can then use either the 'index' or the 'image_key' column to extract the image file name. Once you find that image, populate "task5_answer" with the image file name. Your response should be a string in the form of 'xxx.jpg'. Run the 'upload_answer_to_s3' line to upload your answer as JSON to S3. You are done! You'll receive credit in a few minutes if your answer is correct.

### Inventory

1. SageMaker Notebook Instance
2. Partially filled Jupyter Notebook
3. S3 Bucket

### Services you use

1. Amazon Rekognition
2. S3

### Task Validation

The task automatically completes once 1) you extract a label for all the images, 2) you find the image with the label 'bird' or 'animal', and 3) you submit your answer using 'submit_your_answer' (provided in the notebook).

#### Clue 1: Calling the Amazon Rekognition 'detect_labels' API
#### Clue 2: Extracting the Label from the Amazon Rekognition 'detect_labels' Response
#### Clue 3: Find the Image with a Bird!

# Challenge 2: PANDA KUNG FU

## Overview

Restaurateur Bo has big dreams of becoming a Kung Fu expert, but he is spending too much time helping run his family noodle shop. He believes that if he can properly understand some customer trends he can reduce costs and save time on ordering. With the extra time and cash he can pursue Kung Fu, working with local Grand Master Wayoog. Can he use SageMaker to help understand his business and follow his dreams?

## Task 1: Valley of Peace

### Background

Bo has been working on a Jupyter notebook (complete with notes). Upload it into SageMaker and start working through his data entry.

### Your Task

Download the Jupyter notebook and upload it into your SageMaker instance. Follow Bo's comments to execute his code. Be mindful of errors.

#### Getting Started

1. Download Bo's Notebook using the address below.
2. Open the Amazon SageMaker console by navigating to the SageMaker service in the AWS Management Console.
3. Under Notebook, choose Notebook instances.
4. Open the "BONOODLEHOUSE-xxxxxxx" instance.
5. Open Jupyter.
6. Upload Bo's Notebook (from step 1) into SageMaker and open it.
7. Start troubleshooting Bo's code. If prompted, the kernel should be set to "conda_python3".

#### Inventory

Bo's Notebook can be downloaded from here: https://aws-jam-challenge-resources.s3.amazonaws.com/panda-kung-fu/BoNoteBook.ipynb

#### Services you should use

SageMaker

#### Task Validation

A cell will execute telling you the input code. When submitting, do not include punctuation and symbols.

#### Clue 1: The secret ingredient of the secret ingredient soup (Error Tracking)
#### Clue 2: There is no secret ingredient. (Walkthrough)

## Task 2: You must believe!

### Background

Bo wishes to better understand his total revenue for the week.

### Your Task

Continue through the Jupyter Notebook and determine the Total Revenue for week 1. Round down; the answer should be a 3-digit number.

### Getting Started

Continue working through Bo's notebook from Task 1. [Bo's Notebook can be downloaded from here.](https://aws-jam-challenge-resources.s3.amazonaws.com/panda-kung-fu/BoNoteBook.ipynb)

### Services you should use

SageMaker

### Task Validation

You need to execute the cells in the notebook and run pandas commands to get the correct sum (a hedged sketch of the pattern follows the clues below). There should be minimal troubleshooting as you reapply code from earlier steps to different variables.

### Clue 1: There are no coincidences in this world (Creating new columns)
### Clue 2: There are no accidents! (Walkthrough)
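Bo's notebook defines the real column names, so treat the following as an illustration only: assuming a week-1 DataFrame with hypothetical 'Price' and 'Quantity' columns, the "create a new column, then sum it" pattern hinted at by Clue 1 looks like this:

```python
import pandas as pd

# Hypothetical stand-in for Bo's week-1 data; his notebook has the real columns.
week1 = pd.DataFrame({
    'Item':     ['Noodles', 'Dumplings', 'Tea'],
    'Price':    [8.50, 6.00, 2.25],
    'Quantity': [40, 35, 60],
})

# Create a revenue-per-item column, then sum it for the weekly total.
week1['Revenue'] = week1['Price'] * week1['Quantity']
total_revenue = int(week1['Revenue'].sum())  # int() truncates, i.e. rounds down
print(total_revenue)  # 685 for this made-up data
```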
## Task 3: If you only do what you can do, you will never be more than who you are.

### Background

Bo is starting his first Kung Fu lessons! He is a very excited panda! However, he messed up some data for week 2. Can you fix it and get the Total Revenue?

### Your Task

Continue through the Jupyter Notebook and determine the Total Revenue for week 2. Round down; the answer should be a 3-digit number.

### Getting Started

Continue working through Bo's notebook from Tasks 1 & 2. [Bo's Notebook can be downloaded from here.](https://aws-jam-challenge-resources.s3.amazonaws.com/panda-kung-fu/BoNoteBook.ipynb)

### Services you should use

SageMaker

### Task Validation

You need to execute the cells in the notebook and run pandas commands to get the correct sum. There are a few items you will need to troubleshoot.

#### Clue 1: and today is a gift... (replacing null values)
#### Clue 2: that's why they call it the present (Walkthrough)

# Challenge 3: ANTARCTIC ICE CAP

## Overview

Dr. X requested the most up-to-date high-resolution satellite image of the Antarctic ice cap from NASA for her research. Unfortunately, due to bandwidth limitations, the image was transmitted in chunks, each chunk capturing only a part of the ice cap. Using computer vision techniques, you need to help Dr. X stitch them together in the proper order to get the complete image.

## Background

Dr. X is trying to combine images she received from NASA to complete an image of the Antarctic ice sheet. She has started the work in a Jupyter notebook, but the work is still unfinished.

## Task

Your task is to download the notebook and follow the step-by-step instructions provided there to complete the image.

## Inventory

You'll need the following resources to complete the task.

Download Dr. X's notebook from here: https://aws-jam-challenge-resources.s3.amazonaws.com/image-stitching/ice-map.ipynb

You'll also need to download the image components that Dr. X was trying to stitch together. Grab them from below:

- https://aws-jam-challenge-resources.s3.amazonaws.com/image-stitching/map_bot_left.jpg
- https://aws-jam-challenge-resources.s3.amazonaws.com/image-stitching/map_bot_right.jpg
- https://aws-jam-challenge-resources.s3.amazonaws.com/image-stitching/map_top_left.jpg
- https://aws-jam-challenge-resources.s3.amazonaws.com/image-stitching/map_top_right.jpg

## Getting Started

Open the Amazon SageMaker console by navigating to the SageMaker service in the AWS Management Console. Under Notebook, choose Notebook instances. Upload Dr. X's notebook and rename it ice-map.ipynb. Open the notebook and follow the instructions therein. You can execute each cell of the notebook by pressing Shift and Enter together. Write code to complete the missing pieces to obtain the final composite image (a hedged sketch of the stitching idea follows below).
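Clue 1 refers to a "join_top_bottom" function in the notebook. Without the notebook's actual code, here is a hedged NumPy sketch of the overall idea, assuming the four chunks are equal-sized quadrants downloaded to the working directory (the notebook's real helper signatures may differ):

```python
import numpy as np
from PIL import Image

# Load the four quadrants as arrays (assumes equal dimensions per quadrant).
top_left  = np.array(Image.open('map_top_left.jpg'))
top_right = np.array(Image.open('map_top_right.jpg'))
bot_left  = np.array(Image.open('map_bot_left.jpg'))
bot_right = np.array(Image.open('map_bot_right.jpg'))

def join_left_right(left, right):
    """Stitch two chunks side by side (concatenate along the width axis)."""
    return np.hstack((left, right))

def join_top_bottom(top, bottom):
    """Stitch two chunks vertically; a plausible shape for the notebook's helper."""
    return np.vstack((top, bottom))

# Assemble the two rows first, then stack them into the composite image.
composite = join_top_bottom(join_left_right(top_left, top_right),
                            join_left_right(bot_left, bot_right))
Image.fromarray(composite).save('ice-map-composite.jpg')
```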
## Task Validation

Follow the instructions in the notebook to upload the image you have created to an S3 bucket for evaluation.

#### Clue 1: Solution to the "join_top_bottom" function
#### Clue 2: Stitch the bottom two images
#### Clue 3: Complete solution

# Challenge 4: ENCRYPT THE DATA LAKE

## Overview

You are a Security Engineer working at MuMuMango on a project to create a data lake to store all the fruits your company sells. Your corporate policy dictates that the sensitive data in your data lake must be encrypted at rest with encryption keys controlled by your organization. You've just discovered that hundreds of objects have been uploaded to S3 without encryption at rest! Your task is to quickly remediate this policy violation to prevent damage to the MuMuMango brand and ensure that future uploads stay within policy.

## Your AWS Account has the following resources:

- A Data Lake S3 bucket with hundreds of objects, some of which are not encrypted.
- A Customer Managed KMS Key that must be used to encrypt at rest all current and future objects in the data lake. (There is only one in the account.)
- [An S3 Inventory](https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inventory.html) file. For this challenge a single inventory CSV file is pre-created for you.
- An IAM Role (FixEncryptionRole) that can and should be used when fixing the encryption error.

## Requirements

Your job is to bring your data lake into compliance with policy by:

1. Encrypting the existing objects in the Data Lake using a scalable method.
2. Configuring the Data Lake S3 bucket so that new objects are encrypted by default with the Customer Managed KMS key (see the sketch below).

Hints:

- Make sure you complete the requirements above, in order.
- While the S3 Console allows you to change the encryption for a small number of objects, this method does not scale to millions or billions of objects.
- The challenge verification will not award full points if you don't use a scalable method to encrypt the objects.
- Make sure you are using the generated KMS key, not the S3 KMS key.
- After you have completed the task, clicking "check my progress" will verify you have completed the challenge.

## Helpful Links

Here are a few links to help you with the challenge:

- [S3 Batch Operations](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/batch-ops.html)
- [Amazon S3 Default Encryption for S3 Buckets](https://docs.aws.amazon.com/AmazonS3/latest/dev/bucket-encryption.html)
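For requirement 1, the scalable route is an S3 Batch Operations Copy job driven by the pre-created inventory CSV, running under the FixEncryptionRole and configured to re-encrypt objects with the customer managed key. Requirement 2 can also be scripted; below is a minimal boto3 sketch, where the bucket name and key ARN are placeholders you would replace with the values from your account:

```python
import boto3

s3 = boto3.client('s3')

# Placeholders: substitute your data lake bucket and Customer Managed KMS key ARN.
BUCKET = 'mumumango-data-lake'
KMS_KEY_ARN = 'arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID'

# Requirement 2: encrypt all *future* uploads by default with the customer managed key.
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        'Rules': [{
            'ApplyServerSideEncryptionByDefault': {
                'SSEAlgorithm': 'aws:kms',
                'KMSMasterKeyID': KMS_KEY_ARN,
            }
        }]
    },
)

# Sanity check: read the configuration back.
print(s3.get_bucket_encryption(Bucket=BUCKET)['ServerSideEncryptionConfiguration'])
```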