# <center>E-mail spam classification with Amazon SageMaker on AWS</center>
:::info
### What is SageMaker?
<section> Amazon SageMaker is a managed service on the AWS platform for building, training, and deploying machine learning models.
In this document, we will build a machine learning model that detects whether a received email is spam.
</section>
:::
## Let's Get Started
***
### Creating an Amazon S3 Bucket
In this section, we will create an Amazon S3 bucket as a staging area. Amazon SageMaker uses Amazon S3 as the main storage for both data and model artifacts, although you can also use other sources when loading data into the Jupyter notebook instances.
1. Sign into the AWS Management Console at https://console.aws.amazon.com/
2. In the upper-right corner of the AWS Management Console, confirm you are in the desired AWS region. For this workshop we will use the **US East (N. Virginia) [us-east-1]** region.
3. Open the Amazon S3 console at https://console.aws.amazon.com/s3 or choose the Amazon S3 service in the menu.
4. In the Amazon S3 console, click the Create Bucket button.
5. For the Bucket Name, type the **bucketname** you would like to use and check that it is available. Take note of the bucket name; it will be needed later for loading data in the notebook instance. If the name is already taken, add an extra suffix. Click Next to move to the next screen.
6. Enable versioning of the objects in the bucket as shown in the screen below. This will be required to use AWS CodePipeline in the next sections of the workshop.
7. Click Next and then Next. Finally, click Create Bucket in the Review page.
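The console steps above can also be scripted. The following is a minimal sketch, assuming the boto3 library is installed and AWS credentials are configured; the helper names `bucket_creation_args` and `create_versioned_bucket` are illustrative and not part of the workshop.

```python
def bucket_creation_args(bucket_name, region):
    """Build the keyword arguments for the S3 create_bucket call.

    us-east-1 is the default location and must not be passed as a
    LocationConstraint; every other region has to be stated explicitly.
    """
    args = {"Bucket": bucket_name}
    if region != "us-east-1":
        args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return args


def create_versioned_bucket(s3_client, bucket_name, region="us-east-1"):
    """Create the bucket and enable object versioning on it.

    s3_client is expected to be a boto3 S3 client created by the caller.
    """
    s3_client.create_bucket(**bucket_creation_args(bucket_name, region))
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Enabled"},
    )
```

A call would look like `create_versioned_bucket(boto3.client("s3", region_name="us-east-1"), "<bucketname>")`, with `<bucketname>` replaced by the name chosen in step 5.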
### Creating a Notebook Instance
1. Search for SageMaker in the AWS console and click Notebook instances in the left-hand service menu.
2. To write a program, we need a text editor or an IDE. SageMaker provides notebook instances for this purpose, which we can use to write and run programs. They support various programming languages, but we are going to use Python.
3. Create a notebook instance.

4. Enter the notebook instance name.
5. Choose ml.t2.medium as the notebook instance type.
6. Choose Create a new role in the IAM role dropdown list. Notebook instances require permissions to call other services, including the Amazon SageMaker and Amazon S3 APIs. Choose Specific S3 buckets in the Create an IAM role window and enter the name of the bucket that you created in the previous section. Then click Create role.
:::warning
Note: If you already have a role with the proper grants to access the newly created bucket, there is no need to create a new one. You can choose to use an existing role and select your role from the list.
:::
7. Keep No VPC selected in the VPC dropdown list.
8. Keep No configuration selected in the Lifecycle configuration dropdown list.
9. Keep No Custom Encryption selected in the Encryption key dropdown list.
10. Finally, click on Create notebook instance.
:::warning
Wait until the notebook instance status changes to InService. Then click **Open JupyterLab**.
:::
We are now going to run our program in Jupyter.
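For reference, the same notebook instance could be created and polled from code. This is a sketch under assumptions: `sagemaker_client` is a boto3 SageMaker client (`boto3.client("sagemaker")`) supplied by the caller, and the helper names and polling intervals are made up for illustration.

```python
import time


def notebook_instance_args(name, role_arn, instance_type="ml.t2.medium"):
    """Keyword arguments for create_notebook_instance, mirroring the console choices."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
    }


def wait_until_in_service(sagemaker_client, name, poll_seconds=30, max_polls=40):
    """Poll the instance status until it is InService; return False on timeout."""
    for _ in range(max_polls):
        description = sagemaker_client.describe_notebook_instance(
            NotebookInstanceName=name
        )
        status = description["NotebookInstanceStatus"]
        if status == "InService":
            return True
        if status == "Failed":
            raise RuntimeError(f"Notebook instance {name} failed to start")
        time.sleep(poll_seconds)
    return False
```

The instance itself would be created with `sagemaker_client.create_notebook_instance(**notebook_instance_args(name, role_arn))` before waiting.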
### Download dataset and training code to the notebook instance
For the purpose of this project, the code required to build and train the machine learning model is developed on a Jupyter notebook instance.
In this section we will therefore clone the repository that contains the dataset and the training code into the Amazon SageMaker notebook instance, and open the Jupyter notebook to run training.
You can do this from the Jupyter dashboard.
1. Click on New > Terminal on the right-hand side of the Jupyter interface.
<center><img src="https://i.imgur.com/HGqnRJm.png"></center>
2. Execute the following commands in bash:
`cd SageMaker`
`git clone https://github.com/giuseppeporcelli/smlambdaworkshop`
These commands download the training data and code.
3. When the clone operation completes, close the terminal window and return to the Jupyter landing page. The folder smlambdaworkshop will appear automatically (if not, you can hit the Refresh button).
4. Browse to the folder **smlambdaworkshop** > **training** and open the file **sms_spam_classifier_mxnet.ipynb**
:::info
#### References:
1. Link to the GitHub code repository: https://github.com/giuseppeporcelli/smlambdaworkshop
:::
I took this code from GitHub, but I had to make a few modifications because the framework and its versions have changed. The modified parameters are listed below.
:::danger
#### Important points before running the program:
1. You must choose a kernel. Select the "conda_mxnet_latest_p37" kernel. I also tried the other MXNet kernels, but with them I was not able to load a file from the S3 bucket.
2. Change `<bucketname>` to your Amazon S3 bucket name, e.g., smlambda-workshop2-ashraf-mohammad.
3. Remove the `train_` prefix from the `train_instance_count` and `train_instance_type` arguments, because the newer SDK version throws errors for the old names.
4. Add the lines below next to the hyperparameters in the estimator definition:
`framework_version="1.2",`
`py_version="py3",`
5. Run the command below in the terminal to make sure the pandas package is installed and up to date:
`python -m pip install --upgrade --user pandas`
6. Now run the cells one by one by clicking the run option (Shift+Enter), and **run only until the training job is complete.**
:::
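Points 3 and 4 can be sketched as a small helper that converts the old estimator arguments to the newer names. This is only an illustration: the function name `to_sdk_v2_args` is hypothetical, and the sample values in the usage note are placeholders rather than the values used in the workshop notebook.

```python
# Argument names that lost their "train_" prefix in version 2 of the
# SageMaker Python SDK.
RENAMED_ARGS = {
    "train_instance_count": "instance_count",
    "train_instance_type": "instance_type",
}


def to_sdk_v2_args(v1_args, framework_version="1.2", py_version="py3"):
    """Return a copy of the estimator arguments using the newer names.

    Unknown keys are passed through unchanged; framework_version and
    py_version are added only when the caller has not set them already.
    """
    v2_args = {RENAMED_ARGS.get(key, key): value for key, value in v1_args.items()}
    v2_args.setdefault("framework_version", framework_version)
    v2_args.setdefault("py_version", py_version)
    return v2_args
```

For example, `to_sdk_v2_args({"train_instance_count": 1, "train_instance_type": "ml.c5.2xlarge"})` yields a dictionary with `instance_count`, `instance_type`, `framework_version` and `py_version`, which could then be unpacked into the estimator constructor.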
We will run the remaining steps, which deploy the model and create an endpoint, later, because a deployed endpoint incurs charges in SageMaker.
### Setting up Simple Email Service (SES)
1. In the console search bar, search for SES or Simple Email Service. This takes you to the SES dashboard.
2. You need to verify your identity before using SES. To do so, go to Domains under Identity Management in the service menu.
3. Click Verify a New Domain and enter your domain name (example.com). Click Verify; a popup window then shows a set of "key": "value" pairs.
4. Copy the key-value pairs and add them as records on your Domain Name System (DNS) server.
5. After you have added the values to your DNS, the verification status stays pending for a while. You can send or receive emails only after successful verification.
6. After your domain has been added, you can verify your email address by clicking Verify a New Email Address under Identity Management. SES sends a verification mail to that address.
7. Open your email account and check whether you have received an email from AWS. If so, click the link in it to verify your identity. If not, click Resend the verification email in SES and repeat this step.
:::warning
#### Note:
To send email through SES, we need to verify both the FROM and TO email addresses. Domain verification through DNS is optional for sending email, but it is mandatory for receiving email on SES, so you need a domain to receive email.
If you don't have a domain, create one and follow the process described in the WorkMail section below.
:::
**Example of key-value pairs:**
<center><table>
<tr><th>Type</th><th>Name</th><th>Value</th></tr>
<tr><td>TXT (Text)</td><td>_amazonses.example.com</td><td>gYRnQsP92DUC7zZxxxxxxxxxxxxxxxxxxxxxxxxxxxx=</td></tr>
<tr><td></td></tr>
<tr><td>CNAME</td><td>gYRnQsP92DUC7zZxxxxxxxxxxxxxxxxx._domainkey.example.com</td><td>gYRnQsP92DUC7zZxxxxxxxxxxxxxxxxx.dkim.amazonses.com</td></tr>
<tr><td>CNAME</td><td>gYRnQsP92DUC7zZxxxxxxxxxxxxxxxxx._domainkey.example.com</td><td>gYRnQsP92DUC7zZxxxxxxxxxxxxxxxxx.dkim.amazonses.com</td></tr>
<tr><td>CNAME</td><td>gYRnQsP92DUC7zZxxxxxxxxxxxxxxxxx._domainkey.example.com</td><td>gYRnQsP92DUC7zZxxxxxxxxxxxxxxxxx.dkim.amazonses.com</td></tr>
<tr><td></td></tr>
<tr><td>MX</td><td>example.com</td><td>10 feedback-smtp.&lt;region&gt;.amazonses.com</td></tr>
<tr><td>TXT</td><td>example.com</td><td>v=spf1 include:amazonses.com ~all</td></tr>
</table>
</center>
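To make the shape of these records concrete, here is a small sketch that assembles the verification records from the tokens SES hands out. It assumes the tokens come from the boto3 SES calls `verify_domain_identity` (field `VerificationToken`) and `verify_domain_dkim` (field `DkimTokens`); the function name `ses_dns_records` is illustrative, and the MX record shown is the SES inbound endpoint used for receiving mail, which may differ from what your DNS already holds.

```python
def ses_dns_records(domain, verification_token, dkim_tokens, region="us-east-1"):
    """Return (type, name, value) tuples to create on the DNS server."""
    # Domain ownership record.
    records = [("TXT", f"_amazonses.{domain}", verification_token)]
    # One CNAME per DKIM token, pointing at the SES DKIM host.
    for token in dkim_tokens:
        records.append(
            ("CNAME", f"{token}._domainkey.{domain}", f"{token}.dkim.amazonses.com")
        )
    # Mail exchanger so that SES receives mail sent to the domain.
    records.append(("MX", domain, f"10 inbound-smtp.{region}.amazonaws.com"))
    return records
```

Printing the returned tuples gives a checklist that can be compared line by line against the entries on the DNS server.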
### Creating a WorkMail mailbox
Since I have created a new domain, I want to create an email address for that domain. I use Amazon WorkMail for this, as it is one of the AWS services.
1. Search for WorkMail in the AWS services and click on it.
2. Create an organisation. By default, External domain is selected; if your domain is hosted or created on AWS Route 53, choose accordingly. Enter your domain name (for example, example.com) in the field labelled "Enter your domain"; this is the domain that was verified on SES. Also enter a name for your organisation.
3. Click Next and wait until the status becomes active. After activation you can open the created organisation. Click Create user to add a user, which creates the email address. Select your domain from the dropdown list (WorkMail also creates a default domain of its own), then fill in the username and password and add the user.
4. Once the user is added, make sure it is enabled.
5. Once everything is verified, navigate to Organisation settings in the service menu and click the link provided under Web application.
6. This is the link to the login page of your WorkMail organisation. You can log in with the credentials of the created user and check the mailbox.
7. Now you can proceed to verify the created email address on SES. Once you click Verify, the verification mail arrives in your WorkMail mailbox.
:::warning
#### Note:
If you are **not able to receive mails** sent to the new address created in WorkMail, follow the instructions below.
Navigate to Domains in the service menu and click on your domain name. If any records are highlighted as missing there, copy those key-value pairs and add them on your DNS server. All the parameters under the Mail setup section should be verified.
:::
### Creating a rule set on SES for storing received email
**This section shows how to store received emails in an S3 bucket.**
:::warning
#### Note:
Add a policy to your bucket that allows SES to access it and store the received email objects in it. To do so, paste the code below into your bucket policy.
```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowSESPuts",
            "Effect": "Allow",
            "Principal": {
                "Service": "ses.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::<bucketname>/*",
            "Condition": {
                "StringEquals": {
                    "aws:Referer": "<Your AWS account ID>"
                }
            }
        }
    ]
}
```
Make sure that you replace `<bucketname>` with your bucket name and `<Your AWS account ID>` with your AWS account ID.
:::
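If you prefer to fill in the placeholders programmatically, the policy above can be generated from a small function. This is a sketch: the function names are illustrative, and `s3_client` is assumed to be a boto3 S3 client created by the caller.

```python
import json


def ses_put_policy(bucket_name, account_id):
    """Build the bucket policy shown above with the placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSESPuts",
                "Effect": "Allow",
                "Principal": {"Service": "ses.amazonaws.com"},
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {"StringEquals": {"aws:Referer": account_id}},
            }
        ],
    }


def apply_ses_put_policy(s3_client, bucket_name, account_id):
    """Attach the policy to the bucket; the API expects the policy as a JSON string."""
    s3_client.put_bucket_policy(
        Bucket=bucket_name,
        Policy=json.dumps(ses_put_policy(bucket_name, account_id)),
    )
```

Generating the document this way avoids leaving a stray placeholder in the pasted JSON, which is an easy mistake to make in the console editor.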
1. In SES, navigate to Rule Sets in the service menu and click Create a Rule Set.
2. If you want to store only the emails received at a particular address, enter that email address as the recipient. If you want to store all email received on your domain, leave it blank. Proceed to the next step by clicking Add Recipient.
3. Give your rule a name, such as "Storing-email".
4. Now add the action that SES should perform when it receives an email. Add the S3 action and choose your bucket name from the list. You can create a new bucket if you would like to store the received emails in a separate bucket.
5. Click Next and tick Enabled to activate this rule. You can also add further actions, such as SNS or Lambda, if you want to use the received email in those services.
6. Check and confirm all the parameters and click Create Rule.
7. You can now see the active rule set and its rules in the rule set dashboard.
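The rule created in the steps above corresponds roughly to the following sketch. It assumes `ses_client` is a boto3 SES client in a region that supports email receiving; the helper names and the rule set name passed by the caller are illustrative.

```python
def s3_receipt_rule(rule_name, bucket_name, recipients=None, key_prefix=""):
    """Build the Rule structure for create_receipt_rule with a single S3 action."""
    action = {"BucketName": bucket_name}
    if key_prefix:
        action["ObjectKeyPrefix"] = key_prefix
    rule = {
        "Name": rule_name,
        "Enabled": True,
        "ScanEnabled": True,  # let SES scan incoming mail for spam and viruses
        "Actions": [{"S3Action": action}],
    }
    # Without recipients the rule applies to every verified domain.
    if recipients:
        rule["Recipients"] = list(recipients)
    return rule


def create_active_rule_set(ses_client, rule_set_name, rule):
    """Create the rule set, add the rule to it, and make the rule set active."""
    ses_client.create_receipt_rule_set(RuleSetName=rule_set_name)
    ses_client.create_receipt_rule(RuleSetName=rule_set_name, Rule=rule)
    ses_client.set_active_receipt_rule_set(RuleSetName=rule_set_name)
```

For example, `s3_receipt_rule("Storing-email", "<bucketname>", recipients=["test@example.com"])` describes a rule that stores only the mail sent to that one address.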
:::success
Once you are done, you can test the setup by sending a mail to an address on your verified domain (test@example.com). Alternatively, send a test email from SES: navigate to the verified email addresses or domains in the SES service menu, select the identity you want to send from, and click Send a Test Email. Fill in the To address, subject and body of the email, and click SEND TEST EMAIL.
Check your S3 bucket; it should contain two files. One is the notification file that is created when the rule set is added, and the other is the object file created by storing the received email.
:::
:::danger
If the object file was not created in your S3 bucket:
1. Check the bucket policy.
2. Check the rule set.
3. Make sure that your SES account has been moved out of the sandbox, which means that it is in production. Only then will SES be able to receive emails. To check this status, navigate to the Sending Statistics section under Email Sending in the SES service menu. Please follow the instructions given here to move out of the <a href="https://docs.aws.amazon.com/ses/latest/DeveloperGuide/request-production-access.html">sandbox</a>.
:::
:::info
#### References:
https://docs.aws.amazon.com/ses/latest/DeveloperGuide/receiving-email.html
:::
### Creating an IAM role
So far you may have created several roles, for example for buckets and instances. Here we are going to create a new role for the Lambda function. The Lambda function calls several services, and although we could create a separate policy for each call, that is not good practice. Instead, we add all the policies required to execute our Lambda function to one role.
1. Search for the IAM service on AWS and click Roles in the service menu of the IAM dashboard.
2. Create a new role and give it the name that will be used for the Lambda function.
3. Click Create a custom policy and add the JSON code given here. There is no need to add any tags.
```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "*"
        }
    ]
}
```
4. Navigate to Roles again and open the created role. Click Attach policies and attach the policy you have just created, together with the three policies listed below.
   1. AmazonS3FullAccess -- used to read the bucket where our email is stored.
   2. AWSLambdaBasicExecutionRole -- used to execute the Lambda function. It is mandatory.
   3. AmazonSageMakerFullAccess -- used to invoke the endpoint that serves our model.

Once the policies are attached, they are saved automatically.
### Creating a Lambda Function
Here comes the final part of the project. In this section we are going to write Python code that runs only when an event is triggered, such as an email being stored in the bucket.
1. Navigate to the Lambda service and create a function. Name the Lambda function, choose Use an existing role, and select the role that was created in IAM from the list. Make sure to select Python 3.6 or a higher version.
2. At this point, navigate to the S3 bucket where the emails are stored. In the properties of that bucket, create an event notification for the PUT event (`s3:ObjectCreated:Put`) and select the Lambda function you created from the list at the end of the page. Click Create event. The trigger is now set.
3. Inside your Lambda function you can see the code editor, with the files listed to its left. By default, Lambda creates a handler named "lambda_function.lambda_handler", with the function defined as "lambda_handler".
4. We write our Python code inside this function.
5. The code first needs to fetch the email object stored in the S3 bucket.
6. Extract the subject and body of the email from the object and encode the data in the same way as when testing in our notebook instance, for example with one-hot encoding.
7. Serialize the data into a request body and state its MIME type, because `invoke_endpoint()` expects the body together with its content type. This body is the payload for our endpoint.
8. Call `invoke_endpoint()` and pass all the required parameters in the format described in the AWS documentation.
9. Read the response and decode it; printing the raw result while debugging helps to find the right decoding.
10. Now you can get the predicted label and probability from the result and store all the required data in variables for sending an email.
11. Prepare a message body and subject line from the variables we have processed. Call the SES client function that sends email and pass the required parameters, such as FROM, TO, SUBJECT and BODY.
12. To test the code, create a test event from the built-in S3 Put template, which simulates a PUT event on an S3 bucket.
13. Once this is done, you are ready to test.
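The steps above can be sketched as follows. This is a minimal illustration, not the code used in this project: the function names, `VOCABULARY_LENGTH`, the hashing-based `encode_message` stand-in, and the assumed response fields `predicted_label` and `predicted_probability` are all assumptions that must be adapted to the encoding and output format of your own training notebook. The `s3`, `runtime` and `ses` arguments are expected to be boto3 clients for `s3`, `sagemaker-runtime` and `ses`.

```python
import email
import json
import urllib.parse
import zlib
from email import policy
from email.utils import parseaddr

VOCABULARY_LENGTH = 9013  # assumption: use the vocabulary size from your notebook


def extract_email_fields(raw_email):
    """Return (sender address, subject, plain-text body) of a raw MIME message."""
    message = email.message_from_bytes(raw_email, policy=policy.default)
    body_part = message.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    sender = parseaddr(str(message["From"] or ""))[1]
    return sender, str(message["Subject"] or ""), body.strip()


def encode_message(text, vocabulary_length=VOCABULARY_LENGTH):
    """Stand-in multi-hot encoding: hash every word to one slot of the vector.

    Replace this with the exact encoding that was used to train the model.
    """
    vector = [0.0] * vocabulary_length
    for word in text.lower().split():
        vector[zlib.crc32(word.encode("utf-8")) % vocabulary_length] = 1.0
    return vector


def build_reply(subject, label, probability):
    """Compose the subject and body of the answer sent back to the sender."""
    reply_subject = f"[{label}] {subject}"
    reply_body = f"Your email was classified as {label} with probability {probability:.2%}."
    return reply_subject, reply_body


def classify_stored_email(event, s3, runtime, ses, endpoint_name, reply_from):
    """Handle one S3 put event: fetch the email, classify it, send the verdict."""
    record = event["Records"][0]["s3"]
    key = urllib.parse.unquote_plus(record["object"]["key"])  # keys arrive URL-encoded
    raw_email = s3.get_object(Bucket=record["bucket"]["name"], Key=key)["Body"].read()
    sender, subject, body = extract_email_fields(raw_email)

    payload = json.dumps([encode_message(f"{subject} {body}")])
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=payload
    )
    result = json.loads(response["Body"].read().decode("utf-8"))
    # Assumed response layout; print `result` once to confirm it for your model.
    label = "SPAM" if result["predicted_label"][0][0] == 1 else "HAM"
    probability = result["predicted_probability"][0][0]

    reply_subject, reply_body = build_reply(subject, label, probability)
    ses.send_email(
        Source=reply_from,
        Destination={"ToAddresses": [sender]},
        Message={"Subject": {"Data": reply_subject}, "Body": {"Text": {"Data": reply_body}}},
    )
    return label
```

Inside the Lambda console, `lambda_handler(event, context)` would create the three boto3 clients once and call `classify_stored_email` with the endpoint name and a verified FROM address. Keeping the clients as parameters makes the parsing and reply helpers easy to try out locally without touching AWS.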
### Final Test
1. Open your email account and send an email to the WorkMail address you created, or to the verified email address that was used in the SES rule set.
2. Wait for a minute and reload the inbox. Check whether you have received a reply stating that the email was HAM or SPAM, together with the probability.
<img src="https://i.imgur.com/xUJoOlC.png"/>
:::success
**I hope you have learnt how to build a machine learning model end to end on AWS.
Thank you so much to everyone who guided and helped me with this work.**
**Guided By:
Prof. Chang
Department of Electronics
National Ilan University**
:::