# AWS Cloud Learning (Cloud Programming Lab 4-6)
###### tags: `Master` `Cloud Programming`
## Lab4: Multi-tier web server architecture
- Typical web development uses a 3-tier architecture

Communication between tiers goes through a queue / message management system
- Objectives
(1) Get familiar with storage and database services
(2) Build a multi-tier application
(3) Start using the Java SDK and the Eclipse AWS Toolkit
- Working items
Part 1: Simple Storage Service (S3)
Part 2: Relational Database Service (RDS)
Part 3: Simple Queue Service (SQS)
Part 4: Build a batch processing web backend system
### Part 1: S3
- Use cases
Data sharing and access, hosting static website content
- S3 Components
(1) Bucket: created in the selected region, but its name must be globally unique
(2) Object: i.e., a file stored under a bucket; only upload and download are supported
(3) Folder: created under a bucket, but it only exists logically in the namespace. S3 uses the full path as the object key
- Goal:
Use S3 to host **static web pages**, which may also contain client-side scripts.
- checkpoint 4-1 steps
(1) First, create a bucket
(2) Use the bucket to host a website
:::info
To make the bucket public, add the following bucket policy under Permissions:
```
{ "Version":"2012-10-17",
"Statement":[{
"Sid":"PublicReadForGetBucketObjects",
"Effect":"Allow",
"Principal": "*",
"Action":["s3:GetObject"],
"Resource":["arn:aws:s3:::[your-bucket-name]/*" ]
} ]}
```
:::
(3) Upload an HTML file to the bucket
(4) The website can then be reached through its bucket website endpoint,
e.g., http://[your-bucket-name].s3-website-us-east-1.amazonaws.com
(The AWS Route 53 service can additionally be used to point a custom domain name at this endpoint.)
A minimal SDK sketch of these steps is shown below.
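The same checkpoint can be sketched with boto3 (the AWS SDK for Python) instead of the console. This is only an illustrative sketch: the bucket name and file name are placeholders, and depending on account settings the S3 Block Public Access feature may have to be relaxed before the public-read policy is accepted.
```python=
import json
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket = "your-bucket-name"  # placeholder: must be globally unique

# (1) Create the bucket (us-east-1 needs no LocationConstraint)
s3.create_bucket(Bucket=bucket)

# (2) Enable static website hosting and attach the public-read policy shown above
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={"IndexDocument": {"Suffix": "index.html"}},
)
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadForGetBucketObjects",
        "Effect": "Allow",
        "Principal": "*",
        "Action": ["s3:GetObject"],
        "Resource": ["arn:aws:s3:::%s/*" % bucket],
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

# (3) Upload an HTML file
s3.upload_file("index.html", bucket, "index.html",
               ExtraArgs={"ContentType": "text/html"})

# (4) The site is then reachable at the bucket website endpoint
print("http://%s.s3-website-us-east-1.amazonaws.com" % bucket)
```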
### Part 2: RDS
A web service that makes it easier to set up, operate and scale a relational database in the cloud
- checkpoint 4-2 steps
(1) Create an RDS database
(2) Add the database configuration file on Lab-EC2-Webserver
(3) Download the sample PHP page to that server
(4) Open the page at http://[your EC2’s public DNS]/SamplePage.php
--> If you run into problems, check the log messages in /var/log/httpd/error_log
(5) Remember to shut down the RDS instance afterwards
A rough Python sketch of the same connectivity check is shown below.
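The lab itself uses a PHP page, but the connectivity test it performs can be sketched in Python as well. This is an illustrative sketch only: the endpoint, credentials, database name, and the PyMySQL dependency are assumptions, not part of the lab.
```python=
import pymysql  # assumed dependency: pip install pymysql

# Placeholders: copy the endpoint and credentials from the RDS console
conn = pymysql.connect(
    host="your-rds-instance.xxxxxxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="your-password",
    database="labdb",
    connect_timeout=5,
)

with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")  # trivial query just to confirm connectivity
    print("Connected, MySQL version:", cur.fetchone()[0])

conn.close()
```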
### Part 3: SQS
Offers a **reliable, highly-scalable** hosted queue for **storing messages** as they travel between applications or microservices.
- Supports two types of queue: Standard and FIFO

- You can send messages to a queue and receive them from it (see the sketch below)
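A minimal boto3 sketch of sending and receiving one message; the queue name `input` and the region are placeholders matching the later lab steps.
```python=
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

# Create (or reuse) the queue and send a message
queue_url = sqs.create_queue(QueueName="input")["QueueUrl"]
sqs.send_message(QueueUrl=queue_url, MessageBody="https://example.com/image1.jpg")

# Receive up to one message, using long polling
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10)
for msg in resp.get("Messages", []):
    print("Received:", msg["Body"])
    # Messages are not removed automatically; delete after processing
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```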
### Part 4: Build a batch processing APP

- checkpoint 4-3 steps
(1) Create SQS queues
(2) Create an S3 bucket
(3) Create an EC2 instance
(4) Pass user data to the instance; user data can be used to perform common **automated configuration tasks** and even **run scripts** after the instance starts (see the launch sketch after the imageprocessor.py listing below)
(5) In the SQS console, select the input queue and choose "Send a Message"
(a message containing image links, as shown in the lab slides)

(6) After the queue has received several images, the worker merges them and places a result message in the output queue
(7) The merged image should also appear in S3; it is produced by imageprocessor.py and uploaded from there
```python=
import boto
import json
import time
import sys
import getopt
import argparse
import os
import logging
import StringIO
import uuid
import math
import httplib
from boto.sqs.message import RawMessage
from boto.sqs.message import Message
from boto.s3.key import Key

##########################################################
# Connect to SQS and poll for messages
##########################################################
def main(argv=None):
    # Handle command-line arguments for AWS credentials and resource names
    parser = argparse.ArgumentParser(description='Process AWS resources and credentials.')
    parser.add_argument('--input-queue', action='store', dest='input_queue', required=False, default="input", help='SQS queue from which input jobs are retrieved')
    parser.add_argument('--output-queue', action='store', dest='output_queue', required=False, default="output", help='SQS queue to which job results are placed')
    parser.add_argument('--s3-output-bucket', action='store', dest='s3_output_bucket', required=False, default="", help='S3 bucket where list of instances will be stored')
    parser.add_argument('--region', action='store', dest='region', required=False, default="", help='Region that the SQS queues are in')
    args = parser.parse_args()

    # Get region
    region_name = args.region

    # If no region supplied, extract it from instance meta-data
    if region_name == '':
        conn = httplib.HTTPConnection("169.254.169.254", 80)
        conn.request("GET", "/latest/meta-data/placement/availability-zone/")
        response = conn.getresponse()
        region_name = response.read()[:-1]

    info_message('Using Region %s' % (region_name))

    # Set queue names
    input_queue_name = args.input_queue
    output_queue_name = args.output_queue

    # Get S3 endpoint
    s3_endpoint = [region.endpoint for region in boto.s3.regions() if region.name == region_name][0]

    # Get S3 bucket, create if none supplied
    s3_output_bucket = args.s3_output_bucket
    if s3_output_bucket == "":
        s3_output_bucket = create_s3_output_bucket(s3_output_bucket, s3_endpoint, region_name)

    info_message('Retrieving jobs from queue %s. Processed images will be stored in %s and a message placed in queue %s' % (input_queue_name, s3_output_bucket, output_queue_name))

    try:
        # Connect to SQS and open queue
        sqs = boto.sqs.connect_to_region(region_name)
    except Exception as ex:
        error_message("Encountered an error setting SQS region. Please confirm you have queues in %s." % (region_name))
        sys.exit(1)

    try:
        input_queue = sqs.get_queue(input_queue_name)
        input_queue.set_message_class(RawMessage)
    except Exception as ex:
        error_message("Encountered an error connecting to SQS queue %s. Confirm that your input queue exists." % (input_queue_name))
        sys.exit(2)

    try:
        output_queue = sqs.get_queue(output_queue_name)
        output_queue.set_message_class(RawMessage)
    except Exception as ex:
        error_message("Encountered an error connecting to SQS queue %s. Confirm that your output queue exists." % (output_queue_name))
        sys.exit(3)

    info_message("Polling input queue...")

    while True:
        # Get messages
        rs = input_queue.get_messages(num_messages=1)

        if len(rs) > 0:
            # Iterate each message
            for raw_message in rs:
                info_message("Message received...")

                # Parse JSON message (going two levels deep to get the embedded message)
                message = raw_message.get_body()

                # Create a unique job id
                job_id = str(uuid.uuid4())

                # Process the image, creating the image montage
                output_url = process_message(message, s3_output_bucket, s3_endpoint, job_id)

                # Sleep for a while to simulate a heavy workload
                # (Otherwise the queue empties too fast!)
                time.sleep(15)

                output_message = "Output available at: %s" % (output_url)

                # Write message to output queue
                write_output_message(output_message, output_queue)

                info_message(output_message)
                info_message("Image processing completed.")

                # Delete message from the queue
                input_queue.delete_message(raw_message)

        time.sleep(5)

##############################################################################
# Process a newline-delimited list of URLs
##############################################################################
def process_message(message, s3_output_bucket, s3_endpoint, job_id):
    try:
        output_dir = "/home/ec2-user/jobs/%s/" % (job_id)

        # Download images from URLs specified in message
        for line in message.splitlines():
            info_message("Downloading image from %s" % line)
            os.system("wget -P %s %s" % (output_dir, line))

        output_image_name = "output-%s.jpg" % (job_id)
        output_image_path = output_dir + output_image_name

        # Invoke ImageMagick to create a montage
        os.system("montage -size 400x400 null: %s*.* null: -thumbnail 400x400 -bordercolor white -background black +polaroid -resize 80%% -gravity center -background black -geometry -10+2 -tile x1 %s" % (output_dir, output_image_path))

        # Write the resulting image to s3
        output_url = write_image_to_s3(output_image_path, output_image_name, s3_output_bucket, s3_endpoint)

        # Return the output url
        return output_url
    except:
        error_message("An error occurred. Please show this to your class instructor.")
        error_message(sys.exc_info()[0])

##############################################################################
# Write the result of a job to the output queue
##############################################################################
def write_output_message(message, output_queue):
    m = RawMessage()
    m.set_body(message)
    status = output_queue.write(m)

##############################################################################
# Write an image to S3
##############################################################################
def write_image_to_s3(path, file_name, s3_output_bucket, s3_endpoint):
    # Connect to S3 and get the output bucket
    s3 = boto.connect_s3(host=s3_endpoint)
    output_bucket = s3.get_bucket(s3_output_bucket)

    # Create a key to store the instances_json text
    k = Key(output_bucket)
    k.key = "out/" + file_name
    k.set_metadata("Content-Type", "image/jpeg")
    k.set_contents_from_filename(path)
    k.set_acl('public-read')

    # Return a URL to the object
    return "https://%s.s3.amazonaws.com/%s" % (s3_output_bucket, k.key)

##############################################################################
# Verify S3 bucket, create it if required
##############################################################################
def create_s3_output_bucket(s3_output_bucket, s3_endpoint, region_name):
    # Connect to S3
    s3 = boto.connect_s3(host=s3_endpoint)

    # Find any existing buckets starting with 'image-bucket'
    buckets = [bucket.name for bucket in s3.get_all_buckets() if bucket.name.startswith('image-bucket')]
    if len(buckets) > 0:
        return buckets[0]

    # No buckets, so create one for them
    name = 'image-bucket-' + str(uuid.uuid4())
    s3.create_bucket(name, location=region_name)
    return name

##############################################################################
# Use logging class to log simple info messages
##############################################################################
def info_message(message):
    logger.info(message)

def error_message(message):
    logger.error(message)

##############################################################################
# Generic string logging
##############################################################################
class Logger:
    def __init__(self):
        #self.stream = StringIO.StringIO()
        #self.stream_handler = logging.StreamHandler(self.stream)
        self.file_handler = logging.FileHandler('/home/ec2-user/image_processor.log')
        self.log = logging.getLogger('image-processor')
        self.log.setLevel(logging.INFO)
        for handler in self.log.handlers:
            self.log.removeHandler(handler)
        self.log.addHandler(self.file_handler)

    def info(self, message):
        self.log.info(message)

    def error(self, message):
        self.log.error(message)

logger = Logger()

if __name__ == "__main__":
    sys.exit(main())
```
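Step (4) of checkpoint 4-3 passes user data to the worker instance. A hedged boto3 sketch of launching an instance with a user-data script is shown below; the AMI ID, key pair, package names, and the way imageprocessor.py is fetched and started are placeholders and assumptions — the lab hand-out defines the real user data.
```python=
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Example user data: runs as root on first boot (cloud-init).
# Package names and the bootstrap of imageprocessor.py are assumptions.
user_data = """#!/bin/bash
yum install -y ImageMagick python-boto
# fetch imageprocessor.py (source location depends on the lab hand-out)
su ec2-user -c 'python /home/ec2-user/imageprocessor.py --region us-east-1 &'
"""

ec2.run_instances(
    ImageId="ami-xxxxxxxx",      # placeholder AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="your-key-pair",     # placeholder key pair
    UserData=user_data,          # boto3 base64-encodes the user data for you
)
```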
- checkpoint 4-4 steps
(1) Make an AMI snapshot
(2) Create an Auto Scaling launch configuration
(3) Create an Auto Scaling group
(4) Create an alarm (using CloudWatch)
(5) When everything is in place, the result should keep the number of instances between 1 and 5 (a rough SDK sketch of steps (2)-(4) is shown below)
Read imageprocessor.py carefully; it will also be needed in later assignments.
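A hedged boto3 sketch of steps (2)-(4): a launch configuration, an Auto Scaling group capped at 1-5 instances, and a CloudWatch alarm on the depth of the SQS input queue that triggers a scale-out policy. All names, the AMI ID, and the threshold are placeholders, not the lab's required values.
```python=
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# (2) Launch configuration based on the AMI snapshot made in step (1)
autoscaling.create_launch_configuration(
    LaunchConfigurationName="image-worker-lc",
    ImageId="ami-xxxxxxxx",          # placeholder: the AMI from step (1)
    InstanceType="t2.micro",
)

# (3) Auto Scaling group keeping between 1 and 5 instances
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="image-worker-asg",
    LaunchConfigurationName="image-worker-lc",
    MinSize=1,
    MaxSize=5,
    AvailabilityZones=["us-east-1a"],
)

# Scale-out policy: add one instance whenever the alarm fires
policy_arn = autoscaling.put_scaling_policy(
    AutoScalingGroupName="image-worker-asg",
    PolicyName="scale-out-on-backlog",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
)["PolicyARN"]

# (4) CloudWatch alarm on the backlog of the SQS input queue
cloudwatch.put_metric_alarm(
    AlarmName="input-queue-backlog",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "input"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=5,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy_arn],
)
```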
## Lab5: Auto-deployment & DevOps
- DevOps
DevOps (Development + Operations) is a culture that emphasizes the collaboration between software developers (Dev) and IT operations staff (Ops)
- Goal:
(1) Get familiar with the auto-deployment service Elastic Beanstalk
(2) Use the service to deploy frontend and backend systems
- Working items:
Part 1: Introduction of Elastic Beanstalk
Part 2: Deploy a web application using EB
Part 3: Deploy a worker application using EB
Part 4: EB CLI
### Part 1: Introduction of EB
- What is Elastic Beanstalk
It allows you to quickly deploy and manage applications in the AWS Cloud without worrying about the infrastructure that runs those applications

- EB Components
(1) Application
(2) Application Version
(3) **Environment**
(4) Environment Configuration
(5) Configuration Template
- EB Architecture
It supports 2 types of environments
(1) Web server environment
(2) Worker environment
Only one Auto Scaling group is allowed per environment

### Part 2: Deploy a web application using EB
- Web Server Environment Tiers
The environment includes
(1) one elastic **load balancer**
(2) one **auto scaling group** for 1 or more EC2 instances
(3) A **URL aliased in Amazon Route 53** to an Elastic Load Balancing URL
(The AWS Route 53 service is used to point a domain name at your machine's address.)
(4) The software stack running on the AWS EC2 instances depends on the container type
e.g., Apache Tomcat, Node.js, PHP, Python, etc.
(5) A **default security group** that only allows HTTP connections
- Purpose:
A **frontend server** that faces internet users and serves HTTP requests directly
- checkpoint 5-1 steps

(1) Create a DynamoDB table
Modify the configuration file
(eb-node-express/signup/app_config.json)
"STARTUP_SIGNUP_TABLE"
"AWS_REGION"
"NEW_SIGNUP_TOPIC"
--> DynamoDB is encouraged here instead of a relational DB:
a distributed & scalable NoSQL DB for high-throughput, high-volume data
(2) Create an SNS Topic (a rough SDK sketch of steps (1)-(2) is shown after this list)
(3) Deploy the App
Use EB to create the web application.
Page content: new users can sign up; after signing up, an SNS notification email is sent
(4) Change the App Configuration
Goal: The web app uses an environment variable, theme, to control the CSS that is applied. We will change the setting of this environment variable to change the look of the app.
(5) Clean up
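A hedged boto3 sketch of steps (1) and (2). The table name, topic name, key attribute (`email`), and subscription address are assumptions chosen to illustrate the API; use the values your app_config.json ("STARTUP_SIGNUP_TABLE", "NEW_SIGNUP_TOPIC") expects.
```python=
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")
sns = boto3.client("sns", region_name="us-east-1")

# (1) DynamoDB table; assumed to store one item per registered email address
dynamodb.create_table(
    TableName="StartupSignups",   # placeholder, match STARTUP_SIGNUP_TABLE
    KeySchema=[{"AttributeName": "email", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "email", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)

# (2) SNS topic; an email subscription receives a notification for every new signup
topic_arn = sns.create_topic(Name="NewSignupTopic")["TopicArn"]  # match NEW_SIGNUP_TOPIC
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="you@example.com")
```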
### Part 3: Deploy a worker application using EB
Note: the application interacts with HTTP POST requests, not with the SQS service directly
- The environment includes
(1) one auto scaling group for launching EC2 instances
(2) the software stack of a chosen container type for the application
(3) **An Amazon SQS queue** for receiving messages from users
(4) A **daemon** on **each EC2 instance** for **polling requests** from an Amazon SQS queue and then **sending the data to the application** in the worker environment tier by sending an **HTTP POST request locally** to http://localhost/ with the contents of the queue message in the body
(5) **A Dead Letter Queue** for holding the messages that could not be processed successfully; it can be used to analyze failures
:::info
**SQS Daemon (sqsd)**
The conversion of the SQS message to a POST request is executed by what AWS calls the "SQS Daemon" or "sqsd".
This is a simple daemon pre-installed on the worker tier instances that constantly monitors a specific SQS queue (provided by configuration) for new messages. A sketch that simulates this local POST by hand follows this box.
:::
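To see what the daemon delivers to the app, you can simulate its local POST by hand. A minimal sketch using Python's `requests` package (an extra dependency, not part of the lab); port 3000 matches the fallback in server.js shown later in this part, while on a real worker instance nginx forwards to the app on the configured HTTP port.
```python=
import json
import requests  # assumed dependency: pip install requests

# sqsd takes the SQS message body and POSTs it to the local app
message_body = {"url": "https://example.com/some-image.jpg"}  # placeholder URL

resp = requests.post(
    "http://localhost:3000/",           # server.js listens on PORT or 3000
    data=json.dumps(message_body),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.text)
```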
- Purpose
A **batch processing backend server** for handling long-running jobs.
- App architecture
(1) A sample application written in Node.js is provided
(2) This application **fetches an image file from a remote server** and **uploads it to Amazon S3**
(3) An aws-sqsd daemon polls Amazon SQS; when a message is found, it makes an HTTP POST request to the Node.js app via nginx

- checkpoint 5-2 steps
(0) Download the code, which includes the main program server.js
https://github.com/yuta-imai/eb-worker-sample
(1) Create an S3 bucket
(2) Deploy the App
(3) Check the correctness of the deployment
After deployment, SQS will contain two new queues,
one of which is the dead letter queue.
Select the worker queue and send a message containing a picture URL (see the sketch below);
after the message is sent, the picture will appear in S3.
body:
The message itself.
If you pass '{"url":"some_url"}',
it will come in as a string, so parse it as JSON to get a map.
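A minimal boto3 sketch of sending the picture message to the worker queue from step (3). The queue name is a placeholder: Elastic Beanstalk auto-generates the real worker queue name, which you can copy from the SQS console.
```python=
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

# Placeholder: use the auto-generated worker queue name shown in the SQS console
queue_url = sqs.get_queue_url(QueueName="your-worker-queue-name")["QueueUrl"]

# The body is the message itself; the worker parses it as JSON and reads "url"
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"url": "https://example.com/some-image.jpg"}),
)
```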
- server.js (required libraries/packages)
```javascript=
var aws = require('aws-sdk'),
    http = require('http'),
    winston = require('winston'),
    url = require('url'),
    mime = require('mime');
```
1. aws is used for S3
2. http is used to create the HTTP server and to fetch images
3. winston is designed to be a simple and universal logging library with support for multiple transports.
4. The url module provides utilities for URL resolution and parsing.
5. MIME stands for "Multipurpose Internet Mail Extensions".
Simply put, MIME is an internet standard; by setting it you control how a browser opens a file.
- server.js (some variable and logger)
```javascript=
var awsRegion = 'ap-northeast-1',
    bucketName = process.env.BUCKET,
    re_mime_matcher = /image/gi,
    re_filename_extractor = /(\w+\.(png|jpg|gif))$/gi,
    logfile = 'server.log';
var logger = new winston.Logger({
  transports: [
    new winston.transports.File({ filename: logfile })
  ]
});
```
In line 4:
the regex extracts the image file name from the URL
$ --> matches the end of the input
\w --> matches letters, digits, and underscore, equivalent to [A-Za-z0-9_]
- server.js (createServer, returnResponse, validateUrl)
```javascript=
http.createServer(function(req,res){
  req.on('data',function(data){
    try{
      var hash = JSON.parse(data.toString());
    }catch(e){
      return returnResponse(res,400,logger,'Invalid json: ' + data.toString());
    }
    if(!hash.url || !validateUrl(hash.url)){
      return returnResponse(res,400,logger,'Invalid url: ' + JSON.stringify(hash));
    }
    fetchImage(hash.url,function(error,fetchResult){
      if(error){
        return returnResponse(res,400,logger,'Could not fetch image');
      }
      uploadImageToS3(
        bucketName,
        fetchResult.filename,
        fetchResult.body,
        function(error,filename){
          if(error) {
            return returnResponse(res,400,logger,'Could not upload image to S3: ' + JSON.stringify(error));
          }
          returnResponse(res,200,logger,'uploaded: ' + filename);
        }
      );
    });
  });
}).listen(process.env.PORT || 3000);

function returnResponse(httpResponse, status, logger, message){
  if(status === 200){
    logger.info(message);
  }else{
    logger.error(message);
  }
  httpResponse.writeHead(status);
  httpResponse.write(message);
  httpResponse.end();
}

function validateUrl(string){
  var result = url.parse(string);
  if(result.hostname){
    return true;
  }else{
    return false;
  }
}
```
Line 1:
Define a handler function that receives the request and writes the response
Line 2:
req.on
The request object that's passed in to a handler implements the ReadableStream interface.
This stream can be listened to or piped elsewhere just like any other stream.
We can grab the data right out of the stream
by listening to the stream's 'data' and 'end' events.
Line 4:
Because the incoming data is binary, first convert it to a string,
then parse the string into JSON
Line 8:
With the parsed object you can use accessors such as .url to pull out the fields you need to check;
hash.url retrieves the URL, which is validated and then passed to fetchImage
Line 11:
fetchImage(hash.url,function(error,fetchResult){}
function(error,fetchResult) is the callback function of fetchImage;
if fetchImage returns no error, its result is passed to fetchResult
and the flow continues with uploadImageToS3
- server.js (fetchImage, uploadImageToS3)
```javascript=
function fetchImage(url,callback){
  http.get(url,function(res){
    if(!res.headers['content-type'].match(re_mime_matcher)){
      callback('mime_error',null);
    }
    var imageData = '';
    res.setEncoding('binary');
    res.on('data',function(chunk){
      imageData += chunk;
    });
    res.on('end',function(){
      var filename = extractFileName(url);
      var buf = new Buffer(imageData,'binary');
      callback(null,{filename:filename,body:buf})
    });
  });
}

function uploadImageToS3(bucketName,filename,body,callback){
  var s3 = new aws.S3({region:awsRegion});
  var params = {
    ACL: 'public-read',
    Body: body,
    Bucket: bucketName,
    ContentLength: body.length,
    ContentType: mime.lookup(filename),
    Key: filename
  };
  s3.putObject(params,function(error,res){
    if(error)callback(error,null);
    else callback(null,filename);
  });
}
```
Line 8:
chunk is a Buffer, which is Node's way of storing binary data.
Because it's binary data, you need to convert it to a string before accumulating it
- server.js (extractFileName)
```javascript=
function extractFileName(string){
  var filename = string.match(re_filename_extractor);
  return filename[0];
}
```
string.match() is a built-in JavaScript function
that searches a string for matches against a regular expression.
return filename[0]; --> returns the first match