Try   HackMD

AWS架站 專題練習(二)延伸題

建立 AWS EC2 實例:登入 AWS 控制台,建立一個 EC2 實例作為爬蟲程式運行的伺服器。

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

安裝相關套件:在 EC2 實例上安裝 Python、相應的套件(如 BeautifulSoup、mysql-connector-python)以及所需的資料庫軟體(如 MySQL)。

sudo yum update
sudo yum install python3-pip
sudo su
pip3 install newsapi-python
pip3 install mysql-connector-python
pip3 install pymysql
cd /home/
sudo vim test.py

Python程式碼

​​​​import pymysql
​​​​from newsapi import NewsApiClient
​​​​from datetime import datetime, timedelta

​​​​end_date = datetime.now().date()
​​​​start_date = end_date - timedelta(days=31)

​​​​start_date_str = start_date.strftime("%Y-%m-%d")
​​​​end_date_str = end_date.strftime("%Y-%m-%d")

​​​​# 初始化客戶端
​​​​newsapi = NewsApiClient(api_key="6974662d392d421b9b206c87914b5a91")

​​​​# 建立資料庫連線
​​​​db = pymysql.connect(
​​​​        host="localhost", user="root", password="Aaa123456*", database="test"
​​​​        )
​​​​print("進入資料庫 test")

​​​​cursor = db.cursor()



​​​​# 搜尋新聞
​​​​response = newsapi.get_everything(
​​​​            q='台灣',
​​​​            language='zh',
​​​​            sort_by='publishedAt',
​​​​            page_size=100,
​    		domains='udn.com,ettoday.net,storm.mg,chinatimes.com')

​​​​# 顯示篩選後的報導標題和內容
​​​​for article in response["articles"]:
​​​​        print('標題:', article['title'])
​​​​        print('內容:', article['content'])
​​​​        print('---')
​	
​	
​​​​cursor.execute("show tables")
​​​​table_exists = False
​​​​tables = cursor.fetchall()
​​​​for table in tables:
​​​​    if 'news' in table[0]:
​​​​        table_exists = True
​​​​        break

​​​​# 顯示結果
​​​​if table_exists:
​​​​    print("資料表 'news' 已存在")
​	
​​​​else:
​​​​    print("資料表 'news' 不存在")
​    # 執行 SQL 建立資料表
​    create_table_query = """
​    CREATE TABLE news (
​​    id INT AUTO_INCREMENT PRIMARY KEY,
​​    author VARCHAR(255),
​​    title VARCHAR(500),
​    description VARCHAR(1000),
​​    url VARCHAR(1000),
​​    url_to_image VARCHAR(1000),
​​    published_at VARCHAR(255),
​​    content VARCHAR(255)
​     )
​    """
​    cursor.execute(create_table_query)
​​​​# 插入新聞資料
​​​​for article in response["articles"]:
​​​​    source_name = article["source"]["name"]
​​​​    author = article["author"]
​​​​    title = article["title"]
​​​​    description = article["description"]
​​​​    url = article["url"]
​​​​    urlToImage = article["urlToImage"]
​​​​    publishedAt = article["publishedAt"]
​​​​    content = article["content"]

​​​​    # 執行插入資料的 SQL 語句
​​​​    query = "INSERT INTO news (author, title, description, url, url_to_image, published_at, content) VALUES (%s, %s, %s, %s, %s, %s, %s)"
​​​​    values = (author, title, description, url, urlToImage, publishedAt, content)
​​​​    cursor.execute(query, values)

​​​​# 提交變更並關閉資料庫連線
​​​​db.commit()
​​​​db.close()

部署爬蟲程式:將專題二的爬蟲程式上傳到 EC2 實例上,可以使用 SCP 或 SFTP 進行檔案傳輸。

執行結果

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

設定排程:使用 cron 或其他排程工具,在 EC2 實例上設定每日上午 6:00 啟動爬蟲程式。您可以編輯 cron 設定檔(crontab -e)並新增一行指令,例如:

sudo yum install cronie
sudo systemctl start crond
sudo systemctl enable crond

crontab -e

​​​​0 6 * * * python3 /home/test.py

sudo systemctl restart crond

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

請確保您的使用者帳戶擁有足夠的權限來存取該目錄和創建檔案。不然無法儲存json檔:

sudo su
python3 test.py

執行結果

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

SQL部分

若要在 Amazon Linux 上安裝 MariaDB 最新版本:
https://rpmfind.net/linux/rpm2html/search.php?query=mariadb-server

​​​​wget  https://rpmfind.net/linux/openmandriva/cooker/repository/aarch64/main/release/mariadb-server-10.10.4-1-omv2390.aarch64.rpm
​​​​yum -y install mariadb-server-10.10.4-1-omv2390.aarch64.rpm

安裝Mysql8.0版本
最新版本官網
wget https://dev.mysql.com/get/mysql80-community-release-el9-1.noarch.rpm
yum -y install mysql80-community-release-el9-1.noarch.rpm
yum -y install mysql-community-server
ps -ef | grep mysql
啟動
service mysqld start
重啟
service mysqld restart

當我們安裝完畢mysql後,系統會默認生成隨機密碼,我們可以通過以下命令查看,找到初次登錄需要的密碼。

​​​​more /var/log/mysqld.log | grep password
​​​​mysql -uroot -pXXXX

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

登錄之後如果我們想要進行操作,首先要修改密碼。需要注意的是密碼不能設置的太簡單,否者會報錯。密碼規則:至少八位,包括大小寫和字符。

​​​​ALTER USER 'root'@'localhost' IDENTIFIED BY 'Aaa123456*';

補充:編輯 MySQL 的設定檔。通常位於 /etc/my.cnf 或 /etc/mysql/my.cnf。
在 [mysqld] 區塊中新增或修改以下行:

​​​​validate_password=OFF

重啟

​​​​service mysqld restart

sudo systemctl start mariadb
sudo mysql

​​​​GRANT ALL PRIVILEGES ON *.* TO 'admin'@'localhost' IDENTIFIED BY '123456' WITH GRANT OPTION;
​​​​FLUSH PRIVILEGES;
​​​​CREATE DATABASE test;:創建名為 ‘test’ 的資料庫。
​​​​SHOW DATABASES;    //查看資料庫

https://hackmd.io/@9P41NF5dT8C2W8JEN4RMTQ/r1EmjYOL3#在AWS佈署專題二的爬蟲,並且需設定排程,於每日上午600啟動並存入到資料庫

異地備援

  • 請重建架設在雲端的爬蟲於本地端,並設定雲端爬蟲同時會存入一份資料到地端資料庫

因為在教室需要內網穿透套件,所以我們要先下載這個套件

ngork官網下載地點~

以下是安裝程序注意路徑問題

​​​​sudo wget https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz

​​​​sudo tar xvzf ngrok-v3-stable-linux-amd64.tgz -C /usr/local/bin 

之後登入ngork官網,就會自動帶入你的 ‘token’

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

​​​​ngrok config add-authtoken 你的token
​​​​#網頁上會幫你寫好直接複製

​​​​#接著系統會回覆已經儲存至什麼路徑
​​​​Authtoken saved to configuration file: /home/acero/.config/ngrok/ngrok.yml

​​​​#再輸入: (這裡連接到你的資料庫)
​​​​ngrok tcp 3306

之後確認畫面正確~

這就是你的外部連接的 hostname 和 port

​​​​Forwarding                    tcp://0.tcp.ap.ngrok.io:17046 -> localhost:3306

資料庫設定

這裡使用MySQL
sudo apt-get install mysql-server

需要更改你的連線成允許外部連線

​​​​#先找到設定檔位置:
​​​​sudo grep -r bind-address /etc/mysql/*

sudo vim /etc/mysql/mysql.conf.d/mysqld.cnf
將bind-address 改成 0.0.0.0

建立外部連線使用者

sudo vim /etc/mysql/debian.cnf //找到初始帳密登入

​​​​CREATE USER '使用者'@'%' IDENTIFIED BY '密碼';
​​​​GRANT ALL PRIVILEGES ON *.* TO '使用者'@'%';
​​​​FLUSH PRIVILEGES;
​​​​上面那個%的意思就是可以讓外部連線直接進入本地端資料庫囉

測試如何從雲端(外部)連接到你的資料庫囉!

如果可以從遠端連入就可以囉!

​​​​mysql -h 0.tcp.jp.ngrok.io -P 10561 -u sean -p

測試程式碼 - 成功!

​​​​import pymysql

​​​​List = [("Alice", 55), ("Bob", 75), ("Emma", 85), ("John", 95)]

​​​​# 檢查資料庫 NTC 是否存在
​​​​# 如果不存在,建立一個
​​​​# 如果存在,DROP 他,重新創建一個
​​​​# 替這個 NTC 新增一個表單

​​​​db = pymysql.connect(user="使用者", password="密碼",host="0.tcp.jp.ngrok.io",port=17046)

​​​​cursor = db.cursor()
​​​​cursor.execute("show databases like 'ntc'")
​​​​result = cursor.fetchone()

​​​​if result is None:
​​​​    print("創造一個資料庫 ntc")
​​​​    cursor.execute("create database ntc")
​​​​    cursor.execute("show databases")
​​​​else:
​​​​    print("發現存在,刪除並重新建立ntc")
​​​​    cursor.execute("drop database ntc")
​​​​    cursor.execute("create database ntc")
​​​​    cursor.execute("show databases")
​​​​db = pymysql.connect(
​​​​user="使用者", password="密碼",host="0.tcp.jp.ngrok.io",port=17046, database="ntc"
​​​​)

​​​​cursor = db.cursor()

​​​​# 替這個 NTC 新增一個表單
​​​​cursor.execute("create table score(name varchar(10),value int)")
​​​​cursor.execute("show tables")

​​​​# 將這幾筆資料存入 Table 內
​​​​sql = "INSERT INTO score (name, value) VALUES (%s,%s)"
​​​​cursor.executemany(sql, List)
​​​​db.commit()

​​​​print(cursor.rowcount, "OK !")

​​​​# 移除不及格的學生,他下學期不用來了
​​​​sql = "DELETE FROM score WHERE value < 60 "
​​​​cursor.execute(sql)
​​​​db.commit()

​​​​# 查詢
​​​​cursor.execute("SELECT * FROM score")
​​​​result = cursor.fetchall()

​​​​for x in result:
​​​​    print(x)