Project Exercise (2)

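Using Python (fetch the data via: from newsapi import NewsApiClient; do not use requests.get),
set the newsapi options to retrieve articles that meet the following requirements:
1. The title or body must contain the four characters 武漢肺炎 and must not contain the two characters 外遇
2. Articles may come only from ETtoday, 風傳媒 (storm.mg), 中國時報 (chinatimes.com), and 聯合新聞網 (udn.com)
3. Sort from newest to oldest
4. Show 100 articles per page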

import json
from newsapi import NewsApiClient

# Initialize NewsApiClient (replace the placeholder with your own API key)
newsapi = NewsApiClient(api_key='YOUR_API_KEY')

# Search for news, restricted to the four sites
response = newsapi.get_everything(
    q='台灣',
    language='zh',
    sort_by='publishedAt',
    page_size=100,
    domains='udn.com,ettoday.net,storm.mg,chinatimes.com'
    # To search every source the API covers by default, delete the domains line
)

# Filter the articles: keep only those with no '外遇' in the title or the content
articles = []
for article in response['articles']:
    # A field can be None, so fall back to an empty string
    title = article['title'] or ''
    content = article['content'] or ''
    if '外遇' not in title and '外遇' not in content:
        articles.append(article)

# Print the title and content of each remaining article
for article in articles:
    print('Title:', article['title'])
    print('Content:', article['content'])
    print('---')

# Save the results to a JSON file
with open('news.json', 'w', encoding='utf-8') as file:
    json.dump(articles, file, ensure_ascii=False, indent=4)
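
The exclusion step can also be pulled out into a small helper so it can be checked without calling the API. This is a minimal sketch: filter_articles and the sample records are hypothetical names and data made up for illustration, not part of the newsapi package.

```python
def filter_articles(articles, banned_term):
    """Keep articles whose title and content both lack banned_term."""
    kept = []
    for article in articles:
        # Guard against a missing or None field by falling back to ''.
        title = article.get('title') or ''
        content = article.get('content') or ''
        if banned_term not in title and banned_term not in content:
            kept.append(article)
    return kept


# Made-up sample records shaped like the 'articles' list in the response.
sample = [
    {'title': '武漢肺炎最新消息', 'content': '疫情更新'},
    {'title': '名人外遇', 'content': None},
    {'title': None, 'content': '外遇風波'},
]
filtered = filter_articles(sample, '外遇')
print([article['title'] for article in filtered])
```

An article is kept only when the term is absent from both fields, which is why the condition joins the two checks with and rather than or.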

Output

Bonus

import requests
import json

# API endpoint URL
url = "https://newsapi.org/v2/everything"

# Request parameters
params = {
    "q": "台灣",
    "apiKey": "YOUR_API_KEY",  # Replace with your actual API key
    "pageSize": 100,
    "language": "zh",
    "sortBy": "publishedAt",
    "domains": "udn.com,ettoday.net,storm.mg,chinatimes.com"
}

# Send the GET request
response = requests.get(url, params=params)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the response body as JSON
    data = response.json()

    # Keep only articles with no '外遇' in the title or the content
    articles = []
    for article in data["articles"]:
        # A field can be None, so fall back to an empty string
        title = article["title"] or ""
        content = article["content"] or ""
        if "外遇" not in title and "外遇" not in content:
            articles.append(article)

    # Print the filtered articles
    for article in articles:
        print("Title:", article["title"])
        print("Content:", article["content"])
        print("---")

    # Save the results to a JSON file
    with open("news_requests.json", "w", encoding="utf-8") as file:
        json.dump(articles, file, ensure_ascii=False, indent=4)
else:
    print("Error occurred:", response.status_code)
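
As an alternative to filtering on the client, the must-contain and must-not-contain rules can be put into the q parameter itself, since the NewsAPI documentation describes quoted phrases and the AND / OR / NOT keywords for the everything endpoint. The sketch below only builds the parameter dictionary; build_everything_params is a hypothetical helper, and how well the operators behave with Chinese text has not been verified here.

```python
def build_everything_params(must_have, must_not_have, domains, api_key):
    """Build query parameters for https://newsapi.org/v2/everything."""
    # Quoted phrases are matched exactly; NOT excludes the second phrase.
    query = f'"{must_have}" NOT "{must_not_have}"'
    return {
        'q': query,
        # Strip stray spaces so the domain list is a clean comma-separated string.
        'domains': ','.join(domain.strip() for domain in domains),
        'language': 'zh',
        'sortBy': 'publishedAt',
        'pageSize': 100,
        'apiKey': api_key,
    }


params = build_everything_params(
    '武漢肺炎',
    '外遇',
    ['udn.com', ' ettoday.net', 'storm.mg', 'chinatimes.com'],
    'YOUR_API_KEY',  # placeholder, not a real key
)
print(params['q'])
print(params['domains'])
```

The resulting dictionary can be passed as params to requests.get in place of the hand-written one.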

Output

Project Exercise (3)

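Scraping exercise 3
Use Selenium to scrape every English sentence (e.g. You bet!) (text only) from the page below,
running the browser headless (no window shown), and save the scraped corpus in three forms:

txt
inserted directly into a database
pickle (not covered in the course; work it out yourself)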

Python code

import pickle

import requests
from bs4 import BeautifulSoup

# Send the request and fetch the page
url = 'https://gogakuru.com/english/phrase/genre/180_%E5%88%9D%E7%B4%9A%E3%83%AC%E3%83%99%E3%83%AB.html?layoutPhrase=1&orderPhrase=1&condMovie=0&flow=enSearchGenre&condGenre=180&perPage=50'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Find every sentence element
sentence_elements = soup.find_all('span', class_='font-en')

# Extract the English sentences into a list
sentences = [element.text.strip() for element in sentence_elements]

# Save as a txt file, one sentence per line
with open('./english_sentences.txt', 'w', encoding='utf-8') as file:
    for sentence in sentences:
        file.write(sentence + '\n')

# Save as a pickle file
with open('./sentences.pickle', 'wb') as file:
    pickle.dump(sentences, file)

Running it produces sentences.pickle and english_sentences.txt.
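
To confirm that the pickle file really holds the list, it can be loaded back and compared with the original. A minimal sketch, using a temporary directory and a two-item sample list (the second sentence is made up) instead of the scraped data:

```python
import os
import pickle
import tempfile

# Sample data standing in for the scraped sentences.
sentences = ['You bet!', 'Sounds good.']

# Write to a temporary directory so the sketch leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), 'sentences.pickle')
with open(path, 'wb') as file:
    pickle.dump(sentences, file)

# pickle files are binary, so they must be opened with 'rb' to read them back.
with open(path, 'rb') as file:
    loaded = pickle.load(file)

print(loaded == sentences)
```

Unlike the txt file, the pickle preserves the Python list structure, so no splitting on newlines is needed after loading.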

VirtualBox Exercise

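On Linux, regular (non-root) users are allowed to access and work in certain directories. Some common ones:

/home: holds users' personal directories; each user has their own and has full access to it.
/tmp: a directory for temporary files; regular users can create, edit, and delete their own temporary files there.
/var/tmp: another directory for temporary files; regular users can create, edit, and delete their own temporary files there.
/var/log: holds system log files; regular users can read some of the logs but usually cannot edit or delete them.
/usr/local: for locally installed software; regular users can read and run what is installed there, but on most distributions writing to it requires root.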
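
Actual permissions depend on the distribution and on the account in use, so it can help to check them directly. The sketch below uses os.access to report what the current user can do in each directory; access_report is a hypothetical helper name, and the extra scratch directory is only there so the report has one entry that is certain to be writable.

```python
import os
import tempfile


def access_report(paths):
    """Report, for the current user, whether each path exists and is readable/writable."""
    report = {}
    for path in paths:
        report[path] = {
            'exists': os.path.exists(path),
            'read': os.access(path, os.R_OK),
            'write': os.access(path, os.W_OK),
        }
    return report


# A fresh temporary directory owned by the current user.
scratch = tempfile.mkdtemp()
report = access_report(
    ['/home', '/tmp', '/var/tmp', '/var/log', '/usr/local', scratch]
)
for path, flags in report.items():
    print(path, flags)
```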

Step two: set up a database on Linux in VirtualBox, then use a Python program to create a table and import the data.

cd /var/lib/: change into the /var/lib/ directory.
sudo vim test.py: create or edit test.py with the Vim text editor.
sudo apt update: refresh the package lists.
sudo apt install mariadb-server: install the MariaDB server.
sudo apt-get install python3-pip: install pip, the package manager for Python 3.
sudo systemctl start mariadb: start the MariaDB server.

ps -ef | grep mysql: list the running MySQL processes.

SQL

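sudo mysql: open the MySQL client as root.

GRANT ALL ON *.*  TO 'admin'@'localhost' IDENTIFIED BY '123456' WITH GRANT OPTION;: grant all privileges to the 'admin'@'localhost' user and set its password to '123456'.
FLUSH PRIVILEGES;: reload the privilege tables.
CREATE DATABASE test;: create a database named 'test'.
SHOW DATABASES;: list all databases.
status;: show status information for the MariaDB server.

These commands set up and start the MariaDB server and create a database named 'test' in it. Adjust or extend them as needed.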

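sudo apt install net-tools: install the net-tools package, which provides networking tools such as ifconfig.

ifconfig: show the configuration and status of the network interfaces.
sudo vim test.py: create or edit test.py with the Vim text editor. Because the file lives under /var/lib/, sudo is required.

sudo su: switch to the root user.
ip addr show: show the configuration and status of the network interfaces.
pip3 install beautifulsoup4: install the beautifulsoup4 package, a library for parsing HTML and XML.

pip3 install mysql-connector-python: install the mysql-connector-python package, a library for connecting to and working with MySQL databases.

vim test.py: create or edit test.py with the Vim text editor.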

Python code

import requests
from bs4 import BeautifulSoup
import mysql.connector

base_url = 'https://gogakuru.com/english/phrase/genre/180_%E5%88%9D%E7%B4%9A%E3%83%AC%E3%83%99%E3%83%AB.html?pageID='
last_url = '&perPage=50&flow=enSearchGenre&condGenre=180'
sentences = []

# Loop over the pages (1 through 199)
for page_count in range(1, 200):
    url = base_url + str(page_count) + last_url
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Parse the page with BeautifulSoup and find the elements holding English sentences
    sentence_elements = soup.find_all('span', class_='font-en')
    page_sentences = [element.text.strip() for element in sentence_elements]
    sentences.extend(page_sentences)

# Write the sentences to a file, one per line
with open('./english_sentences.txt', 'w', encoding='utf-8') as file:
    for sentence in sentences:
        file.write(sentence + '\n')

# Open the database connection
conn = mysql.connector.connect(
    host='127.0.0.1',   # host name or IP address
    user='admin',       # user name
    password='123456',  # user password
    database='test'     # database name
)
cursor = conn.cursor()

# Create the table
cursor.execute('''CREATE TABLE IF NOT EXISTS sentences
                (id INT AUTO_INCREMENT PRIMARY KEY,
                sentence TEXT)''')

# Read the sentences back from the txt file
with open('./english_sentences.txt', 'r', encoding='utf-8') as file:
    sentences = [line.strip() for line in file]

# Insert each sentence into the table using a parameterized query
for sentence in sentences:
    cursor.execute("INSERT INTO sentences (sentence) VALUES (%s)", (sentence,))

# Commit the changes and close the connection
conn.commit()
cursor.close()
conn.close()
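
The page loop above requests all 199 pages whether or not they contain anything. One possible refinement is to stop at the first page that yields no sentences. The sketch below shows only that control flow, with the network call replaced by a stub: collect_sentences and fake_fetch are hypothetical names, and it assumes (unverified) that the site returns a page with no font-en elements once the last real page is passed.

```python
def collect_sentences(fetch_page, max_pages=199):
    """Call fetch_page(1), fetch_page(2), ... and stop at the first empty page."""
    sentences = []
    for page in range(1, max_pages + 1):
        page_sentences = fetch_page(page)
        if not page_sentences:
            # Nothing on this page: assume the listing has ended.
            break
        sentences.extend(page_sentences)
    return sentences


# Stub standing in for the requests + BeautifulSoup step; two pages of made-up data.
fake_site = {1: ['You bet!'], 2: ['No way.']}
calls = []


def fake_fetch(page):
    calls.append(page)
    return fake_site.get(page, [])


result = collect_sentences(fake_fetch)
print(result)
print(calls)
```

In the real script, fetch_page would wrap the request and the find_all call and return the stripped sentence list for one page.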

Running the Python script from Linux

python3 test.py

After it runs successfully
mysql
use test

List the tables:
SHOW TABLES;
Show the table contents:
SELECT * FROM sentences;
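
The same check can be scripted from Python. The sketch below swaps in the standard-library sqlite3 module with an in-memory database so that it runs without a MariaDB server; it is not the mysql.connector code used above. The table mirrors the sentences table, the sample rows are made up, and the placeholder is ? here where mysql.connector uses %s.

```python
import sqlite3

# Made-up sample rows standing in for the scraped sentences.
sentences = ['You bet!', 'No way.']

# In-memory SQLite database used as a stand-in for the MariaDB 'test' database.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute(
    'CREATE TABLE IF NOT EXISTS sentences '
    '(id INTEGER PRIMARY KEY AUTOINCREMENT, sentence TEXT)'
)

# executemany runs one parameterized INSERT per tuple in the list.
cursor.executemany(
    'INSERT INTO sentences (sentence) VALUES (?)',
    [(sentence,) for sentence in sentences],
)
conn.commit()

# Verify the import: row count first, then the rows themselves.
cursor.execute('SELECT COUNT(*) FROM sentences')
row_count = cursor.fetchone()[0]
cursor.execute('SELECT id, sentence FROM sentences ORDER BY id')
rows = cursor.fetchall()
conn.close()

print(row_count)
print(rows)
```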
