# Python requests
###### tags: `presentation` `sprout`
資訊之芽 熊育霆 2022/05/22
---
First, clone this repo; we will use it later
https://github.com/bearomorphism/sprout-request
```bash=
git clone https://github.com/bearomorphism/sprout-request.git
```
If you want to commit or push to it as your own project, remember to fork it first
---
## Crawlers
What is a crawler?
* [Web crawler (Wikipedia)](https://zh.wikipedia.org/wiki/%E7%B6%B2%E8%B7%AF%E7%88%AC%E8%9F%B2)
----
### What can a crawler do?
* The web is full of pages packed with very valuable knowledge
    * https://userinyerface.com/
    * For example this one https://longdogechallenge.com/
    * And this one http://www.partridgegetslucky.com/
    * And this one too https://thatsthefinger.com/
* We could go and grab the content of these pages by hand
----
* But there is far too much valuable knowledge out there to grab page by page by hand
    * For example http://eelslap.com/
    * And https://matias.ma/nsfw/
    * There are plenty more here https://theuselessweb.com/
* Most importantly, we must not forget our friend who works so hard carrying that hotpot https://www.youtube.com/watch?v=dQw4w9WgXcQ

Can we write a program to collect all of this extremely valuable information for us?
----
The previous two slides were copied from last year's slides
----
### Things crawlers can do
* Serious: big-data analysis, statistics
* Less serious:
    * Auto-downloading memes
    * Ticket-grabbing bots
    * Downloading comics
    * Downloading all sorts of magical things
        * We are not interested in your particular tastes

Feel free to share the crawler bots you write on GitHub and Discord
----
You can also find bots other people have already written online, to learn from and reference
![](https://i.imgur.com/904KocZ.png)
----
Writing crawlers gets much easier once we add BS4 and Selenium, which we will cover next week.
Think of this class as an introduction (?)
---
## Set up
[Install python requests](https://docs.python-requests.org/en/latest/user/install/#install)
Just like installing any other package:
paste the line below into your terminal (a.k.a. the little black window)
```bash
pip install requests
```
----
![](https://i.smalljoys.me/2018/08/screen-shot-2018-08-27-at-4-48-11-pm.png)
---
## Recall: what are requests?
According to the [IBM Documentation](https://www.ibm.com/docs/en/cics-ts/5.3?topic=protocol-http-requests):
> An HTTP request is made by a client, to a named host, which is located on a server. The aim of the request is to access a resource on the server.
----
Paraphrased in plainer words:
> A **client** sends an **HTTP request** to a named **host** that lives on a **server**; the goal of the request is to access a **resource** on that server.
----
Quick glossary of the jargon (English → 中文)
* HTTP request: HTTP 請求
* client: 用戶端/客戶
* server: 伺服器
* host: 主機
* resource: 資源
----
When you browse the web you are constantly asking servers for data, such as HTML. Your browser is what turns that HTML into something humans can read.
If you have way too much spare time, take a look at [this article](https://github.com/alex/what-happens-when)
----
Recall: [HTTP request methods](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods)
We actually covered this last week too; in short:
* GET fetches data from the server
* POST sends data to the server
----
That said, you could also use GET to send data to the server, or POST to fetch data from it. Weird, right?
If you clicked through to the documentation on the previous slide, you will have noticed there are other HTTP request methods, but in this class we will only use GET and POST
----
Questions?
---
## Python Requests Module
If you prefer reading the official documentation, it is all here:
[English version](https://docs.python-requests.org/en/latest/)
[Chinese version](https://docs.python-requests.org/zh_CN/latest/)
----
Requests is an elegant and simple HTTP library for Python, built for human beings.
The zh_CN docs put it even more boldly: Requests is the only Non-GMO HTTP library for Python, safe for human consumption.
---
## How to use `requests`
[xkcd](https://xkcd.com/) <- we will use this in the exercise
----
I recommend opening an IPython notebook and following along,
or opening the repo you cloned earlier
----
Set up
```python=
import requests
url = 'https://xkcd.com/'
res = requests.get(url) # fetch the site's data with GET
```
----
The variable is called res, short for response: it is whatever the server sends back to you after you fire off a request
![](https://i.imgur.com/Y4hsjk7.png)
----
requests
![](https://i.imgur.com/rV8HgGz.png)
----
Print res and see what you get
```python=
import requests
url = 'https://xkcd.com/'
res = requests.get(url)
print(res) # <Response [200]>
```
----
See which attributes res has to offer
```python=
import requests
url = 'https://xkcd.com/353/'
res = requests.get(url)
print(dir(res))  # lists the attributes and methods of the Response object
```
```python
['__attrs__', '__bool__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__nonzero__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_content', '_content_consumed', '_next', 'apparent_encoding', 'close', 'connection', 'content', 'cookies', 'elapsed', 'encoding', 'headers', 'history', 'is_permanent_redirect', 'is_redirect', 'iter_content', 'iter_lines', 'json', 'links', 'next', 'ok', 'raise_for_status', 'raw', 'reason', 'request', 'status_code', 'text', 'url']
```
----
Read the built-in documentation
```python
help(res)
```
---
### Peeking at res's attributes
Try running these yourself
```python=
print(res.status_code)
print(res.text)
print(res.content)
print(res.ok)
print(res.url)
```
----
```python=
print(res.status_code)
print(res.text)     # the response body as text
print(res.content)  # the raw response body, as bytes
print(res.ok)       # OK or not OK (True if the status code is below 400)
print(res.url)
```
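For reference, this is roughly what to expect for a successful request to https://xkcd.com/353/ (a sketch; the page content itself will of course change over time):
```python
import requests

res = requests.get('https://xkcd.com/353/')

# typical values when the request succeeds (the page content itself will vary)
print(res.status_code)   # 200
print(res.ok)            # True
print(res.url)           # https://xkcd.com/353/
print(res.text[:60])     # first 60 characters of the HTML source
print(len(res.content))  # size of the raw body in bytes
```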
---
### 下載圖片
On the xkcd site we used above, each comic page links to the raw image right below the comic
https://imgs.xkcd.com/comics/python.png
Paste it straight into your browser's address bar and the image will show up
----
After pasting it into the address bar you will see this
![](https://imgs.xkcd.com/comics/python.png)
----
Before we download the image, let's meet a handy library
**[pathlib](https://docs.python.org/3/library/pathlib.html)**
```python=
# create an images folder
from pathlib import Path
Path("./images").mkdir(parents=True, exist_ok=True)
# Reference: https://stackoverflow.com/questions/273192/how-can-i-safely-create-a-nested-directory
```
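A bonus of pathlib worth knowing: paths compose with the `/` operator, so no string concatenation is needed. A tiny sketch:
```python
from pathlib import Path

images_dir = Path("./images")
images_dir.mkdir(parents=True, exist_ok=True)

# the / operator joins path components for you
target = images_dir / "python.png"
print(target)  # images/python.png (on Linux/macOS)
```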
----
Downloading the image with requests
```python=
import requests
# create an images folder
from pathlib import Path
Path("./images").mkdir(parents=True, exist_ok=True)
# Reference: https://stackoverflow.com/questions/273192/how-can-i-safely-create-a-nested-directory
url = 'https://imgs.xkcd.com/comics/python.png'
res = requests.get(url)
with open(Path("./images/python.png"), "wb") as f:
    f.write(res.content)
```
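One thing the snippet above glosses over: if the request fails, res.content will contain an error page rather than an image. A small sketch of guarding against that with raise_for_status(), using the same URL as above:
```python
import requests
from pathlib import Path

Path("./images").mkdir(parents=True, exist_ok=True)

url = 'https://imgs.xkcd.com/comics/python.png'
res = requests.get(url)
res.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response

with open(Path("./images/python.png"), "wb") as f:
    f.write(res.content)
```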
---
## [API](https://zh.wikipedia.org/wiki/%E5%BA%94%E7%94%A8%E7%A8%8B%E5%BA%8F%E6%8E%A5%E5%8F%A3)
First, let's get a sense of what an API is
----
Explained in one picture
![](https://img-comment-fun.9cache.com/media/aAYBA2o/aqxMnkn4_700w_0.jpg)
----
* Frontend: the part you can see; code that runs on your machine (the client), e.g. the rendered web page
    * HTML, JS, CSS
    * Side note: [is HTML a programming language?](https://www.google.com/search?q=Is+html+a+programming+language&oq=Is+html+a+programming+language&aqs=chrome..69i57.10829j0j7&sourceid=chrome&ie=UTF-8)
* Backend: the part you cannot see; code that runs on the server, such as the database
    * Can be written in many other languages: Python, NodeJs, C/C++, PHP, Go ...
* API: the interface through which the frontend and backend communicate; think of it as a waiter/little elf carrying data back and forth between your machine and the server
----
xkcd's API documentation
https://xkcd.com/json.html
----
https://xkcd.com/614/info.0.json
```json=
{
"month": "7",
"num": 614,
"link": "",
"year": "2009",
"news": "",
"safe_title": "Woodpecker",
"transcript": "[[A man with a beret and a woman are standing on a boardwalk, leaning on a handrail.]]\nMan: A woodpecker!\n<<Pop pop pop>>\nWoman: Yup.\n\n[[The woodpecker is banging its head against a tree.]]\nWoman: He hatched about this time last year.\n<<Pop pop pop pop>>\n\n[[The woman walks away. The man is still standing at the handrail.]]\n\nMan: ... woodpecker?\nMan: It's your birthday!\n\nMan: Did you know?\n\nMan: Did... did nobody tell you?\n\n[[The man stands, looking.]]\n\n[[The man walks away.]]\n\n[[There is a tree.]]\n\n[[The man approaches the tree with a present in a box, tied up with ribbon.]]\n\n[[The man sets the present down at the base of the tree and looks up.]]\n\n[[The man walks away.]]\n\n[[The present is sitting at the bottom of the tree.]]\n\n[[The woodpecker looks down at the present.]]\n\n[[The woodpecker sits on the present.]]\n\n[[The woodpecker pulls on the ribbon tying the present closed.]]\n\n((full width panel))\n[[The woodpecker is flying, with an electric drill dangling from its feet, held by the cord.]]\n\n{{Title text: If you don't have an extension cord I can get that too. Because we're friends! Right?}}",
"alt": "If you don't have an extension cord I can get that too. Because we're friends! Right?",
"img": "https://imgs.xkcd.com/comics/woodpecker.png",
"title": "Woodpecker",
"day": "24"
}
```
Take a look at the JSON above: one of the fields gives the URL of the image
----
Combining the API with requests
```python=
import requests
from pathlib import Path
Path("./images").mkdir(parents=True, exist_ok=True)
url = 'https://xkcd.com/614/info.0.json'
res = requests.get(url)
res_json = res.json() # returns a dictionary
print(res_json)
img_url = res_json['img']
print(img_url)
img_name = img_url.split('/')[-1]
with open(Path(f"./images/{img_name}"), "wb") as f:
    img_res = requests.get(img_url)
    f.write(img_res.content)
```
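Once the pattern is clear, nothing stops you from looping over several comics. A minimal sketch (the comic-number range here is just an arbitrary example):
```python
import requests
from pathlib import Path

Path("./images").mkdir(parents=True, exist_ok=True)

for num in range(610, 615):  # arbitrary example range of comic numbers
    res = requests.get(f'https://xkcd.com/{num}/info.0.json')
    if not res.ok:           # skip numbers that do not exist
        continue
    img_url = res.json()['img']
    img_name = img_url.split('/')[-1]
    img_res = requests.get(img_url)
    with open(Path(f"./images/{img_name}"), "wb") as f:
        f.write(img_res.content)
```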
----
You can play around with these two sites (a quick sketch follows the list)
* [JSONPlaceholder - Free Fake REST API](https://jsonplaceholder.typicode.com/)
* [Lorem Picsum](https://picsum.photos/)
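For instance, something like this (the `/todos/1` and `/200` endpoints are taken from each site's front-page examples; double-check their docs if they have changed):
```python
import requests

# JSONPlaceholder serves fake JSON data for testing
todo = requests.get('https://jsonplaceholder.typicode.com/todos/1').json()
print(todo)  # a small dictionary describing a fake todo item

# Lorem Picsum returns a random placeholder image of the requested size
img = requests.get('https://picsum.photos/200')
with open('random.jpg', 'wb') as f:
    f.write(img.content)
```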
----
Most sites will not let you hit their API freely: you will need an API key.
Since the setup is more involved (you have to apply for a token or an account),
we will come back to this at the end of class if there is time. A rough sketch of what a keyed request usually looks like is below.
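As an illustration only: the URL, the header name, and the `Bearer` scheme below are placeholders, not taken from any specific API; each API's docs tell you exactly what it expects.
```python
import requests

API_TOKEN = 'your-token-here'            # hypothetical token you applied for
url = 'https://api.example.com/v1/data'  # hypothetical endpoint

# a common convention: send the key in the Authorization header
headers = {'Authorization': f'Bearer {API_TOKEN}'}
res = requests.get(url, headers=headers)
print(res.status_code)
```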
---
## Practice (5 min)
Open xkcd-api-practice.py and start practicing
---
## Advanced usage
The rest is a bit of a grab bag; you may need it for next week's crawler class.
First, open this site:
https://httpbin.org/
----
### GET query
[Query string - Wikipedia](https://en.wikipedia.org/wiki/Query_string)
----
Recall: the query string is the part after the question mark
![](https://i.imgur.com/Ep2vvNN.png)
----
https://httpbin.org/get
https://httpbin.org/get?a=1&rick=roll
----
```python=
import requests
url = 'https://httpbin.org/get?a=1&rick=roll'
res = requests.get(url)
print(res.text)
```
----
A safer way to do the same thing
```python=
import requests
payload = {'a': 1, 'rick': 'roll'}
url = 'https://httpbin.org/get'
res = requests.get(url, params=payload)
print(res.text)
print(res.url) # https://httpbin.org/get?a=1&rick=roll
```
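Another advantage of `params=`: requests URL-encodes the values for you, so spaces and non-ASCII characters in the query string are not your problem. A small sketch with made-up values:
```python
import requests

payload = {'q': 'rick roll', 'lang': '中文'}
res = requests.get('https://httpbin.org/get', params=payload)

print(res.url)             # requests has encoded the values for you
print(res.json()['args'])  # httpbin echoes the decoded parameters back
```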
---
### POST
```python=
import requests
payload = {'a': 1, 'rick': 'roll'}
res = requests.post('https://httpbin.org/post', data=payload)
print(res.text)
```
----
```json
{
  ...
  "form": {
    "a": "1",
    "rick": "roll"
  },
  ...
}
```
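requests can also send a JSON body instead of form data via the `json=` keyword; httpbin then echoes it back under "json" rather than "form". A quick sketch:
```python
import requests

payload = {'a': 1, 'rick': 'roll'}

# json= serializes the dict and sets the Content-Type header for you
res = requests.post('https://httpbin.org/post', json=payload)
print(res.json()['json'])  # {'a': 1, 'rick': 'roll'}
```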
---
Any Questions?
![](https://i.imgur.com/w2xd7oj.png)
---
## Miscellaneous
* [A boomer-meme LINE bot written by a friend from my department](https://github.com/superr0ng/LINEtools)
* [API List: A public list of free APIs for programmers](https://apilist.fun/)
---
## References
* [Python Requests Tutorial: Request Web Pages, Download Images, POST Data, Read JSON, and More - YouTube](https://www.youtube.com/watch?v=tb8gHvYlCFs&t=1266s&ab_channel=CoreySchafer)
* [Last year's slides](https://drive.google.com/file/d/1j__HsD32Tn987a5w2NwkS8KjObVziemh/view)
---
End
{"metaMigratedAt":"2023-06-17T00:57:09.276Z","metaMigratedFrom":"YAML","title":"Python requests","breaks":true,"slideOptions":"{\"transition\":\"fade\"}","contributors":"[{\"id\":\"f93c8d2e-91fa-44cf-b9d2-ea6d875fcb79\",\"add\":10512,\"del\":671}]"}