Introduction to Python Applications 2025 - Lecture 4
===
###### tags: `Python` `Python and Its Application 2025`
## File I/O
+ `open`
    + Read text file: `FILE = open(filename,'r')`
        + Read whole file as a string: `FILE.read()`
        + Read whole file as a list of lines: `FILE.readlines()`
        + Read a single line: `FILE.readline()`
        + Is iterable: `for line in FILE:`
        + Named argument `encoding`
            + `utf8`
            + `big5`
            + `cp950`
            + [Reference](https://docs.python.org/3/library/codecs.html)
            + Usage: `open(filename, 'r', encoding='utf8')`
    + Write text file (an existing file is overwritten): `FILE = open(filename,'w')`
        + Print to file: `print(things,file=FILE)`
        + Write a string: `FILE.write(string)`
    + Append to text file: `FILE = open(filename,'a')`
        + Print to file: `print(things,file=FILE)`
        + Write a string: `FILE.write(string)`
+ `close`
    + `FILE.close()`
    + Almost every programmer sometimes forgets to close a file.
+ `with` block
    + Closes the file automatically when the block ends, so you cannot forget to close it.
    + ```python3
      with open(filename,'r') as FILE:
          lines = FILE.readlines()
      ```
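The modes and the `with` block listed above can be tried together in one short round trip. This is only a minimal sketch: the file name `demo.txt` and the temporary directory are assumptions made for the demo, not part of the lecture material.

```python
import os
import tempfile

# Hypothetical file inside a throwaway directory, used only for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# 'w' creates the file, or empties it if it already exists.
with open(demo_path, 'w', encoding='utf8') as FILE:
    FILE.write('first line\n')        # write() does not add a newline
    print('second line', file=FILE)   # print() adds the newline itself

# 'a' appends to the end instead of overwriting.
with open(demo_path, 'a', encoding='utf8') as FILE:
    FILE.write('第三行\n')

# A file object is iterable and yields one line at a time.
with open(demo_path, 'r', encoding='utf8') as FILE:
    lines = [line.rstrip('\n') for line in FILE]

print(lines)
```

Passing the same `encoding` when writing and reading matters here: without it, Python falls back to a platform-dependent default, which may not be able to handle the Chinese text.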
## [JSON](https://zh.wikipedia.org/wiki/JSON)
+ [Reference](https://docs.python.org/3/library/json.html#module-json)
+ `import json`
+ `dumps`: convert an object to a JSON string
    + `json.dumps({'name':'MZ','height':177.5,'weight':115})`
+ `dump`: convert an object to JSON and write it to a file
    + ```python3
      with open(filename,'w') as FILE:
          json.dump({'name':'MZ','height':177.5,'weight':115},FILE)
      ```
+ `loads`: parse a JSON string
    + `lst = json.loads('[1,2,3,4,5]')`
+ `load`: parse JSON read from a file
    + ```python3
      with open(filename,'r') as FILE:
          obj = json.load(FILE)
      ```
+ Sample
    + Download [data](https://data.gov.tw/dataset/85903) in JSON format and save it as `animal.json`.
    + ```python3=
      import webbrowser, json

      # Load the downloaded adoption records (a list of dicts).
      with open('animal.json',encoding='utf8') as a_json:
          content = json.load(a_json)
      cnt = 0
      for data in content:
          # Skip records whose fields are missing or empty.
          if not data.get('animal_colour'): continue
          if not data.get('animal_kind'): continue
          if not data.get('album_file'): continue
          # '黑色' means black and '狗' means dog.
          if data['animal_colour'] == '黑色' and data['animal_kind'] == '狗':
              webbrowser.open(data['album_file'])
              cnt += 1
              if cnt >= 5: break  # open at most five photos
      ```
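The relationship between `dumps` and `loads` is easiest to see as a round trip. The sketch below is an illustration only; the extra `'colour'` key is an assumption added to show how non-ASCII text behaves.

```python
import json

# The 'colour' entry is a hypothetical addition for this demo.
record = {'name': 'MZ', 'height': 177.5, 'weight': 115, 'colour': '黑色'}

# dumps returns a str; by default non-ASCII characters are escaped.
escaped = json.dumps(record)
# ensure_ascii=False keeps the characters human-readable.
readable = json.dumps(record, ensure_ascii=False)

# loads parses either form back into an equal dict.
restored = json.loads(escaped)

print(escaped)
print(readable)
print(restored == record)
```

When writing readable JSON to a file with `dump`, it is probably wise to open the file with `encoding='utf8'` as well, so the output does not depend on the platform default.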
## CSV
+ [Reference](https://docs.python.org/3/library/csv.html)
+ CSV files in the wild are malformed more often than not, so do not assume every row is clean.
+ `import csv`
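+ `csv.reader`
    + ```python3
      import csv
      # newline='' lets the csv module handle line endings itself.
      with open(csvfile,'r',newline='') as FILE:
          rd = csv.reader(FILE)
          rows = [row for row in rd]  # each row is a list of strings
      print(rows)
      ```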
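+ `csv.writer`
    + ```python3
      import csv
      # Without newline='' the output may contain blank lines on Windows.
      with open(csvfile,'w',newline='') as FILE:
          wt = csv.writer(FILE)
          wt.writerow(['Name','Height','Weight'])  # header row
          wt.writerow(['MZ',177.5,115])            # data row
      ```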
+ `csv.DictReader`
    + ```python3
      import csv
      with open(csvfile,'r',newline='') as FILE:
          rd = csv.DictReader(FILE)   # the first row supplies the keys
          rows = [row for row in rd]  # each row is a dict of strings
      print(rows)
      ```
+ `csv.DictWriter`
    + ```python3
      import csv
      with open(csvfile,'w',newline='') as FILE:
          fns = ['Name','Height','Weight']
          wt = csv.DictWriter(FILE,fieldnames=fns)
          wt.writeheader()  # write the header row so DictReader can read the file back
          wt.writerow({'Name':'MZ','Height':177.5,'Weight':115})
      ```
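One detail that the reader and writer examples above do not show is that everything read from a CSV comes back as a string. The following sketch demonstrates this with a `DictWriter` and `DictReader` round trip; using `io.StringIO` in place of a real file is an assumption made to keep the demo self-contained.

```python
import csv
import io

fns = ['Name', 'Height', 'Weight']

# io.StringIO stands in for a real file opened with newline=''.
buffer = io.StringIO(newline='')
wt = csv.DictWriter(buffer, fieldnames=fns)
wt.writeheader()
wt.writerow({'Name': 'MZ', 'Height': 177.5, 'Weight': 115})

# Rewind and read the same text back.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows)

# Numbers were written, but strings come back: convert them explicitly.
height = float(rows[0]['Height'])
print(height)
```

Because real-world files often contain empty or malformed cells, wrapping such conversions in `try`/`except ValueError` is usually a sensible precaution.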
## `requests`: a browser (?)
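+ `import requests`: don't forget the trailing `s`
+ URL: Uniform Resource Locator
    + That term is a mouthful, so let's just call it a web address.
+ `result = requests.get('https://www.nycu.edu.tw/')`
    + The address is long, so try copying and pasting it.
    + If the request is refused, you may need to make it look like it comes from an ordinary browser. First visit [`http://useragentstring.com/`](http://useragentstring.com/) to get your browser's User-Agent string.
    + Build a header dictionary containing the User-Agent: `headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"}`
    + Then fetch the page with `result = requests.get('https://www.nycu.edu.tw/', headers=headers)`.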
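+ `result.text`
    + The content is simply a string: `type(result.text)`
    + String processing
        + `str`
        + `import re`: regular expression
+ `result.raise_for_status()`
    + Fetching a web page involves many steps. If the URL is wrong, would a browser just crash on you?
    + `raise_for_status()`: if the response carries an HTTP error status (4xx or 5xx), raise an exception now instead of failing later.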
+ Save the content
    + Open a file to save it: `the_file = open('a_name.html', 'wb')`
        + Filename: `a_name.html`
        + Mode: `wb` means "write binary"
            + `the_file.write(chunk)`: write a chunk of bytes
        + Mode: `wt` means "write text"
            + `print(some_str,file=the_file)`: write a string
    + `for chunk in result.iter_content(102400):` iterates over `result` in chunks of up to 102400 bytes
    + Remember to close the file: `the_file.close()`
+ Sample code
    ```python3=
    import requests

    url = input('Input URL: ')
    result = requests.get(url)
    result.raise_for_status()  # stop here if the server returned an error
    name = input('Input filename: ')
    FILE = open(name,'wb')     # binary mode, because chunks are bytes
    for chunk in result.iter_content(102400):
        FILE.write(chunk)
    FILE.close()
    ```
+ Sample code 2: `with` block (everybody probably forgets to close a file sometimes)
    ```python3=
    import requests

    url = input('Input URL: ')
    result = requests.get(url)
    result.raise_for_status()  # stop here if the server returned an error
    name = input('Input filename: ')
    # The with block closes the file even if writing fails.
    with open(name,'wb') as FILE:
        for chunk in result.iter_content(102400):
            FILE.write(chunk)
    ```
+ Try to download some images with the sample code.
    ```python3=
    import requests

    def url_to_file(url,filename):
        """Download url and save the response body as filename."""
        result = requests.get(url)
        result.raise_for_status()
        with open(filename,'wb') as FILE:
            for chunk in result.iter_content(102400):
                FILE.write(chunk)
    ```
+ URL
    + Absolute URL (almost everything is specified): `https://en.wikipedia.org/w/api.php?action=rsd`
    + Same protocol as the current page: `//en.wikipedia.org/w/api.php?action=rsd`
    + Same protocol and host as the current page: `/w/api.php?action=rsd`
    + Relative path, resolved against the current page's location: `api.php?action=rsd`
+ Task: open all URLs which end with `html` in a Wikipedia page.
    + `import webbrowser`, then use `webbrowser.open(URL)` to open each page.
+ Task: Download an image from a web page.
    + Hint: reuse the `url_to_file` sample code.
    + Hint: find `<img` in `result.text`. You should find a URL following a `src=` attribute nearby.
    + Bonus: Download 3 random images from a web page.
    + Bonus: Download all identifiable images from a web page.
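The four URL forms listed above matter for the image task, because a `src=` value found in a page is often not a full address. One possible approach, sketched below, is to pull the values out with `re` and resolve them with `urllib.parse.urljoin` from the standard library. The page address and the HTML snippet are invented for this sketch, and the regular expression is a deliberately simple assumption: it only handles double-quoted `src` attributes.

```python
import re
from urllib.parse import urljoin

# Hypothetical page address and HTML, made up for this sketch.
page_url = 'https://en.wikipedia.org/wiki/Python'
html = '''
<img alt="a" src="https://upload.example.org/a.png">
<img src="//upload.example.org/b.png" width="20">
<img class="logo" src="/static/c.png">
<img src="d.png">
'''

# Capture whatever sits inside the quotes after src= in each <img ...> tag.
srcs = re.findall(r'<img[^>]*?\ssrc="([^"]+)"', html)

# urljoin resolves every form against the address of the page it came from.
full_urls = [urljoin(page_url, src) for src in srcs]
for full_url in full_urls:
    print(full_url)
```

Each resolved address could then be handed to something like `url_to_file`. Real pages are messier than this snippet, so expect to adjust the pattern.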
## Open Data
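+ [Government Open Data Platform (政府開放資料平台)](https://data.gov.tw/)
+ [Real-time UV monitoring data (紫外線即時監測資料)](https://data.gov.tw/dataset/6076)
    + Task: Write a program that uses the real-time UV monitoring data to find the three locations with the highest UV index (UVI) over the last three hours.
        + Hint: `requests.get('http://opendata.epa.gov.tw/ws/Data/UV/?$format=json')`
+ [Hsinchu City real estate actual-price registration, sales cases (新竹市不動產實價登錄資訊-買賣案件)](https://data.gov.tw/dataset/92433)
    + Task: How many transactions in Hsinchu City involve a building at most five years old, with at least three bedrooms, and a total price of at most 6.66 million (666萬)?
    + Task: Among the transactions that satisfy the conditions above, group by street address and find the 10 addresses with the most transactions.
+ [Animal adoption (動物認領養)](https://data.gov.tw/dataset/85903)
    + Task: Using the open data, find the ten rarest kinds of animal awaiting adoption and the ten rarest coat colours.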
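Most of these tasks reduce to the same pattern: drop unusable records, then rank what is left. The sketch below shows that pattern on made-up data. The field names `SiteName` and `UVI`, and the idea that values arrive as strings that may be empty, are assumptions; check the actual JSON before relying on them. It also ignores the "last three hours" condition.

```python
import heapq

# Made-up records; the field names are assumptions about the dataset.
records = [
    {'SiteName': 'A', 'UVI': '7.52'},
    {'SiteName': 'B', 'UVI': ''},       # missing measurement
    {'SiteName': 'C', 'UVI': '9.10'},
    {'SiteName': 'D', 'UVI': '3.00'},
    {'SiteName': 'E', 'UVI': '8.25'},
]

def uvi_of(record):
    """Return the UVI as a float, or None when it is missing or malformed."""
    try:
        return float(record.get('UVI', ''))
    except (TypeError, ValueError):
        return None

# Keep only records with a usable value, then take the three largest.
valid = [r for r in records if uvi_of(r) is not None]
top3 = heapq.nlargest(3, valid, key=uvi_of)
names = [r['SiteName'] for r in top3]
print(names)
```

For counting questions, such as the rarest animal kinds or the busiest addresses, `collections.Counter` from the standard library is a likely fit for the ranking step.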