Introduction to Python Applications 2025 - Lecture 4
===

###### tags: `Python` `Python and Its Application 2025`

## File I/O
+ `open`
    + Read text file: `FILE = open(filename,'r')`
        + Read the whole file as a string: `FILE.read()`
        + Read the whole file as a list of lines: `FILE.readlines()`
        + Read a single line: `FILE.readline()`
        + Is iterable: `for line in FILE:`
        + Named argument `encoding`
            + `utf8`
            + `big5`
            + `cp950`
            + [Reference](https://docs.python.org/3/library/codecs.html)
            + Usage: `open(filename, 'r', encoding='utf8')`
    + Write text file: `FILE = open(filename,'w')`
        + Print to file: `print(things,file=FILE)`
        + Write a string: `FILE.write(string)`
    + Append to text file: `FILE = open(filename,'a')`
        + Print to file: `print(things,file=FILE)`
        + Write a string: `FILE.write(string)`
+ `close`
    + `FILE.close()`
    + Almost every programmer sometimes forgets to close a file.
+ `with` block
    + Closes the file for you when the block ends, so you cannot forget.
    + ```python3
      with open(filename,'r') as FILE:
          lines = FILE.readlines()
      ```

## [JSON](https://zh.wikipedia.org/wiki/JSON)
+ [Reference](https://docs.python.org/3/library/json.html#module-json)
+ `import json`
+ `dumps`: serialize an object to a JSON string
    + `json.dumps({'name':'MZ','height':177.5,'weight':115})`
+ `dump`: serialize an object to a file
    + ```python3
      with open(filename,'w') as FILE:
          json.dump({'name':'MZ','height':177.5,'weight':115},FILE)
      ```
+ `loads`: parse a JSON string
    + `lst = json.loads('[1,2,3,4,5]')`
+ `load`: parse JSON from a file
    + ```python3
      with open(filename,'r') as FILE:
          obj = json.load(FILE)
      ```
+ Sample
    + Download [data](https://data.gov.tw/dataset/85903) in json format and save as `animal.json`.
    + ```python3=
      import webbrowser, json

      with open('animal.json',encoding='utf8') as a_json:
          content = json.load(a_json)

      cnt = 0
      for data in content:
          # Skip records that lack any of the fields we need.
          if not data.get('animal_colour'): continue
          if not data.get('animal_kind'): continue
          if not data.get('album_file'): continue
          # '黑色' means "black" and '狗' means "dog" in the dataset.
          if data['animal_colour'] == '黑色' and data['animal_kind'] == '狗':
              webbrowser.open(data['album_file'])
              cnt += 1
              # Stop after opening five photos.
              if cnt >= 5: break
      ```

## CSV
+ [Reference](https://docs.python.org/3/library/csv.html)
+ CSV files found in the wild are malformed more often than not.
+ `import csv`
+ `csv.reader`
    + ```python3
      import csv
      # newline='' lets the csv module handle line endings itself.
      with open(csvfile,'r',newline='') as FILE:
          rd = csv.reader(FILE)
          rows = [row for row in rd]
      print(rows)
      ```
+ `csv.writer`
    + ```python3
      import csv
      # Without newline='' Windows writes a blank line after every row.
      with open(csvfile,'w',newline='') as FILE:
          wt = csv.writer(FILE)
          wt.writerow(['Name','Height','Weight'])
          wt.writerow(['MZ',177.5,115])
      ```
+ `csv.DictReader`
    + ```python3
      import csv
      with open(csvfile,'r',newline='') as FILE:
          # The first row is used as the keys of every row dict.
          rd = csv.DictReader(FILE)
          rows = [row for row in rd]
      print(rows)
      ```
+ `csv.DictWriter`
    + ```python3
      import csv
      with open(csvfile,'w',newline='') as FILE:
          fns = ['Name','Height','Weight']
          wt = csv.DictWriter(FILE,fieldnames=fns)
          # Write the header row so that DictReader can read the file back.
          wt.writeheader()
          wt.writerow({'Name':'MZ','Height':177.5,'Weight':115})
      ```

## `requests`: a browser (?)
+ `import requests`: don't forget the trailing `s`
+ URL: Uniform Resource Locator
    + That term is a mouthful, so let's just call it a web address.
+ `result = requests.get('https://www.nycu.edu.tw/')`
    + The address is long, so try copying and pasting it.
    + If the request is rejected, you need to pose as an ordinary browser. First visit [`http://useragentstring.com/`](http://useragentstring.com/) to get your browser's User-Agent string.
    + Build a header that contains the User-Agent: `headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"}`
    + Then fetch the page with `result = requests.get('https://www.nycu.edu.tw/', headers=headers)` instead.
+ `result.text`
    + The content is just a string: `type(result.text)`
    + String processing
        + `str`
        + `import re`: regular expression
+ `result.raise_for_status()`
    + Fetching a web page involves a lot of steps. If the address is wrong, does a browser simply crash on you?
    + `raise_for_status()`: raises `requests.HTTPError` if the response status code is 4xx or 5xx; otherwise it does nothing.
+ Save the content
    + Open a file to save it: `the_file = open('a_name.html', 'wb')`
        + Filename: `a_name.html`
        + Mode: `wb` means "write binary"
            + `the_file.write(chunk)`: write a chunk of bytes
        + Mode: `wt` means "write text"
            + `print(some_str,file=the_file)`: write a string
    + `for chunk in result.iter_content(102400):` iterates over `result` in chunks of up to 102400 bytes
    + Remember to close the file: `the_file.close()`
+ Sample code
    ```python3=
    import requests

    url = input('Input URL: ')
    result = requests.get(url)
    result.raise_for_status()  # stop here on a 4xx/5xx response

    name = input('input filename: ')
    FILE = open(name,'wb')
    # Write the body to disk chunk by chunk.
    for chunk in result.iter_content(102400):
        FILE.write(chunk)
    FILE.close()
    ```
+ Sample code 2: `with`-block (Probably, everybody sometimes forgets to close the file.)
    ```python3=
    import requests

    url = input('Input URL: ')
    result = requests.get(url)
    result.raise_for_status()  # stop here on a 4xx/5xx response

    name = input('input filename: ')
    # The with-block closes the file even if a write fails.
    with open(name,'wb') as FILE:
        for chunk in result.iter_content(102400):
            FILE.write(chunk)
    ```
+ Try downloading some images with the sample code.
    ```python3=
    import requests

    def url_to_file(url,filename):
        """Download `url` and save the raw bytes as `filename`."""
        result = requests.get(url)
        result.raise_for_status()
        with open(filename,'wb') as FILE:
            for chunk in result.iter_content(102400):
                FILE.write(chunk)
    ```
+ URL
    + Almost everything is specified: `https://en.wikipedia.org/w/api.php?action=rsd`
    + Same protocol (protocol-relative): `//en.wikipedia.org/w/api.php?action=rsd`
    + Same address (path from the site root): `/w/api.php?action=rsd`
    + Relative path: `api.php?action=rsd`
+ Task: open all URLs that end with `html` in a Wikipedia page.
    + `import webbrowser`, then use `webbrowser.open(URL)` to open each page
+ Task: Download an image from a webpage.
    + Hint: reuse the `url_to_file` sample code.
    + Hint: find `<img` in `result.text`. You should find a URL attached to a `src=` nearby.
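    + Bonus: Download 3 random images from a webpage
    + Bonus: Download all identifiable images from a webpage

## Open Data
+ [Government Open Data Platform (政府開放資料平台)](https://data.gov.tw/)
+ [Real-time UV monitoring data (紫外線即時監測資料)](https://data.gov.tw/dataset/6076)
    + Task: Write a program that uses the real-time UV monitoring data to find the three locations with the highest UVI over the last three hours.
    + Hint: `requests.get('http://opendata.epa.gov.tw/ws/Data/UV/?$format=json')`
+ [Hsinchu City real-estate actual price registration: sale transactions (新竹市不動產實價登錄資訊-買賣案件)](https://data.gov.tw/dataset/92433)
    + Task: How many transaction records in Hsinchu City are for a building at most five years old, with at least three rooms, and a total price of 6.66 million or less?
    + Task: Among the records that satisfy the conditions above, group by street address and find the 10 addresses with the most transactions.
+ [Animal adoption (動物認領養)](https://data.gov.tw/dataset/85903)
    + Task: Using the open data, find the ten rarest kinds of animal awaiting adoption and the ten rarest coat colours.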
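
The ranking tasks above all follow the same pattern: keep the records you care about, count a field, then sort by the counts. Below is a minimal sketch of that pattern with `collections.Counter`, run on made-up in-memory records rather than on real data. The field names `kind` and `colour` and the helpers `top_n` and `rarest_n` are hypothetical, so inspect the real dataset's keys (for example with `content[0].keys()`) before adapting it.

```python
from collections import Counter

# Made-up sample records; the real datasets use their own field names.
records = [
    {'kind': 'dog', 'colour': 'black'},
    {'kind': 'dog', 'colour': 'white'},
    {'kind': 'cat', 'colour': 'black'},
    {'kind': 'dog', 'colour': 'black'},
    {'kind': 'cat', 'colour': ''},  # empty value, ignored when counting colours
]

def count_field(rows, field):
    """Count the non-empty values of `field` across all rows."""
    return Counter(row[field] for row in rows if row.get(field))

def top_n(rows, field, n):
    """Return the n most common (value, count) pairs for `field`."""
    return count_field(rows, field).most_common(n)

def rarest_n(rows, field, n):
    """Return the n least common (value, count) pairs for `field`."""
    counts = count_field(rows, field)
    return sorted(counts.items(), key=lambda pair: pair[1])[:n]

print(top_n(records, 'colour', 2))   # [('black', 3), ('white', 1)]
print(rarest_n(records, 'kind', 1))  # [('cat', 2)]
```

With real data, `records` would instead come from `json.load` or `result.json()`, and any filtering (age, room count, price) would happen in a list comprehension before counting.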