
Scratch and Python 2018 - Python Lecture 4

File I/O

  • open
    • Read text file: FILE = open(filename,'r')
      • Read whole file as a string: FILE.read()
      • Read whole file as a list of lines: FILE.readlines()
      • Read a single line: FILE.readline()
      • The file object is iterable: for line in FILE:
    • Write text file: FILE = open(filename,'w')
      • Print to file: print(things,file=FILE)
      • Write a string: FILE.write(string)
  • close
    • FILE.close()
  • with block
    • Closes the file automatically when the block ends, so you cannot forget to close it.
    • with open(filename,'r') as FILE:
          lines = FILE.readlines()
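The read and write calls above can be tried together in one small script. This is a sketch: the file name `demo.txt` is hypothetical, and it is created in a temporary directory so nothing in your own folder is touched.

```python
import os
import tempfile

# Hypothetical file name, placed in a fresh temporary directory
filename = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# 'w' creates the file (or empties an existing one) for writing text
with open(filename, 'w') as FILE:
    print('first line', file=FILE)   # print adds the '\n' for you
    FILE.write('second line\n')      # write does not add '\n'

with open(filename, 'r') as FILE:
    whole = FILE.read()              # the entire file as one string

with open(filename, 'r') as FILE:
    lines = FILE.readlines()         # list of lines, each still ending in '\n'

with open(filename, 'r') as FILE:
    first = FILE.readline()          # only the first line
    rest = [line for line in FILE]   # iteration continues where readline stopped

print(repr(whole))        # 'first line\nsecond line\n'
print(lines)              # ['first line\n', 'second line\n']
print(repr(first), rest)  # 'first line\n' ['second line\n']
```

All files are closed here without any explicit FILE.close() call, because each one is opened in a with block.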
      

JSON

  • Reference
  • import json
  • dumps
    • json.dumps({'name':'MZ','height':177.5,'weight':115})
  • dump
    • with open(filename,'w') as FILE:
          json.dump({'name':'MZ','height':177.5,'weight':115},FILE)
      
  • loads
    • lst = json.loads('[1,2,3,4,5]')
  • load
    • with open(filename,'r') as FILE:
          obj = json.load(FILE)
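Here is a minimal round trip through all four functions. The s-versions work with strings, and the plain versions work with files. The file name `person.json` is hypothetical.

```python
import json
import os
import tempfile

person = {'name': 'MZ', 'height': 177.5, 'weight': 115}

# dumps: Python object -> JSON string
text = json.dumps(person)
print(text)            # {"name": "MZ", "height": 177.5, "weight": 115}

# loads: JSON string -> Python object
lst = json.loads('[1,2,3,4,5]')
print(lst, type(lst))  # [1, 2, 3, 4, 5] <class 'list'>

# dump / load: the same conversions, but to and from a file
filename = os.path.join(tempfile.mkdtemp(), 'person.json')  # hypothetical file name
with open(filename, 'w') as FILE:
    json.dump(person, FILE)
with open(filename, 'r') as FILE:
    obj = json.load(FILE)
print(obj == person)   # True

# By default non-ASCII text is written as \uXXXX escapes; ensure_ascii=False keeps it readable
print(json.dumps({'SiteName': '臺北'}, ensure_ascii=False))  # {"SiteName": "臺北"}
```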
      

CSV

  • Reference
  • import csv
  • csv.reader
    • import csv
      with open(csvfile,'r',newline='') as FILE:
          rd = csv.reader(FILE)
          rows = [row for row in rd]  # each row is a list of strings

      print(rows)
      
  • csv.writer
    • import csv
      # newline='' is what the csv module expects; without it, Windows may show blank lines between rows
      with open(csvfile,'w',newline='') as FILE:
          wt = csv.writer(FILE)
          wt.writerow(['Name','Height','Weight'])
          wt.writerow(['MZ',177.5,115])
      
  • csv.DictReader
    • import csv
      with open(csvfile,'r',newline='') as FILE:
          rd = csv.DictReader(FILE)
          rows = [row for row in rd]  # each row is a dict keyed by the header row

      print(rows)
      
  • csv.DictWriter
    • import csv
      with open(csvfile,'w',newline='') as FILE:
          fns = ['Name','Height','Weight']
          wt = csv.DictWriter(FILE,fieldnames=fns)
          wt.writeheader()  # write the Name,Height,Weight header row first
          wt.writerow({'Name':'MZ','Height':177.5,'Weight':115})
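The script below writes a file with DictWriter and then reads it back two ways. It shows that every field comes back as a string, so numbers have to be converted yourself. The file name `people.csv` is hypothetical.

```python
import csv
import os
import tempfile

csvfile = os.path.join(tempfile.mkdtemp(), 'people.csv')  # hypothetical file name
fns = ['Name', 'Height', 'Weight']

with open(csvfile, 'w', newline='') as FILE:
    wt = csv.DictWriter(FILE, fieldnames=fns)
    wt.writeheader()                                   # header row: Name,Height,Weight
    wt.writerow({'Name': 'MZ', 'Height': 177.5, 'Weight': 115})

# csv.reader: each row is a plain list, and the header is just the first row
with open(csvfile, 'r', newline='') as FILE:
    rows = list(csv.reader(FILE))
print(rows)  # [['Name', 'Height', 'Weight'], ['MZ', '177.5', '115']]

# csv.DictReader: the header row becomes the keys of every following row
with open(csvfile, 'r', newline='') as FILE:
    records = list(csv.DictReader(FILE))

# Values are strings ('177.5'), so convert before doing arithmetic
height = float(records[0]['Height'])
print(height + 1)  # 178.5
```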
      

requests: a browser (?)

  • import requests: don't forget the s (requests is a third-party package; install it with pip install requests)
  • URL: Uniform Resource Locator
    • That term is a mouthful, so let's just call it a web address (網址).
  • result = requests.get('http://www.nctu.edu.tw/')
    • This URL is long; try copying and pasting it.
  • result.text
    • The content is a string: type(result.text) is str
    • String processing
      • str
      • import re: regular expression
  • result.raise_for_status()
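    • Loading a web page involves many steps. If the URL is wrong, should the browser just crash on you?
    • Raising an exception is fine; wrapping every call in a try-except block would be tedious.
      • Java is tedious in this way: you almost always need try/catch or a throws declaration (Java's throw corresponds to Python's raise).
    • raise_for_status(): if the response has an error status code (4xx or 5xx), raise a requests.HTTPError right now; otherwise do nothing.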
  • Save the content
    • Open a file to save it: the_file = open('a_name.html', 'wb')
      • Filename: a_name.html
      • Mode: wb means "write binary"
        • the_file.write(chunk): write a chunk of bytes
      • Mode: wt means "write text"
        • print(some_str,file=the_file): write a string
    • for chunk in result.iter_content(102400): iterate over the response body in chunks of up to 102400 bytes
    • Remember to close the file: the_file.close()
    • Sample code
      import requests
      url = input('Input URL: ')
      result = requests.get(url)
      result.raise_for_status()
      name = input('input filename: ')
      FILE = open(name,'wb')
      for chunk in result.iter_content(102400):
          FILE.write(chunk)
      FILE.close()
    • Sample code 2: with block (everybody forgets to close a file sometimes)
      import requests
      url = input('Input URL: ')
      result = requests.get(url)
      result.raise_for_status()
      name = input('input filename: ')
      with open(name,'wb') as FILE:
          for chunk in result.iter_content(102400):
              FILE.write(chunk)
    • Try to download some images with the sample code.
      import requests
      def url_to_file(url,filename):
          result = requests.get(url)
          result.raise_for_status()
          with open(filename,'wb') as FILE:
              for chunk in result.iter_content(102400):
                  FILE.write(chunk)
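raise_for_status() means errors are raised where they happen, so a program needs only one try-except at the place where it wants to react. The function below is a sketch of that idea. The name `download` and its True/False return value are assumptions for illustration and are not part of requests. It relies on the fact that requests.HTTPError, ConnectionError, Timeout, and the invalid-URL errors all derive from requests.RequestException.

```python
import requests

def download(url, filename, chunk_size=102400):
    """Hypothetical helper: save url to filename, return True on success, False on any requests error."""
    try:
        result = requests.get(url, timeout=10)
        result.raise_for_status()  # a 4xx/5xx response becomes requests.HTTPError here
    except requests.RequestException as err:
        # one except clause catches bad URLs, network failures and HTTP error codes
        print('Download failed:', err)
        return False
    with open(filename, 'wb') as FILE:
        for chunk in result.iter_content(chunk_size):
            FILE.write(chunk)
    return True
```

For example, download('not-a-url', 'out.html') prints a "No scheme supplied" message and returns False instead of crashing.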

Open Data

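  • Government Open Data Platform (政府開放資料平台)
  • Real-time UV monitoring data (紫外線即時監測資料)
    • Task: Write a program that uses the real-time UV monitoring data to find the three places with the highest UVI.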
      • Hint: requests.get('http://opendata.epa.gov.tw/ws/Data/UV/?$format=json')
      • Sample Code
        import requests, json
        res = requests.get('http://opendata.epa.gov.tw/ws/Data/UV/?$format=json')
        res.raise_for_status()
        data = json.loads(res.text)
        # Skip stations with no reading (empty UVI) and pair each UVI with its site name
        uvi_place = [(float(d['UVI']),d['SiteName']) for d in data if d['UVI']!='']
        uvi_place.sort(reverse=True)  # highest UVI first
        print(uvi_place[:3])
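  • Nationwide B2C e-invoice issuance dataset (全國電子發票B2C開立資料集)
    • Task: Write a program that computes each industry's average transaction value per customer (客單價) from 2014 to 2017, and outputs the result as a CSV file.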
      • Hint: requests.get('http://sip.einvoice.nat.gov.tw/ods-main/ODS308E/download/3886F055-EB77-4DF9-98E2-F3F49A7D3434/1/845E38D0-76D4-4B49-922A-96F41705F175/0/?fileType=csv')
      • Sample Code
        import requests, csv

        def url_to_file(url,filename):
            result = requests.get(url)
            result.raise_for_status()
            with open(filename,'wb') as FILE:
                for chunk in result.iter_content(102400):
                    FILE.write(chunk)

        url = 'http://sip.einvoice.nat.gov.tw/ods-main/ODS308E/download/3886F055-EB77-4DF9-98E2-F3F49A7D3434/1/845E38D0-76D4-4B49-922A-96F41705F175/0/?fileType=csv'
        filename = 'C:\\Users\\user\\Desktop\\task2.csv'
        url_to_file(url, filename)

        # 'utf-8-sig' strips the byte-order mark (BOM) at the start of the file,
        # so the first column is '發票年月' instead of '\ufeff發票年月'
        with open(filename,'r',encoding='utf-8-sig',newline='') as FILE:
            rd = csv.DictReader(FILE)
            rows = [row for row in rd]

        avg = {}
        for row in rows:
            # Keep only 2014 to 2017 (the year is the first four characters of 發票年月)
            if not '2014' <= row['發票年月'][:4] <= '2017': continue
            avg.setdefault(row['行業名稱'],[]).append(float(row['平均客單價']))

        # Output as CSV; 'task2_result.csv' is a hypothetical output file name
        with open('task2_result.csv','w',encoding='utf-8-sig',newline='') as FILE:
            wt = csv.writer(FILE)
            wt.writerow(['行業名稱','平均客單價'])
            for k, v in avg.items():
                wt.writerow([k, sum(v)/len(v)])
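The "top three UVI" step can be separated from the download so it can be tried offline. The sketch below is an alternative to the sort-then-slice approach in the UV sample code: it uses heapq.nlargest, which picks the n largest items without sorting the whole list. The function name `top_uvi` is hypothetical, and it assumes the records are dicts with 'UVI' and 'SiteName' keys, as in the JSON above.

```python
import heapq

def top_uvi(records, n=3):
    """Hypothetical helper: return the n (UVI, SiteName) pairs with the highest UVI.

    Records whose UVI is an empty string (no reading) are skipped.
    """
    readings = ((float(d['UVI']), d['SiteName']) for d in records if d['UVI'] != '')
    return heapq.nlargest(n, readings)  # largest first, like sort(reverse=True)[:n]

# Made-up sample records, only to show the shape of the input
records = [
    {'UVI': '5.2', 'SiteName': 'A'},
    {'UVI': '',    'SiteName': 'B'},
    {'UVI': '8.1', 'SiteName': 'C'},
    {'UVI': '3',   'SiteName': 'D'},
    {'UVI': '7',   'SiteName': 'E'},
]
print(top_uvi(records))  # [(8.1, 'C'), (7.0, 'E'), (5.2, 'A')]
```

With the real data you would call top_uvi(data) after json.loads(res.text).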