Try   HackMD

資料格式 CSV、JSON和XML介紹

tags: Python File
2024/04/01 by JohnAxer, 2025/02/21 修訂

投影片 PDF

目前在資料儲存或交換,最常用的三種格式分別是 CSV、JSON、XML,這三種格式都是純文字檔,意思是這三種格式的檔案都可以直接用「記事本」(notepad) 開啟來瀏覽內容。

文字檔最重要的就是文字編碼,常見的有 Unicode(UTF-8)、ANSI(Big5、ASCII) 等。文字編碼不正確,開啟檔案就會看到亂碼。

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

注意圖片右下角,我們可以看到 Windows(CRLF) 和 UTF-8 等檔案格式的說明:

  • Windows(CRLF):使用的換行字元(\r\n),Linux系統都常用 LF(\n) 當換行字元。
  • UTF-8:檔案的文字編碼。

CSV 格式

  • CSV (Comma-Separated Values) 是以「分隔符號」將資料列隔開為數個欄位的純文字檔,一列一筆資料。分隔符號通常使用「,」(注意是英文的逗號),但也可以利用其他的符號來分隔,例如:「:」。
  • CSV 檔可以用試算表軟體開啟,例如:Excel。如果用 Excel 開啟後是亂碼狀態,通常都是編碼解讀不正確,因為 Excel 預設是 ASCII 編碼,最簡單的解決的方式是用記事本開啟後,複製貼上到 Excel 中即可。

CSV 範例

  • 範例1:
    ​​品名,單位,數量,進貨價,出貨價 ​​鉛筆,支,100,2,5 ​​原子筆,支,200,20,30 ​​A4紙,箱,300,500,600 ​​剪刀,支,150,30,60 ​​膠水,瓶,50,15,30
  • 範例2:
    注意:第3欄的英文句子用「"」括住
    ​​Sender,Receiver,Message ​​Alice,Bob,"How are you?" ​​Bob,Alice,"Fine, thank you." ​​Alice,Bob,"I'm glad to hear that." ​​Bob,Alice,"Good, me too." ​​Alice,Bob,"I have to go now." ​​Bob,Alice,"Ok, see you later."

CSV 程式讀寫

  • 讀取 csv

    ​​​​# 讀取 csv 檔案 ​​​​import csv ​​​​file_path = "example.csv" ​​​​with open(file_path, 'r', encoding='utf-8') as f: ​​​​ reader = csv.reader(f) ​​​​ #將 csv 的各列讀入後儲存在 Dictionary 中 ​​​​ #reader = csv.DictReader(f) ​​​​ ​​​​ #方法一 ​​​​ data = list(reader) ​​​​ ​​​​ #方法二 ​​​​ data = [] ​​​​ for row in reader: ​​​​ data.append(row) ​​​​print(data)
  • 寫入 csv

    ​​import csv ​​ ​​file_path = "output.csv" ​​data = [["李白", "唐朝", 25], ["王維", "唐朝", 30], ["杜甫", "唐朝", 40]] ​​with open(file_path, 'w', newline='', encoding='utf-8') as f: ​​ writer = csv.writer(f) ​​ for row in data: ​​ writer.writerow(row) ​​print("CSV file Saved")

JSON 格式

JSON (JavaScript Object Notation) 是一種輕量級的資料定義與交換格式,以 UTF-8編碼。JSON 雖然是因為 JavaScript 而誕生,但卻是獨立於程式語言的資料格式,目前大量應用於 Web API,作為函數參數傳遞及執行結果回傳使用。

JSON 範例

  • 範例1:
{ "name": "Robert", "age": 18, "phone": "0911-223344", "score": [15, 14, 14, 13, 13] }
  • 範例2:
[ { "name": "Robert", "age": 18, "phone": "0911-223344", "score": [15, 14, 14, 13, 13] }, { "name": "Tom", "age": 17, "phone": "0910-221133", "score": [10, 11, 12, 13, 12] } ]

JSON 程式讀寫

  • JSON 字串讀寫
    Image Not Showing Possible Reasons
    • The image was uploaded to a note which you don't have access to
    • The note which the image was originally uploaded to has been deleted
    Learn More →
import json s = ''' { "name": "Robert", "age": 18, "phone": "0911-223344", "score": [15, 14, 14, 13, 13] } ''' a = json.loads(s) # 將 JSON 字串轉成 Python 的資料結構,a 是 dict print(a) # {'name': 'Robert', 'age': 18, 'phone': '0911-223344', 'score': [15, 14, 14, 13, 13]} print(type(a)) # <class 'dict'> print(a["score"]) # [15, 14, 14, 13, 13] print(a["score"][1]) # 14 a["age"] = 19 # 修改 a 的 age 為 19 print(a) # {'name': 'Robert', 'age': 19, 'phone': '0911-223344', 'score': [15, 14, 14, 13, 13]} b = json.dumps(a) # 將 Python 的資料結構轉成 JSON 字串,b 是 str print(type(b)) # <class 'str'>

想想看,下面程式執行後, a 應該是什麼型態的容器? a 的內容呢?

import json s = ''' [ { "name": "Robert", "age": 18, "phone": "0911-223344", "score": [15, 14, 14, 13, 13] }, { "name": "Tom", "age": 18, "phone": "0910-221133", "score": [10, 11, 12, 13, 12] } ] ''' a = json.load(s) print(type(a))
  • JSON 檔案讀寫
    image
import json #讀取一個檔案,檔案名稱 ex1.json,內容如範例1 with open("ex1.json", "r") as file: a = json.load(file) print(a) a["score"][1] = 10 #寫入一個檔案,檔案名稱為 r1.json with open("r1.json", "w") as file: json.dump(a, file, indent=4) # indent:縮排4格

XML 格式

XML (eXtensible Markup Language) 是一種標記式語言,主要以 UTF-8 編碼。XML 通常用於資料定義、交換與儲存。XML 另外有資料格式與型態驗證語言支援,包括 XSD 及 DTD,可以驗證資料格式與型態是否符合規定。

XML 範例

  • 範例
<?xml version="1.0" encoding="utf-8"?> <Products> <Product> <ItemName>鉛筆</ItemName> <Unit></Unit> <Amount>100</Amount> <Purchase>2</Purchase> <Sell>5</Sell> </Product> <Product> <ItemName>原子筆</ItemName> <Unit></Unit> <Amount>200</Amount> <Purchase>20</Purchase> <Sell>30</Sell> </Product> </Products>
  • 每一個 xml 檔案必須且只有一個根元素
  • 每一個元素都有開頭標記及結束標記
  • 元素間的排列必須是 well-formed

image

XML 程式讀寫

  • XML 建立與寫入
import xml.etree.ElementTree as ET root = ET.Element("Products") product = ET.SubElement(root, "Product") name = ET.SubElement(product, "ItemName") name.text = "鉛筆" unit = ET.SubElement(product, "Unit") unit.text = "支" amount = ET.SubElement(product, "Amount") amount.text = "100" purchase = ET.SubElement(product, "Purchase") purchase.text = "10" sell = ET.SubElement(product, "Sell") sell.text = "20" tree = ET.ElementTree(root) tree.write("products.xml", encoding="utf-8", xml_declaration=True) print("XML file has been created.")

範例產生之檔案 products.xml 內容如下

<?xml version="1.0" encoding="utf-8"?> <Products> <Product> <ItemName>鉛筆</ItemName> <Unit></Unit> <Amount>100</Amount> <Purchase>10</Purchase> <Sell>20</Sell> </Product> </Products>
  • XML 讀取與操作
import xml.etree.ElementTree as ET import json #讀取xml檔案 products.xml。 tree = ET.parse("products.xml") products = tree.getroot() print(products.tag) # products for product in products: print(product.tag) # product for item in product: print(item.tag, item.text) print() # 轉換為 dict data = [] for product in products: d = {} for item in product: d[item.tag] = item.text data.append(d) print(data) # data 為一個 list,其內每個元素均為 dict