
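###### tags: `Python`

# Tuples whose elements can be accessed by field name

If you have used [localtime()](https://docs.python.org/3/library/time.html?highlight=process_time#time.localtime) from the [time](https://docs.python.org/3/library/time.html) module, you may have noticed that it returns a [struct_time](https://docs.python.org/3/library/time.html?highlight=process_time#time.struct_time):

```python
>>> import time
>>> t1 = time.localtime()
>>> t1
time.struct_time(tm_year=2023, tm_mon=1, tm_mday=28, tm_hour=17, tm_min=26, tm_sec=48, tm_wday=5, tm_yday=28, tm_isdst=0)
>>> type(t1)
<class 'time.struct_time'>
```

This object is unusual. It can be used like a tuple, with elements retrieved by index:

```python
>>> t1[0]
2023
```

It also lets you retrieve each element by field name. Element 0, for example, is named "tm_year":

```python
>>> t1.tm_year
2023
```

## Named tuples

A tuple whose elements can be accessed by field name is called a **named tuple**. It attaches a descriptive name to each element of the tuple. With the struct_time object above, it is easy to tell which element is the year and which is the hour, so you never mix up the order. If you want to create objects like this yourself, there are two ways: the [namedtuple() function](https://docs.python.org/3/library/collections.html#collections.namedtuple) in the [collections module](https://docs.python.org/3/library/collections.html#module-collections), or the [NamedTuple class](https://docs.python.org/3/library/typing.html#typing.NamedTuple) in the [typing module](https://docs.python.org/3/library/typing.html#module-typing).

### Building a named tuple with the namedtuple function

namedtuple() is a **factory function**, that is, a function that builds new objects to order. What namedtuple() builds is a new class, which you then use to create named tuples. Suppose we want a 2-element tuple that represents a point on the plane. We can define the new class with namedtuple() like this:

```python
>>> import collections
>>> P = collections.namedtuple(
...     'Point',
...     ['X', 'Y']
... )
```

The first argument is the name of the new class, and the second is a list of strings giving the names of the individual elements. The example above defines a Point class whose instances are 2-element tuples, with the first element named 'X' and the second named 'Y'. We bind the new class to the name P and can then create objects:

```python
>>> p1 = P(1,2)
>>> p2 = P(Y=3,X=5)
>>> p1
Point(X=1, Y=2)
>>> p2
Point(X=5, Y=3)
>>> type(p1)
<class '__main__.Point'>
```

Notice that the field names can be used as keyword arguments when creating an object. Once the object exists, you can access its elements either by index or by field name:

```python
>>> p1[0]
1
>>> p2.Y
3
```

If you want to look up an element with a string, much as you would with a dictionary key, use the built-in function [getattr()](https://docs.python.org/3/library/functions.html?highlight=getattr#getattr):

```python
>>> getattr(p1, 'X')
1
```

#### Default values for named tuple elements

You can also supply default values for individual elements when defining the named tuple class:

```python
>>> P = collections.namedtuple(
...     'Point',
...     ['X', 'Y'],
...     defaults=[1, 1]
... )
>>> p3 = P()
>>> p3
Point(X=1, Y=1)
```

The defaults argument must be an iterable, and it supplies the default values in order. If defaults has fewer items than there are fields, the usual rule applies that parameters without defaults must come before parameters with defaults. The values are therefore assigned to the rightmost fields, and the fields before them must still be given when the object is created. For example:

```python
>>> P = collections.namedtuple(
...     'Point',
...     ['X', 'Y'],
...     defaults=[1]
... )
>>> p4 = P()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Point.__new__() missing 1 required positional argument: 'X'
```

Here defaults contains only one item, so it applies to the last field, Y. Creating the object without a value for X raises an error. This version works:

```python
>>> p4 = P(3)
>>> p4
Point(X=3, Y=1)
```

As you can see, the Y field took the default value 1.

#### Special members of named tuples

A named tuple class is derived from tuple, so its instances can be used anywhere a tuple can. On top of that, named tuples have some extra members:

```python
>>> P._fields
('X', 'Y')
>>> P._field_defaults
{'Y': 1}
```

\_fields returns the field names as a tuple, and \_field_defaults returns the per-field default values as a dictionary. Note that the names of these members all start with '\_'. To build a named tuple from a list, you can either unpack the list or use the \_make() method specific to named tuples:

```python
>>> p5 = P(*[5,6])
>>> p6 = P._make([7,8])
>>> p5
Point(X=5, Y=6)
>>> p6
Point(X=7, Y=8)
```

In the same way, a dictionary can be unpacked into keyword arguments to build a named tuple:

```python
>>> p7 = P(**{'X':10, 'Y':11})
>>> p7
Point(X=10, Y=11)
```

The keys of the dictionary must match the field names of the named tuple. Named tuples also provide a method for the reverse conversion, from tuple contents to a dictionary:

```python
>>> p7._asdict()
{'X': 10, 'Y': 11}
```

Remember that a named tuple is still a tuple, so its contents cannot be modified. To change a value, use the \_replace() method specific to named tuples:

```python
>>> p7
Point(X=10, Y=11)
>>> p8 = p7._replace(X=100)
>>> p8
Point(X=100, Y=11)
```

\_replace() creates a new object, so the contents of p7 do not change.

### Building a named tuple with the NamedTuple class

typing.NamedTuple can also create named tuples, and it may feel more intuitive to use. The following defines a named tuple with the same capabilities as one built with the collections.namedtuple function:

```python
>>> from typing import NamedTuple
>>> class Point(NamedTuple):
...     x: int
...     y: int
...
```

Classes created either way behave the same:

```python
>>> p9 = Point(3, 5)
>>> p9
Point(x=3, y=5)
>>> p10 = Point(y=10, x=8)
>>> p10
Point(x=8, y=10)
```

The biggest difference is that the NamedTuple class carries type hints, so you can tell that x and y are meant to be integers. The hints are also available through the `__annotations__` attribute:

```python
>>> Point.__annotations__
{'x': <class 'int'>, 'y': <class 'int'>}
>>> p9.__annotations__
{'x': <class 'int'>, 'y': <class 'int'>}
```

For a class created with the namedtuple function, `__annotations__` is an empty dictionary:

```python
>>> from collections import namedtuple
>>> Point = namedtuple(
...     'Point',
...     ['x', 'y']
... )
>>> Point.__annotations__
{}
```

And the named tuples it creates have no `__annotations__` attribute:

```python
>>> p7 = Point(10, 8)
>>> p7.__annotations__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Point' object has no attribute '__annotations__'. Did you mean: '__contains__'?
```

#### Setting default values

To give individual fields default values, simply assign them in the class definition:

```python
>>> class Point(NamedTuple):
...     x: int = 1
...     y: int = 1
...
>>> p11 = Point()
>>> p11
Point(x=1, y=1)
```

As before, if only some fields have defaults, the fields without defaults must come first. The following raises an error:

```python
>>> class Point(NamedTuple):
...     x: int = 1
...     y: int
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\meebo\AppData\Roaming\uv\python\cpython-3.13.1-windows-x86_64-none\Lib\typing.py", line 3007, in __new__
    raise TypeError(f"Non-default namedtuple field {field_name} "
    ...<2 lines>...
                    f"{', '.join(default_names)}")
TypeError: Non-default namedtuple field y cannot follow default field x
```

This version works:

```python
>>> class Point(NamedTuple):
...     x: int
...     y: int = 1
...
>>> p12 = Point(5)
>>> p12
Point(x=5, y=1)
```

#### Special attributes

Classes created with the namedtuple function and with the NamedTuple class are used in essentially the same way, so the special attributes introduced earlier also work on NamedTuple subclasses:

```python
>>> p12._fields
('x', 'y')
>>> p12._field_defaults
{'y': 1}
>>> p12._asdict()
{'x': 5, 'y': 1}
>>> p13 = p12._make([4, 5])
>>> p13
Point(x=4, y=5)
>>> p12
Point(x=5, y=1)
```

## Conclusion

Named tuples suit a group of data with a fixed number of items where each item has a specific meaning. You can fetch values quickly by index and also identify each element by its field name. This is very handy when working with CSV files or SQLite table rows, because you never lose track of what each element is for, and you can apply the same idea in your own programs. As for which way to create named tuples, I personally recommend the NamedTuple class: the syntax is more direct, and the type hints document the type of each field, which makes later maintenance and team collaboration easier.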
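The CSV use case mentioned in the conclusion can be sketched as follows. This is a minimal illustration, not a recommended loader: the `read_points` helper, the `SAMPLE` text, and the column layout are all made up for this example, and the `Point` class simply mirrors the one defined earlier.

```python
import csv
import io
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int = 1  # default used when a row has no second column


def read_points(text: str) -> list[Point]:
    """Parse CSV text with a header row into Point named tuples.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    # csv yields strings, so convert explicitly: NamedTuple annotations
    # are hints only and are not enforced at runtime.
    return [Point(*(int(value) for value in row)) for row in reader if row]


# Made-up sample data; the last row omits y, so the default applies.
SAMPLE = "x,y\n1,2\n3,4\n5\n"

points = read_points(SAMPLE)
print(points)
```

Running this prints `[Point(x=1, y=2), Point(x=3, y=4), Point(x=5, y=1)]`, and each row can then be read as `points[0].x` instead of `row[0]`.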
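For SQLite, one possible approach is a row factory that wraps each result row in a named tuple built from the column names. The sketch below assumes an in-memory database and a made-up `points` table. The name `namedtuple_factory` is chosen here for illustration, and it creates a new class for every row, which is simple but not efficient for large result sets.

```python
import sqlite3
from collections import namedtuple


def namedtuple_factory(cursor, row):
    """Row factory: return the row as a named tuple keyed by column name."""
    # cursor.description holds one 7-item tuple per column; item 0 is the name.
    fields = [column[0] for column in cursor.description]
    Row = namedtuple("Row", fields)
    return Row._make(row)


conn = sqlite3.connect(":memory:")
conn.row_factory = namedtuple_factory

# Made-up table and data, for illustration only.
conn.execute("CREATE TABLE points (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO points VALUES (?, ?)", [(1, 2), (3, 4)])

rows = conn.execute("SELECT x, y FROM points ORDER BY x").fetchall()
conn.close()

for r in rows:
    print(r.x, r.y)  # access by column name instead of by index
```

Because the field names come from the query itself, the column names must be valid Python identifiers for this sketch to work; expressions such as `count(*)` would need an `AS` alias.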