# pydantic
## References
+ 🔗 [**MyApollo - Easy Data Validation with pydantic**](https://myapollo.com.tw/blog/pydantic-validate-data/)
+ 🎬 [**ArjanCodes - Why You Should Use Pydantic in 2024 | Tutorial**](https://youtu.be/502XOB0u8OY)
+ 📄 [**Doc - Getting help with Pydantic**](https://docs.pydantic.dev/latest/help_with_pydantic/)
## Note
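This note is organized to get you productive with Pydantic as quickly as possible (learn 20% of the features to cover 80% of the use cases). Please excuse anything imprecise or incomplete.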
## Brief
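+ Pydantic is a data validation library
+ It validates data based on type hints
+ It is built on the `pydantic-core` library, which is written in Rust, so validation is very fast
+ Optional dependencies of Pydantic
  + `pydantic[email]`: email field validation
  + `pydantic[timezone]`: timezone field validation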
## Models
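### `BaseModel`: [Model](https://docs.pydantic.dev/latest/concepts/models/#model-methods-and-properties)
+ Models can be nested
+ Methods
  + Validation (<mark>if a field may be `None` or omitted, annotate its type with `| None` and give it a default of `None`; `| None` without a default only makes the field nullable, it is still required</mark>)
    + `model_validate()` (class method)
      + Input: an object (a dict, or an ORM object when `from_attributes=True` is enabled)
      + Output: a Model instance
    + `model_validate_json()` (class method)
      + Input: a string (JSON)
      + Output: a Model instance
  + Dumping (<mark>use `exclude_none=True` to drop fields whose value is `None`; `exclude_unset=True` instead drops fields that were never explicitly set</mark>)
    + `model_dump()` (instance method)
      + Output: dict
    + `model_dump_json()` (instance method)
      + Output: string (JSON)
  + JSON Schema
    + `model_json_schema()` (class method)
      + Parameters
        + `mode="validation"` (JSON Schema of the input)\
          `mode="serialization"` (JSON Schema of the output)
      + Output: dict (JSON Schema)
+ Usage with an ORM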
```py
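# Assumes EmployeeModel sets model_config = ConfigDict(from_attributes=True);
# without it, model_validate() rejects a non-dict ORM object
with db.Session() as session:
    emp = session.query(table.Employee).filter(table.Employee.emp_name == "RogelioKG").first()
    # model_validate returns an EmployeeModel instance;
    # model_dump(mode="json") then turns it into a dict of JSON-compatible values
    print(schema.EmployeeModel.model_validate(emp).model_dump(mode="json"))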
```
+ Example
```py
from datetime import datetime

from pydantic import BaseModel, PositiveInt


class User(BaseModel):
    id: int
    name: str = "John Doe"
    signup_ts: datetime | None
    tastes: dict[str, PositiveInt]


def test_user():
    external_data = {
        "id": 123,
        "signup_ts": "2019-06-01 12:22",
        "tastes": {
            "wine": 9,
            b"cheese": 7,
            "cabbage": "1",
        },
    }
    user = User(**external_data)
    print(type(user.signup_ts))
    # <class 'datetime.datetime'>
    print(user.model_dump())
    # {
    #     'id': 123,
    #     'name': 'John Doe',
    #     'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),
    #     'tastes': {'wine': 9, 'cheese': 7, 'cabbage': 1},
    # }
    print(user.model_dump_json())
    # {"id":123,"name":"John Doe","signup_ts":"2019-06-01T12:22:00","tastes":{"wine":9,"cheese":7,"cabbage":1}}
    print(User.model_json_schema())
    # {
    #     "properties": {
    #         "id": {"title": "Id", "type": "integer"},
    #         "name": {"default": "John Doe", "title": "Name", "type": "string"},
    #         "signup_ts": {
    #             "anyOf": [{"format": "date-time", "type": "string"}, {"type": "null"}],
    #             "title": "Signup Ts",
    #         },
    #         "tastes": {
    #             "additionalProperties": {"exclusiveMinimum": 0, "type": "integer"},
    #             "title": "Tastes",
    #             "type": "object",
    #         },
    #     },
    #     "required": ["id", "signup_ts", "tastes"],
    #     "title": "User",
    #     "type": "object",
    # }


if __name__ == "__main__":
    test_user()
```
### `ConfigDict`: Configuring Model Behavior
+ Example
```py
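from pydantic import BaseModel, ConfigDict


class User(BaseModel):
    id: int
    name: str
    # extra="forbid"        (reject extra fields that appear in the input data)
    # frozen=True           (immutable instances)
    # from_attributes=True  (allow validating ORM objects, e.g. those returned by SQLAlchemy)
    model_config = ConfigDict(extra="forbid", frozen=True)


user = User(id=1, name="Rogelio")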
```
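A quick sketch of what the three `ConfigDict` options above do at runtime, assuming Pydantic v2. `EmployeeRow` is a hypothetical plain class standing in for an ORM row object (no database is involved), and the printed strings are Pydantic's error type codes.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class User(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)
    id: int
    name: str


def error_type(exc: ValidationError) -> str:
    # Type code of the first reported error, e.g. "extra_forbidden"
    return exc.errors()[0]["type"]


# extra="forbid": unknown keys are rejected instead of silently ignored
try:
    User(id=1, name="Rogelio", age=30)
except ValidationError as exc:
    extra_error = error_type(exc)

# frozen=True: assigning to a field after creation is rejected
user = User(id=1, name="Rogelio")
try:
    user.name = "Kai"
except ValidationError as exc:
    frozen_error = error_type(exc)


class EmployeeRow:
    # Hypothetical stand-in for an ORM row: data lives in attributes, not dict keys
    def __init__(self, id: int, name: str) -> None:
        self.id = id
        self.name = name


class EmployeeRead(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: int
    name: str


# from_attributes=True: model_validate() reads the fields from object attributes
emp = EmployeeRead.model_validate(EmployeeRow(7, "Rogelio"))

print(extra_error)   # extra_forbidden
print(frozen_error)  # frozen_instance
print(emp)           # id=7 name='Rogelio'
```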
## Types
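### `~`: Types
+ [Standard Library Types](https://docs.pydantic.dev/latest/api/standard_library_types/)
+ [Pydantic Types](https://docs.pydantic.dev/latest/api/types/)
  + `Strict~`: strict types
    + When a field is annotated with a plain built-in type and the input type does not match but is compatible, Pydantic implicitly converts it
    + To forbid that implicit conversion, use the strict types
  + `SecretStr`: passwords (the value is masked when printed)
+ [Network Types](https://docs.pydantic.dev/latest/api/networks/)
  + `EmailStr`: email addresses
  + ...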
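To see the lax-versus-strict behaviour and `SecretStr` masking in action, here is a small sketch assuming Pydantic v2; `LaxOrder`, `StrictOrder`, and `Login` are hypothetical models for illustration.

```python
from pydantic import BaseModel, SecretStr, StrictInt, ValidationError


class LaxOrder(BaseModel):
    quantity: int  # lax: compatible input such as "3" is coerced to 3


class StrictOrder(BaseModel):
    quantity: StrictInt  # strict: only real ints are accepted


class Login(BaseModel):
    password: SecretStr  # masked in str()/repr(), readable via get_secret_value()


lax = LaxOrder(quantity="3")
print(lax.quantity)  # 3

try:
    StrictOrder(quantity="3")
except ValidationError as exc:
    strict_error = exc.errors()[0]["type"]
print(strict_error)  # int_type

login = Login(password="hunter2")
print(login.password)                     # **********
print(login.password.get_secret_value())  # hunter2
```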
### `TypeAdapter`: Adapter
+ Example
Validate data without explicitly subclassing `BaseModel`
```py
from pydantic import StrictInt, TypeAdapter, ValidationError

adapter = TypeAdapter(list[StrictInt])
print(adapter.json_schema())  # {'items': {'type': 'integer'}, 'type': 'array'}
print(adapter.validate_python([1, 2, 3]))  # [1, 2, 3]
try:
    adapter.validate_python([1, 2, "3"])  # StrictInt rejects the string "3"
except ValidationError as exc:
    print(exc)
```
```py
from dataclasses import dataclass

from pydantic import TypeAdapter


@dataclass
class User:
    name: str
    age: int


# A TypeAdapter can wrap a dataclass too
adapter = TypeAdapter(User)
data = {"name": "Alice", "age": "30"}
print(adapter.validate_python(data))  # User(name='Alice', age=30)
```
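`TypeAdapter` also handles JSON directly. A minimal sketch, assuming Pydantic v2, using `validate_json()` and `dump_json()` on a hypothetical `dict[str, list[int]]` of scores:

```python
from pydantic import TypeAdapter

# Build the adapter once and reuse it
scores_adapter = TypeAdapter(dict[str, list[int]])

# validate_json: parse + validate a JSON string (or bytes) in one step
scores = scores_adapter.validate_json('{"alice": [90, 85], "bob": []}')
print(scores)  # {'alice': [90, 85], 'bob': []}

# dump_json returns compact JSON as bytes; dump_python returns Python objects
encoded = scores_adapter.dump_json(scores)
print(encoded)  # b'{"alice":[90,85],"bob":[]}'
```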
### `FailFast`: Stop at the First Error
+ Example
Validation raises as soon as it hits the first error (`"invalid"`), instead of checking the entire `list`
```py
from typing import Annotated

from pydantic import FailFast, TypeAdapter, ValidationError

ta = TypeAdapter(Annotated[list[bool], FailFast()])
try:
    ta.validate_python([True, "invalid", False, "also invalid"])
except ValidationError as exc:
    print(exc)  # only the error at index 1 is reported
```
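To make the difference measurable, this sketch runs the same input through an adapter with and without `FailFast` and counts the reported errors with `ValidationError.error_count()`. It assumes a Pydantic version that ships `FailFast` (the same one the example above needs); `count_errors` is a hypothetical helper.

```python
from typing import Annotated

from pydantic import FailFast, TypeAdapter, ValidationError

data = [True, "invalid", False, "also invalid"]

default_ta = TypeAdapter(list[bool])                           # collects every error
fail_fast_ta = TypeAdapter(Annotated[list[bool], FailFast()])  # stops at the first one


def count_errors(ta: TypeAdapter) -> int:
    # Hypothetical helper: number of errors reported for `data`
    try:
        ta.validate_python(data)
    except ValidationError as exc:
        return exc.error_count()
    return 0


print(count_errors(default_ta))    # 2
print(count_errors(fail_fast_ta))  # 1
```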
## Fields
+ Example
Metadata such as `examples` and `description` shows up in the API docs that Swagger UI renders when the model is used with FastAPI
```py
from enum import IntFlag, auto

# EmailStr needs the optional email-validator package: pip install "pydantic[email]"
from pydantic import BaseModel, EmailStr, Field, SecretStr


class Role(IntFlag):
    Author = auto()
    Editor = auto()
    Developer = auto()
    Admin = Author | Editor | Developer


class User(BaseModel):
    name: str = Field(
        examples=["Arjan"],
        description="The full name of the user",
    )
    email: EmailStr = Field(
        examples=["example@arjancodes.com"],
        description="The email address of the user",
    )
    password: SecretStr = Field(
        examples=["Password123"],
        description="The password of the user",
    )
    role: Role = Field(
        default=Role.Author,
        description="The role of the user",
    )
```
A cleaner approach: define reusable `Annotated` field types once and share them across models
```py
from datetime import date
from typing import Annotated

# EmailStr needs the optional email-validator package: pip install "pydantic[email]"
from pydantic import BaseModel, ConfigDict, EmailStr, Field, SecretStr


# Reusable field types. Field takes `examples` (a list); `example=` is not a Field parameter in Pydantic v2
class UserType:
    Name = Annotated[
        str,
        Field(description="User's full name", examples=["RogelioKG"]),
    ]
    Email = Annotated[
        EmailStr,
        Field(description="User's email address", examples=["user@example.com"]),
    ]
    Avatar = Annotated[
        str,
        Field(
            description="URL of the user's avatar image; may be empty",
            examples=["https://example.com/avatar.jpg"],
        ),
    ]
    Password = Annotated[
        SecretStr,
        Field(
            min_length=6,
            description="User's login password, at least 6 characters",
            examples=["securePass123"],
        ),
    ]
    Age = Annotated[
        int,
        Field(gt=0, lt=100, description="User's age, between 1 and 99", examples=[25]),
    ]
    Birthday = Annotated[
        date,
        Field(description="User's birthday (format YYYY-MM-DD)", examples=["1998-08-08"]),
    ]


class UserBase(BaseModel):
    name: UserType.Name
    email: UserType.Email
    avatar: UserType.Avatar | None = None
    model_config = ConfigDict(from_attributes=True)


class UserCreate(UserBase):
    password: UserType.Password
    age: UserType.Age
    birthday: UserType.Birthday


class UserRead(UserBase):
    age: UserType.Age
    birthday: UserType.Birthday


class UserUpdate(BaseModel):
    name: UserType.Name | None = None
    email: UserType.Email | None = None
    avatar: UserType.Avatar | None = None
    password: UserType.Password | None = None
    age: UserType.Age | None = None
    birthday: UserType.Birthday | None = None
```
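+ discriminated union
A special kind of Union type, <mark>used to describe several subtypes that share one property (usually a string), where the value of that property tells the subtypes' structures apart</mark>.\
All of these subtypes belong to one common Union type; combined with a `switch`, the compiler can narrow and check the type automatically.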
```ts
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "square"; sideLength: number }
  | { kind: "rectangle"; width: number; height: number };

function getArea(shape: Shape): number {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "square":
      return shape.sideLength ** 2;
    case "rectangle":
      return shape.width * shape.height;
    default: {
      // Unreachable when every variant is handled; a missing case fails to compile here
      const _exhaustiveCheck: never = shape;
      return _exhaustiveCheck;
    }
  }
}
```
```py
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Cat(BaseModel):
    pet_type: Literal["cat"]
    meows: int


class Dog(BaseModel):
    pet_type: Literal["dog"]
    barks: float


class Lizard(BaseModel):
    pet_type: Literal["reptile", "lizard"]
    scales: bool


class Model(BaseModel):
    # Specifying a discriminator lets Pydantic go straight to the matching member
    # instead of trying each one in turn, which helps performance a lot
    pet: Cat | Dog | Lizard = Field(discriminator="pet_type")
    n: int


print(Model(pet={"pet_type": "dog", "barks": 3.14}, n=1))
# pet=Dog(pet_type='dog', barks=3.14) n=1
try:
    Model(pet={"pet_type": "dog"}, n=1)  # "barks" is missing
except ValidationError as e:
    print(e)
```
## Validators
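+ Pydantic's default validation
  + Implicitly coerces any input that can be converted to the target type
+ Validators
  + `field_validator`: validates a single field
  + `model_validator`: validates the whole model
+ mode
  + `mode="before"`: runs before Pydantic's default validation (receives the raw input)
  + `mode="after"`: runs after Pydantic's default validation (receives the already-validated value)
  + `mode="wrap"`: takes full control of the validation flow; you decide where your own checks run and whether (and when) Pydantic's default validation runs at all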
```py
from collections.abc import Callable
from typing import Any

from pydantic import BaseModel, ValidationInfo, field_validator


class User(BaseModel):
    age: int

    @field_validator("age", mode="wrap")
    @classmethod
    def validate_age(
        cls,
        value: Any,  # raw input; wrap validators receive (value, handler, info) in this order
        handler: Callable[[Any], int],  # Pydantic's default validation; int is the target type
        info: ValidationInfo,
    ) -> int:
        print("raw value =", value)
        result = handler(value)  # run Pydantic's default validation
        print("validated =", result)
        if result < 0:
            raise ValueError("Age cannot be negative!")
        return result


User(age="42")  # prints "raw value = 42", then "validated = 42"
```
+ Decorator style
```py
from pydantic import BaseModel, ValidationError, field_validator


class Model(BaseModel):
    number: int

    @field_validator("number", mode="after")
    @classmethod
    def is_even_validator(cls, value: int) -> int:
        if value % 2 == 1:
            raise ValueError(f"{value} is not an even number")
        return value


try:
    Model(number=1)
except ValidationError as err:
    print(err)
```
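+ `Annotated` style
This style attaches a validator to a single field's type, so it cannot do model-level (cross-field) validation; use `model_validator` for that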
```py
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError


def is_even_validator(number: int) -> int:
    if number % 2 == 1:
        raise ValueError(f"{number} is not an even number")
    return number


class Model(BaseModel):
    number: Annotated[int, AfterValidator(is_even_validator)]


try:
    Model(number=1)
except ValidationError as err:
    print(err)
```
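Putting the pieces of this section together: a sketch (assuming Pydantic v2) with a `mode="before"` field validator, an `Annotated` alias that combines `BeforeValidator` and `AfterValidator`, and a `model_validator(mode="after")` for a cross-field check that the `Annotated` style cannot express. `SignUp`, `Tags`, `split_csv`, and `no_duplicates` are hypothetical names for illustration.

```python
from typing import Annotated, Any

from pydantic import (
    AfterValidator,
    BaseModel,
    BeforeValidator,
    ValidationError,
    field_validator,
    model_validator,
)


def split_csv(value: Any) -> Any:
    # Runs BEFORE default validation: the raw input may still be "a, b, c"
    if isinstance(value, str):
        return [part.strip() for part in value.split(",") if part.strip()]
    return value


def no_duplicates(value: list[str]) -> list[str]:
    # Runs AFTER default validation: value is guaranteed to be list[str]
    if len(set(value)) != len(value):
        raise ValueError("tags must be unique")
    return value


Tags = Annotated[list[str], BeforeValidator(split_csv), AfterValidator(no_duplicates)]


class SignUp(BaseModel):
    # Hypothetical model for illustration
    username: str
    password: str
    password_confirm: str
    tags: Tags = []

    @field_validator("username", mode="before")
    @classmethod
    def normalize_username(cls, value: Any) -> Any:
        # Raw input: strip and lowercase strings before the str check runs
        return value.strip().lower() if isinstance(value, str) else value

    @model_validator(mode="after")
    def passwords_match(self) -> "SignUp":
        # Cross-field check: needs several already-validated fields at once
        if self.password != self.password_confirm:
            raise ValueError("passwords do not match")
        return self


ok = SignUp(username="  Rogelio ", password="pw1", password_confirm="pw1", tags="a, b")
print(ok.username, ok.tags)  # rogelio ['a', 'b']

try:
    SignUp(username="x", password="pw1", password_confirm="pw2")
except ValidationError as exc:
    mismatch = exc.errors()[0]["msg"]
print(mismatch)  # Value error, passwords do not match
```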
## Serializers
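+ Serializers
  + `field_serializer`
  + `model_serializer`
+ mode
  + `mode="plain"` (default): replaces Pydantic's default serialization
  + `mode="wrap"`: receives Pydantic's default serializer as a handler, so you can run your own logic around it
+ Example (ArjanCodes)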
```py
import enum
import hashlib
import re
from collections.abc import Callable
from typing import Any, Self

# EmailStr needs the optional email-validator package: pip install "pydantic[email]"
from pydantic import (
    BaseModel,
    EmailStr,
    Field,
    SecretStr,
    SerializationInfo,
    field_serializer,
    field_validator,
    model_serializer,
    model_validator,
)

VALID_PASSWORD_REGEX = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$")
VALID_NAME_REGEX = re.compile(r"^[a-zA-Z]{2,}$")


class Role(enum.IntFlag):
    User = 0
    Author = 1
    Editor = 2
    Admin = 4
    SuperAdmin = 8


class User(BaseModel):
    name: str = Field(examples=["Example"])
    email: EmailStr = Field(
        examples=["user@arjancodes.com"],
        description="The email address of the user",
        frozen=True,  # this field cannot be reassigned
    )
    password: SecretStr = Field(
        examples=["Password123"],
        description="The password of the user",
        exclude=True,  # leave this field out of serialization
    )
    role: Role = Field(
        description="The role of the user",
        examples=[1, 2, 4, 8],
        default=0,
        validate_default=True,  # validate even when the default value is used
    )

    @field_validator("name")
    @classmethod
    def validate_name(cls, v: str) -> str:
        if not VALID_NAME_REGEX.match(v):
            raise ValueError(
                "Name is invalid, must contain only letters and be at least 2 characters long"
            )
        return v

    @field_validator("role", mode="before")
    @classmethod
    def validate_role(cls, v: int | str | Role) -> Role:
        op = {int: lambda x: Role(x), str: lambda x: Role[x], Role: lambda x: x}
        try:
            return op[type(v)](v)
        except (KeyError, ValueError) as exc:
            roles = ", ".join([x.name for x in Role])
            raise ValueError(f"Role is invalid, please use one of the following: {roles}") from exc

    @model_validator(mode="before")
    @classmethod
    def validate_user_pre(cls, v: dict[str, Any]) -> dict[str, Any]:
        if "name" not in v or "password" not in v:
            raise ValueError("Name and password are required")
        if v["name"].casefold() in v["password"].casefold():
            raise ValueError("Password cannot contain name")
        if not VALID_PASSWORD_REGEX.match(v["password"]):
            raise ValueError(
                "Password is invalid, must contain 8 characters, 1 uppercase, 1 lowercase, 1 number"
            )
        v["password"] = hashlib.sha256(v["password"].encode()).hexdigest()
        return v

    @field_serializer("role", when_used="json")
    def serialize_role(self, v: Role) -> str:
        return v.name

    @model_validator(mode="after")
    def validate_user_post(self) -> Self:
        # After-validators receive the fully built instance as `self`
        if self.role == Role.Admin and self.name != "Arjan":
            raise ValueError("Only Arjan can be an admin")
        return self

    @model_serializer(mode="wrap", when_used="json")
    def serialize_user(
        self,
        serializer: Callable[[BaseModel], dict[str, Any]],  # Pydantic's default serializer
        info: SerializationInfo,
    ) -> dict[str, Any]:
        # Without include/exclude, JSON output shows only name and role
        if not info.include and not info.exclude:
            return {"name": self.name, "role": self.role.name}
        return serializer(self)


def main() -> None:
    data = {
        "name": "Arjan",
        "email": "example@arjancodes.com",
        "password": "Password123",
        "role": "Admin",
    }
    user = User.model_validate(data)
    print(
        "The serializer that returns a dict:",
        user.model_dump(),
        sep="\n",
        end="\n\n",
    )
    # The serializer that returns a dict:
    # {
    #     "name": "Arjan",
    #     "email": "example@arjancodes.com",
    #     "role": <Role.Admin: 4>,
    # }
    print(
        "The serializer in JSON mode:",
        user.model_dump(mode="json"),
        sep="\n",
        end="\n\n",
    )
    # The serializer in JSON mode:
    # {
    #     "name": "Arjan",
    #     "role": "Admin",
    # }
    print(
        "The serializer in JSON mode, excluding the role:",
        user.model_dump(exclude={"role"}, mode="json"),
        sep="\n",
        end="\n\n",
    )
    # The serializer in JSON mode, excluding the role:
    # {
    #     "name": "Arjan",
    #     "email": "example@arjancodes.com",
    # }
    print("The serializer that encodes all values to a dict:", dict(user), sep="\n")
    # The serializer that encodes all values to a dict:
    # {
    #     "name": "Arjan",
    #     "email": "example@arjancodes.com",
    #     "password": SecretStr('**********'),
    #     "role": <Role.Admin: 4>,
    # }


if __name__ == "__main__":
    main()
```
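Serializers have an `Annotated` style too, mirroring `AfterValidator` in the Validators section. A minimal sketch with `PlainSerializer`, assuming Pydantic v2; `UnixTime` and `Event` are hypothetical names, and `when_used="json"` leaves Python-mode dumps untouched, just like the `role` serializer above.

```python
from datetime import datetime, timezone
from typing import Annotated

from pydantic import BaseModel, PlainSerializer

# Hypothetical alias: a datetime that is dumped as a Unix timestamp in JSON mode only
UnixTime = Annotated[
    datetime,
    PlainSerializer(lambda dt: int(dt.timestamp()), return_type=int, when_used="json"),
]


class Event(BaseModel):
    name: str
    at: UnixTime


event = Event(name="launch", at=datetime(2024, 1, 1, tzinfo=timezone.utc))
print(event.model_dump())       # 'at' stays a datetime in Python mode
print(event.model_dump_json())  # {"name":"launch","at":1704067200}
```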
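## Performance
[Pydantic - Performance best practices](https://docs.pydantic.dev/latest/concepts/performance/#avoid-wrap-validators-if-you-really-care-about-performance); only a brief summary is given here.
+ Use `FailFast` when you don't need every error reported
+ Prefer `TypedDict` + `TypeAdapter` over nested `BaseModel`s
+ Reuse `TypeAdapter` instances (so don't create one inside a function that runs repeatedly)
+ For discriminated unions, remember to specify a `discriminator`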