# SQLAlchemy 2.0(06)查詢資料(3-aggregate and subquery) ###### tags: `python` `sqlalchemy` [官方文件](https://docs.sqlalchemy.org/en/20/tutorial/data_select.html) 查詢過程中弄點聚合函數的操作是正常的,我們總是會需要統計個數(count)或是加總數量(sum)之類,在SQLAlchemy中大致都可以從`sqlalchemy.func`來處理。 ## order by 資料查詢過程中我們可能會需要做點排序,這很簡單,利用`order_by`接在查詢過程中就行: ```python= print(select(user_table).order_by(user_table.c.name)) ``` ``` SELECT user_account.id, user_account.name, user_account.fullname FROM user_account ORDER BY user_account.name ``` 即使你用ORM也是可以用類似的方式排序: ```python= print(select(User).order_by(User.fullname.desc())) ``` ``` SELECT user_account.id, user_account.name, user_account.fullname FROM user_account ORDER BY user_account.fullname DESC ``` 很明顯的,上面我們加入`desc`來指定排序方式。 ## aggregate functions 處理報表的過程中一定會用到很多聚合函數,先跟著官方文件給出範例: ```python= with engine.connect() as conn: result = conn.execute( select(User.name, func.count(Address.id).label("count")) .join(Address) .group_by(User.name) .having(func.count(Address.id) > 1) ) print(result.all()) ``` ``` SELECT user_account.name, count(address.id) AS count FROM user_account JOIN address ON user_account.id = address.user_id GROUP BY user_account.name HAVING count(address.id) > ? ``` 表達式也算是直觀,在`select`裡面利用`func.count`指定要加總個數的欄位,然後再接`label`給這個結果一個別名,用`group_by`來指定要群組的欄位,再用`having`來做一個範圍整理。 這邊給出第二個範例: ```python= from sqlalchemy import func, desc stmt = ( select(Address.user_id, func.count(Address.id).label("num_addresses")) .group_by("user_id") .order_by("user_id", desc("num_addresses")) ) print(stmt) ``` ``` SELECT address.user_id, count(address.id) AS num_addresses FROM address GROUP BY address.user_id ORDER BY address.user_id, num_addresses DESC ``` 這邊範例主要說明的是,SQLAlchemy可以在一個查詢的上下文中就直接把過程中聚合函數生成的新的欄位`num_addresses`拿來做為排序依序。 ## aliases 這邊主要說明的是,我們可以先把`table`給予一個別名,也許你會想同一張表自己join自己之類的: ```python= user_alias_1 = user_table.alias() user_alias_2 = user_table.alias() print( select(user_alias_1.c.name, user_alias_2.c.name).join_from( user_alias_1, user_alias_2, user_alias_1.c.id > user_alias_2.c.id ) ) ``` ``` SELECT user_account_1.name, user_account_2.name AS name_1 FROM user_account AS user_account_1 JOIN user_account AS user_account_2 ON user_account_1.id > user_account_2.id ``` 先是利用`Table.alias()`來宣告變數,再將之應用於實際的查詢過程,以實現自己關聯自己的一種作法。 如果是在ORM的話當然也有類似的作法: ```python= from sqlalchemy.orm import aliased address_alias_1 = aliased(Address) address_alias_2 = aliased(Address) print( select(User) .join_from(User, address_alias_1) .where(address_alias_1.email_address == "patrick@aol.com") .join_from(User, address_alias_2) .where(address_alias_2.email_address == "patrick@gmail.com") ) ``` ``` SELECT user_account.id, user_account.name, user_account.fullname FROM user_account JOIN address AS address_1 ON user_account.id = address_1.user_id JOIN address AS address_2 ON user_account.id = address_2.user_id WHERE address_1.email_address = :email_address_1 AND address_2.email_address = :email_address_2 ``` ## subquery 子查詢是資料庫查詢中常見的一種作法,也許我們會希望先有一個子查詢統計某一個資訊再關聯。總之你就是在整個`select`的最後加上`subquery()`就可以讓SQLAlchemy知道這是一個子查詢: ```python= subq = ( select(func.count(address_table.c.id).label("count"), address_table.c.user_id) .group_by(address_table.c.user_id) .subquery() ) ``` ``` SELECT count(address.id) AS count, address.user_id FROM address GROUP BY address.user_id ``` 然後就可以把這個子查詢拿來用到我們真正想做查詢的地方: ```python= stmt = select(user_table.c.name, user_table.c.fullname, subq.c.count).join_from( user_table, subq ) print(stmt) ``` ``` SELECT user_account.name, user_account.fullname, anon_1.count FROM user_account JOIN (SELECT count(address.id) AS count, address.user_id AS user_id FROM address GROUP BY address.user_id) AS anon_1 ON user_account.id = anon_1.user_id ``` ## CTE CTE的話,我記得我在查詢oracle效能調校的時候有看過一段,就是CTE建置起來的區塊如果被select多次的話,就會在記憶體中產生一個暫存表,以加速查詢速度。不確定我有沒有記錯,不過能確定的是它的表達式是`with xxx as`,跟子查詢類似,只要在`select`的最後接上`cte`就能讓SQLAlchemy知道: ```python= subq = ( select(func.count(address_table.c.id).label("count"), address_table.c.user_id) .group_by(address_table.c.user_id) .cte() ) stmt = select(user_table.c.name, user_table.c.fullname, subq.c.count).join_from( user_table, subq ) print(stmt) ``` ``` WITH anon_1 AS (SELECT count(address.id) AS count, address.user_id AS user_id FROM address GROUP BY address.user_id) SELECT user_account.name, user_account.fullname, anon_1.count FROM user_account JOIN anon_1 ON user_account.id = anon_1.user_id ``` 這邊沒有特別的再記錄屬於ORM的操作模式,因為官方文件有把ORM的部份另外一個專區,就後面再另外記錄。 ## scalar_subquery `scalar_subquery`是一種回傳一個column、一個row的查詢方式。大概就是你用聚合函數去統計某一個欄位的結果之類的: ```python= subq = ( select(func.count(address_table.c.id)) .where(user_table.c.id == address_table.c.user_id) .scalar_subquery() ) print(subq) ``` ``` (SELECT count(address.id) AS count_1 FROM address, user_account WHERE user_account.id = address.user_id) ``` 你可以直接拿這個結構來做條件式: ```python= print(subq == 5) ``` ``` (SELECT count(address.id) AS count_1 FROM address, user_account WHERE user_account.id = address.user_id) = :param_1 ``` 或是直接拿著它去做查詢,不過要注意的是,這時候跟其它表一起查詢的時候會是在column上的直接做子查詢: ```python= stmt = select(user_table.c.name, subq.label("address_count")) print(stmt) ``` ``` SELECT user_account.name, (SELECT count(address.id) AS count_1 FROM address WHERE user_account.id = address.user_id) AS address_count FROM user_account ``` 這種子查詢方式一般效能不好,就盡量少用吧。不過如果真的要用,然後又有一些關聯上的需求,就可以使用`ScalarSelect.correlate`: ```python= subq = ( select(func.count(address_table.c.id)) .where(user_table.c.id == address_table.c.user_id) .scalar_subquery() .correlate(user_table) ) ``` 大致就是你在你的子查詢中已經定義好要相關聯的資料表,然後再把這個子查詢實際用真正的查詢中: ```python= with engine.connect() as conn: result = conn.execute( select( user_table.c.name, address_table.c.email_address, subq.label("address_count"), ) .join_from(user_table, address_table) .order_by(user_table.c.id, address_table.c.id) ) print(result.all()) ``` ``` SELECT user_account.name, address.email_address, (SELECT count(address.id) AS count_1 FROM address WHERE user_account.id = address.user_id) AS address_count FROM user_account JOIN address ON user_account.id = address.user_id ORDER BY user_account.id, address.id ``` 經過上面子查詢的設置,在這邊可以看的到子查詢對於條件的指定是明確知道說子查詢的欄位是要跟外部查詢的那一張資料表做條件關聯,上面範例就是根據`.correlate(user_table)`讓sqlalchemy知道是跟`user_table`做條件關聯,而不是跟另一個join的`address`做條件關聯。 SQLAlchemy還有lateral的用法,不過這一部份個人較為少用,無法有效說明,就不多做記錄來誤導他人。