[API] Slurm REST API
===
###### tags: `SlinkyProject / CLI & REST API`
###### tags: `Kubernetes`, `k8s`, `app`, `slurm`, `SlinkyProject`, `Slurm CLI`

<br>

[TOC]

<br>

## docs
- **Slurm REST API overview**: https://slurm.schedmd.com/rest.html
- **REST API quick start**: https://slurm.schedmd.com/rest_quickstart.html
- **REST API reference**: https://slurm.schedmd.com/rest_api.html
    - Route prefixes fall into two groups: **/slurm** and **/slurmdb**
    - Which REST API versions are supported depends on the Slurm version bundled with the current Slinky release
        - Requesting `http://${SLURM_RESTAPI_IP}:6820/slurm/v0.0.40/accounts` returns the following error:
          `Unable to find requested URL endpoint. Please query the '/openapi/v3' endpoint or visit 'https://slurm.schedmd.com/rest_api.html' for the OpenAPI specification which includes a list of all possible slurmrestd endpoints.`
            - v0.0.40 has been phased out, and account endpoints live under the `/slurmdb` prefix rather than `/slurm`
            - Switching to `http://${SLURM_RESTAPI_IP}:6820/slurmdb/v0.0.41/accounts` works

<br>

---

<br>

## Quick start
### Full script
```bash=
# Step 1: Get a token for the REST API service
export JWT_TOKEN="$(
    kubectl -n slurm exec slurm-controller-0 -- scontrol token \
    | cut -d= -f2-
)"

# Step 2: Get the IP of the REST API service
SLURM_RESTAPI_IP="$(kubectl -n slurm get services slurm-restapi -o jsonpath='{.spec.clusterIP}')"

# Step 3: Call the API (with the JWT token)
curl -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
    http://${SLURM_RESTAPI_IP}:6820/openapi/v3
```
- Each step is explained below

<br>

### Step 1: Get a token for the REST API service
```bash
# scontrol token [lifespan=] [username=]    fetch an auth token
$ kubectl -n slurm exec slurm-controller-0 -- scontrol token
SLURM_JWT=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3NjY5OTM4NDksImlhdCI6MTc2Njk5MjA0OSwic3VuIjoic2x1cm0ifQ.SnRktJKKFihgDkBSYAFCXKPcUcIBC2KmVsXfJQzcxxM
```
- ### Every call generates a different token
- ### Inspect the token contents with an [[Online JWT Decoder]](https://fusionauth.io/dev-tools/jwt-decoder)
    - **Header**
        ```json
        {
          "alg": "HS256",
          "typ": "JWT"
        }
        ```
        - **alg**: algorithm
        - **typ**: type
    - **Payload**
        ```json
        {
          "exp": 1766993849,
          "iat": 1766992049,
          "sun": "slurm"
        }
        ```
        - [Online Epoch Converter - Unix Timestamp Converter](https://www.epochconverter.com/) (converting `exp`)
            - GMT:
              Monday, December 29, 2025 7:37:29 AM
            - Your time zone: Monday, December 29, 2025 3:37:29 PM GMT+08:00
            - Relative: In 25 minutes
- ### Generate a token valid for 10 years as root
    ```
    $ scontrol token lifespan=315576000 username=root
    SLURM_JWT=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjIwODI1NzMyMjksImlhdCI6MTc2Njk5NzIyOSwic3VuIjoicm9vdCJ9.V9yeqhgqQ9l7uDEBnvd8tmKwMASUr5xeEkLLma3BX54
    ```
    - **Payload**
        ```
        {
          "exp": 2082573229,
          "iat": 1766997229,
          "sun": "root"
        }
        ```
        - `"exp": 2082573229`
            - GMT: Saturday, December 29, 2035 8:33:49 PM
            - Your time zone: Sunday, December 30, 2035 4:33:49 AM GMT+08:00
            - Relative: **In 10 years**

<br>

### Step 2: Get the IP of the REST API service
```bash
# Get the IP of the REST API service
$ SLURM_RESTAPI_IP="$(kubectl -n slurm get services slurm-restapi -o jsonpath='{.spec.clusterIP}')"

$ echo $SLURM_RESTAPI_IP
10.100.59.7
```

<br>

### Step 3: Call the API (with the JWT token)
```bash
$ curl -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
    http://${SLURM_RESTAPI_IP}:6820/openapi/v3
{
  "tags": [
    { "name": "slurm", "description": "methods that query slurmctld" },
    { "name": "slurmdb", "description": "methods that query slurmdbd" },
    { "name": "util", "description": "utilities available directly through slurmrestd" },
    { "name": "openapi", "description": "methods that query for generated OpenAPI specifications" }
  ],
  "paths": {
    "\/slurm\/v0.0.41\/shares": {
      "get": {
        "tags": [ "slurm" ],
        "summary": "get fairshare info",
        "deprecated": true,
        "operationId": "slurm_v0041_get_shares",
        "parameters": [
          {
            "in": "query", "name": "accounts", "style": "form", "explode": false,
            "deprecated": true, "allowEmptyValue": false, "allowReserved": false,
            "description": "Accounts to query", "required": false,
            "schema": { "type": "string" }
          },
          {
            "in": "query", "name": "users", "style": "form", "explode": false,
            "deprecated": true, "allowEmptyValue": false, "allowReserved": false,
            "description": "Users to query", "required": false,
            "schema": { "type": "string" }
          }
        ],
        "responses": {
          "200": {
            "content": {
              "application\/json": { "schema": { "$ref": "#\/components\/schemas\/v0.0.41_openapi_shares_resp" } },
              "application\/yaml": { "schema": { "$ref": "#\/components\/schemas\/v0.0.41_openapi_shares_resp" } }
            },
            "description": "shares information"
          },
          "default": {
            "content": {
              "application\/json": { "schema": { "$ref": "#\/components\/schemas\/v0.0.41_openapi_shares_resp" } },
              "application\/yaml": { "schema": { "$ref": "#\/components\/schemas\/v0.0.41_openapi_shares_resp" } }
            },
            "description": "shares information"
          }
        }
      }
    },
    ...
  },
  ...
}
```
- ### Top-level structure
    ```json
    {
      "tags": [...],
      "paths": {
        "\/slurm\/v0.0.41\/jobs\/": {...},
        "\/slurm\/v0.0.42\/jobs\/": {...},
        "\/slurm\/v0.0.43\/jobs\/": {...},
        "\/slurm\/v0.0.44\/jobs\/": {...},
        "\/slurmdb\/v0.0.41\/accounts\/": {...},
        "\/slurmdb\/v0.0.42\/accounts\/": {...},
        "\/slurmdb\/v0.0.43\/accounts\/": {...},
        "\/slurmdb\/v0.0.44\/accounts\/": {...},
        ...
      },
      "components": {...},
      "info": {...},
      "servers": [...],
      "security": [...],
      "openapi": "3.0.3"
    }
    ```

<br>

---

<br>

## REST API usage examples
### Error returned when the token has expired
```
{
  "errors": [
    {
      "description": "Failed to open slurmdbd connection, authentication error",
      "error_number": 7000,
      "error": "Unable to connect to database",
      "source": "openapi_get_db_conn"
    }
  ],
  ...
}
```

### Query account/root
```json=
$ curl -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
    http://${SLURM_RESTAPI_IP}:6820/slurmdb/v0.0.41/account/root
{
  "accounts": [
    {
      "associations": [],
      "coordinators": [],
      "description": "default root account",
      "name": "root",
      "organization": "root",
      "flags": []
    }
  ],
  "meta": {
    "plugin": {
      "type": "openapi\/slurmdbd",
      "name": "Slurm OpenAPI slurmdbd",
      "data_parser": "data_parser\/v0.0.41",
      "accounting_storage": "accounting_storage\/slurmdbd"
    },
    "client": { "source": "slurm-restapi-86b668d6cb-vzj74:6820(fd:11)", "user": "nobody", "group": "" },
    "command": [],
    "slurm": {
      "version": { "major": "25", "micro": "1", "minor": "11" },
      "release": "25.11.1",
      "cluster": "slurm_slurm"
    }
  },
  "errors": [],
  "warnings": []
}
```

<br>

---

### Add an account
```json=
$ curl -X POST \
    -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{
      "accounts": [{
        "name": "project_a",
        "description": "Project A Research Account",
        "organization": "Research Lab"
      }]
    }' \
    http://${SLURM_RESTAPI_IP}:6820/slurmdb/v0.0.41/accounts
{
  "errors": [],
  "warnings": [],
  "meta": {
    "plugin": {
      "type": "openapi\/slurmdbd",
      "name": "Slurm OpenAPI slurmdbd",
      "data_parser": "data_parser\/v0.0.41",
      "accounting_storage": "accounting_storage\/slurmdbd"
    },
    "client": { "source": "slurm-restapi-86b668d6cb-vzj74:6820(fd:11)", "user": "nobody", "group": "" },
    "command": [],
    "slurm": {
      "version": { "major": "25", "micro": "1", "minor": "11" },
      "release": "25.11.1",
      "cluster": "slurm_slurm"
    }
  }
}
```
- ### Before adding:
    ```
    $ sacctmgr list account
       Account                Descr                  Org
    ---------- -------------------- --------------------
          root default root account                 root
    ```
- ### After adding:
    ```
    $ sacctmgr list account
       Account                Descr                  Org
    ---------- -------------------- --------------------
     project_a Project A Research +         Research Lab
          root default root account                 root
    ```

<br>

---

### Delete an account
```json=
$ curl -X DELETE \
    -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
    http://${SLURM_RESTAPI_IP}:6820/slurmdb/v0.0.41/account/project_a
{
  "removed_accounts": [ "project_a" ],
  "meta": {
    "plugin": {
      "type": "openapi\/slurmdbd",
      "name": "Slurm OpenAPI slurmdbd",
      "data_parser": "data_parser\/v0.0.41",
      "accounting_storage": "accounting_storage\/slurmdbd"
    },
    "client": { "source": "slurm-restapi-86b668d6cb-vzj74:6820(fd:11)", "user": "nobody", "group": "" },
    "command": [],
    "slurm": {
      "version": { "major": "25", "micro": "1", "minor": "11" },
      "release": "25.11.1",
      "cluster": "slurm_slurm"
    }
  },
  "errors": [],
  "warnings": []
}
```
- ### Before deleting:
    ```
    $ sacctmgr list account
       Account                Descr                  Org
    ---------- -------------------- --------------------
     project_a Project A Research +         Research Lab
          root default root account                 root
    ```
- ### After deleting:
    ```
    $ sacctmgr list account
       Account                Descr                  Org
    ---------- -------------------- --------------------
          root default root account                 root
    ```

<br>

---

<br>

## Managing tokens in K8s
> Sources:
> - [[issue] Slurm rest api token not generated at deployment #97](https://github.com/SlinkyProject/slurm-operator/issues/97#issuecomment-3649902053)
> - [[Commit 47e86be] feat: add Token CRD](https://github.com/SlinkyProject/slurm-operator/commit/47e86be3aa5d3661e23bb9bdc1595151c5aaaae6)
> - [[CR example] slurm-operator/hack/resources/token.yaml](https://github.com/SlinkyProject/slurm-operator/blob/main/hack/resources/token.yaml)

### CR Example
> slurm-token.yaml
```yaml=
apiVersion: slinky.slurm.net/v1beta1
kind: Token
metadata:
  name: slurm-token
  namespace: slurm
spec:
  jwtHs256KeyRef:
    name: slurm-auth-jwths256
    key: jwt_hs256.key
  username: root
  refresh: true
  lifetime: 15m
```
- ### Apply it to K8s:
    ```
    $ kubectl apply -f slurm-token.yaml
    ```
- ### Query the token CR
    ```
    $ kubectl -n slurm get token
    NAME          USER   IAT   AGE
    slurm-token   root   72s   5m46s
    ```
- ### Query the secret generated by the token CR
    ```
    $ kubectl -n slurm get secret | egrep "NAME|token"
    NAME                   TYPE     DATA   AGE
    slurm-token-jwt-root   Opaque   1      6m45s
    ```
- ### Read the JWT token
    ```
    $ kubectl -n slurm get secret/slurm-token-jwt-root -o \
        jsonpath="{.data.SLURM_JWT}" | base64 -d
    eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzbHVybS1vcGVyYXRvciIsImV4cCI6MTc2NzAwMjI4NCwibmJmIjoxNzY3MDAxMzg0LCJpYXQiOjE3NjcwMDEzODQsImp0aSI6IjcyYjZhZWMzLTlmNDAtNDMwNC04Y2M5LTNhZjAwYzg0NmExNyIsInN1biI6InJvb3QifQ.LjXjlq7jjmr1aJOYJh2YpUh_t5HQKxoIex-eYgfzNj8
    ```
    - Secret name: `<token-cr-name>-jwt-<username>`

- ### Kubernetes Duration format rules
    Standard formats supported by Kubernetes `metav1.Duration` (based on Go's `time.Duration`):
    - Multiple units can be combined: `1h30m`, `2h45m30s`
    - Supported base units: `ns`, `us`/`µs`, `ms`, `s`, `m`, `h`
    - `d` (days) and `y` (years) are **not supported directly** and must be converted to hours, for example:
        - 1 day = `24h`
        - 7 days = `168h`
        - 30 days = `720h`

<br>

---

<br>

## [In depth] How to talk to Slurm through the Slurm REST API (round-1)
> - codebase: https://github.com/slinkyproject
> <!-- http://10.78.26.44:30000/deepwiki/slurm-operator-v1.0.0-20251125/-/issues/31 -->
> - by us.anthropic.claude-sonnet-4-5-20250929-v1:0
> (The information below is not guaranteed to be correct. In particular, the `/slurm/v0.0.40/accounts` path used here failed on this cluster, where `/slurmdb/v0.0.41/accounts` was required. Refer to the latest documentation: https://slurm.schedmd.com/rest_api.html)

Based on the implementation in the Slurm Operator project, here is a complete usage guide:

### I. Basic architecture

**REST API service information**
- **Service name**: `slurmrestd`
- **Default port**: `6820`
- **Protocol**: HTTP
- **API version**: supports OpenAPI v3

From `internal/builder/restapi_app.go:26`:
```go
SlurmrestdPort = 6820
```

### II. Authentication

#### JWT token authentication

The project authenticates with **JWT (JSON Web Token)**, implemented in `internal/controller/token/slurmjwt/token.go`.

**Token specification**:
- **Signing algorithm**: HS256
- **Default user**: `slurm`
- **Default lifetime**: effectively unlimited (`math.MaxInt32 - 1` seconds)

**Token claims structure**:
```go
type TokenClaims struct {
    jwt.RegisteredClaims
    ID            string              // unique identifier
    Issuer        string              // issuer: "slurm-operator"
    IssuedAt      time                // issue time
    ExpiresAt     time                // expiry time
    SlurmUsername string `json:"sun"` // Slurm username
}
```

**Generating a token**:
```go
// Create a token with the signing key
token := NewToken(signingKey)
token.WithUsername("your-username") // set the user
token.WithLifetime(24 * time.Hour)  // set the lifetime

// Produce the signed token
signedToken, err := token.NewSignedToken()
```

### III. Using the API

#### Basic API request format

An example can be found in `helm/slurm/templates/NOTES.txt:91`:

```bash
# Get the IP of the REST API service
SLURM_RESTAPI_IP="$(kubectl get services -n slurm \
    slurm-restapi -o jsonpath='{.spec.clusterIP}')"

# Call the API (with the JWT token)
curl -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
    http://${SLURM_RESTAPI_IP}:6820/openapi/v3

# Or use auth/none (development environments only)
curl -H "X-SLURM-USER-TOKEN: auth/none" \
    http://${SLURM_RESTAPI_IP}:6820/openapi/v3
```

#### API endpoint structure

According to the Slurm REST API conventions, commonly used endpoints include:

```bash
# Get the API document
GET /openapi/v3

# Account management
GET    /slurm/v0.0.40/accounts          # list all accounts
GET    /slurm/v0.0.40/account/{name}    # get a specific account
POST   /slurm/v0.0.40/accounts          # create an account
DELETE /slurm/v0.0.40/account/{name}    # delete an account

# User management
GET    /slurm/v0.0.40/users             # list all users
GET    /slurm/v0.0.40/user/{name}       # get a specific user
POST   /slurm/v0.0.40/users             # create a user
DELETE /slurm/v0.0.40/user/{name}       # delete a user

# Job management
GET    /slurm/v0.0.40/jobs              # list jobs
POST   /slurm/v0.0.40/job/submit        # submit a job
DELETE /slurm/v0.0.40/job/{job_id}      # cancel a job

# Node information
GET    /slurm/v0.0.40/nodes             # list all nodes
GET    /slurm/v0.0.40/node/{name}       # get a specific node
```

### IV. Account and user management examples

#### 1. Add an account

```bash
# Set the JWT token (for the slurm user)
JWT_TOKEN="your-jwt-token-here"

# Add the account
curl -X POST \
  -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "accounts": [{
      "name": "project_a",
      "description": "Project A Research Account",
      "organization": "Research Lab"
    }]
  }' \
  http://${SLURM_RESTAPI_IP}:6820/slurm/v0.0.40/accounts
```

#### 2. Add a user

```bash
# Add a user to the account
curl -X POST \
  -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "users": [{
      "name": "john_doe",
      "default_account": "project_a",
      "admin_level": "None"
    }]
  }' \
  http://${SLURM_RESTAPI_IP}:6820/slurm/v0.0.40/users
```

#### 3. Query account information

```bash
# List all accounts
curl -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
  http://${SLURM_RESTAPI_IP}:6820/slurm/v0.0.40/accounts

# Query a specific account
curl -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
  http://${SLURM_RESTAPI_IP}:6820/slurm/v0.0.40/account/project_a
```

### V. Important notes

#### 1.
**Accounting must be enabled**

From `internal/builder/controller_config.go:206-216`:

```go
// Accounting must be enabled before account/user management can be used
if accounting != nil {
    conf.AddProperty("AccountingStorageType", "accounting_storage/slurmdbd")
    conf.AddProperty("AccountingStorageHost", accounting.ServiceKey().Name)
    conf.AddProperty("AccountingStoragePort", SlurmdbdPort)
} else {
    conf.AddProperty("AccountingStorageType", "accounting_storage/none")
}
```

**How to enable it** (from `docs/installation.md:228`):
```bash
# Enable accounting at install time
helm install slurm ./helm/slurm \
  --set 'accounting.enabled=true'
```

#### 2. **JWT signing key security**

- The key used to sign JWT tokens (`signingKey`) **must be kept safe**
- Every service that needs to access the REST API must use the same signing key
- The signing key is stored in a Kubernetes Secret

#### 3. **Token lifetime**

```go
// The default is effectively unlimited
lifetime := math.MaxInt32 - 1 // about 68 years

// Set a lifetime that fits your security requirements
token.WithLifetime(24 * time.Hour) // for example, 24 hours
```

#### 4. **API versioning**

- Commonly used version at the time: `/slurm/v0.0.40/`
- Check the OpenAPI document to confirm which versions are supported
- Request and response formats may differ between API versions

#### 5. **Port-forward setup (development environment)**

From `docs/usage/develop.md:200`:
```bash
# Local development requires a port-forward
kubectl port-forward --namespace slurm services/slurm-restapi 6820:6820
```

#### 6. **Database backend requirements**

Account and user management require **slurmdbd** (Slurm Database Daemon):
- MariaDB or MySQL is needed as the backend database
- The configuration lives in `AccountingSpec.StorageConfig`:

```go
type StorageConfig struct {
    Host     string // database host
    Port     int    // database port
    Database string // database name
    Username string // database user
}
```

#### 7. **Permission management**

- Only users with `admin` privileges can manage accounts and users
- The `SlurmUsername` in the JWT token determines the permissions of the operation
- Make sure the user named in the token has suitable administrative privileges in Slurm

#### 8. **Error handling**

```go
// Verify the token
valid, err := VerifyToken(tokenString, signingKey)
if err != nil || !valid {
    // handle verification failure
}

// API calls should check the HTTP status code
// Common status codes:
// - 401: unauthorized (token invalid or expired)
// - 403: forbidden (insufficient permissions)
// - 404: resource not found
// - 500: server error
```

### VI. Complete workflow example

```bash
#!/bin/bash

# 1. REST API service information
NAMESPACE="slurm"
API_SERVICE="slurm-restapi"
API_PORT="6820"

# 2.
# Set up port-forward (development environment)
kubectl port-forward -n ${NAMESPACE} services/${API_SERVICE} ${API_PORT}:${API_PORT} &

# 3. Generate or obtain a JWT token
# (in a real environment, fetch it from a secure location)
JWT_TOKEN="your-jwt-token"

# 4. Test the connection
curl -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
  http://localhost:${API_PORT}/openapi/v3

# 5. Create an account
curl -X POST \
  -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "accounts": [{
      "name": "research_team",
      "description": "Research Team Account"
    }]
  }' \
  http://localhost:${API_PORT}/slurm/v0.0.40/accounts

# 6. Create a user
curl -X POST \
  -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "users": [{
      "name": "researcher01",
      "default_account": "research_team"
    }]
  }' \
  http://localhost:${API_PORT}/slurm/v0.0.40/users

# 7. Verify the results
curl -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
  http://localhost:${API_PORT}/slurm/v0.0.40/accounts

curl -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" \
  http://localhost:${API_PORT}/slurm/v0.0.40/users
```

### VII. Related resources

- **Slurm REST API overview**: https://slurm.schedmd.com/rest.html
- **REST API quick start**: https://slurm.schedmd.com/rest_quickstart.html
- **REST API reference**: https://slurm.schedmd.com/rest_api.html
- **JWT compatibility notes**: https://slurm.schedmd.com/jwt.html#compatibility

<br>

---

<br>

## [In depth] How to talk to Slurm through the Slurm REST API (round-2)
> - codebase: https://github.com/SchedMD/slurm
> <!-- http://10.78.26.44:30000/deepwiki/slurm/-/issues/17 -->
> - by `us.anthropic.claude-sonnet-4-5-20250929-v1:0` (2025/12/29)
> (The information below is not guaranteed to be correct. Refer to the latest documentation: https://slurm.schedmd.com/rest_api.html)

Based on an analysis of the Slurm codebase, here is a complete explanation:

### I. Basic architecture

The Slurm REST API is served by the **slurmrestd** daemon. It follows the OpenAPI specification and can talk to both slurmctld and slurmdbd.

### II. Authentication methods

The REST API supports several authentication methods:

1. **JWT (JSON Web Token)**, the recommended method
    - Header: `X-SLURM-USER-TOKEN`
    - Username: `X-SLURM-USER-NAME`

2. **HTTP Basic Authentication**

3. **Bearer Token**
    - Header: `Authorization: Bearer <token>`

### III. API operations for accounts

#### 1.
**List all accounts**

```bash
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -X GET \
     "http://<server>:<port>/slurmdb/v0.0.43/accounts/"
```

#### 2. **Query a specific account**

```bash
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -X GET \
     "http://<server>:<port>/slurmdb/v0.0.43/account/<account_name>"
```

Supported query parameters:
- `with_assocs`: include associations
- `with_coords`: include coordinators
- `with_deleted`: include deleted records

#### 3. **Add or update an account**

```bash
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -H "Content-Type: application/json" \
     -X POST \
     "http://<server>:<port>/slurmdb/v0.0.43/accounts/" \
     -d '{
       "accounts": [
         {
           "name": "new_account",
           "description": "New account description",
           "organization": "My Organization",
           "coordinators": [
             {"name": "coordinator_username"}
           ]
         }
       ]
     }'
```

#### 4. **Delete an account**

```bash
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -X DELETE \
     "http://<server>:<port>/slurmdb/v0.0.43/account/<account_name>"
```

### IV. API operations for users

#### 1. **List all users**

```bash
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -X GET \
     "http://<server>:<port>/slurmdb/v0.0.43/users/"
```

#### 2. **Query a specific user**

```bash
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -X GET \
     "http://<server>:<port>/slurmdb/v0.0.43/user/<username>"
```

Supported query parameters:
- `with_assocs`: include associations
- `with_coords`: include coordinators
- `with_wckeys`: include wckeys
- `with_deleted`: include deleted records

#### 3. **Add or update a user**

```bash
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -H "Content-Type: application/json" \
     -X POST \
     "http://<server>:<port>/slurmdb/v0.0.43/users/" \
     -d '{
       "users": [
         {
           "name": "new_user",
           "default_account": "default_account_name",
           "default_wckey": "default_wckey",
           "admin_level": "None"
         }
       ]
     }'
```

#### 4.
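**Add a user association** (associate a user with an account)

```bash
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -H "Content-Type: application/json" \
     -X POST \
     "http://<server>:<port>/slurmdb/v0.0.43/users_association/" \
     -d '{
       "user": {
         "name": "username",
         "default_account": "account_name"
       },
       "association": {
         "account": ["account_name"],
         "cluster": ["cluster_name"],
         "user": ["username"]
       }
     }'
```

#### 5. **Delete a user**

```bash
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -X DELETE \
     "http://<server>:<port>/slurmdb/v0.0.43/user/<username>"
```

### V. Important notes

#### 1. **Authentication setup**

```bash
# Get a JWT token
unset SLURM_JWT
export $(scontrol token)

# Or specify a lifespan (in seconds)
export $(scontrol token lifespan=3600)
```

#### 2. **Starting slurmrestd**

```bash
# Set environment variables
export SLURM_JWT=daemon
export SLURMRESTD_DEBUG=debug

# Start the service
slurmrestd <host>:<port>
```

#### 3. **API version**

- Current version: `v0.0.43`
- List the available versions: `slurmrestd -d list`

#### 4. **Permission requirements**

- The API caller must have suitable permissions
- Adding or modifying accounts and users usually requires administrator privileges
- The JWT token must be generated for a user who holds those privileges

#### 5. **Data structure caveats**

According to the source code (users.c:154-164, accounts.c:193-195):

- **User associations** cannot be set through the users endpoint; use the dedicated associations endpoint
- **User coordinators** cannot be set through the users endpoint either; use the dedicated coordinators endpoint
- **Account associations** must likewise be set through the associations endpoint

#### 6. **Operation behavior**

- A POST request decides automatically whether it is an **add** or a **modify** operation
    - If the resource does not exist, it is added
    - If the resource already exists, it is modified
- The `old_name` field can be used to rename a user

#### 7. **WCKey handling**

- If `default_wckey` is set, the system adds it to `wckey_list` automatically
- That WCKey is automatically marked as the default

#### 8. **Error handling**

- API responses contain `errors` and `warnings` arrays
- Check the HTTP status code:
    - 200: success
    - 400: malformed request
    - 401: authentication failed
    - 404: resource not found
    - 500: server error

#### 9. **Security considerations**

- **Do not hard-code sensitive information in slurm.conf**
- JWT tokens expire and must be refreshed periodically
- Prefer HTTPS over HTTP in real deployments
- slurmrestd should run as a dedicated unprivileged user

#### 10. **Performance**

- For batch operations, a single POST request can contain multiple users or accounts
- Use appropriate query parameters to avoid transferring unnecessary data

### VI. Complete example workflow

```bash
# 1. Set up the environment
export SLURM_JWT=daemon
export SLURMRESTD_DEBUG=debug

# 2. Start slurmrestd (if it is not already running)
slurmrestd localhost:6820 &

# 3.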
# Obtain an auth token
unset SLURM_JWT
export $(scontrol token lifespan=3600)

# 4. Add an account
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -H "Content-Type: application/json" \
     -X POST \
     "http://localhost:6820/slurmdb/v0.0.43/accounts/" \
     -d '{
       "accounts": [{
         "name": "project_a",
         "description": "Project A Research Account",
         "organization": "Research Lab"
       }]
     }'

# 5. Add a user
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -H "Content-Type: application/json" \
     -X POST \
     "http://localhost:6820/slurmdb/v0.0.43/users/" \
     -d '{
       "users": [{
         "name": "john_doe",
         "default_account": "project_a",
         "admin_level": "None"
       }]
     }'

# 6. Query to confirm
curl -H "X-SLURM-USER-TOKEN:$SLURM_JWT" \
     -X GET \
     "http://localhost:6820/slurmdb/v0.0.43/user/john_doe?with_assocs=true"
```

### VII. Related file paths

- REST API implementation: `/src/slurmrestd/plugins/openapi/slurmdbd/`
    - User handling: `users.c`
    - Account handling: `accounts.c`
- Authentication handling: `/src/slurmrestd/rest_auth.h`
- Documentation: `/doc/html/rest_quickstart.shtml`, `/doc/html/rest_api.shtml`

This should help you manage accounts and users through the Slurm REST API.
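
Because JWT tokens expire (see the expired-token error in the usage examples), it can be handy to check a token's remaining lifetime locally before calling the API, instead of pasting it into an online decoder. The following is a minimal sketch rather than anything shipped with Slurm or Slinky: `jwt_payload` and `jwt_seconds_left` are hypothetical helper names, the script assumes `base64 -d`, `cut`, `tr`, and `sed` are available, and it only decodes the payload without verifying the signature.

```shell
#!/usr/bin/env bash

# Print the decoded payload (second dot-separated segment) of a JWT.
# NOTE: hypothetical helper; it does NOT verify the token signature.
jwt_payload() {
  local seg
  # Take the payload segment and map base64url characters to plain base64.
  seg="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url drops the '=' padding; restore it so `base64 -d` accepts the input.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Print the seconds remaining until the token's `exp` claim
# (negative once the token has expired).
# The optional second argument overrides "now" (epoch seconds), mainly for testing.
jwt_seconds_left() {
  local exp now
  exp="$(jwt_payload "$1" \
    | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')"
  # Fail if the payload carries no numeric `exp` claim.
  [ -n "$exp" ] || return 1
  now="${2:-$(date +%s)}"
  echo $(( exp - now ))
}
```

One possible way to wire this in (again an assumption, not something the Slurm docs prescribe) is `[ "$(jwt_seconds_left "$SLURM_JWT")" -gt 60 ] || export $(scontrol token lifespan=3600)`, which requests a fresh token when less than a minute remains.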
<br>

---

<br>

## CLI to REST API examples
### Converting `sacctmgr modify user <username> set MaxJobs=0 MaxSubmitJobs=0` to the REST API
- ### `us.anthropic.claude-sonnet-4-5-20250929-v1:0` (2026/01/31)
    > codebase: https://github.com/SchedMD/slurm

    ## Setting a user's MaxJobs and MaxSubmitJobs through the Slurm REST API

    According to the code, `sacctmgr modify user <username> set MaxJobs=0 MaxSubmitJobs=0` has to go through the **Associations API**, because MaxJobs and MaxSubmitJobs are **association-level limits** rather than plain user attributes.

    ### REST API endpoint

    ```
    POST /slurmdb/v0.0.43/associations/
    ```
    - https://slurm.schedmd.com/rest_api.html#slurmdbV0044PostAssociations

    ### Request format

    ```json
    {
      "associations": [
        {
          "user": "username",
          "cluster": "your-cluster-name",
          "account": "user-account",
          "max": {
            "jobs": {
              "active": { "set": true, "number": 0 },
              "total": { "set": true, "number": 0 }
            }
          }
        }
      ]
    }
    ```

    ### Field descriptions

    According to `src/plugins/data_parser/v0.0.43/parsers.c:4963-4965`:

    - **`max/jobs/active`** (max_jobs): corresponds to `MaxJobs`
        - The maximum number of jobs the user can run at the same time in this association

    - **`max/jobs/total`** (max_submit_jobs): corresponds to `MaxSubmitJobs`
        - The maximum total number of jobs the user can have in pending or running state in this association

    ### Complete curl example

    ```bash
    # 1. Query the existing association first
    curl -X GET "http://localhost:6820/slurmdb/v0.0.43/associations/?user=username" \
      -H "X-SLURM-USER-NAME: slurm" \
      -H "X-SLURM-USER-TOKEN: your-token" \
      -H "Content-Type: application/json"

    # 2.
    # Modify the association settings
    curl -X POST "http://localhost:6820/slurmdb/v0.0.43/associations/" \
      -H "X-SLURM-USER-NAME: slurm" \
      -H "X-SLURM-USER-TOKEN: your-token" \
      -H "Content-Type: application/json" \
      -d '{
        "associations": [
          {
            "user": "username",
            "cluster": "linux",
            "account": "root",
            "max": {
              "jobs": {
                "active": { "set": true, "number": 0 },
                "total": { "set": true, "number": 0 }
              }
            }
          }
        ]
      }'
    ```

- ### CLI example
    ```
    # Before:
    $ sacctmgr list association user=tj_tsai@asus.com
       Cluster    Account       User  Partition     Share   Priority GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall   MaxTRESMins                  QOS   Def QOS GrpTRESRunMin
    ---------- ---------- ---------- ---------- --------- ---------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
    slurm_slu+         tn tj_tsai@a+                    1                                                                                                                                                             normal

    # After:
    $ sacctmgr list association user=tj_tsai@asus.com
       Cluster    Account       User  Partition     Share   Priority GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall   MaxTRESMins                  QOS   Def QOS GrpTRESRunMin
    ---------- ---------- ---------- ---------- --------- ---------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
    slurm_slu+         tn tj_tsai@a+                    1                                                                            2                                      2                                         normal
    ```

- ### REST API example
1.
**Query the existing association first**
```json=
$ TOKEN=<your-token>
$ curl -X GET "http://slurm-restapi.slurm.svc.cluster.local:6820/slurmdb/v0.0.43/associations/?user=tj_tsai%40asus.com" \
    -H "X-SLURM-USER-NAME: slurm" \
    -H "X-SLURM-USER-TOKEN: ${TOKEN}" \
    -H "Content-Type: application/json"
{
  "associations": [
    {
      "accounting": [],
      "account": "tn",
      "cluster": "slurm_slurm",
      "comment": "",
      "default": { "qos": "" },
      "flags": [],
      "max": {
        "jobs": {
          "per": {
            "count": { "set": false, "infinite": true, "number": 0 },
            "accruing": { "set": false, "infinite": true, "number": 0 },
            "submitted": { "set": false, "infinite": true, "number": 0 },
            "wall_clock": { "set": false, "infinite": true, "number": 0 }
          },
          "active": { "set": false, "infinite": false, "number": 0 },
          "accruing": { "set": false, "infinite": true, "number": 0 },
          "total": { "set": false, "infinite": false, "number": 0 }
        },
        "tres": {
          "total": [],
          "group": { "minutes": [], "active": [] },
          "minutes": { "total": [], "per": { "job": [] } },
          "per": { "job": [], "node": [] }
        },
        "per": {
          "account": {
            "wall_clock": { "set": false, "infinite": true, "number": 0 }
          }
        }
      },
      "id": 6,
      "is_default": true,
      "lineage": "\/tn\/0-tj_tsai@asus.com\/",
      "min": {
        "priority_threshold": { "set": false, "infinite": true, "number": 0 }
      },
      "parent_account": "",
      "partition": "",
      "priority": { "set": false, "infinite": true, "number": 0 },
      "qos": [ "normal" ],
      "shares_raw": 1,
      "user": "tj_tsai@asus.com"
    }
  ],
  "meta": {
    "plugin": {
      "type": "openapi\/slurmdbd",
      "name": "Slurm OpenAPI slurmdbd",
      "data_parser": "data_parser\/v0.0.43",
      "accounting_storage": "accounting_storage\/slurmdbd"
    },
    "client": { "source": "slurm-restapi-f844ccc5c-qr5nw:6820(fd:11)", "user": "nobody", "group": "" },
    "command": [],
    "slurm": {
      "version": { "major": "25", "micro": "2", "minor": "11" },
      "release": "25.11.2",
      "cluster": "slurm_slurm"
    }
  },
  "errors": [],
  "warnings": []
}
```

2.
**Modify the association settings**
```json=
curl -X POST "http://slurm-restapi.slurm.svc.cluster.local:6820/slurmdb/v0.0.43/associations/" \
  -H "X-SLURM-USER-NAME: slurm" \
  -H "X-SLURM-USER-TOKEN: ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "associations": [
      {
        "user": "tj_tsai@asus.com",
        "cluster": "slurm_slurm",
        "account": "tn",
        "max": {
          "jobs": {
            "active": { "set": true, "number": 2 },
            "total": { "set": true, "number": 2 }
          }
        }
      }
    ]
  }'
{
  "errors": [],
  "warnings": [],
  "meta": {
    "plugin": {
      "type": "openapi\/slurmdbd",
      "name": "Slurm OpenAPI slurmdbd",
      "data_parser": "data_parser\/v0.0.43",
      "accounting_storage": "accounting_storage\/slurmdbd"
    },
    "client": { "source": "slurm-restapi-f844ccc5c-qr5nw:6820(fd:11)", "user": "nobody", "group": "" },
    "command": [],
    "slurm": {
      "version": { "major": "25", "micro": "2", "minor": "11" },
      "release": "25.11.2",
      "cluster": "slurm_slurm"
    }
  }
}
```
- Querying the association again now shows:
    ```json
    "max": {
      "jobs": {
        "active": { "set": true, "infinite": false, "number": 2 },
        "accruing": { "set": false, "infinite": true, "number": 0 },
        "total": { "set": true, "infinite": false, "number": 2 }
      }
    }
    ```

### Important notes

1. **The full association must be specified**: `user`, `cluster`, and `account` are all needed to identify the association uniquely

2. **The `set` field**: must be `true` to indicate that the field should be updated

3. **Modify behavior** (code: `src/slurmrestd/plugins/openapi/slurmdbd/associations.c:250-319`):
    - If the association already exists → **modify**
    - If the association does not exist → **add**

4. **Permissions**: slurmdbd administrator privileges are required

5.
**Authentication methods**:
    - JWT Token: `X-SLURM-USER-TOKEN`
    - Basic Auth
    - Local auth

### Test example (Python)

See `testsuite/python/tests/test_112_43.py:626-684`:

```python
from openapi_client.models.v0043_openapi_assocs_resp import V0043OpenapiAssocsResp
from openapi_client.models.v0043_assoc import V0043Assoc
from openapi_client.models.v0043_uint32_no_val_struct import V0043Uint32NoValStruct

# Build the request body: one association with MaxJobs / MaxSubmitJobs set to 0
associations = V0043OpenapiAssocsResp(
    associations=[
        V0043Assoc(
            account="root",
            cluster="linux",
            user="username",
            max=dict(
                jobs=dict(
                    active=V0043Uint32NoValStruct(set=True, number=0),
                    total=V0043Uint32NoValStruct(set=True, number=0),
                ),
            ),
        )
    ]
)

# `slurmdb` is assumed to be an already-configured slurmdb API client
# (its setup is not shown in this excerpt)
resp = slurmdb.slurmdb_v0043_post_associations(
    v0043_openapi_assocs_resp=associations
)
```

### Related files

- API handler: `src/slurmrestd/plugins/openapi/slurmdbd/associations.c:250-319`
- Data structures: `src/plugins/data_parser/v0.0.43/parsers.c:4963-4965`
- OpenAPI document: `doc/html/rest_api.shtml`

<br>

---

<br>

## Discussion
### In the Slurm REST API, what is the endpoint for creating a partition, and how is it used?
- ### us.anthropic.claude-sonnet-4-5-20250929-v1:0 (2025/12/31)
    > codebase: https://github.com/SchedMD/slurm

### **Conclusion on creating a partition through the Slurm REST API**

According to the code, **the Slurm REST API currently does not support creating a partition**.

#### **Existing partition endpoints (query only)**

1. **List all partitions**
    - **Endpoint**: `GET /slurm/{data_parser}/partitions/`
      Example: [GET /slurm/v0.0.44/partitions/](https://slurm.schedmd.com/rest_api.html#slurmV0044GetPartitions)
    - **Purpose**: return information about all partitions
    - **Implementation**: `src/slurmrestd/plugins/openapi/slurmctld/partitions.c:44`

2.
**Query a specific partition**
    - **Endpoint**: `GET /slurm/{data_parser}/partition/{partition_name}`
    - **Purpose**: return detailed information about the named partition
    - **Implementation**: `src/slurmrestd/plugins/openapi/slurmctld/partitions.c:86`

#### **Query examples**

```bash
# List all partitions
curl -X GET "http://localhost:6820/slurm/v0.0.43/partitions/" \
  -H "X-SLURM-USER-NAME: slurm" \
  -H "X-SLURM-USER-TOKEN: <token>"

# Query a specific partition (for example: compute)
curl -X GET "http://localhost:6820/slurm/v0.0.43/partition/compute" \
  -H "X-SLURM-USER-NAME: slurm" \
  -H "X-SLURM-USER-TOKEN: <token>"
```

#### **Key finding**

The code shows that the partition handler explicitly rejects every method other than GET:

```c
// src/slurmrestd/plugins/openapi/slurmctld/partitions.c:51-56
if (ctxt->method != HTTP_REQUEST_GET) {
    resp_error(ctxt, ESLURM_REST_INVALID_QUERY, __func__,
               "Unsupported HTTP method requested: %s",
               get_http_method_string(ctxt->method));
    goto done;
}
```

#### **Alternatives for creating or updating a partition**

To create or modify a partition, use one of the following:

1. **Use the `scontrol` command**
   ```bash
   scontrol create PartitionName=newpart Nodes=node[1-10] State=UP
   scontrol update PartitionName=newpart MaxTime=24:00:00
   ```

2. **Edit `slurm.conf` directly**
   ```
   PartitionName=newpart Nodes=node[1-10] Default=NO State=UP
   ```
   After editing, run `scontrol reconfigure`

3.
**Use the Slurm C API**
   - Through the `slurm_update_partition()` function (requires writing a C program)

As currently designed, the Slurm REST API exposes partition resources as **read-only** and does not support create, update, or delete operations.

---

- ### gpt-5.2 (2025/12/31)

In the **Slurm REST API (slurmrestd / openapi/slurmctld)**, **there is currently no REST endpoint for creating a partition** (that is, no `POST /slurm/.../partition` or similar create API). In the Methods list of the official REST API documentation, partitions only have two **query-only** `GET` endpoints: `/partition/{partition_name}` and `/partitions/`. ([Slurm][20251231-A-1])

### Partition URLs available in the REST API (query)

Using the version shown in the official documentation (Slurm-25.11.1 / API v0.0.44) as an example: ([Slurm][20251231-A-1])

* **List all partitions**

  * `GET /slurm/v0.0.44/partitions/` ([Slurm][20251231-A-1])
* **Query a single partition**

  * `GET /slurm/v0.0.44/partition/{partition_name}` ([Slurm][20251231-A-1])

#### curl examples (JWT)

The `rest_auth/jwt` plugin of slurmrestd requires every request to carry:

* `X-SLURM-USER-NAME: <user name>`
* `X-SLURM-USER-TOKEN: <JSON Web Token>`
  The token is usually generated with `scontrol token`. ([Slurm][20251231-A-2])

```bash
TOKEN="$(scontrol token | sed 's/^SLURM_JWT=//')"

curl -sS \
  -H "X-SLURM-USER-NAME: $USER" \
  -H "X-SLURM-USER-TOKEN: $TOKEN" \
  http://<slurmrestd-host>:<port>/slurm/v0.0.44/partitions/
```

```bash
curl -sS \
  -H "X-SLURM-USER-NAME: $USER" \
  -H "X-SLURM-USER-TOKEN: $TOKEN" \
  http://<slurmrestd-host>:<port>/slurm/v0.0.44/partition/debug
```

> The REST API documentation also lists support for `X-SLURM-USER-NAME` and `X-SLURM-USER-TOKEN` (APIKey in Header) as well as HTTP Basic Auth. ([Slurm][20251231-A-1])

---

### So how do you create a partition?

#### Option A: create it dynamically with `scontrol create` (closest to an "API operation")

`scontrol` supports **create partition**, and the partition fields are specified in the *PARTITIONS - SPECIFICATIONS FOR CREATE AND UPDATE COMMANDS* section of its documentation. ([Slurm][20251231-A-3])

```bash
sudo scontrol create \
  PartitionName=debug \
  Nodes=node[0-3] \
  Default=NO \
  MaxTime=INFINITE \
  State=UP
```

> The `create <SPECIFICATION>` command of `scontrol` can "Create a new node, partition, or reservation". ([Slurm][20251231-A-3])

#### Option B: write it into `slurm.conf` and reconfigure (the most conventional and maintainable approach)

`slurm.conf` is where settings such as "how nodes are grouped into partitions" are described in the first place. ([Slurm][20251231-A-4])
After updating `slurm.conf`, you can ask slurmctld to reconfigure:

* The REST API provides:

  * `GET /slurm/v0.0.44/reconfigure/` (request slurmctld reconfigure) ([Slurm][20251231-A-1])

```bash
curl -sS \
  -H "X-SLURM-USER-NAME: $USER" \
  -H "X-SLURM-USER-TOKEN: $TOKEN" \
  http://<slurmrestd-host>:<port>/slurm/v0.0.44/reconfigure/
```

---

#### Key takeaways

* **REST API: partitions can only be queried (GET), not created.** ([Slurm][20251231-A-1])
* To create one: use `scontrol create PartitionName=...` ([Slurm][20251231-A-3]), or edit `slurm.conf` and then `reconfigure` (the REST `/reconfigure/` endpoint can be used). ([Slurm][20251231-A-1])

If you tell me your cluster's Slurm version and the base path exposed by slurmrestd (for example `v0.0.40` or `v0.0.44`), I can replace the URLs above with the exact version strings that work in your environment.

[20251231-A-1]: https://slurm.schedmd.com/rest_api.html "Slurm Workload Manager - "
[20251231-A-2]: https://slurm.schedmd.com/slurmrestd.html "Slurm Workload Manager - slurmrestd"
[20251231-A-3]: https://slurm.schedmd.com/scontrol.html "Slurm Workload Manager - scontrol"
[20251231-A-4]: https://slurm.schedmd.com/slurm.conf.html "Slurm Workload Manager - slurm.conf"

<br>

{%hackmd vaaMgNRPS4KGJDSFG0ZE0w %}
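
Since the valid version strings depend on the running slurmrestd, one way to find them is to read them out of the `/openapi/v3` document shown in the quick start. The snippet below is a small sketch under stated assumptions: `list_api_versions` is a hypothetical helper name, it relies only on `tr`, `grep -oE`, and `sort`, and it assumes path keys look like `\/slurm\/v0.0.41\/...` or `\/slurmdb\/v0.0.41\/...` as in the output captured above.

```shell
#!/usr/bin/env bash

# Read an OpenAPI document (as returned by /openapi/v3) on stdin and print
# every distinct "/<prefix>/<version>" pair found in it, one per line.
# NOTE: hypothetical helper, a text-level scan rather than a real JSON parser.
list_api_versions() {
  # slurmrestd escapes "/" as "\/" in its JSON output, so drop backslashes first,
  # then keep only the "/slurm/vX.Y.Z" and "/slurmdb/vX.Y.Z" prefixes.
  tr -d '\\' \
    | grep -oE '/slurm(db)?/v[0-9]+\.[0-9]+\.[0-9]+' \
    | LC_ALL=C sort -u
}
```

With the variables from the quick start, `curl -s -H "X-SLURM-USER-TOKEN: ${JWT_TOKEN}" http://${SLURM_RESTAPI_IP}:6820/openapi/v3 | list_api_versions` would then show which `/slurm/...` and `/slurmdb/...` versions the deployed slurmrestd actually serves.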