# NHI OMOP ## 服務IP:Port - https://bdcode.vghtpe.gov.tw/ - [10.221.252.51/atlas](http://10.221.252.51/atlas) Atlas 分析版 - 10.221.252.51:5436 Atlas NHI資料源 (Postgres) - 10.221.252.50:5432 Atlas ICU資料源 (Postgres) - 10.221.252.51:5438 Atlas WebAPI (Postgres) - 10.221.252.51:8787 Hades (RStudio) ## Rabbit報告 - [連結](https://drive.google.com/drive/folders/1RFzS1mWo9Tcu-gDhvLIg7iEeViowF5YN?usp=drive_link) # 林鴻睿 ## 參考連結 [急診OMOP轉換流程手冊](https://hackmd.io/-ykODPJiRXWuWZCIh9lP6Q) [OMOP v5.4規範](https://ohdsi.github.io/CommonDataModel/cdm54.html) [Athena id查詢](https://athena.ohdsi.org/search-terms/start) [Athena API](https://hackmd.io/QBKU9ZTkStqJk6TgcY3-CA#API-Test) [Athena API帶參數範例](https://athena.ohdsi.org/api/v1/concepts?pageSize=15&domain=Measurement&standardConcept=Standard&invalidReason=Valid&page=1&query=sodium) [北榮大數據](https://wd.vghtpe.gov.tw/bdc/Fpage.action?muid=19745&fid=17871) [OHDSI White Rabbit](https://github.com/OHDSI/WhiteRabbit) [GoogleSheet](https://docs.google.com/spreadsheets/d/1CcmLsikAMBxu3yTH5Hy17NmVtoOQkRQHxZCh1_o5LPU/edit?usp=sharing) [OMOP健保資料庫前人整理GoogleSheet](https://docs.google.com/spreadsheets/d/1Ur_qzITI2Rz8a6NJyDAFsseYK5vlp01L/edit?gid=147600560#gid=147600560) [OHDSI官方文檔](https://ohdsi.github.io/TheBookOfOhdsi/) [OHDSI日文社群文檔](https://rwd-data-environment-in-hospital.github.io/Documents/) ## OMOP Table SQL Queries ### measurement - 進度: 完成 - 缺: - 其他: - `measurement_id`, `measurement_type_concept_id`, 目前流水號 - `measurement_concept_id`目前以`Athena API`腳本方式大量自動取得 ```sql= WITH measurement_base AS ( SELECT CAST("PERSON"."person_id" AS INTEGER) AS "person_id", "LAB_ITEM_MAPPING"."id" AS "measurement_concept_id", CAST("LAB_20231003"."sdate" AS DATE) AS "measurement_date", CASE WHEN "LAB_20231003"."outcome" ~ '^[+-]?[0-9]*\.?[0-9]+$' THEN CAST("NEPHR"."LAB_20231003"."outcome" AS NUMERIC) ELSE NULL END AS "value_as_number", "VISIT_OCCURRENCE"."visit_occurrence_id" FROM "LAB_20231003" LEFT JOIN "PERSON" ON "LAB_20231003"."uuid" = "PERSON"."person_source_value" LEFT JOIN "LAB_ITEM_MAPPING" ON "LAB_ITEM_MAPPING"."item" = "NEPHR"."LAB_20231003"."item" LEFT JOIN "VISIT_OCCURRENCE" ON "PERSON"."person_id" = "VISIT_OCCURRENCE"."person_id" AND "VISIT_OCCURRENCE"."visit_start_date" = "LAB_20231003"."sdate" AND "VISIT_OCCURRENCE"."visit_end_date" = "NEPHR"."LAB_20231003"."sdate" ) INSERT INTO "measurement" ( "measurement_id", "person_id", "measurement_concept_id", "measurement_date", "measurement_type_concept_id", "value_as_number", "visit_occurrence_id" ) SELECT ROW_NUMBER() OVER () + 375050204 AS "measurement_id", "person_id", "measurement_concept_id", "measurement_date", 1 AS "measurement_type_concept_id", "value_as_number", "visit_occurrence_id" FROM measurement_base WHERE "person_id" IS NOT NULL AND "measurement_concept_id" IS NOT NULL AND "measurement_date" IS NOT NULL AND "value_as_number" IS NOT NULL AND "visit_occurrence_id" IS NOT NULL ``` - "item"欄位補充說明 HTN(CD or DD - icd) / 共病症 BUN / 血尿素氮 (Blood Urea Nitrogen) HDL(HDLC) / 高密度脂蛋白 (High-Density Lipoprotein) LDL(LDLC) / 低密度脂蛋白 (Low-Density Lipoprotein) TG / 三酸甘油酯 (Triglycerides) WBC / 白血球 (White Blood Cells) Hb(HGB) / 血紅素 (Hemoglobin) CREAT / 血清肌酐 (Creatinine) eGFR / 預測腎小管過濾率 (estimated Glomerular Filtration Rate) NA / 鈉 (Sodium) K / 鉀 (Potassium) Cl / 氯 (Chloride) Ca / 鈣 (Calcium) IP / 無機磷 (Inorganic Phosphate) CHOL / 總膽固醇 (Cholesterol) UA / 尿酸 (Uric Acid) HCO3 / 重碳酸根 (Bicarbonate) HbA1c / 糖化血紅素 (Hemoglobin A1c) UPCR / 尿蛋白對肌酐比值 (Urine Protein to Creatinine Ratio) UACR / 尿白蛋白對肌酐比值 (Urine Albumin to Creatinine Ratio) - [Broadsea Atlas](https://github.com/OHDSI/Broadsea/blob/7e61aaeff305d1ee621d278dc643e06d5793a2b6/docker-compose.yml#L95)示範`table` ![measurement](https://hackmd.io/_uploads/HkPRUuUN1l.png) ### observation - 進度: 完成 - 缺: - 其他: - `observatin_id`目前流水號 - `observation_type_concept_id`, 目前全部帶`44814721`(`EHR/Electronic Health Record`的`concept_id`碼) - `observation_date`目前暫帶`ID.firstvisit` ```sql= WITH observation_base AS ( SELECT "PERSON"."person_id" as "person_id", CASE WHEN "ID"."bloodtype" = 'A' THEN 1576146 WHEN "ID"."bloodtype" = 'A -' THEN 45910926 WHEN "ID"."bloodtype" = 'A +' THEN 45930334 WHEN "ID"."bloodtype" = 'AB' THEN 1576148 WHEN "ID"."bloodtype" = 'AB -' THEN 45930337 WHEN "ID"."bloodtype" = 'AB +' THEN 45934055 WHEN "ID"."bloodtype" = 'B' THEN 1576147 WHEN "ID"."bloodtype" = 'B-' THEN 45917006 WHEN "ID"."bloodtype" = 'B +' THEN 45930336 WHEN "ID"."bloodtype" = 'O' THEN 1576149 WHEN "ID"."bloodtype" = 'O -' THEN 45917005 WHEN "ID"."bloodtype" = 'O +' THEN 45930335 ELSE NULL END AS "value_as_concept_id", "ID"."bloodtype" AS "value_as_string", CASE WHEN "ID"."firstvisit" = '' OR "ID"."firstvisit" IS NULL OR "ID"."firstvisit" = '20201920' THEN NULL ELSE CAST("ID"."firstvisit" AS DATE) END AS "observation_date" FROM "ID" LEFT JOIN "PERSON" ON "ID"."uuid" = "PERSON"."person_source_value" ) INSERT INTO "observation" ( "observation_id", "person_id", "observation_concept_id", "observation_date", "observation_type_concept_id", "value_as_concept_id", "value_as_string" ) SELECT ROW_NUMBER() OVER () + 475050204 AS "observation_id", "person_id", 4246053 AS "observation_concept_id", "observation_date", 44814721 AS "observation_type_concept_id", "value_as_concept_id", "value_as_string" FROM observation_base WHERE "person_id" IS NOT NULL AND "observation_date" IS NOT NULL AND "value_as_concept_id" IS NOT NULL --LIMIT 10000; ``` - [Broadsea Atlas](https://github.com/OHDSI/Broadsea/blob/7e61aaeff305d1ee621d278dc643e06d5793a2b6/docker-compose.yml#L95)示範`table` ![observation](https://hackmd.io/_uploads/HyG1wOL4Jx.png) ### observatin_period (Atlas Data Source顯示圖必要table, 不能空, [參考1](https://forums.ohdsi.org/t/broadsea-atlas-why-are-most-data-source-reports-empty-when-using-a-local-data-source/18858/3), [參考2](https://forums.ohdsi.org/t/atlas-now-load-info-properly/16522/2), [參考3](https://github.com/OHDSI/Achilles/blob/main/inst/sql/sql_server/analyses/505.sql)(Atlas顯示也用此Table做日期filter)) ```sql= WITH observation_period_base AS ( SELECT "PERSON"."person_id" as "person_id", CAST("ID"."firstvisit" AS DATE) AS "observation_period_start_date", CAST("ID"."lastvisit" AS DATE) AS "observation_period_end_date" FROM "ID" LEFT JOIN "PERSON" ON "ID"."uuid" = "PERSON"."person_source_value" WHERE "ID"."firstvisit" != '' AND "ID"."lastvisit" != '' AND "ID"."lastvisit" NOT IN ('20182332', '20201920') ), ranked_periods AS ( SELECT "person_id", "observation_period_start_date", "observation_period_end_date", ROW_NUMBER() OVER (PARTITION BY "person_id" ORDER BY "observation_period_start_date") AS row_num FROM observation_period_base ) INSERT INTO "observation_period" ( "observation_period_id", "person_id", "observation_period_start_date", "observation_period_end_date", "period_type_concept_id" ) SELECT ROW_NUMBER() OVER () + 29755 AS "observation_period_id", -- Create unique ID "person_id", "observation_period_start_date", "observation_period_end_date", 44814724 AS "period_type_concept_id" FROM ranked_periods WHERE row_num = 1 AND "person_id" IS NOT NULL AND "observation_period_start_date" IS NOT NULL AND "observation_period_end_date" IS NOT NULL; ``` - [Broadsea Atlas](https://github.com/OHDSI/Broadsea/blob/7e61aaeff305d1ee621d278dc643e06d5793a2b6/docker-compose.yml#L95)示範`table` ![ob](https://hackmd.io/_uploads/r1Z_A8AV1x.png) ### cost - 進度: 完成 - 缺: `cost_event_id `, ` cost_domain_id`, - 其他: - `cost_type_concept_id`目前帶`5032:Covered Charge` - `cost_event_id`,來自於`CD`帶`1`,來自於`DO`帶`2` - `cost_domain_id`目前暫帶`visit_temp` ```sql= WITH cost_CD_base AS ( SELECT CAST("CD"."t_amt" AS NUMERIC) AS "total_charge", CAST("CD"."t_appl_amt" AS NUMERIC) AS "total_cost", CAST("CD"."part_amt" AS NUMERIC) AS "paid_patient_copay", "CD"."case_pay_code" AS "drg_source_value", 1 AS "cost_event_id" FROM "CD" ), cost_DO_base AS ( SELECT CAST("DO"."order_price" AS NUMERIC) AS "total_charge", CAST("DO"."order_amt" AS NUMERIC) AS "total_cost", CAST(NULL AS NUMERIC) AS "paid_patient_copay", CAST(NULL AS VARCHAR) AS "drg_source_value", 2 AS "cost_event_id" FROM "DO" ) INSERT INTO "COST" ( "cost_id", "cost_event_id", "cost_domain_id", "cost_type_concept_id", "total_charge", "total_cost", "paid_patient_copay", "drg_source_value" ) SELECT ROW_NUMBER() OVER () AS "cost_id", "cost_event_id", 'visit_temp' AS "cost_domain_id", 5032 AS "cost_type_concept_id", "total_charge", "total_cost", "paid_patient_copay", "drg_source_value" FROM ( SELECT * FROM cost_CD_base UNION ALL SELECT * FROM cost_DO_base ) AS combined_cost_data LIMIT 10000; ``` - [Broadsea Atlas](https://github.com/OHDSI/Broadsea/blob/7e61aaeff305d1ee621d278dc643e06d5793a2b6/docker-compose.yml#L95)示範`table` ![cost](https://hackmd.io/_uploads/HysJDOI4kx.png) ### death ```sql= WITH death_base AS ( SELECT "PERSON"."person_id" AS "person_id", CAST("ID"."lastvisit" AS DATE) AS "death_date", 32815 AS "death_type_concept_id" FROM "ID" JOIN "PERSON" ON "ID"."uuid" = "PERSON"."person_source_value" WHERE "ID"."lastvisit" != '' AND "ID"."lastvisit" NOT IN ('20182332', '20201920') AND "ID"."isdead" = 'true' ), ranked_deaths AS ( SELECT "person_id", "death_date", "death_type_concept_id", ROW_NUMBER() OVER (PARTITION BY "person_id" ORDER BY "death_date" ASC) AS row_num FROM death_base ) INSERT INTO "death" ( "person_id", "death_date", "death_type_concept_id" ) SELECT "person_id", "death_date", "death_type_concept_id" FROM ranked_deaths WHERE row_num = 1; ``` - [Broadsea Atlas](https://github.com/OHDSI/Broadsea/blob/7e61aaeff305d1ee621d278dc643e06d5793a2b6/docker-compose.yml#L95)示範`table` ![d](https://hackmd.io/_uploads/BJjKAIA4kl.png) ### condition era & drug era - [官方網站SQL](https://ohdsi.github.io/CommonDataModel/sqlScripts.html#drug_eras) - [官方GithubSQL](https://github.com/OHDSI/ETL-CMS/tree/master/SQL) ## Broadsea Atlas, Postgres, Achilles, R Studio - [Broadsea Atlas docker-compose.yaml](https://github.com/OHDSI/Broadsea/blob/7e61aaeff305d1ee621d278dc643e06d5793a2b6/docker-compose.yml#L95) (`L95`, 改`FLYWAY_DATASOURCE_PASSWORD=mypass`) - `Broadsea Postgres`建立`Achilles`用`schema`準備 - `cdmDatabaseSchema` - [Github](https://github.com/OHDSI/CommonDataModel/releases/tag/v5.4.1)下載解包,跑`ddl.sql`,`indices.sql`,`primary_keys.sql`。記得改`@cdmDatabaseSchema`至自己的`schema`名,此`schema`將為`Achilles`的`cdmDatabaseSchema` - `schema`建立完成將自己的`cdm tables: person, measurement, observation, etc`上傳 - `vocabulary`的`table`內容於[Athena官網](https://athena.ohdsi.org/)建帳戶下載後上傳 - `resultsDatabaseSchema`準備 - [Github](https://gist.github.com/asd1245dss/bdc0770e089677ff3c3ef1a052d89df3)或[WebAPI](http://127.0.0.1/WebAPI/ddl/results?dialect=postgresql&schema=results&vocabSchema=cdm&tempSchema=temp&initConceptHierarchy=true),複製`sql`,`schema`記得改自己`schema`名稱,此`schema`將為`Achilles`的`resultsDatabaseSchema` - `Achilles`生成`results` - 登入(`127.0.0.1:8787, username:ohdsi, password:mypass`) ```r= if (!require("remotes")) install.packages("remotes") remotes::install_github("OHDSI/Achilles") library(Achilles) connectionDetails <- createConnectionDetails( dbms="postgresql", server="broadsea-atlasdb/postgres", user="postgres", password='mypass', port="5432" ) achilles(connectionDetails, cdmDatabaseSchema = "cdm", resultsDatabaseSchema = "cdm_results", vocabDatabaseSchema = "cdm", numThreads = 4, sourceName = "CDM", cdmVersion = "5.4", createTable = TRUE, createIndices = TRUE, sqlDialect = "postgresql" ) ``` - `Achilles`執行約需`1`小時(依資料量) - 執行完`results schema`會有`achilles_results`等相關`tables`產生 - ![atlas_results](https://hackmd.io/_uploads/S1Qz04ZHyg.png) - `Broadsea postgres`修改`webapi`的`tables` - `Achilles`生成完`results`需要於`Broadsea postgres`中`webapi.source`和`webapi.source_daimon`下`sql`進行修改來讓`Atlas`知道顯示來源 - `webapi.source`的`sql`範例 ```sql= INSERT INTO "webapi"."source" ( "source_id", "source_name", "source_key", "source_connection", "source_dialect", "username", "password", "krb_auth_method", "keytab_name", "krb_keytab", "krb_admin_server", "deleted_date", "created_by_id", "created_date", "modified_by_id", "modified_date" , "is_cache_enabled" ) VALUES ( 2, 'CDM', 'OHDSI', 'jdbc:postgresql://ip:port/postgres?user=postgres&password=postgres', 'postgresql', NULL, NULL, 'PASSWORD', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, true ); ``` - `webapi.source_daimon`的`sql`範例 ```sql= INSERT INTO "webapi"."source_daimon"("source_daimon_id", "source_id", "daimon_type", "table_qualifier", "priority") VALUES (4, 2, 0, 'cdm', 2); INSERT INTO "webapi"."source_daimon"("source_daimon_id", "source_id", "daimon_type", "table_qualifier", "priority") VALUES (5, 2, 1, 'cdm', 2); INSERT INTO "webapi"."source_daimon"("source_daimon_id", "source_id", "daimon_type", "table_qualifier", "priority") VALUES (6, 2, 2, 'cdm_results', 2); ``` - 刷新添增 - [http://127.0.0.1/WebAPI/source/refresh](http://127.0.0.1/WebAPI/source/refresh)刷新`WebAPI` (記得改自己`ip`) - ![2024-12-12_15-56](https://hackmd.io/_uploads/H1VLDGuV1e.png) - [http://127.0.0.1/atlas/#/configure](http://127.0.0.1/atlas/#/configure)檢視 (記得改自己`ip`) - ![2024-12-12_15-57_1](https://hackmd.io/_uploads/rkgcwfOV1e.png) - 顯示圖 - ![dashboard](https://hackmd.io/_uploads/ByVdOZ-r1l.png) - ![death](https://hackmd.io/_uploads/SJnud-Wrke.png) - ![density](https://hackmd.io/_uploads/Hk7KuZ-B1g.png) - ![measurement](https://hackmd.io/_uploads/Sk9Kub-Bye.png) - ![observation_period](https://hackmd.io/_uploads/Sy-cdZWS1x.png) - ![person](https://hackmd.io/_uploads/B1Kqu-bSyl.png) - ![profile](https://hackmd.io/_uploads/HkXiO-bSJx.png) - ![visit](https://hackmd.io/_uploads/BkKidZZHJe.png) - ![condition](https://hackmd.io/_uploads/H1MN4lOB1x.png) - [Atlas顯示SQL參考](https://github.com/OHDSI/Achilles/tree/main/inst/sql/sql_server/export) ### Concept Sets, Cohort Definition, Cohort Pathways 1. `Search`用`concept_id`搜,綠色`Add To New Concept Set` ![1](https://hackmd.io/_uploads/B1nsfNUP1l.png) 2. `Cohort Sets`,選`concept_id`,綠色`Add`,右上綠色`Save` ![2](https://hackmd.io/_uploads/r11Pm4UD1e.png) 3. `Cohort Definition`設定條件,性別,年齡,日期等,右上綠色`Save` ![3](https://hackmd.io/_uploads/Hkc57VIvye.png) 4. `Cohort Pathways`, `target`設大範圍(有高血壓),小範圍設細條件(女性,50歲+, 高血壓),設好後`Execution`生圖和報告 ![4](https://hackmd.io/_uploads/H19bE4Uwyg.png) ![5](https://hackmd.io/_uploads/HkAKV4UvJx.png) ![6](https://hackmd.io/_uploads/HyPjN4IDyl.png) ## [DQD (DataQualityDashboard)](https://github.com/OHDSI/DataQualityDashboard) 分析 ```r= remotes::install_github("OHDSI/SqlRender") #如有可跳過 install.packages("remotes") remotes::install_github("OHDSI/DataQualityDashboard") connectionDetails <- DatabaseConnector::createConnectionDetails( dbms = "postgresql", user = "postgres", password = "mypass", server = "ip/databasename", port = "port") DataQualityDashboard::executeDqChecks( connectionDetails = connectionDetails, cdmDatabaseSchema = "cdm", resultsDatabaseSchema = "dqd_results", cdmSourceName = "CDM", cdmVersion = "5.4", # 根據 CDM 版本修改 numThreads = 4, outputFolder = "output", outputFile = "results.json" ) jsonFilePath <- "/home/ohdsi/output/results.json" #要full-path, 沒有會跳error DataQualityDashboard::viewDqDashboard(jsonFilePath) ``` - `cdm_source`的`table`記得查一下放相關值,不然`DQD`會跳`error` - ![cdm_source](https://hackmd.io/_uploads/rkMDOVbB1l.png) - 範例![value](https://hackmd.io/_uploads/Hk_v_4bBJx.png) - `DQD`結果 - ![dqd](https://hackmd.io/_uploads/H1mTUrbSyx.png) - ![dqd2](https://hackmd.io/_uploads/BJVFprZSyx.png) ## White Rabbit & Rabbit In A Hat - [White Rabbit & Rabbit In A Hat下載](https://github.com/OHDSI/WhiteRabbit/releases/) - `White Rabbit Location Database連結` - `Postgres RESEARCH`讀取`table`問題 ![wr_error](https://hackmd.io/_uploads/H1BZfGB4kx.png) ![wr_non_exist](https://hackmd.io/_uploads/BkBWzMHNkg.png) #### Windows ![image](https://hackmd.io/_uploads/ryib-GrVJe.png) - [postgres bug](https://github.com/OHDSI/WhiteRabbit/issues/416) - 原始碼判斷有`bug`,[v1.0.0](https://github.com/OHDSI/WhiteRabbit/releases/tag/v1.0.0), [v0.10.6](https://github.com/OHDSI/WhiteRabbit/releases/tag/v0.10.6)可連結數據庫但無法正常讀取`RESEARCH database`的`table`內容,目前放棄資料庫方式。 - `White Rabbit Scan 手動上傳csv` - 除連結資料庫外,也可以手動上傳`csv`生成`scan report`, `csv`來源為資料庫`SELECT FROM TABLE LIMIT 100`,有包含到想要欄位即可。 - ![2024-12-10_14-22](https://hackmd.io/_uploads/HytPC8SEJg.png) - `White Rabbit`生成的`scan report`將會餵入下一工具`Rabbit In A Hat` - `Rabbit In A Hat` - ![2024-12-10_14-34](https://hackmd.io/_uploads/H1NzbDBE1l.png) - 上傳剛剛`White Rabbit`生成之`scan report`進行欄位連連看 - 欄位連完按`Generate`,選擇產出檔案,有`word`, `xlsx`, `sql`, 等 - ![2024-12-10_14-36](https://hackmd.io/_uploads/BJV_-DSEJe.png) - 生成`SQL`範例![2024-12-10_14-37](https://hackmd.io/_uploads/SJjsZwSEJg.png) ## Broadsea 3.5 & HTTPS - Clone - https://github.com/OHDSI/Broadsea/releases/tag/v3.5.0 - `git clone https://github.com/OHDSI/Broadsea` - 拿憑證`*.crt`,`ssl.key` ``` cat STAR_vghtpe_gov_tw.crt USERTrustRSAAAACA.crt SectigoRSADomainValidationSecureServerCA.crt AAACertificateServices.crt > ./certs/broadsea.crt mv ssl.key ./certs/broadsea.key ``` - 改`.env`,相對應`HOST` - ![env](https://hackmd.io/_uploads/BksDVnIn1x.png) - `Up & Down` - `docker compose --profile default up -d` - `docker compose --profile default down -v` ![atlas](https://hackmd.io/_uploads/Sy3Vr2Ih1x.png) ## Postgrest API - Postgres ```sql= CREATE ROLE web_anon NOLOGIN; GRANT USAGE ON SCHEMA nhi TO web_anon; GRANT SELECT ON ALL TABLES IN SCHEMA nhi TO web_anon; ``` - https://github.com/PostgREST/postgrest ```yaml= version: '3.8' services: postgrest: image: postgrest/postgrest:latest container_name: postgrest environment: PGRST_DB_URI: postgres://user:password@postgres_ip:port/database PGRST_DB_SCHEMA: nhi PGRST_DB_ANON_ROLE: web_anon PGRST_SERVER_PORT: 3000 ports: - "3000:3000" restart: always ``` - curl - `curl "http://postgrest_ip:3000/person?limit=10"` - ![person](https://hackmd.io/_uploads/SkiMYhDnkl.png) ## 10.221.252.51 硬碟塞爆 - `2024/12/16`,`10.221.252.51`運行的`docker-compose Broadsea`中`Atlas postgres`突然斷線,`DB`子凱來電,通知硬碟被塞爆,判斷是`docker-compose up`的工程師`postgres volume`放錯地方,沒放進`120T`的空間,放錯進`50G /root` - `ssh`進去`10.221.252.51`, `docker-compose.yml`所有`volumes`的`path`發現為`docker`默認,所有`volume`被放到`/root`, 應該改為`/mnt/bgcdb/void/...`連接到`120T bgcdb` ![bgcdb](https://hackmd.io/_uploads/ryIwcvR4Jg.png) # 鄭嘉祥 ## 匯入Athena VOCAB資料 - 從官方[GitHub](https://github.com/OHDSI/CommonDataModel/releases/tag/v5.4.2)下載並執行SQL檔案就可以在資料庫建立OMOP CDM相關表格 - 於[Athena](https://athena.ohdsi.org/search-terms/start)官網下載VOCAB壓縮檔,其中為收錄相關字彙的CSV檔案(`ICD10PCS`選項須打勾才能處理`DD`表格中ICD10的手術代碼) ![image](https://hackmd.io/_uploads/SJRow-YNJg.png) - 於pgAdmin的PSQL Tool中執行以下指令(需根據檔案位置變更路徑) 以匯入VOCAB資料至相對應的表格 ```cmd= \COPY cdm.DRUG_STRENGTH FROM 'C:\Users\vghuser\Downloads\VOCAB\DRUG_STRENGTH.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ENCODING 'UTF8'; \COPY cdm.CONCEPT FROM 'C:\Users\vghuser\Downloads\VOCAB\CONCEPT.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ENCODING 'UTF8'; \COPY cdm.CONCEPT_RELATIONSHIP FROM 'C:\Users\vghuser\Downloads\VOCAB\CONCEPT_RELATIONSHIP.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ENCODING 'UTF8'; \COPY cdm.CONCEPT_ANCESTOR FROM 'C:\Users\vghuser\Downloads\VOCAB\CONCEPT_ANCESTOR.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ENCODING 'UTF8'; \COPY cdm.CONCEPT_SYNONYM FROM 'C:\Users\vghuser\Downloads\VOCAB\CONCEPT_SYNONYM.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ENCODING 'UTF8'; \COPY cdm.VOCABULARY FROM 'C:\Users\vghuser\Downloads\VOCAB\VOCABULARY.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ENCODING 'UTF8'; \COPY cdm.RELATIONSHIP FROM 'C:\Users\vghuser\Downloads\VOCAB\RELATIONSHIP.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ENCODING 'UTF8'; \COPY cdm.CONCEPT_CLASS FROM 'C:\Users\vghuser\Downloads\VOCAB\CONCEPT_CLASS.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ENCODING 'UTF8'; \COPY cdm.DOMAIN FROM 'C:\Users\vghuser\Downloads\VOCAB\DOMAIN.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ENCODING 'UTF8'; ``` ## 原始資料清理 ### ICD疾病代碼 ```sql= DROP TABLE IF EXISTS patient_visits; CREATE TABLE patient_visits AS SELECT * FROM dblink( 'dbname=NHIRaw host=10.221.252.51 port=5436 user=postgres password=mypass', $$WITH outpatient_icd_10 AS ( SELECT uuid, func_date::DATE AS start_date, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE) AS end_date, UNNEST(STRING_TO_ARRAY(acode_icd9_1, ' ')) AS icd_code, func_type FROM "NEPHR"."CD" WHERE func_date::DATE >= '2016-01-01'::DATE GROUP BY uuid, func_date::DATE, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE), UNNEST(STRING_TO_ARRAY(acode_icd9_1, ' ')), func_type UNION ALL SELECT uuid, func_date::DATE, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE), UNNEST(STRING_TO_ARRAY(acode_icd9_2, ' ')), func_type FROM "NEPHR"."CD" WHERE func_date::DATE >= '2016-01-01'::DATE GROUP BY uuid, func_date::DATE, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE), UNNEST(STRING_TO_ARRAY(acode_icd9_2, ' ')), func_type UNION ALL SELECT uuid, func_date::DATE, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE), UNNEST(STRING_TO_ARRAY(acode_icd9_3, ' ')), func_type FROM "NEPHR"."CD" WHERE func_date::DATE >= '2016-01-01'::DATE GROUP BY uuid, func_date::DATE, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE), UNNEST(STRING_TO_ARRAY(acode_icd9_3, ' ')), func_type ), outpatient_icd_9 AS ( SELECT uuid, func_date::DATE AS start_date, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE) AS end_date, UNNEST(STRING_TO_ARRAY(acode_icd9_1, ' ')) AS icd_code, func_type FROM "NEPHR"."CD" WHERE func_date::DATE < '2016-01-01'::DATE GROUP BY uuid, func_date::DATE, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE), UNNEST(STRING_TO_ARRAY(acode_icd9_1, ' ')), func_type UNION ALL SELECT uuid, func_date::DATE, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE), UNNEST(STRING_TO_ARRAY(acode_icd9_2, ' ')), func_type FROM "NEPHR"."CD" WHERE func_date::DATE < '2016-01-01'::DATE GROUP BY uuid, func_date::DATE, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE), UNNEST(STRING_TO_ARRAY(acode_icd9_2, ' ')), func_type UNION ALL SELECT uuid, func_date::DATE, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE), UNNEST(STRING_TO_ARRAY(acode_icd9_3, ' ')), func_type FROM "NEPHR"."CD" WHERE func_date::DATE < '2016-01-01'::DATE GROUP BY uuid, func_date::DATE, COALESCE(CAST(treat_end_date AS DATE), func_date::DATE), UNNEST(STRING_TO_ARRAY(acode_icd9_3, ' ')), func_type ), inpatient_icd_10 AS ( SELECT uuid, in_date::DATE AS start_date, COALESCE(CAST(out_date AS DATE), in_date::DATE) AS end_date, UNNEST(STRING_TO_ARRAY(icd9cm_code, ' ')) AS icd_code, func_type FROM "NEPHR"."DD" WHERE in_date::DATE >= '2016-01-01'::DATE GROUP BY uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code, ' ')), func_type UNION ALL SELECT uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_1, ' ')), func_type FROM "NEPHR"."DD" WHERE in_date::DATE >= '2016-01-01'::DATE GROUP BY uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_1, ' ')), func_type UNION ALL SELECT uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_2, ' ')), func_type FROM "NEPHR"."DD" WHERE in_date::DATE >= '2016-01-01'::DATE GROUP BY uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_2, ' ')), func_type UNION ALL SELECT uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_3, ' ')), func_type FROM "NEPHR"."DD" WHERE in_date::DATE >= '2016-01-01'::DATE GROUP BY uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_3, ' ')), func_type UNION ALL SELECT uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_4, ' ')), func_type FROM "NEPHR"."DD" WHERE in_date::DATE >= '2016-01-01'::DATE GROUP BY uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_4, ' ')), func_type ), inpatient_icd_9 AS ( SELECT uuid, in_date::DATE AS start_date, COALESCE(CAST(out_date AS DATE), in_date::DATE) AS end_date, UNNEST(STRING_TO_ARRAY(icd9cm_code, ' ')) AS icd_code, func_type FROM "NEPHR"."DD" WHERE in_date::DATE < '2016-01-01'::DATE GROUP BY uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code, ' ')), func_type UNION ALL SELECT uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_1, ' ')), func_type FROM "NEPHR"."DD" WHERE in_date::DATE < '2016-01-01'::DATE GROUP BY uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_1, ' ')), func_type UNION ALL SELECT uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_2, ' ')), func_type FROM "NEPHR"."DD" WHERE in_date::DATE < '2016-01-01'::DATE GROUP BY uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_2, ' ')), func_type UNION ALL SELECT uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_3, ' ')), func_type FROM "NEPHR"."DD" WHERE in_date::DATE < '2016-01-01'::DATE GROUP BY uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_3, ' ')), func_type UNION ALL SELECT uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_4, ' ')), func_type FROM "NEPHR"."DD" WHERE in_date::DATE < '2016-01-01'::DATE GROUP BY uuid, in_date::DATE, COALESCE(CAST(out_date AS DATE), in_date::DATE), UNNEST(STRING_TO_ARRAY(icd9cm_code_4, ' ')), func_type ) SELECT uuid, start_date, end_date, func_type, icd_code, CASE WHEN icd_type = 9 AND icd_code ~ '^[A-Z]\d+' AND SUBSTRING(icd_code, 1, 1) NOT IN ('E', 'V') THEN 10 WHEN icd_type = 10 AND icd_code ~ '^\d+' THEN 9 ELSE icd_type END AS icd_type FROM ( SELECT *, 9 AS icd_type FROM outpatient_icd_9 UNION ALL SELECT *, 10 AS icd_type FROM outpatient_icd_10 UNION ALL SELECT *, 9 AS icd_type FROM inpatient_icd_9 UNION ALL SELECT *, 10 AS icd_type FROM inpatient_icd_10 ) WHERE LENGTH(icd_code) <> 0$$) AS t ( uuid TEXT, start_date DATE, end_date DATE, func_type TEXT, icd_code TEXT, icd_type INTEGER ); CREATE INDEX idx_patient_visits_uuid ON patient_visits (uuid); CREATE INDEX idx_patient_visits_icd_code ON patient_visits (icd_code); ``` ## OMOP Table SQL Queries ### care_site - 紀錄醫療院所之識別碼與名稱及其醫療部門 ![image](https://hackmd.io/_uploads/HJ355PB41e.png) - [醫事機構代碼等資料來源](https://info.nhi.gov.tw/IODE0000/IODE0000S09?id=1120) ![image](https://hackmd.io/_uploads/ByOGZ8UH1x.png) - `care_site`之間為階層結構的關係,例如北榮之下有不同的主科別,主科別之下又有次科別。階層結構的關係必須記錄於`fact_relationship`表格中。 - 目前`care_site_source_value`欄位用於紀錄醫院或科別的代號。 ![北榮CARE_SITE示意圖](https://hackmd.io/_uploads/BktomkzwJg.png) #### 整理醫療院所數據 ```python= def retrieve_care_site_data(update_data: bool = False) -> DataFrame: care_site_df = ( pd.read_csv(CARE_SITE_DATA_SRC_URL) if update_data else pd.read_csv("./csv/CARE_SITE_SOURCE_DATA.csv") ) care_site_df = care_site_df[["醫事機構名稱", "醫事機構代碼"]] care_site_df.columns = ["care_site_name", "care_site_source_value"] specialties_data = pd.read_csv("./csv/SPECIALTIES.csv") specialties_data = specialties_data[["科別", "代號"]] specialties_data.columns = ["care_site_name", "care_site_source_value"] subspecialties_data = pd.read_csv("./csv/SUBSPECIALTIES.csv") subspecialties_data = subspecialties_data[["次分科", "次分科別代號"]] subspecialties_data.columns = ["care_site_name", "care_site_source_value"] return pd.concat( [care_site_df, specialties_data, subspecialties_data], axis=0 ).reset_index(inplace=False) ``` #### 插入醫療院所數據 ```python= def load_care_site_data( conn: connection, schema: str, update_data: bool = False ) -> None: cursor = conn.cursor() cursor.execute(f"TRUNCATE {schema}.care_site;") cursor.close() df = retrieve_care_site_data(update_data) data = [ (index + 1, row["care_site_name"], row["care_site_source_value"]) for index, row in df.iterrows() ] insert_data_into_target_db( conn, schema, "care_site", ["care_site_id", "care_site_name", "care_site_source_value"], data, ) print(f"Successfully copied data into {schema}.care_site.") ``` #### 建立care_site(醫院與主次科別)之間的關聯 ```sql= WITH src AS ( SELECT (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_1, care_site_id AS fact_id_1, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_2, (SELECT care_site_id FROM @target_schema.care_site WHERE care_site_source_value = '02') AS fact_id_2, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care Site is part of Care Site') AS relationship_concept_id FROM @target_schema.care_site WHERE care_site_source_value IN ('AA', 'AB', 'AC', 'AD', 'AE', 'AF', 'AG', 'AH', 'AI', 'AJ') UNION ALL SELECT (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_1, care_site_id AS fact_id_1, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_2, (SELECT care_site_id FROM @target_schema.care_site WHERE care_site_source_value = '03') AS fact_id_2, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care Site is part of Care Site') AS relationship_concept_id FROM @target_schema.care_site WHERE care_site_source_value IN ('BA', 'BB', 'BC', 'BD') UNION ALL SELECT (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_1, care_site_id AS fact_id_1, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_2, (SELECT care_site_id FROM @target_schema.care_site WHERE care_site_source_value = '04') AS fact_id_2, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care Site is part of Care Site') AS relationship_concept_id FROM @target_schema.care_site WHERE care_site_source_value IN ('CA', 'CB') UNION ALL SELECT (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_1, care_site_id AS fact_id_1, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_2, (SELECT care_site_id FROM @target_schema.care_site WHERE care_site_source_value = '06') AS fact_id_2, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care Site is part of Care Site') AS relationship_concept_id FROM @target_schema.care_site WHERE care_site_source_value IN ('HA') UNION ALL SELECT (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_1, care_site_id AS fact_id_1, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_2, (SELECT care_site_id FROM @target_schema.care_site WHERE care_site_source_value = '12') AS fact_id_2, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care Site is part of Care Site') AS relationship_concept_id FROM @target_schema.care_site WHERE care_site_source_value IN ('DA') UNION ALL SELECT (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_1, care_site_id AS fact_id_1, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_2, (SELECT care_site_id FROM @target_schema.care_site WHERE care_site_source_value = '40') AS fact_id_2, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care Site is part of Care Site') AS relationship_concept_id FROM @target_schema.care_site WHERE care_site_source_value IN ('GA') UNION ALL SELECT (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_1, care_site_id AS fact_id_1, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_2, (SELECT care_site_id FROM @target_schema.care_site WHERE care_site_source_value = '82') AS fact_id_2, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care Site is part of Care Site') AS relationship_concept_id FROM @target_schema.care_site WHERE care_site_source_value IN ('FA', 'FB') UNION ALL SELECT (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_1, care_site_id AS fact_id_1, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care site' AND vocabulary_id = 'Domain') AS domain_concept_id_2, (SELECT care_site_id FROM @target_schema.care_site WHERE care_site_name = '臺北榮民總醫院') AS fact_id_2, (SELECT concept_id FROM @target_schema.concept WHERE concept_name = 'Care Site is part of Care Site') AS relationship_concept_id FROM @target_schema.care_site WHERE care_site_source_value IN ( '00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '22', '23', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '81', '82', '83', '84', '2A', '2B' ) ) INSERT INTO @target_schema.fact_relationship ( domain_concept_id_1, fact_id_1, domain_concept_id_2, fact_id_2, relationship_concept_id ) SELECT domain_concept_id_1, fact_id_1, domain_concept_id_2, fact_id_2, relationship_concept_id FROM src; ``` ### location - 可紀錄地址、郵遞區號、經緯度等 - 目前主要紀錄院內`ID`表格中的個人居住之郵遞區號 - 此表格主要用於OMOP中`person`的`location_id`欄位 ![image](https://hackmd.io/_uploads/By36Y8IEJe.png) ##### 插入郵遞區號數據 ```sql= INSERT INTO @target_schema.location ( location_id, zip ) SELECT ROW_NUMBER() OVER(ORDER BY zipcode), zipcode FROM ( SELECT id.zipcode FROM dblink( 'dbname=@source_database host=@source_host port=@source_port user=@source_username password=@source_password', $$SELECT DISTINCT REPLACE(zipcode, ' ', '') FROM @source_schema."ID" WHERE zipcode IS NOT NULL AND LENGTH(REPLACE(zipcode, ' ', '')) <> 0$$ ) AS id( zipcode VARCHAR(9) ) UNION -- 若有些紀錄遺失郵遞區號,可用此值代替。 SELECT 'UNKNOWN' ); ``` - [Broadsea Atlas](https://github.com/OHDSI/Broadsea/blob/7e61aaeff305d1ee621d278dc643e06d5793a2b6/docker-compose.yml#L95)示範`table` ![location](https://hackmd.io/_uploads/rkxVwuLEJg.png) ### person - 紀錄個人資訊,包含生日、種族、族群等資料 - 資料來源為院內`ID`表格 #### SQL範例 ```sql= WITH unknown_zipcode AS ( SELECT location_id FROM @target_schema.location WHERE zip = 'UNKNOWN' ), gender AS ( SELECT concept_id, concept_code FROM @target_schema.concept WHERE domain_id = 'Gender' ), race AS ( SELECT concept_id, concept_name FROM @target_schema.concept WHERE domain_id = 'Race' ), ethnicity AS ( SELECT concept_id, concept_code FROM @target_schema.concept WHERE domain_id = 'Ethnicity' ), person_source AS ( SELECT * FROM dblink( 'dbname=@source_database host=@source_host port=@source_port user=@source_username password=@source_password', $$SELECT uuid, sex, birthday, REPLACE(COALESCE(zipcode, ''), ' ', '') FROM @source_schema."ID"$$ ) AS t( uuid VARCHAR, sex VARCHAR, birthday VARCHAR, zipcode VARCHAR ) ) INSERT INTO @target_schema.person ( person_id, gender_concept_id, year_of_birth, month_of_birth, day_of_birth, race_concept_id, ethnicity_concept_id, location_id, care_site_id, person_source_value, gender_source_value ) SELECT ROW_NUMBER() OVER(), CASE sex WHEN '1' THEN (SELECT concept_id FROM gender WHERE concept_code = 'M') -- Male concept ID WHEN '2' THEN (SELECT concept_id FROM gender WHERE concept_code = 'F') -- Female concept ID ELSE (SELECT concept_id FROM gender WHERE concept_code = 'U') -- UNKNOWN concept ID END, CAST( CASE WHEN omop_is_valid_date(birthday, 'YYYYMMDD') THEN SUBSTRING(birthday, 1, 4) ELSE '0001' END AS INT ), CAST( CASE WHEN omop_is_valid_date(birthday, 'YYYYMMDD') THEN SUBSTRING(birthday, 5, 2) ELSE '01' END AS INT ), CAST( CASE WHEN omop_is_valid_date(birthday, 'YYYYMMDD') THEN SUBSTRING(birthday, 7, 2) ELSE '01' END AS INT ), (SELECT concept_id FROM race WHERE concept_name = 'Unknown'), -- From OMOP offical website: Only use this field (ethnicity_concept_id) if you have US-based data and a source of this information. (SELECT concept_id FROM ethnicity WHERE concept_code = 'Not Hispanic'), COALESCE(l.location_id, (SELECT location_id FROM unknown_zipcode)), ( SELECT care_site_id FROM @target_schema.care_site WHERE care_site_name = '臺北榮民總醫院' ), uuid, sex FROM person_source LEFT JOIN @target_schema.location l ON person_source.zipcode = l.zip ``` - [Broadsea Atlas](https://github.com/OHDSI/Broadsea/blob/7e61aaeff305d1ee621d278dc643e06d5793a2b6/docker-compose.yml#L95)示範`table` ![person](https://hackmd.io/_uploads/SJ8xvuIVyg.png) ### visit_occurrence - 紀錄就診開始和結束時間 - 資料來源為院內`CD`和`DD`表格 - 目前主要紀錄門診與住院 #### CD表格 ![image](https://hackmd.io/_uploads/SyP65ErNJx.png) 筆記: - 可能的`visit_concept_id`值: 1. 9201: Inpatient Visit(住院訪問) 2. 9202: Outpatient Visit(門診訪問) 3. 9203: Emergency Room Visit(急診訪問) - `visit_type_concept_id`可設為32817 - EHR(電子健康紀錄) - [參考資料](https://ohdsi.github.io/ETL-LambdaBuilder/CPRD/CPRD_Visit_Occurrence.html) - [Broadsea Atlas](https://github.com/OHDSI/Broadsea/blob/7e61aaeff305d1ee621d278dc643e06d5793a2b6/docker-compose.yml#L95)示範`table` ![visit_occurrence](https://hackmd.io/_uploads/ryBZDOUNke.png) #### DD表格 ![image](https://hackmd.io/_uploads/HJlN54SNyl.png) - [Broadsea Atlas](https://github.com/OHDSI/Broadsea/blob/7e61aaeff305d1ee621d278dc643e06d5793a2b6/docker-compose.yml#L95)示範`table` ![visit_occurrence](https://hackmd.io/_uploads/rJaZvdL4yx.png) #### 輸入資料 ```sql= WITH source AS ( SELECT t.uuid, t.visit_concept_id, t.visit_start_date, t.visit_end_date, t.visit_type_concept_id, t.visit_source_value, t.visit_source_concept_id FROM dblink( 'dbname=@source_database host=@source_host port=@source_port user=@source_username password=@source_password', 'SELECT DISTINCT uuid, 9202, CAST(func_date AS DATE), COALESCE(CAST(treat_end_date AS DATE), CAST(func_date AS DATE)), 32834, -- EHR outpatient note acode_icd9_1, 45880088 -- ICD-10 FROM @source_schema."CD" WHERE CAST(func_date AS DATE) >= ''2016-01-01'' GROUP BY uuid, func_date, treat_end_date, acode_icd9_1 UNION ALL SELECT DISTINCT uuid, 9202, CAST(func_date AS DATE), COALESCE(CAST(treat_end_date AS DATE), CAST(func_date AS DATE)), 32834, -- EHR outpatient note acode_icd9_1, 45880957 -- ICD-9 FROM @source_schema."CD" WHERE CAST(func_date AS DATE) < ''2016-01-01'' GROUP BY uuid, func_date, treat_end_date, acode_icd9_1 UNION ALL SELECT DISTINCT uuid, 9201, CAST(in_date AS DATE), COALESCE(CAST(out_date AS DATE), CAST(in_date AS DATE)), 32829, -- EHR inpatient note icd9cm_code, 45880088 -- ICD-10 FROM @source_schema."DD" WHERE CAST(in_date AS DATE) >= ''2016-01-01'' GROUP BY uuid, in_date, out_date, icd9cm_code UNION ALL SELECT DISTINCT uuid, 9201, CAST(in_date AS DATE), COALESCE(CAST(out_date AS DATE), CAST(in_date AS DATE)), 32829, -- EHR inpatient note icd9cm_code, 45880957 -- ICD-9 FROM @source_schema."DD" WHERE CAST(in_date AS DATE) < ''2016-01-01'' GROUP BY uuid, in_date, out_date, icd9cm_code' ) AS t( uuid character varying(100), visit_concept_id integer, visit_start_date date, visit_end_date date, visit_type_concept_id integer, visit_source_value character varying(50), visit_source_concept_id integer ) ) INSERT INTO @target_schema.visit_occurrence ( visit_occurrence_id, person_id, visit_concept_id, visit_start_date, visit_end_date, visit_type_concept_id, care_site_id, visit_source_value, visit_source_concept_id ) SELECT ROW_NUMBER() OVER (), p.person_id, r.visit_concept_id, r.visit_start_date, r.visit_end_date, r.visit_type_concept_id, ( SELECT care_site_id FROM @target_schema.care_site WHERE care_site_name = '臺北榮民總醫院' ), r.visit_source_value, r.visit_source_concept_id FROM source r LEFT JOIN @target_schema.person p ON r.uuid = p.person_source_value; ``` ### condition_occurrence - 紀錄病人患有之疾病與其發病時間(定為就診時間) - 資料來源為CD與DD表格 - 筆記:須注意疾病分類號是否為ICD-9或是ICD-10 ![image](https://hackmd.io/_uploads/S17hLllS1x.png) ```sql= INSERT INTO @target_schema.condition_occurrence ( condition_occurrence_id, person_id, condition_concept_id, condition_start_date, condition_end_date, condition_type_concept_id, visit_occurrence_id, condition_source_value, condition_source_concept_id ) SELECT ROW_NUMBER() OVER(), v.person_id, c.concept_id, v.visit_start_date, v.visit_end_date, v.visit_type_concept_id, v.visit_occurrence_id, v.visit_source_value, v.visit_source_concept_id FROM @target_schema.visit_occurrence v LEFT JOIN @target_schema.concept c ON ( CASE -- ICD-10 Code WHEN v.visit_source_concept_id = 45880088 THEN ( CASE WHEN LENGTH(v.visit_source_value) >= 4 THEN SUBSTRING( v.visit_source_value FROM 1 FOR 3 ) || '.' || SUBSTRING( v.visit_source_value FROM 4 ) ELSE v.visit_source_value END ) -- ICD-9 Code ELSE ( CASE WHEN v.visit_source_value LIKE 'E%' AND LENGTH(v.visit_source_value) > 4 THEN SUBSTRING( v.visit_source_value FROM 1 FOR 4 ) || '.' || SUBSTRING( v.visit_source_value FROM 5 ) WHEN v.visit_source_value LIKE 'V%' AND LENGTH(v.visit_source_value) > 3 THEN SUBSTRING( v.visit_source_value FROM 1 FOR 3 ) || '.' || SUBSTRING( v.visit_source_value FROM 4 ) WHEN LENGTH(v.visit_source_value) > 3 THEN SUBSTRING( v.visit_source_value FROM 1 FOR 3 ) || '.' || SUBSTRING( v.visit_source_value FROM 4 ) ELSE v.visit_source_value END ) END ) = c.concept_code WHERE c.concept_id IS NOT NULL; ``` - 疾病統計 ![image](https://hackmd.io/_uploads/Hy4vdVLDyl.png) - 手術統計 ![image](https://hackmd.io/_uploads/r1D8gUIP1e.png) ### 表格欄位定義 [Google Sheet](https://docs.google.com/spreadsheets/d/1T2PqYeDPAvSULc3IQC1zA2Eifj7ykavq5tIVxsGu4lY/edit?usp=sharing) ## 血液透析 ![image](https://hackmd.io/_uploads/rJvKjPjwyg.png) - 血液透析資料來源為`BDC_BASE`中`DW_Detect`與`DW_DetectA`表格。 - `DW_DetectA`資料為2021-08-05至2023-10-31。 - `DW_Detect`資料起始日期為2024-06-01。 - 資料將插入於`procedure_occurrence`中,需要`PTIMESTAMP`與`MEDICALID`欄位作為`procedure_date`與`person_source_value`。 - 忽略`PTIMESTAMP`中時間之部分 - 以`PTIMESTAMP`與`MEDICALID`作為`GROUP BY`的要件,取出每個病人做血液透析之日期 - [血液透析之Concept](https://athena.ohdsi.org/search-terms/terms/43018324)。 ### SQL範例 ```sql= SELECT TO_DATE(SUBSTRING("PTIMESTAMP", 1, 10), 'YYYY-MM-DD') AS procedure_date, "MEDICALID" AS person_source_value FROM ( SELECT * FROM "BDC_BASE"."DW_Detect" UNION ALL SELECT * FROM "BDC_BASE"."DW_DetectA" ) WHERE TO_DATE(SUBSTRING("PTIMESTAMP", 1, 10), 'YYYY-MM-DD') >= '2010-01-01'::DATE AND "MEDICALID" ~ '^[0-9]+$' GROUP BY procedure_date, person_source_value; ``` ## 錯誤排除 ### Achilles錯誤 #### 錯誤訊息:`ERROR: index "idx_ar_aid" does not exist` - 根據[此討論](https://forums.ohdsi.org/t/error-index-idx-ar-aid-does-not-exist-sql-drop-index-ohdsi-idx-ar-aid/10641),此錯誤並非為嚴重錯誤。 #### 錯誤訊息:`ERROR: could not resize shared memory` - 解決方式為於`docker-compose.yml`中增加`shm_size`的值(預設為`"64m"`)。例如: ```yaml= services: postgres: image: postgres:latest environment: POSTGRES_PASSWORD: your_password ports: - "5432:5432" shm_size: "8g" # Set shared memory to 8GB # ... other configurations ``` ## 自動化 利用Python腳本自動初始化資料庫並執行SQL檔案來匯入資料。 ![image](https://hackmd.io/_uploads/SyaCeoPBJe.png) ## Broadsea 3.5 ![image](https://hackmd.io/_uploads/BJlIcbKI1l.png) ![image](https://hackmd.io/_uploads/rJgqjZK8ye.png) ![image](https://hackmd.io/_uploads/r1VtiWY81e.png) ## omoponfhir-main - [GitHub](https://github.com/omoponfhir/omoponfhir-main) ### 解決無Patient問題 - [解決方式](https://github.com/omoponfhir/omoponfhir-main/issues/3#issuecomment-569881201) - omoponfhir執行後,需要自行載入`person_id`至`f_person`表格中,而其他欄位可為NULL。 - 範例: ```sql= INSERT INTO sas.f_person ( person_id ) SELECT person_id FROM sas.person; ``` ```sql= INSERT INTO sas.f_observation_view ( observation_id, observation_date, observation_datetime, observation_source_value, qualifier_source_value, unit_source_value, value_as_number, value_as_string, value_source_value, person_id, observation_concept_id, observation_source_concept_id, observation_type_concept_id, provider_id, qualifier_concept_id, unit_concept_id, value_as_concept_id, visit_occurrence_id ) SELECT observation_id, observation_date, observation_datetime, observation_source_value, qualifier_source_value, unit_source_value, value_as_number, value_as_string, value_source_value, person_id, observation_concept_id, observation_source_concept_id, observation_type_concept_id, provider_id, qualifier_concept_id, unit_concept_id, value_as_concept_id, visit_occurrence_id FROM sas.observation; ``` ### 範例:取得病人資料 - http://localhost:8080/omoponfhir4/fhir/Patient/1 ![image](https://hackmd.io/_uploads/SyThGae_yx.png) ## 參考連結 [Entity-relationship Diagram of OMOP CDM](https://ohdsi.github.io/CommonDataModel/cdm54erd.html) [OHDSI Software Tools](https://www.ohdsi.org/software-tools/) [WhiteRabbit](https://ohdsi.github.io/WhiteRabbit/WhiteRabbit.html) [健保特約醫療院所名冊(特約醫事機構代碼)](https://info.nhi.gov.tw/IODE0000/IODE0000S09?id=1120) [健保用藥品項114年1月查詢檔](https://www.nhi.gov.tw/ch/cp-5622-8092a-2466-1.html) [支付標準代碼](https://info.nhi.gov.tw/INAE5000/INAE5001S01) [健保用藥品項查詢項目檔](https://data.gov.tw/dataset/23715) [健保資料庫(NHIRD)之資料庫串聯](https://linhsingfen.blogspot.com/2019/01/nhird_2.html) [OHDSI Tool Documents](https://rwd-data-environment-in-hospital.github.io/Documents/Usagi_setup.html) [Atlas準備作業](https://github.com/OHDSI/WebAPI/wiki/CDM-Configuration)