# Structured Process for Synthesizing Clinical Trial Data

James Joseph and/on behalf of Trinath Panda

Sacramento, California @ WUSS on Thursday, September 5, 2024 2:30 PM PST

contact@edaclinical.com

---

# Synthesizing Realistic Clinical Trial Data Using Synthea

James Joseph and/on behalf of **Anna Yudovin,** Trinath Panda

North Bethesda, Maryland

Tuesday, September 24, 2024, 10:00 AM EST

contact@edaclinical.com

---

## Introduction

- **Objective:** Introduce a structured process for synthesizing clinical trial data as it is collected.
- **Key Features:** Defines data synthesis expectations, ensures reproducibility, and evaluates outcomes.
- **Training Scenario:** Statistical programmers manage source data changes and synthesis to deepen clinical trial understanding.

<!-- **Speaking Notes:**
- This structured process replicates the real-world collection and management of clinical trial data, helping programmers practice handling complex data structures and error identification.
-->

---

## Overview of the Structured Process

- **Adaptation:** Like a data definition table, but it specifies data generation (not tabulation).
- **Focus:** Reproducibility, practical training, and synthesis of trial data as collected.
- **Conformance Engine:** Uses a conformance rules engine to score candidates’ data synthesis based on CDISC standards, sponsor-specific rules, and study-specific criteria.

<!-- **Speaking Notes:**
- Scoring is based on a combination of CDISC standards and sponsor-defined requirements. These scores help identify weaknesses in data conformance.
- Examples of sponsor-specific rules might include study-specific variables like patient inclusion/exclusion criteria.
-->

---

## What is Synthetic Data?

- **Definition:** Data generated to resemble real clinical trial data.
- **Difference from Simulated Data:** Controlled creation of data vs. generating data to fit external targets.
<!-- **Speaking Notes:**
- Synthetic data generation provides control over scenarios, letting us introduce custom issues or patterns, such as errors in data types or missing fields, for focused learning.
-->

---

## Key Variables by Tab (1 of 2)

| Tab | Key Variables | Purpose |
|---------------------|-------------------------------|---------|
| **1: Trial Design Domains** | VISITNUM, VISIT, VISITDY | Ensures consistent data collection and trial scheduling. |
| **2: Form Inventory** | FormID, Form Name, ~~Conditional Flag~~ | Tracks forms that are always or conditionally collected. |
| **3: Visit Forms** | VISITNUM, VISITDY, FormID | Links forms to visits for correct data collection. |

<!-- **Speaking Notes:** all about the visits and the forms in this tab ~~Event~~ Conditions - unscheduled visit, adverse event, re-implementation device, then these forms are used. -->

---

## Key Variables by Tab (2 of 2)

| Tab | Key Variables | Purpose |
|---------------------|-------------------------------|---------|
| **4: Field Inventory** | FormID, FieldID, ~~Field Label~~ | Ensures data points are accurately captured. |
| **5: Field Attributes** | FieldID, ~~Type, Format, Codelist, Origin, SigDig~~ | Defines field attributes for accurate data synthesis. |
| **6: Field Methods** | FieldID, ~~Derivation/Method, Comment~~ | Defines data synthesis methods to match real clinical scenarios. |

<!-- **Speaking Notes:** We do not need a unique field id , same form same way no matter what visit -->

---

## Tab 1: Trial Design

**Instructions:**

- Identify all timing variables through the schedule of assessments.
- Focus on trial visit number (VISITNUM), description (VISIT), and study day (VISITDY).

---

**Trial Design**

| VISITNUM | VISIT | VISITDY |
|----------|----------|---------|
| 001 | Baseline | 1 |
| 002 | Week 1 | 7 |
| 003 | Week 2 | 14 |
| 004 | Week 3 | 21 |
| 005 | Follow-up | 30 |

<!-- **Speaking Notes:**
- data is generated/collected at the right times.
Any mismatch in these can lead to misaligned data, affecting analysis.
- Example: A mismatch in Visit Day (VISITDY) could result in incorrect timelines for patient assessments, leading to data inconsistencies.
-->

---

## Tab 2: Form Inventory

**Instructions:**

- Assign unique FormIDs to each form used in the casebook.
- Flag forms that are conditionally collected based on trial needs.

---

**Tab 2: Form Inventory (1 of 2)**

| FormID | Form Name | Event Flag (Yes/No) |
|--------|--------------------|---------------------------|
| F001 | Informed Consent | No* |
| F002 | Adverse Event | Yes |
| F003 | Demographics | No |
| F004 | Vital Signs | No |

<!-- **Speaking Notes:** need from all patients. we are creating all patient data, then assigning conditional rates, then restricting full set of patient visit data
- Conditional forms (like F001 - Informed Consent) are only collected if specific criteria are met, such as a patient agreeing to join the trial.-->

---

**Tab 2: Form Inventory (2 of 2)**

| FormID | Form Name | Event Flag (Yes/No) |
|--------|--------------------|---------------------------|
| F005 | Laboratory Results | No |
| F006 | Medical History | Yes |

<!-- **Speaking Notes:**
- Forms like F005 (Laboratory Results) are collected routinely for all patients, whereas forms like F001 (Informed Consent) may only be triggered by specific conditions.
- Correctly flagging these forms prevents unnecessary data collection, saving time and effort.
-->

---

## Tab 3: Visit Forms

**Instructions:**

- Track all combinations of time points and forms.
- Ensure forms repeat at appropriate time points when scheduled.
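
---

The two Tab 3 instructions amount to expanding a schedule into one row per visit/form pair. Below is a minimal Python sketch of that expansion, not the authors' implementation. The visit values come from the Trial Design tab; the `SCHEDULE` mapping, the `Visit` type, and the `build_visit_forms` helper are hypothetical names introduced only for illustration. Printing the result gives one (VISITNUM, VISITDY, FormID) row per scheduled combination, with a repeating form such as F004 appearing once per visit where it is scheduled.

```python
from typing import NamedTuple


class Visit(NamedTuple):
    visitnum: int
    visit: str
    visitdy: int


# First three rows of the Trial Design tab.
VISITS = [
    Visit(1, "Baseline", 1),
    Visit(2, "Week 1", 7),
    Visit(3, "Week 2", 14),
]

# Hypothetical schedule: the VISITNUMs at which each scheduled form is collected.
SCHEDULE = {
    "F003": {1},        # Demographics: baseline only
    "F004": {1, 2, 3},  # Vital Signs: every visit
    "F005": {2, 3},     # Laboratory Results: post-baseline visits
}


def build_visit_forms(visits, schedule):
    """Return one (VISITNUM, VISITDY, FormID) row per scheduled visit/form pair."""
    rows = []
    for v in sorted(visits, key=lambda item: item.visitnum):
        for form_id in sorted(schedule):
            if v.visitnum in schedule[form_id]:
                rows.append((v.visitnum, v.visitdy, form_id))
    return rows


visit_forms = build_visit_forms(VISITS, SCHEDULE)
for row in visit_forms:
    print(row)
```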
---

**Tab 3: Visit Forms (1 of 2)**

| VISITNUM | VISITDY | ~~VISIT~~ | FormID | ~~Form Name~~ |
|----------|---------|----------|--------|----------------------|
| 1 | 1 | Baseline | F003 | Demographics Form |
| 1 | 1 | Baseline | F004 | Vital Signs Form |
| 2 | 7 | Week 1 | F004 | Vital Signs Form |
| 2 | 7 | Week 1 | F005 | Laboratory Results |

<!-- **Speaking Notes:**
- Each row represents a visit where a form is administered. For example, Demographics and Vital Signs are collected during the baseline visit.
- This tab helps ensure forms are not missed during the trial, which could result in gaps in patient data.
-->

---

**Tab 3: Visit Forms (2 of 2)**

| VISITNUM | VISITDY | ~~VISIT~~ | FormID | ~~Form Name~~ |
|----------|---------|----------|--------|----------------------|
| 3 | 14 | Week 2 | F004 | Vital Signs Form |
| 3 | 14 | Week 2 | F005 | Laboratory Results |

<!-- **Speaking Notes:**
- Forms like Vital Signs (F004) may be collected at multiple visits, and this tab ensures consistent tracking across all visits.
- Repeating forms like F005 (Laboratory Results) ensures key data is captured at critical points in the trial, such as every few weeks.
combination of all unique visit and forms
-->

---

## Field Tabs

| Tab | Key Variables | Purpose |
|---------------------|-------------------------------|---------|
| **4: Field Inventory** | FormID, FieldID, ~~Field Label~~ | Ensures data points are accurately captured. |
| **5: Field Attributes** | FieldID, ~~Type, Format, Codelist, Origin, SigDig~~ | Defines field attributes for accurate data synthesis. |
| **6: Field Methods** | FieldID, ~~Derivation/Method, Comment~~ | Defines data synthesis methods to match real clinical scenarios. |

<!-- **Speaking Notes:** all have same number of rows -->

---

## Tab 4: Field Inventory

- List all fields, assigning unique FieldIDs.
- Ensure clear Field Labels and link fields to their FormID.
- Document every field to ensure no data is missed.
---

**Tab 4: Field Inventory (1 of 2)**

| FormID | FieldID | Field Label | OutputName |
|--------|----------|---------------------------|---------------|
| F003 | F003.01 | Subject ID | Subject_ID |
| F003 | F003.02 | Date of Birth | DOB |
| F004 | F004.01 | Systolic Blood Pressure | Systolic_BP |

<!-- **Speaking Notes:**
- For fields like Subject ID (Required), the data must be consistently entered. Missing a required field violates the trial protocol and will cause issues during submission.
- Conditional fields like Systolic_BP may only be required under specific conditions, such as during certain visits.
-->

---

**Tab 4: Field Inventory (2 of 2)**

| FormID | FieldID | Field Label | OutputName |
|--------|----------|---------------------------|---------------|
| F004 | F004.02 | Diastolic Blood Pressure | Diastolic_BP |
| F005 | F005.01 | Hemoglobin Level | Hgb_Level |
| F005 | F005.02 | White Blood Cell Count | WBC_Count |

<!-- **Speaking Notes:**
- Permissible fields like Hemoglobin Level offer flexibility for data entry when it’s not strictly required but can be useful for exploratory analysis.
- Ensuring fields like WBC_Count are marked as Required guarantees essential data is collected for key trial metrics.
-->

---

## Tab 5: Field Attributes

- Define field attributes: type, format, controlled terms.
- Specify significant digits for numeric fields.
- Classify fields as **Required**, **Expected**, **Conditionally Required**, or **Permissible**.
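
---

The attributes listed for Tab 5 can be treated as machine-checkable rules. The Python sketch below is one possible way to do that, not the conformance engine described earlier. The `FieldAttribute` class, the `check_value` helper, the `closed` flag (standing in for the Type2 column), and the `AE_Severity` row are all assumptions made for illustration. An open codelist accepts terms outside the list, a closed one does not, and only a Required field reports a problem when its value is missing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CLASSIFICATIONS = ("Required", "Expected", "Conditionally Required", "Permissible")


@dataclass(frozen=True)
class FieldAttribute:
    field_id: str
    output_name: str
    data_type: str                  # "Char" or "Num", as in the Type1 column
    codelist: Tuple[str, ...] = ()  # controlled terms; empty means none
    closed: bool = False            # True when only codelist terms are allowed
    classification: str = "Permissible"


def check_value(attr: FieldAttribute, value: Optional[object]) -> list:
    """Return a list of conformance problems for one synthesized value."""
    problems = []
    if attr.classification not in CLASSIFICATIONS:
        problems.append(f"unknown classification: {attr.classification}")
    if value is None:
        if attr.classification == "Required":
            problems.append("missing value for Required field")
        return problems
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if attr.data_type == "Num" and not is_number:
        problems.append("non-numeric value for Num field")
    if attr.data_type == "Char" and not isinstance(value, str):
        problems.append("non-character value for Char field")
    if attr.closed and attr.codelist and value not in attr.codelist:
        problems.append(f"value {value!r} not in codelist")
    return problems


# Hypothetical rows modeled on the Field Attributes tab.
ae_term = FieldAttribute("F005.07", "Adverse Event", "Char",
                         ("Headache", "Nausea", "Rash"), False, "Required")
severity = FieldAttribute("X001.01", "AE_Severity", "Char",
                          ("Mild", "Moderate", "Severe"), True, "Expected")
hgb = FieldAttribute("F005.01", "Hgb_Level", "Num", (), False, "Permissible")

for attr, value in [(ae_term, "Dizzy"), (ae_term, None),
                    (severity, "Extreme"), (hgb, "12.5")]:
    print(attr.output_name, repr(value), check_value(attr, value))
```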
<!-- **Speaking Notes:** cannot fill in NOT DONE if there is a value there -->

---

**Tab 5: Field Attributes (1 of 2)**

| FieldID | OutputName | Type1 | Type2 | Format | Controlled Terms/Codelist | Significant Digits | ~~Origin~~ | State/Classification |
|----------|---------------------|-------|--------|-----------|---------------------------|--------------------|-----------|---------------------------|
| F003.02 | DOB | Char | Closed | YYYY-MM-DD | N/A | N/A | Collected | Expected |
| F004.02 | Diastolic_BP | Num | Closed | 999/99 | Normal, Elevated, Hypertensive | 2 | Collected | Conditionally Required |

<!-- **Speaking Notes:**
- Controlled terms and codelists ensure that data entry for fields like Blood Pressure stays within defined ranges (Normal, Elevated, etc.), preventing data errors.
- Example: Incorrect formatting of Date of Birth leads to downstream analysis errors if not fixed early.
-->

---

**Tab 5: Field Attributes (2 of 2)**

| FieldID | OutputName | Type1 | Type2 | Format | Controlled Terms/Codelist | Significant Digits | Origin | State/Classification |
|----------|---------------------|-------|--------|-----------|---------------------------|--------------------|-----------|---------------------------|
| F005.01 | Hgb_Level | Num | Closed | 9.99 | Low, Normal, High | 2 | Collected | Permissible |
| F005.07 | Adverse Event | Char | Open | N/A | Headache, Nausea, Rash | N/A | | Required |

<!-- **Speaking Notes:**
- Permissible fields like Hemoglobin allow flexibility, while required fields like Adverse Event are critical to understanding trial outcomes and safety measures.
-->

---

## Tab 6: Field Methods

**Instructions**

- Define the method or logic for generating data.
- Later: simulate population, event rates, times. See Wilins.

---

**Tab 6: Field Methods**

| FieldID | Derivation/Method | Comments |
|----------|------------------------------------------------------|-----------------------------------------|
| F003.01 | N/A | Directly collected from subject |
| F003.02 | N/A | Directly collected from subject |
| F004.01 | Randomly generated within normal range (90-120 mmHg) | Based on typical vital sign ranges |

<!-- **Speaking Notes:**
- Methods like random generation for fields such as blood pressure ensure that the synthesized data matches real-world ranges.
- If blood pressure is measured only during certain visits, it should be flagged as conditionally required.
-->

---

## Tab 6: Field Methods (2 of 2)

| FieldID | Derivation/Method | Comments |
|----------|------------------------------------------------------|-----------------------------------------|
| F004.02 | Randomly generated within normal range (60-80 mmHg) | Based on typical vital sign ranges |
| F005.01 | Mean of previous 3 Hemoglobin values | Calculated if previous data available |

<!-- **Speaking Notes:**
- Using prior values to calculate Hemoglobin levels creates more realistic synthetic data, ensuring data is consistent over time.
- Random generation ensures required fields like blood pressure fall within expected clinical ranges.
-->

---

## Testing Programmers

| Issue | Example |
|-------------------------------------|---------------------------------------------------------------------------|
| Increase/decrease | Events, observed values, subjects |
| Inconsistent data | Changing format, terms, linkages |
| Missing values | Lab test not done, subject disposed |
| Unscheduled visits | New visitnums, windows, denominators |

---

## Synthetic Data Use Cases

| Use Case | Description | Example Application |
|-----------------------------------------|------------------------------------------------|---------------------------------------------|
| **Standard of Care** | Compare new treatment against existing care | Phase 3 trials comparing treatments |
| **Placebo for Safety** | Use synthetic placebo for safety assessments | Phase 2 trials where placebo may be unethical |
| **Phase 4/Real-World Evidence** | Generate synthetic data post-market approval | Drug label expansion, new indications |
| **Regulatory Submissions** | Synthetic data used to support drug approvals | Supporting evidence for indication expansion |

<!-- **Speaking Notes:**
- Synthetic data has been widely used across multiple phases of drug development. For example, in Phase 3 trials, it helps compare the efficacy of new treatments with the standard of care.
- In Phase 4, synthetic data supports real-world evidence for drug label expansion or new indications, providing critical data to regulators.
-->

---

## Adapting Field Methods for Real Clinical Data

1. **Continuous Variables:** Simulate data such as **lab results** or **vital signs** using normal distributions with appropriate mean and variance.

---

- Use the **RAND** function in SAS to simulate data from distributions like Normal, Uniform, or Exponential.
- Adjust parameters to fit clinical trial data expectations:
  - **Example:** Generate continuous variables such as blood pressure or glucose levels.

---

2. **Categorical Variables:** Simulate patient classifications (e.g., response categories, adverse event severity) using discrete distributions.
   - Example: Simulate **Concomitant Medication Use (CM)** by generating probabilities for medication classes.

---

Use **RAND('TABLE', p1, p2, ...)** to generate categorical variables based on predefined probabilities.

- Useful for variables like **Adverse Event Severity** or **Treatment Response**.
- **Example**: Assigning mild, moderate, and severe adverse event categories to patient data.

---

3. **Mixture Distributions for Realistic Clinical Data:** Combine multiple distributions to model complex clinical outcomes.
   - **Example:** Simulate patient visit durations based on categories like routine visits vs. complex follow-ups.
   - Use normal distributions with different means to represent each subpopulation.

---

## Overview

| Variable Type | Method | Example |
|----------------------|------------------------------------------------|---------|
| Continuous Variables | **RAND** function (Normal, Uniform, etc.) | Blood pressure, glucose levels |
| Categorical Variables | **RAND("Table")** for empirical distributions | Adverse event severity, CM use |
| Mixture Distributions | **RAND + Mixtures** for complex scenarios | Visit type, treatment effects |

---

## **Optimize**

- Simulate 1,000 patient profiles with baseline characteristics and follow-up visits.
- **Use BY-group processing** to run multiple simulations efficiently.
- Generate multiple patient profiles at once, reducing overhead.

---

## Better models

- Leverage more advanced techniques such as **multivariate normal distributions** for correlated variables in trials (e.g., patient age, treatment dose, and outcome).
- Consider using **SAS/IML** for complex simulations where correlated or multivariate data is needed.

---

## So far

Field Methods should reflect real-world data collection practices:

- Continuous variables should have biologically **plausible ranges**.
- Categorical variables should reflect **probabilities seen** in trial populations.
- Use mixture distributions when multiple **conditions apply**.

---

# Using Synthea to Synthesize Data for the SilentNight Study

---

### **SilentNight Study**

- **Title**: In-Home Assessment of Three Anti-Snoring Devices
- **Protocol**: SRC-AI-SilentNight-10090
- **Objective**: Compare the effectiveness of three anti-snoring devices (Mute, myTAP V, SPT) using patient-reported outcomes and snoring data.

---

The study includes:

- Cross-over design (each participant serves as their own control).
- Use of audio recordings and questionnaires to assess snoring severity and sleep quality.

---

## Using Synthea to Generate Realistic Patient Data

**Synthea** can be used to create synthetic patients with conditions like snoring and related interventions.

Before we set up the synthetic patient generation process for the SilentNight study...

---

## What is Synthea?

An **open-source tool** for generating synthetic Electronic Health Records (EHRs).

---

**Purpose**

To provide synthetic patient data for research, testing, and training purposes **without violating privacy** or **intellectual property** laws.

---

**Features**:

- Simulates **entire lifespans** of patients.
- Encodes outputs in **FHIR** and **C-CDA** formats.
- Supports modules that represent disease **progression** and treatment.

---

## Origins of Synthea

``MITRE``

![image](https://hackmd.io/_uploads/rkStw4eAR.png)

The MITRE Corporation (the name is often expanded as "Massachusetts Institute of Technology Research and Engineering," but MITRE states that it is not an acronym)

---

## **Objective**

To create a system for generating large-scale, realistic synthetic health records for secondary use in research, development, and education.

---

## Motivation

**Healthcare Challenges**:

- Limited access to real patient data due to privacy concerns.
- Anonymization is often insufficient and can still lead to re-identification risks.
- High demand for high-quality data for research, development, and clinical training.
---

## **Project Launch**

Synthea generates synthetic data using publicly available health statistics and disease models.

Initially focused on **modeling the progression** of common diseases like **Type 2 Diabetes**
<!--, but quickly expanded to include multiple chronic conditions and primary care visits. -->

---

## PADARSER Framework

Publicly Available Data Approach to Realistic Synthetic EHR (used by Synthea).

[![image](https://hackmd.io/_uploads/r1eGi4xAC.png)](https://watermark.silverchair.com/ocx079.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAA10wggNZBgkqhkiG9w0BBwagggNKMIIDRgIBADCCAz8GCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMHJ-jCFX2FkMzXKR2AgEQgIIDEH21y-YKBwVjJmpXT5I1norMagimmAL5O5ftwepTYXkUZNi7bhXbr_ZTPyxXVczhGzOnYUL9Mc1i1Pm5hItL6ZleuE8pWsxPtnLf3GambWYGcSyVkxypPuNbyhQQYZiphoTVDzVveylpcY2yhhAGVV71lTWGboMPjrOIZyefReKExE9PHew61HiCCNDq9xbkm-q3j_b9p7UrkTqrKErEG-hAog-_gh7qExyfLI0m58JC7ludw70urx1acLCnSZJXxwNQMv6rqBIiDXgczIbZXAEyuIZqoNFVnk9FXmwrIqUaDljTaFsVs_22yTZOjm6gpHY0M9FcGruNYcr02v_hl3AMNw0MQkj2trFncy6cmnptYaSxhVCfeAqkpFXP2AlAAkuvRpWy4e6c6pWhShemB915zbVi8mP8OfOC28JuIR-L-iEL_8B1H5knwpUkYHGa1vKdASX7FhOZMhv0Amjcagk1_KwkEbF6FXsk9q7luOPAeLcUJShRXLHPGDfa8vm3wuxDZTIjqL5eJoRycuAop7_KQM1KTj7y3bhY4W6N5qU1Hycu47oGSTBUbWhpOxX00L_Wf_mZhO7NqI0Zrxvkw1jwu6-oTRXmIZZYe_eLs-0vuJfRF3tpm0q1Ky0QyQUTIvnI_dUEZohW89HhVi_zuiDnTr47Uia8ED9BjCw-C2Znj-s1al3XwX3O4x_2rGiV5RhPR4MDNCl-f1x-k-dEkboGSxqbp9VCse3o77hsL6fv1DFad7dVJQ7EEPFSIxOPz0ec3FgKQTZ0PjCHJUCyVsMCPKVYizgcNRLCVjDx-XW0JZitqqEblD20xSDivZUNDcRR8oCR3Cgb54sPd7Cw9wortaJRznQ-BliPyPTPH5McI7vrQRzCDHVCGtoR5H9YKdHHbo5tFsEidHuSooOjeRrY6N0bLM9_YSJzZCPILrFvOwJS6Ok9PhdrqAuxo7io4WG8ISecYAnQPOwCTMDJS_iyv7kquEIuinoETaHB78XmyjOvcgvh8BVYa9kqmo16s9eLT3Vyzb9wdMJ-zWXoTfg)

---

``PADARSER``

1. **Public Data**: Uses publicly available health statistics.
2. **No Real EHR Data**: Synthetic data generated without real patient records.
3. **Care Maps**: Based on clinical guidelines or expert input.
4. **Privacy Preservation**: Avoids re-identification risk, since no real patient records are used.

---

**Output**

Provides synthetic patient records that mimic real-world healthcare data but contain no real patient information.

---

## **Data Sources**:

- U.S. Census Bureau demographics.
- CDC disease prevalence and incidence rates.
- NIH research reports.

---

**Top 10 Reasons for Primary Care Visits**

1. Routine health checks.
2. Hypertension.
3. Diabetes.
4. Pregnancy.
5. Respiratory infections.
6. General exams.
7. Lipid metabolism disorders.
8. Ear infections.
9. Asthma.
10. Urinary tract infections.

---

**Top 10 Chronic Conditions Modeled**:

Includes **Ischemic heart disease**, **Lung cancer**, **Alzheimer’s**, and more.

---

## How Synthea Works

- **Modular Design**:
  - Diseases and treatments are represented as modules.
  - Modules can be developed and expanded by the community.

---

### **How Simulations Work**

- **Life-Long Simulation**: From birth to death, Synthea simulates patient health states, encounters, diagnoses, and treatments.
- **Timesteps**: Configurable periods (usually 7 days) where health events (e.g., medical visits, diagnoses) occur.

---

### **Core Elements**

- **State Transition Machines**: Each health state (e.g., “Infection,” “Diagnosis”) is represented as a state in a JSON-based machine.
- **Clinical Care Maps**: Dictate patient progression based on real clinical guidelines.
- **Census Data**: Helps simulate population-level health conditions.

---

#### State Transition Machine

![image](https://hackmd.io/_uploads/SJ552NgAC.png)

---

**Types of States**:

- **Control States**: Manage the flow of the simulation (e.g., delays, filters).
- **Clinical States**: Represent medical events (e.g., conditions, medications, procedures).

---

**Transitions**:

- **Direct**: Moves patients between states.
- **Conditional**: Transitions based on patient attributes (e.g., age, gender).
- **Distributed**: Random transitions based on predefined probabilities.
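
---

To make the three transition types concrete, here is a toy Python interpreter for a state machine. It is a sketch under stated assumptions, not Synthea's engine: the key names `direct_transition`, `conditional_transition`, and `distributed_transition` mirror those used in Synthea's JSON modules, but conditions are simplified to Python callables, and the state names, the 0.7/0.3 split, and the `next_state` and `run` helpers are hypothetical.

```python
import random

# Toy module: state names and probabilities are hypothetical.
MODULE = {
    "Initial": {"direct_transition": "Age_Check"},
    "Age_Check": {
        "conditional_transition": [
            {"condition": lambda p: p["age"] >= 21, "transition": "Snoring_Onset"},
            {"transition": "Terminal"},  # fallback when no condition matches
        ]
    },
    "Snoring_Onset": {
        "distributed_transition": [
            {"distribution": 0.7, "transition": "Device_Trial"},
            {"distribution": 0.3, "transition": "Terminal"},
        ]
    },
    "Device_Trial": {"direct_transition": "Terminal"},
    "Terminal": {},
}


def next_state(state, patient, rng):
    """Pick the next state name, or None when the state has no transition."""
    if "direct_transition" in state:
        return state["direct_transition"]
    if "conditional_transition" in state:
        # The first option whose condition holds (or that has none) wins.
        for option in state["conditional_transition"]:
            condition = option.get("condition")
            if condition is None or condition(patient):
                return option["transition"]
        return None
    if "distributed_transition" in state:
        options = state["distributed_transition"]
        weights = [o["distribution"] for o in options]
        return rng.choices(options, weights=weights, k=1)[0]["transition"]
    return None


def run(module, patient, seed=0):
    """Walk one patient from Initial until a state without a transition."""
    rng = random.Random(seed)
    name = "Initial"
    path = [name]
    while True:
        name = next_state(module[name], patient, rng)
        if name is None:
            return path
        path.append(name)


print(run(MODULE, {"age": 10}))
print(run(MODULE, {"age": 30}, seed=1))
```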
---

#### Clinical Care Maps

![1000012925](https://hackmd.io/_uploads/SksnsNlAR.jpg)

<!-- hypertension: encounters: doctors, prescription, fainting, symptoms: increase in blood pressure, - mind maps -->

---

### **Outputs**

- EHR records are produced in **FHIR** and **C-CDA** formats.
- Accessible via **FHIR API**.

---

#### Patients

```csvpreview
Id,BIRTHDATE,DEATHDATE,SSN,DRIVERS,PASSPORT,PREFIX,FIRST,MIDDLE,LAST,SUFFIX,MAIDEN,MARITAL,RACE,ETHNICITY,GENDER,BIRTHPLACE,ADDRESS,CITY,STATE,COUNTY,FIPS,ZIP,LAT,LON,HEALTHCARE_EXPENSES,HEALTHCARE_COVERAGE,INCOME
39a59f42-5a52-886d-844b-183228219521,1989-01-13,,999-23-7837,S99973957,X28834206X,Mrs.,Vonda514,,Kutch271,,Littel644,M,white,nonhispanic,F,Nice Provence-Alpes-Cote dAzur FR,508 Gerhold View Unit 85,Worcester,Massachusetts,Worcester County,25027,01602,42.177418916419676,-71.74487793141057,10498.40,0.00,95959
```

---

#### Condition

```csvpreview
START,STOP,PATIENT,ENCOUNTER,SYSTEM,CODE,DESCRIPTION
2005-10-28,,8c85983a-a538-522f-bce0-03678b0fc7ce,9c197d02-ae7d-7f32-8068-58841a3f4a1e,http://snomed.info/sct,39898005,Sleep disorder (disorder)
2005-12-04,,8c85983a-a538-522f-bce0-03678b0fc7ce,204970d2-19c3-c299-8ff1-ce351d0a28ae,http://snomed.info/sct,78275009,Obstructive sleep apnea syndrome (disorder)
```

---

#### Encounter

```csvpreview
Id,START,STOP,PATIENT,ORGANIZATION,PROVIDER,PAYER,ENCOUNTERCLASS,CODE,DESCRIPTION,BASE_ENCOUNTER_COST,TOTAL_CLAIM_COST,PAYER_COVERAGE,REASONCODE,REASONDESCRIPTION
7e1ff077-0d86-10af-07b7-e98ab77ae1bd,2017-03-23T21:49:46Z,2017-03-23T22:04:46Z,39a59f42-5a52-886d-844b-183228219521,a6fb79e7-4abb-3a68-b62d-e501427fdca4,d173012f-c03f-38f8-b4e2-4d128736f74b,26aab0cd-6aba-3e1b-ac5b-05c8867e762c,wellness,162673000,General examination of patient (procedure),136.80,272.80,0.00,,
```

---

#### Observation

```csvpreview
DATE,PATIENT,ENCOUNTER,CATEGORY,CODE,DESCRIPTION,VALUE,UNITS,TYPE
2014-05-13T03:13:21Z,2e23caa4-d831-1f47-c522-0518bab7bd3d,,,QALY,QALY,38.0,a,numeric
2014-03-19T02:47:28Z,7cd3260f-2795-3080-b267-3bb0a9f40624,,,QALY,QALY,37.0,a,numeric
2014-08-03T04:35:13Z,3f6021df-8aac-ed27-a705-e71bc0e4b649,,,QALY,QALY,24.0,a,numeric
```

---

#### Device

```csvpreview
START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION,UDI
2005-12-03T18:20:11Z,2005-12-03T23:06:11Z,8c85983a-a538-522f-bce0-03678b0fc7ce,204970d2-19c3-c299-8ff1-ce351d0a28ae,701077002,Respiratory apnea monitoring system (physical object),(01)45065531515463(11)051112(17)301127(10)50306183570988073(21)60077480
```

---

## **FHIR API Integration**:

Synthea's FHIR output can be loaded into a FHIR server, giving developers and researchers API-based, real-time querying of synthetic patient records.

---

## 1 - Install and Set Up Synthea

1. **Clone the Synthea Repository**:

   ```bash
   git clone https://github.com/synthetichealth/synthea.git
   cd synthea
   ./gradlew build
   ```

2. **Configure Population**: To match the study requirements, configure Synthea to generate a population reflecting the characteristics of the study subjects (e.g., adults with snoring issues). Set the population size in the `synthea.properties` file: `generate.default_population = 30`

---

## 2 - Modify Existing Modules or Create a New Module

Synthea has existing modules for conditions like sleep apnea, but for the SilentNight study, we need to model snoring and different device interventions.

1. **Existing Module**: The Sleep Apnea module can be modified to reflect the trial conditions.
2. **Custom Snoring Module**: Create a new custom module for snoring with various interventions (Mute, myTAP V, SPT).
---

Example JSON Structure for a Snoring Module:

```json
{
  "name": "Snoring",
  "states": {
    "Initial": {
      "type": "Initial",
      "direct_transition": "Snoring_Condition"
    },
    "Snoring_Condition": {
      "type": "ConditionOnset",
      "target_encounter": "Snoring",
      "codes": [
        {
          "system": "SNOMED-CT",
          "code": "52818001",
          "display": "Snoring"
        }
      ],
      "direct_transition": "Trial_Device"
    },
    "Trial_Device": {
      "type": "Simple",
      "direct_transition": "Device_Outcome"
    },
    "Device_Outcome": {
      "type": "Observation",
      "codes": [
        {
          "system": "SNOMED-CT",
          "code": "LA6568-3",
          "display": "Reduced snoring severity"
        }
      ],
      "direct_transition": "End"
    },
    "End": {
      "type": "Terminal"
    }
  }
}
```

The codes in these examples are placeholders and should be verified against SNOMED CT or LOINC before use.

---

## 3 - Define Patient Characteristics

The SilentNight study involves specific inclusion and exclusion criteria. Synthea allows us to define patient demographics based on these criteria.

1. Inclusion:
   - Adults aged 21 to 55.
   - History of snoring for over 6 months.
   - Co-sleeping with a bed partner.

---

2. Exclusion:
   - High risk of obstructive sleep apnea.
   - Respiratory conditions like COPD.

These can be sketched in the module as patient attributes (an illustrative structure; in Synthea's module format, attributes are set with `SetAttribute` states and age limits are typically enforced with a `Guard` state):

```json
"states": {
  "Patient_Attributes": {
    "type": "Attribute",
    "attributes": {
      "age": {
        "value": 21,
        "distribution": "uniform",
        "maximum": 55
      },
      "snoring_history": true,
      "co_sleeping": true
    }
  }
}
```

---

## 4 - Implement Interventions

1. Mute Device: Create an encounter where the snorer uses the Mute nasal device for 1 week.

```json
{
  "Mute_Trial": {
    "type": "Procedure",
    "codes": [
      {
        "system": "SNOMED-CT",
        "code": "73761001",
        "display": "Nasal dilator therapy"
      }
    ],
    "direct_transition": "Mute_Outcome"
  }
}
```

---

2. myTAP V Device: Define a mandibular advancement procedure using the myTAP V for 2 weeks.
3. SPT Device: Define positional therapy using SPT, including calibration and sleep training modes.

---

## 5 - Output Data for Analysis

Synthea generates outputs in FHIR, CSV, and other formats. For the SilentNight study, CSV output can be used for analysis of snoring severity and device outcomes.
```bash
./run_synthea -p 30 --exporter.csv.export true
```

Review Output Files:

- `patients.csv`: Contains patient demographics.
- `observations.csv`: Includes snoring severity observations.
- `procedures.csv`: Contains data on the interventions (Mute, myTAP, SPT).

---

## 6 - Analyze Synthetic Data

1. Primary Analysis: Compare snoring severity across device use (Mute, myTAP V, SPT). Use CSV data outputs to calculate average snoring severity for each intervention period.

---

2. Secondary Analysis: Assess patient-reported outcomes regarding device comfort and effectiveness.

Tools like R or Python can be used to standardize and analyze the CSV files to produce Tables, Listings and Figures for the Clinical Study Report.

---

## Next Steps

**Expanding Modules**

- Add more diseases and treatments to the system.
- Improve disease progression models and simulate more complex clinical interactions (e.g., drug interactions, adherence rates).

---

**Validation**:

Ongoing validation efforts are comparing synthetic data with real-world statistics to ensure realism in population-level outputs.

---
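Returning to the primary analysis in step 6, the per-device average could be computed along these lines. This is a minimal Python sketch, not the study's analysis program. It assumes an analysis-ready extract in which each severity observation has already been linked to its intervention period; the `DEVICE` column, the patient IDs, the `mean_severity_by_device` helper, and every value in `SAMPLE` are made up for illustration and are not Synthea output or study results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical extract: one snoring severity score per patient and device period.
SAMPLE = """PATIENT,DEVICE,VALUE
p01,Mute,6.0
p01,myTAP V,4.0
p01,SPT,5.0
p02,Mute,7.0
p02,myTAP V,3.0
p02,SPT,6.0
"""


def mean_severity_by_device(csv_text):
    """Average the VALUE column within each DEVICE group."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        values[row["DEVICE"]].append(float(row["VALUE"]))
    return {device: mean(scores) for device, scores in values.items()}


summary = mean_severity_by_device(SAMPLE)
for device, average in sorted(summary.items()):
    print(f"{device}: {average:.2f}")
```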