# Structured Process for Synthesizing Clinical Trial Data
James Joseph and/on behalf of Trinath Panda
Sacramento, California @ WUSS on
Thursday, September 5, 2024 2:30 PM PST
contact@edaclinical.com
---
# Synthesizing Realistic Clinical Trial Data Using Synthea
James Joseph and/on behalf of **Anna Yudovin,** Trinath Panda
North Bethesda, Maryland
Tuesday, September 24, 2024, 10:00 AM EST
contact@edaclinical.com
---
## Introduction
- **Objective:** Introduce a structured process for synthesizing clinical trial data as it is collected.
- **Key Features:** Defines data synthesis expectations, ensures reproducibility, and evaluates outcomes.
- **Training Scenario:** Statistical programmers manage source data changes and synthesis to deepen clinical trial understanding.
<!--
**Speaking Notes:**
- This structured process replicates the real-world collection and management of clinical trial data,
helping programmers practice handling complex data structures and error identification.
-->
---
## Overview of the Structured Process
- **Adaptation:** Like a data definition table, but it specifies data generation rather than tabulation.
- **Focus:** Reproducibility, practical training, and synthesis of trial data as collected.
- **Conformance Engine:** Uses a conformance rules engine to score candidates’ data synthesis based on CDISC standards, sponsor-specific rules, and study-specific criteria.
<!--
**Speaking Notes:**
- Scoring is based on a combination of CDISC standards and sponsor-defined requirements. These scores help identify weaknesses in data conformance.
- Examples of sponsor-specific rules might include study-specific variables like patient inclusion/exclusion criteria.
-->
---
## What is Synthetic Data?
- **Definition:** Data generated to resemble real clinical trial data.
- **Difference from Simulated Data:** Controlled creation of data vs. generating data to fit external targets.
<!--
**Speaking Notes:**
- Synthetic data generation provides control over scenarios, letting us introduce custom issues or patterns, such as errors in data types or missing fields, for focused learning.
-->
---
## Key Variables for Tab (1 of 2)
| Tab | Key Variables | Purpose |
|---------------------|-------------------------------|---------|
| **1: Trial Design Domains** | VISITNUM, VISIT, VISITDY | Ensures consistent data collection and trial scheduling. |
| **2: Form Inventory** | FormID, Form Name, ~~Conditional Flag~~ | Tracks forms that are always or conditionally collected. |
| **3: Visit Forms** | VISITNUM, VISITDY, FormID | Links forms to visits for correct data collection. |
<!--
**Speaking Notes:**
all about the visits and the forms in this tab
~~Event~~ Conditions - unscheduled visit, adverse event, re-implementation device, then these forms are used.
-->
---
## Key Variables for Tab (2 of 2)
| Tab | Key Variables | Purpose |
|---------------------|-------------------------------|---------|
| **4: Field Inventory** | FormID, FieldID, ~~Field Label~~ | Ensures data points are accurately captured. |
| **5: Field Attributes** | FieldID, ~~Type, Format, Codelist, Origin, SigDig~~ | Defines field attributes for accurate data synthesis. |
| **6: Field Methods** | FieldID, ~~Derivation/Method, Comment~~ | Defines data synthesis methods to match real clinical scenarios. |
<!--
**Speaking Notes:**
We do not need a unique field id , same form same way no matter what visit
-->
---
## Tab 1: Trial Design
**Instructions:**
- Identify all timing variables through the schedule of assessments.
- Focus on trial visit number (VISITNUM), description (VISIT), and study day (VISITDY).
---
**Trial Design**
| VISITNUM | VISIT | VISITDY |
|----------|----------|---------|
| 001 | Baseline | 1 |
| 002 | Week 1 | 7 |
| 003 | Week 2 | 14 |
| 004 | Week 3 | 21 |
| 005 | Follow-up | 30 |
<!--
**Speaking Notes:**
- data is generated/collected at the right times. Any mismatch in these can lead to misaligned data, affecting analysis.
- Example: A mismatch in Visit Day (VISITDY) could result in incorrect timelines for patient assessments, leading to data inconsistencies.
-->
---
## Tab 2: Form Inventory
**Instructions:**
- Assign unique FormIDs to each form used in the casebook.
- Flag forms that are conditionally collected based on trial needs.
---
**Tab 2: Form Inventory (1 of 2)**
| FormID | Form Name | Event Flag (Yes/No) |
|--------|--------------------|---------------------------|
| F001 | Informed Consent | No* |
| F002 | Adverse Event | Yes |
| F003 | Demographics | No |
| F004 | Vital Signs | No |
<!--
**Speaking Notes:**
need from all patients.
we are creating all patient data, then assigning conditional rates, then restricting full set of patient visit data
- Conditional forms (like F001 - Informed Consent) are only collected if specific criteria are met, such as a patient agreeing to join the trial.-->
---
**Tab 2: Form Inventory (2 of 2)**
| FormID | Form Name | Event Flag (Yes/No) |
|--------|--------------------|---------------------------|
| F005 | Laboratory Results | No |
| F006 | Medical History | Yes |
<!--
**Speaking Notes:**
- Forms like F005 (Laboratory Results) are collected routinely for all patients, whereas forms like F001 (Informed Consent) may only be triggered by specific conditions.
- Correctly flagging these forms prevents unnecessary data collection, saving time and effort.
-->
---
## Tab 3: Visit Forms
**Instructions:**
- Track all combinations of time points and forms.
- Ensure forms repeat at appropriate time points when scheduled.
---
**Tab 3: Visit Forms (1 of 2)**
| VISITNUM | VISITDY | ~~VISIT~~ | FormID | ~~Form Name~~ |
|----------|---------|----------|--------|----------------------|
| 1 | 0 | Baseline | F003 | Demographics Form |
| 1 | 0 | Baseline | F004 | Vital Signs Form |
| 2 | 7 | Week 1 | F004 | Vital Signs Form |
| 2 | 7 | Week 1 | F005 | Laboratory Results |
<!--
**Speaking Notes:**
- Each row represents a visit where a form is administered. For example, Demographics and Vital Signs are collected during the baseline visit.
- This tab helps ensure forms are not missed during the trial, which could result in gaps in patient data.
-->
---
**Tab 3: Visit Forms (2 of 2)**
| VISITNUM | VISITDY | ~~VISIT~~ | FormID | ~~Form Name~~ |
|----------|---------|----------|--------|----------------------|
| 3 | 14 | Week 2 | F004 | Vital Signs Form |
| 3 | 14 | Week 2 | F005 | Laboratory Results |
<!--
**Speaking Notes:**
- Forms like Vital Signs (F004) may be collected at multiple visits, and this tab ensures consistent tracking across all visits.
- Repeating forms like F005 (Laboratory Results) ensures key data is captured at critical points in the trial, such as every few weeks.
combination of all unique visit and forms
-->
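As a rough illustration of how Tabs 1 to 3 fit together, the sketch below holds the three tabs as plain Python structures and checks that every Visit Forms row points at a known visit and a known form. The structure names and the `check_visit_forms` helper are hypothetical and are not part of the workbook itself; the values are copied from the example tables.

```python
# Tabs 1-3 as plain Python data (values copied from the example slides).
TRIAL_DESIGN = {1: "Baseline", 2: "Week 1", 3: "Week 2", 4: "Week 3", 5: "Follow-up"}
FORM_INVENTORY = {
    "F001": "Informed Consent", "F002": "Adverse Event", "F003": "Demographics",
    "F004": "Vital Signs", "F005": "Laboratory Results", "F006": "Medical History",
}
VISIT_FORMS = [(1, "F003"), (1, "F004"), (2, "F004"), (2, "F005"), (3, "F004"), (3, "F005")]


def check_visit_forms(visit_forms, trial_design, form_inventory):
    """Return a list of problems: unknown visits, unknown forms, duplicate rows."""
    problems = []
    seen = set()
    for visitnum, form_id in visit_forms:
        if visitnum not in trial_design:
            problems.append(f"unknown VISITNUM {visitnum}")
        if form_id not in form_inventory:
            problems.append(f"unknown FormID {form_id}")
        if (visitnum, form_id) in seen:
            problems.append(f"duplicate row ({visitnum}, {form_id})")
        seen.add((visitnum, form_id))
    return problems


print(check_visit_forms(VISIT_FORMS, TRIAL_DESIGN, FORM_INVENTORY))  # prints []
```

A check like this is one way to confirm that Tab 3 really is the set of unique visit and form combinations before any data is synthesized.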
---
## Field Tabs
| Tab | Key Variables | Purpose |
|---------------------|-------------------------------|---------|
| **4: Field Inventory** | FormID, FieldID, ~~Field Label~~ | Ensures data points are accurately captured. |
| **5: Field Attributes** | FieldID, ~~Type, Format, Codelist, Origin, SigDig~~ | Defines field attributes for accurate data synthesis. |
| **6: Field Methods** | FieldID, ~~Derivation/Method, Comment~~ | Defines data synthesis methods to match real clinical scenarios. |
<!--
**Speaking Notes:**
all have same number of rows
-->
---
## Tab 4: Field Inventory
- List all fields, assigning unique FieldIDs.
- Ensure clear Field Labels and link fields to their FormID.
- Document every field to ensure no data is missed.
---
**Tab 4: Field Inventory (1 of 2)**
| FormID | FieldID | Field Label | OutputName |
|--------|----------|---------------------------|---------------|
| F003 | F003.01 | Subject ID | Subject_ID |
| F003 | F003.02 | Date of Birth | DOB |
| F004 | F004.01 | Systolic Blood Pressure | Systolic_BP |
<!--
**Speaking Notes:**
- For fields like Subject ID (Required), the data must be consistently entered. Missing a required field violates the trial protocol and will cause issues during submission.
- Conditional fields like Systolic_BP may only be required under specific conditions, such as during certain visits.
-->
---
**Tab 4: Field Inventory (2 of 2)**
| FormID | FieldID | Field Label | OutputName |
|--------|----------|---------------------------|---------------|
| F004 | F004.02 | Diastolic Blood Pressure | Diastolic_BP |
| F005 | F005.01 | Hemoglobin Level | Hgb_Level |
| F005 | F005.02 | White Blood Cell Count | WBC_Count |
<!--
**Speaking Notes:**
- Permissible fields like Hemoglobin Level offer flexibility for data entry when it’s not strictly required but can be useful for exploratory analysis.
- Ensuring fields like WBC_Count are marked as Required guarantees essential data is collected for key trial metrics.
-->
---
## Tab 5: Field Attributes
- Define field attributes: type, format, controlled terms.
- Specify significant digits for numeric fields.
- Classify fields as **Required**, **Expected**, **Conditionally Required**, or **Permissible**.
<!--
**Speaking Notes:**
cannot fill in NOT DONE if there is a value there
-->
---
**Tab 5: Field Attributes (1 of 2)**
| FieldID | OutputName | Type1 | Type2 | Format | Controlled Terms/Codelist | Significant Digits | ~~Origin~~ | State/Classification |
|----------|---------------------|-------|--------|-----------|---------------------------|--------------------|-----------|---------------------------|
| F003.02 | DOB | Char | Closed | YYYY-MM-DD | N/A | N/A | Collected | Expected |
| F004.02 | Diastolic_BP | Num | Closed | 999/99 | Normal, Elevated, Hypertensive | 2 | Collected | Conditionally Required |
<!--
**Speaking Notes:**
- Controlled terms and codelists ensure that data entry for fields like Blood Pressure stays within defined ranges (Normal, Elevated, etc.), preventing data errors.
- Example: Incorrect formatting of Date of Birth leads to downstream analysis errors if not fixed early.
-->
---
**Tab 5: Field Attributes (2 of 2)**
| FieldID | OutputName | Type1 | Type2 | Format | Controlled Terms/Codelist | Significant Digits | Origin | State/Classification |
|----------|---------------------|-------|--------|-----------|---------------------------|--------------------|-----------|---------------------------|
| F005.01 | Hgb_Level | Num | Closed | 9.99 | Low, Normal, High | 2 | Collected | Permissible |
| F005.07 | Adverse Event | Char | Open | N/A | Headache, Nausea, Rash | N/A | Collected | Required |
<!--
**Speaking Notes:**
- Permissible fields like Hemoglobin allow flexibility, while required fields like Adverse Event are critical to understanding trial outcomes and safety measures.
-->
---
## Tab 6: Field Methods
**Instructions**
- Define the method or logic for generating data.
- Later: simulate population, event rates, and times (see Wilins).
---
**Tab 6: Field Methods (1 of 2)**
| FieldID | Derivation/Method | Comments |
|----------|------------------------------------------------------|-----------------------------------------|
| F003.01 | N/A | Directly collected from subject |
| F003.02 | N/A | Directly collected from subject |
| F004.01 | Randomly generated within normal range (90-120 mmHg) | Based on typical vital sign ranges |
<!--
**Speaking Notes:**
- Methods like random generation for fields such as blood pressure ensure that the synthesized data matches real-world ranges.
- If blood pressure is measured only during certain visits, it should be flagged as conditionally required.
-->
---
**Tab 6: Field Methods (2 of 2)**
| FieldID | Derivation/Method | Comments |
|----------|------------------------------------------------------|-----------------------------------------|
| F004.02 | Randomly generated within normal range (60-80 mmHg) | Based on typical vital sign ranges |
| F005.01 | Mean of previous 3 Hemoglobin values | Calculated if previous data available |
<!--
**Speaking Notes:**
- Using prior values to calculate Hemoglobin levels creates more realistic synthetic data, ensuring data is consistent over time.
- Random generation ensures required fields like blood pressure fall within expected clinical ranges.
-->
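The methods in Tab 6 could be expressed in code along the following lines. This is a minimal sketch in Python, not the presenters' implementation: the function names are hypothetical, whole-number blood pressure values are an assumption, and returning `None` when fewer than three prior hemoglobin values exist is one possible reading of "calculated if previous data available".

```python
import random
from statistics import mean


def systolic_bp(rng):
    """F004.01: random whole number within the normal range 90-120 mmHg."""
    return rng.randint(90, 120)


def diastolic_bp(rng):
    """F004.02: random whole number within the normal range 60-80 mmHg."""
    return rng.randint(60, 80)


def hemoglobin(previous):
    """F005.01: mean of the previous 3 values, or None if fewer than 3 exist (assumption)."""
    if len(previous) < 3:
        return None
    return round(mean(previous[-3:]), 2)


rng = random.Random(20240924)  # fixed seed so the synthesis is reproducible
record = {
    "Systolic_BP": systolic_bp(rng),
    "Diastolic_BP": diastolic_bp(rng),
    "Hgb_Level": hemoglobin([13.0, 14.0, 15.0, 16.0]),
}
print(record["Hgb_Level"])  # prints 15.0
```

Passing a seeded generator into each method keeps the run repeatable, which matches the emphasis on reproducibility earlier in the deck.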
---
## Testing Programmers
| Issue | Example |
|-------------------------------------|---------------------------------------------------------------------------|
| Increase/decrease | Events, observed values, subjects |
| Inconsistent data | Changing format, terms, linkages |
| Missing values | Lab test not done, subject disposed |
| Unscheduled visits | New VISITNUMs, windows, denominators |
---
## Synthetic Data Use Cases
| Use Case | Description | Example Application |
|-----------------------------------------|------------------------------------------------|---------------------------------------------|
| **Standard of Care** | Compare new treatment against existing care | Phase 3 trials comparing treatments |
| **Placebo for Safety** | Use synthetic placebo for safety assessments | Phase 2 trials where placebo may be unethical|
| **Phase 4/Real-World Evidence** | Generate synthetic data post-market approval | Drug label expansion, new indications |
| **Regulatory Submissions** | Synthetic data used to support drug approvals | Supporting evidence for indication expansion|
<!--
**Speaking Notes:**
- Synthetic data has been widely used across multiple phases of drug development. For example, in Phase 3 trials, it helps compare the efficacy of new treatments with the standard of care.
- In Phase 4, synthetic data supports real-world evidence for drug label expansion or new indications, providing critical data to regulators.
-->
---
## Adapting Field Methods for Real Clinical Data
1. **Continuous Variables:**
Simulate data such as **lab results** or **vital signs** using normal distributions with appropriate mean and variance.
---
- Use the **RAND** function in SAS to simulate data from distributions like Normal, Uniform, or Exponential.
- Adjust parameters to fit clinical trial data expectations:
- **Example:** Generate continuous variables such as blood pressure or glucose levels.
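
The slides refer to the SAS RAND function; as a language-neutral illustration, a rough Python analog using only the standard library might look like the sketch below. The mean, standard deviation, and plausibility bounds are illustrative assumptions, not values from any study, and `simulate_normal` is a hypothetical helper.

```python
import random


def simulate_normal(rng, mean, sd, n, low=None, high=None):
    """Draw n values from Normal(mean, sd), redrawing any value outside [low, high]."""
    values = []
    while len(values) < n:
        x = rng.gauss(mean, sd)
        if (low is None or x >= low) and (high is None or x <= high):
            values.append(round(x, 1))
    return values


rng = random.Random(2024)  # fixed seed for a reproducible run
# Illustrative fasting glucose values in mg/dL (parameters are assumptions).
glucose = simulate_normal(rng, mean=95.0, sd=10.0, n=5, low=60.0, high=140.0)
print(glucose)
```

Redrawing out-of-range values is one simple way to keep results biologically plausible; it slightly changes the distribution near the bounds.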
---
2. **Categorical Variables:**
Simulate patient classifications (e.g., response categories, adverse event severity) using discrete distributions.
- Example: Simulate **Concomitant Medication Use (CM)** by generating probabilities for medication classes
---
Use **RAND('TABLE', p1, p2, ...)**
to generate categorical variables based on predefined probabilities.
- Useful for variables like **Adverse Event Severity** or **Treatment Response**.
- **Example**: Assigning mild, moderate, and severe adverse event categories to patient data.
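
For comparison with the SAS table distribution, here is a minimal Python sketch of the same idea using `random.choices` from the standard library. The severity probabilities are illustrative assumptions rather than rates from a real trial.

```python
import random
from collections import Counter

SEVERITY = ["Mild", "Moderate", "Severe"]
PROBS = [0.6, 0.3, 0.1]  # assumed probabilities, for illustration only

rng = random.Random(42)  # fixed seed for reproducibility
# Draw one severity category per adverse event, weighted by PROBS.
draws = rng.choices(SEVERITY, weights=PROBS, k=1000)
counts = Counter(draws)
print(counts)
```

With 1,000 draws the observed shares should land close to the specified probabilities, which is a quick sanity check on the synthesis.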
---
3. **Mixture Distributions:**
Combine multiple distributions to model complex clinical outcomes.
- **Example:** Simulate patient visit durations based on categories like routine visits vs. complex follow-ups.
- Use normal distributions with different means to represent each subpopulation.
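
A two-component mixture of this kind can be sketched in Python as follows. The mixing proportion, means, and standard deviations are illustrative assumptions, and `visit_duration` is a hypothetical helper name.

```python
import random


def visit_duration(rng, p_routine=0.8):
    """One draw from a two-component normal mixture of visit durations (minutes)."""
    if rng.random() < p_routine:
        kind, minutes = "routine", rng.gauss(20.0, 5.0)
    else:
        kind, minutes = "complex", rng.gauss(60.0, 15.0)
    # Floor at 1 minute so no visit has a zero or negative duration.
    return kind, max(minutes, 1.0)


rng = random.Random(7)  # fixed seed for reproducibility
sample = [visit_duration(rng) for _ in range(5)]
print(sample)
```

The component is chosen first and the value is drawn second, which is the defining step of a mixture distribution.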
---
## Overview
| Variable Type | Method | Example |
|----------------------|------------------------------------------------|---------|
| Continuous Variables | **RAND** function (Normal, Uniform, etc.) | Blood pressure, glucose levels |
| Categorical Variables| **RAND("Table")** for empirical distributions | Adverse event severity, CM use |
| Mixture Distributions| **RAND + Mixtures** for complex scenarios | Visit type, treatment effects |
---
## **Optimize**
- Simulate 1,000 patient profiles with baseline characteristics and follow-up visits.
- **Use BY-group processing** to run multiple simulations efficiently.
- Generate multiple patient profiles at once, reducing overhead.
---
## Better models
- Leverage more advanced techniques such as **multivariate normal distributions** for correlated variables in trials (e.g., patient age, treatment dose, and outcome).
- Consider using **SAS/IML** for complex simulations where correlated or multivariate data is needed.
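
To show what "correlated variables" means in practice, the sketch below draws correlated pairs from a bivariate normal distribution using the Cholesky construction, written in standard-library Python rather than SAS/IML. The blood pressure parameters and the correlation of 0.6 are illustrative assumptions, and both helper names are hypothetical.

```python
import math
import random


def correlated_pair(rng, mean_x, sd_x, mean_y, sd_y, rho):
    """Draw (x, y) from a bivariate normal with correlation rho (Cholesky form)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x = mean_x + sd_x * z1
    y = mean_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(11)  # fixed seed for reproducibility
# Illustrative systolic/diastolic pairs with an assumed correlation of 0.6.
pairs = [correlated_pair(rng, 110.0, 10.0, 70.0, 8.0, 0.6) for _ in range(5000)]
print(round(sample_correlation(pairs), 2))
```

The same construction extends to three or more variables by using the full Cholesky factor of the covariance matrix.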
---
## So far
Field Methods should reflect real-world data collection practices:
- Continuous variables should have biologically **plausible ranges**.
- Categorical variables should reflect **probabilities seen** in trial populations.
- Use mixture distributions when multiple **conditions apply**.
---
# Using Synthea to Synthesize Data for the SilentNight Study
---
### **SilentNight Study**
- **Title**: In-Home Assessment of Three Anti-Snoring Devices
- **Protocol**: SRC-AI-SilentNight-10090
- **Objective**: Compare the effectiveness of three anti-snoring devices (Mute, myTAP V, SPT) using patient-reported outcomes and snoring data.
---
The study includes:
- Cross-over design (each participant serves as their own control).
- Use of audio recordings and questionnaires to assess snoring severity and sleep quality.
---
## Using Synthea to Generate Realistic Patient Data
**Synthea** can be used to create synthetic patients with conditions like snoring and related interventions.
Before we set up the synthetic patient generation process for the SilentNight study...
---
## What is Synthea?
An **open-source tool** for generating synthetic Electronic Health Records (EHRs).
---
**Purpose**
To provide synthetic patient data for research, testing, and training purposes **without violating privacy** or **intellectual property** laws.
---
**Features**:
- Simulates **entire lifespans** of patients.
- Encodes outputs in **FHIR** and **C-CDA** formats.
- Supports modules that represent disease **progression** and treatment.
---
## Origins of Synthea
``MITRE``

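Synthea was developed at The MITRE Corporation, a not-for-profit organization that operates federally funded research and development centers. (MITRE is not an acronym.)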
---
## **Objective**
To create a system for generating large-scale, realistic synthetic health records for secondary use in research, development, and education.
---
## Motivation
**Healthcare Challenges**:
- Limited access to real patient data due to privacy concerns.
- Anonymization is often insufficient and can still lead to re-identification risks.
- High demand for high-quality data for research, development, and clinical training.
---
## **Project Launch**
Synthea generates synthetic data using publicly available health statistics and disease models.
It initially focused on **modeling the progression** of common diseases like **Type 2 Diabetes**.
<!--, but quickly expanded to include multiple chronic conditions and primary care visits. -->
---
## PADARSER Framework
Publicly Available Data Approach to Realistic Synthetic EHR (used by Synthea).
[Walonoski et al., *JAMIA* (2018)](https://doi.org/10.1093/jamia/ocx079)
---
``PADARSER``
1. **Public Data**: Uses publicly available health statistics.
2. **No Real EHR Data**: Synthetic data generated without real patient records.
3. **Care Maps**: Based on clinical guidelines or expert input.
4. **Privacy Preservation**: Completely eliminates risks of re-identification.
---
**Output**
Provides synthetic patient records that mimic real-world healthcare data but are risk-free.
---
## **Data Sources**:
- U.S. Census Bureau demographics.
- CDC disease prevalence and incidence rates.
- NIH research reports.
---
**Top 10 Reasons for Primary Care Visits**
1. Routine health checks.
2. Hypertension.
3. Diabetes.
4. Pregnancy.
5. Respiratory infections.
6. General exams.
7. Lipid metabolism disorders.
8. Ear infections.
9. Asthma.
10. Urinary tract infections.
---
**Top 10 Chronic Conditions Modeled**:
Includes **Ischemic heart disease**, **Lung cancer**, **Alzheimer’s**, and more.
---
## How Synthea Works
- **Modular Design**:
  - Diseases and treatments are represented as modules.
  - Modules can be developed and expanded by the community.
---
### **How Simulations Work**
- **Life-Long Simulation**: From birth to death, Synthea simulates patient health states, encounters, diagnoses, and treatments.
- **Timesteps**: Configurable periods (usually 7 days) where health events (e.g., medical visits, diagnoses) occur.
---
### **Core Elements**
- **State Transition Machines**: Each health state (e.g., “Infection,” “Diagnosis”) is represented as a state in a JSON-based machine.
- **Clinical Care Maps**: Dictate patient progression based on real clinical guidelines.
- **Census Data**: Helps simulate population-level health conditions.
---
#### State Transition Machine

---
**Types of States**:
- **Control States**: Manage the flow of the simulation (e.g., delays, filters).
- **Clinical States**: Represent medical events (e.g., conditions, medications, procedures).
---
**Transitions**:
- **Direct**: Moves patients between states.
- **Conditional**: Transitions based on patient attributes (e.g., age, gender).
- **Distributed**: Random transitions based on predefined probabilities.
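
The three transition types can be illustrated with a toy state machine. The Python sketch below is only a conceptual model: it is not Synthea's engine (which is written in Java) and does not follow Synthea's JSON module schema, and every state name in it is hypothetical.

```python
import random

# Toy module: each state declares exactly one kind of transition.
MODULE = {
    "Initial": {"direct": "Age_Check"},
    "Age_Check": {
        "conditional": [(lambda p: p["age"] >= 21, "Snoring_Onset")],
        "default": "Terminal",
    },
    "Snoring_Onset": {"distributed": [(0.5, "Device_A"), (0.5, "Device_B")]},
    "Device_A": {"direct": "Terminal"},
    "Device_B": {"direct": "Terminal"},
    "Terminal": {},
}


def next_state(state, patient, rng):
    """Pick the next state name, or None when the state has no outgoing transition."""
    if "direct" in state:
        return state["direct"]
    if "conditional" in state:
        for condition, target in state["conditional"]:
            if condition(patient):
                return target
        return state["default"]
    if "distributed" in state:
        weights = [w for w, _ in state["distributed"]]
        targets = [t for _, t in state["distributed"]]
        return rng.choices(targets, weights=weights, k=1)[0]
    return None


def run(module, patient, rng):
    """Walk a patient from Initial until a state with no transition is reached."""
    path = ["Initial"]
    while True:
        target = next_state(module[path[-1]], patient, rng)
        if target is None:
            return path
        path.append(target)


print(run(MODULE, {"age": 10}, random.Random(1)))
# prints ['Initial', 'Age_Check', 'Terminal']
```

An adult patient would instead pass the age check and be routed at random to one of the two device states before terminating.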
---
#### Clinical Care Maps

<!-- hypertension: encounters: doctors, prescription, fainting, symptoms: increase in blood pressure, - mind maps -->
---
### **Outputs**
- EHR records are produced in **FHIR** and **C-CDA** formats.
- Accessible via **FHIR API**.
---
#### Patients
```csvpreview
Id,BIRTHDATE,DEATHDATE,SSN,DRIVERS,PASSPORT,PREFIX,FIRST,MIDDLE,LAST,SUFFIX,MAIDEN,MARITAL,RACE,ETHNICITY,GENDER,BIRTHPLACE,ADDRESS,CITY,STATE,COUNTY,FIPS,ZIP,LAT,LON,HEALTHCARE_EXPENSES,HEALTHCARE_COVERAGE,INCOME
39a59f42-5a52-886d-844b-183228219521,1989-01-13,,999-23-7837,S99973957,X28834206X,Mrs.,Vonda514,,Kutch271,,Littel644,M,white,nonhispanic,F,Nice Provence-Alpes-Cote dAzur FR,508 Gerhold View Unit 85,Worcester,Massachusetts,Worcester County,25027,01602,42.177418916419676,-71.74487793141057,10498.40,0.00,95959
```
---
#### Condition
```csvpreview
START,STOP,PATIENT,ENCOUNTER,SYSTEM,CODE,DESCRIPTION
2005-10-28,,8c85983a-a538-522f-bce0-03678b0fc7ce,9c197d02-ae7d-7f32-8068-58841a3f4a1e,http://snomed.info/sct,39898005,Sleep disorder (disorder)
2005-12-04,,8c85983a-a538-522f-bce0-03678b0fc7ce,204970d2-19c3-c299-8ff1-ce351d0a28ae,http://snomed.info/sct,78275009,Obstructive sleep apnea syndrome (disorder)
```
---
#### Encounter
```csvpreview
Id,START,STOP,PATIENT,ORGANIZATION,PROVIDER,PAYER,ENCOUNTERCLASS,CODE,DESCRIPTION,BASE_ENCOUNTER_COST,TOTAL_CLAIM_COST,PAYER_COVERAGE,REASONCODE,REASONDESCRIPTION
7e1ff077-0d86-10af-07b7-e98ab77ae1bd,2017-03-23T21:49:46Z,2017-03-23T22:04:46Z,39a59f42-5a52-886d-844b-183228219521,a6fb79e7-4abb-3a68-b62d-e501427fdca4,d173012f-c03f-38f8-b4e2-4d128736f74b,26aab0cd-6aba-3e1b-ac5b-05c8867e762c,wellness,162673000,General examination of patient (procedure),136.80,272.80,0.00,,
```
---
#### Observation
```csvpreview
DATE,PATIENT,ENCOUNTER,CATEGORY,CODE,DESCRIPTION,VALUE,UNITS,TYPE
2014-05-13T03:13:21Z,2e23caa4-d831-1f47-c522-0518bab7bd3d,,,QALY,QALY,38.0,a,numeric
2014-03-19T02:47:28Z,7cd3260f-2795-3080-b267-3bb0a9f40624,,,QALY,QALY,37.0,a,numeric
2014-08-03T04:35:13Z,3f6021df-8aac-ed27-a705-e71bc0e4b649,,,QALY,QALY,24.0,a,numeric
```
---
#### Device
```csvpreview
START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION,UDI
2005-12-03T18:20:11Z,2005-12-03T23:06:11Z,8c85983a-a538-522f-bce0-03678b0fc7ce,204970d2-19c3-c299-8ff1-ce351d0a28ae,701077002,Respiratory apnea monitoring system (physical object),(01)45065531515463(11)051112(17)301127(10)50306183570988073(21)60077480
```
---
## **FHIR API Integration**:
Synthea provides access to synthetic patient records via FHIR API, enabling real-time data querying for developers and researchers.
---
## 1 - Install and Set Up Synthea
1. **Clone the Synthea Repository**:
```bash
git clone https://github.com/synthetichealth/synthea.git
cd synthea
./gradlew build
```
2. **Configure Population**: To match the study requirements, configure Synthea to generate a population reflecting the characteristics of the study subjects (e.g., adults with snoring issues).
Set the default population size in `src/main/resources/synthea.properties`:
`generate.default_population = 30`
---
## 2 - Modify Existing Modules or Create a New Module
Synthea has existing modules for conditions like sleep apnea, but for the SilentNight study, we need to model snoring and different device interventions.
1. Existing Module: The Sleep Apnea module can be modified to reflect the trial conditions.
2. Custom Snoring Module: Create a new custom module for snoring with various interventions (Mute, myTAP V, SPT).
---
Example JSON Structure for a Snoring Module:
```json
{
  "name": "Snoring",
  "states": {
    "Initial": {
      "type": "Initial",
      "direct_transition": "Snoring_Condition"
    },
    "Snoring_Condition": {
      "type": "ConditionOnset",
      "target_encounter": "Snoring",
      "codes": [{
        "system": "SNOMED-CT",
        "code": "52818001",
        "display": "Snoring"
      }],
      "direct_transition": "Trial_Device"
    },
    "Trial_Device": {
      "type": "Simple",
      "direct_transition": "Device_Outcome"
    },
    "Device_Outcome": {
      "type": "Observation",
      "codes": [{
        "system": "SNOMED-CT",
        "code": "LA6568-3",
        "display": "Reduced snoring severity"
      }],
      "direct_transition": "End"
    },
    "End": {
      "type": "Terminal"
    }
  }
}
```
---
### 3 - Define Patient Characteristics
The SilentNight study involves specific inclusion and exclusion criteria. Synthea allows us to define patient demographics based on these criteria.
1. Inclusion:
   - Adults aged 21 to 55.
   - History of snoring for over 6 months.
   - Co-sleeping with a bed partner.
---
2. Exclusion:
   - High risk of obstructive sleep apnea.
   - Respiratory conditions like COPD.

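Boolean criteria can be set in the module as patient attributes using `SetAttribute` states. The age range is not a module attribute; it is restricted when the population is generated, for example with the `-a 21-55` command-line option:
```json
{
  "states": {
    "Set_Snoring_History": {
      "type": "SetAttribute",
      "attribute": "snoring_history",
      "value": true,
      "direct_transition": "Set_Co_Sleeping"
    },
    "Set_Co_Sleeping": {
      "type": "SetAttribute",
      "attribute": "co_sleeping",
      "value": true,
      "direct_transition": "Snoring_Condition"
    }
  }
}
```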
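
The age criterion can also be applied after generation by filtering the exported patients file. The sketch below is one possible approach in Python: the `Id` and `BIRTHDATE` column names match the Patients export shown earlier, while the sample rows, the reference date, and the helper names are hypothetical.

```python
import csv
import io
from datetime import date

# Made-up sample rows; a real run would read the exported patients.csv instead.
SAMPLE = """Id,BIRTHDATE,GENDER
p1,1989-01-13,F
p2,2010-06-30,M
p3,1950-03-02,M
"""


def age_on(birthdate, ref):
    """Age in completed years on the reference date."""
    years = ref.year - birthdate.year
    if (ref.month, ref.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def eligible(rows, ref, min_age=21, max_age=55):
    """Return the Ids of patients whose age on ref falls within [min_age, max_age]."""
    keep = []
    for row in rows:
        age = age_on(date.fromisoformat(row["BIRTHDATE"]), ref)
        if min_age <= age <= max_age:
            keep.append(row["Id"])
    return keep


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(eligible(rows, date(2024, 9, 24)))  # prints ['p1']
```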
---
### 4 - Implement Interventions
1. Mute Device: Create an encounter where the snorer uses the Mute nasal device for 1 week.
```json
{
  "Mute_Trial": {
    "type": "Procedure",
    "codes": [{
      "system": "SNOMED-CT",
      "code": "73761001",
      "display": "Nasal dilator therapy"
    }],
    "direct_transition": "Mute_Outcome"
  }
}
```
---
2. myTAP V Device: Define a mandibular advancement procedure using the myTAP V for 2 weeks.
3. SPT Device: Define positional therapy using SPT, including calibration and sleep training modes.
---
### 5 - Output Data for Analysis
Synthea generates outputs in FHIR, CSV, and other formats. For the SilentNight study, CSV output can be used for analysis of snoring severity and device outcomes.
```bash
./run_synthea -p 30 --exporter.csv.export true
```
Review Output Files:
- `patients.csv`: Contains patient demographics.
- `observations.csv`: Includes snoring severity observations.
- `procedures.csv`: Contains data on the interventions (Mute, myTAP V, SPT).
---
### 6 - Analyze Synthetic Data
1. Primary Analysis:
   - Compare snoring severity across device use (Mute, myTAP V, SPT).
   - Use CSV data outputs to calculate average snoring severity for each intervention period.
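
As a minimal sketch of the primary analysis, the Python below averages severity by device. It assumes the exported files have already been merged into one table with hypothetical `PATIENT`, `DEVICE`, and `SEVERITY` columns; those column names and all sample values are made up for illustration and are not Synthea output.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up, already-merged analysis rows (one row per patient per device period).
SAMPLE = """PATIENT,DEVICE,SEVERITY
p1,Mute,6.0
p1,myTAP V,4.0
p1,SPT,5.0
p2,Mute,7.0
p2,myTAP V,3.0
p2,SPT,6.0
"""


def mean_severity_by_device(rows):
    """Average the SEVERITY column within each DEVICE."""
    by_device = defaultdict(list)
    for row in rows:
        by_device[row["DEVICE"]].append(float(row["SEVERITY"]))
    return {device: mean(values) for device, values in by_device.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(mean_severity_by_device(rows))
# prints {'Mute': 6.5, 'myTAP V': 3.5, 'SPT': 5.5}
```

Because the design is a cross-over, a fuller analysis would compare devices within each participant rather than only comparing group means.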
---
2. Secondary Analysis:
   - Assess patient-reported outcomes regarding device comfort and effectiveness.

Tools like R or Python can be used to standardize and analyze the CSV files to produce Tables, Listings, and Figures for the Clinical Study Report.
---
## Next Steps
**Expanding Modules**
- Add more diseases and treatments to the system.
- Improve disease progression models and simulate more complex clinical interactions (e.g., drug interactions, adherence rates).
---
**Validation**: Ongoing validation efforts are comparing synthetic data with real-world statistics to ensure realism in population-level outputs.
---