---
# System prepended metadata

title: 'Clinical Trial Data Synthesis '
tags: [SUG, MTx]

---

# Structured Process for Synthesizing Clinical Trial Data  
James Joseph and/on behalf of Trinath Panda
Sacramento, California  @ WUSS on
Thursday, September 5, 2024 2:30 PM PST
contact@edaclinical.com

---

# Synthesizing Realistic Clinical Trial Data Using Synthea  
James Joseph and/on behalf of **Anna Yudovin,** Trinath Panda 
North Bethesda, Maryland
Tuesday, September 24, 2024, 10:00 AM EST
contact@edaclinical.com

---

## Introduction
- **Objective:** Introduce a structured process for synthesizing clinical trial data as it is collected.  
- **Key Features:** Defines data synthesis expectations, ensures reproducibility, and evaluates outcomes.  
- **Training Scenario:** Statistical programmers manage source data changes and synthesis to deepen clinical trial understanding.

<!-- 
**Speaking Notes:**
- This structured process replicates the real-world collection and management of clinical trial data, 
helping programmers practice handling complex data structures and error identification.
-->

---

## Overview of the Structured Process
- **Adaptation:** Like a data definition table, but it specifies data generation (not tabulation).  
- **Focus:** Reproducibility, practical training, and synthesis of trial data as collected.  
- **Conformance Engine:** Uses a conformance rules engine to score candidates’ data synthesis based on CDISC standards, sponsor-specific rules, and study-specific criteria.

<!-- 
**Speaking Notes:**
- Scoring is based on a combination of CDISC standards and sponsor-defined requirements. These scores help identify weaknesses in data conformance.
- Examples of sponsor-specific rules might include study-specific variables like patient inclusion/exclusion criteria.
-->

---

## What is Synthetic Data?
- **Definition:** Data generated to resemble real clinical trial data.  
- **Difference from Simulated Data:** Synthetic data is created under explicit control; simulated data is generated to fit external targets.

<!-- 
**Speaking Notes:**
- Synthetic data generation provides control over scenarios, letting us introduce custom issues or patterns, such as errors in data types or missing fields, for focused learning.
-->

---

## Key Variables for Tab (1 of 2)
| Tab                 | Key Variables                 | Purpose |
|---------------------|-------------------------------|---------|
| **1: Trial Design Domains** | VISITNUM, VISIT, VISITDY       | Ensures consistent data collection and trial scheduling.                      |
| **2: Form Inventory**       | FormID, Form Name, ~~Conditional Flag~~ | Tracks forms that are always or conditionally collected.                      |
| **3: Visit Forms**          | VISITNUM, VISITDY, FormID      | Links forms to visits for correct data collection.                            |

<!-- 
**Speaking Notes:**
all about the visits and the forms in this tab
~~Event~~ Conditions - unscheduled visit, adverse event, re-implementation device, then these forms are used. 
-->

---

## Key Variables for Tab (2 of 2)
| Tab                 | Key Variables                 | Purpose |
|---------------------|-------------------------------|---------|
| **4: Field Inventory**      | FormID, FieldID, ~~Field Label~~   | Ensures data points are accurately captured.                      |
| **5: Field Attributes**     | FieldID, ~~Type, Format, Codelist, Origin, SigDig~~ | Defines field attributes for accurate data synthesis.                         |
| **6: Field Methods**        | FieldID, ~~Derivation/Method, Comment~~     | Defines data synthesis methods to match real clinical scenarios.              |

<!-- 
**Speaking Notes:**

- A field does not need a separate FieldID per visit: the same form is collected the same way no matter the visit.

-->

---


## Tab 1: Trial Design  
**Instructions:**
   - Identify all timing variables through the schedule of assessments.
   - Focus on trial visit number (VISITNUM), description (VISIT), and study day (VISITDY).

---

**Trial Design**

| VISITNUM | VISIT    | VISITDY |
|----------|----------|---------|
| 001      | Baseline | 1       |
| 002      | Week 1   | 7       |
| 003      | Week 2   | 14      |
| 004      | Week 3   | 21      |
| 005      | Follow-up | 30     |


<!-- 
**Speaking Notes:**
- Timing variables ensure data is generated/collected at the right times. Any mismatch can lead to misaligned data, affecting analysis.
- Example: A mismatch in Visit Day (VISITDY) could result in incorrect timelines for patient assessments, leading to data inconsistencies.
-->

---

## Tab 2: Form Inventory

**Instructions:**
- Assign unique FormIDs to each form used in the casebook.
- Flag forms that are conditionally collected based on trial needs.

---

**Tab 2: Form Inventory (1 of 2)**

| FormID | Form Name          | Event Flag (Yes/No) |
|--------|--------------------|---------------------------|
| F001   | Informed Consent    | No*                       |
| F002   | Adverse Event       | Yes                       |
| F003   | Demographics        | No                        |
| F004   | Vital Signs         | No                        |



<!-- 
**Speaking Notes:**
need from all patients. 
we are creating all patient data, then assigning conditional rates, then restricting full set of patient visit data
 
- Conditional forms (like F001 - Informed Consent) are only collected if specific criteria are met, such as a patient agreeing to join the trial.-->

---

**Tab 2: Form Inventory (2 of 2)**

| FormID | Form Name          | Event Flag (Yes/No) |
|--------|--------------------|---------------------------|
| F005   | Laboratory Results  | No                        |
| F006   | Medical History     | Yes                        |

<!-- 
**Speaking Notes:**
- Forms like F005 (Laboratory Results) are collected routinely for all patients, whereas forms like F001 (Informed Consent) may only be triggered by specific conditions.
- Correctly flagging these forms prevents unnecessary data collection, saving time and effort.
-->

---

## Tab 3: Visit Forms 

**Instructions:**
   - Track all combinations of time points and forms.
   - Ensure forms repeat at appropriate time points when scheduled.

---


**Tab 3: Visit Forms (1 of 2)**

| VISITNUM | VISITDY | ~~VISIT~~    | FormID | ~~Form Name~~            |
|----------|---------|----------|--------|----------------------|
| 1        | 1       | Baseline | F003   | Demographics Form    |
| 1        | 1       | Baseline | F004   | Vital Signs Form     |
| 2        | 7       | Week 1   | F004   | Vital Signs Form     |
| 2        | 7       | Week 1   | F005   | Laboratory Results   |


<!-- 
**Speaking Notes:**
- Each row represents a visit where a form is administered. For example, Demographics and Vital Signs are collected during the baseline visit.
- This tab helps ensure forms are not missed during the trial, which could result in gaps in patient data.
-->

---

**Tab 3: Visit Forms (2 of 2)**

| VISITNUM | VISITDY | ~~VISIT~~    | FormID | ~~Form Name~~            |
|----------|---------|----------|--------|----------------------|
| 3        | 14      | Week 2   | F004   | Vital Signs Form     |
| 3        | 14      | Week 2   | F005   | Laboratory Results   |

<!-- 
**Speaking Notes:**
- Forms like Vital Signs (F004) may be collected at multiple visits, and this tab ensures consistent tracking across all visits.
- Repeating forms like F005 (Laboratory Results) ensures key data is captured at critical points in the trial, such as every few weeks.
- This tab lists every unique combination of visit and form.
-->
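
---

As a rough illustration of how Tab 3 links Tabs 1 and 2, the sketch below checks that every Visit Forms row points at a known visit and a known form. The row values are copied from the slides; the dictionary layout and the `check_visit_forms` helper are assumptions made for this example, not part of the presented process.

```python
# Tab 1 (Trial Design): VISITNUM -> (VISIT, VISITDY), copied from the slides.
TRIAL_DESIGN = {
    1: ("Baseline", 1),
    2: ("Week 1", 7),
    3: ("Week 2", 14),
    4: ("Week 3", 21),
    5: ("Follow-up", 30),
}

# Tab 2 (Form Inventory): FormID -> Form Name.
FORM_INVENTORY = {
    "F001": "Informed Consent",
    "F002": "Adverse Event",
    "F003": "Demographics",
    "F004": "Vital Signs",
    "F005": "Laboratory Results",
    "F006": "Medical History",
}

# Tab 3 (Visit Forms): one (VISITNUM, FormID) pair per scheduled form.
VISIT_FORMS = [
    (1, "F003"), (1, "F004"),
    (2, "F004"), (2, "F005"),
    (3, "F004"), (3, "F005"),
]


def check_visit_forms(visit_forms, trial_design, form_inventory):
    """Return a list of problems; an empty list means the tab is consistent."""
    problems = []
    seen = set()
    for visitnum, form_id in visit_forms:
        if visitnum not in trial_design:
            problems.append(f"VISITNUM {visitnum} is not in Trial Design")
        if form_id not in form_inventory:
            problems.append(f"FormID {form_id} is not in Form Inventory")
        if (visitnum, form_id) in seen:
            problems.append(f"Duplicate row: VISITNUM {visitnum}, FormID {form_id}")
        seen.add((visitnum, form_id))
    return problems


print(check_visit_forms(VISIT_FORMS, TRIAL_DESIGN, FORM_INVENTORY))  # prints []
```

A check like this could run each time a tab is edited, so that a mistyped FormID is caught before any data is synthesized.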

---

## Field Tabs
| Tab                 | Key Variables                 | Purpose |
|---------------------|-------------------------------|---------|
| **4: Field Inventory**      | FormID, FieldID, ~~Field Label~~   | Ensures data points are accurately captured.                      |
| **5: Field Attributes**     | FieldID, ~~Type, Format, Codelist, Origin, SigDig~~ | Defines field attributes for accurate data synthesis.                         |
| **6: Field Methods**        | FieldID, ~~Derivation/Method, Comment~~     | Defines data synthesis methods to match real clinical scenarios.              |

<!-- 
**Speaking Notes:**
- Tabs 4, 5, and 6 all have the same number of rows.
-->



---


## Tab 4: Field Inventory 

- List all fields, assigning unique FieldIDs.
- Ensure clear Field Labels and link fields to their FormID.
- Document every field to ensure no data is missed.

---

**Tab 4: Field Inventory  (1 of 2)**

| FormID | FieldID  | Field Label               |  OutputName     |
|--------|----------|---------------------------|---------------|
| F003   | F003.01  | Subject ID                | Subject_ID    |
| F003   | F003.02  | Date of Birth             | DOB           |
| F004   | F004.01  | Systolic Blood Pressure   | Systolic_BP   |

<!-- 
**Speaking Notes:**
- For fields like Subject ID (Required), the data must be consistently entered. Missing a required field violates the trial protocol and will cause issues during submission.
- Conditional fields like Systolic_BP may only be required under specific conditions, such as during certain visits.
-->

---

**Tab 4: Field Inventory (2 of 2)**

| FormID | FieldID  | Field Label               | OutputName     |
|--------|----------|---------------------------|---------------|
| F004   | F004.02  | Diastolic Blood Pressure  | Diastolic_BP  |
| F005   | F005.01  | Hemoglobin Level          | Hgb_Level     |
| F005   | F005.02  | White Blood Cell Count    | WBC_Count     |

<!-- 
**Speaking Notes:**
- Permissible fields like Hemoglobin Level offer flexibility for data entry when it’s not strictly required but can be useful for exploratory analysis.
- Ensuring fields like WBC_Count are marked as Required guarantees essential data is collected for key trial metrics.
-->

---

## Tab 5: Field Attributes

- Define field attributes: type, format, controlled terms.
- Specify significant digits for numeric fields.
- Classify fields as **Required**, **Expected**, **Conditionally Required**, or **Permissible**.

<!-- 
**Speaking Notes:**

- A field cannot be marked NOT DONE if a value is present.


-->


---

**Tab 5: Field Attributes (1 of 2)**

| FieldID  | OutputName      | Type1 | Type2  | Format    | Controlled Terms/Codelist | Significant Digits | ~~Origin~~    | State/Classification      |
|----------|---------------------|-------|--------|-----------|---------------------------|--------------------|-----------|---------------------------|
| F003.02  | DOB                 | Char  | Closed | YYYY-MM-DD | N/A                       | N/A                | Collected | Expected                  |
| F004.02  | Diastolic_BP        | Num   | Closed | 999/99    | Normal, Elevated, Hypertensive | 2            | Collected | Conditionally Required     |

<!-- 
**Speaking Notes:**
- Controlled terms and codelists ensure that data entry for fields like Blood Pressure stays within defined ranges (Normal, Elevated, etc.), preventing data errors.
- Example: Incorrect formatting of Date of Birth leads to downstream analysis errors if not fixed early.
-->

---

**Tab 5: Field Attributes (2 of 2)**

| FieldID  | OutputName          | Type1 | Type2  | Format    | Controlled Terms/Codelist | Significant Digits | Origin    | State/Classification      |
|----------|---------------------|-------|--------|-----------|---------------------------|--------------------|-----------|---------------------------|
| F005.01  | Hgb_Level           | Num   | Closed | 9.99      | Low, Normal, High          | 2                 | Collected | Permissible                |
| F005.07  | Adverse Event       | Char  | Open   | N/A       | Headache, Nausea, Rash     | N/A               | Collected | Required                  |

<!-- 
**Speaking Notes:**
- Permissible fields like Hemoglobin allow flexibility, while required fields like Adverse Event are critical to understanding trial outcomes and safety measures.
-->
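
---

One way the attributes in Tab 5 could drive a conformance check is sketched below. The attribute dictionary is a hypothetical, simplified encoding of the slide tables (`BP_Category` in particular is an invented closed-codelist field), and `check_field` is an illustrative helper, not the conformance engine described earlier.

```python
import re

# Hypothetical, simplified encoding of Tab 5 rows, keyed by OutputName.
FIELD_ATTRIBUTES = {
    "DOB": {
        "pattern": r"\d{4}-\d{2}-\d{2}",  # Format: YYYY-MM-DD
        "codelist": None,
        "closed": True,
        "state": "Expected",
    },
    "Adverse_Event": {
        "pattern": None,
        "codelist": ["Headache", "Nausea", "Rash"],
        "closed": False,  # Type2 = Open: terms outside the list are allowed
        "state": "Required",
    },
    "BP_Category": {  # invented field, used only to show a closed codelist
        "pattern": None,
        "codelist": ["Normal", "Elevated", "Hypertensive"],
        "closed": True,
        "state": "Conditionally Required",
    },
}


def check_field(name, value, attrs):
    """Return a list of conformance issues for one collected value."""
    issues = []
    if value is None or value == "":
        # Only Required fields are flagged when empty in this sketch.
        if attrs["state"] == "Required":
            issues.append(f"{name}: required value is missing")
        return issues
    if attrs["pattern"] and not re.fullmatch(attrs["pattern"], value):
        issues.append(f"{name}: '{value}' does not match the expected format")
    if attrs["codelist"] and attrs["closed"] and value not in attrs["codelist"]:
        issues.append(f"{name}: '{value}' is not in the codelist")
    return issues


print(check_field("DOB", "13/01/1989", FIELD_ATTRIBUTES["DOB"]))
print(check_field("Adverse_Event", "", FIELD_ATTRIBUTES["Adverse_Event"]))
```

Counting the issues returned across all fields would give a simple per-dataset score; weighting by State/Classification is one possible refinement.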

---

## Tab 6: Field Methods

**Instructions**
- Define the method or logic for generating data.
- Later: simulate population, event rates, and times (see Wilins).


---

**Tab 6: Field Methods (1 of 2)**

| FieldID  | Derivation/Method                                    | Comments                                |
|----------|------------------------------------------------------|-----------------------------------------|
| F003.01  | N/A                                                  | Directly collected from subject         |
| F003.02  | N/A                                                  | Directly collected from subject         |
| F004.01  | Randomly generated within normal range (90-120 mmHg)  | Based on typical vital sign ranges      |

<!-- 
**Speaking Notes:**
- Methods like random generation for fields such as blood pressure ensure that the synthesized data matches real-world ranges.
- If blood pressure is measured only during certain visits, it should be flagged as conditionally required.
-->

---

**Tab 6: Field Methods (2 of 2)**

| FieldID  | Derivation/Method                                    | Comments                                |
|----------|------------------------------------------------------|-----------------------------------------|
| F004.02  | Randomly generated within normal range (60-80 mmHg)   | Based on typical vital sign ranges      |
| F005.01  | Mean of previous 3 Hemoglobin values                 | Calculated if previous data available   |

<!-- 
**Speaking Notes:**
- Using prior values to calculate Hemoglobin levels creates more realistic synthetic data, ensuring data is consistent over time.
- Random generation ensures required fields like blood pressure fall within expected clinical ranges.
-->
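
---

The methods in Tab 6 can be sketched as small generator functions. The ranges for blood pressure come from the table; the fixed seed, the function names, and the fallback hemoglobin range used when fewer than three prior values exist are assumptions for illustration.

```python
import random
from statistics import mean


def systolic_bp(rng):
    """F004.01: random value within the normal range (90-120 mmHg)."""
    return round(rng.uniform(90, 120))


def diastolic_bp(rng):
    """F004.02: random value within the normal range (60-80 mmHg)."""
    return round(rng.uniform(60, 80))


def hemoglobin(previous, rng):
    """F005.01: mean of the previous 3 values when they are available.

    With fewer than 3 prior values, draw a starting value instead
    (assumed range 12.0-16.0 g/dL; not taken from the slides).
    """
    if len(previous) >= 3:
        return round(mean(previous[-3:]), 2)
    return round(rng.uniform(12.0, 16.0), 2)


rng = random.Random(2024)  # fixed seed so the synthesis is reproducible
print(systolic_bp(rng), diastolic_bp(rng))
print(hemoglobin([13.0, 14.0, 15.0, 16.0], rng))  # prints 15.0
```

Seeding the generator is what makes a synthesized dataset reproducible: rerunning with the same seed and the same tabs yields the same values.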

---

## Testing Programmers

| Issue                               | Example                                                                   |
|-------------------------------------|---------------------------------------------------------------------------|
| Increase/decrease                   | Events, observed values, subjects                                         |
| Inconsistent data                   | Changing format, terms, linkages                                          |
| Missing values                      | Lab test not done, subject disposed                                       |
| Unscheduled visits                  | New VISITNUMs, windows, denominators                                      |




---

## Synthetic Data Use Cases

| Use Case                                | Description                                    | Example Application                         |
|-----------------------------------------|------------------------------------------------|---------------------------------------------|
| **Standard of Care**                    | Compare new treatment against existing care    | Phase 3 trials comparing treatments         |
| **Placebo for Safety**                  | Use synthetic placebo for safety assessments   | Phase 2 trials where placebo may be unethical|
| **Phase 4/Real-World Evidence**         | Generate synthetic data post-market approval   | Drug label expansion, new indications        |
| **Regulatory Submissions**              | Synthetic data used to support drug approvals  | Supporting evidence for indication expansion|

<!-- 
**Speaking Notes:**
- Synthetic data has been widely used across multiple phases of drug development. For example, in Phase 3 trials, it helps compare the efficacy of new treatments with the standard of care.
- In Phase 4, synthetic data supports real-world evidence for drug label expansion or new indications, providing critical data to regulators.
-->



---

## Adapting Field Methods for Real Clinical Data

1. **Continuous Variables:**  
Simulate data such as **lab results** or **vital signs** using normal distributions with appropriate mean and variance.

---

- Use the **RAND** function in SAS to simulate data from distributions like Normal, Uniform, or Exponential.
- Adjust parameters to fit clinical trial data expectations:
   - **Example:** Generate continuous variables such as blood pressure or glucose levels.
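
---

For readers working outside SAS, a rough Python analogue of drawing from Normal, Uniform, and Exponential distributions is sketched below using the standard `random` module. The variable names and all distribution parameters are illustrative assumptions, not values from a real study.

```python
import random


def simulate_continuous(n, rng):
    """Return n subject rows with three continuous variables."""
    rows = []
    for subjid in range(1, n + 1):
        rows.append({
            "SUBJID": subjid,
            # Normal(mean=120, sd=12): systolic blood pressure in mmHg
            "SYSBP": round(rng.gauss(120, 12)),
            # Uniform(70, 140): glucose in mg/dL
            "GLUCOSE": round(rng.uniform(70, 140), 1),
            # Exponential with mean 30: days until a first event
            "DAYS_TO_EVENT": round(rng.expovariate(1 / 30), 1),
        })
    return rows


rng = random.Random(12345)  # fixed seed for reproducibility
for row in simulate_continuous(3, rng):
    print(row)
```

Note that `expovariate` takes a rate (1 / mean), so a mean of 30 days is passed as `1 / 30`.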

---


2. **Categorical Variables:**  
Simulate patient classifications (e.g., response categories, adverse event severity) using discrete distributions.
   - Example: Simulate **Concomitant Medication Use (CM)** by generating probabilities for medication classes

---


Use **RAND('TABLE', p1, p2, ...)** to generate categorical variables based on predefined probabilities.
- Useful for variables like **Adverse Event Severity** or **Treatment Response**.
- **Example**: Assigning mild, moderate, and severe adverse event categories to patient data.
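
---

The same idea can be sketched in Python with `random.choices`, which draws categories according to a list of weights. The probabilities below are hypothetical and would normally come from rates observed in a comparable trial population.

```python
import random
from collections import Counter

SEVERITY = ["MILD", "MODERATE", "SEVERE"]
PROBS = [0.6, 0.3, 0.1]  # hypothetical probabilities, one per category

rng = random.Random(42)  # fixed seed for reproducibility

# One severity per adverse event record, drawn with the given weights.
draws = rng.choices(SEVERITY, weights=PROBS, k=1000)
counts = Counter(draws)

for level in SEVERITY:
    print(level, counts[level])
```

Unlike the SAS table distribution, which returns the category index, `random.choices` returns the category label directly, so no lookup step is needed.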

---

3. **Mixture Distributions:**  
Combine multiple distributions to model complex clinical outcomes.

- **Example:** Simulate patient visit durations based on categories like routine visits vs. complex follow-ups.
- Use normal distributions with different means to represent each subpopulation.
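
---

A two-component mixture for visit duration might look like the sketch below: first pick the subpopulation, then draw from that subpopulation's normal distribution. The 25% share of complex follow-ups, the means and standard deviations, and the 5-minute floor are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def visit_duration(rng, p_complex=0.25):
    """Draw one visit duration in minutes from a two-component mixture."""
    if rng.random() < p_complex:
        duration = rng.gauss(60, 10)  # complex follow-up
    else:
        duration = rng.gauss(20, 5)   # routine visit
    return max(5.0, duration)         # floor keeps durations plausible


rng = random.Random(7)
durations = [visit_duration(rng) for _ in range(2000)]
print(round(mean(durations), 1))  # near 0.75 * 20 + 0.25 * 60 = 30
```

A histogram of `durations` would show two peaks, which a single normal distribution with the same overall mean could not reproduce.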

---

## Overview

| Variable Type        | Method                                         | Example |
|----------------------|------------------------------------------------|---------|
| Continuous Variables | **RAND** function (Normal, Uniform, etc.)       | Blood pressure, glucose levels |
| Categorical Variables| **RAND("Table")** for empirical distributions   | Adverse event severity, CM use |
| Mixture Distributions| **RAND + Mixtures** for complex scenarios       | Visit type, treatment effects |


---

## **Optimize** 

- Simulate 1,000 patient profiles with baseline characteristics and follow-up visits.
- **Use BY-group processing** to run multiple simulations efficiently.
- Generate multiple patient profiles at once, reducing overhead.
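
---

The spirit of BY-group processing (generate everything in one long table, then summarize by group in a single pass) can be sketched in plain Python. The visit days come from the Trial Design tab; the blood pressure parameters and the function names are assumptions.

```python
import random
from collections import defaultdict
from statistics import mean


def simulate_profiles(n_subjects, visit_days, rng):
    """Build one long table: one row per subject per visit."""
    rows = []
    for subjid in range(1, n_subjects + 1):
        baseline = rng.gauss(120, 12)  # subject-level systolic BP
        for visitnum, visitdy in enumerate(visit_days, start=1):
            rows.append({
                "SUBJID": subjid,
                "VISITNUM": visitnum,
                "VISITDY": visitdy,
                # visit-level noise around the subject's baseline
                "SYSBP": round(baseline + rng.gauss(0, 5), 1),
            })
    return rows


def mean_by_visit(rows):
    """Group rows by VISITNUM and return the mean SYSBP per visit."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["VISITNUM"]].append(row["SYSBP"])
    return {v: round(mean(vals), 1) for v, vals in sorted(groups.items())}


rows = simulate_profiles(1000, [1, 7, 14, 21, 30], random.Random(7))
summary = mean_by_visit(rows)
print(len(rows), summary)
```

Generating all subjects into one table and grouping afterwards avoids rerunning the generator and the summary once per subject or per visit.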


---

## Better models

- Leverage more advanced techniques such as **multivariate normal distributions** for correlated variables in trials (e.g., patient age, treatment dose, and outcome).
- Consider using **SAS/IML** for complex simulations where correlated or multivariate data is needed.
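
---

For two variables, correlated normal draws can be built by hand from independent standard normals, which is the two-dimensional case of the Cholesky approach. The sketch below pairs systolic and diastolic blood pressure; the means, standard deviations, and the correlation of 0.6 are assumed values.

```python
import math
import random


def correlated_pair(rng, mean_x, sd_x, mean_y, sd_y, rho):
    """Draw (x, y) from a bivariate normal with correlation rho."""
    z1 = rng.gauss(0, 1)
    z2 = rng.gauss(0, 1)
    x = mean_x + sd_x * z1
    # Mixing z1 into y induces the correlation; the sqrt(1 - rho^2)
    # factor keeps the standard deviation of y equal to sd_y.
    y = mean_y + sd_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return x, y


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2024)
pairs = [correlated_pair(rng, 120, 12, 78, 8, rho=0.6) for _ in range(5000)]
systolic, diastolic = zip(*pairs)
print(round(pearson(systolic, diastolic), 2))  # should be close to 0.6
```

For three or more correlated variables, a matrix library with a Cholesky factorization is the usual route.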

---

## So far

Field Methods should reflect real-world data collection practices:
   - Continuous variables should have biologically **plausible ranges**.
   - Categorical variables should reflect **probabilities seen** in trial populations.
   - Mixture distributions when multiple **conditions apply**.

---


# Using Synthea to Synthesize Data for the SilentNight Study

---

### **SilentNight Study**
- **Title**: In-Home Assessment of Three Anti-Snoring Devices
- **Protocol**: SRC-AI-SilentNight-10090
- **Objective**: Compare the effectiveness of three anti-snoring devices (Mute, myTAP V, SPT) using patient-reported outcomes and snoring data.

---

The study includes:
- Cross-over design (each participant serves as their own control).
- Use of audio recordings and questionnaires to assess snoring severity and sleep quality.

---

## Using Synthea to Generate Realistic Patient Data

**Synthea** can be used to create synthetic patients with conditions like snoring and related interventions. 


Before we set up the synthetic patient generation process for the SilentNight study...

---


## What is Synthea?

An **open-source tool** for generating synthetic Electronic Health Records (EHRs).

---

**Purpose**
To provide synthetic patient data for research, testing, and training purposes **without violating privacy** or **intellectual property** laws.

---

**Features**:
  - Simulates **entire lifespans** of patients.
  - Encodes outputs in **FHIR** and **C-CDA** formats.
  - Supports modules that represent disease **progression** and treatment.

---

## Origins of Synthea

``MITRE``
![image](https://hackmd.io/_uploads/rkStw4eAR.png)

The MITRE Corporation, a not-for-profit research and development organization (the name is not an acronym)

---

## **Objective**

To create a system for generating large-scale, realistic synthetic health records for secondary use in research, development, and education.

---

## Motivation

**Healthcare Challenges**:
  - Limited access to real patient data due to privacy concerns.
  - Anonymization is often insufficient and can still lead to re-identification risks.
  - High demand for high-quality data for research, development, and clinical training.




---

## **Project Launch** 
Synthea generates synthetic data using publicly available health statistics and disease models

Initially focused on **modeling the progression** of common diseases like **Type 2 Diabetes**

<!--, but quickly expanded to include multiple chronic conditions and primary care visits. -->

---

## PADARSER Framework

Publicly Available Data Approach to Realistic Synthetic EHR (used by Synthea).

[![image](https://hackmd.io/_uploads/r1eGi4xAC.png)](https://doi.org/10.1093/jamia/ocx079)


---

``PADARSER``

1. **Public Data**: Uses publicly available health statistics.
2. **No Real EHR Data**: Synthetic data generated without real patient records.
3. **Care Maps**: Based on clinical guidelines or expert input.
4. **Privacy Preservation**: No real patient records are used, so there is no individual to re-identify.


---

**Output** 
Provides synthetic patient records that mimic real-world healthcare data without exposing any real patient.


---

## **Data Sources**: 
  - U.S. Census Bureau demographics.
  - CDC disease prevalence and incidence rates.
  - NIH research reports.

---



**Top 10 Reasons for Primary Care Visits**
  1. Routine health checks.
  2. Hypertension.
  3. Diabetes.
  4. Pregnancy.
  5. Respiratory infections.
  6. General exams.
  7. Lipid metabolism disorders.
  8. Ear infections.
  9. Asthma.
  10. Urinary tract infections.

---

**Top 10 Chronic Conditions Modeled**:

Includes **Ischemic heart disease**, **Lung cancer**, **Alzheimer’s**, and more.


---


## How Synthea Works

- **Modular Design**: 
  - Diseases and treatments are represented as modules.
  - Modules can be developed and expanded by the community.

---

### **How Simulations Work**
  - **Life-Long Simulation**: From birth to death, Synthea simulates patient health states, encounters, diagnoses, and treatments.
  - **Timesteps**: Configurable periods (usually 7 days) where health events (e.g., medical visits, diagnoses) occur.

---

### **Core Elements**
  - **State Transition Machines**: Each health state (e.g., “Infection,” “Diagnosis”) is represented as a state in a JSON-based machine.
  - **Clinical Care Maps**: Dictate patient progression based on real clinical guidelines.
  - **Census Data**: Helps simulate population-level health conditions.

---

#### State Transition Machine 
![image](https://hackmd.io/_uploads/SJ552NgAC.png)

---

**Types of States**:
  - **Control States**: Manage the flow of the simulation (e.g., delays, filters).
  - **Clinical States**: Represent medical events (e.g., conditions, medications, procedures).

---

**Transitions**:
  - **Direct**: Moves patients between states.
  - **Conditional**: Transitions based on patient attributes (e.g., age, gender).
  - **Distributed**: Random transitions based on predefined probabilities.
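
---

The three transition types can be illustrated with a toy state machine. This is a teaching sketch only: the dictionary layout, state names, and probabilities are invented for this example and are neither Synthea's JSON syntax nor its engine.

```python
import random

# Toy module: each state carries exactly one kind of transition.
MODULE = {
    "Initial": {"direct": "Age_Check"},
    "Age_Check": {
        # Conditional: the first condition that holds wins;
        # the last entry acts as the fallback.
        "conditional": [
            (lambda p: p["age"] >= 21, "Snoring_Onset"),
            (lambda p: True, "Terminal"),
        ]
    },
    "Snoring_Onset": {
        # Distributed: pick a target at random using these probabilities.
        "distributed": [(0.7, "Device_Trial"), (0.3, "No_Treatment")]
    },
    "Device_Trial": {"direct": "Terminal"},
    "No_Treatment": {"direct": "Terminal"},
}


def next_state(state, patient, rng):
    """Resolve the single transition defined on a state."""
    if "direct" in state:
        return state["direct"]
    if "conditional" in state:
        for condition, target in state["conditional"]:
            if condition(patient):
                return target
    weights = [w for w, _ in state["distributed"]]
    targets = [t for _, t in state["distributed"]]
    return rng.choices(targets, weights=weights, k=1)[0]


def run(module, patient, rng):
    """Walk from Initial to Terminal and return the visited states."""
    path = ["Initial"]
    while path[-1] != "Terminal":
        path.append(next_state(module[path[-1]], patient, rng))
    return path


print(run(MODULE, {"age": 18}, random.Random(1)))
print(run(MODULE, {"age": 40}, random.Random(1)))
```

The first call prints `['Initial', 'Age_Check', 'Terminal']` because the age condition fails; the second continues through `Snoring_Onset` and then takes one of the two distributed branches.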

---


#### Clinical Care Maps
![1000012925](https://hackmd.io/_uploads/SksnsNlAR.jpg)


<!-- hypertension: encounters: doctors, prescription,  fainting, symptoms: increase in blood pressure, - mind maps -->
---

### **Outputs**

  - EHR records are produced in **FHIR** and **C-CDA** formats.
  - Accessible via **FHIR API**.

---

#### Patients 

```csvpreview
Id,BIRTHDATE,DEATHDATE,SSN,DRIVERS,PASSPORT,PREFIX,FIRST,MIDDLE,LAST,SUFFIX,MAIDEN,MARITAL,RACE,ETHNICITY,GENDER,BIRTHPLACE,ADDRESS,CITY,STATE,COUNTY,FIPS,ZIP,LAT,LON,HEALTHCARE_EXPENSES,HEALTHCARE_COVERAGE,INCOME
39a59f42-5a52-886d-844b-183228219521,1989-01-13,,999-23-7837,S99973957,X28834206X,Mrs.,Vonda514,,Kutch271,,Littel644,M,white,nonhispanic,F,Nice Provence-Alpes-Cote dAzur FR,508 Gerhold View Unit 85,Worcester,Massachusetts,Worcester County,25027,01602,42.177418916419676,-71.74487793141057,10498.40,0.00,95959
```
---

#### Condition

```csvpreview
START,STOP,PATIENT,ENCOUNTER,SYSTEM,CODE,DESCRIPTION
2005-10-28,,8c85983a-a538-522f-bce0-03678b0fc7ce,9c197d02-ae7d-7f32-8068-58841a3f4a1e,http://snomed.info/sct,39898005,Sleep disorder (disorder)
2005-12-04,,8c85983a-a538-522f-bce0-03678b0fc7ce,204970d2-19c3-c299-8ff1-ce351d0a28ae,http://snomed.info/sct,78275009,Obstructive sleep apnea syndrome (disorder)
```

---

#### Encounter 

Id,START,STOP,PATIENT,ORGANIZATION,PROVIDER,PAYER,ENCOUNTERCLASS,CODE,DESCRIPTION,BASE_ENCOUNTER_COST,TOTAL_CLAIM_COST,PAYER_COVERAGE,REASONCODE,REASONDESCRIPTION
7e1ff077-0d86-10af-07b7-e98ab77ae1bd,2017-03-23T21:49:46Z,2017-03-23T22:04:46Z,39a59f42-5a52-886d-844b-183228219521,a6fb79e7-4abb-3a68-b62d-e501427fdca4,d173012f-c03f-38f8-b4e2-4d128736f74b,26aab0cd-6aba-3e1b-ac5b-05c8867e762c,wellness,162673000,General examination of patient (procedure),136.80,272.80,0.00,,


---

#### Observation 

```csvpreview
DATE,PATIENT,ENCOUNTER,CATEGORY,CODE,DESCRIPTION,VALUE,UNITS,TYPE
2014-05-13T03:13:21Z,2e23caa4-d831-1f47-c522-0518bab7bd3d,,,QALY,QALY,38.0,a,numeric
2014-03-19T02:47:28Z,7cd3260f-2795-3080-b267-3bb0a9f40624,,,QALY,QALY,37.0,a,numeric
2014-08-03T04:35:13Z,3f6021df-8aac-ed27-a705-e71bc0e4b649,,,QALY,QALY,24.0,a,numeric
```

---

#### Device 

```csvpreview
START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION,UDI
2005-12-03T18:20:11Z,2005-12-03T23:06:11Z,8c85983a-a538-522f-bce0-03678b0fc7ce,204970d2-19c3-c299-8ff1-ce351d0a28ae,701077002,Respiratory apnea monitoring system (physical object),(01)45065531515463(11)051112(17)301127(10)50306183570988073(21)60077480
```

---


## **FHIR API Integration**:
Synthea provides access to synthetic patient records via FHIR API, enabling real-time data querying for developers and researchers.

---

## 1 - Install and Set Up Synthea

1. **Clone the Synthea Repository**:
   ```bash
   git clone https://github.com/synthetichealth/synthea.git
   cd synthea
   ./gradlew build
    ```
2. **Configure Population**: To match the study requirements, configure Synthea to generate a population reflecting the characteristics of the study subjects (e.g., adults with snoring issues).

   Set the default population size in `src/main/resources/synthea.properties`:

   `generate.default_population = 30`




---

## 2 - Modify Existing Modules or Create a New Module

Synthea has existing modules for conditions like sleep apnea, but for the SilentNight study, we need to model snoring and different device interventions.

1. Existing Module: The Sleep Apnea module can be modified to reflect the trial conditions.


2. Custom Snoring Module: Create a new custom module for snoring with various interventions (Mute, myTAP V, SPT).


---

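Example JSON structure for a snoring module. This is a skeleton: the codes are illustrative and should be verified against SNOMED CT and LOINC before use, and a complete module would also give the `Observation` state a category, unit, and value.

```json
{
  "name": "Snoring",
  "states": {
    "Initial": {
      "type": "Initial",
      "direct_transition": "Snoring_Condition"
    },
    "Snoring_Condition": {
      "type": "ConditionOnset",
      "target_encounter": "Snoring_Encounter",
      "codes": [{
        "system": "SNOMED-CT",
        "code": "52818001",
        "display": "Snoring"
      }],
      "direct_transition": "Snoring_Encounter"
    },
    "Snoring_Encounter": {
      "type": "Encounter",
      "encounter_class": "ambulatory",
      "reason": "Snoring_Condition",
      "codes": [{
        "system": "SNOMED-CT",
        "code": "185345009",
        "display": "Encounter for symptom"
      }],
      "direct_transition": "Trial_Device"
    },
    "Trial_Device": {
      "type": "Simple",
      "direct_transition": "Device_Outcome"
    },
    "Device_Outcome": {
      "type": "Observation",
      "codes": [{
        "system": "SNOMED-CT",
        "code": "LA6568-3",
        "display": "Reduced snoring severity"
      }],
      "direct_transition": "End"
    },
    "End": {
      "type": "Terminal"
    }
  }
}
```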


---

### 3 - Define Patient Characteristics

The SilentNight study involves specific inclusion and exclusion criteria. Synthea allows us to define patient demographics based on these criteria.

1. Inclusion:

   - Adults aged 21 to 55.
   - History of snoring for over 6 months.
   - Co-sleeping with a bed partner.


---

2. Exclusion:

   - High risk of obstructive sleep apnea.
   - Respiratory conditions like COPD.


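In a module, the minimum age can be enforced with a `Guard` state and the study-specific flags recorded with `SetAttribute` states. The attribute names below are hypothetical. The age range of the generated population can also be limited on the command line with `-a 21-55`.

```json
"states": {
  "Age_Guard": {
    "type": "Guard",
    "allow": {
      "condition_type": "And",
      "conditions": [
        { "condition_type": "Age", "operator": ">=", "quantity": 21, "unit": "years" },
        { "condition_type": "Age", "operator": "<=", "quantity": 55, "unit": "years" }
      ]
    },
    "direct_transition": "Set_Snoring_History"
  },
  "Set_Snoring_History": {
    "type": "SetAttribute",
    "attribute": "snoring_history",
    "value": true,
    "direct_transition": "Set_Co_Sleeping"
  },
  "Set_Co_Sleeping": {
    "type": "SetAttribute",
    "attribute": "co_sleeping",
    "value": true,
    "direct_transition": "Snoring_Condition"
  }
}
```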



---

### 4 - Implement Interventions

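1. Mute Device: Create an encounter where the snorer uses the Mute nasal device for 1 week. The SNOMED CT code in this example is a placeholder; verify the code for nasal dilator therapy before use.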

```json
{
  "Mute_Trial": {
    "type": "Procedure",
    "codes": [{
      "system": "SNOMED-CT",
      "code": "73761001",
      "display": "Nasal dilator therapy"
    }],
    "direct_transition": "Mute_Outcome"
  }
}
```

---

2. myTAP V Device: Define a mandibular advancement procedure using the myTAP V for 2 weeks.


3. SPT Device: Define positional therapy using SPT, including calibration and sleep training modes.




---

### 5 - Output Data for Analysis

Synthea generates outputs in FHIR, CSV, and other formats. For the SilentNight study, CSV output can be used for analysis of snoring severity and device outcomes.

```bash
./run_synthea -p 30 --exporter.csv.export true
```

Review Output Files:

- `patients.csv`: Contains patient demographics.
- `observations.csv`: Includes snoring severity observations.
- `procedures.csv`: Contains data on the interventions (Mute, myTAP, SPT).



---

### 6 - Analyze Synthetic Data

1. Primary Analysis:

   - Compare snoring severity across device use (Mute, myTAP V, SPT).
   - Use CSV data outputs to calculate average snoring severity for each intervention period.

---

2. Secondary Analysis:

   - Assess patient-reported outcomes regarding device comfort and effectiveness.


Tools like R or Python can be used to standardize and analyze the CSV files to produce Tables, Listings and Figures for the Clinical Study Report.
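
---

As a minimal sketch of the primary analysis in Python, the example below averages numeric observation values per description. The column layout follows the `observations.csv` sample shown earlier; the rows themselves, including the codes, descriptions, and scores, are invented for illustration and are not study results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented rows in the observations.csv column layout.
SAMPLE = """DATE,PATIENT,ENCOUNTER,CATEGORY,CODE,DESCRIPTION,VALUE,UNITS,TYPE
2024-01-08T00:00:00Z,p1,e1,survey,X1,Snoring severity (Mute),4.0,{score},numeric
2024-01-08T00:00:00Z,p2,e2,survey,X1,Snoring severity (Mute),6.0,{score},numeric
2024-01-22T00:00:00Z,p1,e3,survey,X2,Snoring severity (myTAP V),3.0,{score},numeric
2024-01-22T00:00:00Z,p2,e4,survey,X2,Snoring severity (myTAP V),5.0,{score},numeric
2024-02-05T00:00:00Z,p1,e5,survey,X3,Snoring severity (SPT),2.0,{score},numeric
2024-02-05T00:00:00Z,p2,e6,survey,X3,Snoring severity (SPT),3.0,{score},numeric
"""


def mean_value_by_description(csv_text):
    """Average the numeric VALUE column for each DESCRIPTION."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["TYPE"] == "numeric":
            groups[row["DESCRIPTION"]].append(float(row["VALUE"]))
    return {desc: mean(values) for desc, values in groups.items()}


for description, average in mean_value_by_description(SAMPLE).items():
    print(description, average)
```

To run this against real Synthea output, the text of the exported `observations.csv` file would be passed in place of `SAMPLE`.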


---

## Next Steps 
  
**Expanding Modules**
- Add more diseases and treatments to the system.
- Improve disease progression models, simulate more complex clinical interactions (e.g., drug interactions, adherence rates).

---

**Validation**: Ongoing validation efforts are comparing synthetic data with real-world statistics to ensure realism in population-level outputs.

---