# PPMI Overview
### Additional Hacks: https://hackmd.io/D3JsvYx1TPut1h7uslxLVQ
[toc]
## Datasets Present in PPMI
### Curated Data Cut
This is the primary dataset I've been starting with because everything is already organized. It includes everything the main PPMI participants are tested on, although that differs depending on which group the patient is in (PD, Prodromal, Control).
* **Parkinson's Disease Group Data**
* **ID Section**:
* **Site**
* **Participant ID**
* **Cohort at enrollment**
* PD, Prodromal, Healthy Patient
* **Analytic Cohort based on consensus committee review (CCR)**
* This changes for a few patients that start in Prodromal and move to PD group
* **Subgroup based on CCR**
* This is a subgroup based on whether the patients have genetic identifiers or symptoms linked to the onset of Parkinson's.
* Sporadic (no history of PD; this is the most common), LRRK2 (common genetic mutation), GBA (common genetic mutation), hyposmia (loss of smell), SNCA (genetic mutation), RBD (REM sleep behavior disorder), PRKN (genetic mutation associated with young-onset Parkinson's)
* The majority of patients in the PD group are Sporadic, while all of the patients in the prodromal group have one of the other predictors, since Sporadic itself has no predictor
* **Event ID**
* When the measurement was taken (Baseline - Visit 20)
* **Year**
* **Visit Date**
* **Demographics Section:**
* **Age at Enrollment**
* **Age at Visit**
* **Indicator for Ashkenazi Jewish/Berber/Basque descent**
* Common ancestry for patients with the above genetic mutations
* **Education Level**
* **Years of Education**
* **Family History of PD**
* **Handedness, Gender, Race, Sex, Hispanic ethnicity, Sexual Orientation**
* **Genetics Sections**
* **APOE**
* APOE is a genetic variant that has been linked to Parkinson's disease and Alzheimer's. APOE comes in three forms, or alleles, called 2, 3 and 4. Each patient has two alleles. Alleles 2 and 4 have been linked to PD.
* **Biologics Section**
* **Abeta**
* CSF readings of A-beta 1-42 (pg/ml)
* **Asyn**
* CSF readings of $\alpha$-synuclein (pg/ml). Out of all of these, asyn has been shown to be the best for early prediction, while the others don't show much correlation
* **P-tau**
* CSF readings of Ptau (pg/ml)
* **T-tau**
* CSF readings of Ttau (pg/ml)
* **Urate**
* Serum Uric Acid in blood (mg/dl)
* **Hemohi**
* Level of CSF hemoglobin (pg/ml)
* **Urine BMP Totals**
* Elevated urine bis(monoacylglycerol)phosphate (BMP) levels have been shown in patients with the LRRK2 genetic mutation. Could be useful as an early biomarker
* **CSF/Serum Neurofilament Light**
* Neurofilament light (NfL) is a neuron-specific protein component involved in the assembly and maintenance of the axonal cytoskeleton, which is elevated in CSF and serum due to axonal damage and neurodegeneration. Used as a biomarker for the progression of motor symptoms.
* **DaTscan Section**
* **SBR for Left/Right Caudate and Putamen**
* It also indicates which side is ipsilateral and which is contralateral, and gives the mean values for the regions
* **Clinical Section** The Surveys are described in the data hack
* **Age at ..**
* Gives the age of the patient at diagnosis, PD symptom onset, lumbar puncture (LP), DaTscan, and UPSIT
* **BMI**
* **Cognitive State Evaluation**
* **Dominant Side of PD**
* **Duration from PD diagnosis to enrollment**
* **Benton Judgment of Line Orientation**
* This test measures a person's ability to match the angle and orientation of lines in space. It is evaluated out of 30.
* **Total clock drawing score**
* Evaluates a patient's ability to draw clocks
* **Epworth Sleepiness Scale Score**
* This test evaluates a person's sleepiness throughout the day. The patient is given 8 scenarios where they must give a sleepiness rating from 0 (no impairment) to 3 (falling asleep). It is scored out of 24 points.
* **Geriatric Depression Scale Score**
* Evaluates whether a person is depressed or not. Scored out of 15 with a score greater than 5 representing depression. For each question 1 is yes, 0 is no.
* **The Hopkins Verbal Learning Test**
* A test that measures verbal learning and memory. It involves memorizing a list of words to test recall immediately after memorization (immediate recall) and after a 20-minute delay (delayed recall). The test is scored out of 36.
* **Hoehn and Yahr**
* This scale is used to stage the functional disability associated with Parkinson's disease. It describes the progression of the disease through its stages, allowing us to measure the severity of a case. Scores range from 0 (no impairment) to 5 (max impairment).

* **Lexical Fluency Score**
* Used to assess language and executive function in clinical practice.
* **Letter number sequencing score**
* It is a test that measures an individual's short-term memory skills in being able to process and re-sequence information. The participant must sequence a random order of numbers and letters. The total score for this test is 21.
* **Modified Boston naming test score**
* Consists of 60 black and white line drawings of objects. It is a measure of confrontation naming that takes into account the finding that patients with dysnomia often have greater difficulty naming low-frequency objects.
* **Mild cognitive evaluation**
* **MOCA**
* The Montreal Cognitive Assessment is used to assess cognitive impairment. It comprises a variety of recognition tasks and is scored out of 30 points, with 1 point per correct answer.
* **Modified Schwab and England score**
* The patient is asked to evaluate how well they are able to live on their own with scores ranging from 0-100% in intervals of 10% with 100% representing complete independence and 0% representing a vegetative state.
* **MDS-UPDRS parts I-IV**
* Total UPDRS is assessed by the following conditions:
1. Mentation, behavior, and mood
2. Activities of daily living
3. Motor examination
4. Motor complications
* **Orthostasis**
* Evaluation of blood pressure change on standing
* **Total Levodopa Equivalent Daily Dose**
* A single variable that summarizes all PD medication as an equivalent daily levodopa dose
* **QUIP**
* QUIP has 4 primary questions (pertaining to commonly reported thoughts, urges/desires, and behaviors associated with impulse control disorders, or ICDs), each applied to the 4 ICDs (compulsive gambling, buying, eating, and sexual behavior) and 3 related disorders (medication use, punding, and hobbyism), evaluated on a 0-1 scale.
* **UPDRS Part III**
* UPDRS Part III is the motor examination, in which a clinician rates 18 separate motor items on a scale from 0 (no impairment) to 4 (severe impairment).
* **REM**
* Rapid eye movement (REM) sleep behavior disorder is one of the most specific prodromal indicators for Parkinson's disease. Patients are asked questions about their sleep and about enacting dreams. It is scored out of 10.
* **SCOPA**
* This test evaluates autonomic symptoms in patients with Parkinson’s disease. The 23 items of the SCOPA-AUT are grouped into six domains: gastrointestinal functioning (seven items), urinary functioning (six items), cardiovascular functioning (three items), thermoregulatory functioning (four items), pupillomotor functioning (one item), and sexual function (two items for men and two for women). The maximum score is 69, with the score for each item ranging from 0 (never experiencing the symptom) to 3 (often experiencing the symptom).
* **STAI**
* This test is used to measure anxiety in patients. It has 20 questions each for trait and state anxiety. Each item is scored from 1 (almost never) to 4 (almost always), for a max score of 160.
* **UPSIT**
* Smell dysfunction occurs in 90% of cases with PD and has been shown to help distinguish PD from other parkinsonisms. The UPSIT is a measurement of the individual's ability to detect odors at a suprathreshold level. There are 40 questions in a scratch and sniff booklet. The patient identifies each smell from a list of 4 possible answers.
* **Trail Making Test**
* It has two parts, in which the subject is instructed to connect a set of 25 dots as quickly as possible while maintaining accuracy.
* **Semantic Fluency Results**
* This test involves a patient naming as many unique words as possible in a semantic category (animals, fruit, vegetables) during three 60-second time periods, with the total score being the total number of unique words produced
* **Initial symptoms at diagnosis**
* **Primary Diagnosis**
* One of 25 available diagnoses
* **Prodromal Group**
The prodromal group is missing the DaTscan section. A lot of patients also don't have alpha-synuclein measurements.
* **ID Section**
* Indicates participant phenoconverted during the study based on CCR
* First, second, and third phenoconversion diagnosis based on CCR
* **Control Group**
The control group is missing any information directly related to having PD such as symptoms, medications etc.
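
The maximum scores quoted in the clinical section can double as a sanity check when loading the curated data cut. Below is a minimal sketch of that idea. The column names (`ess`, `gds`, `moca`, `scopa`, `stai`) are hypothetical placeholders, not the actual PPMI headers, and the STAI minimum of 40 is derived from the 40 items each scoring at least 1.

```python
# Score ranges taken from the descriptions above.
# Column names are hypothetical placeholders, not the actual PPMI headers.
SCORE_RANGES = {
    "ess": (0, 24),     # Epworth Sleepiness Scale: 8 items scored 0-3
    "gds": (0, 15),     # Geriatric Depression Scale: 15 yes/no items
    "moca": (0, 30),    # Montreal Cognitive Assessment
    "scopa": (0, 69),   # SCOPA-AUT: 23 items scored 0-3
    "stai": (40, 160),  # STAI: 40 items scored 1-4
}


def out_of_range(row):
    """Return the names of scores in `row` that fall outside their valid range.

    Missing values (None or absent keys) are skipped rather than flagged.
    """
    flagged = []
    for name, (low, high) in SCORE_RANGES.items():
        value = row.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged


def gds_depressed(gds_score):
    """Apply the cutoff described above: a GDS score greater than 5."""
    return gds_score > 5


# Made-up visit record for illustration only (not PPMI data).
example = {"ess": 7, "gds": 6, "moca": 31, "scopa": None, "stai": 55}
print(out_of_range(example))          # flags "moca", which exceeds 30
print(gds_depressed(example["gds"]))  # 6 is above the cutoff of 5
```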
### Medication Datasets
There are two medication datasets. One includes PD medications, their doses, and the times of starting and stopping each medication.
The other includes non-PD medications, their doses, and the times of starting and stopping each medication.
### Sensor Datasets
#### Roche Sensor
This dataset includes data collected with phone sensors from around 32 patients. It includes the SDMT, voice and speech tests, a dexterity test, a shape drawing test, a hand-turning test, a postural tremor test, a rest tremor test, a balance test, a U-turn test, and an anxiety and depression scale.
#### Verily Study Watch
This was a watch given to patients that records hourly ambulatory minutes, hourly step count, sleep metrics, and cardio metrics (344 patients).
### Gait Dataset
More Detailed hack: https://hackmd.io/b90w3y86T0mdcnAf8AgivA
There is a substudy that investigates patients' gait using six tests, from which 30 features were extracted. We don't have the raw data from this (200 patients).
### Biospecimen Datasets
#### Project 151
Proteomics data from cerebrospinal fluid of Parkinson Disease patients and healthy volunteers are measured using the SOMAscan platform. Gives protein expression by measuring fluorescence. Proteins need to be mapped to ID code.
#### Project 177
Proteomics analysis was conducted on cerebrospinal fluid and blood plasma of both Parkinson's Disease patients and healthy participants in the PPMI cohort. Analysis was conducted using mass spectrometry on 2283 samples from 482 participants from cerebrospinal fluid and 949 samples from 179 participants from blood plasma. Each protein is represented by an ID that we would need to convert; the metric is protein abundance.
#### Project 196
Proteomics analysis was conducted on cerebrospinal fluid and blood plasma of Parkinson's Disease patients and healthy participants in the PPMI cohort. Analysis was conducted using Olink Explore on 666 CSF samples & 898 plasma samples derived from 225 participants matched across both matrices with an additional 16 participants’ samples included as bridging samples. Gives a value for normalized protein expression.
#### Project 214
Proteomics analysis was conducted in samples from cerebrospinal fluid (CSF) and blood plasma of 18 patients with Parkinson's Disease (PD) and 20 healthy controls (HC) from the PPMI cohort. The sample analysis here was intended to bridge analyses with another project on prodromal subjects. Related to above project. Gives normalized protein expression.
#### Project 180
Plasma samples from PPMI were analyzed by liquid chromatography with mass spectrometry (LC/MS) for a variety of metabolites and lipids as interrelated markers of Parkinson's disease and its pathophysiology.
### LONI, SAA, Amprion
LONI: Laboratory of Neuro Imaging -- the lab at USC that hosts the PPMI project.
SAA: Seed Amplification Assay, it is the assay they use to detect CSF alpha-synuclein oligomers (see attached doc).
735 samples - Detection of $α$-synuclein oligomers in CSF using the 24h Seed Amplification Assay (SAA).
Luis Concha, Amprion.
PPMI Project ID: [237]
### Imaging Datasets
#### DaTscan
DaTscan uses Ioflupane I 123 as a tracer. The resolution of the images has a range of (128,128, 120/240/480) with a pixel spacing in the range of 2.6-3.3 mm$^{2}$. DaTscan images are from 2 cohorts, control and PD. For PD there are a total of 2676 scans. For control there were 276 scans.
* If we don't account for overlapping days, there are: 368 patients with 1 scan, 112 patients with 2 scans, 125 patients with 3 scans, 219 patients with 4 scans, 34 patients with 5 scans, 42 patients with 6 scans, 11 patients with 7 scans, 1 patient with 8 scans, 3 patients with 9 scans, 3 patients with 10 scans, 2 patients with 11 scans, 0 patients with 13 scans, 3 patients with 14 scans, 3 patients with 15 scans. However, a lot of these higher scan numbers come from overlapping scans taken at the same time.
* If we account for time unique scans, then there are 381 patients with 1 scan, 125 patients with 2 scans, 154 patients with 3 scans, 273 patients with 4 scans, 9 patients with 5 scans, and 2 patients with 6 scans.
* There were 265 non-overlapping control scans, each from a unique patient.

We also have access to analysis done on the DaTscan images by PPMI, specifically calculation of SBR in the left/right caudate and the left/right putamen.
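
The two tallies above differ only in whether repeated scans on the same day are collapsed. Here is a minimal sketch of how such a histogram could be computed. It assumes that "time unique" means unique by (patient, scan date), and uses made-up records rather than PPMI data.

```python
from collections import Counter


def scans_per_patient(scans, unique_dates=True):
    """Histogram of scan counts: {number of scans: number of patients}.

    `scans` is an iterable of (patient_id, scan_date) pairs. With
    `unique_dates=True`, repeated scans of a patient on the same date
    are counted once.
    """
    records = set(scans) if unique_dates else list(scans)
    per_patient = Counter(patient_id for patient_id, _ in records)
    return dict(sorted(Counter(per_patient.values()).items()))


# Toy records (not PPMI data): patient 3001 has two files from the same day.
toy = [
    ("3001", "2011-03-01"),
    ("3001", "2011-03-01"),
    ("3001", "2012-03-05"),
    ("3002", "2011-06-10"),
]
print(scans_per_patient(toy, unique_dates=False))  # {1: 1, 3: 1}
print(scans_per_patient(toy, unique_dates=True))   # {1: 1, 2: 1}
```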
> SPECT raw projection data was imported to a HERMES (Hermes Medical Solutions,
Skeppsbron 44, 111 30 Stockholm, Sweden) system for iterative (HOSEM) reconstruction. This was done for all imaging centers to ensure consistency of the reconstructions. Iterative
reconstruction was done without any filtering applied. The HOSEM reconstructed files were then transferred to the PMOD (PMOD Technologies, Zurich, Switzerland) for subsequent processing. Attenuation correction ellipses where drawn on the images and a Chang 0 attenuation correction was applied images utilizing a site specific mu that was empirically derived from phantom data acquired during site initiation for the trial. Once attenuation correction was completed a standard Gaussian 3D 6.0 mm filter was applied. These files were then normalized to standard Montreal Neurologic Institute (MNI)space so that all scans were in the same anatomical alignment. Next the transaxial slice with the highest striatal uptake was identified and the 8 hottest striatal slices around it were averaged in to generate a single slice image. Regions of interest (ROI) were then place on the of left and right caudate, the left and right putamen, and the occipital cortex (reference tissue). Count densities for each region were extracted and used to calculate striatal binding ratios (SBRs) for each of the 4 striatal regions. SBR is calculated as (target region/reference region)-1.
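
The SBR formula at the end of the quoted description, (target region/reference region)-1, can be written out as a short sketch. The count densities below are made-up numbers for illustration only, and the region names are hypothetical labels rather than PPMI column names.

```python
def striatal_binding_ratio(target_density, reference_density):
    """SBR = (target region / reference region) - 1."""
    if reference_density <= 0:
        raise ValueError("reference count density must be positive")
    return target_density / reference_density - 1.0


# Made-up count densities for illustration only (not PPMI values).
occipital = 10.0  # reference tissue (occipital cortex)
regions = {
    "caudate_left": 32.0,
    "caudate_right": 30.0,
    "putamen_left": 22.0,
    "putamen_right": 18.0,
}
sbr = {name: striatal_binding_ratio(d, occipital) for name, d in regions.items()}
print(sbr)
```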
In addition, there was a visual interpretation done by experts in the field on whether a DaTscan had normal or abnormal signs.
> Abnormal images typically fall into at least one of the following three general categories:
a) Activity is asymmetric, e.g. uptake in the region of the putamen of one
hemisphere is absent or greatly reduced with respect to the other. Uptake is still
visible in the caudate nuclei of both hemispheres resulting in a comma or
crescent shape in one and a circular or oval focus in the other. There may be
reduced uptake between at least one striatum and surrounding tissues.
b) Ioflupane uptake is absent in the putamen of both hemispheres and confined to the
caudate nuclei. The signal is relatively symmetric and forms two roughly circular or
oval foci. Uptake of one or both is generally reduced.
c) Uptake is absent in the putamen of both hemispheres and greatly reduced in one
or both caudate nuclei. Uptake of the striata with respect to the background is
reduced (1).
#### MRI (3 Tesla)
**Paired stats**
We have 267 patients for whom paired scans are available.
Out of these patients, only 201 have NHY scores with aligned dates in their CSV file. We are using 161 patients for the train set and 40 patients for the test set.
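
The notes do not say how the 161/40 split was made. As a sketch under that caveat, a seeded random split at the patient level (an assumption, not the recorded method) would look like this, with placeholder IDs standing in for the real ones.

```python
import random


def split_patients(patient_ids, n_test, seed=0):
    """Shuffle unique patient IDs and hold out `n_test` of them for testing.

    Splitting by patient (not by scan) keeps all scans of one person
    on the same side of the split.
    """
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[n_test:], ids[:n_test]


# Placeholder IDs standing in for the 201 patients with usable NHY scores.
patients = [f"P{i:03d}" for i in range(201)]
train_ids, test_ids = split_patients(patients, n_test=40)
print(len(train_ids), len(test_ids))  # 161 40
```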
**Unpaired stats**
The MRI data has a lot more variety and scans compared to other modalities. There are 2D/3D T1 weighted, T2 weighted, and proton-density (PD) weighted images with Gradient Echo Sequences (GES), GeneRalized Autocalibrating Partial Parallel Acquisition (GRAPPA), or Fluid-attenuated inversion recovery (FLAIR). The resolution depends heavily on the type of scan (6744 2D and 3241 3D). The 2D scans had a typical resolution of around (448,448,16) with pixel spacing of around 0.5mm and slice spacing of around 1.5mm. The 3D images have a typical resolution of around (256,256,192) with a pixel/slice spacing of around 1mm.
For T1 weighted MRI there were 465, 2285, and 1670 images for the control, PD, and prodromal groups respectively.
* For the prodromal group and looking at time unique images, there are 312 patients with 1 image, 41 patients with 2 images, 24 patients with 3 images, 5 patients with 4 images.
* For the PD group and looking at time unique images, there are 481 patients with 1 image, 57 patients with 2 images, 52 patients with 3 images, 97 patients with 4 images, and 1 patient with 5 images
* For the control group and looking at time unique images, there were 146 patients with 1 image, 46 patients with 2 images, 14 patients with 3 images, and 3 patients with 4 images
For T2 weighted MRI there were 386, 1411, and 746 images for the control, PD, and prodromal groups respectively.
* For the prodromal group and looking at time unique images, there are 313 patients with 1 image, 85 patients with 2 images, 24 patients with 3 images, 5 patients with 4 images.
* For the PD group and looking at time unique images, there are 500 patients with 1 image, 67 patients with 2 images, 51 patients with 3 images, 96 patients with 4 images, and 1 patient with 5 images
* For the control group and looking at time unique images, there were 148 patients with 1 image, 48 patients with 2 images, 17 patients with 3 images, and 3 patients with 4 images
For proton-density (PD) weighted MRI there were 343, 1986, and 1157 images for the control, PD (Parkinson's disease), and prodromal groups respectively.
* For the prodromal group and looking at time unique images, there are 334 patients with 1 image, 97 patients with 2 images, 22 patients with 3 images, 5 patients with 4 images.
* For the PD group and looking at time unique images, there are 461 patients with 1 image, 54 patients with 2 images, 52 patients with 3 images, 98 patients with 4 images, and 1 patient with 5 images
* For the control group and looking at time unique images, there were 134 patients with 1 image, 48 patients with 2 images, 17 patients with 3 images, and 3 patients with 4 images
For MRI, grey matter volume(mm$^{3}$) was extracted from the T1 weighted MRIs.

In the dataset there are 1039 fMRI images. The resolution of these images is on average (448,448,210), (66,68,40)? with an average pixel spacing of 3.5mm.
For fMRI there were 75, 667, and 346 images for the control, PD, and prodromal groups respectively.
* For the prodromal group and looking at time unique images, there are 137 patients with 1 image, 16 patients with 2 images, 10 patients with 3 images.
* For the PD group and looking at time unique images, there are 148 patients with 1 image, 62 patients with 2 images, 39 patients with 3 images, 7 patients with 4 images.
* For the control group and looking at time unique images, there were 41 patients with 1 image, 2 patients with 2 images.

#### DTI
A DTI scan can reveal whether or not the water molecules in the axons are flowing properly along axonal directions. There are currently 3670 DTI images with a resolution of (1044,1044,65) and a pixel spacing of 2mm.
For DTI there were 380, 1655, and 637 images for the control, PD, and prodromal groups respectively.
* For the prodromal group and looking at time unique images, there are 159 patients with 1 image, 23 patients with 2 images, 24 patients with 3 images and 5 patients with 4 images.
* For the PD group and looking at time unique images, there are 169 patients with 1 image, 44 patients with 2 images, 53 patients with 3 images, 95 patients with 4 images and 1 patient with 5 images.
* For the control group and looking at time unique images, there were 41 patients with 1 image, 49 patients with 2 images, 14 patients with 3 images, and 3 patients with 4 images.
Fractional anisotropy measurements were also produced from the DTI images in six specific regions of interest: the caudal, middle, and rostral aspects of the substantia nigra on the left and right sides.

## Results Collection

**Figure** SVM Classifier results on different datasets

**Figure** Deep GP classifier results

**Figure** Motion Code Results

**Figure** Motion Code with Cerebral Spinal Fluid

| Method | Accuracy |
| -------- | -------- |
| Linear Regression | 0.8705 |
| SVC | 0.8729 |
| SVC + Kmeans | 0.8753 |
| Sentence Transformer Encoder + SVC | 0.7962 |
| VAE encoder + SVC | 0.8657 |
| VAE encoder + Probabilistic SVC Threshold | 0.9096 |

**Figure** Results on Early Diagnostic Dataset