FWO interview
=============
Pitch talk
----------
- *CV disease* might affect any of us, as it causes almost 50% of all deaths in Europe. Yet it is largely preventable.
- However, the risk scores used in clinical practice suffer from *low accuracy*.
- *Machine learning*, on the other hand, can predict risk with *much higher* accuracy and therefore support *personalised and preventive medicine*. Machine learning *has not been utilised* in preventive cardiology so far.
- *In this project,* I will develop machine learning models for *risk stratification* using *longitudinal cohorts*. I will also utilise *federated learning* to scale up data and to prevent the need to share patients' sensitive information.
- The best models will be valorised in *software* for clinical practice. In addition, researchers will be provided with a valuable *open-source research tool*.
- The available cohorts, a multidisciplinary team and my expertise in computer science make this project *highly feasible*.
Committee
---------
**Claudia Pagliari**

- Psychologist, eHealth, Health Informatics
**Henk van Weert**

- Professor of General Practice, University of Amsterdam
- Diagnostic decision support and care for chronically ill patients in general practice.
- Cardiology and oncology
- Atrial fibrillation risk prediction
**Raf De Wilde**

- Marketing and IT management in medicine
**Bohets Hilde**

- Drug development
- Biomarkers
**Fahey Tom**

- Public Health Medicine and General Practice
- Member of different boards and committees
**Isabelle Aujoulat**

- Patient centered health care
- Empowerment of patients
**Kristen Hollands**
<img src='https://i.imgur.com/XUSLAhd.png' width='200'>
- Mobility, Ageing, Stroke, Rehabilitation
**Filippo Macaluso**

- Effect of physical stress on muscle (e.g. heart muscle)
**Carol Tishelman**

- nursing, palliative care, cancer
**Verleye Leen**

- Federal center for health care
Questions
---------
- *What is the scientific value of this project?*
- **ML Platform**
- Can be used by other researchers in preventive cardiology in their projects
- **Model validation** on external cohorts
- Federated learning to further improve accuracy using external cohorts
- Clustering methods to **characterise stages of CV disease progression using routinely measured data**
- Explore information in **LV and LA deformation curves** using NN
- Ultimately, to develop ML models for **CV outcome prediction for early risk stratification using routinely measured data, including echocardiography**.
- *What is the socio-economic impact?*
- **Personalised patient-centered health assessment and early risk-stratification**
- **Early intervention** in high-risk patients to prevent progression to symptomatic disease
- With **more precise targeting**, we can increase **cost-effectiveness of patient management**
- *What is the strategic dimension of this project?*
- Currently we can see an emerging market for AI in healthcare; the call is to utilise the large amount of available medical data to improve the health of society
- Utilise large-scale data through an **ML platform for researchers**, which will be used for **model validation** on external cohorts and for **further improvement in accuracy** through **federated learning**. It will be open-sourced so that it can be broadly used.
- The best models can be integrated into a **subscription-based clinical application** to fund further development. Planning to start a spin-off
- Risk scores
- ACC/AHA
- non-fatal myocardial infarction, coronary heart disease death, as well as fatal or non-fatal stroke
- age, sex, race, total cholesterol, HDL cholesterol, systolic and diastolic blood pressure, BMI, smoking, diabetes, history of hypertension
- ESC SCORE
- CV mortality
- sex, age (years), total cholesterol (mmol/L), systolic blood pressure (mmHg) and current smoking (yes/no).
- *What differentiates you from the competition?*
- Use of ML platform to validate models on different populations and further improve models using federated learning
- Team combining expertise in cardiology and machine learning with already published results on machine learning in early CV risk prediction
- Availability of longitudinal general population cohorts
- Data privacy: users are not forced to upload data to the cloud
- *Barriers for impact and mitigation plans*
- Slower adoption of new biomarkers in clinical practice
- focusing on clinically suitable protocols
- mitigate technical barriers with an easily integrable software solution
- Issues with data privacy
- Data will be utilized only locally or using federated learning
- Not storing any data in clinical application
- Problems with standardization of the data:
- Part of WP1 is creating the blueprint for data standardization
- Federated learning won't improve accuracy of the models
- Developing models using local cohorts
- Further communication with our partners about data sharing
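
The notes do not say which federated learning scheme the project will use. As a minimal sketch, assuming federated averaging (FedAvg, one common scheme), each site trains locally and shares only model weights, which are combined in proportion to each site's sample count. The function name, weights and site sizes below are hypothetical.

```python
from typing import List, Sequence


def federated_average(site_weights: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> List[float]:
    """Combine per-site model weights, weighting each site by its sample count.

    Only the weights leave a site; the patient-level data stays local.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one weight vector and one size per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(site_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(site_weights, site_sizes):
        if len(weights) != n_params:
            raise ValueError("all sites must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Hypothetical example: three cohorts with 100, 100 and 200 participants.
global_weights = federated_average(
    [[0.2, 1.0], [0.4, 2.0], [0.6, 3.0]],
    [100, 100, 200],
)
print(global_weights)
```

In a real setup this averaging step would be repeated over many training rounds, with the combined weights sent back to the sites after each round.
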
- *Why use machine learning over traditional linear methods?*
- You can of course use traditional linear models, but there are disadvantages
- Traditional statistical modeling techniques are **intended primarily for inference**, whereas ML's goal is to make **precise predictions**
- Partly because of that, traditional risk scores have to be recalibrated
- Tree-based ensemble methods, for example, can incorporate a **large number of features, can detect non-linear patterns, and are capable of non-continuous decision boundaries**, so we can detect complex individual patterns, **which leads to more precise and personalised treatment**
-----------------
- Federated learning research is focused on these ML methods
- Traditional risk scores often have to be recalibrated
- *What is the comparison to existing classifiers, what is the added clinical value?*
- Current risk scores suffer from low accuracy, AUC 0.6
- ML can detect **complex patterns and therefore support personalised and precise risk stratification**
- This tool can process a large amount of data and produce novel information that the clinician can work with
------------------
- TODO: studies
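
For reference when comparing against existing risk scores, the AUC can be read as the probability that a randomly chosen participant with an event receives a higher predicted risk than a randomly chosen participant without one (0.5 is chance level). A minimal sketch of that definition, with hypothetical labels and scores:

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of cases (1) and non-cases (0).

    Each case/non-case pair counts 1 if the case has the higher score,
    0.5 for a tie, and 0 otherwise.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical predicted risks for four participants.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

The pairwise loop is quadratic and only meant to make the definition explicit; a rank-based implementation would be used on full cohorts.
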
- *What is the prevalence of CV disease?*
- In FLEMENGHO (Flemish Study on Environment, Genes and Health Outcomes)
- **Mean age: 51**
- 51% women
- **Median follow-up: 18 years** (61,000 person-years)
- Heart failure incidence: **3 events per 1000 person-years**
- Total: 3340
- CV event: 694 (20%), **17 events per 1000 person-years**
- Cardiac event: **12 events per 1000 person-years**
- Total with echo: 1450
- Hypertension: 611 **(40%)**
- LV hypertrophy: 270 **(19%)**
- LV diastolic dysfunction: 250 **(17%)**
- In EPOGH (European Project on Genes in Hypertension):
- Total: 1700
- **Mean age: 45 years;** 55% women
- Median follow up: **9 years**
- Hypertensive: 750 (44%)
- CV event: 100 (6%)
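
As a reminder of how rates "per 1000 person-years" are derived: the number of events is divided by the total follow-up time accumulated by all participants and scaled by 1000. The sketch below uses hypothetical numbers rather than the cohort figures above, since each participant's follow-up normally stops at the event or at censoring, and that depends on the outcome being counted.

```python
def incidence_per_1000_person_years(events: int, person_years: float) -> float:
    """Incidence rate expressed as events per 1000 person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * events / person_years


# Hypothetical example: 50 events over 10,000 person-years of follow-up.
rate = incidence_per_1000_person_years(events=50, person_years=10_000)
print(rate)  # 5.0
```
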
- *Do you see a danger in ML being a black box?*
- It is an important concern, but we give the doctor additional information; it is not a substitute for the doctor. This mitigates the problem.
- TODO: Interpretability
- *How will the results be validated?*
- Two large longitudinal population cohorts
- The blueprint for data standardization will be available, and we will use the ML platform for validation on external cohorts from our partners in Norway and Sweden
- The open-source code can be inspected
- *Echocardiography can't be used for early risk stratification*
- The **FLEMENGHO study** from our unit **has shown that echocardiographic measures** including LV diastolic dysfunction, longitudinal strain and cardiac remodelling/hypertrophy **are independent predictors of CV outcome**
- *Will all patients be sent for echocardiography? How do you preselect?*
- This was actually the topic of my **previous paper**. By using **routinely measured clinical data** including ECG and biomarkers, you might detect **subclinical cardiac abnormalities**, which **indicates** the need for further in-depth echocardiographic examination
- *What data are there to be used? What kind of data (images...)?*
- See "How will the results be validated?"
- First, the models are trained on tabular data, so we have attributes expressed as **single values** (for example LV mass) obtained for **every individual**
- **Another approach will be** to extract information from **LA and LV deformation curves**
- *What about difficulties with translation to clinical practice (imprecise measurements, missing data, what features to include...?)*
- We focus on **routinely measured** clinical **variables** and try to minimize the number of variables needed
- Most variables are **correlated**, so missing or imprecise information from a particular variable can be **substituted by compound information** from the others
- An easily integrable software solution to mitigate any other technical obstacles
- *What is novel in this project?*
- Same as scientific and socio-economic objectives
- *Is there any international collaboration?*
- Our partner research units in Sweden and Norway provide their cohorts (SCAPIS, Tromso, HUNT) through the ML platform.
- *Is genomics used? If not, why?*
- Genomics is certainly a very important aspect of personalised CV risk prediction, but it is **currently out of the scope of this project**.
- *Is the project feasible?*
- Available cohorts
- Experts in the team
- Unit of **Hypertension and Cardiovascular Epidemiology** at KU Leuven
- **Artificial intelligence unit in the Department of Public Health and Primary Care** at KU Leuven (my co-promotor Celine Vens)
- I've already developed a prototype of the research application
- Published results on ML in early CV risk stratification
- *How is this project related to other projects?*
- Our unit focuses on early risk stratification, so for example there is another project exploring cardiovascular health assessment using a physical stress test
- *The post-project trajectory*
- Distribution of the software package to healthcare professionals through a product website
- Provide classifiers for cardiac abnormalities to PACS providers (GE, Philips) to give them a competitive advantage or to develop new tools
- Communication with health professionals at conferences and connection to cardiology societies
- *How is the patient involved?*
- The patient gets more **personalised** information about his or her **risk**
- Based on that, the patient may adhere to **preventive strategies like sport programs**
- Further step: personal patient mobile applications
- *What is your expertise in machine learning?*
1. University education, my diploma thesis on data mining in small business
2. Long experience in software development
3. Already published paper "Detecting early cardiac maladaptation using ML" in European Heart Journal
4. Second paper using unsupervised learning for phenomapping
5. Continuous education by scientific literature and courses
- *Why do you think you are the right person for this project?*
- see "What is your expertise in machine learning?"
- already have a prototype of the software for researchers
- hard-working, fast learner
- very interested in improving well-being and health of the society
- *Under what license will the software be published? Who will benefit from the license?*
- The platform for researchers will be published under an open-source license (for example GNU GPL), so everybody can use it, and even improve it.
- Software for clinical practice will be published under a suitable subscription-based commercial license to support further development
- The server software would remain closed-source, at least in the beginning, to retain partial control, but it could be open-sourced later so users can run their own instances.
- *Who will use this tool? Is there a demand for it?*
- The research software will be used by **researchers** in CV epidemiology
- The software for clinical practice will be used by **heart failure specialists**, **preventive cardiologists** and **GP**s
- There is great demand from the community; they are waiting for the software
- *What about the standardization of the input data?*
- An important aspect is to produce a **data blueprint** for standardization
- Part of **WP1**
- *Who is supporting the machine learning part / what are the roles in the project?*
- I am responsible for the computer science part, including machine learning
- Co-promotor Prof. Celine Vens from the artificial intelligence unit in the Department of Public Health and Primary Care
- CV epidemiology
- Promotor: Prof. Tatiana Kouznetsova
- Co-promotor: Nicholas Cauwenberghs
- *Who are the beneficiaries of this project?*
- General population: better early risk stratification and thus improvement in well-being and cardiovascular health
- CV researchers: better research tools
- Clinicians: a decision support tool
- *How many/what features are required to perform the classification?*
- For early detection of cardiac maladaptation, around 19 features achieved optimal performance.
- Anthropometrics (BMI), disease history (history of HT), blood pressure, ECG (R wave, T wave), biochemistry (leptin)
- Has to be validated on external cohorts through the ML platform
- *Where do you see yourself in 4 years? What are your goals in academia?*
- Obtain a PhD degree
- Deepen my knowledge of data science and cardiology
- Have both scientific and practical impact
- By that time, functional research and clinical applications
- Continue in research
- interesting, because it is about novel approaches
- impact on society
- *Why do you think your group is the right one for this project?*
- We have a dedicated team from both the cardiology and machine learning fields
- Celine Vens, Tatiana Kouznetsova, Nicholas Cauwenberghs
- Availability of longitudinal general population cohorts
- Solid international connections with other research units
- *How will the data be entered in clinical practice?*
- First manually
- Parsing a data file produced by the echocardiographic system
- Later, connect to electronic health records
- Patient application on a tablet to fill in a questionnaire
- *Is the ambition of this tool to replace the doctor?*
- No
- Decision support tool
- Another input for the doctor to make right decisions
- *How do we know unsupervised learning will detect disease without a label?*
- Echocardiographic features describe overall cardiac health
- Clustering creates distinct groups
- We can easily recognize which groups are "healthy" and "unhealthy"
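
The notes do not name the clustering algorithm. As a minimal sketch, assuming k-means on a single hypothetical echocardiographic index, the idea is that participants are grouped without labels, and the groups are interpreted afterwards by comparing their average profiles. The values and function name below are illustrative only.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int = 2,
              iterations: int = 20) -> Tuple[List[float], List[int]]:
    """Plain k-means on one feature; returns (centroids, cluster index per value)."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial centroids evenly over the range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    assignments = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = mean(members)
    return centroids, assignments


# Hypothetical index values for seven participants (no outcome labels used).
index_values = [70, 72, 75, 78, 110, 115, 120]
centroids, assignments = kmeans_1d(index_values, k=2)
print(centroids, assignments)
```

No label enters the algorithm; which cluster counts as "unhealthy" is decided afterwards from the cluster averages, for instance the group with the clearly higher centroid here.
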
- **Study results**
- P60
- Cum laude