JeniaJitsev
HIDA COVID Alpha X Team Hackathon
===============

## Outcome
- **2nd place** in the Classification Challenge - congratulations to the whole team!

## Basic Info
International virtual COVID-data challenge
- HIDA, ELLIS, IDSI, Helmholtz AI
- https://www.helmholtz-hida.de/en/activities/events/details/international-virtual-covid-data-challenge/
- Repo: https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon

## Submission
- **IMPORTANT**: deadline shifted to **16:30**
- Collaborative slides link for everyone: https://docs.google.com/presentation/d/1V7TT8YwQFYeoM8lCg3h_bdIGeqz83DCAkWswpu3wnCs/edit?usp=sharing
- In case video is an option - the instructions for the OBS workflow
    - https://drive.google.com/file/d/1IXqOo9b3ugCspR7PATqhAsjhixyqIPg3/view

## Next meetup
- Proposal: Exchange before submission, 15:30
    - Room: https://webconf.fz-juelich.de/b/jit-ffc-wyd
- 29.04, 10:15 https://bbb.hzdr.de/b/hel-wwv-d67

## Meetup Together Pre-Submission
- **IMPORTANT**: deadline shifted to **16:30**
- Meeting: 15:30
- Room: https://webconf.fz-juelich.de/b/jit-ffc-wyd
- Submission file
    - Ultimate deadline for dropping all stuff in: 15:45
    - HackathonCovidSubmissions (https://hmgubox2.helmholtz-muenchen.de/index.php/s/Hz6MKFXFw6nxFo7)
- presentation slides: https://docs.google.com/presentation/d/1V7TT8YwQFYeoM8lCg3h_bdIGeqz83DCAkWswpu3wnCs/edit?usp=sharing

### Discussion Presubmission

## Submission procedure
1. Use /testSet/testSet.txt as template
2. Rename with a team name (Team_COVID_Alpha_X)
3. Task 1: Replace NaNs with imputed values
4. Task 2: Fill in prognosis with MILD or SEVERE

Put into folder HackathonCovidSubmissions (https://hmgubox2.helmholtz-muenchen.de/index.php/s/Hz6MKFXFw6nxFo7): before 16:00
- Slides with a short explanation of the way to the solution and the solution itself
- Jenia: collaborative slides link for everyone: https://docs.google.com/presentation/d/1V7TT8YwQFYeoM8lCg3h_bdIGeqz83DCAkWswpu3wnCs/edit?usp=sharing

## Meetup Together Discussion 29.04
- Room: https://bbb.hzdr.de/b/hel-wwv-d67
- Submission file
    - 1 hour before - 15:00; ultimate deadline: 15:30
- presentation slides: describing the methods can already be started; results can be plugged in later

### Installing relevant packages
- like datawig, fancyimpute etc
- See here: https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon/blob/main/helpers/test.md

## Task 2 Blitz Discussion 29.04
Room: https://webconf.fz-juelich.de/b/jit-ffc-wyd
- Deal with missing images
    - an option: use an additional mask vector, e.g. [0 1 .. 0], where each entry indicates whether an input is missing or not
        - only images missing: a 2-dim vector, e.g. [0 1] for the case when imaging is missing and the table is present
        - when the image is missing: let the image network feature vector become zeros (or any other reasonable dummy value), do not apply any input to the image encoder
        - full missing data mask: an (n+1)-dim vector (n = dimension of the table vector) with those entries that are missing indicated by 0
    - nearest neighbor on the training set to replace missing image input
- first the usual BCE loss for disease severity prediction only
    - if it works, try a loss including the clinical data ground truth vector as additional teacher signal
- Build classifier by fine-tuning on X-Ray set
- Plug into fused architecture (using mask vector as additional input?)
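The mask-vector option discussed above can be sketched in plain Python. This is only a minimal illustration under assumptions, not the team's implementation: the function name `build_fused_input`, the ordering of the concatenated parts, and the toy numbers are hypothetical. It follows the convention from the notes (0 = missing, 1 = present) and uses a zero feature vector when the image is missing.

```python
import math
from typing import List, Optional, Sequence


def build_fused_input(
    table_row: Sequence[Optional[float]],
    image_features: Optional[Sequence[float]],
    image_dim: int,
) -> List[float]:
    """Concatenate image features, clinical values and an (n+1)-dim mask.

    Mask convention (from the notes): 0 = missing, 1 = present.
    The first n mask entries describe the table, the last one the image.
    """
    # Missing image: use a zero feature vector instead of running the encoder.
    if image_features is None:
        img = [0.0] * image_dim
        image_present = 0.0
    else:
        if len(image_features) != image_dim:
            raise ValueError("image_features has the wrong length")
        img = [float(v) for v in image_features]
        image_present = 1.0

    values: List[float] = []
    mask: List[float] = []
    for v in table_row:
        missing = v is None or math.isnan(v)
        # Missing clinical values are replaced by a dummy 0.0; the mask tells
        # the downstream network that this entry carries no information.
        values.append(0.0 if missing else float(v))
        mask.append(0.0 if missing else 1.0)
    mask.append(image_present)

    return img + values + mask


# Toy example (made-up numbers): 3 clinical values, one missing, no image,
# and a hypothetical 4-dim image feature vector.
fused = build_fused_input([37.5, float("nan"), 1.0], None, image_dim=4)
print(fused)
```

The resulting vector could be fed to the fused classifier head; whether the mask helps would have to be checked on a validation split.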
## Access and Resources
- Accessing the dataset https://hida-hackathon2021.scc.kit.edu/data/#accessing-the-dataset
- Github repo: https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon
- REMINDER: please put your **github names** here -->
    - mehdidc, stkmrc, niclaspopp, hannahrabea, shravaniCD

## Dataset
Dataset: `/hkfs/work/workspace/scratch/ej4555-hida2021/HackathonCovidData`
- create shortcut: `ln -s /hkfs/work/workspace/scratch/ej4555-hida2021/HackathonCovidData dataset`
- Using the Dataset

    We propose the following steps in order to handle the data sets:
    1. Create a workspace `ws_allocate hida_workspace 30`
    2. Create a symlink to your workspace `ln -s $(ws_find hida_workspace) $HOME/hida_workspace`
    3. Copy the HackathonCovidData into your workspace `cp -vr /hkfs/work/workspace/scratch/ej4555-hida2021/HackathonCovidData $HOME/hida_workspace/`

## Workflow

### Installing relevant packages
- like datawig, fancyimpute etc
- See here: https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon/blob/main/helpers/test.md

### Virtual env:
```bash
python -m venv hack_env
source hack_env/bin/activate
```

### Package install example:
1. Open New Terminal: File->New->Terminal
2. Create and source virtual env
```bash
python -m venv hack_env
source hack_env/bin/activate
```
3. Install a package: e.g. `pip install opencv-python`

### Working from Terminal in interactive mode
- Requires only ssh login, Jupyter not necessary (but can be used with Jupyter as well when starting Terminal there) https://www.nhr.kit.edu/userdocs/horeka/Slurm%3A_Interactive_Jobs/
- examples of interactive sessions:
    - Run interactive bash session with 1 GPU for 1 hour
    ```bash
    srun --partition=haicore-gpu4 --gres=gpu:1 --time=01:00:00 --pty bash -i
    ```
    - Run a script as a batch job (`--pty` is an `srun` option and is not accepted by `sbatch`)
    ```bash
    sbatch --partition=haicore-gpu4 --gres=gpu:1 --time=01:00:00 my_script.sh
    ```

### Creating a workspace
1. Create a workspace `ws_allocate hida_workspace 30`
2. Create a symlink to your workspace `ln -s $(ws_find hida_workspace) $HOME/hida_workspace`

## Team
* COVID Alpha X
    * members
        * Helene
        * Ashish
        * Hannah
        * Marco
        * Niclas
        * Shravani
        * Mehdi
    * coach
        * Jenia
* General
    * Mehdi: metrics, evaluation, testing https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon/tree/main/code_container
* Task 1
    * Helene --> using `SimpleImputer` by datawig
        * [Notebook for imputation](https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon/blob/main/code_testing/data_wig_impute/Notebook.ipynb)
        * [imputation methods](https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon/blob/main/code_testing/data_wig_impute/impute_datawig.py)
        * mean score using [tests.py](https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon/blob/main/code_container/tests.py): 0.33
    * Marco -> using MICE package in R
        * https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon/tree/main/MICE_Imputation
    * Shravani
    * Hannah --> using `IterativeImputer` by sklearn
* Task 2
    * Mehdi -> first step: basic skeleton of repo, validation metrics
    * Ashish
    * Niclas

[Grand Scheme Image](https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon/blob/main/images/scheme_1.png)

## Dataloaders
- X-Ray examples:
    - have a look here - a lot of data readers are already implemented: https://github.com/mlmed/torchxrayvision#dataset-tools
    - https://github.com/mlmed/torchxrayvision/blob/master/torchxrayvision/datasets.py

## Discussion
- Work distribution
    - Subteam for clinical data, subteam for images? Or concentrate all together on a baseline solution for task 1 (clinical data, no images, missing values and main predictor disease severity) first?
### Task 1 (using clinical data)

#### Infos
- Test code for datawig data imputation: https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon/blob/main/code_testing/data_wig_impute/test_impute_datawig.py

#### Material & Ideas
Ashish: A nice idea would be to go through this link https://towardsdatascience.com/7-ways-to-handle-missing-values-in-machine-learning-1a6326adf79e - there is a cool library called datawig (by AWS)

- Marco: Here is a quick overview of the data structure and how many values are missing:

| Column | NAs | Total |
| -------- | -------- | -------- |
| PatientID | 0 | 863 |
| ImageFile | 0 | 863 |
| Hospital | 0 | 863 |
| Age | 1 | 863 |
| Sex | 0 | 863 |
| Temp_C | 154 | 863 |
| Cough | 5 | 863 |
| DifficultyInBreathing | 4 | 863 |
| WBC | 9 | 863 |
| CRP | 33 | 863 |
| Fibrinogen | 591 | 863 |
| LDH | 136 | 863 |
| Ddimer | 621 | 863 |
| Ox_percentage | 243 | 863 |
| PaO2 | 170 | 863 |
| SaO2 | 583 | 863 |
| pH | 207 | 863 |
| CardiovascularDisease | 19 | 863 |
| RespiratoryFailure | 159 | 863 |
| Prognosis | 0 | 86 |

- If somebody has a Jupyter notebook on clinical data exploration and snippets of code to try - push it into [our repo](https://github.com/JeniaJitsev/HIDA_COVID_Alpha_X_hackathon), so that everyone can try it as well

Helene: I would like to try a linear model to predict the missing values based on the other patients who have those values. I think just using a distribution of the values over all patients would neglect the correlations between certain values. The simplest thing I can think of is to find the 2-3 most important features for each feature to be predicted.

Jenia: MICE (see below) could be a strong baseline for the missing value problem. It seems straightforward to use and goes beyond a simple linear model approach
- it is built into `scikit-learn` as IterativeImputer https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html
- the library it originates from offers other impute methods as well: https://github.com/iskandr/fancyimpute

- Jenia: On missing data imputation baseline
    - "Another algorithm of fancyimpute that is more robust than KNN is MICE (Multiple Imputations by Chained Equations). **MICE** performs multiple regression for imputing. Use the below code snippet to run MICE: `from fancyimpute import IterativeImputer`, `mice_impute = IterativeImputer()`, `traindatafill = mice_impute.fit_transform(traindata)`. IterativeImputer was merged into **scikit-learn** from fancyimpute. However, it can still be imported from fancyimpute." https://github.com/iskandr/fancyimpute

- Jenia: more advanced approaches (beyond baseline) for the missing value problem on clinical data would be generative models (GANs, VAEs), see here
    - An advantage of such methods could be that one would be able to train end-to-end with an image encoder part on a common end loss - but the workshop is too short for that, I assume
    - Example using GANs: GAIN - a very strong work (a lot of follow-ups): Jinsung Yoon, James Jordon, Mihaela van der Schaar, "GAIN: Missing Data Imputation using Generative Adversarial Nets," International Conference on Machine Learning (ICML), 2018.
        - Paper Link: http://proceedings.mlr.press/v80/yoon18a/yoon18a.pdf
        - https://github.com/jsyoon0823/GAIN
        - https://github.com/dhanajitb/GAIN-Pytorch
        - https://openaccess.thecvf.com/content_CVPR_2020/html/Yoon_GAMIN_Generative_Adversarial_Multiple_Imputation_Network_for_Highly_Missing_Data_CVPR_2020_paper.html
    - Example using VAE: Handling Incomplete Heterogeneous Data using VAEs - https://arxiv.org/abs/1807.03653
        - "Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate for capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs can still not directly handle data that are heterogenous (mixed continuous and discrete) or incomplete (with missing data at random), which is indeed common in real-world applications. In this paper, we propose a general framework to design VAEs suitable for fitting incomplete heterogenous data. The proposed HI-VAE includes likelihood models for real-valued, positive real valued, interval, categorical, ordinal and count data, and allows accurate estimation (and potentially imputation) of missing data. Furthermore, HI-VAE presents competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data."
        - https://github.com/probabilistic-learning/HI-VAE
        - https://github.com/probabilistic-learning/HI-VAE/blob/master/script_HIVAE.sh
        - Advantage: can handle any kind of values - continuous, discrete - so it would fit the mixed value case in the clinical data here.
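Helene's regression idea from this section (predict a missing value from correlated features of other patients) can be sketched in plain Python. This is a simplified illustration under assumptions, not the team's code: it uses only the single most correlated feature instead of the 2-3 proposed, falls back to the column mean when no predictor is available, and the names `pearson`, `impute_column` and the toy values are hypothetical.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def impute_column(data: Dict[str, Column], target: str) -> List[float]:
    """Fill missing entries (None) of `target` via one-feature linear regression."""
    y = data[target]
    observed = [v for v in y if v is not None]
    mean_y = sum(observed) / len(observed)

    # Pick the feature with the highest absolute correlation to the target,
    # computed on rows where both values are observed.
    best_name, best_r = None, 0.0
    for name, x in data.items():
        if name == target:
            continue
        pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
        if len(pairs) < 3:
            continue
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if abs(r) > abs(best_r):
            best_name, best_r = name, r

    slope = intercept = 0.0
    if best_name is not None:
        # Ordinary least squares fit y = intercept + slope * x.
        pairs = [(a, b) for a, b in zip(data[best_name], y)
                 if a is not None and b is not None]
        xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((a - mx) ** 2 for a in xs)
        slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
        intercept = my - slope * mx

    filled: List[float] = []
    for i, v in enumerate(y):
        if v is not None:
            filled.append(v)
        elif best_name is not None and data[best_name][i] is not None:
            filled.append(intercept + slope * data[best_name][i])
        else:
            filled.append(mean_y)  # no usable predictor: fall back to the mean
    return filled


# Toy data (made-up values, not from the hackathon dataset).
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, None, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, None],
}
filled_b = impute_column(toy, "b")
print(filled_b)
```

For the MICE baseline itself, scikit-learn's `IterativeImputer` is the more complete tool; note that it is still experimental there and requires `from sklearn.experimental import enable_iterative_imputer` before importing it from `sklearn.impute`.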
### Task 2 (using X-ray Images)

#### Info
- using 320x320 resizing (as in pre-trained CheXpert models)

#### Material
- https://github.com/albumentations-team/autoalbument - game-changing augmentation
- https://github.com/PyTorchLightning/pytorch-lightning/blob/master/notebooks/01-mnist-hello-world.ipynb - PyTorch Lightning
- Ashish: https://github.com/facebookresearch/CovidPrognosis Mehdi shared this and it looks great: we already get a pretrained model and we can fine-tune it. (by Facebook)
    - Jenia: hint - this is unsupervised (self-supervised, via contrastive loss) pre-training, so it may be weaker and have more trouble than supervised pre-trained models (see below, e.g. BiT). However, unsupervised pre-training is cool, so worth trying. Maybe worth comparing a supervised pre-trained and a self-supervised pre-trained one.
- Ashish: https://github.com/jfhealthcare/Chexpert pretrained model available (top 5th solutions of https://stanfordmlgroup.github.io/competitions/chexpert/)
- Jenia: Further pre-trained models (on NIH ChestX-ray14) based on a strong paper (from Chicago hospital): https://github.com/IVPLatNU/deepcovidxr
- Marco: Review on performance of transfer learning for COVID-19 detection on X-ray images: https://pubmed.ncbi.nlm.nih.gov/32773400/
- Marco: If we want to do lung segmentation, here are two pretrained models I found:
    - https://github.com/haimingt/opacity_segmentation_covid_chest_X_ray (Paper: https://www.medrxiv.org/content/10.1101/2020.10.19.20215483v1.full)
    - https://github.com/raghavian/lungVAE (Paper: https://arxiv.org/abs/2005.10052)
- Jenia: for transfer learning, downloading a pre-trained BiT model can be considered. BiT is currently state of the art in transfer (via supervised pre-training on a classification task, ImageNet-1k, 21k).
    - See papers here:
        - Supervised Transfer Learning at Scale for Medical Imaging (Google Health): https://arxiv.org/abs/2101.05913
        - Big Transfer (BiT): General Visual Representation Learning: https://arxiv.org/abs/1912.11370 (original BiT paper, Google Research Labs Zuerich)
    - and repo here https://github.com/google-research/big_transfer#how-to-fine-tune-bit
- Jenia: there are further pre-trained models (based on EfficientNet, ViT), also using X-Ray datasets for pretraining (CheXpert)
    - DenseNet-121 based models pretrained on CheXpert or MIMIC-CXR, or other large X-Ray datasets, are here: https://github.com/mlmed/torchxrayvision#models-demo-notebook

## Schedule

### April 28, 2021
- 11:00 AM in Zoom: Welcome to the International COVID-DATA Challenge (Helmholtz Information & Data Science Academy, Helmholtz AI & Ellis Netzwerk, Israel Data Science Initiative)
- 11:15 AM in Zoom: Networking session – Get to know each other
- 11:25 AM in Zoom: Introduction to our challenge incl. Q&A (Helmholtz AI, Ellis Netzwerk, BRACCO Imaging/Centro Diagnostico Italiano)
- 11:50 AM in Zoom: Introduction to the computer resources by our partners (HAICORE, Karlsruhe Institute for Technology, NVIDIA)
- 12:15 PM in Slack: Teambuilding via Slack & time for your challenge
- 4:00 PM in Zoom: "Coffee Break" - Short team introductions, time for questions & support by the Data Coaches
- 4:20-10:00 PM in Slack: More time for your challenges

### April 29, 2021
- 10:00 AM in Zoom: Welcome back: A brief overview of day #2 (Helmholtz Information & Data Science Academy)
- 10:10 AM in Slack: More time for your challenges and to prepare your submission and video
- 4:00 PM: Deadline: Submission of solution and video
- 4:30 PM in Zoom: Solutions for the ages – a short crash course on sustainable software development (Helmholtz-Zentrum Dresden-Rossendorf)
- 5:00 PM in Zoom: Award ceremony & thank you

Please note: All times are in CEST
