owned this note changed 5 years ago
Published Linked with GitHub

2/28 Deep Learning Discussion

Today's question: Imagine that you have and your advisor are having a discussion about whether to use a variational autoencoder (“deep learning” or AI) for your current research problem. Your advisor and your collaborators are very much for trying it, but none of you have much experience with deep learning. What considerations might you bring to the conversation in exploring whether deep learning is a good fit for your problem?

Background Questions

  • What exactly is machine learning? How is deep learning different?
  • What is an autoencoder?

Class Responses:

  • "I think one consideration would be how structured/unstructured your data is because it seems like deep learning is really useful for unstructured data but maybe unnecessary for structured data.

  • The article talked about how learning in biological datasets is much more difficult than in text or images, so you should consider the format of your data.

  • "If I had large data sets that may have unknown implications, such as RNAseq, we may want to use deep learning as there is a large expanse of unstructured data that may have underlying correlations we are unable to parse out ourselves.

  • "Employing a VAE would be appropriate when exploring a massive data set that can be parsed and further organized. This especially holds true when there are unknown differentiating factors, like extents of expression of certain genes in cancer types/subtypes."

  • how I might experimentally show the model's validity?

  • Do we have enough data to feed a deep learning approach? Is our data accurate enough? Are there some similar examples of deep learning with our data type?

  • Collaborate? Can I do this?

  • biology can be very redundant, non linear, and has its own strange logic that can make biological relationships hard to understand or even define.

  • the results produced are simply as good as the data that goes in - for instance, is our data set of interest well-annotated and massive? What challenge(s) is this approach slated to overcome (i.e. identifying genetic variation, categorization, stratification of disease types/populations, etc.)? And based on our research problem, does the potential value to be reaped outweigh deep learning's existing limitations (i.e. inherent biases in our approach, reproducibility concerns - different models, while seeking to be both explainable and accurate, will not always approach the same problem and may yield varied conclusions).

Select a repo