Conceptual questions due Friday, April 19th, 2024 at 6:00 PM EST
Programming assignment due Friday, April 26th, 2024 at 6:00 PM EST
Answer the following questions, showing your work where necessary. Please explain your answers and work.
Please use LaTeX to typeset your answers, as it makes things easier for both you and us.
Do NOT include your name anywhere within this submission. Points will be deducted if you do so.
[Figure: When you pass a cow through a VAE]
Show that, for some discrete distributions $P$ and $Q$, the difference between the KL-divergence and the cross-entropy loss can be marginal. In what scenario could this happen?
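Hint: Recall the following equations:
$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$
$$H(P, Q) = -\sum_{x} P(x) \log Q(x)$$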
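To explore this numerically, here is a small Python sketch (not part of the original handout) that computes both quantities, plus the entropy $H(P)$, for a pair of discrete distributions using the natural log. The distributions `p_onehot`, `p_spread`, and `q` are hypothetical values chosen only for illustration, and terms where $P(x) = 0$ are treated as contributing zero, following the usual convention.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x)); terms with P(x) = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) log Q(x); terms with P(x) = 0 contribute 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """H(P) = -sum_x P(x) log P(x)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical distributions over three outcomes, chosen for illustration only.
p_onehot = [1.0, 0.0, 0.0]
p_spread = [0.4, 0.3, 0.3]
q = [0.7, 0.2, 0.1]

for name, p in [("one-hot P", p_onehot), ("spread P", p_spread)]:
    ce = cross_entropy(p, q)
    kl = kl_divergence(p, q)
    print(f"{name}: H(P,Q)={ce:.4f}  KL={kl:.4f}  "
          f"H(P,Q)-KL={ce - kl:.4f}  H(P)={entropy(p):.4f}")
```

Comparing the last two printed columns for each choice of $P$ is a useful check on whatever relationship you derive by hand.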
Your friend has been developing a multimodal model, which takes in multiple types of input streams at the same time and reasons about them in a unified architecture. Specifically, the model predicts a per-frame classification from both an image and a frame of audio, both of which are sampled from a large video corpus.
Their network is performing about the same as it did with audio alone, but way better than with images alone. You're curious how well the image input is propagating through the network and thereby contributing to the model prediction.
How can you leverage ideas from generative models (e.g., reconstruction) to try to answer that question? Furthermore, if you find that the image data is not propagating meaningfully through the network, can you try to force the image information to propagate further (e.g., by incorporating something into the loss)?
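One possible way to make the loss idea concrete, sketched under assumptions: attach a hypothetical decoder to the fused audio-image representation and add a weighted image-reconstruction term to the classification loss. The same reconstruction error, measured with a decoder trained on frozen intermediate features, could also serve as a probe of how much image information survives at a given layer. The function names and the `recon_weight` hyperparameter below are hypothetical, and the decoder itself is not shown; the sketch only computes the combined loss in plain Python.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of a single example, computed via a numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

def mse(x_hat, x):
    """Mean squared error between a reconstruction and the original (flattened) image."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)

def combined_loss(logits, label, image_recon, image, recon_weight=0.1):
    """Classification loss plus a weighted auxiliary image-reconstruction term.

    image_recon is assumed to come from a hypothetical decoder applied to the
    fused (audio + image) hidden representation; recon_weight is a hypothetical
    hyperparameter trading off classification against reconstruction.
    """
    return softmax_cross_entropy(logits, label) + recon_weight * mse(image_recon, image)

# Toy usage with made-up numbers: two classes, a four-pixel "image".
print(combined_loss([2.0, 0.5], 0, [0.1, 0.9, 0.4, 0.0], [0.0, 1.0, 0.5, 0.0]))
```

Because the reconstruction term can only be reduced if the fused representation retains image content, it pushes gradients back through the image branch; the right weight would have to be tuned empirically.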
One common problem encountered during GAN training is mode collapse. Explain what mode collapse is and why it occurs. (2-4 sentences)
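A toy diagnostic (hypothetical, not from the handout) can make mode collapse concrete: assume a known 1-D target distribution with three modes, assign each generated sample to its nearest mode, and count how many modes receive any samples. The `healthy` and `collapsed` sample sets are simulated stand-ins for generator outputs, not real GAN samples, and the mode centers and radius are arbitrary choices.

```python
import random

def modes_covered(samples, centers, radius):
    """Count how many target modes have at least one sample within `radius` of them."""
    hit = set()
    for x in samples:
        # Assign the sample to its nearest mode center.
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        if abs(x - centers[nearest]) <= radius:
            hit.add(nearest)
    return len(hit)

random.seed(0)
centers = [-4.0, 0.0, 4.0]  # hypothetical three-mode target

# Simulated "healthy" generator: samples spread across all three modes.
healthy = [random.gauss(random.choice(centers), 0.3) for _ in range(300)]
# Simulated "collapsed" generator: every sample lands near a single mode.
collapsed = [random.gauss(4.0, 0.3) for _ in range(300)]

print(modes_covered(healthy, centers, 1.0), modes_covered(collapsed, centers, 1.0))
```

Note that each collapsed sample can still look realistic on its own; the failure only shows up when you look at the diversity of the whole batch.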
While training your GAN, you notice that the discriminator rapidly converges to near-100% accuracy and your generator's loss does not improve over time. What kinds of things could be causing this? List a few possible contributing factors. (2-4 sentences)
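One factor often discussed for this symptom is generator gradient saturation. The sketch below assumes the discriminator's output on a fake sample is $D(G(z)) = \sigma(a)$ for a logit $a$, and compares the gradient with respect to $a$ of the minimax generator objective $\log(1 - D(G(z)))$ with that of the commonly used non-saturating alternative $-\log D(G(z))$. The logit values are arbitrary illustrations.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    """d/da log(1 - sigmoid(a)) = -sigmoid(a)  (minimax generator objective)."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """d/da [-log sigmoid(a)] = -(1 - sigmoid(a))  (non-saturating generator loss)."""
    return -(1.0 - sigmoid(a))

# Increasingly negative logits model a discriminator that is increasingly
# confident that the generated sample is fake.
for a in [0.0, -2.0, -5.0, -10.0]:
    print(f"D(G(z))={sigmoid(a):.5f}  "
          f"saturating grad={grad_saturating(a):.5f}  "
          f"non-saturating grad={grad_non_saturating(a):.5f}")
```

As $D(G(z))$ approaches 0, the first gradient shrinks toward 0 while the second stays close to $-1$, which is one reason a discriminator that wins too quickly can stall the generator.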