HW6 Conceptual: Variational Autoencoders

Conceptual questions due Friday, April 19th, 2024 at 6:00 PM EST
Programming assignment due Friday, April 26th, 2024 at 6:00 PM EST

Answer the following questions, showing your work where necessary. Please explain your answers and work.

Please use LaTeX to typeset your answers, as it makes it easier for you and us.

Do NOT include your name anywhere within this submission. Points will be deducted if you do so.

Theme

When you pass a cow through a VAE

Conceptual Questions

Show that, for some discrete distributions $P$ and $Q$, the difference between the KL-Divergence and Cross-Entropy loss can be marginal. In what scenario could this happen?

Hint: Recall the following equations:
$D_{KL} (P \| Q) = \sum_{x} P (x) \log \left(\frac{P (x)}{Q (x)}\right)$

$CE (P, Q) = - \sum_{x} P (x) \log Q (x)$
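
To experiment with the two hint equations numerically, a minimal sketch along these lines may help. It uses only the Python standard library and the natural logarithm; the function names and the example distributions `p` and `q` are hypothetical, and it assumes $Q(x) > 0$ wherever $P(x) > 0$.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)), skipping terms with P(x) = 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def cross_entropy(p, q):
    """CE(P, Q) = -sum_x P(x) * log Q(x), skipping terms with P(x) = 0."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

# Hypothetical example distributions over three outcomes (illustration only).
p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]

kl = kl_divergence(p, q)
ce = cross_entropy(p, q)
print(f"KL = {kl:.4f}, CE = {ce:.4f}, CE - KL = {ce - kl:.4f}")
```

Trying different choices of `p` (for example, more or less peaked ones) shows how the gap between the two quantities changes, which is the behavior the question asks you to explain.
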
Your friend has been developing a multimodal model, where they take in multiple types of input streams at the same time and reason about them in a unified architecture. Specifically, they try to figure out a per-frame classification based on both an image and a frame of audio, both of which are sampled from a large video corpus.

Their network is performing about the same as it did with audio alone, but way better than with images alone. You're curious how well the image input is propagating through the network and thereby contributing to the model prediction.

How can you leverage content from generative models (i.e. reconstruction) to try to answer that question? Furthermore, if you find that the image data is not propagating meaningfully through the network, can you try to force the image information to propagate further (i.e. by incorporating something into the loss)?
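
One way to picture "incorporating something into the loss" is an auxiliary reconstruction term added to the classification loss. The sketch below is only an illustration under stated assumptions: the names `mse`, `total_loss`, and `recon_weight` are hypothetical, images are treated as flat lists of floats, and mean squared error is one possible choice of reconstruction loss, not necessarily the intended one.

```python
def mse(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def total_loss(classification_loss, image, reconstructed_image, recon_weight=0.5):
    """Classification loss plus a weighted image-reconstruction penalty.

    recon_weight is a hypothetical hyperparameter trading off the two terms.
    """
    return classification_loss + recon_weight * mse(image, reconstructed_image)

# Hypothetical values: a 2-pixel "image" and an imperfect reconstruction.
print(total_loss(1.0, [1.0, 2.0], [1.0, 4.0]))
```

A large reconstruction error measured from late-layer features would suggest that little image information survives to that depth.
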
One common problem encountered during GAN training is mode collapse. Explain what mode collapse is and why it occurs. (2-4 sentences)
While training your GAN model, you notice that the discriminator rapidly converges to near-100% accuracy and your generator accuracy doesn't improve over time. What kinds of things could be causing this? List a few possible contributing factors. (2-4 sentences)

CS2470-only Questions

For VAEs, why do we have to maximize the lower bound of the log-likelihood? When would this be problematic?
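
For reference, the lower bound in question is commonly written as follows. The notation here is an assumption and may differ from lecture: $q(z \mid x)$ is the encoder's approximate posterior, $p(x \mid z)$ is the decoder likelihood, and $p(z)$ is the prior over latents.

$\log p(x) \geq \mathbb{E}_{q(z \mid x)} [\log p(x \mid z)] - D_{KL} (q(z \mid x) \| p(z))$
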
One intriguing application of GANs is to "translate" existing images into their "artistic" counterparts. Similar features have even been launched in Photoshop (see this video). Check out this CycleGAN project page, and answer: how do the authors manage to train conditioned GANs without paired images? (2-3 sentences)
Regular GANs suffer from the vanishing gradient problem. One attempt to alleviate this issue is using Wasserstein loss. What is Wasserstein loss, and how is it different from the regular GAN loss function? Why does it not suffer as badly from vanishing gradients, and are there any special considerations that need to be made when implementing it (i.e. enforcing some constraint)? Feel free to read up on this method on your own to complement the lecture material.
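
As a starting point for your reading, here is a rough sketch of the loss terms in the original weight-clipping formulation of the Wasserstein GAN. It is plain Python over lists of critic scores rather than a real training loop; the function names and the clipping threshold `c` are hypothetical choices for illustration, and other variants (such as gradient penalty) enforce the constraint differently.

```python
def mean(values):
    return sum(values) / len(values)

def critic_loss(real_scores, fake_scores):
    """Critic minimizes mean(D(fake)) - mean(D(real)); scores are unbounded reals."""
    return mean(fake_scores) - mean(real_scores)

def generator_loss(fake_scores):
    """Generator minimizes -mean(D(fake)), i.e. tries to raise the critic's score on fakes."""
    return -mean(fake_scores)

def clip_weights(weights, c=0.01):
    """Clamp every critic weight into [-c, c] after each critic update."""
    return [max(-c, min(c, w)) for w in weights]

# Hypothetical critic outputs for a batch of two real and two generated samples.
print(critic_loss([1.0, 3.0], [0.0, 2.0]))
print(generator_loss([0.0, 2.0]))
print(clip_weights([0.5, -0.5, 0.005]))
```

Note that the critic outputs raw scores with no sigmoid and no log, which is one of the differences from the regular GAN loss that the question asks about.
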
