---
tags: hw6, conceptual
---

# HW6 Conceptual: Variational Autoencoders

**Due April 21st at 6PM**

Answer the following questions, showing your work where necessary. Please explain your answers and work.

:::info
We encourage the use of $\LaTeX$ to typeset your answers, as it makes it easier for you and us, though you are not required to do so.
:::

:::warning
Do **NOT** include your name anywhere within this submission. Points will be deducted if you do so.
:::

## Theme

![](https://external-preview.redd.it/kPm83LewI0Qc5ya69zCS3MyrHzJ_1GPqRAqzwDKlbpQ.jpg?auto=webp&s=20cd77aa61d6a873eb542c365dc299f68a390849)

*Hopefully you don't look like this right now.*

## Conceptual Questions

1. Show that, for some discrete distributions $P$ and $Q$, the difference between the KL-divergence and cross-entropy loss can be marginal. In what scenario could this happen?

   :::info
   **Hint:** Recall the following equations:
   $$D_\text{KL}(P\,\|\,Q)=\sum_{x}P(x)\log\left(\frac{P(x)}{Q(x)}\right)$$
   $$\text{CE}(P, Q) = -\sum_{x}P(x)\log Q(x)$$
   :::

2. Your friend has been developing a **multimodal model**, where they take in multiple types of input streams at the same time and reason about them in a unified architecture. Specifically, they try to figure out a per-frame classification based on both an image and a frame of audio, both of which are sampled from a large video corpus. Their network is performing about the same as it did with audio alone, but way better than with images alone. You're curious how well the image input is propagating through the network and thereby contributing to the model's predictions. How can you leverage concepts from generative models (i.e. reconstruction) to try to answer that question? Furthermore, if you find that the image data is not propagating meaningfully through the network, can you try to force the image information to propagate further (i.e. by incorporating something into the loss)?

3. One common problem encountered during GAN training is mode collapse.
Explain what mode collapse is and why it occurs. (2-4 sentences)

4. While training your GAN model, you notice that the discriminator rapidly converges to near-100% accuracy and your generator's accuracy doesn't improve over time. What kinds of things could be causing this? List a few possible contributing factors. (2-4 sentences)

## Ethical Implications

![](https://i.imgur.com/hmHNv5w.jpg)

Variational autoencoders are just one of many ways to implement generative AI models. Many generative models today are trained on millions of copyrighted works. This raises concerns about the legality of these models and how these models should be used. Read this [article](https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data) to learn more about the issues associated with deep learning models trained on copyright-protected works.

The article mentions the [fair use doctrine](https://www.copyright.gov/fair-use/), which provides some guidance on the usage of copyrighted works for training generative models. From the U.S. Copyright Office, “fair use is a legal doctrine that promotes freedom of expression by permitting the unlicensed use of copyright-protected works in certain circumstances.” Skim over the fair use doctrine, taking a close look at the four factors and their descriptions in section 107.

1. Summarize the considerations that should be taken into account when determining whether something is fair use in the context of generative models. Are these considerations sufficient for protecting copyright owners? Why or why not? (6-8 sentences)

In the question above, we asked you to think about the legal consequences of generative models. As generative models have rapidly improved, however, we also need to consider what progress in AI-generated art means on a more fundamentally human level.
Read this [blog post](https://www.theredhandfiles.com/chat-gpt-what-do-you-think/) by Nick Cave, an Australian musician and poet, written in response to a song generated by ChatGPT “in his style.” This is an artist’s visceral response to a work generated by a machine (maybe art, maybe not).

2. In your own words, summarize Cave’s argument. Do you agree or disagree with his assertion that machine learning models “cannot create a genuine song”? (4-6 sentences)

3. Cave’s argument focuses on the creation of art, as opposed to its consumption. How much do you think the process of creating art matters? If someone enjoys or relates to a piece of art, does it or should it matter where it came from? (4-6 sentences)

## CS2470-only Questions

1. For VAEs, why do we have to maximize the _lower bound_ of the log-likelihood? When would this be problematic?

2. One intriguing application of GANs is to "translate" existing images into their "artistic" counterparts. Similar features have even been launched in Photoshop (see [this video](https://www.youtube.com/watch?v=BzFY4pzb8cA&t=2427s)). Check out this [CycleGAN project page](https://junyanz.github.io/CycleGAN/), and answer: how do the authors manage to train conditioned GANs without paired images? (2-3 sentences)

3. Regular GANs suffer from the vanishing gradient problem. One attempt to alleviate this issue is the Wasserstein loss. What is the Wasserstein loss, and how is it different from the regular GAN loss function? Why does it not suffer as badly from vanishing gradients, and are there any special considerations that need to be made when implementing it (i.e. enforcing some constraint)? Feel free to read up on this method on your own to complement the lecture material.
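:::info
**Hint:** As a starting point for the last question, here is a minimal NumPy sketch of the shape of the Wasserstein critic and generator objectives, with the weight-clipping constraint from the original WGAN formulation. The random "critic scores" and the clipping threshold below are purely illustrative stand-ins, not values from this assignment; in a real WGAN the scores come from a trained critic network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in critic scores on real and generated samples.
# In a real WGAN, these are outputs of a critic network constrained
# to be (approximately) 1-Lipschitz.
real_scores = rng.normal(loc=1.0, size=1000)
fake_scores = rng.normal(loc=-1.0, size=1000)

# Critic objective: maximize E[D(real)] - E[D(fake)],
# i.e. minimize its negation. Note there is no log, unlike the
# standard (minimax / cross-entropy) GAN loss.
critic_loss = -(real_scores.mean() - fake_scores.mean())

# Generator objective: push the critic's scores on fakes upward.
gen_loss = -fake_scores.mean()

# Weight clipping on a hypothetical critic weight matrix: a crude way
# to enforce the Lipschitz constraint the Wasserstein loss requires.
W = rng.normal(size=(4, 4))
W_clipped = np.clip(W, -0.01, 0.01)
```

Thinking about why the clipping step (or an alternative such as a gradient penalty) is necessary, and why dropping the log changes the gradient behavior, should point you toward the answer.
:::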