# Reproduction of the HouseGAN Paper

## Introduction

The paper by Nelson Nauata et al. presents a novel graph-constrained generative adversarial network for a new house-layout generation problem: take an architectural constraint in the form of a bubble diagram, a graph giving the number and types of rooms together with their spatial adjacency, and produce a set of axis-aligned bounding boxes for the rooms. The paper also employs a convolutional message passing neural network (Conv-MPN), which differs from graph convolutional networks (GCNs). The authors argue that this architecture enables more effective higher-order reasoning for composing layouts and validating adjacency constraints.

We aim to reproduce the results shown in the paper using the existing code (with some changes as required to fit our dataset). First, we replicated the results on the LIFULL HOME'S dataset used in the paper; we then tried four modifications to see how the results changed:

1. Remove the CMP layers and the edge and room-type related features from the discriminator.
2. Replace the CMP layer with a CNN and remove the edge and room-type features, which are inputs required by CMP; once CMP is removed, this information is no longer needed.
3. Change sum pooling to average pooling in the CMP layer.
4. Change the number of neurons in particular networks.

## Original Model Structure

Understanding the original paper and architecture is crucial for reproduction. We first describe the dataset and then the HouseGAN architecture used in the paper.

### Dataset

The paper uses bubble diagrams and images of house layouts. These originate from the LIFULL HOME'S database, from which the authors extract 117,587 house layouts.

#### Bubble diagrams

The bubble diagrams are not raw data; they are derived by applying an algorithm to the layouts.
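For illustration, a bubble diagram of this kind can be encoded as one-hot node vectors plus an undirected edge list. This is a minimal sketch; the 10-entry room vocabulary below is a hypothetical stand-in (the paper uses 10 room types, but the exact labels and ordering here are assumed, not taken from the authors' preprocessing):

```python
import numpy as np

# Hypothetical 10-type room vocabulary (labels and order are assumptions).
ROOM_TYPES = ["living", "kitchen", "bedroom", "bathroom", "balcony",
              "closet", "corridor", "dining", "laundry", "entrance"]

def encode_bubble_diagram(rooms, adjacency):
    """Encode a bubble diagram as (n x 10) one-hot node vectors plus
    an undirected edge list of adjacent room pairs (i, j)."""
    nodes = np.zeros((len(rooms), len(ROOM_TYPES)))
    for i, room in enumerate(rooms):
        nodes[i, ROOM_TYPES.index(room)] = 1.0
    edges = list(adjacency)
    return nodes, edges

# A 3-room diagram: living room adjacent to kitchen and to bedroom.
nodes, edges = encode_bubble_diagram(
    ["living", "kitchen", "bedroom"], [(0, 1), (0, 2)])
# nodes has shape (3, 10); each row is the one-hot room type.
```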
Every room is a node in the bubble diagram, each carrying room-type information such as living room, kitchen, or bedroom. The authors provide preprocessed data in their GitHub repository. The figure below is an example of the bubble diagrams reconstructed by Nelson Nauata et al.

![](https://i.imgur.com/rbOy8Vc.png)

#### House layout

Rooms take the form of axis-aligned bounding boxes in these images. Two rooms are assumed to be connected if the Manhattan distance between their bounding boxes is less than 8 pixels.

![](https://i.imgur.com/LgwN5EA.png)

### GAN Structure

HouseGAN, like a normal GAN, has a generator and a discriminator. The crucial mechanism used in the paper is convolutional message passing (CMP), which allows the network to retain the representations the authors care about, such as room type and room-adjacency information. The following subsections explain HouseGAN and the CMP structure in detail.

#### Generator

The following flow chart gives a general idea of the generator.

* Step 1: The generator takes an n x 128 x 1 Gaussian noise vector and n x 10 x 1 one-hot encoded variables as input, where n is the total number of rooms in a batch. The 10 x 1 one-hot encoding represents the room type; there are 10 different room types.
* Step 2: The noise and room-type inputs are concatenated, forming a new input of dimension n x 138 x 1.
* Step 3: Apply a linear layer. The output size becomes n x 1024 x 1.
* Step 4: Reshape the input. The output has dimension n x 16 x 8 x 8.
* Step 5: For each room, extract both adjacent and non-adjacent room information and sum them separately, forming an adjacent feature (n x 16 x 8 x 8) and a non-adjacent feature (n x 16 x 8 x 8). Then concatenate these two features with the original input along the second (channel) dimension. The output size is n x 48 x 8 x 8. Feature extraction uses the edge information, which is given.
* Step 6: Pass through 3 CNN layers.
  Each layer uses LeakyReLU as its activation function. With the specified stride and padding, these 3 layers only compress the number of channels. The outputs of the CNNs are n x 32 x 8 x 8, n x 32 x 8 x 8, and n x 16 x 8 x 8. Steps 5 and 6 together are called the CMP layer.
* Step 7: Upsample the input and apply LeakyReLU (kernel size 4, stride 2, padding 1). The output has size n x 16 x 16 x 16.
* Step 8: Repeat steps 5 and 6, but this time the image has size 16 x 16 instead of 8 x 8. After passing through the CMP layer, the output has size n x 16 x 16 x 16.
* Step 9: Upsample the input and apply LeakyReLU (kernel size 4, stride 2, padding 1); the output becomes n x 16 x 32 x 32.
* Step 10: Pass through a decoder layer, a 3-layer CNN using LeakyReLU for the first two layers and tanh for the last (kernel size 3, stride 1, padding 1). The output sizes of the layers are n x 256 x 32 x 32, n x 128 x 32 x 32, and n x 1 x 32 x 32. The final outputs are the GAN-generated room masks.

```flow
st=>start: input: Noise + Nodes
op=>operation: Linear layer & Reshape
op2=>operation: CMP layer (Need edges as argument)
op3=>operation: Upsample layer
op4=>operation: CMP layer (Need edges as argument)
op5=>operation: Upsample layer
op6=>operation: Decoder layer
e=>end: Output
st->op->op2->op3->op4->op5->op6->e
```

#### Discriminator

The following flow chart gives a general idea of the discriminator.

* Step 1: The discriminator takes as input a mask of size n x 32 x 32 x 1, either from the output of the generator or from a real floorplan, where n is the total number of rooms in a batch.
* Step 2: A linear layer with 8,192 neurons is applied to the other input, the node vector of size 10 x 1. The output is reshaped from 8,192 to n x 32 x 32 x 8.
* Step 3: Concatenate the masks with the output of step 2. The output becomes n x 32 x 32 x 9.
* Step 4: Feed the output of step 3 into three convolutional layers.
  Because padding and stride are both 1 (padding = 1 here means padding both the width and height by 1 on each side), the output size is determined only by the spatial dimension of the input and the channel dimension of the final CNN, whose (input channels, output channels, kernel size, kernel size) = (16, 16, 3, 3). Therefore, the output is n x 32 x 32 x 16.
* Step 5: Feed the output of step 4 through two rounds of CMP and downsampling. The output size is n x 8 x 8 x 16.
* Step 6: Feed the output of step 5 into three CNN layers. Because each of the three CNN layers has stride 2 and padding 1, the spatial dimension is halved at each layer, i.e., 8 x 8, 4 x 4, 2 x 2, and finally 1 x 1. The final CNN has (input channels, output channels, kernel size, kernel size) = (128, 128, 3, 3). Therefore, the output size is n x 1 x 1 x 128.
* Step 7: One subtle but important point: all previous steps are conducted per room. In this step, rooms from the same graph are merged. The output of step 6 is n room features, each of dimension 1 x 1 x 128. The paper merges rooms belonging to the same graph with a sum pool. The output size is now m x 1 x 1 x 128, where m is the number of examples in a batch. This is then reshaped from m x 1 x 1 x 128 to m x 128.
* Final step: Feed the reshaped output of step 7 into a linear layer with one neuron. The output size is 1.
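The merging step above (per-room features pooled into per-graph features) can be sketched as follows. This is a minimal NumPy sketch; `graph_ids`, an array mapping each room to its floorplan, is assumed bookkeeping, not a name from the authors' code:

```python
import numpy as np

def sum_pool_rooms(features, graph_ids):
    """Merge n per-room feature vectors (n x 128) into m per-graph
    vectors (m x 128) by summing rooms that share a graph id."""
    m = int(graph_ids.max()) + 1
    pooled = np.zeros((m, features.shape[1]))
    np.add.at(pooled, graph_ids, features)  # scatter-add rooms into their graphs
    return pooled

# Five rooms from two floorplans: rooms 0-2 belong to graph 0, rooms 3-4 to graph 1.
feats = np.ones((5, 128))
graph_ids = np.array([0, 0, 0, 1, 1])
pooled = sum_pool_rooms(feats, graph_ids)
# pooled has shape (2, 128): graph 0 sums three rooms, graph 1 sums two.
```

`np.add.at` is used instead of plain fancy-indexed addition so that repeated graph ids accumulate correctly.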
```flow
st=>start: Input: Nodes
op=>operation: Linear layer & Reshape
op2=>operation: Concatenation with Masks
op3=>operation: 3 CNN layers
op4=>operation: CMP layers (Need edges as argument) & Downsample
op5=>operation: 3 CNN layers
op6=>operation: Sum pooling & Reshape
op7=>operation: Linear layer
e=>end: Output
st->op->op2->op3->op4->op5->op6->op7->e
```

### Convolutional Message Passing Neural Network (Conv-MPN / CMP)

Conv-MPN is a variant of a graph neural network (GNN) that learns to infer relationships between nodes by exchanging messages. Conv-MPN is specifically designed for cases where a node has an explicit spatial embedding, and makes two key changes to a standard message passing neural network (MPN): 1) the feature of a node is represented as a 3D volume, as in CNNs, instead of a 1D vector; and 2) convolutions encode messages instead of fully connected layers or matrix multiplications. This design allows Conv-MPN to exploit the spatial information associated with the nodes.

The Conv-MPN in this paper takes the standard MPN architecture and replaces: 1) the latent vector with a latent 3D volume for the feature representation; and 2) the fully connected layers (or matrix multiplications) with convolutions for the message encoding.

<u>Convolutional message passing</u>

The Conv-MPN module updates a graph of room-wise feature volumes via convolutional message passing: because a node feature spreads across a volume, simple pooling can keep all the information in a message without collisions. Instead of encoding a message for every pair of nodes, the paper pools features across all neighboring nodes to encode a message, followed by a CNN to update the feature volume.
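This message-passing step can be sketched in NumPy with the pooling function left as a parameter (useful later when we swap sum pooling for average pooling). This is a simplified sketch: the trailing 3-layer CNN of the real model is omitted, so the function returns the 48-channel concatenated volume that the CNN would consume:

```python
import numpy as np

def cmp_update(feats, edges, pool=np.sum):
    """One Conv-MPN gather step: for each room, concatenate its own 16 x 8 x 8
    feature volume with a feature pooled over adjacent rooms and a feature
    pooled over non-adjacent rooms. The real model then applies a 3-layer CNN
    mapping the 48 channels back to 16; that CNN is omitted here."""
    n = feats.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    for i, j in edges:
        adj[i, j] = adj[j, i] = True
    out = []
    for r in range(n):
        neigh = feats[adj[r]]                             # adjacent rooms
        non_neigh = feats[~adj[r] & (np.arange(n) != r)]  # non-adjacent rooms
        pooled_n = pool(neigh, axis=0) if len(neigh) else np.zeros_like(feats[r])
        pooled_nn = pool(non_neigh, axis=0) if len(non_neigh) else np.zeros_like(feats[r])
        out.append(np.concatenate([feats[r], pooled_n, pooled_nn], axis=0))
    return np.stack(out)

feats = np.random.default_rng(0).random((3, 16, 8, 8))  # n x 16 x 8 x 8 room volumes
out_sum = cmp_update(feats, [(0, 1)], pool=np.sum)
out_avg = cmp_update(feats, [(0, 1)], pool=np.mean)
# Both have shape (3, 48, 8, 8): 16 own + 16 adjacent + 16 non-adjacent channels.
```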
Conv-MPN updates the feature volume by:

1) concatenating a sum-pooled feature across rooms connected in the graph;
2) concatenating a sum-pooled feature across non-connected rooms;
3) applying a CNN:

$$
g^l_r \leftarrow \mathrm{CNN}\left[\, g^l_r \;;\; \mathop{\mathrm{Pool}}_{s \in N(r)} g^l_s \;;\; \mathop{\mathrm{Pool}}_{s \in \overline{N}(r)} g^l_s \,\right]
$$

## Experiments

We conducted five sets of experiments running on Google Cloud/Colab using up to four GPUs: the original model from the paper and four modified network structures. Each model was trained for 20 epochs due to computational limits (20 minutes per epoch on 4 GPUs, 45 minutes on 1 GPU).

### Adjust pooling method

The Conv-MPN module updates the graph by concatenating a sum-pooled feature across both connected and non-connected rooms. The CMP paper explains that, because a node feature spreads across a volume, simple pooling can keep all the information in a message without collisions. Our goal is to find out how much the result changes when sum pooling is replaced by average pooling. The features we derive are representations of individual rooms; when we concatenate adjacent and non-adjacent features, those features should also represent one room rather than the sum of features over all rooms. Therefore, we use average pooling to represent a single room's feature.

### Remove edge and room-type information from the discriminator

From the lecture notes, we learned that a GAN discriminator tries to discriminate between real and fake images. The HouseGAN discriminator takes three inputs, the room mask, the edges, and the room types, which add computational complexity. We would like to test whether the discriminator still learns without the additional information (edges and room types) and thereby lower the computational cost.
For the first design, the CMP and room-type related layers are removed from the discriminator; for the second, the CMP layer is replaced by a simple 3-layer CNN.

#### Remove CMP layers from the discriminator

The two CMP layers in the discriminator are removed, which greatly decreases the complexity. The input and output sizes are adjusted to fit the remaining layers.

#### Replace CMP layers with CNN layers in the discriminator

In the 2015 deep convolutional GAN paper [https://arxiv.org/abs/1511.06434], the authors showed that strided CNNs in the discriminator can give good results. Therefore, we would like to know whether replacing each CMP layer with three CNN layers yields similar results. The three CNN layers each have (C_in, C_out, kernel_size, p, s) = (16, 16, 3, 1, 1). Since CMP is removed, the edge and room-type information is not used.

### Reduce the number of neurons

While the paper goes in depth on which types of hidden layers are used in the network and how many neurons are present in each layer, it offers no motivation for these particular amounts. We therefore assume they were somewhat arbitrarily chosen. This made us wonder whether altering the number of neurons in the existing layers would still produce comparable results. In this variation, instead of adding a 32 x 32 x 8 tensor to the segmentation mask at the start of the discriminator, a 32 x 32 x 6 tensor is added. Another change is made during upsampling: instead of a feature of size 32 x 32 x 9 being upsampled to 32 x 32 x 16, a feature of size 32 x 32 x 7 is upsampled to 32 x 32 x 12. The number and types of layers remain the same; only the number of neurons in some layers is altered.

## Results

Five sets of results were generated and cross-compared by our team members for a subjective score.
Originally we decided to use the FID score, but that requires generating 50,000 fake images to compare against 5,000 real ones. Generating this many images shut down the Jupyter notebook connection to the virtual machine, and we did not find a solution.

### Pooling

CMP learns to infer relationships by exchanging messages during feature extraction. It represents the feature associated with a node as a feature volume and uses a CNN for message passing while retaining the standard message-passing architecture. Comparing the results, average pooling performs much the same as the original paper's sum pooling. We assume this is because average pooling smooths the image, so sharp features may not be identified, while sum pooling (which is proportional to mean pooling) measures the summed evidence of a pattern in a given region. In short, sum pooling is just a scaled version of mean pooling; the non-linear layers in the model will still lead to different results, but the difference is subtle, so the outcomes should not differ much. We can, however, see that with average pooling the proportional size of a single room is very out of scale compared to sum pooling; we argue this is because room-size information is not included in the CMP layer.

![](https://i.imgur.com/DnIKlij.png)

### Remove edge and room-type information from the discriminator

#### Remove CMP

The images generated with the CMP-free discriminator are shown in the figure below.

![](https://i.imgur.com/Ztgj3yQ.jpg)

We believe that CMP and room type provide useful information to the discriminator. Without it, the generated room maps lose the edge structure and all rooms stack together.

#### Replace CMP layers with CNN layers

The results look poor. This architecture mostly generates layouts of small size, and the relations between nodes are not expressed in most of the outputs.
This result further confirms our belief that the edge and node information is important.

![](https://i.imgur.com/pCUgv6I.png)

### Neuron reduction

The results are quite similar to the original. While the rooms are structured differently, there is no clear difference in quality between the original method and this variation. This indicates that the network functions similarly with fewer neurons, which implies some redundancy may be present in the original network. Whether the optimal number of neurons was used in the original experiment or in this one is hard to tell, but it does suggest there is merit in testing more neuron combinations to find out which perform optimally. Another interesting follow-up would be to find the minimum number of neurons that still produces visually similar results to the original.

![](https://i.imgur.com/vVF4Z5m.png)

### Overall comparison

The model with average pooling and the model with reduced neurons give results similar to the original model. For the models that remove edge and room-type information from the discriminator, the outputs are clearly worse than the original.
|Input graph |Original|Pooling |No CMP |CNN|Neuron reduction|
|--|--|--|--|--|--|
|![](https://i.imgur.com/sN4fslQ.png)|![](https://i.imgur.com/OJo6e9Q.png)|![](https://i.imgur.com/7rPcNWq.png)|![](https://i.imgur.com/61Mwunh.jpg)|![](https://i.imgur.com/jeDmYHC.png)|![](https://i.imgur.com/fS4Iklx.png)|
|![](https://i.imgur.com/o90Coia.png)|![](https://i.imgur.com/82pnBqD.png)|![](https://i.imgur.com/UHbsAzs.png)|![](https://i.imgur.com/iwnUdIK.jpg)|![](https://i.imgur.com/4EyVtfA.png)|![](https://i.imgur.com/rcVWoV8.png)|
|![](https://i.imgur.com/apoyeFR.png)|![](https://i.imgur.com/yGu4tFi.png)|![](https://i.imgur.com/qO8Mmrl.png)|![](https://i.imgur.com/r362h37.jpg)|![](https://i.imgur.com/8PQ44iU.png)|![](https://i.imgur.com/jKvQPlI.png)|
|![](https://i.imgur.com/qT3zETK.png)|![](https://i.imgur.com/twlsQeH.png)|![](https://i.imgur.com/jyghfzw.png)|![](https://i.imgur.com/aX6ggeA.jpg)|![](https://i.imgur.com/QETL8B6.png)|![](https://i.imgur.com/NTd6B4t.png)|

## Conclusion

Over the past couple of weeks we have taken an in-depth look at the paper and the provided code. We also studied every method and network used in the paper in order to fully understand how the GAN functions. We applied this knowledge to make small alterations to the network to test the importance of specific parts. With the alteration replacing sum pooling with average pooling, as well as the one reducing the number of neurons, we obtained results very similar to the baseline, meaning these specific factors were not critical to the results. Our other two alterations showed the importance of the edge and room-type features: the methods that left those features out did not generate adequate layouts. Ultimately, we are pleased with the variation in results across the different methods, considering the hardware limitations that allowed us neither to train the models extensively nor to compute an FID score.
