# 112-1 Intro to AI
## Homework 4
###### tags:`1121iai`

In this assignment, you'll create your own **Neural Radiance Field (NeRF)**, giving you an opportunity to explore the current advancements and applications of **3D rendering**. You can capture any scene that interests you and then edit the camera trajectory to produce an animation.

TA: 魏渤翰 (bohanwei@ntu.edu.tw)
Original idea provided by 聞浩凱

## Introduction

### Neural Radiance Fields (NeRF)
* A computer graphics technique for modeling 3D scenes from 2D images.
* Represents a scene as a continuous 3D function, capturing both geometry and appearance.
* Applications include 3D scene reconstruction, novel view synthesis, and virtual reality (VR).

#### Working Principle
1. NeRF models a scene as a function that takes a 3D spatial coordinate (plus a viewing direction) as input and predicts the color and opacity (sigma) at that point.
2. It approximates this function with a **neural network** (a multilayer perceptron).
3. The network's many parameters are fitted by training on a set of images of the scene taken from various viewpoints.

#### Training Process
* NeRF requires a set of images of a scene captured from different angles. Each image provides a view of the scene and corresponds to a 2D projection.
* The network is trained to minimize the difference between the predicted and observed colors of the scene. Training optimizes the network parameters so that they accurately represent the scene's radiance and geometry.
* Once trained, NeRF can synthesize images from novel viewpoints not present in the training set. It casts rays from the camera through each pixel of the target view and evaluates the network along these rays to compute color and opacity, yielding a rendered image. A minimal code sketch of this ray-marching procedure follows Figure 1.

![](https://hackmd.io/_uploads/HJ8t6U2T3.png)
<h4 class="text-center"> Figure 1. The training process of NeRF </h4>
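To make the rendering procedure above concrete, here is a minimal, self-contained PyTorch sketch. It is *not* Nerfstudio's implementation; the `TinyNeRF` class, the layer sizes, the frequency counts, and the sampling bounds are purely illustrative. A small MLP maps an encoded 3D point and viewing direction to a color and a density, and `render_ray` alpha-composites the sampled colors along one camera ray, which is exactly the quantity compared against the observed pixel color during training.

<pre><code>
# Illustrative sketch of the NeRF idea (not Nerfstudio's actual code):
# an MLP maps a 3D point and viewing direction to color and density,
# and colors are composited along each camera ray.
import torch
import torch.nn as nn


def positional_encoding(x, num_freqs=6):
    """Map coordinates to sin/cos features so the MLP can fit fine detail."""
    feats = [x]
    for i in range(num_freqs):
        feats.append(torch.sin((2.0 ** i) * x))
        feats.append(torch.cos((2.0 ** i) * x))
    return torch.cat(feats, dim=-1)


class TinyNeRF(nn.Module):
    """Toy radiance field: (x, y, z, view direction) -> (RGB color, density sigma)."""

    def __init__(self, num_freqs=6, hidden=128):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 2 * (3 + 3 * 2 * num_freqs)  # encoded position + encoded direction
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 density
        )

    def forward(self, points, dirs):
        h = torch.cat([positional_encoding(points, self.num_freqs),
                       positional_encoding(dirs, self.num_freqs)], dim=-1)
        out = self.mlp(h)
        rgb = torch.sigmoid(out[..., :3])   # colors in [0, 1]
        sigma = torch.relu(out[..., 3])     # non-negative density
        return rgb, sigma


def render_ray(model, origin, direction, near=0.0, far=4.0, num_samples=64):
    """Sample points along one ray and alpha-composite their predicted colors."""
    t = torch.linspace(near, far, num_samples)        # sample depths along the ray
    points = origin + t[:, None] * direction          # (num_samples, 3) sample positions
    dirs = direction.expand_as(points)
    rgb, sigma = model(points, dirs)

    delta = t[1:] - t[:-1]                            # distance between adjacent samples
    delta = torch.cat([delta, delta[-1:]])
    alpha = 1.0 - torch.exp(-sigma * delta)           # opacity of each segment
    trans = torch.cumprod(torch.cat([alpha.new_ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = trans * alpha                           # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)        # predicted pixel color


# Training minimizes the difference between rendered and observed pixel colors.
model = TinyNeRF()
origin = torch.zeros(3)
direction = torch.tensor([0.0, 0.0, 1.0])
observed = torch.tensor([0.8, 0.2, 0.2])              # dummy ground-truth pixel color
loss = ((render_ray(model, origin, direction) - observed) ** 2).mean()
loss.backward()
</code></pre>

Real implementations batch many rays per step and add further tricks such as hierarchical sampling with coarse and fine networks; Nerfstudio handles all of this for you, so this sketch is only for building intuition about what the training cells in the Colab notebook are doing.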
### Nerfstudio
* Nerfstudio provides a simple API that allows for a simplified end-to-end process of creating, training, and testing NeRFs. The library supports a more interpretable implementation of NeRFs by modularizing each component.
* In this assignment, we will use Nerfstudio to construct our own NeRF from the data we collect.

![](https://hackmd.io/_uploads/SymhbP3Th.png)
<h4 class="text-center"> Figure 2. Example of Nerfstudio </h4>

## Data Collection
This step strongly affects the quality of your training results. During the video recording stage, please move your camera slowly so that you capture enough imagery and relative positional information.

Please generate your training data using one of the methods described in the reference below; collecting data by recording a video (e.g., with COLMAP or KIRI Engine) is strongly recommended. A few smartphone-based capture methods are described here. If you use the COLMAP method to generate your data, you can earn an additional 10 points (optional).

[Reference method](https://docs.nerf.studio/quickstart/custom_dataset.html)

### KIRI Engine Capture
This method works for both Android and iPhone and does not require a LiDAR-supported device. However, the file download phase may be slow if you are using the NTU network.

1. Set up KIRI Engine
   ![](https://hackmd.io/_uploads/rJU8_a2Ma.png)
   1.1. Download the app.
   1.2. Turn on Developer Mode (a toggle can be found in the settings menu).
2. Prepare your own data
   ![](https://hackmd.io/_uploads/S1bnyjDXT.png)
   2.1. Navigate to the Captures window.
   2.2. Select the Dev. tab.
   2.3. Tap the + button to create a new capture.
   2.4. Choose Camera pose as the capture option.
   2.5. Start recording a video to generate training data. Please experiment with different video lengths, camera angles, and camera movement speeds to create the most comprehensive training data. Recording for at least 2 minutes is highly recommended.
3. Process data
   3.1. After processing is complete, export the scene. It will be sent to your email.
   3.2. Rename the .zip file to `custom_data.zip` and put the file into your Google Drive (*your_drive*/nerfstudio/).

### Polycam Capture
If you are an iPhone user and your iPhone has LiDAR, you can try this method to capture less blurry images.

1. Set up Polycam
   ![](https://hackmd.io/_uploads/By1EqpnMa.png)
   1.1. Download the app.
   1.2. Turn on Developer Mode.
2. Prepare your own data
   ![](https://hackmd.io/_uploads/HkOv5a3f6.png)
   2.1. Capture data in LiDAR or Room mode.
   2.2. Start recording a video to generate training data. Please experiment with different video lengths, camera angles, and camera movement speeds to create the most comprehensive training data. Recording for at least 2 minutes is highly recommended.
   2.3. Tap `Process` to process the data in the Polycam app.
   2.4. Navigate to the export pane and select `raw data` to export a .zip file.
3. Process data
   3.1. Put the .zip file into your Google Drive (*your_drive*/nerfstudio/).
   3.2. Convert the Polycam data into the nerfstudio format using the following command (in Colab cell 5):
<pre><code>
ns-process-data polycam --data {OUTPUT_FILE.zip} --output-dir {output directory}
</code></pre>

## Sample Code
https://colab.research.google.com/drive/1-FeJ51DhESSQZkYN1CqworfT90Zg1fVD?usp=sharing

If you are using Colab, please remember not to click "Run all." Instead, execute the cells one by one, in order, to avoid errors.

## Report Format

### File Structure
Please follow this format. If the format is wrong but the files are acceptable, 10 points will be deducted as a penalty.
<pre><code>
{student_id}_hw4             # in lowercase
├── {student_id}.mp4         # your trajectory video
└── {student_id}.pdf         # your text report

# Zip it and submit to NTU COOL -> {student_id}_hw4.zip
</code></pre>

1. Video (70%)
Please use your creativity to reconstruct your surroundings as a NeRF and render a trajectory video within this environment. The video should be 20 seconds long and include at least 20 rendering cameras. After training your model in Colab, edit the trajectory in the cell 7 viewer window. If the window does not display the model properly, stop the training in cell 8, then rerun cell 7 and cell 8. The output video should be 1920 $\times$ 1080 at 24 FPS and should use a high-quality NeRF model (without too many blurry sections, which will be one of the evaluation criteria), as demonstrated in the demo video.

{%youtube nGLw4EaZAfU %}

2. Text report (20%)
Please make sure to answer all questions in the example file, excluding the optional ones, and submit the PDF file. If the file name or format is incorrect, the report will be graded as 0.

[example_file](https://drive.google.com/file/d/1YvMmdjpFP3zDTlPeh_CAUiUI5jySMnaQ/view?usp=drive_link)

## Reference
1. Mildenhall et al., "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis," ECCV 2020.
2. https://docs.nerf.studio/

## Acknowledgement