# 113學年度實務專題

:::info
:bulb:Notes and Progress
:::

## 🎉 Welcome

:::success
Professor 王瑞堂
Assistant Professor 錦洋
:::

## :book: Study Note and Progress of The Week

### :small_orange_diamond: Week 1

<details>
<summary><span style="color:white">Summary</span></summary>

:small_blue_diamond: More Detail Study Note [Google Drive](https://drive.google.com/drive/folders/1hYGlJcfQULbn67g91Cwnl4CrQrRhd0zE?usp=sharing)

- **The Interconnected Relationship Between AI and 6G:**
    - Overview: This document explores how Artificial Intelligence (AI) and Sixth-Generation (6G) mobile communication technology will be interconnected.
    - Key Points: AI will be embedded into 6G networks, allowing self-optimizing, autonomous decision-making. 6G networks will enhance AI capabilities by providing ultra-fast speeds, real-time data transmission, and massive connectivity. AI-driven applications such as smart cities, autonomous vehicles, and immersive AR/VR will benefit from 6G's improved infrastructure.
- **The Relationship Between BIM and AI: A Transformative Development:**
    - Overview: This document discusses the impact of AI on Building Information Modeling (BIM) and how the two technologies are reshaping the construction industry.
    - Key Points: AI enhances BIM workflows by automating repetitive tasks and optimizing design choices. Smart construction and safety management are improved through AI's predictive analytics. AI-driven BIM tools help in cost estimation, project scheduling, and building performance analysis.
- **ChatGPT in Cybersecurity - Offensive and Defensive Applications:**
    - Overview: Examines how ChatGPT is being used in cybersecurity, both for security professionals and potential cyber threats.
    - Key Points: Defensive Applications: Assists in code analysis, vulnerability detection, threat intelligence, and automation of security tasks. Offensive Risks: Hackers may misuse ChatGPT for social engineering, generating malicious scripts, and bypassing security protections. Ethical concerns and OpenAI's measures to prevent malicious use.
- **Role of AI in 6G Networks:**
    - Overview: Discusses AI's role in the development and functionality of 6G.
    - Key Points: AI will be deeply integrated into 6G for network optimization, predictive maintenance, and resource management. Real-time AI applications such as holography, immersive VR, and smart transportation will be supported by 6G's high-speed connectivity. Edge computing and IoT will benefit from AI-driven automation.
- **Prompt Engineering in Generative AI:**
    - Overview: Covers strategies for improving AI-generated outputs through effective prompt engineering.
    - Key Points: Well-structured prompts enhance AI's ability to understand tasks. Methods include providing clear instructions, offering background context, and breaking tasks into smaller steps. Applications in AI-assisted writing, programming, and decision-making.
- **AI in Architecture and BIM Tools:**
    - Overview: Explores how AI is revolutionizing architecture and BIM tools.
    - Key Points: AI-powered generative design helps architects create optimized layouts. AI automates clash detection, cost estimation, and predictive maintenance in construction. AI-driven insights improve energy efficiency and sustainable design practices.
- **Using AI + Revit for Architectural Design, Estimation, and Proposal Creation:**
    - Overview: Discusses the integration of AI with Revit for enhancing architectural workflows.
    - Key Points: AI automates material selection, cost estimation, and proposal writing.
      Revit's BIM models can be analyzed and improved using AI-driven insights. Streamlines design processes and enhances collaboration among project teams.
- **6G Networks - A New Era of AI and Machine Learning:**
    - Overview: Examines how AI and ML will be integral to 6G networks.
    - Key Points: 6G will be 100x faster than 5G and support near-instant data transmission. AI will enable real-time network adjustments and predictive troubleshooting. Applications include autonomous vehicles, Industry 4.0, smart healthcare, and real-time holography.
- **6G, 5G Networks, and IoT - Features, Comparison, and Impacts:**
    - Overview: Compares 5G and 6G technologies and their effect on IoT.
    - Key Points: 5G offers speeds up to 10 Gbps and supports smart cities, healthcare, and industrial automation. 6G will exceed 100 Gbps, integrate AI for predictive networking, and enable real-time holographic communications. IoT devices will benefit from low-latency connectivity and AI-driven automation.
- **Revit + AI Integration with ChatGPT for Interior Design:**
    - Overview: Describes how Revit and ChatGPT can enhance interior design workflows.
    - Key Points: AI generates creative interior design concepts based on user prompts. Revit's modeling tools integrate with AI to automate design iterations. AI streamlines material selection and visualization for clients.
- **Application of BIM with AI, IoT, GIS, and Big Data:**
    - Overview: Discusses how BIM is enhanced by AI, IoT, GIS, and Big Data.
    - Key Points: AI optimizes project planning, cost estimation, and risk analysis. IoT enables smart sensors to monitor construction sites in real-time. GIS integrates geospatial data for urban planning and infrastructure management. Big Data analytics improves project efficiency and sustainability.

</details>

### :small_orange_diamond: Week 2

<details>
<summary><span style="color:white"> Summary</span></summary>

:small_blue_diamond: More Detail Study Note [Google Drive](https://drive.google.com/drive/folders/127VtxO9NJ0oWo7JM1gOIpT6CDODmqhvh?usp=sharing)

- **Weekly Report on LLM, GIS, and BIM Integration Study:**
    - Understanding the Need Before the Tools: LLM + GIS Integration: GIS struggles with unstructured text data, but LLMs can enhance analysis for urban planning, environmental monitoring, and crisis response. BIM for Carbon Analysis: BIM can track carbon emissions throughout a building's lifecycle, which is essential for carbon neutrality and ESG compliance. Why Combine LLM, GIS, and BIM? Together, they provide comprehensive data analysis, simplify insights, and automate ESG reports.
    - Main Problems Identified: Data Accessibility: GIS/BIM data is too complex for non-experts → Solution: Use LLMs to simplify and explain data. Carbon Lifecycle Estimation: Difficult to measure emissions at different stages → Solution: Use BIM for accurate carbon footprint tracking. ESG Compliance: Reporting is time-consuming → Solution: Automate ESG reporting with integrated data.
    - Strategy for Integration: API Integration: Develop APIs to allow GIS and LLM to communicate and share data (a minimal sketch of this idea follows after this list). Middleware Development: Create real-time data processing systems for automated updates. Software Integration: Develop GIS plugins and BIM tools to incorporate AI-powered insights.
    - Practical Applications: Urban Planning: LLM + GIS for better land use and transport analysis. Environmental Monitoring: GIS imagery + LLM for tracking climate changes. Carbon Lifecycle Analysis: BIM + AI for reducing emissions. ESG Reporting: Automate reports using combined data.
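
As a first, very rough illustration of the "API Integration" idea above, here is a minimal sketch. The `gis_summary` values and the `query_llm()` helper are hypothetical placeholders (not a real GIS or LLM API); the point is only the shape of the pipeline: structured GIS/BIM data in, plain-language ESG-style text out.

```
# Hypothetical sketch of the LLM + GIS -> ESG reporting idea (placeholder data, stub LLM call).

def query_llm(prompt: str) -> str:
    # Placeholder: a real pipeline would send this prompt to a local or hosted LLM.
    return f"[LLM draft would be generated here from a {len(prompt)}-character prompt]"

# Hypothetical GIS/BIM-derived indicators for one site
gis_summary = {
    "site": "Campus Block A",
    "green_area_ratio": 0.32,
    "avg_commute_km": 4.8,
    "estimated_co2_tons_per_year": 125,
}

prompt = (
    "Write a short ESG-style summary for a non-expert reader, "
    f"based on this GIS-derived data: {gis_summary}"
)

print(query_llm(prompt))
```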
</details> ### :small_orange_diamond: Week 3 <details> <summary><span style="color:white"> Summary</span></summary> :small_blue_diamond: More Detail Study Note [Google Drive](https://drive.google.com/drive/folders/1eY5sJWwSF6nFcZlFyar1CQBx-sT-mWOW?usp=sharing) - **Study Notes on LLaMA (Large Language Model Meta AI):** - Introduction : LLaMA (Large Language Model Meta AI) is a family of transformer-based NLP models developed by Meta. Used for text generation, translation, summarization, and AI research. - LLaMA Versions : LLaMA 1 (2023): Smaller models (7B, 13B, 33B, 65B) for research use. LLaMA 2 (2023): Open-sourced, improved dataset, better instruction-following. LLaMA 3 (Expected 2024): Anticipated improvements in efficiency and performance.) - Applications : Chatbots & Virtual Assistants for customer support. Content Generation for writing, blogs, and scripts. Code Assistance for debugging and completion. Translation & Summarization across multiple languages. Scientific Research for AI model advancements. - Technical Details : Trained on diverse datasets including internet data and research papers. Token Limit: Supports longer text contexts. Hardware Requirements: 7B Model → High-end consumer GPUs. 13B Model → Multiple GPUs or cloud services. 70B Model → Needs A100 GPUs or TPU clusters. - Conclusion : LLaMA is a powerful open-source alternative to proprietary AI models like GPT-4. It is optimized for efficiency, accessibility, and research applications and is expected to have a growing impact on NLP advancements. </details> <details> <summary><span style="color:white"> Testing</span></summary> - **First Install and Test LLaMA 3.2:** - Installed LLaMA 3.2 using Ollama on a Windows system. Verified the installation with the command `ollama list` - Executed LLaMA 3.2 via command `ollama run llama3.2` - AI Interaction Example User asked: "What is AI?" LLaMA 3.2 explained AI, its types (Narrow AI, General AI, Superintelligence), and its applications (virtual assistants, image recognition, self-driving cars, etc.). - ![Screenshot 2025-02-16 153429](https://hackmd.io/_uploads/rk93pYEi1l.png) - Conclusion The LLaMA 3.2 installation and test were successful. The model is functional, efficient for local AI tasks, and can provide detailed, structured responses for educational and research purposes. </details> ### :small_orange_diamond: Week 4 <details> <summary><span style="color:white"> Summary</span></summary> :small_blue_diamond: More Detail Study Note [Google Drive](https://drive.google.com/drive/folders/1g_nF6iIJpS1YQBRPNieGBXngFe0ctR9F?usp=sharing) - **Study Note on DeepSeek-V3 – Capabilities, Design, and Implications:** - Design and Functionality : DeepSeek-V3 is a large language model (LLM) built on a transformer-based architecture. Processes text by breaking it into tokens and predicting likely sequences. Uses attention mechanisms to weigh word importance in a sentence. - Training and Limitations : Trained on books, articles, and online data, but static (no real-time updates). Strengths: Language processing, summarization, translation, and coding assistance. Limitations: Lacks true understanding, struggles with deep reasoning and subjective topics. - Future Development : Expected improvements: Better reasoning, creativity, and emotional intelligence. Challenges: Bias, transparency, and explainability. Human-AI collaboration: AI assisting repetitive tasks, humans handling oversight. 
- Research Applications : AI can assist in summarizing literature, hypothesis generation, and data analysis. Researchers should cross-check AI outputs for accuracy. - **Study Note on LOD 500 – Applications, Benefits, and Challenges in BIM:** - What is LOD 500? Level of Development 500 (LOD 500) is the most detailed BIM (Building Information Modeling) level, representing as-built conditions. Used for facility management, operations, and maintenance. - Comparison with Other LODs : LOD 100: Conceptual design with approximate shapes and sizes. LOD 200: Generic systems and assemblies with approximate quantities. LOD 300: Precise geometry and specifications for construction. LOD 400: Fabrication and assembly details for specific components. LOD 500: As-built conditions with exact measurements and operational data. - Applications Facility Management: Optimizes maintenance, space use, and emergency planning. Infrastructure & Industrial Projects: Essential for bridges, tunnels, and power plants. Renovation & Retrofitting: Provides an accurate model for modifications. - Benefits Accuracy & Reduced Errors: Laser scanning and photogrammetry improve precision. Enhanced Collaboration: A shared model improves communication. Cost Savings: Early error detection prevents expensive repairs. - **Study Note on Integrating DeepSeek AI with LOD 500 in BIM:** - Overview AI integration with LOD 500 aims to automate workflows, improve accuracy, and enhance analytics in BIM. DeepSeek AI can process and optimize large BIM datasets for facility management. - Technical Implementation AI Training: Uses supervised learning and deep learning (CNNs) for object recognition. Automated Model Creation: AI classifies building elements and generates retrofit designs. Data Standards: Integration with ISO 19650 and Autodesk Revit for compatibility. - Challenges Data Complexity: Requires cloud computing and AI-powered data cleaning. Cost Barriers: AI integration can be expensive; phased adoption is suggested. Ethical Concerns: AI bias and transparency must be addressed. </details> <details> <summary><span style="color:white"> Testing</span></summary> :small_blue_diamond: More Detail Study Note [Google Drive](https://drive.google.com/drive/folders/1Fzn3HPT_Pg1Iaj0ZyjDjUr9iF8SNiTNK?usp=sharing) - **First Try : Image Generation:** - Since the Deepseek AI model is a text-to-text-based model, I thought of another way to generate image outputs from text inputs. First, I used Stable Diffusion AI to help me generate images. I created an account on Stable Diffusion AI to obtain an API. Then, I wrote some code in VS Code with the help of Deepseek itself. To execute the code I had written, I used Windows Command Prompt and set up a Python folder. The generated images were then saved directly in the same folder as the Python files. ![Screenshot 2025-03-03 213205](https://hackmd.io/_uploads/HJw-Qq4okg.png) ![Screenshot 2025-03-03 213252](https://hackmd.io/_uploads/SJbGQcNo1g.png) ![Screenshot 2025-03-03 212931](https://hackmd.io/_uploads/rysNQ94jJe.png) ![Screenshot 2025-03-04 225208](https://hackmd.io/_uploads/Hkk0Qq4iyx.png) - My plan moving forward is to integrate Deepseek R1, which I have downloaded locally from Olama. I will use the Deepseek R1 AI model to generate more detailed, precise, and clear prompts that Stable Diffusion AI can easily understand, ensuring that the output aligns with my expectations. 
- Additionally, I plan to upload image files to Stable Diffusion AI, allowing the DeepSeek AI model to generate corresponding descriptive prompts based on the given input. However, for now, I will focus primarily on generating images.

</details>

<details>
<summary><span style="color:white">Comment and Question</span></summary>

I would like to ask if what I am doing aligns with what the professor expects. You are welcome to leave comments, suggestions, or recommendations by contacting me via Google Chat. I would greatly appreciate your feedback. Thank you very much!

</details>

### :small_orange_diamond: Week 5

<details>
<summary><span style="color:white">Comment</span></summary>

I tried to load the DeepSeek R1 model but ran into issues during loading and testing.

</details>

<details>
<summary><span style="color:white">Testing</span></summary>

https://colab.research.google.com/drive/1SZ7fLnu1po3sEDIeaC6t-XdvZmaBIgpF?usp=sharing

</details>

<details>
<summary><span style="color:white">Testing</span></summary>

https://colab.research.google.com/drive/1Rkl0X0SM4yYsXuVvp62C8HF6K7hz3M_3?usp=sharing

</details>

### :small_orange_diamond: Week 6

<details>
<summary><span style="color:white">Comment</span></summary>

I found a way to load and use stable-dreamfusion. I tried stable-dreamfusion both on Google Colab and locally, but in both cases the model failed to load and run. So I tried another way to produce a 3D model from a prompt: Lumalabs.AI Genie, which I learned about from a link someone shared on Discord. And it works.

</details>

<details>
<summary><span style="color:white">stable-dreamfusion Testing</span></summary>

This is the GitHub link (which also includes the Google Colab link) for stable-dreamfusion.
https://github.com/ashawkey/stable-dreamfusion?tab=readme-ov-file
![Screenshot 2025-03-20 153909](https://hackmd.io/_uploads/BJQKvrY2yl.png)

</details>

<details>
<summary><span style="color:white">stable-dreamfusion Testing Locally</span></summary>

![Screenshot 2025-03-20 153400](https://hackmd.io/_uploads/HJFiBHtnJl.png)
![Screenshot 2025-03-20 153456](https://hackmd.io/_uploads/rkI2rHYh1l.png)
![Screenshot 2025-03-20 151702](https://hackmd.io/_uploads/S1mTHHK31x.png)
![Screenshot 2025-03-20 151724](https://hackmd.io/_uploads/HkRaSSFhkl.png)

</details>

<details>
<summary><span style="color:white">Meshgen Testing</span></summary>

I tried another AI called MeshGen, which takes a prompt as input and outputs a 3D file that can be opened in Blender. This AI actually runs inside Blender, but I cannot run it because it says my VRAM isn't enough.
https://github.com/huggingface/meshgen?tab=readme-ov-file
To run this AI, I need to download the model to my disk first and then install it as an add-on in Blender. Once the model is loaded, I can use it by pressing N, writing the prompt, and clicking Generate.
![image](https://hackmd.io/_uploads/HJtKzHthyx.png)
![image](https://hackmd.io/_uploads/rkc-XSF2kl.png)

</details>

<details>
<summary><span style="color:white">Testing</span></summary>

I tried another AI, Lumalabs.AI Genie, which takes a prompt as input and outputs a 3D file that can be opened in Blender.
![image](https://hackmd.io/_uploads/S10aZrYnJx.png)
![image](https://hackmd.io/_uploads/B1g-GSFhJl.png)

</details>

### :small_orange_diamond: Week 7

<details>
<summary><span style="color:white">SHAP-E Testing</span></summary>

I tried to use the SHAP-E model to generate a 3D object from a prompt. Note that SHAP-E can do both prompt-to-3D and image-to-3D.
I tried running it in Google Colab with different prompts. The left side is the example from GitHub with the prompt 'a shark'. The middle one is my own prompt, 'a wooden door'. And the right side uses the more complete prompt from LLaMA: 'Rustic wooden door with a polished metal knob and a subtle love-themed design etched into the wood.'
https://colab.research.google.com/drive/1zH0bvzkr-VGYLEmlNdbIFUa1sZ03lzPP?usp=sharing
![Screenshot 2025-03-26 022007](https://hackmd.io/_uploads/rkca4dg61g.png)
![Screenshot 2025-03-26 022030](https://hackmd.io/_uploads/ry4CNuxpJe.png)
![Screenshot 2025-03-26 002638](https://hackmd.io/_uploads/HyM1HOeakl.png)
The second test was loading the SHAP-E model on my laptop locally, and it works, with the prompt 'a house with rock texture'. But the output is far from good.
![Screenshot 2025-03-26 023033](https://hackmd.io/_uploads/BJ99DueTJg.png)
![Screenshot 2025-03-26 015946](https://hackmd.io/_uploads/rJ-28_xp1e.png)

</details>

<details>
<summary><span style="color:white">My Comment Related to SHAP-E</span></summary>

🤖 Is SHAP-E "not smart enough"? Short answer: yes and no.

❌ Weaknesses:
- SHAP-E was trained on a limited 3D dataset (like Objaverse-lite)
- Its understanding of real-world structure is shallow — it doesn't know deeply what a "chair" should look like
- It is also trained to produce implicit fields, which are compact but harder to control or refine
- Text prompts are not always interpreted correctly (e.g., "a house with rock texture" = ???)

That's why results often look blobby, melted, or off-topic.

🧠 So is using another AI like LLaMA to improve prompts useless? Not entirely — but it is somewhat wasted on SHAP-E alone. LLaMA or GPT could refine a prompt like:
"a house with rock texture" → "a medieval stone cottage with pointed roof and small windows"
But if SHAP-E never learned what that looks like, a better prompt doesn't help much.

💡 In short: throwing better words at a model that doesn't understand them well won't fix much.

</details>

<details>
<summary><span style="color:white">Update About Stable-Dreamfusion, mesh error</span></summary>

I succeeded in loading the AI model locally and running it. The first attempt to train and run the model took me just a few minutes, but in the end I realized it had errored. I gave the prompt "a high quality, detailed photo of a house with rock texture in its surface", but the output was not as expected.
![Screenshot 2025-03-27 002321](https://hackmd.io/_uploads/SyShpiZ6Je.png)
![Screenshot 2025-03-27 002355](https://hackmd.io/_uploads/SJ3b0j-Tkg.png)
![Screenshot 2025-03-27 002454](https://hackmd.io/_uploads/HkJQCsZpke.png)
The second attempt to train and run the model took a very long time, about 9 hours to finish. I gave the prompt "a high quality, detailed photo of a halloween pumpkin".
![Screenshot 2025-03-27 002614](https://hackmd.io/_uploads/rk7fyn-pyg.png)
![Screenshot 2025-03-27 002724](https://hackmd.io/_uploads/Skemk2Wakx.png)
![Screenshot 2025-03-27 002815](https://hackmd.io/_uploads/ry2NJnbTkl.png)
![Screenshot 2025-03-27 002826](https://hackmd.io/_uploads/SkuHkhba1l.png)
![Screenshot 2025-03-27 002839](https://hackmd.io/_uploads/rJ8v12Wakx.png)
![Screenshot 2025-03-27 002851](https://hackmd.io/_uploads/r1Q_k2Zpyx.png)
![Screenshot 2025-03-27 002905](https://hackmd.io/_uploads/Bk7Y1nZT1x.png)
But in the end, the mesh file could not be exported. I tried debugging all night and I still couldn't figure out why.
![Screenshot 2025-03-27 003018](https://hackmd.io/_uploads/BySzenZakx.png)

</details>

### :small_orange_diamond: Week 8

<details>
<summary><span style="color:white">Update About SHAP-E Resolution test (Local)</span></summary>

Over the last 5 days I tried to figure out how to improve the quality of the 3D objects generated by SHAP-E. The image below is from tests with various options, such as:
1. Guidance Scale
2. Karras Steps (Sampling Steps)
3. Sigma Range for Sampling
4. Reduce Blurriness with Higher s_churn

![Screenshot 2025-04-02 233511](https://hackmd.io/_uploads/BJhrKR5TJx.png)

I gave the prompt 'A wooden window'. The first object uses the original settings (shown in the photo below). For the second to fourth objects, I increased the Guidance Scale, Karras Steps, Sigma Range, and the blurriness-reduction setting (s_churn) gradually. For the fifth object, I changed the prompt to 'pyramid', because I wanted to test whether this AI does better with solid objects. (Objects one to four have many holes in their surfaces, unlike the fifth object, the pyramid.)

![Screenshot 2025-04-02 201532](https://hackmd.io/_uploads/SJcVqRq61l.png)

After that I compared the karras steps between the objects I generated. The prompt is 'a detailed chair'.
1st object: karras steps=400 (3 min run)
2nd object: karras steps=600 (8 min run)
3rd object: karras steps=2000 (15 min run)
4th object: karras steps=4000 (32 min run)
5th object: karras steps=6400 (50 min run)
In the end, the most suitable karras steps value is 2000, with the 15-minute run (object 3).

![Screenshot 2025-04-02 234749](https://hackmd.io/_uploads/Hy4rh0qTyl.png)

This is an image of the settings I used to generate object 3.

![Screenshot 2025-04-02 235417](https://hackmd.io/_uploads/rJ03605akl.png)

</details>

<details>
<summary><span style="color:white">How to Run SHAP-E Locally</span></summary>

This is, step by step, how I load and run the SHAP-E model on my local machine.

```
conda create -n shap-e python=3.10 -y
conda activate shap-e
git clone https://github.com/openai/shap-e.git
cd shap-e
pip install -e .
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
notepad generate_mesh.py   # (look at the below code)
pip install pyyaml
pip install ipywidgets
python generate_mesh.py
```

This is the code for generate_mesh.py

```
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, decode_latent_mesh

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Load models
print("Loading models...")
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

# Generate shape from text
prompt = "A detailed chair"
print(f"Generating: {prompt}")

latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=50.0,
    model_kwargs=dict(texts=[prompt]),
    progress=True,
    clip_denoised=True,
    use_fp16=torch.cuda.is_available(),
    use_karras=True,
    karras_steps=2000,
    sigma_min=5e-4,
    sigma_max=80,
    s_churn=0,
)

# Save mesh
print("Saving .obj file...")
for i, latent in enumerate(latents):
    mesh = decode_latent_mesh(xm, latent).tri_mesh()
    with open(f'shap-e_output_{i}.obj', 'w') as f:
        mesh.write_obj(f)

print("Done!")
```

If you need more information, you can contact me via Line or Google Chat. Special thanks.
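
Before running `generate_mesh.py`, it can help to confirm that PyTorch actually sees the GPU. The small sketch below is only a diagnostic I would run first (it uses standard PyTorch calls, nothing SHAP-E-specific); if it prints `False`, sampling will fall back to the CPU and take far longer.

```
# Quick sanity check: does PyTorch see a CUDA GPU in this environment?
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("CUDA build used by PyTorch:", torch.version.cuda)
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"Total VRAM: {vram_gb:.1f} GB")
else:
    print("No GPU detected: SHAP-E will fall back to the CPU and run much slower.")
```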
</details>

### :small_orange_diamond: Week 9

<details>
<summary><span style="color:white">Problem SHAP-E Locally</span></summary>

There are some questions and problems that I cannot answer. In the previous weeks, when I generated 3D objects, it only took seconds or minutes, never longer than 1 hour. I mentioned last week the karras steps and render times:
karras steps=400 (3 min run)
karras steps=600 (8 min run)
karras steps=2000 (15 min run)
karras steps=4000 (32 min run)
karras steps=6400 (50 min run)
But this week when I tried to run again, karras steps=2000 needed 1 hour and 36 minutes to complete. At first I thought this was because I was rendering on the CPU instead of the GPU, but I confirmed that I am using the GPU to render. I searched for a solution online and didn't know what to do, so I downgraded the karras steps back to 64. That took only about 15 seconds last week, but it takes 3 minutes now.

</details>

<details>
<summary><span style="color:white">Try to Improve Quality (1) SHAP-E Locally</span></summary>

This week I tried to optimize and improve the SHAP-E generated objects. I planned to reduce the 'hole' texture in the 3D models, but unfortunately I couldn't manage it after adjusting the code.
![Screenshot 2025-04-23 214256](https://hackmd.io/_uploads/B1bzjK8Jxl.png)

</details>

<details>
<summary><span style="color:white">Try to Improve Quality (2) SHAP-E Locally</span></summary>

After many failed attempts, I looked for answers online and asked ChatGPT to help me deal with this problem. I changed my code to:

```
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

batch_size = 2
guidance_scale = 15.0
prompt = "a wooden chair"

latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

render_mode = 'nerf'  # you can change this to 'stf'
size = 64  # this is the size of the renders; higher values take longer to render.

cameras = create_pan_cameras(size, device)
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode)

# Example of saving the latents as meshes.
from shap_e.util.notebooks import decode_latent_mesh

for i, latent in enumerate(latents):
    t = decode_latent_mesh(xm, latent).tri_mesh()
    with open(f'example_mesh_{i}.ply', 'wb') as f:
        t.write_ply(f)
    with open(f'example_mesh_{i}.obj', 'w') as f:
        t.write_obj(f)
```

So the change is in the NeRF render mode and the cameras I use, and it started to work. But the required time also increased to almost 20 minutes per run: for example, generating 4 batches of images in one runtime took approximately 15 to 20 minutes. I also generated the door and chair with the same prompts I used in the previous weeks for comparison.
![Screenshot 2025-04-23 235050](https://hackmd.io/_uploads/H1J_hYUkeg.png)

</details>

### :small_orange_diamond: Week 10

<details>
<summary><span style="color:white">Blender API Script (error)</span></summary>

I wrote a script using the Blender Python API. At first it was not successful and had many errors: although it seemed fine and the script ran without an error, the OBJ file could not be read and could not be imported into Blender. At first I thought I needed to solve this with an add-on, so I searched the Blender add-on pages online (via Google), but I couldn't find the answer. This is the code:

```
import bpy
import os
import time

WATCH_FOLDER = r"C:\Users\Andrew Bartholomeo\shap-e"
last_imported = ""

def import_latest_obj():
    global last_imported
    files = [f for f in os.listdir(WATCH_FOLDER) if f.endswith(".obj")]
    if not files:
        return
    latest_file = max(files, key=lambda f: os.path.getmtime(os.path.join(WATCH_FOLDER, f)))
    if latest_file == last_imported:
        return
    full_path = os.path.join(WATCH_FOLDER, latest_file)
    bpy.ops.object.select_all(action='SELECT')
    bpy.ops.object.delete(use_global=False)
    # This operator is the legacy OBJ importer, which was removed in Blender 4.x,
    # so this call is most likely why the import failed here.
    bpy.ops.import_scene.obj(filepath=full_path)
    print(f"Imported: {latest_file}")
    last_imported = latest_file

def repeat_check():
    import_latest_obj()
    return 5.0

bpy.app.timers.register(repeat_check)
```

</details>

<details>
<summary><span style="color:white">Blender API Script (success)</span></summary>

I tried to edit the code to:

```
import bpy
import os
import time

WATCH_FOLDER = r"C:\Users\Andrew Bartholomeo\shap-e"
CHECK_INTERVAL = 2  # seconds

def get_latest_obj_file(folder):
    files = [f for f in os.listdir(folder) if f.endswith(".obj")]
    if not files:
        return None
    files = [os.path.join(folder, f) for f in files]
    return max(files, key=os.path.getctime)

def import_latest_obj():
    latest_obj = get_latest_obj_file(WATCH_FOLDER)
    if latest_obj:
        print(f"Importing {latest_obj}")
        bpy.ops.wm.obj_import(filepath=latest_obj)
    else:
        print("No OBJ file found.")

# Optional: run this only once
import_latest_obj()
```

This code runs successfully in the Blender Scripting workspace. But a new problem occurred: whenever I open the Blender app and start a new layout, I must paste the code into the Scripting workspace again, so the 3D object does not open 'automatically' in Blender. I tried saving the script via `Scripting tab > Text > Register` and saving the `.blend` file. When I open that same file, the script runs automatically and the 3D object can be seen directly in the Blender layout. But the script is saved only inside that specific .blend project file: if I open that saved project, the script runs automatically; if I open a different .blend file, the script won't be there, and I have to copy or retype it again. I want every new Blender project to include the script, so I saved the script as the startup default:
1. Open Blender
2. Paste and register the script
3. Go to File → Defaults → Save Startup File

Now, every new Blender project I create will include the script; in other words, the 3D object will 'automatically' open in Blender. But this script affects all new projects, so when I want to make a new 3D object I need to disable the script and delete it first.

</details>

<details>
<summary><span style="color:white">Blender API Add-on</span></summary>

I figured out that I need to make an add-on for my script that runs automatically when I want to import the 3D object.
The code for the add-on:

```
bl_info = {
    "name": "Auto Import Latest OBJ",
    "blender": (4, 0, 0),
    "category": "Import-Export",
    "author": "Andrew Bartholomeo",
    "version": (1, 0),
    "description": "Automatically imports the latest OBJ file from the shap-e output folder",
}

import bpy
import os

WATCH_FOLDER = r"C:\Users\Andrew Bartholomeo\shap-e"

def get_latest_obj_file(folder):
    files = [f for f in os.listdir(folder) if f.endswith(".obj")]
    if not files:
        return None
    files = [os.path.join(folder, f) for f in files]
    return max(files, key=os.path.getctime)

def import_latest_obj():
    latest_obj = get_latest_obj_file(WATCH_FOLDER)
    if latest_obj:
        print(f"Importing {latest_obj}")
        bpy.ops.wm.obj_import(filepath=latest_obj)
    else:
        print("No OBJ file found.")

class OBJECT_OT_import_latest_obj(bpy.types.Operator):
    bl_idname = "object.import_latest_obj"
    bl_label = "Import Latest OBJ"
    bl_description = "Imports the latest OBJ file from the shap-e output folder"
    bl_options = {'REGISTER', 'UNDO'}

    def execute(self, context):
        import_latest_obj()
        return {'FINISHED'}

class VIEW3D_PT_auto_import_panel(bpy.types.Panel):
    bl_label = "Auto Import OBJ"
    bl_idname = "VIEW3D_PT_auto_import_obj"
    bl_space_type = 'VIEW_3D'
    bl_region_type = 'UI'
    bl_category = 'Auto Import'

    def draw(self, context):
        layout = self.layout
        layout.operator("object.import_latest_obj", text="Import Latest OBJ")

def register():
    bpy.utils.register_class(OBJECT_OT_import_latest_obj)
    bpy.utils.register_class(VIEW3D_PT_auto_import_panel)

def unregister():
    bpy.utils.unregister_class(OBJECT_OT_import_latest_obj)
    bpy.utils.unregister_class(VIEW3D_PT_auto_import_panel)

if __name__ == "__main__":
    register()
```

Step by step, how to make the add-on:
1. Copy the full code above into a text file.
2. Save it as: `auto_import_obj.py`
3. In Blender: Go to Edit → Preferences → Add-ons → Install
4. Select `auto_import_obj.py`
5. Check the box to enable it
6. Press N in the 3D Viewport to open the sidebar.
7. Go to the "Auto Import" tab → click "Import Latest OBJ".

![Screenshot 2025-04-30 221501](https://hackmd.io/_uploads/HJytx3kxxe.png)
![Screenshot 2025-04-30 221635](https://hackmd.io/_uploads/rkk1-n1lgl.png)

</details>

### :small_orange_diamond: Week 11

<details>
<summary><span style="color:white">Fix GPU Usage Shap-e (error) 1</span></summary>

In the previous weeks I had the problem that generating a 3D model with SHAP-E took a long time. I think the main problem is that I was using the CPU instead of the GPU to render, so the process required a long time to finish. To fix this, I tried to install PyTorch3D in my Conda environment, but I ran into a problem: I couldn't install PyTorch3D there. The main issue was a mismatch between the CUDA version installed on my system and the version PyTorch3D was compiled against. Specifically, PyTorch was built with CUDA 11.8, but my system initially had CUDA 12.8, which caused a RuntimeError when trying to build PyTorch3D from source.

At first, I tried to install PyTorch3D directly from the GitHub repository using

```
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
```

However, this failed with various errors, including a missing build tool (`ninja`), a missing dependency (`CUB_HOME`), and eventually a critical WinError 2 indicating a system path problem. Building from source on Windows turned out to be much more complex than I expected. Then I followed the advice to avoid building from source and instead use a precompiled wheel for PyTorch3D.
I set up my environment carefully by making sure I had:
- Python 3.10.16 (64-bit)
- PyTorch 2.1.0 with CUDA 11.8
- CUDA 11.8 Toolkit installed

After configuring the environment, I tried installing PyTorch3D from Facebook's official prebuilt wheels using:

```
pip install pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py3.10_cu118_pyt210/download.html
```

Unfortunately, it still failed with the message: `Could not find a version that satisfies the requirement pytorch3d`. I double-checked my Python and PyTorch versions, and they matched the requirements. I also faced a warning about `NumPy 2.2.4` being incompatible with some modules, and was advised to downgrade to `NumPy 1.x` to avoid internal API errors. Eventually, despite trying multiple approaches, I still couldn't get PyTorch3D to install using the pip method. In the final step, I was advised to manually download the `.whl` file from the official link and install it locally, but I didn't proceed with that part due to time constraints.

</details>

<details>
<summary><span style="color:white">Fix GPU Usage Shap-e (error) 2</span></summary>

At first, I tried running `pip install pytorch3d`, but it failed with an error saying no matching distribution was found. I learned that PyTorch3D isn't published on PyPI for every platform, especially for Windows with certain CUDA versions. Then I tried using `conda install -c pytorch3d pytorch3d`, but that also didn't work — it couldn't find the package on the specified Conda channel.

After that, I was advised to clone the PyTorch3D GitHub repository and try installing it from source. I ran `pip install -r requirements.txt`, but that failed because the requirements.txt file didn't exist in the repository. So I manually installed the required dependencies like `fvcore`, `iopath`, `torchmetrics`, `opencv-python`, and `numpy`. Then I tried `python setup.py install`, but it threw a [WinError 2] saying that a file was missing. This suggested that either I wasn't in the correct directory or some build tools were missing. To avoid more issues, I followed a simpler suggestion: using a direct pip install from GitHub with this command:

```
pip install git+https://github.com/facebookresearch/pytorch3d.git
```

But it still didn't work, and I can't figure out why. So I tried editing the code instead: I wanted my code to force GPU usage and not fall back to the CPU. Originally, I was using this line in my code:

```
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```

But this would automatically use the CPU if CUDA wasn't available, which I didn't want. I was advised to change it to:

```
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available. Please run on a machine with a GPU.")

device = torch.device('cuda')
```

This way, the program stops immediately if no GPU is detected, which is exactly what I needed. I reran everything and it is still the same as the previous week. I think the code is already using the GPU, but because my GPU has only 6 GB of VRAM it takes a longer time to render.

</details>

### :small_orange_diamond: Week 12

<details>
<summary><span style="color:white">Update Generated OBJ</span></summary>

This week, I tried to fix my code for generating SHAP-E 3D OBJ files. In the previous weeks, every time I used SHAP-E to generate a 3D OBJ, there was only one output per batch. Of course, this can be set to however many outputs I want per batch. But the main issue is that every time this AI generated a 3D OBJ, the output always changed to the latest output.
This means the previous OBJ generated by SHAP-E always went missing, replaced by the latest OBJ that had just been generated. The script did not keep the previous file; it only saved the output generated at that moment. To fix this, I changed the `generate_mesh.py` code a bit and added a scheme that saves every SHAP-E output.

```
import re
from datetime import datetime

import torch

from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images
from shap_e.util.notebooks import gif_widget  # if you use it
from shap_e.util.notebooks import decode_latent_mesh

# — helper to make a filesystem-safe base name from your prompt —
def slugify(text: str, maxlen: int = 50) -> str:
    s = re.sub(r'[^a-zA-Z0-9\-]+', '_', text.lower()).strip('_')
    return s[:maxlen]

# — configuration —
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
batch_size = 1
guidance_scale = 15.0
prompt = "a sofa"

# — prep models & diffusion —
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

# — sample latents from your text prompt —
latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# — optional: render images as GIFs or stills —
render_mode = 'nerf'  # or 'stf'
size = 64
cameras = create_pan_cameras(size, device)
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode)
    # e.g. display them in-notebook or save frames as you like

# — now export meshes with unique filenames —
base_name = slugify(prompt)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

for i, latent in enumerate(latents):
    mesh = decode_latent_mesh(xm, latent).tri_mesh()

    # construct unique filenames
    ply_filename = f"{base_name}_{timestamp}_{i}.ply"
    obj_filename = f"{base_name}_{timestamp}_{i}.obj"

    # write out PLY and OBJ
    with open(ply_filename, "wb") as f_ply:
        mesh.write_ply(f_ply)
    with open(obj_filename, "w") as f_obj:
        mesh.write_obj(f_obj)

    print(f"Saved mesh {i}: {ply_filename}, {obj_filename}")
```

What I changed:
1. Imported `re` and `datetime` for filename processing.
2. Added a `slugify()` function to turn the input prompt into a safe base filename (lowercased, non-alphanumerics → `_`, truncated).
3. Added `timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")` so each run gets a unique stamp.
4. Rewrote the mesh-export loop to use:

    ```
    base_name = slugify(prompt)
    ply_filename = f"{base_name}_{timestamp}_{i}.ply"
    obj_filename = f"{base_name}_{timestamp}_{i}.obj"
    ```

    which guarantees no previous files are ever overwritten.
5. Kept the rest of the sampling/rendering logic exactly the same — just replaced the hard-coded `example_mesh_{i}.obj` with dynamic names based on the prompt and time.

Now every time I run the script, I will see files like:

```
a_wooden_square_table_with_small_detail_cactus_vase_on_top_20250519_173012_0.obj
a_wooden_square_table_with_small_detail_cactus_vase_on_top_20250519_173012_0.ply
```

and none of my old exports will get clobbered.
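
One optional tweak I have not applied yet: writing the exports directly into the folder that my Week 10 Blender add-on watches, so the add-on can find them even if I launch the script from a different directory. A minimal sketch (reusing the same `WATCH_FOLDER` path as the add-on; the example filename is only for illustration):

```
# Optional idea (not part of generate_mesh.py yet): build output paths inside the
# folder that the Blender add-on watches, instead of the current working directory.
import os

WATCH_FOLDER = r"C:\Users\Andrew Bartholomeo\shap-e"  # same path used by the add-on

def output_path(filename: str) -> str:
    # Join the watch folder with the unique name produced by slugify() + timestamp.
    return os.path.join(WATCH_FOLDER, filename)

# Usage would be: open(output_path(obj_filename), "w") instead of open(obj_filename, "w")
print(output_path("a_sofa_20250519_173012_0.obj"))
```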
</details>

<details>
<summary><span style="color:white">Update Blender Add-on</span></summary>

In the previous weeks, I wrote a Python script to make an add-on in Blender, and it works well: I can automatically import the OBJ file generated by SHAP-E by clicking `Import Latest OBJ` in the `Auto Import OBJ` add-on. This week I tried to improve the add-on so that it can not only import the latest OBJ but also let me pick which OBJ I want to import.

```
bl_info = {
    "name": "Auto Import OBJ (Custom Loader)",
    "blender": (4, 0, 0),
    "category": "Import-Export",
    "author": "Andrew Bartholomeo",
    "version": (1, 1),
    "description": "Shows a dropdown of OBJ files in a folder and imports them via a pure-Python loader",
}

import bpy
import os

# Set this to your shap-e output directory
WATCH_FOLDER = r"C:\Users\Andrew Bartholomeo\shap-e"

def scan_obj_files(self, context):
    items = []
    if os.path.isdir(WATCH_FOLDER):
        for fn in sorted(os.listdir(WATCH_FOLDER)):
            if fn.lower().endswith(".obj"):
                items.append((fn, fn, ""))
    if not items:
        items = [("NONE", "No OBJ files found", "")]
    return items

def load_obj_to_blender(filepath, name="Imported_OBJ"):
    verts = []
    faces = []
    with open(filepath, 'r') as f:
        for line in f:
            if line.startswith('v '):
                parts = line.strip().split()[1:4]
                verts.append((float(parts[0]), float(parts[1]), float(parts[2])))
            elif line.startswith('f '):
                idx = [int(p.split('/')[0]) - 1 for p in line.strip().split()[1:]]
                faces.append(idx)
    mesh = bpy.data.meshes.new(name)
    mesh.from_pydata(verts, [], faces)
    mesh.update()
    obj = bpy.data.objects.new(name, mesh)
    bpy.context.collection.objects.link(obj)
    bpy.context.view_layer.objects.active = obj
    obj.select_set(True)

class OBJECT_OT_import_selected_obj(bpy.types.Operator):
    bl_idname = "object.import_selected_obj"
    bl_label = "Import Selected OBJ"
    bl_description = "Import the selected OBJ file"
    bl_options = {'REGISTER', 'UNDO'}

    def execute(self, context):
        fn = context.scene.auto_import_obj_file
        if fn and fn != "NONE":
            path = os.path.join(WATCH_FOLDER, fn)
            self.report({'INFO'}, f"Importing {fn}")
            load_obj_to_blender(path, name=os.path.splitext(fn)[0])
        else:
            self.report({'WARNING'}, "No valid OBJ selected")
        return {'FINISHED'}

class VIEW3D_PT_auto_import_panel(bpy.types.Panel):
    bl_label = "Auto Import OBJ"
    bl_idname = "VIEW3D_PT_auto_import_obj"
    bl_space_type = 'VIEW_3D'
    bl_region_type = 'UI'
    bl_category = 'Auto Import'

    def draw(self, context):
        layout = self.layout
        scn = context.scene
        layout.prop(scn, "auto_import_obj_file", text="Choose OBJ")
        layout.operator("object.import_selected_obj", text="Import Selected OBJ")

classes = (
    OBJECT_OT_import_selected_obj,
    VIEW3D_PT_auto_import_panel,
)

def register():
    for cls in classes:
        bpy.utils.register_class(cls)
    bpy.types.Scene.auto_import_obj_file = bpy.props.EnumProperty(
        name="OBJ Files",
        description="Pick an OBJ file to import",
        items=scan_obj_files,
    )

def unregister():
    for cls in reversed(classes):
        bpy.utils.unregister_class(cls)
    del bpy.types.Scene.auto_import_obj_file

if __name__ == "__main__":
    register()
```

![Screenshot 2025-05-20 222321](https://hackmd.io/_uploads/ry9RWfcWxx.png)
![Screenshot 2025-05-20 222336](https://hackmd.io/_uploads/S1NyGG5Wlg.png)
![Screenshot 2025-05-20 222353](https://hackmd.io/_uploads/SkTyffqWxl.png)
![Screenshot 2025-05-20 222813](https://hackmd.io/_uploads/S1IgGMqWgl.png)

</details>

### :small_orange_diamond: Summer Vacation

<details>
<summary><span style="color:white">Try to build the img-to-3d</span></summary>

In this week of summer vacation, I tried to build and learn the
img-to-3D function. I used SHAP-E to generate a 3D OBJ from a 2D image. At first, the 3D object was not good enough, but I tuned the parameters, and the code below uses the best parameters I could find so that the 3D object looks good enough. The problem I have is the rendering time: it takes a long time (about 10 minutes) to generate the OBJ file. I am using CUDA and the GPU, but I still cannot find the answer to this problem.

```
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget
from shap_e.util.image_util import load_image

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

xm = load_model('transmitter', device=device)
model = load_model('image300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

batch_size = 1
guidance_scale = 3.0

image = load_image("C:/Users/Andrew Bartholomeo/shap-e/shap_e/examples/example_data/modern-cushion-chair.jpg")

latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(images=[image] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

render_mode = 'nerf'  # you can change this to 'stf' for mesh rendering
size = 64  # this is the size of the renders; higher values take longer to render.

cameras = create_pan_cameras(size, device)
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode)

from shap_e.util.notebooks import decode_latent_mesh

for i, latent in enumerate(latents):
    t = decode_latent_mesh(xm, latent).tri_mesh()
    with open(f'example_mesh_{i}.ply', 'wb') as f:
        t.write_ply(f)
    with open(f'example_mesh_{i}.obj', 'w') as f:
        t.write_obj(f)
```

![modern-cushion-chair](https://hackmd.io/_uploads/HJvN0Z4cgx.jpg)
![Screenshot 2025-09-02 142709](https://hackmd.io/_uploads/B1VI0bVclg.png)

</details>

<details>
<summary><span style="color:white">Text-to-3D with CMD Prompt Input</span></summary>

In this week of summer vacation, I tried to improve the text-to-3D code for efficiency. I made two changes to the code:
1. Remove the PLY export and keep only the OBJ.
2. Make the program ask for the prompt in the terminal (cmd) instead of hardcoding it.

Basically, the previous code generated both an OBJ file and a PLY file. I figured this might be part of the reason it takes so long to generate a 3D object, so in this update I removed the PLY export and kept only the OBJ export. The generation time decreased slightly (by about 2 minutes), but my target is under 3 minutes. Also, in the previous code, if I wanted to input a prompt I had to hardcode it, meaning I had to type the prompt in the code itself. Now I have changed it so the program asks for the prompt in the terminal instead of in the code.
```
import re
from datetime import datetime

import torch

from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images
from shap_e.util.notebooks import decode_latent_mesh

# — helper to make a filesystem-safe base name from your prompt —
def slugify(text: str, maxlen: int = 50) -> str:
    s = re.sub(r'[^a-zA-Z0-9\-]+', '_', text.lower()).strip('_')
    return s[:maxlen]

# — configuration —
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
batch_size = 1
guidance_scale = 15.0

# — ask user for prompt from CMD —
prompt = input("Please input the prompt: ")

# — prep models & diffusion —
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

# — sample latents from your text prompt —
latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# — optional: render images as GIFs or stills —
render_mode = 'nerf'  # or 'stf'
size = 64
cameras = create_pan_cameras(size, device)
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode)
    # you can display or save images if you want

# — export OBJ only —
base_name = slugify(prompt)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

for i, latent in enumerate(latents):
    mesh = decode_latent_mesh(xm, latent).tri_mesh()

    # construct unique filename
    obj_filename = f"{base_name}_{timestamp}_{i}.obj"

    # write out OBJ only
    with open(obj_filename, "w") as f_obj:
        mesh.write_obj(f_obj)

    print(f"Saved mesh {i}: {obj_filename}")
```

![Screenshot 2025-09-02 152103](https://hackmd.io/_uploads/HkmWjGN9el.png)

</details>

<details>
<summary><span style="color:white">Img-to-3D with CMD Prompt Input</span></summary>

In this week of summer vacation, I tried to improve the img-to-3D code for efficiency. I made two changes to the code:
1. Remove the PLY export and keep only the OBJ.
2. Make the program ask for the image path in the terminal (cmd) instead of hardcoding it.

Basically, the previous code generated both an OBJ file and a PLY file. I figured this might be part of the reason it takes so long to generate a 3D object, so in this update I removed the PLY export and kept only the OBJ export. The generation time decreased slightly (by about 2 minutes), but my target is under 3 minutes. Also, in the previous code, if I wanted to input an image I had to copy the image path from my local disk and hardcode it, meaning I had to type and paste the image path into the code itself. Now I have changed it so the program asks for the full path in the terminal instead of in the code.
``` import torch from shap_e.diffusion.sample import sample_latents from shap_e.diffusion.gaussian_diffusion import diffusion_from_config from shap_e.models.download import load_model, load_config from shap_e.util.notebooks import create_pan_cameras, decode_latent_images from shap_e.util.image_util import load_image from shap_e.util.notebooks import decode_latent_mesh from datetime import datetime import os # — device setup — device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') # — load models — xm = load_model('transmitter', device=device) model = load_model('image300M', device=device) diffusion = diffusion_from_config(load_config('diffusion')) batch_size = 1 guidance_scale = 3.0 # — ask for image path via CMD — image_path = input("Please input the full path to your image: ").strip().strip('"') # check if the file exists before continuing if not os.path.isfile(image_path): raise FileNotFoundError(f"Image not found: {image_path}") # load image image = load_image(image_path) # — generate latents — latents = sample_latents( batch_size=batch_size, model=model, diffusion=diffusion, guidance_scale=guidance_scale, model_kwargs=dict(images=[image] * batch_size), progress=True, clip_denoised=True, use_fp16=True, use_karras=True, karras_steps=64, sigma_min=1e-3, sigma_max=160, s_churn=0, ) # — render images (optional) — render_mode = 'nerf' # or 'stf' size = 64 cameras = create_pan_cameras(size, device) for i, latent in enumerate(latents): images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode) # — export meshes — timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") base_name = os.path.splitext(os.path.basename(image_path))[0] for i, latent in enumerate(latents): mesh = decode_latent_mesh(xm, latent).tri_mesh() obj_filename = f"{base_name}_{timestamp}_{i}.obj" with open(obj_filename, "w") as f_obj: mesh.write_obj(f_obj) print(f"Saved mesh {i}: {obj_filename}") ``` ![Screenshot 2025-09-02 203740](https://hackmd.io/_uploads/ByP7HDEqex.png) </details> <details> <summary><span style="color:white">Img-to-3D with CMD pop up file explorer Input</span></summary> In my updated version of the code, I changed how the input image is selected. Before, I had to hardcode the full image path directly into the script, which was inconvenient and error-prone. Now, I integrated Python’s Tkinter file dialog so that when I run the program, a window pops up and lets me click to choose an image instead of typing or pasting the path in CMD. To do this, I imported `Tk` and `filedialog`, added `Tk().withdraw()` to hide the empty Tkinter window, and then used `filedialog.askopenfilename()` with filters for image formats like PNG, JPG, JPEG, and BMP. This modification makes the workflow much easier and more user-friendly because I don’t need to worry about paths or quotes in CMD anymore; I can just select the picture directly. 
``` import torch from shap_e.diffusion.sample import sample_latents from shap_e.diffusion.gaussian_diffusion import diffusion_from_config from shap_e.models.download import load_model, load_config from shap_e.util.notebooks import create_pan_cameras, decode_latent_images from shap_e.util.image_util import load_image from shap_e.util.notebooks import decode_latent_mesh from datetime import datetime import os from tkinter import Tk, filedialog # — device setup — device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') # — load models — xm = load_model('transmitter', device=device) model = load_model('image300M', device=device) diffusion = diffusion_from_config(load_config('diffusion')) batch_size = 1 guidance_scale = 3.0 # — open a file picker for image selection — Tk().withdraw() # hide empty Tkinter root window image_path = filedialog.askopenfilename( title="Select an image to convert to 3D", filetypes=[("Image files", "*.png;*.jpg;*.jpeg;*.bmp")] ) if not image_path: raise ValueError("No image selected!") print(f"Selected image: {image_path}") # load image image = load_image(image_path) # — generate latents — latents = sample_latents( batch_size=batch_size, model=model, diffusion=diffusion, guidance_scale=guidance_scale, model_kwargs=dict(images=[image] * batch_size), progress=True, clip_denoised=True, use_fp16=True, use_karras=True, karras_steps=64, sigma_min=1e-3, sigma_max=160, s_churn=0, ) # — render images (optional preview) — render_mode = 'nerf' # or 'stf' size = 64 cameras = create_pan_cameras(size, device) for i, latent in enumerate(latents): images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode) # — export meshes as OBJ only — timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") base_name = os.path.splitext(os.path.basename(image_path))[0] for i, latent in enumerate(latents): mesh = decode_latent_mesh(xm, latent).tri_mesh() obj_filename = f"{base_name}_{timestamp}_{i}.obj" with open(obj_filename, "w") as f_obj: mesh.write_obj(f_obj) print(f"Saved mesh {i}: {obj_filename}") ``` ![Screenshot 2025-09-02 204138](https://hackmd.io/_uploads/rJPEyFV9ex.png) ![bed-queen-4G4ZyD0-600](https://hackmd.io/_uploads/r1evyKVclx.jpg) ![Screenshot 2025-09-02 222916](https://hackmd.io/_uploads/S1RByFNcgg.png) ![Screenshot 2025-09-02 223126](https://hackmd.io/_uploads/SkdAyY49gx.png) </details> ### :small_orange_diamond: Next Semester Progress <details> <summary><span style="color:white">Link To Next Semester Note and Project</span></summary> https://hackmd.io/@andrewbartholomeo/BkQcVC1ixg </details>