# 2023.10.13

## LLaVA v1.5 (2023.10.05)

* More powerful
* Hallucination is milder, but still present
* Does our task, plain street-view description, really need a model this complex?

## BLIP & BLIP-2 & GIT

* Only image-text pairs are needed for fine-tuning
* Simpler dataset

Sizes:

* BLIP
* BLIP-2: 10.2GB
* GIT: 1.5GB

[Fine-tune large image-captioning models using Hugging Face PEFT and int8 quantization](https://colab.research.google.com/drive/16XbIysCzgpAld7Kd9-xz-23VPWmqdWmW?usp=sharing#scrollTo=hLhbdBLNxBuF)

## Building the dataset

### Real-world dataset

* Mapillary Vistas dataset
* Caption annotation via mTurk
* Generate captions with GPT-4, then we refine them

### AI-generated images

## Detailed Description

1. **Location and General Setting:**
   - "This street view image captures a bustling urban scene on a sunny afternoon in New York City. The street is lined with tall, glass-fronted buildings, and people are walking along the sidewalk."
2. **Landmarks and Architecture:**
   - "In the foreground, you can see the iconic Flatiron Building, a triangular structure of ornate Beaux-Arts architecture, its terra cotta façade gleaming in the sunlight."
3. **People and Activities:**
   - "Pedestrians, a mix of professionals in business attire and tourists in casual clothing, are crossing the street at a busy intersection. Some are waiting at a hot dog vendor's cart."
4. **Vehicles and Traffic:**
   - "There's a steady flow of traffic on the road, with yellow taxis, black limousines, and several buses moving in both directions. The traffic signals are showing a green light for pedestrians to cross."
5. **Weather and Atmosphere:**
   - "The weather is clear, with a bright blue sky overhead, and there are a few fluffy white clouds. Long shadows are cast by the buildings, indicating that it's late afternoon."
6. **Street Details:**
   - "The street is paved with dark asphalt, and there are distinct crosswalks painted in white. A row of mature trees lines the sidewalk, providing some shade for pedestrians."
7. **Signage and Storefronts:**
   - "Various storefronts line the street, including a coffee shop with a chalkboard menu and a bookstore with an eye-catching window display. Neon signs advertising businesses add a colorful glow to the scene."
8. **Emotional or Historical Context:**
   - "This image captures the essence of the city that never sleeps, with a palpable sense of energy and movement. It's reminiscent of the classic street scenes captured by photographers during the mid-20th century."
9. **Focal Points:**
   - "The image's central focus is on a street performer playing a saxophone on a corner, attracting a small crowd of onlookers. His hat is laid out for tips, and a vibrant collection of coins and bills can be seen."
10. **Accessibility Considerations:**
    - "For accessibility, there are tactile paving blocks near the crosswalk, and the sidewalk is free of obstacles, providing a safe path for those with mobility challenges."
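
The ten categories above could double as a caption schema: a fixed prompt for whichever model or annotator writes the captions, plus a simple image-text record for the fine-tuning set. The sketch below is only one possible arrangement, not a settled pipeline. The prompt wording, the `CaptionPair` record, the `build_caption_prompt` and `to_jsonl` helpers, and the `images/0001.jpg` path are all hypothetical, and the actual captioning call (GPT-4 or mTurk) is deliberately left out.

```python
import json
from dataclasses import dataclass, asdict

# The ten caption categories from the "Detailed Description" list.
CATEGORIES = [
    "Location and General Setting",
    "Landmarks and Architecture",
    "People and Activities",
    "Vehicles and Traffic",
    "Weather and Atmosphere",
    "Street Details",
    "Signage and Storefronts",
    "Emotional or Historical Context",
    "Focal Points",
    "Accessibility Considerations",
]


def build_caption_prompt(categories=CATEGORIES):
    """Build a numbered prompt that asks for one description per category.

    The wording is an assumption; adjust it to the captioning source in use.
    """
    lines = ["Describe this street view image. Cover each aspect in order:"]
    lines += [f"{i}. {name}" for i, name in enumerate(categories, start=1)]
    return "\n".join(lines)


@dataclass
class CaptionPair:
    """One image-text pair (hypothetical record layout)."""
    image_path: str
    caption: str


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)


if __name__ == "__main__":
    print(build_caption_prompt())
    # Placeholder pair; a real caption would come from GPT-4 or an annotator.
    demo = [CaptionPair("images/0001.jpg", "A sunny street lined with trees.")]
    print(to_jsonl(demo))
```

Keeping the category list in one place means the prompt and any later per-category checks stay in sync, and JSON Lines keeps each image-text pair on its own line so the set can be inspected or filtered before fine-tuning.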