# 2023.10.13
## LLaVA v1.5 (2023.10.05)
* More capable than before
* Hallucination is milder, but still present
* Does our task, plain street-view description, really need a model this complex?
## BLIP & BLIP-2 & GIT
* Can be fine-tuned with nothing more than image-text pairs
* A simpler dataset

Sizes:

* BLIP
* BLIP-2: 10.2 GB
* GIT: 1.5 GB

[Fine-tune large image-captioning models using Hugging Face PEFT and int8 quantization](https://colab.research.google.com/drive/16XbIysCzgpAld7Kd9-xz-23VPWmqdWmW?usp=sharing#scrollTo=hLhbdBLNxBuF)
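
To get a feel for what int8 quantization could save, here is a rough back-of-the-envelope sketch. It only rescales the sizes noted above by bytes per parameter. Treating the noted sizes as fp32 checkpoints is an assumption (the notes do not say which precision they refer to), and the helper name `rescale_size_gb` is made up for illustration.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def rescale_size_gb(size_gb: float, src: str, dst: str) -> float:
    """Estimate weight size after converting from precision `src` to `dst`."""
    return size_gb * BYTES_PER_PARAM[dst] / BYTES_PER_PARAM[src]


# Sizes taken from the notes; ASSUMPTION: they are fp32 checkpoint sizes.
noted_sizes_gb = {"BLIP-2": 10.2, "GIT": 1.5}

for name, size in noted_sizes_gb.items():
    est = rescale_size_gb(size, "fp32", "int8")
    print(f"{name}: {size} GB (assumed fp32) -> ~{est:.2f} GB in int8")
```

This counts weights only. Activations, adapter parameters, and optimizer state during fine-tuning come on top, so the real memory footprint will be larger.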
## Building the Dataset
### Real-world dataset
* Mapillary Vistas dataset
* Caption annotation via Amazon Mechanical Turk (mTurk)
* Generate captions with GPT-4, then adjust them ourselves
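
Whichever caption source we pick, the result is a set of image-text pairs. One possible way to store them is sketched below: a `metadata.jsonl` file next to the images, with a `file_name` field per row, which is the layout the Hugging Face `datasets` `imagefolder` loader reads. The caption column name `text`, the function name `write_metadata`, and the sample file names and captions are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(dataset_dir, captions):
    """Write metadata.jsonl mapping image file names to captions.

    `captions` is a dict of {image file name: caption text}.
    Images whose caption is empty are skipped.
    """
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    out_path = dataset_dir / "metadata.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for file_name, text in sorted(captions.items()):
            text = text.strip()
            if not text:
                continue  # no caption yet, leave this image out
            row = {"file_name": file_name, "text": text}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return out_path


# Hypothetical sample data; real captions would come from mTurk or GPT-4.
demo_dir = tempfile.mkdtemp()
demo_path = write_metadata(
    demo_dir,
    {
        "street_0001.jpg": "A two-lane road lined with parked cars.",
        "street_0002.jpg": "",
    },
)
print(demo_path.read_text(encoding="utf-8"))
```

Keeping captions in a plain JSONL file also makes the manual adjustment pass easy to diff and review.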
### AI-generated images
## Detailed Description
1. **Location and General Setting:**
- "This street view image captures a bustling urban scene on a sunny afternoon in New York City. The street is lined with tall, glass-fronted buildings, and people are walking along the sidewalk."
2. **Landmarks and Architecture:**
- "In the foreground, you can see the iconic Flatiron Building, a triangular structure of ornate Beaux-Arts architecture, its terra cotta façade gleaming in the sunlight."
3. **People and Activities:**
- "Pedestrians, a mix of professionals in business attire and tourists in casual clothing, are crossing the street at a busy intersection. Some are waiting at a hot dog vendor's cart."
4. **Vehicles and Traffic:**
- "There's a steady flow of traffic on the road, with yellow taxis, black limousines, and several buses moving in both directions. The traffic signals are showing a green light for pedestrians to cross."
5. **Weather and Atmosphere:**
- "The weather is clear, with a bright blue sky overhead, and there are a few fluffy white clouds. Long shadows are cast by the buildings, indicating that it's late afternoon."
6. **Street Details:**
- "The street is paved with dark asphalt, and there are distinct crosswalks painted in white. A row of mature trees lines the sidewalk, providing some shade for pedestrians."
7. **Signage and Storefronts:**
- "Various storefronts line the street, including a coffee shop with a chalkboard menu and a bookstore with an eye-catching window display. Neon signs advertising businesses add a colorful glow to the scene."
8. **Emotional or Historical Context:**
- "This image captures the essence of the city that never sleeps, with a palpable sense of energy and movement. It's reminiscent of the classic street scenes captured by photographers during the mid-20th century."
9. **Focal Points:**
- "The image's central focus is on a street performer playing a saxophone on a corner, attracting a small crowd of onlookers. His hat is laid out for tips, and a vibrant collection of coins and bills can be seen."
10. **Accessibility Considerations:**
- "For accessibility, there are tactile paving blocks near the crosswalk, and the sidewalk is free of obstacles, providing a safe path for those with mobility challenges."
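
The ten aspects above could double as a checklist when asking a model (or an annotator) for a detailed caption. The sketch below just assembles them into a prompt string. The prompt wording and the name `build_caption_prompt` are hypothetical, not something we have tested against GPT-4.

```python
# The ten aspects from the list above, in the same order.
ASPECTS = [
    "Location and General Setting",
    "Landmarks and Architecture",
    "People and Activities",
    "Vehicles and Traffic",
    "Weather and Atmosphere",
    "Street Details",
    "Signage and Storefronts",
    "Emotional or Historical Context",
    "Focal Points",
    "Accessibility Considerations",
]


def build_caption_prompt(aspects=ASPECTS):
    """Build a numbered captioning prompt from a list of aspects."""
    # ASSUMPTION: this instruction wording is illustrative only.
    header = (
        "Describe the street view image in detail. "
        "Cover each aspect below in one or two sentences, "
        "and mention only what is visible in the image:"
    )
    numbered = [f"{i}. {aspect}" for i, aspect in enumerate(aspects, start=1)]
    return "\n".join([header] + numbered)


print(build_caption_prompt())
```

Asking for only what is visible is one attempt at limiting hallucinated details; whether it helps in practice would need checking on real images.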