Entry page for AI / ML study notes

Notes on the Deeplearning.ai GenAI/LLM course series

GenAI

AI Agents

AI Agents in LangGraph

RAG

Building Multimodal Search and RAG


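The course is pitched at a very basic level.
From chapter 2 onward it is mostly a tutorial on the Weaviate vector database API.
The RAG coverage is shallow; the course suits beginners who want to learn how to operate Weaviate.
If you are unfamiliar with the basics of multimodality, chapter 1 is worth a look.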

Overview of Multimodality

Multimodal Embedding Models
Training Multimodal Models
- Start with specialist models
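  - Each data type has its own distinctive features, so a dedicated model is needed to capture them
  - A specialist model is one trained on a specific type of data (such as images, text, or audio)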
- Unify the specialist models
Unify the Models Using Contrastive Learning
- Process to train any embedding model
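  - Data preparation: collect multimodal data (such as images and their captions)
  - Embedding generation: use the specialist models to produce an embedding (vector representation) for each kind of data
  - Contrastive learning: in the shared embedding space, learn useful embeddings by contrasting positive pairs (an image and its caption) with negative pairs (an image and a caption that do not match)
- Unify multiple models
  - Shared embedding space: map the embeddings of every specialist model into the same vector space
  - Contrastive loss: use a contrastive loss function to maximize the similarity of positive pairs and minimize the similarity of negative pairs
- Create one vector space
  - A single unified embedding space in which data from different modalities can be compared and computed on directly
- Tune models by providing contrastive examples
  - Train on positive and negative pairs, adjusting the model parameters so that the embeddings of positive pairs move closer together and those of negative pairs move farther apart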
Understanding Contrastive Learning for Text
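- Before learning:
  - The figure shows the positions of the anchor (blue), the positive sample (green), and the negative sample (red)
  - In the initial state, the distances from the anchor to the positive and negative samples may not reflect their actual semantic similarity
- During learning:
  - Goal: adjust the embeddings so that the anchor moves closer to the positive sample and away from the negative sample
  - Mechanism: usually a contrastive loss function, which rewards a smaller anchor-positive distance and a larger anchor-negative distance
- After learning:
  - The anchor (blue) now sits closer to the positive sample (green) and farther from the negative sample (red)
  - The model has learned to tell semantically similar pairs from dissimilar ones, so it separates relevant and irrelevant examples better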
Contrastive Loss Function

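Encoder ($θ$):

Function: the encoder processes each image ($I^{a}$, $I^{+}$, and $I^{-}$) and produces the corresponding embeddings ($f^{a}$, $f^{+}$, and $f^{-}$).

Output:

$f^{a} = θ (I^{a})$

$f^{+} = θ (I^{+})$

$f^{-} = θ (I^{-})$

Distance Function ($δ$):

Purpose: measure how similar or dissimilar two embeddings are.

Distances computed:

$δ (f^{a}, f^{+})$: the distance between the anchor and the positive sample.

$δ (f^{a}, f^{-})$: the distance between the anchor and the negative sample.

Learning objective:

Minimize $δ (f^{a}, f^{+})$: shrink the distance between the anchor and positive embeddings so that they become more similar.

Maximize $δ (f^{a}, f^{-})$: grow the distance between the anchor and negative embeddings so that they become less similar.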
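
To make the objective above concrete, here is a minimal NumPy sketch. It assumes Euclidean distance for $δ$ and a margin-based triplet loss; the notes do not fix either choice, and the 2-d embedding values are invented purely for illustration.

```python
import numpy as np

def euclidean(u, v):
    """delta(u, v): Euclidean distance (an assumed choice of distance function)."""
    return float(np.linalg.norm(u - v))

def triplet_loss(f_a, f_pos, f_neg, margin=1.0):
    """Hinge-style triplet loss (assumed form): zero once the negative is
    at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(f_a, f_pos) - euclidean(f_a, f_neg) + margin)

# Hypothetical embeddings standing in for theta(I^a), theta(I^+), theta(I^-).
f_a = np.array([0.0, 0.0])
f_pos = np.array([3.0, 4.0])   # distance 5 from the anchor
f_neg = np.array([1.0, 0.0])   # distance 1 from the anchor

# Before training the positive is farther away than the negative: 5 - 1 + 1 = 5.
loss_before = triplet_loss(f_a, f_pos, f_neg)

# Idealized result of training: positive close to the anchor, negative far away.
f_pos_trained = np.array([0.0, 1.0])   # distance 1
f_neg_trained = np.array([3.0, 4.0])   # distance 5
loss_after = triplet_loss(f_a, f_pos_trained, f_neg_trained)  # max(0, 1 - 5 + 1) = 0

print(loss_before, loss_after)
```

Minimizing this quantity does both things the learning objective asks for at once: it lowers $δ (f^{a}, f^{+})$ and raises $δ (f^{a}, f^{-})$ until the margin is satisfied.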

Understanding Contrastive Learning for Multimodal Data
The same idea extends to multimodal data.
Image-text pairing: pull the embeddings of matching image-text pairs closer together and push mismatched pairs apart.
- Finding contrastive examples
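- Text Encoder:
  - Input: a batch of text inputs (such as "Pepper the aussie pup")
  - Output: the encoded text representations $T_{1}, T_{2}, \dots, T_{N}$
  - Purpose: turn each text input into a numeric vector that captures its meaning
- Image Encoder:
  - Input: a batch of image inputs (such as pictures of dogs)
  - Output: the encoded image representations $I_{1}, I_{2}, \dots, I_{N}$
  - Purpose: turn each image input into a numeric vector that captures its visual features
- Matrix structure:
  - The matrix shows the interaction between every encoded image $I_{i}$ and every encoded text $T_{j}$
  - Each cell $I_{i} \cdot T_{j}$ is the similarity (interaction score) between image $I_{i}$ and text $T_{j}$
- The goal is to find the best match between images and text descriptions by comparing their embeddings.
- During training, the model is optimized to maximize the similarity score of the correct image-text pairs and minimize the score of the incorrect pairs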
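
The matrix described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the course's code: the random vectors stand in for encoder outputs, the symmetric cross-entropy over rows and columns is the CLIP-style formulation (an assumption here), and the temperature value `0.07` is likewise assumed.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Return the N x N similarity matrix and a symmetric contrastive loss."""
    I = l2_normalize(image_emb)
    T = l2_normalize(text_emb)
    logits = I @ T.T / temperature           # logits[i, j] = I_i . T_j / tau
    diag = np.arange(logits.shape[0])        # correct pairs sit on the diagonal
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()  # text -> image
    return logits, (loss_i2t + loss_t2i) / 2

# Hypothetical embeddings: 4 texts, and 4 images that almost coincide with them.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(4, 8))
aligned_images = text_emb + 0.01 * rng.normal(size=(4, 8))
shuffled_images = aligned_images[[1, 2, 3, 0]]   # every image paired with the wrong text

logits, aligned_loss = clip_style_loss(aligned_images, text_emb)
_, shuffled_loss = clip_style_loss(shuffled_images, text_emb)
best_text_per_image = logits.argmax(axis=1)      # best-matching text index per image

print(best_text_per_image, aligned_loss, shuffled_loss)
```

When the pairs are aligned, the largest entry of each row lies on the diagonal and the loss is small; shuffling the pairs moves the large entries off the diagonal and the loss grows.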
Understanding the Contrastive Loss Function
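- Embedding generation:
  - $q_{i} = f (\text{image})$: the function $f$ (for example, an image encoder) encodes the image into the query vector $q_{i}$.
  - $k_{i} = g (\text{image})$: the function $g$ encodes the sample paired with the query (here also an image) into the key vector $k_{i}$.
- Loss function $L_{I, M}$:

  $L_{I, M} = - \log \frac{\exp (q_{i}^{\top} k_{i} / τ)}{\exp (q_{i}^{\top} k_{i} / τ) + \sum_{j \neq i} \exp (q_{i}^{\top} k_{j} / τ)}$
  - Numerator $\exp (q_{i}^{\top} k_{i} / τ)$: the similarity between the query vector $q_{i}$ and the positive key vector $k_{i}$, scaled by the temperature $τ$
  - Denominator $\exp (q_{i}^{\top} k_{i} / τ) + \sum_{j \neq i} \exp (q_{i}^{\top} k_{j} / τ)$: the sum of the similarities between $q_{i}$ and every key vector, the positive $k_{i}$ and the negatives $k_{j}$ alike
    - The sum $\sum_{j \neq i}$ skips the positive, so it covers the negatives only and makes the contrast with them explicit; the positive enters the denominator as its own first term
    - Because the positive term also appears in the denominator, the fraction is the relative probability of the positive: the share of the anchor-positive similarity among all samples, positive and negative. This puts every comparison on the same scale. The fraction always lies in (0, 1), so the loss $- \log (\cdot)$ is positive and approaches 0 as the fraction approaches 1, which helps keep optimization stable. The design emphasizes relative similarity rather than absolute similarity
  - $τ$: the temperature, which controls how concentrated the similarity distribution is
The goal is to minimize the loss $L_{I, M}$, that is, to maximize the similarity between the query vector and the positive key vector while minimizing its similarity to the negative key vectors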
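
As a sanity check on the formula for $L_{I, M}$, the following NumPy sketch evaluates it for a tiny batch. The one-hot vectors and `tau=0.1` are hypothetical values chosen so that the arithmetic is easy to follow; they are not taken from the course.

```python
import numpy as np

def info_nce(q, k, tau=0.1):
    """Per-sample loss  -log( exp(q_i.k_i/tau) / sum_j exp(q_i.k_j/tau) ).

    Row i of `q` is the query and row i of `k` its positive key; all other
    rows of `k` act as negatives. Returns (loss, probability of the positive)."""
    logits = q @ k.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilizes exp; the ratio is unchanged
    exp = np.exp(logits)
    prob_positive = np.diag(exp) / exp.sum(axis=1)
    return -np.log(prob_positive), prob_positive

# Hypothetical unit-length embeddings for a batch of three pairs.
q = np.eye(3)
k_aligned = q.copy()                  # every key matches its own query
k_misaligned = np.roll(q, 1, axis=0)  # every key matches some other query

loss_aligned, prob_aligned = info_nce(q, k_aligned)
loss_misaligned, prob_misaligned = info_nce(q, k_misaligned)

print(prob_aligned, loss_aligned)        # probabilities near 1, losses near 0
print(prob_misaligned, loss_misaligned)  # probabilities near 0, losses near 1/tau
```

The fraction inside the logarithm always stays between 0 and 1, while the loss itself is unbounded above: in the misaligned case it comes out close to $1 / τ = 10$.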

Lab

Core code

Contrastive Loss Function

```
import torch.nn as nn

# Target score: 1 for a positive pair, 0 for a negative pair.
# The lab calls this target "distance", but it is compared against a cosine similarity.
class ContrastiveLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.similarity = nn.CosineSimilarity(dim=-1, eps=1e-7)
        self.mse = nn.MSELoss()

    def forward(self, anchor, contrastive, distance):
        # Cosine similarity between the anchor and contrastive embeddings
        score = self.similarity(anchor, contrastive)
        # MSE pulls the score toward its target (1 or 0)
        return self.mse(score, distance)
```
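
To see what this loss computes, the sketch below traces the same two steps (cosine similarity, then mean squared error against the 1/0 target) by hand in NumPy. It is a stand-in for illustration rather than the lab's PyTorch module: the function names are hypothetical and the `eps` handling is simplified.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-7):
    """Row-wise cosine similarity; `eps` guards against division by zero."""
    dot = (a * b).sum(axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return dot / np.maximum(norms, eps)

def contrastive_mse(anchor, contrastive, target):
    """MSE between the cosine score and the target (1 = positive, 0 = negative)."""
    score = cosine_similarity(anchor, contrastive)
    return float(np.mean((score - target) ** 2))

# Hypothetical batch of two pairs.
anchor = np.array([[1.0, 0.0], [1.0, 0.0]])
contrastive = np.array([[2.0, 0.0],    # parallel to its anchor   -> cosine 1
                        [0.0, 3.0]])   # orthogonal to its anchor -> cosine 0

# Targets that agree with the geometry give zero loss ...
loss_matching = contrastive_mse(anchor, contrastive, np.array([1.0, 0.0]))
# ... and targets that contradict it give the maximum loss for these scores.
loss_swapped = contrastive_mse(anchor, contrastive, np.array([0.0, 1.0]))

print(loss_matching, loss_swapped)
```

Training therefore drives the embeddings of positive pairs toward cosine similarity 1 and those of negative pairs toward cosine similarity 0.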

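Model Training

```
import os

import torch
from tqdm import tqdm

# `net`, `optimizer`, `scheduler`, `loss_function`, `trainLoader`, `device` and
# `checkpoint_dir` are assumed to be defined earlier in the lab notebook, with
# `optimizer` built over `net.parameters()` and `net` already moved to `device`.
# Creating a fresh Network() inside this function would leave the optimizer
# updating a different model, so the shared `net` is trained instead.
def train_model(epoch_count=10):
    lrs = []
    losses = []

    for epoch in range(epoch_count):
        epoch_loss = 0.0
        batches = 0
        print('epoch -', epoch)
        lrs.append(optimizer.param_groups[0]['lr'])
        print('learning rate', lrs[-1])

        for anchor, contrastive, distance, label in tqdm(trainLoader):
            batches += 1
            optimizer.zero_grad()
            anchor_out = net(anchor.to(device))
            contrastive_out = net(contrastive.to(device))
            distance = distance.to(torch.float32).to(device)
            loss = loss_function(anchor_out, contrastive_out, distance)
            # Accumulate a plain float so the autograd graph is not kept alive
            epoch_loss += loss.item()
            loss.backward()
            optimizer.step()

        losses.append(epoch_loss / batches)
        scheduler.step()
        print('epoch_loss', losses[-1])

        # Save a checkpoint of the model
        checkpoint_path = os.path.join(checkpoint_dir, f'model_epoch_{epoch}.pt')
        torch.save(net.state_dict(), checkpoint_path)

    return {
        "net": net,
        "losses": losses
    }
```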

Visualize the Vector Space!

Generate 64d Representations of the Training Set

```
import numpy as np
import torch
from tqdm import tqdm

encoded_data = []
labels = []

# `model` is the trained network; `trainLoader` and `device` come from the lab setup.
with torch.no_grad():
    for anchor, _, _, label in tqdm(trainLoader):
        output = model(anchor.to(device))
        encoded_data.extend(output.cpu().numpy())
        labels.extend(label.cpu().numpy())

encoded_data = np.array(encoded_data)
labels = np.array(labels)
```

Reduce Dimensionality of Data: 64d -> 3d
- Apply PCA to reduce dimensionality of data from 64d -> 3d to make it easier to visualize!
```
from sklearn.decomposition import PCA

pca = PCA(n_components=3)
encoded_data_3d = pca.fit_transform(encoded_data)
```
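
For intuition about what the PCA step does, here is a rough NumPy sketch of the same projection via singular value decomposition. It is not scikit-learn's implementation (component signs may differ from `PCA.fit_transform`), and the random array is a hypothetical stand-in for `encoded_data`.

```python
import numpy as np

def pca_project(data, n_components=3):
    """Project `data` onto its top principal directions.

    Returns the projected points and the share of total variance
    captured by each kept component."""
    centered = data - data.mean(axis=0)
    # Rows of `vt` are principal directions, sorted by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    explained_ratio = singular_values ** 2 / np.sum(singular_values ** 2)
    return projected, explained_ratio[:n_components]

# Hypothetical 64-d embeddings standing in for the encoded training set.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(200, 64))

projected, ratio = pca_project(fake_embeddings)
print(projected.shape, ratio)
```

The explained-variance ratios indicate how much structure survives the reduction from 64d to 3d; if they are small, the 3d plot shows only part of the picture.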

Interactive Scatter Plot in 3d - with PCA

```
import plotly.graph_objects as go

# One marker per training sample, colored by its label
scatter = go.Scatter3d(
    x=encoded_data_3d[:, 0],
    y=encoded_data_3d[:, 1],
    z=encoded_data_3d[:, 2],
    mode='markers',
    marker=dict(size=4, color=labels, colorscale='Viridis', opacity=0.8),
    text=labels,
    hoverinfo='text',
)

# Create layout
layout = go.Layout(
    title="MNIST Dataset - Encoded and PCA Reduced 3D Scatter Plot",
    scene=dict(
        xaxis=dict(title="PC1"),
        yaxis=dict(title="PC2"),
        zaxis=dict(title="PC3"),
    ),
    width=1000,
    height=750,
)

# Combine the trace and the layout into a figure and display it
fig = go.Figure(data=[scatter], layout=layout)
fig.show()
```

Scatterplot in 2d - with UMAP

```
import umap
import umap.plot  # the plotting helpers live in a separate submodule

mapper = umap.UMAP(random_state=42, metric='cosine').fit(encoded_data)
umap.plot.points(mapper, labels=labels)
```

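Observe how the distribution of the embedding space changes as the number of training epochs grows.
The data points are pulled farther and farther apart.
- epoch=0
- epoch=97

…
The later chapters are mostly a tutorial on the Weaviate vector database API; a few conceptual slides are picked out below.
…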

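Shared Multimodal Vector Space

Map data from different modalities into the same vector space so that items from different modalities can be compared and operated on (retrieved) directly.
Data from different modalities have different features and structure: images have pixel values, for example, while text has word vectors. A shared multimodal vector space converts these heterogeneous data into one unified vector representation, so that all modalities can be processed in the same space.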
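
Once every modality lives in one space, cross-modal retrieval reduces to a nearest-neighbor search. The NumPy sketch below ranks image vectors against a text query by cosine similarity; all vectors and file names are hypothetical placeholders for what real text and image encoders would produce.

```python
import numpy as np

def cosine_rank(query, candidates):
    """Return candidate indices sorted from most to least similar, plus the scores."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores), scores

# Hypothetical image embeddings that already live in the shared space.
image_names = ["dog.jpg", "cat.jpg", "car.jpg"]
image_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
])

# Hypothetical embedding of a text query such as "a puppy", in the same space.
text_query_vector = np.array([0.8, 0.2, 0.1])

order, scores = cosine_rank(text_query_vector, image_vectors)
ranked_names = [image_names[i] for i in order]
print(ranked_names)
```

A vector database such as Weaviate performs the same kind of comparison at scale, typically with an approximate nearest-neighbor index instead of this brute-force scan.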

vector search
Multivector Recommender Systems
Specialized Embedding Models

Supplement

Other contrastive loss function designs

Contrastive learning design

Formula

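$L = - \log \frac{\exp (sim (v_{i}, v_{j}) / τ)}{\sum_{k = 1}^{N} \exp (sim (v_{i}, v_{k}) / τ)}$
- $sim (v_{i}, v_{j})$:
  - The similarity between samples $v_{i}$ and $v_{j}$.
  - It is usually computed as cosine similarity:

    $sim (v_{i}, v_{j}) = \frac{v_{i} \cdot v_{j}}{\| v_{i} \| \| v_{j} \|}$
  - A higher value means the two embeddings are more alike.
- $τ$ (temperature):
  - A scaling factor that controls how concentrated the distribution is.
  - The lower $τ$ is, the sharper the distribution, and the more the model focuses on the most similar pairs.
  - The higher $τ$ is, the smoother the distribution.
1. Numerator (similarity of the positive pair):

  $\exp (sim (v_{i}, v_{j}) / τ)$
  - The exponential of the similarity between the anchor $v_{i}$ and the positive $v_{j}$, scaled by the temperature $τ$.
  - This term emphasizes the similarity between $v_{i}$ and $v_{j}$.
2. Denominator (similarity to every sample in the batch):

  $\sum_{k = 1}^{N} \exp (sim (v_{i}, v_{k}) / τ)$
  - The sum of the exponentials of the similarities between the anchor $v_{i}$ and all other samples $v_{k}$ in the batch, each scaled by $τ$.
  - It covers both the positive and the negative samples.
3. Logarithm and minus sign:

  $- \log ()$
  - Taking the logarithm turns the probability into a log-probability, which makes the loss more stable and better suited to gradient-based optimization
    - A small change at very low probabilities (for example from $0.001$ to $0.0001$) produces a marked difference in the log value: $\log (0.001) \approx - 6.91$ versus $\log (0.0001) \approx - 9.21$
    - Stability: the smoothing effect of the logarithm keeps very small probabilities from destabilizing the computation. During optimization, extremely small probabilities can lead to very large gradients and make training unstable or divergent; the log transform mitigates this by producing more moderate gradient values
  - The minus sign ensures that maximizing the probability of the positive pair (that is, improving the model) corresponds to minimizing the loss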
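
The role of the temperature $τ$ in the formula above is easy to check numerically. The sketch below is illustrative only: the three similarity values and the two temperatures are hypothetical, with index 0 playing the positive sample.

```python
import numpy as np

def softmax_with_temperature(similarities, tau):
    """Turn similarity scores into a probability distribution, scaled by tau."""
    z = np.asarray(similarities, dtype=float) / tau
    z = z - z.max()          # stabilizes exp without changing the result
    e = np.exp(z)
    return e / e.sum()

# Hypothetical cosine similarities between the anchor and three samples.
similarities = [0.9, 0.7, 0.1]

sharp = softmax_with_temperature(similarities, tau=0.05)   # low temperature
smooth = softmax_with_temperature(similarities, tau=1.0)   # high temperature

# Loss for the positive sample under each temperature: -log(probability).
loss_sharp = -np.log(sharp[0])
loss_smooth = -np.log(smooth[0])

print(sharp, loss_sharp)
print(smooth, loss_smooth)
```

With the low temperature almost all probability mass lands on the most similar sample, while the high temperature spreads it nearly evenly, matching the sharper versus smoother behavior described in the list.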
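Intuition
- The loss encourages the model to make the similarity between the anchor $v_{i}$ and the positive $v_{j}$ (the numerator) as high as possible relative to the similarity between the anchor $v_{i}$ and all other samples $v_{k}$ (the denominator).
- By minimizing this loss, the model learns to tell positive pairs (similar items) from negative pairs (dissimilar items), clustering similar items together in the embedding space and pushing dissimilar items apart.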
