# 2021 vehicle ReID paper survey

###### tags: `Reid`

# Summary

- Self-supervised + attentive

# Outline: paper names

1. Discovering Discriminative Geometric Features with Self-Supervised Attention for Vehicle Re-Identification and Beyond
2. The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification (ECCV 2020)
3. Cluster Contrast for Unsupervised Person Re-Identification

## `1` Discovering Discriminative Geometric Features with Self-Supervised Attention for Vehicle Re-Identification and Beyond (ECCV 2020)

### Unknowns

- cosine classifier
- D2-Net [14]
- whether the data has labels
- Regularization
    - L1 regularization: prunes features by driving their weights to zero.
    - L2 regularization: reduces the differences between feature weights so that no single feature's weight becomes too dominant.

### Abstract

- The paper aims to address the challenge of automatically learning to detect geometric features as landmarks with no extra labels.
- The key point is learning landmarks automatically, based on self-supervised attention, with no extra labels.
- In the figures below, (a) other methods rely on manual labels, while (b, c) show landmarks produced without any extra information or labels.
- ![](https://i.imgur.com/t1d1ysz.png)
- ![](https://i.imgur.com/H8n22qm.png)

### Introduction

- Three branches
    - global: image feature extraction
    - attentional: producing attention masks (local geometric features)
    - self-supervised: regularizing the attention learning by sharing the attention encoder
- Each branch introduces its own loss for training, and all losses are optimized simultaneously. At test time, only the global and attentional branches are active for vehicle ReID.
- Contributions
    - Learns discriminative geometric features for vehicle ReID based on self-supervised attention with only ID labels.
    - The model is end-to-end, unlike methods that rely on a separate pre-trained network to localize discriminative parts.

### Related works

- Compared with paper 4 (SAVER), the advantages of this paper are:
    - no need for pretraining or preprocessing of input images
    - better testing performance than SAVER

### Method

### Self-supervised learning

- SSL tries to learn deep representations by exploring the intrinsic properties of the data [27], thereby providing extra supervision that facilitates learning of downstream tasks.

### Our approach: VAL + SSL for geometric feature discovery

### System overview

- Regularization perspective from lifting
- SB primarily regularizes the AB for producing geometric features from the attention maps (SB tunes AB).
- Backbone: ResNet50
    - pre-trained on ImageNet
- Three branches
    - F_S: self-supervised learning of hidden info (can strengthen the attention)
    - F_A = g_A(f_G * f_A)
    - F_S = g_S(f_S)
    - f, g denote different functions in each branch and L denotes a loss function (a rough sketch of this three-branch forward pass follows below)
- ![](https://i.imgur.com/5O7X2Z2.png)
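A minimal PyTorch sketch of the three-branch layout described above (global, attentional, and a self-supervised branch sharing the attention encoder). Module names, layer sizes and head dimensions (`ThreeBranchReID`, `att_encoder`, `rot_head`, ...) are assumptions for illustration, not the authors' implementation.

```python
# Sketch only: shared ResNet-50 backbone, global branch, attentional branch that
# multiplies an attention mask into the backbone features, and a self-supervised
# branch that reuses the attention encoder to predict rotation pseudo-labels.
import torch
import torch.nn as nn
import torchvision


class ThreeBranchReID(nn.Module):
    def __init__(self, num_ids: int, num_rotations: int = 4, feat_dim: int = 2048):
        super().__init__()
        resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1")
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])   # (B, 2048, h, w)
        # Attention encoder shared by the attentional and self-supervised branches.
        self.att_encoder = nn.Sequential(
            nn.Conv2d(feat_dim, 256, 1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 1), nn.Sigmoid(),          # soft attention mask in [0, 1]
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.id_head = nn.Linear(feat_dim, num_ids)          # ID logits (global/attentional)
        self.rot_head = nn.Linear(feat_dim, num_rotations)   # rotation logits (self-supervised)

    def forward(self, x):
        fmap = self.backbone(x)                       # f_G: backbone feature map
        mask = self.att_encoder(fmap)                 # f_A: attention mask
        f_global = self.pool(fmap).flatten(1)         # global branch feature
        f_att = self.pool(fmap * mask).flatten(1)     # attentional branch: g_A(f_G * f_A)
        # During training, rotated copies of x (0/90/180/270 degrees) would be fed through
        # the same network and rot_logits compared against the rotation pseudo-labels.
        return f_global, f_att, self.id_head(f_att), self.rot_head(f_att)
```

Each branch would contribute its own loss (e.g. hard-mining triplet and smoothed cross-entropy on the ID logits, plain cross-entropy on the rotation logits), optimized jointly; at test time only `f_global` and `f_att` would be used, matching the description above.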
### Instantiation using deep neural networks

- ![](https://i.imgur.com/35aJf1F.png)
- ![](https://i.imgur.com/WepfBrs.png)

### Self-supervised branch

- Each training image is rotated by 0, 90, 180 or 270 degrees and assigned 0, 1, 2 or 3 as its pseudo label accordingly (this is what drives landmark generation).
- The SB is trained on these 4 angles, but the model also generalizes to other rotation angles (Fig. 5).
- Used only during training.
- Responsible for geometric feature discovery.

### Loss

- hard mining triplet loss
- smoothed cross-entropy loss

### Augmentation in training

- ![](https://i.imgur.com/i7MQyHY.jpg)

---

---

## 2 Unsupervised Vehicle Re-Identification via Self-supervised Metric Learning using Feature Dictionary

### Q

- Where do the background features and color features come from?

### Abstract

- The method initially extracts features from vehicle images and stores them in a dictionary; based on the dictionary, it conducts dictionary-based positive label mining (DPLM) to search for positive labels.
- In short: first extract the features and store them in a dictionary, then use the dictionary to build triplets.

## 3 (黃子愷) Regularizing Deep Networks with Semantic Data Augmentation

### Q

- The effect is more pronounced on larger-scale datasets such as ImageNet.

### Abstract

- Conventional data augmentation schemes such as flipping, translation or rotation are low-level, data-independent and class-independent operations, so the diversity of the augmented samples is limited. This paper proposes a semantic data augmentation algorithm that instead changes, for example, the background or viewpoint of the target.

### Introduction

- For example, training one GAN per class on the training set would give an unlimited number of samples; unfortunately, that process is computationally expensive.
- Translating features along specific directions corresponds to meaningful semantic transformations.
    - If we collect some images of blue cars and red cars and take the vector pointing from the mean deep feature of the former to the mean deep feature of the latter, that vector represents the semantic transformation "change the car's color from blue to red".
- ![](https://i.imgur.com/xgx1I3J.png)
- The augmentation happens entirely in feature space, without training any auxiliary generative model (such as a GAN).

### SEMANTIC TRANSFORMATIONS IN DEEP FEATURE SPACE

- Transformations such as "change the car's color" or "change the background" can be realized as linear translations of the deep features along the semantic direction of the corresponding transformation.
- The authors argue that, at the final feature layer, adding certain translations corresponds to different semantic transformations. They also point out that not every direction is meaningful: a direction such as "putting on glasses" is meaningful for people but meaningless for cars or airplanes. We therefore need to sample from a meaningful distribution, which the authors assume to be a zero-mean normal distribution.
- ![](https://i.imgur.com/7BiRLy7.png)
- The augmented features can be used directly for training, so there is no need to explicitly render the result of the semantic transformation.

### Semantic Direction Sampling

- The directions should correspond to meaningful semantic transformations of the main object in the image without changing the class identity of the image.
- The directions for each class should correspond to meaningful transformations, and they are well captured by that class's covariance matrix.
- Semantic directions are sampled from a zero-mean normal distribution whose covariance is the estimated class-conditional covariance matrix, which captures the intra-class feature distribution of the training data.
- So samples of different classes are drawn from different normal distributions (a rough sampling sketch follows after Algorithm 1 below).
- ![](https://i.imgur.com/jwpIyo1.png)
- As shown below, samples of the class "bird" have large variance along the "flying" direction, because the training data contains both flying and non-flying birds; in contrast, the variance along the "getting older" direction is almost 0, because there are no "old" or "young" birds in the data. In the high-dimensional space, the intra-class feature distribution tells us along which directions a class of images can vary semantically.
- ![](https://i.imgur.com/9sCXUPy.png)

### Algorithm 1: The ISDA

- ![](https://i.imgur.com/G1cBOpN.png)
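To make the sampling idea concrete, here is a minimal sketch (assumed PyTorch code with hypothetical function names and a scaling factor `lam`) of estimating class-conditional covariances and drawing semantic perturbations from a zero-mean Gaussian:

```python
# Sketch only (not the official ISDA code): estimate a per-class covariance of deep
# features, then perturb each feature with zero-mean Gaussian noise whose covariance
# is that class's (scaled) covariance matrix.
import torch


def estimate_class_covariances(features: torch.Tensor, labels: torch.Tensor, num_classes: int):
    """features: (N, D) deep features; labels: (N,) integer class ids."""
    d = features.size(1)
    covs = torch.zeros(num_classes, d, d)
    for c in range(num_classes):
        fc = features[labels == c]
        if fc.size(0) > 1:
            covs[c] = torch.cov(fc.T)              # class-conditional covariance Sigma_c
    return covs


def semantic_augment(features, labels, covs, lam: float = 0.5):
    """Return features perturbed by delta ~ N(0, lam * Sigma_y), y = each sample's class."""
    d = features.size(1)
    jitter = 1e-6 * torch.eye(d)                   # small jitter for numerical stability
    aug = features.clone()
    for i in range(features.size(0)):
        dist = torch.distributions.MultivariateNormal(
            torch.zeros(d), covariance_matrix=lam * covs[labels[i]] + jitter)
        aug[i] = features[i] + dist.sample()
    return aug


# Toy usage: 128-D features, 10 classes.
feats = torch.randn(64, 128)
labels = torch.randint(0, 10, (64,))
covs = estimate_class_covariances(feats, labels, num_classes=10)
aug_feats = semantic_augment(feats, labels, covs, lam=0.5)
```

Note that ISDA itself avoids explicit sampling: Algorithm 1 above optimizes a closed-form upper bound of the expected cross-entropy over infinitely many such augmented features, so this sketch only illustrates the explicit-sampling view of the idea.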
---

---

# 4 The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification (ECCV 2020)

- What does the self-supervised part actually do?
- What is the convex combination?
    - convex combination
    - Ic = α × Io + (1 − α) × Ir (learn the weighting between the two)

### Abstract

- In recent years many attention-based models have been applied to vehicle re-identification (re-id), paying particular attention to the regions of a vehicle that carry identifying information. These re-id methods rely on expensive keypoint labels, and given the variety of vehicle re-id datasets, strictly supervised methods cannot scale across different domains. This paper proposes an effective way of learning vehicle features.
- Proposed method: Self-supervised Attention for Vehicle Re-identification (SAVER).
- How are the two parts trained (jointly?)
- Leftmost column (a, e): vehicle images; second column (b, f): coarse reconstructions; third column (c, g): residuals; rightmost column (d, h): normalized residuals (for visualization). Although the coarse reconstructions are similar, the two vehicles have different residuals that highlight key regions such as windshield stickers and bumper designs.
- ![](https://i.imgur.com/9ey1T4B.png)
- Improves performance on vehicle re-id datasets.

### Self-Supervised Attention for Vehicle Re-identification

- The proposed pipeline is composed of two modules: Self-Supervised Residual Generation and Deep Feature Extraction.
- It automatically highlights the salient regions in a vehicle image.
- The self-supervised reconstruction network is responsible for creating the overall shape and structure of the vehicle image; subtracting this reconstruction from the input image highlights the salient regions and removes background distractors. The residual and the original input image are then combined through a convex combination (with a trainable parameter α).
- ![](https://i.imgur.com/fqY3Y7i.png)
- The input image passes through a convolutional encoder and is mapped to a 3-D latent variable; a standard multivariate Gaussian sample is drawn and scaled by the latent mean µ and covariance Σ, then decoded to generate a template of the input image with the finest-grained details removed. Once the self-supervised reconstruction network generates the coarse template Ig, it is subtracted from the original input to obtain the residual image, i.e. Ir = Io − Ig.
- ![](https://i.imgur.com/QQHYNmu.png)
- Pre-trained on "Vehicle Universe" (a combination of several vehicle ReID datasets).
- This residual contains the key details needed for re-identification.

### Self-Supervised Residual Generation

- Self-supervision is used to tackle the domain-gap problem.
- Data from several sources, including CompCars, StanfordCars, BoxCars116K, CityFlow, PKU VD1&VD2, Vehicle-1M, VehicleID, VeRi and VeRi-Wild, is used to pre-train the self-supervised residual generation module.
- Before training the end-to-end pipeline, the model is pre-trained on the large-scale Vehicle Universe dataset introduced in Section 4.2.1. This pre-training lets the reconstruction model generalize to vehicle images of various makes, models, colors, orientations and image qualities, so it captures domain-invariant characteristics and can later be fine-tuned for a specific dataset; it also speeds up convergence of end-to-end training. Importantly, unlike conventional VAE implementations, a 3-D latent feature map (channel, height and width dimensions) is used instead of a 1-D latent vector with only a channel dimension, to improve reconstruction quality and retain more spatial information. The KL term L_KL in Eq. 1 is also scaled to improve reconstruction quality, and the effect of this KL-divergence scaling factor λ is explored further in Section 5. Once the self-supervised reconstruction network generates the coarse template Ig, it is subtracted from the original input to obtain the residual image, i.e. Ir = Io − Ig.
- Formally, the reconstruction model is pre-trained using the mean squared error (MSE) and Kullback-Leibler (KL) divergence. (A rough sketch of this pre-training objective and the residual/convex-combination step follows the VAE notes below.)

### Variational auto-encoding (VAE)

- Why VAE?
    - Compared with a plain AE, a VAE adds noise, so that the image can still be reconstructed from codes within the noisy region around each sample.
    - ![](https://i.imgur.com/QtAWO9C.png)
    - The idea of a VAE is that every sample has its own specific normal distribution q(z|x). We learn a decoder/generator that maps a z sampled from that distribution back to x, so sampling from q(z|x) generates a variety of samples similar to x. To give the model general generative ability (not tied to a real sample), we want every q(z|x) to be close to the standard normal distribution; then we can sample from the standard normal distribution to generate random samples.
- Concept
    - The encoder produces two sets of vectors, (m1, m2, m3) as in an ordinary AE and (σ1, σ2, σ3); in addition, a 3-D vector (e1, e2, e3) is drawn directly from a Gaussian. (σ, e) can be viewed as the noise, which the later minimization should push as close to 0 as possible; since e comes from a fixed Gaussian, the noise magnitude is determined by σ.
    - Minimize: the m_i term acts like L2 regularization (pushed toward 0), and the σ term is minimized at σ = 0.
    - ![](https://i.imgur.com/9Oov6hR.png)
- VAE Q
    - Although the output should be as close as possible to the input (or to some target image), the VAE's pixel-wise criterion does not consider which differences are perceptually important: in the figure below both outputs differ from the target by a single pixel, yet to the human eye the left one clearly looks more similar.
    - ![](https://i.imgur.com/IjTQ9i1.png)
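A rough sketch of the residual-generation step as described above, assuming a small convolutional VAE with a 3-D latent feature map, an MSE + λ-scaled KL pre-training loss, the residual Ir = Io − Ig, and a trainable convex-combination weight α. Layer sizes and names are illustrative assumptions, not the SAVER release.

```python
# Sketch only: conv VAE with a 3-D latent feature map, pre-trained with MSE + scaled KL,
# then used to form the residual Ir = Io - Ig and the convex combination
# Ic = alpha * Io + (1 - alpha) * Ir with a trainable alpha. Inputs assumed in [0, 1].
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualGenerator(nn.Module):
    def __init__(self, latent_ch: int = 16, kl_scale: float = 0.1):
        super().__init__()
        self.kl_scale = kl_scale                                  # lambda in the notes above
        self.encoder = nn.Sequential(                             # Io -> latent stats (mu, logvar)
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 2 * latent_ch, 4, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(                             # z -> coarse template Ig
            nn.ConvTranspose2d(latent_ch, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        self.alpha = nn.Parameter(torch.tensor(0.5))              # trainable convex weight

    def forward(self, i_o):
        mu, logvar = self.encoder(i_o).chunk(2, dim=1)            # 3-D latent feature maps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        i_g = self.decoder(z)                                     # coarse reconstruction Ig
        i_r = i_o - i_g                                           # residual Ir = Io - Ig
        alpha = self.alpha.clamp(0.0, 1.0)
        i_c = alpha * i_o + (1.0 - alpha) * i_r                   # Ic = a*Io + (1-a)*Ir
        # Pre-training objective: MSE reconstruction + scaled KL divergence.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        loss = F.mse_loss(i_g, i_o) + self.kl_scale * kl
        return i_c, i_r, loss
```

In the full pipeline, `i_c` would then be fed to the deep feature extraction module (e.g. a ResNet-50 ReID backbone); only the self-supervised half is sketched here.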
---

### Cluster Contrast for Unsupervised Person Re-Identification

References: https://zhuanlan.zhihu.com/p/361220658 , https://arxiv.org/pdf/2103.11568v2.pdf

* Unknowns
    - Instance feature
        - ![](https://i.imgur.com/CG2oV6n.png)
    - contrastive loss
        - optimizes the ability to separate positive pairs from negative pairs
    - transfer learning
        - little data is directly related to the target task, but plenty of data is indirectly related (e.g. using YouTube Mandarin/English speech data for Taiwanese speech recognition)
        - ![](https://i.imgur.com/GohNgB8.png)
    - Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
        - two hyper-parameters
            - the maximum distance ε (the circular region of radius ε around a point is called its ε-neighborhood)
            - min samples (how many samples are needed for an instance to form a cluster)
            - points satisfying neither condition are treated as noise
        - unlike K-means, DBSCAN does not require the number of clusters to be declared in advance
        - clusters based on "density"
        - ![](https://i.imgur.com/IfYxm04.png)
    - K-means
        - based on distance instead
    - InfoNCE
        - like a contrastive loss
    - how is similarity computed?
        - dot-product similarity
    - Jaccard distance [53]
    - unsupervised domain adaptation method
        - in ppt
    - why not use a triplet loss?
        - hard examples are difficult to find, and too many easy examples keep the model from performing well
* Abstract
    - To avoid the high cost of annotation, unsupervised person re-identification has attracted growing attention.
* Introduction
    - Unsupervised person Re-ID (mainly the following three steps):
        1. Initialization: extract features for all training data and store them in an instance-level memory dictionary, then cluster all features to generate pseudo labels and compute each class's cluster centroid feature from them.
        2. Loss computation: compute the InfoNCE loss between each instance feature in the mini-batch and the cluster centroid features.
        3. Memory updating: update each instance feature of the mini-batch into the corresponding entry of the instance-level memory dictionary.
    - Problems and motivation
        - Q1: the number of instances per class differs (see Fig. 1), so the instance features of different classes get updated at different rates.
        - Q2: the clustering algorithm inevitably groups data from different classes into the same cluster (see Fig. 2), which corrupts the cluster centroid feature.
    - instance-level cluster (previous work)
        - updates the memory with an averaging operation over features, which is easily skewed by a few noisy instances and also suffers from the cluster-size problem
* Related Work
    - Deep Unsupervised Person Re-ID
        1. trains the model directly on an unlabeled dataset
        2. utilizes transfer learning to improve unsupervised person re-ID
        - the pipeline generally involves three stages: memory dictionary initialization, pseudo label generation, and neural network training
    - Memory Dictionary
        - Contrastive learning [13] can be thought of as training an encoder for a dictionary look-up task.
        - In earlier USL methods, during training the instance feature vectors in the memory dictionary are updated by the features of query instances in the same cluster.
        - Here, a single cluster feature vector is stored per cluster.
* Methodology
    - Cluster Contrast (overall)
        - It consists of a memory initialization stage and a model training stage; the initialization stage further splits into feature extraction on the training data, pseudo-label generation, and cluster-level memory dictionary initialization, and this whole process is repeated at every epoch to re-initialize the memory dictionary (a combined code sketch of these steps appears at the end of these notes).
        - initialization, updating, and loss computation
        - upper part: pseudo labels assigned only by the clustering algorithm
        - lower part: model training stage
        - shown together with the baseline pipeline
        - the cluster feature is initialized by sampling a random instance feature from the corresponding cluster
        - pseudo labels change during training
        - ![](https://i.imgur.com/uXcBwoA.png)
        - ![](https://i.imgur.com/lbTBYog.png)
    - cluster-level
        - each cluster is represented by a single feature vector
    - USL person ReID as contrastive learning
        - An ImageNet pre-trained neural network extracts the features, then DBSCAN [7] or K-means [28] clusters them, and a contrastive loss is computed between the query instances and the memory dictionary.
    - Cluster-level Memory Initialization (part 1)
        - First extract features on the training set, then cluster the features to generate pseudo labels, and finally initialize the cluster-level memory dictionary from the generated pseudo labels.
        - The clustering algorithm runs at every epoch, so the number of clusters N changes as the model trains.
        - As shown below, a random instance in the cluster initializes the cluster feature.
        - c_i: cluster i's feature
        - x_i: the cluster's instance set (from the training data mini-batch)
        - ![](https://i.imgur.com/M0GJk9R.png)
    - Memory Updating (part 2)
        - P: number of person identities in the mini-batch
        - K: number of instances per person identity (fixed number)
        - Q_i: the set of instance features with cluster id i in the current mini-batch
        - like the hard triplet loss, the hardest instance is used for the update
        - ![](https://i.imgur.com/JXnJ5JO.png)
    - Loss Function
        - https://zhuanlan.zhihu.com/p/129076690
        - In the contrastive loss, the numerator is the dot-product similarity with the positive and the denominator covers the other negatives; ideally the numerator is large and the denominator small, so the ratio approaches 1 and the log term approaches 0. It resembles the classification cross-entropy, as the figure below shows.
        - ![](https://i.imgur.com/VqMzUZP.png)
    - Discussion
        - each cluster is represented as a single feature vector
        - the cluster feature vector is updated using the batch-hard query instance feature vector
* Experiment
    - Implementation Details
        - ResNet-50 pre-trained on ImageNet
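To tie the methodology notes together, here is a minimal sketch of the cluster-level memory: DBSCAN pseudo labels, memory initialization from one random instance per cluster, an InfoNCE-style loss against the cluster memory, and a batch-hard momentum update. The use of scikit-learn's DBSCAN, the eps/min_samples/momentum values, and all function names are assumptions, not the authors' released code.

```python
# Sketch only (not the official Cluster Contrast code).
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import DBSCAN


def generate_pseudo_labels(features: np.ndarray, eps: float = 0.6, min_samples: int = 4):
    """Cluster L2-normalized features; label -1 marks noise (discarded before training)."""
    return DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean").fit_predict(features)


def init_cluster_memory(features: torch.Tensor, labels: np.ndarray) -> torch.Tensor:
    """One feature vector per cluster, initialized from a random instance of that cluster."""
    memory = []
    for cid in sorted(set(labels.tolist()) - {-1}):
        idx = np.random.choice(np.where(labels == cid)[0])
        memory.append(F.normalize(features[idx], dim=0))
    return torch.stack(memory)                                   # (num_clusters, D)


def cluster_nce_loss(query: torch.Tensor, targets: torch.Tensor,
                     memory: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE between mini-batch instance features and all cluster features in the memory."""
    logits = F.normalize(query, dim=1) @ memory.t() / temperature
    return F.cross_entropy(logits, targets)                      # targets: cluster ids


@torch.no_grad()
def update_memory(query: torch.Tensor, targets: torch.Tensor,
                  memory: torch.Tensor, momentum: float = 0.1) -> None:
    """Batch-hard momentum update: for each cluster appearing in the batch, move its memory
    entry toward the query instance that is least similar to it (the hardest one)."""
    q = F.normalize(query, dim=1)
    for cid in targets.unique():
        batch_feats = q[targets == cid]
        hardest = batch_feats[(batch_feats @ memory[cid]).argmin()]
        memory[cid] = F.normalize((1 - momentum) * memory[cid] + momentum * hardest, dim=0)
```

Per epoch, `generate_pseudo_labels` and `init_cluster_memory` would be re-run on freshly extracted features, and each training iteration would compute `cluster_nce_loss` on a P × K sampled mini-batch and then call `update_memory`, matching the three steps (initialization, loss computation, memory updating) described above.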