# Best-Buddies Similarity for Robust Template Matching (BBS)

Student: Nguyen Xuan Hoang (ID: 18020544)

> :memo: Original paper: **[Best-Buddies Similarity for Robust Template Matching][Best Buddies]**

[Best Buddies]: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43841.pdf

*Tali Dekel, MIT CSAIL; Shaul Oron, Tel Aviv University; Michael Rubinstein, Google Research; Shai Avidan, Tel Aviv University; William T. Freeman, MIT CSAIL*

> :pushpin: Source code: **github.com/hoangngx/BBS**

*Xiangyu (Shawn) Sun, Software Engineer at 2K*

### 1. Introduction

Nowadays there is an enormous number of methods for detecting an object in an image using deep learning. However, there also exists a simple, robust, and parameter-free way to do so, called template matching. Template matching is a technique in digital image processing for finding the small parts of an image that match a template image. It involves defining a measure, or cost, of the "similarity" between the (known) reference pattern and the (unknown) test pattern, and performing a matching operation with it.

![Imgur](https://i.imgur.com/e9JnQ1w.jpg)

**So why is template matching still not as widely used as deep learning for object detection?**

Template matching methods have been used with great success over the years, but they still suffer from a number of drawbacks. Typically, all pixels (or features) within the template and a candidate window in the target image are taken into account when measuring their similarity. This is undesirable in some cases, for example when the background behind the object changes between the template and the target image (which happens very often).

![Imgur](https://i.imgur.com/kJzUfxPh.png)

*Same dog, but the background has changed dramatically*

Thus, methods such as SSD (Sum of Squared Differences), SAD (Sum of Absolute Differences), and NCC (Normalized Cross-Correlation)
may take the background as part of the object when measuring the similarity between two images, which can lead to false detections of the template.

**But... BBS can solve that:**

BBS measures the similarity between two sets of points taken from two templates. A key feature of this measure is that it relies on only a subset (usually small) of the pairs of points: the Best-Buddies Pairs (BBPs). A pair of points is considered a BBP if each point is the nearest neighbor of the other in the corresponding point set. BBS is then taken to be the fraction of BBPs out of all the points in the sets. The experiment below will show that BBS can neglect most of the pixels in the background and focus on the desired object, which reduces the dissimilarity between the two templates (see part 3).

### 2. Why BBS?

As mentioned above, template matching techniques such as SSD, SAD, and NCC are used for their computational efficiency rather than their robustness. In addition, some more advanced approaches were designed to cope with noise and outliers, such as [M-estimators][M-s] or [Hamming-based distances][H-d]. However, all of the methods mentioned so far assume a strictly rigid geometric deformation (translation only) between the template and the target image, as they penalize pixel-wise differences at corresponding positions in the template and the query region.

[M-s]: https://en.wikipedia.org/wiki/M-estimator#:~:text=In%20statistics%2C%20M%2Destimators%20are,special%20cases%20of%20M%2Destimators.
[H-d]: https://en.wikipedia.org/wiki/Hamming_distance

The BBS is a bi-directional measure. It is related to the [Bidirectional Similarity (BDS)][BDS], which was used as a similarity measure between two images, where an image is represented by a set of patches. The BDS sums the distances between each patch in one image and its nearest neighbor in the other image, and vice versa. In contrast, the BBS is based only on a count of the BBPs. Moreover, the BDS does not distinguish between inliers and outliers.
These properties make BBS a more robust and reliable measure, as the experiments demonstrate.

[BDS]: http://www.wisdom.weizmann.ac.il/~vision/VisualSummary/bidirectional_similarity_CVPR2008.pdf

### 3. Method

To best understand how effectively BBS distinguishes outliers from inliers between two templates, let's consider an example.

![Imgur](https://i.imgur.com/06NifQlh.png)

The figure above shows two images (P: top left, Q: top right) before computing the BBPs (Best-Buddies Pairs) and after computing them (second row). We can consider the blue points to be the object we want to detect (foreground) and the red points to be the noise (background). For image P, where the foreground and background points are well separated, 95% of the BBPs are foreground points. For image Q, despite the significant overlap between foreground and background, 60% of the BBPs are foreground points. We can see that most of the red points (noise) in both templates vanish after finding the BBPs. This example demonstrates the robustness of BBS to high levels of outliers in the data: BBS captures the foreground points and does not force the background points to match.
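To make the definition concrete, here is a minimal, self-contained C++ sketch of the measure for two 2-D point sets, using brute-force nearest neighbors. Names like `Pt` and `bbs` are just illustrative; this is not the authors' code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

double sqDist(const Pt& a, const Pt& b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Index of the nearest neighbor of p within the set s (brute force).
std::size_t nearest(const Pt& p, const std::vector<Pt>& s) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < s.size(); ++i)
        if (sqDist(p, s[i]) < sqDist(p, s[best])) best = i;
    return best;
}

// BBS: the fraction of Best-Buddies Pairs (mutual nearest neighbors)
// relative to the size of the smaller set.
double bbs(const std::vector<Pt>& P, const std::vector<Pt>& Q) {
    int pairs = 0;
    for (std::size_t i = 0; i < P.size(); ++i) {
        std::size_t j = nearest(P[i], Q);    // NN of p_i in Q
        if (nearest(Q[j], P) == i) ++pairs;  // mutual? -> it is a BBP
    }
    return double(pairs) / double(std::min(P.size(), Q.size()));
}
```

For two identical sets every point is its own best buddy and the score is 1; an outlier far from the other set finds a nearest neighbor, but that neighbor usually prefers some other point, so the outlier contributes nothing to the count.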
**Now let's dive into some math :face_with_open_mouth_vomiting:**

![Imgur](https://i.imgur.com/33dOBec.png)

I found an interesting piece of code for this experiment, written in C++ and OpenCV 2.

> code to compute the BBS value of each point

```
vector<float> minVal1, minVal2;
vector<int> idx1, idx2, ii1, ii2;
// Nearest neighbor of each point of P in Q (row-wise minima of D)
for (int ix = 0; ix < N; ix++) {
    auto min1 = min_element(begin(D[ix]), end(D[ix]));
    minVal1.push_back(*min1);
    idx1.push_back(distance(begin(D[ix]), min1));
    ii1.push_back(ix * N + idx1[ix]);
}
// Nearest neighbor of each point of Q in P (row-wise minima of D_r)
for (int ix = 0; ix < N; ix++) {
    auto min2 = min_element(begin(D_r[ix]), end(D_r[ix]));
    minVal2.push_back(*min2);
    idx2.push_back(distance(begin(D_r[ix]), min2));
    ii2.push_back(ix * N + idx2[ix]);
}
vector<vector<int>> IDX_MAT1, IDX_MAT2;
IDX_MAT1.resize(N);
IDX_MAT2.resize(N);
for (int i = 0; i < N; i++) {
    IDX_MAT1[i].resize(N);
    IDX_MAT2[i].resize(N);
}
int sum = 0;
int pt1 = 0, pt2 = 0;
// Mark the nearest-neighbor choices of both directions and count the
// cells where they agree: those are the Best-Buddies Pairs.
for (int ix = 0; ix < N; ix++) {
    for (int jx = 0; jx < N; jx++) {
        IDX_MAT1[ix][jx] = 0;    // becomes 1 where q_jx is the NN of p_ix
        IDX_MAT2[ix][jx] = 999;  // sentinel; becomes 1 where p_ix is the NN of q_jx
        if (pt1 < N && ((ix * N + jx) == ii1[pt1])) {
            IDX_MAT1[ix][jx] = 1;
            pt1++;
        }
        if (pt2 < N && ((jx * N + ix) == ii2[pt2])) {
            IDX_MAT2[ix][jx] = 1;
            pt2++;
        }
        if (IDX_MAT2[ix][jx] == IDX_MAT1[ix][jx])  // both 1 -> mutual NN
            sum += 1;
    }
}
BBS[rowi][coli] = sum;  // BBS score of the candidate window at (rowi, coli)
```

![Imgur](https://i.imgur.com/iSs9unb.png)

> code to compute the distance matrix

```
for (int idxP = 0; idxP < N; idxP++) {
    Mat Temp(9, N, CV_32FC3);
    // Squared, Gaussian-weighted RGB differences between patch idxP of P
    // and every patch of T (each 3x3 patch contributes 9 rows)
    for (int i = 0; i < Temp.cols; i++) {
        for (int j = 0; j < Temp.rows; j++) {
            for (int ch = 0; ch < 3; ch++) {
                float d = (PMat.at<Vec3f>(j, idxP)[ch] - TMat.at<Vec3f>(j, i)[ch]) * Gaussian[j];
                Temp.at<Vec3f>(j, i)[ch] = d * d;
            }
        }
    }
    // Sum the 9 weighted pixel differences of column jx into one RGB
    // distance per patch pair (the original indexed column idxP here,
    // which made every jx produce the same sum)
    for (int jx = 0; jx < N; jx++) {
        float res = 0;
        for (int ix = 0; ix < 9; ix++) {
            if (D[ix][jx] < 1e-4) D[ix][jx] = 0;  // clamp numerically tiny distances
            res += Temp.at<Vec3f>(ix, jx)[0];
            res += Temp.at<Vec3f>(ix, jx)[1];
            res += Temp.at<Vec3f>(ix, jx)[2];
        }
        Drgb[jx][idxP] = res;
    }
}
```

### 4. Result

- The experiment takes place on real-world data. It compares BBS with six similarity measures commonly used for template matching:
  1) Sum-of-Squared-Differences (SSD)
  2) Sum-of-Absolute-Differences (SAD)
  3) Normalized Cross-Correlation (NCC)
  4) Color Histogram Matching (HM)
  5) Earth Mover's Distance (EMD)
  6) Bidirectional Similarity (BDS), computed in the same appearance-location space as BBS

![Imgur](https://i.imgur.com/50LrfQO.png)

**Example results with annotated data.** Left: input images with the annotated template marked in green. Right: target images and the detected bounding boxes; the ground truth (GT) is marked in green (the experiment results in blue). BBS successfully matches the template in all these examples.

- In our own experiment with C++ and OpenCV 2, the data are images of two soda cans. The final result was much as we expected.

![Imgur](https://i.imgur.com/frv9M3X.png)

**Example result with our own data, written in C++ and OpenCV 2.** Left: input image with the annotated template marked in green. Right: target image and the detected bounding boxes; the ground truth (GT) is marked in green (the BBS result in red).
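The detection step behind these results, computing a BBS score for every candidate window and keeping the argmax, can be sketched on a toy grayscale image. This is a simplified, self-contained illustration under stated assumptions: it uses per-pixel points in a joint intensity-location space with an assumed weight `lambda`, rather than the paper's 3x3 color patches, and all names (`FPoint`, `bbsScore`, `match`) are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// One pixel as a point in joint appearance-location space: (intensity, row, col).
struct FPoint { double v, r, c; };

// Squared distance; the location term is weighted by lambda.
static double dist(const FPoint& a, const FPoint& b, double lambda) {
    double dv = a.v - b.v, dr = a.r - b.r, dc = a.c - b.c;
    return dv * dv + lambda * (dr * dr + dc * dc);
}

static std::size_t nn(const FPoint& p, const std::vector<FPoint>& s, double lambda) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < s.size(); ++i)
        if (dist(p, s[i], lambda) < dist(p, s[best], lambda)) best = i;
    return best;
}

// Convert the h-by-w window at (r0, c0) into a point set (window-relative coords).
static std::vector<FPoint> window(const std::vector<std::vector<double>>& img,
                                  int r0, int c0, int h, int w) {
    std::vector<FPoint> pts;
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c)
            pts.push_back({img[r0 + r][c0 + c], double(r), double(c)});
    return pts;
}

// Fraction of mutual nearest neighbors (BBPs) out of the smaller set.
static double bbsScore(const std::vector<FPoint>& P, const std::vector<FPoint>& Q,
                       double lambda) {
    int pairs = 0;
    for (std::size_t i = 0; i < P.size(); ++i) {
        std::size_t j = nn(P[i], Q, lambda);
        if (nn(Q[j], P, lambda) == i) ++pairs;
    }
    return double(pairs) / double(std::min(P.size(), Q.size()));
}

// Slide the template over every window; return (bestRow, bestCol, bestScore).
std::tuple<int, int, double> match(const std::vector<std::vector<double>>& img,
                                   const std::vector<std::vector<double>>& tmpl,
                                   double lambda) {
    int H = (int)img.size(), W = (int)img[0].size();
    int h = (int)tmpl.size(), w = (int)tmpl[0].size();
    std::vector<FPoint> T = window(tmpl, 0, 0, h, w);
    int bestR = 0, bestC = 0;
    double bestS = -1.0;
    for (int r = 0; r + h <= H; ++r)
        for (int c = 0; c + w <= W; ++c) {
            double s = bbsScore(T, window(img, r, c, h, w), lambda);
            if (s > bestS) { bestS = s; bestR = r; bestC = c; }
        }
    return {bestR, bestC, bestS};
}
```

Where the template appears verbatim in the image, every point pairs with its exact copy and the score peaks at 1; elsewhere only a fraction of the points find mutual matches, which is what pushes the argmax toward the true location.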