# ORB-SLAM Relocalization

## Overview

When ORB-SLAM relocalizes, there is no previous-frame information, so the initial pose must be estimated globally:

1. Compute the BoW vector of the current frame (mCurrentFrame.ComputeBoW()).
2. Find candidate keyframes that share enough words with the current frame and whose covisibility score is high enough (KeyFrameDatabase::DetectRelocalizationCandidates).
3. Get the map points matched between the current frame and each candidate keyframe, and discard candidates with fewer than 15 matches (ORBmatcher::SearchByBoW).
4. Create a PnPsolver for each remaining candidate.
5. Get Tcw from the PnPsolver and optimize it.

## BOW

![](https://i.imgur.com/yS0nGqL.png)
![](https://i.imgur.com/jWHN8t5.png)

1. Extract ORB features from images and build the vocabulary tree (ORBvoc.txt is built offline), which clusters features into groups (nodes) using k-means++ based on Hamming distance.
2. Each cluster is represented by the mean descriptor of all features in it (meanValue); this representative is called a word.
3. DBoW2 uses two data structures, an inverse index and a direct index. The code below computes the bag-of-words vectors mBowVec and mFeatVec, where mFeatVec records the direct index of all nodes at level 4 of the tree.

```
// Look up the vocabulary with this frame's descriptors to get its BoW representation
void Frame::ComputeBoW()
{
    if(mBowVec.empty())
    {
        vector<cv::Mat> vCurrentDesc = Converter::toDescriptorVector(mDescriptors);
        mpORBvocabulary->transform(vCurrentDesc,mBowVec,mFeatVec,4); // level 4 of the tree
    }
}
```

4. Direct index: frame -> nodes -> features

```
/// Vector of nodes with indexes of local features
class FeatureVector: public std::map<NodeId,std::vector<unsigned int> >
// NodeId: node index in the vocabulary tree
// vector<unsigned int>: indices of the features under that node
```

Each frame's ORB features are grouped into nodes, and each node holds the features belonging to its cluster (see the matching sketch at the end of this section).

![](https://i.imgur.com/4q33Q2Z.png)

Part of ORBmatcher::SearchByBoW
![](https://i.imgur.com/NhrOqPD.png)

5. Inverse index: word -> keyframes -> word weight in each keyframe

![](https://i.imgur.com/mijwZiK.png)
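To make the direct index concrete, below is a minimal sketch of how ORBmatcher::SearchByBoW-style matching uses the FeatureVector: descriptors are compared only when their features fall under the same vocabulary node, instead of brute-forcing all pairs. This sketch is a simplification that omits ORB-SLAM's ratio test and rotation-consistency check; DescriptorDistance and TH_LOW (Hamming threshold, 50) are ORB-SLAM's own helpers, while SearchByBoWSketch is a hypothetical name.

```
// Sketch only: BoW-accelerated matching between a keyframe and the current
// frame, assuming the ORB-SLAM types (KeyFrame, Frame, MapPoint) are available.
#include <vector>
#include "ORBmatcher.h" // DescriptorDistance, TH_LOW

int SearchByBoWSketch(KeyFrame *pKF, Frame &F, std::vector<MapPoint*> &vpMapPointMatches)
{
    const DBoW2::FeatureVector &vFeatVecKF = pKF->mFeatVec;
    vpMapPointMatches = std::vector<MapPoint*>(F.N, static_cast<MapPoint*>(NULL));
    int nmatches = 0;

    DBoW2::FeatureVector::const_iterator KFit = vFeatVecKF.begin();
    DBoW2::FeatureVector::const_iterator Fit  = F.mFeatVec.begin();

    while(KFit != vFeatVecKF.end() && Fit != F.mFeatVec.end())
    {
        if(KFit->first == Fit->first) // both frames have features under this node
        {
            for(unsigned int idxKF : KFit->second) // keyframe features in this node
            {
                MapPoint *pMP = pKF->GetMapPoint(idxKF);
                if(!pMP || pMP->isBad())
                    continue;

                // Find the current-frame feature (same node) with the smallest
                // Hamming distance to this keyframe feature.
                int bestDist = 256, bestIdxF = -1;
                for(unsigned int idxF : Fit->second)
                {
                    const int dist = ORBmatcher::DescriptorDistance(
                        pKF->mDescriptors.row(idxKF), F.mDescriptors.row(idxF));
                    if(dist < bestDist) { bestDist = dist; bestIdxF = idxF; }
                }
                if(bestDist <= ORBmatcher::TH_LOW)
                {
                    vpMapPointMatches[bestIdxF] = pMP;
                    nmatches++;
                }
            }
            KFit++; Fit++;
        }
        else if(KFit->first < Fit->first) // skip nodes present in only one frame
            KFit = vFeatVecKF.lower_bound(Fit->first);
        else
            Fit = F.mFeatVec.lower_bound(KFit->first);
    }
    return nmatches;
}
```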
## Frame and Mappoints

http://zhaoxuhui.top/blog/2021/07/30/orb-slam3-note-3-frame-and-mappoints.html

## Relocalization

```
bool Tracking::Relocalization()
{
    // Compute Bag of Words Vector
    mCurrentFrame.ComputeBoW();

    // Relocalization is performed when tracking is lost
    // Track Lost: Query KeyFrame Database for keyframe candidates for relocalisation
    vector<KeyFrame*> vpCandidateKFs = mpKeyFrameDB->DetectRelocalizationCandidates(&mCurrentFrame);

    if(vpCandidateKFs.empty())
        return false;

    const int nKFs = vpCandidateKFs.size();

    // We perform first an ORB matching with each candidate
    // If enough matches are found we setup a PnP solver
    // PnP problem: given map points with world coordinates Pw1, Pw2, ..., Pwn
    // and their matched pixel coordinates u1, u2, ..., un in this frame,
    // solve for the camera pose Tcw.
    /* ORBmatcher finds the best 3D-2D and 2D-2D matches by computing the
       distance between descriptors (DescriptorDistance), using projection,
       DBoW2, triangulation, Sim3, etc.
       https://blog.csdn.net/u012936940/article/details/81396061 */
    ORBmatcher matcher(0.75,true);

    vector<PnPsolver*> vpPnPsolvers;
    vpPnPsolvers.resize(nKFs);

    vector<vector<MapPoint*> > vvpMapPointMatches;
    vvpMapPointMatches.resize(nKFs);

    vector<bool> vbDiscarded;
    vbDiscarded.resize(nKFs);

    int nCandidates=0;

    for(int i=0; i<nKFs; i++)
    {
        KeyFrame* pKF = vpCandidateKFs[i];
        if(pKF->isBad())
            vbDiscarded[i] = true;
        else
        {
            // ORBmatcher::SearchByBoW matches features between the keyframe and
            // the current frame via BoW; it fills vvpMapPointMatches[i] through
            // the reference argument and returns nmatches.
            int nmatches = matcher.SearchByBoW(pKF,mCurrentFrame,vvpMapPointMatches[i]);
            if(nmatches<15)
            {
                vbDiscarded[i] = true;
                continue;
            }
            else
            {
                PnPsolver* pSolver = new PnPsolver(mCurrentFrame,vvpMapPointMatches[i]);
                pSolver->SetRansacParameters(0.99,10,300,4,0.5,5.991);
                vpPnPsolvers[i] = pSolver;
                nCandidates++;
            }
        }
    }

    // Alternatively perform some iterations of P4P RANSAC
    // Until we found a camera pose supported by enough inliers
    bool bMatch = false;
    ORBmatcher matcher2(0.9,true);

    // For each remaining candidate, estimate the camera pose Tcw with its PnP solver
    while(nCandidates>0 && !bMatch)
    {
        for(int i=0; i<nKFs; i++)
        {
            if(vbDiscarded[i])
                continue;

            // Perform 5 Ransac Iterations
            vector<bool> vbInliers;
            int nInliers;
            bool bNoMore;

            PnPsolver* pSolver = vpPnPsolvers[i];
            cv::Mat Tcw = pSolver->iterate(5,bNoMore,vbInliers,nInliers);

            // If Ransac reaches max. iterations discard keyframe
            if(bNoMore)
            {
                vbDiscarded[i]=true;
                nCandidates--;
            }

            // If a Camera Pose is computed, optimize
            if(!Tcw.empty())
            {
                Tcw.copyTo(mCurrentFrame.mTcw);

                set<MapPoint*> sFound;

                const int np = vbInliers.size();

                for(int j=0; j<np; j++)
                {
                    if(vbInliers[j])
                    {
                        mCurrentFrame.mvpMapPoints[j]=vvpMapPointMatches[i][j];
                        sFound.insert(vvpMapPointMatches[i][j]);
                    }
                    else
                        mCurrentFrame.mvpMapPoints[j]=NULL;
                }

                int nGood = Optimizer::PoseOptimization(&mCurrentFrame);

                if(nGood<10)
                    continue;

                for(int io =0; io<mCurrentFrame.N; io++)
                    if(mCurrentFrame.mvbOutlier[io])
                        mCurrentFrame.mvpMapPoints[io]=static_cast<MapPoint*>(NULL);

                // If few inliers, search by projection in a coarse window and optimize again
                if(nGood<50)
                {
                    int nadditional =matcher2.SearchByProjection(mCurrentFrame,vpCandidateKFs[i],sFound,10,100);

                    if(nadditional+nGood>=50)
                    {
                        nGood = Optimizer::PoseOptimization(&mCurrentFrame);

                        // If many inliers but still not enough, search by projection again in a narrower window
                        // the camera has been already optimized with many points
                        if(nGood>30 && nGood<50)
                        {
                            sFound.clear();
                            for(int ip =0; ip<mCurrentFrame.N; ip++)
                                if(mCurrentFrame.mvpMapPoints[ip])
                                    sFound.insert(mCurrentFrame.mvpMapPoints[ip]);
                            nadditional =matcher2.SearchByProjection(mCurrentFrame,vpCandidateKFs[i],sFound,3,64);

                            // Final optimization
                            if(nGood+nadditional>=50)
                            {
                                nGood = Optimizer::PoseOptimization(&mCurrentFrame);

                                for(int io =0; io<mCurrentFrame.N; io++)
                                    if(mCurrentFrame.mvbOutlier[io])
                                        mCurrentFrame.mvpMapPoints[io]=NULL;
                            }
                        }
                    }
                }

                // If the pose is supported by enough inliers stop ransacs and continue
                if(nGood>=50)
                {
                    bMatch = true;
                    break;
                }
            }
        }
    }

    if(!bMatch)
    {
        return false;
    }
    else
    {
        mnLastRelocFrameId = mCurrentFrame.mnId;
        return true;
    }
}
```
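The matcher comment above refers to DescriptorDistance. In ORB-SLAM2 it is the Hamming distance between two 256-bit ORB descriptors, computed with the classic parallel bit-count from the Stanford bit-twiddling hacks (reproduced here for reference):

```
// Bit set count operation from
// http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
int ORBmatcher::DescriptorDistance(const cv::Mat &a, const cv::Mat &b)
{
    const int *pa = a.ptr<int32_t>();
    const int *pb = b.ptr<int32_t>();

    int dist=0;

    // A 256-bit ORB descriptor is 8 x 32-bit words; XOR exposes the differing
    // bits, then each word's set bits are counted in parallel.
    for(int i=0; i<8; i++, pa++, pb++)
    {
        unsigned int v = *pa ^ *pb;
        v = v - ((v >> 1) & 0x55555555);
        v = (v + (v >> 2)) & 0x33333333;
        dist += (((v + (v >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24;
    }

    return dist;
}
```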
DetectRelocalizationCandidates returns the candidate keyframes that share enough words with the query frame and whose accumulated covisibility score is above 0.75*bestAccScore:

```
vector<KeyFrame*> KeyFrameDatabase::DetectRelocalizationCandidates(Frame *F)
{
    list<KeyFrame*> lKFsSharingWords; // keyframes sharing at least one word with the current frame

    // Search all keyframes that share a word with current frame
    {
        unique_lock<mutex> lock(mMutex);

        // For each word in the frame's BoW vector, look up mvInvertedFile to get
        // the keyframes containing that word and collect them as candidates.
        for(DBoW2::BowVector::const_iterator vit=F->mBowVec.begin(), vend=F->mBowVec.end(); vit != vend; vit++)
        {
            list<KeyFrame*> &lKFs = mvInvertedFile[vit->first];

            for(list<KeyFrame*>::iterator lit=lKFs.begin(), lend= lKFs.end(); lit!=lend; lit++)
            {
                KeyFrame* pKFi=*lit;
                if(pKFi->mnRelocQuery!=F->mnId) // pKFi not yet tagged as a candidate for this query
                {
                    pKFi->mnRelocWords=0;       // initialize on first insertion
                    pKFi->mnRelocQuery=F->mnId; // tag as candidate for this query
                    lKFsSharingWords.push_back(pKFi); // add to candidate list
                }
                pKFi->mnRelocWords++; // count how many words this candidate shares with the current frame
            }
        }
    }
    if(lKFsSharingWords.empty())
        return vector<KeyFrame*>();

    // Only compare against those keyframes that share enough words
    int maxCommonWords=0; // find the candidate that shares the most words
    for(list<KeyFrame*>::iterator lit=lKFsSharingWords.begin(), lend= lKFsSharingWords.end(); lit!=lend; lit++)
    {
        if((*lit)->mnRelocWords>maxCommonWords)
            maxCommonWords=(*lit)->mnRelocWords;
    }

    int minCommonWords = maxCommonWords*0.8f;

    list<pair<float,KeyFrame*> > lScoreAndMatch;

    int nscores=0;

    // Compute similarity score.
    for(list<KeyFrame*>::iterator lit=lKFsSharingWords.begin(), lend= lKFsSharingWords.end(); lit!=lend; lit++)
    {
        KeyFrame* pKFi = *lit;

        if(pKFi->mnRelocWords>minCommonWords)
        {
            nscores++;
            float si = mpVoc->score(F->mBowVec,pKFi->mBowVec);
            pKFi->mRelocScore=si;
            lScoreAndMatch.push_back(make_pair(si,pKFi));
        }
    }

    if(lScoreAndMatch.empty())
        return vector<KeyFrame*>();

    list<pair<float,KeyFrame*> > lAccScoreAndMatch;
    float bestAccScore = 0;

    // Lets now accumulate score by covisibility
    for(list<pair<float,KeyFrame*> >::iterator it=lScoreAndMatch.begin(), itend=lScoreAndMatch.end(); it!=itend; it++)
    {
        KeyFrame* pKFi = it->second;
        vector<KeyFrame*> vpNeighs = pKFi->GetBestCovisibilityKeyFrames(10);

        float bestScore = it->first;
        float accScore = bestScore;
        KeyFrame* pBestKF = pKFi;
        for(vector<KeyFrame*>::iterator vit=vpNeighs.begin(), vend=vpNeighs.end(); vit!=vend; vit++)
        {
            KeyFrame* pKF2 = *vit;
            if(pKF2->mnRelocQuery!=F->mnId)
                continue;

            accScore+=pKF2->mRelocScore;
            if(pKF2->mRelocScore>bestScore)
            {
                pBestKF=pKF2;
                bestScore = pKF2->mRelocScore;
            }
        }
        lAccScoreAndMatch.push_back(make_pair(accScore,pBestKF));
        if(accScore>bestAccScore)
            bestAccScore=accScore;
    }

    // Return all those keyframes with a score higher than 0.75*bestScore
    float minScoreToRetain = 0.75f*bestAccScore;
    set<KeyFrame*> spAlreadyAddedKF;
    vector<KeyFrame*> vpRelocCandidates;
    vpRelocCandidates.reserve(lAccScoreAndMatch.size());
    for(list<pair<float,KeyFrame*> >::iterator it=lAccScoreAndMatch.begin(), itend=lAccScoreAndMatch.end(); it!=itend; it++)
    {
        const float &si = it->first;
        if(si>minScoreToRetain)
        {
            KeyFrame* pKFi = it->second;
            if(!spAlreadyAddedKF.count(pKFi))
            {
                vpRelocCandidates.push_back(pKFi);
                spAlreadyAddedKF.insert(pKFi);
            }
        }
    }

    // Return the candidates sharing enough common words whose accumulated
    // covisibility score is above 0.75*bestAccScore.
    return vpRelocCandidates;
}
```
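For context, mvInvertedFile queried above is the inverse index from the BOW section (word -> keyframes). It is populated whenever a keyframe is inserted into the database; in ORB-SLAM2 this is essentially the following:

```
void KeyFrameDatabase::add(KeyFrame *pKF)
{
    unique_lock<mutex> lock(mMutex);

    // For every word in the keyframe's BoW vector, append the keyframe to
    // that word's entry in the inverted file.
    for(DBoW2::BowVector::const_iterator vit= pKF->mBowVec.begin(), vend=pKF->mBowVec.end(); vit!=vend; vit++)
        mvInvertedFile[vit->first].push_back(pKF);
}
```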
### Fuse with RSSI

The radio map needs to be updated when an AP changes or the environment changes. Because the map was originally built with ORB-SLAM as ground truth, ORB-SLAM localization should remain the primary source.

1. Relocalization with ORB. Assume we have already built a radio map that maps an RSSI fingerprint to (x, y), and that we know the average loss (avgloss) between Tcw_wifi and Tcw_orb.
    1. Compute the BoW vector of the current frame (mCurrentFrame.ComputeBoW()).
    2. Find candidate keyframes that share enough words and whose covisibility score is high enough (KeyFrameDatabase::DetectRelocalizationCandidates).
    3. Get the map points matched between the current frame and each candidate keyframe, and discard candidates with fewer than 15 matches (ORBmatcher::SearchByBoW).
    4. Create a PnPsolver for each candidate keyframe.
    5. Get Tcw from the PnPsolver and optimize it.

   If this succeeds, start updating the radio map from the current location.
2. If it fails, move to a different position, or introduce WiFi localization as a complement:
    1. Get the current WiFi fingerprint.
    2. Estimate the pose by comparing the current fingerprint against the radio map we have built.
    3. Obtain Tcw_wifi from WiFi localization and Tcw_orb from ORB-SLAM relocalization using the candidates previously discarded for having fewer than 15 matches, then compute the loss between Tcw_wifi and Tcw_orb. If it is less than avgloss, set Tcw to a linear combination of Tcw_orb and Tcw_wifi (a sketch follows below).
3. If this still fails, rebuild the radio map.
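A minimal sketch of the "linear combination" step, assuming both poses are available as rigid transforms. The helper FusePoses, the weight w, and the use of quaternion slerp for the rotation part are all illustrative assumptions, not part of ORB-SLAM:

```
#include <Eigen/Geometry>

// Hypothetical helper (not ORB-SLAM code): blend an ORB-SLAM pose with a
// WiFi-derived pose. w in [0,1] is the trust placed in the ORB estimate.
// Translations are blended linearly; rotations are interpolated with
// quaternion slerp, because an element-wise average of rotation matrices
// is generally not a valid rotation.
Eigen::Isometry3d FusePoses(const Eigen::Isometry3d &Tcw_orb,
                            const Eigen::Isometry3d &Tcw_wifi,
                            double w)
{
    Eigen::Quaterniond q_orb(Tcw_orb.rotation());
    Eigen::Quaterniond q_wifi(Tcw_wifi.rotation());

    Eigen::Isometry3d Tcw = Eigen::Isometry3d::Identity();
    Tcw.linear() = q_wifi.slerp(w, q_orb).toRotationMatrix(); // w=1 -> pure ORB
    Tcw.translation() = w*Tcw_orb.translation() + (1.0-w)*Tcw_wifi.translation();
    return Tcw;
}
```

Since the map treats ORB-SLAM as ground truth, w would normally stay close to 1 and only drop when the ORB estimate is weak (few inliers).

###### tags: `SLAM`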