Fiduccia Mattheyses C++實作

# Fiduccia Mattheyses :::warning 進入本篇內容之前，需要先了解 HyperGraph 的基本概念及結構，並且在此僅涉及實作，不包含演算法原理、算式解析。 ::: --- ## 符號/單辭定義 | 符號 | 定義 | | --------- | --------------------------------------------------- | | **S** | 所有頂點的集合 | | **Lₛ** | 經演算法分割後的子集合1 | | **Rₛ** | 經演算法分割後的子集合2 | | gain | 移動該頂點造成的收益(可能為負) | | cut edge | 若一超邊為 cut edge，則他同時包含 Lₛ 和 Rₛ 中的頂點 | | lhs / rhs | 左側 / 右側 | --- ## 概述 FM演算法本質目的為將一集合 **S** 分割為兩個子集合 **Lₛ**, **Rₛ**，且具以下規範 1. **Lₛ** 和 **Rₛ** 之間的關聯越**低**越好 (即為 cut edge 越少越好) 2. 在子集合中，頂點之間的關聯越**高**越好 3. 能夠在線性時間下完成 --- ## 資料結構在實作演算法之前，首先定義 HyperGraph 的結構 ### HyperGraph | 資料 | usage | | ------------------ | ---------------------- | | VerticesMap | 即為所有頂點集合 **S** | | EdgesMap | 所有超邊的集合 | ### Vertex | 資料 | usage | | -------------- | ------------------------------------- | | ID | 在同一頂點集合內將會是唯一的 | | Name | Debug | | ConnectedEdges | 包含該頂點的超邊集合 | | Neighbors | 能通過 **1** 條超邊抵達的其他頂點集合 | ### Edge | 資料 | usage | | ----------------- | ---------------------------- | | ID | 在同一超邊集合內將會是唯一的 | | Name | Debug | | ConnectedVertices | 該超邊包含的所有頂點 | --- ## 演算法實作 ### gain 值計算公式 1. 當移動一個頂點會產生新的 Cut Edge 時，gain\-\- 2. 當移動一個頂點會減少一個 Cut Edge 時，gain++ 3. gain += (同子集鄰居數量 - 不同子集鄰居數量) * (映射至 +-1) ### 1. 初始化 Trait 對於在 Graph 中的所有頂點建立其特徵，包含以下資訊在初始化過程中， IsLHS 將依照奇偶數去賦值 (任意賦值) | name | usage | | ------ | ---------------------------------------------------------------------- | | Sorted | 是否被排序過 | | IsLHS | 該點的所在的子集合<br>&emsp;T: 在 **Lₛ** 中 <br> &emsp;F: 在 **Rₛ** 中 | | Gain | 該頂點的 gain 值 | ### 2. 迭代演算 #### 2.1 找出 **Lₛ** 及 **Rₛ** 中 Gain 值最大的頂點被標記為 Sorted 的頂點將不再列入計算，當沒有符合的頂點時，結束演算法。後續把找到的兩個結果稱為 lmax 及 rmax #### 2.2 判斷要移動的節點 - 在左右子集中的頂點數量失衡時，優先嘗試回復平衡 - 當 lmax 及 rmax 的 gain 相等時，可以隨機選用，在此實作中選擇能夠讓子集合連結性更高的頂點。 - 若 gain 不相等時，選擇 gain 值較高者 #### 2.3 移動該頂點並標記為 Sorted #### 2.4 更新可能受影響的 gain 值 gain 值只有在鄰居變動時才會更動，因此只須更新移動的頂點的鄰居頂點即可 ### 3. 輸出結果根據最終特徵將 **S** 拆分為 **Lₛ**, **Rₛ** 並計算 Cut Edge 的數量 ### 補充 - gain 值計算中的第3點並非原始演算法中的內容，其目的是添加頂點間的群聚性，讓原始 gain 值相同時優先把小集群合併至大集合 - 在 2.2 中的頂點數量失衡部分，在此實作中使用一個 [0, 0.5] 浮點數 `split` 來表達一個子集合最少必須擁有多少比例的頂點 --- ## 演算效能 :::info 該實作未經嚴謹的效能優化，此表僅供參考 ::: - 連接數為一個超邊可包含的最小、最大頂點數 ![](https://hackmd.io/_uploads/Hk8QWX3ya.png) 通過簡易測試我們大致可以發現以下規律 - 整體演算法時間複雜度大約可以表示為 O(n\*m)，其中 n 為頂點數，m 為超邊數，為線性關係。 - 失衡的超邊數量可能導致演算的時間大幅增加，如 1000 頂點，卻有 8000 個超邊。 - 在連結性更高的集合中 gain 值計算的成本會大幅增加，如超邊過多、超邊包含的頂點過多。 --- ## 演算結果在此以一個簡易的超圖作為範例，包含 8 的頂點 A~H ### case 1 超邊定義: 0.(A, B, C, D) 1.(E, F, G, H) ![](https://hackmd.io/_uploads/SyHHNm3JT.png)![](https://hackmd.io/_uploads/HyO9EX3yp.png) ### case 2 超邊定義: 0.(A, B, C, D) 1.(E, F, G, H) 2.(A, E) ![](https://hackmd.io/_uploads/r1OpE7nkp.png)![](https://hackmd.io/_uploads/BJ2K4mnkp.png) ### case 3 超邊定義: 0.(A, B, C, D) 1.(E, F, G, H) 2.(A, E) 3.(A, B, E) ![](https://hackmd.io/_uploads/HkDarX21T.png)![](https://hackmd.io/_uploads/rJmxHm316.png) --- ## 參考資料 - [Fiduccia-Mattheyses algorithm概述](https://blog.csdn.net/qinchun_bei/article/details/101067957) - [Fiduccia Mattheyses演算法(即FM演算法)的論文解讀](https://www.gushiciku.cn/pl/a8n0/zh-tw) - [超图学习（Learning with hypergraphs）（一）](https://zhuanlan.zhihu.com/p/361471954) - NCU Course (private) --- ## TODO - 效能優化 - 實作 Bucket - Trait 存儲模式 - Graph 相關物件建構 --- ## 原始碼 :::info 使用 VS2022 C++11 編寫 ::: ``` cpp= #include <iostream> #include <set> #include <map> #include <vector> #include <string> #include <stdlib.h> #include <algorithm> using namespace std; #pragma region class def class HyperGraph // 超圖 { public: class Vertex; class Edge; HyperGraph(); ~HyperGraph(); Vertex* GetVertex(int id, string name = ""); // 取得或創建新的 Vertex Edge* GetEdge(int id, string name = ""); // 取得或創建新的 Edge Vertex* CreateVertex(int id, string name = ""); // 創建新的 Vertex Edge* CreateEdge(int id, initializer_list<int> connectVertices = {}, string name = ""); // 創建新的 Edge，可初始化連接的 Vertices const map<int, Vertex*> VerticesMap(); // 取得超圖內所有頂點 const map<int, Edge*> EdgesMap(); // 取得超圖內所有超邊 friend std::ostream& operator<< (std::ostream& os, HyperGraph& obj); private: map<int, Vertex*> verticesMap; map<int, Edge*> edgesMap; }; class HyperGraph::Vertex { friend HyperGraph; friend Edge; public: ~Vertex(); const int ID; string Name = ""; const set<Edge*>& ConnectedEdges(); // 取得所有與該頂點相連的超邊 bool Contains(Edge* edge); // 該頂點是否連接輸入超邊 void Connect(Edge* edge); // 將該頂點連接至輸入超邊 void DisConnect(Edge* edge); // 將該頂點從輸入超邊斷開 set<Vertex*> GetNeighbors(); // 取得該頂點的鄰居 (能通過任意超邊抵達的另一頂點) friend std::ostream& operator<< (std::ostream& os, Vertex& obj); private: Vertex(); Vertex(int id, string name = ""); set<Edge*> connectedEdges; }; class HyperGraph::Edge { friend HyperGraph; friend Vertex; public: ~Edge(); const int ID; string Name = ""; const set<Vertex*>& ConnectedVertices(); // 取得所有與該超邊相連的頂點 bool Contains(Vertex* vertex); // 該超邊是否包含輸入頂點 void Connect(Vertex* vertex); // 將輸入超邊連接至該頂點 void DisConnect(Vertex* vertex); // 將輸入頂點從該超邊斷開 friend std::ostream& operator<< (std::ostream& os, Edge& obj); private: Edge(); Edge(int id, string name = ""); set<Vertex*> connectedVertices; }; class FiducciaMattheyses { public: struct Trait { bool Sorted = false; bool IsLHS; float Gain; }; FiducciaMattheyses(HyperGraph& g, float split = 0.5); // 建構子 & 實做函式 ~FiducciaMattheyses(); HyperGraph& Source; set<HyperGraph::Vertex*> LHS, RHS; set<HyperGraph::Edge*> CutEdges; friend std::ostream& operator<< (std::ostream& os, FiducciaMattheyses& obj); private: map<HyperGraph::Vertex*, Trait> traitTable; float con3step; // (1.0 / vertices count) void calGain(HyperGraph::Vertex* vertex); }; #pragma endregion //#define DEBUG int main(int argc, char* argv[]) { HyperGraph graph; {// Create Vertices int i = 0; for (auto str : { "A","B","C","D","E","F","G","H" }) graph.CreateVertex(i++, str); } // 實作演算法時預切分是奇偶分類，所以以下為預設最耦合的超邊連接 graph.CreateEdge(0, { 0,1,2,3 }); graph.CreateEdge(1, { 4,5,6,7 }); //graph.CreateEdge(2, { 0,4 }); //graph.CreateEdge(3, { 0,1,4 }); cout << graph << '\n'; auto fm = FiducciaMattheyses(graph, 0); cout << fm << '\n'; return 0; } #pragma region HyperGraph HyperGraph::HyperGraph() { } HyperGraph::~HyperGraph() { for (auto v : verticesMap) delete v.second; for (auto e : edgesMap) delete e.second; } HyperGraph::Vertex* HyperGraph::GetVertex(int id, string name) { if (!verticesMap.count(id)) verticesMap[id] = new Vertex(id, name); return verticesMap[id]; } HyperGraph::Vertex* HyperGraph::CreateVertex(int id, string name) { if (verticesMap.count(id)) { cout << "Already Contains Vertex " << id << '\n'; return nullptr; } auto v = GetVertex(id, name); return v; } const map<int, HyperGraph::Vertex*> HyperGraph::VerticesMap() { return this->verticesMap; } const map<int, HyperGraph::Edge*> HyperGraph::EdgesMap() { return this->edgesMap; } HyperGraph::Edge* HyperGraph::GetEdge(int id, string name) { if (!edgesMap.count(id)) edgesMap[id] = new Edge(id, name); return edgesMap[id]; } HyperGraph::Edge* HyperGraph::CreateEdge(int id, initializer_list<int> connectVertices, string name) { if (edgesMap.count(id)) { cout << "Already Contains Edge " << id << '\n'; return nullptr; } auto e = GetEdge(id, name); for (int v : connectVertices) e->Connect(GetVertex(v)); return e; } #pragma endregion #pragma region Vertex HyperGraph::Vertex::Vertex() :ID(-1) {} HyperGraph::Vertex::Vertex(int id, string name) :ID(id), Name(name) {} HyperGraph::Vertex::~Vertex() { } const set<HyperGraph::Edge*>& HyperGraph::Vertex::ConnectedEdges() { return this->connectedEdges; } bool HyperGraph::Vertex::Contains(Edge* edge) { return connectedEdges.count(edge); } void HyperGraph::Vertex::Connect(Edge* edge) { edge->Connect(this); } void HyperGraph::Vertex::DisConnect(Edge* edge) { edge->DisConnect(this); } set<HyperGraph::Vertex*> HyperGraph::Vertex::GetNeighbors() { auto result = set<Vertex*>(); for (auto e : ConnectedEdges()) { for (auto v : e->ConnectedVertices()) { if (v->ID == ID) continue; result.insert(v); } } return result; } #pragma endregion #pragma region Edge HyperGraph::Edge::Edge() :ID(-1) {} HyperGraph::Edge::Edge(int id, string name) :ID(id), Name(name) {} HyperGraph::Edge::~Edge() { } const set<HyperGraph::Vertex*>& HyperGraph::Edge::ConnectedVertices() { return this->connectedVertices; } bool HyperGraph::Edge::Contains(Vertex* vertex) { return connectedVertices.count(vertex); } void HyperGraph::Edge::Connect(Vertex* vertex) { if (Contains(vertex)) return; connectedVertices.insert(vertex); vertex->connectedEdges.insert(this); } void HyperGraph::Edge::DisConnect(Vertex* vertex) { if (!Contains(vertex)) return; connectedVertices.erase(vertex); vertex->connectedEdges.erase(this); } #pragma endregion #pragma region FiducciaMattheyses FiducciaMattheyses::FiducciaMattheyses(HyperGraph& g, float split) : Source(g), con3step(1.0 / Source.VerticesMap().size()) { // 預建置 Vertex Trait bool isOdd = true; for (auto pair : g.VerticesMap()) { traitTable[pair.second] = Trait(); traitTable[pair.second].IsLHS = isOdd; isOdd = !isOdd; } // 初始 Gain 值計算 for (auto pair : g.VerticesMap()) calGain(pair.second); int minSize = Source.VerticesMap().size(); int maxSize = minSize; minSize *= split; maxSize -= minSize; auto compGain = [&](HyperGraph::Vertex* lhs, HyperGraph::Vertex* rhs)->bool {return traitTable[lhs].Gain < traitTable[rhs].Gain; }; // 直到所有 Vertex 都被計算過 while (1) { // O(n) 尋找左/右子集中 Gain 值最大的 Vertex // 當所有 Vertex 都被標記為 Sorted 時，結束迴圈 HyperGraph::Vertex* lMax = nullptr, * rMax = nullptr; int lsize = 0, rsize = 0; int curLmax = 0, curRmax = 0; bool finish = true; for (auto t : traitTable) { if (!t.second.Sorted)finish = false; if (t.second.IsLHS) { lsize++; if (t.second.Sorted) continue; if ((lMax == nullptr) || (t.second.Gain > curLmax)) { lMax = t.first; curLmax = t.second.Gain; } } else { rsize++; if (t.second.Sorted) continue; if ((rMax == nullptr) || (t.second.Gain > curRmax)) { rMax = t.first; curRmax = t.second.Gain; } } } if (finish) break; // T: R2L, F: L2R // 判斷要進行移動的 Vertex bool moveSide; HyperGraph::Vertex* toMove = nullptr; if (lMax == nullptr) { moveSide = true; toMove = rMax; } else if (rMax == nullptr) { moveSide = false; toMove = lMax; } else { float lgain = traitTable[lMax].Gain, rgain = traitTable[rMax].Gain; // 當左右 Gain 值都 <= 0 時，提前結束演算 // (意即此時移動任何節點，都是無/負收益) if ((lgain <= 0) && (rgain <= 0)) break; if (lsize < minSize) moveSide = true; // 左子集過小 else if (lsize > maxSize) moveSide = false; // 左子集過大 else if (lgain == rgain) moveSide = (lsize < rsize); // 效益相等，選較小的子集 else moveSide = (lgain < rgain); // 選取效益較佳的子集 toMove = (moveSide ? rMax : lMax); #ifdef DEBUG cout << "gain LHS: " << lgain << ", RHS: " << rgain << '\n'; if (moveSide) cout << "move " << ((toMove->Name == "") ? to_string(toMove->ID) : toMove->Name) << " to Left\n"; else cout << "move " << ((toMove->Name == "") ? to_string(toMove->ID) : toMove->Name) << " to Right\n"; #endif // DEBUG } // 移動節點 traitTable[toMove].IsLHS = !traitTable[toMove].IsLHS; // 重新計算可能被影響到的 Gain 值 for (auto v : toMove->GetNeighbors()) { calGain(v); } // 標記被移動的節點為 Sorted traitTable[toMove].Sorted = true; } // 演算結束後，根據標記 insert result for (auto pair : traitTable) { if (pair.second.IsLHS) LHS.insert(pair.first); else RHS.insert(pair.first); } // 計算那些超邊為 Cut Edge for (auto e : Source.EdgesMap()) { auto vertices = e.second->ConnectedVertices(); if (vertices.empty()) continue; bool side = traitTable.at(*vertices.begin()).IsLHS; for (auto v : vertices) { if (side != traitTable[v].IsLHS) { CutEdges.insert(e.second); break; } } } traitTable.clear(); } FiducciaMattheyses::~FiducciaMattheyses() { } void FiducciaMattheyses::calGain(HyperGraph::Vertex* vertex) { float& gain = traitTable[vertex].Gain; gain = 0; int eCount = 0; float totalDiff = 0; float totalSame = 0; for (auto e : vertex->ConnectedEdges()) { // 此處的 Gain( v(i)) 值定義 // 1. 每當有一 e(j) 包含 v(i)，且 e(j) 中沒有其他 v 在 v(i) 的分區: gain +1 // 即為若移動此節點，會減少一個 Cut Edge // 2. 每當有一 e(j) 包含 v(i)，且 e(j) 中所有的 v 都在 v(i) 的分區: gain -1 // 即為若移動此節點，會增加一個 Cut Edge // 3. 次級條件，gain += (同分區鄰居數量 - 不同分區鄰居數量) * con3step (使其總和不超過 +-1) // 即為盡量使同邊節點聚集 bool side = traitTable[vertex].IsLHS; int same = -1, diff = 0; for (auto v : e->ConnectedVertices()) { if (traitTable[v].IsLHS == side) same++; else diff++; } if (same == 0) gain++; if (diff == 0) gain--; eCount++; totalDiff += diff; totalSame += same; } gain += con3step * (totalDiff - totalSame) / eCount; #ifdef DEBUG cout << "Gain of " << ((vertex->Name == "") ? to_string(vertex->ID) : vertex->Name) << " is " << traitTable[vertex].Gain << '\n'; #endif // DEBUG } #pragma endregion #pragma region ostream std::ostream& operator<<(std::ostream& os, HyperGraph& obj) { bool deep = ((obj.VerticesMap().size() <= 100) && (obj.EdgesMap().size() <= 100)); os << "HyperGraph:\n" << "- Vertices * " << obj.verticesMap.size() << " :\n"; if (deep) for (auto pair : obj.verticesMap) os << *pair.second; os << "- Edges * " << obj.edgesMap.size() << " :\n"; if (deep) for (auto pair : obj.edgesMap) os << *pair.second; return os; } std::ostream& operator<<(std::ostream& os, HyperGraph::Vertex& obj) { os << "v: "; if (obj.Name != "") os << obj.Name; else os << obj.ID; os << " -> "; for (auto e : obj.connectedEdges) os << ((e->Name == "") ? to_string(e->ID) : e->Name) << ", "; os << '\n'; return os; } std::ostream& operator<<(std::ostream& os, HyperGraph::Edge& obj) { os << "e: "; if (obj.Name != "") os << obj.Name; else os << obj.ID; os << " -> "; for (auto v : obj.connectedVertices) os << ((v->Name == "") ? to_string(v->ID) : v->Name) << ", "; os << '\n'; return os; } std::ostream& operator<<(std::ostream& os, FiducciaMattheyses& obj) { bool deep = obj.Source.VerticesMap().size() < 100; os << "FiducciaMattheyses:\n"; os << "- LHS * " << obj.LHS.size() << ":\n"; if (deep) { for (auto v : obj.LHS) { os << *v; } } os << "- RHS * " << obj.RHS.size() << ":\n"; if (deep) { for (auto v : obj.RHS) { os << *v; } } os << "- Cut Edges * " << obj.CutEdges.size() << ":\n"; if (deep) { for (auto e : obj.CutEdges) { os << *e; } } return os; } #pragma endregion ``` > [name=Naive Ys]Last updated : 2023/09/23