嘗試修改專案為平行化測試

# 嘗試修改專案為平行化測試修改專案後位置 https://github.com/x213212/andersen 上次課堂作業 https://x8795278.blogspot.com/2020/11/simple-llvm-pass-analysis.html 分析的專案 https://github.com/grievejia/andersen 在尋找可以開啟修改為平行化的地方，雖然後面改完了但是發現一個大bug 主要想法是 https://github.com/grievejia/andersen/blob/master/include/Andersen.h 在這個地方 ![](https://i.imgur.com/rbRqzRc.png) 去尋找程式進入點那我選擇的是 collectConstraints 這個地方 https://github.com/grievejia/andersen/blob/ec036a43ac7bc3523cf50d43ad711c095c5f1112/lib/ConstraintCollect.cpp 可以看到 ![](https://i.imgur.com/MXMkegw.png) ```c= // Here is a notable point before we proceed: // For functions with non-local linkage type, theoretically we should not // trust anything that get passed to it or get returned by it. However, // precision will be seriously hurt if we do that because if we do not run a // -internalize pass before the -anders pass, almost every function is marked // external. We'll just assume that even external linkage will not ruin the // analysis result first for (auto const &f : M) { if (f.isDeclaration() || f.isIntrinsic()) continue; // Scan the function body // A visitor pattern might help modularity, but it needs more boilerplate // codes to set up, and it breaks down the main logic into pieces // First, create a value node for each instruction with pointer type. It is // necessary to do the job here rather than on-the-fly because an // instruction may refer to the value node defined before it (e.g. phi // nodes) for (const_inst_iterator itr = inst_begin(f), ite = inst_end(f); itr != ite; ++itr) { auto inst = &*itr.getInstructionIterator(); if (inst->getType()->isPointerTy()) nodeFactory.createValueNode(inst); } // Now, collect constraint for each relevant instruction for (const_inst_iterator itr = inst_begin(f), ite = inst_end(f); itr != ite; ++itr) { auto inst = &*itr.getInstructionIterator(); collectConstraintsForInstruction(inst); } } ``` 意味者我們可以在最後一個 function > collectConstraintsForInstruction 找到我們的蒐集相關約束的程式碼也就是 https://github.com/grievejia/andersen/blob/ec036a43ac7bc3523cf50d43ad711c095c5f1112/lib/ConstraintCollect.cpp ![](https://i.imgur.com/OHw5afl.png) 可以看到 >switch (inst->getOpcode()) { case Instruction::Alloca: { 這是原來的分析方式，程式流程每一個 stmt 走 switch case 應該會是這樣 ![](https://i.imgur.com/c0AjOWK.png) 應該是這樣子的走法，我想假設我在這邊引用thread pool 會發生什麼事我隨便找了一個 thread pool 就直接改了 https://github.com/AngryHacker/articles/issues/1#issue-369867252 那麼合體後可能 ![](https://i.imgur.com/qvc2K3q.png) 程式流程會變成這樣可以看到原本的 function 裡面有這些東西代表可能nodeFactory 可能有順序上關係，那我們要找方法迴避一下 > NodeIndex valNode = nodeFactory.getValueNodeFor(inst); 下面有用到一些作弊和 function 指標的方式算是紀錄一下我打算在每一個 task 其實只是單存插入以 inst 當 key 和 fptr 當 value >typedef void (Andersen::*fptr)(const llvm::Instruction *); map<const llvm::Instruction *, fptr> mymap2; 那麼我就可以在任務隊列任務出來的時候看當時設定的幾個 thread 來決定處理幾個我額外用一個 Andersen *thistmp; 再void Andersen::collectConstraints(const Module &M) 裡面去指向自己 thistmp = this; 這樣方便我們在 task 指定的時候防止抓不到我們自己的 fucntion 主要是Andersen fucntion 定義在外面所以用了指標，絕對不是炫技xd 那麼簡單說一下技巧可能在 cmake 的時候仔細看 ![](https://i.imgur.com/2x0AfNw.png) ![](https://i.imgur.com/26mwArT.png) 加入我們的 -lpthread 再把剛剛的專案 make 完後的 libthreadpool.o 複製到 CMakeFiles/AndersenObj.dir/libthreadpool.o 回到目錄我們在一開始的 ConstraintCollect.cpp #include <threadpool.h> 也不會在編譯完找不到我們的標頭檔了 ![](https://i.imgur.com/qkOjr8W.png) ![](https://i.imgur.com/IfgXVDB.png) 再往下面看我們蒐集約束的位置 ```c= // Now, collect constraint for each relevant instruction //#pragma omp for schedule(static) for (const_inst_iterator itr = inst_begin(f), ite = inst_end(f); itr != ite; ++itr) { // fprintf(stderr, "%p\n", &inst); auto inst = &*itr.getInstructionIterator(); threadpool_add(pool, &dummy_task, (Instruction *)inst, 0); pthread_mutex_lock(&lock); tasks++; pthread_mutex_unlock(&lock); // collectConstraintsForInstruction(inst); } ``` 到這邊我們要來看一下dummy_task 這邊我對 arg 進行了強轉，沒有gdb 只能 printf 記憶體位置，如果指向 auto inst 和arg記憶體位置一樣有成功，很大機率就是成功了，記得一直強轉一下或許有機會 ```c= void dummy_task(void *arg) { (thistmp->collectConstraintsForInstruction)((Instruction *)arg); } ``` 這邊可以看到 thistmp 為什麼要用這個呢原因就是要在外部 fucntion 去呼叫我們的 void Andersen:: class 裡面的 fucntion ，其實是不能的 void 不行呼叫 void Andersen:: fucntion ![](https://i.imgur.com/XTEcRps.png) 也就是這樣呼叫方式是不行的所以我先預存 this 指標 ![](https://i.imgur.com/uJUaaNf.png) 這樣就可以了這邊用到了兩次指標，再來看我們的switch case 變得怎樣吧那麼目前我們的測資只有用到這些 case function 也就是說對應到原本的code ```c= switch (inst->getOpcode()) { case Instruction::Alloca: { NodeIndex valNode = nodeFactory.getValueNodeFor(inst); assert(valNode != AndersNodeFactory::InvalidIndex && "Failed to find alloca value node"); NodeIndex objNode = nodeFactory.createObjectNode(inst); constraints.emplace_back(AndersConstraint::ADDR_OF, valNode, objNode); break; } case Instruction::Call: case Instruction::Invoke: { ImmutableCallSite cs(inst); assert(cs && "Something wrong with callsite?"); addConstraintForCall(cs); break; } case Instruction::Ret: { if (inst->getNumOperands() > 0 && inst->getOperand(0)->getType()->isPointerTy()) { NodeIndex retIndex = nodeFactory.getReturnNodeFor(inst->getParent()->getParent()); assert(retIndex != AndersNodeFactory::InvalidIndex && "Failed to find return node"); NodeIndex valIndex = nodeFactory.getValueNodeFor(inst->getOperand(0)); assert(valIndex != AndersNodeFactory::InvalidIndex && "Failed to find return value node"); constraints.emplace_back(AndersConstraint::COPY, retIndex, valIndex); } break; } case Instruction::Load: { if (inst->getType()->isPointerTy()) { NodeIndex opIndex = nodeFactory.getValueNodeFor(inst->getOperand(0)); assert(opIndex != AndersNodeFactory::InvalidIndex && "Failed to find load operand node"); NodeIndex valIndex = nodeFactory.getValueNodeFor(inst); assert(valIndex != AndersNodeFactory::InvalidIndex && "Failed to find load value node"); constraints.emplace_back(AndersConstraint::LOAD, valIndex, opIndex); } break; } case Instruction::Store: { if (inst->getOperand(0)->getType()->isPointerTy()) { NodeIndex srcIndex = nodeFactory.getValueNodeFor(inst->getOperand(0)); assert(srcIndex != AndersNodeFactory::InvalidIndex && "Failed to find store src node"); NodeIndex dstIndex = nodeFactory.getValueNodeFor(inst->getOperand(1)); assert(dstIndex != AndersNodeFactory::InvalidIndex && "Failed to find store dst node"); constraints.emplace_back(AndersConstraint::STORE, dstIndex, srcIndex); } break; } ``` ```c= void Andersen::instAlloca(const Instruction *inst) { NodeIndex valNode = nodeFactory.getValueNodeFor(inst); assert(valNode != AndersNodeFactory::InvalidIndex && "Failed to find alloca value node"); NodeIndex objNode = nodeFactory.createObjectNode(inst); constraints.emplace_back(AndersConstraint::ADDR_OF, valNode, objNode); } void Andersen::instInvoke(const Instruction *inst) { ImmutableCallSite cs(inst); assert(cs && "Something wrong with callsite?"); addConstraintForCall(cs); } void Andersen::instRet(const Instruction *inst) { if (inst->getNumOperands() > 0 && inst->getOperand(0)->getType()->isPointerTy()) { NodeIndex retIndex = nodeFactory.getReturnNodeFor(inst->getParent()->getParent()); assert(retIndex != AndersNodeFactory::InvalidIndex && "Failed to find return node"); NodeIndex valIndex = nodeFactory.getValueNodeFor(inst->getOperand(0)); assert(valIndex != AndersNodeFactory::InvalidIndex && "Failed to find return value node"); constraints.emplace_back(AndersConstraint::COPY, retIndex, valIndex); } } void Andersen::instLoad(const Instruction *inst) { if (inst->getType()->isPointerTy()) { NodeIndex opIndex = nodeFactory.getValueNodeFor(inst->getOperand(0)); assert(opIndex != AndersNodeFactory::InvalidIndex && "Failed to find load operand node"); NodeIndex valIndex = nodeFactory.getValueNodeFor(inst); assert(valIndex != AndersNodeFactory::InvalidIndex && "Failed to find load value node"); constraints.emplace_back(AndersConstraint::LOAD, valIndex, opIndex); } } void Andersen::instStore(const Instruction *inst) { if (inst->getOperand(0)->getType()->isPointerTy()) { NodeIndex srcIndex = nodeFactory.getValueNodeFor(inst->getOperand(0)); assert(srcIndex != AndersNodeFactory::InvalidIndex && "Failed to find store src node"); NodeIndex dstIndex = nodeFactory.getValueNodeFor(inst->getOperand(1)); assert(dstIndex != AndersNodeFactory::InvalidIndex && "Failed to find store dst node"); constraints.emplace_back(AndersConstraint::STORE, dstIndex, srcIndex); } } ``` 這邊可以看到我先把 case 裡面的東西給獨立出來成一個 fucntion 當switch case 條件成真的時候我把我當時的 this Andersen::instAlloca; fucntion 指標給指給 fptr 再來我把 inst 當key fptr 當 value 並加上鎖 >pthread_mutex_lock(&lock); > pthread_mutex_unlock(&lock); ```c= switch (inst->getOpcode()) { case Instruction::Alloca: { // Andersen *p =this; // cout << "Alloca" <<endl; // cout << "fuck123" <<endl; pthread_mutex_lock(&lock); fptr f = &Andersen::instAlloca; mymap2[inst] = f; done++; pthread_mutex_unlock(&lock); // Andersen xobject; // (this->*f)(inst); // //(this)->f(inst); // instAlloca(inst); break; } case Instruction::Call: case Instruction::Invoke: { pthread_mutex_lock(&lock); fptr f = &Andersen::instInvoke; // (this->*f)(inst); mymap2[inst] = f; done++; pthread_mutex_unlock(&lock); // cout << "fuck" <<endl; // fptr f = instInvoke; // f(inst); break; } case Instruction::Ret: { pthread_mutex_lock(&lock); fptr f = &Andersen::instRet; // (this->*f)(inst); mymap2[inst] = f; done++; pthread_mutex_unlock(&lock); // cout << "fuck2" <<endl; // fptr f = instRet; // f(inst); break; } case Instruction::Load: { pthread_mutex_lock(&lock); fptr f = &Andersen::instLoad; // (this->*f)(inst); mymap2[inst] = f; done++; pthread_mutex_unlock(&lock); // cout << "fuck3" <<endl; // fptr f = instLoad; // f(inst); // instLoad(inst); break; } case Instruction::Store: { pthread_mutex_lock(&lock); // cout << "fuck4" <<endl; fptr f = &Andersen::instStore; mymap2[inst] = f; done++; pthread_mutex_unlock(&lock); // (this->*f)(inst); // fptr f = instStore; // f(inst); // instStore(inst); break; } } ``` 到這邊產生任務已經完成了如何完成一個任務呢 ```c= fprintf(stderr, "Added %d tasks\n", tasks); /* 不断检查任务数是否完成一半以上，没有则继续休眠 */ while (done != tasks) { // fprintf(stderr, "Did %d tasks\n", done); // usleep(1); } fprintf(stderr, "Did %d tasks\n", done); /* 这时候销毁线程池,0 代表 immediate_shutdown */ assert(threadpool_destroy(pool, 0) == 0); int k = 0; for (auto const &f : M) { if (f.isDeclaration() || f.isIntrinsic()) continue; for (const_inst_iterator itr = inst_begin(f), ite = inst_end(f); itr != ite; ++itr) { k++; auto inst = &*itr.getInstructionIterator(); // fprintf(stderr, "%p\n", &inst); 去找我的 key 和 value fptr f = mymap2.find((Instruction *)inst)->second; ; (this->*f)(inst); // fprintf(stderr, "%d\n", k); // mymap2[inst] = f; // collectConstraintsForInstruction(inst); } } ``` 因為順序性問題我透過去找我的 key 和 value ```c= fptr f = mymap2.find((Instruction *)inst)->second; (this->*f)(inst); ``` 最後實作完成的時候發現的bug 就是 thread 為1個的時候很正常 ![](https://i.imgur.com/cENu81A.png) 左邊是我的版本，有加上那麼多 thread 很正常也就是說我額外寫了一個產生程式碼的工具 ```c= #include <iostream> #include <omp.h> using namespace std; /* run this program using the console pauser or add your own getch, system("pause") or input loop */ #include <stdio.h> #include <stdlib.h> #include <vector> #include <fstream> int main() { ofstream myfile ("exampletest.c"); int i = 0; if (myfile.is_open()) { myfile << "int main (){\n"; for (int i =0 ; i <15; i++){ //myfile << "inline void test"<<i<<"()"<<"{ int *i"<<i<<"; i"<<i<<"=malloc(sizeof (int)*1); free(i"<<i<<"); }\n"; // myfile<<"inline void test"<<i<<"()" <<"{ int i, j, k; int *a = &i; int *b = &k; a = &j; int **p = &a; int **q = &b; p = q; int *c = *q; if(a>0) q=p; else a=&k; }\n"; //myfile << "test"<<i<<"();\n"; //myfile <<"int i, j, k;\nint *a = &i;\nint *b = &k;\na = &j;\nint **p = &a;\nint **q = &b;\np = q;\n "; myfile <<"int i"<<i<<", j"<<i<<", k"<<i<<";\nint *a"<<i<<" = &i"<<i<<";\nint *b"<<i<<" = &k"<<i<<";\na"<<i<<" = &j"<<i<<";\nint **p"<<i<<" = &a"<<i<<";\nint **q"<<i<<" = &b"<<i<<";\np"<<i<<" = q"<<i<<";\n "; } myfile << "}\n"; myfile.close(); } else cout << "Unable to open file"; } ``` 可以輸出成一個exampletest.c 檔案 ![](https://i.imgur.com/UD7yEYF.png) ```bash= vim gen.cpp g++ gen.cpp -o gen ./gen clang -S -emit-llvm exampletest.c -o testexe.ll opt-10 -S -load ../lib/libAndersen.so -dump-result -anders-aa ./testexe.ll -o andertest.ll ``` 真實環境我是不知道有沒有這麼多指標互指，大專案應該是有 ![](https://i.imgur.com/G5qjUJb.png) 那麼我產生了150000 次得到了程式碼數量快兩百萬行的程式碼這邊時間以2000001行來測試，這邊時間比較偏向於一開始蒐集的位置可以知道蒐集約束這邊應該不是瓶頸，或許是後面的fucntion void collectConstraints(const llvm::Module &); void optimizeConstraints(); <==== void solveConstraints(); <=== 其實在分析專案前已經看過了，那邊也是在做類似的事情，也就是程式碼在其他 fucntion 的時候遍歷fucntion return 等等，不是在 main fucntion thread pool 單執行緒版本 30s ![](https://i.imgur.com/ChAMoSM.png) 原版 23s ![](https://i.imgur.com/0snLhcH.png) 可以看到為什麼我在這邊還是單執行緒在程式碼量小的時候還可以多開thread 去處理可能要自己調task pool 和 thread 數量，也就是前面我說的為什麼有bug，太多 thread 搶著要mymap2 那這個地方又有加鎖，所以程式會被整個 lock 住 ![](https://i.imgur.com/Xl2MqH6.png) 結論就是單一thread thread pool lose 多多少少 pthread_mutex_lock(&lock); 會影響 Original win 多 thread thread pool lose 程式碼量大的話 Original win thread pool版本都會小輸幾秒，應該就是最大的瓶頸就在存取我們的mymap2 這部分。 2thread ![](https://i.imgur.com/rZdQ9o5.png) 4thread ![](https://i.imgur.com/8KtIZi4.png) 8thread ![](https://i.imgur.com/NBuIg9X.png) Original win [](https://i.imgur.com/uiySM3M.png)