# 嘗試修改專案為平行化測試
修改專案後位置
https://github.com/x213212/andersen
上次課堂作業
https://x8795278.blogspot.com/2020/11/simple-llvm-pass-analysis.html
分析的專案
https://github.com/grievejia/andersen
在尋找可以開啟修改為平行化的地方,雖然後面改完了但是發現一個大bug
主要想法是
https://github.com/grievejia/andersen/blob/master/include/Andersen.h
在這個地方
![](https://i.imgur.com/rbRqzRc.png)
去尋找程式進入點那我選擇的是 collectConstraints 這個地方
https://github.com/grievejia/andersen/blob/ec036a43ac7bc3523cf50d43ad711c095c5f1112/lib/ConstraintCollect.cpp
可以看到
![](https://i.imgur.com/MXMkegw.png)
```c=
// Here is a notable point before we proceed:
// For functions with non-local linkage type, theoretically we should not
// trust anything that get passed to it or get returned by it. However,
// precision will be seriously hurt if we do that because if we do not run a
// -internalize pass before the -anders pass, almost every function is marked
// external. We'll just assume that even external linkage will not ruin the
// analysis result first
for (auto const &f : M) {
if (f.isDeclaration() || f.isIntrinsic())
continue;
// Scan the function body
// A visitor pattern might help modularity, but it needs more boilerplate
// codes to set up, and it breaks down the main logic into pieces
// First, create a value node for each instruction with pointer type. It is
// necessary to do the job here rather than on-the-fly because an
// instruction may refer to the value node defined before it (e.g. phi
// nodes)
for (const_inst_iterator itr = inst_begin(f), ite = inst_end(f); itr != ite;
++itr) {
auto inst = &*itr.getInstructionIterator();
if (inst->getType()->isPointerTy())
nodeFactory.createValueNode(inst);
}
// Now, collect constraint for each relevant instruction
for (const_inst_iterator itr = inst_begin(f), ite = inst_end(f); itr != ite;
++itr) {
auto inst = &*itr.getInstructionIterator();
collectConstraintsForInstruction(inst);
}
}
```
意味者我們可以在最後一個 function
> collectConstraintsForInstruction
找到我們的蒐集相關約束的程式碼
也就是
https://github.com/grievejia/andersen/blob/ec036a43ac7bc3523cf50d43ad711c095c5f1112/lib/ConstraintCollect.cpp
![](https://i.imgur.com/OHw5afl.png)
可以看到
>switch (inst->getOpcode()) {
case Instruction::Alloca: {
這是原來的分析方式,程式流程 每一個 stmt 走 switch case 應該會是這樣
![](https://i.imgur.com/c0AjOWK.png)
應該是這樣子的走法,我想假設我在這邊引用thread pool 會發生什麼事
我隨便找了一個 thread pool 就直接改了
https://github.com/AngryHacker/articles/issues/1#issue-369867252
那麼合體後可能
![](https://i.imgur.com/qvc2K3q.png)
程式流程會變成這樣
可以看到原本的 function 裡面有這些東西
代表 可能nodeFactory 可能有順序上關係,那我們要找方法迴避一下
> NodeIndex valNode = nodeFactory.getValueNodeFor(inst);
下面有用到一些作弊 和 function 指標的方式算是紀錄一下
我打算 在每一個 task 其實只是單存 插入 以 inst 當 key 和 fptr 當 value
>typedef void (Andersen::*fptr)(const llvm::Instruction *);
map<const llvm::Instruction *, fptr> mymap2;
那麼我就可以在 任務隊列任務出來的時候看當時設定的幾個 thread 來決定處理幾個
我額外用一個
Andersen *thistmp;
再void Andersen::collectConstraints(const Module &M)
裡面去 指向自己
thistmp = this;
這樣方便我們在 task 指定的時候 防止抓不到 我們自己的 fucntion
主要是Andersen fucntion 定義在外面 所以用了指標,絕對不是炫技xd
那麼簡單說一下技巧可能 在 cmake 的時候 仔細看
![](https://i.imgur.com/2x0AfNw.png)
![](https://i.imgur.com/26mwArT.png)
加入我們的 -lpthread
再把剛剛的 專案 make 完後的
libthreadpool.o
複製到
CMakeFiles/AndersenObj.dir/libthreadpool.o
回到目錄我們在 一開始的 ConstraintCollect.cpp
#include <threadpool.h> 也不會在編譯完 找不到我們的 標頭檔了
![](https://i.imgur.com/qkOjr8W.png)
![](https://i.imgur.com/IfgXVDB.png)
再往下面看我們蒐集約束的位置
```c=
// Now, collect constraint for each relevant instruction
//#pragma omp for schedule(static)
for (const_inst_iterator itr = inst_begin(f), ite = inst_end(f); itr != ite;
++itr) {
// fprintf(stderr, "%p\n", &inst);
auto inst = &*itr.getInstructionIterator();
threadpool_add(pool, &dummy_task, (Instruction *)inst, 0);
pthread_mutex_lock(&lock);
tasks++;
pthread_mutex_unlock(&lock);
// collectConstraintsForInstruction(inst);
}
```
到這邊我們要來看一下dummy_task
這邊我對 arg 進行了強轉,沒有gdb 只能 printf 記憶體位置,如果指向
auto inst 和arg記憶體位置 一樣
有成功 ,很大機率就是成功了,記得一直強轉一下或許有機會
```c=
void dummy_task(void *arg) {
(thistmp->collectConstraintsForInstruction)((Instruction *)arg);
}
```
這邊可以看到 thistmp 為什麼要用這個呢原因就是
要在 外部 fucntion 去呼叫我們的 void Andersen:: class 裡面的 fucntion ,其實是不能的
void 不行呼叫 void Andersen:: fucntion
![](https://i.imgur.com/XTEcRps.png)
也就是這樣呼叫方式是不行的
所以我先預存 this 指標
![](https://i.imgur.com/uJUaaNf.png)
這樣就可以了
這邊用到了兩次指標,再來看我們的switch case 變得怎樣吧
那麼目前我們的 測資只有用到這些 case function
也就是說對應到原本的code
```c=
switch (inst->getOpcode()) {
case Instruction::Alloca: {
NodeIndex valNode = nodeFactory.getValueNodeFor(inst);
assert(valNode != AndersNodeFactory::InvalidIndex &&
"Failed to find alloca value node");
NodeIndex objNode = nodeFactory.createObjectNode(inst);
constraints.emplace_back(AndersConstraint::ADDR_OF, valNode, objNode);
break;
}
case Instruction::Call:
case Instruction::Invoke: {
ImmutableCallSite cs(inst);
assert(cs && "Something wrong with callsite?");
addConstraintForCall(cs);
break;
}
case Instruction::Ret: {
if (inst->getNumOperands() > 0 &&
inst->getOperand(0)->getType()->isPointerTy()) {
NodeIndex retIndex =
nodeFactory.getReturnNodeFor(inst->getParent()->getParent());
assert(retIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find return node");
NodeIndex valIndex = nodeFactory.getValueNodeFor(inst->getOperand(0));
assert(valIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find return value node");
constraints.emplace_back(AndersConstraint::COPY, retIndex, valIndex);
}
break;
}
case Instruction::Load: {
if (inst->getType()->isPointerTy()) {
NodeIndex opIndex = nodeFactory.getValueNodeFor(inst->getOperand(0));
assert(opIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find load operand node");
NodeIndex valIndex = nodeFactory.getValueNodeFor(inst);
assert(valIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find load value node");
constraints.emplace_back(AndersConstraint::LOAD, valIndex, opIndex);
}
break;
}
case Instruction::Store: {
if (inst->getOperand(0)->getType()->isPointerTy()) {
NodeIndex srcIndex = nodeFactory.getValueNodeFor(inst->getOperand(0));
assert(srcIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find store src node");
NodeIndex dstIndex = nodeFactory.getValueNodeFor(inst->getOperand(1));
assert(dstIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find store dst node");
constraints.emplace_back(AndersConstraint::STORE, dstIndex, srcIndex);
}
break;
}
```
```c=
void Andersen::instAlloca(const Instruction *inst) {
NodeIndex valNode = nodeFactory.getValueNodeFor(inst);
assert(valNode != AndersNodeFactory::InvalidIndex &&
"Failed to find alloca value node");
NodeIndex objNode = nodeFactory.createObjectNode(inst);
constraints.emplace_back(AndersConstraint::ADDR_OF, valNode, objNode);
}
void Andersen::instInvoke(const Instruction *inst) {
ImmutableCallSite cs(inst);
assert(cs && "Something wrong with callsite?");
addConstraintForCall(cs);
}
void Andersen::instRet(const Instruction *inst) {
if (inst->getNumOperands() > 0 &&
inst->getOperand(0)->getType()->isPointerTy()) {
NodeIndex retIndex =
nodeFactory.getReturnNodeFor(inst->getParent()->getParent());
assert(retIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find return node");
NodeIndex valIndex = nodeFactory.getValueNodeFor(inst->getOperand(0));
assert(valIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find return value node");
constraints.emplace_back(AndersConstraint::COPY, retIndex, valIndex);
}
}
void Andersen::instLoad(const Instruction *inst) {
if (inst->getType()->isPointerTy()) {
NodeIndex opIndex = nodeFactory.getValueNodeFor(inst->getOperand(0));
assert(opIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find load operand node");
NodeIndex valIndex = nodeFactory.getValueNodeFor(inst);
assert(valIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find load value node");
constraints.emplace_back(AndersConstraint::LOAD, valIndex, opIndex);
}
}
void Andersen::instStore(const Instruction *inst) {
if (inst->getOperand(0)->getType()->isPointerTy()) {
NodeIndex srcIndex = nodeFactory.getValueNodeFor(inst->getOperand(0));
assert(srcIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find store src node");
NodeIndex dstIndex = nodeFactory.getValueNodeFor(inst->getOperand(1));
assert(dstIndex != AndersNodeFactory::InvalidIndex &&
"Failed to find store dst node");
constraints.emplace_back(AndersConstraint::STORE, dstIndex, srcIndex);
}
}
```
這邊可以看到我先把 case 裡面的東西給 獨立出來成一個 fucntion 當switch case 條件成真的時候 我把我當時的 this Andersen::instAlloca; fucntion 指標給指給 fptr
再來我把 inst 當key fptr 當 value
並加上鎖
>pthread_mutex_lock(&lock);
> pthread_mutex_unlock(&lock);
```c=
switch (inst->getOpcode()) {
case Instruction::Alloca: {
// Andersen *p =this;
// cout << "Alloca" <<endl;
// cout << "fuck123" <<endl;
pthread_mutex_lock(&lock);
fptr f = &Andersen::instAlloca;
mymap2[inst] = f;
done++;
pthread_mutex_unlock(&lock);
// Andersen xobject;
// (this->*f)(inst);
//
//(this)->f(inst);
// instAlloca(inst);
break;
}
case Instruction::Call:
case Instruction::Invoke: {
pthread_mutex_lock(&lock);
fptr f = &Andersen::instInvoke;
// (this->*f)(inst);
mymap2[inst] = f;
done++;
pthread_mutex_unlock(&lock);
// cout << "fuck" <<endl;
// fptr f = instInvoke;
// f(inst);
break;
}
case Instruction::Ret: {
pthread_mutex_lock(&lock);
fptr f = &Andersen::instRet;
// (this->*f)(inst);
mymap2[inst] = f;
done++;
pthread_mutex_unlock(&lock);
// cout << "fuck2" <<endl;
// fptr f = instRet;
// f(inst);
break;
}
case Instruction::Load: {
pthread_mutex_lock(&lock);
fptr f = &Andersen::instLoad;
// (this->*f)(inst);
mymap2[inst] = f;
done++;
pthread_mutex_unlock(&lock);
// cout << "fuck3" <<endl;
// fptr f = instLoad;
// f(inst);
// instLoad(inst);
break;
}
case Instruction::Store: {
pthread_mutex_lock(&lock);
// cout << "fuck4" <<endl;
fptr f = &Andersen::instStore;
mymap2[inst] = f;
done++;
pthread_mutex_unlock(&lock);
// (this->*f)(inst);
// fptr f = instStore;
// f(inst);
// instStore(inst);
break;
}
}
```
到這邊產生任務已經完成了
如何完成一個任務呢
```c=
fprintf(stderr, "Added %d tasks\n", tasks);
/* 不断检查任务数是否完成一半以上,没有则继续休眠 */
while (done != tasks) {
// fprintf(stderr, "Did %d tasks\n", done);
// usleep(1);
}
fprintf(stderr, "Did %d tasks\n", done);
/* 这时候销毁线程池,0 代表 immediate_shutdown */
assert(threadpool_destroy(pool, 0) == 0);
int k = 0;
for (auto const &f : M) {
if (f.isDeclaration() || f.isIntrinsic())
continue;
for (const_inst_iterator itr = inst_begin(f), ite = inst_end(f); itr != ite;
++itr) {
k++;
auto inst = &*itr.getInstructionIterator();
// fprintf(stderr, "%p\n", &inst);
去找我的 key 和 value
fptr f = mymap2.find((Instruction *)inst)->second;
;
(this->*f)(inst);
// fprintf(stderr, "%d\n", k);
// mymap2[inst] = f;
// collectConstraintsForInstruction(inst);
}
}
```
因為順序性問題
我透過
去找我的 key 和 value
```c=
fptr f = mymap2.find((Instruction *)inst)->second;
(this->*f)(inst);
```
最後實作完成的時候發現的bug 就是
thread 為1個的時候很正常
![](https://i.imgur.com/cENu81A.png)
左邊是我的版本,有加上那麼多 thread 很正常
也就是說
我額外寫了一個 產生程式碼的工具
```c=
#include <iostream>
#include <omp.h>
using namespace std;
/* run this program using the console pauser or add your own getch, system("pause") or input loop */
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <fstream>
int main()
{
ofstream myfile ("exampletest.c");
int i = 0;
if (myfile.is_open()) {
myfile << "int main (){\n";
for (int i =0 ; i <15; i++){
//myfile << "inline void test"<<i<<"()"<<"{ int *i"<<i<<"; i"<<i<<"=malloc(sizeof (int)*1); free(i"<<i<<"); }\n";
// myfile<<"inline void test"<<i<<"()" <<"{ int i, j, k; int *a = &i; int *b = &k; a = &j; int **p = &a; int **q = &b; p = q; int *c = *q; if(a>0) q=p; else a=&k; }\n";
//myfile << "test"<<i<<"();\n";
//myfile <<"int i, j, k;\nint *a = &i;\nint *b = &k;\na = &j;\nint **p = &a;\nint **q = &b;\np = q;\n ";
myfile <<"int i"<<i<<", j"<<i<<", k"<<i<<";\nint *a"<<i<<" = &i"<<i<<";\nint *b"<<i<<" = &k"<<i<<";\na"<<i<<" = &j"<<i<<";\nint **p"<<i<<" = &a"<<i<<";\nint **q"<<i<<" = &b"<<i<<";\np"<<i<<" = q"<<i<<";\n ";
}
myfile << "}\n";
myfile.close();
}
else cout << "Unable to open file";
}
```
可以輸出成一個exampletest.c 檔案
![](https://i.imgur.com/UD7yEYF.png)
```bash=
vim gen.cpp
g++ gen.cpp -o gen
./gen
clang -S -emit-llvm exampletest.c -o testexe.ll
opt-10 -S -load ../lib/libAndersen.so -dump-result -anders-aa ./testexe.ll -o andertest.ll
```
真實環境我是不知道有沒有這麼多指標互指,大專案應該是有
![](https://i.imgur.com/G5qjUJb.png)
那麼我產生了150000 次 得到了程式碼數量快兩百萬行的程式碼
這邊時間 以2000001行來測試,這邊時間比較偏向於一開始蒐集的位置
可以知道蒐集約束這邊應該不是瓶頸,或許是後面的fucntion
void collectConstraints(const llvm::Module &);
void optimizeConstraints(); <====
void solveConstraints(); <===
其實在分析專案前已經看過了,那邊也是在做類似的事情,也就是程式碼在其他 fucntion 的時候遍歷fucntion return 等等,不是在 main fucntion
thread pool 單執行緒版本 30s
![](https://i.imgur.com/ChAMoSM.png)
原版 23s
![](https://i.imgur.com/0snLhcH.png)
可以看到為什麼我在這邊 還是單執行緒在程式碼量小的時候還可以多開thread 去處理可能要自己調task pool 和 thread 數量,也就是前面我說的為什麼有bug,太多 thread 搶著要mymap2 那這個地方又有加鎖,所以程式會被整個 lock 住
![](https://i.imgur.com/Xl2MqH6.png)
結論就是
單一thread
thread pool lose
多多少少 pthread_mutex_lock(&lock);
會影響
Original win
多 thread
thread pool lose
程式碼量大的話
Original win
thread pool版本都會小輸幾秒,應該就是最大的瓶頸就在存取我們的mymap2
這部分。
2thread
![](https://i.imgur.com/rZdQ9o5.png)
4thread
![](https://i.imgur.com/8KtIZi4.png)
8thread
![](https://i.imgur.com/NBuIg9X.png)
Original win
[](https://i.imgur.com/uiySM3M.png)