LLVM Winter Of Code- Things done so far

# LLVM Winter Of Code- Things done so far

Current idea is solving issues and contributing to `clang` through llvm Phabricator.

Currently done:
- project: clang.
- issue to solve: https://github.com/llvm/llvm-project/issues/50308

Links to learn from:
- [ ] https://eli.thegreenplace.net/tag/llvm-clang
- [ ] https://lowlevelbits.org/how-to-learn-compilers-llvm-edition/
- [x] https://www.youtube.com/watch?v=m8G_S5LwlTo
- [ ] http://www.nondot.org/sabre/Resume.html#writing
- [ ] https://online.stanford.edu/courses/soe-ycscs1-compilers
- [x] https://llvm.org/docs/ProgrammersManual.html
- [x] https://www.youtube.com/watch?v=bWH-nL7v5F4

Docs Links:
- https://llvm.org/docs/LangRef.html
- https://llvm.org/doxygen/index.html for file structure.
- https://llvm.org/docs/ProgrammersManual.html for modifying any code.
- https://llvm.org/docs/LangRef.html LLVM IR
- https://en.cppreference.com/w/

Additional links:
- https://www.youtube.com/watch?v=qzljG6DKgic

Flang links:
- https://docs.google.com/spreadsheets/d/1FvHPuSkGbl4mQZRAwCIndvQx9dQboffiD-xD0oqxgU0/edit#gid=0

## Notes

### Initial build steps in case you delete the files:
- make a build folder in the project.
- `cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_PROJECTS="clang" -DCMAKE_INSTALL_PREFIX=/home/razetime/Documents/Code/general/llvm-project/build -DLLVM_USE_LINKER=gold ../llvm`
- or, `cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=/home/razetime/Documents/Code/general/llvm-project/build -DLLVM_ENABLE_LLD=ON ../llvm`
- use gold linker or lld because ld seems to crash for some annoying reason
- `ninja -j2 install` or `ninja -j1 install` because multiple jobs eat so much RAM (especially with clang included).
- *REMOVE CMakeCache.txt if the build target is not set properly.*
- `clang-14 -v` to check installation
- `flang-new -v`

### Contribution steps:
- Make a patch to llvm-project repo locally.
- `ninja -j2 install`
- check the working of your problem.
- if it's good, show to manwe.
- Get it checked on phabricator
- hopefully it gets in

### Useful commands for IR cheatsheet:
- `clang-14 -emit-llvm -S file.cpp -o file.ll`
- `opt -analyze -dot-cfg-only` for the control flow graph of an IR file
- `clang-14 -g` for debuggable output
- `grep -rnw <folder> -e <string>` is unbelievably useful for finding relevant code segments

### manwe notes:
- check once if the reproducer case is okay? Ideally, following up from the bugzilla bug 50964, we should have had a memset optimization for the reproducer case
- Commands used:
  1. `clang -S -emit-llvm memset.c`
  2. `opt -loop-idiom -S --debug-only="raze" givenIR.ll`
  3. `-debug` flag output makes no sense to me
- Weird thing: Doing a `clang -S -emit-llvm -Oz memset.c` simply removes the entire loop
- There's one class and one function in `llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp` that is responsible for handling the loop idioms. I added a print statement to `LoopIdiomRecognizePass::run()` and it was printed ONLY for `clang -S -emit-llvm -Oz memset.c` and not for the other `clang & opt` combination.
- Question? In the generated ll file from `clang -S -emit-llvm -Oz memset.c`, there are all the function calls and correct return values, but where is the loop? However, when you look at the values stored in the array, those are all correct.
- Trying out to see which level of optimization makes the loop disappear
  1. -O0 has the loop intact
  2. -O1 has the loop intact
  3. -O2 has done the loop unrolling
  4. -O3 has done even more aggressive unrolling
  5. Loops have completely disappeared with -Ofast, -Os, and -Oz
- Question? Are there only certain kinds of loops that are disappearing? What of more complex loops? For example:
  ```
  for(start = 0; start < end; start++)
    noob_fun();
  ```
  The loop is NOT disappearing. This means we need a more complex example. I think the memset optimization is happening. If I make ANY example of the form
  ```
  if(start < end)
    A[start] = CONSTANT_VALUE
  ```
  The loop is disappearing for all of -Ofast, -Os, and -Oz.
  However, the array is correctly populated.

**What is happening in `llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp`**

1. Under `LoopIdiomRecognizePass::run()`, we have a LIR (loop idiom recognize pass) check `runOnLoop()`. That returns a 0 for -Oz. I think this means for -Oz, the loop idiom recognize is telling the pass manager to NOT optimize (since we already have decent optimizations?)

   *EDIT:* On further analysis it's found that if either of these two conditions is true, `LIR.runOnLoop(L)` returns false.
   ```
   if (DisableLIRP::All)
     return false;
   if (skipLoop(L))
     return false;
   ```
2. In `LIR->runOnLoop(...)` defined in the same file, the condition `(HasMemset || HasMemsetPattern || HasMemcpy)` is true, `hasLoopInvariantBackedgeTakenCount` is also true, and `runOnCountableLoop()` returns false.
3. Out of the three, `HasMemset` is true. I think `TLI->has(LibFunc_memset)` is checking for the presence of memset in the library included, implying memset optimizations to the loop have already happened by the time control came here. Could it be possible that some lower level optimization (e.g. -O2 or something) is already happening BEFORE -Oz?
4. `hasLoopInvariantBackedgeTakenCount()` returns true for a loop having countable trip counts (means we can deduce the number of iterations at compile time and thus perform optimizations). This is defined in `llvm/include/llvm/Analysis/ScalarEvolution.h`. But even if you input the variable `end` from the user, this check is still returning true. There are gaps in my understanding of this function.

### Further checking

From the given LLVM IR on the github issue, `loop-rotate` applies the necessary optimization for some reason.

```
./llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
./llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
./llvm/lib/Transforms/Utils/LoopUnroll.cpp
./llvm/lib/Transforms/Scalar/LoopRotation.cpp
```
are also possible candidates for a fix, maybe.

From asking in the IRC:

> i see.
> roughly, that loop is countable as per scev: https://llvm.godbolt.org/z/K488d87rE (backedge-taken count is computable)
> so it should be handled somewhere in LoopIdiomRecognize::runOnCountableLoop()

Stuff from llvm site:

> The llvm/Support/Debug.h (doxygen) file provides a macro named LLVM_DEBUG() that is a much nicer solution to this problem. Basically, you can put arbitrary code into the argument of the LLVM_DEBUG macro, and it is only executed if ‘opt’ (or any other tool) is run with the ‘-debug’ command line argument:

Names to look up inside `LoopIdiomRecognize::runOnCountableLoop()`:
- [x] SimpleLoopSafetyInfo - possible problem here (can give false positives to avoid complex analysis)
- [x] runOnLoopBlock

Currently added debug statements to each part of `runOnCountableLoop` to find out which parts affect the IR. Removed clang for now since we are only messing with opt.

`runOnCountableLoop` looks like it is doing fine.

**important:** The pain point looks like it is at L641 in LoopIdiomRecognize, where the memset pattern recognition is written.

`processLoopStores` takes each store and checks whether it goes into a memset type pattern.

So we have multiple places to observe:
- `processLoopMemIntrinsic`
- `StoreRefsForMemset`
- `StoreRefsForMemsetPattern`
- `processLoopStores`
- `processLoopStoreOfLoopLoad`

Not sure what
```
MadeChange |= processLoopMemIntrinsic<MemCpyInst>(
    BB, &LoopIdiomRecognize::processLoopMemCpy, BECount);
```
this thing does yet.

`runOnLoopBlock` terminates early here: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp#L628 where it implies that all exit paths are not dominated. This is where `runOnLoopBlock` terminates with a false value.

New useful link: https://llvm.org/docs/LoopTerminology.html

Removing the loop causes optimization to happen.
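For reference, the `A[start] = CONSTANT_VALUE` store pattern from the manwe notes can be pinned down as a small standalone program. This is a minimal sketch and not the reproducer from the github issue: `fill_loop`, `fill_memset` and `same_result` are hypothetical names, and the element type is assumed to be `unsigned char` so that a plain `memset` is an exact equivalent of the loop.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: the counted store loop from the notes,
// one constant store per iteration.
void fill_loop(unsigned char *A, std::size_t end, unsigned char value) {
  for (std::size_t start = 0; start < end; start++)
    A[start] = value;
}

// Hypothetical helper: the single memset call that has the same effect.
void fill_memset(unsigned char *A, std::size_t end, unsigned char value) {
  std::memset(A, value, end);
}

// Returns true when both helpers leave identical buffer contents.
// Intended for n > 0 only.
bool same_result(std::size_t n, unsigned char value) {
  std::vector<unsigned char> a(n, 0xFF), b(n, 0xFF);
  fill_loop(a.data(), n, value);
  fill_memset(b.data(), n, value);
  return a == b;
}
```

Compiling a file like this with `clang -S -emit-llvm` at the different optimization levels listed earlier is one way to compare what happens to the loop in `fill_loop` against the explicit call in `fill_memset`.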
Current Patch: https://reviews.llvm.org/D116754

Current Build to check: https://buildkite.com/llvm-project/premerge-checks/builds/72482

Useful link: https://llvm.org/docs/Phabricator.html#phabricator-reviews

New idea is to apply a simple type of loop rotation within LoopIdiomRecognize after performing some analysis on the loop.

Current steps taken:
- check the successors of each block in the loop:
  - if it's not the header (`Block != CurLoop->getHeader()`) and it has a branch to the exit block, then the loop cannot be optimized.
- Then, some functions which might be of use:
  - `llvm::formDedicatedExitBlocks (Loop *L, DominatorTree *DT, LoopInfo *LI, MemorySSAUpdater *MSSAU, bool PreserveLCSSA)`
  - `void llvm::LoopBase< BlockT, LoopT >::getExitingBlocks ( SmallVectorImpl< BlockT * > & ExitingBlocks ) const`
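The successor check from the steps above can be sketched on a toy CFG. This is only an illustration under stated assumptions: `ToyLoop` and `hasNonHeaderExit` are hypothetical names, blocks are plain integer indices, and none of this is LLVM's `Loop` or `BasicBlock` API. A branch "to the exit block" is modelled here as any successor that lies outside the loop.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for a loop in a CFG (hypothetical, not LLVM's Loop class).
// Blocks are indices; Succs[B] lists the successors of block B.
struct ToyLoop {
  std::vector<std::vector<int>> Succs; // successor lists for every block
  std::vector<int> Blocks;             // blocks that belong to the loop
  int Header = 0;                      // index of the loop header

  // True if block B is part of the loop.
  bool contains(int B) const {
    return std::find(Blocks.begin(), Blocks.end(), B) != Blocks.end();
  }
};

// Returns true if some loop block other than the header has a successor
// outside the loop, i.e. the case the notes treat as "cannot be optimized".
bool hasNonHeaderExit(const ToyLoop &L) {
  for (int Block : L.Blocks) {
    if (Block == L.Header)
      continue; // exits from the header are allowed
    for (int S : L.Succs[Block])
      if (!L.contains(S))
        return true;
  }
  return false;
}
```

For a top-tested loop with blocks `0` (preheader), `1` (header), `2` (body) and `3` (exit), the check is false while only the header branches to `3`, and becomes true once the body also gets an edge to `3`. In the real pass the same walk would presumably go over the loop's blocks and their successors, with `getExitingBlocks` as a possible shortcut.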