# Ownership for C Language ###### tags: `Rust` > [GitHub repo - linD026/Ownership-Checker](https://github.com/linD026/Ownership-Checker) --- ## Related Work ### Ownership Concept * **Ownership is Theft: Experiences Building an Embedded OS in Rust**, PLOS2015, http://www.amitlevy.com/papers/tock-plos2015.pdf * **Comprehensive Rust**, Google, https://google.github.io/comprehensive-rust/ownership.html * **References and Borrowing**, Rust-lang Book, https://doc.rust-lang.org/1.8.0/book/references-and-borrowing.html * **Checker Developer Manual**, `llvm.org`, https://clang-analyzer.llvm.org/checker_dev_manual.html ### Rust - Clone and Copy > Clone is designed for arbitrary duplications: a Clone implementation for a type T can do arbitrarily complicated operations required to create a new T. It is a normal trait (other than being in the prelude), and so requires being used like a normal trait, with method calls, etc. > > The Copy trait represents values that can be safely duplicated via memcpy: things like reassignments and passing an argument by-value to a function are always memcpys, and so for Copy types, the compiler understands that it doesn't need to consider those a move. >> Reference: [What is the difference between Copy and Clone?](https://stackoverflow.com/questions/31012923/what-is-the-difference-between-copy-and-clone) ### Rust - References and Borrowing 1. one or more references (&T) to a resource, 2. exactly one mutable reference (&mut T). --- ## Implemetation :::info - https://sparse.wiki.kernel.org/ - https://github.com/error27/smatch ::: ### Compiler parser ```graphviz digraph main { node [shape=record] rankdir=TB file [label="file"] typedec [label="{<t>type declaration|{pointer type|struct type\nwith pointer member|array pointer}}"] function [label="{<f>function|{return type|parameter|<s>scope}}}"] scope [label="scope"] file -> typedec:t file -> function:f function:s -> scope } ``` :::warning **Question:** Should we use LLVM/GCC IR, o just analyze the source code C? **GCC Doc** - https://en.wikibooks.org/wiki/Category:Book:GNU_C_Compiler_Internals - https://en.wikibooks.org/wiki/GNU_C_Compiler_Internals/GNU_C_Compiler_Architecture_4_1 - http://lcs.ios.ac.cn/~xzx/memmodel.pdf - https://lwn.net/Articles/806099/ - https://lwn.net/ml/gcc-patches/1573867416-55618-35-git-send-email-dmalcolm@redhat.com/ - https://www.pspace.org/a/thesis/baueran_thesis.pdf - https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Attribute-Syntax.html#Attribute-Syntax ::: ### Example of GCC plugin > Writing a new plugin for GCC static analysis involves the following general steps: > > Familiarize yourself with the GCC plugin API and the internals of the GCC compiler. The GCC documentation provides detailed information on the plugin API and the structure of the code. > > Choose a specific analysis you want to implement, such as memory leak detection, buffer overflow detection, or thread-safety analysis. > > Create a new directory for your plugin and write the code for your analysis. You will need to write callbacks for the GCC plugin API that will be invoked at specific points during the compilation process. > > Write a configure script that checks for the necessary dependencies and generates a Makefile to build the plugin. > > Build the plugin by running make. > > Test the plugin by running the GCC compiler with the -fplugin=path/to/plugin option. > > Here is an example of a simple GCC plugin that prints a message when the plugin is loaded: ```cpp #include <gcc-plugin.h> #include <plugin-version.h> int plugin_is_GPL_compatible; int plugin_init (struct plugin_name_args *plugin_info, struct plugin_gcc_version *version) { printf("Hello from the GCC plugin!\n"); return 0; } ``` Example: ```cpp // Warn on the pure pointer type in function prototype, i.e. `int *p` . int function(int __move *move, int __clone *clone, int copy) { int *objA; int __heap *objB_from_heap = malloc(); // obj, allocate memory from heap // change ownership objA = objB_from_heap; // end of scope // - end of lifetime: move, clone, copy return *objA; // pass value, don't have to consider the ownership } // Warning: don't have free() for obj, move, and clone ``` Rust-style of checker ```cpp int *funcA(const int /* __immutable */ *imt, int __mut *mut, int __brw *brw) { ... } // Warning: don't have free() for imt, and mut ``` ### Headers ```cpp #include <...> #include "..." ``` ```bash exec_prog target_file.c -I {directory} ``` ### Heap allocation function markers ```cpp // malloc, zalloc, realloc, etc *alloc(); free(); ``` Allocation and release function marker, for example: ```cpp void * __alloc my_alloc(...); void __free my_free(...); ``` ### Move and clone traits Pass heap object to another function uses: ```cpp move_to(); clone_to(); ``` example: ```cpp void funcB(int __move *obj) { free(obj); } void funcA(void) { int *obj = malloc(); funcB(move_to(obj)); } ``` ### pointer Markers Function prototypes: ```cpp __clone __move ``` ### Structure ```cpp struct obj_struct { int *p; int a; }; struct obj_struct function(void) { struct obj_struct obj; // heap memory allocated. obj.p = malloc(); ... return obj; // Don't have to warn, `lose the ownership, may case memleak`. // But take care of the ownership of obj.p. // We can warn the user to take care of it. } void upper_function(void) { // Get the ownership of obj.p struct obj_struct obj = function(); ... // Will the function free the obj.p or change the ownership? } ``` ### Multiple threads with files Use one thread to scan one file. For the headers, use the table to prevent the multiple scanning. ```bash exec_prog target_file.c -j 4 ``` ### Routine 1. Scan header(s). 2. Add allocation and release functions (from `__alloc` and `__free`) to allocated checker. 3. Scan source code and check the ownership. 1. Scan the structure type. (take care of pointer member) 2. Scope scanning, add pointer types 3. Scan the function to create the dependency graph. 4. Check the crossing scope situation. ## State Mechine ```graphviz digraph main { node [shape=record] rankdir=TB funcA [label="{funcA ()\nint *obj = malloc()\n|obj.sm_state = OWNED(funcA)}"] funcB [label="{funcB (obj)|obj.sm_state=\nDROPPED, MOVE_TO(funcB)}"] funcC [label="{funcC (__clone obj)|obj.sm_state=\nCLONE_TO(funcC)}"] funcD [label="{funcD (__brw obj)|obj.sm_state=\nBORROWED(funD)}"] funcA -> funcB funcA -> funcC funcA -> funcD } ``` ### Routine ```graphviz digraph main { node [shape=box] rankdir=TB function -> block_1, block_2 { rank=same block_1 -> block_2 [label="check/clean\nblock_local_var_list"] } { rank=same curr, function } { rank=sink block_local_var_list, stack_var_list } curr -> block_1 block_1 -> varA stack_var_list, block_local_var_list -> varA } ``` ### Case - borrow from stmt ```cpp int *b = malloc(sizeof(int)); int __brw *a = b; // search from the stack_var_list function(a); /* only allow the borrow state */ ``` # GCC plugin > https://gcc.gnu.org/onlinedocs/gccint/Plugins.html#Plugins > https://thinkingeek.com/2015/08/16/a-simple-plugin-for-gcc-part-1/ GCC plugins are loadable modules that provide extra features to the compiler. Like GCC itself they can be distributed in source and binary forms. ## Enviroments Here is my enviroment when I was writing this note. ### Directories Main-dir: `$HOME/gcc-dev` Sub-dir: `build gcc gcc-install plugin` - `build`: the dir we build the gcc. - `gcc`: source code of GCC. - `gcc-install`: the headers and bin will be installed at. - `plugin`: my plugin repo. ## Build > https://gcc.gnu.org/install/configure.html > http://gcc.gnu.org/install/prerequisites.html ### Configuration and Install > https://gcc.gnu.org/install/configure.html First, in gcc source directory use the following cmd to install the requistites. ```bash # gcc $ ./contrib/download_prerequisites ``` Second, in build directory use the following cmd to get the Makefile. "../gcc/configure" should be the path to source code. ```bash # build $ ../gcc/configure --prefix=$HOME/gcc-dev/gcc-install --disable-multilib --enable-languages=c,c++ ``` :::warning If you change the prefix and want to build again, use the following cmd to cleanup: ```bash # build $ make distclean ``` **GMP header problem** fatal error: gmp.h: No such file or directory: ``` $ apt-get install libgmp3-dev ``` ::: :::info > https://gabrieleserra.ml/blog/2020-08-27-an-introduction-to-gcc-and-gccs-plugins.html ``` -fno-rtti disables generation of information about every class with virtual functions for use by the C++ runtime type identification features. If you don’t use those parts of the language, you can save some space by using this flag. GCC itself is built with -fno-rtti. Hence plugins that create passes will need to be built with RTTI disabled in order to link against gcc, or they will fail to load ``` ::: ## GIMPLE > https://thinkingeek.com/2015/08/16/simple-plugin-gcc-part-2/ > https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html#GIMPLE > https://gavinray97.github.io/blog/adding-invariant-to-cpp-design-by-contract All other blocks have their content simply dumped. GCC provides several debugging functions and here we just call print_gimple_seq because the code of a basic block is a gimple sequence (gimple_seq). Every element of a gimple_seq is a gimple statement (gimple_stmt) but we will not be handling them today. Gimple statements are printed in a pseudo-C syntax, so it is easy for the compiler writer to see what is going on (line 66). ### Block For each function, we will emit a subgraph for it. In order to avoid name clashes, we will name the subgraph using the address of fun (e.g. we will emit things like subgraph fun_0x1234 {). Then we proceed to iterate each of the basic blocks of the function. GCC provides a handy macro for this called FOR_ALL_BB_FN(bb, fun) that will iterate for all the basic blocks bb of the function fun (line 45). Once we have the basic block we want its gimple basic block info gimple_bb_info (line 47). Each basic block in a function has an index starting from 0. There are two special basic blocks: 0 and 1. The basic block number 0 is the entry of the function and the basic block number 1 is the exit of the function. These basic blocks exist for every function (even if it is empty). They exist because they make the analysis much more regular: there is always a single entry and a single exit block for every function rather than several exits, or entries (the latter would be terrible!). These two basic blocks do not have code inside, so we handle them in a special way, clearly labeling them as ENTRY and EXIT respectively. We complete the label with the location of the beginning and exit of the function respectively (lines 50 to 63). ### Block edge Basic blocks have outgoing and ingoing edges. In order to draw the graph it is enough to use the outgoing edges (or the ingoing edges, but not both unless we want repeated arrows). Similar to the iterator of basic blocks, there is a macro to iterate the edges of a basic block (line 63). We iterate the successors of the current basic block bb, bb->succs, and then we emit an edge from the current basic block to the target basic block. Two things to note here: the Graphviz syntax does not care if a node has not yet been defined when defining an edge (this is important when we have forward edges) and the names of the nodes are global to the graph, this is why our nodes (see lines 43 and 77) are of the form `bb_<<fun>>_<<index>>`. ### GIMPLE seq stmt iterator https://gcc.gnu.org/onlinedocs/gccint/Sequence-iterators.html#Sequence-iterators ### GIMPLE code https://gcc.gnu.org/onlinedocs/gccint/Tuple-specific-accessors.html#Tuple-specific-accessors https://gcc.gnu.org/onlinedocs/gccint/GIMPLE_005fCALL.html#GIMPLE_005fCALL ### Attribute lookup ```cpp if (lookup_attribute ("warn_unused_result", TYPE_ATTRIBUTES (ftype))) { location_t loc = gimple_location (stmt); if (fdecl) warning_at (loc, OPT_Wunused_result, "ignoring return value of %qD, " "declared with attribute warn_unused_result", fdecl); else warning_at (loc, OPT_Wunused_result, "ignoring return value of function " "declared with attribute warn_unused_result"); } ``` ```cpp // gcc/c-family/c-warn.cc void do_warn_unused_parameter (tree fn) { tree decl; for (decl = DECL_ARGUMENTS (fn); decl; decl = DECL_CHAIN (decl)) if (!TREE_USED (decl) && TREE_CODE (decl) == PARM_DECL && DECL_NAME (decl) && !DECL_ARTIFICIAL (decl) && !warning_suppressed_p (decl, OPT_Wunused_parameter)) warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wunused_parameter, "unused parameter %qD", decl); } ```