# Internals of CPython * current version: Python 3.7 alpha * git commit: 03f68b60e17b57f * contributer: Louie Lu <<me at louie.lu>> This document is based on the work of *Prashanth Raghu**, who wrote [Internals of CPython 3.6](https://mail.python.org/mailman/private/core-mentorship/2017-January/003778.html). You can find it at this [Google document](https://docs.google.com/document/d/1nzNN1jeNCC_bg1LADCvtTuGKvcyMskV1w8Ad2iLlwoI/) and Python [Core-Mentorship 2017-Jan](https://mail.python.org/mailman/private/core-mentorship). And, also reference from **Guido** [Yet another guided tour of CPython](https://paper.dropbox.com/doc/Yet-another-guided-tour-of-CPython-XY7KgFGn88zMNivGJ4Jzv), and cpython-devguide [chapter compiler](http://cpython-devguide.readthedocs.io/compiler.html). Most of the actions were performed in this environment: ```shell $ uname -a 4.8.3-1-ARCH $ gcc -v gcc version 6.3.1 20170109 (GCC) ``` ## Table of Contents * 00. Pre-requirements * 01. Set up the environment * 02. Debugging the Parser * 03. Debugging the AST(Abstract Syntax Tree) generator * 04. Debugging the symbol table generator * 05. Debugging the compiler and code generator * 06. Debugging the interpreter loop * 07. Debugging Python Objects * 08. Debugging the memory allocator * 09. Debugging Python threads and sockets * 10. Python Optimizations * Appendix A - Python Sandard Library * Appendix B - Introduction to PyPy * Appendix C - Some useful terms ## Why re-write these documents? The great work from Prashanth Raghu was using Eclipse IDE as its debugging environment. What I want to change is to remove the requirement of an IDE. It should be able to work everywhere, without any specific IDE for this tour. ## 0. Prerequisites 1. Linux based operating system 2. Python source code. The code has been migrated to [GitHub](https://github.com/python/cpython), so you can easily get the latest source code by running `git clone https://github.com/python/cpython` 3. `gdb` 4. `make` ## 01. Set up the environment 1. Clone the latest cpython source code from GitHub: ```shell $ git clone https://github.com/python/cpython $ cd cpython ``` 2. Configure with `--with-pydebug` then build python from source code: ```shell $ ./configure --with-pydebug $ make -j8 -s ``` :::info You can change the number for the `-j` parameter. For preformance reasons, this number is recommended to be set to `# of physical CPU cores * 2`. ::: After this, you will see a binary file called `python` in your directory. If you are using macOS this will probably be `python.exe`. If this is the case, replace `python` with `python.exe` in all the commands that follow later in the tutorial. ```shell $ ls aclocal.m4 config.log configure Doc install-sh LICENSE Makefile.pre Modules PC pyconfig.h Python python-gdb.py test.py build config.status configure.ac Grammar Lib Mac Makefile.pre.in Objects PCbuild pyconfig.h.in python-config README.rst Tools config.guess config.sub coverage.info Include libpython3.7dm.a Makefile Misc Parser Programs python python-config.py setup.py ``` If you see it, then you are ready to go! ## 02. Debugging with GDB We will using `gdb` to trace the behaivor of `python` that we built when running it. As such, this chapter will also introduce an easy `gdb` tutorial to give you confidence when using `gdb`. ### GDB shortcut * r (run): run the program * b (break): to set the break point. * s (step): step-to-step * c (continue): continue the program, this will stop at breakpoint or some trap. * l (list): list the code where the program now running. * ctrl+x a: open tui mode. * ctrl+p: up * ctrl+n: down ### GDB intro Now, type the command `gdb python` under the directory where you built cpython: ```shell $ gdb python GNU gdb (GDB) 7.12 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-pc-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from python...done. (gdb) ``` You will see the message in the bottom line that says: `Reading symbols from python...done.`. Great, that means we can now debug python via `gdb`. Now, the program hasn't run yet, it stops at the front of the program. Now we can plan what we want to do. First, let us familiarise ourselves with `gdb`. Every C program starts its execution in `main()`, so we will set a breakpoint for `main()`: ``` (gdb) b main Breakpoint 1 at 0x41d1d6: file ./Programs/python.c, line 20. (gdb) ``` `gdb` then sets a breakpoint at `Programs/python.c, line 20`. This message shows that the entry point of `Python` is at `Programs/python.c:20`. Alternativey, if you know where you want to set the breakpoint (in this example, at `Programs/python.c, line 20`) we can set the breakpoint like this: ``` (gdb) b Programs/python.c:20 Breakpoint 3 at 0x41d1d6: file ./Programs/python.c, line 20. (gdb) ``` We can run `python` now: ```shell (gdb) r Starting program: /home/grd/Python/cpython/python [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". Breakpoint 1, main (argc=1, argv=0x7fffffffdff8) at ./Programs/python.c:20 20 { (gdb) ``` We are now stop at the breakpoint we set before, list the source code with `list`: ``` (gdb) l 15 } 16 #else 17 18 int 19 main(int argc, char **argv) 20 { 21 wchar_t **argv_copy; 22 /* We need a second copy, as Python might modify the first one. */ 23 wchar_t **argv_copy2; 24 int i, res; (gdb) ``` Or just call out `tui` by `ctrl+x` a: ``` ┌──./Programs/python.c─────────────────────────────────────────────────────────┐ │15 } │ │16 #else │ │17 │ │18 int │ │19 main(int argc, char **argv) │ B+>│20 { │ │21 wchar_t **argv_copy; │ │22 /* We need a second copy, as Python might modify the first one. */│ │23 wchar_t **argv_copy2; │ │24 int i, res; │ │25 char *oldloc; │ │26 │ └──────────────────────────────────────────────────────────────────────────────┘ multi-thre Thread 0x7ffff7f9de In: main L20 PC: 0x41d1d6 (gdb) ``` In `tui` mode, you can see that we are now stopped at line 20 in the source code. If we didn't set any arguments for `python`, running `python` will get a REPL in linux or macOS. You can try with `continue` in gdb: ``` (gdb) c │29 │ │30 argv_copy = (wchar_t **)PyMem_RawMalloc(sizeof(wchar_t*) * (argc+1)); │ │31 argv_copy2 = (wchar_t **)PyMem_RawMalloc(sizeof(wchar_t*) * (argc+1)); │ │32 if (!argv_copy || !argv_copy2) { │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ multi-thre Thread 0x7ffff7f9de In: main L20 PC: 0x41d1d6 [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". Breakpoint 1, main (argc=1, argv=0x7fffffffdff8) at ./Programs/python.c:20 (gdb) c Continuing. Python 3.7.0a0 (default, Feb 22 2017, 22:10:22) [GCC 6.3.1 20170109] on linux Type "help", "copyright", "credits" or "license" for more information. >>> ``` Yep, you will get a REPL as we expected. Now that we understand some basic `gdb` concepts, we can move on to debugging Python! ## 03. Debugging the parser ### 3bb.1 Debugging the parser Before starting this chapter, let's create a simple python script, name it `test.py`, with following the contents: ```python= a = 100 ``` Now open gdb, set the arguments, and run python with the script we created: ```shell $ gdb python (gdb) set args test.py // Or doing this $ gdb --args python test.py ``` :::info This is equivalent to what we do in the linux command line: ``` $ python test.py ``` ::: Now set the breakpoint at `main` as we did before, then run it: ``` (gdb) b main Breakpoint 1 at 0x41d1d6: file ./Programs/python.c, line 20. (gdb) r Starting program: /home/grd/Python/cpython/python [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". Breakpoint 1, main (argc=1, argv=0x7fffffffdff8) at ./Programs/python.c:20 20 {Starting program: /home/grd/Python/cpython/python [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". Breakpoint 1, main (argc=1, argv=0x7fffffffdff8) at ./Programs/python.c:20 20 { (gdb) ``` #### Programs/python.c If you take a look to the source code `Programs/python.c`, you can find that the `main()` does a lot of setting with it, the core is to call the function `Py_Main(argc, argv_copy)` to start Python, which is defined in the file `Modules/main.c`: ```c=64 argv_copy2[argc] = argv_copy[argc] = NULL; setlocale(LC_ALL, oldloc); PyMem_RawFree(oldloc); res = Py_Main(argc, argv_copy); /* Force again malloc() allocator to release memory blocks allocated before Py_Main() */ (void)_PyMem_SetupAllocators("malloc"); ``` #### Modules/main.c If you are interesting in what `modules/main.c` is doing, I recommend you read Guido's work - [Yet another guided tour of CPython - Chapter 1](https://paper.dropbox.com/doc/Yet-another-guided-tour-of-CPython-XY7KgFGn88zMNivGJ4Jzv). To simplify, in `Py_Main` it will do: 1. [Initialization for hash randomiztion](https://mail.python.org/pipermail/python-announce-list/2012-March/009394.html):391 2. Reset Warning options and GetOpt: 394,395 3. Parse command line argument via `_PyOS_GetOpt()`: 397 4. Initialize python via `Py_Initialize()`: 693 5. Import `readline`: 723 6. Decide how to run: 1. run_command() 2. RunModule() 3. RunInteractiveHook() 4. RunMainFromImporter() 5. run_file() 6. PyRun_AnyFileFlags() Now, we are focused on road number 5, `run_file` in line 793: ```c=792 if (sts == -1) sts = run_file(fp, filename, &cf); ``` Stop here by typing `b 793` or `b run_file` to stop inside the function, we can check filename by `p filename`, go into function by `step`: ``` (gdb) b Modules/main.c:793 (gdb) c (gdb) p filename $1 = 0x931050 L"test.py (gdb) s ``` `run_file` is just a wrapper of `PyRun_AnyFileExFlags` which lives in `Python/pythonrun.c` and is a wrapper that calls either `PyRun_InteractiveLoopFlags()` or `PyRun_SimpleFileExFlags()`. Because we set the arguments `test.py`, it will run `PyRun_SimpleFileExFlags()`. #### Python/pythonrun.c `PyRun_SimpleFileExFlags()` will then check if there is a .pyc file to run with `maybe_pyc_file()` at the line 366 if-else statement. In our case, is should not run with .pyc, but from .py. It will then call `PyRun_FileExFlags()` in same file, and then try to construct an abstract syntax tree (AST) from file by `PyParser_ASTFromFileObject()`. This function (`PyParser_ASTFromFileObject()`) invokes the parser in the function `PyParser_ParseFileObject()`, which lives in `Parser/parsetok.c` to construct a node object, and invokes the AST tree contructor function `PyAST_FromNodeObject()` to create AST tree by node object. #### Parser/parsetok.c For now, let's take a closer look inside `PyParser_ParseFileObject()`. It first initializes the syntax token from `PyTokenizer_FromFile()`, then passes to `parsetok()` to create the node object. `parsetok()` will parse the input from the given tokenizer structure and return an error code. The interesting part is inside the infinite for loop; take a look here: ```c= type = PyTokenizer_Get(tok, &a, &b); ``` `PyTokenizer_Get()`, is a wrapper of `tok_get()`. It will return token type defined in `token.h`: :::info include/token.h ```cpp= #define ENDMARKER 0 #define NAME 1 #define NUMBER 2 #define STRING 3 #define NEWLINE 4 #define INDENT 5 #define DEDENT 6 #define LPAR 7 #define RPAR 8 #define LSQB 9 ... #define RARROW 51 #define ELLIPSIS 52 /* Don't forget to update the table _PyParser_TokenNames in tokenizer.c! */ #define OP 53 #define AWAIT 54 #define ASYNC 55 #define ERRORTOKEN 56 #define N_TOKENS 57 ``` ::: In the first round of the for-loop, we can print out the type, because it is `a`, it is a NAME, the value of type should be 1: ``` (gdb) p type $1 = 1 ``` :::info include/token.h ```c=13 #define NAME 1 ``` ::: At line 236, `str[lbbben] = '\0'` will store the token string value. If we print it out, it will be `a`. ``` (gdb) p str $2 = 0x7ffff6eb5a18 "a" ``` This make sense, since our source code is `a = 100`, the first token string should be `a`, and its type is `NAME`. Then calling `PyParser_AddToken()`, this will add the token into the parser tree at this moment. You can now try to follow the other token `=` and `100` and trace how they are added into the parser tree. ### 3.2 Generation of the grammar The textual representation of the grammar is present in the file `Grammar/Grammar`. It is written in [yacc](https://en.wikipedia.org/wiki/Yacc) and I would suggest you to go through it. The numerical representation of the grammar is present in the file `Python/graminit.c` which contains the array of [DFA's](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) representing the source program. We will now try to understand how this works in Python. First, open `test.py` and modify the contents to: ```python class foo: pass ``` Then, start gdb and set the breakpoint at `PyParser_AddToken()`: ```bash $ gdb python (gdb) b PyParser_AddToken (gdb) r test.py [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". Breakpoint 1, PyParser_AddToken (ps=ps@entry=0x9cf4f0, type=type@entry=1, str=str@entry=0x7ffff6eb5a18 "class", lineno=1, col_offset=0, expected_ret=expected_ret@entry=0x7fffffffdc34) at Parser/parser.c:229 229 { ``` At this moment, look inside the for-loop. We want to know the DFA state now, so set a breakpoint at `dfa *d = ps->p_stack.s_top->s_dfa` next line, and run the program by `continue`: ```bash (gdb) b 244 Breakpoint 2 at 0x5b5933: file Parser/parser.c, line 244. (gdb) c Breakpoint 2, PyParser_AddToken (ps=ps@entry=0x9cf4f0, type=type@entry=1, str=str@entry=0x7ffff6eb5a18 "class", lineno=1, col_offset=0, expected_ret=expected_ret@entry=0x7fffffffdc34) at Parser/parser.c:244 244 state *s = &d->d_state[ps->p_stack.s_top->s_state]; ``` We can then print out the value in `d`, for example, print out its name: ```bash (gdb) p d->d_name 'file_input' (gdb) ``` Or, print out as Python `dir()`: ```bash (gdb) p *d $6 = {d_type = 269, d_name = 0x606340 "stmt", d_initial = 0, d_nstates = 2, d_state = 0x8a6a40 <states_13>, d_first = 0x60658c ""} (gdb) set print pretty (gdb) p *d $8 = { d_type = 269, d_name = 0x606340 "stmt", d_initial = 0, d_nstates = 2, d_state = 0x8a6a40 <states_13>, d_first = 0x60658c "" } ``` Compare the value of `d_name`, you can find it in `Grammar/Grammar`: ```yacc # Grammar for Python single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE file_input: (NEWLINE | stmt)* ENDMARKER eval_input: testlist NEWLINE* ENDMARKER ``` Run it more time, you will see that `d_name` will change as this sequence: `file_input`, `stmt`, `compound_stmt`, `classdef`, `classdef`, `classdef`, `classdef`, `suite`, `suite`, `suite`, `stmt`, `simple_stmt`, `small_stmt`, `pass_stmt`, `simple_stmt`, `suite`, `file_input`, `file_input`. ### 3.3 Review #### Backtrace * main() * Py_Main() * run_file() * PyRun_AnyFileExFlags() * PyRun_SimpleFileExFlags() * PyRun_FileExFlags() * PyParser_ASTFromFileObject() * PyParser_ParseFileObject() * parsetok() * PyParser_AddToken() * PyAST_FromNodeObject() ## 04. Debugging the AST (Abstract Syntax Tree) generator At this moment, we build our parser tree via `PyParser_ParseFileObject()`. The next step is to generate the AST from the parser node we built. Before getting into it, there are several macros we need to introduce. These macros are used to query data from node structs, which are defined in `Include/node.h`: * CHILD(node *, int) Returns the nth child of the node using zero-offset indexing * RCHILD(node *, int) Returns the nth child of the node from the right side; use negative numbers! * NCH(node *) The number of children the node has * STR(node *) A string representation of the node; e.g. will return `:` for a * COLON token * TYPE(node *) The type of node as specified in `Include/graminit.h` * REQ(node *, TYPE) Assert that the node is the type that is expected * LINENO(node *) Retrieve the line number of the source code that led to the creation of the parse rule; defined in `Python/ast.c`. We can now try to convert the Parse Tree to an AST using the function `PyAST_FromNodeObject()` in `Python/ast.c`. ```c=786 for (i = 0; i < NCH(n) - 1; i++) { ch = CHILD(n, i); if (TYPE(ch) == NEWLINE) continue; REQ(ch, stmt); num = num_stmts(ch); if (num == 1) { s = ast_for_stmt(&c, ch); if (!s) goto out; asdl_seq_SET(stmts, k++, s); } else { ch = CHILD(ch, 0); REQ(ch, simple_stmt); for (j = 0; j < num; j++) { s = ast_for_stmt(&c, CHILD(ch, j * 2)); if (!s) goto out; asdl_seq_SET(stmts, k++, s); } } } res = Module(stmts, arena); ``` `ast_for_stmt()` is a wrapper for `ast_for_xx` where xx is the grammer rule that the function handles. Set a breakpoint at `ast_for_stmt()` to observe how the ASDL statement is generated. ## 05. Debugging the symbol table generator Back into the function `PyRun_FileExFlags()` in `Python/pythonrun.c:906`, we finish our tour for `PyParser_ASTFromFileObject()`. It will then pass the result `mod` into the `run_mod()` function above. It has two principal functions: The first to generate the code object (`PyAST_CompileObject()`) and the second to call the interpreter loop (`PyEval_EvalCode()`). Let us step into `PyAST_CompileObject()` first, this function is in `Python/compile.c`. It has two important functions: 1. `PySumtable_BuildObject()` 2. `compiler_mod()` `PySumtable_BuildObject()` is used to generate the symbol table, which is defined in `Python/symtable.c` at line 240. We shall step into this function and insert a breakpoint at line 281, where there is a for-loop that iterates through the ASDL sequence (`for (i = 0; i < asdl_seq_LEN(seq); i++)`) for the module and generates the symbol table entry in the function `symtable_visit_stmt()`. Before we step into this function we need to understand the structures that support the symbol table. They are defined in the file `Include/symtble.h`. ```c= struct _symtable_entry; struct symtable { PyObject *st_filename; /* name of file being compiled, decoded from the filesystem encoding */ struct _symtable_entry *st_cur; /* current symbol table entry */ struct _symtable_entry *st_top; /* symbol table entry for module */ PyObject *st_blocks; /* dict: map AST node addresses * to symbol table entries */ PyObject *st_stack; /* list: stack of namespace info */ PyObject *st_global; /* borrowed ref to st_top->ste_symbols */ int st_nblocks; /* number of blocks used. kept for consistency with the corresponding compiler structure */ PyObject *st_private; /* name of current class or NULL */ PyFutureFeatures *st_future; /* module's future features that affect the symbol table */ int recursion_depth; /* current recursion depth */ int recursion_limit; /* recursion limit */ }; typedef struct _symtable_entry { PyObject_HEAD PyObject *ste_id; /* int: key in ste_table->st_blocks */ PyObject *ste_symbols; /* dict: variable names to flags */ PyObject *ste_name; /* string: name of current block */ PyObject *ste_varnames; /* list of function parameters */ PyObject *ste_children; /* list of child blocks */ PyObject *ste_directives;/* locations of global and nonlocal statements */ _Py_block_ty ste_type; /* module, class, or function */ int ste_nested; /* true if block is nested */ unsigned ste_free : 1; /* true if block has free variables */ unsigned ste_child_free : 1; /* true if a child block has free vars, including free refs to globals */ unsigned ste_generator : 1; /* true if namespace is a generator */ unsigned ste_coroutine : 1; /* true if namespace is a coroutine */ unsigned ste_varargs : 1; /* true if block has varargs */ unsigned ste_varkeywords : 1; /* true if block has varkeywords */ unsigned ste_returns_value : 1; /* true if namespace uses return with an argument */ unsigned ste_needs_class_closure : 1; /* for class scopes, true if a closure over __class__ should be created */ int ste_lineno; /* first line of block */ int ste_col_offset; /* offset of first line of block */ int ste_opt_lineno; /* lineno of last exec or import * */ int ste_opt_col_offset; /* offset of last exec or import * */ int ste_tmpname; /* counter for listcomp temp vars */ struct symtable *ste_table; } PySTEntryObject; ``` We can observe that the symbol table is a dictionary containing the symbol table entries. For more specific information, please refer to the comment below the source code. Back to the function `symtable_visit_stmt`, we can set the breakpoint on it: ``` (gdb) b symtable_visit_stmt ``` Here we can observe that the kind of expression is `xx_kind`, for `Name_kind`, it will simply call the function `symtable_add_def()` to add a definition into the symbol table. I would suggest you to debug this function on your own as it is quite straight forward. :::info Exercise: Trace the symbol table entries for the following Python files: ```python class test: pass ``` ```python i = range(0, 10) m = map(lambda x: x * 2, i) ``` In this exercise we have understood how symbol table entries are created. ::: ## 06. Debugging the compiler and code generator Back to the function `PyAST_CompileObject()`, the next step is `compiler_mod()`, which then converts the AST into a CFG (control flow graph). Set a breakpoint to it (`b compiler_mod`), and we can step into it. The switch will then bring us into `Module_kind`, into this there is a call to the function `compiler_body()`. Step into it and you will see that there is a for-loop: ```c= for (; i < asdl_seq_LEN(stmts); i++) VISIT(c, stmt, (stmt)ty)asdl_seq_GET(stmts, i)); ``` Here we observe that we iterate through the ASDL statements and call the macro `VISIT`, then calls `compiler_visit_expr(c, node)`. The emission of bytecode is handled by the following macros: * `ADDOP()` Add a specified opcode * `ADDOP_I()` Add an opcode that takes an argument * `ADDOP_O(struct compiler *c, int op, PyObject *type, PyObject *obj)` Add an opcode with the proper argument based on the position of the specified `PyObject` in * PyObject sequence object, but with no handling of mangled names; this is used for when you need to do named lookups of objects such as globals, consts, or parameters where name mangling is not possible and the scope of the name is known * `ADDOP_NAME()` Just like `ADDOP_O`, but name mangling is also handled; used for attribute loading or importing based on name * `ADDOP_JABS()` Create an absolute jump to a basic block * `ADDOP_JREL()` Create a relative jump to a basic block Several helper functions that will emit bytecode and are named `compiler_xx()`, where `xx` is what the function helps with (list, boolop, etc.) :::danger Not yet done. Missing some parts. ::: To verify if we have generated the right opcode, on the file `test.py` execute the following command: ```bash $ python -m dis test.py 1 0 LOAD_CONST 0 (100) 2 STORE_NAME 0 (a) 4 LOAD_CONST 1 (None) 6 RETURN_VALUE ``` From this we can verify that our generated bytecode is correct. I would suggest you to debug larger programs and see how the opcode is generated for the same. ## 07. Debugging the interpreter loop Once the bytecode is generated the next step is to execute the program by the interpreter. Back to the file `Python/pythonrun.c`, we then call the function `PyEval_EvalCode()`, it is a wrapper function to `PyEval_EvalCodeEx()`, and is another wrapper function for `_PyEval_EvalCodeWithName()` :::warning Deprecate to 2.7, PyEval_EvalCodeEx won't setup frame before entering `_PyEval_EvalCodeWithName()`, frame create is moved into _PyEval_EvalCodeWithName. ::: Let us examine the structure of the frame object, which is defined in the file `Include/frameobject.h`: ```cpp typedef struct _frame { PyObject_VAR_HEAD struct _frame *f_back; /* previous frame, or NULL */ PyCodeObject *f_code; /* code segment */ PyObject *f_builtins; /* builtin symbol table (PyDictObject) */ PyObject *f_globals; /* global symbol table (PyDictObject) */ PyObject *f_locals; /* local symbol table (any mapping) */ PyObject **f_valuestack; /* points after the last local */ /* Next free slot in f_valuestack. Frame creation sets to f_valuestack. Frame evaluation usually NULLs it, but a frame that yields sets it to the current stack top. */ PyObject **f_stacktop; PyObject *f_trace; /* Trace function */ /* In a generator, we need to be able to swap between the exception state inside the generator and the exception state of the calling frame (which shouldn't be impacted when the generator "yields" from an except handler). These three fields exist exactly for that, and are unused for non-generator frames. See the save_exc_state and swap_exc_state functions in ceval.c for details of their use. */ PyObject *f_exc_type, *f_exc_value, *f_exc_traceback; /* Borrowed reference to a generator, or NULL */ PyObject *f_gen; int f_lasti; /* Last instruction if called */ /* Call PyFrame_GetLineNumber() instead of reading this field directly. As of 2.3 f_lineno is only valid when tracing is active (i.e. when f_trace is set). At other times we use PyCode_Addr2Line to calculate the line from the current bytecode index. */ int f_lineno; /* Current line number */ int f_iblock; /* index in f_blockstack */ char f_executing; /* whether the frame is still executing */ PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */ PyObject *f_localsplus[1]; /* locals+stack, dynamically sized */ } PyFrameObject; ``` What is important is that this frame contains an instruction pointer which is very useful when we try to understand how generators work in the coming chapters. In `_PyEval_EvalCodeWithName()`, it will create a frame by `_PyFrame_New_NoTrack()`, and at the bottom call the function `PyEval_EvalFrameEx()`. `PyEval_EvalFrameEx()` then will call the `eval_frame()` function on the PyThreadState which is nothing but the function `_PyEval_EvalFrameDefault()`. This function can also be called the virtual machine of Python. Trace into the function `_PyEval_EvalFrameDefault()`, we can then observe an infinite for-loop at line 1054, which will then generate opcode. We can trace it, and you will see it switch to the corresponding opcode block. For example, running code with `a = 100` will first use `LOAD_CONST`, then `LOAD_NAME`, then so on, which we can observe with `python -m dis test.py`. :::info Please, run this to make sure you are familiar with this process. ```python i = range(0, 10) m = map(lambda x: x * 2, i) ``` ```py def logger(func): def logged(name) print('logging starts') func(name) print('logging ends') return logged @logger def print_name(name): print(name) print_name('Resolution Debugging') ``` ::: ### Review of the process This three chapter is the process of converting AST to CFG to Bytecode, you can get more information by referencing the devguide [25.7 AST to CFG to Bytecode](http://cpython-devguide.readthedocs.io/compiler.html#ast-to-cfg-to-bytecode) ## 08. Debugging Python Objects In this chapter we shall learn about the implementation of Python Objects. Let us begin with the PyObject which is the generic Python Object. It can be also called the Object class of Python. It is defined in the file `Include/object.h`. ### 8.0 Before starting PyObject tour In this chapter, you need to use gdb to trace the code. Each PyObject can be traced with these steps: * Open gdb with python * Set the break point at Object create function * Take out REPL, then type the object create you want * Trap, start to trace it For example, if you want to trace about PyBoolObject, you can do the following: ```bash $ gdb python (gdb) b bool_newbb Breakpoint 1 at 0x44812f: file Objects/boolobject.c, line 44. (gdb) r [GCC 6.3.1 20170109] on linux Type "help", "copyright", "credits" or "license" for more information. >>> a = bool(1) Breakpoint 1, bool_new (type=0x87a700 <PyBool_Type>, args=(1,), kwds=0x0) at Objects/boolobject.c:44 44 { ``` Have fun! ### 8.1 The PyObject The generic Python object is defined as follows: ```cpp typedef struct _object { _PyObject_HEAD_EXTRA Py_ssize_t ob_refcnt; struct _typeobject *ob_type; } PyObjecdt; ``` After the preprocessor expands _PyObject_HEAD_EXTRA, it will look like, it defines two pointers to support a doubly-linked list of all live heap objects: ```cpp typedef struct _object { struct _object *_ob_next; struct _object *_ob_prev; Py_ssize_t ob_refcnt; struct _typeobject *ob_type; } PyObjecdt; ``` Inside PyObject, it contains two important elements, the reference count and type object. We will look into these objects after. ### 8.2 The PyVarObject The definition for the Python Objects of variable length is defined as: ```cpp typedef struct { PyObject ob_base; Py_ssize_t ob_size; /* Number of items in variable part */ } PyVarObject; ``` It is basically the same as PyObject, but with an additional field to denote the size of the variable length of the object. :::info Every pointer to a Python object can be cast to a `PyObject*`. Similarly every pointer to a variable-size Python object can, in addition, be cast to `PyVarObject*`. ::: ### 8.3 The PyTypeObject The PyTypeObject is the representation of the type of the Python object. to get the type of any object in Python, you can type the following statements: ```py >>> t = type(1) >>> dir(t) ['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes'] ``` We can see the definition of these methods in the PyTypeObject. It is also defined in the same file, `Include/object.t`. ```cpp #ifdef Py_LIMITED_API typedef struct _typeobject PyTypeObject; /* opaque */ #else typedef struct _typeobject { PyObject_VAR_HEAD const char *tp_name; /* For printing, in format "<module>.<name>" */ Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */ /* Methods to implement standard operations */ destructor tp_dealloc; printfunc tp_print; getattrfunc tp_getattr; setattrfunc tp_setattr; PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2) or tp_reserved (Python 3) */ reprfunc tp_repr; /* Method suites for standard classes */ PyNumberMethods *tp_as_number; PySequenceMethods *tp_as_sequence; PyMappingMethods *tp_as_mapping; /* More standard operations (here for binary compatibility) */ hashfunc tp_hash; ternaryfunc tp_call; reprfunc tp_str; getattrofunc tp_getattro; setattrofunc tp_setattro; /* Functions to access object as input/output buffer */ PyBufferProcs *tp_as_buffer; /* Flags to define presence of optional/expanded features */ unsigned long tp_flags; const char *tp_doc; /* Documentation string */ /* Assigned meaning in release 2.0 */ /* call function for all accessible objects */ traverseproc tp_traverse; /* delete references to contained objects */ inquiry tp_clear; /* Assigned meaning in release 2.1 */ /* rich comparisons */ richcmpfunc tp_richcompare; /* weak reference enabler */ Py_ssize_t tp_weaklistoffset; /* Iterators */ getiterfunc tp_iter; iternextfunc tp_iternext; /* Attribute descriptor and subclassing stuff */ struct PyMethodDef *tp_methods; struct PyMemberDef *tp_members; struct PyGetSetDef *tp_getset; struct _typeobject *tp_base; PyObject *tp_dict; descrgetfunc tp_descr_get; descrsetfunc tp_descr_set; Py_ssize_t tp_dictoffset; initproc tp_init; allocfunc tp_alloc; newfunc tp_new; freefunc tp_free; /* Low-level free-memory routine */ inquiry tp_is_gc; /* For PyObject_IS_GC */ PyObject *tp_bases; PyObject *tp_mro; /* method resolution order */ PyObject *tp_cache; PyObject *tp_subclasses; PyObject *tp_weaklist; destructor tp_del; /* Type attribute cache version tag. Added in version 2.6 */ unsigned int tp_version_tag; destructor tp_finalize; #ifdef COUNT_ALLOCS /* these must be last and never explicitly initialized */ Py_ssize_t tp_allocs; Py_ssize_t tp_frees; Py_ssize_t tp_maxalloc; struct _typeobject *tp_prev; struct _typeobject *tp_next; #endif } PyTypeObject; #endif ``` All the integer objects are now implemented in `Objects/longobject.c`; it is defined as `PyLong_Type` at line 5389, with PyTypeObject: ```cpp PyTypeObject PyLong_Type = { PyVarObject_HEAD_INIT(&PyType_Type, 0) "int", /* tp_name */ offsetof(PyLongObject, ob_digit), /* tp_basicsize */ sizeof(digit), /* tp_itemsize */ long_dealloc, /* tp_dealloc */ 0, /* tp_print */ 0, /* tp_getattr */ 0, /* tp_setattr */ 0, /* tp_reserved */ long_to_decimal_string, /* tp_repr */ &long_as_number, /* tp_as_number */ 0, /* tp_as_sequence */ 0, /* tp_as_mapping */ (hashfunc)long_hash, /* tp_hash */ 0, /* tp_call */ long_to_decimal_string, /* tp_str */ PyObject_GenericGetAttr, /* tp_getattro */ 0, /* tp_setattro */ 0, /* tp_as_buffer */ Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_LONG_SUBCLASS, /* tp_flags */ long_doc, /* tp_doc */ 0, /* tp_traverse */ 0, /* tp_clear */ long_richcompare, /* tp_richcompare */ 0, /* tp_weaklistoffset */ 0, /* tp_iter */ 0, /* tp_iternext */ long_methods, /* tp_methods */ 0, /* tp_members */ long_getset, /* tp_getset */ 0, /* tp_base */ 0, /* tp_dict */ 0, /* tp_descr_get */ 0, /* tp_descr_set */ 0, /* tp_dictoffset */ 0, /* tp_init */ 0, /* tp_alloc */ long_new, /* tp_new */ PyObject_Del, /* tp_free */ }; ``` ### 8.4 PyLongObject Defined at `Include/longobject.h`, as this: ```cpp typedef struct _longobject PyLongObject; /* Revealed in longintrepr.h */ ``` ### 8.5 PyBoolObject The PyBoolObject is used to store boolean types in Python. The definition is in the file `include/boolobject.h`. ### 8.6 PyFloatObject In file `Include/floatobject.h` ```cpp typedef struct { PyObject_HEAD double ob_fval; } PyFloatObject; ``` ### 8.7 PyListObject In file `Include/listobject.h`. ```cpp typedef struct { PyObject_VAR_HEAD /* Vector of pointers to list elements. list[0] is ob_item[0], etc. */ PyObject **ob_item; /* ob_item contains space for 'allocated' elements. The number * currently in use is ob_size. * Invariants: * 0 <= ob_size <= allocated * len(list) == ob_size * ob_item == NULL implies ob_size == allocated == 0 * list.sort() temporarily sets allocated to -1 to detect mutations. * * Items must normally not be NULL, except during construction when * the list is not yet visible outside the function that builds it. */ Py_ssize_t allocated; } PyListObject; ``` ## Appendix C - Some useful terms * ASDL (Abstract Syntax Definition Language) * yacc and lex * Parser * AST (Abstract Syntax Tree) * CFG (Context-free graph) * LL (LL(k) parser)