This document is based on the work of Prashanth Raghu*, who wrote Internals of CPython 3.6. You can find it at this Google document and Python Core-Mentorship 2017-Jan.
And, also reference from Guido Yet another guided tour of CPython, and cpython-devguide chapter compiler.
Most of the actions were performed in this environment:
$ uname -a
4.8.3-1-ARCH
$ gcc -v
gcc version 6.3.1 20170109 (GCC)
The great work from Prashanth Raghu was using Eclipse IDE as its debugging environment. What I want to change is to remove the requirement of an IDE. It should be able to work everywhere, without any specific IDE for this tour.
git clone https://github.com/python/cpython
gdb
make
$ git clone https://github.com/python/cpython
$ cd cpython
--with-pydebug
then build python from source code:$ ./configure --with-pydebug
$ make -j8 -s
You can change the number for the -j
parameter. For preformance reasons, this number is recommended to be set to # of physical CPU cores * 2
.
After this, you will see a binary file called python
in your directory. If you are using macOS this will probably be python.exe
. If this is the case, replace python
with python.exe
in all the commands that follow later in the tutorial.
$ ls
aclocal.m4 config.log configure Doc install-sh LICENSE Makefile.pre Modules PC pyconfig.h Python python-gdb.py test.py
build config.status configure.ac Grammar Lib Mac Makefile.pre.in Objects PCbuild pyconfig.h.in python-config README.rst Tools
config.guess config.sub coverage.info Include libpython3.7dm.a Makefile Misc Parser Programs python python-config.py setup.py
If you see it, then you are ready to go!
We will using gdb
to trace the behaivor of python
that we built when running it.
As such, this chapter will also introduce an easy gdb
tutorial to give you confidence when using gdb
.
Now, type the command gdb python
under the directory where you built cpython:
$ gdb python
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
(gdb)
You will see the message in the bottom line that says: Reading symbols from python...done.
. Great, that means we can now debug python via gdb
.
Now, the program hasn't run yet, it stops at the front of the program. Now we can plan what we want to do.
First, let us familiarise ourselves with gdb
. Every C program starts its execution in main()
, so we will set a breakpoint for main()
:
(gdb) b main
Breakpoint 1 at 0x41d1d6: file ./Programs/python.c, line 20.
(gdb)
gdb
then sets a breakpoint at Programs/python.c, line 20
. This message shows that the entry point of Python
is at Programs/python.c:20
. Alternativey, if you know where you want to set the breakpoint (in this example, at Programs/python.c, line 20
) we can set the breakpoint like this:
(gdb) b Programs/python.c:20
Breakpoint 3 at 0x41d1d6: file ./Programs/python.c, line 20.
(gdb)
We can run python
now:
(gdb) r
Starting program: /home/grd/Python/cpython/python
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Breakpoint 1, main (argc=1, argv=0x7fffffffdff8) at ./Programs/python.c:20
20 {
(gdb)
We are now stop at the breakpoint we set before, list the source code with list
:
(gdb) l
15 }
16 #else
17
18 int
19 main(int argc, char **argv)
20 {
21 wchar_t **argv_copy;
22 /* We need a second copy, as Python might modify the first one. */
23 wchar_t **argv_copy2;
24 int i, res;
(gdb)
Or just call out tui
by ctrl+x
a:
┌──./Programs/python.c─────────────────────────────────────────────────────────┐
│15 } │
│16 #else │
│17 │
│18 int │
│19 main(int argc, char **argv) │
B+>│20 { │
│21 wchar_t **argv_copy; │
│22 /* We need a second copy, as Python might modify the first one. */│
│23 wchar_t **argv_copy2; │
│24 int i, res; │
│25 char *oldloc; │
│26 │
└──────────────────────────────────────────────────────────────────────────────┘
multi-thre Thread 0x7ffff7f9de In: main L20 PC: 0x41d1d6
(gdb)
In tui
mode, you can see that we are now stopped at line 20 in the source code.
If we didn't set any arguments for python
, running python
will get a REPL in linux or macOS. You can try with continue
in gdb:
(gdb) c
│29 │
│30 argv_copy = (wchar_t **)PyMem_RawMalloc(sizeof(wchar_t*) * (argc+1)); │
│31 argv_copy2 = (wchar_t **)PyMem_RawMalloc(sizeof(wchar_t*) * (argc+1)); │
│32 if (!argv_copy || !argv_copy2) { │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
multi-thre Thread 0x7ffff7f9de In: main L20 PC: 0x41d1d6
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Breakpoint 1, main (argc=1, argv=0x7fffffffdff8) at ./Programs/python.c:20
(gdb) c
Continuing.
Python 3.7.0a0 (default, Feb 22 2017, 22:10:22)
[GCC 6.3.1 20170109] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Yep, you will get a REPL as we expected.
Now that we understand some basic gdb
concepts, we can move on to debugging Python!
Before starting this chapter, let's create a simple python script, name it test.py
, with following the contents:
a = 100
Now open gdb, set the arguments, and run python with the script we created:
$ gdb python
(gdb) set args test.py
// Or doing this
$ gdb --args python test.py
This is equivalent to what we do in the linux command line:
$ python test.py
Now set the breakpoint at main
as we did before, then run it:
(gdb) b main
Breakpoint 1 at 0x41d1d6: file ./Programs/python.c, line 20.
(gdb) r
Starting program: /home/grd/Python/cpython/python
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Breakpoint 1, main (argc=1, argv=0x7fffffffdff8) at ./Programs/python.c:20
20 {Starting program: /home/grd/Python/cpython/python
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Breakpoint 1, main (argc=1, argv=0x7fffffffdff8) at ./Programs/python.c:20
20 {
(gdb)
If you take a look to the source code Programs/python.c
, you can find that the main()
does a lot of setting with it, the core is to call the function Py_Main(argc, argv_copy)
to start Python, which is defined in the file Modules/main.c
:
argv_copy2[argc] = argv_copy[argc] = NULL;
setlocale(LC_ALL, oldloc);
PyMem_RawFree(oldloc);
res = Py_Main(argc, argv_copy);
/* Force again malloc() allocator to release memory blocks allocated
before Py_Main() */
(void)_PyMem_SetupAllocators("malloc");
If you are interesting in what modules/main.c
is doing, I recommend you read Guido's work - Yet another guided tour of CPython - Chapter 1. To simplify, in Py_Main
it will do:
_PyOS_GetOpt()
: 397Py_Initialize()
: 693readline
: 723Now, we are focused on road number 5, run_file
in line 793:
if (sts == -1)
sts = run_file(fp, filename, &cf);
Stop here by typing b 793
or b run_file
to stop inside the function, we can check filename by p filename
, go into function by step
:
(gdb) b Modules/main.c:793
(gdb) c
(gdb) p filename
$1 = 0x931050 L"test.py
(gdb) s
run_file
is just a wrapper of PyRun_AnyFileExFlags
which lives in Python/pythonrun.c
and is a wrapper that calls either PyRun_InteractiveLoopFlags()
or PyRun_SimpleFileExFlags()
. Because we set the arguments test.py
, it will run PyRun_SimpleFileExFlags()
.
PyRun_SimpleFileExFlags()
will then check if there is a .pyc file to run with maybe_pyc_file()
at the line 366 if-else statement.
In our case, is should not run with .pyc, but from .py. It will then call PyRun_FileExFlags()
in same file, and then try to construct an abstract syntax tree (AST) from file by PyParser_ASTFromFileObject()
.
This function (PyParser_ASTFromFileObject()
) invokes the parser in the function PyParser_ParseFileObject()
, which lives in Parser/parsetok.c
to construct a node object, and invokes the AST tree contructor function PyAST_FromNodeObject()
to create AST tree by node object.
For now, let's take a closer look inside PyParser_ParseFileObject()
.
It first initializes the syntax token from PyTokenizer_FromFile()
, then passes to parsetok()
to create the node object.
parsetok()
will parse the input from the given tokenizer structure and return an error code.
The interesting part is inside the infinite for loop; take a look here:
type = PyTokenizer_Get(tok, &a, &b);
PyTokenizer_Get()
, is a wrapper of tok_get()
. It will return token type defined in token.h
:
include/token.h
#define ENDMARKER 0
#define NAME 1
#define NUMBER 2
#define STRING 3
#define NEWLINE 4
#define INDENT 5
#define DEDENT 6
#define LPAR 7
#define RPAR 8
#define LSQB 9
...
#define RARROW 51
#define ELLIPSIS 52
/* Don't forget to update the table _PyParser_TokenNames in tokenizer.c! */
#define OP 53
#define AWAIT 54
#define ASYNC 55
#define ERRORTOKEN 56
#define N_TOKENS 57
In the first round of the for-loop, we can print out the type, because it is a
, it is a NAME, the value of type should be 1:
(gdb) p type
$1 = 1
include/token.h
#define NAME 1
At line 236, str[lbbben] = '\0'
will store the token string value. If we print it out, it will be a
.
(gdb) p str
$2 = 0x7ffff6eb5a18 "a"
This make sense, since our source code is a = 100
, the first token string should be a
, and its type is NAME
.
Then calling PyParser_AddToken()
, this will add the token into the parser tree at this moment. You can now try to follow the other token =
and 100
and trace how they are added into the parser tree.
The textual representation of the grammar is present in the file Grammar/Grammar
. It is written in yacc and I would suggest you to go through it. The numerical representation of the grammar is present in the file Python/graminit.c
which contains the array of DFA's representing the source program.
We will now try to understand how this works in Python. First, open test.py
and modify the contents to:
class foo:
pass
Then, start gdb and set the breakpoint at PyParser_AddToken()
:
$ gdb python
(gdb) b PyParser_AddToken
(gdb) r test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Breakpoint 1, PyParser_AddToken (ps=ps@entry=0x9cf4f0, type=type@entry=1, str=str@entry=0x7ffff6eb5a18 "class", lineno=1, col_offset=0, expected_ret=expected_ret@entry=0x7fffffffdc34) at Parser/parser.c:229
229 {
At this moment, look inside the for-loop. We want to know the DFA state now, so set a breakpoint at dfa *d = ps->p_stack.s_top->s_dfa
next line, and run the program by continue
:
(gdb) b 244
Breakpoint 2 at 0x5b5933: file Parser/parser.c, line 244.
(gdb) c
Breakpoint 2, PyParser_AddToken (ps=ps@entry=0x9cf4f0, type=type@entry=1, str=str@entry=0x7ffff6eb5a18 "class", lineno=1, col_offset=0, expected_ret=expected_ret@entry=0x7fffffffdc34) at Parser/parser.c:244
244 state *s = &d->d_state[ps->p_stack.s_top->s_state];
We can then print out the value in d
, for example, print out its name:
(gdb) p d->d_name
'file_input'
(gdb)
Or, print out as Python dir()
:
(gdb) p *d
$6 = {d_type = 269, d_name = 0x606340 "stmt", d_initial = 0, d_nstates = 2, d_state = 0x8a6a40 <states_13>, d_first = 0x60658c ""}
(gdb) set print pretty
(gdb) p *d
$8 = {
d_type = 269,
d_name = 0x606340 "stmt",
d_initial = 0,
d_nstates = 2,
d_state = 0x8a6a40 <states_13>,
d_first = 0x60658c ""
}
Compare the value of d_name
, you can find it in Grammar/Grammar
:
# Grammar for Python
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
file_input: (NEWLINE | stmt)* ENDMARKER
eval_input: testlist NEWLINE* ENDMARKER
Run it more time, you will see that d_name
will change as this sequence: file_input
, stmt
, compound_stmt
, classdef
, classdef
, classdef
, classdef
, suite
, suite
, suite
, stmt
, simple_stmt
, small_stmt
, pass_stmt
, simple_stmt
, suite
, file_input
, file_input
.
At this moment, we build our parser tree via PyParser_ParseFileObject()
. The next step is to generate the AST from the parser node we built.
Before getting into it, there are several macros we need to introduce. These macros are used to query data from node structs, which are defined in Include/node.h
:
:
for a * COLON tokenInclude/graminit.h
Python/ast.c
.We can now try to convert the Parse Tree to an AST using the function PyAST_FromNodeObject()
in Python/ast.c
.
for (i = 0; i < NCH(n) - 1; i++) {
ch = CHILD(n, i);
if (TYPE(ch) == NEWLINE)
continue;
REQ(ch, stmt);
num = num_stmts(ch);
if (num == 1) {
s = ast_for_stmt(&c, ch);
if (!s)
goto out;
asdl_seq_SET(stmts, k++, s);
}
else {
ch = CHILD(ch, 0);
REQ(ch, simple_stmt);
for (j = 0; j < num; j++) {
s = ast_for_stmt(&c, CHILD(ch, j * 2));
if (!s)
goto out;
asdl_seq_SET(stmts, k++, s);
}
}
}
res = Module(stmts, arena);
ast_for_stmt()
is a wrapper for ast_for_xx
where xx is the grammer rule that the function handles.
Set a breakpoint at ast_for_stmt()
to observe how the ASDL statement is generated.
Back into the function PyRun_FileExFlags()
in Python/pythonrun.c:906
, we finish our tour for PyParser_ASTFromFileObject()
. It will then pass the result mod
into the run_mod()
function above. It has two principal functions: The first to generate the code object (PyAST_CompileObject()
) and the second to call the interpreter loop (PyEval_EvalCode()
).
Let us step into PyAST_CompileObject()
first, this function is in Python/compile.c
. It has two important functions:
PySumtable_BuildObject()
compiler_mod()
PySumtable_BuildObject()
is used to generate the symbol table, which is defined in Python/symtable.c
at line 240.
We shall step into this function and insert a breakpoint at line 281, where there is a for-loop that iterates through the ASDL sequence (for (i = 0; i < asdl_seq_LEN(seq); i++)
) for the module and generates the symbol table entry in the function symtable_visit_stmt()
.
Before we step into this function we need to understand the structures that support the symbol table. They are defined in the file Include/symtble.h
.
struct _symtable_entry;
struct symtable {
PyObject *st_filename; /* name of file being compiled,
decoded from the filesystem encoding */
struct _symtable_entry *st_cur; /* current symbol table entry */
struct _symtable_entry *st_top; /* symbol table entry for module */
PyObject *st_blocks; /* dict: map AST node addresses
* to symbol table entries */
PyObject *st_stack; /* list: stack of namespace info */
PyObject *st_global; /* borrowed ref to st_top->ste_symbols */
int st_nblocks; /* number of blocks used. kept for
consistency with the corresponding
compiler structure */
PyObject *st_private; /* name of current class or NULL */
PyFutureFeatures *st_future; /* module's future features that affect
the symbol table */
int recursion_depth; /* current recursion depth */
int recursion_limit; /* recursion limit */
};
typedef struct _symtable_entry {
PyObject_HEAD
PyObject *ste_id; /* int: key in ste_table->st_blocks */
PyObject *ste_symbols; /* dict: variable names to flags */
PyObject *ste_name; /* string: name of current block */
PyObject *ste_varnames; /* list of function parameters */
PyObject *ste_children; /* list of child blocks */
PyObject *ste_directives;/* locations of global and nonlocal statements */
_Py_block_ty ste_type; /* module, class, or function */
int ste_nested; /* true if block is nested */
unsigned ste_free : 1; /* true if block has free variables */
unsigned ste_child_free : 1; /* true if a child block has free vars,
including free refs to globals */
unsigned ste_generator : 1; /* true if namespace is a generator */
unsigned ste_coroutine : 1; /* true if namespace is a coroutine */
unsigned ste_varargs : 1; /* true if block has varargs */
unsigned ste_varkeywords : 1; /* true if block has varkeywords */
unsigned ste_returns_value : 1; /* true if namespace uses return with
an argument */
unsigned ste_needs_class_closure : 1; /* for class scopes, true if a
closure over __class__
should be created */
int ste_lineno; /* first line of block */
int ste_col_offset; /* offset of first line of block */
int ste_opt_lineno; /* lineno of last exec or import * */
int ste_opt_col_offset; /* offset of last exec or import * */
int ste_tmpname; /* counter for listcomp temp vars */
struct symtable *ste_table;
} PySTEntryObject;
We can observe that the symbol table is a dictionary containing the symbol table entries. For more specific information, please refer to the comment below the source code.
Back to the function symtable_visit_stmt
, we can set the breakpoint on it:
(gdb) b symtable_visit_stmt
Here we can observe that the kind of expression is xx_kind
, for Name_kind
, it will simply call the function symtable_add_def()
to add a definition into the symbol table.
I would suggest you to debug this function on your own as it is quite straight forward.
Exercise:
Trace the symbol table entries for the following Python files:
class test:
pass
i = range(0, 10)
m = map(lambda x: x * 2, i)
In this exercise we have understood how symbol table entries are created.
Back to the function PyAST_CompileObject()
, the next step is compiler_mod()
, which then converts the AST into a CFG (control flow graph).
Set a breakpoint to it (b compiler_mod
), and we can step into it. The switch will then bring us into Module_kind
, into this there is a call to the function compiler_body()
. Step into it and you will see that there is a for-loop:
for (; i < asdl_seq_LEN(stmts); i++)
VISIT(c, stmt, (stmt)ty)asdl_seq_GET(stmts, i));
Here we observe that we iterate through the ASDL statements and call the macro VISIT
, then calls compiler_visit_expr(c, node)
.
The emission of bytecode is handled by the following macros:
ADDOP()
ADDOP_I()
ADDOP_O(struct compiler *c, int op, PyObject *type, PyObject *obj)
PyObject
in * PyObject sequence object, but with no handling of mangled names; this is used for when you need to do named lookups of objects such as globals, consts, or parameters where name mangling is not possible and the scope of the name is knownADDOP_NAME()
ADDOP_O
, but name mangling is also handled; used for attribute loading or importing based on nameADDOP_JABS()
ADDOP_JREL()
Several helper functions that will emit bytecode and are named compiler_xx()
, where xx
is what the function helps with (list, boolop, etc.)
Not yet done. Missing some parts.
To verify if we have generated the right opcode, on the file test.py
execute the following command:
$ python -m dis test.py
1 0 LOAD_CONST 0 (100)
2 STORE_NAME 0 (a)
4 LOAD_CONST 1 (None)
6 RETURN_VALUE
From this we can verify that our generated bytecode is correct.
I would suggest you to debug larger programs and see how the opcode is generated for the same.
Once the bytecode is generated the next step is to execute the program by the interpreter. Back to the file Python/pythonrun.c
, we then call the function PyEval_EvalCode()
, it is a wrapper function to PyEval_EvalCodeEx()
, and is another wrapper function for _PyEval_EvalCodeWithName()
Deprecate to 2.7, PyEval_EvalCodeEx won't setup frame before entering _PyEval_EvalCodeWithName()
, frame create is moved into _PyEval_EvalCodeWithName.
Let us examine the structure of the frame object, which is defined in the file Include/frameobject.h
:
typedef struct _frame {
PyObject_VAR_HEAD
struct _frame *f_back; /* previous frame, or NULL */
PyCodeObject *f_code; /* code segment */
PyObject *f_builtins; /* builtin symbol table (PyDictObject) */
PyObject *f_globals; /* global symbol table (PyDictObject) */
PyObject *f_locals; /* local symbol table (any mapping) */
PyObject **f_valuestack; /* points after the last local */
/* Next free slot in f_valuestack. Frame creation sets to f_valuestack.
Frame evaluation usually NULLs it, but a frame that yields sets it
to the current stack top. */
PyObject **f_stacktop;
PyObject *f_trace; /* Trace function */
/* In a generator, we need to be able to swap between the exception
state inside the generator and the exception state of the calling
frame (which shouldn't be impacted when the generator "yields"
from an except handler).
These three fields exist exactly for that, and are unused for
non-generator frames. See the save_exc_state and swap_exc_state
functions in ceval.c for details of their use. */
PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;
/* Borrowed reference to a generator, or NULL */
PyObject *f_gen;
int f_lasti; /* Last instruction if called */
/* Call PyFrame_GetLineNumber() instead of reading this field
directly. As of 2.3 f_lineno is only valid when tracing is
active (i.e. when f_trace is set). At other times we use
PyCode_Addr2Line to calculate the line from the current
bytecode index. */
int f_lineno; /* Current line number */
int f_iblock; /* index in f_blockstack */
char f_executing; /* whether the frame is still executing */
PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
PyObject *f_localsplus[1]; /* locals+stack, dynamically sized */
} PyFrameObject;
What is important is that this frame contains an instruction pointer which is very useful when we try to understand how generators work in the coming chapters.
In _PyEval_EvalCodeWithName()
, it will create a frame by _PyFrame_New_NoTrack()
, and at the bottom call the function PyEval_EvalFrameEx()
.
PyEval_EvalFrameEx()
then will call the eval_frame()
function on the PyThreadState which is nothing but the function _PyEval_EvalFrameDefault()
. This function can also be called the virtual machine of Python.
Trace into the function _PyEval_EvalFrameDefault()
, we can then observe an infinite for-loop at line 1054, which will then generate opcode. We can trace it, and you will see it switch to the corresponding opcode block.
For example, running code with a = 100
will first use LOAD_CONST
, then LOAD_NAME
, then so on, which we can observe with python -m dis test.py
.
Please, run this to make sure you are familiar with this process.
i = range(0, 10)
m = map(lambda x: x * 2, i)
def logger(func):
def logged(name)
print('logging starts')
func(name)
print('logging ends')
return logged
@logger
def print_name(name):
print(name)
print_name('Resolution Debugging')
This three chapter is the process of converting AST to CFG to Bytecode, you can get more information by referencing the devguide 25.7 AST to CFG to Bytecode
In this chapter we shall learn about the implementation of Python Objects. Let us begin with the PyObject which is the generic Python Object. It can be also called the Object class of Python. It is defined in the file Include/object.h
.
In this chapter, you need to use gdb to trace the code. Each PyObject can be traced with these steps:
For example, if you want to trace about PyBoolObject, you can do the following:
$ gdb python
(gdb) b bool_newbb
Breakpoint 1 at 0x44812f: file Objects/boolobject.c, line 44.
(gdb) r
[GCC 6.3.1 20170109] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = bool(1)
Breakpoint 1, bool_new (type=0x87a700 <PyBool_Type>, args=(1,), kwds=0x0) at Objects/boolobject.c:44
44 {
Have fun!
The generic Python object is defined as follows:
typedef struct _object {
_PyObject_HEAD_EXTRA
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
} PyObjecdt;
After the preprocessor expands _PyObject_HEAD_EXTRA, it will look like, it defines two pointers to support a doubly-linked list of all live heap objects:
typedef struct _object {
struct _object *_ob_next;
struct _object *_ob_prev;
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
} PyObjecdt;
Inside PyObject, it contains two important elements, the reference count and type object. We will look into these objects after.
The definition for the Python Objects of variable length is defined as:
typedef struct {
PyObject ob_base;
Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;
It is basically the same as PyObject, but with an additional field to denote the size of the variable length of the object.
Every pointer to a Python object can be cast to a PyObject*
. Similarly every pointer to a variable-size Python object can, in addition, be cast to PyVarObject*
.
The PyTypeObject is the representation of the type of the Python object. to get the type of any object in Python, you can type the following statements:
>>> t = type(1)
>>> dir(t)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
We can see the definition of these methods in the PyTypeObject. It is also defined in the same file, Include/object.t
.
#ifdef Py_LIMITED_API
typedef struct _typeobject PyTypeObject; /* opaque */
#else
typedef struct _typeobject {
PyObject_VAR_HEAD
const char *tp_name; /* For printing, in format "<module>.<name>" */
Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */
/* Methods to implement standard operations */
destructor tp_dealloc;
printfunc tp_print;
getattrfunc tp_getattr;
setattrfunc tp_setattr;
PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2)
or tp_reserved (Python 3) */
reprfunc tp_repr;
/* Method suites for standard classes */
PyNumberMethods *tp_as_number;
PySequenceMethods *tp_as_sequence;
PyMappingMethods *tp_as_mapping;
/* More standard operations (here for binary compatibility) */
hashfunc tp_hash;
ternaryfunc tp_call;
reprfunc tp_str;
getattrofunc tp_getattro;
setattrofunc tp_setattro;
/* Functions to access object as input/output buffer */
PyBufferProcs *tp_as_buffer;
/* Flags to define presence of optional/expanded features */
unsigned long tp_flags;
const char *tp_doc; /* Documentation string */
/* Assigned meaning in release 2.0 */
/* call function for all accessible objects */
traverseproc tp_traverse;
/* delete references to contained objects */
inquiry tp_clear;
/* Assigned meaning in release 2.1 */
/* rich comparisons */
richcmpfunc tp_richcompare;
/* weak reference enabler */
Py_ssize_t tp_weaklistoffset;
/* Iterators */
getiterfunc tp_iter;
iternextfunc tp_iternext;
/* Attribute descriptor and subclassing stuff */
struct PyMethodDef *tp_methods;
struct PyMemberDef *tp_members;
struct PyGetSetDef *tp_getset;
struct _typeobject *tp_base;
PyObject *tp_dict;
descrgetfunc tp_descr_get;
descrsetfunc tp_descr_set;
Py_ssize_t tp_dictoffset;
initproc tp_init;
allocfunc tp_alloc;
newfunc tp_new;
freefunc tp_free; /* Low-level free-memory routine */
inquiry tp_is_gc; /* For PyObject_IS_GC */
PyObject *tp_bases;
PyObject *tp_mro; /* method resolution order */
PyObject *tp_cache;
PyObject *tp_subclasses;
PyObject *tp_weaklist;
destructor tp_del;
/* Type attribute cache version tag. Added in version 2.6 */
unsigned int tp_version_tag;
destructor tp_finalize;
#ifdef COUNT_ALLOCS
/* these must be last and never explicitly initialized */
Py_ssize_t tp_allocs;
Py_ssize_t tp_frees;
Py_ssize_t tp_maxalloc;
struct _typeobject *tp_prev;
struct _typeobject *tp_next;
#endif
} PyTypeObject;
#endif
All the integer objects are now implemented in Objects/longobject.c
; it is defined as PyLong_Type
at line 5389, with PyTypeObject:
PyTypeObject PyLong_Type = {
PyVarObject_HEAD_INIT(&PyType_Type, 0)
"int", /* tp_name */
offsetof(PyLongObject, ob_digit), /* tp_basicsize */
sizeof(digit), /* tp_itemsize */
long_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_reserved */
long_to_decimal_string, /* tp_repr */
&long_as_number, /* tp_as_number */
0, /* tp_as_sequence */
0, /* tp_as_mapping */
(hashfunc)long_hash, /* tp_hash */
0, /* tp_call */
long_to_decimal_string, /* tp_str */
PyObject_GenericGetAttr, /* tp_getattro */
0, /* tp_setattro */
0, /* tp_as_buffer */
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE |
Py_TPFLAGS_LONG_SUBCLASS, /* tp_flags */
long_doc, /* tp_doc */
0, /* tp_traverse */
0, /* tp_clear */
long_richcompare, /* tp_richcompare */
0, /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
long_methods, /* tp_methods */
0, /* tp_members */
long_getset, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
0, /* tp_init */
0, /* tp_alloc */
long_new, /* tp_new */
PyObject_Del, /* tp_free */
};
Defined at Include/longobject.h
, as this:
typedef struct _longobject PyLongObject; /* Revealed in longintrepr.h */
The PyBoolObject is used to store boolean types in Python. The definition is in the file include/boolobject.h
.
In file Include/floatobject.h
typedef struct {
PyObject_HEAD
double ob_fval;
} PyFloatObject;
In file Include/listobject.h
.
typedef struct {
PyObject_VAR_HEAD
/* Vector of pointers to list elements. list[0] is ob_item[0], etc. */
PyObject **ob_item;
/* ob_item contains space for 'allocated' elements. The number
* currently in use is ob_size.
* Invariants:
* 0 <= ob_size <= allocated
* len(list) == ob_size
* ob_item == NULL implies ob_size == allocated == 0
* list.sort() temporarily sets allocated to -1 to detect mutations.
*
* Items must normally not be NULL, except during construction when
* the list is not yet visible outside the function that builds it.
*/
Py_ssize_t allocated;
} PyListObject;