# Introduction to TON TL-B (Part 2, Cell and Bag of Cells)

[TL-B (Type Language - Binary)](https://docs.ton.org/v3/documentation/data-formats/tlb/tl-b-language) is derived from Telegram's [TL (Type Language)](https://core.telegram.org/mtproto/TL). It is an [IDL](https://en.wikipedia.org/wiki/Interface_description_language) similar to Google's [Protobuf](https://protobuf.dev/). Understanding TL-B is crucial for learning and using the [TON](https://ton.org/) blockchain, as structures like **Block** and **Transaction** are described in the TL-B language.

I am writing a series of articles that introduce the TL-B language step by step. In the first article of this series, I introduced the basic concepts and syntax of TL-B, most of the built-in types, and some commonly used advanced types. This is the second article in the series, in which I give a detailed introduction to **Cell**, **BoC (Bag of Cells)**, and related concepts. If you are not very familiar with the TL-B language, make sure to read [the previous article](https://hackmd.io/@sp1derca7/ton-tlb-1) before reading this one.

![ton-tlb](https://hackmd.io/_uploads/H1dgASwbyg.png)

## Cell Tree

We already know that TL-B serialization and deserialization operate on bit streams. However, in the TON blockchain, what we face directly is not an arbitrary bit stream, but the so-called [Cells](https://docs.ton.org/develop/data-formats/cell-boc#cell). In this article, we can simply understand a cell as a [tree](https://en.wikipedia.org/wiki/Tree_(abstract_data_type)). Each node of this tree can store up to 1023 bits of data, and non-leaf nodes can have up to four subtrees, as shown in the figure below. If the size of a struct does not exceed 1023 bits, it can be placed entirely within a single cell. Otherwise, we must serialize it into a tree of cells.

![cell-tree](https://hackmd.io/_uploads/r1uMCBwZ1l.png)

In TON's terminology, a cell is linked to its subtrees through **Ref**s. Therefore, we can use pseudo-code to give a simplified version of the Cell definition, as shown below. There are also some advanced concepts of Cell, such as different cell types and levels, but we will not consider them in this article. We will introduce these advanced concepts in detail in later articles.

```ts
class Cell {
  data: BitString; // up to 1023 bits
  refs: Cell[];    // up to 4 refs
}
```

Since TL-B describes both a **Type** and how to serialize and deserialize its instances, there are generally three ways to handle the serialization and deserialization of that Type: **manual**, **semi-automatic** (requiring support from the programming language), and **fully automatic** (by code generation).

The manual way (writing all of the code by hand) is the most cumbersome, but it helps us understand the TL-B language and the Cell structure, so this article mainly focuses on it. Additionally, in [FunC](https://docs.ton.org/v3/documentation/smart-contracts/func/overview), the current primary development language for TON smart contracts, we still use the manual method to build and parse Cells.

The semi-automatic way requires support from the programming language itself and from related serialization libraries. Taking [Golang](https://go.dev/) and [tonutils-go](https://github.com/xssnick/tonutils-go) as an example, we define Go structs that correspond to the TL-B schema and use [Tags](https://go.dev/ref/spec#Tag) to describe their TL-B types and constraints. Then tonutils-go handles serialization and deserialization for us.

The code generation way is the most convenient and universal.
For example, we can use a code generator to generate [C++](https://github.com/ton-blockchain/ton/blob/master/crypto/tl/tlbc-gen-cpp.cpp), [Python](https://github.com/disintar/ton/blob/master/crypto/tl/tlbc-gen-py.cpp), or [TypeScript](https://github.com/ton-community/tlb-codegen) types (structs or classes) and serialization/deserialization code. In [the first article](https://hackmd.io/@sp1derca7/ton-tlb-1) of this series, we used the code generation method to help us understand the basic syntax and built-in types of TL-B. Another example of code generation is [Tact](https://tact-lang.org/), a newer TON smart contract language. When we write TON smart contracts in Tact, we only need to define structs (that are compatible with the TL-B schema), and the Tact compiler generates the serialization and deserialization code for us.

However, whether the approach is manual, semi-automatic, or fully automatic, we always need the support of some underlying library. In addition to the Cell type, the two most important types in this underlying library are **Builder** and **Slice**. As the name suggests, Builder is used to build cells. Slice, on the other hand, is used to parse cells. Moreover, we need to flatten the entire cell (and its refs) into a bit string. TON has defined a TL-B schema specifically for this purpose: [Bag of Cells](https://docs.ton.org/develop/data-formats/cell-boc#bag-of-cells) (abbreviated as **BoC**). In the following sections, we will take the TypeScript language and the [ton-core library](https://github.com/ton-org/ton-core) as examples to introduce Builder, Slice, and BoC in detail. The following figure shows how they convert to and from Cell:

![cell-api](https://hackmd.io/_uploads/HkC4RrPWye.png)

### Cell Builder

The name Builder is straightforward; its role is to build Cells. We create a new `Builder` by calling `beginCell()`, then call various `storeXXX()` methods to write data, and finally call `endCell()` to obtain a `Cell`.
`Builder` has corresponding `store` methods for the three kinds of built-in TL-B types: Uint, Int, and Bits. Taking Uint types as an example, here is a test:

```ts
import assert from 'node:assert';
import { beginCell } from '@ton/core'

function testCell1() {
  const cell1 = beginCell()
    .storeUint(0x1234, 16) // store a uint16
    .storeUint(0x5678, 32) // store a uint32
    .endCell();
  assert.equal(cellToJSON(cell1), '{data: "0x123400005678"}');
}
```

The `toString()` method built into the `Cell` type does not return an intuitive string, so we wrote a `cellToJSON()` function that represents a cell as a JSON-like string. The code for the `cellToJSON()` function is provided at the end of the article.

We can insert a subtree into a cell using the `storeRef()` method. Here is another test:

```ts
function testCell2() {
  const cell1 = beginCell().storeUint(0x1234, 16).endCell();
  const cell2 = beginCell()
    .storeUint(0x5678, 24) // store a uint24
    .storeRef(cell1)       // store a cell
    .endCell();
  assert.equal(cellToJSON(cell2), '{data: "0x005678", refs: [{data: "0x1234"}]}');
}
```

### Cell Parser

The parser for Cell in TON is called Slice. For every `store` method in `Builder`, there is a corresponding `load` method in `Slice`. The following test demonstrates the basic usage of `Slice`:

```ts
function testSlice() {
  const cell1 = beginCell()
    .storeUint(0x1234, 16)
    .storeUint(0x5678, 24)
    .endCell();
  const cell2 = beginCell()
    .storeUint(0x90AB, 32)
    .storeRef(cell1)
    .endCell();
  assert.equal(cellToJSON(cell2), '{data: "0x000090AB", refs: [{data: "0x1234005678"}]}');

  const slice2 = cell2.asSlice();
  assert.equal(slice2.loadUint(32), 0x90AB); // read a uint32
  const slice1 = slice2.loadRef().asSlice(); // read a cell
  assert.equal(slice1.loadUint(16), 0x1234); // read a uint16
  assert.equal(slice1.loadUint(24), 0x5678); // read a uint24
}
```

### Cell in TVM

In the previous two sections, we tested Cell, Builder, and Slice through the [TypeScript library](https://github.com/ton-org/ton-core).
In fact, [Cell, Builder, and Slice](https://docs.ton.org/v3/concepts/dive-into-ton/ton-blockchain/cells-as-data-storage#cell-flavors) are also native types of [TVM](https://docs.ton.org/v3/documentation/tvm/tvm-overview#tvm-is-a-stack-machine), and TVM has special [instructions](https://docs.ton.org/v3/documentation/tvm/instructions) to manipulate them. Because of this, TON smart contract programming languages such as [Fift](https://docs.ton.org/v3/documentation/smart-contracts/fift/overview), [FunC](https://docs.ton.org/v3/documentation/smart-contracts/func/docs/stdlib#cell-primitives), and [Tact](https://docs.tact-lang.org/ref/core-cells/) also provide built-in support for Cell, Builder, and Slice. The following table lists the main Builder and Slice APIs and how they correspond across FunC, Tact, and the TypeScript library:

|         | FunC Standard Library | Tact Core Library | TypeScript Library          |
| ------- | --------------------- | ----------------- | --------------------------- |
| Builder | `begin_cell()`        | `beginCell()`     | `beginCell()`               |
|         | `store_xxx()`         | `storeXXX()`      | `storeXXX()`                |
|         | `end_cell()`          | `endCell()`       | `endCell()`                 |
| Slice   | `begin_parse()`       | `beginParse()`    | `beginParse()` / `asSlice()` |
|         | `load_xxx()`          | `loadXXX()`       | `loadXXX()`                 |
|         | `end_parse()`         | `endParse()`      | `endParse()`                |

### Cell Ref in TL-B

In the previous article, we introduced the basic syntax of TL-B in detail. However, since we had not yet introduced the concept of Cell at that time, all of our tests were based on a single Cell. In fact, TL-B can represent [a tree of Cells](https://docs.ton.org/develop/data-formats/tl-b-language#extend-cell-with-references) using the `^[...]` notation, which means putting certain fields into another Cell for storage. Here is an example:

```tlb
_ a:(## 32) ^[ b:(## 32) c:(## 32) d:(## 32)] = A;
```

In this example, we store field `a` in the root cell, and store fields `b`, `c`, and `d` in another cell.
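
To make the split concrete, here is a toy model of what a hand-written serializer for `A` could produce. It is only a sketch: `ToyCell`, `uintHex`, and `storeA` are hypothetical names invented for this illustration, not part of `@ton/core` or of the generated code, and the model only handles field widths that are a multiple of 4 bits.

```typescript
// Toy model of a cell: hex data plus child cells.
// This is NOT the @ton/core Cell class, just an illustration.
interface ToyCell {
    data: string;    // big-endian hex of the stored bits
    refs: ToyCell[]; // up to 4 child cells
}

// Encode an unsigned integer as fixed-width big-endian hex.
// Assumption: `bits` is a multiple of 4.
function uintHex(value: number, bits: number): string {
    let hex = value.toString(16).toUpperCase();
    while (hex.length < bits / 4) {
        hex = '0' + hex;
    }
    return hex;
}

// Hypothetical hand-written serializer for:
//   _ a:(## 32) ^[ b:(## 32) c:(## 32) d:(## 32) ] = A;
function storeA(v: { a: number; b: number; c: number; d: number }): ToyCell {
    // Fields inside ^[ ... ] go into a separate child cell.
    const child: ToyCell = {
        data: uintHex(v.b, 32) + uintHex(v.c, 32) + uintHex(v.d, 32),
        refs: [],
    };
    // Field a stays in the root cell, which links to the child through one ref.
    return { data: uintHex(v.a, 32), refs: [child] };
}

const cellA = storeA({ a: 0x1234, b: 0x5678, c: 0x90ab, d: 0xcdef });
console.log(JSON.stringify(cellA));
```

The root cell holds 32 bits and one ref, while the child cell holds the remaining 96 bits and no refs.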
We can write a test using the generated code to verify this:

```ts
function testCellRef1() {
  const f = cellRefTlb.storeA({kind: 'A', a: 0x1234, b: 0x5678, c: 0x90AB, d: 0xCDEF});
  const builder = beginCell();
  f(builder);
  const cell = builder.endCell();
  assert.equal(cellToJSON(cell), '{data: "0x00001234", refs: [{data: "0x00005678000090AB0000CDEF"}]}');
}
```

The `^[...]` notation can be nested, allowing for the definition of arbitrarily complex Cell trees, as shown in the following example:

```tlb
_ a:(## 32) ^[ b:(## 32) ^[ c:(## 32) ^[ d:(## 32) ] ] ] = B;
```

Let's write a test using the generated code. As you can see, fields `a`, `b`, `c`, and `d` are each stored in their own cell:

```ts
function testCellRef2() {
  const f = cellRefTlb.storeB({kind: 'B', a: 0x1234, b: 0x5678, c: 0x90AB, d: 0xCDEF});
  const builder = beginCell();
  f(builder);
  const cell = builder.endCell();
  assert.equal(cellToJSON(cell), '{data: "0x00001234", refs: [{data: "0x00005678", refs: [{data: "0x000090AB", refs: [{data: "0x0000CDEF"}]}]}]}');
}
```

## Bag of Cells

We already know that a Cell actually represents a **Tree of Cells**. Starting from the **Root Node** of this tree, we can access all the nodes of the tree. A Forest of Cells is formed by N mutually independent cell trees, as shown in the figure below:

![cell-forest](https://hackmd.io/_uploads/S1wLRSwZyg.png)

In order to store or transmit a Forest of Cells, we need to flatten all the information contained in the cells into a bit string. TON defines a format for serializing a Forest of Cells into a byte array, called [Bag of Cells](https://docs.ton.org/v3/documentation/data-formats/tlb/cell-boc#bag-of-cells), abbreviated as **BoC**.

![ton-tlb-boc](https://hackmd.io/_uploads/r1TICC3Wke.png)

### TL-B Schema

TL-B is specifically used to define types (structs and serialization/deserialization formats), so it's not surprising that BoC has its own TL-B schema.
Below is the definition of the serialization format for [BoC](https://github.com/ton-blockchain/ton/blob/v2024.10/crypto/tl/boc.tlb#L25) (with some fields renamed for clarity):

```
serialized_boc#b5ee9c72 has_idx:(## 1) has_crc32c:(## 1)
  has_cache_bits:(## 1) flags:(## 2) { flags = 0 }
  size_bytes:(## 3) { size_bytes <= 4 }
  offset_bytes:(## 8) { offset_bytes <= 8 }
  cells:(##(size_bytes * 8))
  roots:(##(size_bytes * 8)) { roots >= 1 }
  absent:(##(size_bytes * 8)) { roots + absent <= cells }
  total_cells_size:(##(offset_bytes * 8))
  root_list:(roots * ##(size_bytes * 8))
  index:has_idx?(cells * ##(offset_bytes * 8))
  cell_data:(total_cells_size * [ uint8 ])
  crc32c:has_crc32c?uint32
  = BagOfCells;
```

This structure looks a bit complicated, but in short, it consists of three parts: header, body, and checksum. The header records the schema ID, some flags, and various metadata needed to parse the cells. The body stores the actual cell data, and the checksum at the end is used for verification. Let's look at a simple example:

```ts
function testBoC2() {
  const cell1 = beginCell()
    .storeUint(0x1234, 16)
    .asCell();
  const cell2 = beginCell()
    .storeUint(0x567890, 32)
    .storeRef(cell1)
    .asCell();
  console.log(cellToJSON(cell2));

  const boc = cell2.toBoc({idx: true, crc32: true});
  console.log(boc.toString('hex'));
}
```

The above example serializes a BoC with only one Cell Tree (two Cells in total) and prints two lines of data:

```
{data: "0x00567890", refs: [{data: "0x1234"}]}
b5ee9c72c1010201000b00000701080056789001000412344b310ba3
```

The first line is the JSON representation of the Cell Tree, which we are already very familiar with. The second line is the complete BoC data in hex form, which is not very intuitive.
We split it into groups of 4 bytes (8 hex digits) and show the corresponding binary data next to each group:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
000b0000 00000000000010110000000000000000
07010800 00000111000000010000100000000000
56789001 01010110011110001001000000000001
00041234 00000000000001000001001000110100
4b310ba3 01001011001100010000101110100011
```

In the following sections, we will introduce the BoC structure by analyzing the above data in detail.

### Schema ID

A schema defined in TL-B can have a **Tag**, also known as a **Schema ID** or **Magic Number** (our [previous article](https://hackmd.io/@sp1derca7/ton-tlb-1) covers TL-B tags in detail). The Schema ID of BoC is a fixed 4-byte value, `0xb5ee9c72`, which is easy to find in the serialized data:

```
b5ee9c72 10110101111011101001110001110010
^^^^^^^^ Schema ID
...
```

### flags

Following the Schema ID are three 1-bit flags and one 2-bit flag:

* `has_idx:(## 1)`: If this flag is `1`, the `index` field will be serialized, otherwise not. We will introduce the `index` field later.
* `has_crc32c:(## 1)`: If this flag is `1`, a checksum will be appended to the end of the entire data, otherwise not. We will introduce the checksum later.
* `has_cache_bits:(## 1)`: In the TypeScript [library](https://github.com/ton-core/ton-core/blob/0.53.0/src/boc/cell/serialization.ts#L244), this flag seems to always be `0`, so we won't expand on it here.
* `flags:(## 2) { flags = 0 }`: This field appears to be reserved; currently, it can only be `0`.

Since we turned on the `has_idx` and `has_crc32c` flags when calling `toBoc()` in the test, these two bits are both `1` after serialization, and `has_cache_bits` is `0`:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
         ^^^ has_idx, has_crc32c, has_cache_bits: 1, 1, 0
...
```

Then comes the 2-bit `flags` field, which can only be `0`:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
            ^^ flags (uint2): 0
...
```

### size_bytes & offset_bytes

We need to store the number of cells, as well as the offset of each cell in the cell data. To keep the data as compact as possible, the header records how many bytes are used to store a cell count and a cell offset:

* `size_bytes:(## 3) { size_bytes <= 4 }`: Records the number of bytes needed for the cell count.
* `offset_bytes:(## 8) { offset_bytes <= 8 }`: Records the number of bytes needed for a cell offset in the data.

In our example, there are only two cells, so one byte is enough to record this number:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
              ^^^ size_bytes (uint3): 1
...
```

Our serialized data is only 28 bytes in total, so one byte is enough to record any offset:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
                 ^^^^^^^^ offset_bytes (uint8): 1
...
```

### cells, roots & absent

Next are three Uints. The number of bytes each one occupies is determined by the value of the preceding `size_bytes` field:

* `cells:(##(size_bytes * 8))`: Records how many cells there are in the cell data.
* `roots:(##(size_bytes * 8)) { roots >= 1 }`: Records how many root cells there are in the cell data. A BoC is expected to have at least one root cell.
* `absent:(##(size_bytes * 8)) { roots + absent <= cells }`: [absent](https://docs.ton.org/develop/data-formats/cell-boc#packing-a-bag-of-cells), always `0` (in current implementations).

In our example, `size_bytes` is `1`, so these three Uints each occupy only one byte. There are `2` cells:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
                         ^^^^^^^^ cells (uint1x8): 2
...
```

There is only `1` root cell:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
                                 ^^^^^^^^ roots (uint1x8): 1
...
```

And the `absent` field is `0`:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
000b0000 00000000000010110000000000000000
         ^^^^^^^^ absent (uint1x8): 0
...
```

### total_cells_size

Next is a Uint:

* `total_cells_size:(##(offset_bytes * 8))`: Records the total number of bytes of all cells. The number of bytes this field occupies is determined by the preceding `offset_bytes` field.

In our example, the value of `offset_bytes` is `1`, and the two cells occupy a total of `11` bytes:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
000b0000 00000000000010110000000000000000
                 ^^^^^^^^ total_cells_size (uint1x8): 11
...
```

### root_list & index_list

Next are two Uint lists:

* `root_list:(roots * ##(size_bytes * 8))`: List of root cell indices. The length of the list is determined by the `roots` field, and the number of bytes per index is determined by the `size_bytes` field.
* `index:has_idx?(cells * ##(offset_bytes * 8))`: List of offsets for all cells. The length of the list is determined by the `cells` field, and the number of bytes per offset is determined by the `offset_bytes` field. Note that this list is only serialized when the `has_idx` flag is set to `1`.

In our example, there is only one root cell, and its index is `0`:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
000b0000 00000000000010110000000000000000
                         ^^^^^^^^ root_list (uint1x8): 0
...
```

There are two cells, and their offsets are `0` and `7` respectively:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
000b0000 00000000000010110000000000000000
                                 ^^^^^^^^ offset0 (uint1x8): 0
07010800 00000111000000010000100000000000
         ^^^^^^^^ offset1 (uint1x8): 7
...
```

### cell_data

Next is the actual data of all cells:

* `cell_data:(total_cells_size * [ uint8 ])`: This is a byte array whose length is determined by the preceding `total_cells_size` field.

In our example, `total_cells_size` is `11`, so there are `11` bytes in total:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
000b0000 00000000000010110000000000000000
07010800 00000111000000010000100000000000
                 ^^^^^^^^^^^^^^^^^^^^^^^^ cell_data: 11 bytes
56789001 01010110011110001001000000000001
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
00041234 00000000000001000001001000110100
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4b310ba3 01001011001100010000101110100011
```

The hex of these 11 bytes is `0x0108005678900100041234`. Taking the first cell as an example, the first byte is the [refs descriptor](https://docs.ton.org/develop/data-formats/cell-boc#cell-serialization), which for the purposes of this article we can simply treat as the number of refs of the cell. In our example, the root cell has one ref, so this value is 1:

```
01080056789001
^^ refs descriptor: 1 ref
```

The second byte is the [bits descriptor](https://docs.ton.org/develop/data-formats/cell-boc#cell-serialization), which records the number of [nibbles](https://en.wikipedia.org/wiki/Nibble) occupied by the cell data. In our example, the data of the root cell occupies 4 bytes, which is 8 nibbles:

```
01080056789001
  ^^ bits descriptor: 8 nibbles
```

After that is the actual cell data.
In our example, the root cell occupies 4 bytes:

```
01080056789001
    ^^^^^^^^ cell data: 0x00567890
```

The final byte, `01`, is the index of the referenced cell in the cell list. Each ref is stored as one such index, `size_bytes` bytes wide.

Having analyzed the first cell, the second cell is also easy to understand; it has no refs, and its data occupies 2 bytes:

```
00041234
||||^^^^ cell data: 0x1234
||^^ bits descriptor: 4 nibbles
^^ refs descriptor: 0 refs
```

### crc32c

The last 4 bytes of the BoC are the [CRC32c](https://docs.ton.org/v3/documentation/data-formats/tlb/crc32) checksum:

* `crc32c:has_crc32c?uint32`: This field is only stored when the `has_crc32c` flag is set to `1`.

In our example, the `has_crc32c` flag is turned on, so there is a checksum:

```
b5ee9c72 10110101111011101001110001110010
c1010201 11000001000000010000001000000001
000b0000 00000000000010110000000000000000
07010800 00000111000000010000100000000000
56789001 01010110011110001001000000000001
00041234 00000000000001000001001000110100
4b310ba3 01001011001100010000101110100011
^^^^^^^^ crc32c
```

The checksum is obtained by calculating [CRC32c](https://github.com/ton-org/ton-core/blob/main/src/boc/cell/serialization.ts#L290) over all of the data except the checksum itself. We can verify this through the following test:

```ts
import { crc32c } from '@ton/core'

function testCRC32() {
  const data = Buffer.from('b5ee9c72c1010201000b0000070108005678900100041234', 'hex');
  assert.equal('4b310ba3', crc32c(data).toString('hex'));
}
```

## Summary

In this article, we first introduced TON's Cell, along with the basic concepts and APIs of the closely related Builder and Slice, and then introduced the ref notation (`^[...]`) of the TL-B language. Next, we introduced the concept of BoC (Bag of Cells) and its TL-B schema, and analyzed the BoC format byte by byte using an actual serialization example. Some details about Cell are not covered in this article; we will cover them in later articles.

Buy me a cup of coffee if this article has been helpful to you:

* EVM: `0x8f7BEE940b9F27E8d12F6a4046b9EC57c940c0FA`
* TON: `UQBk1flhLnRsAebsYFnvt8HUwI4s_dbG7w3AzpyH5SbIHq_S`

## Appendix

Here is the complete code for the `cellToJSON()` function:

```ts
import { Cell } from '@ton/core'

export function cellToJSON(cell: Cell) {
  let str = '{data: "0x' + cell.bits.toString() + '"';
  if (cell.refs.length > 0) {
    str += ', refs: [';
    for (let i = 0; i < cell.refs.length; i++) {
      if (i > 0) {
        str += ', ';
      }
      str += cellToJSON(cell.refs[i]);
    }
    str += ']';
  }
  str += '}';
  return str;
}
```
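
As a complement to the field-by-field walkthrough in the Bag of Cells section, the following sketch reads the example BoC back in the same order as the schema. It uses no library, and every name in it (`hexToBytes`, `parseBoc`, `ParsedBoc`, `ParsedCell`) is made up for this illustration. It assumes ordinary level-0 cells whose data is a whole number of bytes, and it does not verify the checksum, so treat it as a reading aid rather than a real BoC parser.

```typescript
// Sketch of a BoC reader for ordinary cells only (no exotic cells, level 0).
// All names are illustrative and not taken from @ton/core.
function hexToBytes(hex: string): number[] {
    const out: number[] = [];
    for (let i = 0; i < hex.length; i += 2) {
        out.push(parseInt(hex.slice(i, i + 2), 16));
    }
    return out;
}

interface ParsedCell { refsCount: number; dataHex: string; refs: number[]; }

interface ParsedBoc {
    hasIdx: boolean; hasCrc32c: boolean; hasCacheBits: boolean;
    sizeBytes: number; offsetBytes: number;
    cells: number; roots: number; absent: number;
    totalCellsSize: number; rootList: number[]; index: number[];
    parsedCells: ParsedCell[];
}

function parseBoc(bytes: number[]): ParsedBoc {
    let pos = 0;
    // Read an n-byte big-endian unsigned integer and advance the cursor.
    const readUint = (n: number): number => {
        let v = 0;
        for (let i = 0; i < n; i++) v = v * 256 + bytes[pos++];
        return v;
    };

    if (readUint(4) !== 0xb5ee9c72) throw new Error('unexpected schema ID');

    // One byte holds has_idx, has_crc32c, has_cache_bits, flags (2 bits), size_bytes (3 bits).
    const b = readUint(1);
    const hasIdx = (b & 0x80) !== 0;
    const hasCrc32c = (b & 0x40) !== 0;
    const hasCacheBits = (b & 0x20) !== 0;
    if (((b >> 3) & 0x03) !== 0) throw new Error('flags must be 0');
    const sizeBytes = b & 0x07;
    const offsetBytes = readUint(1);

    const cells = readUint(sizeBytes);
    const roots = readUint(sizeBytes);
    const absent = readUint(sizeBytes);
    const totalCellsSize = readUint(offsetBytes);

    const rootList: number[] = [];
    for (let i = 0; i < roots; i++) rootList.push(readUint(sizeBytes));
    const index: number[] = [];
    if (hasIdx) {
        for (let i = 0; i < cells; i++) index.push(readUint(offsetBytes));
    }

    // cell_data: each cell is refs descriptor, bits descriptor, data, ref indices.
    const parsedCells: ParsedCell[] = [];
    for (let c = 0; c < cells; c++) {
        const refsCount = readUint(1) & 0x07;       // low 3 bits hold the ref count
        const dataLen = Math.ceil(readUint(1) / 2); // descriptor counts nibbles
        let dataHex = '';
        for (let i = 0; i < dataLen; i++) {
            const h = bytes[pos++].toString(16);
            dataHex += h.length < 2 ? '0' + h : h;
        }
        const refs: number[] = [];
        for (let r = 0; r < refsCount; r++) refs.push(readUint(sizeBytes));
        parsedCells.push({ refsCount, dataHex, refs });
    }

    return { hasIdx, hasCrc32c, hasCacheBits, sizeBytes, offsetBytes,
             cells, roots, absent, totalCellsSize, rootList, index, parsedCells };
}

const boc = parseBoc(hexToBytes('b5ee9c72c1010201000b00000701080056789001000412344b310ba3'));
console.log(boc.index, boc.parsedCells);
```

Reading the fields in schema order is what makes the variable-width integers work: `size_bytes` and `offset_bytes` must be known before `cells`, `roots`, `absent`, `total_cells_size`, and the two lists can be decoded.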