PWN Protobuf Reverse Engineering

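# PWN Protobuf Reverse Engineering

Preface: sigh, I skipped class this afternoon to work on this, and of course the very last lecture was the one with a roll call, damn it, so I had to run to the classroom before getting anything done. So many courses at my school are useless, and there are endless activities on top; honestly the school is garbage. Oh well, that's what I get for doing so badly on the entrance exam.

Both CISCN 2023 and CISCN 2024 had pwn challenges built on this protocol, so I'm reproducing them here, partly to prepare for the provincial round. Protobuf itself is simple: once you know the basics and how to reverse the message layout, there isn't much to it. Nothing I studied today was very hard.

## Introduction

Protocol Buffers (Protobuf) is a language-neutral, platform-neutral, extensible way of serializing structured data, developed by Google. It is used for communication protocols, data storage, and structured data exchange between the components of a system. Protobuf stores data in a compact binary format, which makes it more efficient than text formats such as XML or JSON.

### Core features of Protobuf:

1. **Efficient**: the compact binary format gives small messages and fast serialization/deserialization.
2. **Cross-language**: Protobuf supports many languages (C++, Java, Python, Go, C#, JavaScript, and others), so programs in different languages can interoperate.
3. **Platform-independent**: messages can be exchanged between different operating systems and platforms.
4. **Extensible**: new fields can be added to a message without breaking the existing data layout, so old and new versions stay compatible.

### How it works:

A `.proto` file describes the message format: the field names, data types, and field numbers. The Protobuf compiler `protoc` then compiles the `.proto` file into code for a specific language, and that generated code serializes and deserializes the data.

```c
syntax = "proto3";

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}
```

In this example `Person` is a message type with three fields: `name` (string), `id` (32-bit integer), and `email` (string).

#### Compile and use:

1. Use `protoc` to compile the `.proto` file into language-specific code (Python, Java, C++, and so on).
2.
   Use the generated classes in your code to serialize data to the binary format, or to deserialize binary data back into objects.

Serialization and deserialization:

```python
# Example Python code
import person_pb2

# Create a Person object
person = person_pb2.Person()
person.name = "Alice"
person.id = 123
person.email = "alice@example.com"

# Serialize to binary data
data = person.SerializeToString()

# Deserialize
new_person = person_pb2.Person()
new_person.ParseFromString(data)
print(new_person.name)  # prints Alice
```

### Why choose Protobuf?

- **Efficiency**: Protobuf messages are typically smaller and faster to parse than text formats such as JSON and XML, which matters most when a lot of data is transferred.
- **Cross-language and cross-platform**: mobile clients, web front ends, and back-end services can all share one data format.
- **Version compatibility**: the data structure can be extended without breaking systems that already exist, which suits systems that need long-term maintenance and versioning.

In short, Protobuf is an efficient, flexible, cross-language serialization tool that is widely used in microservice architectures, network protocols, and large-scale data processing.

That covers the basics. Next, installation:

```
wget https://github.com/protocolbuffers/protobuf/releases/download/v3.6.1/protobuf-cpp-3.6.1.tar.gz
tar -xzvf protobuf-cpp-3.6.1.tar.gz
cd protobuf-3.6.1
```

Configure, build, and install:

```
./configure
make
sudo make install
```

Create the symlinks in /usr/lib:

```sh
cd /usr/lib
sudo ln -s /usr/local/lib/libprotobuf.so.17 libprotobuf.so.17
sudo ln -s /usr/local/lib/libprotoc.so.17 libprotoc.so.17
```

Check the installation with `protoc --version`.

In competitions it is mostly protobuf-c that shows up. For CTF pwn, download the protobuf-c project from https://github.com/protobuf-c/protobuf-c/releases, then enter the protobuf-c directory and configure, build, and install it:

```
tar -xzvf protobuf-c.tar.gz
cd protobuf-c
./configure && make
sudo make install
```

## Basic syntax

From the official documentation:

```c
// demo.proto
syntax = "proto3";

package tutorial;

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;

  enum PhoneType {
    PHONE_TYPE_UNSPECIFIED = 0;
    PHONE_TYPE_MOBILE = 1;
    PHONE_TYPE_HOME = 2;
    PHONE_TYPE_WORK = 3;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}
```

This is a sample Protobuf definition file (`demo.proto`). It describes an address book (`AddressBook`), the people in it (`Person`), and their contact details (`PhoneNumber`). Let's go through it piece by piece.

### 1. `syntax = "proto3";`

Selects the `proto3` syntax. `proto3` is the newer of the two syntax versions used in this post (the other is `proto2`). It is simpler: fields carry no `required` label, and custom default values are gone, so an unset field just reads as its type's zero value. `repeated` fields exist in both versions.

### 2.
`package tutorial;`

Defines the namespace `tutorial`, which avoids name clashes with types defined in other `.proto` files. It usually becomes the package name (or name prefix) of the generated code.

### 3. `message Person`

Defines the message type `Person`, which holds one person's information. A message type is similar to a class in object-oriented programming: it contains a set of fields.

#### Fields of `Person`:

- `string name = 1;` A string field holding the person's name. The field number (`= 1`) is the field's unique identifier in the serialized data; numbers are usually assigned in increasing order.
- `int32 id = 2;` An integer field holding the person's unique ID.
- `string email = 3;` A string field holding the person's email address.

#### The `PhoneType` enum:

- ```
  enum PhoneType { ... }
  ```

  This enum lists the kinds of phone number. A Protobuf enum works like an ordinary enum: a set of named constants with fixed values. It has four values:

  - `PHONE_TYPE_UNSPECIFIED = 0;`: phone type not specified.
  - `PHONE_TYPE_MOBILE = 1;`: mobile phone.
  - `PHONE_TYPE_HOME = 2;`: home phone.
  - `PHONE_TYPE_WORK = 3;`: work phone.

#### The `PhoneNumber` message:

- ```
  message PhoneNumber { ... }
  ```

  A nested message type describing one phone number. It has two fields:

  - `string number = 1;`: the phone number itself.
  - `PhoneType type = 2;`: the kind of number, using the `PhoneType` enum defined above (home, work, and so on).

#### `repeated PhoneNumber phones = 4;`

- `repeated` is a Protobuf keyword meaning the field can hold multiple values. This field stores all of a person's phone numbers, each of type `PhoneNumber`.

### 4. `message AddressBook`

Defines a second message type, `AddressBook`. It contains a single `repeated` field that stores multiple `Person` objects, that is, everyone in the address book.

- `repeated Person people = 1;` `AddressBook` can store any number of `Person` objects; `people` is the repeated field that holds them.

### Summary:

This definition file describes an address book with:

- Each person's basic information (name, ID, email).
- Each person's phone numbers, with the type of each number (mobile, home, work).
- An address book containing multiple people.

### Usage example:

Suppose you compile `demo.proto` into code for your language (Python, Java, C++, and so on). With Python, the generated `Person` and `AddressBook` classes can be used like this:

```python
import demo_pb2  # assumes the generated file is named demo_pb2.py

# Create a Person object
person = demo_pb2.Person()
person.name = "Alice"
person.id = 123
person.email = "alice@example.com"

phone = person.phones.add()  # add a phone number
phone.number = "123-456-7890"
phone.type = demo_pb2.Person.PHONE_TYPE_MOBILE

# Create an AddressBook and add the Person
address_book = demo_pb2.AddressBook()
# add() creates a new element; CopyFrom works on every protobuf Python version
address_book.people.add().CopyFrom(person)

# Serialize to bytes
data = address_book.SerializeToString()

# Deserialize
new_address_book = demo_pb2.AddressBook()
new_address_book.ParseFromString(data)
print(new_address_book)
```

## Compilation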
Running `protoc --c_out=. demo.proto` (the `--c_out` option comes from the protobuf-c plugin) generates these files:

- **demo.pb-c.h**: the declarations.
- **demo.pb-c.c**: the implementation.

To talk to the C program from Python, run `protoc --python_out=. demo.proto`. This generates `demo_pb2.py`, which you load with `import demo_pb2`.

## Reversing

Now for the main topic. The first step in reversing is usually to recover the message structure, so we need the key Protobuf structures. In the generated `demo.pb-c.c` file we find the following unpack function:

```c
Tutorial__AddressBook *
tutorial__address_book__unpack(ProtobufCAllocator *allocator,
                               size_t len,
                               const uint8_t *data)
{
  return (Tutorial__AddressBook *)
     protobuf_c_message_unpack(&tutorial__address_book__descriptor,
                               allocator, len, data);
}
```

The deserialization function is passed the descriptor of the message structure, and that descriptor is what we ultimately reverse:

```c
struct ProtobufCMessageDescriptor {
	/** Magic value checked to ensure that the API is used correctly. */
	uint32_t			magic;
	/** The qualified name (e.g., "namespace.Type"). */
	const char			*name;
	/** The unqualified name as given in the .proto file (e.g., "Type"). */
	const char			*short_name;
	/** Identifier used in generated C code. */
	const char			*c_name;
	/** The dot-separated namespace. */
	const char			*package_name;
	/**
	 * Size in bytes of the C structure representing an instance of this
	 * type of message.
	 */
	size_t				sizeof_message;
	/** Number of elements in `fields`. */
	unsigned			n_fields;
	/** Field descriptors, sorted by tag number. */
	const ProtobufCFieldDescriptor	*fields;
	/** Used for looking up fields by name. */
	const unsigned			*fields_sorted_by_name;
	/** Number of elements in `field_ranges`. */
	unsigned			n_field_ranges;
	/** Used for looking up fields by id. */
	const ProtobufCIntRange		*field_ranges;
	/** Message initialisation function. */
	ProtobufCMessageInit		message_init;
	/** Reserved for future use. */
	void				*reserved1;
	/** Reserved for future use. */
	void				*reserved2;
	/** Reserved for future use.
*/
	void				*reserved3;
};
```

- magic: normally 0x28AAEEF9.
- n_fields: the number of fields in the message.
- fields: pointer to the array of structures that describe each field.

`fields` has type `ProtobufCFieldDescriptor`, so that is the next structure to look at. The members that matter are:

- name: the field name.
- id: the unique field number.
- label: the qualifier, e.g. required, optional, repeated.
- type: the data type, e.g. bool, int32, float, double.

That is the background needed for reversing. Now let's apply it to the CISCN 2023 and CISCN 2024 challenges.

CISCN 2023:

The first step is to look for identifying features. In function sub_5090 we find the following:

![image-20241111203203292](C:\Users\70335\AppData\Roaming\Typora\typora-user-images\image-20241111203203292.png)

![image-20241111203225113](C:\Users\70335\AppData\Roaming\Typora\typora-user-images\image-20241111203225113.png)

The argument is clearly a message descriptor. Having confirmed that the binary speaks protobuf, we start reversing the descriptor: search in IDA for the magic value 0x28AAEEF9. It turns up in the `.data.rel.ro` section:

![image-20241111203714788](C:\Users\70335\AppData\Roaming\Typora\typora-user-images\image-20241111203714788.png)

From the descriptor we read:

- name: devicemsg
- magic: 0x28AAEEF9
- sizeof_message: 0x40
- n_fields: 4

Next we analyse the `ProtobufCFieldDescriptor` entries, starting with "actionid":

![image-20241111203933134](C:\Users\70335\AppData\Roaming\Typora\typora-user-images\image-20241111203933134.png)

id: 1, label: 0, type: 4

These values index the following enums. The label enum:

```c
typedef enum {
	/** A well-formed message must have exactly one of this field. */
	PROTOBUF_C_LABEL_REQUIRED,

	/**
	 * A well-formed message can have zero or one of this field (but not
	 * more than one).
	 */
	PROTOBUF_C_LABEL_OPTIONAL,

	/**
	 * This field can be repeated any number of times (including zero) in a
	 * well-formed message. The order of the repeated values will be
	 * preserved.
	 */
	PROTOBUF_C_LABEL_REPEATED,

	/**
	 * This field has no label. This is valid only in proto3 and is
	 * equivalent to OPTIONAL but no "has" quantifier will be consulted.
*/
	PROTOBUF_C_LABEL_NONE,
} ProtobufCLabel;
```

The type enum:

```c
typedef enum {
	PROTOBUF_C_TYPE_INT32,      /**< int32 */
	PROTOBUF_C_TYPE_SINT32,     /**< signed int32 */
	PROTOBUF_C_TYPE_SFIXED32,   /**< signed int32 (4 bytes) */
	PROTOBUF_C_TYPE_INT64,      /**< int64 */
	PROTOBUF_C_TYPE_SINT64,     /**< signed int64 */
	PROTOBUF_C_TYPE_SFIXED64,   /**< signed int64 (8 bytes) */
	PROTOBUF_C_TYPE_UINT32,     /**< unsigned int32 */
	PROTOBUF_C_TYPE_FIXED32,    /**< unsigned int32 (4 bytes) */
	PROTOBUF_C_TYPE_UINT64,     /**< unsigned int64 */
	PROTOBUF_C_TYPE_FIXED64,    /**< unsigned int64 (8 bytes) */
	PROTOBUF_C_TYPE_FLOAT,      /**< float */
	PROTOBUF_C_TYPE_DOUBLE,     /**< double */
	PROTOBUF_C_TYPE_BOOL,       /**< boolean */
	PROTOBUF_C_TYPE_ENUM,       /**< enumerated type */
	PROTOBUF_C_TYPE_STRING,     /**< UTF-8 or ASCII string */
	PROTOBUF_C_TYPE_BYTES,      /**< arbitrary byte sequence */
	PROTOBUF_C_TYPE_MESSAGE,    /**< nested message */
} ProtobufCType;
```

Counting from 0, label 0 is PROTOBUF_C_LABEL_REQUIRED and type 4 is PROTOBUF_C_TYPE_SINT64, so `actionid` is a required sint64. `msgidx` is read the same way:

![image-20241111204642308](C:\Users\70335\AppData\Roaming\Typora\typora-user-images\image-20241111204642308.png)

The values are 2, 0, 4: id 2, label PROTOBUF_C_LABEL_REQUIRED, type PROTOBUF_C_TYPE_SINT64.

![image-20241111204755653](C:\Users\70335\AppData\Roaming\Typora\typora-user-images\image-20241111204755653.png)

The values are 4, 0, 0xf: id 4, label PROTOBUF_C_LABEL_REQUIRED, type PROTOBUF_C_TYPE_BYTES.

With that we can reconstruct the message definition:

```c
syntax = "proto2";

message devicemsg {
    required sint64 actionid = 1;
    required sint64 msgidx = 2;
    required sint64 msgsize = 3;
    required bytes msgcontent = 4;
}
```

Then compile it for Python with `protoc --python_out=.
device.proto`, which generates `device_pb2.py`, and use it in the exploit to talk to the binary:

```python
from pwn import *
import device_pb2  # import the module name, without the .py suffix

elf = ELF("./pwn")
p = process("./pwn")

def add(index, size, content):
    msg = device_pb2.devicemsg()
    msg.actionid = 1
    msg.msgidx = index
    msg.msgsize = size
    msg.msgcontent = content
    # sendafter: no trailing newline is appended to the serialized bytes
    p.sendafter(b"now: ", msg.SerializeToString())

add(0, 0x20, b"a" * 8)  # placeholder arguments, adjust to the exploit
p.interactive()
```

That completes the protobuf reversing for CISCN 2023. Now for the CISCN 2024 national-round challenge:

![image-20241111205638841](C:\Users\70335\AppData\Roaming\Typora\typora-user-images\image-20241111205638841.png)

The message is named heybro.

![image-20241111205716898](C:\Users\70335\AppData\Roaming\Typora\typora-user-images\image-20241111205716898.png)

The `whatcon` field reads id = 1, then 3 and 0xf, so it is a bytes field. If the 3 is the label value, as the ordering in the earlier dumps suggests, it corresponds to PROTOBUF_C_LABEL_NONE in the enum above, which means the original schema was proto3 with no label. The proto2 `required` schema below still produces the same field numbers and wire types, which is all the exploit needs. The other fields are recovered the same way, giving:

```c
syntax = "proto2";

message heybro {
    required bytes whatcon = 1;
    required sint64 whattodo = 2;
    required sint64 whatidx = 3;
    required sint64 whatsize = 4;
    required uint32 whatsthis = 5;
}
```

The schema is again saved as `device.proto` and compiled with `protoc --python_out=. device.proto`:

```python
from pwn import *
import device_pb2  # generated from the heybro schema above

p = process("./pwn")

def add_chunk(index, content):
    msg = device_pb2.heybro()  # same module as the import above
    msg.whattodo = 1
    msg.whatidx = index
    msg.whatsize = 0
    msg.whatcon = content
    msg.whatsthis = 0
    p.sendafter(b'WANT?\n', msg.SerializeToString())
```

Summary: as the analysis above shows, challenges that wrap their input in protobuf are nothing to be afraid of. Recover the descriptor with the routine steps, rebuild the `.proto`, and the rest is an ordinary pwn challenge.
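
The manual walk over the `ProtobufCFieldDescriptor` entries used for both challenges can be sketched in Python. This is only an illustration under stated assumptions: the 0x48-byte entry size and the offsets of `name`, `id`, `label`, and `type` are my reading of the protobuf-c header for x86-64 and are not taken from the challenge binaries. The helper names `parse_fields`, `to_proto`, and `demo`, and the fake string addresses, are hypothetical. The demo parses a synthetic blob that mirrors the values read out of IDA for the 2023 challenge; with a real binary you would dump the `fields` array and resolve the name pointers from the ELF instead.

```python
import struct

# Assumption: sizeof(ProtobufCFieldDescriptor) on x86-64 is 0x48 bytes, with
# name (8-byte pointer), id, label, type (4 bytes each) at offsets 0, 8, 12, 16.
FIELD_DESC_SIZE = 0x48

# Index = enum value; PROTOBUF_C_LABEL_NONE (3) means "no label" (proto3).
LABELS = ["required", "optional", "repeated", ""]
TYPES = ["int32", "sint32", "sfixed32", "int64", "sint64", "sfixed64",
         "uint32", "fixed32", "uint64", "fixed64", "float", "double",
         "bool", "enum", "string", "bytes", "message"]


def parse_fields(blob, n_fields, read_cstring):
    """Decode n_fields descriptors from blob.

    read_cstring maps a name pointer to the field-name string.
    """
    fields = []
    for i in range(n_fields):
        base = i * FIELD_DESC_SIZE
        name_ptr, field_id, label, ftype = struct.unpack_from("<QIII", blob, base)
        fields.append((read_cstring(name_ptr), field_id,
                       LABELS[label], TYPES[ftype]))
    return fields


def to_proto(message_name, fields):
    """Render the decoded fields as a .proto message body.

    For enum/message fields the generic type name is only a placeholder;
    the real type name lives in the referenced descriptor.
    """
    lines = ["message %s {" % message_name]
    for name, field_id, label, ftype in fields:
        prefix = label + " " if label else ""
        lines.append("    %s%s %s = %d;" % (prefix, ftype, name, field_id))
    lines.append("}")
    return "\n".join(lines)


def demo():
    # Hypothetical string addresses; (name_ptr, id, label, type) per field.
    strings = {0x1000: "actionid", 0x1010: "msgidx",
               0x1020: "msgsize", 0x1030: "msgcontent"}
    rows = [(0x1000, 1, 0, 4), (0x1010, 2, 0, 4),
            (0x1020, 3, 0, 4), (0x1030, 4, 0, 15)]
    blob = b"".join(struct.pack("<QIII", *row).ljust(FIELD_DESC_SIZE, b"\x00")
                    for row in rows)
    fields = parse_fields(blob, len(rows), strings.__getitem__)
    return to_proto("devicemsg", fields)


if __name__ == "__main__":
    print(demo())
```

Passing the name lookup in as a callback keeps the parser independent of how the bytes were obtained, whether from an IDA export, a pwntools `ELF.read`, or a hex dump. A label value of 3 renders as a field without a label, which is how a proto3 schema would be written.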