# Python - Struct

###### tags: `Python` `Struct`

# Reference

* [Java - Binary Parsing](https://hackmd.io/Ge69-l3tRlurS7CduRf1iw)
* [reading binary structures with python](https://blog.mozilla.org/nfroyd/2013/12/06/reading-binary-structures-with-python/)

# [Module struct. Packing / unpacking data. Basic methods](https://www.bestprog.net/en/2020/05/08/python-module-struct-packing-unpacking-data-basic-methods/)

## 1. Using the struct module. Packed binary data

The Python __struct__ module is used to pack Python values into __packed binary__ data and to unpack them again. In the struct module, __data bytes__ are interpreted as __packed binary data__, represented by __objects__ of type ``bytes`` or ``bytearray``.

The module contains __conversion tools between Python values and C structures__, which are represented as __Python bytes objects__. Such conversions are used when processing binary data stored in files or received over network connections.

Converts directly between C <--> Python.

__Format strings__ are used to give a compact description of the layout of C structures and of the conversion to and from Python values.

Why this sentence? --> Reading further shows that the functions take a ``format`` parameter whose type is ``str``.

## 2. The basic methods of the struct module

The struct module contains several basic methods that you can use to __pack__ and __unpack__ data.

### 2.1. The ``pack()`` and ``unpack()`` methods. Packing and unpacking data

The methods ``pack()`` and ``unpack()`` are used to pack and unpack data. Packing and unpacking follow the format string. According to the documentation, the general form of the ``pack()`` call is:

```
obj = struct.pack(format, v1, v2, ...)
```

where

* ``format`` – the format string, built according to the rules in the tables (see section 3);
* ``v1``, ``v2``, … – the values (objects) to be packed;
* ``obj`` – the packed binary object (of type ``bytes``).

The ``unpack()`` function performs the inverse of the ``pack()`` operation.
It recovers the original values from a packed object. The general form of the call is:

```
obj = struct.unpack(format, buffer)
```

where

* ``buffer`` – a buffer containing an object previously packed by ``pack()``. __Its size in bytes must match the size required by the format__ (the value returned by ``calcsize(format)``);
* ``format`` – the format string used to unpack the binary object;
* ``obj`` – the result, which is always a ``tuple`` (even when it holds a single value).

When calling ``unpack()``, __the format string must describe the same layout__ that was used with ``pack()``.

Example. For demonstration, a list of numbers is packed and unpacked.

```
# Module struct. Methods pack(), unpack()
# Pack/unpack list of numbers

# 1. Specified list of numbers
LS = [ 1, 3, 9, 12 ]

# 2. Include module struct
import struct

# 3. Pack list of numbers. Method pack()
pack_obj = struct.pack('>4i', LS[0], LS[1], LS[2], LS[3])

# 4. Display the object pack_obj
print('pack_obj =', pack_obj)

# 5. Unpack list of numbers. Method unpack().
# The result is a tuple T2
T2 = struct.unpack('>4i', pack_obj) # T2 = (1, 3, 9, 12)

# 6. Print the unpacked object T2
print('T2 =', T2)

# 7. Convert tuple T2 to list LS2
LS2 = list(T2) # LS2 = [1, 3, 9, 12]

# 8. Display the list LS2
print('LS2 =', LS2)
```

The result of the program

```
pack_obj = b'\x00\x00\x00\x01\x00\x00\x00\x03\x00\x00\x00\t\x00\x00\x00\x0c'
T2 = (1, 3, 9, 12)
LS2 = [1, 3, 9, 12]
```

### 2.2. Method ``calcsize()``. The size of packed object

The ``calcsize()`` method returns the size in bytes of the object that ``pack()`` creates for a given format string.

Example.

```
# Module struct.
# Method calcsize(). Determine the size of the packed object

# 1. Include module struct
import struct

# 2. Determine the size of a packed list of numbers
# 2.1. Specified list of floating point numbers
LS = [ 2.88, 3.9, -10.5 ]

# 2.2. Pack list LS. Method pack()
pack_obj = struct.pack('>3f', LS[0], LS[1], LS[2])

# 2.3. Display the packed object pack_obj
print('pack_obj =', pack_obj)

# 2.4. Display the size of pack_obj
size = struct.calcsize('>3f') # size = 12
print('size =', size)

# 3. Determine the size of a packed tuple of strings
# 3.1. The specified tuple of two strings
TS = ( 'Hello', 'abcd')

# 3.2. Pack the tuple TS (str => bytes first)
pack_obj = struct.pack('<5s4s', TS[0].encode(), TS[1].encode())

# 3.3. Display the packed object
print('pack_obj =', pack_obj)

# 3.4. Display the size of packed tuple
size = struct.calcsize('<5s4s') # size = 9
print('size =', size)
```

## 3. Format strings

### 3.1. Set byte order, size and alignment based on format character

In Python, the byte order, size and alignment of the packed data are determined by the first character of the format string. This character defines:

* the __byte order__, set with one of the characters ``@``, ``=``, ``<``, ``>``, ``!``. If no such character is given, the default ``@`` is assumed;
* the __size__ of the packed data: native (``@``) or standard (all other characters);
* the __alignment__: native alignment (padding bytes inserted as a C compiler would) for ``@``, no alignment for the other characters.

According to the Python documentation, the possible first characters of the format string are shown in the following table.

The first character in the format string:

Character | Byte order | Size | Alignment
-- | -- | -- | --
@ | native (host dependent) | native | native
= | native | standard | none
< | little-endian | standard | none
> | big-endian | standard | none
! | network (= big-endian) | standard | none

A __byte order__ value can be one of four:

* native order: either little-endian or big-endian, as determined by the host system;
* little-endian order: the least significant (low) byte comes first, followed by the more significant bytes;
* big-endian order: the most significant (high) byte comes first, followed by the less significant bytes;
* network order, which is always big-endian.

The size of the packed data can be one of two kinds:

* native – determined by the C compiler's ``sizeof`` on the host platform;
* standard – determined by the format character according to the table below.

Table. Standard size of packed data for each format character

Format | C Type | Python Type | Standard size
-- | -- | -- | --
x | pad byte | no value |
c | char | bytes of length 1 | 1
b | signed char | integer | 1
B | unsigned char | integer | 1
? | _Bool | bool | 1
h | short | integer | 2
H | unsigned short | integer | 2
i | int | integer | 4
I | unsigned int | integer | 4
l | long | integer | 4
L | unsigned long | integer | 4
q | long long | integer | 8
Q | unsigned long long | integer | 8
n | ssize_t | integer |
N | size_t | integer |
e | (IEEE 754 half precision) | float | 2
f | float | float | 4
d | double | float | 8
s | char[] | bytes |
p | char[] | bytes |
P | void* | integer |

``n``, ``N`` and ``P`` have no standard size: they are available only in native mode (``@``). For ``s`` and ``p`` the size is given by the count in front of the letter (for example, ``10s`` is a 10-byte string).

### 3.2. Examples of format strings for different data types

* ``ii`` - two numbers of type int
* ``2i`` - two numbers of type int (same as ``ii``)
* ``10f`` - 10 numbers of type float
* ``>i8s`` - big-endian byte order, an int, an 8-byte string
* ``8dif`` - 8 numbers of type double, 1 number of type int, 1 number of type float
* ``=bi`` - native byte order with standard sizes, a signed char (1-byte integer), an int

---

# [Module struct. Working with binary files. Examples of writing/reading packed binary data](https://www.bestprog.net/en/2020/05/03/python-module-struct-working-with-binary-files/)

## 1. Using tools of struct module for working with files

In Python, the struct module is used to convert data to packed binary form before it is written to a file and to decode packed data after it is read. The module contains a number of methods that create a packed object according to a given format string. The basic methods of the struct module are described in the first article above ("Module struct. Packing / unpacking data. Basic methods").
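Before moving on to files, the rules from the first article can be recapped in memory. The sketch below is a minimal illustration, not part of the original articles: it packs one value with three byte-order prefixes, checks a few standard sizes with ``calcsize()``, compares ``=`` with the default ``@`` (native alignment, so that line assumes a typical platform with a 4-byte aligned ``int``), and does a ``pack()``/``unpack()`` round trip. All variable names are illustrative.

```python
import struct

# The same int value packed with three different byte-order prefixes
little = struct.pack('<i', 1)   # little-endian: low byte first
big = struct.pack('>i', 1)      # big-endian: high byte first
network = struct.pack('!i', 1)  # network order, identical to big-endian
print(little)          # b'\x01\x00\x00\x00'
print(big)             # b'\x00\x00\x00\x01'
print(network == big)  # True

# Standard sizes: any prefix other than '@' uses the standard-size table
sizes = {fmt: struct.calcsize(fmt) for fmt in ('=c', '=h', '=i', '=q', '=e', '=f', '=d')}
print(sizes)  # {'=c': 1, '=h': 2, '=i': 4, '=q': 8, '=e': 2, '=f': 4, '=d': 8}

# '=' adds no padding; '@' (the default) may insert padding before the int
# to match C alignment, so the native size is platform dependent
print(struct.calcsize('=ci'))  # 5
print(struct.calcsize('@ci'))  # 8 on common platforms (3 padding bytes)

# Round trip: the buffer length equals calcsize(), and unpack() returns a tuple
packed = struct.pack('>hid', 7, -2, 3.5)
print(len(packed), struct.calcsize('>hid'))  # 14 14
print(struct.unpack('>hid', packed))         # (7, -2, 3.5)

# A buffer of the wrong size is rejected with struct.error
try:
    struct.unpack('>i', b'\x00\x01')
except struct.error as exc:
    print('struct.error:', exc)
```

The ``=`` versus ``@`` comparison is why the file examples below always start their format strings with ``>``: a fixed byte order with standard sizes gives the same file layout on every machine.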
Before using the ``struct`` module, import it:

```
import struct
```

When working with binary files, the struct module can be used in the following cases:

* when writing data to a file, the data is first packed with the ``pack()`` or ``pack_into()`` methods;
* when reading previously written packed data, it is unpacked with the ``unpack()`` or ``unpack_from()`` methods.

## 2. Examples of using the tools of the struct module

### 2.1. Example of writing/reading different types of data to a file

In the example, the data stored in a list is first written to a file and then read back. The data types differ: ``float``, ``bool``, ``char[]``.

```
# Binary files. Writing/reading a list of different data.
# Using the capabilities of the struct module.

# 1. Include the struct module
import struct

# 2. The specified list of different types of data:
# - 1.5 - type float, in struct denoted by 'f'
# - True - type bool, in struct denoted by '?'
# - 'abc def' - type char[], in struct denoted by 's'
L = [1.5, True, 'abc def']

# 3. Write list L to the file 'myfile4.bin'
# 3.1. Open file for writing
f = open('myfile4.bin', 'wb')

# 3.2. Write the list in packed format.
# To pack data, use the pack() method
# Decoding the string '>f?7s':
# - '>' - big-endian byte order (high byte first);
# - 'f' - type float;
# - '?' - type bool;
# - '7s' - type char[] of 7 bytes.
d = struct.pack('>f?7s', L[0], L[1], L[2].encode())

# 3.3. Write packed data d to the file
f.write(d)

# 3.4. Close file
f.close()

# 4. Read the list from the binary file 'myfile4.bin'
# 4.1. Open file for reading
f = open('myfile4.bin', 'rb')

# 4.2. Read data from file
d = f.read()

# 4.3. Unpack data using the unpack() method.
# Data is unpacked as a tuple.
T = struct.unpack('>f?7s', d)

# 4.4. Convert tuple T to list L2
L2 = list(T)

# 4.5. Convert L2[2] from bytes to str
L2[2] = L2[2].decode()

# 4.6. Print the list
print("L2 =", L2) # L2 = [1.5, True, 'abc def']

# 4.7. Close the file
f.close()
```

The result of the program

```
L2 = [1.5, True, 'abc def']
```

### 2.2. Example of writing/reading a list containing integers

The example demonstrates writing a list to a file and reading it back. When reading, the size of the previously written list is not known in advance. Therefore the number of elements is written first, followed by the items, each packed with the ``>i`` format string. When reading, the number of elements is read first, and then the whole list is read at once with a call like

```
T = struct.unpack('>ni', d)
```

where

* ``>ni`` – a format string in which ``n`` stands for the number of written elements (for example, ``>5i`` for five integers);
* ``d`` – the binary object;
* ``T`` – the tuple of numbers obtained by unpacking ``d``.

```
# Binary files. Module struct.
# Example of writing/reading a list of integers

# 1. Include the struct module
import struct

# 2. Specified list
L = [ 2, 4, 6, 8, 10 ]

# 3. Write list to the file
# 3.1. Open file for writing
f = open('myfile9.bin', 'wb')

# 3.2. Write a list to a file element by element
# 3.2.1. Pack the number of list items
d = struct.pack('>i', len(L))

# 3.2.2. Write object d to the file
f.write(d)

# 3.2.3. Write the elements one at a time
for item in L:
    d = struct.pack('>i', item)
    f.write(d)

# 3.3. Close file
f.close()

# ---------------------------------------------
# 4. Reading a list from a file
# 4.1. Open file for reading
f = open('myfile9.bin', 'rb')

# 4.2. Get the number of list items - the first number in the file;
# the first 4 bytes are read (the size of type int)
d = f.read(4) # d - binary object
count = struct.unpack('>i', d)[0]

# 4.3. Read the rest of the file into a binary object d
d = f.read() # reading occurs from the current position to the end of the file

# 4.4. Build a format string to read all numbers at once
# (numbers can also be read one at a time in a loop)
s = '>' + str(count) + 'i'

# 4.5. Get a tuple of numbers based on the format string
T = struct.unpack(s, d)

# 4.6. Convert tuple to a list
L2 = list(T)

# 4.7. Print the list
print("L2 =", L2)

# 4.8. Close file
f.close()
```

The result of the program

```
L2 = [2, 4, 6, 8, 10]
```

### 2.3. Example of writing/reading a tuple containing strings

If you need to save several strings in a binary file, each string must be preceded by its length, because strings have different lengths. The length is the number of bytes of the encoded string, which differs from the number of characters for non-ASCII text.

```
# Binary files. Module struct.
# Example of writing/reading a tuple of strings.

# 1. Connect the struct module
import struct

# 2. The specified tuple of strings to be written to the file.
T = ( 'abc', 'abcd', 'def', 'ghi jkl')

# 3. Writing tuple T to a file
# 3.1. Open file for writing in binary mode
f = open('myfile10.bin', 'wb')

# 3.2. Write the number of elements in the tuple
count = len(T)
d = struct.pack('>i', count) # get the packed data
f.write(d) # write the packed data

# 3.3. Write each string of the tuple in a loop.
# Since each string has a different length,
# this length must also be written to the file.
for item in T: # iterate over the elements of the tuple
    # convert str => bytes
    bt_item = item.encode()

    # get the length of item in bytes
    length = len(bt_item)

    # pack the length of item
    d = struct.pack('>i', length)

    # write to the file
    f.write(d)

    # pack the bytes of item: '>ns' - means char[n]
    d = struct.pack('>' + str(length) + 's', bt_item)

    # write to the file
    f.write(d)

# 3.4. Close file
f.close()

# ------------------------------------------------------
# 4. Reading the recorded tuple from the file
# 4.1. Open file for reading in binary mode
f = open('myfile10.bin', 'rb')

# 4.2. Read the number of elements (strings) in the file
d = f.read(4) # read the first 4 bytes - size of type int, d - packed data
count = struct.unpack('>i', d)[0] # count - number of elements

# 4.3. Create an empty tuple
T2 = ()

# 4.4. String reading loop
i = 0
while i < count:
    # get the length of the string
    d = f.read(4)
    length = struct.unpack('>i', d)[0]

    # create a format string
    sf = '>' + str(length) + 's'

    # read length bytes from the file into object d
    d = f.read(length)

    # unpack the string according to the sf format string
    sb = struct.unpack(sf, d)[0] # sb - string of type bytes

    # convert bytes => str
    s = sb.decode()

    # add the string to the tuple T2
    T2 = T2 + (s,)
    i = i + 1

# 4.5. Print the tuple T2
print("T2 =", T2)

# 4.6. Close file
f.close()
```

The result of the program

```
T2 = ('abc', 'abcd', 'def', 'ghi jkl')
```

### 2.4. An example of writing/reading a dictionary in which a __key:value__ pair is of type ``int``:``str``

```
# Binary files. Module struct.
# Example of writing/reading dictionary

# 1. Include the struct module
import struct

# 2. Specified dictionary
D = { 1:'Sunday', 2:'Monday', 3:'Tuesday', 4:'Wednesday',
      5:'Thursday', 6:'Friday', 7:'Saturday' }

# 3. Write dictionary D to the file
# 3.1. Open file for writing in binary mode
f = open('myfile11.bin', 'wb')

# 3.2. Write the number of items in the dictionary
count = len(D)
d = struct.pack('>i', count) # get packed data
f.write(d) # write packed data

# 3.3. Write each key:value pair in a loop.
# Since each string can have a different length,
# this length must also be written to the file.
for key in D: # dictionary traversal loop
    # write key - int number
    dd = struct.pack('>i', key)
    f.write(dd)

    # convert the value str => bytes
    bt_item = D[key].encode()

    # get the length of the value in bytes
    length = len(bt_item)

    # pack the length of the string
    dd = struct.pack('>i', length)

    # write length to the file
    f.write(dd)

    # pack the bytes: '>ns' - means char[n]
    dd = struct.pack('>' + str(length) + 's', bt_item)

    # write to file
    f.write(dd)

# 3.4. Close file
f.close()

# ------------------------------------------------------
# 4. Reading the recorded dictionary from the file
# 4.1. Open file for reading in binary mode
f = open('myfile11.bin', 'rb')

# 4.2. Read the number of items in the file
dd = f.read(4) # read the first 4 bytes - size of type int, dd - packed data
count = struct.unpack('>i', dd)[0] # count - number of items in the dictionary

# 4.3. Form an empty dictionary
D2 = dict()

# 4.4. Reading loop
i = 0
while i < count:
    # 4.4.1. Read key - an integer of 4 bytes
    dkey = f.read(4)
    key = struct.unpack('>i', dkey)[0] # unpack data

    # 4.4.2. Read the length of the string in bytes
    dlength = f.read(4) # 4 - the size of type int
    length = struct.unpack('>i', dlength)[0]

    # 4.4.3. Read the string; first form
    # the sf format string
    sf = '>' + str(length) + 's'
    ds = f.read(length) # read length bytes from file
    sb = struct.unpack(sf, ds)[0] # unpack string
    value = sb.decode()

    # 4.4.4. Add pair key:value to the dictionary D2
    D2[key] = value
    i = i + 1

# 4.5. Display the dictionary D2
print("D2 =", D2)

# 4.6. Close file
f.close()
```

The result of the program

```
D2 = {1: 'Sunday', 2: 'Monday', 3: 'Tuesday', 4: 'Wednesday', 5: 'Thursday', 6: 'Friday', 7: 'Saturday'}
```
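The examples in 2.3 and 2.4 write each length prefix by hand. The same length-prefixed layout can be wrapped in a few helpers; the sketch below is one possible way to do it, under these assumptions: the helper names (``write_str``, ``read_str``, ``read_exact``, ``write_dict``, ``read_dict``) are hypothetical, ``io.BytesIO`` stands in for a file opened with ``'wb'``/``'rb'``, and strings are stored as UTF-8. It uses a precompiled ``struct.Struct('>i')`` object, so the format string is parsed only once and its size is available as ``INT.size``.

```python
import io
import struct

# Precompiled 4-byte big-endian int, reused for every count, key and length field
INT = struct.Struct('>i')

def write_str(f, s):
    """Write s as a 4-byte length prefix followed by its UTF-8 bytes."""
    data = s.encode('utf-8')
    f.write(INT.pack(len(data)))  # length in bytes, not in characters
    f.write(data)

def read_exact(f, n):
    """Read exactly n bytes, or raise EOFError if the data is truncated."""
    data = f.read(n)
    if len(data) != n:
        raise EOFError(f'expected {n} bytes, got {len(data)}')
    return data

def read_str(f):
    (length,) = INT.unpack(read_exact(f, INT.size))
    return read_exact(f, length).decode('utf-8')

def write_dict(f, d):
    f.write(INT.pack(len(d)))     # number of items first
    for key, value in d.items():
        f.write(INT.pack(key))    # int key
        write_str(f, value)       # length-prefixed str value

def read_dict(f):
    (count,) = INT.unpack(read_exact(f, INT.size))
    result = {}
    for _ in range(count):
        (key,) = INT.unpack(read_exact(f, INT.size))
        result[key] = read_str(f)
    return result

# In-memory round trip; '星期三' (Wednesday) is 3 characters but 9 UTF-8 bytes
buf = io.BytesIO()
D = {1: 'Sunday', 2: 'Monday', 3: '星期三'}
write_dict(buf, D)
buf.seek(0)
D2 = read_dict(buf)
print(D2 == D)  # True
```

Taking the length from the encoded bytes matters because the ``s`` format silently truncates input that is longer than its count: a character-based length of 3 for ``'星期三'`` would keep only its first character. ``read_exact()`` turns a truncated file into an explicit ``EOFError`` instead of a less descriptive ``struct.error`` from ``unpack()``.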