## macOS File Storage Structure

By default, a macOS storage disk consists of **one** Apple File System (APFS) **container**. Each container may have multiple volumes. The default APFS container consists of five volumes.

##### 1. System Volume
Contains :
- All necessary files to start up the Mac
- All apps installed automatically by macOS

##### 2. Data Volume
Contains :
- Information found in the user's folder, including photos and documents
- Applications installed by the user
- Custom frameworks installed by the user or organization.

##### 3. Preboot Volume
Contains :
- All necessary files to boot the operating system, including recovery and other pre-boot information.

##### 4. Recovery Volume
Contains :
- All macOS recovery tools, including the macOS Recovery environment, which allows for system repairs, reinstallations, or troubleshooting when the system is not bootable.

##### 5. VM (Virtual Memory) Volume :
- Used for swap space and temporary system data storage.

## APFS File System Structure

- All the file system structures are stored in the file system as objects (Container Super Block, Volume Super Block, B-Tree Node, File System Tree, Object Map, Space Manager, Reaper)
- Objects are stored on disk in **blocks**; a common block size is `4096 bytes`.
- I have a sample Mac disk image, `APFS.dmg`, taken from a FOR518 class exercise from [SANS](https://www.sans.org/cyber-security-courses/mac-and-ios-forensic-analysis-and-incident-response/), that I will walk through.
- As mentioned above, APFS does not use a typical partition table to divide the storage into partitions, each with its own FS volume. Instead, it uses a whole disk or a partition to set up a **container**. So, to parse these objects we first need to find the **APFS container**, and to locate it we need to find `partition type guid = 7C3457EF-0000-11AA-AA11-00306543ECAC`.
- So, I will go through the full disk step by step to reach that partition.
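The search for that partition type GUID can also be sketched in a few lines of Python before doing it by hand. This is a minimal sketch, not the method used in this post: it assumes 512-byte sectors and a standard GPT header at LBA 1, and the function name `find_apfs_partitions` is made up for illustration. `uuid.UUID(bytes_le=...)` takes care of the mixed-endian way a GUID is stored on disk.

```python
import struct
import uuid

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors
APFS_TYPE_GUID = uuid.UUID("7C3457EF-0000-11AA-AA11-00306543ECAC")


def find_apfs_partitions(disk):
    """Return (first_lba, byte_offset) for every GPT entry with the APFS type GUID.

    `disk` is any seekable binary file object, e.g. open("APFS.dmg", "rb").
    """
    # LBA 0 is the protective MBR; the GPT header sits at LBA 1.
    disk.seek(SECTOR_SIZE)
    header = disk.read(92)
    if header[:8] != b"EFI PART":
        raise ValueError("GPT signature not found at LBA 1")

    # Partition Entry LBA (8 bytes at 0x48), then entry count and entry size (4 bytes each).
    entries_lba, entry_count, entry_size = struct.unpack_from("<QII", header, 0x48)

    hits = []
    disk.seek(entries_lba * SECTOR_SIZE)
    for _ in range(entry_count):
        entry = disk.read(entry_size)
        if len(entry) < 40:
            break  # ran off the end of the image
        # The first 16 bytes are the partition type GUID in mixed-endian layout.
        type_guid = uuid.UUID(bytes_le=entry[:16])
        if type_guid == APFS_TYPE_GUID:
            # Starting LBA of the partition is the 8 bytes at offset 32 of the entry.
            (first_lba,) = struct.unpack_from("<Q", entry, 32)
            hits.append((first_lba, first_lba * SECTOR_SIZE))
    return hits
```

To try it on an image: `with open("APFS.dmg", "rb") as disk: print(find_apfs_partitions(disk))`. Each hit gives the starting sector of an APFS container and the same position as a byte offset.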
> You may face more than one partition type, which is why we need to locate the one that actually holds the APFS container.

1. The first `512 bytes` of the disk are the Protective MBR.
   ![image](https://hackmd.io/_uploads/SJ-PsuMMkl.png)
2. The disk uses the GPT partition scheme. Based on the GPT boot sector structure, the next bytes belong to the `GPT Header`, which contains `Partition Entry LBA = 02` and `Partition Entry Size = 80 in hex` (128 bytes).
   ![image](https://hackmd.io/_uploads/ryDp-tMfkg.png)
3. Every Partition Entry is `80 in hex` bytes long.
   ![image](https://hackmd.io/_uploads/B1eifFfzyg.png)
4. Now we are in the Partition Entry, whose `first 16-bytes` are the `Partition type guid`. This was a little bit tricky: I went to Hummert and Pawlaszczyk's book to see how the GUID is extracted, then went back to the Invoke-IR poster to confirm the method, and yes, they totally match. It appears the GUID is stored in this format :
   ```
   [4-bytes little endian]
   [2-bytes little endian (00 00 in this GUID)]
   [2-bytes little endian]
   [2-bytes (not multi bytes) left as it is]
   [six-bytes (not multi bytes) left as it is]
   ```
   ![image](https://hackmd.io/_uploads/ryxswtfMyx.png)
5. Now it is clear that we got our partition type, meaning that this partition holds the container.
6. Our next step is the Starting LBA, which is `sector 0x28` (decimal 40, so byte offset 40 × 512 = 20480). This is where the `Container Super block` (our first object) lives.

- Just keep in mind that each object in APFS has a `32-byte` **object header**, from which we can determine the type of the object through `object Type`
![image](https://hackmd.io/_uploads/SJ-jYizG1x.png)
- APFS uses different kinds of superblocks, and the first superblock we find is the Container Superblock (CSB), a `nx_superblock_t` structure.

#### 1- Container Superblock `nx_superblock_t` :
- Contains information on the block size, the number of blocks and pointers to the space manager.

![image](https://hackmd.io/_uploads/HyZbxnfG1l.png)

Based on "Mobile Forensics – The File Format Handbook", Christian Hummert says that two things should be fulfilled:
1. The `NXSB` magic must be found.
2. The checksum must verify, or else there is something wrong with the checkpoint superblock.

- To find the latest checkpoint superblock, parse all blocks in the `Checkpoint Descriptor Area` and, among the blocks with the same object id (OID) that pass the two checks above, take the one with the highest transaction id (XID).
- I'm not going to do that here, as there are things about it that I don't really understand yet and it isn't important for me right now. If you want to dive deeper, go on with that book.

#### 2- Volume Super block `apfs_superblock_t` :
- Exists for each volume in the file system. It contains the name of the volume, an ID and a timestamp, similarly to the Container Superblock.
- The magic key is `APSB`, so instead of going through each block we will use grep to find the location of the first superblock.
```
grep -abi APSB APFS.dmg
hexdump -C -n 4096 -s $((20480+4096*90)) APFS.dmg
```
![image](https://hackmd.io/_uploads/rJ8C9Tzzyx.png)
![image](https://hackmd.io/_uploads/HynXb0GMJe.png)
- It seems that there are no directories or files in this volume.
![image](https://hackmd.io/_uploads/ByWrb0GGye.png)
- Here is another block, this time with real file and directory counts
![image](https://hackmd.io/_uploads/SyEPp0fM1e.png)
- Notice `apfs_last_mod_time`, the time when the volume was last modified, which is at `offset 0x100`. In our image it is at `0x000d6040 = 0xCF29D2975443ED15`. You should know that this time is a **64-bit value stored in little endian**, so I made a script that reverses the byte order and handles the 64-bit case to get the right time.
```
import datetime


def hex_to_datetime(hex_value):
    """Convert a 64-bit little-endian APFS timestamp (hex string) to a UTC date string."""
    # Drop the optional 0x prefix and pad to a whole number of bytes
    hex_value = hex_value[2:] if hex_value.startswith("0x") else hex_value
    if len(hex_value) % 2 != 0:
        hex_value = '0' + hex_value
    # The hex digits are in on-disk order, so read them as little endian
    timestamp_in_nanoseconds = int.from_bytes(bytes.fromhex(hex_value), "little")
    # APFS timestamps count nanoseconds since 1970-01-01 UTC
    timestamp_in_seconds = timestamp_in_nanoseconds / 1_000_000_000
    dt_object = datetime.datetime.fromtimestamp(timestamp_in_seconds, tz=datetime.timezone.utc)
    return dt_object.strftime("%A, %d %B %Y %H:%M:%S UTC")


hex_value = input("Enter a 64-bit hexadecimal timestamp in on-disk byte order (e.g., 0xCF29D2975443ED15): ")
if hex_value.startswith("0x"):
    try:
        formatted_date = hex_to_datetime(hex_value)
        print(f"Converted date and time: {formatted_date}")
    except ValueError:
        print("Invalid hexadecimal value. Please check your input.")
else:
    print("Please enter the hexadecimal value starting with '0x'.")
```
![image](https://hackmd.io/_uploads/ryOJAc8zJe.png)

Now that we have covered the File System category (Container Superblock object, Volume Superblock object), we will move on to the Metadata category, which includes the B-tree object.

#### B-Tree
- The B-trees used in Apple File System are implemented using the `btree_node_phys_t` structure to represent a node, and this structure is used for all nodes in the tree.
- First, we should know that there are 2 types of B-Tree node (root and non-root). The main difference is that the root B-Tree node contains an instance of `btree_info_t` at the end of the block. This instance holds information about the tree itself, such as the sizes of keys and values and the total number of keys in the tree.

![image](https://hackmd.io/_uploads/B1EUgb4Gyl.png)

So far so good. Now you may wonder how we can reach the block that contains the B-Tree??
Answer is : every object in this image sits in a `4096 bytes` block, so that block can be found by listing all blocks and looking for `Object Type = 3` (B-Tree Node) and `Object Type = 2` (B-Tree). None of the books or references I used mention any `magic` bytes to search for in that block, so I made a script based on my image (you can customize it for yours) that gives me all blocks that are a `B-Tree Node` or a `B-Tree`, by matching the object type bytes `03 00` or `02 00` against the second line of the hexdump output, since that line contains `offset 18,19 in hex`.

`b-tree-node.py` script :
```
import subprocess

# Object type to look for: '03' = B-Tree Node, change to '02' to list B-Tree objects
TARGET_TYPE = '03'


# Function to get hexdump output for a specific block
def get_hexdump(offset):
    # Run the hexdump command for the given offset
    command = ['hexdump', '-C', '-n', '4096', '-s', str(offset), 'APFS.dmg']
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.stdout.decode()


# Iterate through block numbers from 0 to 300
for block_num in range(301):
    # The container starts at byte 20480 and every block is 4096 bytes
    offset = 20480 + 4096 * block_num
    # Get the hexdump for the current block and split it into lines
    lines = get_hexdump(offset).splitlines()

    # The second hexdump line covers offsets 0x10-0x1f of the block
    if len(lines) > 1:
        line = lines[1]
        # parts[0] is the address column, parts[1] to parts[16] are the byte values
        parts = line.split()
        # Make sure the line really has the two bytes we want to read
        if len(parts) > 10:
            byte_18 = parts[9]   # byte at offset 0x18 (low byte of the object type)
            byte_19 = parts[10]  # byte at offset 0x19
            # Check if the extracted bytes match the object type we are looking for
            if byte_18 == TARGET_TYPE and byte_19 == '00':
                # Print the block number, offset, and matching line
                print(f"Block {block_num} found at offset {offset} with bytes '{TARGET_TYPE} 00' at 18-19")
                print(line)
```
![image](https://hackmd.io/_uploads/Bk2inb4fJe.png)

> You can make a better script; this one is just to help me get my hits.

Just before parsing the B-Tree Node block, I should illustrate some points:
1. Each B-Tree Node holds the various pieces of the B-Tree in a defined structure.
2. A B-Tree Node can be a leaf node or a non-leaf node. Leaf nodes are the final destination for queries in a B-tree, as they store the actual metadata. Non-leaf nodes contain keys and pointers to other nodes (child nodes), but no direct data.
3. A B-Tree Node contains a Table of Contents (TOC), which stores the location of each key and value that form a key-value pair.

#### B-Tree Node `btree_node_phys_t`
We have about 4 blocks for B-Tree Node, so I will work on the last block (216)
- We will skip the object header (0x20 in size).
- The next section is the node header (0x18 in size), which starts at offset `0x20`

![image](https://hackmd.io/_uploads/rJ39ZQ4Gkg.png)

Flags for B-Tree

![image](https://hackmd.io/_uploads/B1A9fmNMkg.png)
![image](https://hackmd.io/_uploads/HkRCQIVz1e.png)
```
1. btn_table_space : contains the offset and length of the Table Of Content (TOC)
2. btn_free_space  : contains the offset and length of the free (unused) space of this particular node in the B-tree structure.
```
Now I have the offset of the TOC and its length, `0x180` in size; the TOC starts right after the node header.

![image](https://hackmd.io/_uploads/B1_-GwEGyx.png)

Now, the TOC contains keys (47 keys in total) and values :
```
- If the "BTNODE_FIXED_KV_SIZE" flag is set (1), only offsets to keys and values are used. If not (0), both offset and length are used.
- In our case it is not set, see Flags above.
- Format :
    2-bytes [key_offset]
    2-bytes [key_length]
    2-bytes [value_offset]
    2-bytes [value_length]
- All offsets for keys are relative to the start of the key area (the key area comes after the TOC area).
- All offsets for values are relative to the end of the value area (the bottom of the value area) and count backwards from it.
```
![image](https://hackmd.io/_uploads/H1vTvPEG1l.png)
![image](https://hackmd.io/_uploads/ByOk5ONGyg.png)
- Now that we know where the keys and values are for every TOC entry, we will parse the keys to get the metadata.
- The `first 8-bytes` of any key determine the `Inode Number` and the Entry Type `j_obj_types`.

![image](https://hackmd.io/_uploads/ryvlfKVz1e.png)

In the same way we can get the inode number and the type of each record (dir, file, XATTR,...) based on the `j_obj_types` table, where `values are in hex`

![image](https://hackmd.io/_uploads/Hy8wGt4GJg.png)

Based on the Apple-File-System Reference, every type in the `j_obj_types` table has a structure describing the key and its value.

![image](https://hackmd.io/_uploads/BkYDat4Gyx.png)

Here is another one with a different object type.

![image](https://hackmd.io/_uploads/r1HsWqNGkl.png)
- Now we need to get the values related to those keys. As we know from the TOC, each entry has 4 bytes for the key and 4 bytes for its value.
- As a reminder, the value offset is relative to the end of the value area, which is the end of the block. If the node is a `root node`, it is relative to the start of the `btree_info_t` instead.
- Here is a simple picture (from Hummert and Pawlaszczyk's book) of what all the areas look like.

![image](https://hackmd.io/_uploads/SJKHZTDGyx.png)

> Reminder : a root node can be identified from the `Flags` in the node header.

![image](https://hackmd.io/_uploads/ByYFV68zyl.png)
- In our case the node is a B-Tree node that is not the root, so the value offset is relative to the end of the block.
- We took `key 10` and found that it is the xattr attribute `com.apple.FinderInfo`, and here are its key and value fields from the TOC.

| Name | Value |
| -------- | -------- |
| key_offset | 0x9A |
| key_length | 0x1F |
| value_offset | 0x170 |
| value_length | 0x24 |

The last hexdump line of the block starts at `0x000ddff0`, so the block ends at `0x000de000` and we calculate upward from there. I mean we stand at the end of the block and move up by subtracting value_offset from that address: `0x000de000-0x170 = 0xDDE90`.

![image](https://hackmd.io/_uploads/SyxVYpUGJe.png)

As I said, based on the Apple-File-System-Reference, every key and value has its own structure. Since we are dealing with an xattr attribute, it has `j_xattr_key_t` and `j_xattr_val_t`

![image](https://hackmd.io/_uploads/HkTDTTIzJg.png)

This is the content of the attribute, as it is `com.apple.FinderInfo`, and this is what it looks like on a live system.

![image](https://hackmd.io/_uploads/S1kHR6Ifyx.png)

Now, I will repeat the process: search for another key and get its value the same way.

![image](https://hackmd.io/_uploads/B1XnVRIM1x.png)

From `com.apple.lastuseddate#PS` we can learn the last time the file with the same inode (0x14) was opened. The first `8-bytes` of that data should give us the exact time.

![image](https://hackmd.io/_uploads/SyqBB08G1x.png)

So far we have shown our results for a directory and an xattr attribute. I will do one more for an inode, as it is very important and contains a lot of data.
- Back to our disk to get the data... Our entry point is the TOC, from which we get the key and value. I will take the key that points to an inode.

![image](https://hackmd.io/_uploads/r1NB_6vG1g.png)
- Notice the inode key structure, which contains only the `hdr`, which is `8-bytes`; that matches the `key_length` in the TOC.
- Now let's go to value_offset and mark out our specific value area with value_length. Our offset is `0x000de000-0x57D = 0xDDA83`; this resulting offset is relative to the start of the disk.
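The two calculations above (splitting a TOC entry and walking back from the end of the block) can be sketched in Python. This is only a sketch under a few assumptions: the helper names are made up, the 40-byte size of `btree_info_t` is my reading of the Apple reference and only matters for root nodes, and the key header bytes in the last example are hypothetical bytes built for inode 0x14 rather than copied from the image.

```python
import struct

OBJ_ID_MASK = 0x0FFFFFFFFFFFFFFF  # low 60 bits of the key header: object (inode) number
OBJ_TYPE_SHIFT = 60               # high 4 bits of the key header: j_obj_types value
BTREE_INFO_SIZE = 40              # assumption: size of btree_info_t at the end of a root node


def parse_toc_entry(raw):
    """Unpack one 8-byte TOC entry (variable-size keys/values) into its four fields."""
    key_off, key_len, val_off, val_len = struct.unpack("<4H", raw)
    return key_off, key_len, val_off, val_len


def split_key_header(raw):
    """Split the first 8 bytes of a key into (object id, record type)."""
    (hdr,) = struct.unpack("<Q", raw[:8])
    return hdr & OBJ_ID_MASK, hdr >> OBJ_TYPE_SHIFT


def value_address(block_start, block_size, val_off, is_root=False):
    """Absolute offset of a value: count backwards from the end of the value area."""
    area_end = block_start + block_size
    if is_root:
        area_end -= BTREE_INFO_SIZE  # root nodes keep btree_info_t in the block tail
    return area_end - val_off


block_start = 20480 + 216 * 4096  # block 216 of the container, as used in this post

# The xattr entry from the table above, laid out as it would sit in the TOC
entry = bytes.fromhex("9a001f0070012400")
key_off, key_len, val_off, val_len = parse_toc_entry(entry)
print(hex(value_address(block_start, 4096, val_off)))  # 0xdde90

# The inode value, whose value_offset is 0x57D
print(hex(value_address(block_start, 4096, 0x57D)))    # 0xdda83

# Hypothetical key header for inode 0x14 with record type 3 (inode)
print(split_key_header(bytes.fromhex("1400000000000030")))  # (20, 3)
```

Both printed addresses match the ones worked out by hand, which is a quick way to double check the subtraction before jumping to the offset in a hex viewer.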
![image](https://hackmd.io/_uploads/BkR0KawMke.png)

Now let's see the structure of `j_inode_val_t` from **Apple-File-System-Reference.pdf** and get our data.

![image](https://hackmd.io/_uploads/ryURHRwfkg.png)
```
parent_id          = 13 00 00 00 00 00 00 00 = 0x13 = 19 (Parent Inode Number)
private_id         = 14 00 00 00 00 00 00 00 = 0x14 = 20 (Inode Number)
create_time        = 00 8A 47 D3 14 43 ED 15 = Saturday, 25 January 2020 22:53:21 UTC
mod_time           = 00 8A 47 D3 14 43 ED 15 = Saturday, 25 January 2020 22:53:21 UTC
change_time        = 7B 69 84 93 37 43 ED 15 = Saturday, 25 January 2020 22:55:50 UTC
access_time        = 00 D0 8E CE 2C 43 ED 15 = Saturday, 25 January 2020 22:55:04 UTC
internal_flags     = 10 84 00 00 00 00 00 00
nchildren          = 01 00 00 00 = 1   # Union field: for a directory it is the number of entries in the directory, otherwise it is the link count (nlink)
protection_class   = 00 00 00 00
generation_counter = 03 00 00 00 = 3
bsd_flags          = 00 00 00 00
owner_uid          = F5 01 00 00 = 0x1F5 = 501 (decimal)
group_gid          = 14 00 00 00 = 0x14 = 20 (decimal)
mode               = A4 81 00 00 = 0x81A4 = S_IFREG (regular file), rw-r--r--
```
The last thing in the `j_inode_val_t` structure is the Extended Fields array, `xfields[]`.

Extended Fields : Directory entries and inodes use extended fields to store a dynamically extensible set of member fields.

We will skip `2-bytes for pad` and `8-bytes for uncompressed_size`, and then we are in the Extended Fields. The Extended Fields section has its own structure, let's see...

![image](https://hackmd.io/_uploads/ryIFXyOMye.png)

As shown, the Extended Fields section has its own structure. We can see that there are 2 extended fields, and for each extended type we can determine what we get :
1. The first extended field is the `File Name = smudge_yoda.jpeg`
2. The second one is the `Data Stream`, which tells us about the location and size of the file. Don't forget, this is `object type = inode`, and we saw earlier that the mode of this inode is a regular file with `rw-r--r--`, so we got the file name from the first extended field and the second one should lead us to the location and size needed to extract the file data. So there should be a structure for the data stream that gives us what we need.
3. The Apple-File-System-Reference has a structure named `j_dstream_t` which will guide us. We need only the size and location. When I looked at the address where the size should be, I found `7-bytes of zeros`, so it appears that the first 7 bytes are **unused** and the size is in the next 8 bytes. Most likely these bytes are padding: the file name field with its terminating null is 17 bytes long, and the data of each extended field is aligned to 8 bytes.

![image](https://hackmd.io/_uploads/BkFkCy_M1e.png)

At this point I still didn't have the physical block location where the data lives, so I asked ChatGPT where I should search for it. It didn't give a lot of help, but it hinted where to look, so I went back to the Apple reference. It mentions `Object Maps`, which use a B-tree to store a mapping from virtual object identifiers and transaction identifiers to the physical addresses where those objects are stored. I noticed that the value structure `omap_val_t` contains a size and a physical block address.

![image](https://hackmd.io/_uploads/Sk03QxuzJe.png)

The good thing here is that the size will be the same as the `allocated_size` in the Extended Field, so it is easy to search for that size and look for the hits where the next 8 bytes are non-zero. All hits give us `2 unique results` :
```
1. 00 20 02 00 00 00 00 00 69 00 00 00 00 00 00 00
2. 00 20 02 00 00 00 00 00 00 00 00 00 00 00 00 00
```
![image](https://hackmd.io/_uploads/ByXlSlOMJg.png)
![image](https://hackmd.io/_uploads/S1ihBg_zye.png)

Note that an 8-byte length followed by an 8-byte block number is also how the file extent value `j_file_extent_val_t` (`len_and_flags`, `phys_block_num`) starts in the Apple reference, so the first hit may well be a file extent record rather than an object map value. Either way, it gives us the block we are after.

Now we have the block number where the JPEG photo is located, so we can simply use `dd` to extract it.
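The first hit can be decoded with a few lines of Python. This is a small sketch with made-up helper names, assuming the first 8 bytes are a plain little-endian length with no flag bits set, and reusing the 20480-byte container start and 4096-byte block size from this post.

```python
import struct

CONTAINER_START = 20480  # byte offset of the container in the image, from this post
BLOCK_SIZE = 4096


def decode_hit(raw):
    """Split a 16-byte hit into (length in bytes, physical block number)."""
    length, block = struct.unpack("<QQ", raw)
    return length, block


def block_to_offset(block, container_start=CONTAINER_START, block_size=BLOCK_SIZE):
    """Translate a container block number into a byte offset in the disk image."""
    return container_start + block * block_size


hit = bytes.fromhex("00200200000000006900000000000000")
length, block = decode_hit(hit)
print(length, block, block_to_offset(block))  # 139264 105 450560
```

The length (139264 bytes, or 34 blocks) is the allocated size, which is a little larger than the real file size because it is rounded up to whole blocks.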
- Block 105: as you already know, the blocks start after `20480 bytes`, so we need to start counting from `20480+105*4096 = 450560` and specify the size of the file = 138278 bytes
```
dd if=APFS.dmg ibs=1 skip=450560 count=138278 > file.jpeg
```
![image](https://hackmd.io/_uploads/HJCmIluf1x.png)
![image](https://hackmd.io/_uploads/HkZbveuzkl.png)

Finally, that's all we wanted. There are many things in APFS that would need days to explain, like how Object Maps work. Also, you may notice that every structure in the reference ends with `__attribute__((packed))`. This is a compiler attribute that tells the compiler not to add padding between the members, so the structure matches the on-disk layout byte for byte.

I would like to thank [Sarah Edwards](https://www.sans.org/profiles/sarah-edwards/) for the great course and awesome materials, really one of the juiciest courses I have come across.

That's it, thanks for reading such a long blog.

<iframe src="https://giphy.com/embed/26BGqofNXjxluwX0k" width="480" height="480" style="" frameBorder="0" class="giphy-embed" allowFullScreen></iframe>

### References :
1. [FOR518 Course by SANS](https://www.sans.org/cyber-security-courses/mac-and-ios-forensic-analysis-and-incident-response/)
2. [jamf-100-course](https://learn.jamf.com/en-US/bundle/jamf-100-course-current/page/Lesson_4.html)
3. [Invoke-IR ForensicPosters](https://github.com/Invoke-IR/ForensicPosters/blob/master/Posters/BootSectors/GuidPartitionTable.png)
4. [Mobile Forensics – The File Format Handbook](https://link.springer.com/book/10.1007/978-3-030-98467-0)
5. [Apple File System Reference](https://developer.apple.com/support/downloads/Apple-File-System-Reference.pdf)