
macOS File Storage Structure

By default, a macOS storage disk consists of one Apple File System (APFS) container. Each container may hold multiple volumes; the default APFS container consists of five volumes.

1. System Volume contains:
  • All necessary files to start up the Mac
  • All apps installed automatically by macOS
2. Data Volume contains:
  • Information found in the user's folder, including photos and documents
  • Applications installed by the user
  • Custom frameworks installed by the user or organization.
3. Preboot Volume contains:
  • All necessary files to boot the operating system, including recovery and other pre-boot information.
4. Recovery Volume contains:
  • All macOS recovery tools, including the macOS Recovery environment, which allows for system repairs, reinstallations, or troubleshooting when the system is not bootable.
5. VM (Virtual Memory) Volume:
  • Used for swap space and temporary system data storage.

APFS File System Structure

  • All the file system structures are embedded in the file system as objects (Container Superblock, Volume Superblock, B-Tree Node, File System Tree, Object Map, Space Manager, Reaper).
  • Objects are stored on disk in blocks; a common block size is 4096 bytes.
  • I have a sample Mac disk image, APFS.dmg, taken from a SANS FOR518 class exercise, which I will walk through.
  • As mentioned above, APFS does not use a typical partition table to divide the storage into partitions, each with its own file system volume. Instead, it uses the storage device or a partition to set up a container. So, to parse these objects we first need to find the APFS container, and to locate it we need to find the partition whose type GUID = 7C3457EF-0000-11AA-AA11-00306543ECAC.
  • So, I will go through the full disk step by step to reach that partition.

You may come across more than one partition type, which is why we need to locate the one that actually holds the APFS container.

  1. The first 512 bytes of the disk are the Protective MBR.


  2. The disk uses the GPT partition scheme. Based on the GPT boot sector structure, the next bytes belong to the GPT header, which contains
    Partition Entry LBA = 02 and Partition Entry Size = 0x80 (128 bytes)
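
As a rough sketch, the two header fields above could be read with Python's struct module. This assumes 512-byte sectors and the standard GPT header layout (partition entry LBA at 0x48, entry count at 0x50, entry size at 0x54). The function name parse_gpt_header is hypothetical, and the header below is synthetic, not bytes taken from APFS.dmg.

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

def parse_gpt_header(header: bytes) -> dict:
    """Read the partition-array fields from a GPT header (the sector at LBA 1)."""
    if header[:8] != b"EFI PART":
        raise ValueError("GPT signature 'EFI PART' not found")
    # Offset 0x48: partition entry LBA (8 bytes), entry count (4), entry size (4)
    entry_lba, num_entries, entry_size = struct.unpack_from("<QII", header, 0x48)
    return {
        "partition_entry_lba": entry_lba,
        "num_entries": num_entries,
        "entry_size": entry_size,
        "entries_byte_offset": entry_lba * SECTOR_SIZE,
    }

# Synthetic header for illustration only (not bytes from APFS.dmg)
sample = bytearray(SECTOR_SIZE)
sample[:8] = b"EFI PART"
struct.pack_into("<QII", sample, 0x48, 2, 128, 0x80)
gpt = parse_gpt_header(bytes(sample))
print(gpt)
```

On the real image, the same function could be fed the 512 bytes read from offset 512 of APFS.dmg.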


  3. Every partition entry is 0x80 (128) bytes in size.


  4. Now we are in the partition entry, whose first 16 bytes hold the partition type GUID.

This was a little tricky: I went to Hummert and Pawlaszczyk's book to see how the GUID was extracted, then went back to the Invoke-IR poster to confirm the method, and yes, they match.

It appears the GUID is stored in this format:

[4 bytes, little-endian] [2 bytes, little-endian (here 00 00)] [2 bytes, little-endian] [2 bytes, left as stored] [6 bytes, left as stored]
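
Python's uuid module already understands this mixed-endian layout through the bytes_le argument, so the decoding can be sketched in a few lines. The helper name partition_type_guid is hypothetical; the sample bytes are simply the APFS type GUID written in on-disk order.

```python
import uuid

APFS_TYPE_GUID = uuid.UUID("7C3457EF-0000-11AA-AA11-00306543ECAC")

def partition_type_guid(entry: bytes) -> uuid.UUID:
    """Decode the first 16 bytes of a GPT partition entry as a mixed-endian GUID."""
    return uuid.UUID(bytes_le=bytes(entry[:16]))

# The APFS type GUID as it appears on disk (first three groups byte-swapped)
raw = bytes.fromhex("EF57347C0000AA11AA1100306543ECAC")
guid = partition_type_guid(raw)
print(str(guid).upper(), guid == APFS_TYPE_GUID)
```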


  5. Now it is clear that we have our partition type, meaning that this partition holds the container.
  6. Our next step is to go to the Starting LBA, 0x28 (sector 40, i.e. byte offset 40 × 512 = 20480), which holds the Container Superblock (our first object).
  • Just keep in mind that each object in APFS has a 32-byte object header, from which we can determine the type of the object through its Object Type field.
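
A minimal sketch of reading that 32-byte header, assuming the field order given in the Apple File System Reference (checksum, OID, XID, type, subtype, all little-endian). The type table lists only the objects named in this post, and the sample header is synthetic.

```python
import struct

# Low 16 bits of o_type; only the types named in this post are listed
OBJECT_TYPES = {
    0x01: "Container Superblock",
    0x02: "B-Tree (root node)",
    0x03: "B-Tree Node",
    0x05: "Space Manager",
    0x0B: "Object Map",
    0x0D: "Volume Superblock",
    0x11: "Reaper",
}

def parse_object_header(block: bytes) -> dict:
    """Decode the 32-byte header found at the start of every APFS object."""
    cksum, oid, xid, o_type, o_subtype = struct.unpack_from("<QQQII", block, 0)
    type_id = o_type & 0xFFFF  # the upper 16 bits carry storage flags
    return {
        "checksum": cksum,
        "oid": oid,
        "xid": xid,
        "type": type_id,
        "type_name": OBJECT_TYPES.get(type_id, "unknown"),
        "subtype": o_subtype,
    }

# Synthetic header: OID 1, XID 5, type 0x80000001 (flag bits plus type 1)
sample = struct.pack("<QQQII", 0, 1, 5, 0x80000001, 0)
header = parse_object_header(sample)
print(header["type_name"], header["oid"], header["xid"])
```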


  • APFS uses different kinds of superblocks, and the first superblock we find is the Container Superblock (CSB), an nx_superblock_t structure.

1- Container Superblock nx_superblock_t :

  • Contains information on the block size, the number of blocks, and pointers to the space manager.


According to "Mobile Forensics – The File Format Handbook", Christian Hummert states that two conditions must be fulfilled:

  1. The NXSB magic must be found.
  2. The checksum must verify, or else there is something wrong with the checkpoint superblock.
  • To find the latest checkpoint superblock, parse all blocks in the Checkpoint Descriptor Area and pick the block with the highest transaction ID (XID) among those with the same object ID (OID).

  • I am not going to do that here, as there are parts of it I don't fully understand yet and it is not important for me right now. If you want to dive deeper, go to that book.
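
For readers who do want to try the XID selection step, here is a minimal sketch. It assumes 4096-byte blocks, the NXSB magic at offset 0x20 and the XID at offset 0x10 of the object header. It deliberately skips the checksum check, so it is not a full checkpoint validation. make_block is a hypothetical helper that only builds synthetic blocks for the demo.

```python
import struct

BLOCK_SIZE = 4096  # assumption: block size of this image

def latest_container_superblock(blocks):
    """Return (index, xid) of the NXSB block with the highest XID, or None."""
    best = None
    for index, block in enumerate(blocks):
        if block[0x20:0x24] != b"NXSB":  # magic follows the 32-byte object header
            continue
        xid = struct.unpack_from("<Q", block, 0x10)[0]  # o_xid in the object header
        if best is None or xid > best[1]:
            best = (index, xid)
    return best

def make_block(magic: bytes, xid: int) -> bytes:
    """Hypothetical helper: build a synthetic block for the demo."""
    block = bytearray(BLOCK_SIZE)
    struct.pack_into("<Q", block, 0x10, xid)
    block[0x20:0x24] = magic
    return bytes(block)

# Synthetic descriptor area: two superblocks and one unrelated block
area = [make_block(b"NXSB", 7), make_block(b"\x00\x00\x00\x00", 99), make_block(b"NXSB", 9)]
latest = latest_container_superblock(area)
print(latest)
```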

2- Volume Superblock apfs_superblock_t :

  • Exists for each volume in the file system. It contains the name of the volume, an ID and a timestamp, similarly to the Container Superblock.
  • The magic value is APSB, so instead of going through each block we will use grep to find the location of the first superblock.
grep -abo APSB APFS.dmg
hexdump  -C -n 4096 -s $((20480+4096*90)) APFS.dmg
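
The same search can be sketched in Python, which also turns each hit into a block number. This assumes the container starts at byte 20480, blocks are 4096 bytes, and the magic sits at offset 0x20 of the block (right after the object header). The image below is synthetic; on the real file you would pass the bytes read from APFS.dmg.

```python
CONTAINER_OFFSET = 20480  # byte offset of the container in this image
BLOCK_SIZE = 4096
MAGIC_OFFSET = 0x20       # the magic follows the 32-byte object header

def find_volume_superblocks(image: bytes):
    """Return (block_number, byte_offset) for each block-aligned APSB magic."""
    hits = []
    pos = image.find(b"APSB")
    while pos != -1:
        rel = pos - MAGIC_OFFSET - CONTAINER_OFFSET
        # Keep only hits that sit exactly 0x20 bytes into a block
        if rel >= 0 and rel % BLOCK_SIZE == 0:
            hits.append((rel // BLOCK_SIZE, pos - MAGIC_OFFSET))
        pos = image.find(b"APSB", pos + 1)
    return hits

# Synthetic image: a magic in block 2 and a stray, unaligned "APSB" string
image = bytearray(CONTAINER_OFFSET + 4 * BLOCK_SIZE)
start = CONTAINER_OFFSET + 2 * BLOCK_SIZE
image[start + MAGIC_OFFSET:start + MAGIC_OFFSET + 4] = b"APSB"
image[CONTAINER_OFFSET + 100:CONTAINER_OFFSET + 104] = b"APSB"
hits = find_volume_superblocks(bytes(image))
print(hits)
```

Filtering on alignment drops strings that merely contain the four letters somewhere inside a block.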


  • It seems that there are no directories or files in this volume.


  • Here is another block, this one with real file and directory counts.


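  • Notice apfs_last_mod_time, the time the volume was last modified, which is at offset 0x100 of the volume superblock. The value I read in our image, at 0x000d6040, is 0xCF29D2975443ED15; if this block starts at 0x000d6000, that address is offset 0x40 into the superblock, where the Apple reference places apfs_unmount_time, so treat the field name with care. Either way, the value is a 64-bit little-endian timestamp, so I wrote a script that reverses the byte order and converts the 64-bit value to the right time.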
import datetime

def hex_to_datetime(hex_value):
    """Convert a 64-bit APFS timestamp, given as hex in on-disk byte order, to a UTC string."""
    hex_value = hex_value[2:] if hex_value.startswith("0x") else hex_value
    if len(hex_value) % 2 != 0:
        hex_value = '0' + hex_value
    # The bytes are stored little-endian on disk, so read them in reverse order
    timestamp_in_nanoseconds = int.from_bytes(bytes.fromhex(hex_value), "little")
    # APFS timestamps count nanoseconds since 1970-01-01 00:00:00 UTC
    timestamp_in_seconds = timestamp_in_nanoseconds / 1_000_000_000
    dt_object = datetime.datetime.fromtimestamp(timestamp_in_seconds, tz=datetime.timezone.utc)
    return dt_object.strftime("%A, %d %B %Y %H:%M:%S UTC")

hex_value = input("Enter a 64-bit hexadecimal timestamp (e.g., 0x15E3C994B2AF9600): ")
if hex_value.startswith("0x"):
    try:
        formatted_date = hex_to_datetime(hex_value)
        print(f"Converted date and time: {formatted_date}")
    except ValueError:
        print("Invalid hexadecimal value. Please check your input.")
else:
    print("Please enter the hexadecimal value starting with '0x'.")


Now that we have covered the File System category, which includes the Container Superblock and Volume Superblock objects, we move on to the Metadata category, which includes the B-tree object.

B-Tree

  • The B-trees used in Apple File System are implemented using the btree_node_phys_t structure to represent a node, and this structure is used for all nodes in a tree.
  • First, we should know that there are two types of B-Tree node: root and non-root. The main difference is that a root node contains an instance of btree_info_t at the end of the block; this instance holds information about the tree itself, such as the sizes of keys and values and the total number of keys in the tree.


So far so good. Now you may wonder: how do we reach the block that contains the B-Tree?

The answer: every block in this image is 4096 bytes, so the block can be found by walking all blocks and checking the object type in the object header: Object Type = 3 for a B-Tree Node and Object Type = 2 for a B-Tree (root node). None of the books or references I checked mention magic bytes for these blocks, so I wrote a script based on my image (you can adapt it to yours) that lists all blocks holding a B-Tree Node or a B-Tree. It matches the object type bytes 03 00 and 02 00 against the second line of the hexdump output, since that line contains offsets 0x18 and 0x19.

b-tree-node.py script :

import subprocess

# Return the `hexdump -C` output for one 4096-byte block starting at `offset`
def get_hexdump(offset):
    command = ['hexdump', '-C', '-n', '4096', '-s', str(offset), 'APFS.dmg']
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.stdout.decode()

# Object types stored in the two bytes at offsets 0x18 and 0x19 of the object header
OBJECT_TYPES = {('02', '00'): 'B-Tree', ('03', '00'): 'B-Tree Node'}

# Iterate through block numbers from 0 to 300
for block_num in range(301):
    # The container starts at byte 20480, so block N starts at 20480 + 4096 * N
    offset = 20480 + 4096 * block_num

    # Split the hexdump output for the current block into lines
    lines = get_hexdump(offset).splitlines()

    # The second hexdump line covers offsets 0x10-0x1f, which include 0x18 and 0x19
    if len(lines) > 1:
        line = lines[1]

        # Column 0 is the address; the following columns are the byte values
        parts = line.split()

        # Make sure the line has byte columns up to offset 0x19 (11 columns)
        if len(parts) > 10:
            byte_18 = parts[9]   # byte at offset 0x18 (column 10)
            byte_19 = parts[10]  # byte at offset 0x19 (column 11)

            # Report the block if the two bytes are '02 00' or '03 00'
            kind = OBJECT_TYPES.get((byte_18, byte_19))
            if kind:
                print(f"Block {block_num} ({kind}) found at offset {offset} with bytes '{byte_18} {byte_19}' at 0x18-0x19")
                print(line)

image

You can write a better script; this one is just to help me get my hits.

Just before parsing a B-Tree Node block, I should illustrate some points:

  1. Each B-Tree Node starts with a header structure that points to the various pieces of the node.
  2. A B-Tree Node can be a leaf or a non-leaf node. Leaf nodes are the final destination for queries in a B-tree, as they store the actual metadata; non-leaf nodes contain keys and pointers to other nodes (child nodes), but no direct data.
  3. A B-Tree Node contains a Table of Contents (TOC), which stores the location of each key and value that form a key-value pair.

B-Tree Node btree_node_phys_t

The scan finds about four B-Tree Node blocks, so I will work on the last one (block 216).

  • We will skip object header (0x20 in size).
  • The next section is node header (0x18 in size) which starts from offset 0x20
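
The node header can be decoded with a single struct call. This is a sketch assuming the btree_node_phys_t field order from the Apple File System Reference (flags, level, key count, then four offset/length pairs) and its three flag bits. The sample block is synthetic; its values (leaf node, 47 keys, TOC length 0x180) are chosen to mirror this walkthrough.

```python
import struct

BTNODE_FLAGS = {0x0001: "BTNODE_ROOT", 0x0002: "BTNODE_LEAF", 0x0004: "BTNODE_FIXED_KV_SIZE"}
NODE_HEADER_OFFSET = 0x20  # right after the object header
NODE_DATA_OFFSET = 0x38    # 0x20 + 0x18, where the node's storage area begins

def parse_node_header(block: bytes) -> dict:
    """Decode the 0x18-byte B-tree node header."""
    (flags, level, nkeys, toc_off, toc_len,
     _free_off, _free_len, _kfl_off, _kfl_len, _vfl_off, _vfl_len) = struct.unpack_from(
        "<HHI8H", block, NODE_HEADER_OFFSET)
    toc_start = NODE_DATA_OFFSET + toc_off  # TOC offset is relative to the storage area
    return {
        "flags": [name for bit, name in BTNODE_FLAGS.items() if flags & bit],
        "level": level,
        "nkeys": nkeys,
        "toc_start": toc_start,
        "toc_len": toc_len,
        "key_area_start": toc_start + toc_len,  # keys come right after the TOC
    }

# Synthetic node: leaf, level 0, 47 keys, TOC at offset 0 with length 0x180
block = bytearray(4096)
struct.pack_into("<HHI8H", block, NODE_HEADER_OFFSET, 0x0002, 0, 47, 0, 0x180, 0, 0, 0, 0, 0, 0)
node = parse_node_header(bytes(block))
print(node)
```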

image

Flags for B-Tree

image

image

1. btn_table_space : contains the offset and length of the Table of Contents (TOC).
2. btn_free_space : contains the offset and length of the free (unused) space in this node, which is shared between the key area and the value area.

Now I have the offset of the TOC and its length: the TOC is 0x180 bytes in size and starts right after the node header.

image

Now, the TOC contains the locations of the keys (47 keys in total) and values:

- If the "BTNODE_FIXED_KV_SIZE" flag is set (1), only offsets to keys and values are stored. If not (0), both offset and length are stored.
- In our case it is not set; see Flags above.
- Format : 2 bytes [key_offset], 2 bytes [key_length], 2 bytes [value_offset], 2 bytes [value_length]
- All key offsets are relative to the start of the key area (the key area comes right after the TOC area).
- All value offsets are counted backwards from the end of the value area (the bottom of the value area).
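
The TOC rules above can be sketched as a small parser. It assumes each entry is four 16-bit little-endian fields when BTNODE_FIXED_KV_SIZE is not set, and two fields when it is. The function name parse_toc is hypothetical, and the sample block holds a single synthetic entry using the offsets of the FinderInfo record discussed later in this post.

```python
import struct

def parse_toc(block: bytes, toc_start: int, nkeys: int, fixed_kv_size: bool = False):
    """Return one dict per TOC entry with the offsets (and lengths) it stores."""
    entries = []
    entry_size = 4 if fixed_kv_size else 8
    for i in range(nkeys):
        pos = toc_start + i * entry_size
        if fixed_kv_size:
            # Fixed-size keys and values: only the two offsets are stored
            k_off, v_off = struct.unpack_from("<HH", block, pos)
            entries.append({"key_offset": k_off, "value_offset": v_off})
        else:
            k_off, k_len, v_off, v_len = struct.unpack_from("<HHHH", block, pos)
            entries.append({"key_offset": k_off, "key_length": k_len,
                            "value_offset": v_off, "value_length": v_len})
    return entries

# Synthetic block with one TOC entry at 0x38 (right after the node header)
TOC_START, TOC_LEN = 0x38, 0x180
block = bytearray(4096)
struct.pack_into("<HHHH", block, TOC_START, 0x9A, 0x1F, 0x170, 0x24)
entry = parse_toc(bytes(block), TOC_START, 1)[0]

# Key offsets are relative to the key area, which starts where the TOC ends
key_pos_in_block = TOC_START + TOC_LEN + entry["key_offset"]
print(entry, hex(key_pos_in_block))
```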

image

image

  • Now that we know the key and value locations for every TOC entry, we will parse the keys to get the metadata.
  • The first 8-bytes of any key determine the Inode Number and Entry Type j_obj_types.
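
Splitting those 8 bytes can be sketched as follows, assuming the layout from the Apple File System Reference: the low 60 bits hold the object (inode) number and the top 4 bits hold the record type. The type table is a partial list, and the sample key is synthetic.

```python
import struct

OBJ_ID_MASK = 0x0FFFFFFFFFFFFFFF
OBJ_TYPE_SHIFT = 60

# Partial j_obj_types table (values as given in the Apple reference)
J_OBJ_TYPES = {3: "INODE", 4: "XATTR", 6: "DSTREAM_ID", 8: "FILE_EXTENT", 9: "DIR_REC"}

def parse_key_header(key: bytes):
    """Split the first 8 bytes of a file-system key into (object id, record type)."""
    raw = struct.unpack_from("<Q", key, 0)[0]
    return raw & OBJ_ID_MASK, raw >> OBJ_TYPE_SHIFT

# Synthetic key header: an xattr record (type 4) for inode 0x14
sample_key = bytes.fromhex("1400000000000040")
obj_id, obj_type = parse_key_header(sample_key)
print(hex(obj_id), J_OBJ_TYPES.get(obj_type, "unknown"))
```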

image

Following the same sequence, we can read the inode number and the type of each record (directory, file, xattr, ...) based on the j_obj_types table; values are in hex.

image

According to the Apple File System Reference, every type in the j_obj_types table has a structure describing the key and its value.

image

Here is another one with different object type.

image

  • Now we need to get the values related to those keys. As we know from the TOC, each entry has 4 bytes for the key and 4 bytes for its value.
  • As a reminder, the value offset is relative to the end of the value area, which is the end of the block; if the node is a root node, it is relative to the start of btree_info_t instead.
  • Here is a simple picture (from Hummert and Pawlaszczyk's book) of what all the areas look like.

image

Reminder : the root node can be determined from the Flags in the node header.
image

  • In our case the node is a non-root B-Tree node, so the value offset is relative to the end of the block.
  • We took key 10, which we know is the xattr attribute com.apple.FinderInfo; here are its key and value locations from the TOC.
Name            Value
key_offset      0x9A
key_length      0x1F
value_offset    0x170
value_length    0x24

The last hexdump line of the block starts at 0x000ddff0, so the block ends at 0x000de000. We count backwards from that end: subtracting value_offset from the end address gives 0x000de000-0x170 = 0xDDE90.
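
The same arithmetic as a small helper, assuming the container starts at byte 20480 and blocks are 4096 bytes. For a root node it also assumes a 40-byte btree_info_t at the end of the block, a size taken from the Apple reference and not checked against this image.

```python
CONTAINER_OFFSET = 20480  # byte offset of the container in this image
BLOCK_SIZE = 4096
BTREE_INFO_SIZE = 40      # assumption: size of btree_info_t in a root node

def value_disk_offset(block_num: int, value_offset: int, is_root: bool = False) -> int:
    """Absolute offset in the image of a value stored in a B-tree node."""
    value_area_end = CONTAINER_OFFSET + (block_num + 1) * BLOCK_SIZE
    if is_root:
        value_area_end -= BTREE_INFO_SIZE  # values end where btree_info_t begins
    return value_area_end - value_offset

print(hex(value_disk_offset(216, 0x170)))  # 0xdde90
```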

image

As I said, according to the Apple File System Reference, every key and value has its own structure. For an xattr attribute, these are j_xattr_key_t and j_xattr_val_t.

image

This is the content of the com.apple.FinderInfo attribute, and this is what it looks like on a live system.

image

Now I will repeat the process for another key to get its value.

image

From com.apple.lastuseddate#PS we can learn when the file with the same inode (0x14) was last opened. The first 8 bytes of that data should give us the exact time.
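
A sketch of decoding that value. It assumes the attribute is 16 bytes laid out like a timespec: 8 bytes of seconds since the Unix epoch followed by 8 bytes of nanoseconds, both little-endian. That layout is an assumption on my side, and the sample value is synthetic rather than read from the image.

```python
import struct
from datetime import datetime, timezone

def parse_lastuseddate(value: bytes) -> datetime:
    """Decode a 16-byte value as (seconds, nanoseconds) since the Unix epoch."""
    seconds, nanoseconds = struct.unpack_from("<qq", value, 0)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only keeps microseconds, so the nanoseconds are truncated
    return moment.replace(microsecond=nanoseconds // 1000)

# Synthetic value: 1579992801 seconds, 0 nanoseconds
sample = struct.pack("<qq", 1579992801, 0)
last_used = parse_lastuseddate(sample)
print(last_used.isoformat())
```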

image

We have now shown results for a directory and an xattr attribute. I will do one more, for an inode, as it is very important and contains a lot of data.

  • Back to our disk to get the data

Our entry point is the TOC, from which we get the key and value locations. I will pick a key that points to an inode.

image

  • Notice the inode key structure, which contains only the hdr, 8 bytes in size; that matches the key_length in the TOC.

  • Now let's go to value_offset and delimit our value with value_length. The address is 0x000de000-0x57D = 0xDDA83; this resulting offset is relative to the start of the disk.

image

Now let's look at the structure of j_inode_val_t from Apple-File-System-Reference.pdf and get our data.

image

Parent_id = 13 00 00 00 00 00 00 00 = 0x13 = 19 (Parent Inode Number)
private_id = 14 00 00 00 00 00 00 00 = 0x14 = 20 (ID used by this file's data stream; here it equals the inode number)
create_time = 00 8A 47 D3 14 43 ED 15 = Saturday, 25 January 2020 22:53:21 UTC
mod_time = 00 8A 47 D3 14 43 ED 15 = Saturday, 25 January 2020 22:53:21 UTC
change_time = 7B 69 84 93 37 43 ED 15 = Saturday, 25 January 2020 22:55:50 UTC
access_time = 00 D0 8E CE 2C 43 ED 15 = Saturday, 25 January 2020 22:55:04 UTC
internal_flags= 10 84 00 00 00 00 00 00
nchildren = 01 00 00 00 = 1  # Union field: for a directory it is the number of entries in the directory (nchildren); for a file it is the number of hard links (nlink)
protection_class= 00 00 00 00
generation_counter = 03 00 00 00 = 3
bsd_flags = 00 00 00 00 
owner_uid = F5 01 00 00 = 0x1F5 = 501 (decimal)
group_gid = 14 00 00 00 = 0x14 = 20 (decimal)
mode = A4 81 = 0x81A4 = S_IFREG (regular file), rw-r--r--  # mode is 2 bytes; the following 00 00 is the pad field
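
The fixed part of the inode value can be unpacked in one go. This sketch assumes the j_inode_val_t field order from the Apple reference (92 bytes before the extended fields) and uses the bytes listed above as sample input, with the pad and uncompressed_size fields filled with zeros as an assumption. Timestamps are reduced to whole seconds for display.

```python
import stat
import struct
from datetime import datetime, timezone

# Fixed-size part of j_inode_val_t (92 bytes), little-endian, no padding
INODE_FIXED = struct.Struct("<QQQQQQQiIIIIIHHQ")
FIELD_NAMES = (
    "parent_id", "private_id", "create_time", "mod_time", "change_time",
    "access_time", "internal_flags", "nchildren_or_nlink", "protection_class",
    "generation_counter", "bsd_flags", "owner", "group", "mode", "pad1",
    "uncompressed_size",
)

def apfs_time(nanoseconds: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to a UTC datetime (whole seconds)."""
    return datetime.fromtimestamp(nanoseconds // 1_000_000_000, tz=timezone.utc)

def parse_inode_value(value: bytes) -> dict:
    return dict(zip(FIELD_NAMES, INODE_FIXED.unpack_from(value, 0)))

# Bytes from the walkthrough; pad1 and uncompressed_size assumed to be zero
sample = bytes.fromhex(
    "1300000000000000" "1400000000000000"
    "008A47D31443ED15" "008A47D31443ED15"
    "7B6984933743ED15" "00D08ECE2C43ED15"
    "1084000000000000"
    "01000000" "00000000" "03000000" "00000000"
    "F5010000" "14000000"
    "A481" "0000" "0000000000000000"
)
inode = parse_inode_value(sample)
print(inode["parent_id"], inode["private_id"], inode["owner"], inode["group"])
print(apfs_time(inode["create_time"]), stat.filemode(inode["mode"]))
```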

The last thing in the j_inode_val_t structure is the Extended Fields:
xfields[]
Extended Fields : directory entries and inodes use extended fields to store a dynamically extensible set of member fields.

We skip 2 bytes for pad and 8 bytes for uncompressed_size, and then we are in the Extended Fields.

The Extended Fields section has its own structure; let's see.

image

As shown, the Extended Fields have their own structure. We can see that there are two extended fields, and for each extended type we can determine what we get:

  1. The first extended field is the File Name = smudge_yoda.jpeg
  2. The second one is the Data Stream, which should help us find the location and size of the file. Don't forget that this object type is inode, and we saw earlier that the mode of this inode is S_IFREG with rw-r--r--, meaning that this is a regular file. So we get the file name from the first extended field, and the second one should give us what we need to extract the file data. There should be a data stream structure that tells us this.
  3. The Apple File System Reference has a structure named j_dstream_t which will guide us.

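We need only the size and location. When I looked at the address where the size should be, I found 7 bytes of zeros. These appear to be padding: the file name field is 17 bytes (16 characters plus a terminating null) and extended field data is padded to a multiple of 8 bytes (17 → 24), so the size is the 8 bytes that follow.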
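
Walking the extended fields can be sketched like this. It assumes the layout from the Apple reference: a 4-byte blob header (field count, used data size), one 4-byte descriptor per field (type, flags, size), then each field's data padded to a multiple of 8 bytes. Type 4 is taken to be the name and type 8 the data stream. The blob below is synthetic, built from the values in this walkthrough.

```python
import struct

INO_EXT_TYPE_NAME = 4
INO_EXT_TYPE_DSTREAM = 8

def parse_xfields(blob: bytes) -> dict:
    """Extract the file name and data stream sizes from an inode's extended fields."""
    num_exts, _used_data = struct.unpack_from("<HH", blob, 0)
    headers = [struct.unpack_from("<BBH", blob, 4 + 4 * i) for i in range(num_exts)]
    pos = 4 + 4 * num_exts  # field data starts after the descriptors
    result = {}
    for x_type, _x_flags, x_size in headers:
        data = blob[pos:pos + x_size]
        if x_type == INO_EXT_TYPE_NAME:
            result["name"] = data.rstrip(b"\x00").decode("utf-8")
        elif x_type == INO_EXT_TYPE_DSTREAM:
            # j_dstream_t starts with size and allocated size (8 bytes each)
            result["size"], result["alloced_size"] = struct.unpack_from("<QQ", data, 0)
        pos += (x_size + 7) & ~7  # skip the data plus padding to 8 bytes
    return result

# Synthetic blob: 17-byte name (7 bytes of padding follow) and a 40-byte data stream
name = b"smudge_yoda.jpeg\x00"
dstream = struct.pack("<QQQQQ", 138278, 0x22000, 0, 0, 0)
blob = (struct.pack("<HH", 2, 24 + 40)
        + struct.pack("<BBH", INO_EXT_TYPE_NAME, 0, len(name))
        + struct.pack("<BBH", INO_EXT_TYPE_DSTREAM, 0, len(dstream))
        + name + bytes(7) + dstream)
fields = parse_xfields(blob)
print(fields)
```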

image

At this point I still did not have the physical block where the data lives, so I asked ChatGPT where to search for it. It did not help much, but it pointed me in the right direction. Back in the Apple reference, I found Object Maps, which use a B-tree to store a mapping from virtual object identifiers and transaction identifiers to the physical addresses where those objects are stored.
I noticed that its value structure, omap_val_t, contains a size and a physical block address.

image

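The good thing here is that the size equals the allocated_size from the Extended Field, so it is easy to search for that size and keep the hits where the following 8 bytes are non-zero. (An 8-byte length followed by an 8-byte physical block number is also the layout of the file extent value, j_file_extent_val_t, in the Apple reference, which is the record that describes where a file's data lives.)

The hits give two unique results: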

1. 00 20 02 00 00 00 00 00 69 00 00 00 00 00 00 00
2. 00 20 02 00 00 00 00 00 00 00 00 00 00 00 00 00
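
The search described above can be sketched as a byte-pattern scan: pack the allocated size as 8 little-endian bytes, find every occurrence, and collect the 8-byte value that follows. The function name find_extent_candidates is hypothetical and the image is synthetic, holding just the two patterns listed above.

```python
import struct

def find_extent_candidates(image: bytes, alloced_size: int):
    """Return the sorted, unique 8-byte values that follow each size match."""
    pattern = struct.pack("<Q", alloced_size)
    values = set()
    pos = image.find(pattern)
    while pos != -1:
        following = image[pos + 8:pos + 16]
        if len(following) == 8:
            values.add(struct.unpack("<Q", following)[0])
        pos = image.find(pattern, pos + 1)
    return sorted(values)

# Synthetic image containing the two hit patterns from this post
image = (bytes(64)
         + bytes.fromhex("0020020000000000 6900000000000000")
         + bytes(32)
         + bytes.fromhex("0020020000000000 0000000000000000"))
candidates = find_extent_candidates(image, 0x22000)
blocks = [value for value in candidates if value != 0]
print(candidates, blocks)
```

Only the non-zero value, 0x69 = 105, is a usable block number.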

image

image

Now that we have the block number where the JPEG photo is located, we can simply use dd to extract it.

  • Block 105: the blocks start after 20480 bytes, so we start counting from 20480+105*4096 = 450560 and specify the file size = 138278.
dd if=APFS.dmg ibs=1 skip=450560 count=138278 > file.jpeg 
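
A Python equivalent of the dd step, as a sketch. It assumes the file occupies one contiguous run of blocks starting at the given block number, which a fragmented file would not. The function takes any seekable binary file object; the demo uses a small synthetic image with tiny blocks.

```python
import io

def extract_extent(image, block_num, size, container_offset=20480, block_size=4096):
    """Read `size` bytes starting at the given block of a seekable binary file."""
    image.seek(container_offset + block_num * block_size)
    data = image.read(size)
    if len(data) != size:
        raise ValueError("image is shorter than the requested extent")
    return data

print(20480 + 105 * 4096)  # 450560, the skip value used with dd

# Synthetic image with 4-byte blocks so the demo stays small
fake = io.BytesIO(bytes(16) + b"AAAABBBBCCCCDDDD")
chunk = extract_extent(fake, block_num=2, size=6, container_offset=16, block_size=4)
print(chunk)
```

On the real image this would be called with the file opened in "rb" mode, block_num=105 and size=138278, and the returned bytes written to file.jpeg.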

image

image

Finally, that is all we wanted. There are many things in APFS that would need days to explain, like how Object Maps work. Also, you may notice that every structure ends with __attribute__((packed)); this tells the compiler not to insert padding between members, so the structure's layout in memory matches the bytes on disk.

I would like to thank Sarah Edwards for the great course and awesome materials; it is really one of the juiciest courses I have come across.

That's it. Thanks for reading such a long blog.

References :

  1. FOR518 Course by SANS
  2. jamf-100-course
  3. Invoke-IR ForensicPosters
  4. Mobile Forensics – The File Format Handbook
  5. Apple File System Reference