Try  HackMD Logo HackMD

Boosting Performance: Optimizing Directory Extent Block Access

Legacy design

The current design:

  • New file/ directory add into parent directory
    find the latest emply file sector and store it
  • Remove a file/ directory
    Set the inode to 0 and move the following sector forward.

New design

Inspired by ext4, we added a nr_blk field to record the number of contiguous blocks within the file.

Ext4 dir entry info

​​​​* __le32	inode;			/* Inode number */
​* __le16	rec_len;		/* Directory entry length */
​* __u8	    name_len;		/* Name length */
​* __u8	    file_type;		/* See file type macros EXT4_FT_* below */

e.g.

01091000  02 00 00 00 0c 00 01 02  2e 00 00 00 02 00 00 00  |................|
01091010  0c 00 02 02 2e 2e 00 00  0b 00 00 00 14 00 0a 02  |................|
01091020  6c 6f 73 74 2b 66 6f 75  6e 64 00 00 0c 00 00 00  |lost+found......|
01091030  10 00 05 02 74 65 73 74  31 00 00 00 01 20 00 00  |....test1.... ..|
01091040  10 00 05 02 74 65 73 74  32 00 00 00 02 20 00 00  |....test2.... ..|
01091050  10 00 05 02 74 65 73 74  33 00 00 00 03 20 00 00  |....test3.... ..|
01091060  98 0f 05 02 74 65 73 74  34 00 00 00 00 00 00 00  |....test4.......|
01091070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

lost+found:

__le32	inode =0x00 00 00 0b
__le16	rec_len = 0x0014
__u8	name_len = 0x0a(10)
__u8	file_type = 0x02	
file_name = 6c 6f 73 74 2b 66 6f 75  6e 64

-> empty space = 0x00 00 (len = 2)
empty len = 0x14 - 0x04(inode) - 0x02(rec_len) - 0x01(name_len) - 0x01(file_type) - 0x0a(file_name) = 0x02

test1:

__le32	inode =0x00 00 20 01
__le16	rec_len = 0x0010
__u8	name_len = 0x05(16)
__u8	file_type = 0x02
file_name = 74 65 73 74  31

-> empty space = 0x00 00 00 (len = 3)
empty len = 0x10 - 0x04(inode) - 0x02(rec_len) - 0x01(name_len) - 0x01(file_type) - 0x05(file_name) = 0x03

When removing a file or directory in Ext4, the filesystem merges the freed space with the previous to reduce fragmentation. This way, the space left behind can be efficiently reused.

Later, when a new file or directory is created, Ext4 scans the directory entries from the beginning β€” a first-fit strategy β€” to find available space. If it encounters a free segment that is larger than the requested size, Ext4 will sperate from this segment.

simplefs modify

struct simplefs_file {
    uint32_t inode;
+    uint32_t nr_blk;
    char filename[SIMPLEFS_FILENAME_LEN];
};

struct simplefs_dir_block {
+    uint32_t nr_files;
    struct simplefs_file files[SIMPLEFS_FILES_PER_BLOCK];
};

  • simplefs_file->nr_blk
    • record the number of blocks by following this file
    • e.g
      • nr_blk = 1 -> only this file
      • nr_blk = 3 -> this + 2 empty simplefs_file
  • simplefs_dir_block->nr_files
    • number of files in the simplefs_file[]

limitation

  1. The maximum number of files allowed per directory is reduced to 30,600 (for 200 M image file)
  2. Listing operation is slower compared to the legacy method
  3. Creating operation is slower compared to the legacy method

Modify code

code structure

(simplefs_file_ei_block)                (simplefs_extent)
eblock---- nr_files                     |-- nr_files (NEW)
       |                                |
       |-- simplefs_extent extents[ei] ---- ee_block
                                        |
                                        |-- ee_len
                                        |   (point to simplefs_dir_block)         (simplefs_file)
                                        |-- ee_start ---- nr_files (NEW)           |-- nr_blk (NEW)
                                                      |                            |
                                                      |-- simplefs_file files[fi] ---- inode
                                                                                   |
                                                                                   |-- filename

simplefs_create flow

Created with RaphaΓ«l 2.2.0simplefs_createget new inodefree blk in "file_ei_block.extents"alloc new simplefs_dir_blockalloc a new onefree bi in simplefs_dir_blockfree fi in simplefs_filesimplefs_set_file_into_dirfirst nr_blk > 1set next(fi + 1) nr_blk = previous - 1set current(fi ) nr_blk = 1endyesno

simplefs_remove_from_dir

Created with RaphaΓ«l 2.2.0simplefs_remove_from_dirfind fi in simplefs_fileset inode and merge into previous blkend

Test the performance

  • Remove 30600 files:

    • legacy method
    ​​​​      22062.16 msec task-clock                       #    0.996 CPUs utilized             
    ​​​​          1795      context-switches                 #   81.361 /sec                      
    ​​​​             7      cpu-migrations                   #    0.317 /sec                      
    ​​​​          4747      page-faults                      #  215.165 /sec                      
    ​​​​   51268295368      cycles                           #    2.324 GHz                       
    ​​​​   45654883366      instructions                     #    0.89  insn per cycle            
    ​​​​   10488643855      branches                         #  475.413 M/sec                     
    ​​​​      15951697      branch-misses                    #    0.15% of all branches           
    
    ​​​​  22.157212315 seconds time elapsed
    
    ​​​​   0.147259000 seconds user
    ​​​​  21.875822000 seconds sys
    
    • use block record
    ​​​​      14749.43 msec task-clock                       #    0.985 CPUs utilized             
    ​​​​          3707      context-switches                 #  251.332 /sec                      
    ​​​​             8      cpu-migrations                   #    0.542 /sec                      
    ​​​​          4725      page-faults                      #  320.351 /sec                      
    ​​​​   35578980636      cycles                           #    2.412 GHz                       
    ​​​​   41977131884      instructions                     #    1.18  insn per cycle            
    ​​​​    9800763267      branches                         #  664.484 M/sec                     
    ​​​​       6998037      branch-misses                    #    0.07% of all branches           
    
    ​​​​  14.968838411 seconds time elapsed
    
    ​​​​   0.101744000 seconds user
    ​​​​  14.571970000 seconds sys
    
  • create file

    • legacy
    ​​​​     102618.47 msec task-clock                       #    0.742 CPUs utilized             
    ​​​​        135243      context-switches                 #    1.318 K/sec                     
    ​​​​         48370      cpu-migrations                   #  471.358 /sec                      
    ​​​​       3855991      page-faults                      #   37.576 K/sec                     
    ​​​​  190008162916      cycles                           #    1.852 GHz                       
    ​​​​  137070979481      instructions                     #    0.72  insn per cycle            
    ​​​​   28270105922      branches                         #  275.488 M/sec                     
    ​​​​     561816180      branch-misses                    #    1.99% of all branches           
    
    ​​​​ 138.242576340 seconds time elapsed
    
    ​​​​  20.828307000 seconds user
    ​​​​  85.203551000 seconds sys
    
    • use block record
    ​​​​     112542.76 msec task-clock                       #    0.759 CPUs utilized             
    ​​​​        133884      context-switches                 #    1.190 K/sec                     
    ​​​​         50032      cpu-migrations                   #  444.560 /sec                      
    ​​​​       3856118      page-faults                      #   34.264 K/sec                     
    ​​​​  212785219291      cycles                           #    1.891 GHz                       
    ​​​​  141859239946      instructions                     #    0.67  insn per cycle            
    ​​​​   28871184080      branches                         #  256.535 M/sec                     
    ​​​​     569827259      branch-misses                    #    1.97% of all branches           
    
    ​​​​ 148.324364965 seconds time elapsed
    
    ​​​​  20.976661000 seconds user
    ​​​​  94.894073000 seconds sys
    
  • list all files

    • legacy
    ​​​​        165.21 msec task-clock                       #    0.968 CPUs utilized             
    ​​​​            15      context-switches                 #   90.796 /sec                      
    ​​​​             3      cpu-migrations                   #   18.159 /sec                      
    ​​​​          2220      page-faults                      #   13.438 K/sec                     
    ​​​​     389961636      cycles                           #    2.360 GHz                       
    ​​​​     577849909      instructions                     #    1.48  insn per cycle            
    ​​​​     108154959      branches                         #  654.666 M/sec                     
    ​​​​        315500      branch-misses                    #    0.29% of all branches           
    
    ​​​​   0.170592822 seconds time elapsed
    
    ​​​​   0.073521000 seconds user
    ​​​​   0.092742000 seconds sys
    
    • use block record
    ​​​​        162.95 msec task-clock                       #    0.966 CPUs utilized             
    ​​​​            17      context-switches                 #  104.326 /sec                      
    ​​​​             3      cpu-migrations                   #   18.410 /sec                      
    ​​​​          2215      page-faults                      #   13.593 K/sec                     
    ​​​​     377643105      cycles                           #    2.318 GHz                       
    ​​​​     570636788      instructions                     #    1.51  insn per cycle            
    ​​​​     106418855      branches                         #  653.075 M/sec                     
    ​​​​        302796      branch-misses                    #    0.28% of all branches           
    
    ​​​​   0.168633385 seconds time elapsed
    
    ​​​​   0.081990000 seconds user
    ​​​​   0.081889000 seconds sys
    
  • Non-linear file removal, including recreating the same number of files

    • legacy
    ​​​​ Performance counter stats for 'bash ../ut_script/random_remove_and_create.sh':
    
    ​​​​     245107.46 msec task-clock                       #    0.930 CPUs utilized             
    ​​​​        235366      context-switches                 #  960.256 /sec                      
    ​​​​          3848      cpu-migrations                   #   15.699 /sec                      
    ​​​​       7676129      page-faults                      #   31.317 K/sec                     
    ​​​​  447994787202      cycles                           #    1.828 GHz                       
    ​​​​  311853053406      instructions                     #    0.70  insn per cycle            
    ​​​​   64490868435      branches                         #  263.113 M/sec                     
    ​​​​    1103054966      branch-misses                    #    1.71% of all branches           
    
    ​​​​ 263.676798169 seconds time elapsed
    
    ​​​​  41.648814000 seconds user
    ​​​​ 209.926699000 seconds sys
    
    • use block record
    ​​​​         198258.26 msec task-clock                       #    0.941 CPUs utilized             
    ​​​​        208116      context-switches                 #    1.050 K/sec                     
    ​​​​          3557      cpu-migrations                   #   17.941 /sec                      
    ​​​​       7737611      page-faults                      #   39.028 K/sec                     
    ​​​​  383803751737      cycles                           #    1.936 GHz                       
    ​​​​  269416592342      instructions                     #    0.70  insn per cycle            
    ​​​​   54192972144      branches                         #  273.345 M/sec                     
    ​​​​    1046335847      branch-misses                    #    1.93% of all branches           
    
    ​​​​ 210.784186080 seconds time elapsed
    
    ​​​​  39.877310000 seconds user
    ​​​​ 164.927539000 seconds sys