# Lecture 14: File Systems – Allocation and Performance Strategies
# Slide Notes
## Free Space and Allocation Issues
How do I keep track of a file system's free space and how do I allocate new disk blocks when needed?
File systems are not usually static: we create, destroy, and change files, and such changes convert unused disk blocks to used blocks or vice versa.
We need a correct and efficient way to do this, which typically implies maintaining a free list of unused disk blocks.
Here the free elements are fixed-size blocks, locality matters, and some media raise their own issues (e.g., flash, with its erasure and wear-leveling concerns).
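To make the bookkeeping concrete, here is a minimal sketch (in C, with invented names like `block_bitmap` and `NBLOCKS`) of the common bitmap representation of free space, one bit per fixed-size block; a real file system keeps this structure on disk and must update it carefully.

```c
#include <stdint.h>

#define NBLOCKS 4096                      /* hypothetical volume size, in blocks */

static uint8_t block_bitmap[NBLOCKS / 8]; /* one bit per block: 1 = used, 0 = free */

/* Find a free block, mark it used, and return its number (-1 if the disk is full). */
int alloc_block(void) {
    for (int b = 0; b < NBLOCKS; b++) {
        if (!(block_bitmap[b / 8] & (1 << (b % 8)))) {
            block_bitmap[b / 8] |= (uint8_t)(1 << (b % 8));
            return b;
        }
    }
    return -1;
}

/* Return a block to the free pool. */
void free_block(int b) {
    block_bitmap[b / 8] &= (uint8_t)~(1 << (b % 8));
}
```

A smarter allocator would start scanning near related blocks rather than always from block 0, which is exactly the locality concern that motivates the rest of this lecture.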
### Create a New File
For UNIX, we search the super-block's free inode list and take the first free inode.
For DOS, we search the parent directory for an unused directory entry.
We then initialize the new file control block with its metadata: file type, protection, and ownership.
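A rough sketch of that create path (the structures and field names here are invented for illustration; real systems keep the inode table on disk):

```c
#include <stddef.h>
#include <time.h>

#define NINODES 1024

struct inode {          /* simplified, invented file control block */
    int    in_use;
    int    type;        /* regular file, directory, ... */
    int    mode;        /* protection bits */
    int    uid;         /* ownership */
    long   size;
    time_t ctime;
};

static struct inode itable[NINODES];  /* hypothetical in-memory inode table */

/* Take the first free inode and initialize its metadata, as described above. */
struct inode *create_inode(int type, int mode, int uid) {
    for (int i = 0; i < NINODES; i++) {
        if (!itable[i].in_use) {
            itable[i] = (struct inode){ .in_use = 1, .type = type,
                                        .mode = mode, .uid = uid,
                                        .size = 0, .ctime = time(NULL) };
            return &itable[i];
        }
    }
    return NULL;        /* out of inodes */
}
```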
### Extending a File
An application requests that new data be assigned to a file. This may be an explicit allocation request, or it may be implicit (e.g., a write that extends past the current end of the file).
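Both flavors are visible from user space with standard POSIX calls: `posix_fallocate()` requests space explicitly, while `write()`-ing at the end of a file extends it implicitly. A minimal illustration (file name invented):

```c
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("demo.dat", O_CREAT | O_RDWR, 0644);
    if (fd < 0) return 1;

    /* Explicit: ask the file system to reserve 1 MiB of blocks up front. */
    posix_fallocate(fd, 0, 1 << 20);

    /* Implicit: appending forces the file system to allocate new blocks. */
    lseek(fd, 0, SEEK_END);
    if (write(fd, "more data\n", 10) < 0)
        return 1;

    close(fd);
    return 0;
}
```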
# Chapter 41: Locality and the Fast File System
### Recall: VSFS layout
The **super block (S)** contained information about the entire file system: how big the volume is, how many inodes there are, a pointer to the head of a free list of blocks, and so forth.
The **inode** region of the disk contained all the inodes for the file system.
Finally, most of the disk was taken up by data blocks.
## The Problem: Poor Performance
The main issue was that the old UNIX file system treated the disk like it was a random-access memory; data was spread all over the place without regard to the fact that the medium holding the data was a disk, and thus had real and expensive positioning costs.
For example, the data blocks of a file were often very far away from its inode, thus inducing an expensive seek whenever one first read the inode and then the data blocks of a file.
The file system could also become **fragmented** if the free space was not carefully managed.
- There are now **defragmentation** tools to help with this

(The omitted figure shows a new file E whose blocks end up split across the free holes left by deleted files.) E is no longer contiguous here, so you don't get peak performance when accessing E.
Smaller blocks were good because they minimized internal fragmentation (waste within the block), but bad for transfer as each block might require a positioning overhead to reach it.
## Disk Awareness is the Solution
**Fast File System (FFS)** was designed to have "disk aware" structures and allocation policies to improve performance.
- kept the same interface (same API) but changed the internal implementation
### Organizing Structure: The Cylinder Group
FFS divides the disk into a number of **cylinder groups**. A single **cylinder** is a set of tracks on different surfaces of a hard drive that are the same distance from the center of the drive.

Note that modern drives do not export enough information for the file system to truly understand whether a particular cylinder is in use; as discussed previously [AD14a], disks export a logical address space of blocks and hide details of their geometry from clients.
Thus, modern file systems (such as Linux ext2, ext3, and ext4) instead organize the drive into **block groups**, each of which is just a consecutive portion of the disk’s address space.
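Because a block group is just a consecutive slice of the logical address space, mapping a block to its group is simple arithmetic. A sketch (`BLOCKS_PER_GROUP` is a made-up constant; ext2 stores the real value in its superblock):

```c
#define BLOCKS_PER_GROUP 8192  /* hypothetical group size, in blocks */

/* Which group does logical block b live in, and where within that group? */
static inline long group_of(long b)  { return b / BLOCKS_PER_GROUP; }
static inline long offset_in(long b) { return b % BLOCKS_PER_GROUP; }
```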

Critically, by placing two files within the same group, FFS can ensure that accessing one after the other will not result in long seeks across the disk.
Within a single group, FFS lays out the structures as follows (per the omitted figure): `[ S | ib | db | inodes | data ]`
FFS keeps a **copy** of the **super block (S)** in each group for reliability reasons. The super block is needed to mount the file system; by keeping multiple copies, if one copy becomes corrupt, you can still mount and access the file system by using a working replica.
A per-group **inode bitmap (ib)** and **data bitmap (db)** track which inodes and data blocks in each group are free or allocated.
Finally, the **inode** and **data block regions** are just like those in the previous very-simple file system (VSFS).
### Policies: How to Allocate Files and Directories
The basic mantra is simple: **keep related stuff together** (and its corollary, **keep unrelated stuff far apart**).
The first is the **placement of directories**. FFS employs a simple approach: find the cylinder group with a low number of allocated directories (to balance directories across groups) and a high number of free inodes (to subsequently be able to allocate a bunch of files), and put the directory data and inode in that group.
- Of course, other heuristics could be used here (e.g., taking into account the number of free data blocks).
For **files**, FFS does two things:
1. First, it makes sure (in the general case) to allocate the data blocks of a file in the same group as its inode, thus preventing long seeks between inode and data (as in the old file system).
2. Second, it places all files that are in the same directory in the cylinder group of the directory they are in.
- Thus, if a user creates four files, /a/b, /a/c, /a/d, and /b/f, FFS would try to place the first three near one another (same group) and the fourth far away (in some other group).

The example figure (omitted) assumes regular files are each two blocks in size and that directories have just a single block of data.
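A hedged sketch of both heuristics (the per-group statistics and names such as `free_inodes` are invented stand-ins for the summary information real FFS keeps per cylinder group):

```c
#define NGROUPS 32

struct group_stats {
    int ndirs;          /* directories already allocated in this group */
    int free_inodes;    /* free inodes remaining */
    int free_blocks;    /* free data blocks remaining */
};

static struct group_stats groups[NGROUPS];

/* Directory placement: prefer few existing directories, many free inodes. */
int pick_group_for_dir(void) {
    int best = 0;
    for (int g = 1; g < NGROUPS; g++) {
        if (groups[g].ndirs < groups[best].ndirs ||
            (groups[g].ndirs == groups[best].ndirs &&
             groups[g].free_inodes > groups[best].free_inodes))
            best = g;
    }
    return best;
}

/* File placement: use the parent directory's group, so the inode, its data,
 * and sibling files all land near one another on disk. */
int pick_group_for_file(int parent_dir_group) {
    return parent_dir_group;
}
```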
The FFS policy heuristics are not based on extensive studies of file system traffic or anything particularly nuanced; rather, they are based on good old-fashioned common sense (isn’t that what CS stands for after all?)
Files in a directory are often accessed together: imagine compiling a bunch of files and then linking them into a single executable.
Because such namespace-based locality exists, FFS will often improve performance, making sure that seeks between related files are nice and short.
### Measuring File Locality
Specifically, we’ll use the SEER traces [K94] and analyze how “far away” file accesses were from one another in the directory tree.
- Our distance metric, in other words, measures how far up the directory tree you have to travel to find the common ancestor of two files; the closer they are in the tree, the lower the metric.
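A small sketch of this metric (assuming normalized, absolute paths; the component limit is arbitrary):

```c
#include <stdio.h>
#include <string.h>

#define MAXCOMP 64

/* Split "/a/b/c" into its components; returns the component count. */
static int split(const char *path, char comps[MAXCOMP][64]) {
    char buf[1024];
    int n = 0;
    strncpy(buf, path, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *t = strtok(buf, "/"); t && n < MAXCOMP; t = strtok(NULL, "/")) {
        strncpy(comps[n], t, 63);
        comps[n][63] = '\0';
        n++;
    }
    return n;
}

/* Hops up the tree from the first file to the common ancestor of the two:
 * same file -> 0, two files in the same directory -> 1, and so on. */
int namespace_distance(const char *p1, const char *p2) {
    char c1[MAXCOMP][64], c2[MAXCOMP][64];
    int n1 = split(p1, c1), n2 = split(p2, c2);
    int common = 0;
    while (common < n1 && common < n2 && strcmp(c1[common], c2[common]) == 0)
        common++;
    return n1 - common;
}

int main(void) {
    printf("%d\n", namespace_distance("/a/b", "/a/b")); /* 0: same file      */
    printf("%d\n", namespace_distance("/a/b", "/a/c")); /* 1: same directory */
    printf("%d\n", namespace_distance("/a/b", "/d/e")); /* 2: share only /   */
    return 0;
}
```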

Nearly 40% of file accesses were to either the same file or to one in the same directory (i.e., a distance of zero or one). Thus, the FFS locality assumption seems to make sense (at least for these traces).
For comparison, the graph also shows locality for a “Random” trace. The random trace was generated by selecting files from within an existing SEER trace in random order, and calculating the distance metric between these randomly-ordered accesses.
As you can see, there is less namespace locality in the random traces, as expected. However, because eventually every file shares a common ancestor (e.g., the root), there is some locality, and thus random is useful as a comparison point.
### The Large-File Exception
Without a different rule, a large file would entirely fill the block group it is first placed within (and maybe others). Filling a block group in this manner is undesirable, as it prevents subsequent “related” files from being placed within this block group, and thus may hurt file-access locality.
For **large files**, after some number of blocks are allocated into the first block group, FFS places the next “large” chunk of the file in another block group (perhaps chosen for its low utilization). Then the next chunk of the file is placed in yet another block group, and so on.
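A loose sketch of this allocation loop (the chunk size, counters, and helper are all invented; real FFS actually switched groups at boundaries tied to the inode's structure, e.g., after the first twelve direct blocks):

```c
#define NGROUPS      32
#define CHUNK_BLOCKS 1024        /* hypothetical "large chunk" size, in blocks */

static int used_blocks[NGROUPS]; /* toy per-group utilization counters */

/* Toy helper: pick the group with the fewest used blocks. */
static int least_utilized_group(void) {
    int best = 0;
    for (int g = 1; g < NGROUPS; g++)
        if (used_blocks[g] < used_blocks[best]) best = g;
    return best;
}

/* Place block number n of a large file: every CHUNK_BLOCKS blocks,
 * hop to a lightly used group instead of filling the current one. */
int place_large_file_block(long n, int *current_group) {
    if (n % CHUNK_BLOCKS == 0)
        *current_group = least_utilized_group();
    used_blocks[*current_group]++;   /* stand-in for a real bitmap update */
    return *current_group;
}
```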
We do not want this: all of the large file's blocks crammed into a single group, filling it completely.

We want this: the file's blocks spread in large chunks across several groups, leaving room in each group for related files.
Spreading the file across groups hurts sequential performance, but it is the lesser of two evils: if the chunk size is large enough, the file system will spend most of its time transferring data from disk and just a (relatively) little time seeking between chunks of the file.
- This process of reducing an overhead by doing more work per overhead paid is called **amortization** and is a common technique in computer systems.

The closer you want to get to peak bandwidth, the bigger these chunks must be; the calculation below makes this precise.
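With positioning time $T_{pos}$ and peak transfer rate $R$, a chunk of $C$ bytes achieves effective bandwidth $C / (T_{pos} + C/R)$; demanding a fraction $f$ of peak and solving for $C$:

$$
\frac{C}{T_{pos} + C/R} = f \cdot R
\quad\Longrightarrow\quad
C = \frac{f}{1-f}\, R\, T_{pos}
$$

With the chapter's round numbers ($T_{pos}$ = 10 ms, $R$ = 40 MB/s): half of peak ($f = 0.5$) requires $C$ = 40 MB/s × 10 ms = 409.6 KB per positioning, while 90% of peak requires $9\times$ that, about 3.69 MB.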
### A Few Other Things About FFS
In particular, the designers were extremely worried about accommodating small files; as it turned out, many files were 2KB or so in size back then, and using 4KB blocks, while good for transferring data, was not so good for space efficiency. This **internal fragmentation** could thus lead to roughly half the disk being wasted for a typical file system.
They decided to introduce **sub-blocks**, which were 512-byte little blocks that the file system could allocate to files.
As the file grew, the file system would continue allocating 512-byte blocks to it until it acquired a full 4KB of data. At that point, FFS would find a 4KB block, copy the sub-blocks into it, and free the sub-blocks for future use.
- This process is inefficient, requiring a lot of extra work from the file system
Thus, FFS generally avoided this pessimal behavior by modifying the libc library; the library would buffer writes and then issue them in 4KB chunks to the file system, thus avoiding the sub-block specialization entirely in most cases.
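The same trick is visible in any C program today: stdio buffers small writes in user space and hands them to the kernel in large chunks. A minimal illustration (the buffer size is our choice here, via `setvbuf`):

```c
#include <stdio.h>

int main(void) {
    FILE *f = fopen("out.txt", "w");
    if (!f) return 1;

    /* Give stdio a 4 KB buffer; the many tiny fwrite()s below get coalesced
     * into 4 KB write() calls to the file system, much as FFS's libc did. */
    setvbuf(f, NULL, _IOFBF, 4096);

    for (int i = 0; i < 10000; i++)
        fwrite("x", 1, 1, f);

    fclose(f);   /* flushes the final partial buffer */
    return 0;
}
```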
Another problem arose during sequential reads. FFS would first issue a read to block 0; by the time the read was complete and FFS issued a read to block 1, it was too late: block 1 had already rotated under the head, and the read to block 1 would now incur a full rotation.
FFS solved this problem with a different layout: by skipping over every other block, FFS had enough time to request the next block before it passed under the disk head.
- FFS was smart enough to figure out for a particular disk how many blocks it should skip in doing layout in order to avoid the extra rotations; this technique was called **parameterization**
- In the old days this was the price paid: an every-other-block layout delivers at most 50% of peak bandwidth, since you must go around each track twice to read every block once
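A sketch of the calculation (all parameters invented): if one rotation takes `t_rot_ms`, the track holds `sectors_per_track` blocks, and the system needs `t_req_ms` to issue the next request after a read completes, the layout must skip enough blocks to cover `t_req_ms`:

```c
#include <math.h>

/* Blocks to skip between logically consecutive blocks so the next block has
 * not yet rotated past the head by the time its request is issued. */
int skip_factor(double t_rot_ms, int sectors_per_track, double t_req_ms) {
    double t_per_block_ms = t_rot_ms / sectors_per_track;
    return (int)ceil(t_req_ms / t_per_block_ms);
}
```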
Fortunately, modern disks are much smarter: they internally read the entire track in and buffer it in an internal disk cache (often called a **track buffer** for this very reason). Then, on subsequent reads to the track, the disk will just return the desired data from its cache.
FFS was one of the first file systems to allow for **long file names**, thus enabling more expressive names in the file system instead of the traditional fixed-size approach (e.g., 8 characters).
Further, a new concept was introduced called a **symbolic link**.
FFS also introduced an atomic rename() operation for renaming files. Usability improvements, beyond the basic technology, also likely gained FFS a stronger user base.
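Both features survive unchanged in POSIX today; a tiny illustration (paths invented):

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* A symbolic link: a file whose contents name another path. */
    if (symlink("target.txt", "alias.txt") != 0)
        perror("symlink");

    /* rename() is atomic: other processes see either the old file or the
     * new one at this name, never a half-renamed in-between state. */
    if (rename("new-version.txt", "current.txt") != 0)
        perror("rename");
    return 0;
}
```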