# Final Report For Project 3: <br/> File Systems
## Changes Made
### Task 1: Buffer Cache
#### Additional Functions/Macros
In order to streamline our implementation, we added a few new helper functions:
- `void filesys_read (block_sector_t sector, void *buffer)`
- Public wrapper for reading from the filesystem block device via the buffer cache
- `void filesys_write (block_sector_t sector, void *buffer)`
- Public wrapper for writing to the filesystem block device via the buffer cache
- `static void enter_monitor (bool writer)`:
- Enters the buffer cache's reader-writer monitor as a reader if `writer` is false and as a writer otherwise
- `static void leave_monitor (bool writer)`:
- Exits the buffer cache's reader-writer monitor as a reader if `writer` is false and as a writer otherwise
- `static void bcache_clear ()`:
- Resets the entire buffer cache by calling `bcache_flush` on every entry and zeroing out the data block
- `#define BCACHE_LOCK(index, name)`:
- Resolves to `(&(bcache_lock_table[index].name))`
- Helps code clarity
#### Access Syntax
In accordance with the recommendations made in our design review, we decided against our original plan of modifying the internal logic of `block_read` and `block_write`, and instead added a transparent API layer on top of the buffer cache. We then replaced the `block_` calls in `inode.c` with their newly implemented `filesys_` equivalents.
In addition, we decided to change the linkage of most internal bcache functions to static, as they are not meant to be used by any translation unit other than that of `block.c` itself. The functions that remain globally visible are:
- `filesys_read`
- `filesys_write`
- `bcache_clear`
- `bcache_init`
_Note: `bcache_read` and `bcache_write` do not read/write to the buffer cache in the traditional sense; these are the low-level functions that actually perform the action of reading/writing to an individual cache line once given an index to lock on to, as noted in the comments._
#### Function Changes
In order to aid in testing and debugging, we added synchronization precondition checks to each cache function, asserting the ownership status of the relevant locks (see the sketch after this list):
- `bcache_read (size_t index, void *buffer)`:
1. The 'status' lock at the given index _is not_ owned by the current thread
2. The 'data' lock at the given index _is_ owned by the current thread
- `bcache_write (size_t index, void *buffer)`:
1. The 'status' lock at the given index _is not_ owned by the current thread
2. The 'data' lock at the given index _is_ owned by the current thread
- `bcache_lookup (register block_sector_t sector)`:
1. There _is no_ writer currently engaged in the reader/writer monitor
- `bcache_clock (void)`:
1. There _is no_ writer currently engaged in the reader/writer monitor
2. The 'clock' lock _is_ owned by the current thread
- `bcache_evict (size_t index)`:
1. The current writer to the buffer cache's reader/writer monitor _is_ the current thread
2. The 'data' lock at the given index _is_ owned by the current thread
- `bcache_flush (size_t index)`:
1. The 'data' lock at the given index _is_ owned by the current thread
- `bcache_fetch (size_t index, block_sector_t sector)`:
1. The current writer to the buffer cache's reader/writer monitor _is_ the current thread
2. The 'status' lock at the given index _is not_ owned by the current thread
3. The 'data' lock at the given index _is_ owned by the current thread
#### Synchronization
Following the recommendations from our design review, we realized that it would be possible to simplify our synchronization logic without sacrificing any functionality.
Specifically, this involved removing the following components of our initial synchronization design:
- 1 total 'disk' lock
- replaced by simply requiring fetch/eviction go through our reader/writer monitor pattern as a writer
- actually improves on the functionality of the disk lock: the reader/writer pattern ensures that writers (attempts to fetch/evict) cannot starve, while our prior design could potentially result in long delays before evictions could occur
- 64 total 'entry' locks (1 per buffer cache entry)
- replaced by extending the critical sections of the 'data' and 'status' locks to cover the bits previously protected by the 'entry' or 'control' locks -- in particular,
- The valid bit of an entry
- The 'tag'/sector # bits of an entry
- the valid bit is now covered by the 'data' lock, as it is only ever changed during a fetch/evict operation (which would acquire the data lock anyways)
- the 'tag' is now wholly protected through the reader/writer monitor; changes to the tag are only allowed by 'writers', ensuring that multiple 'readers' can operate in parallel, even jumping past each other when necessary -- an improvement on our original design!
We also extended our synchronization implementation by allowing reads/writes to the cache to exit the reader/writer monitor once they have locked onto the cache line they want to read from or write to, allowing fetches/evictions to _other_ cache lines to occur in parallel (a sketch of the monitor logic follows this list).
- This is safe!
- Evictions/fetches to the same cache line would have to wait on that line's data lock anyways, ensuring mutual exclusion
- Once a read/write attempt has gotten past the lookup stage and found the line it was looking for, edits to the global cache index by a later fetch/evict operation no longer can cause any problems!
### Task 2: Extensible Files
#### Data Structures
```c
/* In-memory inode. */
struct inode
  {
    struct list_elem elem;              /* Element in inode list. */
    block_sector_t sector;              /* Sector number of disk location. */
    int open_cnt;                       /* Number of openers. */
    bool removed;                       /* True if deleted, false otherwise. */
    struct condition no_writers_cond;   /* CV for deny_write. */
    struct lock deny_write_lock;        /* Lock on deny_write. */
    int deny_write_cnt;                 /* 0: writes ok, >0: deny writes. */
    struct lock dir_lock;               /* Lock used to synch directory operations. */
    int writer_cnt;                     /* Number of writers. */
    struct lock metadata_lock;          /* Lock on inode metadata. */
    struct lock grow_lock;              /* Lock on inode growth. */
  };
```
- Change `inode` structure to not include `inode_disk`.
- Use new functions `fetch_inode_disk()` and `write_inode_disk()` to access and write back the `inode_disk` corresponding to `inode->sector` (sketched below).
- Add metadata and grow locks for use in synchronization (detailed below).
- Add a monitor to handle `file_deny_write` synchronization cases (detailed below).
#### Functions
- No changes to the overall algorithm logic.
- Add helper functions to simplify and abstract inode growth and reading/writing.
- We dictate which sectors to operate on in the higher-level functions `inode_create()`, `inode_write_at()`, and `inode_read_at()`.
- `inode_map()` maps a function `f` over every sector in the range from sector number `start` to `end`, abstracting the sector traversal logic (see the sketch after this list).
- We pass in `inode_write_sector()` to write to each sector.
- We pass in `inode_read_sector()` to read from each sector.
- We pass in `allocate_sector()` or `deallocate_sector()` to allocate or deallocate each sector.
- We use an intermediate-level function `inode_grow()` to simplify the logic for sector growth when creating or writing to an inode.
- Handle rollback to keep the `free_map` in a consistent state in `inode_create()`.
- Use the `inode_free()` helper function to call `free_map_release` on failure.
- Add fail-safe logic in `inode_write()`.
#### Synchronization
- Did not use our original implementation for synchronization.
- Add a list lock `inode_list_lock` which is acquired and released whenever modifying the `open_inodes` list.
- Also use this lock to avoid the case where a thread tries to open an already open inode from disk in `inode_open`.
- Add a freemap lock `freemap_lock` which is acquired and released whenever modifying the freemap through `free_map_allocate` and `free_map_release`.
- Add a metadata and grow lock for each individual `inode`.
- Acquire and release `inode->metadata_lock` whenever we change or access `inode` metadata.
- Handle case when two writes on concurrent threads cause redundant file growths using the `inode->grow_lock` lock.
- Handle `file_deny_write` cases with a monitor (sketched after this list)
- Wait for all threads to finish before processing `file_deny_write` call.
- Did not implement per-sector locking on inode level as it is handled in disk cache (part 1).
### Task 3: Subdirectories
#### Data Structures
- Originally, we added a lock in `struct dir` to synch read/writes on directories.
- However, each created `struct dir` is unique to its process and is regularly closed to avoid memory leaks. This would cause the lock to be destroyed, or to be inaccessible to other processes accessing the same underlying `struct inode`.
- Instead, we moved the lock down to the underlying `struct inode`.
- Originally, we only kept a boolean in the `fd_wrapper_t` struct in `syscall.h` to denote if an open file's inode corresponded to a directory.
- However, this prevented us from being able to track the `pos` value of a dir for `readdir`, since we'd just open and close `struct dir`s as needed
- We alleviated this by adding a pointer to a `struct dir` for each open file.
- Originally, we planned on using the 0th sector of every directory inode to contain a pointer to the parent directory.
- However, this would have involved unraveling many layers of abstraction and calling `inode_read_at`/`inode_write_at`, which could go wrong.
- Instead, every time we call `dir_add`, if the child inode represents a directory, we automatically `dir_add` two `dir_entry`s representing the child inode itself and its parent inode, respectively (sketched below).
#### Functions
- For convenience's sake, we added a series of additional getter functions for `fd_wrapper_t` elements and `struct inode` elements
- We had to correct issues with our string logic across the board, notably remembering to deep copy when creating copies of strings.
- We also had to account for previously unforeseen edge cases:
- Adding a case in `filesys_open` for the opening of the root directory (e.g. `fd = open("/")`)
- Skipping the parent and current directory in `readdir`
#### Algorithms
- No changes
#### Synchronization
- Originally, we decided to use the synchronization scheme that was in place from project 1 for non-`read`/`write` syscalls.
- However, we were told that this would not be acceptable at our design doc review.
- So, we instead introduced the aforementioned `inode->dir_lock`, which locks access only to the directory inodes being read or written, instead of the entire filesystem (sketched below).
## Student Testing Report
### General Test Information
Two tests were implemented: `bfc-write-only`, implementing the third option presented by the spec, and `bfc-hitrate`, implementing the first option presented by the spec.
In order to most directly measure the underlying system statistics, each test makes use of a specially-made syscall: `bfc-write-only` makes use of the `fsystat` syscall to measure reads/writes to the filesystem block device, and `bfc-hitrate` makes use of the `bfcstat` syscall to directly measure the hitrate of the buffer cache itself.
Both tests also make use of the file `src/tests/filesys/bfc.inc`, analogous to the `.inc` files used by test families such as `grow-child`, `grow-dir`, and `rox-child`; this file contains common includes, constants, and a set of functions (based on the contents of `src/tests/lib.c`) to perform the file reads/writes needed by these tests.
In order to avoid making overt changes to the makefile structure of `filesys/extended`, it was necessary for each test to have a check file for the `-persistence` case; however, as neither test directly involves the filesystem in a way that would require a call to the `check_archive` Perl submodule, each such check file is essentially a no-op (specifically, a trivial header followed immediately by a `pass` directive), and as such does not play an important role in the test.
### Test 1: Cache Hit Rate
#### General Information
This test is intended to ensure that the buffer cache actually improves the speed of filesystem reads/writes by checking the hitrate directly.
The test is called `bfc-hitrate` and is implemented with three files located in `src/tests/filesys/extended/`: `bfc-hitrate.c`, `bfc-hitrate.ck`, and `bfc-hitrate-persistence.ck`.
#### Test Mechanics
1. Create a 4096-byte file for use in the test via `create`
2. Write random bytes to the entirety of the file
3. Close the file
4. Flush the buffer cache and reset its hit/miss counts via `bfcstat (BFCSTAT_CLEAR | BFCSTAT_FLUSH)`, returning to a cold-cache state.
5. Perform one sequential read across the file with a 256-byte read size.
6. Measure the number of resultant cache hits and misses via `bfcstat (BFCSTAT_HITS)` and `bfcstat (BFCSTAT_MISSES)` respectively.
7. Reset the buffer cache hit and miss counts with `bfcstat (BFCSTAT_CLEAR)`.
8. Perform a second sequential read across the file with a 256-byte read size: this should have a notably higher hit rate.
9. Measure the number of resultant cache hits and misses via `bfcstat (BFCSTAT_HITS)` and `bfcstat (BFCSTAT_MISSES)` respectively.
10. The total number of accesses (hits and misses) should be the same across both reads; as such, by elementary algebra we may compare their hitrates (hits/total) simply by comparing the number of hits each read experienced. Check first that the number of cache accesses was equal in each case, and second that the number of cache hits on the second read exceeded the number from the first by a pre-determined margin -- in this case 10.
#### Desired Test Output
The file "ZOGCHAMP" should be created, be written to, and be closed;
the buffer cache should be flushed successfully;
the file "ZOGCHAMP" should open again, be read from once, be closed, be opened again, be read from a second time, and be closed.
The CHECK function stating that "Full cache should have at least 10 more hits than the cold cache." must pass.
#### Actual Test Output
1. `bfc-hitrate.output`
```
Copying tests/filesys/extended/bfc-hitrate to scratch partition...
Copying tests/filesys/extended/tar to scratch partition...
qemu-system-i386 -device isa-debug-exit -hda /tmp/Z0nTzzS8Xq.dsk -hdb tmp.dsk -m 4 -net none -nographic -monitor null
PiLo hda1^M
Loading.............^M
Kernel command line: -q -f extract run bfc-hitrate
Pintos booting with 3,968 kB RAM...
367 pages available in kernel pool.
367 pages available in user pool.
Calibrating timer... 195,993,600 loops/s.
hda: 1,008 sectors (504 kB), model "QM00001", serial "QEMU HARDDISK"
hda1: 208 sectors (104 kB), Pintos OS kernel (20)
hda2: 244 sectors (122 kB), Pintos scratch (22)
hdb: 5,040 sectors (2 MB), model "QM00002", serial "QEMU HARDDISK"
hdb1: 4,096 sectors (2 MB), Pintos file system (21)
filesys: using hdb1
scratch: using hda2
Formatting file system...done.
Boot complete.
Extracting ustar archive from scratch device into file system...
Putting 'bfc-hitrate' into the file system...
Putting 'tar' into the file system...
Erasing ustar archive...
Executing 'bfc-hitrate':
(bfc-hitrate) begin
(bfc-hitrate) create "ZOGCHAMP"
(bfc-hitrate) open "ZOGCHAMP" for writing
(bfc-hitrate) close "ZOGCHAMP"
(bfc-hitrate) ! ! ! Flushing buffer cache ! ! !
(bfc-hitrate) Commence read from cold cache...
(bfc-hitrate) open "ZOGCHAMP" for reading
(bfc-hitrate) close "ZOGCHAMP"
(bfc-hitrate) Cache should now be full.
(bfc-hitrate) Commence read from full cache...
(bfc-hitrate) open "ZOGCHAMP" for reading
(bfc-hitrate) close "ZOGCHAMP"
(bfc-hitrate) Full cache should have at least 10 more hits than the cold cache.
(bfc-hitrate) end
bfc-hitrate: exit(0)
Execution of 'bfc-hitrate' complete.
Timer: 93 ticks
Thread: 21 idle ticks, 65 kernel ticks, 8 user ticks
hdb1 (filesys): 53 reads, 509 writes
hda2 (scratch): 243 reads, 2 writes
Console: 1522 characters output
Keyboard: 0 keys pressed
Exception: 0 page faults
Powering off...
```
2. `bfc-hitrate.result`
```
PASS
```
#### Example Kernel Bugs
1. Bad Flushing
- If our kernel flushed sectors from the buffer cache on the closure of all relevant files,
- Then the test case would instead fail on the final CHECK, as both the initial and second read would have around the same hitrate
2. Broken Clock
- If our kernel's clock algorithm's hand never advanced beyond the 0th cache slot
- Then the test case would instead fail on the final CHECK, as both the initial and second read would have around the same hitrate, since only one sector would fit into the cache at a time before the clock hand would flush it back out
### Test 2: Fetch Semantics
#### General Information
This test is intended to ensure that the buffer cache does not fetch from the filesystem block device on a write request, save for where it is necessary to pull relevant inode sectors off of disk.
The test is called `bfc-write-only` and is implemented with three files located in `src/tests/filesys/extended/`: `bfc-write-only.c`, `bfc-write-only.ck`, and `bfc-write-only-persistence.ck`.
#### Test Mechanics
1. Create a 100 KB test file, along with a 1024-byte stack array buffer populated with random bytes
2. Call `fsystat (FSYSTAT_WRITES)` and `fsystat (FSYSTAT_READS)` to obtain a baseline number for the number of reads and writes to the filesystem block device.
3. Write the stack array buffer to the file, 1KB at a time.
4. Call `fsystat (FSYSTAT_WRITES)` and `fsystat (FSYSTAT_READS)` to obtain a new number for the number of reads and writes to the filesystem block device; the write (or read) differential is this new value minus the corresponding baseline from step 2.
5. Confirm that the write-differential is around where we expect it to be (~201 in our case)
6. Confirm that the read-differential is sufficiently small (<=1 in our case)
#### Desired Test Output
1. Create "ZOGCHAMP" file, open for writing
2. Write to the file
3. CHECK function confirms that "* new writes were made; up to 201 are allowed"
4. CHECK function confirms that "* new read/reads was/were made; up to 1 is/are allowed"
#### Actual Test Output
1. `bfc-write-only.output`
```
Copying tests/filesys/extended/bfc-write-only to scratch partition...
Copying tests/filesys/extended/tar to scratch partition...
qemu-system-i386 -device isa-debug-exit -hda /tmp/HKLnNuLx7r.dsk -hdb tmp.dsk -m 4 -net none -nographic -monitor null
PiLo hda1^M
Loading.............^M
Kernel command line: -q -f extract run bfc-write-only
Pintos booting with 3,968 kB RAM...
367 pages available in kernel pool.
367 pages available in user pool.
Calibrating timer... 230,604,800 loops/s.
hda: 1,008 sectors (504 kB), model "QM00001", serial "QEMU HARDDISK"
hda1: 208 sectors (104 kB), Pintos OS kernel (20)
hda2: 242 sectors (121 kB), Pintos scratch (22)
hdb: 5,040 sectors (2 MB), model "QM00002", serial "QEMU HARDDISK"
hdb1: 4,096 sectors (2 MB), Pintos file system (21)
filesys: using hdb1
scratch: using hda2
Formatting file system...done.
Boot complete.
Extracting ustar archive from scratch device into file system...
Putting 'bfc-write-only' into the file system...
Putting 'tar' into the file system...
Erasing ustar archive...
Executing 'bfc-write-only':
(bfc-write-only) begin
(bfc-write-only) create "ZOGCHAMP"
(bfc-write-only) open "ZOGCHAMP" for writing
(bfc-write-only) close "ZOGCHAMP"
(bfc-write-only) 201 new writes were made; up to 201 are allowed
(bfc-write-only) 1 new reads were made; up to 1 are allowed
(bfc-write-only) end
bfc-write-only: exit(0)
Execution of 'bfc-write-only' complete.
Timer: 103 ticks
Thread: 15 idle ticks, 74 kernel ticks, 14 user ticks
hdb1 (filesys): 45 reads, 900 writes
hda2 (scratch): 241 reads, 2 writes
Console: 1271 characters output
Keyboard: 0 keys pressed
Exception: 0 page faults
Powering off...
```
2. `bfc-write-only.result`
```
PASS
```
#### Example Kernel Bugs
1. Trivial Bug
- If our kernel pulled sectors into the cache on write operations,
- Then the test case would instead output a failure at the second CHECK function, noting that the number of disk reads was greater than 1
2. Bigg Cache
- If our kernel's buffer cache !ILLEGALLY! held more than 64 block sectors at once
- Then the test case would instead output a failure at the first CHECK function, noting that the number of disk writes was too small :(
## Reflection
### Individual Contributions
#### Siddharth Gollapudi
- Design, implementation, and writeup of task 3
- Debugged persistence tests and most of task 3
#### Richard Huang
- Worked with Willis to write and debug task 2
- Designed the `inode_map` abstraction to decouple the inode structure from per-sector inode operations
- Helped debug task 3
#### Marcus
- Design, implementation, and writeup for Task 1
- Implementation of student tests
- Student testing report
- Debugging throughout Tasks 1, 2, 3
#### Willis Wang
- Design, implementation, and writeup of task 2
- Debugged tasks 1 (partially), 2, and 3, and debugged all failing tests
### What Went Well
- We finished!
- Had a reasonably even work distribution even while working completely remotely.
- Effective debugging and communication through remote means (hangouts, slack, teamspork)
- Marcus hit Divine! pogchamp go learn sf now
### What Could Be Improved
- Having to debug failing tests that were nearly impossible to inspect through gdb (bad error messages, user code)
- Obtained a net loss of dota games during the time frame of the project.