Final Report for Project 1: User Programs
============================================
## Group Members
* Simon Li <xuanlinli17@berkeley.edu>
* Philip Zhao <philipzhao@berkeley.edu>
* Yifan Ning <yifanning@berkeley.edu>
* Albert Qu <albert_qu@berkeley.edu>
<br />
## Feature Summary
### Task 1: Argument Passing
#### 1. Headers for Major Data Structures and Functions Change
```c
// We put major changes in `load` function for argument parsing
bool load(const char *file_name, void (**eip)(void), void **esp)
```
<!-- /* loads string from stack*/
char* load_string_stack(void **esp, int len) -->
#### 2. Implementation Detail Change
1. We use `strtok_r` to split the file name into a list of arguments, `parsed_args`, and count the tokens to get `argc`. `parsed_args` is a fixed-size array of 64 `char *` entries; we do not rely on a dynamically allocated array to store the arguments.
2. The stack storage and stack alignment process is largely unchanged from the design in our Design Doc.
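
The tokenizing step can be sketched as below. This is a minimal host-side illustration, not the code in `load`: the helper name `parse_args` and the `MAX_ARGS` macro are hypothetical, and the sketch uses the POSIX `strtok_r` from `<string.h>` where Pintos would use its own `lib/string.h` version.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

#define MAX_ARGS 64 /* hypothetical name for the fixed capacity of 64 */

/* Splits CMDLINE in place on spaces and stores a pointer to each token
   in ARGV. Returns the number of tokens stored (argc). Tokens beyond
   MAX_ARGS are silently ignored in this sketch. */
static int parse_args(char *cmdline, char *argv[MAX_ARGS])
{
  assert(cmdline != NULL);

  int argc = 0;
  char *save_ptr;
  for (char *token = strtok_r(cmdline, " ", &save_ptr);
       token != NULL && argc < MAX_ARGS;
       token = strtok_r(NULL, " ", &save_ptr))
    argv[argc++] = token;
  return argc;
}
```

Because `strtok_r` writes NUL bytes into its input, the command line has to be a writable copy. Runs of consecutive spaces are skipped and do not produce empty arguments.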
### Task 2: Process Control Syscalls
Task 2 has been significantly revamped since our initial design. The entire new scheme is now shown below.
#### 1. Headers for Major Data Structures and Functions Change
threads/thread.h
```c
struct child
{
  tid_t tid;
  struct list_elem elem;
  struct semaphore sem;
  bool waited; // whether the thread has been called wait on
  /* Possible values:
   * 2: Both child and parent are alive.
   * 1: Either child or parent has terminated. */
  int exit_counter;
  int exit_code; // Exit code of the child thread. Should be initialized to -1.
};
struct thread
{
  /* Project 1 Task 2 adds the following fields: */
  bool children_init; // whether children list is initialized
  struct list children; // children list, takes type struct child
  struct child *child_in_parent; // pointer to the struct child in parent's list
  /* ******************** */
};
```
threads/thread.c
```c
/* Thread_create is modified to allocate a new child struct and
* add it to the parent's children list.
* If the new thread is running a user program, it also waits for
* the load to finish before returning. */
tid_t thread_create(const char *name, int priority,
thread_func *function, void *aux);
/* Thread_exit needs to free the dynamically allocated child struct.
* The struct is freed by either the parent or the child itself, whichever
* exits later. This function frees from the parent side. */
void thread_exit(void);
/* Finds the thread by tid from all threads. Returns NULL if not found. */
struct thread *find_thread_by_tid(tid_t query);
/* Finds a child with specified tid. Returns NULL if not found. */
struct child *get_child(tid_t child_tid);
```
userprog/process.c
```c
/* Start_process is modified to block its parent process
* using the child's semaphore until load finishes.
* If the load fails, the tid field in the child struct is
* changed to -1. */
void start_process(void *file_name_);
/* Process_wait is properly implemented to wait on the
* specified tid if it corresponds to a child. */
int process_wait(tid_t child_tid);
/* This function, along with thread_exit, frees the child struct.
* If the parent has terminated, it frees the child struct. */
void process_exit(void);
```
userprog/syscall.c
```c
/* Modify kernel code for syscall handling. */
static void syscall_handler(struct intr_frame *f);
/* Helper function:
Checks whether an address is valid. If invalid, exit program. */
void valid_user_addr(struct intr_frame *f, void *p);
/* Reads a byte at user virtual address UADDR.
UADDR must be below PHYS_BASE.
Returns the byte value if successful, -1 if a segfault
occurred. */
static int get_user (const uint8_t *uaddr);
/* Writes BYTE to user address UDST.
UDST must be below PHYS_BASE.
Returns true if successful, false if a segfault occurred. */
static bool put_user (uint8_t *udst, uint8_t byte);
```
userprog/exception.c
```c
/* Change handler to return -1 for Kernel page fault at user v-address */
static void page_fault (struct intr_frame *f);
```
#### 2. Implementation Detail Change
- `page_fault` is modified so that a page fault in kernel mode sets `%eax` to `0xffffffff` and copies the old value of `%eax` into `%eip`.
- `valid_user_addr` checks whether a pointer `p` is valid before it is dereferenced:
  - First, check that `is_user_vaddr(p)` holds.
  - Call `get_user(p)`. If it returns -1, the user address might be invalid.
  - Then try `put_user(p, 0xff)`. If it returns false, the address is invalid.
- We add more cases to `syscall_handler` to support the other syscalls. Every byte of each user-space memory access is checked for validity first. Let `t_curr` be a pointer to the current thread:
  - `practice`: Read the integer word at `f->esp + 4` in the `intr_frame`, add 1 to it, and put the result in `f->eax`. Finally, return through `iret`.
  - `exec`: Check that the string pointed to by `f->esp + 4` is valid user memory. Then call `process_execute` in `userprog/process.c` and put the returned tid in `f->eax`. `process_execute` waits until the load finishes, so the returned tid is -1 if the child process fails to execute.
  - `wait`: Call `process_wait` on the value at `f->esp + 4`. The implementation of `process_wait` is discussed below.
  - `exit`: Put the return code into the `struct child`, then call `thread_exit`.
- In `thread_create`, a `struct child` is allocated with `malloc`, given the `tid` of the new thread, and added to the `children` list of the current (parent) thread. We use `malloc` so that the `struct child` is not destroyed when the child thread exits. (An alternative would be to store the children in static memory in a fixed-size array, or in a fresh page dedicated to them.) Each child's semaphore is initialized to zero. The parent downs the semaphore to wait until the child finishes loading the user program in `start_process`, and only then reads the `tid` field.
<!-- What if child is killed before calling sema_up -->
- To handle the different cases of `wait` in `process_wait`, we use the `struct child` to carry all the required information and the semaphore.
  - In the normal case, `process_wait` downs the semaphore of the child thread. The semaphore is upped when the child thread exits.
  - To deal with corner cases, `struct child` holds the tid and exit code of the child process. The exit code is stored in the struct when the child process calls the `exit` syscall, so the parent can read the return value even after the child thread has terminated. `struct child` also has the `waited` field, which ensures that each tid can be waited on only once.
  - We need to be careful when freeing the heap memory for `struct child`. Whichever of the child / parent pair terminates later frees the memory. We use `exit_counter` to check whether one side of the pair has already terminated. When a thread exits, it decrements `exit_counter` in the struct of each of its children and in its own struct in its parent's list. If the other side is found to have already terminated, the struct is freed.
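
The `exit_counter` scheme behaves like a two-owner reference count. The sketch below illustrates it under simplifying assumptions: `struct child_record` is a hypothetical stand-in for `struct child` without the Pintos `list_elem` and semaphore, the helper names are invented for illustration, and the code is single-threaded. In the kernel, the decrement and the check would need to be atomic with respect to the other side (for example, with interrupts disabled or under a lock).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the report's struct child. */
struct child_record
{
  int tid;
  int exit_counter; /* 2: both alive, 1: one side has terminated */
  int exit_code;    /* -1 until the child calls exit */
};

/* Allocates a record on the heap so it outlives the child thread.
   Returns NULL if allocation fails. */
static struct child_record *child_record_create(int tid)
{
  struct child_record *c = malloc(sizeof *c);
  if (c == NULL)
    return NULL;
  c->tid = tid;
  c->exit_counter = 2;
  c->exit_code = -1;
  return c;
}

/* Called once by the parent and once by the child when each exits.
   The later caller frees the record. Returns true if this call freed it. */
static bool child_record_release(struct child_record *c)
{
  assert(c != NULL && c->exit_counter > 0);
  if (--c->exit_counter == 0)
    {
      free(c);
      return true;
    }
  return false;
}
```

With this shape, neither side needs to know which of the two exits first: the first `child_record_release` call leaves the record in place for the survivor, and the second call frees it.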
<br />
### Task 3: File Operation Syscalls
#### 1. Headers for Major Data Structures and Functions Change
All function signatures are unchanged from the initial design doc; they are copied here for reference.
thread.h
```c
/* Add struct definition. */
struct fd_file_map {
  int fd; // key
  struct file *file; // value
  struct list_elem elem;
};
/* Add struct fields. */
struct thread {
  ...
  struct file *cur_file; // for file_deny_write & file_allow_write
  int next_fd; // for allocate
  struct list fd_list; // for fd_file_map
  ...
};
```
syscall.c
```c
/* Add global lock for file system. */
struct lock *file_lock;
/* A helper function that gets the file given its fd by iterating the linked list */
struct file* get_file(int fd);
```
#### 2. Implementation Detail Change
Most of the procedures are unchanged from our initial design doc and are copied here for reference.
- For each thread, we add the field `cur_file` to record the executable being run. During the `load` call we call `file_deny_write` on the recorded file to prevent others from modifying the executable, and when the process finishes we call `file_allow_write` on `cur_file` to restore write access.
- Since `open` must associate an unused fd with a file, we use the `next_fd` field to hold the next free integer that can be assigned, and increment it by 1 once it is used. Note that we skip 0 and 1, as they are reserved for `stdin` and `stdout`.
- To keep track of all open files, we use a (doubly) linked list of `fd_file_map` entries that records the mapping between each file and the file descriptor assigned to it by `open`.
- Each file system call roughly follows these steps:
  - Before any file operation syscall, we validate the addresses of the arguments passed in: not only the addresses in `argv`, but also every address from `buff` to `buff + buff_size` during `read` and `write`. We also need to handle arguments that span a page boundary, which the `sc-boundary` tests check. If any address is invalid, we exit the user process.
  - Acquire the global file system lock to prevent multiple file system functions from running concurrently.
  - Retrieve the arguments from `argv`, use the linked list of `fd_file_map` entries to insert, look up, or remove a mapping depending on the specific file system operation, and then call the corresponding function in the file system library. If we find an invalid argument, such as an invalid fd, we exit the user process.
  - Release the global file system lock and return the result to the user.
- To prevent any modification of a running executable, we call `file_deny_write` before the process starts running it. To clean up, we call `file_allow_write` once the process is done with the executable.
- All file descriptors are closed when the process exits.
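
The fd bookkeeping described above can be sketched as follows. This is an illustrative host-side model, not the kernel code: it uses a plain singly linked list instead of the Pintos `struct list`, a `void *` in place of `struct file *`, and the names `fd_entry`, `fd_table`, and the `fd_table_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for fd_file_map and the per-thread fields. */
struct fd_entry
{
  int fd;                /* key */
  void *file;            /* value (struct file * in Pintos) */
  struct fd_entry *next;
};

struct fd_table
{
  int next_fd;           /* next descriptor to hand out */
  struct fd_entry *head;
};

/* Descriptors 0 and 1 are reserved for stdin and stdout. */
static void fd_table_init(struct fd_table *t)
{
  assert(t != NULL);
  t->next_fd = 2;
  t->head = NULL;
}

/* Maps FILE to a fresh descriptor. Returns the fd, or -1 on failure. */
static int fd_table_add(struct fd_table *t, void *file)
{
  struct fd_entry *e = malloc(sizeof *e);
  if (e == NULL)
    return -1;
  e->fd = t->next_fd++;
  e->file = file;
  e->next = t->head;
  t->head = e;
  return e->fd;
}

/* Returns the file mapped to FD, or NULL if FD is not open. */
static void *fd_table_get(const struct fd_table *t, int fd)
{
  for (const struct fd_entry *e = t->head; e != NULL; e = e->next)
    if (e->fd == fd)
      return e->file;
  return NULL;
}

/* Removes the mapping for FD. Returns false if FD is not open. */
static bool fd_table_remove(struct fd_table *t, int fd)
{
  for (struct fd_entry **link = &t->head; *link != NULL; link = &(*link)->next)
    if ((*link)->fd == fd)
      {
        struct fd_entry *e = *link;
        *link = e->next;
        free(e);
        return true;
      }
  return false;
}
```

A lookup that returns `NULL` corresponds to the invalid-fd case in the steps above, where the syscall handler would exit the user process. Since `next_fd` only increases, a closed descriptor is never reused in this sketch.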
<br />
## Project Reflection
- What exactly did each member do?
  - Yifan Ning: Wrote the design doc for task 3; wrote code for task 1 and task 3.
  - Philip Zhao: Wrote the design doc for "Get acquainted with pintos" and task 2; wrote code for the user memory check; wrote and debugged task 2.
  - Albert Qu: Wrote the design doc for "Get acquainted with pintos", task 1, and task 2; wrote code for the test suites; wrote the final report and test report.
  - Simon Li: Wrote the design doc for "Get acquainted with pintos", task 1, and task 2; wrote and debugged code for task 2; debugged the multi-oom test.
- What went well, and what could be improved?
  - We finished the whole project with all tests passing.
  - We missed checkpoint 2 and used one slip day, and our design doc was not very detailed. In the next project, we should start earlier and plan our schedule better.
  - Checkpoint 2 was the most difficult part of the project. At first we did not have a good plan for ensuring synchronization, which led to trial and error while writing the code and wasted time. For later projects, we need more rigorous and detailed designs.
  - Some bugs showed up as gcc warnings when the kernel was compiled. We should pay more attention to these warnings, as they can be good indicators of bugs and save debugging time.
## Student Testing Report
### Test 1: seek-full
1. Test Description
* It tests that the `open` syscall matches the expected `open` behavior.
* It tests that the `read` syscall matches the expected `read` behavior.
* It tests that the `seek` syscall positions the cursor at the designated position and handles arbitrary position inputs (potentially past `EOF`).
2. Test Mechanics
This test opens `sample.txt` and uses the `seek` syscall to set the offset to `BASE` and to `sizeof sample + 10` respectively, in order to test that:
* The data at `BASE`, accessible through the `sample` variable from `sample.inc`, is the same as the data read from the position set by `seek`.
* Calling `seek` with a position past `EOF` does not raise an error but places the cursor at the designated position. A `read` at this position results in an empty read.
The test prints a failure message with `fail` if any of the above does not match the expected behavior.
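
For comparison, POSIX `lseek` and `read` on a regular file show the same seek-past-`EOF` behavior on a host system. The sketch below is only an analogy under that assumption: it is not the Pintos test source, and the helper name `read_after_seek` and the temporary file path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Writes LEN bytes of DATA to a temporary file, seeks to OFFSET from the
   start, and tries to read up to 16 bytes. Returns the number of bytes
   read, or -1 if any step fails. */
static long read_after_seek(const char *data, size_t len, off_t offset)
{
  assert(data != NULL);

  char path[] = "/tmp/seek-sketch-XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0)
    return -1;
  unlink(path); /* the file is removed once fd is closed */

  long result = -1;
  char buf[16];
  if (write(fd, data, len) == (ssize_t) len
      && lseek(fd, offset, SEEK_SET) == offset)
    result = (long) read(fd, buf, sizeof buf);

  close(fd);
  return result;
}
```

Seeking 10 bytes past the end of a 5-byte file succeeds, and the following read returns 0 bytes, which is the empty read that `seek-full` expects from the Pintos syscalls.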
3. Test Output
seek-full.output
```
Copying tests/userprog/seek-full to scratch partition...
Copying ../../tests/userprog/sample.txt to scratch partition...
qemu-system-i386 -device isa-debug-exit -hda /tmp/1IKD1Q1b3o.dsk -m 4 -net none -nographic -monitor null
PiLo hda1
Loading...........
Kernel command line: -q -f extract run seek-full
Pintos booting with 3,968 kB RAM...
367 pages available in kernel pool.
367 pages available in user pool.
Calibrating timer... 254,361,600 loops/s.
hda: 5,040 sectors (2 MB), model "QM00001", serial "QEMU HARDDISK"
hda1: 182 sectors (91 kB), Pintos OS kernel (20)
hda2: 4,096 sectors (2 MB), Pintos file system (21)
hda3: 109 sectors (54 kB), Pintos scratch (22)
filesys: using hda2
scratch: using hda3
Formatting file system...done.
Boot complete.
Extracting ustar archive from scratch device into file system...
Putting 'seek-full' into the file system...
Putting 'sample.txt' into the file system...
Erasing ustar archive...
Executing 'seek-full':
(seek-full) begin
(seek-full) open "sample.txt"
(seek-full) end
seek-full: exit(0)
Execution of 'seek-full' complete.
Timer: 68 ticks
Thread: 0 idle ticks, 66 kernel ticks, 2 user ticks
hda2 (filesys): 92 reads, 224 writes
hda3 (scratch): 108 reads, 2 writes
Console: 960 characters output
Keyboard: 0 keys pressed
Exception: 0 page faults
Powering off...
```
seek-full.result
```
PASS
```
4. Potential Bugs
* If the `seek` syscall incorrectly positions the cursor in the file, the `read` syscall would return values that are inconsistent with the expected value stored in variable `sample`, and would throw an error message `"seek results in wrong position."`
* If the `read` syscall still attempts to read data even if `seek` sets the cursor position after `EOF`, the number of bytes read would be inconsistent with the expected amount 0, and would throw a failure message `"seek fails, read multiple bytes after EOF"`
### Test 2: tell-full
1. Test Description
* It tests that the `open` syscall matches the expected `open` behavior.
* It tests that the `read` syscall matches the expected `read` behavior.
* It tests that the `seek` syscall positions the cursor at the designated position.
* It tests that the `tell` syscall returns the position of the current file cursor, even if `seek` sets it past `EOF`.
2. Test Mechanics
This test opens `sample.txt`, uses the `read` or `seek` syscall to move the file cursor, and then tests whether the `tell` syscall returns position values consistent with the ones that were set. The major components of the test are:
* It first reads `BASE` bytes with `read` and tests whether `tell` returns `BASE`.
* It then uses `seek` to move to a position past `EOF` and checks whether `tell` returns the same position as the designated value.
The test prints a failure message with `fail` if any of the above does not match the expected behavior.
3. Test Output
tell-full.output
```
Copying tests/userprog/tell-full to scratch partition...
Copying ../../tests/userprog/sample.txt to scratch partition...
qemu-system-i386 -device isa-debug-exit -hda /tmp/d4ul5_89nJ.dsk -m 4 -net none -nographic -monitor null
PiLo hda1
Loading...........
Kernel command line: -q -f extract run tell-full
Pintos booting with 3,968 kB RAM...
367 pages available in kernel pool.
367 pages available in user pool.
Calibrating timer... 103,936,000 loops/s.
hda: 5,040 sectors (2 MB), model "QM00001", serial "QEMU HARDDISK"
hda1: 182 sectors (91 kB), Pintos OS kernel (20)
hda2: 4,096 sectors (2 MB), Pintos file system (21)
hda3: 109 sectors (54 kB), Pintos scratch (22)
filesys: using hda2
scratch: using hda3
Formatting file system...done.
Boot complete.
Extracting ustar archive from scratch device into file system...
Putting 'tell-full' into the file system...
Putting 'sample.txt' into the file system...
Erasing ustar archive...
Executing 'tell-full':
(tell-full) begin
(tell-full) open "sample.txt"
(tell-full) end
tell-full: exit(0)
Execution of 'tell-full' complete.
Timer: 59 ticks
Thread: 0 idle ticks, 57 kernel ticks, 2 user ticks
hda2 (filesys): 92 reads, 224 writes
hda3 (scratch): 108 reads, 2 writes
Console: 960 characters output
Keyboard: 0 keys pressed
Exception: 0 page faults
Powering off...
```
tell-full.result
```
PASS
```
4. Potential Bugs
* If the `tell` syscall returns the file cursor position that is 1-indexed instead of 0-indexed, or if `read` syscall incorrectly moves the cursor after the reading operation, the value returned would be `BASE + 1` or some other values instead of `BASE`, and would print a failure message `"BAD tell returns [pos] instead of [BASE]"`
* If the `tell` syscall simply reports the length of the file when the file cursor is past `EOF`, the value returned would be `sizeof sample` instead of `sizeof sample + 10`.
### System Test Reflection
* The Pintos test cases are fairly exhaustive, which makes coming up with new tests relatively hard. After carefully examining the test cases, our group found that tests were lacking for the basic operations of the `seek` and `tell` syscalls, so we implemented them. Also, for `stack-align`, the test suite only checks cases with an alignment of 1-4 bytes, which is not exhaustive: misalignment cases with more than 4 bytes of slack could also happen, and associated tests might need to be added for completeness.
* Writing tests adds a more industrial flavor to the project, and thinking about cases that might affect code correctness helps development and improves code quality in general. In future projects, our group will place more emphasis on test development and improve code quality in a way that is less reliant on the built-in tests.