Note: "Task" refers to either a process or a thread; we do not distinguish between the two.
[Race Conditions] Consider the following two tasks, A and B, which run concurrently and use a shared variable x. Assume that x is initialized to 0:
Task A | Task B |
---|---|
x++; x++; | x = 2*x; |
How many relevant interleaving scenarios are possible when the two threads are executed? What are all possible values of x after both tasks have terminated? Use a step-by-step execution sequence of the above tasks to show all possible results.
Load and store instructions to variable x are the only instructions whose interleaving can affect the value of x. The sequences of such instructions for both processes are shown below:
Task A | Task B |
---|---|
A1: load x, A2: store x, A3: load x, A4: store x | B1: load x, B2: store x |
Interleaving scenarios can be conveniently represented as a sequence (e.g., A1-A2-A3-A4-B1-B2). How many possible interleaving patterns are there?
Let's start with A1-A2-A3-A4 (the order of A's instructions cannot change) and see in how many ways we can insert B1 and B2 so that B1 comes before B2. B1 can be inserted into one of 5 slots (before each A instruction or after the last one). Each insertion creates one new slot, so B2 can then be inserted into one of 6 slots, giving 30 possible sequences in total. However, in only half of these is B1 before B2, so 5 × 6 / 2 = 15. All of them are shown below, together with the outcomes:
B1-A1-A2-A3-A4-B2 🡪 x = 0
B1-A1-A2-A3-B2-A4 🡪 x = 2
B1-A1-A2-B2-A3-A4 🡪 x = 1
B1-A1-B2-A2-A3-A4 🡪 x = 2
B1-B2-A1-A2-A3-A4 🡪 x = 2
A1-B1-A2-A3-A4-B2 🡪 x = 0
A1-B1-A2-A3-B2-A4 🡪 x = 2
A1-B1-A2-B2-A3-A4 🡪 x = 1
A1-B1-B2-A2-A3-A4 🡪 x = 2
A1-A2-B1-A3-A4-B2 🡪 x = 2
A1-A2-B1-A3-B2-A4 🡪 x = 2
A1-A2-B1-B2-A3-A4 🡪 x = 3
A1-A2-A3-B1-A4-B2 🡪 x = 2
A1-A2-A3-B1-B2-A4 🡪 x = 2
A1-A2-A3-A4-B1-B2 🡪 x = 4
There are five possible outcomes: 0, 1, 2, 3, 4. Note that the outcome would be non-deterministic even if each statement were executed atomically, because x = 2*x and x++ are not commutative operations.
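To double-check the enumeration mechanically, here is a small brute-force program (not part of the original question) that generates every interleaving of A1-A4 with B1-B2 starting from x = 0 and collects the outcomes; it prints the same 15 results and the outcome set {0, 1, 2, 3, 4} as the table above.

```c
/* Brute-force check of the 15 interleavings of A1-A4 (two x++) and
 * B1-B2 (x = 2*x), starting from x = 0.                              */
#include <stdio.h>

static int seen[8];                         /* final values observed (0..4 expected) */

static void stepA(int i, int *x, int *regA) {
    if (i == 0 || i == 2) *regA = *x;       /* A1, A3: load x         */
    else                  *x = *regA + 1;   /* A2, A4: store x+1      */
}

static void stepB(int i, int *x, int *regB) {
    if (i == 0) *regB = *x;                 /* B1: load x             */
    else        *x = 2 * (*regB);           /* B2: store 2 * loaded x */
}

/* a, b = index of the next instruction of task A / task B */
static void interleave(int a, int b, int x, int regA, int regB) {
    if (a == 4 && b == 2) {                 /* both tasks finished    */
        printf("final x = %d\n", x);
        seen[x] = 1;
        return;
    }
    if (a < 4) { int x2 = x, r = regA; stepA(a, &x2, &r); interleave(a + 1, b, x2, r, regB); }
    if (b < 2) { int x2 = x, r = regB; stepB(b, &x2, &r); interleave(a, b + 1, x2, regA, r); }
}

int main(void) {
    interleave(0, 0, 0, 0, 0);
    printf("possible outcomes:");
    for (int v = 0; v < 8; v++)
        if (seen[v]) printf(" %d", v);
    printf("\n");
    return 0;
}
```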
[Critical Section] Can disabling interrupts avoid race conditions? If yes, would disabling interrupts be a good way of avoiding race conditions? Explain.
Yes. Disabling interrupts and re-enabling them is equivalent to acquiring and releasing a universal lock: without interrupts there is no quantum-based process switching (even timer interrupts cannot happen), so only one process runs. However:
- It only works on a single-processor system; on a multiprocessor, tasks on other cores keep running and can still race.
- It hands a dangerous capability to user programs: a buggy or malicious program that never re-enables interrupts monopolizes the CPU.
- It disables all interrupts, including ones unrelated to the shared data, so I/O handling and timekeeping are delayed while the "lock" is held.
Therefore disabling interrupts is not a good choice for the purpose of locking.
[AY 19/20 Midterm – Low Level Implementation of CS] You are required to implement an intra-process mutual exclusion mechanism (a lock) using Unix pipes. Your implementation should not use mutex (pthread_mutex) or semaphore (sem), or any other synchronization construct.
Information to refresh your memory:
Write your lock implementation below in the spaces provided. The definition of the pipe-based lock (struct pipelock) should be complete, but feel free to add any other elements you might need. You need to write code for lock_init, lock_acquire, and lock_release.
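The answer boxes are not reproduced above; below is one possible sketch of such a lock (not the official solution), using the struct pipelock and function names from the question. The idea is that a single byte circulating in the pipe acts as the lock token: acquiring the lock reads the byte (blocking while the pipe is empty), and releasing it writes the byte back.

```c
/* One possible pipe-based lock (a sketch, not the official solution).
 * The lock token is a single byte circulating in a pipe:
 *   acquire = read the byte (blocks while the pipe is empty)
 *   release = write the byte back                                    */
#include <unistd.h>

typedef struct pipelock {
    int fd[2];                  /* fd[0]: read end, fd[1]: write end */
} pipelock;

int lock_init(pipelock *lk) {
    if (pipe(lk->fd) == -1)
        return -1;
    char token = 'L';
    return (write(lk->fd[1], &token, 1) == 1) ? 0 : -1;   /* lock starts free */
}

void lock_acquire(pipelock *lk) {
    char token;
    while (read(lk->fd[0], &token, 1) != 1)   /* blocks until the token is available */
        ;                                     /* retry if interrupted by a signal    */
}

void lock_release(pipelock *lk) {
    char token = 'L';
    while (write(lk->fd[1], &token, 1) != 1)
        ;                                     /* retry if interrupted by a signal    */
}
```

Because the pipe's file descriptors are shared by all threads of the process, this works as an intra-process lock as required.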
[Page Table Structure] The Linux machines in the SoC cluster used in the labs have 64-bit processors, but the virtual addresses are 48-bit long (the highest 16 bits are not used). The physical frame size is 4KiB. The system uses a hierarchical direct page table structure, with each table/directory entry occupying 64 bits (8 bytes) regardless of the level in the hierarchy.
Assume Process P runs the following simple program on one of the lab machines:
At line 9, the program prints out the value of variable array in hexadecimal:
Consider the point in time when the process execution reaches the point of interest (line 10 in the listing). Answer the following questions:
a. If we use a single-level page table, how much memory space is needed just to store the page table for process P?
b. How many levels are there in the page-table structure for process P?
c. How many entries in the root page directory are holding information related to process P's dynamically allocated data?
d. How many physical frames are holding information related to process P's dynamically allocated data in the second level of the hierarchical page-table structure (next after the root)?
e. How many physical frames are holding information related to process P's dynamically allocated data in the penultimate level of the page-table structure (i.e., the level before the last one)?
f. How many physical frames are holding information related to process P's dynamically allocated data in the last level of the page-table structure?
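Before the answers, it may help to see how a 48-bit virtual address is carved up under this setup: 4 KiB pages give a 12-bit offset, and 4 KiB tables of 8-byte entries give 512 entries, i.e. 9 index bits per level, hence (48 - 12) / 9 = 4 levels. The sketch below (with a made-up example address) just extracts those fields:

```c
/* Sketch: splitting a 48-bit virtual address for this 4-level setup.
 * 12 offset bits (4 KiB pages) + 4 x 9 index bits (512 entries per table). */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t va = 0x7f1234567890ULL;          /* made-up 48-bit address           */
    unsigned offset = va & 0xFFF;             /* bits 0-11                        */
    unsigned l4 = (va >> 12) & 0x1FF;         /* bits 12-20: last-level index     */
    unsigned l3 = (va >> 21) & 0x1FF;         /* bits 21-29: penultimate index    */
    unsigned l2 = (va >> 30) & 0x1FF;         /* bits 30-38: second-level index   */
    unsigned l1 = (va >> 39) & 0x1FF;         /* bits 39-47: root directory index */
    printf("root=%u, level2=%u, level3=%u, level4=%u, offset=0x%X\n",
           l1, l2, l3, l4, offset);
    return 0;
}
```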
a. A single-level page table has 2^48 / 2^12 = 2^36 entries of 8 bytes each, i.e. 2^36 × 8 = 2^39 bytes (512 GiB).
b. 4 levels
c. 1 entry
d. 1 frame
e. 2 frames
f. frames
Note: the value of array that is printed is not at a page boundary.

Given a program P with two independent tasks A and B, let us examine the two implementations below:
i. Sequential implementation: a single program that executes task A, then task B.
ii. Multithreaded implementation: a program with two threads working on tasks A and B concurrently.
Suppose both tasks are CPU-intensive, which of the following statement(s) is/are TRUE if the two implementations are executed on a single CPU (single core)?
i. If the threads are implemented as user threads, then multithreaded implementation will finish execution in shorter amount of time compared to the sequential implementation.
ii. If the threads are implemented as kernel threads, then multithreaded implementation will finish execution in shorter amount of time compared to the sequential implementation.
Both are FALSE. On a single core with purely CPU-bound work there is no I/O to overlap, so the two threads merely time-share the CPU; the multithreaded version cannot finish earlier than the sequential one (if anything, thread management adds a little overhead).
Suppose task A is CPU-intensive and task B is I/O intensive (reading from a file), which of the following statement(s) is/are TRUE if the two implementations are executed on a single CPU (single core)?
i. If the threads are implemented as user threads, then multithreaded implementation will finish execution in shorter amount of time compared to the sequential implementation.
ii. If the threads are implemented as kernel threads, then multithreaded implementation will finish execution in shorter amount of time compared to the sequential implementation.
Only (ii) is TRUE. With kernel threads, when the I/O-bound thread blocks on the file read, the kernel can schedule the CPU-bound thread, so computation overlaps with I/O. With user threads, a blocking read system call blocks the entire process, so nothing overlaps and there is no speedup.
Suppose a number of tasks (>1) share the following variables:
For your convenience, all shared variables have an "s" prefix.
Each task executes the following code (in pseudo code):
The above is an attempt to parallelize the famous Sieve of Eratosthenes algorithm. Essentially, we eliminate all multiples of a prime number, e.g. use 2 to eliminate 4, 6, 8, 10, 12, … Then the smallest number > 2 that has not been eliminated (i.e. 3) is used as the next prime number. The process is then repeated until the whole range is exhausted (50,000 in this case). The algorithm is an efficient way to find all primes in a given range.
a. Line 12 seems redundant (sNext is used as the factor, so how can k > sNext?). Do you think it is needed? Briefly explain.
b. The code currently still exhibits race conditions; use two tasks to identify them. You only need to list the line numbers that the respective tasks execute to cause the race condition.
c. If you use critical sections to solve the issues in (b), indicate the absolute minimum lines of code you would protect. Give the corresponding line numbers in the answer sheet.
a. It is not redundant, as another task may have increased sNext to a much larger number in the meantime.
b.
The code is "carefully written badly" such that there is only one point of failure. If you think through it carefully, there are many possibilities of wasted computation (e.g. re-sieve the array with a known prime, two or more task used the same prime to sieve etc), but they will not result in incorrect execution. So, to properly shows a race condition, you need to pinpoint that single point of failure.
The point of failure is in line 12-13. Part (a) is actually a failed attempt to correct the error!
 | Task 1 | Task 2 |
---|---|---|
t0 | 5-12 | |
t1 | | 5-17 |
t2 | | 2-17 |
t3 | 13(!)-17 | |
c.
From the above discussion, the minimal answer is "lines 12, 13". Note that this essentially "sequentializes" the tasks.
Alternative answer: "lines 5, 6 and lines 12, 13" is acceptable (though protecting "5, 6" is not critical).
Consider the 5-threads global sum program:
5 threads execute the following code, where globalVar is shared:
a. What is the smallest possible value of globalVar at the end of execution (i.e. after all threads have terminated)?
b. Briefly describe the interleaving pattern which gives you the answer in (a).
c. If the loop counter i is changed to a variable shared among the 5 threads, what are the smallest and largest possible values of globalVar at the end of execution?
a. The smallest globalVar is 2.
b.
Consider the following:
You can easily extend the above to 5 threads (just "kill off" the other 3 threads and make sure that two threads follow the above pattern).
c.
Smallest globalVar is 1. Consider the following:
Largest globalVar is HUGE. Consider the following:
Task B can do the above EVERY TIME Task A gets close to finishing its 49,999 iterations.
BUT that's not all: we can use Task C to restart Task B when it is finally near the end, and so restart the entire process for A and B. Don't forget that we still have tasks D and E…
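The original listing is not reproduced above. As a hypothetical reconstruction (the names and the 50,000 loop bound are assumptions inferred from the discussion), the following program has 5 threads increment the shared counter with no synchronization; running it a few times makes the lost updates visible:

```c
/* Hypothetical reconstruction (names and the 50,000 bound are assumptions):
 * 5 threads each increment the shared counter with no synchronization.      */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 5
#define NITER    50000            /* assumed loop bound */

volatile long globalVar = 0;      /* shared, deliberately unprotected */

static void *work(void *arg) {
    (void)arg;
    for (int i = 0; i < NITER; i++)
        globalVar = globalVar + 1;        /* load, add, store: not atomic */
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, work, NULL);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    printf("globalVar = %ld (would be %d without the race)\n",
           globalVar, NTHREADS * NITER);
    return 0;
}
```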
Below is the blocking solution for the producer-consumer problem:
Producer Process:
Consumer Process:
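The two listings are not reproduced above; the sketch below shows the standard bounded-buffer pattern using the semaphore names that appear in this question (notFull, notEmpty, mutex) and a buffer of size 5, which is what parts (a)-(d) below assume.

```c
/* Standard bounded-buffer pattern with the semaphore names from this question.
 * Correct initial values would be: notFull = 5, notEmpty = 0, mutex = 1.       */
#include <semaphore.h>

#define N 5
int buffer[N];
int in = 0, out = 0;
sem_t notFull, notEmpty, mutex;

void pc_init(void) {
    sem_init(&notFull, 0, N);     /* N free slots        */
    sem_init(&notEmpty, 0, 0);    /* no data yet         */
    sem_init(&mutex, 0, 1);       /* buffer lock is free */
}

void producer(int item) {
    sem_wait(&notFull);           /* wait for a free slot      */
    sem_wait(&mutex);             /* enter critical section    */
    buffer[in] = item;
    in = (in + 1) % N;
    sem_post(&mutex);             /* leave critical section    */
    sem_post(&notEmpty);          /* signal that data is ready */
}

int consumer(void) {
    sem_wait(&notEmpty);          /* wait for data              */
    sem_wait(&mutex);
    int item = buffer[out];
    out = (out + 1) % N;
    sem_post(&mutex);
    sem_post(&notFull);           /* signal that a slot is free */
    return item;
}
```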
Suppose we initialized the semaphores as follows (note that the size of the shared buffer is 5 and pay attention to the two incorrect initializations):
notFull = Semaphore(3); //incorrect!
notEmpty = Semaphore(1); //incorrect!
mutex = Semaphore(1); //correct
a. Briefly describe the effect of the incorrect notFull semaphore initialization. You should ignore the effect of the wrong notEmpty in this question.
b. Briefly describe the effect of the incorrect notEmpty semaphore initialization. You should ignore the effect of the wrong notFull in this question.
c. Suppose we have 5 producer tasks (P1 to P5) and 3 consumer tasks (C1 to C3) using the incorrect initialization as stated. Give a snapshot of the semaphores (their values and the tasks blocked on them) at any point during the execution. The only restriction is that the status of all 8 tasks must be clearly stated or implied in the snapshot.
d. If you were to mark your friend's answer to part (c) above, give the rules that you would use to determine correctness. Your rules should be concise and have a clear outcome (i.e. "if XXXX, then it is wrong / correct" or similar). You can also consider using a formula to express the relationship.
a.
Difficulty = Medium. Key point: the wrong semaphore initialization value limits the number of concurrent producer tasks (or the total amount of useful data in the buffer) to 3. Note that I was (very) strict for this part. Your answer should show that you understand that all 5 locations of the buffer can still be used (e.g. after the first 3 productions the consumers consume 3 items, so subsequent productions will use buffer[3] and buffer[4] and then wrap around, etc.). So, if you simply say "two slots will not be used" (or similar), that is INCORRECT.
b.
Difficulty = Easy. Key point: one consumer can start consuming before any data has been placed in the buffer.
c.
Difficulty = Medium. The key phrase in the question is that the status of all 8 tasks must be clearly stated or deducible. The "clearly stated" part refers to tasks shown explicitly in a semaphore queue, i.e. you know they are blocked. The "deducible" part can take a few forms, e.g. if one task is missing from the picture and mutex is 0, then that task must be in the critical section.
So, the simplest answer is to just follow up on the original setup:
Mutex = 0, Queue = P2, P3, C1
notFull = 0, Queue = P4, P5
notEmpty = 0, Queue = C2, C3
The missing P1 is clearly in the critical section.
Note that if your answer needs to state "P1, P2 are doing …, P3 is …" explicitly outside of the semaphore snapshot, it is incorrect.
There are (way too) many alternatives. Feel free to explore a few others.
d.
Which of the following statement(s) regarding zombie processes is/are TRUE?
i. A zombie process takes up a slot in the OS PCB table.
ii. A zombie process is created so that the wait() system call can be implemented properly.
iii. A user command running in the background under a shell interpreter can become a zombie process.
i. Correct. A zombie process retains its PID and stores its termination status and result in its PCB entry.
ii. Correct. As discussed in lecture, without the zombie state we would not be able to return information about a terminated child process to the parent process.
iii. Correct. This is essentially similar to (ii). A foreground process is waited on by the shell interpreter directly; a background process, on the other hand, is not waited on, which is how the interpreter can continue to accept user input. So, if the background process terminates, it turns into a zombie process.
Given the following pseudo code:
Code A | Code B |
Which of the following setup can potentially cause priority inversion?
a. A high priority task running code B and a lower priority task running code A.
b. A high priority task running code A and a lower priority task running code B.
c. The highest and lowest priority tasks running code A and a middle priority task running code B.
d. The highest and lowest priority tasks running code B and a middle priority task running code A.
e. None of the above
Ans: c
Priority inversion (assume a priority-based preemptive scheduling algorithm):
- A low-priority task acquires a resource needed by a high-priority task.
- The low-priority task is then preempted by the high-priority task, so it cannot release the resource, which in turn blocks the high-priority task.
- A medium-priority task can therefore execute despite the existence of a high-priority task.
The code above constructs a simple demonstration of this problem. If a high-priority task H and a low-priority task L run code A, it is possible that L enters the critical section but gets preempted by H before finishing its work. L still "holds" the semaphore, preventing H from executing. At this point, if a medium-priority task M enters the system to run code B, it will get picked over H despite having a lower priority.
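The listings for code A and code B are not shown above. A hypothetical sketch consistent with this explanation (code A wraps its work in a semaphore-protected critical section, code B is plain computation that never touches the semaphore) could look like:

```c
/* Hypothetical sketch of the elided listings, consistent with the answer. */
#include <semaphore.h>

sem_t s;                       /* binary semaphore, assume sem_init(&s, 0, 1) at startup */

void code_A(void) {            /* run by tasks H and L in the scenario above */
    sem_wait(&s);
    /* ... long critical-section work ... */
    sem_post(&s);
}

void code_B(void) {            /* run by the middle-priority task M */
    /* ... long CPU-bound work that never touches the semaphore ... */
}
```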
We have the following 4 processes that have access to 3 shared binary semaphores s1, s2 and s3, initially set to 1. The read() and write() operations are non-blocking asynchronous calls (i.e. they return immediately after being called, regardless of whether there is a corresponding write/read). The "sem_pend" operation decrements the given semaphore, while the "sem_post" operation increments it.
P1 | P2 | P3 | P4 |
---|---|---|---|
Which of the following pairs of processes may cause deadlock?
a. P1 and P2
b. P1 and P4
c. P1 and P3
d. P2 and P4
e. None of the above pairs in a. to d. may cause deadlocks.
c. P1 and P3
Which of the following statement(s) regarding the difference between library calls and system calls is/are TRUE?
i. Library calls are programming language dependent, while system calls are dependent on the operating system.
ii. A system call may be invoked during a library call.
iii. If a system call and library call both execute approximately the same code, the library call is likely to take a longer amount of time to complete, as the system call is lower-level.
i. Correct
ii. Correct
iii. Incorrect. The system call will take longer because of the overhead of trapping into the kernel (the switch into kernel mode). The statement is also ambiguous: it is unclear what "the system call is lower-level" means, or why that would be a reason for the library call to take longer to execute.
On a system using a pure paging scheme, when we attach a shared memory region to a process, the memory address returned by the shmat() function is likely to be at the boundary of a memory page. True or False? Explain why.
True. The shared region must have its own page-table entries, since its translations (frame numbers) differ from those of the process's other memory; hence the region starts at a page boundary.
All threads in the same process can share the same page table. True or False and explain why.
True. All threads in the same process share the same virtual memory space.
The size of a Page Table Entry (PTE) is dependent on the total size of the logical / virtual memory space of a process. True or False and explain why.
False. A PTE mainly stores a frame number, whose size is determined by the size of physical RAM, not by the size of the virtual memory space.
The size of a Direct Paging Page Table (i.e. the number of page table entries) of a process is dependent on the total size of the logical / virtual memory space utilized by the process. True or False and explain why.
False. A direct page table has one entry for every page of the virtual memory space, regardless of how much of that space is actually utilized.
The Virtual Address (VA) can never be the same as the Physical Address (PA), i.e. there can be no instance where VA 0xZZZZ ZZZZ gets translated to PA 0xZZZZ ZZZZ. True or False and explain why.
False. It is possible for a page to map to a frame with the same number, e.g. page 123 to frame 123, simply by chance.
Accessing the Nth block of a file managed by FAT scheme requires O(N) hard disk accesses in the worst case. True or False and explain why.
False. The FAT scheme is the "linked list version 2.0" implementation learnt in class. The FAT is an in-memory structure, so the linked-list traversal takes place in memory, not on the hard disk; locating the Nth block therefore requires zero disk accesses (only reading the block itself, once located, touches the disk).
Suppose process A is currently executing and is given a time quantum of 2 time units to run:
1 | 2 | 3 | 4 |
A | A | ... | ... |
In Unix, a new Process B can only arrive in the system during the shaded region. Note that ITI is not part of the shaded region. True or False and explain why.
True. In UNIX, a process can be created only via the fork()/clone() mechanism, so A has to invoke one of these system calls. Hence, it can only happen while A is executing.
Outline a simple program that can find out the page size of a machine. You should state clearly how to run the program and interpret the result.
Many possibilities.
Common approaches:
Sample answer:
Common wrong / partially wrong answers:
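The sample answer and the lists of common approaches are not reproduced above. One possible experimental approach (a sketch only; every name in it is chosen here for illustration) is to mmap() a 1-byte file over a large range: reads within the page containing that byte succeed (the rest of the page is zero-filled), while a read beyond that page raises SIGBUS, so the first faulting offset is the page size.

```c
/* Sketch only: probe the page size via mmap() of a 1-byte file.
 * Reads inside the page containing the byte succeed (zero-filled past EOF);
 * the first read beyond that page raises SIGBUS, revealing the page size.   */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <setjmp.h>
#include <sys/mman.h>

static sigjmp_buf env;

static void on_fault(int sig) {
    (void)sig;
    siglongjmp(env, 1);                     /* jump back out of the faulting read */
}

int main(void) {
    char tmpl[] = "/tmp/pgszXXXXXX";
    int fd = mkstemp(tmpl);                 /* scratch file ...        */
    ftruncate(fd, 1);                       /* ... exactly 1 byte long */

    /* Map far more than one page of the 1-byte file. */
    volatile char *p = mmap(NULL, 1 << 20, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    struct sigaction sa = {0};
    sa.sa_handler = on_fault;
    sigaction(SIGBUS, &sa, NULL);
    sigaction(SIGSEGV, &sa, NULL);          /* just in case */

    for (size_t off = 1; ; off *= 2) {
        if (sigsetjmp(env, 1)) {            /* got here via the fault handler */
            printf("page size is %zu bytes\n", off);
            break;
        }
        (void)p[off];                       /* faults once off >= page size */
    }
    unlink(tmpl);
    return 0;
}
```

To run it: compile, execute once, and read off the printed value; on the 4 KiB-page machines described earlier in this document it should print 4096.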
Upon the termination of a process P, the operating system is supposed to "clean up the resources used by P". Answer the following briefly:
a. How is the OS notified of P's termination?
b. Indicate the resources to be cleaned up and how can the OS find out the relevant information to perform the cleaning. You should be as specific as possible and refer to ideas covered in this course.
a. The process calls the exit() system call, either implicitly or explicitly, which notifies the OS.
b. Use the process's page table to find the memory frames in use; use the process's file descriptor table to find the files it has opened.
Indicate whether the working set W(T, Δ) increases (more pages are needed), decreases (fewer pages are needed) or stays stable (no change in the number of pages needed) for the following scenarios:
a. We are executing a loop to calculate values to be stored in a large array.
b. We are currently executing a recursive function.
c. We are currently blocked on a critical section implemented by busy waiting.
State any assumption if necessary.
a. Increases. Data pages for the array need to be brought in / written out.
b. Increases. The stack space grows with every recursive call.
c. Stable. The code being executed stays the same (busy waiting is just a loop over the same few instructions).
Alternative answers with assumptions:
A computer supports 64-bit virtual memory addresses. The page size is 1 KiB (2^10 bytes), and the size of a page table entry is two bytes. Assuming the addressing is at the word (8-byte) level, calculate the size of the standard page table (in bytes).
A.
B.
C.
D.
E.
A. 2^58 bytes
Since the addressing is at the word (8-byte) level, the system can address 2^64 words = 2^64 × 8 = 2^67 bytes in total with a 64-bit virtual address.
Number of pages = 2^67 / 2^10 = 2^57
Size of page table = size of a page table entry × number of pages = 2 × 2^57 = 2^58 bytes
Consider a virtual memory system with the page table stored in memory (memory resident). If a memory access takes 200 nanoseconds, what is the longest time that a paged memory reference takes?
A. 600 ns
B. 600 ns + time to service a page fault
C. 400 ns + time to service a page fault
D. 200 ns + time to service a page fault
E. More than 600 ns + time to service a page fault.
*time to service a page fault includes the time to bring the page from SWAP to memory.
B. 600 ns + time to service a page fault
1st memory access: search the page table (200 ns).
2nd memory access (200 ns): the access triggers a page fault, which must then be serviced.
3rd memory access: successfully access the needed page (200 ns).
Total = 3 × 200 ns + time to service the page fault = 600 ns + time to service the page fault.
A process generates a page fault while accessing a memory region for which it does not have access rights. What happens in this case?
A. The operating system will handle the page fault and the process can continue the execution with the next instructions.
B. A signal SIGSEGV is sent to the process. If the operating system does not have a signal handler set for SIGSEGV, the process is terminated.
C. A signal SIGSEGV is sent to the process. The process might terminate.
D. A signal SIGSEGV is sent to the operating system. If the operating system does not have a signal handler set for SIGSEGV, the process is terminated.
E. The operating system will handle the page fault, and the process retries the instruction that caused the page fault.
C. A signal SIGSEGV is sent to the process. The process might terminate.
Throughout their execution, parent and child processes share ALL entries in the system-wide open file table, but different file descriptor tables.
A. True
B. False
B. False. All files opened by the parent before fork() are shared between parent and child (their file descriptors refer to the same system-wide open file table entries). Files opened by the parent or the child after fork() are not shared.
A process creates multiple threads (using pthread library); next, each thread opens the same file and reads 32 bytes from the file (using read system call). What will each thread read?
A. Same information (same bytes).
B. Different parts of the file, depending on the order the threads get to read.
C. There is a race condition because the threads read from the same file.
D. If the file is open for reading by another process, it is impossible to say.
A. Same information (same bytes).
This is similar to the scenario in the lecture notes where the same process opens the same file twice: each open() creates its own entry in the system-wide open file table with its own offset, so every thread starts reading from offset 0 and gets the same 32 bytes.
In a program, the following call is used:
fd = open("/home/e1234567/myfile", O_RDONLY);
The call is successful and myfile is a hard link. How many I-nodes have been read by this call?
A. 1
B. 2
C. 4
D. 5
E. Impossible to say.
C. 4: the i-nodes of /, /home, /home/e1234567, and /home/e1234567/myfile.
What would be the fastest way to transfer a large amount of data between two processes running on the same operating system? Explain your answer.
Note: The information does not need to be copied during the transfer.
mmap() can be used to map a shared memory region (or a shared file) into both processes, so that both can access the shared data in RAM directly, without copying and without further system calls.
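As an illustration (not part of the original answer), a MAP_SHARED anonymous region created before fork() gives both processes direct access to the same physical frames, so the data never needs to be copied:

```c
/* Sketch: two processes sharing a large region via mmap(), no copying.
 * A MAP_SHARED | MAP_ANONYMOUS mapping created before fork() refers to the
 * same physical frames in parent and child.                                 */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

int main(void) {
    size_t len = 1 << 20;                       /* 1 MiB shared region */
    char *shared = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (fork() == 0) {                          /* child: writes into the region */
        strcpy(shared, "large data written directly into shared memory");
        return 0;
    }
    wait(NULL);                                 /* parent: reads it back, no copy made */
    printf("parent reads: %s\n", shared);
    munmap(shared, len);
    return 0;
}
```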
In general, the exec() syscall takes a longer time to execute than the fork() syscall. Briefly explain why.
fork() duplicates the current process's image to create a child process, whereas exec() replaces the current process's image with a new executable.
Duplicating the current process image requires only memory accesses, whereas bringing in a new executable requires access to secondary storage, which is much slower than memory access.
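A minimal sketch of the two calls side by side (illustration only): the fork() path touches only memory, while the exec*() path has to load a new binary from secondary storage.

```c
/* Illustration: fork() clones the caller's image (memory work only),
 * while exec*() loads a new executable from disk into the process.    */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                          /* duplicate the current image */
    if (pid == 0) {
        execlp("ls", "ls", "-l", (char *)NULL);  /* replace the image: reads a new binary from disk */
        perror("execlp");                        /* reached only if exec failed */
        return 1;
    }
    printf("parent continues with the old image (child pid %d)\n", (int)pid);
    return 0;
}
```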
Which of the following are used by the OS to provide isolation and protection?
i. Virtual memory
ii. Process abstraction
iii. Exception handling
iv. File permissions
a. (i) only
b. (i) and (ii) only
c. (i), (ii) and (iii) only
d. None of the above
e. All of the above
e. All of the above
Virtual memory and processes isolate user application execution instances from one another.
Exception handling delivers exceptions to the responsible party, leaving others unaffected.
File permissions protect user files from unwanted access.
Questions 11 to 16 below use the same set of assumptions. Namely, consider a byte-addressed virtual memory system with the following setup: (i) virtual addresses are 32 bits long, (ii) physical addresses are 28 bits long, (iii) a page is 4 KiB (4096 bytes), (iv) each page table is a page in length, (v) page table entries are 4 bytes, and (vi) it uses 3 levels of page tables.
How many entries are there in a single page table?
a. 256
b. 512
c. 1024
d. 2048
e. 4096
c. 1024
Size of a page table = size of a page.
One page is 4096 bytes = 2^12 bytes.
One page table entry is 4 bytes = 2^2 bytes.
So, number of entries = 2^12 / 2^2 = 2^10 = 1024.
Suppose a process only has one physical page frame allocated to it. What is the total memory required for the page tables (not including that final one data page frame).
a. 1029 bytes
b. 5000 bytes
c. 10821 bytes
d. 12288 bytes
e. 16384 bytes
d. 12288 bytes
The process just needs one page table at each level.
Each page table is 4096 bytes, so the total is 3 × 4096 = 12288 bytes.
Suppose a process has 10 physical page frames allocated to it. What is the minimal memory space that would be required for the page tables (not including the 10 data page frames)?
a. 10821 bytes
b. 12288 bytes
c. 16384 bytes
d. 36864 bytes
e. 45056 bytes
b. 12288 bytes
A single level-3 page table (1024 entries) can point to all 10 data frames, so the answer is the same as in the previous question.
Suppose a process has 10 physical page frames allocated to it. What is the worst case memory space that would be required for the page tables (not including the 10 data page frames)?
a. 36864 bytes
b. 45056 bytes
c. 86016 bytes
d. 167936 bytes
e. 372736 bytes
c. 86016 bytes
In the worst case, each level-3 page table points to only 1 data frame, each level-2 page table points to only 1 level-3 page table, and level 1 always has exactly 1 page table.
Level 1: 1 × 4096 = 4096 bytes
Level 2: 10 × 4096 = 40960 bytes
Level 3: 10 × 4096 = 40960 bytes
Total = 21 × 4096 = 86016 bytes
Assuming that processes do not share any pages. In the worst case, how many processes would the system be able to accommodate if each process uses 100 logical pages before it runs out of (maximum) physical memory? Assume that all pages reside in physical memory, i.e., no disk backup.
a. 13
b. 115
c. 217
d. 326
e. 512
c. 217
As in the previous question, in the worst case the page tables for 1 process need 1 + 100 + 100 = 201 frames.
In total, 1 process needs 201 (page-table frames) + 100 (data frames) = 301 frames.
Total memory available = 2^28 bytes / 2^12 bytes per frame = 2^16 = 65536 frames.
So, 65536 / 301 = 217.7…, i.e. 217 processes.
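The arithmetic of questions 13-15 can be packaged into a small helper (illustration only), assuming the worst case described above: one data page per level-3 table and one level-3 table per level-2 table.

```c
#include <stdio.h>

#define PAGE_SIZE 4096L

/* Worst case: one data page per level-3 table, one level-3 table per
 * level-2 table, and always exactly one level-1 (root) table.         */
static long worst_case_table_frames(long data_pages) {
    return 1 + data_pages + data_pages;
}

int main(void) {
    long q14 = worst_case_table_frames(10);
    printf("10 data pages : %ld table frames = %ld bytes\n",
           q14, q14 * PAGE_SIZE);                              /* 21 frames, 86016 bytes */

    long q15_tables  = worst_case_table_frames(100);
    long per_process = q15_tables + 100;                       /* page tables + data frames     */
    long total_frames = 1L << 16;                              /* 2^28 bytes / 2^12 bytes/frame */
    printf("100 data pages: %ld frames per process -> %ld processes fit\n",
           per_process, total_frames / per_process);           /* 301 frames, 217 processes     */
    return 0;
}
```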
Following up on Question 15, i.e., using the same assumptions in that question except that the system turns on demand paging using an infinite disk storage to backup for data pages (but not the page tables – which remains in physical page frames), how many processes can the system accommodate in the worst case? Note that you need at least one data page in physical memory to contain some code and data for execution.
a. Infinite
b. 115
c. 217
d. 326
e. 512
d. 326
Data pages can now reside on disk, so we only need to count the page-table frames: 65536 / 201 = 326.0…, i.e. 326 processes.
Which of the following addresses falls in the buddy of the 512-byte block containing this address: 0x485BB1?
a. 0x485BA1
b. 0x4859B1
c. 0x485AB1
d. 0x485CB1
e. 0x486BB1
b. 0x4859B1
0x485BB1 = 0100 1000 0101 1011 1011 0001
512 bytes = 2^9 bytes, so the block and its buddy differ only in bit 9 (the bit just above the 9-bit offset).
Flipping bit 9 gives 0100 1000 0101 1001 1011 0001 = 0x4859B1, which lies in the buddy block.
Suppose both the 512-byte block containing 0x485BB1 and its buddy are freed. What is the starting address of the resultant 1024-byte block?
a. 0x485800
b. 0x485900
c. 0x485A00
d. 0x485B00
e. 0x486C00
a. 0x485800
From the previous question, we know that the two buddies differ only in bit 9; the most significant 14 bits are the same.
Those most significant 14 bits, followed by ten 0-bits, give the starting address of the combined 1024-byte block: 0x485800.
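Both buddy questions reduce to two bit manipulations: the buddy of a 2^k-byte block differs from it only in bit k, and the merged block starts at the address with the low k+1 bits cleared. A small check (illustration only):

```c
/* Buddy arithmetic for the two questions above: block size 512 = 2^9. */
#include <stdio.h>

int main(void) {
    unsigned addr = 0x485BB1;
    unsigned size = 512;                          /* 2^9 */
    unsigned block  = addr & ~(size - 1);         /* start of the 512-byte block holding addr */
    unsigned buddy  = block ^ size;               /* flip bit 9: start of the buddy block     */
    unsigned merged = block & ~(2 * size - 1);    /* start of the merged 1024-byte block      */
    printf("block  starts at 0x%X\n", block);     /* 0x485A00 */
    printf("buddy  starts at 0x%X\n", buddy);     /* 0x485800: contains 0x4859B1 (answer b)   */
    printf("merged starts at 0x%X\n", merged);    /* 0x485800 (answer a of the next question) */
    return 0;
}
```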
Suppose we have the following 2048-entry inverted table:
PID | Page Number |
---|---|
0 | 0 |
1 | 0 |
2 | 0 |
3 | 0 |
0 | 1 |
1 | 1 |
2 | 1 |
3 | 1 |
… | … |
0 | 511 |
1 | 511 |
2 | 511 |
3 | 511 |
In other words, entry i of the table is the entry for PID = i mod 4 and page number = i >> 2 ("right shift by 2 bits"). What is the physical address for the process with PID 1 and virtual address 0xAC529, assuming each page is 4096 bytes?
a. 0xAC429
b. 0xACB29
c. 0x2A0529
d. 0x2B1529
e. 0x2AC012
d. 0x2B1529
We know the virtual address can be split into two parts, page number and offset.
Since one page is 4096 bytes = 2^12 bytes, the least significant 12 bits are the offset.
So, 0xAC529: Page number = 0xAC, Offset = 0x529
Page number = 0xAC = 172, the query will be <1, 172>
Also notice that, for PID = 1, page p sits at entry index 4p + 1, and in an inverted page table the frame number is the entry index itself. For page 172, the entry index is 4 × 172 + 1 = 689.
So, page 172 is mapped to frame 689 = 0x2B1.
Hence, the physical address is the frame number followed by the offset: 0x2B1 + 0x529 = 0x2B1529.
Note: the "+" here does not mean addition; it simply visualises the concatenation of the frame number and the offset.
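As a final illustration (not part of the original answer), the translation can be written out as code, using the fact derived above that the frame number equals the entry index 4 × page + PID:

```c
/* Translation under the inverted table above: entry i serves
 * <PID = i mod 4, page = i >> 2>, and the frame number equals the entry index. */
#include <stdio.h>

int main(void) {
    unsigned va = 0xAC529, pid = 1;
    unsigned page   = va >> 12;               /* 4096-byte pages: 12 offset bits      */
    unsigned offset = va & 0xFFF;
    unsigned frame  = (page << 2) | pid;      /* entry index = 4 * page + pid = frame */
    unsigned pa     = (frame << 12) | offset; /* frame number followed by the offset  */
    printf("page 0x%X, offset 0x%X -> frame 0x%X, PA 0x%X\n", page, offset, frame, pa);
    return 0;
}
```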