# Mini Project 3 Questions
## LAZY Fork
Q1. The report asks us to count the frequency of page faults during the execution of COW fork. Does this mean we should count only COW-triggered page faults, or capture all types of page faults?
Q2. While running usertests after implementing COW, all tests other than textwrite pass, although it passed before implementing it. For MP2 though, we were asked to remove textwrite from usertests, since it was faulty. Do we do the same here?
Q3. When does a page fault occur: during page allocation, or during page access (i.e., the first time a page is accessed by a program)?
Q4. All usertests pass, but I'm getting the following: `FAILED -- lost some free pages 32063 (out of 32462)`. Is this fine?
Q5. Do we have to record page faults for all processes together, or the page faults that have occurred for each process?
## LAZY Read-Write
Q1. Will the entire input be given at the start of the program, or can input be given after some processing?
Q2. Is the input to be taken from a file?
Q3. Can we have two requests with the same user ID?
Q4. Suppose we have a concurrency limit of `5` and at time `t = 10` there are `3` readers on the file `f1`. At `t = 11` there is a delete request, and at `t = 12` there is another read request. The delete request is blocked until all the readers leave, but should we allow the reader at `t = 12` to read the file with the other readers (since we are below the concurrency limit), or should we delete first?
Q5. If two events happen at `t` seconds, then does the order in which they are printed matter? For example, LAZY takes up a WRITE request at `t=2` and another user makes another request at `t=2`, can they be printed in any order?
Q6. Is the given example output correct? User 3 made a request to delete file 2 at 2 seconds and User 5 made a request to read file 2 at 4 seconds, then shouldn't User 3's request be taken up first instead of User 5's request once User 2 completes writing to file 2?
Q7. Can two users write to different files at the same time?
Q8. What does it mean that READing is allowed while simultaneously WRITing to a file? What will the user read if the file being read is modified by another user writing to it?
Q9. What would be the expected output for this input:
```
2 4 6
3 2 5
1 1 READ 0
2 1 WRITE 1
3 2 DELETE 2
STOP
```
Q10. In the given example, LAZY can take up user 1's request at 1 second while user 2 sends a write request at 1 second. Should the order of these two events be the same as in the example, or can they change randomly?
Q11. Do we have to check each second while the thread is waiting to see whether the operation is possible, and print that the user cancelled at `t_k` where `t_k = T + arrival`? Or is it fine if we wait much longer than `T` seconds and then, once we acquire the required locks, realise that time > T and print cancelled at `t_k` seconds where `t_k = T + arrival + extra`?
Q12. The task requirement states the following "Users cancel their requests if LAZY takes more than T seconds (from the time at which users send their request) to start processing." How should we handle the case when a request starts being processed by LAZY at exactly time `T`? According to the example, user 3 sends his request at 2 seconds. Now, user 3 cancels his request at 7 seconds, but the request can start at exactly 7 seconds (`T` = 5).
Q13. In the given test case, file 2 is not being read and no one is writing to it at t = 7, so can't we accept the request of user 3 at t = 7 seconds?
Q14. Can we get more test cases, please?
Q15. Can we limit the number of requests, e.g., take at most 100?
Q16. Can we limit the number of files (can I take the maximum number of files to be 100 or so)?
Q17. Can a user perform two tasks simultaneously? For example, can user 1 read file 1 and write to file 1 at the same time?
Q18. ![image](https://hackmd.io/_uploads/S1Uyfcjlkg.png) What do the colours mean here?
Q19. Does the order in which the output is printed matter? My output is correct, but the results are not in order.
Q20. If the concurrency limit is reached, should the user wait until it gets a chance (if the concurrency count decreases), or should LAZY cancel the request? If it should wait, then the max time would be `T + T_arr`?
Q21. What would be the expected output for this input:
```
2 4 6
3 2 2
1 1 READ 0
2 2 WRITE 1
3 2 DELETE 2
STOP
```
Q22. If an operation (e.g., WRITE) completes at 6s and another WRITE on the same file arrived at 4s, should the next WRITE start at 6s or 7s?
Q23. If an operation completes at `t` seconds, can another operation start at `t` seconds on the same file?
Q24. Can two requests arrive at the same time? If yes, what is the output of the following test case if only 2 concurrent users are allowed?
```
1 1 READ 0
2 1 READ 1
3 1 READ 1
```
Since both 2 and 3 came at the same time, which one should we consider, or do we need to consider both?
Q25. Referring to Q16, I don't see why we cannot limit the number of files. One request can access only one file, so we can limit the number of files to the number of requests.
Q26. If a user, say user 1, requests to delete a file, say file 1, at `t` seconds and LAZY takes up the request at `t+1` seconds, and user 2 requests to read or write file 1 at `t+1` seconds, should that request be declined by LAZY at `t+2` seconds? What should be done?
Q27. Let the write operation take 4s. User 1 requests to write to file 1 at `t` seconds and LAZY takes up the request at `t+1` seconds. If user 2 requests to write to the same file 1 at `t+1` seconds, should LAZY decline the request?
Q28. Referring to Q3, if there are multiple requests with the same user ID, how do we differentiate between the requests?
Q29.
```
1 1 READ 0
2 1 READ 1
```
In this case, what should the output be? Should LAZY start processing both of them at 1 second, assuming `max_concurrent_limit` is greater than two?
Q30. If the number of files is 2 but a user tries to access file 100, what should happen?
Q31.
![image](https://hackmd.io/_uploads/ryoBLzVWkx.png)
Can I make the following assumptions?
1. The requests are given in increasing order of `t_i`.
2. `t_i` goes like 1, 2, 3, ..., i.e., every second a request arrives.
Q32. What happens if `o_1` is erroneous (i.e., instead of "READ" or "WRITE", the user types in "HELLO")? What should the output be?
Q33. If a single user makes two different read requests on the same file (such that the second read request would execute before the first read request is completed), how is that supposed to be treated? Is the second request supposed to be delayed, declined, etc.? If it is supposed to be taken up immediately without delay, does that count as 2 different users accessing the file in terms of `c` (the maximum number of users that can access a file at a given time)?
Q34. Just a further clarification on Q26: every time there is an invalid file access (whether because the file was deleted or because it doesn't exist), should it occur at `t` seconds (assuming the request was made at `t` seconds), or can it occur at `t+1` seconds when the request is supposed to be taken up?
Q35. Do we need to use threads for this part or can we do it without threads?
Q36. If a delete operation is requested at `t=2` and taken up at `t=3`, another request arrives at `t=4`, and the delete operation takes 6s, will the second request wait for the completion of the delete and display invalid at `t=9`, or at `t=4` (i.e., as soon as the request could be taken up)?
Q37. (With reference to Q9) isn't the output incorrect? The delete request shouldn't be taken up until all the read/write requests are completed, right?
Q38. Say a write request completes at `t` seconds, and a delete request and a read/write request are both ready to be executed at that moment. Is it okay to assume that whichever request arrived earlier is given preference? Also, in the same scenario, if both a read and a delete request arrived at `(t-1)` seconds, which should be executed at `t` seconds?
Q39. Say there are two requests which came at the same time but neither can be executed at the moment because of the concurrency limit (max number of users accessing the file). When they can actually execute, can we assume the order of execution to be random, since we can't determine which request acquired the lock?
Q40. Can we use the `sem_timedwait` function to wait on a semaphore for only a given amount of time and return if it exceeds `T`?
Q41. Is it fine if we imitate the delete, read, or write request by sleeping for the required amount of time? There should be no harm in doing this, right?
Q42. Is it necessary to sleep, or can we just ensure that the output is correct? (With sleep a query might require 20 seconds; is it fine if, without sleep, the output comes out immediately?)
Q43. In relation to Q40, can we use `pthread_mutex_timedlock` (which serves the same purpose but for mutexes)? If not, can we use `pthread_mutex_trylock()` to test every second whether the lock can be acquired? We shouldn't wait longer than `T` for any lock, so there should be some function to let that happen, right?
Q44.For this Case:
```
2 4 6
3 2 1
1 1 READ 0
2 2 WRITE 1
STOP
```
Should both users cancel their requests, or will both requests be taken up?
Q45. Referring to Q39, we can't determine which request acquired the lock, right? (It's random, right?) Then how can we execute the earlier one first?
Q46. Let's say a delete arrives at time `x` but cannot be done yet; after that a write arrives and it cannot be done either. After some time the file becomes free. Should we process the delete since it came earlier, or should I go for the write since it was in the *queue*?
Q47. Similar to Q46, what happens if a delete and a read request are both sent at `t` seconds? The read request will be prioritized (in line with Q38). Let's say the time for every operation is 5 seconds and another read request is sent at `t+1` seconds. Assume the concurrency limit has not been reached. LAZY should take up the write at `t+2`, finish it, then take up the rest at `t+7`. Right?
## LAZY Sort
Q1. Can the ID be a string, or is it just an integer?
Q2. Should we use a constant like `max_threads` and hence implement a basic task-queuing system (add all tasks to a queue, and at any given time spawn only `max_threads` threads)? Or do we assume that we can spawn as many threads as we like (just spawn all tasks at once for each level of the merge)?
Q3. Can we use 1 thread extra for managing everything? Or do you expect us to use locks etc and implement it like a recursive function?
Q4. Please tell me if I understand the question wrong, but can't the problem be solved without using concurrency concepts (especially without threads and locks)? Is that acceptable?
Q5. Can we assume that IDs are unique?
Q6. Can we use another sorting algorithm for sorting strings in count sort, or is it mandatory to use count sort for the string criteria (name and timestamp)? Count sort is very efficient for numbers but becomes complex for strings.
Q7. a) What does a distributed implementation mean here? Does it mean distributed over different networks, or file systems, or lists of files/folders split across multiple distinct files? How do we ensure that we are testing the distributed nature of things?
If we actually have to simulate files spread across multiple machines or network locations, we would have to implement the networking for this. How can we go about doing that?
b) If we are not supposed to implement the networking between different machines, do we partition files into chunks and consider each chunk a different node?
Does the input file look something like this?
```
5
node1 fileA.txt 205 2023-10-02T08:00:00
node2 fileB.txt 207 2023-09-30T10:10:00
node2 fileC.txt 203 2023-10-01T15:20:00
node3 fileD.txt 201 2023-09-29T17:15:00
node3 fileE.txt 204 2023-10-01T12:00:00
ID
```
Or do we use multiple files? One for each node?
Something like this?
```
distributed_system/
├── Node_A.txt
├── Node_B.txt
├── Node_C.txt
└── main_data_file.txt
```
main_data_file.txt:
```
50
Node_A fileA.txt 205 2023-10-02T08:00:00
Node_A fileB.txt 207 2023-09-30T10:10:00
Node_B fileD.txt 201 2023-09-29T17:15:00
Node_C fileC.txt 107 2023-10-01T09:15:00
...
ID
```
Content for individual nodes:
Node_A.txt:
```
fileA.txt 205 2023-10-02T08:00:00
fileB.txt 207 2023-09-30T10:10:00
```
Node_B.txt:
```
fileD.txt 201 2023-09-29T17:15:00
```
Node_C.txt:
```
fileC.txt 107 2023-10-01T09:15:00
```
c) Files belonging to different nodes may not have unique names or IDs. What can we assume to be unique for all files? Or are files differentiated only by which node they belong to?
Q8. Count sort for strings? How does that work?
Q9. (Referring to the answer to Q6) If we generate a unique number for each string and do count sort on the numbers, we can't get the sorted list of strings: for that we must assign smaller numbers to lexicographically smaller strings, which requires knowing the lexicographic order of the strings, but that is exactly what we have to find out.
From my understanding, strings can't be count-sorted this way, but please correct me if I am wrong.
Q10. Is there a maximum limit on the range (max - min) of IDs/timestamps in a test case (e.g. 1e5 or similar) so that we can allocate an array of allowable size?
Q11. Can we declare a fixed size for filename/timestamp and call count sort once per character position up to the max length of the given strings? The complexity becomes O(MAXLEN*N). Is this valid, or should I use the procedure specified in Q6?
Q12. Do we need to handle cases where two name strings map to the same number under our hash function? For example, string1 and string2 may both map to the same number x. Do we have to handle such cases separately, or can we assume such collisions won't occur?
Q13. The clarification for Q6 stated that ultimately we have to use count sort to sort Names and Timestamps (sorting criteria that are strings) when the threshold is less than 42. However, can we use multiple passes over each character position, emulating LSD (Least Significant Digit) Radix Sort, where each character position is treated as a "digit"? Or should we stick to mapping each string to a single integer, which requires only a single count sort pass? My implementation of the latter approach restricts the string length to 8, since my hash value is large enough that there is a risk of overflow, while the former approach (emulating radix sort) lets me work with much longer strings.
Could you please clarify which approach I can use?
Q14. So essentially we are not supposed to use MPI or anything like it to implement the distributed system, right?
Q15. Do we have to account for name hash collisions in countsort?
Q16. In reference to Q9 and Q6: we would need a hash function over 128 characters that also preserves ordering, i.e. hash(fileA.txt) < hash(fileB.txt). Also consider that a reasonable max array size is around 10,000 (with, say, 10 bucket slots inside each entry if you want). When you try to generate these hashes, there will inevitably be overflows, and handling them is not trivial. If you simply take a modulus and use bucketing, you have to drastically change the count sort algorithm; it becomes extremely complex and is no longer count sort. The true count sort implementation would iterate over the entire search space, i.e. 0 to max(hash(filename)) WITHOUT applying any modulus to the hash, where the max would theoretically be 2^(8\*128) if all characters are allowed, or (26+4)^128 with only alphabets plus a few extra characters.
Any submitted implementation will be far removed from count sort. Either allow that, or let us know if you have a specific logic/implementation in mind ***in detail***.
Q17. For the line graph, does execution time refer to CPU time or wall-clock time?
Q18. Can we assume that filenames are at most 8 characters long?
Q19. The hashing technique generates very large numbers (if we are to maintain lexicographic order), so can we use something like radix sort or some other sorting algorithm instead of generating the hash number?
Q20. Is there any restriction as to which libraries we are allowed to use? More specifically, is the MPI.h library allowed?
Q21. Are we allowed to use a trie for sorting on Name and Timestamp for count sort?
Q22. Can we assume that only `a-z , .` are used in filenames, so that more characters of the filename can be used in count sort?
Q23. Can we use the `_Atomic` keyword in C?
Q24. In response to the addition in Q16 by [PS]
> Thus there are multiple possible hash function mapping that would lead to a normal count sort implementation. 18446744073709551615 uint size. 26^8 = 208827064576
Here there is a clear issue: iterating over 208827064576 slots (roughly 2 × 10^11) will take a significant amount of time, at least around 500 seconds. And it has to be done in EACH thread, and once more at the end.
Not to mention space, though there are ways to make space work.
26^6 is much more reasonable.
Q25. Is it okay if we take all the count sort arrays of the children together and then merge them in the main thread? This is not what the Stack Overflow source says (that approach is efficient when the number of elements is much larger than the max array size), but it is easier to implement and more efficient for our case.
Q26. Can we assume that valid timestamps are only after 1970-01-01?
Q27. Can we assume that the difference between the hash of the file with the minimum hash value and the file with the maximum hash value is at most 10^6 or 10^7?
Q28. Adding onto Q18, is an assumption of length 6 acceptable to the TAs?
Q29. Is using `qsort` allowed for sorting within individual threads?