Q1. The link provided for the reference code of the NFS implementation is wrong. Can you please correct it?
Q2. If I understood it correctly, we are required to form an 'illusion' of file paths to the user. This is done by the naming server. Just like how physical memory addresses are virtualized, the location of files on storage servers is virtualized by the naming server.
If this is the case, how exactly should we virtualize? What should be the paths available to the user? Should the filesystem consist of directories like `ss1/`, `ss2/`, and so on where all the files in the `ss1/` directory are stored on storage server 1, or do we decide on some other naming scheme, or is this scheme decided by someone else?
Q3. Can we create 2 sockets for the naming server?
Q4. Are we required to play the audio file in real time in the client while the files are being streamed from server?
Q5. Can we use `libvlc` ?
Q6. While handling the read command, should the file be displayed directly on the terminal, or should we use vim or some other way entirely?
Q7. Just to confirm: the clients will always provide absolute paths, right? No actual files or directories exist on the user's end.
Q8. Is it guaranteed that 2 clients won't stream on the same laptop at the same time?
Q9. Can you provide/can we assume any maximum on number of clients, storage servers, path length, etc?
Q10. Do we need to ensure persistence of files even after the program is stopped; should the data in the Storage servers remain even after they are shut down?
Q11.
```NM Starts Accepting Client Requests: Once all Storage Servers are initialized and registered with the Naming Server, the Naming Server begins accepting client requests. It becomes the central point through which clients access and manage files and directories within the network file system.```
What do we mean by "once all Storage Servers are initialized"? Do we assume a fixed number of SSs will connect initially?
Q12. Can we use protobuf-c library for communication (Makes everything cleaner, easier to manage)?
Q13. Do reading and writing only deal with text? Or, in write, are we expected to send a local file (as in something from the client machine)?
Do we implement a system where a command like `WRITE <localfilepath> <nfsfilepath>` copies from the client machine and uploads to NFS,
and a command like `READ <nfsfilepath> <localfilepath>` copies from NFS to local? If only `READ <nfsfilepath>` is provided, it would print the content of the NFS file in the client terminal.
Q14. All our folders are managed only on the NM side; the SS is not concerned with any folder behavior. Is this ok? (All specification functionalities are maintained.)
Q15. In initialization spec under "List of Accessible Paths" is it ok if this is completely managed by NM instead? (Subset of q14).
Q16. To elaborate on questions 14,15 the plan is to not store any data about folder structure etc on storage servers. Storage servers are only responsible to store files (filenames can be unique identifiers to avoid clashes). NM stores all details about files also. When `READ <nfsfilepath>` is executed, the NM looks up the unique identifier, passes the unique identifier and the storage server ip:port, the client uses this identifier and sends a request to storage server to get the file.
Also, b) If you still require us to send a list of files, we will send the list of identifier file ids to the NM
This question partly comes up because we feel this process of sending a list is not necessary: when a file is queried and it can't be accessed, that can be handled on the spot.
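The lookup proposed in Q16 could be pictured roughly as below. This is only a sketch of the idea under stated assumptions: the names `FileEntry`, `FileTable`, `table_add` and `table_lookup`, the size caps, and the linear scan are all hypothetical and not part of the specification. A trie or hash map would replace the scan if the search has to be efficient.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FILES    256  /* assumed cap, not from the spec */
#define MAX_PATH_LEN 256  /* assumed cap, not from the spec */

/* One record per NFS file, kept only on the naming server (hypothetical layout). */
typedef struct {
    char path[MAX_PATH_LEN];  /* path as the client sees it */
    unsigned long file_id;    /* unique identifier used as the file name on the SS */
    char ss_ip[16];           /* dotted IPv4 address of the storage server */
    unsigned short ss_port;   /* port the client should contact */
} FileEntry;

typedef struct {
    FileEntry entries[MAX_FILES];
    size_t count;
} FileTable;

void table_init(FileTable *t)
{
    assert(t != NULL);
    t->count = 0;
}

/* Returns the record for path, or NULL if the NM does not know the path. */
const FileEntry *table_lookup(const FileTable *t, const char *path)
{
    assert(t != NULL && path != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].path, path) == 0)
            return &t->entries[i];
    return NULL;
}

/* Returns 0 on success, -1 if the table is full, a string is too long,
 * or the path is already registered. */
int table_add(FileTable *t, const char *path, unsigned long file_id,
              const char *ss_ip, unsigned short ss_port)
{
    assert(t != NULL && path != NULL && ss_ip != NULL);
    if (t->count == MAX_FILES)
        return -1;
    if (strlen(path) >= MAX_PATH_LEN || strlen(ss_ip) >= sizeof t->entries[0].ss_ip)
        return -1;
    if (table_lookup(t, path) != NULL)
        return -1;
    FileEntry *e = &t->entries[t->count++];
    strcpy(e->path, path);   /* lengths were checked above */
    strcpy(e->ss_ip, ss_ip);
    e->file_id = file_id;
    e->ss_port = ss_port;
    return 0;
}
```

On `READ <nfsfilepath>`, the NM would call `table_lookup` and send `file_id`, `ss_ip` and `ss_port` back to the client, which then contacts the storage server directly.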
Q17. Do you want the ability to manipulate files and folder structures externally without using NFS (i.e fiddle around with Storage Server files when NFS not running) and then have the changes show up on boot? Or is it fine if we assume that all manipulation is done via NFS only.
If we assume this then we can make the entire folder manipulation software in the NM itself, instead of the complications of directory structures inside each storage server
b) If you really need aforementioned ability, would it be ok if we assign each StorageServer a top level directory randomly like `SS1 <-> /bagel`,`SS2 <-> /pakoda` etc , or if you expect something different please do elaborate
Q18. Do we need to handle backup cases when a SS registers? or is it enough if we do it when new files are written
Q19. After a storage server comes back up after going off, how can we tell if it is the same storage server ? Can we assume that the ip address and the port remain same ?
Q20. With respect to Answer for Q18, this could cause potential complications
Say SS1 backups to SS2, SS3 initially
Now SS1 comes on, SS4, SS5 come on
SS1 starts backup process to SS4, SS5 although 2 copies already exist on SS2, SS3 which come on later.
Q21. Consider this scenario
SS1 has following files
/abc/1/hello
SS2 has
/abc/2/hi
now user decides to create a folder /abc/3/
under whom should it be created? (Or random?)
Q22. I did not really understand this part. Can you please explain it clearly?
3.5 Backing up Data [70 Marks]
Failure Detection: The NM should be equipped to detect Storage Server (SS) failures. This capability ensures that the NFS can respond promptly to any disruptions in SS availability.
Replication: Implement a replication strategy for data stored within the NFS. This strategy involves duplicating every file and folder in an SS in two other SS (once the number of SS exceeds two). In the event of an SS failure, the NM should be able to retrieve the requested data from one of the replicated stores. However, at this stage, only read operations should be allowed.
Asynchronous Duplication: Every write command should be duplicated asynchronously across all replicated stores. The NM does not wait for acknowledgment but ensures that data is redundantly stored for fault tolerance.
Q23. In response to the answer to Q20: if we have to store details about all files and their backups anyway, what is the need for sending the entire list of files again to the naming server whenever a storage server starts? I think the intended way of doing it is that backups take place only on WRITE operations.
b) Can any new SS connect to NM at any point in time? If so backup on SS start will again become more complicated
Q24. In the `CREATE <path> <name>` call, what exactly is `name`? If it's the name of the file to be created, can't it just be a part of the path?
Q25. Can we use `ao/ao.h` header file for streaming audio?
Q26. Suppose an SS is trying to reconnect to the NM. Since we already registered all its files earlier, we are not registering them again. For the redundancy part:
a) Since we are not adding new paths, we don't really care about any files that were added; we only have to accommodate the files deleted through the backup servers, right?
b) Is checking the accessible paths sufficient, or is checking the actual data inside the files also needed?
Q27) Can we fix which storage servers act as backups (SS1 and SS2 as the backups for every other server, and SS3 as the backup for SS1 and SS2), or should the choice be random?
Q28) For operations issued by clients to storage servers, will the acknowledgement be sent to the client directly by the storage server, or indirectly by the SS via the naming server?
Q29) Can we assume the File/Directory names to be unique, i.e ./DIR1/a.txt in Storage server 1 will not be in storage server 2 with the same path, i.e ./DIR2/a.txt?
Q30) How many bonus points would we get if we give two golden tickets for coldplay to the TA's ?
Q31) Can we assume ASYNC flag will be provided for asynchronous write?
Q32) Regarding Q26: if files are deleted while the SS is off, how do we handle those files when the SS comes back? Do we have to copy them from a backup SS?
Q33) If a folder is given as accessible path will all the files present in it be accessible, also will all the subfolders in it be accessible?
If yes, then what do you mean by `NM will also dynamically add the new path in this case into the list of accessible paths` when creating a file/folder? Won't this be taken care of by itself?
Q34) Can we use the uthash library?
Q35) Suppose we make the client upload/write a very large file to a storage server. In this case, does the client wait for the 'network send' part to finish and then receive the write acknowledgement asynchronously? **OR** do you want us to do BOTH the '*network send*' and the '*wait for acknowledgement*' asynchronously?
Q36) Are we allowed to use execvp and other cousins of exec?
Q37) Can we make an assumption that if a particular folder is stored on a storage server, the whole subtree under that folder is stored in the same storage server?
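If the Q37 assumption were allowed, the NM could resolve any path by finding the longest registered folder that contains it. The sketch below shows that idea only; the `Mount` struct and the names `is_under` and `resolve_owner` are hypothetical, and the linear scan stands in for whatever efficient search structure the NM actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A folder registered by a storage server (hypothetical layout). */
typedef struct {
    const char *folder;  /* e.g. "/abc/2", written without a trailing slash */
    int ss_id;           /* storage server that holds the whole subtree */
} Mount;

/* True if path equals folder or lies inside it.
 * Compares whole components, so "/abc" does not match "/abcd/x". */
static int is_under(const char *folder, const char *path)
{
    size_t n = strlen(folder);
    if (strncmp(folder, path, n) != 0)
        return 0;
    return path[n] == '\0' || path[n] == '/';
}

/* Returns the ss_id of the longest registered folder containing path,
 * or -1 if no registered folder contains it. */
int resolve_owner(const Mount *mounts, size_t count, const char *path)
{
    assert(mounts != NULL && path != NULL);
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(mounts[i].folder);
        if (is_under(mounts[i].folder, path) && (best < 0 || len > best_len)) {
            best = mounts[i].ss_id;
            best_len = len;
        }
    }
    return best;
}
```

With `/abc` registered by server 1 and `/abc/2` by server 2, `/abc/2/hi` resolves to server 2 and `/abc/1/hello` to server 1.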
Q38) Can I assume a maximum of 65536 servers?
Q39) In my understanding, there are 3 types of writes: 1) synchronous write, 2) asynchronous write, 3) normal write. Can you please differentiate between synchronous write and normal write?
Q40) How are we supposed to handle the locking cases, should a write block when another write is running, or should the second write print an error?
Q41) Adding on to Q27, just to confirm: do we need 2 backup servers for all n storage servers, or does each storage server have its data backed up in two different storage servers? (For example, if we have 4 storage servers SS1-SS4, do I need dedicated servers SS1-SS2 such that all of them back up to these 2 only, or can SS1 store to SS2 and SS4, and so on?)
Q42) Are system commands allowed for copy?
Q43) Consider the situation where we have /abc/1/ stored in SS1, backed up in SS2 and SS3. a) SS1 is down. COPY /abc/1/ /123/xyz/ is executed. Are we required to allow COPY in this case?
Q44) Can we use zip command and zip a folder, then send it over for copying an entire folder?
Q45) Can we assume that if file /abc/123/a.txt is being read, DELETE /abc/ won't be executed?
Q46) Can we use execvp commands?
Q47) For client it says it should be able to do various file-related operations such as reading, writing, deleting, streaming, and more. What all is meant by "more" ?
Q48) Referring to Q39: if the SYNC flag is not provided and the data is not large (not greater than the threshold for async), I am treating it as a normal write. Please correct me if I'm wrong.
Q49) Can we use exec or related functions to get the file info?
Q50) So can the client wait for the data to be first sent to the storage server as packets in case of an asynchronous write? Or do we need to create a separate thread in the client responsible for sending the data, making sure the client doesn't have to wait for the packets to be sent over the network? In the first case, I am assuming we are allowed to assume a maximum file size to be written.
Q51) The answer to Q29 of this document said that we could assume that file/dirpaths would be unique. But we can copy files/dirs from one SS to another. So if I copy a directory from SS1 to SS2, and then later try to access a file in that directory, how do i know which server to go to, since there are now 2 copies of that file? Sure i could include info in each path about which server it's located on, but it would render using an optimized search function for files/dirs useless (and we were also told not to do this in the answer to Q2 of this document).
So how are we supposed to deal with this?
Q52) Can we implement logging mechanism by writing everything to a file?
Q53. Can we implement the write operation through the naming server rather than the storage server?
Q54) Do we need to make everything efficient, or only the search that the naming server does? In our implementation, everything other than the search uses an `O(n²)` approach. Is it fine if only the search for the SS is efficient and not the others?
Q55) In response to the zip question, if it does in fact copy all contents, are we allowed to use zip?
And are we also allowed to use rm, rm -rf command for deleting files and folders (this is where execvp would be used)?
Q56) Can we use a shell (bash) feature to implement logging?
Q57) Are we expected to store logs even after program ends?
Q58) If a file at /abc/123 is being read or written and DELETE /abc/123 is executed, or if /abc/123 is being written and COPY /abc/123 is executed, we have decided to reject the command. Will any marks be cut for this? Or do you expect us to wait and check for a while before rejecting the command, for full marks?
Q59) When say a directory /DIR1 is stored in SS1, backed up in SS2 and SS3. SS2 is down. Do we need to allow write to a NEW file called /DIR1/123.txt Or can we reject the request? If you want us to allow the write then we have to backup the new file to say SS4? (or) we only backup to SS3 and then whenever SS2 comes online, backup to SS2?
Q60) Do backup operations have to be immediate (after the list is uploaded or a file is written)? Or can we simply run a background thread to check and initiate backups?
Q61) Suppose the folder /DIR1/ is initially only in SS1. Now a request `WRITE /DIR1/abc.txt` is sent. Is it ok if we write the new file in SS2? (This helps us generalise everything and handle things on a per file basis).
Q62) If only /DIR1/ exists and it is empty, a command called `WRITE /DIR1/DIR11/abc.txt` is sent. Should we send an error or can we recursively generate folders and insert the file anyway.
Q63) Is create only for files? or is it also for folders?
Q64) Is it ok if we DISALLOW file writes/creates in the root directory? `CREATE /folder/` Allowed. `Create /file` Disallowed.
Q65) A folder called `/folder/` initially belonged to SS1, and now SS1 is down. If a command `CREATE /folder/abc/` or a command `WRITE /folder/abc` is executed, do we store these in SS2? OR do we have the freedom to simply reject the command?
Q66) What could be the reason why the function for copying directories also copies files which have been deleted before starting any code, but existed at some point in the source directory?