Course Project Questions

Q1. Is POSIX compatibility necessary?

Q2. Do we need a main function which calls exec for both the NM and the SS, or will they be booted manually?

Q3. All the SSx are essentially just folders, right? A client may request a file in any of these servers (folders)? Would this implementation be wrong? (If the NM commands some SSx to, say, read a file, then the SSx directory would output the contents of this file to the client who requested them, right?)

Q4. If a file is created in one of these SSx directories without any involvement of a client, e.g. I have booted the system with a running client, NM and SS1, and I manually go into the directory of SS1 and add a new file (using the touch command in a Linux terminal), should I add this entry to the SS1 path list? Or will we not be tested on cases where files aren't created by clients?

Q5. When the spec says to initialize servers up to SS_n, will n be given as input?

Q6. How are the storage servers stored physically? An NM server and a client will typically be run via their exe files, but how do you dynamically "run" storage servers which did not exist before? Do you want me to write a script which creates .c files and runs them according to the value of n?

Q7. When we're creating a new file, how will we know which SS to allocate the data to? Will the naming server allocate it to a random server and tell the client?

Q8. For the write operation, in case the path does not exist, do we create a new file/folder at that path and then write to it?

Q9. How and where exactly do we store the data in all the storage servers in terms of implementation? Is each of them a folder in which we create the files and read from/write to?

Q10. How/why does a new SS get created? Who asks for a new one, and what is the need?

Q11. How exactly do we define the list of accessible paths for the storage server for 1.1? Do we predefine them? In that case, are we expected to load the paths from a file?

Q12.
For the OSN course project, the requirements say that the naming server can issue specific commands to the storage server. But the client is supposed to take a command and send the path to the naming server, and the NM gives the client the IP address and port of the required SS. The client is the one who requests the SS to carry out the action. So when would the naming server issue read/write/create commands?

Q13. So, by executable do you mean that we only have one C program written, and we just run it in different directories to make a new storage server at that location?

Q14. Will the NM code handle all the operations, including updating and copying a file? That is, even though the data is being modified in the SS (local disk), is the NM code the one actually changing the data?

Q15. Will clients write data into files via commands like ```write "hello" /file.txt``` (where /file.txt is a valid entry in the SS1 path list)? Or is there some other format required? It would help if we were given some sort of standard commands which would be run ([OPERATION] [CONTENT] [SERVER] [PATH]).

Q16. Does only the client take input via the terminal? Will the NM and SS need any input from the user booting the system?

Q17. Can we be provided with a list of commands/operations done by the client? It would be helpful for setting up an interface.

Q18. Since every server/client is run locally on the same machine, won't they all have the same IP? Why are we required to send an IP as an entry to the NM, since they are the same anyway?

Q19. Can we assume there won't be more than 32 servers?

Q20. Can you tell me if my understanding is correct: a storage server is an executable, and we create new storage servers by running the code in different directories. To add the accessible paths to the storage server, we can take user input.

Q21.
When any operation is performed by the client (say read dir1/a.txt), the NM will return the relevant information (IP and port) of the SS containing this accessible path. I am confused about what happens after this: does the client disconnect from the NM, connect to the SS and request the "read" again? Once the stop packet has arrived, do we connect the client back to the NM or just exit?

Q22. In 2.2, *"Clients can request file and folder creation or deletion operations by providing the respective path and action....Once the NM determines the correct SS,.. "*: if the client only gave a request for creation/deletion but never gave the SS it wanted to perform this operation on, how will the NM ever determine the SS on which the client wants to perform the operation?

Q23. In 2.3, can we assume that all copy requests will be between different servers (copying a file from SS1 to a directory in SS2, but never to a directory in SS1 itself, since it's the same server)?

Q24. Can we have a master storage server acting as an interface to the NM and all the SS(_1..._x)?

Q25. When we run the code for SS_1 ... SS_n, while initializing, we have to change the port numbers (for NM and client) for every copy of the code, right? Or is this not allowed?

Q26. (Following Q19) Can we have an upper bound on the number of servers that will be initialized?

Q27. For creating and deleting files/folders, the doc says that the SS performs the copying (```The SS processes the request and performs the specified action, such as creating an empty file or deleting a file/folder.```). The answer to question 12 says creating/deleting is a privileged operation and the NM does this. Which do we follow? Is creating/deleting/copying performed by the NM, and read/write/getting information done by the SS?

Q28. According to my understanding, a client will be able to send requests as long as at least one storage server has been initialized. Is this correct?

Q29.
When a client executes a write operation on, say, dir1/file1 in SS1, will the entire file1 be copied over to the client, with the client making changes locally and sending the file back to SS1 (as in conventional NFS mounting), or can we just make the server handle the write operations?

Q30. Can we assume the number of clients to be fixed from the start (they are not added dynamically like the servers)? For example, we assume 4 clients and run them in different terminal windows to get input from them.

Q31. Does the client have a list of the paths present in the NM? Why would the client ever make a request of ```read dir1/file.txt``` without knowing that there is such a path?

Q32. Suppose SS1 has an accessible path ```/file.txt``` and SS2 has an accessible path ```/dir1/file.txt```. We can assume that there won't be another SSx (x!=2) having an accessible path ```/dir1/file.txt```, right?

Q33. In continuation of Q32, if a client inputs ```read file.txt```, will it read the ```file.txt``` in SS1 or SS2?

Q34. Who is responsible for giving permissions on files/paths to the client? Does the NM decide which operation is allowed for which client?

Q35. When and how is the path permission given to the client? Does the NM decide which operations (on every path the NM has) are allowed for a client?

Q36. Marks won't be given for implementing the NFS to work on different machines / online, right? We don't need to implement that as of now?

Q37. In continuation of Q16, once the server has booted with a list of accessible paths (i.e. it has sent its entries and accepts client requests forwarded via the NM), will the list of accessible paths change later on? If a path ```/dir1/newdir``` did not exist before but was created on the server SS1 by some client, will this path be in the list of accessible paths of SS1?

Q38. In continuation of Q37, for new files/folders created by clients, how is the permission of the file decided?
(Let's say client 1 created a directory on SS1; how are we checking whether client 2 has access to this directory? We never assigned any notion of permissions associated with creation from the client 1 side.) Does the new file even get added to the list of accessible paths in SS1, and consequently update the entries of the NM?

Q39. Can we design our NFS such that the NFS can be told to open a particular port, say 1234, using a CLI command, and the new server that we initialise would connect to the same port, as specified by the user in both cases?

Q40. In response to A32, if there is no repetition, then what about the redundancy and replication part? And if there is repetition and we cannot assume a limit on the number of storage servers (using a macro?), then how are we supposed to track file presence? A bitmask cannot work without a limit, and a list associated with each file would be overkill.

Q41. Two questions related to A31. (a) So the client will inevitably guess a path while giving input to the NM, without ever knowing whether the path even exists? (b) NMs can be added dynamically? Isn't there only a single NM running throughout?

Q42. Why does 1.2 require us to send the NM port as an entry attribute? Won't all SS communicate with the NM via the same port number, on which the NM is listening? Or am I missing something? (The NM directly responds to the SS from the port it was contacted on, right?) I can't seem to find its significance.

Q43. Will we be tested on cases where the file size exceeds the RAM size (similar to Mini Project 0)?

Q44. Is copying a file to a file, as in overwriting, a valid operation for ```Copying Files/Directories Between Storage Servers``` ?

Q45. (Continuation of Q11) When we run the executable for an SS in a folder, do we not obtain file paths from there itself (like what we did in the C-Shell MP) instead of taking them as input (user or predefined)?

Q46.
When a client wants to create a new file, is it the NM code which actually creates the file in memory, or does the storage server receive instructions from the NM to create the file?

Q47. In the initialisation part, it is assumed that only one client connects to a storage server: ``` Port for Client Connection: A separate port for clients to interact with SS_1. ``` But later, it's mentioned that multiple clients can read the same file?

Q48. Can we use the execvp() function for copying/deleting directories?

Q49. For the redundancy part, when we say that each backup SS stores a copy of the files, are these actual physical copies of the files on the machine, copied to different SS directories, or are they just paths of the files (which can then be used alongside the absolute path of the SS which went down)?

Q50. Is the redundancy handled by the NM, or does the SS copy its data to any two random SS?

Q51. Are directory names unique across different storage servers? E.g. SS1 has path `dirA/dirB/file1` and is run from `/home/vineeth/d1`, and SS2 has a path `dirA/dirB/file2` and is run from `/home/vineeth/d2`.

Q52. Can inbuilt functions for binary search and sorting be used (from `<stdlib.h>`)?

Q53. Can you please elaborate on the answer for Q47?

Q54. Instead of passing the list of accessible files as user input, can we not parse them from the directory itself (like what we did in MP1)? What's the need for user input?

Q55. (Clarification to Q39) When a new server has to be connected to the NS* in the NFS, can we keep a CLI interface in the NS to allow for new server connection requests? Say a server S_65 has to be connected: the NS would be told using some command to open a port, and the new server would be connected on the same port. Or should we first connect to an initial, always open port on the NS (if free) and then connect to another port whose information would be sent through the new server that is being connected?

Q56.
Can you please provide resources on how to recover from an abruptly lost connection? (Or at least tell me what to search for; I have been trying but I am unable to find it.)

(Question 51 unanswered, someone changed the ordering - look at Q63 - delete this line once you see it)

Q57. Suppose the SS executable is run in home/d1/, which has the following structure ``` . d2 file1 d3 file2 file3 d4 file4 ``` and the accessible paths given are ``` ./d3/ ./d4/ ```. If the client asks to create ./d2/file5, what should we do?

Q58. Regarding accessible files and folders: let's say `d2/f1.txt` is accessible and a client wants to create a new file `d2/f2.txt`. Does the client have permission to do so, or would the directory `d2` have to be accessible? Does `d2/f1.txt` being accessible imply that `d2` is accessible?

Q59. 1.2 states that the NM can issue specific commands to the SS, so only the client and NM are expected to have a prompt, right? (The SS does not need to have a prompt to accept commands, unlike the client and NM.)

Q60. 1.2 describes that an NM can copy/create/delete files on an SS (but a client can't). However, in 2.1 you have provided clients with the functionality of creating/copying/deleting files, contrary to 2.3. This question is sort of a continuation of Q59: do we have to send the NM a command for copy/create/delete via a client, or does the NM itself have a prompt which reads the command for this?

Q61. In continuation of both Q59 and Q60, does the client send paths while the NM sends server IDs for those operations? Some examples of commands would be (extremely!) helpful.

Q62. Let's say a storage server has started in `dir1`. Can we assume that no storage server would be started in `dir1` or any of its subdirectories? Can we assume the directory in which any storage server has started is not a subdirectory of a previously started storage server?

Q63. (Continuation of Q9) Say I have a directory whose absolute path is `/home/vineeth/d1/d2`, which contains the following things: ``` . 
d3 file1.txt file2.txt d4 file3.txt file4.txt file5.txt file6.txt ``` If I run the SS1 executable in `/home/vineeth/d1/d2` and give the accessible paths as ``` d3/file1.txt d4/file3.txt d4/file4.txt file6.txt ```, then the NM stores that these paths are mapped to SS1.
1) Can another SS (say SS2) be initialized inside a subdirectory of SS1? That is, can the SS2 executable be run in the `/home/vineeth/d1/d2/d3` folder and then add a new accessible path `file1.txt`, which creates two entries for the same file via 2 different SS?
2) Say the SS2 executable was run in `/home/vineeth/d1/d5` (which is an empty dir, so no accessible paths), and then the client issues a command like `Create_file file7.txt`. Does this file get added to SS1 or SS2? Can it be added randomly?

Q64. (a) Suppose that there are 3 storage servers SS1, SS2, and SS3. Assume that SS2 is a redundant server for SS1, and suppose SS2 goes down. While SS2 is down, suppose some deletes, creates, etc. are done in SS1. Now, when SS2 comes back up, how exactly would we replicate the data in SS1?

(b) Suppose a client requests a file to be copied from SS3 to SS1. After this is complete, we replicate the command to SS2 (since SS2 is a redundant server for SS1). Now, if the client requests that file to be deleted from SS3, should we make the request wait, as SS2 is currently reading from that file, or should we delete the file before SS2 is done, as the client never requested SS2 to access that file (the Name Server handles redundancy)?

(c) Adding on to the above question, when a storage server is initially added, can we make its paths available for client operations only after all its files have been redundantly copied over to 2 other storage servers? This is to prevent situations like the one in the previous question, where a client may request a write operation on a file that is currently being copied for redundancy.

(d)
A final question: can we dedicate 3 storage servers to be redundant servers for all other servers, and assume that these 3 servers will never go down? And for this purpose, can we dedicate the first 3 servers to do this? (3 SSs are necessary, as each needs the other 2 to be its redundant servers.)

Q65. For the efficient search requirement, is it OK if we copy standard code for the implementation of the data structure and cite it?

Q66. Can you please elaborate on LRU? What exactly needs to be cached: do we need to cache the output, or the corresponding storage server the client needs to go to? Also, is the LRU cache specific to a client, or is it common to all the clients?

Q67. Instead of taking the valid files as input from a user, can we use something akin to a .gitignore that defines what files to ignore?

Q68. You had mentioned that only a client can give prompts (i.e. only a client will have a running terminal which takes input), but the website says that the NM can issue commands to the SS, under "Commands issued by NM". Do these commands come from the client but get fulfilled by the NM? (I went through A12 and still wanted to clarify.)

Q69. Continuation of Q68: in A12 you had mentioned that if a client wants to create a file/directory at some path, the NM will judge the path to determine which SS it belongs to. This seems confusing; could you give some example? Because we do not store the paths of the SSx themselves, how would we determine where to create this path/directory? It would make more sense to give the value of 'x' so that the file/directory can be created on server SSx. Otherwise, please (kindly) elaborate with an example of how a directory creation from a client input (if Q68 holds) is forwarded and managed by the NM. (This should also clarify other privileged commands like copy/delete.)

Q70. Will we be tested on cases where an SS goes down but copy/create/delete commands are issued to this SS?

Q71.
In continuation of A69, using the same example, could you please(!) elaborate on creating a file (what the input would look like and how the NM would figure out the SS from it alone)?

Q72. Referring to Q71, is everything before the last '/' (in the client request) assumed to exist in some SS?

Q73. Can we assume that while one query of the client is being executed, the client will wait for the query to finish, or is it necessary to handle multiple queries by the same client concurrently?

Q74. Do backup SS know whether the files they are storing are backup files or their own entries? In other words, if SS3 becomes active and stores its backup on SS1 and SS2 by physically copying the files to these servers, is it stored in the same way it was stored in SS3 (same accessible paths etc.), or do we make a new directory in SS1 and SS2 for these backup files? (E.g. if SS3 had accessible paths "file.txt" and "newdir/a.txt", then would SS1 (its backup) store the backup in an SS3 directory, with SS1 having accessible paths "SS3/file.txt" and "SS3/newdir/a.txt", or does SS1 store it exactly as SS3 did? In the latter case two SS would have the same accessible paths, which was never allowed.)

Q75. We are facing some issues with implementing the `copy` function across storage servers. 1. If we are providing relative paths (as mentioned in Q47), then the 'dot' in './folder1/folder2' etc. (which denotes the current directory) would potentially be different. 2. If we call `copy ./folder1/1.txt ./folder2` and they are in different storage servers, then would the naming server have to gather all the contents of the file from one SS and send them to the other SS together? We are just facing a lack of clarity in implementing `copy` across multiple SS. It would be great if you could give some pointers.

Q76. When you say the NM can facilitate the transfer of files between two SS, how are these files transferred? Because sending non-empty directories over TCP sockets is not possible.
Can we use absolute paths and system calls instead of TCP sockets for this part?

Q77. (Deleted; answer found.)

Q78. Does the backup store the state of an SS only at its entry, or is the backup also updated each time something in the SS is updated?

Q79. Will the SS exit through Ctrl+C and the client exit through some input like "exit"? Or do they both exit using Ctrl+C? (Or will some other exit mechanism be tested?)

Q80. Can we use the system function for copy? This is supported on Unix-like systems.

Q81. Referring to Q66, let's assume that when a certain path was accessed, its server was down and we used the backup server to perform the operation. In that case, should the LRU cache store the backup server or the original server?

Q82. You mentioned in the thread above that LRU caching is to be done in O(1), but how do we ensure that strings are mapped in O(1)? Even for hashing, we need to traverse the entire length of the string, which takes O(l).

Q83. Regarding the LRU implementation: if the information about which SS a file path is located in is already stored in a hash table, as it is in our implementation, what will be the advantage of LRU? Can we instead store the outputs of particular commands, so that contacting the SS can be reduced?

Q84. Can we work on the assumption that if we want to copy a directory in SS_x1 to SS_x2, and this directory has only a limited set of paths exposed, then we only copy those directories and files which are exposed to the NS? Also, as a general doubt regarding function calls, can we just use the paths of the directories/files to perform tasks across SS, i.e. pass absolute paths based on a search in the NS, check the src and dest SS, and then make system calls in the dest SS?

Q85. Do all the servers need to have unique port addresses?

Q86.
Say a client requests a file which is located in SS1 using read. Will the naming server handle the complete request in the backend and give the client the exact content of the file, or will it just provide the client the address of SS1, leaving the client to perform more operations to fetch it? And if the client only receives the address, what is the client supposed to do next? After establishing the connection with the storage server, will the client break the connection with the naming server?

Q87. Is it necessary for the SS to know which files are backups? Is it not fine as long as, in our implementation, we don't allow access to those files while the primary SS is active, or write operations on them? The project doc does not mention that the SS must be aware of which files are backups.

Q88. LRU is to be implemented in O(1). Say we are to cache a known upper bound on the number of entries, CACHE_MAX. Can CACHE_MAX operations for a search be rightfully considered O(1), or will this attract a penalty? (This feels like it incurs much less overhead than a hash implementation.)

Q89. Are we allowed to use the "uthash" lib for hashing? There is no need to implement a hashing algorithm manually in a networking project.

Q90. Can we compress the directory to a .tar for copying between different SS and then send it over TCP?

Q91. Do we have to assume that the SS is initially empty, so that there are no accessible paths initially?
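
Note on Q88: to make the approach in that question concrete, here is a minimal sketch of a fixed-capacity LRU cache that is searched by a linear scan. It is only an illustration of what the question describes, not the required or official design. The names `lru_cache`, `lru_get`, `lru_put`, `PATH_MAX_LEN`, the value chosen for `CACHE_MAX`, and the idea of caching a path to SS-ID mapping are all assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define CACHE_MAX 4        /* hypothetical capacity; Q88 only names the constant */
#define PATH_MAX_LEN 256   /* hypothetical limit on the length of a cached path */

/* One cached lookup: a path and the (hypothetical) ID of the SS holding it. */
struct cache_entry {
    char path[PATH_MAX_LEN];
    int ss_id;
    unsigned long last_used; /* logical clock value of the most recent access */
    int in_use;
};

struct lru_cache {
    struct cache_entry slots[CACHE_MAX];
    unsigned long clock;     /* incremented on every get hit and every put */
};

void lru_init(struct lru_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Returns the cached SS ID for path, or -1 on a miss.
 * Scans at most CACHE_MAX slots, so the work is bounded by a constant. */
int lru_get(struct lru_cache *c, const char *path)
{
    assert(c != NULL && path != NULL);
    for (int i = 0; i < CACHE_MAX; i++) {
        if (c->slots[i].in_use && strcmp(c->slots[i].path, path) == 0) {
            c->slots[i].last_used = ++c->clock;
            return c->slots[i].ss_id;
        }
    }
    return -1;
}

/* Inserts or updates path -> ss_id. If the path is not cached, a free slot
 * is used; if none is free, the least recently used slot is evicted.
 * Paths longer than PATH_MAX_LEN - 1 are truncated and will not match later. */
void lru_put(struct lru_cache *c, const char *path, int ss_id)
{
    assert(c != NULL && path != NULL);
    int victim = -1;
    for (int i = 0; i < CACHE_MAX; i++) {
        if (c->slots[i].in_use && strcmp(c->slots[i].path, path) == 0) {
            victim = i; /* already cached: overwrite in place */
            break;
        }
    }
    if (victim < 0) {
        victim = 0;
        for (int i = 0; i < CACHE_MAX; i++) {
            if (!c->slots[i].in_use) { victim = i; break; }
            if (c->slots[i].last_used < c->slots[victim].last_used)
                victim = i;
        }
    }
    strncpy(c->slots[victim].path, path, PATH_MAX_LEN - 1);
    c->slots[victim].path[PATH_MAX_LEN - 1] = '\0';
    c->slots[victim].ss_id = ss_id;
    c->slots[victim].in_use = 1;
    c->slots[victim].last_used = ++c->clock;
}
```

In this sketch the scan cost is bounded by the constant CACHE_MAX and does not grow with the number of paths the NM knows about, although each comparison still costs time proportional to the path length (the concern raised in Q82). Whether this is accepted as O(1) for grading is for the course staff to answer.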