OSDI workshop === ##### 0756049 許耕福 --- ## Outline * Brief introduce of dentry , inode * Hard link * Soft link <!-- .slide: data-transition="fade-in fade-out" --> --- ## Dentry and Inode <!-- .slide: data-transition="fade-in fade-out" --> ---- ### Inode * Inode is **a data-structure** that represents a file or a directory in disk. <!-- .element: class="fragment" data-fragment-index="1" --> * An inode number is a unique number given to an inode. We use inode number to ==identify each file== in a file system. <!-- .element: class="fragment" data-fragment-index="2" --> * A file is stored in a file system. * We need to have some kind of identification mechanism. <!-- .element: class="fragment" data-fragment-index="3" --> * Each **inode contains a lot of information** about the file (size of the file, owner, group, permissions and all kinds of stuff like that) <!-- .element: class="fragment" data-fragment-index="4" --> ---- ### Dentry * The user generally accesses the file by its name; however, such filenames are not understood by the kernel. * Directory entry is basically ==the mapping of filename to its inode==. * They are stored in a slab allocator cache called dentry_cache. ---- ![](https://i.imgur.com/3FKpHp7.png) ---- ![](https://i.imgur.com/UGmlwZV.png) <style> h1 { text-align: center; } h2 { text-align: center; } h3 { text-align: center; } h4 { text-align: center; } h5 { text-align: center; } .reveal .slides { text-align: left; } </style> <!-- .slide: data-transition="fade-in fade-out" --> --- ## Hard Link <!-- .slide: data-transition="fade-in fade-out" --> ---- * Hark link is just a link. * It points to the original ==inode== of the “file”, which makes it almost similar to the orignal “file” except in the name. ---- ```cpp= asmlinkage long sys_link(const char __user * oldname, const char __user * newname) { struct dentry *new_dentry; struct nameidata nd, old_nd; int error; char * to; to = getname(newname); if (IS_ERR(to)) return PTR_ERR(to); error = __user_walk(oldname, 0, &old_nd); if (error) goto exit; error = path_lookup(to, LOOKUP_PARENT, &nd); if (error) goto out; error = -EXDEV; if (old_nd.mnt != nd.mnt) goto out_release; new_dentry = lookup_create(&nd, 0); error = PTR_ERR(new_dentry); if (!IS_ERR(new_dentry)) { error = vfs_link(old_nd.dentry, nd.dentry->d_inode, new_dentry); dput(new_dentry); } up(&nd.dentry->d_inode->i_sem); out_release: path_release(&nd); out: path_release(&old_nd); exit: putname(to); return error; } ``` ---- ```cpp= int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry) { struct inode *inode = old_dentry->d_inode; int error; if (!inode) return -ENOENT; error = may_create(dir, new_dentry); if (error) return error; if (dir->i_sb != inode->i_sb) return -EXDEV; /* * A link to an append-only or immutable file cannot be created. */ if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) return -EPERM; if (!dir->i_op->link) return -EPERM; if (S_ISDIR(inode->i_mode)) return -EPERM; error = security_inode_link(old_dentry, dir, new_dentry); if (error) return error; mutex_lock(&inode->i_mutex); error = dir->i_op->link(old_dentry, dir, new_dentry); mutex_unlock(&inode->i_mutex); if (!error) fsnotify_link(dir, inode, new_dentry); return error; } ``` ---- ```cpp= static int ext3_link (struct dentry * old_dentry, struct inode * dir, struct dentry *dentry) { handle_t *handle; struct inode *inode = old_dentry->d_inode; int err, retries = 0; if (inode->i_nlink >= EXT3_LINK_MAX) return -EMLINK; retry: handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS(dir->i_sb) + EXT3_INDEX_EXTRA_TRANS_BLOCKS); if (IS_ERR(handle)) return PTR_ERR(handle); if (IS_DIRSYNC(dir)) handle->h_sync = 1; inode->i_ctime = CURRENT_TIME_SEC; ext3_inc_count(handle, inode); atomic_inc(&inode->i_count); err = ext3_add_nondir(handle, dentry, inode); ext3_journal_stop(handle); if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries)) goto retry; return err; } ``` <!-- .slide: data-transition="fade-in fade-out" --> ---- ### Hard Link ![](https://i.imgur.com/ab6yVai.png) <!-- .slide: data-transition="fade-in fade-out" --> ---- * ```..``` is a hard link to the parent directory, which is created as part of the directory entry. <!-- .element: class="fragment" data-fragment-index="5" --> * ![](https://i.imgur.com/7GPi0PB.png) <!-- .element: class="fragment" data-fragment-index="6" --> * Limit: cannot create a hark link to a directory. <!-- .element: class="fragment" data-fragment-index="7" --> * A file system with this kind of hard link is no longer a tree, because a tree must not, by definition, contain a loop. <!-- .element: class="fragment" data-fragment-index="8" --> ---- ```shell= mkdir -p /tmp/a/b cd /tmp/a/b ln -d /tmp/a l ``` A filesystem with a directory loop has infinite depth: ```shell= cd /tmp/a/b/l/b/l/b/l/b/l/b ``` * Avoiding an infinite loop when traversing such a directory structure is somewhat difficult. <!-- .element: class="fragment" data-fragment-index="10" --> * Limit: cannot cross among the filesystems. <!-- .element: class="fragment" data-fragment-index="11" --> <!-- .slide: data-transition="fade-in fade-out" --> --- ## Soft Link <!-- .slide: data-transition="fade-in fade-out" --> ---- * symbolic link or symlink <!-- .element: class="fragment" data-fragment-index="12" --> * A symbolic link is **a file**, which contains information about the location of a target file or directory. <!-- .element: class="fragment" data-fragment-index="13" --> * It points to another file or directory. <!-- .element: class="fragment" data-fragment-index="14" --> * A symbolic link has its own inode. <!-- .element: class="fragment" data-fragment-index="15" --> * Loop issue <!-- .element: class="fragment" data-fragment-index="16" --> * A symlink is independent of the target. <!-- .element: class="fragment" data-fragment-index="17" --> <!-- .slide: data-transition="fade-in fade-out" --> ---- ```cpp= asmlinkage long sys_symlink(const char __user * oldname, const char __user * newname) { int error = 0; char * from; char * to; from = getname(oldname); if(IS_ERR(from)) return PTR_ERR(from); to = getname(newname); error = PTR_ERR(to); if (!IS_ERR(to)) { struct dentry *dentry; struct nameidata nd; error = path_lookup(to, LOOKUP_PARENT, &nd); if (error) goto out; dentry = lookup_create(&nd, 0); error = PTR_ERR(dentry); if (!IS_ERR(dentry)) { error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO); dput(dentry); } up(&nd.dentry->d_inode->i_sem); path_release(&nd); out: putname(to); } putname(from); return error; } ``` <!-- .slide: data-transition="fade-in fade-out" --> ---- ```cpp= int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname, int mode) { int error = may_create(dir, dentry, NULL); if (error) return error; if (!dir->i_op || !dir->i_op->symlink) return -EPERM; error = security_inode_symlink(dir, dentry, oldname); if (error) return error; DQUOT_INIT(dir); error = dir->i_op->symlink(dir, dentry, oldname); if (!error) { inode_dir_notify(dir, DN_CREATE); security_inode_post_symlink(dir, dentry, oldname); } return error; } ``` <!-- .slide: data-transition="fade-in fade-out" --> ---- ```cpp= static int ext3_symlink (struct inode * dir, struct dentry *dentry, const char * symname) { handle_t *handle; struct inode * inode; int l, err, retries = 0; l = strlen(symname)+1; if (l > dir->i_sb->s_blocksize) return -ENAMETOOLONG; retry: handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS + EXT3_INDEX_EXTRA_TRANS_BLOCKS + 5 + 2*EXT3_QUOTA_INIT_BLOCKS); if (IS_ERR(handle)) return PTR_ERR(handle); if (IS_DIRSYNC(dir)) handle->h_sync = 1; inode = ext3_new_inode (handle, dir, S_IFLNK|S_IRWXUGO); err = PTR_ERR(inode); if (IS_ERR(inode)) goto out_stop; if (l > sizeof (EXT3_I(inode)->i_data)) { inode->i_op = &ext3_symlink_inode_operations; ext3_set_aops(inode); /* * page_symlink() calls into ext3_prepare/commit_write. * We have a transaction open. All is sweetness. It also sets * i_size in generic_commit_write(). */ err = page_symlink(inode, symname, l); if (err) { ext3_dec_count(handle, inode); ext3_mark_inode_dirty(handle, inode); iput (inode); goto out_stop; } } else { inode->i_op = &ext3_fast_symlink_inode_operations; memcpy((char*)&EXT3_I(inode)->i_data,symname,l); inode->i_size = l-1; } EXT3_I(inode)->i_disksize = inode->i_size; err = ext3_add_nondir(handle, dentry, inode); out_stop: ext3_journal_stop(handle); if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries)) goto retry; return err; } ``` <!-- .slide: data-transition="fade-in fade-out" --> ---- ### Lookup of Symbolic Links * link_path_walk() ```cpp= if (inode->i_op->follow_link) { mntget(next.mnt); err = do_follow_link(next.dentry, nd); dput(next.dentry); mntput(next.mnt); if (err) goto return_err; err = -ENOENT; inode = nd->dentry->d_inode; if (!inode) break; err = -ENOTDIR; if (!inode->i_op) break; } else { dput(nd->dentry); nd->mnt = next.mnt; nd->dentry = next.dentry; } ``` <!-- .slide: data-transition="fade-in fade-out" --> ---- ```cpp= static inline int do_follow_link(struct dentry *dentry, struct nameidata *nd) { int err = -ELOOP; if (current->link_count >= MAX_NESTED_LINKS) goto loop; if (current->total_link_count >= 40) goto loop; BUG_ON(nd->depth >= MAX_NESTED_LINKS); cond_resched(); err = security_inode_follow_link(dentry, nd); if (err) goto loop; current->link_count++; current->total_link_count++; nd->depth++; err = __do_follow_link(dentry, nd); current->link_count--; nd->depth--; return err; loop: path_release(nd); return err; } ``` <!-- .slide: data-transition="fade-in fade-out" --> ---- * The follow_link method is a filesystem dependent function that reads the pathname stored in the symbolic link from the disk. <!-- .slide: data-transition="fade-in fade-out" --> ---- ```cpp= static inline int __do_follow_link(struct dentry *dentry, struct nameidata *nd) { int error; touch_atime(nd->mnt, dentry); nd_set_link(nd, NULL); error = dentry->d_inode->i_op->follow_link(dentry, nd); if (!error) { char *s = nd_get_link(nd); if (s) error = __vfs_follow_link(nd, s); if (dentry->d_inode->i_op->put_link) dentry->d_inode->i_op->put_link(dentry, nd); } return error; } ``` <!-- .slide: data-transition="fade-in fade-out" --> ---- ```cpp= static inline int __vfs_follow_link(struct nameidata *nd, const char *link) { int res = 0; char *name; if (IS_ERR(link)) goto fail; if (*link == '/') { path_release(nd); if (!walk_init_root(link, nd)) /* weird __emul_prefix() stuff did it */ goto out; } res = link_path_walk(link, nd); out: if (nd->depth || res || nd->last_type!=LAST_NORM) return res; /* * If it is an iterative symlinks resolution in open_namei() we * have to copy the last component. And all that crap because of * bloody create() on broken symlinks. Furrfu... */ name = __getname(); if (unlikely(!name)) { path_release(nd); return -ENOMEM; } strcpy(name, nd->last.name); nd->last.name = name; return 0; fail: path_release(nd); return PTR_ERR(link); } ``` <!-- .slide: data-transition="fade-in fade-out" --> --- ## Hard Link ![](https://i.imgur.com/ab6yVai.png) <!-- .slide: data-transition="fade-in fade-out" --> --- ## Soft Link ![](https://i.imgur.com/q7LUM83.png) <!-- .slide: data-transition="fade-in fade-out" --> --- ## Linux 4.2 * Linux 4.2 will contain a substantial rewrite of much of the code for handling symbolic links in pathname lookup. <!-- .element: class="fragment" data-fragment-index="22" --> * Use iteration <!-- .element: class="fragment" data-fragment-index="23" --> * The nameidata structure that we met in an earlier article contains ==a small stack== that can be used to store the remaining part of up to two symlinks. * In many cases this will be sufficient. If it isn't, a separate stack is allocated with room for 40 symlinks. <!-- .element: class="fragment" data-fragment-index="24" --> <!-- .slide: data-transition="fade-in fade-out" --> ---- * When a symlink is found, walk_component() returns the value 1 (0 is returned for any other sort of success, and a negative number is, as usual, an error indicator). ```cpp= if (unlikely(!*name)) { OK: /* pathname body, done */ if (!nd->depth) return 0; name = nd->stack[nd->depth - 1].name; /* trailing symlink, done */ if (!name) return 0; /* last component of nested symlink */ err = walk_component(nd, WALK_FOLLOW); } else { /* not the last component */ err = walk_component(nd, WALK_FOLLOW | WALK_MORE); } ``` <!-- .slide: data-transition="fade-in fade-out" --> ---- * This causes ==get_link()== to be called; it then gets the link from the filesystem. Providing that operation is successful, ==the old path name is placed on the stack==, and ==the new value is used as the name for a while==. **When the end of the path is found (i.e. ```*name``` is '\0'), the old name is restored off the stack** and path walking continues. ```cpp= if (err) { const char *s = get_link(nd); if (IS_ERR(s)) return PTR_ERR(s); err = 0; if (unlikely(!s)) { /* jumped */ put_link(nd); } else { nd->stack[nd->depth - 1].name = name; name = s; continue; } } ``` <!-- .slide: data-transition="fade-in fade-out" --> --- ![](https://i.imgur.com/Qr3guiN.png) <!-- .slide: data-transition="fade-in fade-out" --> ---- ### Aliases in Mac * Similiar to symbolic link * You can also move the original item anywhere in your Mac's file system. The alias is still able to find the file. <!-- .element: class="fragment" data-fragment-index="27" --> * Process * When you access an alias, the system checks to see if the original item is at the pathname stored in the alias file. * If it is, the system accesses it, and that's that. * If the object has moved, the system ==searches for a file that has the same inode name== as the one stored in the alias file. * When it finds a matching inode name, the system connects to the object. <!-- .element: class="fragment" data-fragment-index="28" --> --- ## fask symbolic link * A fast symlink stores the path of the linked file inside the inode of the fast symlink <!-- .slide: data-transition="fade-in fade-out" --> --- # Thanks for your listening --- ## what permission bits should mean * Because all the hard links point to ==the single inode that contains the metadata about the file==, all of these attributes are part of the file, such as ownerships, permissions, and the total number of hard links to the inode, and cannot be different for each hard link. * It is one file with one set of attributes. ==The only attribute that can be different is the file name==, which is not contained in the inode. <!-- .slide: data-transition="fade-in fade-out" -->
{"metaMigratedAt":"2023-06-14T22:14:00.894Z","metaMigratedFrom":"Content","title":"OSDI workshop","breaks":true,"contributors":"[{\"id\":\"b03ee6eb-7680-4708-aba4-bb534043d069\",\"add\":20622,\"del\":5331}]"}
    346 views