###### tags: `LWN` `translate` `not yet done`
Object-oriented design patterns in the kernel, part 2
===
PART1: https://goo.gl/jJpVeD
PART2: https://goo.gl/RjRNxj
Original: https://lwn.net/Articles/446317/
Chinese translation
> 呂紹榕
> louie.lu@hopebaytech.com
> 31/07/2016
> In the first part of this analysis we looked at how the polymorphic side of object-oriented programming was implemented in the Linux kernel using regular C constructs. In particular we examined method dispatch, looked at the different forms that vtables could take, and the circumstances where separate vtables were eschewed in preference for storing function pointers directly in objects. In this conclusion we will explore a second important aspect of object-oriented programming - inheritance, and in particular data inheritance.
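In the first part we analyzed how the polymorphic side of object-oriented programming is built in the Linux kernel using ordinary C. More precisely, we examined method dispatch, looked at the different forms vtables can take, and at the cases where function pointers are stored directly in objects instead of in a separate vtable. Now we explore a second important aspect of object-oriented programming: inheritance, and more precisely data inheritance.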
Data inheritance
---
> Inheritance is a core concept of object-oriented programming, though it comes in many forms, whether prototype inheritance, mixin inheritance, subtype inheritance, interface inheritance etc., some of which overlap. The form that is of interest when exploring the Linux kernel is most like subtype inheritance, where a concrete or "final" type inherits some data fields from a "virtual" parent type. We will call this "data inheritance" to emphasize the fact that it is the data rather than the behavior that is being inherited.
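Inheritance is a core concept of object-oriented programming, though it comes in many forms, such as prototype inheritance, mixin inheritance, subtype inheritance, and interface inheritance, some of which overlap. The form of interest in the Linux kernel is closest to subtype inheritance, where a concrete or "final" type inherits some data fields from a "virtual" parent type. We call this "data inheritance" to emphasize that it is the data, not the behavior, that is inherited.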
> Put another way, a number of different implementations of a particular interface share, and separately extend, a common data structure. They can be said to inherit from that data structure. There are three different approaches to this sharing and extending that can be found in the Linux kernel, and all can be seen by exploring the struct inode structure and its history, though they are widely used elsewhere.
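Put another way, several different implementations of a particular interface share, and separately extend, a common data structure; they can be said to inherit from that data structure. Three different approaches to this sharing and extending can be found in the Linux kernel. All three can be seen in `struct inode` and its history, though they are widely used elsewhere too.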
Extension through unions
---
> The first approach, which is probably the most obvious but also the least flexible, is to declare a union as one element of the common structure and, for each implementation, to declare an entry in that union with extra fields that the particular implementation needs. This approach was introduced to struct inode in Linux-0.97.2 (August 1992) when
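The first approach, probably the most obvious but also the least flexible, is to declare a union as one element of the common structure and, for each implementation, to declare an entry in that union holding the extra fields that implementation needs. This approach was [first introduced](http://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?h=0.97.2&id=eb79918f272fe119902db3028e0fbdc752f4942d) to `struct inode` in Linux-0.97.2 (August 1992).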
* Linux v0.97.5 - [linux/fs.h](http://lxr.linux.no/linux-old+v0.97.5/include/linux/fs.h#L238)
```c
	union {
		struct minix_inode_info minix_i;
		struct ext_inode_info ext_i;
		struct msdos_inode_info msdos_i;
	} u;
```
> was added to struct inode. Each of these structures remained empty until 0.97.5 when i_data was moved from struct inode to struct ext_inode_info. Over the years several more "inode_info" fields were added for different filesystems, peaking at 28 different "inode_info" structures in 2.4.14.2 when ext3 was added.
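Each of these structures remained empty until 0.97.5, when `i_data` was moved from `struct inode` to `struct ext_inode_info`. Over the years more `inode_info` fields were added for different filesystems, peaking at 28 different `inode_info` structures in 2.4.14.2, when `ext3` was added.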
* Linux v1.1.80 - [linux/fs.h](http://lxr.linux.no/linux-old+v1.1.80/include/linux/fs.h#L238)
```c
	union {
		struct pipe_inode_info pipe_i;
		struct minix_inode_info minix_i;
		struct ext_inode_info ext_i;
		struct ext2_inode_info ext2_i;
		struct hpfs_inode_info hpfs_i;
		struct msdos_inode_info msdos_i;
		struct umsdos_inode_info umsdos_i;
		struct iso_inode_info isofs_i;
		struct nfs_inode_info nfs_i;
		struct xiafs_inode_info xiafs_i;
		struct sysv_inode_info sysv_i;
		void * generic_ip; /* discussed later, under "Void pointers" */
	} u;
```
> This approach to data inheritance is simple and straightforward, but is also somewhat clumsy. There are two obvious problems. Firstly, every new filesystem implementation needs to add an extra field to the union "u". With 3 fields this may not seem like a problem, with 28 it was well past "ugly". Requiring every filesystem to update this one structure is a barrier to adding filesystems that is unnecessary. Secondly, every inode allocated will be the same size and will be large enough to store the data for any filesystem. So a filesystem that wants lots of space in its "inode_info" structure will impose that space cost on every other filesystem.
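This way of implementing data inheritance is simple and straightforward, but also somewhat clumsy. There are two obvious problems. First, every new filesystem implementation has to add an extra field to the union `u`. With three fields this may not look like a problem; with 28 it was well past "ugly". Requiring every filesystem to update this one structure is an unnecessary barrier to adding filesystems. Second, every inode allocated has the same size, and that size must be large enough for the largest member*. So a filesystem that wants a lot of space in its own `inode_info` structure imposes that space cost on every other filesystem.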
> *Note: the [C99 spec ISO/IEC 9899:TC2](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf), p.115, states:
>> 14. The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.
>>
> This is what causes the problem described in the article: the union keeps growing, and the extra space is wasted for every other filesystem.
> The first of these issues is not an impenetrable barrier as we will see shortly. The second is a real problem and the general ugliness of the design encouraged change. Early in the 2.5 development series this change began; it was completed by 2.5.7 when there were no "inode_info" structures left in union u (though the union itself remained until 2.6.19).
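As we will see shortly, the first of these issues is not an impenetrable barrier. The second is a real problem, and the general ugliness of the design encouraged change. The change began early in the 2.5 development series and was completed by 2.5.7, when no `inode_info` structures were left in `union u` (though the union itself remained until 2.6.19).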
Embedded structures
---
> The change that happened to inodes in early 2.5 was effectively an inversion. The change which removed ext3_i from struct inode.u also added a struct inode, called vfs_inode, to struct ext3_inode_info. So instead of the private structure being embedded in the common data structure, the common data structure is now embedded in the private one. This neatly avoids the two problems with unions; now each filesystem needs to only allocate memory to store its own structure without any need to know anything about what other filesystems might need. Of course nothing ever comes for free and this change brought with it other issues that needed to be solved, but the solutions were not costly.
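The change that happened to inodes in early 2.5 (2.5.3) was effectively an inversion. The change that removed `ext3_i` from `struct inode.u` also added a `struct inode`, called `vfs_inode`, to `struct ext3_inode_info`. So instead of the private structure (`struct *_inode_info`) being embedded in the common structure, the common structure (`struct inode`) is now embedded in the private one. This neatly avoids the two problems with unions (ugliness and wasted space): each filesystem now only allocates enough memory for its own structure, without needing to know what other filesystems might need. Of course nothing comes for free, and this change brought other issues that had to be solved, but the solutions were not costly.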
* [2.5.2 - linux/fs.h - struct inode](http://lxr.linux.no/linux-bk+v2.5.2/include/linux/fs.h#L427)
* [2.5.3 - linux/fs.h - struct inode](http://lxr.linux.no/linux-bk+v2.5.3/include/linux/fs.h#L404)
* [2.5.2 - linux/ext3_fs_i.h](http://lxr.linux.no/linux-bk+v2.5.2/include/linux/ext3_fs_i.h#L24)
* [2.5.3 - linux/ext3_fs_i.h](http://lxr.linux.no/linux-bk+v2.5.3/include/linux/ext3_fs_i.h#L75)
> The first difficulty is the fact that when the common filesystem code - the VFS layer - calls into a specific filesystem it passes a pointer to the common data structure, the struct inode. Using this pointer, the filesystem needs to find a pointer to its own private data structure. An obvious approach is to always place the struct inode at the top of the private inode structure and simply cast a pointer to one into a pointer to the other. While this can work, it lacks any semblance of type safety and makes it harder to arrange fields in the inode to get optimal performance - as some kernel developers are wont to do.
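The first difficulty is that when the common filesystem code (the VFS layer) calls into a specific filesystem, it passes a pointer to the common data structure, `struct inode`. From this pointer the filesystem has to find a pointer to its own private data structure. An obvious approach is to always place the `struct inode` at the top of the private inode structure and simply cast a pointer to one into a pointer to the other. While this can work, it lacks any semblance of type safety, and it makes it harder to arrange the fields of the inode for optimal performance, as some kernel developers are wont to do.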
> The solution was to use the list_entry() macro to perform the necessary pointer arithmetic, subtracting from the address of the struct inode its offset in the private data structure and then casting this appropriately. The macro for this was called list_entry() simply because the "list.h lists" implementation was the first to use this pattern of data structure embedding. The list_entry() macro did exactly what was needed and so it was used despite the strange name. This practice lasted until 2.5.28 when a new container_of() macro was added which implemented the same functionality as list_entry(), though with slightly more type safety and a more meaningful name. With container_of() it is a simple matter to map from an embedded data structure to the structure in which it is embedded.
> The second difficulty was that the filesystem had to be responsible for allocating the inode - it could no longer be allocated by common code as the common code did not have enough information to allocate the correct amount of space. This simply involved adding alloc_inode() and destroy_inode() methods to the super_operations structure and calling them as appropriate.
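The second difficulty is that responsibility for allocating the inode falls to the filesystem itself, because the common code does not have enough information to allocate the correct amount of space. This was solved simply by adding `alloc_inode()` and `destroy_inode()` methods to the `super_operations` structure and calling them at the appropriate times.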
Void pointers
---
> As noted earlier, the union pattern was not an impenetrable barrier to adding new filesystems independently. This is because the union u had one more field that was not an "inode_info" structure. A generic pointer field called generic_ip was added in Linux-1.0.5, but it was not used until 1.3.7. Any file system that does not own a structure in struct inode itself could define and allocate a separate structure and link it to the inode through u.generic_ip. This approach addressed both of the problems with unions as no changes are needed to shared declarations and each filesystem only uses the space that it needs. However it again introduced new problems of its own.
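As mentioned earlier, the union pattern was not an impenetrable barrier to adding new filesystems independently. This is because `union u` had one more field that was not an `inode_info` structure. A generic pointer field called `generic_ip` was added in Linux-1.0.5, but it was not used until 1.3.7. Any filesystem that does not own a structure in `struct inode` itself can define and allocate a separate structure and link it to the `inode` through `u.generic_ip`. This addressed both problems with unions, since no shared declarations need to change and each filesystem uses only the space it needs. However, it introduced new problems of its own.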
> Using generic_ip, each filesystem required two allocations for each inode instead of one and this could lead to more wastage depending on how the structure size was rounded up for allocation; it also required writing more error-handling code. Also there was memory used for the generic_ip pointer and often for a back pointer from the private structure to the common struct inode. Both of these are wasted space compared with the union approach or the embedding approach.
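With `generic_ip`, each filesystem needs two allocations per `inode` instead of one, which can waste more space depending on how the structure size is rounded up for allocation; it also requires more error-handling code. Memory is also needed for the `generic_ip` pointer itself and, often, for a back pointer from the private structure to the common `struct inode`. Both are wasted space compared with the union approach or the embedding approach.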
> Worse than this though, an extra memory dereference was needed to access the private structure from the common structure; such dereferences are best avoided. Filesystem code will often need to access both the common and the private structures. This either requires lots of extra memory dereferences, or it requires holding the address of the private structure in a register which increases register pressure. It was largely these concerns that stopped struct inode from ever migrating to broad use of the generic_ip pointer. It was certainly used, but not by the major, high-performance filesystems.
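Worse still, an extra memory dereference is needed to reach the private structure from the common structure, and such dereferences are best avoided. Filesystem code often needs to access both the common and the private structures, which either costs many extra dereferences or requires holding the address of the private structure in a register, increasing register pressure. It was largely these concerns that kept `struct inode` from ever migrating to broad use of the `generic_ip` pointer. It was certainly used, but not by the major, high-performance filesystems.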
> Though this pattern has problems it is still in wide use. struct super_block has an s_fs_info pointer which serves the same purpose as u.generic_ip (which has since been renamed to i_private when the u union was finally removed - why it was not completely removed is left as an exercise for the reader). This is the only way to store filesystem-private data in a super_block. A simple search in the Linux include files shows quite a collection of fields which are void pointers named "private" or something similar. Many of these are examples of the pattern of extending a data type by using a pointer to a private extension, and most of these could be converted to using the embedded-structure pattern.