everything is a file -> everything is a file descriptor
mmap and the read/write system calls are both used for input and output operations, but they serve different purposes and have different use cases. Let's compare the two:
mmap:
Memory Mapping:
mmap is primarily used for memory mapping files into a process's virtual address space.
The file contents can be accessed directly as if they were in-memory data structures.
Flexibility:
It provides a more flexible and efficient mechanism for reading and writing files, especially when working with large files.
With a shared mapping (MAP_SHARED), changes made to the memory-mapped region are reflected in the file, and vice versa.
Shared Memory:
Allows multiple processes to share the same memory-mapped region. This enables interprocess communication (IPC) through shared memory.
Zero-Copy I/O:
It's often used in networking for zero-copy operations, where data is directly read or written from/to memory-mapped buffers.
Anonymous Mapping:
Can be used for anonymous memory mapping, where no file is involved.
read/write:
Sequential I/O:
read and write are traditional system calls for reading from and writing to files sequentially.
They read or write a specified number of bytes starting at the file's current offset, which can be repositioned with lseek() (or supplied explicitly via pread()/pwrite()).
Buffered I/O:
Often used with a buffer for improved efficiency, especially when reading or writing small amounts of data at a time.
Random Access:
Suitable for random access to files: combined with lseek(), the application decides where within the file to read or write.
Stream-Based:
read and write operate on file descriptors and are generally used with regular files, pipes, sockets, and other file-like entities.
Considerations:
Granularity:
mmap provides a memory-mapped view of the entire file or a portion of it, while read and write operate on a specified number of bytes at a time.
Complexity:
mmap can introduce additional complexity, especially when dealing with synchronization in shared-memory scenarios.
Performance:
mmap can be more efficient for certain use cases, especially when dealing with large files or scenarios where shared memory is advantageous.
Use Cases:
Use mmap when you want to treat a file as if it were an in-memory data structure or for certain performance-critical scenarios.
Use read/write for sequential or random access file I/O and when simplicity is preferred.
Example:
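As a minimal sketch in C (assuming a hypothetical, non-empty file named example.txt in the current directory), the two approaches to reading the same file look like this:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "example.txt";        /* hypothetical input file */

    int fd = open(path, O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    /* read(): the kernel copies bytes into a user-space buffer. */
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n == -1) { perror("read"); return 1; }
    printf("read() returned %zd bytes\n", n);

    /* mmap(): map the file and access its contents like ordinary memory. */
    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        fprintf(stderr, "cannot map an empty or unknown-size file\n");
        close(fd);
        return 1;
    }
    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    /* No explicit read() here: pages are faulted in on first access. */
    printf("first byte via mmap: 0x%02x\n", (unsigned char)map[0]);

    munmap(map, st.st_size);
    close(fd);
    return 0;
}
```

With MAP_SHARED and PROT_WRITE, stores into the mapping would propagate back to the file, which is what makes mmap attractive for shared-memory IPC.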
In summary, choose between mmap and read/write based on your specific requirements and the characteristics of your data access patterns. If you need shared memory, zero-copy I/O, or the ability to treat a file as if it were in memory, mmap may be more suitable. If you require sequential or random access file I/O with a simpler interface, use read/write.
9p filesystem passthrough -> widely used in virtual machines
Linux upholds the UNIX philosophy of "Everything is a file", which the book The Linux Programming Interface (TLPI for short) calls the "Universality of I/O":
One of the distinguishing features of the UNIX I/O model is the concept of universality of I/O. This means that the same four system calls—open(), read(), write(), and close()—are used to perform I/O on all types of files, including devices such as terminals.
Therefore, seemingly different kinds of I/O (e.g., output to a terminal vs. writing data to a file) can all operate under this consistent abstraction layer: they can all be treated as file operations. The purpose of a device driver is to supply the implementation of this abstraction layer, so that the operating system can access the device through a consistent interface. Chapter 14 of the book notes:
A device special file corresponds to a device on the system. Within the kernel, each device type has a corresponding device driver, which handles all I/O requests for the device. A device driver is a unit of kernel code that implements a set of operations that (normally) correspond to input and output actions on an associated piece of hardware. The API provided by device drivers is fixed, and includes operations corresponding to the system calls open(), close(), read(), write(), mmap(), and ioctl().
These operations are essentially the file-descriptor-based ones, such as open(), read(), write(), lseek(), and so on.
Given how UNIX actually behaves, it would be more accurate to rewrite "Everything is a file" as "Everything is a file descriptor". The Linux kernel carries this idea through, implementing a number of kernel mechanisms whose names end in fd, such as eventfd, signalfd, and timerfd.
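For instance, eventfd(2) hands back a file descriptor that wraps a kernel-maintained 64-bit counter, manipulated with the ordinary read() and write() calls. A minimal sketch:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
    /* Create a counter object exposed purely as a file descriptor. */
    int efd = eventfd(0, 0);
    if (efd == -1) { perror("eventfd"); return 1; }

    uint64_t add = 3;
    write(efd, &add, sizeof(add));    /* adds 3 to the kernel counter */

    uint64_t value;
    read(efd, &value, sizeof(value)); /* returns 3 and resets the counter */
    printf("counter read back: %llu\n", (unsigned long long)value);

    close(efd);
    return 0;
}
```

Because it is just a file descriptor, the same object can be watched with poll()/epoll or passed to another process, very much in the "everything is a file descriptor" spirit.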
UNIX, the later BSD family (which can be regarded as its direct descendant), and Linux (the unexpected latecomer) never truly realized "Everything is a file"; sockets are an exception. After UNIX, Bell Labs developed the Plan 9 operating system, which can be considered the genuine realization of "Everything is a file", though it attracted relatively few developers. In Plan 9, all devices and services are accessed through file-based operations, for example telnet, ftp, nfs, and so on.
The client process's input/output on virtual files, that appear in other processes' namespace, becomes inter-process communication between the two processes. This way, Plan 9 generalizes the Unix notion of the filesystem as the central point of access to computing resources. It carries over Unix's idea of device files to provide access to peripheral devices (mice, removable media, etc.) and the possibility to mount filesystems residing on physically distinct filesystems into a hierarchical namespace, but adds the possibility to mount a connection to a server program that speaks a standardized protocol and treat its services as part of the namespace.
Inter-process communication can thus be achieved by performing I/O on virtual files; as a result, peripheral devices can be accessed easily through the filesystem, and server programs speaking a specific protocol (9P) can be mounted into it as well.
All programs that wish to provide services-as-files to other programs speak a unified protocol, called 9P.
Although Plan 9 was not a commercial success and its developer community remained small, its spirit has been carried forward in Linux and in virtualization technology. Excerpted from Virtio: An I/O virtualization framework for Linux:
Linux is the hypervisor playground. As my article on Linux as a hypervisor showed, Linux offers a variety of hypervisor solutions with different attributes and advantages. Examples include the Kernel-based Virtual Machine (KVM), lguest, and User-mode Linux. Having these different hypervisor solutions on Linux can tax the operating system based on their independent needs. One of the taxes is virtualization of devices. Rather than have a variety of device emulation mechanisms (for network, block, and other drivers), virtio provides a common front end for these device emulations to standardize the interface and increase the reuse of code across the platforms.
A: We need more file status information than the plain byte-stream file interface by itself exposes.
A: While many aspects of the system are represented as files, there are certain scenarios where the "everything is a file" concept has limitations or might not be applicable. Here are some reasons:
Device Files:
In Unix-like systems, device files (e.g., /dev/sda for a disk or /dev/tty1 for a terminal) are used to represent hardware devices. However, not all devices fit neatly into the file abstraction, especially more complex devices with multiple interfaces and functionalities.
Network Sockets:
While Unix sockets and named pipes are file-like abstractions, network sockets are not always straightforward to represent as files. Network communication involves more complexities than simple file read and write operations.
Processes and System Information:
Information about processes, system statistics, and other system-related details are often exposed through special files or directories in the /proc filesystem. While these provide a way to access system information, they don't strictly adhere to the traditional file concept.
Signals and IPC Mechanisms:
Interprocess communication (IPC) mechanisms such as signals and semaphores do not neatly fit into the file abstraction. They involve signaling and coordination between processes, which is more complex than file-based interactions.
Memory Mapped Files:
Memory-mapped files still start from the file abstraction, but once mapped, the data is accessed through ordinary memory loads and stores rather than read()/write(), which departs from the plain file model.
Security and Special Files:
Security-related information and special files may not fit the "everything is a file" concept in a straightforward manner. For example, files in /sys related to kernel parameters and configurations may not be typical files in the traditional sense.
In summary, the "everything is a file" concept is a generalization that simplifies the programming interface and provides a unified way to interact with various resources. However, not all resources or scenarios can be perfectly represented as files. Some abstractions, such as network communication, complex devices, or interprocess communication, require different mechanisms and interfaces. The concept serves as a guiding principle rather than a strict rule in all situations.
A distributed operating system is an operating system that runs on a collection of independent computers and makes them appear to be a single, coherent system. The primary goal of a distributed operating system is to provide transparency and a unified view of the resources distributed across multiple machines. Here are some key concepts associated with distributed operating systems:
Transparency:
Transparency is a key goal, and it involves hiding the distribution of resources from users and applications. This includes transparency in terms of access, location, migration, and failure.
Communication:
Distributed systems rely heavily on communication between different components. Communication mechanisms need to be efficient, reliable, and capable of handling various types of interactions.
Concurrency and Synchronization:
Handling multiple processes executing concurrently on different machines requires synchronization mechanisms to ensure consistency and avoid conflicts.
Fault Tolerance:
Distributed systems often incorporate fault-tolerance mechanisms to ensure that the system can continue to function in the presence of failures, whether they are hardware failures or network issues.
Resource Sharing:
Distributed operating systems facilitate resource sharing across multiple machines. This includes sharing of computational resources, storage, and other services.
Scalability:
Distributed systems are designed to scale horizontally by adding more machines to the system. This allows them to handle larger workloads and provide better performance.
Now, regarding Plan 9 from Bell Labs: it is not best described simply as a traditional distributed operating system running on a collection of independent computers. Plan 9 is an operating system designed at Bell Labs as the successor to Unix, and it builds distribution into its core abstraction, presenting local and remote resources through a single, unified namespace.
In Plan 9, the entire network, including remote resources and services, is integrated into the file system, and everything is accessible through a unified namespace. Plan 9 blurs the line between local and remote resources, presenting them to users and applications as if they were part of a single, coherent system.
While Plan 9 doesn't fit the traditional definition of a distributed operating system point for point, its model of gathering distributed resources into a single, unified namespace offers a different approach to resource access and management compared to traditional distributed operating systems.
In Linux, file status or metadata associated with a file is often referred to as file attributes. Key file attributes include:
File Type and Permissions:
File type (regular file, directory, symbolic link, etc.) and permissions (read, write, execute) for the owner, group, and others.
Owner and Group:
The user and group associated with the file. These are identifiers based on the user and group information on the system.
File Size:
The size of the file in bytes.
Timestamps:
Three timestamps associated with a file:
Access Time (atime): When the file was last accessed.
Modification Time (mtime): When the file's content was last modified.
Change Time (ctime): When the file's metadata (permissions, ownership) was last changed.
Inode Number:
A unique identifier assigned to each file in a filesystem. It is used by the filesystem to locate and manage the file.
Hard Links:
The number of hard links pointing to the file. A hard link is an additional name for an existing file.
File System:
The filesystem on which the file resides.
Device ID (for block and character special files):
For block and character special files, the identity (major and minor numbers) of the associated device is stored.
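All of the fields above can be retrieved with the stat() system call; a minimal sketch (casts added only for portable printing, with /etc/hostname used as an assumed default path):

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(int argc, char *argv[])
{
    const char *path = (argc > 1) ? argv[1] : "/etc/hostname";

    struct stat st;
    if (stat(path, &st) == -1) { perror("stat"); return 1; }

    printf("inode:       %lu\n", (unsigned long)st.st_ino);
    printf("permissions: %o\n", (unsigned)(st.st_mode & 07777));
    printf("owner/group: %u/%u\n", (unsigned)st.st_uid, (unsigned)st.st_gid);
    printf("size:        %lld bytes\n", (long long)st.st_size);
    printf("hard links:  %lu\n", (unsigned long)st.st_nlink);
    printf("mtime:       %s", ctime(&st.st_mtime));  /* last content change */
    return 0;
}
```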
These attributes collectively provide information about a file's characteristics, ownership, timestamps, and location on the filesystem. While these attributes cover a significant amount of information about a file, there are aspects of files and their behavior that are not described by these metadata. For example:
Content: File attributes do not directly describe the content of a file. The actual data within the file is not part of the file's metadata.
Extended Attributes: Some filesystems support extended attributes, allowing additional metadata to be associated with a file beyond the standard attributes mentioned above.
File Content Type: The type of data stored in a file (e.g., text, binary, image) is not explicitly captured in file attributes.
Security Contexts: Attributes related to security contexts, such as SELinux or AppArmor information, may not be fully represented in standard file attributes.
In summary, while file attributes in Linux provide crucial information about files, they do not cover every aspect of file content, extended attributes, or security-related information. For a more comprehensive understanding, additional tools, commands, and considerations may be needed depending on specific requirements.
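For example, extended attributes can be attached and read back with the xattr system calls; a minimal sketch, assuming a hypothetical file example.txt on a filesystem with xattr support (e.g. ext4):

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(void)
{
    const char *path = "example.txt";            /* hypothetical file */
    const char *value = "text/plain";

    /* Attach extra metadata under the "user." namespace... */
    if (setxattr(path, "user.mime_type", value, strlen(value), 0) == -1) {
        perror("setxattr");                      /* fails if xattrs are unsupported */
        return 1;
    }

    /* ...and read it back. */
    char buf[64];
    ssize_t n = getxattr(path, "user.mime_type", buf, sizeof(buf) - 1);
    if (n == -1) { perror("getxattr"); return 1; }
    buf[n] = '\0';
    printf("user.mime_type = %s\n", buf);
    return 0;
}
```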
Both Plan 9 and Linux share some common concepts when it comes to file attributes, but there are differences in their approaches. Below, I'll outline key aspects of file attributes in both operating systems:
Plan 9:
Unified Namespace:
In Plan 9, everything, including devices, network services, and even running processes, is represented as files in a unified namespace. This concept is known as the "everything is a file" philosophy.
Distributed File System:
Plan 9 incorporates a distributed file system where remote resources are seamlessly integrated into the file system namespace. Users and applications interact with remote resources in the same way as local ones.
Resource Access through File Operations:
Plan 9 represents various system resources as files, and users can access these resources through standard file operations like read and write. For example, network communication is achieved by reading from and writing to special files.
Dynamic File Content:
In Plan 9, some files, such as /dev/mouse, provide dynamic content. Reading from these files gives real-time updates, making them behave like input streams.
Linux:
Traditional Filesystem Hierarchy:
Linux follows a more traditional filesystem hierarchy where different types of resources are organized into directories such as /dev, /proc, and /sys. While /proc and /sys provide information about processes and the kernel, they are not treated as part of a unified filesystem namespace in the same way as in Plan 9.
Special Directories:
Linux uses special directories like /dev for device files, /proc for process-related information, and /sys for kernel-related information. These directories contain files representing various attributes, and users interact with these files using standard file operations.
Extended Attributes:
Linux supports extended attributes (xattrs), which allow additional metadata to be associated with a file. Extended attributes can store information beyond the standard file attributes.
Security Contexts:
Linux systems with security-enhanced features (e.g., SELinux, AppArmor) associate security contexts with files, providing additional security-related information.
Differences:
Namespace Philosophy:
Plan 9's "everything is a file" philosophy creates a more uniform and unified namespace for all system resources. Linux, while providing file-like abstractions for many resources, does not adopt the same level of unification.
Dynamic Content:
Plan 9 allows certain files to deliver continuously updated, stream-like content. Linux's files in /proc and /sys also generate their content on demand, but a read typically returns a point-in-time snapshot rather than a continuous stream.
Distributed File System:
Plan 9 incorporates distributed computing within a single address space, seamlessly integrating remote resources. Linux typically relies on network protocols for remote file access.
In summary, both Plan 9 and Linux share the concept of representing various resources as files. However, Plan 9 takes a more unified and integrated approach, treating everything as part of a single namespace, while Linux maintains a more traditional filesystem hierarchy with special directories for different resource types. The design philosophies of each system lead to different approaches in terms of namespace, resource representation, and dynamic content.
This would add complexity to the system, which is why interrupts are used directly instead of multi-threading.
鳥哥