# Charting a Path for Julia

==Work in progress==

:::info
*A unified filesystem model for Julia, built on paths, handles, and an abstract filesystem interface.*
:::

**Authors:** Timothy, [add your name if you've made substantial edits]
**Reviewed-by:** [your name here]
**Comments-from:** Chengyu Han, Jānis Erdmanis, Kevin Bonham, Neven Sajko, Miles Cranmer, [your name here]

[toc]

# History

Julia's approach to file paths was largely inspired by Python ...just before `Pathlib` was adopted. In the years since, the idea that a path type would benefit Julia has been articulated multiple times, in different ways.

- In 2013 the path methods we know today [were introduced](https://github.com/JuliaLang/julia/commit/6f9fb22eccf7ecbcba9158f19bb24985623b3ca4).
- In 2014, `@stevengj` made [an issue](https://github.com/JuliaLang/julia/issues/9488) proposing a mildly cursed partial workaround for the lack of a path type.
- In 2016, FilePathsBase.jl [was started](https://github.com/rofinn/FilePathsBase.jl/commit/c20ecd2d36e305d2bde69eb07fec358b4f3c38aa).
- In 2017 (just before Julia 1.0), Frames wrote [a Julep](https://github.com/JuliaLang/Juleps/pull/26) advocating for a path type, but unfortunately it didn't go anywhere before Julia 1.0 was out.
- In 2018 this was incidentally mentioned in [a discourse topic](https://discourse.julialang.org/t/appending-trailing-slashes-to-paths/15922/20).
- In 2020, a newcomer from CommonLisp opened [a discourse topic](https://discourse.julialang.org/t/better-handling-of-pathnames/36792) about missing a path type.
- In 2020, [an issue](https://github.com/JuliaLang/julia/issues/38415) was opened in the main Julia repo on this.
- In 2021 Jakob wrote [a post](https://viralinstruction.com/posts/badjulia/#there_is_no_path_type) that stuck in my mind examining flaws in Julia, including the lack of a path type in the language.
- In 2022, ExpandingMan [started working on](https://gitlab.com/ExpandingMan/FilePaths2.jl/-/commit/423d6ec97baa2fb79d1aa74a4dfe095cf671f4fb) FilePaths2.jl.
- In 2024, I got fed up enough with this after the latest papercut I experienced to write a gripe on Slack.

Julia's Slack only keeps 90 days of conversation history, but you can usually search for `"path type"` and find _somebody_ running into papercuts/headaches. Ignoring my recent gripe, doing this turns up Mosé, responding to some platform-specific handling that came up on a Julia PR, addressing the difference a trailing slash makes with some depot operations:

> We should really have a proper path type, strings are simple bad for manipulating them `(💯 ×2)`

The use of strings as paths also precludes (or at least complicates) support for filesystems beyond that of the current operating system. Virtual filesystems are increasingly recognised as a useful abstraction in programming languages, not just operating systems (see: `io/fs` in Go for example), but very few languages provide built-in support (Java is a notable exception with NIO2).

# Motivation

While **C** and friends use char-vector types for strings, paths, and more, most modern high-level languages have settled on a dedicated path type, and for good reason. While familiar, the string-based approach conflates several distinct concepts: textual data, namespace locations, and filesystem resources. This places a significant burden on users and library authors to reason about correctness, safety, and platform-specific behaviour.

Users with experience with a modern language (such as Rust, Python, Java/Kotlin/Scala, Swift, or C++17) will be familiar with the value of a dedicated path type, and the importance of it being part of the base language/stdlib.

Julia already has dedicated non-`String` types for regular expressions, substitution strings, and more. Filesystem paths merit the same treatment.

A first-class path type offers several concrete benefits:

- Resolving the ambiguity of representing data as a string directly vs. a path to the data
- Allowing for dispatch on text content/paths, and more generic functions
- A platform-independent syntax and methods for working with paths
- Fewer footguns/papercuts, from reduced ambiguity and more rigorous handling
- Support for virtual filesystems

This path type must exist in Julia's base, for two primary reasons:

- Base itself makes extensive use of paths
- A 3rd party library cannot provide the same level of ecosystem-wide coherence and consistency

Experience from other languages also reveals that paths should be considered together with the context they operate within. The filesystem uses paths as a means to resolve a reference to a resource, and paying careful attention to what this means allows us to generalise filesystem interaction to support capability-based access and virtual file systems.

# Terminology

While the terms used to describe files, paths, and other filesystem-related entities and operations are standard, when splitting hairs (as this proposal does), it is important to be precise with our language. To that end, here are the specific concepts we will be interrogating:

- A **path** is a description of a location within a namespace. It is a structured value composed of segments, and its meaning is defined only relative to a particular filesystem and (in general) a point of reference within that filesystem. A path does not identify a resource directly, nor does it convey authority to access one.
- A **handle** is an authoritative reference to a specific resource. Handles are typically produced by resolving a path, and they provide the ability to interact with the referenced resource. Unlike paths, handles do not describe where a resource is located in the namespace; they refer directly to the resource itself. Handles are temporal in nature: they may become invalid over time (for example, when closed).
- A **filesystem** is an organisation of resources together with rules for naming, resolving, and accessing those resources. Conceptually, a filesystem defines a namespace in which paths may be interpreted and resolved. Resolution establishes a relationship between paths and resources, but this relationship is not assumed to be stable across time or operations.
- **Resolution** is the act of turning a path into a handle within the context of a filesystem. Resolution is an effectful operation: it may fail, and its result may depend on the state of the filesystem at the time it is performed. Making resolution explicit is central to distinguishing between operations on names and operations on resources.
- A **capability** is the authority to perform a particular class of operations on a resource or within a filesystem. In this proposal, capabilities are represented explicitly by values (usually handles) and are not assumed implicitly through ambient state or global context. Treating filesystem access as capability-based allows code to be written that is safer, more composable, and more amenable to restriction or sandboxing.

# Design goals

## High level path interface

While we want to end up with a `Path` type, it would be good to take a step back, consider what makes a "path" conceptually, and define an abstract type we can then specialise on.

I propose that in the abstract, a path is an ordered series of directions that takes you to a location.

From this, we can conceptualise a path as a list of direction segments, and arrive at a few fundamental operations:

- `root`: the origin point
- `parent`: the sequence of directions up to but excluding the most recent one
- `length`: the number of directions in the path
- `iterate`: give each direction of the path
- `basename`: the most recent segment
- `children`: the immediate next paths one may take
- `joinpath`: combine two sets of directions

A path that includes a root is considered absolute, and other paths are relative.
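
To make the "list of direction segments" idea concrete, here is a minimal sketch of a toy path value supporting most of the operations listed above. Everything in it is hypothetical and for illustration only: `SketchPath`, `isabsolute`, `parentpath`, `lastsegment`, and `joined` are stand-in names (chosen to avoid clashing with Base), not the proposed API, and `root`/`children` are left out to keep it short. Rejecting an absolute right-hand side when joining is an assumption of this sketch.

```julia
# Hypothetical toy type: a path is an optional root plus an ordered list of segments.
struct SketchPath
    rooted::Bool               # true when the path starts from the root (absolute)
    segments::Vector{String}   # the "directions", in order
end

# `length` and `iterate`: treat the path as the sequence of its segments.
Base.length(p::SketchPath) = length(p.segments)
Base.iterate(p::SketchPath, state...) = iterate(p.segments, state...)
Base.eltype(::Type{SketchPath}) = String

# Compare by content (the default would compare the segment vectors by identity).
Base.:(==)(a::SketchPath, b::SketchPath) = a.rooted == b.rooted && a.segments == b.segments

# A path that includes a root is absolute; all others are relative.
isabsolute(p::SketchPath) = p.rooted

# Stand-in for `parent`: everything up to, but excluding, the last segment.
parentpath(p::SketchPath) =
    isempty(p.segments) ? p : SketchPath(p.rooted, p.segments[1:end-1])

# Stand-in for `basename`: the most recent segment, or `nothing` if there is none.
lastsegment(p::SketchPath) = isempty(p.segments) ? nothing : last(p.segments)

# Stand-in for `joinpath`: follow `b`'s directions after `a`'s.
# Assumption of this sketch: an absolute right-hand side is an error.
function joined(a::SketchPath, b::SketchPath)
    b.rooted && throw(ArgumentError("cannot join an absolute path onto another path"))
    SketchPath(a.rooted, vcat(a.segments, b.segments))
end

p = joined(SketchPath(true, ["home", "user"]), SketchPath(false, ["notes.txt"]))
```

Because the type implements `iterate` and `length`, generic functions such as `collect(p)` work on it without any path-specific code.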

## Avoid representational ambiguities

Thinking of a path as a representation of a location, the existence of the `.` and `..` pseudopath components complicates path considerations:

- are `foo/bar/..`, `foo/.`, and `foo` equal?
- is `foo/bar/` the same as `foo/bar`?
- are `foo\\bar` and `foo/bar` equivalent on Windows?

These questions tend to fall under path normalisation, but by using a dedicated path type and path-specific operations I contend that we can decide on rules for a canonical representation of a path, and make it the _only_ form that can be constructed. There is no need for path normalisation, as it is no longer possible to construct an abnormal path.

Frames also discusses the algebraic appeal of normalised paths in her blog post. It also feels like the more principled choice to me, and I suspect it makes it harder to fall into a few edge cases.

### The complication that is *symlinks*

It has come to my attention that due to symlinks the first question _cannot_ be answered in the context of a real/concrete path _without_ querying the filesystem. On-disk, `foo/bar/..` != `foo` with symlinks.

This has been the cause of much consternation, and extensive discussion on Slack. Various ideas were discussed on how to best handle this complication, including:

- Calling `realpath` in the background
- Returning `nothing` or throwing an error when `parent` is called on a path ending in `..`
- Using type/field information to split types into pure/concrete types *a la* Pathlib, and handle them differently

While mired in the messy filesystem details, Julius made the excellent point that the filesystem is in a constant state of flux (`realpath` can be wrong the moment after it returns), and so it's overly presumptuous of us to try to handle this using information in the path object itself (using the type domain or runtime information).

So, we should be upfront about this and just say that if you want operations on a path to take into account the filesystem state at the time, you need to call `realpath`.

> ==Note== Should `realpath` be `resolve` post-`Path`? It would also be good to have an unnormalised string to real `Path` function (ideally the same function).

This has the following benefits:

- Simplified model for path operations
- Predictable normalisation
- Separation of concerns

## Make invalid paths unconstructable

There are some characters that may not appear in a path.

- Posix (Linux/BSD/Mac): the null byte (`\0`).
- Windows: `|`, the null byte (`\0`), and ASCII codes `\x01` ~ `\x1f`.

There are also some restrictions on filenames:

- Posix:
  - `/` and the null byte (`\0`).
  - Reserved file names: `.` and `..`
- Windows: This is a superset of the restrictions for paths
  - Reserved characters: `<`, `>`, `:`, `"`, `\`, `/`, `|`, `?`, and `*`.
  - null byte (`\0`)
  - ASCII codes `\x01` ~ `\x1f`.
  - Reserved names: `CON`, `PRN`, `AUX`, `NUL`, `COM1` ... `COM9`, `LPT1` ... `LPT9` regardless of extension
  - Any filename ending with ` ` or `.`

We will also pretend that the empty path is never allowed (see: [Compromises](#Pretending-empty-segments-are-invalid)).
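
The Windows filename rules above are mechanical enough to check in a few lines. The following is a rough sketch of such a check, not part of the proposal: `windows_filename_problems` and the two constants are hypothetical names, and matching reserved device names case-insensitively is an assumption added here.

```julia
# Hypothetical helper: report which of the Windows filename rules a name violates.
const WINDOWS_RESERVED_CHARS = ('<', '>', ':', '"', '\\', '/', '|', '?', '*')
const WINDOWS_RESERVED_NAMES = Set(vcat(["CON", "PRN", "AUX", "NUL"],
                                        ["COM$i" for i in 1:9],
                                        ["LPT$i" for i in 1:9]))

function windows_filename_problems(name::AbstractString)
    problems = String[]
    isempty(name) && push!(problems, "empty name")
    any(c -> c in WINDOWS_RESERVED_CHARS, name) && push!(problems, "reserved character")
    # The null byte and ASCII codes 0x01 through 0x1f.
    any(c -> c <= '\x1f', name) && push!(problems, "null byte or control character")
    # Reserved names apply regardless of extension, so only look before the first dot.
    # Assumption: the comparison is case-insensitive.
    stem = uppercase(first(split(name, '.')))
    stem in WINDOWS_RESERVED_NAMES && push!(problems, "reserved device name")
    (endswith(name, " ") || endswith(name, ".")) && push!(problems, "trailing space or dot")
    return problems
end
```

Returning a list of problems rather than throwing leaves the caller free to decide whether a violation is an error or merely worth a warning.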

**References:**

- Unix/Win, [Comparison of filename limitations - Wikipedia](https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations)
- Win, [Naming Conventions - Win32](https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#naming-conventions)
- Win, [doctaphred/ntfs-filenames.txt - Gist](https://gist.github.com/doctaphred/d01d05291546186941e1b7ddc02034d3)
- Win, [C# `GetInvalidFileNameChars()`, `GetInvalidPathChars()` - dotnet/runtime](https://github.com/dotnet/runtime/blob/511d26611c051c56e546404ea616c220cc78817c/src/libraries/System.Private.CoreLib/src/System/IO/Path.Windows.cs#L15-L31)

It's fairly easy to apply the Posix path restrictions when constructing paths, but Windows is a bit of a pain, making me think perhaps it's not worth the effort.

Since the Windows restrictions are a (large) superset of the Posix restrictions, one approach that I'd like to explore is validating that the Posix requirements are met during path construction, and then maybe checking for forms Windows wouldn't like in literal path construction (with the `p""` macro) and emitting a warning.

## Cross-platform path construction

Posix exclusively uses the `/` delimiter, and Windows accepts `\` (preferred) or `/`. As such, we can reasonably settle on `/` as the in-Julia syntax for paths, and handle operating system dependent normalisation in the background.

This makes it impossible to accidentally hardcode a particular platform's delimiters.

## Convenient prefixes

### The `~` home shorthand

The handling of `~` in paths in Julia tends to trip up people used to shell expansion, but there's a very good reason why Julia doesn't go ahead and interpret `"~/dir"` as `/home/$USER/dir` but requires `expanduser`: `~` is a valid path segment. Without knowing the _intent_ with which the `~` was written (or generated and passed around) it is not possible to reasonably decide whether it should be interpreted as a `"~"` segment or a reference to the home directory.

With a path macro, this changes. We can differentiate between a `~` that has been put literally at the start of a path, and a `~` that's come from elsewhere. This makes the convenient `~`-home interpretation viable, without re-introducing the current issues. As a tradeoff, expressing an initial `~` segment becomes less convenient, but given the relative frequency of home vs. `"~"` forms, this seems worthwhile.

### Introducing `@` project shorthand

> ==Note== I'm not completely sold on this idea, but I'm interested and it seems worth exploring

Within package and project code, it is common to see forms like `joinpath(@__DIR__, "..", "..", "assets", "file.txt")`. There are two major issues with this:

1. Poor clarity of intent: this is an attempt to express a target location within a project, but the path is expressed relative to the current file (wherever it may be within the project) rather than the project itself. The fact the form is twisted as a result (with `@__DIR__, "..", ".."`) only makes this less apparent.
2. As a result of the poor clarity/expression, this form is slightly fragile: moving the source file around will break the reference, even if the target remains in the same location within the project.

Extending the "special literal prefix" handling to treat `@` as a project-prefix, just as `~` is a user-prefix, can improve this situation. The choice of `@` seems natural given the existing use of `@`-prefixed special paths in `DEPOT_PATH` and `--project` already.

Besides the two issues above, a `@`-prefix also provides an opportunity to improve the status quo with regard to relocatability. Enough Julia packages use `@__DIR__` in paths to make relocatability a general issue (motivating [RelocatableFolders.jl](https://github.com/JuliaPackaging/RelocatableFolders.jl) and [julia/PR#55146](https://github.com/JuliaLang/julia/pull/55146)).

Implementing `@` as a relocatable project-relative path (determined at compile-time) creates a form that is both more convenient and more robust, a "pit of success".

## Platform-specific path types

It seems likely useful to still be able to model paths of other platforms, and we can do this without compromising ergonomics fairly easily by defining `<Platform>Path` types and then aliasing `Path` to the current platform.

## Low overhead when invoking Libuv path methods

Using a contiguous null terminated char array (whether a `String`, `Memory{UInt8}`, or `Vector{UInt8}`) for the internal representation of system paths makes it possible to pass the path representation directly to Libuv with no overhead.

## SubStrings as the type of path components

Since we've got good reason to want a single contiguous on-disk format, operations that fetch components of the path will either have to allocate a new object ... or we can use a `SubString`. This is the approach FilePaths2.jl takes, and I like it.

## Iterable segments

Since we're viewing a path as an ordered sequence of directions, it makes sense to be able to `iterate` through them. Together with `length`, this also makes it possible to simply `collect` the segments of a path.

## Safe path interpolation

When interpreting externally provided content as a path, the existence of the pseudopath element introduces a risk of ending up in an unexpected directory. The intent of code like `joinpath(workdir, subdirname)` can be subverted (deliberately or accidentally) by providing a `subdirname` like `/some/other/dir`, `../stuff`, the empty string, or even a null byte. This is most of the [Path Traversal](https://owasp.org/www-community/attacks/Path_Traversal) class of CVEs.

When interpreting a string as a path segment, we can validate that it is a "normal" path segment and raise an error otherwise, preventing a surprising result from appearing with forms like `p"path/$var/$name.txt"`.
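
As a rough illustration of that validation using today's string-based API, the sketch below checks an untrusted string against the Posix segment rules before joining it. `checked_segment` and `safe_join` are hypothetical names, and a real implementation would also have to consider `\` on Windows; the point is only that each of the problematic `subdirname` values above is rejected up front instead of silently changing where the path leads.

```julia
# Hypothetical helper: accept `s` only if it is exactly one "normal" path segment
# under the Posix rules (no separator, no null byte, not empty, not `.` or `..`).
function checked_segment(s::AbstractString)
    isempty(s)       && throw(ArgumentError("empty path segment"))
    s in (".", "..") && throw(ArgumentError("pseudopath segment $(repr(s))"))
    '/' in s         && throw(ArgumentError("path separator in segment $(repr(s))"))
    '\0' in s        && throw(ArgumentError("null byte in segment $(repr(s))"))
    return s
end

# Hypothetical helper: join a trusted base with an untrusted single segment.
safe_join(base::AbstractString, untrusted::AbstractString) =
    joinpath(base, checked_segment(untrusted))
```

With this in place, `safe_join(workdir, "reports")` behaves like `joinpath`, while `safe_join(workdir, "../stuff")` throws an `ArgumentError` rather than escaping `workdir`.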

# Design nuance

## Recognising the difference between locations and resources

In conversation, I've received a fair bit of pushback from multiple individuals on the normalisation I've proposed in the prior section. The essential argument is that the nice, algebraic model of paths isn't able to fully abstract over the world of system-specific details. To give a few examples:

- There's the symlink stuff from earlier
- `stat foo/bar` is different to `stat foo/bar/.`
- Some tools like `rsync` treat `foo` differently to `foo/`
- `/..` is `/` (root is essentially a fixed point on Linux)

I really dislike these complications, particularly because in accepting this messiness we abdicate the handling of it to the user, and thus make it easier to write naively buggy code.

Looking at this another way, even more pushback along these lines is deserved. There is an abundance of unquestioned issues of this nature with the current status quo. For example: it is currently not possible to write to a file and then move it without there being an opportunity for the file to be replaced entirely in between each step.

This kind of issue and much of the pushback I've received essentially stems from one core issue: we often like to _think_ of paths as unique _resource_ descriptors, when in fact they are unique _location_ descriptors.

This is a subtle but important distinction. It is responsible for a large class of bugs and vulnerabilities known as [TOCTTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use) (Time of Check to Time of Use). Essentially any time a path is reused with any degree of outside influence (over the path or filesystem), it is near trivial to swap out the file in between operations by constructing a deep directory nesting and monitoring the directory `atime`s (yes, really). The only system where we can truthfully say this is not an issue is one with no concurrency of any sort (including time-sharing).

This is in large part a consequence of the initial POSIX standard being path oriented, a limitation that is gradually being rectified with the addition of `f<op>` and `<op>at` calls, such as `faccessat`. These calls operate not on a path to the file, but a handle to the file itself: a file descriptor, or FD for short. File descriptors essentially sit in between a description of the location of a resource, and the data on disk. I am less familiar with the NT situation, but am led to believe that it has been ahead of \*nix in supporting handle-based path operations.

Other programming languages have also recognised this issue, for example Python's Pathlib separates paths into _pure_ and _concrete_ paths, creating a clear split between an abstract conception of a path (as I've found myself attracted to thus far), and something that actually exists on the filesystem.

I suspect we can go even further, and consider a scheme by which we provoke the user into obtaining a reference to the *resource* at a path when they want to work on it, and so avoid TOCTTOU-style issues and related messiness wholesale.

I conjecture that with the development of the file descriptor based API in POSIX 2008, Linux 2.6, and OpenBSD we have the capability to fulfil this ideal by building a path-like type that is oriented around file descriptors rather than path strings. We can make a *more* concrete path type than Pathlib's "concrete" paths. Arguably this isn't really a *path* any more so much as a handle to a filesystem-addressed resource. I'm not sure what best to call this, but regardless it seems exceptionally useful for writing safe filesystem-interacting code.
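
The location/resource distinction can already be seen with Julia's existing functions. The sketch below is only an illustration of the idea (it is not the fd-based path type conjectured above, and `read_regular_file` is a hypothetical name): instead of calling `isfile(path)` and then `read(path)`, which resolves the name twice, it opens the file once and performs both the check and the read through the resulting handle.

```julia
# Hypothetical helper: check and read through one handle, resolving the name only once.
function read_regular_file(path::AbstractString)
    open(path, "r") do io
        # `stat` on the descriptor describes the resource we actually opened,
        # not whatever the name `path` happens to point at by now.
        st = stat(fd(io))
        isfile(st) || error("refusing to read $(repr(path)): not a regular file")
        read(io, String)
    end
end
```

Whatever happens to the name after `open` returns, the check and the read still refer to the same opened resource, which is the property the handle-oriented design aims to make the default.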

## Performance considerations

## Making the safe path the happy path

From all the investigation we've done so far, we know that:

- An abstract path type allows for sensible and efficient path manipulation
- A concrete fd-based path type allows for some TOCTTOU-safe filesystem operations
- A specialised directory entry type allows for efficient `readdir` usage, and for other TOCTTOU-safe filesystem operations

Currently, we just use `String` for all of these purposes. This is "simple" in the sense that all of the inherent complexity is put off to the user of the API to think about. By contrast, this trio of system path types requires a little more upfront thinking, but this is paid for several times over in the reduction of edge cases that package developers and end users may hit.

## Mitigating Pain Points

While I like this set of design goals, they're ultimately a compromise between various concerns, and so produce some potential pain points. These should be mitigated as much as possible.

### Symlinks and pseudoparents

With the separation of concerns of the design, path operations are very predictable. However, if pseudopaths are present and/or symlinks need to be accounted for, `realpath` will need to be called. This is something that people using `Path` objects will simply need to remember to do, and so we will sprinkle mention of this liberally into the documentation.

### Need to convert strings to paths for common operations

Made easier via interpolation within the `p""` macro.

## Compromises

### Pretending empty segments are invalid

If one reads on [Linux Pathname Lookup](https://docs.kernel.org/filesystems/path-lookup.html), one may notice that while empty segments are generally invalid, with the appropriate flags it can be valid to interact with the empty path.

I cannot begin to imagine any legitimate use case for this, and so am inclined to pretend this edge case doesn't exist, particularly since there's no easy way to supply the necessary flag to Julia's (public) filesystem functions.

## Unresolved questions

- Is treating the drive + `/` as the path root on Windows good enough?
- Should we take this opportunity to copy FilePathsBase.jl / FilePaths2.jl and provide more structured outputs to functions like `uperm`?
- Can we get away with eagerly normalising `..` and requiring `realpath` when you need to guard against symlink shenanigans?
- Do we want to under-the-hood transform absolute Windows paths to [verbatim-prefixed paths](https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#win32-file-namespaces) (`\\?\`), for long file name support?

### Now resolved through community discussion

- Should joining two absolute paths return the latter absolute path, or raise a runtime warning/error?
  - An error should be thrown

# Proposal

```mermaid
%%{init: {"flowchart": {"defaultRenderer": "elk"}}}%%
flowchart TB
  %% Top
  AbstractResolvable["AbstractResolvable{H <: AbstractHandle}"]
  AbstractPath["AbstractPath{S, H}"]
  AbstractHandle["AbstractHandle"]
  AbstractPath -->|"handle morphism"| AbstractHandle
  AbstractResolvable --> AbstractPath
  AbstractResolvable --> AbstractHandle

  %% Paths
  PlainPath["PlainPath{H}"]
  PlatformPath["PlatformPath{H}"]
  XPath["XPath"]
  URI["URI"]
  PosixPath["PosixPath"]
  WindowsPath["WindowsPath"]
  SystemPath["SystemPath{LocalFileHandle}"]
  LocalFilepath["LocalFilepath"]
  DirEntry["DirEntry"]
  AbstractPath --> PlainPath
  PlainPath --> PlatformPath
  PlainPath --> XPath
  PlainPath --> URI
  PlatformPath --> PosixPath
  PlatformPath --> WindowsPath
  PlatformPath --> SystemPath
  SystemPath --> LocalFilepath
  SystemPath --> DirEntry

  %% Handles
  AbstractFileHandle["AbstractFileHandle{F}"]
  LocalFileHandle["LocalFileHandle{LocalFilesystem}"]
  AbstractHandle --> AbstractFileHandle
  AbstractFileHandle --> LocalFileHandle

  %% Filesystems
  AbstractFilesystem["AbstractFilesystem"]
  LocalFS["LocalFilesystem"]
  VirtualFS["Virtual FS"]
  AbstractFilesystem --> LocalFS
  AbstractFilesystem --> VirtualFS
  LocalFS -.-> LocalFileHandle

  %% Cross-link
  LocalFileHandle -.-> SystemPath

  %% Styling
  classDef abstract fill:#e8f0ff,stroke:#4c78ff,stroke-width:1.5px;
  classDef concrete fill:#eaffea,stroke:#2ca02c,stroke-width:1.5px;
  class AbstractResolvable,AbstractPath,AbstractHandle,AbstractFilesystem abstract;
  class PlainPath,PlatformPath,SystemPath,AbstractFileHandle abstract;
  class LocalFS,VirtualFS,LocalFileHandle,LocalFilepath,DirEntry,PosixPath,WindowsPath concrete;
```

https://code.tecosaur.net/tec/julia-basic-paths

If you'd like to make a PR etc. this is also now mirrored to GitHub: https://github.com/tecosaur/julia-basic-paths

I'm happy to take feedback in any form you're willing to give it. If easy/possible I like receiving a `.patch` with inline comments :slightly_smiling_face:.

## Non-breaking changes

Ideally we'd use a time-travel machine to shoehorn this into Julia 1.0, but the second best time to add a path type to Julia is now. Avoiding breaking changes means we can't remove papercuts like `eachline(::String)`, but we can provide a better alternative, gradually adopt it, and push for it to become the status quo in the long term.

# Prior art

## Summary

### Paths

### Virtual Filesystems

### Capabilities

## Paths

### Python's Pathlib

> https://peps.python.org/pep-0428
> https://peps.python.org/pep-0519
> https://docs.python.org/3/library/pathlib.html

Python's pathlib is generally praised for offering an ergonomic way of handling filesystem paths. It makes paths first-class types, with a deliberate split between _pure_ (lexical) path manipulation and _concrete_ (I/O) paths that interact with the filesystem. Each kind of path is further divided into POSIX and Windows flavours, with a mostly-uniform interface.

```mermaid
flowchart TB
  subgraph PY["Python: pathlib"]
    PathLike["os.PathLike\n(__fspath__)"]
    PurePath["PurePath (abstract)"]
    PurePosixPath["PurePosixPath"]
    PureWindowsPath["PureWindowsPath"]
    Path["Path (abstract; concrete I/O paths)"]
    PosixPath["PosixPath"]
    WindowsPath["WindowsPath"]
    PurePath --> PurePosixPath
    PurePath --> PureWindowsPath
    PurePath --> Path
    Path --> PosixPath
    Path --> WindowsPath
    PurePath -.->|implements| PathLike
    Path -.->|implements| PathLike
  end
```

#### Sample usage

```python
from pathlib import Path, PureWindowsPath
import os

# Pure, lexical construction (no I/O)
win = PureWindowsPath(r"C:\Users\TEC") / "project" / "data.csv"

# Concrete, effectful operations on the local OS filesystem
p = Path("data") / "results.csv"
text = p.read_text(encoding="utf-8")  # opens/reads the file (I/O)

# Explicit demotion for interop with APIs expecting a filesystem path representation
os_path = os.fspath(p)
```

#### Pure and Concrete paths

Pathlib separates out purely conceptual and filesystem-grounded paths as *pure* and *concrete* paths. Operating on pure paths does not involve any interaction with the filesystem, while concrete paths check for symlinks, resolve symlinks, and verify various path operations using the filesystem. This also makes the transition from operating on the path to interacting with the filesystem explicit.

Note that this is _not_ true path "resolution" in the sense that concrete paths are still string based, instead of obtaining a resource handle.

#### Posix and Windows paths

Pathlib provides per-platform path classes, and aliases `Path`/`PurePath` based on the current platform. This preserves OS-specific semantics (drives/UNC, separators, etc.), and allows for working with non-native paths when needed, without forcing all callers to write per-platform code.

#### Not a string subclass

PEP 428 explicitly decides against deriving paths from `str` to avoid silent misuse.

#### Interop via path protocol

PEP 519 creates a path _protocol_ (`os.PathLike` / `__fspath__`) so that "path objects" can be accepted across the stdlib. Path objects can be _demoted_ to `str`/`bytes` for legacy APIs, while constructors can _promote_ strings into path objects.

#### Solid path API

The Pathlib API provides the basics (`parent`, `parts`, `joinpath`, `home`), and also a decent collection of utilities on top:

- `suffix`
- `suffixes`
- `stem`
- `with_name`
- `with_stem`
- `with_suffix`
- `with_segments`
- `from_uri`
- `as_uri`

Currently Julia covers the basics, but could probably do with some more convenience functions.

#### Tradeoffs

- **Convenience and separation:** while having path manipulation and filesystem interaction methods all with the same class would be convenient, it is seen as more important to split the two, and provide `Path`/`PurePath` aliases to make the split more manageable.

#### Limitations

- **Single ambient filesystem context:** the host OS is always implicit
- **No capability model:** there's no concept of authority, or restricted namespaces; access control is left to each calling context

### C++17 `<filesystem>` library

> https://en.cppreference.com/w/cpp/header/filesystem.html
> https://en.cppreference.com/w/cpp/filesystem/path.html
> https://en.cppreference.com/w/cpp/filesystem/path/lexically_normal
> https://en.cppreference.com/w/cpp/filesystem/canonical.html
> https://learn.microsoft.com/en-us/cpp/standard-library/path-class?view=msvc-170

C++17’s filesystem library, based on the Boost library of the same name, introduces a dedicated `std::filesystem::path` value type for representing _names_ in a filesystem namespace, plus a family of functions for effectful filesystem operations. The design intentionally keeps path manipulation largely lexical, while making "touch the filesystem" operations explicit via separate APIs.

```mermaid
flowchart TB
  subgraph CPP["C++17: std::filesystem"]
    fs_path["std::filesystem::path"]
    dir_entry["std::filesystem::directory_entry"]
    dir_iter["std::filesystem::directory_iterator"]
    fs_path -->|used by| dir_entry
    dir_iter -->|yields| dir_entry
    fs_file["std::fstream / std::ifstream / std::ofstream\n(OS handle wrapper)"]
    fs_ops["std::filesystem::* free functions\n(exists, canonical, copy, ...)"]
    fs_path -->|argument| fs_ops
    fs_path -->|open via| fs_file
  end
```

#### Sample usage

```cpp
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

int main() {
    fs::path p = fs::path{"config"} / "app.toml"; // lexical: just builds a name
    fs::path normalized = p.lexically_normal();   // lexical: no filesystem access
    fs::path resolved = fs::weakly_canonical(p);  // effectful: resolves existing prefix + symlinks

    std::ifstream in(resolved);                   // the handle is the stream/FD, not fs::path

    for (const fs::directory_entry& e : fs::directory_iterator(resolved.parent_path())) {
        if (e.is_regular_file() && e.path().extension() == ".toml") {
            // ...
        }
    }
}
```

#### Natively stored paths with a generic view

Paths store the "pathname" in the native format, but allow viewing the path in a generic (POSIX) format too. There are explicit functions to go between the two forms.

#### Separate lexical and filesystem queries

Lexical operations (e.g. normalisation, relative path construction) are supported with a separate API from filesystem interaction (`canonical` / `weakly_canonical`).

#### Exceptions and error-valued returns

Most effectful operations come in two forms: an exception-throwing form, and an overload taking a `std::error_code&` that reports failures out-of-band.

#### Tradeoffs

- **Lexical purity vs. meaningful normalisation:** having both lexical and filesystem operations operate on the same type is convenient, but introduces a softer separation of intent, and means that knowledge about whether a path refers to a real filesystem resource or not must be thought about and managed by the programmer.
- **Interoperative convenience vs. sharp edges:** implicit convertibility to/from `std::basic_string` makes adoption easy and incremental, but also blurs intent and creates more opportunities for surprises in cross-platform code.

#### Limitations

- **No capability model:** the approach is only concerned with modelling path names/transformations, and not the use of paths.
- **Single ambient filesystem context:** there's no mechanism (or room to insert one) for different kinds of filesystems. Libraries must create parallel APIs.

### Rust `std::path` (Path and PathBuf)

> https://doc.rust-lang.org/std/path/index.html
> https://doc.rust-lang.org/std/path/struct.Path.html
> https://doc.rust-lang.org/std/path/struct.PathBuf.html
> https://doc.rust-lang.org/std/fs/index.html
> https://rust-lang.github.io/rfcs/0474-path-reform.html

Rust provides `Path` (borrowed, unsized) and `PathBuf` (owned) as first-class path types, analogous to `str`/`String`, with representations designed to preserve platform-native path encodings and semantics. Filesystem effects are performed through `std::fs`, producing resource handles like `std::fs::File` and metadata, rather than "rematerialising" paths as handles.

```mermaid
flowchart TB
  subgraph RS["Rust: std::path + std::fs"]
    OsStr["std::ffi::OsStr\n(borrowed OS string)"]
    OsString["std::ffi::OsString\n(owned OS string)"]
    Path["std::path::Path\n(borrowed; unsized)"]
    PathBuf["std::path::PathBuf\n(owned; growable)"]

    AsRefPath["trait AsRef<Path>"]
    IntoPathBuf["trait Into<PathBuf>"]
    BorrowPath["trait Borrow<Path>"]

    OsStr -->|backing| Path
    OsString -->|backing| PathBuf
    PathBuf -->|"Deref<Target=Path>"| Path

    PathBuf -.->|implements| AsRefPath
    PathBuf -.->|implements| BorrowPath
    StrTypes["String / &str"] -.->|implements| AsRefPath
    PathTypes["PathBuf / &Path"] -.->|implements| IntoPathBuf

    File["std::fs::File\n(OS FD handle wrapper)"]
    DirEntry["std::fs::DirEntry\n(resolved directory item)"]
    FsFns["std::fs::* fns\n(open, metadata, read_dir, ...)"]

    Path -->|"argument (impl AsRef<Path>)"| FsFns
    FsFns -->|returns| File
    FsFns -->|yields| DirEntry
  end
```

#### Sample usage

```rust
use std::fs;
use std::path::{Path, PathBuf};

let base: &Path = Path::new("data");
let mut p: PathBuf = base.join("results.csv");  // lexical name construction
p.set_extension("tsv");                         // mutate owned path buffer

let meta = fs::metadata(&p)?;                   // effectful query via std::fs

for entry in fs::read_dir(base)? {              // std::fs takes P: AsRef<Path>
    let entry = entry?;
    if entry.path().extension() == Some("tsv".as_ref()) {
        // ...
    }
}
```

#### Borrowed/owned split

Rust models paths like strings: `Path` is a borrowed view used pervasively in APIs, while `PathBuf` is an owned, mutable buffer for constructing and editing paths efficiently (`push`, `pop`, `set_extension`, etc.). This makes path-heavy code ergonomic without forcing allocations at every boundary.

#### Wrapping around the native OS representation

`Path`/`PathBuf` are thin wrappers over `OsStr`/`OsString`, so they can represent platform-native paths that are not valid Unicode text.
Conversions to `&str` are therefore fallible/optional, which forces callers to confront encoding rather than silently corrupting or rejecting valid OS paths.

#### Comprehensive API for structure and transformation

The standard API covers the common "shape of a pathname" operations: iteration by components, `file_name` / `file_stem` / `extension`, `parent`, prefix/suffix tests, `strip_prefix`, joining, and targeted edits (`with_extension`, `with_file_name`, plus mutable `PathBuf` setters). It is deliberately narrower than `pathlib`’s high-level conveniences (no standard `home()`/tilde expansion, globbing, recursive walking, etc.), which are left to ecosystem crates as needed.

#### Hard separation between paths and filesystem operations

Rust keeps path values largely inert: opening a file, reading metadata, canonicalising, iterating directories, etc. is primarily done via `std::fs` free functions or types. This makes "touches the filesystem" sites stand out in code, and ensures handles are explicit (`std::fs::File` being the canonical OS-backed handle wrapper).

#### Interop via `AsRef<Path>` (promotion without a dedicated protocol type)

Most filesystem APIs accept `P: AsRef<Path>`, allowing callers to pass `&Path`, `PathBuf`, `&str`, `String`, and other path-like inputs without a bespoke path-protocol mechanism. This makes adoption incremental while still standardising the "real" path vocabulary type at API boundaries.

#### Tradeoffs

* **Ergonomics vs purity:** keeping most effects in `std::fs` sharpens the "names vs resources" distinction, but also means fewer "one object does everything" conveniences compared with `pathlib`.
* **Correctness vs string convenience:** non-Unicode-capable paths reduce footguns on real systems, but push more code toward `OsStr`/`OsString`-aware manipulation when doing text-like operations.

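
To make the fallible-conversion point concrete outside Rust, here is a minimal sketch in Python (chosen only for brevity; it says nothing about how Rust implements `Path`). The helper names `to_text` and `to_text_lossy` are hypothetical, standing in for the roles played by `Path::to_str` and `Path::to_string_lossy`, and the file name is made up.

```python
from typing import Optional

# A legal POSIX file name whose bytes are not valid UTF-8 (made-up example).
raw_name = b"report-\xff.csv"


def to_text(name: bytes) -> Optional[str]:
    """Strict conversion: return None rather than guess when the bytes are not UTF-8."""
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        return None


def to_text_lossy(name: bytes) -> str:
    """Display-only conversion: undecodable bytes become U+FFFD and cannot be recovered."""
    return name.decode("utf-8", errors="replace")


strict = to_text(raw_name)        # None: the caller has to handle this case
lossy = to_text_lossy(raw_name)   # fine to show a user, wrong to use as a file name
round_trips = lossy.encode("utf-8") == raw_name  # False: the original byte is gone
```

The lossy form is safe for display only; feeding it back to the filesystem would name a different file, which is the corruption the optional conversion is meant to surface.
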
#### Limitations

* **Single ambient filesystem context:** `std` assumes the host OS filesystem; multiple filesystem instances and VFS composition are not standardised in the core API.
* **No capability model in `std`:** capability-oriented directory handles and sandbox-friendly "*at-style*" patterns are left to crates (notably outside `std`).

### Racket paths

> https://docs.racket-lang.org/reference/pathutils.html
> https://docs.racket-lang.org/reference/Manipulating_Paths.html
> https://docs.racket-lang.org/reference/file-ports.html

Racket treats paths as a distinct datatype, while allowing most filesystem APIs to accept either a path value or a string that is promoted to a path. Path values preserve an underlying byte-oriented representation (so not all paths are losslessly representable as strings) and path-manipulation utilities can operate under Unix/Windows conventions independent of the host platform.

```mermaid
flowchart TB
  subgraph RK["Racket: paths + ports"]
    PathVal["path value\n(path?)"]
    PathKind["platform kind\n(windows/posix)"]
    Str["string"]
    Bytes["bytes"]

    Str -->|"string->path"| PathVal
    Bytes -->|"bytes->path"| PathVal
    PathVal -->|has| PathKind

    InPort["input-port"]
    OutPort["output-port"]
    OpenIn["open-input-file"]
    OpenOut["open-output-file"]

    PathVal -->|argument| OpenIn
    PathVal -->|argument| OpenOut
    OpenIn -->|returns| InPort
    OpenOut -->|returns| OutPort
  end
```

#### Sample usage

```racket
#lang racket

;; Path construction is lexical
(define p (build-path (current-directory) "data" "results.csv"))

;; "Pure-ish" normalization is available (no filesystem access)
(define p-lex (simplify-path p #f))

;; Filesystem-aware normalization can consult the filesystem (e.g. symlinks)
(define p-fs (simplify-path p))

;; Path → handle (port); the port is the resource capability (backed by an OS fd)
(define in (open-input-file p))
(define txt (port->string in))

;; Interop / presentation: string conversion can be lossy
(define display (path->string p))
```

#### Dedicated path datatype with string interop

Filesystem procedures generally accept either a string or a path value, promoting strings via `string->path`, while procedures that *produce* filesystem paths return path values. This keeps "path-as-name" visible in values without forcing an all-at-once ecosystem migration away from strings.

#### Byte-oriented representation and lossy string rendering

Paths round-trip through byte strings more faithfully than through Unicode strings: `path->string` decodes using the platform’s conventions and is explicitly documented as unsuitable for lossless "convert to string, tweak, convert back" workflows. This is a concrete example of separating *display text* from *filesystem name representation*.

#### Convention-aware paths (Unix vs Windows) independent of host

Racket tracks a path’s convention ("kind") and makes many non-effectful procedures sensitive to that kind. Construction from bytes supports an explicit `'unix`/`'windows` convention, enabling manipulation of (say) Windows paths on Unix when no filesystem access is required.

#### Cleansing, lexical normalization, and filesystem-aware simplification

Racket distinguishes several tiers of "make this path nicer":

* **Cleansing**: many primitives cleanse inputs (e.g. redundant separators) before use; `cleanse-path` is explicitly non-effectful.
* **Filesystem-aware adjustment**: `resolve-path` can dereference a single soft link, and `simplify-path` defaults to `use-filesystem? = #t`, potentially consulting the filesystem and accounting for soft links when eliminating `..` to preserve referential meaning.
* **Pure mode**: `simplify-path` with `#f` performs syntactic simplification without filesystem access (and can operate on paths for any platform).

#### Separation of names from resource handles

Racket’s I/O APIs return *ports* (e.g. `open-input-file`) which are the capability-bearing handles used for reading/writing; paths remain names that are interpreted during the open. Ports are backed by OS-level descriptors underneath.

#### Efficiency-oriented decomposition utilities

The API includes primitives designed to avoid allocation patterns that arise from repeated splitting. For example, `explode-path` is documented to run in time proportional to the path length, unlike iterative `split-path` usage that allocates intermediate paths.

#### Tradeoffs

* **Interop convenience vs. strictness**: accepting both strings and path values keeps APIs ergonomic, but weakens the type-level separation between "text" and "path name" compared to designs that require explicit promotion everywhere.
* **Convenient defaults vs. explicit context**: some "pure" utilities still consult ambient process state (e.g. `current-directory` as a default base), which is pragmatic but less explicit than a fully capability-oriented model.

#### Limitations

* **Ambient filesystem context**: paths do not carry an explicit filesystem object/context; operations are fundamentally anchored to the host OS filesystem semantics.
* **No first-class capability model for namespace restriction**: while ports are handles, there is no standard "directory capability" pattern (à la `openat`-style APIs) that makes authority over a subtree explicit in function signatures.

### Common Lisp filepaths library

> https://github.com/fosskers/filepaths

`filepaths` is a "modern and consistent filepath manipulation" library for Common Lisp which consolidates scattered pathname utilities, fills in commonly-missed operations, and renames them to be more predictable.
It operates purely on *names* (CL pathnames / namestrings) and intentionally does not probe the filesystem.

```mermaid
flowchart TB
  subgraph CL["Common Lisp: pathname + filepaths library"]
    Pathname["CL pathname\n(#p#quot;...#quot;)"]
    Strings["string"]

    Strings -->|parse/merge| Pathname

    LibFns["filepaths:* functions\n(with-extension, with-name, drop-extension, ...)"]
    IO["CL I/O\n(open, probe-file, directory, ...)"]

    Pathname -->|argument| LibFns
    Pathname -->|argument| IO
  end
```

#### Sample usage

```lisp
;; Construct and transform purely lexically (no I/O)
(let* ((p (p:join #p"/home/you/code" "common-lisp" "hello.lisp"))
       (q (p:with-extension p "json")))
  (values p q (p:parent q) (p:components q) (p:to-string q)))
```

#### Standard pathname substrate

Common Lisp already has a first-class `pathname` object (with components like host/device/directory/name/type/version), and `namestring` is an implementation-defined textual rendering of a pathname. `filepaths` builds on this existing substrate rather than introducing a new path type.

#### Lexical-only API by design

`filepaths` explicitly focuses on structural/lexical operations (joins, component access, extension manipulation, structure tests) and avoids filesystem queries such as existence checks. This gives a sharp "names, not resources" boundary.

#### Predictable "modern" operations and naming

The library centers a set of operations that look very similar to what developers expect from newer path APIs: `join`, `parent`, `with-name`, `with-parent`, `extension`/`with-extension`/`add-extension`/`drop-extension`, and component conversion (`components`, `from-list`).

#### Accepts either `pathname` or `string`

Nearly every function accepts either a `pathname` or a `string`, and provides explicit "ensure"/conversion helpers (`ensure-path`, `ensure-string`, `to-string`, `from-string`) to normalize inputs/outputs. This is a pragmatic interop story in a Lisp ecosystem where APIs frequently accept "pathname designators".

#### Errors via conditions

For cases where `nil` is ambiguous, it signals dedicated conditions (e.g. `empty-path`, `no-filename`, `root-no-parent`), which aligns with CL’s condition system for recoverable errors.

#### Tradeoffs

* **Interop convenience vs. type clarity:** accepting both strings and path objects makes adoption easy, but allows "stringly-path" to remain pervasive, weakening the signaling value of path-typed APIs.
* **Leverages CL pathnames vs. inherited complexity:** reusing `pathname` avoids ecosystem fragmentation, but also inherits long-standing complexity/quirks of CL pathname semantics and conversions.

#### Limitations

* **Not a filesystem abstraction:** there is no VFS concept, no filesystem context object, and no capability/authority model, only lexical pathname manipulation.
* **Portability of textual syntax is constrained by CL:** because namestring syntax is implementation-dependent (outside of logical pathnames), any string-based portability story depends on the underlying implementation’s parsing/rendering choices.

### Frames' algebraic path schema

> https://www.oxinabox.net/2016/09/14/an-algebraic-structure-for-path-schema-take2.html

Frames proposes a minimal algebraic specification for "path schemas" (file paths, URLs, XPath, globs, etc.) by treating paths as structured names built from roots and relative components, and then separating that lexical structure from the effectful act of evaluating a path against a backing domain.

#### Minimal core: roots, a free monoid, and a faithful action

The model starts with a set of absolute roots $A$, relative components $R$, and a `pathjoin` operator ⋅ such that relative paths form a **free monoid** $R^*$, and $R^*$ acts *faithfully* on absolute roots, generating absolute paths $A^*$. This aims to capture "paths as names" with as few primitives as possible, while still deriving most common operations.

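
As a rough illustration of that core (my own sketch, not code from the post), relative paths can be modelled as tuples of components, so `pathjoin` is tuple concatenation and the empty tuple is the identity. The names `Rel`, `Abs`, `join_rel`, and `act` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Rel = Tuple[str, ...]   # an element of R*: a finite sequence of components
EMPTY: Rel = ()         # the monoid identity (the empty relative path)


def join_rel(a: Rel, b: Rel) -> Rel:
    """pathjoin on relative paths: plain concatenation, hence a free monoid."""
    return a + b


@dataclass(frozen=True)
class Abs:
    """An absolute path: a root from A plus the relative path applied to it."""
    root: str
    rel: Rel = EMPTY


def act(a: Abs, r: Rel) -> Abs:
    """The action of R* on absolute paths: append components under the same root."""
    return Abs(a.root, a.rel + r)


home = act(Abs("/"), ("home", "you"))
code = act(home, ("code",))
same = act(Abs("/"), join_rel(("home", "you"), ("code",)))  # equal to `code`
```

Because equality here is structural, two relative paths that send a root to the same absolute path must themselves be equal, which is the faithfulness property in miniature. Note that `..` has no place in this representation yet; that is the subject of the later subsections.
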
:::info
I think that paths actually best fit an ordered monoid, since there's a sensible partial order.
:::

#### Multiple roots as a first-class concern

The framework explicitly allows *multiple* absolute roots (e.g. POSIX "/" vs Windows drive roots and other namespaces), and notes that "zero absolute roots" is theoretically possible but makes evaluation ill-defined for "absolute" lookup.

#### Derived operations: `parentdir`, `basename`, `root`, `parts`, `within`

Given the core structure, the post derives familiar operations: `parentdir` and `basename` (tail/head), and then `root`, `depth`, `parts`, and `within` (a restricted "relative" that only works when one path is nested within the other). The emphasis is that these are *lexical* and don’t require touching the backing system.

#### Evaluation is intentionally effectful and many-to-one

Resolution is modeled as an evaluation function ($e: A^* \to \mathcal{P}(D)$), mapping an absolute path name to a *set* of domain objects (to accommodate aliases, links, globs, XPath, etc.). The post distinguishes "MonoPath" schemas (0/1 object, like typical filesystem paths) from "MultiPath" schemas (0/many, like globs/XPath).

#### `..` as a "pseudoparent" element and why it’s hard

A key design lesson is that treating `..` as an ordinary relative component breaks the free-monoid structure; instead the post introduces a special element ($\varphi$) defined *via evaluation* (intuitively, "append $\varphi$ then evaluate equals evaluating the parent"). It then argues that POSIX `..` semantics are not purely lexical in the presence of symlinks, motivating designs that avoid collapsing `..` without filesystem access (and noting that some systems ban it outright).

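
The symlink problem is easy to reproduce. The sketch below (Python, assuming a POSIX system where the process may create symlinks; the directory names are made up) builds `link -> real/deep` in a temporary directory and compares a purely lexical collapse of `link/..` with what the filesystem actually resolves.

```python
import os
import tempfile


def dotdot_demo():
    """Return (base, lexical, resolved) for base/link/.. where link -> base/real/deep."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)   # the temp dir itself may sit behind a symlink
        os.makedirs(os.path.join(base, "real", "deep"))
        os.symlink(os.path.join(base, "real", "deep"), os.path.join(base, "link"))

        p = os.path.join(base, "link", "..")
        lexical = os.path.normpath(p)    # string rewrite only: cancels "link/.."
        resolved = os.path.realpath(p)   # follows the link first, then steps up
        return base, lexical, resolved


base, lexical, resolved = dotdot_demo()
# lexical  is base itself
# resolved is base/real, a different directory
```

The two answers name different directories, so collapsing `..` lexically changed what the path evaluates to.
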
#### Normalization and `relative_to` depend on $\varphi$

A normalization function `norm` is defined to remove $\varphi$ where possible without changing evaluation, and `relative_to(x,y)` is defined using a common-prefix computation plus the necessary number of $\varphi$ "up-steps". The post explicitly notes that proofs of the normalization/equivalence properties are non-trivial and not completed there.

#### Optional extensions: canonical names and directory-vs-file paths

Two extensions are sketched: (1) "canonical name" schemas where evaluation is injective (one object ↔ one name), and (2) splitting file paths vs directory paths to restrict which joins are permitted, at the cost of losing a simple monoid over all relative paths (while preserving a free monoid for directory-relative paths).

#### Tradeoffs

* The high level of generality (covering URLs/XPath/globs/filesystems uniformly) clarifies what is *fundamental* versus *conventional*, but it can be too abstract to settle concrete API questions without additional constraints.
* Introducing $\varphi$ enables familiar operations like `relative_to`, but it forces a careful split between lexical structure and effectful semantics, highlighting exactly where "stringy" normalization becomes unsound.

#### Limitations

* This is a *theoretical* specification rather than an implementation guide; many operational details (errors, permissions, encodings, etc.) are out of scope.
* $\varphi$/`..` cannot be made fully POSIX-faithful without consulting the filesystem (symlink interaction), and some schemas (notably multipaths like globs) may not admit any $\varphi$-like element at all.
* The "evaluation" function models *name → object(s)*, but does not model *handles/capabilities*; it stops at identifying resources rather than representing authority to operate on them.

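
The common-prefix construction of `relative_to` mentioned in this section can be sketched in a few lines of Python. This is my own reading rather than the post's definition: paths are tuples of components under a shared root, `UP` stands in for $\varphi$, and the argument order (`target`, then `start`) is an assumption.

```python
import posixpath
from typing import Tuple

Rel = Tuple[str, ...]
UP = ".."   # stand-in for the pseudoparent element


def relative_to(target: Rel, start: Rel) -> Rel:
    """Components leading from `start` to `target`, both taken from the same root."""
    common = 0
    for a, b in zip(target, start):
        if a != b:
            break
        common += 1
    # Climb out of what remains of `start`, then descend into the rest of `target`.
    return (UP,) * (len(start) - common) + target[common:]


steps = relative_to(("a", "b", "c"), ("a", "d"))   # ("..", "b", "c")
stdlib = posixpath.relpath("/a/b/c", "/a/d")       # "../b/c", also purely lexical
```

Like `posixpath.relpath`, this never touches the filesystem, so the result only evaluates to the intended target when none of the climbed components of `start` is a symlink.
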
### Zig’s path APIs

> https://ziglang.org/documentation/master/std/#std.fs
> https://ziglang.org/documentation/master/std/#std.fs.Dir
> https://ziglang.org/documentation/master/std/#std.fs.path
> https://github.com/ziglang/zig/issues/16736

Zig largely treats paths as byte slices (`[]const u8`) plus a set of lexical utilities (`std.fs.path`). Effectful filesystem operations are separated into `std.fs`, and are designed to be used primarily via open directory handles (`std.fs.Dir`) and relative paths, rather than "stringly" absolute-path APIs.

```mermaid
flowchart TB
  subgraph ZIG["Zig: std.fs + std.fs.path"]
    Slice["[]const u8\n(path bytes)"]
    PathMod["std.fs.path\n(lexical helpers)"]
    Dir["std.fs.Dir\n(directory handle / capability)"]
    File["std.fs.File\nfile handle (OS FD wrapper)"]

    Slice -->|lexical ops| PathMod
    Dir -->|"openFile(relpath)"| File
    Dir -->|"openDir(relpath)"| Dir
    Slice -->|"argument (typically rel to Dir)"| Dir
  end
```

#### Sample usage

```zig
// Lexical composition (allocates for convenience)
const rel = try std.fs.path.join(allocator, &.{ "reports", "2026-01.csv" });
defer allocator.free(rel);

// Directory-handle-centric I/O (File/Dir are the actual resource handles)
var root = try std.fs.cwd().openDir("data", .{ .iterate = true });
defer root.close();

var f = try root.openFile(rel, .{ .mode = .read_only });
defer f.close();
```

#### Paths are functions over slices, not a first-class "Path" type

Rather than a dedicated path value type, Zig centralises path manipulation as functions that operate on `[]const u8` and expose platform-specific constants (e.g. directory separator and PATH-list delimiter).

#### Dir-centric I/O (directory handles as capability-like authority)

Filesystem operations are intended to be performed relative to a `Dir` handle (an OS-backed resource), which can reduce TOCTOU hazards and makes "what subtree do we have authority over?" more explicit in APIs and call graphs.
The presence of `*Absolute` convenience functions is increasingly treated as legacy/avoidable, since many are thin wrappers over `cwd().*` calls.

#### Explicit Windows-wide variants alongside UTF-8/byte-slice APIs

Many APIs accept `[]const u8`, but Windows-specific variants exist that take wide strings (WTF-16), reflecting the platform’s native calling conventions and making encoding/interop concerns explicit at the API boundary.

#### Tradeoffs

* **Simplicity and performance:** treating paths as slices avoids object overhead and keeps path manipulation lightweight, but provides less structural guidance than a dedicated `Path` type.
* **Handle-first ergonomics:** `Dir`-relative operations sharpen authority and robustness, but can feel heavier for simple scripts and push more users toward "plumbing" directory handles through APIs.
* **Platform realism:** exposing wide-string Windows variants improves correctness/interop, but increases surface area and multiplies "which encoding do I use?" decisions.

#### Limitations

* **Single ambient filesystem context:** `cwd()` is the default anchor for many operations, and the model assumes the host OS filesystem semantics as the baseline context.
* **No datatype distinction between paths and strings:** paths remain names (byte sequences); "handle-ness" only appears once a `Dir`/`File` is opened.

### Node.js `path` and `fs`

> https://nodejs.org/api/path.html
> https://nodejs.org/api/fs.html
> https://nodejs.org/api/url.html#urlfileurltopathurl
> https://nodejs.org/api/url.html#urlpathtofileurlpath

Node splits lexical pathname manipulation (`path`) from effectful filesystem interaction (`fs`), but paths themselves are primarily represented as plain strings (with some `fs` APIs also accepting `Buffer` or `file:` URLs). The net effect is a clear separation of "build a name" vs "touch the filesystem", without introducing a dedicated path value type.

```mermaid
flowchart TB
  subgraph NODE["Node.js: path + fs"]
    PathMod["path module\n(lexical operations)"]
    FsMod["fs module\n(effectful operations)"]

    PathLike["PathLike\n= string | Buffer | URL"]
    Str["string"]
    Buf["Buffer"]
    Url["URL"]

    Str --> PathLike
    Buf --> PathLike
    Url --> PathLike

    PathLike --> PathMod
    PathLike --> FsMod

    FH["fs.promises.FileHandle\n(handle wrapper)"]
    Streams["fs.ReadStream / fs.WriteStream\n(stream wrappers)"]

    FsMod -->|fs.promises.open| FH
    FsMod -->|createReadStream/createWriteStream| Streams
  end
```

#### Sample usage

```js
import path from "node:path";
import fs from "node:fs";
import url from "node:url";

// Lexical path operations (no I/O)
const p = path.join("data", "results.csv");
const parts = path.parse(p);   // { dir, base, name, ext, ... }
const abs = path.resolve(p);   // still just a string

// Effectful operations (I/O) yield handles
const fh = await fs.promises.open(p, "r");
try {
  const text = await fh.readFile({ encoding: "utf8" });
} finally {
  await fh.close();
}

// URL interop when needed
const fileUrl = url.pathToFileURL(abs);
const p2 = url.fileURLToPath(fileUrl);
```

#### Stringly paths and "PathLike" inputs

`path` operates over strings and returns strings. In `fs`, string paths are interpreted as UTF-8 sequences naming absolute/relative filenames, and relative paths are resolved against `process.cwd()`. Many `fs` APIs also accept `Buffer` and (for `file:`) WHATWG `URL` objects, which makes interop convenient but keeps the "path as text" boundary relatively soft.

#### Platform-specific semantics with explicit `posix`/`win32` variants

The default `path` behavior follows the host platform, while `path.posix` and `path.win32` provide explicit access to the other platform’s parsing/joining rules. This allows cross-platform manipulation without changing the underlying representation (still strings).

#### Handles as file descriptors and `FileHandle`

Resource access is mediated by OS-backed handles: the promises API exposes a `FileHandle` object with explicit `close()`, while other APIs expose numeric file descriptors.
This matches the "handle is authority" story operationally, even though it is not reflected in a capability-oriented path resolution model (paths remain globally interpretable strings).

#### Lexical normalization vs filesystem-aware canonicalization

Node’s `path` functions are purely lexical; canonicalization and symlink resolution live in `fs` (e.g. `realpath`). This preserves a practical "names vs effects" separation, but with no distinct type-level boundary between lexical and resolved forms.

#### Tradeoffs

* **Simplicity and interop:** strings keep the surface area small and integrate naturally with the JS ecosystem, but do not prevent accidental mixing of "text" and "path".
* **Cross-platform control vs ergonomics:** explicit `path.posix`/`path.win32` enables portable tooling, but correctness remains a caller responsibility because everything is still representationally a string.
* **Flexible inputs vs clarity:** accepting `Buffer`/`URL` is pragmatic, but further blurs the conceptual model (multiple "path-like" carriers with different invariants).

#### Limitations

* **Single ambient filesystem context:** `fs` operations implicitly target the process-visible OS filesystem, with relative paths interpreted via `process.cwd()`.
* **No capability model:** while handles exist (`FileHandle`/fd), the dominant API surface remains ambient (global path resolution rather than resolution relative to an explicit directory capability).
* **No first-class path value:** the model cannot leverage types to encode invariants (absolute vs relative, platform flavour, lexical vs canonical) beyond convention and runtime checks.

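
For contrast with the ambient resolution described in these limitations, here is a minimal sketch of resolving a name relative to an explicit directory handle (the `openat` pattern), written in Python and assuming a Unix-like platform where `os.open` supports `dir_fd`. The helper name `read_relative` and the file name are hypothetical.

```python
import os
import tempfile


def read_relative(dir_fd: int, relpath: str) -> bytes:
    """Read `relpath`, resolved against an open directory descriptor rather than the CWD."""
    fd = os.open(relpath, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd, "rb") as f:   # fdopen takes ownership of fd and closes it
        return f.read()


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "note.txt"), "wb") as f:
            f.write(b"hello")

        dir_fd = os.open(tmp, os.O_RDONLY)   # the directory handle is the anchor
        try:
            # No absolute path and no dependence on the process's current directory.
            return read_relative(dir_fd, "note.txt")
        finally:
            os.close(dir_fd)


content = demo()
```

This only changes where resolution starts; on its own it is not a sandbox, since a relative path containing `..` or a symlink can still lead outside the directory.
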
### .NET's `System.IO`

> https://learn.microsoft.com/en-us/dotnet/api/system.io.path?view=net-10.0
> https://learn.microsoft.com/en-us/dotnet/api/system.io.path.combine?view=net-10.0
> https://learn.microsoft.com/en-us/dotnet/api/system.io.path.join?view=net-10.0
> https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getfullpath?view=net-10.0
> https://learn.microsoft.com/en-us/dotnet/api/system.io.file.openhandle?view=net-10.0
> https://learn.microsoft.com/en-us/aspnet/core/fundamentals/file-providers?view=aspnetcore-10.0

.NET’s core `System.IO` APIs treat paths primarily as `string` values, with lexical manipulation provided by the static `Path` utility class and effectful operations provided by `File`/`Directory` and stream/handle types. Stable authority over a resource is represented by streams/OS-handle wrappers created by opening a path, rather than by the path value itself.

```mermaid
flowchart TB
  subgraph DOTNET[".NET: System.IO (+ common abstractions)"]
    StrN["string / ReadOnlySpan<char>\n(path representation)"]
    PathStatic["System.IO.Path\n(static lexical helpers)"]

    StrN --> PathStatic

    FSI["FileSystemInfo (abstract)"]
    FI["FileInfo"]
    DI["DirectoryInfo"]

    FSI --> FI
    FSI --> DI
    StrN -->|constructs| FSI

    SFH["SafeFileHandle\n(OS handle)"]
    FS["FileStream\n(handle wrapper)"]

    FS --> SFH

    FileProvider["IFileProvider\n(ASP.NET Core; VFS-style)"]
    FileProvider -.->|virtualized access| FSI
  end
```

#### Sample usage

```csharp
var p = Path.Combine("data", "results.csv");     // lexical construction
var full = Path.GetFullPath(p);                  // resolves against current directory

var text = File.ReadAllText(full);               // effectful: opens + reads

using var h = File.OpenHandle(full);             // explicit OS-handle wrapper
using var s = new FileStream(h, FileAccess.Read);
```

#### `Path` as a purely-lexical utility surface

`System.IO.Path` is a "path algebra" of string-in/string-out helpers (join, split, query), while filesystem effects live in other types.
This keeps naming operations distinct at the API level, but does not create a first-class path value type.

#### Joining semantics: `Combine` vs `Join` and rooted segments

`Path.Combine` follows OS-like semantics: if any argument after the first is rooted, earlier components are discarded. `Path.Join` concatenates more mechanically (preserving duplicate separators), with normalization typically deferred to `GetFullPath`.

#### Ambient resolution via current directory (and drive rules on Windows)

`Path.GetFullPath` resolves relative inputs using ambient process state (current directory; and, on Windows, drive-relative conventions). A base-path overload exists to avoid depending on ambient state when determinism matters.

#### Handles are explicit (and increasingly first-class)

The canonical "resource handle" is a stream (e.g. `FileStream`), and modern .NET also exposes `File.OpenHandle` returning a `SafeFileHandle` directly, making the "name → handle" boundary explicit when desired.

#### VFS-style abstractions exist, but outside `System.IO`

While `System.IO` itself is OS-filesystem-oriented, .NET includes other, scoped abstractions for non-physical or restricted views, most notably ASP.NET Core’s `IFileProvider` (with physical, embedded, and composite providers). This is not the general-purpose `System.IO` model, and is intentionally narrower in scope.

#### Tradeoffs

* **Stringly paths:** maximizes interop and keeps the core surface small, but conflates text with namespace locations and limits type-driven correctness and dispatch.
* **Convenient OS-like joins:** `Combine`’s rooted-path rule is pragmatic, but can produce surprising results if rootedness appears unexpectedly in inputs.
* **Exceptions as the primary error channel:** simplifies common-case use, but makes "probe" patterns more awkward than explicit error-valued returns.

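
The rooted-segment surprise is not unique to .NET. Python's `posixpath.join` and `ntpath.join` apply the same kind of rule, which makes for a compact, platform-independent illustration (a sketch in Python rather than C#; `join_unrooted` is a hypothetical guard, not a standard function).

```python
import ntpath
import posixpath

# A later rooted segment silently discards everything before it.
posix_joined = posixpath.join("data", "/etc/passwd")    # "/etc/passwd"
windows_joined = ntpath.join("C:\\data", "D:\\other")   # "D:\\other"


def join_unrooted(base: str, *segments: str) -> str:
    """Hypothetical guard: reject rooted segments instead of letting them win."""
    for seg in segments:
        if posixpath.isabs(seg):
            raise ValueError(f"rooted segment not allowed: {seg!r}")
    return posixpath.join(base, *segments)


ok = join_unrooted("data", "results.csv")               # "data/results.csv"
```

When any segment comes from untrusted input, this is the difference between reading under `data/` and reading an arbitrary absolute path, which is why the rule is worth calling out as a tradeoff.
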
#### Limitations

* **No first-class path value type in the BCL:** path semantics remain "strings with conventions", despite a rich helper API.
* **No first-class filesystem-context object in `System.IO`:** there is no standard `FileSystem` value you pass around to select a backend or restrict authority; alternate filesystem views are provided via separate subsystems (e.g. `IFileProvider`) rather than integrated into the core path+fs model.

### FilePathsBase.jl

> https://github.com/rofinn/FilePathsBase.jl
> https://rofinn.github.io/FilePathsBase.jl/stable

FilePathsBase.jl is a Julia ecosystem attempt at a first-class path vocabulary type (`AbstractPath`) with platform-aware concrete path types (e.g. POSIX/Windows) and broad integration with Julia’s existing filesystem functions via method extensions. It treats paths as structured values (not strings), while still ultimately operating against the ambient host filesystem for the default "system path" types. It gets a lot of things right, but suffers from a few fatal flaws (such as type instability).

```mermaid
flowchart TB
  subgraph FPB["FilePathsBase.jl"]
    AP["AbstractPath"]
    SysP["SystemPath"]
    Posix["PosixPath <: SystemPath"]
    Windows["WindowsPath <: SystemPath"]

    AP --> SysP
    SysP --> Posix
    SysP --> Windows

    Str["String"]
    Str -->|construct| AP
    AP -->|stringify| Str
  end
```

#### Sample usage

```julia
p = p"data" / "results.csv"   # path literal + /-join

write(p, "hello\n")           # effectful filesystem use via methods/extensions
txt = read(p, String)

ext = extension(p)            # "csv"
dir = parent(p)               # named `dir` so the `parent` function is not shadowed

here = @__PATH__              # path-valued analogue of @__FILE__
```

#### `AbstractPath` as a vocabulary type (not `AbstractString`)

A central design choice is making paths distinct from strings, rather than a string subtype. This forces explicit conversions at boundaries and reduces accidental misuse (e.g. treating text as a filesystem name or vice-versa). The package also preserves `*` for string concatenation and uses `/` for path joining.

#### Platform-specific path types with a "system path" default

FilePathsBase provides separate path types for differing platform semantics (POSIX vs Windows) and a "system" default intended to match the running platform, analogous to other ecosystems’ platform-dispatched path aliases.

#### Structured representation and a "path interface"

Paths are modelled as structured values (conceptually segments plus platform rules). The docs define an interface for implementing new `AbstractPath` subtypes, including operations to access components and to support common filesystem behaviours. This interface is intended to allow alternate path kinds beyond the local OS, even if the default types map to the ambient host filesystem.

#### Integration via method extension over `Base.Filesystem`

Rather than introducing a new filesystem context object, the package primarily integrates by extending existing `Base.Filesystem`-style operations to accept `AbstractPath`. This makes the API feel "native" when it’s in scope, but relies on broad method coverage and consistent adoption.

#### Tradeoffs

- **Ergonomics vs. "ambient by default":** the default system path types implicitly target the process-visible host filesystem, which is convenient but keeps the core model tied to ambient context.
- **Operator overloading vs. local surprises:** using `/` for joins is ergonomic, but it must coexist with division and with user expectations about what operators mean in Julia codebases.
- **External package vs. ecosystem coherence:** because this lives outside Base, interoperability depends on adoption and on the completeness of method extensions across Base/stdlibs and third-party packages.

#### Limitations

- **Not a Base-level abstraction:** it cannot enforce a ubiquitous "paths are paths" vocabulary across the ecosystem, so friction remains at API boundaries (conversion, missing overloads, mixed conventions).
- **No explicit capability or filesystem context object:** authority and context remain largely implicit (host filesystem, process CWD), rather than being represented as explicit handles/capabilities.
- **Type instability:** using `Tuple{Vararg{String}}` for segments makes access/modification `O(1)`, but makes any operation that requires the full path `O(depth)`, which isn't great.

### FilePaths2.jl

> https://gitlab.com/ExpandingMan/FilePaths2.jl

An experimental rethink of typed paths in Julia from 2023, motivated by making path objects usable for non-local backends (notably S3) without implicitly triggering expensive "update" operations. It treats every path as a tree node (via the AbstractTrees interface) and imposes strict rules on what is inferable from path strings and when remote calls are allowed.

```mermaid
flowchart TB
  subgraph FP2["FilePaths2.jl (conceptual core)"]
    AP2["AbstractPath\n(AbstractTrees node)"]
    Trees["AbstractTrees.jl interface\n(children, parent, ...)"]
    PS["PathSpec\n(absolute path string wrapper)"]

    AP2 -.->|implements| Trees
    AP2 -->|typically backed by| PS

    EffectBoundary["Only certain ops may be effectful\n(children, readdir, walkpath,\nispath/isfile/isdir, ...)"]
    AP2 --> EffectBoundary
  end
```

#### All paths are tree nodes

All path types are required to implement the `AbstractTrees.jl` interface, so that generic traversal utilities (e.g. `walkpath`) can be expressed in terms of standard tree algorithms rather than re-implemented per path type. This frames "a path is a node in a namespace tree" as a semantic guarantee, not just a metaphor.

:::info
This is complicated by symlinks, which are not addressed in the current design of FilePaths2.jl.
:::

#### Strict semantics about what can be inferred from strings

FilePaths2 leans on the fact that "complete" paths can be described by strings from a root, and uses a `PathSpec` wrapper to support purely-parsed operations like determining parents and testing ancestry/antecedence relationships.
A key consequence is that path strings used to construct path objects must be *absolute*, or must refer to an existing resource so the absolute path can be inferred.

#### Disallowing relative paths as a semantic constraint

Relative paths are treated as a "pun": they depend on ambient global state (e.g. a current directory) which may be ill-defined or nonsensical for remote backends, and they weaken what can be inferred (e.g. parent relationships). Relative forms may still exist as *constructors* or *views*, but the path object model aims to be absolute/anchored.

#### Explicit control of remote calls

To avoid accidental network traffic and to make the "reasoning footprint" of operations clear, only a small set of functions are permitted to perform remote calls, and these must accept an `update` keyword (though a call with `update=false` can still error when the object lacks enough information). The README lists: `AbstractTrees.children`, `readdir`, `walkpath`, `ispath`, `isfile`, `isdir` (with an explicit note that the set may be incomplete).

#### Treating key-value stores as trees by construction

For S3-like systems (key-value stores that "masquerade" as filesystems), the design asks: "what tree can be constructed from only what the API provides?". The proof-of-concept infers tree structure solely from key strings, which yields meaningful definitions for questions like `isdir` (e.g. "is this a strict ancestor of some leaf?") while acknowledging constraints (e.g. no truly empty directories).

#### Tradeoffs

* **Stronger invariants vs. friction:** requiring absolute/anchored paths and restricting inference rules improves semantic clarity (especially for remote backends), but rejects common local patterns (relative paths) and may require more explicit anchoring in user code.
* **Cost transparency vs. ergonomic opacity:** concentrating remote calls in a small set of "update-permitted" functions makes costs auditable, but can surprise users when seemingly simple queries error unless `update=true` is provided.
* **Tree-first abstraction vs. backend mismatch:** forcing a tree model onto S3 enables generic tooling, but necessarily bakes in approximations (e.g. "directories" inferred from key prefixes) that are not native to the store.

#### Limitations

* **Incomplete:** the project is a prototype, not a complete alternative to FilePathsBase.jl.

## Virtual Filesystems

### Java's NIO2

### Python's fsspec

### Go's io/fs

### Go's afero

### Plan 9 (9P)

### WASI