Jakob Degen
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Make a copy
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Make a copy Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    # Do we need provenance? ### What is provenance? Provenance is the umbrella term for any state that a pointer holds in addition to the address it points to. In other words, a memory model has a concept of provenance if it is possible for two pointers to be numerically equal, but different in some other dimension. I will briefly give two examples of what provenance can look like: - C and LLVM both have "allocation level provenance." This means that every pointer "remembers" which allocation it belongs to, and using a pointer that originated in one allocation to access memory in another is undefined behavior. - [Stacked Borrows](https://plv.mpi-sws.org/rustbelt/stacked-borrows/paper.pdf) is a proposed aliasing model for Rust. The provenance it suggests consists of two parts: An **allocation id**, used to implement the same "allocation level provenance" as described above, and a **tag**, which describes what part of the borrow stack a pointer belongs to. To emphasize: It is not the goal of this meeting to discuss which kind of provenance we should have, only to discuss whether we should have *any*. ### Reasons not to have provenance The biggest downside of provenance is complexity. The existence of provenance means that authors of unsafe code must always not only be concerned with whether the pointer they have points to the right place, but also whether it has the right provenance (in practice, this means "was obtained the right way"). Not having provenance ensures that this is never a problem - all pointers that point to the right address are equally valid to use. ### Reasons to have provenance #### Optimizations Many (most?) optimizations done by compilers require some form of *alias analysis*. This is an analysis that reports when two memory operations might alias each other. Alias analysis benefits greatly from notions of provenance since this generally means there is more UB and more information with which to justify optimizations. For example, without provenance, the following program must be DB as the two marked lines are entirely equivalent. This means it is unsound to optimize to `print(0)`. However, with allocation level provenance it is possible to call this program UB (i.e., it is possible to declare that `q` and `p+1` do not alias) without the commented line being UB, and so the optimization can be permitted. ```c= char p[1], q[1] = {0}; uintptr_t ip = (uintptr_t)(p+1); uintptr_t iq = (uintptr_t)q; if (iq == ip) { *(p+1) = 10; // <-- This line // *q = 10; // <-- And this line print(q[0]); // can be optimized only with provenance } ``` Similarly, it has long been desirable for it to be sound to optimize code like this: ```rust= fn foo(x: &mut i32) -> i32 { *x = 10; bar(); *x } ``` It's very difficult to see how to make this optimization sound without provenance. Ralf had [attempted](https://www.ralfj.de/blog/2017/07/17/types-as-contracts.html) such a model in the past, but it was unsuccessful in a number of ways. #### LLVM LLVM IR (despite its lack of a clear spec) recognizes a notion of allocation level provenance. Compiling Rust to LLVM IR if Rust does not recognize provenance is likely to be impossible. We'd probably have to insert a `black_box` after every allocation and every memory access, and it's not clear that that is enough. As far as I know there is no option to turn this off, and the assumptions are sufficiently widespread that it is unlikely that we could convince upstream to add one. --- ### Appendix: Stabilizing parts of the strict provenance APIs This topic initially came up because Ralf suggested [on Zulip](https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Stabilizing.20strict.20provenance.20APIs.3F) that parts of the strict provenance APIs be stabilized. This lead to some confusion, and so I want to go over here exactly what is being proposed, and the reasoning for it. The proposal consists of two parts: - Stably acknowledge the existence of provenance, ie acknowledge that the answer to the question in the title is "yes" - Stabilize the following methods: `pointer::addr`, `pointer::with_addr`, `ptr::map_addr`, `core::ptr::invalid`, `core::ptr::invalid_mut` :::spoiler Brief description of what they do - `addr`: Yields the address of the pointer as a `usize`. There is no expectation that the address can be turned back into a usable pointer via an `as` cast. - `with_addr`: `p.with_addr(a)` yields a pointer that has the same provenance as `p` but the address `a` - `map_addr`: Literally ```rust= fn map_addr(self, f: impl FnOnce(usize) -> usize) -> Self { self.with_addr(f(self.addr())) } ``` - `invalid`, `invalid_mut`: Yield a (possibly `mut`) pointer that has the given address, but some kind of null/None provenance and so is illegal to dereference. ::: That is the entirety of the proposal. We are not suggesting any change to the memory model that would make these APIs required to write correct code. There are also other methods under `#![feature(strict_provenance)]`. We are not suggesting stabilizing those at this time. Some of those methods imply a PNVI-like treatment of ptr2int and int2ptr casts. (PNVI being an explicit and detailed provenance proposal for C.) We are also not suggesting stabilizing those concepts at this time. The "main goal" of the these APIs is to allow users to operate on the provenance and address of a pointer separately. This in turn allows users to be precise about how they handle provenance, which makes the meaning of their code clearer both to humans and to the compiler, which will have more opportunities to optimize. It is Jakob's (and others?) opinion that these APIs represent functionality that is fundamentally necessary in a language with provenance. The `expose*` methods and the associated semantics are not an alternative, but rather a possible addition. (Of course, that does not necessarily mean that this particular combination of functions is the right way to expose this functionality, we are still free to bikeshed that :) ) --- ## Discussion ### invalid/invalid_mut take a parameter Yes. It just constructs a pointer with the given address that you *cannot use* Use cases: - Making pointer to zero-sized types (which you cannot use) without making null serve that purpose (because you want to preserve its usage as a niche) - If you're interacting with some API (via FFI or otherwise) that accepts a pointer but that pointer does not need to actually access memory - Q from Felix: How would you know if the FFI requires provenance-tagged pointers? (I guess API of foreign function will promise to not dereference it) - Jakob: Yes. - Gary: Can we define a ZST deref to pointers with `invalid` provenance as UB and have a special provenance for ZSTs that allow dangling use of it in `Vec`? - Ralf says defer this for future meetings. - Q: Can this be used with embedded when I know there is some memory at a fixed address? - Jakob: Probably not. We don't have a good story for these kinds of allocations yet, but I would be surprised if this was the case - Gary: I don't think so, you would have to use `from_exposed_addr` for that. - Felix: maybe would need some notion of `volatile` in order to have the original question's proposal make sense, in terms of stopping compiler from optimizing things in way that interfere with such accesses? (Or just use `from_exposed_addr` as suggested.) ## Questions ### LLVM workarounds Scott: Black box "every allocation" would need to do this for `alloca`s too, right? Not just `__rust_alloc`? Ralf: Yes. Heap and stack allocations are not fundamentally different. These next two are Scott attempting to summarize after the fact: > Scott: ok, so even if we had a way to do this in LLVM, it would be completely unacceptable because it would mean nothing goes into registers ever and performance would be terrible > > Ralf: Adapting LLVM is almost certainly not going to work, yes, so it would need a compiler build to not need provenance Josh: Wouldn't completely rule out use of registers, but it'd require compiler backend support that doesn't exist and nobody is working on or has plans to work on. ### LLVM without provenance Felix: are there languages (with references) with LLVM backends that do not support provenance? (I'm imagining gymnastics one could do, e.g. massive-array for memory + every pointer is an integer, but there's not much point in going down that road unless there's some existing evidence that it can ever work out well.) Jakob: Hard to answer in the negative, but I don't know one. ~~As a reference example, Go uses LLVM as well, but (based on my limited understanding) its rules are "what LLVM does."~~ (Edit: wrong) ### Can we use aliasing rules instead of provenance Gary: Can we use aliasing rules to cover many optimisations without having provenance? Discussion: our goal is to develop a model that would justify code that is leveraging Rust's current rules e.g. `&mut`; but those syntactic rules alone are not sufficient to be able to e.g. justify the code that exists in the Rust stdlib itself. See also Ralf's earlier model linked above (https://www.ralfj.de/blog/2017/07/17/types-as-contracts.html) ### Does comparison between pointers take provenance into account? Ralf: No, it doesn't; comparison ignores provenance. ### (Defer until after "should we have provenance" is answered): Should we have `expose` or not? Josh: Yes, we should. Seems necessary for widespread adoption of provenance. Gary: We still have `as` which needs to be using the `expose` for compatibility reasons, right? Scott: Yes, I think we need it, because <https://github.com/rust-lang/rust/pull/95583> says to me that if we don't have the method to compare against, people will start using `.addr()` in a place where they needed to expose. ### How useful is `invalid` and `invalid_mut`? Gary: Specifically, what optimisation would we gain from them compared to just use `from_exposed_addr()`? Connor: Because the compiler knows the pointer cannot be dereferenced, it can assume the pointer does not alias any other pointer. Ralf: Also the entire point of these APIs is not to use `exposed`/`as`. ### What other arguments are there against provenance? y86-dev: I have the feeling that the arguments are highly in favor of provenance. Are there any other arguments other than increased complexity? Josh: "doesn't match the mental model many programmers have", but that still doesn't change how compilers work. :( ### What are we telling people? nikomatsakis: I'm catching up here but I'm trying to understand what we would be telling folks if we were to stabilize these APIs. The text above explicitly says they're not needed to write correct code. But there is *something* we're deciding here, I'd like to see it written down. Is it... > The final model is not determined, but if you can use these APIs as described, your code is free of UB (due to aliasing restrictions) ? Jakob: I think we are telling people 1) provenance exists and your code might be wrong if it pretends it doesn't, and 2) these APIs exist and they do what they say they do points raised during call: * these APIs protect you w/r/t pointer-to-integer conversions, but it'd still be possible to violate `&mut` or other aliasing restrictions One thing I would make sure is clear about for if and when the wider audience sees this announcment is that this affects unsafe code, and won't affect safe. Josh: Is expose well-defined at this point, can people use that without UB? ralfjung: it has a spec, but it's more questionable, uses angelic non-determinism. If you use integer-to-pointer casts to access memory that doesn't alias Rust data, would definitely be ok (e.g., a magic number outside domain of Rust you'd like to use a a pointer). ### should we have expose at same time? scottmcm: people will try to do this anyway, using addr if they have it, as you can see from the /from-bits/to-bits/ discussion jakob: as casts are probably a better way for people to write this, we have to deal with those anyway (--interpred, ed.) joshtriplett: I think we should not steer people towards new uses of as-casts, if we're going to give guidance, let's steer them to something like expose ralf: can't replace all the code that uses `as` with these APIs, we should encourage folks to use those subset when they can joshtriplett: saying let's not have people switch *to* `as` casts jakob: but stabilizing expose feels premature, still concerned about possible effects of angelic nondeterminism

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully