# volatile 2.0
Attendees: Ralf Jung, Amanieu, Alexandru Radovici, Crystal Durham (CAD97), ...
Remote attendee: Demi Obenour
Notetaker: Gary Guo
## Use cases
Ralf: let's collect use cases first
Amanieu: 3 ways people use volatile:
* Atomic variable, the way C people used it before atomics existed -> should just use atomics instead.
* Memory-mapped I/O, where the bus requires accesses of a specific size.
    * Ariel: need synchronization to ensure DMA correctness.
    * Gary: DMA needs a special fence.
    * Boqun: I/O also has a special fence.
* Two processes that don't trust each other share a memory-mapped file. Write data and set a ready flag (which is just an atomic). But since you don't trust the other side, the receiver cannot guarantee that a race will not happen on the other side.
    * Boqun: debuggers can also poke memory while the process is running.
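The MMIO use case above is the classic one; a minimal sketch of how it looks with today's `ptr::read_volatile`/`ptr::write_volatile` (the `Reg` wrapper and register layout are made up for illustration):

```rust
use core::ptr;

/// Hypothetical wrapper around one memory-mapped 32-bit register.
pub struct Reg(pub *mut u32);

impl Reg {
    /// Volatile read: the access must actually be emitted and may not be
    /// merged with or eliminated like a normal load.
    pub fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.0) }
    }

    /// Volatile write: likewise guaranteed to be emitted.
    pub fn write(&self, v: u32) {
        unsafe { ptr::write_volatile(self.0, v) }
    }
}
```

In real driver code the pointer would come from a fixed device address; here any valid `*mut u32` works.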
Amanieu: if we can cover the last case then it should cover most cases.
Ralf: for AM purposes...
Alice: for DMA, writing to memory that the hardware has access to is similar to just a normal write.
Amanieu: we don't care what the other process is doing; it's fine as long as this part is defined.
Ralf: the concurrency memory model doesn't have a notion of UB in only part of the system. UB and data races are global.
Ariel: could we decompile the assembly to atomic accesses and define it like that?
Amanieu: at the assembly level, all reads and writes are atomic; there is no data race.
Demi: A process forks itself to...
Ariel: if forked, one side is the AM and the other side can be treated as just assembly.
Bart: don't we need to worry about tearing even for machine writes?
Gary: in C there is also a use case of preventing optimization.
Ralf: not a legit use case.
Ariel: benchmarking is not in the spec anyway.
Demi: at least in cryptographic code, volatile is used to prevent a memset from being removed.
Ralf: if they do this, then it's just weird but not UB. They need to inspect the assembly to ensure it's done correctly. We won't actively try to break it, but there's also no guarantee.
Mario: a hands-off compiler. Similar to inline assembly.
* Ralf: we need actual use cases.
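The cryptographic pattern Demi mentions can be sketched like this; as Ralf notes, this is best-effort and not a hard guarantee, so careful users still inspect the generated assembly:

```rust
use core::ptr;

/// Best-effort "secure memset": each byte is written with a volatile store,
/// so the compiler cannot prove the buffer is dead and delete the wipe.
/// (Sketch only; there is no guarantee, per the discussion above.)
pub fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        unsafe { ptr::write_volatile(b, 0) };
    }
}
```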
Ralf: observable behaviours. Accessing memory is not observable behaviour, so a load can disappear, but a println can't. For MMIO, volatile makes the read/write actually observable. Importantly, atomics are not actually observable.
* Demi: you need a specific instruction to access the MMIO as well. You want to desugar to inline assembly.
Alice: I have another use case. Seqlocks.
* Ralf: it needs atomics.
* Alice: but it needs byte-wise ones.
* Amanieu: that's similar to the third use case.
* Alice: but with a seqlock you can actually get concurrent writes.
* Amanieu: we have an RFC for per-byte atomics. (https://github.com/rust-lang/rfcs/pull/3301)
* Ralf: for seqlocks you don't want this to leave the AM, so volatile sounds like the wrong solution.
* CAD97: does this solve the DMA case, since the buffer-filling part can use per-byte atomics?
* Demi: are per-byte atomics efficient? People would expect word-level accesses for performance.
* Ralf: a per-byte memcpy doesn't actually have to do it per byte. It would be a perf bug if it's not as fast as the normal memcpy.
* Demi: a normal memcpy is allowed to read twice, with UB if the read bytes differ, but it would be stupid to actually do that.
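For reference, a minimal seqlock sketch for Alice's use case. The racy data read is what RFC 3301's per-byte atomic memcpy would make well-defined; here, as a stand-in, the data is a single `AtomicU32` accessed with `Relaxed`:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Minimal seqlock sketch: a version counter plus the protected data.
/// Odd counter = write in progress; readers retry until they see the same
/// even counter before and after reading the data.
pub struct SeqLock {
    seq: AtomicU32,
    data: AtomicU32, // stand-in for a per-byte-atomic copy of a larger buffer
}

impl SeqLock {
    pub fn new(v: u32) -> Self {
        SeqLock { seq: AtomicU32::new(0), data: AtomicU32::new(v) }
    }

    pub fn write(&self, v: u32) {
        self.seq.fetch_add(1, Ordering::Release); // counter becomes odd
        self.data.store(v, Ordering::Relaxed);
        self.seq.fetch_add(1, Ordering::Release); // counter even again
    }

    pub fn read(&self) -> u32 {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            let v = self.data.load(Ordering::Relaxed);
            let s2 = self.seq.load(Ordering::Acquire);
            if s1 == s2 && s1 % 2 == 0 {
                return v; // no writer interfered during the read
            }
        }
    }
}
```

This is a sketch of the access pattern, not a production seqlock; the exact orderings needed are part of what the per-byte-atomics discussion is about.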
Bart: can we have a crate that just does inline asm, and once we have a proper solution we can change that crate?
Ralf: volatiles are already widely used. They also have the weird behaviour of allowing arbitrarily large types.
Amanieu: can we have an API for volatiles similar to atomics, restricted to certain types only?
Ralf: we can't remove the existing APIs. Counterproposal: make today's volatile API do what people expect for small accesses.
Alice: ARM uses a wrong instruction. [Please insert background reading]
* Nikita: an ARM bug that needs to be fixed in LLVM.
* Alice: for MMIO, we should fix volatile to use the correct instruction.
Amanieu: volatile is not only about the right size. A `(u16, u16)` volatile access would cause two accesses to be generated.
Ralf: we can guarantee it for certain types.
Nikita: can we make it work for these types and make accesses non-volatile for other types?
Ralf: that would probably break too much code.
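Amanieu's point in code: a volatile access of a compound type carries no single-access guarantee, while a register-sized integer is emitted as one load by backends today (turning that into a stated guarantee is exactly what's being discussed):

```rust
use core::ptr;

/// Likely lowered to two (or more) loads: no tearing guarantee at all.
pub fn read_pair(p: *const (u16, u16)) -> (u16, u16) {
    unsafe { ptr::read_volatile(p) }
}

/// A register-sized integer is emitted as a single load by backends today;
/// the meeting discusses making that an actual guarantee for small types.
pub fn read_word(p: *const u32) -> u32 {
    unsafe { ptr::read_volatile(p) }
}
```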
Bee: reading a volatile struct with padding: the padding doesn't get read.
* Ralf: a volatile read returns a value, so whatever was read from the padding is lost anyway because there's a typed copy afterwards.
* Nikita: in LLVM, if we lower the access with an explicit struct field in place of the padding then it will be read, but otherwise it won't.
* Ralf: if you want to read everything, you can read an array of u8.
* Trevor: bytemuck/zerocopy have a trait to ensure a struct has no padding.
* Ralf: we shouldn't spec anything about the lowering of Rust to LLVM.
Ralf: for the use case of message passing with hardware: the data writes are not volatile, so they are not observable; only the flag write is observable.
Jakob: this is correctly modelled if you consider the hardware to be another thread with a sufficiently strong release.
(discussion about release fences)
Demi: there's an AM useful for formal reasoning, and then there's a model used by programmers who instead reason via the lowering...
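Jakob's "hardware as another thread" model can be sketched as follows (the function and its signature are illustrative; a real driver would additionally need the architecture's I/O fence, e.g. via inline asm):

```rust
use core::ptr;
use std::sync::atomic::{fence, Ordering};

/// Fill a DMA-style buffer with plain writes, then publish it to the device
/// via a release fence plus a volatile store to a "ready" register.
///
/// Safety: `buf` must be valid for `len` writes and `ready_reg` for one write.
pub unsafe fn publish(buf: *mut u8, len: usize, ready_reg: *mut u32) {
    for i in 0..len {
        // Plain (non-volatile) writes: not observable on their own.
        *buf.add(i) = i as u8;
    }
    // Order the buffer writes before the flag write. If the device is
    // modelled as another thread doing an acquire read of the flag, this
    // release fence is what makes the buffer contents visible to it.
    fence(Ordering::Release);
    // The flag write is volatile: it is the observable event.
    ptr::write_volatile(ready_reg, 1);
}
```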
Gary: can we just say volatile accesses are ordered with fences?
* Ralf: can we make all volatiles atomic?
CAD97: can we tie this into the same table that gives the guarantees for atomics?
Ralf: u64 store on u32
Jakob: what's the case for making all volatiles atomic?
Gary: atomics are not observable. The proposal is making volatiles per-byte atomics in the atomic memory model *and* giving them the observable behaviour of guaranteeing non-torn accesses (for sufficiently small integers).
Boqun: for MMIO the barrier has to be special. Can we have a special barrier with a new guarantee?
Ralf: but a barrier sits between things that can't observe each other. Instead of doubling the model that we have, we should rather define it with something that we currently have.
Nikita: what memory ordering is intended for volatile atomics?
* Ralf: so far we don't have any ordering, so it would just be relaxed.
* Nikita: would it be feasible to change LLVM to define volatile to always be atomic relaxed?
* (some LLVM impl discussion)
* Progress guarantees
Demi: if there's a Rust impl where addresses are not the real addresses, then we need to ensure volatile emits accesses to the "real address".
Alexandru: MMIO that activates the cache would also need ordering between the write and any code running after it.
Alice: for inline asm fences, we can say that in the AM it's a release fence, but would it be as strong as a release fence?
Ralf: a fence strong enough for MMIO should also be good for normal memory?
Gary: RISC-V has separate fences for I/O and normal memory (a fence can also fence both).
* Ralf: weird, but it'll probably be fine if we rationalize them as compiler fences instead?
Bee: the compiler can't reason about inline asm, so it wouldn't optimize based on this being a release fence.
* Ralf: but if we axiomatize it as a release fence then a programmer can get rid of it?