This proposal aims to define the memory model of Julia and to provide certain guarantees in the presence of data races, both by default and through intrinsics that allow the user to specify the level of guarantees required. This should allow native implementation in Julia of simple system primitives (like mutexes), interoperation with native system code, and generally explainable behaviors, without incurring significant performance cost. Additionally, it strives to be general-purpose and yet clear about the user's intent, particularly with respect to ensuring that an atomic-type field is accessed with proper care for synchronization.
The last two points deserve particular attention, as Julia has always provided strong reflection and generic programming capabilities that have not been seen, in this combination, in any other language. Therefore, we want to be careful to observe the asymmetry between reading and writing, a distinction that we feel is often not given enough attention.
In Julia today, multi-threaded code can only be written to perform limited tasks, or in ignorance of how weak the guarantees provided by our current semantics really are. Additionally, some necessary system primitives can only be implemented by inlining a complicated llvmcall, which can be brittle and limiting. This proposal aims to change that by adopting saner defaults for shared variables and by exposing new functionality in the builtin functions, adding a specification of memory synchronization ordering to them.
One such aspect is guaranteeing that any data written in the struct initializer is always visible wherever the struct pointer can be seen, as is already the guarantee for single-threaded execution. This is particularly relevant for ensuring that the type-tag of the object is always valid and that the fields of immutable objects (or simply constant fields) are always seen to have their correct value. This implies that a release barrier must exist between the end of the constructor and the first time the object is stored to a shared location from a given thread. This has some subtleties to explore later when we discuss the new defaults.
A related aspect is ensuring that we never invent pointer reads or writes "out-of-thin-air." This implies using LLVM's "unordered" memory model (the minimum atomic level) for any read or write to a field that may contain or be a pointer. This is sufficient to prevent torn reads and writes (which is likely what we'd expect from the optimizer anyways), generally without preventing any other optimizations that could have applied.
In addition to those basic guarantees that references can't be broken (which could potentially crash the garbage collector, as well as could cause more prosaic corruption of the runtime), users need the ability to pass data between threads. In most cases, they are best served by using a lock. But in some cases (such as the implementation of aforementioned lock), it's necessary to give defined behavior to concurrent mutations of a particular field of an object.
Firstly, in defining this work, it's necessary to specify our memory model for concurrent access. We propose that we can't do better than adopting the current C/C++ standard as our primary inspiration. However, we also would like to adopt some aspects of the Java standard that suit our requirements; several of these are noted throughout this proposal.
We currently have two basic classes of memory operations (read and write) that can be done on three different object types (raw pointers, object fields, and array elements):
Intrinsics.pointerref / Intrinsics.pointerset (and unsafe_load / unsafe_store!)
Core.getfield / Core.setfield! / Core.isdefined (and getproperty / setproperty!)
Core.arrayref / Core.arrayset / isassigned (and getindex / setindex!)
This proposal aims to add a new trailing argument to each of these operations that gives the memory ordering for the operation. Additionally, we add a trio of new functions for each of these sets, from which all other behaviors can be variously expressed: swap, modify, and compare-and-swap.
These memory orderings can be any of the following symbols. More precise definitions can be found elsewhere (in particular, at https://llvm.org/docs/Atomics.html#atomic-orderings), so here we give only the particulars of how they will be mapped inside the Julia runtime.
not_atomic: single-threaded usage only, or inside a lock. This is the default behavior for most operations. It cannot be used to downgrade an atomic write, but simply names the default parameterization for a non-atomic access. When applied to an atomic field, it will be automatically upgraded to a monotonic read.
monotonic: ensures atomicity of the operation, but does not ensure that the code makes forward progress (do not use in a loop); it only ensures that the operations are seen to make progress monotonically within a thread.
acquire: an error to specify on a store.
release: an error to specify on a load.
acquire_release: an error to specify on an operation that doesn't both read and write memory (so it's an error to specify on a load or store, and the release will be ignored for a replacefield! failure).
sequentially_consistent: the default behavior for @atomic when the ordering is not specified.
The specification and constraints on how these apply to individual operations are particularly nicely described here: https://doc.rust-lang.org/std/sync/atomic/struct.AtomicI8.html
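For illustration, here is a sketch of how a release store pairs with an acquire load to publish data between tasks, written with the proposed trailing ordering arguments. The Mailbox type is hypothetical, and it uses the @atomic field declaration syntax described later in this proposal:

mutable struct Mailbox
    data::Int            # plain field, published by the flag below
    @atomic ready::Bool
    Mailbox() = new(0, false)
end

function produce!(m::Mailbox)
    setfield!(m, :data, 42)
    setfield!(m, :ready, true, :release)   # release store: makes the data store above visible
end

function consume(m::Mailbox)
    while !getfield(m, :ready, :acquire)   # acquire load: pairs with the release store
        GC.safepoint()
    end
    return getfield(m, :data)              # guaranteed to observe 42
end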
We also propose having a higher level API behind an @atomic
macro that can translate common expressions to call their atomic counterparts.
This includes some details on using arrays atomically, but completing our array specification will be done later, and will be adapted to inherit details from this field design, though not all constraints and features may turn out to be feasible. Presently, we recommend continuing to use locks and other barriers (such as channels and spawn/wait) to bulk-synchronize accesses, or choosing data structures better suited to handling conflicting modifications of the same memory address.
We feel it is valuable to make basic promises about memory safety, extending those that the GC already promises for single-threaded code. Particularly in a well-designed and general-purpose language like Julia, these design decisions must be taken as a whole. For example, we consider it a necessity to be able to write generic code over the contents of a struct via reflection. While we permit this code to exhibit some aspects of undefined behavior, we do not want it to violate those essential promises. This means that certain operations (for example, enumeration of the globals in a module) should exist and should be free from undefined behavior.
This property generally has little runtime cost, but omitting it might impose a high mental cost on the user or complicate generic code. This is much like a garbage collector: while it is theoretically possible to write bug-free code in C, this has proven to be quite difficult. Indeed, measurements show that C can actually perform worse than a GC language because of this insistence that it should be possible to write bad code (the possibility of bad code can greatly impact the correctness of certain compiler transforms that are otherwise essential for performance). While C's weak guarantees motivate and necessitate tools like LLVM's ThreadSanitizer, making this automatic relieves the user of that concern. Note however that since we're following the Java memory model here, any tooling designed for, and useful with, that language would also apply here.
We propose adding an @atomic
macro, that wraps expressions and rewrites them to use atomics. It should handle the following syntax forms:
@atomic x
(@atomic x)::Int
@atomic a.b.x
@atomic :acquire x
@atomic :acquire a.b.x
and
@atomic local x
(@atomic x::Int) = 3
mutable struct RWLock
    @atomic readers::Int
    @atomic writer::Union{Task, Nothing}
    @atomic data
end
These operate the same as the non-atomic variants of each expression usage and declare the existence of a shared memory location (global variable, local variable, captured variable, or field declaration) that can be mutated from multiple threads. If used in statement position, it would also be a load of that atomic value.
For the first three forms, these macros expand to Expr(:atomic, expr)
, and lower to the appropriate metadata for a declaration. For a load, it additionally emits a normal call, such as getproperty(a, :x, :sequentially_consistent)
(which in turn calls getfield(a, :x, :sequentially_consistent)
).
@atomic x = y
@atomic a.b.x = y
@atomic :release x = y
@atomic :release a.b.x = y
This mutates the location x
to contain y
, where x
has been declared @atomic
, and y
is any expression. For example, it could itself be an atomic load from something else, such as @atomic y
.
These macros expand to a builtin call, such as setproperty!(a.b, :x, y, :sequentially_consistent)
(which in turn would dispatch to setfield!(a.b, :x, y, :sequentially_consistent)
, for example).
These loads and stores default to providing sequential-consistency semantics (this is equivalent to volatile
in Java). This can be refined by specifying an argument that explicitly gives the ordering for the operation. For example, @atomic :release x = y
for an atomic release store (such as would appear inside a lock's unlock
method).
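As a sketch of the lock use case mentioned above, a minimal test-and-set spinlock could be written with these macros (the SpinLock type and lock!/unlock! names are hypothetical, the @atomicswap form used here is introduced below, and a real lock would also need back-off and scheduler integration):

mutable struct SpinLock
    @atomic held::Bool
    SpinLock() = new(false)
end

function lock!(l::SpinLock)
    # swap in `true` until we observe that the previous value was `false`;
    # the acquire ordering pairs with the release store in unlock!
    while @atomicswap :acquire l.held = true
        GC.safepoint()
    end
    return l
end

function unlock!(l::SpinLock)
    @atomic :release l.held = false
    return l
end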
Already mentioned was the new argument to Core.getfield, Core.setfield!, and Core.isdefined. Additionally, we propose defining three additional functions to describe all processor capabilities: Core.modifyfield!, Core.replacefield!, and Core.swapfield!. Through some slight compiler heroics, they can do much more than the basics provided by the processor. They could also be merged into one, but there's little to be gained from that. These will be documented as follows:
"""
getfield(value, name::Symbol, [order::Symbol) getfield(value, i::Int, [order::Symbol)
Extract a field from a composite
value
by name or position. Optionally, an ordering can be defined for the operation. If the field was declared@atomic
, the specification is strongly recommended to be compatible with the stores to that location. Otherwise, if not declared as@atomic
, this parameter must be:not_atomic
if specified.
"""
"""
setfield!(value, name::Symbol, x, [order::Symbol) setfield!(value, i::Int, x, [order::Symbol)
Assign
x
to a named field invalue
of composite type. Thevalue
must be mutable andx
must be a subtype offieldtype(typeof(value), name)
. Additionally, an ordering can be specified for this operation. If the field was declared@atomic
, this specification is mandatory. Otherwise, if not declared as@atomic
, it must be:not_atomic
if specified.
"""
"""
isdefined(m::Module, s::Symbol, [order::Symbol]) isdefined(object, s::Symbol, [order::Symbol]) isdefined(object, index::Int, [order::Symbol])
Tests whether a global variable or object field is defined. The arguments can be a module and a symbol or a composite object and field name (as a symbol) or index. Optionally, an ordering can be defined for the operation. If the field was declared
@atomic
, the specification is strongly recommended to be compatible with the stores to that location. Otherwise, if not declared as@atomic
, this parameter must be:not_atomic
if specified.
"""
"""
Core.swapfield!(value, name::Symbol, x, [order::Symbol) Core.swapfield!(value, i::Int, x, [order::Symbol)
These atomically perform the operations to simultaneously get and set a field:
y = getfield!(value, name) setfield!(value, name, x) return y
"""
"""
Core.modifyfield!(value, name::Symbol, op, x, [order::Symbol]) Core.modifyfield!(value, i::Int, op, x, [order::Symbol])
These atomically perform the operations to get and set a field after applying the function
op
.y = getfield!(value, name) z = op(y, x) setfield!(value, name, z) return y, z
If supported by the hardware (for example, atomic increment), this may be optimized to the appropriate hardware instruction, otherwise it'll use a loop.
"""
"""
Core.replacefield!(value, name::Symbol, expected, desired, [success_order::Symbol, [fail_order::Symbol=success_order]]) => (old, success::Bool)
These atomically perform the operations to get and conditionally set a field to a given value.
y = getfield!(value, name, fail_order) ok = y === expected if ok setfield!(value, name, desired, success_order) end return y, ok
If supported by the hardware, this may be optimized to the appropriate hardware instruction, otherwise it'll use a loop.
The higher-level API will mirror that of the non-atomic structure. In particular, @atomic :acquire a.b
would lower to getproperty(a, :b, :acquire)
instead of getproperty(a, :b)
as it does now. In each case, the order
argument is optional, and defaults to :sequentially_consistent
when not explicitly specified in the macro. Other syntax forms supported by @atomic
include:
y = @atomic a.b.x += 1
y = @atomic :acquire_release a.b.x += 1
This turns into a call y = modifyproperty!(a.b, :x, +, 1, :acquire_release)[2]
x, y = @atomic max(a.b.x, 1)
x, y = @atomic a.b.x max 1
x, y = @atomic :acquire_release max(a.b.x, 1)
x, y = @atomic :acquire_release a.b.x max 1
This turns into a call x, y = modifyproperty!(a.b, :x, max, 1, :acquire_release)
x = @atomicswap a.b.x = 1
x = @atomicswap :acquire_release a.b.x = 1
These turn into a call x = swapproperty!(a.b, :x, 1, :acquire_release)
(note only the modification to x
is atomic here).
The last operation we cover is compare-and-swap. That is also created as a generic function replaceproperty!, and can be written similarly to the replace! function, with a pair giving the expected old value and the desired new value.
old, success = @atomicreplace a.x expected => desired
xchg = expected => desired
old, success = @atomicreplace :acquire_release a.x xchg
old, success = @atomicreplace :acquire_release :monotonic a.x xchg
Similar to their non-atomic counterpart setproperty!
, the swapproperty!
and replaceproperty!
functions call convert(fieldtype(typeof(x), f), v)
on the value to store for convenience. The modifyproperty!
function does not provide this convenience (TODO: should it?).
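For example (with a hypothetical Counter type), the conversion applies only to the value being stored:

mutable struct Counter
    @atomic x::Float64
end

c = Counter(0.0)
swapproperty!(c, :x, 1, :sequentially_consistent)         # the Int 1 is converted to 1.0 before the store
replaceproperty!(c, :x, 1.0, 2, :sequentially_consistent) # the desired value 2 is converted to 2.0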
Some highlights of this design include:
Mismatches between the declared and requested kind of access (atomic vs. non-atomic) cause a ConcurrencyViolationError exception to be thrown.
Inside the constructor (new), the values may be stored without observing the atomic declaration. When first publishing the object to a shared location, the user must decide there what kind of ordering to give to those prior operations (usually either release or sequentially_consistent).
Reads do not require an atomic annotation, which preserves the ability of generic code to reflect over fields (show_any).
Some orderings may be strengthened when the implementation requires it (@atomic :monotonic a.x += 1 should get upgraded to include a release barrier if we decide not to allocate x inline.)
Arbitrary read-modify-write operations are supported through a single generic function (modifyfield!).
The existing access builtins (getfield, arrayref, pointerref, isdefined) each gain arguments for the atomic ordering, which they validate against the declaration requirement. The first two will be added to the same builtin function. For pointers, we'll add new intrinsics, since the behaviors with atomics may be more restricted and divergent than for fields (which we have more control over).
We add new operation classes (replace and modify) which combine aspects of getfield and setfield!. We also will add (swap) for simplicity, though it could be derived from either of the other two.
We add a new compiler representation combining aspects of :invoke with that of each new builtin operation class. Here's how we use it: Given Core.modifyfield!(x, i, +, fv)
in the source code, we could teach inference to generate the optimized code for +
there, but not yet fully inline it there.
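For reference, the generic fallback when no matching hardware instruction exists is a compare-and-swap loop. A minimal sketch in terms of the proposed builtins (modifyfield_fallback! is a hypothetical name, not the actual lowering):

function modifyfield_fallback!(x, name, op, v, order)
    old = getfield(x, name, order)
    while true
        new = op(old, v)
        # retry until the compare-and-swap succeeds; `old` is refreshed on failure
        old, success = Core.replacefield!(x, name, old, new, order)
        success && return old, new
    end
end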
Then during codegen, we could examine the content of +
and decide to emit either that loop above, or the atomicrmw version.
This means it might need some dramatically unique inlining: given Core.modifyfield!(x, i, +, fv)
, we'll transform this into a special representation which includes the funclet as metadata in place of the call-site, similar to invoke
:
Expr(:invoke_modifyfield!, MethodInstance(:+, Int, Int), x, i, +, fv)
While recursive, this seems like just the right amount of information for codegen to be able to inspect the object. But that theory remains to be tested.
However, note that this may not be as dire as it sounds. Some systems, such as Java and even Intel's own icc, have used the CAS loop for many years.
TODO: finish fleshing out details of the new compiler feature needed for this proposal
TODO: where do we allocate the lock, currently it is at the beginning of the struct, but should it be at the end?
At the implementation level, one subtle point to be aware of is that the replace may spuriously fail due to padding bits or other ways of getting an egal object with a different representation. This (and other similar reasons) means that the operator in modifyfield! might get called multiple times. It also implies that some seemingly trivial cases (such as a field typed as Any) actually require a loop around the replace, until the representation converges to a stable value, or until the old value is not egal to the expected value.
It is worth mentioning also that floating point numbers (Float64, Float32, and Float16) are not treated specially in this context (particularly for replace): we operate on their representation bits, as in egal testing, and do not use floating point comparisons.
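For example (with a hypothetical wrapper type F), -0.0 and 0.0 are equal under ==, but they have different representation bits, so a replace expecting one will not match the other:

mutable struct F
    @atomic x::Float64
end

f = F(-0.0)
-0.0 == 0.0                                                     # true under floating point comparison
Core.replacefield!(f, :x, 0.0, 1.0, :sequentially_consistent)   # fails: 0.0 and -0.0 differ in their bits
Core.replacefield!(f, :x, -0.0, 1.0, :sequentially_consistent)  # succeeds: identical representation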
Beyond the additions discussed above, similar intrinsic functionality will also be added for Ptr. Unlike their generic brethren, these will be much more limited in capability, however. The Ptr must be suitably aligned (otherwise the behavior is undefined) and the element type must be a power of two in size (otherwise an error will be thrown). Initially, the supported sizes may be limited as well (only 0, 1, 2, 4, and 8 bytes). These operations shall be defined suitably for interoperability with C code operating on the same pointers, per the relevant C standard memory models.
atomic_fence(order::Symbol) => Nothing
atomic_pointerref(p::Ptr{T}, order::Symbol) => old::T
atomic_pointerset(p::Ptr{T}, x::T, order::Symbol) => p::Ptr{T}
atomic_pointerswap(p::Ptr{T}, x::T, order::Symbol) => old::T
atomic_pointermodify(p::Ptr{T}, func, x::T, order::Symbol) => (old::T, new::T)
atomic_pointerreplace(p::Ptr{T}, x::T, expected::T, success_order::Symbol, failure_order::Symbol) => (old::T, success::Bool)
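A sketch of how these might be used to maintain a C-compatible counter (assuming the intrinsics are exposed under the names listed above; memory from Libc.malloc is suitably aligned for an Int):

buf = Libc.malloc(sizeof(Int))
p = convert(Ptr{Int}, buf)
atomic_pointerset(p, 0, :sequentially_consistent)
old, new = atomic_pointermodify(p, +, 1, :sequentially_consistent)  # atomic increment, returns (0, 1)
@assert atomic_pointerref(p, :acquire) == 1
Libc.free(buf)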
The ordering names use _ to separate words, to improve readability of the long ones.
We considered having separate functions such as getproperty and atomic_getproperty and such, but eventually rejected this idea and merged them into the same generic function. While the latter name might have been clearer for searching code for a consistent pattern, we believe static tooling can be better and more general for statically identifying uses. The dynamic checks on field type already should help with flagging any mistakes.
We could have defined getproperty(x, y) = getproperty(x, y, :not_atomic), but currently this directly calls getfield(x, y). This should infer slightly better, and have somewhat better inlining characteristics, despite the implied duplication for places in the code that want to overload this.
We considered using a wrapper type such as SequentiallyConsistent on one of the arguments, probably the index or field name, instead of passing the memory ordering as a separate value. This idea seemed to have considerable merit. It would particularly be handy if wrapper array types, for example, would forward the ordering along with the index through any intermediate types. Eventually, we concluded that it was unclear that this would propagate correctly. This probably also would require frequent unwrapping and rewrapping. In attempting to write sample code, we also observed that this seems likely to be incorrectly handled by existing code: many wrappers would likely need to take locks in order to correctly synchronize their operations. Indeed, this complexity is also one of the reasons that array access isn't covered in this design proposal.
We require @atomic on both the field declaration and writes. Not requiring it on field declarations would severely limit the flexibility or performance, since there are alignment and padding requirements. Not requiring it on writes would make it easy to accidentally write a.x += 1 instead of @atomic a.x += 1, and get the wrong behavior.
Requiring @atomic for field reads would inhibit generic programming and reflection. The risk of mistakes here is thought to be very low, since there must be a write for there to be a data race, and those require explicit annotation.
Reads with no explicit ordering default to monotonic or not_atomic depending on the field declaration (or sometimes even unordered). We considered strengthening these reads all of the way to sequentially_consistent. However, without a memory write there's much less that we would be synchronizing at the machine level, and without explicit intent, there's probably not even anything we'd be synchronizing at the algorithm level. So we'd be adding potential runtime cost, but without even a clear idea of what value we gained by doing that additional effort. Specifying :not_atomic explicitly is therefore not the same as the default behavior of not specifying an ordering.
modifyfield!:
Providing a single modifyfield! function (through a macro) for all related operations is unusual among programming languages that we surveyed. This could carry some minor risk in the time required to implement this, but we feel this is worthwhile in order to expose the intrinsic API in a predictable and extensible way. Notably, the GPU folks already have additional functions they wish to support beyond LLVM's keyword-based atomicrmw function.
We could have provided a separate function for each operation (such as atomic_nandfield! / @atomic x.a = nand(x.a, y)). Indeed, this is the approach already taken in Base.Threads. However, this felt fundamentally tedious and non-scalable.
We could have passed the operation as a symbol, such as x = @atomicmodify :sequentially_consistent :nand r.a 1 / x = modifyfield!(r, :a, :nand, 1, :sequentially_consistent). However, this seemed to still suffer the same issues as having separate functions, with the additional awkwardness of the operations not even being first-class functions individually.
The astute reader may have noticed there are many facets of using atomics that we have not specified. These are typically left as future work, but we still wish to acknowledge some of them here. Those include:
Atomic access to individual array elements: there's currently no syntax for this in @atomic or any intrinsics for lowering it either (translation: we don't support @atomic a[i] or getindex(:sequentially_consistent, a, i) or arrayref(:sequentially_consistent, a, i)). We expect this is likely to be added at some future time, as a logical feature extension to complete the set of memory operations. However, there are additional complexities that make it seem rare for a user to apply atomic operations to individual entries of an array: the algorithmic complexity of such usage will likely favor wrapper objects that define locks, and the rate of cache misses would also be likely to kill the performance of such an algorithm.
Using replacefield!/setfield! to set an #undef value exactly once. It can be simulated with Union{Nothing,T} in many cases (see the sketch below), but we may want to provide the ability to indicate conditional assignment. This perhaps fits most closely with setfield!'s definition, since there is no old value, but currently there is no way to specify it. Perhaps it could be an ordering? @atomic :if_undef a.x = 1 or @atomic :release_if_undef a.x = 1
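For instance, the Union{Nothing,T} simulation might look like this sketch (Lazy and set_once! are hypothetical names):

mutable struct Lazy
    @atomic value::Union{Nothing, Int}
    Lazy() = new(nothing)
end

function set_once!(l::Lazy, v::Int)
    # succeeds only if no other task has initialized the field yet
    old, ok = @atomicreplace l.value nothing => v
    return ok
end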
icc use CAS loop instead of hardware intrinsics (https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html#fn:java)
Java's getAndUpdate method: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/AtomicInteger.html#getAndUpdate-java.util.function.IntUnaryOperator-
Rust's fetch_update method: https://github.com/rust-lang/rust/issues/48655
Working Group Minutes: Julia Atomics
Possible start of an atomic-annotated array hierarchy
## struct LocalAccessArray{T, N::Int, atomic::Bool} <: AbstractArray{T, N} end
## const AtomicArray{T, N} = LocalAccessArray{T, N, true}
## const Array{T, N} = LocalAccessArray{T, N, false}
## a = AtomicVector{Int}()
# Completely generalized form combining read-modify-write and compare-exchange.
# This is just a curiosity really, but it's a very general purpose operator, and so demonstrates
# what we can provide inside the runtime as the expansion and implementation of the
# new proposed intrinsics.
function modifyfieldif!(f, t, x, i, fv, tv, success_ordering, fail_ordering=success_ordering)
old = Core.getfield(x, i, fail_ordering)
while t(tv, old)
new = f(fv, old)
temp, eq = Core.cmpxchgfield!(x, i, old, new, success_ordering) # (function doesn't exist)
eq && break
old = temp
GC.safepoint() # Temporary solution before we have gc transition support in codegen.
end
return old
end
# implement (atomicrmw add v)
always = (v, old) -> true
modifyfieldif!(+, always, r, :a, v, nothing, :sequentially_consistent, :sequentially_consistent)
# implement (atomicrmw add 1)
modifyfieldif!(
    (v, old) -> 1 + old,
    (v, old) -> true,
    r, :a, v, nothing, :sequentially_consistent, :sequentially_consistent)