# `Array` using a new lower-level container type

From its inception, Julia has had an `Array` type intended to be used any time you need
"a bunch of things together in some order". This is one of the most important and most unusual
types in the language. When `Array` was designed, it was believed (possibly by multiple people,
but Jeff is willing to take the blame) that the easiest-to-use interface would be a
single array type that does most of what users need. Many Julia programmers now feel that
the language and community have outgrown this trade-off, and problems have emerged:
- `Array` has many features, with an unusually large amount of functionality written in C.
  This makes the code harder to maintain and harder for the compiler to optimize.
- `Array` is slower than it should be, since it is implemented with branches
  in C instead of compiler specialization.
- `Array` struct layout is opaque, inhibiting optimizations and understanding. One notable
  example of this is `push!`, which is expected to be fast almost all of the time, but has
  significant overhead even when resizing the array is not needed.
- The complexity of `Array`, even if tolerable in most cases, is a
  significant cost when arrays are used to implement other types like `Dict`.

`Array` has the following "fancy" features beyond the obvious function of storing a series
of elements:
- An external pointer can be wrapped, allowing `Array`s to share data without copying.
  Optionally, we will call `free` on the pointer when the `Array` is garbage collected.
- `Array` can use a `String` as storage to allow zero-copy string initialization.

Here we propose a new lower-level storage type, and to reimplement `Array` in Julia using it.
## `Core.GenericMemory` (also `Core.GenericPtr` and `Core.AddrSpace`)
We propose a new type with the following representation:

```julia
primitive type AddrSpace{From::Module} 8 end
const CPU = bitcast(AddrSpace{Core}, 0)

primitive type GenericPtr{T, AS::AddrSpace} sizeof(Int) end
const Ptr{T} = GenericPtr{T, CPU}

mutable struct GenericMemory{kind::Symbol, T, AS::AddrSpace} <: AbstractVector{T}
    const length::Int
    const ptr::GenericPtr{Cvoid, AS}
    # The following fields are hidden:
    # Either:
    #   elements::T...  # `length` elements stored inline
    # Or:
    #   owner::Any      # owning object of `ptr`
end

const Memory{T} = GenericMemory{:not_atomic, T, CPU}
```
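As a quick illustration of the intended interface (a sketch, assuming a Julia build where the proposed `Memory{T}(undef, n)` constructor is available, as in Julia 1.11+):

```julia
# Allocate an uninitialized fixed-size block of 4 Ints.
mem = Memory{Int}(undef, 4)

# Memory <: AbstractVector, so ordinary indexing works.
for i in 1:length(mem)
    mem[i] = i^2
end

mem[3]        # 9
length(mem)   # always 4: a Memory cannot be resized
```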
The intention of this abstraction is to represent "a range of memory beginning at a fixed
location". The memory space can either be directly after the metadata fields (in which
case the `ptr` field is redundant) or refer to foreign memory via the `ptr` field. In
both cases `ptr` points to the data. When the data is stored inline, we predict no
extra overhead from loading the `ptr` field since it shares a cache line with the data.

`Memory` has constructors similar to those of the `Vector` type. `GenericMemory`
generally does not have any constructors.
### `kind` parameter

The `kind` type parameter specifies the required access style of the data, currently
intended to support either atomic (`:atomic`) or "plain" (`:not_atomic`) access.
The details and implementation for `:atomic` have not yet been written. In brief though,
this is expected to be similar to `@atomic` fields, with implied `:monotonic` order on
`getindex` (unless specified), but explicit ordering required on `setindex!`. The alignment
will be increased to support atomic access as required for the current hardware, and locks
will be added (per element) if necessary.
In the future, there are other possible uses for `kind`, such as `:const`.
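The aliasing pattern makes the role of `kind` concrete; a sketch (the `PlainMemory` name is ours for illustration; only `Memory` is proposed above):

```julia
# `kind` is the first type parameter; an alias fixes it to a concrete
# symbol. `PlainMemory` is a hypothetical name for illustration;
# `Memory` is the alias actually proposed above.
const PlainMemory{T} = Core.GenericMemory{:not_atomic, T, Core.CPU}

PlainMemory{Int} === Memory{Int}  # the same concrete type
```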
### `addrspace` parameter

The `addrspace` parameter is required to be `Core.CPU` for allocations in main memory. It
is present to permit extension by other systems such as GPUs, which might define values
such as:
```julia
module CUDA
    const Generic = bitcast(Core.AddrSpace{CUDA}, 0)
    const Global = bitcast(Core.AddrSpace{CUDA}, 1)
end
```
The exact semantics of these other address spaces are defined by the specific backend, and
attempting to access them on the CPU will throw an error.
The order of the type parameters in `GenericMemory` might be surprising. We did it this way
in part because one might be tempted to write `GenericMemory{T}`, but that would create an
abstract type, which we want to discourage; the `Memory{T}` alias solves this.

To handle very large data, it is useful to expose the ability to expand a region of memory
by extending the OS kernel's mapping. This is supported because one can construct a new
`Memory` struct with a larger `length` which points to the same data as the original
`Memory`.

Growing blocks by moving, or any form of shrinking blocks, is left to higher-level
abstractions to implement.
Note that this type does not and cannot implement `resize!` or related functions like
`push!`, because those have different semantics.
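A higher-level container therefore grows by allocating a fresh `Memory` and copying; a minimal sketch (the `grow` helper name is ours, not part of the proposal):

```julia
# Hypothetical helper: "grow" a Memory by allocating a larger block and
# copying the old contents over; the original block is left unchanged.
function grow(mem::Memory{T}, newlen::Int) where {T}
    newlen >= length(mem) || throw(ArgumentError("newlen too small"))
    newmem = Memory{T}(undef, newlen)
    copyto!(newmem, 1, mem, 1, length(mem))
    return newmem
end
```

This is essentially what `Vector` does on `push!` when it runs out of capacity.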
Several specific interface functions have been proposed here (names are provisional):

- `expand(mem, n)`: Grow the size to `n` elements if possible, otherwise do nothing.
  Returns a new `Memory` or `nothing`.
- `expand_upto(mem, n)`: Grow the size to at most `n` elements, possibly fewer, returning
  the new (or same) `Memory`.
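Neither function is implemented here; to make the intended contract concrete, a naive stand-in that always reallocates (a real `expand` would try to extend the kernel mapping in place, and return `nothing` when that fails):

```julia
# Naive stand-in for the proposed `expand` contract (hypothetical name).
# A real implementation would attempt to extend the OS mapping in place;
# this mock always "succeeds" by reallocating and copying.
function expand_mock(mem::Memory{T}, n::Int) where {T}
    n <= length(mem) && return mem              # already large enough
    newmem = Memory{T}(undef, n)
    copyto!(newmem, 1, mem, 1, length(mem))
    return newmem
end
```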
## `GenericMemoryRef`

```julia
struct GenericMemoryRef{kind::Symbol, T, AS::AddrSpace} <: Ref{T}
    # Note that since this is an immutable struct it will be stored inline in `Array`.
    mem::GenericMemory{kind, T, AS}  # `Memory` holding the data
    # Points to a location within `mem`; either an actual pointer
    # or an offset (if T is zero size or a Union).
    ptr_or_offset::GenericPtr{Cvoid, AS}
end

const MemoryRef{T} = GenericMemoryRef{:not_atomic, T, CPU}
```
The `GenericMemoryRef` type is similar to the existing `RefArray` type, in that it points
to a location within another object. Unlike that type, it has special codegen support so
that the element offset computation can be elided.

Note that a `GenericMemoryRef` here is always a valid index (unlike C/C++/LLVM GEP, which
lets a pointer point one element past the end), except when the `GenericMemory` is empty,
in which case accessing it can throw a `BoundsError`.
```julia
memoryref(mem::Memory)::MemoryRef
```

Return a `MemoryRef` constructed from a `Memory`, pointing to the first element.

```julia
memoryref(mem::Memory, idx::Int, [boundscheck::Bool=true])::MemoryRef
memoryref(mem::MemoryRef, idx::Int, [boundscheck::Bool=true])::MemoryRef
```

Return a `MemoryRef` constructed from an existing `Memory` or `MemoryRef` and an offset,
with the bounds check optionally disabled (unsafe).

```julia
memoryrefget(mem::MemoryRef{kind, T}, order::Symbol, boundscheck::Bool)::T
```

Load the element of a `MemoryRef` and return it, with the given memory order and with the
`isempty` check optionally disabled (unsafe). The `order` must be compatible with the
`kind` parameter.

```julia
memoryrefset(mem::MemoryRef{kind, T}, x::T, order::Symbol, boundscheck::Bool)
```

Store `x` to a `MemoryRef` with the given memory order and with the `isempty` check
optionally disabled (unsafe). The `order` must be compatible with the `kind` parameter.

```julia
memoryoffset(mem::MemoryRef)
```

Get the integer index of a `MemoryRef`'s location in its underlying `Memory`.
The process of getting or setting elements is decomposed into two operations: constructing
a `MemoryRef`, and then accessing it. The reason for this is that constructing a
`MemoryRef` is consistent and effect-free, but loading and storing elements are not. In
particular, the bounds check is part of `MemoryRef` construction, and these properties make
code transformations easier.
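Concretely, under this split an `A[i]` load would lower to roughly the following sketch (using the provisional names above; illustrative only, not runnable as-is):

```julia
# Sketch only, using the provisional names from this proposal.
function sketch_getindex(A, i::Int)
    # The bounds check happens here, during MemoryRef construction,
    # which is consistent and effect-free.
    ref = memoryref(A.ref, i, true)
    # The access itself is the effectful part; the bounds check is
    # already done, so only the isempty check remains.
    return memoryrefget(ref, :not_atomic, false)
end
```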
These operations replace the old `arraylen`, `arrayref`, and `arrayset` intrinsics.

## `Array` definition

```julia
mutable struct Array{T, N} <: AbstractArray{T, N}
    ref::MemoryRef{T}
    axes::NTuple{N, Int}
end
```
For `N == 1`, the `axes` and `ref` fields are mutable and `axes[1]` is the length.
For other `N`, the fields are `const`, and the length is `.ref.mem.length` or
`prod(.axes)`.
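The length rule can be illustrated with a self-contained mirror of this layout (the type and function names here are hypothetical, for illustration only):

```julia
# Hypothetical mirror of the proposed layout, illustrating the length rule.
struct MirrorArray{T, N}
    mem::Memory{T}          # stands in for ref.mem
    axes::NTuple{N, Int}
end

# For N == 1 the length is axes[1] (the backing capacity may be larger);
# for other N it is prod(axes).
mirror_length(A::MirrorArray{T, 1}) where {T} = A.axes[1]
mirror_length(A::MirrorArray) = prod(A.axes)

v = MirrorArray(Memory{Int}(undef, 8), (5,))    # 1-d: length 5, capacity 8
m = MirrorArray(Memory{Int}(undef, 6), (2, 3))  # 2-d: length 2*3 == 6
```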
## `Base.StringVector`

`Base.StringVector` returns a `Vector` whose storage is a `String`, allowing conversion via
the `String` constructor without copying. This is used to initialize (immutable) strings
efficiently. Declaring the data field of `Array` as `Union{Memory, String}` is not a good
idea. We handle this instead by making an `Array` whose memory is backed by a
`Memory{UInt8}` whose storage is owned by a new `String`.
> **Important:** When calling `String(::Vector)`, we set the `Vector`'s length to 0 and
> replace its `Memory` with an empty one. The `String`-backed `Memory` will still exist,
> but we assume code will not reference it directly.
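This behavior is observable in current Julia: after `String(v)`, the vector is left empty:

```julia
v = UInt8['h', 'i']     # [0x68, 0x69]
s = String(v)           # takes ownership of v's bytes, emptying v
s == "hi"               # true
isempty(v)              # true: v's length is now 0 and its storage replaced
```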
Compilers always want to know when a value is constant. `Array` has the complication that
the size is constant only when `N ≠ 1`. Our layout system cannot express this, so it needs
to be hacked into the compiler and will continue to be.
## Allocating the `Array` and `Memory` together

This design requires allocating two objects (an `Array` and a `Memory`) for most arrays,
giving it a slight disadvantage over the current implementation in some cases where the
`Array` does not later change size. We think this could be improved by adding a feature to
the GC to allocate two objects at once:

1. Space big enough to hold both objects is allocated.
2. A header bit in the first object indicates that it is followed by another.
3. The compiler recognizes this pattern:

```julia
Array(length) = new(Core.Memory(length))
```

and replaces it with a single allocation call that contains both objects. We do not
currently believe this is necessary, however.
## Zero-sized `Memory`

Because a zero-sized `Memory` has no actual mutable content, we can instead allocate only
one of these for all consumers. This can be stored in the `.instance` field when first
allocated and reused for all future allocations as an optimization (like the zero-size
`String` optimization).

Secondly, this is the one case where a `MemoryRef` may point to invalid memory, so we have
a `boundscheck` parameter on read/write access to verify this case is not present.
It is important (especially in the GC) to have fast access to the layout metadata for an
array's element type. Thus, we set `Memory{T}.layout` to the layout of the element type,
but with the opaque layout flag set. `unwrap_unionall(GenericMemory).layout` holds the
layout metadata for the `GenericMemory` type itself.

Previously, `Array` did layout computations on array construction, which was one reason
allocating arrays was surprisingly slow. `Union` types were especially a problem since,
unlike `DataType`, they do not have a field to store a layout.
We need to consider how this design will affect advanced array optimizations including SROA of arrays, allocation elision, and allocation hoisting. The two-layer object structure might make this more difficult.
For efficiency, the `GenericMemoryRef` data field is a `GenericPtr{Cvoid, AS}` to the
referenced element for most arrays, but needs to be interpreted as an `Int` index for
bitsunion and ghost-value (`elsize == 0`) elements. The `Base.memoryoffset`, `pointer`, and
`Core.memoryref` functions can be used to convert between the two representations.
The addrspace `Loaded` will now have a function form, which combines a `Tracked` pointer
and a raw pointer to return a `Loaded` pointer:

```llvm
; memory(none) nosync nounwind speculatable willreturn norecurse
declare nonnull noundef ptr(Loaded) @"julia.gc_loaded"(
    ptr(Tracked) nocapture nonnull noundef readnone,
    ptr nonnull noundef readnone)
; Semantically: %0 is the GC base pointer used for rooting, and the
; result is `addrspacecast ptr %1 to ptr(Loaded)`.
```

The codegen for it will attempt to move this over some GEP call chains, and later the
`-llvm-julia-licm` pass will do more of that as well.
## Performance results

- `push!` is ~2.2x faster in the common case:

  ```julia
  function f()
      X = Int[1, 2]
      sizehint!(X, 100)
      for n in 3:100
          push!(X, X[n-1] + X[n-2])
      end
      X[end]
  end
  ```
- Array allocation is faster (measured with `Int[]`).
- `Dict`, `IdDict`, and `Base64`'s buffer type use `Memory` to save space (for `Dict` it
  saves 72 bytes per instance).
- Constructing a `Dict` backed by `Memory` is faster (80% faster for an empty
  `Dict{Int, Int}`).
- `BitSet` avoids `resize!`ing its storage, making `BitSet()` roughly 2x faster.
- Exposes memory in a `Vector` that was previously inaccessible to Julia.