# Persistent Libraries
This document explains a design for persisting *Libraries* and the *Module*s
that comprise them so that we can stage them once, store them in Templandia,
and reload them when a recompile needs them again.
As the Temper library ecosystem expands, most libraries will be downloaded
on demand, based on *import*s, instead of residing under the current
work root. Not recompiling the universe every time `temper build` is run
is a nice performance boost.
This document is divided into a few pieces:
- How to find a persisted library given a *LibraryName*.
- Freshness metadata: how to determine quickly, from a persisted library,
  whether it is up-to-date with the current build environment:
  - the version of the compiler that persisted it,
  - a hash of its source file paths and content,
  - hashes of persisted libraries it depends upon.
- Persisted format overview: what are the main pieces of data in a
  persisted library file, how are they generated when persisting,
  and how are they consumed when un-persisting?
- The flow of persisting and un-persisting libraries.
- Code-generation:
  - how we annotate Kotlin classes used by the compiler and generate
    code from them that handles persisting and un-persisting them.
- Provenance:
  - which library does a thing belong to?
## Goals
Allow for quick recompiles by saving work done by earlier invocations of
`temper build` and reusing it in subsequent builds.
For example, we should be able to compile *std* once, save it to local
disk, and load it back into memory faster than it takes to pass its modules
through the staging process.
## Non-goals
It is not a goal of this design to let a later version of the Temper toolchain
reuse persisted modules created by an earlier version.
Specifically, this does not establish a binary distribution format for Temper
libraries, just a local cache of work done.
## Templandia file layout: How to find a persisted library given a *LibraryName*
For each Temper library, we might have several versions in play.
- If the library is in a work-root on the local machine, we can use the
canonical work-root path as an identifier.
- If the library was downloaded into Templandia, we can use the semver identifier.
```kt
sealed class LibraryNameAndVersion {
    abstract val libraryName: DashedIdentifier
}

/** A library downloaded into Templandia, identified by its semver. */
data class RemoteLibraryNameAndVersion(
    override val libraryName: DashedIdentifier,
    val version: SemVer,
) : LibraryNameAndVersion()

/** A library under a local work root, identified by its canonical path. */
data class LocalLibraryNameAndVersion(
    override val libraryName: DashedIdentifier,
    val workRoot: SystemPath,
) : LibraryNameAndVersion()
```
In addition to the library name and version, it might be good to record which
version of the Temper toolchain persisted them, because the file format might
differ between versions of the toolchain.
We might, in the future, generate code differently for different backends.
Even if we never need to, and always use a generic backend-id like `-any`,
designing a place for the backend id into the persist/unpersist flow now will
avoid headaches down the road.
A very tentative file layout would be:
```
.templandia/.built/<toolchain-version-tag>/<backend-id>/<library-name>.temper-prebaked.json
```
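A minimal sketch of assembling that path, assuming the Templandia root is passed in (where `.templandia` lives is not specified here) and that each of the tag, backend id and library name must form a single path segment. The function name `persistedLibraryPath` is hypothetical.

```kotlin
import java.nio.file.Path
import java.nio.file.Paths

/**
 * Hypothetical helper building the tentative layout
 * <templandia-root>/.built/<toolchain-version-tag>/<backend-id>/<library-name>.temper-prebaked.json
 */
fun persistedLibraryPath(
    templandiaRoot: Path,
    toolchainVersionTag: String,
    backendId: String,
    libraryName: String,
): Path {
    // Each piece must be exactly one path segment, so reject separators.
    for (segment in listOf(toolchainVersionTag, backendId, libraryName)) {
        require('/' !in segment && '\\' !in segment) { "not a single path segment: $segment" }
    }
    return templandiaRoot
        .resolve(".built")
        .resolve(toolchainVersionTag)
        .resolve(backendId)
        .resolve("$libraryName.temper-prebaked.json")
}
```

Keeping the toolchain tag as the first directory under `.built` means a new toolchain version simply ignores older caches instead of trying to read them.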
The rest of this document assumes the file format is JSON-based, but one
goal of the code generation described below is to allow experimentation
with, and benchmarking of, different approaches to persisting.
## Version tagging the toolchain
As Temper team contributors work on a new version of the compiler, they
shouldn't be bothered by frequent exceptions when loading *std*, caused by
persisted files that their changes to the compiler's Kotlin source files have
made impossible to unpack.
But the *\<toolchain-version-tag\>* used by stable distributions of the compiler
should map to something recognizable to users so that they can file bugs.
A stable version has a resource file loadable from a well-known location that
specifies the semver.
`gradle cli:deploy` creates a temporary version tag based on a hash of the
toolchain's files, and bundles it with the deployed gradle application.
Running tests locally computes a similar temporary version tag, but does not
persist it; the tag is based on `git -C <LOCAL_REPO_ROOT> rev-parse HEAD` but
also takes untracked-but-not-gitignored files into account.
In short, there is a Kotlin class, *ToolchainVersion*, that exposes a string
which is one of:
- `stable-<semver>` when the stable version semver tag resource file exists,
- `deployed-<date>-<hash>` when `gradle cli:deploy` was run and bundled a
  resource file with the hash,
- `development-<date>-<hash>` otherwise, falling back to invoking `git` to
  compute a fast hash of content once for the current JVM run.
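The selection order above can be sketched as a pure function. This is a sketch under assumptions: *ToolchainVersionInputs* and `toolchainVersionTag` are hypothetical names, and how each input is located (resource file names, paths) is left open.

```kotlin
import java.time.LocalDate

/** Hypothetical inputs gathered from resources bundled with the toolchain. */
data class ToolchainVersionInputs(
    /** Semver from the stable-version resource file, if that file exists. */
    val stableSemver: String?,
    /** Date and hash bundled by `gradle cli:deploy`, if it ran. */
    val deployedDate: LocalDate?,
    val deployedHash: String?,
)

/**
 * Picks the first applicable tag form. [developmentHash] is only invoked in
 * the development case, since it may shell out to `git`.
 */
fun toolchainVersionTag(
    inputs: ToolchainVersionInputs,
    today: LocalDate,
    developmentHash: () -> String,
): String = when {
    inputs.stableSemver != null -> "stable-${inputs.stableSemver}"
    inputs.deployedDate != null && inputs.deployedHash != null ->
        "deployed-${inputs.deployedDate}-${inputs.deployedHash}"
    else -> "development-$today-${developmentHash()}"
}
```

In the real *ToolchainVersion* class, the result would presumably be computed once per JVM run, for example with `by lazy`, so `git` runs at most once.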
## Persisted file format overview
Assuming a JSON-like format for explanatory purposes, the outer layer looks
like the below:
```json
{
"persisted-by": "<stable-1.0.0>", // toolchain-version-tag
"library-name": "<my-library>", // redundant self identifier in case people
// upload a file in a bug report without its path
"source-hash": "<HASH>", // SHA hash of relative file paths and file content
// of files under this library's root.
// Files are sorted lexicographically by OS-independent
// file path.
"backend-id": "<backend-id>",
"depends-on": [
// key names have the same meaning as above, but for other libraries
{ "library-name": "<other-library>", "source-hash": "<HASH>" },
...
],
"ref-table": { // ref-tables explained below
"<reference-key>": {<reference-value>},
...
}
}
```
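One way to compute `source-hash` as described in the comment above is sketched below. SHA-256 and the length prefixes are assumptions, not settled choices, and `sourceHash` is a hypothetical name; the input maps each path, relative to the library root, to that file's bytes.

```kotlin
import java.nio.ByteBuffer
import java.security.MessageDigest

/** Sketch of `source-hash`: [files] maps root-relative paths to file content. */
fun sourceHash(files: Map<String, ByteArray>): String {
    val digest = MessageDigest.getInstance("SHA-256")
    // Normalize to OS-independent paths, then sort lexicographically.
    val sorted = files.map { (path, content) -> path.replace('\\', '/') to content }
        .sortedBy { it.first }
    for ((path, content) in sorted) {
        val pathBytes = path.toByteArray(Charsets.UTF_8)
        // Length prefixes keep (path "ab", content "c") distinct from (path "a", content "bc").
        digest.update(lengthPrefix(pathBytes.size))
        digest.update(pathBytes)
        digest.update(lengthPrefix(content.size))
        digest.update(content)
    }
    return digest.digest().joinToString("") { "%02x".format(it.toInt() and 0xff) }
}

private fun lengthPrefix(n: Int): ByteArray = ByteBuffer.allocate(8).putLong(n.toLong()).array()
```

Because paths are normalized before sorting, the same library checked out on Windows and on a Unix system hashes identically.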
## Freshness: Can a persisted module be reused?
The `depends-on` key above allows us to check whether a group of persisted files
is internally coherent: whether each dependency's source-hash matches the one
recorded in the `depends-on` list.
Inconsistencies might happen if we have libraries with dependencies like the below:
```
depends-on-lots --depends-on--> depends-on-some --depends-on--> depends-on-none
```
Consider the following sequence of events:
- `temper build` builds `depends-on-lots` and its two transitive dependencies.
- Source files for `depends-on-none` change.
- `temper build` builds `depends-on-some` which generates two persisted library files
but does not update `depends-on-lots`'s persisted library.
- Some Temper toolchain command tries to load `depends-on-lots` from its persisted
  file, but it's out of sync with its dependencies' persisted library files, so we
  rebuild it (using the persisted library files for its dependencies) and re-persist
  it.
So a hash mismatch is not fatal: we abort unpersisting that library, restage its
modules, and then persist a newly consistent library file.
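A minimal sketch of that check, assuming headers have already been parsed into a hypothetical *PersistedHeader* and that current source hashes were computed for each library. A library counts as stale if another toolchain persisted it, its own sources changed, a recorded dependency hash no longer matches, a dependency file is missing, or anything it depends upon is stale.

```kotlin
/** One `depends-on` entry: another library and the source hash we built against. */
data class DependencyRef(val libraryName: String, val sourceHash: String)

/** Hypothetical in-memory view of a persisted file's header fields. */
data class PersistedHeader(
    val persistedBy: String,
    val libraryName: String,
    val sourceHash: String,
    val dependsOn: List<DependencyRef>,
)

/** Returns the names of persisted libraries that must be restaged. */
fun staleLibraries(
    headers: Map<String, PersistedHeader>,
    currentSourceHashes: Map<String, String>,
    toolchainVersionTag: String,
): Set<String> {
    val memo = mutableMapOf<String, Boolean>()

    fun isStale(name: String, visiting: Set<String>): Boolean {
        memo[name]?.let { return it }
        val header = headers[name] ?: return true // missing persisted file
        if (name in visiting) return true // dependency cycle: do not trust it
        val stale = header.persistedBy != toolchainVersionTag ||
            header.sourceHash != currentSourceHashes[name] ||
            header.dependsOn.any { dep ->
                headers[dep.libraryName]?.sourceHash != dep.sourceHash ||
                    isStale(dep.libraryName, visiting + name)
            }
        memo[name] = stale
        return stale
    }

    return headers.keys.filterTo(mutableSetOf()) { isStale(it, emptySet()) }
}
```

Stale libraries are restaged against the coherent persisted files of their dependencies, which matches the recovery described above.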
## Ref-table architecture
We have Kotlin classes like the below that we will probably have to persist.
```kotlin
data class Value<T : Any>(
    val stateVector: T,
    val typeTag: TypeTag<T>,
) : Result() { ... }
```
Persisting such a value might require persisting other complex content, for
example if the value wraps a *UserFunctionValue*, or if its *typeTag* is a
*TClass* for a user-defined type.
That content might even belong to another library: one library may construct a
*TClass* instance using a *TypeShape* defined in another library.
When persisting we have a persisting context:
```kotlin
class PersistingContext {
    // One ref-table per library that owns persisted values.
    val refTables = mutableMapOf<LibraryName, RefTable>()
}
```
That lets us look up a reference table.
If we know, for each thing we persist, which library it comes from, we
can look up the ref table, allocate a *reference key* if necessary,
and store a *pre-persisted form* in the table.
A pre-persisted form is just a list of key/value pairs, where a value
is a reference to a ref-table entry or an *unowned value* (see below).
A ref-table entry is identified by:
- a library name (or in the JSON form, a small integer index into the dependencies list)
- a key into that library's ref-table (or in the JSON form, an int index into the ref-table list)
It doesn't make sense to store some values in a particular library's ref-table:
values like `null`, `false`, `true`.
If a value in a ref-table entry is not a JSON object, then it's an *unowned* value, and we assume
the un-persister knows how to deal with it.
```kotlin
typealias RefTableKey = Int

class RefTable {
    /** Deduplicates equivalent values so each gets one row. */
    val valueToRefTableKey = mutableMapOf<Persistable, RefTableKey>()
    val refs = mutableMapOf<RefTableKey, ComplexPersistResult>()
}
```
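Allocating keys needs care when persisted values refer to each other. The sketch below reserves a key *before* computing the entry, so a value that (transitively) refers to itself gets its own key back instead of recursing forever. It is hypothetical: plain `Any` keys relying on `equals`/`hashCode` stand in for *Persistable*, and *Entry* stands in for *ComplexPersistResult*.

```kotlin
typealias RefTableKey = Int

/** Hypothetical simplified stand-in for ComplexPersistResult. */
data class Entry(val keysAndValues: List<Pair<String, Any?>>)

class RefTable {
    // Plain Any keys (relying on equals/hashCode) stand in for Persistable.
    private val valueToRefTableKey = mutableMapOf<Any, RefTableKey>()
    val refs = mutableMapOf<RefTableKey, Entry>()

    /**
     * Returns the key for [value], allocating one if needed. The key is
     * reserved before [toEntry] runs, so cyclic references to [value]
     * resolve to the same key.
     */
    fun intern(value: Any, toEntry: () -> Entry): RefTableKey {
        valueToRefTableKey[value]?.let { return it }
        val key = valueToRefTableKey.size
        valueToRefTableKey[value] = key
        refs[key] = toEntry()
        return key
    }
}
```

The first value interned gets key 0, which fits the convention of reserving row 0 for the library's top-level value.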
As noted, a *PersistResult* is either:
- a simple, unowned value, represented in JSON as a non-object value,
- or a complex value that needs a ref-table key and is representable as a series of
  string-key/value pairs.
(More on how those are derived and converted back into values later.)
The values in a series of string-key/value pairs are themselves either
simply persisted, or are references to a row in a ref-table (*Persisted.ByReference*).
```kotlin
import kotlin.reflect.KClass

/** The result of persisting one value: simple, or complex and owning a ref-table row. */
sealed interface PersistResult

/** A form that can be easily serialized to an entry in a persisted library file */
sealed interface Persisted {
    /** Unowned values that need no ref-table row. */
    sealed interface SimplePersisted : Persisted, PersistResult
    object NullValue : SimplePersisted
    data class BooleanValue(val b: Boolean) : SimplePersisted
    data class BytesValue(val x: WrappedByteArray) : SimplePersisted
    data class DoubleValue(val x: Double) : SimplePersisted
    data class FloatValue(val x: Float) : SimplePersisted
    data class IntValue(val x: Int) : SimplePersisted
    data class LongValue(val x: Long) : SimplePersisted
    data class StringValue(val x: String) : SimplePersisted
    /** Points at a row in some library's ref-table. */
    data class ByReference(
        val libraryName: DashedIdentifier,
        val refTableKey: RefTableKey,
    ) : Persisted
}

data class ComplexPersistResult(
    /**
     * Optionally allows pairing an arbitrary value with a class that knows
     * how to un-persist it.
     */
    val typeTag: KClass<out Unpersister<*>>?,
    val keysAndValues: List<Pair<String, Persisted>>,
) : PersistResult
```
As can be seen in *RefTable* above, we also keep a cache from *Persistable* to
an assigned integer key.
A *Persistable* can serve as a hash key in a ref table, where equal keys mean
equivalent values, so we can avoid generating duplicate entries.
Further below we talk about how we generate *Persistable* implementations so these
need not be hand maintained, and so that they unpersist correctly.
```kotlin
/**
 * A persister converts a value that needs to be persisted to
 * a [PersistResult], and also derives a [Persistable] key that allows
 * checking whether an equivalent value already has a ref-table entry.
 */
interface Persister<T> {
    fun persist(x: T, pc: PersistingContext): PersistResult

    /** Maybe box [x] in a [Persistable] wrapper so that it can key a ref-table. */
    fun keyFor(x: T): Persistable
}

/**
 * Keys into a reference key table.
 */
interface Persistable {
    /**
     * Which library *owns* this for the purpose
     * of Temper library persistence
     */
    val persistProvenance: DashedIdentifier
}
```
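Here is a sketch of how a *PersistingContext* might use `persistProvenance` to choose the owning library's ref-table and hand back a reference. It is a hypothetical simplification: library names are plain strings, and only key allocation is shown, not entry contents.

```kotlin
// Hypothetical simplified versions of the document's types.
interface Persistable {
    val persistProvenance: String
}

data class ByReference(val libraryName: String, val refTableKey: Int)

class PersistingContext {
    // One key table per owning library.
    private val refTables = mutableMapOf<String, MutableMap<Persistable, Int>>()

    /** Routes [x] to its owner's table, reusing the key of any equal value. */
    fun referenceTo(x: Persistable): ByReference {
        val table = refTables.getOrPut(x.persistProvenance) { mutableMapOf() }
        // getOrPut evaluates table.size before inserting, so keys are 0, 1, 2, ...
        val key = table.getOrPut(x) { table.size }
        return ByReference(x.persistProvenance, key)
    }
}
```

Keys are allocated per library, so the same integer can appear in several ref-tables; a reference is only meaningful together with its library name.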
## Unpersisting and relationship to persisting flow
An un-persister knows how to reverse persistence by a *Persister*.
```kotlin
interface Unpersister<T> {
fun unpersist(pc: UnpersistingContext): RResult<T, MalformedPersistFileException>
}
```
For a library, we are starting with some library metadata and a list of staged modules.
For a set of co-compiled libraries, we do the following:
1. create a blank *PersistingContext*,
2. get `Persisters.getPersister<LibraryModulesAndMetadata>()`,
   an inline method that accesses a generated class,
3. reserve key 0 in each library's ref-table for its persistable
   *LibraryModulesAndMetadata* instance,
4. pass each *LibraryModulesAndMetadata* instance to the persister from step 2
   to fill in the ref-tables,
5. look at the reference values to figure out the *dependencies* (`depends-on`)
   section of each ref-table file,
6. convert each ref-table, along with its library metadata and
   dependencies, to a file, and
7. write those files to disk using the file path convention above.
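Step 5 can be sketched as a scan over one library's ref-table for references into other libraries. The types here are hypothetical stand-ins for *Persisted* and *ComplexPersistResult*; sorting gives a deterministic order, so the small integer index used in the JSON form is stable.

```kotlin
// Hypothetical minimal model of ref-table contents.
sealed interface Field
data class SimpleField(val x: Any?) : Field
data class RefField(val libraryName: String, val refTableKey: Int) : Field
data class Entry(val keysAndValues: List<Pair<String, Field>>)

/** Libraries other than [self] that [refTable] refers to, sorted. */
fun dependenciesOf(self: String, refTable: Map<Int, Entry>): List<String> =
    refTable.values
        .flatMap { it.keysAndValues }
        .map { it.second }
        .filterIsInstance<RefField>()
        .map { it.libraryName }
        .filter { it != self }
        .distinct()
        .sorted()

/** Index of each dependency, as used when writing references in the JSON form. */
fun dependencyIndices(dependencies: List<String>): Map<String, Int> =
    dependencies.withIndex().associate { (i, name) -> name to i }
```

References back into the library's own table are excluded, since they do not create a dependency on another persisted file.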
Unpersisting involves a similar process:
1. read in a persisted file,
2. read in more persisted files by looking at the *dependencies* list,
3. store a failure result for any library with a dependency hash mismatch
   or missing dependency file,
4. let *toUnpersist* be the set of files with coherent (*transitive*) hashes,
5. create a blank *UnpersistingContext*,
6. for each entry in each *toUnpersist* file's ref-table, create a *RefTable*
   entry pointing to a *PendingUnpersist\<\*\>* wrapping that entry,
7. for each *RefTable*, access the *PendingUnpersist* at row 0 and pass it
   to the *Unpersister* for *LibraryModulesAndMetadata*, obtained as in
   step 2 of persisting via `Unpersisters.getUnpersister<...>()`,
8. if any application from step 7 resulted in something other than *RSuccess*,
   report errors and exit with overall failure,
9. otherwise, fold the library metadata from the persisted files into
   *LibraryModulesAndMetadata* and return an indicator of broken files from
   step 3 along with the successfully unpersisted libraries from step 7.
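A *PendingUnpersist* could be a memoizing thunk: each ref-table row is unpersisted at most once, on first access, and re-entering a row that is still being built is reported as a cycle. This is a hypothetical sketch; the design's `@UnpersistLate` is what would let generated code break such cycles.

```kotlin
/** Hypothetical lazily unpersisted ref-table row. */
class PendingUnpersist<T : Any>(private val compute: () -> T) {
    private enum class State { NotStarted, InProgress, Done }

    private var state = State.NotStarted
    private var value: T? = null

    fun get(): T {
        when (state) {
            State.Done -> return value!!
            State.InProgress -> throw IllegalStateException("cycle while unpersisting")
            State.NotStarted -> {}
        }
        state = State.InProgress
        try {
            val v = compute()
            value = v
            state = State.Done
            return v
        } catch (e: Throwable) {
            // Reset so a later access retries instead of misreporting a cycle.
            state = State.NotStarted
            throw e
        }
    }
}
```

Laziness means only rows reachable from row 0 are ever materialized, and shared rows are built once and shared by identity.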
## Code-generation: Making it easy to persist and un-persist Kotlin class instances
There are many Kotlin classes that need to be persisted and
un-persisted and we need the flexibility to evolve those classes
without writing and re-writing persisting and un-persisting code.
We also need to be sure, at compile time, that libraries can be
persisted; that there's not some rarely used class that the system
doesn't know how, via reflection, to persist or un-persist.
We use annotations and code-generation to produce *Persister*s and
*Unpersister*s for Kotlin types.
A gradle task, `gradle kcodegen:u`, updates Kotlin files that define maps
used by `Persisters.getPersister<T>()` and `Unpersisters.getUnpersister<T>()`
which relate [*KType*s](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.reflect/-k-type/)
to implementations.
Unaided persisters require no additional annotations:
- If a KType annotated with `@GeneratePersister` is sealed, then we generate a
  persister by looking at each sub-type and applying the unaided persisters
  below; for concrete sub-types whose key-name sets are distinct, the generated
  unpersister keeps a map from each set of key strings to the unpersister for
  that type.
  - For `object` (singleton) sub-types, we unpersist a string like
    `object:lang.temper.KotlinObjectClassName`
    to the object value.
  - For `enum` sub-types, we relate a string like `enum:<EnumMemberName>` to each
    enum member.
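The sketch below hand-writes what a generated unaided unpersister for a small sealed type might look like. *Shape* and its sub-types are hypothetical, the persisted form is assumed to be a `String` or a `Map<String, Any?>` parsed from JSON, and `object:Empty` abbreviates the fully qualified class name the real encoding would use.

```kotlin
/** Hypothetical sealed hierarchy standing in for a @GeneratePersister type. */
sealed interface Shape
object Empty : Shape
enum class Color : Shape { Red, Green }
data class Circle(val radius: Double) : Shape
data class Rect(val width: Double, val height: Double) : Shape

/** Concrete sub-types have distinct key sets, so the key set picks the unpersister. */
private val unpersistersByKeySet = mapOf<Set<String>, (Map<String, Any?>) -> Shape>(
    setOf("radius") to { f -> Circle(f["radius"] as Double) },
    setOf("width", "height") to { f -> Rect(f["width"] as Double, f["height"] as Double) },
)

fun unpersistShape(persisted: Any): Shape = when (persisted) {
    is String -> when {
        persisted == "object:Empty" -> Empty
        persisted.startsWith("enum:") -> Color.valueOf(persisted.removePrefix("enum:"))
        else -> error("unknown singleton: $persisted")
    }
    is Map<*, *> -> {
        @Suppress("UNCHECKED_CAST")
        val fields = persisted as Map<String, Any?>
        val unpersister = unpersistersByKeySet[fields.keys]
            ?: error("no sub-type has key set ${fields.keys}")
        unpersister(fields)
    }
    else -> error("unexpected persisted form: $persisted")
}
```

If two sub-types shared a key set, this dispatch would be ambiguous; that is the case `@PersistNeedsTypeTag` addresses.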
Aided persisters are required for complex classes with fields.
- `val` fields in the constructor and type body, and zero-argument methods, may
  be annotated with `@PersistedField`, optionally with a field name.
- If the field name differs from the parameter name in the constructor
  (or factory, see below), `@UnpersistParamName("name")` may be used.
- If the field needs to be set after construction, `@UnpersistLate` may be used
  to exclude it from the constructor/factory parameter list, and to have the
  unpersister generate a field assignment.
- To disambiguate the type from others that might have the same key set,
  `@PersistNeedsTypeTag` allows adding
  a field with a type tag as in *ComplexPersistResult.typeTag*.
- To specify a factory function for use by the unpersister, use `@UnpersistFactory`.
  With no argument, it annotates a static method of the unpersisted type
  (in Kotlin, a companion object function).
- `@UnpersistFactory` can also be passed the class of an `object` with an
  `operator fun invoke` that can be referenced by class name in generated code and
  used to construct a value on unpersisting, as in
  `@UnpersistFactory(MyFactory::class)` given
  `object MyFactory { operator fun invoke(args): T { ... } }`.
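The annotations above might be declared as follows. This is a sketch under assumptions: the targets and `RUNTIME` retention are guesses (if `gradle kcodegen:u` reads source, `SOURCE` retention would suffice), and *Point* and *PointFactory* are hypothetical. Note that Kotlin annotation arguments must be compile-time constants, enums, arrays, other annotations or class literals, which is why a factory is named by `::class` rather than by an object expression.

```kotlin
import kotlin.reflect.KClass

@Target(AnnotationTarget.CLASS)
@Retention(AnnotationRetention.RUNTIME)
annotation class GeneratePersister

@Target(AnnotationTarget.VALUE_PARAMETER, AnnotationTarget.PROPERTY, AnnotationTarget.FUNCTION)
@Retention(AnnotationRetention.RUNTIME)
annotation class PersistedField(val name: String = "")

@Target(AnnotationTarget.PROPERTY, AnnotationTarget.FUNCTION)
@Retention(AnnotationRetention.RUNTIME)
annotation class UnpersistParamName(val name: String)

@Target(AnnotationTarget.PROPERTY)
@Retention(AnnotationRetention.RUNTIME)
annotation class UnpersistLate

@Target(AnnotationTarget.CLASS)
@Retention(AnnotationRetention.RUNTIME)
annotation class PersistNeedsTypeTag

/** Unit::class means "no factory object"; annotation parameters cannot be null. */
@Target(AnnotationTarget.CLASS, AnnotationTarget.FUNCTION)
@Retention(AnnotationRetention.RUNTIME)
annotation class UnpersistFactory(val factory: KClass<*> = Unit::class)

// Hypothetical example type using the annotations.
@GeneratePersister
@PersistNeedsTypeTag
@UnpersistFactory(PointFactory::class)
data class Point(
    @PersistedField val x: Int,
    @PersistedField val y: Int,
) {
    // Assigned after construction instead of being passed to the factory.
    @PersistedField @UnpersistLate var label: String = ""
}

/** Referenced by class name in generated code and invoked like a function. */
object PointFactory {
    operator fun invoke(x: Int, y: Int): Point = Point(x, y)
}
```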
Going back to the *class Value* example from above, there are some complications because
we'd like to persist *Value*s of type *TString* as unowned strings, so the type tag,
not *Value* itself, must know how to persist and unpersist the state vector.
```kotlin
// This is sealed, so we need a Persister and an Unpersister implementation
// based on the sealed type rules above.
sealed class PartialResult : Persistable {...}
sealed class Result : PartialResult() {...}
// Falls into the singleton branch of the unaided rules.
object NotYet : PartialResult() {...}
// For this type, we need to specify that both constructor fields are persisted.
// That way the generated persister produces something like
//   { "stateVector": ..., "typeTag": ... }
// and the generated unpersister uses a call like
//   Value(typeTag = ..., stateVector = ...)
data class Value<T : Any>(
    @PersistedField
    // We need to unpersist the typeTag first
    // so that the typeTag can specify the persister/unpersister for
    // stateVector.
    @UnpersistAfter("typeTag")
    // Annotation arguments must be constants or class literals, so this names
    // a (hypothetical) object that delegates to typeTag.persisterForValue.
    @PersistUsing(StateVectorPersister::class)
    // A (hypothetical) invokable object that takes the type tag and fetches its
    // value unpersister, followed by which parameters to pass to that getter.
    @UnpersistUsing(StateVectorUnpersister::class, "typeTag")
    val stateVector: T,
    @PersistedField
    val typeTag: TypeTag<T>,
) : Result() { ... }
```
For generic Kotlin types like *Map* and *List* we need to generate persisters that
wrap the element type persister on demand.
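A minimal sketch of such wrapping persisters, built from whatever persister handles the element (or key and value) types. The simplified *Persisted* model here is hypothetical; in particular *ListValue* has no counterpart in the *Persisted* hierarchy above and is assumed only for illustration.

```kotlin
// Hypothetical simplified persisted forms.
sealed interface Persisted
data class IntValue(val x: Int) : Persisted
data class StringValue(val x: String) : Persisted
data class ListValue(val elements: List<Persisted>) : Persisted

fun interface Persister<T> {
    fun persist(x: T): Persisted
}

/** Wraps the element persister, created on demand for each List<E>. */
class ListPersister<E>(private val element: Persister<E>) : Persister<List<E>> {
    override fun persist(x: List<E>): Persisted = ListValue(x.map(element::persist))
}

/** Persists a map as a list of [key, value] pairs, preserving iteration order. */
class MapPersister<K, V>(
    private val key: Persister<K>,
    private val value: Persister<V>,
) : Persister<Map<K, V>> {
    override fun persist(x: Map<K, V>): Persisted =
        ListValue(x.map { (k, v) -> ListValue(listOf(key.persist(k), value.persist(v))) })
}
```

Because wrappers compose, a `Map<String, List<Int>>` persister is just `MapPersister(stringPersister, ListPersister(intPersister))`, so the code generator only needs to emit persisters for non-generic types.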
## Provenance rules
`@PersistProvenancer(object ...)` also allows us to figure out which library a
persistable is part of, its *provenance*.
In many cases, the answer is simply the library that owns the larger structure
we're persisting, so we need multiple views of the *PersistingContext*, one from
the point of view of each *current* library.
For some *Value*s the answer is more nuanced.
We need to rewrite *Interpreter* to store information with *Value*s so that
we can avoid simplifying to a constant across library boundaries.
But we have some good rules of thumb:
- For *TType* values, the library of the declaring module owns it.
- For *TClass* values, the library of the module that constructed the value
owns it.
- For *UserFunction* values, the library of the declaring module owns it.
- For *BuiltinFun* values, the provenance can be the current library;
most will be unaided `object`s anyway.
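The rules of thumb above amount to a simple dispatch. This sketch assumes the interpreter has recorded the relevant library on each value; the *ProvenancedValue* hierarchy and its fields are hypothetical stand-ins for the interpreter's real value types.

```kotlin
/** Hypothetical stand-ins for interpreter values, carrying the library facts the rules need. */
sealed interface ProvenancedValue
data class TTypeValue(val declaringLibrary: String) : ProvenancedValue
data class TClassValue(val constructingLibrary: String) : ProvenancedValue
data class UserFunctionValue(val declaringLibrary: String) : ProvenancedValue
object BuiltinFunValue : ProvenancedValue

/** Applies the rules of thumb; [currentLibrary] is the library being persisted. */
fun provenanceOf(value: ProvenancedValue, currentLibrary: String): String = when (value) {
    is TTypeValue -> value.declaringLibrary
    is TClassValue -> value.constructingLibrary
    is UserFunctionValue -> value.declaringLibrary
    BuiltinFunValue -> currentLibrary
}
```

Since the `when` is exhaustive over a sealed type, adding a new kind of value forces a decision about its provenance at compile time.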