# Nullable types and flat option equivalence
Temper has some *optioning backends*: Rust, C# (strangely), and eventually OCaml, where a nullable type translates to an option type. (Swift calls its nullable types option types, but this seems to be marketing.)
`Foo?` is a nullable type; it admits all the values that `Foo` admits but also admits the special value `null`. The `null` value can be differentiated from the other values: `someFoo != null`.
## Goals
**Support optioning languages** by translating nullable types even for targets that have no `null` value or that, like C#, don't allow its use with type parameters that can bind to both value and reference types.
Provide **flexible container types** like `List<String>` and `List<String?>` for bundling results and for intermediate computations.
Provide **consistent semantics for null checks**: `x == null`. Specifically avoid option nesting that would lead `Some(None)` to be spuriously recognized as distinct from `null`/`None`.
Clarify **which subtype-like relationships are translatable**. There are a number of type relationships related to subtyping that are used in Temper. What restrictions do we need to put on them, and which cause translation problems?
Explore the **null / void / unit** relationship in more detail. Specifically, is it possible to have a *Void* type with *Unit* semantics while still preserving the semantics/translatability of `x == null` where `x` could be null.
## Design Sketch
We can translate `SomeType?` easily enough to an option type.
When a `SomeType` appears in a method return type because it narrows a super-type method whose return type was `SomeType?`, we can detect that using the same kind of signature adjustment checks we need for Java and C#.
The source of complexity discussed here is generic type parameters.
- Is `T` an optional type? Do we need to pack an option when passing a value into a `T` or unpack when the return type is `T`?
- Could the type `T?` introduce a nested option type?
- Can a `T` be `null`? We want to avoid asking `.is_some()` when the value representation is not statically known to be an option.
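To make the nesting hazard concrete, here is a minimal Rust sketch. `get_or_none` and `classify` are hypothetical helpers written for this note, not part of any Temper backend; `get_or_none` stands in for what a naive translation of a generic returning `T?` might produce.

```rust
// Hypothetical helper: a naive translation of a generic that returns `T?`.
// Nothing stops `T` itself from being an `Option`, so the result can nest.
fn get_or_none<T: Clone>(items: &[T], index: usize) -> Option<T> {
    items.get(index).cloned()
}

// Shows the three states a nested option can be in when `T` is `Option<i32>`.
// Temper's `x == null` has only two answers, so the middle state is spurious.
fn classify(result: Option<Option<i32>>) -> &'static str {
    match result {
        None => "missing element",
        Some(None) => "present element that is null",
        Some(Some(_)) => "present element that is not null",
    }
}
```

With a `Vec<Option<i32>>`, an out-of-range lookup yields `None` while a lookup of a null element yields `Some(None)`, which is exactly the distinction the goals above want to avoid.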
The rest of this document discusses a 3-way distinction for type parameters.
1. An agnostic type parameter like that for `class List<T>` which is agnostic to whether `T` is nullable. It could be an option or it could not. That way, Temper `List<String>` could be a Rust `Vec<Str>` but `List<String?>` would be a `Vec<Option<Str>>`.
2. A definite option type parameter. With this, you can ask if a value of that type is `null`. Binding a non-nullable type to this type parameter might involve the auto-wrapping and unwrapping mentioned above.
3. A definite non-null type parameter. With this, you can introduce `T?` as a distinct type. Extension functions like `List.getOrNull` could use this.
(Note: A capital letter inside angle brackets `<X>` refers to a type parameter declaration. A single capital letter name, e.g. `X`, without the angle brackets denotes a use of that declared type in the signature or body of the declaring type/method.)
These distinctions are made on type parameter definitions. All three can be made in-language without extra syntax if we do not infer an *AnyValue* upper bound for type formals without an explicit upper bound and make it illegal to mix nullable and non-nullable upper bounds, but we can clarify the cases using the syntax below:
`<T~ extends ...>` means `T` is agnostic. `T` could be nullable or not. It does not make sense to call methods on values of type `T~` since any value could be `null`.
`<U! extends ...>` means `U` is not a nullable type, so `U?` is a distinct, usable type. You can call methods on values of type `U` based on its upper bounds.
`<V? extends ...>` means `V` is nullable and translates to an option type on optioning backends. It makes sense to ask whether a `V` is null or not. Inside an `if (myV != null)` context, we could allow `myV.method()` uses based on upper bounds with nullity erased because it's safe to unwrap the option on optioning backends.
## Syntactic Sugar
As a syntactic convenience, an unqualified `<W extends ...>` is shorthand for one of the above.
- `<W extends ...>` is `<W! extends ...>` if it has no nullable upper bounds.
- `<W extends ...>` is `<W? extends ...>` if it has an upper bound and all its upper bounds are nullable.
Since a type parameter can have, as an upper bound, a previously declared type parameter, the below are equivalent because *nullable* above includes references to type parameters declared nullable, like `<V?>`:
| Unqualified | Equivalent to | Because |
| ---- | ---- | ---- |
| `<W extends AnyValue>` | `<W! extends AnyValue>` | the upper bound is not nullable |
| `<W extends AnyValue?>` | `<W? extends AnyValue?>` | the upper bound is nullable |
| `<W>` | `<W!>` | W has an implicit upper bound of *AnyValue* which is not nullable |
| `<V?, W extends V>` | `<V?, W? extends V>` | W's upper bound is nullable |
| `<U! extends ..., W extends U>` | `<U! extends ..., W! extends U>`| W's upper bound is not nullable |
| `<T~, W extends T>` | *invalid* | W's upper bound is neither clearly nullable nor clearly non-nullable |
| `<T~, W~ extends T~>` | *valid* | No guessing needed |
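The resolution rule behind this table can be sketched as a small function. This is only an illustration of the rows above: the `Nullity` enum and `infer_unqualified` are names invented for this sketch, and it assumes, as the table does, that a parameter with no explicit bound gets the non-nullable *AnyValue* bound.

```rust
/// Nullity of a type parameter or of one of its upper bounds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Nullity {
    Agnostic, // `<T~>`
    NonNull,  // `<U!>`, or a non-nullable type such as `AnyValue`
    Nullable, // `<V?>`, or a nullable type such as `AnyValue?`
}

/// Resolves an unqualified `<W extends ...>` from the nullity of its upper
/// bounds. Returns `None` where the declaration is invalid.
fn infer_unqualified(upper_bounds: &[Nullity]) -> Option<Nullity> {
    // No explicit bound: the implicit bound is the non-nullable `AnyValue`.
    if upper_bounds.is_empty() {
        return Some(Nullity::NonNull);
    }
    // An agnostic bound is neither clearly nullable nor clearly non-nullable.
    if upper_bounds.contains(&Nullity::Agnostic) {
        return None;
    }
    let all_nullable = upper_bounds.iter().all(|b| *b == Nullity::Nullable);
    let none_nullable = upper_bounds.iter().all(|b| *b == Nullity::NonNull);
    if all_nullable {
        Some(Nullity::Nullable)
    } else if none_nullable {
        Some(Nullity::NonNull)
    } else {
        None // mixing nullable and non-nullable upper bounds is illegal
    }
}
```

An explicitly qualified declaration such as `<W~ extends T~>` never goes through this function, which is why the last row of the table needs no guessing.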
## Rules
As above, `T` is agnostic, `U` is guaranteed non-nullable, and `V` is guaranteed nullable.
1. No asking if a `<T~>` is `null`.
2. `T?` is an illegal type since it could lead to nested option types.
3. No binding a nullable type to a `<U!>`. This includes `T` and `V`.
4. No asking if a `U` is null, but it's ok to ask if a `U?` is null.
5. `V?` could be banned as a type or could be treated as equivalent to `V`.
6. Binding `T` to `<V?>` is illegal since it is not clear when to insert option wrapping/unwrapping instructions.
7. Given a contravariant definition `<in X!>`, binding `X` (non-null) to `<V?>` is illegal since option types are covariant on their type parameter.
## Allowed bindings
To make clear what's not prohibited by the rules:
- `U`, `U?`, and `V` can bind to `<T~>`
- `U` and `U?` can bind to `<V?>`. A contravariant `U` cannot.
- any `T` can bind to any other `<T2~>`
- any `U` can bind to any other `<U2!>`
- any `V` can bind to any other `<V2?>`
- any concrete type can bind to `<T~>`, e.g. `String` and `String?`
- any non-nullable concrete type can bind to `<U!>`, e.g. `String`
- any nullable concrete type can bind to `<V?>`, e.g. `String?`
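The rules and the allowed bindings can be folded into one check. The sketch below is an assumption-laden summary rather than a specification: `Nullity`, `Binding`, `check_binding`, and the `contravariant` flag are all names made up here, and the non-null-to-`<V?>` case follows the lists in this section (allowed, with wrapping) rather than any stricter reading.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Nullity {
    Agnostic, // `T`
    NonNull,  // `U` or a concrete type like `String`
    Nullable, // `V`, `U?`, or a concrete type like `String?`
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Binding {
    Allowed,
    AllowedWithWrapping, // option wrapping/unwrapping must be inserted
    Banned,
}

/// Can a type of nullity `actual` bind to a type parameter declared `formal`?
fn check_binding(actual: Nullity, formal: Nullity, contravariant: bool) -> Binding {
    match (formal, actual) {
        // Anything can bind to an agnostic `<T~>`.
        (Nullity::Agnostic, _) => Binding::Allowed,
        // Rule 3: nothing possibly nullable binds to `<U!>`.
        (Nullity::NonNull, Nullity::NonNull) => Binding::Allowed,
        (Nullity::NonNull, _) => Binding::Banned,
        (Nullity::Nullable, Nullity::Nullable) => Binding::Allowed,
        // Rule 6: unclear where to wrap when the actual type is agnostic.
        (Nullity::Nullable, Nullity::Agnostic) => Binding::Banned,
        // Rule 7: a contravariant non-null type cannot be wrapped.
        (Nullity::Nullable, Nullity::NonNull) => {
            if contravariant {
                Binding::Banned
            } else {
                Binding::AllowedWithWrapping
            }
        }
    }
}
```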
## Example Translations
### An agnostic `<T~>` type
`List<String>` -> `Vec<String>`
`List<String?>` -> `Vec<Option<String>>`
A function that builds a two-element list.
```ts
let listOf2<T>(a: T, b: T) { [a, b] }
```
The optioning Rust backend doesn't need to care whether a `T` is an option.
```rust
fn listOf2<T>(a: T, b: T) -> Vec<T> { vec![a, b] }
```
### A definite non-null `<U!>` type
A first-or-null function.
It uses `U?` as a distinct type for the return type.
```ts
let firstOrNull<U!>(ls: List<U>): U? {
if (ls.isEmpty) { null } else { ls[0] }
}
```
When converting a U to a U? we need a wrapping instruction.
```rust
// `U: Clone` is needed here only because the sketch copies the element out.
fn firstOrNull<U: Clone>(ls: Vec<U>) -> Option<U> {
    if ls.is_empty() { None } else { Some(ls[0].clone()) }
}
```
### A nullable `<V?>` type
```ts
let eitherOr<V?>(a: V, b: V): V {
if (a != null) { a } else { b }
}
eitherOr<String?>(null, "${f()}")
eitherOr<String>("${f()}", "${g()}")
```
```rust
fn eitherOr<V>(a: Option<V>, b: Option<V>) -> Option<V> {
    if a.is_some() { a } else { b }
}
// We need to un-option the nullable type to bind it to <V?>.
// Where a non-nullable type binds to <V?>, we need to insert wrapping and
// unwrapping instructions, or just infer a promotion to a nullable bound.
eitherOr::<Str>(None, Some(...))
eitherOr::<Str>(Some(...), Some(...)).unwrap()
```
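A compilable version of the same idea, with the elided arguments replaced by placeholder strings. The function names `nullable_call` and `non_nullable_call` and the string values are invented for this sketch; it uses `String` where a backend might use its own string type.

```rust
// Translation of `eitherOr<V?>`: `V` names the payload, values are `Option<V>`.
fn either_or<V>(a: Option<V>, b: Option<V>) -> Option<V> {
    if a.is_some() { a } else { b }
}

// Binding `String?`: the arguments are already options and the result stays one.
fn nullable_call() -> Option<String> {
    either_or::<String>(None, Some("fallback".to_string()))
}

// Binding non-nullable `String`: wrap on the way in, unwrap on the way out.
// The unwrap cannot fail here because both inputs were non-null.
fn non_nullable_call() -> String {
    either_or::<String>(Some("first".to_string()), Some("second".to_string()))
        .expect("both inputs were non-null")
}
```

The second call shows the cost of binding a non-nullable type to `<V?>`: the translator has to prove, or trust, that unwrapping the result is safe.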
## Subtype-like relationships
### Plain old subtyping
If `X <: X?` then translation of `X` to a language with subtypes should preserve the relationship: the translation of `X` should be a subtype of the translation of `X?`.
This is trivially not the case on optioning backends.
`String <: String?`, but `String </: Option<String>`.
### Assignable to
A *String* is-not-an *Option\<String>*, but it may be assignable to one if we can identify where to insert wrapping instructions.
Assignable-to is used in static checks and in type inference. Can we treat `X?` like a supertype of `X` for inferring types (`<~:` meaning assignable-to/subtype-ish)? (Though we might need to insert unpacking instructions later.)
```ts
let foo: Foo = f();
let temporary;
let fooOrNull: Foo?;
temporary = foo;
fooOrNull = temporary;
```
To solve the type of *temporary* it might be good to know that *Foo <~: temporary's type <~: Foo?* which makes sense when *Foo <~: Foo?*.
This assignable-to-but-not-a-subtype-of relationship is also exactly where we need to insert auto-boxing/unboxing instructions.
The above has two valid translations on an optioning backend:
```ts
// Translation 1: temporary is an Option
let foo: Foo = f();
let temporary/*: Option<Foo>*/;
let fooOrNull: Option<Foo>;
temporary = Option.some(foo);
fooOrNull = temporary;
// Translation 2: temporary is not an Option
let foo: Foo = f();
let temporary/*: Foo*/;
let fooOrNull: Option<Foo>;
temporary = foo;
fooOrNull = Option.some(temporary);
```
Most type solvers prefer lower bounds to upper bounds where both are available, so translation 2 is more likely.
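For reference, both translations written as compilable Rust. `Foo` and `f` are stand-ins invented for this sketch; the only difference between the two functions is which assignment receives the `Some` wrapper.

```rust
#[derive(Debug, PartialEq)]
struct Foo(i32);

// Placeholder for the Temper `f()` call.
fn f() -> Foo {
    Foo(7)
}

// Translation 1: `temporary` is an Option; wrap at the first assignment.
fn translation_1() -> Option<Foo> {
    let foo: Foo = f();
    let temporary: Option<Foo> = Some(foo);
    let foo_or_null: Option<Foo> = temporary;
    foo_or_null
}

// Translation 2: `temporary` is a plain Foo; wrap at the second assignment.
fn translation_2() -> Option<Foo> {
    let foo: Foo = f();
    let temporary: Foo = foo;
    let foo_or_null: Option<Foo> = Some(temporary);
    foo_or_null
}
```

Both produce the same final value, so the choice only affects where the wrapping instruction lands and what type intermediate variables get.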
### Signature narrowing in overrides.
When a subtype overrides a method, can it widen input types and/or narrow output types?
```ts
interface I {
public f(i: Int ): String?;
}
class C extends I {
public f(i: AnyValue?): String { "Hello" }
}
```
On Java, a non-optioning backend, that would be fine, but we would need to insert a bridge to allow widening Java `int`, a value type, to `@Nullable Object`, a Java reference type.
```java
interface I {
@Nullable String f(int i);
}
class C implements I {
@Override
public @Nonnull String f(int i) {
return f((Object) i);
}
public @Nonnull String f(@Nullable Object i) {
...
}
}
```
But on an optioning backend we run into the same subtype problem: an option is not a supertype of its type parameter.
```ts
interface I {
f(i: Int): Option<String>;
}
class C extends I {
public f(i: Option<AnyValue>): String { "Hello" }
}
```
Since *Int </: Option\<AnyValue>* and *String </: Option\<String>*, the above signatures would not be compatible between *I*'s declaration and its subtype *C*'s declaration.
Optioning backends that allow method overloading could bridge like the Java above, but how overloading works is very language specific and potentially creates other translation hazards.
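Rust has no method overloading, so a bridge there would need a second method name. The sketch below is one possible shape, not a proposed translation: the name `f_wide` is invented, and `Option<&dyn Any>` is only an assumed stand-in for `AnyValue?`.

```rust
use std::any::Any;

// Translation of interface `I`: narrow input, nullable output.
trait I {
    fn f(&self, i: i32) -> Option<String>;
}

struct C;

impl C {
    // C's own signature: wider input, narrower (non-optional) output.
    // It cannot share the name `f` with a different signature in Rust.
    fn f_wide(&self, _i: Option<&dyn Any>) -> String {
        "Hello".to_string()
    }
}

impl I for C {
    // Bridge: wrap the input to widen it, wrap the output to match `I`.
    fn f(&self, i: i32) -> Option<String> {
        let widened: &dyn Any = &i;
        Some(self.f_wide(Some(widened)))
    }
}
```

Callers that hold a `C` directly can use `f_wide` and skip the option on the result; callers that only know `I` go through the bridge.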
TODO: draft rule above for signature narrowing
### Signature specialization
```ts
interface I<T extends AnyValue?> {
f(): T;
}
class C(public x: String?) extends I<String?> {
// The output type works because
// I's <T> is String? in this context.
public f(): String? { x }
}
```
The cases below cover each of the 3 kinds of type variables above (`~`, `!`, and `?`) × 5 kinds of binding (a non-nullable type, a nullable type, and a type parameter of each of the 3 kinds).
By including other type cases in our binding set, we also explore when one type parameter can have another as an upper bound: `<T, U extends T>`.
For each of these cases, we have a header like "`<U!>` v `Foo`". Here's what each of those lefts and rights means:
Lefts:
- `<T~>`: a type parameter that translates to an agnostic type: may be an option or not
- `<U!>`: a type parameter that translates to a non-option type on optioning backends
- `<V?>`: a type parameter that translates to an option type on optioning backends
Rights:
- `Foo`: an actual binding for the left type parameter which is a non-nullable type
- `Foo?`: an actual binding which is a nullable type
- `<W~ extends >`: the actual binding for the left parameter is `W`, which is itself a reference to a type parameter declared in the same or a narrower scope than the left parameter and which has it as an upper bound
- `<W! extends >`: same but `W` is not an option
- `<W? extends >`: same but `W` is an option type
#### `<T~>` v `Foo`:
OK
```ts
interface I<T~> {
f(): T;
}
class C extends I<Foo> {
f(): Foo {...}
}
//----
interface I<T> {
f(): T
}
class C extends I<Foo> {
f(): Foo {...}
}
```
#### `<T~>` v `Foo?`:
OK
```ts
interface I<T~> {
f(): T;
}
class C extends I<Foo?> {
f(): Foo? {...}
}
//----
interface I<T> {
f(): T
}
class C extends I<Option<Foo>> {
f(): Option<Foo> {...}
}
```
#### `<T~>` v `<W~ extends >`:
OK
```ts
interface I<T~> {
f(): T;
}
class C<W~> extends I<W> {
f(): W {...}
}
//----
interface I<T> {
f(): T
}
class C<W> extends I<W> {
f(): W {...}
}
```
#### `<T~>` v `<W! extends >`:
OK
```ts
interface I<T~> {
f(): T;
}
class C<W!> extends I<W> {
f(): W {...}
}
//----
interface I<T> {
f(): T
}
class C<W> extends I<W> {
f(): W {...}
}
```
#### `<T~>` v `<W? extends >`:
OK. Option shows up in the extends clause.
```ts
interface I<T~> {
f(): T;
}
class C<W?> extends I<W> {
f(): W {...}
}
//----
interface I<T> {
f(): T
}
class C<W> extends I<Option<W>> {
f(): Option<W> {...}
}
```
#### `<U!>` v `Foo`:
OK
```ts
interface I<U!> {
f(): U;
}
class C extends I<Foo> {
f(): Foo {...}
}
//----
interface I<U> {
f(): U
}
class C extends I<Foo> {
f(): Foo {...}
}
```
#### `<U!>` v `Foo?`:
Banned
```ts
interface I<U!> {
f(): U;
}
class C extends I<Foo?> {
f(): Foo? {...}
}
```
#### `<U!>` v `<W~ extends >`:
Banned
```ts
interface I<U!> {
f(): U;
}
class C<W~> extends I<W> {
f(): W {...}
}
```
#### `<U!>` v `<W! extends >`:
OK
```ts
interface I<U!> {
f(): U;
}
class C<W!> extends I<W> {
f(): W {...}
}
//----
interface I<U> {
f(): U
}
class C<W> extends I<W> {
f(): W {...}
}
```
#### `<U!>` v `<W? extends >`:
Banned
```ts
interface I<U!> {
f(): U;
}
class C<W?> extends I<W> {
f(): W {...}
}
```
#### `<V?>` v `Foo`:
Banned. Just use `Foo?`.
```ts
interface I<V?> {
f(): V;
}
class C extends I<Foo> {
f(): Foo? {...} // Need `?` here
}
//----
interface I<V> {
f(): Option<V>
}
class C extends I<Foo> { // No Option here
f(): Option<Foo> {...} // Option here
}
```
#### `<V?>` v `Foo?`:
OK
```ts
interface I<V?> {
f(): V;
}
class C extends I<Foo?> {
f(): Foo? {...}
}
//----
interface I<V> {
f(): Option<V>
}
class C extends I<Foo> { // No option here
f(): Option<Foo> {...}
}
```
#### `<V?>` v `<W~ extends >`:
Banned
```ts
interface I<V?> {
f(): V;
}
class C<W~> extends I<W> {
f(): W {...}
}
//----
interface I<V> {
f(): Option<V>
}
class C<W> extends I<W> {
f(): Option<COULD_NEST> {...}
}
```
#### `<V?>` v `<W! extends >`:
OK with `W?` as the binding
```ts
interface I<V?> {
f(): V;
}
class C<W!> extends I<W?> {
f(): W? {...}
}
//----
interface I<V> {
f(): Option<V>
}
class C<W> extends I<W> { // No option here
f(): Option<W> {...}
}
```
#### `<V?>` v `<W? extends >`:
OK
```ts
interface I<V?> {
f(): V;
}
class C<W?> extends I<W> {
f(): W {...}
}
//----
interface I<V> {
f(): Option<V>
}
class C<W> extends I<W> {
f(): Option<W> {...}
}
```
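The two OK cases for a `Foo?` binding can be checked against a real compiler. In this Rust sketch the trait names `Agnostic` and `Nullable` are invented to keep the two interface shapes apart in one file; `Foo` and `C` are placeholders.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Foo;

// `interface I<T~>`: agnostic, so `T` is passed through untouched.
trait Agnostic<T> {
    fn f(&self) -> T;
}

// `interface I<V?>`: the trait parameter names the payload, and every
// use of `V` in a signature becomes `Option<V>`.
trait Nullable<V> {
    fn f(&self) -> Option<V>;
}

struct C(Option<Foo>);

// `class C extends I<Foo?>` against the agnostic interface:
// the Option shows up in the type argument.
impl Agnostic<Option<Foo>> for C {
    fn f(&self) -> Option<Foo> {
        self.0.clone()
    }
}

// The same binding against the nullable interface:
// the Option is stripped from the type argument.
impl Nullable<Foo> for C {
    fn f(&self) -> Option<Foo> {
        self.0.clone()
    }
}
```

Both implementations end up with the same method signature; what differs is whether `Option` appears in the type argument or in the trait's own method signatures.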
## Void vs Null
Short answer: No, it is not possible to have *Unit* semantics for *Void* because on Python *None* is used for both concepts, and the below Temper code would not preserve semantics:
```ts
let f<T!>(x: T?, y: T?): Int {
var n = 0;
if (x != null) { n++ }
if (y != null) { n++ }
n
}
f(void, null) == 1
```
That would translate to Python3:
```py
def f[T](x: T | None, y: T | None) -> int:
    n = 0
    if x is not None: n += 1
    if y is not None: n += 1
    return n
f(None, None) == 1  # False: both arguments are None, so f returns 0
```
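For contrast, a backend with a real unit value keeps the distinction. The Rust sketch below assumes, purely for illustration, that `void` would translate to `()` and `null` to `None`; `count_non_null` is a name made up for this example.

```rust
// Same shape as the Temper `f<T!>` above, with `T?` translated to `Option<T>`.
fn count_non_null<T>(x: Option<T>, y: Option<T>) -> i32 {
    let mut n = 0;
    if x.is_some() {
        n += 1;
    }
    if y.is_some() {
        n += 1;
    }
    n
}
```

Here `count_non_null(Some(()), None)` counts one non-null argument, because `Some(())` and `None` are different values. The problem is specific to targets where the unit-like value and the null value are the same object.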
----
Can Temper transparently have *Unit* style semantics for its *Void* type?
`null` and `void` are both singleton values.
`void` often means a function has no outputs.
*Unit* semantics allow zero-value-returning functions to be unary: have a result. There is a special, stateless, singleton "unit" value that is the output of any function called for its side-effect instead of its result.
This is important in generic programming: when every function has exactly one result, it's easier to use functions with higher-order functions.
In Java, `void` cannot bind to a type parameter, but the pseudo-reference-type, `java.lang.Void`, can.
If you want to use a void returning function-like type, e.g. *Runnable*, where a unary function like *Supplier\<Void>* is expected, you have to adapt it.
```java
Runnable r = () -> {};
Supplier<Void> s = () -> { r.run(); return null; };
```
If Temper's *Void* type had *Unit* semantics, then Temper authors could avoid explicit adapting, and backends that have a *Unit* type could use it for *Void*.
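Rust is an example of a target where no adapter is needed, since its unit type `()` can bind to a type parameter like any other type. A minimal sketch, with `call_twice` and `demo` as invented names:

```rust
use std::cell::Cell;

// A higher-order function that expects a function with exactly one result.
fn call_twice<T>(f: impl Fn() -> T) -> (T, T) {
    (f(), f())
}

fn demo() -> (i32, ((), ())) {
    let counter = Cell::new(0);
    // A side-effect-only closure: its result type is the unit type `()`.
    let bump = || counter.set(counter.get() + 1);
    // `T` binds to `()` with no wrapper, unlike the Java adapter above.
    let units = call_twice(bump);
    (counter.get(), units)
}
```

The closure is passed as-is and `call_twice` returns a pair of unit values, so the same generic code serves both value-returning and side-effect-only functions.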
For backends that disallow binding their *void* type to type parameters, we could do the following:
1. ensure that any calls to `void` functions that are inputs to other calls are extracted to the block root and replaced with the `void` literal in a way that preserves order of operations
2. identify and mark methods that return `Void` because they specialize a supertype method:
```ts
interface I<T> {
f(): T;
}
class C extends I<Void> {
f(): Void { ... }
}
```
3. identify where a void returning function value is passed to a `<T>` returning function and mark it as needing auto-adaptation on some backends.
Unfortunately, we can't have unit semantics internally within Temper's frontend.
Many dynamic languages conflate the concepts of null and void.
- Python3 as noted above uses `None` for both.
- JavaScript uses `undefined` for `void` and assigning distinct semantics to `null` and `undefined` makes for brittle library APIs.
- Lisps often treat the empty list, `nil`, as both.
- Lua?
As noted above, if Temper `Void` has Unit semantics, then `Void?` is a mentionable type and we need to, at runtime, be able to distinguish the translation of `void` from the translation of `null`.
Temper will probably have a singleton *Unit* type/value (tentatively called *Empty*/*empty*) in its standard library that is distinct from `null`.
Internally, it will reserve *Void* as an output-only type but allow *Unit* as an internal type.
The TmpL middle-end may auto-adapt void functions to unit-returning functions for backends that benefit from that to allow for the illusion of unit semantics, but backends will need to grapple with both concepts.