# Nullable types and flat option equivalence

Temper has some *optioning backends*: Rust, C# (strangely), and eventually OCaml, where a nullable type translates to an option type. (Swift calls its nullable types option types, but this seems to be marketing.)

`Foo?` is a nullable type; it admits all the values that `Foo` admits but also admits the special value `null`. The `null` value can be differentiated from the other values: `someFoo != null`.

## Goals

**Support optioning languages** by translating nullable types even to languages that have no `null` value, or that, like C#, don't allow its use with type parameters that can bind to both value and reference types.

Provide **flexible container types** like `List<String>` and `List<String?>` for bundling results and for intermediate computations.

Provide **consistent semantics for null checks**: `x == null`. Specifically, avoid option nesting that would lead `Some(None)` to be spuriously recognized as distinct from `null`/`None`.

Clarify **which subtype-like relationships are translatable**. Temper uses a number of type relationships related to subtyping. What restrictions do we need to put on them, and which cause translation problems?

Explore the **null / void / unit** relationship in more detail. Specifically, is it possible to have a *Void* type with *Unit* semantics while still preserving the semantics and translatability of `x == null` where `x` could be null?

## Design Sketch

We can translate `SomeType?` to an option type easily enough. When a `SomeType` appears in a method return type because it narrows a super-type method whose return type was `SomeType?`, we can detect that using the same kind of signature adjustment checks we need for Java and C#.

The source of complexity discussed here is generic type parameters.

- Is `T` an optional type? Do we need to pack an option when passing a value into a `T`, or unpack one when the return type is `T`?
- Could the type `T?` introduce a nested option type?
- Can a `T` be `null`?
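To make the nesting hazard concrete, here is a small Rust sketch. It assumes a naive translation that maps every `X?` to `Option<X>`, even when `X` is a type parameter that the caller has already bound to a nullable type. The helper `get_or_null` is hypothetical; it stands in for a Temper function with a `T?` return type.

```rust
/// Hypothetical naive translation of a Temper function
/// `getOrNull<T>(ls: List<T>, i: Int): T?`.
/// Every `T?` becomes `Option<T>`, regardless of what `T` is bound to.
pub fn get_or_null<T: Clone>(ls: &[T], i: usize) -> Option<T> {
    // `slice::get` returns `None` when `i` is out of range.
    ls.get(i).cloned()
}
```

If a caller binds `T` to `Option<String>` (Temper `String?`), looking up a stored `null` yields `Some(None)`, which `is_some()` reports as present, while an out-of-range lookup yields `None`. Two different Rust values end up standing for the one Temper `null`, which is the spurious distinction the goals above rule out.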
We want to avoid asking `.is_some()` when the value representation is not statically known to be an option. The rest of this document discusses a 3-way distinction for type parameters.

1. An agnostic type parameter, like that for `class List<T>`, which is agnostic to whether `T` is nullable. It could be an option or it could not. That way, Temper `List<String>` could be a Rust `Vec<String>` but `List<String?>` would be a `Vec<Option<String>>`.
2. A definite option type parameter. With this, you can ask if a value of that type is `null`. Binding a non-nullable type to this type parameter might involve the auto-wrapping and unwrapping described above.
3. A definite non-null type parameter. With this, you can introduce `T?` as a distinct type. Extension functions like `List.getOrNull` could use this.

(Note: A capital letter inside angle brackets, `<X>`, refers to a type parameter declaration. A single capital letter name, e.g. `X`, without the angle brackets denotes a use of that declared type in the signature or body of the declaring type/method.)

These distinctions are made on type parameter definitions, so we can clarify the cases using the syntax below. All three distinctions could also be made in-language without extra syntax if we do not infer an *AnyValue* upper bound for type formals without an explicit upper bound, and make it illegal to mix nullable and non-nullable upper bounds.

- `<T~ extends ...>` means `T` is agnostic. `T` could be nullable or not. It does not make sense to call methods on values of type `T~` since any value could be `null`.
- `<U! extends ...>` means `U` is not a nullable type, so `U?` is a distinct, usable type. You can call methods on values of type `U` based on its upper bounds.
- `<V? extends ...>` means `V` is nullable and translates to an option type on optioning backends. It makes sense to ask whether a `V` is null or not.
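As a rough illustration of how the agnostic and nullable kinds might surface on an optioning backend, the Rust sketch below assumes that a `<V? extends Display?>` parameter arrives as `Option<V>` with a Rust `Display` bound standing in for the Temper upper bound, and that an agnostic `<T~>` becomes an unbounded Rust generic. The names `describe` and `swap` are hypothetical.

```rust
use std::fmt::Display;

/// Sketch of a `<V? extends Display?>` parameter: the value arrives as
/// `Option<V>`, so a null check is meaningful, and inside the non-null
/// branch the upper bound (`Display`) is usable because the option has
/// been unwrapped.
pub fn describe<V: Display>(v: Option<V>) -> String {
    match v {
        Some(inner) => format!("value: {}", inner),
        None => String::from("null"),
    }
}

/// Sketch of an agnostic `<T~>` parameter: `T` has no bounds, so the
/// translation can move, store, and return values, but it cannot ask
/// whether one is null or call methods on it.
pub fn swap<T>(pair: (T, T)) -> (T, T) {
    (pair.1, pair.0)
}
```

`swap` works the same whether `T` is bound to `i32` or `Option<i32>`, which is what lets an agnostic parameter stay oblivious to nullability.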
Inside an `if (myV != null)` context, we could allow `myV.method()` uses based on upper bounds with nullity erased, because it's safe to unwrap the option on optioning backends.

## Syntactic Sugar

As a syntactic convenience, an unqualified `<W extends ...>` is one of the above:

- `<W extends ...>` is `<W! extends ...>` if it has no nullable upper bounds.
- `<W extends ...>` is `<W? extends ...>` if it has an upper bound and all its upper bounds are nullable.

Since a type parameter can have, as an upper bound, a previously declared type parameter, the below are equivalent, because *nullable* above includes references to `<V?>`:

| Unqualified | Equivalent to | Because |
| ---- | ---- | ---- |
| `<W extends AnyValue>` | `<W! extends AnyValue>` | the upper bound is not nullable |
| `<W extends AnyValue?>` | `<W? extends AnyValue?>` | the upper bound is nullable |
| `<W>` | `<W!>` | W has an implicit upper bound of *AnyValue*, which is not nullable |
| `<V?, W extends V>` | `<V?, W? extends V>` | W's upper bound is nullable |
| `<U! extends ..., W extends U>` | `<U! extends ..., W! extends U>` | W's upper bound is not nullable |
| `<T~, W extends T>` | *invalid* | W's upper bound is neither clearly nullable nor clearly non-nullable |
| `<T~, W~ extends T~>` | *valid* | no guessing needed |

## Rules

As above, `T` is agnostic, `U` is guaranteed non-nullable, and `V` is guaranteed nullable.

1. No asking if a `<T~>` is `null`.
2. `T?` is an illegal type since it could lead to nested option types.
3. No binding a nullable type to a `<U!>`. This includes `T` and `V`.
4. No asking if a `U` is null, but it's OK to ask if a `U?` is null.
5. `V?` could be banned as a type or could be treated as equivalent to `V`.
6. Binding `T` to `<V?>` is illegal since it is not clear when to insert option wrapping/unwrapping instructions.
7. Given a contravariant definition `<in X!>`, binding `X` (non-null) to `<V?>` is illegal since option types are covariant on their type parameter.
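The sugar table and the binding rules are mechanical enough to sketch as a small checker. The Rust below is a hypothetical model, not Temper's implementation: `Kind`, `Ty`, `infer_kind`, and `can_bind_param` are names invented for illustration, concrete types are reduced to a single nullable flag, mixing nullable and non-nullable bounds is treated as invalid (as suggested earlier), and `can_bind_param` only covers binding one type parameter to another.

```rust
/// The three kinds of type parameter.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    Agnostic, // <T~>
    NonNull,  // <U!>
    Nullable, // <V?>
}

/// An upper bound: a concrete type, or a previously declared type parameter.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ty {
    /// e.g. `AnyValue` (nullable: false) or `AnyValue?` (nullable: true).
    Concrete { nullable: bool },
    /// A reference to an earlier type parameter of the given kind.
    Param(Kind),
}

/// Desugars an unqualified `<W extends ...>`.
/// Returns `None` where an explicit `~`, `!`, or `?` marker is required.
pub fn infer_kind(bounds: &[Ty]) -> Option<Kind> {
    // `<W>` has an implicit upper bound of AnyValue, which is not nullable.
    if bounds.is_empty() {
        return Some(Kind::NonNull);
    }
    let mut saw_nullable = false;
    let mut saw_non_null = false;
    for bound in bounds {
        match bound {
            Ty::Concrete { nullable: true } | Ty::Param(Kind::Nullable) => saw_nullable = true,
            Ty::Concrete { nullable: false } | Ty::Param(Kind::NonNull) => saw_non_null = true,
            // `<T~, W extends T>`: neither clearly nullable nor non-nullable.
            Ty::Param(Kind::Agnostic) => return None,
        }
    }
    match (saw_nullable, saw_non_null) {
        (true, false) => Some(Kind::Nullable),
        (false, true) => Some(Kind::NonNull),
        // Mixed nullable and non-nullable bounds: assumed illegal.
        _ => None,
    }
}

/// May a type parameter of kind `arg` bind to a parameter of kind `target`?
pub fn can_bind_param(target: Kind, arg: Kind, arg_is_contravariant: bool) -> bool {
    match (target, arg) {
        // `U`, `V`, and any `T` can bind to an agnostic `<T2~>`.
        (Kind::Agnostic, _) => true,
        (Kind::NonNull, Kind::NonNull) => true,
        // Rule 3: no nullable type (`T` or `V`) binds to `<U!>`.
        (Kind::NonNull, _) => false,
        (Kind::Nullable, Kind::Nullable) => true,
        // Rule 6: binding `T` to `<V?>` is illegal.
        (Kind::Nullable, Kind::Agnostic) => false,
        // `U` may bind to `<V?>` with auto-wrapping, unless contravariant (rule 7).
        (Kind::Nullable, Kind::NonNull) => !arg_is_contravariant,
    }
}
```

Each row of the sugar table maps onto one call to `infer_kind`; for example, `<V?, W extends V>` corresponds to `infer_kind(&[Ty::Param(Kind::Nullable)])`.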
## Allowed bindings

To make clear what's not prohibited by the rules:

- `U`, `U?`, and `V` can bind to `<T~>`.
- `U` and `U?` can bind to `<V?>`. A contravariant `U` cannot.
- Any `T` can bind to any other `<T2~>`.
- Any `U` can bind to any other `<U2!>`.
- Any `V` can bind to any other `<V2?>`.
- Any concrete type can bind to `<T~>`, e.g. `String` and `String?`.
- Any non-nullable concrete type can bind to `<U!>`, e.g. `String`.
- Any nullable concrete type can bind to `<V?>`, e.g. `String?`.

## Example Translations

### An agnostic `<T>` type

`List<String>` -> `Vec<String>`

`List<String?>` -> `Vec<Option<String>>`

A reverse operation:

```ts
let listOf2<T>(a: T, b: T) { [a, b] }
```

The optioning Rust backend doesn't need to care whether a `T` is an option.

```rust
fn list_of_2<T>(a: T, b: T) -> Vec<T> {
    vec![a, b]
}
```

### A definite non-null `<U!>` type

A first-or-null function. It uses `U?` as a distinct type for the return type.

```ts
let firstOrNull<U!>(ls: List<U>): U? {
  if (ls.isEmpty) { null } else { ls[0] }
}
```

When converting a `U` to a `U?`, we need a wrapping instruction.

```rust
fn first_or_null<U: Clone>(ls: &[U]) -> Option<U> {
    if ls.is_empty() {
        None
    } else {
        // `Some(...)` is the wrapping instruction that turns a `U` into a `U?`.
        Some(ls[0].clone())
    }
}
```

### A nullable `<V?>` type

```ts
let eitherOr<V?>(a: V, b: V): V {
  if (a != null) { a } else { b }
}

eitherOr<String?>(null, "${f()}")
eitherOr<String>("${f()}", "${g()}")
```

```rust
fn either_or<V>(a: Option<V>, b: Option<V>) -> Option<V> {
    if a.is_some() { a } else { b }
}

// We need to un-option the nullable type to bind it to <V?>.
either_or::<String>(None, Some(format!("{}", f())));
// We need to insert wrapping and unwrapping instructions where a non-nullable
// type binds to <V?>, or just auto-infer promotion to a nullable bound.
either_or::<String>(Some(format!("{}", f())), Some(format!("{}", g()))).unwrap();
```

## Subtype-like relationships

### Plain old subtyping

If `X <: X?`, then translation of `X` to a language with subtypes should preserve the relationship: the translation of `X` should be a subtype of the translation of `X?`.
This is trivially not the case on optioning backends: `String <: String?`, but `String </: Option<String>`.

### Assignable to

A *String* is-not-an *Option\<String>* but it may be assignable to a *String* if we can identif

Assignable-to is used in static checks and in type inference. Can we treat `X?` like a supertype of `X` for inferring types (`<~:` meaning assignable-to/subtype-ish)? (We might need to insert unpacking instructions later, though.)

```ts
let foo: Foo = f();
let temporary;
let fooOrNull: Foo?;
temporary = foo;
fooOrNull = temporary;
```

To solve the type of *temporary*, it might be good to know that *Foo <~: temporary's type <~: Foo?*, which makes sense when *Foo <~: Foo?*. This assignable-to-but-not-a-subtype-of relationship is also exactly where we need to insert auto-boxing/unboxing instructions.

The above has two valid translations on an optioning backend:

```ts
// Translation 1: temporary is an Option
let foo: Foo = f();
let temporary/*: Option<Foo>*/;
let fooOrNull: Option<Foo>;
temporary = Option.some(foo);
fooOrNull = temporary;

// Translation 2: temporary is not an Option
let foo: Foo = f();
let temporary/*: Foo*/;
let fooOrNull: Option<Foo>;
temporary = foo;
fooOrNull = Option.some(temporary);
```

Most type solvers prefer lower bounds to upper bounds where both are available, so translation 2 is more likely.

### Signature narrowing in overrides

When a subtype overrides a method, can it widen input types and/or narrow output types?

```ts
interface I {
  public f(i: Int): String?;
}
class C extends I {
  public f(i: AnyValue?): String { "Hello" }
}
```

On Java, a non-optioning backend, that would be fine, but we would need to insert a bridge to allow widening Java `int`, a value type, to `@Nullable Object`, a Java reference type.

```java
// Assumes the JSR-305 nullness annotations.
import javax.annotation.Nonnull;
import javax.annotation.Nullable;

interface I {
  @Nullable String f(int i);
}

class C implements I {
  // Bridge: boxes the `int` and delegates to the widened overload.
  @Override
  public @Nonnull String f(int i) {
    return f((Object) i);
  }

  public @Nonnull String f(@Nullable Object i) {
    return "Hello";
  }
}
```

But on an optioning backend we run into the same subtype problem: an option is not a supertype of its type parameter.

```ts
interface I {
  f(i: Int): Option<String>;
}
class C extends I {
  public f(i: Option<AnyValue>): String { "Hello" }
}
```

Since *Int </: Option\<AnyValue>* and *String </: Option\<String>*, the above signatures would not be compatible between *I*'s declaration and its subtype *C*'s declaration.

Optioning backends that allow method overloading could bridge like the Java above, but how overloading works is very language specific and potentially creates other translation hazards.

TODO: draft rule above for signature narrowing

### Signature specialization

```ts
interface I<T extends AnyValue?> {
  f(): T;
}
class C(public x: String?) extends I<String?> {
  // The output type works because
  // I's <T> is String? in this context.
  public f(): String? { x }
}
```

Below considers each of the 3 kinds of type variables above (`!`, `?`, and `~`) × 5 cases (a non-nullable binding, a nullable binding, and a binding to a type parameter of each of the 3 kinds). By including other type parameters in our binding set, we also explore when one type parameter can have another as an upper bound: `<T, U extends T>`.

For each of these cases, we have a header like "`<U!>` v `Foo`", meaning `<U!>` binds to `Foo`. Here's what each of those lefts and rights mean:

Lefts:

- `<T~>`: a type parameter that translates to an agnostic type: may be an option or not
- `<U!>`: a type parameter that translates to a non-option type on optioning backends
- `<V?>`: a type parameter that translates to an option type on optioning backends

Rights:

- `Foo`: an actual binding for the left type parameter which is a non-nullable type
- `Foo?`: an actual binding which is a nullable type
- `<W~ extends >`: the actual binding for the left parameter is `W`, which is itself a reference to a type parameter declared in the same or a narrower scope than the left parameter and which has it as an upper bound
- `<W! extends >`: same, but `W` is not an option
- `<W? extends >`: same, but `W` is an option type

#### `<T~>` v `Foo`: OK

```ts
interface I<T~> { f(): T; }
class C extends I<Foo> { f(): Foo {...} }
//----
interface I<T> { f(): T }
class C extends I<Foo> { f(): Foo {...} }
```

#### `<T~>` v `Foo?`: OK

```ts
interface I<T~> { f(): T; }
class C extends I<Foo?> { f(): Foo? {...} }
//----
interface I<T> { f(): T }
class C extends I<Option<Foo>> { f(): Option<Foo> {...} }
```

#### `<T~>` v `<W~ extends >`: OK

```ts
interface I<T~> { f(): T; }
class C<W~> extends I<W> { f(): W {...} }
//----
interface I<T> { f(): T }
class C<W> extends I<W> { f(): W {...} }
```

#### `<T~>` v `<W! extends >`: OK

```ts
interface I<T~> { f(): T; }
class C<W!> extends I<W> { f(): W {...} }
//----
interface I<T> { f(): T }
class C<W> extends I<W> { f(): W {...} }
```

#### `<T~>` v `<W? extends >`: OK

Option shows up in the extends clause.

```ts
interface I<T~> { f(): T; }
class C<W?> extends I<W> { f(): W {...} }
//----
interface I<T> { f(): T }
class C<W> extends I<Option<W>> { f(): Option<W> {...} }
```

#### `<U!>` v `Foo`: OK

```ts
interface I<U!> { f(): U; }
class C extends I<Foo> { f(): Foo {...} }
//----
interface I<U> { f(): U }
class C extends I<Foo> { f(): Foo {...} }
```

#### `<U!>` v `Foo?`: Banned

```ts
interface I<U!> { f(): U; }
class C extends I<Foo?> { f(): Foo? {...} }
```

#### `<U!>` v `<W~ extends >`: Banned

```ts
interface I<U!> { f(): U; }
class C<W~> extends I<W> { f(): W {...} }
```

#### `<U!>` v `<W! extends >`: OK

```ts
interface I<U!> { f(): U; }
class C<W!> extends I<W> { f(): W {...} }
//----
interface I<U> { f(): U }
class C<W> extends I<W> { f(): W {...} }
```

#### `<U!>` v `<W? extends >`: Banned

```ts
interface I<U!> { f(): U; }
class C<W?> extends I<W> { f(): W {...} }
```

#### `<V?>` v `Foo`: Banned. Just use `Foo?`

```ts
interface I<V?> { f(): V; }
class C extends I<Foo> {
  f(): Foo? {...} // Need `?` here
}
//----
interface I<V> { f(): Option<V> }
class C extends I<Foo> { // No Option here
  f(): Option<Foo> {...} // Option here
}
```

#### `<V?>` v `Foo?`: OK

```ts
interface I<V?> { f(): V; }
class C extends I<Foo?> { f(): Foo? {...} }
//----
interface I<V> { f(): Option<V> }
class C extends I<Foo> { // No Option here
  f(): Option<Foo> {...}
}
```

#### `<V?>` v `<W~ extends >`: Banned

```ts
interface I<V?> { f(): V; }
class C<W~> extends I<W> { f(): W {...} }
//----
interface I<V> { f(): Option<V> }
class C<W> extends I<W> { f(): Option<COULD_NEST> {...} }
```

#### `<V?>` v `<W! extends >`: OK with `W?` as binding

```ts
interface I<V?> { f(): V; }
class C<W!> extends I<W?> { f(): W? {...} }
//----
interface I<V> { f(): Option<V> }
class C<W> extends I<W> { // No Option here
  f(): Option<W> {...}
}
```

#### `<V?>` v `<W? extends >`: OK

```ts
interface I<V?> { f(): V; }
class C<W?> extends I<W> { f(): W {...} }
//----
interface I<V> { f(): Option<V> }
class C<W> extends I<W> { f(): Option<W> {...} }
```

## Void vs Null

Short answer: No, it is not possible to have *Unit* semantics for *Void*, because on Python *None* is used for both concepts, and the below Temper code would not preserve semantics:

```ts
let f<T!>(x: T?, y: T?): Int {
  var n = 0;
  if (x != null) { n++ }
  if (y != null) { n++ }
  n
}

f(void, null) == 1
```

That would translate to Python3:

```py
def f[T](x: T | None, y: T | None) -> int:
    n = 0
    if x is not None:
        n += 1
    if y is not None:
        n += 1
    return n

# `void` and `null` both became `None`, so f returns 0 and this is False.
f(None, None) == 1
```

----

Can Temper transparently have *Unit*-style semantics for its *Void* type?

`null` and `void` are both singleton values. `void` often means a function has no outputs. *Unit* semantics allow zero-value-returning functions to be unary: to have a result. There is a special, stateless, singleton "unit" value that is the output of any function called for its side effect instead of its result.
This is important in generic programming: when every function has exactly one result, it's easier to use functions with higher-order functions.

In Java, `void` cannot bind to a type parameter, but the pseudo-reference-type `java.lang.Void` can. If you want to use a void-returning function-like type, e.g. *Runnable*, where a unary function like *Supplier\<Void>* is expected, you have to adapt it.

```java
Runnable r = () -> {};
// Adapt the void-returning Runnable to a Supplier<Void> by returning null.
Supplier<Void> s = () -> { r.run(); return null; };
```

If Temper's *Void* type had *Unit* semantics, then Temper authors could avoid explicit adapting, and backends that have a *Unit* type could use it for *Void*.

For backends that disallow binding their *void* type to type parameters, we could do the following:

1. Ensure that any calls to `void` functions that are inputs to other calls are extracted to the block root and replaced with the `void` literal in a way that preserves order of operations.
2. Identify and mark methods that return `Void` because they specialize a supertype method:

   ```ts
   interface I<T> { f(): T; }
   class C extends I<Void> { f(): Void { ... } }
   ```

3. Identify where a void-returning function value is passed where a `T`-returning function is expected, and mark it as needing auto-adaption on some backends.

Unfortunately, we can't have unit semantics internally within Temper's frontend. Many dynamic languages conflate the concepts of null and void.

- Python3, as noted above, uses `None` for both.
- JavaScript uses `undefined` for `void`, and assigning distinct semantics to `null` and `undefined` makes for brittle library APIs.
- Lisps often treat the empty list, `nil`, as both.
- Lua?

As noted above, if Temper `Void` has Unit semantics, then `Void?` is a mentionable type, and we need to be able to distinguish, at runtime, the translation of `void` from the translation of `null`.

Temper will probably have a singleton *Unit* type/value (tentatively called *Empty*/*empty*) in its standard library that is distinct from `null`.
Internally, it will reserve *Void* as an output-only type but allow *Unit* as an internal type. The TmpL middle-end may auto-adapt void functions to unit-returning functions for backends that benefit from that to allow for the illusion of unit semantics, but backends will need to grapple with both concepts.
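For contrast with the Python3 translation above, a backend with a real unit type can keep the two concepts apart. The Rust sketch below is illustrative only: it assumes Temper `void` translates to Rust's `()` and `null` to `None`, which is an assumption rather than a description of the current Rust backend. `call_twice` is a hypothetical higher-order helper showing why unit results help generic code.

```rust
/// Sketch of the Temper `f<T!>(x: T?, y: T?): Int` example, assuming
/// `void` becomes `Some(())` and `null` becomes `None`.
pub fn f<T>(x: Option<T>, y: Option<T>) -> i32 {
    let mut n = 0;
    if x.is_some() {
        n += 1;
    }
    if y.is_some() {
        n += 1;
    }
    n
}

/// With unit semantics, a side-effecting closure already returns `()`,
/// so it can be passed to a generic `R`-returning parameter without an
/// adapter like Java's `Supplier<Void>` wrapper around a `Runnable`.
pub fn call_twice<R>(mut g: impl FnMut() -> R) -> Vec<R> {
    vec![g(), g()]
}
```

Here `f(Some(()), None)` counts one non-null argument, matching the Temper intent of `f(void, null) == 1`, and `call_twice` collects two unit results from a closure that exists only for its side effect.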