# [GT4Py] Unified type system

###### tags: `cycle 18`

- Shaped by: [Peter]

## Motivation

### Shortcomings of the current system

- Two independent type systems: one in the frontend and one in iterator IR
- Frontend type system:
  - Is not extendable
    - Inferring the type from objects or type hints is done by a bunch of if statements
  - Is not complete
    - Lack of representation for various integral types
    - Lack of representation for offset providers
    - Lack of representation for tensor memory layout
    - Incomplete callable types: offset providers not represented in callable type
      - This makes it unsuitable for caching
  - Poor code quality
    - Inference from objects and type hints is done by ifs and matches
    - Dead code (`STRING` scalar type?)
    - Poor naming (`primitive_constituents`)
    - Dummy types: `DeferredType`
    - Mixing constraints into the type classes: `DataType`, `CallableType`, `DeferredType`
- Iterator IR type system:
  - DISCLAIMER: I don't know much about this
  - Superfluous features:
    - Constraints are not needed

### Why we need a good type system

- Native compilation:
  - Most stages after the Python AST rely on concrete typing information
    - frontend IR transforms
    - iterator IR transforms (e.g. determining type of shift, estimating volume of data transfer to calculate fusion benefits)
    - SDFG generation
    - C++ binding generation
  - It's crucial that these passes can readily and easily access typing information
- Caching:
  - Concrete typing information is a necessary (though not sufficient) input to all SDFG and GTFN caching
- Consistency:
  - Different passes on the same IR should adhere to the same rules regarding typing (e.g. `common_type` should always be the same)
- Extendability:
  - Changing IRs and API should not be prevented by an inflexible and cumbersome type system

### Objectives

- Promote code reuse by having a single typing framework instead of multiple ones
- Make sure the typing framework is extendable
  - Should be possible to define new types in submodules
  - Should be possible to register object & hint inference hooks in submodules
- Define a set of types that captures information needed by IRs
  - Full spectrum of integral types
  - Tensor memory layouts (static/dynamic size, stride, [offset])
  - [TBD] Tensor memory residency (CPU, CUDA, AMDGPU, etc.)
  - Offset provider types
- Find a system to define behaviors of types
  - Type casting: lossy and lossless, implicit and explicit
  - Determine callable invocability (can be simplified by redesigning AST type inference)
  - Determine type-operation compatibility (e.g. can I call `operator+` with a string?)
  - Should be extendable by new types and behaviors defined in submodules

## Design

### Type hierarchy

- Type
  - IntegerType
    - SignedIntegerType
      - Int64Type
      - ...
    - UnsignedIntegerType
      - Uint64Type
      - ...
  - FloatType
    - Float64Type
    - ...
  - FieldLikeType/TensorLikeType/ShapedType
    - FieldType/TensorType -- needed by DaCe
    - OffsetedFieldType/OffsetedTensorType -- needed by frontend
  - TupleType
  - StructType -- can represent the dict of offset providers
  - FunctionType -- see the notes
  - shift-related:
    - DimensionOffsetProviderType
    - NeighborTableOffsetProviderType
    - StridedNeighborOffsetProviderType

Notes:

- `DeferredType` should be replaced by a `Constraint`/`Trait`/`Concept` object that's unrelated to the actual `Type` class, or simply by `Optional[Type]` if constraints are not needed.
- The distinction between `ProgramType`, `FieldOperatorType` and `ScanOperatorType` can be moved into the IR by having `CallFieldOperator` and `CallScanOperator` nodes. This would simplify IR passes and allow having only a single `FunctionType`.

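
To make the hierarchy above more tangible, here is a minimal sketch of how a few of its nodes could be written as frozen dataclasses. The class names follow the list, but the fields (`width`, `dims`, `dtype`, `elements`, `params`, `result`) are assumptions made for illustration and are not part of the proposal. Frozen dataclasses are used here because they are hashable, which would fit the caching motivation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Type:
    """Root of the hierarchy; frozen so that instances are hashable."""


@dataclass(frozen=True)
class IntegerType(Type):
    width: int  # assumed field: size in bits


@dataclass(frozen=True)
class SignedIntegerType(IntegerType):
    pass


@dataclass(frozen=True)
class Int64Type(SignedIntegerType):
    width: int = 64  # concrete leaf type with a fixed default width


@dataclass(frozen=True)
class FloatType(Type):
    width: int  # assumed field: size in bits


@dataclass(frozen=True)
class Float64Type(FloatType):
    width: int = 64


@dataclass(frozen=True)
class FieldType(Type):
    dims: tuple[str, ...]  # assumed field: dimension names
    dtype: Type  # assumed field: element type


@dataclass(frozen=True)
class TupleType(Type):
    elements: tuple[Type, ...]


@dataclass(frozen=True)
class FunctionType(Type):
    params: tuple[Type, ...]
    result: Type


# Usage: structurally equal types compare equal and can serve as cache keys.
f64 = Float64Type()
add_fun_ty = FunctionType(params=(f64, f64), result=f64)
cache = {add_fun_ty: "compiled add"}
assert cache[FunctionType((f64, f64), f64)] == "compiled add"
```

Dataclass equality also compares the class itself, so `Int64Type()` and `SignedIntegerType(64)` stay distinct even though their fields match.
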

**Objectives**:

- Find an initial set of types that can at least support the current frontend

### Behavior

#### Ways to represent/define behavior

We can look at existing systems for inspiration:

- C++'s concept system: define a set of named criteria and add functions that determine if a type satisfies a criterion
- C++ `<type_traits>`: similar to C++ concepts, but shabbier
- Rust's trait system: define a set of named criteria and add types that advertise which criteria they satisfy

References:

- https://github.com/pretzelhammer/rust-blog/blob/master/posts/tour-of-rusts-standard-library-traits.md
- https://en.cppreference.com/w/cpp/header/type_traits
- https://en.cppreference.com/w/cpp/language/constraints

Example using a C++ type_traits/concepts style:

```python
# Sketch: `FunctionType`, `Type`, `is_convertible` and the `*_ty` values
# are assumed to be defined elsewhere.
def is_invocable(fun: FunctionType, *args: Type) -> bool:
    if len(args) != len(fun.params):
        return False
    for arg, param in zip(args, fun.params):
        if not is_convertible(arg, param):  # delegating to 'is_convertible'
            return False
    return True


# Usage
if is_invocable(add_fun_ty, f64_ty, f64_ty):
    ...
```

Example using Rust-like traits:

```python
# Sketch: `Type`, `Trait`, `From` and the `*_ty` values are assumed to be
# defined elsewhere.
def implements_trait(ty, trait) -> bool:
    # `ty` must derive from the trait's class before its criteria are checked
    if isinstance(ty, type(trait)):
        return type(trait).implements(ty, trait)
    return False


class InvocableTrait(Trait):
    args: list[Type]

    def __init__(self, args: list[Type]):
        self.args = args

    def implements(self, other: "InvocableTrait") -> bool:
        if len(self.args) != len(other.args):
            return False
        for arg, other_arg in zip(self.args, other.args):
            if not implements_trait(arg, From(other_arg)):  # delegating to 'From'
                return False
        return True


class FunctionType(Type, InvocableTrait):
    def __init__(self, params: list[Type]):
        InvocableTrait.__init__(self, params)


# Usage
if implements_trait(add_fun_ty, InvocableTrait([f64_ty, f64_ty])):
    ...
```

#### Most important behaviors

- Can a function be called with a set of arguments
  - Applying only implicit conversions
  - C++: `std::is_invocable`
- Can a type be converted to another
  - Implicitly (lossless conversions)
  - Explicitly (lossy, but meaningful conversions)
  - No (conversion doesn't make sense)
  - C++: `std::convertible_to`, `std::is_constructible`, Rust: `From` & `Into`
- Can a type participate in an operation
  - `operator+` can't, for example, add two functions
  - C++: `std::is_arithmetic`, Rust: `Add`

#### Considerations regarding time investment

The API typing is complex:

- Polymorphic/constrained argument types
- Vararg functions
- Implicit conversions

The machine code typing is simple:

- Concrete types only
- "Templates" (vararg, polymorphic/constrained) instantiated
- No implicit conversions

We should concretize type information as soon as possible (i.e. allow implicit conversion only in the very first IR), because it substantially simplifies lower level IRs. If complex typing is confined to a small section of the frontend, then we can get away with a much less extendable implementation of the complex parts.

**Objectives**:

- Determine best approach to represent basic behavior
- Take possible future requirements into account
  - Currently, API functions must be fully typed by Python type annotations
  - There is the idea of adding generic type annotations (e.g. a field operator that can be called with any integer type or any arithmetic type)

### Conversion from hints and instances

**Objectives**:

- Implement `from_type_hint` and `from_instance` in a scalable way
  - One way would be to introduce a type inferrer class with which one can register pairs of IR types and Python types

## Tasks & limitations

- [necessary] Design and implement the type system
- [necessary] Add proper unit tests for the type system
- [optional] Replace frontend IR types with new type system (would be nice)
- [probably not] Replace iterator IR types with new type system (don't know the implications)
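
For the `from_type_hint` / `from_instance` objective under "Conversion from hints and instances", a registry-based inferrer might look like the sketch below. `TypeInferrer`, its `register` method and the placeholder `Int64Type` / `Float64Type` classes are hypothetical names chosen for illustration; they are not existing GT4Py API. The point is that submodules could register their own pairs of Python types and IR types instead of extending a chain of if statements.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Type:
    """Placeholder root for IR types."""


@dataclass(frozen=True)
class Int64Type(Type):
    pass


@dataclass(frozen=True)
class Float64Type(Type):
    pass


class TypeInferrer:
    """Maps Python types to IR types through registered hooks (hypothetical API)."""

    def __init__(self) -> None:
        self._hooks: dict[type, Callable[[], Type]] = {}

    def register(self, py_type: type, factory: Callable[[], Type]) -> None:
        # A later registration for the same Python type replaces the earlier one.
        self._hooks[py_type] = factory

    def from_type_hint(self, hint: type) -> Optional[Type]:
        # Exact lookup: only hints that were registered verbatim are recognized.
        factory = self._hooks.get(hint)
        return factory() if factory is not None else None

    def from_instance(self, obj: object) -> Optional[Type]:
        # Walk the MRO so that subclasses of a registered Python type also match.
        for cls in type(obj).__mro__:
            if cls in self._hooks:
                return self._hooks[cls]()
        return None


# Usage: each submodule registers the pairs it knows about.
inferrer = TypeInferrer()
inferrer.register(int, Int64Type)
inferrer.register(float, Float64Type)

assert inferrer.from_type_hint(float) == Float64Type()
assert inferrer.from_instance(3) == Int64Type()
assert inferrer.from_type_hint(str) is None
```

Because `from_instance` walks the MRO, a `bool` instance resolves to the `int` hook in this sketch; a real implementation would likely need an explicit `bool` registration to avoid that.
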