
All your const belong to the type system

TLDR

Duplicate ty::Const into ty::Const and mir::Const. The former becomes a tree with integer leaves; the latter loses all forms of "optimized" representation and just refers to the plain const allocation backing the constant.

Motivation

Currently ty::Const has multiple ways of representing the same value. That by itself isn't bad, but in practice we actually do end up representing the same value in different ways. This breaks the compiler's invariant that if two interned things are equal, their interned addresses are the same. The problem is very visible when you have two constants representing a pointer value. Functions that are const generic over pointer values can arbitrarily cause linker errors, because the code defining the generic function may compute a different symbol than the code calling it.

Guts and other gory details

ty::Const's val field is of type ConstKind, whose Value variant contains a ConstValue. This ConstValue will be replaced by

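enum ConstValue<'tcx> {
    // A scalar: an integer, bool or char.
    Leaf(u128),
    // A struct, tuple, array, enum or pointer.
    Node {
        // The active variant for enum values; None otherwise.
        variant: Option<VariantIdx>,
        // The fields or elements, each encoded recursively.
        elements: &'tcx [ConstValue<'tcx>],
    },
}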

This change will shrink the size of ConstValue from its current 32 bytes to 24 bytes.
A value of struct, tuple or array type will be represented as a Node, where all the fields or elements are again encoded as ConstValue. Only integers, bools and char are allowed as Leaf values.
A value of enum type will be represented as a Node where the variant field is set to the active enum variant's index and the enum variant's fields are encoded just like struct fields.
A pointer is represented as a Node with a single child, so (42,) and &42 are represented exactly the same; only the type at the ty::Const level differs. While we could have a separate Pointer(&'tcx ConstValue<'tcx>) variant, there's little use in making this explicit. Not having it reduces code duplication everywhere the difference is not relevant, without affecting the places where it is relevant.
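The encoding rules above can be tried out with a minimal, self-contained sketch. It is not the compiler's code: it uses an owned Vec instead of the arena-allocated &'tcx slice, a hypothetical VariantIdx newtype standing in for rustc's, and hypothetical helper names (aggregate, enum_variant, pointer). It also assumes that true is stored as the leaf 1 and that Some has variant index 1.

```rust
/// Stand-in for rustc's `VariantIdx` (assumption: a plain index newtype).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct VariantIdx(u32);

/// Owned version of the proposed tree; the proposal itself borrows
/// `&'tcx [ConstValue<'tcx>]` from the compiler's arena.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum ConstValue {
    Leaf(u128),
    Node {
        variant: Option<VariantIdx>,
        elements: Vec<ConstValue>,
    },
}

impl ConstValue {
    /// Struct, tuple or array: a node without a variant index.
    fn aggregate(elements: Vec<ConstValue>) -> Self {
        ConstValue::Node { variant: None, elements }
    }

    /// Enum value: a node carrying the active variant's index and its fields.
    fn enum_variant(variant: u32, fields: Vec<ConstValue>) -> Self {
        ConstValue::Node { variant: Some(VariantIdx(variant)), elements: fields }
    }

    /// Pointer: a node with exactly one child, the pointee.
    fn pointer(pointee: ConstValue) -> Self {
        ConstValue::aggregate(vec![pointee])
    }
}

/// The tree for `(42,)`.
fn one_tuple() -> ConstValue {
    ConstValue::aggregate(vec![ConstValue::Leaf(42)])
}

/// The tree for `&42`.
fn reference() -> ConstValue {
    ConstValue::pointer(ConstValue::Leaf(42))
}

/// The tree for `Some(true)` (assumes `Some` is variant 1 and `true` is 1).
fn some_true() -> ConstValue {
    ConstValue::enum_variant(1, vec![ConstValue::Leaf(1)])
}
```

In this sketch one_tuple() and reference() compare equal, which is the point of not having a separate Pointer variant: each value has exactly one tree, and only the type stored next to it tells a tuple from a reference.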

A lossy conversion from mir::Const to ty::Const will be introduced. It will be used for pretty printing mir::Const during mir dumps. This way we only have to support pretty printing ty::Const, which is trivial to pretty print.
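To illustrate why printing the tree form is simple, here is a rough sketch of a type-directed printer. Everything in it is a simplified assumption rather than rustc's API: the toy Ty enum stands in for Ty<'tcx>, the tree uses owned Vecs and a bare u32 variant index, and pretty is a hypothetical helper name.

```rust
/// Owned, simplified stand-in for the proposed value tree.
#[derive(Clone, Debug, PartialEq, Eq)]
enum ConstValue {
    Leaf(u128),
    Node { variant: Option<u32>, elements: Vec<ConstValue> },
}

/// Toy type description (assumption); rustc would consult `Ty<'tcx>` instead.
#[derive(Clone, Debug)]
enum Ty {
    U32,
    Bool,
    Tuple(Vec<Ty>),
    Ref(Box<Ty>),
}

/// Render a value tree, using the type to pick the surface syntax.
/// Returns `None` if the shape of the tree does not match the type.
fn pretty(ty: &Ty, val: &ConstValue) -> Option<String> {
    match (ty, val) {
        (Ty::U32, ConstValue::Leaf(n)) => Some(n.to_string()),
        (Ty::Bool, ConstValue::Leaf(0)) => Some("false".to_string()),
        (Ty::Bool, ConstValue::Leaf(1)) => Some("true".to_string()),
        (Ty::Tuple(tys), ConstValue::Node { variant: None, elements })
            if tys.len() == elements.len() =>
        {
            // Print every field with its own type; bail out on any mismatch.
            let parts: Option<Vec<String>> = tys
                .iter()
                .zip(elements)
                .map(|(t, v)| pretty(t, v))
                .collect();
            let parts = parts?;
            if parts.len() == 1 {
                // One-element tuples need the trailing comma.
                Some(format!("({},)", parts[0]))
            } else {
                Some(format!("({})", parts.join(", ")))
            }
        }
        (Ty::Ref(inner), ConstValue::Node { variant: None, elements })
            if elements.len() == 1 =>
        {
            // A pointer is a node with a single child: print the pointee.
            Some(format!("&{}", pretty(inner, &elements[0])?))
        }
        _ => None,
    }
}
```

The same single-child node prints as (42,) under a tuple type and as &42 under a reference type, so the printer needs no information beyond the type and the tree.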

mir::Const will be changed to just become the old ConstValue::ByRef representation in case of ConstKind::Value. We can likely also remove other variants from the new mir::ConstKind: I believe Infer, Bound and Placeholder are only used in typeck and are irrelevant for MIR.

Future extensions

  • Function pointers in const generics can be supported by adding a Function(Instance<'tcx>) variant to ConstValue.
  • Move parts of ConstKind into ConstValue, allowing things like (42, N) to be encoded. Currently a const generic parameter can only be encoded if the constant is exactly that parameter and nothing else.
    • Similar designs may be feasible for associated constants
  • Make the mir::Const -> ty::Const conversion fallible instead of lossy. Whenever there would be a loss of information, resort to verbose printing of the mir::Const.

Alternative designs

  • ty::Const contains another ty::Const at every level. This would allow values like (42, N) to be encoded in ty::Const.
    • I worry that this will keep causing bugs where we accidentally encode the same constant in different ways, which is why I'm proposing the explicit design.