Pattern matching WG

# Pattern matching WG Members: [@raph-amiard](https://github.com/raph-amiard), [@clairedross](https://github.com/clairedross), [@roldak](https://github.com/roldak), [@yakobowski](https://github.com/yakobowski) ## Overview ## Syntax Have a “moving target” syntax for our pattern matching proposal. Current proposal: ### Syntax for matching composite types Our proposal uses the syntax for aggregates to match composite values, extended with `<>` to represent the `others` (which will match any value). ```ada case Arr is when (1, 2, 3) => null; when (others => 12) => null; when (1 | 2 => 4, 3 .. 8 => <>) => null; when <> => null; end case; ``` You can also match records, including variant records. ```ada type Opt (Has_Value : Boolean) is record case Has_Value is when True => Val : Int; when others => null; end case; end record; I : Opt := ...; case I is when (Has_Value => True, Val => 2) => null; when (Has_Value => False) => null; end case; ``` ### Syntax for bindings values To bind a value, the simple form is: ``` pattern `as` binding_name [`:` subtype_indication] ``` ```ada case I is when (Has_Value => True, <> as Val : Integer) => null; when (Has_Value => False) => null; end case; ``` We would like to have a short form for the “others” pattern (expressed as `<>`) when binding a name: ``` <binding_name> ``` ```ada case I is when (Has_Value => True, <Val>) => null; when (Has_Value => False) => null; end case; ``` ### Syntax for dereferencing accesses You can match an access value with either: ``` `null` | `(` `all` `=>` pattern `)` ``` ```ada function Add (A, B : Int_Access) return Integer is begin case (A, B) is when ((all => <A>), (all => )) => A + B when ((all => <A>), null) or (null, (all => <A>)) => A when (null, null) => 0 end case; end Add; ``` Note: not sure about the all syntax, but we don’t have a better idea that is readable and composable at the moment. We tried playing with “.all” but it didn’t work out well. One idea would be to have `all` as a prefix operator: ```ada function Add (A, B : Int_Access) return Integer is begin case (A, B) is when (all <A>, all ) => A + B when (all <A>, null) or (null, all <A>) => A when (null, null) => 0 end case; end Add; ``` ### Syntax for matching tagged types Basically using qualified expressions: ```ada function Add (A, B : Maybe (Integer)) return Integer is begin case Exp is when Bin_Op'(<L>, <Op>, <R>) => ... when Call'(<Callee>, (Some, <Args>)) => ... when Call'(<Callee>, None) => ... end case; end Add; ``` Additionally, would be nice to be able to use just the type name, as done for subtypes currently: ```ada function Add (A, B : Maybe (Integer)) return Integer is begin case Exp is when Bin_Op => ... when Call => ... when Un_Op as U => ... end case; end Add; ``` ### Syntax for matching single discriminant record ? When matching a maybe type, or any type that has a valid value that is only represented by one discriminant value, we have to use the long form `(Disc => Value)`. In the case of the `Maybe` type, this is very heavy, as evidenced in the example below: ```ada function Add (A, B : Maybe (Integer)) return Integer is begin case (A, B) is when ((Some, <A>), (Some, )) => A + B when ((Some, <A>), (Kind => None)) or ((Kind => None), (Some, <A>)) => A when ((Kind => None), (Kind => None)) => 0 end case; end Add; ``` **NOTE**: Actually in the example above in can be worked around by the fact that the only remaining value is None in every case we use it, so we can just use `<>` ```ada function Add (A, B : Maybe (Integer)) return Integer is begin case (A, B) is when ((Some, <A>), (Some, )) => A + B when ((Some, <A>), <>) or (<>, (Some, <A>)) => A when (<>, <>) => 0 end case; end Add; ``` This alleviates the concern somehow, since you can probably expect that the empty alternative in discriminated records will always be the last match. However this will mess up with people trying to enforce complete coverage by manually matching every case. Ideas so far Romain proposed allowing to put the discriminant first, before the aggregate, which would make it look like a ML constructor, or a function call in Ada. Claire said that in this case having this syntax for everything: ```ada when (Some(<A>), None) or (None, Some(<A>)) => A ``` The problem with that syntax is that it’s highly ambiguous in the regular aggregate case. It looks like a function call, but would be very hard to implement in terms of function calls If several discriminated records use the same enum discriminant Another possibility that arose would be using subtypes: ```ada generic type T is private; package Opt is type Opt_Kind is (Kind_None, Some); type Maybe (Kind : Opt_Kind) is record case Kind is when Some => Val : T; when Kind_None => null; end case; end record; subtype None is Maybe (Kind_None); end Opt; X : Int_Option := Some'(Value => 2); function Add (A, B : Maybe (Integer)) return Integer is begin case (A, B) is when ((Some, <A>), (Some, )) => A + B when ((Some, <A>), None) | (None, (Some, <A>)) => A when (None, None) => 0 end case; end Add; function Add (A, B : Maybe (Integer)) return Integer is begin case (A, B) is when ((Some, let A), (Some, let B)) => A + B when ((Some, let A), None) | (None, (Some, let A)) => A when (None, None) => 0 end case; end Add; ``` **Decision: Using a subtype for None seems like a good enough solution that can be implemented with existing language facilities.** ## Semantics ### Semantics of bindings values #### If we consider that a variable binding is like a rename: This causes potential safety issues, which already exist and are why you cannot rename a component of a mutable discriminated record. Is the user allowed to mutate the renamed component ? #### If we consider that a variable binding is like a copy: Consider that the copy is “constant” by default #### Do we want to allow both ? Rename semantics is desirable because then you can mutate matched components But then you cannot use pattern matching on mutable discriminated records, which is a huge cons. Is adding the complexity of two kind of bindings worth it ? Maybe have `as` for copy-like bindings and `renames` for rename bindings ? Answer: That’s horrible because this would imply "pattern `as` binding_name" but "binding_name `renames` pattern" ### Sequential vs partitioned semantics for branch matching Claire & Raphael, while writing the original AI, have found a paradigm for matching that allows saving some of the original philosophy behind the Ada case statement’s rules, while allowing some expressiveness. The idea is that each pattern sub-rule must go from specific to more general. The order of branches doesn’t influence execution (cannot, really, because either branches are non overlapping, either the order in which branches are written is the only possible one). The rules in short are: Matches and sub matches should generally be ordered from less general to more general. When that is not the case, and you go from a general match to a specific match, and that generates overlapping cases, you need to make sure that those cases are also explicitly handled. The question is whether we should explore this idea or dig into the ML family pattern matching direction, where order of branches matter, overlap can happen, and only dead branches are discovered. Claire mentioned live that the Ada-like fashion is still more safe, while less expressive. The rationale is similar to the one used for contract cases, where we want only one case to apply at each time for readability. If you have a case with -100 .. 0 and another with 0 .. 100, it is interesting to make sure that the programmer has considered the case of 0 and decided what behavior he wanted here. And assuming that the pattern matching statement is big, it would be helpful to know when we have a constrained pattern that we necessarily end up here for all matched values: ```ada case X : Integer is when Positive => ...; when Negative => ...; -- if this one is far from the others, it would be nice to be sure when -- reading it that we will enter this alternative each time we have -1, -- 0 or 1. when -1 .. 1 => ...; end case; ``` Raph: Romain has pointed out that if the -1 .. 1 range above is treated like 3 different branches, the dead branch warning would catch this. If we consider it this way, the true difference between linear matching a la OCaml and what we are proposing is that in OCaml you will only get a warning when the compiler is sure that there is a dead branch (which cannot be verified in all cases because of guards - maybe amongst other features), whereas in Ada, you are forbidden to write anything that can generate such uncertainty, and hence the order *must not* matter. In other words, if you remove cases that cannot be statically evaluated (such as branches) and make the dead branch detector mandatory, you basically have the same system that we are proposing (I think). Claire: As discussed live, either I don’t understand or don’t agree with the above. If two cases are overlapping, and none are dead, the dead branch detector won’t complain, as in the case below if you remove (1,1). ```ada case P : Point is when (1, 1) => when (1, <>) => when (<>, 1) => end case; ``` Boris looked at ~100 complex pattern-matching in the code of Why3, trying to find examples where this semantics would be unacceptably more verbose, and did not find any. The increase in verbosity was always acceptable. **Decision: Try to implement the explicit overlap model. See if it’s implementable, and ergonomic.** ### Semantics of matched values & completeness Matching tagged types would anyway imply matching potentially infinite number of cases. Also, it is felt amongst members of the WG that the restrictions on real types is kind of artificial anyway. **So we propose completely removing any restrictions on the types that can be matched, and instead replacing by a completeness check that will try to detect uncovered cases depending on the type being matched.** ## Open questions