# PHP RFC: Shapes

* Version: 0.1
* Date: 2021-09-18
* Author: Kacper Donat <donat.kacper@gmail.com>
* Status: Draft
* First Published at: http://wiki.php.net/rfc/shapes

## Introduction

It is not uncommon to see functions with a lot of optional configuration. The [named arguments RFC] partially solved this problem for simpler cases, but named arguments become impractical when there are many (10+) possible configuration options. A commonly used solution is to introduce `array $options = []` as the last argument of the function. This comes with its own cost, and the disadvantages were clearly stated in the aforementioned RFC:

> - For constructors in particular, it precludes usage of constructor promotion.
> - The available options are not documented in the signature. You have to look at the implementation or phpdoc to find out what is supported and what types it requires. Phpdoc also provides no universally recognized way to document this.
> - The type of the option values is not validated unless manually implemented. In the above example, the types will actually be validated due to the use of property types, but this will not follow usual PHP semantics (e.g. if the class declaration uses strict_types, the options will also be validated according to strict_types).
> - Unless you go out of your way to protect against this, passing of unknown options will silently succeed.
> - Use of an options array requires a specific decision at the time the API is introduced. If you start off without one, but then add additional optional parameters and realize that using an options array would be cleaner, you cannot perform the switch without breaking existing API users.

Solutions to some of these problems already exist in userland, for example the excellent [Option Resolver] component from the Symfony framework. This proposal aims to address most of these issues with an in-language solution that can also be applied to a wider spectrum of cases.

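
To make the quoted drawbacks concrete, here is a minimal sketch of the `array $options = []` pattern in current PHP. The function `paginate()` and its `page` and `perPage` options are hypothetical and exist only for illustration; the point is that both the unknown-key check and the type check have to be written by hand.

```php
<?php
declare(strict_types=1);

// Hypothetical function using the `array $options = []` pattern.
function paginate(array $items, array $options = []): array
{
    $defaults = ["page" => 1, "perPage" => 10];

    // Unknown options are rejected only because we check for them by hand.
    $unknown = array_diff_key($options, $defaults);
    if ($unknown !== []) {
        throw new InvalidArgumentException(
            "Unknown option(s): " . implode(", ", array_keys($unknown))
        );
    }

    // Fill in defaults for the options that were not passed.
    $options += $defaults;

    // Option types are validated only because we check them by hand.
    foreach ($options as $name => $value) {
        if (!is_int($value)) {
            throw new InvalidArgumentException("Option \"$name\" must be int.");
        }
    }

    $offset = ($options["page"] - 1) * $options["perPage"];

    return array_slice($items, $offset, $options["perPage"]);
}

var_dump(paginate(range(1, 25), ["page" => 3]) === [21, 22, 23, 24, 25]); // bool(true)
```

None of the accepted keys or their types are visible in the signature, which is the gap this proposal aims to close.
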
## Proposal

This RFC proposes to add structural typing to the PHP language by introducing the concept of array `shapes`. A shape can be thought of as an interface for an array. This RFC is also a generalization of the typed arrays concept described in the [arrayof RFC].

As mentioned earlier, shapes can easily be used to type most `$options` arrays and other well-defined arrays. Because PHP arrays behave loosely like JavaScript objects, the semantics of shapes are very similar to those of [TypeScript interfaces](https://www.typescriptlang.org/docs/handbook/interfaces.html).

### The shape of an array

Every array is simply a set of _key_ and _value_ pairs and is the simplest possible anonymous data structure in PHP. Different array values can be more or less similar to each other. Let's consider a few examples:

```php
$arr1 = ["name" => "John", "age" => 23];
$arr2 = ["name" => "Kate", "age" => 17];
$arr3 = ["total" => 30, "items" => [ ... ]];
```

As far as equality (`==`) is concerned, each of these arrays is different from the others, but clearly `$arr1` and `$arr2` are very similar to each other. Their sets of keys are the same, and the types associated with each key are also the same. We can say that `$arr1` and `$arr2` have the same shape, and that the shape of `$arr3` is different.

Following that example, we can define the shape of an array as its set of (_key_, _type_) pairs. The shape is not statically assigned to an array the way a class is assigned to an object; it is an inherent property of the value.

Array shapes can also be different but compatible. Let's say that we have some function:

```php
function greet(array $user) {
    echo "Hello {$user['name']}.";

    if ($user["age"] > 18) {
        echo "Here is your beer!";
    } else {
        echo "Here is your candy!";
    }
}
```

This function expects `$user` to have two keys:

- `"name"` being a string
- `"age"` being an int

We can however safely pass `$arr4 = ["name" => "Kate", "age" => 22, "position" => "Developer"]`.
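
The notion of a shape as a set of (_key_, _type_) pairs can be sketched in current PHP. The helper `shapeOf()` below is hypothetical and not part of this proposal; it relies on `get_debug_type()` (PHP 8.0+) and only inspects the top level of the array.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: maps every key to the type name of its value.
function shapeOf(array $array): array
{
    // array_map() preserves keys when it is given a single array.
    return array_map('get_debug_type', $array);
}

$arr1 = ["name" => "John", "age" => 23];
$arr2 = ["name" => "Kate", "age" => 17];
$arr4 = ["name" => "Kate", "age" => 22, "position" => "Developer"];

// Same keys with the same types: the shapes are equal.
var_dump(shapeOf($arr1) == shapeOf($arr2)); // bool(true)

// $arr4 has an additional key, so its shape is not equal...
var_dump(shapeOf($arr1) == shapeOf($arr4)); // bool(false)

// ...but every (key, type) pair of $arr1 is also present in $arr4.
var_dump(array_diff_assoc(shapeOf($arr1), shapeOf($arr4)) === []); // bool(true)
```
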
The shape of `$arr4` is different from that of `$arr1` and `$arr2`, but it is compatible with what the function expects.

### Syntax and examples

```
[final] shape <name> [extends <parent1>, <parent2>, ...] {
    key[?][, key[?], ...]: type;
    key[?][, key[?], ...]: type;
    ...
}

where key can be:
  - string literal (e.g. "key"),
  - integer literal (e.g. 0),
  - default keyword (for every other key)
```

The main reason behind this syntax is to resemble the syntax of array creation, where `=> value` is replaced by `: type` and entries are separated by `;` instead of `,`. A shape can contain optional keys, denoted by `?` after the key definition. Any valid type can be used, including interfaces, classes and other shapes (creating nested structures). As an additional bonus, this syntax is almost identical to the syntax of interfaces found in [TypeScript](https://www.typescriptlang.org/docs/handbook/interfaces.html).

```php
shape FooShape {
    "str": string; // value associated with "str" key must be a string
    "num"?: int;   // value associated with optional "num" key must be integer
}

// examples:
["str" => "string", "num" => 1] // is FooShape shaped
["str" => "string"]             // is FooShape shaped, "num" key is optional
["str" => 2, "num" => 1]        // is not FooShape shaped, "str" is not a string
["num" => 2]                    // is not FooShape shaped, "str" key is required
["str" => "string", "num" => 1, "foo" => "bar"] // is FooShape shaped, FooShape is not final and thus allows more keys
```

It is allowed to declare multiple keys in one line (similar to multiple conditions in the [match expression v2 RFC]):

```php
shape PaginationDTO {
    "count", "total"?: int; // multiple keys, both int type and one is optional
    "items": array;
}

// examples:
["count" => 1, "items" => []] // is PaginationDTO shaped
```

Shapes also allow defining a default type for all keys that are not explicitly declared:

```php
// shape with one required "count" key that must be int
// while all other values must be strings.
shape ExtendableShape {
    "count": int;
    default: string;
}

// examples:
["count" => 5]                   // is ExtendableShape shaped
["count" => 5, "foo" => "bar"]   // is ExtendableShape shaped, value of "foo" key matches default type of string
["count" => 5, "foo" => 5]       // is not ExtendableShape shaped, value of "foo" key does not match default type of string
["foo" => "bar", "bar" => "foo"] // is not ExtendableShape shaped, required "count" key is missing
```

This can also be used to define types for arrays whose values all have one specific type:

```php
shape IntArray {
    default: int;
}

// examples:
[1, 2, 3]     // is IntArray shaped, only ints
[1, 2, "foo"] // is not IntArray shaped, "foo" is not an integer
```

By using only consecutive integer keys it is possible to properly define types for tuples:

```php
shape IntStringPair {
    0: int;
    1: string;
}

// examples:
[1, "foo"]     // is IntStringPair shaped
["bar", "foo"] // is not IntStringPair shaped, value of 0 key is not an integer
```

Just as with interfaces, the fully qualified name of a shape can be obtained using `::class`.

```php
namespace Foo\Bar;

shape FooShape {
    "str": string;
    "num"?: int;
}

echo FooShape::class; // Foo\Bar\FooShape
```

#### Final shapes

Often we want an array of an exact shape that does not allow any keys that are not explicitly declared. This can be achieved by marking the shape as final. Final shapes, like final classes, **cannot be extended**.

```php
final shape KeyValuePair {
    "key": string;
    "value": mixed;
}

// examples:
["key" => "test", "value" => 1]      // is KeyValuePair shaped
["key" => "test", "value" => "test"] // is KeyValuePair shaped
["key" => "foo", "value" => "bar", "note" => "not allowed"] // is not KeyValuePair shaped, shape is final and "note" key is not declared in the shape
```

To some extent this could also be used for declaring arrays of a given length, but that would not be feasible for larger arrays.
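
The matching rules shown in the examples above (required keys, optional keys, the `default` type and `final`) can be approximated in current PHP. The sketch below is a userland illustration only, not the proposed implementation: the function `matchesShape()` and its parameters are hypothetical, types are compared by name via `get_debug_type()` (so subclasses and nested shapes are not handled), and letting the default type take precedence over `final` is an assumption, since the RFC does not specify how the two interact.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland approximation of the proposed shape check.
 * Types are plain names as returned by get_debug_type(), or "mixed".
 */
function matchesShape(
    array $value,
    array $required,
    array $optional = [],
    ?string $default = null,
    bool $isFinal = false
): bool {
    $typeOk = static fn (mixed $item, string $type): bool =>
        $type === "mixed" || get_debug_type($item) === $type;

    // Every required key must be present with a matching type.
    foreach ($required as $key => $type) {
        if (!array_key_exists($key, $value) || !$typeOk($value[$key], $type)) {
            return false;
        }
    }

    // Remaining keys are optional, covered by the default type, or extra.
    foreach (array_diff_key($value, $required) as $key => $item) {
        if (array_key_exists($key, $optional)) {
            if (!$typeOk($item, $optional[$key])) {
                return false;
            }
        } elseif ($default !== null) {
            if (!$typeOk($item, $default)) {
                return false;
            }
        } elseif ($isFinal) {
            return false; // final shapes reject undeclared keys
        }
    }

    return true;
}

// FooShape: "str": string; "num"?: int;
var_dump(matchesShape(["str" => "string"], ["str" => "string"], ["num" => "int"])); // bool(true)
var_dump(matchesShape(["num" => 2], ["str" => "string"], ["num" => "int"]));        // bool(false)

// IntArray: default: int;
var_dump(matchesShape([1, 2, "foo"], [], [], "int")); // bool(false)

// final KeyValuePair: "key": string; "value": mixed;
var_dump(matchesShape(
    ["key" => "foo", "value" => "bar", "note" => "not allowed"],
    ["key" => "string", "value" => "mixed"],
    isFinal: true
)); // bool(false)
```
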
It could be possible to later introduce a range syntax for keys, like `1..15`, but this feature is out of scope for this RFC, as PHP does not support range syntax at this point.

### Inheritance of shapes

Shapes, like interfaces and classes, can inherit from one another. Inheriting from multiple shapes should be allowed, as it is for interfaces. The rules of inheritance should be identical to those applied to class properties, in order to comply with the [Liskov Substitution Principle].

```
shape Foo {
    "foo": string;
}

shape FooBar extends Foo {
    "bar": string;
}
// Allowed, functionally equivalent to:
shape FooBar {
    "foo": string;
    "bar": string;
}

shape FooInt extends Foo {
    "foo": int; // Not allowed, type of FooInt["foo"] must be string (as in shape Foo)
}

shape IntArray {
    default: int;
}

shape Options extends IntArray {
    "length": int;
    "name": string; // Not allowed, type of Options["name"] must be int (as in shape IntArray)
}

shape A { "a": string; }
shape B { "b": int; }
shape C { "c": IntArray; }

shape ABC extends A, B, C { }
// Allowed, functionally equivalent to:
shape ABC {
    "a": string;
    "b": int;
    "c": IntArray;
}

final shape KeyValuePairWithNote extends KeyValuePair {
    "note"?: string;
}
// Not allowed, KeyValuePair is marked as final
```

### Alternative syntax choices

There are a few alternatives to the proposed syntax:

```
// more like interfaces and classes
shape FooShape {
    string $str;
    int $num?;
}

// even more like array
shape FooShape {
    "str" => string;
    "num"? => int;
}

// types before keys
shape FooShape {
    string "str";
    int "num"?;
}
```

Of these, the `even more like array` variant is the only feasible alternative and could be considered. A syntax similar to interfaces or classes would be hard to use with `default` and integer keys and, in the author's personal opinion, is less readable than the suggested syntax.

It is also possible to create a syntax resembling the [type alias](https://www.typescriptlang.org/docs/handbook/advanced-types.html#type-aliases) syntax found in TypeScript:

```php
type KeyValuePair = [
    "key": string;
    "value": mixed;
]
```

However, even though this syntax is technically feasible, it should be part of a wider type alias RFC and become an alternative syntax, just as [type aliases and interfaces](https://www.typescriptlang.org/docs/handbook/advanced-types.html#interfaces-vs-type-aliases) coexist in TypeScript.

### Shape validation and enforcing

As mentioned earlier, the shape of an array is an inherent property of that specific array and, unlike the class of an object, is not stored anywhere. Therefore it would not be possible to enforce the shape of an array for its entire lifetime. Compatibility of shapes is checked in the same scenarios as for primitive types:

- when calling a function or method with type-hinted parameters;
- when returning a value from a function or method with a declared return type;
- when assigning to a typed object property;
- when explicitly checking with the `instanceof` operator or the `is_shape()` function.

#### Type hinted parameter, reference parameter and return type

When calling a function or method with a parameter type-hinted to a shape, shape compatibility is checked on call only and is not enforced afterwards. It is therefore possible to change the array in a way that breaks the shape contract:

```php
function contractBreaking(KeyValuePair $pair) {
    var_dump($pair instanceof KeyValuePair); // always bool(true)
    $pair["note"] = "test"; // valid, but then...
    var_dump($pair instanceof KeyValuePair); // bool(false)
}
```

This is exactly the same behavior as for any primitive type. It is also possible to break the contract through references, just as it is for primitive types:

```php
function contractBreakingByReference(KeyValuePair &$pair) {
    $pair["note"] = "test"; // valid, but then...
}

$pair = ["key" => "string", "value" => "test"];

var_dump($pair instanceof KeyValuePair); // bool(true)
contractBreakingByReference($pair);
var_dump($pair instanceof KeyValuePair); // bool(false), shape was changed
```

Return types work exactly as expected and are checked when the return statement is executed.

```php
function contractBreakingWithReturn(KeyValuePair $pair): KeyValuePair {
    $pair["note"] = "test";
    return $pair;
}

$broken = contractBreakingWithReturn(["key" => "string", "value" => "test"]);
// error: returned value was expected to be KeyValuePair shaped
```

#### `instanceof` and `is_shape` function

The shape of an array can be checked using the `instanceof` operator, just like the class of an object:

```php
shape FooShape {
    "str": string; // value associated with "str" key must be a string
    "num"?: int;   // value associated with optional "num" key must be integer
}

// examples:
["str" => "string", "num" => 1] instanceof FooShape; // true
["str" => "string"] instanceof FooShape;             // true
["str" => 2, "num" => 1] instanceof FooShape;        // false
["num" => 2] instanceof FooShape;                    // false
["str" => "string", "num" => 1, "foo" => "bar"] instanceof FooShape; // true
```

It is also possible to use the `is_shape(array $array, string $shape)` function:

```php
is_shape(["str" => "string", "num" => 1], FooShape::class); // true
```

### Reflection

TBD

## Grammar definition (WIP)

```
shape = ["final"] "shape" T_STRING ["extends" parent-list] "{" shape-entries-list "}";

shape-entries-list = shape-entry
                   | shape-entry shape-entries-list
                   ;

shape-entry = identifier-list ":" type-declaration ";";

identifier-list = identifier ["?"]
                | identifier ["?"] "," identifier-list
                ;

identifier = string-literal | int-literal | T_DEFAULT;
```

## Backward Incompatible Changes

What breaks, and what is the justification for it?
## Proposed PHP Version(s)

PHP 8.1

## RFC Impact

### To SAPIs

None

### To Existing Extensions

None

### To Opcache

It is necessary to develop RFC's with opcache in mind, since opcache is a core extension distributed with PHP.

Please explain how you have verified your RFC's compatibility with opcache.

### New Constants

None

### php.ini Defaults

None

## Open Issues

### Typed properties

As described earlier, most type checking in PHP happens only at specific points, for example when calling functions. This is not the case for typed properties. Maintaining the contract would require checking shape correctness every time the array changes, i.e. when a key is inserted or deleted, or when the value of a key is changed. This could pose real implementation difficulties.

## Unaffected PHP Functionality

List existing areas/features of PHP that will not be changed by the RFC.

This helps avoid any ambiguity, shows that you have thought deeply about the RFC's impact, and helps reduce mail list noise.

## Future Scope

### Typed array alias, for example `int[]`

This RFC would make implementing a feature like `T[]` trivial: it would only require creating a shape with the default type of `T`.

## Proposed Voting Choices

Should shapes be introduced into the language? Yes / No, 2/3 majority required.

## Patches and Tests

Links to any external patches and tests go here.

If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed.

Make it clear if the patch is intended to be the final patch, or is just a prototype.

For changes affecting the core language, you should also provide a patch for the language specification.

## Implementation

After the project is implemented, this section should contain

- the version(s) it was merged into
- a link to the git commit(s)
- a link to the PHP manual entry for the feature
- a link to the language specification section (if any)

## References

- [named arguments RFC]

## Rejected Features

Keep this updated with features that were discussed on the mail lists.

## Changes

[named arguments RFC]: https://wiki.php.net/rfc/named_params
[arrayof RFC]: https://wiki.php.net/rfc/arrayof
[match expression v2 RFC]: https://wiki.php.net/rfc/match_expression_v2
[Liskov Substitution Principle]: https://en.wikipedia.org/wiki/Liskov_substitution_principle
[Option Resolver]: https://symfony.com/doc/current/components/options_resolver.html