PHP RFC: Shapes

Introduction

It is not uncommon to see functions with a lot of optional configuration. The named arguments RFC partially solved this problem for simpler cases; however, named arguments are not practical when there are many (10+) possible configuration options.

A commonly used solution to this problem is to introduce array $options = [] as the last argument of a function. This comes with its own costs; the disadvantages were clearly stated in the aforementioned RFC:

  • For constructors in particular, it precludes usage of constructor promotion.
  • The available options are not documented in the signature. You have to look at the implementation or phpdoc to find out what is supported and what types it requires. Phpdoc also provides no universally recognized way to document this.
  • The type of the option values is not validated unless manually implemented. In the above example, the types will actually be validated due to the use of property types, but this will not follow usual PHP semantics (e.g. if the class declaration uses strict_types, the options will also be validated according to strict_types).
  • Unless you go out of your way to protect against this, passing of unknown options will silently succeed.
  • Use of an options array requires a specific decision at the time the API is introduced. If you start off without one, but then add additional optional parameters and realize that using an options array would be cleaner, you cannot perform the switch without breaking existing API users.

Some solutions to these problems already exist in userland, for example the excellent OptionsResolver component from the Symfony framework. This proposal aims to address most of these issues with an in-language solution that can also be applied to a wider spectrum of cases.

Proposal

This RFC proposes adding structural typing to the PHP language by introducing the concept of array shapes. A shape can be thought of as an interface for an array. This RFC also generalizes the typed arrays concept described in the arrayof RFC. As mentioned earlier, shapes can easily be used to type most $options arrays and other well-defined arrays. Because arrays in PHP act loosely like JavaScript objects, the semantics of shapes are very similar to those of TypeScript interfaces.

The shape of an array

Every array is simply a set of key-value pairs and is the simplest possible anonymous data structure in PHP. Different array values can be more or less similar to each other. Let's consider a few examples:

$arr1 = ["name" => "John", "age" => 23];
$arr2 = ["name" => "Kate", "age" => 17];
$arr3 = ["total" => 30, "items" => [ ... ]];

As far as equality (==) is concerned, each of these arrays is different from the others, but clearly $arr1 and $arr2 are very similar to each other. Their sets of keys are the same, and the types associated with each key are also the same. We can say that $arr1 and $arr2 have the same shape, and that the shape of $arr3 is different.

Based on that example, we can define the notion of an array shape as a set of (key, type) pairs. The shape of an array is not statically assigned to the array the way, for example, a class is assigned to an object; rather, it is an inherent property of the value.
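The notion of a shape as a set of (key, type) pairs can be approximated in current PHP. The following is a minimal sketch, not part of the proposal: shape_of() is a hypothetical helper, and the elided "items" value of $arr3 is replaced by an empty array purely for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of the RFC): maps every key to the
// name of the type of its value, giving a set of key/type pairs.
function shape_of(array $array): array
{
    // With a single input array, array_map() preserves the keys.
    return array_map('get_debug_type', $array);
}

$arr1 = ["name" => "John", "age" => 23];
$arr2 = ["name" => "Kate", "age" => 17];
$arr3 = ["total" => 30, "items" => []]; // "items" content is a placeholder

var_dump(shape_of($arr1) == shape_of($arr2)); // bool(true)
var_dump(shape_of($arr1) == shape_of($arr3)); // bool(false)
```

The values differ in every case, yet the first comparison succeeds because only keys and type names are compared.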

Array shapes can also be different but compatible. Let's say that we have some function:

function greet(array $user) {
    echo "Hello {$user['name']}. ";
    
    if ($user["age"] > 18) {
        echo "Here is your beer!";
    } else {
        echo "Here is your candy!";
    }
}

This function expects $user to have two keys:

  • "name", which must be a string
  • "age", which must be an int

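We can, however, safely pass $arr4 = ["name" => "Kate", "age" => 22, "position" => "Developer"]. The shape of $arr4 is different from that of $arr1 and $arr2, but it is compatible with what greet() expects: the required keys are present with the right types, and the extra "position" key is simply ignored.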

Syntax and examples

[final] shape <name> [extends <parent1>, <parent2>, ...] {
    key[?][, key[?], ...]: type;
    key[?][, key[?], ...]: type;
    ...
} 

where key can be:
 - string literal (e.g. "key"), 
 - integer literal (e.g. 0),
 - default keyword (for every other key)

The main reason behind this syntax is to resemble the syntax of array creation, with => value replaced by : type and entries separated by ; instead of ,. A shape can contain optional keys, denoted by ? after the key definition. Any valid type can be used, including interfaces, classes and other shapes (creating nested structures). As an additional bonus, this syntax is almost identical to the syntax of interfaces in TypeScript.

shape FooShape {
    "str": string; // value associated with "str" key must be a string
    "num"?: int;   // value associated with optional "num" key must be an integer
}

// examples:
["str" => "string", "num" => 1] // is FooShape shaped
["str" => "string"] // is FooShape shaped, "num" key is optional
["str" => 2, "num" => 1] // is not FooShape shaped, "str" is not a string
["num" => 2] // is not FooShape shaped, "str" key is required
["str" => "string", "num" => 1, "foo" => "bar"] // is FooShape shaped, FooShape is not final and thus allows more keys
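The rules for required keys, optional keys and extra keys shown above can be emulated in userland today. This is only a sketch under assumptions: matches_open_shape() is a hypothetical function, types are compared by the exact name reported by get_debug_type(), and the RFC does not say whether real shape checks would be strict or coercive.

```php
<?php
declare(strict_types=1);

// Hypothetical userland emulation of an open (non-final) shape.
// $required and $optional map keys to type names as reported by get_debug_type().
function matches_open_shape(array $value, array $required, array $optional = []): bool
{
    foreach ($required as $key => $type) {
        // A required key must exist and hold a value of the declared type.
        if (!array_key_exists($key, $value) || get_debug_type($value[$key]) !== $type) {
            return false;
        }
    }
    foreach ($optional as $key => $type) {
        // An optional key is only checked when it is present.
        if (array_key_exists($key, $value) && get_debug_type($value[$key]) !== $type) {
            return false;
        }
    }
    return true; // undeclared keys are ignored because the shape is not final
}

// Emulates FooShape: "str" is required, "num" is optional.
$required = ["str" => "string"];
$optional = ["num" => "int"];

var_dump(matches_open_shape(["str" => "string", "num" => 1], $required, $optional)); // bool(true)
var_dump(matches_open_shape(["str" => "string"], $required, $optional));             // bool(true)
var_dump(matches_open_shape(["str" => 2, "num" => 1], $required, $optional));        // bool(false)
var_dump(matches_open_shape(["num" => 2], $required, $optional));                    // bool(false)
var_dump(matches_open_shape(["str" => "string", "num" => 1, "foo" => "bar"], $required, $optional)); // bool(true)
```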

Multiple keys can be declared in a single entry (similar to multiple conditions in the match expression v2 RFC):

shape PaginationDTO {
    "count", "total"?: int; // multiple keys, both int type and one is optional
    "items": array;
}

// examples:
["count" => 1, "items" => []] // is PaginationDTO shaped

Shapes also allow defining a default type for all keys that are not explicitly declared:

// shape with one required "count" key that must be an int,
// while all other values must be strings.
shape ExtendableShape {
    "count": int;
    default: string;
}

// examples:
["count" => 5] // is ExtendableShape shaped
["count" => 5, "foo" => "bar"] // is ExtendableShape shaped, value of "foo" key matches default type of string
["count" => 5, "foo" => 5] // is not ExtendableShape shaped, value of "foo" key does not match default type of string
["foo" => "bar", "bar" => "foo"] // is not ExtendableShape shaped, required "count" key is missing
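The effect of a default type can be sketched in current PHP as a fallback used for every key without an explicit declaration. The function name matches_with_default() and the comparison by get_debug_type() names are assumptions made for this illustration only.

```php
<?php
declare(strict_types=1);

// Hypothetical userland emulation of a shape with a default type.
// $declared maps required keys to type names; $defaultType applies to all other keys.
function matches_with_default(array $value, array $declared, string $defaultType): bool
{
    // Every explicitly declared key is required in this sketch.
    foreach ($declared as $key => $type) {
        if (!array_key_exists($key, $value)) {
            return false;
        }
    }
    // Every present value must match its declared type, or the default type.
    foreach ($value as $key => $item) {
        $expected = $declared[$key] ?? $defaultType;
        if (get_debug_type($item) !== $expected) {
            return false;
        }
    }
    return true;
}

// Emulates ExtendableShape: "count" must be int, everything else string.
$declared = ["count" => "int"];

var_dump(matches_with_default(["count" => 5], $declared, "string"));                     // bool(true)
var_dump(matches_with_default(["count" => 5, "foo" => "bar"], $declared, "string"));     // bool(true)
var_dump(matches_with_default(["count" => 5, "foo" => 5], $declared, "string"));         // bool(false)
var_dump(matches_with_default(["foo" => "bar", "bar" => "foo"], $declared, "string"));   // bool(false)
```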

This can also be used to define arrays whose values all have a specific type:

shape IntArray {
    default: int;
}

// examples:
[1, 2, 3] // is IntArray shaped, only ints
[1, 2, "foo"] // is not IntArray shaped, "foo" is not an integer

By using only consecutive integer keys it is possible to properly define types for tuples:

shape IntStringPair {
   0: int;
   1: string;
}

// examples:
[1, "foo"] // is IntStringPair shaped
["bar", "foo"] // is not IntStringPair shaped, value of 0 key is not an integer

Just like for interfaces, the FQN of a shape can be obtained using ::class.

namespace Foo\Bar;

shape FooShape {
    "str": string;
    "num"?: int; 
}

echo FooShape::class; // Foo\Bar\FooShape

Final shapes

Often we want an array of an exact shape that does not allow any keys that are not explicitly declared. This can be achieved by marking the shape as final. Like final classes, final shapes cannot be extended.

final shape KeyValuePair {
    "key": string;
    "value": mixed;
}

// examples:
["key" => "test", "value" => 1] // is KeyValuePair shaped
["key" => "test", "value" => "test"] // is KeyValuePair shaped
["key" => "foo", "value" => "bar", "note" => "not allowed"] // is not KeyValuePair shaped, shape is final and "note" key is not declared in the shape
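The "no undeclared keys" rule of a final shape maps naturally onto array_diff_key() in current PHP. The sketch below is a hypothetical emulation: matches_final_shape() and the predicate-per-key representation are assumptions chosen so that mixed can be expressed as "accept anything".

```php
<?php
declare(strict_types=1);

// Hypothetical userland emulation of a final shape (not the RFC's implementation).
// $declared maps each required key to a predicate that accepts its value.
function matches_final_shape(array $value, array $declared): bool
{
    // A final shape rejects every key that is not declared.
    if (array_diff_key($value, $declared) !== []) {
        return false;
    }
    foreach ($declared as $key => $accepts) {
        if (!array_key_exists($key, $value) || !$accepts($value[$key])) {
            return false;
        }
    }
    return true;
}

// Emulates KeyValuePair: "key" is a string, "value" is mixed.
$keyValuePair = [
    "key"   => 'is_string',
    "value" => static fn (mixed $v): bool => true, // mixed accepts anything
];

var_dump(matches_final_shape(["key" => "test", "value" => 1], $keyValuePair));      // bool(true)
var_dump(matches_final_shape(["key" => "test", "value" => "test"], $keyValuePair)); // bool(true)
var_dump(matches_final_shape(
    ["key" => "foo", "value" => "bar", "note" => "not allowed"],
    $keyValuePair
)); // bool(false)
```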

To some extent this could also be used for declaring arrays of a given length, but it would not be feasible for larger arrays. It could be possible to later introduce a range syntax for keys, such as 1..15, but that feature is out of scope for this RFC, as PHP does not support range syntax at this point.

Inheritance of shapes

Like interfaces and classes, shapes can inherit from one another. A shape should be allowed to inherit from multiple shapes, as is possible with interfaces. The rules of inheritance should be identical to those applied to properties of classes, in order to comply with the Liskov Substitution Principle.

shape Foo {
   "foo": string;
}

shape FooBar extends Foo {
   "bar": string;
}

// Allowed, functionally equivalent to:
shape FooBar {
   "foo": string;
   "bar": string;
}

shape FooInt extends Foo {
   "foo": int; // Not allowed, type of FooInt["foo"] must be string (as in shape Foo)
}

shape IntArray {
   default: int;
}

shape Options extends IntArray {
   "length": int;
   "name": string; // Not allowed, type of Options["name"] must be int (as in shape IntArray)
}

shape A {
    "a": string;
}

shape B {
    "b": int;
}

shape C {
    "c": IntArray;
}

shape ABC extends A, B, C { }
// Allowed, functionally equivalent to:
shape ABC {
    "a": string;
    "b": int;
    "c": IntArray;
}

final shape KeyValuePairWithNote extends KeyValuePair {
    "note"?: string;
}
// Not allowed, KeyValuePair is marked as final
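The inheritance rules can be pictured as merging key/type maps while refusing to change the type of an inherited key, which mirrors the invariance of typed class properties. This is a rough sketch under assumptions: extend_shape() is hypothetical, shapes are modelled as plain key => type-name arrays, and default types and the final modifier are left out. Treating a disagreement between two parents as an error is also an assumption, since the RFC does not address that case.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: merges parent shapes with a child's own entries.
// A key that is already declared may not be redeclared with another type.
function extend_shape(array $own, array ...$parents): array
{
    $merged = [];
    foreach ([...$parents, $own] as $shape) {
        foreach ($shape as $key => $type) {
            if (isset($merged[$key]) && $merged[$key] !== $type) {
                throw new LogicException(
                    "Type of key '$key' must be {$merged[$key]}, got $type"
                );
            }
            $merged[$key] = $type;
        }
    }
    return $merged;
}

$a = ["a" => "string"];
$b = ["b" => "int"];
$c = ["c" => "IntArray"];

// Emulates: shape ABC extends A, B, C { }
var_dump(extend_shape([], $a, $b, $c) === ["a" => "string", "b" => "int", "c" => "IntArray"]); // bool(true)

// Emulates: shape FooInt extends Foo { "foo": int; }
try {
    extend_shape(["foo" => "int"], ["foo" => "string"]);
} catch (LogicException $e) {
    echo $e->getMessage(), "\n"; // Type of key 'foo' must be string, got int
}
```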

Alternative syntax choices

There are a few alternatives to the proposed syntax:

// more like interfaces and classes
shape FooShape {
    string $str;
    int $num?;
}

// even more like array
shape FooShape {
    "str"  => string;
    "num"? => int;
}

// types before keys
shape FooShape {
    string "str";
    int "num"?;
}

Of these, only the "even more like array" variant is a feasible alternative and can be considered. A syntax similar to interfaces or classes would be hard to use with default and integer keys and, in the author's personal opinion, is less readable than the suggested syntax.

It is also possible to create a syntax resembling the type alias syntax found in TypeScript:

type KeyValuePair = [
    "key": string;
    "value": mixed;
]

However, even though this syntax is technically feasible, it should be part of a wider type alias RFC and become an alternative syntax, just as type aliases and interfaces coexist in TypeScript.

Shape validation and enforcing

As mentioned earlier, the shape of an array is an inherent property of a specific array value and, unlike the class of an object, is not stored anywhere. It is therefore not possible to enforce the shape of an array for its entire lifetime.

Compatibility of shapes is checked in the same scenarios as for primitive types:

  • when calling a function or method with type-hinted parameters;
  • when returning a value from a function or method with a declared return type;
  • when assigning to a typed object property;
  • when explicitly checking with the instanceof operator or the is_shape() function.

Type hinted parameter, reference parameter and return type

When calling a function or method with a parameter type-hinted as a shape, shape compatibility is checked only at call time and is not enforced afterwards. It is therefore possible to modify the array in a way that breaks the shape contract:

function contractBreaking(KeyValuePair $pair) {
    var_dump($pair instanceof KeyValuePair); // always bool(true)
    $pair["note"] = "test"; // valid, but then...
    var_dump($pair instanceof KeyValuePair); // bool(false)
}

This is exactly the same behavior as for any primitive type. The contract can also be broken through references, just as it can for primitive types:

function contractBreakingByReference(KeyValuePair &$pair) {
    $pair["note"] = "test"; // valid, but then...
}

$pair = ["key" => "string", "value" => "test"];
var_dump($pair instanceof KeyValuePair); // bool(true)
contractBreakingByReference($pair);
var_dump($pair instanceof KeyValuePair); // bool(false), shape was changed
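The comparison with primitive types can be verified in current PHP without any shape support. In the short sketch below the function names retype() and retypeByReference() are illustrative only; it shows that a parameter type is checked when the function is called and not on later assignments, including assignments through a reference.

```php
<?php
declare(strict_types=1);

function retype(int $n): string
{
    $n = "not an int anymore"; // allowed: the int type is only checked on call
    return get_debug_type($n);
}

function retypeByReference(int &$n): void
{
    $n = "changed"; // also allowed, and visible to the caller
}

var_dump(retype(5)); // string(6) "string"

$value = 1;
retypeByReference($value);
var_dump($value); // string(7) "changed"
```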

Return types work exactly as expected: they are checked when the return statement is executed.

function contractBreakingWithReturn(KeyValuePair $pair): KeyValuePair {
    $pair["note"] = "test";
    return $pair;
}

$broken = contractBreakingWithReturn(["key" => "string", "value" => "test"]);
// error: returned value was expected to be KeyValuePair shaped

instanceof and is_shape function

The shape of an array can be checked using the instanceof operator, just like the class of an object:

shape FooShape {
    "str": string; // value associated with "str" key must be a string
    "num"?: int;   // value associated with optional "num" key must be integer 
}

// examples:
["str" => "string", "num" => 1] instanceof FooShape; // true
["str" => "string"] instanceof FooShape; // true
["str" => 2, "num" => 1] instanceof FooShape; // false
["num" => 2] instanceof FooShape; // false
["str" => "string", "num" => 1, "foo" => "bar"] instanceof FooShape; // true

It is also possible to use the is_shape(array $array, string $shape) function:

is_shape(["str" => "string", "num" => 1], FooShape::class); // true

Reflection

TBD

Grammar definition (WIP)

shape = ["final"] "shape" T_STRING ["extends" parent-list] "{" shape-entries-list "}";
      
shape-entries-list = shape-entry
                   | shape-entry shape-entries-list
                   ;
                   
shape-entry = identifier-list ":" type-declaration ";";
            
identifier-list = identifier ["?"]
                | identifier ["?"] "," identifier-list
                ;
                
identifier = string-literal | int-literal | T_DEFAULT;

Backward Incompatible Changes

What breaks, and what is the justification for it?

Proposed PHP Version(s)

PHP 8.1

RFC Impact

To SAPIs

None

To Existing Extensions

None

To Opcache

It is necessary to develop RFC's with opcache in mind, since opcache is a core extension distributed with PHP.

Please explain how you have verified your RFC's compatibility with opcache.

New Constants

None

php.ini Defaults

None

Open Issues

Typed properties

As described earlier, most type checking in PHP happens only at specific points, for example when calling functions. This is not the case for typed properties. Maintaining the contract would require checking shape correctness every time the array changes, i.e. whenever a key is inserted or deleted or the value of a key is changed. This could cause real implementation trouble.
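To make the cost concrete, the following sketch emulates a shape-typed property in userland by re-validating the whole array on every write. Everything here is hypothetical and for illustration only: the GuardedArray class, its $checks counter and the $isKeyValuePair predicate are not part of the proposal, and an engine implementation could work quite differently.

```php
<?php
declare(strict_types=1);

// Hypothetical wrapper that keeps an array valid against a predicate
// by running the full validation again after every modification.
final class GuardedArray implements ArrayAccess
{
    public int $checks = 0; // number of full validations performed

    public function __construct(private array $data, private Closure $validator)
    {
        $this->guard($data);
    }

    private function guard(array $candidate): void
    {
        $this->checks++;
        if (!($this->validator)($candidate)) {
            throw new TypeError("Array no longer matches the expected shape");
        }
    }

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->data[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->data[$offset];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        $candidate = $this->data;
        if ($offset === null) {
            $candidate[] = $value; // $array[] = $value
        } else {
            $candidate[$offset] = $value;
        }
        $this->guard($candidate); // validate before committing the write
        $this->data = $candidate;
    }

    public function offsetUnset(mixed $offset): void
    {
        $candidate = $this->data;
        unset($candidate[$offset]);
        $this->guard($candidate);
        $this->data = $candidate;
    }
}

// Predicate emulating the final KeyValuePair shape.
$isKeyValuePair = static fn (array $a): bool =>
    array_diff_key($a, ["key" => 1, "value" => 1]) === []
    && array_key_exists("value", $a)
    && is_string($a["key"] ?? null);

$pair = new GuardedArray(["key" => "test", "value" => 1], $isKeyValuePair);
$pair["value"] = 2;         // accepted after another full validation
try {
    $pair["note"] = "test"; // rejected: "note" is not declared
} catch (TypeError $e) {
    echo $e->getMessage(), "\n";
}
echo $pair->checks, "\n"; // 3
```

Three validations were needed for one construction and two writes, which suggests why enforcing shapes on every mutation of a property could be expensive for large arrays.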

Unaffected PHP Functionality

List existing areas/features of PHP that will not be changed by the RFC.

This helps avoid any ambiguity, shows that you have thought deeply about the RFC's impact, and helps reduces mail list noise.

Future Scope

Typed array alias, for example int[]

This feature would make implementing a feature like T[] trivial: it would only require creating a shape with the default type T.

Proposed Voting Choices

Should shapes be introduced into the language? Yes / No; a 2/3 majority is required.

Patches and Tests

Links to any external patches and tests go here.

If there is no patch, make it clear who will create a patch, or whether a volunteer to help with implementation is needed.

Make it clear if the patch is intended to be the final patch, or is just a prototype.

For changes affecting the core language, you should also provide a patch for the language specification.

Implementation

After the project is implemented, this section should contain

  • the version(s) it was merged into
  • a link to the git commit(s)
  • a link to the PHP manual entry for the feature
  • a link to the language specification section (if any)

References

Rejected Features

Keep this updated with features that were discussed on the mail lists.

Changes