# Object Shapes Update
We are writing with an update on the Object Shapes implementation, and to ask what needs to be done before we can merge our work. I have continued work on this alongside Aaron and Eileen.
## Code changes
[These](https://github.com/jemmaissroff/ruby/tree/object-shapes-prototyping) are our proposed code changes to implement Object Shapes in CRuby.
This patch adds an object shape implementation. Each object has a shape, which represents attributes of the object, such as which slots its ivars are stored in and whether the object is frozen. The inline caches are updated to use shape IDs as the key, rather than the class of the object. This means we don't have to read the class from the object to check IC validity. It also allows more cache hits in some cases, and will allow JITs to optimize instance variable reading and writing.
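To make the idea concrete, here is a toy model in plain Ruby of a shape tree and an inline cache keyed on shape ID. It is only a sketch: `Shape`, `ToyObject`, and `IvarReadCache` are hypothetical names invented for this illustration and do not correspond to the C structures in the patch.

```ruby
# Toy model only; none of these classes exist in CRuby.
class Shape
  @@next_id = 0
  attr_reader :id, :slots

  def initialize(slots = {})
    @id = @@next_id
    @@next_id += 1
    @slots = slots.freeze # ivar name => slot index
    @edges = {}           # ivar name => child Shape
  end

  # Transition taken when an object with this shape gains a new ivar.
  # The child shape is created once and shared by every object that
  # adds the same ivar from the same starting shape.
  def add_ivar(name)
    @edges[name] ||= Shape.new(@slots.merge(name => @slots.size))
  end
end

ROOT_SHAPE = Shape.new

class ToyObject
  attr_reader :shape, :fields

  def initialize
    @shape = ROOT_SHAPE
    @fields = []
  end

  def set(name, value)
    @shape = @shape.add_ivar(name) unless @shape.slots.key?(name)
    @fields[@shape.slots[name]] = value
  end
end

# Inline cache for a single "read ivar `name`" site, keyed on shape ID only.
class IvarReadCache
  attr_reader :hits, :misses

  def initialize(name)
    @name = name
    @shape_id = nil
    @index = nil
    @hits = 0
    @misses = 0
  end

  def read(obj)
    if obj.shape.id == @shape_id
      @hits += 1
    else
      @misses += 1
      @shape_id = obj.shape.id
      @index = obj.shape.slots[@name]
    end
    @index && obj.fields[@index]
  end
end

a = ToyObject.new
a.set(:@x, 1)
a.set(:@y, 2)
b = ToyObject.new
b.set(:@x, 10)
b.set(:@y, 20)

cache = IvarReadCache.new(:@y)
cache.read(a) # miss: fills the cache with a's shape ID and slot index
cache.read(b) # hit: b followed the same transitions, so it has the same shape
```

In this model the cache check is a single integer comparison against the object's shape ID, which is the property the patch relies on.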
The patch currently limits the number of available shape IDs to 65,536 (using 16 bits). We created a new IMEMO type that represents the shape, so shapes can be garbage collected. Collected shape IDs can be reused later.
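The ID budget and reuse can be pictured with a small allocator. This is a hypothetical sketch in Ruby, not the patch's C code: `ShapeIdAllocator` and its methods are invented names, and raising on exhaustion is an assumption made only so the sketch has defined behaviour when the IDs run out.

```ruby
# Hypothetical allocator for 16-bit shape IDs with reuse of collected IDs.
class ShapeIdAllocator
  MAX_SHAPES = 1 << 16 # 65,536 IDs fit in 16 bits

  def initialize
    @next_unused = 0
    @free_ids = []
  end

  # Prefer an ID released by a collected shape; otherwise take a fresh one.
  def allocate
    return @free_ids.pop unless @free_ids.empty?
    raise "out of shape IDs" if @next_unused >= MAX_SHAPES

    id = @next_unused
    @next_unused += 1
    id
  end

  # Stand-in for what happens when the GC collects a shape.
  def release(id)
    @free_ids.push(id)
  end
end

ids = ShapeIdAllocator.new
first = ids.allocate
second = ids.allocate
ids.release(first)    # the shape holding `first` was collected
reused = ids.allocate # hands `first` out again
```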
## CPU performance
We measured performance with microbenchmarks, [RailsBench](https://github.com/k0kubun/railsbench), and [YJIT bench](https://github.com/Shopify/yjit-bench). Here are the performance metrics we gathered.
These are all microbenchmarks which measure ivar performance:
```
$ make benchmark ITEM=vm_ivar
compare-ruby: ruby 3.2.0dev (2022-08-16T15:58:56Z master ac890ec062) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-08-16T20:12:55Z object-shapes-prot.. 872fa488c3) [arm64-darwin21]
# Iteration per second (i/s)
| |compare-ruby|built-ruby|
|:--------------------------|-----------:|---------:|
|vm_ivar | 98.231M| 102.161M|
| | -| 1.04x|
|vm_ivar_embedded_obj_init | 33.351M| 33.331M|
| | 1.00x| -|
|vm_ivar_extended_obj_init | 25.055M| 26.265M|
| | -| 1.05x|
|vm_ivar_generic_get | 18.374M| 17.215M|
| | 1.07x| -|
|vm_ivar_generic_set | 12.361M| 14.537M|
| | -| 1.18x|
|vm_ivar_of_class | 8.378M| 8.928M|
| | -| 1.07x|
|vm_ivar_of_class_set | 9.485M| 10.264M|
| | -| 1.08x|
|vm_ivar_set | 89.411M| 91.632M|
| | -| 1.02x|
|vm_ivar_init_subclass | 6.104M| 12.928M|
| | -| 2.12x|
```
To address the outliers above:
* `vm_ivar_generic_set` is faster because this patch adds inline caches to generic ivars, which did not exist previously
* `vm_ivar_init_subclass` is significantly faster because, with shapes, subclasses can hit caches (as class is no longer part of the cache key)
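The subclass case can be illustrated with ordinary Ruby. The classes below are made up for this example and are not the benchmark's source; they only show the pattern where one ivar write site is reached by instances of several classes.

```ruby
class Base
  def initialize
    # These two write sites are shared by every subclass of Base.
    @a = 1
    @b = 2
  end
end

class Left < Base; end
class Right < Base; end

# Instances of Left and Right alternate through the same write sites in
# Base#initialize. A cache keyed on class sees a different key each time,
# while all four objects set the same ivars in the same order and so end
# up with the same ivar layout.
objects = [Left.new, Right.new, Left.new, Right.new]
layouts = objects.map(&:instance_variables).uniq
```

`layouts` contains a single entry because every object received `@a` and then `@b`, regardless of its class.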
Object Shapes and Ruby master perform roughly the same on [RailsBench](https://github.com/k0kubun/railsbench).
In the following measurement, Ruby master served 1852.1 requests per second, while Object Shapes served 1842.7 requests per second, a difference of about 0.5%.
```
$ RAILS_ENV=production bin/bench
ruby 3.2.0dev (2022-08-15T14:00:03Z master 0264424d58) [arm64-darwin21]
1852.1
```
```
$ RAILS_ENV=production bin/bench
ruby 3.2.0dev (2022-08-15T15:20:22Z object-shapes-prot.. d3dbefd6cd) [arm64-darwin21]
1842.7
```
## Memory performance
Each Ruby object contains a shape ID. The shape ID corresponds to an index in an array, so we can easily look up the shape object given a shape ID. Currently, we have a fixed-size array which stores pointers to all active shapes (or NULL if the shape is yet to be used). That array is ~64k * `sizeof(uintptr_t)` (about 512 KiB with 8-byte pointers) and is currently a fixed-size overhead for the Ruby process.
Running an empty Ruby script, we can see this overhead. For instance:
On Ruby master:
```
$ /usr/bin/time -l ruby -v -e' '
ruby 3.2.0dev (2022-08-15T14:00:03Z master 0264424d58) [arm64-darwin21]
28639232 maximum resident set size
```
With the shapes branch:
```
$ /usr/bin/time -l ./ruby -v -e' '
ruby 3.2.0dev (2022-08-15T15:20:22Z object-shapes-prot.. d3dbefd6cd) [arm64-darwin21]
28917760 maximum resident set size
```
This is roughly a 0.97% memory increase on an empty Ruby script. Because the overhead is a fixed size, it would represent an even smaller relative increase on bigger Ruby processes.
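The figures in this section can be reproduced with a few lines of arithmetic. The 8-byte pointer size is an assumption (a 64-bit build); the resident set sizes are the two measurements shown above.

```ruby
SHAPE_TABLE_ENTRIES = 1 << 16 # 65,536 possible shape IDs
POINTER_SIZE = 8              # assumption: 64-bit build

# Size of the fixed shape table in bytes.
table_bytes = SHAPE_TABLE_ENTRIES * POINTER_SIZE

# Maximum resident set sizes reported by /usr/bin/time above.
master_rss = 28_639_232
shapes_rss = 28_917_760

delta = shapes_rss - master_rss
percent = (delta * 100.0 / master_rss).round(2)

puts "shape table: #{table_bytes} bytes"
puts "RSS increase: #{delta} bytes (#{percent}%)"
```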
## YJIT Statistics
We also ran YJIT-bench and got the following results:
On Ruby master:
```
end_time="2022-08-17 09:31:36 PDT (-0700)"
yjit_opts=""
ruby_version="ruby 3.2.0dev (2022-08-16T15:58:56Z master ac890ec062) [x86_64-linux]"
git_branch="master"
git_commit="ac890ec062"
------------- ----------- ---------- --------- ---------- ----------- ------------
bench interp (ms) stddev (%) yjit (ms) stddev (%) interp/yjit yjit 1st itr
30k_ifelse 2083.0 0.1 203.6 0.0 10.23 0.80
30k_methods 5140.1 0.0 476.7 0.1 10.78 3.95
activerecord 188.1 0.1 99.5 0.2 1.89 1.23
binarytrees 804.8 0.1 409.2 1.1 1.97 1.93
cfunc_itself 232.5 2.4 43.3 1.5 5.36 5.34
chunky_png 2316.9 0.2 757.3 0.3 3.06 2.86
erubi 412.1 0.4 281.3 1.0 1.46 1.47
erubi_rails 31.1 2.2 17.4 2.7 1.78 0.33
fannkuchredux 11414.6 0.2 2773.5 1.3 4.12 1.00
fib 591.8 1.1 41.7 4.5 14.20 13.93
getivar 234.2 3.1 23.5 0.1 9.95 1.00
hexapdf 4755.7 1.0 2517.3 3.0 1.89 1.51
keyword_args 520.7 0.6 54.6 0.2 9.55 9.24
lee 2274.1 0.2 1133.3 0.2 2.01 1.98
liquid-render 296.7 0.3 139.3 2.8 2.13 1.46
mail 212.9 0.1 127.9 0.1 1.66 0.72
nbody 225.4 0.2 78.3 0.2 2.88 2.70
optcarrot 14592.1 0.7 4072.8 0.3 3.58 3.43
psych-load 3947.8 0.0 2075.5 0.1 1.90 1.88
railsbench 2826.0 0.6 1774.4 1.9 1.59 1.26
respond_to 424.3 0.2 154.5 3.1 2.75 2.76
rubykon 22545.1 0.4 6993.5 1.3 3.22 3.24
setivar 185.9 5.6 97.0 0.0 1.92 1.00
str_concat 123.1 0.9 28.6 2.0 4.31 3.35
------------- ----------- ---------- --------- ---------- ----------- ------------
Legend:
- interp/yjit: ratio of interp/yjit time. Higher is better. Above 1 represents a speedup.
- 1st itr: ratio of interp/yjit time for the first benchmarking iteration.
```
With the shapes branch:
```
end_time="2022-08-16 13:56:32 PDT (-0700)"
yjit_opts=""
ruby_version="ruby 3.2.0dev (2022-08-15T18:35:34Z object-shapes-prot.. 51a23756c3) [x86_64-linux]"
git_branch="object-shapes-prototyping"
git_commit="51a23756c3"
------------- ----------- ---------- --------- ---------- ----------- ------------
bench interp (ms) stddev (%) yjit (ms) stddev (%) interp/yjit yjit 1st itr
30k_ifelse 2135.2 0.0 340.1 0.1 6.28 0.95
30k_methods 5180.7 0.0 906.2 0.1 5.72 3.56
activerecord 189.2 0.1 174.5 0.1 1.08 0.83
binarytrees 783.2 1.0 438.7 2.5 1.79 1.82
cfunc_itself 225.2 1.6 44.0 0.6 5.11 5.01
chunky_png 2394.9 0.2 1657.0 0.2 1.45 1.44
erubi 418.1 0.5 284.3 1.1 1.47 1.45
erubi_rails 31.6 1.5 26.2 2.1 1.21 0.34
fannkuchredux 12208.5 0.1 2821.6 0.4 4.33 0.99
fib 565.7 0.3 41.3 0.1 13.69 13.59
getivar 247.6 0.1 244.9 2.0 1.01 1.02
hexapdf 4961.0 1.6 4926.1 0.9 1.01 0.94
keyword_args 499.7 0.8 57.0 0.4 8.77 8.65
lee 2360.0 0.6 2138.6 0.6 1.10 1.11
liquid-render 294.7 0.7 274.9 1.4 1.07 0.91
mail 216.6 0.1 157.7 0.7 1.37 0.70
nbody 232.7 0.2 237.2 0.5 0.98 0.99
optcarrot 15095.8 0.7 18309.2 0.5 0.82 0.83
psych-load 4174.5 0.1 3707.9 0.1 1.13 1.13
railsbench 2923.7 0.8 2548.4 1.4 1.15 0.98
respond_to 409.2 0.3 162.6 1.7 2.52 2.52
rubykon 22554.1 0.7 20160.6 0.9 1.12 1.10
setivar 249.6 0.1 169.5 0.1 1.47 0.99
str_concat 137.8 0.8 29.0 2.4 4.75 3.50
------------- ----------- ---------- --------- ---------- ----------- ------------
Legend:
- interp/yjit: ratio of interp/yjit time. Higher is better. Above 1 represents a speedup.
- 1st itr: ratio of interp/yjit time for the first benchmarking iteration.
```
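The interpreter numbers are similar on both branches, but YJIT is currently slower on the shapes branch for several benchmarks (for example, `getivar` goes from 23.5 ms to 244.9 ms and `optcarrot` from 4072.8 ms to 18309.2 ms). We are working on addressing these differences.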
## 32-bit architectures
We're storing the shape ID for T_OBJECT types in the top 32 bits of the flags field (sharing space with the ractor ID). Consequently, 32-bit machines do not benefit from this patch: on those machines, inline caches always miss.
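The packing can be sketched with integer arithmetic. The exact bit position of the shape ID within the upper half of the flags word is an assumption here (the sketch puts a 16-bit ID in the topmost bits), and `set_shape_id` / `get_shape_id` are hypothetical helper names, not functions from the patch.

```ruby
FLAG_BITS   = 64
SHAPE_BITS  = 16                     # matches the 65,536-ID limit
SHAPE_SHIFT = FLAG_BITS - SHAPE_BITS # assumption: ID sits in the topmost bits
SHAPE_MASK  = ((1 << SHAPE_BITS) - 1) << SHAPE_SHIFT

# Store a shape ID in the upper bits without disturbing the lower flag bits.
def set_shape_id(flags, shape_id)
  (flags & ~SHAPE_MASK) | (shape_id << SHAPE_SHIFT)
end

# Read the shape ID back out of the flags word.
def get_shape_id(flags)
  (flags & SHAPE_MASK) >> SHAPE_SHIFT
end

flags = 0x1234 # made-up low flag bits
flags = set_shape_id(flags, 42)

# A 32-bit flags word has no upper half, so nothing of the ID survives.
truncated = flags & 0xFFFF_FFFF
```

With a 64-bit word the ID round-trips and the low bits are untouched; once the word is cut to 32 bits, the ID is gone, which is why there is nothing for the inline cache to match on 32-bit machines.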
## Instance variables with ID == 0
This is minor, but we also do not support instance variables whose ID is 0, because the outgoing edge tables are `id_table`s, which do not support `0` as a key. There is [one test for this feature](https://github.com/ruby/ruby/blob/ac890ec0624e3d8a44d85d67127bc94322caa34e/test/-ext-/marshal/test_internal_ivar.rb#L9-L21), and we have marked it as pending in this patch.
## Merging
We think this feature is ready to merge. Please give us feedback, and let us know if it is possible to merge now. If it's not possible, please let us know what needs to be improved so that we can merge.
## Future work
We plan to work next on speeding up class instance variables. We will implement caching for them, which should let us see the full benefits of object shapes in this case.