I think this is starting to shape up nicely. Although I have to admit that I'm still quite concerned about the added complexity for something that feels like it should be simple (but provably turns out not to be). Usually I would not be too concerned about complexity, especially if it's well documented, commented, and tested like here. However, this is core code and shape is something that has caused us tremendous amounts of work in the past because we did not get it right the first time, so I think for developer generations to come it is important to get this right this time around.

I actually had a thought that I'd love to get your input on @michaelosthege. Currently a lot of the complexity comes from the fact that we are trying to convert shape to size, and that's really hard because of all the edge cases, then throw dims, Ellipses, and observed into the mix and here we are. My idea currently only pertains to shape and the simplest implementation to get that.

What if instead of doing the shape->size conversion, could we just create the RV with the specified shape and be done with it? I think we might be able to do this by just broadcasting the dist_params that are inputted to the specified shape and everything follows from there.

We should be able to do that quite easily with at.broadcast_to(). There is still an issue with ndims but I found https://github.com/pymc-devs/aesara/blob/master/aesara/tensor/random/utils.py#L53 that seems to take care of that case.

Not thinking about dims or Ellipses for now, I think that should be quite straight forward and get us reasonably close with just a few lines (famous last words ;)).

From there, I think we'd be in a good spot where we can just use the dims->shape translation that you already implemented here in a separate PR.

Anyway, I know this thing is testing all of our patience to the maximum but thinking about our future selves and new contributors we owe it to us/them to think through various options here and pick the best path forward.

Select a repo