Infinitism has already implemented these tests. The challenge is that validation requires Geth's debug_traceCall API, which some clients do not support. I planned to use Anvil, Foundry's local test node, in the role the Hardhat node plays in testing. Fortunately, last week I made a PR that adds the debug_traceCall API to Anvil, which means we can use Anvil to test the validation of user operations.
What remains is to implement these tests in Rust, which should be straightforward.
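As a rough illustration of what such a Rust test would send to Anvil, here is a minimal sketch, using only the standard library, that builds a debug_traceCall JSON-RPC request and posts it over plain HTTP. The function names are hypothetical, the addresses and calldata a caller passes in are placeholders, and the empty trace config (which selects Geth's default struct logger) is an assumption; the real validation tests may need a specific tracer. 127.0.0.1:8545 is Anvil's default listening address.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Hypothetical helper: builds a JSON-RPC 2.0 body for `debug_traceCall`.
/// Params are [call object, block tag, trace config]; the trace config is
/// left empty here as an assumption (Geth's default struct logger).
fn build_trace_call_body(id: u64, from: &str, to: &str, data: &str, block: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"debug_traceCall","params":[{{"from":"{from}","to":"{to}","data":"{data}"}},"{block}",{{}}]}}"#
    )
}

/// Hypothetical helper: wraps a JSON body in a minimal HTTP/1.1 POST request.
fn build_http_post(host: &str, body: &str) -> String {
    format!(
        "POST / HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Sends a raw HTTP request (e.g. to a running Anvil node at "127.0.0.1:8545")
/// and returns the full raw response, headers included.
#[allow(dead_code)]
fn send_raw(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    // "Connection: close" lets us read until the server closes the socket.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With a local Anvil running, a test would call send_raw("127.0.0.1:8545", &build_http_post("127.0.0.1:8545", &body)) and then inspect the returned trace. In practice an HTTP client or an Ethereum RPC crate would replace the hand-rolled request; the raw form is used here only to keep the sketch dependency-free.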
After reviewing the notes and some discussions in the Telegram group, it appears that most developers currently agree the public mempool should be separate from the execution-layer transaction mempool. Some developers still want to merge it into the existing execution mempool. However, Vitalik made a very good point: ERC-4337 should be a separate network so that its risks stay contained and isolated from the clients. That was the original idea behind ERC-4337.