# EPF5 Week8
This week I finished a demo version of the `trace_block` API in geth. The node now stores each block's trace results on local disk, and they can be retrieved through the `trace_block` RPC. I'm running this version against mainnet in my local environment to look for issues.
See PR [ethereum/go-ethereum#30255](https://github.com/ethereum/go-ethereum/pull/30255); some tasks are still left to be done.
## 1. data inconsistency between canonical chain and stored data
The `OnBlockStart` hook is invoked before each block is processed, see [core/blockchain.go#L1909-L1923](https://github.com/ethereum/go-ethereum/blob/142c94d62842c7801e8f4d71f080ae156ecd1f2b/core/blockchain.go#L1909-L1923).
However, sometimes we may receive a block that does not end up in the canonical chain. In such cases, the trace hooks don’t provide any indication of this, leading to a discrepancy between the data we collect for each block and that in the canonical chain.
For example, geth's logs show:
```bash
INFO [08-02|01:13:39.075] Imported new potential chain segment number=20,437,504 hash=59d8a0..fde473 blocks=1 txs=145 mgas=12.401 elapsed=1.553s ...
INFO [08-02|01:13:39.204] Chain head was updated number=20,437,504 hash=59d8a0..fde473 root=f09150..55dd05 elapsed=12.083725ms
INFO [08-02|01:13:50.001] Imported new potential chain segment number=20,437,505 hash=94ba38..565dab blocks=1 txs=117 mgas=12.239 elapsed=829.363ms ...
INFO [08-02|01:13:50.137] Chain head was updated number=20,437,505 hash=94ba38..565dab root=6c58cd..d85eb1 elapsed=12.595462ms
INFO [08-02|01:14:10.623] Imported new potential chain segment number=20,437,504 hash=01303d..29bfea blocks=1 txs=0 mgas=0.000 elapsed=79.759ms ...
```
Block [20437504](https://etherscan.io/block/20437504) was inserted twice:
1. first inserted with block hash: `59d8a0..fde473`
2. second inserted after block 20437505 with hash: `01303d..29bfea`
However, the canonical hash of block [20437504](https://etherscan.io/block/20437504) is `0x59d8a06a30ab8b6d81a8dac44e745e2c32baa357412ec0c53238345752fde473`, so the second insertion should be ignored: on the tracing side, the stored data should not be modified when it happens.
In my current implementation, the second insertion of block 20437504 overwrites the first one, and block 20437505 is truncated, which corrupts all subsequent data.
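One possible direction, sketched below, is to key stored traces by (number, hash) instead of by number alone, and to resolve the canonical hash only at read time. This is a minimal in-memory sketch, not the code in the PR: `traceStore`, `Put`, `SetCanonical` and `Get` are hypothetical names, and it assumes the tracer can learn about canonical head updates somehow, which the current trace hooks do not provide.

```go
package main

import "fmt"

// blockKey identifies a trace record by number AND hash, so two blocks
// at the same height (canonical and side chain) never collide.
type blockKey struct {
	Number uint64
	Hash   string
}

// traceStore is a hypothetical in-memory stand-in for the on-disk store.
type traceStore struct {
	traces    map[blockKey][]byte
	canonical map[uint64]string // block number -> canonical block hash
}

func newTraceStore() *traceStore {
	return &traceStore{
		traces:    make(map[blockKey][]byte),
		canonical: make(map[uint64]string),
	}
}

// Put stores the traces of an imported block. It never touches the
// records of other blocks, even ones at the same height.
func (s *traceStore) Put(number uint64, hash string, data []byte) {
	s.traces[blockKey{number, hash}] = data
}

// SetCanonical records which hash is canonical at a height. The sketch
// assumes this is called on "chain head was updated" style events.
func (s *traceStore) SetCanonical(number uint64, hash string) {
	s.canonical[number] = hash
}

// Get returns the traces of the canonical block at the given height.
func (s *traceStore) Get(number uint64) ([]byte, bool) {
	hash, ok := s.canonical[number]
	if !ok {
		return nil, false
	}
	data, ok := s.traces[blockKey{number, hash}]
	return data, ok
}

func main() {
	s := newTraceStore()
	// Replay the import sequence from the logs above.
	s.Put(20437504, "59d8a0..fde473", []byte("traces A"))
	s.SetCanonical(20437504, "59d8a0..fde473")
	s.Put(20437505, "94ba38..565dab", []byte("traces B"))
	s.SetCanonical(20437505, "94ba38..565dab")
	// Late side block at the same height: stored, but no head update.
	s.Put(20437504, "01303d..29bfea", []byte("traces C"))

	a, _ := s.Get(20437504)
	b, _ := s.Get(20437505)
	fmt.Println(string(a), "|", string(b))
}
```

With this layout the late side block only adds a record next to the existing ones, so the reads for 20437504 and 20437505 still return the canonical data. The cost is that side-chain records accumulate and would need pruning at some point.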
## 2. performance comparison of RLP vs JSON (un)marshal
According to @s1na, RLP seems to perform better than JSON:
> I’d argue json serialization for persistence is not great and we should do rlp. When doing filtering we have to decode all the objects and actually filter matching items
I need to benchmark the difference of JSON and RLP in the following aspects:
1. disk usage
2. CPU/memory performance when storing and retrieving traces.
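A rough harness for this comparison could look like the sketch below. Everything in it is illustrative: `traceEntry` is a made-up record shape rather than the real trace type, and `codec`, `measure` and `sampleTraces` are hypothetical helpers. Only the JSON codec is wired in so the example stays standard-library only; an RLP codec would presumably wrap `rlp.EncodeToBytes` and `rlp.DecodeBytes` from `github.com/ethereum/go-ethereum/rlp` in the same two function slots.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// traceEntry is a made-up record shape, used only for this sketch.
type traceEntry struct {
	From  string `json:"from"`
	To    string `json:"to"`
	Gas   uint64 `json:"gas"`
	Input []byte `json:"input"`
}

// codec abstracts a serialization format so formats can be swapped.
type codec struct {
	Name   string
	Encode func(v []traceEntry) ([]byte, error)
	Decode func(b []byte) ([]traceEntry, error)
}

var jsonCodec = codec{
	Name:   "json",
	Encode: func(v []traceEntry) ([]byte, error) { return json.Marshal(v) },
	Decode: func(b []byte) ([]traceEntry, error) {
		var out []traceEntry
		err := json.Unmarshal(b, &out)
		return out, err
	},
}

// result holds the encoded size and average time per encode/decode call.
type result struct {
	Name           string
	Bytes          int
	Encode, Decode time.Duration
}

// measure encodes and decodes data `rounds` times with the given codec.
func measure(c codec, data []traceEntry, rounds int) (result, error) {
	r := result{Name: c.Name}
	if rounds <= 0 {
		return r, fmt.Errorf("rounds must be positive, got %d", rounds)
	}
	var blob []byte
	var err error
	start := time.Now()
	for i := 0; i < rounds; i++ {
		if blob, err = c.Encode(data); err != nil {
			return r, err
		}
	}
	r.Encode = time.Since(start) / time.Duration(rounds)
	r.Bytes = len(blob) // proxy for disk usage per block
	start = time.Now()
	for i := 0; i < rounds; i++ {
		if _, err = c.Decode(blob); err != nil {
			return r, err
		}
	}
	r.Decode = time.Since(start) / time.Duration(rounds)
	return r, nil
}

// sampleTraces builds n synthetic entries with a 64-byte input each.
func sampleTraces(n int) []traceEntry {
	out := make([]traceEntry, n)
	for i := range out {
		out[i] = traceEntry{
			From:  fmt.Sprintf("0x%040x", i),
			To:    fmt.Sprintf("0x%040x", i+1),
			Gas:   uint64(21000 + i),
			Input: make([]byte, 64),
		}
	}
	return out
}

func main() {
	r, err := measure(jsonCodec, sampleTraces(1000), 100)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d bytes, encode %v, decode %v\n", r.Name, r.Bytes, r.Encode, r.Decode)
}
```

Wall-clock averages like these are only a first approximation. For numbers worth quoting, the same codecs could be driven from Go's `testing.B` benchmarks with `b.ReportAllocs()`, which also reports memory allocations per operation.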
## Plans for next week
In the upcoming week, I'll focus on resolving the data inconsistency issue first. This will require a deeper understanding of how potential chain segments are inserted and pruned.