# Handling hex values in Eth2 APIs

## About

This document summarizes my research on bridging `grpc-gateway` and the official Eth2 APIs when dealing with hex-encoded inputs and outputs.

### How it was supposed to be

![](https://i.imgur.com/VaauvHd.png)

### How it really was

![](https://i.imgur.com/6axBBfy.jpg)

## Introduction

We use the `grpc-gateway` library, https://github.com/grpc-ecosystem/grpc-gateway, in our current codebase to translate between HTTP and gRPC, and we want to continue using it for Eth2 APIs.

Many Eth2 API endpoints use hex-encoded strings in their input and/or output JSON values, e.g. `genesis_validators_root` or `fork.previous_version` in https://ethereum.github.io/eth2.0-APIs/#/Debug/getState. We want to pass these values to and from our gRPC functions as `[]byte` on the gRPC side. The issue is that protocol buffers expect a base64-encoded string when unmarshaling JSON into a `bytes` proto field and output a base64-encoded string when marshaling a `bytes` proto field to JSON. `grpc-gateway` conforms to this specification. The hex-encoded JSON inputs/outputs are therefore incompatible with what `grpc-gateway` expects, which forces us to handle hex <--> base64 translations manually.

## Translation options

I tried several translation options and obtained mixed results.

### Option 1: Proxy HTTP server

#### What's the idea?

Define a proxy HTTP server between the client and the gateway server. Translate the incoming hex strings into base64 strings inside the proxy before sending them to the gateway server (and the reverse for outgoing strings). Code with a detailed explanation can be found here: https://hackmd.io/9r47oPRHT1yjI_uKmhWVtw

#### Does it work? If no, why?

YES

#### What are the cons?

Firstly, we have to write the code responsible for proxying everything correctly, which can be quite complex. The upside is that we only need to do it once for all message types (and the linked example already has this mostly covered).
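The core translation the proxy performs is small; a minimal sketch in Go of converting an incoming hex value to the base64 form the gateway expects, and the reverse for outgoing values (function names are my own, not from the linked example):

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strings"
)

// hexToBase64 converts an API-style hex string ("0x...") into the
// base64 string that protobuf's JSON mapping expects for bytes fields.
func hexToBase64(s string) (string, error) {
	raw, err := hex.DecodeString(strings.TrimPrefix(s, "0x"))
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// base64ToHex performs the reverse translation for outgoing responses.
func base64ToHex(s string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return "", err
	}
	return "0x" + hex.EncodeToString(raw), nil
}

func main() {
	b64, _ := hexToBase64("0xdeadbeef")
	fmt.Println(b64) // 3q2+7w==

	h, _ := base64ToHex(b64)
	fmt.Println(h) // 0xdeadbeef
}
```

The hard part of option 1 is not this conversion but knowing *which* JSON fields to apply it to, which is what the custom structs below are for.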
The bigger issue, as shown in the linked example, is that we need to define a custom struct for every JSON message. This is because we have to annotate which fields of the JSON are hex strings, as well as the field names.

#### What are the pros?

The biggest advantage of having a dedicated middleware to handle API quirks is that we don't pollute the gRPC layer with API-specific code. Ideally, the gRPC layer should be ignorant of the API. It's likely that future changes to the API (e.g. v2) will introduce more friction with gRPC, and trying to handle things inside the gRPC layer might be difficult if not impossible. As a matter of fact, we already have https://ethereum.github.io/eth2.0-APIs/#/Beacon/submitPoolAttestations with a custom `failures` field in the 400 response, while the gateway only supports generic error messages. This must be handled by a proxy server.

#### FAQ

Q: *Why can't we annotate proto messages instead of creating custom structs?*

A: `http: panic serving 127.0.0.1:37002: reflect: call of reflect.Value.SetString on slice Value` on line `j, _ := json.Marshal(m)`, where `m := v1.HexMessage{}` is the proto message. The issue is that the incoming JSON has a string field, but the message's field type is `[]byte`.

Q: *Do we have to define all endpoints in the proxy server to match gateway endpoints?*

A: Most likely no. We can have one endpoint in the proxy server and forward the request to the appropriate gateway endpoint by reading the URL.

Q: *Will we have to implement the proxy server anyway to fully implement the API spec?*

A: Most likely. There doesn't seem to be any other way to have a custom field inside an error message.

### Option 2: `Hex` wrapper with custom marshaling

#### What's the idea?
Protobuf defines two interfaces that messages can implement to change how they are marshaled and unmarshaled:

```go=
type JSONPBMarshaler interface {
	MarshalJSONPB(*Marshaler) ([]byte, error)
}

type JSONPBUnmarshaler interface {
	UnmarshalJSONPB(*Unmarshaler, []byte) error
}
```

We create a message type wrapping the `bytes` proto field and implement custom marshaling for this type.

```
message Hex {
	bytes b = 1;
}
```

```go
func (h *Hex) MarshalJSONPB(*jsonpb.Marshaler) ([]byte, error) {
	s := base64.StdEncoding.EncodeToString(h.B)
	hexString := hex.EncodeToString([]byte(s))
	// Add quotation marks and '0x' to represent hex value in JSON ("0x...")
	b := []byte("\"0x" + hexString + "\"")
	return b, nil
}

func (h *Hex) UnmarshalJSONPB(_ *jsonpb.Unmarshaler, b []byte) error {
	// Strip off quotation marks and '0x' from hex value in JSON ("0x...")
	s := string(b)[3 : len(b)-1]
	hexBytes, err := hex.DecodeString(s)
	if err != nil {
		return err
	}
	b64, err := base64.StdEncoding.DecodeString(string(hexBytes))
	if err != nil {
		return err
	}
	h.B = b64
	return nil
}
```

#### Does it work? If no, why?

YES, with a caveat [described here](https://hackmd.io/YR8OAfEOQMiY3u9EUyG1hQ?both#Aside-options-2-and-3-might-not-longer-be-possible).

#### What are the cons?

Probably the biggest disadvantage of this approach is that we would need to amend all proto messages that use `bytes` fields, wrapping these fields in `Hex`. What's more, we would most likely have to define separate wrapper types for each ssz size and for each custom type like `BitVector` (and eth2types such as `Root` in the future), which can lead to a blow-up of proto types. Additionally, an API concern leaks into proto, resulting in a redundant wrapper for all non-API usages. We might also encounter issues with ssz serialization.

#### What are the pros?

We define custom marshaling once for all message types.
Assuming that we will need several wrapper types, the algorithm can be extracted into a private function and reused, because it's all `[]byte`s under the hood. This option does not require a lot of code apart from defining all the hex types (in case that is required), and it fits well into protocol buffers.

### Option 3: Custom marshaling for messages

#### What's the idea?

Similarly to option 2, we take advantage of the `JSONPBMarshaler` and `JSONPBUnmarshaler` interfaces, but this time we implement them on the entire message instead of a single field.

```
message HexMessage {
	bytes h = 1;
}
```

```go
func (h *HexMessage) MarshalJSONPB(*jsonpb.Marshaler) ([]byte, error) {
	s := base64.StdEncoding.EncodeToString(h.H)
	hexString := hex.EncodeToString([]byte(s))
	// Add quotation marks and '0x' to represent hex value in JSON ("0x...")
	h.H = []byte("\"0x" + hexString + "\"")
	return json.Marshal(h)
}

func (h *HexMessage) UnmarshalJSONPB(_ *jsonpb.Unmarshaler, b []byte) error {
	err := json.Unmarshal(b, h)
	if err != nil {
		return err
	}
	// Strip off quotation marks and '0x' from hex value in JSON ("0x...")
	s := string(b)[3 : len(b)-1]
	hexBytes, err := hex.DecodeString(s)
	if err != nil {
		return err
	}
	b64, err := base64.StdEncoding.DecodeString(string(hexBytes))
	if err != nil {
		return err
	}
	h.H = b64
	return nil
}
```

#### Does it work? If no, why?

NO

This solution works properly for the example proto because it does not have nested messages. However, it doesn't seem possible to continue marshaling the rest of the message using the default protobuf functionality. Once `MarshalJSONPB` or `UnmarshalJSONPB` is invoked, one would need to marshal the message manually all the way down. The example uses `json.Marshal` and `json.Unmarshal` for simplicity, which fails to work properly with messages like this:

```
message Outer {
	bytes h = 1;
	Inner i = 2;
}

message Inner {
	bytes h = 1;
}
```

#### What are the cons?

N/A - the solution does not work.

#### What are the pros?
N/A - the solution does not work.

### Aside: options 2 and 3 might no longer be possible

When I was testing options 2 and 3 a while back, my custom marshaling functions were invoked as expected. After upgrading to `protoc-gen-go v1.25.0` and `protoc v3.15.6`, these functions are no longer invoked, even though they are defined and visible. This may be due to a comment present on the implemented interfaces:

> // Deprecated: Custom types should implement protobuf reflection instead.

It might be that without using reflection, both options 2 and 3 are no longer viable. This needs more investigation.

### Option 4: Fork `grpc-gateway` and implement custom marshaling of hex-encoded strings

#### What's the idea?

Fork the `grpc-gateway` repo and plug in custom code for hex marshaling and unmarshaling.

#### Does it work? If no, why?

NO

This is the decoding function (https://github.com/grpc-ecosystem/grpc-gateway/blob/74ecd1deffacf97bcbee90e81c631ef6a3c275f2/runtime/marshal_jsonpb.go#L183), which translates from JSON to a proto message, annotated with places where we could execute our custom code:

```go
func decodeJSONPb(d *json.Decoder, unmarshaler protojson.UnmarshalOptions, v interface{}) error {
	p, ok := v.(proto.Message)
	if !ok {
		return decodeNonProtoField(d, unmarshaler, v)
	}

	// ourCustomFunc(p) #1

	// Decode into bytes for marshalling
	var b json.RawMessage
	err := d.Decode(&b)
	if err != nil {
		return err
	}

	// ourCustomFunc(p) #2

	if err := unmarshaler.Unmarshal([]byte(b), p); err != nil {
		return err
	}

	// ourCustomFunc(p) #3

	return nil
}
```

`#1` won't work because `p` is `nil` at this point, as the JSON hasn't been decoded yet. `#2` won't work because `p` is still `nil`; the JSON was decoded into a `json.RawMessage` object, but that's not what we want to work with. `#3` also won't work, because after the call to `unmarshaler.Unmarshal` our `p` message no longer contains the original byte slices, only slices already processed by protobuf.
We would need the original slices from `d` or `b`, both of which have very inconvenient types. A similar argument can be made for translating in the other direction - from a proto message into JSON.

#### What are the cons?

N/A - the solution does not work.

#### What are the pros?

N/A - the solution does not work.

### Option 5: Custom marshaler and parser using features of `grpc-gateway`

Ivan has investigated a built-in feature of `grpc-gateway` that allows plugging in custom input/output marshalers. He has written a wrapper marshaler which takes the jsonpb input/output and parses it to produce the required result. While this has worked gracefully for inbound and outbound messages, `grpc-gateway` does not support any kind of custom encoding for path parameters, which we use.

## My personal opinion

I am advocating option 1. It allows a clean separation of concerns between the API and gRPC. I believe a proxy server will have to be implemented sooner or later anyway (as mentioned, the spec already includes a custom field in the error message, which is probably impossible to implement with the gateway alone). Even though this option requires quite a lot of additional code, having the proxy gives us maximum flexibility: we can inspect query params, do custom routing, and generally perform all HTTP-related work without being constrained by the gateway or gRPC in general. Some things, like struct generation, can be done by a tool in the future, lessening the developer's work.

There is of course a possibility that one of the other options can be implemented, maybe even very easily; I am sure that I overlooked some things. But even if we decide on another option, how likely is it that it will be able to implement the current and future API spec on its own? Every other option is tightly coupled with gRPC and/or the gateway, which makes me worried that it won't stand the test of time.