owned this note
owned this note
Published
Linked with GitHub
# Stewards Discussion Time: HTTP APIs that accept binary data as query string args
[TOC]
# Scoping workshop @ 2021-04-22
## Plan for this call
Focus on the immediate concern ([go-ipfs#7958](https://github.com/ipfs/go-ipfs/issues/7958)) ,agree on a way to fix it and document it as "ready to work on" issue on the [Maintenance Board](https://github.com/orgs/ipfs/projects/12).
The immediate concern relates to the idea of "Drop HTTP API for gRPC API" (more notes on this below), but we should be very careful and keep the scope in check so we can deliver solution to this problem in a realistic timeframe.
If time allows, discuss long term options and align on our long term strategy around HTTP API. If not, schedule a separate call for that.
## The immediate concern: handling of binary data in `/api/v0`
Details in: https://github.com/ipfs/go-ipfs/issues/7958
### Problem summary
Some commands exposed by the HTTP API at `/api/v0`:
- interpret `arg=` query strings from request URL as binary data
- this causes problems for browsers and other JS client due to [URL-encoding](https://en.wikipedia.org/wiki/Percent-encoding) breaking binary data (example in the issue)
- interpret request body as binary or return binary in response body or one of its fields
- this breaks things in unexpected ways, usually because the binary payload includes a newline character or a sequence of characters that we use for message or field separation
- **prior art:** we solved a variant of this problem in [`/api/v0/object/get`](http://docs.ipfs.io.ipns.localhost:8080/reference/http/api/#api-v0-object-get) by adding support for `&data-encoding=base64`, but it was scoped to a specific field in request body ([object/get.js](https://github.com/ipfs/js-ipfs/blob/6870873f0696bb5d8d91fce4a4ef1f7420443993/packages/ipfs-http-client/src/object/get.js#L24)).
### Solutions
We don't want to burn too much time on `/api/v0`, so the change should be surgical. We want to ship a minimum change that enables people to fix binary data issues in their apps today.
There seems to be a consensus that we should solve the problem in a generic fashion that can be applied to all commands (existing and future ones).
We need to agree on semantics how to accomplish that.
Options are:
#### (A) multibase as opt-in flag
[Surgical approach](https://github.com/ipfs/go-ipfs/issues/7958#issuecomment-819121857) is to add a single flag `&multibase-something=base64url`
that informs the `/api/v0/` endpoint that:
1. all `arg` parameters and the body present in request need to be unwrapped using multibase
2. response body should also be wrapped in multibase envelope (using encoding from the flag)
Flag would be off by default to maintain backward-compatibility of `/api/v0`, but be well documented in `--help` text for impacted commands, so it will also show up on https://docs.ipfs.io/reference/http/api/
Bonus 🏚️🚴 (implementation details)
- Should we bother with multibase, if only a subset can be used (effectively only base64url)?
- Not every base is URL-safe (for example `base64` is not safe for `arg=`, and `base64url` should be used instead), so we need to return error when `arg` and `base64` are used together.
- After go-ipfs ships with the fix, the js-ipfs-http-client could use this flag by default on `/api/v0/pubsub/pub` and similar endpoints, just like it already does similar thing in [object/get.js](https://github.com/ipfs/js-ipfs/blob/6870873f0696bb5d8d91fce4a4ef1f7420443993/packages/ipfs-http-client/src/object/get.js#L24), quietly solving the problem for everyone who uses the lib.
Mood:
- :green_heart: does not float the boat too much
- :green_heart: does not require endpoint change (no need to update nginx)
- :green_heart: can be made the default behavior in client libraries like js-ipfs-http-client, effectively solving the issue in transparent manner for most of our devs
- :green_heart: even if no lib is used, docs provide a way for dev to do a self-service fix for binary data
- :broken_heart: binary data is not handled properly by default, unless a client library is used
- :broken_heart: complicates our test suite by adding the need for testing text and multibase variants for known edge cases
#### (B) multibase as default in /api/v1
[Alternative approach proposed by Alex](https://github.com/ipfs/go-ipfs/issues/7958#issuecomment-819399236):
Make it `/api/v1/pubsub/pub`, `/api/v1/dht/put` etc and demand multibase for all binary sent on the query string or as fields in JSON instead of adding additional arguments which add complexity and the potential for misuse.
No need to rev the version number for all endpoints at once, just a few at a time as and when they need upgrading.
Bonus 🏚️🚴 (implementation details)
- Should we mark related `/api/v0` endpoints as deprecated and switch js-ipfs-http-client to only use new one?
- How to handle `apiPath` in `ipfs-http-client` constructor?
Mood:
- :green_heart: binary data handled correctly by default
- :green_heart: easier to test, unless we support a flag to disable multibase envelope, then it is the same as (A)
- :broken_heart: working with text-only data became extra tedious, unless we have a flag to enable plain-text mode
- :broken_heart: we add `/api/v1` which will require deeper changes to [client libs](https://github.com/ipfs/ipfs/#http-client-libraries) and reverse proxies across ecosystem (nginx etc) – shifting the burden from IPFS project to our users
#### (C\) ?
_{feel free to add alternative options here}_
----
## Meeting notes
- Option B is a lot of work (especially when accounting for docs), and we don't want to spend that much effort for a patch.
- Going with multibase
- Single flag (multibase-envelope) to encode everything as multibase (all args and payload)
- Fine for pubsub. We're just scoping to pubsub for now.
- If there is a command where one field is a string and one is binary data, there could be confusion.
- Default to multibase encoding of base64url
- There was discussion about "flags" vs. "args" in the API
- Output
- Wrapping the full out put isn't enough
- @lidel : to summarize the open issues so we can do an async round
- @biglep : schedule followup meeting for part 2 (gateway++ / new rpc api)
Recording:
https://protocol.zoom.us/rec/share/Z3CjkYxU9QJIK9jNZmleZq7Zs-5dPiQaP8Up4ws_j-rPuJoJpU_N4Y3g9HCl4rWB.bwjafIpGr2Tltg3O
Passcode: zS.J!4wF
# *TODAY:* 👉️ Solution workshop @ 2021-05-06
## Plan for this hour
Time-box 15-20min to agree on details how to address [go-ipfs#7958](https://github.com/ipfs/go-ipfs/issues/7958) short term + create AI to document it as "ready to work on" issue on the [Maintenance Board](https://github.com/orgs/ipfs/projects/12).
Then, spend remaining time on:
- going over "the fate of HTTP API at /api/v0"
- temperature check if we all pull in the same direction
- perhaps agree on some map, instead of everyone using a compass :)
## Scope and goal
- PR that fixes `pubsub/*` commands and closes [go-ipfs#7939](https://github.com/ipfs/go-ipfs/issues/7939).
- The convention should be reusable if we need it elsewhere ([go-ipfs#7958](https://github.com/ipfs/go-ipfs/issues/7958)).
## Solution details
@lidel and his new mRNA propose:
```bash
curl -X POST "http://127.0.0.1:5001/api/v0/pubsub/pub
?arg=<topic-in-multibase>
&arg=<data-in-multibase>
&arg-multibase=true
&data-multibase=true"
(HTTP 200)
```
```bash
curl -X POST "http://127.0.0.1:5001/api/v0/pubsub/sub
?arg=<topic-in-multibase>
&arg-multibase=true
&data-multibase=true"
{
"data": "<multibase-string>",
"from": "<multibase-string>",
"seqno": "<multibase-string>",
"topicIDs": ["<multibase-string>"]
}
```
```bash
curl -X POST "http://127.0.0.1:5001/api/v0/pubsub/ls
?data-multibase=true"
{
"Strings": ["<multibase-string>"]
}
```
- `arg-multibase` (a boolean flag)
- **request processing:** every query parameter named `arg` needs to be decoded from multibase
- implementation notes:
- avoid silent bugs: return error when URL-unsafe multibase code is used (and suggest base64url or base32)
- we might be able to add it globally in https://github.com/ipfs/go-ipfs-cmds/ (TBD)
- `data-multibase` (a boolean flag)
- **request processing:** every field with binary data needs to be decoded from multibase
- **response processing:** every field with binary data needs to be interpreted as multibase string
- implementation notes:
- requires identifying binary fields and wiring them up
- use base64 in response (as there is no obvious reason why one would want to use something else)
- while we don't need this for `pubsub/sub` (fields already returned as base64),
we do need to add this to `pubsub/ls` (binary topic names are a thing)
### Q&A:
- **Why two parameters?**
- Flexibility and the lack of ambiguity:
- One can infer what `?arg=bafkqaa3mn5wa&arg-multibase=true&data-multibase=true` means _without reading any docs_.
- We want a convention that is generic enough to solve "binary problem" in other commands. Some have `data-encoding`/`datafieldenc` with base64 already, some behave as if they had an implicit `&arg-multibase=true` already.
- **Why booleans?**
- multibase is self-describing
- not sure why anyone would use something other than base64 for data in response, so keeping things simple
----
## Meeting notes
- Recording:
- https://protocol.zoom.us/rec/share/-Ev-LrhDDqrC8XU8mNADPtx9U31ItAiuJefU4Jx8uY1duV5oNk-K7grz0rTq7GNt.GyGJmg5MgFJjfdks
- Passcode: +oZZ499i
- AI (@lidel)
- break pubsub commands and just fix it – continued in https://github.com/ipfs/go-ipfs/issues/7939#issuecomment-833812863
- comment in generic issue: https://github.com/ipfs/go-ipfs/issues/7958#issuecomment-833831032
----
# Checkpoint: _did we solve the immediate problem first?_ 🐲
If yes, the dragon lets us pass and you start the next quest.
If not, we go back to the beginning of this map.
-----
-----
-----
-----
-----
-----
-----
-----
-----
-----
-----
-----
-----
-----
-----
-----
# Long term concern: the fate of HTTP API at `/api/v0`
.. and the argument we need a better RPC API.
## Problems with `/api/v0`
The current custom "RPC API system over HTTP" is a constant pain-point for users because it was created for use in CLI, and was never designed to be exposed to generic http clients and web browsers.
We frequently have to document, fix bugs, add features, etc. in the current RPC API when we'd rather just deprecate it.
Selected pain points:
- Unclear request format.
- Some requests take multipart data
- Multipart is a very complex/confusing spec.
- We use it in a very special way.
- Some requests take line delimited text.
- Difficult to include text with newlines.
- Some requests take only query args.
- Binary data is [really hard to deal with](https://github.com/ipfs/go-ipfs/issues/7958#issue-819946379).
- CLI versus RPC API needs are different
- Documentation is autogenerated from the CLI docs which don't cover HTTP API nuances.
- Some flags are applied client-side, some are applied server-side.
- CLI changes _always_ imply changes to the API.
- [Trailing headers + streaming results are unworkable in browsers](https://github.com/ipfs/js-ipfs/issues/2519). Browsers can either stream results (e.g., to subscribe to a pubsub channel) or receive errors, but not both.
- No API tokens.
- Hard to configure.
- Really tricky to get security right.
- No way to have multi-tenency.
- Inefficient binary handling.
- Not reasonable to use the HTTP API to "extend" IPFS through plugins. I.e., no "sessions", bi-directional RPC, etc.
## Exploring the solution space
Opinionated primer from @lidel :-)
The discussion is often scoped to short-mid term, and framed as "deprecate HTTP, use gRPC over websockets".
We can't just replace RPC-over-HTTP with gRPC-over-websockets and say our problems are solved. IPFS won't live up to its full potential without a good HTTP API. I'd argue we not only need a better RPC API, but we need a _real_ HTTP API even more.
The TBD part here is what we mean by "HTTP API". Gateway endpoints alone won't be enough for all use cases. We will always need a modern, permissioned HTTP API at `/api/` for advanced use cases and node orchestration. Without it, we will be paying the cost of lost opportunity.
I am happy to share feedback to back up the gut feeling, like our partners expecting a HTTP API, not RPC-over-HTTP and hitting some edger case, or asking us for a way to limit access to the API on localhost.
Some food for thought below:
### Move some responsibility to Gateway endpoints (`/ipfs`, `/ipns`, `/mfs`)
Making gateways more useful would decrease the reliance on the `/api/v0` and buy us some time to properly design its replacement.
Some of this is already being tracked in issues linked from [ipfs/in-web-browsers#180](https://github.com/ipfs/in-web-browsers/issues/180)
### Deprecate HTTP API at `/api/v0`, add modern `/api/v1`
Details TBD, but we gathered enough feedback and identified enough edge cases and footguns that we should be able to design something new that:
- is spec driven (eg. OpenAPI)
- is simple and pleasant to use
- can be used in web browsers without any footguns
- remains HTTP-based for maximum interop with various tools and systems
- has built-in access controls
- [long standing feature request](https://github.com/ipfs/go-ipfs/issues/1532), missing from both `/api/v0` and gRPC in js-ipfs
Mood:
- :green_heart: the best interop we can get in 2021 (everything talks HTTP)
- :green_heart: we could reuse modern API patters and conventions, improving onboarding and decreasing cognitive load
- :green_heart: with OpenAPI spec people could generate clients instead of writing them by hand
- :broken_heart: way more design work than just exposing existing CLI commands over websockets, but paying off technical debt is never easy
- :broken_heart: there will be things which won't be easy/possible over HTTP in browsers any time soon (like bidirectional streaming)
- Given how vast `/api/v0` is, this will be relativerly small set of things.
- I suspect that we could abstract the need for bidirectionality away by listening on a diagnostic channel with things like [EventSource](https://github.com/ipfs/in-web-browsers/issues/140).
- ..or just say that certain features only make sense over websockets API for now
### gRPC over websockets
- js-ipfs has done some work on this in [js-ipfs#3403](https://github.com/ipfs/js-ipfs/pull/3403) (gRPC over websockets)
- Key features:
- Exposes existing commands as-is
- Enables full duplex communication with a remote node
- Enables streaming errors (solves the "trailer problem")
- Listens on websockets with no HTTP fallback
- This shipped with js-ipfs, but [the proposal](https://github.com/ipfs/js-ipfs/pull/3403#issue-522254247) was not reviewed by go-ipfs team
- It was proposed to port this to go-ipfs, but we have no spec, only js-ipfs implementation.
Mood:
- :green_heart: already in js-ipfs
- :green_heart: uses websockets, so works well in browsers
- :green_heart: solves problems with `/api/v0` without introducing new commands
- :broken_heart: js-ipfs implementation-first, lacking go-ipfs review
- :broken_heart: client requires grpc+websockets which is not as easy sell to devs as a regular HTTP API (good luck if you are using less popular language that does not have necessary libs)
### Other option?
_{feel free to add alternative options here}_
----
## Meeting notes
- AI
- review latest revision of gateway++ proposal
- come up with multiple user stories