# IPLD❤️WASM
2020.12.16 chat notes
---------------------
- want to come out of this meeting with an actionable plan and case for some sort of js-ipld-prime
- or at least: illustrate what the concerns are with the current approach.
- working style: work by writing types (in typescript)
- heck yes
- make illustrative cases about how stuff would work on this API vs the current APIs -- not necessarily implement the new interface yet, just show usage.
- articulating issues with current system
- it's not performance per se, it's... the current design locks in that lots of things must be eagerly evaluated, which might be unnecessary work.
- Node interface allows you to choose between lazy and eager loading; that's neat.
- misc implementation thoughts
- subclasses
- would good results come from having a Node base class in JS and have lots of subclassing of it?
- proxies
- maybe
- it's possible that the difference between JS primitive strings and JS class strings would become visible here and might create problems.
- static delegate
- a Node type that has a bunch of methods on it, and you pass in the actual data with each call.
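
To make the "static delegate" option concrete, here is a minimal sketch. All names (`Kind`, `NodeDelegate`, `pojsoDelegate`) are hypothetical and not an existing js-ipld API; the link kind is left out because detecting it would need a real CID type.

```typescript
// Hypothetical names throughout; a sketch of the "static delegate" idea only.
type Kind = "null" | "bool" | "int" | "float" | "string" | "bytes" | "list" | "map";

// The delegate holds no data: the raw value is passed in with every call.
interface NodeDelegate<T> {
  kind(data: T): Kind;
  lookupByString(data: T, key: string): T;
  lookupByIndex(data: T, index: number): T;
}

const pojsoDelegate: NodeDelegate<unknown> = {
  kind(data: unknown): Kind {
    if (data === null) return "null";
    if (typeof data === "boolean") return "bool";
    // JS has one number type, so 1.0 is reported as "int" here.
    if (typeof data === "number") return Number.isInteger(data) ? "int" : "float";
    if (typeof data === "string") return "string";
    if (data instanceof Uint8Array) return "bytes";
    if (Array.isArray(data)) return "list";
    if (typeof data === "object") return "map";
    throw new Error(`no IPLD kind for a JS ${typeof data}`);
  },
  lookupByString(data: unknown, key: string): unknown {
    if (pojsoDelegate.kind(data) !== "map") throw new Error("not a map");
    const obj = data as Record<string, unknown>;
    // Own properties only, so inherited names like "toString" do not leak through.
    if (!Object.prototype.hasOwnProperty.call(obj, key)) {
      throw new Error(`no such key: ${key}`);
    }
    return obj[key];
  },
  lookupByIndex(data: unknown, index: number): unknown {
    if (!Array.isArray(data)) throw new Error("not a list");
    if (!Number.isInteger(index) || index < 0 || index >= data.length) {
      throw new Error(`index out of range: ${index}`);
    }
    return data[index];
  },
};
```

Generic code would call `pojsoDelegate.kind(x)` instead of using `typeof` or `Array.isArray` directly, so a different delegate (lazy, WASM-backed, ADL) could be swapped in without touching the caller.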
- reference material for Selectors from golang:
- Selector interface: https://github.com/ipld/go-ipld-prime/blob/05f8c4d3e7e47b01f60d21c9037cda80ce483fc5/traversal/selector/selector.go#L11-L15
- execution of Selectors: https://github.com/ipld/go-ipld-prime/blob/05f8c4d3e7e47b01f60d21c9037cda80ce483fc5/traversal/walk.go#L87
would it be crazy for there to be a `Node.Selector(s)` access pattern?
- well, maybe. it's going to suggest that each Node implementer (not all end users, but some library people) TODO FINISH
- want to write demonstrations of how we'd do ADL reads, and getting an ADL writable info from an existing ADL that could do an update-like user story with generic code that doesn't know what the ADL is.
- other reading material:
- a "rot13" demo adl in go, maybe interesting: https://github.com/ipld/go-ipld-prime/tree/05f8c4d3e7e47b01f60d21c9037cda80ce483fc5/adl/rot13adl
- go-ipld-prime could use more examples -- because we'd like to port examples to the js spike if possible!
- desired: some example user story about when to use a traversal vs when to step nodes manually, and example data
- super want to include a demo of traversal over multiple blocks in the js-ipld-prime demo.
- consider: a "parser combinator inspired" approach which involves a lot of "tell me what you need and i'll get it all"? what would this entail?
- should review what @rvagg's latest schema view functions are doing
- incidentally: we should surface selectors to textile and 3box more; their threads would probably work a lot better with them.
- of course those folks need them in javascript
- other references:
- rust discussions: https://github.com/ipld/specs/issues/323
- probably at this moment we have _discussions_ about rust but might not have _lessons learned_ from rust yet because so many things are so incomplete there.
- function pointers all over the place: good. use names or multicodec codes only close to CIDs, convert to function pointers asap.
- maybe go so far as having a `Decode(multicodecTable map[int]Codec, cid Cid) -> Node`, where the codec table is a parameter, instead of single codecs?
- knock yourself out?
- you could do this much later in the project's life and not have difficulty adding APIs to do this without breaking existing ones.
- oh, going as far as accumulating new codecs into the table as you walk the dag?
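
A rough TypeScript rendering of the codec-table idea, as a sketch only. `Decoder`, `CodecTable`, `CidLike` and `decode` are hypothetical names, and the block bytes are passed in explicitly (an assumption; the signature in the notes would load them via the CID). The multicodec code is used once to pick a function pointer, and only functions are used after that.

```typescript
// Hypothetical shapes; not the API of any existing library.
type Decoder = (bytes: Uint8Array) => unknown;
type CodecTable = Map<number, Decoder>;

// Stand-in for a real CID: only the multicodec code matters for this sketch.
interface CidLike {
  code: number;
}

const RAW = 0x55; // multicodec code for "raw"
const JSON_CODEC = 0x0200; // multicodec code for "json"

const rawDecoder: Decoder = (bytes) => bytes;

// ASCII-only decoding to keep the sketch free of platform dependencies.
const jsonDecoder: Decoder = (bytes) =>
  JSON.parse(Array.from(bytes, (b) => String.fromCharCode(b)).join(""));

// The table is a parameter, so callers can pass different tables per call.
function decode(table: CodecTable, cid: CidLike, bytes: Uint8Array): unknown {
  const decoder = table.get(cid.code); // code -> function pointer, right next to the CID
  if (decoder === undefined) {
    throw new Error(`no codec registered for 0x${cid.code.toString(16)}`);
  }
  return decoder(bytes);
}

const defaultTable: CodecTable = new Map<number, Decoder>([
  [RAW, rawDecoder],
  [JSON_CODEC, jsonDecoder],
]);
```

Because the table is an ordinary value, "accumulating new codecs as you walk the dag" would amount to passing a larger table to later `decode` calls.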
- the halting problem
- yes, you can solve this with budgets. but two problems with budgets:
1. when the budget expires: have you made partial progress? not necessarily. (reinvokes the halting problem if you try to guarantee progress!)
- this is a theoretically hard problem that has no general form solutions, to my knowledge.
- analogy: think about integers with fixed bit size. What if when you overflow, your program halts? Almost no math is safe. Writing code in this environment is... not fun. Now imagine that the fixed bit size of integers is also so low that programs *regularly* encounter it. You would *need* compensation mechanisms for this. What would they be?
2. implementing them is hard: it's very easy to write one or two components that carry budgets through them, but then when composing them, forget to ferry a consistent budget across them, which almost always results in total failure (DoS possibilities).
- e.g. if you can do a loop around a non-DoS-potential, that's... actually immediately a DoS-potential :D
- subproblem: if you have multi-dimensional budget tensors? the design work of deciding how to compose these, or the API work to just carry the whole N-dimensional tensor around, is rough. needs monolithic agreement of relevant APIs; implementation organization challenge.
- this is solvable with "sufficient diligence". Just a challenge to be aware of when trying to put this into practice. (The fact that it's an _iterated_ diligence problem every time any new component is introduced also means failures are especially likely.)
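
A small sketch of the "ferry one consistent budget" point, with hypothetical names (`Budget`, `BudgetExhausted`, `countNodes`). The same `Budget` object is passed through every recursive call; handing each call a fresh budget would silently reintroduce the DoS potential. Note that when the budget runs out the caller gets an error and no partial result, which is problem 1 above.

```typescript
// Hypothetical names; a single-dimension budget only.
class BudgetExhausted extends Error {}

class Budget {
  private remaining: number;

  constructor(total: number) {
    this.remaining = total;
  }

  // Deducts cost, or throws without deducting if not enough is left.
  spend(cost: number = 1): void {
    if (cost > this.remaining) throw new BudgetExhausted("budget exhausted");
    this.remaining -= cost;
  }

  left(): number {
    return this.remaining;
  }
}

interface Tree {
  children: Tree[];
}

// Every visited node spends from the SAME budget object.
function countNodes(tree: Tree, budget: Budget): number {
  budget.spend();
  let count = 1;
  for (const child of tree.children) {
    count += countNodes(child, budget); // ferried through, never re-created
  }
  return count;
}

const leaf = (): Tree => ({ children: [] });
const smallTree: Tree = { children: [leaf(), leaf()] };
```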
2020.12.16 warpfork braindump
-----------------------------
- a sea of polyfills, probably
- it's great if you want to use POJSOs. idc. we just can't be **restricted** to this anymore.
- **we should be able to write generic "IPLD transformation" code where every important piece of feature detection is IPLD code**, though. No weird edgecases of JS operators leaking through.
- @rvagg has written a lot of stuff in this vein -- functions for looking at POJSOs and seeing what IPLD Kind they are; and doing some schema features on that too -- maybe we can build on these. Add more `||` cases to detect our fancier interfaces, done?
- prepare for ADLs.
- probably implies a lot of accessor methods need to be async. or at least _able_ to be async.
- this might be a bugaboo. performance costs? syntactic complexity costs? in which contexts? avoidable? if unavoidable, stomachable? all unclear.
- also **requires returning errors**.
- this tends to make a sucky API no matter what. (We're still struggling with this in Golang, and the only solution space we can see is "core API sucks because it's designed for power; lots and lots of decorators for when you want ease of use".)
- most important that this at least work in read-only mode -- so we can have transparent pathing.
- nice if we can also support at least one write-mode builder.
- not necessary that this be the _only_ way to build a value. but nice to have _at least_ this: makes a lot of operations have symmetry and that's good.
- if the ADL has configurable details needed for creation, this can curry up any config parameters needed for that.
- **requires an interface pattern for builders**, which the other already existent JS libraries might not have since they're so POJSO-oriented.
- remember: ADLs we think of most often are _maps_ and _lists_, but it's not just those: _bytes_ are also an important one.
- really, we should probably figure an ADL can have any kind. (Especially since e.g. people have talked about the possibility of using these for encryption, too, which could wrap anything.)
- fallback position: if we can't make an interface for ADLs that's cleanly exchangeable with the interface for plain in-memory nodes, can we at least make a standard interface for ADLs with exactly one standard way to feature-detect it?
- the success heuristic in general here is: "if I'm writing code that's generic (doesn't know what the application or data shape is _at all_, and is going by feel), how many branches do I need to handle all situations?" Less is better. Zero is best, and the goal! One is maybe okay. Two or three is getting very sad.
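
On the "requires returning errors" point above: one possible shape, sketched under assumptions, is a core lookup that returns a result value, plus small decorators for callers who want ease of use. `Result`, `CoreMapNode`, `PlainMap`, `must` and `orDefault` are hypothetical names, and the async question is deliberately left out of this sketch.

```typescript
// Hypothetical names; errors are returned as values by the core API.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

// "Core" interface: designed for power, never throws for a missing key.
interface CoreMapNode {
  lookupByString(key: string): Result<unknown>;
}

class PlainMap implements CoreMapNode {
  private readonly entries: Map<string, unknown>;

  constructor(entries: Map<string, unknown>) {
    this.entries = entries;
  }

  lookupByString(key: string): Result<unknown> {
    if (!this.entries.has(key)) {
      return { ok: false, error: new Error(`no such key: ${key}`) };
    }
    return { ok: true, value: this.entries.get(key) };
  }
}

// Decorator for ease of use: turn an error result into an exception.
function must<T>(result: Result<T>): T {
  if (result.ok === false) throw result.error;
  return result.value;
}

// Decorator for ease of use: substitute a fallback on error.
function orDefault<T>(result: Result<T>, fallback: T): T {
  return result.ok === true ? result.value : fallback;
}

const example = new PlainMap(new Map<string, unknown>([["a", 1]]));
```

With this split, `must(example.lookupByString("a"))` reads like a plain accessor, while generic code can still inspect the `Result` without try/catch.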
- prepare for Selectors
- success criteria: if we're going to do any big new foundational interfaces in JS, we **must** make sure there's a clear path to Selectors over them.
- what does this really mean/need? surprisingly little:
- have to be able to ask every Node what Kind it is.
- if it says it's a map kind, it's gotta respond to lookups by string, and have iterators.
- there should not be reserved names
- if it says it's a list kind, it's gotta respond to lookups by index, and have iterators.
- iterators have to have stable orders (because Selectors must yield stable orders (otherwise higher level systems like graphsync would need unbounded resources! big cumulative effects.)).
- this needs to be the same across IPLD implementations. (example: graphsync needs the same order regardless of what languages the communicating peers are implemented in!)
- that's about it.
- we needed "unboxing" functions in golang for all the scalars to get golang primitives, but JS might not end up needing that much boilerplate.
- major :key: : to get stuff like those iterators in a consistent way, we either:
1. have to be able to use Proxy objects, or,
2. make another interface that also applies to POJSOs, or,
3. define two separate interfaces but make a clear feature detection mechanism, and standardize both interfaces and the detection mechanism.
4. (or some other alternative that's totally not obvious to me? are there more ideas possible?)
- a lot of this is *exactly* the same constraints and interests as any other developer wanting to write their own generic code over these interfaces will have, and by optimizing to make Selectors easy to implement, you'll please every other developer wanting to write generic code. So _even if you DGAF about Selectors, these are still good things to pay attention to_.
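
The Selector prerequisites listed above (ask the Kind, lookups, stable iterators, no reserved names) can be sketched as a small interface plus a generic walker. Everything here is hypothetical (`IpldNode`, `mapNode`, `walk`, ...); it is named `IpldNode` only to avoid clashing with the DOM `Node` type. Map entries live in a JS `Map`, so iteration follows insertion order (standing in for whatever order the codec produced) and a key such as `"kind"` is just data.

```typescript
// Hypothetical interface; a sketch, not a proposal for the final API.
type Kind = "string" | "list" | "map";

interface IpldNode {
  kind(): Kind;
  lookupByString(key: string): IpldNode;
  lookupByIndex(index: number): IpldNode;
  mapIterator(): Iterable<[string, IpldNode]>;
  listIterator(): Iterable<IpldNode>;
  value(): unknown; // scalar payload only
}

function unsupported(what: string): never {
  throw new Error(`unsupported: ${what}`);
}

// Defaults: every operation is rejected unless a constructor overrides it.
function baseNode(kind: Kind): IpldNode {
  return {
    kind: () => kind,
    lookupByString: () => unsupported(`string lookup on ${kind}`),
    lookupByIndex: () => unsupported(`index lookup on ${kind}`),
    mapIterator: () => unsupported(`map iteration on ${kind}`),
    listIterator: () => unsupported(`list iteration on ${kind}`),
    value: () => unsupported(`scalar value of ${kind}`),
  };
}

function stringNode(s: string): IpldNode {
  return { ...baseNode("string"), value: () => s };
}

function listNode(items: IpldNode[]): IpldNode {
  return {
    ...baseNode("list"),
    lookupByIndex: (i: number) => (i in items ? items[i] : unsupported(`index ${i}`)),
    listIterator: () => items,
  };
}

function mapNode(entries: Map<string, IpldNode>): IpldNode {
  return {
    ...baseNode("map"),
    lookupByString: (key: string) => entries.get(key) ?? unsupported(`key ${key}`),
    mapIterator: () => entries.entries(), // Map iterates in insertion order
  };
}

// Generic walker: knows nothing about the data shape, only about kinds.
function* walk(node: IpldNode, path: string = ""): Generator<[string, IpldNode]> {
  yield [path, node];
  const kind = node.kind();
  if (kind === "map") {
    for (const [key, child] of node.mapIterator()) yield* walk(child, `${path}/${key}`);
  } else if (kind === "list") {
    let i = 0;
    for (const child of node.listIterator()) yield* walk(child, `${path}/${i++}`);
  }
}
```

The walker has one branch per recursive kind and none per implementation, which is the property a Selector engine would need from any Node implementer.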
- what do we need to discover about WASM?
- hypothesis to test: Node interface might make delayed materialization easier
- also might not be the perfect interface for that: the "prime"-style Node interface has entirely single-step lookup methods.
- best you can do with this is: parse a block into a "skip tree" of index info while deferring other processing.
- if trying to stride over data doing the truly minimal work, getting several step instructions at once is generally going to be the ideal.
- legitimately may want to spike some sort of demo/benchmark that checks how fast it is to transcode one codec to cbor and then pass the whole cbor binary back out to JS.
- this might not answer every question ever about every kind of feature we might want WASM for... but it's certainly interesting for the "fallback codecs" feature, which is also the one most clearly understood.
- not an immediate goal to micro-optimize: just to get order-of-magnitude understandings backed by evidence.
- this won't tell us anything alone unless there's something to compare it with, so other spikes required too.
- would WASM-backed nodes prompt the async question?
- is it the same as ADLs or subtly different?
- if both do, and we couldn't fold this into the standard Node interface: can we at least unify their feature detection for this into one thing?
- scope bounding-box question: what would we think (hypothetically!) of implementing *Selectors* in WASM?
- purpose of asking this question is to think about how much logic we'd be willing to push into WASM in order to avoid round-trips. (**not** because we'd actually do this work anytime soon.)
- possible that we wouldn't really want to do this at all: do we want to tell _every_ developer that wants to do some generic code that they should actually do it on the side that's in-WASM rather than their preferred host language?
- probably not
- well, maybe. but would require designing and stabilizing a whole API for the in-WASM side! and that's a whole different ball o' wax. and might just plain be a little premature at the present date.
- what do we need to discover about JS in general?
- are Proxy objects a good idea? are they fast, do they provide desirable DX?
- grep to other mentions of "Proxy" in this braindump to see why this matters.
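
A minimal sketch for poking at the Proxy question, with hypothetical names (`Lookups`, `asProxy`, `canProxy`). It routes property reads to a `lookupByString` method, and it shows the primitive-vs-object wrinkle mentioned in the chat notes: a `Proxy` target must be an object, so a primitive string would have to be boxed first, after which `typeof` reports `"object"`.

```typescript
// Hypothetical names; only standard Proxy behaviour is relied on.
interface Lookups {
  lookupByString(key: string): unknown;
}

// Property reads on the returned object are forwarded to lookupByString.
function asProxy(node: Lookups): Record<string, unknown> {
  return new Proxy<Record<string, unknown>>({}, {
    get(_target, prop) {
      if (typeof prop === "symbol") return undefined; // e.g. Symbol.iterator
      // Note: EVERY string name is forwarded, including "toString" and "then".
      return node.lookupByString(prop);
    },
  });
}

// Proxy targets must be objects; primitives make the constructor throw.
function canProxy(value: unknown): boolean {
  try {
    new Proxy(value as object, {});
    return true;
  } catch (_e) {
    return false;
  }
}

let lookupCalls = 0;
const lazyNode: Lookups = {
  lookupByString(key: string): unknown {
    lookupCalls++; // counts how often the backing node is consulted
    return `value of ${key}`;
  },
};
const view = asProxy(lazyNode);
```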
---
---
---
2020 early
----------
### Goals
- Enable IPFS peers to operate on DAGs that use user space codecs without requiring them to converge on the same codec configuration.
- Language agnostic codec implementations that can be used across go-ipfs, js-ipfs, rust-ipfs, etc.
### Approach
- Codecs are implemented in WASM and comply with a specific interface (agnostic of the source language, so they could be implemented in any language that can compile to WASM).
- Binary WASM modules are distributed via IPFS network and identified by a CID.
- IPFS nodes use an embedded WASM VM (e.g. https://wasmer.io) to load WASM codec modules and perform codec operations.
### Design Constraints
- Whatever the WASM interface ends up being, it will have to be supported **FOREVER** or data will become inaccessible. So it is really important to make that interface as future-proof as possible.
### Open Questions
- How is WASM codec info carried?
- One idea is to have the linking node also provide a link to the codec to be used.
- Another idea is to mash a pointer to the wasm-codec into the CID.
### Ideas
- We could have a system codec registry (shipped with nodes) and user space registries that are extensions of the system registry or of other user space registries. The conceptual idea is that a linking node could annotate a link with a registry extension (user space registry) to be used with that link. This can form a basis for context passing with the following properties:
1. Links carry metadata about a codec with CID.
2. No centralized codec registry is required.
3. Blocks encoded with user space codecs can be decoded given the right registry.
- Codecs can verify that passed registry is compatible
- WASM modules could provide an effective way to provide computable links to derived data. E.g. it may be more efficient to fetch code that can compute HTML source from markdown data than to fetch the linked HTML source from the network (especially if the DAG contains many such links). WASM codecs could provide a way to perform those computations locally and avoid having to fetch computed data.
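
The registry-extension idea from the first bullet above can be sketched as a chain of lookups. This is an illustration under assumptions: `CodecRegistry`, `register`, `resolve` and `extend` are hypothetical names, the registry maps a codec code to the CID of a WASM module (represented here by placeholder strings), and the user space code `0x300001` is an arbitrary number chosen for the example.

```typescript
// Hypothetical design; placeholder strings stand in for real WASM module CIDs.
class CodecRegistry {
  private readonly modules = new Map<number, string>();
  private readonly parent: CodecRegistry | undefined;

  constructor(parent?: CodecRegistry) {
    this.parent = parent;
  }

  // Records which WASM module (by CID) implements a codec code.
  register(code: number, wasmModuleCid: string): this {
    this.modules.set(code, wasmModuleCid);
    return this;
  }

  // Local entries win; otherwise fall back to the registry being extended.
  resolve(code: number): string | undefined {
    return this.modules.get(code) ?? this.parent?.resolve(code);
  }

  // Creates a user space registry layered on top of this one.
  extend(): CodecRegistry {
    return new CodecRegistry(this);
  }
}

const DAG_CBOR = 0x71; // multicodec code for dag-cbor
const USER_CODEC = 0x300001; // arbitrary code for illustration

const systemRegistry = new CodecRegistry().register(DAG_CBOR, "placeholder-cid-system-dag-cbor");
const userRegistry = systemRegistry.extend().register(USER_CODEC, "placeholder-cid-user-codec");
```

A link annotated with `userRegistry` could be decoded by any peer holding that extension, while the system registry stays untouched, so no central registry is needed.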
### Appendix
- [wasm-bindgen design](https://rustwasm.github.io/docs/wasm-bindgen/contributing/design/index.html)