Welcome to cypherspace! Here's your hypercore!
===
## About this documentation
This is an FAQ and guide to USING hypercore as a developer. It's not focused on the internals of hypercore.
## What is hypercore?
Hypercore is:
* An append-only log, like an array of values which can only grow forever,
* writable by one device,
* easily replicable to other devices, in full or in part,
* with signatures and hashes that prevent tampering with the data, allowing it to pass through untrusted peers along the way.
Hypercore is not:
* editable _(though deletion can be implemented at the application level by appending "delete" messages to the log)_
* writable by more than one peer/device _(but you can get this by bundling together one hypercore from each writer with some logic on top. See [multifeed](https://github.com/noffle/multifeed) and [hyperdb](https://github.com/mafintosh/hyperdb))_
You can use it as:
* a base to build distributed applications on top of
## (Other docs)
https://github.com/mafintosh/hypercore
https://dat-dev-story.hashbase.io/hypercore/basic-usage.html
Overview of the many small modules related to dat and hypercore:
https://dat-dev-story.hashbase.io/extra-guides/breakdown.html
https://datprotocol.github.io/book/
https://github.com/lachenmayer/hyperdb-authorization-guide
https://dissecting-dat-lachenmayer.hashbase.io/
https://github.com/noffle/kappa-arch-workshop
## Where can I run hypercore?
* node (which storage libraries can you use?)
* browser (any limitations to replication or storage? which storage libraries are there?)
* Beaker - what APIs does it expose, maybe only "dat" aka hyperdrive, and some swarm library?
* non-javascript implementations
* [rust](https://github.com/datrs/hypercore)
* special notes about compiling leveldb and the crypto library? which version of node works best for that?
* extra credit: tricks to get it running on ARM devices, android, iOS, raspi?
## What kind of data can you put in a hypercore? What's the data model?
* Apparently you can access the log by item index using `createReadStream` or by overall byte offset (for all items together) using `seek`?
* What data can go in each item of the log? Are they binary buffers or JSON objects? `valueEncoding` ?
* One writer device, multiple readers. (What happens if you try to write from multiple devices by sharing a key? Do you fork it and break it like SSB?)
* Append only, grows forever, no deletion? Though sparse mode helps work around this.
* Can you discard its history? What about breakpoints for history? (I'll sync starting from version xxx)
* ^^^ you can use [feed.clear()](https://www.npmjs.com/package/hypercore#feedclearstart-end-callback) to discard records from a hypercore you have on your device, but the meta-data about the record cannot be discarded.
* Is it good for large files? What about lots of very small updates? Practical scaling limits? It is awesome for large data sets because you can use sparse replication mode and `feed.clear()` to retain only what is needed for processing.
* Hash, crypto properties?
* Each hypercore has a public/private keypair which can do several things.
* The public key is like the address of the hypercore, and the private key gives you the ability to write to it (from a single device).
* A. The discovery key is derived from the public key, so you can find peers in a DHT without divulging the public key
* B. The private key is used to sign writes, proving that they were made by the owner of the hypercore
* C. At the application level you could for example implement private messages between users by encrypting things to the public key of someone's hypercore (and publishing it in your own hypercore of outgoing messages). That person is the only one with the private key to read your message.
* What else do we need to know about hashes and signatures?
* How do you refer to a specific message in a hypercore? You could say HYPERCOREHASH@SEQNUMBER - is there a hash-based way to do it which is safer?
* https://github.com/mafintosh/hypercore-strong-link
* noffle says: "hypercore already stores the hash of each message! Except it's more complicated than a regular content hash (it contains leaf vs internal node info and the length), and there's no public API for it yet."
## What other things are built on top of hypercore?
* [Hyperdrive](https://github.com/mafintosh/hyperdrive#hyperdb-backend) - filesystem type datastructure. Used to be on top of hypercore directly, but maybe now will be on top of hyperdb?
* [dat](https://github.com/datproject/dat) - is a friendly branding for hyperdrive w/ a specific setup for swarm replication?
* [Hyperdb](https://github.com/mafintosh/hyperdb) - kv store with multiple writers, who all have full permissions to write (once they have been [authorized](https://github.com/lachenmayer/hyperdb-authorization-guide)).
* [multifeed](https://github.com/noffle/multifeed) / [kappa-core](https://github.com/kappa-db/kappa-core) / [cabal](https://github.com/cabal-club/cabal) - collection of hypercores with indexes and views on top, similar to SSB's views
## How do you replicate (sync) w/ others? What swarm libraries are there?
* Does each hyperdrive have its own unique crypto key? And the discovery key is derived from that?
* hypercore might have a header that you can use to specify versions, allowing you to reject incompatible peers? Or is there another side channel you can use during replication for your own application purposes, like authorization to join the swarm, version checking, etc.?
* Swarm libraries:
* discovery-swarm includes 3 strategies
* mDNS (bonjour)?
* DHT (like bittorrent trackers)?
* DNS (using a special DNS server hosted by mafintosh)?
* hyperswarm
* next-gen version of discovery-swarm
* does NAT hole-punching
* https://github.com/hyperswarm/network
* https://pfrazee.hashbase.io/blog/hyperswarm
* https://www.npmjs.com/package/consent-swarm
* https://github.com/mafintosh/webrtc-swarm
* https://github.com/geut/discovery-swarm-webrtc
* Which of those swarm libraries do NAT holepunching?
* How are replication responsibilities divided between the swarm library, hypercore, and something like multifeed?
* How many peers can be in a swarm?
* [256 peers is the default](https://github.com/mafintosh/hypercore-protocol/blob/master/index.js#L112-L115) but you can pass in `opts.maxFeeds` to increase that.
* When you replicate several hypercores at once are they handled together for efficiency, or are they totally independent? What order does it replicate the feeds (one whole feed at a time, or in parallel?)
* What if you wanted to pipe two hypercores together by hand across your own transport, like websockets, or sneakernet?
* How did [dat-shopping-list](https://github.com/jimpick/dat-shopping-list) work? It hosted a single swarm peer on a server and multiplexed that connection down to web clients by websockets?
* How do you know when your hypercore is completely replicated to someone else?
* It's possible but tricky; see [this GitHub issue](https://github.com/mafintosh/hypercore/issues/179)
## What is sparse mode? Why, when, and how to use it?
* How to only get certain files?
* How to drop those files to reclaim space?
* What crypto protections do you lose in sparse mode? Can you still verify signatures? In [the SLEEP format docs](https://github.com/datproject/docs/blob/master/papers/sleep.md) it says after each write you sign the combined roots of the Merkle tree(s) -- seems like to verify those you would need all the data up to that point?
* Rumour: hypercore writes data in "sparse files" but OSX at least doesn't support those, so it takes up all the disk space of a whole hypercore just to download one file. Is this true? There's some kind of "random access pagefile" module you can use as hypercore's storage that fixes this? I think this came up in the context of Beaker Browser - cinnamon
## What is live mode vs static mode? Why and when to use them?
## API usage examples
* Lifecycle of a hypercore
* https://github.com/mafintosh/hypercore-protocol/
* Creating from scratch
* Creating with a given key
* Ready. What things have to wait until it's ready?
* Closing. What are the rules around closing?
* Writing
* Are there batch writes to append multiple items atomically?
* Are individual item writes atomic?
* Writing using a stream. (Any special rules about closing the stream?)
* Writing a single item
* Reading
* Random access reads on demand, by item index
* Random access reads on demand, by overall byte offset
* Listening to a stream of changes as they arrive - downloading on demand or not; waiting for new changes or not; any special rules about closing the stream?
* Peer status, number of peers
* Crypto and security
* https://github.com/mafintosh/hypercore-crypto/
* Verifying signatures and hashes
* Encrypting and decrypting messages to/from a specific hypercore's key, for sending private messages to that peer
* Syncing / replication
* Are incoming items added atomically or can they be incomplete (only some of the bytes)? If atomic, does that mean very large items can't be downloaded incrementally, stopped, and resumed?
* Sparse mode
* Knowing if an item is there yet or not
* Requesting an item if it's not there yet
* Making an item go away from your local copy if you don't need it - is this possible? I see `undownload` but that cancels a pending download, doesn't remove data from disk
* Live mode vs static mode
* What other API calls are there?
## How to debug and explore a hypercore?
* Is there a command line utility to print out the contents of a hypercore?
* https://noffle.github.io/kappa-arch-workshop/build/04.html