# Whisper
###### tags: `Tag(HashCloak - Validator Privacy)`
Paper: https://eth.wiki/concepts/whisper/whisper
Definitions:
### Table of Contents
[toc]
### Use Cases
* DApps that need to publish small amounts of information to each other and have the publication last some substantial amount of time.
* DApps that need to signal to each other in order to ultimately collaborate on a transaction.
* DApps that need to provide non-real-time hinting or general communications between each other.
* DApps that need to provide dark (plausible denial over perfect network traffic analysis) comms to two correspondents that know nothing of each other but a hash.
In general, think transactions, but without the eventual archival, any necessity of being bound to what is said or automated execution & state change.
Uses the "shh" protocol string of DEVp2p.
### Specs
* Low-level API only exposed to DApps, never to users.
* Low-bandwidth: not designed for large data transfers.
* Uncertain-latency: not designed for RTC.
* Dark: no reliable methods for tracing packets.
Typical usage:
1. Low-latency, 1-1 or 1-N signalling messages.
2. High latency, high TTL 1-* publication messages.
Messages less than 64K bytes, typically around 256 bytes.
### Existing solutions
* UDP: Similar at the API level, with native multicasting. No TTL, security or privacy safeguards.
* 0MQ: A distributed messaging system, no inherent privacy safeguards.
* Bitmessage: Similar in the basic approach of P2P network exchanging messages with baseline PKI for dark comms. Higher-level (e-mail replacement, only “several thousand/day”, larger mails), fixed TTL and no hinting to optimise for throughput. Unclear incentivisation.
* TeleHash: Secure connection-oriented RTC comms. Similar in approach to BitTorrent (uses modified Kademlia tech), but rather than discovering peers for a given hash, it routes to the recipient given its hash. It uses a DHT to do deterministic routing and is therefore insecure against simple statistical packet-analysis attacks by a large-scale attacker. Connection-oriented, so no TTL and not designed for asynchronous data publication.
* Tox: Higher-level (IM & AV chat) replacement.
### Considerations for Defeating Traffic Analysis
> All existing protocols for location obscured instant messaging have complicated problems to do with routing.
The Bitmessage protocol propagates messages blindly across the network. The proper recipient knows how to decrypt a message and receives it (just like everyone else), but then stores it and lets its user know it has a new message.
None of the others listed above have any particular means to hide the source and destination of messages. I believe it is one of the core objectives of the Whisper protocol to hide the location of sender and receiver and, in transit, to make it difficult if not impossible to establish one, the other or both.
> For generating a circuit to an exit node in Tor, it is of no danger that your client knows the identity of the relays in each hop along the way, but to do that within the network compromises location obfuscation security, tying your ethereum identity with your router identity (I think it may need to be explicitly pointed out that routing function is something that Ethereum nodes will perform).
### Contributions
The contribution I would like to make to the discussion is this:
1. It is implied that the whisper protocol has to work through a system of location obfuscation relays like Tor. Not only this, but it must therefore also be using some form of the rendezvous protocol used in Tor for hidden services.
2. Depending on the needs of a particular connection, there can be occasion to mix up the obfuscation process so that it is further obscured from traffic analysis in the case of malicious nodes performing data gathering for an attacker.
For some purposes one wants lower latency, and for other purposes greater security is vitally important. When streams are fragmented into parts in the process, it can also increase security to apply an All-Or-Nothing Transform to the entire package. Then, if part of the message is intercepted but not the complete message, it is impossible to assemble the data, not even for cryptanalysis purposes.
### Overview
> Whisper is a pure identity-based messaging system.
At its most secure mode of operation, Whisper can theoretically deliver 100% darkness. Whisper should also allow the users to configure the level of privacy (how much information it leaks concerning the ÐApp content and ultimately, user activities) as a trade-off for performance.
Basically, all Whisper messages are supposed to be sent to every Whisper node. In order to prevent a DDoS attack, a proof-of-work (PoW) algorithm is used. Messages will be processed (and forwarded further) only if their PoW exceeds a certain threshold; otherwise they will be dropped.
Nodes should always keep messages that their ÐApps have created. Though not in PoC-1, later editions of this protocol may allow ÐApps to mark messages as being “archived”, and these should be stored and made available for additional time.
Nodes should retain a set of per-ÐApp topics they are interested in.
Also being considered is support for plausible deniability through the use of session keys and a formalisation of the multicast mechanism.
#### Encryption in version 5
> All Whisper messages are encrypted and then sent via underlying ÐΞVp2p Protocol, which in turn uses its own encryption, on top of Whisper encryption.
Asymmetric encryption uses the standard Elliptic Curve Integrated Encryption Scheme (ECIES) with a SECP-256k1 public key. Symmetric encryption uses the AES-GCM algorithm with a random 96-bit nonce. If the same nonce is used twice with the same key, all previous messages encrypted with that key are compromised. Therefore no more than 2^48 messages should be encrypted with the same symmetric key (for a detailed explanation, see the Birthday Paradox). However, since Whisper uses proof-of-work, this number could only be reached under very special circumstances (e.g. a private network with extremely high performance and reduced PoW). Still, the use of one-time session keys is strongly encouraged for all Ðapps.
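
The 2^48 figure can be checked with the standard birthday approximation. The following is a minimal sketch, not part of the Whisper spec; the function name `nonce_collision_probability` is made up for illustration.

```python
import math

NONCE_BITS = 96  # AES-GCM nonce size stated in the text above


def nonce_collision_probability(num_messages: int, nonce_bits: int = NONCE_BITS) -> float:
    """Approximate chance that at least two random nonces collide.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
    """
    exponent = num_messages * (num_messages - 1) / (2 * 2 ** nonce_bits)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for power in (32, 40, 48):
        p = nonce_collision_probability(2 ** power)
        print(f"2^{power} messages under one key: collision probability ~ {p:.3g}")
```

At 2^48 messages the approximation gives roughly 0.39, so 2^48 is best read as the order of magnitude at which a repeated nonce becomes likely, not as a safe operating point.
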
#### Envelopes
> Envelopes are the packets sent and received by Whisper nodes.
`[ Version, Expiry, TTL, Topic, AESNonce, Data, EnvNonce ]`

* Version: up to 4 bytes (currently one byte containing zero). Version indicates the encryption method. If Version is higher than the current one, the envelope cannot be decrypted and is therefore only forwarded to the peers.
* Expiry: 4 bytes (Unix time in seconds).
* TTL: 4 bytes (time-to-live in seconds).
* Topic: 4 bytes of arbitrary data.
* AESNonce: 12 bytes of random data (only present in case of symmetric encryption).
* Data: byte array of arbitrary size (contains the encrypted message).
* EnvNonce: 8 bytes of arbitrary data (used for PoW calculation).

Whisper nodes know nothing about content of envelopes which they can not decrypt. The nodes pass envelopes around regardless of their ability to decrypt the message, or their interest in it at all. This is an important component in Whisper’s dark communications strategy.
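
As a rough illustration of the envelope layout, here is a sketch of a container that only checks the field sizes listed above. The class name, the `is_symmetric` helper and the 4-byte size assumed for Expiry are illustrative assumptions; the real wire encoding is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    version: bytes    # up to 4 bytes, currently b"\x00"
    expiry: int       # Unix time in seconds (assumed to fit in 4 bytes)
    ttl: int          # time-to-live in seconds, fits in 4 bytes
    topic: bytes      # 4 bytes of arbitrary data
    aes_nonce: bytes  # 12 random bytes, or empty for asymmetric encryption
    data: bytes       # encrypted message, arbitrary size
    env_nonce: bytes  # 8 bytes used for the PoW calculation

    def __post_init__(self) -> None:
        if not 1 <= len(self.version) <= 4:
            raise ValueError("version must be 1 to 4 bytes")
        for name in ("expiry", "ttl"):
            if not 0 <= getattr(self, name) < 2 ** 32:
                raise ValueError(f"{name} must fit in 4 bytes")
        if len(self.topic) != 4:
            raise ValueError("topic must be 4 bytes")
        if len(self.aes_nonce) not in (0, 12):
            raise ValueError("aes_nonce must be empty or 12 bytes")
        if len(self.env_nonce) != 8:
            raise ValueError("env_nonce must be 8 bytes")

    @property
    def is_symmetric(self) -> bool:
        # AESNonce is only present for symmetric encryption.
        return len(self.aes_nonce) == 12
```
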
### Messages
> Message is the content of Envelope’s payload in plain format (unencrypted).
The message has the following structure:
1. flags: 1 byte
2. optional padding: byte array of arbitrary size
3. payload: byte array of arbitrary size
4. optional signature: 65 bytes
In the present protocol version, no explicit authentication token is given to indicate that the data field is encrypted; any would-be readers of the message must know ahead of time, through the choice of topic that they have specifically filtered for, that the message is encrypted with a particular key. This is likely to be altered in a further PoC to include a MAC.
Since the signature is a part of the message and not outside in the envelope, those unable to decrypt the message data are also unable to access any signature.
Payloads are encrypted in one of two ways:
1. If the message has a specific recipient, then by using ECIES with the specific recipient’s SECP-256k1 public key.
2. If the message has no recipient, then by AES-256 with a randomly generated key. This key is then XORed with each of the full topics to form a salted topic. Each salted topic is stored prior to the encrypted data in the same order as the corresponding topics are in the envelope header.
As a recipient, payloads are decrypted in one of two ways:
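1. Specific recipient: use the recipient's private key to decrypt.
2. General multicast audience: XOR a known full topic against the corresponding salted topic to recover the AES key, then decrypt with that key.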
Encryption using the full topic with “routing” using the abridged topic ensures that nodes which are merely transiently storing the message and have no interest in the contents (thus have access only to routing information via the abridged topics) have no intrinsic ability to read the content of the message.
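
The salted-topic construction can be sketched as follows. This assumes a full topic is a 32-byte hash (so it has the same length as an AES-256 key) and that the abridged topic is its first 4 bytes; both are assumptions made for illustration, as is the use of `hashlib.sha3_256` as the hash.

```python
import hashlib
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Sender side: one random AES-256 key, salted once per full topic.
key = secrets.token_bytes(32)
full_topics = [hashlib.sha3_256(name).digest() for name in (b"topic-a", b"topic-b")]
salted_topics = [xor_bytes(key, topic) for topic in full_topics]

# Only the abridged topics travel in the envelope header for routing.
abridged_topics = [topic[:4] for topic in full_topics]

# Recipient side: knowing any one full topic is enough to recover the key.
recovered_key = xor_bytes(salted_topics[1], full_topics[1])
assert recovered_key == key
```

A relay that only sees `abridged_topics` and `salted_topics` cannot undo the XOR, because it lacks the remaining 28 bytes of each full topic.
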
The signature, if provided, is the SHA3-256 hash of the unencrypted payload signed using ECDSA with the insertion-identity’s secret key.
In the Javascript API, the distinction between envelopes and messages is blurred. This is because DApps should know nothing about envelopes whose message cannot be inspected; the fact that nodes pass envelopes around regardless of their ability to decode the message (or indeed their interest in it at all) is an important component in Whisper’s dark communications strategy.
#### Topics
> It might not be feasible to try to decrypt ALL incoming envelopes, because decryption is quite expensive. Topic gives a probabilistic hint about encryption key.
Upon receipt of a message, if the node detects a known Topic, it tries to decrypt the message with the corresponding key. In case of failure, the node assumes that a Topic collision has occurred, i.e. the message was encrypted with another key, and simply forwards it further. Collisions are not only expected, they are necessary for plausible deniability.
An Envelope can be encrypted with only one key, and therefore it contains only one Topic.
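
The Topic-as-hint behaviour might look like the sketch below. `toy_open` is a stand-in for authenticated decryption (it only checks an HMAC tag and does not encrypt anything), and `try_open` is a hypothetical helper name; neither comes from the Whisper API.

```python
import hashlib
import hmac
from typing import Callable, Dict, Optional


class DecryptionError(Exception):
    """Raised when data cannot be opened with the given key."""


def toy_open(key: bytes, data: bytes) -> bytes:
    # Stand-in for authenticated decryption, NOT real encryption:
    # data is a 32-byte HMAC tag followed by the body.
    tag, body = data[:32], data[32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise DecryptionError("tag mismatch")
    return body


def try_open(
    topic: bytes,
    data: bytes,
    keys_by_topic: Dict[bytes, bytes],
    open_fn: Callable[[bytes, bytes], bytes] = toy_open,
) -> Optional[bytes]:
    """Return the plaintext if the Topic is known and the key fits, else None."""
    key = keys_by_topic.get(topic)
    if key is None:
        return None  # unknown Topic: do not spend effort on decryption
    try:
        return open_fn(key, data)
    except DecryptionError:
        return None  # Topic collision: encrypted with some other key
```

In both `None` cases the caller would still forward the envelope, since forwarding does not depend on the ability to decrypt.
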
**Symmetric Encryption**
Nodes must exchange symmetric keys via some secure channel anyway. They might use the same channel in order to exchange the corresponding Topics as well.
**Asymmetric Encryption**
In case of asymmetric encryption, it might be more complicated since public keys are meant to be exchanged via the open channels. So, the Ðapp has a choice of either publishing its Topic along with the public key (thus compromising on privacy), or trying to decrypt all asymmetrically encrypted Envelopes (at considerable expense). Alternatively, PoW requirement for asymmetric Envelopes might be set much higher than for symmetric ones, in order to limit the number of futile attempts.
#### Filters
> Any Ðapp can install multiple Filters utilising the Whisper API. Filters contain the secret key (symmetric or asymmetric), and some conditions, according to which the Filter should try to decrypt the incoming Envelopes.
If Envelope does not satisfy these conditions, it should be ignored:
* array of possible Topics (or partial Topics)
* Sender address
* Recipient address
* PoW requirement
* AcceptP2P: boolean value, indicating whether the node accepts direct messages from trusted peers
All incoming messages that have satisfied the Filter conditions AND have been successfully decrypted will be saved by the corresponding Filter until the Ðapp requests them.
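
A Filter with polling can be sketched as below. The class shape and method names are hypothetical, and "partial Topic" is assumed here to mean a byte prefix of the 4-byte Topic; the note does not define it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Filter:
    topics: List[bytes]             # full or partial (prefix) Topics
    min_pow: float = 0.0            # PoW requirement
    sender: Optional[bytes] = None  # expected sender, None accepts anyone
    accept_p2p: bool = False        # accept direct messages from trusted peers
    _inbox: List[bytes] = field(default_factory=list, init=False, repr=False)

    def matches_envelope(self, topic: bytes, pow_value: float) -> bool:
        """Cheap checks made before any decryption is attempted."""
        if pow_value < self.min_pow:
            return False
        # An empty topic list matches nothing in this sketch.
        return any(topic.startswith(prefix) for prefix in self.topics)

    def matches_sender(self, signer: Optional[bytes]) -> bool:
        """Check made after decryption, once the signer is known."""
        return self.sender is None or self.sender == signer

    def deliver(self, payload: bytes) -> None:
        self._inbox.append(payload)

    def poll(self) -> List[bytes]:
        """Hand over saved messages to the Ðapp and clear the store."""
        messages, self._inbox = self._inbox, []
        return messages
```
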
In future versions subscription will be used instead of polling.
#### PoW
> The purpose of PoW is spam prevention, and also reducing the burden on the network.
Thus, we can use PoW as a single aggregated parameter for the message rating. In future versions every node will be able to set its own PoW requirement dynamically and communicate this change to the other nodes via the Whisper protocol. Currently it is only possible to set the PoW requirement at Ðapp startup.
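
This note does not give the PoW formula. As I recall it, the Whisper v5 spec defines PoW as 2^(leading zero bits of the envelope hash) divided by (size × TTL); treat that formula as an assumption here. The sketch also uses `hashlib.sha3_256` as a stand-in for the hash Whisper actually uses, and the function names are made up.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits before the first set bit of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def pow_value(body: bytes, env_nonce: bytes, ttl: int) -> float:
    """Assumed formula: 2^leading_zero_bits / (size * ttl)."""
    if ttl <= 0:
        raise ValueError("ttl must be positive")
    digest = hashlib.sha3_256(body + env_nonce).digest()  # stand-in hash
    size = len(body) + len(env_nonce)
    return 2 ** leading_zero_bits(digest) / (size * ttl)


def meets_threshold(body: bytes, env_nonce: bytes, ttl: int, min_pow: float) -> bool:
    """Envelopes below the node's PoW requirement would be dropped."""
    return pow_value(body, env_nonce, ttl) >= min_pow
```

Under this formula a larger or longer-lived envelope needs more work to reach the same rating, which is what lets PoW act as a single aggregated parameter.
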
#### Basic Operation
> Nodes are expected to receive and send envelopes continuously.
Nodes should:
* Maintain a map of envelopes, indexed by expiry time, and prune accordingly.
* Efficiently deliver messages to the front-end API by maintaining mappings between Ðapps, their filters and envelopes.
When a node’s envelope memory becomes exhausted, a node may drop envelopes it considers unimportant or unlikely to please its peers. Nodes should rate peers higher if they pass them envelopes with higher PoW. Nodes should blacklist peers if they pass invalid envelopes, i.e., expired envelopes or envelopes with an implied insertion time in the future.
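
A minimal sketch of an expiry-indexed envelope pool with the two validity checks just described. The class, its method names and the `allowed_skew` clock tolerance are illustrative assumptions.

```python
import heapq
from typing import Dict, List, Tuple


class EnvelopePool:
    def __init__(self, allowed_skew: int = 20) -> None:
        self.allowed_skew = allowed_skew           # hypothetical clock tolerance, seconds
        self._heap: List[Tuple[int, bytes]] = []   # (expiry, envelope hash)
        self._expiry_by_hash: Dict[bytes, int] = {}

    def __len__(self) -> int:
        return len(self._expiry_by_hash)

    def add(self, env_hash: bytes, expiry: int, ttl: int, now: int) -> bool:
        """Return False for an invalid envelope; the caller may then blacklist the peer."""
        if expiry < now:
            return False  # already expired
        if expiry - ttl > now + self.allowed_skew:
            return False  # implied insertion time lies in the future
        if env_hash not in self._expiry_by_hash:
            self._expiry_by_hash[env_hash] = expiry
            heapq.heappush(self._heap, (expiry, env_hash))
        return True

    def prune(self, now: int) -> int:
        """Drop every envelope whose expiry has passed; return how many were dropped."""
        dropped = 0
        while self._heap and self._heap[0][0] < now:
            _, env_hash = heapq.heappop(self._heap)
            del self._expiry_by_hash[env_hash]
            dropped += 1
        return dropped
```
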
> Personal note: PoW solves the issue of spam attacks being free, but leaves more of an issue for attackers being able to raise the barrier for honest nodes PoW with financial/"work" means.
Nodes should always treat messages that their ÐApps have created no differently from incoming messages.
> Personal note: How do you ensure this? Also, why?
To send a message:
* The node should place the envelope in its envelope pool.
* This envelope will then be forwarded to the peers in due course along with the other envelopes.

Composing an envelope from a basic payload is done in a few steps:
1. Compose the Envelope data by concatenating the relevant flag byte, padding, payload (randomly generated or provided by user), and an optional signature.
2. Encrypt the data symmetrically or asymmetrically.
3. Add a Topic.
4. Set the TTL attribute.
5. Set the expiry as the present Unix time plus TTL.
6. Set the nonce which provides the best PoW.
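
The six steps can be sketched as one function. Everything here is illustrative: the name `compose_envelope`, the flat byte layout used for hashing (not the real wire encoding), the `iterations` work budget, and `hashlib.sha3_256` as a stand-in hash. The `encrypt` callable is supplied by the caller, for example an ECIES or AES-GCM routine.

```python
import hashlib
from typing import Callable, Dict


def compose_envelope(
    payload: bytes,
    encrypt: Callable[[bytes], bytes],
    topic: bytes,
    ttl: int,
    now: int,
    padding: bytes = b"",
    signature: bytes = b"",
    flags: int = 0,
    iterations: int = 1024,
) -> Dict[str, object]:
    if len(topic) != 4:
        raise ValueError("topic must be 4 bytes")
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    # Step 1: flag byte, padding, payload, optional signature.
    message = bytes([flags]) + padding + payload + signature
    # Step 2: encrypt symmetrically or asymmetrically (caller's choice).
    data = encrypt(message)
    # Steps 3-5: topic, TTL, and expiry = present Unix time + TTL.
    expiry = now + ttl
    header = b"\x00" + expiry.to_bytes(4, "big") + ttl.to_bytes(4, "big") + topic + data
    # Step 6: keep the nonce whose hash has the most leading zero bits.
    best_nonce, best_bits = b"", -1
    for counter in range(iterations):
        nonce = counter.to_bytes(8, "big")
        digest = hashlib.sha3_256(header + nonce).digest()
        bits = 256 - int.from_bytes(digest, "big").bit_length()
        if bits > best_bits:
            best_nonce, best_bits = nonce, bits
    return {
        "version": b"\x00", "expiry": expiry, "ttl": ttl, "topic": topic,
        "data": data, "env_nonce": best_nonce, "best_bits": best_bits,
    }
```

A larger `iterations` budget tends to find a nonce with more leading zero bits, at the cost of more time spent before the envelope can be sent.
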
#### Mail Server *Security issue?*
> Store all messages, and resend them at the request of the known nodes.
The Mail Server should engage in peer-to-peer communication with the node and resend the expired messages directly. The recipient will consume the messages and will not forward them any further.
In order to facilitate this task, protocol-level support is provided in version 5. New message types are introduced to Whisper v.5: mailRequestCode and p2pCode.
* mailRequestCode is used by the node to request historic (expired) messages from the Mail Server.
* p2pCode is a peer-to-peer message, that is not supposed to be forwarded to other peers. It will also bypass the protocol-level checks for expiry and PoW threshold.
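
The different treatment of regular and peer-to-peer messages might be sketched as follows. The enum deliberately carries no real wire values, and restricting `P2P` to trusted peers is an assumption borrowed from the AcceptP2P filter condition; the function names are hypothetical.

```python
from enum import Enum, auto


class Code(Enum):
    MESSAGES = auto()      # regular envelopes
    P2P = auto()           # p2pCode: direct, never forwarded
    MAIL_REQUEST = auto()  # mailRequestCode: ask a Mail Server for expired messages


def should_accept(code: Code, expiry: int, pow_value: float,
                  now: int, min_pow: float, peer_trusted: bool) -> bool:
    if code is Code.P2P:
        # Bypasses the expiry and PoW checks, so trust in the peer is all that remains.
        return peer_trusted
    if code is Code.MESSAGES:
        return expiry >= now and pow_value >= min_pow
    return False  # other codes do not carry envelopes to accept in this sketch


def should_forward(code: Code) -> bool:
    # Only regular envelopes are passed on to other peers.
    return code is Code.MESSAGES
```

The bypass is presumably why the heading above asks about security: a p2p message has no PoW cost, so the node relies entirely on its choice of trusted peers.
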
#### Silent Operation
> The more one advertises to one's peers attempting to “fish” for useful messages and steer such messages towards oneself, the more one reveals to one's peers.
For a securely anonymous dynamic two-way conversation, this trade-off becomes problematic; significant topic-advertising would be necessary for the point-to-point conversation to happen with sensible latency and yet so little about the topic can be advertised to guide messages home without revealing substantial information should there be adversary peers around an endpoint.
In this situation, dynamic topic generation would be used. This effectively turns the datagram-orientated channel into a connection-oriented channel.
### Whisper PoC 2 Protocol Spec
> details the full Whisper protocol for the first proof-of-concept and sets the vision for the final design.
#### What Whisper Is (and Is Not)
> Whisper combines aspects of both DHTs and datagram messaging systems (e.g. UDP).
Whisper is a new protocol designed expressly for a new paradigm of application development. It is designed from the ground up for easy and efficient multi-casting and broadcasting.
It is designed to be a building block in next generation ÐApps which require large-scale many-to-many data-discovery, signal negotiation and modest transmissions with an absolute minimum of fuss and the expectation that one has a very reasonable assurance of complete privacy.
#### Pitch-Black Darkness
> To understand information leakage, it is important to distinguish between mere encryption, and darkness.
Even with encrypted communications, well-funded attackers are still able to compromise one's privacy, often quite easily.
In the case of a simple client/server model, metadata betrays with which hosts one communicates - this is often plenty enough to compromise privacy given that content is, in many cases, largely determinable from the host.
With decentralised communications systems, e.g. a basic non-routed but encrypted VoIP call or Telehash communication, a network packet-sniffing attacker may not be able to determine the specific content of a transmission, but with the help of ISP IP address logs they would be able to determine to whom one communicated, when and how often. For certain types of applications in various jurisdictions, this is enough to be a concerning lack of privacy.
Even with encryption and packet forwarding through a third relay node, there is still ample room for a determined bulk transmissions-collector to execute statistical attacks on timing and bandwidth, effectively using their knowledge of certain network invariants and the fact that only a finite number of actors are involved. There are ways to mitigate this attack vector, such as using multiple third-party relays and switching between them randomly, or using very strict framing; however, both are imperfect and can lead to substantial inefficiencies.
> A truly dark system is one that is utterly uncompromising in information leakage from metadata.
#### Routing and Lack Thereof
> One of Whisper’s differences is in providing a user-configurable trade-off between one's routing privacy and one's routing efficiency.
At its most dark, Whisper nodes are entirely reactive - they receive and record pieces of data and forward them trying to maximise the utility of information transmission to the peers.
However, Whisper is also designed to be able to route probabilistically using two methods, both giving away minimal routing information and both being exceptionally resilient to statistical attacks from large-scale metadata collection.
1. > The first builds on the functionality of the ÐΞV-p2p backend. This backend provides the ability of Whisper to rate peers and, over time, probabilistically alter (or steer) its set of peers to those which tend to deliver useful (on-topic, timely, required for ones ÐApps to function) information. Ultimately, as the network evolves and the peer-set is steered, the number of hops between this peer and any others that tend to be good conduits of useful information (be they the emitters or simply the well-positioned hubs) will tend to 0.
2. > The second is more dynamic. Nodes are informed by their ÐApps over what sort of topics are useful. Nodes are then allowed to advertise to each peer describing these topics.
Through combining and reducing the Blooms/masks, weaker Nth-level information can be provided to peers about their peers’ interests, forming a probabilistic topic-reception vortex around nodes, the “topic-space” gravity-well getting weaker and less certain the farther away with the network hop distance from any interested peers.
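
Combining Bloom filters to advertise weaker Nth-level interest can be sketched as below. The filter size, the number of bits per topic and the hash are hypothetical choices for illustration, not parameters taken from the spec.

```python
import hashlib
from typing import Iterable

BLOOM_BITS = 512     # hypothetical filter size
BITS_PER_TOPIC = 3   # hypothetical number of bits set per topic


def topic_bloom(topic: bytes) -> int:
    """Map a topic to a sparse bit mask, held in a Python int."""
    digest = hashlib.sha3_256(topic).digest()
    bloom = 0
    for i in range(BITS_PER_TOPIC):
        index = int.from_bytes(digest[2 * i:2 * i + 2], "big") % BLOOM_BITS
        bloom |= 1 << index
    return bloom


def combine(blooms: Iterable[int]) -> int:
    """OR masks together: the result covers more topics but is less specific."""
    combined = 0
    for bloom in blooms:
        combined |= bloom
    return combined


def may_match(bloom: int, topic: bytes) -> bool:
    """True if the topic could be of interest; false positives are possible."""
    mask = topic_bloom(topic)
    return bloom & mask == mask
```

Each OR step fills in more bits, so a mask relayed over several hops matches more and more unrelated topics. This corresponds to the gravity well getting weaker and less certain with hop distance.
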
### Whisper Wire Protocol
Peer-to-peer communications between nodes running Whisper clients run using the underlying ÐΞVp2p Wire Protocol.
This is a preliminary wire protocol for the Whisper subsystem. It will change.
For the Whisper sub-protocol, upon an active session, a Status message must be sent. Following the reception of the peer’s Status message, the Whisper session is active. The peer with the greatest Node Id should send a Messages message to begin the message rally. From that point, peers take it in turns to send (possibly empty) Messages packets.
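
The rule for who opens the message rally can be written as a one-line comparison. Treating Node Ids as big-endian unsigned integers is an assumption, and `sends_first` is a made-up name.

```python
def sends_first(local_id: bytes, remote_id: bytes) -> bool:
    """True if the local node has the greater Node Id and so sends Messages first."""
    if local_id == remote_id:
        raise ValueError("peers must have distinct Node Ids")
    return int.from_bytes(local_id, "big") > int.from_bytes(remote_id, "big")
```

After the first Messages packet, the two peers simply alternate, sending an empty packet when they have nothing to pass on.
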
### Conclusion