Chainlink Whitepaper

TLDR from my point of view.

Architecture

On-Chain Architecture

ChainLink nodes return replies to data requests or queries made by or on behalf of a user contract.

ChainLink has an on-chain component consisting of three main contracts: a reputation contract, an order-matching contract, and an aggregating contract.
The reputation contract keeps track of oracle-service-provider performance metrics.

The order-matching smart contract takes a proposed service level agreement (SLA), logs the SLA parameters, and collects bids from oracle providers. It then selects bids using the reputation contract and finalizes the oracle SLA.

The aggregating contract collects the oracle providers’ responses and calculates the final collective result of the ChainLink query. It also feeds oracle provider metrics back into the reputation contract.

ChainLink contracts are designed in a modular manner, so users can configure or replace them as needed. The on-chain workflow has three steps:

Oracle Selection
Data reporting
Result aggregation

Oracle Selection

ChainLink is a platform that allows for the selection and management of oracles, which are entities that provide external data to smart contracts on a blockchain.
The platform allows for the creation of service level agreements (SLAs) between oracle providers and customers, which specify the parameters of the oracle service, such as query parameters and reputation requirements.
Oracles can be selected manually or through automated matching using order-matching contracts.
ChainLink maintains a listing service that verifies the reputation and performance of oracles and offers an off-chain platform for buyers and sellers to negotiate SLAs. Once an SLA is agreed upon, it is recorded on the blockchain, and the oracles selected are notified to perform the service outlined in the SLA.

Automated order matching

Manual matching is not possible for all situations. For example, a contract may need to request oracle services dynamically in response to its load. For these reasons, automated oracle matching is proposed.

Once end users have specified their SLA proposal, instead of contacting the oracles directly, they submit the SLA to an order-matching contract. Submitting the proposal to the order-matching contract triggers a log that oracle providers can monitor and filter based on their capabilities and service objectives.

ChainLink nodes then choose whether to bid on the proposal or not, with the contract only accepting bids from nodes that meet the SLA’s requirements. When an oracle service provider bids on a contract, they commit to it, specifically by attaching the penalty amount that would be lost due to their misbehavior, as defined in the SLA.

Bids are accepted for the entirety of the bidding window. Once the SLA has received enough qualified bids and the bidding window has ended, the requested number of oracles is selected from the pool of bids. Penalty payments that were offered during the bidding process are returned to oracles who were not selected, and a finalized SLA record is created. When the finalized SLA is recorded it triggers a log notifying the selected oracles. The oracles then perform the assignment detailed by the SLA.
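The bidding flow above can be sketched as a small in-memory model. This is only an illustration under assumptions: the names OrderMatcher, SLA, min_reputation and penalty are hypothetical, reputation is a plain number, and selection simply ranks bids by reputation, a rule these notes do not specify.

```python
from dataclasses import dataclass, field


@dataclass
class SLA:
    oracles_needed: int     # number of oracles the user requests
    min_reputation: float   # hypothetical reputation requirement
    penalty: int            # deposit every bidder must attach


@dataclass
class OrderMatcher:
    sla: SLA
    reputation: dict                           # oracle id -> reputation score
    bids: dict = field(default_factory=dict)   # oracle id -> locked deposit
    window_open: bool = True

    def bid(self, oracle, deposit):
        """Accept a bid only while the window is open and the SLA terms are met."""
        qualified = (
            self.window_open
            and self.reputation.get(oracle, 0.0) >= self.sla.min_reputation
            and deposit == self.sla.penalty
        )
        if qualified:
            self.bids[oracle] = deposit
        return qualified

    def finalize(self):
        """Close the window, select oracles, and list refunds for the rest."""
        self.window_open = False
        if len(self.bids) < self.sla.oracles_needed:
            raise ValueError("not enough qualified bids")
        # Assumption: rank by reputation; the real selection rule may differ.
        ranked = sorted(self.bids, key=lambda o: self.reputation[o], reverse=True)
        selected = ranked[: self.sla.oracles_needed]
        refunds = {o: self.bids[o] for o in ranked[self.sla.oracles_needed:]}
        return selected, refunds
```

The deposits of selected oracles stay locked in bids, mirroring the penalty amount that would be lost on misbehavior, while refunds lists what goes back to everyone else.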

Data reporting

@TODO

Result Aggregation

Once oracles have provided their responses to a query, the results are sent to an aggregating contract. The aggregating contract calculates a weighted answer based on all the responses, and then the validity of each oracle's response is reported to the reputation contract. Finally, the weighted answer is returned to the calling contract, which can then use it to execute its function.

The process of detecting and rejecting incorrect values in oracle responses is specific to the type of data and the intended use of the data. Therefore, instead of having a single aggregating contract, ChainLink allows the end users to specify a configurable contract address for this purpose. ChainLink will provide a standard set of aggregating contracts, but customers can also use their own custom contracts as long as they follow the standard calculation interface.
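As a rough sketch of what one aggregating contract might compute, the function below takes a weighted average and flags each response as valid or not. The weighting scheme, the tolerance parameter and the function name are assumptions made for illustration; the notes only say that a weighted answer is calculated and that validity is reported to the reputation contract.

```python
def aggregate(responses, weights, tolerance):
    """Return (weighted answer, per-oracle validity flags).

    responses: oracle id -> numeric answer
    weights:   oracle id -> non-negative weight (hypothetical input)
    tolerance: maximum distance from the answer that still counts as valid
    """
    total_weight = sum(weights[o] for o in responses)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    answer = sum(responses[o] * weights[o] for o in responses) / total_weight
    # The validity flags are what would be fed back to the reputation contract.
    validity = {o: abs(v - answer) <= tolerance for o, v in responses.items()}
    return answer, validity
```

A custom aggregator would keep the same inputs and outputs and swap only the calculation, which is the point of making the contract address configurable.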

Off-Chain Architecture

Off-chain, ChainLink initially consists of a network of oracle nodes connected to the Ethereum network.

Their individual responses are aggregated via one of several possible consensus mechanisms into a global response that is returned to a requesting contract.

The ChainLink nodes are powered by the standard open source core implementation which handles standard blockchain interactions, scheduling, and connecting with common external resources. Node operators may choose to add software extensions, known as external adapters, that allow the operators to offer additional specialized off-chain services.

Chainlink Core

ChainLink Core is the software that runs on the oracle nodes. It is responsible for:

Connecting to the blockchain
Scheduling
Balancing work across its various external services

Work done by ChainLink nodes is formatted as assignments. Each assignment is a set of smaller job specifications, known as subtasks, which are processed as a pipeline. Each subtask performs a specific operation and passes its result on to the next subtask, until a final result is reached. ChainLink's node software comes with a few subtasks built in, including HTTP requests, JSON parsing, and conversion to various blockchain formats.
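The pipeline idea can be sketched with plain functions, where each subtask takes the previous result as input. Everything here is illustrative: the function names are hypothetical, the HTTP step is a stub that returns a canned body instead of calling the network, and the 32-byte big-endian encoding is just one example of a blockchain-friendly format.

```python
import json
from decimal import Decimal
from functools import reduce


def http_get_stub(url):
    """Stand-in for the HTTP subtask; returns a canned body, no network call."""
    return '{"data": {"price": "1234.56"}}'


def json_parse(path):
    """Build a subtask that parses JSON text and walks the given key path."""
    def subtask(body):
        value = json.loads(body)
        for key in path:
            value = value[key]
        return value
    return subtask


def to_uint256(scale):
    """Build a subtask that scales a decimal string and encodes it as 32 bytes."""
    def subtask(value):
        return int(Decimal(value) * scale).to_bytes(32, "big")
    return subtask


def run_assignment(subtasks, initial_input):
    """Feed the input through each subtask in order and return the final result."""
    return reduce(lambda result, subtask: subtask(result), subtasks, initial_input)


pipeline = [http_get_stub, json_parse(["data", "price"]), to_uint256(100)]
encoded = run_assignment(pipeline, "https://example.invalid/price")
```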

External Adapters

Beyond the built-in subtask types, custom subtasks can be defined by creating adapters. Adapters are external services with a minimal REST API. Because adapters are modeled in a service-oriented manner, a program written in any language can be turned into a subtask simply by adding a small intermediate API in front of it. Similarly, interacting with complicated multi-step APIs can be simplified to individual subtasks with parameters.
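A minimal sketch of the "small intermediate API in front of a program" idea follows, using only the Python standard library. The JSON field names (id, data, result, error) and the wrapped temperature conversion are assumptions made for illustration; they are not taken from the ChainLink adapter specification.

```python
import json
from http.server import BaseHTTPRequestHandler


def celsius_to_fahrenheit(celsius):
    """The existing 'program' being wrapped; any logic could sit here."""
    return celsius * 9 / 5 + 32


def handle(payload):
    """Map a JSON request to a JSON response (field names are hypothetical)."""
    try:
        result = celsius_to_fahrenheit(float(payload["data"]["celsius"]))
        return {"id": payload.get("id"), "data": {"result": result}}
    except (KeyError, TypeError, ValueError) as exc:
        return {"id": payload.get("id"), "error": str(exc)}


class AdapterHandler(BaseHTTPRequestHandler):
    """Thin REST layer: one POST endpoint that delegates to handle()."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve locally (blocks forever):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8080), AdapterHandler).serve_forever()
```

Keeping the logic in handle() and the transport in AdapterHandler means the wrapped program can be tested without starting a server.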

Chainlink Workflow

End user makes an on-chain request
ChainLink smart-contract logs an event for the oracles
ChainLink core picks up the event and routes the assignment to an adapter
ChainLink adapter performs a request to an external API
ChainLink adapter processes the response and passes it back to the core
ChainLink core reports data to ChainLink smart-contract
ChainLink smart-contract aggregates the responses and passes them back as a single response to the end user

ChainLink decentralization approach

ChainLink proposes three basic complementary approaches to ensuring against faulty nodes:

Distribution of data sources
Distribution of oracles
Use of trusted hardware

Distributing sources

A simple way to deal with a faulty single source Src is to obtain data from multiple sources, i.e., distribute the data source. A trustworthy ORACLE can query a collection of sources Src1, Src2, . . . , Srck, obtain responses a1, a2, . . . , ak, and aggregate them into a single answer A = agg(a1, a2, . . . , ak).

ORACLE might do this in any of a number of ways. One, for example, is majority voting. If a majority of sources return the identical value a, the function agg returns a; otherwise it returns an error. In this case, provided that a majority (> k/2) of the sources are functioning correctly, ORACLE will always return a correct value A.
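Majority voting as described above fits in a few lines. The function name agg_majority is hypothetical, and raising an exception stands in for "returns an error".

```python
from collections import Counter


def agg_majority(answers):
    """Return the value reported by more than half of the sources, else fail."""
    if not answers:
        raise ValueError("no answers to aggregate")
    value, count = Counter(answers).most_common(1)[0]
    if count > len(answers) / 2:   # strict majority: count > k/2
        return value
    raise ValueError("no majority among sources")
```

With k = 3 sources, agg_majority([10, 10, 11]) yields 10 even though one source is faulty, while three different answers raise an error.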

Distributing oracles

ChainLink's ideal ORACLE service is a distributed system.

Requests are distributed across both oracles and data sources. This figure shows an example of such two-level distribution.

Some of these oracles may be faulty. So clearly the set of all oracles’ answers A1, A2, . . . , An will need to be aggregated in a trustworthy way into a single, authoritative value A. But given the possibility of faulty oracles, where and how will this aggregation happen in ChainLink?

In-contract aggregation

The ChainLink smart-contract aggregates the oracle responses. In other words, it computes A = Agg(A1, A2, . . ., An) for some function Agg.

This approach is practical for small n, and has several distinct benefits:

Conceptual simplicity: Although the oracle is distributed, a single entity, the ChainLink smart-contract, performs aggregation by executing Agg.
Trustworthiness: The smart-contract can be publicly inspected, so its correct behaviour can be verified.
Flexibility: The ChainLink smart-contract can implement most desired aggregation functions.

Problems with in-contract aggregation (probably with most oracles)

Gamification

The problem of freeloading is a novel technical challenge presented by the oracle model. Freeloading is when an oracle, Oz, observes the response of another oracle, Oi, and copies it, thereby avoiding the expense of querying data sources, which may charge per-query fees. This weakens security by undermining the diversity of data source queries, and it also disincentivizes oracles from responding quickly. A solution to this problem is a commit/reveal scheme. In the first round, oracles send cryptographic commitments to their responses to the CHAINLINK-SC contract. After CHAINLINK-SC has received a quorum of commitments, it initiates a second round in which oracles reveal their responses. Because responses stay hidden until the commitments are in, an oracle has nothing to copy during the first round, and a freeloading attempt is easy to detect.
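The two rounds can be sketched with a hash-based commitment. This is a simplified model, not the whitepaper's exact protocol: SHA-256 over nonce, oracle id and response is an assumed commitment function, and binding the oracle id into the hash (so a copied commitment cannot be opened by another oracle) is a choice made for this sketch. The class and function names are hypothetical.

```python
import hashlib


def make_commitment(oracle, response, nonce):
    """Commit to `response` with a random 32-byte nonce, bound to the oracle id."""
    return hashlib.sha256(nonce + oracle.encode() + b"\x00" + response).digest()


class CommitRevealRound:
    def __init__(self, quorum):
        self.quorum = quorum
        self.commitments = {}   # oracle id -> commitment bytes
        self.responses = {}     # oracle id -> verified revealed response

    def reveal_open(self):
        """The reveal round starts once a quorum of commitments has arrived."""
        return len(self.commitments) >= self.quorum

    def submit_commitment(self, oracle, commitment):
        if self.reveal_open():
            raise RuntimeError("commit round already closed")
        self.commitments[oracle] = commitment

    def reveal(self, oracle, response, nonce):
        """Accept a reveal only if it matches this oracle's own commitment."""
        if not self.reveal_open():
            raise RuntimeError("quorum of commitments not reached yet")
        if self.commitments.get(oracle) != make_commitment(oracle, response, nonce):
            return False
        self.responses[oracle] = response
        return True
```

In practice each oracle would draw its nonce with secrets.token_bytes(32) and keep it private until the reveal round.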

Cost

In-contract aggregation has a key disadvantage: cost.
It incurs the cost of transmitting and processing O(n) oracle messages on-chain (commits and reveals for A1, A2, . . . , An). In permissioned blockchains, this overhead may be acceptable. In permissionless blockchains with on-chain transaction fees, such as Ethereum, the costs can be prohibitive if n is large. A more cost-effective approach is to aggregate oracle responses off-chain and transmit a single message A to CHAINLINK-SC. ChainLink proposes deploying this approach, called off-chain aggregation. It solves the cost issue, but the freeloading problem discussed under Gamification remains.

The ChainLink system proposes the use of a simple protocol involving threshold signatures GIP-2. Such signatures can be realized using any of a number of signature schemes, but are especially simple to implement using Schnorr signatures. In this approach, oracles have a collective public key pk and a corresponding private key sk that is shared among O1, O2, . . . , On in a (t, n) threshold manner. Such a sharing means that every node Oi has a distinct private / public keypair (ski, pki).
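To make "(t, n) threshold manner" concrete, here is a toy Shamir secret sharing sketch: any t of the n shares recover the key, while fewer do not determine it. This shows only the sharing idea. It is not a threshold Schnorr signature, the notes do not say which sharing scheme ChainLink uses, and in real threshold signing the key is never reassembled in one place. The field size and function names are assumptions.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime used as a toy field size, illustration only


def split_secret(secret, t, n):
    """Split `secret` into n shares so that any t of them reconstruct it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def poly(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, poly(x)) for x in range(1, n + 1)]


def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from at least t shares."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Each pair (x, y) plays the role of one oracle's private share ski in this analogy.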

ChainLink Security Services

Trusted hardware is being considered as a secure way to protect against corrupted oracles providing incorrect responses, but it may not provide definitive protection, for three reasons:

It will not be deployed in initial versions of the ChainLink network
Some users may not trust trusted hardware
Trusted hardware cannot protect against node downtime, only against node misbehavior

To this end, ChainLink proposes the use of four key security services:

Validation System
Reputation System
Certification Service
Contract-Upgrade Service

Validation system

Monitors on-chain oracle behavior, providing an objective performance metric that can guide user selection of oracles.

Availability: The validation system should record failures by an oracle to respond in a timely way to queries.
Correctness: The validation system should record apparent erroneous responses by an oracle, as measured by deviations from responses provided by peers.
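A minimal sketch of these two checks for a single query round follows. The use of the median as the peer reference and the max_deviation threshold are assumptions chosen for illustration; the notes only say that correctness is measured by deviation from peers.

```python
from statistics import median


def validate_round(expected, responses, max_deviation):
    """Return (unavailable oracles, deviating oracles) for one query round.

    expected:      ids of the oracles that were asked to respond
    responses:     oracle id -> numeric answer received in time
    max_deviation: hypothetical tolerance around the peer median
    """
    unavailable = sorted(o for o in expected if o not in responses)
    if not responses:
        return unavailable, []
    reference = median(responses.values())
    deviating = sorted(
        o for o, v in responses.items() if abs(v - reference) > max_deviation
    )
    return unavailable, deviating
```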

With off-chain aggregation, the on-chain contract does not have the visibility to monitor availability and correctness directly.
However, oracles digitally sign their responses and thus, as a side effect, generate non-repudiable evidence of their answers. The approach is to realize the validation service as a smart contract that rewards oracles for submitting evidence of deviating responses. In other words, oracles would be incentivized to report apparently erroneous behavior.

Availability is somewhat trickier to monitor, as oracles of course don't sign their failures to respond. Instead, a proposed protocol enhancement would require oracles to digitally sign attestations to the set of responses they have received from other oracles. The validation contract would then accept (and again reward) submission of sets of attestations that demonstrate consistent non-responsiveness by an underperforming oracle to its peers.
In both the on-chain and off-chain cases, availability and correctness statistics for oracles will be visible on-chain. Users and developers will thus be able to view them in real time through an appropriate front end, such as a Dapp in Ethereum or an equivalent application for a permissioned blockchain.

Reputation system

The Reputation System proposed for ChainLink would record and publish user ratings of oracle providers and nodes, offering a means for users to evaluate oracle performance holistically. Validation System reports are likely to be a major factor in determining oracle reputations and placing these reputations on a firm footing of trust. Factors beyond on-chain history, though, can provide essential information about oracle nodes.

Reputation is an on-chain component that can be easily referenced on-chain/off-chain.

Reputation metrics

Total number of assigned requests: The total number of past requests that an oracle has agreed to, both fulfilled and unfulfilled.
Total number of completed requests: The total number of past requests that an oracle has fulfilled. This can be divided by the number of requests assigned to calculate the completion rate.
Total number of accepted requests: The total number of requests that have been deemed acceptable by calculating contracts when compared with peer responses. This can be divided by total assigned or total completed requests to get insight into accuracy rates.
Average time to respond: While it may be necessary to give oracle responses time for confirmation, the timeliness of their responses will be helpful in determining future timeliness. Average response time is calculated based on completed requests.
Amount of penalty payments: If penalty payments were locked in to assure a node operator’s performance, the result would be a financial metric of an oracle provider’s commitment not to engage in an “exit scam” attack, where the provider takes users’ money and doesn’t provide services. This metric would involve both a temporal and a financial dimension.
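The counters above and the rates derived from them can be sketched as a small record type. The class and field names are hypothetical, and the acceptance rate here is taken over completed requests, which is one of the two options the list mentions.

```python
from dataclasses import dataclass


@dataclass
class OracleRecord:
    assigned: int = 0                    # requests the oracle agreed to
    completed: int = 0                   # requests it fulfilled
    accepted: int = 0                    # fulfilled requests deemed acceptable
    total_response_seconds: float = 0.0  # summed over completed requests
    penalty_locked: int = 0              # deposit currently at stake

    def record(self, completed, accepted=False, response_seconds=0.0):
        """Register the outcome of one assigned request."""
        self.assigned += 1
        if completed:
            self.completed += 1
            self.total_response_seconds += response_seconds
            if accepted:
                self.accepted += 1

    @property
    def completion_rate(self):
        return self.completed / self.assigned if self.assigned else 0.0

    @property
    def acceptance_rate(self):
        return self.accepted / self.completed if self.completed else 0.0

    @property
    def average_response_seconds(self):
        return self.total_response_seconds / self.completed if self.completed else 0.0
```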

Certification Service

While the Validation and Reputation Systems are intended to address a broad range of faulty behaviors by oracles and are proposed as a way to ensure system integrity in the vast majority of cases, ChainLink may also include an additional mechanism called a Certification Service. Its goal is to prevent and/or remediate rare but catastrophic events, specifically en bloc cheating in the form of Sybil and mirroring attacks.

Sybil and mirroring attacks. Both the simple and in-contract aggregation protocols of ChainLink aim to prevent freeloading, in which dishonest oracles copy honest oracles' answers. However, they do not protect against Sybil attacks. Sybil attacks occur when an adversary controls multiple oracles and attempts to dominate the oracle pool by providing false data. This can happen through a single adversary or through collusion among several. Additionally, to reduce operational costs, a Sybil attacker can adopt a behavior called mirroring, in which oracles share data off-chain but pretend to source data independently. This reduces security by eliminating the error correction resulting from diversified queries against a given source. To eliminate these attacks, ChainLink's long-term strategy is to use trusted hardware and a certification service.

The ChainLink Certification Service aims to provide assurance of integrity and availability for oracle providers, detecting and preventing mirroring and colluding oracle quorums. The service would issue endorsements of high-quality oracle providers. It would monitor the Validation System statistics on oracles and perform spot-checking of on-chain answers, particularly for high-value transactions, comparing them with answers obtained directly from reputable data sources. With sufficient demand for an oracle provider's data, there will be enough economic incentive to justify off-chain audits of oracle providers, confirming compliance with relevant security standards and providing useful security information. The certification service is planned as a means to identify Sybil attacks and other malfeasance that automated on-chain systems cannot detect.

LINK token usage

The ChainLink network uses the LINK token to pay ChainLink node operators for retrieving data from off-chain data feeds, formatting data into blockchain-readable formats, off-chain computation, and the uptime guarantees they provide as operators. For a smart contract on a network like Ethereum to use a ChainLink node, it needs to pay its chosen ChainLink node operator in LINK tokens, with prices set by the node operator based on demand for the off-chain resource its node provides and the supply of other similar resources. The LINK token is an ERC20 token, with the additional ERC223 "transfer and call" functionality of transfer(address,uint256,bytes), allowing tokens to be received and processed by contracts within a single transaction.
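The "transfer and call" idea can be modeled off-chain in a few lines: the token moves the balance and, in the same call, notifies the receiving contract with the attached data. This is a plain Python simulation, not contract code. The hook name on_token_received, the Token and OracleContract classes, and the rollback on failure are assumptions made to mimic single-transaction behavior.

```python
class Token:
    def __init__(self, balances):
        self.balances = dict(balances)   # holder -> token amount

    def transfer(self, sender, recipient, value, data=b""):
        """Move tokens and let a contract recipient process them in one step."""
        if self.balances.get(sender, 0) < value:
            raise ValueError("insufficient balance")
        snapshot = dict(self.balances)
        self.balances[sender] -= value
        self.balances[recipient] = self.balances.get(recipient, 0) + value
        hook = getattr(recipient, "on_token_received", None)  # hypothetical hook
        if hook is not None:
            try:
                hook(sender, value, data)
            except Exception:
                self.balances = snapshot   # undo the transfer, like a revert
                raise


class OracleContract:
    """Hypothetical receiver that treats an incoming payment as a data request."""

    def __init__(self, price):
        self.price = price
        self.requests = []

    def on_token_received(self, sender, value, data):
        if value < self.price:
            raise ValueError("payment below price")
        self.requests.append((sender, data))
```

Without the callback, paying and requesting would take two separate steps, and the oracle contract would need another way to learn which payment belongs to which request.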