Consensus-Based Content Reputation System

Status: Draft | Category: Standards

## Abstract

Decentralized validation and transmission of information have become essential. This white paper introduces a system for rating and validating content within a decentralized network, upholding the integrity and reliability of shared information. Throughout this document, we explore the design, challenges, and proposed solutions for this system.

## 1. Introduction

As multimedia content proliferates on distributed networks, robust and reliable systems are needed to preserve the integrity and reliability of information. This document proposes a consensus layer atop pubsub[1] designed to combat spam, deter abuse, and uphold the quality of multimedia content.

## 2. CID and SEP-001

Efficient and secure data transmission depends on standards that allow for the interoperability and reliability of transmitted information. SEP-001[2] is introduced as such a standard; it leverages the JWT[3] specification and emphasizes security and interoperability across networks.

**2.1 CID Concept**

Content Identifiers (CIDs)[4] are the method used to label and retrieve data in decentralized systems. Rather than referencing data by its location, as traditional URLs do, CIDs identify information by its content. This content-based identification ensures that the identity of the data remains consistent and verifiable.

**2.2 Relation**

Content retrieved from a CID must conform to the SEP-001[2] standard to be considered valid in the context of consensus. Specifically, any retrieval operation on a CID within this context must always result in an implementation that complies with the SEP-001 standard, adhering to any of the serializations described in its specification.
By resolving a CID[4], users access the precise data as originally issued by its creator, validating its authenticity and ensuring its unaltered state since initial storage.

## 3. Fingerprint Addressing

As digital consensus places growing emphasis on data integrity, the strategies used to reference or identify data have evolved. The JWK[12], used to represent a network participant's public key, lays the foundation for a system based on derived addresses, commonly known as fingerprints[5]. Much like externally owned accounts[6] in certain blockchain architectures, these fingerprints provide a unique identifier directly tied to the ownership and control of assets or data within the network.

**3.1 Operational Overview**

The SEP-001[2] standard integrates the public key into the CID's header. One advantage of this inclusion is the ability to derive a unique fingerprint from the associated public key[12]. Using the thumbprint[5] standard, this fingerprint becomes a distinct address, central to the rating system.

**3.2 Anatomy of an Account**

Accounts map each address to its corresponding statistics. Each address contains four fields:

* Nonce: A counter indicating the total number of votes.
* Rating: The overall rating balance of the account.
* Reputation: The total number of negative votes received.
* Timestamp: The time at which the account was created.

**3.2 Automatic Account Genesis**

Under the SEP-001 standard, whenever a participant submits a CID[4] for curation (Section 5, index 5.3), the system automatically initiates an account creation process. If an account associated with the derived address[5] does not already exist in the accounts store, it is instantiated, laying the foundation for subsequent operations.
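
The address derivation and automatic account creation described above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the names `jwk_thumbprint`, `Account`, and `ensure_account` are hypothetical, the accounts store is assumed to be a plain dictionary, and the starting rating of 1 is taken from index 3.3. The thumbprint follows the RFC 7638 procedure (required members only, lexicographic order, no whitespace, SHA-256, base64url without padding).

```python
import base64
import hashlib
import json
import time
from dataclasses import dataclass, field

# Required JWK members per key type (RFC 7638; OKP keys per RFC 8037).
_REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
    "OKP": ("crv", "kty", "x"),
}


def jwk_thumbprint(jwk: dict) -> str:
    """Derive the fingerprint (address) of a public key in JWK form."""
    members = _REQUIRED_MEMBERS[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in members},
        separators=(",", ":"),  # no whitespace
        sort_keys=True,         # lexicographic member order
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


@dataclass
class Account:
    """The four account fields listed in 'Anatomy of an Account'."""
    nonce: int = 0          # total number of votes
    rating: float = 1.0     # rating balance (initial value assumed from 3.3)
    reputation: int = 0     # negative votes received
    timestamp: float = field(default_factory=time.time)  # creation time


def ensure_account(accounts: dict, jwk: dict) -> Account:
    """Create the account for this key on first CID submission (hypothetical helper)."""
    address = jwk_thumbprint(jwk)
    if address not in accounts:
        accounts[address] = Account()
    return accounts[address]
```

Because only the required key members enter the hash, optional members such as `kid` do not change the derived address, so the same key always maps to the same account.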
* 3.3 Initial rating balance

Every account created through the CID submission process is given an initial rating balance of 1. This value is a gesture of trust or an initial reward, recognizing the user's effort in contributing a CID. The rationale is that, even though the content of the CID[4] is not immediately known and the network still needs to determine its validity or importance, this initial point gives the user the "benefit of the doubt."

## 4. Rating Weight

The 'rating weight' represents the interest and identity of an account on the network, reflecting both its historical contribution and its current relevance. It gives individuals the power to influence the network in proportion to their credibility and contribution. Inspired by Algorand[14], the weights are used to ensure that attackers cannot expand their power using pseudonyms (Sybil attack).

* 4.1 Rating Mechanism:

Each account on the network has a rating. Positive interactions, such as spreading a valid CID, increase this rating, while negative interactions, such as spreading a malicious CID[4] or spam, decrease it.

* 4.2 Cooling Mechanism:

To ensure an equitable distribution of power on the network and to prevent domination by high-rating accounts, we introduce a cooling mechanism. Once an account reaches a certain rating threshold, subsequent increases in its rating have diminishing returns. The account continues to be rewarded for its positive actions, but the magnitude of each increase shrinks as its share of the network's total rating grows.

```
FUNCTION CoolingReward(address):
    n = sum(GetRatings())
    x = GetAccountRating(address)
    reward = 1 - (x / n)
    RETURN reward
```

![](https://hackmd.io/_uploads/SJj6iJynh.png)

## 5. CID Reputation

The reputation system is pivotal in gauging the trustworthiness of the network.
It assigns a reputation score to every CID, updating it based on account interactions.

* 5.1 Hash Key

A key component in this process is the hash key of each CID, which allows the system to identify unique content efficiently. The CID hashes are used as keys, ensuring that the actual CIDs[4] are not publicly revealed.

```
FUNCTION HashKey(cid):
    hash = SHA256(SHA256(cid))
    RETURN hash
```

* 5.2 Reputation store

The reputation store is a rainbow table[7] that maintains the reputation level associated with each hash; the value stored for each hash is a structure with positive (allow) and negative (deny) weights.

```
FUNCTION RetrieveReputation(cid):
    hash = HashKey(cid)
    reputation = reputationStore[hash]
    RETURN reputation
```

* 5.3 CID Submission

Before a CID can be registered in the reputation store, its creator must cast a self-vote (Section 6, index 6.1) in support of that CID[4].

* 5.5 Initial Weight

The weight assigned to each hash when a new key is created starts at 1 for "allow" (originating from the initial vote or self-vote on the CID; see Section 3.3) and 0 for "deny". This approach is adopted because, initially, the content of the CID[4] is unknown and it is not appropriate to make assumptions about it.

```
FUNCTION SetInitialWeight(cid):
    hash = HashKey(cid)
    IF hash NOT IN reputationStore:
        initial = { allow: 1, deny: 0 }
        reputationStore[hash] = initial
    RETURN hash
```

* 5.6 Deny/Allow Exchange mechanism

Each time new votes are recorded in the reputation store, the negative ratio is recalculated. Based on the verdict, appeal, and absolution described in Section 7, if the verdict is a penalty, the associated hash is moved to the denylist and simultaneously removed from the allowlist. If the verdict is an absolution, the hash is removed from the denylist and added back to the allowlist. By default, a CID[4] is on the allowlist.
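
The hash key, initial weight, and deny/allow exchange can be combined in a short Python sketch. It is illustrative only and rests on several assumptions: the class and method names are hypothetical, each vote is assumed to carry a weight of 1, the inner SHA-256 result is fed to the outer hash as raw bytes, and the 51% and 50% thresholds are taken from Section 7, index 7.2.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Reputation:
    allow: float = 1.0  # initial self-vote (index 5.5)
    deny: float = 0.0


def hash_key(cid: str) -> str:
    """Double SHA-256 of the CID, so the raw CID is never used as a key."""
    inner = hashlib.sha256(cid.encode("utf-8")).digest()
    return hashlib.sha256(inner).hexdigest()


class ReputationStore:
    """Hypothetical in-memory store keyed by CID hash."""

    def __init__(self) -> None:
        self.weights: dict[str, Reputation] = {}
        self.denylist: set[str] = set()  # anything not here is allowlisted

    def submit(self, cid: str) -> str:
        """Register a CID with its initial weights if it is not yet known."""
        key = hash_key(cid)
        self.weights.setdefault(key, Reputation())
        return key

    def record_vote(self, cid: str, intention: int, weight: float = 1.0) -> float:
        """Apply one vote (1 = allow, -1 = deny) and re-evaluate the lists."""
        key = hash_key(cid)
        rep = self.weights[key]  # raises KeyError if the CID was never submitted
        if intention > 0:
            rep.allow += weight
        else:
            rep.deny += weight
        ratio = rep.deny / (rep.allow + rep.deny)
        if ratio > 0.51:      # penalty: move to the denylist
            self.denylist.add(key)
        elif ratio < 0.50:    # absolution: back to the allowlist
            self.denylist.discard(key)
        return ratio

    def is_allowed(self, cid: str) -> bool:
        return hash_key(cid) not in self.denylist
```

In this sketch a ratio between 50% and 51% leaves the current list membership unchanged, which is one possible reading of the two thresholds.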
* 5.7 Understanding Negative Reputation

Within the network, votes can be either negative or positive. It is not essential to detail the "reason" behind a negative vote. Instead, the process relies on the network's integrity and trust to determine when a CID[4] has been negatively marked or placed on a "denylist". Because the possible reasons stem from many social and cultural environments, each with its own subjectivity, it is challenging, if not impossible, to cover the full spectrum of "reasons" directly.

## 6. Voting Intention

Voting embodies each user's intention and stance within the network. Ensuring a voter's genuine recognition of the CID is paramount. This authenticity can be confirmed by the network through signature verification. Every vote is unique and can either support or contest a CID[4].

* 6.1 Self vote

The self-vote is a vote in favor of one's own CID[4] within the reputation system. The requirement is deliberate: it acts as a testament to the content creator's dedication, providing an initial indicator of trust in the content's authenticity and quality. By requiring this intentional step from the creator, it can also function as a preliminary filter against spurious or harmful content. This step is foundational, paving the way for organic voting by the wider network membership.

* 6.2 Voter's table

The voter's table ensures that votes are not cast multiple times for the same CID. Its primary objective is to keep track of the voters and the CID[4] hashes they have voted for. To achieve storage efficiency and fast querying, a Bloom Filter[8] is employed. The Bloom Filter provides a way to check an element's membership in a set without the need to store the entire set itself.
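
To make the membership check concrete, here is a minimal Bloom filter sketch in Python. The class, its parameters (`size_bits`, `num_hashes`), and the `voter_item` helper are hypothetical choices for illustration; the document does not specify filter size, hash count, or how the address and CID hash are concatenated.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed by several derived hash positions."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive num_hashes positions by prefixing the item with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def voter_item(address: str, cid_hash: str) -> bytes:
    """Key stored in the voter's table: SHA-256 over address plus CID hash."""
    return hashlib.sha256((address + cid_hash).encode("utf-8")).digest()
```

A Bloom filter never reports a stored item as missing, but it can occasionally report an item that was never added. In this setting that would mean a first-time vote is treated as a duplicate, so the filter size and hash count would need to be tuned to keep that rate acceptably low.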
```
FUNCTION FindVoter(address, cid):
    item = SHA256(address + cid)
    exists = BloomSearch(item)
    RETURN exists
```

* 6.3 Votes Pool

Because nodes emit votes asynchronously, a node designated as a validator might collect votes from the "pool" inconsistently if it lacks an appropriate conflict resolution mechanism. The 'pool' is an operation-based U-Set[9] CRDT, serving as a holding area for votes waiting to be collected. The U-Set[9], a simplified version of the 2P-Set[9] CRDT, operates under the assumption that each vote is unique and that once a vote is removed, it will never be added again (see Section 6, index 6.2). Additionally, a 'downstream precondition'[9] ensures the causal order of operations, thereby guaranteeing consistent convergence within the set of votes. The removal of votes is reserved exclusively for the validator chosen by the network. Once the votes have been processed and integrated into a block, the validator issues a signed removal operation, ensuring efficient cleaning of the pool.

* 6.4 Vote format

Voting intentions are expressed using JWS in its flattened serialization form. To streamline signature validation, the JWS header should contain the public key, following the jwk header parameter spec[13].

- 6.4.1 Header

Votes are chronologically ordered and identified by a unique hash:

* Hash: A distinct hash for the vote, acting as a unique identifier.
* Clock: A logical clock that records chronological order.
* Operation: The operation to apply, e.g. 'insert' or 'remove'.

- 6.4.2 Payload

Each vote must include the following claims:

* Intention: 1 for "Allow" or -1 for "Deny".
* Cid: The specific Content Identifier (CID) that the vote is associated with.

* 6.5 Voting moral

Sybil attacks are among the most well-known issues in traditional consensus protocols.
As part of the solution to this problem, we propose the concept of "voting morality." A participant in the network is considered to have sufficient "moral load" to vote for or against a CID[4] when their accumulated rating is at least the value given by:

```
FUNCTION CalcMoral():
    ratings = GetRatings()
    x = sum(ratings[i])
    moral = ln(x)
    RETURN moral
```

![](https://hackmd.io/_uploads/BkpSo1Jn2.png)

The logarithmic[11] calculation ensures gradual growth as the harmonic mean increases, allowing for a smooth adjustment of the required value to attain the "moral vote" threshold. The combination of the harmonic mean and logarithmic calculation offers a balanced approach that prevents the dominance of extreme values and ensures a fair distribution of decision-making power in the network or platform.

## 7. Appeal and Absolution

The appeal is a tool based on the dynamic calculation of a threshold, considering votes for and against within the network. Its purpose is to determine whether these votes should have a positive or negative impact on decisions, depending on whether they surpass the established negative threshold.

* 7.1 Negative Ratio Calculation

This function determines the proportion of negative votes and thus whether the majority of votes are for or against a specific hash.

```
FUNCTION CalcNegativeRatio(cid):
    stats = RetrieveReputation(cid)
    total = stats.allow + stats.deny
    ratio = stats.deny / total
    RETURN ratio
```

![](https://hackmd.io/_uploads/HJZQmlJ3h.png)

* 7.2 Decision Process

The will of the majority is the determining factor in the network's decisions. Every verdict given by the votes can be subject to appeal and subsequently be absolved of any penalty. The condition is as follows: if the negative proportion exceeds 51% of the votes, a negative effect is triggered, and a penalty is imposed on the verdict. Conversely, if the proportion is below 50%, the appeal process leads to the absolution of the imposed verdict.

## 8. Penalization and Reward

The Account Penalization/Reward system is a mechanism to enforce good behaviour and discourage detrimental activities in the network. It involves a decrease or increase, respectively, in a network account's rating.

* 8.1 Validator reward and penalization

A validator node is penalized for malicious or bad-faith behavior within the network. Once a node is selected, it is granted the privilege, through the collective trust of the network, to issue the next block of information. If any of the parameters set out in Section 11 are violated, the network imposes a sanction, reducing the selected node's rating to half of its total value. Conversely, if the validator produces a 'healthy' block, contributing positively to the network, the account's rating is increased as specified in Section 4, index 4.2.

```
FUNCTION PenalizeValidatorRating(address):
    rating = GetAccountRating(address)
    penalization = rating / 2
    RETURN penalization
```

* 8.2 Voting reward

Voters are the fundamental pillar of the rating system and determine the reputation of the information published through the CIDs. They establish a "collective criterion" through the decisions made in their votes. It is therefore essential for network members to maintain their commitment, ensuring an honest and good-faith environment that promotes the curation of content distributed across the network. To encourage active participation, the voting reward increases the rating balance of the voter, as specified in Section 4, index 4.2.

```
FUNCTION GetVoterReward(address, cid):
    exists = FindVoter(address, cid)
    IF NOT exists:
        RETURN CoolingReward(address)
    RETURN 0
```

* 8.3 Lock/Unlock mechanism

An account rating can either increase or decrease based on the account's activities.
If an account's rating falls below 0, the account is locked and prohibited from further participation until the rating is raised above 0 again, at which point the account is automatically unlocked.

```
FUNCTION LockUnlockAccount(address):
    rating = GetAccountRating(address)
    IF rating < 0:
        LockAccount(address)
    ELSE:
        UnlockAccount(address)
    RETURN rating
```

* 8.4 Factors Leading to Penalization

Penalization occurs when an account attempts to propagate any content that violates the parameters outlined in Section 11. For every violation, the account's rating decreases by 1 point. This mechanism aims to discourage the distribution of inappropriate or harmful content.

* 8.5 Factors Leading to Reward

Rewards are granted to accounts that disseminate content adhering to the guidelines outlined in Section 11. When such criteria are fulfilled, the account's rating is increased as specified in Section 4, index 4.2. This mechanism promotes the circulation of credible, high-quality content, improving informational trustworthiness and the user experience.

* 8.6 Impact of Negative and Positive Voting on the Account's Rating

Every time new votes are added to the reputation store, the system recalculates the negative ratio. As detailed in Section 7, the outcome of the verdict, appeal, and absolution processes directly affects the account rating associated with a particular CID hash:

- If the verdict results in a penalty, the rating decreases by 1 point.
- If the verdict leads to absolution, the rating increases by an amount N calculated as specified in Section 4, index 4.2.

## 9. Validators

Validators are responsible for releasing the next block of information on the network, which includes valid CIDs and validated votes from participants. Both the CID[4] and the votes play a crucial role in the network's operation.
**9.1 Democratic Validation:**

Validators are selected by the network based on their rating weight, longevity and behavior in a process inspired by proof of stake[10].

**9.2. Selection parameters:**

Choosing a validator is an essential process based on the accumulated rating and longevity of each validator. The probability of a validator being selected is directly related to its rating and uptime on the network.

* 9.2.1 Rating-based Reputation: This process ensures that validators with higher ratings have a higher probability of being selected, incentivizing nodes to act positively and constructively within the network.
* 9.2.2 Behavior-based Reputation: A validator who has never had sanctions or penalties on the network would have a slight advantage over another with a less clean track record.

## 10. Operations Blocks

Blocks serve as efficient and secure transmission mechanisms for conveying voting intentions throughout the network. They serve the same purpose as a replicated state machine[15]. Blocks enable the efficient transmission of operations, mitigating network congestion caused by individual operation transmission and processing, thus enhancing system scalability. Furthermore, blocks facilitate the preservation of chronological order, capturing operations in the sequence they occur. Each block links to its preceding and succeeding blocks, ensuring the continuity of operations.

![image](https://hackmd.io/_uploads/SkNHIdBeA.png)

> *Figure: Raft's Proposed Replicated State Machine Architecture https://www.alibabacloud.com/blog/an-introduction-to-the-system-architecture-abstraction-of-the-replicated-state-machine_599279*

**10.1 Blocks**

The blocks constitute an operation-based list RGA[9] CRDT designed exclusively for insertions, with write access restricted to the validator chosen by the network. Each block contains a set of voting intentions, referring to Section 6, index 6.3, collected at a specific moment.
These blocks act as temporal snapshot logs that immutably record the validated votes at a specific instant. The RGA[9] CRDT is an ideal choice for ordered collections. In our scenario, it represents a linked list of blocks that demands causality to ensure efficient replication, steadfast persistence, and consistent state rebuilding across the diverse nodes of the network.

**10.3 Blocks and State Reconstruction**

The logic of state reconstruction follows the "event sourcing" pattern[16], where each block contains a set of events captured at a given moment and stored as a "sequence of changing state events", which when executed in the correct order leads to a deterministic global state.

As Martin Fowler points out, "The fundamental idea of Event Sourcing is to ensure that every change in the state of an application is captured in an event object, and that these event objects are stored in the sequence they were applied for the same lifetime as the application state itself."

Since the insertion of new blocks into the list is limited to one validating node at a time, it inherently remains "conflict-free". Although this may seem like an act of "overengineering", the reconstruction of the global state or consensus requires consistency. This consistency is essential, whether for new nodes requesting a replica of all blocks or for nodes that lost connection at a specific point and need a partial update of those blocks.

Each node maintains a set of stored blocks as logs, which represent a sequence of changing states. A replicated and cohesive global state is achieved by systematically applying the operations housed within each block to the current state of every node.

**10.2 Block format**

Blocks adopt the JWS JSON flattened serialization method for construction.
It is essential that each block be signed by the validator designated for its creation. To streamline block signature validation by nodes, the JWS header must include the public key, adhering to the jwk header parameter spec[13].

**10.2.1 Block header**

Blocks are chronologically ordered and are uniquely identified by both their hash and number. This structure guarantees a continuous, tamper-resistant sequence stretching from the genesis block to the most recent one. The key components of the block header are:

* Number: Assists in tracking the cumulative block count and supports the system's chronological ordering.
* Clock: A logical clock that records chronological order and determines causality.
* After: A hash referencing the logically preceding operations block.
* Hash: A cryptographic hash representing the content of the current block.
* Merkle root: By integrating and hashing votes into a Merkle tree, the block's data integrity is ensured.

**10.2.2 Block Payload**

Each block payload must include the following claim:

* Votes: A set of vote intentions.

## 12. Jurisdiction and Channels

Channels can be seen as analogous to different "chains" in blockchain networks. Each channel has its own state and set of rules or jurisdictions related to the content of the CIDs[4] that might be considered appropriate or admissible in each channel. This allows different channels to comply with specific regulations, making global compliance a harmonious adaptation rather than an obstacle. For example, a channel may need to comply with US laws in the US, Australian laws in Australia, the laws of region T, etc.

## 14. Conclusions

This white paper has presented an approach to tackle the challenges of content transmission on decentralized networks.
The outlined system incorporates a rating mechanism intended both to protect the integrity of the content and to promote the fair and reliable participation of all nodes.

## 15. References

[1] PubSub - "PubSub Overview". libp2p Documentation. Accessed on August 29, 2023. - https://docs.libp2p.io/concepts/pubsub/overview/

[2] SEP-001 - "SEP-001 Specification". Synapse Media GitHub Repository. Accessed on August 29, 2023. - https://github.com/SynapseMedia/sep/blob/main/SEP/SEP-001.md

[3] JWT - M. Jones et al., "JSON Web Token (JWT)", RFC 7519, May 2015. - https://www.rfc-editor.org/rfc/rfc7519

[4] CID - "Content Identifiers: What is a CID?". IPFS Documentation. Accessed on August 29, 2023. - https://docs.ipfs.tech/concepts/content-addressing/#what-is-a-cid

[5] JWK Thumbprint - M. Jones, "JSON Web Key (JWK) Thumbprint", RFC 7638, September 2015. - https://www.rfc-editor.org/rfc/rfc7638

[6] Externally Owned Accounts - "Accounts". Ethereum Developers Documentation. Accessed on August 29, 2023. - https://ethereum.org/en/developers/docs/accounts/

[7] Rainbow Table - "Rainbow Table". Wikipedia. Accessed on August 29, 2023. - https://en.wikipedia.org/wiki/Rainbow_table

[8] Bloom Filter - "Bloom Filter". Wikipedia. Accessed on August 29, 2023. - https://en.wikipedia.org/wiki/Bloom_filter

[9] A comprehensive study of Convergent and Commutative Replicated Data Types - M. Shapiro et al., "A comprehensive study of Convergent and Commutative Replicated Data Types". HAL archives-ouvertes. - https://hal.inria.fr/inria-00555588

[10] Proof of Stake - "Proof of Stake". Wikipedia. Accessed on August 29, 2023. - https://en.wikipedia.org/wiki/Proof_of_stake

[11] Harmonic mean - "Harmonic mean". Wikipedia. Accessed on August 29, 2023. - https://en.wikipedia.org/wiki/Harmonic_mean

[11] Logarithm - "Logarithm". Wikipedia. Accessed on August 29, 2023. - https://en.wikipedia.org/wiki/Logarithm

[12] JWK - M. Jones, "JSON Web Key (JWK)", RFC 7517, May 2015. - https://www.rfc-editor.org/rfc/rfc7517

[13] JWS - M. Jones et al., "JSON Web Signature (JWS)", RFC 7515, May 2015. - https://www.rfc-editor.org/rfc/rfc7515

[14] Algorand - Y. Gilad et al., "Algorand: Scaling Byzantine Agreements for Cryptocurrencies". - https://people.csail.mit.edu/nickolai/papers/gilad-algorand-eprint.pdf

[15] State machine replication - "State machine replication". Wikipedia. Accessed on April 11, 2024. - https://en.wikipedia.org/wiki/State_machine_replication

[16] Event sourcing - Martin Fowler, "Capture all changes to an application state as a sequence of events." December 2005. - https://martinfowler.com/eaaDev/EventSourcing.html