# Decentralize BadBits - A Transparent and Collaborative Claims Network

(Adapted from Vasco's [original proposal](https://hackmd.io/@vasco-santos/Sy2Pv9V9kg))

## Introduction

The Bad Bits Denylist is a public list of hashed Content Identifiers (CIDs) that have been flagged for various reasons, such as copyright violations, malware, and other forms of abuse. This list lets IPFS node operators, such as public IPFS gateway providers, opt out of serving content flagged as problematic by other authorities or services.

[IPIP-383](https://github.com/ipfs/specs/blob/main/src/ipips/ipip-0383.md) took the first step toward decentralizing content moderation in IPFS setups. However, the only list being published in the [IPIP-383](https://github.com/ipfs/specs/blob/main/src/ipips/ipip-0383.md) format is the output of the current takedown process, which is centralized under Protocol Labs (PL) and requires manual review of email or Google Form submissions. This process lacks transparency, is prone to inefficiencies, and does not scale with the growing needs of the IPFS ecosystem.

Currently, error messages (on any gateway) refer the requester of a taken-down CID to this central process, so while multi-list functionality is already live, no other lists are available on [denyli.st](https://denyli.st) or in IPFS documentation/tutorials. Waiting for organic demand is not enough to spread the responsibility and service burden to non-PL-operated gateways. Facilitating the development of multiple lists and weaning people off the status quo (total dependence on a single list) requires some incentivization and investment in the community.

The [IPIP-383](https://github.com/ipfs/specs/blob/main/src/ipips/ipip-0383.md) format does not provide out-of-the-box flexibility for diverse policies (operators with different needs, risk assessments, or even jurisdictional requirements), nor out-of-the-box protection against tampering and abuse.

We feel that additional functionality, such as machine-readable and interoperable "takedown receipts" issued whenever gateway operators add something to their own blocklist, can reduce the compliance risk of gateway operators, give them more control over local "dispute resolution" processes for their own customers, and incentivize them to subscribe to each other's lists in addition to (or instead of) the single list operated by PL. This holistically supports the move to push more traffic off public gateways and to incentivize downstream developers to host their own gateways on domains they control (finally addressing a years-long issue of "ISP-level blackouts" caused by sloppy implementation of public suffix lists).

## Project Overview

This proposal consists of three outputs:

1. An IPIP that extends [IPIP-383](https://github.com/ipfs/specs/blob/main/src/ipips/ipip-0383.md), specifying a data model for the per-block "annotations" left unspecified in 383, and any other functionality needed to scaffold interoperable, machine-readable annotations of blocks.
2. In addition to the list-subscription mechanism for syncing denylists already specified in 383, a specification of an HTTP API (for either internal use or use between trusted gateways) to check a single CID against denylists in real time.
3. A quick-and-dirty prototype of a system that generates takedown receipts and publishes them before appending each block to a local denylist. This will be for evaluation purposes only, allowing gateway operators and Shipyard to evaluate or even benchmark the core of a local takedown service.

If there is sufficient evaluator interest in adopting items 1, 2, and/or 3 in a more production-ready form, these can be addressed in a future grant and/or incorporated into Shipyard's roadmap.

## Technical Design

### Core Principles

* **Transparency** – Every claim is verifiable and publicly auditable.
* **Decentralization** – Multiple actors participate in maintaining the list, preventing unilateral control.
* **Customizability** – Operators can choose whom to trust and define their own policies.
* **Automation** – Claims submission and verification can be automated, reducing manual overhead.

### Technical Overview

We propose transitioning from one centrally managed compact denylist to a patchwork of lists by extending each denylist entry with annotations. These annotations form a decentralized registry of moderation receipts authored by actors identified with DIDs, which are linked from each compact denylist entry and synced between actors.

1. A gateway operator (e.g. w3s.link) associates a simple `did:web` identifier with each gateway by hosting a DID document at, e.g., `https://w3s.link/.well-known/did.json`.
2. Anyone can submit a claim that a CID is malicious to a gateway operator, along with a reason and supporting metadata. If accepted, this takedown event is encoded in a compact DAG-CBOR document attached (by CID reference) to each denylist entry, after the blocking gateway's DID. A dictionary for coarse-grained reasons will be included in the annotation data model IPIP, as illustrated in the following example:

   ```json
   {
     "data": {
       "cid": "bafy...",
       "reason": "Malware detected",
       "timestamp": 1710000000,
       "claimer": "did:web:w3s.link"
     },
     "signature": "abc123..."
   }
   ```

3. Gateway operators can choose whom to trust by maintaining an offline list of the DIDs of trusted gateways (or even the DIDs of other trusted reporters, not necessarily limited to `did:web`).
4. Claims are cryptographically signed and stored transparently, allowing verifiable decision-making.
5. Consumers of lists can _optionally_ apply policies based on the DID, the verification of the signature, the timestamp, and/or the reason. Simply ignoring the annotations and blocking anything that appears in any list to which a gateway is subscribed (i.e. the status quo) remains possible as well, and may well be the default behavior for smaller and self-hosted gateways.

### Implementation

Given the core importance of this system, the implementation MUST enable easy adoption by the community and facilitate more, not fewer, users hosting their own gateways. The prototype for gateway-operator consideration can hard-code the `did:web` features above, since gateways are the primary adopters, but a production-ready version **MAY** abstract this to allow other DID methods if adopter feedback justifies using the same mechanism for gateways and other actors.

For the prototype implementation, each actor (e.g., an IPFS gateway operator) must have a Decentralized Identifier (DID) hosted under their domain using `did:web`; abstracting to support other DID methods and/or other actor types can be explored in future iterations. Actors can publish a receipt, cryptographically signed with their DID-associated key, for each entry they add to their own denylist. By default, these receipts can be published and pinned on the DHT, but they could additionally be hosted and requested via API, or otherwise shared with other gateways.

Other operators can, when syncing denylists or when fetching CIDs, query and verify existing claims, checking:

1. Whether a CID has been flagged as bad.
2. Who flagged it and whether they trust that entity.
3. (Optionally) the receipt for the flagging event, fetched by its CID and parsed to apply more fine-grained policies based on the taxonomy of takedown categories provided in the IPIP or in future ones.

As all of these claims will be accessible via the DHT, they form a publicly accessible registry, ensuring transparency, auditability, and tamper resistance via cryptographic signatures.

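The consumer-side checks listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not part of the proposed design: the `Receipt` fields mirror the example JSON in the Technical Overview, while `did_web_to_url`, `should_block`, and the injected `verify` callback are hypothetical names. Signature verification is left abstract because the proposal does not yet fix a signature scheme. The mapping from a `did:web` identifier to a URL follows the did:web method specification.

```python
from dataclasses import dataclass
from typing import Callable, Collection, Optional
from urllib.parse import unquote


def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the HTTPS URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did}")
    # Colons separate optional path segments; a port colon is encoded as %3A.
    host, *path = did[len(prefix):].split(":")
    host = unquote(host)
    if path:
        return f"https://{host}/{'/'.join(path)}/did.json"
    return f"https://{host}/.well-known/did.json"


@dataclass(frozen=True)
class Receipt:
    """Hypothetical in-memory form of a takedown receipt (fields assumed)."""
    cid: str
    reason: str
    timestamp: int
    claimer: str
    signature: str


def should_block(
    receipt: Receipt,
    trusted_dids: Collection[str],
    verify: Callable[[Receipt], bool],
    blocked_reasons: Optional[Collection[str]] = None,
) -> bool:
    """Apply a local policy to one receipt.

    `verify` is an assumed callback that checks the signature against the
    key found in the claimer's DID document; it is injected because the
    signature scheme is not specified here.
    """
    # Who flagged it, and does this operator trust that entity?
    if receipt.claimer not in trusted_dids:
        return False
    # Reject receipts whose signature does not verify.
    if not verify(receipt):
        return False
    # Optional fine-grained policy on the coarse-grained reason.
    if blocked_reasons is not None and receipt.reason not in blocked_reasons:
        return False
    return True
```

An operator that keeps the status quo would skip `should_block` entirely and block every CID appearing in a subscribed list; one that wants fine-grained control would pass its offline list of trusted DIDs and, optionally, the set of reasons it acts on.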
## User Feedback and Adoption Plan

Organizations that could benefit from adopting this system:

* Storacha (w3s.link)
* nft.storage (nftstorage.link)
* ipfs.io / dweb.link
* Pinata
* Fleek
* Other public IPFS gateways and storage providers

Outer ring of possible interest:

* How are Swan/Ramo/Lilypad taking down toxic content generated in their marketplaces?
* Is the Bluesky moderation team interested in sharing notes?
* Does ROOST include any other platforms that would be interested in switching to a DASL-based system?

These organizations would be asked for input at the beginning of the design process and, after the conclusion of the prototype, invited to review the prototype and draft designs. Pending interest, follow-on grants could productionize the prototype, productize it, modify the design, add mechanisms for handling false reports, and/or assist integration into existing codebases (including Rainbow and/or Helia).

## Schedule and Budget

This grant is split into 5 milestones. Each milestone has a reasonably similar timeline and an estimated effort of 30 hours (~4 days) for an engineer, assuming 6 to 10 hours of design support and community engagement from the IPFS Foundation core team per milestone.

1. Annotation data model IPIP.
2. Specification, implementation, and release of an HTTP API library and datastore interfaces for the state registry.
3. Implementation and release of a JS client library and CLI tool, allowing developers to easily interact with the service, parse and filter lists by DID or by moderation receipt metadata, etc.
4. Documentation adequate for serious evaluation by Gateway-as-a-Service providers and other possible adopters; additionally, technical documentation sufficient for integration and for a follow-on grant to be implemented, potentially, by another grantee less familiar with the Rainbow, Kubo, and Storacha stack.
5. Quick-and-dirty prototype that can be run locally, with a datastore implementation based on the append-files approach in https://denyli.st/

Total engineering budget: `30 hrs @ $200/hr x 5 = $30,000` (USD)

### Projected Scope for Follow-on grant, after user research validation

1. Deployable service to accept claims from DIDs (potentially with datastore aggregation, depending on user needs).
2. Documentation and adoption guide for how to deploy and operate a running system.
3. One year of maintenance on all of the above.

### Future work

Once there is a deployable PoC for this "gateway sidecar", figuring out a governance model is critical. Ecosystem funding is essential for keeping the service operational and widely adopted. This would unblock:

1. Deploying the service API with public access.
2. Maintaining infrastructure.
3. Monitoring and observability.
4. Ensuring uptime and reliability.
5. Engaging with the community for adoption.

Other ideas to build on top of the initial proposed MVP:

* Implement a reputation layer to prioritize trusted sources; work with gateway providers and/or other possible users (ROOST?)
* Explore further decentralization of the badbits service, like open-sourcing reporting-pipeline tooling that conforms to the proposed annotation data model

## Qualifications of Team, including prior open-source work

- Vasco Santos
  - Worked on IPFS, libp2p, UCAN tooling, and the reads pipeline and moderation system at web3.storage/nft.storage before the nucleation of Storacha
- Bumblefudge
  - Can support on the IPIP and the design of the system (not included in budget)
- TBD