# Blocklist Tagging & Architecture Sketch

## wait wut

see https://hackmd.io/_QVeXEeDSVaJNEpo69GdXw for context

## Open Design Questions

1. Can [published] lists consist of heterogeneous entries in terms of, say, URI type? Mix regex strings, URLs, and CIDs?
    - Filter out URLs which are CIDs?
2. Is it useful to distinguish "internal representations" or merged lists from published lists?
3. Is ignoring all per-item tags and executing an entire list a valid use-case, or are there any tags that need to be parsed?
4. Can the same list format be used for user accounts, regexes/filter expressions, cusswords, ActivityPub URLs, CAIP URIs, etc? Should lists only contain one kind of entry, in this regard?
    - How hard could/should we lean on, e.g., IPNS or some such mutable indirection to handle actors by reference, to avoid native-format blocklists specific to a network/use-case?

## Minimum Viable Tags-onomy

Presumably locking us into a 10x10 matrix isn't gonna work; just wanted to have tags for use in a diagram to sketch out some use-cases. More robust tag namespace to come if the other stuff gets traction among the operators.

### Types of Lists

> Hypothesis: across most use-cases, individual lists would cover ranges like 10-19 or 30-39, with specific numbers used as hints per-item, but some lists (or derived/merged lists and internal lists) could drill down to single-digit lists, or single-entry-type lists?

* 00 - "Universally" bad bits
    - (double-hashed)
* 10 - Sexytime
* 20 - Rightsholdertime
* 30 - Toxictime
* 40 - Antisocial Timeouts (content)
* 50 - Antisocial Timeouts (actors)
* 60 - Inauthentic Activity
* 70 - Inauthentic Actors

Presumably within each of these high-level categories there would be subcategories; just spitballing here:

* 10 - Sexytime
    - 17 - porno companies would call this niche
    - 18 - porno companies would call this amateur (i.e. potentially non-consensual; see 22/23 below)
    - 19 - credit card companies (or other specific, globally-important authority) would call this porno
* 20 - Rightsholdertime
* 30 - Content Toxic to social norms
    - 37 - Scoped to some
    - 38 - Scoped to a specific set of countries or sub-country regions
    - 39 - Scoped to specific country***
* 40 - Antisocial Timeouts (content)
* 50 - Antisocial Timeouts (actors)
    - 58 - subnet something something??
    - 59 - Scoped to specific identity system/social graph/identifier-namespace***
* 60 - Inauthentic Activity
    - 68 - Spam (regardless of humanity of author)
    - 69 - Inauthentic content suspected/detected
* 70 - Inauthentic Actors
    - 78 - Fraud
    - 79 - Astroturf/fake profiles

## Scoping Lists Thoughts

* It probably doesn't make sense to overspecify this or overthink it in advance, but the top-level categories (10s, 20s, etc) were designed assuming most list publishers and subscribers would naturally group them all into ONE list for all 1X or 2X categories.
* As such, blocks wouldn't need to be tagged per item, except perhaps as annotations where entries link to documentation of decisions for transparency
* A list publisher could annotate individual blocks in detail with "internal" tags on top of the externally consumed ones
* Subscribers in conservative/paranoid mode could just process entire lists as "block all" (ignore all tags)
* Subscribers with more nuanced policies could use tags to selectively consume and/or replicate/subset-publish.
* Perhaps someone publishing country-specific lists would break 39 out from the rest of the 30s, at time of blocking or later, say at time of publication, as a convenience to subscribers?
* Model of "authority" is schema-based as well as actor-based, i.e., you might choose to subscribe to some list-publishers and ignore others, AND/OR get granular about authorities using annotations and linked documentation
* 5X might only make sense within social-graph anchored data like ActivityPub, Bluesky, and Nostr, where stable identifiers can be attributed to actors and instances. Analogy here is 39::59, i.e. treat these networks like nations (and hope they're not run by benevolent dictators)
* rabbithole - bridged protocols? :scream:
* totally honest, I am not 100% sure there's even a use-case for using the same rails for 5X as for everything else. Do IPFS gateways ever need to care what users are blocking what users, or is that only for servers running social-web-specific software in a low-trust/permissionless environment? Might be moot until there are multiple bskys, multiple instances running ipfs-based AP, etc
* AP dev community is still [figuring out how to add this to AP core data model...](https://socialhub.activitypub.rocks/t/fep-c648-blocked-collection/3349/7)

## User Stories

1. Gateway X in country DE wants to not serve a single CID with a certain nasty symbol on it
    - X subscribes to every 39:de list and even follows many 3X lists, scanning for 39:de entries just to be sure
2. Divide and conquer: 3 major gateways (A,B,C) decide to divvy up responsibility.
    - A will publish an authoritative list of 1X blocks every 24 hours.
    - B will publish an authoritative list of 2X blocks every 24 hours.
    - C will publish an authoritative list of 3X blocks, plus a few 39 lists for exceptionally enforcement-addled countries
    - A, B, and C have a 24 hour embargo on new content
    - A & B compile 31-only lists and send them to C just as an input and courtesy
    - A & B subscribe to C's 3X list
        * A blocks anything 32 and above, and all 39s just in case
        * B is a little more liberal, serves 32s until the same CID appears as a 33, only applies 39s regionally
3. Consensus-based Moderation (see [this ActivityPub Prior Art](https://codeberg.org/oliphant/blocklists#the-tier-consensus-system)), or "polling" multiple blocklists (of actors and/or of content)

## Merging Logistics
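
As a starting point for the merging discussion, here is a rough Python sketch of how a subscriber might consume tagged lists in the three ways described above: "block all" (ignore tags), selective consumption by tag and scope (the `39:de` case from user story 1), and a simple count-based "polling" of several lists. Everything here is an assumption made for illustration: the `Entry` shape, the `scope` field, the function names, and the sample identifiers are all hypothetical, and the threshold rule is a simplification, not the tier system from the linked prior art.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One blocklist item (hypothetical shape, not a proposed format)."""
    target: str      # CID, URL, actor ID, etc.; entry typing is still an open question
    tag: int         # two-digit tag from the sketch above, e.g. 39
    scope: str = ""  # optional qualifier, e.g. "de" for a 39:de entry


def block_all(entries):
    """Conservative/paranoid mode: ignore every tag and block the whole list."""
    return {e.target for e in entries}


def in_decade(tag, decade):
    """True if `tag` falls in the ten-wide range starting at `decade` (30 covers 30..39)."""
    return decade <= tag < decade + 10


def select(entries, tag, scope=None):
    """Selective mode: keep only entries with an exact tag and, optionally, a scope."""
    return {
        e.target
        for e in entries
        if e.tag == tag and (scope is None or e.scope == scope)
    }


def consensus(lists, threshold):
    """Polling mode: targets that appear on at least `threshold` of the given lists."""
    counts = Counter()
    for entries in lists:
        # Count each target once per list, even if a list repeats it under several tags.
        counts.update({e.target for e in entries})
    return {target for target, n in counts.items() if n >= threshold}


# Made-up sample lists from three hypothetical publishers.
list_c = [Entry("cid-a", 32), Entry("cid-b", 39, "de"), Entry("cid-c", 39, "fr")]
list_d = [Entry("cid-b", 39, "de"), Entry("cid-d", 31)]
list_e = [Entry("cid-a", 33), Entry("cid-d", 31)]

# User story 1: scan every 3X list for 39:de entries.
all_3x = [e for lst in (list_c, list_d, list_e) for e in lst if in_decade(e.tag, 30)]
print(sorted(select(all_3x, 39, "de")))

# User story 3: block whatever at least two publishers agree on.
print(sorted(consensus([list_c, list_d, list_e], threshold=2)))
```

Note that `consensus` deliberately ignores tags, so two publishers who list the same CID under different numbers (32 vs 33, say) still count as agreeing; whether a merge should require tag agreement as well is one of the things this section would need to decide.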