# DigiCom Exam Prep

# OSN History

### Beginning

- BBS and IRC

---

### Pre OSN

Web services with user profiles, interests, and connections

Examples:
- Match.com (1995)
- Classmates.com (1995)
- Ryze.com (2001)

---

### OSN Era

Web services that **satisfy the formal definition**

SixDegrees was the first one to reach that goal

Details differ

Examples:
- sixdegrees.com (1997)
- Friendster (2002)
- MySpace (2003)
- Hi5 (2003)
- Facebook (2004)
- Orkut (2004)

---

### Copycat Era

Competitors emerge around the globe

Examples:
- StudiVZ (2005)
- RenRen (2005)
- VKontakte (2006)
- Odnoklassniki (2006)
- Google+ (2011)

---

### Niche OSN

Focus on a specific type of content

Examples:
- Images
    - Flickr
    - Instagram
    - Tumblr
- Video
    - YouTube
    - Twitch
    - Vine
    - TikTok
- Business
    - LinkedIn
    - Xing
- Travel
    - TripAdvisor
- Microblogging
    - Twitter

---

### Decentralized OSN Era

(Mostly) independent, open source projects

Allow users to set up their **own servers** or to choose from a **variety of mutually independent servers**

Different Types:
- P2P
- Federated
- Hybrid

Examples:
- Identi.ca (2008)
- Diaspora (2010)
- Friendica (2010)
- GNU social (2010)
- Hubzilla (2012)
- Pump.io (2012)
- Mastodon (2016)

---

### Instant Messaging

Focused on textual communication

**Not** actual **OSN** services

**Similar evolution** to OSN services: **copycats**, then **specialization** and **decentralization**

Examples:
- IM
    - ICQ (1996)
    - AIM (1997)
    - MSN Messenger (1999)
    - Tencent QQ (1999)
    - Live Messenger (2001)
    - Skype (2003)
- Mobile IM
    - WhatsApp (2009)
    - WeChat (2011)
    - FB Messenger (2011)
    - iMessage (2011)
    - Line (2012)
    - Signal (2013)
    - Telegram (2013)
    - Hangouts (2013)
    - Discord (2015)

---

# Definition

**Boyd and Ellison**

2007: An OSN is a web-based service that allows „[…] individuals to (1) construct a **public or semi-public** profile within a bounded system, (2) articulate a **list of other users** with whom they share a common connection, and (3) **view and traverse their list of
connections** and those made by others within the system“.

2013: OSN services are „networked communication platforms in which participants (1) have **uniquely identifiable profiles** that consist of user-supplied content, content provided by other users, and/or system-provided data; (2) can **publicly articulate connections** that can be viewed and traversed by others; and (3) can **consume, produce, and/or interact with streams of user-generated content** provided by their connections on the site“.

Datta et al. (2010): An OSN is "an online platform that (1) provides services for a user to build a **public profile** and to explicitly **declare the connection between his or her profile with those of other users**; (2) enables a user to share information and content with the chosen users or public; and (3) **supports the development and usage of social applications** with which the user can interact and collaborate with both friends and strangers".

---

## Definition

- User **Profiles**
- Friend Lists / **Connections**
- **Interaction** with and access to profiles (Social Graph)
- **Communication** between users

### Web Service vs. OSN

- User Profiles
- List of connections (Social graph)
- Access to content of other users

---

## Classification

### Classification by Properties

Two Dimensions:
- Social presence and Media richness
- Self-presentation and Self-disclosure

![](https://i.imgur.com/zCBeiuP.png)

### Classification by Scope

Two Dimensions:
- Usage: Private vs. Business
- Focus: General vs. Special Interest

![](https://i.imgur.com/xntXrkq.png)

---

## OSN Features

### Definition: OSN Feature

Functionality that facilitates **interaction between users** or **access to content**

If **I can access** something specific in **another user**’s OSN profile, **it is an OSN feature**. Otherwise not.

### Core Featureset

- **Social Profile**
    - Customizable page representing the user
    - Date of Birth, Picture, Hobbies, ...
    - Accessible by other users
- **Link**
    - Connections to other users
    - Indicates friendship or interest
    - Uni- or bidirectional
    - Usually involves a protocol
- **Conversation**
    - IM between two or more users
    - Can contain images, files, video, ...
- **Activity Stream**
    - Personalized feed
    - Status updates and content from others
    - Typically a combination of actor, verb, and object
- **Like/Reaction**
    - Public statement
    - Like/Dislike
    - Reactions (more specialized)
- Music & Playlists
- Image/Video/Livestream
- Poke (nudge)
- Comments
- Events
- Voice/Video Calls
- Groups

# DECENTRALIZATION

## Centralization

### Web Service Architecture

A typical web service follows the **traditional client-server** paradigm

**Server provides all functionality**, data storage, authentication, access control, …

**Clients access the web service** via a web browser or special client software (e.g. mobile application)

**Easy to implement, easy to maintain.**

In case **changes** need to be implemented, the **central server is updated**

### Initial Idea

- Arpanet
    - Fault-tolerant communication network that can withstand even war or natural disasters
    - Connecting mainframes of US universities
- WWW
    - Sir Tim Berners-Lee invented the web as a decentralized, distributed network for information exchange and collaboration
    - Connecting documents via the Internet

### Re-centralization of the Web

- Services depending on third-party resources
    - Hosted on cloud services (AWS, Azure, GCP)
    - Integration of other services (Payment, Maps)
    - Scripts loaded from external sources
- Vulnerable (e.g. the 2016 DDoS attack on the DNS provider Dyn)

![](https://i.imgur.com/98GMjRK.png)

### Reasons for Centralization

- Clients have fewer resources than servers

---

### Distributed vs. Decentralized

In a **distributed system**, a collection of **independent computers** appears to its users as a **single coherent system**.
Distribution is achieved by **placing logically different components on different machines**.

But: the components of **decentralized systems** form a network of connected components **without having any central element of coordination or control**

- Centralized:
    - **One entity** or organization is able to exert **control** over (at least) one component of a service.
- Decentralized:
    - **No entity** is able to exert control over **any part** of a service that is critical for its operation
- No decentralization without distribution!

### Centralized Frustration

- Data Silos: Access to data and functionality exclusively via proprietary APIs/protocols
- Barriers between OSN services
- The “Social Web” has become a landscape of “isolated islands”
- OSN Services as a Commodity: Giving up on OSN services: 28% Hard
- Significant loss of functionality: Data is not reusable outside of the site

### Bill of Rights for Users of the Social Web

- Ownership of their own personal information

### Interoperability

Four Layers:
- Technology (being able to connect)
- Data (exchange of messages)
- Humans (understanding each other)
- Institutional (collaboration between companies)

![](https://i.imgur.com/FjLaATh.png)

#### Why?
- Interoperability fosters innovation and competition
- Dominant companies will often attempt to further cement their market position
- Introduces more diversity

#### Requirements

- A formal description of a protocol
- A set of accompanying data formats
- An open standard applying those

#### Effects

- Network Effect
- Lock-In Effect
- Walled Gardens

#### Data Portability

Allow users to export and re-import their user data between services

The European General Data Protection Regulation (GDPR) grants a general right to data portability in Article 20

-> Problem: No data format is defined -> ZIP file

#### Done Right

Requirements:
- **Data formats** for cross-platform information exchange
- **Protocols and APIs** for interoperability between different OSN service implementations
- **Protocols for data portability**
- A **domain-independent identification framework** for users and data

### Interoperability of DOSN

Open source implementations

Implemented with open protocols and data formats from the “Indieweb”

Due to poor or **missing documentation** of the used protocols and standards, the implementations are **mostly incompatible**, even within the same protocol suite

#### Aggregator Services

- Applications or services that manage accounts from multiple OSN services
- The aggregator fetches content from all managed OSN services and displays it in an integrated view

---

## Decentralization

### Benefits

#### Censorship

If a system has a central entity (organization, server, ...), it may impede access to information by
- Restricting access to the service
- Inhibiting communication
- Deleting content
- Forcing (de)publication of specific content

Dark side of decentralization: No way to remove
- Hate speech
- Fake news
- Illegal content

#### Data Privacy and Control

Problems without it:
- Targeted Advertisement
- Filter Bubble
- “Hacking” Society

Implications:
- Complex Organization and Management
- Technical Overhead
- Business Models

## Requirements for Design and Implementation

### Web Services

- Interoperability
- Integration into WWW
- Security
- Scalability
- Extensibility

### Future Internet Design Principles

- Heterogeneity
- Scalability
- Robustness
- Loose coupling
- Locality

---

### Decentralized OSN Services

#### Design and Implementation

- Transparency
- Integration
- Functionality
- Relations
- Availability
- Confidentiality
- Access Control
- Privacy

#### Success

- Independence
- Free-of-Charge
- Online Times (Availability)
- Mobile Support
- Efficiency
- Scalability
- Resiliency
- Privacy
- Performance

---

# OSN Architectures

## Overview

![](https://i.imgur.com/nV45qyg.png)

### Peer to Peer

- Serverless
- Clients connect directly
- Nodes store data and provide functionality
- Problems
    - Availability
    - Authenticity / Reliability
    - Access control

![](https://i.imgur.com/yDjuoNA.png)

### Federation

- Multiple independent servers (pods) running the same OSN
- Users select a pod and create a user profile
- Pods are connected via loose coupling
- Communication between pods via a federation protocol

![](https://i.imgur.com/CoSFWDf.png)

Loose Coupling: Components have little knowledge of another component’s internal functionality

### Hybrid

- Option 1: Some aspects are centralized
- Option 2: Combination of different decentralized approaches (e.g.
P2P)

---

## GNUTELLA

- Decentralized P2P Network
- File Sharing

### IDEA

- Each node connects to a number of other nodes
- Nodes keep a dynamic list of other nodes
- Messages are forwarded to all neighbors
- Transmission of files via HTTP GET
- Bootstrapping: A node needs to know at least one other node

### Protocol

#### Header Structure

- GNodeID: UUID
- Function
- TTL: decremented per hop
- Hops: incremented per hop

#### Functions

- Ping: Keep-alive message
- Pong: Answer to Ping, should include IP addresses of other nodes
- Query: Search for a file
- QueryHits: Result, contains the target node's info
- Push: Initiate download in case of a firewall

#### Routing

If a node receives PING or QUERY:
- Save the origin of the message
- Store the DescriptorID (GNodeID)
- If TTL = 0, kill the message
- If the DescriptorID is already known, kill the message
- PONG and QUERYHITS are forwarded back to the origin

![](https://i.imgur.com/SXlQ0zt.png)

#### Gnutella 0.6

- New concepts: Ultrapeers and leaf nodes
- Ultrapeers handle and forward search requests
- Leaf nodes only connect to one Ultrapeer
- Any node can become an Ultrapeer

---

## Structured P2P

Problems with Gnutella:
- Gnutella's flooding search is slow, inefficient, and unreliable
- Gnutella is not scalable
- Failing nodes can partition the network

Power Law: Some nodes are highly connected (hubs)

Small World: Nodes are highly connected to nearby nodes and have shortcuts to distant nodes

### Properties

- Unified address space
- Data is mapped to the address space
- Nodes are responsible for a part of the address space
- Nodes monitor responsibilities autonomously

---

## Distributed Hash Tables

### Problems

- Authenticity, availability, and integrity
    - Anyone can read and overwrite data in the DHT
- Changing data in DHTs
    - Needs to be updated
- Attacks
    - Sybil Attack: Forging multiple identities, i.e. creating a large number of nodes, to gain control over the DHT
    - Eclipse Attack: Using fake nodes to provide false references
    - Spartacus Attack: Claiming the same node ID as another node
    - DoS/DDoS
Attack: Flooding a node with requests to crash it

## Chord Protocol

- Decentralized P2P network without central entities
- Peers establish and maintain connections between each other in a structured manner

### Idea

- Circular address space
- Map nodes to that circular address space with a hash function
- A node is responsible for the address space behind it (between its predecessor and itself)
- Peers maintain connections to a subset of other nodes
- Each node maintains a finger table and knows its predecessor and successor
- Fingers address nodes at ever-increasing (exponentially growing) distance
- Recursive lookup
- The answer is sent directly to the requesting node
- Bootstrapping: One node needs to be known

Limitations:
- Key needs to be exact
- Default: every file is stored on only one node. Workaround: use two hash functions to duplicate the data

![](https://i.imgur.com/Y1QXsU3.png)

---

## Kademlia

- DHT
- Address space of 160 bits
- Nodes and data are mapped to the same address space
- Nodes store the keys that are closest to their own ID
- Structure: Binary tree
- Recursive lookup
- Next nodes are calculated by XOR distance

### Protocol

- UDP based
- Each RPC call carries a random number
- Calls
    - Ping: Queries a node; sender and receiver update their k-buckets
    - Store: Stores a key-value pair on a node
    - Find_Node: Requests a list of the k triples closest to a given node ID
    - Find_Value: Requests the value for a specified key; returns the value or a list of triples closest to the key

---

## XMPP

- Extensible Messaging and Presence Protocol
- Federated messaging protocol
- Independent servers are loosely coupled
- Users sign up with one server
- Users are addressed via username@servername
- Servers relay messages to the target server
- Encoded in XML
- Protocol based on XML streams

![](https://i.imgur.com/i6QJGtJ.png)

---

## Cryptography

Four objectives of Information Security (InfoSec):
- **Confidentiality**: Secrecy of the message
- **Non-repudiation**: Verification that a message was authored by an entity
- **Integrity**: Verification that data has not been changed
- **Authentication**: Verification of identity of a user or
entity

### Symmetric

- Caesar cipher (letter frequency attack)
- DES (brute-forced in 1998)
- AES (not broken)

### Kerckhoffs’ Principle

A cryptosystem should be secure even if everything about the system, except the key, is public knowledge

### Asymmetric

- RSA
    - Select two (large) prime numbers $p \neq q$. Set $N = p \cdot q$. $p$ and $q$ need to be kept secret
- Elliptic Curve

### Signature

#### Idea

- Signing: Encrypt the hash of the message using the private key of the sender
- Confidentiality: Encrypt the message using the public key of the receiver

#### Trust

- Web of Trust: Signing each other's keys
- Digital Authorities

#### Digital Certificates

- **Subject name**
- Certificate issuer / **certificate authority**
- Serial number
- Version: the X.509 version used by a given certificate
- Validity period
- **Signature generated by the CA**
- Signature algorithm
- Public key information:
    - algorithm
    - key size
    - key usage
    - public key
- Standard: X.509 v3

---

## Hash Chains

![](https://i.imgur.com/e68MMnJ.png)

## Ledgers

Changes are saved as transformation functions

Distributed Ledgers:
- State: All accounts with all transactions applied
- Synchronizing state: Kademlia based, transactions are forwarded to known nodes

Consensus Building:
- Majority of nodes decide
- Proof of Work

Block:
- Parent Hash (chaining)
- Number: incremental counter
- Timestamp
- Nonce
- State Root (root hash of the Merkle tree)
- MixHash: proof that the proof of work was performed

Transaction:
- To
- Nonce: number of transactions sent by the sender
- Value/data: additional information
- Signature

Merkle Tree
- Stores the transactions
- Hash tree

Byzantine Fault Tolerance
- Confirmation by 2/3 + 1 of the nodes (by signature)

---

## Blockchain

### Components

- P2P network
- Transactions
- Consensus rules
- State machine
- Chain of blocks
- Incentivization scheme
- OSS components

### Classification

- Public
- Private (only specific users can participate)
- Federated (hybrid; leader nodes verify the transactions and grant rights)

### Examples

- Bitcoin
    - Tracks unspent transactions
instead of state
- Ethereum
    - Ethereum VM
    - Accounts
        - Nonce
        - Balance
        - Contract code
        - Storage
    - Account Types
        - Externally Owned Accounts
        - Contract Accounts
- Hyperledger
    - Umbrella project for open source blockchains
    - Fabric: Permissioned blockchain, PBFT, smart contracts
- IOTA
    - Directed Acyclic Graph instead of a linear blockchain
    - Users that issue a new transaction must approve two previous transactions and perform a small amount of proof of work

### Smart Contracts

- Smart contracts have
    - An address
    - Member variables
    - Methods
- Functionality of smart contracts can be called by e.g. sending a transaction to the blockchain and specifying what functionality to execute
- Smart contracts do not “belong” to anyone
- Smart contracts are compiled into bytecode via solc and run in the EVM
- Execution of smart contracts is “fueled” by Gas
- Solidity (language)
- Tokens
    - Smart contracts can be used to create other cryptocurrencies, called tokens
    - A token contract (e.g. ERC20) allows users to buy tokens from an artificially created supply of tokens
- Oracles provide off-chain data for smart contracts
    - Types:
        - Immediate Read
            - Used only once
            - Fetches information and provides it in an on-chain smart contract to be requested
        - Publish-Subscribe
            - Used for frequently changing data
            - Broadcasts service data
            - Ethereum logs provide means for verification
        - Request-Response
            - Provides part of a dataset
            - Provides data at a later time
- D-Apps
    - Application that does not rely on any central entity
    - Definition:
        - completely open source
        - operates autonomously
        - may adapt its protocol by consensus
        - data is cryptographically stored in a public, decentralized blockchain
- Blockchain Performance
    - Bitcoin: 6.83 Tx/sec (1 MB blocks) / 13.65 Tx/sec (2 MB blocks)
    - Ethereum: 15.6 Tx/sec
    - IOTA: 50000 Tx/sec

---

## Decentralized OSN Architectures

- Presentation Layer
    - Web Interface
    - API
- OSN Layer
    - Access control
    - Search functionality
    - Recommender
    - Account Management
- Data Layer
    - Data Storage
    - Persistence
    - Formatting
- Federation / P2P Layer
    - Connection to other instances
- Identity Layer
    - Identity Management
    - Key Management

![](https://i.imgur.com/n52ycnO.png)

# OSN Protocols and APIs

## Definitions

## Data Formats

### MIME

- Standard for identifying content types and data formats other than plain text
- Originally created for email
- Describes content types in the format top-level-type/sub-type
- Top-level media types
    - text
    - image
    - audio
    - video
    - application

### Best Practices (W3C)

- Provide metadata
- Version indicators
- Persistent URIs as identifiers
- Machine-readable and standardized data formats
- Multiple formats
- Make data accessible via API

### Dublin Core

- Defines 15 elements for content description
- Each element is optional and may be repeated
    - Title
    - Subject
    - Description
    - Type
    - Source
    - Relation
    - Coverage
    - Creator
    - Publisher
    - Contributor
    - Rights
    - Date
    - Format
    - Identifier
    - Language

### Open Graph Protocol

- Created by Facebook
- Inspired by RDFa
- `<meta>` tags in HTML
    - og:title
    - og:type
    - og:image
    - og:url

### CSV

### HTML & CSS

### XML

- Specification of arbitrary elements in an XML **Schema document**
- Stylesheet functionality: XSL
- Rendered into arbitrary other documents: XSLT
- Selection of XML tags: XPath
- XML Query (XQuery)

### Semantic Web

- Build ontologies (here: “vocabularies”) to formally describe objects and their relationships to each other
- Triples describe statements in a directed graph
- Triples comprise a **subject**, a **predicate**, and an **object**

### Schema.org

- Create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond

### RDF (XML)

- Proposed by the W3C as a framework for the **description of resources**
- Designed to describe **semantics for machines**
- RDF uses **URIs** to identify resources and annotate them with properties and values

### RSS (XML)

- Used to aggregate and represent content from news pages
- Specifies a `<channel>` with one or
multiple `<item>` elements
- Each `<item>` represents information about an internet article (“post”) on a blog or website

### ATOM (XML)

- Specifies a `<feed>` with one or multiple `<entry>` elements
- Each `<entry>` represents information about an internet article (“post”) on a blog or website
- Support for threads

### JSON

- Key-value pairs
- Two data structures
    - Objects
    - Arrays

#### JSON Schema

- Describes a validation schema for JSON-formatted data
- Allows for automated verification of JSON objects

#### JSON Linked Data (JSON-LD)

- Allows mapping to RDF models
- Keywords:
    - `@context`: specifies the ontology/vocabulary
    - `@id`: URI
    - `@type`

### JWT

- A JWT can just encode a token, digitally sign it, or encrypt it
- JWT defines the following standard headers
    - typ: should be JWT
    - cty: content type
    - alg: algorithm used
- JWT defines a set of standard claims
    - iss: issuer of the token
    - sub: subject of the token
    - aud: audience that the token is meant for
    - exp: expiry date
    - nbf: not before

#### JWS (JSON Web Signature)

- JWS is used to transport digitally signed claims and/or data
- JWS comprises 3 parts
    - JSON Object Signing and Encryption (JOSE) header
    - JWS payload
    - Signature
- The parts are Base64url-encoded and concatenated via "."
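The three-part JWS structure can be sketched with Python's standard library. This is a minimal illustration, assuming HMAC-SHA256 (`HS256`) as the signing algorithm; the helper names `b64url`, `b64url_decode`, `sign_hs256`, and `verify_hs256` are hypothetical, and the secret is a placeholder. The sketch does not evaluate claims such as `exp` or `nbf`, so a vetted JWT library is the safer choice in practice.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in the compact form."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_hs256(token: str, secret: bytes) -> dict:
    """Check the signature and return the payload; raise ValueError otherwise."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(payload_b64))


if __name__ == "__main__":
    token = sign_hs256({"iss": "example.org", "sub": "alice"}, b"demo-secret")
    print(token)
    print(verify_hs256(token, b"demo-secret"))
```

The signature covers the encoded header and payload exactly as transmitted, which is why verification recomputes the MAC over the first two parts of the token instead of re-serializing the JSON.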
#### JWE (JSON Web Encryption)

- JWE is used to transport claims and/or data encrypted and integrity-protected
- JWE comprises 5 parts
    - JSON Object Signing and Encryption (JOSE) header
    - JWE Encrypted Key (the key used to encrypt the content)
    - JWE Initialization Vector (randomness)
    - JWE Ciphertext
    - JWE Authentication Tag (AEAD element)
- JWE Additional Authenticated Data (AAD): not a separate part in the compact serialization, where it is derived from the protected header

### vCard

- Contact information for a person
- Parameters
    - BEGIN/END: Start/end of the vCard (mandatory)
    - FN: Full name of the person (mandatory)
    - VERSION: Version of the vCard (mandatory)
    - BDAY: Birthday of the person
    - EMAIL: Email of the person

### Friend of a Friend (FOAF)

- Standard and ontology based on RDF
- Describes connections between individuals
- Linked Data

### XHTML Friends Network (XFN)

- Similar to FOAF
- Based on XHTML using `rel` attributes on links
- All entities are identified via URL

### Microformats 2

- Collection of microformats to represent social information in HTML
- Small bits of HTML to embed information into websites
- Can be parsed into JSON
- Formats:
    - h-adr
    - h-card
    - h-entry
    - h-event
    - h-feed

#### h-card

- Based on vCard
- Parameters (e.g.)
    - p-name
    - u-email
    - u-photo
    - u-url
    - p-adr

#### h-event

- Modelling events
- Parameters
    - p-name
    - p-summary
    - dt-start
    - dt-end
    - dt-duration

### Portable Contacts (POCO)

- Open protocol and data format for describing and accessing contact information
- Key-value pairs
    - Simple
    - Boolean
    - Complex (Value, Type, Bool)
- Fields
    - Singular: id, displayName, updated, status
    - Plural: emails, phoneNumbers

### Activity Streams

Activity Streams 1.0 is based on the rather simple concept of **describing activities** comprising an **actor**, a **verb**, an **object**, and a **target**

### OpenSocial

- Specifies an open standard for accessing data in OSN services for 3rd-party application developers
- OpenSocial specifies:
    - Data formats
    - API endpoints
- Implemented by StudiVZ, LinkedIn, Xing, Google+
- Idea: Global-ID: Domain-Name ":" Local-ID
- API: REST based
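The Global-ID scheme (Domain-Name ":" Local-ID) can be illustrated with a small sketch. The names `GlobalId`, `format_global_id`, and `parse_global_id` are hypothetical, and splitting on the first colon is an assumption that the domain part itself contains no colon.

```python
from typing import NamedTuple


class GlobalId(NamedTuple):
    """A user identifier that is unique across OSN services."""
    domain: str
    local_id: str


def format_global_id(domain: str, local_id: str) -> str:
    """Join domain and local ID with the ':' separator."""
    return f"{domain}:{local_id}"


def parse_global_id(value: str) -> GlobalId:
    """Split at the first ':'; reject values with a missing part."""
    domain, separator, local_id = value.partition(":")
    if not separator or not domain or not local_id:
        raise ValueError(f"not a Global-ID: {value!r}")
    return GlobalId(domain, local_id)


if __name__ == "__main__":
    text = format_global_id("example.org", "12345")
    print(text)
    print(parse_global_id(text))
```

Because the domain acts as a namespace, two services can hand out the same local ID without producing a clash.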
![](https://i.imgur.com/ERJThEB.png)

## Protocols

### Gossip

- Distribution of information, analogous to a virus in an epidemic
- Nodes are neighbors to n other nodes
- Each node is in exactly one of three states:
    - Susceptible
    - Infected
    - Removed
- Either push, pull, or push-pull

### APIs

Types:
- Remote Procedure Call
    - Call a function
    - Parameters are mapped to the function
    - Tight coupling
- Message based
    - Message object describes tasks to be executed
    - Service logic maps the object to a function
    - Less tightly coupled
- Resource based
    - Direct access to resources
    - e.g. files, database records
    - Loosely coupled

### XML-RPC

HTTP request body: `<methodCall>`

### HTTP

### REST

- Five basic principles:
    - Unambiguous identification of resources via URLs
    - Links and hypermedia
    - Standard HTTP methods
    - Flexible representation of resources
    - Stateless communication
- Benefits of RESTful APIs:
    - Loose coupling
    - Interoperability
    - Reusability
    - Scalability
- HTTP methods: POST, GET, PUT, DELETE

### WEB API Best Practices

- Naming
    - Plural nouns
    - Concrete
- Methods
    - GET
    - POST
    - PUT
    - DELETE
- Error Handling
    - Limit status codes to 200, 400, 500
    - Verbose error message in the response field
- API Versions
- Pagination & Partial Response
- Processing
    - For things that are not a resource
    - Use verbs, not nouns
- Formats
    - Multiple formats
    - JSON as default
- Provide SDK
    - Proof of concept implementation

### Linkback Protocols

- When publishing content on the web in relation to other (remote) content, inform the author (server) of the referenced content
- Examples
    - Pingback
    - Trackback
    - Refback
    - Webmention

### Well Known URIs

### Webfinger

- WebFinger is used to discover information about people or other entities on the Internet that are identified by a URI using standard HTTPS

### LRDD

- Mechanism for obtaining information about a resource on the Web, which is identified via a URI

### Salmon Protocol

Idea: Tether every copy of a blog post to its original

### WebSub

- Protocol for distribution of content
in blogs
- Content gets pushed to subscribers as soon as it is released

### OSTATUS

- Protocol suite managing the establishment of follow relationships and the distribution of status updates
- Originated from “OpenMicroBlogging”

### DFRN (DISTRIBUTED FRIENDS & RELATIONS NETWORK)

- DFRN aims to provide “an open and distributed social communication platform”, where “nodes […] communicate with each other on your behalf”
- Capabilities
    - profile pages
    - friend requests
    - notifications
    - requesting available content

:::warning
TBD
:::

### MicroPub

- Specification for creating, updating, and deleting posts on a server
- Simple mechanism to create, update, and delete content
- MicroPub serializes content using the **Microformats 2** vocabulary
- Object types h=entry, h=card, h=event, or h=cite
- Requests
    - UTF-8-encoded Microformats 2
    - HTTP requests either as
        - x-www-form-urlencoded
        - multipart/form-data
        - JSON
- Creating content:
    - Micropub requests transfer sets of properties for an h-* format (e.g. h-entry)
- Updating content:
    - Updates need to specify "action": "update"
    - Updates must further specify a property "replace", "add", or "delete"
- Response
    - Only HTTP headers

### ActivityPub

- Covers both:
    - Server to Server
    - Client to Server
- Roles:
    - Actors
    - Objects
    - Activities
    - Collections
- Each actor has an inbox and an outbox
- Data Representation
    - Objects are Activity Streams 2.0 (AS2) conformant objects
    - Two mandatory parameters
        - id (as URI)
        - type

### Mastodon Webfinger Endpoints

## APIs

### Facebook Graph API

- REST-ish
- HTTP based
- JSON formatted
- Data Models:
    - Nodes
    - Edges
    - Fields
- Every object has an ID
- Chaining specified by fields
- OAuth 2 tokens

![](https://i.imgur.com/yEwUNkv.png)

### Twitter API

- Search API
- Ads API
- Engagement API (Metrics)
- Account/User API

### Mastodon API

:::warning
TBD
:::

# Identity Management

:::warning
TBD
:::

## Uniform Resource Name

## Uniform Resource Locator

## Uniform Resource Identifier

## Internationalized Resource Identifier
![](https://i.imgur.com/CCRfzgr.png)

## WebID

## UUID

- v1: Timestamp based + MAC address
- v3 and v5: Name based (hash of namespace and name)
- v4: Random

## Ethereum Name Service

## SELF-SOVEREIGN IDENTITY

![](https://i.imgur.com/MhSu9s5.png)

## Decentralized Identifiers

# Social Network Analysis