<!-- {%hackmd theme-dark %} -->

# Hacker News discussions on online identity, GDPR and Big Data

![https://news.ycombinator.com/](https://i.imgur.com/AX2g38e.jpg)

:memo: Introduction
---

:::info
:bulb: Hacker News is a content aggregator popular among tech enthusiasts. Using Google BigQuery, we downloaded stories (posts) published since 2018 that contain any of the keywords 'identity', 'GDPR' or 'big data', together with the comments that belong to them. The analysis covers only stories that received at least one comment. Overall, we analysed:
:::

```csvpreview {header="true"}
keyword, N stories, N comments
Identity, 339, 2185
GDPR, 553, 4366
Big Data, 123, 730
```

:::info
First, we identify the most popular stories by number of upvotes (score). A story usually links to an outside article, but popular stories are often just spontaneous discussions (e.g. Ask HN, Tell HN). The column *descendants* shows the number of comments below the story.
:::

:trophy: Most popular stories
---

### Most popular stories: Identity

```csvpreview {header="true"}
#,title,score,timestamp,url,descendants
0,"Sick of spending time on Auth, we built an open source 'Stripe for Auth'",545,2020-12-17 17:48:56 UTC,,339.0
1,What happens when your career becomes your whole identity,544,2019-12-27 08:14:14 UTC,https://hbr.org/2019/12/what-happens-when-your-career-becomes-your-whole-identity,264.0
2,Keycloak: Open-source identity and access management,455,2020-04-14 20:43:07 UTC,https://www.keycloak.org/,122.0
3,Ask HN: How do I reach making $1-1.5k/mo in 13 months?,437,2020-06-05 12:26:26 UTC,,329.0
4,Tell HN: I miss the old internet,391,2018-06-17 21:57:03 UTC,,221.0
5,"Scuttlebot: Peer-to-peer database, identity provider, and messaging system",361,2020-04-18 18:35:29 UTC,http://scuttlebot.io/,116.0
6,Telegram moves to protect identity of Hong Kong protesters,328,2019-08-31 01:08:38 UTC,https://www.reuters.com/article/us-hongkong-telegram-exclusive/exclusive-messaging-app-telegram-moves-to-protect-identity-of-hong-kong-protesters-idUSKCN1VK2NI,81.0
7,The Future of Online Identity Is Decentralized,276,2020-07-12 14:30:35 UTC,https://yarmo.eu/post/future-online-identity-decentralized,194.0
8,Cloud Identity,257,2018-03-21 14:30:42 UTC,https://cloud.google.com/identity/,93.0
9,Keep your Identity Small (2009),254,2018-02-24 18:40:50 UTC,http://www.paulgraham.com/identity.html,110.0
10,Ask HN: Online banks where I can open account worldwide?,233,2018-11-25 09:08:36 UTC,,114.0
11,One woman's stolen identity exposed a system of exam fraud,227,2020-07-09 16:07:18 UTC,https://www.bbc.com/news/world-asia-china-53316895,170.0
12,Estonian Electronic Identity Card: Security Flaws in Key Management,222,2020-07-02 11:55:36 UTC,https://www.usenix.org/conference/usenixsecurity20/presentation/parsovs,80.0
13,Increasing transparency through advertiser identity verification,200,2020-04-23 13:20:41 UTC,https://www.blog.google/products/ads/advertiser-identity-verification-for-transparency/,92.0
14,Clojure’s Approach to Identity and State (2008),190,2019-07-13 16:08:50 UTC,https://clojure.org/about/state,62.0
15,Identity Beyond Usernames,181,2020-07-08 06:09:20 UTC,https://lord.io/blog/2020/usernames/,78.0
16,The Tripartite Identity Pattern (2008),180,2018-02-12 04:26:17 UTC,http://habitatchronicles.com/2008/10/the-tripartite-identity-pattern/,3.0
17,Show HN: Oathkeeper – Cloud-Native Identity and Access Proxy,174,2018-07-11 09:28:36 UTC,https://github.com/ory/oathkeeper,82.0
18,Google will require proof of identity from all advertisers,172,2020-04-26 06:41:13 UTC,https://www.nytimes.com/2020/04/23/business/media/google-advertising.html,91.0
19,How a Pentagon Contract Became an Identity Crisis for Google,168,2018-05-30 22:53:44 UTC,https://www.nytimes.com/2018/05/30/technology/google-project-maven-pentagon.html,199.0
20,Phone Numbers Stink as Identity Proof,155,2019-03-22 13:12:27 UTC,https://krebsonsecurity.com/2019/03/why-phone-numbers-stink-as-identity-proof/,56.0
```

### Most popular stories: GDPR

```csvpreview {header="true"}
#,title,score,timestamp,url,descendants
0,Google’s GDPR Workaround,1868,2019-09-04 11:59:36 UTC,https://brave.com/google-gdpr-workaround/,588.0
1,GDPR: Don't Panic,863,2018-05-18 07:59:02 UTC,https://jacquesmattheij.com/gdpr-hysteria,800.0
2,GDPR for lazy people: Block all European users with Cloudflare Workers,755,2018-05-25 15:59:09 UTC,https://apility.io/2018/05/25/gdpr-lazy-block-eu-users-cloudflare-workers/,1467.0
3,"StreetLend.com shuts down, citing GDPR regulations",724,2018-04-29 22:39:00 UTC,https://streetlend.com,1123.0
4,Romania orders investigative journalists to disclose sources under GDPR,723,2018-11-09 18:58:02 UTC,https://www.occrp.org/en/40-press-releases/presss-releases/8875-occrp-strongly-objects-to-romania-s-misuse-of-gdpr-to-muzzle-media?fbclid=IwAR3oyyn-S4AchYYnsQlw_jZASnHclQxLPwS66IsgF19W73WjtFXYU-FhuYM,190.0
5,Am I logged in or not? GDPR case study on the example of Chrome browser change,681,2018-09-24 18:13:00 UTC,https://blog.lukaszolejnik.com/am-i-logged-in-or-not-gdpr-case-study-on-the-example-of-chrome-browser-change/,485.0
6,GDPR will pop the adtech bubble,668,2018-05-13 17:58:52 UTC,http://blogs.harvard.edu/doc/2018/05/12/gdpr/,437.0
7,Ask HN: A way to adblock “we're using cookies” popups?,668,2020-06-14 21:08:24 UTC,,338.0
8,Only 9% of visitors give GDPR consent to be tracked,651,2020-07-07 08:26:50 UTC,https://markosaric.com/gdpr-consent/,446.0
9,Publishers Haven't Realized How Big a Deal GDPR Is,607,2018-04-08 16:44:03 UTC,https://baekdal.com/strategy/publishers-havent-realized-just-how-big-a-deal-gdpr-is/,454.0
10,"After GDPR, The New York Times cut off ad exchanges and kept growing ad revenue",572,2019-01-16 12:00:48 UTC,https://digiday.com/media/new-york-times-gdpr-cut-off-ad-exchanges-europe-ad-revenue/,452.0
11,"Whois public database is in breach of GDPR, according to European authorities",545,2018-04-17 06:49:20 UTC,https://www.theregister.co.uk/2018/04/14/whois_icann_gdpr_europe/,311.0
12,Facebook CEO says no plans to extend all of GDPR globally,526,2018-04-04 01:06:11 UTC,https://www.reuters.com/article/us-facebook-ceo-privacy-exclusive/exclusive-facebook-ceo-says-no-plans-to-extend-all-of-european-privacy-law-globally-idUSKCN1HA2M1,376.0
13,How GDPR Will Change The Way You Develop,526,2018-02-27 12:19:43 UTC,https://www.smashingmagazine.com/2018/02/gdpr-for-web-developers/,688.0
14,The Nightmare Letter: A Subject Access Request Under GDPR,508,2018-03-17 12:15:09 UTC,https://www.linkedin.com/pulse/nightmare-letter-subject-access-request-under-gdpr-karbaliotis/,523.0
15,The Original Cookie specification from 1997 was GDPR compliant (2019),502,2020-05-08 03:25:23 UTC,https://baekdal.com/thoughts/the-original-cookie-specification-from-1997-was-gdpr-compliant/,181.0
16,GDPR Version of USA Today Is 500KB Instead of 5.2MB,466,2018-05-26 13:50:56 UTC,https://twitter.com/fr3ino/status/1000166112615714816?s=19,205.0
17,Zoom still don't understand GDPR,451,2020-08-27 23:57:50 UTC,https://www.threatspike.com/blog/zoom_cookies.html,251.0
18,"Things to know about the GDPR, Mozilla and Firefox",430,2018-05-25 12:37:00 UTC,https://blog.mozilla.org/internetcitizen/2018/05/23/gdpr-mozilla/,97.0
19,Black Hat: GDPR privacy law exploited to reveal personal data,397,2019-08-08 17:29:25 UTC,https://www.bbc.co.uk/news/technology-49252501,232.0
20,GDPR: Programmatic ad buying plummets in Europe,394,2018-05-26 03:02:51 UTC,https://digiday.com/media/gdpr-mayhem-programmatic-ad-buying-plummets-europe/,396.0
```

### Most popular stories: Big Data

```csvpreview {header="true"}
#,title,score,timestamp,url,descendants
0,Ask HN: Am I the longest-serving programmer – 57 years and counting?,2634,2020-05-31 01:53:44 UTC,,531.0
1,Uber’s Big Data Platform: 100+ Petabytes with Minute Latency,211,2018-10-17 19:35:28 UTC,https://eng.uber.com/uber-big-data-platform/,68.0
2,China using big data to detain people before crime is committed,198,2018-02-28 20:18:48 UTC,https://www.theglobeandmail.com/news/world/china-using-big-data-to-detain-people-in-re-education-before-crime-committed-report/article38126551/,179.0
3,Is it Pokemon or Big Data? (2016),168,2018-08-01 09:23:45 UTC,https://pixelastic.github.io/pokemonorbigdata/,23.0
4,MIT D4M: Mathematics of Big Data and Machine Learning [video],136,2018-09-30 12:43:50 UTC,https://www.youtube.com/watch?v=iCAZLl6nq4c&list=PLUl4u3cNGP62DPmPLrVyYfk3-Try_ftJJ&index=1,5.0
5,Launch HN: Data Mechanics (YC S19) – The Simplest Way to Run Apache Spark,131,2020-05-11 14:58:37 UTC,,42.0
6,'Big Data' Has Come to Mean 'Small Sampled Data',129,2019-03-06 15:20:11 UTC,https://www.forbes.com/sites/kalevleetaru/2019/02/17/the-big-data-revolution-will-be-sampled-how-big-data-has-come-to-mean-small-sampled-data/,37.0
7,A.I. And Big Data Could Power a New War on Poverty?,125,2018-01-02 18:19:55 UTC,https://www.nytimes.com/2018/01/01/opinion/ai-and-big-data-could-power-a-new-war-on-poverty.html?action=click&pgtype=Homepage&clickSource=story-heading&module=opinion-c-col-right-region&region=opinion-c-col-right-region&WT.nav=opinion-c-col-right-region,286.0
8,"Metatron – Open-Sourced, Self-Service Big Data Discovery",88,2019-05-10 21:11:27 UTC,https://metatron.app/,12.0
9,The Big Data of Big Hair (2019),86,2020-02-17 13:09:05 UTC,https://pudding.cool/2019/11/big-hair/,12.0
10,Japanese Wine Meets Big Data,84,2018-03-28 03:44:43 UTC,https://www.nippon.com/en/features/c04602/,13.0
11,"Metabase, an Uber Co-Founder’s New Big Data Startup, Raises $13M",81,2019-02-11 16:19:45 UTC,https://news.crunchbase.com/news/metabase-an-uber-co-founders-new-big-data-startup-raises-13m/30/,14.0
12,"Ask HN: I'm quitting my job, will create a game – any advice?",79,2020-12-10 18:28:44 UTC,,114.0
13,Europe is drawing fresh battle lines around the ethics of big data,69,2018-10-05 17:47:30 UTC,https://techcrunch.com/2018/10/03/europe-is-drawing-fresh-battle-lines-around-the-ethics-of-big-data/,54.0
14,Scientists use big data to understand what separates winners from losers,63,2019-11-27 09:15:48 UTC,https://www.scientificamerican.com/article/failure-found-to-be-an-essential-prerequisite-for-success/,20.0
15,"To Know, but Not Understand: David Weinberger on Science and Big Data (2012)",60,2020-08-16 09:53:26 UTC,https://www.theatlantic.com/technology/archive/2012/01/to-know-but-not-understand-david-weinberger-on-science-and-big-data/250820/,18.0
16,An Overview of End-to-End Entity Resolution for Big Data,58,2020-12-17 12:16:11 UTC,https://blog.acolyer.org/2020/12/14/entity-resolution/,2.0
17,How big data has created a big crisis in science,44,2019-01-27 16:21:20 UTC,https://theconversation.com/how-big-data-has-created-a-big-crisis-in-science-102835,17.0
18,"Efficient Personal Search with Vespa, the Open Source Big Data Serving Engine",37,2019-02-13 20:44:31 UTC,https://yahoodevelopers.tumblr.com/post/182787567063/efficient-personal-search-at-scale-with-vespa-the,0.0
19,"One More Time, with Big Data: Measles Vaccine Doesn’t Cause Autism",35,2019-03-05 20:39:03 UTC,https://www.nytimes.com/2019/03/05/health/measles-vaccine-autism.html,52.0
20,In-Memory Performance for Big Data (2014) [pdf],29,2019-04-29 21:15:45 UTC,http://www.vldb.org/pvldb/vol8/p37-graefe.pdf,7.0
```

:link: Co-occurrence analysis
---

:::info
Next, we focus on frequent terms in stories and comments. To get a better overview of the topics discussed, we calculate the frequency of bigrams (pairs of terms that appear next to each other in a text). The following tables contain the most relevant bigrams.
:::

### Most popular bigrams: Identity

```csvpreview {header="true"}
term 1, term 2, number of occurrences
open, source, 57
social, media, 53
identity, politics, 34
culture, fit, 33
digital, identity, 23
decentralized, identity, 22
identity, provider, 21
online, identity, 19
identity, theft, 18
identity, management, 16
```

### Most popular bigrams: GDPR

```csvpreview {header="true"}
term 1, term 2, number of occurrences
personal, data, 301
data, protection, 146
gdpr, compliant, 121
personal, information, 118
privacy, policy, 114
ip, address, 74
google, analytics, 71
third, party, 61
eu, citizens, 61
business, model, 45
delete, data, 43
outside, eu, 40
right, forgotten, 39
open, source, 34
3rd, party, 33
data, breach, 27
cookie, consent, 26
personally, identifiable, 24
dark, patterns, 23
```

### Most popular bigrams: Big Data

```csvpreview {header="true"}
term 1, term 2, number of occurrences
big, data, 195
machine, learning, 21
data, analytics, 12
data, processing, 12
data, engineering, 10
data, scientist, 9
data, science, 8
open, source, 7
```

:thermometer: Sentiment analysis
---

:::info
Finally, we focus on the sentiment of stories: which posts received the most positive and the most negative comments? Using [VADER](https://github.com/cjhutto/vaderSentiment), comments are classified as positive, negative or neutral. Below we present the five most positive and the five most negative stories for each keyword.
:::

### Most positive stories: Identity

```csvpreview {header="true"}
title,url,pos_score
"Show HN: Tobab, a poor mans identity aware proxy. “BeyondCorp” for selfhosters",https://github.com/gnur/tobab/,0.208
"Sick of spending time on Auth, we built an open source 'Stripe for Auth'",,0.185
Keycloak: Open-source identity and access management,https://www.keycloak.org/,0.178
AI Anonymizer – use virtual faces to secure your identity,https://generated.photos/anonymizer,0.173
Ask HN: I trying to build a new social media,,0.172
```

### Most negative stories: Identity

```csvpreview {header="true"}
title,url,neg_score
Activist Jailed for Facebook Posts; FBI Tracks Him as “Black Identity Extremist”,https://www.theguardian.com/world/2018/may/11/rakem-balogun-interview-black-identity-extremists-fbi-surveillance,0.166
Imageboard-Users donating thousends of € to protest against KrebsOnSecurity,,0.137
Facebook Container Plugin from Mozilla Isolates Your Facebook Identity,https://blog.mozilla.org/blog/2018/03/27/facebook-container-add-on/,0.136
Protecting user identity against Silhouette,https://blog.twitter.com/engineering/en_us/topics/insights/2018/twitter_silhouette.html,0.134
"Depression, Self-Identity and Reality: Living in a Story Created by Facebook",https://medium.com/privateid-blog/depression-self-identity-and-reality-living-in-a-fictional-story-created-by-social-media-38f230ab9bf7,0.13
```

### Most positive stories: GDPR

```csvpreview {header="true"}
title,url,pos_score
Ask HN: Hosting provider that respects privacy?,,0.227
GDPR Shield: block your website from EU visitors,https://www.gdpr-shield.io/,0.215
Show HN: Instantly make any Netlify form PCI DSS compliant,,0.207
Ask HN: Why am I getting all these GDPR emails?,,0.201
Google Feels the Brunt of GDPR Enforcement,https://www.saiglobal.com/en-au/news_and_resources/industry_news/google_feels_the_brunt_of_gdpr_enforcement/,0.186
```

### Most negative stories: GDPR

```csvpreview {header="true"}
title,url,neg_score
Show HN: Ship Your Enemies GDPR,https://shipyourenemiesgdpr.com,0.166
Ask HN: Does HN respect the GDPR?,,0.151
"CoinTouch.com shuts down, citing EU GDPR regulations",https://www.cointouch.com/,0.145
Punishing Blizzard for anti-HK partisanship by flooding it with GDPR requests,https://boingboing.net/2019/10/08/ddos-gdpr.html,0.141
Ask HN: Are GDPR cookie popups useful or just annoying?,,0.14
```

### Most positive stories: Big Data

```csvpreview {header="true"}
title,url,pos_score
"Metabase, an Uber Co-Founder’s New Big Data Startup, Raises $13M",https://news.crunchbase.com/news/metabase-an-uber-co-founders-new-big-data-startup-raises-13m/30/,0.228
"Ask HN: I'm quitting my job, will create a game – any advice?",,0.19
The Big Data of Big Hair (2019),https://pudding.cool/2019/11/big-hair/,0.187
Launch HN: Data Mechanics (YC S19) – The Simplest Way to Run Apache Spark,,0.17
Ask HN: I want to teach myself Machine Learning. Where should I start?,,0.151
```

### Most negative stories: Big Data

```csvpreview {header="true"}
title,url,neg_score
A.I. And Big Data Could Power a New War on Poverty?,https://www.nytimes.com/2018/01/01/opinion/ai-and-big-data-could-power-a-new-war-on-poverty.html?action=click&pgtype=Homepage&clickSource=story-heading&module=opinion-c-col-right-region&region=opinion-c-col-right-region&WT.nav=opinion-c-col-right-region,0.13
Europe is drawing fresh battle lines around the ethics of big data,https://techcrunch.com/2018/10/03/europe-is-drawing-fresh-battle-lines-around-the-ethics-of-big-data/,0.125
"One More Time, with Big Data: Measles Vaccine Doesn’t Cause Autism",https://www.nytimes.com/2019/03/05/health/measles-vaccine-autism.html,0.106
Ask HN: Fake résumé generator,,0.091
Ask HN: What niche/forgotten technologies do you wish were more mainstream?,,0.087
```

About us
---

[Digital Economy Lab UW](https://fwd.delabapps.eu/)

![](https://i.imgur.com/URiAR3I.png)

###### `hackernews` `bigquery` `text-mining`
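
### Appendix: counting bigrams (illustrative sketch)

The bigram counts reported in the co-occurrence analysis can be approximated with a few lines of Python. This is a minimal sketch, not the pipeline used for this report. Entries such as "right, forgotten" and "outside, eu" suggest that stop words were removed before counting, so the sketch does the same, but that is an inference. The stop-word list, the tokenizer and the sample comments are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list; the report does not say
# which list (if any) was actually used.
STOP_WORDS = {"a", "an", "and", "be", "is", "of", "the", "to", "in", "on", "for"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def count_bigrams(texts):
    """Count pairs of adjacent tokens (after stop-word removal) across all texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip pairs each token with its right-hand neighbour
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical sample comments, not taken from the dataset.
sample_comments = [
    "The right to be forgotten is part of the GDPR.",
    "Personal data must be deleted: that is the right to be forgotten.",
    "Is an IP address personal data?",
]

bigram_counts = count_bigrams(sample_comments)
for (first, second), n in bigram_counts.most_common(3):
    print(first, second, n)
```

Because stop words are dropped before pairing, "right to be forgotten" is counted as the bigram `right, forgotten`, which matches the shape of the entries in the tables above.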