# Ethics & WebScraping

R training school / Labex Dynamite / Florence / September 2018

![CCO](https://mirrors.creativecommons.org/presskit/buttons/88x31/png/cc-zero.png "Public Domain Licence" =100x)

[Sébastien Rey-Coyrehourcq](http://umr-idees.fr/user/s%C3%A9bastien-rey-coyrehourcq/)/[@reyman64](https://twitter.com/reyman64), with the invaluable help of Lionel Morel (alias [Scinfolex](https://scinfolex.wordpress.com)/[@calimaq](https://twitter.com/Calimaq) online), who agreed to answer my many novice questions.

**Prologue**: The issue is really complex and country-dependent, and I'm not a lawyer, so only my own point of view (Sébastien Rey-Coyrehourcq) is presented here. This is also a work-in-progress document; we will try to publish the final version somewhere on the Internet during the next month.

# The law ... of the jungle ?

Many **grey zones** remain in national and international law around the (somewhat) old practice of webscraping. There are *indeed* several laws which protect data and databases from harvesting, e.g. the *Civil Code* (article 1240), the *Consumer Code* (article L121-2[^L121]), the *Intellectual Property Code* (article L342-1[^L342]) and the *Penal Code* (articles 323-3[^L323] and 311-1[^L311]).

## French and European legislation

The French Penal Code clearly condemns any *fraudulent* intent, so if you knowingly bypass a security mechanism to access data, you're guilty. While the definition of a *fraudulent act* remained unclear for years, the *[Bluetouff case law (2015)](https://www.silicon.fr/vol-information-jurisprudence-bluetouff-pour-gloire-117057.html)* now clearly recognizes that the terms "*soustraction frauduleuse de la chose d’autrui*" (fraudulent misappropriation, i.e. theft) are potentially applicable to data.
This position is criticized by some [lawyers](http://www.maitre-eolas.fr/post/2014/02/07/NON%2C-on-ne-peut-pas-%C3%AAtre-condamn%C3%A9-pour-utiliser-Gougleu) because, when data is duplicated, the original **doesn't disappear** (as a stolen hard drive would).

Article *L121-2*, combined with article *1240* of the Civil Code, also protects companies from **parasitic behaviour**: you should not scrape data from a website and then create a similar website which relies solely on this data.

Article *L342-1* of the Intellectual Property Code states that the producer of a database can prohibit the extraction of data beyond a loosely defined limit[^L342] (i.e. when a "substantial part" of the data is extracted). Sadly, the term "substantial" is of course open to interpretation ...

These two articles suggest that the usage of the data, rather than the harvesting itself, is the potential source of legal problems. Some case law can help us understand to what extent:

- RyanAir vs Opodo (Cour d'appel de Paris, 2012): this [ruling](https://www.legalis.net/jurisprudences/cour-dappel-de-paris-pole-5-chambre-2-arret-du-23-mars-2012/) acts in favor of webscraping because ... ==Finish the description of the case==
- RyanAir vs PR Aviation (Netherlands Supreme Court, then Court of Justice of the European Union, CJEU, 2015): a similar affair, in which PR Aviation scraped prices from different airlines to create an online comparator. This time, though, RyanAir went to the CJEU. Although the Dutch court recognized a normal use of the database and rejected the application of European directive 96/9, the CJEU took another [decision](https://eur-lex.europa.eu/legal-content/FR/TXT/?uri=CELEX%3A62014CJ0030), this time to the detriment of webscraping. The CJEU thereby established that the use of a database can be limited by the restrictions written into its terms of use (CGU in French)[^act].


This second affair is very bad news for the public domain, and by extension for scraping, as the specialist Lionel Morel explains on his [blog Scinfolex](https://scinfolex.com/2015/01/23/linformation-ne-peut-plus-etre-libre-a-propos-dun-arret-aberrant-de-la-cjue/):

> *Through this decision, the CJEU simply wiped out a huge part of the public domain: the part that used to consist of raw information and data. Its reasoning introduces a possibility, this time absolute and without any exception, to set contractual limits on the reuse of the information encapsulated in a database, with no particular condition of originality or investment to fulfil.* (translated from French)

We'll discuss the applicability in France and Europe of complex and/or abusive terms of use (CGU) later. For now, let's consider this objection: we're scientists, not a commercial company !

You're totally right. You have probably heard about the recent entry into application of the European GDPR (RGPD in French), which also includes special derogations for researchers. ==Talk about that.==

**Sources:**

- « Le Web Scraping, Une Technique D’extraction Légale ? | Actualités Du Droit | Wolters Kluwer France ». Accessed 14 August 2018. https://www.actualitesdudroit.fr/browse/tech-droit/start-up/9404/le-web-scraping-une-technique-d-extraction-legale.
- « Le Web Scraping ? C’est Quoi ? La légalité du Web Scraping en 5 minutes | François Baulu | Pulse | LinkedIn ». Accessed 14 August 2018. https://www.linkedin.com/pulse/le-web-scraping-cest-quoi-la-l%C3%A9galit%C3%A9-du-en-5-minutes-fran%C3%A7ois-baulu?articleId=6427821828948918272.

### GDPR, some derogations for researchers ?

**Prologue:** I'll try to be brief on this point, because there is already a lot of literature on this complex subject.

In the context of research, the answer is definitely complex, though the question is quite simple: how can I do my everyday job as a scientist *ethically* and *legally* when collecting, exploring and publishing part of these new sources of data ?

Since the GDPR (RGPD in French) became applicable in May 2018, users have a new ally in the fight for their rights on the Internet. For "Mr. GDPR", alias Giovanni Buttarelli (the European Union’s data protection supervisor),

> The GDPR aims to redress the startling imbalance of power between big tech and the consumer, giving people more control over their data and making big companies accountable for what they do with it. It replaces the 1995 Data Protection Directive, which required national legislation in each of the 28 E.U. countries in order to be implemented. And it offers people and businesses a single rulebook for the biggest data privacy questions. Tech titans now have a single point of contact instead of 28. ([source Washington post 14/08/2018](https://www.washingtonpost.com/news/theworldpost/wp/2018/08/14/gdpr/?noredirect=on&utm_term=.11ebf48ebf46))

But a case study would be useful to assess the robustness of this recently applied law when it is confronted with companies tempted to exploit flaws in the text. The **DisinfoLab affair**, which occurred during (and because of) the **Benalla affair** in August 2018, is probably a great first textbook case for the GDPR and the CNIL.

On 8 August, the Belgian NGO **DisinfoLab** published [a report](https://spark.adobe.com/page/Sa85zpU5Chi1a/) which pointed to allegedly unusual activity on Twitter around the Benalla affair, presumably due to *Russian bots*. Since then, the methodology and conclusions of this report have been criticized by many media outlets and scientists.
As far as we are concerned, though, the problem lies elsewhere: to legitimize its methodology, **DisinfoLab** made raw data publicly available on its website (part of it has since been deleted). The three files contained personal data: pseudonyms, public profile texts, tweets and retweets, and for some accounts (identified by the NGO as hyper-active) ... a number identifying a political orientation.

Arthur Messaud, one of the jurists of the NGO La Quadrature du Net, observed that these files were illegal in the eyes of the recent GDPR:

> This publication was not necessary for the pursuit of any objective. Yet publishing personal data without consent is always unlawful if it is not necessary for some objective. (translated from French) [Source](https://twitter.com/laquadrature/status/1027556732921364480)

So what ? You could legitimately object that part of this information is public, and that anyone can read these people's opinions on their public timeline. Once more, the answer seems complex and depends on your country ...

Regardless of the upcoming [public decision of the CNIL](https://www.cnil.fr/fr/etude-realisee-partir-de-messages-postes-sur-twitter-la-cnil-est-saisie-du-dossier) on this affair, many people already consider that expressing a public opinion to your followers doesn't give anyone implicit permission to take you to task for it later !

An article from Numerama citing Valérie Nicolas makes it clear that all this information (pseudonyms, tweets, retweets, likes, geolocation, etc.) is identified by the CNIL as [personal](https://www.cnil.fr/cnil-direct/question/492?visiteur=part), because it can easily be cross-checked to find the physical person behind the tweets.

**Personal data** is defined in the GDPR as

> any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person; [Source](https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en)

Some examples:

- a name and surname
- a home address
- an email address such as name.surname@company.com
- an identification card number
- location data (for example the location data function on a mobile phone)
- an Internet Protocol (IP) address
- a cookie ID
- the advertising identifier of your phone
- data held by a hospital or doctor, which could be a symbol that uniquely identifies a person.

Worse, some of the personal information collected by this organisation is also identified as [*données sensibles*](https://www.cnil.fr/cnil-direct/question/495) (sensitive data) by the CNIL, or as *catégories particulières de données à caractère personnel* (special categories of personal data) in the GDPR (which adds three new cases). This data covers ethnicity, sexuality, political opinions, etc.

In France, the **collection, manipulation or publication of personal and/or sensitive data, before or after any processing**, is systematically prohibited without the **explicit consent** of the individuals, right ?

**By default yes**, but depending on the sensitivity of the data there are different derogations that authorize the use of this data **without explicit consent** !

In the GDPR, *explicit consent* to the processing of personal data (*données à caractère personnel*) is only **one of the six legal bases** listed by [article 6](https://www.cnil.fr/fr/cnil-direct/question/1308).
For the extended category of personal data named *catégories particulières de données à caractère personnel* (special categories of personal data), the derogations are listed in [Article 9 of the GDPR](https://www.cnil.fr/fr/reglement-europeen-protection-donnees/chapitre2#Article9).

If you look at these two articles, you will be glad to see that the GDPR includes derogations that protect the work of researchers, but also of other professions, such as journalists[^article85].

Going back to the *DisinfoLab affair*, this NGO claimed that some of these derogations applied to its case (see [document](http://disinfo.eu/2018/08/09/communique-du-eu-disinfolab/)), in order to get around `a)` the absence of explicit consent and `b)` the right of individuals to be informed.

Among the derogations set out in [Article 9 of the GDPR](https://www.cnil.fr/fr/reglement-europeen-protection-donnees/chapitre2#Article9), though, only two seem admissible in this case:

> [...]
> e) processing relates to personal data which are manifestly made public by the data subject;
> [...]
> j) processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) based on Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject.
> [...]

The `e)` derogation raises new questions. There are two ways to collect data: directly from users, or through data sellers.
In this case, DisinfoLab used a specialized platform named *Visibrain* to collect data from Twitter, so we're probably in the second case ([source](http://www.reputatiolab.com/2018/07/affaire-benalla-reseaux-sociaux-resurrection-partis-de-lopposition/)).

As discussed later in the TOS section, the way Twitter sells our data to other companies, collecting our consent through its TOS/CGU, is clearly abusive ... We will see why later but, in a few words, the absence of a finality (a stated purpose) driving the harvesting (and the reselling !) of data is a big part of the problem. The GDPR is very clear on the [finality](https://www.cnil.fr/fr/definir-une-finalite) of data processing:

- You need an explicit and clear objective for data harvesting
- This finality **cannot change** during the project !
- This finality drives the data you collect
- This finality determines how long the data is retained

Moreover, although the raw information was "publicly" accessible, the files shared by DisinfoLab also contained a new categorization, i.e. new data produced by largely biased algorithms. Is this a case of *profilage automatisé sans décision automatisée* (automated profiling without automated decision) ([source](https://www.cnil.fr/fr/profilage-et-decision-entierement-automatisee)) ? Even in that case, which is less strict than *profilage automatisé avec décision automatisée* (automated profiling with automated decision), profiling of sensitive data **is forbidden by default (see the derogations of article 6), and in any case the users need to be informed of their rights** ([CNIL page](https://www.cnil.fr/cnil-direct/question/1381?visiteur=part), [GDPR chapter 3](https://www.cnil.fr/fr/reglement-europeen-protection-donnees/chapitre3#Article22) and [blog avocatspi](http://avocatspi.com/2017/01/17/profilage-ce-que-dit-le-nouveau-reglement-europeen/)).

DisinfoLab invoked in its defense the GDPR's "not so well defined" notion of *intérêt légitime* (legitimate interest, point `f)` of [article 6](https://www.cnil.fr/fr/reglement-europeen-protection-donnees/chapitre2#Article6)) to legitimize its processing.
But as [Scinfolex explains very well in a dedicated post](https://scinfolex.com/2018/08/21/affaire-disinfolab-quelles-retombees-potentielles-sur-la-recherche-publique-et-la-science-ouverte/), this ground only applies "except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data". With the publication of all the files containing Twitter ids and political orientations, DisinfoLab's defense seems very weak ...

After some very interesting discussions with Scinfolex on Twitter, it appears that the best defense for the NGO DisinfoLab was probably to place its study under the scientific category, because option `j)` of article 9 opens up more derogations than the others.

This part is very interesting for us. I'll try to summarise it in a few bullet points, but please consult the original post for a detailed explanation !

Main derogations for researchers:

- Flexibility on the **finality** associated with your harvesting project: the finality associated with the data can change along with the study.
- Flexibility on the data retention period.
- Possibility to acquire properly obtained data (mind the CGU !) from public/private third parties (Twitter, Facebook, INSEE, etc.) without asking individuals for a new explicit consent.
- Exemption from collecting explicit consent, or from informing individuals of their rights, when it is technically impossible to contact all of them.
- Processing of sensitive data is probably possible, but it depends on the finality of the data harvesting (probably judged case by case by the CNIL).

Duties / advice:

- contact the local CNIL correspondent at your university if there is one, or the CNIL directly !
- anonymise or pseudonymise the data !
- minimise the data collected, based on the finality of the study

At the moment, we don't know whether the Belgian NGO falls into this **research category**, and it is probably the CNIL which will decide on this point... So, **Wait & See** for now.
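
The advice above to pseudonymise and minimise collected data can be sketched in a few lines of Python. This is only an illustration under assumptions: the record fields (`user`, `text`, `location`), the helper names `pseudonymise` and `minimise`, and the secret key are hypothetical and do not come from any tool cited in this document. The sketch relies on a keyed hash (HMAC-SHA256) from the standard library. Keep in mind that pseudonymised data generally still counts as personal data under the GDPR, so this lowers the risk without removing your obligations.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier by a keyed hash (HMAC-SHA256, hex encoded).

    The same identifier always maps to the same pseudonym, so accounts can
    still be grouped, but the mapping cannot be recomputed without the key.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def minimise(record: dict, needed_fields: set, id_field: str, secret_key: bytes) -> dict:
    """Keep only the fields required by the study and pseudonymise the id."""
    reduced = {key: value for key, value in record.items() if key in needed_fields}
    reduced[id_field] = pseudonymise(str(record[id_field]), secret_key)
    return reduced


if __name__ == "__main__":
    # Hypothetical record and key, for illustration only.
    key = b"keep-this-key-outside-the-dataset"
    tweet = {"user": "@alice", "text": "hello", "location": "Paris"}
    print(minimise(tweet, needed_fields={"text"}, id_field="user", secret_key=key))
```

The key has to be stored separately from the dataset (or destroyed once the study no longer needs to link records), otherwise anyone holding both can recompute the pseudonyms from a list of known account names.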
==Todo transition==

So, as Lionel Morel told me in a private message, **there is currently no exception for text or data mining**. We can follow the drafting of the future European Copyright directive (see the latest news on [scinfolex](https://scinfolex.com/2018/07/11/debat-de-la-derniere-chance-au-parlement-europeen-pour-reconcilier-le-droit-dauteur-et-les-libertes-tribune-liberation/) and [Quadrature du Net](https://www.laquadrature.net/fr/copyright_mandat)) to see what happens on this point, but even if some sort of derogation is included in this directive, it could take a long, long time before it applies...

But if you follow the news on this point, you can see that things are actually going really wrong with the recent decision of the European Union: ==Work in progress==

**Fortunately, since June 2018 we have a perfect example of the procedure to follow if your objective is to harvest (any) data on a social network. It takes the form of an official deliberation published on the Légifrance website:** https://www.legifrance.gouv.fr/affichCnil.do?id=CNILTEXT000036945250

==Todo SOURCE==

We now present some USA legislation, first to appreciate the differences between legislations on webscraping activity, but also to try to understand whether the complex terms of use (CGU) of some GAFA companies are compatible with French or European law.

---

This articulation between personal data and the research context is very well described in two long and very detailed dedicated posts by Scinfolex:

- ["Données personnelles et recherche scientifique"](https://scinfolex.com/2018/07/18/donnees-personnelles-et-recherche-scientifique-quelle-articulation-dans-le-rgpd/)
- ["Affaire DisinfoLab : quelles retombées potentielles sur la recherche publique et la science ouverte ?"](https://scinfolex.com/2018/08/21/affaire-disinfolab-quelles-retombees-potentielles-sur-la-recherche-publique-et-la-science-ouverte/).


Other, more general articles about the DisinfoLab affair:

- « Est-il vrai que l’ONG DisinfoLab s’est rendue coupable sur Twitter de fichage politique? » Libération.fr, 9 August 2018. http://www.liberation.fr/checknews/2018/08/09/est-il-vrai-que-l-ong-disinfolab-s-est-rendue-coupable-sur-twitter-de-fichage-politique_1671816.
- « Affaire Benalla : cinq questions sur l’étude de l’ONG DisinfoLab, accusée de fichage politique ». Accessed 16 August 2018. https://www.francetvinfo.fr/politique/emmanuel-macron/agression-d-un-manifestant-par-un-collaborateur-de-l-elysee/affaire-benalla-cinq-questions-sur-l-etude-de-l-ong-disinfolab-accusee-de-fichage-politique_2890139.html.

## USA legislation

for example the (old) US *CFAA* (Computer Fraud and Abuse Act) of 1986, or in France, ==In work==

# Algorithm & ethics

## Webscraping, following the GAFAM rules ?

> If you are not paying for it, you're not the customer; you're the product being sold.
> [Andrew Lewis in 2010](https://quoteinvestigator.com/2017/07/16/product/)

You could follow some ethics of webscraping: following the (unclear) law, following `robots.txt`, following TOS that prohibit scraping, but please, please don't be naive ... We will see in many ways in this chapter that big companies don't respect any of these rules.
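
For readers who still want to honour a site's robots file, the check mentioned above can be sketched with Python's standard library. This is a minimal illustration, not a complete crawler: the robots rules, the user agent name `my-research-bot`, the example.org URLs and the helper `is_allowed` are all hypothetical. The rules are parsed from a string so that the example runs without any network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots file content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the lines of the robots file; no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.org/public/page.html",
                "https://example.org/private/data.html"):
        print(url, is_allowed(EXAMPLE_ROBOTS, "my-research-bot", url))
```

In a real project the parser would usually be pointed at the live file with `set_url()` and `read()`. Passing this check only says what the site operator asks of robots; it says nothing about the TOS or about personal data.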
Example of the scraping industry: https://www.diffbot.com/welcome/data.jsp

A few examples, picked from the dozens of aggregated services built by Google, illustrate this simple fact: Google, like the other GAFAM, acts as the *king of the jungle* when it comes to data collection, manipulation and aggregation:

- Google Flights, which crawls and compares the prices of airlines: https://www.google.fr/flights/
- Google Scholar, which crawls and indexes scientific publications
- Google News, which crawls and aggregates news at the international level
- Google Shopping

By creating such aggregated services and publishing them at the top of its results, Google serves its own business first, bypassing the tools of other companies, bypassing the original sources of the collected data, and finally collecting money by adding ads to the result pages.

Yet point 5.3 of the Google TOS says:

> 5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) and shall ensure that you comply with the instructions set out in any robots.txt file present on the Services.

> ‘Scraping’ is Just Automated Access, and Everyone Does It
> Jamie Williams

==Transition==

Consider **TOS**, for example: they are a perfect example of the abusive methods companies use to limit webscraping, but not only that, **they limit your rights by pushing the idea that YOUR personal data is just part of some common data.** As we saw recently with the Cambridge Analytica affair, but also with the DisinfoLab affair in France, and probably with many more affairs in the coming years, you have the right **to not be ok** with the usage of your data.

TOS are a contract between the user and the company which provides the service.
First of all, many TOS are impossible to read; see the work of Dima Yarovinsky, who printed some TOS to illustrate this problem.

![Dima Yarovinsky](https://i.imgur.com/RHBKWmJ.jpg)

This text is adapted by the company to the law of each country. But in many cases, international data companies don't respect this and inject abusive clauses into their TOS.

Even in the USA, few people read these complex TOS. When a Senator asked Facebook CEO Mark Zuckerberg what he thought consumers understood of the Facebook TOS, he answered: *I don’t think the average person likely reads that whole document.* [(Bloomberg 04/20/2018)](https://www.bloomberg.com/news/articles/2018-04-20/uber-paypal-face-reckoning-over-opaque-terms-and-conditions)

Do you really think that a young French child reads all the TOS in English before creating an account on Facebook ? A judge would probably invalidate the TOS in this case ...

Even if you read all the TOS, they are constantly modified, which is humanly impossible to follow. The projects [tosdr](https://tosdr.org/) and [tosback](https://tosback.org/) try to help users on this point.

In France, you have probably heard the story of the French professor Frédéric Durand-Baïssas, who sued Facebook in 2011 for the abrupt deactivation of his account after he posted Gustave Courbet's famous painting "L'Origine du monde" on his wall. Facebook first argued that, based on its TOS, only an American court could judge this case. This was not the point of view, though, of the French judge who took it. Seven years later, the *Tribunal de grande instance de Paris* concluded that closing an account without any consent was actually abusive.
But because Facebook was not really condemned with regard to freedom of expression, Mr Durand-Baïssas is still fighting [(Telerama, March 2018 article)](https://www.telerama.fr/medias/facebook-vs-lorigine-du-monde-la-justice-considere-quil-y-a-eu-faute,-mais-ne-condamne-pas,n5528912.php).

In another affair, started in 2014 against Twitter's TOS, the NGO UFC-Que-choisir finally obtained the removal of **250 abusive clauses** from Twitter's TOS on August 7th 2018 [(source ufc-que-choisir)](https://www.quechoisir.org/action-ufc-que-choisir-reseaux-sociaux-et-clauses-abusives-l-ufc-que-choisir-obtient-la-suppression-de-centaines-de-clauses-des-conditions-d-utilisation-de-twitter-n57621/).

In Europe, you've probably seen lots of pop-ups during the last month, giving you a "choice": **Accept conditions or leave the service**. Not much of a choice, is it? Scinfolex discusses the case of Facebook in this [long post](https://scinfolex.com/2018/04/22/veuillez-accepter-nos-conditions-la-fabrique-du-consentement-chez-facebook-et-les-moyens-dy-mettre-fin/). The GDPR prohibits this type of forced consent [(see this article of Mr. GDPR : Big tech is still violating your privacy )](https://www.washingtonpost.com/news/theworldpost/wp/2018/08/14/gdpr/?noredirect=on&utm_term=.11ebf48ebf46), which is why the EU NGO [NOYB](https://noyb.eu/) and, in France, the NGO *La Quadrature du Net* launched a collective action to alert the CNIL [(see the collective action on GAFAM)](https://gafam.laquadrature.net).

Another abuse concerning data is the implicit consent associated with various algorithms on the web. A very good example of this problem is Google's reCAPTCHA.

![HellCaptach](https://upload.wikimedia.org/wikipedia/fr/9/9d/Captcha_google_checkbox.gif)

If you're lucky, you'll see the green mark appear on your first try, but if, like me, you use tools to protect your privacy (ad blocker, user agent switcher, EFF privacy extension, HTTPS Everywhere or, worse, a VPN or the Tor network), you'll probably need to click again and again on little squares with very bad photos of street names, cars, buses, road signs or other very silly things, to train some neural networks to do ... hum ... to do what exactly ?

Hence, we give our time for free, as well as our implicit consent, to **ONE company** in order to access some common data on a website, and we do so without knowing the **finality** of this **digital labor**... As Scinfolex or Affordance.info ask, what would happen if all these thousands of clicks per second helped the US army to train some future drone brain ? This aspect is detailed in this very interesting affordance.info [blogpost on this subject](http://affordance.typepad.com/mon_weblog/2018/03/im-a-digital-worker-killing-an-arab.html):

> The question of traceability, and above all that of the intentionality of collection regimes, is essential. (translated from French)

This problem can be generalized to all automated algorithms that collect data and make decisions based on human interactions. The new field of **"Ethics of Algorithms"** tries to expose and understand the causes and consequences (when possible, because complexity is the norm) of this automated background on human life, virtually on the Internet, but also physically in the "real world".



---

[^article85]: For processing carried out for journalistic purposes or the purpose of academic artistic or literary expression, Member States shall provide for exemptions or derogations from Chapter II (principles), Chapter III (rights of the data subject), Chapter IV (controller and processor), Chapter V (transfer of personal data to third countries or international organisations), Chapter VI (independent supervisory authorities), Chapter VII (cooperation and consistency) and Chapter IX (specific data processing situations) if they are necessary to reconcile the right to the protection of personal data with the freedom of expression and information.

[^L121]: A commercial practice is misleading if it is carried out in one of the following circumstances: [...] 1) When it creates confusion with another good or service, a trademark, a trade name or another distinctive sign of a competitor; [...] 2) When it is based on false or misleading claims, indications or presentations relating to one or more of the following elements [...] 3) When the person on whose behalf it is carried out is not clearly identifiable. (translated from French)

[^L342]: The producer of a database has the right to prohibit: a) the extraction, by permanent or temporary transfer, of all or a qualitatively or quantitatively substantial part of the contents of a database onto another medium, by any means and in any form whatsoever; b) the reuse, by making available to the public, of all or a qualitatively or quantitatively substantial part of the contents of the database, in whatever form. These rights may be transferred, assigned or licensed. Public lending is not an act of extraction or reuse. (translated from French)


[^L323]: Fraudulently introducing data into an automated data processing system, or fraudulently extracting, holding, reproducing, transmitting, deleting or modifying the data it contains, is punishable by five years' imprisonment and a fine of €150,000. Where this offence is committed against an automated personal data processing system operated by the State, the penalty is increased to seven years' imprisonment and a fine of €300,000.

[^L311]: Theft is the fraudulent taking of another person's property.

[^act]: *Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases must be interpreted as meaning that it is not applicable to a database which is not protected either by copyright or by the sui generis right under that directive, so that Articles 6(1), 8 and 15 of that directive do not preclude the author of such a database from laying down contractual limitations on its use by third parties, without prejudice to the applicable national law.* [source](https://eur-lex.europa.eu/legal-content/FR/TXT/?uri=CELEX%3A62014CJ0030)

### Abbreviations

*[WebScrapping]: Web scraping is a term for various methods used to collect information from across the Internet.
*[XHR]: XHR (XMLHttpRequest) is a browser API that lets a script send HTTP requests; such requests can often be replayed outside of the website.
*[DOM]: The Document Object Model (DOM) is a cross-platform and language-independent application programming interface that treats an HTML, XHTML, or XML document as a tree structure wherein each node is an object representing a part of the document.
*[XPath]:
*[Captcha]:
*[TOR]:
*[Docker]:
*[AWS]:
*[TOS]: Terms of Service
*[XMLHttpRequests]: (XHRs) requests to an API
*[API]:
*[Headless]: browser: Selenium /
*[JSON]:
*[JSON-LD]:
*[User-Agent]:
*[Bot Mitigation]:
*[Proxy]:
*[CSS Selector]:
*[CDN]: A content delivery network (CDN), which acts as a reverse proxy, saving bandwidth and protecting websites against denial-of-service attacks.

- Cloudflare
- Distil: https://www.distilnetworks.com/block-bot-detection/
