Discussion group Humanities & Data Science @Turing
===

###### tags: `Humanities and Data Science`, `Ethics`, `Discussion group`

📣 ==Joining Link: <Zoom link will be added/shared before the call>==

---

:::info
- **Next meeting date:** October 14, 2020, 16:00 (BST) ([Calendar invite](https://arewemeetingyet.com/london/2020-10-14/16:00/Humanities%20and%20Data%20Science%20October#eyJ1cmwiOiJodHRwczovL2hhY2ttZC5pby9AbWFsdmlrYXNoYXJhbi9odW1hbml0aWVzRFMifQ==))
- **Hosts:** Fede, Leontien, Katie, Malvika
- **Contact:** fnanni@turing.ac.uk, leontien.talboom.18@ucl.ac.uk
- **Mailing list:** [Link](https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=TURINGINS-HUMANITIES-DATASCIENCE)
- **To Know More about the Special Interest Group:**
    - https://www.turing.ac.uk/research/interest-groups/humanities-and-data-science
    - https://github.com/fedenanni/HDS-DiscussionGroup
:::

**Direct links to the notes from the last meetings:**
- [Notes: 14 October, 2020](#Notes-14-October-2020)
- [Notes: 02 September, 2020](#Notes-02-September-2020)
- [Notes: 24 June, 2020](#Notes-24-June-2020)
- [Notes: 20 May, 2020](#Notes-20-May-2020)
- [Notes: 08 April, 2020](#Notes-08-April-2020)
    - This was canceled due to the lockdown
- [Notes: 04 March, 2020](#Notes-04-March-2020)
- [Notes: 05 February, 2020](#Notes-05-February-2020)
- [Template](#Template)

## Notes: 14 October, 2020

## Topic
- Ethical implications of archiving the web, especially social media

## Aim of this meeting
- We would like to talk about the benefits and drawbacks of guaranteeing long-term access to this type of material, focusing in particular on the dichotomy between authorial consent and historical preservation.
- Here are a few starting points for the discussion:
    - [Guest Editorial: Reflections on the Ethics of Web Archiving](https://www.tandfonline.com/doi/full/10.1080/15332748.2018.1517589)
    - [We Could, but Should We?
Ethical Considerations for Providing Access to GeoCities and Other Historical Digital Collections](https://uwspace.uwaterloo.ca/bitstream/handle/10012/11649/Milligan_etal_JCDL2016%281%29-s.pdf?sequence=1&isAllowed=y)
- Archiving social media for good:
    - [https://www.docnow.io/](https://www.docnow.io/)
    - [https://www.bellingcat.com/](https://www.bellingcat.com/)

## Volunteer to take notes
-
-
-
-
-

## Participants

**Participants (write your names below)**
*Name / Institute*
-
-
-
-
-

:books: Reference and other works mentioned during the discussion
---
*Please add a link and reference for any work that has been discussed or mentioned*
-

:mag: Main arguments from the discussion
---
*Anyone can help take notes.*
-

:closed_book: Closing thoughts
--
-

### Additional Drafted Notes
<!-- Other important details discussed during the meeting can be entered here. -->

We will start from a few examples of how this is done in practice.
The UKGWA archives Twitter and YouTube, and is trying Facebook.
They keep the tweets (no context / retweets / comments).
Another example is Documenting the Now, which does not archive material itself but offers tools.
Bellingcat is an investigative journalism company. They use social media to debunk certain things - examples on Covid misinformation.
They take more of a "don't ask for permission, ask for forgiveness" approach.
Is it useful to archive?
Jenny: difference between archiving and capture
Nicola: from the BL. Archiving social media means preserving the entire life cycle. Legal deposit - they don't archive social media at scale. How to assess what is UK content?
Heritrix is used for archiving at scale - not suited for archiving social media at scale
Recent blog post on an experiment we ran using Webrecorder in the last UK General Election: https://blogs.bl.uk/webarchive/2020/05/using-webrecorder-to-archive-uk-political-party-leaders-social-media-after-the-uk-general-election-2.html
This is an older blog post discussing archiving social media through Heritrix: https://blogs.bl.uk/webarchive/2017/04/the-challenges-of-web-archiving-social-media.html
The UKGWA doesn't keep the dynamic nature of social media
Losing context around the tweet
The UK Web Archive at the BL tries to get context from metadata
Social media material is, as much as possible, publicly available (you would need an additional permission from the content owner)
Jenny: what do we create when we are doing it? What are we capturing? (who archives social media at scale?)
Jenny: building a social network from the XVIII century through letters - you can do that, but it is not there in the same form
Helena: what are we preserving and why?
Research on social media - but how did you collect the data?
Discussion around anonymisation / deanonymisation
And the role of the archive in this - it could act as a collaborator for the researchers, guaranteeing the way data is collected
[Nicola] Prioritisation from the archivist's point of view. This is usually not a priority
Technical expertise is needed
[Andy] We can run the code from our holdings
CommonCrawl / Internet Archive are available, so people can run their code there first
[Helena] document the process of how data has been collected
[Ian] question for the researchers: what do they want?
Discussion around preserving cross-national events
IIPC collections are cross-national and multilingual. All are open access https://archive-it.org/home/IIPC
WARCnet https://cc.au.dk/en/warcnet/
 but mostly just available as a reference rather than for big data research, but the IIPC Research Working Group is working on this
https://netpreserve.org/about-us/working-groups/research-working-group/
[Andy] the distinction between private and public has now dissolved - the social context is all mixed together
[Nicola] discussion about the role of the BL in archiving and making material available
[Helena] archiving newsletters? Discusses how the BL currently archives Weibo
[Nicola] moving more towards contracting out to the users the choice of the type of content to preserve
[Ian] who makes the choice? Delegating to the crowd
[Jenny] but are we replicating the old model?
[Helena] people are already contacting the BL for preserving their online activities before
Discussion on how to communicate to the crawler what it should not archive

## Notes: 02 September, 2020

## Aim of this meeting
The focus is on the **current and future role of preprints** as a way of sharing research findings, with examples from different communities.

## Participants

**Participants (write your names below)**
*Name / Institute*
- Federico Nanni, The Alan Turing Institute
- Leontien Talboom, UCL & The National Archives
- Jessica Polka, ASAPbio
- Anna Rogers, University of Copenhagen
- Dmytro Mishkin, CTU in Prague
- Demitra Ellina, F1000Research
- Barbara McGillivray, University of Cambridge
- David Beavan, The Alan Turing Institute
- Rennie Mapp, University of Virginia, US
- Martin O'Reilly, The Alan Turing Institute
- Callum Mole, The Alan Turing Institute
- Adam Tsakalidis, QMUL & The Alan Turing Institute
- Alessandro Tirapani, City, University of London
- Giulia Paci, UCL
- Amy Tabb, at the meeting as an independent scholar, USA

:dart: Quick Questions
---
*Feel free to answer them or add a '+1' next to a statement that you agree with and/or would like to discuss*

**In which cases do you post a preprint of your work?**
*Name / response*
- Dmytro / Anytime, unless my collaborators are against it
- Leontien / Never, it is not common in my field
- Jessica / Always (except some commissioned review articles)
- Adam T / faster access
(post-acceptance)
- David B / Does the final author copy count here? I.e. to satisfy open access where it's mandated by the funder
- Martin O / Pre-submission or post-acceptance (depending on journal policy) if the journal paper is not open access, as I always want a freely available copy. I'd like to move my default to a pre-submission preprint as standard practice
- Callum / I publish a preprint at paper submission. Primarily for faster access, since the review process can be so sluggish sometimes. I also like that the review process is then transparent (if the paper changes a lot from preprint to journal article).
- Amy Tabb / most of the time when publishing w/ CS/ECE researchers.

**And when you don't?**
*Name / response*
- Dmytro / Only if co-authors are not allowed to.
- Jessica / Review article requested by the journal (depending on journal policy)
- Alessandro / In our field (organisation studies) it is extremely uncommon to do it at all. Multiple journals ask you to take it down (few do it nonetheless) and some do not accept articles already posted online
- Amy Tabb / when the lead authors are not in favor and/or it is not in the discipline's tradition (entomology).

**How do you select which new preprints to read?**
*Name / response*
- Leontien / Mainly shared by people on Twitter [name=DavidB] +1 [name=Martin O] +1
- Dmytro / http://www.arxiv-sanity.com/, Twitter
- Jessica / Twitter - we have cataloged some efforts here: https://reimaginereview.asapbio.org/explore/?search_keywords=preprint&sort=latest
- Amy Tabb / Twitter

**Do you ever question your approach?**
*Name / response*
- Dmytro / No. I thought about it a lot, but cannot find reasons not to for myself.
- Amy Tabb / Also no. Preprinting has been very positive for my work and allows me to transfer the technology.

**Do you then regularly read the final paper when it is published?**
*Name / response*
- David B / Nope, things have often moved on well before.
Makes it tricky as to what to cite, the preprint or the final
- Leontien / Depends on the type of work it is, as some work will be outdated quite quickly
- Dmytro / rarely, mostly if it is updated on arXiv and the paper is very relevant to me
- Jessica / we have recommended that preprint servers implement a changelog in metadata, would be good to see this for journals as well: https://asapbio.org/biopreprints2020-report
- Demitra / At F1000Research we combine the preprint with the open post-pub peer review process, with all versions linked and an amendment box explaining what has changed between versions
- Amy Tabb / not frequently because of access issues.

:books: Initial Drafted Notes
---
The Current and Future Role of Preprints Across Research Communities

Preprints - scholarly or scientific papers that precede formal peer review and publication in a peer-reviewed scholarly or scientific journal.

Some communities use them a lot more than others; there has also been a rapid increase in preprints in the last decades.

Why publish preprints?
- Increased visibility
- Increased citations
- Faster dissemination of results
- May prevent scooping
- It might be an easy way to wrap up a side project
- Bypassing paywalls

Anna discusses the behaviour in NLP regarding preprints. The main point of preprints here is to try and get results out faster. Preprints can be very different from the published version; Anna gives an example of her own work, which turned out completely different from the initial preprint. She talks about how, even if the published paper turns out much better, not a lot of people revisit it. Dmytro disagrees, and says that he does read an updated version if it is published.

Martin would love to see how a preprint can change over time; a change log would be very helpful with that. Are they even still the same paper over time?

There are very few journals in the life sciences that point to the preprints.
Jessica talks about how journals should acknowledge that preprints exist. Some fields seem to be more comfortable citing preprints than others. In NLP, citing preprints seems to be more common than citing the actual published work. Across our different disciplines there are different approaches to editing and finishing off the final version of a published paper.

Another point that we touched upon is the sheer number of preprints, which touches on the topic of trust in preprints. How do you select them? What approach is used here?

Demitra from F1000Research talks about how she approaches different disciplines. They are looking at the differences between fields: for example, in some fields you need a PhD to be considered an expert, whilst others view a Master's degree as sufficient. Because the reviews are open and citable, people are encouraged to do a better job at reviewing.

Dmytro is concerned that early career researchers who write open critical reviews could see their careers affected. Demitra talks about how you can team up and protect yourself from these types of situations.

Preprints are increasing the speed of research, but does this push research in a certain direction? And is that the direction that you want to go in? Is this harmful for the research community?

Jessica - making science openly accessible and making it possible for everyone to provide their expertise gives the writers much better feedback than a more traditional peer-review approach. But the downside of this is that it is easier to disseminate misinformation.

Rennie, from a digital humanities background, takes an example of a journal from her field (Cultural Analytics). This journal is against preprints, because they can disrupt the blind peer-review process. Both Dmytro and Martin question why this process is blind. Anna would like to preserve anonymity; her blog post about this is linked in the section below for more details.
Alessandro talks about how his field does not perceive preprints very well; they are not very well known either. Also, the discussion is different depending on the research methods: qualitative and quantitative material will need different approaches. He closes by mentioning how we should rethink the journal process.

:mag: Main arguments from the discussion
---
- Preprints may change drastically between the time they are made available and the time the actual paper is published. This raises questions around what is actually being cited, as the finished paper could be very different
- Peer review is perceived very differently across different fields. It is difficult to find the balance between giving researchers credit for this work and keeping people's reputations intact if they provide a critical review, especially if they are early career researchers.
- The benefits of preprints differ across fields; some examples are given in the section above. Overall, there was a strong positive perception of publishing preprints, although some fields may be more used to them than others.
:books: Reference and other works mentioned during the discussion --- There is a tool for [seeing changes in arxiv papers](https://github.com/temken/comparxiv) [ReImagineReview](https://reimaginereview.asapbio.org/explore/?search_keywords=preprint&sort=latest) [F1000 Research](https://f1000research.com/about) Anna Rogers - [Should the reviewers know who the authors are?](https://hackingsemantics.xyz/2020/anonymity/#BharadhwajTurpinEtAl_2020_De-anonymization_of_authors_through_arXiv_submissions_during_double-blind_review) Dmytro Mishkin & Amy Tabb - [(part I) Hands off Arxiv!](https://amytabb.com/ts/2020_06_29/) Dmytro Mishkin & Amy Tabb - [(part II) What does it mean to publish your scientific paper in 2020?](https://amytabb.com/ts/2020_08_21/) Amy Tabb - [arXiv paper explainer](https://amytabb.com/ts/2020_08_09/) [Data feminism](https://mitpress.mit.edu/books/data-feminism) as an example of sharing qualitative research before publication [Twitter thread](https://twitter.com/annargrs/status/1301204793235566600) by Anna Rogers wrapping up the discussion :closed_book: Closing thoughts -- Next aspects on the topic that we can discuss in further sessions: * open peer review in the humanities * preprints / working draft and qualitative research * being a reviewer as a "job" ----------- ## Notes: 24 June, 2020 - Topic: ***Commercial organisations doing the job of libraries/archives*** - [Slides](https://docs.google.com/presentation/d/1ZfY0\_GyYBkRyvkrCt\_7hJhShFNdYGaRYQUsKpizUBAY/edit?usp=sharing) ## Aim of this meeting - Having a conversation rather than a one-to-many reading group - Discussing topics at the intersections of the two disciplines - Trying to consider different / uncommon points of view - Sign up to this mailing list: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=TURINGINS-HUMANITIES-DATASCIENCE **Participants (write your names below)** Name / Institute / What brought you here? 
(answer in a short sentence)
- Federico Nanni / Turing / Leading the session
- Leontien Talboom / UCL / chairing the session
- Katie McDonough / Turing
- Malvika Sharan / Turing / Community discussions
- Sarah Gibson / Turing
- Scott Bailey
- Rossitza Atanassova
- Patricia Murrieta
- KBeelen
- Eirini Goudarouli (TNA)
- Daniel Wilson / Turing / Historian working with/on 'data'
- D Vanstrien
- David Beavan / Turing / Co-Organiser Humanities & Data Science interest group
- Bernard Ogden
- A Lang
- Barbara McGillivray

:dart: Discussion Goal
---
Commercial digitalisation organisations are not libraries, or are they?

Example 1:
- The National Archives has an agreement with "Findmypast" to secure records on ancestry
- Pro: Findmypast does the digitalisation and preserves the data in different formats
- Con: the records are available only upon visiting the National Archives reading room; to access them from home there is a paywall

Example 2:
- Google Books: digital bookstores are not libraries - the books are under copyright and authors don't benefit from them
- You need to pay to access the books, and hence it's Google that profits from this and not society

Example 3:
- Internet Archive - not a library but piracy, as there is no license to make these books available

![](https://i.imgur.com/sQpDrsM.png)

Questions:

2. What are the drawbacks of commercial organisations acting in this environment?
    - As their main goal is to make money, how do we ensure that our values also come across?
    - How do we guarantee long-term preservation? (when the hype is gone)
    - If the business is based on data, how do we ensure that data is open and fully available?
3. How should we be setting up such a relationship?
    - Which value do we recognize in their work, apart from the invested budget?
    - How can we ensure that our expertise is not lost?
    - Or is it something that academia should discourage as a whole?
:books: Reference and other works mentioned during the discussion
---
- [Google books](https://books.google.com/)
- [Torching the Modern-Day Library of Alexandria](https://www.theatlantic.com/technology/archive/2017/04/the-tragedy-of-google-books/523320/) “Somewhere at Google there is a database containing 25 million books and nobody is allowed to read them.”
- [Google & the Future of Books by Robert Darnton](http://www.nybooks.com/articles/22281)
- Gale Cengage came up with the Digital Scholar Lab to allow computation with the digitised collections behind a paywall: https://insights.uksg.org/articles/10.1629/uksg.482/
- [Removing Barriers to Digital Scholarship](https://www.gale.com/intl/primary-sources/digital-scholar-lab)

:mag: Main arguments from the discussion
---
**Questions 1.** What are the benefits of commercial organisations acting in this environment?
- How would we otherwise fund large digitisation projects?
- Does this mean the material is more widely available?
- Does this simplify cross-country efforts?

**Discussion on Google Books and the British Library (BL) contract on digitalisation of literature**:
- Google books: https://books.google.com/
- What is their business model? - very unclear to libraries and archives
- Rossitza: Google Books is still going, and digitising collections at the BL and other institutions
- Daniel Wilson (in chat): As Rossitza said, they *are* digitising thousands of books a month, but the business model is more opaque. I assumed Google Books was meant to be a ‘loss leader’ for the wider operation: PR for their ambition to ‘organise all knowledge’ (aka advertising)
- A Lang (in chat): book historian Robert Darnton wrote a good piece on this in the NYRB some years ago (ironically enough, behind a paywall: http://www.nybooks.com/articles/22281)

Questions on Google digitalisation:
- Patricia Murrieta (in chat): what is the arrangement between the BL and them, Rossitza? In terms of what you get and what they get?
- The contract is available online; it is quite inflexible: https://www.openrightsgroup.org/blog/access-to-the-agreement-between-google-books-and-the-british-library/
- What are they interested in digitalising? What kind of material? Just speculating about what their goal is :D
- Curators had the freedom to select materials, but there are restrictions on dimension and condition to meet the requirements of the scanning equipment Google uses
- A lot of the books can be rejected if the metadata is missing
- There is a focus on scale rather than content, e.g. a request to digitalise some specific material was rejected because metadata was missing and the BL did not have the resources
- The goal seems to be text mining and OCR ...
- Libraries are having different dialogues with the Google group (separately), and it's not consistent
- Even if their goal is not the most charitable, how can communities benefit from it?
- David Beavan (in chat): They are hoovering up all of human knowledge. Born digital for them = web, they have got that covered. This is a way of going back in time. Language models, semantic change, OCR, gateway to knowledge. If they become the de facto place for knowledge/search and put libraries out of business (even if only by convenience) then you’ll get adverts between page turns etc. etc. Google is ultimately an advertising company
- Daniel Wilson: there is one buyer and no competition. We need to understand what it is that they gain from this, in order to value the resource they are being given.
- In any case, my point was more that the BL felt its hands were tied, even before it got to that point
- Katie: Ancestry is free to access through local libraries in the UK during the pandemic.
- Genealogy organisations hold a lot of power (personal information)
- Based on where they are (America or UK), they also compete for information
- Patricia: It's in a way like publishing companies. In order to change the model, holders of knowledge would have to choose not to go with them...
- Mia: Other organisations can access Google Books, but they don’t mind the unlimited liability that Turing didn’t agree to
- Kate M (in chat): Is there any writing/research about which countries have provided public funding for digitization vs. those that have gone (at least primarily) with commercial digitization?
- Mia: Really great overview of the efforts of different countries in digitizing their cultural heritage (France, Finland, Australia, New Zealand, Canada) in comparison with the UK

:closed_book: Closing remarks/questions/topics (for future discussions!)
--
- In science we have a strong open movement, on the basis that taxpayers' money (public funding) going into research should produce output that is publicly accessible.
- However, that kind of funding is missing in the humanities, which is shocking given that the humanities affect generations of scholars, researchers, politicians and citizens.
- We have also realised that some researchers work in a field not because that's what they want to do, but because that's the only field they can access papers on - I wonder if that pattern exists within the humanities as well.
- The embargoes for IP rights on research output are the same across all these fields

### Additional Drafted Notes
<!-- Other important details discussed during the meeting can be entered here.
--> - ## Notes: 20 May, 2020 ## Aim of this meeting - Having a conversation rather than a one-to-many reading group - Discussing topics at the intersections of the two disciplines - Trying to consider different / uncommon points of view - Chatting over lunch (a tea / beer) to make it as informal and relaxed as possible - Trying to have this the first Wednesday of every month (up for discussion) ### Topic - The Computational Humanities and Toxic Masculinity? A (long) reflection ([Original blogpost](https://latex-ninja.com/2020/04/19/the-computational-humanities-and-toxic-masculinity-a-long-reflection/), [Our slides](https://docs.google.com/presentation/d/11qi43HYFjogFJV36u2pS0CPLqP43d_Ypvv4pYvxuKVY/edit?usp=sharing)) - Katie McDonough (from Living with Machines) will introduce the topic and Fede chairs the debate. **Participants (write your names below)** Name / Institute / What brought you here? (answer in a short sentence) - Federico Nanni / Turing / Leading the session - Leontien Talboom / UCL / Leading the session - Katie McDonough / Turing / Chairing the session - Malvika Sharan / Turing / Want to capture different perspective in the Turing Way project - Barbara McGillivray / Turing and Cambridge - Ismael Kherroubi Garcia / Ethics Research Assistant at Turing * Sarah Gibson / I'm a Research Software Engineer in the Research Engineering Group at the Turing. I'm an advocate for reproducible research and work on open projects like mybinder.org and The Turing Way. I'm also on the Living with Machines project at the Turing. 
* James Smithies from King’s Digital Lab, King’s College London * Glen Cameron / Illinois (US) working at HathiTrust Research Center * Laura Carter / Human Rights Centre at the University of Essex, currently an Enrichment student at the Turing * Arianna Ciula / Deputy Director and Senior Research Software Analyst at King’s Digital Lab, King’s College London (UK) * Scott Bailey / Data and Visualization Librarian at NC State University Libraries, but previously worked at the Scholars’ Lab @ UVa, and at Stanford’s Center for Interdisciplinary Digital Research (CIDR) * Eirini Goudarouli / Heads of Digital Research Programmes at The National Archives, UK * Jane Winters / School of Advanced Study, University of London. * James Cummings / Newcastle University, DH, Late Medieval Drama, TEI geek, that sort of thing. * Luca Scholz / Lecturer in Digital Humanities at the University of Manchester (UK) * David Beavan / Turing Research Engineering. Amongst other things, I’m co-organiser of the Humanities & Data Science SIG at the Turing, find out more here: https://www.turing.ac.uk/research/interest-groups/humanities-and-data-science * Kaspar Beelen / Research Associate at the Alan Turing Institute (Living with Machines project) * Charlotte Tupman / Digital Humanities Lab at the University of Exeter. Into ancient inscriptions. * Giulia Occhini / PhD student at the Turing in Data Science/NLP/Digital Humanities and other stuff * Sarah Lang / (also known as The LaTeX Ninja and author of the post discussed today) - my non-Ninja-self works at the Centre for Information Modellierung (Zentrum für Informationsmodellierung) in Graz, doing my PhD in Digital Humanities on early modern science / alchemy. My internet isn't always stable, so no permanent video ;) * Melvin Wevers / DHLab of the KNAW Humanities Cluster in Amsterdam. One of the organizers of the Computational Humanities Research workshop. * Kevin Xu / Research Software Engineer at the Turing.
* Glen Worthey / U. of Illinois, Urbana-Champaign, at the HathiTrust Research Center. Thanks to Katie (my former Stanford colleague) for the invitation from across the Atlantic. Great to see many old friends and colleagues, looking forward to meeting new ones!
* David De Roure / (Dave D as opposed to Dave B…), a Turing Fellow and my project is AI and music (I’m a digital musicologist, also known occasionally as a computational musicologist…). In Oxford I look after the Digital Humanities network (DH@Ox), which comes together annually in the DHOx Summer School (a cut-down 3-day online event this year). I’m a visiting prof at the Royal Northern College of Music working on science and music. I’m also involved in the UKRI research and innovation infrastructure exercise.
* Daniel van Strien / I work at the BL as a digital curator.
* Olivia Vane / Research Software Engineer at the British Library (Living with Machines project)

:dart: Discussion Goal
---
-

:books: Reference and other works mentioned during the discussion
---
- Gender bias before and after “Computational Humanities,” some starting points
    - [Beyond the Margins: Intersectionality and the Digital Humanities](https://www.digitalhumanities.org/dhq/vol/9/2/000208/000208.html), DHQ (2015) by Roopika Risam
    - [The Radical Potential of the Digital Humanities](https://blogs.lse.ac.uk/impactofsocialsciences/2015/08/12/the-radical-unrealized-potential-of-digital-humanities/), Miriam Posner
    - [Bodies of Information](https://dhdebates.gc.cuny.edu/projects/bodies-of-information), ed. by Jacqueline Wernimont and Elizabeth Losh (2018)
    - [DH-WoGeM](http://www.dhwogem.org/)
    - [Data Feminism by Catherine D’Ignazio and Lauren F. Klein](https://bookbook.pubpub.org/data-feminism)
- [LaTeX Ninja blog post](https://latex-ninja.com/2020/04/19/the-computational-humanities-and-toxic-masculinity-a-long-reflection), 19 April 2020 (the author is here!)

### Initial Drafted Notes
<!-- Important details discussed during the meeting can be entered here. -->

Here we can collaboratively take notes of the main passages of the conversation, which we can then organize as below.

- Does computational skill denote some power structure?
    - Is it assumed to be masculine (and hence to exclude women or other genders)?
- Digital humanities has grown out of the humanities by techie people; similarly, computational humanities seems to have come out of folks in computation who are interested in the humanities - not sure if that's what creates a niche (Comment from Zoom: Does CH vs DH play back into long disproved stereotypes of The Two Cultures? Or is it different from that?)
- A lot of the points in the blog post echo the human rights approach that people in the legal space talk about
- Problem of binaries: the techie - fuzzy divide (techies are engineers and fuzzies are historians and literature folks) that separates the intellectual community on a campus
- There is indeed a gendered aspect that exists in many research spaces, and as a research community we should think about how we manage these privileges and power dynamics
- By the author, on what led her to draft the post
    - The motivation comes from the lack of a full understanding of what Computational Humanities actually stood for
    - In many languages this doesn't exist as a field
    - She noticed that some jobs in the humanities are offered to computer scientists because they can do machine learning and a qualified humanities specialist might not
    - Many conferences highlight only computational visualisation, and not so much the humanities
    - Women and men will have the same chances of getting selected if they work on the same topic, but what if a field is also gender biased and that's the field that gets more focus?
- Privilege hazard: when you have privilege you don't see the problem
    - Often people get offended by others pointing out that they have less privilege. People are afraid to speak up, and therefore having a safe space is useful.
- ![](https://i.imgur.com/nTqIN1y.jpg)
- How power dynamics influence the way we do research
- When working in science there is an attitude of "verificationism" - people who don't have the same lived experience want to understand what others are talking about (putting yourself in the shoes of other genders) - and this causes frustration on both sides of the debate
- People come for a value but stay for the ethos
- How is DH formed, and what aspects are being considered?
- Why is a separate community being formed?
- Melvin: I think it's not a clear separation; in our view it's more like a special interest / subcommunity within the larger community
- James: Glen, it may also be different in the different regionalities of DH, where there are different focuses.
- James: Sarah, maybe the perception of marginalisation is something we all have, but to greater or lesser degrees?

### Some comments from the chat
- From James Cummings to Everyone: (4:32 pm): I'm not sure women have the same chance of getting accepted to a conference. At least that isn't what the statistics over a long period seem to show. There is a PI-goes-to-the-conference, and then far too many of those are still men.
- From Ismael to Everyone: (4:34 pm): Having zero background in computational or digital humanities, I only learned the term when I saw this discussion advertised! I am happy to see the definitions are vague - I have a feeling that defining what either one is (or clarifying that they are the same) could be a first step (setting aside the enormous social background through which all concepts, names, etc. are interpreted for a moment)
- From Jez Cope to Everyone: (4:35 pm): My gut reaction is "we need to investigate this more" too, but I'm also aware that attitude tends to perpetuate the status quo, both because you don't have to change anything until you've investigated, but also because there's a danger of confirmation bias
- From James Cummings to Everyone: (4:36 pm): Ismael: There is a long history (and publications like 'Defining DH') on what is or isn't DH. I've learned from experience when someone claims they are doing DH it isn't my place to say whether that is real 'DH' or not. ;-)
- From quinn dombrowski to Everyone: (4:37 pm): On representation in conference acceptance, there's this paper on DH (through 2015) which suggests underrepresentation https://scottbot.net/representation-at-digital-humanities-conferences-2000-2015/
 - Sarah: Thanks! I think this is a good summary of where I wanted to go with the post - I get how continuous feminism debates can be somewhat annoying to men, but it's just like Laura said - if you have the privilege, you can't just "see" the perspective of those who don't - From quinn dombrowski to Everyone: (4:42 pm) If the stats reflect fewer women submitting, isn't that a problem too?
 - From Arianna Ciula to Everyone: (4:42 pm) My reactions: names ARE important, names often mean identity especially at certain stages in life (e.g. early career); society has problems with diversity (just look at figures on salaries across sectors); DH/RSE/Computational Humanities are right to question/problematise names and question bias/problematise reifications of societal bias/problems; however you would assume we had figured out by now that instrumental and intellectual are entangled - if it isn't this community who can articulate it best, who else? - james - Yeah, we are way past having to ask women to prove they don't feel comfortable/experience misogyny. Just believe them. (But also, forgive us privileged if we forget EDI sometimes. Prod us when we do.) - From James Cummings to Everyone: (5:02 pm): (Hopefully we'll eventually get to a place where we don't need prodding, it is normal.) - From Melvin Wevers to Everyone: (5:03 pm): @sarah, I see how in the grant-world, traditional hum is threatened by computational approaches. But having a community dealing with issues related to communities is not necessarily set up as something that invalidates this field of scholarship, communities = computation - From James Smithies to Everyone: (5:03 pm): Thanks to the organisers and everyone who shared their thoughts - really valuable for me. :mag: Main arguments from the discussion --- - :closed_book: Closing remarks/questions/topics (for future discussions!) -- - ### Additional Drafted Notes <!-- Other important details discussed during the meeting can be entered here. 
--> - ## Notes: 04 March, 2020 - **Hosts:** - Fede, Leontien ### Topic - [Data-driven publications in the Humanities](https://docs.google.com/presentation/d/13nPK5f9Z6wEwOkjbNfLQI4WZ1cRJ9HfDcl6MmmuaJtY/edit?usp=sharing) - comments are open **Participants (write your names below)** - Malvika, Kasra, Laura, Katie, DanVan, Kaspar, Mariona, Giorgia Occhini, Tim, Amy :dart: Discussion Goal --- - Discussing the impact of data-driven research on the overall debate concerning methodology in the humanities :books: Works mentioned during the discussion --- - Gregory Crane, [What Do You Do with a Million Books?](http://www.dlib.org/dlib/march06/crane/03crane.html), 2006 - [The Culturomics paper](http://www.culturomics.org/), 2010 - Dan Cohen, [Initial Thoughts on the Google Books Ngram Viewer and Datasets](https://dancohen.org/2010/12/19/initial-thoughts-on-the-google-books-ngram-viewer-and-datasets/ "https://dancohen.org/2010/12/19/initial-thoughts-on-the-google-books-ngram-viewer-and-datasets/"), 2010 - Anthony Grafton, [Loneliness and Freedom](https://www.historians.org/publications-and-directories/perspectives-on-history/march-2011/loneliness-and-freedom), 2011 - Cameron Blevins, [Topic Modeling Martha Ballard's Diary](http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/), 2010 - Matthew Jockers and Annie Swafford discussion around the Syuzhet package, 2015. Starting points: [1](http://www.matthewjockers.net/2015/02/02/syuzhet/) and [2](https://annieswafford.wordpress.com/2015/03/02/syuzhet/) - Scott Weingart, [“Digital History” Can Never Be New](https://scottbot.net/digital-history-can-never-be-new/), 2016 :mag: Main arguments from the discussion --- - The different role that examples play in historical research compared to other disciplines (especially the social and natural sciences). 
In the former, they are presented as evidence supporting a specific narrative, while in the latter they offer insights into a quantified property of the analysed data. - Initially, a distant reading method like topic modeling was used for browsing and visualizing the collection, not for deriving evidence (however the distinction is thin) - History deals with questions that often cannot be answered using big data and quantitative approaches. For instance "how" questions, rather than "what". :closed_book: Closing remarks/questions/topics (for future discussions!) -- - The role of private companies in digitizing and making available collections (ethical, copyright and accessibility issues) - Non-domain experts doing research in the humanities (as well as in biology, medicine, psychology) because they know how to work at scale - The difference between discovery and justification in the Humanities (starting from [Trevor Owens, 2012](http://www.trevorowens.org/2012/11/discovery-and-justification-are-different-notes-on-sciencing-the-humanities/)) - How different disciplines answer "why" questions, and whether this is changing with the advent of data science. ### Additional Drafted Notes <!-- Other important details discussed during the meeting can be entered here. --> - Previous experience working/playing with tools for working with large datasets without any goal or questions in mind - How can playing around be turned into actual research: making sense of the outcome? - What happens when you find something you did not expect? e.g. sentiment analysis of Dorian Gray that ends up showing that the book is sad in the first part and happy towards the end. Is data exploration used in other fields? 
- Linguistics researchers working on oral tradition - can create maps and names - Engineers - hypothesis generation does not expect surprises, but going further in exploration can give you surprises, which you can then support or reject - Historians go to the archive with a question to select the collections to look at - then they can lean on serendipity that can lead to the crystallisation of bigger new questions - In bioinformatics we can start with a set of data, e.g. multiple cancer sequencing data (transcriptomics, metagenomics, metabolomics, proteomics), and we can study patterns and derive conclusions about what kind of cancer they are, what causes them, and which are the genes or the drug targets. Serendipity and surprises are basically everything - but larger datasets allow us to separate noise from the actual signal in the data. - Conclusion: It's hard to ignore surprises and ask more questions when they appear as a side effect of an original question - Dan Cohen's reaction to n-gram: are trends derived from big data historical evidence, or do we just want to search and learn? - Human rights critical theory: this approach allows them to look at the latest state (what it is) and then go back to looking into the data to see how it started. - Social sciences: Hypothesis generation in the social sciences is based on assumptions derived by a specific group of people working on selected cases and examples - it changes with people, their environment and cases/examples - Engineer: we need tools for exploration and other tools for trends - trends can allow us to avoid averaging out (overfitting or underfitting) of observations. 
- when you are working with millions of articles, you will find an article that matches your ideal observation - improving how we use methods to go from distant to close reading or vice versa: avoiding cherry-picking of observations made through an analysis by using computational methods that can help researchers avoid these biases - Do computational approaches to history help historians ask new questions, or provide new methods to explore old questions? - In other fields, general observations allow future predictions, e.g. of conflicts or infections - It also depends on the relevance in our community, e.g. coronavirus vs infection in general - Some questions can't be addressed with close reading because they are about trends (and vice versa) --- ## Notes: 05 February, 2020 - **Hosts:** - Fede, Leontien ### Topic - [Data Science Tutorials & Humanities Scholars](https://docs.google.com/presentation/d/1ZfY0_GyYBkRyvkrCt_7hJhShFNdYGaRYQUsKpizUBAY/edit?usp=sharing) **Participants (write your names below)** - Malvika, Kasra, Katie, Daniel W, DanVan, Dave, Kaspar, Mariona, Olivia ... :books: References --- - The Programming Historian: https://programminghistorian.org/ - :dart: Discussion Goal --- - The focus will be on the benefits and drawbacks of tutorials enabling humanities scholars to easily use data science methods. :mag: Main arguments from the discussion --- 1. Tutorials are useful for very specific tasks, to learn what a tool could/should do, not necessarily to learn data science. They are a first step into the field, but from the discussion it became apparent that it is easier to learn data science from books, courses or internships. 2. Different methodological frameworks between science and humanities education. We talked about whether data science will ever become part of humanities curricula, due to the demand from students who see it as necessary for entering the job market. 
We compared this with the training in biology, a science that (partly) relies on qualitative methods. 3. Tutorials often don’t have an interactive component (compared to working in groups). This leads to less of a community feeling; also it is unclear how reliable they generally are. The Programming Historian addresses many of these issues, with peer-review, frequent updates and an active Twitter community. :closed_book: Closing thoughts -- - ### Additional Drafted Notes <!-- Other important details discussed during the meeting can be entered here. --> - None for this meeting. # Template -- ## Notes: dd Month, yyyy ## Topic - ## Aim of this meeting - ## Volunteer to take notes - - - - - ## Participants **Participants (write your names below)** *Name / Institute / What's your experience with preprints?* (answer in ONE short sentence) - - - - - :dart: Discussion Goal --- - **2 minutes silent note-taking: personal reflection** "Add '+1' (plus 1) next to a statement that you agree with and would like to discuss" *Name / response* - - :books: Reference and other works mentioned during the discussion --- *Please add link and reference, any work that has been discussed and mentioned* - :mag: Main arguments from the discussion --- *anyone can help taking notes.* - :closed_book: Closing thoughts -- - ### Additional Drafted Notes <!-- Other important details discussed during the meeting can be entered here. --> -