# Working with Web Archives Workshop

**What**: A workshop at the [Society for Textual Scholarship 2017](http://mith.umd.edu/sts2017/)

**Where**: University of Maryland, College Park

**When**: May 31, 11AM-1PM

**How**: [Register](https://app.certain.com/profile/form/index.cfm?PKformID=0x25604792c18), but be sure to select *Pre-conference only* if you only want to attend this or any of the other free workshops.

---

The web is popularly imagined as a ubiquitous, constantly changing cloud that resists fixity, the archive and the demands of research. But our experience of using the web is the result of a discrete configuration of networks, protocols, software and hardware. Tools and practices exist for fixing this seemingly ephemeral content and making it available for study. In this workshop we will talk about how and when to use these tools in your work. We will get hands-on experience using web archiving services like the Internet Archive, Webrecorder and Hypothesis. We'll also take a peek at what web archives look like as data.

## Pre-Workshop

Before you come to the workshop, please spend a little time jotting down some brief notes about any projects you have done, or would like to do, that involve collecting content from the web. Also, please read this short article, which we will discuss at the beginning of the workshop.

* [OkCupid Study Reveals the Perils of Big-Data Science](https://www.wired.com/2016/05/okcupid-study-reveals-perils-big-data-science/)

## Discussion

* How have you wanted to use web content in your own research?
* What happened in the OkCupid data release, and what could have been done differently?
* What are scraping and crawling, and how do they relate to archiving?
* What is a robots.txt file?

## Internet Archive

* What are the [Internet Archive](https://archive.org) and [Archive-It](http://archive-it.org)?
* How can you use the Internet Archive in your research?
* How can you add web pages to the Internet Archive?
* What are some alternatives to the Internet Archive?

## Webrecorder

* What is [Webrecorder](https://webrecorder.io)?
* What does it mean to *record* the web?
* How can you use Webrecorder in your research?
* How can you download and replay your collections with [Webrecorder Player](https://github.com/webrecorder/webrecorderplayer-electron/releases/tag/v1.0.3)?

## WARC Data

* What is [WARC](https://en.wikipedia.org/wiki/Web_ARChive) data?
* Dissect a WARC file and see what is inside.
* Example of processing WARC data in a [Jupyter Notebook](http://jupyter.org) with [warcio](https://github.com/webrecorder/warcio)

## Hypothesis

* What is [hypothes.is](https://hypothes.is)?
* What does it mean to annotate the web?
* How can you share your annotations?

## Social Media Archives *(if there's time)*

* How can you download your social media archive from Twitter or Facebook?
* What does the archive consist of, and how does it work?
* What are the potentials or implications for research and the archive?

## Wrap Up

* Overall questions/comments
* Ways to learn more and stay in touch

## Presenters

* Purdom Lindblad: [@Purdom_L](https://twitter.com/Purdom_L)
* Ed Summers: [@edsu](https://twitter.com/edsu)
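For the discussion question "What is a robots.txt file?": a robots.txt file is a plain-text file at the root of a site where the site owner lists paths that crawlers are asked not to fetch. Compliance is voluntary. Python's standard library includes `urllib.robotparser`, which can evaluate those rules. The sketch below parses a made-up robots.txt from a string so it runs without network access; against a live site you would call `set_url()` and `read()` instead. The file contents, the agent names `BadBot` and `MyResearchBot`, and the `make_parser` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; agent names and paths are invented.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text already in hand (no network access)."""
    parser = RobotFileParser()
    # parse() takes a list of lines; blank lines separate user-agent groups.
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    for agent, url in [
        ("MyResearchBot", "https://example.com/index.html"),
        ("MyResearchBot", "https://example.com/private/notes.html"),
        ("BadBot", "https://example.com/index.html"),
    ]:
        # can_fetch() applies the first matching user-agent group, else the "*" group.
        print(agent, url, rp.can_fetch(agent, url))
    # Crawl-delay is a politeness hint in seconds; None if the group sets none.
    print("crawl delay:", rp.crawl_delay("MyResearchBot"))
```

A crawler that respects these rules would check `can_fetch()` before every request and pause between requests when a crawl delay is set.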
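For the WARC Data session ("Dissect a WARC file and see what is inside"): a WARC file is a sequence of records. Each record has a `WARC/1.0` version line, a block of `Name: value` headers, a blank line, exactly `Content-Length` bytes of content (for a `response` record, the raw HTTP response), and then two CRLFs. The sketch below uses only the standard library to build two uncompressed records and read them back, which makes that layout visible. The helper names, URLs and date are hypothetical. Real WARC files are usually gzip-compressed record by record and also contain `warcinfo` and `request` records. For real data, warcio's `ArchiveIterator` (used in the workshop notebook) handles all of this for you.

```python
import uuid
from io import BytesIO


def make_response_record(uri: str, body: bytes, date: str = "2017-05-31T15:00:00Z") -> bytes:
    """Hypothetical helper: build one uncompressed WARC/1.0 'response' record."""
    # The record's content block is the raw HTTP response: status, headers, blank line, body.
    http = b"".join([
        b"HTTP/1.1 200 OK\r\n",
        b"Content-Type: text/html\r\n",
        b"Content-Length: %d\r\n" % len(body),
        b"\r\n",
        body,
    ])
    warc_headers = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"WARC-Date: {date}\r\n"
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>\r\n"
        "Content-Type: application/http; msgtype=response\r\n"
        f"Content-Length: {len(http)}\r\n"
        "\r\n"
    ).encode("utf-8")
    # Every record is terminated by two CRLFs after its content block.
    return warc_headers + http + b"\r\n\r\n"


def iter_warc_records(stream):
    """Yield (headers, block) pairs from an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # tolerate stray blank lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line.strip() == b"":  # blank line (or EOF) ends the header section
                break
            # Split on the first colon only; values such as dates contain colons.
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        block = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # skip the CRLF CRLF record terminator
        yield headers, block


def split_http(block: bytes):
    """Split an HTTP response block into (status line, payload)."""
    head, _, payload = block.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    return status_line, payload


if __name__ == "__main__":
    warc = BytesIO(
        make_response_record("http://example.com/", b"<html>hello</html>")
        + make_response_record("http://example.com/about", b"<html>about</html>")
    )
    for headers, block in iter_warc_records(warc):
        status, payload = split_http(block)
        print(headers["WARC-Type"], headers["WARC-Target-URI"], status, payload)
```

Opening a small WARC file from Webrecorder or the Internet Archive in a text editor (after decompressing it) shows this same layout, one record after another.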