Working with Web Archives Workshop

What: A workshop at the Society for Textual Scholarship 2017
Where: University of Maryland, College Park
When: May 31, 11-1PM
How: Register, but be sure to select "Pre-conference only" if you only want to attend this or any of the other free workshops.


The web is popularly imagined as a ubiquitous, constantly changing cloud that resists fixity, the archive, and the demands of research. But our experience of using the web is the result of a discrete configuration of networks, protocols, software, and hardware. Tools and practices exist for fixing this seemingly ephemeral content and making it available for study.

In this workshop we will talk about how and when to use these tools in your work. We will get hands-on experience using web archiving services like the Internet Archive, Webrecorder and Hypothesis. We'll also take a peek at what web archives look like as data.

Pre-Workshop

Before you come to the workshop, please spend a little time jotting down brief notes about any projects you have done, or would like to do, that involve collecting content from the web.

Also, please read this short article, which we will discuss at the beginning of the workshop.

Discussion

  • How have you wanted to use web content in your own research?
  • What happened in the OKCupid data release, and what could have been done differently?
  • What are scraping and crawling, and how do they relate to archiving?
  • What is a robots.txt file?
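
As a starting point for the robots.txt question, here is a minimal sketch of how a crawler might check a site's rules using Python's standard `urllib.robotparser` module. The sample rules, the `can_fetch` helper, and the `my-crawler` user agent are assumptions made up for illustration; they do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.org/about.html",
                "https://example.org/private/data.html"):
        print(url, can_fetch(SAMPLE_ROBOTS_TXT, "my-crawler", url))
```

Note that robots.txt is a convention that crawlers choose to follow; it does not technically prevent access.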

Internet Archive

  • What are the Internet Archive and Archive-It?
  • How can you use the Internet Archive in your research?
  • How can you add web pages to the Internet Archive?
  • What are some alternatives to the Internet Archive?
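
For the questions about using and adding to the Internet Archive, the sketch below builds the three kinds of Wayback Machine URLs that are commonly used: an availability lookup, a replay link for a given timestamp, and a "Save Page Now" link. The helper names are hypothetical, and the code only constructs URLs; it makes no requests, so treat the URL patterns as assumptions to check against the Internet Archive's own documentation.

```python
from urllib.parse import urlencode


def availability_url(url, timestamp=None):
    """URL that asks the Wayback Machine for the closest snapshot of url."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp  # e.g. "20170531" (YYYYMMDD)
    return "https://archive.org/wayback/available?" + urlencode(params)


def replay_url(url, timestamp):
    """URL that replays the snapshot of url closest to timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


def save_url(url):
    """URL that asks the Wayback Machine to capture url now."""
    return f"https://web.archive.org/save/{url}"


if __name__ == "__main__":
    print(availability_url("example.org", "20170531"))
    print(replay_url("http://example.org/", "20170531"))
    print(save_url("http://example.org/"))
```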

Webrecorder

  • What is Webrecorder?
  • What does it mean to record the web?
  • How can you use Webrecorder in your research?
  • How can you download and replay your collections with WebrecorderPlayer?

WARC Data

  • What is WARC data?
  • Dissect a WARC file and see what is inside.
  • Example of processing WARC data in a Jupyter Notebook with warcio
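
To give a feel for what "dissecting" a WARC file involves, here is a rough, standard-library-only sketch that reads a tiny hand-written record. It is not the warcio-based notebook used in the workshop: the sample record and the `read_records` helper are invented for illustration, and the parser assumes an uncompressed file with well-formed headers. Real WARC files are often gzip-compressed and are better read with a library such as warcio.

```python
import io

# Hypothetical captured HTTP response, used as the record's content block.
PAYLOAD = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"

# A hand-written WARC record: version line, named headers, blank line,
# content block of Content-Length bytes, then two CRLFs.
SAMPLE_WARC = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"WARC-Date: 2017-05-31T15:00:00Z\r\n"
    b"Content-Length: " + str(len(PAYLOAD)).encode("ascii") + b"\r\n"
    b"\r\n" + PAYLOAD + b"\r\n\r\n"
)


def read_records(stream):
    """Yield (version, headers, block) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines separate one record from the next
        version = line.strip().decode("utf-8")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length says exactly how many bytes of content follow.
        block = stream.read(int(headers["Content-Length"]))
        yield version, headers, block


if __name__ == "__main__":
    for version, headers, block in read_records(io.BytesIO(SAMPLE_WARC)):
        print(version, headers["WARC-Type"], headers["WARC-Target-URI"])
        print(block.decode("utf-8"))
```

The content block of a `response` record is itself a full HTTP response (status line, headers, body), which is what lets replay tools reconstruct the page later.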

Hypothesis

  • What is hypothes.is?
  • What does it mean to annotate the web?
  • How can you share your annotations?

Social Media Archives

If there's time:

  • How can you download your social media archive from Twitter or Facebook?
  • What does the archive consist of, and how does it work?
  • What are the potentials or implications for research and the archive?

Wrap Up

  • Overall questions/comments
  • Ways to learn more and stay in touch

Presenters