Tim Sherratt

Sharing recent updates and work-in-progress

May 2021

Web archives section of GLAM Workbench updated!

My program of rolling out new features and integrations across the GLAM Workbench continues. The latest section to be updated is the Web Archives section!

There are no new notebooks with this update, but there are some important changes under the hood. If you haven’t used it before, the Web Archives section contains 16 notebooks providing documentation, tools, apps, and examples to help you make use of web archives in your research. The notebooks are grouped by the following topics: Types of data, Harvesting data and creating datasets, and Exploring change over time.

I’ve updated all the Python packages used in this repository and changed the app-ified notebooks to run using Voila (which is better integrated with Jupyter Lab than Appmode). But most importantly, you can now install the repository into your own persistent environment using Reclaim Cloud or Docker.

As Christie Moffatt noted recently, harvesting data from web archives can take a long time, and you might hit the limits of the free Binder service. These new integrations mean you don’t have to worry about your notebooks timing out. Just click on the Launch on Reclaim Cloud button and you can have your own fully provisioned, persistent environment up and running in minutes!

This is possible because every change to the Web Archives repository now triggers the build of a new Docker image with all the software that you need pre-installed. You can also run this Docker image on your own computer, or using another cloud service.
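
As a rough sketch of what running the image on your own computer might look like, the Python snippet below assembles a `docker run` command. The image name, the port numbers, and the `/data` mount point are all placeholders I've assumed for illustration, not the real values for the Web Archives image, so check the section's documentation for the actual command.

```python
import shlex


def docker_run_command(image, host_port=8888, container_port=8888,
                       data_dir=None, container_dir="/data"):
    """Build the argument list for a `docker run` call.

    All defaults here are assumptions for illustration only.
    """
    # --rm removes the container when it exits; -p maps host port to container port.
    cmd = ["docker", "run", "--rm", "-p", f"{host_port}:{container_port}"]
    if data_dir is not None:
        # Mount a local folder so harvested data is kept after the container stops.
        # The container-side path is a hypothetical placeholder.
        cmd += ["-v", f"{data_dir}:{container_dir}"]
    cmd.append(image)
    return cmd


# Hypothetical image name, used only to show the shape of the command.
command = docker_run_command("example/web-archives-notebooks:latest")
print(shlex.join(command))
```

To actually start the container you could pass the list to `subprocess.run(command)`, assuming Docker is installed locally.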

The Web Archives section now includes documentation on running the notebooks using Binder, Reclaim Cloud, or Docker. #dhhacks