Minor change to Reclaim Cloud config

When the 1-click installer for Reclaim Cloud works its magic and turns GLAM Workbench repositories into your own, personal digital labs, it creates a new work directory mounted inside your main Jupyter directory. This new directory is independent of the Docker image used to run Jupyter, so it’s a handy place to copy things if you ever want to update the Docker image. However, I just realised that there was a permissions problem with the work directory which meant you couldn’t write files to it from within Jupyter.

To fix the problem, I’ve added an extra line to the reclaim-manifest.jps config file to make the Jupyter user the owner of the work directory:

	- cmd[cp]: chown -R jovyan:jovyan /home/jovyan/work

This takes care of any new installations. If you have an existing installation, you can either just create a completely new environment using the updated config, or you can manually change the permissions:

  • Hover over the name of your environment in the control panel to display the option buttons.
  • Click on the Settings button. A new box will open at the bottom of the control panel with all the settings options.
  • Click on ‘SSH Access’ in the left-hand menu of the settings box.
  • Click on the ‘SSH Connection’ tab.
  • Under ‘Web SSH’ click on the Connect button and select the default node.
  • A terminal session will open. At the command line enter the following:

    	chown -R jovyan:jovyan /home/jovyan/work

Done! See the Using Reclaim Cloud section of the GLAM Workbench for more information.

Trove Query Parser

Here’s a new little Python package that you might find useful. It simply takes a search URL from Trove’s Newspapers & Gazettes category and converts it into a set of parameters that you can use to request data from the Trove API. While some parameters are used in both the web interface and the API, there are a lot of variations – this package means you don’t have to keep track of all the differences!

It’s very simple to use.

How to use the Trove Query Parser.

The code for the parser has been basically lifted from the Trove Newspaper Harvester. I wanted to separate it out so that I could use it at various spots in the GLAM Workbench and in other projects.
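The basic idea is easy to sketch. Here’s a simplified illustration of the kind of conversion the parser performs – not the package’s actual code, and the parameter mappings shown are examples only:

```python
from urllib.parse import urlparse, parse_qs

# A simplified sketch of what a query parser like this does: take a Trove
# web interface URL and translate its parameters into API equivalents.
# The mapping below covers just a few illustrative cases -- the real
# package handles many more.
PARAM_MAP = {
    "keyword": "q",          # search terms become the API's 'q' parameter
    "l-decade": "l-decade",  # some parameters pass through unchanged
    "l-state": "l-state",
}

def parse_search_url(url):
    """Convert a Trove newspapers search URL into a dict of API parameters."""
    query = parse_qs(urlparse(url).query)
    api_params = {}
    for key, values in query.items():
        if key in PARAM_MAP:
            # Keep a list if a parameter is repeated, otherwise a single value
            api_params[PARAM_MAP[key]] = values if len(values) > 1 else values[0]
    return api_params

params = parse_search_url(
    "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge&l-decade=190"
)
print(params)  # {'q': 'wragge', 'l-decade': '190'}
```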

This package, the documentation, and the tests were all created using nbdev, which is really quite a fun way to develop Python packages. #dhhacks

Some GLAM Workbench stats

I deliberately don’t keep any stats about GLAM Workbench visits, because I think they’re pretty meaningless. On the other hand, I’m always interested to see how often GLAM Workbench repositories are launched on Binder. Rather than just random clicks, these numbers represent the number of times users started new computing sessions using the GLAM Workbench. I just compiled these stats for the past year, and I was very pleased to see that the Web Archives section has been launched over 1,000 times in the past twelve months! The Trove Newspapers and Trove Newspaper Harvester repositories are also well used – on average these are both being launched more than once a day.

Binder launches of GLAM Workbench repositories, 1 June 2020 to 2 June 2021.

The GLAM Workbench is never going to attract massive numbers of users – it’s all about being there when a researcher needs help to use GLAM collections. One or two launches per day means one or two researchers from somewhere around the world are able to explore new datasets, or ask new questions. I think that’s pretty important.

More Reclaim Cloud integrations!

Five of the GLAM Workbench repositories now have automatically built Docker images and 1-click integration with Reclaim Cloud: ANU Archives, Trove Newspapers, Trove Newspaper Harvester, NAA RecordSearch, & Web Archives.

This means you can launch your very own version of these GLAM Workbench repositories in the cloud, where all your downloads and experiments will be saved! Find out more on the Using Reclaim Cloud page.

Get your GLAM datasets here!

I’ve updated my harvest of Australian GLAM datasets from state/national government open data portals. There are now 387 datasets, containing 1049 files (including 684 CSVs). There’s a list if you want to browse, and a CSV file if you want to download all the metadata. For more information see the data portals section of the GLAM Workbench.

Number of datasets by institution

If you’re interested in finding out what’s inside all those 684 CSV files, take the GLAM CSV Explorer for a spin! It’s also been given a refresh, with new data and a new interface. #dhhacks

NAA RecordSearch section of the GLAM Workbench updated!

If you work with the collections of the National Archives of Australia, you might find the RecordSearch section of the GLAM Workbench helpful. I’ve just updated the repository to add new options for running the notebooks, including 1-click installation on Reclaim Cloud. There are also a few new notebooks.

New notebooks and datasets


I’ve started (but not completed) updating all the notebooks in this repository to use my new RecordSearch Data Scraper. The new scraper is simpler and more efficient, and enables me to get rid of a lot of boilerplate code. Updated notebooks include:

Other updates include:

  • Python packages updated
  • Integration with Reclaim Cloud allowing 1-click installation of the whole repository and environment
  • Automatic creation of Docker images when the repository is updated
  • Updated README and repository index with list of all notebooks
  • Notebooks intended to run as apps now use Voila rather than Appmode for better integration with Jupyter Lab
  • requirements-unpinned.txt added to repository for people who want to develop the notebooks in their own clean environment

Hope you find these changes useful! #dhhacks

Web archives section of GLAM Workbench updated!

My program of rolling out new features and integrations across the GLAM Workbench continues. The latest section to be updated is the Web Archives section!

There are no new notebooks with this update, but some important changes under the hood. If you haven’t used it before, the Web Archives section contains 16 notebooks providing documentation, tools, apps, and examples to help you make use of web archives in your research. The notebooks are grouped by the following topics: Types of data, Harvesting data and creating datasets, and Exploring change over time.

I’ve updated all the Python packages used in this repository and changed the app-ified notebooks to run using Voila (which is better integrated with Jupyter Lab than Appmode). But most importantly, you can now install the repository into your own persistent environment using Reclaim Cloud or Docker.

As Christie Moffatt noted recently, harvesting data from web archives can take a long time, and you might hit the limits of the free Binder service. These new integrations mean you don’t have to worry about your notebooks timing out. Just click on the Launch on Reclaim Cloud button and you can have your own fully-provisioned, persistent environment up and running in minutes!

This is possible because every change to the Web Archives repository now triggers the build of a new Docker image with all the software that you need pre-installed. You can also run this Docker image on your own computer, or using another cloud service.

The Web Archives section now includes documentation on running the notebooks using Binder, Reclaim Cloud, or Docker. #dhhacks

Using web archives to find out when newspapers were added to Trove

There’s no doubt that Trove’s digitised newspapers have had a significant impact on the practice of history in Australia. But analysing that impact is difficult when Trove itself is always changing – more newspapers and articles are being added all the time.

In an attempt to chart the development of Trove, I’ve created a dataset that shows (approximately) when particular newspaper titles were first added. This gives a rough snapshot of what Trove contained at any point in the last 12 years.

I say approximately because the only public sources of this information are web archives like the Internet Archive’s Wayback Machine, and Trove itself. By downloading captures of Trove’s browse page, I was able to extract a list of the newspaper titles available when each capture was made. Depending on the frequency of captures, the titles may have been first made available some time earlier.

The method I used to create the dataset is documented in the Trove Newspapers section of the GLAM Workbench. I used the Internet Archive as my source rather than Trove just because there were more captures available. Most of the code I could conveniently copy from the Web Archives section of the GLAM Workbench, in particular the Find all the archived versions of a particular web page notebook.
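For anyone curious about the mechanics: the Wayback Machine’s CDX API lists all captures of a given URL, returning (with output=json) a header row followed by data rows. Here’s a minimal sketch of turning that response into usable records – the parsing helper and sample rows are my own illustration, not the notebook’s actual code:

```python
# The Wayback Machine's CDX API lists captures of a given page, e.g.:
#   http://web.archive.org/cdx/search/cdx?url=trove.nla.gov.au/newspaper/about&output=json
# With output=json it returns a list of rows, the first row being the field names.

def cdx_to_records(rows):
    """Convert CDX JSON rows (header row + data rows) into a list of dicts."""
    header = rows[0]
    return [dict(zip(header, row)) for row in rows[1:]]

# Sample data in the shape the CDX API returns (a subset of the usual fields)
sample = [
    ["urlkey", "timestamp", "original", "statuscode"],
    ["au,gov,nla,trove)/newspaper/about", "20090501000000",
     "http://trove.nla.gov.au/newspaper/about", "200"],
    ["au,gov,nla,trove)/newspaper/about", "20100315120000",
     "http://trove.nla.gov.au/newspaper/about", "200"],
]

records = cdx_to_records(sample)
print(records[0]["timestamp"])  # 20090501000000
```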

The result was actually two datasets:

There’s also an alphabetical list of newspaper titles for easy browsing. The list shows the date of the capture in which the title was first recorded, as well as any changes to its date range. #dhhacks

GLAM Jupyter Resources

To make it easier for people to suggest additions, I’ve created a GitHub repository for my list of GLAM Jupyter examples and resources. Contributions are welcome!

This list is automatically pulled into the GLAM Workbench’s help documentation. #dhhacks

Running notebooks – a sign of things to come in the GLAM Workbench

I recently made some changes in the GLAM Workbench’s Help documentation, adding a new Running notebooks section. This section provides detailed information on running and managing GLAM Workbench repositories using Reclaim Cloud and Docker.

I’m still rolling out this functionality across all the repositories, but it’s going to take a while. When I’m finished you’ll be able to create your own persistent environment on Reclaim Cloud from any repository with just the click of a button. See the Trove Newspapers section to try this out now! #dhhacks

Sponsor my work on GitHub!

As I foreshadowed some weeks ago, I’ve shut down my Patreon page. Thanks to everyone who has supported me there over the last few years!

I’ve now shifted across to GitHub Sponsors, which is focused on supporting open source projects. This seems like a much better fit for the things that I do, which are all free and open by default.

So if you think things like the GLAM Workbench, Historic Hansard, OzGLAM Help, and The Real Face of White Australia are worth supporting, you can sign up using my GitHub Sponsors page. Sponsorship tiers start at just $1 a month. Financially, your contributions help pay some of my cloud hosting bills and keep everything online. But just as important is the encouragement and motivation I get from knowing that there are people out there who think this work is important and useful.

To recognise my GitHub sponsors, I’ve also created a new Supporters page in the GLAM Workbench.


Updates to the Trove Newspapers section of GLAM Workbench

I’ve updated, refreshed, and reorganised the Trove newspapers section of the GLAM Workbench. There are currently 22 Jupyter notebooks organised under the following headings:

  • Trove newspapers in context – Notebooks in this section look at the Trove newspaper corpus as a whole, to try and understand what’s there, and what’s not.
  • Visualising searches – Notebooks in this section demonstrate some ways of visualising searches in Trove newspapers – seeing everything rather than just a list of search results.
  • Useful tools – Notebooks in this section provide useful tools that extend or enhance the Trove web interface and API.
  • Tips and tricks – Notebooks in this section provide some useful hints to use with the Trove API.
  • Get creative – Notebooks in this section look at ways you can use data from Trove newspapers in creative ways.

There’s also a number of pre-harvested datasets.

Recently refreshed analyses, visualisations, and datasets include:

As part of the update, notebooks that are intended to run as apps (with all the code hidden) have been updated to use Voila. But perhaps the thing I’m most excited about are the new options for running the notebooks. As well as being able to launch the notebooks on Binder, you can now create your very own, persistent environment on Reclaim Cloud with just a click of a button.

There’s also an automatically-built Docker image of this repository, containing everything you need to run the notebooks on your own computer. Check out the new Run these notebooks section for details. I’m gradually rolling this out across all the repositories in the GLAM Workbench. #dhhacks

Introducing the new, improved RecordSearch Data Scraper!

It was way back in 2009 that I created my first scraper for getting machine-readable data out of the National Archives of Australia’s online database, RecordSearch. Since then I’ve used versions of this scraper in a number of different projects such as The Real Face of White Australia, Closed Access, and Redacted (including the recent update). The scraper is also embedded in many of the notebooks that I’ve created for the RecordSearch section of the GLAM Workbench.

However, the scraper was showing its age. The main problem was that one of its dependencies, Robobrowser, is no longer maintained. This made it difficult to update. I’d put off a major rewrite, thinking that RecordSearch itself might be getting a much-needed overhaul, but I could wait no longer. Introducing the brand new RecordSearch Data Scraper.

Just like the old version, the new scraper delivers machine-readable data relating to Items, Series and Agencies – both from individual records, and search results. It also adds a little extra to the basic metadata, for example, if an Item is digitised, the data includes the number of pages in the file. Series records can include the number of digitised files, and the breakdown of files by access category.

The new scraper adds some additional search parameters for Series and Agencies. It also uses a simple caching system to improve speed and efficiency. RecordSearch makes use of an odd assortment of sessions, redirects, and hidden forms, which makes scraping a challenge. Hopefully I’ve nailed down the idiosyncrasies, but I expect to be catching bugs for a while.
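The caching idea itself is simple enough to sketch with an in-memory cache – a toy illustration of the principle, not the scraper’s actual code (the item id and record fields below are invented):

```python
import functools

# A minimal sketch of response caching: repeated requests for the same
# record return the cached result instead of hitting RecordSearch again.
# (The real scraper's caching is more sophisticated -- this just shows the idea.)

call_count = {"n": 0}

@functools.lru_cache(maxsize=None)
def get_item(item_id):
    """Pretend to fetch an item record; cached, so repeat calls are free."""
    call_count["n"] += 1  # track how many 'requests' actually happen
    return {"item_id": item_id, "title": f"Record {item_id}"}

get_item("3445411")
get_item("3445411")  # served from the cache, no second request
print(call_count["n"])  # 1
```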

I created the new scraper in Jupyter using nbdev. nbdev helps you to keep your code, examples, tests, and documentation all together in Jupyter notebooks. When you’re ready, it converts the code from the notebooks into distributable Python libraries, runs all your tests, and builds a documentation site. It’s very cool.

Having updated the scraper, I now need to update the notebooks in the GLAM Workbench – more on that soon. The maintenance never ends! #dhhacks

Recently digitised files in the National Archives of Australia

I’m interested in understanding what gets digitised and when by our cultural institutions, but accessible data is scarce. The National Archives of Australia lists ‘newly scanned’ records in RecordSearch, so I thought I’d see if I could convert that list into a machine-readable form for analysis. I’ve had a lot of experience trying to get data out of RecordSearch, but even so it took me a while to figure out how the ‘newly scanned’ page worked. Eventually I was able to extract all the file metadata from the list and save it to a CSV file. The details are in this notebook in the GLAM Workbench.

I used the code to create a dataset of all the files digitised in the past month. The ‘newly scanned’ list only displays a month’s worth of additions, so that’s as much as I could get in one hit. In the past month, 24,039 files were digitised. 22,500 of these (about 93%) come from just four series of military records. This is no surprise, as the NAA is currently undertaking a major project to digitise WW2 service records. What is perhaps more interesting is the long tail of series from which a small number of files were digitised. 357 of the 375 series represented in the dataset (about 95%) appear 20 or fewer times. 210 series have had only one file digitised in the last month. I’m assuming that this diversity represents research interests, refracted through the digitisation on demand service. But this really needs more data, and more analysis.
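Tallies like these are straightforward to compute once you have the file metadata – counting occurrences of each series, for example (illustrative code with made-up sample rows, not the real dataset):

```python
from collections import Counter

# Count how many newly digitised files belong to each series, then look
# at the 'long tail' of series with only a few digitised files.
# (Sample data only -- the real dataset has metadata for thousands of files.)
files = [
    {"series": "B883"}, {"series": "B883"}, {"series": "B883"},
    {"series": "A1"}, {"series": "A1"},
    {"series": "SP42/1"},
]

series_counts = Counter(f["series"] for f in files)

# Series with only a couple of digitised files -- the 'long tail'
long_tail = [s for s, n in series_counts.items() if n <= 2]

print(series_counts.most_common(1))  # [('B883', 3)]
print(sorted(long_tail))             # ['A1', 'SP42/1']
```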

As I mentioned, only one month’s data is available from RecordSearch at any time. To try and capture a longer record of the digitisation process, I’ve set up an automated ‘git scraper’ that runs every Sunday and captures metadata of all the files digitised in the preceding week. The weekly datasets are saved as CSV files in a public GitHub repository. Over time, this should become a useful dataset for exploring long-term patterns in digitisation. #dhhacks

Moving on from Patreon...

Over the last few years, I’ve been very grateful for the support of my Patreon subscribers. Financially, their contributions have helped me cover a substantial proportion of the cloud hosting costs associated with projects like Historic Hansard and The Real Face of White Australia. But, more importantly, just knowing that they thought my work was of value has helped keep me going, and inspired me to develop a range of new resources.

However, while I’ve been grateful for the platform provided by Patreon, I’ve increasingly felt that it’s not a good fit for the sort of work I do. Patreon is geared towards providing special content to supporters, but, as you know, all my work is open. And that’s really important to me.

Recently GitHub opened up its own sponsorship program for the development of open source software. This program seems to align more closely with what I do. I already share and manage my code through GitHub, so integrating sponsorship seems to make a lot of sense. It’s worth noting too, that, unlike Patreon, GitHub charges no fees and takes no cut of your contributions. As a result I’ve decided to close my Patreon account by the end of April, and create a GitHub sponsors page.

What does this mean for you?

If you’re a Patreon subscriber and you’d like to keep supporting me, you should cancel your Patreon contribution, then head over to my brand new GitHub sponsors page and sign up! Thanks for your continued support!

If you’d prefer to let your contributions lapse, just do nothing. Your payments will stop when I close the account at the end of April. I understand that circumstances change – thank you so much for your support over the years, and I hope you will continue to make use of the things I create.

If you make use of any of my tools or resources and would like to support their continued development, please think about becoming a sponsor. For a sample of the sorts of things I’ve been working on lately, see my updates feed.

The future!

I’m very excited about the possibilities ahead. The GLAM Workbench has received a lot of attention around the world (including a Research Award from the British Library Labs), and I’m planning some major developments over coming months. And, of course, I won’t forget all my other resources – I spent a lot of time in 2020 migrating databases and platforms to keep everything chugging along.

On my GitHub sponsors page, I’ve set an initial target of 50 sponsors. That might be ambitious, but as I said above, it’s not just about money. Being able to point to a group of people who use and value this work will help me argue for new ways of enabling digital research in the humanities. So please help me spread the word – let’s make things together!

What can you do with the GLAM Workbench?

You might have noticed some changes to the GLAM Workbench home page recently. One of the difficulties has always been trying to explain what the GLAM Workbench actually is, so I thought it might be useful to put more examples up front. The home page now lists about 25 notebooks under the headings:

Hopefully they give a decent representation of the sorts of things you can do using the GLAM Workbench. I’ve also included a little rotating slideshow built using Slides.com.

Other recent additions include a new Grants and Awards page. #dhhacks

Reclaim Cloud integration coming soon to the GLAM Workbench

I’ve been doing a bit of work behind the scenes lately to prepare for a major update to the GLAM Workbench. My plan is to provide one click installation of any of the GLAM Workbench repositories on the Reclaim Cloud platform. This will provide a useful step up from Binder for any researcher who wants to do large-scale or sustained work using the GLAM Workbench. Reclaim Cloud is a paid service, but they do a great job supporting digital scholarship in the humanities, and it’s fairly easy to minimise your costs by shutting down environments when they’re not in use.

I’ve still got a lot of work to do to roll this out across the GLAM Workbench’s 40 repositories, but if you’d like a preview head to the Trove Newspaper and Gazette Harvester repository on GitHub. Get yourself a Reclaim Cloud account and click on the Launch on Reclaim Cloud button. It’s that easy!

There are some technical notes in the Reclaim Hosting forum, and a post by Reclaim Hosting guru Jim Groom describing his own experience spinning up the GLAM Workbench.

Watch this space for more news! #dhhacks

Some recent GLAM Workbench presentations

I’ve given a couple of talks lately on the GLAM Workbench and some of my other work relating to the construction of online access to GLAM collections. Videos and slides are available for both:

  • From collections as data to collections as infrastructure: Building the GLAM Workbench, seminar for the Centre for Creative and Cultural Research, University of Canberra, 22 February 2021 – video (40 minutes) and slides

I’ve also updated the presentations page in the GLAM Workbench. #dhhacks

Some GLAM Workbench datasets to explore for Open Data Day

It was Open Data Day on Saturday 6 March – here are some of the ready-to-go datasets you can find in the GLAM Workbench – there’s something for historians, humanities researchers, teachers & more!

And if that’s not enough data, the GLAM Workbench provides tools to help you create your own datasets from Trove, the National Archives of Australia, the National Museum of Australia, Archives NZ, DigitalNZ, & more! #dhhacks

The NAA recently changed field labels in RecordSearch, so that ‘Barcode’ is now ‘Item ID’. This required an update to my recordsearch_tools screen scraper. I also had to make a few changes in the RecordSearch section of the GLAM Workbench. #dhhacks

New! DigitalNZ API Query Builder added to GLAM Workbench

I’ve added an API Query Builder to the DigitalNZ section of the GLAM Workbench. You can use it to learn about the different parameters available from the search API, and experiment with different queries. Just get your API key from DigitalNZ, then try entering keywords and selecting options. Once you understand how the API works, you can start thinking about how you can make use of it in your own projects.

👉🏻 Try it out live on Binder!

Under the hood the API Query Builder is a Jupyter notebook (of course), but it uses ipyvuetify to create good-looking, responsive form widgets. It’s intended to be run using Voilà, which turns notebooks into interactive apps and dashboards. You can now run any Jupyter notebook using Voilà on Binder, just by changing the URL.
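The URL change works by pointing Binder’s urlpath parameter at Voilà’s renderer. A sketch of constructing such a URL – the repository and notebook names here are just placeholders:

```python
from urllib.parse import quote

# Binder can open a notebook through Voila by setting the 'urlpath'
# query parameter to voila/render/<notebook>. The org, repo, branch,
# and notebook names below are placeholders for illustration.
def binder_voila_url(org, repo, branch, notebook):
    urlpath = quote(f"voila/render/{notebook}", safe="")
    return f"https://mybinder.org/v2/gh/{org}/{repo}/{branch}?urlpath={urlpath}"

url = binder_voila_url("GLAM-Workbench", "digitalnz", "master", "api_query_builder.ipynb")
print(url)
# https://mybinder.org/v2/gh/GLAM-Workbench/digitalnz/master?urlpath=voila%2Frender%2Fapi_query_builder.ipynb
```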

If this app seems useful (let me know!) I might put a version on Heroku so the start up time is reduced. I’m also thinking of using this sort of pattern to create apps for other APIs in the GLAM Workbench. #dhhacks

OpenGLAM fireworks! Finding open collections in DigitalNZ

Lately I’ve been updating and expanding the notebooks in the DigitalNZ section of the GLAM Workbench. In particular, I’ve been looking at the usage facet to understand how much of the aggregated content is ‘open’. What do I mean by ‘open’? The Open Knowledge Foundation definition states that ‘open data and content can be freely used, modified, and shared by anyone for any purpose’. Obviously things that are in the public domain, such as out-of-copyright resources, are open. But so are resources with an open licence such as CC-BY or CC-BY-SA. The Creative Commons ‘Non commercial’ and ‘No derivatives’ licences are not open because they put limits on how you can use resources.

How does this definition map to DigitalNZ? The usage facet includes five values:

  • Share
  • Modify
  • Use commercially
  • All rights reserved
  • Unknown

These values have been assigned by DigitalNZ based on the 35,000 different rights statements and 30 different copyright statements that are included in DigitalNZ metadata records. I find I have to turn the usage values inside out to really understand them. A resource that only allows you to ‘Share’ excludes the ‘Modify’ and ‘Use commercially’ permissions, and so is sort of equivalent to a CC-BY-NC-ND licence. The only open value, according to the definition above, is ‘Use commercially’, which is like CC-BY. I’m assuming that ‘Use commercially’ has been assigned to resources that are either out of copyright (or have no known copyright restrictions) or are openly licensed.

It’s also worth noting that the ‘usage’ values are not mutually-exclusive. A record with a ‘usage’ value of ‘Use commercially’, will also be assigned ‘Share’ and ‘Modify’ values. This is because ‘Use commercially’ includes the ‘Share’ and ‘Modify’ permissions. This seems a bit counter-intuitive, but makes sense if you think about doing a search for everything you’re allowed to share.

A rough calculation based on the usage facet indicates that 71.76% of the resources aggregated by DigitalNZ are open. That seems pretty good, though a lot of those are probably out-of-copyright newspaper articles from Papers Past. For a more fine-grained analysis, I decided to look at the ‘usage’ data for each combination of ‘content_partner’ and ‘primary_collection’. How open is each individual collection in DigitalNZ?
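That rough calculation is just the proportion of records carrying the ‘Use commercially’ value. With facet counts in hand it looks something like this (the numbers below are invented examples, not DigitalNZ data):

```python
# Given usage facet counts from a DigitalNZ search, estimate how 'open' a
# collection is: the only usage value that satisfies the open definition
# is 'Use commercially'. Because usage values aren't mutually exclusive,
# we divide by the total number of records, not the sum of facet counts.
def percent_open(facet_counts, total):
    """Percentage of records whose usage allows commercial use."""
    return round(100 * facet_counts.get("Use commercially", 0) / total, 2)

# Invented example counts for a collection of 1000 records
facets = {"Share": 800, "Modify": 750, "Use commercially": 700, "All rights reserved": 200}
print(percent_open(facets, 1000))  # 70.0
```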

For added excitement, and to stretch my knowledge of what Altair can do, I decided to visualise the results as a display of colourful fireworks. The higher the explosion, the more open the collection! I’m pretty pleased with the result.

I’ve saved an HTML version of the chart so you can mouseover the explosions for more details. All the code is included in this notebook, along with a CSV file containing all the harvested facet data. #dhhacks

New dataset and notebooks – twenty years of ABC Radio National

There’s a new GLAM Workbench section for working with data from Trove’s Music & Sound zone!

Inside you’ll find out how to harvest all the metadata from ABC Radio National program records – that’s 400,000+ records, from 160 Radio National programs, over more than 20 years.

It’s metadata only, so not full transcripts or audio, though there are links back to the ABC site where you might find transcripts. Most records should at least have a title, a date, the name of the program it was broadcast on, a list of contributors, and perhaps a brief abstract/summary. It’s also worth noting that many of these records, particularly those from the main current affairs programs, represent individual stories or segments – so they provide a detailed record of the major news stories for the last couple of decades!

The harvesting notebook shows you how to get the data from the Trove API. There are a number of duplicate records, and some inconsistencies in the way the data is formatted, so the harvesting code tries to clean things up a bit. You can of course adjust this to meet your own needs.
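If you’re processing the harvested data yourself, the JSONL format is easy to work with line by line. A sketch of loading records and dropping duplicates – the field names here are assumptions for illustration, not necessarily the dataset’s actual fields:

```python
import json

# Each line of a JSONL file is a separate JSON record. The field names
# used here ('id', 'title') are assumptions for illustration.
jsonl_data = """\
{"id": "123", "title": "Interview with a historian"}
{"id": "124", "title": "Science Show segment"}
{"id": "123", "title": "Interview with a historian"}
"""

def load_unique(lines):
    """Parse JSONL lines, keeping only the first record for each id."""
    seen = {}
    for line in lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        seen.setdefault(record["id"], record)  # first occurrence wins
    return list(seen.values())

records = load_unique(jsonl_data)
print(len(records))  # 2 -- the duplicate has been dropped
```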

If you don’t want to do the harvesting yourself, there’s pre-harvested datasets that you can download immediately from Cloudstor and start exploring. The complete harvest of all 400,000+ records is available both in JSONL (newline separated JSON) and CSV formats. There’s also a series of separate datasets for the most frequently occurring programs: RN Breakfast, RN Drive, AM, PM, The World Today, Late Night Live, Life Matters, and the Science Show.

There’s also a notebook that demonstrates a few possible ways you might start to play with the data – looking at the range of programs, the distribution of records over time, the people involved in each story, and words in the titles of each segment.

This is a very rich source of data for examining Australia’s political and social history over the last twenty years. Dive in and see what you can find! #dhhacks

GLAM Workbench wins British Library Labs Research Award!

Asking questions with web archives – introductory notebooks for historians has won the British Library Labs Research Award for 2020. The awards recognise ‘exceptional projects that have used the Library’s digital collections and data’.

This project gave me a chance to work with web archives collections and staff from the British Library, the National Library of Australia, and the National Library of New Zealand, and was supported by the International Internet Preservation Consortium’s Discretionary Funding Program.

We developed a range of tools, examples, and documentation to help researchers use and explore the vast historical resources available through web archives. A new web archives section was added to the GLAM Workbench, and 16 Jupyter notebooks, combining text, images, and live code, were created.

Here’s a 30 second summary of the project!

The judges noted:

“The panel were impressed with the level of documentation and thought that went into how to work computationally through Jupyter notebooks with web archives which are challenging to work with because of their size. These tools were some of the first of their kind.

“The Labs Advisory Board wanted to acknowledge and reward the incredible work of Tim Sherratt in particular. Tim you have been a pioneer as a one-person lab over many years and these 16 notebooks are a fine addition to your already extensive suite in your GLAM Workbench. Your work has inspired so many in GLAM, the humanities community, and BL Labs to develop their own notebooks. To our audience, we strongly recommend that you look at the GLAM Workbench if you’re interested in doing computational experiments with many institutions’ data sources.”

Thanks to Andy, Olga, Alex, and Ben for your advice and support. And thanks to the British Library Labs for the award! #dhhacks

The GLAM Workbench as research infrastructure (some basic stats)

Repositories in the GLAM Workbench have been launched on Binder 3,529 times since the start of this year (according to data from the Binder Events log). That’s repository launches, not notebooks. Having launched a repository, users might use multiple notebooks. And of course these stats don’t include people using the notebooks in contexts other than Binder – on their own machines, servers, or services like AARNet’s SWAN. Or just viewing the notebooks in GitHub and copying code into their own projects.
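The Binder Events log records one launch per line as JSON, including a spec field identifying the repository. Counting launches for a set of repositories is then a few lines – the sample events below are invented, though the field names follow the public events archive:

```python
import json
from collections import Counter

# One launch event per line; 'spec' identifies the repository and git ref.
# Sample events are invented; field names follow the public events archive.
events = """\
{"timestamp": "2021-01-01T00:00:00", "provider": "GitHub", "spec": "GLAM-Workbench/webarchives/master", "status": "success"}
{"timestamp": "2021-01-01T01:00:00", "provider": "GitHub", "spec": "GLAM-Workbench/trove-newspapers/master", "status": "success"}
{"timestamp": "2021-01-01T02:00:00", "provider": "GitHub", "spec": "GLAM-Workbench/webarchives/master", "status": "success"}
"""

launches = Counter()
for line in events.splitlines():
    event = json.loads(line)
    # Drop the git ref, keeping just 'org/repo'
    org_repo = "/".join(event["spec"].split("/")[:2])
    if org_repo.startswith("GLAM-Workbench/"):
        launches[org_repo] += 1

print(launches.most_common(1))  # [('GLAM-Workbench/webarchives', 2)]
```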

I’m suspicious of web stats, but the Binder data indicates that people have actually done more than ‘visit’ – they’ve spun up a Binder session ready to do some exploration.

Every Jupyter notebook in the GLAM Workbench has a link that opens the notebook in Binder. If you click on the link, Binder reads configuration details from the repository and loads a customised computing environment. All in your browser! That means you can start using the GLAM Workbench without installing any software. Just click on the Binder link and start exploring!

There are about 40 different repositories in the GLAM Workbench, helping you work with data from Trove, DigitalNZ, NAA, SLNSW, NSW Archives, NMA, ArchivesNZ, ANU Archives & more! The image below shows them ranked by number of Binder launches this year.

The web archives section was added this year in collaboration with the IIPC, the UK Web Archive, the Australian Web Archive, and the NZ Web Archive. Its annual number of launches is inflated a bit by the development process. But there have been 426 launches since it went public in June.

I’m really pleased to see the Trove newspaper harvester up near the top. At least once a day (on average) someone’s been firing up the repository to grab Trove newspaper articles in bulk.

Overall, that’s about 11 GLAM Workbench repository launches a day on Binder. It might not seem like much, but that’s 11 research opportunities that didn’t exist before, 11 GLAM collections opened to exploration, 11 researchers building their digital skills…

As humanities researchers continue to learn of the possibilities of GLAM data and develop their digital skills the numbers will grow. It’s a start. And a reminder that not all research infrastructure needs to be built in Go8 unis, by large teams, with $millions. We can all contribute by sharing our tools and methods. #dhhacks