With poor old TroveNewsBot killed by the NLA, my Mastodon feed has had less GLAM goodness of late. To try and fill the void I’ve created PROVBot, sharing photos from the Public Record Office Victoria.
PROVBot makes use of the Public Record Office Victoria’s public API. At this stage it just selects and shares a random photograph once a day, but in the future I’ll probably add more features, such as the ability to respond to search queries.
After my Trove API keys were cancelled without warning on 21 February, I reluctantly agreed to a meeting with the National Library of Australia. They had provided so little information in their emails that a meeting seemed to be the only way to find out what was really going on. I came out of the meeting shocked by the NLA’s change in attitude towards API use.
TL;DR – you’re probably breaching the API terms of use
All Trove API users need to be aware that the NLA now insists that accessing the ‘content’ of resources, rather than just the descriptive metadata, is a breach of the API terms of use.
See my latest post for an update!
On Friday, without warning, I received an email from the National Library of Australia informing me that my Trove API keys had been suspended. This threatens the future of 15 years of work helping people use and understand the possibilities of Trove for new types of research.
What’s happened?
Here’s the full text of the email:
Your recently published work on the GLAM Workbench regarding extracting metadata and text from a National e-Deposit (NED) periodical has been brought to the Library’s attention.
I’ve created a new site (or in fact, renovated an old site) to aggregate news from GLAM collections (that’s galleries, libraries, archives, and museums) and help researchers using those collections. It’s called The Primary Source, which is a bit of a bad history pun.
Why is it needed?
Before the nazi takeover of the old bird site, I had a list of GLAM organisation accounts which made it pretty easy to follow what was going on in Australia’s galleries, libraries, archives, and museums.
Since March 2021, I’ve been harvesting details of newly-digitised files in the National Archives of Australia to help document long-term changes to online access. A few weeks ago, I summarised the data from 2024, and published annual compilations in Zenodo. I’ve now created an automatically-updated dashboard which displays digitisation progress in the past week, the current year, and since my harvests began.
Each week, after the latest data harvest, a GitHub action runs a Jupyter notebook that pulls in the data, generates some visualisations and summaries, and saves the results as an HTML page.
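The summarising step can be sketched roughly as follows. This is a minimal illustration, not the notebook behind the dashboard: the `date_digitised` field name, the sample rows, and the `summarise` and `render_html` helpers are all assumptions made up for the example.

```python
from collections import Counter
from datetime import date, timedelta
from html import escape


def summarise(rows, today):
    """Count digitised files in the past week, the current year, and overall."""
    week_start = today - timedelta(days=7)
    counts = Counter()
    for row in rows:
        # 'date_digitised' is a hypothetical field name for this sketch
        day = date.fromisoformat(row["date_digitised"])
        counts["all"] += 1
        if day.year == today.year:
            counts["year"] += 1
        if week_start < day <= today:
            counts["week"] += 1
    return counts


def render_html(counts, today):
    """Render the three counts as a very small HTML page."""
    labels = [
        ("week", "Past week"),
        ("year", f"{today.year} so far"),
        ("all", "Since harvests began"),
    ]
    table_rows = "\n".join(
        f"<tr><th>{escape(label)}</th><td>{counts[key]}</td></tr>"
        for key, label in labels
    )
    return f"<html><body><table>\n{table_rows}\n</table></body></html>"


# Made-up harvest rows, only to show the shape of the data
sample_rows = [
    {"date_digitised": "2025-03-01"},
    {"date_digitised": "2025-03-04"},
    {"date_digitised": "2025-01-15"},
    {"date_digitised": "2024-11-30"},
]
today = date(2025, 3, 5)
counts = summarise(sample_rows, today)
page = render_html(counts, today)
print(page)
```

In a scheduled job the resulting string would be written to a file and published, but that part depends on how the site is hosted.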
I’ve added a notebook to the GLAM Workbench that walks through the steps involved in creating a fully searchable database of content extracted from a periodical uploaded to Trove through the National eDeposit service (NED).
Why is this needed?
I was contacted recently by a member of the team that publishes The Triangle, a community newsletter from the south coast of NSW. Issues of The Triangle from 2007 to the present have been uploaded to Trove through the National eDeposit service, but they were wondering whether it was possible to search across all their newsletters in Trove.
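To give a sense of what searching across every issue at once can look like, here is a minimal sketch using SQLite's FTS5 full-text index from Python. It is not the notebook's code: the table layout, the `build_index` and `search` helpers, and the sample page text are all invented for illustration, and it assumes your Python's SQLite build includes FTS5 (most do).

```python
import sqlite3


def build_index(pages):
    """Load (issue, page, text) tuples into an in-memory full-text index."""
    db = sqlite3.connect(":memory:")
    # The page number is stored but not indexed for searching
    db.execute("CREATE VIRTUAL TABLE pages USING fts5(issue, page UNINDEXED, text)")
    db.executemany("INSERT INTO pages VALUES (?, ?, ?)", pages)
    return db


def search(db, query):
    """Return (issue, page, snippet) for every matching page, best match first."""
    return db.execute(
        "SELECT issue, page, snippet(pages, 2, '[', ']', '...', 8) "
        "FROM pages WHERE pages MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


# Made-up page text standing in for content extracted from each issue
sample_pages = [
    ("2007-03", 1, "The community market returns to the hall this month."),
    ("2015-08", 4, "Volunteers wanted for the bushfire brigade."),
    ("2021-11", 2, "Stallholders report a busy market over the long weekend."),
]
db = build_index(sample_pages)
results = search(db, "market")
for issue, page, snippet in results:
    print(issue, page, snippet)
```

One table holding a row per page keeps results tied to a specific issue and page, which makes it easy to link a hit back to its source.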
I’ve created a new dataset containing 10 years of data that can be used to explore the workings of the National Archives of Australia’s access examination system.
Australian government records become available for public access after 20 years. But before being opened to the public, records go through a process known as access examination to determine whether they should be withheld, either partially or completely. The grounds for exemption are laid out in the Archives Act and include things like national security and personal privacy.
The ARDC is holding an event on 18 February to begin shaping the next phase of the Community Data Lab. If you’re interested in the development of digital tools and resources to support HASS research, I’d suggest you go along.
I worked on the first phase of the Community Data Lab, developing the Trove Data Guide amongst other things. I’m very keen to see the CDL expand, working with researchers to create new possibilities for digital research, particularly using the rich collections of the GLAM sector (galleries, libraries, archives, and museums).
In 2024, the National Archives of Australia digitised 254,953 files (down from 416,602 in 2023). This chart shows the number of files digitised per day in 2024.
The decrease in the total number of files digitised is probably related to the completion of the NAA’s five year project to digitise Second World War service records. Thanks to $10 million in government funding, the NAA has digitised more than a million service records since 2019.
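For a sense of scale, the year-on-year change can be worked out from the two totals quoted above. This is just the arithmetic, using only the figures in this post.

```python
files_2023 = 416_602
files_2024 = 254_953

# Absolute and relative change between the two years
decrease = files_2023 - files_2024
percent = decrease / files_2023 * 100

print(f"{decrease:,} fewer files, a drop of {percent:.1f}%")
# prints: 161,649 fewer files, a drop of 38.8%
```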
Every Sunday I harvest information about the number of digitised newspaper articles in Trove. You can view the current results in the Trove Data Dashboard. By compiling all the data from 2024, you can find out what changed last year.
6,241,739 digitised newspaper articles were added to Trove in 2024. The rate of digitisation was pretty quick until the end of March when the processing of the Melbourne Sun ended, then things flattened out a bit.
@trovenewsbot has been around for more than eleven years now – originally sharing Trove newspaper articles on Twitter, and now on the Fediverse. But with the imminent closure of the botsin.space Mastodon instance, I’ve had to find it a new home. Say hello to the latest version: @trovenewsbot@wraggebots.net!
Instead of just moving the bot to an existing instance, I decided to set up my own using GoToSocial. I thought this would give me more control, and encourage me to resurrect some more of my old Twitter bots.
A couple of months ago I realised my big, searchable database of Tasmanian Post Office Directories was missing the volume from 1920. It took a bit of work to add it in, as described in this post. Unfortunately, I’d barely finished when I realised that a number of other years were also missing! Argh! The good news is that I’ve been steadily working through these missing volumes, adding one a week, and now I’m finally, finally finished!
Visualisation is a great way to find problems in your data.
As part of the Everyday Heritage project, I’m working with a team to document the lives of Tasmania’s Chinese residents in the 19th and early 20th centuries. We’re using a variety of sources such as Trove’s newspapers, the Tasmanian Names Index, and the Tasmanian Post Office Directories. To help with the research, I converted all the PDF volumes of the Post Office Directories into a public, online, searchable database.
The Trove newspapers section of the GLAM Workbench was updated last week. Over the last year I’ve been gradually updating notebooks to use version 3 of the Trove API, but when version 2 suddenly disappeared a couple of weeks ago I had to hurriedly pull everything together. The Trove newspapers section includes 23 notebooks and 6 datasets, so it’s not a small job. The changes include:
updated all notebooks to use version 3 of the Trove API
removed remaining datasets from the code repository and created dedicated data repositories for them, integrating them with Zenodo where appropriate
added metadata to all the notebooks – this is used to build an RO-Crate metadata file for the code repository
updated all the Python packages
added a voila.
It’s pretty obvious that access to digitised resources, like Trove’s newspapers, has changed the practice of history in Australia. But how? I’m certain that the historiographical implications of the growth and development of online collections will become a topic of increasing interest to historians, and that exploration of this topic will lead to important insights into the relationship between what we keep, what we value, and what we know. But for this to happen we need to have data documenting changes in online collections.
I was recently contacted by a researcher who wanted to be able to automatically download the issues of a digitised periodical in Trove as PDFs. There was already a notebook in the GLAM Workbench that downloads the issues of a digitised newspaper as PDFs, but newspapers work differently to other digitised periodicals in Trove. While there was no corresponding notebook for other types of periodicals, all the necessary steps were documented in the Trove Data Guide, so it was just a matter of pulling together a few blocks of code.
Don’t panic! Historic Hansard is not closing down – on the contrary, I’m planning a major update in the next few months. But as I look to the future, I thought it was a good time to pull together a few threads documenting my adventures with Commonwealth Hansard.
The past Commonwealth Hansard is made available online through ParlInfo (there’s an alternative search interface here). The Parliamentary Library has invested a lot of time and effort in converting the printed volumes into nicely-structured XML files which break up the sitting day into debates and speeches, and identify individual speakers.
If you’re interested in opening up GLAM collections for use in research, you might like to join the new Collections as Data Interest Group, part of the Research Data Alliance.
According to the group description:
This group is aimed at collections professionals such as archivists, librarians, records managers and museum curators, as well as related professions such as IT professionals, knowledge scientists, and those involved in standards development, who serve in a range of critical roles: as experts in ensuring access, preservation, and reuse of digital records, objects, data, and collections; as provocateurs for good collections curation practices; and as advocates for the construction of responsible and sustainable infrastructures for information sharing.
The GLAM Name Index Search brings datasets from 10 Australian GLAM organisations together into a single search interface. All these datasets index collections by people’s names, so with one search you can find information about individuals across a broad range of records, locations, and periods. It was created as an experiment during Family History Week in 2021, so I thought I’d update it for Family History Week 2024.
The update added 18 new datasets, so the GLAM Name Index Search now includes 279 datasets from 10 organisations – almost 12 million rows of data!
Good news for Australian archives users – you can now use Zotero to capture item details and digitised files from the collections of the Public Record Office Victoria and the Queensland State Archives!
What is Zotero?
According to the Zotero website:
Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share research.
While you can use it instead of commercial reference managers like EndNote, Zotero is much, much more.