glamworkbench

Using the Public Record Office Victoria's API to build an overview of their collection

Thursday, April 10, 2025

Over the past few weeks I’ve been exploring the Public Record Office Victoria’s public API. There’s not a lot of documentation, but there is a lot of data! What’s not immediately obvious is that the API includes information about a variety of different entities within the PROV’s model for archival description – not just items, but functions, agencies, series and more. You can limit your API requests to a particular entity using the category field.
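As a rough illustration of limiting a request to one entity type, here is a minimal sketch that builds a search URL filtered on the category field. The endpoint address, the Solr-style `q` and `rows` parameters, and the category value `Series` are all assumptions made for the example, not details confirmed by the post; check them against the API itself before use.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real PROV API search endpoint.
API_URL = "https://example.org/prov-api/search"


def build_category_query(category, text="*:*", rows=20):
    """Build a request URL limited to one entity type via the category field."""
    # Assumption: the API accepts a Solr-style query using field:"value" syntax.
    query = f'{text} AND category:"{category}"'
    params = {"q": query, "rows": rows}
    # urlencode percent-encodes the query so it is safe to use in a URL.
    return f"{API_URL}?{urlencode(params)}"


# Build (but don't send) a request limited to series descriptions.
print(build_category_query("Series"))
```

Swapping in a different category value would, under the same assumptions, limit the results to functions, agencies, or items instead.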

Continue reading →

More than 6 million rows of data from Public Record Office Victoria added to the GLAM Name Index Search

Wednesday, April 9, 2025

The GLAM Name Index Search now includes more than 6 million rows of data from the Public Record Office Victoria, downloaded using their public API. The GLAM Name Index Search brings together records that include the names of people from 10 Australian GLAM organisations. With a single search, you can find information about individuals across millions of rows of data. Previous versions of the GLAM Name Index Search included a few datasets from the Public Record Office Victoria that had been shared through government open data portals.

Continue reading →

Introducing PROVBot – sharing photos from Public Record Office Victoria

Wednesday, April 9, 2025

With poor old TroveNewsBot killed by the NLA, my Mastodon feed has had less GLAM goodness of late. To try and fill the void I’ve created PROVBot, sharing photos from the Public Record Office Victoria. PROVBot makes use of the Public Record Office Victoria’s public API. At this stage it just selects and shares a random photograph once a day, but in the future I’ll probably add more features, such as the ability to respond to search queries.

Continue reading →

Trove API users beware! – the latest in the saga of my cancelled API keys

Sunday, March 2, 2025

After my Trove API keys were cancelled without warning on 21 February, I reluctantly agreed to a meeting with the National Library of Australia. They had provided so little information in their emails that it seemed to be the only way to find out what was really going on. I came out of the meeting shocked by the NLA’s change in attitude towards API use. TL;DR – you’re probably breaching the API terms of use. All Trove API users need to be aware that the NLA now insists that accessing the ‘content’ of resources, rather than just the descriptive metadata, is a breach of the API terms of use.

Continue reading →

15 years of work on Trove threatened by the NLA

Monday, February 24, 2025

See my latest post for an update! On Friday, without warning, I received an email from the National Library of Australia informing me that my Trove API keys had been suspended. This threatens the future of 15 years of work helping people use and understand the possibilities of Trove for new types of research. What’s happened? Here’s the full text of the email: Your recently published work on the GLAM Workbench regarding extracting metadata and text from a National e-Deposit (NED) periodical has been brought to the Library’s attention.

Continue reading →

The Primary Source – GLAM collection news and help

Thursday, February 20, 2025

I’ve created a new site (or in fact, renovated an old site) to aggregate news from GLAM collections (that’s galleries, libraries, archives, and museums) and help researchers using those collections. It’s called The Primary Source, which is a bit of a bad history pun. Why is it needed? Before the nazi takeover of the old bird site, I had a list of GLAM organisation accounts which made it pretty easy to follow what was going on in Australia’s galleries, libraries, archives, and museums.

Continue reading →

National Archives of Australia Digitisation Dashboard

Thursday, February 20, 2025

Since March 2021, I’ve been harvesting details of newly-digitised files in the National Archives of Australia to help document long-term changes to online access. A few weeks ago, I summarised the data from 2024, and published annual compilations in Zenodo. I’ve now created an automatically-updated dashboard which displays digitisation progress in the past week, the current year, and since my harvests began. Each week, after the latest data harvest, a GitHub action runs a Jupyter notebook that pulls in the data, generates some visualisations and summaries, and saves the results as an HTML page.

Continue reading →

Search the content of periodicals uploaded to Trove through the National eDeposit service

Wednesday, February 19, 2025

I’ve added a notebook to the GLAM Workbench that walks through the steps involved in creating a fully searchable database of content extracted from a periodical uploaded to Trove through the National eDeposit service (NED). Why is this needed? I was contacted recently by a member of the team that publishes The Triangle, a community newsletter from the south coast of NSW. Issues of The Triangle from 2007 to the present have been uploaded to Trove through the National eDeposit service, but they were wondering whether it was possible to search across all their newsletters in Trove.

Continue reading →

Ten years of data! The files you're not allowed to see in the National Archives of Australia

Wednesday, February 5, 2025

I’ve created a new dataset containing 10 years of data that can be used to explore the workings of the National Archives of Australia’s access examination system. Australian government records become available for public access after 20 years. But before being opened to the public, records go through a process known as access examination to determine whether they should be withheld, either partially or completely. The grounds for exemption are laid out in the Archives Act and include things like national security and personal privacy.

Continue reading →

A Community Data Lab (CDL) wishlist

Friday, January 31, 2025

The ARDC is holding an event on 18 February to begin shaping the next phase of the Community Data Lab. If you’re interested in the development of digital tools and resources to support HASS research, I’d suggest you go along. I worked on the first phase of the Community Data Lab, developing the Trove Data Guide amongst other things. I’m very keen to see the CDL expand, working with researchers to create new possibilities for digital research, particularly using the rich collections of the GLAM sector (galleries, libraries, archives, and museums).

Continue reading →

Files digitised by the National Archives of Australia in 2024

Monday, January 27, 2025

In 2024, the National Archives of Australia digitised 254,953 files (down from 416,602 in 2023). This chart shows the number of files digitised per day in 2024. The decrease in the total number of files digitised is probably related to the completion of the NAA’s five-year project to digitise Second World War service records. Thanks to $10 million in government funding, the NAA has digitised more than a million service records since 2019.
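To put the size of that drop in perspective, here is the arithmetic using only the two annual totals quoted above. The variable names are just for illustration.

```python
# Annual totals of digitised files, as quoted in the post.
files_2023 = 416_602
files_2024 = 254_953

# Absolute and relative change between the two years.
decrease = files_2023 - files_2024
percent_decrease = decrease / files_2023 * 100

print(f"{decrease:,} fewer files ({percent_decrease:.1f}% down on 2023)")
```

This works out to 161,649 fewer files, a fall of about 38.8%.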

Continue reading →

Changes to Trove newspapers in 2024

Friday, January 17, 2025

Every Sunday I harvest information about the number of digitised newspaper articles in Trove. You can view the current results in the Trove Data Dashboard. By compiling all the data from 2024, you can find out what changed last year. 6,241,739 digitised newspaper articles were added to Trove in 2024. The rate of digitisation was pretty quick until the end of March when the processing of the Melbourne Sun ended, then things flattened out a bit.

Continue reading →

@trovenewsbot has a new home

Thursday, December 12, 2024

@trovenewsbot has been around for more than eleven years now – originally sharing Trove newspaper articles on Twitter, and now on the Fediverse. But with the imminent closure of the botsin.space Mastodon instance, I’ve had to find it a new home. Say hello to the latest version: @trovenewsbot@wraggebots.net! Instead of just moving the bot to an existing instance, I decided to set up my own using GoToSocial. I thought this would give me more control, and encourage me to resurrect some more of my old Twitter bots.

Continue reading →

Six more volumes added to the searchable database of Tasmanian Post Office Directories!

Thursday, November 21, 2024

A couple of months ago I realised my big, searchable database of Tasmanian Post Office Directories was missing the volume from 1920. It took a bit of work to add it in, as described in this post. Unfortunately, I’d barely finished when I realised that a number of other years were also missing! Argh! The good news is that I’ve been steadily working through these missing volumes, adding one a week, and now I’m finally, finally finished!

Continue reading →

Where's 1920? Missing volume added to Tasmanian Post Office Directories!

Thursday, September 26, 2024

Visualisation is a great way to find problems in your data. As part of the Everyday Heritage project, I’m working with a team to document the lives of Tasmania’s Chinese residents in the 19th and early 20th centuries. We’re using a variety of sources such as Trove’s newspapers, the Tasmanian Names Index, and the Tasmanian Post Office Directories. To help with the research, I converted all the PDF volumes of the Post Office Directories into a public, online, searchable database.

Continue reading →

Major update for the Trove Newspapers section of the GLAM Workbench

Monday, September 23, 2024

The Trove newspapers section of the GLAM Workbench was updated last week. Over the last year I’ve been gradually updating notebooks to use version 3 of the Trove API, but when version 2 suddenly disappeared a couple of weeks ago I had to hurriedly pull everything together. The Trove newspapers section includes 23 notebooks and 6 datasets, so it’s not a small job. The changes include: updated all notebooks to use version 3 of the Trove API; removed remaining datasets from the code repository and created dedicated data repositories for them, integrating them with Zenodo where appropriate; added metadata to all the notebooks – this is used to build an RO-Crate metadata file for the code repository; updated all the Python packages; added a voila.

Continue reading →

Preserving the history of online collections (my love letter to future historians)

Friday, September 20, 2024

It’s pretty obvious that access to digitised resources, like Trove’s newspapers, has changed the practice of history in Australia. But how? I’m certain that the historiographical implications of the growth and development of online collections will become a topic of increasing interest to historians, and that exploration of this topic will lead to important insights into the relationship between what we keep, what we value, and what we know. But for this to happen we need to have data documenting changes in online collections.

Continue reading →

Saving Trove's digitised periodicals as PDFs

Thursday, September 19, 2024

I was recently contacted by a researcher who wanted to be able to automatically download the issues of a digitised periodical in Trove as PDFs. There was already a notebook in the GLAM Workbench that downloads the issues of a digitised newspaper as PDFs, but newspapers work differently to other digitised periodicals in Trove. While there was no corresponding notebook for other types of periodicals, all the necessary steps were documented in the Trove Data Guide, so it was just a matter of pulling together a few blocks of code.

Continue reading →

The future (and past) of Historic Hansard

Thursday, August 29, 2024

Don’t panic! Historic Hansard is not closing down – on the contrary, I’m planning a major update in the next few months. But as I look to the future, I thought it was a good time to pull together a few threads documenting my adventures with Commonwealth Hansard. The past: Commonwealth Hansard is made available online through ParlInfo (there’s an alternative search interface here). The Parliamentary Library has invested a lot of time and effort in converting the printed volumes into nicely-structured XML files which break up the sitting day into debates and speeches, and identify individual speakers.

Continue reading →

Join the Research Data Alliance's new Collections as Data Interest Group!

Tuesday, August 27, 2024

If you’re interested in opening up GLAM collections for use in research, you might like to join the new Collections as Data Interest Group, part of the Research Data Alliance. According to the group description: This group is aimed at collections professionals such as archivists, librarians, records managers and museum curators, as well as related professions such as IT professionals, knowledge scientists, and those involved in standards development, who serve in a range of critical roles: as experts in ensuring access, preservation, and reuse of digital records, objects, data, and collections; as provocateurs for good collections curation practices; and as advocates for the construction of responsible and sustainable infrastructures for information sharing.

Continue reading →

More datasets added to GLAM Name Index Search – now almost 12 million rows of data!

Monday, August 26, 2024

The GLAM Name Index Search brings datasets from 10 Australian GLAM organisations together into a single search interface. All these datasets index collections by people’s names, so with one search you can find information about individuals across a broad range of records, locations, and periods. It was created as an experiment during Family History Week in 2021, so I thought I’d update it for Family History Week 2024. The update added 18 new datasets, so the GLAM Name Index Search now includes 279 datasets from 10 organisations – almost 12 million rows of data!

Continue reading →

New Zotero translators for PROV and Queensland State Archives

Thursday, August 22, 2024

Good news for Australian archives users – you can now use Zotero to capture item details and digitised files from the collections of the Public Record Office Victoria and the Queensland State Archives! What is Zotero? According to the Zotero website: Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share research. While you can use it instead of commercial reference managers like EndNote, Zotero is much, much more.

Continue reading →

Explore Trove's digitised maps

Friday, August 16, 2024

Trove contains thousands of digitised maps from the collections of the National Library of Australia, but they’re not always easy to find because of the way they’re arranged and described. To help you explore these maps I’ve created a new database and published it using Datasette. Try it now! To get started, head to the map sheets table and search for some keywords. The results are displayed both as a cluster map using Leaflet, and as a table.

Continue reading →

Share your spreadsheet as a searchable online database using Datasette-Lite

Friday, July 19, 2024

HASS researchers often compile data in spreadsheets. Sometimes they want to ‘publish’ this data online in a form that encourages others to use and explore – but how? I’ve just added a simple tool to the GLAM Workbench that helps you construct a URL that will open a CSV file as a searchable database using Datasette-Lite. What’s Datasette? Datasette is a fantastic tool that helps you publish your data as an interactive website.
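To give a sense of what constructing such a link involves, here is a minimal sketch that builds one by hand. It assumes the public Datasette-Lite instance at lite.datasette.io and its `csv` query parameter; the function name and the example CSV address are hypothetical, and this is not necessarily how the GLAM Workbench tool itself works.

```python
from urllib.parse import urlencode

# Assumption: the public Datasette-Lite instance, which reads a `csv` parameter.
DATASETTE_LITE = "https://lite.datasette.io/"


def datasette_lite_url(csv_url):
    """Return a link that opens the CSV at csv_url in Datasette-Lite."""
    # urlencode percent-encodes the CSV address so it survives as a parameter.
    return f"{DATASETTE_LITE}?{urlencode({'csv': csv_url})}"


# Hypothetical CSV location, used only for illustration.
print(datasette_lite_url("https://example.org/data.csv"))
```

Because Datasette-Lite runs in the browser, the CSV file generally needs to be hosted somewhere that allows cross-origin requests.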

Continue reading →

Updated datasets describing Trove's digitised newspapers

Tuesday, July 9, 2024

The Trove newspapers section of the GLAM Workbench includes a number of notebooks and datasets that document the context and content of the newspaper corpus. I’ve just updated a few of these datasets: Total number of issues per year for each newspaper in Trove; Complete list of issues for every newspaper in Trove; Trove newspapers with non-English language content; Trove newspapers with articles published after 1954; OCR corrections in Trove newspapers. I’ve also used the issues data to update my visualisation of the number of digitised newspaper issues in Trove published every day from 1803 to 2021 (there’s a lot of data so it can take a little while to load!

Continue reading →

Understanding Trove at the AHA annual conference

Monday, July 1, 2024

A fairly intensive period of work came to an end today as I delivered a workshop on ‘Understanding Trove’ at the Australian Historical Association’s annual conference in Adelaide. In effect, the workshop was also the launch of the Trove Data Guide, which I’ve been developing as part of the ARDC’s Community Data Lab. The ARDC sponsored today’s workshop and has provided bursaries to help five ECRs and HDRs participate in the conference’s digital history stream.

Continue reading →

Who is the Trove Data Guide for?

Friday, June 21, 2024

The Trove Data Guide aims to help researchers understand, access, and use data from Trove. But just because it’s about ‘data’ doesn’t mean you need to be able to code. To understand Trove data and its possibilities for research, you first need to understand Trove itself – its history, its structure, its assumptions, and its limits. This knowledge is useful to any Trove user. For example, all Trove users would benefit from knowing more about works and versions, or how to use the ‘simple’ search box for complex queries.

Continue reading →

Loading locations of Trove's digitised maps into the Gazetteer of Historical Australian Placenames

Tuesday, June 18, 2024

For this part of the ARDC’s Community Data Lab project, I’ve been focusing in particular on adding a series of researcher pathways to the Trove Data Guide. These pathways link data from Trove to a variety of tools and approaches and include five detailed tutorials. The first four were: Analysing keywords in Trove’s digitised newspapers; Working with a Trove collection in Tropy; Comparing manuscript collections in Mirador; Sharing a Trove List as a CollectionBuilder exhibition. I’ve now added the fifth and final (for now) tutorial:

Continue reading →

Instant exhibitions with Trove and CollectionBuilder

Monday, June 10, 2024

You’ve been collecting and annotating items relating to your research project in a Trove List. You’d like to display the contents of your list as an online exhibition for others to explore. But how? One possible approach is now documented in the Trove Data Guide. I’ve added a tutorial which walks through the process of using a GLAM Workbench notebook to extract and process data from a Trove List, before uploading it to CollectionBuilder to create an instant exhibition.

Continue reading →

Keyword analysis of Trove newspapers with the GLAM Workbench & ATAP

Monday, June 3, 2024

There’s a new draft tutorial in the development version of the Trove Data Guide. It walks through the process of harvesting a collection of digitised newspaper articles from Trove, reshaping the harvest to create sub-collections, and then loading the data into the Keyword Analysis Tool provided by the Australian Text Analytics Platform (ATAP). Along the way it goes into a fair bit of detail about constructing searches, using the Trove Newspaper Harvester, and thinking about your data.

Continue reading →