glamworkbench

The Primary Source – GLAM collection news and help

I’ve created a new site (or in fact, renovated an old site) to aggregate news from GLAM collections (that’s galleries, libraries, archives, and museums) and help researchers using those collections. It’s called The Primary Source, which is a bit of a bad history pun. Why is it needed? Before the Nazi takeover of the old bird site, I had a list of GLAM organisation accounts which made it pretty easy to follow what was going on in Australia’s galleries, libraries, archives, and museums.

National Archives of Australia Digitisation Dashboard

Since March 2021, I’ve been harvesting details of newly-digitised files in the National Archives of Australia to help document long-term changes to online access. A few weeks ago, I summarised the data from 2024, and published annual compilations in Zenodo. I’ve now created an automatically-updated dashboard which displays digitisation progress in the past week, the current year, and since my harvests began. Each week, after the latest data harvest, a GitHub action runs a Jupyter notebook that pulls in the data, generates some visualisations and summaries, and saves the results as an HTML page.

Search the content of periodicals uploaded to Trove through the National eDeposit service

I’ve added a notebook to the GLAM Workbench that walks through the steps involved in creating a fully searchable database of content extracted from a periodical uploaded to Trove through the National eDeposit service (NED). Why is this needed? I was contacted recently by a member of the team that publishes The Triangle, a community newsletter from the south coast of NSW. Issues of The Triangle from 2007 to the present have been uploaded to Trove through the National eDeposit service, but they were wondering whether it was possible to search across all their newsletters in Trove.

Ten years of data! The files you're not allowed to see in the National Archives of Australia

I’ve created a new dataset containing 10 years of data that can be used to explore the workings of the National Archives of Australia’s access examination system. Australian government records become available for public access after 20 years. But before being opened to the public, records go through a process known as access examination to determine whether they should be withheld, either partially or completely. The grounds for exemption are laid out in the Archives Act and include things like national security and personal privacy.

A Community Data Lab (CDL) wishlist

The ARDC is holding an event on 18 February to begin shaping the next phase of the Community Data Lab. If you’re interested in the development of digital tools and resources to support HASS research, I’d suggest you go along. I worked on the first phase of the Community Data Lab, developing the Trove Data Guide amongst other things. I’m very keen to see the CDL expand, working with researchers to create new possibilities for digital research, particularly using the rich collections of the GLAM sector (galleries, libraries, archives, and museums).

Files digitised by the National Archives of Australia in 2024

In 2024, the National Archives of Australia digitised 254,953 files (down from 416,602 in 2023). This chart shows the number of files digitised per day in 2024. The decrease in the total number of files digitised is probably related to the completion of the NAA’s five-year project to digitise Second World War service records. Thanks to $10 million in government funding, the NAA has digitised more than a million service records since 2019.

Changes to Trove newspapers in 2024

Every Sunday I harvest information about the number of digitised newspaper articles in Trove. You can view the current results in the Trove Data Dashboard. By compiling all the data from 2024, you can find out what changed last year. 6,241,739 digitised newspaper articles were added to Trove in 2024. The rate of digitisation was pretty quick until the end of March when the processing of the Melbourne Sun ended, then things flattened out a bit.

@trovenewsbot has a new home

@trovenewsbot has been around for more than eleven years now – originally sharing Trove newspaper articles on Twitter, and now on the Fediverse. But with the imminent closure of the botsin.space Mastodon instance, I’ve had to find it a new home. Say hello to the latest version: @trovenewsbot@wraggebots.net! Instead of just moving the bot to an existing instance, I decided to set up my own using GoToSocial. I thought this would give me more control, and encourage me to resurrect some more of my old Twitter bots.

Six more volumes added to the searchable database of Tasmanian Post Office Directories!

A couple of months ago I realised my big, searchable database of Tasmanian Post Office Directories was missing the volume from 1920. It took a bit of work to add it in, as described in this post. Unfortunately, I’d barely finished when I realised that a number of other years were also missing! Argh! The good news is that I’ve been steadily working through these missing volumes, adding one a week, and now I’m finally, finally finished!

Where's 1920? Missing volume added to Tasmanian Post Office Directories!

Visualisation is a great way to find problems in your data. As part of the Everyday Heritage project, I’m working with a team to document the lives of Tasmania’s Chinese residents in the 19th and early 20th centuries. We’re using a variety of sources such as Trove’s newspapers, the Tasmanian Names Index, and the Tasmanian Post Office Directories. To help with the research, I converted all the PDF volumes of the Post Office Directories into a public, online, searchable database.

Major update for the Trove Newspapers section of the GLAM Workbench

The Trove newspapers section of the GLAM Workbench was updated last week. Over the last year I’ve been gradually updating notebooks to use version 3 of the Trove API, but when version 2 suddenly disappeared a couple of weeks ago I had to hurriedly pull everything together. The Trove newspapers section includes 23 notebooks and 6 datasets, so it’s not a small job. The changes include: updated all notebooks to use version 3 of the Trove API; removed remaining datasets from the code repository and created dedicated data repositories for them, integrating them with Zenodo where appropriate; added metadata to all the notebooks – this is used to build an RO-Crate metadata file for the code repository; updated all the Python packages; added a voila.

Preserving the history of online collections (my love letter to future historians)

It’s pretty obvious that access to digitised resources, like Trove’s newspapers, has changed the practice of history in Australia. But how? I’m certain that the historiographical implications of the growth and development of online collections will become a topic of increasing interest to historians, and that exploration of this topic will lead to important insights into the relationship between what we keep, what we value, and what we know. But for this to happen we need to have data documenting changes in online collections.

Saving Trove's digitised periodicals as PDFs

I was recently contacted by a researcher who wanted to be able to automatically download the issues of a digitised periodical in Trove as PDFs. There was already a notebook in the GLAM Workbench that downloads the issues of a digitised newspaper as PDFs, but newspapers work differently to other digitised periodicals in Trove. While there was no corresponding notebook for other types of periodicals, all the necessary steps were documented in the Trove Data Guide, so it was just a matter of pulling together a few blocks of code.

The future (and past) of Historic Hansard

Don’t panic! Historic Hansard is not closing down – on the contrary, I’m planning a major update in the next few months. But as I look to the future, I thought it was a good time to pull together a few threads documenting my adventures with Commonwealth Hansard. The past: Commonwealth Hansard is made available online through ParlInfo (there’s an alternative search interface here). The Parliamentary Library has invested a lot of time and effort in converting the printed volumes into nicely-structured XML files which break up the sitting day into debates and speeches, and identify individual speakers.

Join the Research Data Alliance's new Collections as Data Interest Group!

If you’re interested in opening up GLAM collections for use in research, you might like to join the new Collections as Data Interest Group, part of the Research Data Alliance. According to the group description: ‘This group is aimed at collections professionals such as archivists, librarians, records managers and museum curators, as well as related professions such as IT professionals, knowledge scientists, and those involved in standards development, who serve in a range of critical roles: as experts in ensuring access, preservation, and reuse of digital records, objects, data, and collections; as provocateurs for good collections curation practices; and as advocates for the construction of responsible and sustainable infrastructures for information sharing.’

More datasets added to GLAM Name Index Search – now almost 12 million rows of data!

The GLAM Name Index Search brings datasets from 10 Australian GLAM organisations together into a single search interface. All these datasets index collections by people’s names, so with one search you can find information about individuals across a broad range of records, locations, and periods. It was created as an experiment during Family History Week in 2021, so I thought I’d update it for Family History Week 2024. The update added 18 new datasets, so the GLAM Name Index Search now includes 279 datasets from 10 organisations – almost 12 million rows of data!

New Zotero translators for PROV and Queensland State Archives

Good news for Australian archives users – you can now use Zotero to capture item details and digitised files from the collections of the Public Record Office Victoria and the Queensland State Archives! What is Zotero? According to the Zotero website: ‘Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share research.’ While you can use it instead of commercial reference managers like EndNote, Zotero is much, much more.

Explore Trove's digitised maps

Trove contains thousands of digitised maps from the collections of the National Library of Australia, but they’re not always easy to find because of the way they’re arranged and described. To help you explore these maps I’ve created a new database and published it using Datasette. Try it now! To get started, head to the map sheets table and search for some keywords. The results are displayed both as a cluster map using Leaflet, and as a table.

Share your spreadsheet as a searchable online database using Datasette-Lite

HASS researchers often compile data in spreadsheets. Sometimes they want to ‘publish’ this data online in a form that encourages others to use and explore – but how? I’ve just added a simple tool to the GLAM Workbench that helps you construct a url that will open a CSV file as a searchable database using Datasette-Lite. What’s Datasette? Datasette is a fantastic tool that helps you publish your data as an interactive website.
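
As a rough illustration of the kind of url involved, here is a minimal Python sketch. It assumes the public Datasette-Lite instance at lite.datasette.io and its `csv` query parameter; the function name and the example CSV address are hypothetical, and this is not the code used by the GLAM Workbench tool itself.

```python
from urllib.parse import urlencode

# Assumption: the public Datasette-Lite instance is used as the base.
DATASETTE_LITE = "https://lite.datasette.io/"


def datasette_lite_url(csv_url: str) -> str:
    """Return a url that asks Datasette-Lite to load the CSV at csv_url."""
    # urlencode percent-encodes the CSV address so it survives as one parameter
    return f"{DATASETTE_LITE}?{urlencode({'csv': csv_url})}"


# Hypothetical CSV location, for illustration only
print(datasette_lite_url("https://example.org/data/my-research.csv"))
```

Because Datasette-Lite runs in the browser, the CSV file generally needs to be hosted somewhere that allows cross-origin requests.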

Updated datasets describing Trove's digitised newspapers

The Trove newspapers section of the GLAM Workbench includes a number of notebooks and datasets that document the context and content of the newspaper corpus. I’ve just updated a few of these datasets: Total number of issues per year for each newspaper in Trove; Complete list of issues for every newspaper in Trove; Trove newspapers with non-English language content; Trove newspapers with articles published after 1954; and OCR corrections in Trove newspapers. I’ve also used the issues data to update my visualisation of the number of digitised newspaper issues in Trove published every day from 1803 to 2021 (there’s a lot of data so it can take a little while to load!)

Understanding Trove at the AHA annual conference

A fairly intensive period of work came to an end today as I delivered a workshop on ‘Understanding Trove’ at the Australian Historical Association’s annual conference in Adelaide. In effect, the workshop was also the launch of the Trove Data Guide, which I’ve been developing as part of the ARDC’s Community Data Lab. The ARDC sponsored today’s workshop and has provided bursaries to help five ECRs and HDRs participate in the conference’s digital history stream.

Who is the Trove Data Guide for?

The Trove Data Guide aims to help researchers understand, access, and use data from Trove. But just because it’s about ‘data’ doesn’t mean you need to be able to code. To understand Trove data and its possibilities for research, you first need to understand Trove itself – its history, its structure, its assumptions, and its limits. This knowledge is useful to any Trove user. For example, all Trove users would benefit from knowing more about works and versions, or how to use the ‘simple’ search box for complex queries.

Loading locations of Trove's digitised maps into the Gazetteer of Historical Australian Placenames

For this part of the ARDC’s Community Data Lab project, I’ve been focusing in particular on adding a series of researcher pathways to the Trove Data Guide. These pathways link data from Trove to a variety of tools and approaches and include five detailed tutorials. The first four were: Analysing keywords in Trove’s digitised newspapers; Working with a Trove collection in Tropy; Comparing manuscript collections in Mirador; and Sharing a Trove List as a CollectionBuilder exhibition. I’ve now added the fifth and final (for now) tutorial:

Instant exhibitions with Trove and CollectionBuilder

You’ve been collecting and annotating items relating to your research project in a Trove List. You’d like to display the contents of your list as an online exhibition for others to explore. But how? One possible approach is now documented in the Trove Data Guide. I’ve added a tutorial which walks through the process of using a GLAM Workbench notebook to extract and process data from a Trove List, before uploading it to CollectionBuilder to create an instant exhibition.

Keyword analysis of Trove newspapers with the GLAM Workbench & ATAP

There’s a new draft tutorial in the development version of the Trove Data Guide. It walks through the process of harvesting a collection of digitised newspaper articles from Trove, reshaping the harvest to create sub-collections, and then loading the data into the Keyword Analysis Tool provided by the Australian Text Analytics Platform (ATAP). Along the way it goes into a fair bit of detail about constructing searches, using the Trove Newspaper Harvester, and thinking about your data.

Running Mirador on GitHub Pages

I’ve just created a GitHub repository template that you can use to get your own Mirador version 3 installation running in minutes. You can also configure it to display local or remote IIIF manifests. I was thinking that it could be useful for researchers who want to create their own customised Mirador workspaces to examine a particular set of documents, but don’t want to install any software or fiddle about on the command-line.

Commonwealth Hansard XML repository updates

Hey Australian Hansard fans, I’ve done a complete reharvest of all of the Commonwealth Hansard XML files from 1901 to 1980 from ParlInfo. There have been lots of improvements and corrections, and most of the file names have changed (they now have a version flag). The improvements seem to be ongoing, so I’ll try to harvest more regularly from now on. You can download the lot from the GitHub repository. I still need to load the updated XML into the Historic Hansard site, but that’s going to have to wait for a month or two…

More tools for harvesting Trove newspaper articles

I’ve just added a couple of new notebooks to the Trove Newspaper & Gazette Harvester section of the GLAM Workbench. The first, ‘Using the Trove Harvester as a Python package’, provides a basic example of using the trove-newspaper-harvester Python package. While there’s already a simple web app version of the harvester, I wanted a notebook version running in the JupyterLab interface that I could integrate with other tools and notebooks. All you need to do to harvest all the articles in a Trove newspaper search is paste in your Trove API key and the search query url from the Trove web interface.

Trove to Tropy via IIIF – documenting data pathways in the Trove Data Guide

Last week I added a notebook to the GLAM Workbench that saves a collection of images from Trove as an IIIF manifest. This week I’ve written a tutorial that shows how you can use the notebook to load the collection data in Tropy – a desktop tool for managing and annotating images for research. This is the first tutorial in the Trove Data Guide’s Research Pathways section. While most of the TDG documents the types of data available in Trove and how you can access it, the pathways aim to connect Trove data with other tools and platforms – to point at possibilities for analysis, enrichment, and sharing.

Using IIIF to explore Trove's digitised images

I’ve just added a new notebook to the Trove images section of the GLAM Workbench. It helps you save a collection of digitised images as an IIIF manifest. But what does that mean? It means the notebook packages up all the metadata describing the images in a standard form that can be used with a variety of IIIF-compliant tools. These tools let you do things with the collections that you can’t do in Trove’s own interface.
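
To give a sense of what that ‘standard form’ looks like, here is a minimal Python sketch that assembles a bare-bones manifest following the IIIF Presentation API version 3 structure (a Manifest containing Canvases, each with a painting Annotation that points at an image). The function name, the keys of the input records, and all the example.org addresses are hypothetical; the notebook’s own code and the metadata it includes may differ.

```python
import json


def build_manifest(base_url, label, images):
    """Build a minimal IIIF Presentation 3 manifest as a Python dict.

    `images` is a list of dicts with the (hypothetical) keys
    url, title, width, and height.
    """
    canvases = []
    for number, image in enumerate(images, start=1):
        canvas_id = f"{base_url}/canvas/{number}"
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "label": {"en": [image["title"]]},
            "width": image["width"],
            "height": image["height"],
            # Each canvas holds one annotation page with one painting annotation
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {
                        "id": image["url"],
                        "type": "Image",
                        "format": "image/jpeg",  # assumes JPEG images
                        "width": image["width"],
                        "height": image["height"],
                    },
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base_url}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


# Made-up records standing in for a harvested collection of images
example_images = [
    {"url": "https://example.org/images/1.jpg", "title": "First image",
     "width": 1200, "height": 800},
    {"url": "https://example.org/images/2.jpg", "title": "Second image",
     "width": 900, "height": 1400},
]

manifest = build_manifest(
    "https://example.org/iiif/my-collection", "My collection", example_images
)
print(json.dumps(manifest, indent=2))
```

The resulting JSON file is what gets handed to an IIIF-compliant viewer or tool, which reads the canvases and fetches the images they point to.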
