Tim Sherratt

Exploring Trove’s digitised periodicals

Tuesday, January 30, 2024

While Trove’s digitised newspapers get all the attention, there are many other digitised periodicals to explore. But it’s not easy to find them from the Trove web interface – unlike the newspapers, there’s no list of digitised titles. So to help researchers find and use Trove’s digitised periodicals, I’ve created a searchable database using Datasette-Lite. Try it out! Search for the titles of digitised periodicals. View the details of an individual title (note the link to available issues at the bottom).

Continue reading →

The Trove Newspaper Data Dashboard now has an archive!

Monday, January 15, 2024

Since July 2022 I’ve been generating weekly snapshots of the contents of the Trove newspaper corpus. Every Sunday a new version of the Trove Newspaper Data Dashboard is created, highlighting what’s changed over the previous week, and visualising trends since April 2022 (when I first started regular data harvests). All of the past versions of the dashboard are preserved in GitHub, but there wasn’t an easy way to browse them, until now.

Continue reading →

Customising Datasette-Lite to explore datasets in the GLAM Workbench

Friday, January 12, 2024

As well as tools and code, the GLAM Workbench includes a number of pre-harvested datasets for researchers to play with. But just including a link to a CSV file in GitHub or Zenodo isn’t very useful – it doesn’t help researchers understand what’s in the dataset, and why it might be useful. That’s why I’ve also started including links that open the CSV files in Datasette-Lite, enabling the contents to be searched, filtered, and faceted.

Continue reading →

What’s going on?

Thursday, January 4, 2024

The hardest part of developing tools and resources like the GLAM Workbench is getting information about them to the people who might benefit. The collapse of Twitter has only added to the difficulty, as has the reluctance of GLAM organisations to share new resources with their users. I’d rather spend my time making new tools, but what’s the point if no-one knows they exist? Anyway, I thought I’d do a bit of a communications refresh for the new year.

Continue reading →

Exploring oral histories in Trove

Thursday, January 4, 2024

The National Library of Australia holds over 55,000 hours of oral history and folklore recordings dating back to the 1950s. This collection is being made available online, and many recordings can now be listened to using Trove’s audio player. However, the oral history collection is not easy to find in Trove. You need to go to the ‘Music, Audio, & Video’ category and check the ‘Sound/Interview, lecture, talk’ format facet. To limit results to oral histories that have been digitised, you can add “nla.

Continue reading →

Mapping MARC Geographic Area codes to Wikidata

Wednesday, January 3, 2024

Trove uses codes from the MARC Geographic Areas list to identify locations in metadata records. I couldn’t find any mappings of these codes to other sources of geospatial information, so I fired up OpenRefine and reconciled the geographic area names against Wikidata. Once I’d linked as many as possible, I copied additional information from Wikidata, such as ISO country codes, GeoNames identifiers, and geographic coordinates. I’ve saved the resulting dataset in two formats – as a flattened CSV file (handy for loading as a dataframe), and as a JSON file that uses the geographic area codes as keys (handy for looking up values).

Continue reading →

National Archives of Australia in 2023 – digitisation of files

Wednesday, January 3, 2024

In 2023 the National Archives of Australia digitised 416,602 files (down from 575,597 in 2022). This chart shows the number of files digitised per day in 2023. These files were drawn from 1,423 different series, but the vast bulk (81%) were from 4 series of World War Two service records. (This media release includes some details about the funding of the WW2 digitisation.) Here’s the top twenty series by number of items digitised in 2023.

Continue reading →

Trove newspapers in 2023

Tuesday, January 2, 2024

I’ve been capturing weekly snapshots of the Trove newspaper corpus for the last couple of years. You can see the latest results in the Trove Newspaper Data Dashboard. Using this data I’ve compiled a quick summary of changes over the last year. 7,518,764 digitised newspaper articles were added to Trove in 2023. The total number of articles increased from 236,530,127 to 244,048,891. The chart below shows how the number of articles varied across the year.

Continue reading →

Trove Data Guide update – accessing data from newspapers and gazettes

Friday, September 15, 2023

I’m continuing to slog away at the Trove Data Guide (part of the ARDC’s HASS Community Data Lab) – dumping everything I know about Trove into a format that I hope will be useful for researchers. I’ve just finished a first pass through the section on accessing data from newspapers and gazettes, and it’s online if you want to have a look. There’s still lots of things to add, update, and reorganise, but getting the basic content of the section defined is a bit of a milestone, so I’ll allow myself a little moment of celebration.

Continue reading →

Some important updates for the Trove Newspaper & Gazette Harvester

Thursday, August 31, 2023

Version 3 of the Trove API is out, and version 2 is scheduled to be decommissioned in early 2024 – that means I have a lot of code to update! First cab off the rank is the Trove Newspaper & Gazette Harvester with version 0.7.1 now available. The Harvester is a Python package that can be used as either a library or a command-line tool. It’s been around in some form for more than 10 years.

Continue reading →

Run GLAM Workbench notebooks on the ARDC’s new Binder service

Thursday, August 31, 2023

There are a number of different ways to run the Jupyter notebooks in the GLAM Workbench depending on your needs and technical skills. But the easiest and quickest has always been the public, international Binder service, based in Europe. One click in the GLAM Workbench and Binder prepares a customised computing environment and loads up the Jupyter notebooks ready for you to explore. Unfortunately, the public Binder service has been having some capacity issues in the last few months, and sometimes repositories fail to run.

Continue reading →

Trove Query Parser updated!

Saturday, August 26, 2023

I’ve just updated the Trove Query Parser to work with version 3 of the Trove API. You just give it the URL of a search in Trove’s newspapers, and it translates the search into a set of parameters that the API will understand. So this:

parse_query("https://trove.nla.gov.au/search/category/newspapers?keyword=wragge&l-artType=newspapers&l-state=Queensland&l-category=Article&l-illustrationType=Cartoon", 3)

Produces this:

{'q': 'wragge', 'l-artType': 'newspapers', 'l-state': ['Queensland'], 'l-category': ['Article'], 'l-illustrated': 'true', 'l-illustrationType': ['Cartoon'], 'category': 'newspaper'}

You can then feed the parameters to the Trove API with your API key and you’ll get data back.
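As a rough sketch of that last step, the parameters from the example can be encoded into a request URL using only the Python standard library. The endpoint `https://api.trove.nla.gov.au/v3/result` and the `X-API-KEY` header are assumptions here and should be checked against the Trove API documentation; `build_request` is a hypothetical helper, not part of the Trove Query Parser. No request is actually sent.

```python
from urllib.parse import urlencode

# Parameters as produced by parse_query() in the example above.
params = {
    "q": "wragge",
    "l-artType": "newspapers",
    "l-state": ["Queensland"],
    "l-category": ["Article"],
    "l-illustrated": "true",
    "l-illustrationType": ["Cartoon"],
    "category": "newspaper",
}

# ASSUMPTION: the version 3 search endpoint; verify against the API docs.
API_URL = "https://api.trove.nla.gov.au/v3/result"


def build_request(params, api_key):
    """Return the full request URL and headers for a search (hypothetical helper)."""
    # doseq=True expands list values into repeated key=value pairs.
    query = urlencode(params, doseq=True)
    # ASSUMPTION: the API key is supplied in an X-API-KEY header.
    headers = {"X-API-KEY": api_key}
    return f"{API_URL}?{query}", headers


url, headers = build_request(params, "YOUR_API_KEY")
print(url)
```

The resulting URL and headers could then be passed to any HTTP client, with `YOUR_API_KEY` replaced by a real key.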

Continue reading →

Family history resources in the GLAM Workbench

Friday, August 18, 2023

It’s Family History Month, so I thought a brief post was in order describing some of the family history related resources in the GLAM Workbench.

GLAM Name Index Search

This is the biggie (in more ways than one). I’ve brought 263 datasets from 10 Australian GLAM organisations together into a single search interface. All these datasets index collections by people’s names, so with one search you can find information about individuals across a broad range of records, locations, and periods.

Continue reading →

Bye bye birdsite

Thursday, August 17, 2023

In early June I pinned a “nobody’s home” post to my profile and said goodbye to Twitter. After 15 years, I was sad to leave behind friends and colleagues, but glad to get away from the hate, the nazis, and the transphobes. I hadn’t been posting much since Elno took over anyway, and was happily building a new network over on Mastodon. This morning I finally removed the Twitter links from my home page.

Continue reading →

Exploring the front pages of newspapers (10 years on)

Tuesday, August 8, 2023

Way back in 2012, I used the brand new Trove API to download the details of 4 million articles published on the front pages of newspapers. I did it for two reasons: first, I wanted to see how the content of front pages changed over time; and second, I wanted to show that large-scale data wrangling was entirely possible using nothing more than a laptop and a home broadband connection. I described my adventures in this blog post, but if you look at it now you’ll see lots of sad, empty boxes where live charts used to be.

Continue reading →

Trove API Console updates

Tuesday, July 18, 2023

The Trove API Console provides examples of the Trove API in action that you can run, edit, and share. It’s been online for 9 years now, and I’ve just updated it to use version 3 of the Trove API by default. I’ve also added a new ‘Share’ button that makes it easier to share and embed examples. If you click on the ‘Share’ button, a box will pop up. If you add a comment, this will appear above the example query when users follow the shared link.

Continue reading →

Getting to work on the Trove Data Guide

Monday, July 3, 2023

The ARDC has started work on the development of a HASS Community Data Lab to support digital research in the humanities. I’m part of the team of contractors, and my work package is focused on the development of a Trove Data Guide. My aim is to give researchers what they need to use and understand all the varieties of data that Trove makes available – from newspaper text to digitised, high-resolution maps.

Continue reading →

Updated harvest of NSW State Archives indexes – more than 2 million rows of data!

Monday, May 8, 2023

The NSW State Archives (now part of Museums of History NSW) publishes a series of useful indexes to its collections. The indexes include basic data transcribed from the records, such as names, dates, and places, providing fine-grained access to the collections. But when they’re explored as data, the indexes also suggest new ways of analysing, visualising, and linking sets of records. (For some of the possibilities and challenges of using this sort of data see Missing Links: Data Stories from the Archive of British Settler Colonial Citizenship).

Continue reading →

A big milestone, Trove contributor data, and the coming of API v3 – recent GLAM Workbench updates

Friday, March 24, 2023

There have been quite a few GLAM Workbench updates over the last month; here are some notes. (See February’s update for more recent changes…)

General developments

After many months of work, all thirteen Trove repositories within the GLAM Workbench have been updated to include standard configurations, integrations, and basic tests. This will make ongoing development and maintenance much easier. Docker images of every repository are now built automatically whenever the code changes.

Continue reading →

Maps, people, lists & more – recent updates to Trove resources in the GLAM Workbench

Friday, February 17, 2023

Once again I’ve gotten a bit behind in noting GLAM Workbench updates, so here’s a quick catch-up on some Trove-related changes from the last couple of months.

Trove API introduction

The section that introduces the Trove API (or APIs!) hasn’t had much love over recent years. I’m hoping to add some more content in the coming months, but for now I just did a bit of maintenance – updating Python packages and config files, including tests, and setting up automated builds of Docker containers.

Continue reading →