Tim Sherratt

Running Mirador on GitHub Pages

Sunday, June 2, 2024

I’ve just created a GitHub repository template that you can use to get your own Mirador version 3 installation running in minutes. You can also configure it to display local or remote IIIF manifests. I was thinking that it could be useful for researchers who want to create their own customised Mirador workspaces to examine a particular set of documents, but don’t want to install any software or fiddle about on the command line.

Continue reading →

Commonwealth Hansard XML repository updates

Sunday, May 26, 2024

Hey Australian Hansard fans, I’ve done a complete reharvest of all of the Commonwealth Hansard XML files from 1901 to 1980 from ParlInfo. There have been lots of improvements and corrections, and most of the file names have changed (they now have a version flag). The improvements seem to be ongoing, so I’ll try to harvest more regularly from now on. You can download the lot from the GitHub repository. I still need to load the updated XML into the Historic Hansard site, but that’s going to have to wait for a month or two…

Continue reading →

More tools for harvesting Trove newspaper articles

Friday, May 24, 2024

I’ve just added a couple of new notebooks to the Trove Newspaper & Gazette Harvester section of the GLAM Workbench. The notebook ‘Using the Trove Harvester as a Python package’ provides a basic example of using the trove-newspaper-harvester Python package. While there’s already a simple web app version of the harvester, I wanted a notebook version running in the JupyterLab interface that I could integrate with other tools and notebooks. To harvest all the articles in a Trove newspaper search, all you need to do is paste in your Trove API key and the search URL from the Trove web interface.
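The search URL you paste in already carries your query as URL parameters. As a minimal sketch using only Python’s standard library (not the harvester package itself), here’s how those parameters can be pulled out of a search URL. The example URL, including its `keyword` and `l-state` parameters, is a hypothetical illustration of the Trove web interface’s format, so check it against a real search before relying on it.

```python
from urllib.parse import urlparse, parse_qs


def parse_search_url(url):
    """Split a Trove web-interface search URL into a category and query parameters.

    Assumes (hypothetically) URLs shaped like
    https://trove.nla.gov.au/search/category/newspapers?keyword=...
    """
    parsed = urlparse(url)
    # The last path segment is assumed to name the category, e.g. 'newspapers'
    category = parsed.path.rstrip("/").split("/")[-1]
    # parse_qs maps each key to a list of values, since facets can repeat
    params = parse_qs(parsed.query)
    return category, params


category, params = parse_search_url(
    "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge&l-state=Queensland"
)
print(category)  # newspapers
print(params)    # {'keyword': ['wragge'], 'l-state': ['Queensland']}
```

Keeping each value as a list matters because facets such as states or decades can appear more than once in a single search.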

Continue reading →

Trove to Tropy via IIIF – documenting data pathways in the Trove Data Guide

Tuesday, May 21, 2024

Last week I added a notebook to the GLAM Workbench that saves a collection of images from Trove as an IIIF manifest. This week I’ve written a tutorial that shows how you can use the notebook to load the collection data into Tropy – a desktop tool for managing and annotating images for research. This is the first tutorial in the Trove Data Guide’s Research Pathways section. While most of the TDG describes the types of data available in Trove and how you can access them, the pathways aim to connect Trove data with other tools and platforms – to point at possibilities for analysis, enrichment, and sharing.

Continue reading →

Using IIIF to explore Trove's digitised images

Thursday, May 16, 2024

I’ve just added a new notebook to the Trove images section of the GLAM Workbench. It helps you save a collection of digitised images as an IIIF manifest. But what does that mean? It means the notebook packages up all the metadata describing the images in a standard form that can be used with a variety of IIIF-compliant tools. These tools let you do things with the collections that you can’t do in Trove’s own interface.
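To make ‘a standard form’ a bit more concrete, here’s a minimal sketch of a manifest following the IIIF Presentation API 3.0 structure: a Manifest containing one Canvas per image, each painted by an Annotation whose body is the image itself. This isn’t the notebook’s code. The `build_manifest` function and all the example.org URLs are hypothetical, and a real Trove manifest would carry more metadata and would probably point to an IIIF Image API service rather than a plain JPEG.

```python
import json


def build_manifest(base_uri, label, images):
    """Build a minimal IIIF Presentation API 3.0 manifest (hypothetical helper).

    images: list of (image_url, width, height) tuples.
    """
    canvases = []
    for n, (image_url, width, height) in enumerate(images, start=1):
        canvas_id = f"{base_uri}/canvas/{n}"
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            # Each canvas holds one annotation page with one 'painting' annotation
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/page/image",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {
                        "id": image_url,
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": width,
                        "height": height,
                    },
                    # The annotation paints the image onto its canvas
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base_uri}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


manifest = build_manifest(
    "https://example.org/iiif/demo",
    "Demo collection",
    [("https://example.org/images/1.jpg", 1000, 1500),
     ("https://example.org/images/2.jpg", 1200, 800)],
)
print(json.dumps(manifest, indent=2))
```

Because every IIIF-compliant viewer understands this same Manifest/Canvas/Annotation nesting, a file like this can be opened in tools such as Mirador or Tropy without any Trove-specific code.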

Continue reading →

Using Pandora's collection of archived websites

Tuesday, May 7, 2024

There’s a brand new section of the GLAM Workbench to help you use data from Pandora’s collection of archived websites. What’s Pandora? Pandora is an initiative of the National Library of Australia which has been selecting websites and online resources for preservation since 1996. It’s assembled a collection of more than 80,000 archived website titles, organised into subjects and collections. The archived websites are now part of the Australian Web Archive (AWA), which combines the selected titles with broader domain harvests, and is searchable through Trove.

Continue reading →

How to download all the images from a digitised collection in Trove (& learn some cool Trove tricks)

Thursday, April 25, 2024

Digitised resources in Trove are sometimes grouped into collections – an album of photographs, a set of posters, a bundle of letters. I’ve just added a notebook to the GLAM Workbench that downloads all the images in a collection at the highest available resolution – for example, all 3,048 posters in nla.obj-2590804313. Why is it necessary? Trove’s digitised collection viewer includes a download option, but in most cases that seems to be limited to downloading 20 images at a time.
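To give a sense of scale: assuming the 20-image limit applies to every download, fetching the 3,048 posters by hand would mean 152 full batches plus one batch of 8, or 153 separate downloads. The sketch below just does that arithmetic by chunking a list of hypothetical image identifiers; it doesn’t talk to Trove at all.

```python
import math


def batches(items, size=20):
    """Yield successive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical identifiers standing in for the 3,048 posters
image_ids = [f"image-{n}" for n in range(3048)]
groups = list(batches(image_ids))
print(len(groups))            # 153
print(math.ceil(3048 / 20))   # 153
print(len(groups[-1]))        # 8
```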

Continue reading →

What do you want to do with Trove data?

Thursday, April 18, 2024

In my work on the Trove Data Guide I’ve started sketching out a series of research pathways. These are intended as ways of connecting Trove data to tools and questions – providing examples of the steps involved in gathering, preparing, and using data to explore particular research topics. I’ve currently defined six pathways, roughly based on different types of data that you can get from Trove: Text; Images; Structured data; Maps and places; Networks and relationships; and Creating collections. ‘Creating collections’ is a bit different, I suppose, as it’s meant to relate to the work of assembling research collections from data in Trove – for example, creating a collection of annotated newspaper articles in Omeka.

Continue reading →

Update! Saving Trove newspaper articles and pages as images

Thursday, April 18, 2024

You probably know that when you select the Download as Image option for a digitised newspaper article in Trove what you get back is not actually an image – it’s an HTML document, in which the original image has been sliced up to try and fit on an A4 page when printed. So what do you do when you just want an image of an article as it appeared in the newspaper?

Continue reading →

Getting to know NED – born-digital periodicals in Trove

Thursday, April 11, 2024

I spend a lot of my time trying to highlight the wealth of resources available through Trove – whether that’s 25,000 digitised Parliamentary Papers, 6,000 oral histories you can listen to online, or 3,471 full-page editorial cartoons from The Bulletin. Most recently I’ve been working on digitised periodicals, developing a new section for the Trove Data Guide. But as I was harvesting data about the 900 periodicals and 37,000 issues that had so far been digitised, I wondered about periodicals that were born digital – in particular, those that had been submitted to the National Library by publishers and authors through the National eDeposit Scheme (NED).

Continue reading →

More tools and data for working with Trove's digitised periodicals

Tuesday, March 26, 2024

The Trove Periodicals section of the GLAM Workbench has been updated! Some changes were necessary to make use of version 3 of the Trove API, but I’ve also taken the chance to reorganise things a bit – starting with the name. This section used to be called ‘Trove journals’, reflecting the naming of Trove’s ‘Journals’ zone. But zones have gone, and periodicals are now spread across multiple categories, so I thought a name change was necessary to better reflect the type of content being examined.

Continue reading →

A new way to explore editorial cartoons from The Bulletin

Tuesday, March 19, 2024

About five years ago I created a collection of full-page editorial cartoons from The Bulletin, harvested from Trove. Through a process that might be politely described as ‘iterative’, I fiddled with an assortment of queries and methods until I had at least one cartoon from every issue published between 4 September 1886 and 17 September 1952 – 3,471 cartoons in total. The details of the collection and how I created it are available in the Trove periodicals section of the GLAM Workbench.

Continue reading →

New GLAM Workbench section for working with government publications in Trove

Tuesday, February 27, 2024

The GLAM Workbench has a brand new section aimed at helping you find and use government publications in Trove. Most of the GLAM Workbench’s existing sections focus on a particular resource format, or are related to one of Trove’s top-level categories. This didn’t quite work for government publications, as things like Parliamentary Papers are spread across multiple categories, and can encompass a variety of formats. So I thought a new section was the best way of bringing it all together.

Continue reading →

Digital history stream at AHA annual conference in July

Friday, February 16, 2024

This year the annual conference of the Australian Historical Association will include a digital history stream, sponsored by the Australian Research Data Commons (ARDC), and convened by me! The call for papers is available here or through the Conference website. The list of possible topics is deliberately broad and inclusive – if you’re using digital tools or methods in the organisation, analysis, and visualisation of historical data we’d love to hear from you.

Continue reading →

Some recent presentations on the GLAM Workbench and Trove Data Guide

Tuesday, February 13, 2024

Last week I attended the ARDC Workshop on Repositories & Workspaces where I gave a quick intro to the GLAM Workbench and the Community Data Lab. Then it was off to the ARDC HASS&I Research Data Commons Summer School where I explored some of the mysteries of Trove in a walk-through of the Trove Data Guide.

Continue reading →

Exploring Trove’s digitised periodicals

Tuesday, January 30, 2024

While Trove’s digitised newspapers get all the attention, there are many other digitised periodicals to explore. But it’s not easy to find them from the Trove web interface – unlike the newspapers, there’s no list of digitised titles. So to help researchers find and use Trove’s digitised periodicals, I’ve created a searchable database using Datasette-Lite. Try it out! Search for the titles of digitised periodicals. View the details of an individual title (note the link to available issues at the bottom).

Continue reading →

The Trove Newspaper Data Dashboard now has an archive!

Monday, January 15, 2024

Since July 2022 I’ve been generating weekly snapshots of the contents of the Trove newspaper corpus. Every Sunday a new version of the Trove Newspaper Data Dashboard is created, highlighting what’s changed over the previous week, and visualising trends since April 2022 (when I first started regular data harvests). All of the past versions of the dashboard are preserved in GitHub, but there wasn’t an easy way to browse them, until now.
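Highlighting ‘what’s changed over the previous week’ boils down to comparing two snapshots. Here’s a minimal sketch, assuming each snapshot is a simple mapping of newspaper title to article count. The `compare_snapshots` function and the figures are hypothetical, and the dashboard itself may well work differently.

```python
def compare_snapshots(previous, current):
    """Return {title: change} for titles whose article counts differ.

    Titles missing from one snapshot are treated as having zero articles,
    so newly added (or removed) titles show up as changes too.
    """
    changes = {}
    for title in previous.keys() | current.keys():
        diff = current.get(title, 0) - previous.get(title, 0)
        if diff:
            changes[title] = diff
    return changes


# Hypothetical counts, not real Trove figures
last_week = {"Title A": 1000, "Title B": 500}
this_week = {"Title A": 1200, "Title B": 500, "Title C": 50}
print(sorted(compare_snapshots(last_week, this_week).items()))
# [('Title A', 200), ('Title C', 50)]
```

Unchanged titles are dropped, so the result contains only what’s worth highlighting for the week.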

Continue reading →

Customising Datasette-Lite to explore datasets in the GLAM Workbench

Friday, January 12, 2024

As well as tools and code, the GLAM Workbench includes a number of pre-harvested datasets for researchers to play with. But just including a link to a CSV file in GitHub or Zenodo isn’t very useful – it doesn’t help researchers understand what’s in the dataset, and why it might be useful. That’s why I’ve also started including links that open the CSV files in Datasette-Lite, enabling the contents to be searched, filtered, and faceted.
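Datasette-Lite can load a CSV file from a URL passed in its `csv` query parameter, so a link that opens a dataset can be built with a couple of lines of standard-library Python. The `datasette_lite_link` function and the example CSV URL below are hypothetical. Note that the CSV needs to be served with headers that allow the browser to fetch it from another site (raw GitHub file URLs generally do).

```python
from urllib.parse import urlencode

DATASETTE_LITE = "https://lite.datasette.io/"


def datasette_lite_link(csv_url):
    """Return a Datasette-Lite link that opens the CSV file at csv_url."""
    # urlencode percent-escapes the CSV URL so it survives as a single parameter
    return f"{DATASETTE_LITE}?{urlencode({'csv': csv_url})}"


print(datasette_lite_link("https://example.org/data/periodicals.csv"))
# https://lite.datasette.io/?csv=https%3A%2F%2Fexample.org%2Fdata%2Fperiodicals.csv
```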

Continue reading →

What’s going on?

Thursday, January 4, 2024

The hardest part of developing tools and resources like the GLAM Workbench is getting information about them to the people who might benefit. The collapse of Twitter has only added to the difficulty, as has the reluctance of GLAM organisations to share new resources with their users. I’d rather spend my time making new tools, but what’s the point if no-one knows they exist? Anyway, I thought I’d do a bit of a communications refresh for the new year.

Continue reading →

Exploring oral histories in Trove

Thursday, January 4, 2024

The National Library of Australia holds over 55,000 hours of oral history and folklore recordings dating back to the 1950s. This collection is being made available online, and many recordings can now be listened to using Trove’s audio player. However, the oral history collection is not easy to find in Trove. You need to go to the ‘Music, Audio, & Video’ category and check the ‘Sound/Interview, lecture, talk’ format facet. To limit results to oral histories that have been digitised, you can add “nla.

Continue reading →