I was recently contacted by a researcher who wanted to be able to automatically download the issues of a digitised periodical in Trove as PDFs. There was already a notebook in the GLAM Workbench that downloads the issues of a digitised newspaper as PDFs, but newspapers work differently to other digitised periodicals in Trove. While there was no corresponding notebook for other types of periodicals, all the necessary steps were documented in the Trove Data Guide, so it was just a matter of pulling together a few blocks of code.
Don’t panic! Historic Hansard is not closing down – on the contrary, I’m planning a major update in the next few months. But as I look to the future, I thought it was a good time to pull together a few threads documenting my adventures with Commonwealth Hansard.
The past Commonwealth Hansard is made available online through ParlInfo (there’s an alternative search interface here). The Parliamentary Library has invested a lot of time and effort in converting the printed volumes into nicely-structured XML files which break up the sitting day into debates and speeches, and identify individual speakers.
If you’re interested in opening up GLAM collections for use in research, you might like to join the new Collections as Data Interest Group, part of the Research Data Alliance.
According to the group description:
This group is aimed at collections professionals such as archivists, librarians, records managers and museum curators, as well as related professions such as IT professionals, knowledge scientists, and those involved in standards development, who serve in a range of critical roles: as experts in ensuring access, preservation, and reuse of digital records, objects, data, and collections; as provocateurs for good collections curation practices; and as advocates for the construction of responsible and sustainable infrastructures for information sharing.
The GLAM Name Index Search brings datasets from 10 Australian GLAM organisations together into a single search interface. All these datasets index collections by people’s names, so with one search you can find information about individuals across a broad range of records, locations, and periods. It was created as an experiment during Family History Week in 2021, so I thought I’d update it for Family History Week 2024.
The update added 18 new datasets, so the GLAM Name Index Search now includes 279 datasets from 10 organisations – almost 12 million rows of data!
Good news for Australian archives users – you can now use Zotero to capture item details and digitised files from the collections of the Public Record Office Victoria and the Queensland State Archives!
What is Zotero? According to the Zotero website:
Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share research.
While you can use it instead of commercial reference managers like EndNote, Zotero is much, much more.
Trove contains thousands of digitised maps from the collections of the National Library of Australia, but they’re not always easy to find because of the way they’re arranged and described. To help you explore these maps I’ve created a new database and published it using Datasette.
Try it now! To get started, head to the map sheets table and search for some keywords. The results are displayed both as a cluster map using Leaflet, and as a table.
HASS researchers often compile data in spreadsheets. Sometimes they want to ‘publish’ this data online in a form that encourages others to use and explore it – but how? I’ve just added a simple tool to the GLAM Workbench that helps you construct a URL that will open a CSV file as a searchable database using Datasette-Lite.
What’s Datasette? Datasette is a fantastic tool that helps you publish your data as an interactive website.
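As a rough illustration of the kind of URL involved, here is a minimal Python sketch. It assumes the public Datasette-Lite instance at lite.datasette.io and its `csv` query parameter; the function name and the example CSV address are hypothetical, and this is not the code used by the GLAM Workbench tool.

```python
from urllib.parse import urlencode

# Public Datasette-Lite instance (assumed here; a self-hosted copy would also work)
DATASETTE_LITE = "https://lite.datasette.io/"


def datasette_lite_url(csv_urls):
    """Build a link that asks Datasette-Lite to load one or more CSV files."""
    # Each CSV becomes its own `csv=` parameter; urlencode percent-encodes the values
    params = [("csv", url) for url in csv_urls]
    return f"{DATASETTE_LITE}?{urlencode(params)}"


# Hypothetical CSV location, for illustration only
print(datasette_lite_url(["https://example.org/data/my-spreadsheet.csv"]))
```

Because Datasette-Lite runs entirely in the browser, the CSV generally needs to be publicly accessible and served in a way that allows cross-origin requests.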
I had to update my sadly-neglected CV, so of course I ended up renovating the whole of my personal website at timsherratt.au. To start with, I migrated my CV from Pages to Markdown. This made it easy to integrate the CV’s content into the site’s about me page. As I was updating the CV, I tried to get as many of my publications and presentations as possible into Zenodo for easy access and safekeeping.
The Trove newspapers section of the GLAM Workbench includes a number of notebooks and datasets that document the context and content of the newspaper corpus. I’ve just updated a few of these datasets:
- Total number of issues per year for each newspaper in Trove
- Complete list of issues for every newspaper in Trove
- Trove newspapers with non-English language content
- Trove newspapers with articles published after 1954
- OCR corrections in Trove newspapers

I’ve also used the issues data to update my visualisation of the number of digitised newspaper issues in Trove published every day from 1803 to 2021 (there’s a lot of data so it can take a little while to load!
The recently finished Australian Historical Association conference in Adelaide included a digital history stream sponsored by the Australian Research Data Commons. I’ve listed the details of all the presentations below. I also thought it might be useful to try and bring together links to the various tools, platforms, and projects mentioned during the digital history sessions. I’m relying on my memory and what I could find by googling, so please let me know if I’ve missed something!
A fairly intensive period of work came to an end today as I delivered a workshop on ‘Understanding Trove’ at the Australian Historical Association’s annual conference in Adelaide. In effect, the workshop was also the launch of the Trove Data Guide, which I’ve been developing as part of the ARDC’s Community Data Lab. The ARDC sponsored today’s workshop and has provided bursaries to help five ECRs and HDRs participate in the conference’s digital history stream.
The Trove Data Guide aims to help researchers understand, access, and use data from Trove. But just because it’s about ‘data’ doesn’t mean you need to be able to code. To understand Trove data and its possibilities for research, you first need to understand Trove itself – its history, its structure, its assumptions, and its limits. This knowledge is useful to any Trove user.
For example, all Trove users would benefit from knowing more about works and versions, or how to use the ‘simple’ search box for complex queries.
For this part of the ARDC’s Community Data Lab project, I’ve been focusing in particular on adding a series of researcher pathways to the Trove Data Guide. These pathways link data from Trove to a variety of tools and approaches and include five detailed tutorials. The first four were:
- Analysing keywords in Trove’s digitised newspapers
- Working with a Trove collection in Tropy
- Comparing manuscript collections in Mirador
- Sharing a Trove List as a CollectionBuilder exhibition

I’ve now added the fifth and final (for now) tutorial:
You’ve been collecting and annotating items relating to your research project in a Trove List. You’d like to display the contents of your list as an online exhibition for others to explore. But how? One possible approach is now documented in the Trove Data Guide. I’ve added a tutorial which walks through the process of using a GLAM Workbench notebook to extract and process data from a Trove List, before uploading it to CollectionBuilder to create an instant exhibition.
There’s a new draft tutorial in the development version of the Trove Data Guide. It walks through the process of harvesting a collection of digitised newspaper articles from Trove, reshaping the harvest to create sub-collections, and then loading the data into the Keyword Analysis Tool provided by the Australian Text Analytics Platform (ATAP).
Along the way it goes into a fair bit of detail about constructing searches, using the Trove Newspaper Harvester, and thinking about your data.
I’ve just created a GitHub repository template that you can use to get your own Mirador version 3 installation running in minutes. You can also configure it to display local or remote IIIF manifests. I was thinking that it could be useful for researchers who want to create their own customised Mirador workspaces to examine a particular set of documents, but don’t want to install any software or fiddle about on the command-line.
Hey Australian Hansard fans, I’ve done a complete reharvest of all of the Commonwealth Hansard XML files from 1901 to 1980 from ParlInfo. There have been lots of improvements and corrections, and most of the file names have changed (they now have a version flag). The improvements seem to be ongoing, so I’ll try to harvest more regularly from now on. You can download the lot from the GitHub repository.
I still need to load the updated XML into the Historic Hansard site, but that’s going to have to wait for a month or two…
I’ve just added a couple of new notebooks to the Trove Newspaper & Gazette Harvester section of the GLAM Workbench.
Using the Trove Harvester as a Python package provides a basic example of using the trove-newspaper-harvester Python package. While there’s already a simple web app version of the harvester, I wanted a notebook version running in the JupyterLab interface that I could integrate with other tools and notebooks. All you need to do to harvest all the articles in a Trove newspaper search is paste in your Trove API key and the search query URL from the Trove web interface.
Last week I added a notebook to the GLAM Workbench that saves a collection of images from Trove as an IIIF manifest. This week I’ve written a tutorial that shows how you can use the notebook to load the collection data in Tropy – a desktop tool for managing and annotating images for research.
This is the first tutorial in the Trove Data Guide’s Research Pathways section. While most of the TDG documents the types of data available in Trove and how you can access it, the pathways aim to connect Trove data with other tools and platforms – to point at possibilities for analysis, enrichment, and sharing.
I’ve just added a new notebook to the Trove images section of the GLAM Workbench. It helps you save a collection of digitised images as an IIIF manifest. But what does that mean? It means the notebook packages up all the metadata describing the images in a standard form that can be used with a variety of IIIF-compliant tools. These tools let you do things with the collections that you can’t do in Trove’s own interface.
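To give a sense of what that ‘standard form’ looks like, here is a minimal sketch of a manifest built in Python. It assumes version 3 of the IIIF Presentation API, and it is not the notebook’s own code. The function name, the shape of the input records, and all the example.org addresses are hypothetical.

```python
import json


def build_manifest(manifest_id, label, images):
    """Package a list of image records as a minimal IIIF Presentation 3 manifest.

    Each record in `images` is assumed to be a dict with
    `title`, `url`, `width`, and `height` keys (a made-up structure).
    """
    canvases = []
    for number, image in enumerate(images, start=1):
        canvas_id = f"{manifest_id}/canvas/{number}"
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "label": {"en": [image["title"]]},
            "width": image["width"],
            "height": image["height"],
            # One annotation page per canvas, holding one 'painting' annotation
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {
                        "id": image["url"],
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": image["width"],
                        "height": image["height"],
                    },
                    # The annotation paints the image onto its own canvas
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


# Hypothetical image record, for illustration only
example_images = [
    {"title": "Photograph 1", "url": "https://example.org/images/1.jpg",
     "width": 1000, "height": 800},
]
manifest = build_manifest(
    "https://example.org/iiif/my-collection", "My collection", example_images
)
print(json.dumps(manifest, indent=2))
```

The resulting JSON can be saved to a file and opened in an IIIF viewer, although a real manifest would normally carry more descriptive metadata than this.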