Trove contains thousands of digitised maps from the collections of the National Library of Australia, but they’re not always easy to find because of the way they’re arranged and described. To help you explore these maps I’ve created a new database and published it using Datasette.
Try it now! To get started, head to the map sheets table and search for some keywords. The results are displayed both as a cluster map using Leaflet, and as a table.
HASS researchers often compile data in spreadsheets. Sometimes they want to ‘publish’ this data online in a form that encourages others to use and explore it – but how? I’ve just added a simple tool to the GLAM Workbench that helps you construct a URL that will open a CSV file as a searchable database using Datasette-Lite.
What’s Datasette? Datasette is a fantastic tool that helps you publish your data as an interactive website.
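To give a rough idea of what such a URL looks like, here's a minimal Python sketch. It isn't the code behind the GLAM Workbench tool. It assumes the public Datasette-Lite instance at lite.datasette.io and its `csv` query parameter; the function name and the example.org address are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: the public Datasette-Lite instance, which loads CSV files
# named in `csv` query parameters.
DATASETTE_LITE = "https://lite.datasette.io/"


def datasette_lite_url(csv_urls, base=DATASETTE_LITE):
    """Build a link that asks Datasette-Lite to open one or more CSV files.

    `csv_urls` can be a single URL or a list of URLs. Each one becomes
    its own `csv=` parameter in the query string.
    """
    if isinstance(csv_urls, str):
        csv_urls = [csv_urls]
    # urlencode percent-encodes each CSV address so it survives as a parameter
    query = urlencode([("csv", url) for url in csv_urls])
    return f"{base}?{query}"


# Hypothetical CSV location, used only as an example
link = datasette_lite_url("https://example.org/data/my-data.csv")
print(link)
```

The CSV file needs to be somewhere the browser can fetch it from, as Datasette-Lite runs entirely in the browser.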
I had to update my sadly-neglected CV, so of course I ended up renovating the whole of my personal website at timsherratt.au. To start with, I migrated my CV from Pages to Markdown. This made it easy to integrate the CV’s content into the site’s about me page. As I was updating the CV, I tried to get as many as possible of my publications and presentations into Zenodo for easy access and safekeeping.
The Trove newspapers section of the GLAM Workbench includes a number of notebooks and datasets that document the context and content of the newspaper corpus. I’ve just updated a few of these datasets:
- Total number of issues per year for each newspaper in Trove
- Complete list of issues for every newspaper in Trove
- Trove newspapers with non-English language content
- Trove newspapers with articles published after 1954
- OCR corrections in Trove newspapers
I’ve also used the issues data to update my visualisation of the number of digitised newspaper issues in Trove published every day from 1803 to 2021 (there’s a lot of data so it can take a little while to load!
The recently finished Australian Historical Association conference in Adelaide included a digital history stream sponsored by the Australian Research Data Commons. I’ve listed the details of all the presentations below. I also thought it might be useful to try and bring together links to the various tools, platforms, and projects mentioned during the digital history sessions. I’m relying on my memory and what I could find by googling, so please let me know if I’ve missed something!
A fairly intensive period of work came to an end today as I delivered a workshop on ‘Understanding Trove’ at the Australian Historical Association’s annual conference in Adelaide. In effect, the workshop was also the launch of the Trove Data Guide, which I’ve been developing as part of the ARDC’s Community Data Lab. The ARDC sponsored today’s workshop and has provided bursaries to help five ECRs and HDRs participate in the conference’s digital history stream.
The Trove Data Guide aims to help researchers understand, access, and use data from Trove. But just because it’s about ‘data’ doesn’t mean you need to be able to code. To understand Trove data and its possibilities for research, you first need to understand Trove itself – its history, its structure, its assumptions, and its limits. This knowledge is useful to any Trove user.
For example, all Trove users would benefit from knowing more about works and versions, or how to use the ‘simple’ search box for complex queries.
For this part of the ARDC’s Community Data Lab project, I’ve been focusing in particular on adding a series of researcher pathways to the Trove Data Guide. These pathways link data from Trove to a variety of tools and approaches and include five detailed tutorials. The first four were:
- Analysing keywords in Trove’s digitised newspapers
- Working with a Trove collection in Tropy
- Comparing manuscript collections in Mirador
- Sharing a Trove List as a CollectionBuilder exhibition
I’ve now added the fifth and final (for now) tutorial:
You’ve been collecting and annotating items relating to your research project in a Trove List. You’d like to display the contents of your list as an online exhibition for others to explore. But how? One possible approach is now documented in the Trove Data Guide. I’ve added a tutorial which walks through the process of using a GLAM Workbench notebook to extract and process data from a Trove List, before uploading it to CollectionBuilder to create an instant exhibition.
There’s a new draft tutorial in the development version of the Trove Data Guide. It walks through the process of harvesting a collection of digitised newspaper articles from Trove, reshaping the harvest to create sub-collections, and then loading the data into the Keyword Analysis Tool provided by the Australian Text Analytics Platform (ATAP).
Along the way it goes into a fair bit of detail about constructing searches, using the Trove Newspaper Harvester, and thinking about your data.
I’ve just created a GitHub repository template that you can use to get your own Mirador version 3 installation running in minutes. You can also configure it to display local or remote IIIF manifests. I was thinking that it could be useful for researchers who want to create their own customised Mirador workspaces to examine a particular set of documents, but don’t want to install any software or fiddle about on the command-line.
Hey Australian Hansard fans, I’ve done a complete reharvest of all of the Commonwealth Hansard XML files from 1901 to 1980 from ParlInfo. There have been lots of improvements and corrections, and most of the file names have changed (they now have a version flag). The improvements seem to be ongoing, so I’ll try to harvest more regularly from now on. You can download the lot from the GitHub repository.
I still need to load the updated XML into the Historic Hansard site, but that’s going to have to wait for a month or two…
I’ve just added a couple of new notebooks to the Trove Newspaper & Gazette Harvester section of the GLAM Workbench.
Using the Trove Harvester as a Python package provides a basic example of using the trove-newspaper-harvester Python package. While there’s already a simple web app version of the harvester, I wanted a notebook version running in the JupyterLab interface that I could integrate with other tools and notebooks. All you need to do to harvest all the articles in a Trove newspaper search is paste in your Trove API key and the search query URL from the Trove web interface.
Last week I added a notebook to the GLAM Workbench that saves a collection of images from Trove as an IIIF manifest. This week I’ve written a tutorial that shows how you can use the notebook to load the collection data in Tropy – a desktop tool for managing and annotating images for research.
This is the first tutorial in the Trove Data Guide’s Research Pathways section. While most of the TDG documents the types of data available in Trove and how you can access it, the pathways aim to connect Trove data with other tools and platforms – to point at possibilities for analysis, enrichment, and sharing.
I’ve just added a new notebook to the Trove images section of the GLAM Workbench. It helps you save a collection of digitised images as an IIIF manifest. But what does that mean? It means the notebook packages up all the metadata describing the images in a standard form that can be used with a variety of IIIF-compliant tools. These tools let you do things with the collections that you can’t do in Trove’s own interface.
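To make ‘packages up all the metadata’ a bit more concrete, here's a minimal Python sketch of what a manifest can look like. It isn't the notebook's code. It assumes version 3.0 of the IIIF Presentation API (the notebook may well use a different version), and the function name, image details, and example.org addresses are all invented for illustration.

```python
import json


def build_manifest(base_url, label, images):
    """Describe a list of images as a IIIF Presentation 3.0 manifest.

    Each image becomes a Canvas containing one 'painting' annotation
    that points to the image file.
    """
    canvases = []
    for index, image in enumerate(images, start=1):
        canvas_id = f"{base_url}/canvas/{index}"
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "label": {"en": [image["title"]]},
            "height": image["height"],
            "width": image["width"],
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {
                        "id": image["url"],
                        "type": "Image",
                        "format": "image/jpeg",
                        "height": image["height"],
                        "width": image["width"],
                    },
                    # The annotation 'paints' the image onto its canvas
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base_url}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


# Made-up image details, standing in for metadata gathered from a collection
images = [
    {"title": "Poster 1", "url": "https://example.org/images/1.jpg",
     "height": 3000, "width": 2000},
    {"title": "Poster 2", "url": "https://example.org/images/2.jpg",
     "height": 2800, "width": 2100},
]

manifest = build_manifest("https://example.org/iiif/posters", "Posters", images)
print(json.dumps(manifest, indent=2))
```

Because the structure follows a shared standard, the same JSON file can be opened by any tool that understands IIIF manifests.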
There’s a brand new section of the GLAM Workbench to help you use data from Pandora’s collection of archived websites.
What’s Pandora? Pandora is an initiative of the National Library of Australia which has been selecting websites and online resources for preservation since 1996. It’s assembled a collection of more than 80,000 archived website titles, organised into subjects and collections. The archived websites are now part of the Australian Web Archive (AWA), which combines the selected titles with broader domain harvests, and is searchable through Trove.
Digitised resources in Trove are sometimes grouped into collections – an album of photographs, a set of posters, a bundle of letters. I’ve just added a notebook to the GLAM Workbench that downloads all the images in a collection at the highest available resolution.
A sample of the 3,048 posters downloaded from nla.obj-2590804313
Why is it necessary? Trove’s digitised collection viewer includes a download option. But in most cases that seems to be limited to downloading 20 images at a time.
In my work on the Trove Data Guide I’ve started sketching out a series of research pathways. These are intended as ways of connecting Trove data to tools and questions – providing examples of the steps involved in gathering, preparing, and using data to explore particular research topics.
I’ve currently defined six pathways, roughly based on different types of data that you can get from Trove:
- Text
- Images
- Structured data
- Maps and places
- Networks and relationships
- Creating collections
‘Creating collections’ is a bit different I suppose, as it’s meant to relate to the work of assembling research collections from data in Trove – for example, creating a collection of annotated newspaper articles in Omeka.
You probably know that when you select the Download as Image option for a digitised newspaper article in Trove what you get back is not actually an image – it’s an HTML document, in which the original image has been sliced up to try and fit on an A4 page when printed. So this article:
Ends up looking like this!!
So what do you do when you just want an image of an article as it appeared in the newspaper?
I spend a lot of my time trying to highlight the wealth of resources available through Trove – whether that’s 25,000 digitised Parliamentary Papers, 6,000 oral histories you can listen to online, or 3,471 full-page editorial cartoons from The Bulletin. Most recently I’ve been working on digitised periodicals, developing a new section for the Trove Data Guide. But as I was harvesting data about the 900 periodicals and 37,000 issues that had so far been digitised, I wondered about periodicals that were born digital – in particular, those that had been submitted to the National Library by publishers and authors through the National eDeposit Scheme (NED).