glamworkbench

@trovenewsbot has a new home

@trovenewsbot has been around for more than eleven years now – originally sharing Trove newspaper articles on Twitter, and now on the Fediverse. But with the imminent closure of the botsin.space Mastodon instance, I’ve had to find it a new home. Say hello to the latest version: @trovenewsbot@wraggebots.net! Instead of just moving the bot to an existing instance, I decided to set up my own using GoToSocial. I thought this would give me more control, and encourage me to resurrect some more of my old Twitter bots.

Continue reading →

Six more volumes added to the searchable database of Tasmanian Post Office Directories!

A couple of months ago I realised my big, searchable database of Tasmanian Post Office Directories was missing the volume from 1920. It took a bit of work to add it in, as described in this post. Unfortunately, I’d barely finished when I realised that a number of other years were also missing! Argh! The good news is that I’ve been steadily working through these missing volumes, adding one a week, and now I’m finally, finally finished!

Continue reading →

Where's 1920? Missing volume added to Tasmanian Post Office Directories!

Visualisation is a great way to find problems in your data. As part of the Everyday Heritage project, I’m working with a team to document the lives of Tasmania’s Chinese residents in the 19th and early 20th centuries. We’re using a variety of sources such as Trove’s newspapers, the Tasmanian Names Index, and the Tasmanian Post Office Directories. To help with the research, I converted all the PDF volumes of the Post Office Directories into a public, online, searchable database.

Continue reading →

Major update for the Trove Newspapers section of the GLAM Workbench

The Trove newspapers section of the GLAM Workbench was updated last week. Over the last year I’ve been gradually updating notebooks to use version 3 of the Trove API, but when version 2 suddenly disappeared a couple of weeks ago I had to hurriedly pull everything together. The Trove newspapers section includes 23 notebooks and 6 datasets, so it’s not a small job. The changes include: updated all notebooks to use version 3 of the Trove API; removed remaining datasets from the code repository and created dedicated data repositories for them, integrating them with Zenodo where appropriate; added metadata to all the notebooks – this is used to build an RO-Crate metadata file for the code repository; updated all the Python packages; added a voila.

Continue reading →

Preserving the history of online collections (my love letter to future historians)

It’s pretty obvious that access to digitised resources, like Trove’s newspapers, has changed the practice of history in Australia. But how? I’m certain that the historiographical implications of the growth and development of online collections will become a topic of increasing interest to historians, and that exploration of this topic will lead to important insights into the relationship between what we keep, what we value, and what we know. But for this to happen we need to have data documenting changes in online collections.

Continue reading →

Saving Trove's digitised periodicals as PDFs

I was recently contacted by a researcher who wanted to be able to automatically download the issues of a digitised periodical in Trove as PDFs. There was already a notebook in the GLAM Workbench that downloads the issues of a digitised newspaper as PDFs, but newspapers work differently to other digitised periodicals in Trove. While there was no corresponding notebook for other types of periodicals, all the necessary steps were documented in the Trove Data Guide, so it was just a matter of pulling together a few blocks of code.

Continue reading →

The future (and past) of Historic Hansard

Don’t panic! Historic Hansard is not closing down – on the contrary, I’m planning a major update in the next few months. But as I look to the future, I thought it was a good time to pull together a few threads documenting my adventures with Commonwealth Hansard. The past: Commonwealth Hansard is made available online through ParlInfo (there’s an alternative search interface here). The Parliamentary Library has invested a lot of time and effort in converting the printed volumes into nicely-structured XML files which break up the sitting day into debates and speeches, and identify individual speakers.

Continue reading →

Join the Research Data Alliance's new Collections as Data Interest Group!

If you’re interested in opening up GLAM collections for use in research, you might like to join the new Collections as Data Interest Group, part of the Research Data Alliance. According to the group description: This group is aimed at collections professionals such as archivists, librarians, records managers and museum curators, as well as related professions such as IT professionals, knowledge scientists, and those involved in standards development, who serve in a range of critical roles: as experts in ensuring access, preservation, and reuse of digital records, objects, data, and collections; as provocateurs for good collections curation practices; and as advocates for the construction of responsible and sustainable infrastructures for information sharing.

Continue reading →

More datasets added to GLAM Name Index Search – now almost 12 million rows of data!

The GLAM Name Index Search brings datasets from 10 Australian GLAM organisations together into a single search interface. All these datasets index collections by people’s names, so with one search you can find information about individuals across a broad range of records, locations, and periods. It was created as an experiment during Family History Week in 2021, so I thought I’d update it for Family History Week 2024. The update added 18 new datasets, so the GLAM Name Index Search now includes 279 datasets from 10 organisations – almost 12 million rows of data!

Continue reading →

New Zotero translators for PROV and Queensland State Archives

Good news for Australian archives users – you can now use Zotero to capture item details and digitised files from the collections of the Public Record Office Victoria and the Queensland State Archives! What is Zotero? According to the Zotero website: ‘Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share research.’ While you can use it instead of commercial reference managers like EndNote, Zotero is much, much more.

Continue reading →

Explore Trove's digitised maps

Trove contains thousands of digitised maps from the collections of the National Library of Australia, but they’re not always easy to find because of the way they’re arranged and described. To help you explore these maps I’ve created a new database and published it using Datasette. Try it now! To get started, head to the map sheets table and search for some keywords. The results are displayed both as a cluster map using Leaflet, and as a table.

Continue reading →

Share your spreadsheet as a searchable online database using Datasette-Lite

HASS researchers often compile data in spreadsheets. Sometimes they want to ‘publish’ this data online in a form that encourages others to use and explore it – but how? I’ve just added a simple tool to the GLAM Workbench that helps you construct a URL that will open a CSV file as a searchable database using Datasette-Lite. What’s Datasette? Datasette is a fantastic tool that helps you publish your data as an interactive website.
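As a rough illustration of the kind of URL involved, here is a minimal Python sketch. It assumes the public Datasette-Lite instance at lite.datasette.io and its `csv` query parameter. The CSV address and the function name `datasette_lite_url` are hypothetical, and the GLAM Workbench tool itself may build its links differently.

```python
from urllib.parse import urlencode

# Assumption: the public Datasette-Lite instance, which loads a CSV
# file named in the `csv` query parameter.
DATASETTE_LITE = "https://lite.datasette.io/"


def datasette_lite_url(csv_url: str) -> str:
    """Return a link that opens the CSV at `csv_url` in Datasette-Lite."""
    # urlencode percent-encodes the CSV address so it survives as a
    # single query parameter value.
    return f"{DATASETTE_LITE}?{urlencode({'csv': csv_url})}"


# Hypothetical CSV location, for illustration only.
print(datasette_lite_url("https://example.org/data/my-spreadsheet.csv"))
```

The CSV needs to be published somewhere the browser can fetch it from, since Datasette-Lite runs entirely in the browser.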

Continue reading →

Updated datasets describing Trove's digitised newspapers

The Trove newspapers section of the GLAM Workbench includes a number of notebooks and datasets that document the context and content of the newspaper corpus. I’ve just updated a few of these datasets: Total number of issues per year for each newspaper in Trove; Complete list of issues for every newspaper in Trove; Trove newspapers with non-English language content; Trove newspapers with articles published after 1954; OCR corrections in Trove newspapers. I’ve also used the issues data to update my visualisation of the number of digitised newspaper issues in Trove published every day from 1803 to 2021 (there’s a lot of data so it can take a little while to load!

Continue reading →

Understanding Trove at the AHA annual conference

A fairly intensive period of work came to an end today as I delivered a workshop on ‘Understanding Trove’ at the Australian Historical Association’s annual conference in Adelaide. In effect, the workshop was also the launch of the Trove Data Guide, which I’ve been developing as part of the ARDC’s Community Data Lab. The ARDC sponsored today’s workshop and has provided bursaries to help five ECRs and HDRs participate in the conference’s digital history stream.

Continue reading →

Who is the Trove Data Guide for?

The Trove Data Guide aims to help researchers understand, access, and use data from Trove. But just because it’s about ‘data’ doesn’t mean you need to be able to code. To understand Trove data and its possibilities for research, you first need to understand Trove itself – its history, its structure, its assumptions, and its limits. This knowledge is useful to any Trove user. For example, all Trove users would benefit from knowing more about works and versions, or how to use the ‘simple’ search box for complex queries.

Continue reading →

Loading locations of Trove's digitised maps into the Gazetteer of Historical Australian Placenames

For this part of the ARDC’s Community Data Lab project, I’ve been focusing in particular on adding a series of researcher pathways to the Trove Data Guide. These pathways link data from Trove to a variety of tools and approaches and include five detailed tutorials. The first four were: Analysing keywords in Trove’s digitised newspapers; Working with a Trove collection in Tropy; Comparing manuscript collections in Mirador; Sharing a Trove List as a CollectionBuilder exhibition. I’ve now added the fifth and final (for now) tutorial:

Continue reading →

Instant exhibitions with Trove and CollectionBuilder

You’ve been collecting and annotating items relating to your research project in a Trove List. You’d like to display the contents of your list as an online exhibition for others to explore. But how? One possible approach is now documented in the Trove Data Guide. I’ve added a tutorial which walks through the process of using a GLAM Workbench notebook to extract and process data from a Trove List, before uploading it to CollectionBuilder to create an instant exhibition.

Continue reading →

Keyword analysis of Trove newspapers with the GLAM Workbench & ATAP

There’s a new draft tutorial in the development version of the Trove Data Guide. It walks through the process of harvesting a collection of digitised newspaper articles from Trove, reshaping the harvest to create sub-collections, and then loading the data into the Keyword Analysis Tool provided by the Australian Text Analytics Platform (ATAP). Along the way it goes into a fair bit of detail about constructing searches, using the Trove Newspaper Harvester, and thinking about your data.

Continue reading →

Running Mirador on GitHub Pages

I’ve just created a GitHub repository template that you can use to get your own Mirador version 3 installation running in minutes. You can also configure it to display local or remote IIIF manifests. I was thinking that it could be useful for researchers who want to create their own customised Mirador workspaces to examine a particular set of documents, but don’t want to install any software or fiddle about on the command-line.

Continue reading →

Commonwealth Hansard XML repository updates

Hey Australian Hansard fans, I’ve done a complete reharvest of all of the Commonwealth Hansard XML files from 1901 to 1980 from ParlInfo. There have been lots of improvements/corrections, and most of the file names have changed (they now have a version flag). The improvements seem to be ongoing, so I’ll try to harvest more regularly from now on. You can download the lot from the GitHub repository. I still need to load the updated XML into the Historic Hansard site, but that’s going to have to wait for a month or two…

Continue reading →

More tools for harvesting Trove newspaper articles

I’ve just added a couple of new notebooks to the Trove Newspaper & Gazette Harvester section of the GLAM Workbench. ‘Using the Trove Harvester as a Python package’ provides a basic example of using the trove-newspaper-harvester Python package. While there’s already a simple web app version of the harvester, I wanted a notebook version running in the JupyterLab interface that I could integrate with other tools and notebooks. All you need to do to harvest all the articles in a Trove newspaper search is paste in your Trove API key and the search query URL from the Trove web interface.
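To give a feel for what happens behind those two inputs, here is a simplified Python sketch that turns a web search address into an API request. It is not the trove-newspaper-harvester package's own code. The version 3 endpoint, the `X-API-KEY` header, the `category`, `q` and `encoding` parameters, and the mapping from the web interface's `keyword` to the API's `q` are all assumptions made for illustration, and the function name is hypothetical. A real harvest has to handle many more search options, as well as paging through results.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Assumed endpoint for version 3 of the Trove API.
API_URL = "https://api.trove.nla.gov.au/v3/result"


def search_url_to_request(search_url: str, api_key: str) -> Request:
    """Build (but do not send) an API request from a Trove web search address."""
    web_params = parse_qs(urlparse(search_url).query)
    # Assumption: the web interface's `keyword` corresponds to the API's `q`.
    keyword = web_params.get("keyword", [""])[0]
    api_params = {"category": "newspaper", "q": keyword, "encoding": "json"}
    # Assumption: the API key is supplied in an X-API-KEY header.
    return Request(
        f"{API_URL}?{urlencode(api_params)}",
        headers={"X-API-KEY": api_key},
    )


# Example search address and placeholder key, for illustration only.
request = search_url_to_request(
    "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge",
    "YOUR_API_KEY",
)
print(request.full_url)
```

Nothing is sent over the network here; the sketch only shows how the pasted address and key could be combined into a single request.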

Continue reading →

Trove to Tropy via IIIF – documenting data pathways in the Trove Data Guide

Last week I added a notebook to the GLAM Workbench that saves a collection of images from Trove as an IIIF manifest. This week I’ve written a tutorial that shows how you can use the notebook to load the collection data in Tropy – a desktop tool for managing and annotating images for research. This is the first tutorial in the Trove Data Guide’s Research Pathways section. While most of the TDG documents the types of data available in Trove and how you can access it, the pathways aim to connect Trove data with other tools and platforms – to point at possibilities for analysis, enrichment, and sharing.

Continue reading →

Using IIIF to explore Trove's digitised images

I’ve just added a new notebook to the Trove images section of the GLAM Workbench. It helps you save a collection of digitised images as an IIIF manifest. But what does that mean? It means the notebook packages up all the metadata describing the images in a standard form that can be used with a variety of IIIF-compliant tools. These tools let you do things with the collections that you can’t do in Trove’s own interface.
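For readers who haven't seen a manifest before, the following Python sketch assembles a minimal one in the shape defined by version 3 of the IIIF Presentation API. The identifiers, image addresses and dimensions are made up, the function name `build_manifest` is hypothetical, and the notebook itself may use a different version of the specification or include much richer metadata.

```python
import json

# Context for version 3 of the IIIF Presentation API.
IIIF_CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def build_manifest(manifest_id, label, images):
    """Package a list of image descriptions as a minimal IIIF manifest.

    `images` is a list of dicts with 'url', 'label', 'width' and 'height'.
    """
    canvases = []
    for index, image in enumerate(images, start=1):
        canvas_id = f"{manifest_id}/canvas/{index}"
        # Each image is 'painted' onto its own canvas via an annotation.
        annotation = {
            "id": f"{canvas_id}/annotation",
            "type": "Annotation",
            "motivation": "painting",
            "body": {
                "id": image["url"],
                "type": "Image",
                "format": "image/jpeg",
                "width": image["width"],
                "height": image["height"],
            },
            "target": canvas_id,
        }
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "label": {"en": [image["label"]]},
            "width": image["width"],
            "height": image["height"],
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [annotation],
            }],
        })
    return {
        "@context": IIIF_CONTEXT,
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


# Made-up values, for illustration only.
manifest = build_manifest(
    "https://example.org/iiif/my-collection/manifest",
    "My collection",
    [{"url": "https://example.org/images/1.jpg", "label": "Image 1",
      "width": 2000, "height": 3000}],
)
print(json.dumps(manifest, indent=2))
```

Because the result is plain JSON in a standard layout, it can be saved to a file and opened in IIIF-compliant viewers.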

Continue reading →

Using Pandora's collection of archived websites

There’s a brand new section of the GLAM Workbench to help you use data from Pandora’s collection of archived websites. What’s Pandora? Pandora is an initiative of the National Library of Australia which has been selecting web sites and online resources for preservation since 1996. It’s assembled a collection of more than 80,000 archived website titles, organised into subjects and collections. The archived websites are now part of the Australian Web Archive (AWA), which combines the selected titles with broader domain harvests, and is searchable through Trove.

Continue reading →

How to download all the images from a digitised collection in Trove (& learn some cool Trove tricks)

Digitised resources in Trove are sometimes grouped into collections – an album of photographs, a set of posters, a bundle of letters. I’ve just added a notebook to the GLAM Workbench that downloads all the images in a collection at the highest available resolution. [Image caption: A sample of the 3,048 posters downloaded from nla.obj-2590804313] Why is it necessary? Trove’s digitised collection viewer includes a download option. But in most cases that seems to be limited to downloading 20 images at a time.

Continue reading →

What do you want to do with Trove data?

In my work on the Trove Data Guide I’ve started sketching out a series of research pathways. These are intended as ways of connecting Trove data to tools and questions – providing examples of the steps involved in gathering, preparing, and using data to explore particular research topics. I’ve currently defined six pathways, roughly based on different types of data that you can get from Trove: Text; Images; Structured data; Maps and places; Networks and relationships; Creating collections. ‘Creating collections’ is a bit different I suppose, as it’s meant to relate to the work of assembling research collections from data in Trove – for example, creating a collection of annotated newspaper articles in Omeka.

Continue reading →

Update! Saving Trove newspaper articles and pages as images

You probably know that when you select the Download as Image option for a digitised newspaper article in Trove what you get back is not actually an image – it’s an HTML document, in which the original image has been sliced up to try and fit on an A4 page when printed. So this article: Ends up looking like this!! So what do you do when you just want an image of an article as it appeared in the newspaper?

Continue reading →

Getting to know NED – born-digital periodicals in Trove

I spend a lot of my time trying to highlight the wealth of resources available through Trove – whether that’s 25,000 digitised Parliamentary Papers, 6,000 oral histories you can listen to online, or 3,471 full-page editorial cartoons from The Bulletin. Most recently I’ve been working on digitised periodicals, developing a new section for the Trove Data Guide. But as I was harvesting data about the 900 periodicals and 37,000 issues that had so far been digitised, I wondered about periodicals that were born digital – in particular, those that had been submitted to the National Library by publishers and authors through the National eDeposit Scheme (NED).

Continue reading →

More tools and data for working with Trove's digitised periodicals

The Trove Periodicals section of the GLAM Workbench has been updated! Some changes were necessary to make use of version 3 of the Trove API, but I’ve also taken the chance to reorganise things a bit – starting with the name. This section used to be called ‘Trove journals’, reflecting the naming of Trove’s ‘Journals’ zone. But zones have gone, and periodicals are now spread across multiple categories, so I thought a name change was necessary to better reflect the type of content being examined.

Continue reading →

A new way to explore editorial cartoons from *The Bulletin*

About five years ago I created a collection of full-page editorial cartoons from The Bulletin, harvested from Trove. Through a process that might be politely described as ‘iterative’, I fiddled with an assortment of queries and methods until I had at least one cartoon from every issue published between 4 September 1886 and 17 September 1952 – 3,471 cartoons in total. The details of the collection and how I created it are available in the Trove periodicals section of the GLAM Workbench.

Continue reading →