19 Sep 2024

Saving Trove's digitised periodicals as PDFs

I was recently contacted by a researcher who wanted to be able to automatically download the issues of a digitised periodical in Trove as PDFs. There was already a notebook in the GLAM Workbench that downloads the issues of a digitised newspaper as PDFs, but newspapers work differently to other digitised periodicals in Trove. While there was no corresponding notebook for other types of periodicals, all the necessary steps were documented in the Trove Data Guide, so it was just a matter of pulling together a few blocks of code.

There are three main steps:

1. Get the nla.obj identifiers for all of the periodical’s issues
2. Get the number of pages in each issue
3. Construct a url to download each issue as a PDF, using its nla.obj identifier and number of pages

Get issue identifiers

Version 3 of the Trove API added a new endpoint to provide information about periodical titles and issues. However, the issues data provided by the API is incomplete. A more reliable alternative is to scrape the list of issues from the browse window in the digitised object viewer – see HOW TO: Get a list of items from a digitised collection in the Trove Data Guide.
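A minimal Python sketch of this scraping step follows. The browse URL pattern (including the `startIdx`, `rows` and `op=c` parameters) is an assumption and should be checked against the HOW TO in the Data Guide. The regex is also a simplification: it collects every nla.obj identifier on the page, so a real scraper should target the issue links themselves. The helper names `extract_issue_ids` and `get_issue_ids` are hypothetical.

```python
import re
from urllib.request import urlopen

# ASSUMPTION: browse-window URL pattern; verify against the Trove Data Guide.
BROWSE_URL = "https://nla.gov.au/{parent_id}/browse?startIdx={start}&rows={rows}&op=c"

# Matches identifiers such as "nla.obj-123456789"
ISSUE_ID_PATTERN = re.compile(r"nla\.obj-\d+")


def extract_issue_ids(html, parent_id):
    """Return unique nla.obj identifiers in a chunk of browse HTML,
    in order of first appearance, excluding the periodical's own id.
    (Simplification: any nla.obj id on the page is treated as an issue.)"""
    ids = []
    for match in ISSUE_ID_PATTERN.findall(html):
        if match != parent_id and match not in ids:
            ids.append(match)
    return ids


def get_issue_ids(parent_id, rows=20):
    """Page through the browse window until a page adds no new identifiers."""
    issue_ids = []
    start = 0
    while True:
        url = BROWSE_URL.format(parent_id=parent_id, start=start, rows=rows)
        with urlopen(url) as response:
            html = response.read().decode("utf-8")
        new_ids = [i for i in extract_issue_ids(html, parent_id) if i not in issue_ids]
        if not new_ids:
            break
        issue_ids.extend(new_ids)
        start += rows
    return issue_ids
```

Stopping when a page yields nothing new, rather than relying on a total count, keeps the loop from running forever if the browse window ignores `startIdx` or returns the same batch twice.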

Get number of pages in each issue

It’s possible to scrape the number of pages along with the identifiers in the previous step, but I’m not certain that this information is displayed consistently across all periodicals. To play it safe, you can instead extract the metadata embedded in the digitised object viewer, which gives you the number of pages, issue dates, and publication details (if available). See HOW TO: Extract additional metadata from the digitised resource viewer in the Trove Data Guide.
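The following is a rough sketch of pulling that embedded metadata out of a viewer page's HTML. It rests on two assumptions that should be checked against the HOW TO: that the metadata is a JSON object assigned to a JavaScript variable called `work`, and that the pages are listed under `metadata["children"]["page"]`. The function names `extract_metadata` and `count_pages` are hypothetical.

```python
import json
import re

# ASSUMPTION: the viewer embeds its metadata as
#   var work = JSON.parse(JSON.stringify({...}));
# The variable name and wrapping are unverified; adjust to the real page.
WORK_PATTERN = re.compile(
    r"var work = JSON\.parse\(JSON\.stringify\((\{.*?\})\)\);", re.DOTALL
)


def extract_metadata(html):
    """Return the embedded metadata as a dict, or None if it isn't found."""
    match = WORK_PATTERN.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def count_pages(metadata):
    """Count pages, ASSUMING they are listed under children -> page."""
    return len(metadata.get("children", {}).get("page", []))
```

Returning `None` rather than raising makes it easy to log and skip issues whose viewer pages don't match the expected pattern.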

Download PDFs

Once you have an issue’s identifier and number of pages, you can construct a url to download it as a PDF. See HOW TO: Get text, images, and PDFs using Trove’s download link in the Trove Data Guide.
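Here is a small sketch of building that url. The host (`nla.gov.au`), the parameter names `downloadOption`, `firstPage` and `lastPage`, and the zero-based page numbering are assumptions to confirm against the HOW TO. The function name `pdf_download_url` is hypothetical.

```python
from urllib.parse import urlencode


def pdf_download_url(issue_id, num_pages):
    """Build a url requesting every page of an issue as a single PDF.

    ASSUMPTION: pages are indexed from 0, so the last page is num_pages - 1.
    """
    if num_pages < 1:
        raise ValueError("an issue must have at least one page")
    params = urlencode(
        {"downloadOption": "pdf", "firstPage": 0, "lastPage": num_pages - 1}
    )
    return f"https://nla.gov.au/{issue_id}/download?{params}"
```

For a 12-page issue this asks for pages 0 to 11, so the whole issue arrives in one request.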

Putting it all together

It seemed like this would be useful to other researchers as well, so I’ve created a new notebook in the Trove Periodicals section of the GLAM Workbench that puts all of this together. See: Download issues of a periodical as PDFs.
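The notebook is the reference version. The following is only a minimal sketch of the final loop, assuming you already have a list of `(issue_id, num_pages)` pairs from the first two steps. The download url pattern is the same assumption as in the previous step. The function names `fetch_bytes` and `download_issues` are hypothetical, and `fetch` can be swapped out (for testing, or for a client with retries).

```python
import time
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url):
    """Download a url and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def download_issues(issues, out_dir, fetch=fetch_bytes, pause=1.0):
    """Save each issue as <out_dir>/<issue_id>.pdf and return the new paths.

    `issues` is an iterable of (issue_id, num_pages) pairs. Files that
    already exist are skipped, so an interrupted run can be restarted.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for issue_id, num_pages in issues:
        pdf_file = out_path / f"{issue_id}.pdf"
        if pdf_file.exists():
            continue
        # ASSUMPTION: download link pattern with zero-based page indexes
        url = (
            f"https://nla.gov.au/{issue_id}/download"
            f"?downloadOption=pdf&firstPage=0&lastPage={num_pages - 1}"
        )
        pdf_file.write_bytes(fetch(url))
        saved.append(pdf_file)
        time.sleep(pause)  # pause between requests to go easy on Trove's servers
    return saved
```

For example, `download_issues([("nla.obj-123", 12)], "pdfs")` would save `pdfs/nla.obj-123.pdf`. Here `nla.obj-123` is a placeholder, not a real issue. Skipping existing files matters with large periodicals, where a long run is likely to be interrupted at some point.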

Tim Sherratt
