I was recently contacted by a researcher who wanted to be able to automatically download the issues of a digitised periodical in Trove as PDFs. There was already a notebook in the GLAM Workbench that downloads the issues of a digitised newspaper as PDFs, but newspapers work differently to other digitised periodicals in Trove. While there was no corresponding notebook for other types of periodicals, all the necessary steps were documented in the Trove Data Guide, so it was just a matter of pulling together a few blocks of code.
There are three main steps:
1. Get a list of nla.obj identifiers for all the periodical’s issues.
2. Get the number of pages in each issue.
3. Download each issue as a PDF using its nla.obj identifier and the number of pages.
Version 3 of the Trove API added a new endpoint to provide information about periodical titles and issues. However, the issues data provided by the API is incomplete. A more reliable alternative is to scrape the list of issues from the browse window in the digitised object viewer; see HOW TO: Get a list of items from a digitised collection in the Trove Data Guide.
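As a rough illustration of scraping the browse window, here is a minimal Python sketch. The browse URL pattern, the `rows` page size, and the assumption that each issue is linked with an `nla.obj-<digits>` identifier in an `href` are my assumptions rather than details confirmed here, so check them against the Trove Data Guide. The `fetch` argument is a hypothetical callable that takes a URL and returns the page HTML, which keeps the parsing logic separate from the network request.

```python
import re

# Assumed URL pattern for the viewer's browse window; verify before relying on it.
BROWSE_URL = "https://nla.gov.au/{parent_id}/browse?startIdx={start}&rows={rows}&op=c"

# Issue links are assumed to contain identifiers of the form nla.obj-<digits>.
ID_PATTERN = re.compile(r'href="[^"]*?(nla\.obj-\d+)')


def extract_issue_ids(html, parent_id=None):
    """Return the unique nla.obj identifiers linked from browse HTML, in page order."""
    ids = []
    for match in ID_PATTERN.finditer(html):
        issue_id = match.group(1)
        # Skip links back to the periodical itself and repeated links to one issue.
        if issue_id != parent_id and issue_id not in ids:
            ids.append(issue_id)
    return ids


def harvest_issue_ids(parent_id, fetch, rows=20):
    """Page through the browse window until a page yields no new identifiers.

    `fetch` is a hypothetical callable: it takes a URL and returns HTML as a string.
    """
    all_ids = []
    start = 0
    while True:
        url = BROWSE_URL.format(parent_id=parent_id, start=start, rows=rows)
        page_ids = extract_issue_ids(fetch(url), parent_id)
        new_ids = [i for i in page_ids if i not in all_ids]
        if not new_ids:
            break
        all_ids.extend(new_ids)
        start += rows
    return all_ids
```

With the `requests` library installed, `fetch` could be something like `lambda url: requests.get(url).text`. In real use you would probably want to pause between requests.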
It’s possible to scrape the number of pages along with the identifiers in the previous step. However, I’m not certain that the information is displayed consistently across all periodicals. To play it safe, you can extract embedded metadata from the digitised object viewer and get the number of pages, issue dates, and publication details (if available). See HOW TO: Extract additional metadata from the digitised resource viewer in the Trove Data Guide.
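To give a sense of what extracting the embedded metadata might look like, here is a hedged Python sketch. It assumes the viewer page embeds a JSON object in a script via `var work = JSON.parse(JSON.stringify({...}))` and that pages are listed under `children` then `page`. Both the marker string and the key names are assumptions on my part, and the function names are hypothetical.

```python
import json

# Assumed marker: the viewer page is taken to embed its metadata as
# `var work = JSON.parse(JSON.stringify({...}));` inside a <script> tag.
MARKER = "var work = JSON.parse(JSON.stringify("


def extract_work_metadata(html):
    """Return the embedded metadata as a dict, or None if the marker is absent."""
    position = html.find(MARKER)
    if position == -1:
        return None
    # raw_decode parses one JSON value starting at the given index
    # and ignores whatever follows it in the string.
    metadata, _ = json.JSONDecoder().raw_decode(html, position + len(MARKER))
    return metadata


def count_pages(metadata):
    """Count pages, assuming they are listed under children -> page."""
    return len(metadata.get("children", {}).get("page", []))
```

Using `raw_decode` rather than a regular expression means the parser decides where the JSON object ends, so trailing JavaScript on the same line does not matter.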
Once you have an issue’s identifier and number of pages, you can construct a URL to download it as a PDF. See HOW TO: Get text, images, and PDFs using Trove’s download link in the Trove Data Guide.
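Constructing the download URL could look something like the sketch below. The URL pattern, the parameter names (`downloadOption`, `firstPage`, `lastPage`), and the zero-based page numbering are assumptions to verify against the Trove Data Guide; `build_pdf_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_pdf_url(issue_id, num_pages):
    """Build a PDF download URL covering every page of an issue.

    Assumes pages are numbered from zero, so the last page is num_pages - 1.
    """
    if num_pages < 1:
        raise ValueError("num_pages must be at least 1")
    params = {
        "downloadOption": "pdf",
        "firstPage": 0,
        "lastPage": num_pages - 1,
    }
    return f"https://nla.gov.au/{issue_id}/download?{urlencode(params)}"
```

The resulting URL can then be passed to whatever HTTP client you prefer and the response saved with a `.pdf` extension.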
It seemed like this would be useful to other researchers as well, so I’ve created a new notebook in the Trove Periodicals section of the GLAM Workbench that puts all of this together. See: Download issues of a periodical as PDFs.