Tim Sherratt

Sharing recent updates and work-in-progress

Nov 2021

Harvest newspaper issues as PDFs

An inquiry on Twitter prompted me to put together a notebook that you can use to download all available issues of a newspaper as PDFs. It was really just a matter of copying code from other tools and making a few modifications. The first step harvests a list of available issues for a particular newspaper from Trove. You can then download the PDFs of those issues, supplying an optional date range. Beware – this could consume a lot of disk space!

The PDF file names have the following structure:

[newspaper identifier]-[issue date as YYYYMMDD]-[issue identifier].pdf

For example:

903-19320528-1791051.pdf

I also took the opportunity to create a new Harvesting data heading in the Trove newspapers section of the GLAM Workbench. #dhhacks