About five years ago I created a collection of full-page editorial cartoons from The Bulletin, harvested from Trove. Through a process that might be politely described as ‘iterative’, I fiddled with an assortment of queries and methods until I had at least one cartoon from every issue published between 4 September 1886 and 17 September 1952 – 3,471 cartoons in total. The details of the collection and how I created it are available in the Trove periodicals section of the GLAM Workbench.
Last night, as I was tidying up a new release of the Trove periodicals repository, I had a thought – why not put all of the details of the cartoons in a little database and make it available using Datasette-Lite for easy exploration? So I did.
One of the coolest new features is a full-text index built from the OCR’d text of each page containing a cartoon. This means you can find cartoons by searching for words in their captions! Other features include embedded thumbnail images and links to download high-resolution versions of each page image.
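For anyone curious how caption searching like this can work under the hood, here is a minimal sketch using SQLite's FTS5 extension from Python. It is not the code used to build the cartoons database: the table name, column names, and sample rows are all invented for illustration, and it assumes your Python's SQLite build includes FTS5 (most do).

```python
import sqlite3

# Invented placeholder rows (date, title, OCR text); not real Bulletin data.
SAMPLE_ROWS = [
    ("1900-01-06", "Sample cartoon A", "The premier addresses the federation question"),
    ("1900-01-13", "Sample cartoon B", "A kangaroo and a lion discuss the weather"),
    ("1900-01-20", "Sample cartoon C", "Federation at last, says the kangaroo"),
]


def build_index(rows):
    """Load rows into an in-memory database and build a full-text index."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per cartoon page.
    conn.execute(
        "CREATE TABLE cartoons ("
        "id INTEGER PRIMARY KEY, date TEXT, title TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO cartoons (date, title, text) VALUES (?, ?, ?)", rows
    )
    # FTS5 virtual table that indexes the title and text columns of `cartoons`.
    conn.execute(
        "CREATE VIRTUAL TABLE cartoons_fts USING fts5("
        "title, text, content='cartoons', content_rowid='id')"
    )
    # Populate the index from the existing rows.
    conn.execute(
        "INSERT INTO cartoons_fts (rowid, title, text) "
        "SELECT id, title, text FROM cartoons"
    )
    return conn


def search(conn, query):
    """Return (date, title) for every cartoon matching the query, oldest first."""
    return conn.execute(
        "SELECT c.date, c.title "
        "FROM cartoons_fts JOIN cartoons AS c ON c.id = cartoons_fts.rowid "
        "WHERE cartoons_fts MATCH ? "
        "ORDER BY c.date",
        (query,),
    ).fetchall()


conn = build_index(SAMPLE_ROWS)
for date, title in search(conn, "kangaroo"):
    print(date, title)
```

Matching is case-insensitive with FTS5's default tokenizer, so a search for "federation" would find both the first and third sample rows.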
In creating the database, I realised there were a few problems with the original metadata (dodgy page numbers), so I’ve fixed those up as well. I’ve also moved the mega zip download of every image (over 60 GB) from the unfortunately deceased CloudStor service to AWS.