Sharing recent updates and work-in-progress
As part of my work on the Everyday Heritage project I’m looking at how we can make better use of digitised collections to explore the everyday experiences woven around places such as Parramatta Road in Sydney. For example, the NSW Postal Directories from 1886 to 1908 and 1909 to 1950 have been digitised by the State Library of NSW and made available through Trove. The directories list residences and businesses by name and street location. Using them we can explore changes in the use of Parramatta Road across 60 years of history. But there’s a problem. While you can browse the directories page by page, searching is clunky. Trove’s main search indexes the contents of the directories by ‘article’. Each ‘article’ can be many pages long, so it’s difficult to focus in on the matching text. Clicking through from the search results to the digitised volume lands you in another set of search results, showing all the matches in the volume. However, the internal search index works differently to the main Trove index. In particular it doesn’t seem to understand phrase or boolean searches. If you start off searching for “parramatta road” , Trove tells you there’s 50 matching articles, but if you click through to a volume you’re told there’s no results. If you remove the quotes you get every match for ‘parramatta’ or ‘road’. It’s all pretty confusing.
The idea of ‘articles’ is really not very useful for publications like the Post Office Directories where information is mostly organised by column, row or line. You want to be able to search for a name, and go directly to the line in the directory where that name is mentioned. And now you can! Using Datasette, I’ve created an interface that searches by line across all 54 volumes of the NSW Post Office Directory from 1886 to 1950 (that’s over 30 million lines of text).
The full text search supports phrase, boolean, and wildcard searches. Just enter your query in the main search box to get results from all 54 volumes in a flash.
Each search result is a single line of text. Click on the link to view this line in context – it’ll show you 5 lines above and below your match, as well as a zoomable image of the digitised page from Trove.
For more context, you can click on View full page to see all the lines of text extracted from that page. You can then use the Next and Previous buttons to browse page by page.
To view the full digitised volume, just click on the View page in Trove button.
There were a few stages involved in creating this resource, but mostly I was able to reuse bits of code from the GLAM Workbench’s Trove journals and books sections, and other related projects such as the GLAM Name Index Search. Here’s a summary of the processing steps:
publishfunction to push the database to Google’s Cloudrun service.
All the code is available in the journals section of the GLAM Workbench.
One of the most exciting things to me is that this processing pipeline can be used with any digitised publication on Trove where it would be easier to search by line rather than article. Any suggestions?