Some Sands & Mac tweaks thanks to ALTO and IIIF
I posted recently about my new fully-searchable version of the Sands & MacDougall directories. I’ve now moved on to try and pull together a number of the State Library of Victoria’s place-based collections into a new discovery interface. It’s going to be a busy couple of weeks as my residency ends in early December!
I wanted to incorporate Sands & Mac search results into the new interface. Getting the data was easy because Datasette has a JSON API baked in. But what about the images? I could just display a thumbnail of the whole page, but it would be better to show a snippet of the actual entry. Thanks to IIIF and ALTO, I now can.
IIIF makes it easy to cut small sections out of a larger image. You just put the coordinates of the desired section (x, y, width, and height) in the IIIF URL. As I noted in my previous post, the ALTO files that contain the OCR data from Sands & Mac include the coordinates of every line, and every word. I just had to bring the two together.
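To make the URL pattern concrete, here is a minimal sketch of building an IIIF Image API request for a region. The base URL and identifier are made up for illustration, and `iiif_region_url` is a hypothetical helper, not code from the site. It assumes the coordinates are in pixels and that the server accepts `max` as the size (older servers may expect `full` instead).

```python
def iiif_region_url(image_base, x, y, w, h, size="max"):
    """Build an IIIF Image API URL for a rectangular region.

    The pattern is {base}/{region}/{size}/{rotation}/{quality}.{format},
    where region is "x,y,w,h" in pixels from the top-left corner.
    """
    region = f"{x},{y},{w},{h}"
    return f"{image_base}/{region}/{size}/0/default.jpg"


# Hypothetical image identifier, for illustration only.
url = iiif_region_url("https://example.org/iiif/page-001", 100, 200, 300, 40)
print(url)
```

Requesting that URL from an IIIF server should return just the cropped strip of the page rather than the whole image.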
All I did was update the code that extracts the data from the ALTO files to save the results as newline-delimited JSON instead of a plain text file. Each line in each volume of Sands & Mac is now saved as a JSON object that contains the text, as well as the height, width, vertical position, and horizontal position of the line within the page image. When I load up the SQLite database, I add the values for h, w, x, and y as well as the text for each line.
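The extraction step might look something like the sketch below. This is not the actual harvesting code: it assumes each ALTO `TextLine` element carries `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT` attributes in pixels, that the words sit in child `String` elements with a `CONTENT` attribute, and it ignores the XML namespace because that varies between ALTO versions. The function names and the sample XML are invented for illustration.

```python
import json
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Yield one dict per TextLine with its text and position on the page."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if local_name(el.tag) != "TextLine":
            continue
        # Join the words of the line; SP (space) elements are skipped.
        words = [s.get("CONTENT", "") for s in el if local_name(s.tag) == "String"]
        yield {
            "text": " ".join(words),
            "x": int(float(el.get("HPOS", 0))),
            "y": int(float(el.get("VPOS", 0))),
            "w": int(float(el.get("WIDTH", 0))),
            "h": int(float(el.get("HEIGHT", 0))),
        }


def to_ndjson(xml_text):
    """Serialise the lines as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(line) for line in alto_lines(xml_text))


# A tiny made-up ALTO fragment, for illustration only.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine HPOS="120" VPOS="340" WIDTH="560" HEIGHT="32">
      <String CONTENT="Smith" HPOS="120" VPOS="340" WIDTH="90" HEIGHT="32"/>
      <SP/>
      <String CONTENT="John," HPOS="220" VPOS="340" WIDTH="80" HEIGHT="32"/>
      <SP/>
      <String CONTENT="baker" HPOS="310" VPOS="340" WIDTH="90" HEIGHT="32"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(to_ndjson(SAMPLE_ALTO))
```

Because every line is a self-contained JSON object, the file can be streamed into a database one row at a time without loading a whole volume into memory.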
What does this make possible?
- When you go to an individual entry, the page image now automatically pans and zooms so that the current entry is at the centre of the image viewer. I just updated the OpenSeadragon code to focus on the entry’s position.
- If you share an entry on social media, a snipped-out section of the page image showing the selected entry is displayed, as there’s now an image meta tag that points to an IIIF URL.
- You can retrieve entries via the API and use the coordinates to request snipped-out images of them via IIIF.
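As a rough sketch of that last point, the code below searches a Datasette table through its JSON API and turns each row's coordinates into an IIIF snippet URL. Everything site-specific here is an assumption: the two base URLs are placeholders, the table name passed in is up to you, and the `image_id` column is a hypothetical stand-in for however a row identifies its page image. It also assumes the table has full-text search enabled so that `_search` works.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoints, for illustration only.
DATASETTE_BASE = "https://example.org/sands-mac"
IIIF_BASE = "https://example.org/iiif"


def search_url(table, query):
    """Build a Datasette JSON API URL that returns rows as a plain array."""
    params = urlencode({"_search": query, "_shape": "array"})
    return f"{DATASETTE_BASE}/{table}.json?{params}"


def snippet_url(row, padding=20):
    """Build an IIIF URL for a row's line, with a little padding around it."""
    x0 = max(row["x"] - padding, 0)
    y0 = max(row["y"] - padding, 0)
    x1 = row["x"] + row["w"] + padding
    y1 = row["y"] + row["h"] + padding
    region = f"{x0},{y0},{x1 - x0},{y1 - y0}"
    # 'image_id' is a hypothetical column naming the page image.
    return f"{IIIF_BASE}/{row['image_id']}/{region}/max/0/default.jpg"


def fetch_snippets(table, query):
    """Fetch matching rows and pair each line of text with its snippet URL."""
    with urlopen(search_url(table, query)) as response:
        rows = json.load(response)
    return [(row["text"], snippet_url(row)) for row in rows]
```

Calling `fetch_snippets` with a table name and a search term would give a list of text and image URL pairs, ready to drop into a results page. The padding is only there so the cropped entry has a bit of surrounding context.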
