Some Sands & Mac tweaks thanks to ALTO and IIIF

I posted recently about my new fully-searchable version of the Sands & MacDougall directories. I’ve now moved on to try and pull together a number of the State Library of Victoria’s place-based collections into a new discovery interface. It’s going to be a busy couple of weeks as my residency ends in early December!

I wanted to incorporate Sands & Mac search results into the new interface. Getting the data was easy because Datasette has a JSON API baked in. But what about the images? I could just display a thumbnail of the whole page, but it would be better to show a snippet of the actual entry. Thanks to IIIF and ALTO, I now can.

IIIF makes it easy to cut small sections out of a larger image. You just put the coordinates of the desired section (as x,y,width,height) in the region part of the IIIF URL. As I noted in my previous post, the ALTO files that contain the OCR data from Sands & Mac include the coordinates of every line, and every word. I just had to bring the two together.
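To make that concrete, here's a minimal sketch of building such a URL in Python. The base URL and image identifier are made up for illustration, and the `size` default of `max` assumes version 3 of the IIIF Image API (version 2 servers expect `full` instead).

```python
def iiif_region_url(base_url, x, y, w, h, size="max", fmt="jpg"):
    """Build an IIIF Image API URL that crops the region x,y,w,h (in pixels).

    base_url is the image's IIIF base URI (server + prefix + identifier).
    The URL pattern is {base}/{region}/{size}/{rotation}/{quality}.{format}.
    """
    region = f"{int(x)},{int(y)},{int(w)},{int(h)}"
    return f"{base_url.rstrip('/')}/{region}/{size}/0/default.{fmt}"


# Hypothetical image identifier and coordinates, for illustration only.
url = iiif_region_url("https://example.org/iiif/page-0001", 120, 340, 900, 42)
print(url)
# https://example.org/iiif/page-0001/120,340,900,42/max/0/default.jpg
```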

All I did was update the code that extracts the data from the ALTO files to save the results as newline-delimited JSON instead of a plain text file. Each line in each volume of Sands & Mac is now saved as a JSON object that contains the text, as well as the height, width, vertical position, and horizontal position of the line within the page image. When I load up the SQLite database, I add the values for h, w, x, and y as well as the text for each line.
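The extraction step might look something like the sketch below. It isn't the actual Sands & Mac code: it assumes each ALTO `TextLine` carries `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT` attributes in pixels, that the words sit in `String` child elements, and that the output keys are named `text`, `x`, `y`, `w`, and `h`. The sample XML is invented.

```python
import json
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace, so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(alto_xml):
    """Yield one dict per TextLine with its text and position on the page."""
    root = ET.fromstring(alto_xml)
    for el in root.iter():
        if local_name(el.tag) != "TextLine":
            continue
        # Join the words (String elements) that make up the line.
        words = [s.get("CONTENT", "") for s in el if local_name(s.tag) == "String"]
        yield {
            "text": " ".join(words),
            "x": round(float(el.get("HPOS", 0))),
            "y": round(float(el.get("VPOS", 0))),
            "w": round(float(el.get("WIDTH", 0))),
            "h": round(float(el.get("HEIGHT", 0))),
        }


def to_ndjson(alto_xml):
    """Serialise the lines as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(line) for line in alto_lines(alto_xml))


# Invented sample data, for illustration only.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine HPOS="120" VPOS="340" WIDTH="900" HEIGHT="42">
      <String CONTENT="Smith," HPOS="120" VPOS="340" WIDTH="150" HEIGHT="42"/>
      <SP/>
      <String CONTENT="John" HPOS="290" VPOS="340" WIDTH="110" HEIGHT="42"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(to_ndjson(SAMPLE))
```

Each line of the output can then be loaded into SQLite as a row, with the coordinates stored alongside the text.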

What does this make possible?

  1. When you go to an individual entry, the page image now automatically pans and zooms so that the current entry is at the centre of the image viewer. I just updated the OpenSeadragon code to focus on the entry’s position.
  2. If you share an entry on social media, a snipped-out section of the page image showing the selected entry is displayed, because there’s now an image META tag that points to an IIIF URL.
  3. You can retrieve entries via the API and use the coordinates to request snipped-out images of them via IIIF.
Nice image snippets thanks to IIIF and ALTO (and a sneak preview of what's coming...)
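For anyone wanting to try the second and third of those, here's a rough sketch of the moving parts. The Datasette address, database and table names, the column names (`x`, `y`, `w`, `h`), and the use of an `og:image` tag are all assumptions for illustration; `_search` only works on tables with full-text search enabled. A little padding is added around each line so the snippet shows some context.

```python
from html import escape
from urllib.parse import urlencode


def search_url(datasette_base, database, table, query):
    """Build a Datasette JSON API URL that returns matching rows as a list."""
    params = urlencode({"_search": query, "_shape": "array"})
    return f"{datasette_base.rstrip('/')}/{database}/{table}.json?{params}"


def padded_region(x, y, w, h, pad=50):
    """Grow the box by `pad` pixels on each side, without going below zero.

    Regions that run past the right or bottom edge are left alone, since
    the IIIF Image API crops them at the image's edge.
    """
    left = max(x - pad, 0)
    top = max(y - pad, 0)
    return left, top, (x + w + pad) - left, (y + h + pad) - top


def snippet_url(iiif_base, row, size="max"):
    """Turn a row's coordinates into an IIIF URL for the padded snippet."""
    x, y, w, h = padded_region(row["x"], row["y"], row["w"], row["h"])
    return f"{iiif_base.rstrip('/')}/{x},{y},{w},{h}/{size}/0/default.jpg"


def og_image_tag(image_url):
    """Wrap an image URL in a META tag for social media previews."""
    return f'<meta property="og:image" content="{escape(image_url, quote=True)}">'


# Hypothetical values throughout.
api = search_url("https://example.org/datasette", "sands-mac", "lines", "Smith John")
row = {"text": "Smith, John", "x": 20, "y": 340, "w": 900, "h": 42}
image = snippet_url("https://example.org/iiif/page-0001", row)
print(api)
print(image)
print(og_image_tag(image))
```

In practice the rows would come from fetching the API URL, and each row would also need to say which page image it belongs to.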
