<rss xmlns:source="http://source.scripting.com/" version="2.0">
  <channel>
    <title>Tim Sherratt</title>
    <link>https://updates.timsherratt.org/</link>
    <description></description>
    
    <language>en</language>
    
    <lastBuildDate>Mon, 16 Mar 2026 15:14:17 +1100</lastBuildDate>
    <item>
      <title>Generosity in practice – a chat with Paula Bray at the State Library of Victoria</title>
      <link>https://updates.timsherratt.org/2026/03/16/generosity-in-practice-a-chat.html</link>
      <pubDate>Mon, 16 Mar 2026 15:14:17 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2026/03/16/generosity-in-practice-a-chat.html</guid>
      <description>&lt;p&gt;While I was in Melbourne during my time as &lt;a href=&#34;https://lab.slv.vic.gov.au/team/tim-sherratt&#34;&gt;Creative Technologist-in-Residence at the State Library of Victoria LAB&lt;/a&gt;, I had a conversation with Paula Bray for the LAB&amp;rsquo;s podcast series. Paula is the SLV&amp;rsquo;s Chief Digital Officer and has long championed the importance of digital innovation in the GLAM sector. It was fun to chat about stuff that I&amp;rsquo;ve been doing for the last 30 years, and why openness and generosity are important in working with GLAM collections. You can &lt;a href=&#34;https://lab.slv.vic.gov.au/experiments/my-place-tim-sherratt/interview&#34;&gt;listen to our conversation on the LAB site&lt;/a&gt;.&lt;/p&gt;
</description>
      <source:markdown>While I was in Melbourne during my time as [Creative Technologist-in-Residence at the State Library of Victoria LAB](https://lab.slv.vic.gov.au/team/tim-sherratt), I had a conversation with Paula Bray for the LAB&#39;s podcast series. Paula is the SLV&#39;s Chief Digital Officer and has long championed the importance of digital innovation in the GLAM sector. It was fun to chat about stuff that I&#39;ve been doing for the last 30 years, and why openness and generosity are important in working with GLAM collections. You can [listen to our conversation on the LAB site](https://lab.slv.vic.gov.au/experiments/my-place-tim-sherratt/interview).


</source:markdown>
    </item>
    
    <item>
      <title>Zotero translator for Libraries Tasmania updated!</title>
      <link>https://updates.timsherratt.org/2026/03/10/zotero-translator-for-libraries-tasmania.html</link>
      <pubDate>Tue, 10 Mar 2026 10:31:36 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2026/03/10/zotero-translator-for-libraries-tasmania.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://www.zotero.org&#34;&gt;Zotero&lt;/a&gt; translator for &lt;a href=&#34;https://libraries.tas.gov.au&#34;&gt;Libraries Tasmania&lt;/a&gt; has been updated, fixing a problem with attaching images of digitised resources. The fix is in the main Zotero repository now, so it should find its way to your computer automatically.&lt;/p&gt;
&lt;p&gt;I created the first version of the Libraries Tasmania translator back in 2022 – &lt;a href=&#34;https://updates.timsherratt.org/2022/07/14/calling-all-tasmanian.html&#34;&gt;this post describes what it does&lt;/a&gt;. It works across all three sections of the catalogue, including the archives and the names index. The translator captures metadata, PDFs, and images from records, including things like digitised pages from convict records. This makes it easy for researchers to assemble their own datasets of Tasmanian records in Zotero, where they can add notes and annotations, or share them with colleagues.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/zotero-librariestas.png&#34; width=&#34;600&#34; height=&#34;382&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Capture images and metadata from the Libraries Tasmania catalogue using Zotero&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The update was necessary because Libraries Tasmania changed the way some digitised resources were displayed and downloaded. Keeping Zotero translators working across system updates can take a bit of work! I also took the opportunity to update the code to meet current Zotero guidelines and clean up a few lingering problems. If you notice any oddities, please let me know.&lt;/p&gt;
&lt;p&gt;There are now at least &lt;a href=&#34;https://updates.timsherratt.org/2024/08/22/new-zotero-translators.html#zotero-and-australian-glams&#34;&gt;8 custom translators&lt;/a&gt; to help you work with Australian GLAM collections.&lt;/p&gt;
</description>
      <source:markdown>The [Zotero](https://www.zotero.org) translator for [Libraries Tasmania](https://libraries.tas.gov.au) has been updated, fixing a problem with attaching images of digitised resources. The fix is in the main Zotero repository now, so it should find its way to your computer automatically.

I created the first version of the Libraries Tasmania translator back in 2022 – [this post describes what it does](https://updates.timsherratt.org/2022/07/14/calling-all-tasmanian.html). It works across all three sections of the catalogue, including the archives and the names index. The translator captures metadata, PDFs, and images from records, including things like digitised pages from convict records. This makes it easy for researchers to assemble their own datasets of Tasmanian records in Zotero, where they can add notes and annotations, or share them with colleagues.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/zotero-librariestas.png&#34; width=&#34;600&#34; height=&#34;382&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Capture images and metadata from the Libraries Tasmania catalogue using Zotero&lt;/figcaption&gt;&lt;/figure&gt;

The update was necessary because Libraries Tasmania changed the way some digitised resources were displayed and downloaded. Keeping Zotero translators working across system updates can take a bit of work! I also took the opportunity to update the code to meet current Zotero guidelines and clean up a few lingering problems. If you notice any oddities, please let me know.

There are now at least [8 custom translators](https://updates.timsherratt.org/2024/08/22/new-zotero-translators.html#zotero-and-australian-glams) to help you work with Australian GLAM collections. 
</source:markdown>
    </item>
    
    <item>
      <title>Exploring georeferenced maps from the SLV collection</title>
      <link>https://updates.timsherratt.org/2026/02/12/exploring-georeferenced-maps-from-the.html</link>
      <pubDate>Thu, 12 Feb 2026 23:05:06 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2026/02/12/exploring-georeferenced-maps-from-the.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;m in the process of tying up all the documentation relating to my time as &lt;a href=&#34;https://lab.slv.vic.gov.au/experiments/my-place-tim-sherratt/people-place-library-data-tim-sherratt&#34;&gt;Creative Technologist-in-Residence at the State Library of Victoria LAB&lt;/a&gt;. But as I was looking through &lt;a href=&#34;https://slv.wraggelabs.com&#34;&gt;the list of outputs&lt;/a&gt;, I realised I&amp;rsquo;d never written anything about the interface I created to explore georeferenced maps from the SLV collection.&lt;/p&gt;
&lt;p&gt;I also remembered that there were a few improvements I wanted to make to the interface. So instead of spending a few hours writing up a blog post, I&amp;rsquo;ve spent several days completely overhauling the &lt;a href=&#34;https://slv.wraggelabs.com/geomaps/&#34;&gt;Georeferenced Maps Explorer&lt;/a&gt;. I&amp;rsquo;m pretty happy with how it&amp;rsquo;s working now. &lt;strong&gt;&lt;a href=&#34;https://slv.wraggelabs.com/geomaps/&#34;&gt;Have a play!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/prom-maps.png&#34; width=&#34;600&#34; height=&#34;354&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Wilson&#39;s Prom, made up of a patchwork of georeferenced maps and aerial photographs in the Georeferenced Maps Explorer. &lt;a href=&#34;https://slv.wraggelabs.com/geomaps/&#34;&gt;Try it now!&lt;/a&gt;&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;To get started, just click on the basemap. Details of all georeferenced maps within 50km of your selected point will be displayed in the right-hand column. As you move your mouse over the list of results, the boundaries of the georeferenced maps will be displayed on the basemap. This gives you a preview of their location and size. Click on one of the results to display the georeferenced map as a layer on top of the modern basemap.&lt;/p&gt;
&lt;figure&gt;&lt;video src=&#34;https://cdn.uploads.micro.mov/8371/2026/video-2026-02-12-13-39-23/playlist.m3u8&#34; poster=&#34;https://cdn.uploads.micro.blog/8371/2026/frames/1679959-0-ad6b9b.jpg&#34; width=&#34;1920&#34; height=&#34;1080&#34; controls=&#34;controls&#34; preload=&#34;metadata&#34;&gt;&lt;/video&gt;&lt;figcaption&gt;Hover over a result to see the map boundaries&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;You can add as many maps as you like. If your selected maps overlap, you can change the order in which they&amp;rsquo;re shown. Click on the layers icon in the top left of the basemap. You&amp;rsquo;ll see a list of the maps that are currently displayed. Use the arrow buttons to move a map backwards or forwards. You can also use the sliders to adjust the opacity of each map. This can make it easier to examine the relationship between maps. For example, you might want to compare the features of a historic map with those of the underlying basemap.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/photomaps.png&#34; width=&#34;600&#34; height=&#34;363&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Stitch together multiple maps like this series of seven photomaps, and change the opacity to see the features underneath&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The Explorer&amp;rsquo;s URL updates with every selection you make, so you can bookmark or share a URL to return to the same position and collection of maps. For example, &lt;a href=&#34;https://slv.wraggelabs.com/geomaps/?lat=-38.96774552450964&amp;amp;lon=146.39004985426214&amp;amp;map_id=215c1310ba3a4968&amp;amp;map_id=4fe5fff0d41e3958&amp;amp;map_id=87551272fa78f2bd&amp;amp;map_id=2cc91e9b1b4bd533&#34;&gt;this link&lt;/a&gt; will take you to the collection of maps of Wilson&amp;rsquo;s Prom shown above.&lt;/p&gt;
&lt;h2 id=&#34;the-background&#34;&gt;The background&lt;/h2&gt;
&lt;p&gt;If you missed the start of this journey back in November last year, you might be wondering what the georeferenced maps are and where they come from. During my SLV LAB residency, I found a way of &lt;a href=&#34;https://updates.timsherratt.org/2025/11/04/turning-the-slvs-maps-into.html&#34;&gt;hooking the SLV&amp;rsquo;s digitised maps up to a tool called Allmaps&lt;/a&gt; that helps you identify points that connect historic maps to our modern coordinate system. When enough points have been identified, the historic maps can be positioned on a modern basemap. This is known as georeferencing, georectifying, or &amp;lsquo;map warping&amp;rsquo;, as the results can often appear skewed or warped.&lt;/p&gt;
&lt;p&gt;Once I had connected things up, I invited the world (or at least the tiny part of it that follows me on social media) to help turn the SLV&amp;rsquo;s maps into data. And they did! As of today, &lt;strong&gt;1,447&lt;/strong&gt; of the SLV&amp;rsquo;s digitised maps have been georeferenced. &lt;a href=&#34;https://wragge.github.io/slv-allmaps/dashboard.html&#34;&gt;This dashboard displays current georeferencing progress&lt;/a&gt;.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/visualization-3.png&#34; width=&#34;600&#34; height=&#34;211&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;The total number of SLV maps georeferenced over time. It&#39;s still going up!&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;There&amp;rsquo;s still plenty more to do. If you&amp;rsquo;d like to help, &lt;a href=&#34;https://wragge.github.io/slv-allmaps/&#34;&gt;the full instructions are available here&lt;/a&gt;. Georeferencing is pretty fun, so why not have a go?&lt;/p&gt;
&lt;p&gt;You can explore the current collection of georeferenced maps in a few different ways. There&amp;rsquo;s a dataset you can &lt;a href=&#34;https://github.com/wragge/slv-allmaps/blob/main/georeferenced_maps.csv&#34;&gt;download&lt;/a&gt; or &lt;a href=&#34;https://glam-workbench.net/datasette-lite/?csv=https%3A%2F%2Fgithub.com%2Fwragge%2Fslv-allmaps%2Fblob%2Fmain%2Fgeoreferenced_maps_datasette.csv&amp;amp;install=datasette-homepage-table&amp;amp;install=datasette-json-html&amp;amp;fts=manifest_title%2Cmap_title&#34;&gt;search&lt;/a&gt; that gets updated every two hours. This data is loaded into &lt;a href=&#34;https://slv-places-481615284700.australia-southeast1.run.app/&#34;&gt;a spatial database&lt;/a&gt; that&amp;rsquo;s used by the &lt;a href=&#34;https://slv.wraggelabs.com/geomaps/&#34;&gt;Georeferenced Maps Explorer&lt;/a&gt;. As part of my recent improvements, I&amp;rsquo;ve automated this process as well, so the database should be updated with the latest additions every 24 hours.&lt;/p&gt;
&lt;p&gt;You can also search for georeferenced maps using &lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;the my place app&lt;/a&gt;. You just enter an address and my place pulls together data from a variety of sources – mixing the georeferenced maps up with parish maps, newspapers, photos, and entries from the Sands &amp;amp; MacDougall&amp;rsquo;s directories.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-geo.png&#34; width=&#34;600&#34; height=&#34;296&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Georeferenced maps in &lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;my place&lt;/a&gt; results&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;the-interface&#34;&gt;The interface&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://slv.wraggelabs.com/geomaps/&#34;&gt;Georeferenced Maps Explorer&lt;/a&gt; uses &lt;a href=&#34;https://maplibre.org&#34;&gt;MapLibre&lt;/a&gt; and the &lt;a href=&#34;https://github.com/allmaps/allmaps/tree/main/packages/maplibre&#34;&gt;Allmaps MapLibre plugin&lt;/a&gt; to display the georeferenced maps. You might notice that it looks pretty similar to the &lt;a href=&#34;https://slv.wraggelabs.com/newspapers/&#34;&gt;Newspapers Explorer&lt;/a&gt; and the &lt;a href=&#34;https://slv.wraggelabs.com/cua/&#34;&gt;CUA Browser&lt;/a&gt;, both of which use MapLibre, as well as &lt;a href=&#34;https://bulma.io&#34;&gt;Bulma&lt;/a&gt; for CSS. I&amp;rsquo;ve been trying to settle on a fairly standard set of tools that I can use to create and maintain these sorts of interfaces without too much fuss. Basically I just cut and paste a lot of stuff, then modify as needed.&lt;/p&gt;
&lt;p&gt;When you click on the basemap in the Explorer, the coordinates are sent off to the spatial database to retrieve details of georeferenced maps within 50km. The spatial database runs in &lt;a href=&#34;https://datasette.io&#34;&gt;Datasette&lt;/a&gt;, which has a built-in JSON API that I use with a set of predefined &amp;lsquo;canned&amp;rsquo; queries to pull back the data I need. The results are displayed in the right-hand column, along with square thumbnails generated by the SLV&amp;rsquo;s &lt;a href=&#34;https://iiif.io/&#34;&gt;IIIF&lt;/a&gt; service.&lt;/p&gt;
&lt;p&gt;The metadata includes distance and area measures, which are used to find and sort the results. There are two distance measures: one from your selected point to the closest boundary of a map, and the other to the centre of a map. If the point is contained within a map&amp;rsquo;s boundaries, then the &amp;lsquo;bounds&amp;rsquo; distance is zero. The search query finds maps whose closest boundaries are within 50km. Originally I sorted the results by this distance and the area of the maps. But this meant that maps covering large areas that included the selected point (such as maps of the whole of Victoria) appeared above nearby local maps. To make it easier to find maps within an area, I added the &amp;lsquo;centre&amp;rsquo; distance and now sort the results using that. This allows nearby maps that don&amp;rsquo;t include the current point to bubble up towards the top of the search results, above many of the large-area maps. It&amp;rsquo;s far from perfect, but I think it strikes an OK balance.&lt;/p&gt;
&lt;p&gt;The data also includes the boundaries of each map as GeoJSON. I use this to generate a MapLibre layer that contains all the boundaries as polygons. The boundaries are hidden until you hover over the corresponding search result, then the opacity of the boundary is flipped to &lt;code&gt;1&lt;/code&gt; and it magically appears.&lt;/p&gt;
&lt;p&gt;When you click on a search result, a request is fired off to &lt;a href=&#34;https://allmaps.org&#34;&gt;Allmaps&lt;/a&gt; for the full georeferencing data. The Allmaps plugin uses this to retrieve the map image from the SLV&amp;rsquo;s IIIF service and display the warped map in MapLibre.&lt;/p&gt;
&lt;p&gt;I looked around for quite a while to find a good way of changing the opacity and order of the warped maps in MapLibre. I eventually found the &lt;a href=&#34;https://github.com/wragge/maplibre-gl-layer-manager&#34;&gt;MapLibre GL Layer Manager&lt;/a&gt;, which did a lot of what I wanted. I &lt;a href=&#34;https://github.com/wragge/maplibre-gl-layer-manager&#34;&gt;forked the repository&lt;/a&gt; and modified the code to get the opacity slider working with warped map layers. Warped map layers already have a &lt;code&gt;setOpacity&lt;/code&gt; method; it was just a matter of checking for &amp;lsquo;custom&amp;rsquo; layers, then finding where the warped map was in the layer object.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-javascript&#34; data-lang=&#34;javascript&#34;&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (&lt;span style=&#34;color:#a6e22e&#34;&gt;type&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;custom&amp;#34;&lt;/span&gt;) {
        &lt;span style=&#34;color:#75715e&#34;&gt;// the Allmaps implementation object provides its own setOpacity method&lt;/span&gt;
        &lt;span style=&#34;color:#a6e22e&#34;&gt;layer&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;implementation&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;setOpacity&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;opacity&lt;/span&gt;);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I also made a few cosmetic changes, such as renaming the tooltips on the reorder buttons from &amp;lsquo;move up&amp;rsquo; and &amp;lsquo;move down&amp;rsquo; to &amp;lsquo;send back&amp;rsquo; and &amp;lsquo;bring forward&amp;rsquo; – up and down just confused me.&lt;/p&gt;
&lt;p&gt;I tried for a long time to find some way of adding tooltips or popups to the warped maps that would show their details when you moved the mouse over them. I found that if you were displaying multiple maps that looked similar, such as the photomaps above, it was difficult to know which map was which. After a chat with the &lt;a href=&#34;https://allmaps.org&#34;&gt;Allmaps&lt;/a&gt; developers in their IIIF Slack channel, I realised that this approach wouldn&amp;rsquo;t work as the warped map layers don&amp;rsquo;t currently listen to mouse events. Instead I decided to add hover events to the list of results, rather than the maps, and use them to display the map boundaries as described above. This way I get the connection between the map and metadata that I wanted, as well as a useful way of previewing results.&lt;/p&gt;
&lt;p&gt;I think I&amp;rsquo;ve probably stopped fiddling with the interface for now. I hope you find it useful!&lt;/p&gt;
&lt;h2 id=&#34;the-future&#34;&gt;The future?&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s more that I&amp;rsquo;d like to do with the georeferenced maps. In particular, I&amp;rsquo;ve been thinking about an interface with a slider that showed the changing patchwork of maps over time&amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related resources:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the code for the Georeferenced Maps Explorer and all the other apps and sites I created during my residency is &lt;a href=&#34;https://github.com/wragge/slv-demo-apps&#34;&gt;in this GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;the code to harvest the georeferenced data from Allmaps and build the dashboard is in &lt;a href=&#34;https://github.com/wragge/slv-allmaps&#34;&gt;this GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;there&amp;rsquo;s also the &lt;a href=&#34;https://slv.wraggelabs.com&#34;&gt;full list of all the apps, code, posts, and talks&lt;/a&gt; created during my residency&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.17613/m8c1d-50882&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/doi-10.17613-m8c1d-d9b01c.svg&#34; style=&#34;border:none;width:200px&#34;&gt;&lt;/a&gt;&lt;/p&gt;
</description>
      <source:markdown>I&#39;m in the process of tying up all the documentation relating to my time as [Creative Technologist-in-Residence at the State Library of Victoria LAB](https://lab.slv.vic.gov.au/experiments/my-place-tim-sherratt/people-place-library-data-tim-sherratt). But as I was looking through [the list of outputs](https://slv.wraggelabs.com), I realised I&#39;d never written anything about the interface I created to explore georeferenced maps from the SLV collection.

I also remembered that there were a few improvements I wanted to make to the interface. So instead of spending a few hours writing up a blog post, I&#39;ve spent several days completely overhauling the [Georeferenced Maps Explorer](https://slv.wraggelabs.com/geomaps/). I&#39;m pretty happy with how it&#39;s working now. **[Have a play!](https://slv.wraggelabs.com/geomaps/)**

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/prom-maps.png&#34; width=&#34;600&#34; height=&#34;354&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Wilson&#39;s Prom, made up of a patchwork of georeferenced maps and aerial photographs in the Georeferenced Maps Explorer. &lt;a href=&#34;https://slv.wraggelabs.com/geomaps/&#34;&gt;Try it now!&lt;/a&gt;&lt;/figcaption&gt;&lt;/figure&gt;

To get started, just click on the basemap. Details of all georeferenced maps within 50km of your selected point will be displayed in the right-hand column. As you move your mouse over the list of results, the boundaries of the georeferenced maps will be displayed on the basemap. This gives you a preview of their location and size. Click on one of the results to display the georeferenced map as a layer on top of the modern basemap.

&lt;figure&gt;&lt;video src=&#34;https://cdn.uploads.micro.mov/8371/2026/video-2026-02-12-13-39-23/playlist.m3u8&#34; poster=&#34;https://cdn.uploads.micro.blog/8371/2026/frames/1679959-0-ad6b9b.jpg&#34; width=&#34;1920&#34; height=&#34;1080&#34; controls=&#34;controls&#34; preload=&#34;metadata&#34;&gt;&lt;/video&gt;&lt;figcaption&gt;Hover over a result to see the map boundaries&lt;/figcaption&gt;&lt;/figure&gt;

You can add as many maps as you like. If your selected maps overlap, you can change the order in which they&#39;re shown. Click on the layers icon in the top left of the basemap. You&#39;ll see a list of the maps that are currently displayed. Use the arrow buttons to move a map backwards or forwards. You can also use the sliders to adjust the opacity of each map. This can make it easier to examine the relationship between maps. For example, you might want to compare the features of a historic map with those of the underlying basemap.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/photomaps.png&#34; width=&#34;600&#34; height=&#34;363&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Stitch together multiple maps like this series of seven photomaps, and change the opacity to see the features underneath&lt;/figcaption&gt;&lt;/figure&gt;

The Explorer&#39;s URL updates with every selection you make, so you can bookmark or share a URL to return to the same position and collection of maps. For example, [this link](https://slv.wraggelabs.com/geomaps/?lat=-38.96774552450964&amp;lon=146.39004985426214&amp;map_id=215c1310ba3a4968&amp;map_id=4fe5fff0d41e3958&amp;map_id=87551272fa78f2bd&amp;map_id=2cc91e9b1b4bd533) will take you to the collection of maps of Wilson&#39;s Prom shown above.
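
Here&#39;s a minimal sketch of how that kind of URL syncing can work – the function and parameter names are illustrative, not the Explorer&#39;s actual code:

```javascript
// Keep the URL in sync with the current selection (sketch).
// Produces query strings like ?lat=-38.96&amp;lon=146.39&amp;map_id=abc&amp;map_id=def
function updateUrl(lat, lon, mapIds) {
  const params = new URLSearchParams({ lat, lon });
  mapIds.forEach((id) =&gt; params.append(&#34;map_id&#34;, id));
  // replaceState updates the address bar without polluting the browser history
  history.replaceState(null, &#34;&#34;, `?${params}`);
}
```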

## The background

If you missed the start of this journey back in November last year, you might be wondering what the georeferenced maps are and where they come from. During my SLV LAB residency, I found a way of [hooking the SLV&#39;s digitised maps up to a tool called Allmaps](https://updates.timsherratt.org/2025/11/04/turning-the-slvs-maps-into.html) that helps you identify points that connect historic maps to our modern coordinate system. When enough points have been identified, the historic maps can be positioned on a modern basemap. This is known as georeferencing, georectifying, or &#39;map warping&#39;, as the results can often appear skewed or warped.

Once I had connected things up, I invited the world (or at least the tiny part of it that follows me on social media) to help turn the SLV&#39;s maps into data. And they did! As of today, **1,447** of the SLV&#39;s digitised maps have been georeferenced. [This dashboard displays current georeferencing progress](https://wragge.github.io/slv-allmaps/dashboard.html).

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/visualization-3.png&#34; width=&#34;600&#34; height=&#34;211&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;The total number of SLV maps georeferenced over time. It&#39;s still going up!&lt;/figcaption&gt;&lt;/figure&gt;

There&#39;s still plenty more to do. If you&#39;d like to help, [the full instructions are available here](https://wragge.github.io/slv-allmaps/). Georeferencing is pretty fun, so why not have a go?

You can explore the current collection of georeferenced maps in a few different ways. There&#39;s a dataset you can [download](https://github.com/wragge/slv-allmaps/blob/main/georeferenced_maps.csv) or [search](https://glam-workbench.net/datasette-lite/?csv=https%3A%2F%2Fgithub.com%2Fwragge%2Fslv-allmaps%2Fblob%2Fmain%2Fgeoreferenced_maps_datasette.csv&amp;install=datasette-homepage-table&amp;install=datasette-json-html&amp;fts=manifest_title%2Cmap_title) that gets updated every two hours. This data is loaded into [a spatial database](https://slv-places-481615284700.australia-southeast1.run.app/) that&#39;s used by the [Georeferenced Maps Explorer](https://slv.wraggelabs.com/geomaps/). As part of my recent improvements, I&#39;ve automated this process as well, so the database should be updated with the latest additions every 24 hours.

You can also search for georeferenced maps using [the my place app](https://slv.wraggelabs.com/myplace/). You just enter an address and my place pulls together data from a variety of sources – mixing the georeferenced maps up with parish maps, newspapers, photos, and entries from the Sands &amp; MacDougall&#39;s directories.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-geo.png&#34; width=&#34;600&#34; height=&#34;296&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Georeferenced maps in &lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;my place&lt;/a&gt; results&lt;/figcaption&gt;&lt;/figure&gt;

## The interface

The [Georeferenced Maps Explorer](https://slv.wraggelabs.com/geomaps/) uses [MapLibre](https://maplibre.org) and the [Allmaps MapLibre plugin](https://github.com/allmaps/allmaps/tree/main/packages/maplibre) to display the georeferenced maps. You might notice that it looks pretty similar to the [Newspapers Explorer](https://slv.wraggelabs.com/newspapers/) and the [CUA Browser](https://slv.wraggelabs.com/cua/), both of which use MapLibre, as well as [Bulma](https://bulma.io) for CSS. I&#39;ve been trying to settle on a fairly standard set of tools that I can use to create and maintain these sorts of interfaces without too much fuss. Basically I just cut and paste a lot of stuff, then modify as needed.

When you click on the basemap in the Explorer, the coordinates are sent off to the spatial database to retrieve details of georeferenced maps within 50km. The spatial database runs in [Datasette](https://datasette.io), which has a built-in JSON API that I use with a set of predefined &#39;canned&#39; queries to pull back the data I need. The results are displayed in the right-hand column, along with square thumbnails generated by the SLV&#39;s [IIIF](https://iiif.io/) service.
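
As a sketch, a request from the Explorer looks something like this – the `maps_from_wkt` canned query and its parameters match the real query URLs, but treat the code itself as illustrative:

```javascript
// Fetch georeferenced maps within 50km of a clicked point (sketch).
// The canned query takes a WKT point (lon lat) and a distance in metres;
// _shape=array is Datasette&#39;s option for returning a plain JSON array of rows.
const DB = &#34;https://slv-places-481615284700.australia-southeast1.run.app&#34;;

async function mapsNear(lat, lon, distance = 50000) {
  const url =
    `${DB}/georeferenced_maps/maps_from_wkt.json` +
    `?wkt=POINT(${lon} ${lat})&amp;distance=${distance}&amp;_shape=array`;
  const response = await fetch(url);
  return response.json();
}
```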

The metadata includes distance and area measures, which are used to find and sort the results. There are two distance measures: one from your selected point to the closest boundary of a map, and the other to the centre of a map. If the point is contained within a map&#39;s boundaries, then the &#39;bounds&#39; distance is zero. The search query finds maps whose closest boundaries are within 50km. Originally I sorted the results by this distance and the area of the maps. But this meant that maps covering large areas that included the selected point (such as maps of the whole of Victoria) appeared above nearby local maps. To make it easier to find maps within an area, I added the &#39;centre&#39; distance and now sort the results using that. This allows nearby maps that don&#39;t include the current point to bubble up towards the top of the search results, above many of the large-area maps. It&#39;s far from perfect, but I think it strikes an OK balance.
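
As a rough sketch (with made-up field names), the sort now looks something like this:

```javascript
// Sort by distance to each map&#39;s centre, then by area, so that nearby
// local maps outrank state-wide sheets that happen to contain the point.
results.sort(
  (a, b) =&gt; a.centre_distance - b.centre_distance || a.area - b.area
);
```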

The data also includes the boundaries of each map as GeoJSON. I use this to generate a MapLibre layer that contains all the boundaries as polygons. The boundaries are hidden until you hover over the corresponding search result, then the opacity of the boundary is flipped to `1` and it magically appears.
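
One way to wire this up in MapLibre is with feature-state (a condensed sketch – the source and property names are my assumptions):

```javascript
// Boundary polygons stay invisible until their &#39;hover&#39; feature-state is set.
map.addSource(&#34;bounds&#34;, { type: &#34;geojson&#34;, data: boundaries, promoteId: &#34;map_id&#34; });
map.addLayer({
  id: &#34;map-bounds&#34;,
  type: &#34;line&#34;,
  source: &#34;bounds&#34;,
  paint: {
    &#34;line-color&#34;: &#34;#c0392b&#34;,
    &#34;line-width&#34;: 2,
    &#34;line-opacity&#34;: [&#34;case&#34;, [&#34;boolean&#34;, [&#34;feature-state&#34;, &#34;hover&#34;], false], 1, 0],
  },
});

// The hover events are attached to the search results, not the map (see below).
resultEl.addEventListener(&#34;mouseenter&#34;, () =&gt;
  map.setFeatureState({ source: &#34;bounds&#34;, id: mapId }, { hover: true })
);
resultEl.addEventListener(&#34;mouseleave&#34;, () =&gt;
  map.setFeatureState({ source: &#34;bounds&#34;, id: mapId }, { hover: false })
);
```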

When you click on a search result, a request is fired off to [Allmaps](https://allmaps.org) for the full georeferencing data. The Allmaps plugin uses this to retrieve the map image from the SLV&#39;s IIIF service and display the warped map in MapLibre.
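
In code, displaying a selected map looks roughly like this, based on the Allmaps MapLibre plugin&#39;s documented pattern (the annotation URL format is my assumption):

```javascript
import { WarpedMapLayer } from &#34;@allmaps/maplibre&#34;;

// A single WarpedMapLayer can hold multiple georeferenced maps.
const warpedLayer = new WarpedMapLayer();
map.addLayer(warpedLayer);

// Fetch the georeference annotation from Allmaps and render the warped map.
await warpedLayer.addGeoreferenceAnnotationByUrl(
  `https://annotations.allmaps.org/maps/${mapId}`
);
```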

I looked around for quite a while to find a good way of changing the opacity and order of the warped maps in MapLibre. I eventually found the [MapLibre GL Layer Manager](https://github.com/wragge/maplibre-gl-layer-manager), which did a lot of what I wanted. I [forked the repository](https://github.com/wragge/maplibre-gl-layer-manager) and modified the code to get the opacity slider working with warped map layers. Warped map layers already have a `setOpacity` method; it was just a matter of checking for &#39;custom&#39; layers, then finding where the warped map was in the layer object.

```javascript
if (type == &#34;custom&#34;) {
  // the Allmaps &#39;custom&#39; layer implementation provides its own setOpacity method
  layer.implementation.setOpacity(opacity);
}
```
I also made a few cosmetic changes, such as renaming the tooltips on the reorder buttons from &#39;move up&#39; and &#39;move down&#39; to &#39;send back&#39; and &#39;bring forward&#39; – up and down just confused me.

I tried for a long time to find some way of adding tooltips or popups to the warped maps that would show their details when you moved the mouse over them. I found that if you were displaying multiple maps that looked similar, such as the photomaps above, it was difficult to know which map was which. After a chat with the [Allmaps](https://allmaps.org) developers in their IIIF Slack channel, I realised that this approach wouldn&#39;t work as the warped map layers don&#39;t currently listen to mouse events. Instead I decided to add hover events to the list of results, rather than the maps, and use them to display the map boundaries as described above. This way I get the connection between the map and metadata that I wanted, as well as a useful way of previewing results.

I think I&#39;ve probably stopped fiddling with the interface for now. I hope you find it useful!

## The future?

There&#39;s more that I&#39;d like to do with the georeferenced maps. In particular, I&#39;ve been thinking about an interface with a slider that showed the changing patchwork of maps over time...

**Related resources:**

- the code for the Georeferenced Maps Explorer and all the other apps and sites I created during my residency is [in this GitHub repository](https://github.com/wragge/slv-demo-apps)
- the code to harvest the georeferenced data from Allmaps and build the dashboard is in [this GitHub repository](https://github.com/wragge/slv-allmaps)
- there&#39;s also the [full list of all the apps, code, posts, and talks](https://slv.wraggelabs.com) created during my residency


&lt;a href=&#34;https://doi.org/10.17613/m8c1d-50882&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/doi-10.17613-m8c1d-d9b01c.svg&#34; style=&#34;border:none;width:200px&#34;&gt;&lt;/a&gt;
</source:markdown>
    </item>
    
    <item>
      <title>my place – exploring SLV collections through a street address</title>
      <link>https://updates.timsherratt.org/2026/02/02/my-place-exploring-slv-collections.html</link>
      <pubDate>Mon, 02 Feb 2026 22:43:20 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2026/02/02/my-place-exploring-slv-collections.html</guid>
      <description>&lt;p&gt;&lt;em&gt;&amp;lsquo;What can I find out about my house?&amp;rsquo;&lt;/em&gt; My work as &lt;a href=&#34;https://lab.slv.vic.gov.au/experiments/my-place-tim-sherratt&#34;&gt;Creative Technologist-in-Residence at the SLV LAB&lt;/a&gt; was inspired by questions like this that librarians at the SLV hear every day. I wanted to explore how the Library&amp;rsquo;s place-based collections could be used to provide new entry points for discovery and navigation – entry points based not on words, but locations.&lt;/p&gt;
&lt;p&gt;At the end of my residency, I pulled all the different collections I&amp;rsquo;d been working with into a single interface – &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;. It&amp;rsquo;s not polished or complete, but I think it&amp;rsquo;s a useful starting point to think about the possibilities. You just type in an address, street name, or place name and my place shows you maps, photos, newspapers, and even extracts from the Sands &amp;amp; MacDougall directories. &lt;strong&gt;&lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;Try it now!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/screenshot-from-2026-02-02-17-44-07.png&#34; width=&#34;600&#34; height=&#34;433&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;&lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;Try &lt;b&gt;&lt;i&gt;my place!&lt;/i&gt;&lt;/b&gt;&lt;/a&gt; Just enter an address in the search box.&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Search results in my place are bookmarkable. So save and share your discoveries!&lt;/p&gt;
&lt;h2 id=&#34;the-collections&#34;&gt;The collections&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; draws its data from a number of different place-based collections that I&amp;rsquo;ve been working on during my residency.&lt;/p&gt;
&lt;h3 id=&#34;openstreetmap&#34;&gt;OpenStreetMap&lt;/h3&gt;
&lt;p&gt;When you enter an address in the search box, &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; looks it up in &lt;a href=&#34;https://www.openstreetmap.org&#34;&gt;OpenStreetMap&lt;/a&gt; to get its geospatial coordinates. It then places a marker and re-centres the map at the top of the app.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp.png&#34; width=&#34;600&#34; height=&#34;219&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Map centred on 149 Brunswick Street, Fitzroy&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;OpenStreetMap is also used to retrieve additional information about the suburb, including its boundaries.&lt;/p&gt;
&lt;h3 id=&#34;sands--macdougalls-directories&#34;&gt;Sands &amp;amp; MacDougall&amp;rsquo;s directories&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; queries the &lt;a href=&#34;https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html&#34;&gt;full-text searchable version of Sands &amp;amp; Mac&lt;/a&gt; for addresses. Results will vary based on the OCR quality and the nature of the query, but they can give you a potted history of who has lived in your house. The search results are displayed in chronological order, and include an &lt;a href=&#34;https://updates.timsherratt.org/2025/11/16/some-sands-mac-tweaks-thanks.html&#34;&gt;image snippet&lt;/a&gt; showing the actual printed entry as well as the text content and metadata.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-sandm.png&#34; width=&#34;600&#34; height=&#34;349&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Occupants of 149 Brunswick Street, Fitzroy from 1875 to 1925&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h3 id=&#34;committee-for-urban-action-photographs&#34;&gt;Committee for Urban Action photographs&lt;/h3&gt;
&lt;p&gt;If you enter a full street address, &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; will search &lt;a href=&#34;https://updates.timsherratt.org/2026/01/29/geolocating-photos-from-the-slvs.html&#34;&gt;the CUA collection&lt;/a&gt; for photos associated with the segment of road that includes the current address. It then displays the individual images from any matching photosets.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-cua.png&#34; width=&#34;600&#34; height=&#34;432&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Photographs from CUA of the currently selected road&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Otherwise &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; will look for CUA photos that are near the current location and display a randomly-selected image from each photoset.&lt;/p&gt;
&lt;h3 id=&#34;georeferenced-maps&#34;&gt;Georeferenced maps&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; searches through &lt;a href=&#34;https://updates.timsherratt.org/2025/11/04/turning-the-slvs-maps-into.html&#34;&gt;digitised maps from the SLV collection that have been georeferenced by the public&lt;/a&gt;. It finds maps that either intersect with the currently selected location, or are nearby.&lt;/p&gt;
&lt;p&gt;If you enter a full street address, the first 6 georeferenced maps will be positioned on a modern basemap with a marker indicating the currently selected point. This means you can see your address on a historical map. The number of georeferenced maps that can be displayed in this way is determined by the browser – so I&amp;rsquo;ve limited it to 6 to be safe.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-geo.png&#34; width=&#34;600&#34; height=&#34;296&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Georeferenced maps positioned on a modern basemap, showing the location of the currently selected address&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h3 id=&#34;parish-maps&#34;&gt;Parish maps&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; searches through &lt;a href=&#34;https://updates.timsherratt.org/2025/10/06/creating-bounding-boxes-for-parish.html&#34;&gt;parish maps in the SLV collection that have geospatial coordinates or approximate bounding boxes&lt;/a&gt;. It finds maps that either intersect with the currently selected location, or are nearby.&lt;/p&gt;
&lt;h3 id=&#34;newspapers&#34;&gt;Newspapers&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; searches through &lt;a href=&#34;https://updates.timsherratt.org/2025/12/16/exploring-victorian-newspapers.html&#34;&gt;my dataset of newspapers in the SLV collection&lt;/a&gt; that have a place of publication documented in the &amp;lsquo;Place newspaper published&amp;rsquo; metadata field. It finds newspapers that are either associated with the current suburb/town, or a nearby suburb/town. This includes digitised and non-digitised titles. Digitised titles include a link to Trove.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-newspapers.png&#34; width=&#34;600&#34; height=&#34;293&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Newspapers from the SLV collection published in Fitzroy&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h3 id=&#34;photographs&#34;&gt;Photographs&lt;/h3&gt;
&lt;p&gt;I thought it would be cool to include a few photographs of the current suburb or town. To do this, I downloaded a list of place names from VicNames, then used the place names to &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/download_place_images.ipynb&#34;&gt;search the SLV catalogue for photographs with relevant subject headings&lt;/a&gt;. A random selection of the harvested images is displayed in &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-images.png&#34; width=&#34;600&#34; height=&#34;234&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;A few images of Fitzroy, displayed alongside a map of Fitzroy&#39;s current boundaries using data from OpenStreetMap&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;the-interface&#34;&gt;The interface&lt;/h2&gt;
&lt;p&gt;The interface is pretty simple. You type an address in the box and hit enter. If the geocoding process finds multiple matches, it&amp;rsquo;ll give you a list to choose from. Once the location is found, a marker is added and the main map re-centres. Then related resources are displayed below the map.&lt;/p&gt;
&lt;p&gt;As you scroll down through the results you gradually zoom out from your initial starting point. This is reflected in the four bands or layers used to group resources: &amp;lsquo;my house&amp;rsquo;, &amp;lsquo;my street&amp;rsquo;, &amp;lsquo;my suburb&amp;rsquo;, and &amp;lsquo;nearby&amp;rsquo;. Each band contains a mix of resources from different collections.&lt;/p&gt;
&lt;p&gt;When I started working on &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;, I was thinking about a project from around 2010 called &lt;a href=&#34;https://wraggelabs.com/info/history-wall/&#34;&gt;The History Wall&lt;/a&gt;. Like &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;, The History Wall pulled many different types of resources together into a rich exploratory interface. As you scrolled through The History Wall you moved through time, with randomly selected items appearing from a range of sources including Trove newspapers, the ADB, and museum collections.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/history-wall.jpg&#34; width=&#34;600&#34; height=&#34;505&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;A version of The History Wall created for the National Museum of Australia&#39;s &#39;Irish in Australia&#39; exhibition&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;I originally thought I&amp;rsquo;d inject some of the same randomness into &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;, but I was worried it might just get too confusing. I thought it was important to keep the relationship between the starting point and the resources in focus even as you zoomed out. So my visual metaphor shifted to something more like a blast radius map, or a stratigraphic diagram, that displayed distinct groups and layers as you moved beyond the baseline. My limited CSS skills couldn&amp;rsquo;t make the vision in my head a reality, but there are lots of headings and colours instead to highlight the transitions!&lt;/p&gt;
&lt;p&gt;The actual mix of groups and layers displayed depends on the nature of your query. If you&amp;rsquo;ve entered a complete street address, and there are results for that address in Sands &amp;amp; Mac, then you&amp;rsquo;ll see &amp;lsquo;my house&amp;rsquo;, &amp;lsquo;my street&amp;rsquo;, &amp;lsquo;my suburb&amp;rsquo;, and &amp;lsquo;nearby&amp;rsquo;. If you&amp;rsquo;ve only entered a suburb or town, or your street address can&amp;rsquo;t be found, you&amp;rsquo;ll see two layers starting with &amp;lsquo;my suburb&amp;rsquo;.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s an overview of what you might expect to see.&lt;/p&gt;
&lt;h3 id=&#34;my-house&#34;&gt;my house&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Sands &amp;amp; MacDougall extracts (text search on full address)&lt;/li&gt;
&lt;li&gt;georeferenced maps (search for maps that contain the base point)&lt;/li&gt;
&lt;li&gt;parish maps (search for maps that contain the base point)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;my-street&#34;&gt;my street&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;CUA photos (search for matching street identifiers)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;if there&amp;rsquo;s no &amp;lsquo;my house&amp;rsquo; layer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sands &amp;amp; MacDougall extracts (text search on street name and suburb)&lt;/li&gt;
&lt;li&gt;georeferenced maps (search for intersections between maps and street)&lt;/li&gt;
&lt;li&gt;parish maps (search for intersections between maps and street)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;my-suburbtown&#34;&gt;my suburb/town&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;suburb boundaries from OSM&lt;/li&gt;
&lt;li&gt;images (search for suburb name in metadata)&lt;/li&gt;
&lt;li&gt;newspapers (search for suburb name in metadata)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;if there&amp;rsquo;s no &amp;lsquo;my house&amp;rsquo; or &amp;lsquo;my street&amp;rsquo; layer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;georeferenced maps (search for intersections between maps and suburb boundaries)&lt;/li&gt;
&lt;li&gt;parish maps (search for intersections between maps and suburb boundaries)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;nearby&#34;&gt;nearby&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;CUA photos (search for photosets within 5km of the base point, filtered to remove current street)&lt;/li&gt;
&lt;li&gt;georeferenced maps (search for maps within 10km of base point, ordered by distance, max of 24 displayed)&lt;/li&gt;
&lt;li&gt;parish maps (search for maps within 10km of base point, ordered by distance, max of 24 displayed)&lt;/li&gt;
&lt;li&gt;newspapers (search for newspapers within 100km of base point, ordered by distance, max of 24 displayed)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;the-data&#34;&gt;The data&lt;/h2&gt;
&lt;p&gt;Most of the data used in &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; is stored in two SQLite databases – &lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;one for Sands &amp;amp; Mac&lt;/a&gt;, and &lt;a href=&#34;https://slv-places-481615284700.australia-southeast1.run.app/&#34;&gt;the other for CUA, georeferenced maps, parish maps, and newspapers&lt;/a&gt;. The metadata for the collection images is stored in &lt;a href=&#34;https://raw.githubusercontent.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/refs/heads/main/place_images.json&#34;&gt;a JSON file&lt;/a&gt; that is directly loaded by the interface.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve published the SQLite databases online using &lt;a href=&#34;https://datasette.io&#34;&gt;Datasette&lt;/a&gt; and &lt;a href=&#34;https://www.gaia-gis.it/fossil/libspatialite/index&#34;&gt;Spatialite&lt;/a&gt;. Spatialite makes it possible to find geospatial features that intersect, or are near, a given point. For example, you could find maps that include a specific set of coordinates.&lt;/p&gt;
&lt;p&gt;Datasette has the ability to create &lt;a href=&#34;https://docs.datasette.io/en/stable/sql_queries.html#canned-queries&#34;&gt;&amp;lsquo;canned queries&amp;rsquo;&lt;/a&gt; that feed URL parameters into pre-defined SQL queries. This, coupled with Datasette&amp;rsquo;s &lt;a href=&#34;https://docs.datasette.io/en/stable/json_api.html&#34;&gt;built-in JSON API&lt;/a&gt;, makes it possible to construct query URLs in &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; and use them to retrieve JSON results from my databases.&lt;/p&gt;
&lt;p&gt;When you enter an address in &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;, multiple queries are fired off to find intersecting or nearby resources. For example, this URL finds georeferenced maps within 10km of a point at the centre of Fitzroy: &lt;a href=&#34;https://slv-places-481615284700.australia-southeast1.run.app/georeferenced_maps/maps_from_wkt.json?wkt=POINT(144.977468%20-37.803143)&amp;amp;distance=10000&amp;amp;_shape=array&#34;&gt;maps_from_wkt.json?wkt=POINT(144.977468 -37.803143)&amp;amp;distance=10000&amp;amp;_shape=array&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the case of Sands &amp;amp; Mac, &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; uses a canned query that runs a full-text search across the OCRd content of a volume. Suburb names are often abbreviated in Sands &amp;amp; Mac, so the app first runs a query to find possible abbreviations, then adds them into the main query to inject a bit of fuzziness. This is repeated for all 24 digitised volumes.&lt;/p&gt;
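&lt;p&gt;As a rough sketch (the endpoint and helper names here are illustrative, not the actual canned queries):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-javascript&#34;&gt;// Sketch: expand the suburb to its likely abbreviations, then run the
// full-text search against each digitised volume in turn.
const abbrevs = await getAbbreviations(&#34;Fitzroy&#34;); // e.g. [&#34;Fitzroy&#34;, &#34;Fitz&#34;]
const ftsQuery = `&#34;149 Brunswick&#34; AND (${abbrevs.join(&#34; OR &#34;)})`;
for (const volume of volumes) {
  const rows = await fetch(
    `${SANDS_DB}/${volume}/search.json?query=${encodeURIComponent(ftsQuery)}`
  ).then((r) =&gt; r.json());
  // ...collect the rows and display them in chronological order
}
&lt;/code&gt;&lt;/pre&gt;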
&lt;p&gt;Once the metadata is retrieved from the databases, images are loaded from the SLV&amp;rsquo;s IIIF service.&lt;/p&gt;
&lt;h2 id=&#34;next-steps&#34;&gt;Next steps?&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m not sure how much more work I&amp;rsquo;ll do on &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;, but there are a few things I&amp;rsquo;d like to try. In particular, I&amp;rsquo;d like to help the user understand more about what data is being presented, or not presented, and why. Not all digitised maps have been georeferenced, not all parish maps have coordinates, street numbers have changed, and the OCR in Sands &amp;amp; Mac varies in quality. &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; can only present a sample – a gesture towards the wealth of material available from the SLV. I feel that message needs to be made more explicit, though I&amp;rsquo;m not sure how to do that without overloading the interface.&lt;/p&gt;
&lt;p&gt;There are additional data sources I&amp;rsquo;d like to play around with. &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; already includes some code to query &lt;a href=&#34;https://www.wikidata.org/&#34;&gt;Wikidata&lt;/a&gt; for more information about a suburb. But I haven&amp;rsquo;t had a chance to do anything with it. I&amp;rsquo;d like to be able to provide additional contextual information from outside the SLV, such as electoral boundaries, populations, even election results. It would also be fun to display sightings of plants and animals from the &lt;a href=&#34;https://www.ala.org.au&#34;&gt;Atlas of Living Australia&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What can I find out about my house? It would be great if &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; could answer that question by taking the user on an open-ended journey through our cultural, historical, and environmental landscape.&lt;/p&gt;
</description>
      <source:markdown>*&#39;What can I find out about my house?&#39;* My work as [Creative Technologist-in-Residence at the SLV LAB](https://lab.slv.vic.gov.au/experiments/my-place-tim-sherratt) was inspired by questions like this that librarians at the SLV hear every day. I wanted to explore how the Library&#39;s place-based collections could be used to provide new entry points for discovery and navigation – entry points based not on words, but locations.

At the end of my residency, I pulled all the different collections I&#39;d been working with into a single interface – ***my place***. It&#39;s not polished or complete, but I think it&#39;s a useful starting point to think about the possibilities. You just type in an address, street name, or place name and my place shows you maps, photos, newspapers, and even extracts from the Sands &amp; MacDougall directories. **[Try it now!](https://slv.wraggelabs.com/myplace/)**

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/screenshot-from-2026-02-02-17-44-07.png&#34; width=&#34;600&#34; height=&#34;433&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;&lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;Try &lt;b&gt;&lt;i&gt;my place!&lt;/i&gt;&lt;/b&gt;&lt;/a&gt; Just enter an address in the search box.&lt;/figcaption&gt;&lt;/figure&gt;

Search results in my place are bookmarkable. So save and share your discoveries!

## The collections

***my place*** draws its data from a number of different place-based collections that I&#39;ve been working on during my residency.

### OpenStreetMap

When you enter an address in the search box, ***my place*** looks it up in [OpenStreetMap](https://www.openstreetmap.org) to get its geospatial coordinates. It then places a marker and re-centres the map at the top of the app.
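
A minimal sketch of that lookup, assuming OpenStreetMap&#39;s Nominatim geocoder (the app may use a different service):

```javascript
// Geocode an address against OpenStreetMap data (sketch).
async function geocode(address) {
  const url =
    `https://nominatim.openstreetmap.org/search` +
    `?q=${encodeURIComponent(address)}&amp;format=json&amp;countrycodes=au`;
  const response = await fetch(url);
  const results = await response.json();
  // Each result carries lat/lon plus a display name for disambiguation.
  return results.map((r) =&gt; ({ lat: +r.lat, lon: +r.lon, name: r.display_name }));
}
```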

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp.png&#34; width=&#34;600&#34; height=&#34;219&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Map centred on 149 Brunswick Street, Fitzroy&lt;/figcaption&gt;&lt;/figure&gt;

OpenStreetMap is also used to retrieve additional information about the suburb, including its boundaries.

### Sands &amp; MacDougall&#39;s directories

***my place*** queries the [full-text searchable version of Sands &amp; Mac](https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html) for addresses. Results will vary based on the OCR quality and the nature of the query, but they can give you a potted history of who has lived in your house. The search results are displayed in chronological order, and include an [image snippet](https://updates.timsherratt.org/2025/11/16/some-sands-mac-tweaks-thanks.html) showing the actual printed entry as well as the text content and metadata.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-sandm.png&#34; width=&#34;600&#34; height=&#34;349&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Occupants of 149 Brunswick Street, Fitzroy from 1875 to 1925&lt;/figcaption&gt;&lt;/figure&gt;

### Committee for Urban Action photographs

If you enter a full street address, ***my place*** will search [the CUA collection](https://updates.timsherratt.org/2026/01/29/geolocating-photos-from-the-slvs.html) for photos associated with the segment of road that includes the current address. It then displays the individual images from any matching photosets.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-cua.png&#34; width=&#34;600&#34; height=&#34;432&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Photographs from CUA of the currently selected road&lt;/figcaption&gt;&lt;/figure&gt;

Otherwise ***my place*** will look for CUA photos that are near the current location and display a randomly-selected image from each photoset.

### Georeferenced maps

***my place*** searches through [digitised maps from the SLV collection that have been georeferenced by the public](https://updates.timsherratt.org/2025/11/04/turning-the-slvs-maps-into.html). It finds maps that either intersect with the currently selected location, or are nearby.

If you enter a full street address, the first 6 georeferenced maps will be positioned on a modern basemap with a marker indicating the currently selected point. This means you can see your address on a historical map. The number of georeferenced maps that can be displayed in this way is determined by the browser – so I&#39;ve limited it to 6 to be safe.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-geo.png&#34; width=&#34;600&#34; height=&#34;296&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Georeferenced maps positioned on a modern basemap, showing the location of the currently selected address&lt;/figcaption&gt;&lt;/figure&gt;

### Parish maps

***my place*** searches through [parish maps in the SLV collection that have geospatial coordinates or approximate bounding boxes](https://updates.timsherratt.org/2025/10/06/creating-bounding-boxes-for-parish.html). It finds maps that either intersect with the currently selected location, or are nearby.

### Newspapers

***my place*** searches through [my dataset of newspapers in the SLV collection](https://updates.timsherratt.org/2025/12/16/exploring-victorian-newspapers.html) that have a place of publication documented in the &#39;Place newspaper published&#39; metadata field. It finds newspapers that are either associated with the current suburb/town, or a nearby suburb/town. This includes digitised and non-digitised titles. Digitised titles include a link to Trove.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-newspapers.png&#34; width=&#34;600&#34; height=&#34;293&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Newspapers from the SLV collection published in Fitzroy&lt;/figcaption&gt;&lt;/figure&gt;

### Photographs

I thought it would be cool to include a few photographs of the current suburb or town. To do this, I downloaded a list of place names from VicNames, then used the place names to [search the SLV catalogue for photographs with relevant subject headings](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/download_place_images.ipynb). A random selection of the harvested images is displayed in ***my place***.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-images.png&#34; width=&#34;600&#34; height=&#34;234&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;A few images of Fitzroy, displayed alongside a map of Fitzroy&#39;s current boundaries using data from OpenStreetMap&lt;/figcaption&gt;&lt;/figure&gt;

## The interface

The interface is pretty simple. You type an address in the box and hit enter. If the geocoding process finds multiple matches, it&#39;ll give you a list to choose from. Once the location is found, a marker is added and the main map re-centres. Then related resources are displayed below the map.

As you scroll down through the results you gradually zoom out from your initial starting point. This is reflected in the four bands or layers used to group resources: &#39;my house&#39;, &#39;my street&#39;, &#39;my suburb&#39;, and &#39;nearby&#39;. Each band contains a mix of resources from different collections.

When I started working on ***my place***, I was thinking about a project from around 2010 called [The History Wall](https://wraggelabs.com/info/history-wall/). Like ***my place***, The History Wall pulled many different types of resources together into a rich exploratory interface. As you scrolled through The History Wall you moved through time, with randomly selected items appearing from a range of sources including Trove newspapers, the ADB, and museum collections.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/history-wall.jpg&#34; width=&#34;600&#34; height=&#34;505&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;A version of The History Wall created for the National Museum of Australia&#39;s &#39;Irish in Australia&#39; exhibition&lt;/figcaption&gt;&lt;/figure&gt;

I originally thought I&#39;d inject some of the same randomness into ***my place***, but I was worried it might just get too confusing. I thought it was important to keep the relationship between the starting point and the resources in focus even as you zoomed out. So my visual metaphor shifted to something more like a blast radius map, or a stratigraphic diagram, displaying distinct groups and layers as you moved beyond the baseline. My limited CSS skills couldn&#39;t make the vision in my head a reality, but there are lots of headings and colours instead to highlight the transitions!

The actual mix of groups and layers displayed depends on the nature of your query. If you&#39;ve entered a complete street address, and there are results for that address in Sands &amp; Mac, then you&#39;ll see &#39;my house&#39;, &#39;my street&#39;, &#39;my suburb&#39;, and &#39;nearby&#39;. If you&#39;ve only entered a suburb or town, or your street address can&#39;t be found, you&#39;ll see two layers starting with &#39;my suburb&#39;.

Here&#39;s an overview of what you might expect to see.

### my house

- Sands &amp; MacDougall extracts (text search on full address)
- georeferenced maps (search for maps that contain the base point)
- parish maps (search for maps that contain the base point)

### my street 

- CUA photos (search for matching street identifiers)

if there&#39;s no &#39;my house&#39; layer:

- Sands &amp; MacDougall extracts (text search on street name and suburb)
- georeferenced maps (search for intersections between maps and street)
- parish maps (search for intersections between maps and street)

### my suburb/town

- suburb boundaries from OSM
- images (search for suburb name in metadata)
- newspapers (search for suburb name in metadata)

if there&#39;s no &#39;my house&#39; or &#39;my street&#39; layer:

- georeferenced maps (search for intersections between maps and suburb boundaries)
- parish maps (search for intersections between maps and suburb boundaries)

### nearby

- CUA photos (search for photosets within 5km of the base point, filtered to remove current street)
- georeferenced maps (search for maps within 10km of base point, ordered by distance, max of 24 displayed)
- parish maps (search for maps within 10km of base point, ordered by distance, max of 24 displayed)
- newspapers (search for newspapers within 100km of base point, ordered by distance, max of 24 displayed)

## The data

Most of the data used in ***my place*** is stored in two SQLite databases – [one for Sands &amp; Mac](https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/), and [the other for CUA, georeferenced maps, parish maps, and newspapers](https://slv-places-481615284700.australia-southeast1.run.app/). The metadata for the collection images is stored in [a JSON file](https://raw.githubusercontent.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/refs/heads/main/place_images.json) that is directly loaded by the interface.

I&#39;ve published the SQLite databases online using [Datasette](https://datasette.io) and [Spatialite](https://www.gaia-gis.it/fossil/libspatialite/index). Spatialite makes it possible to find geospatial features that intersect, or are near, a given point. For example, you could find maps that include a specific set of coordinates.
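
Something like this Python sketch shows the idea – the database, table, and column names are hypothetical, but `ST_Contains` and `MakePoint` are standard Spatialite functions:

```python
import sqlite3

conn = sqlite3.connect(&#34;maps.db&#34;)      # hypothetical database file
conn.enable_load_extension(True)
conn.load_extension(&#34;mod_spatialite&#34;)  # load the Spatialite extension

# Find maps whose boundaries contain a point near the centre of Fitzroy
# (the table and column names are made up for illustration)
sql = &#34;&#34;&#34;
    SELECT title FROM maps
    WHERE ST_Contains(geometry, MakePoint(144.977468, -37.803143, 4326))
&#34;&#34;&#34;
for (title,) in conn.execute(sql):
    print(title)
```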

Datasette has the ability to create [&#39;canned queries&#39;](https://docs.datasette.io/en/stable/sql_queries.html#canned-queries) that feed url parameters into pre-defined SQL queries. This, coupled with Datasette&#39;s [built-in JSON API](https://docs.datasette.io/en/stable/json_api.html), makes it possible to construct query urls in ***my place*** and use them to retrieve JSON results from my databases.

When you enter an address in ***my place***, multiple queries are fired off to find intersecting or nearby resources. For example, this url finds georeferenced maps within 10km of a point at the centre of Fitzroy: [slv-places-481615284700.australia-southeast1.run.app/georefere...](https://slv-places-481615284700.australia-southeast1.run.app/georeferenced_maps/maps_from_wkt.json?wkt=POINT(144.977468%20-37.803143)&amp;distance=10000&amp;_shape=array).
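
If you want to reuse these queries, it&#39;s just a matter of fetching the JSON. Here&#39;s a minimal Python sketch based on the url above (assuming only the `requests` library):

```python
import requests

BASE = &#34;https://slv-places-481615284700.australia-southeast1.run.app&#34;

def maps_near(longitude, latitude, distance=10000):
    &#34;&#34;&#34;Query the canned query for georeferenced maps near a point.&#34;&#34;&#34;
    wkt = f&#34;POINT({longitude} {latitude})&#34;
    response = requests.get(
        f&#34;{BASE}/georeferenced_maps/maps_from_wkt.json&#34;,
        params={&#34;wkt&#34;: wkt, &#34;distance&#34;: distance, &#34;_shape&#34;: &#34;array&#34;},
    )
    response.raise_for_status()
    return response.json()

# Georeferenced maps within 10km of the centre of Fitzroy
for record in maps_near(144.977468, -37.803143):
    print(record)
```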

In the case of Sands &amp; Mac, ***my place*** uses a canned query that runs a full-text search across the OCRd content of a volume. Suburb names are often abbreviated in Sands &amp; Mac, so the app first runs a query to find possible abbreviations, then adds them into the main query to inject a bit of fuzziness. This is repeated for all 24 digitised volumes.
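
I won&#39;t reproduce the canned query here, but the idea of injecting fuzziness looks something like this sketch – a hypothetical helper using SQLite FTS5 syntax, with a made-up abbreviation:

```python
def build_fts_query(address, suburb, abbreviations):
    &#34;&#34;&#34;Combine an address with possible suburb abbreviations into an FTS5 query.&#34;&#34;&#34;
    options = &#34; OR &#34;.join(f&#39;&#34;{s}&#34;&#39; for s in [suburb] + abbreviations)
    return f&#39;&#34;{address}&#34; AND ({options})&#39;

# &#39;Fitz.&#39; is a hypothetical abbreviation, for illustration only
print(build_fts_query(&#34;149 Brunswick Street&#34;, &#34;Fitzroy&#34;, [&#34;Fitz.&#34;]))
# &#34;149 Brunswick Street&#34; AND (&#34;Fitzroy&#34; OR &#34;Fitz.&#34;)
```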

Once the metadata is retrieved from the databases, images are loaded from the SLV&#39;s IIIF service.

## Next steps?

I&#39;m not sure how much more work I&#39;ll do on ***my place***, but there are a few things I&#39;d like to try. In particular, I&#39;d like to help the user understand more about what data is being presented, or not presented, and why. Not all digitised maps have been georeferenced, not all parish maps have coordinates, street numbers have changed, and the OCR in Sands &amp; Mac varies in quality. ***my place*** can only present a sample – a gesture towards the wealth of material available from the SLV. I feel that message needs to be made more explicit, though I&#39;m not sure how to do that without overloading the interface.

There are additional data sources I&#39;d like to play around with. ***my place*** already includes some code to query [Wikidata](https://www.wikidata.org/) for more information about a suburb. But I haven&#39;t had a chance to do anything with it. I&#39;d like to be able to provide additional contextual information from outside the SLV, such as electoral boundaries, populations, even election results. It would also be fun to display sightings of plants and animals from the [Atlas of Living Australia](https://www.ala.org.au).

What can I find out about my house? It would be great if ***my place*** could answer that question by taking the user on an open-ended journey through our cultural, historical, and environmental landscape.





</source:markdown>
    </item>
    
    <item>
      <title>Geolocating photos from the SLV&#39;s Committee for Urban Action collection</title>
      <link>https://updates.timsherratt.org/2026/01/29/geolocating-photos-from-the-slvs.html</link>
      <pubDate>Thu, 29 Jan 2026 18:08:06 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2026/01/29/geolocating-photos-from-the-slvs.html</guid>
      <description>&lt;p&gt;Concerned about the loss of built heritage in the 1970s, the Committee for Urban Action photographed streetscapes across urban and regional Victoria. They compiled a remarkable collection of photographs that is &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;amp;collectionId=81271917420007636&#34;&gt;now being digitised by the State Library of Victoria&lt;/a&gt;. More than 20,000 images are already available online!&lt;/p&gt;
&lt;p&gt;The CUA worked systematically, capturing photos street by street, and recording the locations of each set of photographs. This information is used to prepare the title attached to each photo as it&amp;rsquo;s uploaded to the SLV catalogue. In general, titles include the name of the road where the photo was taken, the name of the suburb or town, and the names of two intersecting roads that define the boundaries of the current road segment. They can also tell you which side of the road the photo was taken on. For example, the title &lt;code&gt;Gore Street, Fitzroy, from Gertrude Street to Webb Street - east side&lt;/code&gt; tells us the photo was taken on the east side of Gore Street, Fitzroy between the intersections with Gertrude Street and Webb Street.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/cua-slv-viewer.png&#34; width=&#34;600&#34; height=&#34;547&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Photos from Gore Street, Fitzroy &lt;a href=&#34;https://viewer.slv.vic.gov.au/?entity=IE7489506&amp;mode=browse&#34;&gt;displayed in the SLV image viewer&lt;/a&gt;&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;It&amp;rsquo;s great to have this sort of structured information linking photos to specific locations, but to navigate through the collection &lt;em&gt;in space&lt;/em&gt; we need more. We need to link each photo to a set of geospatial coordinates by mapping each road segment. That was the challenge I took on as part of &lt;a href=&#34;https://lab.slv.vic.gov.au/experiments/my-place-tim-sherratt&#34;&gt;my residency in the SLV LAB&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When I started working on the collection I wasn&amp;rsquo;t really sure what was possible. I had to learn a lot, and ended up revising my processes multiple times as I got deeper into the data. But my aim was always to create some sort of map-based interface that would allow users to click on a street and see any associated CUA photos. It&amp;rsquo;s still a bit buggy and incomplete, but here it is – &lt;a href=&#34;https://slv.wraggelabs.com/cua/&#34;&gt;&lt;strong&gt;explore the CUA collection street by street&lt;/strong&gt;&lt;/a&gt;!&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/cua-browser.png&#34; width=&#34;600&#34; height=&#34;513&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Gore Street, Fitzroy &lt;a href=&#34;https://slv.wraggelabs.com/cua/?photoset=gore-street-fitzroy-gertrude-street-webb-street&#34;&gt;in the new CUA Browser&lt;/a&gt;&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;the-process&#34;&gt;The process&lt;/h2&gt;
&lt;p&gt;My basic plan was to find the intersections using &lt;a href=&#34;https://www.openstreetmap.org/&#34;&gt;OpenStreetMap&lt;/a&gt;, then extract geospatial information about the segment of road between the two intersections. This involved much trial and error, but eventually I ended up with a process that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;parsed each item title to try and extract the names of the main road, the suburb, and the two intersecting roads&lt;/li&gt;
&lt;li&gt;queried &lt;a href=&#34;https://nominatim.org&#34;&gt;Nominatim&lt;/a&gt; for the suburb bounding box&lt;/li&gt;
&lt;li&gt;for each intersecting road, queried OSM to find a node at, or around, its intersection with the main road, within the suburb bounding box&lt;/li&gt;
&lt;li&gt;created a new bounding box from the coordinates of the two intersections&lt;/li&gt;
&lt;li&gt;queried OSM for the main road within this bounding box&lt;/li&gt;
&lt;li&gt;extracted the coordinates of the main road segment, removing any points outside of the bounding box&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are more details below and in these notebooks: &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua_finding_intersections.ipynb&#34;&gt;cua_finding_intersections.ipynb&lt;/a&gt; and &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua_data_processing.ipynb&#34;&gt;cua_data_processing.ipynb&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;finding-intersections&#34;&gt;Finding intersections&lt;/h2&gt;
&lt;p&gt;As described, the title of each photograph generally includes 4 pieces of information: the road, suburb, intersecting roads, and side. My plan was to find the intersections first to get the limits of the road segment. This is possible thanks to the awesome &lt;a href=&#34;https://www.openstreetmap.org/&#34;&gt;OpenStreetMap&lt;/a&gt; and its &lt;a href=&#34;https://wiki.openstreetmap.org/wiki/Overpass_API&#34;&gt;Overpass API&lt;/a&gt;. It took me a while to get my head around the Overpass query language, but there are lots of &lt;a href=&#34;https://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_API_by_Example&#34;&gt;useful examples online&lt;/a&gt;. The query to find the intersection between Gore Street and Gertrude Street in Fitzroy looks like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[bbox:-37.8089071,144.9732006,-37.7929130,144.9851430];
way[&#39;highway&#39;][name=&amp;quot;Gore Street&amp;quot;];
node(w)-&amp;gt;.n1;
way[&#39;highway&#39;][name=&amp;quot;Gertrude Street&amp;quot;];
node(w)-&amp;gt;.n2;
node.n1.n2; 
out body;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can &lt;a href=&#34;https://overpass-turbo.eu/s/2jq8&#34;&gt;try it out&lt;/a&gt; using Overpass Turbo&amp;rsquo;s web interface.&lt;/p&gt;
&lt;p&gt;In OpenStreetMap, linear features, such as roads or rivers, are represented as &lt;a href=&#34;https://wiki.openstreetmap.org/wiki/Way&#34;&gt;&lt;code&gt;ways&lt;/code&gt;&lt;/a&gt;. Each way is made up of a series of &lt;code&gt;nodes&lt;/code&gt; or points with geospatial coordinates. Every way and node has its own unique identifier. Tags can be added to features to describe what type of things they are.&lt;/p&gt;
&lt;p&gt;The query above looks for &lt;code&gt;ways&lt;/code&gt; named &amp;lsquo;Gore Street&amp;rsquo; and &amp;lsquo;Gertrude Street&amp;rsquo; that are tagged as &lt;code&gt;highway&lt;/code&gt; (a &lt;code&gt;highway&lt;/code&gt; in OpenStreetMap is any road-like feature, including things like bike paths and foot trails).&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;way[&#39;highway&#39;][name=&amp;quot;Gore Street&amp;quot;];
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It then extracts the nodes that make up each way and looks to see if there are any nodes in common between the two ways. A node shared between two ways indicates an intersection.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;node(w)-&amp;gt;.n2;
node.n1.n2; 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The query is limited using a bounding box that encloses the suburb of Fitzroy. This avoids false positives and keeps down the query load.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[bbox:-37.8089071,144.9732006,-37.7929130,144.9851430];
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The JSON result of this query gives us the latitude and longitude of the node at the intersection of the two roads.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;{
  &amp;quot;version&amp;quot;: 0.6,
  &amp;quot;generator&amp;quot;: &amp;quot;Overpass API 0.7.62.10 2d4cfc48&amp;quot;,
  &amp;quot;osm3s&amp;quot;: {
    &amp;quot;timestamp_osm_base&amp;quot;: &amp;quot;2026-01-27T03:11:45Z&amp;quot;,
    &amp;quot;copyright&amp;quot;: &amp;quot;The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.&amp;quot;
  },
  &amp;quot;elements&amp;quot;: [

{
  &amp;quot;type&amp;quot;: &amp;quot;node&amp;quot;,
  &amp;quot;id&amp;quot;: 224750459,
  &amp;quot;lat&amp;quot;: -37.8062302,
  &amp;quot;lon&amp;quot;: 144.9817848
}

  ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;After a bit of testing, I found this worked pretty well, except for roundabouts&amp;hellip; In OpenStreetMap, roads don&amp;rsquo;t actually cross roundabouts – they end on one side, then begin anew on the other side. In cases like this, looking for shared nodes doesn&amp;rsquo;t work. Instead you have to look to see if the two roads have nodes that are less than a given distance apart. The query is similar to the one above, but uses &lt;code&gt;around&lt;/code&gt; when comparing the nodes. In this case I&amp;rsquo;m looking for nodes that are within 20 metres of each other.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;node(w.w2)(around.w1:20);
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;finding-road-segments&#34;&gt;Finding road segments&lt;/h2&gt;
&lt;p&gt;Once I had the coordinates of the two intersections, I could look for the segment of road between them. To do this I created a bounding box using the coordinates of the intersections, and then searched for ways by name within that defined area.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s important to note that there&amp;rsquo;s no one-to-one correspondence between roads and OSM ways. A single road might be represented in OSM as a series of separate, but connected, ways. For example, at a roundabout, or where a road divides, new ways might have been created to document the change. This means that when we query OSM for details of a road we often get back information about multiple ways. Some of these might be things like bike paths, which we can filter out using tags, but often they&amp;rsquo;ll be sections of the road that we want. For example, this query for Gore Street, within the bounds of its intersections with Gertrude Street and Webb Street, returns details of two ways.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;way[&amp;quot;highway&amp;quot;~&amp;quot;^(trunk|primary|secondary|tertiary|unclassified|residential|service|track|pedestrian|living_street)$&amp;quot;][name=&amp;quot;Gore Street&amp;quot;](-37.8062302,144.98128480000003,-37.8040076,144.9826827);
out body;
&amp;gt;;
out body;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can &lt;a href=&#34;https://overpass-turbo.eu/s/2jqi&#34;&gt;view the result&lt;/a&gt; in Overpass Turbo.&lt;/p&gt;
&lt;p&gt;However, that doesn&amp;rsquo;t mean that the full extent of both ways is contained within the bounding box, just that some of the nodes of both ways are inside. Because of this, I filtered the results from all the ways and only kept nodes whose coordinates were within the desired region.&lt;/p&gt;
&lt;h2 id=&#34;problems-finding-intersections&#34;&gt;Problems finding intersections&lt;/h2&gt;
&lt;p&gt;The method described above works pretty well, and once I understood enough about the Overpass API to extract actual paths that I could display on a map, I fed all of the CUA photos through a script and got useful data for more than 80% of them.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/screenshot-from-2025-09-27-17-34-36.png&#34; width=&#34;600&#34; height=&#34;652&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;One of my early tests.&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Then I spent a &lt;em&gt;lot of&lt;/em&gt; time trying to understand where the remainder were failing.&lt;/p&gt;
&lt;p&gt;Some of them failed because the titles were missing information, or were formatted in a way I didn&amp;rsquo;t expect. For example, instead of a second intersecting road, some titles just said &amp;lsquo;to end&amp;rsquo;. This makes perfect sense to a human looking at a map, but it&amp;rsquo;s difficult to handle programmatically.&lt;/p&gt;
&lt;p&gt;Some photos either recorded the wrong suburb, or the boundaries of the suburb had moved since the photos were taken. For example, many of the photos described as being from Eaglehawk are now in California Gully.&lt;/p&gt;
&lt;p&gt;Similarly, some road names were wrong either because of documentation errors, or because the names have changed over time. There are also some variations in the way OSM records road names – in particular, I found that roads with hyphenated names sometimes had spaces around the hyphen and sometimes didn&amp;rsquo;t. There were also a couple of cases where names weren&amp;rsquo;t attached to the corresponding road segment in OSM, but I was able to edit these in OSM directly.&lt;/p&gt;
&lt;p&gt;Other roads had multiple names, or changed names along their path. I mean, what&amp;rsquo;s going on with Brunswick Street and St Georges Road in Fitzroy? Country towns seemed most prone to this – a highway might become &amp;lsquo;Main Road&amp;rsquo; within the town boundaries, or the order of hyphenated places in road names might change. I found one road in Clunes that had four different names within the space of a few hundred metres.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/clunes.png&#34; width=&#34;582&#34; height=&#34;1002&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;One road, four names!&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Finally, the routes of some roads had changed – intersections no longer intersected, roads were closed, or new parks had popped up to split a road in two.&lt;/p&gt;
&lt;p&gt;My processing script logged the titles I couldn&amp;rsquo;t locate and I worked through the list manually, trying to identify what each problem was. I suppose there are two ways I could&amp;rsquo;ve handled these problems – building more fuzziness into the process to check for things like alternative names, or compiling a list of &amp;lsquo;corrected&amp;rsquo; titles. I started off using the first approach, but as I worked through more and more anomalies, the checking logic became very complicated and inefficient. Just think about the knots you can tie yourself in trying to handle a title where the suburb is wrong and the main road changes names in between intersections.&lt;/p&gt;
&lt;p&gt;I refactored the code multiple times, but it&amp;rsquo;s still pretty messy. In the end I created a list of &amp;lsquo;corrected&amp;rsquo; titles as well, so it was a bit of a hybrid approach. I suspect I could have saved myself a lot of pain if I&amp;rsquo;d reversed the process – compiling &amp;lsquo;corrected&amp;rsquo; titles first, then adapting the logic as patterns emerged.&lt;/p&gt;
&lt;p&gt;There are still some photos I haven&amp;rsquo;t located. In some cases I just don&amp;rsquo;t have enough information. In others I need to manually record coordinates or way ids to feed into the process, and I haven&amp;rsquo;t worked out the best way to do this yet. You can see the titles that I haven&amp;rsquo;t geolocated yet in the files: &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-not-found.txt&#34;&gt;&lt;code&gt;cua-not-found.txt&lt;/code&gt;&lt;/a&gt; and &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-not-parsed.txt&#34;&gt;&lt;code&gt;cua-not-parsed.txt&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In total, 18,603 out of 20,644 photos have been geolocated. That&amp;rsquo;s over 90%!&lt;/p&gt;
&lt;h2 id=&#34;assembling-the-data&#34;&gt;Assembling the data&lt;/h2&gt;
&lt;p&gt;I processed the data in a couple of phases to get it in the shape I wanted.&lt;/p&gt;
&lt;p&gt;The first step was to group all the photos by title, so I could link each group to its location. But remember that titles often record which &lt;em&gt;side&lt;/em&gt; of the road a photo was taken on. To bring all sides of a road segment together into a single group, I created a key from a normalised/slugified version of the title with the side value removed. I used this key to save information about each side within the same group.&lt;/p&gt;
&lt;p&gt;I ended up with a dataset with this sort of structure (a truncated example):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;iffla-street-south-melbourne-coventry-street-normanby-street&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;:&lt;/span&gt;
    {
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;title&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Iffla Street, South Melbourne, from Coventry Street to Normanby Street&amp;#34;&lt;/span&gt;,
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;sides&amp;#34;&lt;/span&gt;:
        {
            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;east side&amp;#34;&lt;/span&gt;:
            {
                &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;title&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Iffla Street, South Melbourne, from Coventry Street to Normanby Street - east side.&amp;#34;&lt;/span&gt;,
                &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;images&amp;#34;&lt;/span&gt;:
                [
                    {
                        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ie_id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;IE20321667&amp;#34;&lt;/span&gt;,
                        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;alma_id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;9939649155207636&lt;/span&gt;
                    }
                    &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;more&lt;/span&gt; &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;photos...&lt;/span&gt;
                ]
            },
            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;west side&amp;#34;&lt;/span&gt;:
            {
                &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;title&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Iffla Street, South Melbourne, from Normanby Street to Coventry Street - west side.&amp;#34;&lt;/span&gt;,
                &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;images&amp;#34;&lt;/span&gt;:
                [
                    {
                        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ie_id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;IE20320072&amp;#34;&lt;/span&gt;,
                        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;alma_id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;9939655629407636&lt;/span&gt;
                    },
					&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;more&lt;/span&gt; &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;photos...&lt;/span&gt;
                ]
            }
        },
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ways&amp;#34;&lt;/span&gt;:
        {
            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;27631235&amp;#34;&lt;/span&gt;:
            [
                [
                    &lt;span style=&#34;color:#ae81ff&#34;&gt;144.9503379&lt;/span&gt;,
                    &lt;span style=&#34;color:#ae81ff&#34;&gt;-37.835322&lt;/span&gt;
                ],
                &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;more&lt;/span&gt; &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;points...&lt;/span&gt;
            ]
        }
    }&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;,&lt;/span&gt;
		
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can see how the sides and matching ways have been brought together under the key value.&lt;/p&gt;
&lt;p&gt;This structure was useful for grouping and processing the data, but to create a map interface I needed to bring the geospatial information to the surface. The first version of the interface used one big GeoJSON file in which the features were &lt;a href=&#34;https://geocrystal.github.io/geojson/GeoJSON/MultiLineString.html&#34;&gt;MultiLineStrings&lt;/a&gt; created from the paths of each road segment. The photo data was saved in the properties of each GeoJSON feature.&lt;/p&gt;
&lt;p&gt;It sort of worked. The roads with photos were highlighted, and clicking on the roads displayed the photos. It was only when I changed the opacity of the lines that I realised that, in many cases, different road segments were being piled on top of each other. When the lines were opaque these piles were invisible, but add a bit of transparency and you could see that some lines were darker than others. Clicking on the lines only displayed the top layer, so some groups of photos were effectively invisible.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/screenshot-from-2025-12-01-14-09-47.png&#34; width=&#34;503&#34; height=&#34;339&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Version one of the interface showing how the colour of the highlighted roads varied once I decreased the opacity.&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Why did this happen? I&amp;rsquo;d wrongly assumed that each segment of road would only have one group of photos associated with it. But it&amp;rsquo;s not hard to find cases where this is not true. Consider Moor Street, Fitzroy, between Nicholson Street and Brunswick Street. On the north side, there is a single group of photos that document the buildings between Nicholson Street and Brunswick Street. However, on the south side there are two groups of photos. One covers the section between Nicholson Street and Fitzroy Street, the other covers Fitzroy Street to Brunswick Street. One section of road, three groups of photos&amp;hellip;&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/cua-multiple.png&#34; width=&#34;600&#34; height=&#34;478&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Moor Street, Fitzroy, between Nicholson Street and Brunswick Street, in the new CUA Browser, showing the three photosets associated with the one section of road.&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;To make these layered groups more easily accessible through the interface I had to change the way the data was organised – separating the GeoJSON from the photosets so that multiple photosets could be associated with a single geospatial feature. I decided to create a GeoJSON feature for every OSM way in the dataset. However, I needed to prune the way&amp;rsquo;s coordinates to only include those that were part of the CUA road segments. To do this, I saved all the way data when I found the road segments. Then in the second processing phase, I grouped the way coordinates associated with the road segments and compared this list to the full way path. Any coordinate in the way path that wasn&amp;rsquo;t in the road segments was removed. It seems unnecessarily complex, but I wanted to make sure that only the parts of roads associated with photos were highlighted in the interface.&lt;/p&gt;
&lt;p&gt;The result was two data files. The first, &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-ways.geojson&#34;&gt;&lt;code&gt;cua-ways.geojson&lt;/code&gt;&lt;/a&gt;, contains the pruned way paths and their OSM identifiers. The second, &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-photos.json&#34;&gt;&lt;code&gt;cua-photos.json&lt;/code&gt;&lt;/a&gt;, contains information about each photo set, including the sides, photos, paths, and associated way identifiers. The datasets are linked by the way identifiers.&lt;/p&gt;
&lt;h2 id=&#34;constructing-the-interface&#34;&gt;Constructing the interface&lt;/h2&gt;
&lt;p&gt;My plan for the interface was pretty simple. There&amp;rsquo;d be a map on which all the road segments associated with CUA photos were highlighted. Clicking on a highlighted section would show the photos. I wanted to display the photos as if you were scanning the streetscape, so I decided to put them all side-by-side in a gallery that scrolled horizontally.&lt;/p&gt;
&lt;p&gt;The first version used Leaflet to display the maps and, as noted above, had some problems where there were multiple photosets associated with a segment of road.&lt;/p&gt;
&lt;p&gt;For the &lt;a href=&#34;https://slv.wraggelabs.com/cua/&#34;&gt;second version&lt;/a&gt; I decided to switch to &lt;a href=&#34;https://maplibre.org&#34;&gt;MapLibre&lt;/a&gt; because it seems a bit more active and up-to-date. I&amp;rsquo;d already used MapLibre in the &lt;a href=&#34;https://slv.wraggelabs.com/newspapers/&#34;&gt;SLV Newspapers Explorer&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The interface first loads the  &lt;code&gt;cua-ways.geojson&lt;/code&gt; file to highlight the relevant roads. When you click on one of the roads, the way id is passed to a function that looks for associated photo sets in the &lt;code&gt;cua-photos.json&lt;/code&gt; data. If there&amp;rsquo;s only one linked photoset, then the photos are displayed. However, if there&amp;rsquo;s more than one linked photoset, they&amp;rsquo;re displayed as a list. The user then selects from the list to display the related photos.&lt;/p&gt;
&lt;p&gt;A couple of other things happen when you click on a way or select a photoset:  the colour of the selected road segment changes, and the browser url is updated with the way or photoset identifier. You can bookmark or share these urls to go directly to a specific road or photoset. There&amp;rsquo;s also a button to reverse the order of the images – they scroll left to right, but sometimes they seem to have been photographed right to left.&lt;/p&gt;
&lt;h2 id=&#34;more-information-and-links&#34;&gt;More information and links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://slv.wraggelabs.com/cua/&#34;&gt;CUA Browser&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CUA data is also used in the &lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;my place&lt;/a&gt; app&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CUA code and data are in the &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency&#34;&gt;geo-maps-residency&lt;/a&gt; repository&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Code for the interface is in the &lt;a href=&#34;https://github.com/wragge/slv-demo-apps&#34;&gt;slv-demo-apps&lt;/a&gt; repository&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;All the outcomes of my SLV residency are listed on the &lt;a href=&#34;https://slv.wraggelabs.com&#34;&gt;Experiments with the State Library of Victoria’s collections&lt;/a&gt; page&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
</description>
      <source:markdown>Concerned about the loss of built heritage in the 1970s, the Committee for Urban Action photographed streetscapes across urban and regional Victoria. They compiled a remarkable collection of photographs that is [now being digitised by the State Library of Victoria](https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;collectionId=81271917420007636). More than 20,000 images are already available online!

The CUA worked systematically, capturing photos street by street, and recording the locations of each set of photographs. This information is used to prepare the title attached to each photo as it&#39;s uploaded to the SLV catalogue. In general, titles include the name of the road where the photo was taken, the name of the suburb or town, and the names of two intersecting roads that define the boundaries of the current road segment. They can also tell you which side of the road the photo was taken on. For example, the title `Gore Street, Fitzroy, from Gertrude Street to Webb Street - east side` tells us the photo was taken on the east side of Gore Street, Fitzroy between the intersections with Gertrude Street and Webb Street.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/cua-slv-viewer.png&#34; width=&#34;600&#34; height=&#34;547&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Photos from Gore Street, Fitzroy &lt;a href=&#34;https://viewer.slv.vic.gov.au/?entity=IE7489506&amp;mode=browse&#34;&gt;displayed in the SLV image viewer&lt;/a&gt;&lt;/figcaption&gt;&lt;/figure&gt;

It&#39;s great to have this sort of structured information linking photos to specific locations, but to navigate through the collection *in space* we need more. We need to link each photo to a set of geospatial coordinates by mapping each road segment. That was the challenge I took on as part of [my residency in the SLV LAB](https://lab.slv.vic.gov.au/experiments/my-place-tim-sherratt).

When I started working on the collection I wasn&#39;t really sure what was possible. I had to learn a lot, and ended up revising my processes multiple times as I got deeper into the data. But my aim was always to create some sort of map-based interface that would allow users to click on a street and see any associated CUA photos. It&#39;s still a bit buggy and incomplete, but here it is – [**explore the CUA collection street by street**](https://slv.wraggelabs.com/cua/)!

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/cua-browser.png&#34; width=&#34;600&#34; height=&#34;513&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Gore Street, Fitzroy &lt;a href=&#34;https://slv.wraggelabs.com/cua/?photoset=gore-street-fitzroy-gertrude-street-webb-street&#34;&gt;in the new CUA Browser&lt;/a&gt;&lt;/figcaption&gt;&lt;/figure&gt;

## The process

My basic plan was to find the intersections using [OpenStreetMap](https://www.openstreetmap.org/), then extract geospatial information about the segment of road between the two intersections. This involved much trial and error, but eventually I ended up with a process that:

- parsed each item title to try and extract the names of the main road, the suburb, and the two intersecting roads
- queried [Nominatim](https://nominatim.org) for the suburb bounding box
- for each intersecting road, queried OSM to find a node at, or around, its intersection with the main road, within the suburb bounding box
- created a new bounding box from the coordinates of the two intersections
- queried OSM for the main road within this bounding box
- extracted the coordinates of the main road segment, removing any points outside of the bounding box

There are more details below and in these notebooks: [cua_finding_intersections.ipynb](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua_finding_intersections.ipynb) and [cua_data_processing.ipynb](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua_data_processing.ipynb).

## Finding intersections

As described, the title of each photograph generally includes 4 pieces of information: the road, suburb, intersecting roads, and side. My plan was to find the intersections first to get the limits of the road segment. This is possible thanks to the awesome [OpenStreetMap](https://www.openstreetmap.org/) and its [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API). It took me a while to get my head around the Overpass query language, but there are lots of [useful examples online](https://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_API_by_Example). The query to find the intersection between Gore Street and Gertrude Street in Fitzroy looks like this:

```
[bbox:-37.8089071,144.9732006,-37.7929130,144.9851430];
way[&#39;highway&#39;][name=&#34;Gore Street&#34;];
node(w)-&gt;.n1;
way[&#39;highway&#39;][name=&#34;Gertrude Street&#34;];
node(w)-&gt;.n2;
node.n1.n2; 
out body;
```
You can [try it out](https://overpass-turbo.eu/s/2jq8) using Overpass Turbo&#39;s web interface.
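
You can also run the query from a script by posting it to the public Overpass endpoint. Here&#39;s a minimal Python sketch – I&#39;ve added `[out:json]` to the settings so the result comes back as JSON rather than the default XML:

```python
import requests

query = &#34;&#34;&#34;
[out:json][bbox:-37.8089071,144.9732006,-37.7929130,144.9851430];
way[&#39;highway&#39;][name=&#34;Gore Street&#34;];
node(w)-&gt;.n1;
way[&#39;highway&#39;][name=&#34;Gertrude Street&#34;];
node(w)-&gt;.n2;
node.n1.n2;
out body;
&#34;&#34;&#34;

# The Overpass API accepts queries in the &#39;data&#39; parameter
response = requests.post(&#34;https://overpass-api.de/api/interpreter&#34;, data={&#34;data&#34;: query})
for node in response.json()[&#34;elements&#34;]:
    print(node[&#34;lat&#34;], node[&#34;lon&#34;])
```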

In OpenStreetMap, linear features, such as roads or rivers, are represented as [`ways`](https://wiki.openstreetmap.org/wiki/Way). Each way is made up of a series of `nodes` or points with geospatial coordinates. Every way and node has its own unique identifier. Tags can be added to features to describe what type of things they are. 

The query above looks for `ways` named &#39;Gore Street&#39; and &#39;Gertrude Street&#39; that are tagged as `highway` (a `highway` in OpenStreetMap is any road-like feature, including things like bike paths and foot trails).

```
way[&#39;highway&#39;][name=&#34;Gore Street&#34;];
```
It then extracts the nodes that make up each way and looks to see if there are any nodes in common between the two ways.  A node shared between two ways indicates an intersection.

```
node(w)-&gt;.n2;
node.n1.n2; 
```
The query is limited using a bounding box that encloses the suburb of Fitzroy. This avoids false positives and keeps down the query load.

```
[bbox:-37.8089071,144.9732006,-37.7929130,144.9851430];
```
The JSON result of this query gives us the latitude and longitude of the node at the intersection of the two roads.

```
{
  &#34;version&#34;: 0.6,
  &#34;generator&#34;: &#34;Overpass API 0.7.62.10 2d4cfc48&#34;,
  &#34;osm3s&#34;: {
    &#34;timestamp_osm_base&#34;: &#34;2026-01-27T03:11:45Z&#34;,
    &#34;copyright&#34;: &#34;The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.&#34;
  },
  &#34;elements&#34;: [

{
  &#34;type&#34;: &#34;node&#34;,
  &#34;id&#34;: 224750459,
  &#34;lat&#34;: -37.8062302,
  &#34;lon&#34;: 144.9817848
}

  ]
}
```
After a bit of testing, I found this worked pretty well, except for roundabouts... In OpenStreetMap, roads don&#39;t actually cross roundabouts – they end on one side, then begin anew on the other side. In cases like this, looking for shared nodes doesn&#39;t work. Instead you have to look to see if the two roads have nodes that are less than a given distance apart. The query is similar to the one above, but uses `around` when comparing the nodes. In this case I&#39;m looking for nodes that are within 20 metres of each other.

```
node(w.w2)(around.w1:20);
```
## Finding road segments

Once I had the coordinates of the two intersections, I could look for the segment of road between them. To do this I created a bounding box using the coordinates of the intersections, and then searched for ways by name within that defined area.

It&#39;s important to note that there&#39;s no one-to-one correspondence between roads and OSM ways. A single road might be represented in OSM as a series of separate, but connected, ways. For example, at a roundabout, or where a road divides, new ways might have been created to document the change. This means that when we query OSM for details of a road we often get back information about multiple ways. Some of these might be things like bike paths, which we can filter out using tags, but often they&#39;ll be sections of the road that we want. For example, this query for Gore Street, within the bounds of its intersections with Gertrude Street and Webb Street, returns details of two ways.

```
way[&#34;highway&#34;~&#34;^(trunk|primary|secondary|tertiary|unclassified|residential|service|track|pedestrian|living_street)$&#34;][name=&#34;Gore Street&#34;](-37.8062302,144.98128480000003,-37.8040076,144.9826827);
out body;
&gt;;
out body;
```
You can [view the result](https://overpass-turbo.eu/s/2jqi) in Overpass Turbo.

However, that doesn&#39;t mean that the full extent of both ways is contained within the bounding box, just that some of the nodes of both ways are inside. Because of this, I filtered the results from all the ways and only kept nodes whose coordinates were within the desired region.
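
In code, the filtering step is simple enough. Here&#39;s a sketch – the sample nodes are made up, but they have the same shape as the `elements` returned by Overpass:

```python
def within_bbox(node, south, west, north, east):
    &#34;&#34;&#34;Check whether an OSM node&#39;s coordinates fall inside a bounding box.&#34;&#34;&#34;
    return south &lt;= node[&#34;lat&#34;] &lt;= north and west &lt;= node[&#34;lon&#34;] &lt;= east

# Sample nodes as returned by Overpass (trimmed to coordinates)
way_nodes = [
    {&#34;lat&#34;: -37.8062302, &#34;lon&#34;: 144.9817848},
    {&#34;lat&#34;: -37.8100000, &#34;lon&#34;: 144.9800000},  # outside the bbox, will be dropped
]

# Bounding box created from the two intersections (south, west, north, east)
bbox = (-37.8062302, 144.9812848, -37.8040076, 144.9826827)
segment = [n for n in way_nodes if within_bbox(n, *bbox)]
print(segment)
```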

## Problems finding intersections

The method described above works pretty well, and once I understood enough about the Overpass API to extract actual paths that I could display on a map, I fed all of the CUA photos through a script and got useful data for more than 80% of them.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/screenshot-from-2025-09-27-17-34-36.png&#34; width=&#34;600&#34; height=&#34;652&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;One of my early tests.&lt;/figcaption&gt;&lt;/figure&gt;

Then I spent a *lot of* time trying to understand where the remainder were failing.

Some of them failed because the titles were missing information, or were formatted in a way I didn&#39;t expect. For example, instead of a second intersecting road, some titles just said &#39;to end&#39;. This makes perfect sense to a human looking at a map, but it&#39;s difficult to handle programmatically.

Some photos either recorded the wrong suburb, or the boundaries of the suburb had moved since the photos were taken. For example, many of the photos described as being from Eaglehawk are now in California Gully.

Similarly, some road names were wrong either because of documentation errors, or because the names have changed over time. There are also some variations in the way OSM records road names – in particular, I found that roads with hyphenated names sometimes had spaces around the hyphen and sometimes didn&#39;t. There were also a couple of cases where names weren&#39;t attached to the corresponding road segment in OSM, but I was able to edit these in OSM directly.

Other roads had multiple names, or changed names along their path. I mean, what&#39;s going on with Brunswick Street and St Georges Road in Fitzroy? Country towns seemed most prone to this – a highway might become &#39;Main Road&#39; within the town boundaries, or the order of hyphenated places in road names might change. I found one road in Clunes that had four different names within the space of a few hundred metres.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/clunes.png&#34; width=&#34;582&#34; height=&#34;1002&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;One road, four names!&lt;/figcaption&gt;&lt;/figure&gt;

Finally, the routes of some roads had changed – intersections no longer intersected, roads were closed, or new parks had popped up to split a road in two.

My processing script logged the titles I couldn&#39;t locate and I worked through the list manually, trying to identify what each problem was. I suppose there are two ways I could&#39;ve handled these problems – building more fuzziness into the process to check for things like alternative names, or compiling a list of &#39;corrected&#39; titles. I started off using the first approach, but as I worked through more and more anomalies, the checking logic became very complicated and inefficient. Just think about the knots you can tie yourself in trying to handle a title where the suburb is wrong and the main road changes names in between intersections.

I refactored the code multiple times, but it&#39;s still pretty messy. In the end I created a list of &#39;corrected&#39; titles as well, so it was a bit of a hybrid approach. I suspect I could have saved myself a lot of pain if I&#39;d reversed the process – compiling &#39;corrected&#39; titles first, then adapting the logic as patterns emerged.

There are still some photos I haven&#39;t located. In some cases I just don&#39;t have enough information. In others I need to manually record coordinates or way ids to feed into the process, and I haven&#39;t worked out the best way to do this yet. You can see the titles that I haven&#39;t geolocated yet in the files: [`cua-not-found.txt`](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-not-found.txt) and [`cua-not-parsed.txt`](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-not-parsed.txt).

In total, 18,603 out of 20,644 photos have been geolocated. That&#39;s over 90%!

## Assembling the data

I processed the data in a couple of phases to get it in the shape I wanted.

The first step was to group all the photos by title, so I could link each group to its location. But remember that titles often record which *side* of the road a photo was taken on. To bring all sides of a road segment together into a single group, I created a key from a normalised/slugified version of the title with the side value removed. I used this key to save information about each side within the same group.
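
This isn&#39;t the exact code from my notebooks, but the normalisation works along these lines:

```python
import re

STOPWORDS = {&#34;from&#34;, &#34;to&#34;}

def make_key(title):
    &#34;&#34;&#34;Strip the side suffix and slugify a CUA title to create a grouping key.&#34;&#34;&#34;
    base = re.split(r&#34;\s+-\s+&#34;, title)[0]           # drop &#39; - east side.&#39;
    words = re.findall(r&#34;[a-z0-9]+&#34;, base.lower())  # tokenise
    return &#34;-&#34;.join(w for w in words if w not in STOPWORDS)

print(make_key(&#34;Iffla Street, South Melbourne, from Coventry Street to Normanby Street - east side.&#34;))
# iffla-street-south-melbourne-coventry-street-normanby-street
```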

I ended up with a dataset with this sort of structure (a truncated example):

```json
&#34;iffla-street-south-melbourne-coventry-street-normanby-street&#34;:
    {
        &#34;title&#34;: &#34;Iffla Street, South Melbourne, from Coventry Street to Normanby Street&#34;,
        &#34;sides&#34;:
        {
            &#34;east side&#34;:
            {
                &#34;title&#34;: &#34;Iffla Street, South Melbourne, from Coventry Street to Normanby Street - east side.&#34;,
                &#34;images&#34;:
                [
                    {
                        &#34;ie_id&#34;: &#34;IE20321667&#34;,
                        &#34;alma_id&#34;: 9939649155207636
                    }
                    more photos...
                ]
            },
            &#34;west side&#34;:
            {
                &#34;title&#34;: &#34;Iffla Street, South Melbourne, from Normanby Street to Coventry Street - west side.&#34;,
                &#34;images&#34;:
                [
                    {
                        &#34;ie_id&#34;: &#34;IE20320072&#34;,
                        &#34;alma_id&#34;: 9939655629407636
                    },
					more photos...
                ]
            }
        },
        &#34;ways&#34;:
        {
            &#34;27631235&#34;:
            [
                [
                    144.9503379,
                    -37.835322
                ],
                more points...
            ]
        }
    },
		
```
You can see how the sides and matching ways have been brought together under the key value.

This structure was useful for grouping and processing the data, but to create a map interface I needed to bring the geospatial information to the surface. The first version of the interface used one big GeoJSON file in which the features were [MultiLineStrings](https://geocrystal.github.io/geojson/GeoJSON/MultiLineString.html) created from the paths of each road segment. The photo data was saved in the properties of each GeoJSON feature.

It sort of worked. The roads with photos were highlighted, and clicking on the roads displayed the photos. It was only when I changed the opacity of the lines that I realised that, in many cases, different road segments were being piled on top of each other. When the lines were opaque these piles were invisible, but add a bit of transparency and you could see that some lines were darker than others. Clicking on the lines only displayed the top layer, so some groups of photos were effectively invisible.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/screenshot-from-2025-12-01-14-09-47.png&#34; width=&#34;503&#34; height=&#34;339&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Version one of the interface showing how the colour of the highlighted roads varied once I decreased the opacity.&lt;/figcaption&gt;&lt;/figure&gt;

Why did this happen? I&#39;d wrongly assumed that each segment of road would only have one group of photos associated with it. But it&#39;s not hard to find cases where this is not true. Consider Moor Street, Fitzroy, between Nicholson Street and Brunswick Street. On the north side, there is a single group of photos that document the buildings between Nicholson Street and Brunswick Street. However, on the south side there are two groups of photos. One covers the section between Nicholson Street and Fitzroy Street, the other covers Fitzroy Street to Brunswick Street. One section of road, three groups of photos...

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/cua-multiple.png&#34; width=&#34;600&#34; height=&#34;478&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Moor Street, Fitzroy, between Nicholson Street and Brunswick Street, in the new CUA Browser, showing the three photosets associated with the one section of road.&lt;/figcaption&gt;&lt;/figure&gt;

To make these layered groups more easily accessible through the interface I had to change the way the data was organised – separating the GeoJSON from the photosets so that multiple photosets could be associated with a single geospatial feature. I decided to create a GeoJSON feature for every OSM way in the dataset. However, I needed to prune the way&#39;s coordinates to only include those that were part of the CUA road segments. To do this, I saved all the way data when I found the road segments. Then in the second processing phase, I grouped the way coordinates associated with the road segments and compared this list to the full way path. Any coordinate in the way path that wasn&#39;t in the road segments was removed. It seems unnecessarily complex, but I wanted to make sure that only the parts of roads associated with photos were highlighted in the interface.
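
Stripped back to its essentials, the pruning step looks something like this sketch (with made-up sample data):

```python
def prune_way(way_path, segment_points):
    &#34;&#34;&#34;Keep only the points of a full OSM way that belong to CUA road segments.&#34;&#34;&#34;
    keep = {tuple(point) for point in segment_points}
    return [point for point in way_path if tuple(point) in keep]

# A tiny made-up example: the second point isn&#39;t part of any road segment
way_path = [[144.9503379, -37.835322], [144.9500000, -37.8360000]]
segment_points = [[144.9503379, -37.835322]]
print(prune_way(way_path, segment_points))
# [[144.9503379, -37.835322]]
```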

The result was two data files. The first, [`cua-ways.geojson`](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-ways.geojson), contains the pruned way paths and their OSM identifiers. The second, [`cua-photos.json`](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-photos.json), contains information about each photo set, including the sides, photos, paths, and associated way identifiers. The datasets are linked by the way identifiers.

## Constructing the interface

My plan for the interface was pretty simple. There&#39;d be a map on which all the road segments associated with CUA photos were highlighted. Clicking on a highlighted section would show the photos. I wanted to display the photos as if you were scanning the streetscape, so I decided to put them all side-by-side in a gallery that scrolled horizontally.

The first version used Leaflet to display the maps and, as noted above, had some problems where there were multiple photosets associated with a segment of road.

For the [second version](https://slv.wraggelabs.com/cua/) I decided to switch to [MapLibre](https://maplibre.org) because it seems a bit more active and up-to-date. I&#39;d already used MapLibre in the [SLV Newspapers Explorer](https://slv.wraggelabs.com/newspapers/).

The interface first loads the  `cua-ways.geojson` file to highlight the relevant roads. When you click on one of the roads, the way id is passed to a function that looks for associated photo sets in the `cua-photos.json` data. If there&#39;s only one linked photoset, then the photos are displayed. However, if there&#39;s more than one linked photoset, they&#39;re displayed as a list. The user then selects from the list to display the related photos.

A couple of other things happen when you click on a way or select a photoset:  the colour of the selected road segment changes, and the browser url is updated with the way or photoset identifier. You can bookmark or share these urls to go directly to a specific road or photoset. There&#39;s also a button to reverse the order of the images – they scroll left to right, but sometimes they seem to have been photographed right to left.

## More information and links

- [CUA Browser](https://slv.wraggelabs.com/cua/)
- CUA data is also used in the [my place](https://slv.wraggelabs.com/myplace/) app
- CUA code and data are in the [geo-maps-residency](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency) repository
- Code for the interface is in the [slv-demo-apps](https://github.com/wragge/slv-demo-apps) repository
- All the outcomes of my SLV residency are listed on the [Experiments with the State Library of Victoria’s collections](https://slv.wraggelabs.com) page
</source:markdown>
    </item>
    
    <item>
      <title>Goodbye 2025! A brief summary of the highlights and lowlights…</title>
      <link>https://updates.timsherratt.org/2025/12/31/goodbye-a-brief-summary-of.html</link>
      <pubDate>Wed, 31 Dec 2025 17:14:14 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/12/31/goodbye-a-brief-summary-of.html</guid>
      <description>&lt;p&gt;My 2025 started badly and ended well. In the first few months of the year, battles with the gatekeepers at Trove sent me spiralling into a pretty dark place. But by year’s end I was having fun, working with the wonderful people at the State Library of Victoria. In between I caught up on some overdue project maintenance. Most of this is documented in the &lt;a href=&#34;https://updates.timsherratt.org/archive/&#34;&gt;37 blog posts I wrote this year&lt;/a&gt;, but here’s a quick summary.&lt;/p&gt;&lt;h2&gt;Highlights&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;From September to December, I was Creative Technologist-in-Residence at the SLV LAB, exploring ways of opening up the Library’s place-based collections. There’s still a few things to finish off, but &lt;a href=&#34;https://slv.wraggelabs.com/&#34;&gt;here’s a list of the results so far&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;As part of my SLV work, I created a &lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;fully searchable version&lt;/a&gt; of the 24 volumes of Sands &amp;amp; MacDougall directories digitised by the Library. This followed the pattern I’d used for 54 volumes of the &lt;a href=&#34;https://glam-workbench.net/trove-journals/nsw-post-office-directories/&#34;&gt;NSW Post Office Directories&lt;/a&gt;, 44 volumes of the &lt;a href=&#34;https://glam-workbench.net/trove-journals/sydney-telephone-directories/&#34;&gt;Sydney Telephone Directories&lt;/a&gt;, and 54 volumes of the &lt;a href=&#34;https://glam-workbench.net/tasmanian-post-office-directories/&#34;&gt;Tasmanian Post Office Directories&lt;/a&gt;. So there’s now 176 volumes from the 1880s to the 1950s that can be easily explored for people and places — and all free to use, of course.&lt;/li&gt;&lt;li&gt;In April, I added a &lt;a href=&#34;https://updates.timsherratt.org/2025/04/30/new-prov-section-added-to.html&#34;&gt;new section to the GLAM Workbench&lt;/a&gt; documenting the Public Record Office Victoria’s collection API. I also used the API to create a &lt;a href=&#34;https://updates.timsherratt.org/2025/04/10/using-the-public-record-office.html&#34;&gt;Data Dashboard that provides an overview of PROV’s collection&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Also in April, I updated the GLAM Name Index Search &lt;a href=&#34;https://updates.timsherratt.org/2025/04/09/more-than-million-rows-of.html&#34;&gt;to include an additional 6 million records from PROV&lt;/a&gt;. In total, the GLAM Name Index now includes more than 12 million records in 293 datasets from 10 Australian GLAM organisations — another free resource for Australian researchers.&lt;/li&gt;&lt;li&gt;In July I undertook some overdue maintenance on a variety of old apps and projects. In the process, I &lt;a href=&#34;https://updates.timsherratt.org/2025/07/09/the-rebirth-of-wragge-labs.html&#34;&gt;resurrected my old Wragge Labs domain and created a showcase&lt;/a&gt; of many of the websites, apps and experiments I’ve worked on over the past 30 years.&lt;/li&gt;&lt;li&gt;I was particularly pleased &lt;a href=&#34;https://updates.timsherratt.org/2025/07/02/the-future-of-the-past.html&#34;&gt;to get &lt;em&gt;The Future of the Past&lt;/em&gt; working again&lt;/a&gt;, so once more you can create fridge magnet poetry from an odd collection of words harvested from Trove newspapers! I built FOTP back in 2012 when I was the Harold White Fellow at the NLA. 
Also this year I finally got around to &lt;a href=&#34;https://updates.timsherratt.org/2025/06/30/mining-for-meanings.html&#34;&gt;transcribing my Harold White Lecture&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;In June I wrote a &lt;a href=&#34;https://updates.timsherratt.org/2025/06/05/glam-workbench-preprint-for-building.html&#34;&gt;short piece on the GLAM Workbench&lt;/a&gt; for the forthcoming publication &lt;em&gt;Building User-Friendly Toolkits and Platforms for Digital Humanities&lt;/em&gt;. I think it provides a useful summary of what the GLAM Workbench is, and what I’d like it to be. I also wrote up the &lt;a href=&#34;https://updates.timsherratt.org/2025/06/19/a-brief-and-biased-history.html&#34;&gt;short but glorious history of Trove Twitter bots&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;Lowlights&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;Saying goodbye to 15 years of work on Trove&lt;/a&gt;. It still hurts. And I still miss resources such as @TroveNewsBot and the Trove API Console which ran happily for more than a decade before being killed without warning by the NLA.&lt;/li&gt;&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html&#34;&gt;Saying goodbye to 17 years of work on the National Archives of Australia’s collections&lt;/a&gt;. This will be the first New Year’s Day in a decade when I haven’t updated my &lt;a href=&#34;https://updates.timsherratt.org/2025/02/05/ten-years-of-data-the.html&#34;&gt;harvest of files with the access status of closed&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;Next year&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;In 2026, I’m looking forward to starting work on the &lt;a href=&#34;https://ardc.edu.au/project/reusable-and-accessible-public-interest-documents-rapid/&#34;&gt;RAPID project&lt;/a&gt;, building on the work I’ve done on Commonwealth Hansard over the years to create new examples and documentation.&lt;/li&gt;&lt;li&gt;I’m honoured to be giving the closing keynote at the &lt;a href=&#34;https://www.glamlabs.io/events/glam-labs-futures-26&#34;&gt;GLAM Labs Futures conference&lt;/a&gt; in Edinburgh in June — hoping we can pull together the funds to get there in person!&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;How you can help&lt;/h2&gt;&lt;p&gt;Much of my work is unfunded, and keeping resources such as the GLAM Name Index running costs real money. I’ve been very grateful for the support of my GitHub sponsors over past years. Their contributions help cover a substantial proportion of my cloud hosting costs. But bidding farewell to Twitter and Trove has had an impact on my sponsorship income. If you use or value the things I build to help researchers make use of GLAM collections, you might like to &lt;a href=&#34;https://github.com/sponsors/wragge&#34;&gt;sponsor me on GitHub&lt;/a&gt;, or &lt;a href=&#34;https://www.buymeacoffee.com/wragge&#34;&gt;Buy Me a Coffee&lt;/a&gt;. All contributions are greatly appreciated!&lt;/p&gt;&lt;p&gt;If you can’t afford a financial contribution, there are other ways you can help!&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Let me know how you’re using my stuff! A bit of positive feedback does wonders when my enthusiasm is flagging. You can find my contact details at &lt;a href=&#34;https://timsherratt.au/&#34;&gt;timsherratt.au&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Tell others how you use my stuff! 
Getting information about resources out to those who might benefit is really hard, so your help would be greatly appreciated.&lt;/li&gt;&lt;li&gt;The GLAM Workbench describes a few other ways &lt;a href=&#34;https://glam-workbench.net/get-involved/&#34;&gt;you can get involved&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Goodbye 2025!&lt;/p&gt;
</description>
      <source:markdown>&lt;p&gt;My 2025 started badly and ended well. In the first few months of the year, battles with the gatekeepers at Trove sent me spiralling into a pretty dark place. But by year’s end I was having fun, working with the wonderful people at the State Library of Victoria. In between I caught up on some overdue project maintenance. Most of this is documented in the &lt;a href=&#34;https://updates.timsherratt.org/archive/&#34;&gt;37 blog posts I wrote this year&lt;/a&gt;, but here’s a quick summary.&lt;/p&gt;&lt;h2&gt;Highlights&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;From September to December, I was Creative Technologist-in-Residence at the SLV LAB, exploring ways of opening up the Library’s place-based collections. There’s still a few things to finish off, but &lt;a href=&#34;https://slv.wraggelabs.com/&#34;&gt;here’s a list of the results so far&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;As part of my SLV work, I created a &lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;fully searchable version&lt;/a&gt; of the 24 volumes of Sands &amp;amp; MacDougall directories digitised by the Library. This followed the pattern I’d used for 54 volumes of the &lt;a href=&#34;https://glam-workbench.net/trove-journals/nsw-post-office-directories/&#34;&gt;NSW Post Office Directories&lt;/a&gt;, 44 volumes of the &lt;a href=&#34;https://glam-workbench.net/trove-journals/sydney-telephone-directories/&#34;&gt;Sydney Telephone Directories&lt;/a&gt;, and 54 volumes of the &lt;a href=&#34;https://glam-workbench.net/tasmanian-post-office-directories/&#34;&gt;Tasmanian Post Office Directories&lt;/a&gt;. So there’s now 176 volumes from the 1880s to the 1950s that can be easily explored for people and places — and all free to use, of course.&lt;/li&gt;&lt;li&gt;In April, I added a &lt;a href=&#34;https://updates.timsherratt.org/2025/04/30/new-prov-section-added-to.html&#34;&gt;new section to the GLAM Workbench&lt;/a&gt; documenting the Public Record Office Victoria’s collection API. I also used the API to create a &lt;a href=&#34;https://updates.timsherratt.org/2025/04/10/using-the-public-record-office.html&#34;&gt;Data Dashboard that provides an overview of PROV’s collection&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Also in April, I updated the GLAM Name Index Search &lt;a href=&#34;https://updates.timsherratt.org/2025/04/09/more-than-million-rows-of.html&#34;&gt;to include an additional 6 million records from PROV&lt;/a&gt;. In total, the GLAM Name Index now includes more than 12 million records in 293 datasets from 10 Australian GLAM organisations — another free resource for Australian researchers.&lt;/li&gt;&lt;li&gt;In July I undertook some overdue maintenance on a variety of old apps and projects. In the process, I &lt;a href=&#34;https://updates.timsherratt.org/2025/07/09/the-rebirth-of-wragge-labs.html&#34;&gt;resurrected my old Wragge Labs domain and created a showcase&lt;/a&gt; of many of the websites, apps and experiments I’ve worked on over the past 30 years.&lt;/li&gt;&lt;li&gt;I was particularly pleased &lt;a href=&#34;https://updates.timsherratt.org/2025/07/02/the-future-of-the-past.html&#34;&gt;to get &lt;em&gt;The Future of the Past&lt;/em&gt; working again&lt;/a&gt;, so once more you can create fridge magnet poetry from an odd collection of words harvested from Trove newspapers! I built FOTP back in 2012 when I was the Harold White Fellow at the NLA. 
Also this year I finally got around to &lt;a href=&#34;https://updates.timsherratt.org/2025/06/30/mining-for-meanings.html&#34;&gt;transcribing my Harold White Lecture&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;In June I wrote a &lt;a href=&#34;https://updates.timsherratt.org/2025/06/05/glam-workbench-preprint-for-building.html&#34;&gt;short piece on the GLAM Workbench&lt;/a&gt; for the forthcoming publication &lt;em&gt;Building User-Friendly Toolkits and Platforms for Digital Humanities&lt;/em&gt;. I think it provides a useful summary of what the GLAM Workbench is, and what I’d like it to be. I also wrote up the &lt;a href=&#34;https://updates.timsherratt.org/2025/06/19/a-brief-and-biased-history.html&#34;&gt;short but glorious history of Trove Twitter bots&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;Lowlights&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;Saying goodbye to 15 years of work on Trove&lt;/a&gt;. It still hurts. And I still miss resources such as @TroveNewsBot and the Trove API Console which ran happily for more than a decade before being killed without warning by the NLA.&lt;/li&gt;&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html&#34;&gt;Saying goodbye to 17 years of work on the National Archives of Australia’s collections&lt;/a&gt;. This will be the first New Year’s Day in a decade when I haven’t updated my &lt;a href=&#34;https://updates.timsherratt.org/2025/02/05/ten-years-of-data-the.html&#34;&gt;harvest of files with the access status of closed&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;Next year&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;In 2026, I’m looking forward to starting work on the &lt;a href=&#34;https://ardc.edu.au/project/reusable-and-accessible-public-interest-documents-rapid/&#34;&gt;RAPID project&lt;/a&gt;, building on the work I’ve done on Commonwealth Hansard over the years to create new examples and documentation.&lt;/li&gt;&lt;li&gt;I’m honoured to be giving the closing keynote at the &lt;a href=&#34;https://www.glamlabs.io/events/glam-labs-futures-26&#34;&gt;GLAM Labs Futures conference&lt;/a&gt; in Edinburgh in June — hoping we can pull together the funds to get there in person!&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;How you can help&lt;/h2&gt;&lt;p&gt;Much of my work is unfunded, and keeping resources such as the GLAM Name Index running costs real money. I’ve been very grateful for the support of my GitHub sponsors over past years. Their contributions help cover a substantial proportion of my cloud hosting costs. But bidding farewell to Twitter and Trove has had an impact on my sponsorship income. If you use or value the things I build to help researchers make use of GLAM collections, you might like to &lt;a href=&#34;https://github.com/sponsors/wragge&#34;&gt;sponsor me on GitHub&lt;/a&gt;, or &lt;a href=&#34;https://www.buymeacoffee.com/wragge&#34;&gt;Buy Me a Coffee&lt;/a&gt;. All contributions are greatly appreciated!&lt;/p&gt;&lt;p&gt;If you can’t afford a financial contribution, there are other ways you can help!&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Let me know how you’re using my stuff! A bit of positive feedback does wonders when my enthusiasm is flagging. You can find my contact details at &lt;a href=&#34;https://timsherratt.au/&#34;&gt;timsherratt.au&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Tell others how you use my stuff! 
Getting information about resources out to those who might benefit is really hard, so your help would be greatly appreciated.&lt;/li&gt;&lt;li&gt;The GLAM Workbench describes a few other ways &lt;a href=&#34;https://glam-workbench.net/get-involved/&#34;&gt;you can get involved&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Goodbye 2025!&lt;/p&gt;
</source:markdown>
    </item>
    
    <item>
      <title>Exploring Victorian newspapers</title>
      <link>https://updates.timsherratt.org/2025/12/16/exploring-victorian-newspapers.html</link>
      <pubDate>Tue, 16 Dec 2025 13:03:48 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/12/16/exploring-victorian-newspapers.html</guid>
      <description>&lt;p&gt;Newspapers are a vital source for local history. That&amp;rsquo;s why, &lt;a href=&#34;https://discontents.com.au/easter-eggsperiments/index.html&#34;&gt;back in 2014&lt;/a&gt;, I created the &lt;a href=&#34;https://wraggelabs.com/trove-places/map/&#34;&gt;Trove Places&lt;/a&gt; app – a map interface to help people find Trove&amp;rsquo;s digitised newspapers by their place of publication or distribution. Trove Places has proved very popular, and the State Libraries of South Australia and Victoria, amongst others, point their users to it to help with their research. I&amp;rsquo;ve updated the data several times over the years, though &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;Trove&amp;rsquo;s new gatekeeping regime&lt;/a&gt; will make future updates difficult.&lt;/p&gt;
&lt;p&gt;During &lt;a href=&#34;https://lab.slv.vic.gov.au/team/tim-sherratt&#34;&gt;my residency at the State Library of Victoria&lt;/a&gt;, one of the librarians noted how useful the app was, and asked whether it might be possible to include undigitised newspapers from the SLV catalogue as well as those in Trove. It was, and I did – here&amp;rsquo;s a &lt;a href=&#34;https://slv.wraggelabs.com/newspapers/&#34;&gt;brand new app to explore Victorian newspapers&lt;/a&gt;, both digitised and undigitised!&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-12-12-12-48-16.png&#34; width=&#34;600&#34; height=&#34;391&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Just &lt;a href=&#34;https://slv.wraggelabs.com/newspapers/&#34;&gt;click on the map&lt;/a&gt; to find Victorian newspapers!&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;It&amp;rsquo;s pretty easy to use. You just click on the map in an area you&amp;rsquo;re interested in. The map will display the 20 nearest places where newspapers were published or distributed. The size of the markers indicates how many titles are associated with each place. In the sidebar, details of the newspapers are listed by place, ordered by their distance from your selected point.&lt;/p&gt;
&lt;p&gt;You can also find local newspapers using the &lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;my place&lt;/a&gt; app. Once you enter an address, newspapers from your suburb or town will be displayed, as well as those from nearby locations.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-12-16-12-51-54.png&#34; width=&#34;600&#34; height=&#34;507&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;newspapers from Geelong displayed in the my place app&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;assembling-the-data&#34;&gt;Assembling the data&lt;/h2&gt;
&lt;p&gt;How do you find Victorian newspapers? The reference librarians at the SLV pointed me to the &amp;lsquo;Place newspaper published&amp;rsquo; field in the catalogue. Searching this field for &amp;lsquo;Australia&amp;ndash;Victoria&amp;rsquo; &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/search?query=lds03,exact,Australia--Victoria&amp;amp;tab=searchProfile&amp;amp;search_scope=slv_local&amp;amp;vid=61SLV_INST:SLV&#34;&gt;returns 3,997 results&lt;/a&gt;, compared to the 460 digitised in Trove.&lt;/p&gt;
&lt;p&gt;The first step in assembling the data was to harvest the newspaper records from the SLV catalogue. To do this I made use of the Primo JSON API. The method is &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/download_newspapers.ipynb&#34;&gt;documented in this notebook&lt;/a&gt;. The result was a &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/newspapers.ndjson&#34;&gt;newline-delimited JSON file&lt;/a&gt;, with one record per line.&lt;/p&gt;
&lt;p&gt;The harvested metadata doesn&amp;rsquo;t include links to digitised versions of newspapers in Trove. To add these links I first looked in the &lt;code&gt;856&lt;/code&gt; field of the newspaper&amp;rsquo;s MARC record. I also noticed that some Trove links were being loaded from an &amp;lsquo;edelivery&amp;rsquo; JSON file, so I added these as well. I ended up with 344 unique links to Trove, but not all of these were to digitised newspapers – some more recent titles are available through eLegal deposit. In total there were 268 unique links to digitised newspapers. This is well short of the 460 Victorian newspapers in Trove. Why? It&amp;rsquo;s possible that the links haven&amp;rsquo;t been added to the SLV catalogue, or that the &amp;lsquo;place newspaper published&amp;rsquo; field hasn&amp;rsquo;t been populated for records that include the links. It&amp;rsquo;s also possible that Trove links are hiding somewhere else in the SLV catalogue!&lt;/p&gt;
&lt;p&gt;To try and fill this gap, I compared the catalogue metadata with my &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_newspaper.csv&#34;&gt;most recent harvest of Trove newspaper titles&lt;/a&gt;. If the Trove url was missing, I searched the catalogue data for the newspaper title. I then manually checked the results, making sure the dates and titles lined up, and added positive matches to &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/newspaper_manual_additions.csv&#34;&gt;a new CSV file&lt;/a&gt; which I merged back into the main dataset. This added another 152 Trove links.&lt;/p&gt;
&lt;p&gt;The next step was to link the &amp;lsquo;place newspaper published&amp;rsquo; values to places with known locations. The &amp;lsquo;place newspaper published&amp;rsquo; information is included in the &lt;code&gt;lds03&lt;/code&gt; field of the harvested metadata. Records often contain references to multiple places, so I split all the newspaper/place combinations out into separate rows. I then matched the places against a list of Victorian place names and coordinates downloaded from the &lt;a href=&#34;https://maps.land.vic.gov.au/lassi/VicnamesUI.jsp&#34;&gt;VicNames database&lt;/a&gt;. If there were no matches, I manually checked and adjusted the place names – for example, I changed &amp;lsquo;East Kew&amp;rsquo; to &amp;lsquo;Kew East&amp;rsquo;, and &amp;lsquo;Bayside&amp;rsquo; to &amp;lsquo;Bayside City&amp;rsquo;.&lt;/p&gt;
&lt;p&gt;To add any Trove digitised newspapers that might still be missing, I made use of my existing Trove harvests. First I compared my &lt;a href=&#34;https://docs.google.com/spreadsheets/d/1rURriHBSf3MocI8wsdl1114t0YeyU0BVSXWeg232MZs/edit?usp=sharing&#34;&gt;Trove Places dataset&lt;/a&gt; with my &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_newspaper.csv&#34;&gt;latest harvest of newspaper titles&lt;/a&gt;. There were a few new titles, so I matched them to places &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/get_places_from_newspapers.ipynb&#34;&gt;using this notebook&lt;/a&gt;, based on my original Trove Places code. I then merged the Trove Places dataset with the new titles and checked it against the catalogue dataset. If any urls were missing, I added a record from the Trove data.&lt;/p&gt;
&lt;p&gt;All of the processing steps are &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/process_newspapers.ipynb&#34;&gt;documented in this notebook&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;building-the-apps&#34;&gt;Building the apps&lt;/h2&gt;
&lt;p&gt;To make the data easily searchable by its geospatial coordinates, I loaded all the data into an SQLite/Spatialite database and &lt;a href=&#34;https://slv-places-481615284700.australia-southeast1.run.app/newspapers&#34;&gt;published it online using Datasette&lt;/a&gt;. The database contains linked tables for titles and places.&lt;/p&gt;
&lt;p&gt;I also created a couple of canned queries which, together with Datasette&amp;rsquo;s built-in JSON API, made it possible to retrieve places and titles based on their distance from a given point. For example, this url retrieves places ordered by their distance from the point at latitude -36.815, longitude 144.965: &lt;a href=&#34;https://slv-places-481615284700.australia-southeast1.run.app/newspapers/places_from_point.json?longitude=144.965&amp;amp;latitude=-36.815&amp;amp;distance=100000&amp;amp;_shape=array&#34;&gt;https://slv-places-481615284700.australia-southeast1.run.app/newspapers/places_from_point.json?longitude=144.965&amp;amp;latitude=-36.815&amp;amp;distance=100000&amp;amp;_shape=array&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When you click on the map in the &lt;a href=&#34;https://slv.wraggelabs.com/newspapers/&#34;&gt;Victorian Newspapers Explorer&lt;/a&gt;, it fires off a request like this to find nearby places. It then makes a second request to find newspapers related to those places and displays the results.&lt;/p&gt;
&lt;p&gt;The Victorian Newspapers Explorer was my first attempt at using &lt;a href=&#34;https://maplibre.org&#34;&gt;MapLibre&lt;/a&gt; rather than Leaflet to display maps with JavaScript. It&amp;rsquo;s more verbose, but more flexible, so I think I&amp;rsquo;ll gradually switch over my other apps, including Trove Places.&lt;/p&gt;
&lt;p&gt;All the code for the Victorian Newspapers Explorer is in the &lt;a href=&#34;https://github.com/wragge/slv-demo-apps&#34;&gt;slv-demo-apps repository&lt;/a&gt;.&lt;/p&gt;
</description>
      <source:markdown>

Newspapers are a vital source for local history. That&#39;s why, [back in 2014](https://discontents.com.au/easter-eggsperiments/index.html), I created the [Trove Places](https://wraggelabs.com/trove-places/map/) app – a map interface to help people find Trove&#39;s digitised newspapers by their place of publication or distribution. Trove Places has proved very popular, and the State Libraries of South Australia and Victoria, amongst others, point their users to it to help with their research. I&#39;ve updated the data several times over the years, though [Trove&#39;s new gatekeeping regime](https://updates.timsherratt.org/2025/05/07/farewell-trove.html) will make future updates difficult.

During [my residency at the State Library of Victoria](https://lab.slv.vic.gov.au/team/tim-sherratt), one of the librarians noted how useful the app was, and asked whether it might be possible to include undigitised newspapers from the SLV catalogue as well as those in Trove. It was, and I did – here&#39;s a [brand new app to explore Victorian newspapers](https://slv.wraggelabs.com/newspapers/), both digitised and undigitised!

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-12-12-12-48-16.png&#34; width=&#34;600&#34; height=&#34;391&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Just &lt;a href=&#34;https://slv.wraggelabs.com/newspapers/&#34;&gt;click on the map&lt;/a&gt; to find Victorian newspapers!&lt;/figcaption&gt;&lt;/figure&gt;

It&#39;s pretty easy to use. You just click on the map in an area you&#39;re interested in. The map will display the 20 nearest places where newspapers were published or distributed. The size of the markers indicates how many titles are associated with each place. In the sidebar, details of the newspapers are listed by place, ordered by their distance from your selected point.

You can also find local newspapers using the [my place](https://slv.wraggelabs.com/myplace/) app. Once you enter an address, newspapers from your suburb or town will be displayed, as well as those from nearby locations. 

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-12-16-12-51-54.png&#34; width=&#34;600&#34; height=&#34;507&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;newspapers from Geelong displayed in the my place app&lt;/figcaption&gt;&lt;/figure&gt;

## Assembling the data

How do you find Victorian newspapers? The reference librarians at the SLV pointed me to the &#39;Place newspaper published&#39; field in the catalogue. Searching this field for &#39;Australia--Victoria&#39; [returns 3,997 results](https://find.slv.vic.gov.au/discovery/search?query=lds03,exact,Australia--Victoria&amp;tab=searchProfile&amp;search_scope=slv_local&amp;vid=61SLV_INST:SLV), compared to the 460 digitised in Trove.

The first step in assembling the data was to harvest the newspaper records from the SLV catalogue. To do this I made use of the Primo JSON API. The method is [documented in this notebook](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/download_newspapers.ipynb). The result was a [newline-delimited JSON file](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/newspapers.ndjson), with one record per line.
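
Newline-delimited JSON is handy because every record stands alone – reading the harvested file is as simple as:

```python
import json

# Read the harvested catalogue records, one JSON object per line.
records = []
with open("newspapers.ndjson") as f:
    for line in f:
        if line.strip():
            records.append(json.loads(line))

print(f"{len(records)} newspaper records harvested")
```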

The harvested metadata doesn&#39;t include links to digitised versions of newspapers in Trove. To add these links I first looked in the `856` field of the newspaper&#39;s MARC record. I also noticed that some Trove links were being loaded from an &#39;edelivery&#39; JSON file, so I added these as well. I ended up with 344 unique links to Trove, but not all of these were to digitised newspapers – some more recent titles are available through eLegal deposit. In total there were 268 unique links to digitised newspapers. This is well short of the 460 Victorian newspapers in Trove. Why? It&#39;s possible that the links haven&#39;t been added to the SLV catalogue, or that the &#39;place newspaper published&#39; field hasn&#39;t been populated for records that include the links. It&#39;s also possible that Trove links are hiding somewhere else in the SLV catalogue!
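
Sorting the links was just a matter of filtering urls – something like the sketch below, though the `nla.news-title` test for digitised titles is an assumption rather than the exact check:

```python
# Illustrative sorting of harvested links: urls containing
# 'nla.news-title' are assumed to identify digitised newspaper titles;
# other Trove links (such as eLegal deposit) are kept separately.
def split_trove_links(links):
    digitised, other = set(), set()
    for url in links:
        if "nla.news-title" in url:
            digitised.add(url)
        elif "nla.gov.au" in url:
            other.add(url)
    return digitised, other
```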

To try and fill this gap, I compared the catalogue metadata with my [most recent harvest of Trove newspaper titles](https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_newspaper.csv). If the Trove url was missing, I searched the catalogue data for the newspaper title. I then manually checked the results, making sure the dates and titles lined up, and added positive matches to [a new CSV file](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/newspaper_manual_additions.csv) which I merged back into the main dataset. This added another 152 Trove links.

The next step was to link the &#39;place newspaper published&#39; values to places with known locations. The &#39;place newspaper published&#39; information is included in the `lds03` field of the harvested metadata. Records often contain references to multiple places, so I split all the newspaper/place combinations out into separate rows. I then matched the places against a list of Victorian place names and coordinates downloaded from the [VicNames database](https://maps.land.vic.gov.au/lassi/VicnamesUI.jsp). If there were no matches, I manually checked and adjusted the place names – for example, I changed &#39;East Kew&#39; to &#39;Kew East&#39;, and &#39;Bayside&#39; to &#39;Bayside City&#39;.
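
The matching itself is a simple join on the place name. A sketch of the idea, with illustrative data and column names:

```python
import pandas as pd

# Illustrative inputs: one row per newspaper/place combination, and the
# VicNames list of place names with coordinates (column names assumed).
titles = pd.DataFrame({"title": ["Kew Advertiser"], "place": ["East Kew"]})
vicnames = pd.DataFrame(
    {"place": ["Kew East"], "latitude": [-37.79], "longitude": [145.05]}
)

# Apply manual corrections before matching on the place name.
corrections = {"East Kew": "Kew East", "Bayside": "Bayside City"}
titles["place"] = titles["place"].replace(corrections)

matched = titles.merge(vicnames, on="place", how="left")
unmatched = matched[matched["latitude"].isna()]  # flag for manual checking
```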

To add any Trove digitised newspapers that might still be missing, I made use of my existing Trove harvests. First I compared my [Trove Places dataset](https://docs.google.com/spreadsheets/d/1rURriHBSf3MocI8wsdl1114t0YeyU0BVSXWeg232MZs/edit?usp=sharing) with my [latest harvest of newspaper titles](https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_newspaper.csv). There were a few new titles, so I matched them to places [using this notebook](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/get_places_from_newspapers.ipynb), based on my original Trove Places code. I then merged the Trove Places dataset with the new titles and checked it against the catalogue dataset. If any urls were missing, I added a record from the Trove data.
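
The gap-filling step can be sketched in the same way – the frames, titles, urls, and column names here are illustrative:

```python
import pandas as pd

# Illustrative frames standing in for the catalogue harvest and the
# merged Trove Places data (titles and urls are placeholders).
catalogue = pd.DataFrame(
    {"title": ["Newspaper A"], "trove_url": ["https://example.org/title-a"]}
)
trove = pd.DataFrame(
    {"title": ["Newspaper B"], "trove_url": ["https://example.org/title-b"]}
)

# Any Trove title whose url is missing from the catalogue data is
# appended as a new record.
known = set(catalogue["trove_url"].dropna())
extra = trove[~trove["trove_url"].isin(known)]
combined = pd.concat([catalogue, extra], ignore_index=True)
```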

All of the processing steps are [documented in this notebook](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/process_newspapers.ipynb).

## Building the apps

To make the data easily searchable by its geospatial coordinates, I loaded all the data into an SQLite/Spatialite database and [published it online using Datasette](https://slv-places-481615284700.australia-southeast1.run.app/newspapers). The database contains linked tables for titles and places.
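
Loading point data into SpatiaLite looks something like this sketch – the table layout is illustrative, and you&#39;ll need the `mod_spatialite` extension installed:

```python
import sqlite3

# Create a SpatiaLite-enabled database (requires mod_spatialite).
conn = sqlite3.connect("newspapers.db")
conn.enable_load_extension(True)
conn.load_extension("mod_spatialite")

conn.execute("SELECT InitSpatialMetaData(1)")
conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("SELECT AddGeometryColumn('places', 'geom', 4326, 'POINT', 'XY')")

# MakePoint takes longitude, latitude, and the SRID.
conn.execute(
    "INSERT INTO places (name, geom) VALUES (?, MakePoint(?, ?, 4326))",
    ("Bendigo", 144.28, -36.76),
)
conn.commit()
```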

I also created a couple of canned queries which, together with Datasette&#39;s built-in JSON API, made it possible to retrieve places and titles based on their distance from a given point. For example, this url retrieves places ordered by their distance from the point at latitude -36.815, longitude 144.965: https://slv-places-481615284700.australia-southeast1.run.app/newspapers/places_from_point.json?longitude=144.965&amp;latitude=-36.815&amp;distance=100000&amp;_shape=array
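
Because it&#39;s just a url, the canned query can be used from any environment – for example, in Python:

```python
import requests

# Retrieve places ordered by distance from a point, using the canned
# query JSON API (distance is assumed to be in metres).
url = (
    "https://slv-places-481615284700.australia-southeast1.run.app"
    "/newspapers/places_from_point.json"
)
params = {
    "longitude": 144.965,
    "latitude": -36.815,
    "distance": 100000,
    "_shape": "array",
}
places = requests.get(url, params=params).json()
for place in places[:5]:
    print(place)
```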

When you click on the map in the [Victorian Newspapers Explorer](https://slv.wraggelabs.com/newspapers/), it fires off a request like this to find nearby places. It then makes a second request to find newspapers related to those places and displays the results.

The Victorian Newspapers Explorer was my first attempt at using [MapLibre](https://maplibre.org) rather than Leaflet to display maps with JavaScript. It&#39;s more verbose, but more flexible, so I think I&#39;ll gradually switch over my other apps, including Trove Places.

All the code for the Victorian Newspapers Explorer is in the [slv-demo-apps repository](https://github.com/wragge/slv-demo-apps).




</source:markdown>
    </item>
    
    <item>
      <title>Why bother?</title>
      <link>https://updates.timsherratt.org/2025/12/03/why-bother.html</link>
      <pubDate>Wed, 03 Dec 2025 16:18:22 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/12/03/why-bother.html</guid>
      <description>&lt;p&gt;&lt;em&gt;This was the introduction to my talk on the results of my time as Creative Technologist-in-Residence at the State Library of Victoria. My slides, with my full notes &lt;a href=&#34;https://slides.com/wragge/slv-my-place&#34;&gt;are available online&lt;/a&gt;, but after a very strange year that has travelled from disappointment to exhilaration, I thought it was worth posting these words separately.&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;The work that I do, that I&amp;rsquo;ve been doing for the past 30 years, is focused on helping people find, use, and understand the wonderfully rich collections held by our libraries, archives, and museums – the GLAM sector. Much of it is quite practical, resulting in tools and applications that are used by a wide range of researchers. Some of it is playful, some of it is critical, and some of it is just weird.&lt;/p&gt;
&lt;p&gt;You can browse through some of this history on &lt;a href=&#34;https://wraggelabs.com&#34;&gt;wraggelabs.com&lt;/a&gt;. And if you&amp;rsquo;re interested in my current crop of tools you can head to the &lt;a href=&#34;https://glam-workbench.net&#34;&gt;GLAM Workbench&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As some of you may know, I had &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;a few setbacks&lt;/a&gt; at the beginning of this year which really made me wonder whether I wanted to continue doing this sort of work.&lt;/p&gt;
&lt;p&gt;I mean, why bother?&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m really grateful that &lt;a href=&#34;https://slv.wraggelabs.com&#34;&gt;this residency&lt;/a&gt; has given me a chance to refocus on the reasons why I do what I do.&lt;/p&gt;
&lt;p&gt;I suppose my starting point is the fact that libraries can&amp;rsquo;t do everything themselves.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m thinking here specifically about the digital research space. There&amp;rsquo;s a lot that libraries, and other GLAM organisations, &lt;em&gt;can&lt;/em&gt; do – provide search interfaces, APIs, downloadable datasets, documentation, and examples of how to access APIs and datasets using code. The sorts of things that the &lt;a href=&#34;https://lab.slv.vic.gov.au&#34;&gt;SLV LAB&lt;/a&gt; is doing.&lt;/p&gt;
&lt;p&gt;I should pause here to unpack some acronyms. APIs deliver data in a form that machines can understand and process. Websites are for humans, APIs are for computers. APIs are also building blocks which can be connected up to create new applications – and I&amp;rsquo;ll be showing some examples of this later on.&lt;/p&gt;
&lt;p&gt;So there is much that GLAM organisations can do to support digital research. But it will never be enough. Researchers – whether they be academics or family historians – will always want more. It is in the nature of research to ask new questions, to head off in new directions.&lt;/p&gt;
&lt;p&gt;But rather than see this as a source of tension, I see it as an opportunity for collaboration. An opportunity to cultivate the &lt;em&gt;in-between&lt;/em&gt; spaces where research methods, tools, and results can feed back into the contextual frameworks of GLAM collections. Where GLAM organisations can share and celebrate the work that&amp;rsquo;s done with their data. Where all can find inspiration, ideas and support.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve tended to call this sort of stuff infrastructure, but I think that really downplays the human aspect. The research sector has started to develop the funding and career structures necessary to allow people to build and maintain these infrastructures, but we need more. We need to recognise that a single tool, developed by an individual without institutional support, can be just as important as a multi-million dollar platform. Passion is precious and needs to be protected.&lt;/p&gt;
&lt;p&gt;Most of all, we need to keep a focus on the ethical imperatives – the reasons &lt;em&gt;why&lt;/em&gt; we bother and &lt;em&gt;why&lt;/em&gt; it matters. For me it boils down to openness and generosity. I have benefited greatly from the openness and generosity of others, and I want to pass that on. It&amp;rsquo;s the glue we need to hold those in-between spaces together; the sustenance we need to maintain our enthusiasm in the face of all the crap; the inspiration we need to try something new.&lt;/p&gt;
&lt;p&gt;Initiatives like the SLV LAB are important, not just because they foster innovation, but because they invite new ideas in. They even give space for ageing hackers like me to spend some dedicated time doing what they love – crafting new pathways for people to explore our glorious GLAM collections.&lt;/p&gt;
</description>
      <source:markdown>*This was the introduction to my talk on the results of my time as Creative Technologist-in-Residence at the State Library of Victoria. My slides, with my full notes [are available online](https://slides.com/wragge/slv-my-place), but after a very strange year that has travelled from disappointment to exhilaration, I thought it was worth posting these words separately.*

----

The work that I do, that I&#39;ve been doing for the past 30 years, is focused on helping people find, use, and understand the wonderfully rich collections held by our libraries, archives, and museums – the GLAM sector. Much of it is quite practical, resulting in tools and applications that are used by a wide range of researchers. Some of it is playful, some of it is critical, and some of it is just weird.

You can browse through some of this history on [wraggelabs.com](https://wraggelabs.com). And if you&#39;re interested in my current crop of tools you can head to the [GLAM Workbench](https://glam-workbench.net).

As some of you may know, I had [a few setbacks](https://updates.timsherratt.org/2025/05/07/farewell-trove.html) at the beginning of this year which really made me wonder whether I wanted to continue doing this sort of work.

I mean, why bother?

I&#39;m really grateful that [this residency](https://slv.wraggelabs.com) has given me a chance to refocus on the reasons why I do what I do.

I suppose my starting point is the fact that libraries can&#39;t do everything themselves. 

I&#39;m thinking here specifically about the digital research space. There&#39;s a lot that libraries, and other GLAM organisations, *can* do – provide search interfaces, APIs, downloadable datasets, documentation, and examples of how to access APIs and datasets using code. The sorts of things that the [SLV LAB](https://lab.slv.vic.gov.au) is doing.

I should pause here to unpack some acronyms. APIs deliver data in a form that machines can understand and process. Websites are for humans, APIs are for computers. APIs are also building blocks which can be connected up to create new applications – and I&#39;ll be showing some examples of this later on.

So there is much that GLAM organisations can do to support digital research. But it will never be enough. Researchers – whether they be academics or family historians – will always want more. It is in the nature of research to ask new questions, to head off in new directions.

But rather than see this as a source of tension, I see it as an opportunity for collaboration. An opportunity to cultivate the *in-between* spaces where research methods, tools, and results can feed back into the contextual frameworks of GLAM collections. Where GLAM organisations can share and celebrate the work that&#39;s done with their data. Where all can find inspiration, ideas and support.

We&#39;ve tended to call this sort of stuff infrastructure, but I think that really downplays the human aspect. The research sector has started to develop the funding and career structures necessary to allow people to build and maintain these infrastructures, but we need more. We need to recognise that a single tool, developed by an individual without institutional support, can be just as important as a multi-million dollar platform. Passion is precious and needs to be protected.

Most of all, we need to keep a focus on the ethical imperatives – the reasons *why* we bother and *why* it matters. For me it boils down to openness and generosity. I have benefited greatly from the openness and generosity of others, and I want to pass that on. It&#39;s the glue we need to hold those in-between spaces together; the sustenance we need to maintain our enthusiasm in the face of all the crap; the inspiration we need to try something new.

Initiatives like the SLV LAB are important, not just because they foster innovation, but because they invite new ideas in. They even give space for ageing hackers like me to spend some dedicated time doing what they love – crafting new pathways for people to explore our glorious GLAM collections. 

</source:markdown>
    </item>
    
    <item>
      <title>Counting down... (to the end of my SLV residency)</title>
      <link>https://updates.timsherratt.org/2025/11/19/counting-down-to-the-end.html</link>
      <pubDate>Wed, 19 Nov 2025 16:01:22 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/11/19/counting-down-to-the-end.html</guid>
      <description>&lt;p&gt;My stint as &lt;a href=&#34;https://updates.timsherratt.org/2025/09/22/creative-technologistinresidence-at-the-state.html&#34;&gt;Creative Technologist-in-Residence at the State Library of Victoria LAB&lt;/a&gt; comes to an end in a few weeks&amp;rsquo; time and I&amp;rsquo;m frantically trying to pull things together. I&amp;rsquo;ll be back on-site at the Library from 1 to 5 December for a few events, and to report back to staff on what I&amp;rsquo;ve been doing.&lt;/p&gt;
&lt;p&gt;On Tuesday &lt;strong&gt;2 December&lt;/strong&gt;, there&amp;rsquo;ll be a public workshop on using and contributing to the &lt;a href=&#34;https://glam-workbench.net&#34;&gt;GLAM Workbench&lt;/a&gt;. Here&amp;rsquo;s the blurb:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;More and more GLAM organisations are looking to share their data to foster creativity and support new types of research. But how can you help potential users understand the possibilities of your data? This workshop will explore how GLAM organisations can create and share resources that encourage experimentation.&lt;/p&gt;
&lt;p&gt;The GLAM Workbench is a large collection of tools, hacks, and tutorials aimed at helping researchers make use of collection data. It uses platforms such as Jupyter notebooks to create live, working examples that run in your browser without additional software. Similar repositories of computational resources are being developed by GLAM organisations around the world.&lt;/p&gt;
&lt;p&gt;This workshop will introduce the technologies and standards used in the GLAM Workbench, such as Jupyter notebooks. It will provide an overview of related activity around the world, including best practice guidelines for GLAM organisations developing computational resources. It will explain how organisations and individuals can contribute content to the GLAM Workbench, or use it as a model to create their own specialised workbenches.&lt;/p&gt;
&lt;p&gt;Sharing data is important, but so is sharing skills, tools, and knowledge. Come along to find out how the GLAM Workbench can help.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It&amp;rsquo;s a free, hybrid event (in person and online) and will run from 1.00-3.00pm. A sign-up page should be available soon.&lt;/p&gt;
&lt;p&gt;On Wednesday &lt;strong&gt;3 December&lt;/strong&gt; I&amp;rsquo;m presenting the results of my residency in a &amp;lsquo;technologist&amp;rsquo;s talk&amp;rsquo;. It&amp;rsquo;s an internal event, but it&amp;rsquo;s in the public &amp;lsquo;Create quarter&amp;rsquo; of the Library, so I think anyone can pop in. Hopefully there&amp;rsquo;ll be a video I can share.&lt;/p&gt;
&lt;p&gt;To give you an idea of what I&amp;rsquo;ll be talking about, here&amp;rsquo;s some of the outcomes so far:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;hacking the library workshop (&lt;a href=&#34;https://slides.com/wragge/slv-code-club&#34;&gt;slides&lt;/a&gt;, and &lt;a href=&#34;https://updates.timsherratt.org/2025/09/23/exploring-slv-urls.html&#34;&gt;blog post about urls&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;bounding boxes for parish maps (&lt;a href=&#34;https://updates.timsherratt.org/2025/10/06/creating-bounding-boxes-for-parish.html&#34;&gt;blog post&lt;/a&gt;, &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency&#34;&gt;code&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;geolocating the Committee for Urban Action collection of photographs (&lt;a href=&#34;https://wragge.github.io/slv-demo-apps/cua-browser.html&#34;&gt;prototype interface&lt;/a&gt;, still documenting the method)&lt;/li&gt;
&lt;li&gt;a new fully-searchable version of the Sands &amp;amp; MacDougall&amp;rsquo;s directories (&lt;a href=&#34;https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html&#34;&gt;blog post&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;database&lt;/a&gt;, and &lt;a href=&#34;https://updates.timsherratt.org/2025/11/16/some-sands-mac-tweaks-thanks.html&#34;&gt;another blog post&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;georeferencing digitised maps – over 500 so far! (&lt;a href=&#34;https://wragge.github.io/slv-allmaps/&#34;&gt;documentation&lt;/a&gt;, &lt;a href=&#34;https://wragge.github.io/slv-allmaps/dashboard.html&#34;&gt;dashboard&lt;/a&gt;, &lt;a href=&#34;https://github.com/wragge/slv-allmaps&#34;&gt;data repository&lt;/a&gt;, &lt;a href=&#34;https://updates.timsherratt.org/2025/11/04/turning-the-slvs-maps-into.html&#34;&gt;blog post&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;and as of yesterday, 3,000+ geolocated newspapers (documentation and interface coming!)&lt;/li&gt;
&lt;/ul&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-11-18-15-22-26.png&#34; width=&#34;600&#34; height=&#34;356&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;First attempt at mapping places of publication and distribution of Victorian newspapers&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;At the moment I&amp;rsquo;m trying to bring it all together in a new interface that lets you type in an address and find collection materials relating to your home, your street, and your suburb. Only two weeks to go! Eeek!&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-11-15-17-30-26.png&#34; width=&#34;600&#34; height=&#34;454&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Work in progress!&lt;/figcaption&gt;&lt;/figure&gt;
</description>
      <source:markdown>My stint as [Creative Technologist-in-Residence at the State Library of Victoria LAB](https://updates.timsherratt.org/2025/09/22/creative-technologistinresidence-at-the-state.html) comes to an end in a few weeks&#39; time and I&#39;m frantically trying to pull things together. I&#39;ll be back on-site at the Library from 1 to 5 December for a few events, and to report back to staff on what I&#39;ve been doing.

On Tuesday **2 December**, there&#39;ll be a public workshop on using and contributing to the [GLAM Workbench](https://glam-workbench.net). Here&#39;s the blurb:

&gt; More and more GLAM organisations are looking to share their data to foster creativity and support new types of research. But how can you help potential users understand the possibilities of your data? This workshop will explore how GLAM organisations can create and share resources that encourage experimentation.
&gt;
&gt; The GLAM Workbench is a large collection of tools, hacks, and tutorials aimed at helping researchers make use of collection data. It uses platforms such as Jupyter notebooks to create live, working examples that run in your browser without additional software. Similar repositories of computational resources are being developed by GLAM organisations around the world.
&gt;
&gt; This workshop will introduce the technologies and standards used in the GLAM Workbench, such as Jupyter notebooks. It will provide an overview of related activity around the world, including best practice guidelines for GLAM organisations developing computational resources. It will explain how organisations and individuals can contribute content to the GLAM Workbench, or use it as a model to create their own specialised workbenches.
&gt;
&gt; Sharing data is important, but so is sharing skills, tools, and knowledge. Come along to find out how the GLAM Workbench can help.

It&#39;s a free, hybrid event (in person and online) and will run from 1.00-3.00pm. A sign-up page should be available soon.

On Wednesday **3 December** I&#39;m presenting the results of my residency in a &#39;technologist&#39;s talk&#39;. It&#39;s an internal event, but it&#39;s in the public &#39;Create quarter&#39; of the Library, so I think anyone can pop in. Hopefully there&#39;ll be a video I can share.

To give you an idea of what I&#39;ll be talking about, here&#39;s some of the outcomes so far:

- hacking the library workshop ([slides](https://slides.com/wragge/slv-code-club), and [blog post about urls](https://updates.timsherratt.org/2025/09/23/exploring-slv-urls.html))
- bounding boxes for parish maps ([blog post](https://updates.timsherratt.org/2025/10/06/creating-bounding-boxes-for-parish.html), [code](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency))
- geolocating the Committee for Urban Action collection of photographs ([prototype interface](https://wragge.github.io/slv-demo-apps/cua-browser.html), still documenting the method)
- a new fully-searchable version of the Sands &amp; MacDougall&#39;s directories ([blog post](https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html), [database](https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/), and [another blog post](https://updates.timsherratt.org/2025/11/16/some-sands-mac-tweaks-thanks.html))
- georeferencing digitised maps – over 500 so far! ([documentation](https://wragge.github.io/slv-allmaps/), [dashboard](https://wragge.github.io/slv-allmaps/dashboard.html), [data repository](https://github.com/wragge/slv-allmaps), [blog post](https://updates.timsherratt.org/2025/11/04/turning-the-slvs-maps-into.html))
- and as of yesterday, 3,000+ geolocated newspapers (documentation and interface coming!)

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-11-18-15-22-26.png&#34; width=&#34;600&#34; height=&#34;356&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;First attempt at mapping places of publication and distribution of Victorian newspapers&lt;/figcaption&gt;&lt;/figure&gt;

At the moment I&#39;m trying to bring it all together in a new interface that lets you type in an address and find collection materials relating to your home, your street, and your suburb. Only two weeks to go! Eeek!

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-11-15-17-30-26.png&#34; width=&#34;600&#34; height=&#34;454&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Work in progress!&lt;/figcaption&gt;&lt;/figure&gt;




</source:markdown>
    </item>
    
    <item>
      <title>Some Sands &amp; Mac tweaks thanks to ALTO and IIIF</title>
      <link>https://updates.timsherratt.org/2025/11/16/some-sands-mac-tweaks-thanks.html</link>
      <pubDate>Sun, 16 Nov 2025 15:17:26 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/11/16/some-sands-mac-tweaks-thanks.html</guid>
      <description>&lt;p&gt;I posted recently about &lt;a href=&#34;https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html&#34;&gt;my new fully-searchable version of the Sands &amp;amp; MacDougall directories&lt;/a&gt;. I&amp;rsquo;ve now moved on to try and pull together a number of the State Library of Victoria&amp;rsquo;s place-based collections into a new discovery interface. It&amp;rsquo;s going to be a busy couple of weeks as my residency ends in early December!&lt;/p&gt;
&lt;p&gt;I wanted to incorporate Sands &amp;amp; Mac search results into the new interface. Getting the data was easy because &lt;a href=&#34;https://datasette.io&#34;&gt;Datasette&lt;/a&gt; has a JSON API baked in. But what about the images? I could just display a thumbnail of the whole page, but it would be better to show a snippet of the actual entry. Thanks to &lt;a href=&#34;https://iiif.io&#34;&gt;IIIF&lt;/a&gt; and &lt;a href=&#34;https://en.wikipedia.org/wiki/Analyzed_Layout_and_Text_Object&#34;&gt;ALTO&lt;/a&gt;, I now can.&lt;/p&gt;
&lt;p&gt;IIIF makes it easy to cut small sections out of a larger image. You just put the coordinates of the desired section in the IIIF url. As I noted in my previous post, the ALTO files that contain the OCR data from Sands &amp;amp; Mac include the coordinates of every line, and every word. I just had to bring the two together.&lt;/p&gt;
&lt;p&gt;All I did was update the code that extracts the data from the ALTO files to save the results as newline-delimited JSON instead of a plain text file. Each line in each volume of Sands &amp;amp; Mac is now saved as a JSON object that contains the text, as well as the height, width, vertical position, and horizontal position of the line within the page image. When I load up the SQLite database, I add the values for &lt;code&gt;h&lt;/code&gt;, &lt;code&gt;w&lt;/code&gt;, &lt;code&gt;x&lt;/code&gt;, and &lt;code&gt;y&lt;/code&gt; as well as the text for each line.&lt;/p&gt;
&lt;p&gt;What does this make possible?&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When you go to an individual entry, the page image now automatically pans and zooms so that the current entry is at the centre of the image viewer. I just updated the &lt;a href=&#34;https://openseadragon.github.io&#34;&gt;OpenSeadragon&lt;/a&gt; code to focus on the entry&amp;rsquo;s position.&lt;/li&gt;
&lt;li&gt;If you share an entry on social media, a snipped out section of the page image showing the selected entry is displayed as there&amp;rsquo;s now an image &lt;code&gt;META&lt;/code&gt; tag that points to an IIIF url.&lt;/li&gt;
&lt;li&gt;You can retrieve entries via the API and use the coordinates to request snipped out images of them via IIIF.&lt;/li&gt;
&lt;/ol&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-11-16-15-06-52.png&#34; width=&#34;600&#34; height=&#34;330&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Nice image snippets thanks to IIIF and ALTO (and a sneak preview of what&#39;s coming...)&lt;/figcaption&gt;&lt;/figure&gt;
</description>
      <source:markdown>I posted recently about [my new fully-searchable version of the Sands &amp; MacDougall directories](https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html). I&#39;ve now moved on to try and pull together a number of the State Library of Victoria&#39;s place-based collections into a new discovery interface. It&#39;s going to be a busy couple of weeks as my residency ends in early December!

I wanted to incorporate Sands &amp; Mac search results into the new interface. Getting the data was easy because [Datasette](https://datasette.io) has a JSON API baked in. But what about the images? I could just display a thumbnail of the whole page, but it would be better to show a snippet of the actual entry. Thanks to [IIIF](https://iiif.io) and [ALTO](https://en.wikipedia.org/wiki/Analyzed_Layout_and_Text_Object), I now can.

IIIF makes it easy to cut small sections out of a larger image. You just put the coordinates of the desired section in the IIIF url. As I noted in my previous post, the ALTO files that contain the OCR data from Sands &amp; Mac include the coordinates of every line, and every word. I just had to bring the two together.
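
The region is simply an `x,y,w,h` string in the url path. Here&#39;s a sketch of building a snippet url – the base url is a placeholder, not an actual SLV endpoint:

```python
def iiif_snippet_url(base, x, y, w, h, width=600):
    """Build a IIIF Image API url for a region of a larger image.

    Follows the Image API pattern:
    {base}/{region}/{size}/{rotation}/{quality}.{format}
    """
    return f"{base}/{x},{y},{w},{h}/{width},/0/default.jpg"

# base is a placeholder IIIF image identifier.
print(iiif_snippet_url("https://example.org/iiif/page-image", 120, 840, 900, 40))
```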

All I did was update the code that extracts the data from the ALTO files to save the results as newline-delimited JSON instead of a plain text file. Each line in each volume of Sands &amp; Mac is now saved as a JSON object that contains the text, as well as the height, width, vertical position, and horizontal position of the line within the page image. When I load up the SQLite database, I add the values for `h`, `w`, `x`, and `y` as well as the text for each line.
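
Extracting line text and coordinates from ALTO only needs the standard library. A minimal sketch – file names are placeholders and namespace handling is simplified:

```python
import json
import xml.etree.ElementTree as ET

# Sketch of extracting each TextLine and its position from an ALTO
# file. ALTO stores positions in the HPOS, VPOS, WIDTH and HEIGHT
# attributes; namespaces vary between versions, so match local names.
def alto_lines(path):
    root = ET.parse(path).getroot()
    for line in root.iter():
        if line.tag.rsplit("}", 1)[-1] != "TextLine":
            continue
        words = [
            el.get("CONTENT", "")
            for el in line.iter()
            if el.tag.rsplit("}", 1)[-1] == "String"
        ]
        yield {
            "text": " ".join(words),
            "x": int(float(line.get("HPOS", 0))),
            "y": int(float(line.get("VPOS", 0))),
            "w": int(float(line.get("WIDTH", 0))),
            "h": int(float(line.get("HEIGHT", 0))),
        }

# Save one JSON object per line, ready for loading into SQLite.
with open("volume.ndjson", "w") as out:
    for record in alto_lines("page.xml"):
        out.write(json.dumps(record) + "\n")
```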

What does this make possible?

1. When you go to an individual entry, the page image now automatically pans and zooms so that the current entry is at the centre of the image viewer. I just updated the [OpenSeadragon](https://openseadragon.github.io) code to focus on the entry&#39;s position.
2. If you share an entry on social media, a snipped-out section of the page image showing the selected entry is displayed, as there&#39;s now an image `META` tag that points to an IIIF url.
3. You can retrieve entries via the API and use the coordinates to request snipped-out images of them via IIIF (see the sketch below).
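
To give a flavour of that last option, here&#39;s a hedged sketch – the Datasette url, table, and column names below are placeholders, not the real deployment:

```python
import requests

# Hypothetical Datasette url; the real instance and table names will differ.
API = &#34;https://sandsmac.example.org/sandsmac/lines.json&#34;

# Datasette&#39;s JSON API supports full-text search via the _search parameter.
response = requests.get(API, params={&#34;_search&#34;: &#34;sherratt&#34;, &#34;_shape&#34;: &#34;objects&#34;})
for row in response.json()[&#34;rows&#34;]:
    region = &#34;,&#34;.join(str(row[k]) for k in (&#34;x&#34;, &#34;y&#34;, &#34;w&#34;, &#34;h&#34;))
    image_id = row[&#34;image_id&#34;]  # assumed column linking a line to its page image
    print(f&#34;https://iiif.example.org/image/{image_id}/{region}/max/0/default.jpg&#34;)
```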

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-11-16-15-06-52.png&#34; width=&#34;600&#34; height=&#34;330&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Nice image snippets thanks to IIIF and ALTO (and a sneak preview of what&#39;s coming...)&lt;/figcaption&gt;&lt;/figure&gt;


</source:markdown>
    </item>
    
    <item>
      <title>A new way of searching Sands &amp; Mac</title>
      <link>https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html</link>
      <pubDate>Wed, 12 Nov 2025 22:25:21 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/11/12/a-new-way-of-searching.html</guid>
      <description>&lt;p&gt;In the fortnight I spent onsite at the State Library of Victoria, &amp;lsquo;Sands &amp;amp; Mac&amp;rsquo; was mentioned many times. And no wonder. The &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;amp;collectionId=81213035910007636&#34;&gt;Sands &amp;amp; McDougall&amp;rsquo;s directories&lt;/a&gt; are a goldmine for anyone researching family, local, or social history. They list thousands of names and addresses, enabling you to find individuals, and explore changing land use over time. When people ask the SLV&amp;rsquo;s librarians, &amp;lsquo;What can you tell me about the history of my house?&amp;rsquo;, Sands &amp;amp; Mac is one of the first resources consulted.&lt;/p&gt;
&lt;p&gt;The SLV has digitised &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;amp;collectionId=81213035910007636&#34;&gt;24 volumes of Sands &amp;amp; Mac&lt;/a&gt;, one every five years from 1860 to 1974. You can browse the contents of each volume in the SLV image viewer, using the partial contents listing to help you find your way to sections of interest. To search the full text content you need to use the PDF version, either in the built-in viewer, or by downloading the PDF. There&amp;rsquo;s a &lt;a href=&#34;https://blogs.slv.vic.gov.au/tips-and-tricks/collection-discovery-tips-sands-mcdougalls-directories/&#34;&gt;handy guide to using Sands &amp;amp; Mac&lt;/a&gt; that explains the options.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;However, there&amp;rsquo;s currently no way of searching across all 24 volumes, so as part of my residency at the SLV LAB, I thought I&amp;rsquo;d make one!&lt;/strong&gt;&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/sands-and-mac.png&#34; width=&#34;600&#34; height=&#34;310&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;&lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;&lt;b&gt;Try it now!&lt;/b&gt;&lt;/a&gt;&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;My &lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;new Sands &amp;amp; Mac database&lt;/a&gt; follows the pattern I&amp;rsquo;ve used previously to create fully-searchable versions of the &lt;a href=&#34;https://glam-workbench.net/trove-journals/nsw-post-office-directories/&#34;&gt;NSW Post Office directories&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/trove-journals/sydney-telephone-directories/&#34;&gt;Sydney telephone directories&lt;/a&gt;, and &lt;a href=&#34;https://glam-workbench.net/tasmanian-post-office-directories/&#34;&gt;Tasmanian Post Office directories&lt;/a&gt;. Every line of text is saved to a database, so a single query searches for entries across all volumes. You can also use advanced search features like wildcards and boolean operators.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/sands-and-mac-search.png&#34; width=&#34;600&#34; height=&#34;543&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Search across all 24 volumes!&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Once you&amp;rsquo;ve found a relevant entry you can view it in context, alongside a zoomable image of the page. You can even use Zotero to save individual entries to your own research database. &lt;a href=&#34;https://chineseaustralia.org/from-the-archive-uncovering-the-everyday-heritage-of-chinese-tasmanians/&#34;&gt;This blog post&lt;/a&gt; from the Everyday Heritage project describes how the Tasmanian directories have been used to map Tasmania&amp;rsquo;s Chinese population.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/sands-and-mac-entry.png&#34; width=&#34;600&#34; height=&#34;370&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;View each entry in context! (Here&#39;s my Dad building his first house in Beaumaris in the 1950s.)&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;There are still a few things I&amp;rsquo;d like to try, such as making use of the table of contents information for each volume. I&amp;rsquo;d also like to create some additional entry points to take users directly to listings for individual suburbs (maybe even streets!). Each volume has a directory of suburbs, so it would be a matter of extracting and cleaning the data and linking the entries to digitised pages. Certainly possible, but I don&amp;rsquo;t think I&amp;rsquo;ll have time to get it all done before the end of my residency. Perhaps I&amp;rsquo;ll try to get at least one volume done to demonstrate how it might work, and the value it would add. As I was writing this blog post I also realised there&amp;rsquo;s &lt;a href=&#34;https://www.environment.vic.gov.au/sustainability/victoria-unearthed/about-the-data/sands-and-mcdougall&#34;&gt;a dataset of businesses&lt;/a&gt; extracted from the Sands &amp;amp; Mac, so I need to think about how I can use that as well!&lt;/p&gt;
&lt;h2 id=&#34;technical-information-follows&#34;&gt;Technical information follows&amp;hellip;&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve documented the process I used to create fully-searchable versions of the &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/&#34;&gt;Tasmanian&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/trove-journals/create-text-db-indexed-by-line/&#34;&gt;NSW directories&lt;/a&gt; in the GLAM Workbench. I followed a similar method for Sands and Mac, though with a few dead-ends and discoveries along the way.&lt;/p&gt;
&lt;h3 id=&#34;downloading-the-pdfs&#34;&gt;Downloading the PDFs&lt;/h3&gt;
&lt;p&gt;I assumed that it would be easiest to work from the PDF versions of each volume, as I&amp;rsquo;d done for Tasmania. So I set about finding a way to download them all. There are only 24 volumes, so I &lt;em&gt;could&lt;/em&gt; have downloaded them manually, but where&amp;rsquo;s the fun in that?&lt;/p&gt;
&lt;p&gt;I started with a CSV file listing the Sands &amp;amp; Mac volumes that I downloaded from the catalogue. This gave me the Alma identifiers for each volume. To download the PDFs I needed two more identifiers: the &lt;code&gt;IE&lt;/code&gt; identifier assigned to each digitised item, and a file identifier that points to the PDF version of the item. The &lt;code&gt;IE&lt;/code&gt; identifier can be extracted from the item&amp;rsquo;s MARC record, as I described in &lt;a href=&#34;https://updates.timsherratt.org/2025/09/23/exploring-slv-urls.html&#34;&gt;my post on exploring urls&lt;/a&gt;. The PDF file identifier was a bit more difficult to track down. The PDF links in the image viewer are generated dynamically, so the data had to be coming from somewhere. Eventually I found that the viewer loaded a JSON file with all sorts of useful metadata in it!&lt;/p&gt;
&lt;p&gt;The url to download the JSON file is: &lt;code&gt;https://viewerapi.slv.vic.gov.au/?entity=[IE identifier]&amp;amp;dc_arrays=1&lt;/code&gt;. In the &lt;code&gt;summary&lt;/code&gt; section I found identifiers for &lt;code&gt;small_pdf&lt;/code&gt; and &lt;code&gt;master_pdf&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;I could then use these identifiers to construct urls to download the PDFs themselves: &lt;code&gt;https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;amp;dps_pid=[PDF id]&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Once I had the PDFs, I used &lt;a href=&#34;https://github.com/pymupdf/PyMuPDF&#34;&gt;PyMuPDF&lt;/a&gt; to extract all the text and images. As I suspected, the text wasn&amp;rsquo;t really fit for purpose. The OCR was OK, but the column structures were a mess. Because I wanted to index each entry individually, it was important to try and get the columns represented as accurately as possible. The images in the small PDFs were already bitonal, so I started feeding them to &lt;a href=&#34;https://github.com/tesseract-ocr/tesseract&#34;&gt;Tesseract&lt;/a&gt; to see if I could get better results. After a bit of tweaking, things were looking pretty good. But when I came to compile all the data, I realised there was a potential problem matching the PDF pages to the images available through IIIF. I found one case where some pages were missing from the PDF, and another couple where the page order was different.&lt;/p&gt;
&lt;p&gt;As I was looking around for a solution, I realised that those JSON files I downloaded to get the PDF identifiers also included links to &lt;a href=&#34;https://en.wikipedia.org/wiki/Analyzed_Layout_and_Text_Object&#34;&gt;ALTO XML&lt;/a&gt; files that contain all the original OCR data (before it got mangled by the PDF formatting). There was one ALTO file for every page. Even better, the JSON linked the identifiers for the text and the image together – no more page mismatches!&lt;/p&gt;
&lt;h3 id=&#34;downloading-the-alto-files&#34;&gt;Downloading the ALTO files&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s start this again, shall we? After wasting several days futzing about with the PDFs, I decided to download all the ALTO files and extract the text from them. As I downloaded each XML file, I also grabbed the corresponding image identifier from the JSON and included both identifiers in the file name for safe keeping.&lt;/p&gt;
&lt;p&gt;The ALTO files break the text down by block, line, and word. To extract the text, I just looped through every line, joining the words back together as a string, and writing the result to a new text file – one for each page.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s worth noting that the ALTO files include &lt;em&gt;all&lt;/em&gt; the positional data generated by the OCR process, so you have the size and position of every word on every page. I just pulled out the text, but there are many more interesting things you could do&amp;hellip;&lt;/p&gt;
&lt;h3 id=&#34;assembling-and-publishing-the-database&#34;&gt;Assembling and publishing the database&lt;/h3&gt;
&lt;p&gt;From here on everything pretty much followed the pattern of the NSW and Tasmanian directories. I looped through each volume, page, and line of text, adding the text and metadata to a SQLite database using &lt;a href=&#34;https://sqlite-utils.datasette.io/en/stable/&#34;&gt;sqlite_utils&lt;/a&gt;. I then indexed the text for full-text searching. At the same time I populated a metadata file with titles, urls, and a few configuration details. The metadata file is used by &lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt; to fill in parts of the interface.&lt;/p&gt;
&lt;p&gt;I made some minor changes to the Datasette template I used for the other directories. In particular, I had to update the urls that loaded the &lt;a href=&#34;https://iiif.io&#34;&gt;IIIF&lt;/a&gt; images into the &lt;a href=&#34;https://openseadragon.github.io&#34;&gt;OpenSeadragon viewer&lt;/a&gt;. But it mostly just worked. It&amp;rsquo;s so nice to be able to reuse existing patterns!&lt;/p&gt;
&lt;p&gt;Finally, I used &lt;a href=&#34;https://docs.datasette.io/en/stable/publish.html&#34;&gt;Datasette&amp;rsquo;s &lt;code&gt;publish&lt;/code&gt; command&lt;/a&gt; to push everything to Google Cloud Run. The final database contains details of more than 50,000 pages, and over 19 million lines of text! It weighs in at about 1.7 GB. The Cloud Run service will &amp;lsquo;scale to zero&amp;rsquo; when not in use. This saves some money and resources, but means it can take a little while to spin up. Once it&amp;rsquo;s loaded, it&amp;rsquo;s very fast. My &lt;a href=&#34;https://updates.timsherratt.org/2022/09/15/from-pdfs-to.html&#34;&gt;original post on the Tasmanian directories&lt;/a&gt; included a little note on costs, if you&amp;rsquo;re interested.&lt;/p&gt;
&lt;h2 id=&#34;more-information&#34;&gt;More information&lt;/h2&gt;
&lt;p&gt;The notebooks I used are on GitHub:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/download_sands_and_mac_pdfs.ipynb&#34;&gt;Download Sands and Mac PDFs and OCR text&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/load_sands_and_mac_into_datasette.ipynb&#34;&gt;Load data from the Sands and Mac directories into an SQLite database (for use with Datasette)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here are some posts about the NSW and Tasmanian directories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2022/09/01/making-nsw-postal.html&#34;&gt;Making NSW Postal Directories (and other digitised directories) easier to search with the GLAM Workbench and Datasette&lt;/a&gt; (September 2022)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2022/09/15/from-pdfs-to.html&#34;&gt;From 48 PDFs to one searchable database – opening up the Tasmanian Post Office Directories with the GLAM Workbench&lt;/a&gt; (September 2022)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2024/09/26/wheres-missing-volume.html&#34;&gt;Where&amp;rsquo;s 1920? Missing volume added to Tasmanian Post Office Directories!&lt;/a&gt; (September 2024)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2024/11/21/six-more-volumes.html&#34;&gt;Six more volumes added to the searchable database of Tasmanian Post Office Directories!&lt;/a&gt; (November 2024)&lt;/li&gt;
&lt;/ul&gt;
</description>
      <source:markdown>In the fortnight I spent onsite at the State Library of Victoria, &#39;Sands &amp; Mac&#39; was mentioned many times. And no wonder. The [Sands &amp; McDougall&#39;s directories](https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;collectionId=81213035910007636) are a goldmine for anyone researching family, local, or social history. They list thousands of names and addresses, enabling you to find individuals, and explore changing land use over time. When people ask the SLV&#39;s librarians, &#39;What can you tell me about the history of my house?&#39;, Sands &amp; Mac is one of the first resources consulted.

The SLV has digitised [24 volumes of Sands &amp; Mac](https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;collectionId=81213035910007636), one every five years from 1860 to 1974. You can browse the contents of each volume in the SLV image viewer, using the partial contents listing to help you find your way to sections of interest. To search the full text content you need to use the PDF version, either in the built-in viewer, or by downloading the PDF. There&#39;s a [handy guide to using Sands &amp; Mac](https://blogs.slv.vic.gov.au/tips-and-tricks/collection-discovery-tips-sands-mcdougalls-directories/) that explains the options.

**However, there&#39;s currently no way of searching across all 24 volumes, so as part of my residency at the SLV LAB, I thought I&#39;d make one!**

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/sands-and-mac.png&#34; width=&#34;600&#34; height=&#34;310&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;&lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;&lt;b&gt;Try it now!&lt;/b&gt;&lt;/a&gt;&lt;/figcaption&gt;&lt;/figure&gt;

My [new Sands &amp; Mac database](https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/) follows the pattern I&#39;ve used previously to create fully-searchable versions of the [NSW Post Office directories](https://glam-workbench.net/trove-journals/nsw-post-office-directories/), [Sydney telephone directories](https://glam-workbench.net/trove-journals/sydney-telephone-directories/), and [Tasmanian Post Office directories](https://glam-workbench.net/tasmanian-post-office-directories/). Every line of text is saved to a database, so a single query searches for entries across all volumes. You can also use advanced search features like wildcards and boolean operators.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/sands-and-mac-search.png&#34; width=&#34;600&#34; height=&#34;543&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Search across all 24 volumes!&lt;/figcaption&gt;&lt;/figure&gt;

Once you&#39;ve found a relevant entry you can view it in context, alongside a zoomable image of the page. You can even use Zotero to save individual entries to your own research database. [This blog post](https://chineseaustralia.org/from-the-archive-uncovering-the-everyday-heritage-of-chinese-tasmanians/) from the Everyday Heritage project describes how the Tasmanian directories have been used to map Tasmania&#39;s Chinese population.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/sands-and-mac-entry.png&#34; width=&#34;600&#34; height=&#34;370&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;View each entry in context! (Here&#39;s my Dad building his first house in Beaumaris in the 1950s.)&lt;/figcaption&gt;&lt;/figure&gt;

There are still a few things I&#39;d like to try, such as making use of the table of contents information for each volume. I&#39;d also like to create some additional entry points to take users directly to listings for individual suburbs (maybe even streets!). Each volume has a directory of suburbs, so it would be a matter of extracting and cleaning the data and linking the entries to digitised pages. Certainly possible, but I don&#39;t think I&#39;ll have time to get it all done before the end of my residency. Perhaps I&#39;ll try to get at least one volume done to demonstrate how it might work, and the value it would add. As I was writing this blog post I also realised there&#39;s [a dataset of businesses](https://www.environment.vic.gov.au/sustainability/victoria-unearthed/about-the-data/sands-and-mcdougall) extracted from the Sands &amp; Mac, so I need to think about how I can use that as well!

## Technical information follows...

I&#39;ve documented the process I used to create fully-searchable versions of the [Tasmanian](https://glam-workbench.net/libraries-tasmania/) and [NSW directories](https://glam-workbench.net/trove-journals/create-text-db-indexed-by-line/) in the GLAM Workbench. I followed a similar method for Sands and Mac, though with a few dead-ends and discoveries along the way.

### Downloading the PDFs

I assumed that it would be easiest to work from the PDF versions of each volume, as I&#39;d done for Tasmania. So I set about finding a way to download them all. There are only 24 volumes, so I *could* have downloaded them manually, but where&#39;s the fun in that?

I started with a CSV file listing the Sands &amp; Mac volumes that I downloaded from the catalogue. This gave me the Alma identifiers for each volume. To download the PDFs I needed two more identifiers: the `IE` identifier assigned to each digitised item, and a file identifier that points to the PDF version of the item. The `IE` identifier can be extracted from the item&#39;s MARC record, as I described in [my post on exploring urls](https://updates.timsherratt.org/2025/09/23/exploring-slv-urls.html). The PDF file identifier was a bit more difficult to track down. The PDF links in the image viewer are generated dynamically, so the data had to be coming from somewhere. Eventually I found that the viewer loaded a JSON file with all sorts of useful metadata in it!

The url to download the JSON file is: `https://viewerapi.slv.vic.gov.au/?entity=[IE identifier]&amp;dc_arrays=1`. In the `summary` section I found identifiers for `small_pdf` and `master_pdf`. 

I could then use these identifiers to construct urls to download the PDFs themselves: `https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;dps_pid=[PDF id]`
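
In Python, those two steps come down to something like this (the identifier is an example, and I&#39;m simplifying the shape of the JSON – check it yourself before relying on the key names):

```python
import requests

IE_ID = &#34;IE15485265&#34;  # example identifier from an item&#39;s MARC record

# Download the viewer metadata JSON for the digitised item.
meta = requests.get(
    &#34;https://viewerapi.slv.vic.gov.au/&#34;,
    params={&#34;entity&#34;: IE_ID, &#34;dc_arrays&#34;: 1},
).json()

# The summary section includes small_pdf and master_pdf identifiers.
# (Assuming a simple key lookup; the real JSON may be nested differently.)
pdf_id = meta[&#34;summary&#34;][&#34;small_pdf&#34;]

pdf_url = (
    &#34;https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet&#34;
    f&#34;?dps_func=stream&amp;dps_pid={pdf_id}&#34;
)
```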

Once I had the PDFs, I used [PyMuPDF](https://github.com/pymupdf/PyMuPDF) to extract all the text and images. As I suspected, the text wasn&#39;t really fit for purpose. The OCR was OK, but the column structures were a mess. Because I wanted to index each entry individually, it was important to try and get the columns represented as accurately as possible. The images in the small PDFs were already bitonal, so I started feeding them to [Tesseract](https://github.com/tesseract-ocr/tesseract) to see if I could get better results. After a bit of tweaking, things were looking pretty good. But when I came to compile all the data, I realised there was a potential problem matching the PDF pages to the images available through IIIF. I found one case where some pages were missing from the PDF, and another couple where the page order was different.
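
The PyMuPDF part of that is short. A minimal sketch of pulling out the text layer and the embedded page images:

```python
import fitz  # PyMuPDF

doc = fitz.open(&#34;sands_and_mac_1900.pdf&#34;)  # example file name
for page_num, page in enumerate(doc):
    # The text layer (which turned out to be too mangled to use).
    text = page.get_text()
    # Extract the embedded page images to feed to Tesseract.
    for xref, *_ in page.get_images():
        img = doc.extract_image(xref)
        ext = img[&#34;ext&#34;]
        with open(f&#34;page-{page_num:04}.{ext}&#34;, &#34;wb&#34;) as f:
            f.write(img[&#34;image&#34;])
```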

As I was looking around for a solution, I realised that those JSON files I downloaded to get the PDF identifiers also included links to [ALTO XML](https://en.wikipedia.org/wiki/Analyzed_Layout_and_Text_Object) files that contain all the original OCR data (before it got mangled by the PDF formatting). There was one ALTO file for every page. Even better, the JSON linked the identifiers for the text and the image together – no more page mismatches!

### Downloading the ALTO files

Let&#39;s start this again, shall we? After wasting several days futzing about with the PDFs, I decided to download all the ALTO files and extract the text from them. As I downloaded each XML file, I also grabbed the corresponding image identifier from the JSON and included both identifiers in the file name for safe keeping.

The ALTO files break the text down by block, line, and word. To extract the text, I just looped through every line, joining the words back together as a string, and writing the result to a new text file – one for each page.

It&#39;s worth noting that the ALTO files include *all* the positional data generated by the OCR process, so you have the size and position of every word on every page. I just pulled out the text, but there are many more interesting things you could do...

### Assembling and publishing the database

From here on everything pretty much followed the pattern of the NSW and Tasmanian directories. I looped through each volume, page, and line of text, adding the text and metadata to a SQLite database using [sqlite_utils](https://sqlite-utils.datasette.io/en/stable/). I then indexed the text for full-text searching. At the same time I populated a metadata file with titles, urls, and a few configuration details. The metadata file is used by [Datasette](https://datasette.io/) to fill in parts of the interface.
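
With sqlite_utils the loading and indexing steps are pleasingly brief – something like this, with illustrative file, table, and column names:

```python
import sqlite_utils

db = sqlite_utils.Database(&#34;sands_and_mac.db&#34;)

def rows(text_file, volume, page):
    # Each line of OCR text becomes a row with its volume and page.
    with open(text_file) as f:
        for line_num, text in enumerate(f, start=1):
            yield {&#34;volume&#34;: volume, &#34;page&#34;: page, &#34;line&#34;: line_num, &#34;text&#34;: text.strip()}

db[&#34;lines&#34;].insert_all(rows(&#34;page-0001.txt&#34;, volume=1900, page=1))

# Index the text column for full-text searching.
db[&#34;lines&#34;].enable_fts([&#34;text&#34;])
```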

I made some minor changes to the Datasette template I used for the other directories. In particular, I had to update the urls that loaded the [IIIF](https://iiif.io) images into the [OpenSeadragon viewer](https://openseadragon.github.io). But it mostly just worked. It&#39;s so nice to be able to reuse existing patterns!

Finally, I used [Datasette&#39;s `publish` command](https://docs.datasette.io/en/stable/publish.html) to push everything to Google Cloud Run. The final database contains details of more than 50,000 pages, and over 19 million lines of text! It weighs in at about 1.7 GB. The Cloud Run service will &#39;scale to zero&#39; when not in use. This saves some money and resources, but means it can take a little while to spin up. Once it&#39;s loaded, it&#39;s very fast. My [original post on the Tasmanian directories](https://updates.timsherratt.org/2022/09/15/from-pdfs-to.html) included a little note on costs, if you&#39;re interested.

## More information

The notebooks I used are on GitHub:

- [Download Sands and Mac PDFs and OCR text](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/download_sands_and_mac_pdfs.ipynb)
- [Load data from the Sands and Mac directories into an SQLite database (for use with Datasette)](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/load_sands_and_mac_into_datasette.ipynb)

Here are some posts about the NSW and Tasmanian directories:

- [Making NSW Postal Directories (and other digitised directories) easier to search with the GLAM Workbench and Datasette](https://updates.timsherratt.org/2022/09/01/making-nsw-postal.html) (September 2022)
- [From 48 PDFs to one searchable database – opening up the Tasmanian Post Office Directories with the GLAM Workbench](https://updates.timsherratt.org/2022/09/15/from-pdfs-to.html) (September 2022)
- [Where&#39;s 1920? Missing volume added to Tasmanian Post Office Directories!](https://updates.timsherratt.org/2024/09/26/wheres-missing-volume.html) (September 2024)
- [Six more volumes added to the searchable database of Tasmanian Post Office Directories!](https://updates.timsherratt.org/2024/11/21/six-more-volumes.html) (November 2024)


</source:markdown>
    </item>
    
    <item>
      <title>Turning the SLV&#39;s maps into data with Allmaps and some GLAM plumbing</title>
      <link>https://updates.timsherratt.org/2025/11/04/turning-the-slvs-maps-into.html</link>
      <pubDate>Tue, 04 Nov 2025 15:02:53 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/11/04/turning-the-slvs-maps-into.html</guid>
      <description>&lt;p&gt;I often describe what I do as GLAM data plumbing. Most of the time I&amp;rsquo;m not creating new tools, I&amp;rsquo;m figuring out what data is available and how I can connect it up to &lt;em&gt;existing&lt;/em&gt; tools. It&amp;rsquo;s rarely straightforward, but if I can get all the pipes connected and data flowing in the right direction, suddenly new things become possible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Things like turning all the State Library of Victoria&amp;rsquo;s digitised maps into data.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve just &lt;a href=&#34;https://wragge.github.io/slv-allmaps/&#34;&gt;created a workflow&lt;/a&gt; that uses &lt;a href=&#34;https://allmaps.org&#34;&gt;Allmaps&lt;/a&gt; and &lt;a href=&#34;https://iiif.io/&#34;&gt;IIIF&lt;/a&gt; to georeference the SLV&amp;rsquo;s digitised maps. There are some technical details below, but the idea is pretty simple. A userscript links the SLV image viewer to Allmaps – so you just click on a button, and the digitised map opens, ready for georeferencing.&lt;/p&gt;
&lt;p&gt;Why is this useful? Georeferencing relates a digitised map to real world geography. It describes the map&amp;rsquo;s position and extent using geospatial coordinates – turning historic documents into geospatial data that can be indexed, visualised and manipulated. Georeferencing opens digitised maps to new research uses.&lt;/p&gt;
&lt;p&gt;So, how many maps can we georeference before my residency finishes in December? Hundreds? Thousands? If you like maps and want to help, head to &lt;a href=&#34;https://wragge.github.io/slv-allmaps/&#34;&gt;the documentation page&lt;/a&gt; to find out how to get started. And if you want to see how things are progressing, have a look at &lt;a href=&#34;https://wragge.github.io/slv-allmaps/dashboard.html&#34;&gt;the project dashboard&lt;/a&gt;.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/slv-allmaps-docs.png&#34; width=&#34;600&#34; height=&#34;466&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;&lt;a href=&#34;https://wragge.github.io/slv-allmaps/&#34;&gt;View the documentation&lt;/a&gt; to get started&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;A few technical details follow&amp;hellip;&lt;/p&gt;
&lt;p&gt;Early on in my time as Creative Technologist-in-Residence at the State Library of Victoria, I started playing around with Allmaps for georeferencing digitised maps. It&amp;rsquo;s a great tool (really a suite of tools and standards) because instead of constructing a whole new platform it integrates with existing IIIF services. The SLV provides digitised images through IIIF, so I thought it should be possible to use Allmaps to georeference the SLV&amp;rsquo;s map collection.&lt;/p&gt;
&lt;p&gt;But I struck a problem that took some time to unravel. The IIIF urls in the SLV manifests include port numbers, which confused Allmaps. The manifests also sometimes contained references to image formats that weren&amp;rsquo;t actually accessible, generating errors when they were loaded. Hopefully these problems will be fixed by the SLV, but in the meantime I&amp;rsquo;ve created a proxy service that edits the manifest on the fly. The proxied urls can be loaded into the Allmaps Editor without errors. Pipes fixed, data flowing!&lt;/p&gt;
&lt;details&gt;
  &lt;summary&gt;Using the manifest proxy&lt;/summary&gt;
  &lt;p&gt;To generate a link to a proxied manifest, first grab the item&#39;s &lt;code&gt;IE&lt;/code&gt; identifier from the url of the digitised item viewer. For example, the identifier in this url &lt;code&gt;https://viewer.slv.vic.gov.au/?entity=IE15485265&amp;mode=browse&lt;/code&gt; is &lt;code&gt;IE15485265&lt;/code&gt;. Once you have the identifier, add it to the end of the url &lt;code&gt;https://wraggelabs.com/slv_iiif/&lt;/code&gt;. For example, &lt;a href=&#34;https://wraggelabs.com/slv_iiif/IE15485265&#34;&gt;https://wraggelabs.com/slv_iiif/IE15485265&lt;/a&gt;. You can then supply this url to the Allmaps editor.&lt;/p&gt;
&lt;/details&gt;
&lt;p&gt;But having to fiddle around with proxies didn&amp;rsquo;t make a great user experience. I needed some way of integrating the two services, so that a user could just click a button in the SLV website and start editing in Allmaps. Userscripts to the rescue!&lt;/p&gt;
&lt;p&gt;I wrote recently about &lt;a href=&#34;https://updates.timsherratt.org/2025/07/17/glam-hacking-with-userscripts.html&#34;&gt;hacking GLAM collection interfaces using userscripts&lt;/a&gt;. Since I started my residency at the SLV, I&amp;rsquo;ve also created a userscript to &lt;a href=&#34;https://gist.github.com/wragge/a37a4db854deffad956abc7bf918f6b0&#34;&gt;display the IIIF manifest url in the SLV image viewer&lt;/a&gt;, and run a Code Club workshop where we played around with &lt;a href=&#34;https://slides.com/wragge/slv-code-club&#34;&gt;an assortment of SLV website hacks&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As in a number of these examples, the &lt;a href=&#34;https://gist.github.com/wragge/5680daaec4b4b34ed5537e6ff79559a2&#34;&gt;georeferencing userscript&lt;/a&gt; adds new features to the SLV website, but there&amp;rsquo;s a fair bit more going on under the hood. It runs automatically every time you load the SLV image viewer, and then:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it checks the metadata of the digitised item to see if it&amp;rsquo;s a map (or something that contains maps, like an atlas or street directory)&lt;/li&gt;
&lt;li&gt;if it looks like a map, it generates an Allmaps identifier using the item&amp;rsquo;s IIIF manifest url and checks with Allmaps to see whether the item has already been georeferenced&lt;/li&gt;
&lt;li&gt;it adds a &amp;lsquo;Georeferencing&amp;rsquo; section to the page, with a button to georeference the item (or edit the existing georeferencing)&lt;/li&gt;
&lt;li&gt;if the item has already been georeferenced, it adds a button to view the item in the Allmaps Viewer, and embeds a live preview&lt;/li&gt;
&lt;/ul&gt;
&lt;details&gt;
    &lt;summary&gt;Accessing metadata&lt;/summary&gt;
    &lt;p&gt;
        The userscript gets the item metadata from a JSON file that&#39;s loaded by the image viewer. The JSON file includes a lot of extra, useful information about the digitised item. To access the JSON file, you just construct a url like this: &lt;code&gt;https://viewerapi.slv.vic.gov.au/?entity=[IE identifier]&amp;dc_arrays=1&lt;/code&gt;. The IE identifier is in the url of the image viewer.
    &lt;/p&gt;
&lt;/details&gt;
&lt;details&gt;
    &lt;summary&gt;Allmaps identifiers&lt;/summary&gt;
    &lt;p&gt;
        Allmaps creates its identifiers by hash encoding the IIIF urls. The userscript borrows some code from the &lt;a href=&#34;https://github.com/allmaps/allmaps/tree/main/packages/id&#34;&gt;Allmaps id module&lt;/a&gt; to generate the ids, then sends a HEAD request to the Allmaps API to see whether an entry for the current manifest exists.
    &lt;/p&gt;
&lt;/details&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/alv-allmaps-not-georeferenced.png&#34; width=&#34;600&#34; height=&#34;313&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Example of an item that hasn&#39;t been georeferenced yet&lt;/figcaption&gt;&lt;/figure&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/slv-allmaps-georeferenced.png&#34; width=&#34;600&#34; height=&#34;462&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Example of an item that has been georeferenced, displaying an embedded version of the Allmaps viewer&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;I&amp;rsquo;ve also created a GitHub repository to save copies of the data. Every two hours &lt;a href=&#34;https://github.com/wragge/slv-allmaps/blob/main/harvest_allmaps_data.ipynb&#34;&gt;this notebook&lt;/a&gt; is run to query the Allmaps API for newly georeferenced maps. These are added to a dataset which is saved in three formats:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/slv-allmaps/blob/main/georeferenced_maps.csv&#34;&gt;a CSV file&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/slv-allmaps/blob/main/georeferenced_maps_datasette.csv&#34;&gt;a CSV file&lt;/a&gt; that includes thumbnails and links for &lt;a href=&#34;https://glam-workbench.net/datasette-lite/?csv=https%3A%2F%2Fgithub.com%2Fwragge%2Fslv-allmaps%2Fblob%2Fmain%2Fgeoreferenced_maps_datasette.csv&amp;amp;install=datasette-homepage-table&amp;amp;install=datasette-json-html&amp;amp;fts=manifest_title%2Cmap_title&#34;&gt;viewing in Datasette-Lite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/slv-allmaps/blob/main/georeferenced_maps.geojson&#34;&gt;a GeoJSON file&lt;/a&gt;, that can be &lt;a href=&#34;https://geojson.io/#id=github:wragge/slv-allmaps/blob/main/georeferenced_maps.geojson&#34;&gt;viewed in services like geojson.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At the same time, the data for each individual map is downloaded and saved as &lt;a href=&#34;https://github.com/wragge/slv-allmaps/tree/main/maps&#34;&gt;IIIF annotations&lt;/a&gt; (in JSON) and &lt;a href=&#34;https://github.com/wragge/slv-allmaps/tree/main/geojson&#34;&gt;GeoJSON&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Finally, &lt;a href=&#34;https://github.com/wragge/slv-allmaps/blob/main/allmaps_dashboard.ipynb&#34;&gt;this notebook&lt;/a&gt; is run to generate &lt;a href=&#34;https://wragge.github.io/slv-allmaps/dashboard.html&#34;&gt;a dashboard&lt;/a&gt; that provides an overview of the project&amp;rsquo;s progress.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/geo-dashboard.png&#34; width=&#34;600&#34; height=&#34;616&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;The project dashboard is updated every two hours&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;One of the Allmaps developers described all my plumbing and workarounds as a &amp;lsquo;very cool lofi example of how you can set this up with little means&amp;rsquo;, and I think that&amp;rsquo;s pretty apt. It&amp;rsquo;s really just an experiment to demonstrate the possibilities, but by connecting up existing services it&amp;rsquo;s generating real data of long-term value.&lt;/p&gt;
</description>
      <source:markdown>I often describe what I do as GLAM data plumbing. Most of the time I&#39;m not creating new tools, I&#39;m figuring out what data is available and how I can connect it up to *existing* tools. It&#39;s rarely straightforward, but if I can get all the pipes connected and data flowing in the right direction, suddenly new things become possible.

**Things like turning all the State Library of Victoria&#39;s digitised maps into data.**

I&#39;ve just [created a workflow](https://wragge.github.io/slv-allmaps/) that uses [Allmaps](https://allmaps.org) and [IIIF](https://iiif.io/) to georeference the SLV&#39;s digitised maps. There are some technical details below, but the idea is pretty simple. A userscript links the SLV image viewer to Allmaps – so you just click on a button, and the digitised map opens, ready for georeferencing.

Why is this useful? Georeferencing relates a digitised map to real world geography. It describes the map&#39;s position and extent using geospatial coordinates – turning historic documents into geospatial data that can be indexed, visualised and manipulated. Georeferencing opens digitised maps to new research uses.

So, how many maps can we georeference before my residency finishes in December? Hundreds? Thousands? If you like maps and want to help, head to [the documentation page](https://wragge.github.io/slv-allmaps/) to find out how to get started. And if you want to see how things are progressing, have a look at [the project dashboard](https://wragge.github.io/slv-allmaps/dashboard.html).

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/slv-allmaps-docs.png&#34; width=&#34;600&#34; height=&#34;466&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;&lt;a href=&#34;https://wragge.github.io/slv-allmaps/&#34;&gt;View the documentation&lt;/a&gt; to get started&lt;/figcaption&gt;&lt;/figure&gt;

A few technical details follow...

Early on in my time as Creative Technologist-in-Residence at the State Library of Victoria, I started playing around with Allmaps for georeferencing digitised maps. It&#39;s a great tool (really a suite of tools and standards) because instead of constructing a whole new platform it integrates with existing IIIF services. The SLV provides digitised images through IIIF, so I thought it should be possible to use Allmaps to georeference the SLV&#39;s map collection.

But I struck a problem that took some time to unravel. The IIIF urls in the SLV manifests include port numbers, which confused Allmaps. The manifests also sometimes contained references to image formats that weren&#39;t actually accessible, generating errors when they were loaded. Hopefully these problems will be fixed by the SLV, but in the meantime I&#39;ve created a proxy service that edits the manifest on the fly. The proxied urls can be loaded into the Allmaps Editor without errors. Pipes fixed, data flowing!

&lt;details&gt;
  &lt;summary&gt;Using the manifest proxy&lt;/summary&gt;
  &lt;p&gt;To generate a link to a proxied manifest, first grab the item&#39;s &lt;code&gt;IE&lt;/code&gt; identifier from the url of the digitised item viewer. For example, the identifier in this url &lt;code&gt;https://viewer.slv.vic.gov.au/?entity=IE15485265&amp;mode=browse&lt;/code&gt; is &lt;code&gt;IE15485265&lt;/code&gt;. Once you have the identifier, add it to the end of the url &lt;code&gt;https://wraggelabs.com/slv_iiif/&lt;/code&gt;. For example, &lt;a href=&#34;https://wraggelabs.com/slv_iiif/IE15485265&#34;&gt;https://wraggelabs.com/slv_iiif/IE15485265&lt;/a&gt;. You can then supply this url to the Allmaps editor.&lt;/p&gt;
&lt;/details&gt;
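
The proxy itself is simple – it fetches the original manifest, rewrites the problem urls, and passes the result on. A rough sketch of the idea in Python (the manifest url pattern and the regex are illustrative, and the real service also deals with the inaccessible image formats):

```python
import re
import requests

def proxied_manifest(ie_id):
    # Fetch the original IIIF manifest for a digitised item.
    # (The manifest url pattern here is an assumption for this sketch.)
    url = f&#34;https://rosetta.slv.vic.gov.au/iiif/presentation/{ie_id}/manifest&#34;
    manifest = requests.get(url).text
    # Strip port numbers (eg :443) from urls, which confused Allmaps.
    return re.sub(r&#34;(https?://[^/:\s]+):\d+&#34;, r&#34;\1&#34;, manifest)
```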

But having to fiddle around with proxies didn&#39;t make a great user experience. I needed some way of integrating the two services, so that a user could just click a button in the SLV website and start editing in Allmaps. Userscripts to the rescue!

I wrote recently about [hacking GLAM collection interfaces using userscripts](https://updates.timsherratt.org/2025/07/17/glam-hacking-with-userscripts.html). Since I started my residency at the SLV, I&#39;ve also created a userscript to [display the IIIF manifest url in the SLV image viewer](https://gist.github.com/wragge/a37a4db854deffad956abc7bf918f6b0), and run a Code Club workshop where we played around with [an assortment of SLV website hacks](https://slides.com/wragge/slv-code-club). 

As in a number of these examples, the [georeferencing userscript](https://gist.github.com/wragge/5680daaec4b4b34ed5537e6ff79559a2) adds new features to the SLV website, but there&#39;s a fair bit more going on under the hood. It runs automatically every time you load the SLV image viewer, and then:

- it checks the metadata of the digitised item to see if it&#39;s a map (or something that contains maps, like an atlas or street directory)
- if it looks like a map, it generates an Allmaps identifier using the item&#39;s IIIF manifest url and checks with Allmaps to see whether the item has already been georeferenced
- it adds a &#39;Georeferencing&#39; section to the page, with a button to georeference the item (or edit the existing georeferencing)
- if the item has already been georeferenced, it adds a button to view the item in the Allmaps Viewer, and embeds a live preview

&lt;details&gt;
    &lt;summary&gt;Accessing metadata&lt;/summary&gt;
    &lt;p&gt;
        The userscript gets the item metadata from a JSON file that&#39;s loaded by the image viewer. The JSON file includes a lot of extra, useful information about the digitised item. To access the JSON file, you just construct a url like this: &lt;code&gt;https://viewerapi.slv.vic.gov.au/?entity=[IE identifier]&amp;dc_arrays=1&lt;/code&gt;. The IE identifier is in the url of the image viewer.
    &lt;/p&gt;
&lt;/details&gt;

&lt;details&gt;
    &lt;summary&gt;Allmaps identifiers&lt;/summary&gt;
    &lt;p&gt;
        Allmaps creates its identifiers by hash encoding the IIIF urls. The userscript borrows some code from the &lt;a href=&#34;https://github.com/allmaps/allmaps/tree/main/packages/id&#34;&gt;Allmaps id module&lt;/a&gt; to generate the ids, then sends a HEAD request to the Allmaps API to see whether an entry for the current manifest exists.
    &lt;/p&gt;
&lt;/details&gt;
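
Doing the same check from Python only takes a few lines. As far as I can tell, the id module computes a truncated SHA-1 digest of the url, and I&#39;m assuming the annotations endpoint below – check the Allmaps code and API docs before relying on either:

```python
import hashlib
import requests

def allmaps_id(manifest_url):
    # Assumption: Allmaps ids are the first 16 hex characters of a
    # SHA-1 digest of the IIIF manifest url.
    return hashlib.sha1(manifest_url.encode()).hexdigest()[:16]

def is_georeferenced(manifest_url):
    # Assumption: a HEAD request to the annotations API returns 200
    # if georeferencing data exists for this manifest.
    url = f&#34;https://annotations.allmaps.org/manifests/{allmaps_id(manifest_url)}&#34;
    return requests.head(url).ok
```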

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/alv-allmaps-not-georeferenced.png&#34; width=&#34;600&#34; height=&#34;313&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Example of an item that hasn&#39;t been georeferenced yet&lt;/figcaption&gt;&lt;/figure&gt;

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/slv-allmaps-georeferenced.png&#34; width=&#34;600&#34; height=&#34;462&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Example of an item that has been georeferenced, displaying an embedded version of the Allmaps viewer&lt;/figcaption&gt;&lt;/figure&gt;

I&#39;ve also created a GitHub repository to save copies of the data. Every two hours [this notebook](https://github.com/wragge/slv-allmaps/blob/main/harvest_allmaps_data.ipynb) is run to query the Allmaps API for newly georeferenced maps. These are added to a dataset which is saved in three formats:

- [a CSV file](https://github.com/wragge/slv-allmaps/blob/main/georeferenced_maps.csv)
- [a CSV file](https://github.com/wragge/slv-allmaps/blob/main/georeferenced_maps_datasette.csv) that includes thumbnails and links for [viewing in Datasette-Lite](https://glam-workbench.net/datasette-lite/?csv=https%3A%2F%2Fgithub.com%2Fwragge%2Fslv-allmaps%2Fblob%2Fmain%2Fgeoreferenced_maps_datasette.csv&amp;install=datasette-homepage-table&amp;install=datasette-json-html&amp;fts=manifest_title%2Cmap_title)
- [a GeoJSON file](https://github.com/wragge/slv-allmaps/blob/main/georeferenced_maps.geojson), that can be [viewed in services like geojson.io](https://geojson.io/#id=github:wragge/slv-allmaps/blob/main/georeferenced_maps.geojson)

At the same time, the data for each individual map is downloaded and saved as [IIIF annotations](https://github.com/wragge/slv-allmaps/tree/main/maps) (in JSON) and [GeoJSON](https://github.com/wragge/slv-allmaps/tree/main/geojson).

Finally, [this notebook](https://github.com/wragge/slv-allmaps/blob/main/allmaps_dashboard.ipynb) is run to generate [a dashboard](https://wragge.github.io/slv-allmaps/dashboard.html) that provides an overview of the project&#39;s progress.

&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/geo-dashboard.png&#34; width=&#34;600&#34; height=&#34;616&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;The project dashboard is updated every two hours&lt;/figcaption&gt;&lt;/figure&gt;

One of the Allmaps developers described all my plumbing and workarounds as a &#39;very cool lofi example of how you can set this up with little means&#39;, and I think that&#39;s pretty apt. It&#39;s really just an experiment to demonstrate the possibilities, but by connecting up existing services it&#39;s generating real data of long-term value.
</source:markdown>
    </item>
    
    <item>
      <title>Me at 63...</title>
      <link>https://updates.timsherratt.org/2025/11/03/me-at.html</link>
      <pubDate>Mon, 03 Nov 2025 17:55:17 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/11/03/me-at.html</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published in Pharos, newsletter of the Professional Historian&amp;rsquo;s Association (Vic &amp;amp; Tas), October-November 2025, in the &amp;lsquo;Member Profile&amp;rsquo; section.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;what-was-your-first-history-related-job-what-path-have-you-taken-since-then&#34;&gt;What was your first history related job? What path have you taken since then?&lt;/h2&gt;
&lt;p&gt;In the early 1990s I started working for a small self-funded organisation called the Australian Science Archives Project. Our mission was to preserve and raise awareness of Australia&amp;rsquo;s scientific past. When the web came along, we realised it provided an enormous opportunity to communicate history to the public. So I taught myself web development and created the first archives website in Australia. Since then my work has continued to explore what happens when we release GLAM collections into online spaces where people can see and use them differently.&lt;/p&gt;
&lt;h2 id=&#34;what-kind-of-work-have-you-done-what-are-you-working-on-now&#34;&gt;What kind of work have you done? What are you working on now?&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve had a range of jobs in the GLAM and university sectors. While &amp;lsquo;history&amp;rsquo; wasn&amp;rsquo;t often in my job title, I&amp;rsquo;ve always regarded myself as a historian first – whether I was coding, editing, writing, teaching, or managing, history was always the frame through which I understood my work. At the same time, I&amp;rsquo;ve maintained my own independent practice as a &amp;lsquo;historian and hacker&amp;rsquo;, developing tools and resources for other researchers, such as the &lt;a href=&#34;https://glam-workbench.net&#34;&gt;GLAM Workbench&lt;/a&gt;. Much of this work is unfunded, but by sharing it openly I&amp;rsquo;ve created new opportunities for collaboration. For example, I&amp;rsquo;m currently the &amp;lsquo;Creative Technologist-in-Residence&amp;rsquo; at the State Library of Victoria, bringing my years of GLAM hacking to bear on the Library&amp;rsquo;s place-based collections.&lt;/p&gt;
&lt;h2 id=&#34;research-or-writing-what-do-you-enjoy-more-and-why&#34;&gt;Research or writing? (What do you enjoy more and why?)&lt;/h2&gt;
&lt;p&gt;Researching, or writing, or coding, or teaching, or outreaching (what is the correct verb?) – all have their joys and travails. For me, research is less about finding things in archives and libraries, and more about &lt;em&gt;how&lt;/em&gt; we find things in archives and libraries. I poke about in online collections to try and understand how they work, what they reveal, and what they hide. This often leads to the development of new tools, the writing of documentation and blog posts, and sometimes even real, published articles. It&amp;rsquo;s a process that has consumed my life, for better or worse. Coding often slips into obsession when I have a gnarly problem to crack. Writing is a slog, but there&amp;rsquo;s nothing like the pleasure of a finely-turned sentence. Teaching is exhausting, but also exhilarating when you see the light bulb of understanding flick on.&lt;/p&gt;
&lt;h2 id=&#34;what-are-the-best-and-hardest-things-about-the-kind-of-work-you-do&#34;&gt;What are the best and hardest things about the kind of work you do?&lt;/h2&gt;
&lt;p&gt;The best thing, the absolute hands-down best thing, is hearing from people who use, or have benefited from, the tools and resources that I&amp;rsquo;ve created. I make things to help researchers see and use GLAM collections in new ways, so finding out what they&amp;rsquo;ve been doing with my stuff always provides a much-needed jolt of inspiration.&lt;/p&gt;
&lt;p&gt;However, the flip side is that getting information about my tools and resources out to the people who might benefit most is hard and often frustrating work. I churn away in the social media mines, but people and organisations seem much more reluctant to share new work these days. There was a time (yeah, the good old days) when GLAM organisations actively engaged with researchers online, sharing the cool things people were doing with their collections. But not now. We all learn through the generosity of others, and I think it&amp;rsquo;s important that we find ways to support and enlarge the realm of generosity.&lt;/p&gt;
</description>
      <source:markdown>*Originally published in Pharos, newsletter of the Professional Historian&#39;s Association (Vic &amp; Tas), October-November 2025, in the &#39;Member Profile&#39; section.*

## What was your first history related job? What path have you taken since then?

In the early 1990s I started working for a small self-funded organisation called the Australian Science Archives Project. Our mission was to preserve and raise awareness of Australia&#39;s scientific past. When the web came along, we realised it provided an enormous opportunity to communicate history to the public. So I taught myself web development and created the first archives website in Australia. Since then my work has continued to explore what happens when we release GLAM collections into online spaces where people can see and use them differently.

## What kind of work have you done? What are you working on now?

I&#39;ve had a range of jobs in the GLAM and university sectors. While &#39;history&#39; wasn&#39;t often in my job title, I&#39;ve always regarded myself as a historian first – whether I was coding, editing, writing, teaching, or managing, history was always the frame through which I understood my work. At the same time, I&#39;ve maintained my own independent practice as a &#39;historian and hacker&#39;, developing tools and resources for other researchers, such as the [GLAM Workbench](https://glam-workbench.net). Much of this work is unfunded, but by sharing it openly I&#39;ve created new opportunities for collaboration. For example, I&#39;m currently the &#39;Creative Technologist-in-Residence&#39; at the State Library of Victoria, bringing my years of GLAM hacking to bear on the Library&#39;s place-based collections.

## Research or writing? (What do you enjoy more and why?)

Researching, or writing, or coding, or teaching, or outreaching (what is the correct verb?) – all have their joys and travails. For me, research is less about finding things in archives and libraries, and more about *how* we find things in archives and libraries. I poke about in online collections to try and understand how they work, what they reveal, and what they hide. This often leads to the development of new tools, the writing of documentation and blog posts, and sometimes even real, published articles. It&#39;s a process that has consumed my life, for better or worse. Coding often slips into obsession when I have a gnarly problem to crack. Writing is a slog, but there&#39;s nothing like the pleasure of a finely-turned sentence. Teaching is exhausting, but also exhilarating when you see the light bulb of understanding flick on.

## What are the best and hardest things about the kind of work you do?

The best thing, the absolute hands-down best thing, is hearing from people who use, or have benefited from, the tools and resources that I&#39;ve created. I make things to help researchers see and use GLAM collections in new ways, so finding out what they&#39;ve been doing with my stuff always provides a much-needed jolt of inspiration.

However, the flip side is that getting information about my tools and resources out to the people who might benefit most is hard and often frustrating work. I churn away in the social media mines, but people and organisations seem much more reluctant to share new work these days. There was a time (yeah, the good old days) when GLAM organisations actively engaged with researchers online, sharing the cool things people were doing with their collections. But not now. We all learn through the generosity of others, and I think it&#39;s important that we find ways to support and enlarge the realm of generosity.


</source:markdown>
    </item>
    
    <item>
      <title>Creating bounding boxes for parish maps in the SLV collection</title>
      <link>https://updates.timsherratt.org/2025/10/06/creating-bounding-boxes-for-parish.html</link>
      <pubDate>Mon, 06 Oct 2025 15:17:51 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/10/06/creating-bounding-boxes-for-parish.html</guid>
      <description>&lt;p&gt;The State Library of Victoria holds a collection of &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/search?query=series,exact,Parish%20maps%20of%20Victoria&amp;amp;vid=61SLV_INST:SLV&amp;amp;offset=0&#34;&gt;8,804 parish maps&lt;/a&gt;. As part of my residency at the SLV LAB, I&amp;rsquo;ve been poking around in the metadata.&lt;/p&gt;
&lt;p&gt;SLV staff have geocoded many of the parish maps using the &lt;a href=&#34;https://placenames.fsdf.org.au&#34;&gt;Composite Gazetteer of Australia&lt;/a&gt;, which provides coordinates for Victorian parishes and boroughs. These coordinates give us a point which should be roughly at the centre of each map, enabling us to visualise their locations and distribution. But how much area do they cover? To answer that question we need a bounding box that includes the coordinates of each corner of the map. We could create bounding boxes by using something like &lt;a href=&#34;https://allmaps.org&#34;&gt;Allmaps&lt;/a&gt; or &lt;a href=&#34;https://www.mapwarper.net&#34;&gt;MapWarper&lt;/a&gt; to georeference each individual map, but that&amp;rsquo;s going to take a while! As a quick and dirty alternative, I wondered if it was possible to generate approximate bounding boxes from the available metadata. It seems we can!&lt;/p&gt;
&lt;h2 id=&#34;the-metadata&#34;&gt;The metadata&lt;/h2&gt;
&lt;p&gt;There are three pieces of metadata we need to construct bounding boxes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the latitude and longitude of the centre point&lt;/li&gt;
&lt;li&gt;the size of the physical map&lt;/li&gt;
&lt;li&gt;the scale of the map (i.e. how the size of the map relates to the real world)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The coordinates and scale can be included in a couple of different places in the map&amp;rsquo;s MARC record. The &lt;a href=&#34;https://www.loc.gov/marc/bibliographic/bd034.html&#34;&gt;&lt;code&gt;034&lt;/code&gt;&lt;/a&gt; field is specifically for &amp;lsquo;Coded Cartographic Mathematical Data&amp;rsquo;. The relevant subfields are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;$a&lt;/code&gt;: category of scale&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$b&lt;/code&gt;: constant ratio linear horizontal scale (this is the most likely type of scale)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$d&lt;/code&gt;: westernmost longitude&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$e&lt;/code&gt;: easternmost longitude&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$f&lt;/code&gt;: northernmost latitude&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$g&lt;/code&gt;: southernmost latitude&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the coordinates describe a point rather than a bounding box, then &lt;code&gt;$d&lt;/code&gt; and &lt;code&gt;$e&lt;/code&gt; will be the same, and &lt;code&gt;$f&lt;/code&gt; and &lt;code&gt;$g&lt;/code&gt; will be the same.&lt;/p&gt;
&lt;p&gt;String representations of coordinates and scale can be found in the &lt;code&gt;255&lt;/code&gt; field. The relevant subfields are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;$a&lt;/code&gt;: statement of scale, eg &lt;code&gt;Scale [ca. 1:90,000].&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$c&lt;/code&gt;: statement of coordinates, eg &lt;code&gt;(E 142°18&#39;/S 37°33&#39;)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The size of the map is recorded in the &lt;a href=&#34;https://www.loc.gov/marc/bibliographic/bd300.html&#34;&gt;&lt;code&gt;300&lt;/code&gt;&lt;/a&gt; (physical description) field under the &lt;code&gt;$c&lt;/code&gt; (dimensions) subfield. For example: &lt;code&gt;on sheet 40 x 51 cm &lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;the-method&#34;&gt;The method&lt;/h2&gt;
&lt;p&gt;I started with an existing dataset downloaded from the catalogue by SLV staff. This dataset included the scale and coordinate information in the &lt;code&gt;034&lt;/code&gt; field, and the coordinate string in &lt;code&gt;255$c&lt;/code&gt;. At first I didn&amp;rsquo;t realise that the &lt;code&gt;034&lt;/code&gt; held geo data, so I separately downloaded the scale information from &lt;code&gt;255$a&lt;/code&gt; in each item&amp;rsquo;s MARC record (d&amp;rsquo;oh). If the maps were digitised, I also wanted their image identifiers so I could access them through the SLV&amp;rsquo;s IIIF service. The image identifiers in the &lt;code&gt;956$e&lt;/code&gt; field of each MARC record can be used to construct IIIF manifest urls, so I extracted them as well.&lt;/p&gt;
&lt;p&gt;Once I had all the catalogue data, I had to make sure everything was in a format I could work with. The coordinates in the MARC records are recorded as degrees/minutes/seconds, so I had to convert them to decimal values. The scale factor needed to be an integer, and I needed to extract the height and width as integers from the dimensions field.&lt;/p&gt;
&lt;p&gt;I used &lt;a href=&#34;https://pypi.org/project/lat-lon-parser/&#34;&gt;lat_lon_parser&lt;/a&gt; to convert the coordinates to decimal, but needed a bit of regex string manipulation to get the values into a format that could be parsed. Regex also came to the rescue in getting the map dimensions. All the details are &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/process_parish_maps.ipynb&#34;&gt;in this notebook&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;creating-bounding-boxes&#34;&gt;Creating bounding boxes&lt;/h3&gt;
&lt;p&gt;After some searching I found &lt;a href=&#34;https://stackoverflow.com/a/76910048&#34;&gt;this StackOverflow comment&lt;/a&gt; that described how to create a bounding box from a point, distance, and bearing. The point I already had, but the distance and bearing had to be calculated. Trigonometry to the rescue!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-box-trig.png&#34; width=&#34;600&#34; height=&#34;334&#34; alt=&#34;&#34;&gt;
&lt;p&gt;The distance from the point at the centre of the box to one of its corners is the hypotenuse of a right-angled triangle whose sides are equal to half the width and half the height of the map, and thanks to Pythagoras we know:&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-10-06-15-19-42.png&#34; width=&#34;579&#34; height=&#34;93&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Once I had the distance in cm, I converted to inches, then multiplied by the scale factor, and finally converted the inches to miles. (It now occurs to me that there&amp;rsquo;s no need to convert to imperial measurements, but it doesn&amp;rsquo;t make any difference either way.)&lt;/p&gt;
&lt;p&gt;The bearing that points to the corner of the box is the angle inside the same right-angled triangle, so can be calculated using:&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-10-06-15-19-29.png&#34; width=&#34;407&#34; height=&#34;73&#34; alt=&#34;&#34;&gt;
&lt;p&gt;With the point of origin, distance, and bearing I could use &lt;a href=&#34;https://github.com/geopy/geopy&#34;&gt;geopy&lt;/a&gt; to calculate the corners of the bounding box!&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; geopy.distance &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; geodesic

destination &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; geodesic(miles&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;distance)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;destination(origin, bearing)
coords &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; destination&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;longitude, destination&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;latitude
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/process_parish_maps.ipynb&#34;&gt;See this notebook&lt;/a&gt; for the full details.&lt;/p&gt;
&lt;h2 id=&#34;limitations&#34;&gt;Limitations&lt;/h2&gt;
&lt;p&gt;Of course, this method is very rough and has a number of major limitations, in particular:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;only about 38% of the maps have point coordinates&lt;/li&gt;
&lt;li&gt;the point values don&amp;rsquo;t necessarily locate the centre of the map&lt;/li&gt;
&lt;li&gt;not all the maps are oriented towards north&lt;/li&gt;
&lt;li&gt;sometimes a parish includes multiple maps&lt;/li&gt;
&lt;li&gt;the size of the margin around the map will affect the accuracy of the bounding box&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But despite these problems the results seem pretty good. To test this I &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_browser.ipynb&#34;&gt;created a notebook&lt;/a&gt; to overlay the digitised maps on a modern basemap using the bounding boxes. Here&amp;rsquo;s an example.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-overlay.png&#34; width=&#34;600&#34; height=&#34;471&#34; alt=&#34;Screenshot of a parish map of French Island overlaid on a modern basemap. The parish map is slightly offset to the north, but you can see that the size matches the modern map fairly well&#34;&gt;
&lt;p&gt;You can see the map is slightly offset (presumably due to the second problem listed above). But the size seems about right. Certainly good enough to use the bounding boxes in some exploratory analyses!&lt;/p&gt;
&lt;h2 id=&#34;visualising-the-results&#34;&gt;Visualising the results&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve saved the processed data as a &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_final.csv&#34;&gt;new dataset&lt;/a&gt;, and started playing around with a couple of ways of visualising the results. These are experiments, not discovery interfaces. But you can use them for a bit of exploration if you don&amp;rsquo;t mind a few bugs. They&amp;rsquo;re all in Jupyter notebooks that can be run &lt;a href=&#34;https://mybinder.org/v2/gh/StateLibraryVictoria-SLVLAB/geo-maps-residency/HEAD&#34;&gt;using the Binder service&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_browser.ipynb&#34;&gt;parish maps browser&lt;/a&gt; includes a dropdown list of parish maps with point coordinates.  Select a map and:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;if there&amp;rsquo;s a bounding box and an image identifier, the image of the parish map will be overlaid on the modern base map using the bounding box coordinates&lt;/li&gt;
&lt;li&gt;if there&amp;rsquo;s a bounding box, but no image identifier, a rectangle will be drawn on the base map showing the dimensions of the bounding box&lt;/li&gt;
&lt;li&gt;if there are point coordinates, but no bounding box, a marker will be placed on the base map&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-browser.png&#34; width=&#34;600&#34; height=&#34;429&#34; alt=&#34;Screenshot of a parish map of Mallacoota overlaid on a modern basemap. The opacity of the digitised map has been reduced making it easier to see how the two maps align. A popup is visible on the map, listing the basic metadata and including a link to the SLV catalogue.&#34;&gt;
&lt;p&gt;If the image of the map is displayed you can use the slider to adjust the opacity. Clicking on either the image, rectangle, or marker will display metadata about the parish map and a link to the SLV catalogue.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also a &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_visualise_bounds.ipynb&#34;&gt;visualisation of all the bounding boxes&lt;/a&gt; overlaid on a modern base map.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-bounds.png&#34; width=&#34;600&#34; height=&#34;474&#34; alt=&#34;Screenshot of a modern digital map of Victoria overlaid with 3,000+ transparent blue rectangles, showing the bounds of parish maps. A couple of the maps seem to be in Bass Strait.&#34;&gt;
&lt;p&gt;As you move your mouse over the bounding boxes the titles are displayed on the map, and if you click on a bounding box the metadata is displayed beneath the map, including a link to the SLV catalogue.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s obvious from the image above that some of the coordinates must be wrong! Visualisation is a great way of finding problems with your data. I now need to work through the results, documenting the problems, and thinking about how to make best use of the data. More to come!&lt;/p&gt;
</description>
      <source:markdown>The State Library of Victoria holds a collection of [8,804 parish maps](https://find.slv.vic.gov.au/discovery/search?query=series,exact,Parish%20maps%20of%20Victoria&amp;vid=61SLV_INST:SLV&amp;offset=0). As part of my residency at the SLV LAB, I&#39;ve been poking around in the metadata.

SLV staff have geocoded many of the parish maps using the [Composite Gazetteer of Australia](https://placenames.fsdf.org.au), which provides coordinates for Victorian parishes and boroughs. These coordinates give us a point which should be roughly at the centre of each map, enabling us to visualise their locations and distribution. But how much area do they cover? To answer that question we need a bounding box that includes the coordinates of each corner of the map. We could create bounding boxes by using something like [AllMaps](https://allmaps.org) or [MapWarper](https://www.mapwarper.net) to georeference each individual map, but that&#39;s going to take a while! As a quick and dirty alternative, I wondered if it was possible to generate approximate bounding boxes from the available metadata. It seems we can!

## The metadata

There are three pieces of metadata we need to construct bounding boxes:

- the latitude and longitude of the centre point
- the size of the physical map
- the scale of the map (ie how the size of the map relates to the real world)

The coordinates and scale can be included in a couple of different places in the map&#39;s MARC record. The [`034`](https://www.loc.gov/marc/bibliographic/bd034.html) field is specifically for &#39;Coded Cartographic Mathematical Data&#39;. The relevant subfields are:

- `$a`: category of scale
- `$b`: constant ratio linear horizontal scale (this is the most likely type of scale)
- `$d`: westernmost longitude
- `$e`: easternmost longitude
- `$f`: northernmost latitude
- `$g`: southernmost latitude

If the coordinates describe a point rather than a bounding box, then `$d` and `$e` will be the same, and `$f` and `$g` will be the same.

String representations of coordinates and scale can be found in the `255` field. The relevant subfields are:

- `$a`: statement of scale, eg `Scale [ca. 1:90,000].`
- `$c`: statement of coordinates, eg `(E 142°18&#39;/S 37°33&#39;)`

The size of the map is recorded in the [`300`](https://www.loc.gov/marc/bibliographic/bd300.html) (physical description) field under the `$c` (dimensions) subfield. For example: `on sheet 40 x 51 cm `.

## The method

I started with an existing dataset downloaded from the catalogue by SLV staff. This dataset included the scale and coordinate information in the `034` field, and the coordinate string in `255$c`. At first I didn&#39;t realise that the `034` held geo data, so I separately downloaded the scale information from `255$a` in each item&#39;s MARC record (d&#39;oh). If the maps were digitised, I also wanted their image identifiers so I could access them through the SLV&#39;s IIIF service. The image identifiers in the `956$e` field of each MARC record can be used to construct IIIF manifest urls, so I extracted them as well.

Once I had all the catalogue data, I had to make sure everything was in a format I could work with. The coordinates in the MARC records are recorded as degrees/minutes/seconds, so I had to convert them to decimal values. The scale factor needed to be an integer, and I needed to extract the height and width as integers from the dimensions field.

I used [lat_lon_parser](https://pypi.org/project/lat-lon-parser/) to convert the coordinates to decimal, but needed a bit of regex string manipulation to get the values into a format that could be parsed. Regex also came to the rescue in getting the map dimensions. All the details are [in this notebook](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/process_parish_maps.ipynb).
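
Here&#39;s a rough sketch of the sort of wrangling involved – a simplified version, not the notebook&#39;s exact code, using the string formats quoted above:

```python
import re

from lat_lon_parser import parse


def to_decimal(value):
    &#34;&#34;&#34;Convert a MARC coordinate string like &#34;E 142°18&#39;&#34; to a decimal value.&#34;&#34;&#34;
    hemisphere, numeric = value.strip().split(&#34; &#34;, 1)
    decimal = parse(numeric)
    # southern and western values are negative
    return -decimal if hemisphere in (&#34;S&#34;, &#34;W&#34;) else decimal


def parse_dimensions(value):
    &#34;&#34;&#34;Extract height and width in cm from a 300$c value like &#34;on sheet 40 x 51 cm&#34;.&#34;&#34;&#34;
    match = re.search(r&#34;(\d+)\s*x\s*(\d+)\s*cm&#34;, value)
    return int(match.group(1)), int(match.group(2))


longitude, latitude = (to_decimal(part) for part in &#34;(E 142°18&#39;/S 37°33&#39;)&#34;.strip(&#34;()&#34;).split(&#34;/&#34;))
height, width = parse_dimensions(&#34;on sheet 40 x 51 cm&#34;)
```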

### Creating bounding boxes

After some searching I found [this StackOverflow comment](https://stackoverflow.com/a/76910048) that described how to create a bounding box from a point, distance, and bearing. The point I already had, but the distance and bearing had to be calculated. Trigonometry to the rescue!

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-box-trig.png&#34; width=&#34;600&#34; height=&#34;334&#34; alt=&#34;&#34;&gt;

The distance from the point at the centre of the box to one of its corners is the hypotenuse of a right-angled triangle whose sides are equal to half the width and half the height of the map, and thanks to Pythagoras we know:

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-10-06-15-19-42.png&#34; width=&#34;579&#34; height=&#34;93&#34; alt=&#34;&#34;&gt;

Once I had the distance in cm, I converted to inches, then multiplied by the scale factor, and finally converted the inches to miles. (It now occurs to me that there&#39;s no need to convert to imperial measurements, but it doesn&#39;t make any difference either way.)
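
For example, taking the sheet size and scale quoted earlier (40 x 51 cm at ca. 1:90,000):

```python
import math

# half the sheet diagonal, via Pythagoras
half_diagonal_cm = math.sqrt((51 / 2) ** 2 + (40 / 2) ** 2)  # ~32.4 cm
half_diagonal_inches = half_diagonal_cm / 2.54               # ~12.8 inches
ground_inches = half_diagonal_inches * 90_000                # inches on the ground
distance_miles = ground_inches / 63_360                      # ~18.1 miles (63,360 inches in a mile)
```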

The bearing that points to the corner of the box is the angle inside the same right-angled triangle, so can be calculated using:

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-10-06-15-19-29.png&#34; width=&#34;407&#34; height=&#34;73&#34; alt=&#34;&#34;&gt;

With the point of origin, distance, and bearing I could use [geopy](https://github.com/geopy/geopy) to calculate the corners of the bounding box!

```python
from geopy.distance import geodesic

destination = geodesic(miles=distance).destination(origin, bearing)
coords = destination.longitude, destination.latitude
```

[See this notebook](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/process_parish_maps.ipynb) for the full details.
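
And here&#39;s a minimal sketch of the whole calculation as one (metric) function – not the notebook&#39;s exact code, and it assumes the point sits at the centre of the map and the map is oriented to north:

```python
import math

from geopy.distance import geodesic


def bounding_box(centre, width_cm, height_cm, scale):
    &#34;&#34;&#34;Approximate a map&#39;s bounding box from its centre point (latitude, longitude),
    sheet size in cm, and scale denominator (eg 90000 for 1:90,000).&#34;&#34;&#34;
    # half the sheet diagonal (Pythagoras), scaled up to a ground distance in km
    half_diagonal_cm = math.sqrt((width_cm / 2) ** 2 + (height_cm / 2) ** 2)
    distance_km = half_diagonal_cm * scale / 100_000
    # bearing from north to the north-east corner
    bearing = math.degrees(math.atan((width_cm / 2) / (height_cm / 2)))
    ne = geodesic(kilometers=distance_km).destination(centre, bearing)
    sw = geodesic(kilometers=distance_km).destination(centre, bearing + 180)
    # west, south, east, north (the order of the 034 subfields)
    return sw.longitude, sw.latitude, ne.longitude, ne.latitude
```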

## Limitations

Of course, this method is very rough and has a number of major limitations, in particular:

- only about 38% of the maps have point coordinates
- the point values don&#39;t necessarily locate the centre of the map
- not all the maps are oriented towards north
- sometimes a parish includes multiple maps
- the size of the margin around the map will affect the accuracy of the bounding box

But despite these problems the results seem pretty good. To test this I [created a notebook](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_browser.ipynb) to overlay the digitised maps on a modern basemap using the bounding boxes. Here&#39;s an example.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-overlay.png&#34; width=&#34;600&#34; height=&#34;471&#34; alt=&#34;Screenshot of a parish map of French Island overlaid on a modern basemap. The parish map is slightly offset to the north, but you can see that the size matches the modern map fairly well&#34;&gt;

You can see the map is slightly offset (presumably due to the second problem listed above). But the size seems about right. Certainly good enough to use the bounding boxes in some exploratory analyses!

## Visualising the results

I&#39;ve saved the processed data as a [new dataset](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_final.csv), and started playing around with a couple of ways of visualising the results. These are experiments, not discovery interfaces. But you can use them for a bit of exploration if you don&#39;t mind a few bugs. They&#39;re all in Jupyter notebooks that can be run [using the Binder service](https://mybinder.org/v2/gh/StateLibraryVictoria-SLVLAB/geo-maps-residency/HEAD).

The [parish maps browser](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_browser.ipynb) includes a dropdown list of parish maps with point coordinates.  Select a map and:

- if there&#39;s a bounding box and an image identifier, the image of the parish map will be overlaid on the modern base map using the bounding box coordinates
- if there&#39;s a bounding box, but no image identifier, a rectangle will be drawn on the base map showing the dimensions of the bounding box
- if there are point coordinates, but no bounding box, a marker will be placed on the base map

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-browser.png&#34; width=&#34;600&#34; height=&#34;429&#34; alt=&#34;Screenshot of a parish map of Mallacoota overlaid on a modern basemap. The opacity of the digitised map has been reduced making it easier to see how the two maps align. A popup is visible on the map, listing the basic metadata and including a link to the SLV catalogue.&#34;&gt;

If the image of the map is displayed you can use the slider to adjust the opacity. Clicking on either the image, rectangle, or marker will display metadata about the parish map and a link to the SLV catalogue.

There&#39;s also a [visualisation of all the bounding boxes](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_visualise_bounds.ipynb) overlaid on a modern base map. 

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-bounds.png&#34; width=&#34;600&#34; height=&#34;474&#34; alt=&#34;Screenshot of a modern digital map of Victoria overlaid with 3,000+ transparent blue rectangles, showing the bounds of parish maps. A couple of the maps seem to be in Bass Strait.&#34;&gt;

As you move your mouse over the bounding boxes the titles are displayed on the map, and if you click on a bounding box the metadata is displayed beneath the map, including a link to the SLV catalogue.

It&#39;s obvious from the image above that some of the coordinates must be wrong! Visualisation is a great way of finding problems with your data. I now need to work through the results, documenting the problems, and thinking about how to make best use of the data. More to come!
</source:markdown>
    </item>
    
    <item>
      <title>Exploring SLV urls</title>
      <link>https://updates.timsherratt.org/2025/09/23/exploring-slv-urls.html</link>
      <pubDate>Tue, 23 Sep 2025 17:22:45 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/09/23/exploring-slv-urls.html</guid>
      <description>&lt;p&gt;I like urls. They take you places. And if you know how to read them, they can tell you things about the systems that created them.
One of the first things I did when I started &lt;a href=&#34;https://updates.timsherratt.org/2025/09/22/creative-technologistinresidence-at-the-state.html&#34;&gt;my residency at SLV LAB&lt;/a&gt; was to try and understand how their collection urls work. There are a couple of well-worn methods I use when digging into a new site.&lt;/p&gt;
&lt;p&gt;The first is url hacking – this involves fiddling around with the parameters in a url and submitting the result to see what happens. The Trove Data Guide includes &lt;a href=&#34;https://tdg.glam-workbench.net/understanding-search/search-hacks.html&#34;&gt;some examples of hacking Trove urls&lt;/a&gt; to change the delivery of search results.&lt;/p&gt;
&lt;p&gt;The second method involves opening up the developer console in your web browser and watching the activity in the network tab as you click on links. This tells you where the information that gets loaded into your browser actually comes from – sometimes exposing handy urls that you can use to shortcut access to useful data.&lt;/p&gt;
&lt;h2 id=&#34;permalinks&#34;&gt;Permalinks&lt;/h2&gt;
&lt;p&gt;The SLV uses Primo for its public-facing catalogue, as well as other systems such as Rosetta and IIIF to deliver digitised content. I&amp;rsquo;d noticed that &lt;a href=&#34;https://www.zotero.org&#34;&gt;Zotero&lt;/a&gt; gets some useful data from the catalogue using the default &amp;lsquo;Primo 2018&amp;rsquo; translator; however, important things like the item url aren&amp;rsquo;t captured. The problem is that Primo&amp;rsquo;s &amp;lsquo;permalinks&amp;rsquo; are generated as required by a browser click – they&amp;rsquo;re not embedded anywhere on the page. This makes it hard for Zotero to grab them. So I started wondering how Zotero could construct short, persistent(ish) links to items.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a link to an item in Primo: &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;context=L&amp;amp;docid=alma9941325055707636&#34;&gt;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;context=L&amp;amp;docid=alma9941325055707636&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It looks pretty long and messy, but if you start deleting parameters and resubmitting, you&amp;rsquo;ll find that only two parameters are essential, &lt;code&gt;vid&lt;/code&gt; and &lt;code&gt;docid&lt;/code&gt;. This means we can rewrite the url as: &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;docid=alma9941325055707636&#34;&gt;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;docid=alma9941325055707636&lt;/a&gt; Much nicer.&lt;/p&gt;
&lt;p&gt;The &amp;lsquo;permalink&amp;rsquo; for the same item is: &lt;a href=&#34;https://find.slv.vic.gov.au/permalink/61SLV_INST/1sev8ar/alma9941325055707636&#34;&gt;https://find.slv.vic.gov.au/permalink/61SLV_INST/1sev8ar/alma9941325055707636&lt;/a&gt; If you look closely at the url path and compare it to the example above you&amp;rsquo;ll see the path is constructed from &lt;code&gt;/vid/[some other id]/docid&lt;/code&gt;. One of the librarians explained to me that the other identifier in the permalink is an encoding of the view type, but given that the &amp;lsquo;fulldisplay&amp;rsquo; view is the default, we don&amp;rsquo;t really need it. So the shortened url seems fine for use in Zotero and is easy to generate from the current url. Nice.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also worth noting that the &lt;code&gt;vid&lt;/code&gt; value doesn&amp;rsquo;t seem to change, so to construct catalogue urls in your code, all you really need is the ALMA identifier that&amp;rsquo;s in the &lt;code&gt;docid&lt;/code&gt; parameter.&lt;/p&gt;
&lt;h2 id=&#34;structured-data&#34;&gt;Structured data&lt;/h2&gt;
&lt;p&gt;Item pages in Primo include a link labelled &amp;lsquo;Display source record&amp;rsquo;. If you click on this you&amp;rsquo;re taken to a representation of the item&amp;rsquo;s metadata in MARC. Here&amp;rsquo;s what the urls look like: &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/sourceRecord?vid=61SLV_INST%3ASLV&amp;amp;docId=alma9941325055707636&amp;amp;recordOwner=61SLV_INST&#34;&gt;https://find.slv.vic.gov.au/discovery/sourceRecord?vid=61SLV_INST%3ASLV&amp;amp;docId=alma9941325055707636&amp;amp;recordOwner=61SLV_INST&lt;/a&gt; Notice that the &amp;lsquo;fulldisplay&amp;rsquo; in the url path above has changed to &amp;lsquo;sourceRecord&amp;rsquo;. There&amp;rsquo;s also a new &lt;code&gt;recordOwner&lt;/code&gt; parameter, but it seems you can delete this and still get the same result.&lt;/p&gt;
&lt;p&gt;Having access to the MARC record is handy, because it delivers the metadata in a simple, structured plain text format. But while the &amp;lsquo;source record&amp;rsquo; page looks like a plain text file, it&amp;rsquo;s actually an HTML page that embeds a plain text record. If you open up the network tab of your browser&amp;rsquo;s developer console and reload the &amp;lsquo;source record&amp;rsquo; page, you&amp;rsquo;ll see a different url is loaded under the hood: &lt;a href=&#34;https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma9941325055707636&amp;amp;vid=61SLV_INST:SLV&amp;amp;recordOwner=61SLV_INST&amp;amp;lang=en&#34;&gt;https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma9941325055707636&amp;amp;vid=61SLV_INST:SLV&amp;amp;recordOwner=61SLV_INST&amp;amp;lang=en&lt;/a&gt; See how the url path has changed from &lt;code&gt;/discovery/&lt;/code&gt; to &lt;code&gt;/primaws/rest/pub&lt;/code&gt;? This url &lt;em&gt;does&lt;/em&gt; deliver a plain text version of the MARC record.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-09-23-15-21-05.png&#34; width=&#34;600&#34; height=&#34;215&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Once you have the plain text version you can parse the contents to extract the structured data. There are tools that can probably do this automatically, but it&amp;rsquo;s also pretty easy using regular expressions. Here&amp;rsquo;s an example of some code I used to parse map records.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; re

&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;get_marc_value&lt;/span&gt;(marc, tag, subfield):
    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    Gets the value of a tag/subfield from a text version of an item&amp;#39;s MARC record.
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;try&lt;/span&gt;:
        tag &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;search(&lt;span style=&#34;color:#e6db74&#34;&gt;rf&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;^&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;{&lt;/span&gt;tag&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;\t.+&amp;#34;&lt;/span&gt;, marc, re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;M)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;group(&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
        subfield &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;search(&lt;span style=&#34;color:#e6db74&#34;&gt;rf&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;\$&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;{&lt;/span&gt;subfield&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;([^\$]+)&amp;#34;&lt;/span&gt;, tag)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;group(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
    &lt;span style=&#34;color:#66d9ef&#34;&gt;except&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;AttributeError&lt;/span&gt;:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;None&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; subfield&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;strip(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34; .,&amp;#34;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can also access a JSON representation of the record by adding the parameter &lt;code&gt;&amp;amp;showPnx=true&lt;/code&gt; to the catalogue url: &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;context=L&amp;amp;docid=alma9941325055707636&amp;amp;showPnx=true&#34;&gt;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;context=L&amp;amp;docid=alma9941325055707636&amp;amp;showPnx=true&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Once again, this is a JSON representation embedded in a web page. Using the same developer console trick, you can identify the direct url: &lt;a href=&#34;https://find.slv.vic.gov.au/primaws/rest/pub/pnxs/L/alma9941325055707636?vid=61SLV_INST:SLV&amp;amp;lang=en&amp;amp;search_scope=slv_local&amp;amp;showPnx=true&amp;amp;lang=en&#34;&gt;https://find.slv.vic.gov.au/primaws/rest/pub/pnxs/L/alma9941325055707636?vid=61SLV_INST:SLV&amp;amp;lang=en&amp;amp;search_scope=slv_local&amp;amp;showPnx=true&amp;amp;lang=en&lt;/a&gt; You should be able to parse the response from this url as JSON and use it in your code. I think the Zotero translator makes use of this &lt;code&gt;pnx&lt;/code&gt; data.&lt;/p&gt;
&lt;p&gt;If you want to download the MARC or JSON representations in your code, all you really need is the &lt;code&gt;alma&lt;/code&gt; identifier. Just use it to construct one of the direct urls, such as this: &lt;a href=&#34;https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma9941325055707636&amp;amp;vid=61SLV_INST:SLV&#34;&gt;https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma9941325055707636&amp;amp;vid=61SLV_INST:SLV&lt;/a&gt; The &lt;code&gt;recordOwner&lt;/code&gt; and &lt;code&gt;lang&lt;/code&gt; parameters are not needed, and the &lt;code&gt;vid&lt;/code&gt; parameter doesn&amp;rsquo;t change.&lt;/p&gt;
&lt;p&gt;Librarians using Primo have documented a number of tricks like this and &lt;a href=&#34;https://igelu.org/products-and-initiatives/product-working-groups/primo/special-projects/primo-community-support-primo-useful-bookmarklets/&#34;&gt;shared handy bookmarklets&lt;/a&gt; to rewrite urls and get catalogue data in different forms.&lt;/p&gt;
&lt;h2 id=&#34;iiif-and-images&#34;&gt;IIIF and images&lt;/h2&gt;
&lt;p&gt;SLV delivers digitised images using &lt;a href=&#34;https://iiif.io&#34;&gt;IIIF&lt;/a&gt;. The IIIF manifest urls are not directly exposed through the web interface, but you can construct your own.&lt;/p&gt;
&lt;p&gt;IIIF manifest urls look like this: &lt;a href=&#34;https://rosetta.slv.vic.gov.au/delivery/iiif/presentation/2.1/IE24074939/manifest.json&#34;&gt;https://rosetta.slv.vic.gov.au/delivery/iiif/presentation/2.1/IE24074939/manifest.json&lt;/a&gt; All we need to construct them is the &lt;code&gt;IE&lt;/code&gt; identifier, in this case &lt;code&gt;IE24074939&lt;/code&gt;. But where do you find this identifier?&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re looking at an image in the SLV&amp;rsquo;s image viewer, the url will be something like this: &lt;a href=&#34;https://viewer.slv.vic.gov.au/?entity=IE24074939&amp;amp;mode=browse&#34;&gt;https://viewer.slv.vic.gov.au/?entity=IE24074939&amp;amp;mode=browse&lt;/a&gt; Yep, the &lt;code&gt;IE&lt;/code&gt; identifier is right there in the url. Just extract it from the viewer url, and plug it into the manifest url!&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re looking at a catalogue record, or starting with one of the &lt;code&gt;alma&lt;/code&gt; identifiers, you can get the &lt;code&gt;IE&lt;/code&gt; identifier from the &lt;code&gt;956$e&lt;/code&gt; field of the MARC record.&lt;/p&gt;
&lt;p&gt;The IIIF manifest will, in turn, provide identifiers for individual images that can be requested using the standard IIIF syntax.&lt;/p&gt;
&lt;p&gt;To save myself a bit of fiddling about, I created &lt;a href=&#34;https://gist.github.com/wragge/a37a4db854deffad956abc7bf918f6b0&#34;&gt;a userscript that exposes the IIIF manifest url&lt;/a&gt; within the image viewer. If you install it you&amp;rsquo;ll see something like this:&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-09-23-15-54-09.png&#34; width=&#34;510&#34; height=&#34;229&#34; alt=&#34;&#34;&gt;
&lt;h2 id=&#34;handles&#34;&gt;Handles&lt;/h2&gt;
&lt;p&gt;Links to digitised items sometimes come in the form of &amp;lsquo;handles&amp;rsquo;: &lt;a href=&#34;http://handle.slv.vic.gov.au/10381/4338980&#34;&gt;http://handle.slv.vic.gov.au/10381/4338980&lt;/a&gt; These urls are redirected to the image viewer.&lt;/p&gt;
&lt;p&gt;If you want to construct one of these handles, the identifier can be found in the &lt;code&gt;956$a&lt;/code&gt; field of the MARC record.&lt;/p&gt;
&lt;h2 id=&#34;from-old-to-new&#34;&gt;From old to new&lt;/h2&gt;
&lt;p&gt;I was looking at the datasets created about 8 years ago in the &lt;a href=&#34;https://github.com/statelibraryvic/opendata&#34;&gt;SLV open data repository&lt;/a&gt; and noticed they included urls from the previous catalogue. Fortunately, the old urls redirect to the new system.&lt;/p&gt;
&lt;p&gt;For example, this url: &lt;a href=&#34;http://search.slv.vic.gov.au/MAIN:Everything:SLV_VOYAGER1842440&#34;&gt;http://search.slv.vic.gov.au/MAIN:Everything:SLV_VOYAGER1842440&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Redirects to: &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/fulldisplay?context=L&amp;amp;vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;docid=alma9918424403607636&#34;&gt;https://find.slv.vic.gov.au/discovery/fulldisplay?context=L&amp;amp;vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;docid=alma9918424403607636&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you look closely at the urls you&amp;rsquo;ll see that the identifier from the old system is embedded in the new identifier: &lt;code&gt;1842440&lt;/code&gt; is in &lt;code&gt;9918424403607636&lt;/code&gt; – &lt;code&gt;99_1842440_3607636&lt;/code&gt;. This means if you have a lot of old urls, such as in the open datasets, you can easily rewrite them in your code.&lt;/p&gt;
&lt;h2 id=&#34;the-process-of-glam-hacking&#34;&gt;The process of GLAM hacking&lt;/h2&gt;
&lt;p&gt;No doubt a lot of this is well-known to librarians, and there&amp;rsquo;s probably many subtleties or complexities that my poking about has missed. But I wanted to document the process as much as the results – to give an idea of what I do when I approach a new GLAM collection online. I suppose this is GLAM hacking 101.&lt;/p&gt;
</description>
      <source:markdown>I like urls. They take you places. And if you know how to read them, they can tell you things about the systems that created them.
One of the first things I did when I started [my residency at SLV LAB](https://updates.timsherratt.org/2025/09/22/creative-technologistinresidence-at-the-state.html) was to try and understand how their collection urls work. There are a couple of well-worn methods I use when digging into a new site.

The first is url hacking – this involves fiddling around with the parameters in a url and submitting the result to see what happens. The Trove Data Guide includes [some examples of hacking Trove urls](https://tdg.glam-workbench.net/understanding-search/search-hacks.html) to change the delivery of search results.

The second method involves opening up the developer console in your web browser and watching the activity in the network tab as you click on links. This tells you where the information that gets loaded into your browser actually comes from – sometimes exposing handy urls that you can use to shortcut access to useful data.

## Permalinks

The SLV uses Primo for its public-facing catalogue, as well as other systems such as Rosetta and IIIF to deliver digitised content. I&#39;d noticed that [Zotero](https://www.zotero.org) gets some useful data from the catalogue using the default &#39;Primo 2018&#39; translator; however, important things like the item url aren&#39;t captured. The problem is that Primo&#39;s &#39;permalinks&#39; are generated as required by a browser click – they&#39;re not embedded anywhere on the page. This makes it hard for Zotero to grab them. So I started wondering how Zotero could construct short, persistent(ish) links to items.

Here&#39;s a link to an item in Primo: https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;search_scope=slv_local&amp;tab=searchProfile&amp;context=L&amp;docid=alma9941325055707636 

It looks pretty long and messy, but if you start deleting parameters and resubmitting, you&#39;ll find that only two parameters are essential, `vid` and `docid`. This means we can rewrite the url as: https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;docid=alma9941325055707636 Much nicer.

The &#39;permalink&#39; for the same item is: https://find.slv.vic.gov.au/permalink/61SLV_INST/1sev8ar/alma9941325055707636 If you look closely at the url path and compare it to the example above you&#39;ll see the path is constructed from `/vid/[some other id]/docid`. One of the librarians explained to me that the other identifier in the permalink is an encoding of the view type, but given that the &#39;fulldisplay&#39; view is the default, we don&#39;t really need it. So the shortened url seems fine for use in Zotero and is easy to generate from the current url. Nice.

It&#39;s also worth noting that the `vid` value doesn&#39;t seem to change, so to construct catalogue urls in your code, all you really need is the ALMA identifier that&#39;s in the `docid` parameter.
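
So a link-building function can be as simple as this sketch (the function name is just for illustration):

``` python
def catalogue_url(alma_id):
    &#34;&#34;&#34;Build a short catalogue url from an ALMA identifier.&#34;&#34;&#34;
    return f&#34;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;docid={alma_id}&#34;


catalogue_url(&#34;alma9941325055707636&#34;)
```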

## Structured data

Item pages in Primo include a link labelled &#39;Display source record&#39;. If you click on this you&#39;re taken to a representation of the item&#39;s metadata in MARC. Here&#39;s what the urls look like: https://find.slv.vic.gov.au/discovery/sourceRecord?vid=61SLV_INST%3ASLV&amp;docId=alma9941325055707636&amp;recordOwner=61SLV_INST Notice that the &#39;fulldisplay&#39; in the url path above has changed to &#39;sourceRecord&#39;. There&#39;s also a new `recordOwner` parameter, but it seems you can delete this and still get the same result.

Having access to the MARC record is handy, because it delivers the metadata in a simple, structured plain text format. But while the &#39;source record&#39; page looks like a plain text file, it&#39;s actually an HTML page that embeds a plain text record. If you open up the network tab of your browser&#39;s developer console and reload the &#39;source record&#39; page, you&#39;ll see a different url is loaded under the hood: https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma9941325055707636&amp;vid=61SLV_INST:SLV&amp;recordOwner=61SLV_INST&amp;lang=en See how the url path has changed from `/discovery/` to `/primaws/rest/pub`? This url *does* deliver a plain text version of the MARC record.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-09-23-15-21-05.png&#34; width=&#34;600&#34; height=&#34;215&#34; alt=&#34;&#34;&gt;

Once you have the plain text version you can parse the contents to extract the structured data. There are tools that can probably do this automatically, but it&#39;s also pretty easy using regular expressions. Here&#39;s an example of some code I used to parse map records.

``` python
import re

def get_marc_value(marc, tag, subfield):
    &#34;&#34;&#34;
    Gets the value of a tag/subfield from a text version of an item&#39;s MARC record.
    &#34;&#34;&#34;
    try:
        tag = re.search(rf&#34;^{tag}\t.+&#34;, marc, re.M).group(0)
        subfield = re.search(rf&#34;\${subfield}([^\$]+)&#34;, tag).group(1)
    except AttributeError:
        return None
    return subfield.strip(&#34; .,&#34;)
```
You can also access a JSON representation of the record by adding the parameter `&amp;showPnx=true` to the catalogue url: https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;search_scope=slv_local&amp;tab=searchProfile&amp;context=L&amp;docid=alma9941325055707636&amp;showPnx=true

Once again, this is a JSON representation embedded in a web page. Using the same developer console trick, you can identify the direct url: https://find.slv.vic.gov.au/primaws/rest/pub/pnxs/L/alma9941325055707636?vid=61SLV_INST:SLV&amp;lang=en&amp;search_scope=slv_local&amp;showPnx=true&amp;lang=en You should be able to parse the response from this url as JSON and use it in your code. I think the Zotero translator makes use of this `pnx` data.

If you want to download the MARC or JSON representations in your code, all you really need is the `alma` identifier. Just use it to construct one of the direct urls, such as this: https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma9941325055707636&amp;vid=61SLV_INST:SLV The `recordOwner` and `lang` parameters are not needed, and the `vid` parameter doesn&#39;t change.
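
Here&#39;s a sketch using the requests library – the function names are just for illustration, and some of the pnx parameters may turn out to be optional:

``` python
import requests

BASE = &#34;https://find.slv.vic.gov.au/primaws/rest/pub&#34;


def get_marc(alma_id):
    &#34;&#34;&#34;Download the plain text MARC record for an item.&#34;&#34;&#34;
    response = requests.get(f&#34;{BASE}/sourceRecord&#34;, params={&#34;docId&#34;: alma_id, &#34;vid&#34;: &#34;61SLV_INST:SLV&#34;})
    response.raise_for_status()
    return response.text


def get_pnx(alma_id):
    &#34;&#34;&#34;Download the JSON (pnx) representation of an item.&#34;&#34;&#34;
    # parameters copied from the direct url above
    params = {&#34;vid&#34;: &#34;61SLV_INST:SLV&#34;, &#34;search_scope&#34;: &#34;slv_local&#34;, &#34;showPnx&#34;: &#34;true&#34;, &#34;lang&#34;: &#34;en&#34;}
    response = requests.get(f&#34;{BASE}/pnxs/L/{alma_id}&#34;, params=params)
    response.raise_for_status()
    return response.json()
```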

Librarians using Primo have documented a number of tricks like this and [shared handy bookmarklets](https://igelu.org/products-and-initiatives/product-working-groups/primo/special-projects/primo-community-support-primo-useful-bookmarklets/) to rewrite urls and get catalogue data in different forms.

## IIIF and images

SLV delivers digitised images using [IIIF](https://iiif.io). The IIIF manifest urls are not directly exposed through the web interface, but you can construct your own.

IIIF manifest urls look like this: https://rosetta.slv.vic.gov.au/delivery/iiif/presentation/2.1/IE24074939/manifest.json All we need to construct them is the `IE` identifier, in this case `IE24074939`. But where do you find this identifier?

If you&#39;re looking at an image in the SLV&#39;s image viewer, the url will be something like this: https://viewer.slv.vic.gov.au/?entity=IE24074939&amp;mode=browse Yep, the `IE` identifier is right there in the url. Just extract it from the viewer url, and plug it into the manifest url!

If you&#39;re looking at a catalogue record, or starting with one of the `alma` identifiers, you can get the `IE` identifier from the `956$e` field of the MARC record.
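
Putting the pieces together, here&#39;s a sketch that goes from an `alma` identifier to a manifest url, reusing the functions above:

``` python
# from alma id to IIIF manifest url, reusing get_marc and get_marc_value from above
marc = get_marc(&#34;alma9941325055707636&#34;)
ie_id = get_marc_value(marc, &#34;956&#34;, &#34;e&#34;)
manifest_url = f&#34;https://rosetta.slv.vic.gov.au/delivery/iiif/presentation/2.1/{ie_id}/manifest.json&#34;
```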

The IIIF manifest will, in turn, provide identifiers for individual images that can be requested using the standard IIIF syntax.

To save myself a bit of fiddling about, I created [a userscript that exposes the IIIF manifest url](https://gist.github.com/wragge/a37a4db854deffad956abc7bf918f6b0) within the image viewer. If you install it you&#39;ll see something like this:

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-09-23-15-54-09.png&#34; width=&#34;510&#34; height=&#34;229&#34; alt=&#34;&#34;&gt;

## Handles

Links to digitised items sometimes come in the form of &#39;handles&#39;: http://handle.slv.vic.gov.au/10381/4338980 These urls are redirected to the image viewer.

If you want to construct one of these handles, the identifier can be found in the `956$a` field of the MARC record.
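
Assuming `956$a` holds the full handle identifier (eg `10381/4338980`), the same approach works as a sketch:

``` python
# assumes 956$a contains the full handle identifier
handle_id = get_marc_value(marc, &#34;956&#34;, &#34;a&#34;)
handle_url = f&#34;http://handle.slv.vic.gov.au/{handle_id}&#34;
```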

## From old to new

I was looking at the datasets created about 8 years ago in the [SLV open data repository](https://github.com/statelibraryvic/opendata) and noticed they included urls from the previous catalogue. Fortunately, the old urls redirect to the new system.

For example, this url: http://search.slv.vic.gov.au/MAIN:Everything:SLV_VOYAGER1842440

Redirects to: https://find.slv.vic.gov.au/discovery/fulldisplay?context=L&amp;vid=61SLV_INST:SLV&amp;search_scope=slv_local&amp;tab=searchProfile&amp;docid=alma9918424403607636

If you look closely at the urls you&#39;ll see that the identifier from the old system is embedded in the new identifier: `1842440` is in `9918424403607636` – `99_1842440_3607636`. This means if you have a lot of old urls, such as in the open datasets, you can easily rewrite them in your code.

## The process of GLAM hacking

No doubt a lot of this is well-known to librarians, and there&#39;s probably many subtleties or complexities that my poking about has missed. But I wanted to document the process as much as the results – to give an idea of what I do when I approach a new GLAM collection online. I suppose this is GLAM hacking 101.







</source:markdown>
    </item>
    
    <item>
      <title>Creative Technologist-in-Residence at the State Library of Victoria!</title>
      <link>https://updates.timsherratt.org/2025/09/22/creative-technologistinresidence-at-the-state.html</link>
      <pubDate>Tue, 23 Sep 2025 00:14:09 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/09/22/creative-technologistinresidence-at-the-state.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;m very excited to be the new &lt;a href=&#34;https://lab.slv.vic.gov.au/residencies-opportunities&#34;&gt;Creative Technologist-in-Residence at the SLV LAB&lt;/a&gt;. For the next few months I get to play around with metadata and images, think about online access, experiment with different technologies, and build things to help people to explore the State Library&amp;rsquo;s collections. In other words, I get to be in my happy place!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/2025-09-22-11.36.59.jpg&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;My group at &lt;a href=&#34;https://updates.timsherratt.org/2025/08/29/wikifest-at-the-state-library.html&#34;&gt;the recent SLV WikiFest&lt;/a&gt; was thinking about ways of helping researchers find resources relating to particular locations – how do I find material about my suburb, or my street? Coincidentally, the main focus of my residency will also be place-based collections, so I get to really think through some of the possibilities. SLV staff have already pointed me to some amazing maps and photographs, such as the &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;amp;collectionId=81271917420007636&#34;&gt;Committee for Urban Action collection&lt;/a&gt;, the &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/search?query=any,contains,mahlstedt%20melbourne&amp;amp;tab=searchProfile&amp;amp;search_scope=slv_local&amp;amp;vid=61SLV_INST:SLV&amp;amp;facet=tlevel,include,online_resources&amp;amp;offset=0&#34;&gt;Mahlstedt fire survey maps&lt;/a&gt;, the &lt;a href=&#34;https://guides.slv.vic.gov.au/MMBWplans&#34;&gt;MMBW plans&lt;/a&gt;, and the &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/search?query=series,exact,Parish%20maps%20of%20Victoria&amp;amp;vid=61SLV_INST:SLV&amp;amp;offset=0&#34;&gt;Victorian parish maps&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;At the same time, I&amp;rsquo;ll be using my usual GLAM hacking approach to poke around in the SLV website to try and understand what data is currently available, identify any roadblocks, and document opportunities for computational research.&lt;/p&gt;
&lt;p&gt;The results of my residency will be shared on the &lt;a href=&#34;https://lab.slv.vic.gov.au&#34;&gt;SLV LAB site&lt;/a&gt;, in &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency&#34;&gt;GitHub&lt;/a&gt;, in the &lt;a href=&#34;https://glam-workbench.net/state-library-victoria/&#34;&gt;SLV section of the GLAM Workbench&lt;/a&gt;, and of course here. As usual, I&amp;rsquo;ll be working in the open, documenting things as I go along, so please join me on the journey!&lt;/p&gt;
&lt;p&gt;Although the residency was formally announced today, I&amp;rsquo;ve actually been working with SLV data for the last couple of weeks and I&amp;rsquo;ve already got a backlog of stuff I need to blog about. Here&amp;rsquo;s a taster – what happens when you generate bounding boxes for thousands of parish maps from the available metadata and throw them on a map…?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-09-22-23-08-25.png&#34; width=&#34;600&#34; height=&#34;406&#34; alt=&#34;&#34;&gt;
</description>
      <source:markdown>I&#39;m very excited to be the new [Creative Technologist-in-Residence at the SLV LAB](https://lab.slv.vic.gov.au/residencies-opportunities). For the next few months I get to play around with metadata and images, think about online access, experiment with different technologies, and build things to help people to explore the State Library&#39;s collections. In other words, I get to be in my happy place!

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/2025-09-22-11.36.59.jpg&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

My group at [the recent SLV WikiFest](https://updates.timsherratt.org/2025/08/29/wikifest-at-the-state-library.html) was thinking about ways of helping researchers find resources relating to particular locations – how do I find material about my suburb, or my street? Coincidentally, the main focus of my residency will also be place-based collections, so I get to really think through some of the possibilities. SLV staff have already pointed me to some amazing maps and photographs, such as the [Committee for Urban Action collection](https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;collectionId=81271917420007636), the [Mahlstedt fire survey maps](https://find.slv.vic.gov.au/discovery/search?query=any,contains,mahlstedt%20melbourne&amp;tab=searchProfile&amp;search_scope=slv_local&amp;vid=61SLV_INST:SLV&amp;facet=tlevel,include,online_resources&amp;offset=0), the [MMBW plans](https://guides.slv.vic.gov.au/MMBWplans), and the [Victorian parish maps](https://find.slv.vic.gov.au/discovery/search?query=series,exact,Parish%20maps%20of%20Victoria&amp;vid=61SLV_INST:SLV&amp;offset=0).

At the same time, I&#39;ll be using my usual GLAM hacking approach to poke around in the SLV website to try and understand what data is currently available, identify any roadblocks, and document opportunities for computational research.

The results of my residency will be shared on the [SLV LAB site](https://lab.slv.vic.gov.au), in [GitHub](https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency), in the [SLV section of the GLAM Workbench](https://glam-workbench.net/state-library-victoria/), and of course here. As usual, I&#39;ll be working in the open, documenting things as I go along, so please join me on the journey!

Although the residency was formally announced today, I&#39;ve actually been working with SLV data for the last couple of weeks and I&#39;ve already got a backlog of stuff I need to blog about. Here&#39;s a taster – what happens when you generate bounding boxes for thousands of parish maps from the available metadata and throw them on a map…?

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-09-22-23-08-25.png&#34; width=&#34;600&#34; height=&#34;406&#34; alt=&#34;&#34;&gt;








</source:markdown>
    </item>
    
    <item>
      <title>WikiFest at the State Library of Victoria</title>
      <link>https://updates.timsherratt.org/2025/08/29/wikifest-at-the-state-library.html</link>
      <pubDate>Fri, 29 Aug 2025 16:07:26 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/08/29/wikifest-at-the-state-library.html</guid>
      <description>&lt;p&gt;This week I was lucky enough to participate in WikiFest at the State Library of Victoria. Organised by the &lt;a href=&#34;https://lab.slv.vic.gov.au&#34;&gt;State Library&amp;rsquo;s new innovation LAB&lt;/a&gt; and &lt;a href=&#34;https://wikimedia.org.au&#34;&gt;Wikimedia Australia&lt;/a&gt;, Wikifest was a hands-on, participant-led workshop focused on the possibilities of connecting SLV&amp;rsquo;s collections to (and through!) Wikidata.&lt;/p&gt;
&lt;p&gt;The day kicked off with a series of presentations demonstrating possible uses of Wikidata. I talked a bit about some of my recent GLAM/Wikidata experiments. My &lt;a href=&#34;https://slides.com/wragge/wikifest-slv-2025&#34;&gt;slides are online&lt;/a&gt; and contain plenty of links to code, demonstrations, and documentation. They&amp;rsquo;re openly-licensed, so feel free to take anything of use.&lt;/p&gt;
&lt;iframe src=&#34;https://slides.com/wragge/wikifest-slv-2025/embed&#34; width=&#34;100%&#34; height=&#34;500&#34; title=&#34;WikiFest SLV 2025&#34; scrolling=&#34;no&#34; frameborder=&#34;0&#34; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;The rest of the day was spent in groups, working on particular projects and learning more about Wikidata in the process. My group was looking at providing place-based entry points to SLV collections, and spent a lot of time exploring the representation of Victoria&amp;rsquo;s &lt;a href=&#34;https://query.wikidata.org/embed.html#%23Country%20populations%20together%20with%20total%20city%20populations%0ASELECT%20%3Flga%20%3FlgaLabel%20%3FstartDate%20%3FendDate%20%3Fpoint%20%7B%0A%20%20%3Flga%20wdt%3AP31%20wd%3AQ30129411%20%3B%0A%20%20%20%20%20%20%20wdt%3AP131%20wd%3AQ36687.%0A%20%20%3Flga%20p%3AP625%20%3Fcoordinate.%0A%20%20%3Fcoordinate%20ps%3AP625%20%3Fpoint.%0A%20%20OPTIONAL%20%7B%3Flga%20wdt%3AP571%20%3FstartDate.%7D%0A%20%20OPTIONAL%20%7B%3Flga%20wdt%3AP576%20%3FendDate.%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cmul%2Cen%22%20%7D%0A%7D&#34;&gt;Local Government Areas (LGAs) in Wikidata&lt;/a&gt;. We realised there was quite a bit of work to do in adding things like dates and boundaries, but we could see some exciting future possibilities. We also made a start, adding an &amp;lsquo;inception&amp;rsquo; date for the &lt;a href=&#34;https://www.wikidata.org/wiki/Q5123821&#34;&gt;City of Moe&lt;/a&gt;, based on the Victorian Government Gazette, &lt;a href=&#34;https://gazette.slv.vic.gov.au&#34;&gt;digitised by the SLV&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-08-29-14-45-09.png&#34; width=&#34;600&#34; height=&#34;271&#34; alt=&#34;Screen capture from Wikidata showing the inception property for the City of Moe&#34;&gt;
&lt;h2 id=&#34;bonus-userscript&#34;&gt;Bonus userscript&lt;/h2&gt;
&lt;p&gt;While I was preparing my presentation I was thinking about the way entries for Australian people in Wikidata are linked to a range of different identifiers, such as DAAO, the Encyclopedia of Australian Science, and the Australian Dictionary of Biography (ADB). Often a single person can have multiple identifiers and this means that those identifiers themselves become connected through that person&amp;rsquo;s record. You can query Wikidata with one identifier, and get back links to a range of other information sources about that person.&lt;/p&gt;
&lt;p&gt;To demonstrate this, I created &lt;a href=&#34;https://gist.github.com/wragge/40f66af72c400b2563f95bda60e713dd&#34;&gt;a simple userscript&lt;/a&gt; that adds additional links to biographies in the ADB. The script grabs the ADB identifier from the url, queries Wikidata for additional identifiers, and writes the results into the page&amp;rsquo;s &amp;lsquo;Life Summary&amp;rsquo;. Basic, but useful!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/adb-userscript-example.png&#34; width=&#34;600&#34; height=&#34;253&#34; alt=&#34;Screenshot from the ADB showing the related links from Wikidata added to the Life Summary of Margaret Baskerville&#34;&gt;
&lt;p&gt;For something more advanced, have a look at the &lt;a href=&#34;https://addons.mozilla.org/en-US/firefox/addon/entity-explosion/&#34;&gt;Entity Explosion extension&lt;/a&gt; for Firefox.&lt;/p&gt;
</description>
      <source:markdown>This week I was lucky enough to participate in WikiFest at the State Library of Victoria. Organised by the [State Library&#39;s new innovation LAB](https://lab.slv.vic.gov.au) and [Wikimedia Australia](https://wikimedia.org.au), Wikifest was a hands-on, participant-led workshop focused on the possibilities of connecting SLV&#39;s collections to (and through!) Wikidata.

The day kicked off with a series of presentations demonstrating possible uses of Wikidata. I talked a bit about some of my recent GLAM/Wikidata experiments. My [slides are online](https://slides.com/wragge/wikifest-slv-2025) and contain plenty of links to code, demonstrations, and documentation. They&#39;re openly-licensed, so feel free to take anything of use.

&lt;iframe src=&#34;https://slides.com/wragge/wikifest-slv-2025/embed&#34; width=&#34;100%&#34; height=&#34;500&#34; title=&#34;WikiFest SLV 2025&#34; scrolling=&#34;no&#34; frameborder=&#34;0&#34; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;

The rest of the day was spent in groups, working on particular projects and learning more about Wikidata in the process. My group was looking at providing place-based entry points to SLV collections, and spent a lot of time exploring the representation of Victoria&#39;s [Local Government Areas (LGAs) in Wikidata](https://query.wikidata.org/embed.html#%23Country%20populations%20together%20with%20total%20city%20populations%0ASELECT%20%3Flga%20%3FlgaLabel%20%3FstartDate%20%3FendDate%20%3Fpoint%20%7B%0A%20%20%3Flga%20wdt%3AP31%20wd%3AQ30129411%20%3B%0A%20%20%20%20%20%20%20wdt%3AP131%20wd%3AQ36687.%0A%20%20%3Flga%20p%3AP625%20%3Fcoordinate.%0A%20%20%3Fcoordinate%20ps%3AP625%20%3Fpoint.%0A%20%20OPTIONAL%20%7B%3Flga%20wdt%3AP571%20%3FstartDate.%7D%0A%20%20OPTIONAL%20%7B%3Flga%20wdt%3AP576%20%3FendDate.%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cmul%2Cen%22%20%7D%0A%7D). We realised there was quite a bit of work to do in adding things like dates and boundaries, but we could see some exciting future possibilities. We also made a start, adding an &#39;inception&#39; date for the [City of Moe](https://www.wikidata.org/wiki/Q5123821), based on the Victorian Government Gazette, [digitised by the SLV](https://gazette.slv.vic.gov.au).
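
(The query link above is URL-encoded and hard to read, so here&#39;s the decoded SPARQL for reference – the comment on the first line looks like a leftover from a sample query. P31/P131 restrict results to LGAs within Victoria, P625 supplies coordinates, and P571/P576 pick up inception and abolition dates where they exist.)

```sparql
#Country populations together with total city populations
SELECT ?lga ?lgaLabel ?startDate ?endDate ?point {
  ?lga wdt:P31 wd:Q30129411 ;
       wdt:P131 wd:Q36687.
  ?lga p:P625 ?coordinate.
  ?coordinate ps:P625 ?point.
  OPTIONAL {?lga wdt:P571 ?startDate.}
  OPTIONAL {?lga wdt:P576 ?endDate.}
  SERVICE wikibase:label { bd:serviceParam wikibase:language &#34;[AUTO_LANGUAGE],mul,en&#34; }
}
```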

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-08-29-14-45-09.png&#34; width=&#34;600&#34; height=&#34;271&#34; alt=&#34;Screen capture from Wikidata showing the inception property for the City of Moe&#34;&gt;

## Bonus userscript

While I was preparing my presentation, I was thinking about the way entries for Australian people in Wikidata are linked to a range of different identifiers, such as DAAO, the Encyclopedia of Australian Science, and the Australian Dictionary of Biography (ADB). Often a single person can have multiple identifiers, and this means that those identifiers themselves become connected through that person&#39;s record. You can query Wikidata with one identifier, and get back links to a range of other information sources about that person.

To demonstrate this, I created [a simple userscript](https://gist.github.com/wragge/40f66af72c400b2563f95bda60e713dd) that adds additional links to biographies in the ADB. The script grabs the ADB identifier from the URL, queries Wikidata for additional identifiers, and writes the results into the page&#39;s &#39;Life Summary&#39;. Basic, but useful!

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/adb-userscript-example.png&#34; width=&#34;600&#34; height=&#34;253&#34; alt=&#34;Screenshot from the ADB showing the related links from Wikidata added to the Life Summary of Margaret Baskerville&#34;&gt;
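
The userscript itself is Javascript, but the underlying idea is easy to sketch in any language. Here&#39;s a rough Python version of the lookup – not the script&#39;s actual code. It assumes P1907 is the ADB identifier property in Wikidata, and the ADB ID used here is a made-up placeholder:

```python
# A sketch only: find the Wikidata item with a given ADB ID (property P1907),
# then list every other external identifier attached to the same item.
import requests

ADB_ID = &#34;example-person-1234&#34;  # hypothetical ADB identifier slug

QUERY = &#34;&#34;&#34;
SELECT ?idLabel ?value WHERE {
  ?person wdt:P1907 &#34;%s&#34; .
  ?person ?claim ?value .
  ?id wikibase:directClaim ?claim ;
      wikibase:propertyType wikibase:ExternalId ;
      rdfs:label ?idLabel .
  FILTER(LANG(?idLabel) = &#34;en&#34;)
}
&#34;&#34;&#34; % ADB_ID

response = requests.get(
    &#34;https://query.wikidata.org/sparql&#34;,
    params={&#34;query&#34;: QUERY},
    headers={
        &#34;Accept&#34;: &#34;application/sparql-results+json&#34;,
        # the query service asks for a descriptive user agent
        &#34;User-Agent&#34;: &#34;adb-links-demo/0.1 (https://example.com)&#34;,
    },
)
for result in response.json()[&#34;results&#34;][&#34;bindings&#34;]:
    print(result[&#34;idLabel&#34;][&#34;value&#34;], &#34;:&#34;, result[&#34;value&#34;][&#34;value&#34;])
```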

For something more advanced, have a look at the [Entity Explosion extension](https://addons.mozilla.org/en-US/firefox/addon/entity-explosion/) for Firefox.


</source:markdown>
    </item>
    
    <item>
      <title>GLAM hacking with userscripts</title>
      <link>https://updates.timsherratt.org/2025/07/17/glam-hacking-with-userscripts.html</link>
      <pubDate>Thu, 17 Jul 2025 18:21:25 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/07/17/glam-hacking-with-userscripts.html</guid>
      <description>&lt;p&gt;In teaching and workshops I used to get students to question the idea that websites are &amp;lsquo;published&amp;rsquo;. They&amp;rsquo;re not released into the world in a fixed, immutable form – they&amp;rsquo;re a set of blueprints which only reach their final form in your browser window. This makes it possible to change the way websites look and behave.&lt;/p&gt;
&lt;p&gt;Mozilla used to have a nifty educational tool called X-Ray Goggles. Using it, you could explore the code underlying a web page and do fun things like inserting new text or images. I encouraged students to try hacking ASIO&amp;rsquo;s home page.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/asio-eggplants.jpg&#34; width=&#34;600&#34; height=&#34;629&#34; alt=&#34;Old, modified screenshot of ASIO homepage with a section of a cartoon from First Dog On the Moon inserted.&#34;&gt;
&lt;p&gt;&lt;em&gt;ASIO home page with some added First Dog on the Moon.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;There are other ways you can fiddle with websites. For example, most browsers have a developer console that exposes the code and styling of a page. You can use the console to edit HTML elements or toggle styles, but your changes won&amp;rsquo;t be saved.&lt;/p&gt;
&lt;p&gt;One way you can save and share your website customisations is by creating userscripts. Userscripts are little bits of Javascript code that run in your browser after a web page loads. These scripts can change many aspects of a page – not just how it looks, but also how it works.&lt;/p&gt;
&lt;h2 id=&#34;some-old-userscripts&#34;&gt;Some old userscripts&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve been playing around with userscripts for a long time. Back in 2008, I created a userscript that &lt;a href=&#34;https://discontents.com.au/shoebox/archives-shoebox/archives-in-3d.html&#34;&gt;completely overhauled the way that digital files were presented&lt;/a&gt; in the National Archives of Australia&amp;rsquo;s online database, RecordSearch. My userscript added new options for navigating and printing the file, and even made it possible to view the complete file contents on a 3D zoomable wall.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/userscript-screenshot1.jpg&#34; width=&#34;600&#34; height=&#34;577&#34; alt=&#34;Screenshot of a digitised file in RecordSearch showing the features added by the userscript.&#34;&gt;
&lt;p&gt;&lt;em&gt;This customised RecordSearch interface was created by a userscript.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The userscripts I&amp;rsquo;ve created over the years have tended to either be useful little hacks aimed at fixing annoying aspects of GLAM websites, or experiments in thinking about the sort of information that&amp;rsquo;s presented online by GLAM organisations, and how it might be different.&lt;/p&gt;
&lt;p&gt;In the first category are hacks like my &lt;a href=&#34;https://gist.github.com/wragge/b2af9dc56f7cb0a9476b&#34;&gt;RecordSearch show pages userscript&lt;/a&gt;. In 2009, I got annoyed that there was no way of knowing how many pages were in a digitised file until you clicked on the link. &lt;a href=&#34;https://discontents.com.au/doing-it-yourself/index.html&#34;&gt;So I fixed it.&lt;/a&gt; With my userscript running, the links to digitised files are rewritten to display the number of pages. I&amp;rsquo;ve updated the code numerous times over the years, adding new features, and dealing with changes to RecordSearch. The last update was just a few days ago.&lt;/p&gt;
&lt;p&gt;In the second category is my userscript that inserts photos from &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/&#34;&gt;The Real Face of White Australia&lt;/a&gt; into RecordSearch. There are many thousands of records in the National Archives of Australia that document the impact of the White Australia Policy on the lives of ordinary people. But it&amp;rsquo;s often hard to understand this from the file descriptions. The userscript displays portrait images extracted from the files alongside the metadata – it tells you there are &lt;a href=&#34;https://doi.org/10.5281/zenodo.3579530&#34;&gt;people inside&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/people-inside-list.gif&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;Animated gif showing how the userscript changes the display of a list of files in RecordSearch by adding pictures of people.&#34;&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/people-inside-item.gif&#34; width=&#34;600&#34; height=&#34;412&#34; alt=&#34;Animated gif showing how the userscript changes the display of an individual files in RecordSearch by adding pictures of the people inside.&#34;&gt;
&lt;p&gt;Amidst &lt;a href=&#34;https://updates.timsherratt.org/2025/07/09/the-rebirth-of-wragge-labs.html&#34;&gt;my recent self-archiving binge&lt;/a&gt;, I realised I&amp;rsquo;d never updated this userscript to work with the latest data from The Real Face of White Australia, so I spent some time getting it working again. In the process I realised that RecordSearch now included content security policies that made it a bit harder to insert new images. The solution was to use one of the special userscript functions, &lt;a href=&#34;https://www.tampermonkey.net/documentation.php?locale=en#api:GM_addElement&#34;&gt;GM_addElement()&lt;/a&gt;, rather than plain old Javascript. But then I discovered that if the show pages userscript ran after this one, it would trigger the security restrictions nonetheless! To avoid this I made sure that the two userscripts operated on separate elements. So now the &lt;a href=&#34;https://gist.github.com/wragge/2941e473ee70152f4de7&#34;&gt;show people userscript&lt;/a&gt; is working again!&lt;/p&gt;
&lt;h2 id=&#34;and-a-new-userscript-to-improve-trove-lists&#34;&gt;And a new userscript to improve Trove lists&lt;/h2&gt;
&lt;p&gt;Fixing up the &amp;lsquo;people inside&amp;rsquo; code reminded me of how much fun it was playing around with userscripts, so when David Coombe mentioned on Mastodon last night a problem he was having with Trove lists, I had to have a go at fixing it.&lt;/p&gt;
&lt;p&gt;The problem is that Trove lists display all the tags associated with each individual item. Some items have lots of tags, so this eats up the screen real estate, making it harder to browse the contents of a list. Notes attached to items can be hidden, but not tags. Why not?&lt;/p&gt;
&lt;p&gt;My &lt;a href=&#34;https://gist.github.com/wragge/ab6a9d6b612bee6bc4d98658e947c420&#34;&gt;brand new userscript&lt;/a&gt; hides tags by default, and adds a new link to toggle their visibility for each individual item. The link also displays the number of tags attached to each item. This gives the user control over which tags are displayed and when.&lt;/p&gt;
&lt;p&gt;&lt;video src=&#34;https://cdn.uploads.micro.blog/8371/2025/simplescreenrecorder-2025-07-17-12.53.06.mp4&#34; poster=&#34;https://updates.timsherratt.org/uploads/2025/poster.png&#34; controls=&#34;controls&#34; preload=&#34;metadata&#34; width=&#34;600px&#34;&gt;&lt;/video&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The new userscript in action – toggle your tags!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The main difficulty in creating this userscript was knowing when the page had actually finished loading. The current version of Trove uses a lot of Javascript to load and manipulate content, so you have to tell the userscript to wait until everything has settled down. Otherwise the script could fire too soon and cause unexpected results. I tried a number of different approaches to handling this problem, but eventually settled on the &lt;a href=&#34;https://gist.github.com/BrockA/2625891&#34;&gt;waitForKeyElements script&lt;/a&gt;. (I just realised there&amp;rsquo;s a &lt;a href=&#34;https://github.com/CoeJoder/waitForKeyElements.js&#34;&gt;more recent version&lt;/a&gt; of this script that doesn&amp;rsquo;t require jQuery, so I might need to investigate this further.)&lt;/p&gt;
&lt;p&gt;Another Trove problem fixed!&lt;/p&gt;
&lt;h2 id=&#34;using-userscripts&#34;&gt;Using userscripts&lt;/h2&gt;
&lt;p&gt;In addition to the userscripts mentioned above, I&amp;rsquo;ve also created one that &lt;a href=&#34;https://gist.github.com/wragge/af8bd20a14005d267ffc759463bd832c&#34;&gt;enables you to browse Trove newspaper pages using the arrows on your keyboard&lt;/a&gt;. Left and right arrows go to the next and previous pages, while up and down arrows jump between issues. Searching is great, but sometimes you just want to browse. Install this userscript for that old-time, authentic newspaper reading experience!&lt;/p&gt;
&lt;p&gt;But how do you install userscripts? First of all you need a browser extension to manage your userscripts – I use &lt;a href=&#34;https://www.tampermonkey.net/&#34;&gt;TamperMonkey&lt;/a&gt; or &lt;a href=&#34;http://violentmonkey.com/&#34;&gt;ViolentMonkey&lt;/a&gt;. Just follow the instructions to add one of them to your browser.&lt;/p&gt;
&lt;p&gt;To install one of my userscripts, you need to go to the script (saved as a GitHub Gist) and click on the &amp;lsquo;Raw&amp;rsquo; button. Your userscript manager will then ask you if you want to add the userscript. Click install!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-07-17-16-59-11.png&#34; width=&#34;600&#34; height=&#34;244&#34; alt=&#34;&#34;&gt;
&lt;p&gt;&lt;em&gt;Click on the &amp;lsquo;Raw&amp;rsquo; button to install.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Once they&amp;rsquo;re installed the userscripts will run automatically when specified pages are loaded. If you ever want to disable them, you can do that from your userscript manager&amp;rsquo;s dashboard.&lt;/p&gt;
&lt;p&gt;For convenience, here are the Gist links to all the userscripts I&amp;rsquo;ve mentioned:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://gist.github.com/wragge/b2af9dc56f7cb0a9476b&#34;&gt;RecordSearch show pages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://gist.github.com/wragge/2941e473ee70152f4de7&#34;&gt;RecordSearch show people&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://gist.github.com/wragge/ab6a9d6b612bee6bc4d98658e947c420&#34;&gt;Trove lists hide tags&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://gist.github.com/wragge/af8bd20a14005d267ffc759463bd832c&#34;&gt;Trove newspapers keyboard navigation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As with anything you install on your computer, you want to make sure that you trust the source of any userscripts you add.&lt;/p&gt;
</description>
      <source:markdown>In teaching and workshops I used to get students to question the idea that websites are &#39;published&#39;. They&#39;re not released into the world in a fixed, immutable form – they&#39;re a set of blueprints which only reach their final form in your browser window. This makes it possible to change the way websites look and behave.

Mozilla used to have a nifty educational tool called X-Ray Goggles. Using it, you could explore the code underlying a web page and do fun things like inserting new text or images. I encouraged students to try hacking ASIO&#39;s home page.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/asio-eggplants.jpg&#34; width=&#34;600&#34; height=&#34;629&#34; alt=&#34;Old, modified screenshot of ASIO homepage with a section of a cartoon from First Dog On the Moon inserted.&#34;&gt;

*ASIO home page with some added First Dog on the Moon.*

There are other ways you can fiddle with websites. For example, most browsers have a developer console that exposes the code and styling of a page. You can use the console to edit HTML elements or toggle styles, but your changes won&#39;t be saved.

One way you can save and share your website customisations is by creating userscripts. Userscripts are little bits of Javascript code that run in your browser after a web page loads. These scripts can change many aspects of a page – not just how it looks, but also how it works.

## Some old userscripts

I&#39;ve been playing around with userscripts for a long time. Back in 2008, I created a userscript that [completely overhauled the way that digital files were presented](https://discontents.com.au/shoebox/archives-shoebox/archives-in-3d.html) in the National Archives of Australia&#39;s online database, RecordSearch. My userscript added new options for navigating and printing the file, and even made it possible to view the complete file contents on a 3D zoomable wall.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/userscript-screenshot1.jpg&#34; width=&#34;600&#34; height=&#34;577&#34; alt=&#34;Screenshot of a digitised file in RecordSearch showing the features added by the userscript.&#34;&gt;

*This customised RecordSearch interface was created by a userscript.*

The userscripts I&#39;ve created over the years have tended to either be useful little hacks aimed at fixing annoying aspects of GLAM websites, or experiments in thinking about the sort of information that&#39;s presented online by GLAM organisations, and how it might be different.

In the first category are hacks like my [RecordSearch show pages userscript](https://gist.github.com/wragge/b2af9dc56f7cb0a9476b). In 2009, I got annoyed that there was no way of knowing how many pages were in a digitised file until you clicked on the link. [So I fixed it.](https://discontents.com.au/doing-it-yourself/index.html) With my userscript running, the links to digitised files are rewritten to display the number of pages. I&#39;ve updated the code numerous times over the years, adding new features, and dealing with changes to RecordSearch. The last update was just a few days ago.

In the second category is my userscript that inserts photos from [The Real Face of White Australia](https://www.realfaceofwhiteaustralia.net/) into RecordSearch. There are many thousands of records in the National Archives of Australia that document the impact of the White Australia Policy on the lives of ordinary people. But it&#39;s often hard to understand this from the file descriptions. The userscript displays portrait images extracted from the files alongside the metadata – it tells you there are [people inside](https://doi.org/10.5281/zenodo.3579530).

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/people-inside-list.gif&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;Animated gif showing how the userscript changes the display of a list of files in RecordSearch by adding pictures of people.&#34;&gt;

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/people-inside-item.gif&#34; width=&#34;600&#34; height=&#34;412&#34; alt=&#34;Animated gif showing how the userscript changes the display of an individual files in RecordSearch by adding pictures of the people inside.&#34;&gt;

Amidst [my recent self-archiving binge](https://updates.timsherratt.org/2025/07/09/the-rebirth-of-wragge-labs.html), I realised I&#39;d never updated this userscript to work with the latest data from The Real Face of White Australia, so I spent some time getting it working again. In the process I realised that RecordSearch now included content security policies that made it a bit harder to insert new images. The solution was to use one of the special userscript functions, [GM_addElement()](https://www.tampermonkey.net/documentation.php?locale=en#api:GM_addElement), rather than plain old Javascript. But then I discovered that if the show pages userscript ran after this one, it would trigger the security restrictions nonetheless! To avoid this I made sure that the two userscripts operated on separate elements. So now the [show people userscript](https://gist.github.com/wragge/2941e473ee70152f4de7) is working again!

## And a new userscript to improve Trove lists

Fixing up the &#39;people inside&#39; code reminded me of how much fun it was playing around with userscripts, so when David Coombe mentioned on Mastodon last night a problem he was having with Trove lists, I had to have a go at fixing it.

The problem is that Trove lists display all the tags associated with each individual item. Some items have lots of tags, so this eats up the screen real estate, making it harder to browse the contents of a list. Notes attached to items can be hidden, but not tags. Why not?

My [brand new userscript](https://gist.github.com/wragge/ab6a9d6b612bee6bc4d98658e947c420) hides tags by default, and adds a new link to toggle their visibility for each individual item. The link also displays the number of tags attached to each item. This gives the user control over which tags are displayed and when.

&lt;video src=&#34;https://cdn.uploads.micro.blog/8371/2025/simplescreenrecorder-2025-07-17-12.53.06.mp4&#34; poster=&#34;https://updates.timsherratt.org/uploads/2025/poster.png&#34; controls=&#34;controls&#34; preload=&#34;metadata&#34; width=&#34;600px&#34;&gt;&lt;/video&gt;

*The new userscript in action – toggle your tags!*

The main difficulty in creating this userscript was knowing when the page had actually finished loading. The current version of Trove uses a lot of Javascript to load and manipulate content, so you have to tell the userscript to wait until everything has settled down. Otherwise the script could fire too soon and cause unexpected results. I tried a number of different approaches to handling this problem, but eventually settled on the [waitForKeyElements script](https://gist.github.com/BrockA/2625891). (I just realised there&#39;s a [more recent version](https://github.com/CoeJoder/waitForKeyElements.js) of this script that doesn&#39;t require jQuery, so I might need to investigate this further.)

Another Trove problem fixed!

## Using userscripts

In addition to the userscripts mentioned above, I&#39;ve also created one that [enables you to browse Trove newspaper pages using the arrows on your keyboard](https://gist.github.com/wragge/af8bd20a14005d267ffc759463bd832c). Left and right arrows go to the next and previous pages, while up and down arrows jump between issues. Searching is great, but sometimes you just want to browse. Install this userscript for that old-time, authentic newspaper reading experience!

But how do you install userscripts? First of all you need a browser extension to manage your userscripts – I use [TamperMonkey](https://www.tampermonkey.net/) or [ViolentMonkey](http://violentmonkey.com/). Just follow the instructions to add one of them to your browser.

To install one of my userscripts, you need to go to the script (saved as a GitHub Gist) and click on the &#39;Raw&#39; button. Your userscript manager will then ask you if you want to add the userscript. Click install!

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-07-17-16-59-11.png&#34; width=&#34;600&#34; height=&#34;244&#34; alt=&#34;&#34;&gt;

*Click on the &#39;Raw&#39; button to install.*

Once they&#39;re installed the userscripts will run automatically when specified pages are loaded. If you ever want to disable them, you can do that from your userscript manager&#39;s dashboard.

For convenience, here are the Gist links to all the userscripts I&#39;ve mentioned:

- [RecordSearch show pages](https://gist.github.com/wragge/b2af9dc56f7cb0a9476b)
- [RecordSearch show people](https://gist.github.com/wragge/2941e473ee70152f4de7)
- [Trove lists hide tags](https://gist.github.com/wragge/ab6a9d6b612bee6bc4d98658e947c420)
- [Trove newspapers keyboard navigation](https://gist.github.com/wragge/af8bd20a14005d267ffc759463bd832c)

As with anything you install on your computer, you want to make sure that you trust the source of any userscripts you add.
</source:markdown>
    </item>
    
    <item>
      <title>The rebirth of Wragge Labs (and moving my Heroku apps)</title>
      <link>https://updates.timsherratt.org/2025/07/09/the-rebirth-of-wragge-labs.html</link>
      <pubDate>Wed, 09 Jul 2025 17:48:23 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/07/09/the-rebirth-of-wragge-labs.html</guid>
      <description>&lt;p&gt;It looks like some paid work I was counting on won&amp;rsquo;t be going ahead, so I&amp;rsquo;m trying to save a bit of money on cloud hosting. As I previously noted, this resulted in &lt;a href=&#34;https://updates.timsherratt.org/2025/07/02/the-future-of-the-past.html&#34;&gt;the resurrection of &lt;em&gt;The future of the past&lt;/em&gt;&lt;/a&gt;, but I&amp;rsquo;ve also been continuing to slog away at migrating all my old Flask apps and experiments from Heroku to a single Digital Ocean droplet. As of today, I&amp;rsquo;ve migrated 11 apps. Here&amp;rsquo;s a few details&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;a-new-old-home&#34;&gt;A new (old) home&lt;/h2&gt;
&lt;p&gt;The first thing I had to figure out was how to group together a series of individual &lt;a href=&#34;https://flask.palletsprojects.com/en/stable/&#34;&gt;Flask&lt;/a&gt; apps so I could easily run and maintain them on a single server, without making major changes to the apps themselves. I decided to go with the &lt;a href=&#34;https://flask.palletsprojects.com/en/stable/patterns/appdispatch/&#34;&gt;application dispatching pattern&lt;/a&gt; described in the Flask documentation. This groups the apps within a single Python environment, so I had to do some alignment of Python versions and packages, but it wasn&amp;rsquo;t too hard, and having just one virtual environment to manage seems a lot easier in the long run.&lt;/p&gt;
&lt;p&gt;The application dispatching pattern configures the server to run one application at the web root (&#39;/&#39;), with the other apps assigned individual sub-paths. This raised the question, what did I want sitting at the root address? Rather than selecting an existing application for the prime slot, I decided to take the opportunity to build a showcase that included details of many of the things I&amp;rsquo;ve created over the years.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/wraggelabs.png&#34; width=&#34;600&#34; height=&#34;720&#34; alt=&#34;Screenshot of the original Wragge Labs&#34;&gt;
&lt;p&gt;&lt;em&gt;The old Wragge Labs (circa 2012)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I also needed a new domain name. Or did I? Back in the old days, I had a site where I shared many of my tools and experiments – Wragge Labs. In the intervening years, I&amp;rsquo;d moved or migrated much of the content away and pointed the wraggelabs.com domain to my main site at timsherratt.au. But this seemed like a good opportunity to resurrect it. So if you&amp;rsquo;d like to have a play around with some of the things I&amp;rsquo;ve created over the last 30 years, head along to the all new &lt;a href=&#34;https://wraggelabs.com&#34;&gt;wraggelabs.com&lt;/a&gt;!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/new-wraggelabs.png&#34; width=&#34;600&#34; height=&#34;569&#34; alt=&#34;Screenshot of part of the new Wragge Labs!&#34;&gt;
&lt;p&gt;&lt;em&gt;The new &lt;a href=&#34;https://wraggelabs.com&#34;&gt;Wragge Labs&lt;/a&gt; showcases websites, apps, and experiments from the past 30 years – some useful, some playful, and some creepy&amp;hellip;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A surprising number of the things I&amp;rsquo;ve built are still working. As I was compiling my list, I made a few running repairs – for example, fixing broken links in &lt;a href=&#34;https://timsherratt.au/shed/culturevic/&#34;&gt;Linking history in place&lt;/a&gt; and &lt;a href=&#34;https://timsherratt.au/shed/magicsquares/&#34;&gt;Magic Squares&lt;/a&gt; to get them working again. However, some things only exist now in web archives, and others have been broken by the recent actions of the &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;NLA&lt;/a&gt; and &lt;a href=&#34;https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html&#34;&gt;NAA&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To provide a bit of extra context, I&amp;rsquo;ve grouped together publications and presentations documenting many of the experiments. These are saved in &lt;a href=&#34;https://www.zotero.org/&#34;&gt;Zotero&lt;/a&gt;, and tagged with the name of the app. When you click on a &amp;lsquo;Related&amp;rsquo; link, the details of any linked resources are retrieved using the Zotero API and displayed on a new page. This means I can add new related resources simply by dropping them into Zotero.&lt;/p&gt;
&lt;h2 id=&#34;moving-house&#34;&gt;Moving house&lt;/h2&gt;
&lt;p&gt;The process for moving the apps to their new home was pretty straightforward. On my local machine, I copied the code into the new aggregated structure, added any packages needed into a combined requirements file, and created a new top-level app to direct requests. And then I spun everything up and started fixing bugs&amp;hellip;&lt;/p&gt;
&lt;p&gt;All of the problems were easily resolved. Most involved fixing up paths, either to static assets or in navigation links. The only significant changes to the Python code were caused by the deprecation of the &lt;code&gt;.count()&lt;/code&gt; method in PyMongo.&lt;/p&gt;
&lt;p&gt;To make life a little harder, I decided to take the opportunity to make sure that all the assets – Javascript, CSS, and font files – were loaded from the local system, and not sitting in the cloud. Having everything local should make it easier to maintain the apps in the long term. It was a bit fiddly tracking down where everything was being loaded from, but not too hard.&lt;/p&gt;
&lt;p&gt;The only other changes I made were to add some caching to most of the apps, particularly those that make calls to external databases or APIs. I used &lt;a href=&#34;https://flask-caching.readthedocs.io/en/latest/index.html&#34;&gt;Flask-Caching&lt;/a&gt; with the local file system backend.&lt;/p&gt;
&lt;p&gt;To get the new aggregated application working on a Digital Ocean droplet, I followed the instructions on how to &lt;a href=&#34;https://www.digitalocean.com/community/tutorials/how-to-serve-flask-applications-with-uwsgi-and-nginx-on-ubuntu-22-04&#34;&gt;serve Flask applications using uWSGI and Nginx&lt;/a&gt;. I think the only thing I did differently was to use &lt;a href=&#34;https://github.com/pyenv/pyenv&#34;&gt;pyenv&lt;/a&gt; to manage Python versions and the virtual environment. To update the app, I use &lt;code&gt;rsync&lt;/code&gt; to copy across the code and &lt;code&gt;systemctl&lt;/code&gt; to restart it. So far it&amp;rsquo;s all working pretty smoothly.&lt;/p&gt;
&lt;h2 id=&#34;redirecting-heroku&#34;&gt;Redirecting Heroku&lt;/h2&gt;
&lt;p&gt;Once the apps were happy in their new home, I needed to redirect the Heroku addresses to &lt;a href=&#34;https://wraggelabs.com&#34;&gt;wraggelabs.com&lt;/a&gt;. It was surprisingly hard to find good documentation on how to do this, so I&amp;rsquo;ll document my steps in detail in case it&amp;rsquo;s of use to others.&lt;/p&gt;
&lt;p&gt;There are a few redirect apps for Heroku around, but I decided to use &lt;a href=&#34;https://github.com/fastmonkeys/heroku-redirect&#34;&gt;heroku-redirect&lt;/a&gt; because it basically just configures and runs Nginx without any additional processing. First I cloned &lt;code&gt;heroku-redirect&lt;/code&gt; to my local system, and then for each app I wanted to migrate I followed these steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;cd into the &lt;code&gt;heroku-redirect&lt;/code&gt; directory&lt;/li&gt;
&lt;li&gt;set the git remote for the app you want to redirect: &lt;code&gt;heroku git:remote -a [app name]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;if you&amp;rsquo;re not using the latest Heroku stack, update it: &lt;code&gt;heroku stack:set heroku-24 -a [app name]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;set the url you want to redirect to (without trailing slash): &lt;code&gt;heroku config:set LOCATION=[new url]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;to add the full path to the redirected url: &lt;code&gt;heroku config:set PRESERVE_PATH=true&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;I found I had to remove the Python buildpack from the app before updating. I did this using the Heroku dashboard, but no doubt there&amp;rsquo;s also a CLI command&lt;/li&gt;
&lt;li&gt;I also used the dashboard to add a new nginx buildpack: &lt;code&gt;https://github.com/heroku/heroku-buildpack-nginx.git&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;finally I pushed the new app, using &lt;code&gt;--force&lt;/code&gt; to replace it completely: &lt;code&gt;git push --force heroku master:main&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, you&amp;rsquo;ll need to have the Heroku CLI installed.&lt;/p&gt;
&lt;p&gt;A number of my Heroku apps were using basic dynos (which cost US$7 a month), so once they were redirected, I changed them to use shared eco dynos. Yay – money saved! Hopefully, the redirects won&amp;rsquo;t push the eco dynos beyond their monthly limit.&lt;/p&gt;
&lt;h2 id=&#34;more-experiments-to-come&#34;&gt;More experiments to come?&lt;/h2&gt;
&lt;p&gt;One of the good things about all of this housekeeping is that it&amp;rsquo;s got me thinking about new experiments. I used to love Flask and Heroku because they made it so easy to build and share things. Now I can do the same with &lt;a href=&#34;https://wraggelabs.com&#34;&gt;wraggelabs.com&lt;/a&gt;!&lt;/p&gt;
</description>
      <source:markdown>It looks like some paid work I was counting on won&#39;t be going ahead, so I&#39;m trying to save a bit of money on cloud hosting. As I previously noted, this resulted in [the resurrection of *The future of the past*](https://updates.timsherratt.org/2025/07/02/the-future-of-the-past.html), but I&#39;ve also been continuing to slog away at migrating all my old Flask apps and experiments from Heroku to a single Digital Ocean droplet. As of today, I&#39;ve migrated 11 apps. Here&#39;s a few details...
## A new (old) home

The first thing I had to figure out was how to group together a series of individual [Flask](https://flask.palletsprojects.com/en/stable/) apps so I could easily run and maintain them on a single server, without making major changes to the apps themselves. I decided to go with the [application dispatching pattern](https://flask.palletsprojects.com/en/stable/patterns/appdispatch/) described in the Flask documentation. This groups the apps within a single Python environment, so I had to do some alignment of Python versions and packages, but it wasn&#39;t too hard, and having just one virtual environment to manage seems a lot easier in the long run.
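
As a minimal sketch (the imported app names here are invented, not my actual ones), the dispatching pattern boils down to a few lines of Werkzeug middleware:

```python
# A minimal sketch of the application dispatching pattern. The imported
# module names are hypothetical stand-ins for the individual Flask apps.
from werkzeug.middleware.dispatcher import DispatcherMiddleware

from showcase import app as showcase    # the app served at the web root
from fotp import app as fotp            # individual apps get sub-paths
from headlines import app as headlines

application = DispatcherMiddleware(showcase, {
    &#34;/fotp&#34;: fotp,
    &#34;/headlines&#34;: headlines,
})
```

The web server (uWSGI in my case – see below) is then pointed at the combined `application` object rather than at any single app.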

The application dispatching pattern configures the server to run one application at the web root (&#39;/&#39;), with the other apps assigned individual sub-paths. This raised the question, what did I want sitting at the root address? Rather than selecting an existing application for the prime slot, I decided to take the opportunity to build a showcase that included details of many of the things I&#39;ve created over the years.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/wraggelabs.png&#34; width=&#34;600&#34; height=&#34;720&#34; alt=&#34;Screenshot of the original Wragge Labs&#34;&gt;

*The old Wragge Labs (circa 2012)*

I also needed a new domain name. Or did I? Back in the old days, I had a site where I shared many of my tools and experiments – Wragge Labs. In the intervening years, I&#39;d moved or migrated much of the content away and pointed the wraggelabs.com domain to my main site at timsherratt.au. But this seemed like a good opportunity to resurrect it. So if you&#39;d like to have a play around with some of the things I&#39;ve created over the last 30 years, head along to the all new [wraggelabs.com](https://wraggelabs.com)!

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/new-wraggelabs.png&#34; width=&#34;600&#34; height=&#34;569&#34; alt=&#34;Screenshot of part of the new Wragge Labs!&#34;&gt;

*The new [Wragge Labs](https://wraggelabs.com) showcases websites, apps, and experiments from the past 30 years – some useful, some playful, and some creepy...*

A surprising number of the things I&#39;ve built are still working. As I was compiling my list, I made a few running repairs – for example, fixing broken links in [Linking history in place](https://timsherratt.au/shed/culturevic/) and [Magic Squares](https://timsherratt.au/shed/magicsquares/) to get them working again. However, some things only exist now in web archives, and others have been broken by the recent actions of the [NLA](https://updates.timsherratt.org/2025/05/07/farewell-trove.html) and [NAA](https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html).

To provide a bit of extra context, I&#39;ve grouped together publications and presentations documenting many of the experiments. These are saved in [Zotero](https://www.zotero.org/), and tagged with the name of the app. When you click on a &#39;Related&#39; link, the details of any linked resources are retrieved using the Zotero API and displayed on a new page. This means I can add new related resources simply by dropping them into Zotero.

## Moving house

The process for moving the apps to their new home was pretty straightforward. On my local machine, I copied the code into the new aggregated structure, added any packages needed into a combined requirements file, and created a new top-level app to direct requests. And then I spun everything up and started fixing bugs...

All of the problems were easily resolved. Most involved fixing up paths, either to static assets or in navigation links. The only significant changes to the Python code were caused by the deprecation of the `.count()` method in PyMongo.
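
If you&#39;re facing the same change, the fix looks something like this (the database and collection names are just examples):

```python
from pymongo import MongoClient

db = MongoClient()[&#34;mydb&#34;]  # hypothetical database name

# Old style, removed in PyMongo 4:
#   total = db.articles.find({&#34;year&#34;: 1913}).count()
# Replacement:
total = db.articles.count_documents({&#34;year&#34;: 1913})
```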

To make life a little harder, I decided to take the opportunity to make sure that all the assets – Javascript, CSS, and font files – were loaded from the local system, and not sitting in the cloud. Having everything local should make it easier to maintain the apps in the long term. It was a bit fiddly tracking down where everything was being loaded from, but not too hard.

The only other changes I made were to add some caching to most of the apps, particularly those that make calls to external databases or APIs. I used [Flask-Caching](https://flask-caching.readthedocs.io/en/latest/index.html) with the local file system backend.
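
The setup is only a few lines – something like this sketch (the cache directory, timeout, and route are invented for illustration):

```python
from flask import Flask
from flask_caching import Cache

app = Flask(__name__)
cache = Cache(app, config={
    &#34;CACHE_TYPE&#34;: &#34;FileSystemCache&#34;,       # local file system backend
    &#34;CACHE_DIR&#34;: &#34;/var/cache/wraggelabs&#34;,  # hypothetical cache directory
    &#34;CACHE_DEFAULT_TIMEOUT&#34;: 86400,        # keep results for a day
})

@app.route(&#34;/record/&lt;record_id&gt;&#34;)
@cache.cached()  # responses are cached per request path
def record(record_id):
    # ...call the slow external API here; the result is reused for a day
    return {&#34;id&#34;: record_id}
```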

To get the new aggregated application working on a Digital Ocean droplet, I followed the instructions on how to [serve Flask applications using uWSGI and Nginx](https://www.digitalocean.com/community/tutorials/how-to-serve-flask-applications-with-uwsgi-and-nginx-on-ubuntu-22-04). I think the only thing I did differently was to use [pyenv](https://github.com/pyenv/pyenv) to manage Python versions and the virtual environment. To update the app, I use `rsync` to copy across the code and `systemctl` to restart it. So far it&#39;s all working pretty smoothly.

## Redirecting Heroku

Once the apps were happy in their new home, I needed to redirect the Heroku addresses to [wraggelabs.com](https://wraggelabs.com). It was surprisingly hard to find good documentation on how to do this, so I&#39;ll document my steps in detail in case it&#39;s of use to others.

There are a few redirect apps for Heroku around, but I decided to use [heroku-redirect](https://github.com/fastmonkeys/heroku-redirect) because it basically just configures and runs Nginx without any additional processing. First I cloned `heroku-redirect` to my local system, and then for each app I wanted to migrate I followed these steps:

- cd into the `heroku-redirect` directory
- set the git remote for the app you want to redirect: `heroku git:remote -a [app name]`
- if you&#39;re not using the latest Heroku stack, update it: `heroku stack:set heroku-24 -a [app name]`
- set the url you want to redirect to (without trailing slash): `heroku config:set LOCATION=[new url]`
- to add the full path to the redirected url: `heroku config:set PRESERVE_PATH=true`
- I found I had to remove the Python buildpack from the app before updating. I did this using the Heroku dashboard, but no doubt there&#39;s also a CLI command
- I also used the dashboard to add a new nginx buildpack: `https://github.com/heroku/heroku-buildpack-nginx.git`
- finally I pushed the new app, using `--force` to replace it completely: `git push --force heroku master:main`

Of course, you&#39;ll need to have the Heroku CLI installed.

A number of my Heroku apps were using basic dynos (which cost US$7 a month), so once they were redirected, I changed them to use shared eco dynos. Yay – money saved! Hopefully, the redirects won&#39;t push the eco dynos beyond their monthly limit.

## More experiments to come?

One of the good things about all of this housekeeping is that it&#39;s got me thinking about new experiments. I used to love Flask and Heroku because they made it so easy to build and share things. Now I can do the same with [wraggelabs.com](https://wraggelabs.com)!









</source:markdown>
    </item>
    
    <item>
      <title>The future of the past... in the present</title>
      <link>https://updates.timsherratt.org/2025/07/02/the-future-of-the-past.html</link>
      <pubDate>Wed, 02 Jul 2025 13:26:45 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/07/02/the-future-of-the-past.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been on a bit of a self-archiving binge lately. It started because I needed to cut back some of my web hosting costs, and was looking at ways of bringing together a group of separately hosted Heroku apps onto a single Digital Ocean droplet. While taking stock of my various apps and experiments, I remembered there were some that hadn&amp;rsquo;t survived earlier migrations – in particular, &lt;em&gt;the future of the past&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The future of the past&lt;/em&gt; was a weird little app built on top of a collection of 40,000 newspaper articles, harvested from Trove, that included the phrase &amp;lsquo;the future&amp;rsquo;. I created it as part of my Harold White Fellowship at the National Library of Australia in 2012, and told the story of its genesis in &lt;a href=&#34;https://updates.timsherratt.org/2025/06/30/mining-for-meanings.html&#34;&gt;my fellowship lecture&lt;/a&gt;. In short, I extracted words with the highest TF-IDF values for each year in my dataset, and fell in love with them. The word groupings were so odd and evocative that I felt I had to find some way of sharing them.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The future of the past&lt;/em&gt; made words the primary means of navigating the collection of newspaper articles. At first you were presented with a random selection of words, sized according to their TF-IDF values. When you clicked on a word, you limited the results to years in which that word appeared. You kept clicking words until only one year matched. Then you were shown a random selection of words from that year, along with the words you&amp;rsquo;d followed to get to that point. Once you&amp;rsquo;d arrived at a year, you could click on words to display the content of articles that contained that word. But you could also make poetry.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/fotp.png&#34; width=&#34;600&#34; height=&#34;347&#34; alt=&#34;Screen capture from the original instance of the future of the past, showing a jumble of words of different sizes in light coloured rectangles. The caption invites users to &#39;choose a word...&#39;.&#34;&gt;
&lt;p&gt;&lt;em&gt;The future of the past&lt;/em&gt; gestured towards fridge magnet poetry in its design and odd jumble of words. And when you finally landed on a single year, you could create &lt;em&gt;your own poems&lt;/em&gt; by dragging words into the box at the bottom of the screen. Once you were happy with your poem you could share it on Twitter. And people did.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screen-shot-2012-10-11-at-7.20.13-pm.png&#34; width=&#34;537&#34; height=&#34;458&#34; alt=&#34;Examples of poems created by Bethany Nowviskie and shared on Twitter.&#34;&gt;
&lt;p&gt;The most exciting and enjoyable part of the project was watching people create and share their poems. &lt;em&gt;The future of the past&lt;/em&gt; even managed to win the &amp;lsquo;Best use of DH for fun&amp;rsquo; award in the 2012 DH Awards. As I said in my &lt;a href=&#34;https://updates.timsherratt.org/2025/06/30/mining-for-meanings.html&#34;&gt;Harold White lecture&lt;/a&gt;, I wasn&amp;rsquo;t really sure what &lt;em&gt;the future of the past&lt;/em&gt; was – a discovery interface? a game? a piece of art? But I suppose that&amp;rsquo;s one of the reasons why I liked it so much.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/fridge-poetry.png&#34; width=&#34;565&#34; height=&#34;746&#34; alt=&#34;More examples of poems created using future of the past and shared on Twitter.&#34;&gt;
&lt;p&gt;When I built the app, I was going through a stage of putting everything in Django. Only later did I realise that Flask was a much more suitable framework for the sort of small, experimental apps I was creating. Django was overkill, and the maintenance demands coupled with hosting issues made it difficult to keep things alive. At some point, &lt;em&gt;the future of the past&lt;/em&gt; went dark and it just seemed too hard to get it going again&amp;hellip;&lt;/p&gt;
&lt;p&gt;But last week I had another look, and decided I could resurrect the app in a more maintenance-friendly form by converting it from Django to Flask, and migrating the data from MySQL to SQLite. Django and Flask are both Python frameworks, so it was mainly a matter of unpacking all the logic in Django&amp;rsquo;s views, models, and handlers and consolidating it into a couple of simple Flask functions. Fortunately, I managed to find an SQL dump of the original database in the backed-up downloads folder of an old laptop. It took a bit of fiddling, but I got the dumped data loaded into SQLite without too many problems.&lt;/p&gt;
&lt;p&gt;I also realised I could use the new database to fix up another app I created during my fellowship – &lt;a href=&#34;https://timsherratt.au/shed/presentations/nla/pages/frequencies.html&#34;&gt;a word frequency browser&lt;/a&gt;. It&amp;rsquo;s just a static HTML page, so I added a couple of JSON APIs to the Flask app so it could access the data.&lt;/p&gt;
&lt;p&gt;Both Django and Flask use Jinja2 templates, so I didn&amp;rsquo;t have to do anything much to the interface of &lt;em&gt;the future of the past&lt;/em&gt;. I made sure that all the assets (fonts and javascript) were being loaded from local copies to avoid any future problems and, of course, I had to replace the Twitter integration. I decided to add options to share poems on both Mastodon and Bluesky. Mastodon was a little tricky because you need to know a user&amp;rsquo;s instance before you can post their toot. There are a number of solutions available, but I went with the &lt;a href=&#34;https://github.com/autinerd/simple-mastodon-share-button&#34;&gt;pattern documented in this GitHub repository&lt;/a&gt;. It&amp;rsquo;s a little clunky because you need to enter your instance name each time you post, and you might also have to allow pop-ups for it to work properly, but it seems to do the job. I did think about updating some other aspects of the interface, but decided to preserve it in its original 2012 grey-toned glory.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;So &lt;a href=&#34;https://wraggelabs.com/fotp/&#34;&gt;the future of the past&lt;/a&gt; lives again in the present!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://wraggelabs.com/fotp/&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-07-02-11-45-36.png&#34; width=&#34;600&#34; height=&#34;419&#34; alt=&#34;Screenshot of the current future of the past interface, including options to share poems on Mastodon and Bluesky.&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;https://wraggelabs.com/fotp/&#34;&gt;Create your own poems!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
</description>
      <source:markdown>I&#39;ve been on a bit of a self-archiving binge lately. It started because I needed to cut back some of my web hosting costs, and was looking at ways of bringing together a group of separately hosted Heroku apps onto a single Digital Ocean droplet. While taking stock of my various apps and experiments, I remembered there were some that hadn&#39;t survived earlier migrations – in particular, *the future of the past*.

*The future of the past* was a weird little app built on top of a collection of 40,000 newspaper articles, harvested from Trove, that included the phrase &#39;the future&#39;. I created it as part of my Harold White Fellowship at the National Library of Australia in 2012, and told the story of its genesis in [my fellowship lecture](https://updates.timsherratt.org/2025/06/30/mining-for-meanings.html). In short, I extracted words with the highest TF-IDF values for each year in my dataset, and fell in love with them. The word groupings were so odd and evocative that I felt I had to find some way of sharing them.
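
If you want to play with the same idea today, here&#39;s a rough sketch of the TF-IDF step using scikit-learn (not what I used in 2012, and the sample texts are invented). Each year&#39;s articles are joined into a single document, so the highest-scoring words are the ones most distinctive of that year:

```python
# A rough sketch only: treat each year&#39;s harvested articles as one document
# and pull out the words with the highest TF-IDF scores for that year.
from sklearn.feature_extraction.text import TfidfVectorizer

texts_by_year = {  # invented samples – really the concatenated articles per year
    1913: &#34;aviators circle the aerodrome while the empire watches&#34;,
    1927: &#34;wireless sets crackle with news of the future of flying&#34;,
    1945: &#34;atomic questions hang over the future of mankind&#34;,
}
years = sorted(texts_by_year)

vectoriser = TfidfVectorizer(stop_words=&#34;english&#34;)
matrix = vectoriser.fit_transform([texts_by_year[year] for year in years])
words = vectoriser.get_feature_names_out()

for i, year in enumerate(years):
    scores = matrix[i].toarray().ravel()
    top = scores.argsort()[::-1][:5]  # the five highest-scoring words
    print(year, [words[j] for j in top])
```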

*The future of the past* made words the primary means of navigating the collection of newspaper articles. At first you were presented with a random selection of words, sized according to their TF-IDF values. When you clicked on a word, you limited the results to years in which that word appeared. You kept clicking words until only one year matched. Then you were shown a random selection of words from that year, along with the words you&#39;d followed to get to that point. Once you&#39;d arrived at a year, you could click on words to display the content of articles that contained that word. But you could also make poetry.
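
The navigation logic itself is very simple – in Python it would look something like this toy version (the word-to-year index is made up):

```python
# A toy illustration of the interface logic: each clicked word intersects
# the set of matching years until only one year remains.
word_years = {  # made-up index of words to the years they score highly in
    &#34;aerodrome&#34;: {1913, 1927},
    &#34;wireless&#34;: {1927, 1938},
    &#34;atomic&#34;: {1945, 1949},
}

def narrow(clicked_words):
    years = None
    for word in clicked_words:
        matches = word_years[word]
        years = matches if years is None else years &amp; matches
    return years

print(narrow([&#34;aerodrome&#34;, &#34;wireless&#34;]))  # {1927} – one year left
```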

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/fotp.png&#34; width=&#34;600&#34; height=&#34;347&#34; alt=&#34;Screen capture from the original instance of the future of the past, showing a jumble of words of different sizes in light coloured rectangles. The caption invites users to &#39;choose a word...&#39;.&#34;&gt;

*The future of the past* gestured towards fridge magnet poetry in its design and odd jumble of words. And when you finally landed on a single year, you could create *your own poems* by dragging words into the box at the bottom of the screen. Once you were happy with your poem you could share it on Twitter. And people did.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screen-shot-2012-10-11-at-7.20.13-pm.png&#34; width=&#34;537&#34; height=&#34;458&#34; alt=&#34;Examples of poems created by Bethany Nowviskie and shared on Twitter.&#34;&gt;

The most exciting and enjoyable part of the project was watching people create and share their poems. *The future of the past* even managed to win the &#39;Best use of DH for fun&#39; award in the 2012 DH Awards. As I said in my [Harold White lecture](https://updates.timsherratt.org/2025/06/30/mining-for-meanings.html), I wasn&#39;t really sure what *the future of the past* was – a discovery interface? a game? a piece of art? But I suppose that&#39;s one of the reasons why I liked it so much.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/fridge-poetry.png&#34; width=&#34;565&#34; height=&#34;746&#34; alt=&#34;More examples of poems created using future of the past and shared on Twitter.&#34;&gt;

When I built the app, I was going through a stage of putting everything in Django. Only later did I realise that Flask was a much more suitable framework for the sort of small, experimental apps I was creating. Django was overkill, and the maintenance demands coupled with hosting issues made it difficult to keep things alive. At some point, *the future of the past* went dark and it just seemed too hard to get it going again...

But last week I had another look, and decided I could resurrect the app in a more maintenance-friendly form by converting it from Django to Flask, and migrating the data from MySQL to SQLite. Django and Flask are both Python frameworks, so it was mainly a matter of unpacking all the logic in Django&#39;s views, models, and handlers and consolidating it into a couple of simple Flask functions. Fortunately, I managed to find an SQL dump of the original database in the backed-up downloads folder of an old laptop. It took a bit of fiddling, but I got the dumped data loaded into SQLite without too many problems.
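
Most of the fiddling was stripping MySQL-specific syntax out of the dump before replaying it. Very roughly (the file names are invented, and real dumps always need a few extra substitutions):

```python
# A rough sketch of the conversion: clean MySQL-only syntax from the dump,
# then replay the remaining SQL into SQLite.
import re
import sqlite3

with open(&#34;fotp-mysql-dump.sql&#34;) as f:  # hypothetical dump file
    sql = f.read()

sql = re.sub(r&#34;\)\s*ENGINE=[^;]*;&#34;, &#34;);&#34;, sql)  # drop MySQL table options
sql = re.sub(r&#34;^(LOCK|UNLOCK) TABLES.*$&#34;, &#34;&#34;, sql, flags=re.M)
sql = sql.replace(&#34;\\&#39;&#34;, &#34;&#39;&#39;&#34;)  # MySQL escapes quotes differently to SQLite

con = sqlite3.connect(&#34;fotp.db&#34;)
con.executescript(sql)
con.commit()
```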

I also realised I could use the new database to fix up another app I created during my fellowship – [a word frequency browser](https://timsherratt.au/shed/presentations/nla/pages/frequencies.html). It&#39;s just a static HTML page, so I added a couple of JSON APIs to the Flask app so it could access the data.
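
The endpoints are nothing fancy – something along these lines (the route, table, and column names are invented, not the actual API):

```python
# A sketch of a simple JSON endpoint over the SQLite data. Route, table,
# and column names are invented for illustration.
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)

@app.route(&#34;/api/frequencies/&lt;word&gt;&#34;)
def frequencies(word):
    con = sqlite3.connect(&#34;fotp.db&#34;)
    rows = con.execute(
        &#34;SELECT year, freq FROM word_frequencies WHERE word = ?&#34;, (word,)
    ).fetchall()
    con.close()
    return jsonify({&#34;word&#34;: word, &#34;years&#34;: {str(y): f for y, f in rows}})
```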

Both Django and Flask use Jinja2 templates, so I didn&#39;t have to do anything much to the interface of *the future of the past*. I made sure that all the assets (fonts and javascript) were being loaded from local copies to avoid any future problems and, of course, I had to replace the Twitter integration. I decided to add options to share poems on both Mastodon and Bluesky. Mastodon was a little tricky because you need to know a user&#39;s instance before you can post their toot. There are a number of solutions available, but I went with the [pattern documented in this GitHub repository](https://github.com/autinerd/simple-mastodon-share-button). It&#39;s a little clunky because you need to enter your instance name each time you post, and you might also have to allow pop-ups for it to work properly, but it seems to do the job. I did think about updating some other aspects of the interface, but decided to preserve it in its original 2012 grey-toned glory.

**So [the future of the past](https://wraggelabs.com/fotp/) lives again in the present!** 

&lt;a href=&#34;https://wraggelabs.com/fotp/&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-07-02-11-45-36.png&#34; width=&#34;600&#34; height=&#34;419&#34; alt=&#34;Screenshot of the current future of the past interface, including options to share poems on Mastodon and Bluesky.&#34;&gt;&lt;/a&gt;

**[Create your own poems!](https://wraggelabs.com/fotp/)**


</source:markdown>
    </item>
    
    <item>
      <title>Mining for meanings</title>
      <link>https://updates.timsherratt.org/2025/06/30/mining-for-meanings.html</link>
      <pubDate>Mon, 30 Jun 2025 18:36:42 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/06/30/mining-for-meanings.html</guid>
      <description>&lt;p&gt;&lt;em&gt;In 2012, I was lucky enough to be awarded a Harold White Fellowship by the National Library of Australia. I used my time to explore ways of using Trove&amp;rsquo;s digitised newspapers as data, and presented my work at a public lecture in May 2012. I spoke from notes and never got round to writing it all up. The recording made by the NLA has disappeared from their website, but is &lt;a href=&#34;https://web.archive.org/web/20140212200542/http://www.nla.gov.au/podcasts/media/Harold-White/tim-sherratt.mp3&#34;&gt;still available in the Internet Archive&lt;/a&gt;. The text below is a transcription of the recording made in June 2025 with some minor editing.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;You can also listen to the audio, &lt;a href=&#34;https://timsherratt.au/shed/presentations/nla/&#34;&gt;browse the full set of slides&lt;/a&gt;, or &lt;a href=&#34;https://doi.org/10.5281/zenodo.15771695&#34;&gt;download a PDF&lt;/a&gt; from Zenodo.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;audio src=&#34;https://cdn.uploads.micro.blog/8371/2025/tim-sherratt.mp3&#34; controls=&#34;controls&#34; preload=&#34;metadata&#34;&gt;&lt;/audio&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/harold-white-2.jpeg&#34; width=&#34;600&#34; height=&#34;430&#34; alt=&#34;&#34;&gt;
&lt;p&gt;&lt;em&gt;Photograph by Christopher Brothers, 2012, &lt;a href=&#34;https://nla.gov.au/nla.obj-132272018&#34;&gt;nla.gov.au/nla.obj-1&amp;hellip;&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;i-beyond-discovery&#34;&gt;I. Beyond discovery&lt;/h2&gt;
&lt;p&gt;Thanks, Marie-Louise, and thanks to the library for this great opportunity. And of course thanks to all of you for coming along on a night when I&amp;rsquo;m sure you&amp;rsquo;d rather be at home waiting for the budget speech. And this is the API working away here in the background. Okay, well, do I really need to introduce the newspaper database? I suspect I probably don&amp;rsquo;t for this sort of audience. You&amp;rsquo;re probably avid users of the digitized newspapers online. Are you? Yeah. I did my doctoral research back in the dark ages before Trove, and of course that meant spending many weeks, if not months, destroying my eyesight using microfilm readers. Using what are quite fragmentary printed indexes to try and find stuff which might be relevant to my study. But now, of course, there are more than 60 million newspaper articles online and, most importantly, the full text of these articles is searchable. It is something which we&amp;rsquo;re quite familiar with now, but it is something which is quite revolutionary in many ways.&lt;/p&gt;
&lt;p&gt;This unprecedented access to a vast volume of material which documents the ordinary lives of Australians is already changing historical practice. We can now go beyond the well-known events, the big stories and explore the small stories, the fragments, the glimpses of lives which might not otherwise be recorded, but this access comes with a cost. What happens when we do a search and instead of getting 10 results or 100 results, we get 10,000 results or 100,000 results? How do we start to use or understand that sort of thing? What do we do when instead of the clarity and excitement of discovery, we end up with the anxiety and confusion that can come with overwhelming abundance?&lt;/p&gt;
&lt;p&gt;Fortunately though, there are a growing number of digital tools which we can turn to. Tools and technologies which enable us to manage this deluge and to explore large volumes of text rather than single search results. Tools that enable us to zoom out of our search results and have a look at the big picture, to understand the trends and the patterns, to see what&amp;rsquo;s going on. For example, perhaps we might want to try and track events over time. Have a look: this graph shows the prevalence of the words drought and floods in the newspaper database over time.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-003.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;So we can actually look at that and use it as a way of mapping peaks to specific events. And we can see here, of course, this is the Federation drought at this point. We could also start to look for patterns that aren&amp;rsquo;t easy to see within your normal list of search results.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-004.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;I was interested in having a look at how the word decade might be used. So I searched for decade and I found, as you can see, that there are these nice regular peaks, and I was wondering: why have we got such regular peaks? And I did a bit more digging and I discovered why. That red line shows the usage of the word census. And you see here how the little peaks sit on top of each other?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-005.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;So obviously the time we talk most about decades is when a census has come out. So again, this is the sort of pattern which would be very hard to find in other ways, just working through our list of search results. We can also use these sorts of technologies for exploring changes in language, the way we talk about things, the labels we use.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-006.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;This is an example which I&amp;rsquo;ve taken from the National Library of New Zealand, searching through New Zealand newspapers in this case. But what they&amp;rsquo;ve done is to look at the change in usage of the name for the South Island, which apparently (I didn&amp;rsquo;t know this) was originally called Middle Island before changing to the South Island. And so you can see here this process of transition happening before South Island takes over completely.&lt;/p&gt;
&lt;p&gt;We can also challenge our expectations. Now, I was always of the belief that the traditional name for people from English cultural background of that chap who wears a red suit and comes around at Christmas time was Father Christmas. And then in recent years that has been supplanted by the sort of Americanized Santa Claus, but it seems I&amp;rsquo;m wrong.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-007.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;The red line here is Santa Claus, the blue line is Father Christmas. And so if we look here from the late 19th century to the early 20th century, Santa Claus is definitely winning. What&amp;rsquo;s interesting though, really interesting, is when we get the change over. Any guesses as to what&amp;rsquo;s going on there?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; Coke advertising.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Pardon?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; Coca-Cola advertising.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Well actually, I don&amp;rsquo;t know. My hypothesis is that this is around 1914, and it seems that over the war period Father Christmas starts to win out over Santa Claus. So whether it&amp;rsquo;s the Germanic sound of Santa Claus that causes it to lapse in popularity, or perhaps completely other circumstances, I mean this is pure hypothesis at this point, and it&amp;rsquo;s something which would be interesting to explore. But that&amp;rsquo;s the value of these sorts of things: they do allow you to ask some questions and prompt you to do some other sorts of investigation.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-008.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Now these graphs which I&amp;rsquo;m showing you were all created by a tool I developed called &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/querypic/&#34;&gt;QueryPic&lt;/a&gt;. And I won&amp;rsquo;t just show you the slide, we&amp;rsquo;ll actually use it. I want a word. Anybody give me a word?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; Brooch.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Broach?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; Because yours is nice.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;m not sure it&amp;rsquo;s going to show anything. Yeah, brooch.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; You know what we&amp;rsquo;ve done? &amp;lsquo;Automaton&amp;rsquo;, the one we talked about last week.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You have to spell it for me.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; A-U-T-O-M-A-T-O-M. Actually, correct. That&amp;rsquo;s what you&amp;hellip;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;No, no, no, this is right. Okay. So what this is doing now, it&amp;rsquo;s actually going off to the Trove API and it&amp;rsquo;s getting the total results. I mean it&amp;rsquo;s actually a very simple tool. All it&amp;rsquo;s doing is taking your query, searching for each year across the span of the newspaper database, getting the total number of results for each year, and then presenting them in the form of a graph. As I say, it&amp;rsquo;s very simple, but it&amp;rsquo;s also quite effective as you can see. And it&amp;rsquo;s useful and it&amp;rsquo;s also quite fun. And what it gives you is the ability to quickly explore a hunch, to get a sense of context, or to start exploring, to start framing a more specific research question without spending&amp;hellip; there we go&amp;hellip; without spending days searching or tabulating as you would normally have to do. So you can see how easy it is to use, and if you want to compare that to something else, you can just type in another word.&lt;/p&gt;
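&lt;p&gt;&lt;em&gt;[Editor&amp;rsquo;s note: as a rough illustration of what QueryPic does under the hood, here&amp;rsquo;s a minimal Python sketch. It assumes version 2 of the Trove API, which post-dates this talk; the placeholder key, the date-range query syntax, and the response layout should all be treated as assumptions to check against the current API documentation.]&lt;/em&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import requests

API_URL = &#34;https://api.trove.nla.gov.au/v2/result&#34;
API_KEY = &#34;YOUR_API_KEY&#34;  # placeholder

def totals_by_year(term, start, end):
    # For each year, ask Trove how many newspaper articles match the
    # term in that year -- we only want the total, so n=0.
    totals = {}
    for year in range(start, end + 1):
        params = {
            &#34;q&#34;: f&#34;{term} date:[{year} TO {year}]&#34;,
            &#34;zone&#34;: &#34;newspaper&#34;,
            &#34;encoding&#34;: &#34;json&#34;,
            &#34;n&#34;: 0,
            &#34;key&#34;: API_KEY,
        }
        data = requests.get(API_URL, params=params).json()
        totals[year] = int(data[&#34;response&#34;][&#34;zone&#34;][0][&#34;records&#34;][&#34;total&#34;])
    return totals
&lt;/code&gt;&lt;/pre&gt;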
&lt;p&gt;Okay, now there are obvious limitations to a tool like this. There&amp;rsquo;s a lot to unpack. I wouldn&amp;rsquo;t want to say that it&amp;rsquo;s evidence because there are so many assumptions built into the back end of it. Questions about what the search engine is actually giving you back, different usages of terms, obviously the contexts and things like the quality of the OCR itself. You know, there&amp;rsquo;s a whole lot of stuff. But despite all that, I think it is quite useful, as I said, in terms of allowing you to explore things quite quickly and to follow your hunches. I regard it as a starting point, not as an end.&lt;/p&gt;
&lt;p&gt;Now there are some folks&amp;hellip; let me see if it&amp;rsquo;s going to finish&amp;hellip; there are some folks who are a bit more confident about techniques such as this and who would suggest that not only can they provide evidence, but they can actually be used to develop mathematical representations of past behavior.&lt;/p&gt;
&lt;h2 id=&#34;ii-finding-formulas&#34;&gt;II. Finding formulas&lt;/h2&gt;
&lt;p&gt;You may have heard of the Culturomics project from Harvard University. These guys got access to the full corpus of Google&amp;rsquo;s digitized books. So 5 million books, the text of 5 million books. They pulled it all apart. They did a bit of cleaning up of the metadata, all sorts of stuff, and then they started searching it to see what they could pull out of it. And when they started searching, they noticed all sorts of patterns appearing, and they argued that these patterns could actually form the basis for what they said was a new science of culture, hence culturomics.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-010.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;There&amp;rsquo;s a lot I might say about that, but I just want to look at one example. Okay, this is an example which, in their published paper in Science, they called &amp;lsquo;We forget&amp;rsquo;, and I generated it using an online tool called the Ngram Viewer. You can go and do this yourself if you like. And what it&amp;rsquo;s showing, as you might be able to see, is the usage of years within the text. So 1883, 1910, 1950. It&amp;rsquo;s pulling out all the instances where those labels are used within the text. And there does obviously seem to be some sort of pattern. The researchers noticed that the graphs have a characteristic shape, a rapid ascent and then a decline. But they also noticed changes. The size of the peaks is changing over time, getting higher. They say that this is indicating a greater focus on the present. And the rate of decay is increasing, so that the peak is actually dropping away faster. And they say from this, we are forgetting our past faster with each passing year.&lt;/p&gt;
&lt;p&gt;I thought it would be interesting to repeat this experiment using QueryPic. So I did. It looks a bit different.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-011.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;I mean, before we could interpret this difference, of course, there&amp;rsquo;s a lot that we would want to ask; first of all, a lot of methodological questions. Again, exactly what are we searching in the two instances, and how can we compare searching books in one instance to searching newspapers in the other? Dates obviously play a different role in newspapers than they do in books. But it was actually the conceptual issues which really struck me in relation to this example, and in particular the assumption that we can compare the past, present, and future uses of these labels as if we are talking about the same thing: as if the label 1950 means the same thing before 1950, in 1950 and after 1950. The names for events and periods that we assign, that we share, that we use are themselves the products of historical processes. They slip, they shift, they change.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-012.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;We all know what we mean by the Great Depression. Where&amp;rsquo;s the Great Depression on this graph? So in terms of the usage at the time, the usage of the term &amp;lsquo;Great Depression&amp;rsquo; was actually greater in the 1890s than in the 1930s.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-013.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;We&amp;rsquo;re very familiar with &amp;lsquo;black&amp;rsquo; days like Black Tuesday. Black Friday, of course, is the one we&amp;rsquo;re most familiar with. And in Australia, these labels are generally attached to bushfires of course, and that&amp;rsquo;s the context where we generally understand them and use them and remember them. And over here, of course, we have Black Friday. So what&amp;rsquo;s this big peak here? It&amp;rsquo;s not a bushfire. It&amp;rsquo;s Black Wednesday, which refers to the Victorian government&amp;rsquo;s mass sackings of senior civil servants and judges in 1878. Obviously it was an extremely important event at the time, an extremely important event in government in Victoria, but it doesn&amp;rsquo;t quite figure in our collective memory in the same way as Black Friday does.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-014.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;One of my early experiments with QueryPic was to look at the question of when did the Great War become the First World War? At what point did we stop thinking about the Great War as the war to end all wars and realize that it was one in a series of global conflicts? And the graph really does a nice job of confirming our expectations, I suppose, in that we see a nice crossover late in 1941, which, if we were thinking about the passage of the war, would be about when we would expect it. But what&amp;rsquo;s missing from this, what&amp;rsquo;s missing of course, is just &amp;lsquo;the war&amp;rsquo;.&lt;/p&gt;
&lt;p&gt;Just as with the Great Depression and Black Wednesday, what&amp;rsquo;s hard is trying to recapture the moment as it was happening, the sense, for want of a better word, of present-ness. Now, if we go back to &amp;lsquo;We forget&amp;rsquo;, what are we doing when we&amp;rsquo;re talking about one of these dates? I mean, if we think of the present as providing a line there – past, present, future – on the past side, what are we doing? We&amp;rsquo;re anticipating. We&amp;rsquo;re predicting. Perhaps we&amp;rsquo;re dreading. And in the present, we&amp;rsquo;re experiencing, we&amp;rsquo;re enjoying, maybe suffering. In the future, we are remembering, we are regretting, perhaps reflecting. So instead of lumping all these together, it seems to me that we should be teasing them out and exploring their different interconnections.&lt;/p&gt;
&lt;p&gt;We should be trying to give the past back its own sense of the present. And this in essence was the modest and thoroughly achievable goal of my Harold White Fellowship. I wanted to explore the possibilities of the digitized newspaper collection in supporting this sort of rich temporal contextualization using digital methods to recover the pasts, the presents and the futures of any moment in our history. I have to admit, I haven&amp;rsquo;t got very far yet, and Marie-Louise has been doing a good job of reassuring me that sometimes the fruits of these things take a while to develop. Now, there are a number of reasons why I haven&amp;rsquo;t gotten as far as I wanted, but I do have a few sort of sketches that I want to share with you.&lt;/p&gt;
&lt;h2 id=&#34;iii-the-future-of-the-past&#34;&gt;III. The future of the past&lt;/h2&gt;
&lt;p&gt;Okay. What I decided to do is to try and create a sort of manageable sample set. So I decided to work with articles which included the phrase &amp;lsquo;the future&amp;rsquo; in the heading or the first four lines of the article, and that&amp;rsquo;s one of the facets you can use within Trove. Why did I limit it in this way? Well, I&amp;rsquo;ve been doing a lot of different work in Trove, as Marie-Louise said. One project I&amp;rsquo;ve been working on was looking at ways of finding editorials within Trove and exploring the content of editorials over time. And in doing that, I discovered a number of frustrating things, one of which is sometimes the articles aren&amp;rsquo;t divided up as nicely as we want them to be. Particularly with editorials: editorials on different subjects are often joined together, so it&amp;rsquo;s difficult to separate out the specific ones that you want. I thought that by limiting my search in this way, I would increase my chances of relevance. And it also brought the number of matches down to what I thought was a reasonably manageable 60,000 or so.&lt;/p&gt;
&lt;p&gt;So I started harvesting those 60,000 articles. I have over time been developing a number of tools for working with Trove, one of which is a harvester that enables you to get the data in bulk. And of course that&amp;rsquo;s necessary if you&amp;rsquo;re going to do this sort of large-scale analysis on it. I modified my existing harvesting tools to save the results directly into a database, and when the API became available, I modified them to use the API, which makes a lot of things easier. Now, after about 40,000, I thought I probably had enough, and I decided I&amp;rsquo;d trust in Trove&amp;rsquo;s relevance ranking and just work with that set.&lt;/p&gt;
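&lt;p&gt;&lt;em&gt;[A sketch of the harvesting pattern described here: paging through API results and saving them to SQLite. The offset-style paging and field names match early versions of the Trove API; later versions switched to a nextStart token, so treat the details as assumptions.]&lt;/em&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import sqlite3

import requests

API_URL = &#34;https://api.trove.nla.gov.au/v2/result&#34;
API_KEY = &#34;YOUR_API_KEY&#34;  # placeholder

db = sqlite3.connect(&#34;articles.db&#34;)
db.execute(&#34;CREATE TABLE IF NOT EXISTS articles (id TEXT, heading TEXT, date TEXT)&#34;)

params = {
    &#34;q&#34;: &#39;&#34;the future&#34;&#39;,
    &#34;zone&#34;: &#34;newspaper&#34;,
    &#34;encoding&#34;: &#34;json&#34;,
    &#34;n&#34;: 100,  # results per request
    &#34;s&#34;: 0,    # start offset, incremented to page through the results
    &#34;key&#34;: API_KEY,
}
while True:
    records = requests.get(API_URL, params=params).json()[&#34;response&#34;][&#34;zone&#34;][0][&#34;records&#34;]
    articles = records.get(&#34;article&#34;, [])
    if not articles:
        break
    for a in articles:
        db.execute(&#34;INSERT INTO articles VALUES (?, ?, ?)&#34;,
                   (a[&#34;id&#34;], a.get(&#34;heading&#34;, &#34;&#34;), a.get(&#34;date&#34;, &#34;&#34;)))
    db.commit()
    params[&#34;s&#34;] += params[&#34;n&#34;]
&lt;/code&gt;&lt;/pre&gt;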
&lt;p&gt;And then it was time to do some cleaning. Now, Trove&amp;rsquo;s crowdsourced OCR correction project has been a wonderful success, of course, but it&amp;rsquo;s worth noting that with the sample of articles that I harvested for this project, only 2% had any corrections at all. So 98% totally uncorrected, totally untouched. While I couldn&amp;rsquo;t hope to correct all of those articles myself, I could at least try to reduce some of the noise created by these sorts of OCR errors. So I developed a series of scripts to try and clean up some of that OCR output. First of all, they corrected some fairly consistent and hopefully fairly unambiguous OCR errors. And you can test yourself here. What&amp;rsquo;s that meant to be?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-018.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; His.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Pardon?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; His.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Nope. The.&lt;/p&gt;
&lt;p&gt;What about this one? Nah, the. This one? No, of. Should get that one. Ah, yep. And you can check. There we go. Look, of, ah, yep. So there are a series of these which I could, through a script, just fix up. I then checked each word in the text against a series of dictionaries and word lists, and this included a word list of place names which I extracted from the gazetteer provided by Geoscience Australia. Anything which didn&amp;rsquo;t seem to match up, I marked up in a way that I could extract it later if I wanted to. And all of this, you&amp;rsquo;ve got to understand, went through a lot of trial and error: just trying stuff out, seeing what it produced, trying it again, fiddling with it. Lots and lots of trial and error.&lt;/p&gt;
&lt;p&gt;But after that, I could do some fun things. You&amp;rsquo;re all of course familiar with word clouds, but I bet you haven&amp;rsquo;t seen a non-word cloud. This is my non-word cloud.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-019.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Now, of course, the big question is what is &amp;lsquo;others&amp;rsquo; doing there? I don&amp;rsquo;t know. For some reason my word list didn&amp;rsquo;t like the word others, but of course you can see here some more sort of consistent OCR errors. There&amp;rsquo;s another &amp;lsquo;the&amp;rsquo; and another &amp;lsquo;the&amp;rsquo;, and that would be a &amp;lsquo;be&amp;rsquo;, in most cases. And we also see where words have been split up. We&amp;rsquo;ve got a &amp;lsquo;tralia&amp;rsquo; down there. Oh, that&amp;rsquo;s a &amp;lsquo;which&amp;rsquo; obviously. So it&amp;rsquo;s actually quite useful visualizing in this way because I can then feed that back into my process of cleaning. I can see where the common errors are, and I can start to feed that back into the process.&lt;/p&gt;
&lt;p&gt;For each article that I processed in this way, I generated an accuracy score, which was simply the number of recognized words divided by the total number of words within the article. And I could use these scores to develop a couple of overviews.&lt;/p&gt;
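&lt;p&gt;&lt;em&gt;[A condensed sketch of the cleaning and scoring steps just described. The substitution list and the empty word list are stand-ins, not the actual lists used; in practice the word list combined dictionaries with the Geoscience Australia gazetteer.]&lt;/em&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import re

# A couple of consistent OCR substitutions -- stand-ins, not the real list.
SUBSTITUTIONS = {&#34;tbe&#34;: &#34;the&#34;, &#34;tho&#34;: &#34;the&#34;, &#34;ot&#34;: &#34;of&#34;}

# In practice: dictionaries plus a gazetteer of place names.
WORDLIST = set()

def clean_and_score(text):
    words = re.findall(r&#34;[a-zA-Z]+&#34;, text.lower())
    cleaned, recognised = [], 0
    for word in words:
        word = SUBSTITUTIONS.get(word, word)
        if word in WORDLIST:
            recognised += 1
            cleaned.append(word)
        else:
            # Mark unrecognised tokens so they can be extracted later.
            cleaned.append(f&#34;*{word}*&#34;)
    # The accuracy score: recognised words divided by total words.
    score = recognised / len(words) if words else 0
    return &#34; &#34;.join(cleaned), score
&lt;/code&gt;&lt;/pre&gt;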
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-020.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;So of my set, this is only just my sample of course, so this is OCR accuracy over time. There aren&amp;rsquo;t many articles in this earlier period, so it&amp;rsquo;s probably not worth worrying about too much. But what&amp;rsquo;s interesting is this decline here down to the 1920s, where we&amp;rsquo;re going below the 80% mark. Why is that? I&amp;rsquo;ve got no idea. There are a whole lot of variables which could certainly be involved here, whether it&amp;rsquo;s the fonts, the quality of the printing, the quality of the paper, the quality of the microfilming. I don&amp;rsquo;t know. It&amp;rsquo;s something which would be interesting to explore further and investigate.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-021.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;We can also have a look at the poorest performing newspapers. So the &lt;em&gt;Perth Gazette and West Australian Times&lt;/em&gt; didn&amp;rsquo;t do too well, scoring 58% on my scorecard. Again, this is only a select sample, so I&amp;rsquo;m not quite sure what you can read into any of this, but it&amp;rsquo;s sort of interesting. These figures weren&amp;rsquo;t particularly important for my work, but I do think that the general issue of OCR quality is vitally important, particularly as we make more and more scholarly use of these sorts of collections in bulk. I mean, obviously we need to improve the quality, but we also need to expose the assumptions about OCR quality that underlie our work, so that when we are putting forward some sort of analysis of the text, we&amp;rsquo;ve got a way of communicating the quality of the material that we&amp;rsquo;re working with.&lt;/p&gt;
&lt;p&gt;I then decided to make my sample set even more manageable by selecting just the first 10,000 articles with accuracy scores of over 80%. So I used my scores and went through and chose just those articles which seemed to have fared pretty well. Of course, as any good digital humanities person does, I then started counting words.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-022.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;As with most of this stuff that I&amp;rsquo;m showing you tonight, &lt;a href=&#34;https://timsherratt.au/shed/presentations/nla/pages/frequencies.html&#34;&gt;it&amp;rsquo;s online and you can go and play with it&lt;/a&gt; and use it yourself. So this shows the word frequencies over time, and there&amp;rsquo;s a time slider here, and you can just drag it along and see what&amp;rsquo;s happening in different years. Now, nothing really significant jumps out at me from looking at the word frequency clouds here. I mean, what is sort of interesting I suppose, is the preponderance of &amp;lsquo;would&amp;rsquo; and &amp;lsquo;could&amp;rsquo;, which I suppose confirms the future orientation of the sample set that I&amp;rsquo;m working with. And there may well be other things within there that jump out at you. And so as I say, jump online and have a look and have a play with this and see what you can make of it.&lt;/p&gt;
&lt;p&gt;I mean, word frequencies&amp;hellip; Okay. So word frequencies can be interesting in getting a sort of overall picture of a large amount of text and starting to track some changes over time. But this sort of word frequency tells you what&amp;rsquo;s common. It doesn&amp;rsquo;t tell you what&amp;rsquo;s distinctive. It doesn&amp;rsquo;t tell you what&amp;rsquo;s interesting in an article. And another measure we can use to try and get at the distinctiveness of a piece of text is something called TF-IDF. It&amp;rsquo;s an acronym: term frequency, inverse document frequency. What it does is look not just at the frequency of a word within a particular piece of text, but also at the frequency of that term across a collection of texts. So a word that is common in a particular article, but not very common in a collection of articles, will appear as more significant, more heavily weighted, in its TF-IDF value.&lt;/p&gt;
&lt;p&gt;You use TF-IDF values all the time. They&amp;rsquo;re used by search engines in calculations of similarity. They can take the TF-IDF values, convert them into a sort of mathematical format, and use that to calculate the similarity between two pieces of text. And the results of calculating TF-IDF values for collections like this are pretty interesting, and I&amp;rsquo;ll just show you a little comparison.&lt;/p&gt;
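&lt;p&gt;&lt;em&gt;[TF-IDF in a few lines of Python, using the common &amp;lsquo;term frequency times log of inverse document frequency&amp;rsquo; weighting; a sketch of the general technique, not necessarily the exact formula used here. Note how &amp;lsquo;the&amp;rsquo;, which appears in every document, scores zero.]&lt;/em&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import math
from collections import Counter

def tfidf(documents):
    # documents is a list of token lists; returns one Counter of weights per document.
    n_docs = len(documents)
    df = Counter()  # document frequency: how many documents contain each term
    for doc in documents:
        df.update(set(doc))
    weighted = []
    for doc in documents:
        tf = Counter(doc)
        weighted.append(Counter({
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        }))
    return weighted

docs = [[&#34;hitler&#34;, &#34;midget&#34;, &#34;the&#34;], [&#34;the&#34;, &#34;roundabout&#34;], [&#34;the&#34;, &#34;future&#34;]]
for weights in tfidf(docs):
    print(weights.most_common(2))
&lt;/code&gt;&lt;/pre&gt;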
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-023.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;So on the left-hand side here, this is 1939. The top 10 words on the left-hand side are just the plain frequency values, and here are the TF-IDF values. So you see we&amp;rsquo;re getting at something quite different and something quite interesting here. Obviously it&amp;rsquo;s 1939; Hitler doesn&amp;rsquo;t figure in this list of terms, but he&amp;rsquo;s at the top in this one. And we also get these really odd things like midget and roundabout. I found it really interesting producing these values. I found them quite evocative, encouraging me to explore more, and I&amp;rsquo;m going to talk some more about this a bit later.&lt;/p&gt;
&lt;p&gt;But finally I just wanted to show you one other way of understanding a collection of texts, and that&amp;rsquo;s through a thing called topic modeling. There&amp;rsquo;s a lot of topic modeling going on in the digital humanities at the moment, and there are a number of good blog posts, which I&amp;rsquo;ll put links to from here, that tell you about what topic modeling is. I&amp;rsquo;m just going to quickly race through it. Basically, I used a piece of software called Mallet. I pointed Mallet at my collection of texts, told it that I wanted to define 10 topics, that is, 10 clusters of articles within those texts, and it just did it.&lt;/p&gt;
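&lt;p&gt;&lt;em&gt;[The Mallet run boils down to two commands, wrapped here in Python for consistency; the directory and file names are placeholders, and the options shown are standard Mallet flags.]&lt;/em&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import subprocess

# Step 1: import a directory of plain-text articles into Mallet&#39;s format.
subprocess.run([
    &#34;bin/mallet&#34;, &#34;import-dir&#34;,
    &#34;--input&#34;, &#34;articles/&#34;,
    &#34;--output&#34;, &#34;articles.mallet&#34;,
    &#34;--keep-sequence&#34;,
    &#34;--remove-stopwords&#34;,
], check=True)

# Step 2: train a 10-topic model, writing out the lists of topic words
# and the per-document topic weightings discussed below.
subprocess.run([
    &#34;bin/mallet&#34;, &#34;train-topics&#34;,
    &#34;--input&#34;, &#34;articles.mallet&#34;,
    &#34;--num-topics&#34;, &#34;10&#34;,
    &#34;--output-topic-keys&#34;, &#34;topic-keys.txt&#34;,
    &#34;--output-doc-topics&#34;, &#34;doc-topics.txt&#34;,
], check=True)
&lt;/code&gt;&lt;/pre&gt;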
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-024.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;And what it came back with is these lists of words which are grouped according to the topics which it believes existed. You can then go through and look at these lists of words and start to interpret them to try and understand what those topics are. And most of them are pretty clear. This of course, is the topic that tells me that I still didn&amp;rsquo;t clean up the OCR enough, but it&amp;rsquo;s interesting that it brought them all together.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve got, I mean here we&amp;rsquo;ve got trade, here we&amp;rsquo;ve got technology, here we&amp;rsquo;ve got land/rural, here we&amp;rsquo;ve got international relations, here we&amp;rsquo;ve got government, and here we&amp;rsquo;ve got home and society. And it&amp;rsquo;s amazing once you run these things, how much sense the topics actually make to you.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-025.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;And it also goes through your collection and weights each article according to these topics. So you can then go through, and for each article you can see which is its most heavily weighted topic, and you can calculate the number of articles associated with each topic and produce something like that. Okay, that&amp;rsquo;s not terribly instructive as it is. I won&amp;rsquo;t show you now, but if you click on that and &lt;a href=&#34;https://timsherratt.au/shed/presentations/nla/pages/topic_totals.html&#34;&gt;go to the live version&lt;/a&gt; and click on the legend down the bottom there, you can actually take away some of the lines so you can see what&amp;rsquo;s happening underneath, just the lines that you&amp;rsquo;re interested in.&lt;/p&gt;
&lt;p&gt;But basically, I want to do a lot more work on these topics; at this stage I haven&amp;rsquo;t really done a lot of interpretation of them. I want to look at how I&amp;rsquo;m actually using those weightings and find better ways of looking at them. So anyway, here I am. Not a lot of interpretation at this stage. No great insights. I have a data set, and I&amp;rsquo;m going to be continuing to play with it. And as I&amp;rsquo;ve said, all this stuff is available online, so you&amp;rsquo;re welcome to come and play with it too and see what you can make of it. Now, you may think that I&amp;rsquo;ve gone into a lot of tedious detail about what I did. Well, I&amp;rsquo;ve actually saved you from a lot of the gory details.&lt;/p&gt;
&lt;h2 id=&#34;iv-meanings-for-mining&#34;&gt;IV. Meanings for mining&lt;/h2&gt;
&lt;p&gt;The truth of much research in the digital humanities is that large amounts of time are spent yak shaving and data munging. If you don&amp;rsquo;t know the term &amp;lsquo;yak shaving&amp;rsquo;, it&amp;rsquo;s that process that we&amp;rsquo;re all familiar with, when you start doing a particular task and realize that, in order to achieve it, you have to do something else or research something else, and that continues in an infinite regression until you find yourself doing something which seems totally unrelated to the task you started with. I&amp;rsquo;ve had a lot of that recently. There were lots of issues just involved in using this data and starting to manipulate it. As I&amp;rsquo;ve said before, the issue of OCR quality is crucial, and we have to be upfront about the problems and continue to look for the most effective solutions. We have to talk about questions of selection and completeness. What&amp;rsquo;s actually in Trove? How does it change, and how does this influence the results that we get?&lt;/p&gt;
&lt;p&gt;One of my examples here is a thing called the Atomic Age Exhibition, which toured around Australia in 1948-49. It was a big thing. Many, many thousands of people visited. It was at the Easter Show in Sydney. If you search in Trove for Atomic Age Exhibition, you&amp;rsquo;ll find quite a lot of results coming from the Courier Mail in Brisbane. You&amp;rsquo;ll find virtually nothing from Sydney and Melbourne, and you might be inclined to think that the exhibition didn&amp;rsquo;t actually go to Sydney and Melbourne. Why is there nothing in Sydney and Melbourne? Because the exhibition was sponsored by the Herald in Melbourne and by the Daily Telegraph in Sydney, and both of those titles are currently not in the newspaper database.&lt;/p&gt;
&lt;p&gt;So we&amp;rsquo;ve got to bring these sorts of questions and perspectives to bear as we start to do this research. Another barrier which I started to butt my head up against in doing this was computing power. Generating the TF-IDF values for my sample took about a day and a half on my laptop. And of course, then you realize you did something stupid and you have to do the whole thing again. And I did wonder at various times whether I was reaching the limits of what&amp;rsquo;s practically possible for one bloke and his laptop, and whether my piecemeal efforts will be blown away by academic teams with access to research funds, bright young graduate students, and time on a supercomputer.&lt;/p&gt;
&lt;p&gt;Now, this list of problems and concerns might seem a bit depressing, and it might not be what you expected from this talk, but I want to reassure you: there are digital tools that make it easy to get started and to begin exploring the possibilities. QueryPic of course, and there are other things like &lt;a href=&#34;https://voyant-tools.org&#34;&gt;Voyant&lt;/a&gt;, which is a great online tool for starting to do text analysis. But sooner or later you&amp;rsquo;re going to have to confront some pretty hefty questions. But hey, that&amp;rsquo;s just history. The past is messy and it raises difficult questions about things like selection and interpretation. The issues aren&amp;rsquo;t necessarily new, it&amp;rsquo;s just that they&amp;rsquo;re raised in a bigger, more technically challenging context.&lt;/p&gt;
&lt;p&gt;But what does really bug me is a nagging feeling that I should be taking statistics more seriously. That in constructing the sorts of examples which I&amp;rsquo;ve been showing and the tools that I&amp;rsquo;ve been demonstrating, I should actually be less impressionistic and more rigorous, as if I&amp;rsquo;m not doing justice to the vast computing power that I have at my disposal. But I don&amp;rsquo;t want to do that. In January, I was at the American Historical Association meeting and I was actually able to see the culturomics guys live in person doing their spiel. And as they described their vision for a new science based on access to these huge cultural data sets, I tweeted, &amp;ldquo;Yeah, I want to use big data to tell better, more compelling, more human stories.&amp;rdquo;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-027.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;The British historian Tim Hitchcock has similarly described his own unease that the demands of big data seem to be moving him towards a more positivist style of history. In the humanities, we&amp;rsquo;ve been really fortunate to make use of many decades of research into things like information retrieval. We&amp;rsquo;ve adopted many of their concepts, their tools, and their formulae, but we&amp;rsquo;ve also adopted some of their language. And so we talk about what we&amp;rsquo;re doing as mining. Mining is an extractive process. We dig stuff up, we pull it out of the ground. But this seems to be pretty much the opposite of what I want to do. I mean, I do want to find structures and separate them out for different types of analysis, but then again, I want to put them back together. I want to observe them in different contexts as rich and as complex as possible. How do we do that?&lt;/p&gt;
&lt;p&gt;Well, first of all, we have to work out better ways of incorporating these sorts of big data perspectives into the narratives that we write. Just as QueryPic gives you that opportunity to zoom out and get the big picture, I think we have to take control of the zoom and use it to our advantage. And this, by the way, probably means developing new forms of publication that allow easier and better integration of data and text. It&amp;rsquo;s challenging, but there&amp;rsquo;s not much point in dwelling on the dangers and problems of big data; as Tim Hitchcock concludes, we simply need to get on with it.&lt;/p&gt;
&lt;h2 id=&#34;v-screwmeneutics-and-deformance&#34;&gt;V. Screwmeneutics and deformance&lt;/h2&gt;
&lt;p&gt;The second approach is to foster acts of creative subversion, to use digital tools in new ways. Literary scholars within the digital humanities talk about the possibilities of deformance, of using computational methods to change texts in ways that can open them up to new critical perspectives. Stephen Ramsay also talks about moving beyond traditional forms of search and browse and admitting &amp;lsquo;screwing around&amp;rsquo; as a legitimate research methodology. Of course, historians don&amp;rsquo;t want to start deforming their sources. Or do they?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-029.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;This is an experiment I created called &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/faces/?rsort=3&#34;&gt;The Real Face of White Australia&lt;/a&gt;. I always get a bit teary when I put this up. What I&amp;rsquo;ve done here is use computer vision software to extract portrait photographs from certificates which were used in the administration of the White Australia policy. These are records held by the National Archives of Australia. There are several thousand of these, and this is just from one series, and you can just keep scrolling and scrolling forever, or almost forever. So by manipulating the sources in these ways, by extracting those photographs, I&amp;rsquo;ve created a new way of seeing these records, and it&amp;rsquo;s quite powerful.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-030.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;But we can also be playful. You may have seen this. This is a little game that I created using the newspaper database. Again, it&amp;rsquo;s very simple. It just picks a newspaper article at random from the database and asks you to try and guess the year in which it was published. So any guesses for this one? What would we say? Let&amp;rsquo;s say 1850&amp;hellip; That&amp;rsquo;s a bit later than that&amp;hellip; Let&amp;rsquo;s see, it&amp;rsquo;s earlier. Okay, so you can keep going like this. You can go and try it out yourself later. As I said, it&amp;rsquo;s very simple, but it&amp;rsquo;s also strangely addictive. And of course, it&amp;rsquo;s also a way of exploring the content of Trove by screwing around.&lt;/p&gt;
&lt;p&gt;QueryPic, the Real Face of White Australia and my newspaper roulette game, &lt;a href=&#34;https://headlineroulette.net&#34;&gt;Headline Roulette&lt;/a&gt;, also have something else in common. They are public. I want people to use them. I want people to have fun. I want people to be moved. I want people to find things, to be surprised, and to do history.&lt;/p&gt;
&lt;p&gt;Just yesterday I received an email from a self-confessed Australian history addict, oh no, Australian history fanatic, sorry. And she had become addicted to Headline Roulette. She wanted to know if I could add a facility for users to save their scores, so presumably they could go back and see if they&amp;rsquo;d improved, or share them with their friends. So obviously the next step is the Facebook application. Other people have described to me how scrolling through the Real Face of White Australia brought them to tears. And I&amp;rsquo;ve come to realize that these sorts of interactions really mean more to me than a footnote in an academic article. I&amp;rsquo;ve probably just killed my hopes of an academic career there. I do want to use digital tools to deform history, to deform it in a way that makes it accessible to new audiences in new ways. And so I present to you, in honor of my Harold White Fellowship, a new experiment.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-031.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Now, I described to you before the process involved in calculating the TF-IDF values. What I didn&amp;rsquo;t describe was the fun that I had when I was doing it. It was really quite exciting and amusing and funny and all sorts of things, watching the words fly past on the screen. And as I completed each year, I had a little script which would show me the top 20 words for that year. And anybody who follows me on Twitter will have a good picture of what was going on there, because I couldn&amp;rsquo;t help but share this. I mean, they tell their own story. It was really like a sort of wonderful puzzle, as I say there, as they all came up. And then I started tweeting some of them.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-032.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-033.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;This was a nice one. I like the &amp;lsquo;hitler&amp;rsquo; with &amp;lsquo;mudguards&amp;rsquo;, &amp;lsquo;duchess&amp;rsquo;, &amp;lsquo;opossum&amp;rsquo;, &amp;lsquo;hollywood&amp;rsquo; and &amp;lsquo;canberra&amp;rsquo;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-034.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;And as I said here, of course, I mean there&amp;rsquo;s got to be a novel in &amp;lsquo;prince&amp;rsquo;, &amp;lsquo;pronunciation&amp;rsquo;, &amp;lsquo;keyboard&amp;rsquo;, &amp;lsquo;zulu&amp;rsquo;, &amp;lsquo;begged&amp;rsquo;, &amp;lsquo;unbent&amp;rsquo;, &amp;lsquo;diddle&amp;rsquo;, &amp;lsquo;candlesticks&amp;rsquo;, &amp;lsquo;virtuoso&amp;rsquo;, &amp;lsquo;highness&amp;rsquo; and &amp;lsquo;pots&amp;rsquo;. This started me thinking: was there a way I could share this experience and use the TF-IDF values as a way of exploring my data set, a way of opening this experience to others, creating a sort of shifting, playful window on the future of the past? So this is my first attempt. Again, it&amp;rsquo;s public, so &lt;a href=&#34;https://wraggelabs.com/fotp/&#34;&gt;go play with it&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-035.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;I&amp;rsquo;ve actually deliberately tried to keep most of the metadata away from this interface because I wanted the words to be the focus. And yeah, it does look a bit like that fridge poetry thing that you can get, and that&amp;rsquo;s quite deliberate. At some stage I actually want to add a box down here where you can drag your words down and make your own sort of collections and tweet them. What it&amp;rsquo;s showing you there is just a random selection of words with TF-IDF values from my sample. You can click on any one of these, and it goes away and sees, first of all, how many years have that word attached to them. If there&amp;rsquo;s only one year, then it&amp;rsquo;ll return that year. If it appears in more than one year – let&amp;rsquo;s see if we can find one that has more than one year – then it pulls out a random selection of words from those years.&lt;/p&gt;
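&lt;p&gt;&lt;em&gt;[The click behaviour just described, sketched with made-up data structures: one year means that year is returned, otherwise a year is picked at random from those the word appears in.]&lt;/em&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import random

# Hypothetical index: for each word, the years whose TF-IDF lists include it.
YEARS_BY_WORD = {
    &#34;corpuscle&#34;: [1943],
    &#34;churchill&#34;: [1940, 1941, 1943],
}

def pick_year(word):
    # One year: return it. Several years: pick one at random
    # (the interface then shows a fresh selection of that year&#39;s words).
    years = YEARS_BY_WORD.get(word, [])
    if len(years) == 1:
        return years[0]
    return random.choice(years) if years else None
&lt;/code&gt;&lt;/pre&gt;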
&lt;p&gt;Okay. Oh no, we&amp;rsquo;ve got 1943. I&amp;rsquo;m not doing a good job of it this time. Anyway, you can have fun with it. And of course, if you want to actually see what&amp;rsquo;s going on, you can click on these and it will actually load the articles here, and you can explore the text of them there and see where the word&amp;rsquo;s popping up.&lt;/p&gt;
&lt;p&gt;Okay, what is this? I&amp;rsquo;m not quite sure. It&amp;rsquo;s not really a discovery interface, although you can find interesting stuff. It&amp;rsquo;s not quite a game, but it is quite fun to explore. I&amp;rsquo;m sort of in love with it at the moment because, I mean, think about what I&amp;rsquo;m trying to do in terms of recapturing the presentness of the past. Our experience is not about just the big stories of the day. Our experience of any moment includes a whole lot of trivial aspects. And I love the way that this sort of brings together Churchill and Corpuscle and Melvin, whoever Melvin is. I love the mix of words, and to me it&amp;rsquo;s just incredibly evocative. It makes you want to start imagining stories. It makes you want to explore, it makes you want to find out more, and it just has a wonderful, exciting aspect to it. So I&amp;rsquo;m not quite sure what I&amp;rsquo;m going to do with it or how I&amp;rsquo;m going to develop it, but really, as I say, I&amp;rsquo;m quite in love with it at the moment, and I hope you&amp;rsquo;ll have a play with it and see what you make of it.&lt;/p&gt;
&lt;p&gt;Could it be a discovery interface? I don&amp;rsquo;t know. It does enable you to get into my dataset, though obviously by rather indirect means, and it includes lots of randomness as well. And I&amp;rsquo;m a big fan of randomness in developing new ways of discovery. So there you go. Please take it away, enjoy, play. I may not have conquered the meaning of time yet, but experiments like this make me think about the form in which I present those sorts of arguments and those sorts of ideas. How do we create resources which give that sense of disjunction and serendipity? So while I may not have achieved all I wanted to, I&amp;rsquo;ve come away with a better sense of what it is that I&amp;rsquo;m trying to do and what I want to do with this material. So thank you.&lt;/p&gt;
&lt;h2 id=&#34;questions&#34;&gt;Questions&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Marie-Louise Ayres:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Thanks very much, Tim. Just before we open up for questions, there were just a few things that I wanted to say. One is, look, two Australians have corrected more than a million lines of text each. So if you think you couldn&amp;rsquo;t correct 40,000 editorials, you are not being ambitious enough. That&amp;rsquo;s the first thing. The second thing to say is that our own Trove team have found that the only surname that is not in Trove is Kardashian. And the third is, I guess, just thinking about how amazingly creative these visualizations are that Tim has been doing, and I hope you&amp;rsquo;ll ask him about them.&lt;/p&gt;
&lt;p&gt;But the fourth thing I wanted to say is to pick you up on one of your early comments where you said, &amp;ldquo;I haven&amp;rsquo;t got as far as I wanted.&amp;rdquo; Now that&amp;rsquo;s a very interesting construction that includes the past, the present, the future, and a spatial term as well. So maybe you need to think about that. I think we&amp;rsquo;d all agree about the value of the work that Tim has done. I don&amp;rsquo;t know where he wanted to be, but he&amp;rsquo;s gotten a long, long way and done things that the rest of us probably haven&amp;rsquo;t even contemplated. So I&amp;rsquo;m hoping you&amp;rsquo;ll ask Tim some questions now and then we&amp;rsquo;ll have more opportunities afterwards. It&amp;rsquo;s dark out there, so if you want to ask a question, can you just make sure you raise your hand and speak up? Don&amp;rsquo;t be shy. Yes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 1:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;How appropriate would your methodology be for spelling? Where I&amp;rsquo;m coming from is, I know the Australian Labor Party and the British Labour Party spell it differently. And I can remember once going through microfilms of the Sydney Morning Herald in the 1920s, and it dawned on me all those spellings were American. And so there must be things happening where you could compare how words are spelled. Or do all the correctors corrupt your data just by correcting?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I certainly think you could do that sort of analysis. And actually one of the nice examples that the culturomics guys used in the Google Books example was looking at, oh God, the name is&amp;hellip; irregular verbs, looking at changes in irregular verbs over time, which is quite interesting. And their data set goes back quite a long way. But there are challenges. One of the challenges in working with Trove, I mean obviously the interface is geared towards discovery at the moment, making sure that people find what they&amp;rsquo;re after. But that means that sometimes if you want to find something exact, it can be a bit tricky. You&amp;rsquo;ve got to know how to turn off the fuzziness in the searching. And sometimes you are foiled in that by the fact that people might&amp;rsquo;ve tagged something, and the search by default also searches the tags and the comments.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t know whether you saw it in my World War I graph, but you may have noted that there was a little peak for World War I actually during the First World War, which is sort of interesting if you think about it. And that&amp;rsquo;s because people had tagged those articles with World War I. So again, this is all about, and one thing which I would always emphasize as we start to do this research, we have to develop our literacy in terms of understanding search interfaces and how they work, and be prepared to go into the documentation, to look at the advanced searches and how they work, and to actually start experimenting a bit with what different searches bring back, so that you can have a good picture. And obviously the institutions themselves have a role in communicating this, exposing what&amp;rsquo;s going on behind the scenes. But I think it&amp;rsquo;s an important literacy for researchers going into the future, being able to pull these things apart to understand exactly what&amp;rsquo;s going on, so you can do those sorts of quite detailed, fine-grained comparisons.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 2:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m interested in what year the picture of White Australia was from, and was that the people that were actually accepted, or what?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Certainly not. Well, okay. I should say that this is part of a broader project called Invisible Australians. If you just go to InvisibleAustralians.org, there&amp;rsquo;s a lot more information about what we&amp;rsquo;re trying to do with these records. That particular set of records, as I think I said, those photographs were just pulled from one series within the National Archives of Australia. And there are many more series like that. They are a series of certificates. Basically, if a person deemed non-white who was living in Australia wanted to travel overseas and get back into the country again&amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 2:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Those people that lived in Australia rather than people that tried.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;No, because&amp;hellip; Yeah, there was no trying to it. So yes, it is people who were living in Australia. And this is what is particularly interesting about these records, because what we want to do, and this isn&amp;rsquo;t Trove related, I&amp;rsquo;m sorry, but what we want to do with those records is to actually try and extract the biographical information which is contained within those certificates, in order to find out more about the community who was living under the White Australia policy, people who were living here, whose various activities were restricted in a number of ways by the White Australia policy in all its legislative forms. And we&amp;rsquo;re bringing to bear a number of digital techniques to try and do that. As I said, in that particular case it was a facial recognition script which pulled out the photographs, but we&amp;rsquo;re also harvesting material from the National Archives, and I&amp;rsquo;ll be doing some topic modeling, as I showed there, on some of the records to try and pull out clusters within those records. So anyway, check it out. It&amp;rsquo;s something I&amp;rsquo;m very passionate about.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 3:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m just wondering, it seems like the [inaudible 00:57:20] is a bit of an issue, as you talked about. And I guess there&amp;rsquo;s probably a couple of aspects to that. One is the computer vision technology that&amp;rsquo;s involved, and the other part is what you do after you&amp;rsquo;ve got that. Is there anything clever that you can do with the TF-IDF or whatever else to try and get better quality? I don&amp;rsquo;t know if you can describe how you see the next few years panning out with that? Do you think that there&amp;rsquo;s a lot of improvement going on?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s certainly a lot of work going on and there are techniques which could be used. People are developing specific language models that tell you that if you have a certain combination of letters, then after that combination you can expect a certain range of letters but not others, for example. And you can use those in probabilistic techniques to go across the text and see what&amp;rsquo;s likely to be at particular points. And specifically relating to digitized newspapers, there&amp;rsquo;s stuff going on. There&amp;rsquo;s a project in the US called Mapping Texts, which works with Chronicling America, the digitized newspapers from America. They went through and did something very similar to what I did, but with a bigger budget and access to Stanford&amp;rsquo;s resources: things like the topic modeling and word frequencies. They also did what&amp;rsquo;s called named entity recognition, which is pulling out people and places from the texts, and they ran through their sample set and generated figures for OCR accuracy. And they&amp;rsquo;re fairly similar to my figures actually.&lt;/p&gt;
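&lt;p&gt;&lt;em&gt;[One way to make the &amp;lsquo;which letters can follow which&amp;rsquo; idea concrete: a toy character-bigram model that scores how plausible a token is. This is an illustration of the general approach, not any particular project&amp;rsquo;s method.]&lt;/em&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from collections import Counter, defaultdict

def train_bigrams(words):
    # Count, for each letter, which letters follow it in known-good words.
    following = defaultdict(Counter)
    for word in words:
        for a, b in zip(word, word[1:]):
            following[a][b] += 1
    return following

def plausibility(word, following):
    # Average probability of each letter given the one before it;
    # garbled OCR tokens score low because their transitions are rare.
    probs = []
    for a, b in zip(word, word[1:]):
        total = sum(following[a].values())
        probs.append(following[a][b] / total if total else 0)
    return sum(probs) / len(probs) if probs else 0

model = train_bigrams([&#34;the&#34;, &#34;then&#34;, &#34;there&#34;, &#34;other&#34;, &#34;these&#34;])
print(plausibility(&#34;the&#34;, model))  # high
print(plausibility(&#34;tbe&#34;, model))  # low: &#39;tb&#39; never seen
&lt;/code&gt;&lt;/pre&gt;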
&lt;p&gt;So there&amp;rsquo;s obviously a lot of recognition of this in Europe. There&amp;rsquo;s actually a particular research group which has been looking at methods of improving OCR. And of course there are many more cases which are much more complex than this, if you think about old Germanic scripts or something like that. So there&amp;rsquo;s a lot of interest, a lot of concern, and a lot of work going on. And I think it&amp;rsquo;s something that we have to be prepared to revisit over time: there are going to be more possibilities for doing stuff with computers as this comes online, and so we need to constantly reassess what&amp;rsquo;s actually possible and see what we can do.
But yeah, I think, given the general awareness of the problem and the problems that it causes, there&amp;rsquo;s certainly going to be a lot happening. And I think it&amp;rsquo;s really exciting that we are now starting to get these collections of digitized newspapers all around the world, and the possibilities that opens up for doing comparative stuff. What I didn&amp;rsquo;t mention with QueryPic is that you can also access New Zealand newspapers. It uses the DigitalNZ API and accesses Papers Past, so you can actually do graphs for New Zealand papers. But what you can&amp;rsquo;t do meaningfully is compare Australian and New Zealand results, and that&amp;rsquo;s because the DigitalNZ search currently searches the titles of articles and not the full text. Wouldn&amp;rsquo;t it be really nice if we were both searching the same things and we could do those sorts of comparisons, and we could do it with the US and we could do it with Canada. I think there are some really interesting possibilities there.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 4:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I was just going to say &amp;lsquo;stine&amp;rsquo; is very interesting there, because in my opinion it&amp;rsquo;s obviously a column break from Palestine. And so that&amp;rsquo;s a common sort of OCR error, E and S being a sort of fragile combination. And not only that, but the rules of breaking, they tend to do chunks like that. And it also shows how the TF-IDF is working. Palestine itself as a whole doesn&amp;rsquo;t appear there because it&amp;rsquo;s not actually that important, but &amp;lsquo;stine&amp;rsquo; got promoted because it&amp;rsquo;s extremely uncommon, and &amp;lsquo;Pale&amp;rsquo; got dropped off. And of course &amp;lsquo;Pale&amp;rsquo; would be there&amp;hellip; because it&amp;rsquo;s a very common word.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yeah, that&amp;rsquo;s nice. But yes, as you go through this, you will see other instances where the OCR issue comes up again. But that&amp;rsquo;s also another nice example of thinking about how, using computational techniques, we can start to improve some of the OCR, because you are looking at the way words break and seeing if we can use that in some way. Thanks, that&amp;rsquo;s great.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 5:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Just before, you said the word evocative, Tim, and I was saying to myself, evocative, so I wanted to talk about that for a minute rather than talk about a technical thing. It seems to me this is really interesting, and I just want you to say more about it. Is this a different kind of historical mode, this desire to treat the past so as to evoke, rather than to necessarily narrativize or analyze, or define or pin down? Is there something distinctive about this evocative mode which is to do with the digital techniques?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yeah, this is the thing I&amp;rsquo;ve been trying to grapple with over the last few weeks as I started playing with this stuff. And I don&amp;rsquo;t know exactly what it is. All I know is what I feel, as you do, when you see it. These things do make you start to think in different ways, and to imagine and make connections. I think with your work, with Cath&amp;rsquo;s work on Semble, there are possibilities for creating spaces which encourage people to make connections, to see relationships between things. And I think digital technologies do lend themselves to that because, as I said, I actually think randomness is rather undervalued in terms of exploration and discovery. As you know, there&amp;rsquo;s another project that we worked on called The History Wall at the National Museum of Australia, and that brought together material in quite a random fashion. And it was, again, quite evocative in terms of being able to see the possible relations between the items there.&lt;/p&gt;
&lt;p&gt;As I said, I don&amp;rsquo;t know what it lends itself to. What is the process? Is it discovery? Is it a prompt for research questions? I don&amp;rsquo;t know. But it seems to me to be something which is worth exploring more, and I find that it&amp;rsquo;s something I keep doing, so I must be interested in it in some way. It&amp;rsquo;s definitely worth thinking about some more. There are all sorts of ways in which you can develop an evocative sense; historical photographs, for example, give you a different sort of feeling from a text description. So yeah, I don&amp;rsquo;t know. Really interesting question.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 6:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Tim, I&amp;rsquo;d just like to congratulate you on your work. I find this really interesting, and I think it&amp;rsquo;s great that researchers like yourself take our data and play around with it, as you said, because ultimately some of those ideas do lead to really useful applications. And I wanted to say that your OCR accuracy results are actually bang on, because we did quite a bit of research on that five years ago before we launched, and it was 65% to 70%, which is of course low, which is why we asked how we could change that and get the public to help.&lt;/p&gt;
&lt;p&gt;But you&amp;rsquo;re quite right, as time moves on and as these big data sets are made more open and available, people develop technologies to improve that. Five years ago an automated way to improve it didn&amp;rsquo;t exist. We now know of at least three other people like yourself who have figured out how they could really increase that OCR accuracy rate globally. So I guess my question is how some of this really fantastic research, including the work you mentioned in Europe, can be built back in to improve our services. But I also wanted to ask: did you find it useful having an API to get at that data? Because I know that was a dream we&amp;rsquo;d had for a long time, and I know you waited a long time for it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Well, I didn&amp;rsquo;t wait. I didn&amp;rsquo;t wait, did I? I actually just went ahead and did it myself.&lt;/p&gt;
&lt;p&gt;Yeah, look, the background for those who don&amp;rsquo;t know is that I built my own unofficial API at one point, which I used to do some experiments. But an official API obviously makes a whole lot of things easier. First of all, from the point of view of doing large data harvests, you&amp;rsquo;re getting the data in a structured way rather than downloading the whole web page and everything on it. And as anybody who follows my work will know, I had a number of frustrating experiences where things changed on the web page and everything I&amp;rsquo;d created broke and I had to fix it.&lt;/p&gt;
&lt;p&gt;APIs do away with all that. It&amp;rsquo;s fantastic. But one of the things I really like about having API access is how easy it makes it to do something like Headline Roulette. If you have an idea and a bit of coding experience, you can act on it and actually build something. That to me is the most exciting aspect: encouraging people to experiment. That&amp;rsquo;s what it&amp;rsquo;s all about to me, creating an environment where people do experiment with this stuff and build things.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.15771695&#34;&gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.15771695.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
</description>
      <source:markdown>*In 2012, I was lucky enough to be awarded a Harold White Fellowship by the National Library of Australia. I used my time to explore ways of using Trove&#39;s digitised newspapers as data, and presented my work at a public lecture in May 2012. I spoke from notes and never got round to writing it all up. The recording made by the NLA has disappeared from their website, but is [still available in the Internet Archive](https://web.archive.org/web/20140212200542/http://www.nla.gov.au/podcasts/media/Harold-White/tim-sherratt.mp3). The text below is a transcription of the recording made in June 2025 with some minor editing.*

*You can also listen to the audio, [browse the full set of slides](https://timsherratt.au/shed/presentations/nla/), or [download a PDF](https://doi.org/10.5281/zenodo.15771695) from Zenodo.*

&lt;audio src=&#34;https://cdn.uploads.micro.blog/8371/2025/tim-sherratt.mp3&#34; controls=&#34;controls&#34; preload=&#34;metadata&#34;&gt;&lt;/audio&gt;

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/harold-white-2.jpeg&#34; width=&#34;600&#34; height=&#34;430&#34; alt=&#34;&#34;&gt;

*Photograph by Christopher Brothers, 2012, [nla.gov.au/nla.obj-1...](https://nla.gov.au/nla.obj-132272018)*

## I. Beyond discovery

Thanks Marie-Louise, and thanks to the library for this great opportunity. And of course thanks to all of you for coming along on a night when I&#39;m sure you&#39;d rather be at home waiting for the budget speech. And this is the API working away here in the background. Okay, well, do I really need to introduce the newspaper database? I suspect I probably don&#39;t for this sort of audience. You&#39;re probably avid users of the digitized newspapers online. Are you? Yeah. I did my doctoral research back in the dark ages before Trove, and of course that meant spending many weeks, if not months, destroying my eyesight on microfilm readers, using quite fragmentary printed indexes to try and find material which might be relevant to my study. But now, of course, there are more than 60 million newspaper articles online, and most importantly the full text of these articles is searchable. It&#39;s something we&#39;re quite familiar with now, but it&#39;s quite revolutionary in many ways.

This unprecedented access to a vast volume of material which documents the ordinary lives of Australians is already changing historical practice. We can now go beyond the well-known events, the big stories and explore the small stories, the fragments, the glimpses of lives which might not otherwise be recorded, but this access comes with a cost. What happens when we do a search and instead of getting 10 results or 100 results, we get 10,000 results or 100,000 results? How do we start to use or understand that sort of thing? What do we do when instead of the clarity and excitement of discovery, we end up with the anxiety and confusion that can come with overwhelming abundance?

Fortunately, though, there are a growing number of digital tools which we can turn to. Tools and technologies which enable us to manage this deluge and to explore large volumes of text rather than single search results. Tools that enable us to zoom out of our search results and look at the big picture, to understand the trends and the patterns, to see what&#39;s going on. For example, perhaps we might want to track events over time. This graph shows the prevalence of the words &#39;drought&#39; and &#39;floods&#39; in the newspaper database over time.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-003.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

So we can look at that and map it to specific events. We can see here, of course, the Federation Drought at this point. We can also start to look for patterns that aren&#39;t easy to see within a normal list of search results.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-004.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

I was interested in having a look at how the word &#39;decade&#39; might be used. So I searched for &#39;decade&#39; and found, as you can see, these nice regular peaks, and I wondered why we have such regular peaks. I did a bit more digging and I discovered why. The red line shows the usage of the word &#39;census&#39;. You see how the little peaks sit on top of each other?

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-005.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

So obviously the time we talk most about decades is when a census has come out. Again, this is the sort of pattern which would be very hard to find any other way, just by working through a list of search results. We can also use these sorts of technologies for exploring changes in language, the way we talk about things, the labels we use.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-006.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

This is an example which I&#39;ve taken from the National Library of New Zealand, searching through New Zealand newspapers in this case. What they&#39;ve done is look at the change in usage of the name for the south island, which was apparently (I didn&#39;t know this) originally called Middle Island before it changed to the South Island. And so you can see here the process of transition happening before South Island takes over completely.

We can also challenge our expectations. Now, I was always of the belief that the traditional name, for people from an English cultural background, of that chap who wears a red suit and comes around at Christmas time was Father Christmas, and that in recent years it has been supplanted by the Americanized Santa Claus. But it seems I&#39;m wrong.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-007.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

The red line here is Santa Claus, the blue line is Father Christmas. And if we look here, from the late 19th century to the early 20th century, Santa Claus is definitely winning. What&#39;s really interesting, though, is when we get the changeover. Any guesses as to what&#39;s going on there?

&gt; **Audience:** Coke advertising.

Pardon?

&gt; **Audience:** Coca-Cola advertising.

Well actually, I don&#39;t know. My hypothesis is that this is around 1914, and it seems that over the war period Father Christmas starts to win out over Santa Claus. Whether it&#39;s the Germanic sound of Santa Claus causing it to lapse in popularity, or completely other circumstances, this is pure hypothesis at this point, and it&#39;s something which would be interesting to explore. But that&#39;s the value of these sorts of things: they allow you to ask questions and prompt you to do other sorts of investigation.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-008.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

Now, these graphs which I&#39;m showing you were all created by a tool I developed called [QueryPic](https://glam-workbench.net/trove-newspapers/querypic/). And I won&#39;t just show you the slide, we&#39;ll actually use it. I want a word. Anybody give me a word?

&gt; **Audience:** Brooch.

Brooch?

&gt; **Audience:** Because yours is nice.

I&#39;m not sure it&#39;s going to show anything. Yeah, brooch.

&gt; **Audience:** What about &#39;automaton&#39;, the one we talked about last week?

You have to spell it for me.

&gt; **Audience:** A-U-T-O-M-A-T-O-M. Actually, correct. That&#39;s what you...

No, no, no, this is right. Okay. So what this is doing now is actually going off to the Trove API and getting the total results. It&#39;s actually a very simple tool. All it&#39;s doing is taking your query, searching for each year across the span of the newspaper database, getting the total number of results for each year, and then presenting them in the form of a graph. As I say, it&#39;s very simple, but it&#39;s also quite effective, as you can see. It&#39;s useful, and it&#39;s also quite fun. What it gives you is the ability to quickly explore a hunch, to get a sense of context, or to start framing a more specific research question without spending... there we go... without spending days searching or tabulating as you would normally have to do. So you can see how easy it is to use, and if you want to compare that to something else, you can just type in another word.
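
*In outline, that&#39;s just a loop over years. A minimal sketch of the idea in Python; the endpoint, parameter names, `date:` range syntax, and response shape here follow the Trove v2 API as I recall it, so check them against the current documentation, and the API key is a placeholder.*

```python
import requests

API_URL = 'https://api.trove.nla.gov.au/v2/result'  # Trove v2 endpoint, as I recall it
API_KEY = 'YOUR_API_KEY'  # placeholder - you need your own key

def totals_by_year(term, start=1803, end=1954):
    # Return {year: total matching newspaper articles} for a search term.
    totals = {}
    for year in range(start, end + 1):
        params = {
            'key': API_KEY,
            'zone': 'newspaper',
            'q': f'{term} date:[{year} TO {year}]',
            'n': 0,  # no records needed, just the total
            'encoding': 'json',
        }
        data = requests.get(API_URL, params=params).json()
        totals[year] = int(data['response']['zone'][0]['records']['total'])
    return totals
```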

Okay, now there are obvious limitations to a tool like this. There&#39;s a lot to unpack. I wouldn&#39;t want to say that it&#39;s evidence, because there are so many assumptions built into the back end of it: questions about what the search engine is actually giving you back, different usages of terms, the contexts, things like the quality of the OCR itself. There&#39;s a whole lot of stuff. But despite all that, I think it is quite useful, as I said, in terms of allowing you to explore things quickly and follow your hunches. I regard it as a starting point, not an end.

Now, there are some folks... let me see if it&#39;s going to finish... there are some folks who are a bit more confident about techniques such as this, and who would suggest that not only can they provide evidence, but they can actually be used to develop mathematical representations of past behavior.

## II. Finding formulas

You may have heard of the Culturomics project from Harvard University. These guys got access to the full corpus of Google&#39;s digitized books: 5 million books, the text of 5 million books. They pulled it all apart, did a bit of cleaning up of the metadata, all sorts of stuff, and then they started searching it to see what they could pull out of it. And when they started searching, they noticed all sorts of patterns appearing, and they argued that these patterns could form the basis for what they called a new science of culture, hence &#39;culturomics&#39;.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-010.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

There&#39;s a lot I might say about that, but I just want to look at one example. This is an example which, in their published paper in Science, they called &#39;We forget&#39;, and I generated it using an online tool called the Ngram Viewer. You can go and do this yourself if you like. What it&#39;s showing, as you might be able to see, is searches for years used within the text: 1883, 1910, 1950. It&#39;s pulling out all the instances where those labels are used within the text. And there does seem to be some sort of pattern. The researchers noticed that the graphs have a characteristic shape: a rapid ascent and then a decline. But they also noticed changes. The size of the peaks grows over time, which they say indicates a greater focus on the present, and the rate of decay increases, so that the peaks drop away faster. And they say from this that we are forgetting our past faster with each passing year.

I thought it would be interesting to repeat this experiment using QueryPic. So I did. It looks a bit different.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-011.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

Before we could interpret this difference, of course, there&#39;s a lot that we would want to ask; first of all, methodological questions. Exactly what are we searching in the two instances, and how can we compare searching books in one instance to newspapers in the other - dates obviously play a different role in newspapers than they do in books. But it was actually the conceptual issues which really struck me in relation to this example, and in particular the assumption that we can compare the past, present, and future uses of these labels as if we are talking about the same thing: as if the label 1950 means the same thing before 1950, in 1950, and after 1950. The names for events and periods that we assign, that we share, that we use, are themselves the products of historical processes. They slip, they shift, they change.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-012.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

We all know what we mean by the Great Depression. But where&#39;s the Great Depression on this graph? In terms of usage at the time, the term &#39;Great Depression&#39; was actually used more in the 1890s than in the 1930s.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-013.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

We&#39;re very familiar with the usage of &#39;black&#39; days, like Black Tuesday. Black Friday, of course, is the one we&#39;re most familiar with. In Australia these labels are generally attached to bushfires, and that&#39;s the context in which we generally understand them, use them, and remember them. And over here, of course, we have Black Friday. So what&#39;s this big peak here? It&#39;s not a bushfire. It&#39;s Black Wednesday: the Victorian government&#39;s mass sacking of senior civil servants and judges in 1878. It was an extremely important event at the time, an extremely important event in government in Victoria, but it doesn&#39;t figure in our collective memory in the same way as Black Friday does.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-014.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

One of my early experiments with QueryPic was to look at the question of when the Great War became the First World War. At what point did we stop thinking about the Great War as the war to end all wars, and realize that it was one in a series of global conflicts? The graph does a nice job of confirming our expectations, I suppose, in that we see a nice crossover late in 1941, which is about when we would probably expect if we think about the course of the war. But what&#39;s missing from this? What&#39;s missing, of course, is just &#39;the war&#39;.

Just as with the Great Depression and Black Wednesday, what&#39;s hard is trying to recapture the moment as it was happening, the sense, for want of a better word, of present-ness. Now, if we go back to &#39;We forget&#39;, what are we doing when we&#39;re talking about one of these dates? If we think of the present as providing a line there - past, present, future - on the past side, what are we doing? We&#39;re anticipating. We&#39;re predicting. Perhaps we&#39;re dreading. In the present, we&#39;re experiencing, we&#39;re enjoying, maybe suffering. In the future, we are remembering, we are regretting, perhaps reflecting. So instead of lumping all these together, it seems to me that we should be teasing them out and exploring their different interconnections.

We should be trying to give the past back its own sense of the present. And this, in essence, was the modest and thoroughly achievable goal of my Harold White Fellowship. I wanted to explore the possibilities of the digitized newspaper collection in supporting this sort of rich temporal contextualization: using digital methods to recover the pasts, the presents, and the futures of any moment in our history. I have to admit I haven&#39;t got very far yet, and Marie-Louise has been doing a good job of reassuring me that sometimes the fruits of these things take a while to develop. Now, there are a number of reasons why I haven&#39;t gotten as far as I wanted, but I do have a few sketches that I want to share with you.

## III. The future of the past

Okay. What I decided to do was try to create a manageable sample set. So I decided to work with articles which included the phrase &#39;the future&#39; in the heading or the first four lines of the article - that&#39;s one of the facets you can use within Trove. Why did I limit it in this way? Well, I&#39;ve been doing a lot of different work in Trove, as Marie-Louise said. One project I&#39;ve been working on was looking at ways of finding editorials within Trove and exploring the content of editorials over time. In doing that, I discovered a number of frustrating things, one of which is that sometimes the articles aren&#39;t divided up as nicely as we&#39;d want them to be. Particularly with editorials: editorials on different subjects are often joined together, so it&#39;s difficult to separate out the specific ones you want. But I thought that by limiting my search in this way I&#39;d increase my chances of relevance. It also brought the number of matches down to what I thought was a reasonably manageable 60,000 or so.

So I started harvesting those 60,000 articles. I have, over time, been developing a number of tools for working with Trove, one of which is a harvester that enables you to get the data in bulk. And of course that&#39;s necessary if you&#39;re going to do this sort of large-scale analysis. I modified my existing harvesting tools to save the results directly into a database, and when the API became available, I modified them to use the API, which makes a lot of things easier. Now, after about 40,000, I thought I probably had enough, and I decided I&#39;d trust in Trove&#39;s relevance ranking and just work with that set.
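
*The bones of a harvester like this are simple: page through the API results and write each article straight to a local database. A sketch only; the `s` (start) and `include=articletext` parameters and the field names are my recollection of the v2 API and should be checked against the documentation.*

```python
import sqlite3
import requests

API_URL = 'https://api.trove.nla.gov.au/v2/result'
API_KEY = 'YOUR_API_KEY'  # placeholder

def harvest(query, db_path='articles.db', batch=100, limit=40000):
    # Page through newspaper results, saving id, date, heading and text to SQLite.
    db = sqlite3.connect(db_path)
    db.execute('CREATE TABLE IF NOT EXISTS articles (id TEXT PRIMARY KEY, date TEXT, heading TEXT, text TEXT)')
    for start in range(0, limit, batch):
        params = {'key': API_KEY, 'zone': 'newspaper', 'q': query, 'n': batch,
                  's': start, 'include': 'articletext', 'encoding': 'json'}
        records = requests.get(API_URL, params=params).json()['response']['zone'][0]['records']
        articles = records.get('article', [])
        if not articles:
            break  # ran out of results before hitting the limit
        for a in articles:
            db.execute('INSERT OR REPLACE INTO articles VALUES (?, ?, ?, ?)',
                       (a['id'], a.get('date'), a.get('heading'), a.get('articleText')))
        db.commit()
    db.close()
```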

And then it was time to do some cleaning. Now, Trove&#39;s crowdsourced OCR correction project has been a wonderful success, of course, but it&#39;s worth noting that of the sample of articles I harvested for this project, only 2% had any corrections at all. So 98% were totally uncorrected, totally untouched. While I couldn&#39;t hope to correct all of those articles myself, I could at least try to reduce some of the noise created by OCR errors. So I developed a series of scripts to try and clean up some of that OCR output. First of all, they corrected some fairly consistent and hopefully fairly unambiguous OCR errors. And you can test yourself here. What&#39;s that meant to be?

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-018.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

&gt; **Audience:** His.

Pardon?

&gt; **Audience:** His.

Nope. The.

What about this one? Nah, &#39;the&#39;. This one? No, &#39;of&#39;. You should get that one. Ah, yep. And you can check. There we go. Look, &#39;of&#39;, ah, yep. So there is a series of these which I could just fix up through a script. I then checked each word in the text against a series of dictionaries and word lists, including a word list of places which I extracted from the gazetteer provided by Geoscience Australia. Anything which didn&#39;t seem to match up, I marked up in a way that I could extract later if I wanted to. And all of this, you&#39;ve got to understand, went through a lot of trial and error: just trying stuff out, seeing what it produced, trying it again, fiddling with it. Lots and lots of trial and error.
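
*To give a flavour of what such a cleanup script might look like; the substitution list here is a made-up example, not the actual list used.*

```python
import re

# Hypothetical examples of consistent, (hopefully) unambiguous OCR confusions.
SUBSTITUTIONS = {'tlie': 'the', 'tho': 'the', 'ot': 'of'}

def load_wordlist(*paths):
    # Combine dictionaries and word lists (e.g. a places gazetteer) into one set.
    words = set()
    for path in paths:
        with open(path) as f:
            words.update(w.strip().lower() for w in f)
    return words

def clean(text, known_words):
    # Fix consistent OCR errors, and mark up unrecognized words for later extraction.
    tokens = re.findall(r'[A-Za-z]+|[^A-Za-z]+', text)  # keep non-word runs intact
    out = []
    for tok in tokens:
        if tok.isalpha():
            tok = SUBSTITUTIONS.get(tok.lower(), tok)
            if tok.lower() not in known_words:
                tok = f'[?{tok}]'  # flag as a non-word
        out.append(tok)
    return ''.join(out)
```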

But after that, I could do some fun things. You&#39;re all of course familiar with word clouds, but I bet you haven&#39;t seen a non-word cloud. This is my non-word cloud. 

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-019.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

Now, of course, the big question is: what is &#39;others&#39; doing there? I don&#39;t know. For some reason my word list didn&#39;t like the word &#39;others&#39;. But you can see here some more consistent OCR errors. There&#39;s another &#39;the&#39;, and another &#39;the&#39;, and that would be a &#39;be&#39; in most cases. We can also see where words have been split up. We&#39;ve got a &#39;tralia&#39; down there. Oh, and that&#39;s a &#39;which&#39;, obviously. So it&#39;s actually quite useful visualizing things in this way, because I can then feed it back into my process of cleaning. I can see where the common errors are and feed that back into the process.

For each article that I processed in this way, I generated an accuracy score, which was simply the number of recognized words divided by the total number of words in the article. And I could use these scores to develop a couple of overviews.
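
*The score itself is trivial to compute. A sketch, reusing a `known_words` set like the one built from the dictionaries and gazetteer above.*

```python
import re

def accuracy(text, known_words):
    # Accuracy score as described: recognized words / total words.
    words = re.findall(r'[A-Za-z]+', text)
    if not words:
        return 0.0
    recognized = sum(1 for w in words if w.lower() in known_words)
    return recognized / len(words)
```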

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-020.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

So, of my set - and this is only my sample, of course - this is OCR accuracy over time. There aren&#39;t many articles in this earlier period, so it&#39;s probably not worth worrying about too much. But what&#39;s interesting is this decline here, down to the 1920s, where we&#39;re going below the 80% mark. Why is that? I&#39;ve got no idea. There are a whole lot of variables which could be involved here: the fonts, the quality of the printing, the quality of the paper, the quality of the microfilming. I don&#39;t know. It&#39;s something which would be interesting to explore and investigate further.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-021.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

We can also have a look at the poorest performing newspapers. The *Perth Gazette and West Australian Times* didn&#39;t do too well; it got 58% on my scorecard. Again, this is only a select sample, so I&#39;m not quite sure what you can read into any of this, but it&#39;s sort of interesting. These figures weren&#39;t particularly important for my work, but I do think the general issue of OCR quality is vitally important, particularly as we make more and more scholarly use of these sorts of collections in bulk. Obviously we need to improve the quality, but we also need to expose the assumptions about OCR quality that underlie our work, so that when we put forward some sort of analysis of the text, we&#39;ve got a way of communicating the quality of the material we&#39;re working with.

I then decided to make my sample set even more manageable by selecting just the first 10,000 articles with accuracy figures of over 80%. So I used my scores, went through, and chose just those that seemed to have fared pretty well. Of course, as any good digital humanities person does, I then started counting words.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-022.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

As with most of the stuff I&#39;m showing you tonight, [it&#39;s online and you can go and play with it](https://timsherratt.au/shed/presentations/nla/pages/frequencies.html) yourself. This shows the word frequencies over time, and there&#39;s a time slider here; you can just drag it along and see what&#39;s happening in different years. Now, nothing really significant jumps out at me from the word frequency clouds. What is sort of interesting, I suppose, is the preponderance of &#39;would&#39; and &#39;could&#39;, which I suppose confirms the future orientation of the sample set I&#39;m working with. There may well be other things in there that jump out at you. So, as I say, jump online, have a play with this, and see what you can make of it.
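
*Counting words is the easy bit. A sketch, where `articles_by_year` is an assumed structure mapping years to lists of article texts, and you&#39;d want a decent stopword list.*

```python
import re
from collections import Counter

def word_frequencies(articles_by_year, stopwords=frozenset()):
    # Count word frequencies per year, skipping stopwords.
    freqs = {}
    for year, texts in articles_by_year.items():
        counter = Counter()
        for text in texts:
            counter.update(w for w in re.findall(r'[a-z]+', text.lower())
                           if w not in stopwords)
        freqs[year] = counter
    return freqs

# e.g. freqs[1939].most_common(20) gives the top 20 words for 1939
```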

Word frequencies, okay. Word frequencies can be interesting for getting an overall picture of a large amount of text and starting to track some changes over time. But this sort of word frequency tells you what&#39;s common. It doesn&#39;t tell you what&#39;s distinctive, what&#39;s interesting, in an article. Another measure we can use to try to get at the distinctiveness of a piece of text is something called TF-IDF: an acronym for term frequency-inverse document frequency. What it does is look not just at the frequency of a word within a particular piece of text, but also at the frequency of that term across a collection of texts. So a word that is common in a particular article, but not very common in the collection as a whole, will appear as more significant, more heavily weighted, in its TF-IDF value.

You use TF-IDF values all the time: they&#39;re used by search engines in calculations of similarity. You can take the TF-IDF values, convert them into a mathematical format (a vector), and use that to calculate the similarity between two pieces of text. And the results of calculating TF-IDF values for collections like this are pretty interesting. I&#39;ll just show you a little comparison.
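
*Since TF-IDF and similarity both come up again later, here&#39;s a hand-rolled sketch of the two. This is the textbook formulation (term frequency weighted by the log of inverse document frequency), not necessarily the exact variant used here.*

```python
import math
import re
from collections import Counter

def tf_idf(docs):
    # TF = term count / document length; IDF = log(N / number of docs containing the term).
    tokenized = [re.findall(r'[a-z]+', d.lower()) for d in docs]
    n_docs = len(docs)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # each doc counts once per term
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights

def cosine(a, b):
    # Cosine similarity between two sparse TF-IDF vectors (dicts).
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0
```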

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-023.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

So on the left-hand side here we have 1939. The top 10 words on the left are just the plain frequency values, and here are the TF-IDF values. You can see we&#39;re getting at something quite different, and quite interesting. It&#39;s 1939, but Hitler doesn&#39;t figure in the frequency list; he&#39;s at the top of the TF-IDF one. We also get these really odd things, like &#39;midget&#39; and &#39;roundabout&#39;. I found it really interesting producing these values. I found them quite evocative, encouraging me to explore more, and I&#39;m going to talk some more about this a bit later.

Finally, I just wanted to show you one other way of understanding a collection of texts, and that&#39;s through a thing called topic modeling. There&#39;s a lot of topic modeling going on in the digital humanities at the moment, and there are a number of good blog posts, which I&#39;ll put links to from here, that explain what topic modeling is. I&#39;m just going to race through it quickly. Basically, I used a piece of software called Mallet. I pointed Mallet at my collection of texts, told it that I wanted it to define 10 topics, that is, 10 clusters within those texts, and it just did it.
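
*Mallet is a Java command-line tool, so there&#39;s no Python in the original workflow; purely as an illustration of the same idea (LDA), here&#39;s a rough equivalent using scikit-learn. The parameters are illustrative, and Mallet&#39;s Gibbs sampler will give somewhat different results.*

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def topic_model(docs, n_topics=10, n_top_words=10):
    # Fit an LDA topic model; return the top words per topic and per-document topic weights.
    vectorizer = CountVectorizer(stop_words='english', max_df=0.95, min_df=2)
    counts = vectorizer.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(counts)  # rows: documents, columns: topic weights
    vocab = vectorizer.get_feature_names_out()
    topics = [[vocab[i] for i in comp.argsort()[::-1][:n_top_words]]
              for comp in lda.components_]
    return topics, doc_topics
```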

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-024.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

And what it came back with are these lists of words, grouped according to the topics it believes exist. You can then go through these lists and start to interpret them, to try to understand what the topics are. Most of them are pretty clear. This one, of course, is the topic that tells me I still didn&#39;t clean up the OCR enough, but it&#39;s interesting that it brought all the errors together.

Here we&#39;ve got trade, here technology, here land/rural, here international relations, here government, and here home and society. And it&#39;s amazing, once you run these things, how much sense the topics actually make to you.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-025.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

It also goes through each article in your collection and weights it according to these topics. So for each article you can see which is the most heavily weighted topic, and you can total the numbers associated with each topic and produce something like this. Okay, that&#39;s not terribly instructive as it is, but if you [go to the live version](https://timsherratt.au/shed/presentations/nla/pages/topic_totals.html) and click on the legend down the bottom, you can take away some of the lines, so you can see what&#39;s happening underneath and just look at the lines you&#39;re interested in.

Basically, I want to do a lot more work on these topics; at this stage I haven&#39;t really done much interpretation of them. I want to see how I&#39;m using those weightings and find better ways of looking at them. So anyway, here I am. Not a lot of interpretation at this stage. No great insights. I have a dataset, and I&#39;m going to keep playing with it. And as I&#39;ve said, all this stuff is available online, so you&#39;re welcome to come and play with it too and see what you can make of it. Now, you may think that I&#39;ve gone into a lot of tedious detail about what I did. Well, I&#39;ve actually saved you from a lot of the gory details.

## IV. Meanings for mining

The truth of much research in the digital humanities is that large amounts of time are spent yak shaving and data munging. If you don&#39;t know the term &#39;yak shaving&#39;, it&#39;s that process we&#39;re all familiar with: you start on a particular task and realize that, in order to achieve it, you have to do something else first, or research something else, and this continues in infinite regression until you find yourself doing something which seems totally unrelated to the task you started with. I&#39;ve had a lot of that recently. There were lots of issues just involved in using this data and starting to manipulate it. As I&#39;ve said before, the issue of OCR quality is crucial, and we have to be upfront about the problems and continue to look for the most effective solutions. We also have to talk about questions of selection and completeness. What&#39;s actually in Trove? How does it change, and how does this influence the results we get?

One of my examples here is a thing called the Atomic Age Exhibition, which toured around Australia in 1948-49. It was a big thing. Many, many thousands of people visited. It was at the Easter Show in Sydney. If you search in Trove for the Atomic Age Exhibition, you&#39;ll find quite a lot of results coming from the Courier-Mail in Brisbane. You&#39;ll find virtually nothing from Sydney or Melbourne, and you might be inclined to think that the exhibition didn&#39;t go to Sydney and Melbourne. So why is there nothing from Sydney and Melbourne? Because the exhibition was sponsored by the Herald in Melbourne and by the Daily Telegraph in Sydney, and neither of those titles is currently in the newspaper database.

So we&#39;ve got to bring these sorts of questions and perspectives to this research. Another barrier I started to butt my head up against was computing power. Generating the TF-IDF values for my sample took about a day and a half on my laptop. And then, of course, you realize you did something stupid and have to do the whole thing again. I did wonder at various times whether I was reaching the limits of what&#39;s practically possible for one bloke and his laptop, and whether my piecemeal efforts would be blown away by academic teams with access to research funds, bright young graduate students, and time on a supercomputer.

Now, this list of problems and concerns might seem a bit depressing, and it might not be what you expected from this talk, but I want to reassure you: there are digital tools that make it easy to get started and explore the possibilities. QueryPic, of course, and other things like [Voyant](https://voyant-tools.org), which is a great online tool for starting to do text analysis. But sooner or later you&#39;re going to have to confront some pretty hefty questions. But hey, that&#39;s just history. The past is messy, and it raises difficult questions about things like selection and interpretation. The issues aren&#39;t necessarily new, it&#39;s just that they&#39;re raised in a bigger, more technically challenging context.

But what really does bug me is a nagging feeling that I should be taking statistics more seriously. That in constructing the sorts of examples I&#39;ve been showing and the tools I&#39;ve been demonstrating, I should be less impressionistic and more rigorous, as if I&#39;m not doing justice to the vast computing power I have at my disposal. But I don&#39;t want to do that. In January, I was at the American Historical Association meeting, and I was able to see the culturomics guys live, in person, doing their spiel. And as they described their vision for a new science based on access to these huge cultural data sets, I tweeted: &#34;Yeah, I want to use big data to tell better, more compelling, more human stories.&#34;

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-027.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

The British historian Tim Hitchcock has similarly described his own unease that the demands of big data seem to be moving him towards a more positivist style of history. In the humanities, we&#39;ve been fortunate to make use of many decades of research into things like information retrieval. We&#39;ve adopted many of their concepts, their tools, and their formulae, but we&#39;ve also adopted some of their language. And so we talk about what we&#39;re doing as mining. Mining is an extractive process: we dig stuff up, we pull it out of the ground. But this seems to be pretty much the opposite of what I want to do. I do want to find structures and separate them out for different types of analysis, but then I want to put them back together. I want to observe them in contexts as rich and as complex as possible. How do we do that?

Well, first of all, we have to work out better ways of incorporating these sorts of big data perspectives into the narratives that we write. Just as QueryPic gives you the opportunity to zoom out and get the big picture, I think we have to take control of the zoom and use it to our advantage. And this, by the way, probably means developing new forms of publication that allow easier and better integration of data and text. It&#39;s challenging, but there&#39;s not much point dwelling on the dangers and problems of big data; as Tim Hitchcock concludes, we simply need to get on with it.

## V. Screwmeneutics and deformance

The second approach is to foster acts of creative subversion, to use digital tools in new ways. Literary scholars within the digital humanities talk about the possibilities of deformance: using computational methods to change texts in ways that can open them up to new critical perspectives. Stephen Ramsay also talks about moving beyond traditional forms of search and browse, and admitting &#39;screwing around&#39; as a legitimate research methodology. Of course, historians don&#39;t want to start deforming their sources. Or do they?

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-029.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

This is an experiment I created called [The Real Face of White Australia](https://www.realfaceofwhiteaustralia.net/faces/?rsort=3). I always get a bit teary when I put this up. What I&#39;ve done here is use computer vision software to extract portrait photographs from certificates which were used in the administration of the White Australia policy. These are records held by the National Archives of Australia. There are several thousand of these, and this is just from one series; you can keep scrolling and scrolling forever, or almost forever. So by manipulating the sources in these ways, by extracting those photographs, I&#39;ve created a new way of seeing these records, and it&#39;s quite powerful.
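
*The general technique is easy to reproduce today. A minimal sketch using OpenCV&#39;s stock Haar cascade face detector; this is not the original script, and the detector parameters are typical defaults rather than tuned values.*

```python
import pathlib
import cv2

# Haar cascade face detector shipped with OpenCV
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def extract_faces(image_path, out_dir='faces'):
    # Detect faces in a scanned certificate and save each as a cropped image.
    pathlib.Path(out_dir).mkdir(exist_ok=True)
    image = cv2.imread(str(image_path))
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for i, (x, y, w, h) in enumerate(faces):
        crop = image[y:y + h, x:x + w]
        cv2.imwrite(f'{out_dir}/{pathlib.Path(image_path).stem}-{i}.jpg', crop)
```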

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-030.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

But we can also be playful. You may have seen this. This is a little game that I created using the newspaper database. Again, it&#39;s very simple. It just picks a newspaper article at random from the database and asks you to try and guess the year in which it was published. So any guesses for this one? What would we say? Let&#39;s say 1850... That&#39;s a bit later than that... Let&#39;s see, it&#39;s earlier. Okay, so you can keep going like this. You can go and try it out yourself later. As I said, it&#39;s very simple, but it&#39;s also strangely addictive. And of course, it&#39;s also a way of exploring the content of Trove by screwing around.
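
*As far as I know the API has no &#39;random article&#39; call, so one way to fake it is to pick a random year and a random offset into the results. A hypothetical sketch, not necessarily how the game actually works, with the same caveats about v2 parameter names as before (and a bare `date:` query may need a search term added).*

```python
import random
import requests

API_URL = 'https://api.trove.nla.gov.au/v2/result'
API_KEY = 'YOUR_API_KEY'  # placeholder

def random_article():
    # Pick a pseudo-random newspaper article: random year, random offset.
    year = random.randint(1803, 1954)
    params = {'key': API_KEY, 'zone': 'newspaper',
              'q': f'date:[{year} TO {year}]',
              'n': 1, 's': random.randint(0, 99), 'encoding': 'json'}
    records = requests.get(API_URL, params=params).json()['response']['zone'][0]['records']
    articles = records.get('article', [])
    return articles[0] if articles else None
```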

QueryPic, The Real Face of White Australia, and my newspaper roulette game, [Headline Roulette](https://headlineroulette.net), also have something else in common. They are public. I want people to use them. I want people to have fun. I want people to be moved. I want people to find things, to be surprised, and to do history.

Just yesterday I received an email from a self-confessed Australian history addict... oh no, Australian history fanatic, sorry. She had become addicted to Headline Roulette, and she wanted to know if I could add a facility for users to save their scores, so presumably they could go back and see if they&#39;d improved, or share them with their friends. So obviously the next step is the Facebook application. Other people have described to me how scrolling through The Real Face of White Australia brought them to tears. And I&#39;ve come to realize that these sorts of interactions mean more to me than a footnote in an academic article. I&#39;ve probably just killed my hopes of an academic career there. I do want to use digital tools to deform history: to deform it in a way that makes it accessible to new audiences in new ways. And so I present to you, in honor of my Harold White Fellowship, a new experiment.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-031.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

Now, I described to you before the process involved in calculating the TF-IDF values. What I didn&#39;t describe was the fun I had while doing it. It was really quite exciting and amusing, watching the words fly past on the screen. As I completed each year, I had a little script which would show me the top 20 words for that year. Anybody who follows me on Twitter will have a good picture of what was going on, because I couldn&#39;t help but share it. The words tell their own story; it was like a wonderful puzzle as they all came up. And then I started tweeting some of them.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-032.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-033.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

This was a nice one. I like the &#39;hitler&#39; with &#39;mudguards&#39;, &#39;duchess&#39;, &#39;opossum&#39;, &#39;hollywood&#39; and &#39;canberra&#39;. 

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-034.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

And as I said here, of course, there&#39;s got to be a novel in &#39;prince&#39;, &#39;pronunciation&#39;, &#39;keyboard&#39;, &#39;zulu&#39;, &#39;begged&#39;, &#39;unbent&#39;, &#39;diddle&#39;, &#39;candlesticks&#39;, &#39;virtuoso&#39;, &#39;highness&#39; and &#39;pots&#39;. This started me thinking: was there a way I could share this experience and use the TF-IDF values as a way of exploring my dataset, a way of opening the experience to others, creating a sort of shifting, playful window on the future of the past? So this is my first attempt. Again, it&#39;s public: [go play with it](https://wraggelabs.com/fotp/).

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-035.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;

I&#39;ve deliberately tried to keep most of the metadata away from this interface because I wanted the words to be the focus. And yes, it does look a bit like that fridge poetry thing you can get, and that&#39;s quite deliberate. At some stage I want to add a box down here where you can drag your words down, make your own collections, and tweet them. What it&#39;s showing you is a random selection of TF-IDF values from my sample. You can click on any one of these, and it goes away and sees, first of all, how many years have that value attached to them. If there&#39;s only one year, it returns that year. If it appears in more than one year - let&#39;s see if we can find one that has more than one year - it pulls out a random selection of values from those years.

Okay. Oh no, we&#39;ve got 1943. I&#39;m not doing a good job of it this time. Anyway, you can have fun with it. And of course, if you want to see what&#39;s going on, you can click on these and it will load the articles here, and you can explore their text and see where the word pops up.

Okay, what is this? I&#39;m not quite sure. It&#39;s not really a discovery interface, although you can find interesting stuff. It&#39;s not quite a game, but it is quite fun to explore. I&#39;m sort of in love with it at the moment, because of what I&#39;m trying to do in terms of recapturing the present-ness of the past. Our experience of any moment is not just about the big stories of the day; it includes a whole lot of trivial aspects. And I love the way this brings together Churchill and Corpuscle and Melvin, whoever Melvin is. I love the mix of words; to me it&#39;s incredibly evocative. It makes you want to start imagining stories. It makes you want to explore, to find out more. It just has a wonderful, exciting aspect to it. So I&#39;m not quite sure what I&#39;m going to do with it or how I&#39;m going to develop it, but, as I say, I&#39;m quite in love with it at the moment, and I hope you&#39;ll have a play with it and see what you make of it.

Could it be a discovery interface? I don&#39;t know. It does enable you to get into my dataset, though obviously by rather indirect means, and it includes lots of randomness as well. And I&#39;m a big fan of randomness in developing new ways of discovery. So there you go. Please take it away, enjoy, play. I may not have conquered the meaning of time yet, but experiments like this make me think about the form in which I present these sorts of arguments and ideas. How do we create resources which give that sense of disjunction and serendipity? So while I may not have achieved all I wanted to, I&#39;ve come away with a better sense of what it is I&#39;m trying to do and what I want to do with this material. So thank you.

## Questions

**Marie-Louise Ayres:**

Thanks very much, Tim. Just before we open up for questions, there are a few things I wanted to say. One is: two Australians have corrected more than a million lines of text each. So if you think you couldn&#39;t correct 40,000 editorials, you are not being ambitious enough. That&#39;s the first thing. The second thing to say is that our own Trove team have found that the only surname that is not in Trove is Kardashian. And the third is just to note how amazingly creative these visualizations are that Tim has been doing, and I hope you&#39;ll ask him about them.

But the fourth thing I wanted to say is to pick you up on one of your early comments, where you said, &#34;I haven&#39;t got as far as I wanted.&#34; Now, that&#39;s a very interesting construction: it includes the past, the present, the future, and a spatial term as well. So maybe you need to think about that. I think we&#39;d all agree about the work that Tim has done: I don&#39;t know where he wanted to be, but he&#39;s gotten a long, long way and done things that the rest of us probably haven&#39;t even contemplated. So I&#39;m hoping you&#39;ll ask Tim some questions now, and then we&#39;ll have more opportunities afterwards. It&#39;s dark out there, so if you want to ask a question, can you make sure you raise your hand and speak up? Don&#39;t be shy. Yes.

**Q&amp;A 1:**

How appropriate would your methodology be for spelling? Where I&#39;m coming from is, I know the Australian Labor Party and the British Labour Party spell it differently. And I can remember once going through microfilms of the Sydney Morning Herald from the 1920s, and it dawned on me that all the spellings were American. So there must be things happening where you could compare how words are spelled. Or do the correctors corrupt your data just by correcting?

**Tim Sherratt:**

I certainly think you could do that sort of analysis. Actually, one of the nice examples the culturomics guys drew from the Google Books data was looking at changes in irregular verbs over time, which is quite interesting, and their data set goes back quite a long way. But there are challenges. One of the challenges in working with Trove is that the interface is geared towards discovery at the moment, towards making sure that people find what they&#39;re after. That means that if you want to find something exact, it can be a bit tricky. You&#39;ve got to know how to turn off the fuzziness in the searching. And sometimes you&#39;re foiled in that by the fact that people might have tagged something, and the search by default also searches the tags and the comments.

So, I don&#39;t know whether you noticed it in my World War I graph, but there was a little peak for &#39;World War I&#39; actually during the First World War, which is sort of interesting if you think about it. And that&#39;s because people had tagged those articles with &#39;World War I&#39;. So one thing I would always emphasize as we start to do this research is that we have to develop our literacy in terms of understanding search interfaces and how they work: be prepared to go into the documentation, to look at the advanced searches, and to experiment with what different searches bring back, so that you have a good picture of what&#39;s going on. Obviously the institutions themselves have a role in communicating this and exposing what&#39;s going on behind the scenes. But I think it&#39;s an important literacy for researchers going into the future, being able to pull these things apart to understand exactly what&#39;s going on, so you can do those sorts of quite detailed, fine-grained comparisons.

**Q&amp;A 2:**

I&#39;m interested in what years the pictures in The Real Face of White Australia are from, and whether those were the people that were actually accepted, or what?

**Tim Sherratt:**

Certainly not. Well, okay. I should say that this is part of a broader project called Invisible Australians. If you go to InvisibleAustralians.org, there&#39;s a lot more information about what we&#39;re trying to do with these records. As I think I said, those photographs were pulled from just one series within the National Archives of Australia, and there are many more series like that. They are a series of certificates. Basically, if a person deemed non-white was living in Australia and wanted to travel overseas and get back into the country again...

**Q&amp;A 2:**

So those were people that lived in Australia, rather than people that tried to enter?

**Tim Sherratt:**

No, because... Yeah, there was no trying to it. So yes, it is people who were living in Australia. And this is what is particularly interesting about these records, because what we want to do - and this isn&#39;t Trove related, I&#39;m sorry - is to extract the biographical information contained within those certificates, in order to find out more about the community who were living under the White Australia policy: people who were living here, whose activities were restricted in a number of ways by the White Australia policy in all its legislative forms. And we&#39;re bringing a number of digital techniques to bear to do that. As I said, in that particular case it was a face detection script which pulled out the photographs, but we&#39;re also harvesting material from the National Archives, and I&#39;ll be doing some topic modeling, as I showed there, on some of the records to try and pull out clusters within them. So anyway, check it out. It&#39;s something I&#39;m very passionate about.

**Q&amp;A 3:**

I&#39;m just wondering... it seems like the [inaudible 00:57:20] is a bit of an issue, as you talked about. And I guess there are probably a couple of aspects to that. One is the computer vision technology involved, and the other part is what you do after you&#39;ve got that. Is there anything clever you can do afterwards, with TF-IDF or whatever else, to try and get better quality? I don&#39;t know if you can describe how you see the next few years panning out with that. Do you think there&#39;s a lot of improvement going on?

**Tim Sherratt:**

There&#39;s certainly a lot of work going on and there are techniques which could be used. People are developing language models for specific scripts that tell you that if you have a certain combination of letters, you can expect a certain range of letters to follow, but not others. And you can use those in probabilistic techniques to go across the text and see what&#39;s likely to be at particular points. And specifically relating to digitized newspapers, there&#39;s a project in the US called Mapping Texts, which works with Chronicling America, the digitized newspapers from America. They went through and did something very similar to what I did, but with a bigger budget and access to Stanford&#39;s resources: things like the topic modeling and word frequencies. They also did what&#39;s called named entity recognition, which is pulling out people and places from the texts. And they ran through their sample set and generated figures for OCR accuracy, which are fairly similar to my figures actually.
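
*The idea is easy to sketch: build counts of which characters follow which contexts, then score words by how expected their characters are. A toy version in Python (a real system would smooth the probabilities and work in log space):*

```python
from collections import Counter, defaultdict

def train_char_model(corpus, n=3):
    # For each (n-1)-character context, count which characters follow it.
    counts = defaultdict(Counter)
    for text in corpus:
        text = text.lower()
        for i in range(len(text) - n + 1):
            context, nxt = text[i:i + n - 1], text[i + n - 1]
            counts[context][nxt] += 1
    return counts

def plausibility(word, counts, n=3):
    # Score a word by how expected each character is given its context.
    # Low scores flag likely OCR errors.
    score = 0.0
    for i in range(len(word) - n + 1):
        context, nxt = word[i:i + n - 1], word[i + n - 1]
        total = sum(counts[context].values())
        score += counts[context][nxt] / total if total else 0.0
    return score / max(1, len(word) - n + 1)
```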

So there&#39;s obviously a lot of recognition of this. In Europe, there&#39;s actually a particular research group which has been looking at methods of improving OCR. And of course there are many more cases which are much more complex than this, if you think about old Germanic scripts or something like that. So there&#39;s a lot of interest, a lot of concern, and a lot of work going on. I think it&#39;s something that we have to be prepared to revisit over time: there are going to be more possibilities for doing stuff with computers as this work comes online, and so we need to constantly reassess what&#39;s actually possible and see what we can do.
But yeah, I think, given the general awareness of the problem and the problems that it causes, there&#39;s certainly going to be a lot more work. And I think it&#39;s really exciting that we&#39;re now starting to get these collections of digitized newspapers all around the world, and the possibilities that opens up for doing comparative stuff. What I didn&#39;t mention with QueryPic is that you can also access New Zealand newspapers. It uses the DigitalNZ API to access Papers Past, so you can actually do graphs for New Zealand papers. But what you can&#39;t do meaningfully is compare Australian and New Zealand results, and that&#39;s because the DigitalNZ search currently covers the titles of articles and not the full text. Wouldn&#39;t it be really nice if we were both searching the same things and we could do those sorts of comparisons, and we could do it with the US, and we could do it with Canada? I think there are some really interesting possibilities there.
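
For the curious, the core of the QueryPic approach can be sketched in a few lines: ask the API for a year facet on a newspaper search and turn the counts into a time series. The endpoint, parameters, and response shape below assume version 2 of the Trove API and may have changed.

```python
# QueryPic-style sketch: harvest year-facet counts for a newspaper search.
# Endpoint, parameters, and response shape assume the v2 Trove API.
import requests

API_KEY = "YOUR_TROVE_API_KEY"  # placeholder

params = {
    "q": "influenza",
    "zone": "newspaper",
    "facet": "year",
    "n": 0,  # we only want the facet counts, not the articles themselves
    "encoding": "json",
    "key": API_KEY,
}
data = requests.get("https://api.trove.nla.gov.au/v2/result", params=params).json()

# Navigate to the facet terms; the exact shape here is an assumption.
terms = data["response"]["zone"][0]["facets"]["facet"]["term"]
series = sorted((int(t["search"]), int(t["count"])) for t in terms)
for year, count in series:
    print(year, count)
```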

**Q&amp;A 4:**

I was just going to say &#39;stine&#39; is very interesting there, because in my opinion it&#39;s obviously a column break from &#39;Palestine&#39;. That&#39;s a common sort of OCR error, E and S being a fragile combination. And not only that, the rules of line-breaking tend to produce chunks like that. It also shows how the TF-IDF is working: &#39;Palestine&#39; itself as a whole doesn&#39;t appear there because it&#39;s not actually that important, but &#39;stine&#39; got promoted because it&#39;s extremely uncommon, and &#39;Pale&#39; got dropped off... because &#39;pale&#39; is a very common word.

**Tim Sherratt:**

Yeah, that&#39;s nice. As you go through this, you will see other instances where that sort of OCR issue comes up again. But it&#39;s also another nice example of how, using computational techniques, we can start to improve some of the OCR: you&#39;re looking at the way words break and seeing if we can use that in some way. Thanks, that&#39;s great.
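
To make the TF-IDF point concrete, here is a minimal demonstration using scikit-learn and some toy documents (not real Trove text): a token that appears in every document, like &#39;pale&#39;, is down-weighted, while a rare fragment like &#39;stine&#39; floats to the top.

```python
# TF-IDF demonstration: common tokens are down-weighted, rare ones boosted.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "pale moonlight over the pale city",
    "a pale rider in the pale dawn",
    "pale fragments of stine in the ocr text",
]
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(docs)
vocab = vectorizer.get_feature_names_out()

# Compare the weights of 'pale' and 'stine' in the third document.
weights = dict(zip(vocab, matrix.toarray()[2]))
print("pale :", round(weights["pale"], 3))   # low: it appears everywhere
print("stine:", round(weights["stine"], 3))  # high: it appears nowhere else
```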

**Q&amp;A 5:**

Just before, when you said the word &#39;evocative&#39;, Tim, I was saying to myself, &#39;evocative&#39;, and so I wanted to talk about that for a minute rather than talk about a technical thing. It seems to me this is really interesting. I just want you to say more about whether this is a different kind of historical mode, this kind of desire to evoke the past rather than to necessarily narrativize or analyze, or define or pin down. Is there something distinctive about this evocative mode which is to do with the digital techniques, or what&#39;s going on?

**Tim Sherratt:**

Yeah, this is the thing I&#39;ve actually been trying to grapple with over the last few weeks as I started playing with this stuff. And I don&#39;t know exactly what it is. All I know is what I feel, as you do, when you see it. These things do make you start to think in different ways, and to imagine, and to make connections. I think with your work, with Cath&#39;s work on Semble, there are possibilities for creating spaces which encourage people to make connections, to see relationships between things. And I think digital technologies do lend themselves to that because, I don&#39;t know, as I said, I actually think randomness is something which is rather undervalued in terms of exploration and discovery. As you know, there&#39;s another project that we worked on called The History Wall at the National Museum of Australia, and that brought together material in quite a random fashion. And it was, again, quite evocative, in terms of being able to see the possible relations between items there.

As I said, I don&#39;t know what it lends itself to. What is the process? Is it discovery? Is it a prompt for research questions? I don&#39;t know. But it just seems to me to be something which is worth exploring more, and I find that it&#39;s something I keep doing, so I must be interested in it in some way. It&#39;s definitely something worth thinking about some more. There are all sorts of ways in which you can develop that sort of evocative sense; obviously historical photographs give you a different sort of feeling from a text description. So yeah, I don&#39;t know. Really interesting question.

**Q&amp;A 6:**

Tim, I&#39;d just like to congratulate you on your work. I find this really interesting, and I think it&#39;s really great that researchers like yourself take our data and play around with it, as you said, because ultimately some of those ideas do lead to really useful applications. And I just wanted to say your OCR accuracy results are actually bang on, because we did quite a bit of research on that five years ago before we launched, and it was 65% to 70%, which is of course low. That was why we asked how we could change that and get the public to help.

But you&#39;re quite right: as time moves on and as these big data sets are made more open and available, people develop technologies to improve that. Five years ago, an automated way to improve it didn&#39;t exist. We now know of at least three other people who, like yourself, have figured out how they could really increase that OCR accuracy rate and prove it. So I guess the question I would have is how some of this really fantastic research, like the work you mentioned in Europe, can be built back in to improve our services. But I just wanted to ask: did you find it really useful having the API, being able to get that data? Because I know that was a dream we&#39;d had for a long time, and I know you waited a long time for it.

**Tim Sherratt:**

Well, I didn&#39;t wait. I didn&#39;t wait, did I? I actually just went ahead and did it myself.

Yeah, look, the background, for those who didn&#39;t know, is that I built my own unofficial API at one point, which I used to do some experiments. But an official API obviously makes a whole lot of things easier. First of all, from the point of view of doing large data dumps, you&#39;re not downloading the whole web page and all the stuff on it; you&#39;re just getting the data in a structured way. Great. And as anybody who follows my work will know, I had a number of frustrating experiences where things changed on the web page, and everything I&#39;d created broke, and I had to fix it.

So APIs do away with all that. It&#39;s fantastic. But one of the really good things I like about having API access is how easy it makes it to do something like Headline Roulette. If you have an idea and you&#39;ve got a bit of coding experience, you can act on it and actually build something. And that to me is the most exciting aspect: encouraging people to actually experiment. That&#39;s what it&#39;s all about to me, creating an environment where people experiment with this stuff and build things.
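
As a sketch of why structured API access makes something like Headline Roulette so easy to build, here is a rough Headline-Roulette-style snippet: one request, structured JSON back, no fragile page scraping. Endpoint, parameters, and field names again assume version 2 of the Trove API.

```python
# Headline-Roulette-style sketch: grab some articles as structured JSON
# and quiz the player on one of them. Assumes the v2 Trove API.
import random
import requests

API_KEY = "YOUR_TROVE_API_KEY"  # placeholder

params = {
    "q": "weather",  # any broad query will do
    "zone": "newspaper",
    "n": 20,
    "encoding": "json",
    "key": API_KEY,
}
data = requests.get("https://api.trove.nla.gov.au/v2/result", params=params).json()
articles = data["response"]["zone"][0]["records"]["article"]

# Pick one article at random and show its headline; the date is the answer.
article = random.choice(articles)
print("Headline:", article["heading"])
print("Published:", article["date"])
```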

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15771695.svg)](https://doi.org/10.5281/zenodo.15771695)
</source:markdown>
    </item>
    
    <item>
      <title>A brief and biased history of Trove Twitter bots</title>
      <link>https://updates.timsherratt.org/2025/06/19/a-brief-and-biased-history.html</link>
      <pubDate>Thu, 19 Jun 2025 12:08:35 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/06/19/a-brief-and-biased-history.html</guid>
      <description>&lt;p&gt;The socials recently alerted me to an &lt;a href=&#34;https://doi.org/10.1177/13548565251334087&#34;&gt;interesting article&lt;/a&gt; by Dominique Carlon, Jean Burgess, and Kateryna Kasianenko on the history of community-created Twitter bots. The article explores bot-making within the context of Twitter&amp;rsquo;s rise and fall, and provides a handy taxonomy of bot species. However, it doesn&amp;rsquo;t include any Australian bots amidst the examples. That&amp;rsquo;s a bit disappointing, as I remember the bot-building years as a time of great fun and creativity. My own contribution to the world of Twitter bots was mainly focused on &lt;a href=&#34;https://trove.nla.gov.au/&#34;&gt;Trove&lt;/a&gt; (what a surprise!), so I thought I might as well jot down a few incomplete and biased notes about the history of Trove Twitter bots.&lt;/p&gt;
&lt;h2 id=&#34;trove-tweeting-trends&#34;&gt;Trove tweeting trends&lt;/h2&gt;
&lt;p&gt;It just so happens that I recently packaged up some &lt;a href=&#34;https://updates.timsherratt.org/2025/06/10/new-dataset-trove-links-shared.html&#34;&gt;data about Trove links shared on Twitter&lt;/a&gt;. Using this data, we can get a broad perspective on the activity of Trove Twitter bots between 2009 and 2020. The identification of bots is based on the Trove bots list I maintained on Twitter, so it&amp;rsquo;s possible I&amp;rsquo;ve missed some.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In total, 43 bots posted 318,767 tweets containing 270,474 unique Trove urls between June 2013 and December 2020.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a chart showing the total number of links to Trove shared by Twitter bots each year.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/bot-tweets-per-year.png&#34; width=&#34;600&#34; height=&#34;334&#34; alt=&#34;Bar chart showing the number of Trove links tweeted by bots per year from 2013 to 2020. The number of links rises dramatically in 2018 and reaches a peak in 2019, before dropping away in 2020.&#34;&gt;
&lt;p&gt;Most of the bots shared digitised newspaper articles, but some shared works from other Trove zones. This chart breaks the links down by the type of resource (&amp;lsquo;article&amp;rsquo; equals newspaper article).&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/bot-tweets-year-type.png&#34; width=&#34;600&#34; height=&#34;295&#34; alt=&#34;Bar chart showing the number of Trove links tweeted by bots per year from 2013 to 2020, with the type of linked resource indicated by colour. The number of links rises dramatically in 2018 and reaches a peak in 2019, before dropping away in 2020. The vast proportion of links are to newspaper articles, with a fairly consistent number going to other types of resources.&#34;&gt;
&lt;p&gt;And one final chart showing the number of active bots per year.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/active-bots-year.png&#34; width=&#34;600&#34; height=&#34;347&#34; alt=&#34;Bar chart showing the number of bots actively tweeting Trove links by year from 2013 to 2020. The numbers rise slowly to 2017, then rise dramatically in 2018, reaching a peak in 2019, and falling away in 2020.&#34;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;year&lt;/th&gt;
&lt;th&gt;active bots&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2013&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2014&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2015&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2016&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2017&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2018&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2019&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2020&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;From the data above you can see that bot activity grew slowly between 2013 and 2017, before taking off dramatically in 2018. The peak year for Trove bots was 2019, when 38 individual bots shared more than 100,000 links to Trove. But a mass extinction event in 2020 almost halved the number of active bots. So what happened?&lt;/p&gt;
&lt;h2 id=&#34;build-a-bot-begins&#34;&gt;Build-a-bot begins&lt;/h2&gt;
&lt;p&gt;In June 2013, inspired by bot creators like Mark Sample, I hooked the Trove API up to Twitter to see what would happen when &lt;a href=&#34;https://discontents.com.au/conversations-with-collections/index.html&#34;&gt;GLAM collections joined online social spaces&lt;/a&gt;. The result was @TroveNewsBot, sharing digitised newspaper articles from Trove.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/trovenewsbot-twitter.png&#34; width=&#34;520&#34; height=&#34;315&#34; alt=&#34;Screen capture of TroveNewsBot&#39;s original Twitter profile.&#34;&gt;
&lt;p&gt;Twitter bots started popping up around the world, sharing collection items from Europeana, the Digital Public Library of America, DigitalNZ, the Cooper Hewitt Museum, and the Brooklyn Museum, amongst others. But @TroveNewsBot was always a bit different. Instead of just sharing randomly selected resources, @TroveNewsBot helped people explore Trove without leaving Twitter. If you tweeted keywords at the bot, it would run a search using the API and tweet back the most relevant result. By adding hashtags, users could &lt;a href=&#34;https://github.com/wragge/trovenewsbot&#34;&gt;control a variety of search parameters&lt;/a&gt; – for example, if you included the hashtag #luckydip you&amp;rsquo;d get back a random article from your search results.&lt;/p&gt;
&lt;p&gt;My favourite bot behaviour was its &amp;lsquo;opinionator&amp;rsquo; mode. If you tweeted a url at @TroveNewsBot, it would retrieve the link, extract keywords from the text, and then search for those keywords in Trove&amp;rsquo;s newspapers. This enabled @TroveNewsBot to have conversations with other online resources – for example, it replied to tweets from DPLA and DigitalNZ, &lt;a href=&#34;https://wakelet.com/wake/fa91d582-33e5-400f-9c27-b6c1c5b992b8&#34;&gt;finding connections between different collections&lt;/a&gt;. I also used the &amp;lsquo;opinionator&amp;rsquo; mode to set up a dialogue between past and present. Several times a day, the bot would grab keywords from the latest news items on the ABC (later the Guardian) website, search for historic newspaper articles, and then tweet both stories, old and new.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/life-on-the-outside.041.jpg&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;Slide illustrating TroveNewsBot&#39;s opinionator mode. There are three images: a screen capture from an ABC news article about the Scottish Independence Referendum, a Trove newspaper article headed &#39;Scottish Independence&#39; from 1928, and a TroveNewsBot tweet linking the two.&#34;&gt;
&lt;p&gt;&lt;em&gt;@TroveNewsBot&amp;rsquo;s opinionator mode in action – a slide from my keynote presentation &lt;a href=&#34;https://doi.org/10.5281/zenodo.3566879&#34;&gt;&amp;lsquo;Life on the outside: connections, contexts, and the wild, wild web&amp;rsquo;&lt;/a&gt; for the Annual Conference of the Japanese Association of Digital Humanities in 2014&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;As well as providing digitised content, such as the newspapers, Trove aggregates collection metadata from hundreds of organisations around Australia and makes it available through its own API. This meant that any organisation could use the Trove API to create a Twitter bot that shared items from &lt;em&gt;their own collection&lt;/em&gt;. To encourage more of this sort of experimentation, I created the &lt;a href=&#34;https://github.com/wragge/trovebuildabot&#34;&gt;Build-a-Bot Workshop&lt;/a&gt; GitHub repository. This repository included instructions and code for anyone wanting to build their own collection bot on top of the Trove API. Like @TroveNewsBot, these collection bots could share random items and respond to user queries.&lt;/p&gt;
&lt;p&gt;Before long, @CurtinLibBot was sharing photos from the Curtin University Library&amp;rsquo;s image collection, and @Kasparbot was tweeting about objects from the National Museum of Australia. By the end of 2013, I&amp;rsquo;d &lt;a href=&#34;https://discontents.com.au/an-addition-to-the-family/index.html&#34;&gt;added to the family&lt;/a&gt; by creating @TroveBot. While @TroveNewsBot dug into the digitised newspaper articles, its younger sibling looked for inspiration amongst Trove&amp;rsquo;s other zones – sharing books, journals, photos, maps and more.&lt;/p&gt;
&lt;p&gt;In 2015, Steve Leahy unleashed @TrovePenguinBot upon the world, searching for sardines amongst the digitised newspapers. In 2016, one of my students at the University of Canberra modified the &lt;a href=&#34;https://github.com/lolibrarian/NYPL-Emoji-Bot&#34;&gt;NYPL Emoji Bot code&lt;/a&gt; to create @TroveEmojiBot – if you tweeted an emoji at the bot, it would respond with a suitably-themed newspaper article. In 2017, the &lt;a href=&#34;https://digitisethedawn.org/&#34;&gt;Digitise the Dawn campaign&lt;/a&gt; bot-ified their Twitter account, posting an article each day from Louisa Lawson&amp;rsquo;s journal, &lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/252&#34;&gt;The Dawn&lt;/a&gt;. Meanwhile, @astrove_bot started sharing newspaper articles relating to astronomy.&lt;/p&gt;
&lt;p&gt;And then there was The Vintage Face Depot&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;things-get-weird&#34;&gt;Things get weird&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;d been experimenting for a few years with &lt;em&gt;faces&lt;/em&gt; as a way of connecting to GLAM collections – as alternative entry points, based not on metadata but on &lt;a href=&#34;https://doi.org/10.5281/zenodo.3579530&#34;&gt;the people inside&lt;/a&gt;. In 2015, this led me to create &lt;a href=&#34;https://wragge.github.io/face-depot/&#34;&gt;The Vintage Face Depot&lt;/a&gt;. If you tweeted a photo of yourself to @facedepot, the bot would select a face at random from a collection I&amp;rsquo;d compiled from Trove newspapers and superimpose that face over yours, tweeting you back the result and a link to the original article, so you could find out more about the person you&amp;rsquo;d been matched with.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/unremembering-dh2015.038.jpg&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;Slide showing the operation of The Vintage Face Depot. There are three screen captures from Twitter. Each includes a portrait photo shared by a Twitter user, and facedepot&#39;s reply that includes a modified version of the photo with a face from Trove&#39;s newspapers overlaid, and a link to the original newspaper article.&#34;&gt;
&lt;p&gt;&lt;em&gt;@facedepot in action – a slide from my keynote presentation &lt;a href=&#34;https://doi.org/10.5281/zenodo.3566887&#34;&gt;&amp;lsquo;Unremembering the forgotten&amp;rsquo;&lt;/a&gt; for the Alliance of Digital Humanities Organizations Annual Conference in 2015&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Now, in a time of deep fakes and AI generated images, @facedepot&amp;rsquo;s efforts seem quaint and kludgy. But that was always the point. I wanted to mess around with the barriers that put some people on the other side of this wall we call the past – to explore what historian Devon Elliot suggested on Twitter was an &amp;lsquo;uncanny temporal valley&amp;rsquo;. As I argued in &lt;a href=&#34;https://discontents.com.au/the-perfect-face/&#34;&gt;The Perfect Face&lt;/a&gt;, a presentation at NDF2015:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Vintage Face Depot tells you nothing about yourself. I built it at about the same time as Microsoft launched their How-Old bot that uses machine learning to estimate your age. Face Depot does nothing clever, and yet sometimes the results are uncanny, even unsettling. Microsoft might be able to tell you how old you are, but Face Depot asks &lt;em&gt;who&lt;/em&gt; you are and pushes you in the direction of a past life, linked merely through chance.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/vintage-faces.gif&#34; width=&#34;600&#34; height=&#34;400&#34; alt=&#34;Animated gif showing some images generated during testing of facedepot&#34;&gt;
&lt;h2 id=&#34;glitch-bots-for-all&#34;&gt;Glitch bots for all&lt;/h2&gt;
&lt;p&gt;While I&amp;rsquo;d shared some bot-building code, rolling your own bot still required access to a web-connected server – a significant barrier for most would-be experimenters. This changed in 2017 with the arrival of Glitch, a platform that enabled anyone to build simple web apps for free. Perhaps most importantly, Glitch apps were remixable – simply by clicking a button, you could open an editor and create your own customised version of any app.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/glitch-trove-bots.png&#34; width=&#34;600&#34; height=&#34;372&#34; alt=&#34;Screen capture from the Trove page in Glitch, showing the four bot templates.&#34;&gt;
&lt;p&gt;Glitch seemed like an ideal environment in which to experiment with bots, so I created four remixable Trove Twitter bot recipes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;trove-collection-bot&lt;/strong&gt; – sharing resources from a partner collection&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;trove-list-bot&lt;/strong&gt; – sharing items from a Trove list&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;trove-title-bot&lt;/strong&gt; – sharing articles from specific newspapers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;trove-tag-bot&lt;/strong&gt; – sharing items with specific tags&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These were supported by a &lt;a href=&#34;https://101dhhacks.net/2018/01/21/trove-bots-for-all/&#34;&gt;detailed tutorial&lt;/a&gt; that walked through the process of customisation and suggested ways in which the basic recipes could be extended – for example, by adding a specific search query to a title bot.&lt;/p&gt;
&lt;p&gt;This was the beginning of the bot explosion, with more than 30 Trove Twitter bots born between 2017 and 2019.&lt;/p&gt;
&lt;p&gt;One of these, @NTTimesGazette, was created by curator and journalist Caddie Brain to tweet articles from the &lt;em&gt;Northern Territory Times and Gazette&lt;/em&gt;. The bot was featured on ABC radio in Darwin under the headline: &lt;a href=&#34;https://www.abc.net.au/news/2018-02-16/trove-twitter-unearths-history-newspaper-nt-times-and-gazette/9445458&#34;&gt;Twitter bot offers a rare look inside Darwin&amp;rsquo;s forgotten first newspaper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Historian Brett Holman created a series of bots related to aviation history. More than just a source of amusement, the bots became part of Brett&amp;rsquo;s research practice, as described in his &lt;em&gt;History Australia&lt;/em&gt; article &lt;a href=&#34;https://doi.org/10.17613/9h30-ke82&#34;&gt;&#39;@TroveAirRaidBot, a 24/7/365 research assistant&#39;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Perhaps the best part of this bot-making extravaganza was the number of self-professed &amp;lsquo;non coders&amp;rsquo; who were able to take their first steps into the world of programming and actually &lt;em&gt;create something&lt;/em&gt;. I have memories of sitting in the shade at Canberra&amp;rsquo;s now defunct Big Splash water park, troubleshooting someone&amp;rsquo;s Twitter bot on my phone, while the kids played on the water slides – it was fun, and it was exciting. Together, Trove, Twitter, and Glitch opened up new possibilities for learning and experimentation, and new ways of knowing Australia&amp;rsquo;s cultural heritage.&lt;/p&gt;
&lt;h2 id=&#34;2019-bot-roll-call&#34;&gt;2019 bot roll call&lt;/h2&gt;
&lt;p&gt;As new bots emerged, I added them to my Trove bots Twitter list (here&amp;rsquo;s a &lt;a href=&#34;https://web.archive.org/web/20180627053546/https://twitter.com/wragge/lists/trove-bots/members&#34;&gt;partially archived copy&lt;/a&gt;).&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/trove-bots.png&#34; width=&#34;600&#34; height=&#34;451&#34; alt=&#34;Screen capture from the Trove bots list in Twitter, showing some of the Trove bots.&#34;&gt;
&lt;p&gt;You can get an idea of their diversity from the bot names – a mix of collections, subjects, and places. Here&amp;rsquo;s a list of Trove Twitter bots active in 2019:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;astrove_bot&lt;/li&gt;
&lt;li&gt;AustWWBot&lt;/li&gt;
&lt;li&gt;BotCBR_QLD&lt;/li&gt;
&lt;li&gt;CatsofTrove&lt;/li&gt;
&lt;li&gt;digitisethedawn&lt;/li&gt;
&lt;li&gt;DoSonTrove&lt;/li&gt;
&lt;li&gt;facedepot&lt;/li&gt;
&lt;li&gt;Kasparbot&lt;/li&gt;
&lt;li&gt;KellyGangBot&lt;/li&gt;
&lt;li&gt;LAAL_bot&lt;/li&gt;
&lt;li&gt;NTTimesGazette&lt;/li&gt;
&lt;li&gt;PenrithPictures&lt;/li&gt;
&lt;li&gt;RemixHistorical&lt;/li&gt;
&lt;li&gt;suthlib&lt;/li&gt;
&lt;li&gt;TroveAirBot&lt;/li&gt;
&lt;li&gt;TroveBot&lt;/li&gt;
&lt;li&gt;TrovecakeBot&lt;/li&gt;
&lt;li&gt;TroveCHIAbot&lt;/li&gt;
&lt;li&gt;TroveDutchbot&lt;/li&gt;
&lt;li&gt;TroveEmojiBot&lt;/li&gt;
&lt;li&gt;trovefacesbot&lt;/li&gt;
&lt;li&gt;TroveHoroscopes&lt;/li&gt;
&lt;li&gt;Troveknitbot&lt;/li&gt;
&lt;li&gt;Trovelandbot&lt;/li&gt;
&lt;li&gt;trovelistbot&lt;/li&gt;
&lt;li&gt;TroveMirrorBot&lt;/li&gt;
&lt;li&gt;TroveNewsBot&lt;/li&gt;
&lt;li&gt;TrovePenguinBot&lt;/li&gt;
&lt;li&gt;TroveRefereeBot&lt;/li&gt;
&lt;li&gt;trovesportsmel&lt;/li&gt;
&lt;li&gt;trovetribunebot&lt;/li&gt;
&lt;li&gt;TroveXmasBot&lt;/li&gt;
&lt;li&gt;TsvBulletinBot&lt;/li&gt;
&lt;li&gt;WomenAtWarBot&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I also &lt;a href=&#34;https://wragge.github.io/trovenewsbot2019/&#34;&gt;overhauled @TroveNewsBot&lt;/a&gt; in 2019, adding a number of new features, including article thumbnails.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/trovenewsbot-example.png&#34; width=&#34;600&#34; height=&#34;722&#34; alt=&#34;Screen capture from Twitter showing TroveNewsBot&#39;s reply to the query &#39;library robot&#39;. The reply includes details of the article &#39;Mystery of the week: Robot murder in the library&#39; as well as a thumbnail image of the article.&#34;&gt;
&lt;h2 id=&#34;decline-and-fall&#34;&gt;Decline and fall&lt;/h2&gt;
&lt;p&gt;This golden age of bot-making came to an end late in 2019.&lt;/p&gt;
&lt;p&gt;The first blow came when &lt;a href=&#34;https://updates.timsherratt.org/2019/10/09/creators-and-users.html&#34;&gt;Trove updated its API&lt;/a&gt;. The bots needed some way of selecting random items from the millions available on Trove. This was fairly easy with version one of the API, but version two overhauled the way you accessed items within the result set, making random selections impossible. I eventually managed to hack together &lt;a href=&#34;https://glam-workbench.net/trove-random/&#34;&gt;a random-ish method&lt;/a&gt; that added multiple facets to whittle down the results set until a selection could be made. Using this method, I &lt;a href=&#34;https://updates.timsherratt.org/2019/11/07/the-death-and.html&#34;&gt;created new versions of my Glitch bot recipes&lt;/a&gt; and &lt;a href=&#34;https://101dhhacks.net/trove-bots-for-all/&#34;&gt;updated the tutorial&lt;/a&gt;. But it seemed that the moment had passed, and many bot authors just let their creations die when version one of the API was switched off.&lt;/p&gt;
&lt;p&gt;Surviving bots faced further challenges when Glitch started imposing limits on its free services. Glitch apps were designed to sleep when not in use, so to get your bot tweeting you had to fire regular web requests at it using a cron service. Glitch blocked access by these services and introduced a paid tier for &amp;lsquo;always on&amp;rsquo; apps. More bots died as a result.&lt;/p&gt;
&lt;p&gt;I was thinking about switching my recipes from Glitch to GitHub, making use of templates and scheduled actions. But while I prevaricated, Twitter started on its long, drawn-out death spiral – first imposing new limits on API use, and later becoming the preferred networking site for nazis and transphobes. It was no place for creative bot-making.&lt;/p&gt;
&lt;h2 id=&#34;the-serious-side-of-serendipity&#34;&gt;The serious side of serendipity&lt;/h2&gt;
&lt;p&gt;Bot-making wasn&amp;rsquo;t just about fun – Trove Twitter bots had a serious purpose as well. In &lt;a href=&#34;https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/section/be608100-95b6-4e48-bfd5-a82a588da8f1&#34;&gt;&amp;lsquo;Unremembering the forgotten&amp;rsquo;&lt;/a&gt; I wrote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Twitter bots can interrupt our social media meanderings with pinpoints of surprise, conflict, and meaning. And yet they are lightweight, almost disposable, in their development and implementation. No committees were formed, no grants were obtained—they are quick and creative: hacks in the best sense of the word. Bots are an example of how digital skills and tools allow us to try things, to build and play, without any expectation of significance or impact. We can experiment with the parameters of access.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A number of articles on the value of serendipity have considered how collection bots, like @TroveNewsBot, can puncture our research expectations. The random offerings of bots might offer new modes of discovery. In &lt;a href=&#34;https://muse.jhu.edu/article/585974&#34;&gt;&amp;lsquo;Technologies of Serendipity&amp;rsquo;&lt;/a&gt;, Paul Fyfe argues:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For scholars or other readers, discovery results less from directed searching than from all the tangents encountered on the way. Thus, sources which are plural, redundant, and tangent-rich help promote discovery by the proliferating contingencies of their usage.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Similarly, &lt;a href=&#34;https://doi.org/10.17613/9h30-ke82&#34;&gt;Brett Holman notes&lt;/a&gt; that his own Trove bots help him make connections in his research:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;By impinging on my consciousness when I am preoccupied by other things, @TroveAirRaidBot’s tweets draw my mind back to this research topic that is always sitting at the back of my mind somewhere, and it makes me make connections – randomly, haphazardly, but often very fruitfully leading me to think of something I hadn’t thought of before, or reminding me of something I’d forgotten, or juxtaposing some seemingly unrelated things. It’s a kind of directed serendipity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Trove Twitter bots were also entry points and interventions – challenging our understanding of access. They offered playful demonstrations of how our experience of GLAM collections might be different. Mitchell Whitelaw &lt;a href=&#34;http://olh.openlibhums.org/articles/10.16995/olh.291/&#34;&gt;suggested that such creations&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;reflect an emerging interest in collections as active sites of meaning-making, and experimentation with how we might encounter such collections in an everyday digital environment.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In &lt;a href=&#34;https://doi.org/10.5281/zenodo.3566879&#34;&gt;&amp;lsquo;Life on the outside&amp;rsquo;&lt;/a&gt;, I considered the lives that GLAM collections might lead beyond institutional confines:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;These bots do not simply present collection items outside of the familiar context of discovery interfaces or online exhibitions, they move the encounter itself into a wholly new space. &amp;hellip; Twitter bots loosen the institutional context of collections to allow them to participate in a space where people already congregate. They send collection items out into the wilds of the web, to find new meanings, new connections and perhaps even new love.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The promise of serendipitous discovery has now faded with the poisoning of social media spaces, and the retreat of many GLAM organisations from experimentation and openness. The need to control now carries more weight than the gift of creativity.&lt;/p&gt;
&lt;h2 id=&#34;what-remains&#34;&gt;What remains&lt;/h2&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-06-18-17-33-22.png&#34; width=&#34;600&#34; height=&#34;653&#34; alt=&#34;Photograph of a Raspberry Pi on a table top. Stuck onto the top of the Pi is a photo of a robot from Trove&#39;s newspapers – this photo was also used as TroveNewsBot&#39;s avatar on Twitter.&#34;&gt;
&lt;p&gt;I migrated &lt;a href=&#34;https://wraggebots.net/@trovenewsbot&#34;&gt;@TroveNewsBot to the Fediverse&lt;/a&gt; in May 2023, but sadly it was killed when &lt;a href=&#34;https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html&#34;&gt;NLA gatekeepers cancelled my Trove API keys&lt;/a&gt; without warning in January 2025.&lt;/p&gt;
&lt;p&gt;A number of other Trove bots have survived the Twitter implosion and found their way to alternate platforms. &lt;a href=&#34;https://ausglam.space/@digitisethedawn&#34;&gt;@DigitiseTheDawn&lt;/a&gt; now shares articles on the Fediverse, while &lt;a href=&#34;https://bsky.app/profile/trovepenguinbot.bsky.social&#34;&gt;@TrovePenguinBot&lt;/a&gt; is pursuing sardines on Bluesky. Brett Holman has created new versions of his aviation-themed bots – &lt;a href=&#34;https://bsky.app/profile/troveairbot.airminded.org&#34;&gt;@TroveAirBot&lt;/a&gt;, &lt;a href=&#34;https://bsky.app/profile/troveairraidbot.airminded.org&#34;&gt;@TroveAirRaidBot&lt;/a&gt;, and &lt;a href=&#34;https://bsky.app/profile/troveufobot.airminded.org&#34;&gt;@TroveUFOBot&lt;/a&gt; – on Bluesky. I&amp;rsquo;d be happy to add the details of any other survivors I might have missed.&lt;/p&gt;
&lt;p&gt;In an odd coincidence, recent months have brought &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;new restrictions on access to Trove API keys&lt;/a&gt;, and an announcement of the end of Glitch. There&amp;rsquo;s no going back.&lt;/p&gt;
&lt;p&gt;ActivityPub and the Fediverse seem to offer new digital channels through which collections might flow and connect. See, for example, &lt;a href=&#34;https://millsfield.sfomuseum.org/blog/2024/03/12/activitypub/&#34;&gt;Aaron Straup Cope&amp;rsquo;s work&lt;/a&gt; at the SFO Museum. But how do we support and encourage this type of experimentation?&lt;/p&gt;
&lt;p&gt;Personally speaking, this year&amp;rsquo;s been pretty shit so far, and I&amp;rsquo;ve been having trouble finding any motivation. But in pulling together these notes I found a section in &lt;a href=&#34;https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/&#34;&gt;&amp;lsquo;Unremembering the forgotten&amp;rsquo;&lt;/a&gt; that reminded me of what&amp;rsquo;s at stake:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There is no open access to the past. There is no key we can enter to recall a life. I create these projects not because I want to contribute to some form of national memory, but because I want to unsettle what it means to remember: to go beyond the listing of names and the cataloging of files to develop modes of access that are confusing, challenging, inspiring, uncomfortable, and sometimes creepy.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There&amp;rsquo;s still plenty of work to do.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.15694209&#34;&gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.15694209.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
</description>
      <source:markdown>The socials recently alerted me to an [interesting article](https://doi.org/10.1177/13548565251334087) by Dominique Carlon, Jean Burgess, and Kateryna Kasianenko on the history of community-created Twitter bots. The article explores bot-making within the context of Twitter&#39;s rise and fall, and provides a handy taxonomy of bot species. However, it doesn&#39;t include any Australian bots amidst the examples. That&#39;s a bit disappointing, as I remember the bot-building years as a time of great fun and creativity. My own contribution to the world of Twitter bots was mainly focused on [Trove](https://trove.nla.gov.au/) (what a surprise!), so I thought I might as well jot down a few incomplete and biased notes about the history of Trove Twitter bots.

## Trove tweeting trends

It just so happens that I recently packaged up some [data about Trove links shared on Twitter](https://updates.timsherratt.org/2025/06/10/new-dataset-trove-links-shared.html). Using this data, we can get a broad perspective on the activity of Trove Twitter bots between 2009 and 2020. The identification of bots is based on the Trove bots list I maintained on Twitter, so it&#39;s possible I&#39;ve missed some.

**In total, 43 bots posted 318,767 tweets containing 270,474 unique Trove urls between June 2013 and December 2020.**

Here&#39;s a chart showing the total number of links to Trove shared by Twitter bots each year. 

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/bot-tweets-per-year.png&#34; width=&#34;600&#34; height=&#34;334&#34; alt=&#34;Bar chart showing the number of Trove links tweeted by bots per year from 2013 to 2020. The number of links rises dramatically in 2018 and reaches a peak in 2019, before dropping away in 2020.&#34;&gt;

Most of the bots shared digitised newspaper articles, but some shared works from other Trove zones. This chart breaks the links down by the type of resource (&#39;article&#39; equals newspaper article).

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/bot-tweets-year-type.png&#34; width=&#34;600&#34; height=&#34;295&#34; alt=&#34;Bar chart showing the number of Trove links tweeted by bots per year from 2013 to 2020, with the type of linked resource indicated by colour. The number of links rises dramatically in 2018 and reaches a peak in 2019, before dropping away in 2020. The vast proportion of links are to newspaper articles, with a fairly consistent number going to other types of resources.&#34;&gt;

And one final chart showing the number of active bots per year.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/active-bots-year.png&#34; width=&#34;600&#34; height=&#34;347&#34; alt=&#34;Bar chart showing the number of bots actively tweeting Trove links by year from 2013 to 2020. The numbers rise slowly to 2017, then rise dramatically in 2018, reaching a peak in 2019, and falling away in 2020.&#34;&gt;

| year | active bots |
| ---- | ----------- |
| 2013 | 4           |
| 2014 | 4           |
| 2015 | 6           |
| 2016 | 8           |
| 2017 | 12          |
| 2018 | 34          |
| 2019 | 38          |
| 2020 | 22          |

From the data above you can see that bot activity grew slowly between 2013 and 2017, before taking off dramatically in 2018. The peak year for Trove bots was 2019, when 38 individual bots shared more than 100,000 links to Trove. But a mass extinction event in 2020 almost halved the number of active bots. So what happened?

## Build-a-bot begins

In June 2013, inspired by bot creators like Mark Sample, I hooked the Trove API up to Twitter to see what would happen when [GLAM collections joined online social spaces](https://discontents.com.au/conversations-with-collections/index.html). The result was @TroveNewsBot, sharing digitised newspaper articles from Trove.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/trovenewsbot-twitter.png&#34; width=&#34;520&#34; height=&#34;315&#34; alt=&#34;Screen capture of TroveNewsBot&#39;s original Twitter profile.&#34;&gt;

Twitter bots started popping up around the world, sharing collection items from Europeana, the Digital Public Library of America, DigitalNZ, the Cooper Hewitt Museum, and the Brooklyn Museum, amongst others. But @TroveNewsBot was always a bit different. Instead of just sharing randomly selected resources, @TroveNewsBot helped people explore Trove without leaving Twitter. If you tweeted keywords at the bot, it would run a search using the API and tweet back the most relevant result. By adding hashtags, users could [control a variety of search parameters](https://github.com/wragge/trovenewsbot) – for example, if you included the hashtag #luckydip you&#39;d get back a random article from your search results.

My favourite bot behaviour was its &#39;opinionator&#39; mode. If you tweeted a url at @TroveNewsBot, it would retrieve the link, extract keywords from the text, and then search for those keywords in Trove&#39;s newspapers. This enabled @TroveNewsBot to have conversations with other online resources – for example, it replied to tweets from DPLA and DigitalNZ, [finding connections between different collections](https://wakelet.com/wake/fa91d582-33e5-400f-9c27-b6c1c5b992b8). I also used the &#39;opinionator&#39; mode to set up a dialogue between past and present. Several times a day, the bot would grab keywords from the latest news items on the ABC (later the Guardian) website, search for historic newspaper articles, and then tweet both stories, old and new.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/life-on-the-outside.041.jpg&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;Slide illustrating TroveNewsBot&#39;s opinionator mode. There are three images: a screen capture from an ABC news article about the Scottish Independence Referendum, a Trove newspaper article headed &#39;Scottish Independence&#39; from 1928, and a TroveNewsBot tweet linking the two.&#34;&gt;

*@TroveNewsBot&#39;s opinionator mode in action – a slide from my keynote presentation [&#39;Life on the outside: connections, contexts, and the wild, wild web&#39;](https://doi.org/10.5281/zenodo.3566879) for the Annual Conference of the Japanese Association of Digital Humanities in 2014*
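
A rough sketch of the &#39;opinionator&#39; pipeline described above: fetch a page, pull out some crude keywords, and search Trove&#39;s newspapers for them. The naive keyword extraction here is just a stand-in for whatever the bot actually used, and the API parameters assume version 2 of the Trove API.

```python
# 'Opinionator'-style sketch: extract keywords from a web page and use
# them to search Trove's digitised newspapers. Assumes the v2 Trove API.
import re
from collections import Counter

import requests

STOPWORDS = {"the", "and", "that", "with", "from", "this", "have", "about"}

def extract_keywords(url, n=3):
    html = requests.get(url).text
    text = re.sub(r"<[^>]+>", " ", html)            # strip tags, very roughly
    words = re.findall(r"[a-z]{4,}", text.lower())  # words of 4+ letters
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]

keywords = extract_keywords("https://www.abc.net.au/news/")  # illustrative URL
params = {
    "q": " ".join(keywords),
    "zone": "newspaper",
    "n": 1,
    "encoding": "json",
    "key": "YOUR_TROVE_API_KEY",  # placeholder
}
data = requests.get("https://api.trove.nla.gov.au/v2/result", params=params).json()
article = data["response"]["zone"][0]["records"]["article"][0]
print(article["heading"], article["date"])
```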

As well as providing digitised content, such as the newspapers, Trove aggregates collection metadata from hundreds of organisations around Australia and makes it available through its own API. This meant that any organisation could use the Trove API to create a Twitter bot that shared items from *their own collection*. To encourage more of this sort of experimentation, I created the [Build-a-Bot Workshop](https://github.com/wragge/trovebuildabot) GitHub repository. This repository included instructions and code for anyone wanting to build their own collection bot on top of the Trove API. Like @TroveNewsBot, these collection bots could share random items and respond to user queries.

Before long, @CurtinLibBot was sharing photos from the Curtin University Library&#39;s image collection, and @Kasparbot was tweeting about objects from the National Museum of Australia. By the end of 2013, I&#39;d [added to the family](https://discontents.com.au/an-addition-to-the-family/index.html) by creating @TroveBot. While @TroveNewsBot dug into the digitised newspaper articles, its younger sibling looked for inspiration amongst Trove&#39;s other zones – sharing books, journals, photos, maps and more.

In 2015, Steve Leahy unleashed @TrovePenguinBot upon the world, searching for sardines amongst the digitised newspapers. In 2016, one of my students at the University of Canberra modified the [NYPL Emoji Bot code](https://github.com/lolibrarian/NYPL-Emoji-Bot) to create @TroveEmojiBot – if you tweeted an emoji at the bot, it would respond with a suitably-themed newspaper article. In 2017, the [Digitise the Dawn campaign](https://digitisethedawn.org/) bot-ified their Twitter account, posting an article each day from Louisa Lawson&#39;s journal, [The Dawn](https://trove.nla.gov.au/newspaper/title/252). Meanwhile, @astrove_bot started sharing newspaper articles relating to astronomy.

And then there was The Vintage Face Depot...

## Things get weird

I&#39;d been experimenting for a few years with *faces* as a way of connecting to GLAM collections – as alternative entry points, based not on metadata but on [the people inside](https://doi.org/10.5281/zenodo.3579530). In 2015, this led me to create [The Vintage Face Depot](https://wragge.github.io/face-depot/). If you tweeted a photo of yourself to @facedepot, the bot would select a face at random from a collection I&#39;d compiled from Trove newspapers and superimpose that face over yours, tweeting you back the result and a link to the original article, so you could find out more about the person you&#39;d been matched with.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/unremembering-dh2015.038.jpg&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;Slide showing the operation of The Vintage Face Depot. There are three screen captures from Twitter. Each includes a portrait photo shared by a Twitter user, and facedepot&#39;s reply that includes a modified version of the photo with a face from Trove&#39;s newspapers overlaid, and a link to the original newspaper article.&#34;&gt;

*@facedepot in action – a slide from my keynote presentation [&#39;Unremembering the forgotten&#39;](https://doi.org/10.5281/zenodo.3566887) for the Alliance of Digital Humanities Organizations Annual Conference in 2015*

Now, in a time of deep fakes and AI generated images, @facedepot&#39;s efforts seem quaint and kludgy. But that was always the point. I wanted to mess around with the barriers that put some people on the other side of this wall we call the past – to explore what historian Devon Elliot suggested on Twitter was an &#39;uncanny temporal valley&#39;. As I argued in [The Perfect Face](https://discontents.com.au/the-perfect-face/), a presentation at NDF2015:

&gt; The Vintage Face Depot tells you nothing about yourself. I built it at about the same time as Microsoft launched their How-Old bot that uses machine learning to estimate your age. Face Depot does nothing clever, and yet sometimes the results are uncanny, even unsettling. Microsoft might be able to tell you how old you are, but Face Depot asks *who* you are and pushes you in the direction of a past life, linked merely through chance.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/vintage-faces.gif&#34; width=&#34;600&#34; height=&#34;400&#34; alt=&#34;Animated gif showing some images generated during testing of facedepot&#34;&gt;

## Glitch bots for all

While I&#39;d shared some bot-building code, rolling your own bot still required access to a web-connected server – a significant barrier for most would-be experimenters. This changed in 2017 with the arrival of Glitch, a platform that enabled anyone to build simple web apps for free. Perhaps most importantly, Glitch apps were remixable – simply by clicking a button, you could open an editor and create your own customised version of any app.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/glitch-trove-bots.png&#34; width=&#34;600&#34; height=&#34;372&#34; alt=&#34;Screen capture from the Trove page in Glitch, showing the four bot templates.&#34;&gt;

Glitch seemed like an ideal environment in which to experiment with bots, so I created four remixable Trove Twitter bot recipes:

- **trove-collection-bot** – sharing resources from a partner collection
- **trove-list-bot** – sharing items from a Trove list
- **trove-title-bot** – sharing articles from specific newspapers
- **trove-tag-bot** – sharing items with specific tags

These were supported by a [detailed tutorial](https://101dhhacks.net/2018/01/21/trove-bots-for-all/) that walked through the process of customisation and suggested ways in which the basic recipes could be extended – for example, by adding a specific search query to a title bot.

This was the beginning of the bot explosion, with more than 30 Trove Twitter bots born between 2017 and 2019.

One of these, @NTTimesGazette, was created by curator and journalist Caddie Brain to tweet articles from the *Northern Territory Times and Gazette*. The bot was featured on ABC radio in Darwin under the headline: [Twitter bot offers a rare look inside Darwin&#39;s forgotten first newspaper](https://www.abc.net.au/news/2018-02-16/trove-twitter-unearths-history-newspaper-nt-times-and-gazette/9445458).

Historian Brett Holman created a series of bots related to aviation history. More than just a source of amusement, the bots became part of Brett&#39;s research practice, as described in his *History Australia* article [&#39;@TroveAirRaidBot, a 24/7/365 research assistant&#39;](https://doi.org/10.17613/9h30-ke82).

Perhaps the best part of this bot-making extravaganza was the number of self-professed &#39;non coders&#39; who were able to take their first steps into the world of programming and actually *create something*. I have memories of sitting in the shade at Canberra&#39;s now defunct Big Splash water park, troubleshooting someone&#39;s Twitter bot on my phone, while the kids played on the water slides – it was fun, and it was exciting. Together, Trove, Twitter, and Glitch opened up new possibilities for learning and experimentation, and new ways of knowing Australia&#39;s cultural heritage.

## 2019 bot roll call

As new bots emerged, I added them to my Trove bots Twitter list (here&#39;s a [partially archived copy](https://web.archive.org/web/20180627053546/https://twitter.com/wragge/lists/trove-bots/members)).

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/trove-bots.png&#34; width=&#34;600&#34; height=&#34;451&#34; alt=&#34;Screen capture from the Trove bots list in Twitter, showing some of the Trove bots.&#34;&gt;

You can get an idea of their diversity from the bot names – a mix of collections, subjects, and places. Here&#39;s a list of Trove Twitter bots active in 2019:

- astrove_bot
- AustWWBot
- BotCBR_QLD
- CatsofTrove
- digitisethedawn
- DoSonTrove
- facedepot
- Kasparbot
- KellyGangBot
- LAAL_bot
- NTTimesGazette
- PenrithPictures
- RemixHistorical
- suthlib
- TroveAirBot
- TroveBot
- TrovecakeBot
- TroveCHIAbot
- TroveDutchbot
- TroveEmojiBot
- trovefacesbot
- TroveHoroscopes
- Troveknitbot
- Trovelandbot
- trovelistbot
- TroveMirrorBot
- TroveNewsBot
- TrovePenguinBot
- TroveRefereeBot
- trovesportsmel
- trovetribunebot
- TroveXmasBot
- TsvBulletinBot
- WomenAtWarBot

I also [overhauled @TroveNewsBot](https://wragge.github.io/trovenewsbot2019/) in 2019, adding a number of new features, including article thumbnails.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/trovenewsbot-example.png&#34; width=&#34;600&#34; height=&#34;722&#34; alt=&#34;Screen capture from Twitter showing TroveNewsBot&#39;s reply to the query &#39;library robot&#39;. The reply includes details of the article &#39;Mystery of the week: Robot murder in the library&#39; as well as a thumbnail image of the article.&#34;&gt;

## Decline and fall

This golden age of bot-making came to an end late in 2019.

The first blow came when [Trove updated its API](https://updates.timsherratt.org/2019/10/09/creators-and-users.html). The bots needed some way of selecting random items from the millions available on Trove. This was fairly easy with version one of the API, but version two overhauled the way you accessed items within the result set, making random selections impossible. I eventually managed to hack together [a random-ish method](https://glam-workbench.net/trove-random/) that added multiple facets to whittle down the results set until a selection could be made. Using this method, I [created new versions of my Glitch bot recipes](https://updates.timsherratt.org/2019/11/07/the-death-and.html) and [updated the tutorial](https://101dhhacks.net/trove-bots-for-all/). But it seemed that the moment had passed, and many bot authors just let their creations die when version one of the API was switched off.
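
Here is a sketch of that random-ish method: keep applying randomly chosen facet values to narrow the result set, then pick an article at random from what remains. Facet names and response shape assume version 2 of the Trove API.

```python
# Random-ish selection sketch: stack random facet values (decade, then
# year) to whittle down the results, then choose an article at random.
# Assumes the v2 Trove API.
import random
import requests

API = "https://api.trove.nla.gov.au/v2/result"
KEY = "YOUR_TROVE_API_KEY"  # placeholder

def facet_values(params, facet):
    # Ask for the available values of a facet under the current filters.
    p = dict(params, facet=facet, n=0)
    data = requests.get(API, params=p).json()
    terms = data["response"]["zone"][0]["facets"]["facet"]["term"]
    return [t["search"] for t in terms]

# A blank query matches everything.
params = {"q": " ", "zone": "newspaper", "encoding": "json", "key": KEY, "n": 0}

# Whittle: a random decade first, then a random year within that decade.
params["l-decade"] = random.choice(facet_values(params, "decade"))
params["l-year"] = random.choice(facet_values(params, "year"))

# Fetch a page of the narrowed results and pick one at random.
params["n"] = 100
data = requests.get(API, params=params).json()
article = random.choice(data["response"]["zone"][0]["records"]["article"])
print(article["heading"], article["date"])
```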

Surviving bots faced further challenges when Glitch started imposing limits on its free services. Glitch apps were designed to sleep when not in use, so to get your bot tweeting you had to fire regular web requests at it using a cron service. Glitch blocked access by these services and introduced a paid tier for &#39;always on&#39; apps. More bots died as a result.

I was thinking about switching my recipes from Glitch to GitHub, making use of templates and scheduled actions. But while I prevaricated, Twitter started on its long, drawn-out death spiral – first imposing new limits on API use, and later becoming the preferred networking site for nazis and transphobes. It was no place for creative bot-making.

## The serious side of serendipity

Bot-making wasn&#39;t just about fun – Trove Twitter bots had a serious purpose as well. In [&#39;Unremembering the forgotten&#39;](https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/section/be608100-95b6-4e48-bfd5-a82a588da8f1) I wrote:

&gt; Twitter bots can interrupt our social media meanderings with pinpoints of surprise, conflict, and meaning. And yet they are lightweight, almost disposable, in their development and implementation. No committees were formed, no grants were obtained—they are quick and creative: hacks in the best sense of the word. Bots are an example of how digital skills and tools allow us to try things, to build and play, without any expectation of significance or impact. We can experiment with the parameters of access.

A number of articles on the value of serendipity have considered how collection bots, like @TroveNewsBot, can puncture our research expectations. The random offerings of bots might offer new modes of discovery. In [&#39;Technologies of Serendipity&#39;](https://muse.jhu.edu/article/585974), Paul Fyfe argues:

&gt; For scholars or other readers, discovery results less from directed searching than from all the tangents encountered on the way. Thus, sources which are plural, redundant, and tangent-rich help promote discovery by the proliferating contingencies of their usage.

Similarly, [Brett Holman notes](https://doi.org/10.17613/9h30-ke82) that his own Trove bots help him make connections in his research:

&gt; By impinging on my consciousness when I am preoccupied by other things, @TroveAirRaidBot’s tweets draw my mind back to this research topic that is always sitting at the back of my mind somewhere, and it makes me make connections – randomly, haphazardly, but often very fruitfully leading me to think of something I hadn’t thought of before, or reminding me of something I’d forgotten, or juxtaposing some seemingly unrelated things. It’s a kind of directed serendipity.

Trove Twitter bots were also entry points and interventions – challenging our understanding of access. They offered playful demonstrations of how our experience of GLAM collections might be different. Mitchell Whitelaw [suggested that such creations](http://olh.openlibhums.org/articles/10.16995/olh.291/):

&gt; reflect an emerging interest in collections as active sites of meaning-making, and experimentation with how we might encounter such collections in an everyday digital environment.

In [&#39;Life on the outside&#39;](https://doi.org/10.5281/zenodo.3566879), I considered the lives that GLAM collections might lead beyond institutional confines:

&gt; These bots do not simply present collection items outside of the familiar context of discovery interfaces or online exhibitions, they move the encounter itself into a wholly new space. ... Twitter bots loosen the institutional context of collections to allow them to participate in a space where people already congregate. They send collection items out into the wilds of the web, to find new meanings, new connections and perhaps even new love.

The promise of serendipitous discovery has now faded with the poisoning of social media spaces, and the retreat of many GLAM organisations from experimentation and openness. The need to control now carries more weight than the gift of creativity. 

## What remains

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-06-18-17-33-22.png&#34; width=&#34;600&#34; height=&#34;653&#34; alt=&#34;Photograph of a Raspberry Pi on a table top. Stuck onto the top of the Pi is a photo of a robot from Trove&#39;s newspapers – this photo was also used as TroveNewsBot&#39;s avatar on Twitter.&#34;&gt;

I migrated [@TroveNewsBot to the Fediverse](https://wraggebots.net/@trovenewsbot) in May 2023, but sadly it was killed when [NLA gatekeepers cancelled my Trove API keys](https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html) without warning in January 2025.

A number of other Trove bots have survived the Twitter implosion and found their way to alternate platforms. [@DigitiseTheDawn](https://ausglam.space/@digitisethedawn) now shares articles on the Fediverse, while [@TrovePenguinBot](https://bsky.app/profile/trovepenguinbot.bsky.social) is pursuing sardines on Bluesky. Brett Holman has created new versions of his aviation-themed bots – [@TroveAirBot](https://bsky.app/profile/troveairbot.airminded.org), [@TroveAirRaidBot](https://bsky.app/profile/troveairraidbot.airminded.org), and [@TroveUFOBot](https://bsky.app/profile/troveufobot.airminded.org) – on Bluesky. I&#39;d be happy to add the details of any other survivors I might have missed.

In an odd coincidence, recent months have brought [new restrictions on access to Trove API keys](https://updates.timsherratt.org/2025/05/07/farewell-trove.html) and an announcement of the end of Glitch. There&#39;s no going back.

ActivityPub and the Fediverse seem to offer new digital channels through which collections might flow and connect. See, for example, [Aaron Straup Cope&#39;s work](https://millsfield.sfomuseum.org/blog/2024/03/12/activitypub/) at the SFO Museum. But how do we support and encourage this type of experimentation? 

Personally speaking, this year&#39;s been pretty shit so far, and I&#39;ve been having trouble finding any motivation. But in pulling together these notes I found a section in [&#39;Unremembering the forgotten&#39;](https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/) that reminded me of what&#39;s at stake:

&gt; There is no open access to the past. There is no key we can enter to recall a life. I create these projects not because I want to contribute to some form of national memory, but because I want to unsettle what it means to remember: to go beyond the listing of names and the cataloging of files to develop modes of access that are confusing, challenging, inspiring, uncomfortable, and sometimes creepy.

There&#39;s still plenty of work to do.



[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15694209.svg)](https://doi.org/10.5281/zenodo.15694209)

</source:markdown>
    </item>
    
    <item>
      <title>Some Archives Week goodies</title>
      <link>https://updates.timsherratt.org/2025/06/11/some-archives-week-goodies.html</link>
      <pubDate>Wed, 11 Jun 2025 17:40:11 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/06/11/some-archives-week-goodies.html</guid>
      <description>&lt;p&gt;It&amp;rsquo;s &lt;a href=&#34;https://www.ica.org/international-archives-week-2025-archives-are-accessible-archives-for-everyone/&#34;&gt;International Archives Week&lt;/a&gt; and I&amp;rsquo;m feeling a bit crook after being double-vaxxed yesterday, so instead of doing something productive, I&amp;rsquo;m just going to make a list of potentially handy archives-related resources from the Wonderful World of Wragge(TM).&lt;/p&gt;
&lt;p&gt;The theme of Archives Week is &lt;strong&gt;#ArchivesAreAccessible&lt;/strong&gt;, which you&amp;rsquo;d have to regard as rather aspirational given the various ways access is limited by law, policy, practice, technology, and history. But what the heck, discussions about &lt;a href=&#34;https://doi.org/10.5281/zenodo.5035855&#34;&gt;the meaning of &lt;em&gt;access&lt;/em&gt;&lt;/a&gt; are always welcome. It&amp;rsquo;s also a little jarring to see the #ArchivesAreAccessible theme being promoted by the National Archives of Australia just a few weeks after they &lt;a href=&#34;https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html&#34;&gt;implemented new restrictions that make it impossible to get machine-readable data out of their online database&lt;/a&gt;, RecordSearch. But I&amp;rsquo;m trying to move on, so&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;zotero&#34;&gt;Zotero&lt;/h2&gt;
&lt;p&gt;All Australian archives users should have Zotero installed. Through the magic of user-contributed &amp;lsquo;translators&amp;rsquo;, &lt;a href=&#34;https://www.zotero.org/&#34;&gt;Zotero&lt;/a&gt; can capture structured data and digitised images from a variety of collections, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://ozglam.chat/t/zotero-translator-for-recordsearch-updated/27&#34;&gt;National Archives of Australia&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2024/08/22/new-zotero-translators.html&#34;&gt;PROV&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2024/08/22/new-zotero-translators.html&#34;&gt;Queensland State Archives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2022/07/14/calling-all-tasmanian.html&#34;&gt;State Library and Archives of Tasmania&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;State Records Office of WA&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first four are my work, so &lt;a href=&#34;https://timsherratt.au/&#34;&gt;let me know&lt;/a&gt; if you have any suggestions or problems.&lt;/p&gt;
&lt;h2 id=&#34;indexes-to-records&#34;&gt;Indexes to records&lt;/h2&gt;
&lt;p&gt;Archives are well represented in the GLAM Workbench&amp;rsquo;s &lt;a href=&#34;https://glam-workbench.net/glam-datasets-from-gov-portals/&#34;&gt;list of GLAM datasets shared through government open data portals&lt;/a&gt;. Many of these datasets are indexes that link records to people and places. They&amp;rsquo;re openly licensed and &lt;a href=&#34;https://muse.jhu.edu/article/794331&#34;&gt;underused&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;NSW State Archives has also compiled a lot of indexes. These aren&amp;rsquo;t shared through a portal, but you can &lt;a href=&#34;https://glam-workbench.net/nsw-state-archives/&#34;&gt;harvest them from their website&lt;/a&gt;. To save you the effort, I&amp;rsquo;ve &lt;a href=&#34;https://glam-workbench.net/nsw-state-archives/index-repository/&#34;&gt;created a repository of the harvested indexes&lt;/a&gt;.&lt;/p&gt;
 &lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-04-09-15-19-57.png&#34; width=&#34;600&#34; height=&#34;624&#34; alt=&#34;Screenshot of the main search page of GLAM Name Indexes&#34;&gt;
&lt;p&gt;I&amp;rsquo;ve pulled many of these sources together to build a mega database of name indexes that lets you search for people across millions (yes millions) of records. As well as the sources described above, it also includes &lt;a href=&#34;https://updates.timsherratt.org/2025/04/09/more-than-million-rows-of.html&#34;&gt;people-related records from the Public Record Office Victoria&amp;rsquo;s API&lt;/a&gt;. Altogether, the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt; contains almost 13 million records in 293 datasets from 10 GLAM organisations. And unlike the commercial genealogical databases, it&amp;rsquo;s free! (If only Australian libraries and archives would link to it from their family history guides&amp;hellip;)&lt;/p&gt;
&lt;h2 id=&#34;other-datasets&#34;&gt;Other datasets&lt;/h2&gt;
&lt;p&gt;Before my scrapers were scuppered by the NAA, I managed to compile a few datasets. Much of this data documents the way RecordSearch itself has changed, and while it might not be of use to researchers seeking particular records, it could &lt;a href=&#34;https://updates.timsherratt.org/2024/09/20/preserving-the-history.html&#34;&gt;help future researchers&lt;/a&gt; who are trying to understand the impact of online collections on the practice of history. This includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Summary data about all series in RecordSearch – a CSV file containing basic descriptive information about all the series  currently registered on RecordSearch as well as the total number of  items described, digitised, and in each access category. Harvests from &lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/series_totals_May_2021.csv&#34;&gt;May 2021&lt;/a&gt; and &lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/series_totals_April_2022.csv&#34;&gt;April 2022&lt;/a&gt; are currently available, and I&amp;rsquo;ll soon be adding May 2025.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.14744050&#34;&gt;Files digitised by the National Archives of Australia since 2021&lt;/a&gt; – annual compilations of data harvested from RecordSearch&amp;rsquo;s list of recently digitised files. (The automated weekly harvests are now dead.)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.14769172&#34;&gt;Records held by the National Archives of Australia with the access status of &amp;lsquo;closed&amp;rsquo;&lt;/a&gt; – a &lt;em&gt;whole decade&lt;/em&gt; of annual harvests of records held by the NAA that have the access status of &amp;lsquo;closed&amp;rsquo; (withheld from public  access). The harvests were run on or about 1 January each year from 2016 to 2025. The aim in saving this data is to enable long-term analysis of the NAA&amp;rsquo;s access examination process.&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/071c7cad60.png&#34; width=&#34;600&#34; height=&#34;571&#34; alt=&#34;Screenshot of visualisation showing the relationships between Australian government agencies over time&#34;&gt;
&lt;p&gt;Thanks to &lt;a href=&#34;https://wikimedia.org.au/wiki/Exploring_government_departments_by_linking_Wikidata_to_the_National_Archives_of_Australia&#34;&gt;support from Wikimedia Australia&lt;/a&gt;, I&amp;rsquo;ve also added information about Australian government agencies from RecordSearch to Wikidata. As a result, you can get a list of Australian government departments since Federation using this &lt;a href=&#34;https://w.wiki/5tVh&#34;&gt;Wikidata query&lt;/a&gt;. I&amp;rsquo;ve used the data to build &lt;a href=&#34;https://glam-workbench.net/wikidata/examples/govt-agencies-network.html&#34;&gt;this interactive visualisation&lt;/a&gt; of the relationships between government departments. There are more examples in the &lt;a href=&#34;https://glam-workbench.net/wikidata/&#34;&gt;Wikidata section of the GLAM Workbench&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Also from the NAA is my &lt;a href=&#34;https://github.com/wragge/diy-redactionart&#34;&gt;collection of #redactionart&lt;/a&gt; found in ASIO surveillance files.&lt;/p&gt;
&lt;p&gt;As part of the &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/&#34;&gt;Real Face of White Australia&lt;/a&gt; project, we&amp;rsquo;ve been transcribing records created by the administration of the White Australia Policy, now held by the NAA. Some of the results are available in &lt;a href=&#34;https://github.com/wragge/realface-data&#34;&gt;this data repository&lt;/a&gt;. (Note to self – I need to update this with the latest data!)&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/anu-archives/&#34;&gt;ANU Archives section of the GLAM Workbench&lt;/a&gt; includes some datasets extracted from the Sydney Stock Exchange stock and share lists. (Just noticed some CloudStor links in there that I need to fix&amp;hellip;)&lt;/p&gt;
&lt;h2 id=&#34;public-record-office-victoria&#34;&gt;Public Record Office Victoria&lt;/h2&gt;
&lt;p&gt;PROV gets its own section because, as far as I know, they&amp;rsquo;re the only Australian archives with a &lt;a href=&#34;https://prov.vic.gov.au/about-us/our-blog/new-prov-public-api&#34;&gt;functioning public API&lt;/a&gt;. (Brief moment of silence to remember the APIs that have come and gone over the years&amp;hellip;).&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s now a &lt;a href=&#34;https://glam-workbench.net/prov/&#34;&gt;PROV section of the GLAM Workbench&lt;/a&gt; that includes &lt;a href=&#34;https://glam-workbench.net/prov/getting-started/&#34;&gt;a &amp;lsquo;getting started&amp;rsquo; notebook&lt;/a&gt; to document the basic functionality of the API. There&amp;rsquo;s some &lt;a href=&#34;https://updates.timsherratt.org/2025/04/30/new-prov-section-added-to.html&#34;&gt;more information in this blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve also used the PROV API to create &lt;a href=&#34;https://updates.timsherratt.org/2025/04/10/using-the-public-record-office.html&#34;&gt;an automated data dashboard&lt;/a&gt; that provides an overview of their collection. It&amp;rsquo;s updated every Sunday.&lt;/p&gt;
&lt;h2 id=&#34;other-things&#34;&gt;Other things&lt;/h2&gt;
&lt;p&gt;RecordSearch users will understand the frustration of trying to share a URL to a record, only to get an annoying error. There are a few ways around this (Zotero saves persistent links to things you save), but for a quick fix I created a simple tool to &lt;a href=&#34;https://recordsearch-links.glitch.me/&#34;&gt;create persistent links in RecordSearch&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re comfortable with a little browser hacking, you can also &lt;a href=&#34;https://gist.github.com/wragge/b2af9dc56f7cb0a9476b#file-recordsearch_show_pages-user-js&#34;&gt;install this handy RecordSearch userscript&lt;/a&gt; (scroll to the bottom for installation instructions). It improves the functionality of RecordSearch in a few different ways, such as by indicating the number of pages in a digitised file.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/68747470733a2f2f646c2e64726f70626f7875736572636f6e74656e742e636f6d2f732f666.png&#34; width=&#34;600&#34; height=&#34;467&#34; alt=&#34;Screenshot of RecordSearch showing the number of pages in digitised files&#34;&gt;
&lt;h2 id=&#34;any-ideas&#34;&gt;Any ideas?&lt;/h2&gt;
&lt;p&gt;If you have any ideas for additional resources or datasets, or you&amp;rsquo;re having problems with an online collection, feel free to drop a note in the &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench.github.io/issues&#34;&gt;GLAM Workbench repository&lt;/a&gt;.&lt;/p&gt;
</description>
      <source:markdown>It&#39;s [International Archives Week](https://www.ica.org/international-archives-week-2025-archives-are-accessible-archives-for-everyone/) and I&#39;m feeling a bit crook after being double-vaxxed yesterday, so instead of doing something productive, I&#39;m just going to make a list of potentially handy archives-related resources from the Wonderful World of Wragge(TM).

The theme of Archives Week is **#ArchivesAreAccessible**, which you&#39;d have to regard as rather aspirational given the various ways access is limited by law, policy, practice, technology, and history. But what the heck, discussions about [the meaning of *access*](https://doi.org/10.5281/zenodo.5035855) are always welcome. It&#39;s also a little jarring to see the #ArchivesAreAccessible theme being promoted by the National Archives of Australia just a few weeks after they [implemented new restrictions that make it impossible to get machine-readable data out of their online database](https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html), RecordSearch. But I&#39;m trying to move on, so...

## Zotero

All Australian archives users should have Zotero installed. Through the magic of user-contributed &#39;translators&#39;, [Zotero](https://www.zotero.org/) can capture structured data and digitised images from a variety of collections, including:

- [National Archives of Australia](https://ozglam.chat/t/zotero-translator-for-recordsearch-updated/27) 
- [PROV](https://updates.timsherratt.org/2024/08/22/new-zotero-translators.html)
- [Queensland State Archives](https://updates.timsherratt.org/2024/08/22/new-zotero-translators.html)
- [State Library and Archives of Tasmania](https://updates.timsherratt.org/2022/07/14/calling-all-tasmanian.html)
- State Records Office of WA

The first four are my work, so [let me know](https://timsherratt.au/) if you have any suggestions or problems.

## Indexes to records

Archives are well represented in the GLAM Workbench&#39;s [list of GLAM datasets shared through government open data portals](https://glam-workbench.net/glam-datasets-from-gov-portals/). Many of these datasets are indexes that link records to people and places. They&#39;re openly licensed and [underused](https://muse.jhu.edu/article/794331)!

NSW State Archives has also compiled a lot of indexes. These aren&#39;t shared through a portal, but you can [harvest them from their website](https://glam-workbench.net/nsw-state-archives/). To save you the effort, I&#39;ve [created a repository of the harvested indexes](https://glam-workbench.net/nsw-state-archives/index-repository/).

 &lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-04-09-15-19-57.png&#34; width=&#34;600&#34; height=&#34;624&#34; alt=&#34;Screenshot of the main search page of GLAM Name Indexes&#34;&gt;

I&#39;ve pulled many of these sources together to build a mega database of name indexes that lets you search for people across millions (yes millions) of records. As well as the sources described above, it also includes [people-related records from the Public Record Office Victoria&#39;s API](https://updates.timsherratt.org/2025/04/09/more-than-million-rows-of.html). Altogether, the [GLAM Name Index Search](https://glam-workbench.net/name-search/) contains almost 13 million records in 293 datasets from 10 GLAM organisations. And unlike the commercial genealogical databases, it&#39;s free! (If only Australian libraries and archives would link to it from their family history guides...)

## Other datasets

Before my scrapers were scuppered by the NAA, I managed to compile a few datasets. Much of this data documents the way RecordSearch itself has changed, and while it might not be of use to researchers seeking particular records, it could [help future researchers](https://updates.timsherratt.org/2024/09/20/preserving-the-history.html) who are trying to understand the impact of online collections on the practice of history. This includes:

- Summary data about all series in RecordSearch – a CSV file containing basic descriptive information about all the series currently registered on RecordSearch as well as the total number of items described, digitised, and in each access category. Harvests from [May 2021](https://github.com/GLAM-Workbench/recordsearch/blob/master/series_totals_May_2021.csv) and [April 2022](https://github.com/GLAM-Workbench/recordsearch/blob/master/series_totals_April_2022.csv) are currently available, and I&#39;ll soon be adding May 2025. (See the sketch after this list for one way of comparing harvests.)
- [Files digitised by the National Archives of Australia since 2021](https://doi.org/10.5281/zenodo.14744050) – annual compilations of data harvested from RecordSearch&#39;s list of recently digitised files. (The automated weekly harvests are now dead.)
- [Records held by the National Archives of Australia with the access status of &#39;closed&#39;](https://doi.org/10.5281/zenodo.14769172) – a *whole decade* of annual harvests of records held by the NAA that have the access status of &#39;closed&#39; (withheld from public access). The harvests were run on or about 1 January each year from 2016 to 2025. The aim in saving this data is to enable long-term analysis of the NAA&#39;s access examination process.
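
If you want to poke at these harvests, a few lines of pandas will do it. Here&#39;s a minimal sketch comparing the two series-totals files – the column names (`series_id`, `digitised`) are my guesses, so check them against the actual CSV headers:

```python
# Compare two harvests of RecordSearch series totals.
# NOTE: column names (series_id, digitised) are assumptions to check
# against the actual CSV headers.
import pandas as pd

may_2021 = pd.read_csv(&#34;series_totals_May_2021.csv&#34;)
april_2022 = pd.read_csv(&#34;series_totals_April_2022.csv&#34;)

# join the two harvests on the series identifier
merged = may_2021.merge(april_2022, on=&#34;series_id&#34;, suffixes=(&#34;_2021&#34;, &#34;_2022&#34;))

# which series gained the most digitised items between the harvests?
merged[&#34;digitised_change&#34;] = merged[&#34;digitised_2022&#34;] - merged[&#34;digitised_2021&#34;]
print(merged.nlargest(10, &#34;digitised_change&#34;)[[&#34;series_id&#34;, &#34;digitised_change&#34;]])
```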

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/071c7cad60.png&#34; width=&#34;600&#34; height=&#34;571&#34; alt=&#34;Screenshot of visualisation showing the relationships between Australian government agencies over time&#34;&gt;

Thanks to [support from Wikimedia Australia](https://wikimedia.org.au/wiki/Exploring_government_departments_by_linking_Wikidata_to_the_National_Archives_of_Australia), I&#39;ve also added information about Australian government agencies from RecordSearch to Wikidata. As a result, you can get a list of Australian government departments since Federation using this [Wikidata query](https://w.wiki/5tVh). I&#39;ve used the data to build [this interactive visualisation](https://glam-workbench.net/wikidata/examples/govt-agencies-network.html) of the relationships between government departments. There are more examples in the [Wikidata section of the GLAM Workbench](https://glam-workbench.net/wikidata/).

Also from the NAA is my [collection of #redactionart](https://github.com/wragge/diy-redactionart) found in ASIO surveillance files.

As part of the [Real Face of White Australia](https://www.realfaceofwhiteaustralia.net/) project, we&#39;ve been transcribing records created by the administration of the White Australia Policy, now held by the NAA. Some of the results are available in [this data repository](https://github.com/wragge/realface-data). (Note to self – I need to update this with the latest data!)

The [ANU Archives section of the GLAM Workbench](https://glam-workbench.net/anu-archives/) includes some datasets extracted from the Sydney Stock Exchange stock and share lists. (Just noticed some CloudStor links in there that I need to fix...)

## Public Record Office Victoria

PROV gets its own section because, as far as I know, they&#39;re the only Australian archives with a [functioning public API](https://prov.vic.gov.au/about-us/our-blog/new-prov-public-api). (Brief moment of silence to remember the APIs that have come and gone over the years...).

There&#39;s now a [PROV section of the GLAM Workbench](https://glam-workbench.net/prov/) that includes [a &#39;getting started&#39; notebook](https://glam-workbench.net/prov/getting-started/) to document the basic functionality of the API. There&#39;s some [more information in this blog post](https://updates.timsherratt.org/2025/04/30/new-prov-section-added-to.html).

I&#39;ve also used the PROV API to create [an automated data dashboard](https://updates.timsherratt.org/2025/04/10/using-the-public-record-office.html) that provides an overview of their collection. It&#39;s updated every Sunday.

## Other things

RecordSearch users will understand the frustration of trying to share a URL to a record, only to get an annoying error. There are a few ways around this (Zotero saves persistent links to things you save), but for a quick fix I created a simple tool to [create persistent links in RecordSearch](https://recordsearch-links.glitch.me/).

If you&#39;re comfortable with a little browser hacking, you can also [install this handy RecordSearch userscript](https://gist.github.com/wragge/b2af9dc56f7cb0a9476b#file-recordsearch_show_pages-user-js) (scroll to the bottom for installation instructions). It improves the functionality of RecordSearch in a few different ways, such as by indicating the number of pages in a digitised file.

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/68747470733a2f2f646c2e64726f70626f7875736572636f6e74656e742e636f6d2f732f666.png&#34; width=&#34;600&#34; height=&#34;467&#34; alt=&#34;Screenshot of RecordSearch showing the number of pages in digitised files&#34;&gt;

## Any ideas?

If you have any ideas for additional resources or datasets, or you&#39;re having problems with an online collection, feel free to drop a note in the [GLAM Workbench repository](https://github.com/GLAM-Workbench/glam-workbench.github.io/issues).



</source:markdown>
    </item>
    
    <item>
      <title>New dataset – Trove links shared on Twitter, 2009 to 2020</title>
      <link>https://updates.timsherratt.org/2025/06/10/new-dataset-trove-links-shared.html</link>
      <pubDate>Tue, 10 Jun 2025 12:30:54 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/06/10/new-dataset-trove-links-shared.html</guid>
      <description>&lt;p&gt;A few years ago, I harvested the details of tweets that included links to Trove. The data has just been sitting on my computer, so I thought I should package it up and share, in case it&amp;rsquo;s of use to anyone.&lt;/p&gt;
&lt;p&gt;The story is that back in 2021, I was working on the article &lt;a href=&#34;https://doi.org/10.5281/zenodo.5595420&#34;&gt;&amp;lsquo;More than newspapers&amp;rsquo;&lt;/a&gt; for a special section of &lt;em&gt;History Australia&lt;/em&gt; focusing on Trove. I was thinking that I might include something about the way Trove newspaper articles were mobilised within online discussions about history – a topic I first explored in &lt;a href=&#34;https://doi.org/10.5281/zenodo.3566879&#34;&gt;&amp;lsquo;Life on the outside: connections, contexts, and the wild, wild web&amp;rsquo;&lt;/a&gt;, my keynote for the Annual Conference of the Japanese Association of Digital Humanities in 2014. In the end, the article went in another direction, so I didn&amp;rsquo;t use the data.&lt;/p&gt;
&lt;p&gt;I remembered this recently and thought I should do something with it. I&amp;rsquo;ve now created a dataset and &lt;a href=&#34;https://doi.org/10.5281/zenodo.15627800&#34;&gt;shared it on Zenodo&lt;/a&gt;. I&amp;rsquo;m &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;not working on Trove any more&lt;/a&gt;, but I&amp;rsquo;m hoping that someone else might find the data useful!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.15694063&#34;&gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.15694063.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The dataset contains information about tweets from 2009 to 2020 that include links to &lt;a href=&#34;https://trove.nla.gov.au/&#34;&gt;Trove&lt;/a&gt;. The tweet data was compiled using &lt;a href=&#34;https://twarc-project.readthedocs.io/en/latest/&#34;&gt;Twarc&lt;/a&gt; in May 2021, under Twitter&amp;rsquo;s academic access program. The search queries used were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;url:nla.gov.au/nla.news&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;url:trove.nla.gov.au&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;url:newspapers.nla.gov.au&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Many of the tweets were produced by bots. Fortunately, I&amp;rsquo;d been maintaining a list of &lt;a href=&#34;https://web.archive.org/web/20180627053546/https://twitter.com/wragge/lists/trove-bots/members&#34;&gt;Trove bots&lt;/a&gt; on Twitter, so I used the list to separate the tweets into two files, one for bots and one for ordinary users.&lt;/p&gt;
&lt;p&gt;To respect user intentions and comply with the Twitter API terms of use, I removed all the tweet information except for &lt;code&gt;tweet_id&lt;/code&gt; and &lt;code&gt;tweet_date&lt;/code&gt; from the files. If it hasn&amp;rsquo;t been deleted, the full data for each tweet can probably be obtained from the X API using the &lt;code&gt;tweet_id&lt;/code&gt;, though you might need a paid subscription.&lt;/p&gt;
&lt;p&gt;The two main files are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;trove_url_tweets.csv&lt;/code&gt; – links shared by human users (although it may include some unidentified bots)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;trove_url_tweets_bots.csv&lt;/code&gt; – links shared by bots&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I also created some additional data files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;trove_url_totals.csv&lt;/code&gt; – the number of times each Trove link was shared by users (not including bots)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;active_users_per_year.csv&lt;/code&gt; – the number of unique users each year who shared a link to Trove&lt;/li&gt;
&lt;li&gt;&lt;code&gt;active_bots_per_year.csv&lt;/code&gt; – the number of active bots each year sharing links to Trove&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There&amp;rsquo;s more information about the structure and contents of the data files &lt;a href=&#34;https://doi.org/10.5281/zenodo.15694063&#34;&gt;in the Zenodo record&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;overview&#34;&gt;Overview&lt;/h2&gt;
&lt;p&gt;I haven&amp;rsquo;t explored the data in detail, but here are some quick summaries to give you a taste.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;summary&lt;/th&gt;
&lt;th&gt;count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;number of unique users sharing Trove links&lt;/td&gt;
&lt;td&gt;9,296&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;number of bots sharing Trove links&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;number of tweets by humans containing Trove links&lt;/td&gt;
&lt;td&gt;48,323&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;number of tweets by bots containing Trove links&lt;/td&gt;
&lt;td&gt;318,767&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;number of unique links shared by humans&lt;/td&gt;
&lt;td&gt;36,906&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;number of unique links shared by bots&lt;/td&gt;
&lt;td&gt;270,474&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;What types of links were people sharing?&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;types of link shared by humans&lt;/th&gt;
&lt;th&gt;count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;newspaper article&lt;/td&gt;
&lt;td&gt;34,568&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;other (search queries, home page etc)&lt;/td&gt;
&lt;td&gt;8,388&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;work (items other than newspapers – books, maps, photos etc)&lt;/td&gt;
&lt;td&gt;4,856&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;newspaper page&lt;/td&gt;
&lt;td&gt;1,378&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;newspaper title&lt;/td&gt;
&lt;td&gt;406&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;How did the number of links shared by humans vary across time?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/tweets-per-year-updated.png&#34; width=&#34;600&#34; height=&#34;306&#34; alt=&#34;Bar chart showing the number of Trove links shared on Twitter by year from 2009 to 2020. Colours indicate the type of Trove resource.&#34;&gt;
&lt;p&gt;Which articles or pages were shared most often by humans? Here&amp;rsquo;s the top ten (click on the link to view).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;trove_id&lt;/th&gt;
&lt;th&gt;trove_type&lt;/th&gt;
&lt;th&gt;tweets&lt;/th&gt;
&lt;th&gt;retweets&lt;/th&gt;
&lt;th&gt;quotes&lt;/th&gt;
&lt;th&gt;total times shared&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/75869223&#34;&gt;75869223&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;1,232&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;1,327&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/1298497&#34;&gt;1298497&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;141&lt;/td&gt;
&lt;td&gt;1,028&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;1,222&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/102074798&#34;&gt;102074798&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;74&lt;/td&gt;
&lt;td&gt;693&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;td&gt;844&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/68141866&#34;&gt;68141866&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;138&lt;/td&gt;
&lt;td&gt;522&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;708&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/41602327&#34;&gt;41602327&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;633&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;663&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/100645214&#34;&gt;100645214&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;111&lt;/td&gt;
&lt;td&gt;467&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;598&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/page/502650&#34;&gt;502650&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;page&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;513&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;526&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/60828173&#34;&gt;60828173&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;444&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;511&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/4173156&#34;&gt;4173156&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;321&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/79410604&#34;&gt;79410604&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;303&lt;/td&gt;
&lt;td&gt;69&lt;/td&gt;
&lt;td&gt;374&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The most shared article reports that PM Menzies had described Hitler as a &amp;lsquo;great man&amp;rsquo; at a meeting in July 1939. However, most of the tweets sharing this link came from a single user. A number of the other articles relate to the weather, a reflection of the fact that Trove&amp;rsquo;s newspaper articles have been mobilised on both sides of the climate change debate.&lt;/p&gt;
&lt;p&gt;How many Twitter users were sharing links to Trove each year?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/tweets-humans-per-year-updated.png&#34; width=&#34;600&#34; height=&#34;340&#34; alt=&#34;Bar chart showing the number of Twitter users sharing links to Trove each year from 2009 to 2020&#34;&gt;
&lt;p&gt;I haven&amp;rsquo;t included any of the bot data in these summaries because I think I&amp;rsquo;ll write a second bot-themed post – coming soon!&lt;/p&gt;
&lt;h2 id=&#34;updates&#34;&gt;Updates&lt;/h2&gt;
&lt;p&gt;I updated the data in this post on 19 June 2025, as I realised some Twitter accounts were originally run by humans before being bot-ified.&lt;/p&gt;
</description>
      <source:markdown>A few years ago, I harvested the details of tweets that included links to Trove. The data has just been sitting on my computer, so I thought I should package it up and share, in case it&#39;s of use to anyone.

The story is that back in 2021, I was working on the article [&#39;More than newspapers&#39;](https://doi.org/10.5281/zenodo.5595420) for a special section of *History Australia* focusing on Trove. I was thinking that I might include something about the way Trove newspaper articles were mobilised within online discussions about history – a topic I first explored in [&#39;Life on the outside: connections, contexts, and the wild, wild web&#39;](https://doi.org/10.5281/zenodo.3566879), my keynote for the Annual Conference of the Japanese Association of Digital Humanities in 2014. In the end, the article went in another direction, so I didn&#39;t use the data.

I remembered this recently and thought I should do something with it. I&#39;ve now created a dataset and [shared it on Zenodo](https://doi.org/10.5281/zenodo.15627800). I&#39;m [not working on Trove any more](https://updates.timsherratt.org/2025/05/07/farewell-trove.html), but I&#39;m hoping that someone else might find the data useful!

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15694063.svg)](https://doi.org/10.5281/zenodo.15694063)

The dataset contains information about tweets from 2009 to 2020 that include links to [Trove](https://trove.nla.gov.au/). The tweet data was compiled using [Twarc](https://twarc-project.readthedocs.io/en/latest/) in May 2021, under Twitter&#39;s academic access program. The search queries used were:

- `url:nla.gov.au/nla.news`
- `url:trove.nla.gov.au`
- `url:newspapers.nla.gov.au`
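
If you want to reconstruct this sort of harvest, here&#39;s a rough sketch using the twarc library in Python. It assumes you have a bearer token with full-archive search access – the original harvest ran under Twitter&#39;s academic track, which no longer exists in that form, so the token and output filename below are placeholders:

```python
# A minimal sketch of a full-archive harvest with twarc.
# The bearer token and output filename are placeholders, not the originals.
import json

from twarc import Twarc2

client = Twarc2(bearer_token=&#34;YOUR_BEARER_TOKEN&#34;)

queries = [
    &#34;url:nla.gov.au/nla.news&#34;,
    &#34;url:trove.nla.gov.au&#34;,
    &#34;url:newspapers.nla.gov.au&#34;,
]

with open(&#34;trove_tweets.jsonl&#34;, &#34;w&#34;) as f:
    for query in queries:
        # search_all pages through the full tweet archive; each page
        # is a JSON response containing a batch of tweets
        for page in client.search_all(query):
            f.write(json.dumps(page) + &#34;\n&#34;)
```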

Many of the tweets were produced by bots. Fortunately, I&#39;d been maintaining a list of [Trove bots](https://web.archive.org/web/20180627053546/https://twitter.com/wragge/lists/trove-bots/members) on Twitter, so I used the list to separate the tweets into two files, one for bots and one for ordinary users.

To respect user intentions and comply with the Twitter API terms of use, I removed all the tweet information except for `tweet_id` and `tweet_date` from the files. If it hasn&#39;t been deleted, the full data for each tweet can probably be obtained from the X API using the `tweet_id`, though you might need a paid subscription.
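
For the record, the splitting and stripping steps look something like the sketch below. The column names (`screen_name`, `id`, `created_at`) are assumptions about the harvested data, not the actual schema:

```python
# A rough sketch of the bot/human split and field-stripping described above.
# Column names (screen_name, id, created_at) are assumptions, not the
# actual schema of the harvested data.
import pandas as pd

# the real list had 43 accounts; these two are just placeholders
BOT_ACCOUNTS = {&#34;trovenewsbot&#34;, &#34;troveairraidbot&#34;}

df = pd.read_csv(&#34;trove_tweets.csv&#34;)
is_bot = df[&#34;screen_name&#34;].str.lower().isin(BOT_ACCOUNTS)

# keep only tweet_id and tweet_date, per the Twitter API terms of use
slim = df.rename(columns={&#34;id&#34;: &#34;tweet_id&#34;, &#34;created_at&#34;: &#34;tweet_date&#34;})
slim = slim[[&#34;tweet_id&#34;, &#34;tweet_date&#34;]]

slim[~is_bot].to_csv(&#34;trove_url_tweets.csv&#34;, index=False)
slim[is_bot].to_csv(&#34;trove_url_tweets_bots.csv&#34;, index=False)
```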

The two main files are:

- `trove_url_tweets.csv` – links shared by human users (although it may include some unidentified bots)
- `trove_url_tweets_bots.csv` – links shared by bots

I also created some additional data files:

- `trove_url_totals.csv` – the number of times each Trove link was shared by users (not including bots)
- `active_users_per_year.csv` – the number of unique users each year who shared a link to Trove
- `active_bots_per_year.csv` – the number of active bots each year sharing links to Trove

There&#39;s more information about the structure and contents of the data files [in the Zenodo record](https://doi.org/10.5281/zenodo.15694063).

## Overview

I haven&#39;t explored the data in detail, but here are some quick summaries to give you a taste.

| summary                                            | count   |
| ------------------------------------------------- | ------- |
| number of unique users sharing Trove links        | 9,296   |
| number of bots sharing Trove links                | 43      |
| number of tweets by humans containing Trove links | 48,323  |
| number of tweets by bots containing Trove links   | 318,767 |
| number of unique links shared by humans           | 36,906  |
| number of unique links shared by bots             | 270,474 |

What types of links were people sharing?

| types of link shared by humans                               | count  |
| ------------------------------------------------------------ | ------ |
| newspaper article                                            | 34,568 |
| other (search queries, home page etc)                        | 8,388  |
| work (items other than newspapers – books, maps, photos etc) | 4,856  |
| newspaper page                                               | 1,378  |
| newspaper title                                              | 406    |
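
These categories can be derived from the URLs themselves. A function like the one sketched below would do it, though the patterns here are simplified assumptions – Trove&#39;s real URL space is messier:

```python
# Bucket Trove URLs into rough resource types by path pattern.
# The patterns are simplified assumptions, not an exhaustive list.
import re

def classify_trove_url(url):
    if re.search(r&#34;nla\.news-article\d+|newspaper/article/&#34;, url):
        return &#34;newspaper article&#34;
    if re.search(r&#34;nla\.news-page\d+|newspaper/page/&#34;, url):
        return &#34;newspaper page&#34;
    if re.search(r&#34;nla\.news-title\d+|newspaper/title/&#34;, url):
        return &#34;newspaper title&#34;
    if re.search(r&#34;/work/\d+&#34;, url):
        return &#34;work&#34;
    return &#34;other&#34;

# e.g. df[&#34;trove_type&#34;] = df[&#34;url&#34;].apply(classify_trove_url)
```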

How did the number of links shared by humans vary across time?

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/tweets-per-year-updated.png&#34; width=&#34;600&#34; height=&#34;306&#34; alt=&#34;Bar chart showing the number of Trove links shared on Twitter by year from 2009 to 2020. Colours indicate the type of Trove resource.&#34;&gt;
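
A per-year tally like the one charted above can be rebuilt from the released file, assuming the `tweet_date` column parses as a timestamp:

```python
# Count link-sharing tweets by year from the released dataset.
import pandas as pd

tweets = pd.read_csv(&#34;trove_url_tweets.csv&#34;, parse_dates=[&#34;tweet_date&#34;])
per_year = tweets[&#34;tweet_date&#34;].dt.year.value_counts().sort_index()
print(per_year)
```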

Which articles or pages were shared most often by humans? Here&#39;s the top ten (click on the link to view).

| trove_id                                                     | trove_type | tweets | retweets | quotes | total times shared |
| ------------------------------------------------------------ | ---------- | ------ | -------- | ------ | ------------------ |
| [75869223](https://trove.nla.gov.au/newspaper/article/75869223) | article    | 1,232  | 61       | 34     | 1,327              |
| [1298497](https://trove.nla.gov.au/newspaper/article/1298497) | article    | 141    | 1,028    | 53     | 1,222              |
| [102074798](https://trove.nla.gov.au/newspaper/article/102074798) | article    | 74     | 693      | 77     | 844                |
| [68141866](https://trove.nla.gov.au/newspaper/article/68141866) | article    | 138    | 522      | 48     | 708                |
| [41602327](https://trove.nla.gov.au/newspaper/article/41602327) | article    | 633    | 30       | 0      | 663                |
| [100645214](https://trove.nla.gov.au/newspaper/article/100645214) | article    | 111    | 467      | 20     | 598                |
| [502650](https://trove.nla.gov.au/newspaper/page/502650)     | page       | 1      | 513      | 12     | 526                |
| [60828173](https://trove.nla.gov.au/newspaper/article/60828173) | article    | 48     | 444      | 19     | 511                |
| [4173156](https://trove.nla.gov.au/newspaper/article/4173156) | article    | 53     | 321      | 10     | 384                |
| [79410604](https://trove.nla.gov.au/newspaper/article/79410604) | article    | 2      | 303      | 69     | 374                |

The most shared article reports that PM Menzies had described Hitler as a &#39;great man&#39; at a meeting in July 1939. However, most of the tweets sharing this link came from a single user. A number of the other articles relate to the weather, a reflection of the fact that Trove&#39;s newspaper articles have been mobilised on both sides of the climate change debate. 

How many Twitter users were sharing links to Trove each year?

&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/tweets-humans-per-year-updated.png&#34; width=&#34;600&#34; height=&#34;340&#34; alt=&#34;Bar chart showing the number of Twitter users sharing links to Trove each year from 2009 to 2020&#34;&gt;

I haven&#39;t included any of the bot data in these summaries because I think I&#39;ll write a second bot-themed post – coming soon!

## Updates

I updated the data in this post on 19 June 2025, as I realised some Twitter accounts were originally run by humans before being bot-ified.
</source:markdown>
    </item>
    
    <item>
      <title>GLAM Workbench ­– preprint for &#39;Building User-Friendly Toolkits and Platforms for Digital Humanities&#39;</title>
      <link>https://updates.timsherratt.org/2025/06/05/glam-workbench-preprint-for-building.html</link>
      <pubDate>Thu, 05 Jun 2025 16:16:53 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/06/05/glam-workbench-preprint-for-building.html</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a preprint of my contribution to the publication &amp;lsquo;Building User-Friendly Toolkits and Platforms for Digital Humanities&amp;rsquo;. It provides a brief overview of the GLAM Workbench. I had to leave a lot out, but hopefully it provides a useful summary of what the GLAM Workbench is, and what I&amp;rsquo;d like it to be.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.15597924&#34;&gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.15597924.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;The GLAM Workbench is a collection of tools and resources created to help researchers use and explore the digital collections of GLAM organisations (galleries, libraries, archives, and museums).&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt; It&amp;rsquo;s mainly focused on collections from Australia and New Zealand, but some sections venture across international boundaries to explore topics such as web archives and Wikidata.&lt;/p&gt;
&lt;p&gt;GLAM organisations make a lot of rich cultural data available online, but getting that data in a machine-readable form that can be aggregated and analysed is often difficult. The GLAM Workbench tries to fill this gap by providing code examples and API documentation, but data access alone is not enough. Researchers need to understand the history, structure, and extent of the data – both its limits and its possibilities. By sharing snapshots, building overviews, and exploring patterns and inconsistencies, the GLAM Workbench also attempts to contextualise GLAM collections and open them to new types of questions.&lt;sup id=&#34;fnref:2&#34;&gt;&lt;a href=&#34;#fn:2&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&#34;history-and-motivation&#34;&gt;History and motivation&lt;/h2&gt;
&lt;p&gt;I created the GLAM Workbench in 2017, but it incorporates the latest versions of tools, such as the Trove Newspaper Harvester, which I&amp;rsquo;ve been maintaining for more than 15 years.&lt;sup id=&#34;fnref:3&#34;&gt;&lt;a href=&#34;#fn:3&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;3&lt;/a&gt;&lt;/sup&gt; One of my motivations was simply to bring together useful snippets, notes, and doodles from a variety of blog posts, web applications, and code repositories, and make them available in a form that could be more easily navigated and maintained.&lt;/p&gt;
&lt;p&gt;I was also keen to explore the way that Jupyter notebooks combine code and narrative. I wanted to find ways to support researchers as they developed their digital skills and confidence, not just dump them at the command line or point them to an app.&lt;/p&gt;
&lt;p&gt;The ongoing development of the GLAM Workbench is also part of my own research. I&amp;rsquo;m interested in the meaning of access within the context of GLAM collections. What changes when you can download data and explore collections beyond the limitations of the web interface?&lt;sup id=&#34;fnref:4&#34;&gt;&lt;a href=&#34;#fn:4&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&#34;contents-and-technologies&#34;&gt;Contents and technologies&lt;/h2&gt;
&lt;p&gt;At its heart, the GLAM Workbench comprises at least 171 Jupyter notebooks and 59 datasets shared through more than 70 GitHub repositories.&lt;sup id=&#34;fnref:5&#34;&gt;&lt;a href=&#34;#fn:5&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;5&lt;/a&gt;&lt;/sup&gt; Added to this are a number of web apps, online databases, and guides to related resources. Code from some notebooks has also been spun off into independent Python packages. All of this is brought together within a single documentation site, built using MkDocs Material.&lt;/p&gt;
&lt;p&gt;The contents are mostly organised by institution, reflecting the idiosyncrasies of the data. I&amp;rsquo;ve partially implemented tags to draw together similar resources across institutions, but this needs to be made more consistent, ideally using the TaDiRAH taxonomy.&lt;sup id=&#34;fnref:6&#34;&gt;&lt;a href=&#34;#fn:6&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;6&lt;/a&gt;&lt;/sup&gt; Many of the notebooks describe methods for accessing data and building datasets. Others demonstrate techniques for visualisation and analysis, suggest workarounds for limits imposed by collection interfaces, or provide example-driven documentation for APIs and datasets.&lt;/p&gt;
&lt;p&gt;There is no single platform or server underlying the GLAM Workbench. Instead, it follows a pattern described in the ARDC Community Data Lab&amp;rsquo;s architecture principles as &amp;lsquo;infrastructure at rest&amp;rsquo;.&lt;sup id=&#34;fnref:7&#34;&gt;&lt;a href=&#34;#fn:7&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;7&lt;/a&gt;&lt;/sup&gt; Notebooks can be run as required in a variety of contexts from cloud services to local computers. This is made possible by standardised configuration files and automated processes that build virtual computing environments from each GitHub repository.&lt;sup id=&#34;fnref:8&#34;&gt;&lt;a href=&#34;#fn:8&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;8&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&#34;impact-and-engagement&#34;&gt;Impact and engagement&lt;/h2&gt;
&lt;p&gt;The GLAM Workbench has helped to expand understanding of the research possibilities of GLAM collection data. The list of publications citing the GLAM Workbench or one of its embedded tools now includes more than 100 entries.&lt;sup id=&#34;fnref:9&#34;&gt;&lt;a href=&#34;#fn:9&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;9&lt;/a&gt;&lt;/sup&gt; Some of these relate to individual research projects, while others survey the practices of GLAM organisations and the needs of research infrastructure around the world.&lt;/p&gt;
&lt;p&gt;My work on the GLAM Workbench has helped inspire organisations such as the National Library of Scotland to explore new ways of supporting digital research.&lt;sup id=&#34;fnref:10&#34;&gt;&lt;a href=&#34;#fn:10&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;10&lt;/a&gt;&lt;/sup&gt; A recent report from the &amp;lsquo;Towards a National Collection&amp;rsquo; project in the UK has mentioned the GLAM Workbench alongside a number of national libraries in Europe and the USA for &amp;lsquo;encouraging innovative research and expanding public engagement with heritage resources&amp;rsquo;.&lt;sup id=&#34;fnref:11&#34;&gt;&lt;a href=&#34;#fn:11&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;11&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;And yet, there are disappointments. Most of the Australian GLAM organisations whose collections are featured in the GLAM Workbench have shown little interest in sharing or engaging with its resources. This makes it difficult to get tools to the people who could benefit from them. There&amp;rsquo;s some irony in the fact that the websites of the National Library of Scotland, the British Library, the UK National Archives, the V&amp;amp;A Museum, and DigitalNZ all include links to the GLAM Workbench, but the National Library of Australia (NLA) and the National Archives of Australia (NAA) do not.&lt;/p&gt;
&lt;h2 id=&#34;maintenance-and-sustainability&#34;&gt;Maintenance and sustainability&lt;/h2&gt;
&lt;p&gt;While a number of individuals have contributed notebooks and additions to the GLAM Workbench, it remains essentially a one-man operation. Over the years, I&amp;rsquo;ve sought to ease the maintenance burden by automating processes, adding some basic testing frameworks, and generating machine-readable metadata that summarises the contents of each repository. For example, I created a GLAM Workbench repository template that makes it easy to start work on a new topic.&lt;sup id=&#34;fnref:12&#34;&gt;&lt;a href=&#34;#fn:12&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;12&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Development of the web archives section of the GLAM Workbench was made possible by a grant from the International Internet Preservation Consortium, and the section&amp;rsquo;s ongoing maintenance is supported by the British Library.&lt;sup id=&#34;fnref:13&#34;&gt;&lt;a href=&#34;#fn:13&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;13&lt;/a&gt;&lt;/sup&gt; I&amp;rsquo;m grateful too for my GitHub sponsors who help cover some of my cloud hosting bills, and to the ARDC for funding to integrate RO-Crate metadata.&lt;sup id=&#34;fnref:14&#34;&gt;&lt;a href=&#34;#fn:14&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;14&lt;/a&gt;&lt;/sup&gt; But beyond this, the GLAM Workbench has received no dedicated funding or institutional support. It has, nonetheless, outlived some well-funded digital infrastructure projects in the HASS sector.&lt;/p&gt;
&lt;p&gt;Sustainability means more than money, though. The GLAM Workbench doesn&amp;rsquo;t have to continue in its current form to have a long-term impact. My focus is on ensuring that its contents are open to future reuse and modification. Everything is openly licensed, published through GitHub, and preserved in Zenodo. If tools are useful they can live on, independent of me.&lt;/p&gt;
&lt;h2 id=&#34;the-future&#34;&gt;The future&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m writing this at a difficult time. Changes wrought by the NLA and NAA in early 2025 have made it impossible for me to continue work on the Trove and RecordSearch sections of the GLAM Workbench.&lt;sup id=&#34;fnref:15&#34;&gt;&lt;a href=&#34;#fn:15&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;15&lt;/a&gt;&lt;/sup&gt; In the Trove section alone, there are more than 70 notebooks.&lt;/p&gt;
&lt;p&gt;The GLAM Workbench is not my job; no-one pays me. I work on it because I think it&amp;rsquo;s useful and important, and because I enjoy the process of solving problems and helping researchers. The NLA&amp;rsquo;s actions, in particular, have robbed me of that joy, and made me consider whether I want to continue. Research infrastructure is people.&lt;/p&gt;
&lt;p&gt;On the other hand, there are many more GLAM collections for me to explore. I&amp;rsquo;m also hoping to find new ways of collaborating with individuals and institutions. I&amp;rsquo;m often inspired to create new tools and resources by gnarly questions from researchers. While such questions continue, the GLAM Workbench will grow.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;
&lt;p&gt;Ames, Sarah, and Lucy Havens. “Exploring National Library of Scotland Datasets with Jupyter Notebooks.” &lt;em&gt;IFLA Journal&lt;/em&gt;, December 27, 2021. &lt;a href=&#34;https://doi.org/10.1177/03400352211065484&#34;&gt;doi.org/10.1177/0&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Bailey, Rebecca, Javier Pereda, Chris Michaels, and Tom Callahan. “Unlocking the Potential of Digital Collections. A Call to Action.” Towards a National Collection, November 21, 2024. &lt;a href=&#34;https://doi.org/10.5281/zenodo.13838916&#34;&gt;doi.org/10.5281/z&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Candela, Gustavo, Sally Chambers, and Tim Sherratt. “An Approach to Assess the Quality of Jupyter Projects Published by GLAM Institutions.” &lt;em&gt;Journal of the Association for Information Science and Technology&lt;/em&gt; 74, no. 13 (2023): 1550–64. &lt;a href=&#34;https://doi.org/10.1002/asi.24835&#34;&gt;doi.org/10.1002/a&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;“GLAM Workbench (GitHub Organisation).” Accessed June 5, 2025. &lt;a href=&#34;https://github.com/GLAM-Workbench&#34;&gt;github.com/GLAM-Work&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;IIPC. “Asking Questions with Web Archives – Introductory Notebooks for Historians.” Accessed June 5, 2025. &lt;a href=&#34;https://netpreserve.org/projects/jupyter-notebooks-for-historians/&#34;&gt;netpreserve.org/projects/&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Jackson, Andy. “GLAM Workbench Update.” UK Web Archive Blog. Accessed June 2, 2025. &lt;a href=&#34;https://blogs.bl.uk/webarchive/2022/09/glam-workbench-update.html&#34;&gt;blogs.bl.uk/webarchiv&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Sefton, Peter, Tom Honeyman, Tim Sherratt, and Conal Tuohy. “The ARDC Community Data Lab Architecture: Research Software Deployment Principles and Patterns for Integrity, Reproducibility and Sustainability,” May 10, 2024. &lt;a href=&#34;https://zenodo.org/records/11169744&#34;&gt;zenodo.org/records/1&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Sherratt, Tim. “Develop a New GLAM Workbench Repository.” GLAM Workbench. Accessed June 5, 2025. &lt;a href=&#34;https://glam-workbench.net/get-involved/developing-repositories/&#34;&gt;glam-workbench.net/get-invol&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Farewell Trove.” &lt;em&gt;Tim Sherratt – Sharing Recent Updates and Work-in-Progress&lt;/em&gt;, May 7, 2025. &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;updates.timsherratt.org/2025/05/0&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “GLAM Workbench.” Zenodo, June 5, 2025. &lt;a href=&#34;https://doi.org/10.5281/zenodo.15597489&#34;&gt;doi.org/10.5281/z&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “GLAM Workbench.” Accessed June 5, 2025. &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;glam-workbench.net/.&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “GLAM Workbench Citations.” &lt;em&gt;GLAM Workbench&lt;/em&gt;. Accessed June 5, 2025. &lt;a href=&#34;https://glam-workbench.net/citations/&#34;&gt;glam-workbench.net/citations&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Hacking Heritage: Understanding the Limits of Online Access.” In &lt;em&gt;The Routledge International Handbook of New Digital Practices in Galleries, Libraries, Archives, Museums and Heritage Sites&lt;/em&gt;, edited by H Lewi, W Smith, S Cooke, and D vom Lehn, 116–30. London &amp;amp; New York: Routledge, 2020. &lt;a href=&#34;https://doi.org/10.5281/zenodo.5035855&#34;&gt;doi.org/10.5281/z&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “No More Harvesting Data from the National Archives of Australia.” &lt;em&gt;Tim Sherratt – Sharing Recent Updates and Work-in-Progress&lt;/em&gt;, May 19, 2025. &lt;a href=&#34;https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html&#34;&gt;updates.timsherratt.org/2025/05/1&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Some Important Updates for the Trove Newspaper &amp;amp; Gazette Harvester.” &lt;em&gt;Tim Sherratt – Sharing Recent Updates and Work-in-Progress&lt;/em&gt;, August 31, 2023. &lt;a href=&#34;https://updates.timsherratt.org/2023/08/31/some-important-updates.html&#34;&gt;updates.timsherratt.org/2023/08/3&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Supporters.” &lt;em&gt;GLAM Workbench&lt;/em&gt;. Accessed June 5, 2025. &lt;a href=&#34;https://glam-workbench.net/get-involved/supporters/&#34;&gt;glam-workbench.net/get-invol&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Trove Newspapers: Data Dashboard.” Accessed June 5, 2025. &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;wragge.github.io/trove-new&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Trove-Newspaper-Harvester.” Python, October 23, 2023. &lt;a href=&#34;https://doi.org/10.5281/zenodo.7103174&#34;&gt;doi.org/10.5281/z&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Sherratt, Tim, Harry Keightley, Ben Foley, and Michael Niemann. “GLAM-Workbench/Glam-Workbench-Template.” Python. GLAM Workbench, August 24, 2023. &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench-template&#34;&gt;github.com/GLAM-Work&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;“TaDiRAH The Taxonomy of Digital Research Activities in the Humanities.” Accessed June 5, 2025. &lt;a href=&#34;https://tadirah.info/&#34;&gt;tadirah.info/.&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Talboom, Leontien, and Mark Bell. “Keeping It under Lock and Keywords: Exploring New Ways to Open up the Web Archives with Notebooks.” &lt;em&gt;Archival Science&lt;/em&gt;, July 4, 2022. &lt;a href=&#34;https://doi.org/10.1007/s10502-022-09391-6&#34;&gt;doi.org/10.1007/s&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;“Trove Historical Data.” Accessed June 5, 2025. &lt;a href=&#34;https://zenodo.org/communities/trove-historical-data/records?q=&amp;amp;l=list&amp;amp;p=1&amp;amp;s=10&amp;amp;sort=newest&#34;&gt;zenodo.org/communiti&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;section class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt, “GLAM Workbench.”&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:2&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;See, for example: Sherratt, “Trove Newspapers: Data Dashboard,” and “Trove Historical Data.”&amp;#160;&lt;a href=&#34;#fnref:2&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:3&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt, “Trove-Newspaper-Harvester.”&amp;#160;&lt;a href=&#34;#fnref:3&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:4&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;See, for example: Sherratt, “Hacking Heritage: Understanding the Limits of Online Access.”&amp;#160;&lt;a href=&#34;#fnref:4&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:5&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;“GLAM Workbench (GitHub Organisation).”&amp;#160;&lt;a href=&#34;#fnref:5&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:6&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;“TaDiRAH The Taxonomy of Digital Research Activities in the Humanities.”&amp;#160;&lt;a href=&#34;#fnref:6&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:7&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sefton et al., “The ARDC Community Data Lab Architecture.”&amp;#160;&lt;a href=&#34;#fnref:7&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:8&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;For more on best practices in sharing Jupyter projects, see: Candela, Chambers, and Sherratt, “An Approach to Assess the Quality of Jupyter Projects Published by GLAM Institutions.”&amp;#160;&lt;a href=&#34;#fnref:8&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:9&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt, “GLAM Workbench Citations.”&amp;#160;&lt;a href=&#34;#fnref:9&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:10&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Ames and Havens, “Exploring National Library of Scotland Datasets with Jupyter Notebooks.” For another example of the GLAM Workbench&amp;rsquo;s influence, see: Talboom and Bell, “Keeping It under Lock and Keywords.”&amp;#160;&lt;a href=&#34;#fnref:10&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:11&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Bailey et al., “Unlocking the Potential of Digital Collections. A Call to Action,” 58.&amp;#160;&lt;a href=&#34;#fnref:11&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:12&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt et al., “GLAM-Workbench/Glam-Workbench-Template.” For documentation see: Sherratt, “Develop a New GLAM Workbench Repository.”&amp;#160;&lt;a href=&#34;#fnref:12&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:13&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;“Asking Questions with Web Archives – Introductory Notebooks for Historians”; Jackson, “GLAM Workbench Update.”&amp;#160;&lt;a href=&#34;#fnref:13&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:14&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt, “Supporters”; Sherratt, “Some Important Updates for the Trove Newspaper &amp;amp; Gazette Harvester.”&amp;#160;&lt;a href=&#34;#fnref:14&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:15&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt, “Farewell Trove”; Sherratt, “No More Harvesting Data from the National Archives of Australia.”&amp;#160;&lt;a href=&#34;#fnref:15&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</description>
      <source:markdown>*This is a preprint of my contribution to the publication &#39;Building User-Friendly Toolkits and Platforms for Digital Humanities&#39;. It provides a brief overview of the GLAM Workbench. I had to leave a lot out, but hopefully it provides a useful summary of what the GLAM Workbench is, and what I&#39;d like it to be.*

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15597924.svg)](https://doi.org/10.5281/zenodo.15597924)

----

The GLAM Workbench is a collection of tools and resources created to help researchers use and explore the digital collections of GLAM organisations (galleries, libraries, archives, and museums).[^fn-gw] It&#39;s mainly focused on collections from Australia and New Zealand, but some sections venture across international boundaries to explore topics such as web archives and Wikidata.

GLAM organisations make a lot of rich cultural data available online, but getting that data in a machine-readable form that can be aggregated and analysed is often difficult. The GLAM Workbench tries to fill this gap by providing code examples and API documentation, but data access alone is not enough. Researchers need to understand the history, structure, and extent of the data – both its limits and its possibilities. By sharing snapshots, building overviews, and exploring patterns and inconsistencies, the GLAM Workbench also attempts to contextualise GLAM collections and open them to new types of questions.[^fn-dashboard]

## History and motivation

I created the GLAM Workbench in 2017, but it incorporates the latest versions of tools, such as the Trove Newspaper Harvester, which I&#39;ve been maintaining for more than 15 years.[^fn-harvester] One of my motivations was simply to bring together useful snippets, notes, and doodles from a variety of blog posts, web applications, and code repositories, and make them available in a form that could be more easily navigated and maintained.

I was also keen to explore the way that Jupyter notebooks combine code and narrative. I wanted to find ways to support researchers as they developed their digital skills and confidence, not just dump them at the command line or point them to an app.

The ongoing development of the GLAM Workbench is also part of my own research. I&#39;m interested in the meaning of access within the context of GLAM collections. What changes when you can download data and explore collections beyond the limitations of the web interface?[^fn-hacking]

## Contents and technologies

At its heart, the GLAM Workbench comprises at least 171 Jupyter notebooks and 59 datasets shared through more than 70 GitHub repositories.[^fn-gw-org] Added to this are a number of web apps, online databases, and guides to related resources. Code from some notebooks has also been spun off into independent Python packages. All of this is brought together within a single documentation site, built using MkDocs Material.

The contents are mostly organised by institution, reflecting the idiosyncrasies of the data. I&#39;ve partially implemented tags to draw together similar resources across institutions, but this needs to be made more consistent, ideally using the TaDiRAH taxonomy.[^fn-tadirah] Many of the notebooks describe methods for accessing data and building datasets. Others demonstrate techniques for visualisation and analysis, suggest workarounds for limits imposed by collection interfaces, or provide example-driven documentation for APIs and datasets.
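
Most of the data-access notebooks share a basic shape: request successive pages of results from a collection API and save them in a reusable form. Here&#39;s a minimal sketch of that harvesting loop – the endpoint, parameters, and response structure are hypothetical placeholders, since every API has its own quirks (which the notebooks document).

```python
# A minimal sketch of the harvesting pattern used across many notebooks:
# page through a collection API and save the results as a dataset.
# The endpoint, parameters, and response structure below are hypothetical
# placeholders, not a real institutional API.
import json

import requests

API_URL = "https://api.example-glam.org/search"  # hypothetical endpoint


def harvest(query, page_size=100):
    """Request successive pages of results until none are left."""
    records = []
    page = 1
    while True:
        response = requests.get(
            API_URL,
            params={"q": query, "page": page, "n": page_size},
            timeout=30,
        )
        response.raise_for_status()
        results = response.json().get("results", [])
        if not results:
            break
        records.extend(results)
        page += 1
    return records


if __name__ == "__main__":
    # Save the harvested records in a machine-readable form for reuse.
    with open("dataset.json", "w") as f:
        json.dump(harvest("example query"), f, indent=2)
```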

There is no single platform or server underlying the GLAM Workbench. Instead, it follows a pattern described in the ARDC Community Data Lab&#39;s architecture principles as &#39;infrastructure at rest&#39;.[^fn-cdl] Notebooks can be run as required in a variety of contexts from cloud services to local computers. This is made possible by standardised configuration files and automated processes that build virtual computing environments from each GitHub repository.[^fn-quality] 
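
For example, a repository that includes standard Binder configuration files (such as requirements.txt and runtime.txt) can be rebuilt on demand by a public BinderHub instance. The sketch below assembles a standard mybinder.org launch URL – the repository and notebook names are illustrative only.

```python
# A sketch of how infrastructure at rest gets woken up: given a GitHub
# repository with standard Binder configuration files, a public BinderHub
# can rebuild its computing environment on demand. The repository and
# notebook names below are illustrative only.
from urllib.parse import quote


def binder_url(org, repo, ref="master", notebook=None):
    """Construct a mybinder.org launch URL for a GitHub repository."""
    url = f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}"
    if notebook:
        # labpath opens a specific notebook in JupyterLab after the build
        url = f"{url}?labpath={quote(notebook)}"
    return url


# e.g. https://mybinder.org/v2/gh/GLAM-Workbench/example-repo/master?labpath=example.ipynb
print(binder_url("GLAM-Workbench", "example-repo", notebook="example.ipynb"))
```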

## Impact and engagement

The GLAM Workbench has helped to expand understanding of the research possibilities of GLAM collection data. The list of publications citing the GLAM Workbench or one of its embedded tools now includes more than 100 entries.[^fn-citations] Some of these relate to individual research projects, while others survey the practices of GLAM organisations and the needs of research infrastructure around the world.

My work on the GLAM Workbench has helped inspire organisations such as the National Library of Scotland to explore new ways of supporting digital research.[^fn-scotland] A recent report from the &#39;Towards a National Collection&#39; project in the UK has mentioned the GLAM Workbench alongside a number of national libraries in Europe and the USA for &#39;encouraging innovative research and expanding public engagement with heritage resources&#39;.[^fn-towards]

And yet, there are disappointments. Most of the Australian GLAM organisations whose collections are featured in the GLAM Workbench have shown little interest in sharing or engaging with its resources. This makes it difficult to get tools to the people who could benefit from them. There&#39;s some irony in the fact that the websites of the National Library of Scotland, the British Library, the UK National Archives, the V&amp;A Museum, and DigitalNZ all include links to the GLAM Workbench, but the National Library of Australia (NLA) and the National Archives of Australia (NAA) do not.

## Maintenance and sustainability

While a number of individuals have contributed notebooks and additions to the GLAM Workbench, it remains essentially a one-man operation. Over the years, I&#39;ve sought to ease the maintenance burden by automating processes, adding some basic testing frameworks, and generating machine-readable metadata that summarises the contents of each repository. For example, I created a GLAM Workbench repository template that makes it easy to start work on a new topic.[^fn-template]
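
To give a sense of what that metadata generation might look like, here&#39;s a simplified sketch that walks a repository, pulls a title from each notebook&#39;s first Markdown heading, and writes a machine-readable summary. The output fields are illustrative, not the GLAM Workbench&#39;s actual metadata schema.

```python
# A simplified sketch of generating machine-readable metadata for a
# repository of notebooks. The output fields are illustrative only,
# not the actual GLAM Workbench metadata schema.
import json
from pathlib import Path


def notebook_title(path):
    """Return the first Markdown heading in a notebook, if any."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    for cell in nb.get("cells", []):
        # In the notebook format, source may be a string or a list of strings
        source = "".join(cell.get("source", "")).strip()
        if cell.get("cell_type") == "markdown" and source.startswith("#"):
            return source.splitlines()[0].lstrip("# ")
    return path.stem


def summarise(repo_dir):
    """Build a simple machine-readable summary of the notebooks in a repo."""
    return [
        {"file": nb.name, "title": notebook_title(nb)}
        for nb in sorted(Path(repo_dir).glob("*.ipynb"))
    ]


if __name__ == "__main__":
    print(json.dumps(summarise("."), indent=2))
```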

Development of the web archives section of the GLAM Workbench was made possible by a grant from the International Internet Preservation Consortium, and the section&#39;s ongoing maintenance is supported by the British Library.[^fn-webarchives] I&#39;m grateful too for my GitHub sponsors who help cover some of my cloud hosting bills, and to the ARDC for funding to integrate RO-Crate metadata.[^fn-sponsors] But beyond this, the GLAM Workbench has received no dedicated funding or institutional support. It has, nonetheless, outlived some well-funded digital infrastructure projects in the HASS sector.

Sustainability means more than money, though. The GLAM Workbench doesn&#39;t have to continue in its current form to have a long-term impact. My focus is on ensuring that its contents are open to future reuse and modification. Everything is openly licensed, published through GitHub, and preserved in Zenodo. If tools are useful they can live on, independent of me.

## The future

I&#39;m writing this at a difficult time. Changes wrought by the NLA and NAA in early 2025 have made it impossible for me to continue work on the Trove and RecordSearch sections of the GLAM Workbench.[^fn-trove] In the Trove section alone, there are more than 70 notebooks.

The GLAM Workbench is not my job; no one pays me. I work on it because I think it&#39;s useful and important, and because I enjoy the process of solving problems and helping researchers. The NLA&#39;s actions, in particular, have robbed me of that joy and made me consider whether I want to continue. Research infrastructure is people.

On the other hand, there are many more GLAM collections for me to explore. I&#39;m also hoping to find new ways of collaborating with individuals and institutions. I&#39;m often inspired to create new tools and resources by gnarly questions from researchers. As long as such questions keep coming, the GLAM Workbench will keep growing.

[^fn-gw]: Sherratt, “GLAM Workbench.”
[^fn-dashboard]: See, for example: Sherratt, “Trove Newspapers: Data Dashboard,” and “Trove Historical Data.”
[^fn-harvester]: Sherratt, “Trove-Newspaper-Harvester.”
[^fn-hacking]: See, for example: Sherratt, “Hacking Heritage: Understanding the Limits of Online Access.”
[^fn-gw-org]: “GLAM Workbench (GitHub Organisation).”
[^fn-tadirah]: “TaDiRAH The Taxonomy of Digital Research Activities in the Humanities.”
[^fn-cdl]: Sefton et al., “The ARDC Community Data Lab Architecture.”
[^fn-quality]: For more on best practices in sharing Jupyter projects, see: Candela, Chambers, and Sherratt, “An Approach to Assess the Quality of Jupyter Projects Published by GLAM Institutions.”
[^fn-citations]: Sherratt, “GLAM Workbench Citations.”
[^fn-scotland]: Ames and Havens, “Exploring National Library of Scotland Datasets with Jupyter Notebooks.” For another example of the GLAM Workbench&#39;s influence, see: Talboom and Bell, “Keeping It under Lock and Keywords.”
[^fn-towards]: Bailey et al., “Unlocking the Potential of Digital Collections. A Call to Action,” 58.
[^fn-template]: Sherratt et al., “GLAM-Workbench/Glam-Workbench-Template.” For documentation see: Sherratt, “Develop a New GLAM Workbench Repository.”
[^fn-webarchives]: “Asking Questions with Web Archives – Introductory Notebooks for Historians”; Jackson, “GLAM Workbench Update.”
[^fn-sponsors]: Sherratt, “Supporters”; Sherratt, “Some Important Updates for the Trove Newspaper &amp; Gazette Harvester.”
[^fn-trove]: Sherratt, “Farewell Trove”; Sherratt, “No More Harvesting Data from the National Archives of Australia.”

----

## References

Ames, Sarah, and Lucy Havens. “Exploring National Library of Scotland Datasets with Jupyter Notebooks.” *IFLA Journal*, December 27, 2021. [doi.org/10.1177/0...](https://doi.org/10.1177/03400352211065484).

Bailey, Rebecca, Javier Pereda, Chris Michaels, and Tom Callahan. “Unlocking the Potential of Digital Collections. A Call to Action.” Towards a National Collection, November 21, 2024. [doi.org/10.5281/z...](https://doi.org/10.5281/zenodo.13838916).

Candela, Gustavo, Sally Chambers, and Tim Sherratt. “An Approach to Assess the Quality of Jupyter Projects Published by GLAM Institutions.” *Journal of the Association for Information Science and Technology* 74, no. 13 (2023): 1550–64. [doi.org/10.1002/a...](https://doi.org/10.1002/asi.24835).

“GLAM Workbench (GitHub Organisation).” Accessed June 5, 2025. [github.com/GLAM-Work...](https://github.com/GLAM-Workbench).

IIPC. “Asking Questions with Web Archives – Introductory Notebooks for Historians.” Accessed June 5, 2025. [netpreserve.org/projects/...](https://netpreserve.org/projects/jupyter-notebooks-for-historians/).

Jackson, Andy. “GLAM Workbench Update.” UK Web Archive Blog. Accessed June 2, 2025. [blogs.bl.uk/webarchiv...](https://blogs.bl.uk/webarchive/2022/09/glam-workbench-update.html).

Sefton, Peter, Tom Honeyman, Tim Sherratt, and Conal Tuohy. “The ARDC Community Data Lab Architecture: Research Software Deployment Principles and Patterns for Integrity, Reproducibility and Sustainability,” May 10, 2024. [zenodo.org/records/1...](https://zenodo.org/records/11169744).

Sherratt, Tim. “Develop a New GLAM Workbench Repository.” GLAM Workbench. Accessed June 5, 2025. [glam-workbench.net/get-invol...](https://glam-workbench.net/get-involved/developing-repositories/).

———. “Farewell Trove.” *Tim Sherratt – Sharing Recent Updates and Work-in-Progress*, May 7, 2025. [updates.timsherratt.org/2025/05/0...](https://updates.timsherratt.org/2025/05/07/farewell-trove.html).

———. “GLAM Workbench.” Zenodo, June 5, 2025. [doi.org/10.5281/z...](https://doi.org/10.5281/zenodo.15597489).

———. “GLAM Workbench.” Accessed June 5, 2025. [glam-workbench.net/](https://glam-workbench.net/).

———. “GLAM Workbench Citations.” *GLAM Workbench*. Accessed June 5, 2025. [glam-workbench.net/citations...](https://glam-workbench.net/citations/).

———. “Hacking Heritage: Understanding the Limits of Online Access.” In *The Routledge International Handbook of New Digital Practices in Galleries, Libraries, Archives, Museums and Heritage Sites*, edited by H Lewi, W Smith, S Cooke, and D vom Lehn, 116–30. London &amp; New York: Routledge, 2020. [doi.org/10.5281/z...](https://doi.org/10.5281/zenodo.5035855).

———. “No More Harvesting Data from the National Archives of Australia.” *Tim Sherratt – Sharing Recent Updates and Work-in-Progress*, May 19, 2025. [updates.timsherratt.org/2025/05/1...](https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html).

———. “Some Important Updates for the Trove Newspaper &amp; Gazette Harvester.” *Tim Sherratt – Sharing Recent Updates and Work-in-Progress*, August 31, 2023. [updates.timsherratt.org/2023/08/3...](https://updates.timsherratt.org/2023/08/31/some-important-updates.html).

———. “Supporters.” *GLAM Workbench*. Accessed June 5, 2025. [glam-workbench.net/get-invol...](https://glam-workbench.net/get-involved/supporters/).

———. “Trove Newspapers: Data Dashboard.” Accessed June 5, 2025. [wragge.github.io/trove-new...](https://wragge.github.io/trove-newspaper-totals/).

———. “Trove-Newspaper-Harvester.” Python, October 23, 2023. [doi.org/10.5281/z...](https://doi.org/10.5281/zenodo.7103174).

Sherratt, Tim, Harry Keightley, Ben Foley, and Michael Niemann. “GLAM-Workbench/Glam-Workbench-Template.” Python. GLAM Workbench, August 24, 2023. [github.com/GLAM-Work...](https://github.com/GLAM-Workbench/glam-workbench-template).

“TaDiRAH The Taxonomy of Digital Research Activities in the Humanities.” Accessed June 5, 2025. [tadirah.info/](https://tadirah.info/).

Talboom, Leontien, and Mark Bell. “Keeping It under Lock and Keywords: Exploring New Ways to Open up the Web Archives with Notebooks.” *Archival Science*, July 4, 2022. [doi.org/10.1007/s...](https://doi.org/10.1007/s10502-022-09391-6).

“Trove Historical Data.” Accessed June 5, 2025. [zenodo.org/communiti...](https://zenodo.org/communities/trove-historical-data/records?q=&amp;l=list&amp;p=1&amp;s=10&amp;sort=newest).

</source:markdown>
    </item>
    
  </channel>
</rss>
