<rss version="2.0">
  <channel>
    <title>glamworkbench on Tim Sherratt</title>
    <link>https://updates.timsherratt.org/categories/glamworkbench/</link>
    <description></description>
    
    <language>en</language>
    
    <lastBuildDate>Mon, 16 Mar 2026 15:14:17 +1100</lastBuildDate>
    
    <item>
      <title>Generosity in practice – a chat with Paula Bray at the State Library of Victoria</title>
      <link>https://updates.timsherratt.org/2026/03/16/generosity-in-practice-a-chat.html</link>
      <pubDate>Mon, 16 Mar 2026 15:14:17 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2026/03/16/generosity-in-practice-a-chat.html</guid>
      <description>&lt;p&gt;While I was in Melbourne during my time as &lt;a href=&#34;https://lab.slv.vic.gov.au/team/tim-sherratt&#34;&gt;Creative Technologist-in-Residence at the State Library of Victoria LAB&lt;/a&gt;, I had a conversation with Paula Bray for the LAB&amp;rsquo;s podcast series. Paula is the SLV&amp;rsquo;s Chief Digital Officer, and has long championed the importance of digital innovation in the GLAM sector. It was fun to chat about stuff that I&amp;rsquo;ve been doing for the last 30 years, and why openness and generosity are important in working with GLAM collections. You can &lt;a href=&#34;https://lab.slv.vic.gov.au/experiments/my-place-tim-sherratt/interview&#34;&gt;listen to our conversation on the LAB site&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Zotero translator for Libraries Tasmania updated!</title>
      <link>https://updates.timsherratt.org/2026/03/10/zotero-translator-for-libraries-tasmania.html</link>
      <pubDate>Tue, 10 Mar 2026 10:31:36 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2026/03/10/zotero-translator-for-libraries-tasmania.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://www.zotero.org&#34;&gt;Zotero&lt;/a&gt; translator for &lt;a href=&#34;https://libraries.tas.gov.au&#34;&gt;Libraries Tasmania&lt;/a&gt; has been updated, fixing a problem with attaching images of digitised resources. The fix is in the main Zotero repository now, so it should find its way to your computer automatically.&lt;/p&gt;
&lt;p&gt;I created the first version of the Libraries Tasmania translator back in 2022 – &lt;a href=&#34;https://updates.timsherratt.org/2022/07/14/calling-all-tasmanian.html&#34;&gt;this post describes what it does&lt;/a&gt;. It works across all three sections of the catalogue, including the archives and the names index. The translator captures metadata, PDFs, and images from records, including things like digitised pages from convict records. This makes it easy for researchers to assemble their own datasets of Tasmanian records in Zotero, where they can add notes and annotations, or share with colleagues.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/zotero-librariestas.png&#34; width=&#34;600&#34; height=&#34;382&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Capture images and metadata from the Libraries Tasmania catalogue using Zotero&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The update was necessary because Libraries Tasmania changed the way some digitised resources were displayed and downloaded. Keeping Zotero translators working across system updates can take a bit of work! I also took the opportunity to update the code to meet current Zotero guidelines and clean up a few lingering problems. If you notice any oddities, please let me know.&lt;/p&gt;
&lt;p&gt;There are now at least &lt;a href=&#34;https://updates.timsherratt.org/2024/08/22/new-zotero-translators.html#zotero-and-australian-glams&#34;&gt;8 custom translators&lt;/a&gt; to help you work with Australian GLAM collections.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>my place – exploring SLV collections through a street address</title>
      <link>https://updates.timsherratt.org/2026/02/02/my-place-exploring-slv-collections.html</link>
      <pubDate>Mon, 02 Feb 2026 22:43:20 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2026/02/02/my-place-exploring-slv-collections.html</guid>
      <description>&lt;p&gt;&lt;em&gt;&amp;lsquo;What can I find out about my house?&amp;rsquo;&lt;/em&gt; My work as &lt;a href=&#34;https://lab.slv.vic.gov.au/experiments/my-place-tim-sherratt&#34;&gt;Creative Technologist-in-Residence at the SLV LAB&lt;/a&gt; was inspired by questions like this that librarians at the SLV hear every day. I wanted to explore how the Library&amp;rsquo;s place-based collections could be used to provide new entry points for discovery and navigation – entry points based not on words, but locations.&lt;/p&gt;
&lt;p&gt;At the end of my residency, I pulled all the different collections I&amp;rsquo;d been working with into a single interface – &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;. It&amp;rsquo;s not polished or complete, but I think it&amp;rsquo;s a useful starting point to think about the possibilities. You just type in an address, street name, or place name and my place shows you maps, photos, newspapers, and even extracts from the Sands &amp;amp; MacDougall directories. &lt;strong&gt;&lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;Try it now!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/screenshot-from-2026-02-02-17-44-07.png&#34; width=&#34;600&#34; height=&#34;433&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;&lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;Try &lt;b&gt;&lt;i&gt;my place!&lt;/i&gt;&lt;/b&gt;&lt;/a&gt; Just enter an address in the search box.&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Search results in my place are bookmarkable. So save and share your discoveries!&lt;/p&gt;
&lt;h2 id=&#34;the-collections&#34;&gt;The collections&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; draws its data from a number of different place-based collections that I&amp;rsquo;ve been working on during my residency.&lt;/p&gt;
&lt;h3 id=&#34;openstreetmap&#34;&gt;OpenStreetMap&lt;/h3&gt;
&lt;p&gt;When you enter an address in the search box, &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; looks it up in &lt;a href=&#34;https://www.openstreetmap.org&#34;&gt;OpenStreetMap&lt;/a&gt; to get its geospatial coordinates. It then places a marker and re-centres the map at the top of the app.&lt;/p&gt;
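The lookup step can be sketched in a few lines of Python. This is a minimal sketch, assuming the app geocodes through Nominatim (OpenStreetMap's search service, which is mentioned later in this post); the parameter choices here are illustrative, not my place's actual configuration.

```python
import urllib.parse

# Build a Nominatim search URL for a free-text address. Requesting
# format=json returns a list of candidate matches, each with lat/lon fields.
def nominatim_search_url(address):
    params = {"q": address, "format": "json", "countrycodes": "au", "limit": 5}
    return "https://nominatim.openstreetmap.org/search?" + urllib.parse.urlencode(params)

url = nominatim_search_url("149 Brunswick Street, Fitzroy")
# fetch with urllib.request.urlopen(url) and read lat/lon from the first result
```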
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp.png&#34; width=&#34;600&#34; height=&#34;219&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Map centred on 149 Brunswick Street, Fitzroy&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;OpenStreetMap is also used to retrieve additional information about the suburb, including its boundaries.&lt;/p&gt;
&lt;h3 id=&#34;sands--macdougalls-directories&#34;&gt;Sands &amp;amp; MacDougall&amp;rsquo;s directories&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; queries the &lt;a href=&#34;https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html&#34;&gt;full-text searchable version of Sands &amp;amp; Mac&lt;/a&gt; for addresses. Results will vary based on the OCR quality and the nature of the query, but it can give you a potted history of who has lived in your house. The search results are displayed in chronological order, and include an &lt;a href=&#34;https://updates.timsherratt.org/2025/11/16/some-sands-mac-tweaks-thanks.html&#34;&gt;image snippet&lt;/a&gt; showing the actual printed entry as well as the text content and metadata.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-sandm.png&#34; width=&#34;600&#34; height=&#34;349&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Occupants of 149 Brunswick Street, Fitzroy from 1875 to 1925&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h3 id=&#34;committee-for-urban-action-photographs&#34;&gt;Committee for Urban Action photographs&lt;/h3&gt;
&lt;p&gt;If you enter a full street address, &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; will search &lt;a href=&#34;https://updates.timsherratt.org/2026/01/29/geolocating-photos-from-the-slvs.html&#34;&gt;the CUA collection&lt;/a&gt; for photos associated with the segment of road that includes the current address. It then displays the individual images from any matching photosets.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-cua.png&#34; width=&#34;600&#34; height=&#34;432&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Photographs from CUA of the currently selected road&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Otherwise &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; will look for CUA photos that are near the current location, and display a randomly-selected image from each photoset.&lt;/p&gt;
&lt;h3 id=&#34;georeferenced-maps&#34;&gt;Georeferenced maps&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; searches through &lt;a href=&#34;https://updates.timsherratt.org/2025/11/04/turning-the-slvs-maps-into.html&#34;&gt;digitised maps from the SLV collection that have been georeferenced by the public&lt;/a&gt;. It finds maps that either intersect with the currently selected location, or are nearby.&lt;/p&gt;
&lt;p&gt;If you enter a full street address, the first 6 georeferenced maps will be positioned on a modern basemap with a marker indicating the currently selected point. This means you can see your address on a historical map. The number of georeferenced maps that can be displayed in this way is determined by the browser – so I&amp;rsquo;ve limited it to 6 to be safe.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-geo.png&#34; width=&#34;600&#34; height=&#34;296&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Georeferenced maps positioned on a modern basemap, showing the location of the currently selected address&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h3 id=&#34;parish-maps&#34;&gt;Parish maps&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; searches through &lt;a href=&#34;https://updates.timsherratt.org/2025/10/06/creating-bounding-boxes-for-parish.html&#34;&gt;parish maps in the SLV collection that have geospatial coordinates or approximate bounding boxes&lt;/a&gt;. It finds maps that either intersect with the currently selected location, or are nearby.&lt;/p&gt;
&lt;h3 id=&#34;newspapers&#34;&gt;Newspapers&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; searches through &lt;a href=&#34;https://updates.timsherratt.org/2025/12/16/exploring-victorian-newspapers.html&#34;&gt;my dataset of newspapers in the SLV collection&lt;/a&gt; that have a place of publication documented in the &amp;lsquo;Place newspaper published&amp;rsquo; metadata field. It finds newspapers that are either associated with the current suburb/town, or a nearby suburb/town. This includes digitised and non-digitised titles. Digitised titles include a link to Trove.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-newspapers.png&#34; width=&#34;600&#34; height=&#34;293&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Newspapers from the SLV collection published in Fitzroy&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h3 id=&#34;photographs&#34;&gt;Photographs&lt;/h3&gt;
&lt;p&gt;I thought it would be cool to include a few photographs of the current suburb or town. To do this, I downloaded a list of place names from VicNames, then used the place names to &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/download_place_images.ipynb&#34;&gt;search the SLV catalogue for photographs with relevant subject headings&lt;/a&gt;. A random selection of the harvested images is displayed in &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/mp-images.png&#34; width=&#34;600&#34; height=&#34;234&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;A few images of Fitzroy, displayed alongside a map of Fitzroy&#39;s current boundaries using data from OpenStreetMap&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;the-interface&#34;&gt;The interface&lt;/h2&gt;
&lt;p&gt;The interface is pretty simple. You type an address in the box and hit enter. If the geocoding process finds multiple matches, it&amp;rsquo;ll give you a list to choose from. Once the location is found, a marker is added and the main map re-centres. Then related resources are displayed below the map.&lt;/p&gt;
&lt;p&gt;As you scroll down through the results you gradually zoom out from your initial starting point. This is reflected in the four bands or layers used to group resources: &amp;lsquo;my house&amp;rsquo;, &amp;lsquo;my street&amp;rsquo;, &amp;lsquo;my suburb&amp;rsquo;, and &amp;lsquo;nearby&amp;rsquo;. Each band contains a mix of resources from different collections.&lt;/p&gt;
&lt;p&gt;When I started working on &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;, I was thinking about a project from around 2010 called &lt;a href=&#34;https://wraggelabs.com/info/history-wall/&#34;&gt;The History Wall&lt;/a&gt;. Like &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;, The History Wall pulled many different types of resources together into a rich exploratory interface. As you scrolled through The History Wall you moved through time, with randomly selected items appearing from a range of sources including Trove newspapers, the ADB, and museum collections.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/history-wall.jpg&#34; width=&#34;600&#34; height=&#34;505&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;A version of The History Wall created for the National Museum of Australia&#39;s &#39;Irish in Australia&#39; exhibition&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;I originally thought I&amp;rsquo;d inject some of the same randomness into &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;, but I was worried it might just get too confusing. I thought it was important to keep the relationship between the starting point and the resources in focus even as you zoomed out. So my visual metaphor shifted to something more like a blast radius map, or a stratigraphic diagram, that displayed distinct groups and layers as you moved beyond the baseline. My limited CSS skills couldn&amp;rsquo;t make the vision in my head a reality, but there are lots of headings and colours instead to highlight the transitions!&lt;/p&gt;
&lt;p&gt;The actual mix of groups and layers displayed depends on the nature of your query. If you&amp;rsquo;ve entered a complete street address, and there are results for that address in Sands &amp;amp; Mac, then you&amp;rsquo;ll see &amp;lsquo;my house&amp;rsquo;, &amp;lsquo;my suburb&amp;rsquo;, and &amp;lsquo;nearby&amp;rsquo;. If you&amp;rsquo;ve only entered a suburb or town, or your street address can&amp;rsquo;t be found, you&amp;rsquo;ll see two layers starting with &amp;lsquo;my suburb&amp;rsquo;.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s an overview of what you might expect to see.&lt;/p&gt;
&lt;h3 id=&#34;my-house&#34;&gt;my house&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Sands &amp;amp; MacDougall extracts (text search on full address)&lt;/li&gt;
&lt;li&gt;georeferenced maps (search for maps that contain the base point)&lt;/li&gt;
&lt;li&gt;parish maps (search for maps that contain the base point)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;my-street&#34;&gt;my street&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;CUA photos (search for matching street identifiers)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;if there&amp;rsquo;s no &amp;lsquo;my house&amp;rsquo; layer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sands &amp;amp; MacDougall extracts (text search on street name and suburb)&lt;/li&gt;
&lt;li&gt;georeferenced maps (search for intersections between maps and street)&lt;/li&gt;
&lt;li&gt;parish maps (search for intersections between maps and street)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;my-suburbtown&#34;&gt;my suburb/town&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;suburb boundaries from OSM&lt;/li&gt;
&lt;li&gt;images (search for suburb name in metadata)&lt;/li&gt;
&lt;li&gt;newspapers (search for suburb name in metadata)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;if there&amp;rsquo;s no &amp;lsquo;my house&amp;rsquo; or &amp;lsquo;my street&amp;rsquo; layer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;georeferenced maps (search for intersections between maps and suburb boundaries)&lt;/li&gt;
&lt;li&gt;parish maps (search for intersections between maps and suburb boundaries)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;nearby&#34;&gt;nearby&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;CUA photos (search for photosets within 5km of the base point, filtered to remove current street)&lt;/li&gt;
&lt;li&gt;georeferenced maps (search for maps within 10km of base point, ordered by distance, max of 24 displayed)&lt;/li&gt;
&lt;li&gt;parish maps (search for maps within 10km of base point, ordered by distance, max of 24 displayed)&lt;/li&gt;
&lt;li&gt;newspapers (search for newspapers within 100km of base point, ordered by distance, max of 24 displayed)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;the-data&#34;&gt;The data&lt;/h2&gt;
&lt;p&gt;Most of the data used in &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; is stored in two SQLite databases – &lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;one for Sands &amp;amp; Mac&lt;/a&gt;, and &lt;a href=&#34;https://slv-places-481615284700.australia-southeast1.run.app/&#34;&gt;the other for CUA, georeferenced maps, parish maps, and newspapers&lt;/a&gt;. The metadata for the collection images is stored in &lt;a href=&#34;https://raw.githubusercontent.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/refs/heads/main/place_images.json&#34;&gt;a JSON file&lt;/a&gt; that is directly loaded by the interface.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve published the SQLite databases online using &lt;a href=&#34;https://datasette.io&#34;&gt;Datasette&lt;/a&gt; and &lt;a href=&#34;https://www.gaia-gis.it/fossil/libspatialite/index&#34;&gt;SpatiaLite&lt;/a&gt;. SpatiaLite makes it possible to find geospatial features that intersect, or are near, a given point. For example, you could find maps that include a specific set of coordinates.&lt;/p&gt;
&lt;p&gt;Datasette has the ability to create &lt;a href=&#34;https://docs.datasette.io/en/stable/sql_queries.html#canned-queries&#34;&gt;&amp;lsquo;canned queries&amp;rsquo;&lt;/a&gt; that feed url parameters into pre-defined SQL queries. This coupled with Datasette&amp;rsquo;s &lt;a href=&#34;https://docs.datasette.io/en/stable/json_api.html&#34;&gt;built-in JSON API&lt;/a&gt; makes it possible to construct query urls in &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; and use them to retrieve JSON results data from my databases.&lt;/p&gt;
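Constructing one of these query urls is just string assembly. Here's a small sketch based on the example url quoted in this post – the endpoint, path, and parameter names are copied from that example, so treat them as illustrative rather than a stable API.

```python
import urllib.parse

BASE = "https://slv-places-481615284700.australia-southeast1.run.app"

# Build a canned-query URL that finds georeferenced maps near a point.
# _shape=array asks Datasette's JSON API for a plain list of row objects.
def maps_near_point(lon, lat, distance=10000):
    params = {"wkt": f"POINT({lon} {lat})", "distance": distance, "_shape": "array"}
    return f"{BASE}/georeferenced_maps/maps_from_wkt.json?" + urllib.parse.urlencode(params)

url = maps_near_point(144.977468, -37.803143)
```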
&lt;p&gt;When you enter an address in &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;, multiple queries are fired off to find intersecting or nearby resources. For example, this url finds georeferenced maps within 10km of a point at the centre of Fitzroy: &lt;a href=&#34;https://slv-places-481615284700.australia-southeast1.run.app/georeferenced_maps/maps_from_wkt.json?wkt=POINT(144.977468%20-37.803143)&#34;&gt;slv-places-481615284700.australia-southeast1.run.app/georefere&amp;hellip;&lt;/a&gt;&amp;amp;distance=10000&amp;amp;_shape=array.&lt;/p&gt;
&lt;p&gt;In the case of Sands &amp;amp; Mac, &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; uses a canned query that runs a full-text search across the OCRd content of a volume. Suburb names are often abbreviated in Sands &amp;amp; Mac, so the app first runs a query to find possible abbreviations, then adds them into the main query to inject a bit of fuzziness. This is repeated for all 24 digitised volumes.&lt;/p&gt;
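The fuzziness step might look something like this sketch. The helper name, query syntax, and abbreviation list are all hypothetical – the real canned query runs inside SQLite's full-text engine – but it shows the general idea of OR-ing suburb variants into the search.

```python
# Hypothetical sketch: expand the suburb into its possible abbreviations and
# combine them into a single full-text search expression.
def build_fts_query(address, suburb, abbreviations):
    variants = " OR ".join(f'"{s}"' for s in [suburb, *abbreviations])
    return f'"{address}" AND ({variants})'

query = build_fts_query("149 Brunswick Street", "Fitzroy", ["Fitz"])
```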
&lt;p&gt;Once the metadata is retrieved from the databases, images are loaded from the SLV&amp;rsquo;s IIIF service.&lt;/p&gt;
&lt;h2 id=&#34;next-steps&#34;&gt;Next steps?&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m not sure how much more work I&amp;rsquo;ll do on &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt;, but there are a few things I&amp;rsquo;d like to try. In particular, I&amp;rsquo;d like to help the user understand more about what data is being presented, or not presented, and why. Not all digitised maps have been georeferenced, not all parish maps have coordinates, street numbers have changed, and the OCR in Sands &amp;amp; Mac varies in quality. &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; can only present a sample – a gesture towards the wealth of material available from the SLV. I feel that message needs to be made more explicit, though I&amp;rsquo;m not sure how without overloading the interface.&lt;/p&gt;
&lt;p&gt;There are additional data sources I&amp;rsquo;d like to play around with. &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; already includes some code to query &lt;a href=&#34;https://www.wikidata.org/&#34;&gt;Wikidata&lt;/a&gt; for more information about a suburb. But I haven&amp;rsquo;t had a chance to do anything with it. I&amp;rsquo;d like to be able to provide additional contextual information from outside the SLV, such as electoral boundaries, populations, even election results. It would also be fun to display sightings of plants and animals from the &lt;a href=&#34;https://www.ala.org.au&#34;&gt;Atlas of Living Australia&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What can I find out about my house? It would be great if &lt;em&gt;&lt;strong&gt;my place&lt;/strong&gt;&lt;/em&gt; could answer that question by taking the user on an open-ended journey through our cultural, historical, and environmental landscape.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Geolocating photos from the SLV&#39;s Committee for Urban Action collection</title>
      <link>https://updates.timsherratt.org/2026/01/29/geolocating-photos-from-the-slvs.html</link>
      <pubDate>Thu, 29 Jan 2026 18:08:06 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2026/01/29/geolocating-photos-from-the-slvs.html</guid>
      <description>&lt;p&gt;Concerned about the loss of built heritage in the 1970s, the Committee for Urban Action photographed streetscapes across urban and regional Victoria. They compiled a remarkable collection of photographs that is &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;amp;collectionId=81271917420007636&#34;&gt;now being digitised by the State Library of Victoria&lt;/a&gt;. More than 20,000 images are already available online!&lt;/p&gt;
&lt;p&gt;The CUA worked systematically, capturing photos street by street, and recording the locations of each set of photographs. This information is used to prepare the title attached to each photo as it&amp;rsquo;s uploaded to the SLV catalogue. In general, titles include the name of the road where the photo was taken, the name of the suburb or town, and the names of two intersecting roads that define the boundaries of the current road segment. They can also tell you which side of the road the photo was taken on. For example, the title &lt;code&gt;Gore Street, Fitzroy, from Gertrude Street to Webb Street - east side&lt;/code&gt; tells us the photo was taken on the east side of Gore Street, Fitzroy between the intersections with Gertrude Street and Webb Street.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/cua-slv-viewer.png&#34; width=&#34;600&#34; height=&#34;547&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Photos from Gore Street, Fitzroy &lt;a href=&#34;https://viewer.slv.vic.gov.au/?entity=IE7489506&amp;mode=browse&#34;&gt;displayed in the SLV image viewer&lt;/a&gt;&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;It&amp;rsquo;s great to have this sort of structured information linking photos to specific locations, but to navigate through the collection &lt;em&gt;in space&lt;/em&gt; we need more. We need to link each photo to a set of geospatial coordinates by mapping each road segment. That was the challenge I took on as part of &lt;a href=&#34;https://lab.slv.vic.gov.au/experiments/my-place-tim-sherratt&#34;&gt;my residency in the SLV LAB&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When I started working on the collection I wasn&amp;rsquo;t really sure what was possible. I had to learn a lot, and ended up revising my processes multiple times as I got deeper into the data. But my aim was always to create some sort of map-based interface that would allow users to click on a street and see any associated CUA photos. It&amp;rsquo;s still a bit buggy and incomplete, but here it is – &lt;a href=&#34;https://slv.wraggelabs.com/cua/&#34;&gt;&lt;strong&gt;explore the CUA collection street by street&lt;/strong&gt;&lt;/a&gt;!&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/cua-browser.png&#34; width=&#34;600&#34; height=&#34;513&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Gore Street, Fitzroy &lt;a href=&#34;https://slv.wraggelabs.com/cua/?photoset=gore-street-fitzroy-gertrude-street-webb-street&#34;&gt;in the new CUA Browser&lt;/a&gt;&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;the-process&#34;&gt;The process&lt;/h2&gt;
&lt;p&gt;My basic plan was to find the intersections using &lt;a href=&#34;https://www.openstreetmap.org/&#34;&gt;OpenStreetMap&lt;/a&gt;, then extract geospatial information about the segment of road between the two intersections. This involved much trial and error, but eventually I ended up with a process that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;parsed each item title to try and extract the names of the main road, the suburb, and the two intersecting roads&lt;/li&gt;
&lt;li&gt;queried &lt;a href=&#34;https://nominatim.org&#34;&gt;Nominatim&lt;/a&gt; for the suburb bounding box&lt;/li&gt;
&lt;li&gt;for each intersecting road, queried OSM to find a node at, or around, its intersection with the main road, within the suburb bounding box&lt;/li&gt;
&lt;li&gt;created a new bounding box from the coordinates of the two intersections&lt;/li&gt;
&lt;li&gt;queried OSM for the main road within this bounding box&lt;/li&gt;
&lt;li&gt;extracted the coordinates of the main road segment, removing any points outside of the bounding box&lt;/li&gt;
&lt;/ul&gt;
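The first step above – pulling the road, suburb, and intersecting roads out of a title – can be sketched with a regular expression. This pattern only handles the common title form quoted earlier (&amp;lsquo;Gore Street, Fitzroy, from Gertrude Street to Webb Street - east side&amp;rsquo;); as noted below, real titles vary, so the actual parser has to be much more forgiving.

```python
import re

# Parse a CUA photo title of the form:
#   "<road>, <suburb>, from <cross1> to <cross2> - <side>"
# The trailing "- east side" part is optional.
TITLE = re.compile(
    r"^(?P<road>[^,]+), (?P<suburb>[^,]+), "
    r"from (?P<cross1>.+?) to (?P<cross2>.+?)(?: - (?P<side>\w+ side))?$"
)

m = TITLE.match("Gore Street, Fitzroy, from Gertrude Street to Webb Street - east side")
```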
&lt;p&gt;There are more details below and in these notebooks: &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua_finding_intersections.ipynb&#34;&gt;cua_finding_intersections.ipynb&lt;/a&gt; and &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua_data_processing.ipynb&#34;&gt;cua_data_processing.ipynb&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;finding-intersections&#34;&gt;Finding intersections&lt;/h2&gt;
&lt;p&gt;As described, the title of each photograph generally includes 4 pieces of information: the road, suburb, intersecting roads, and side. My plan was to find the intersections first to get the limits of the road segment. This is possible thanks to the awesome &lt;a href=&#34;https://www.openstreetmap.org/&#34;&gt;OpenStreetMap&lt;/a&gt; and its &lt;a href=&#34;https://wiki.openstreetmap.org/wiki/Overpass_API&#34;&gt;Overpass API&lt;/a&gt;. It took me a while to get my head around the Overpass query language, but there are lots of &lt;a href=&#34;https://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_API_by_Example&#34;&gt;useful examples online&lt;/a&gt;. The query to find the intersection between Gore Street and Gertrude Street in Fitzroy looks like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[bbox:-37.8089071,144.9732006,-37.7929130,144.9851430];
way[&#39;highway&#39;][name=&amp;quot;Gore Street&amp;quot;];
node(w)-&amp;gt;.n1;
way[&#39;highway&#39;][name=&amp;quot;Gertrude Street&amp;quot;];
node(w)-&amp;gt;.n2;
node.n1.n2; 
out body;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can &lt;a href=&#34;https://overpass-turbo.eu/s/2jq8&#34;&gt;try it out&lt;/a&gt; using Overpass Turbo&amp;rsquo;s web interface.&lt;/p&gt;
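You can also run the same query programmatically. The sketch below sends it to overpass-api.de, one of several public Overpass instances, with an added &lt;code&gt;[out:json]&lt;/code&gt; setting so the response comes back as JSON rather than the default XML.

```python
import urllib.parse

# The intersection query from above, with [out:json] prepended so the
# Overpass API returns JSON instead of XML.
QUERY = """
[out:json][bbox:-37.8089071,144.9732006,-37.7929130,144.9851430];
way['highway'][name="Gore Street"];
node(w)->.n1;
way['highway'][name="Gertrude Street"];
node(w)->.n2;
node.n1.n2;
out body;
"""

url = "https://overpass-api.de/api/interpreter?" + urllib.parse.urlencode({"data": QUERY})
# import urllib.request; print(urllib.request.urlopen(url).read())  # runs the query
```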
&lt;p&gt;In OpenStreetMap, linear features, such as roads or rivers, are represented as &lt;a href=&#34;https://wiki.openstreetmap.org/wiki/Way&#34;&gt;&lt;code&gt;ways&lt;/code&gt;&lt;/a&gt;. Each way is made up of a series of &lt;code&gt;nodes&lt;/code&gt; or points with geospatial coordinates. Every way and node has its own unique identifier. Tags can be added to features to describe what type of things they are.&lt;/p&gt;
&lt;p&gt;The query above looks for &lt;code&gt;ways&lt;/code&gt; named &amp;lsquo;Gore Street&amp;rsquo; and &amp;lsquo;Gertrude Street&amp;rsquo; that are tagged as &lt;code&gt;highway&lt;/code&gt; (a &lt;code&gt;highway&lt;/code&gt; in OpenStreetMap is any road-like feature, including things like bike paths and foot trails).&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;way[&#39;highway&#39;][name=&amp;quot;Gore Street&amp;quot;];
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It then extracts the nodes that make up each way and looks to see if there are any nodes in common between the two ways. A node shared between two ways indicates an intersection.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;node(w)-&amp;gt;.n2;
node.n1.n2; 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The query is limited using a bounding box that encloses the suburb of Fitzroy. This avoids false positives and keeps down the query load.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[bbox:-37.8089071,144.9732006,-37.7929130,144.9851430];
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The JSON result of this query gives us the latitude and longitude of the node at the intersection of the two roads.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;{
  &amp;quot;version&amp;quot;: 0.6,
  &amp;quot;generator&amp;quot;: &amp;quot;Overpass API 0.7.62.10 2d4cfc48&amp;quot;,
  &amp;quot;osm3s&amp;quot;: {
    &amp;quot;timestamp_osm_base&amp;quot;: &amp;quot;2026-01-27T03:11:45Z&amp;quot;,
    &amp;quot;copyright&amp;quot;: &amp;quot;The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.&amp;quot;
  },
  &amp;quot;elements&amp;quot;: [

{
  &amp;quot;type&amp;quot;: &amp;quot;node&amp;quot;,
  &amp;quot;id&amp;quot;: 224750459,
  &amp;quot;lat&amp;quot;: -37.8062302,
  &amp;quot;lon&amp;quot;: 144.9817848
}

  ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;After a bit of testing, I found this worked pretty well, except for roundabouts&amp;hellip; In OpenStreetMap, roads don&amp;rsquo;t actually cross roundabouts – they end on one side, then begin anew on the other side. In cases like this, looking for shared nodes doesn&amp;rsquo;t work. Instead you have to look to see if the two roads have nodes that are less than a given distance apart. The query is similar to the one above, but uses &lt;code&gt;around&lt;/code&gt; when comparing the nodes. In this case I&amp;rsquo;m looking for nodes that are within 20 metres of each other.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;node(w.w2)(around.w1:20);
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;finding-road-segments&#34;&gt;Finding road segments&lt;/h2&gt;
&lt;p&gt;Once I had the coordinates of the two intersections, I could look for the segment of road between them. To do this I created a bounding box using the coordinates of the intersections, and then searched for ways by name within that defined area.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s important to note that there&amp;rsquo;s no one-to-one correspondence between roads and OSM ways. A single road might be represented in OSM as a series of separate, but connected, ways. For example, at a roundabout, or where a road divides, new ways might have been created to document the change. This means that when we query OSM for details of a road we often get back information about multiple ways. Some of these might be things like bike paths which we can filter using tags, but often they&amp;rsquo;ll be sections of the road that we want. For example, this query for Gore Street, within the bounds of its intersections with Gertrude Street and Webb Street, returns details of two ways.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;way[&amp;quot;highway&amp;quot;~&amp;quot;^(trunk|primary|secondary|tertiary|unclassified|residential|service|track|pedestrian|living_street)$&amp;quot;][name=&amp;quot;Gore Street&amp;quot;](-37.8062302,144.98128480000003,-37.8040076,144.9826827);
out body;
&amp;gt;;
out body;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can &lt;a href=&#34;https://overpass-turbo.eu/s/2jqi&#34;&gt;view the result&lt;/a&gt; in Overpass Turbo.&lt;/p&gt;
&lt;p&gt;However, that doesn&amp;rsquo;t mean that the full extent of both ways is contained within the bounding box, just that some of the nodes of both ways are inside. Because of this, I filtered the results from all the ways and only kept nodes whose coordinates were within the desired region.&lt;/p&gt;
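As a rough sketch of these two steps (illustrative Python, not the actual processing code): build the bounding box from the two intersection points, then keep only the nodes whose coordinates fall inside it.

```python
def bounding_box(p1, p2):
    """Build a (south, west, north, east) box from two (lat, lon) intersection points."""
    lats, lons = (p1[0], p2[0]), (p1[1], p2[1])
    return (min(lats), min(lons), max(lats), max(lons))

def nodes_in_box(nodes, box):
    """Keep only nodes whose coordinates fall inside the box --
    a way returned by the query may extend well beyond it."""
    south, west, north, east = box
    return [
        n for n in nodes
        if south <= n["lat"] <= north and west <= n["lon"] <= east
    ]
```

The `(south, west, north, east)` ordering matches the bbox format used in the Overpass query above.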
&lt;h2 id=&#34;problems-finding-intersections&#34;&gt;Problems finding intersections&lt;/h2&gt;
&lt;p&gt;The method described above works pretty well, and once I understood enough about the Overpass API to extract actual paths that I could display on a map, I fed all of the CUA photos through a script and got useful data for more than 80% of them.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/screenshot-from-2025-09-27-17-34-36.png&#34; width=&#34;600&#34; height=&#34;652&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;One of my early tests.&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Then I spent a &lt;em&gt;lot&lt;/em&gt; of time trying to work out why the remainder were failing.&lt;/p&gt;
&lt;p&gt;Some of them failed because the titles were missing information, or were formatted in a way I didn&amp;rsquo;t expect. For example, instead of a second intersecting road, some titles just said &amp;lsquo;to end&amp;rsquo;. This makes perfect sense to a human looking at a map, but it&amp;rsquo;s difficult to handle programmatically.&lt;/p&gt;
&lt;p&gt;Some photos either recorded the wrong suburb, or the boundaries of the suburb had moved since the photos were taken. For example, many of the photos described as being from Eaglehawk are now in California Gully.&lt;/p&gt;
&lt;p&gt;Similarly, some road names were wrong either because of documentation errors, or because the names have changed over time. There are also some variations in the way OSM records road names – in particular, I found that roads with hyphenated names sometimes had spaces around the hyphen and sometimes didn&amp;rsquo;t. There were also a couple of cases where names weren&amp;rsquo;t attached to the corresponding road segment in OSM, but I was able to edit these in OSM directly.&lt;/p&gt;
&lt;p&gt;Other roads had multiple names, or changed names along their path. I mean, what&amp;rsquo;s going on with Brunswick Street and St Georges Road in Fitzroy? Country towns seemed most prone to this – a highway might become &amp;lsquo;Main Road&amp;rsquo; within the town boundaries, or the order of hyphenated places in road names might change. I found one road in Clunes that had four different names within the space of a few hundred metres.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/clunes.png&#34; width=&#34;582&#34; height=&#34;1002&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;One road, four names!&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Finally, the routes of some roads had changed – intersections no longer intersected, roads were closed, or new parks had popped up to split a road in two.&lt;/p&gt;
&lt;p&gt;My processing script logged the titles I couldn&amp;rsquo;t locate and I worked through the list manually, trying to identify what each problem was. I suppose there are two ways I could&amp;rsquo;ve handled these problems – building more fuzziness into the process to check for things like alternative names, or compiling a list of &amp;lsquo;corrected&amp;rsquo; titles. I started off using the first approach, but as I worked through more and more anomalies, the checking logic became very complicated and inefficient. Just think about the knots you can tie yourself in trying to handle a title where the suburb is wrong and the main road changes names in between intersections.&lt;/p&gt;
&lt;p&gt;I refactored the code multiple times, but it&amp;rsquo;s still pretty messy. In the end I created a list of &amp;lsquo;corrected&amp;rsquo; titles as well, so it was a bit of a hybrid approach. I suspect I could have saved myself a lot of pain if I&amp;rsquo;d reversed the process – compiling &amp;lsquo;corrected&amp;rsquo; titles first, then adapting the logic as patterns emerged.&lt;/p&gt;
&lt;p&gt;There are still some photos I haven&amp;rsquo;t located. In some cases I just don&amp;rsquo;t have enough information. In others I need to manually record coordinates or way ids to feed into the process, and I haven&amp;rsquo;t worked out the best way to do this yet. You can see the titles that I haven&amp;rsquo;t geolocated yet in the files: &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-not-found.txt&#34;&gt;&lt;code&gt;cua-not-found.txt&lt;/code&gt;&lt;/a&gt; and &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-not-parsed.txt&#34;&gt;&lt;code&gt;cua-not-parsed.txt&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In total, 18,603 out of 20,644 photos have been geolocated. That&amp;rsquo;s over 90%!&lt;/p&gt;
&lt;h2 id=&#34;assembling-the-data&#34;&gt;Assembling the data&lt;/h2&gt;
&lt;p&gt;I processed the data in a couple of phases to get it in the shape I wanted.&lt;/p&gt;
&lt;p&gt;The first step was to group all the photos by title, so I could link each group to its location. But remember that titles often record which &lt;em&gt;side&lt;/em&gt; of the road a photo was taken on. To bring all sides of a road segment together into a single group, I created a key from a normalised/slugified version of the title with the side value removed. I used this key to save information about each side within the same group.&lt;/p&gt;
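A much-simplified sketch of that key-making step might look like the Python below. It's illustrative only: the stopword list is an assumption, and the real script also has to reconcile titles where the cross streets appear in reversed order (as in the east/west example that follows).

```python
import re

# Strip a trailing '- east side.' style suffix (hyphen or en-dash).
SIDE_SUFFIX = re.compile(r"\s*[-\u2013]\s*(north|south|east|west)\s+side\.?\s*$", re.I)
# Connecting words that don't appear in the grouping key (an assumption).
STOPWORDS = re.compile(r"\b(from|to)\b", re.I)

def group_key(title):
    """Normalise a photo title into a slugified grouping key with the side removed."""
    base = SIDE_SUFFIX.sub("", title)
    base = STOPWORDS.sub(" ", base)
    return re.sub(r"[^a-z0-9]+", "-", base.lower()).strip("-")
```

Titles that differ only in their side suffix then hash to the same key, so all sides of a road segment land in one group.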
&lt;p&gt;I ended up with a dataset with this sort of structure (a truncated example):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;iffla-street-south-melbourne-coventry-street-normanby-street&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;:&lt;/span&gt;
    {
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;title&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Iffla Street, South Melbourne, from Coventry Street to Normanby Street&amp;#34;&lt;/span&gt;,
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;sides&amp;#34;&lt;/span&gt;:
        {
            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;east side&amp;#34;&lt;/span&gt;:
            {
                &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;title&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Iffla Street, South Melbourne, from Coventry Street to Normanby Street - east side.&amp;#34;&lt;/span&gt;,
                &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;images&amp;#34;&lt;/span&gt;:
                [
                    {
                        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ie_id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;IE20321667&amp;#34;&lt;/span&gt;,
                        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;alma_id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;9939649155207636&lt;/span&gt;
                    }
                    &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;more&lt;/span&gt; &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;photos...&lt;/span&gt;
                ]
            },
            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;west side&amp;#34;&lt;/span&gt;:
            {
                &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;title&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Iffla Street, South Melbourne, from Normanby Street to Coventry Street - west side.&amp;#34;&lt;/span&gt;,
                &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;images&amp;#34;&lt;/span&gt;:
                [
                    {
                        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ie_id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;IE20320072&amp;#34;&lt;/span&gt;,
                        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;alma_id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;9939655629407636&lt;/span&gt;
                    },
					&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;more&lt;/span&gt; &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;photos...&lt;/span&gt;
                ]
            }
        },
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;ways&amp;#34;&lt;/span&gt;:
        {
            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;27631235&amp;#34;&lt;/span&gt;:
            [
                [
                    &lt;span style=&#34;color:#ae81ff&#34;&gt;144.9503379&lt;/span&gt;,
                    &lt;span style=&#34;color:#ae81ff&#34;&gt;-37.835322&lt;/span&gt;
                ],
                &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;more&lt;/span&gt; &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;points...&lt;/span&gt;
            ]
        }
    }&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;,&lt;/span&gt;
		
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can see how the sides and matching ways have been brought together under the key value.&lt;/p&gt;
&lt;p&gt;This structure was useful for grouping and processing the data, but to create a map interface I needed to bring the geospatial information to the surface. The first version of the interface used one big GeoJSON file in which the features were &lt;a href=&#34;https://geocrystal.github.io/geojson/GeoJSON/MultiLineString.html&#34;&gt;MultiLineStrings&lt;/a&gt; created from the paths of each road segment. The photo data was saved in the properties of each GeoJSON feature.&lt;/p&gt;
&lt;p&gt;It sort of worked. The roads with photos were highlighted, and clicking on the roads displayed the photos. It was only when I changed the opacity of the lines that I realised that, in many cases, different road segments were being piled on top of each other. When the lines were opaque these piles were invisible, but add a bit of transparency and you could see that some lines were darker than others. Clicking on the lines only displayed the top layer, so some groups of photos were effectively invisible.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/screenshot-from-2025-12-01-14-09-47.png&#34; width=&#34;503&#34; height=&#34;339&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Version one of the interface showing how the colour of the highlighted roads varied once I decreased the opacity.&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Why did this happen? I&amp;rsquo;d wrongly assumed that each segment of road would only have one group of photos associated with it. But it&amp;rsquo;s not hard to find cases where this is not true. Consider Moor Street, Fitzroy, between Nicholson Street and Brunswick Street. On the north side, there is a single group of photos that document the buildings between Nicholson Street and Brunswick Street. However, on the south side there&amp;rsquo;s two groups of photos. One covers the section between Nicholson Street and Fitzroy Street, the other covers Fitzroy Street to Brunswick Street. One section of road, three groups of photos&amp;hellip;&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2026/cua-multiple.png&#34; width=&#34;600&#34; height=&#34;478&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Moor Street, Fitzroy, between Nicholson Street and Brunswick Street, in the new CUA Browser, showing the three photosets associated with the one section of road.&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;To make these layered groups more easily accessible through the interface I had to change the way the data was organised – separating the GeoJSON from the photosets so that multiple photosets could be associated with a single geospatial feature. I decided to create a GeoJSON feature for every OSM way in the dataset. However, I needed to prune the way&amp;rsquo;s coordinates to only include those that were part of the CUA road segments. To do this, I saved all the way data when I found the road segments. Then in the second processing phase, I grouped the way coordinates associated with the road segments and compared this list to the full way path. Any coordinate in the way path that wasn&amp;rsquo;t in the road segments was removed. It seems unnecessarily complex, but I wanted to make sure that only the parts of roads associated with photos were highlighted in the interface.&lt;/p&gt;
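The pruning step reduces to a set-membership check on coordinates. A minimal sketch (assuming coordinates are stored as `[lon, lat]` pairs, as in GeoJSON):

```python
def prune_way(way_coords, segment_coords):
    """Keep only the points of the full OSM way path that also appear
    in the matched CUA road segments, preserving the way's point order."""
    keep = {tuple(pt) for pt in segment_coords}
    return [pt for pt in way_coords if tuple(pt) in keep]
```

The result is a way path trimmed so that only the photographed stretch of road gets highlighted on the map.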
&lt;p&gt;The result was two data files. The first, &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-ways.geojson&#34;&gt;&lt;code&gt;cua-ways.geojson&lt;/code&gt;&lt;/a&gt;, contains the pruned way paths and their OSM identifiers. The second, &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/cua-photos.json&#34;&gt;&lt;code&gt;cua-photos.json&lt;/code&gt;&lt;/a&gt;, contains information about each photo set, including the sides, photos, paths, and associated way identifiers. The datasets are linked by the way identifiers.&lt;/p&gt;
&lt;h2 id=&#34;constructing-the-interface&#34;&gt;Constructing the interface&lt;/h2&gt;
&lt;p&gt;My plan for the interface was pretty simple. There&amp;rsquo;d be a map on which all the road segments associated with CUA photos were highlighted. Clicking on a highlighted section would show the photos. I wanted to display the photos as if you were scanning the streetscape, so I decided to put them all side-by-side in a gallery that scrolled horizontally.&lt;/p&gt;
&lt;p&gt;The first version used Leaflet to display the maps and, as noted above, had some problems where there were multiple photosets associated with a segment of road.&lt;/p&gt;
&lt;p&gt;For the &lt;a href=&#34;https://slv.wraggelabs.com/cua/&#34;&gt;second version&lt;/a&gt; I decided to switch to &lt;a href=&#34;https://maplibre.org&#34;&gt;MapLibre&lt;/a&gt; because it seems a bit more active and up-to-date. I&amp;rsquo;d already used MapLibre in the &lt;a href=&#34;https://slv.wraggelabs.com/newspapers/&#34;&gt;SLV Newspapers Explorer&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The interface first loads the &lt;code&gt;cua-ways.geojson&lt;/code&gt; file to highlight the relevant roads. When you click on one of the roads, the way id is passed to a function that looks for associated photosets in the &lt;code&gt;cua-photos.json&lt;/code&gt; data. If there&amp;rsquo;s only one linked photoset, the photos are displayed. However, if there&amp;rsquo;s more than one linked photoset, they&amp;rsquo;re displayed as a list, and the user selects from the list to display the related photos.&lt;/p&gt;
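The lookup itself is a simple filter over the photoset data, keyed by way id. Sketched here in Python rather than the interface's actual JavaScript, and assuming the `ways` mapping uses string keys as in the JSON example above:

```python
def photosets_for_way(way_id, photos):
    """Return (key, photoset) pairs for every photoset whose 'ways'
    mapping includes the clicked way id."""
    return [
        (key, ps) for key, ps in photos.items()
        if str(way_id) in ps.get("ways", {})
    ]
```

Zero matches means the way isn't linked to any photos; one match displays immediately; more than one triggers the selection list.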
&lt;p&gt;A couple of other things happen when you click on a way or select a photoset:  the colour of the selected road segment changes, and the browser url is updated with the way or photoset identifier. You can bookmark or share these urls to go directly to a specific road or photoset. There&amp;rsquo;s also a button to reverse the order of the images – they scroll left to right, but sometimes they seem to have been photographed right to left.&lt;/p&gt;
&lt;h2 id=&#34;more-information-and-links&#34;&gt;More information and links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://slv.wraggelabs.com/cua/&#34;&gt;CUA Browser&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CUA data is also used in the &lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;my place&lt;/a&gt; app&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CUA code and data is in the &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency&#34;&gt;geo-maps-residency&lt;/a&gt; repository&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Code for the interface is in the &lt;a href=&#34;https://github.com/wragge/slv-demo-apps&#34;&gt;slv-demo-apps&lt;/a&gt; repository&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;All the outcomes of my SLV residency are listed on the &lt;a href=&#34;https://slv.wraggelabs.com&#34;&gt;Experiments with the State Library of Victoria’s collections&lt;/a&gt; page&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Goodbye 2025! A brief summary of the highlights and lowlights…</title>
      <link>https://updates.timsherratt.org/2025/12/31/goodbye-a-brief-summary-of.html</link>
      <pubDate>Wed, 31 Dec 2025 17:14:14 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/12/31/goodbye-a-brief-summary-of.html</guid>
<description>&lt;p&gt;My 2025 started badly and ended well. In the first few months of the year, battles with the gatekeepers at Trove sent me spiralling into a pretty dark place. But by year’s end I was having fun, working with the wonderful people at the State Library of Victoria. In between I caught up on some overdue project maintenance. Most of this is documented in the &lt;a href=&#34;https://updates.timsherratt.org/archive/&#34;&gt;37 blog posts I wrote this year&lt;/a&gt;, but here’s a quick summary.&lt;/p&gt;&lt;h2&gt;Highlights&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;From September to December, I was Creative Technologist-in-Residence at the SLV LAB, exploring ways of opening up the Library’s place-based collections. There are still a few things to finish off, but &lt;a href=&#34;https://slv.wraggelabs.com/&#34;&gt;here’s a list of the results so far&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;As part of my SLV work, I created a &lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;fully searchable version&lt;/a&gt; of the 24 volumes of Sands &amp;amp; MacDougall directories digitised by the Library. This followed the pattern I’d used for 54 volumes of the &lt;a href=&#34;https://glam-workbench.net/trove-journals/nsw-post-office-directories/&#34;&gt;NSW Post Office Directories&lt;/a&gt;, 44 volumes of the &lt;a href=&#34;https://glam-workbench.net/trove-journals/sydney-telephone-directories/&#34;&gt;Sydney Telephone Directories&lt;/a&gt;, and 54 volumes of the &lt;a href=&#34;https://glam-workbench.net/tasmanian-post-office-directories/&#34;&gt;Tasmanian Post Office Directories&lt;/a&gt;. 
So there’s now 176 volumes from the 1880s to the 1950s that can be easily explored for people and places— and all free to use of course.&lt;/li&gt;&lt;li&gt;In April, I added a &lt;a href=&#34;https://updates.timsherratt.org/2025/04/30/new-prov-section-added-to.html&#34;&gt;new section to the GLAM Workbench&lt;/a&gt; documenting the Public Record Office Victoria’s collection API. I also used the API to create a &lt;a href=&#34;https://updates.timsherratt.org/2025/04/10/using-the-public-record-office.html&#34;&gt;Data Dashboard that provides an overview of PROV’s collection&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Also in April, I updated the GLAM Name Index Search &lt;a href=&#34;https://updates.timsherratt.org/2025/04/09/more-than-million-rows-of.html&#34;&gt;to include an additional 6 million records from PROV&lt;/a&gt;. In total, the GLAM Name Index now includes more than 12 million records in 293 datasets from 10 Australian GLAM organisations — another free resource for Australian researchers.&lt;/li&gt;&lt;li&gt;In July I undertook some overdue maintenance on a variety of old apps and projects. In the process, I &lt;a href=&#34;https://updates.timsherratt.org/2025/07/09/the-rebirth-of-wragge-labs.html&#34;&gt;resurrected my old Wragge Labs domain and created a showcase&lt;/a&gt; of many of the websites, apps and experiments I’ve worked on over the past 30 years.&lt;/li&gt;&lt;li&gt;I was particularly pleased &lt;a href=&#34;https://updates.timsherratt.org/2025/07/02/the-future-of-the-past.html&#34;&gt;to get &lt;em&gt;The Future of the Past&lt;/em&gt; working again&lt;/a&gt;, so once more you can create fridge magnet poetry from an odd collection of words harvested from Trove newspapers! I built FOTP back in 2012 when I was the Harold White Fellow at the NLA. 
Also this year I finally got around to &lt;a href=&#34;https://updates.timsherratt.org/2025/06/30/mining-for-meanings.html&#34;&gt;transcribing my Harold White Lecture&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;In June I wrote a &lt;a href=&#34;https://updates.timsherratt.org/2025/06/05/glam-workbench-preprint-for-building.html&#34;&gt;short piece on the GLAM Workbench&lt;/a&gt; for the forthcoming publication &lt;em&gt;Building User-Friendly Toolkits and Platforms for Digital Humanities&lt;/em&gt;. I think it provides a useful summary of what the GLAM Workbench is, and what I’d like it to be. I also wrote up the &lt;a href=&#34;https://updates.timsherratt.org/2025/06/19/a-brief-and-biased-history.html&#34;&gt;short but glorious history of Trove Twitter bots&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;Lowlights&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;Saying goodbye to 15 years of work on Trove&lt;/a&gt;. It still hurts. And I still miss resources such as @TroveNewsBot and the Trove API Console which ran happily for more than a decade before being killed without warning by the NLA.&lt;/li&gt;&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html&#34;&gt;Saying goodbye to 17 years of work on the National Archives of Australia’s collections&lt;/a&gt;. 
This will be the first New Year’s Day in a decade when I haven’t updated my &lt;a href=&#34;https://updates.timsherratt.org/2025/02/05/ten-years-of-data-the.html&#34;&gt;harvest of files with the access status of closed&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;Next year&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;In 2026, I’m looking forward to starting work on the &lt;a href=&#34;https://ardc.edu.au/project/reusable-and-accessible-public-interest-documents-rapid/&#34;&gt;RAPID project&lt;/a&gt;, building on the work I’ve done on Commonwealth Hansard over the years to create new examples and documentation.&lt;/li&gt;&lt;li&gt;I’m honoured to be giving the closing keynote at the &lt;a href=&#34;https://www.glamlabs.io/events/glam-labs-futures-26&#34;&gt;GLAM Labs Futures conference&lt;/a&gt; in Edinburgh in June — hoping we can pull together the funds to get there in person!&lt;/li&gt;&lt;/ul&gt;&lt;h2&gt;How you can help&lt;/h2&gt;&lt;p&gt;Much of my work is unfunded, and keeping resources such as the GLAM Name Index running costs real money. I’ve been very grateful for the support of my GitHub sponsors over past years. Their contributions help cover a substantial proportion of my cloud hosting costs. But bidding farewell to Twitter and Trove has had an impact on my sponsorship income. If you use or value the things I build to help researchers make use of GLAM collections, you might like to &lt;a href=&#34;https://github.com/sponsors/wragge&#34;&gt;sponsor me on GitHub&lt;/a&gt;, or &lt;a href=&#34;https://www.buymeacoffee.com/wragge&#34;&gt;Buy Me a Coffee&lt;/a&gt;. All contributions are greatly appreciated!&lt;/p&gt;&lt;p&gt;If you can’t afford a financial contribution, there are other ways you can help!&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Let me know how you’re using my stuff! A bit of positive feedback does wonders when my enthusiasm is flagging. 
You can find my contact details at &lt;a href=&#34;https://timsherratt.au/&#34;&gt;timsherratt.au&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Tell others how you use my stuff! Getting information about resources out to those who might benefit is really hard, so your help would be greatly appreciated.&lt;/li&gt;&lt;li&gt;The GLAM Workbench describes a few other ways &lt;a href=&#34;https://glam-workbench.net/get-involved/&#34;&gt;you can get involved&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Goodbye 2025!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Exploring Victorian newspapers</title>
      <link>https://updates.timsherratt.org/2025/12/16/exploring-victorian-newspapers.html</link>
      <pubDate>Tue, 16 Dec 2025 13:03:48 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/12/16/exploring-victorian-newspapers.html</guid>
<description>&lt;p&gt;Newspapers are a vital source for local history. That&amp;rsquo;s why, &lt;a href=&#34;https://discontents.com.au/easter-eggsperiments/index.html&#34;&gt;back in 2014&lt;/a&gt;, I created the &lt;a href=&#34;https://wraggelabs.com/trove-places/map/&#34;&gt;Trove Places&lt;/a&gt; app – a map interface to help people find Trove&amp;rsquo;s digitised newspapers by their place of publication or distribution. Trove Places has proved very popular, and the State Libraries of South Australia and Victoria, amongst others, point their users to it to help with their research. I&amp;rsquo;ve updated the data several times over the years, though &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;Trove&amp;rsquo;s new gatekeeping regime&lt;/a&gt; will make future updates difficult.&lt;/p&gt;
&lt;p&gt;During &lt;a href=&#34;https://lab.slv.vic.gov.au/team/tim-sherratt&#34;&gt;my residency at the State Library of Victoria&lt;/a&gt;, one of the librarians noted how useful the app was, and asked whether it might be possible to include undigitised newspapers from the SLV catalogue as well as those in Trove. It was, and I did – here&amp;rsquo;s a &lt;a href=&#34;https://slv.wraggelabs.com/newspapers/&#34;&gt;brand new app to explore Victorian newspapers&lt;/a&gt;, both digitised and undigitised!&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-12-12-12-48-16.png&#34; width=&#34;600&#34; height=&#34;391&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Just &lt;a href=&#34;https://slv.wraggelabs.com/newspapers/&#34;&gt;click on the map&lt;/a&gt; to find Victorian newspapers!&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;It&amp;rsquo;s pretty easy to use. You just click on the map in an area you&amp;rsquo;re interested in. The map will display the 20 nearest places where newspapers were published or distributed. The size of the markers indicates how many titles are associated with each place. In the sidebar, details of the newspapers are listed by place, ordered by their distance from your selected point.&lt;/p&gt;
&lt;p&gt;You can also find local newspapers using the &lt;a href=&#34;https://slv.wraggelabs.com/myplace/&#34;&gt;my place&lt;/a&gt; app. Once you enter an address, newspapers from your suburb or town will be displayed, as well as those from nearby locations.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-12-16-12-51-54.png&#34; width=&#34;600&#34; height=&#34;507&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Newspapers from Geelong displayed in the my place app&lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;assembling-the-data&#34;&gt;Assembling the data&lt;/h2&gt;
&lt;p&gt;How do you find Victorian newspapers? The reference librarians at the SLV pointed me to the &amp;lsquo;Place newspaper published&amp;rsquo; field in the catalogue. Searching this field for &amp;lsquo;Australia&amp;ndash;Victoria&amp;rsquo; &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/search?query=lds03,exact,Australia--Victoria&amp;amp;tab=searchProfile&amp;amp;search_scope=slv_local&amp;amp;vid=61SLV_INST:SLV&#34;&gt;returns 3,997 results&lt;/a&gt;, compared to the 460 digitised in Trove.&lt;/p&gt;
&lt;p&gt;The first step in assembling the data was to harvest the newspaper records from the SLV catalogue. To do this I made use of the Primo JSON API. The method is &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/download_newspapers.ipynb&#34;&gt;documented in this notebook&lt;/a&gt;. The result was a &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/newspapers.ndjson&#34;&gt;newline-delimited JSON file&lt;/a&gt;, with one record per line.&lt;/p&gt;
&lt;p&gt;The harvested metadata doesn&amp;rsquo;t include links to digitised versions of newspapers in Trove. To add these links I first looked in the &lt;code&gt;856&lt;/code&gt; field of the newspaper&amp;rsquo;s MARC record. I also noticed that some Trove links were being loaded from an &amp;lsquo;edelivery&amp;rsquo; JSON file, so I added these as well. I ended up with 344 unique links to Trove, but not all of these were to digitised newspapers as some more recent newspapers are available through eLegal deposit. In total there were 268 unique links to digitised newspapers. This is well short of the 460 Victorian newspapers in Trove. Why? It&amp;rsquo;s possible that the links haven&amp;rsquo;t been added into the SLV catalogue, or that the &amp;lsquo;place newspaper published&amp;rsquo; field hasn&amp;rsquo;t been populated for records that include the links. It&amp;rsquo;s also possible that Trove links are hiding somewhere else in the SLV catalogue!&lt;/p&gt;
&lt;p&gt;To try and fill this gap, I compared the catalogue metadata with my &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_newspaper.csv&#34;&gt;most recent harvest of Trove newspaper titles&lt;/a&gt;. If the Trove url was missing, I searched the catalogue data for the newspaper title. I then manually checked the results, making sure the dates and titles lined up, and added positive matches to &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/newspaper_manual_additions.csv&#34;&gt;a new CSV file&lt;/a&gt; which I merged back into the main dataset. This added another 152 Trove links.&lt;/p&gt;
&lt;p&gt;The next step was to link the &amp;lsquo;place newspaper published&amp;rsquo; values to places with known locations. The &amp;lsquo;place newspaper published&amp;rsquo; information is included in the &lt;code&gt;lds03&lt;/code&gt; field of the harvested metadata. Records often contain references to multiple places, so I split all the newspaper/place combinations out into separate rows. I then matched the places against a list of Victorian place names and coordinates downloaded from the &lt;a href=&#34;https://maps.land.vic.gov.au/lassi/VicnamesUI.jsp&#34;&gt;VicNames database&lt;/a&gt;. If there were no matches, I manually checked and adjusted the place names – for example, I changed &amp;lsquo;East Kew&amp;rsquo; to &amp;lsquo;Kew East&amp;rsquo;, and &amp;lsquo;Bayside&amp;rsquo; to &amp;lsquo;Bayside City&amp;rsquo;.&lt;/p&gt;
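The splitting step can be sketched like this. It's purely illustrative: the record layout (a `lds03` list of `Australia--Victoria--Place` strings) is an assumption about the harvested metadata, not the notebook's actual code.

```python
def explode_places(records):
    """One row per newspaper/place combination, ready for matching
    against a gazetteer such as VicNames."""
    rows = []
    for rec in records:
        for place in rec.get("lds03", []):
            # Keep only the last element of the 'Australia--Victoria--Place' hierarchy.
            name = place.split("--")[-1].strip()
            rows.append({"title": rec["title"], "place": name})
    return rows
```

Each row can then be matched by place name, with unmatched names flagged for manual correction (the 'East Kew' to 'Kew East' style fixes).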
&lt;p&gt;To add any Trove digitised newspapers that might still be missing, I made use of my existing Trove harvests. First I compared my &lt;a href=&#34;https://docs.google.com/spreadsheets/d/1rURriHBSf3MocI8wsdl1114t0YeyU0BVSXWeg232MZs/edit?usp=sharing&#34;&gt;Trove Places dataset&lt;/a&gt; with my &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_newspaper.csv&#34;&gt;latest harvest of newspaper titles&lt;/a&gt;. There were a few new titles, so I matched them to places &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/get_places_from_newspapers.ipynb&#34;&gt;using this notebook&lt;/a&gt;, based on my original Trove Places code. I then merged the Trove Places dataset with the new titles and checked it against the catalogue dataset. If any urls were missing, I added a record from the Trove data.&lt;/p&gt;
&lt;p&gt;All of the processing steps are &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/process_newspapers.ipynb&#34;&gt;documented in this notebook&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;building-the-apps&#34;&gt;Building the apps&lt;/h2&gt;
&lt;p&gt;To make the data easily searchable by its geospatial coordinates, I loaded all the data into an SQLite/SpatiaLite database and &lt;a href=&#34;https://slv-places-481615284700.australia-southeast1.run.app/newspapers&#34;&gt;published it online using Datasette&lt;/a&gt;. The database contains linked tables for titles and places.&lt;/p&gt;
&lt;p&gt;I also created a couple of canned queries which, together with Datasette&amp;rsquo;s built-in JSON API, made it possible to retrieve places and titles based on their distance from a given point. For example, this url retrieves places ordered by their distance from the point at latitude -36.815, longitude 144.965: &lt;a href=&#34;https://slv-places-481615284700.australia-southeast1.run.app/newspapers/places_from_point.json?longitude=144.965&amp;amp;latitude=-36.815&amp;amp;distance=100000&amp;amp;_shape=array&#34;&gt;https://slv-places-481615284700.australia-southeast1.run.app/newspapers/places_from_point.json?longitude=144.965&amp;amp;latitude=-36.815&amp;amp;distance=100000&amp;amp;_shape=array&lt;/a&gt;&lt;/p&gt;
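&lt;p&gt;Assembling a request like this in Python needs nothing beyond the standard library – a minimal sketch:&lt;/p&gt;

```python
from urllib.parse import urlencode

# Build a request url for the 'places_from_point' canned query.
# The distance parameter is in metres.
base = "https://slv-places-481615284700.australia-southeast1.run.app"
params = {
    "longitude": 144.965,
    "latitude": -36.815,
    "distance": 100000,
    "_shape": "array",
}
url = f"{base}/newspapers/places_from_point.json?{urlencode(params)}"
print(url)

# To fetch the results you could then use a library such as requests:
# places = requests.get(url).json()
```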
&lt;p&gt;When you click on the map in the &lt;a href=&#34;https://slv.wraggelabs.com/newspapers/&#34;&gt;Victorian Newspapers Explorer&lt;/a&gt;, it fires off a request like this to find nearby places. It then makes a second request to find newspapers related to those places and displays the results.&lt;/p&gt;
&lt;p&gt;The Victorian Newspapers Explorer was my first attempt at using &lt;a href=&#34;https://maplibre.org&#34;&gt;MapLibre&lt;/a&gt; rather than Leaflet to display maps using JavaScript. It&amp;rsquo;s more verbose, but more flexible, so I think I&amp;rsquo;ll gradually switch over my other apps, including Trove Places.&lt;/p&gt;
&lt;p&gt;All the code of the Victorian Newspapers Explorer is in the &lt;a href=&#34;https://github.com/wragge/slv-demo-apps&#34;&gt;slv-demo-apps repository&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Why bother?</title>
      <link>https://updates.timsherratt.org/2025/12/03/why-bother.html</link>
      <pubDate>Wed, 03 Dec 2025 16:18:22 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/12/03/why-bother.html</guid>
      <description>&lt;p&gt;&lt;em&gt;This was the introduction to my talk on the results of my time as Creative Technologist-in-Residence at the State Library of Victoria. My slides, with my full notes &lt;a href=&#34;https://slides.com/wragge/slv-my-place&#34;&gt;are available online&lt;/a&gt;, but after a very strange year that has travelled from disappointment to exhilaration, I thought it was worth posting these words separately.&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;The work that I do, that I&amp;rsquo;ve been doing for the past 30 years, is focused on helping people find, use, and understand the wonderfully rich collections held by our libraries, archives, and museums – the GLAM sector. Much of it is quite practical, resulting in tools and applications that are used by a wide range of researchers. Some of it is playful, some of it is critical, and some of it is just weird.&lt;/p&gt;
&lt;p&gt;You can browse through some of this history on &lt;a href=&#34;https://wraggelabs.com&#34;&gt;wraggelabs.com&lt;/a&gt;. And if you&amp;rsquo;re interested in my current crop of tools you can head to the &lt;a href=&#34;https://glam-workbench.net&#34;&gt;GLAM Workbench&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As some of you may know, I had &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;a few setbacks&lt;/a&gt; at the beginning of this year which really made me wonder whether I wanted to continue doing this sort of work.&lt;/p&gt;
&lt;p&gt;I mean, why bother?&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m really grateful that &lt;a href=&#34;https://slv.wraggelabs.com&#34;&gt;this residency&lt;/a&gt; has given me a chance to refocus on the reasons why I do what I do.&lt;/p&gt;
&lt;p&gt;I suppose my starting point is the fact that libraries can&amp;rsquo;t do everything themselves.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m thinking here specifically about the digital research space. There&amp;rsquo;s a lot that libraries, and other GLAM organisations, &lt;em&gt;can&lt;/em&gt; do – provide search interfaces, APIs, downloadable datasets, documentation, and examples of how to access APIs and datasets using code. The sorts of things that the &lt;a href=&#34;https://lab.slv.vic.gov.au&#34;&gt;SLV LAB&lt;/a&gt; is doing.&lt;/p&gt;
&lt;p&gt;I should pause here to unpack some acronyms. APIs deliver data in a form that machines can understand and process. Websites are for humans, APIs are for computers. APIs are also building blocks which can be connected up to create new applications – and I&amp;rsquo;ll be showing some examples of this later on.&lt;/p&gt;
&lt;p&gt;So there is much that GLAM organisations can do to support digital research. But it will never be enough. Researchers – whether they be academics or family historians – will always want more. It is in the nature of research to ask new questions, to head off in new directions.&lt;/p&gt;
&lt;p&gt;But rather than see this as a source of tension, I see it as an opportunity for collaboration. An opportunity to cultivate the &lt;em&gt;in-between&lt;/em&gt; spaces where research methods, tools, and results can feed back into the contextual frameworks of GLAM collections. Where GLAM organisations can share and celebrate the work that&amp;rsquo;s done with their data. Where all can find inspiration, ideas and support.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve tended to call this sort of stuff infrastructure, but I think that really downplays the human aspect. The research sector has started to develop the funding and career structures necessary to allow people to build and maintain these infrastructures, but we need more. We need to recognise that a single tool, developed by an individual without institutional support, can be just as important as a multi-million dollar platform. Passion is precious and needs to be protected.&lt;/p&gt;
&lt;p&gt;Most of all, we need to keep a focus on the ethical imperatives – the reasons &lt;em&gt;why&lt;/em&gt; we bother and &lt;em&gt;why&lt;/em&gt; it matters. For me it boils down to openness and generosity. I have benefited greatly from the openness and generosity of others, and I want to pass that on. It&amp;rsquo;s the glue we need to hold those in-between spaces together; the sustenance we need to maintain our enthusiasm in the face of all the crap; the inspiration we need to try something new.&lt;/p&gt;
&lt;p&gt;Initiatives like the SLV LAB are important, not just because they foster innovation, but because they invite new ideas in. They even give space for ageing hackers like me to spend some dedicated time doing what they love – crafting new pathways for people to explore our glorious GLAM collections.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Counting down... (to the end of my SLV residency)</title>
      <link>https://updates.timsherratt.org/2025/11/19/counting-down-to-the-end.html</link>
      <pubDate>Wed, 19 Nov 2025 16:01:22 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/11/19/counting-down-to-the-end.html</guid>
      <description>&lt;p&gt;My stint as &lt;a href=&#34;https://updates.timsherratt.org/2025/09/22/creative-technologistinresidence-at-the-state.html&#34;&gt;Creative Technologist-in-Residence at the State Library of Victoria LAB&lt;/a&gt; comes to an end in a few weeks&amp;rsquo; time and I&amp;rsquo;m frantically trying to pull things together. I&amp;rsquo;ll be back on-site at the Library from 1 to 5 December for a few events, and to report back to staff on what I&amp;rsquo;ve been doing.&lt;/p&gt;
&lt;p&gt;On Tuesday &lt;strong&gt;2 December&lt;/strong&gt;, there&amp;rsquo;ll be a public workshop on using and contributing to the &lt;a href=&#34;https://glam-workbench.net&#34;&gt;GLAM Workbench&lt;/a&gt;. Here&amp;rsquo;s the blurb:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;More and more GLAM organisations are looking to share their data to foster creativity and support new types of research. But how can you help potential users understand the possibilities of your data? This workshop will explore how GLAM organisations can create and share resources that encourage experimentation.&lt;/p&gt;
&lt;p&gt;The GLAM Workbench is a large collection of tools, hacks, and tutorials aimed at helping researchers make use of collection data. It uses platforms such as Jupyter notebooks to create live, working examples that run in your browser without additional software. Similar repositories of computational resources are being developed by GLAM organisations around the world.&lt;/p&gt;
&lt;p&gt;This workshop will introduce the technologies and standards used in the GLAM Workbench, such as Jupyter notebooks. It will provide an overview of related activity around the world, including best practice guidelines for GLAM organisations developing computational resources. It will explain how organisations and individuals can contribute content to the GLAM Workbench, or use it as a model to create their own specialised workbenches.&lt;/p&gt;
&lt;p&gt;Sharing data is important, but so is sharing skills, tools, and knowledge. Come along to find out how the GLAM Workbench can help.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It&amp;rsquo;s a free, hybrid event (in person and online) and will run from 1.00-3.00pm. A sign up page should be available soon.&lt;/p&gt;
&lt;p&gt;On Wednesday &lt;strong&gt;3 December&lt;/strong&gt; I&amp;rsquo;m presenting the results of my residency in a &amp;lsquo;technologist&amp;rsquo;s talk&amp;rsquo;. It&amp;rsquo;s an internal event, but it&amp;rsquo;s in the public &amp;lsquo;Create quarter&amp;rsquo; of the Library, so I think anyone can pop in. Hopefully there&amp;rsquo;ll be a video I can share.&lt;/p&gt;
&lt;p&gt;To give you an idea of what I&amp;rsquo;ll be talking about, here&amp;rsquo;s some of the outcomes so far:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;hacking the library workshop (&lt;a href=&#34;https://slides.com/wragge/slv-code-club&#34;&gt;slides&lt;/a&gt;, and &lt;a href=&#34;https://updates.timsherratt.org/2025/09/23/exploring-slv-urls.html&#34;&gt;blog post about urls&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;bounding boxes for parish maps (&lt;a href=&#34;https://updates.timsherratt.org/2025/10/06/creating-bounding-boxes-for-parish.html&#34;&gt;blog post&lt;/a&gt;, &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency&#34;&gt;code&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;geolocating the Committee for Urban Action collection of photographs (&lt;a href=&#34;https://wragge.github.io/slv-demo-apps/cua-browser.html&#34;&gt;prototype interface&lt;/a&gt;, still documenting the method)&lt;/li&gt;
&lt;li&gt;a new fully-searchable version of the Sands &amp;amp; MacDougall&amp;rsquo;s directories (&lt;a href=&#34;https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html&#34;&gt;blog post&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;database&lt;/a&gt;, and &lt;a href=&#34;https://updates.timsherratt.org/2025/11/16/some-sands-mac-tweaks-thanks.html&#34;&gt;another blog post&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;georeferencing digitised maps – over 500 so far! (&lt;a href=&#34;https://wragge.github.io/slv-allmaps/&#34;&gt;documentation&lt;/a&gt;, &lt;a href=&#34;https://wragge.github.io/slv-allmaps/dashboard.html&#34;&gt;dashboard&lt;/a&gt;, &lt;a href=&#34;https://github.com/wragge/slv-allmaps&#34;&gt;data repository&lt;/a&gt;, &lt;a href=&#34;https://updates.timsherratt.org/2025/11/04/turning-the-slvs-maps-into.html&#34;&gt;blog post&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;and as of yesterday, 3,000+ geolocated newspapers (documentation and interface coming!)&lt;/li&gt;
&lt;/ul&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-11-18-15-22-26.png&#34; width=&#34;600&#34; height=&#34;356&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;First attempt at mapping places of publication and distribution of Victorian newspapers&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;At the moment I&amp;rsquo;m trying to bring it all together in a new interface that lets you type in an address and find collection materials relating to your home, your street, and your suburb. Only two weeks to go! Eeek!&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-11-15-17-30-26.png&#34; width=&#34;600&#34; height=&#34;454&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Work in progress!&lt;/figcaption&gt;&lt;/figure&gt;
</description>
    </item>
    
    <item>
      <title>Some Sands &amp; Mac tweaks thanks to ALTO and IIIF</title>
      <link>https://updates.timsherratt.org/2025/11/16/some-sands-mac-tweaks-thanks.html</link>
      <pubDate>Sun, 16 Nov 2025 15:17:26 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/11/16/some-sands-mac-tweaks-thanks.html</guid>
      <description>&lt;p&gt;I posted recently about &lt;a href=&#34;https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html&#34;&gt;my new fully-searchable version of the Sands &amp;amp; MacDougall directories&lt;/a&gt;. I&amp;rsquo;ve now moved on to try and pull together a number of the State Library of Victoria&amp;rsquo;s place-based collections into a new discovery interface. It&amp;rsquo;s going to be a busy couple of weeks as my residency ends in early December!&lt;/p&gt;
&lt;p&gt;I wanted to incorporate Sands &amp;amp; Mac search results into the new interface. Getting the data was easy because &lt;a href=&#34;https://datasette.io&#34;&gt;Datasette&lt;/a&gt; has a JSON API baked in. But what about the images? I could just display a thumbnail of the whole page, but it would be better to show a snippet of the actual entry. Thanks to &lt;a href=&#34;https://iiif.io&#34;&gt;IIIF&lt;/a&gt; and &lt;a href=&#34;https://en.wikipedia.org/wiki/Analyzed_Layout_and_Text_Object&#34;&gt;ALTO&lt;/a&gt;, I now can.&lt;/p&gt;
&lt;p&gt;IIIF makes it easy to cut small sections out of a larger image. You just put the coordinates of the desired section in the IIIF url. As I noted in my previous post, the ALTO files that contain the OCR data from Sands &amp;amp; Mac include the coordinates of every line, and every word. I just had to bring the two together.&lt;/p&gt;
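&lt;p&gt;Given a line&amp;rsquo;s position and size in pixels, the IIIF Image API&amp;rsquo;s region parameter (&lt;code&gt;x,y,w,h&lt;/code&gt;) does the cropping. The base url below is a placeholder, not a real SLV image endpoint:&lt;/p&gt;

```python
# Build an IIIF Image API url that crops one entry out of the full page
# image. The region segment is 'x,y,w,h' in pixels on the full image;
# 'max' requests the cropped region at full size.
def iiif_snippet_url(base, x, y, w, h, size="max"):
    return f"{base}/{x},{y},{w},{h}/{size}/0/default.jpg"

url = iiif_snippet_url("https://example.org/iiif/IMAGE_ID", 120, 845, 980, 40)
print(url)
```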
&lt;p&gt;All I did was update the code that extracts the data from the ALTO files to save the results as newline delimited JSON instead of a plain text file. Each line in each volume of Sands &amp;amp; Mac is now saved as a JSON object that contains the text, as well as the height, width, vertical position, and horizontal position of the line within the page image. When I load up the SQLite database, I add the values for &lt;code&gt;h&lt;/code&gt;, &lt;code&gt;w&lt;/code&gt;, &lt;code&gt;x&lt;/code&gt;, and &lt;code&gt;y&lt;/code&gt; as well as the text for each line.&lt;/p&gt;
&lt;p&gt;What does this make possible?&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When you go to an individual entry, the page image now automatically pans and zooms so that the current entry is at the centre of the image viewer. I just updated the &lt;a href=&#34;https://openseadragon.github.io&#34;&gt;OpenSeadragon&lt;/a&gt; code to focus on the entry&amp;rsquo;s position.&lt;/li&gt;
&lt;li&gt;If you share an entry on social media, a snipped-out section of the page image showing the selected entry is displayed, because there&amp;rsquo;s now an image &lt;code&gt;META&lt;/code&gt; tag that points to an IIIF url.&lt;/li&gt;
&lt;li&gt;You can retrieve entries via the API and use the coordinates to request snipped out images of them via IIIF.&lt;/li&gt;
&lt;/ol&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-11-16-15-06-52.png&#34; width=&#34;600&#34; height=&#34;330&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Nice image snippets thanks to IIIF and ALTO (and a sneak preview of what&#39;s coming...)&lt;/figcaption&gt;&lt;/figure&gt;
</description>
    </item>
    
    <item>
      <title>A new way of searching Sands &amp; Mac</title>
      <link>https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html</link>
      <pubDate>Wed, 12 Nov 2025 22:25:21 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/11/12/a-new-way-of-searching.html</guid>
      <description>&lt;p&gt;In the fortnight I spent onsite at the State Library of Victoria, &amp;lsquo;Sands &amp;amp; Mac&amp;rsquo; was mentioned many times. And no wonder. The &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;amp;collectionId=81213035910007636&#34;&gt;Sands &amp;amp; McDougall&amp;rsquo;s directories&lt;/a&gt; are a goldmine for anyone researching family, local, or social history. They list thousands of names and addresses, enabling you to find individuals, and explore changing land use over time. When people ask the SLV&amp;rsquo;s librarians, &amp;lsquo;What can you tell me about the history of my house?&amp;rsquo;, Sands &amp;amp; Mac is one of the first resources consulted.&lt;/p&gt;
&lt;p&gt;The SLV has digitised &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;amp;collectionId=81213035910007636&#34;&gt;24 volumes of Sands &amp;amp; Mac&lt;/a&gt;, one every five years from 1860 to 1974. You can browse the contents of each volume in the SLV image viewer, using the partial contents listing to help you find your way to sections of interest. To search the full text content you need to use the PDF version, either in the built-in viewer, or by downloading the PDF. There&amp;rsquo;s a &lt;a href=&#34;https://blogs.slv.vic.gov.au/tips-and-tricks/collection-discovery-tips-sands-mcdougalls-directories/&#34;&gt;handy guide to using Sands &amp;amp; Mac&lt;/a&gt; that explains the options.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;However, there&amp;rsquo;s currently no way of searching across all 24 volumes, so as part of my residency at the SLV LAB, I thought I&amp;rsquo;d make one!&lt;/strong&gt;&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/sands-and-mac.png&#34; width=&#34;600&#34; height=&#34;310&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;&lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;&lt;b&gt;Try it now!&lt;/b&gt;&lt;/a&gt;&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;My &lt;a href=&#34;https://glam-workbench.net/state-library-victoria/sands-macdougall-directories/&#34;&gt;new Sands &amp;amp; Mac database&lt;/a&gt; follows the pattern I&amp;rsquo;ve used previously to create fully-searchable versions of the &lt;a href=&#34;https://glam-workbench.net/trove-journals/nsw-post-office-directories/&#34;&gt;NSW Post Office directories&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/trove-journals/sydney-telephone-directories/&#34;&gt;Sydney telephone directories&lt;/a&gt;, and &lt;a href=&#34;https://glam-workbench.net/tasmanian-post-office-directories/&#34;&gt;Tasmanian Post Office directories&lt;/a&gt;. Every line of text is saved to a database, so a single query searches for entries across all volumes. You can also use advanced search features like wildcards and boolean operators.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/sands-and-mac-search.png&#34; width=&#34;600&#34; height=&#34;543&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Search across all 24 volumes!&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;Once you&amp;rsquo;ve found a relevant entry you can view it in context, alongside a zoomable image of the page. You can even use Zotero to save individual entries to your own research database. &lt;a href=&#34;https://chineseaustralia.org/from-the-archive-uncovering-the-everyday-heritage-of-chinese-tasmanians/&#34;&gt;This blog post&lt;/a&gt; from the Everyday Heritage project describes how the Tasmanian directories have been used to map Tasmania&amp;rsquo;s Chinese population.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/sands-and-mac-entry.png&#34; width=&#34;600&#34; height=&#34;370&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;View each entry in context! (Here&#39;s my Dad building his first house in Beaumaris in the 1950s.)&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;There&amp;rsquo;s still a few things I&amp;rsquo;d like to try, such as making use of the table of contents information for each volume. I&amp;rsquo;d also like to create some additional entry points to take users directly to listings for individual suburbs (maybe even streets!). Each volume has a directory of suburbs, so it would be a matter of extracting and cleaning the data and linking the entries to digitised pages. Certainly possible, but I don&amp;rsquo;t think I&amp;rsquo;ll have time to get it all done before the end of my residency. Perhaps I&amp;rsquo;ll try to get at least one volume done to demonstrate how it might work, and the value it would add. As I was writing this blog post I also realised there&amp;rsquo;s &lt;a href=&#34;https://www.environment.vic.gov.au/sustainability/victoria-unearthed/about-the-data/sands-and-mcdougall&#34;&gt;a dataset of businesses&lt;/a&gt; extracted from the Sands &amp;amp; Mac, so I need to think about how I can use that as well!&lt;/p&gt;
&lt;h2 id=&#34;technical-information-follows&#34;&gt;Technical information follows&amp;hellip;&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve documented the process I used to create fully-searchable versions of the &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/&#34;&gt;Tasmanian&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/trove-journals/create-text-db-indexed-by-line/&#34;&gt;NSW directories&lt;/a&gt; in the GLAM Workbench. I followed a similar method for Sands and Mac, though with a few dead-ends and discoveries along the way.&lt;/p&gt;
&lt;h3 id=&#34;downloading-the-pdfs&#34;&gt;Downloading the PDFs&lt;/h3&gt;
&lt;p&gt;I assumed that it would be easiest to work from the PDF versions of each volume, as I&amp;rsquo;d done for Tasmania. So I set about finding a way to download them all. There are only 24 volumes, so I &lt;em&gt;could&lt;/em&gt; have downloaded them manually, but where&amp;rsquo;s the fun in that?&lt;/p&gt;
&lt;p&gt;I started with a CSV file listing the Sands &amp;amp; Mac volumes that I downloaded from the catalogue. This gave me the Alma identifiers for each volume. To download the PDFs I needed two more identifiers, the &lt;code&gt;IE&lt;/code&gt; identifier assigned to each digitised item, and a file identifier that points to the PDF version of the item. The &lt;code&gt;IE&lt;/code&gt; identifier can be extracted from the item&amp;rsquo;s MARC record, as I described in &lt;a href=&#34;https://updates.timsherratt.org/2025/09/23/exploring-slv-urls.html&#34;&gt;my post on exploring urls&lt;/a&gt;. The PDF file identifier was a bit more difficult to track down. The PDF links in the image viewer are generated dynamically, so the data had to be coming from somewhere. Eventually I found that the viewer loaded a JSON file with all sorts of useful metadata in it!&lt;/p&gt;
&lt;p&gt;The url to download the JSON file is: &lt;code&gt;https://viewerapi.slv.vic.gov.au/?entity=[IE identifier]&amp;amp;dc_arrays=1&lt;/code&gt;. In the &lt;code&gt;summary&lt;/code&gt; section I found identifiers for &lt;code&gt;small_pdf&lt;/code&gt; and &lt;code&gt;master_pdf&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;I could then use these identifiers to construct urls to download the PDFs themselves: &lt;code&gt;https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&amp;amp;dps_pid=[PDF id]&lt;/code&gt;&lt;/p&gt;
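&lt;p&gt;Putting the two url patterns together in Python (the identifiers passed in below are placeholders):&lt;/p&gt;

```python
from urllib.parse import urlencode

# Url of the JSON metadata file loaded by the image viewer
def metadata_url(ie_id):
    return "https://viewerapi.slv.vic.gov.au/?" + urlencode(
        {"entity": ie_id, "dc_arrays": 1}
    )

# Url that streams the PDF itself, given a file identifier from the JSON
def pdf_url(pdf_id):
    return (
        "https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?"
        + urlencode({"dps_func": "stream", "dps_pid": pdf_id})
    )

print(metadata_url("IE1234567"))
# In practice you would fetch the JSON, read the 'small_pdf' or
# 'master_pdf' identifier from its 'summary' section, and pass that
# identifier to pdf_url().
```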
&lt;p&gt;Once I had the PDFs I used &lt;a href=&#34;https://github.com/pymupdf/PyMuPDF&#34;&gt;PyMuPDF&lt;/a&gt; to extract all the text and images. As I suspected, the text wasn&amp;rsquo;t really fit for purpose. The OCR was ok, but the column structures were a mess. Because I wanted to index each entry individually, it was important to try and get the columns represented as accurately as possible. The images in the small PDFs were already bitonal, so I started feeding them to &lt;a href=&#34;https://github.com/tesseract-ocr/tesseract&#34;&gt;Tesseract&lt;/a&gt; to see if I could get better results. After a bit of tweaking, things were looking pretty good. But when I came to compile all the data, I realised there was a potential problem matching the PDF pages to the images available through IIIF. I found one case where some pages were missing from the PDF, and another couple where the page order was different.&lt;/p&gt;
&lt;p&gt;As I was looking around for a solution, I realised that those JSON files I downloaded to get the PDF identifiers also included links to &lt;a href=&#34;https://en.wikipedia.org/wiki/Analyzed_Layout_and_Text_Object&#34;&gt;ALTO XML&lt;/a&gt; files that contain all the original OCR data (before it got mangled by the PDF formatting). There was one ALTO file for every page. Even better, the JSON linked the identifiers for the text and the image together – no more page mismatches!&lt;/p&gt;
&lt;h3 id=&#34;downloading-the-alto-files&#34;&gt;Downloading the ALTO files&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s start this again, shall we? After wasting several days futzing about with the PDFs, I decided to download all the ALTO files and extract the text from them. As I downloaded each XML file, I also grabbed the corresponding image identifier from the JSON and included both identifiers in the file name for safe keeping.&lt;/p&gt;
&lt;p&gt;The ALTO files break the text down by block, line, and word. To extract the text, I just looped through every line, joining the words back together as a string, and writing the result to a new text file – one for each page.&lt;/p&gt;
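&lt;p&gt;In outline, the extraction looks something like this. To keep the example self-contained it builds a single ALTO-style &lt;code&gt;TextLine&lt;/code&gt; element in code rather than parsing a real file (which would also mean handling ALTO&amp;rsquo;s versioned XML namespaces):&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# A stand-in for one TextLine element from an ALTO file: each String
# child holds a word, and the line carries its size and position on
# the page in pixels.
line = ET.Element("TextLine", HEIGHT="40", WIDTH="980", VPOS="845", HPOS="120")
for word in ["Sherratt", "W.,", "Beaumaris"]:
    ET.SubElement(line, "String", CONTENT=word)

# Join the words back together as a single line of text
text = " ".join(s.get("CONTENT") for s in line.iter("String"))
print(text)
```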
&lt;p&gt;It&amp;rsquo;s worth noting that the ALTO files include &lt;em&gt;all&lt;/em&gt; the positional data generated by the OCR process, so you have the size and position of every word on every page. I just pulled out the text, but there are many more interesting things you could do&amp;hellip;&lt;/p&gt;
&lt;h3 id=&#34;assembling-and-publishing-the-database&#34;&gt;Assembling and publishing the database&lt;/h3&gt;
&lt;p&gt;From here on everything pretty much followed the pattern of the NSW and Tasmanian directories. I looped through each volume, page, and line of text, adding the text and metadata to a SQLite database using &lt;a href=&#34;https://sqlite-utils.datasette.io/en/stable/&#34;&gt;sqlite_utils&lt;/a&gt;. I then indexed the text for full-text searching. At the same time I populated a metadata file with titles, urls, and a few configuration details. The metadata file is used by &lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt; to fill in parts of the interface.&lt;/p&gt;
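&lt;p&gt;The same pattern can be sketched with nothing but the standard library&amp;rsquo;s &lt;code&gt;sqlite3&lt;/code&gt; module and an FTS5 index (&lt;code&gt;sqlite_utils&lt;/code&gt; wraps this kind of setup; the rows below are invented):&lt;/p&gt;

```python
import sqlite3

# Create an in-memory database with a full-text index over the lines
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE lines USING fts5(volume, page, text)")
rows = [
    ("1900", "12", "Sherratt W., carpenter, Beaumaris"),
    ("1900", "13", "Smith J., grocer, Kew East"),
]
db.executemany("INSERT INTO lines VALUES (?, ?, ?)", rows)

# A single full-text query now searches every volume at once
hits = db.execute(
    "SELECT volume, page, text FROM lines WHERE lines MATCH ?", ("Kew",)
).fetchall()
print(hits)
```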
&lt;p&gt;I made some minor changes to the Datasette template I used for the other directories. In particular, I had to update the urls that loaded the &lt;a href=&#34;https://iiif.io&#34;&gt;IIIF&lt;/a&gt; images into the &lt;a href=&#34;https://openseadragon.github.io&#34;&gt;OpenSeadragon viewer&lt;/a&gt;. But it mostly just worked. It&amp;rsquo;s so nice to be able to reuse existing patterns!&lt;/p&gt;
&lt;p&gt;Finally, I used &lt;a href=&#34;https://docs.datasette.io/en/stable/publish.html&#34;&gt;Datasette&amp;rsquo;s &lt;code&gt;publish&lt;/code&gt; command&lt;/a&gt; to push everything to Google Cloud Run. The final database contains details of more than 50,000 pages, and over 19 million lines of text! It weighs in at about 1.7 GB. The Cloud Run service will &amp;lsquo;scale to zero&amp;rsquo; when not in use. This saves some money and resources, but means it can take a little while to spin up. Once it&amp;rsquo;s loaded, it&amp;rsquo;s very fast. My &lt;a href=&#34;https://updates.timsherratt.org/2022/09/15/from-pdfs-to.html&#34;&gt;original post on the Tasmanian directories&lt;/a&gt; included a little note on costs, if you&amp;rsquo;re interested.&lt;/p&gt;
&lt;h2 id=&#34;more-information&#34;&gt;More information&lt;/h2&gt;
&lt;p&gt;The notebooks I used are on GitHub:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/download_sands_and_mac_pdfs.ipynb&#34;&gt;Download Sands and Mac PDFs and OCR text&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/load_sands_and_mac_into_datasette.ipynb&#34;&gt;Load data from the Sands and Mac directories into an SQLite database (for use with Datasette)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here are some posts about the NSW and Tasmanian directories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2022/09/01/making-nsw-postal.html&#34;&gt;Making NSW Postal Directories (and other digitised directories) easier to search with the GLAM Workbench and Datasette&lt;/a&gt; (September 2022)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2022/09/15/from-pdfs-to.html&#34;&gt;From 48 PDFs to one searchable database – opening up the Tasmanian Post Office Directories with the GLAM Workbench&lt;/a&gt; (September 2022)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2024/09/26/wheres-missing-volume.html&#34;&gt;Where&amp;rsquo;s 1920? Missing volume added to Tasmanian Post Office Directories!&lt;/a&gt; (September 2024)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2024/11/21/six-more-volumes.html&#34;&gt;Six more volumes added to the searchable database of Tasmanian Post Office Directories!&lt;/a&gt; (November 2024)&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Turning the SLV&#39;s maps into data with Allmaps and some GLAM plumbing</title>
      <link>https://updates.timsherratt.org/2025/11/04/turning-the-slvs-maps-into.html</link>
      <pubDate>Tue, 04 Nov 2025 15:02:53 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/11/04/turning-the-slvs-maps-into.html</guid>
      <description>&lt;p&gt;I often describe what I do as GLAM data plumbing. Most of the time I&amp;rsquo;m not creating new tools, I&amp;rsquo;m figuring out what data is available and how I can connect it up to &lt;em&gt;existing&lt;/em&gt; tools. It&amp;rsquo;s rarely straightforward, but if I can get all the pipes connected and data flowing in the right direction, suddenly new things become possible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Things like turning all the State Library of Victoria&amp;rsquo;s digitised maps into data.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve just &lt;a href=&#34;https://wragge.github.io/slv-allmaps/&#34;&gt;created a workflow&lt;/a&gt; that uses &lt;a href=&#34;https://allmaps.org&#34;&gt;Allmaps&lt;/a&gt; and &lt;a href=&#34;https://iiif.io/&#34;&gt;IIIF&lt;/a&gt; to georeference the SLV&amp;rsquo;s digitised maps. There are some technical details below, but the idea is pretty simple. A userscript links the SLV image viewer to Allmaps – so you just click on a button, and the digitised map opens, ready for georeferencing.&lt;/p&gt;
&lt;p&gt;Why is this useful? Georeferencing relates a digitised map to real world geography. It describes the map&amp;rsquo;s position and extent using geospatial coordinates – turning historic documents into geospatial data that can be indexed, visualised and manipulated. Georeferencing opens digitised maps to new research uses.&lt;/p&gt;
&lt;p&gt;So, how many maps can we georeference before my residency finishes in December? Hundreds? Thousands? If you like maps and want to help, head to &lt;a href=&#34;https://wragge.github.io/slv-allmaps/&#34;&gt;the documentation page&lt;/a&gt; to find out how to get started. And if you want to see how things are progressing, have a look at &lt;a href=&#34;https://wragge.github.io/slv-allmaps/dashboard.html&#34;&gt;the project dashboard&lt;/a&gt;.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/slv-allmaps-docs.png&#34; width=&#34;600&#34; height=&#34;466&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;&lt;a href=&#34;https://wragge.github.io/slv-allmaps/&#34;&gt;View the documentation&lt;/a&gt; to get started&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;A few technical details follow&amp;hellip;&lt;/p&gt;
&lt;p&gt;Early on in my time as Creative Technologist-in-Residence at the State Library of Victoria, I started playing around with Allmaps for georeferencing digitised maps. It&amp;rsquo;s a great tool (really a suite of tools and standards) because instead of constructing a whole new platform it integrates with existing IIIF services. The SLV provides digitised images through IIIF, so I thought it should be possible to use Allmaps to georeference the SLV&amp;rsquo;s map collection.&lt;/p&gt;
&lt;p&gt;But I struck a problem that took some time to unravel. The IIIF urls in the SLV manifests include port numbers and that confused Allmaps. The manifests also sometimes contained references to image formats that weren&amp;rsquo;t actually accessible, generating errors when they were loaded. Hopefully these problems will be fixed by the SLV, but in the meantime I&amp;rsquo;ve created a proxy service that edits the manifest on the fly. The proxied urls can be loaded into the Allmaps Editor without errors. Pipes fixed, data flowing!&lt;/p&gt;
&lt;details&gt;
  &lt;summary&gt;Using the manifest proxy&lt;/summary&gt;
  &lt;p&gt;To generate a link to a proxied manifest, first grab the item&#39;s &lt;code&gt;IE&lt;/code&gt; identifier from the url of the digitised item viewer. For example, the identifier in this url &lt;code&gt;https://viewer.slv.vic.gov.au/?entity=IE15485265&amp;mode=browse&lt;/code&gt; is &lt;code&gt;IE15485265&lt;/code&gt;. Once you have the identifier, add it to the end of the url &lt;code&gt;https://wraggelabs.com/slv_iiif/&lt;/code&gt;. For example, &lt;a href=&#34;https://wraggelabs.com/slv_iiif/IE15485265&#34;&gt;https://wraggelabs.com/slv_iiif/IE15485265&lt;/a&gt;. You can then supply this url to the Allmaps editor.&lt;/p&gt;
&lt;/details&gt;
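&lt;p&gt;In code, the proxy url construction is a one-liner – a minimal sketch (the function name is mine):&lt;/p&gt;

```python
def proxied_manifest_url(ie_id):
    """Build a proxied IIIF manifest url from an SLV IE identifier."""
    return f"https://wraggelabs.com/slv_iiif/{ie_id}"

# For example: proxied_manifest_url("IE15485265")
```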
&lt;p&gt;But having to fiddle around with proxies didn&amp;rsquo;t make a great user experience. I needed some way of integrating the two services, so that a user could just click a button in the SLV website and start editing in Allmaps. Userscripts to the rescue!&lt;/p&gt;
&lt;p&gt;I wrote recently about &lt;a href=&#34;https://updates.timsherratt.org/2025/07/17/glam-hacking-with-userscripts.html&#34;&gt;hacking GLAM collection interfaces using userscripts&lt;/a&gt;. Since I started my residency at the SLV, I&amp;rsquo;ve also created a userscript to &lt;a href=&#34;https://gist.github.com/wragge/a37a4db854deffad956abc7bf918f6b0&#34;&gt;display the IIIF manifest url in the SLV image viewer&lt;/a&gt;, and run a Code Club workshop where we played around with &lt;a href=&#34;https://slides.com/wragge/slv-code-club&#34;&gt;an assortment of SLV website hacks&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As in a number of these examples, the &lt;a href=&#34;https://gist.github.com/wragge/5680daaec4b4b34ed5537e6ff79559a2&#34;&gt;georeferencing userscript&lt;/a&gt; adds new features to the SLV website, but there&amp;rsquo;s a fair bit more going on under the hood. It runs automatically every time you load the SLV image viewer, and then:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it checks the metadata of the digitised item to see if it&amp;rsquo;s a map (or something that contains maps, like an atlas or street directory)&lt;/li&gt;
&lt;li&gt;if it looks like a map, it generates an Allmaps identifier using the item&amp;rsquo;s IIIF manifest url and checks with Allmaps to see whether the item has already been georeferenced&lt;/li&gt;
&lt;li&gt;it adds a &amp;lsquo;Georeferencing&amp;rsquo; section to the page, with a button to georeference the item (or edit the existing georeferencing)&lt;/li&gt;
&lt;li&gt;if the item has already been georeferenced, it adds a button to view the item in the Allmaps Viewer, and embeds a live preview&lt;/li&gt;
&lt;/ul&gt;
&lt;details&gt;
    &lt;summary&gt;Accessing metadata&lt;/summary&gt;
    &lt;p&gt;
        The userscript gets the item metadata from a JSON file that&#39;s loaded by the image viewer. The JSON file includes a lot of extra, useful information about the digitised item. To access the JSON file, you just construct a url like this: &lt;code&gt;https://viewerapi.slv.vic.gov.au/?entity=[IE identifier]&amp;dc_arrays=1&lt;/code&gt;. The IE identifier is in the url of the image viewer.
    &lt;/p&gt;
&lt;/details&gt;
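&lt;p&gt;The steps in the note above can be sketched in Python – the helper names are mine, and the structure of the returned JSON is not documented here, so this just builds the url and fetches the file:&lt;/p&gt;

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def viewer_metadata_url(ie_id):
    """Build the url of the JSON metadata file loaded by the SLV image viewer."""
    return "https://viewerapi.slv.vic.gov.au/?" + urlencode({"entity": ie_id, "dc_arrays": 1})

def get_viewer_metadata(ie_id):
    """Fetch and parse the metadata JSON (requires network access)."""
    with urlopen(viewer_metadata_url(ie_id)) as resp:
        return json.loads(resp.read())
```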
&lt;details&gt;
    &lt;summary&gt;Allmaps identifiers&lt;/summary&gt;
    &lt;p&gt;
        Allmaps creates its identifiers by hash encoding the IIIF urls. The userscript borrows some code from the &lt;a href=&#34;https://github.com/allmaps/allmaps/tree/main/packages/id&#34;&gt;Allmaps id module&lt;/a&gt; to generate the ids, then sends a HEAD request to the Allmaps API to see whether an entry for the current manifest exists.
    &lt;/p&gt;
&lt;/details&gt;
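&lt;p&gt;The existence check can be sketched like this – generating the identifier itself is left to the borrowed Allmaps code, and the annotations endpoint url is my assumption from poking at the Allmaps API:&lt;/p&gt;

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

def allmaps_manifest_url(allmaps_id):
    # Endpoint path is an assumption based on the public Allmaps annotations API
    return f"https://annotations.allmaps.org/manifests/{allmaps_id}"

def is_georeferenced(allmaps_id):
    """Send a HEAD request; a 404 means no georeferencing exists yet."""
    req = Request(allmaps_manifest_url(allmaps_id), method="HEAD")
    try:
        with urlopen(req) as resp:
            return resp.status == 200
    except HTTPError:
        return False
```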
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/alv-allmaps-not-georeferenced.png&#34; width=&#34;600&#34; height=&#34;313&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Example of an item that hasn&#39;t been georeferenced yet&lt;/figcaption&gt;&lt;/figure&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/slv-allmaps-georeferenced.png&#34; width=&#34;600&#34; height=&#34;462&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;Example of an item that has been georeferenced, displaying an embedded version of the Allmaps viewer&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;I&amp;rsquo;ve also created a GitHub repository to save copies of the data. Every two hours &lt;a href=&#34;https://github.com/wragge/slv-allmaps/blob/main/harvest_allmaps_data.ipynb&#34;&gt;this notebook&lt;/a&gt; is run to query the Allmaps API for newly georeferenced maps. These are added to a dataset which is saved in three formats:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/slv-allmaps/blob/main/georeferenced_maps.csv&#34;&gt;a CSV file&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/slv-allmaps/blob/main/georeferenced_maps_datasette.csv&#34;&gt;a CSV file&lt;/a&gt; that includes thumbnails and links for &lt;a href=&#34;https://glam-workbench.net/datasette-lite/?csv=https%3A%2F%2Fgithub.com%2Fwragge%2Fslv-allmaps%2Fblob%2Fmain%2Fgeoreferenced_maps_datasette.csv&amp;amp;install=datasette-homepage-table&amp;amp;install=datasette-json-html&amp;amp;fts=manifest_title%2Cmap_title&#34;&gt;viewing in Datasette-Lite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/slv-allmaps/blob/main/georeferenced_maps.geojson&#34;&gt;a GeoJSON file&lt;/a&gt;, that can be &lt;a href=&#34;https://geojson.io/#id=github:wragge/slv-allmaps/blob/main/georeferenced_maps.geojson&#34;&gt;viewed in services like geojson.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At the same time, the data for each individual map is downloaded and saved as &lt;a href=&#34;https://github.com/wragge/slv-allmaps/tree/main/maps&#34;&gt;IIIF annotations&lt;/a&gt; (in JSON) and &lt;a href=&#34;https://github.com/wragge/slv-allmaps/tree/main/geojson&#34;&gt;GeoJSON&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Finally, &lt;a href=&#34;https://github.com/wragge/slv-allmaps/blob/main/allmaps_dashboard.ipynb&#34;&gt;this notebook&lt;/a&gt; is run to generate &lt;a href=&#34;https://wragge.github.io/slv-allmaps/dashboard.html&#34;&gt;a dashboard&lt;/a&gt; that provides an overview of the project&amp;rsquo;s progress.&lt;/p&gt;
&lt;figure&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/geo-dashboard.png&#34; width=&#34;600&#34; height=&#34;616&#34; alt=&#34;&#34;&gt;&lt;figcaption&gt;The project dashboard is updated every two hours&lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;One of the Allmaps developers described all my plumbing and workarounds as a &amp;lsquo;very cool lofi example of how you can set this up with little means&amp;rsquo;, and I think that&amp;rsquo;s pretty apt. It&amp;rsquo;s really just an experiment to demonstrate the possibilities, but by connecting up existing services it&amp;rsquo;s generating real data of long-term value.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Me at 63...</title>
      <link>https://updates.timsherratt.org/2025/11/03/me-at.html</link>
      <pubDate>Mon, 03 Nov 2025 17:55:17 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/11/03/me-at.html</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published in Pharos, newsletter of the Professional Historian&amp;rsquo;s Association (Vic &amp;amp; Tas), October-November 2025, in the &amp;lsquo;Member Profile&amp;rsquo; section.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;what-was-your-first-history-related-job-what-path-have-you-taken-since-then&#34;&gt;What was your first history related job? What path have you taken since then?&lt;/h2&gt;
&lt;p&gt;In the early 1990s I started working for a small self-funded organisation called the Australian Science Archives Project. Our mission was to preserve and raise awareness of Australia&amp;rsquo;s scientific past. When the web came along, we realised it provided an enormous opportunity to communicate history to the public. So I taught myself web development and created the first archives website in Australia. Since then my work has continued to explore what happens when we release GLAM collections into online spaces where people can see and use them differently.&lt;/p&gt;
&lt;h2 id=&#34;what-kind-of-work-have-you-done-what-are-you-working-on-now&#34;&gt;What kind of work have you done? What are you working on now?&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve had a range of jobs in the GLAM and university sectors. While &amp;lsquo;history&amp;rsquo; wasn&amp;rsquo;t often in my job title, I&amp;rsquo;ve always regarded myself as a historian first – whether I was coding, editing, writing, teaching, or managing, history was always the frame through which I understood my work. At the same time, I&amp;rsquo;ve maintained my own independent practice as a &amp;lsquo;historian and hacker&amp;rsquo;, developing tools and resources for other researchers, such as the &lt;a href=&#34;https://glam-workbench.net&#34;&gt;GLAM Workbench&lt;/a&gt;. Much of this work is unfunded, but by sharing it openly I&amp;rsquo;ve created new opportunities for collaboration. For example, I&amp;rsquo;m currently the &amp;lsquo;Creative Technologist-in-Residence&amp;rsquo; at the State Library of Victoria, bringing my years of GLAM hacking to bear on the Library&amp;rsquo;s place-based collections.&lt;/p&gt;
&lt;h2 id=&#34;research-or-writing-what-do-you-enjoy-more-and-why&#34;&gt;Research or writing? (What do you enjoy more and why?)&lt;/h2&gt;
&lt;p&gt;Researching, or writing, or coding, or teaching, or outreaching (what is the correct verb?) – all have their joys and travails. For me, research is less about finding things in archives and libraries, and more about &lt;em&gt;how&lt;/em&gt; we find things in archives and libraries. I poke about in online collections to try and understand how they work, what they reveal, and what they hide. This often leads to the development of new tools, the writing of  documentation and blog posts, and sometimes even real, published articles. It&amp;rsquo;s a process that has consumed my life, for better or worse. Coding often slips into obsession when I have a gnarly problem to crack. Writing is a slog, but there&amp;rsquo;s nothing like the pleasure of a finely-turned sentence. Teaching is exhausting, but also exhilarating when you see the light bulb of understanding flick on.&lt;/p&gt;
&lt;h2 id=&#34;what-are-the-best-and-hardest-things-about-the-kind-of-work-you-do&#34;&gt;What are the best and hardest things about the kind of work you do?&lt;/h2&gt;
&lt;p&gt;The best thing, the absolute hands-down best thing, is hearing from people who use, or have benefited from the tools and resources that I&amp;rsquo;ve created. I make things to help researchers see and use GLAM collections in new ways, so finding out what they&amp;rsquo;ve been doing with my stuff always provides a much-needed jolt of inspiration.&lt;/p&gt;
&lt;p&gt;However, the flip side is that getting information about my tools and resources out to the people who might benefit most is hard and often frustrating work. I churn away in the social media mines, but people and organisations seem much more reluctant to share new work these days. There was a time (yeah, the good old days) when GLAM organisations actively engaged with researchers online, sharing the cool things people were doing with their collections. But not now. We all learn through the generosity of others, and I think it&amp;rsquo;s important that we find ways to support and enlarge the realm of generosity.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Creating bounding boxes for parish maps in the SLV collection</title>
      <link>https://updates.timsherratt.org/2025/10/06/creating-bounding-boxes-for-parish.html</link>
      <pubDate>Mon, 06 Oct 2025 15:17:51 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/10/06/creating-bounding-boxes-for-parish.html</guid>
      <description>&lt;p&gt;The State Library of Victoria holds a collection of &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/search?query=series,exact,Parish%20maps%20of%20Victoria&amp;amp;vid=61SLV_INST:SLV&amp;amp;offset=0&#34;&gt;8,804 parish maps&lt;/a&gt;. As part of my residency at the SLV LAB, I&amp;rsquo;ve been poking around in the metadata.&lt;/p&gt;
&lt;p&gt;SLV staff have geocoded many of the parish maps using the &lt;a href=&#34;https://placenames.fsdf.org.au&#34;&gt;Composite Gazetteer of Australia&lt;/a&gt;, which provides coordinates for Victorian parishes and boroughs. These coordinates give us a point which should be roughly at the centre of each map, enabling us to visualise their locations and distribution. But how much area do they cover? To answer that question we need a bounding box that includes the coordinates of each corner of the map. We could create bounding boxes by using something like &lt;a href=&#34;https://allmaps.org&#34;&gt;Allmaps&lt;/a&gt; or &lt;a href=&#34;https://www.mapwarper.net&#34;&gt;MapWarper&lt;/a&gt; to georeference each individual map, but that&amp;rsquo;s going to take a while! As a quick and dirty alternative, I wondered if it was possible to generate approximate bounding boxes from the available metadata. It seems we can!&lt;/p&gt;
&lt;h2 id=&#34;the-metadata&#34;&gt;The metadata&lt;/h2&gt;
&lt;p&gt;There are three pieces of metadata we need to construct bounding boxes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the latitude and longitude of the centre point&lt;/li&gt;
&lt;li&gt;the size of the physical map&lt;/li&gt;
&lt;li&gt;the scale of the map (ie how the size of the map relates to the real world)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The coordinates and scale can be included in a couple of different places in the map&amp;rsquo;s MARC record. The &lt;a href=&#34;https://www.loc.gov/marc/bibliographic/bd034.html&#34;&gt;&lt;code&gt;034&lt;/code&gt;&lt;/a&gt; field is specifically for &amp;lsquo;Coded Cartographic Mathematical Data&amp;rsquo;. The relevant subfields are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;$a&lt;/code&gt;: category of scale&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$b&lt;/code&gt;: constant ratio linear horizontal scale (this is the most likely type of scale)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$d&lt;/code&gt;: westernmost longitude&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$e&lt;/code&gt;: easternmost longitude&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$f&lt;/code&gt;: northernmost latitude&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$g&lt;/code&gt;: southernmost latitude&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the coordinates describe a point rather than a bounding box, then &lt;code&gt;$d&lt;/code&gt; and &lt;code&gt;$e&lt;/code&gt; will be the same, and &lt;code&gt;$f&lt;/code&gt; and &lt;code&gt;$g&lt;/code&gt; will be the same.&lt;/p&gt;
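&lt;p&gt;That check is easy to express in code – a sketch, assuming the subfields have been parsed into a dict (the function name is mine):&lt;/p&gt;

```python
def describes_point(sub):
    """True if the 034 subfields record a single point rather than a bounding box.

    `sub` maps subfield codes to their raw values.
    """
    return sub.get("d") == sub.get("e") and sub.get("f") == sub.get("g")
```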
&lt;p&gt;String representations of coordinates and scale can be found in the &lt;code&gt;255&lt;/code&gt; field. The relevant subfields are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;$a&lt;/code&gt;: statement of scale, eg &lt;code&gt;Scale [ca. 1:90,000].&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$c&lt;/code&gt;: statement of coordinates, eg &lt;code&gt;(E 142°18&#39;/S 37°33&#39;)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The size of the map is recorded in the &lt;a href=&#34;https://www.loc.gov/marc/bibliographic/bd300.html&#34;&gt;&lt;code&gt;300&lt;/code&gt;&lt;/a&gt; (physical description) field under the &lt;code&gt;$c&lt;/code&gt; (dimensions) subfield. For example: &lt;code&gt;on sheet 40 x 51 cm &lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;the-method&#34;&gt;The method&lt;/h2&gt;
&lt;p&gt;I started with an existing dataset downloaded from the catalogue by SLV staff. This dataset included the scale and coordinate information in the &lt;code&gt;034&lt;/code&gt; field, and the coordinate string in &lt;code&gt;255$c&lt;/code&gt;. At first I didn&amp;rsquo;t realise that the &lt;code&gt;034&lt;/code&gt; held geo data, so I separately downloaded the scale information from &lt;code&gt;255$a&lt;/code&gt; in each item&amp;rsquo;s MARC record (d&amp;rsquo;oh). If the maps were digitised, I also wanted their image identifiers so I could access them through the SLV&amp;rsquo;s IIIF service. The image id from the &lt;code&gt;956$e&lt;/code&gt; field of the MARC record can be used to construct an IIIF manifest url, so I extracted them as well.&lt;/p&gt;
&lt;p&gt;Once I had all the catalogue data, I had to make sure everything was in a format I could work with. The coordinates in the MARC records are recorded as degrees/minutes/seconds, so I had to convert them to decimal values. The scale factor needed to be an integer, and I needed to extract the height and width as integers from the dimensions field.&lt;/p&gt;
&lt;p&gt;I used &lt;a href=&#34;https://pypi.org/project/lat-lon-parser/&#34;&gt;lat_lon_parser&lt;/a&gt; to convert the coordinates to decimal, but needed a bit of regex string manipulation to get the values into a format that could be parsed. Regex also came to the rescue in getting the map dimensions. All the details are &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/process_parish_maps.ipynb&#34;&gt;in this notebook&lt;/a&gt;.&lt;/p&gt;
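&lt;p&gt;As an illustration of the two cleanup steps (not the exact regexes or the lat_lon_parser-based code from the notebook), a simple parser might look like this:&lt;/p&gt;

```python
import re

def dms_to_decimal(coord):
    """Convert a coordinate string like "S 37°33'" to decimal degrees
    (an illustrative parser, not the one used in the notebook)."""
    m = re.match(r"([NSEW])\s*(\d+)°(?:(\d+)')?", coord)
    hemi, degrees, minutes = m.group(1), int(m.group(2)), int(m.group(3) or 0)
    value = degrees + minutes / 60
    # South and west are negative in decimal notation
    return -value if hemi in "SW" else value

def sheet_dimensions(text):
    """Extract height and width in cm from a 300$c value like "on sheet 40 x 51 cm"."""
    m = re.search(r"(\d+)\s*x\s*(\d+)\s*cm", text)
    return int(m.group(1)), int(m.group(2))
```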
&lt;h3 id=&#34;creating-bounding-boxes&#34;&gt;Creating bounding boxes&lt;/h3&gt;
&lt;p&gt;After some searching I found &lt;a href=&#34;https://stackoverflow.com/a/76910048&#34;&gt;this StackOverflow comment&lt;/a&gt; that described how to create a bounding box from a point, distance, and bearing. The point I already had, but the distance and bearing had to be calculated. Trigonometry to the rescue!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-box-trig.png&#34; width=&#34;600&#34; height=&#34;334&#34; alt=&#34;&#34;&gt;
&lt;p&gt;The distance from the point at the centre of the box to one of its corners is the hypotenuse of a right-angled triangle whose sides are equal to half the width and half the height of the map, and thanks to Pythagoras we know:&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-10-06-15-19-42.png&#34; width=&#34;579&#34; height=&#34;93&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Once I had the distance in cm, I converted to inches, then multiplied by the scale factor, and finally converted the inches to miles. (It now occurs to me that there&amp;rsquo;s no need to convert to imperial measurements, but it doesn&amp;rsquo;t make any difference either way.)&lt;/p&gt;
&lt;p&gt;The bearing that points to the corner of the box is the angle inside the same right-angled triangle, so it can be calculated using:&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-10-06-15-19-29.png&#34; width=&#34;407&#34; height=&#34;73&#34; alt=&#34;&#34;&gt;
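&lt;p&gt;Putting the two formulas together – a sketch of the calculation, not the notebook&amp;rsquo;s exact implementation:&lt;/p&gt;

```python
import math

def corner_distance_and_bearing(width_cm, height_cm, scale):
    """Distance (miles) and bearing (degrees from north) from the map's
    centre to its north-east corner, given sheet size and scale factor."""
    half_w, half_h = width_cm / 2, height_cm / 2
    # Pythagoras: hypotenuse from centre to corner, measured on the sheet
    sheet_cm = math.hypot(half_w, half_h)
    # Apply the scale factor, then convert to miles (63360 inches per mile)
    miles = (sheet_cm / 2.54) * scale / 63360
    # Angle between north (the half-height side) and the hypotenuse
    bearing = math.degrees(math.atan(half_w / half_h))
    return miles, bearing
```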
&lt;p&gt;With the point of origin, distance, and bearing I could use &lt;a href=&#34;https://github.com/geopy/geopy&#34;&gt;geopy&lt;/a&gt; to calculate the corners of the bounding box!&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; geopy.distance &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; geodesic

destination &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; geodesic(miles&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;distance)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;destination(origin, bearing)
coords &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; destination&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;longitude, destination&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;latitude
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/process_parish_maps.ipynb&#34;&gt;See this notebook&lt;/a&gt; for the full details.&lt;/p&gt;
&lt;h2 id=&#34;limitations&#34;&gt;Limitations&lt;/h2&gt;
&lt;p&gt;Of course, this method is very rough and has a number of major limitations, in particular:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;only about 38% of the maps have point coordinates&lt;/li&gt;
&lt;li&gt;the point values don&amp;rsquo;t necessarily locate the centre of the map&lt;/li&gt;
&lt;li&gt;not all the maps are oriented towards north&lt;/li&gt;
&lt;li&gt;sometimes a parish includes multiple maps&lt;/li&gt;
&lt;li&gt;the size of the margin around the map will affect the accuracy of the bounding box&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But despite these problems the results seem pretty good. To test this I &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_browser.ipynb&#34;&gt;created a notebook&lt;/a&gt; to overlay the digitised maps on a modern basemap using the bounding boxes. Here&amp;rsquo;s an example.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-overlay.png&#34; width=&#34;600&#34; height=&#34;471&#34; alt=&#34;Screenshot of a parish map of French Island overlaid on a modern basemap. The parish map is slightly offset to the north, but you can see that the size matches the modern map fairly well&#34;&gt;
&lt;p&gt;You can see the map is slightly offset (presumably due to the second problem listed above). But the size seems about right. Certainly good enough to use the bounding boxes in some exploratory analyses!&lt;/p&gt;
&lt;h2 id=&#34;visualising-the-results&#34;&gt;Visualising the results&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve saved the processed data as a &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_final.csv&#34;&gt;new dataset&lt;/a&gt;, and started playing around with a couple of ways of visualising the results. These are experiments, not discovery interfaces. But you can use them for a bit of exploration if you don&amp;rsquo;t mind a few bugs. They&amp;rsquo;re all in Jupyter notebooks that can be run &lt;a href=&#34;https://mybinder.org/v2/gh/StateLibraryVictoria-SLVLAB/geo-maps-residency/HEAD&#34;&gt;using the Binder service&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_browser.ipynb&#34;&gt;parish maps browser&lt;/a&gt; includes a dropdown list of parish maps with point coordinates.  Select a map and:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;if there&amp;rsquo;s a bounding box and an image identifier, the image of the parish map will be overlaid on the modern base map using the bounding box coordinates&lt;/li&gt;
&lt;li&gt;if there&amp;rsquo;s a bounding box, but no image identifier, a rectangle will be drawn on the base map showing the dimensions of the bounding box&lt;/li&gt;
&lt;li&gt;if there are point coordinates, but no bounding box, a marker will be placed on the base map&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-browser.png&#34; width=&#34;600&#34; height=&#34;429&#34; alt=&#34;Screenshot of a parish map of Mallacoota overlaid on a modern basemap. The opacity of the digitised map has been reduced making it easier to see how the two maps align. A popup is visible on the map, listing the basic metadata and including a link to the SLV catalogue.&#34;&gt;
&lt;p&gt;If the image of the map is displayed you can use the slider to adjust the opacity. Clicking on either the image, rectangle, or marker will display metadata about the parish map and a link to the SLV catalogue.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also a &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency/blob/main/parish_maps_visualise_bounds.ipynb&#34;&gt;visualisation of all the bounding boxes&lt;/a&gt; overlaid on a modern base map.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/parish-maps-bounds.png&#34; width=&#34;600&#34; height=&#34;474&#34; alt=&#34;Screenshot of a modern digital map of Victoria overlaid with 3,000+ transparent blue rectangles, showing the bounds of parish maps. A couple of the maps seem to be in Bass Strait.&#34;&gt;
&lt;p&gt;As you move your mouse over the bounding boxes the titles are displayed on the map, and if you click on a bounding box the metadata is displayed beneath the map, including a link to the SLV catalogue.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s obvious from the image above that some of the coordinates must be wrong! Visualisation is a great way of finding problems with your data. I now need to work through the results, documenting the problems, and thinking about how to make best use of the data. More to come!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Exploring SLV urls</title>
      <link>https://updates.timsherratt.org/2025/09/23/exploring-slv-urls.html</link>
      <pubDate>Tue, 23 Sep 2025 17:22:45 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/09/23/exploring-slv-urls.html</guid>
      <description>&lt;p&gt;I like urls. They take you places. And if you know how to read them, they can tell you things about the systems that created them.
One of the first things I did when I started &lt;a href=&#34;https://updates.timsherratt.org/2025/09/22/creative-technologistinresidence-at-the-state.html&#34;&gt;my residency at SLV LAB&lt;/a&gt; was to try and understand how their collection urls work. There are a couple of well-worn methods I use when digging into a new site.&lt;/p&gt;
&lt;p&gt;The first is url hacking – this involves fiddling around with the parameters in a url and submitting the result to see what happens. The Trove Data Guide includes &lt;a href=&#34;https://tdg.glam-workbench.net/understanding-search/search-hacks.html&#34;&gt;some examples of hacking Trove urls&lt;/a&gt; to change the delivery of search results.&lt;/p&gt;
&lt;p&gt;The second method involves opening up the developer console in your web browser and watching the activity in the network tab as you click on links. This tells you where the information that gets loaded into your browser actually comes from – sometimes exposing handy urls that you can use to shortcut access to useful data.&lt;/p&gt;
&lt;h2 id=&#34;permalinks&#34;&gt;Permalinks&lt;/h2&gt;
&lt;p&gt;The SLV uses Primo for its public-facing catalogue, as well as other systems such as Rosetta and IIIF to deliver digitised content. I&amp;rsquo;d noticed that &lt;a href=&#34;https://www.zotero.org&#34;&gt;Zotero&lt;/a&gt; gets some useful data from the catalogue using the default &amp;lsquo;Primo 2018&amp;rsquo; translator; however, important things like the item url aren&amp;rsquo;t captured. The problem is that Primo&amp;rsquo;s &amp;lsquo;permalinks&amp;rsquo; are generated as required by a browser click – they&amp;rsquo;re not embedded anywhere on the page. This makes it hard for Zotero to grab them. So I started wondering how Zotero could construct short, persistent(ish) links to items.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a link to an item in Primo: &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;context=L&amp;amp;docid=alma9941325055707636&#34;&gt;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;context=L&amp;amp;docid=alma9941325055707636&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It looks pretty long and messy, but if you start deleting parameters and resubmitting, you&amp;rsquo;ll find that only two parameters are essential, &lt;code&gt;vid&lt;/code&gt; and &lt;code&gt;docid&lt;/code&gt;. This means we can rewrite the url as: &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;docid=alma9941325055707636&#34;&gt;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;docid=alma9941325055707636&lt;/a&gt; Much nicer.&lt;/p&gt;
&lt;p&gt;The &amp;lsquo;permalink&amp;rsquo; for the same item is: &lt;a href=&#34;https://find.slv.vic.gov.au/permalink/61SLV_INST/1sev8ar/alma9941325055707636&#34;&gt;https://find.slv.vic.gov.au/permalink/61SLV_INST/1sev8ar/alma9941325055707636&lt;/a&gt; If you look closely at the url path and compare it to the example above you&amp;rsquo;ll see the path is constructed from &lt;code&gt;/vid/[some other id]/docid&lt;/code&gt;. One of the librarians explained to me that the other identifier in the permalink is an encoding of the view type, but given that the &amp;lsquo;fulldisplay&amp;rsquo; view is the default, we don&amp;rsquo;t really need it. So the shortened url seems fine for use in Zotero and is easy to generate from the current url. Nice.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also worth noting that the &lt;code&gt;vid&lt;/code&gt; value doesn&amp;rsquo;t seem to change, so to construct catalogue urls in your code, all you really need is the ALMA identifier that&amp;rsquo;s in the &lt;code&gt;docid&lt;/code&gt; parameter.&lt;/p&gt;
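&lt;p&gt;So a shortened catalogue url can be built from the Alma identifier alone – a minimal sketch (the function name is mine):&lt;/p&gt;

```python
from urllib.parse import urlencode

def slv_catalogue_url(alma_id):
    """Build a shortened Primo item url from an Alma identifier."""
    params = urlencode({"vid": "61SLV_INST:SLV", "docid": alma_id})
    return "https://find.slv.vic.gov.au/discovery/fulldisplay?" + params
```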
&lt;h2 id=&#34;structured-data&#34;&gt;Structured data&lt;/h2&gt;
&lt;p&gt;Item pages in Primo include a link labelled &amp;lsquo;Display source record&amp;rsquo;. If you click on this you&amp;rsquo;re taken to a representation of the item&amp;rsquo;s metadata in MARC. Here&amp;rsquo;s what the urls look like: &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/sourceRecord?vid=61SLV_INST%3ASLV&amp;amp;docId=alma9941325055707636&amp;amp;recordOwner=61SLV_INST&#34;&gt;https://find.slv.vic.gov.au/discovery/sourceRecord?vid=61SLV_INST%3ASLV&amp;amp;docId=alma9941325055707636&amp;amp;recordOwner=61SLV_INST&lt;/a&gt; Notice that the &amp;lsquo;fulldisplay&amp;rsquo; in the url path above has changed to &amp;lsquo;sourceRecord&amp;rsquo;. There&amp;rsquo;s also a new &lt;code&gt;recordOwner&lt;/code&gt; parameter, but it seems you can delete this and still get the same result.&lt;/p&gt;
&lt;p&gt;Having access to the MARC record is handy, because it delivers the metadata in a simple, structured plain text format. But while the &amp;lsquo;source record&amp;rsquo; page looks like a plain text file, it&amp;rsquo;s actually an HTML page that embeds a plain text record. If you open up the network tab of your browser&amp;rsquo;s developer console and reload the &amp;lsquo;source record&amp;rsquo; page, you&amp;rsquo;ll see a different url is loaded under the hood: &lt;a href=&#34;https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma9941325055707636&amp;amp;vid=61SLV_INST:SLV&amp;amp;recordOwner=61SLV_INST&amp;amp;lang=en&#34;&gt;https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma9941325055707636&amp;amp;vid=61SLV_INST:SLV&amp;amp;recordOwner=61SLV_INST&amp;amp;lang=en&lt;/a&gt; See how the url path has changed from &lt;code&gt;/discovery/&lt;/code&gt; to &lt;code&gt;/primaws/rest/pub&lt;/code&gt;? This url &lt;em&gt;does&lt;/em&gt; deliver a plain text version of the MARC record.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-09-23-15-21-05.png&#34; width=&#34;600&#34; height=&#34;215&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Once you have the plain text version you can parse the contents to extract the structured data. There are probably tools that can do this automatically, but it&amp;rsquo;s also pretty easy using regular expressions. Here&amp;rsquo;s an example of some code I used to parse map records.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;get_marc_value&lt;/span&gt;(marc, tag, subfield):
    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    Gets the value of a tag/subfield from a text version of an item&amp;#39;s MARC record.
&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;    &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;try&lt;/span&gt;:
        tag &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;search(&lt;span style=&#34;color:#e6db74&#34;&gt;rf&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;^&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;{&lt;/span&gt;tag&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;\t.+&amp;#34;&lt;/span&gt;, marc, re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;M)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;group(&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
        subfield &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;search(&lt;span style=&#34;color:#e6db74&#34;&gt;rf&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;\$&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;{&lt;/span&gt;subfield&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;([^\$]+)&amp;#34;&lt;/span&gt;, tag)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;group(&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
    &lt;span style=&#34;color:#66d9ef&#34;&gt;except&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;AttributeError&lt;/span&gt;:
        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;None&lt;/span&gt;
    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; subfield&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;strip(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34; .,&amp;#34;&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can also access a JSON representation of the record by adding the parameter &lt;code&gt;&amp;amp;showPnx=true&lt;/code&gt; to the catalogue url: &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;context=L&amp;amp;docid=alma9941325055707636&amp;amp;showPnx=true&#34;&gt;https://find.slv.vic.gov.au/discovery/fulldisplay?vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;context=L&amp;amp;docid=alma9941325055707636&amp;amp;showPnx=true&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Once again, this is a JSON representation embedded in a web page. Using the same developer console trick, you can see that the direct url is: &lt;a href=&#34;https://find.slv.vic.gov.au/primaws/rest/pub/pnxs/L/alma9941325055707636?vid=61SLV_INST:SLV&amp;amp;lang=en&amp;amp;search_scope=slv_local&amp;amp;showPnx=true&amp;amp;lang=en&#34;&gt;https://find.slv.vic.gov.au/primaws/rest/pub/pnxs/L/alma9941325055707636?vid=61SLV_INST:SLV&amp;amp;lang=en&amp;amp;search_scope=slv_local&amp;amp;showPnx=true&amp;amp;lang=en&lt;/a&gt; You should be able to parse the response from this url as JSON and use it in your code. I think the Zotero translator makes use of this &lt;code&gt;pnx&lt;/code&gt; data.&lt;/p&gt;
&lt;p&gt;If you want to download the MARC or JSON representations in your code, all you really need is the &lt;code&gt;alma&lt;/code&gt; identifier. Just use it to construct one of the direct urls, such as this: &lt;a href=&#34;https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma9941325055707636&amp;amp;vid=61SLV_INST:SLV&#34;&gt;https://find.slv.vic.gov.au/primaws/rest/pub/sourceRecord?docId=alma9941325055707636&amp;amp;vid=61SLV_INST:SLV&lt;/a&gt; The &lt;code&gt;recordOwner&lt;/code&gt; and &lt;code&gt;lang&lt;/code&gt; parameters are not needed, and the &lt;code&gt;vid&lt;/code&gt; parameter doesn&amp;rsquo;t change.&lt;/p&gt;
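&lt;p&gt;A couple of Python helpers that assemble these direct urls (a sketch only; the parameter trimming follows the observations above, so check the results against a live record):&lt;/p&gt;

```python
# Base path for the 'under the hood' REST urls identified above.
PRIMAWS = "https://find.slv.vic.gov.au/primaws/rest/pub"

def marc_url(alma_id: str) -> str:
    """Direct url for the plain text MARC record."""
    return f"{PRIMAWS}/sourceRecord?docId={alma_id}&vid=61SLV_INST:SLV"

def pnx_url(alma_id: str) -> str:
    """Direct url for the JSON (pnx) representation."""
    return (
        f"{PRIMAWS}/pnxs/L/{alma_id}"
        "?vid=61SLV_INST:SLV&search_scope=slv_local&showPnx=true"
    )
```

The urls can then be fetched with any HTTP client and, in the JSON case, parsed with `json.loads()`.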
&lt;p&gt;Librarians using Primo have documented a number of tricks like this and &lt;a href=&#34;https://igelu.org/products-and-initiatives/product-working-groups/primo/special-projects/primo-community-support-primo-useful-bookmarklets/&#34;&gt;shared handy bookmarklets&lt;/a&gt; to rewrite urls and get catalogue data in different forms.&lt;/p&gt;
&lt;h2 id=&#34;iiif-and-images&#34;&gt;IIIF and images&lt;/h2&gt;
&lt;p&gt;SLV delivers digitised images using &lt;a href=&#34;https://iiif.io&#34;&gt;IIIF&lt;/a&gt;. The IIIF manifest urls are not directly exposed through the web interface, but you can construct your own.&lt;/p&gt;
&lt;p&gt;IIIF manifest urls look like this: &lt;a href=&#34;https://rosetta.slv.vic.gov.au/delivery/iiif/presentation/2.1/IE24074939/manifest.json&#34;&gt;https://rosetta.slv.vic.gov.au/delivery/iiif/presentation/2.1/IE24074939/manifest.json&lt;/a&gt; All we need to construct them is the &lt;code&gt;IE&lt;/code&gt; identifier, in this case &lt;code&gt;IE24074939&lt;/code&gt;. But where do you find this identifier?&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re looking at an image in the SLV&amp;rsquo;s image viewer, the url will be something like this: &lt;a href=&#34;https://viewer.slv.vic.gov.au/?entity=IE24074939&amp;amp;mode=browse&#34;&gt;https://viewer.slv.vic.gov.au/?entity=IE24074939&amp;amp;mode=browse&lt;/a&gt; Yep, the &lt;code&gt;IE&lt;/code&gt; identifier is right there in the url. Just extract it from the viewer url, and plug it into the manifest url!&lt;/p&gt;
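&lt;p&gt;In Python that extraction is just a few lines:&lt;/p&gt;

```python
from urllib.parse import parse_qs, urlparse

def manifest_url(viewer_url: str) -> str:
    """Extract the IE identifier from a viewer url and build the
    corresponding IIIF manifest url."""
    ie_id = parse_qs(urlparse(viewer_url).query)["entity"][0]
    return (
        "https://rosetta.slv.vic.gov.au/delivery/iiif/"
        f"presentation/2.1/{ie_id}/manifest.json"
    )
```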
&lt;p&gt;If you&amp;rsquo;re looking at a catalogue record, or starting with one of the &lt;code&gt;alma&lt;/code&gt; identifiers, you can get the &lt;code&gt;IE&lt;/code&gt; identifier from the &lt;code&gt;956$e&lt;/code&gt; field of the MARC record.&lt;/p&gt;
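&lt;p&gt;Here&amp;rsquo;s a sketch that combines this with the &lt;code&gt;get_marc_value()&lt;/code&gt; function above (repeated so the example is self-contained, and run against an invented sample line rather than a real record):&lt;/p&gt;

```python
import re

# Repeats the MARC parser from above so this example is self-contained.
def get_marc_value(marc, tag, subfield):
    try:
        line = re.search(rf"^{tag}\t.+", marc, re.M).group(0)
        value = re.search(rf"\${subfield}([^\$]+)", line).group(1)
    except AttributeError:
        return None
    return value.strip(" .,")

# An invented sample 956 field, for illustration only; a real record
# will contain many more fields.
marc = "956\t##\t$a10381/4338980 $eIE24074939"

ie_id = get_marc_value(marc, "956", "e")  # "IE24074939"
manifest = (
    "https://rosetta.slv.vic.gov.au/delivery/iiif/"
    f"presentation/2.1/{ie_id}/manifest.json"
)
```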
&lt;p&gt;The IIIF manifest will, in turn, provide identifiers for individual images that can be requested using the standard IIIF syntax.&lt;/p&gt;
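&lt;p&gt;For example, here&amp;rsquo;s a sketch that walks the standard Presentation 2.1 structure (sequences, canvases, images) to collect the image service ids. Check an actual SLV manifest to confirm the layout:&lt;/p&gt;

```python
def image_service_ids(manifest: dict) -> list:
    """Collect image service ids from a IIIF Presentation 2.1 manifest
    (already parsed from JSON into a dict)."""
    ids = []
    for canvas in (manifest.get("sequences") or [{}])[0].get("canvases", []):
        for image in canvas.get("images", []):
            service = image.get("resource", {}).get("service", {})
            if "@id" in service:
                ids.append(service["@id"])
    return ids
```

Each id can then be plugged into the IIIF Image API syntax, something like `{id}/full/full/0/default.jpg`.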
&lt;p&gt;To save myself a bit of fiddling about, I created &lt;a href=&#34;https://gist.github.com/wragge/a37a4db854deffad956abc7bf918f6b0&#34;&gt;a userscript that exposes the IIIF manifest url&lt;/a&gt; within the image viewer. If you install it you&amp;rsquo;ll see something like this:&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-09-23-15-54-09.png&#34; width=&#34;510&#34; height=&#34;229&#34; alt=&#34;&#34;&gt;
&lt;h2 id=&#34;handles&#34;&gt;Handles&lt;/h2&gt;
&lt;p&gt;Links to digitised items sometimes come in the form of &amp;lsquo;handles&amp;rsquo;: &lt;a href=&#34;http://handle.slv.vic.gov.au/10381/4338980&#34;&gt;http://handle.slv.vic.gov.au/10381/4338980&lt;/a&gt; These urls are redirected to the image viewer.&lt;/p&gt;
&lt;p&gt;If you want to construct one of these handles, the identifier can be found in the &lt;code&gt;956$a&lt;/code&gt; field of the MARC record.&lt;/p&gt;
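&lt;p&gt;Constructing the handle url is then trivial (assuming the &lt;code&gt;956$a&lt;/code&gt; value is a bare identifier like &lt;code&gt;10381/4338980&lt;/code&gt;; if the field holds a full url you can use it directly):&lt;/p&gt;

```python
def handle_url(identifier: str) -> str:
    """Build a handle url from the identifier in the 956$a field."""
    return f"http://handle.slv.vic.gov.au/{identifier}"
```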
&lt;h2 id=&#34;from-old-to-new&#34;&gt;From old to new&lt;/h2&gt;
&lt;p&gt;I was looking at the datasets created about 8 years ago in the &lt;a href=&#34;https://github.com/statelibraryvic/opendata&#34;&gt;SLV open data repository&lt;/a&gt; and noticed they included urls from the previous catalogue. Fortunately, the old urls redirect to the new system.&lt;/p&gt;
&lt;p&gt;For example, this url: &lt;a href=&#34;http://search.slv.vic.gov.au/MAIN:Everything:SLV_VOYAGER1842440&#34;&gt;http://search.slv.vic.gov.au/MAIN:Everything:SLV_VOYAGER1842440&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Redirects to: &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/fulldisplay?context=L&amp;amp;vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;docid=alma9918424403607636&#34;&gt;https://find.slv.vic.gov.au/discovery/fulldisplay?context=L&amp;amp;vid=61SLV_INST:SLV&amp;amp;search_scope=slv_local&amp;amp;tab=searchProfile&amp;amp;docid=alma9918424403607636&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you look closely at the urls you&amp;rsquo;ll see that the identifier from the old system is embedded in the new identifier: &lt;code&gt;1842440&lt;/code&gt; is in &lt;code&gt;9918424403607636&lt;/code&gt; – &lt;code&gt;99_1842440_3607636&lt;/code&gt;. This means if you have a lot of old urls, such as in the open datasets, you can easily rewrite them in your code.&lt;/p&gt;
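&lt;p&gt;Here&amp;rsquo;s a sketch of that rewriting. It assumes the &lt;code&gt;99&lt;/code&gt;/&lt;code&gt;3607636&lt;/code&gt; wrapper observed above holds for every &lt;code&gt;SLV_VOYAGER&lt;/code&gt; identifier, which is worth spot-checking first:&lt;/p&gt;

```python
import re

def rewrite_old_url(old_url):
    """Rewrite an old search.slv.vic.gov.au url as a new catalogue url.

    Assumes every new identifier wraps the old Voyager identifier in
    '99' and '3607636', as in the example above.
    """
    match = re.search(r"SLV_VOYAGER(\d+)", old_url)
    if not match:
        return None
    return (
        "https://find.slv.vic.gov.au/discovery/fulldisplay"
        "?context=L&vid=61SLV_INST:SLV&search_scope=slv_local"
        f"&tab=searchProfile&docid=alma99{match.group(1)}3607636"
    )
```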
&lt;h2 id=&#34;the-process-of-glam-hacking&#34;&gt;The process of GLAM hacking&lt;/h2&gt;
&lt;p&gt;No doubt a lot of this is well-known to librarians, and there are probably many subtleties or complexities that my poking about has missed. But I wanted to document the process as much as the results – to give an idea of what I do when I approach a new GLAM collection online. I suppose this is GLAM hacking 101.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Creative Technologist-in-Residence at the State Library of Victoria!</title>
      <link>https://updates.timsherratt.org/2025/09/22/creative-technologistinresidence-at-the-state.html</link>
      <pubDate>Tue, 23 Sep 2025 00:14:09 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/09/22/creative-technologistinresidence-at-the-state.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;m very excited to be the new &lt;a href=&#34;https://lab.slv.vic.gov.au/residencies-opportunities&#34;&gt;Creative Technologist-in-Residence at the SLV LAB&lt;/a&gt;. For the next few months I get to play around with metadata and images, think about online access, experiment with different technologies, and build things to help people to explore the State Library&amp;rsquo;s collections. In other words, I get to be in my happy place!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/2025-09-22-11.36.59.jpg&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;My group at &lt;a href=&#34;https://updates.timsherratt.org/2025/08/29/wikifest-at-the-state-library.html&#34;&gt;the recent SLV WikiFest&lt;/a&gt; was thinking about ways of helping researchers find resources relating to particular locations – how do I find material about my suburb, or my street? Coincidentally, the main focus of my residency will also be place-based collections, so I get to really think through some of the possibilities. SLV staff have already pointed me to some amazing maps and photographs, such as the &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/collectionDiscovery?vid=61SLV_INST:SLV&amp;amp;collectionId=81271917420007636&#34;&gt;Committee for Urban Action collection&lt;/a&gt;, the &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/search?query=any,contains,mahlstedt%20melbourne&amp;amp;tab=searchProfile&amp;amp;search_scope=slv_local&amp;amp;vid=61SLV_INST:SLV&amp;amp;facet=tlevel,include,online_resources&amp;amp;offset=0&#34;&gt;Mahlstedt fire survey maps&lt;/a&gt;, the &lt;a href=&#34;https://guides.slv.vic.gov.au/MMBWplans&#34;&gt;MMBW plans&lt;/a&gt;, and the &lt;a href=&#34;https://find.slv.vic.gov.au/discovery/search?query=series,exact,Parish%20maps%20of%20Victoria&amp;amp;vid=61SLV_INST:SLV&amp;amp;offset=0&#34;&gt;Victorian parish maps&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;At the same time, I&amp;rsquo;ll be using my usual GLAM hacking approach to poke around in the SLV website to try and understand what data is currently available, identify any roadblocks, and document opportunities for computational research.&lt;/p&gt;
&lt;p&gt;The results of my residency will be shared on the &lt;a href=&#34;https://lab.slv.vic.gov.au&#34;&gt;SLV LAB site&lt;/a&gt;, in &lt;a href=&#34;https://github.com/StateLibraryVictoria-SLVLAB/geo-maps-residency&#34;&gt;GitHub&lt;/a&gt;, in the &lt;a href=&#34;https://glam-workbench.net/state-library-victoria/&#34;&gt;SLV section of the GLAM Workbench&lt;/a&gt;, and of course here. As usual, I&amp;rsquo;ll be working in the open, documenting things as I go along, so please join me on the journey!&lt;/p&gt;
&lt;p&gt;Although the residency was formally announced today, I&amp;rsquo;ve actually been working with SLV data for the last couple of weeks and I&amp;rsquo;ve already got a backlog of stuff I need to blog about. Here&amp;rsquo;s a taster – what happens when you generate bounding boxes for thousands of parish maps from the available metadata and throw them on a map…?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-09-22-23-08-25.png&#34; width=&#34;600&#34; height=&#34;406&#34; alt=&#34;&#34;&gt;
</description>
    </item>
    
    <item>
      <title>WikiFest at the State Library of Victoria</title>
      <link>https://updates.timsherratt.org/2025/08/29/wikifest-at-the-state-library.html</link>
      <pubDate>Fri, 29 Aug 2025 16:07:26 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/08/29/wikifest-at-the-state-library.html</guid>
      <description>&lt;p&gt;This week I was lucky enough to participate in WikiFest at the State Library of Victoria. Organised by the &lt;a href=&#34;https://lab.slv.vic.gov.au&#34;&gt;State Library&amp;rsquo;s new innovation LAB&lt;/a&gt; and &lt;a href=&#34;https://wikimedia.org.au&#34;&gt;Wikimedia Australia&lt;/a&gt;, Wikifest was a hands-on, participant-led workshop focused on the possibilities of connecting SLV&amp;rsquo;s collections to (and through!) Wikidata.&lt;/p&gt;
&lt;p&gt;The day kicked off with a series of presentations demonstrating possible uses of Wikidata. I talked a bit about some of my recent GLAM/Wikidata experiments. My &lt;a href=&#34;https://slides.com/wragge/wikifest-slv-2025&#34;&gt;slides are online&lt;/a&gt; and contain plenty of links to code, demonstrations, and documentation. They&amp;rsquo;re openly-licensed, so feel free to take anything of use.&lt;/p&gt;
&lt;iframe src=&#34;https://slides.com/wragge/wikifest-slv-2025/embed&#34; width=&#34;100%&#34; height=&#34;500&#34; title=&#34;WikiFest SLV 2025&#34; scrolling=&#34;no&#34; frameborder=&#34;0&#34; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;The rest of the day was spent in groups, working on particular projects and learning more about Wikidata in the process. My group was looking at providing place-based entry points to SLV collections, and spent a lot of time exploring the representation of Victoria&amp;rsquo;s &lt;a href=&#34;https://query.wikidata.org/embed.html#%23Country%20populations%20together%20with%20total%20city%20populations%0ASELECT%20%3Flga%20%3FlgaLabel%20%3FstartDate%20%3FendDate%20%3Fpoint%20%7B%0A%20%20%3Flga%20wdt%3AP31%20wd%3AQ30129411%20%3B%0A%20%20%20%20%20%20%20wdt%3AP131%20wd%3AQ36687.%0A%20%20%3Flga%20p%3AP625%20%3Fcoordinate.%0A%20%20%3Fcoordinate%20ps%3AP625%20%3Fpoint.%0A%20%20OPTIONAL%20%7B%3Flga%20wdt%3AP571%20%3FstartDate.%7D%0A%20%20OPTIONAL%20%7B%3Flga%20wdt%3AP576%20%3FendDate.%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cmul%2Cen%22%20%7D%0A%7D&#34;&gt;Local Government Areas (LGAs) in Wikidata&lt;/a&gt;. We realised there was quite a bit of work to do in adding things like dates and boundaries, but we could see some exciting future possibilities. We also made a start, adding an &amp;lsquo;inception&amp;rsquo; date for the &lt;a href=&#34;https://www.wikidata.org/wiki/Q5123821&#34;&gt;City of Moe&lt;/a&gt;, based on the Victorian Government Gazette, &lt;a href=&#34;https://gazette.slv.vic.gov.au&#34;&gt;digitised by the SLV&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-08-29-14-45-09.png&#34; width=&#34;600&#34; height=&#34;271&#34; alt=&#34;Screen capture from Wikidata showing the inception property for the City of Moe&#34;&gt;
&lt;h2 id=&#34;bonus-userscript&#34;&gt;Bonus userscript&lt;/h2&gt;
&lt;p&gt;While I was preparing my presentation I was thinking about the way entries for Australian people in Wikidata are linked to a range of different identifiers, such as DAAO, the Encyclopedia of Australian Science, and the Australian Dictionary of Biography (ADB). Often a single person can have multiple identifiers and this means that those identifiers themselves become connected through that person&amp;rsquo;s record. You can query Wikidata with one identifier, and get back links to a range of other information sources about that person.&lt;/p&gt;
&lt;p&gt;To demonstrate this, I created &lt;a href=&#34;https://gist.github.com/wragge/40f66af72c400b2563f95bda60e713dd&#34;&gt;a simple userscript&lt;/a&gt; that adds additional links to biographies in the ADB. The script grabs the ADB identifier from the url, queries Wikidata for additional identifiers, and writes the results into the page&amp;rsquo;s &amp;lsquo;Life Summary&amp;rsquo;. Basic, but useful!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/adb-userscript-example.png&#34; width=&#34;600&#34; height=&#34;253&#34; alt=&#34;Screenshot from the ADB showing the related links from Wikidata added to the Life Summary of Margaret Baskerville&#34;&gt;
&lt;p&gt;For something more advanced, have a look at the &lt;a href=&#34;https://addons.mozilla.org/en-US/firefox/addon/entity-explosion/&#34;&gt;Entity Explosion extension&lt;/a&gt; for Firefox.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>GLAM hacking with userscripts</title>
      <link>https://updates.timsherratt.org/2025/07/17/glam-hacking-with-userscripts.html</link>
      <pubDate>Thu, 17 Jul 2025 18:21:25 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/07/17/glam-hacking-with-userscripts.html</guid>
      <description>&lt;p&gt;In teaching and workshops I used to get students to question the idea that websites are &amp;lsquo;published&amp;rsquo;. They&amp;rsquo;re not released into the world in a fixed, immutable form – they&amp;rsquo;re a set of blueprints which only reach their final form in your browser window. This makes it possible to change the way websites look and behave.&lt;/p&gt;
&lt;p&gt;Mozilla used to have a nifty educational tool called X-Ray Goggles. Using it, you could explore the code underlying a web page and do fun things like inserting new text or images. I encouraged students to try hacking ASIO&amp;rsquo;s home page.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/asio-eggplants.jpg&#34; width=&#34;600&#34; height=&#34;629&#34; alt=&#34;Old, modified screenshot of ASIO homepage with a section of a cartoon from First Dog On the Moon inserted.&#34;&gt;
&lt;p&gt;&lt;em&gt;ASIO home page with some added First Dog on the Moon.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;There are other ways you can fiddle with websites. For example, most browsers have a developer console that exposes the code and styling of a page. You can use the console to edit HTML elements or toggle styles, but your changes won&amp;rsquo;t be saved.&lt;/p&gt;
&lt;p&gt;One way you can save and share your web site customisations is by creating userscripts. Userscripts are little bits of Javascript code that run in your browser after a web page loads. These scripts can change many aspects of a page – not just how it looks, but also how it works.&lt;/p&gt;
&lt;h2 id=&#34;some-old-userscripts&#34;&gt;Some old userscripts&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve been playing around with userscripts for a long time. Back in 2008, I created a userscript that &lt;a href=&#34;https://discontents.com.au/shoebox/archives-shoebox/archives-in-3d.html&#34;&gt;completely overhauled the way that digital files were presented&lt;/a&gt; in the National Archives of Australia&amp;rsquo;s online database, RecordSearch. My userscript added new options for navigating and printing the file, and even made it possible to view the complete file contents on a 3D zoomable wall.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/userscript-screenshot1.jpg&#34; width=&#34;600&#34; height=&#34;577&#34; alt=&#34;Screenshot of a digitised file in RecordSearch showing the features added by the userscript.&#34;&gt;
&lt;p&gt;&lt;em&gt;This customised RecordSearch interface was created by a userscript.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The userscripts I&amp;rsquo;ve created over the years have tended to either be useful little hacks aimed at fixing annoying aspects of GLAM websites, or experiments in thinking about the sort of information that&amp;rsquo;s presented online by GLAM organisations, and how it might be different.&lt;/p&gt;
&lt;p&gt;In the first category are hacks like my &lt;a href=&#34;https://gist.github.com/wragge/b2af9dc56f7cb0a9476b&#34;&gt;RecordSearch show pages userscript&lt;/a&gt;. In 2009, I got annoyed that there was no way of knowing how many pages were in a digitised file until you clicked on the link. &lt;a href=&#34;https://discontents.com.au/doing-it-yourself/index.html&#34;&gt;So I fixed it.&lt;/a&gt; With my userscript running, the links to digitised files are rewritten to display the number of pages. I&amp;rsquo;ve updated the code numerous times over the years, adding new features, and dealing with changes to RecordSearch. The last update was just a few days ago.&lt;/p&gt;
&lt;p&gt;In the second category is my userscript that inserts photos from &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/&#34;&gt;The Real Face of White Australia&lt;/a&gt; into RecordSearch. There are many thousands of records in the National Archives of Australia that document the impact of the White Australia Policy on the lives of ordinary people. But it&amp;rsquo;s often hard to understand this from the file descriptions. The userscript displays portrait images extracted from the files alongside the metadata – it tells you there are &lt;a href=&#34;https://doi.org/10.5281/zenodo.3579530&#34;&gt;people inside&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/people-inside-list.gif&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;Animated gif showing how the userscript changes the display of a list of files in RecordSearch by adding pictures of people.&#34;&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/people-inside-item.gif&#34; width=&#34;600&#34; height=&#34;412&#34; alt=&#34;Animated gif showing how the userscript changes the display of an individual files in RecordSearch by adding pictures of the people inside.&#34;&gt;
&lt;p&gt;Amidst &lt;a href=&#34;https://updates.timsherratt.org/2025/07/09/the-rebirth-of-wragge-labs.html&#34;&gt;my recent self-archiving binge&lt;/a&gt;, I realised I&amp;rsquo;d never updated this userscript to work with the latest data from The Real Face of White Australia, so I spent some time getting it working again. In the process I realised that RecordSearch now included content security policies that made it a bit harder to insert new images. The solution was to use one of the special userscript functions, &lt;a href=&#34;https://www.tampermonkey.net/documentation.php?locale=en#api:GM_addElement&#34;&gt;GM_addElement()&lt;/a&gt;, rather than plain old Javascript. But then I discovered that if the show pages userscript ran after this one, it would trigger the security restrictions nonetheless! To avoid this I made sure that the two userscripts operated on separate elements. So now the &lt;a href=&#34;https://gist.github.com/wragge/2941e473ee70152f4de7&#34;&gt;show people userscript&lt;/a&gt; is working again!&lt;/p&gt;
&lt;h2 id=&#34;and-a-new-userscript-to-improve-trove-lists&#34;&gt;And a new userscript to improve Trove lists&lt;/h2&gt;
&lt;p&gt;Fixing up the &amp;lsquo;people inside&amp;rsquo; code reminded me of how much fun it was playing around with userscripts, so when David Coombe mentioned a problem he had using Trove lists on Mastodon last night, I had to have a go at fixing it.&lt;/p&gt;
&lt;p&gt;The problem is that Trove lists display all the tags associated with each individual item. Some items have lots of tags, so this eats up the screen real estate, making it harder to browse the contents of a list. Notes attached to items can be hidden, but not tags. Why not?&lt;/p&gt;
&lt;p&gt;My &lt;a href=&#34;https://gist.github.com/wragge/ab6a9d6b612bee6bc4d98658e947c420&#34;&gt;brand new userscript&lt;/a&gt; hides tags by default, and adds a new link to toggle their visibility for each individual item. The link also displays the number of tags attached to each item. This gives the user control over which tags are displayed and when.&lt;/p&gt;
&lt;p&gt;&lt;video src=&#34;https://cdn.uploads.micro.blog/8371/2025/simplescreenrecorder-2025-07-17-12.53.06.mp4&#34; poster=&#34;https://updates.timsherratt.org/uploads/2025/poster.png&#34; controls=&#34;controls&#34; preload=&#34;metadata&#34; width=&#34;600px&#34;&gt;&lt;/video&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The new userscript in action – toggle your tags!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The main difficulty in creating this userscript was knowing when the page had actually finished loading. The current version of Trove uses a lot of Javascript to load and manipulate content, so you have to tell the userscript to wait until everything has settled down. Otherwise the script could fire too soon and cause unexpected results. I tried a number of different approaches to handling this problem, but eventually settled on the &lt;a href=&#34;https://gist.github.com/BrockA/2625891&#34;&gt;waitForKeyElements script&lt;/a&gt;. (I just realised there&amp;rsquo;s a &lt;a href=&#34;https://github.com/CoeJoder/waitForKeyElements.js&#34;&gt;more recent version&lt;/a&gt; of this script that doesn&amp;rsquo;t require jQuery, so I might need to investigate this further.)&lt;/p&gt;
&lt;p&gt;Another Trove problem fixed!&lt;/p&gt;
&lt;h2 id=&#34;using-userscripts&#34;&gt;Using userscripts&lt;/h2&gt;
&lt;p&gt;In addition to the userscripts mentioned above, I&amp;rsquo;ve also created one that &lt;a href=&#34;https://gist.github.com/wragge/af8bd20a14005d267ffc759463bd832c&#34;&gt;enables you to browse Trove newspaper pages using the arrows on your keyboard&lt;/a&gt;. Left and right arrows go to the next and previous pages, while up and down arrows jump between issues. Searching is great, but sometimes you just want to browse. Install this userscript for that old-time, authentic newspaper reading experience!&lt;/p&gt;
&lt;p&gt;But how do you install userscripts? First of all you need a browser extension to manage your userscripts – I use &lt;a href=&#34;https://www.tampermonkey.net/&#34;&gt;TamperMonkey&lt;/a&gt; or &lt;a href=&#34;http://violentmonkey.com/&#34;&gt;ViolentMonkey&lt;/a&gt;. Just follow the instructions to add one of them to your browser.&lt;/p&gt;
&lt;p&gt;To install one of my userscripts, you need to go to the script (saved as a GitHub Gist) and click on the &amp;lsquo;Raw&amp;rsquo; button. Your userscript manager will then ask you if you want to add the userscript. Click install!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-07-17-16-59-11.png&#34; width=&#34;600&#34; height=&#34;244&#34; alt=&#34;&#34;&gt;
&lt;p&gt;&lt;em&gt;Click on the &amp;lsquo;Raw&amp;rsquo; button to install.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Once they&amp;rsquo;re installed the userscripts will run automatically when specified pages are loaded. If you ever want to disable them, you can do that from your userscript manager&amp;rsquo;s dashboard.&lt;/p&gt;
&lt;p&gt;For convenience, here are the Gist links to all the userscripts I&amp;rsquo;ve mentioned:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://gist.github.com/wragge/b2af9dc56f7cb0a9476b&#34;&gt;RecordSearch show pages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://gist.github.com/wragge/2941e473ee70152f4de7&#34;&gt;RecordSearch show people&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://gist.github.com/wragge/ab6a9d6b612bee6bc4d98658e947c420&#34;&gt;Trove lists hide tags&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://gist.github.com/wragge/af8bd20a14005d267ffc759463bd832c&#34;&gt;Trove newspapers keyboard navigation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As with anything you install on your computer, you want to make sure that you trust the source of any userscripts you add.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The rebirth of Wragge Labs (and moving my Heroku apps)</title>
      <link>https://updates.timsherratt.org/2025/07/09/the-rebirth-of-wragge-labs.html</link>
      <pubDate>Wed, 09 Jul 2025 17:48:23 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/07/09/the-rebirth-of-wragge-labs.html</guid>
      <description>&lt;p&gt;It looks like some paid work I was counting on won&amp;rsquo;t be going ahead, so I&amp;rsquo;m trying to save a bit of money on cloud hosting. As I previously noted, this resulted in &lt;a href=&#34;https://updates.timsherratt.org/2025/07/02/the-future-of-the-past.html&#34;&gt;the resurrection of &lt;em&gt;The future of the past&lt;/em&gt;&lt;/a&gt;, but I&amp;rsquo;ve also been continuing to slog away at migrating all my old Flask apps and experiments from Heroku to a single Digital Ocean droplet. As of today, I&amp;rsquo;ve migrated 11 apps. Here&amp;rsquo;s a few details&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;a-new-old-home&#34;&gt;A new (old) home&lt;/h2&gt;
&lt;p&gt;The first thing I had to figure out was how to group together a series of individual &lt;a href=&#34;https://flask.palletsprojects.com/en/stable/&#34;&gt;Flask&lt;/a&gt; apps so I could easily run and maintain them on a single server, without making major changes to the apps themselves. I decided to go with the &lt;a href=&#34;https://flask.palletsprojects.com/en/stable/patterns/appdispatch/&#34;&gt;application dispatching pattern&lt;/a&gt; described in the Flask documentation. This groups the apps within a single Python environment, so I had to do some alignment of Python versions and packages, but it wasn&amp;rsquo;t too hard and having just one virtual environment to manage seems a lot easier in the long run.&lt;/p&gt;
&lt;p&gt;The application dispatching pattern configures the server to run one application at the web root (&#39;/&#39;), with the other apps assigned individual sub-paths. This raised the question, what did I want sitting at the root address? Rather than selecting an existing application for the prime slot, I decided to take the opportunity to build a showcase that included details of many of the things I&amp;rsquo;ve created over the years.&lt;/p&gt;
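&lt;p&gt;The Flask documentation implements this pattern with werkzeug&amp;rsquo;s &lt;code&gt;DispatcherMiddleware&lt;/code&gt;. The basic idea, sketched here with plain WSGI callables and no dependencies, is just prefix matching (the mount paths shown are invented examples):&lt;/p&gt;

```python
class Dispatcher:
    """Route requests to sub-apps by url prefix, falling back to the
    app mounted at the web root. A stdlib-only sketch of the pattern
    that DispatcherMiddleware implements properly."""

    def __init__(self, root_app, mounts):
        self.root_app = root_app
        self.mounts = mounts  # e.g. {"/headlines": headlines_app}

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in self.mounts.items():
            if path.startswith(prefix):
                # Shift the prefix from PATH_INFO to SCRIPT_NAME so the
                # sub-app can build its urls correctly.
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        return self.root_app(environ, start_response)
```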
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/wraggelabs.png&#34; width=&#34;600&#34; height=&#34;720&#34; alt=&#34;Screenshot of the original Wragge Labs&#34;&gt;
&lt;p&gt;&lt;em&gt;The old Wragge Labs (circa 2012)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I also needed a new domain name. Or did I? Back in the old days, I had a site where I shared many of my tools and experiments – Wragge Labs. In the intervening years, I&amp;rsquo;d moved or migrated much of the content away and pointed the wraggelabs.com domain to my main site at timsherratt.au. But this seemed like a good opportunity to resurrect it. So if you&amp;rsquo;d like to have a play around with some of the things I&amp;rsquo;ve created over the last 30 years, head along to the all new &lt;a href=&#34;https://wraggelabs.com&#34;&gt;wraggelabs.com&lt;/a&gt;!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/new-wraggelabs.png&#34; width=&#34;600&#34; height=&#34;569&#34; alt=&#34;Screenshot of part of the new Wragge Labs!&#34;&gt;
&lt;p&gt;&lt;em&gt;The new &lt;a href=&#34;https://wraggelabs.com&#34;&gt;Wragge Labs&lt;/a&gt; showcases websites, apps, and experiments from the past 30 years – some useful, some playful, and some creepy&amp;hellip;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A surprising number of the things I&amp;rsquo;ve built are still working. As I was compiling my list, I made a few running repairs – for example, fixing broken links in &lt;a href=&#34;https://timsherratt.au/shed/culturevic/&#34;&gt;Linking history in place&lt;/a&gt; and &lt;a href=&#34;https://timsherratt.au/shed/magicsquares/&#34;&gt;Magic Squares&lt;/a&gt; to get them working again. However, some things only exist now in web archives, and others have been broken by the recent actions of the &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;NLA&lt;/a&gt; and &lt;a href=&#34;https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html&#34;&gt;NAA&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To provide a bit of extra context, I&amp;rsquo;ve grouped together publications and presentations documenting many of the experiments. These are saved in &lt;a href=&#34;https://www.zotero.org/&#34;&gt;Zotero&lt;/a&gt;, and tagged with the name of the app. When you click on a &amp;lsquo;Related&amp;rsquo; link, the details of any linked resources are retrieved using the Zotero API and displayed on a new page. This means I can add new related resources simply by dropping them into Zotero.&lt;/p&gt;
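&lt;p&gt;Retrieving the tagged items is a single call to the Zotero web API. A rough sketch (the user ID is a placeholder and the helper names are mine, but the endpoint and parameters follow the public v3 API):&lt;/p&gt;

```python
# Sketch of fetching 'related' resources by tag from the Zotero web API.
import json
import urllib.parse
import urllib.request

ZOTERO_API = "https://api.zotero.org"
USER_ID = "1234567"  # placeholder library ID

def tag_items_url(tag, limit=25):
    """Build the API URL listing items that carry a given tag."""
    params = urllib.parse.urlencode({"tag": tag, "format": "json", "limit": limit})
    return f"{ZOTERO_API}/users/{USER_ID}/items?{params}"

def fetch_related(tag):
    """Retrieve and decode the items tagged with an app's name."""
    with urllib.request.urlopen(tag_items_url(tag)) as response:
        return json.load(response)
```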
&lt;h2 id=&#34;moving-house&#34;&gt;Moving house&lt;/h2&gt;
&lt;p&gt;The process for moving the apps to their new home was pretty straightforward. On my local machine, I copied the code into the new aggregated structure, added any packages needed into a combined requirements file, and created a new top-level app to direct requests. And then I spun everything up and started fixing bugs&amp;hellip;&lt;/p&gt;
&lt;p&gt;All of the problems were easily resolved. Most involved fixing up paths to static assets or in navigation links. The only significant changes to the Python code were caused by the deprecation of the &lt;code&gt;.count()&lt;/code&gt; method in PyMongo.&lt;/p&gt;
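&lt;p&gt;For anyone hitting the same change: &lt;code&gt;Cursor.count()&lt;/code&gt; was removed in PyMongo 4, and &lt;code&gt;count_documents()&lt;/code&gt; on the collection is the replacement. A small compatibility shim (illustrative, not my actual code) looks like this:&lt;/p&gt;

```python
# Count matching documents under both old and new PyMongo APIs.
def count_matching(collection, query):
    if hasattr(collection, "count_documents"):
        return collection.count_documents(query)  # PyMongo 3.7 and later
    return collection.find(query).count()         # removed in PyMongo 4
```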
&lt;p&gt;To make life a little harder, I decided to take the opportunity to make sure that all the assets – JavaScript, CSS, and font files – were loaded from the local system, and not sitting in the cloud. Having everything local should make it easier to maintain the apps in the long term. It was a bit fiddly tracking down where everything was being loaded from, but not too hard.&lt;/p&gt;
&lt;p&gt;The only other changes I made were to add some caching to most of the apps, particularly those that make calls to external databases or APIs. I used &lt;a href=&#34;https://flask-caching.readthedocs.io/en/latest/index.html&#34;&gt;Flask-Caching&lt;/a&gt; with the local file system backend.&lt;/p&gt;
&lt;p&gt;To get the new aggregated application working on a Digital Ocean droplet, I followed the instructions on how to &lt;a href=&#34;https://www.digitalocean.com/community/tutorials/how-to-serve-flask-applications-with-uwsgi-and-nginx-on-ubuntu-22-04&#34;&gt;serve Flask applications using uWSGI and Nginx&lt;/a&gt;. I think the only thing I did differently was to use &lt;a href=&#34;https://github.com/pyenv/pyenv&#34;&gt;pyenv&lt;/a&gt; to manage Python versions and the virtual environment. To update the app, I use &lt;code&gt;rsync&lt;/code&gt; to copy across the code and &lt;code&gt;systemctl&lt;/code&gt; to restart it. So far it&amp;rsquo;s all working pretty smoothly.&lt;/p&gt;
&lt;h2 id=&#34;redirecting-heroku&#34;&gt;Redirecting Heroku&lt;/h2&gt;
&lt;p&gt;Once the apps were happy in their new home, I needed to redirect the Heroku addresses to &lt;a href=&#34;https://wraggelabs.com&#34;&gt;wraggelabs.com&lt;/a&gt;. It was surprisingly hard to find good documentation on how to do this, so I&amp;rsquo;ll describe my steps in detail in case it&amp;rsquo;s of use to others.&lt;/p&gt;
&lt;p&gt;There are a few redirect apps for Heroku around, but I decided to use &lt;a href=&#34;https://github.com/fastmonkeys/heroku-redirect&#34;&gt;heroku-redirect&lt;/a&gt; because it basically just configures and runs Nginx without any additional processing. First I cloned &lt;code&gt;heroku-redirect&lt;/code&gt; to my local system, and then for each app I wanted to migrate I followed these steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;cd into the &lt;code&gt;heroku-redirect&lt;/code&gt; directory&lt;/li&gt;
&lt;li&gt;set the git remote for the app you want to redirect: &lt;code&gt;heroku git:remote -a [app name]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;if you&amp;rsquo;re not using the latest Heroku stack, update it: &lt;code&gt;heroku stack:set heroku-24 -a [app name]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;set the url you want to redirect to (without trailing slash): &lt;code&gt;heroku config:set LOCATION=[new url]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;to add the full path to the redirected url: &lt;code&gt;heroku config:set PRESERVE_PATH=true&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;I found I had to remove the Python buildpack from the app before updating. I did this using the Heroku dashboard, but no doubt there&amp;rsquo;s also a CLI command&lt;/li&gt;
&lt;li&gt;I also used the dashboard to add a new nginx buildpack: &lt;code&gt;https://github.com/heroku/heroku-buildpack-nginx.git&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;finally I pushed the new app, using &lt;code&gt;--force&lt;/code&gt; to replace it completely: &lt;code&gt;git push --force heroku master:main&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, you&amp;rsquo;ll need to have the Heroku CLI installed.&lt;/p&gt;
&lt;p&gt;A number of my Heroku apps were using basic dynos (which cost US$7 a month), so once they were redirected, I changed them to use shared eco dynos. Yay – money saved! Hopefully, the redirects won&amp;rsquo;t push the eco dynos beyond their monthly limit.&lt;/p&gt;
&lt;h2 id=&#34;more-experiments-to-come&#34;&gt;More experiments to come?&lt;/h2&gt;
&lt;p&gt;One of the good things about all of this housekeeping is that it&amp;rsquo;s got me thinking about new experiments. I used to love Flask and Heroku because they made it so easy to build and share things. Now I can do the same with &lt;a href=&#34;https://wraggelabs.com&#34;&gt;wraggelabs.com&lt;/a&gt;!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The future of the past... in the present</title>
      <link>https://updates.timsherratt.org/2025/07/02/the-future-of-the-past.html</link>
      <pubDate>Wed, 02 Jul 2025 13:26:45 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/07/02/the-future-of-the-past.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been on a bit of a self-archiving binge lately. It started because I needed to cut back some of my web hosting costs, and was looking at ways of bringing together a group of separately hosted Heroku apps onto a single Digital Ocean droplet. While taking stock of my various apps and experiments, I remembered there were some that hadn&amp;rsquo;t survived earlier migrations – in particular, &lt;em&gt;the future of the past&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The future of the past&lt;/em&gt; was a weird little app built on top of a collection of 40,000 newspaper articles, harvested from Trove, that included the phrase &amp;lsquo;the future&amp;rsquo;. I created it as part of my Harold White Fellowship at the National Library of Australia in 2012, and told the story of its genesis in &lt;a href=&#34;https://updates.timsherratt.org/2025/06/30/mining-for-meanings.html&#34;&gt;my fellowship lecture&lt;/a&gt;. In short, I extracted words with the highest TF-IDF values for each year in my dataset, and fell in love with them. The word groupings were so odd and evocative that I felt I had to find some way of sharing them.&lt;/p&gt;
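&lt;p&gt;The TF-IDF step boils down to treating each year&amp;rsquo;s harvested text as one document and ranking words by how distinctive they are of that year. A from-scratch sketch of the idea (not the original 2012 code):&lt;/p&gt;

```python
# Surface the most year-distinctive words: high frequency within a year,
# low frequency across the whole collection.
import math
from collections import Counter

def top_tfidf_words(texts_by_year, n=5):
    """Return the n highest TF-IDF words for each year."""
    tf = {year: Counter(text.lower().split()) for year, text in texts_by_year.items()}
    n_docs = len(tf)
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # document frequency: years containing the word
    top = {}
    for year, counts in tf.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
        top[year] = sorted(scores, key=scores.get, reverse=True)[:n]
    return top
```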
&lt;p&gt;&lt;em&gt;The future of the past&lt;/em&gt; made words the primary means of navigating the collection of newspaper articles. At first you were presented with a random selection of words, sized according to their TF-IDF values. When you clicked on a word, you limited the results to years in which that word appeared. You kept clicking words until only one year matched. Then you were shown a random selection of words from that year, along with the words you&amp;rsquo;d followed to get to that point. Once you&amp;rsquo;d arrived at a year, you could click on words to display the content of articles that contained that word. But you could also make poetry.&lt;/p&gt;
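&lt;p&gt;Under the hood, that navigation logic is really just set intersection: each clicked word narrows the candidate years to those in which it appears. A sketch (the data structures here are illustrative, not the app&amp;rsquo;s internals):&lt;/p&gt;

```python
# Narrow the candidate years by intersecting the year sets of clicked words.
def refine_years(years_by_word, clicked_words):
    """Return the years in which every clicked word appears."""
    candidates = None
    for word in clicked_words:
        years = years_by_word.get(word, set())
        candidates = years if candidates is None else candidates.intersection(years)
    return candidates or set()
```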
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/fotp.png&#34; width=&#34;600&#34; height=&#34;347&#34; alt=&#34;Screen capture from the original instance of the future of the past, showing a jumble of words of different sizes in light coloured rectangles. The caption invites users to &#39;choose a word...&#39;.&#34;&gt;
&lt;p&gt;&lt;em&gt;The future of the past&lt;/em&gt; gestured towards fridge magnet poetry in its design and odd jumble of words. And when you finally landed on a single year, you could create &lt;em&gt;your own poems&lt;/em&gt; by dragging words into the box at the bottom of the screen. Once you were happy with your poem you could share it on Twitter. And people did.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screen-shot-2012-10-11-at-7.20.13-pm.png&#34; width=&#34;537&#34; height=&#34;458&#34; alt=&#34;Examples of poems created by Bethany Nowviskie and shared on Twitter.&#34;&gt;
&lt;p&gt;The most exciting and enjoyable part of the project was watching people create and share their poems. &lt;em&gt;The future of the past&lt;/em&gt; even managed to win the &amp;lsquo;Best use of DH for fun&amp;rsquo; in the 2012 DH Awards. As I said in my &lt;a href=&#34;https://updates.timsherratt.org/2025/06/30/mining-for-meanings.html&#34;&gt;Harold White lecture&lt;/a&gt;, I wasn&amp;rsquo;t really sure what &lt;em&gt;the future of the past&lt;/em&gt; was – a discovery interface? a game? a piece of art? But I suppose that&amp;rsquo;s one of the reasons why I liked it so much.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/fridge-poetry.png&#34; width=&#34;565&#34; height=&#34;746&#34; alt=&#34;More examples of poems created using future of the past and shared on Twitter.&#34;&gt;
&lt;p&gt;When I built the app, I was going through a stage of putting everything in Django. Only later did I realise that Flask was a much more suitable framework for the sort of small, experimental apps I was creating. Django was overkill, and the maintenance demands coupled with hosting issues made it difficult to keep things alive. At some point, &lt;em&gt;the future of the past&lt;/em&gt; went dark and it just seemed too hard to get it going again&amp;hellip;&lt;/p&gt;
&lt;p&gt;But last week I had another look, and decided I could resurrect the app in a more maintenance-friendly form by converting it from Django to Flask, and migrating the data from MySQL to SQLite. Django and Flask are both Python frameworks, so it was mainly a matter of unpacking all the logic in Django&amp;rsquo;s views, models, and handlers and consolidating it into a couple of simple Flask functions. Fortunately, I managed to find an SQL dump of the original database in the backed-up downloads folder of an old laptop. It took a bit of fiddling, but I got the dumped data loaded into SQLite without too many problems.&lt;/p&gt;
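&lt;p&gt;The fiddling mostly comes down to stripping MySQL-isms from the dump before feeding it to SQLite. A rough sketch of the idea (real dumps need more massaging than this – it handles only backtick quoting, table options, and &lt;code&gt;AUTO_INCREMENT&lt;/code&gt;):&lt;/p&gt;

```python
# Clean up a MySQL dump just enough for SQLite to swallow it.
import re
import sqlite3

def mysql_dump_to_sqlite(dump_sql, db_path=":memory:"):
    sql = dump_sql.replace("`", '"')                   # identifier quoting
    sql = re.sub(r"\)\s*ENGINE=\w+[^;]*;", ");", sql)  # drop MySQL table options
    sql = re.sub(r"\bAUTO_INCREMENT\b", "", sql)       # SQLite handles rowids itself
    conn = sqlite3.connect(db_path)
    conn.executescript(sql)
    return conn
```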
&lt;p&gt;I also realised I could use the new database to fix up another app I created during my fellowship – &lt;a href=&#34;https://timsherratt.au/shed/presentations/nla/pages/frequencies.html&#34;&gt;a word frequency browser&lt;/a&gt;. It&amp;rsquo;s just a static HTML page, so I added a couple of JSON APIs to the Flask app so it could access the data.&lt;/p&gt;
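&lt;p&gt;Each of those APIs is just a Flask route that queries the SQLite database and returns JSON. A minimal sketch (the route, filename, and column names are illustrative):&lt;/p&gt;

```python
# A small JSON endpoint so a static HTML page can fetch word data.
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)
DB_PATH = "fotp.db"  # illustrative filename

@app.route("/api/words")
def words_for_year():
    """Return the words (and TF-IDF scores) recorded for a given year."""
    year = request.args.get("year", type=int)
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row  # rows become dict-like
    rows = conn.execute(
        "SELECT word, tfidf FROM words WHERE year = ?", (year,)
    ).fetchall()
    return jsonify([dict(row) for row in rows])
```

&lt;p&gt;The static page can then request something like &lt;code&gt;/api/words?year=1913&lt;/code&gt;.&lt;/p&gt;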
&lt;p&gt;Both Django and Flask use Jinja2 templates, so I didn&amp;rsquo;t have to do anything much to the interface of &lt;em&gt;the future of the past&lt;/em&gt;. I made sure that all the assets (fonts and javascript) were being loaded from local copies to avoid any future problems and, of course, I had to replace the Twitter integration. I decided to add options to share poems on both Mastodon and Bluesky. Mastodon was a little tricky because you need to know a user&amp;rsquo;s instance before you can post their toot. There are a number of solutions available, but I went with the &lt;a href=&#34;https://github.com/autinerd/simple-mastodon-share-button&#34;&gt;pattern documented in this GitHub repository&lt;/a&gt;. It&amp;rsquo;s a little clunky because you need to enter your instance name each time you post, and you might also have to allow pop-ups for it to work properly, but it seems to do the job. I did think about updating some other aspects of the interface, but decided to preserve it in its original 2012 grey-toned glory.&lt;/p&gt;
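&lt;p&gt;The sharing buttons boil down to building a URL. Bluesky has a central compose intent, while Mastodon needs the user&amp;rsquo;s own instance first – hence the clunkiness. A sketch (treat the exact URL shapes as assumptions based on the pattern linked above, not a spec):&lt;/p&gt;

```python
# Build share links for a poem. Mastodon URLs are assembled per-instance;
# Bluesky uses a single compose intent URL.
import urllib.parse

def mastodon_share_url(instance, text):
    return f"https://{instance}/share?text=" + urllib.parse.quote(text)

def bluesky_share_url(text):
    return "https://bsky.app/intent/compose?text=" + urllib.parse.quote(text)
```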
&lt;p&gt;&lt;strong&gt;So &lt;a href=&#34;https://wraggelabs.com/fotp/&#34;&gt;the future of the past&lt;/a&gt; lives again in the present!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://wraggelabs.com/fotp/&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-07-02-11-45-36.png&#34; width=&#34;600&#34; height=&#34;419&#34; alt=&#34;Screenshot of the current future of the past interface, including options to share poems on Mastodon and Bluesky.&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;https://wraggelabs.com/fotp/&#34;&gt;Create your own poems!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Mining for meanings</title>
      <link>https://updates.timsherratt.org/2025/06/30/mining-for-meanings.html</link>
      <pubDate>Mon, 30 Jun 2025 18:36:42 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/06/30/mining-for-meanings.html</guid>
      <description>&lt;p&gt;&lt;em&gt;In 2012, I was lucky enough to be awarded a Harold White Fellowship by the National Library of Australia. I used my time to explore ways of using Trove&amp;rsquo;s digitised newspapers as data, and presented my work at a public lecture in May 2012. I spoke from notes and never got round to writing it all up. The recording made by the NLA has disappeared from their website, but is &lt;a href=&#34;https://web.archive.org/web/20140212200542/http://www.nla.gov.au/podcasts/media/Harold-White/tim-sherratt.mp3&#34;&gt;still available in the Internet Archive&lt;/a&gt;. The text below is a transcription of the recording made in June 2025 with some minor editing.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;You can also listen to the audio, &lt;a href=&#34;https://timsherratt.au/shed/presentations/nla/&#34;&gt;browse the full set of slides&lt;/a&gt;, or &lt;a href=&#34;https://doi.org/10.5281/zenodo.15771695&#34;&gt;download a PDF&lt;/a&gt; from Zenodo.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;audio src=&#34;https://cdn.uploads.micro.blog/8371/2025/tim-sherratt.mp3&#34; controls=&#34;controls&#34; preload=&#34;metadata&#34;&gt;&lt;/audio&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/harold-white-2.jpeg&#34; width=&#34;600&#34; height=&#34;430&#34; alt=&#34;&#34;&gt;
&lt;p&gt;&lt;em&gt;Photograph by Christopher Brothers, 2012, &lt;a href=&#34;https://nla.gov.au/nla.obj-132272018&#34;&gt;nla.gov.au/nla.obj-1&amp;hellip;&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;i-beyond-discovery&#34;&gt;I. Beyond discovery&lt;/h2&gt;
&lt;p&gt;Thanks Marie-Louise and thanks to the library for this great opportunity. And of course thanks to all of you for coming along on a night when I&amp;rsquo;m sure you&amp;rsquo;d rather be at home waiting for the budget speech. And this is the API working away here in the background. Okay, well, do I really need to introduce the newspaper database? I suspect I probably don&amp;rsquo;t for this sort of audience. You&amp;rsquo;re probably avid users of the digitized newspapers online. Are you? Yeah. I did my doctoral research back in the dark ages before Trove, and of course that meant spending many weeks, if not months, destroying my eyesight using microfilm readers. Using what are quite fragmentary printed indexes to try and find stuff which might be relevant to my study. But now of course more than 60 million newspaper articles online and most importantly, really the full text of these articles is searchable. It is something which we&amp;rsquo;re quite familiar with now, but it is something which is quite revolutionary in many ways.&lt;/p&gt;
&lt;p&gt;This unprecedented access to a vast volume of material which documents the ordinary lives of Australians is already changing historical practice. We can now go beyond the well-known events, the big stories and explore the small stories, the fragments, the glimpses of lives which might not otherwise be recorded, but this access comes with a cost. What happens when we do a search and instead of getting 10 results or 100 results, we get 10,000 results or 100,000 results? How do we start to use or understand that sort of thing? What do we do when instead of the clarity and excitement of discovery, we end up with the anxiety and confusion that can come with overwhelming abundance?&lt;/p&gt;
&lt;p&gt;Fortunately though, there are a growing number of digital tools which we can turn to. Tools and technologies which enable us to manage this deluge and to explore large volumes of text rather than sort of single search results. Tools that enable us to zoom out of our search results and have a look at the big picture to understand the trends and the patterns to see what&amp;rsquo;s going on. For example, perhaps we might want to try and track events over time. Have a look for example, this graph shows the prevalence of the words drought and floods in the newspaper database over time.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-003.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;So we can actually look at that and use it as a way where we can map it to the specific events. And we can see here, of course, this is the federation drought at this point. We could also start to look for patterns that aren&amp;rsquo;t easy to see within your sort of normal list of search results.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-004.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;I was interested in having a look at how the word decade might be used. So I searched for decade and I found, as you can see, that there&amp;rsquo;s these nice sort of regular peaks and I was wondering why have we got these regular peaks? And I did a bit more digging and I discovered why. That red line shows the usage of the word census. And you see here how the little peaks sit on top of each other?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-005.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;So obviously the time we talk most about decades is when a census has come out. So again, this is a sort of pattern which would be very hard to find other ways by just sort of working through our list of search results. We can also use these sorts of technologies for exploring changes in language, the way we talk about things, the labels we use.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-006.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;This is an example which I&amp;rsquo;ve taken from the National Library of New Zealand searching through New Zealand newspapers in this case. But what they&amp;rsquo;ve done is to look at the change in usage of the name for the South Island, which was apparently – I didn&amp;rsquo;t know this – originally called Middle Island and changed to the South Island. And so you can see here this sort of process of transition happening before South Island takes over completely.&lt;/p&gt;
&lt;p&gt;We can also challenge our expectations. Now, I was always of the belief that the traditional name for people from English cultural background of that chap who wears a red suit and comes around at Christmas time was Father Christmas. And then in recent years that has been supplanted by the sort of Americanized Santa Claus, but it seems I&amp;rsquo;m wrong.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-007.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;The red line here is Santa Claus, the blue line is Father Christmas. And so if we look here from the late 19th century to the early 20th century, Santa Claus is definitely winning. What&amp;rsquo;s interesting though, really interesting, is when we get the change over. Any guesses as to what&amp;rsquo;s going on there?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; Coke advertising.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Pardon?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; Coca-Cola advertising.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Well actually, I don&amp;rsquo;t know. My hypothesis, what&amp;rsquo;s happening, is this is around sort of 1914 and it seems that over the war period, Father Christmas starts to win over the top of Santa Claus. So whether, I mean this is pure hypothesis at this point, and it&amp;rsquo;s something which would be interesting to explore, whether it&amp;rsquo;s the Germanic sound of Santa Claus, it sort of lapses in popularity or perhaps there are other causes, completely other circumstances. But that&amp;rsquo;s the value of these sorts of things that they do allow you to ask some questions and to prompt you to do some other sorts of investigation.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-008.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Now these graphs which I&amp;rsquo;m showing you, were all created by a tool I developed called &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/querypic/&#34;&gt;QueryPic&lt;/a&gt;. And we won&amp;rsquo;t just show you the slide, we&amp;rsquo;ll actually use it. I want a word. Anybody give a word?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; Brooch.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Broach?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; Because yours is nice.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;m not sure it&amp;rsquo;s going to show anything. Yeah, brooch.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; Know what we&amp;rsquo;ve done &amp;lsquo;automaton&amp;rsquo;, the one we talked about last week.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You have to spell it for me.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; A-U-T-O-M-A-T-O-M. Actually, correct. That&amp;rsquo;s what you&amp;hellip;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;No, no, no, this is right. Okay. So what this is doing now, it&amp;rsquo;s actually going off to the Trove API and it&amp;rsquo;s getting the total results. I mean it&amp;rsquo;s actually a very simple tool. All it&amp;rsquo;s doing is it&amp;rsquo;s taking your query and it&amp;rsquo;s searching for each year across the span of the newspaper database, and it&amp;rsquo;s getting the total number of results for each year and it&amp;rsquo;s then presenting them in the form of the graph. As I say, it&amp;rsquo;s very simple, but it&amp;rsquo;s also quite effective as you can see. And it&amp;rsquo;s useful and it&amp;rsquo;s also quite fun. And what it gives you is the ability to quickly explore a hunch, to get a sort of sense of context or to start exploring, to start framing a more specific research question without spending&amp;hellip; there we go&amp;hellip; without spending days searching or tabulating as you would normally have to do. So you can see how easy it is to use and if you want to actually compare that to something else, you can just type in another word.&lt;/p&gt;
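&lt;p&gt;&lt;em&gt;[In code, the approach described here looks something like this – one search per year, keeping the total result count for each. The fetch function is injected so the Trove API details stay out of the sketch; this is an illustration, not QueryPic&amp;rsquo;s actual code.]&lt;/em&gt;&lt;/p&gt;

```python
# One search per year across the span of the newspaper database,
# recording the total number of results for each year.
def yearly_totals(query, years, fetch_total):
    """Map each year to the total number of matching articles.

    fetch_total(query, year) is expected to run a year-limited search
    and return the total result count reported by the search API.
    """
    return {year: fetch_total(query, year) for year in years}
```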
&lt;p&gt;Okay, now there are obvious limitations to a tool like this. There&amp;rsquo;s a lot to unpack. I wouldn&amp;rsquo;t want to say that it&amp;rsquo;s evidence because there is so many assumptions built into the back end of it. Questions about what the search engine is actually giving you back, different usages of terms, obviously the contexts and things like the quality of the OCR itself. You know there are a whole lot of stuff. But despite all that, I think it is quite useful, as I said, in terms of allowing you to explore things quite quickly and to follow your hunches. I regard it as a starting point, not as an end.&lt;/p&gt;
&lt;p&gt;Now, but there are some folks&amp;hellip; let me see if it&amp;rsquo;s going to finish&amp;hellip; there are some folks who are a bit more confident about techniques such as this and who would suggest that not only can they provide evidence, but they can actually be used to develop mathematical representations of past behavior.&lt;/p&gt;
&lt;h2 id=&#34;ii-finding-formulas&#34;&gt;II. Finding formulas&lt;/h2&gt;
&lt;p&gt;You may have heard of the Culturomics project from Harvard University. These guys got access to the full corpus of Google&amp;rsquo;s digitized books. So 5 million books, the text of 5 million books. They pulled it all apart. They did a bit of cleaning up of the metadata, all sorts of stuff, and then they started searching it to see what they could pull out of it. And when they started searching, they noticed all sorts of patterns appearing and they argued that these patterns could actually form the basis for what they said was a new science of culture, hence culturomics.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-010.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;There&amp;rsquo;s a lot I might say about that, but I just want to look at one example. Okay, this is an example which in their published paper in Science, they called &amp;lsquo;We forget&amp;rsquo;, and I generated it using an online tool called the Ngram Viewer. You can go and do this yourself if you like. And what it&amp;rsquo;s showing as you might be able to see is it&amp;rsquo;s searching for years used within the text. So 1883, 1910, 1950. It&amp;rsquo;s pulling out all the instances where those labels are used within the text, where those terms are used. And there does obviously seem to be some sort of pattern. And the researchers noticed that the graphs have a characteristic shape, obviously rapid ascent and then a decline. But they also noticed changes. Of course, the size of the peaks is changing over time, getting higher. They say that this is indicating a greater focus on the present and the rate of decay is increasing, so that the peak is actually dropping away faster. And they say from this, we are forgetting our past faster with each passing year.&lt;/p&gt;
&lt;p&gt;I thought it would be interesting to repeat this experiment using QueryPic. So I did. It looks a bit different.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-011.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;I mean, before we could interpret this difference, of course, there&amp;rsquo;s a lot that we would want to ask, a lot of, first of all, methodological questions. Again, exactly what are we searching in the two instances and how can we compare the searching, the books in one instance to the newspapers in others - dates obviously play a different role in newspapers than they do in books. But it was actually the conceptual issues, which really struck me in relation to this example and in particular the assumption that we can compare the past, present, and future uses of these labels as if we are talking about the same thing: as if the label 1950 means the same thing before 1950, in 1950 and after 1950. The names for events and periods that we assign, that we share, that we use are themselves the products of historical processes. They slip, they shift, they change.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-012.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;We all know what we mean by the Great Depression. Where&amp;rsquo;s the Great Depression on this graph? So in terms of the usage at the time, the usage of the term &amp;lsquo;Great Depression&amp;rsquo; was actually greater in the 1890s than in the 1930s.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-013.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;We&amp;rsquo;re very familiar with the usage of &amp;lsquo;black&amp;rsquo; days like Black Tuesday. Black Friday, of course, is the one we&amp;rsquo;re most familiar with. And in Australia, these labels are generally attached to bushfires of course, and that&amp;rsquo;s the context where we generally understand them and use them and remember them. And over here, of course we have Black Friday. So what&amp;rsquo;s this big peak here? It&amp;rsquo;s not a bushfire. It refers to the Victorian government&amp;rsquo;s mass sackings of senior civil servants and judges in 1878. Obviously it was an extremely important event at the time, an extremely important event in government in Victoria, but it doesn&amp;rsquo;t quite figure in our collective memory in the same way as Black Friday does.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-014.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;One of my early experiments with QueryPic was to look at the question of when did the Great War become the First World War? At what point did we stop thinking about the Great War as the war to end all wars and realize that it was one in a series of global conflicts? And the graph really does a nice job of confirming our expectations, I suppose, in that we see a nice crossover late in 1941, which if we were thinking about the passage of the war would be about when we probably would expect. But what&amp;rsquo;s missing from this, what&amp;rsquo;s missing of course is just the war.&lt;/p&gt;
&lt;p&gt;Just as with the Great Depression and with Black Wednesday, what&amp;rsquo;s hard is trying to recapture the moment as it was happening, the sense, for want of a better word, of present-ness. Now, if we go back to &amp;lsquo;We forget&amp;rsquo;, what are we doing when we&amp;rsquo;re talking about one of these dates? I mean, if we think of the present as providing a line there – past, present, future – on the past side, what are we doing? We&amp;rsquo;re anticipating. We&amp;rsquo;re predicting. Perhaps, we&amp;rsquo;re dreading. And present, we&amp;rsquo;re experiencing, we&amp;rsquo;re enjoying, maybe suffering. In the future, we are remembering, we are regretting, perhaps reflecting. So instead of lumping all these together, it seems to me that we should be teasing them out and exploring their different interconnections.&lt;/p&gt;
&lt;p&gt;We should be trying to give the past back its own sense of the present. And this in essence was the modest and thoroughly achievable goal of my Harold White Fellowship. I wanted to explore the possibilities of the digitized newspaper collection in supporting this sort of rich temporal contextualization using digital methods to recover the pasts, the presents and the futures of any moment in our history. I have to admit, I haven&amp;rsquo;t got very far yet, and Marie-Louise has been doing a good job of reassuring me that sometimes the fruits of these things take a while to develop. Now, there are a number of reasons why I haven&amp;rsquo;t gotten as far as I wanted, but I do have a few sort of sketches that I want to share with you.&lt;/p&gt;
&lt;h2 id=&#34;iii-the-future-of-the-past&#34;&gt;III. The future of the past&lt;/h2&gt;
&lt;p&gt;Okay. What I decided to do was to try and create a manageable sample set. So I decided to work with articles that included the phrase &amp;lsquo;the future&amp;rsquo; in the heading or the first four lines of the article; that&amp;rsquo;s one of the facets you can use within Trove. Why did I limit it in this way? Well, I&amp;rsquo;ve been doing a lot of different work in Trove, as Marie-Louise said. One project I&amp;rsquo;ve been working on was looking at ways of finding editorials within Trove and exploring the content of editorials over time. And in doing that, I discovered a number of frustrating things, one of which is that sometimes the articles aren&amp;rsquo;t divided up as nicely as we want them to be. Particularly with editorials: editorials on different subjects are often joined together, so it&amp;rsquo;s difficult to separate out the specific ones that you want. But I thought that by limiting my search in this way I would increase my chances of relevance. It also brought the number of matches down to what I thought was a reasonably manageable 60,000 or so.&lt;/p&gt;
&lt;p&gt;So I started harvesting those 60,000 articles. I have over time been developing a number of tools for working with Trove, one of which is a harvester that enables you to get the data in bulk. And of course that&amp;rsquo;s necessary if you&amp;rsquo;re going to do this sort of large-scale analysis. I modified my existing harvesting tools to save the results directly into a database, and when the API became available, I modified them to use the API, which makes a lot of things easier. Now, after about 40,000, I thought I probably had enough, and I decided I&amp;rsquo;d trust in Trove&amp;rsquo;s relevance ranking and just work with that set.&lt;/p&gt;
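&lt;p&gt;The core of a harvester like this is just a pagination loop. Below is a minimal sketch in Python; the &lt;code&gt;fetch&lt;/code&gt; function is a hypothetical stand-in for a real request to the Trove API, which returns results a page at a time, and none of the names here come from the actual harvesting tools.&lt;/p&gt;

```python
def harvest(fetch, page_size=100):
    """Collect every record from a paginated source.

    `fetch(start, page_size)` stands in for a real API request; it
    returns a list of records, empty once the results are exhausted.
    """
    records = []
    start = 0
    while True:
        page = fetch(start, page_size)
        if not page:
            break
        records.extend(page)
        start += page_size
    return records

# Demonstration with an in-memory stub instead of a live API.
def fake_fetch(start, page_size):
    articles = list(range(250))  # pretend these are 250 newspaper articles
    return articles[start:start + page_size]

print(len(harvest(fake_fetch)))  # 250
```

&lt;p&gt;A real harvester would also handle rate limits and save each page to disk as it arrives, so a crash doesn&amp;rsquo;t lose a day&amp;rsquo;s work.&lt;/p&gt;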
&lt;p&gt;And then it was time to do some cleaning. Now, Trove&amp;rsquo;s crowdsourced OCR correction project has been a wonderful success, of course, but it&amp;rsquo;s worth noting that of the sample of articles that I harvested for this project, only 2% had any corrections at all. So 98% were totally uncorrected, totally untouched. While I couldn&amp;rsquo;t hope to correct all of those articles myself, I could at least try to reduce some of the noise created by these sorts of OCR errors. So I developed a series of scripts to clean up some of that OCR output. First of all, they corrected some fairly consistent and hopefully unambiguous OCR errors. And you can test yourself here. What&amp;rsquo;s that meant to be?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-018.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; His.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Pardon?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Audience:&lt;/strong&gt; His.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Nope. The.&lt;/p&gt;
&lt;p&gt;What about this one? Nah, the. This one? No, of. Should get that one. Ah, yep. And you can check. There we go. Look, of, ah, yep. So there are a series of these which I could just fix up with a script. I then checked each word in the text against a series of dictionaries and word lists, including a list of place names extracted from the gazetteer provided by Geoscience Australia. Anything which didn&amp;rsquo;t seem to match up, I marked in a way that let me extract it later if I wanted to. And all of this, you&amp;rsquo;ve got to understand, went through a lot of trial and error: trying stuff out, seeing what it produced, trying it again, fiddling with it. Lots and lots of trial and error.&lt;/p&gt;
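&lt;p&gt;The two cleaning passes described above can be sketched like this. The substitution table and word list here are illustrative only; the real corrections and dictionaries were assembled through all that trial and error.&lt;/p&gt;

```python
import re

# Illustrative table of consistent OCR confusions ('tho' for 'the' is
# a classic in digitised newspapers); the real table was built by hand.
SUBSTITUTIONS = {"tho": "the", "bo": "be", "aud": "and"}

# Stand-in for the dictionaries, word lists and gazetteer of places.
WORD_LIST = {"the", "be", "and", "of", "future", "war"}

def clean(text):
    """Fix consistent OCR errors, then mark unrecognised words."""
    out = []
    for word in re.findall(r"[A-Za-z]+", text):
        word = word.lower()
        word = SUBSTITUTIONS.get(word, word)
        if word not in WORD_LIST:
            word = "[?" + word + "]"  # marked for later extraction
        out.append(word)
    return " ".join(out)

print(clean("Tho future of tho war"))  # the future of the war
```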
&lt;p&gt;But after that, I could do some fun things. You&amp;rsquo;re all of course familiar with word clouds, but I bet you haven&amp;rsquo;t seen a non-word cloud. This is my non-word cloud.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-019.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Now, of course, the big question is what is &amp;lsquo;others&amp;rsquo; doing there? I don&amp;rsquo;t know. For some reason my word list didn&amp;rsquo;t like the word others, but of course you can see here some more sort of consistent OCR errors. There&amp;rsquo;s another &amp;lsquo;the&amp;rsquo; and another &amp;lsquo;the&amp;rsquo;, and that would be a &amp;lsquo;be&amp;rsquo;, in most cases. And we also see where words have been split up. We&amp;rsquo;ve got a &amp;lsquo;tralia&amp;rsquo; down there. Oh, that&amp;rsquo;s a &amp;lsquo;which&amp;rsquo; obviously. So it&amp;rsquo;s actually quite useful visualizing in this way because I can then feed that back into my process of cleaning. I can see where the common errors are, and I can start to feed that back into the process.&lt;/p&gt;
&lt;p&gt;For each article that I processed in this way, I generated an accuracy score, which was simply the number of recognized words divided by the total number of words within the article. And I could use these scores to develop a couple of overviews.&lt;/p&gt;
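&lt;p&gt;That score is just a simple ratio. A sketch, again with a hypothetical stand-in word list:&lt;/p&gt;

```python
def accuracy_score(text, word_list):
    """Number of recognised words divided by the total number of words."""
    words = [w.lower() for w in text.split() if w.isalpha()]
    if not words:
        return 0.0
    recognised = sum(1 for w in words if w in word_list)
    return recognised / len(words)

# A hypothetical word list; the real ones included a gazetteer of places.
words_known = {"the", "future", "of", "war"}
print(accuracy_score("the future of xqzvt", words_known))  # 0.75
```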
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-020.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;So this is OCR accuracy over time, for my sample only, of course. There aren&amp;rsquo;t many articles in this earlier period, so it&amp;rsquo;s probably not worth worrying about too much. But what&amp;rsquo;s interesting is this decline here, down to the 1920s, where we&amp;rsquo;re going below the 80% mark. Why is that? I&amp;rsquo;ve got no idea. There are a whole lot of variables which could certainly be involved here: the fonts, the quality of the printing, the quality of the paper, the quality of the microfilming. I don&amp;rsquo;t know. It&amp;rsquo;s something which would be interesting to explore further and investigate.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-021.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;We can also have a look at the poorest performing newspapers. So the &lt;em&gt;Perth Gazette and West Australian Times&lt;/em&gt; didn&amp;rsquo;t do too well, and it got 58% in my scorecard. Again, this is only a select sample, so I&amp;rsquo;m not quite sure what you can read into any of this, but it&amp;rsquo;s sort of interesting. These figures weren&amp;rsquo;t particularly important for my work, but I do think that the general issue of OCR quality is really vitally important, particularly as we make more and more scholarly use of these sorts of collections in bulk. I mean, obviously we need to improve the quality, but we also need to expose our assumptions about the OCR quality that underlie our work so that when we are putting forward something, some sort of analysis of the text, we&amp;rsquo;ve got a way of communicating the quality of the material that we&amp;rsquo;re working with.&lt;/p&gt;
&lt;p&gt;I then decided to make my sample set even more manageable by selecting just the first 10,000 articles with accuracy figures of over 80%. So I used my scores and went through and chose just those articles which seemed to have come out pretty well. Of course, as any good digital humanities person does, I then started counting words.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-022.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;As with most of this stuff that I&amp;rsquo;m showing you tonight, &lt;a href=&#34;https://timsherratt.au/shed/presentations/nla/pages/frequencies.html&#34;&gt;it&amp;rsquo;s online and you can go and play with it&lt;/a&gt; and use it yourself. So this shows the word frequencies over time, and there&amp;rsquo;s a time slider here, and you can just drag it along and see what&amp;rsquo;s happening in different years. Now, nothing really significant jumps out at me from looking at the word frequency clouds here. I mean, what is sort of interesting I suppose, is the preponderance of &amp;lsquo;would&amp;rsquo; and &amp;lsquo;could&amp;rsquo;, which I suppose confirms the future orientation of the sample set that I&amp;rsquo;m working with. And there may well be other things within there that jump out at you. And so as I say, jump online and have a look and have a play with this and see what you can make of it.&lt;/p&gt;
&lt;p&gt;I mean, word frequencies&amp;hellip; Okay. So word frequencies can be interesting for getting an overall picture of a large amount of text and starting to track some changes over time. But this sort of word frequency tells you what&amp;rsquo;s common. It doesn&amp;rsquo;t tell you what&amp;rsquo;s distinctive. It doesn&amp;rsquo;t tell you what&amp;rsquo;s interesting in an article. Another measure we can use to try and get at the distinctiveness of a piece of text is something called TF-IDF: term frequency, inverse document frequency. What it does is look not just at the frequency of a word within a particular piece of text, but also at the frequency of that term across a collection of texts. So a word that is common in a particular article, but not very common in the collection as a whole, will appear as more significant, more heavily weighted, in its TF-IDF value.&lt;/p&gt;
&lt;p&gt;You use TF-IDF values all the time. They&amp;rsquo;re used by search engines in calculations of similarity. They can take the TF-IDF values, convert them into a sort of mathematical format and use them to calculate the similarity between two pieces of text. And the results of calculating TF-IDF values for collections like this are pretty interesting, and I&amp;rsquo;ll just show you a little comparison.&lt;/p&gt;
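&lt;p&gt;To make the idea concrete, here is a toy TF-IDF calculation in Python. It&amp;rsquo;s a sketch of the standard formula (term frequency times the log of inverse document frequency), not the actual code used for this work.&lt;/p&gt;

```python
import math
from collections import Counter

def tfidf(docs):
    """Weight words that are frequent in one document but rare across
    the collection. Returns one dict of word scores per document."""
    tokenised = [doc.lower().split() for doc in docs]
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))
    n_docs = len(docs)
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores
```

&lt;p&gt;A word that appears in every document gets a score of zero (the log of 1), while a word concentrated in one article is weighted heavily. Cosine similarity between two of these score vectors is the &amp;lsquo;mathematical format&amp;rsquo; that search engines use when comparing texts.&lt;/p&gt;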
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-023.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;So on the left-hand side here, this is 1939. The top 10 words on the left-hand side are ranked by plain frequency, and here are the TF-IDF values. So you see we&amp;rsquo;re getting at something quite different and quite interesting here. In 1939, Hitler doesn&amp;rsquo;t figure in the frequency list; he&amp;rsquo;s at the top of the TF-IDF one. But we also get these really odd things like midget and roundabout. I found producing these values really interesting; they were quite evocative and encouraged me to explore more, and I&amp;rsquo;m going to talk some more about this a bit later.&lt;/p&gt;
&lt;p&gt;But finally, I just wanted to show you one other way of understanding a collection of texts, and that&amp;rsquo;s through a thing called topic modeling. There&amp;rsquo;s a lot of topic modeling going on in the digital humanities at the moment, and there are a number of good blog posts, which I&amp;rsquo;ll put links to from here, which tell you what topic modeling is. I&amp;rsquo;m just going to quickly race through it. Basically, I used a piece of software called MALLET. I pointed MALLET at my collection of texts, told it that I wanted to define 10 topics, that is, 10 clusters of articles within those texts, and it just did it.&lt;/p&gt;
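&lt;p&gt;For anyone who wants to try this themselves, a MALLET run looks roughly like the following. The file and directory names are illustrative, not the ones used for this project.&lt;/p&gt;

```shell
# Import a directory of plain-text files (one article per file),
# removing stopwords, then train a 10-topic model.
bin/mallet import-dir --input articles/ --output articles.mallet \
    --keep-sequence --remove-stopwords
bin/mallet train-topics --input articles.mallet --num-topics 10 \
    --output-topic-keys topic_keys.txt --output-doc-topics doc_topics.txt
```

&lt;p&gt;The topic-keys file lists the top words for each topic, like the lists on the slide, and the doc-topics file gives the per-article topic weights.&lt;/p&gt;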
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-024.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;And what it came back with is these lists of words which are grouped according to the topics which it believes existed. You can then go through and look at these lists of words and start to interpret them to try and understand what those topics are. And most of them are pretty clear. This of course, is the topic that tells me that I still didn&amp;rsquo;t clean up the OCR enough, but it&amp;rsquo;s interesting that it brought them all together.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve got, I mean here we&amp;rsquo;ve got trade, here we&amp;rsquo;ve got technology, here we&amp;rsquo;ve got land/rural, here we&amp;rsquo;ve got international relations, here we&amp;rsquo;ve got government, and here we&amp;rsquo;ve got home and society. And it&amp;rsquo;s amazing once you run these things, how much sense the topics actually make to you.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-025.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;And it also goes through each article in your collection and weights it according to these topics. So for each article you can see which is the most heavily weighted topic, and you can total the weights associated with each topic over time and produce something like this. Okay, that&amp;rsquo;s not terribly instructive as it is, but if you click on that and &lt;a href=&#34;https://timsherratt.au/shed/presentations/nla/pages/topic_totals.html&#34;&gt;go to the live version&lt;/a&gt;, you can click on the legend down the bottom and take away some of the lines, so you can see what&amp;rsquo;s happening underneath and just see the lines that you&amp;rsquo;re interested in.&lt;/p&gt;
&lt;p&gt;But basically, I still want to do a lot more work on these topics; at this stage I haven&amp;rsquo;t really done much interpretation of them. I want to look at how I&amp;rsquo;m using those weightings and find better ways of exploring them. So anyway, here I am. Not a lot of interpretation at this stage. No great insights. I have a data set, and I&amp;rsquo;m going to be continuing to play with it. And as I&amp;rsquo;ve said, all this stuff is available online, so you&amp;rsquo;re welcome to come and play with it too and see what you can make of it. Now, you may think that I&amp;rsquo;ve gone into a lot of tedious detail about what I did. Well, I&amp;rsquo;ve actually saved you from a lot of the gory details.&lt;/p&gt;
&lt;h2 id=&#34;iv-meanings-for-mining&#34;&gt;IV. Meanings for mining&lt;/h2&gt;
&lt;p&gt;The truth of much research in the digital humanities is that large amounts of time are spent yak shaving and data munging. If you don&amp;rsquo;t know the term &amp;lsquo;yak shaving&amp;rsquo;, it&amp;rsquo;s that process that we&amp;rsquo;re all familiar with: you start a particular task and realize that, in order to achieve it, you have to do something else, or research something else, and that continues in infinite regression until you find yourself doing something which seems totally unrelated to the task you started with. I&amp;rsquo;ve had a lot of that recently. There were lots of issues just involved in using this data and starting to manipulate it. As I&amp;rsquo;ve said before, the issue of OCR quality is crucial, and we have to be upfront about the problems and continue to look for the most effective solutions. We have to talk about questions of selection and completeness. What&amp;rsquo;s actually in Trove? How does it change, and how does this influence the results that we get?&lt;/p&gt;
&lt;p&gt;One of my examples here is a thing called the Atomic Age Exhibition, which toured around Australia in 1948-49. It was a big thing. Many, many thousands of people visited. It was at the Easter Show in Sydney. If you search in Trove for Atomic Age Exhibition, you&amp;rsquo;ll find quite a lot of results coming from the &lt;em&gt;Courier Mail&lt;/em&gt; in Brisbane. You&amp;rsquo;ll find virtually nothing from Sydney and Melbourne, and you might be inclined to think that the exhibition didn&amp;rsquo;t actually go to Sydney and Melbourne. Why is there nothing in Sydney and Melbourne? Because the exhibition was sponsored by the &lt;em&gt;Herald&lt;/em&gt; in Melbourne and by the &lt;em&gt;Daily Telegraph&lt;/em&gt; in Sydney, and both of those titles are currently not in the newspaper database.&lt;/p&gt;
&lt;p&gt;So we&amp;rsquo;ve got to bring these sorts of questions and perspectives as we start to do this research. Another barrier, which I started to butt my head up against in doing this was that of computing power. Generating the TF-IDF values for my sample took about a day and a half on my laptop. And of course, then you realize, you did something stupid and you have to do the whole thing again. And I did wonder at various times whether I was reaching the limits of what&amp;rsquo;s practically possible for one bloke and his laptop and wondering whether my sort of piecemeal efforts will be blown away by academic teams with access to research funds, bright young graduate students, and time on a supercomputer.&lt;/p&gt;
&lt;p&gt;Now, this list of problems and concerns might seem a bit depressing, and it might not be what you expected from this talk, but I want to reassure you, there are digital tools that make it easy to get started and start exploring the possibilities. QueryPic of course, there are other things like &lt;a href=&#34;https://voyant-tools.org&#34;&gt;Voyant&lt;/a&gt;, which is a great online tool for starting to do text analysis, but sooner or later you&amp;rsquo;re going to have to confront some pretty hefty questions. But hey, that&amp;rsquo;s just history. The past is messy and it raises difficult questions about things like selection and interpretation. The issues aren&amp;rsquo;t necessarily new, it&amp;rsquo;s just that they&amp;rsquo;re raised in a bigger, more technically challenging context.&lt;/p&gt;
&lt;p&gt;But what does really bug me is a nagging feeling that I should be taking statistics more seriously. That in constructing the sort of examples which I&amp;rsquo;ve been showing and the tools that I&amp;rsquo;ve been demonstrating, that I should actually be less impressionistic and more rigorous, as if I&amp;rsquo;m sort of not doing justice to the vast computing power that I have at my disposal. But I don&amp;rsquo;t want to do that. In January, I was at the American Historical Association meeting and I was actually able to see the culturomics guys live in person doing their spiel. And as they described their vision for a new science based on access to these huge cultural data sets, I tweeted, &amp;ldquo;Yeah, I want to use big data to tell better, more compelling, more human stories.&amp;rdquo;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-027.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;The British historian Tim Hitchcock has similarly described his own unease that the demands of big data seem to be moving him towards a more positivist style of history. In the humanities, we&amp;rsquo;ve been really fortunate to make use of many decades of research into things like information retrieval. We&amp;rsquo;ve adopted many of their concepts, their tools, and their formulae, but we&amp;rsquo;ve also adopted some of their language. And so we talk about what we&amp;rsquo;re doing as mining. Mining is an extractive process. We dig stuff up, we pull it out of the ground. But this seems to be pretty much the opposite of what I want to do. I mean, I do want to find structures and separate them out for different types of analysis, but then again, I want to put them back together. I want to observe them in different contexts as rich and as complex as possible. How do we do that?&lt;/p&gt;
&lt;p&gt;Well, first of all, we have to work out better ways of incorporating these sorts of big data perspectives into the narratives that we write. Just as QueryPic gives you that opportunity to sort of zoom out and get a big picture, I think we have to take control of the zoom and use it to our advantage. And this, by the way, probably means developing new forms of publication that allow easier and better integration of data and text. It&amp;rsquo;s challenging, but there&amp;rsquo;s not much point to dwelling on the dangers and problems of big data, and as Tim Hitchcock concludes, we simply need to get on with it.&lt;/p&gt;
&lt;h2 id=&#34;v-screwmeneutics-and-deformance&#34;&gt;V. Screwmeneutics and deformance&lt;/h2&gt;
&lt;p&gt;The second approach is to foster acts of creative subversion, to use digital tools in new ways. Literary scholars within the digital humanities talk about the possibilities of deformance: using computational methods to change texts in ways that can open them up to new critical perspectives. Stephen Ramsay also talks about moving beyond traditional forms of search and browse and admitting &amp;lsquo;screwing around&amp;rsquo; as a legitimate research methodology. Of course, historians don&amp;rsquo;t want to start deforming their sources. Or do they?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-029.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;This is an experiment I created called &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/faces/?rsort=3&#34;&gt;The Real Face of White Australia&lt;/a&gt;. I always get a bit teary when I put this up. What I&amp;rsquo;ve done here is use computer vision software to extract portrait photographs from certificates which were used in the administration of the White Australia policy. These are records held by the National Archives of Australia. There are several thousand of these, and this is just from one series, and you can just keep scrolling and scrolling forever, or almost forever. So by manipulating the sources in these ways, by extracting those photographs, I&amp;rsquo;ve created a new way of seeing these records, and it&amp;rsquo;s quite powerful.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-030.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;But we can also be playful. You may have seen this. This is a little game that I created using the newspaper database. Again, it&amp;rsquo;s very simple. It just picks a newspaper article at random from the database and asks you to try and guess the year in which it was published. So any guesses for this one? What would we say? Let&amp;rsquo;s say 1850&amp;hellip; That&amp;rsquo;s a bit later than that&amp;hellip; Let&amp;rsquo;s see, it&amp;rsquo;s earlier. Okay, so you can keep going like this. You can go and try it out yourself later. As I said, it&amp;rsquo;s very simple, but it&amp;rsquo;s also strangely addictive. And of course, it&amp;rsquo;s also a way of exploring the content of Trove by screwing around.&lt;/p&gt;
&lt;p&gt;QueryPic, the Real Face of White Australia and newspaper roulette, my &lt;a href=&#34;https://headlineroulette.net&#34;&gt;Headline Roulette&lt;/a&gt;, also have something else in common. They are public. I want people to use them. I want people to have fun. I want people to be moved. I want people to find things, to be surprised, and to do history.&lt;/p&gt;
&lt;p&gt;Just yesterday I received an email from a self-confessed Australian history addict. Oh no, Australian history fanatic, sorry. And she had become addicted to Headline Roulette. She wanted to know if I could add a facility for users to save their scores, so presumably they could go back and see if they&amp;rsquo;d improved, or share them with their friends. So obviously the next step is the Facebook application. Other people have described to me how scrolling through the Real Face of White Australia brought them to tears. And I&amp;rsquo;ve come to realize that these sorts of interactions really mean more to me than a footnote in an academic article. I&amp;rsquo;ve probably just killed my hopes of an academic career there. I do want to use digital tools to deform history, in a way that makes it accessible to new audiences in new ways. And so I present to you, in honor of my Harold White Fellowship, a new experiment.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-031.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Now, I described to you before the process involved in calculating the TF-IDF values. What I didn&amp;rsquo;t describe was the fun that I had while I was doing it. It was really quite exciting and amusing and funny and all sorts of things, watching the words fly past on the screen. As I completed each year, I had a little script which would show me the top 20 words for that year. And anybody who follows me on Twitter will have a good picture of what was going on, because I couldn&amp;rsquo;t help but share this. They tell their own story. It was really like a sort of wonderful puzzle, as I say there, as they all came up. And then I started tweeting some of them.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-032.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-033.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;This was a nice one. I like the &amp;lsquo;hitler&amp;rsquo; with &amp;lsquo;mudguards&amp;rsquo;, &amp;lsquo;duchess&amp;rsquo;, &amp;lsquo;opossum&amp;rsquo;, &amp;lsquo;hollywood&amp;rsquo; and &amp;lsquo;canberra&amp;rsquo;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-034.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;And as I said here, of course, there&amp;rsquo;s got to be a novel in &amp;lsquo;prince&amp;rsquo;, &amp;lsquo;pronunciation&amp;rsquo;, &amp;lsquo;keyboard&amp;rsquo;, &amp;lsquo;zulu&amp;rsquo;, &amp;lsquo;begged&amp;rsquo;, &amp;lsquo;unbent&amp;rsquo;, &amp;lsquo;diddle&amp;rsquo;, &amp;lsquo;candlesticks&amp;rsquo;, &amp;lsquo;virtuoso&amp;rsquo;, &amp;lsquo;highness&amp;rsquo; and &amp;lsquo;pots&amp;rsquo;. This started me thinking: was there a way I could share this experience and use the TF-IDF values as a way of exploring my data set, a way of opening this experience to others, creating a sort of shifting, playful window on the future of the past? So this is my first attempt. Again, it&amp;rsquo;s public, so &lt;a href=&#34;https://wraggelabs.com/fotp/&#34;&gt;go play with it&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/selection-035.png&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;&#34;&gt;
&lt;p&gt;I&amp;rsquo;ve deliberately tried to keep most of the metadata away from this interface because I wanted the words to be the focus. And yes, it does look a bit like that fridge poetry thing, and that&amp;rsquo;s quite deliberate. At some stage, I want to add a box down here where you can drag your words, make your own sort of collections and tweet them. What it&amp;rsquo;s showing you is just a random selection of TF-IDF values from my sample. You can click on any one of these, and it goes away and sees, first of all, how many years have that value attached. If there&amp;rsquo;s only one year, then it&amp;rsquo;ll return that year. If it appears in more than one year, then it pulls out a random selection of values from those years. Let&amp;rsquo;s see if we can find one that has more than one year.&lt;/p&gt;
&lt;p&gt;Okay. Oh no, we&amp;rsquo;ve got 1943. I&amp;rsquo;m not doing a good job of it this time. Anyway, you can have fun with it. And of course, if you want to actually see what&amp;rsquo;s going on, you can click on these and it will actually load the articles here, and you can explore the text of them there and see where the word&amp;rsquo;s popping up.&lt;/p&gt;
&lt;p&gt;Okay, what is this? I&amp;rsquo;m not quite sure. It&amp;rsquo;s not really a discovery interface, although you can find interesting stuff. It&amp;rsquo;s not quite a game, but it is quite fun to explore. I&amp;rsquo;m sort of in love with it at the moment because, if you think about what I&amp;rsquo;m trying to do in terms of recapturing the presentness of the past, our experience of any moment is not just about the big stories of the day; it includes a whole lot of trivial aspects. And I love the way that this brings together Churchill and corpuscle and Melvin, whoever Melvin is. I love the mix of words, and to me it&amp;rsquo;s incredibly evocative. It makes you want to start imagining stories. It makes you want to explore, it makes you want to find out more, and it just has a wonderful, exciting aspect to it. So I&amp;rsquo;m not quite sure what I&amp;rsquo;m going to do with it or how I&amp;rsquo;m going to develop it, but really, as I say, I&amp;rsquo;m quite in love with it at the moment, and I hope you&amp;rsquo;ll have a play with it and see what you make of it.&lt;/p&gt;
&lt;p&gt;Could it be a discovery interface? I don&amp;rsquo;t know. It does enable you to get into my dataset, though obviously by rather indirect means, and it includes lots of randomness as well. And I&amp;rsquo;m a big fan of randomness in developing new ways of discovery. So there you go. Please take it away, enjoy, play. I may not have conquered the meaning of time yet, but experiments like this make me think about the form in which I present those sorts of arguments and ideas. How do we create resources which give that sense of disjunction and serendipity? So while I may not have achieved all I wanted to, I&amp;rsquo;ve come away with a better sense of what it is that I&amp;rsquo;m trying to do and what I want to do with this material. So thank you.&lt;/p&gt;
&lt;h2 id=&#34;questions&#34;&gt;Questions&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Marie-Louise Ayres:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Thanks very much, Tim. Just before we open up for questions, there were a few things I wanted to say. One is, look, two Australians have corrected more than a million lines of text each. So if you think you couldn&amp;rsquo;t correct 40,000 editorials, you are not being ambitious enough. That&amp;rsquo;s the first thing. The second thing to say is that our own Trove team have found that the only surname that is not in Trove is Kardashian. And the third is, I guess, just thinking about how amazingly creative these visualizations are that Tim has been doing, and I hope you&amp;rsquo;ll ask him about them.&lt;/p&gt;
&lt;p&gt;But the fourth thing I wanted to say is to pick you up on one of your early comments, where you said, &amp;ldquo;I haven&amp;rsquo;t got as far as I wanted.&amp;rdquo; Now that&amp;rsquo;s a very interesting construction that includes the past, the present, the future, and a spatial term as well. So maybe you need to think about that. I don&amp;rsquo;t know where Tim wanted to be, but I think we&amp;rsquo;d all agree he&amp;rsquo;s gotten a long, long way and done things that the rest of us probably haven&amp;rsquo;t even contemplated. So I&amp;rsquo;m hoping you&amp;rsquo;ll ask Tim some questions now, and then we&amp;rsquo;ll have more opportunities afterwards. It&amp;rsquo;s dark out there, so if you want to ask a question, can you just make sure you raise your hand and speak up? Don&amp;rsquo;t be shy. Yes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 1:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;How appropriate would your methodology be for spelling? Where I&amp;rsquo;m coming from is that I know the Australian Labor Party and the British Labour Party spell their names differently. And I can remember once going through microfilms of the Sydney Morning Herald in the 1920s, and it dawned on me that all those spellings were American. So there must be things happening where you could compare how words are spelled. Or do all the correctors corrupt your data just by correcting?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I certainly think you could do that sort of analysis. And actually, one of the nice examples that the culturomics guys used with the Google Books data was looking at changes in irregular verbs over time, which is quite interesting. But their data set goes back quite a long way. And there are challenges. One of the challenges in working with Trove – I mean, obviously the interface is geared towards discovery at the moment, making sure that people find what they&amp;rsquo;re after. But that means that sometimes, if you want to find something exact, it can be a bit tricky. You&amp;rsquo;ve got to know how to turn off the fuzziness in the searching. And sometimes you are foiled in that by the fact that people might&amp;rsquo;ve tagged something, and the search by default also searches the tags and the comments.&lt;/p&gt;
&lt;p&gt;So when I did my first World War I graph – I don&amp;rsquo;t know whether you saw it – you may have noted that there was a little peak for World War I actually during the First World War, which is sort of interesting if you think about it. And that&amp;rsquo;s because people had tagged those articles with World War I. So again, one thing which I would always emphasize as we start to do this research is that we have to develop our literacy in terms of understanding search interfaces and how they work, and be prepared to go into the documentation, to look at the advanced searches and how they work, and to actually start experimenting a bit with what different searches bring back, so that you can have a good picture. Obviously the institutions themselves have a role in communicating this and exposing what&amp;rsquo;s going on behind the scenes. But I think it&amp;rsquo;s an important literacy for researchers going into the future, being able to pull these things apart to understand exactly what&amp;rsquo;s going on. So yes, you can do those sorts of quite detailed, fine-grained comparisons.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 2:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m interested in what year the picture of White Australia was from, and was that the people that were actually accepted, or what?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Certainly not. Well, okay. I should say that this is part of a broader project called Invisible Australians. If you just go to InvisibleAustralians.org, there&amp;rsquo;s a lot more information about what we&amp;rsquo;re trying to do with these records. That particular set of records, as I think I said – those photographs were just pulled from one series within the National Archives of Australia, and there are many more series like that. They are a series of certificates. Basically, if a person deemed non-white who was living in Australia wanted to travel overseas and get back into the country again&amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 2:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Those people that lived in Australia rather than people that tried.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;No, because&amp;hellip; Yeah, there was no trying about it. So yes, it is people who were living in Australia. And this is what is particularly interesting about these records, because what we want to do – and this isn&amp;rsquo;t Trove related, I&amp;rsquo;m sorry – is to actually try and extract the biographical information which is contained within those certificates, in order to find out more about the community who was living under the White Australia policy: people who were living here, whose various activities were restricted in a number of ways by the White Australia policy in all its legislative forms. And we&amp;rsquo;re bringing to bear a number of digital techniques to try and do that. As I said, in that particular case, it was a facial recognition script which pulled out the photographs, but we&amp;rsquo;re also harvesting material from the National Archives and doing some topic modeling, as I showed there, on some of the records to try and pull out clusters within those records. So anyway, check it out. It&amp;rsquo;s something I&amp;rsquo;m very passionate about.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 3:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m just wondering – it seems like the [inaudible 00:57:20] is a bit of an issue, as you talked about. And I guess there are probably a couple of aspects to that. One is the computer vision technology that&amp;rsquo;s involved, and the other part is what you do after you&amp;rsquo;ve got that. Is there anything clever that you can do to guarantee IDF or whatever else to try and make better quality? I don&amp;rsquo;t know if you can describe how you see the next few years panning out with that? Do you think that there&amp;rsquo;s a lot of improvement going on?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s certainly a lot of work going on, and there are techniques which could be used. There are people developing specific language models that tell you that if you have a certain combination of letters, then after that combination you can expect a certain range of letters but not others, for example. And you can use those in probabilistic techniques to go across the text and see what&amp;rsquo;s likely to be at particular points. And specifically relating to digitized newspapers, there&amp;rsquo;s stuff going on. There&amp;rsquo;s a project in the US called Mapping Texts, which works with Chronicling America, the digitized newspapers from America. They went through and did something very similar to what I did, but with a bigger budget and access to Stanford&amp;rsquo;s resources – things like the topic modeling and word frequencies. They also did what&amp;rsquo;s called named entity recognition, which is pulling out people and places from the texts, and they ran through their sample set and generated figures for OCR accuracy. And they&amp;rsquo;re fairly similar to my figures, actually.&lt;/p&gt;
&lt;p&gt;So there&amp;rsquo;s obviously a lot of recognition of this in Europe – there&amp;rsquo;s actually a particular research group which has been looking at methods of improving OCR. And of course there are many cases which are much more complex than this, if you think about old Germanic scripts or something like that. So there&amp;rsquo;s a lot of interest, a lot of concern, and a lot of work going on. I think it&amp;rsquo;s something that we have to be prepared to revisit over time – there are going to be more possibilities for doing stuff with computers as this comes online, and so we need to constantly reassess what&amp;rsquo;s actually possible and see what we can do.&lt;/p&gt;
&lt;p&gt;But yeah, I think, given the general awareness of the problem and the problems that it causes, there&amp;rsquo;s certainly going to be a lot of improvement. And I think it&amp;rsquo;s really exciting that we&amp;rsquo;re now starting to get these collections of digitized newspapers all around the world, and the possibilities that opens up for doing comparative stuff. What I didn&amp;rsquo;t mention about QueryPic is that you can also access New Zealand newspapers. It uses the DigitalNZ API to access Papers Past, so you can actually do graphs for New Zealand papers. But what you can&amp;rsquo;t do meaningfully is compare Australian and New Zealand results, and that&amp;rsquo;s because the DigitalNZ search currently searches the titles of articles and not the full text. Wouldn&amp;rsquo;t it be really nice if we were both searching the same things and we could do those sorts of comparisons – and we could do it with the US, and we could do it with Canada. I think there are some really interesting possibilities there.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 4:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I was just going to say &amp;lsquo;stine&amp;rsquo; is very interesting there, because in my opinion it&amp;rsquo;s obviously a column break from Palestine. That&amp;rsquo;s a common sort of OCR error, E and S being a fragile combination. And not only that, but the rules of line breaking tend to produce chunks like that. It also shows how the TF-IDF is working. Palestine itself as a whole doesn&amp;rsquo;t appear there because it&amp;rsquo;s not actually that important, but &amp;lsquo;stine&amp;rsquo; got promoted because it&amp;rsquo;s extremely uncommon, and &amp;lsquo;Pale&amp;rsquo; got dropped off – because of course &amp;lsquo;pale&amp;rsquo; is a very common word.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yeah, that&amp;rsquo;s nice. But yes, as you go through this, you will see other instances where the OCR issue comes up again. That&amp;rsquo;s also another nice example of thinking about how, using computational techniques, we can start to improve some of the OCR – by looking at the way words break and seeing if we can use that in some way. Thanks, that&amp;rsquo;s great.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 5:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Just before, you said the word evocative, Tim, and I was saying to myself, evocative – so I wanted to talk about that for a minute rather than talk about a technical thing. It seems to me this is really interesting, and I just want you to say more about it. Is this a different kind of historical mode, this kind of desire to treat the past, to evoke rather than to necessarily narrativize or analyze, or define or pin down? Is there something distinctive about this evocative mode which is to do with the digital techniques, or what&amp;rsquo;s going on?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yeah, this is the thing I&amp;rsquo;ve actually been trying to grapple with over the last few weeks as I started playing with this stuff. And I don&amp;rsquo;t know exactly what it is. All I know is what I feel, as you do, when you see it. These things do make you start to think in different ways, and to imagine and make connections. I think with your work, with Cath&amp;rsquo;s work on Semble, there are possibilities for creating spaces which encourage people to make connections, to see relationships between things. And I think digital technologies do lend themselves to that because – I don&amp;rsquo;t know – as I said, I actually think randomness is something which is rather undervalued in terms of exploration and discovery. And as you know, there&amp;rsquo;s another project that we worked on called The History Wall at the National Museum of Australia, and that brought together material in quite a random fashion. It was, again, quite evocative in terms of being able to see the possible relations between items there.&lt;/p&gt;
&lt;p&gt;As I said, I don&amp;rsquo;t know what it lends itself to. What is the process? Is it discovery? Is it a prompt for research questions? I don&amp;rsquo;t know. But it just seems to me to be something which is worth exploring more, and I find that it&amp;rsquo;s something I keep doing, so I must be interested in it in some way. It&amp;rsquo;s definitely worth thinking about some more. There are all sorts of ways in which you can develop an evocative sense – obviously historical photographs give you a different sort of feeling from seeing a text description. So yeah, I don&amp;rsquo;t know. Really interesting question.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q&amp;amp;A 6:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Tim, I&amp;rsquo;d just like to congratulate you on your work. I find this really interesting, and I think it&amp;rsquo;s really great that researchers like yourself take our data and play around with it, as you said, because ultimately some of those ideas do lead to really useful actual applications. And I just wanted to say your OCR accuracy results are actually bang on, because we did quite a bit of research on that five years ago before we launched, and it was 65% to 70%, which is of course low – which was why we said, how can we change that and get the public to help?&lt;/p&gt;
&lt;p&gt;But you&amp;rsquo;re quite right – as time moves on and as these big data sets are made more open and available, people develop technologies to improve that. Five years ago, an automated way to improve it didn&amp;rsquo;t exist. We now know of at least three other people like yourself who have figured out how they could really increase that OCR accuracy rate. So I guess the question I would have is how some of this really fantastic research, like the work you mentioned in Europe, can be built back in to improve our services. But I just wanted to ask, did you find it really useful having the API to get at that data? Because I know that was a dream we&amp;rsquo;d had for a long time, and I know you waited a long time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tim Sherratt:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Well, I didn&amp;rsquo;t wait. I didn&amp;rsquo;t wait, did I? I actually just went ahead and did it myself.&lt;/p&gt;
&lt;p&gt;Yeah, look, the background for those who don&amp;rsquo;t know is that I built my own unofficial API at one point, which I used to do some experiments. But an official API obviously makes a whole lot of things easier. First of all, from the point of view of doing the large data dumps, you&amp;rsquo;re not downloading the whole web page and all the stuff on it – you&amp;rsquo;re just getting the data in a structured way. Great. And as anybody who follows my work will know, I had a number of frustrating experiences where things changed on the web page, and everything I created broke and I had to fix it.&lt;/p&gt;
&lt;p&gt;So APIs do away with all that. It&amp;rsquo;s fantastic. But one of the really good things I like about having API access is how easy it makes it to do something like Headline Roulette. If you have an idea and you&amp;rsquo;ve got a bit of coding experience, you can act on it and you actually build something. And that to me is the most exciting aspect and encouraging people to actually experiment. That&amp;rsquo;s what it&amp;rsquo;s all about to me, is creating an environment where people do experiment with this stuff and build things.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.15771695&#34;&gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.15771695.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>A brief and biased history of Trove Twitter bots</title>
      <link>https://updates.timsherratt.org/2025/06/19/a-brief-and-biased-history.html</link>
      <pubDate>Thu, 19 Jun 2025 12:08:35 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/06/19/a-brief-and-biased-history.html</guid>
      <description>&lt;p&gt;The socials recently alerted me to an &lt;a href=&#34;https://doi.org/10.1177/13548565251334087&#34;&gt;interesting article&lt;/a&gt; by Dominique Carlon, Jean Burgess, and Kateryna Kasianenko on the history of community-created Twitter bots. The article explores bot-making within the context of Twitter&amp;rsquo;s rise and fall, and provides a handy taxonomy of bot species. However, it doesn&amp;rsquo;t include any Australian bots amidst the examples. That&amp;rsquo;s a bit disappointing, as I remember the bot-building years as a time of great fun and creativity. My own contribution to the world of Twitter bots was mainly focused on &lt;a href=&#34;https://trove.nla.gov.au/&#34;&gt;Trove&lt;/a&gt; (what a surprise!), so I thought I might as well jot down a few incomplete and biased notes about the history of Trove Twitter bots.&lt;/p&gt;
&lt;h2 id=&#34;trove-tweeting-trends&#34;&gt;Trove tweeting trends&lt;/h2&gt;
&lt;p&gt;It just so happens that I recently packaged up some &lt;a href=&#34;https://updates.timsherratt.org/2025/06/10/new-dataset-trove-links-shared.html&#34;&gt;data about Trove links shared on Twitter&lt;/a&gt;. Using this data, we can get a broad perspective on the activity of Trove Twitter bots between 2009 and 2020. The identification of bots is based on the Trove bots list I maintained on Twitter, so it&amp;rsquo;s possible I&amp;rsquo;ve missed some.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In total, 43 bots posted 318,767 tweets containing 270,474 unique Trove urls between June 2013 and December 2020.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a chart showing the total number of links to Trove shared by Twitter bots each year.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/bot-tweets-per-year.png&#34; width=&#34;600&#34; height=&#34;334&#34; alt=&#34;Bar chart showing the number of Trove links tweeted by bots per year from 2013 to 2020. The number of links rises dramatically in 2018 and reaches a peak in 2019, before dropping away in 2020.&#34;&gt;
&lt;p&gt;Most of the bots shared digitised newspaper articles, but some shared works from other Trove zones. This chart breaks the links down by the type of resource (&amp;lsquo;article&amp;rsquo; equals newspaper article).&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/bot-tweets-year-type.png&#34; width=&#34;600&#34; height=&#34;295&#34; alt=&#34;Bar chart showing the number of Trove links tweeted by bots per year from 2013 to 2020, with the type of linked resource indicated by colour. The number of links rises dramatically in 2018 and reaches a peak in 2019, before dropping away in 2020. The vast proportion of links are to newspaper articles, with a fairly consistent number going to other types of resources.&#34;&gt;
&lt;p&gt;And one final chart showing the number of active bots per year.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/active-bots-year.png&#34; width=&#34;600&#34; height=&#34;347&#34; alt=&#34;Bar chart showing the number of bots actively tweeting Trove links by year from 2013 to 2020. The numbers rise slowly to 2017, then rise dramatically in 2018, reaching a peak in 2019, and falling away in 2020.&#34;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;year&lt;/th&gt;
&lt;th&gt;active bots&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2013&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2014&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2015&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2016&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2017&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2018&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2019&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2020&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;From the data above you can see that bot activity grew slowly between 2013 and 2017, before taking off dramatically in 2018. The peak year for Trove bots was 2019, when 38 individual bots shared more than 100,000 links to Trove. But a mass extinction event in 2020 almost halved the number of active bots. So what happened?&lt;/p&gt;
&lt;h2 id=&#34;build-a-bot-begins&#34;&gt;Build-a-bot begins&lt;/h2&gt;
&lt;p&gt;In June 2013, inspired by bot creators like Mark Sample, I hooked the Trove API up to Twitter to see what would happen when &lt;a href=&#34;https://discontents.com.au/conversations-with-collections/index.html&#34;&gt;GLAM collections joined online social spaces&lt;/a&gt;. The result was @TroveNewsBot, sharing digitised newspaper articles from Trove.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/trovenewsbot-twitter.png&#34; width=&#34;520&#34; height=&#34;315&#34; alt=&#34;Screen capture of TroveNewsBot&#39;s original Twitter profile.&#34;&gt;
&lt;p&gt;Twitter bots started popping up around the world, sharing collection items from Europeana, the Digital Public Library of America, DigitalNZ, the Cooper Hewitt Museum, and the Brooklyn Museum, amongst others. But @TroveNewsBot was always a bit different. Instead of just sharing randomly selected resources, @TroveNewsBot helped people explore Trove without leaving Twitter. If you tweeted keywords at the bot, it would run a search using the API and tweet back the most relevant result. By adding hashtags, users could &lt;a href=&#34;https://github.com/wragge/trovenewsbot&#34;&gt;control a variety of search parameters&lt;/a&gt; – for example, if you included the hashtag #luckydip you&amp;rsquo;d get back a random article from your search results.&lt;/p&gt;
&lt;p&gt;My favourite bot behaviour was its &amp;lsquo;opinionator&amp;rsquo; mode. If you tweeted a url at @TroveNewsBot, it would retrieve the link, extract keywords from the text, and then search for those keywords in Trove&amp;rsquo;s newspapers. This enabled @TroveNewsBot to have conversations with other online resources – for example, it replied to tweets from DPLA and DigitalNZ, &lt;a href=&#34;https://wakelet.com/wake/fa91d582-33e5-400f-9c27-b6c1c5b992b8&#34;&gt;finding connections between different collections&lt;/a&gt;. I also used the &amp;lsquo;opinionator&amp;rsquo; mode to set up a dialogue between past and present. Several times a day, the bot would grab keywords from the latest news items on the ABC (later the Guardian) website, search for historic newspaper articles, and then tweet both stories, old and new.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/life-on-the-outside.041.jpg&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;Slide illustrating TroveNewsBot&#39;s opinionator mode. There are three images: a screen capture from an ABC news article about the Scottish Independence Referendum, a Trove newspaper article headed &#39;Scottish Independence&#39; from 1928, and a TroveNewsBot tweet linking the two.&#34;&gt;
&lt;p&gt;&lt;em&gt;@TroveNewsBot&amp;rsquo;s opinionator mode in action – a slide from my keynote presentation &lt;a href=&#34;https://doi.org/10.5281/zenodo.3566879&#34;&gt;&amp;lsquo;Life on the outside: connections, contexts, and the wild, wild web&amp;rsquo;&lt;/a&gt; for the Annual Conference of the Japanese Association of Digital Humanities in 2014&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;As well as providing digitised content, such as the newspapers, Trove aggregates collection metadata from hundreds of organisations around Australia and makes it available through its own API. This meant that any organisation could use the Trove API to create a Twitter bot that shared items from &lt;em&gt;their own collection&lt;/em&gt;. To encourage more of this sort of experimentation, I created the &lt;a href=&#34;https://github.com/wragge/trovebuildabot&#34;&gt;Build-a-Bot Workshop&lt;/a&gt; GitHub repository. This repository included instructions and code for anyone wanting to build their own collection bot on top of the Trove API. Like @TroveNewsBot, these collection bots could share random items and respond to user queries.&lt;/p&gt;
&lt;p&gt;Before long, @CurtinLibBot was sharing photos from the Curtin University Library&amp;rsquo;s image collection, and @Kasparbot was tweeting about objects from the National Museum of Australia. By the end of 2013, I&amp;rsquo;d &lt;a href=&#34;https://discontents.com.au/an-addition-to-the-family/index.html&#34;&gt;added to the family&lt;/a&gt; by creating @TroveBot. While @TroveNewsBot dug into the digitised newspaper articles, its younger sibling looked for inspiration amongst Trove&amp;rsquo;s other zones – sharing books, journals, photos, maps and more.&lt;/p&gt;
&lt;p&gt;In 2015, Steve Leahy unleashed @TrovePenguinBot upon the world, searching for sardines amongst the digitised newspapers. In 2016, one of my students at the University of Canberra modified the &lt;a href=&#34;https://github.com/lolibrarian/NYPL-Emoji-Bot&#34;&gt;NYPL Emoji Bot code&lt;/a&gt; to create @TroveEmojiBot – if you tweeted an emoji at the bot, it would respond with a suitably-themed newspaper article. In 2017, the &lt;a href=&#34;https://digitisethedawn.org/&#34;&gt;Digitise the Dawn campaign&lt;/a&gt; bot-ified their Twitter account, posting an article each day from Louisa Lawson&amp;rsquo;s journal, &lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/252&#34;&gt;The Dawn&lt;/a&gt;. Meanwhile, @astrove_bot started sharing newspaper articles relating to astronomy.&lt;/p&gt;
&lt;p&gt;And then there was The Vintage Face Depot&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;things-get-weird&#34;&gt;Things get weird&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;d been experimenting for a few years with &lt;em&gt;faces&lt;/em&gt; as a way of connecting to GLAM collections – as alternative entry points, based not on metadata but &lt;a href=&#34;https://doi.org/10.5281/zenodo.3579530&#34;&gt;the people inside&lt;/a&gt;. In 2015, this led me to create &lt;a href=&#34;https://wragge.github.io/face-depot/&#34;&gt;The Vintage Face Depot&lt;/a&gt;. If you tweeted a photo of yourself to @facedepot, the bot would select a face at random from a collection I&amp;rsquo;d compiled from Trove newspapers and superimpose that face over yours, tweeting you back the result and a link to the original article, so you could find out more about the person you&amp;rsquo;d been matched with.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/unremembering-dh2015.038.jpg&#34; width=&#34;600&#34; height=&#34;450&#34; alt=&#34;Slide showing the operation of The Vintage Face Depot. There are three screen captures from Twitter. Each includes a portrait photo shared by a Twitter user, and facedepot&#39;s reply that includes a modified version of the photo with a face from Trove&#39;s newspapers overlaid, and a link to the original newspaper article.&#34;&gt;
&lt;p&gt;&lt;em&gt;@facedepot in action – a slide from my keynote presentation &lt;a href=&#34;https://doi.org/10.5281/zenodo.3566887&#34;&gt;&amp;lsquo;Unremembering the forgotten&amp;rsquo;&lt;/a&gt; for the Alliance of Digital Humanities Organizations Annual Conference in 2015&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Now, in a time of deep fakes and AI generated images, @facedepot&amp;rsquo;s efforts seem quaint and kludgy. But that was always the point. I wanted to mess around with the barriers that put some people on the other side of this wall we call the past – to explore what historian Devon Elliot suggested on Twitter was an &amp;lsquo;uncanny temporal valley&amp;rsquo;. As I argued in &lt;a href=&#34;https://discontents.com.au/the-perfect-face/&#34;&gt;The Perfect Face&lt;/a&gt;, a presentation at NDF2015:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Vintage Face Depot tells you nothing about yourself. I built it at about the same time as Microsoft launched their How-Old bot that uses machine learning to estimate your age. Face Depot does nothing clever, and yet sometimes the results are uncanny, even unsettling. Microsoft might be able to tell you how old you are, but  Face Depot asks &lt;em&gt;who&lt;/em&gt; you are and pushes you in the direction of a past life, linked merely through chance.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/vintage-faces.gif&#34; width=&#34;600&#34; height=&#34;400&#34; alt=&#34;Animated gif showing some images generated during testing of facedepot&#34;&gt;
&lt;h2 id=&#34;glitch-bots-for-all&#34;&gt;Glitch bots for all&lt;/h2&gt;
&lt;p&gt;While I&amp;rsquo;d shared some bot-building code, rolling your own bot still required access to a web-connected server – a significant barrier for most would-be experimenters. This changed in 2017 with the arrival of Glitch, a platform that enabled anyone to build simple web apps for free. Perhaps most importantly, Glitch apps were remixable – simply by clicking a button, you could open an editor and create your own customised version of any app.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/glitch-trove-bots.png&#34; width=&#34;600&#34; height=&#34;372&#34; alt=&#34;Screen capture from the Trove page in Glitch, showing the four bot templates.&#34;&gt;
&lt;p&gt;Glitch seemed like an ideal environment in which to experiment with bots, so I created four remixable Trove Twitter bot recipes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;trove-collection-bot&lt;/strong&gt; – sharing resources from a partner collection&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;trove-list-bot&lt;/strong&gt; – sharing items from a Trove list&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;trove-title-bot&lt;/strong&gt; – sharing articles from specific newspapers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;trove-tag-bot&lt;/strong&gt; – sharing items with specific tags&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These were supported by a &lt;a href=&#34;https://101dhhacks.net/2018/01/21/trove-bots-for-all/&#34;&gt;detailed tutorial&lt;/a&gt; that walked through the process of customisation and suggested ways in which the basic recipes could be extended – for example, by adding a specific search query to a title bot.&lt;/p&gt;
&lt;p&gt;This was the beginning of the bot explosion, with more than 30 Trove Twitter bots born between 2017 and 2019.&lt;/p&gt;
&lt;p&gt;One of these, @NTTimesGazette, was created by curator and journalist Caddie Brain to tweet articles from the &lt;em&gt;Northern Territory Times and Gazette&lt;/em&gt;. The bot was featured on ABC radio in Darwin under the headline: &lt;a href=&#34;https://www.abc.net.au/news/2018-02-16/trove-twitter-unearths-history-newspaper-nt-times-and-gazette/9445458&#34;&gt;Twitter bot offers a rare look inside Darwin&amp;rsquo;s forgotten first newspaper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Historian Brett Holman created a series of bots related to aviation history. More than just a source of amusement, the bots became part of Brett&amp;rsquo;s research practice, as described in his &lt;em&gt;History Australia&lt;/em&gt; article &lt;a href=&#34;https://doi.org/10.17613/9h30-ke82&#34;&gt;&#39;@TroveAirRaidBot, a 24/7/365 research assistant&#39;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Perhaps the best part of this bot-making extravaganza was the number of self-professed &amp;lsquo;non coders&amp;rsquo; who were able to take their first steps into the world of programming and actually &lt;em&gt;create something&lt;/em&gt;. I have memories of sitting in the shade at Canberra&amp;rsquo;s now defunct Big Splash water park, troubleshooting someone&amp;rsquo;s Twitter bot on my phone, while the kids played on the water slides – it was fun, and it was exciting. Together, Trove, Twitter, and Glitch opened up new possibilities for learning and experimentation, and new ways of knowing Australia&amp;rsquo;s cultural heritage.&lt;/p&gt;
&lt;h2 id=&#34;2019-bot-roll-call&#34;&gt;2019 bot roll call&lt;/h2&gt;
&lt;p&gt;As new bots emerged, I added them to my Trove bots Twitter list (here&amp;rsquo;s a &lt;a href=&#34;https://web.archive.org/web/20180627053546/https://twitter.com/wragge/lists/trove-bots/members&#34;&gt;partially archived copy&lt;/a&gt;).&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/trove-bots.png&#34; width=&#34;600&#34; height=&#34;451&#34; alt=&#34;Screen capture from the Trove bots list in Twitter, showing some of the Trove bots.&#34;&gt;
&lt;p&gt;You can get an idea of their diversity from the bot names – a mix of collections, subjects, and places. Here&amp;rsquo;s a list of Trove Twitter bots active in 2019:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;astrove_bot&lt;/li&gt;
&lt;li&gt;AustWWBot&lt;/li&gt;
&lt;li&gt;BotCBR_QLD&lt;/li&gt;
&lt;li&gt;CatsofTrove&lt;/li&gt;
&lt;li&gt;digitisethedawn&lt;/li&gt;
&lt;li&gt;DoSonTrove&lt;/li&gt;
&lt;li&gt;facedepot&lt;/li&gt;
&lt;li&gt;Kasparbot&lt;/li&gt;
&lt;li&gt;KellyGangBot&lt;/li&gt;
&lt;li&gt;LAAL_bot&lt;/li&gt;
&lt;li&gt;NTTimesGazette&lt;/li&gt;
&lt;li&gt;PenrithPictures&lt;/li&gt;
&lt;li&gt;RemixHistorical&lt;/li&gt;
&lt;li&gt;suthlib&lt;/li&gt;
&lt;li&gt;TroveAirBot&lt;/li&gt;
&lt;li&gt;TroveBot&lt;/li&gt;
&lt;li&gt;TrovecakeBot&lt;/li&gt;
&lt;li&gt;TroveCHIAbot&lt;/li&gt;
&lt;li&gt;TroveDutchbot&lt;/li&gt;
&lt;li&gt;TroveEmojiBot&lt;/li&gt;
&lt;li&gt;trovefacesbot&lt;/li&gt;
&lt;li&gt;TroveHoroscopes&lt;/li&gt;
&lt;li&gt;Troveknitbot&lt;/li&gt;
&lt;li&gt;Trovelandbot&lt;/li&gt;
&lt;li&gt;trovelistbot&lt;/li&gt;
&lt;li&gt;TroveMirrorBot&lt;/li&gt;
&lt;li&gt;TroveNewsBot&lt;/li&gt;
&lt;li&gt;TrovePenguinBot&lt;/li&gt;
&lt;li&gt;TroveRefereeBot&lt;/li&gt;
&lt;li&gt;trovesportsmel&lt;/li&gt;
&lt;li&gt;trovetribunebot&lt;/li&gt;
&lt;li&gt;TroveXmasBot&lt;/li&gt;
&lt;li&gt;TsvBulletinBot&lt;/li&gt;
&lt;li&gt;WomenAtWarBot&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I also &lt;a href=&#34;https://wragge.github.io/trovenewsbot2019/&#34;&gt;overhauled @TroveNewsBot&lt;/a&gt; in 2019, adding a number of new features, including article thumbnails.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/trovenewsbot-example.png&#34; width=&#34;600&#34; height=&#34;722&#34; alt=&#34;Screen capture from Twitter showing TroveNewsBot&#39;s reply to the query &#39;library robot&#39;. The reply includes details of the article &#39;Mystery of the week: Robot murder in the library&#39; as well as a thumbnail image of the article.&#34;&gt;
&lt;h2 id=&#34;decline-and-fall&#34;&gt;Decline and fall&lt;/h2&gt;
&lt;p&gt;This golden age of bot-making came to an end late in 2019.&lt;/p&gt;
&lt;p&gt;The first blow came when &lt;a href=&#34;https://updates.timsherratt.org/2019/10/09/creators-and-users.html&#34;&gt;Trove updated its API&lt;/a&gt;. The bots needed some way of selecting random items from the millions available on Trove. This was fairly easy with version one of the API, but version two overhauled the way you accessed items within the result set, making random selections impossible. I eventually managed to hack together &lt;a href=&#34;https://glam-workbench.net/trove-random/&#34;&gt;a random-ish method&lt;/a&gt; that added multiple facets to whittle down the result set until a selection could be made. Using this method, I &lt;a href=&#34;https://updates.timsherratt.org/2019/11/07/the-death-and.html&#34;&gt;created new versions of my Glitch bot recipes&lt;/a&gt; and &lt;a href=&#34;https://101dhhacks.net/trove-bots-for-all/&#34;&gt;updated the tutorial&lt;/a&gt;. But it seemed that the moment had passed, and many bot authors just let their creations die when version one of the API was switched off.&lt;/p&gt;
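&lt;p&gt;The general shape of that facet-whittling approach can be sketched in a few lines of Python. Here &lt;code&gt;count_results&lt;/code&gt; stands in for a Trove API call that returns the size of a result set, and the facet names and threshold are illustrative rather than the values the real method uses.&lt;/p&gt;

```python
import random

def random_item(count_results, facet_options, threshold=100):
    """Apply random facet values one at a time to whittle down a
    result set until it's small enough to pick an item from."""
    applied = {}
    for facet, values in facet_options.items():
        if count_results(applied) <= threshold:
            break
        applied[facet] = random.choice(values)
    # Choose a random position within the whittled-down result set
    return applied, random.randrange(count_results(applied))
```

&lt;p&gt;A real implementation would also need to handle facet combinations that return no results.&lt;/p&gt;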
&lt;p&gt;Surviving bots faced further challenges when Glitch started imposing limits on its free services. Glitch apps were designed to sleep when not in use, so to get your bot tweeting you had to fire regular web requests at it using a cron service. Glitch blocked access by these services and introduced a paid tier for &amp;lsquo;always on&amp;rsquo; apps. More bots died as a result.&lt;/p&gt;
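&lt;p&gt;The keep-alive mechanism itself was trivial – just an HTTP GET fired at the app&amp;rsquo;s URL on a schedule. A minimal sketch, with an injectable &lt;code&gt;fetch&lt;/code&gt; function so it can be exercised without a live app:&lt;/p&gt;

```python
from urllib.request import urlopen

def wake_bot(url, fetch=urlopen):
    # A sleeping Glitch app wakes on any incoming web request, so a
    # cron service only needs to GET the app's URL every few minutes
    with fetch(url) as response:
        return response.status
```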
&lt;p&gt;I was thinking about switching my recipes from Glitch to GitHub, making use of templates and scheduled actions. But while I procrastinated, Twitter started on its long, drawn-out death spiral – first imposing new limits on API use, and later becoming the preferred networking site for Nazis and transphobes. It was no place for creative bot-making.&lt;/p&gt;
&lt;h2 id=&#34;the-serious-side-of-serendipity&#34;&gt;The serious side of serendipity&lt;/h2&gt;
&lt;p&gt;Bot-making wasn&amp;rsquo;t just about fun – Trove Twitter bots had a serious purpose as well. In &lt;a href=&#34;https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/section/be608100-95b6-4e48-bfd5-a82a588da8f1&#34;&gt;&amp;lsquo;Unremembering the forgotten&amp;rsquo;&lt;/a&gt; I wrote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Twitter bots can interrupt our social media meanderings with pinpoints of surprise, conflict, and meaning. And yet they are lightweight, almost disposable, in their development and implementation. No committees were formed, no grants were obtained—they are quick and creative: hacks in the best sense of the word. Bots are an example of how digital skills and tools allow us to try things, to build and play, without any expectation of significance or impact. We can experiment with the parameters of access.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A number of articles on the value of serendipity have considered how collection bots, like @TroveNewsBot, can puncture our research expectations. The random offerings of bots might offer new modes of discovery. In &lt;a href=&#34;https://muse.jhu.edu/article/585974&#34;&gt;&amp;lsquo;Technologies of Serendipity&amp;rsquo;&lt;/a&gt;, Paul Fyfe argues:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For scholars or other readers, discovery results less from directed searching than from all the tangents encountered on the way. Thus, sources which are plural, redundant, and tangent-rich help promote discovery by the proliferating contingencies of their usage.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Similarly, &lt;a href=&#34;https://doi.org/10.17613/9h30-ke82&#34;&gt;Brett Holman notes&lt;/a&gt; that his own Trove bots help him make connections in his research:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;By impinging on my consciousness when I am preoccupied by other things, @TroveAirRaidBot’s tweets draw my mind back to this research topic that is always sitting at the back of my mind somewhere, and it makes me make connections – randomly, haphazardly, but often very fruitfully leading me to think of something I hadn’t thought of before, or reminding me of something I’d forgotten, or juxtaposing some seemingly unrelated things. It’s a kind of directed serendipity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Trove Twitter bots were also entry points and interventions – challenging our understanding of access. They offered playful demonstrations of how our experience of GLAM collections might be different. Mitchell Whitelaw &lt;a href=&#34;http://olh.openlibhums.org/articles/10.16995/olh.291/&#34;&gt;suggested that such creations&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;reflect an emerging interest in collections as active sites of meaning-making, and experimentation with how we might encounter such collections in an everyday digital environment.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In &lt;a href=&#34;https://doi.org/10.5281/zenodo.3566879&#34;&gt;&amp;lsquo;Life on the outside&amp;rsquo;&lt;/a&gt;, I considered the lives that GLAM collections might lead beyond institutional confines:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;These bots do not simply present collection items outside of the familiar context of discovery interfaces or online exhibitions, they move the encounter itself into a wholly new space. &amp;hellip; Twitter bots loosen the institutional context of collections to allow them to participate in a space where people already congregate. They send collection items out into the wilds of the web, to find new meanings, new connections and perhaps even new love.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The promise of serendipitous discovery has now faded with the poisoning of social media spaces, and the retreat of many GLAM organisations from experimentation and openness. The need to control now carries more weight than the gift of creativity.&lt;/p&gt;
&lt;h2 id=&#34;what-remains&#34;&gt;What remains&lt;/h2&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-06-18-17-33-22.png&#34; width=&#34;600&#34; height=&#34;653&#34; alt=&#34;Photograph of a Raspberry Pi on a table top. Stuck on to the top of the Pi is a photo of a robot from Trove&#39;s newspapers – this photo was also used as TroveNewsBot&#39;s avatar on Twitter.&#34;&gt;
&lt;p&gt;I migrated &lt;a href=&#34;https://wraggebots.net/@trovenewsbot&#34;&gt;@TroveNewsBot to the Fediverse&lt;/a&gt; in May 2023, but sadly it was killed when &lt;a href=&#34;https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html&#34;&gt;NLA gatekeepers cancelled my Trove API keys&lt;/a&gt; without warning in January 2025.&lt;/p&gt;
&lt;p&gt;A number of other Trove bots have survived the Twitter implosion and found their way to alternate platforms. &lt;a href=&#34;https://ausglam.space/@digitisethedawn&#34;&gt;@DigitiseTheDawn&lt;/a&gt; now shares articles on the Fediverse, while &lt;a href=&#34;https://bsky.app/profile/trovepenguinbot.bsky.social&#34;&gt;@TrovePenguinBot&lt;/a&gt; is pursuing sardines on Bluesky. Brett Holman has created new versions of his aviation-themed bots – &lt;a href=&#34;https://bsky.app/profile/troveairbot.airminded.org&#34;&gt;@TroveAirBot&lt;/a&gt;, &lt;a href=&#34;https://bsky.app/profile/troveairraidbot.airminded.org&#34;&gt;@TroveAirRaidBot&lt;/a&gt;, and &lt;a href=&#34;https://bsky.app/profile/troveufobot.airminded.org&#34;&gt;@TroveUFOBot&lt;/a&gt; – on Bluesky. I&amp;rsquo;d be happy to add the details of any other survivors I might have missed.&lt;/p&gt;
&lt;p&gt;In an odd coincidence, recent months have brought &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;new restrictions on access to Trove API keys&lt;/a&gt;, and an announcement of the end of Glitch. There&amp;rsquo;s no going back.&lt;/p&gt;
&lt;p&gt;ActivityPub and the Fediverse seem to offer new digital channels through which collections might flow and connect. See, for example, &lt;a href=&#34;https://millsfield.sfomuseum.org/blog/2024/03/12/activitypub/&#34;&gt;Aaron Straup Cope&amp;rsquo;s work&lt;/a&gt; at the SFO Museum. But how do we support and encourage this type of experimentation?&lt;/p&gt;
&lt;p&gt;Personally speaking, this year&amp;rsquo;s been pretty shit so far, and I&amp;rsquo;ve been having trouble finding any motivation. But in pulling together these notes I found a section in &lt;a href=&#34;https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/&#34;&gt;&amp;lsquo;Unremembering the forgotten&amp;rsquo;&lt;/a&gt; that reminded me of what&amp;rsquo;s at stake:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There is no open access to the past. There is no key we can enter to recall a life. I create these projects not because I want to contribute to some form of national memory, but because I want to unsettle what it means to remember: to go beyond the listing of names and the cataloging of files to develop modes of access that are confusing, challenging, inspiring, uncomfortable, and sometimes creepy.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There&amp;rsquo;s still plenty of work to do.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.15694209&#34;&gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.15694209.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Some Archives Week goodies</title>
      <link>https://updates.timsherratt.org/2025/06/11/some-archives-week-goodies.html</link>
      <pubDate>Wed, 11 Jun 2025 17:40:11 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/06/11/some-archives-week-goodies.html</guid>
      <description>&lt;p&gt;It&amp;rsquo;s &lt;a href=&#34;https://www.ica.org/international-archives-week-2025-archives-are-accessible-archives-for-everyone/&#34;&gt;International Archives Week&lt;/a&gt; and I&amp;rsquo;m feeling a bit crook after being double-vaxxed yesterday, so instead of doing something productive, I&amp;rsquo;m just going to make a list of potentially handy archives-related resources from the Wonderful World of Wragge(TM).&lt;/p&gt;
&lt;p&gt;The theme of Archives Week is &lt;strong&gt;#ArchivesAreAccessible&lt;/strong&gt;, which you&amp;rsquo;d have to regard as rather aspirational given the various ways access is limited by law, policy, practice, technology, and history. But what the heck, discussions about &lt;a href=&#34;https://doi.org/10.5281/zenodo.5035855&#34;&gt;the meaning of &lt;em&gt;access&lt;/em&gt;&lt;/a&gt; are always welcome. It&amp;rsquo;s also a little jarring to see the #ArchivesAreAccessible theme being promoted by the National Archives of Australia just a few weeks after they &lt;a href=&#34;https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html&#34;&gt;implemented new restrictions that make it impossible to get machine-readable data out of their online database&lt;/a&gt;, RecordSearch. But I&amp;rsquo;m trying to move on, so&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;zotero&#34;&gt;Zotero&lt;/h2&gt;
&lt;p&gt;All Australian archives users should have Zotero installed. Through the magic of user-contributed &amp;lsquo;translators&amp;rsquo;, &lt;a href=&#34;https://www.zotero.org/&#34;&gt;Zotero&lt;/a&gt; can capture structured data and digitised images from a variety of collections, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://ozglam.chat/t/zotero-translator-for-recordsearch-updated/27&#34;&gt;National Archives of Australia&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2024/08/22/new-zotero-translators.html&#34;&gt;PROV&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2024/08/22/new-zotero-translators.html&#34;&gt;Queensland State Archives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2022/07/14/calling-all-tasmanian.html&#34;&gt;State Library and Archives of Tasmania&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;State Records Office of WA&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first four are my work, so &lt;a href=&#34;https://timsherratt.au/&#34;&gt;let me know&lt;/a&gt; if you have any suggestions or problems.&lt;/p&gt;
&lt;h2 id=&#34;indexes-to-records&#34;&gt;Indexes to records&lt;/h2&gt;
&lt;p&gt;Archives are well represented in the GLAM Workbench&amp;rsquo;s &lt;a href=&#34;https://glam-workbench.net/glam-datasets-from-gov-portals/&#34;&gt;list of GLAM datasets shared through government open data portals&lt;/a&gt;. Many of these datasets are indexes that link records to people and places. They&amp;rsquo;re openly licensed and &lt;a href=&#34;https://muse.jhu.edu/article/794331&#34;&gt;underused&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;NSW State Archives has also compiled a lot of indexes. These aren&amp;rsquo;t shared through a portal, but you can &lt;a href=&#34;https://glam-workbench.net/nsw-state-archives/&#34;&gt;harvest them from their website&lt;/a&gt;. To save you the effort, I&amp;rsquo;ve &lt;a href=&#34;https://glam-workbench.net/nsw-state-archives/index-repository/&#34;&gt;created a repository of the harvested indexes&lt;/a&gt;.&lt;/p&gt;
 &lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-04-09-15-19-57.png&#34; width=&#34;600&#34; height=&#34;624&#34; alt=&#34;Screenshot of the main search page of GLAM Name Indexes&#34;&gt;
&lt;p&gt;I&amp;rsquo;ve pulled many of these sources together to build a mega database of name indexes that lets you search for people across millions (yes millions) of records. As well as the sources described above, it also includes &lt;a href=&#34;https://updates.timsherratt.org/2025/04/09/more-than-million-rows-of.html&#34;&gt;people-related records from the Public Record Office Victoria&amp;rsquo;s API&lt;/a&gt;. Altogether, the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt; contains almost 13 million records in 293 datasets from 10 GLAM organisations. And unlike the commercial genealogical databases, it&amp;rsquo;s free! (If only Australian libraries and archives would link to it from their family history guides&amp;hellip;)&lt;/p&gt;
&lt;h2 id=&#34;other-datasets&#34;&gt;Other datasets&lt;/h2&gt;
&lt;p&gt;Before my scrapers were scuppered by the NAA, I managed to compile a few datasets. Much of this data documents the way RecordSearch itself has changed, and while it might not be of use to researchers seeking particular records, it could &lt;a href=&#34;https://updates.timsherratt.org/2024/09/20/preserving-the-history.html&#34;&gt;help future researchers&lt;/a&gt; who are trying to understand the impact of online collections on the practice of history. This includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Summary data about all series in RecordSearch – a CSV file containing basic descriptive information about all the series currently registered on RecordSearch as well as the total number of items described, digitised, and in each access category. Harvests from &lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/series_totals_May_2021.csv&#34;&gt;May 2021&lt;/a&gt; and &lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/series_totals_April_2022.csv&#34;&gt;April 2022&lt;/a&gt; are currently available, and I&amp;rsquo;ll soon be adding May 2025.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.14744050&#34;&gt;Files digitised by the National Archives of Australia since 2021&lt;/a&gt; – annual compilations of data harvested from RecordSearch&amp;rsquo;s list of recently digitised files. (The automated weekly harvests are now dead.)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.14769172&#34;&gt;Records held by the National Archives of Australia with the access status of &amp;lsquo;closed&amp;rsquo;&lt;/a&gt; – a &lt;em&gt;whole decade&lt;/em&gt; of annual harvests of records held by the NAA that have the access status of &amp;lsquo;closed&amp;rsquo; (withheld from public access). The harvests were run on or about 1 January each year from 2016 to 2025. The aim in saving this data is to enable long-term analysis of the NAA&amp;rsquo;s access examination process.&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/071c7cad60.png&#34; width=&#34;600&#34; height=&#34;571&#34; alt=&#34;Screenshot of visualisation showing the relationships between Australian government agencies over time&#34;&gt;
&lt;p&gt;Thanks to &lt;a href=&#34;https://wikimedia.org.au/wiki/Exploring_government_departments_by_linking_Wikidata_to_the_National_Archives_of_Australia&#34;&gt;support from Wikimedia Australia&lt;/a&gt;, I&amp;rsquo;ve also added information about Australian government agencies from RecordSearch to WikiData. As a result, you can get a list of Australian government departments since Federation using this &lt;a href=&#34;https://w.wiki/5tVh&#34;&gt;Wikidata query&lt;/a&gt;. I&amp;rsquo;ve used the data to build &lt;a href=&#34;https://glam-workbench.net/wikidata/examples/govt-agencies-network.html&#34;&gt;this interactive visualisation&lt;/a&gt; of the relationships between government departments. There are more examples in the &lt;a href=&#34;https://glam-workbench.net/wikidata/&#34;&gt;Wikidata section of the GLAM Workbench&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Also from the NAA is my &lt;a href=&#34;https://github.com/wragge/diy-redactionart&#34;&gt;collection of #redactionart&lt;/a&gt; found in ASIO surveillance files.&lt;/p&gt;
&lt;p&gt;As part of the &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/&#34;&gt;Real Face of White Australia&lt;/a&gt; project, we&amp;rsquo;ve been transcribing records created by the administration of the White Australia Policy, now held by the NAA. Some of the results are available in &lt;a href=&#34;https://github.com/wragge/realface-data&#34;&gt;this data repository&lt;/a&gt;. (Note to self – I need to update this with the latest data!)&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/anu-archives/&#34;&gt;ANU Archives section of the GLAM Workbench&lt;/a&gt; includes some datasets extracted from the Sydney Stock exchange stock and share lists. (Just noticed some CloudStor links in there that I need to fix&amp;hellip;)&lt;/p&gt;
&lt;h2 id=&#34;public-record-office-victoria&#34;&gt;Public Record Office Victoria&lt;/h2&gt;
&lt;p&gt;PROV gets its own section because, as far as I know, they&amp;rsquo;re the only Australian archives with a &lt;a href=&#34;https://prov.vic.gov.au/about-us/our-blog/new-prov-public-api&#34;&gt;functioning public API&lt;/a&gt;. (Brief moment of silence to remember the APIs that have come and gone over the years&amp;hellip;).&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s now a &lt;a href=&#34;https://glam-workbench.net/prov/&#34;&gt;PROV section of the GLAM Workbench&lt;/a&gt; that includes &lt;a href=&#34;https://glam-workbench.net/prov/getting-started/&#34;&gt;a &amp;lsquo;getting started&amp;rsquo; notebook&lt;/a&gt; documenting the basic functionality of the API. There&amp;rsquo;s some &lt;a href=&#34;https://updates.timsherratt.org/2025/04/30/new-prov-section-added-to.html&#34;&gt;more information in this blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve also used the PROV API to create &lt;a href=&#34;https://updates.timsherratt.org/2025/04/10/using-the-public-record-office.html&#34;&gt;an automated data dashboard&lt;/a&gt; that provides an overview of their collection. It&amp;rsquo;s updated every Sunday.&lt;/p&gt;
&lt;h2 id=&#34;other-things&#34;&gt;Other things&lt;/h2&gt;
&lt;p&gt;RecordSearch users will understand the frustration of trying to share a URL to a record, only to get an annoying error. There are a few ways around this (Zotero saves persistent links to things you save), but for a quick fix I created a simple tool to &lt;a href=&#34;https://recordsearch-links.glitch.me/&#34;&gt;create persistent links in RecordSearch&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re comfortable with a little browser hacking, you can also &lt;a href=&#34;https://gist.github.com/wragge/b2af9dc56f7cb0a9476b#file-recordsearch_show_pages-user-js&#34;&gt;install this handy RecordSearch userscript&lt;/a&gt; (scroll to the bottom for installation instructions). It improves the functionality of RecordSearch in a few different ways, such as by indicating the number of pages in a digitised file.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/68747470733a2f2f646c2e64726f70626f7875736572636f6e74656e742e636f6d2f732f666.png&#34; width=&#34;600&#34; height=&#34;467&#34; alt=&#34;Screenshot of RecordSearch showing the number of pages in digitised files&#34;&gt;
&lt;h2 id=&#34;any-ideas&#34;&gt;Any ideas?&lt;/h2&gt;
&lt;p&gt;If you have any ideas for additional resources or datasets, or you&amp;rsquo;re having problems with an online collection, feel free to drop a note in the &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench.github.io/issues&#34;&gt;GLAM Workbench repository&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>New dataset – Trove links shared on Twitter, 2009 to 2020</title>
      <link>https://updates.timsherratt.org/2025/06/10/new-dataset-trove-links-shared.html</link>
      <pubDate>Tue, 10 Jun 2025 12:30:54 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/06/10/new-dataset-trove-links-shared.html</guid>
      <description>&lt;p&gt;A few years ago, I harvested the details of tweets that included links to Trove. The data has just been sitting on my computer, so I thought I should package it up and share it, in case it&amp;rsquo;s of use to anyone.&lt;/p&gt;
&lt;p&gt;The story is that back in 2021, I was working on the article &lt;a href=&#34;https://doi.org/10.5281/zenodo.5595420&#34;&gt;&amp;lsquo;More than newspapers&amp;rsquo;&lt;/a&gt; for a special section of &lt;em&gt;History Australia&lt;/em&gt; focusing on Trove. I was thinking that I might include something about the way Trove newspaper articles were mobilised within online discussions about history – a topic I first explored in &lt;a href=&#34;https://doi.org/10.5281/zenodo.3566879&#34;&gt;&amp;lsquo;Life on the outside: connections, contexts, and the wild, wild web&amp;rsquo;&lt;/a&gt;, my keynote for the Annual Conference of the Japanese Association of Digital Humanities in 2014. In the end, the article went in another direction, so I didn&amp;rsquo;t use the data.&lt;/p&gt;
&lt;p&gt;I remembered this recently and thought I should do something with it. I&amp;rsquo;ve now created a dataset and &lt;a href=&#34;https://doi.org/10.5281/zenodo.15627800&#34;&gt;shared it on Zenodo&lt;/a&gt;. I&amp;rsquo;m &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;not working on Trove any more&lt;/a&gt;, but I&amp;rsquo;m hoping that someone else might find the data useful!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.15694063&#34;&gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.15694063.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The dataset contains information about tweets from 2009 to 2020 that include links to &lt;a href=&#34;https://trove.nla.gov.au/&#34;&gt;Trove&lt;/a&gt;. The tweet data was compiled using &lt;a href=&#34;https://twarc-project.readthedocs.io/en/latest/&#34;&gt;Twarc&lt;/a&gt; in May 2021, under Twitter&amp;rsquo;s academic access program. The search queries used were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;url:nla.gov.au/nla.news&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;url:trove.nla.gov.au&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;url:newspapers.nla.gov.au&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Many of the tweets were produced by bots. Fortunately, I&amp;rsquo;d been maintaining a list of &lt;a href=&#34;https://web.archive.org/web/20180627053546/https://twitter.com/wragge/lists/trove-bots/members&#34;&gt;Trove bots&lt;/a&gt; on Twitter, so I used the list to separate the tweets into two files, one for bots and one for ordinary users.&lt;/p&gt;
&lt;p&gt;To respect user intentions and comply with the Twitter API terms of use, I removed all the tweet information except for &lt;code&gt;tweet_id&lt;/code&gt; and &lt;code&gt;tweet_date&lt;/code&gt; from the files. If it hasn&amp;rsquo;t been deleted, the full data for each tweet can probably be obtained from the X API using the &lt;code&gt;tweet_id&lt;/code&gt;, though you might need a paid subscription.&lt;/p&gt;
&lt;p&gt;The two main files are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;trove_url_tweets.csv&lt;/code&gt; – links shared by human users (although it may include some unidentified bots)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;trove_url_tweets_bots.csv&lt;/code&gt; – links shared by bots&lt;/li&gt;
&lt;/ul&gt;
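&lt;p&gt;The splitting and stripping described above can be sketched with pandas. The sample rows and the &lt;code&gt;screen_name&lt;/code&gt; column are illustrative stand-ins for the real Twarc output, not the actual harvested fields.&lt;/p&gt;

```python
import pandas as pd

# Stand-ins for the harvested Twarc data and the bot list – the
# sample rows and the "screen_name" column are illustrative only
tweets = pd.DataFrame([
    {"tweet_id": "1", "tweet_date": "2019-01-01", "screen_name": "someuser"},
    {"tweet_id": "2", "tweet_date": "2019-01-02", "screen_name": "TroveNewsBot"},
])
bot_names = ["TroveNewsBot", "TroveEmojiBot"]

# Split the tweets into bot and human sets, keeping only the id and date
is_bot = tweets["screen_name"].isin(bot_names)
shared_columns = ["tweet_id", "tweet_date"]
humans = tweets.loc[~is_bot, shared_columns]
bots = tweets.loc[is_bot, shared_columns]
humans.to_csv("trove_url_tweets.csv", index=False)
bots.to_csv("trove_url_tweets_bots.csv", index=False)
```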
&lt;p&gt;I also created some additional data files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;trove_url_totals.csv&lt;/code&gt; – the number of times each Trove link was shared by users (not including bots)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;active_users_per_year.csv&lt;/code&gt; – the number of unique users each year who shared a link to Trove&lt;/li&gt;
&lt;li&gt;&lt;code&gt;active_bots_per_year.csv&lt;/code&gt; – the number of active bots each year sharing links to Trove&lt;/li&gt;
&lt;/ul&gt;
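&lt;p&gt;These summary files are simple aggregations over the full harvest (computed before the extra fields were stripped). A pandas sketch of the idea – the &lt;code&gt;user&lt;/code&gt; and &lt;code&gt;trove_url&lt;/code&gt; column names and sample values are illustrative:&lt;/p&gt;

```python
import pandas as pd

# Sample rows standing in for the full harvest, before the extra
# fields were stripped – column names and values are illustrative
tweets = pd.DataFrame({
    "tweet_id": ["1", "2", "3"],
    "tweet_date": ["2019-05-01", "2019-06-02", "2020-01-03"],
    "user": ["alice", "bob", "alice"],
    "trove_url": ["article1", "article1", "article2"],
})
tweets["year"] = pd.to_datetime(tweets["tweet_date"]).dt.year

# Number of times each Trove link was shared
url_totals = tweets["trove_url"].value_counts()

# Number of unique users sharing Trove links each year
active_users_per_year = tweets.groupby("year")["user"].nunique()
```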
&lt;p&gt;There&amp;rsquo;s more information about the structure and contents of the data files &lt;a href=&#34;https://doi.org/10.5281/zenodo.15694063&#34;&gt;in the Zenodo record&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;overview&#34;&gt;Overview&lt;/h2&gt;
&lt;p&gt;I haven&amp;rsquo;t explored the data in detail, but here are some quick summaries to give you a taste.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;summary&lt;/th&gt;
&lt;th&gt;count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;number of unique users sharing Trove links&lt;/td&gt;
&lt;td&gt;9,296&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;number of bots sharing Trove links&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;number of tweets by humans containing Trove links&lt;/td&gt;
&lt;td&gt;48,323&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;number of tweets by bots containing Trove links&lt;/td&gt;
&lt;td&gt;318,767&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;number of unique links shared by humans&lt;/td&gt;
&lt;td&gt;36,906&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;number of unique links shared by bots&lt;/td&gt;
&lt;td&gt;270,474&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;What types of links were people sharing?&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;types of link shared by humans&lt;/th&gt;
&lt;th&gt;count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;newspaper article&lt;/td&gt;
&lt;td&gt;34,568&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;other (search queries, home page etc)&lt;/td&gt;
&lt;td&gt;8,388&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;work (items other than newspapers – books, maps, photos etc)&lt;/td&gt;
&lt;td&gt;4,856&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;newspaper page&lt;/td&gt;
&lt;td&gt;1,378&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;newspaper title&lt;/td&gt;
&lt;td&gt;406&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;How did the number of links shared by humans vary across time?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/tweets-per-year-updated.png&#34; width=&#34;600&#34; height=&#34;306&#34; alt=&#34;Bar chart showing the number of Trove links shared on Twitter by year from 2009 to 2020. Colours indicate the type of Trove resource.&#34;&gt;
&lt;p&gt;Which articles or pages were shared most often by humans? Here&amp;rsquo;s the top ten (click on the link to view).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;trove_id&lt;/th&gt;
&lt;th&gt;trove_type&lt;/th&gt;
&lt;th&gt;tweets&lt;/th&gt;
&lt;th&gt;retweets&lt;/th&gt;
&lt;th&gt;quotes&lt;/th&gt;
&lt;th&gt;total times shared&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/75869223&#34;&gt;75869223&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;1,232&lt;/td&gt;
&lt;td&gt;61&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;1,327&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/1298497&#34;&gt;1298497&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;141&lt;/td&gt;
&lt;td&gt;1,028&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;1,222&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/102074798&#34;&gt;102074798&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;74&lt;/td&gt;
&lt;td&gt;693&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;td&gt;844&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/68141866&#34;&gt;68141866&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;138&lt;/td&gt;
&lt;td&gt;522&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;708&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/41602327&#34;&gt;41602327&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;633&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;663&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/100645214&#34;&gt;100645214&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;111&lt;/td&gt;
&lt;td&gt;467&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;598&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/page/502650&#34;&gt;502650&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;page&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;513&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;526&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/60828173&#34;&gt;60828173&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;444&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;511&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/4173156&#34;&gt;4173156&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;321&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/79410604&#34;&gt;79410604&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;article&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;303&lt;/td&gt;
&lt;td&gt;69&lt;/td&gt;
&lt;td&gt;374&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The most shared article reports that PM Menzies had described Hitler as a &amp;lsquo;great man&amp;rsquo; at a meeting in July 1939. However, most of the tweets sharing this link came from a single user. A number of the other articles relate to the weather, a reflection of the fact that Trove&amp;rsquo;s newspaper articles have been mobilised on both sides of the climate change debate.&lt;/p&gt;
&lt;p&gt;How many Twitter users were sharing links to Trove each year?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/tweets-humans-per-year-updated.png&#34; width=&#34;600&#34; height=&#34;340&#34; alt=&#34;Bar chart showing the number of Twitter users sharing links to Trove each year from 2009 to 2020&#34;&gt;
&lt;p&gt;I haven&amp;rsquo;t included any of the bot data in these summaries because I think I&amp;rsquo;ll write a second bot-themed post – coming soon!&lt;/p&gt;
&lt;h2 id=&#34;updates&#34;&gt;Updates&lt;/h2&gt;
&lt;p&gt;I updated the data in this post on 19 June 2025, as I realised some Twitter accounts were originally run by humans before being bot-ified.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>GLAM Workbench – preprint for &#39;Building User-Friendly Toolkits and Platforms for Digital Humanities&#39;</title>
      <link>https://updates.timsherratt.org/2025/06/05/glam-workbench-preprint-for-building.html</link>
      <pubDate>Thu, 05 Jun 2025 16:16:53 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/06/05/glam-workbench-preprint-for-building.html</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a preprint of my contribution to the publication &amp;lsquo;Building User-Friendly Toolkits and Platforms for Digital Humanities&amp;rsquo;. It provides a brief overview of the GLAM Workbench. I had to leave a lot out, but hopefully it provides a useful summary of what the GLAM Workbench is, and what I&amp;rsquo;d like it to be.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.15597924&#34;&gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.15597924.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;The GLAM Workbench is a collection of tools and resources created to help researchers use and explore the digital collections of GLAM organisations (galleries, libraries, archives, and museums).&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt; It&amp;rsquo;s mainly focused on collections from Australia and New Zealand, but some sections venture across international boundaries to explore topics such as web archives and Wikidata.&lt;/p&gt;
&lt;p&gt;GLAM organisations make a lot of rich cultural data available online, but getting that data in a machine-readable form that can be aggregated and analysed is often difficult. The GLAM Workbench tries to fill this gap by providing code examples and API documentation, but data access alone is not enough. Researchers need to understand the history, structure, and extent of the data – both its limits and its possibilities. By sharing snapshots, building overviews, and exploring patterns and inconsistencies, the GLAM Workbench also attempts to contextualise GLAM collections and open them to new types of questions.&lt;sup id=&#34;fnref:2&#34;&gt;&lt;a href=&#34;#fn:2&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&#34;history-and-motivation&#34;&gt;History and motivation&lt;/h2&gt;
&lt;p&gt;I created the GLAM Workbench in 2017, but it incorporates the latest versions of tools, such as the Trove Newspaper Harvester, which I&amp;rsquo;ve been maintaining for more than 15 years.&lt;sup id=&#34;fnref:3&#34;&gt;&lt;a href=&#34;#fn:3&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;3&lt;/a&gt;&lt;/sup&gt; One of my motivations was simply to bring together useful snippets, notes, and doodles from a variety of blog posts, web applications, and code repositories, and make them available in a form that could be more easily navigated and maintained.&lt;/p&gt;
&lt;p&gt;I was also keen to explore the way that Jupyter notebooks combine code and narrative. I wanted to find ways to support researchers as they developed their digital skills and confidence, not just dump them at the command line or point them to an app.&lt;/p&gt;
&lt;p&gt;The ongoing development of the GLAM Workbench is also part of my own research. I&amp;rsquo;m interested in the meaning of access within the context of GLAM collections. What changes when you can download data and explore collections beyond the limitations of the web interface?&lt;sup id=&#34;fnref:4&#34;&gt;&lt;a href=&#34;#fn:4&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&#34;contents-and-technologies&#34;&gt;Contents and technologies&lt;/h2&gt;
&lt;p&gt;At its heart, the GLAM Workbench comprises at least 171 Jupyter notebooks and 59 datasets shared through more than 70 GitHub repositories.&lt;sup id=&#34;fnref:5&#34;&gt;&lt;a href=&#34;#fn:5&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;5&lt;/a&gt;&lt;/sup&gt; Added to this are a number of web apps, online databases, and guides to related resources. Code from some notebooks has also been spun off into independent Python packages. All of this is brought together within a single documentation site, built using MkDocs Material.&lt;/p&gt;
&lt;p&gt;The contents are mostly organised by institution, reflecting the idiosyncrasies of the data. I&amp;rsquo;ve partially implemented tags to draw together similar resources across institutions, but this needs to be made more consistent, ideally using the TaDiRAH taxonomy.&lt;sup id=&#34;fnref:6&#34;&gt;&lt;a href=&#34;#fn:6&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;6&lt;/a&gt;&lt;/sup&gt; Many of the notebooks describe methods for accessing data and building datasets. Others demonstrate techniques for visualisation and analysis, suggest workarounds for limits imposed by collection interfaces, or provide example-driven documentation for APIs and datasets.&lt;/p&gt;
&lt;p&gt;There is no single platform or server underlying the GLAM Workbench. Instead, it follows a pattern described in the ARDC Community Data Lab&amp;rsquo;s architecture principles as &amp;lsquo;infrastructure at rest&amp;rsquo;.&lt;sup id=&#34;fnref:7&#34;&gt;&lt;a href=&#34;#fn:7&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;7&lt;/a&gt;&lt;/sup&gt; Notebooks can be run as required in a variety of contexts from cloud services to local computers. This is made possible by standardised configuration files and automated processes that build virtual computing environments from each GitHub repository.&lt;sup id=&#34;fnref:8&#34;&gt;&lt;a href=&#34;#fn:8&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;8&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
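&lt;p&gt;As a minimal illustration (not copied from any particular GLAM Workbench repository, and with hypothetical version pins), the kind of standardised configuration read by build tools such as Jupyter&amp;rsquo;s repo2docker can be as small as two files at the root of a repository:&lt;/p&gt;

```text
# requirements.txt: the Python packages the notebooks depend on
jupyterlab==4.1.5
pandas==2.2.1
requests==2.31.0

# runtime.txt: the Python version used to build the environment
python-3.10
```

&lt;p&gt;Given files like these, a service such as Binder (or a local repo2docker build) can reconstruct the same environment on demand, which is what lets the notebooks sit &amp;lsquo;at rest&amp;rsquo; until someone runs them.&lt;/p&gt;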
&lt;h2 id=&#34;impact-and-engagement&#34;&gt;Impact and engagement&lt;/h2&gt;
&lt;p&gt;The GLAM Workbench has helped to expand understanding of the research possibilities of GLAM collection data. The list of publications citing the GLAM Workbench or one of its embedded tools now includes more than 100 entries.&lt;sup id=&#34;fnref:9&#34;&gt;&lt;a href=&#34;#fn:9&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;9&lt;/a&gt;&lt;/sup&gt; Some of these relate to individual research projects, while others survey the practices of GLAM organisations and the needs of research infrastructure around the world.&lt;/p&gt;
&lt;p&gt;My work on the GLAM Workbench has helped inspire organisations such as the National Library of Scotland to explore new ways of supporting digital research.&lt;sup id=&#34;fnref:10&#34;&gt;&lt;a href=&#34;#fn:10&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;10&lt;/a&gt;&lt;/sup&gt; A recent report from the &amp;lsquo;Towards a National Collection&amp;rsquo; project in the UK has mentioned the GLAM Workbench alongside a number of national libraries in Europe and the USA for &amp;lsquo;encouraging innovative research and expanding public engagement with heritage resources&amp;rsquo;.&lt;sup id=&#34;fnref:11&#34;&gt;&lt;a href=&#34;#fn:11&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;11&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;And yet, there are disappointments. Most of the Australian GLAM organisations whose collections are featured in the GLAM Workbench have shown little interest in sharing or engaging with its resources. This makes it difficult to get tools to the people who could benefit from them. There&amp;rsquo;s some irony in the fact that the websites of the National Library of Scotland, the British Library, the UK National Archives, the V&amp;amp;A Museum, and DigitalNZ all include links to the GLAM Workbench, but the National Library of Australia (NLA) and the National Archives of Australia (NAA) do not.&lt;/p&gt;
&lt;h2 id=&#34;maintenance-and-sustainability&#34;&gt;Maintenance and sustainability&lt;/h2&gt;
&lt;p&gt;While a number of individuals have contributed notebooks and additions to the GLAM Workbench, it remains essentially a one-man operation. Over the years, I&amp;rsquo;ve sought to ease the maintenance burden by automating processes, adding some basic testing frameworks, and generating machine-readable metadata that summarises the contents of each repository. For example, I created a GLAM Workbench repository template that makes it easy to start work on a new topic.&lt;sup id=&#34;fnref:12&#34;&gt;&lt;a href=&#34;#fn:12&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;12&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Development of the web archives section of the GLAM Workbench was made possible by a grant from the International Internet Preservation Consortium, and the section&amp;rsquo;s ongoing maintenance is supported by the British Library.&lt;sup id=&#34;fnref:13&#34;&gt;&lt;a href=&#34;#fn:13&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;13&lt;/a&gt;&lt;/sup&gt; I&amp;rsquo;m grateful too for my GitHub sponsors who help cover some of my cloud hosting bills, and to the ARDC for funding to integrate RO-Crate metadata.&lt;sup id=&#34;fnref:14&#34;&gt;&lt;a href=&#34;#fn:14&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;14&lt;/a&gt;&lt;/sup&gt; But beyond this, the GLAM Workbench has received no dedicated funding or institutional support. It has, nonetheless, outlived some well-funded digital infrastructure projects in the HASS sector.&lt;/p&gt;
&lt;p&gt;Sustainability means more than money, though. The GLAM Workbench doesn&amp;rsquo;t have to continue in its current form to have a long-term impact. My focus is on ensuring that its contents are open to future reuse and modification. Everything is openly licensed, published through GitHub, and preserved in Zenodo. If tools are useful they can live on, independent of me.&lt;/p&gt;
&lt;h2 id=&#34;the-future&#34;&gt;The future&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m writing this at a difficult time. Changes wrought by the NLA and NAA in early 2025 have made it impossible for me to continue work on the Trove and RecordSearch sections of the GLAM Workbench.&lt;sup id=&#34;fnref:15&#34;&gt;&lt;a href=&#34;#fn:15&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;15&lt;/a&gt;&lt;/sup&gt; In the Trove section alone, there are more than 70 notebooks.&lt;/p&gt;
&lt;p&gt;The GLAM Workbench is not my job; no one pays me. I work on it because I think it&amp;rsquo;s useful and important, and because I enjoy the process of solving problems and helping researchers. The NLA&amp;rsquo;s actions, in particular, have robbed me of that joy, and made me consider whether I want to continue. Research infrastructure is people.&lt;/p&gt;
&lt;p&gt;On the other hand, there are many more GLAM collections for me to explore. I&amp;rsquo;m also hoping to find new ways of collaborating with individuals and institutions. I&amp;rsquo;m often inspired to create new tools and resources by gnarly questions from researchers. As long as such questions keep coming, the GLAM Workbench will keep growing.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;
&lt;p&gt;Ames, Sarah, and Lucy Havens. “Exploring National Library of Scotland Datasets with Jupyter Notebooks.” &lt;em&gt;IFLA Journal&lt;/em&gt;, December 27, 2021. &lt;a href=&#34;https://doi.org/10.1177/03400352211065484&#34;&gt;doi.org/10.1177/0&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Bailey, Rebecca, Javier Pereda, Chris Michaels, and Tom Callahan. “Unlocking the Potential of Digital Collections. A Call to Action.” Towards a National Collection, November 21, 2024. &lt;a href=&#34;https://doi.org/10.5281/zenodo.13838916&#34;&gt;doi.org/10.5281/z&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Candela, Gustavo, Sally Chambers, and Tim Sherratt. “An Approach to Assess the Quality of Jupyter Projects Published by GLAM Institutions.” &lt;em&gt;Journal of the Association for Information Science and Technology&lt;/em&gt; 74, no. 13 (2023): 1550–64. &lt;a href=&#34;https://doi.org/10.1002/asi.24835&#34;&gt;doi.org/10.1002/a&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;“GLAM Workbench (GitHub Organisation).” Accessed June 5, 2025. &lt;a href=&#34;https://github.com/GLAM-Workbench&#34;&gt;github.com/GLAM-Work&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;IIPC. “Asking Questions with Web Archives – Introductory Notebooks for Historians.” Accessed June 5, 2025. &lt;a href=&#34;https://netpreserve.org/projects/jupyter-notebooks-for-historians/&#34;&gt;netpreserve.org/projects/&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Jackson, Andy. “GLAM Workbench Update.” UK Web Archive Blog. Accessed June 2, 2025. &lt;a href=&#34;https://blogs.bl.uk/webarchive/2022/09/glam-workbench-update.html&#34;&gt;blogs.bl.uk/webarchiv&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Sefton, Peter, Tom Honeyman, Tim Sherratt, and Conal Tuohy. “The ARDC Community Data Lab Architecture: Research Software Deployment Principles and Patterns for Integrity, Reproducibility and Sustainability,” May 10, 2024. &lt;a href=&#34;https://zenodo.org/records/11169744&#34;&gt;zenodo.org/records/1&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Sherratt, Tim. “Develop a New GLAM Workbench Repository.” GLAM Workbench. Accessed June 5, 2025. &lt;a href=&#34;https://glam-workbench.net/get-involved/developing-repositories/&#34;&gt;glam-workbench.net/get-invol&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Farewell Trove.” &lt;em&gt;Tim Sherratt – Sharing Recent Updates and Work-in-Progress&lt;/em&gt;, May 7, 2025. &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;updates.timsherratt.org/2025/05/0&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “GLAM Workbench.” Zenodo, June 5, 2025. &lt;a href=&#34;https://doi.org/10.5281/zenodo.15597489&#34;&gt;doi.org/10.5281/z&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “GLAM Workbench.” Accessed June 5, 2025. &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;glam-workbench.net/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “GLAM Workbench Citations.” &lt;em&gt;GLAM Workbench&lt;/em&gt;. Accessed June 5, 2025. &lt;a href=&#34;https://glam-workbench.net/citations/&#34;&gt;glam-workbench.net/citations&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Hacking Heritage: Understanding the Limits of Online Access.” In &lt;em&gt;The Routledge International Handbook of New Digital Practices in Galleries, Libraries, Archives, Museums and Heritage Sites&lt;/em&gt;, edited by H Lewi, W Smith, S Cooke, and D vom Lehn, 116–30. London &amp;amp; New York: Routledge, 2020. &lt;a href=&#34;https://doi.org/10.5281/zenodo.5035855&#34;&gt;doi.org/10.5281/z&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “No More Harvesting Data from the National Archives of Australia.” &lt;em&gt;Tim Sherratt – Sharing Recent Updates and Work-in-Progress&lt;/em&gt;, May 19, 2025. &lt;a href=&#34;https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html&#34;&gt;updates.timsherratt.org/2025/05/1&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Some Important Updates for the Trove Newspaper &amp;amp; Gazette Harvester.” &lt;em&gt;Tim Sherratt – Sharing Recent Updates and Work-in-Progress&lt;/em&gt;, August 31, 2023. &lt;a href=&#34;https://updates.timsherratt.org/2023/08/31/some-important-updates.html&#34;&gt;updates.timsherratt.org/2023/08/3&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Supporters.” &lt;em&gt;GLAM Workbench&lt;/em&gt;. Accessed June 5, 2025. &lt;a href=&#34;https://glam-workbench.net/get-involved/supporters/&#34;&gt;glam-workbench.net/get-invol&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Trove Newspapers: Data Dashboard.” Accessed June 5, 2025. &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;wragge.github.io/trove-new&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;———. “Trove-Newspaper-Harvester.” Python, October 23, 2023. &lt;a href=&#34;https://doi.org/10.5281/zenodo.7103174&#34;&gt;doi.org/10.5281/z&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Sherratt, Tim, Harry Keightley, Ben Foley, and Michael Niemann. “GLAM-Workbench/Glam-Workbench-Template.” Python. GLAM Workbench, August 24, 2023. &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench-template&#34;&gt;github.com/GLAM-Work&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;“TaDiRAH: The Taxonomy of Digital Research Activities in the Humanities.” Accessed June 5, 2025. &lt;a href=&#34;https://tadirah.info/&#34;&gt;tadirah.info/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Talboom, Leontien, and Mark Bell. “Keeping It under Lock and Keywords: Exploring New Ways to Open up the Web Archives with Notebooks.” &lt;em&gt;Archival Science&lt;/em&gt;, July 4, 2022. &lt;a href=&#34;https://doi.org/10.1007/s10502-022-09391-6&#34;&gt;doi.org/10.1007/s&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;“Trove Historical Data.” Accessed June 5, 2025. &lt;a href=&#34;https://zenodo.org/communities/trove-historical-data/records?q=&amp;amp;l=list&amp;amp;p=1&amp;amp;s=10&amp;amp;sort=newest&#34;&gt;zenodo.org/communiti&amp;hellip;&lt;/a&gt;.&lt;/p&gt;
&lt;section class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt, “GLAM Workbench.”&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:2&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;See, for example: Sherratt, “Trove Newspapers: Data Dashboard” and “Trove Historical Data.”&amp;#160;&lt;a href=&#34;#fnref:2&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:3&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt, “Trove-Newspaper-Harvester.”&amp;#160;&lt;a href=&#34;#fnref:3&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:4&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;See, for example: Sherratt, “Hacking Heritage: Understanding the Limits of Online Access.”&amp;#160;&lt;a href=&#34;#fnref:4&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:5&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;“GLAM Workbench (GitHub Organisation).”&amp;#160;&lt;a href=&#34;#fnref:5&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:6&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;“TaDiRAH: The Taxonomy of Digital Research Activities in the Humanities.”&amp;#160;&lt;a href=&#34;#fnref:6&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:7&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sefton et al., “The ARDC Community Data Lab Architecture.”&amp;#160;&lt;a href=&#34;#fnref:7&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:8&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;For more on best practices in sharing Jupyter projects, see: Candela, Chambers, and Sherratt, “An Approach to Assess the Quality of Jupyter Projects Published by GLAM Institutions.”&amp;#160;&lt;a href=&#34;#fnref:8&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:9&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt, “GLAM Workbench Citations.”&amp;#160;&lt;a href=&#34;#fnref:9&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:10&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Ames and Havens, “Exploring National Library of Scotland Datasets with Jupyter Notebooks.” For another example of the GLAM Workbench&amp;rsquo;s influence, see: Talboom and Bell, “Keeping It under Lock and Keywords.”&amp;#160;&lt;a href=&#34;#fnref:10&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:11&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Bailey et al., “Unlocking the Potential of Digital Collections. A Call to Action,” 58.&amp;#160;&lt;a href=&#34;#fnref:11&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:12&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt et al., “GLAM-Workbench/Glam-Workbench-Template.” For documentation see: Sherratt, “Develop a New GLAM Workbench Repository.”&amp;#160;&lt;a href=&#34;#fnref:12&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:13&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;“Asking Questions with Web Archives – Introductory Notebooks for Historians”; Jackson, “GLAM Workbench Update.”&amp;#160;&lt;a href=&#34;#fnref:13&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:14&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt, “Supporters”; Sherratt, “Some Important Updates for the Trove Newspaper &amp;amp; Gazette Harvester.”&amp;#160;&lt;a href=&#34;#fnref:14&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:15&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Sherratt, “Farewell Trove”; Sherratt, “No More Harvesting Data from the National Archives of Australia.”&amp;#160;&lt;a href=&#34;#fnref:15&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</description>
    </item>
    
    <item>
      <title>No more harvesting data from the National Archives of Australia</title>
      <link>https://updates.timsherratt.org/2025/05/19/no-more-harvesting-data-from.html</link>
      <pubDate>Mon, 19 May 2025 18:57:26 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/05/19/no-more-harvesting-data-from.html</guid>
      <description>&lt;p&gt;A couple of weeks ago &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;I bid farewell to Trove&lt;/a&gt; due to the cancellation of my API keys and the NLA&amp;rsquo;s lack of transparency around changes to API access. Now it seems I have to wave goodbye to 16+ years of work on RecordSearch, the National Archives of Australia&amp;rsquo;s online database.&lt;/p&gt;
&lt;p&gt;I noticed this morning that my weekly &lt;a href=&#34;https://github.com/wragge/naa-recently-digitised&#34;&gt;harvest of recently digitised files&lt;/a&gt; in RecordSearch had failed. A quick check showed that my harvester was being blocked by Cloudflare&amp;rsquo;s bot protection software. I wasn&amp;rsquo;t really surprised. Websites are using tools like this to protect themselves against AI scraper bots, and I&amp;rsquo;d already seen it in action on another Australian government site. In the war between content providers and AI scrapers, researchers and digital preservation efforts are copping collateral damage.&lt;/p&gt;
&lt;p&gt;But while we can&amp;rsquo;t blame the NAA for safeguarding its systems, we can be critical of the fact that it still doesn&amp;rsquo;t provide its collection data in machine-readable form. There were a couple of datasets shared for a GovHack event many years ago, a short-lived API for the WWI service records in series B2455, and an API attached to a beta discovery service that never saw the light of day (despite many $$$ being spent on it). Without direct access to the data, researchers have had to scrape it from RecordSearch&amp;rsquo;s web interface. That&amp;rsquo;s no longer possible.&lt;/p&gt;
&lt;p&gt;I &lt;a href=&#34;https://discontents.com.au/tag/recordsearch/index.html%3Fpaged=2.html&#34;&gt;started scraping data&lt;/a&gt; from RecordSearch back in 2008 when I was working at the NAA. Eventually I packaged up &lt;a href=&#34;https://github.com/wragge/recordsearch_tools&#34;&gt;some Python code&lt;/a&gt; to help other researchers create datasets. This was completely rewritten as the &lt;a href=&#34;https://github.com/wragge/recordsearch_data_scraper&#34;&gt;RecordSearch Data Scraper&lt;/a&gt; a few years back, and you can find various tools and examples using it in the &lt;a href=&#34;https://glam-workbench.net/recordsearch/&#34;&gt;RecordSearch section of the GLAM Workbench&lt;/a&gt;. In theory, I might be able to modify the scraper to get around the bot protection, but with the bot wars escalating, it hardly seems worth it – I might get the scraper working, only for it to fall foul of the latest bot detection rules. It&amp;rsquo;s really now up to the NAA to decide whether it will find other ways to give researchers access to its data.&lt;/p&gt;
&lt;p&gt;So it seems like I&amp;rsquo;ll be archiving all my RecordSearch code. Unfortunately, many of the RecordSearch notebooks in the GLAM Workbench will no longer work, so I&amp;rsquo;ll be adding warnings and explanations over coming weeks.&lt;/p&gt;
&lt;p&gt;While not entirely unexpected, it&amp;rsquo;s all pretty sad. The RecordSearch scrapers have powered some of my favourite research projects. They enabled Kate Bagnall and me to download the metadata and images behind &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/&#34;&gt;The Real Face of White Australia&lt;/a&gt; – a process we described in our article &lt;a href=&#34;https://doi.org/10.5281/zenodo.3579530&#34;&gt;&amp;lsquo;The people inside&amp;rsquo;&lt;/a&gt;. Using the scrapers I&amp;rsquo;ve been able to &lt;a href=&#34;https://insidestory.org.au/withheld-pending-advice/&#34;&gt;analyse the process of access examination&lt;/a&gt;, and &lt;a href=&#34;https://updates.timsherratt.org/2021/04/21/secrets-and-lives.html&#34;&gt;extract thousands of redactions&lt;/a&gt; from digitised ASIO surveillance files. Without the scrapers I would never have discovered &lt;a href=&#34;https://github.com/wragge/diy-redactionart&#34;&gt;#redactionart&lt;/a&gt;!&lt;/p&gt;
&lt;iframe src=&#34;https://player.vimeo.com/video/215976633?badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479&#34; frameborder=&#34;0&#34; allow=&#34;autoplay; fullscreen; picture-in-picture; clipboard-write; encrypted-media&#34; style=&#34;width:100%;height:400px;&#34; title=&#34;The Redaction Zoo&#34;&gt;&lt;/iframe&gt;
&lt;p&gt;But while I won&amp;rsquo;t be harvesting any new data, I have a few datasets that I&amp;rsquo;d like to explore further. Fortunately, I just finished compiling some summary data about every series in RecordSearch, and I want to compare this latest harvest with datasets from 2021 and 2022. I need to do some more analysis of &lt;a href=&#34;https://updates.timsherratt.org/2025/02/05/ten-years-of-data-the.html&#34;&gt;ten years&#39; worth of data&lt;/a&gt; capturing the details of files with the access status of &amp;lsquo;closed&amp;rsquo;. I&amp;rsquo;ve been working on an update to my redaction finder, which I think I should still be able to finish. And there&amp;rsquo;s also a lot of data that volunteers have transcribed from records relating to the Real Face of White Australia that I need to pull together.&lt;/p&gt;
&lt;p&gt;While my life has been dominated by Trove in recent years, as a historian my heart has always been with the collections of the National Archives of Australia. I&amp;rsquo;m hoping this is just a temporary setback, and that new methods for data access will emerge.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Farewell Trove</title>
      <link>https://updates.timsherratt.org/2025/05/07/farewell-trove.html</link>
      <pubDate>Wed, 07 May 2025 14:51:34 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/05/07/farewell-trove.html</guid>
      <description>&lt;p&gt;Over the last few months I&amp;rsquo;ve been grappling with the &lt;a href=&#34;https://updates.timsherratt.org/2025/04/11/update-on-trove-data-access.html&#34;&gt;cancellation of my Trove API keys by the National Library of Australia&lt;/a&gt;. It may seem like a minor technical hiccup from the outside, but it&amp;rsquo;s had a major personal impact. For the sake of my health, I&amp;rsquo;ve decided to stop work on Trove, archive all my code repositories related to Trove, and move on. Farewell Trove.&lt;/p&gt;
&lt;p&gt;But don&amp;rsquo;t panic! All of my Trove tools and resources available through the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; and elsewhere will remain online. They just won&amp;rsquo;t be updated. I&amp;rsquo;ll be adding explanatory notices to the affected resources over coming weeks. All of my stuff is openly licensed, so feel free to take what&amp;rsquo;s useful and develop it further yourself.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll also be adding warnings for researchers planning to use the Trove API in their projects. Given that the NLA is willing to change the API terms of use to restrict access without any consultation, provides no transparency around acceptable use of full text content, and is prepared to cancel API keys without warning, I can no longer recommend Trove as a reliable source for digital research. A PhD student could embark on a project in good faith, only to have the rules change mid-project.&lt;/p&gt;
&lt;p&gt;I think this is a critical issue for the research sector, and hard questions need to be asked of the NLA. But I can&amp;rsquo;t be the one to do this any more. I&amp;rsquo;m sick of being the person calling the NLA out on its bad behaviour. I&amp;rsquo;m sick of their gaslighting.&lt;/p&gt;
&lt;p&gt;I wanted to avoid making any dramatic gestures, but after talking it over with my partner last night, I realised my health is really suffering and I need to make a change. I also realised that even if my API keys were magically restored, I&amp;rsquo;d always be looking over my shoulder, wondering if I&amp;rsquo;d done something to offend the NLA gatekeepers. That&amp;rsquo;s not a good way to live. I&amp;rsquo;d rather spend my time working with organisations who value what I do.&lt;/p&gt;
&lt;h2 id=&#34;addendum-22-may-2025&#34;&gt;Addendum, 22 May 2025&lt;/h2&gt;
&lt;p&gt;I noticed last night that the Trove API key application process had changed. Previously, non-commercial use was approved automatically. Now you have to fill in a two-page form justifying your proposed use. Your application is then assessed against a complex four-level review matrix. Responses are provided within &lt;strong&gt;7 to 28 days&lt;/strong&gt;. If you want to download the full text of resources, such as digitised newspapers, you &lt;strong&gt;additionally&lt;/strong&gt; need to apply for an exemption to the terms of use.&lt;/p&gt;
&lt;p&gt;The NLA has also changed the API terms of use, removing all reference to unauthenticated access. Limited unauthenticated (or keyless) access was introduced with version 3 of the API and was useful for quick experimentation, demonstrations, and teaching. Unauthenticated access has now been disabled, and all attempts to access the API without a key return an error. This change, together with the new and complex application process, makes it difficult, if not impossible, to use the Trove API in teaching and training contexts.&lt;/p&gt;
&lt;p&gt;These changes will further discourage use of the Trove API, and that&amp;rsquo;s probably the point. I wouldn&amp;rsquo;t be surprised if further limits were imposed in the future, or if the NLA decommissioned the API entirely.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;All posts on this topic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;25 February 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html&#34;&gt;15 years of work on Trove threatened by the NLA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2 March 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/03/02/trove-api-users-beware-the.html&#34;&gt;Trove API users beware! – the latest in the saga of my cancelled API keys&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;11 April 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/04/11/update-on-trove-data-access.html&#34;&gt;Update on Trove data access and my suspended API keys&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;7 May 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;Farewell Trove&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>SLV LAB and GLAM Workbench updates</title>
      <link>https://updates.timsherratt.org/2025/05/05/slv-lab-and-glam-workbench.html</link>
      <pubDate>Mon, 05 May 2025 14:26:15 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/05/05/slv-lab-and-glam-workbench.html</guid>
      <description>&lt;p&gt;Last week the State Library of Victoria launched &lt;a href=&#34;https://lab.slv.vic.gov.au/&#34;&gt;SLV LAB&lt;/a&gt;, a prototyping and innovation lab that &amp;lsquo;experiment[s] with technology to open access to collections, data and spaces&amp;rsquo;. The SLV LAB encourages collaboration, and is &lt;a href=&#34;https://lab.slv.vic.gov.au/resources&#34;&gt;sharing code, datasets, and tutorials&lt;/a&gt;. It&amp;rsquo;s an exciting development and I&amp;rsquo;m looking forward to seeing what they get up to. I&amp;rsquo;ve added SLV LAB to the &lt;a href=&#34;https://glam-workbench.net/glam-data-list/#glam-data-portals-repositories&#34;&gt;GLAM data portals &amp;amp; repositories&lt;/a&gt; section of my Australian GLAM data list.&lt;/p&gt;
&lt;p&gt;The launch prompted me to have a look at the SLV section of the GLAM Workbench, which I added about 5 years ago. There are currently two notebooks, both relating to the SLV&amp;rsquo;s use of &lt;a href=&#34;https://iiif.io/&#34;&gt;IIIF&lt;/a&gt; to deliver their images. When I created them, there was an issue with IIIF image links needing to have a cookie set before you could access them, but that now seems to have been fixed, so I thought it was time for an update.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/state-library-victoria/download_image_from_iiif/&#34;&gt;Download an image using the IIIF server and a Handle url&lt;/a&gt; – The SLV uses the Handle system to create persistent urls for images, and IIIF to deliver the images for use. But how do you get from one to the other? This notebook uses the Handle url to find an image&amp;rsquo;s IIIF identifier, and then uses IIIF to download the image. The Handle urls are aggregated into Trove, so you could use this method to download SLV images from Trove metadata harvests.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/state-library-victoria/more_fun_with_iiif/&#34;&gt;More fun with IIIF&lt;/a&gt; – This notebook demonstrates how you can use the standard IIIF API to manipulate images from the SLV collection.&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-05-05-13-13-37.png&#34; width=&#34;600&#34; height=&#34;544&#34; alt=&#34;Screen capture of the More fun with IIIF notebook demonstrating how to rotate an image.&#34;&gt;
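The manipulations in the 'More fun with IIIF' notebook all come down to building URLs according to the IIIF Image API pattern `{server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. As a minimal sketch (the server URL and identifier below are placeholders, not real SLV values):

```python
# IIIF Image API URL template: everything about the derived image --
# cropping, scaling, rotation, quality -- is encoded in the URL path.
IIIF_TEMPLATE = "{server}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

def iiif_url(server, identifier, region="full", size="full",
             rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API URL from its path components."""
    return IIIF_TEMPLATE.format(
        server=server, identifier=identifier, region=region,
        size=size, rotation=rotation, quality=quality, fmt=fmt,
    )

# The whole image, rotated 90 degrees -- 'iiif.example.org' and
# 'IMAGE_ID' are placeholders for a real server and identifier.
print(iiif_url("https://iiif.example.org/iiif/2", "IMAGE_ID", rotation="90"))
```

Requesting that URL returns a freshly derived image, so rotating, cropping, or resizing never needs any code beyond changing a path segment.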
&lt;p&gt;I&amp;rsquo;ve removed the cookie-handling code and simplified some of the code in the notebooks. I&amp;rsquo;ve also updated the repository to embed the GLAM Workbench&amp;rsquo;s latest systems and integrations. This means, among other things, that the repository is &lt;a href=&#34;https://doi.org/10.5281/zenodo.15321603&#34;&gt;preserved in Zenodo with a DOI&lt;/a&gt;, and a Docker image is automatically built that makes it easy to run the notebooks in a variety of contexts – including the &lt;a href=&#34;https://glam-workbench.net/using-ardc-binder/&#34;&gt;ARDC Binder service&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Perhaps most interestingly, I&amp;rsquo;ve also created a &lt;a href=&#34;https://glam-workbench.net/state-library-victoria-jlite/lab/index.html?path=more_fun_with_iiif.ipynb&#34;&gt;Jupyter Lite version of the &amp;lsquo;More fun with IIIF&amp;rsquo; notebook&lt;/a&gt; that runs in your browser without any need for a cloud server. Unfortunately, I can&amp;rsquo;t do the same with the Handle/IIIF notebook because Jupyter Lite runs afoul of CORS permissions when requesting the Handle url.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m looking forward to adding additional notebooks and examples as the SLV LAB develops, and shares more data and code.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>New PROV section added to the GLAM Workbench</title>
      <link>https://updates.timsherratt.org/2025/04/30/new-prov-section-added-to.html</link>
      <pubDate>Wed, 30 Apr 2025 15:00:09 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/04/30/new-prov-section-added-to.html</guid>
      <description>&lt;p&gt;There&amp;rsquo;s a brand &lt;a href=&#34;https://glam-workbench.net/prov/&#34;&gt;new GLAM Workbench section&lt;/a&gt; to help you work with data from the Public Record Office Victoria!&lt;/p&gt;
&lt;p&gt;Over the past couple of months, I&amp;rsquo;ve been poking around in the &lt;a href=&#34;https://prov.vic.gov.au/about-us/our-blog/new-prov-public-api&#34;&gt;PROV&amp;rsquo;s collection API&lt;/a&gt;. The API provides data about PROV&amp;rsquo;s archival holdings in a machine-readable format. This makes it possible to use, analyse, and visualise the collection in new ways.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve already shared a few of the results of my explorations. There&amp;rsquo;s &lt;a href=&#34;https://updates.timsherratt.org/2025/04/09/introducing-provbot-sharing-photos-from.html&#34;&gt;PROVbot sharing randomly-selected photos&lt;/a&gt; via the Fediverse; a &lt;a href=&#34;https://updates.timsherratt.org/2025/04/10/using-the-public-record-office.html&#34;&gt;data dashboard providing an overview of the PROV collection&lt;/a&gt;; and &lt;a href=&#34;https://updates.timsherratt.org/2025/04/09/more-than-million-rows-of.html&#34;&gt;6 million rows of PROV data added to the GLAM Name Index Search&lt;/a&gt;. At the same time I&amp;rsquo;ve been documenting how the API works, and the sorts of data it provides. I&amp;rsquo;ve now compiled this documentation into a Jupyter notebook – &lt;a href=&#34;https://glam-workbench.net/prov/getting-started/&#34;&gt;Getting started with the PROV API&lt;/a&gt; – and added it to the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;The PROV API provides a lot of rich, interconnected data, but there&amp;rsquo;s not much documentation on the PROV website. I&amp;rsquo;m hoping that this new section of the GLAM Workbench will encourage people to explore its possibilities. I&amp;rsquo;ll be adding more notebooks over time, examining the nature of the data in more depth, and probably creating a few useful tools and visualisations. Let me know if you have ideas for new notebooks!&lt;/p&gt;
&lt;p&gt;I recently made one of the GLAM Workbench&amp;rsquo;s introductory notebooks &lt;a href=&#34;https://updates.timsherratt.org/2025/04/28/the-glam-workbench-introduction-to.html&#34;&gt;available to run live using Jupyter Lite&lt;/a&gt;. This means the notebook loads everything it needs within your browser, rather than depending on a separate cloud service. The new PROV section comes with Jupyter Lite support baked in. If you go to the &lt;a href=&#34;https://glam-workbench.net/prov/getting-started/&#34;&gt;page describing the API notebook&lt;/a&gt;, you&amp;rsquo;ll notice there&amp;rsquo;s a brand new option to run the notebook using &lt;a href=&#34;https://jupyterlite.readthedocs.io/en/latest/&#34;&gt;Jupyter Lite&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-04-30-13-27-11.png&#34; width=&#34;600&#34; height=&#34;214&#34; alt=&#34;Screenshot of the &#39;Using this notebook&#39; section of the page showing the option to run the notebook using JupyterLite&#34;&gt;
&lt;p&gt;This is the first time I&amp;rsquo;ve integrated Jupyter Lite in this way, and I think it opens up some exciting possibilities. Not only does it make it easier and quicker to jump in and start playing, it means you can embed a live, working version of the notebook in any web page. Like this!&lt;/p&gt;
&lt;iframe src=&#34;https://glam-workbench.net/prov-jlite/lab/index.html?path=getting-started.ipynb&#34; width=&#34;100%&#34;  height=&#34;500&#34;&gt;&lt;/iframe&gt;
&lt;p&gt;And a reminder – this work on the PROV API, like most of the GLAM Workbench, has received no direct funding. I do it  because I want to help researchers use GLAM collections in new ways. If you&amp;rsquo;d like to support the GLAM Workbench you can &lt;a href=&#34;https://github.com/sponsors/wragge&#34;&gt;sponsor me on GitHub&lt;/a&gt; or &lt;a href=&#34;https://www.buymeacoffee.com/wragge&#34;&gt;buy me a coffee&lt;/a&gt;. Most importantly, please share this information with anyone you think might find it useful.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The GLAM Workbench introduction to how notebooks work now runs in Jupyter Lite</title>
      <link>https://updates.timsherratt.org/2025/04/28/the-glam-workbench-introduction-to.html</link>
      <pubDate>Mon, 28 Apr 2025 12:51:42 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/04/28/the-glam-workbench-introduction-to.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve just updated my introduction to &lt;a href=&#34;https://glam-workbench.net/getting-started-jlite/lab/index.html?path=using_jupyter_notebooks.ipynb&#34;&gt;using Jupyter notebooks in the GLAM Workbench&lt;/a&gt; so that it runs in &lt;a href=&#34;https://jupyterlite.readthedocs.io/en/latest/&#34;&gt;Jupyter Lite&lt;/a&gt; – that means no more waiting for cloud services to spin up, it all happens in your browser!&lt;/p&gt;
&lt;p&gt;All the Jupyter notebooks in &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; can be run in the cloud using the free Binder service – either &lt;a href=&#34;https://binderhub.rc.nectar.org.au/&#34;&gt;through the ARDC&lt;/a&gt; (requires authentication), or through the &lt;a href=&#34;https://mybinder.org/&#34;&gt;public, community-run service&lt;/a&gt;. While it&amp;rsquo;s usually just a matter of clicking a link, Binder can take a while to build the necessary computing environment, and sometimes it just fails. &lt;a href=&#34;https://jupyterlite.readthedocs.io/en/latest/&#34;&gt;Jupyter Lite&lt;/a&gt; takes a different approach. Instead of building things in the cloud, it sets up everything it needs to run notebooks &lt;em&gt;within your own browser&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been experimenting with Jupyter Lite a bit over the past couple of years, waiting for the technology to reach the point where I could integrate it into the GLAM Workbench without greatly multiplying the maintenance burden. The obvious place to start was my introductory notebook, which demonstrates how Jupyter notebooks themselves work. Using live data from the &lt;a href=&#34;https://www.nma.gov.au/about/our-collection/museum-api&#34;&gt;National Museum of Australia API&lt;/a&gt;, it describes the basic structure of notebooks, and shows you how to edit and run code within them. I&amp;rsquo;ve now set things up so &lt;a href=&#34;https://glam-workbench.net/getting-started-jlite/lab/index.html?path=using_jupyter_notebooks.ipynb&#34;&gt;this notebook runs in Jupyter Lite&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What does this mean? Previously, the link to the introductory notebook spun up a new Binder instance. Now, the link retrieves a static web page hosted on GitHub. As this page loads, it installs a Python kernel and everything else it needs to run the notebook within your browser. It&amp;rsquo;s a lot faster than waiting for Binder, and provides a smoother experience for new users. And because it&amp;rsquo;s just an ordinary web page, I can even embed a live, working version of the notebook within this blog post. Try it out!&lt;/p&gt;
&lt;iframe height=500 width=&#34;100%&#34; src=&#34;https://glam-workbench.net/getting-started-jlite/lab/index.html?path=using_jupyter_notebooks.ipynb&#34;&gt;&lt;/iframe&gt;
&lt;p&gt;Jupyter Lite won&amp;rsquo;t currently work with every notebook in the GLAM Workbench. Some Python packages are difficult to install, and some data sources can&amp;rsquo;t be accessed due to CORS problems. But I&amp;rsquo;m planning to add Jupyter Lite options where I can.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Update on Trove data access and my suspended API keys</title>
      <link>https://updates.timsherratt.org/2025/04/11/update-on-trove-data-access.html</link>
      <pubDate>Fri, 11 Apr 2025 16:27:16 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/04/11/update-on-trove-data-access.html</guid>
      <description>&lt;p&gt;On 21 February, my Trove API keys were &lt;a href=&#34;https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html&#34;&gt;cancelled without warning&lt;/a&gt;. A week later, I met with NLA staff and was shocked to be told that downloading &amp;lsquo;content&amp;rsquo;, such as the text of digitised newspaper articles, was regarded as a &lt;a href=&#34;https://updates.timsherratt.org/2025/03/02/trove-api-users-beware-the.html&#34;&gt;breach of the API terms of use&lt;/a&gt;. Without API access I can&amp;rsquo;t continue my work helping researchers make use of Trove. More generally though, the NLA&amp;rsquo;s actions threaten innovative digital research. This post tries to answer some questions raised by my first two posts, and provides some updates on recent actions by the NLA.&lt;/p&gt;
&lt;h2 id=&#34;whats-an-api-key&#34;&gt;What&amp;rsquo;s an API key?&lt;/h2&gt;
&lt;p&gt;You might be wondering what an API key is and why it&amp;rsquo;s important. At its heart, it&amp;rsquo;s all about access to data. The Trove API delivers information from Trove in a form that computers can understand and process. This allows researchers to compile datasets for detailed analysis or visualisation, and supports the development of innovative tools and interfaces that help all Trove users. But access to the API is controlled by keys – no key, no data.&lt;/p&gt;
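As a minimal sketch of what keyed access looks like in practice (the endpoint is Trove's public v3 API; the key value here is a placeholder, and the exact parameters are illustrative):

```python
from urllib.request import Request
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"  # placeholder -- a real key is issued by the NLA

def trove_search(query, key=API_KEY):
    """Build (but don't send) a Trove API v3 search request.

    The key travels in the X-API-KEY header: no key, no data.
    """
    params = urlencode({"q": query, "category": "newspaper", "encoding": "json"})
    return Request(
        f"https://api.trove.nla.gov.au/v3/result?{params}",
        headers={"X-API-KEY": key},
    )

req = trove_search("weather")
print(req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) returns machine-readable search results; the same request with a cancelled or missing key is simply rejected.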
&lt;p&gt;I&amp;rsquo;ve had an API key since 2012, and have used my keys in a variety of ways to help people use and understand Trove. Some keys were linked to particular applications, such as the Trove API Console and Headline Roulette. Others were used in the development of tools and resources such as the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;, &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;, and the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-harvester/&#34;&gt;Trove Newspaper Harvester&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The NLA suspended all my keys without warning or consideration as to how they were being used.&lt;/strong&gt; Services like the Trove API Console which are dependent on keys for their operation simply stopped working.&lt;/p&gt;
&lt;p&gt;No. warning. at. all.&lt;/p&gt;
&lt;p&gt;However, the greater concern for me is that I can no longer develop or maintain things like the GLAM Workbench. I&amp;rsquo;ve spent a lot of time over the last 15 years responding to researcher inquiries, building tools to enable new research projects, and supporting researchers as they begin to explore the possibilities of Trove data. Most of this work has been unpaid. I do it because I think it&amp;rsquo;s important. But with no API keys my hands are tied. My ability to help researchers is severely limited.&lt;/p&gt;
&lt;p&gt;The National Library of Australia chose to do this.&lt;/p&gt;
&lt;h2 id=&#34;what-are-the-terms-of-use&#34;&gt;What are the terms of use?&lt;/h2&gt;
&lt;p&gt;I haven&amp;rsquo;t received a clear explanation as to why all of my API keys were cancelled. &lt;a href=&#34;https://updates.timsherratt.org/2025/03/02/trove-api-users-beware-the.html&#34;&gt;As I noted&lt;/a&gt;, discussion of this in my meeting with the NLA was confused and contradictory. But, in general terms, it seems to relate to the Trove &lt;a href=&#34;https://trove.nla.gov.au/about/create-something/using-api/trove-api-terms-use&#34;&gt;API terms of use&lt;/a&gt;, which were changed in 2020. In particular, the NLA now insists that accessing the &amp;lsquo;content&amp;rsquo; of resources, rather than just the descriptive metadata, is a breach of the API terms of use. This includes the full text of digitised newspaper and journal articles that are included in API responses.&lt;/p&gt;
&lt;p&gt;Through all of this it&amp;rsquo;s important to remember that the API terms of use are not imposed on the NLA by some external authority. They created them and can change them again at any time. If the NLA believes that work like mine has value, if they believe that researcher access to publicly-funded resources is important, they can change the rules to support these sorts of activities. To not do so is a choice.&lt;/p&gt;
&lt;p&gt;The terms of use were changed back in 2020, so why has the NLA suddenly chosen to act? From my point of view nothing has changed. All my work is open. I&amp;rsquo;ve just been doing what I&amp;rsquo;ve been doing since 2010. No-one has reached out to me over the last five years with concerns about the terms of use. There must be something else going on here, but, given the NLA&amp;rsquo;s lack of transparency, it&amp;rsquo;s hard to know.&lt;/p&gt;
&lt;p&gt;On 10 April, API users with keys dating back before 2020 started receiving emails that require them to explicitly accept the 2020 terms of use, or give up their keys. I suspect this is because the NLA has realised that API users weren&amp;rsquo;t informed of the changes at the time. Section 20 of the terms of use states:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Library may change these API terms at any time in its sole discretion. The Library will notify you of any changes to these API terms by adding a statement on Trove and the amended API terms will take effect 5 working days (in the Australian Capital Territory) after the date on which the statement was added to Trove. If you do not agree to the updated API terms, you must immediately cease using Trove.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If no such statement was added to Trove, then no agreement can be assumed.&lt;/p&gt;
&lt;p&gt;The email sent to API users could have been an opportunity to explain why these changes were made, and support developers and researchers in making any necessary adjustments. But no. &lt;strong&gt;The major change in terms of access to content is not even mentioned.&lt;/strong&gt; It seems the NLA just wants to slip this past quietly without anyone really noticing.&lt;/p&gt;
&lt;h2 id=&#34;is-it-all-about-ai&#34;&gt;Is it all about AI?&lt;/h2&gt;
&lt;p&gt;Some have wondered whether the NLA&amp;rsquo;s actions were motivated by AI crawlers hoovering up vast quantities of online content. The Director General&amp;rsquo;s response to people who wrote to her expressing their concern over my treatment does highlight the challenges of AI, pointing to the NLA&amp;rsquo;s new &lt;a href=&#34;https://www.library.gov.au/visit/about-us/corporate-information/corporate-strategies/artificial-intelligence-framework&#34;&gt;Artificial intelligence framework&lt;/a&gt;. However, this framework is almost exclusively concerned with the NLA&amp;rsquo;s own use of AI technologies. The only reference to external AI use is the statement: &amp;lsquo;We will seek to protect our data from external AI systems where their use contravenes the access rights of publishers and authors&amp;rsquo;.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also not clear how this relates to API use. Commercial use of the Trove API has always been handled differently to non-commercial use. Applications for commercial use are individually examined and typically involve a quid pro quo, such as access to paywalled services built using the API. So why impose new restrictions on non-commercial uses?&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also worth noting that the changes to the terms of use were made in 2020, before the impact of AI crawlers was well understood. Their impact, however, might explain why the terms of use are suddenly being enforced.&lt;/p&gt;
&lt;p&gt;Of course, most AI crawlers are probably just going to scrape stuff from the NLA&amp;rsquo;s websites. To try and manage this, it would seem preferable to push bots towards the API, not away. That way their use could be monitored and better controlled. The Wikimedia Foundation recently published an interesting article on &lt;a href=&#34;https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/&#34;&gt;How crawlers impact the operations of the Wikimedia projects&lt;/a&gt;. Their planned responses to this challenge include improving their APIs and working with developers to manage their usage.&lt;/p&gt;
&lt;p&gt;Perhaps the NLA&amp;rsquo;s new policing of their terms of use is some sort of hamfisted response to AI threats, but why go after me? As Seb Chan noted on LinkedIn:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Programmatic access to cultural collections has been vital to establishing new forms of practice, research, new scholarship and community discovery. Tim’s work has been globally recognized as important and vital for a very long time. Even if we acknowledge the increased cybersecurity risks (eg British Library), and concerns about AI bot content harvesting that likely lie behind this suspension of Tim’s generous work, this continues to feel like another own goal in the Australian cultural sector.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If there&amp;rsquo;s a problem, why isn&amp;rsquo;t the NLA discussing it with the research sector instead of picking on individuals?&lt;/p&gt;
&lt;h2 id=&#34;who-does-this-affect&#34;&gt;Who does this affect?&lt;/h2&gt;
&lt;p&gt;The NLA&amp;rsquo;s actions have hit me pretty hard. Having 15 years of work discarded by an organisation that you&amp;rsquo;ve always sought to promote is disheartening to say the least. The impact on my work and life has been such that I&amp;rsquo;ve considered whether there might be legal recourse. I&amp;rsquo;m sad, anxious, and disappointed.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m also worried about the impact of the NLA&amp;rsquo;s actions on the research sector in general. In the Director General&amp;rsquo;s response she notes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Where there is a clear need, the Library grants exemptions to the Terms of Use in order to support research
and will continue to do so. We are working to make the process for applying for exemptions more transparent
and to reach out to API users to clarify the process.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;However, the email sent to API users simply says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If your use of the Trove API does not align with the Trove API Terms of Use, please get in touch with Trove Support to discuss an exemption agreement.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;No process. No transparency.&lt;/p&gt;
&lt;p&gt;Any researcher wanting to analyse digitised newspaper articles, parliamentary papers, or other full text content will first need to obtain an &amp;lsquo;exemption agreement&amp;rsquo;. There is no information on what such an agreement involves, or how applications are assessed. The Trove API is no longer open. Research projects are now subject to the whims of NLA gatekeepers.&lt;/p&gt;
&lt;p&gt;Established researchers, or large, well-resourced projects, might see no problem here. They can more easily, and more confidently, justify their needs. The greatest impact will be on experimental projects, where the boundaries are not yet charted. What worries me most are the HDR and ECR researchers who will be deterred by these new restrictions. What new questions will not be asked? What new approaches will be discarded because of these barriers erected by the NLA?&lt;/p&gt;
&lt;p&gt;There will also be a significant impact on research training. How can you run a workshop on using the Trove API if participants are required not only to get an API key, but an individual &amp;lsquo;exemption agreement&amp;rsquo; to work with full text?&lt;/p&gt;
&lt;h2 id=&#34;your-support-means-a-lot&#34;&gt;Your support means a lot!&lt;/h2&gt;
&lt;p&gt;The one good thing to come from all of this is the support I&amp;rsquo;ve received from people around the world. Many have written to Trove, the Director General, and the Minister for the Arts. The responses received so far have not been very enlightening but, nonetheless, I think it&amp;rsquo;s really important to remind the NLA that their actions have an impact, and are being watched.&lt;/p&gt;
&lt;p&gt;The Australian Historical Association has made &lt;a href=&#34;https://theaha.org.au/wp-content/uploads/2025/03/AHA-Statement-on-New-Restrictions-to-Trove.pdf&#34;&gt;a public statement&lt;/a&gt;, noting that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The suspension of Dr Sherratt’s API keys, the result of recent changes to Trove’s API policy, marks a troubling restriction to the work of researchers in Australia and overseas.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The Centre for Contemporary Histories at Deakin University included a message from Professor David Lowe about the NLA&amp;rsquo;s actions &lt;a href=&#34;https://cch.deakin.edu.au/news/2025/03/newsletter-17th-march-2025/&#34;&gt;in their recent newsletter&lt;/a&gt;. He wrote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The best research stems from an environment that maximises the free flow of information and support for independent scholarship.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Kathryn Greenhill published &lt;a href=&#34;https://librariansmatter.com/blog/2025/03/03/tim-sherratts-trove-api-keys/&#34;&gt;an open letter&lt;/a&gt; that argued:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Reinstating Dr Tim Sherratt’s access to Trove API keys is essential to restore a set of cultural, educational and research tools that enhance access and usage for National Library resources.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There were also articles in &lt;a href=&#34;https://ia.acs.org.au/article/2025/national-library-cracks-down-on-public-data-access.html&#34;&gt;Information Age&lt;/a&gt; (published by the Australian Computer Society) and &lt;em&gt;The Sizzle&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Concerns are not just being expressed in Australia. Professor Shawn Graham from Carleton University in Canada &lt;a href=&#34;https://electricarchaeology.ca/2025/02/24/wtf-nla/&#34;&gt;published an open letter&lt;/a&gt; in which he stated:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Mr. Sherratt’s work is well known across the world in the galleries, libraries, archives, and museums sectors. His work developing the GLAM Workbench has promoted Australia’s cultural heritage world wide. Indeed, because of Mr. Sherratt’s work my own students in our Public History graduate program are more familiar with the National Library of Australia’s Trove service, and hence Australian culture, than with what our own Library and Archives Canada provides. &lt;em&gt;By developing the GLAMWorbench with the Trove service, Mr. Sherratt has had a major impact in how cultural heritage materials are understood at scale, across the world&lt;/em&gt;. His work is complementary to your own, and enhances the prestige of the National Library of Australia&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My battle with the NLA was also used as a case study in a paper on &lt;a href=&#34;https://aldonzel.github.io/intervention-forum-AAF25/&#34;&gt;digital archives as infrastructure&lt;/a&gt; by Anne-Laure Donzel &amp;amp; Julien Benedetti presented at the forum de l’Association des archivistes Français.&lt;/p&gt;
&lt;p&gt;Trove used to be recognised internationally as a beacon of open cultural heritage data, but that reputation has been tarnished. I&amp;rsquo;ve been contacted by a number of researchers and GLAM professionals around the world expressing their shock and disappointment at the NLA&amp;rsquo;s behaviour.&lt;/p&gt;
&lt;h2 id=&#34;what-comes-next&#34;&gt;What comes next?&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s now six weeks since my API keys were cancelled. The recent responses by the Director General and the email to API users indicate that &lt;em&gt;something&lt;/em&gt; is happening at the NLA, but I&amp;rsquo;m not sure whether that&amp;rsquo;s going to help me. Anyone who asks the NLA about my situation is told that it can&amp;rsquo;t be discussed due to their &amp;lsquo;privacy&amp;rsquo; obligations.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m seeking to find out more through my local member, Andrew Wilkie, who has contacted the Minister for the Arts. But now there&amp;rsquo;s an election, so I don&amp;rsquo;t expect any updates soon.&lt;/p&gt;
&lt;p&gt;Beyond my personal situation, there are important issues that need to be considered by the research sector, and I&amp;rsquo;m hoping peak organisations such as the &lt;a href=&#34;https://humanities.org.au/&#34;&gt;Australian Academy of the Humanities&lt;/a&gt;, the &lt;a href=&#34;https://socialsciences.org.au/&#34;&gt;Academy of the Social Sciences in Australia&lt;/a&gt;, and the &lt;a href=&#34;https://ardc.edu.au/&#34;&gt;Australian Research Data Commons&lt;/a&gt; will take up the discussion.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also worth noting that university libraries are Trove &amp;lsquo;partners&amp;rsquo; and are represented on &lt;a href=&#34;https://trove.nla.gov.au/partners/trove-strategic-advisory-committee&#34;&gt;Trove&amp;rsquo;s Strategic Advisory Committee&lt;/a&gt;, so if you&amp;rsquo;re worried about the impact on your own research let your university librarian know, or tell the &lt;a href=&#34;https://www.caul.edu.au/&#34;&gt;Council of Australasian University Librarians&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It seems the NLA is currently hunkered down, waiting for everything to blow over. As an institution they have a lot of cultural power, and people are often reluctant to criticise them in public. But I think it&amp;rsquo;s important to keep up the pressure.&lt;/p&gt;
&lt;p&gt;What I want is pretty simple:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;my API keys back&lt;/li&gt;
&lt;li&gt;an apology for the way I&amp;rsquo;ve been treated&lt;/li&gt;
&lt;li&gt;more transparency from the NLA about API access&lt;/li&gt;
&lt;li&gt;an open discussion within the research sector about the problems and possibilities of working with Trove data&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;All posts on this topic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;25 February 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html&#34;&gt;15 years of work on Trove threatened by the NLA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2 March 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/03/02/trove-api-users-beware-the.html&#34;&gt;Trove API users beware! – the latest in the saga of my cancelled API keys&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;11 April 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/04/11/update-on-trove-data-access.html&#34;&gt;Update on Trove data access and my suspended API keys&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;7 May 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;Farewell Trove&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Using the Public Record Office Victoria&#39;s API to build an overview of their collection</title>
      <link>https://updates.timsherratt.org/2025/04/10/using-the-public-record-office.html</link>
      <pubDate>Thu, 10 Apr 2025 14:02:03 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/04/10/using-the-public-record-office.html</guid>
      <description>&lt;p&gt;Over the past few weeks I&amp;rsquo;ve been exploring the Public Record Office Victoria&amp;rsquo;s public API. There&amp;rsquo;s not a lot of documentation, but there is a lot of data!&lt;/p&gt;
&lt;p&gt;What&amp;rsquo;s not immediately obvious is that the API includes information about a variety of different entities within the &lt;a href=&#34;https://prov.vic.gov.au/recordkeeping-government/a-z-topics/archival-control-model&#34;&gt;PROV&amp;rsquo;s model for archival description&lt;/a&gt; – not just items, but functions, agencies, series and more. You can limit your API requests to a particular entity using the &lt;code&gt;category&lt;/code&gt; field. You can also request facet counts from the &lt;code&gt;category&lt;/code&gt; field to tell you how many of each type of entity are available from the API.&lt;/p&gt;
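&lt;p&gt;As a sketch of the kind of request involved – assuming the Solr-style &lt;code&gt;search/select&lt;/code&gt; endpoint shown below (check the PROV API documentation for the current URL) – you can ask for just the facet counts without any individual records:&lt;/p&gt;

```python
from urllib.parse import urlencode

# Base search endpoint for the PROV collection API. This URL is an
# assumption based on its Solr-style interface -- check the PROV docs.
PROV_API = "https://api.prov.vic.gov.au/search/select"

def category_facet_url(q="*:*"):
    """Build a request URL asking only for facet counts on `category`."""
    params = {
        "q": q,            # match everything by default
        "rows": 0,         # skip individual records, just return counts
        "facet": "true",
        "facet.field": "category",
        "wt": "json",
    }
    return f"{PROV_API}?{urlencode(params)}"
```

&lt;p&gt;Setting &lt;code&gt;rows&lt;/code&gt; to zero keeps the response small: you get a count for each &lt;code&gt;category&lt;/code&gt; value without paging through millions of records.&lt;/p&gt;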
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-04-10-12-50-21.png&#34; width=&#34;600&#34; height=&#34;275&#34; alt=&#34;Bar chart showing the number of records for each entity including Agency, Consignment, Function, Image, Item, Series, and related Entity.&#34;&gt;
&lt;p&gt;I&amp;rsquo;ve been documenting this sort of information in notebooks for inclusion in the forthcoming PROV section of the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;. But I thought it might be useful to pull a few things together as a standalone dashboard, providing an overview of the PROV collection. So, &lt;a href=&#34;https://wragge.github.io/prov-dashboard/&#34;&gt;here it is&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://wragge.github.io/prov-dashboard/&#34;&gt;dashboard&lt;/a&gt; tells you how many records are currently available through the API, and breaks down this count by &lt;code&gt;entity&lt;/code&gt; and &lt;code&gt;category&lt;/code&gt;. It then works through the main entities – functions, agencies, series, items, and images – displaying a series of charts and tables that give you an idea of what they&amp;rsquo;re actually made up of.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-04-10-12-49-49.png&#34; width=&#34;600&#34; height=&#34;504&#34; alt=&#34;Table showing the agencies that have created the most series. It includes the agency id and title, and the number of series created by it.&#34;&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-04-10-12-50-57.png&#34; width=&#34;600&#34; height=&#34;485&#34; alt=&#34;Bar chart showing the number of digitised items for each decade from 1830 to 2020.&#34;&gt;
&lt;p&gt;The &lt;a href=&#34;https://wragge.github.io/prov-dashboard/&#34;&gt;dashboard&lt;/a&gt; is &lt;a href=&#34;https://github.com/wragge/prov-dashboard&#34;&gt;hosted on GitHub&lt;/a&gt; and is automatically updated every Sunday. In the future, I&amp;rsquo;ll do more to highlight changes over time.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>More than 6 million rows of data from Public Record Office Victoria added to the GLAM Name Index Search</title>
      <link>https://updates.timsherratt.org/2025/04/09/more-than-million-rows-of.html</link>
      <pubDate>Wed, 09 Apr 2025 16:28:46 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/04/09/more-than-million-rows-of.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt; now includes more than 6 million rows of data from the &lt;a href=&#34;https://prov.vic.gov.au/&#34;&gt;Public Record Office Victoria&lt;/a&gt;, downloaded using their &lt;a href=&#34;https://prov.vic.gov.au/prov-collection-api&#34;&gt;public API&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt; brings together records that include the names of people from 10 Australian GLAM organisations. With a single search, you can find information about individuals across millions of rows of data.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-04-09-15-19-57.png&#34; width=&#34;600&#34; height=&#34;624&#34; alt=&#34;Screenshot of GLAM Name Index Search listing the GLAM organisations included and the number of rows of data from each&#34;&gt;
&lt;p&gt;Previous versions of the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt; included a few datasets from the Public Record Office Victoria that had been shared through government open data portals. However, as I was exploring PROV&amp;rsquo;s public API recently, I realised that there were many more records that included people&amp;rsquo;s names.&lt;/p&gt;
&lt;p&gt;People&amp;rsquo;s names appear in a number of different fields in the PROV data – including &lt;code&gt;family_name&lt;/code&gt;, &lt;code&gt;description.name&lt;/code&gt;, and &lt;code&gt;sams.description.name_of_person&lt;/code&gt;. They can also be attached to either &amp;lsquo;items&amp;rsquo; or &amp;lsquo;images&amp;rsquo;. &amp;lsquo;Items&amp;rsquo; are individual records, such as files, while &amp;lsquo;images&amp;rsquo; relate metadata to a page from a digitised item. This means that an image record can tell you that a person is mentioned on a specific page. Both item and image records can point you to useful information.&lt;/p&gt;
&lt;p&gt;By using the API to search for records with values in one of the possible name fields, I compiled a list of series that contain items or images that include people&amp;rsquo;s names in their metadata.&lt;/p&gt;
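&lt;p&gt;In Solr-style query syntax, &lt;code&gt;field:[* TO *]&lt;/code&gt; matches records where a field has any value, so a request along these lines can surface the relevant series – this is a sketch, and the series facet field name is an assumption:&lt;/p&gt;

```python
# Name fields identified in the PROV data (as listed above); the series
# facet field name is an assumption -- adjust to the actual PROV schema.
NAME_FIELDS = [
    "family_name",
    "description.name",
    "sams.description.name_of_person",
]

def series_with_names_params():
    """Request facet counts by series for records with any name field set."""
    # `field:[* TO *]` is the Solr idiom for "this field has a value"
    query = " OR ".join(f"{field}:[* TO *]" for field in NAME_FIELDS)
    return {
        "q": query,
        "rows": 0,
        "facet": "true",
        "facet.field": "series_id",
        "wt": "json",
    }
```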
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;series_id&lt;/th&gt;
&lt;th&gt;series_title&lt;/th&gt;
&lt;th&gt;record_category&lt;/th&gt;
&lt;th&gt;number_of_records&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;Probate and Administration Files&lt;/td&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;2,578,652&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;948&lt;/td&gt;
&lt;td&gt;Outward Passengers to Interstate, U.K. and Foreign Ports (Refer to Microfilm Copy VPRS 3506)&lt;/td&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;1,661,181&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;947&lt;/td&gt;
&lt;td&gt;Inward Overseas  Passenger Lists (see Microfiche Copies: VPRS 7666 United Kingdom Ports;  VPRS 7667 Foreign Ports; VPRS 13439 New Zealand Ports)&lt;/td&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;1,608,515&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7591&lt;/td&gt;
&lt;td&gt;Wills&lt;/td&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;933,110&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17379&lt;/td&gt;
&lt;td&gt;Probate and Administration Files (CourtView)&lt;/td&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;294,882&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;Inquest Deposition Files&lt;/td&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;216,768&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Register of Assisted Immigrants from the United Kingdom [refer to microform copy, VPRS 3502]&lt;/td&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;173,167&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5357&lt;/td&gt;
&lt;td&gt;Land Selection And Correspondence Files&lt;/td&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;136,782&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10010&lt;/td&gt;
&lt;td&gt;Body Cards&lt;/td&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;109,968&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4527&lt;/td&gt;
&lt;td&gt;Ward Register (known as Children&amp;rsquo;s Registers 1864 - 1887)&lt;/td&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;55,429&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13579&lt;/td&gt;
&lt;td&gt;Teacher Record Books&lt;/td&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;49,106&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;515&lt;/td&gt;
&lt;td&gt;Central Register of Male Prisoners&lt;/td&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;44,285&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5714&lt;/td&gt;
&lt;td&gt;Land Selection Files, Section 12 Closer Settlement Act 1938 [including obsolete and  top numbered Closer Settlement and WW1 Discharged Soldier Settlement  files]&lt;/td&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;7,721&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;516&lt;/td&gt;
&lt;td&gt;Central Register of Female Prisoners&lt;/td&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;6,782&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7933&lt;/td&gt;
&lt;td&gt;Non-Issued Probate Applications&lt;/td&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;3,301&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5357&lt;/td&gt;
&lt;td&gt;Land Selection And Correspondence Files&lt;/td&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;644&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7592&lt;/td&gt;
&lt;td&gt;Wills and Probate and Administration Files&lt;/td&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;606&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18110&lt;/td&gt;
&lt;td&gt;Clinical Notes and Patient Files (Receiving House)&lt;/td&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;391&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;td&gt;Articles of Clerkship Files&lt;/td&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;374&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;266&lt;/td&gt;
&lt;td&gt;Inward Registered Correspondence&lt;/td&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;275&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;I then looped through each series in this list, saving details of each item or image. Finally, to prepare the data from the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt;, I excluded some fields that weren&amp;rsquo;t useful and converted the data files into CSV format. All the details are in a notebook that I&amp;rsquo;ll add to the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; in coming weeks.&lt;/p&gt;
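&lt;p&gt;The final tidy-up step can be sketched with the standard library alone – the records and excluded field below are made up for illustration:&lt;/p&gt;

```python
import csv
import io

# Made-up example records; real PROV responses have many more fields.
records = [
    {"series_id": 28, "family_name": "Smith", "_version_": 17, "title": "Probate file"},
    {"series_id": 28, "family_name": "Jones", "_version_": 18, "title": "Probate file"},
]

EXCLUDE = {"_version_"}  # internal fields that aren't useful to searchers

def to_csv(rows, exclude=EXCLUDE):
    """Convert dict records to CSV text, dropping the excluded fields."""
    fieldnames = [key for key in rows[0] if key not in exclude]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({k: v for k, v in row.items() if k not in exclude})
    return out.getvalue()
```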
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-04-09-15-20-33.png&#34; width=&#34;600&#34; height=&#34;716&#34; alt=&#34;Screen shot from the GLAM Name Index Search showing a list of datasets from the PROV&#34;&gt;
&lt;p&gt;The result is that the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt; now includes 20 datasets from the Public Record Office Victoria, containing 6,645,269 rows of data! Altogether, the GLAM Name Index Search includes around 13 million records, so almost half are from PROV.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Introducing PROVBot – sharing photos from Public Record Office Victoria</title>
      <link>https://updates.timsherratt.org/2025/04/09/introducing-provbot-sharing-photos-from.html</link>
      <pubDate>Wed, 09 Apr 2025 12:56:26 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/04/09/introducing-provbot-sharing-photos-from.html</guid>
      <description>&lt;p&gt;With poor old TroveNewsBot &lt;a href=&#34;https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html&#34;&gt;killed by the NLA,&lt;/a&gt; my Mastodon feed has had less GLAM goodness of late. To try and fill the void I&amp;rsquo;ve created &lt;a href=&#34;https://wraggebots.net/@provbot&#34;&gt;PROVBot&lt;/a&gt;, sharing photos from the &lt;a href=&#34;https://prov.vic.gov.au/&#34;&gt;Public Record Office Victoria&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-04-09-11-50-21.png&#34; width=&#34;600&#34; height=&#34;508&#34; alt=&#34;Screen capture of a post by PROVBot. The text reads: B526 Various scenes of activities at Emily MacPherson Collage - Weaving, art, cookery, design, chemistry, etc – Subject : Emily MacPherson  College Melbourne – part of VPRS14517, Negatives of Photographs  [Publications Branch]. The text is accompanied by a black and white photograph of women using what appear to be looms.&#34;&gt;
&lt;p&gt;PROVBot makes use of the &lt;a href=&#34;https://prov.vic.gov.au/prov-collection-api&#34;&gt;Public Record Office Victoria&amp;rsquo;s public API&lt;/a&gt;. At this stage it just selects and shares a random photograph once a day, but in the future I&amp;rsquo;ll probably add more features, such as the ability to respond to search queries.&lt;/p&gt;
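&lt;p&gt;A common pattern for &amp;lsquo;random item&amp;rsquo; bots – a sketch, not necessarily how PROVBot works internally – is to get the total number of matching records, pick a random offset, then re-run the search to fetch just the single record at that position:&lt;/p&gt;

```python
import random

def random_offset(total_matches, rng=None):
    """Pick a random position within a set of API search results.

    The bot can then repeat the search with `start` set to this offset
    and `rows` set to 1 to retrieve just that one record.
    """
    rng = rng or random.Random()
    return rng.randrange(total_matches)
```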
&lt;p&gt;You can find PROVBot on the Fediverse at &lt;a href=&#34;https://wraggebots.net/@provbot&#34;&gt;https://wraggebots.net/@provbot&lt;/a&gt;. There&amp;rsquo;s also &lt;a href=&#34;https://wraggebots.net/@provbot/feed.rss&#34;&gt;an RSS feed&lt;/a&gt; that you can pop into your preferred feed reader. The bot&amp;rsquo;s code is openly licensed and &lt;a href=&#34;https://github.com/wragge/provbot&#34;&gt;available on GitHub&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Trove API users beware!  – the latest in the saga of my cancelled API keys</title>
      <link>https://updates.timsherratt.org/2025/03/02/trove-api-users-beware-the.html</link>
      <pubDate>Sun, 02 Mar 2025 16:26:51 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/03/02/trove-api-users-beware-the.html</guid>
      <description>&lt;p&gt;After &lt;a href=&#34;https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html&#34;&gt;my Trove API keys were cancelled without warning&lt;/a&gt; on 21 February, I reluctantly agreed to a meeting with the National Library of Australia. They had provided so little information in their emails, that it seemed to be the only way to find out what was really going on. I came out of the meeting shocked by the NLA&amp;rsquo;s change in attitude towards API use.&lt;/p&gt;
&lt;h2 id=&#34;tldr--youre-probably-breaching-the-api-terms-of-use&#34;&gt;TL;DR – you&amp;rsquo;re probably breaching the API terms of use&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;All Trove API users need to be aware that the NLA now insists that accessing the &amp;lsquo;content&amp;rsquo; of resources, rather than just the descriptive metadata, is a breach of the API terms of use. This includes the full text of digitised newspaper and journal articles that are included in API responses. Yes, that&amp;rsquo;s right, using the Trove API in the way that it has been designed and documented is a breach of its own terms of use. You can only download the full text of items using the API if you seek and obtain explicit permission from the NLA beforehand. Note also that the NLA is reviewing people&amp;rsquo;s use of the API and, as demonstrated by my case, they can and will suspend your API keys without warning.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id=&#34;through-the-looking-glass&#34;&gt;Through the looking glass&lt;/h2&gt;
&lt;p&gt;I hate meetings and avoid conflict, but I didn&amp;rsquo;t see any alternative to meeting with the NLA to find the real reasons behind the cancellation of my API keys. As a classic introvert, I spent the days leading up to the meeting anxiously imagining every possible way the conversation might go. Or so I thought. The outcome was not one I predicted.&lt;/p&gt;
&lt;p&gt;The meeting was attended by three Directors responsible for the delivery of &amp;lsquo;Trove business&amp;rsquo;: the Director of Trove Community Services, the Director of Trove Data and Platforms, and the Director of Strategy and Transformation. Feeling somewhat outnumbered, I took along a colleague. I&amp;rsquo;m glad I did, as afterwards we had to confirm with each other that what we thought had just happened, actually happened. We were both stunned.&lt;/p&gt;
&lt;p&gt;The meeting started with a description of NLA&amp;rsquo;s change in API policy as described above. It was like stepping through the looking glass. I had not imagined a world where the NLA would set itself up as gatekeeper to every use of the digitised newspaper corpus through the API.&lt;/p&gt;
&lt;p&gt;At one point we were told that this change coincided with the release of version 2 of the API. But this doesn&amp;rsquo;t seem right. Checking the web archive, the terms of use page seems to have been updated when the whole Trove web interface changed in 2020. (You can see how it changed between &lt;a href=&#34;https://web.archive.org/web/20200227102147/http://help.nla.gov.au/trove/building-with-trove/api-terms-of-use&#34;&gt;February&lt;/a&gt; and &lt;a href=&#34;https://web.archive.org/web/20200909033745/https://trove.nla.gov.au/about/create-something/using-api/trove-api-terms-use&#34;&gt;September&lt;/a&gt; 2020.) In fact, &lt;a href=&#34;https://webarchive.nla.gov.au/awa/20191107024505/https://help.nla.gov.au/sites/default/files/API%20V2.1%20-%20Whats%20Changed%20-%20Full%20Text%20Search%20Release.pdf&#34;&gt;version 2.1 of the API&lt;/a&gt;, released in September 2019, was described by the NLA as opening up &amp;lsquo;access to richer data for API users, particularly the rapidly growing corpus of 1.6 million digitised articles from Australian journals, magazines and newsletters&amp;rsquo;. There was no indication then that access to this data required special permission.&lt;/p&gt;
&lt;p&gt;But when the change happened is less important than how it was, or wasn&amp;rsquo;t, communicated, and what it means for researchers. At multiple points throughout the meeting I stressed that if this is how they are interpreting and enforcing the API terms of use, they need to be explaining this change to API users. I&amp;rsquo;m certain that I&amp;rsquo;m not alone in being totally blindsided.&lt;/p&gt;
&lt;p&gt;The reasons for the change are not clear. There was some talk of &amp;lsquo;data governance&amp;rsquo;, and the fact that the online world had changed – though I fail to see how researchers downloading newspaper articles from the 1890s can be seen as a possible cyber threat. If there are particular problems or concerns, I suggested that it would be useful to have a broad-ranging conversation with the research sector, to see if it might be possible to carve out space for research uses within the terms of use. In response I was told there already is such a carve out – individual researchers can ask the NLA for permission.&lt;/p&gt;
&lt;h2 id=&#34;implications-for-my-own-work&#34;&gt;Implications for my own work&lt;/h2&gt;
&lt;p&gt;I was so shaken by this turn away from open access that the question of my own API keys hardly seemed to matter. The immediate reason for the cancellation of my keys is still not clear. The NLA admitted that I hadn&amp;rsquo;t used the API to extract text from NED journals as their original email claimed. Then it was suggested I used the API to &lt;em&gt;find&lt;/em&gt; NED journals to extract text from. This is also not true. Discussion then broadened to the whole of the &lt;a href=&#34;https://glam-workbench.net&#34;&gt;GLAM Workbench&lt;/a&gt;, and, yes, I readily admit that under the new interpretation much of my work breaches the API terms of use. The Trove Newspaper Harvester makes it easy for researchers to create datasets from the full text of newspaper articles in a set of search results. There are notebooks within the GLAM Workbench to help users access the full text of journals and books. In most cases researchers want and need content, not just metadata, and I&amp;rsquo;ve developed a range of tools to help them access it. But as I explained in my &lt;a href=&#34;https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html&#34;&gt;last post&lt;/a&gt;, none of this is new. I&amp;rsquo;ve been helping researchers in this way for 15 years. It&amp;rsquo;s the NLA that has changed, not me.&lt;/p&gt;
&lt;p&gt;It was suggested that if I wanted to regain my API keys, an additional series of meetings would be necessary to help me bring my work within the bounds of what is now permissible.&lt;/p&gt;
&lt;p&gt;So it seems I have a choice. Either I try to get out of API jail by submitting to the NLA&amp;rsquo;s re-education program, or I work with others in the research sector and beyond to try and change the NLA&amp;rsquo;s policy. I&amp;rsquo;m inclined to do the latter.&lt;/p&gt;
&lt;h2 id=&#34;whats-at-stake&#34;&gt;What&amp;rsquo;s at stake&lt;/h2&gt;
&lt;p&gt;Ten years ago we celebrated the Trove API because of what it made possible. Everyone was free to explore and create, to analyse changes across 100 years of digitised newspapers, to shift scales and find new meanings. One of &lt;a href=&#34;https://doi.org/10.5281/zenodo.3563238&#34;&gt;my most cited presentations&lt;/a&gt; from 2013 talks about how the API made Trove a platform we could all build upon. Openness was the key:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The more we become aware of the power of networked information, the more we become concerned with making and preserving its ‘openness’. To me open data is a process not a product – each visualisation, or interpretation can challenge our assumptions and help us to see things differently. Each use is an opening into new contexts.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Even if it does not directly impact their work, I think all researchers should be alarmed by the NLA&amp;rsquo;s turn away from openness. Experiments now will have to be approved, judged against criteria which are themselves not public. Is that what we want or expect from a major, publicly funded, cultural institution?&lt;/p&gt;
&lt;p&gt;I also fear for the future of the API itself. This change makes it easier for the NLA to impose further limits over time. Perhaps researcher access to the API will be tied to future investment from the research sector. Perhaps it will be claimed the risks are too great and the API will be shut down completely. I&amp;rsquo;m no longer confident in the NLA&amp;rsquo;s commitment to providing researchers with long-term access to Trove data.&lt;/p&gt;
&lt;h2 id=&#34;where-to-now&#34;&gt;Where to now?&lt;/h2&gt;
&lt;p&gt;I want to thank everyone who has offered their support over the last couple of weeks. It&amp;rsquo;s been deeply encouraging to hear how my work over the past 15 years is valued, and how the GLAM Workbench has helped researchers and inspired new projects. Thanks too for making your views known to the NLA.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m now very doubtful that the Trove sections of the GLAM Workbench can be made acceptable to the NLA without major changes that would severely limit their usefulness. However, I&amp;rsquo;ll continue to maintain them as best I can without an API key, and I&amp;rsquo;ll continue to help researchers with their Trove questions.&lt;/p&gt;
&lt;p&gt;To get myself back into a more positive frame of mind, I think I&amp;rsquo;ll also do some work with collections from organisations who value openness and are interested in new uses of their data. Suggestions are welcome!&lt;/p&gt;
&lt;p&gt;But as I suggested above, the most important task ahead is to start talking about the implications of these changes at the NLA, particularly for the research sector.&lt;/p&gt;
&lt;p&gt;Stay tuned&amp;hellip;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;All posts on this topic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;25 February 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html&#34;&gt;15 years of work on Trove threatened by the NLA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2 March 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/03/02/trove-api-users-beware-the.html&#34;&gt;Trove API users beware! – the latest in the saga of my cancelled API keys&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;11 April 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/04/11/update-on-trove-data-access.html&#34;&gt;Update on Trove data access and my suspended API keys&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;7 May 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;Farewell Trove&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>15 years of work on Trove threatened by the NLA</title>
      <link>https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html</link>
      <pubDate>Mon, 24 Feb 2025 12:02:48 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/02/24/years-of-work-on-trove.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://updates.timsherratt.org/2025/03/02/trove-api-users-beware-the.html&#34;&gt;See my latest post for an update!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;On Friday, without warning, I received an email from the National Library of Australia informing me that my Trove API keys had been suspended. This threatens the future of 15 years of work helping people use and understand the possibilities of Trove for new types of research.&lt;/p&gt;
&lt;h2 id=&#34;whats-happened&#34;&gt;What&amp;rsquo;s happened?&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s the full text of the email:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Your recently published work on the GLAM Workbench regarding extracting metadata and text from a National e-Deposit (NED) periodical has been brought to the Library’s attention.&lt;/p&gt;
&lt;p&gt;Trove API Terms of Use specify that developers may access metadata only and do not provide extended rights. We consider the use of an API to extract and save full text as being in violation of the Terms of Use.&lt;/p&gt;
&lt;p&gt;Effective immediately, the four API keys currently registered to you: glamworkbench, headlineroulette, troveconsole and wragge will be suspended.&lt;/p&gt;
&lt;p&gt;Please feel free to get in touch for a more detailed conversation about this.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The reasons given for switching off my access don&amp;rsquo;t make any sense. While the API terms of use only mention metadata, the API, by design, delivers full text from newspapers, digitised periodicals and some books. If you interpret the terms of use as above, simply using the API &lt;strong&gt;as it has been designed and documented&lt;/strong&gt; would be seen as a breach! Surely that&amp;rsquo;s nonsense.&lt;/p&gt;
&lt;p&gt;In any case, the &lt;a href=&#34;https://glam-workbench.net/trove-ned/create-searchable-database/&#34;&gt;notebook they mention&lt;/a&gt; doesn&amp;rsquo;t even use the Trove API, so it&amp;rsquo;s hard to see how it could breach the API terms of use. I extracted the text from the periodicals simply by downloading the PDFs and using a standard PDF library. The notebook does scrape some metadata from the Trove website. This is necessary because the API has major limitations – you can&amp;rsquo;t, for example, get the members of a digitised collection. The NLA might want to argue that scraping breaches the website&amp;rsquo;s terms of use, but that&amp;rsquo;s a different point. I&amp;rsquo;d also note that I&amp;rsquo;ve been scraping data from the Trove website for 15 years without any objections (see below for more).&lt;/p&gt;
&lt;p&gt;When I was Trove manager, I drafted a previous version of the API terms of use. It was a lot less legalistic back then, and I&amp;rsquo;ve always understood that the point of the API and website terms of use was to protect the NLA from exploitation by commercial interests, not to inhibit work done by researchers in good faith.&lt;/p&gt;
&lt;p&gt;I developed the NED notebook in &lt;a href=&#34;https://updates.timsherratt.org/2025/02/19/search-the-content-of-periodicals.html&#34;&gt;response to a request for help by a community group&lt;/a&gt; that uses the National eDeposit service to preserve its newsletter. I did it for free, and I documented the results in the GLAM Workbench in case it might be of use to other communities and researchers.&lt;/p&gt;
&lt;p&gt;The &amp;lsquo;has been brought to the Library’s attention&amp;rsquo; bit is also grimly amusing. Everything I do is open, and wherever possible I tag GLAM organisations on social media to let them know I&amp;rsquo;m making use of their collections. The email makes it sound like I was trying to hide what I was doing, when in fact I tagged them on Facebook and LinkedIn. I thought they might be interested, and I suppose they were, just not in the way I hoped.&lt;/p&gt;
&lt;h2 id=&#34;whats-at-risk&#34;&gt;What&amp;rsquo;s at risk?&lt;/h2&gt;
&lt;p&gt;What&amp;rsquo;s the consequence of switching off my API keys? A few long-running services were broken immediately. Others will continue to work, but I&amp;rsquo;ll be unable to maintain them over time. Obviously I won&amp;rsquo;t be able to develop new Trove-related resources, and perhaps most importantly, my ability to help researchers with their Trove problems will be severely limited.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-02-24-11-46-54.png&#34; width=&#34;600&#34; height=&#34;334&#34; alt=&#34;Screenshot of the Trove API console displaying an error and a notice informing users that it no longer functions due to the NLA revoking my API key&#34;&gt;
&lt;p&gt;Tools and services that were broken immediately:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://troveconsole.herokuapp.com&#34;&gt;The Trove API Console&lt;/a&gt; – running since 2014, the API console has helped many people learn to use the API. I created it when I was Trove Manager, and it&amp;rsquo;s still linked from Trove&amp;rsquo;s &lt;a href=&#34;https://trove.nla.gov.au/about/create-something&#34;&gt;Create something&lt;/a&gt; page.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;The Trove Newspaper Data Dashboard&lt;/a&gt; (and related data harvests) – running since 2022, the data dashboard enables researchers to understand how the Trove newspaper corpus changes over time. The dashboard depends on &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals&#34;&gt;weekly data harvests&lt;/a&gt; that can no longer run, so there will be no further updates. Similarly other automated data harvests capturing and sharing data about Trove &lt;a href=&#34;https://github.com/wragge/trove-zone-totals&#34;&gt;categories&lt;/a&gt; and &lt;a href=&#34;https://github.com/wragge/trove-contributor-totals/&#34;&gt;contributors&lt;/a&gt; are now broken.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://wragge.github.io/trovenewsbot-fedi/&#34;&gt;@troveNewsBot&lt;/a&gt; – &lt;a href=&#34;https://discontents.com.au/conversations-with-collections/index.html&#34;&gt;created back in 2013&lt;/a&gt;, the bot has survived changes to the Trove API, the demise of Twitter, and a recent forced change of Mastodon instances, but it can&amp;rsquo;t work without an API key. I&amp;rsquo;m already missing its regular posts.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://headlineroulette.net/&#34;&gt;Headline Roulette&lt;/a&gt; – just a simple game, but a fun way to start a workshop and get people thinking about the possibilities of Trove. It&amp;rsquo;s been running since 2010.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Tools and resources that I won&amp;rsquo;t be able to update or maintain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; includes &lt;strong&gt;more than 80 Jupyter notebooks&lt;/strong&gt; that demonstrate how to work with data from Trove. Most of these require an API key. Fortunately users will still be able to use them with their own keys, but I won&amp;rsquo;t be able to do any further development or testing. This includes tools like &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/querypic/&#34;&gt;QueryPic&lt;/a&gt;, which I&amp;rsquo;ve been maintaining since 2013 and which is cited in the research literature.&lt;/li&gt;
&lt;li&gt;The &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; also includes &lt;strong&gt;more than 30 datasets&lt;/strong&gt; capturing information about Trove. Some document changes in things like &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/csv-newspapers-corrections/&#34;&gt;text correction&lt;/a&gt;, and the use of &lt;a href=&#34;https://glam-workbench.net/trove-lists/trove-public-tags/&#34;&gt;tags&lt;/a&gt;, while others provide alternative entry points to important collections of digitised resources, such as &lt;a href=&#34;https://glam-workbench.net/trove-music/trove-oral-histories/&#34;&gt;oral histories&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/trove-maps/explore-maps/&#34;&gt;maps&lt;/a&gt;. I won&amp;rsquo;t be able to update any of these datasets.&lt;/li&gt;
&lt;li&gt;The &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt; incorporates many visualisations and summaries generated from Trove data. I won&amp;rsquo;t be able to update these. There are also many API examples that link to the now broken API console.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://wragge.github.io/trove-newspaper-harvester/&#34;&gt;trove-newspaper-harvester&lt;/a&gt; is a Python package that&amp;rsquo;s existed in different forms since 2010. It&amp;rsquo;s used by researchers to create datasets of newspaper articles and has been cited a number of times in the research literature. It won&amp;rsquo;t be immediately affected because users supply their own API keys, but I won&amp;rsquo;t be able to maintain it into the future.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://troveplaces.herokuapp.com/&#34;&gt;Trove places&lt;/a&gt; is a map-based interface to Trove&amp;rsquo;s newspapers. It&amp;rsquo;s dependent on data harvested using the API, so I won&amp;rsquo;t be able to update it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Planned developments I won&amp;rsquo;t be able to undertake:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I had hoped this year to automate more data harvests so I could, for example, provide regular updates on digitised collections such as books, periodicals, and maps.&lt;/li&gt;
&lt;li&gt;I was planning to add a number of new sections to the Trove Data Guide, including maps, photos, ephemera, and manuscripts.&lt;/li&gt;
&lt;li&gt;I was intending to update the trove-newspaper-harvester to make it possible to identify and capture changes in a results set.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;m very disappointed that the automated data harvests are now broken. As I &lt;a href=&#34;https://updates.timsherratt.org/2024/09/20/preserving-the-history.html&#34;&gt;suggested in this post&lt;/a&gt;, I think it&amp;rsquo;s important that we capture information about online collections so that future researchers will be able to investigate their impact. I&amp;rsquo;ve been working to streamline, standardise, and automate this data collection, both through the weekly harvests and the &lt;a href=&#34;https://zenodo.org/communities/trove-historical-data/records&#34;&gt;Trove historical data collection in Zenodo&lt;/a&gt;. But this will now stop.&lt;/p&gt;
&lt;p&gt;Most disappointing of all, however, is that without an API key I won&amp;rsquo;t be able to help researchers who come to me asking how to get data out of Trove. In finding solutions to their problems I often end up creating new notebooks so that the knowledge can be shared and all researchers can benefit. I won&amp;rsquo;t be able to do this any more.&lt;/p&gt;
&lt;p&gt;The GLAM Workbench includes &lt;a href=&#34;https://glam-workbench.net/citations/&#34;&gt;a list of published research articles&lt;/a&gt; that cite the GLAM Workbench or one of its associated tools, such as QueryPic and the Trove Newspaper Harvester. Many of these publications have used my tools to work with data from Trove. This sort of research will suffer if the tools can&amp;rsquo;t be maintained.&lt;/p&gt;
&lt;p&gt;Of course, all of my work is openly licensed and freely available through GitHub and Zenodo. If I can&amp;rsquo;t maintain the code, hopefully others will jump in and take over.&lt;/p&gt;
&lt;h2 id=&#34;trove-and-me&#34;&gt;Trove and me&lt;/h2&gt;
&lt;p&gt;I started scraping data from the digitised newspapers in 2009, before they were even a part of Trove. In 2010, I created the first versions of QueryPic and the Trove Newspaper Harvester. There was no API then, so I built a library of screen scrapers to extract the data. I ended up publishing my own &amp;lsquo;unofficial&amp;rsquo; API using the screen scrapers. I found out later that my &amp;lsquo;unofficial&amp;rsquo; API was used in the design of the official version that became available in 2012.&lt;/p&gt;
&lt;p&gt;The work I was doing analysing digitised newspapers won me the NLA&amp;rsquo;s Harold White Fellowship in 2012. In 2013, I was appointed Trove Manager. Throughout my time at the NLA I lived something of a double life – manager by day, hacker by night. I continued to build tools and demonstrations to help people understand what the API made possible. Talking about the API and the new types of research that Trove opened up was one of my favourite parts of the job.&lt;/p&gt;
&lt;p&gt;Nothing much changed after I left the library. I continued to build tools, help researchers, and &lt;a href=&#34;https://timsherratt.au/about/#invited-posts--presentations&#34;&gt;give talks and workshops&lt;/a&gt; on the possibilities of Trove data. In 2017, I started to bring a lot of this work together within the GLAM Workbench. In 2023-24, I worked with the Australian Research Data Commons to develop the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;, documenting what I knew about Trove&amp;rsquo;s intricacies and inconsistencies.&lt;/p&gt;
&lt;p&gt;My point really is that I&amp;rsquo;ve been doing this for 15 years now. Everything has been in the open, my approach has never really changed, and some of the work was actually supported by the NLA. So what&amp;rsquo;s different now?&lt;/p&gt;
&lt;p&gt;Certainly the NLA&amp;rsquo;s attitude has changed. When I was Trove manager we used to celebrate the interesting things that people did with the Trove API. In contrast, the NLA has never publicly acknowledged that the GLAM Workbench exists, and certainly hasn&amp;rsquo;t shared any links to it. This was taken to ludicrous extremes in 2021, when the NLA&amp;rsquo;s draft project plan for funding as part of the ARDC&amp;rsquo;s HASS Research Data Commons &lt;a href=&#34;https://updates.timsherratt.org/2021/09/10/some-thoughts-on.html&#34;&gt;proposed to duplicate tools already available&lt;/a&gt; through the GLAM Workbench. Just a few months earlier in December 2020, the GLAM Workbench won the &lt;a href=&#34;https://glam-workbench.net/awards/#british-library-lab-awards-2020&#34;&gt;British Library Labs Research Award&lt;/a&gt;. It&amp;rsquo;s strange that there has been much more engagement with the GLAM Workbench from national libraries in Europe than Australia.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t know why this is, but it has been immensely frustrating, even heart-breaking. I do the work I do to help people use and understand Trove. But how do they find out about it? You&amp;rsquo;d think that the NLA would be pleased to support researchers by pointing them to tools and resources that would help them make best use of Trove. You&amp;rsquo;d think that the NLA would be thrilled to have people spending their own time and money to build and maintain those resources. But no.&lt;/p&gt;
&lt;p&gt;It seems to me that the NLA has become increasingly closed off and defensive in recent years. Perhaps that&amp;rsquo;s to be expected given the funding pressures they&amp;rsquo;ve faced. But in challenging times you&amp;rsquo;d think it was more important than ever to bring together your supporters.&lt;/p&gt;
&lt;p&gt;Much of my work does involve criticism of Trove. It&amp;rsquo;s an unwieldy beast, with many problems and inconsistencies. It&amp;rsquo;s part of my job (mission? calling?) to expose these problems and help users work around them. It wouldn&amp;rsquo;t help anyone for me to ignore Trove&amp;rsquo;s shortcomings. My criticisms come with suggestions and solutions. My aim is not to undermine, but encourage – to guide people past the many pitfalls and challenges to find the treasure within.&lt;/p&gt;
&lt;p&gt;Back in November 2016, on the day after Trump&amp;rsquo;s first election victory, I gave a short presentation at the &amp;lsquo;Digital Directions&amp;rsquo; conference in Canberra. The main point of my talk, entitled &amp;lsquo;Caring about access&amp;rsquo;, was that GLAM organisations should embrace criticism. Here&amp;rsquo;s part of what I said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Access is not something that cultural institutions bestow on a grateful public. It’s a struggle for understanding and meaning. Expect to be criticised, expect problems to be found, expect your prejudices to be exposed. That’s the point.&lt;/p&gt;
&lt;p&gt;If cultural institutions want to celebrate their website hits, celebrity visits, or their latest glossy magazines – well that’s just fabulous. But I’d like them to celebrate every flaw that’s found in their data, every gap identified in their collection – that’s engagement, that’s access. We need to get beyond defensive posturing and embrace the risky, exciting possibilities that come from critical engagement with collection data – recognising hacking as a way of knowing.&lt;/p&gt;
&lt;p&gt;In this new post-truth world it’s going to be more important than ever to challenge what is given, what is ‘natural’, what is ‘inevitable’. Our cultural heritage will be a crucially important resource to be mobilised in defence of complexity, nuance, and doubt – the rich and glorious reality of simply being human.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The early part of 2016 was dominated by the &lt;a href=&#34;https://discontents.com.au/fundtrove/index.html&#34;&gt;#fundTrove campaign&lt;/a&gt;, when Trove users mobilised to make the government aware of its importance to the Australian community. It took over my life for a while, and while many were keen to claim credit for the campaign&amp;rsquo;s ultimate success, it left me thinking that GLAM organisations need to better understand who their real friends are – the people who actually give a shit. It seems that the NLA is still struggling with that.&lt;/p&gt;
&lt;h2 id=&#34;so-what-now&#34;&gt;So what now?&lt;/h2&gt;
&lt;p&gt;I have to admit that the NLA&amp;rsquo;s inability to acknowledge the existence of the GLAM Workbench has taken an emotional toll. At times I&amp;rsquo;ve considered giving up the work. Why bother if it&amp;rsquo;s not going to get to the people who might benefit most?&lt;/p&gt;
&lt;p&gt;So at this moment I don&amp;rsquo;t feel like arguing with the NLA. If they think so little of my work that they&amp;rsquo;re happy to simply pull the plug and let it die, then what&amp;rsquo;s the point in trying to continue?&lt;/p&gt;
&lt;p&gt;However, there&amp;rsquo;s a bigger issue. Whatever happens to my work, it&amp;rsquo;s important that this &lt;em&gt;type&lt;/em&gt; of work be encouraged and supported. Trove offers immense possibilities for new types of research and we need to explore and document them together. Central to this is a well-supported API. I&amp;rsquo;m worried that this little battle is actually a sign of waning commitment to the API and what it represents. Earlier this year I was shocked when the NLA suddenly decommissioned version 2 of the API without fixing &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-api-intro/issues/49&#34;&gt;major bugs in version 3&lt;/a&gt;. I think we need to stress that easy access to Trove data is vitally important to the future of Australian HASS research.&lt;/p&gt;
&lt;p&gt;So if you&amp;rsquo;ve used any of my tools or resources, or value the work I&amp;rsquo;ve been doing over the last 15 years, you might like to tell the NLA about it. I don&amp;rsquo;t know if it&amp;rsquo;ll make any difference, but at least they&amp;rsquo;ll be better informed about the sorts of things people are doing with Trove data, and the types of resources that are needed to support them.&lt;/p&gt;
&lt;p&gt;Contact options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://librariesaustraliaref.nla.gov.au/reft100.aspx?key=Trove_Feedback&#34;&gt;Trove feedback form&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Marie-Louise Ayres, Director-General of the NLA (&lt;a href=&#34;mailto:directorgeneral@nla.gov.au&#34;&gt;directorgeneral@nla.gov.au&lt;/a&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Tony Burke, Minister for the Arts (&lt;a href=&#34;mailto:tony.burke.mp@aph.gov.au&#34;&gt;tony.burke.mp@aph.gov.au&lt;/a&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course you can also share your thoughts on social media!&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;All posts on this topic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;25 February 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/02/24/years-of-work-on-trove.html&#34;&gt;15 years of work on Trove threatened by the NLA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;2 March 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/03/02/trove-api-users-beware-the.html&#34;&gt;Trove API users beware! – the latest in the saga of my cancelled API keys&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;11 April 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/04/11/update-on-trove-data-access.html&#34;&gt;Update on Trove data access and my suspended API keys&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;7 May 2025: &lt;a href=&#34;https://updates.timsherratt.org/2025/05/07/farewell-trove.html&#34;&gt;Farewell Trove&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>The Primary Source – GLAM collection news and help</title>
      <link>https://updates.timsherratt.org/2025/02/20/the-primary-source-glam-collection.html</link>
      <pubDate>Thu, 20 Feb 2025 16:31:54 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/02/20/the-primary-source-glam-collection.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve created a new site (or in fact, renovated an old site) to aggregate news from GLAM collections (that&amp;rsquo;s galleries, libraries, archives, and museums) and help researchers using those collections. It&amp;rsquo;s called &lt;a href=&#34;https://ozglam.chat/&#34;&gt;The Primary Source&lt;/a&gt; which is a bit of a bad history pun.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-02-20-16-27-47.png&#34; width=&#34;600&#34; height=&#34;423&#34; alt=&#34;Screen capture of the new Primary Source&#34;&gt;
&lt;h2 id=&#34;why-is-is-needed&#34;&gt;Why is it needed?&lt;/h2&gt;
&lt;p&gt;Before the nazi takeover of the old bird site, I had a list of GLAM organisation accounts which made it pretty easy to follow what was going on in Australia&amp;rsquo;s galleries, libraries, archives, and museums. Things are more fragmented now and surviving social media accounts seem dominated by event promotion, cute videos, and cultural heritage clickbait. There are a few blogs (though apparently the fashion is to call them &amp;lsquo;stories&amp;rsquo;), but functioning RSS feeds are rare. How can researchers find out about new GLAM collections or resources across Australia? Hopefully &lt;em&gt;The Primary Source&lt;/em&gt; can help by aggregating collection news from a variety of platforms.&lt;/p&gt;
&lt;p&gt;At the same time, I wanted to provide a space where researchers can share their latest work using GLAM collections, and ask for help when needed. Unfortunately, GLAM social media accounts (with a few exceptions) rarely share the work of researchers outside their own fellowship and events programs. &lt;em&gt;The Primary Source&lt;/em&gt; is built on top of the Discourse discussion platform, so anyone can create an account and contribute.&lt;/p&gt;
&lt;h2 id=&#34;discussion-categories&#34;&gt;Discussion categories&lt;/h2&gt;
&lt;p&gt;At the moment there are four main categories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://ozglam.chat/c/collection-news/6&#34;&gt;Collection news&lt;/a&gt;: News about recent additions or updates to Australian GLAM collections&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://ozglam.chat/c/useful-resources/11&#34;&gt;Useful resources&lt;/a&gt;: Resources that help people use and understand Australian GLAM collections – digital tools, finding aids, webinars etc.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://ozglam.chat/c/help-needed/5&#34;&gt;Help needed&lt;/a&gt;: If you’re having a problem using Australian GLAM collections, ask for help here. No question is too basic.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://ozglam.chat/c/research-updates/9&#34;&gt;Research updates&lt;/a&gt;: Share news from your own research projects.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;m trying to keep things pretty simple, but ideas for new categories are welcome.&lt;/p&gt;
&lt;h2 id=&#34;how-to-contribute&#34;&gt;How to contribute&lt;/h2&gt;
&lt;p&gt;As mentioned, anyone can create an account and start contributing to &lt;em&gt;The Primary Source&lt;/em&gt;. You can manually add posts using the Discourse interface, or you can take advantage of the site&amp;rsquo;s automated Zotero-bot. If you come across an interesting post or resource on the web, simply use &lt;a href=&#34;https://www.zotero.org/&#34;&gt;Zotero&lt;/a&gt; to save it to the appropriate category in the &lt;a href=&#34;https://www.zotero.org/groups/5835691/primary_source_contributions&#34;&gt;Primary Source contributions Zotero group&lt;/a&gt;. Every 30 minutes, the Zotero-bot will check the group and add any new links to &lt;em&gt;The Primary Source&lt;/em&gt;. If you add tags or comments to a link in Zotero, they&amp;rsquo;ll be attached to the new Discourse post. To use the Zotero-bot, all you need to do is join the &lt;a href=&#34;https://www.zotero.org/groups/5835691/primary_source_contributions&#34;&gt;Zotero group&lt;/a&gt;, then it&amp;rsquo;ll pop up in your list of group libraries.  Once again, membership is open.&lt;/p&gt;
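&lt;p&gt;For the curious, the Zotero-bot&amp;rsquo;s basic logic can be sketched in a few lines of Python. This is a minimal illustration, not the bot&amp;rsquo;s actual code – the field handling and the Discourse payload shape are assumptions, though the item structure follows the Zotero web API, where bibliographic fields live under each item&amp;rsquo;s &lt;code&gt;data&lt;/code&gt; key.&lt;/p&gt;

```python
# Hypothetical sketch of turning a Zotero group item into a Discourse post.
# The payload fields mirror Discourse's posting API; the category id and
# field choices here are illustrative assumptions.
from typing import Dict


def build_discourse_payload(item: Dict, category_id: int) -> Dict:
    """Map a Zotero item's metadata to a Discourse post payload."""
    data = item["data"]
    tags = [t["tag"] for t in data.get("tags", [])]
    # The post body combines the saved link with any comment/abstract.
    body_parts = [data.get("url", ""), data.get("abstractNote", "")]
    return {
        "title": data.get("title", "Untitled"),
        "raw": "\n\n".join(p for p in body_parts if p),
        "category": category_id,
        "tags": tags,
    }


item = {
    "data": {
        "title": "New digitised maps in Trove",
        "url": "https://example.com/maps",
        "abstractNote": "A big batch of parish maps.",
        "tags": [{"tag": "maps"}],
    }
}
payload = build_discourse_payload(item, category_id=6)
print(payload["title"])
```

A real bot would poll the group every 30 minutes, remember which item keys it had already seen, and POST each new payload to Discourse with an API key.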
&lt;p&gt;Obviously any content that is offensive or off-topic will be removed.&lt;/p&gt;
&lt;h2 id=&#34;staying-in-touch&#34;&gt;Staying in touch&lt;/h2&gt;
&lt;p&gt;To keep up-to-date with the latest posts from &lt;em&gt;The Primary Source&lt;/em&gt; you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create an account at &lt;em&gt;The Primary Source&lt;/em&gt;. Once you’ve done that, you’ll receive a weekly email with posts you might have missed. (Of course you can disable this if you don’t want more emails!)&lt;/li&gt;
&lt;li&gt;Add the &lt;em&gt;The Primary Source&lt;/em&gt; RSS feed to your preferred feed reader: &lt;a href=&#34;https://ozglam.chat/latest.rss&#34;&gt;ozglam.chat/latest.rs&amp;hellip;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Follow &lt;a href=&#34;https://wraggebots.net/@primarysourcebot&#34;&gt;@primarysourcebot&lt;/a&gt; on Mastodon. The bot checks for new posts every hour and shares them through the fediverse.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;third-time-lucky&#34;&gt;Third time lucky?&lt;/h2&gt;
&lt;p&gt;The first version of &lt;em&gt;The Primary Source&lt;/em&gt; was created way back in 1998. Then, as now, I wanted to help researchers find and use collections from Australia&amp;rsquo;s archives and libraries. There weren&amp;rsquo;t many content management systems around back then, so I rolled my own using PHP. It was a pain to maintain, and life went elsewhere, so it didn&amp;rsquo;t survive very long. I&amp;rsquo;ve always thought the name was pretty clever though&amp;hellip;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-02-20-16-23-38.png&#34; width=&#34;600&#34; height=&#34;842&#34; alt=&#34;Screenshot of The Primary Source circa 1998&#34;&gt;
&lt;p&gt;The pandemic made me think again about ways of supporting researchers, so I created the OzGLAM help discussion board. Even though there wasn&amp;rsquo;t much activity, I&amp;rsquo;ve always believed something like this was needed. So with a bit of remodelling and renovation, OzGLAM help has been flipped to make &lt;em&gt;The Primary Source&lt;/em&gt;. Will it survive this time? I suppose that&amp;rsquo;s a matter of whether you find it useful and valuable. If you do, &lt;strong&gt;please share with your friends and colleagues&lt;/strong&gt;!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>National Archives of Australia Digitisation Dashboard</title>
      <link>https://updates.timsherratt.org/2025/02/20/national-archives-of-australia-digitisation.html</link>
      <pubDate>Thu, 20 Feb 2025 14:41:46 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/02/20/national-archives-of-australia-digitisation.html</guid>
      <description>&lt;p&gt;Since March 2021, I&amp;rsquo;ve been &lt;a href=&#34;https://github.com/wragge/naa-recently-digitised&#34;&gt;harvesting details of newly-digitised files&lt;/a&gt; in the National Archives of Australia to help document long-term changes to online access. A few weeks ago, I &lt;a href=&#34;https://updates.timsherratt.org/2025/01/27/files-digitised-by-the-national.html&#34;&gt;summarised the data from 2024&lt;/a&gt;, and published &lt;a href=&#34;https://doi.org/10.5281/zenodo.14744049&#34;&gt;annual compilations in Zenodo&lt;/a&gt;. I&amp;rsquo;ve now created an &lt;a href=&#34;https://wragge.github.io/naa-recently-digitised/&#34;&gt;automatically-updated dashboard&lt;/a&gt; which displays digitisation progress in the past week, the current year, and since my harvests began.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-02-20-14-35-04.png&#34; width=&#34;600&#34; height=&#34;477&#34; alt=&#34;Screenshot of digitisation dashboard showing details from the past week.&#34;&gt;
&lt;p&gt;Each week, after the latest data harvest, a GitHub action runs a Jupyter notebook that pulls in the data, generates some visualisations and summaries, and saves the results as an HTML page. It&amp;rsquo;s similar to the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;Trove newspaper data dashboard&lt;/a&gt;. Check in every Sunday afternoon to see what&amp;rsquo;s changed!&lt;/p&gt;
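&lt;p&gt;The weekly workflow described above might look something like this as a GitHub Actions file. This is a hypothetical sketch – the notebook name, schedule, and publishing steps are assumptions, not the repository&amp;rsquo;s actual configuration.&lt;/p&gt;

```yaml
# Hypothetical workflow -- runs a notebook weekly and saves the result as HTML.
name: Update dashboard
on:
  schedule:
    - cron: "0 3 * * 0"   # weekly, after the Sunday data harvest
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - name: Execute notebook and save as HTML
        run: jupyter nbconvert --execute --to html dashboard.ipynb --output index
      - name: Commit the updated page
        run: |
          git config user.name github-actions
          git add index.html
          git commit -m "Update dashboard" || true
          git push
```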
</description>
    </item>
    
    <item>
      <title>Search the content of periodicals uploaded to Trove through the National eDeposit service </title>
      <link>https://updates.timsherratt.org/2025/02/19/search-the-content-of-periodicals.html</link>
      <pubDate>Wed, 19 Feb 2025 15:12:41 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/02/19/search-the-content-of-periodicals.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve &lt;a href=&#34;https://glam-workbench.net/trove-ned/create-searchable-database/&#34;&gt;added a notebook&lt;/a&gt; to the GLAM Workbench that walks through the steps involved in creating a fully searchable database of content extracted from a periodical uploaded to &lt;a href=&#34;https://trove.nla.gov.au/&#34;&gt;Trove&lt;/a&gt; through the &lt;a href=&#34;https://ned.gov.au/ned/&#34;&gt;National eDeposit service (NED)&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;why-is-this-needed&#34;&gt;Why is this needed?&lt;/h2&gt;
&lt;p&gt;I was contacted recently by a member of the team that publishes &lt;em&gt;&lt;a href=&#34;https://nla.gov.au/nla.obj-3121636851&#34;&gt;The Triangle&lt;/a&gt;&lt;/em&gt;, a community newsletter from the south coast of NSW. Issues of &lt;em&gt;The Triangle&lt;/em&gt; from 2007 to the present have been uploaded to Trove through the National eDeposit service, but they were wondering whether it was possible to search &lt;em&gt;across&lt;/em&gt; all their newsletters in Trove. Unfortunately, the answer is no.&lt;/p&gt;
&lt;p&gt;Issues of &lt;em&gt;The Triangle&lt;/em&gt; are saved in Trove as PDFs with a searchable text layer. Individual issues can be browsed and searched using the built-in PDF viewer, but there seems to be no way of searching across multiple issues in Trove. There are a couple of reasons for this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The text content doesn&amp;rsquo;t seem to have been indexed. Trove has the ability to index text from PDFs, though there&amp;rsquo;s a limit to how much text it consumes. Things like articles from institutional repositories, for example, will have some or all of their content indexed. It&amp;rsquo;s hard to be certain, but searches for content within &lt;em&gt;The Triangle&lt;/em&gt; don&amp;rsquo;t work, so I&amp;rsquo;m assuming they&amp;rsquo;re not being indexed.&lt;/li&gt;
&lt;li&gt;Even if they were indexed, there are no work level records for articles or issues within NED periodicals. So if there was a match on content within an issue, your search results would return the top-level title record, and you&amp;rsquo;d still have to search each issue individually to find the match. Compare this to the way Trove&amp;rsquo;s digitised periodicals are indexed at article level in the &lt;em&gt;Magazines &amp;amp; Newsletters&lt;/em&gt; category.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On top of this, Trove&amp;rsquo;s inconsistency and lack of transparency means that you&amp;rsquo;re never really sure what you&amp;rsquo;re searching and why. Why do some PDFs get indexed, but others don&amp;rsquo;t? Why are community newsletters contributed through NED in the &amp;lsquo;Books &amp;amp; Libraries&amp;rsquo; category rather than &amp;lsquo;Magazines &amp;amp; Newsletters&amp;rsquo;? Why are some periodicals searchable by article while others are not? I&amp;rsquo;m trying to document many of these inconsistencies in the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt; as they cause confusion and uncertainty for users – if your search returns no results, is it because Trove has no relevant content, or is it because the relevant content isn&amp;rsquo;t fully searchable? You just don&amp;rsquo;t know.&lt;/p&gt;
&lt;h2 id=&#34;why-is-this-important&#34;&gt;Why is this important?&lt;/h2&gt;
&lt;p&gt;I recently updated &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-ned-periodicals-data&#34;&gt;my harvest of periodicals submitted to Trove through the National eDeposit Service&lt;/a&gt;. In total, there are &lt;strong&gt;8,572 different periodicals containing 179,510 issues&lt;/strong&gt;! I used the &lt;code&gt;l-format=Periodical&lt;/code&gt; facet to separate out the periodicals from other types of publications, and I don&amp;rsquo;t think it&amp;rsquo;s always accurate – some of the publications in the dataset look like one-off reports. Nonetheless, there are &lt;em&gt;lots&lt;/em&gt; of periodicals. As &lt;a href=&#34;https://updates.timsherratt.org/2024/04/10/getting-to-know.html&#34;&gt;I&amp;rsquo;ve noted previously&lt;/a&gt;, this includes a rich assortment of local and community newsletters – not just &lt;em&gt;The Triangle&lt;/em&gt;, but &lt;em&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-2815835489&#34;&gt;The Apollo Bay News&lt;/a&gt;&lt;/em&gt;, &lt;em&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-1252246096&#34;&gt;Palm Island Voice&lt;/a&gt;&lt;/em&gt;, and many, many others. As local newspapers die out, these sorts of publications capture details of community life that might otherwise be missing from the historical record. Just as the diversity of Trove&amp;rsquo;s digitised newspapers has given historians new perspectives on the past, I believe these NED periodicals will provide researchers with an increasingly important source of information on daily life in Australia. Equally, having these publications accessible online through a free, national service like Trove ensures that communities themselves will have ongoing access to their own histories. But both for researchers and communities, the value of the publications will be affected by their accessibility – how can they be found, searched, and used?&lt;/p&gt;
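&lt;p&gt;A harvest request using that facet can be sketched as a simple URL-building function. The endpoint and parameter names follow version 3 of the Trove API, but the query values below are illustrative – check the current API documentation (and the harvest repository) for the exact query used.&lt;/p&gt;

```python
# Sketch of a Trove API v3 request using the l-format facet discussed above.
# The q and category values are placeholder assumptions, not the real harvest query.
from urllib.parse import urlencode

API_BASE = "https://api.trove.nla.gov.au/v3/result"


def build_periodicals_query(n: int = 100) -> str:
    """Build a Trove search URL that limits results to the Periodical format."""
    params = {
        "q": "*",                  # illustrative -- the real harvest targets NED content
        "category": "all",
        "l-format": "Periodical",  # separate periodicals from other formats
        "encoding": "json",
        "n": n,                    # results per request
    }
    return f"{API_BASE}?{urlencode(params)}"


print(build_periodicals_query())
```

In practice you would send this URL with your API key in an `X-API-KEY` header and page through the results.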
&lt;h2 id=&#34;one-solution--a-diy-search-index&#34;&gt;One solution – a DIY search index&lt;/h2&gt;
&lt;p&gt;If you can&amp;rsquo;t search across a NED periodical within Trove, perhaps we need to develop alternative approaches outside of Trove. Using &lt;em&gt;The Triangle&lt;/em&gt; as my test case, I&amp;rsquo;ve developed a workflow that creates a standalone, fully searchable database of content from a NED periodical. The database supports full text searches across the complete text content, including query options like wildcards and boolean operators that don&amp;rsquo;t work within standard PDF viewers. Have a go! You can try &lt;a href=&#34;https://glam-workbench.net/datasette-lite-search/?url=https://github.com/GLAM-Workbench/trove-ned-periodicals/blob/main/dbs/the-triangle/the-triangle.db&amp;amp;metadata=https://github.com/GLAM-Workbench/trove-ned-periodicals/blob/main/dbs/the-triangle/metadata.json&#34;&gt;searching The Triangle&lt;/a&gt; here.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-02-19-14-44-59.png&#34; width=&#34;600&#34; height=&#34;363&#34; alt=&#34;Screenshot of the Triangle search interface&#34;&gt;
&lt;p&gt;There are a number of different steps involved in creating databases like this from NED periodicals:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;harvest basic metadata about all the issues&lt;/li&gt;
&lt;li&gt;download the PDFs of every issue&lt;/li&gt;
&lt;li&gt;extract the text for each page in the PDFs&lt;/li&gt;
&lt;li&gt;create an SQLite database containing the metadata and text for each page&lt;/li&gt;
&lt;li&gt;build a full-text index on the text content to allow easy searching&lt;/li&gt;
&lt;/ul&gt;
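&lt;p&gt;The last two steps above can be sketched with Python&amp;rsquo;s built-in sqlite3 module and SQLite&amp;rsquo;s FTS5 extension. The schema, column names, and sample data below are simplified illustrations, not the exact structure created by the notebook.&lt;/p&gt;

```python
# Minimal sketch: a pages table, an FTS5 index over its text, and a search.
# Table and column names are hypothetical simplifications.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pages (issue_date TEXT, page INTEGER, text TEXT)")
pages = [
    ("2024-03-01", 1, "The oyster festival returns to the Triangle."),
    ("2024-03-01", 2, "Council notes and tide times."),
    ("2024-04-01", 1, "Volunteers wanted for the landcare group."),
]
con.executemany("INSERT INTO pages VALUES (?, ?, ?)", pages)

# Build a full-text index on the text column (FTS5 'external content' mode,
# so the text itself isn't stored twice).
con.execute("CREATE VIRTUAL TABLE pages_fts USING fts5(text, content='pages')")
con.execute("INSERT INTO pages_fts(rowid, text) SELECT rowid, text FROM pages")

# Full-text query with highlighted snippets -- boolean operators just work.
rows = con.execute(
    """SELECT p.issue_date, p.page, snippet(pages_fts, 0, '[', ']', '...', 8)
       FROM pages_fts JOIN pages p ON p.rowid = pages_fts.rowid
       WHERE pages_fts MATCH 'oyster OR landcare'
       ORDER BY pages_fts.rank"""
).fetchall()
for issue_date, page, snip in rows:
    print(issue_date, page, snip)
```

Wildcard queries (`oyster*`) and phrase queries (`"tide times"`) work the same way via the `MATCH` clause.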
&lt;p&gt;This is fully documented in the GLAM Workbench notebook &lt;a href=&#34;https://glam-workbench.net/trove-ned/create-searchable-database/&#34;&gt;Create a searchable database from issues of a NED periodical&lt;/a&gt;. Once you&amp;rsquo;ve created the database you can explore it using any SQLite client, but I like to use &lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt;. The notebook creates a custom metadata file for configuring Datasette, and explains how to open the database using either Datasette or Datasette-Lite.&lt;/p&gt;
&lt;h2 id=&#34;customising-datasette-for-easy-searching&#34;&gt;Customising Datasette for easy searching&lt;/h2&gt;
&lt;p&gt;The standard Datasette interface can look a little intimidating if you just want to run a full text search. To make it easier, I&amp;rsquo;ve developed a customised Datasette theme and canned query that generates a simple search page with a few extra features such as date facets, result snippets, and query highlighting. It&amp;rsquo;s basically designed to look like a standard search interface. Sometimes simple takes a bit of extra work!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-02-19-13-59-07.png&#34; width=&#34;600&#34; height=&#34;445&#34; alt=&#34;Screenshot of the customised search interface showing search results from The Triangle&#34;&gt;
&lt;p&gt;The canned query defines the search parameters and constructs the SQL query called by the search box. It&amp;rsquo;s automatically included in the &lt;code&gt;metadata.json&lt;/code&gt; file created by the notebook. To put everything together, you just need to point Datasette to your database, the metadata file, and the custom template.&lt;/p&gt;
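&lt;p&gt;For readers unfamiliar with Datasette&amp;rsquo;s canned queries, a &lt;code&gt;metadata.json&lt;/code&gt; file along these lines defines a named query with a &lt;code&gt;:query&lt;/code&gt; parameter that the search box can call. The database, query, and column names below are invented for illustration; the notebook generates the real file for you.&lt;/p&gt;

```json
{
  "databases": {
    "my-periodical": {
      "queries": {
        "search": {
          "title": "Search this periodical",
          "sql": "select issue_id, page, snippet(pages_fts, 0, '[', ']', '...', 30) as snippet from pages_fts join pages on pages.rowid = pages_fts.rowid where pages_fts match :query"
        }
      }
    }
  }
}
```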
&lt;p&gt;I&amp;rsquo;ve embedded the template in a &lt;a href=&#34;https://github.com/GLAM-Workbench/datasette-lite-search/&#34;&gt;new Datasette-Lite repository&lt;/a&gt;. The notebook explains how to construct a url that will open your database using this repository. &lt;a href=&#34;https://glam-workbench.net/datasette-lite-search/?url=https://github.com/GLAM-Workbench/trove-ned-periodicals/blob/main/dbs/the-triangle/the-triangle.db&amp;amp;metadata=https://github.com/GLAM-Workbench/trove-ned-periodicals/blob/main/dbs/the-triangle/metadata.json&#34;&gt;The Triangle search interface&lt;/a&gt; runs using this customised version of Datasette-Lite.&lt;/p&gt;
&lt;h2 id=&#34;a-new-ned-section-for-the-glam-workbench&#34;&gt;A new NED section for the GLAM Workbench&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve also created a new &lt;a href=&#34;https://glam-workbench.net/trove-ned/&#34;&gt;Trove NED section&lt;/a&gt; of the GLAM Workbench. The notebook that &lt;a href=&#34;https://glam-workbench.net/trove-ned/harvest-ned-periodicals/&#34;&gt;harvests metadata about NED periodicals&lt;/a&gt; was previously in the Trove periodicals section, but I think it&amp;rsquo;s better to keep the NED documentation separate. When I first started developing the GLAM Workbench, it made sense for the structure to mirror Trove&amp;rsquo;s zones. But over the years I&amp;rsquo;ve become &lt;em&gt;very&lt;/em&gt; aware that the way content is treated in Trove has less to do with its format than with the processing pipelines that get it into Trove. In Trove, digitised newspapers are very different to digitised journals, and digitised journals are very different to NED periodicals, even though they&amp;rsquo;re all really periodicals! There&amp;rsquo;s more of this sort of fun in the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;please-share&#34;&gt;Please share!&lt;/h2&gt;
&lt;p&gt;Trove and the National Library of Australia refuse to share links relating to the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; or the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;, so there&amp;rsquo;s a good chance that the people who might benefit most directly from this work will never know that it exists, and will instead be left struggling on their own. I think that&amp;rsquo;s bad for Australian HASS research. So if it seems this might be useful, please share amongst your networks!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Ten years of data! The files you&#39;re not allowed to see in the National Archives of Australia</title>
      <link>https://updates.timsherratt.org/2025/02/05/ten-years-of-data-the.html</link>
      <pubDate>Wed, 05 Feb 2025 13:26:30 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/02/05/ten-years-of-data-the.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve created a &lt;a href=&#34;https://doi.org/10.5281/zenodo.14769171&#34;&gt;new dataset&lt;/a&gt; containing 10 years of data that can be used to explore the workings of the National Archives of Australia&amp;rsquo;s access examination system.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/screenshot-from-2025-02-05-13-23-15.png&#34; width=&#34;600&#34; height=&#34;377&#34; alt=&#34;Screenshot of dataset record in Zenodo&#34;&gt;
&lt;p&gt;Australian government records become available for public access after 20 years. But before being opened to the public, records go through a process known as access examination to determine whether they should be withheld, either partially or completely. The grounds for exemption are laid out in the &lt;em&gt;Archives Act&lt;/em&gt; and include things like national security and personal privacy. If a record is completely withheld from access, the NAA&amp;rsquo;s database, RecordSearch, records its access status as &amp;lsquo;closed&amp;rsquo;.&lt;/p&gt;
&lt;p&gt;On or about 1 January every year since 2016, I&amp;rsquo;ve harvested details of files in RecordSearch with the access status of &amp;lsquo;closed&amp;rsquo;. On the day when the media is full of revelations from the public release of the latest batch of cabinet records, I thought it was important to find out what we couldn&amp;rsquo;t see, as well as what we could. I&amp;rsquo;ve now published all the annual harvests as a &lt;a href=&#34;https://doi.org/10.5281/zenodo.14769171&#34;&gt;dataset on Zenodo&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s important to note that records can be re-examined and their access status can change. Also, some &amp;lsquo;closed&amp;rsquo; files are actually &amp;lsquo;withheld pending advice&amp;rsquo; – in these cases a final access decision hasn&amp;rsquo;t been made, as the NAA has referred the files to their controlling agencies for advice. This means the dataset should be treated as providing annual snapshots of an active system, not a cumulative record of closed files. Some of the complexities of the access examination system revealed by this data are discussed in the &lt;em&gt;Inside Story&lt;/em&gt; article &lt;a href=&#34;https://insidestory.org.au/withheld-pending-advice/&#34;&gt;&amp;lsquo;Withheld pending advice&amp;rsquo;&lt;/a&gt;. I&amp;rsquo;m hoping to do some more analysis later this year.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>A Community Data Lab (CDL) wishlist</title>
      <link>https://updates.timsherratt.org/2025/01/31/a-community-data-lab-cdl.html</link>
      <pubDate>Fri, 31 Jan 2025 12:40:52 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/01/31/a-community-data-lab-cdl.html</guid>
      <description>&lt;p&gt;The ARDC is holding &lt;a href=&#34;https://ardc.edu.au/event/help-shape-the-ardc-community-data-lab-for-hass-and-indigenous-research/&#34;&gt;an event on 18 February&lt;/a&gt; to begin shaping the next phase of the &lt;a href=&#34;https://ardc.edu.au/project/ardc-community-data-lab/&#34;&gt;Community Data Lab&lt;/a&gt;. If you&amp;rsquo;re interested in the development of digital tools and resources to support HASS research, I&amp;rsquo;d suggest you go along.&lt;/p&gt;
&lt;p&gt;I worked on the first phase of the Community Data Lab, developing the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt; amongst other things. I&amp;rsquo;m very keen to see the CDL expand, working with researchers to create new possibilities for digital research, particularly using the rich collections of the GLAM sector (galleries, libraries, archives, and museums). As planning gets underway for the next phase of the CDL, I thought I&amp;rsquo;d pull together some rough ideas about what the CDL might be and might do. The ARDC needs co-investment in its projects, so new initiatives have to have some form of institutional support to become part of the CDL. Nonetheless, I think it&amp;rsquo;s useful to continue to think more broadly about HASS research infrastructure needs and possibilities – both long-term requirements and short-term interventions.&lt;/p&gt;
&lt;h2 id=&#34;extending-co-design-throughout-the-life-of-the-cdl&#34;&gt;Extending co-design throughout the life of the CDL&lt;/h2&gt;
&lt;p&gt;Currently CDL co-design processes are focused on the initial design phase of a project and are structured around a plan developed from the institutional partners&#39; interests and priorities. It can be difficult to relate specific research tasks and needs to these larger-scale initiatives. How can co-design processes be embedded in a project&amp;rsquo;s ongoing development?&lt;/p&gt;
&lt;p&gt;One possibility is that CDL projects could have shortish development cycles with co-design processes before each cycle to refine the scope and priorities. This would increase opportunities for community participation and feedback, but the process would still be focused on the needs of the project rather than the needs of researchers.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d like to see an alternative (or additional) component that operates like a CDL &amp;lsquo;help desk&amp;rsquo; or &amp;lsquo;triage&amp;rsquo; service. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a researcher outlines a specific problem or need&lt;/li&gt;
&lt;li&gt;if useful tools/datasets/documentation already exist, CDL project staff point the researcher towards them&lt;/li&gt;
&lt;li&gt;if there&amp;rsquo;s nothing to meet this specific need, some basic analysis is done to see where it might fit in the CDL universe and how much work might be required&lt;/li&gt;
&lt;li&gt;for small-scale solutions within current capacities, simply create the new resource (eg. a piece of documentation, a single notebook, a slice of an existing dataset)&lt;/li&gt;
&lt;li&gt;for larger-scale solutions, document the gap/need to feed into future CDL design processes&lt;/li&gt;
&lt;li&gt;add all of the information – problem, requirements, solution – to a searchable knowledgebase&lt;/li&gt;
&lt;li&gt;the knowledgebase should allow user comments/links and voting on unsolved problems to help identify future priorities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some of the advantages to this approach would be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;direct engagement with the CDL whenever a researcher identifies a specific problem, not just when the CDL&amp;rsquo;s co-design timetable and framework allow&lt;/li&gt;
&lt;li&gt;recognises that many problems can be solved with existing tools and resources, both inside and outside of the CDL&lt;/li&gt;
&lt;li&gt;encourages direct researcher engagement through the knowledgebase, bringing in broader community perspectives and suggestions, as well as information about related projects and tools&lt;/li&gt;
&lt;li&gt;helps to put the &amp;lsquo;community&amp;rsquo; in Community Data Lab&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;connecting-things-up&#34;&gt;Connecting things up&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s a lot of work to be done connecting up existing tools and data sources. This requires a combination of documentation and the development of small-scale tools or plugins to transform and move data as required. This sort of work fills in the gaps of existing research infrastructures, helping to ensure that best use is made of existing investments, and encouraging re-use and collaboration instead of duplication or reinvention. It could be integrated with the ‘help desk’ function described above.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://tdg.glam-workbench.net/pathways/index.html&#34;&gt;pathways/tutorials in the Trove Data Guide&lt;/a&gt; are all examples of this; indeed, the development of additional Data Guides around particular collections or collection types could help fill a lot of these gaps.&lt;/p&gt;
&lt;p&gt;Similarly, I think it’s important to put CDL projects within a broader context, so researchers are guided to solutions that best meet their needs.&lt;/p&gt;
&lt;p&gt;For example, I&amp;rsquo;m a big fan of Omeka. I gave an Omeka workshop at THATCamp Melbourne way back in 2011, I created a &lt;a href=&#34;https://wragge.github.io/omeka_s_tools/&#34;&gt;Python client for interacting with the Omeka-S API&lt;/a&gt;, and I&amp;rsquo;ve used Omeka in a number of projects. But just as not every blog needs WordPress, not every online collection needs Omeka. There are alternative pathways that don&amp;rsquo;t require the full technology stack and might better suit the needs of individual researchers. It&amp;rsquo;s important to keep these alternatives in mind and provide documentation that will enable researchers to make decisions about what&amp;rsquo;s best for their project.&lt;/p&gt;
&lt;p&gt;For example, if you want to share data (even relational databases) in a form that other users can search and explore, then Datasette might be a better option. Even within the Datasette ecosystem there are a variety of deployment possibilities. Datasette-Lite runs wholly within the browser; all you need is a GitHub repository (or similar) to point at. Most of the CSV files that I share through the GLAM Workbench have an option to explore using Datasette-Lite. All I do is create a url that points my existing Datasette-Lite repository at the CSV file. I&amp;rsquo;ve now created a &lt;a href=&#34;https://updates.timsherratt.org/2024/07/19/share-your-spreadsheet.html&#34;&gt;simple tool to help others create these links&lt;/a&gt;. You can even add full text indexes and embed images. Because Datasette-Lite uses static technologies, the maintenance overheads are minimised.&lt;/p&gt;
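&lt;p&gt;Building one of these links is just a matter of url-encoding the CSV&amp;rsquo;s address into a query string. A rough sketch (the base url below is illustrative, not a real deployment; Datasette-Lite also accepts parameters such as &lt;code&gt;url&lt;/code&gt; and &lt;code&gt;metadata&lt;/code&gt; for SQLite databases):&lt;/p&gt;

```python
from urllib.parse import urlencode

# Any Datasette-Lite deployment works the same way; this base url is
# a placeholder, not an actual GLAM Workbench address.
BASE = "https://example.github.io/datasette-lite/"

def csv_link(csv_url):
    """Build a link that loads a remote CSV into Datasette-Lite.

    Datasette-Lite fetches the file named in the `csv` query-string
    parameter and imports it into an in-browser SQLite database.
    """
    return BASE + "?" + urlencode({"csv": csv_url})

link = csv_link("https://example.com/data/my-dataset.csv")
```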
&lt;p&gt;If your dataset is too large or complex for Datasette-Lite, you can use a standard Datasette instance. It&amp;rsquo;s easy to scale. I have one Datasette instance that contains &lt;a href=&#34;https://updates.timsherratt.org/2024/08/26/more-datasets-added.html&#34;&gt;11 million rows of data, in 279 datasets, from 10 different GLAM organisations&lt;/a&gt; running on Google Cloud Run. The &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/tasmanian-post-office-directories/&#34;&gt;Tasmanian Post Office Directories&lt;/a&gt; are also in Datasette, with an IIIF viewer embedded in a customised theme.&lt;/p&gt;
&lt;p&gt;Similarly, if a researcher&amp;rsquo;s main aim is to create a website that displays an annotated collection, then a static option, like &lt;a href=&#34;https://collectionbuilder.github.io/&#34;&gt;CollectionBuilder&lt;/a&gt;, might be all they need.&lt;/p&gt;
&lt;p&gt;Using static technologies was one of the patterns highlighted by the &lt;a href=&#34;https://doi.org/10.5281/zenodo.11169744&#34;&gt;CDL architecture and principles documents&lt;/a&gt;, and I think it would be good to get into a &amp;lsquo;static first&amp;rsquo; mindset, only implementing more complex solutions once the need had been established.&lt;/p&gt;
&lt;p&gt;That said, I also think that investment in the development of additional, openly licensed plugins, themes, resource templates, and vocabularies for use with Omeka-S would also be very useful and welcome.&lt;/p&gt;
&lt;h2 id=&#34;zotero-integration&#34;&gt;Zotero integration&lt;/h2&gt;
&lt;p&gt;An example of the small-scale interventions that can be made to mobilise existing data sources is the development of Zotero translators to extract metadata and images from GLAM collection databases. Using Zotero, researchers can collaborate in the creation and annotation of specialised datasets. Once they&amp;rsquo;ve saved a collection of resources to Zotero, they can use the public API to move the data into other tools for analysis or sharing. For example, at the moment I&amp;rsquo;m working on a collection of newspaper articles and extracts from Tasmanian Post Office Directories which have been saved and annotated in Zotero by &lt;a href=&#34;https://everydayheritage.au/blog/from-the-archive-uncovering-the-everyday-heritage-of-chinese-tasmanians/&#34;&gt;members of the EveryDay Heritage team&lt;/a&gt;. Using the API I&amp;rsquo;m downloading the data, sorting the annotations, and populating an Omeka-S instance with details of sources, people, and places.&lt;/p&gt;
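&lt;p&gt;The retrieval side of this workflow uses Zotero&amp;rsquo;s public web API (version 3), which returns a collection&amp;rsquo;s items in batches. A rough sketch using only the standard library (the group id and collection key below are placeholders, not a real library):&lt;/p&gt;

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.zotero.org"

def items_url(library_type, library_id, collection_key, limit=100, start=0):
    """Build the web API (v3) url for the items in a Zotero collection.

    `library_type` is 'groups' for group libraries or 'users' for
    personal libraries.
    """
    query = urlencode({"format": "json", "limit": limit, "start": start})
    return (f"{API_BASE}/{library_type}/{library_id}"
            f"/collections/{collection_key}/items?{query}")

def fetch_items(library_type, library_id, collection_key, limit=100):
    """Page through a collection until a batch comes back short."""
    items, start = [], 0
    while True:
        req = Request(
            items_url(library_type, library_id, collection_key, limit, start),
            headers={"Zotero-API-Version": "3"},
        )
        with urlopen(req) as resp:
            items.extend(json.load(resp))
        if len(items) == start + limit:  # full batch, there may be more
            start += limit
        else:
            return items

# Placeholder identifiers for illustration only
url = items_url("groups", "1234567", "ABCD2345")
```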
&lt;p&gt;The problem is that support for Zotero by GLAM collection databases is patchy at best. I&amp;rsquo;ve &lt;a href=&#34;https://docs.google.com/spreadsheets/d/1Zb_e9ZazP4zs-K8ZcbaTCnv6cO_OgmUx4A7U4MFyOFE/edit?usp=sharing&#34;&gt;created a spreadsheet to summarise the current situation&lt;/a&gt;. Few GLAM institutions embed anything beyond Facebook-approved metadata, so translators (little bits of JavaScript code) are often necessary to feed rich data to Zotero. There are currently 7 custom translators available for GLAM collections including the National Archives of Australia, PROV, and Trove.&lt;/p&gt;
&lt;p&gt;Improving Zotero integration would open up new GLAM collections to digital research. Coordinated effort in the development of new translators, and best-practice documentation for supporting Zotero, would also help GLAM organisations understand what they can do to expand use of their collections.&lt;/p&gt;
&lt;p&gt;The CDL could play a coordinating role in this, rather than do all the work. For example, it could share updates, compile documentation, and perhaps organise a GLAM hack style event to create/update as many translators as possible. It would be a low-cost but high-impact initiative.&lt;/p&gt;
&lt;h2 id=&#34;collections-as-data&#34;&gt;Collections as data&lt;/h2&gt;
&lt;p&gt;The Australian GLAM landscape is littered with decommissioned APIs, dead open data portals, datasets created for some hack event that never get updated, and disappeared labs. But at the same time, there are &lt;a href=&#34;https://glam-workbench.net/glam-datasets-from-gov-portals/&#34;&gt;hundreds of open datasets&lt;/a&gt; shared by GLAM organisations through government data portals that are little acknowledged. In organisations with diminishing resources, shifting priorities, and internal jealousies, it’s difficult to maintain a persuasive argument around the importance of ‘collections as data’ for research. Perhaps the CDL can help.&lt;/p&gt;
&lt;p&gt;Ultimately, it would be great for the CDL itself to incorporate some sort of national, collaborative GLAM Lab, but I think we should start by helping to develop the ammunition that institutions need to argue for their involvement in such an initiative. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Documenting needs&lt;/strong&gt; – What research might happen if particular datasets or tools were available? This is the sort of thing that could be captured through the ‘help desk’. These needs would be shared and discussed, and fed into co-design sessions with GLAM organisations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Understanding use&lt;/strong&gt; – What research is being done using GLAM collections in Australia and elsewhere? What types of data? What types of tools? Obviously there will be examples from CDL projects &amp;amp; elsewhere.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Developing skills&lt;/strong&gt; – Providing opportunities for GLAM staff to develop their own digital research skills. Where CDL projects make use of GLAM collections, invite organisations to nominate staff for active participation in the development processes (like internships).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Creating examples&lt;/strong&gt; – Identify possible GLAM data related developments within CDL projects and pursue small-scale collaborations, for example in relation to government data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Identifying opportunities&lt;/strong&gt; – Which GLAM organisations are in a position to participate now? In particular, identify data assets (like PROV API or datasets in government portals) that are already available but not well documented or understood.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sharing solutions&lt;/strong&gt; – Many of the ‘connecting things up’ solutions are likely to involve moving or transforming GLAM data. Zotero integrations would benefit researchers and organisations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Much of this would not be ‘extra’ work, but would involve the coordination and packaging of existing CDL activities – a GLAM window onto CDL activities. Though I suppose there would need to be a coordinating role.&lt;/p&gt;
&lt;p&gt;Obviously the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; would also have a role in this as a repository for tools and examples. It’d be great if the CDL could work with GLAM organisations to &lt;a href=&#34;https://glam-workbench.net/get-involved/developing-repositories/&#34;&gt;develop their own sections/repositories&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Files digitised by the National Archives of Australia in 2024</title>
      <link>https://updates.timsherratt.org/2025/01/27/files-digitised-by-the-national.html</link>
      <pubDate>Mon, 27 Jan 2025 14:16:09 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/01/27/files-digitised-by-the-national.html</guid>
      <description>&lt;p&gt;In 2024, the National Archives of Australia digitised 254,953 files (down from 416,602 in 2023). This chart shows the number of files digitised per day in 2024.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/visualization20.png&#34; width=&#34;600&#34; height=&#34;254&#34; alt=&#34;Line chart showing the number of files digitised by day in 2024.&#34;&gt;
&lt;p&gt;The decrease in the total number of files digitised is probably related to the completion of the &lt;a href=&#34;https://www.naa.gov.au/explore-collection/defence-and-war-service-records/digitising-second-world-war-service-records&#34;&gt;NAA&amp;rsquo;s five year project to digitise Second World War service records&lt;/a&gt;. Thanks to $10 million in government funding, the NAA has digitised more than a million service records since 2019. In 2023, 81% of records digitised were from series containing service records. This has dropped to around 40% in 2024. Here&amp;rsquo;s the total number of files digitised per year since February 2021.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/visualization21.png&#34; width=&#34;400&#34; height=&#34;290&#34; alt=&#34;Bar chart showing the number of files digitised by year from 2021 to 2024&#34;&gt;
&lt;p&gt;The files digitised in 2024 came from 1,439 different series. Here&amp;rsquo;s the top twenty series by number of items digitised in 2024. You&amp;rsquo;ll see that as well as war records, many photos, patents, and immigration records were digitised.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;series&lt;/th&gt;
&lt;th&gt;series_title&lt;/th&gt;
&lt;th&gt;total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A9301&lt;/td&gt;
&lt;td&gt;RAAF Personnel files of Non-Commissioned Officers (NCOs) and other ranks, 1921-1948&lt;/td&gt;
&lt;td&gt;56,635&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B883&lt;/td&gt;
&lt;td&gt;Second Australian Imperial Force Personnel Dossiers, 1939-1947&lt;/td&gt;
&lt;td&gt;39,163&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A6135&lt;/td&gt;
&lt;td&gt;Photographic colour transparencies positives, daily single number series with &amp;lsquo;K&amp;rsquo; [Colour Transparencies] prefix&lt;/td&gt;
&lt;td&gt;15,569&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A13150&lt;/td&gt;
&lt;td&gt;Specifications, examiners reports and correspondence relating to the Registration of Victorian Patents - Second system&lt;/td&gt;
&lt;td&gt;13,519&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A714&lt;/td&gt;
&lt;td&gt;Books of duplicate certificates of naturalization A(1)[Individual person] series&lt;/td&gt;
&lt;td&gt;13,384&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A1500&lt;/td&gt;
&lt;td&gt;Photographic colour transparencies [positives], single number series with &amp;lsquo;K&amp;rsquo; [Colour] prefix&lt;/td&gt;
&lt;td&gt;11,533&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A7109&lt;/td&gt;
&lt;td&gt;&amp;ldquo;Dead&amp;rdquo; card index of Registered Aliens&lt;/td&gt;
&lt;td&gt;10,618&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A705&lt;/td&gt;
&lt;td&gt;Correspondence files, multiple number (Melbourne) series (Primary numbers 1-323)&lt;/td&gt;
&lt;td&gt;9,568&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A13149&lt;/td&gt;
&lt;td&gt;Applications for Registration of Victorian Patents - Second system&lt;/td&gt;
&lt;td&gt;8,843&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A9300&lt;/td&gt;
&lt;td&gt;RAAF Officers Personnel files, 1921-1948&lt;/td&gt;
&lt;td&gt;6,371&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D3481&lt;/td&gt;
&lt;td&gt;Photographs (black and white, colour) of buildings, installations, sites, etc&lt;/td&gt;
&lt;td&gt;5,054&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A11708&lt;/td&gt;
&lt;td&gt;Applications for Registration of Queensland Trade Marks&lt;/td&gt;
&lt;td&gt;4,159&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SP388/1&lt;/td&gt;
&lt;td&gt;Personal documents of British Assisted Migrants&lt;/td&gt;
&lt;td&gt;3,395&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D879&lt;/td&gt;
&lt;td&gt;Duplicates of negatives, annual single number series with D prefix (and progressive alpha infix A - K from 1948-1957)&lt;/td&gt;
&lt;td&gt;3,253&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B6286&lt;/td&gt;
&lt;td&gt;Telstra Historical Collection&lt;/td&gt;
&lt;td&gt;2,955&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D1423&lt;/td&gt;
&lt;td&gt;Original plans (negatives), single number series with alpha prefix denoting discipline&lt;/td&gt;
&lt;td&gt;2,616&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D1989&lt;/td&gt;
&lt;td&gt;Application  forms, medical examination documents and related papers of British and  Foreign Immigrants (including Ex Service) in receipt of free and  assisted passages, chronological order of ship arrival.&lt;/td&gt;
&lt;td&gt;2,403&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D1901&lt;/td&gt;
&lt;td&gt;Loveday Internment Camp internees files, single number series with variable alpha prefix&lt;/td&gt;
&lt;td&gt;2,305&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PP9/5&lt;/td&gt;
&lt;td&gt;Medical examination forms (form 47A) for non-British migrants, single number series&lt;/td&gt;
&lt;td&gt;2,250&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A6510&lt;/td&gt;
&lt;td&gt;Classified prints of photographs relating mainly to Papua and New Guinea&lt;/td&gt;
&lt;td&gt;2,242&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This data was compiled from weekly harvests of RecordSearch&amp;rsquo;s list of recently digitised files. These weekly harvests are saved in the &lt;a href=&#34;https://github.com/wragge/naa-recently-digitised&#34;&gt;naa-recently-digitised GitHub repository&lt;/a&gt;. The harvesting method is &lt;a href=&#34;https://glam-workbench.net/recordsearch/#harvest-recently-digitised-files-from-recordsearch&#34;&gt;documented in the GLAM Workbench&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve also compiled these weekly harvests into &lt;a href=&#34;https://doi.org/10.5281/zenodo.14744050&#34;&gt;annual datasets published through Zenodo&lt;/a&gt;. &lt;a href=&#34;https://doi.org/10.5281/zenodo.14744050&#34;&gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.14744050.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Changes to Trove newspapers in 2024</title>
      <link>https://updates.timsherratt.org/2025/01/17/changes-to-trove-newspapers-in.html</link>
      <pubDate>Fri, 17 Jan 2025 17:42:59 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2025/01/17/changes-to-trove-newspapers-in.html</guid>
      <description>&lt;p&gt;Every Sunday I harvest information about the number of digitised newspaper articles in Trove. You can view the current results in the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;Trove Data Dashboard&lt;/a&gt;. By compiling all the data from 2024, you can find out what changed last year.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6,241,739 digitised newspaper articles were added to Trove in 2024&lt;/strong&gt;. The rate of digitisation was pretty quick until the end of March, when the processing of the Melbourne Sun ended; then things flattened out a bit.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/visualization13.png&#34; width=&#34;600&#34; height=&#34;313&#34; alt=&#34;Line chart showing the number of digitised newspaper articles in Trove every Sunday. The rate of change is most rapid between January and March.&#34;&gt;
&lt;p&gt;While the numbers of articles with corrections, tags, and comments all increased steadily across 2024, there seems to have been a bit of a glitch in the indexing of tags and comments, causing some jumps in the totals.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/visualization14.png&#34; width=&#34;600&#34; height=&#34;318&#34; alt=&#34;Line chart showing the number of digitised newspaper articles with tags  each Sunday in 2024. While the rate of change is generally smooth, there are a couple of notable jumps in May and June that probably indicate  indexing issues.&#34;&gt;
&lt;p&gt;Most of the digitised newspaper articles were published in NSW (3,190,972), Victoria (2,680,855), and South Australia (363,483).&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/visualization16.png&#34; width=&#34;600&#34; height=&#34;243&#34; alt=&#34;Bar chart showing the number of newspaper articles added by state.  Values are visible only for NSW, Victoria, South Australia, and  National. &#34;&gt;
&lt;p&gt;About 83% (5,211,532) of the digitised articles added to Trove in 2024 were published between 1930 and 1954. The publication years with the biggest changes were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1953 (722,765 articles added)&lt;/li&gt;
&lt;li&gt;1954 (695,104 articles added)&lt;/li&gt;
&lt;li&gt;1952 (586,890 articles added)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;About 7% of articles (450,712) were published after 1954 (the copyright cliff of death). Most of these were from South Australia.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/visualization19.png&#34; width=&#34;600&#34; height=&#34;323&#34; alt=&#34;Bar chart showing the change in the number of articles in 2024 by publication year&#34;&gt;
&lt;p&gt;Thirty-eight newspapers were added to Trove in 2024:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1876&#34;&gt;Australijos Lietuvis = The Australian Lithuanian (SA : 1948 - 1956)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1914&#34;&gt;Belfast Gazette (Port Fairy, Vic. : 1876 - 1890)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1872&#34;&gt;Braidwood News and Goldfields General Advertiser (NSW : 1862)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1881&#34;&gt;Braidwood Observer and Miner&amp;rsquo;s Advocate (NSW : 1859 - 1862)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1887&#34;&gt;Broughton Creek Mail  (Berry, NSW : 1880 - 1881; 1891 - 1893; 1899 - 1907)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1920&#34;&gt;Brunswick and Coburg Courier (Moreland, Vic. 1933 - 1934)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1896&#34;&gt;Bunyip and Garfield Express : Nar-Nar-Goon, Tynong, Pakenham and Bunyip South representative (Vic. : 1938 - 1948)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1499&#34;&gt;Central Western Daily (Orange, NSW : 1945 - 1954)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1933&#34;&gt;Coburg and Moreland Courier (Vic. : 1932)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1868&#34;&gt;Coolamon-Ganmain Farmers&#39; Review (NSW : 1917 - 1942)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1921&#34;&gt;Coolamon-Ganmain Review (NSW : 1942 - 1947)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1932&#34;&gt;Hanho Tʻaimzŭ = Hanho Times (Sydney, NSW : 1985 - 1995)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1894&#34;&gt;Hill End and Tambaroora Times and Miners&#39; Advocate (NSW : 1871 - 1872; 1875)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1891&#34;&gt;Hills Messenger (Port Adelaide, SA : 1984 - 2011)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1882&#34;&gt;Moruya Examiner (NSW : 1881 - 1902)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1901&#34;&gt;Portland Gazette and Belfast Advertiser (Vic. : 1844 - 1849)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1885&#34;&gt;The Ararat and Mount Pleasant Creek Advertiser and Chronicle for the District of the Wimmera (Vic. : 1861 - 1873)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1922&#34;&gt;The Ardlethan-Beckom Times (Temora, NSW : 1924; 1936 - 1937)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1895&#34;&gt;The Banner of Belfast (Vic. : 1855; 1857 - 1876)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1890&#34;&gt;The Bega District News (NSW : 1923 - 1955)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1897&#34;&gt;The Belfast Gazette and Portland and Warnambool Advertiser (Vic. 1849 - 1876)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1889&#34;&gt;The Berry Register and Kangaroo Valley and South Coast Farmer (NSW : 1894; 1898 - 1905)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1880&#34;&gt;The Braidwood News and Southern Goldfields General Advertiser (NSW : 1864)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1888&#34;&gt;The Broughton Creek Register, and Kangaroo Valley and South Coast Farmer (Berry, NSW : 1886 - 1890)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1919&#34;&gt;The Brunswick and Coburg Gazette (Moonee Ponds, Vic. 1928 - 1933)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1893&#34;&gt;The Captain&amp;rsquo;s Flat Mining Record (NSW : 1898)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1934&#34;&gt;The Courier (Moreland, Vic. : 1932 - 1933)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1898&#34;&gt;The Footscray Advertiser (Vic.  : 1884 - 1887)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1886&#34;&gt;The Kangaroo Valley Times (NSW : 1898 - 1904)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1883&#34;&gt;The Mount Ararat Advertiser (Vic. : 1857)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1884&#34;&gt;The Mount Ararat Advertiser and Chronicle for the District of the Wimmera (Vic. : 1857 - 1861)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1913&#34;&gt;The Orroroo Enterprise and Great Northern Advertiser (SA : 1892 - 1906)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1899&#34;&gt;The Portland Mercury and Normanby Advertiser (Vic. : 1842 - 1843)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1900&#34;&gt;The Portland Mercury and Port Fairy Register (Vic. : 1843 - 1844)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1915&#34;&gt;The Queenslander Illustrated Weekly (Brisbane, Qld. : 1927 - 1939)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1875&#34;&gt;The Seasider (Christies Beach, SA : 1956 - 1963)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1879&#34;&gt;The South-East Kingston Leader (SA : 1962 - 1987)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1892&#34;&gt;The Sunny Corner Silver Press and Miners&#39; Advocate (Mitchell, NSW :  1886)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1902&#34;&gt;The Warrnambool Examiner and Western and Western District Advertiser (Vic. :  1856)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Newly digitised articles were added to 67 newspapers (including the new titles). The newspapers that had the largest increase in articles were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Sun News-Pictorial (Vic), 2,089,635 articles added&lt;/li&gt;
&lt;li&gt;Daily Mirror (NSW), 1,533,759 articles added&lt;/li&gt;
&lt;li&gt;Maitland Mercury (NSW), 658,331 articles added&lt;/li&gt;
&lt;li&gt;Central Western Daily (NSW), 253,575 articles added&lt;/li&gt;
&lt;li&gt;The Pastoral Times (NSW), 221,982 articles added&lt;/li&gt;
&lt;li&gt;Sunraysia Daily (Vic), 210,759 articles added&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2025/visualization17.png&#34; width=&#34;600&#34; height=&#34;1349&#34; alt=&#34;Bar chart showing the change in the number of articles by newspaper title, ordered by the total change. Most of the changes are small, with only 6 newspapers having more than 200,000 articles added.&#34;&gt;
&lt;p&gt;If you want to drill down into the changes, have a look at the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/archive/index.html&#34;&gt;Trove Data Dashboard archive&lt;/a&gt; that has weekly snapshots. Or get the raw data for each weekly harvest &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals&#34;&gt;from this repository&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>@trovenewsbot has a new home</title>
      <link>https://updates.timsherratt.org/2024/12/12/trovenewsbot-has-a.html</link>
      <pubDate>Thu, 12 Dec 2024 12:41:35 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/12/12/trovenewsbot-has-a.html</guid>
      <description>&lt;p&gt;&lt;strong&gt;@trovenewsbot&lt;/strong&gt; has been around for more than eleven years now – originally sharing &lt;a href=&#34;https://trove.nla.gov.au/&#34;&gt;Trove&lt;/a&gt; newspaper articles on Twitter, and now on the Fediverse. But with the imminent closure of the botsin.space Mastodon instance, I&amp;rsquo;ve had to find it a new home. Say hello to the latest version: &lt;a href=&#34;https://wraggebots.net/@trovenewsbot&#34;&gt;@trovenewsbot@wraggebots.net&lt;/a&gt;!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-12-12-12-36-16.png&#34; width=&#34;600&#34; height=&#34;486&#34; alt=&#34;Screenshot of @trovenewsbot&#39;s new profile on wraggebots.net&#34;&gt;
&lt;p&gt;Instead of just moving the bot to an existing instance, I decided to set up my own using &lt;a href=&#34;https://gotosocial.org/&#34;&gt;GoToSocial&lt;/a&gt;. I thought this would give me more control, and encourage me to resurrect some more of my old Twitter bots. I installed GoToSocial on the smallest available DigitalOcean droplet, following the &lt;a href=&#34;https://docs.gotosocial.org/en/latest/getting_started/installation/metal/&#34;&gt;&amp;lsquo;bare metal&amp;rsquo; instructions&lt;/a&gt;. Beyond the usual faffing around with permissions and DNS, I didn&amp;rsquo;t have any major problems. The &lt;a href=&#34;https://docs.gotosocial.org/en/latest/&#34;&gt;GoToSocial documentation&lt;/a&gt; is very comprehensive, and includes useful advice on things like &lt;a href=&#34;https://docs.gotosocial.org/en/latest/advanced/security/firewall/&#34;&gt;setting up a firewall&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;GoToSocial is a social network server based on the ActivityPub standard. It&amp;rsquo;s not the same as Mastodon, but because it supports the same standards, it interoperates &lt;em&gt;with&lt;/em&gt; Mastodon. For example, once I had a @trovenewsbot account set up on wraggebots.net, I was able to use the standard migrate functions to move all of the bot&amp;rsquo;s existing followers to the new instance. Easy peasy!&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://github.com/wragge/trovenewsbot-fedi&#34;&gt;bot&amp;rsquo;s code&lt;/a&gt; makes use of the &lt;a href=&#34;https://mastodonpy.readthedocs.io/en/stable/&#34;&gt;mastodon.py&lt;/a&gt; Python library, but again because of the similarities between the GoToSocial and Mastodon APIs, I was able to reuse the code with only minor changes. Specifically, I found that I had to add the parameter &lt;code&gt;version_check_mode = &amp;quot;none&amp;quot;&lt;/code&gt; to the Mastodon client initialisation. The &lt;code&gt;notifications_dismiss()&lt;/code&gt; method is not currently implemented in GoToSocial, so I also had to change the way I check for new notifications – saving the id of the last viewed notification, and then using it as the value for &lt;code&gt;since_id&lt;/code&gt; when requesting a list of current notifications.&lt;/p&gt;
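&lt;p&gt;The bookkeeping can be sketched like this – a hypothetical helper, not the bot&amp;rsquo;s actual code. It saves the newest notification id to a state file and reuses it as &lt;code&gt;since_id&lt;/code&gt; on the next poll:&lt;/p&gt;

```python
# Sketch: work around the missing notifications_dismiss() by tracking the
# last-seen notification id yourself. The fetch callable stands in for
# something like Mastodon.notifications(since_id=...) from mastodon.py.
# Function and file names here are illustrative only.
from pathlib import Path

STATE_FILE = Path("last_notification_id.txt")

def load_last_id():
    """Return the saved id of the last notification seen, or None."""
    return STATE_FILE.read_text().strip() if STATE_FILE.exists() else None

def save_last_id(notification_id):
    STATE_FILE.write_text(str(notification_id))

def new_notifications(fetch, last_id):
    """fetch(since_id=...) returns notifications newest-first."""
    notifications = fetch(since_id=last_id)
    if notifications:
        # the newest id becomes the marker for the next poll
        save_last_id(notifications[0]["id"])
    return notifications
```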
&lt;p&gt;&lt;strong&gt;@trovenewsbot&lt;/strong&gt; does a lot more than post random newspaper articles from Trove. By tooting keywords at it, you can search Trove from within the Fediverse. There are many options available for customising your queries, &lt;a href=&#34;https://wragge.github.io/trovenewsbot-fedi/&#34;&gt;see the documentation for full details&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Six more volumes added to the searchable database of Tasmanian Post Office Directories!</title>
      <link>https://updates.timsherratt.org/2024/11/21/six-more-volumes.html</link>
      <pubDate>Thu, 21 Nov 2024 16:05:37 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/11/21/six-more-volumes.html</guid>
      <description>&lt;p&gt;A couple of months ago I realised my &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/tasmanian-post-office-directories/&#34;&gt;big, searchable database of Tasmanian Post Office Directories&lt;/a&gt; was missing the volume from 1920. It took a bit of work to add it in, as &lt;a href=&#34;https://updates.timsherratt.org/2024/09/26/wheres-missing-volume.html&#34;&gt;described in this post&lt;/a&gt;. Unfortunately, I&amp;rsquo;d barely finished when I realised that a number of other years were also missing! Argh! The good news is that I&amp;rsquo;ve been steadily working through these missing volumes, adding one a week, and now I&amp;rsquo;m finally, finally finished!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/autas001126438076-1945-46-5.jpg&#34; width=&#34;600&#34; height=&#34;962&#34; alt=&#34;Page from Wise&#39;s Tasmania Post Office Directory 1945-46&#34;&gt;
&lt;p&gt;The new volumes are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1920&lt;/li&gt;
&lt;li&gt;1933-34&lt;/li&gt;
&lt;li&gt;1941-42&lt;/li&gt;
&lt;li&gt;1942-43&lt;/li&gt;
&lt;li&gt;1943-44&lt;/li&gt;
&lt;li&gt;1945-46&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In total there are now 54 volumes from 1890 to 1948. Every line of every volume has been OCRd and indexed, so you can run fulltext searches across all 54 volumes to find matching entries. The fulltext search also supports advanced operators like wildcards and booleans.&lt;/p&gt;
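&lt;p&gt;To give a flavour of those operators, here&amp;rsquo;s a tiny SQLite FTS5 example – the schema and data are illustrative only, not the database&amp;rsquo;s actual structure:&lt;/p&gt;

```python
# Minimal FTS5 demo: prefix wildcards (term*) and boolean operators
# (AND, OR, NOT) in full-text queries across indexed directory lines.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE entries USING fts5(volume, text)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [
        ("1920", "Chung Gon, fruiterer, Launceston"),
        ("1933-34", "Chung Gon, grocer, Hobart"),
        ("1941-42", "Smith John, carpenter, Launceston"),
    ],
)

def search(query):
    """Run an FTS5 MATCH query and return matching (volume, text) rows."""
    return conn.execute(
        "SELECT volume, text FROM entries WHERE entries MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()

# 'chung*' matches any term starting with 'chung'; AND narrows the results
rows = search('chung* AND launceston')
```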
&lt;p&gt;As &lt;a href=&#34;https://updates.timsherratt.org/2024/09/26/wheres-missing-volume.html&#34;&gt;I mentioned in relation to 1920&lt;/a&gt;, while these volumes can be downloaded as PDFs from Libraries Tasmania, they don&amp;rsquo;t contain any OCRd text – they&amp;rsquo;re not searchable (despite what Libraries Tasmania &lt;a href=&#34;https://libraries.tas.gov.au/family-history/where-lived/what-is-online/&#34;&gt;says here&lt;/a&gt;). The quality of the scans is also quite variable – tight bindings cut off text, pages are skewed, and lighting is inconsistent. This means that the OCR processing is far from perfect. There will be names missing from the search index as a result of this. However, because you can search across all volumes at once, the database makes it easier to find people, as you can pick them up in one year and follow them through subsequent volumes, filling in any gaps.&lt;/p&gt;
&lt;p&gt;It would be great if Libraries Tasmania would add a link to the database from their &lt;a href=&#34;https://libraries.tas.gov.au/slat/guides-to-records/directories-and-almanacs/introduction/&#34;&gt;Directories and almanacs&lt;/a&gt; page. I&amp;rsquo;ve sent a couple of emails but haven&amp;rsquo;t received a reply. It seems odd that they&amp;rsquo;d link to commercial offerings like FindMyPast, but not to the free, community-developed version!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Where&#39;s 1920? Missing volume added to Tasmanian Post Office Directories!</title>
      <link>https://updates.timsherratt.org/2024/09/26/wheres-missing-volume.html</link>
      <pubDate>Thu, 26 Sep 2024 13:58:54 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/09/26/wheres-missing-volume.html</guid>
      <description>&lt;p&gt;Visualisation is a great way to find problems in your data.&lt;/p&gt;
&lt;p&gt;As part of the &lt;a href=&#34;https://everydayheritage.au/&#34;&gt;Everyday Heritage project&lt;/a&gt;, I&amp;rsquo;m working with a team to document the lives of Tasmania&amp;rsquo;s Chinese residents in the 19th and early 20th centuries. We&amp;rsquo;re &lt;a href=&#34;https://everydayheritage.au/blog/from-the-archive-uncovering-the-everyday-heritage-of-chinese-tasmanians/&#34;&gt;using a variety of sources&lt;/a&gt; such as Trove&amp;rsquo;s newspapers, the Tasmanian Names Index, and the Tasmanian Post Office Directories. To help with the research, I &lt;a href=&#34;https://updates.timsherratt.org/2022/09/15/from-pdfs-to.html&#34;&gt;converted all the PDF volumes of the Post Office Directories into a public, online, searchable database&lt;/a&gt;. Or at least, I thought I had.&lt;/p&gt;
&lt;p&gt;The Tasmanian Post Office Directories database embeds metadata about each line of text in its results, so it&amp;rsquo;s easy to save items of interest using Zotero. A member of our team has already saved hundreds of entries this way. The other day I started pulling these entries out of Zotero using its web API, and thought I&amp;rsquo;d get an overview by charting the number of results per year. It was then I noticed that 1920 was missing&amp;hellip;&lt;/p&gt;
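&lt;p&gt;The gap-spotting step is easy to sketch – count entries per year and look for empty years. This is a hypothetical helper, not the actual harvesting code:&lt;/p&gt;

```python
# Count saved entries per year; any year with zero entries between the
# earliest and latest stands out immediately (as 1920 did here).
from collections import Counter

def missing_years(years):
    """Return the years with no entries between the earliest and latest seen."""
    counts = Counter(years)
    return [y for y in range(min(counts), max(counts) + 1) if counts[y] == 0]
```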
&lt;p&gt;I checked &lt;a href=&#34;https://stors.tas.gov.au/ILS/SD_ILS-981598&#34;&gt;the PDF volumes in Libraries Tasmania&lt;/a&gt; and the 1920 volume was there, so I worked back through my processing code to figure out why I&amp;rsquo;d missed it. It turns out the 1920 volume is named using a different pattern, and the regular expression I used to scrape the list of volumes was a little too specific. At least that was easy to rectify.&lt;/p&gt;
&lt;p&gt;However, it wasn&amp;rsquo;t just a matter of feeding the 1920 volume through my &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/&#34;&gt;processing notebooks&lt;/a&gt;, because the &lt;em&gt;content&lt;/em&gt; of the 1920 PDF was also quite different to all the other volumes. Most of the PDFs made available from Libraries Tasmania have a single page per image, and the images have been pre-processed for OCR. The PDFs also include searchable, OCRd text. Here&amp;rsquo;s an example of one of the images from 1921:&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-09-26-12-27-41.png&#34; width=&#34;600&#34; height=&#34;947&#34; alt=&#34;&#34;&gt;
&lt;p&gt;The images in the 1920 volume are double page spreads, in colour, without any pre-processing. The PDF doesn&amp;rsquo;t include any OCRd text, so it&amp;rsquo;s not searchable. The quality of the images is also quite variable: tight bindings mean some text is cut off, pages are sometimes skewed, and bad lighting causes shadows across the right hand page – when converted to black and white for OCR, these shadows become black blobs that completely obscure the text. Here&amp;rsquo;s an example of one of the images from 1920:&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-09-26-12-23-52.png&#34; width=&#34;600&#34; height=&#34;460&#34; alt=&#34;&#34;&gt;
&lt;p&gt;All this meant I had to do a lot of additional processing of the images before I could extract useful text via OCR. Here&amp;rsquo;s a summary of the image pre-processing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;extracted all the images from the PDF&lt;/li&gt;
&lt;li&gt;sliced all the images roughly in half and saved the results as separate files, so I had one image per page&lt;/li&gt;
&lt;li&gt;manually cropped all 800 pages to get them as clean as possible&lt;/li&gt;
&lt;li&gt;removed the shadows from the images thanks to &lt;a href=&#34;https://stackoverflow.com/questions/44752240/how-to-remove-shadow-from-scanned-images-using-opencv&#34;&gt;this recipe in Stack Overflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;binarized and deskewed the images ready for OCR&lt;/li&gt;
&lt;/ul&gt;
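&lt;p&gt;The slicing step can be sketched as computing two crop boxes from the spread&amp;rsquo;s dimensions – boxes that could then be passed to something like Pillow&amp;rsquo;s &lt;code&gt;Image.crop()&lt;/code&gt;. This is a hypothetical sketch, not the actual processing code; the overlap keeps text near the gutter in both halves:&lt;/p&gt;

```python
# Compute left/right crop boxes for splitting a double-page spread into
# two single-page images. Boxes are (x0, y0, x1, y1) pixel tuples.
def spread_to_pages(width, height, overlap=40):
    """Return (left_box, right_box) for a spread of the given size."""
    mid = width // 2
    left = (0, 0, mid + overlap, height)
    right = (mid - overlap, 0, width, height)
    return left, right
```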
&lt;p&gt;From then on I could apply the processes in my &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/&#34;&gt;existing notebooks&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OCRd the images using Tesseract&lt;/li&gt;
&lt;li&gt;uploaded the cropped, colour images to the AWS bucket for delivery using &lt;a href=&#34;https://github.com/samvera/serverless-iiif&#34;&gt;serverless-IIIF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;added the volume metadata and OCRd text to the &lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt; database, generating a full-text index on the text column&lt;/li&gt;
&lt;li&gt;updated the application on Google Cloud Run&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I also took the chance to tweak the theme a bit, including a new dark mode.&lt;/p&gt;
&lt;p&gt;The updated database &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/tasmanian-post-office-directories/&#34;&gt;is live&lt;/a&gt;, now containing &lt;strong&gt;49 volumes&lt;/strong&gt; from 1890 to 1948 including 1920!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/tasmanian-post-office-directories/&#34;&gt;&lt;strong&gt;Try it now!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/tasmanian-post-office-directories/&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-09-26-12-40-57.png&#34; width=&#34;600&#34; height=&#34;574&#34; alt=&#34;&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;update-27-september-2024&#34;&gt;UPDATE: 27 September 2024&lt;/h2&gt;
&lt;p&gt;It seems I was too focused on the gap in 1920 and missed some other missing volumes from the 1930s and 40s. I&amp;rsquo;ve started processing these but it&amp;rsquo;s going to take a fair while to work through them all. I&amp;rsquo;ll add each volume to the database as it&amp;rsquo;s finished. &lt;a href=&#34;https://updates.timsherratt.org/&#34;&gt;Check here&lt;/a&gt; for regular updates.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Major update for the Trove Newspapers section of the GLAM Workbench</title>
      <link>https://updates.timsherratt.org/2024/09/23/major-update-for.html</link>
      <pubDate>Mon, 23 Sep 2024 12:15:42 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/09/23/major-update-for.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove newspapers&lt;/a&gt; section of the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; was updated last week. Over the last year I&amp;rsquo;ve been gradually updating notebooks to use &lt;a href=&#34;https://glam-workbench.net/trove-api-v3/&#34;&gt;version 3 of the Trove API&lt;/a&gt;, but when version 2 suddenly disappeared a couple of weeks ago I had to hurriedly pull everything together. The Trove newspapers section includes 23 notebooks and 6 datasets, so it&amp;rsquo;s not a small job. The changes include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;updated all notebooks to use version 3 of the Trove API&lt;/li&gt;
&lt;li&gt;removed remaining datasets from the &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers&#34;&gt;code repository&lt;/a&gt; and created dedicated data repositories for them, integrating them with Zenodo where appropriate&lt;/li&gt;
&lt;li&gt;added metadata to all the notebooks – this is used to build an &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers/blob/master/ro-crate-metadata.json&#34;&gt;RO-Crate metadata file&lt;/a&gt; for the code repository&lt;/li&gt;
&lt;li&gt;updated all the Python packages&lt;/li&gt;
&lt;li&gt;added a &lt;code&gt;voila.json&lt;/code&gt; file to configure Voilà&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;None of the functionality of the notebooks should have changed. There&amp;rsquo;s a slight difference in the &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/find-non-english-newspapers/&#34;&gt;Finding non-English newspapers in Trove&lt;/a&gt; notebook because the language detection library I was using is no longer maintained. I&amp;rsquo;ve swapped in &lt;a href=&#34;https://pypi.org/project/py3langid/&#34;&gt;py3langid&lt;/a&gt; and it seems to work well, though the results are a little different. Interestingly, where the previous library thought that bad OCR was &amp;lsquo;Maltese&amp;rsquo;, the new one detects it as &amp;lsquo;Latin&amp;rsquo;! There&amp;rsquo;s no change to the list of newspapers with non-English language content detected by the notebook.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-09-23-11-09-14.png&#34; width=&#34;600&#34; height=&#34;647&#34; alt=&#34;Screenshot of documentation page for notebook showing the embedded preview&#34;&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;documentation pages&lt;/a&gt; have also been updated. The notebook pages are now built using data from the code repository&amp;rsquo;s RO-Crate file. They also include embedded HTML previews of the notebooks. If a notebook generates visualisations, the visualisations are usually included in the HTML, so you can explore the outputs without running the notebook – see, for example, the charts in &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/visualise-total-newspaper-articles-by-state-year/&#34;&gt;Visualise the total number of newspaper articles in Trove by year and state&lt;/a&gt;. Most of the dataset pages now include links to explore the contents &lt;a href=&#34;https://updates.timsherratt.org/2024/07/19/share-your-spreadsheet.html&#34;&gt;using Datasette-Lite&lt;/a&gt;.&lt;/p&gt;
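&lt;p&gt;Reading notebook metadata out of an RO-Crate file is straightforward, because RO-Crate metadata is JSON-LD with all the entities in a flat &lt;code&gt;@graph&lt;/code&gt; list. This is a rough sketch only – &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; are standard schema.org terms, but the repository&amp;rsquo;s actual crate and build code may differ:&lt;/p&gt;

```python
# Pull (name, description) pairs for notebook entities out of an
# RO-Crate metadata file. Entities live in the flattened "@graph" list;
# here notebooks are identified by their .ipynb file extension.
import json

def notebook_entities(crate_path):
    with open(crate_path) as f:
        graph = json.load(f)["@graph"]
    return [
        (e.get("name"), e.get("description"))
        for e in graph
        if e.get("@id", "").endswith(".ipynb")
    ]
```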
&lt;p&gt;I still have to generate RO-Crate files for all the data repositories, but I wanted to get the code stuff finished first.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Preserving the history of online collections (my love letter to future historians)</title>
      <link>https://updates.timsherratt.org/2024/09/20/preserving-the-history.html</link>
      <pubDate>Fri, 20 Sep 2024 16:18:34 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/09/20/preserving-the-history.html</guid>
      <description>&lt;p&gt;It&amp;rsquo;s pretty obvious that access to digitised resources, like Trove&amp;rsquo;s newspapers, has changed the practice of history in Australia. But how? I&amp;rsquo;m certain that the historiographical implications of the growth and development of online collections will become a topic of increasing interest to historians, and that exploration of this topic will lead to important insights into the relationship between what we keep, what we value, and what we know. But for this to happen we need to have data documenting changes in online collections. What became available when? How was it delivered to users? How did the search indexing work?&lt;/p&gt;
&lt;p&gt;In general, GLAM collection interfaces exist in an eternal present – they&amp;rsquo;re not good at explaining changes, or communicating their own histories. Australian GLAM organisations also share little statistical information. If you&amp;rsquo;re lucky, you might get something useful out of annual reports, but that&amp;rsquo;s about it. Trove, in fact, removed all their online collection statistics in the 2020 interface update. Web archives capture individual pages, but not complete systems. If we don&amp;rsquo;t document the shape and structure of online collections now, how will future historians understand their impact?&lt;/p&gt;
&lt;p&gt;A couple of years ago I gave a short talk entitled &amp;lsquo;Living archaeologies of online collections&amp;rsquo; &lt;a href=&#34;https://www.dpconline.org/events/past-events/event-watch-party-comp-access-launch&#34;&gt;for the Digital Preservation Coalition&lt;/a&gt; (&lt;a href=&#34;https://youtu.be/HHdYi4LjDJk&#34;&gt;video&lt;/a&gt; &amp;amp; &lt;a href=&#34;https://slides.com/wragge/dpc-living-archaeologies&#34;&gt;slides&lt;/a&gt;).&lt;/p&gt;
&lt;iframe width=&#34;100%&#34; height=&#34;400&#34; src=&#34;https://www.youtube.com/embed/HHdYi4LjDJk?si=cTtKF8MWclnozJDQ&#34; title=&#34;YouTube video player&#34; frameborder=&#34;0&#34; allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&#34; referrerpolicy=&#34;strict-origin-when-cross-origin&#34; allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;In the talk, I described some of my piecemeal and inconsistent attempts to capture this history – starting &lt;a href=&#34;https://timsherratt.org/shed/trove/graphs/&#34;&gt;back in 2011&lt;/a&gt; when I first harvested data from Trove about the number of digitised newspaper articles. I&amp;rsquo;ve been continuing to create and update datasets, and have been trying to improve the way they&amp;rsquo;re organised and described – making them more FAIR – but I&amp;rsquo;ve still got a long way to go. At the moment there&amp;rsquo;s information spread across the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;, GitHub repositories, and &lt;a href=&#34;https://zenodo-rdm.web.cern.ch/communities/trove-historical-data/records&#34;&gt;Zenodo records&lt;/a&gt;, so I thought it might be useful if I pulled everything together into one big list. Hence this post. I often forget about what I&amp;rsquo;ve done in the past, so it&amp;rsquo;ll help me keep track of where the gaps are and what&amp;rsquo;s left to do. And hopefully it&amp;rsquo;ll encourage others to think about the significance and possibilities of this data, and perhaps share their own datasets.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-09-20-15-12-52.png&#34; width=&#34;600&#34; height=&#34;745&#34; alt=&#34;Screenshot of some of the datasets in the Trove Historical Data community in Zenodo&#34;&gt;
&lt;p&gt;&lt;em&gt;There&amp;rsquo;s also a growing list of datasets in the &lt;a href=&#34;https://zenodo.org/communities/trove-historical-data/records&#34;&gt;Trove Historical Data&lt;/a&gt; community on Zenodo&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I suppose I could also add the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt; to this list. It&amp;rsquo;s not a dataset, but it is a snapshot of the current state of Trove. I&amp;rsquo;ll continue to update it, and these updates will themselves be saved as versions in GitHub and Zenodo, allowing future researchers to dig back through the layers.&lt;/p&gt;
&lt;p&gt;A lot of what I do is focused on the present – building tools and resources that help researchers make use of GLAM collections right now. Those tools and resources will eventually decay as I shuffle off, as collections evolve, and as technologies change. But I&amp;rsquo;m hoping that these datasets will grow in value over time. I think it was &lt;a href=&#34;https://en.wikipedia.org/wiki/Jason_Scott&#34;&gt;Jason Scott&lt;/a&gt; who coined the phrase &amp;lsquo;metadata is a love letter to the future&amp;rsquo;. I suppose this is my love letter to future historians.&lt;/p&gt;
&lt;h2 id=&#34;1-trove-zones-categories-and-formats&#34;&gt;1. Trove zones, categories, and formats&lt;/h2&gt;
&lt;h3 id=&#34;trove-zone-totals&#34;&gt;trove-zone-totals&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/wragge/trove-zone-totals&#34;&gt;https://github.com/wragge/trove-zone-totals&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This repository contains an automated git scraper that uses the Trove API to save data about the contents of Trove&amp;rsquo;s zones and categories. It runs every week and updates the following data files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-zone-totals/blob/main/data/trove-category-totals.csv&#34;&gt;Total number of resources by Trove category&lt;/a&gt;, weekly updates, 13 June 2023 to present&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-zone-totals/blob/main/data/trove-category-formats.csv&#34;&gt;Total number of resources by Trove category and format&lt;/a&gt;, weekly updates, 13 June 2023 to present&lt;/li&gt;
&lt;/ul&gt;
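&lt;p&gt;The core of each scraper run is simple: fetch the current totals, append one dated row per category to the CSV, and let the scheduled job commit the change. A minimal sketch – the totals are passed in rather than requested from the Trove API, and the column layout is illustrative, not the repository&amp;rsquo;s actual schema:&lt;/p&gt;

```python
# Append one dated row per category to a running CSV of totals.
# A scheduled job (e.g. GitHub Actions) would then commit the file,
# preserving each weekly snapshot in the git history.
import csv
from datetime import date

def append_totals(csv_path, totals, harvested=None):
    """totals: mapping of category name -> record count."""
    harvested = harvested or date.today().isoformat()
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        for category, count in sorted(totals.items()):
            writer.writerow([harvested, category, count])
```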
&lt;p&gt;In the web interface, &amp;lsquo;zones&amp;rsquo; were replaced by &amp;lsquo;categories&amp;rsquo; in 2020. However, categories were not available through the Trove API until the release of version 3 in June 2023. To try and document the differences between zones and categories, totals from both were captured until version 2 of the API was switched off in September 2024.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-zone-totals/blob/main/data/trove-zone-totals.csv&#34;&gt;Total number of resources by Trove zone&lt;/a&gt;, weekly updates, 9 March 2023 to 1 September 2024&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-zone-totals/blob/main/data/trove-zone-formats.csv&#34;&gt;Total number of resources by Trove zone and format&lt;/a&gt;, weekly updates, 9 March 2023 to 1 September 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2-trove-newspapers&#34;&gt;2. Trove newspapers&lt;/h2&gt;
&lt;h3 id=&#34;trove-newspaper-totals-historical&#34;&gt;trove-newspaper-totals-historical&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals-historical/&#34;&gt;https://github.com/wragge/trove-newspaper-totals-historical/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.6471544&#34;&gt;&lt;img src=&#34;https://zenodo-rdm.web.cern.ch/badge/DOI/10.5281/zenodo.6471544.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The files in this dataset were created at irregular intervals between 2011 and 2022 for use in visualising Trove&amp;rsquo;s newspaper corpus. Harvests from 2011 were screen scraped from the Trove website. Harvests after 2012 make use of the &lt;code&gt;year&lt;/code&gt; and &lt;code&gt;state&lt;/code&gt; facets from the Trove API. There are 9 versions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;12 April 2011&lt;/li&gt;
&lt;li&gt;4 August 2011&lt;/li&gt;
&lt;li&gt;12 September 2014&lt;/li&gt;
&lt;li&gt;29 November 2015&lt;/li&gt;
&lt;li&gt;14 December 2016&lt;/li&gt;
&lt;li&gt;28 July 2019&lt;/li&gt;
&lt;li&gt;10 July 2020&lt;/li&gt;
&lt;li&gt;27 April 2021&lt;/li&gt;
&lt;li&gt;21 January 2022&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each version includes two data files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Total number of newspaper articles by year&lt;/li&gt;
&lt;li&gt;Total number of newspaper articles by year and state&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;trove-newspaper-totals&#34;&gt;trove-newspaper-totals&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals&#34;&gt;https://github.com/wragge/trove-newspaper-totals&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This repository contains an automated git scraper that uses the Trove API to save information about the number of digitised newspaper articles currently available through Trove. It runs every week and updates four data files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_year.csv&#34;&gt;Total number of newspaper articles by year&lt;/a&gt;, weekly updates, 19 April 2022 to present&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_year_and_state.csv&#34;&gt;Total number of newspaper articles by year and state&lt;/a&gt;, weekly updates, 19 April 2022 to present&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_newspaper.csv&#34;&gt;Total number of articles by newspaper&lt;/a&gt;, weekly updates, 20 April 2022 to present&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_category.csv&#34;&gt;Total number of newspaper articles by category&lt;/a&gt;, weekly updates, 20 April 2022 to present&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By retrieving all versions of these files from the commit history, you can analyse changes in Trove over time.&lt;/p&gt;
&lt;p&gt;A weekly summary of the harvested data is presented in the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;Trove Newspapers Data Dashboard&lt;/a&gt;.&lt;/p&gt;
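&lt;p&gt;To analyse change over time you can walk the repository&amp;rsquo;s commit history and read each saved version of a data file. Here&amp;rsquo;s a minimal Python sketch using the standard library&amp;rsquo;s &lt;code&gt;subprocess&lt;/code&gt; module – the repository path and file name are placeholders, and it assumes the &lt;code&gt;git&lt;/code&gt; command is available locally:&lt;/p&gt;

```python
import subprocess

def parse_versions(log_output):
    """Parse `git log --format='%H %cs'` output into (commit, date) pairs."""
    return [tuple(line.split(" ", 1)) for line in log_output.splitlines()]

def file_versions(repo_dir, path):
    """List every commit (newest first) that changed `path` in the repo."""
    out = subprocess.run(
        ["git", "log", "--format=%H %cs", "--", path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return parse_versions(out)

def file_at_commit(repo_dir, commit, path):
    """Return the contents of `path` as recorded at `commit`."""
    return subprocess.run(
        ["git", "show", f"{commit}:{path}"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout

# Example (assumes a local clone of the trove-newspaper-totals repo):
# for commit, date in file_versions("trove-newspaper-totals", "data/total_articles_by_year.csv"):
#     csv_text = file_at_commit("trove-newspaper-totals", commit, "data/total_articles_by_year.csv")
```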
&lt;h3 id=&#34;trove-newspaper-titles-web-archives&#34;&gt;trove-newspaper-titles-web-archives&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspaper-titles-web-archives&#34;&gt;https://github.com/GLAM-Workbench/trove-newspaper-titles-web-archives&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.13761732&#34;&gt;&lt;img src=&#34;https://zenodo-rdm.web.cern.ch/badge/DOI/10.5281/zenodo.13761732.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;These datasets were created by harvesting information about newspaper titles in Trove from web archives. The harvesting method is documented in &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/historical-data-newspaper-titles/&#34;&gt;Gathering historical data about the addition of newspaper titles to Trove&lt;/a&gt; in the GLAM Workbench.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspaper-titles-web-archives/blob/main/trove_newspaper_titles_2009_2021.csv&#34;&gt;All web archive captures of Trove newspaper titles, 2009 to 2021&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspaper-titles-web-archives/blob/main/trove_newspaper_titles_first_appearance_2009_2021.csv&#34;&gt;First appearance of each newspaper title in web archive captures, 2009 to 2021&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspaper-titles-web-archives/blob/v1.2/titles_list.md&#34;&gt;Alphabetical list of newspaper titles showing approximately when they first appeared in Trove, 2009 to 2021&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;trove-newspapers-corrections&#34;&gt;trove-newspapers-corrections&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers-corrections&#34;&gt;https://github.com/GLAM-Workbench/trove-newspapers-corrections&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.13761546&#34;&gt;&lt;img src=&#34;https://zenodo-rdm.web.cern.ch/badge/DOI/10.5281/zenodo.13761546.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;OCR errors in Trove&amp;rsquo;s digitised newspapers can be corrected by users. To help understand patterns in newspaper correction, this dataset records the number of articles with corrections. The data was extracted from the Trove API using &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/Analysing_OCR_corrections/&#34;&gt;this notebook&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers-corrections/blob/main/corrections_by_title.csv&#34;&gt;Number of corrections by newspaper&lt;/a&gt;, 6 versions:
&lt;ul&gt;
&lt;li&gt;13 August 2019&lt;/li&gt;
&lt;li&gt;10 July 2020&lt;/li&gt;
&lt;li&gt;27 April 2021&lt;/li&gt;
&lt;li&gt;21 January 2022&lt;/li&gt;
&lt;li&gt;24 June 2022&lt;/li&gt;
&lt;li&gt;14 September 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers-corrections/blob/main/corrections_by_year.csv&#34;&gt;Number of corrections by year&lt;/a&gt;, 2 versions:
&lt;ul&gt;
&lt;li&gt;24 June 2024&lt;/li&gt;
&lt;li&gt;14 September 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers-corrections/blob/main/corrections_by_category.csv&#34;&gt;Number of corrections by category&lt;/a&gt;, 2 versions:
&lt;ul&gt;
&lt;li&gt;24 June 2024&lt;/li&gt;
&lt;li&gt;14 September 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;trove-newspapers-data-post-54&#34;&gt;trove-newspapers-data-post-54&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers-data-post-54/&#34;&gt;https://github.com/GLAM-Workbench/trove-newspapers-data-post-54/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.13761534&#34;&gt;&lt;img src=&#34;https://zenodo-rdm.web.cern.ch/badge/DOI/10.5281/zenodo.13761534.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Due to copyright restrictions, most of the digitised newspaper articles on Trove were published before 1955. However, some articles published after 1954 have been made available. This repository provides data about digitised newspapers in Trove that have articles available from after 1954 (the &amp;lsquo;copyright cliff of death&amp;rsquo;). The data was extracted from the Trove API using &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/Beyond_the_copyright_cliff_of_death/&#34;&gt;this notebook&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are 8 versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;7 June 2019&lt;/li&gt;
&lt;li&gt;12 August 2019&lt;/li&gt;
&lt;li&gt;10 July 2020&lt;/li&gt;
&lt;li&gt;11 November 2020&lt;/li&gt;
&lt;li&gt;27 April 2021&lt;/li&gt;
&lt;li&gt;21 January 2022&lt;/li&gt;
&lt;li&gt;27 June 2024&lt;/li&gt;
&lt;li&gt;14 September 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;trove-newspapers-non-english&#34;&gt;trove-newspapers-non-english&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers-non-english&#34;&gt;https://github.com/GLAM-Workbench/trove-newspapers-non-english&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.13761509&#34;&gt;&lt;img src=&#34;https://zenodo-rdm.web.cern.ch/badge/DOI/10.5281/zenodo.13761509.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This dataset contains information about newspapers published in languages other than English that have been digitised and made available through Trove. Data about the languages present in newspapers was generated by harvesting a sample of articles from each newspaper using the Trove API, and then using language detection software on the OCRd text of each article. The method is documented in &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/find-non-english-newspapers/&#34;&gt;this notebook&lt;/a&gt; in the GLAM Workbench.&lt;/p&gt;
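&lt;p&gt;The aggregation step can be sketched in Python. This is an illustration only – the per-article language codes would come from a detector such as the third-party &lt;code&gt;langdetect&lt;/code&gt; package (an assumption on my part; the notebook documents the actual method used):&lt;/p&gt;

```python
from collections import Counter

def summarise_languages(article_langs, min_share=0.1):
    """Aggregate per-article ISO 639-1 codes into a newspaper's main languages.

    `article_langs` is a list of detected language codes, one per sampled
    article. Returns (language, share) pairs for languages detected in at
    least `min_share` of the sample, most common first.
    """
    counts = Counter(article_langs)
    total = sum(counts.values())
    return [
        (lang, round(n / total, 2))
        for lang, n in counts.most_common()
        if n / total >= min_share
    ]
```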
&lt;p&gt;There are two data files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers-non-english/blob/main/newspapers_non_english.csv&#34;&gt;CSV data file of the main languages detected for each newspaper with non-English language content&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers-non-english/blob/main/non-english-newspapers.md&#34;&gt;A markdown formatted list of all newspapers found with non-English language content&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are two versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;9 July 2024&lt;/li&gt;
&lt;li&gt;14 September 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;trove-newspaper-issues&#34;&gt;trove-newspaper-issues&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.13761491&#34;&gt;&lt;img src=&#34;https://zenodo-rdm.web.cern.ch/badge/DOI/10.5281/zenodo.13761491.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This dataset contains information about the published issues of newspapers digitised and made available through Trove. The data was harvested from the Trove API, using &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/harvest_newspaper_issues/&#34;&gt;this notebook in the GLAM Workbench&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are two data files in this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Total number of newspaper issues per year for each digitised newspaper&lt;/li&gt;
&lt;li&gt;A complete list of newspaper issues available from Trove&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are 5 versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;18 October 2021&lt;/li&gt;
&lt;li&gt;20 January 2022&lt;/li&gt;
&lt;li&gt;3 August 2023&lt;/li&gt;
&lt;li&gt;26 June 2024&lt;/li&gt;
&lt;li&gt;13 September 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;3-trove-lists-and-tags&#34;&gt;3. Trove lists and tags&lt;/h2&gt;
&lt;h3 id=&#34;trove-lists-metadata&#34;&gt;trove-lists-metadata&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-lists-metadata/&#34;&gt;https://github.com/GLAM-Workbench/trove-lists-metadata/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.11504501&#34;&gt;&lt;img src=&#34;https://zenodo-rdm.web.cern.ch/badge/DOI/10.5281/zenodo.11504501.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Trove users can create collections of resources using Trove&amp;rsquo;s &amp;lsquo;lists&amp;rsquo;. This dataset contains metadata describing all public lists, harvested via the Trove API. To reduce file size, the details of the resources collected by each list are not included, just the total number of resources. The data was extracted using &lt;a href=&#34;https://glam-workbench.net/trove-lists/#harvest-summary-data-from-trove-lists&#34;&gt;this notebook&lt;/a&gt; from the &lt;a href=&#34;https://glam-workbench.net/trove-lists/&#34;&gt;Trove lists and tags&lt;/a&gt; section of the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;There are 4 versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;20 September 2018&lt;/li&gt;
&lt;li&gt;22 September 2020&lt;/li&gt;
&lt;li&gt;5 July 2022&lt;/li&gt;
&lt;li&gt;6 June 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;public-tags-added-to-resources-in-trove-2008-to-2024&#34;&gt;Public tags added to resources in Trove, 2008 to 2024&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.11496377&#34;&gt;&lt;img src=&#34;https://zenodo-rdm.web.cern.ch/badge/DOI/10.5281/zenodo.11496377.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This dataset contains details of 2,495,958 unique public tags added to 10,403,650 resources in Trove between August 2008 and June 2024.&lt;/p&gt;
&lt;p&gt;There are 3 versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;10 July 2021&lt;/li&gt;
&lt;li&gt;6 July 2022&lt;/li&gt;
&lt;li&gt;6 June 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;tag-counts&#34;&gt;Tag counts&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.11496519&#34;&gt;&lt;img src=&#34;https://zenodo-rdm.web.cern.ch/badge/DOI/10.5281/zenodo.11496519.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This dataset was derived from the full harvest of Trove public tags. It contains a list of unique tags and the total number of resources in Trove each tag is attached to.&lt;/p&gt;
&lt;p&gt;There are 3 versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;10 July 2021&lt;/li&gt;
&lt;li&gt;6 July 2022&lt;/li&gt;
&lt;li&gt;6 June 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;4-trove-contributors&#34;&gt;4. Trove contributors&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/wragge/trove-contributor-totals&#34;&gt;https://github.com/wragge/trove-contributor-totals&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This repository contains an automated git scraper that uses the Trove API to save details of organisations and projects that contribute metadata to Trove. As well as counts of total resources by contributor, this dataset includes counts of resources from each contributor by format and category. It runs every week and updates the following data files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-contributor-totals/blob/main/data/trove-contributors.json&#34;&gt;Number of resources by contributor (unmodified JSON from API)&lt;/a&gt;, weekly updates, 9 March 2023 to present&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-contributor-totals/blob/main/data/trove-contributors.csv&#34;&gt;Number of resources by contributor (flattened data as CSV)&lt;/a&gt;, weekly updates, 9 March 2023 to present&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-contributor-totals/blob/main/data/trove-contributors-categories.csv&#34;&gt;Number of resources by contributor and category&lt;/a&gt;, weekly updates, 21 June 2024 to present&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-contributor-totals/blob/main/data/trove-contributors-categories-formats.csv&#34;&gt;Number of resources by contributor, category, and format&lt;/a&gt;, weekly updates, 21 June 2024 to present&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These data files were generated using version 2 of the Trove API:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-contributor-totals/blob/main/data/trove-contributors-zones.csv&#34;&gt;Number of resources by contributor and zone&lt;/a&gt;, weekly updates, 9 March 2023 to 12 May 2024&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-contributor-totals/blob/main/data/trove-contributors-formats.csv&#34;&gt;Number of resources by contributor, zone, and format&lt;/a&gt;,  weekly updates, 9 March 2023 to 12 May 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;5-trove-digitised-resources-other-than-newspapers&#34;&gt;5. Trove digitised resources (other than newspapers)&lt;/h2&gt;
&lt;p&gt;To help people find and use digitised resources other than newspapers in Trove, I&amp;rsquo;ve been harvesting, sharing, and visualising metadata relating to specific formats, such as books and periodicals. The methods I&amp;rsquo;ve used have changed over time, and there are some earlier versions that I still need to extract from the Git repositories, but these are the current datasets. I&amp;rsquo;m planning to set up automatic re-harvests for some or all of these, so there&amp;rsquo;ll be a better record of change over time.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s more information about these datasets in both the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; and the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;books&#34;&gt;Books&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-books-data&#34;&gt;https://github.com/GLAM-Workbench/trove-books-data&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There are 2 versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;20 November 2023&lt;/li&gt;
&lt;li&gt;14 February 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;periodicals&#34;&gt;Periodicals&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-periodicals-data&#34;&gt;https://github.com/GLAM-Workbench/trove-periodicals-data&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This dataset was created by checking, correcting, and enriching data about digitised periodicals obtained from the Trove API. Additional metadata describing periodical titles and issues was extracted from the Trove website and used to check the API results. Where titles were wrongly described as issues, and vice versa, the records were corrected. Additional descriptive metadata was also added to the records. Separate CSV-formatted data files were created for titles and issues. Finally, the titles and issues data was loaded into an SQLite database for use with Datasette.&lt;/p&gt;
&lt;p&gt;There are 4 data files in this repository:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-periodicals-data/raw/main/titles-issues-added.ndjson&#34;&gt;NDJSON file of titles and issues harvested from Trove API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-periodicals-data/raw/main/periodical-titles.csv&#34;&gt;CSV file of periodical titles enriched with additional metadata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-periodicals-data/raw/main/periodical-issues.csv&#34;&gt;CSV file of periodical issues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-periodicals-data/raw/main/periodicals.db&#34;&gt;SQLite database with linked titles and issues data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are two versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;29 February 2024&lt;/li&gt;
&lt;li&gt;12 March 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;parliamentary-papers&#34;&gt;Parliamentary Papers&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-parliamentary-papers-data&#34;&gt;https://github.com/GLAM-Workbench/trove-parliamentary-papers-data&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This dataset contains metadata describing Commonwealth Parliamentary Papers that have been digitised and made available through Trove.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s one version of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;23 February 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;maps&#34;&gt;Maps&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-maps-data&#34;&gt;https://github.com/GLAM-Workbench/trove-maps-data&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.11526486&#34;&gt;&lt;img src=&#34;https://zenodo-rdm.web.cern.ch/badge/DOI/10.5281/zenodo.11526486.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This dataset contains metadata describing digitised maps in Trove, harvested from the Trove API and other sources.&lt;/p&gt;
&lt;p&gt;There are 2 data files in this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-maps-data/raw/main/single_maps.csv&#34;&gt;Metadata of single maps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-maps-data/raw/main/single_maps_coordinates.csv&#34;&gt;Parsed geospatial coordinates of single maps&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are 2 versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1 February 2023&lt;/li&gt;
&lt;li&gt;8 June 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;oral-histories&#34;&gt;Oral histories&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-oral-histories-data&#34;&gt;https://github.com/GLAM-Workbench/trove-oral-histories-data&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There are 2 data files in this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-oral-histories-data/blob/main/trove-oral-histories.csv&#34;&gt;Metadata describing NLA oral histories&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-oral-histories-data/blob/main/trove-oral-history-series.txt&#34;&gt;List of series containing oral histories&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are 2 versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;16 November 2023&lt;/li&gt;
&lt;li&gt;15 December 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;images&#34;&gt;Images&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-images-rights-data/&#34;&gt;https://github.com/GLAM-Workbench/trove-images-rights-data/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This dataset includes information about the application of licences and rights statements to images by Trove contributors.&lt;/p&gt;
&lt;p&gt;There are 2 data files in this repository:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-images-rights-data/raw/main/rights-on-images.csv&#34;&gt;Number of images by licence type and contributor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-images-rights-data/raw/main/rights-on-out-of-copyright-photos.csv&#34;&gt;Number of out-of-copyright images by licence type and contributor&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are 3 versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;17 February 2020&lt;/li&gt;
&lt;li&gt;9 March 2022&lt;/li&gt;
&lt;li&gt;24 April 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;finding-aids&#34;&gt;Finding aids&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/nla-finding-aids-data/&#34;&gt;https://github.com/GLAM-Workbench/nla-finding-aids-data/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This repository contains data about the National Library of Australia&amp;rsquo;s digitised manuscript finding aids, harvested from Trove.&lt;/p&gt;
&lt;p&gt;This dataset contains 2 data files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/nla-finding-aids-data/blob/main/finding-aids.csv&#34;&gt;List of urls of digitised finding aids in Trove&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/nla-finding-aids-data/blob/main/finding-aids-totals.csv&#34;&gt;Summary information describing each digitised finding aid&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There is one version of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1 March 2023&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;6-trove-born-digital-resources&#34;&gt;6. Trove born digital resources&lt;/h2&gt;
&lt;h3 id=&#34;pandora-web-archive-collections&#34;&gt;Pandora web archive collections&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-web-archives-collections&#34;&gt;https://github.com/GLAM-Workbench/trove-web-archives-collections&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This dataset contains details of the subject and collection groupings used by Pandora to organise archived web resource titles.&lt;/p&gt;
&lt;p&gt;There are two data files in this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-web-archives-collections/raw/main/pandora-subjects.ndjson&#34;&gt;List of subject groupings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-web-archives-collections/raw/main/pandora-collections.ndjson&#34;&gt;List of collection groupings&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are 2 versions of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2 May 2024&lt;/li&gt;
&lt;li&gt;7 May 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;ned-periodicals&#34;&gt;NED periodicals&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-ned-periodicals-data&#34;&gt;https://github.com/GLAM-Workbench/trove-ned-periodicals-data&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This dataset contains details of periodical titles and issues submitted to Trove through the NLA&amp;rsquo;s National edeposit (NED) scheme.&lt;/p&gt;
&lt;p&gt;There are 3 data files in this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-ned-periodicals-data/raw/main/ned-periodicals.csv&#34;&gt;Metadata of periodical titles submitted through NED&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-ned-periodicals-data/raw/main/ned-periodical-issues.csv&#34;&gt;Metadata of periodical issues submitted through NED&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-ned-periodicals-data/raw/main/ned-periodicals.db&#34;&gt;SQLite database combining linked titles and issues data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There is one version of this dataset:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;10 April 2024&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;7-national-archives-of-australia&#34;&gt;7. National Archives of Australia&lt;/h2&gt;
&lt;p&gt;The NAA datasets are all over the place at present and I need to do a lot of work to get them standardised and organised. These are the main datasets, but there are others I need to add.&lt;/p&gt;
&lt;h3 id=&#34;records-with-the-access-status-closed&#34;&gt;Records with the access status &amp;lsquo;Closed&amp;rsquo;&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/wragge/closed_access&#34;&gt;https://github.com/wragge/closed_access&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch&#34;&gt;https://github.com/GLAM-Workbench/recordsearch&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Versions in Figshare:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.6084/m9.figshare.3443867.v1&#34;&gt;Files in the National Archives of Australia currently withheld from public access&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.6084/m9.figshare.2060052.v1&#34;&gt;Files in the National Archives of Australia withheld from public access in 2015&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.6084/m9.figshare.4530851.v1&#34;&gt;Files in the National Archives of Australia withheld from public access in 2016&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.6084/m9.figshare.5900125.v1&#34;&gt;Files in the National Archives of Australia withheld from public access in 2017&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Versions in GitHub:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/data/closed-20160101.csv&#34;&gt;January 2016&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/data/closed-20170109.csv&#34;&gt;January 2017&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/data/closed-20180101.csv&#34;&gt;January 2018&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/data/closed-20190101.csv&#34;&gt;January 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/data/closed-20200101.csv&#34;&gt;January 2020&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/data/closed-20210101.csv&#34;&gt;January 2021&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/data/closed-20220101.csv&#34;&gt;January 2022&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;January 2023 (harvested but not in repo yet)&lt;/li&gt;
&lt;li&gt;January 2024 (harvested but not in repo yet)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;summary-data-about-all-series-in-recordsearch&#34;&gt;Summary data about all series in RecordSearch&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch&#34;&gt;https://github.com/GLAM-Workbench/recordsearch&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;CSV file containing basic descriptive information about all the series currently registered on RecordSearch as well as the total number of items described, digitised, and in each access category.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/series_totals_May_2021.csv&#34;&gt;Summary data about all series in RecordSearch, May 2021&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/series_totals_April_2022.csv&#34;&gt;Summary data about all series in RecordSearch, April 2022&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;recently-digitised-files&#34;&gt;Recently digitised files&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch&#34;&gt;https://github.com/GLAM-Workbench/recordsearch&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/data/recently-digitised-20210327&#34;&gt;Details of files digitised between 25 February and 26 March 2021&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;recently-digitised-files--weekly-snapshots&#34;&gt;Recently digitised files – weekly snapshots&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/wragge/naa-recently-digitised&#34;&gt;https://github.com/wragge/naa-recently-digitised&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This dataset contains weekly harvests of newly digitised files in RecordSearch. The automated scraper is currently scheduled to run each Sunday, saving a list of files that have been digitised in the previous week.&lt;/p&gt;
&lt;p&gt;There are 177 data files, created weekly from 28 March 2021 to the present.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Saving Trove&#39;s digitised periodicals as PDFs</title>
      <link>https://updates.timsherratt.org/2024/09/19/saving-troves-digitised.html</link>
      <pubDate>Thu, 19 Sep 2024 13:46:22 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/09/19/saving-troves-digitised.html</guid>
      <description>&lt;p&gt;I was recently contacted by a researcher who wanted to be able to automatically download the issues of a digitised periodical in Trove as PDFs. There was already a &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/harvest_newspaper_issues_as_pdfs/&#34;&gt;notebook in the GLAM Workbench&lt;/a&gt; that downloads the issues of a digitised &lt;em&gt;newspaper&lt;/em&gt; as PDFs, but newspapers work differently to other digitised periodicals in Trove. While there was no corresponding notebook for other types of periodicals, all the necessary steps were documented in the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;, so it was just a matter of pulling together a few blocks of code.&lt;/p&gt;
&lt;p&gt;There are three main steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;get the &lt;code&gt;nla.obj&lt;/code&gt; identifiers for all the periodical&amp;rsquo;s issues&lt;/li&gt;
&lt;li&gt;get the number of pages in each issue&lt;/li&gt;
&lt;li&gt;construct a url to download each issue as a PDF using the &lt;code&gt;nla.obj&lt;/code&gt; identifier and the number of pages&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;get-issue-identifiers&#34;&gt;Get issue identifiers&lt;/h2&gt;
&lt;p&gt;Version 3 of the Trove API added a new endpoint to &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/periodicals/accessing-data.html#using-the-magazine-titles-api-endpoint&#34;&gt;provide information about periodical titles and issues&lt;/a&gt;. However, the issues data provided by the API is incomplete. A more reliable alternative is to scrape the list of issues from the browse window in the digitised object viewer – see &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/how-to/get-collection-items.html&#34;&gt;HOW TO: Get a list of items from a digitised collection&lt;/a&gt; in the &lt;em&gt;Trove Data Guide&lt;/em&gt;.&lt;/p&gt;
&lt;h2 id=&#34;get-number-of-pages-in-each-issue&#34;&gt;Get number of pages in each issue&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s possible to scrape the number of pages along with the identifiers in the previous step. However, I&amp;rsquo;m not certain that the information is displayed consistently across all periodicals. To play it safe, you can extract embedded metadata from the digitised object viewer to get the number of pages, issue dates, and publication details (if available). See &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/how-to/extract-embedded-metadata.html&#34;&gt;HOW TO: Extract additional metadata from the digitised resource viewer&lt;/a&gt; in the &lt;em&gt;Trove Data Guide&lt;/em&gt;.&lt;/p&gt;
&lt;h2 id=&#34;download-pdfs&#34;&gt;Download PDFs&lt;/h2&gt;
&lt;p&gt;Once you have an issue&amp;rsquo;s identifier and number of pages you can construct a url to download it as a PDF. See: &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/how-to/get-downloads.html&#34;&gt;HOW TO: Get text, images, and PDFs using Trove’s download link&lt;/a&gt; in the &lt;em&gt;Trove Data Guide&lt;/em&gt;.&lt;/p&gt;
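&lt;p&gt;As a rough Python sketch – the query parameter names below follow the pattern described in the &lt;em&gt;Trove Data Guide&lt;/em&gt;&amp;rsquo;s download HOW TO, so treat them as assumptions and check the guide if downloads fail:&lt;/p&gt;

```python
from urllib.parse import urlencode

def pdf_download_url(obj_id, num_pages):
    """Build a download url for a complete digitised issue.

    `obj_id` is the issue's nla.obj identifier (e.g. "nla.obj-123456789").
    Pages are numbered from 0, so the last page is num_pages - 1.
    The query parameter names are assumptions based on the Trove Data Guide.
    """
    params = urlencode(
        {"downloadOption": "pdf", "firstPage": 0, "lastPage": num_pages - 1}
    )
    return f"https://nla.gov.au/{obj_id}/download?{params}"
```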
&lt;h2 id=&#34;putting-it-all-together&#34;&gt;Putting it all together&lt;/h2&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-09-19-12-42-49.png&#34; width=&#34;600&#34; height=&#34;596&#34; alt=&#34;Screencap of page in the GLAM Workbench&#34;&gt;
&lt;p&gt;It seemed like this would be useful to other researchers as well, so I&amp;rsquo;ve created a new notebook in the Trove Periodicals section of the GLAM Workbench that puts all of this together, see: &lt;a href=&#34;https://glam-workbench.net/trove-journals/save-issues-as-pdfs/&#34;&gt;Download issues of a periodical as PDFs&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The future (and past) of Historic Hansard</title>
      <link>https://updates.timsherratt.org/2024/08/29/the-future-and.html</link>
      <pubDate>Thu, 29 Aug 2024 14:59:45 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/08/29/the-future-and.html</guid>
      <description>&lt;p&gt;Don&amp;rsquo;t panic! &lt;a href=&#34;https://historichansard.net/&#34;&gt;Historic Hansard&lt;/a&gt; is not closing down – on the contrary, I&amp;rsquo;m planning a major update in the next few months. But as I look to the future, I thought it was a good time to pull together a few threads documenting my adventures with Commonwealth Hansard.&lt;/p&gt;
&lt;h2 id=&#34;the-past&#34;&gt;The past&lt;/h2&gt;
&lt;p&gt;Commonwealth Hansard is made available &lt;a href=&#34;http://parlinfo.aph.gov.au/parlInfo/search/summary/summary.w3p;adv%3Dyes;orderBy%3D_fragment_number,doc_date-rev;query%3DDataset%3Ahansardr,hansardr80;resCount%3DDefault&#34;&gt;online through ParlInfo&lt;/a&gt; (there&amp;rsquo;s an &lt;a href=&#34;https://www.aph.gov.au/Parliamentary_Business/Hansard/Search&#34;&gt;alternative search interface here&lt;/a&gt;). The Parliamentary Library has invested a lot of time and effort in converting the printed volumes into nicely-structured XML files which break up the sitting day into debates and speeches, and identify individual speakers. For the most part, there&amp;rsquo;s one XML file for each sitting day in each house. However, there&amp;rsquo;s currently a gap between 1981 and 1997 where no XML files are available.&lt;/p&gt;
&lt;p&gt;I started pulling data out of ParlInfo around 2011, but in 2016 I decided it would be more efficient to harvest all of the XML files into a &lt;a href=&#34;https://github.com/wragge/hansard-xml&#34;&gt;dedicated repository&lt;/a&gt;. I started with the House of Representatives debates to 1965, and gradually expanded the coverage. The &lt;a href=&#34;https://github.com/wragge/hansard-xml&#34;&gt;repository&lt;/a&gt; now contains all the Hansard XML files for both houses from 1901 to 1980, and 1998 to 2005. I stopped in 2005 because &lt;a href=&#34;https://www.openaustralia.org.au/&#34;&gt;Open Australia&lt;/a&gt; provides access to the Hansard XML files from 2006 onwards.&lt;/p&gt;
&lt;p&gt;The harvesting process revealed some interesting anomalies. In particular, &lt;a href=&#34;https://timsherratt.org/research-notebook/historic-hansard/notes/investigating-the-hansard-black-hole/&#34;&gt;I discovered that Parlinfo was missing Hansard from 94 sitting days&lt;/a&gt;. Most of the gaps were in the Senate between 1910 and 1920.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-29-13-39-41.png&#34; width=&#34;600&#34; height=&#34;254&#34; alt=&#34;Chart showing &#39;missing&#39; Hansard in the Senate 1901 to 1925&#34;&gt;
&lt;p&gt;Fortunately the Parliamentary Library was quick to investigate the problem and fill the gaps. Over the years since, they&amp;rsquo;ve continued to improve the quality and accuracy of the XML files. Earlier this year I noticed that lots of new versions of the XML files were appearing and so I &lt;a href=&#34;https://updates.timsherratt.org/2024/05/26/commonwealth-hansard-xml.html&#34;&gt;reharvested them all&lt;/a&gt;. It looks like the accuracy of the OCRd text has been improved and some structural issues fixed. This is one reason why a new version of Historic Hansard is needed!&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve ever used ParlInfo you&amp;rsquo;ll know that while you can &lt;em&gt;find&lt;/em&gt; things in Hansard, &lt;em&gt;browsing&lt;/em&gt; and &lt;em&gt;reading&lt;/em&gt; are difficult. There&amp;rsquo;s no easy way of just perusing the proceedings of a single day. A few months after I created the XML repository I decided to use the files to build &lt;a href=&#34;https://historichansard.net/&#34;&gt;Historic Hansard&lt;/a&gt; – &amp;lsquo;Commonwealth of Australia parliamentary debates presented in an easy-to-read format for historians and other lovers of political speech&amp;rsquo;. Historic Hansard is mostly just a static site, with one web page for each sitting day. Unlike ParlInfo, the focus is on reading, and you can view each speech within the context of the complete day&amp;rsquo;s proceedings.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-29-13-41-16.png&#34; width=&#34;600&#34; height=&#34;286&#34; alt=&#34;Screenshot of the home page of Historic Hansard&#34;&gt;
&lt;p&gt;You can browse Historic Hansard by house, parliament, year, and day. There are also indexes to the bills presented in the &lt;a href=&#34;https://historichansard.net/hofreps/bills/&#34;&gt;House of Representatives&lt;/a&gt; and the &lt;a href=&#34;https://historichansard.net/senate/bills/&#34;&gt;Senate&lt;/a&gt;, and pages for every person in the &lt;a href=&#34;https://historichansard.net/hofreps/people/&#34;&gt;House of Representatives&lt;/a&gt; and the &lt;a href=&#34;https://historichansard.net/senate/people/&#34;&gt;Senate&lt;/a&gt; with a complete list of their speeches. Because the focus was on browsing, Historic Hansard didn&amp;rsquo;t originally include a full text search function, but I eventually succumbed to user demand and &lt;a href=&#34;https://search.historichansard.net/&#34;&gt;added one in 2017&lt;/a&gt;. You can search for either debates or speeches, and download your results as a CSV file.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-29-13-43-29.png&#34; width=&#34;600&#34; height=&#34;268&#34; alt=&#34;Screenshot of Historic Hansard&#39;s search interface&#34;&gt;
&lt;p&gt;I also integrated Historic Hansard with &lt;a href=&#34;https://web.hypothes.is/&#34;&gt;Hypothes.is&lt;/a&gt; and &lt;a href=&#34;https://voyant-tools.org/&#34;&gt;Voyant Tools&lt;/a&gt;. Using Hypothes.is you can add notes and annotations to the speeches. You can even &lt;a href=&#34;https://timsherratt.org/research-notebook/historic-hansard/notes/two-way-direct-linking/&#34;&gt;create deep links to fragments of text within a speech&lt;/a&gt;. I&amp;rsquo;ve often suggested that you could structure a whole undergraduate history unit around the annotation of a year of Hansard – identifying people and events, and finding and linking to related information. The Voyant Tools integration allowed you to analyse and visualise the language of a complete year of Hansard. Unfortunately I broke this at some point, so it&amp;rsquo;s on my list of things to fix in the new version!&lt;/p&gt;
&lt;p&gt;In 2017, I did a bit of work with David Lowe at Deakin University to analyse &lt;a href=&#34;https://wragge.github.io/hansard-language/&#34;&gt;the language of Hansard&lt;/a&gt;. Most of it was focused on the 1970s, but I did create word clouds using the top 200 TF-IDF weighted words for &lt;a href=&#34;https://wragge.github.io/hansard-language/decades/tfidf-clouds/&#34;&gt;each decade&lt;/a&gt; and &lt;a href=&#34;https://wragge.github.io/hansard-language/decades/parliament-tfidf-clouds/&#34;&gt;each parliament&lt;/a&gt;. In 2019, I compared the &lt;a href=&#34;https://doi.org/10.5281/zenodo.3544686&#34;&gt;usage of the term &amp;lsquo;aliens&amp;rsquo;&lt;/a&gt; in Hansard, newspapers, and &lt;em&gt;The Bulletin&lt;/em&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-29-13-45-43.png&#34; width=&#34;600&#34; height=&#34;466&#34; alt=&#34;Comparison of words associated with &#39;aliens&#39; in The Bulletin and Hansard&#34;&gt;
&lt;p&gt;One interesting feature of the Hansard XML is that it identifies &lt;em&gt;interjections&lt;/em&gt; as well as formal speeches. In 2017, I extracted almost a million interjections from Hansard into a &lt;a href=&#34;https://github.com/wragge/hansard-interjections&#34;&gt;separate dataset&lt;/a&gt;. One of my favourite experiments used the interjections to reimagine political communication in the pre-internet era by &lt;a href=&#34;http://hansard-interjections.herokuapp.com/tweets/&#34;&gt;transforming interjections into (fake) tweets&lt;/a&gt;. Of course I also started feeding the interjections through a text-to-speech processor, creating &lt;a href=&#34;https://youtu.be/8fz734LpyVU&#34;&gt;RoboHansard&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-29-13-48-55.png&#34; width=&#34;600&#34; height=&#34;402&#34; alt=&#34;Screenshot of Real Words :: Imagined Tweets showing Hansard interjections reimagined as tweets&#34;&gt;
&lt;p&gt;During the &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/&#34;&gt;Real Face of White Australia&lt;/a&gt; transcribe-a-thon at Old Parliament House in 2017, I had the chance to take some of the interjections back to the place they were uttered. I set up speakers around the House of Representatives chamber and then started them hurling interjections about the White Australia Policy at each other. The drama (and spookiness) was heightened when a power failure turned off all the lights!&lt;/p&gt;
&lt;p&gt;I summarised my &lt;a href=&#34;https://youtu.be/TFzhiXH3_eg&#34;&gt;adventures with Historic Hansard&lt;/a&gt; for a conference in South Africa in 2020.&lt;/p&gt;
&lt;iframe width=&#34;100%&#34; height=&#34;400&#34; src=&#34;https://www.youtube.com/embed/TFzhiXH3_eg?si=LKUF15B41ul8urZj&#34; title=&#34;YouTube video player&#34; frameborder=&#34;0&#34; allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&#34; referrerpolicy=&#34;strict-origin-when-cross-origin&#34; allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;I also wrote a piece on &amp;lsquo;The language of Parliament&amp;rsquo; for the Museum of Australian Democracy which, unfortunately, seems to have disappeared during their latest site update. You can find it on &lt;a href=&#34;https://doi.org/10.5281/zenodo.6872590&#34;&gt;Zenodo&lt;/a&gt; or in the &lt;a href=&#34;https://webarchive.nla.gov.au/awa/20170401163437/https://moadoph.gov.au/blog/the-language-of-parliament/&#34;&gt;Australian Web Archive&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In more recent times, I&amp;rsquo;ve integrated the XML harvesting code &lt;a href=&#34;https://glam-workbench.net/hansard/&#34;&gt;into the GLAM Workbench&lt;/a&gt; and added some Jupyter notebooks that demonstrate how you can access and analyse files from the repository.&lt;/p&gt;
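&lt;p&gt;If you just want a feel for how the XML can be processed, here&amp;rsquo;s a minimal Python sketch. The element names (&lt;code&gt;debate&lt;/code&gt;, &lt;code&gt;speech&lt;/code&gt;, &lt;code&gt;talker&lt;/code&gt;, &lt;code&gt;para&lt;/code&gt;) follow the general shape of the Commonwealth Hansard XML, but check a file from the repository for the exact schema:&lt;/p&gt;

```python
# Minimal sketch of pulling speeches out of Commonwealth Hansard XML.
# A tiny stand-in document is built programmatically here; the element
# names follow the general shape of the real files, but treat them as
# assumptions and check the repository for the exact structure.
import xml.etree.ElementTree as ET

root = ET.Element("hansard")
debate = ET.SubElement(root, "debate")
speech = ET.SubElement(debate, "speech")
talker = ET.SubElement(ET.SubElement(speech, "talk.start"), "talker")
ET.SubElement(talker, "name").text = "Barton, Edmund"
ET.SubElement(speech, "para").text = "Mr Speaker, I move..."

# Collect (speaker, text) pairs from every speech in the sitting day
speeches = []
for s in root.iter("speech"):
    name = s.find(".//talker/name").text
    text = " ".join(p.text for p in s.iter("para"))
    speeches.append((name, text))
    print(name, "|", text)
```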
&lt;h2 id=&#34;the-future&#34;&gt;The future&lt;/h2&gt;
&lt;p&gt;A number of historians have told me how much &lt;a href=&#34;https://historichansard.net/&#34;&gt;Historic Hansard&lt;/a&gt; has helped their research. A quick &lt;a href=&#34;https://scholar.google.com/scholar?q=%22historichansard.net%22&#34;&gt;search in Google Scholar for &amp;ldquo;historichansard.net&amp;rdquo;&lt;/a&gt; returns 78 publications, many of which seem to include multiple links to specific sitting days. It seems I accidentally created a key piece of digital research infrastructure for Australian historians.&lt;/p&gt;
&lt;p&gt;Given that, I&amp;rsquo;m not intending to change very much. My current plans include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;update all the content to use the latest versions of the XML&lt;/li&gt;
&lt;li&gt;extend the date range to include files from 1998 to 2005&lt;/li&gt;
&lt;li&gt;get the Voyant integration working again&lt;/li&gt;
&lt;li&gt;make some back-end changes to the search application&lt;/li&gt;
&lt;li&gt;fix various outdated links&lt;/li&gt;
&lt;li&gt;clean up the Bills indexes a bit by merging together some of the titles (this will require a bit of experimentation)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Are there any improvements you&amp;rsquo;d like to see? If you have any suggestions, feel free to &lt;a href=&#34;https://github.com/wragge/historic-hansard/issues&#34;&gt;add an issue to the GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://historichansard.net/&#34;&gt;Historic Hansard&lt;/a&gt; is one of my passion projects. I haven&amp;rsquo;t received any funding to create or maintain it. Currently I estimate it costs me about AU$400 a year to run. It&amp;rsquo;s not much in the world of research infrastructure but, from a personal perspective, it all adds up. I&amp;rsquo;m committed to keeping Historic Hansard going, and it&amp;rsquo;s already outlasted some similar, well-funded projects, but if you&amp;rsquo;d like to help with the costs you can &lt;a href=&#34;https://github.com/sponsors/wragge&#34;&gt;sponsor me on GitHub&lt;/a&gt;, or just &lt;a href=&#34;https://www.buymeacoffee.com/wragge&#34;&gt;buy me a coffee&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There seems to be a lot of interest in Hansard amongst digital humanities researchers at the moment, and there&amp;rsquo;s a couple of new projects starting up. It&amp;rsquo;ll be interesting to see where they go, but whatever happens, Historic Hansard will continue to serve lovers of political speech.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Join the Research Data Alliance&#39;s new Collections as Data Interest Group!</title>
      <link>https://updates.timsherratt.org/2024/08/27/join-the-research.html</link>
      <pubDate>Tue, 27 Aug 2024 14:24:10 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/08/27/join-the-research.html</guid>
      <description>&lt;p&gt;If you&amp;rsquo;re interested in opening up GLAM collections for use in research, you might like to join the new &lt;a href=&#34;https://www.rd-alliance.org/groups/collections-as-data-ig/&#34;&gt;Collections as Data Interest Group&lt;/a&gt;, part of the &lt;a href=&#34;https://www.rd-alliance.org/&#34;&gt;Research Data Alliance&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-27-13-19-16.png&#34; width=&#34;600&#34; height=&#34;309&#34; alt=&#34;Screenshot of Collections as Data IG page&#34;&gt;
&lt;p&gt;According to the group description:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This group is aimed at collections professionals such as archivists, librarians, records managers and museum curators, as well as related professions such as IT professionals, knowledge scientists, and those involved in standards development, who serve in a range of critical roles: as experts in ensuring access, preservation, and reuse of digital records, objects, data, and collections; as provocateurs for good collections curation practices; and as advocates for the construction of responsible and sustainable infrastructures for information sharing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The group has been running for a few months now, but communication has been difficult due to the upgrade of the RDA website. Things now seem to be working ok, so it&amp;rsquo;s time to spread the word!&lt;/p&gt;
&lt;p&gt;The group has monthly online meetings. To try and cover a variety of timezones, we&amp;rsquo;re rotating the meeting times. &lt;strong&gt;The next meeting will be held in an Australia/Asia/Pacific timeslot – 3pm AEST (5.00am UTC), on Wednesday, 18 September.&lt;/strong&gt; Further meeting details, including a registration link and agenda &lt;a href=&#34;https://www.rd-alliance.org/groups/collections-as-data-ig/forum/topic/next-rda-collections-as-data-ig-meeting-18-september/&#34;&gt;are available here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To become part of the Collections as Data Interest Group, first &lt;a href=&#34;https://www.rd-alliance.org/individual-membership/&#34;&gt;register for membership&lt;/a&gt; of the Research Data Alliance. Once you&amp;rsquo;re logged into the RDA site, visit the &lt;a href=&#34;https://www.rd-alliance.org/groups/collections-as-data-ig/&#34;&gt;Collections as Data IG page&lt;/a&gt; and click on the &lt;strong&gt;Join Group&lt;/strong&gt; button. You&amp;rsquo;ll then receive notifications of new group posts and activities. You&amp;rsquo;ll also be able to access archived group meeting minutes under the &amp;lsquo;Wiki&amp;rsquo; tab.&lt;/p&gt;
&lt;p&gt;Current activities include planning for a session on &amp;lsquo;Gathering Metrics and Setting Boundaries: Reusing Collections as Data and the impacts of AI&amp;rsquo; at the &lt;a href=&#34;https://www.rd-alliance.org/event/rda-23rd-plenary-meeting-san-jose-costa-rica/&#34;&gt;RDA&amp;rsquo;s 23rd Plenary Meeting&lt;/a&gt; in November. There&amp;rsquo;s also a &lt;a href=&#34;https://www.rd-alliance.org/groups/collections-as-data-ig/forum/topic/zotero-library-for-cad/&#34;&gt;Zotero group&lt;/a&gt; where we&amp;rsquo;ve started to capture useful resources.&lt;/p&gt;
&lt;p&gt;Given my work on the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; and the &lt;a href=&#34;https://tdg.glam-workbench.net&#34;&gt;Trove Data Guide&lt;/a&gt;, I&amp;rsquo;m particularly interested in ways we can collaborate to engage researchers, build tools, and share resources. But what are your interests? I&amp;rsquo;m the Australian-based co-chair, so let me know if there are topics you&amp;rsquo;d like to discuss in future meetings.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>More datasets added to GLAM Name Index Search – now almost 12 million rows of data!</title>
      <link>https://updates.timsherratt.org/2024/08/26/more-datasets-added.html</link>
      <pubDate>Mon, 26 Aug 2024 18:54:48 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/08/26/more-datasets-added.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt; brings datasets from 10 Australian GLAM organisations together into a single  search interface. All these datasets index collections by people’s  names, so with one search you can find information about individuals  across a broad range of records, locations, and periods. It was created as an experiment during Family History Week in 2021, so I thought I&amp;rsquo;d update it for Family History Week 2024.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The update added 18 new datasets, so the GLAM Name Index Search now includes 279 datasets from 10 organisations – almost 12 million rows of data!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;Start exploring now!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Most of the datasets come from government open data portals, so the GLAM Name Index Search is also a demonstration of the value of data sharing. By making these datasets openly available, GLAM organisations support the development of new tools and resources that work across institutional boundaries.&lt;/p&gt;
&lt;p&gt;As well as updating the datasets (details below), I took the chance to tweak the interface a bit. In particular, there&amp;rsquo;s now a new, svelte dark mode for people (like me) who struggle with screen glare.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-26-17-36-46.png&#34; width=&#34;600&#34; height=&#34;484&#34; alt=&#34;Screenshot of the home page of the GLAM Name Index Search in dark mode&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The GLAM Name Index Search is created using &lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt; with a custom theme. For more information on how I compiled the data, see the &lt;a href=&#34;https://glam-workbench.net/glam-data-portals/&#34;&gt;GLAM data portals&lt;/a&gt; section of the GLAM Workbench.&lt;/p&gt;
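&lt;p&gt;The full-text search side of this can be sketched with SQLite&amp;rsquo;s built-in FTS5 extension, which is what Datasette searches under the hood (a toy example with made-up rows, not the actual GLAM Name Index build):&lt;/p&gt;

```python
# Toy sketch of the kind of full-text name search Datasette provides
# over a SQLite database: SQLite's built-in FTS5 extension on a few
# made-up rows, not the actual GLAM Name Index build.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE names USING fts5(name, dataset)")
con.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [
        ("Smith, John", "Tasmanian Arrivals"),
        ("Smythe, Johanna", "Brisbane Gaol Prisoners 1850-1898"),
        ("Jones, Mary", "Index to Inquests 1859-1905"),
    ],
)

# FTS5 matches whole tokens case-insensitively, so 'smith' finds
# 'Smith, John' but not 'Smythe, Johanna'.
matches = list(
    con.execute("SELECT name, dataset FROM names WHERE names MATCH ?", ("smith",))
)
for name, dataset in matches:
    print(name, "|", dataset)
```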
&lt;h2 id=&#34;new-datasets&#34;&gt;New datasets&lt;/h2&gt;
&lt;h3 id=&#34;queensland-state-archives&#34;&gt;Queensland State Archives&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Book of Trials 1835-1842 (created 2024-01-08)&lt;/li&gt;
&lt;li&gt;Brisbane Gaol Prisoners 1850-1898 (created 2022-08-30)&lt;/li&gt;
&lt;li&gt;Brisbane Passenger Arrivals 1913-1915 (created 2023-11-14)&lt;/li&gt;
&lt;li&gt;Gympie Mining Claims 1868-1901 (created 2023-01-30)&lt;/li&gt;
&lt;li&gt;Immigrant Remittances, IMA Maryborough 1864-1905. (S13055, S13060 &amp;amp; S13061) (created 2022-08-30)&lt;/li&gt;
&lt;li&gt;Immigrant Remittances, Nanango 1863-1901 (created 2024-08-13)&lt;/li&gt;
&lt;li&gt;Index to Deed Polls (Change of Name) 1889 - 1920.csv (created 2024-01-08)&lt;/li&gt;
&lt;li&gt;Index to Immigrants and Missing Immigrants cards 1920-1945 (created 2024-01-08)&lt;/li&gt;
&lt;li&gt;Index to Inquests 1859-1905 (digital) (created 2023-11-20)&lt;/li&gt;
&lt;li&gt;Index to Register of Immigrants, Maryborough 1871-1915 (created 2023-11-17)&lt;/li&gt;
&lt;li&gt;Land Orders 1861 - 1878 (created 2022-11-30)&lt;/li&gt;
&lt;li&gt;Records on Frontier Wars and violence in Queensland (created 2023-05-29)&lt;/li&gt;
&lt;li&gt;Reformatory School for Boys 1871-1906 (created 2022-09-05)&lt;/li&gt;
&lt;li&gt;Teachers in the Education Office Gazettes 1926-1952 (created 2022-08-30)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;nsw-state-archives&#34;&gt;NSW State Archives&lt;/h3&gt;
&lt;p&gt;Most of the NSW State Archives indexes are harvested from their website. I &lt;a href=&#34;https://updates.timsherratt.org/2023/05/08/updated-harvest-of.html&#34;&gt;last updated them&lt;/a&gt; about a year ago, but there have been some additions since then.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Assisted immigrants digitised shipping lists 1828-1896&lt;/li&gt;
&lt;li&gt;Governor&amp;rsquo;s Court Case Papers&lt;/li&gt;
&lt;li&gt;Small Debts Registers Index&lt;/li&gt;
&lt;li&gt;Wages paid to orphans&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;updated-datasets&#34;&gt;Updated datasets&lt;/h2&gt;
&lt;h3 id=&#34;libraries-tasmania&#34;&gt;Libraries Tasmania&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Bankruptcy - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Colonial Secretary correspondence - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Court - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Education - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Employment - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Hotels &amp;amp; Properties - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Land Records - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Miscellaneous - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Arrivals - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Births - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Census - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Convicts - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Convicts - permission to marry - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Deaths - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Departures - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Divorces - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Health &amp;amp; Welfare Records - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Immigration - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Inquests - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Marriages - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Naturalisations - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Prisoners - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;Tasmanian Wills - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;li&gt;World War One Tasmanian Photographs - CSV (modified 2024-08-14)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;nsw-state-archives-1&#34;&gt;NSW State Archives&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Inquest index 1851, 1916-1963&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;queensland-state-archives-1&#34;&gt;Queensland State Archives&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Assisted immigration 1848 to 1912 - combined (modified 2023-08-28)&lt;/li&gt;
&lt;li&gt;Australian South Sea Islanders 1867 - 1948 (modified 2023-08-11)&lt;/li&gt;
&lt;li&gt;Book of Trials 1835-1842 (modified 2024-07-12)&lt;/li&gt;
&lt;li&gt;Brisbane Gaol Prisoners 1850-1898 (modified 2024-05-28)&lt;/li&gt;
&lt;li&gt;Brisbane Passenger Arrivals 1913-1915 (modified 2024-07-12)&lt;/li&gt;
&lt;li&gt;Civil servants 1851 - 1867 (modified 2023-02-10)&lt;/li&gt;
&lt;li&gt;Consumptive patients 1897 to 1903 (modified 2023-07-24)&lt;/li&gt;
&lt;li&gt;Criminal Depositions 1861-1900 (modified 2024-08-02)&lt;/li&gt;
&lt;li&gt;Farm Lads 1922-1940 (modified 2023-10-24)&lt;/li&gt;
&lt;li&gt;Female prisoners admitted, Toowoomba 1887-1891 (modified 2023-08-14)&lt;/li&gt;
&lt;li&gt;Gympie Mining Claims 1868-1901 (modified 2023-01-30)&lt;/li&gt;
&lt;li&gt;Immigrant Remittances, IMA Maryborough 1864-1905. (S13055, S13060 &amp;amp; S13061) (modified 2022-08-30)&lt;/li&gt;
&lt;li&gt;Immigrant Remittances, Nanango 1863-1901 (modified 2024-08-20)&lt;/li&gt;
&lt;li&gt;Immigrants 1909-1932 (modified 2024-06-13)&lt;/li&gt;
&lt;li&gt;Immigrants and Crew 1860 - 1864 (modified 2024-06-12)&lt;/li&gt;
&lt;li&gt;Immigrants landed Bowen 1888-1896 (modified 2024-06-13)&lt;/li&gt;
&lt;li&gt;Immigrants nominated for passage, Maryborough 1884 to 1907 (modified 2023-02-10)&lt;/li&gt;
&lt;li&gt;Immigration 1922 to 1940 (modified 2023-05-30)&lt;/li&gt;
&lt;li&gt;Index to Deed Polls (Change of Name) 1889 - 1920.csv (modified 2024-01-08)&lt;/li&gt;
&lt;li&gt;Index to Immigrants and Missing Immigrants cards 1920-1945 (modified 2024-02-15)&lt;/li&gt;
&lt;li&gt;Index to Inquests 1859-1905 (digital) (modified 2024-07-09)&lt;/li&gt;
&lt;li&gt;Index to Outdoor Relief 1892-1920 (modified 2023-08-07)&lt;/li&gt;
&lt;li&gt;Index to Register of  Cases and treatment at Moreton Bay Hospital 1830-1862 (modified 2023-02-13)&lt;/li&gt;
&lt;li&gt;Index to Register of Aliens 1922-1923 - Sugar Exemptions (modified 2023-02-13)&lt;/li&gt;
&lt;li&gt;Index to Register of Immigrants, Maryborough 1871-1915 (modified 2024-07-04)&lt;/li&gt;
&lt;li&gt;Inquests 1859 to 1905 (non-digital) (modified 2024-07-09)&lt;/li&gt;
&lt;li&gt;Justices of the Peace 1857 to 1957 (modified 2023-10-26)&lt;/li&gt;
&lt;li&gt;Land Orders 1861 - 1878 (modified 2022-11-30)&lt;/li&gt;
&lt;li&gt;Land Selections 1885-1981 (modified 2024-06-13)&lt;/li&gt;
&lt;li&gt;Land orders 1861 to 1874 (modified 2022-11-30)&lt;/li&gt;
&lt;li&gt;Land orders 1862 to 1878 (modified 2022-11-30)&lt;/li&gt;
&lt;li&gt;Land orders 1865 to 1866 (Lands Dept) (modified 2023-02-10)&lt;/li&gt;
&lt;li&gt;Mackay Hospital admissions 1891 to 1908 (modified 2023-07-24)&lt;/li&gt;
&lt;li&gt;Miners rights 1874 to 1880 A-Z (modified 2022-09-06)&lt;/li&gt;
&lt;li&gt;Nominated immigrants 1905-1928 (modified 2024-07-05)&lt;/li&gt;
&lt;li&gt;Nurses examinations 1912 to 1925 (modified 2023-08-14)&lt;/li&gt;
&lt;li&gt;Passport clearances 1923-1940 (modified 2024-02-16)&lt;/li&gt;
&lt;li&gt;Prisoners admitted, Toowoomba 1895-1906 (modified 2023-08-11)&lt;/li&gt;
&lt;li&gt;Prisoners discharged, Toowoomba 1869-1879 (modified 2023-07-31)&lt;/li&gt;
&lt;li&gt;Prisoners tried, Toowoomba 1864-1903 (modified 2023-07-31)&lt;/li&gt;
&lt;li&gt;Records on Frontier Wars and violence in Queensland (modified 2023-05-29)&lt;/li&gt;
&lt;li&gt;Redeemed Land Orders 1860-1906 (modified 2024-06-13)&lt;/li&gt;
&lt;li&gt;Reformatory School for Boys 1871-1906 (modified 2022-09-05)&lt;/li&gt;
&lt;li&gt;Register of Court Fees Marburg 1885 to 1908 (modified 2023-09-05)&lt;/li&gt;
&lt;li&gt;Register of Lands Sold 1842-1868 (modified 2022-08-30)&lt;/li&gt;
&lt;li&gt;Register of immigrants 1864 to 1878 (modified 2023-09-01)&lt;/li&gt;
&lt;li&gt;Register of immigrants, Toowoomba 1880 to 1888 (modified 2023-08-30)&lt;/li&gt;
&lt;li&gt;Register of the Engagement of Immigrants at the Immigration Depot - Bowen 1873-1912 (modified 2023-09-05)&lt;/li&gt;
&lt;li&gt;Registers of immigrants 1882 to 1938 combined (modified 2023-08-11)&lt;/li&gt;
&lt;li&gt;Seamen 1882 to 1919 (modified 2022-11-30)&lt;/li&gt;
&lt;li&gt;South African (Boer War) service records 1899-1902 (modified 2023-05-30)&lt;/li&gt;
&lt;li&gt;South African (Boer) War Paybooks  1899-1902 (modified 2023-08-14)&lt;/li&gt;
&lt;li&gt;Teachers in the Education Office Gazettes 1926-1952 (modified 2022-08-30)&lt;/li&gt;
&lt;li&gt;Toowoomba Prisoners 1864-1906 (modified 2023-07-31)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;state-library-of-queensland&#34;&gt;State Library of Queensland&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Queensland railway appointees 1890-1915 (modified 2024-07-25)&lt;/li&gt;
&lt;li&gt;Queensland railway removals 1890-1915 (modified 2024-07-25)&lt;/li&gt;
&lt;li&gt;Southern and Western Railway appointees 1866-1876 (modified 2024-07-25)&lt;/li&gt;
&lt;li&gt;Southern and Western Railway removals 1866-1876 (modified 2024-07-25)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;related-updates&#34;&gt;Related updates&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2023/05/08/updated-harvest-of.html&#34;&gt;2023-05-08&lt;/a&gt;:  &lt;strong&gt;Updated harvest of NSW State Archives indexes – more than 2 million rows of data!&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2021/10/18/more-glam-name.html&#34;&gt;2021-10-18&lt;/a&gt;:  &lt;strong&gt;More GLAM Name Index updates from Queensland State Archives and SLWA&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2021/09/29/more-records-for.html&#34;&gt;2021-09-29&lt;/a&gt;: &lt;strong&gt;More records for the GLAM Name Index Search&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2021/08/23/a-family-history.html&#34;&gt;2021-08-23&lt;/a&gt;: &lt;strong&gt;A Family History Month experiment – search millions of name records from GLAM organisations&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>New Zotero translators for PROV and Queensland State Archives</title>
      <link>https://updates.timsherratt.org/2024/08/22/new-zotero-translators.html</link>
      <pubDate>Thu, 22 Aug 2024 14:35:55 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/08/22/new-zotero-translators.html</guid>
      <description>&lt;p&gt;Good news for Australian archives users – you can now use &lt;a href=&#34;https://www.zotero.org/&#34;&gt;Zotero&lt;/a&gt; to capture item details and digitised files from the collections of the &lt;a href=&#34;https://prov.vic.gov.au/&#34;&gt;Public Record Office Victoria&lt;/a&gt; and the &lt;a href=&#34;https://www.qld.gov.au/recreation/arts/heritage/archives&#34;&gt;Queensland State Archives&lt;/a&gt;!&lt;/p&gt;
&lt;h2 id=&#34;what-is-zotero&#34;&gt;What is Zotero?&lt;/h2&gt;
&lt;p&gt;According to the &lt;a href=&#34;https://www.zotero.org/&#34;&gt;Zotero&lt;/a&gt; website:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share research.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While you can use it instead of commercial reference managers like EndNote, Zotero is much, much more. I use Zotero as my personal research database, capturing publications, websites, PDFs, as well as records from a growing number of GLAM collections. You can add tags and notes to items, organise them into collections, and annotate PDFs and website captures. You can also share your collections, create groups to collaborate with others, and access all your Zotero data via an API. And on top of all that, &lt;a href=&#34;https://www.zotero.org/blog/zotero-7/&#34;&gt;Zotero has just been completely revamped&lt;/a&gt; with a cool new interface (yay, dark mode!).&lt;/p&gt;
&lt;h2 id=&#34;extending-zotero&#34;&gt;Extending Zotero&lt;/h2&gt;
&lt;p&gt;One of my favourite features is the ability for users to extend Zotero by creating new &amp;lsquo;translators&amp;rsquo;. &lt;a href=&#34;https://www.zotero.org/support/dev/translators&#34;&gt;Translators&lt;/a&gt; are little bits of JavaScript code that capture information from a website and load it into Zotero. If there&amp;rsquo;s an online collection or database that doesn&amp;rsquo;t currently work with Zotero, you can write a translator for it and have it added to the main application. The &lt;a href=&#34;https://github.com/zotero/translators&#34;&gt;Zotero translators repository&lt;/a&gt; currently contains more than 700 translators created by more than 200 contributors. That&amp;rsquo;s a pretty significant community effort!&lt;/p&gt;
&lt;p&gt;And as of today, the repository includes translators for the PROV and Queensland State Archives collection databases.&lt;/p&gt;
&lt;h2 id=&#34;capturing-prov-records&#34;&gt;Capturing PROV records&lt;/h2&gt;
&lt;p&gt;The PROV translator makes use of &lt;a href=&#34;https://prov.vic.gov.au/prov-collection-api&#34;&gt;PROV&amp;rsquo;s collection API&lt;/a&gt;. It will capture item records from search results or individual item pages – just click on the Zotero icon in your browser&amp;rsquo;s toolbar. Here&amp;rsquo;s an example of the metadata Zotero captures from &lt;a href=&#34;https://prov.vic.gov.au/archive/84BF6CD8-F54F-11E9-AE98-55CAE942BD8E?image=1&#34;&gt;this record&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-22-11-53-45.png&#34; width=&#34;600&#34; height=&#34;628&#34; alt=&#34;Screenshot of Zotero details pane showing captured metadata&#34;&gt;
&lt;p&gt;As well as expected details like &amp;lsquo;title&amp;rsquo; and &amp;lsquo;date&amp;rsquo;, the translator saves the agencies that created the file as &amp;lsquo;contributors&amp;rsquo;. Some additional archival metadata, such as the series details, is included in the &amp;lsquo;Extra&amp;rsquo; field.&lt;/p&gt;
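If you want to poke at the collection API yourself, it's easy to query from Python. This is just a rough sketch, I'm assuming the Solr-style query endpoint and response structure described in PROV's API documentation, so check that before relying on it:

```python
import urllib.parse

API_URL = "https://api.prov.vic.gov.au/search/query"

def build_prov_search(query):
    """Build a query URL for PROV's Solr-style collection API (assumed endpoint)."""
    return API_URL + "?" + urllib.parse.urlencode({"q": query, "wt": "json"})

def get_docs(data):
    """Pull the list of item records out of a Solr-style JSON response."""
    return data.get("response", {}).get("docs", [])

# Usage (makes a network call):
# import json, urllib.request
# with urllib.request.urlopen(build_prov_search("wills")) as f:
#     for doc in get_docs(json.load(f)):
#         print(doc.get("title"))
```

This is roughly what the translator does behind the scenes before mapping the returned fields onto Zotero's item types.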
&lt;p&gt;If a file has been digitised, the translator also captures a number of &amp;lsquo;attachments&amp;rsquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an image of the first page (particularly handy if the item is a photograph)&lt;/li&gt;
&lt;li&gt;a link to a PDF version of the file&lt;/li&gt;
&lt;li&gt;a link to the IIIF manifest describing this file&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-22-11-57-33.png&#34; width=&#34;600&#34; height=&#34;641&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Why doesn&amp;rsquo;t it attach the complete PDF rather than just a link? The download PDF links in the PROV interface behave differently depending on the size of the file. If the file is small, the link loads the PDF as expected. But if it&amp;rsquo;s large you&amp;rsquo;re redirected to a web page that tells you that the PDF is being generated and could take up to an hour. There wasn&amp;rsquo;t a straightforward way for Zotero to handle these two possible outcomes, so instead of trying to attach the PDFs, I&amp;rsquo;ve saved the link. You can then click on the link to view/generate the PDF and add it manually to the record if you desire.&lt;/p&gt;
&lt;p&gt;What&amp;rsquo;s an IIIF manifest? PROV delivers digitised images using the &lt;a href=&#34;https://iiif.io/&#34;&gt;International Image Interoperability Framework (IIIF)&lt;/a&gt;. An IIIF manifest is a standard way of describing a group of related images. PROV uses IIIF manifests to describe the contents of each digitised file. The link to the file manifest isn&amp;rsquo;t available through the PROV web interface, but the translator picks it up from the API and attaches it to the Zotero record.&lt;/p&gt;
&lt;p&gt;Because manifests are based on the IIIF standard, they can be reused across different applications. For example, you could use the PROV manifest links to load a digitised file into &lt;a href=&#34;https://tropy.org/&#34;&gt;Tropy&lt;/a&gt;, a desktop tool for managing and annotating images from the creators of Zotero. There&amp;rsquo;s an example of doing this with an IIIF manifest generated from Trove in the &lt;a href=&#34;https://tdg.glam-workbench.net/pathways/images/tropy.html&#34;&gt;Trove Data Guide&lt;/a&gt;. You can also use the manifest links to download all the images from a file using a tool like &lt;a href=&#34;https://www.lizmfischer.com/iiif-tools/download&#34;&gt;IIIF Download&lt;/a&gt;.&lt;/p&gt;
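To give a sense of why manifests are so reusable, here's a minimal Python sketch that pulls the image URLs out of a IIIF Presentation v2 manifest. The nested sequences/canvases/images structure is the v2 layout; a v3 manifest uses an items hierarchy instead, so adjust accordingly:

```python
def image_urls(manifest):
    """Extract full-size image URLs from a IIIF Presentation v2 manifest dict."""
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                url = image.get("resource", {}).get("@id")
                if url:
                    urls.append(url)
    return urls

# Usage (fetches the manifest over the network):
# import json, urllib.request
# with urllib.request.urlopen(manifest_url) as f:
#     manifest = json.load(f)
# print(image_urls(manifest))
```

Once you have the URLs you can feed them to a downloader, a Jupyter notebook, or any IIIF-aware viewer.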
&lt;h2 id=&#34;capturing-queensland-state-archives-records&#34;&gt;Capturing Queensland State Archives records&lt;/h2&gt;
&lt;p&gt;The QSA translator will capture item records from search results or individual item pages – just click on the Zotero icon in your browser&amp;rsquo;s toolbar. Here&amp;rsquo;s an example of the metadata Zotero captures from &lt;a href=&#34;https://www.archivessearch.qld.gov.au/items/ITM1662471&#34;&gt;this record&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-22-12-51-43.png&#34; width=&#34;600&#34; height=&#34;529&#34; alt=&#34;Screenshot of Zotero details pane showing item metadata&#34;&gt;
&lt;p&gt;As far as I can see, digitised files in the QSA collection will either be a PDF or an image. Either way, the translator captures the file and attaches it to the record.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-22-12-52-00.png&#34; width=&#34;600&#34; height=&#34;486&#34; alt=&#34;Screenshot of Zotero details pane showing the image attachment&#34;&gt;
&lt;h2 id=&#34;zotero-and-australian-glams&#34;&gt;Zotero and Australian GLAMs&lt;/h2&gt;
&lt;p&gt;Unfortunately there&amp;rsquo;s still a lot of work to be done before all Australian GLAM organisations are integrated with Zotero. Ideally, GLAM collection databases would support Zotero directly by &lt;a href=&#34;https://www.zotero.org/support/dev/exposing_metadata&#34;&gt;embedding metadata&lt;/a&gt; in their web pages. This would make custom translators unnecessary (and support other types of integration as well). But few actually do this. When it comes to embedded metadata we seem to have gone backwards in recent years as systems have been &amp;lsquo;upgraded&amp;rsquo;.&lt;/p&gt;
&lt;p&gt;The good news is that there are now custom translators for &lt;a href=&#34;https://librariestas.ent.sirsidynix.net.au/client/en_AU/library/search/results?qu=&#34;&gt;Libraries Tasmania&lt;/a&gt;, &lt;a href=&#34;https://recordsearch.naa.gov.au/&#34;&gt;National Archives of Australia&lt;/a&gt;, &lt;a href=&#34;https://trove.nla.gov.au/&#34;&gt;Trove&lt;/a&gt;, &lt;a href=&#34;https://searchthecollection.nga.gov.au/landing&#34;&gt;National Gallery of Australia&lt;/a&gt;, &lt;a href=&#34;https://archive.sro.wa.gov.au/&#34;&gt;State Records Office of WA&lt;/a&gt;, &lt;a href=&#34;https://prov.vic.gov.au/explore-collection&#34;&gt;PROV&lt;/a&gt;, and &lt;a href=&#34;https://www.archivessearch.qld.gov.au/&#34;&gt;Queensland State Archives&lt;/a&gt;. But translators need maintenance, and they&amp;rsquo;re often broken by site upgrades. Also some systems (particularly the large commercial ones) make it very difficult (if not impossible) to write useful translators.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re interested in the current situation, I&amp;rsquo;ve created a spreadsheet documenting &lt;a href=&#34;https://docs.google.com/spreadsheets/d/1Zb_e9ZazP4zs-K8ZcbaTCnv6cO_OgmUx4A7U4MFyOFE/edit?usp=sharing&#34;&gt;GLAM Zotero support&lt;/a&gt;. As I&amp;rsquo;ve suggested previously, a project aimed at improving Zotero integration across Australian GLAM organisations would make a big (and relatively cheap) contribution to Australia&amp;rsquo;s HASS research infrastructure.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Explore Trove&#39;s digitised maps</title>
      <link>https://updates.timsherratt.org/2024/08/16/explore-troves-digitised.html</link>
      <pubDate>Fri, 16 Aug 2024 14:27:20 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/08/16/explore-troves-digitised.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://trove.nla.gov.au&#34;&gt;Trove&lt;/a&gt; contains thousands of digitised maps from the &lt;a href=&#34;https://www.nla.gov.au/collections/what-we-collect/maps&#34;&gt;collections of the National Library of Australia&lt;/a&gt;, but they&amp;rsquo;re not always easy to find because of the way they&amp;rsquo;re arranged and described. To help you explore these maps I&amp;rsquo;ve created &lt;a href=&#34;https://glam-workbench.net/trove-maps/explore-maps/&#34;&gt;a new database&lt;/a&gt; and published it using Datasette.&lt;/p&gt;
&lt;h3 id=&#34;try-it-nowhttpsglam-workbenchnettrove-mapsexplore-maps&#34;&gt;&lt;a href=&#34;https://glam-workbench.net/trove-maps/explore-maps/&#34;&gt;Try it now!&lt;/a&gt;&lt;/h3&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-16-12-39-50.png&#34; width=&#34;600&#34; height=&#34;455&#34; alt=&#34;Screenshot of a search for &#39;Hobart&#39; in the maps table, showing the results displayed both as a cluster map using Leaflet, and in a table.&#34;&gt;
&lt;p&gt;To get started, head to the &lt;strong&gt;map sheets&lt;/strong&gt; table and search for some keywords. The results are displayed both as a cluster map using Leaflet, and as a table. To view the details of an individual map sheet, click on the &lt;code&gt;id&lt;/code&gt; value.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-16-12-40-43.png&#34; width=&#34;600&#34; height=&#34;365&#34; alt=&#34;Screenshot of a page describing an individual map sheet.&#34;&gt;
&lt;p&gt;The individual record displays a zoomable version of the map image. If the record includes geospatial coordinates, these are also displayed on a modern basemap. There are also options to download the map image as a JPEG or a high-resolution TIFF (if available). Where possible I&amp;rsquo;ve also tried to link the Trove records to the NLA &lt;a href=&#34;https://mapsearch.nla.gov.au/&#34;&gt;MapSearch&lt;/a&gt; interface.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-08-16-13-01-49.png&#34; width=&#34;600&#34; height=&#34;351&#34; alt=&#34;Screenshot of a collection record showing links to subcollections and map sheets.&#34;&gt;
&lt;p&gt;To find map collections, you can either search in the &lt;strong&gt;collections&lt;/strong&gt; table, or click on the &lt;code&gt;collection&lt;/code&gt; link in a map sheet record to move up the descriptive hierarchy. A collection record includes links to browse the sub-collections and map sheets it contains.&lt;/p&gt;
&lt;h2 id=&#34;about-the-interface&#34;&gt;About the interface&lt;/h2&gt;
&lt;p&gt;I usually share GLAM Workbench datasets using &lt;a href=&#34;https://updates.timsherratt.org/2024/07/19/share-your-spreadsheet.html&#34;&gt;Datasette-Lite&lt;/a&gt;, but in this case I&amp;rsquo;ve created a full &lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt; instance running on Google Cloud Run. This is because of the size of the database, and because I wanted to try out some plugins that don&amp;rsquo;t work in Datasette-Lite. In particular:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://datasette.io/plugins/datasette-cluster-map&#34;&gt;datasette-cluster-map&lt;/a&gt; – displays results on a cluster map using Leaflet&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://datasette.io/plugins/datasette-leaflet-geojson&#34;&gt;datasette-leaflet-geojson&lt;/a&gt; – converts GeoJSON strings in cells into Leaflet maps, creating the location previews in each row and item view&lt;/li&gt;
&lt;/ul&gt;
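For what it's worth, datasette-leaflet-geojson doesn't need any special column type, just a text column holding GeoJSON strings. A small sketch (the table and column names here are invented for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect("maps.db")  # a file on disk, so Datasette can open it
conn.execute(
    "CREATE TABLE IF NOT EXISTS map_sheets (id TEXT PRIMARY KEY, title TEXT, geometry TEXT)"
)
# Store a GeoJSON Point as a plain JSON string; datasette-leaflet-geojson
# recognises strings like this and renders them as Leaflet previews.
point = {"type": "Point", "coordinates": [147.325, -42.881]}
conn.execute(
    "INSERT OR REPLACE INTO map_sheets VALUES (?, ?, ?)",
    ("sheet-1", "Hobart", json.dumps(point)),
)
conn.commit()
```

With the plugin installed, running Datasette against a database like this should show the point on a small map in each row.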
&lt;h2 id=&#34;about-the-data&#34;&gt;About the data&lt;/h2&gt;
&lt;p&gt;The data was compiled using &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/how-to/harvest-digitised-resources.html&#34;&gt;the method outlined in the &lt;em&gt;Trove Data Guide&lt;/em&gt;&lt;/a&gt;. This involves:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;harvesting all the work records from a &lt;a href=&#34;https://trove.nla.gov.au/search/category/images?keyword=%22nla.obj%22&amp;amp;l-format=Map&amp;amp;l-availability=y&#34;&gt;search for maps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;unpacking any versions grouped within the work records&lt;/li&gt;
&lt;li&gt;using metadata embedded in the digitised object viewers to identify  collections, harvest collection items, and enrich the records&lt;/li&gt;
&lt;li&gt;merging duplicate values and records&lt;/li&gt;
&lt;/ul&gt;
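The harvesting step starts with the Trove API. As a hedged sketch (v3 endpoint and parameter names as per the Trove API documentation, with the same facets as the search linked above; you'll need your own API key):

```python
import urllib.parse

API_URL = "https://api.trove.nla.gov.au/v3/result"

def build_map_search(query='"nla.obj"', n=100):
    """Build a Trove API v3 search URL for digitised, freely-available maps."""
    params = {
        "q": query,
        "category": "image",
        "l-format": "Map",        # same facets as the web search above
        "l-availability": "y",
        "encoding": "json",
        "n": n,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)

# Usage (needs a Trove API key sent in the X-API-KEY header):
# import json, urllib.request
# request = urllib.request.Request(build_map_search(), headers={"X-API-KEY": MY_KEY})
# data = json.load(urllib.request.urlopen(request))
```

The full harvest then pages through the results and unpacks the versions inside each work record, as described in the Trove Data Guide.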
&lt;p&gt;In addition, I&amp;rsquo;ve added geospatial information if it&amp;rsquo;s available. If there&amp;rsquo;s a corresponding record in NLA&amp;rsquo;s &lt;a href=&#34;https://mapsearch.nla.gov.au/&#34;&gt;MapSearch&lt;/a&gt; interface, I&amp;rsquo;ve linked the records and added the geospatial data from  MapSearch. Otherwise I&amp;rsquo;ve looked for geospatial information in the item  metadata and attempted to parse the coordinates.&lt;/p&gt;
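Parsing coordinates from item metadata mostly means converting degree and minute strings to decimal degrees. A simplified sketch, assuming strings like "S 42°52'" (the real metadata is much messier, which is where the errors creep in):

```python
import re

def parse_coordinate(text):
    """Convert a string like "S 42°52'" to decimal degrees, or None on failure."""
    match = re.match(r"([NSEW])\s*(\d+)[°\s]+(\d+)?", text.strip())
    if not match:
        return None
    hemisphere, degrees, minutes = match.groups()
    value = int(degrees) + int(minutes or 0) / 60
    # Southern and western hemispheres are negative in decimal degrees.
    sign = -1 if hemisphere in "SW" else 1
    return sign * value
```

A map sheet's bounding box is usually given as two longitude and two latitude values, so a record yields four calls to a function like this.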
&lt;p&gt;The result is a flat structure where every row in the database represents a record with a unique &lt;code&gt;nla.obj&lt;/code&gt; identifier. I&amp;rsquo;ve divided the rows into three tables according to their position in Trove&amp;rsquo;s descriptive hierarchy – &lt;strong&gt;collections&lt;/strong&gt;, &lt;strong&gt;sub-collections&lt;/strong&gt;, and &lt;strong&gt;map sheets&lt;/strong&gt;. Collections have child records, but no parents as they&amp;rsquo;re at the top of the hierarchy. Sub-collections have parent collections as well as child records. Map sheets have no child records but can belong to collections or sub-collections. There are links within each record to help you  navigate up and down this hierarchy.&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;ll notice some problems and limitations with the data. In particular, maps within collections sometimes inherit their metadata from their parent. This means that their geospatial coordinates refer to the whole series, rather than the individual sheet. There are also numerous errors in the geospatial coordinates parsed from metadata.&lt;/p&gt;
&lt;h2 id=&#34;coming-soon&#34;&gt;Coming soon&lt;/h2&gt;
&lt;p&gt;I originally harvested this data to try and help me understand the scope of the digitised maps in Trove. I&amp;rsquo;m currently adding a &amp;lsquo;Maps&amp;rsquo; section to the &lt;a href=&#34;https://tdg.glam-workbench.net/&#34;&gt;Trove Data Guide&lt;/a&gt; in which I&amp;rsquo;ll provide an overview of what&amp;rsquo;s available, and document methods for accessing and using the data.&lt;/p&gt;
&lt;p&gt;The notebooks used to create the dataset will be added soon to the &lt;a href=&#34;https://glam-workbench.net/trove-maps/&#34;&gt;Maps section of the GLAM Workbench&lt;/a&gt;. I&amp;rsquo;ll also be sharing CSV-formatted versions of all the data.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Share your spreadsheet as a searchable online database using Datasette-Lite</title>
      <link>https://updates.timsherratt.org/2024/07/19/share-your-spreadsheet.html</link>
      <pubDate>Fri, 19 Jul 2024 12:13:36 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/07/19/share-your-spreadsheet.html</guid>
      <description>&lt;p&gt;HASS researchers often compile data in spreadsheets. Sometimes they want to &amp;lsquo;publish&amp;rsquo; this data online in a form that encourages others to use and explore – but how? I&amp;rsquo;ve just added &lt;a href=&#34;https://glam-workbench.net/glam-tools/datasette/&#34;&gt;a simple tool&lt;/a&gt; to the GLAM Workbench that helps you construct a url that will open a CSV file as a searchable database using Datasette-Lite.&lt;/p&gt;
&lt;h2 id=&#34;whats-datasette&#34;&gt;What&amp;rsquo;s Datasette?&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt; is a fantastic tool that helps you publish your data as an interactive website. There are a few different varieties of Datasette, but &lt;a href=&#34;https://github.com/simonw/datasette-lite&#34;&gt;Datasette-Lite&lt;/a&gt; is probably the easiest, as you don&amp;rsquo;t need to install any software. Datasette-Lite runs completely within your web browser, converting your data into a searchable database on demand.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-07-19-11-04-14.png&#34; width=&#34;600&#34; height=&#34;430&#34; alt=&#34;Screenshot of Datasette-Lite displaying a collection of editorial cartoons from the Bulletin&#34;&gt;
&lt;p&gt;I&amp;rsquo;m using Datasette-Lite throughout the GLAM Workbench. For example, try clicking the &lt;strong&gt;Explore in Datasette&lt;/strong&gt; buttons on any of these pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/periodicals-data-api/&#34;&gt;digitised periodicals in Trove&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/bulletin-cartoons-collection/&#34;&gt;editorial cartoons from the Bulletin&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/trove-oral-histories/&#34;&gt;oral histories in Trove&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What do the buttons do? They&amp;rsquo;re simply links to my &lt;a href=&#34;https://github.com/GLAM-Workbench/datasette-lite&#34;&gt;customised version of Datasette-Lite&lt;/a&gt; on GitHub. These links retrieve the Datasette-Lite web page, which instructs your browser to install the necessary software and run the Datasette application. Parameters in the url point Datasette to a specific data file in another GitHub repository. Datasette loads the data file and builds the interface.&lt;/p&gt;
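If you're curious, the generated links look something like this. The sketch below uses the standard Datasette-Lite deployment at lite.datasette.io rather than my customised version, but the csv parameter works the same way:

```python
from urllib.parse import urlencode

def datasette_lite_url(csv_url, base="https://lite.datasette.io/"):
    """Build a link that opens a remote CSV file in Datasette-Lite."""
    return base + "?" + urlencode({"csv": csv_url})

# Hypothetical CSV in a public GitHub repository:
link = datasette_lite_url("https://raw.githubusercontent.com/user/repo/main/data.csv")
print(link)
```

Anyone who follows the link gets the CSV loaded into a searchable database in their own browser.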
&lt;h2 id=&#34;create-your-own-datasette-lite-links&#34;&gt;Create your own Datasette-Lite links&lt;/h2&gt;
&lt;p&gt;All you need to publish your spreadsheet using Datasette-Lite is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;your spreadsheet saved in CSV format in a public GitHub repository&lt;/li&gt;
&lt;li&gt;a url that points Datasette-Lite to the CSV file&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-07-19-10-58-39.png&#34; width=&#34;600&#34; height=&#34;380&#34; alt=&#34;Screenshot of the web form in the GLAM Workbench to generate a Datasette-Lite url.&#34;&gt;
&lt;p&gt;To make this as easy as possible, I&amp;rsquo;ve created &lt;a href=&#34;https://glam-workbench.net/glam-tools/datasette/&#34;&gt;a tool that generates the necessary url&lt;/a&gt;. It&amp;rsquo;s just a simple web form – paste in the link to your CSV file, add some optional parameters, click the button, and the url to open your CSV file in Datasette-Lite will be displayed. Click on the url to open Datasette, or copy it to share your data with others.&lt;/p&gt;
&lt;p&gt;The optional parameters let you index full text columns for easy keyword searching, hide unwanted columns, and add links to more information about your dataset. They&amp;rsquo;re described more fully in the &lt;a href=&#34;https://glam-workbench.net/glam-tools/datasette/#documentation&#34;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;limitations-and-alternatives&#34;&gt;Limitations and alternatives&lt;/h2&gt;
&lt;p&gt;This tool makes it easy to share single CSV files as searchable databases. But if your CSV has millions of rows it&amp;rsquo;ll probably make your web browser unhappy if you try and load it in Datasette-Lite. Instead you can make use of &lt;a href=&#34;https://docs.datasette.io/en/stable/publish.html&#34;&gt;Datasette&amp;rsquo;s &amp;lsquo;publish&amp;rsquo; option&lt;/a&gt; to push your data to a dedicated Datasette instance running in the cloud. You can also customise your instance by changing the theme or using canned queries.&lt;/p&gt;
&lt;p&gt;For example, the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt; contains more than 10 million rows of data from multiple datasets, all aggregated through a single search interface. The &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/tasmanian-post-office-directories/&#34;&gt;Tasmanian Post Office Directories&lt;/a&gt; provides full-text search across 48 digitised volumes and has a heavily customised interface that displays page images as well as the OCRd content. Both of these Datasette instances are hosted on Google Cloud Run.&lt;/p&gt;
&lt;p&gt;Similarly, the method described here is designed to work with single CSV files. If you have multiple, interconnected tables you&amp;rsquo;ll probably want to generate your own SQLite database to use with Datasette. There&amp;rsquo;s an example of how you can do this in the &lt;a href=&#34;https://glam-workbench.net/trove-journals/periodicals-enrich-for-datasette/&#34;&gt;Enrich the list of periodicals from the Trove API&lt;/a&gt; notebook in the GLAM Workbench.&lt;/p&gt;
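As a starting point, here's what building a small multi-table SQLite database looks like with nothing but the Python standard library (the example tables and rows are invented; tools like sqlite-utils streamline this considerably):

```python
import sqlite3

db = sqlite3.connect("research.db")
db.executescript(
    """
    CREATE TABLE IF NOT EXISTS periodicals (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE IF NOT EXISTS issues (
        id TEXT PRIMARY KEY,
        periodical_id TEXT REFERENCES periodicals(id),
        date TEXT
    );
    """
)
# Hypothetical rows -- in practice these come from your own harvests.
db.execute("INSERT OR REPLACE INTO periodicals VALUES (?, ?)", ("p1", "The Bulletin"))
db.execute("INSERT OR REPLACE INTO issues VALUES (?, ?, ?)", ("i1", "p1", "1901-01-05"))
db.commit()
```

Point Datasette at the resulting file and it picks up both tables and the relationship between them.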
</description>
    </item>
    
    <item>
      <title>Updated datasets describing Trove&#39;s digitised newspapers</title>
      <link>https://updates.timsherratt.org/2024/07/09/updated-datasets-describing.html</link>
      <pubDate>Tue, 09 Jul 2024 17:05:03 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/07/09/updated-datasets-describing.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove newspapers&lt;/a&gt; section of the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; includes a number of notebooks and datasets that document the context and content of the newspaper corpus. I&amp;rsquo;ve just updated a few of these datasets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/csv-issues-per-year/&#34;&gt;Total number of issues per year for each newspaper in Trove&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/csv-complete-list-issues/&#34;&gt;Complete list of issues for every newspaper in Trove&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/list-non-english-newspapers/&#34;&gt;Trove newspapers with non-English language content&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/csv-newspapers-post-54/&#34;&gt;Trove newspapers with articles published after 1954&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/csv-newspapers-corrections/&#34;&gt;OCR corrections in Trove newspapers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-07-09-15-56-17.png&#34; width=&#34;600&#34; height=&#34;487&#34; alt=&#34;Visualisation showing the number of issues per day in Trove from 1896&#34;&gt;
&lt;p&gt;I&amp;rsquo;ve also used the issues data to update my &lt;a href=&#34;https://glam-workbench.net/examples/trove_newspaper_issues_per_day.html&#34;&gt;visualisation of the number of digitised newspaper issues in Trove published every day&lt;/a&gt; from 1803 to 2021 (there&amp;rsquo;s a lot of data so it can take a little while to load!).&lt;/p&gt;
&lt;p&gt;The notebooks in the Trove newspapers section still need to be updated to work with version 3 of the Trove API. I&amp;rsquo;m part way through and should get it finished in the next few weeks. I&amp;rsquo;ll also be adding some more of this data into the &lt;a href=&#34;https://tdg.glam-workbench.net/newspapers-and-gazettes/newspaper-corpus.html&#34;&gt;Understanding the digitised newspapers&lt;/a&gt; section of the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Understanding Trove at the AHA annual conference</title>
      <link>https://updates.timsherratt.org/2024/07/01/understanding-trove-at.html</link>
      <pubDate>Mon, 01 Jul 2024 21:36:41 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/07/01/understanding-trove-at.html</guid>
      <description>&lt;p&gt;A fairly intensive period of work came to an end today as I delivered a workshop on &amp;lsquo;Understanding Trove&amp;rsquo; at the Australian Historical Association&amp;rsquo;s annual conference in Adelaide. In effect, the workshop was also the launch of the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;, which I&amp;rsquo;ve been developing as part of the &lt;a href=&#34;https://ardc.edu.au/project/ardc-community-data-lab/&#34;&gt;ARDC&amp;rsquo;s Community Data Lab&lt;/a&gt;. The ARDC sponsored today&amp;rsquo;s workshop and has provided bursaries to help five ECRs and HDRs participate in the conference&amp;rsquo;s digital history stream.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/b2bab75987ea4fef.jpeg&#34; width=&#34;600&#34; height=&#34;449&#34; alt=&#34;Photograph of attendees at the Understanding Trove workshop.&#34;&gt;
&lt;p&gt;Thanks to everyone who came to the workshop. It was great to have so much interest in developing a critical understanding of Trove and thinking about new research uses for Trove data. If you couldn&amp;rsquo;t make it, the slides are available below. Like the Trove Data Guide, the GLAM Workbench and pretty much everything else I do, the slides are openly licensed so feel free to share and reuse if any of it is useful to you.&lt;/p&gt;
&lt;iframe src=&#34;https://slides.com/wragge/aha-2024/embed&#34; width=&#34;100%&#34; height=&#34;500&#34; title=&#34;Understanding Trove workshop&#34; scrolling=&#34;no&#34; frameborder=&#34;0&#34; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;Now I think I need a day off before I start thinking about the topics I&amp;rsquo;d still like to add to the Trove Data Guide&amp;hellip;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Who is the Trove Data Guide for?</title>
      <link>https://updates.timsherratt.org/2024/06/21/who-is-the.html</link>
      <pubDate>Fri, 21 Jun 2024 16:32:29 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/06/21/who-is-the.html</guid>
      <description>&lt;p&gt;The Trove Data Guide aims to help researchers understand, access, and use data from Trove. But just because it’s about ‘data’ doesn’t mean  you need to be able to code. To understand Trove data and its  possibilities for research, you first need to understand Trove itself –  its history, its structure, its assumptions, and its limits. This  knowledge is useful to any Trove user.&lt;/p&gt;
&lt;p&gt;For example, all Trove users would benefit from knowing more about &lt;a href=&#34;https://tdg.glam-workbench.net/what-is-trove/works-and-versions.html&#34;&gt;works and versions&lt;/a&gt;, or how to use the &lt;a href=&#34;https://tdg.glam-workbench.net/understanding-search/simple-search-options.html&#34;&gt;‘simple’ search box for complex queries&lt;/a&gt;. There’s also an introduction to &lt;a href=&#34;https://tdg.glam-workbench.net/newspapers-and-gazettes/newspaper-corpus.html&#34;&gt;what’s in (and not in) the digitised newspapers&lt;/a&gt;, and similar overviews for other digitised content such as &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/books/overview.html&#34;&gt;books&lt;/a&gt;, &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/parliamentary-papers/overview.html&#34;&gt;parliamentary papers&lt;/a&gt;, and &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/parliamentary-papers/overview.html&#34;&gt;oral histories&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/newspapers-change.png&#34; width=&#34;600&#34; height=&#34;307&#34; alt=&#34;Line chart showing number of newspaper articles in Trove by publication year, showing the change from 2011 to 2022&#34;&gt;
&lt;p&gt;&lt;a href=&#34;https://tdg.glam-workbench.net/newspapers-and-gazettes/newspaper-corpus.html#newspapers-corpus-history&#34;&gt;&lt;em&gt;Number of newspaper articles in Trove by publication year, showing the change from 2011 to 2022&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Several sections document the way the web interface works (or doesn’t work). There’s a field guide to the various &lt;a href=&#34;https://tdg.glam-workbench.net/what-is-trove/interfaces.html&#34;&gt;interfaces&lt;/a&gt; and &lt;a href=&#34;https://tdg.glam-workbench.net/what-is-trove/links-and-identifiers.html&#34;&gt;identifiers&lt;/a&gt; you might come across, and details of &lt;a href=&#34;https://tdg.glam-workbench.net/accessing-data/using-web-interface.html&#34;&gt;options for downloading data&lt;/a&gt;. The Trove Data Guide fills many gaps in the official Trove  documentation, so check here if you run into problems, or can’t figure  out how to achieve a particular task. Perhaps you were wondering how &lt;a href=&#34;https://tdg.glam-workbench.net/accessing-data/how-to/download-higher-resolution-images.html&#34;&gt;to download digitised images at their highest available resolution&lt;/a&gt;?&lt;/p&gt;
&lt;p&gt;For people who are more comfortable with code there are plenty of  useful snippets and complete working examples. For example there are  sections that document how to &lt;a href=&#34;https://tdg.glam-workbench.net/newspapers-and-gazettes/data/accessing-data.html&#34;&gt;get metadata, text, and images from newspapers&lt;/a&gt;, and &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/accessing-data.html&#34;&gt;other digitised resources&lt;/a&gt;. There are also a series of ‘HOW TO’ pages that describe more complex data access methods.&lt;/p&gt;
&lt;p&gt;But what can you do with Trove data? The Trove Data Guide’s &lt;a href=&#34;https://tdg.glam-workbench.net/pathways/index.html&#34;&gt;Pathways&lt;/a&gt; provide detailed tutorials that lead you step by step through examples of packaging Trove data for use with other digital tools. Use these examples as a starting point in planning your own projects.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Loading locations of Trove&#39;s digitised maps into the Gazetteer of Historical Australian Placenames</title>
      <link>https://updates.timsherratt.org/2024/06/18/loading-locations-of.html</link>
      <pubDate>Tue, 18 Jun 2024 17:57:16 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/06/18/loading-locations-of.html</guid>
      <description>&lt;p&gt;For this part of the ARDC&amp;rsquo;s &lt;a href=&#34;https://ardc.edu.au/project/ardc-community-data-lab/&#34;&gt;Community Data Lab&lt;/a&gt; project, I&amp;rsquo;ve been focusing in particular on adding a series of &lt;a href=&#34;https://tdg.glam-workbench.net/pathways/index.html&#34;&gt;researcher pathways&lt;/a&gt; to the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;. These pathways link data from Trove to a variety of tools and approaches and include five detailed tutorials. The first four were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://tdg.glam-workbench.net/pathways/text/newspapers-keywords.html&#34;&gt;Analysing keywords in Trove’s digitised newspapers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://tdg.glam-workbench.net/pathways/images/tropy.html&#34;&gt;Working with a Trove collection in Tropy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://tdg.glam-workbench.net/pathways/images/mirador.html&#34;&gt;Comparing manuscript collections in Mirador&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://tdg.glam-workbench.net/pathways/collections/collectionbuilder.html&#34;&gt;Sharing a Trove List as a CollectionBuilder exhibition&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;ve now added the fifth and final (for now) tutorial:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://tdg.glam-workbench.net/pathways/geospatial/maps-to-ghap.html&#34;&gt;Create a layer in the Gazetteer of Historical Australian Placenames using metadata from Trove’s digitised maps&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As has been the way in a lot of TDG development, this tutorial builds on and extends resources available through the GLAM Workbench. The &lt;a href=&#34;https://glam-workbench.net/trove-maps/&#34;&gt;Trove maps section&lt;/a&gt; of the GLAM Workbench already included a dataset of &lt;a href=&#34;https://glam-workbench.net/trove-maps/single-maps-data/&#34;&gt;digitised maps&lt;/a&gt; and their &lt;a href=&#34;https://glam-workbench.net/trove-maps/single-maps-coordinates-data/&#34;&gt;coordinates&lt;/a&gt;, but for this tutorial I added a notebook that lets you &lt;a href=&#34;https://glam-workbench.net/trove-maps/create-map-subsets/&#34;&gt;create a subset of maps&lt;/a&gt; relating to a particular region. It does this by putting all the available map locations on a world map. You then draw a rectangle on the map to select a region and display details of all the maps whose centre points fall within that region. It also displays links to download your new dataset as either a CSV or GeoJSON file.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-06-08-23-43-44.png&#34; width=&#34;600&#34; height=&#34;378&#34; alt=&#34;Screenshot of the notebook in action showing how you can select a region to create a subset of digitised maps.&#34;&gt;
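&lt;p&gt;Under the hood, that selection step amounts to a bounding-box filter over the map centre points. Here is a minimal sketch of the idea, assuming each record carries &lt;code&gt;latitude&lt;/code&gt; and &lt;code&gt;longitude&lt;/code&gt; fields (the notebook&amp;rsquo;s actual column names may differ):&lt;/p&gt;

```python
import csv
import io

def maps_in_bbox(maps, min_lon, min_lat, max_lon, max_lat):
    """Return the maps whose centre point falls inside the rectangle."""
    return [
        m for m in maps
        if min_lon <= m["longitude"] <= max_lon
        and min_lat <= m["latitude"] <= max_lat
    ]

def to_csv(maps):
    """Serialise a subset as CSV, ready to download or reuse elsewhere."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["title", "latitude", "longitude"])
    writer.writeheader()
    writer.writerows(maps)
    return out.getvalue()

# A toy dataset: two maps centred near Fiji, one centred on London.
maps = [
    {"title": "Fiji islands", "latitude": -17.7, "longitude": 178.0},
    {"title": "Suva harbour", "latitude": -18.1, "longitude": 178.4},
    {"title": "London", "latitude": 51.5, "longitude": -0.1},
]
subset = maps_in_bbox(maps, 175.0, -20.0, 180.0, -15.0)
```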
&lt;p&gt;The tutorial walks you through this process, then demonstrates how you can upload data from the CSV file to create a new layer in the &lt;a href=&#34;https://ghap.tlcmap.org/&#34;&gt;Gazetteer of Historical Australian Placenames&lt;/a&gt; (GHAP).&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/ghap-layer-point-info.png&#34; width=&#34;600&#34; height=&#34;469&#34; alt=&#34;Screenshot of a GHAP layer containing information about digitised maps relating to Fiji.&#34;&gt;
&lt;p&gt;This part of the Trove Data Guide project is now finished, but I&amp;rsquo;ll be continuing to add and refine content. If you have any suggestions for additional tutorials, feel free to add them to the &lt;a href=&#34;https://github.com/wragge/trove-data-guide/discussions/categories/ideas&#34;&gt;ideas board&lt;/a&gt; (no promises though!).&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Instant exhibitions with Trove and CollectionBuilder</title>
      <link>https://updates.timsherratt.org/2024/06/10/instant-exhibitions-with.html</link>
      <pubDate>Mon, 10 Jun 2024 22:38:04 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/06/10/instant-exhibitions-with.html</guid>
<description>&lt;p&gt;You’ve been collecting and annotating items relating to your research project in a Trove List. You’d like to display the contents of your list as an online exhibition for others to explore. But how? One possible approach is now documented in the &lt;em&gt;Trove Data Guide&lt;/em&gt;. I&amp;rsquo;ve added &lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/collections/collectionbuilder.html&#34;&gt;a tutorial&lt;/a&gt; which walks through the process of using a &lt;a href=&#34;https://glam-workbench.net/trove-lists/convert-list-to-cb-exhibition/&#34;&gt;GLAM Workbench notebook&lt;/a&gt; to extract and process data from a Trove List, before uploading it to &lt;a href=&#34;https://collectionbuilder.github.io/&#34;&gt;CollectionBuilder&lt;/a&gt; to create an instant exhibition.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/cb-wragge-demo.png&#34; width=&#34;600&#34; height=&#34;407&#34; alt=&#34;Screenshot of the demo exhibition&#34;&gt;
&lt;p&gt;CollectionBuilder creates online exhibitions using static web technologies. It provides a GitHub Pages template repository, so all you need to do to create an exhibition is upload your metadata and images to GitHub. The GLAM Workbench notebook gets your list data from the Trove API, and enriches it a bit to take advantage of CollectionBuilder&amp;rsquo;s built-in visualisations. For example, if there are any digitised maps in your list, the notebook will try to extract their coordinates from the digitised map viewer and add them to the metadata so that CollectionBuilder can display the location on a map. The notebook also downloads images of newspaper articles and other digitised resources, and links them to the metadata, ready for upload.&lt;/p&gt;
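&lt;p&gt;The coordinate enrichment can be pictured as a simple merge: look up each item&amp;rsquo;s identifier in a table of extracted coordinates and, where there is a match, add location fields for a map visualisation to read. A rough sketch (the field names here are illustrative, not necessarily those the notebook or CollectionBuilder actually use):&lt;/p&gt;

```python
def enrich_with_coordinates(items, coordinates):
    """Add latitude/longitude to any item we have coordinates for,
    leaving other items untouched (field names assumed)."""
    for item in items:
        point = coordinates.get(item["objectid"])
        if point is not None:
            item["latitude"], item["longitude"] = point
    return items

# Toy list data: one digitised map with known coordinates, one letter without.
items = [
    {"objectid": "nla.obj-123", "title": "Map of Port Phillip"},
    {"objectid": "nla.obj-456", "title": "A letter home"},
]
coordinates = {"nla.obj-123": (-38.1, 144.9)}
enriched = enrich_with_coordinates(items, coordinates)
```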
&lt;p&gt;Check out the tutorial: &lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/collections/collectionbuilder.html&#34;&gt;Sharing a Trove List as a CollectionBuilder exhibition&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Keyword analysis of Trove newspapers with the GLAM Workbench &amp; ATAP</title>
      <link>https://updates.timsherratt.org/2024/06/03/keyword-analysis-of.html</link>
      <pubDate>Mon, 03 Jun 2024 16:06:44 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/06/03/keyword-analysis-of.html</guid>
      <description>&lt;p&gt;There&amp;rsquo;s a &lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/text/newspapers-keywords.html&#34;&gt;new draft tutorial&lt;/a&gt; in the development version of the &lt;em&gt;Trove Data Guide&lt;/em&gt;. It walks through the process of harvesting a collection of digitised newspaper articles from Trove, reshaping the harvest to create sub-collections, and then loading the data into the &lt;a href=&#34;https://github.com/Australian-Text-Analytics-Platform/keywords-analysis&#34;&gt;Keyword Analysis Tool&lt;/a&gt; provided by the &lt;a href=&#34;https://www.atap.edu.au/&#34;&gt;Australian Text Analytics Platform&lt;/a&gt; (ATAP).&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-06-03-14-56-53.png&#34; width=&#34;600&#34; height=&#34;404&#34; alt=&#34;&#34;&gt;
&lt;p&gt;Along the way it goes into a fair bit of detail about constructing searches, using the &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper Harvester&lt;/a&gt;, and thinking about your data. Much of the information on creating and reshaping datasets would apply to using the digitised newspapers with other analysis tools as well.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Running Mirador on GitHub Pages</title>
      <link>https://updates.timsherratt.org/2024/06/02/running-mirador-on.html</link>
      <pubDate>Sun, 02 Jun 2024 22:27:40 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/06/02/running-mirador-on.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve just created a &lt;a href=&#34;https://github.com/wragge/mirador-ghpages&#34;&gt;GitHub repository template&lt;/a&gt; that you can use to get your own &lt;a href=&#34;https://projectmirador.org/&#34;&gt;Mirador&lt;/a&gt; version 3 installation running in minutes. You can also configure it to display local or remote IIIF manifests. I was thinking that it could be useful for researchers who want to create their own customised Mirador workspaces to examine a particular set of documents, but don&amp;rsquo;t want to install any software or fiddle about on the command-line.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-06-01-14-33-34.png&#34; width=&#34;600&#34; height=&#34;367&#34; alt=&#34;Screenshot of Mirador workspace showing two windows displaying manuscript images.&#34;&gt;
&lt;p&gt;I&amp;rsquo;ve been doing a lot with &lt;a href=&#34;https://iiif.io/&#34;&gt;IIIF&lt;/a&gt; lately. First &lt;a href=&#34;https://updates.timsherratt.org/2024/05/15/using-iiif-to.html&#34;&gt;a GLAM Workbench notebook&lt;/a&gt; to save a collection of images from Trove as an IIIF manifest. Then a &lt;a href=&#34;https://updates.timsherratt.org/2024/05/21/trove-to-tropy.html&#34;&gt;tutorial for the Trove Data Guide&lt;/a&gt; that walks through the whole process of generating an IIIF manifest from Trove, then loading the manifest into &lt;a href=&#34;https://tropy.org/&#34;&gt;Tropy&lt;/a&gt; for analysis and annotation. I&amp;rsquo;ve also just about finished another tutorial that saves parts of a manuscript collection as IIIF manifests, then loads them into Mirador for side-by-side comparison.&lt;/p&gt;
&lt;p&gt;In the new tutorial I was planning to use GitHub to put the manifests online so that they could then be displayed using the Mirador demo site. But then I thought if we&amp;rsquo;re saving the manifests to GitHub, why not use GitHub Pages to run a customised version of Mirador that can read the manifests from its own repository? There are various examples around of getting Mirador running on services like Netlify, but I couldn&amp;rsquo;t find a GitHub Pages example. So I decided to make my own.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://github.com/ProjectMirador/mirador-integration&#34;&gt;Mirador integration examples&lt;/a&gt; were useful in getting a basic set-up working. Then it was a matter of creating a GitHub action to generate the Mirador site any time the repository changed. As mentioned, I wanted users to be able to load local manifests into their Mirador workspace, so I wrote a little Python script to check the &lt;code&gt;manifests&lt;/code&gt; directory for files, and copy the paths into the Mirador config. So all a researcher has to do is upload a manifest to the repository, then the deploy scripts rewrite the config and create a new version of the Mirador site automatically.&lt;/p&gt;
&lt;p&gt;Similarly, if you want to include some remote manifests in your default workspace, it&amp;rsquo;s just a matter of adding the urls to the &lt;code&gt;manifest_urls.txt&lt;/code&gt; file. The deploy scripts read the file and add the urls to the config.&lt;/p&gt;
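&lt;p&gt;The deploy step can be sketched in a few lines of Python: scan the &lt;code&gt;manifests&lt;/code&gt; directory, read &lt;code&gt;manifest_urls.txt&lt;/code&gt;, and combine the results into the &lt;code&gt;catalog&lt;/code&gt; list that a Mirador 3 config accepts. This is a simplified illustration rather than the repository&amp;rsquo;s actual script, and the base url is a placeholder:&lt;/p&gt;

```python
import tempfile
from pathlib import Path

def build_catalog(manifest_dir, urls_file, base_url):
    """Combine local manifest files and remote manifest urls into a
    Mirador-style catalog list of {'manifestId': ...} entries."""
    catalog = []
    for path in sorted(Path(manifest_dir).glob("*.json")):
        # Local manifests will be served from the published site itself.
        catalog.append({"manifestId": f"{base_url}/manifests/{path.name}"})
    urls_path = Path(urls_file)
    if urls_path.exists():
        for line in urls_path.read_text().splitlines():
            if line.strip():
                catalog.append({"manifestId": line.strip()})
    return catalog

# Simulate a repository checkout with one local manifest and one remote url.
with tempfile.TemporaryDirectory() as repo:
    manifest_dir = Path(repo) / "manifests"
    manifest_dir.mkdir()
    (manifest_dir / "barton.json").write_text("{}")
    urls_file = Path(repo) / "manifest_urls.txt"
    urls_file.write_text("https://example.org/remote-manifest.json\n")
    catalog = build_catalog(manifest_dir, urls_file, "https://example.github.io/mirador")
```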
&lt;p&gt;I&amp;rsquo;ve included the &lt;a href=&#34;https://github.com/ProjectMirador/mirador-image-tools&#34;&gt;Mirador Image Tools&lt;/a&gt; plugin in the default installation, but you can modify the template to add additional plugins if you want. With Mirador version 4 currently being tested, I&amp;rsquo;ll no doubt create a version 4 template before too long. Let me know if you find this useful!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Commonwealth Hansard XML repository updates</title>
      <link>https://updates.timsherratt.org/2024/05/26/commonwealth-hansard-xml.html</link>
      <pubDate>Sun, 26 May 2024 14:45:51 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/05/26/commonwealth-hansard-xml.html</guid>
<description>&lt;p&gt;Hey Australian Hansard fans, I&amp;rsquo;ve done a complete reharvest of all of the Commonwealth Hansard XML files from 1901 to 1980 from ParlInfo. There have been lots of improvements and corrections, and most of the file names have changed (they now have a version flag). The improvements seem to be ongoing, so I&amp;rsquo;ll try to harvest more regularly from now on. You can download the lot from &lt;a href=&#34;https://github.com/wragge/hansard-xml&#34;&gt;the GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I still need to load the updated XML into the &lt;a href=&#34;http://historichansard.net/&#34;&gt;Historic Hansard&lt;/a&gt; site, but that&amp;rsquo;s going to have to wait for a month or two&amp;hellip;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>More tools for harvesting Trove newspaper articles</title>
      <link>https://updates.timsherratt.org/2024/05/24/more-tools-for.html</link>
      <pubDate>Fri, 24 May 2024 16:10:21 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/05/24/more-tools-for.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve just added a couple of new notebooks to the &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper &amp;amp; Gazette Harvester section&lt;/a&gt; of the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-harvester/basic-harvester-example/&#34;&gt;Using the Trove Harvester as a Python package&lt;/a&gt; provides a basic example of using the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-harvester/&#34;&gt;trove-newspaper-harvester&lt;/a&gt; Python package. While there&amp;rsquo;s already a simple web app version of the harvester, I wanted a notebook version running in the JupyterLab interface that I could integrate with other tools and notebooks. All you need to do to harvest all the articles in a Trove newspaper search is paste in your Trove API key and the search query url from the Trove web interface.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-05-24-15-05-55.png&#34; width=&#34;600&#34; height=&#34;355&#34; alt=&#34;Screenshot from the Reshaping your newspaper harvest notebook, describing the Harvest Slicer&#34;&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-harvester/reshaping-harvests/&#34;&gt;Reshaping your newspaper harvest&lt;/a&gt; provides a slice and dice wonder tool for Trove newspaper harvests, enabling you to repackage OCRd text by decade, year, and newspaper title. It saves the results as zip files, concatenated text files, or CSV files with embedded text. These repackaged slices should suit a variety of text analysis tools and questions. I&amp;rsquo;ve been thinking about doing something like this for a while, and think it should be quite useful.&lt;/p&gt;
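&lt;p&gt;The slicing itself is a straightforward group-and-concatenate operation. A minimal sketch of the by-year case, assuming each harvested article carries a date string and its OCRd text (illustrative field names, not necessarily those in the harvester&amp;rsquo;s output):&lt;/p&gt;

```python
from collections import defaultdict

def slice_by_year(articles):
    """Group OCRd text by publication year, concatenating each
    year's articles into a single blob of text."""
    groups = defaultdict(list)
    for article in articles:
        year = article["date"][:4]  # dates like '1914-08-05'
        groups[year].append(article["text"])
    return {year: "\n\n".join(texts) for year, texts in groups.items()}

# A toy harvest spanning two years.
articles = [
    {"date": "1914-08-05", "text": "War declared."},
    {"date": "1914-11-02", "text": "Troops depart."},
    {"date": "1915-04-26", "text": "Landing reported."},
]
slices = slice_by_year(articles)
```

The same grouping key could just as easily be the decade (`article["date"][:3] + "0s"`) or the newspaper title.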
&lt;p&gt;In my usual way, I started off writing a tutorial for the Trove Data Guide on ways of loading digitised newspaper data into text analysis tools and then realised I needed these notebooks to fill in some gaps in the data processing pipeline. So after a day or two of yak shaving I now have to get back to the tutorial.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Trove to Tropy via IIIF – documenting data pathways in the Trove Data Guide</title>
      <link>https://updates.timsherratt.org/2024/05/21/trove-to-tropy.html</link>
      <pubDate>Tue, 21 May 2024 23:16:16 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/05/21/trove-to-tropy.html</guid>
      <description>&lt;p&gt;Last week I &lt;a href=&#34;https://updates.timsherratt.org/2024/05/15/using-iiif-to.html&#34;&gt;added a notebook to the GLAM Workbench&lt;/a&gt; that saves a collection of images from Trove as an IIIF manifest. This week I&amp;rsquo;ve &lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/images/tropy.html&#34;&gt;written a tutorial&lt;/a&gt; that shows how you can use the notebook to load the collection data in &lt;a href=&#34;https://tropy.org/&#34;&gt;Tropy&lt;/a&gt; – a desktop tool for managing and annotating images for research.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/tropy-interface.png&#34; width=&#34;600&#34; height=&#34;409&#34; alt=&#34;The Tropy interface showing photographs imported from the B.A.N.Z. Antarctic Research Expedition photographs (https://nla.gov.au/nla.obj-141170265).&#34;&gt;
&lt;p&gt;This is the &lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/images/tropy.html&#34;&gt;first tutorial&lt;/a&gt; in the &lt;em&gt;Trove Data Guide&lt;/em&gt;&amp;rsquo;s &lt;a href=&#34;https://updates.timsherratt.org/2024/04/18/what-do-you.html&#34;&gt;Research Pathways section&lt;/a&gt;. While most of the TDG documents the types of data available in Trove and how you can access it, the pathways aim to connect Trove data with other tools and platforms – to point at possibilities for analysis, enrichment, and sharing. For example, I&amp;rsquo;m planning tutorials on packaging OCRd text from Trove for use with text analysis tools, as well as ways of sharing selected data through tools like CollectionBuilder and Datasette.&lt;/p&gt;
&lt;p&gt;If you have any ideas for additional tutorials, feel free to add them to the &lt;a href=&#34;https://github.com/wragge/trove-data-guide/discussions/categories/ideas&#34;&gt;ideas board&lt;/a&gt; in GitHub.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Using IIIF to explore Trove&#39;s digitised images</title>
      <link>https://updates.timsherratt.org/2024/05/15/using-iiif-to.html</link>
      <pubDate>Thu, 16 May 2024 00:24:24 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/05/15/using-iiif-to.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve just added &lt;a href=&#34;https://glam-workbench.net/trove-images/save-image-collection-iiif/&#34;&gt;a new notebook&lt;/a&gt; to the Trove images section of the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;. It helps you save a collection of digitised images as an IIIF manifest. But what does that mean? It means the notebook packages up all the metadata describing the images in a standard form that can be used with a variety of IIIF-compliant tools. These tools let you do things with the collections that you can&amp;rsquo;t do in Trove&amp;rsquo;s own interface.&lt;/p&gt;
&lt;p&gt;Perhaps you&amp;rsquo;d like to browse the complete digitised contents of &lt;a href=&#34;https://nla.gov.au/nla.obj-224441684&#34;&gt;Sir Edmund Barton&amp;rsquo;s manuscript collection&lt;/a&gt;, without all the back and forth and up and down navigation imposed by the Trove web interface. Here you go, thanks to IIIF!&lt;/p&gt;
&lt;iframe src=&#34;https://uv-v3.netlify.app/uv/uv.html#?manifest=https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-224441684-v3-manifest.json&amp;c=0&amp;m=0&amp;s=0&amp;cv=0&#34; width=&#34;100%&#34; height=&#34;600&#34; allowfullscreen frameborder=&#34;0&#34;&gt;&lt;/iframe&gt;
&lt;h2 id=&#34;whats-iiif&#34;&gt;What&amp;rsquo;s IIIF?&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://iiif.io/&#34;&gt;International Image Interoperability Framework&lt;/a&gt;, more conveniently known as IIIF, develops open standards for sharing digital objects, such as images. IIIF platforms and standards are used by GLAM organisations around the world to deliver their image collections online.&lt;/p&gt;
&lt;p&gt;Once you have standards for sharing image metadata, people can build tools that work across collections. For example, &lt;a href=&#34;https://universalviewer.io/&#34;&gt;Universal Viewer&lt;/a&gt; and &lt;a href=&#34;https://projectmirador.org/&#34;&gt;Mirador&lt;/a&gt; are both richly featured, open source, community developed image viewing platforms.&lt;/p&gt;
&lt;p&gt;IIIF manifests are JSON files that describe a set of digital objects. They include technical information about the images and how to access them, as well as metadata describing their content and context. Everything you need to explore the images is packaged up in a single, standards-based file. This means that if you point a manifest at tools like Universal Viewer and Mirador, they can present the images to users without any special configuration.&lt;/p&gt;
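&lt;p&gt;To make that concrete, here is a sketch of a minimal IIIF Presentation 3.0 manifest built in Python: one Canvas per image, each painted by a single image Annotation. The urls and dimensions are placeholders:&lt;/p&gt;

```python
import json

def make_manifest(base_id, label, images):
    """Build a minimal IIIF Presentation 3.0 manifest: one Canvas per
    image, each painted by a single image Annotation."""
    canvases = []
    for i, (url, width, height) in enumerate(images):
        canvas_id = f"{base_id}/canvas/{i}"
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {"id": url, "type": "Image", "format": "image/jpeg"},
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base_id}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }

manifest = make_manifest(
    "https://example.org/iiif",
    "A tiny demonstration collection",
    [("https://example.org/images/page-1.jpg", 1000, 1400)],
)
manifest_json = json.dumps(manifest, indent=2)
```

A real manifest would usually add more descriptive metadata and point image bodies at a IIIF Image API service, but this is enough for a viewer to display.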
&lt;h2 id=&#34;why-is-the-new-notebook-needed&#34;&gt;Why is the new notebook needed?&lt;/h2&gt;
&lt;p&gt;Unfortunately Trove doesn&amp;rsquo;t provide data using IIIF standards. Indeed, it doesn&amp;rsquo;t really supply &lt;em&gt;any&lt;/em&gt; machine-readable data about the contents of digital collections. The notebook scrapes metadata from Trove&amp;rsquo;s digital collection viewer, reassembling it as a standard IIIF manifest.&lt;/p&gt;
&lt;h2 id=&#34;whats-possible&#34;&gt;What&amp;rsquo;s possible?&lt;/h2&gt;
&lt;p&gt;Here are a few more examples of manifests created with the new notebook.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trove collection&lt;/th&gt;
&lt;th&gt;manifest&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://nla.gov.au/nla.obj-141170265&#34;&gt;B.A.N.Z. Antarctic Research Expedition photographs&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-141170265-v3-manifest.json&#34;&gt;nla.obj-141170265-v3-manifest.json&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://uv-v3.netlify.app/#?c=&amp;amp;m=&amp;amp;s=&amp;amp;cv=54&amp;amp;manifest=https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-141170265-v3-manifest.json&#34;&gt;view in UV3&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://projectmirador.org/embed/?iiif-content=https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-141170265-v3-manifest.json&#34;&gt;view in Mirador&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://nla.gov.au/nla.obj-224441684&#34;&gt;The Papers of Sir Edmund Barton&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-224441684-v3-manifest.json&#34;&gt;nla.obj-224441684-v3-manifest.json&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://uv-v3.netlify.app/#?c=&amp;amp;m=&amp;amp;s=&amp;amp;cv=54&amp;amp;manifest=https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-224441684-v3-manifest.json&#34;&gt;view in UV3&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://projectmirador.org/embed/?iiif-content=https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-224441684-v3-manifest.json&#34;&gt;view in Mirador&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://nla.gov.au/nla.obj-224441858&#34;&gt;Papers relating to the Federation Campaign (a single series from the Barton papers)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-224441858-v3-manifest.json&#34;&gt;nla.obj-224441858-v3-manifest.json&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://uv-v3.netlify.app/#?c=&amp;amp;m=&amp;amp;s=&amp;amp;cv=54&amp;amp;manifest=https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-224441858-v3-manifest.json&#34;&gt;view in UV3&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://projectmirador.org/embed/?iiif-content=https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-224441858-v3-manifest.json&#34;&gt;view in Mirador&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&#34;https://nla.gov.au/nla.obj-140670968&#34;&gt;Postcard portraits of actresses, and of Australian towns, 1900s&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-140670968-v3-manifest.json&#34;&gt;nla.obj-140670968-v3-manifest.json&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://uv-v3.netlify.app/#?c=&amp;amp;m=&amp;amp;s=&amp;amp;cv=54&amp;amp;manifest=https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-140670968-v3-manifest.json&#34;&gt;view in UV3&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://projectmirador.org/embed/?iiif-content=https://raw.githubusercontent.com/wragge/iiif-tests/main/nla.obj-140670968-v3-manifest.json&#34;&gt;view in Mirador&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;I&amp;rsquo;m just linking to the standard demonstration versions of Universal Viewer and Mirador, but they can be extended with plugins that add more functionality, such as annotation. As demonstrated above, it&amp;rsquo;s also easy to embed the viewers in your own website.&lt;/p&gt;
&lt;p&gt;Manifests can also be used to move data between programs. &lt;a href=&#34;https://tropy.org/&#34;&gt;Tropy&lt;/a&gt;, for example, is a desktop tool for managing collections of images for research. It can import images and metadata from an IIIF manifest. So if you want to add your own notes and transcriptions to a digitised manuscript collection in Trove, you could save it as an IIIF manifest, then load it in Tropy.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m just about to start documenting some of these possibilities and pathways for the &lt;a href=&#34;https://wragge.github.io/trove-data-guide/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;one-more-thing&#34;&gt;One more thing&lt;/h2&gt;
&lt;p&gt;While I was developing the notebook, I noticed another inconsistency in the way Trove&amp;rsquo;s digitised collections are arranged. This meant that in some cases my notebook to &lt;a href=&#34;https://glam-workbench.net/trove-images/download-image-collection/&#34;&gt;download all the images from a collection&lt;/a&gt; might not get every image. I&amp;rsquo;ve made some changes that should fix this and uploaded a new version of the notebook.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Using Pandora&#39;s collection of archived websites</title>
      <link>https://updates.timsherratt.org/2024/05/07/using-pandoras-collection.html</link>
      <pubDate>Tue, 07 May 2024 17:23:39 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/05/07/using-pandoras-collection.html</guid>
      <description>&lt;p&gt;There&amp;rsquo;s a brand &lt;a href=&#34;https://glam-workbench.net/trove-web-archives/&#34;&gt;new section&lt;/a&gt; of the GLAM Workbench to help you use data from Pandora&amp;rsquo;s collection of archived websites.&lt;/p&gt;
&lt;h2 id=&#34;whats-pandora&#34;&gt;What&amp;rsquo;s Pandora?&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;http://pandora.nla.gov.au/&#34;&gt;Pandora&lt;/a&gt; is an initiative of the National Library of Australia which has been selecting web sites and online resources for preservation since 1996. It&amp;rsquo;s assembled a collection of more than 80,000 archived website titles, organised into subjects and collections. The archived websites are now part of the Australian Web Archive (AWA), which combines the selected titles with broader domain harvests, and is searchable through Trove.&lt;/p&gt;
&lt;h2 id=&#34;why-is-this-needed&#34;&gt;Why is this needed?&lt;/h2&gt;
&lt;p&gt;The GLAM Workbench already has a &lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;Web Archives section&lt;/a&gt; that provides documentation, tools, and examples to help you work with data from a range of web archives, including the Australian Web Archive. But Pandora is unique to the NLA, and its curated collections offer a useful entry point for researchers trying to find web sites relating to particular topics or events.&lt;/p&gt;
&lt;p&gt;Imagine you&amp;rsquo;re a researcher examining Australian election campaigns over the last decade. One thing you might want to do is analyse the language of campaign web sites. But how would you find them? You can search across the full text content of the Australian Web Archive in Trove, but you&amp;rsquo;d need some way of filtering the results to find sites of interest. Or you could just go to the &lt;a href=&#34;http://pandora.nla.gov.au/subject/6&#34;&gt;Elections&lt;/a&gt; category in Pandora.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-05-07-15-06-38.png&#34; width=&#34;600&#34; height=&#34;336&#34; alt=&#34;Screenshot of Pandora&#39;s listing of collections related to election campaigns&#34;&gt;
&lt;p&gt;Full-text search is great for some tasks, but carefully-curated collections with good-quality metadata can save us a lot of time and effort. Unfortunately, the Trove web interface prioritises search over Pandora&amp;rsquo;s collection metadata. If you head to Trove&amp;rsquo;s &amp;lsquo;Categories&amp;rsquo; tab, you&amp;rsquo;ll find a link to &lt;a href=&#34;https://webarchive.nla.gov.au/collection&#34;&gt;Archived Webpage Collections&lt;/a&gt;. This collection hierarchy is basically the same as Pandora&amp;rsquo;s – combining Pandora&amp;rsquo;s subjects, subcategories, and collections into a single hierarchical structure. However, it only includes links to titles that are part of collections. This is important, as less than half of Pandora&amp;rsquo;s selected titles seem to be assigned to collections. Even stranger is the fact that I can&amp;rsquo;t find any link in Trove to the main Pandora site. This means that most researchers using the Australian Web Archive through Trove probably don&amp;rsquo;t even know that Pandora&amp;rsquo;s subject groupings exist!&lt;/p&gt;
&lt;p&gt;For more on Pandora&amp;rsquo;s approach to describing collections see &lt;a href=&#34;https://doi.org/10.48550/arXiv.2209.08649&#34;&gt;Creating Structure in Web Archives With Collections: Different Concepts From Web Archivists&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;whats-in-the-new-section&#34;&gt;What&amp;rsquo;s in the new section?&lt;/h2&gt;
&lt;p&gt;The new &lt;a href=&#34;https://glam-workbench.net/trove-web-archives/&#34;&gt;Trove web archive collections (Pandora)&lt;/a&gt; section of the GLAM Workbench includes three notebooks and three datasets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The main aim is to help researchers assemble datasets of archived website urls based on Pandora&amp;rsquo;s subject groupings.&lt;/strong&gt; So if, as described above, you want a list of websites associated with election campaigns, you can go to the &lt;a href=&#34;https://glam-workbench.net/trove-web-archives/create-datasets/&#34;&gt;Create title datasets from collections and subjects&lt;/a&gt; notebook and generate a CSV file containing 9,304 urls. Easy! The notebook also generates an &lt;a href=&#34;https://www.researchobject.org/ro-crate/&#34;&gt;RO-Crate&lt;/a&gt; metadata file capturing the context in which your dataset was created.&lt;/p&gt;
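&lt;p&gt;Conceptually, generating such a dataset is a filter over the harvested titles followed by a CSV export. A toy sketch (the real datasets link titles to subjects via collection identifiers; here that&amp;rsquo;s flattened to a simple subjects list per title):&lt;/p&gt;

```python
import csv
import io

def titles_for_subject(titles, subject):
    """Select archived titles assigned to a given Pandora subject."""
    return [t for t in titles if subject in t["subjects"]]

def urls_to_csv(titles):
    """Write the selected titles out as a two-column CSV."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "url"])
    for t in titles:
        writer.writerow([t["name"], t["url"]])
    return out.getvalue()

# A toy titles dataset with made-up names and urls.
titles = [
    {"name": "Campaign site A", "url": "http://example.org/a", "subjects": ["Elections"]},
    {"name": "Gardening blog", "url": "http://example.org/b", "subjects": ["Gardening"]},
]
elections = titles_for_subject(titles, "Elections")
csv_data = urls_to_csv(elections)
```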
&lt;p&gt;To make all that possible, I&amp;rsquo;ve harvested Pandora&amp;rsquo;s complete subject hierarchy and list of titles. The code to do this is in these notebooks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-web-archives/harvest-pandora-subject-collections/&#34;&gt;Harvest Pandora subjects and collections&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-web-archives/harvest-pandora-titles/&#34;&gt;Harvest the full collection of Pandora titles&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And pre-harvested datasets of subjects, collections, and titles are here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-web-archives/pandora-collections-data/&#34;&gt;Pandora collections data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-web-archives/pandora-titles-data/&#34;&gt;Pandora titles data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To give an overview of Pandora&amp;rsquo;s subject organisation, I&amp;rsquo;ve also created a &lt;a href=&#34;https://glam-workbench.net/trove-web-archives/pandora-subject-hierarchy/&#34;&gt;single-page view of the complete hierarchy&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;what-do-you-do-with-a-dataset-of-archived-website-urls&#34;&gt;What do you do with a dataset of archived website urls?&lt;/h2&gt;
&lt;p&gt;Once you have your own dataset of archived urls you can make use of the tools in the &lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;Web Archives&lt;/a&gt; section to gather more data for analysis. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/get-all-versions/&#34;&gt;Find all the archived versions of a web page using Timemaps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/display-changes-in-text/&#34;&gt;Display changes in the text of an archived web page over time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/harvesting-text/&#34;&gt;Harvesting collections of text from archived web pages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/create-screenshots-over-time/&#34;&gt;Using screenshots to visualise change in a page over time&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
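&lt;p&gt;The first of those steps, finding all the archived versions of a page, usually means requesting a Memento TimeMap and pulling out its memento entries. A rough parser for the standard link format (RFC 7089), assuming one entry per line; real responses may wrap entries differently, and the archive urls below are made up:&lt;/p&gt;

```python
def parse_timemap(text):
    """Extract (datetime, url) pairs for each 'memento' entry in a
    link-format TimeMap."""
    mementos = []
    for entry in text.splitlines():
        if 'rel="memento"' not in entry:
            continue
        # Each entry looks like: <url>; rel="memento"; datetime="..."
        url = entry.split("<", 1)[1].split(">", 1)[0]
        dt = entry.split('datetime="', 1)[1].split('"', 1)[0]
        mementos.append((dt, url))
    return mementos

# A tiny sample TimeMap, as a Memento endpoint might return it.
sample = '''<http://example.org/>; rel="original",
<http://archive.example/20200101000000/http://example.org/>; rel="memento"; datetime="Wed, 01 Jan 2020 00:00:00 GMT",
<http://archive.example/20210101000000/http://example.org/>; rel="memento"; datetime="Fri, 01 Jan 2021 00:00:00 GMT"'''
versions = parse_timemap(sample)
```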
&lt;p&gt;I&amp;rsquo;m hoping to explore some more of these possibilities in the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;, part of the &lt;a href=&#34;https://ardc.edu.au/project/ardc-community-data-lab/&#34;&gt;ARDC&amp;rsquo;s Community Data Lab&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>How to download all the images from a digitised collection in Trove (&amp; learn some cool Trove tricks)</title>
      <link>https://updates.timsherratt.org/2024/04/24/how-to-download.html</link>
      <pubDate>Thu, 25 Apr 2024 00:35:50 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/04/24/how-to-download.html</guid>
      <description>&lt;p&gt;Digitised resources in Trove are sometimes grouped into collections – an album of photographs, a set of posters, a bundle of letters. I&amp;rsquo;ve just added &lt;a href=&#34;https://glam-workbench.net/trove-images/download-image-collection/&#34;&gt;a notebook&lt;/a&gt; to the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; that downloads all the images in a collection at the highest available resolution.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-04-24-23-29-35.png&#34; width=&#34;600&#34; height=&#34;345&#34; alt=&#34;Screen capture of file browser showing colourful thumbnails of a harvested collection of posters&#34;&gt;
&lt;p&gt;&lt;em&gt;A sample of the 3,048 posters downloaded from &lt;a href=&#34;https://nla.gov.au/nla.obj-2590804313&#34;&gt;nla.obj-2590804313&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;why-is-it-necessary&#34;&gt;Why is it necessary?&lt;/h2&gt;
&lt;p&gt;Trove&amp;rsquo;s digitised collection viewer includes a download option. But in most cases that seems to be limited to downloading 20 images at a time. Part of the reason is probably that the images are zipped up into a single file, which could get very large if 100 or 200 images were included.&lt;/p&gt;
&lt;p&gt;Another limitation of the built-in download option is that the images are often fairly low resolution copies (many have a maximum width of 1000px). The quality of the images limits how you can use them.&lt;/p&gt;
&lt;p&gt;For many research purposes, you&amp;rsquo;ll want a complete collection at the highest resolution possible. Trove makes that difficult. The new notebook makes it easy.&lt;/p&gt;
&lt;h2 id=&#34;how-does-it-work&#34;&gt;How does it work?&lt;/h2&gt;
&lt;p&gt;The Trove API is not much help. The only collections it knows about are articles in newspapers, and issues in periodicals. However, when you browse through a collection using the digitised collection viewer, a little internal API is called that delivers an HTML list of the next 20 items. By stepping through the collection page by page, you can eventually harvest details of all the items in a collection.&lt;/p&gt;
&lt;p&gt;Once you have the &lt;code&gt;nla.obj&lt;/code&gt; identifiers for each image, you can download a high-resolution version simply by adding &lt;code&gt;/image&lt;/code&gt; to the url. Downloaded this way, the images generally seem to have a longest dimension of 5000px – a considerable improvement!&lt;/p&gt;
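&lt;p&gt;Both tricks can be sketched in a few lines of Python. The &lt;code&gt;browse&lt;/code&gt; endpoint and its parameters below are undocumented assumptions, inferred from watching the requests the collection viewer makes – only the &lt;code&gt;/image&lt;/code&gt; trick is straightforward.&lt;/p&gt;

```python
import re
import requests

def image_url(obj_id):
    # Adding /image to an nla.obj url returns a high-resolution copy
    return f"https://nla.gov.au/{obj_id}/image"

def list_collection_items(collection_id):
    """Step through a digitised collection 20 items at a time.

    The 'browse' endpoint and its parameters are assumptions inferred
    from the requests made by Trove's collection viewer.
    """
    items = []
    start = 0
    while True:
        response = requests.get(
            f"https://nla.gov.au/{collection_id}/browse",
            params={"startIdx": start, "rows": 20, "op": "c"},
        )
        # The response is an HTML fragment; pull out the item identifiers
        new_ids = re.findall(r"nla\.obj-\d+", response.text)
        if not new_ids:
            break
        items.extend(new_ids)
        start += 20
    return items
```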
&lt;p&gt;As well as downloading all the images, the notebook also generates an &lt;a href=&#34;https://www.researchobject.org/ro-crate/&#34;&gt;RO-Crate&lt;/a&gt; metadata file that describes the context of your harvest – when it was run, what collection you downloaded, what notebook you used, as well as the details of each image. This&amp;rsquo;ll help you in the future when you&amp;rsquo;ve forgotten where all the images came from!&lt;/p&gt;
&lt;h2 id=&#34;where-can-i-learn-all-these-trove-tricks&#34;&gt;Where can I learn all these Trove tricks?&lt;/h2&gt;
&lt;p&gt;This new notebook came about because I was documenting the method for harvesting collections in the &lt;em&gt;Trove Data Guide&lt;/em&gt;. I realised that I needed to adapt my original code to work with complex collections that included multiple layers of nested sub-collections (like manuscript finding aids). Having done that, I thought it would be useful to create a working example in the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;Trove Data Guide&lt;/em&gt; is being developed as part of the &lt;a href=&#34;https://ardc.edu.au/project/ardc-community-data-lab/&#34;&gt;ARDC&amp;rsquo;s Community Data Lab&lt;/a&gt;. In it I&amp;rsquo;m trying to document as many of these problems and workarounds as I can to open Trove data to new research uses. Here, for example, is the &lt;a href=&#34;https://wragge.github.io/trove-data-guide/other-digitised-resources/how-to/get-collection-items.html&#34;&gt;page that discusses how to get a list of collection items&lt;/a&gt;, and here are &lt;a href=&#34;https://wragge.github.io/trove-data-guide/accessing-data/how-to/download-higher-resolution-images.html&#34;&gt;some suggestions for downloading high-resolution images&lt;/a&gt;. One of my favourite discoveries from recent weeks was the internal API that &lt;a href=&#34;https://wragge.github.io/trove-data-guide/other-digitised-resources/how-to/get-ocr-layout-data.html&#34;&gt;delivers OCR layout information about book and periodical pages&lt;/a&gt; – find out how to save illustrations from books, visualise the layout of a periodical page, and even create some #redactionart poetry.&lt;/p&gt;
&lt;p&gt;The content of the &lt;em&gt;Trove Data Guide&lt;/em&gt; is changing and developing all the time. It&amp;rsquo;s all happening in the open, so feel free to explore the &lt;a href=&#34;https://wragge.github.io/trove-data-guide/home.html&#34;&gt;bleeding-edge development version&lt;/a&gt; or the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;latest published version&lt;/a&gt;. If there&amp;rsquo;s something you&amp;rsquo;d like to see, please post it on the &lt;a href=&#34;https://github.com/wragge/trove-data-guide/discussions/categories/ideas&#34;&gt;GitHub ideas board&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;where-is-the-new-notebook&#34;&gt;Where is the new notebook?&lt;/h2&gt;
&lt;p&gt;The new notebook is part of the &lt;a href=&#34;https://glam-workbench.net/trove-images/&#34;&gt;Trove images&lt;/a&gt; section in the GLAM Workbench. Because I was adding a new notebook, I also took the chance to update the whole repository. Changes include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;now using Trove API v3&lt;/li&gt;
&lt;li&gt;now using Python 3.10&lt;/li&gt;
&lt;li&gt;updated Python packages&lt;/li&gt;
&lt;li&gt;now includes an &lt;code&gt;ro-crate-metadata.json&lt;/code&gt; containing machine-readable metadata describing this repository&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I also updated &lt;a href=&#34;https://glam-workbench.net/trove-images/trove-images-rights-data/&#34;&gt;the datasets&lt;/a&gt; that provide information about the application of licences and rights statements to images by Trove contributors and moved them to &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-images-rights-data&#34;&gt;their own GitHub repository&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>What do you want to do with Trove data?</title>
      <link>https://updates.timsherratt.org/2024/04/18/what-do-you.html</link>
      <pubDate>Thu, 18 Apr 2024 17:22:39 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/04/18/what-do-you.html</guid>
      <description>&lt;p&gt;In my work on the Trove Data Guide I&amp;rsquo;ve started sketching out a series of &lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/index.html&#34;&gt;research pathways&lt;/a&gt;. These are intended as ways of connecting Trove data to tools and questions – providing examples of the steps involved in gathering, preparing, and using data to explore particular research topics.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve currently defined six pathways, roughly based on different types of data that you can get from Trove:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/text.html&#34;&gt;Text&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/images.html&#34;&gt;Images&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/structured-data.html&#34;&gt;Structured data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/geospatial.html&#34;&gt;Maps and places&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/networks.html&#34;&gt;Networks and relationships&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://wragge.github.io/trove-data-guide/pathways/collections.html&#34;&gt;Creating collections&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&amp;lsquo;Creating collections&amp;rsquo; is a bit different, I suppose, as it&amp;rsquo;s meant to relate to the work of assembling research collections &lt;em&gt;from&lt;/em&gt; data in Trove – for example, creating a collection of annotated newspaper articles in Omeka.&lt;/p&gt;
&lt;p&gt;I have some ideas, of course, about the types of tutorials and examples to include in each pathway, but I&amp;rsquo;m wondering what you would like to see. &lt;strong&gt;What would you like to be able to do with Trove data?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You might get some inspiration by browsing through what&amp;rsquo;s already in the &lt;a href=&#34;https://wragge.github.io/trove-data-guide/home.html&#34;&gt;Trove Data Guide&lt;/a&gt; and the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;, or perhaps you have a research question that&amp;rsquo;s foundered because you couldn&amp;rsquo;t get the data you needed out of Trove. If you have any ideas please share them via the &lt;a href=&#34;https://github.com/wragge/trove-data-guide/discussions/categories/ideas&#34;&gt;TDG&amp;rsquo;s ideas board&lt;/a&gt;. This is a chance to get some of your gnarly Trove data problems solved!&lt;/p&gt;
&lt;p&gt;Note that the TDG links in this post go to the &lt;a href=&#34;https://wragge.github.io/trove-data-guide/home.html&#34;&gt;development version&lt;/a&gt;, which changes frequently. There is also a &lt;a href=&#34;http://tdg.glamworkbench.cloud.edu.au/&#34;&gt;published version&lt;/a&gt; that doesn&amp;rsquo;t include the latest content.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Update! Saving Trove newspaper articles and pages as images</title>
      <link>https://updates.timsherratt.org/2024/04/18/update-saving-trove.html</link>
      <pubDate>Thu, 18 Apr 2024 13:32:50 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/04/18/update-saving-trove.html</guid>
      <description>&lt;p&gt;You probably know that when you select the &lt;strong&gt;Download as Image&lt;/strong&gt; option for a digitised newspaper article in Trove what you get back is not actually an image ­– it&amp;rsquo;s an HTML document, in which the original image has been sliced up to try and fit on an A4 page when printed. So &lt;a href=&#34;https://trove.nla.gov.au/newspaper/article/203234114&#34;&gt;this article&lt;/a&gt;:&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/nla.news-article203234114-22060404.jpg&#34; width=&#34;600&#34; height=&#34;766&#34; alt=&#34;Image of newspaper article as it appeared in the published newspaper, with the central column extending well below the main body of the article&#34;&gt;
&lt;p&gt;Ends up looking like this!!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-04-18-11-22-11.png&#34; width=&#34;600&#34; height=&#34;1563&#34; alt=&#34;Image of the results of using Trove&#39;s download article as image option -- the content of the article has been sliced and jumbled making it very difficult to understand how it originally appeared&#34;&gt;
&lt;p&gt;So what do you do when you just want an image of an article as it appeared in the newspaper? Some years ago I figured out a workaround that involves scraping the OCR positional data that&amp;rsquo;s embedded in Trove&amp;rsquo;s newspaper viewer and cropping the article from a high-resolution image of the page. The method is &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/Save-Trove-newspaper-article-as-image/&#34;&gt;documented&lt;/a&gt; in the GLAM Workbench and the &lt;a href=&#34;https://tdg.glam-workbench.net/newspapers-and-gazettes/how-to/get-ocr-coordinates.html&#34;&gt;Trove Data Guide&lt;/a&gt;, and I&amp;rsquo;ve packaged up the code in &lt;a href=&#34;https://wragge.github.io/trove_newspaper_images/&#34;&gt;trove-newspapers-images&lt;/a&gt; so you can embed it in your own projects.&lt;/p&gt;
&lt;p&gt;I also created a web app (using Jupyter and Voilà) to make it as simple as possible for people to download images of articles. Unlike most of the other notebooks in the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; which are spun up on demand, this web app was hosted on a constantly-running server. This made it faster to start and use, but it was relatively expensive, wasteful, and difficult to maintain. So I decided to make a change!&lt;/p&gt;
&lt;p&gt;The new version of the &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/Save-Trove-newspaper-article-as-image-app/&#34;&gt;Save Trove newspaper article as image web app&lt;/a&gt; is actually embedded within the GLAM Workbench. Behind the scenes, the page calls an AWS Lambda function which uses &lt;a href=&#34;https://wragge.github.io/trove_newspaper_images/&#34;&gt;trove-newspapers-images&lt;/a&gt; to generate the image. So far it seems to be working pretty well. &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/Save-Trove-newspaper-article-as-image-app/&#34;&gt;&lt;strong&gt;Try it now!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/Save-Trove-newspaper-article-as-image-app/&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-04-18-12-08-39.png&#34; width=&#34;600&#34; height=&#34;410&#34; alt=&#34;Screen capture of the web app from the GLAM Workbench&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Even better, I&amp;rsquo;ve made some changes to the image generation code to give users the option of masking the articles. The original version crops a rectangle from the page using the article coordinates. If an article extends over multiple columns with different lengths, the image will include content from neighbouring articles. It&amp;rsquo;s not a big problem, but it always annoyed me. Recently I realised that the solution was quite simple – instead of cropping one big box from the page, you can crop each individual OCR &amp;lsquo;zone&amp;rsquo; and paste them into a new empty image with the same dimensions as the original. Once you&amp;rsquo;ve pasted all the zones, you crop the new page image using the article coordinates. Here&amp;rsquo;s an example of &lt;a href=&#34;http://nla.gov.au/nla.news-article255909273&#34;&gt;an article&lt;/a&gt; without masking:&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/cell-9-output-1.jpeg&#34; width=&#34;600&#34; height=&#34;880&#34; alt=&#34;&#34;&gt;
&lt;p&gt;And the same article &lt;em&gt;with&lt;/em&gt; masking:&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/cell-10-output-1.jpeg&#34; width=&#34;600&#34; height=&#34;880&#34; alt=&#34;&#34;&gt;
&lt;p&gt;This enhancement has been pushed to the &lt;a href=&#34;https://wragge.github.io/trove_newspaper_images/&#34;&gt;trove-newspapers-images&lt;/a&gt; package, and is available through the &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/Save-Trove-newspaper-article-as-image-app/&#34;&gt;web app&lt;/a&gt; by simply checking the &amp;lsquo;mask image&amp;rsquo; option.&lt;/p&gt;
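&lt;p&gt;The masking technique is easy to reproduce with Pillow. This is a simplified sketch – the real implementation in &lt;code&gt;trove-newspapers-images&lt;/code&gt; also handles scraping the zone coordinates for you.&lt;/p&gt;

```python
from PIL import Image

def mask_article(page, zones, article_box):
    """Paste each OCR zone onto a blank page of the same size,
    then crop the article's bounding box from the result."""
    masked = Image.new("RGB", page.size, "white")
    for left, top, right, bottom in zones:
        zone = page.crop((left, top, right, bottom))
        masked.paste(zone, (left, top))
    return masked.crop(article_box)
```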
&lt;p&gt;Another frustrating feature of the Trove web interface is that there&amp;rsquo;s no way of saving a newspaper &lt;em&gt;page&lt;/em&gt; as an image, only as a PDF. In this case the workaround is pretty simple: you just need to know the url pattern used to download page images. This is &lt;a href=&#34;https://tdg.glam-workbench.net/newspapers-and-gazettes/data/pages.html#download-a-page-image&#34;&gt;documented&lt;/a&gt; in the &lt;em&gt;Trove Data Guide&lt;/em&gt;. Once again, I&amp;rsquo;ve been providing a web app to make this easy for users, and once again I&amp;rsquo;ve just updated it so that it&amp;rsquo;s embedded within the GLAM Workbench itself. &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/Save-page-image/&#34;&gt;&lt;strong&gt;Try it now!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/Save-page-image/&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-04-18-12-19-01.png&#34; width=&#34;600&#34; height=&#34;390&#34; alt=&#34;&#34;&gt;&lt;/a&gt;&lt;/p&gt;
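&lt;p&gt;For reference, the url pattern itself is simple enough to sketch. The format and the meaning of the zoom levels here are drawn from the &lt;em&gt;Trove Data Guide&lt;/em&gt; page linked above – treat the details as assumptions and check the guide.&lt;/p&gt;

```python
def page_image_url(page_id, level=7):
    """Url for a newspaper page image.

    'level' sets the resolution, from 1 (smallest) to 7 (largest);
    the pattern follows the Trove Data Guide's notes on page images.
    """
    return f"https://trove.nla.gov.au/ndp/imageservice/nla.news-page{page_id}/level{level}"
```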
</description>
    </item>
    
    <item>
      <title>Getting to know NED – born-digital periodicals in Trove</title>
      <link>https://updates.timsherratt.org/2024/04/10/getting-to-know.html</link>
      <pubDate>Thu, 11 Apr 2024 00:05:01 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/04/10/getting-to-know.html</guid>
      <description>&lt;p&gt;I spend a lot of my time trying to highlight the wealth of resources available through Trove – whether that&amp;rsquo;s &lt;a href=&#34;https://updates.timsherratt.org/2024/02/27/new-glam-workbench.html&#34;&gt;25,000 digitised Parliamentary Papers&lt;/a&gt;, &lt;a href=&#34;https://updates.timsherratt.org/2024/01/04/exploring-oral-histories.html&#34;&gt;6,000 oral histories you can listen to online&lt;/a&gt;, or &lt;a href=&#34;https://updates.timsherratt.org/2024/03/19/a-new-way.html&#34;&gt;3,471 full-page editorial cartoons from &lt;em&gt;The Bulletin&lt;/em&gt;&lt;/a&gt;. Most recently I&amp;rsquo;ve been working on &lt;a href=&#34;https://updates.timsherratt.org/2024/03/26/more-tools-and.html&#34;&gt;digitised periodicals&lt;/a&gt;, developing a &lt;a href=&#34;https://wragge.github.io/trove-data-guide/other-digitised-resources/periodicals/overview.html&#34;&gt;new section&lt;/a&gt; for the &lt;em&gt;Trove Data Guide&lt;/em&gt;. But as I was harvesting data about the 900 periodicals and 37,000 issues that had so far been digitised, I wondered about periodicals that were &lt;em&gt;born digital&lt;/em&gt; – in particular, those that had been submitted to the National Library by publishers and authors through the &lt;a href=&#34;https://ned.gov.au/ned/&#34;&gt;National eDeposit Scheme&lt;/a&gt; (NED). It turns out, there&amp;rsquo;s a lot more than I realised.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve added &lt;a href=&#34;https://glam-workbench.net/trove-journals/harvest-ned-periodicals/&#34;&gt;a new notebook&lt;/a&gt; to the Trove Periodicals section of the GLAM Workbench that harvests data about NED periodicals, and created a &lt;a href=&#34;https://glam-workbench.net/trove-journals/trove-ned-periodicals-data/&#34;&gt;new dataset&lt;/a&gt; with lists of titles and issues. You can also &lt;a href=&#34;https://glam-workbench.net/datasette-lite/?url=https://github.com/GLAM-Workbench/trove-ned-periodicals-data/blob/main/ned-periodicals.db&amp;amp;install=datasette-json-html&amp;amp;install=datasette-template-sql&amp;amp;metadata=https://github.com/GLAM-Workbench/trove-ned-periodicals-data/blob/main/metadata.json&#34;&gt;explore the harvested data using Datasette-Lite&lt;/a&gt;. But here&amp;rsquo;s a quick overview.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;There are at least 7,973 born-digital periodicals contributed through NED, comprising a total of 156,151 issues!&lt;/strong&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-04-10-22-56-27.png&#34; width=&#34;600&#34; height=&#34;383&#34; alt=&#34;Screenshot of Trove displaying the Palm Island Voice, one of the NED periodicals&#34;&gt;
&lt;p&gt;&lt;em&gt;One of the 428 issues of the &lt;a href=&#34;https://nla.gov.au/nla.obj-1252246147&#34;&gt;Palm Island Voice&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;What are they? Here are the twenty titles with the most issues.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;title_id&lt;/th&gt;
&lt;th&gt;title&lt;/th&gt;
&lt;th&gt;issues&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-1916881555&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-1916881555&#34;&gt;Western Australian government gazette.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1869&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-2940864261&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-2940864261&#34;&gt;The Australian Jewish News.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1067&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-2692666983&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-2692666983&#34;&gt;APSjobs-vacancies daily &amp;hellip; daily gazette.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1043&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-2945379691&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-2945379691&#34;&gt;Tweed link&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;825&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-2541626239&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-2541626239&#34;&gt;Weekly notice&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;798&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-2940863963&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-2940863963&#34;&gt;The Australian Jewish News.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;726&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-1252109725&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-1252109725&#34;&gt;Queensland Health services bulletin&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;700&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-1247944368&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-1247944368&#34;&gt;Hyden Karlgarin Householder News.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;642&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-1775015332&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-1775015332&#34;&gt;E-record : your news from across the Archdioce&amp;hellip;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;640&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-638303044&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-638303044&#34;&gt;Class ruling&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;580&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-2536144595&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-2536144595&#34;&gt;Plantagenet news.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;574&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-1252305285&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-1252305285&#34;&gt;Clermont rag : Community newspaper.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;514&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-2815835489&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-2815835489&#34;&gt;The Apollo Bay news.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;513&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-1908935587&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-1908935587&#34;&gt;Assessment reports and exam papers&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-3125539859&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-3125539859&#34;&gt;The Peninsula community access news.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;506&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-2859788676&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-2859788676&#34;&gt;Council news : weekly information from us to you&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;469&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-1252119874&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-1252119874&#34;&gt;Rot-Ayr-Ian [electronic resource] : the offici&amp;hellip;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;467&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-2994765231&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-2994765231&#34;&gt;Townsville Orchid Society Inc. bulletin.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;442&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-1252246096&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-1252246096&#34;&gt;Palm Island Voice.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;428&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nla.obj-3267060622&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;http://nla.gov.au/nla.obj-3267060622&#34;&gt;News &amp;amp; views from George Cochrane.&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;399&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These are &lt;em&gt;born-digital&lt;/em&gt;, so they&amp;rsquo;re not images and OCRd text like the digitised periodicals and newspapers. Most of them are PDFs, as we can see from the metadata.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;format&lt;/th&gt;
&lt;th&gt;number of issues&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;application/pdf&lt;/td&gt;
&lt;td&gt;154,976&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;not specified&lt;/td&gt;
&lt;td&gt;1,075&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;application/epub+zip&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Not all NED periodicals can be viewed online. Publishers submitting periodicals through NED can place restrictions on access, specifying that the publications can only be viewed on-site in a library. The three access categories are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Unrestricted&lt;/code&gt; – you can view online and download&lt;/li&gt;
&lt;li&gt;&lt;code&gt;View Only&lt;/code&gt; – you can view online but not download&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Onsite Only&lt;/code&gt; – you can only view when onsite at the designated libraries&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fortunately, the vast majority are &lt;code&gt;Unrestricted&lt;/code&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Access status&lt;/th&gt;
&lt;th&gt;Number of issues&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Unrestricted&lt;/td&gt;
&lt;td&gt;138,557&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;View Only&lt;/td&gt;
&lt;td&gt;12,937&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Onsite Only&lt;/td&gt;
&lt;td&gt;4,657&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
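&lt;p&gt;If you&amp;rsquo;re working with the harvested dataset, filtering issues by access condition is a one-liner in pandas. The column name used here is an assumption – check the dataset&amp;rsquo;s documentation for the actual field names.&lt;/p&gt;

```python
import pandas as pd

def unrestricted(issues):
    """Keep only the issues you can view and download online.

    Assumes a DataFrame with an 'access' column holding the
    access categories described above.
    """
    return issues[issues["access"] == "Unrestricted"]
```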
&lt;p&gt;One of the most important things about the digitised newspaper corpus is its diversity – it&amp;rsquo;s not just the metropolitan dailies, but many local, community, political, and religious newspapers as well. While local newspapers might be dying out in their traditional form, electronic publications are popping up. Look at the titles in the list above – the &lt;em&gt;Apollo Bay News&lt;/em&gt;, the &lt;em&gt;Palm Island Voice&lt;/em&gt; – while current historians mine the digitised newspapers for fragments of everyday life, future historians will be grateful for what&amp;rsquo;s being captured and preserved by NED.&lt;/p&gt;
&lt;p&gt;But wait, there&amp;rsquo;s more! Since 1996, the Australian Web Archive (previously Pandora) has been capturing online periodicals. My next task is to harvest some details of these as well.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>More tools and data for working with Trove&#39;s digitised periodicals</title>
      <link>https://updates.timsherratt.org/2024/03/26/more-tools-and.html</link>
      <pubDate>Tue, 26 Mar 2024 15:35:02 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/03/26/more-tools-and.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/trove-journals/&#34;&gt;Trove Periodicals&lt;/a&gt; section of the GLAM Workbench has been updated! Some changes were necessary to make use of version 3 of the Trove API, but I&amp;rsquo;ve also taken the chance to reorganise things a bit – starting with the name. This section used to be called &amp;lsquo;Trove journals&amp;rsquo;, reflecting the naming of Trove&amp;rsquo;s &amp;lsquo;Journals&amp;rsquo; zone. But zones have gone, and periodicals are now spread across multiple categories, so I thought a name change was necessary to better reflect the type of content being examined.&lt;/p&gt;
&lt;h2 id=&#34;what-periodicals-have-been-digitised&#34;&gt;What periodicals have been digitised?&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s surprisingly difficult to find out what periodicals have actually been digitised in Trove. There&amp;rsquo;s no straightforward list of titles as there is in the newspapers category. Over the years I&amp;rsquo;ve created a variety of lists and tools to try and overcome this. I&amp;rsquo;m now trying to consolidate these efforts into &lt;a href=&#34;https://updates.timsherratt.org/2024/01/30/exploring-troves-digitised.html&#34;&gt;a single dataset which you can explore using Datasette-Lite&lt;/a&gt;. I&amp;rsquo;ve made a few improvements to this in recent weeks, in particular, title records now include a link to download &lt;em&gt;all&lt;/em&gt; the OCRd text from a periodical.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-03-26-15-01-09.png&#34; width=&#34;600&#34; height=&#34;384&#34; alt=&#34;Screen capture of Datasette-Lite interface showing a list of periodical titles.&#34;&gt;
&lt;h2 id=&#34;new-notebooks&#34;&gt;New notebooks&lt;/h2&gt;
&lt;p&gt;The notebook pages in the GLAM Workbench now include previews of the notebook&amp;rsquo;s content. There are a number of new notebooks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/periodicals-from-api/&#34;&gt;Get details of periodicals from the &lt;code&gt;/magazine/titles&lt;/code&gt; API endpoint&lt;/a&gt; – shows how you can get a list of titles from version 3 of the Trove API and explores some of the problems with the data&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/periodicals-enrich-for-datasette/&#34;&gt;Enrich the list of periodicals from the Trove API&lt;/a&gt; – shows how to work around some of the problems with the titles data, adds some extra metadata, and generates the database described above&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/harvest-illustrations-from-periodicals/&#34;&gt;Harvest illustrations from periodicals&lt;/a&gt; – extract illustrations from periodical pages, issues, articles, and searches using OCR layout data&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
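&lt;p&gt;If you want to try the &lt;code&gt;/magazine/titles&lt;/code&gt; endpoint yourself, a request looks something like this. The parameter names and the &lt;code&gt;X-API-KEY&lt;/code&gt; header are my reading of the v3 API – check the official Trove API documentation before relying on them.&lt;/p&gt;

```python
import requests

API_URL = "https://api.trove.nla.gov.au/v3/magazine/titles"

def titles_params(limit=100):
    """Query parameters for one page of periodical titles."""
    return {"encoding": "json", "limit": limit}

def get_titles(api_key, limit=100):
    response = requests.get(
        API_URL,
        params=titles_params(limit),
        headers={"X-API-KEY": api_key},
    )
    response.raise_for_status()
    return response.json()
```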
&lt;p&gt;If you&amp;rsquo;d like an example of the sorts of illustrations you can extract from the digitised periodicals, here&amp;rsquo;s a &lt;a href=&#34;https://www.dropbox.com/scl/fo/60imdoyf4ss2b6vh01q1w/h?rlkey=zuwbjaqnmr7qvkuinovdu5ot0&amp;amp;dl=0&#34;&gt;collection of photos&lt;/a&gt; found by searching for periodical articles with &lt;code&gt;cat&lt;/code&gt; or &lt;code&gt;kitten&lt;/code&gt; in their titles.&lt;/p&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-03-26-15-15-06.png&#34; width=&#34;600&#34; height=&#34;259&#34; alt=&#34;Thumbnails of cat photos extracted from periodicals.&#34;&gt;
&lt;h2 id=&#34;updated-and-reorganised-datasets&#34;&gt;Updated and reorganised datasets&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve moved all the datasets out of the main GitHub repository into their own separate repositories. Some large collections that were previously stored on the sadly-deceased Cloudstor service are now sitting in an Amazon S3 bucket. These include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/periodicals-data-api/&#34;&gt;Details of digitised periodicals from the &lt;code&gt;/magazine/titles&lt;/code&gt; API endpoint&lt;/a&gt; – these are the datasets created by harvesting and enriching titles and issues data from the Trove API&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/csv-digital-journals/&#34;&gt;CSV formatted list of journals available from Trove in digital form&lt;/a&gt; – this is an update of an older dataset of titles created by searching for digitised works with the format &lt;code&gt;Periodical&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/bulletin-cartoons-collection/&#34;&gt;Editorial cartoons from The Bulletin, 1886 to 1952&lt;/a&gt; – the cartoons haven&amp;rsquo;t been updated, but I&amp;rsquo;ve created a new metadata file and fixed up some problems with page numbering&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/ocrd-text-all-journals/&#34;&gt;OCRd text from Trove digitised journals&lt;/a&gt; – I&amp;rsquo;ve reharvested all of the OCRd text and made it available as individual zip files for each title, and one &lt;em&gt;big&lt;/em&gt; zip file with everything!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As previously noted, I&amp;rsquo;ve also &lt;a href=&#34;https://updates.timsherratt.org/2024/03/19/a-new-way.html&#34;&gt;made the Bulletin cartoons available through Datasette-Lite for easy exploration&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-03-19-15-43-41.png&#34; width=&#34;600&#34; height=&#34;459&#34; alt=&#34;Screen capture of Datasette-Lite interface showing some of the Bulletin cartoons.&#34;&gt;
</description>
    </item>
    
    <item>
      <title>A new way to explore editorial cartoons from *The Bulletin*</title>
      <link>https://updates.timsherratt.org/2024/03/19/a-new-way.html</link>
      <pubDate>Tue, 19 Mar 2024 15:46:22 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/03/19/a-new-way.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://updates.timsherratt.org/2019/05/09/over-the-last.html&#34;&gt;About five years ago&lt;/a&gt; I created a collection of full-page editorial cartoons from &lt;em&gt;The Bulletin&lt;/em&gt;, harvested from Trove. Through a process that might be politely described as &amp;lsquo;iterative&amp;rsquo;, I fiddled with an assortment of queries and methods until I had at least one cartoon from every issue published between 4 September 1886 and 17 September 1952 – 3,471 cartoons in total. The &lt;a href=&#34;https://glam-workbench.net/trove-journals/bulletin-cartoons-collection/&#34;&gt;details of the collection&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/trove-journals/finding-editorial-cartoons-in-bulletin/&#34;&gt;how I created it&lt;/a&gt; are available in the &lt;a href=&#34;https://glam-workbench.net/trove-journals/&#34;&gt;Trove periodicals&lt;/a&gt; section of the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;Last night, as I was tidying up a new release of the Trove periodicals repository, I had a thought – why not put all of the details of the cartoons in a little database and make it available using Datasette-Lite for easy exploration? So I did.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/datasette-lite/?url=https://github.com/GLAM-Workbench/bulletin-editorial-cartoons/blob/main/bulletin-editorial-cartoons.db&amp;amp;install=datasette-json-html&amp;amp;metadata=https://raw.githubusercontent.com/GLAM-Workbench/bulletin-editorial-cartoons/main/metadata.json#/bulletin-editorial-cartoons/cartoons&#34;&gt;&lt;strong&gt;Try it now!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-03-19-15-43-41.png&#34; width=&#34;600&#34; height=&#34;459&#34; alt=&#34;screenshot of Datasette interface showing how details of the cartoons are displayed&#34;&gt;
&lt;p&gt;One of the coolest new features is that I&amp;rsquo;ve harvested the OCRd text from each page containing a cartoon and created a full-text index. This means you can find cartoons by searching for words in their captions! Other features include embedded thumbnail images and links to download high-resolution versions of each page image.&lt;/p&gt;
&lt;p&gt;In creating the database, I realised there were a few problems with the original metadata (dodgy page numbers), so I&amp;rsquo;ve fixed that up as well. I&amp;rsquo;ve also moved the mega zip download of every image (over 60GB) from the unfortunately deceased CloudStor service to AWS.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>New GLAM Workbench section for working with government publications in Trove</title>
      <link>https://updates.timsherratt.org/2024/02/27/new-glam-workbench.html</link>
      <pubDate>Tue, 27 Feb 2024 14:34:18 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/02/27/new-glam-workbench.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; has a brand new section aimed at helping you find and use government publications in Trove. Most of the GLAM Workbench&amp;rsquo;s existing sections focus on a particular resource format, or are related to one of Trove&amp;rsquo;s top-level categories. This didn&amp;rsquo;t quite work for government publications, as things like Parliamentary Papers are spread across multiple categories, and can encompass a variety of formats. So I thought a new section was the best way of bringing it all together.&lt;/p&gt;
&lt;p&gt;At the moment the &lt;a href=&#34;https://glam-workbench.net/trove-government/&#34;&gt;Trove Government section&lt;/a&gt; includes two notebooks and three pre-harvested datasets.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-government/harvest-parliament-press-releases/&#34;&gt;Harvest parliament press releases from Trove&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-government/harvest-parliamentary-papers/&#34;&gt;Harvest details of Commonwealth Parliamentary Papers digitised in Trove&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-government/trove-parliamentary-papers-data/&#34;&gt;Digitised Parliamentary Papers in Trove&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-government/trove-parliament-press-releases-refugees/&#34;&gt;Press releases relating to refugees&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-government/trove-parliament-press-releases-covid/&#34;&gt;Press releases relating to COVID&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It took a bit longer than I was originally expecting because I also made some changes in the way I store and display information about GLAM Workbench resources. You might notice, for example, that each of the datasets lives in its own separate GitHub repository, rather than being rolled together with the notebooks into one big repository. This makes it easier to manage and share information about individual datasets, and also trims down the size of the Docker images built from the code repository.&lt;/p&gt;
&lt;p&gt;Each of these data and code repositories has its own machine-readable metadata following the &lt;a href=&#34;https://www.researchobject.org/ro-crate/&#34;&gt;RO-Crate&lt;/a&gt; standard. This continues &lt;a href=&#34;https://updates.timsherratt.org/2023/08/31/some-important-updates.html&#34;&gt;work I&amp;rsquo;ve been doing&lt;/a&gt; with the &lt;a href=&#34;https://ardc.edu.au/project/ardc-community-data-lab/&#34;&gt;ARDC Community Data Lab&lt;/a&gt; to describe GLAM Workbench resources and outputs using RO-Crate. Having this metadata in a standard format creates new possibilities for integration and automation. I&amp;rsquo;m now using the RO-Crate files to produce different, public-facing views of the resources they describe. The README files in each repository and all the GLAM Workbench pages in the Trove Government section are automatically generated from the RO-Crate data. In the latter case, I&amp;rsquo;ve extended my MkDocs setup using macros to pull in the RO-Crate JSON files and make the data available to the page templates. Connecting all the bits up took a lot of time, but I&amp;rsquo;m pretty happy with the result and will eventually extend this approach to the rest of the GLAM Workbench.&lt;/p&gt;
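&lt;p&gt;For the curious, an RO-Crate metadata file is just a JSON-LD document named &lt;code&gt;ro-crate-metadata.json&lt;/code&gt;. Here&amp;rsquo;s a heavily trimmed sketch of what one might contain – the dataset name and CSV file come from this section, but the details are illustrative rather than copied from the actual crates:&lt;/p&gt;

```json
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "about": { "@id": "./" },
      "conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" }
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Digitised Parliamentary Papers in Trove",
      "hasPart": [{ "@id": "trove-parliamentary-papers.csv" }]
    },
    {
      "@id": "trove-parliamentary-papers.csv",
      "@type": "File",
      "encodingFormat": "text/csv"
    }
  ]
}
```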
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-02-27-14-19-39.png&#34; width=&#34;600&#34; height=&#34;496&#34; alt=&#34;Screenshot from the GLAM Workbench showing the new &#39;Preview&#39; feature.&#34;&gt;
&lt;p&gt;I also fiddled a bit with the way Jupyter notebooks are presented in the GLAM Workbench. The Trove Government pages include a notebook preview – basically an HTML rendering of the notebook in an &lt;code&gt;iframe&lt;/code&gt;. This means you can browse the content of the notebook without having to do anything extra, or go anywhere else. In other sections you can view notebook content by following links to GitHub or NBViewer, but the embedded previews seem cleaner and more useful.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-02-27-14-18-25.png&#34; width=&#34;600&#34; height=&#34;242&#34; alt=&#34;Screenshot from GLAM Workbench showing the tabbed &#39;Using this notebook&#39; options.&#34;&gt;
&lt;p&gt;I&amp;rsquo;ve also changed the way options to run the notebook are presented. In the Trove Government section, these options are displayed as tabs beneath the preview – allowing you to choose, for example, between ARDC Binder and the public MyBinder service. In other sections I have a big blue button to launch the notebook using a specific service, with other options listed below. This new approach means I don&amp;rsquo;t have to prioritise one particular service – it&amp;rsquo;s left to the user to choose. It&amp;rsquo;s also expandable. In the future, I&amp;rsquo;m hoping to make some of the GLAM Workbench&amp;rsquo;s notebooks available using JupyterLite. As I do this, I can just add the JupyterLite option under another tab.&lt;/p&gt;
&lt;p&gt;As with some other sections of the GLAM Workbench, the dataset pages are integrated with Datasette-Lite. If there&amp;rsquo;s a CSV file in the dataset, you&amp;rsquo;ll see a button to explore it using Datasette. For example, &lt;a href=&#34;https://glam-workbench.net/datasette-lite/?csv=https://raw.githubusercontent.com/GLAM-Workbench/trove-parliamentary-papers-data/main/trove-parliamentary-papers.csv&amp;amp;fts=title,alternative_title,contributor&amp;amp;drop=work_type,fulltext_url_text,parent,parent_url,children&#34;&gt;this link leads to a searchable database&lt;/a&gt; with details of 24,997 digitised Parliamentary Papers. That same dataset has also been used in the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;, to &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/parliamentary-papers/overview.html&#34;&gt;visualise Trove&amp;rsquo;s holdings of Parliamentary Papers&lt;/a&gt;. Yay for integration and reuse!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Digital history stream at AHA annual conference in July</title>
      <link>https://updates.timsherratt.org/2024/02/16/digital-history-stream.html</link>
      <pubDate>Fri, 16 Feb 2024 11:44:32 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/02/16/digital-history-stream.html</guid>
      <description>&lt;p&gt;This year the annual conference of the Australian Historical Association will include a digital history stream, sponsored by the Australian Research Data Commons (ARDC), and convened by me!&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.flinders.edu.au/content/dam/documents/engage/events/aha-conference/digital-history-stream-aha-2024.pdf&#34;&gt;call for papers is available here&lt;/a&gt; or through the &lt;a href=&#34;https://www.flinders.edu.au/engage/culture/whats-on/aha-conference&#34;&gt;Conference website&lt;/a&gt;. The list of possible topics is deliberately broad and inclusive – if you’re using digital tools or methods in the organisation, analysis, and visualisation of historical data we’d love to hear from you. Proposals are due on &lt;del&gt;23 February&lt;/del&gt; 4 March and can be submitted through the &lt;a href=&#34;https://www.flinders.edu.au/engage/culture/whats-on/aha-conference&#34;&gt;Conference website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.flinders.edu.au/content/dam/documents/engage/events/aha-conference/digital-history-stream-aha-2024.pdf&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-02-16-11-41-34.png&#34; width=&#34;600&#34; height=&#34;594&#34; alt=&#34;Screen capture of the Call for Papers&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We’re particularly keen for HDR and ECR scholars to be involved. To help meet registration and travel costs, the ARDC is funding up to four $1000 bursaries. &lt;a href=&#34;https://flinders-my.sharepoint.com/:b:/g/personal/anto0105_flinders_edu_au/EfSLhqTZn8REqyEQm8VYwx8B1Pll8knu2fJ88DmNoc_f1g?e=VFGXgC&#34;&gt;More details are available here&lt;/a&gt;. Bursary applications close on 31 March.&lt;/p&gt;
&lt;p&gt;There’s also likely to be a digital history workshop, as well as updates on the work of the &lt;a href=&#34;https://ardc.edu.au/hass-and-indigenous-research-data-commons/&#34;&gt;HASS &amp;amp; Indigenous Research Data Commons&lt;/a&gt; and &lt;a href=&#34;https://ardc.edu.au/project/ardc-community-data-lab/&#34;&gt;ARDC Community Data Lab&lt;/a&gt;, including the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Time is short! Get your proposals in now! Contact me at &lt;a href=&#34;mailto:tim@timsherratt.au&#34;&gt;tim@timsherratt.au&lt;/a&gt; if you have any questions.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Some recent presentations on the GLAM Workbench and Trove Data Guide</title>
      <link>https://updates.timsherratt.org/2024/02/13/some-recent-presentations.html</link>
      <pubDate>Tue, 13 Feb 2024 09:11:17 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/02/13/some-recent-presentations.html</guid>
      <description>&lt;p&gt;Last week I attended the ARDC Workshop on Repositories &amp;amp; Workspaces where I gave a &lt;a href=&#34;https://slides.com/wragge/cdl-gw-workspaces&#34;&gt;quick intro to the GLAM Workbench and the Community Data Lab&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-01-31-16-28-37.png&#34; width=&#34;600&#34; height=&#34;442&#34; alt=&#34;Cover slide of my presentation on the mysteries of Trove&#34;&gt;
&lt;p&gt;Then it was off to the ARDC HASS&amp;amp;I Research Data Commons Summer School where I explored some of &lt;a href=&#34;https://slides.com/wragge/hass-tdg&#34;&gt;the mysteries of Trove&lt;/a&gt; in a walk-through of the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Exploring Trove’s digitised periodicals</title>
      <link>https://updates.timsherratt.org/2024/01/30/exploring-troves-digitised.html</link>
      <pubDate>Tue, 30 Jan 2024 22:53:14 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/01/30/exploring-troves-digitised.html</guid>
      <description>&lt;p&gt;While Trove’s digitised newspapers get all the attention, there are many other digitised periodicals to explore. But it’s not easy to find them from the Trove web interface – unlike the newspapers, there’s no list of digitised titles. So to help researchers find and use Trove’s digitised periodicals, I’ve created a searchable database using Datasette-Lite. &lt;a href=&#34;https://glam-workbench.net/datasette-lite/?url=https://github.com/GLAM-Workbench/trove-periodicals-data/blob/main/periodicals.db&amp;amp;install=datasette-json-html&amp;amp;install=datasette-template-sql&amp;amp;metadata=https://github.com/GLAM-Workbench/trove-periodicals-data/blob/main/metadata.json&#34;&gt;&lt;strong&gt;Try it out!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Search for the titles of digitised periodicals.&lt;/em&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-01-29-21-45-34.png&#34; width=&#34;600&#34; height=&#34;405&#34; alt=&#34;Screenshot showing a list of periodical titles in Datasette-Lite.&#34;&gt;
&lt;p&gt;&lt;em&gt;View the details of an individual title (note the link to available issues at the bottom).&lt;/em&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-01-29-21-45-47.png&#34; width=&#34;600&#34; height=&#34;405&#34; alt=&#34;Screenshot showing details of a single periodical title&#34;&gt;
&lt;p&gt;&lt;em&gt;Browse a list of available issues.&lt;/em&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-01-29-21-45-57.png&#34; width=&#34;600&#34; height=&#34;405&#34; alt=&#34;Screenshot showing a list of periodical issues&#34;&gt;
&lt;p&gt;The database currently contains details of 923 different titles, and over 37,000 individual issues. You can search for titles by keyword, then click through to view a full list of issues from a periodical. As well as basic descriptive metadata and links back to Trove, there are a couple of other handy inclusions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Titles include a ‘Search for articles in Trove’ link that opens up the Trove interface and pre-populates the search box with the title’s identifier. By adding some keywords you can search for articles within the publication.&lt;/li&gt;
&lt;li&gt;Issues include a &lt;code&gt;text_download_url&lt;/code&gt; link that downloads all the OCRd text from the issue.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Regular viewers might be thinking – wasn’t there already something like this? Yes indeed, for several years I’ve been maintaining the &lt;a href=&#34;https://trove-titles.herokuapp.com/&#34;&gt;Trove Titles&lt;/a&gt; app, which provides a similar list. I’ve also provided &lt;a href=&#34;https://glam-workbench.net/trove-journals/journals-with-ocr/&#34;&gt;harvests of OCRd text&lt;/a&gt;. So why the new database? First of all, I’ve harvested the data in a different way – making use of the new &lt;code&gt;/magazine/titles&lt;/code&gt; API endpoint. This approach had several problems (see below), but I’m hoping that in the long term it will make updates easier.&lt;/p&gt;
&lt;p&gt;Second, I’m exploring ways to make these sorts of resources available in a more sustainable way. The current Trove Titles app runs on the Heroku platform and there are costs associated with the app and the databases it uses. It just seems a bit silly for a relatively small amount of data. Datasette-Lite takes a very different approach – there’s no constantly running server, just a static site pointing at a dataset. All the magic happens within your browser!&lt;/p&gt;
&lt;p&gt;I’ve written previously about how I’ve been &lt;a href=&#34;https://updates.timsherratt.org/2024/01/12/customising-datasettelite-to.html&#34;&gt;customising Datasette-Lite for use within the GLAM Workbench&lt;/a&gt;, but I had to handle the periodicals data a bit differently. Because there’s a foreign key relationship between the titles and the issues (each issue is linked back to a title), I loaded the harvested data into a SQLite database (using &lt;a href=&#34;https://sqlite-utils.datasette.io/en/stable/index.html&#34;&gt;sqlite-utils&lt;/a&gt;), defined the foreign key, and built a fulltext index on the periodical titles. Then I just saved the whole SQLite database to a GitHub repository and pointed Datasette-Lite at it.&lt;/p&gt;
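&lt;p&gt;If you want to do something similar, the load step is roughly as follows – a minimal sketch using Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module rather than sqlite-utils, with illustrative table and column names:&lt;/p&gt;

```python
import sqlite3

# Two related tables plus a full-text index on the titles.
# (The real dataset is built with sqlite-utils, which wraps these steps.)
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE titles (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE issues (
        id TEXT PRIMARY KEY,
        title_id TEXT REFERENCES titles (id),  -- foreign key back to the title
        date TEXT
    );
    CREATE VIRTUAL TABLE titles_fts USING fts5(title);
    """
)
db.execute("INSERT INTO titles VALUES (?, ?)", ("nla.obj-123", "The Home"))
db.execute(
    "INSERT INTO issues VALUES (?, ?, ?)", ("nla.obj-456", "nla.obj-123", "1920-02-01")
)
# Copy the titles into the full-text index, keeping the rowids aligned
db.execute("INSERT INTO titles_fts (rowid, title) SELECT rowid, title FROM titles")

# Full-text search over the periodical titles
matches = db.execute(
    "SELECT title FROM titles_fts WHERE titles_fts MATCH ?", ("home",)
).fetchall()
```

Saving the database to a file instead of &lt;code&gt;:memory:&lt;/code&gt; gives you something you can commit to a repository and point Datasette-Lite at.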
&lt;p&gt;I had to modify the GLAM Workbench template a bit to insert links back to the title when you view an individual issue. This happens automatically when you view a list of results, but not when you view an individual item. First I used the &lt;code&gt;install&lt;/code&gt; parameter to tell Datasette-Lite to install the &lt;a href=&#34;https://datasette.io/plugins/datasette-template-sql&#34;&gt;datasette-template-sql&lt;/a&gt; plugin. This plugin lets you run SQL queries within a template. Then I could run a query to see if there was a foreign key associated with the current item:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-jinja2&#34; data-lang=&#34;jinja2&#34;&gt;{% set fk = sql(&amp;quot;SELECT * FROM pragma_foreign_key_list(?)&amp;quot;, [table]) %}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If there is a foreign key, I run another query to get the title of the linked record:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-jinja2&#34; data-lang=&#34;jinja2&#34;&gt;{% if fk %}
    {% set flinks = sql(&amp;quot;select title, &amp;quot; + fk.0.to + &amp;quot; from &amp;quot; + fk.0.table + &amp;quot; where id = ?&amp;quot;, [display_rows.0[fk.0.from]]) %}
    {% set ftitle = flinks.0.title %}
{% endif %}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Then when rendering the column containing the linked value I can insert the title and a link:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-jinja2&#34; data-lang=&#34;jinja2&#34;&gt;{% if fk and cell.column == fk.0.from %}
	&amp;lt;a href=&amp;quot;/{{database}}/{{fk.0.table}}/{{cell.value}}&amp;quot;&amp;gt;{{ftitle}}&amp;lt;/a&amp;gt; &amp;lt;em&amp;gt;{{cell.value}}&amp;lt;/em&amp;gt;
{% else %}
	{{ row.display(cell.column) }}
{% endif %}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It seems to work ok, and doesn’t cause problems on databases where there are no foreign keys.&lt;/p&gt;
&lt;p&gt;I’m also using the &lt;a href=&#34;https://datasette.io/plugins/datasette-json-html&#34;&gt;datasette-json-html&lt;/a&gt; plugin to render the thumbnails, and the &lt;code&gt;metadata&lt;/code&gt; parameter to point Datasette-Lite at a custom metadata file – this was primarily to define a custom sort order for the tables.&lt;/p&gt;
&lt;h2 id=&#34;the-data&#34;&gt;The data&lt;/h2&gt;
&lt;p&gt;I’ll write up more about the data and the harvesting process in coming weeks. There’ll also be a new section in the &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt; and some updated notebooks in the &lt;a href=&#34;https://glam-workbench.net/trove-journals/&#34;&gt;journals section of the GLAM Workbench&lt;/a&gt;. But a few notes about the &lt;code&gt;/magazine/titles&lt;/code&gt; API endpoint:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;there are a few hundred duplicate records – I’ve removed these from the dataset&lt;/li&gt;
&lt;li&gt;the API doesn’t provide full information about issues, in particular undated issues are not returned – I’ve tried to fill these gaps&lt;/li&gt;
&lt;li&gt;the data includes a thousand or more Parliamentary Papers – I’ve harvested these separately and thought it was best to exclude them from this dataset&lt;/li&gt;
&lt;li&gt;some titles are really nested collections, so their ‘issues’ are another level of title, while conversely some titles are really issues – I’ve tried to sort as much of this out as I can, but it gets confusing!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So I’m not confident that I’ve got everything, but I think it’s a useful start. I’ve reported the API problems to Trove but haven’t heard anything back yet.&lt;/p&gt;
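&lt;p&gt;For anyone wanting to experiment with the endpoint themselves, a harvesting step might start something like this – a rough sketch only, assuming the v3 API base URL and &lt;code&gt;X-API-KEY&lt;/code&gt; header, with the response field names as guesses rather than gospel:&lt;/p&gt;

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.trove.nla.gov.au/v3"  # assumed v3 API base URL


def dedupe_titles(titles):
    """Drop duplicate title records, keeping the first occurrence of each id."""
    seen = set()
    unique = []
    for title in titles:
        if title["id"] not in seen:
            seen.add(title["id"])
            unique.append(title)
    return unique


def harvest_titles(api_key, limit=100):
    """Fetch one page of results from /magazine/titles and remove duplicates."""
    params = urllib.parse.urlencode({"encoding": "json", "limit": limit})
    request = urllib.request.Request(
        f"{API_BASE}/magazine/titles?{params}",
        headers={"X-API-KEY": api_key},  # assumed authentication header
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return dedupe_titles(data.get("magazine", []))
```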
</description>
    </item>
    
    <item>
      <title>The Trove Newspaper Data Dashboard now has an archive!</title>
      <link>https://updates.timsherratt.org/2024/01/15/the-trove-newspaper.html</link>
      <pubDate>Mon, 15 Jan 2024 10:51:58 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/01/15/the-trove-newspaper.html</guid>
      <description>&lt;p&gt;Since July 2022 I’ve been generating weekly snapshots of the contents of the Trove newspaper corpus. Every Sunday a new version of the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;Trove Newspaper Data Dashboard&lt;/a&gt; is created, highlighting what’s changed over the previous week, and visualising trends since April 2022 (when I first started &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/&#34;&gt;regular data harvests&lt;/a&gt;).&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-01-15-10-42-09.png&#34; width=&#34;600&#34; height=&#34;378&#34; alt=&#34;Screenshot of part of the Trove Newspaper Data Dashboard, showing trends in number of articles, corrections, comments, and tags since April 2022.&#34;&gt;
&lt;p&gt;All of the past versions of the dashboard are preserved in GitHub, but there wasn’t an easy way to browse them, until now. If you want to find out what changed in any week since July 2022, you can now visit the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/archive/index.html&#34;&gt;Trove Data Dashboard Archive&lt;/a&gt; and select a date from the list!&lt;/p&gt;
&lt;p&gt;I created the archive by pulling all the versions from GitHub and saving them as individual files. I’ve also added some code to the weekly process that should automatically archive the past week from now on – we’ll see if it works next Sunday…&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Customising Datasette-Lite to explore datasets in the GLAM Workbench</title>
      <link>https://updates.timsherratt.org/2024/01/12/customising-datasettelite-to.html</link>
      <pubDate>Fri, 12 Jan 2024 16:09:41 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/01/12/customising-datasettelite-to.html</guid>
      <description>&lt;p&gt;As well as tools and code, the GLAM Workbench includes a number of pre-harvested datasets for researchers to play with. But just including a link to a CSV file in GitHub or Zenodo isn’t very useful – it doesn’t help researchers understand &lt;em&gt;what’s in&lt;/em&gt; the dataset, and &lt;em&gt;why&lt;/em&gt; it might be useful. That’s why I’ve also started including links that open the CSV files in &lt;a href=&#34;https://github.com/simonw/datasette-lite&#34;&gt;Datasette-Lite&lt;/a&gt;, enabling the contents to be searched, filtered, and faceted. Just look for the &lt;strong&gt;Explore in Datasette&lt;/strong&gt; buttons!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/6bfdff462d.png&#34; width=&#34;600&#34; height=&#34;193&#34; alt=&#34;Example of a Explore in Datasette button&#34;&gt;
&lt;p&gt;&lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt; is an excellent tool for sharing and exploring data. I’ve used it in a number of projects such as the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt; and the &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/tasmanian-post-office-directories/&#34;&gt;Tasmanian Post Office Directories&lt;/a&gt;. &lt;a href=&#34;https://github.com/simonw/datasette-lite&#34;&gt;Datasette-Lite&lt;/a&gt; is a version of Datasette that runs completely in the user’s web browser – no need for separate servers! All you do is point a Datasette-Lite GitHub repository at a publicly available CSV file, and it builds a searchable database in your browser. So instead of having to configure and maintain a series of servers running Datasette, I just have one static GitHub repository that only springs into action when needed.&lt;/p&gt;
&lt;p&gt;For example, &lt;a href=&#34;https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/trove-oral-histories-data/blob/main/trove-oral-histories.csv&amp;amp;fts=title,contributor,is_part_of&amp;amp;drop=publisher,work_type,fulltext_url_text&#34;&gt;click this link&lt;/a&gt; to explore metadata describing oral history collections in Trove using Datasette-Lite.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/trove-oral-histories-data/blob/main/trove-oral-histories.csv&amp;amp;fts=title,contributor,is_part_of&amp;amp;drop=publisher,work_type,fulltext_url_text&#34;&gt;&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-01-12-16-02-56.png&#34; width=&#34;600&#34; height=&#34;425&#34; alt=&#34;Screenshot of oral histories metadata in Datasette&#34;&gt;&lt;/a&gt;&lt;/p&gt;
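&lt;p&gt;The query string does all the work in links like this. Here’s a hypothetical helper showing how they’re assembled – the &lt;code&gt;csv&lt;/code&gt;, &lt;code&gt;fts&lt;/code&gt;, and &lt;code&gt;drop&lt;/code&gt; parameter names come from Datasette-Lite itself, but the function is just for illustration:&lt;/p&gt;

```python
from urllib.parse import urlencode


def datasette_lite_url(base, csv_url, fts_columns=(), drop_columns=()):
    """Build an 'Explore in Datasette' link.

    csv points at the data file, fts names columns to full-text index,
    and drop lists columns to leave out of the database.
    """
    params = [("csv", csv_url)]
    if fts_columns:
        params.append(("fts", ",".join(fts_columns)))
    if drop_columns:
        params.append(("drop", ",".join(drop_columns)))
    return f"{base}?{urlencode(params)}"


url = datasette_lite_url(
    "https://glam-workbench.net/datasette-lite/",
    "https://raw.githubusercontent.com/GLAM-Workbench/trove-oral-histories-data/main/trove-oral-histories.csv",
    fts_columns=("title", "contributor", "is_part_of"),
    drop_columns=("publisher", "work_type"),
)
```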
&lt;p&gt;I’ve made a few changes to the standard Datasette-Lite application for use with the GLAM Workbench. These are all included in the &lt;a href=&#34;https://github.com/GLAM-Workbench/datasette-lite&#34;&gt;GLAM Workbench’s Datasette-Lite fork&lt;/a&gt; and described below.&lt;/p&gt;
&lt;h2 id=&#34;custom-theme&#34;&gt;Custom theme&lt;/h2&gt;
&lt;p&gt;I’d already created a custom Datasette theme for use in other projects. The question was how do I get it to work with Datasette-Lite? Just putting a &lt;code&gt;templates&lt;/code&gt; folder in the repository wasn’t enough, as the virtual environment created within the browser doesn’t have direct access to all the files. I eventually figured out that I could zip up the templates folder, fetch the zip file using Javascript, and then unzip the folder into the browser’s virtual environment. This is the code in &lt;code&gt;webworker.js&lt;/code&gt; that does all that:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-javascript&#34; data-lang=&#34;javascript&#34;&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;let&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;templateResponse&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;await&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;fetch&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;templates.zip&amp;#34;&lt;/span&gt;);
&lt;span style=&#34;color:#66d9ef&#34;&gt;let&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;templateBinary&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;await&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;templateResponse&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;arrayBuffer&lt;/span&gt;();
&lt;span style=&#34;color:#a6e22e&#34;&gt;pyodide&lt;/span&gt;.&lt;span style=&#34;color:#a6e22e&#34;&gt;unpackArchive&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;templateBinary&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;zip&amp;#34;&lt;/span&gt;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then it’s just a matter of changing the Datasette initialisation to point to the &lt;code&gt;templates&lt;/code&gt; directory:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;ds &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; Datasette(
    names, 
    settings&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;{
    	&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;num_sql_threads&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;, 
    	&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;truncate_cells_html&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;100&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;# truncate cells&lt;/span&gt;
    }, 
    metadata&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;metadata, 
    template_dir&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;templates&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#75715e&#34;&gt;# point to custom templates directory&lt;/span&gt;
    plugins_dir&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;plugins&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#75715e&#34;&gt;# point to custom plugins directory&lt;/span&gt;
    memory&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;$&lt;/span&gt;{settings&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;memory &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;?&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;True&amp;#39;&lt;/span&gt; : &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;False&amp;#39;&lt;/span&gt;}
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As you can see, I’ve also added the &lt;code&gt;&amp;quot;truncate_cells_html&amp;quot;: 100&lt;/code&gt; setting to truncate the contents of cells in the table view.&lt;/p&gt;
&lt;h2 id=&#34;custom-plugins&#34;&gt;Custom plugins&lt;/h2&gt;
&lt;p&gt;Sometimes fields can contain multiple urls. While Datasette will make single urls clickable, multiple urls are just left as plain text. The &lt;a href=&#34;https://datasette.io/plugins/datasette-multiline-links&#34;&gt;datasette-multiline-links&lt;/a&gt; plugin fixes this for urls separated by line breaks, but I generally separate multiple values in CSV fields using the &lt;code&gt;|&lt;/code&gt; character. It wasn’t hard to &lt;a href=&#34;https://github.com/GLAM-Workbench/datasette-lite/blob/main/plugins/datasette-multi-links.py&#34;&gt;modify the plugin&lt;/a&gt;, but again it wasn’t clear how to make the modified plugin work with Datasette-Lite. You can use the &lt;code&gt;install&lt;/code&gt; parameter to load plugins, but the plugins have to either be published in PyPI or available in GitHub as a Python wheel. That all seemed like overkill for my tiny plugin modification, but then I realised that I could use the same method as I was using for the custom template – zip, fetch, unzip, then point Datasette to the new plugins directory.&lt;/p&gt;
&lt;p&gt;It also took me a while to figure out how to get the plugin to work nicely with the &lt;code&gt;truncate_cells_html&lt;/code&gt; setting. Unless a cell-formatting plugin returns &lt;code&gt;None&lt;/code&gt;, other cell format operations, such as truncation, aren’t applied. So I had to make sure that the plugin returned &lt;code&gt;None&lt;/code&gt; if there were no urls in a cell.&lt;/p&gt;
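The return-None contract can be sketched without any Datasette machinery. This is a simplified, dependency-free stand-in for the logic described above, not the plugin's actual code — a real Datasette plugin wraps a function like this in the hookimpl decorator and returns markup containing anchor tags.

```python
def render_multi_links(value):
    """Return rendered output for pipe-separated urls, or None."""
    # Hypothetical sketch of the render_cell contract: names are
    # illustrative, not the actual datasette-multi-links code.
    if not isinstance(value, str):
        return None  # let Datasette handle non-string cells
    parts = [p.strip() for p in value.split("|")]
    urls = [p for p in parts if p.startswith("http")]
    if not urls:
        # Returning None is the crucial part: it tells Datasette the
        # plugin didn't handle the cell, so later formatting such as
        # truncate_cells_html is still applied.
        return None
    # The real plugin turns each url into an anchor tag; this sketch
    # just joins them to stay dependency-free.
    return "\n".join(urls)
```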
&lt;h2 id=&#34;custom-metadata&#34;&gt;Custom metadata&lt;/h2&gt;
&lt;p&gt;You can use the &lt;code&gt;metadata&lt;/code&gt; parameter in Datasette-Lite to point to a metadata file in either JSON or YAML. I’ve added a custom &lt;code&gt;metadata.json&lt;/code&gt; file to the GLAM Workbench repository, and adjusted the &lt;code&gt;webworker.js&lt;/code&gt; code to load it by default.&lt;/p&gt;
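For context, a Datasette metadata file looks something like this — the keys (title, description, license, databases) are standard Datasette metadata keys, but the values here are invented, not the contents of the GLAM Workbench file:

```json
{
  "title": "GLAM Workbench datasets",
  "description": "Pre-harvested GLAM datasets, explored with Datasette-Lite",
  "license": "CC BY 4.0",
  "databases": {
    "data": {
      "tables": {}
    }
  }
}
```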
&lt;h2 id=&#34;full-text-indexing&#34;&gt;Full text indexing&lt;/h2&gt;
&lt;p&gt;One really cool thing about Datasette is the ability to run full text searches across specified columns. If Datasette detects a full text index, it automatically adds a keyword search box.&lt;/p&gt;
&lt;p&gt;There wasn’t a way of adding full text indexes to CSV datasets in Datasette-Lite, so I added a new &lt;code&gt;fts&lt;/code&gt; url parameter and used the value in &lt;code&gt;webworker.js&lt;/code&gt; to modify the database using &lt;a href=&#34;https://sqlite-utils.datasette.io/en/stable/&#34;&gt;SQLite-utils&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;fts_cols &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;$&lt;/span&gt;{JSON&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;stringify(settings&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;ftsCols &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;)} 
&lt;span style=&#34;color:#66d9ef&#34;&gt;try&lt;/span&gt;:
    db[bit]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;enable_fts(fts_cols&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;,&amp;#34;&lt;/span&gt;)) &lt;span style=&#34;color:#75715e&#34;&gt;# add full text indexes to columns&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;except&lt;/span&gt; sqlite3&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;OperationalError:
    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Column not found&amp;#34;&lt;/span&gt;)
    &lt;span style=&#34;color:#66d9ef&#34;&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For example, adding &lt;code&gt;fts=title&lt;/code&gt; to a Datasette-Lite url will automatically add a full text index to the title column. You can also index multiple columns – just separate the column names with commas.&lt;/p&gt;
&lt;p&gt;This url opens a CSV dataset with oral history metadata harvested from Trove and indexes the &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;contributor&lt;/code&gt; columns: &lt;a href=&#34;https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/trove-oral-histories-data/blob/main/trove-oral-histories.csv&amp;amp;fts=title,contributor&#34;&gt;https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/trove-oral-histories-data/blob/main/trove-oral-histories.csv&amp;amp;fts=title,contributor&lt;/a&gt;&lt;/p&gt;
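Under the hood, what SQLite-utils' enable_fts() provides is an FTS5 virtual table mirroring the indexed columns. A rough stdlib-only sketch (table and column contents invented; assumes your SQLite build includes FTS5, which is true of most recent Python distributions):

```python
import sqlite3

# Create an in-memory database with an FTS5 index on two columns,
# roughly what enable_fts() sets up for the indexed table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE items_fts USING fts5(title, contributor)")
conn.executemany(
    "INSERT INTO items_fts VALUES (?, ?)",
    [
        ("Interview with a shearer", "National Library"),
        ("Wartime memories", "Oral History Project"),
    ],
)

# A plain keyword search, like the one behind Datasette's search box.
rows = conn.execute(
    "SELECT title FROM items_fts WHERE items_fts MATCH ?", ("shearer",)
).fetchall()

# Advanced FTS5 syntax, such as this prefix query, is what setting
# searchmode to 'raw' exposes in the Datasette interface.
prefixed = conn.execute(
    "SELECT title FROM items_fts WHERE items_fts MATCH ?", ("war*",)
).fetchall()
```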
&lt;p&gt;Datasette converts your CSV file to a SQLite database, and SQLite supports a number of &lt;a href=&#34;https://www.sqlite.org/fts5.html#full_text_query_syntax&#34;&gt;advanced search options&lt;/a&gt;. These options aren’t enabled by default in Datasette – you need to set &lt;code&gt;searchmode&lt;/code&gt; to &lt;code&gt;raw&lt;/code&gt; in the table metadata. To enable advanced searches, I’ve added a line in &lt;code&gt;webworker.js&lt;/code&gt; to modify the default metadata:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;metadata[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;databases&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;data&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;tables&amp;#34;&lt;/span&gt;][bit] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; {&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;searchmode&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;raw&amp;#34;&lt;/span&gt;}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;drop-unwanted-columns&#34;&gt;Drop unwanted columns&lt;/h2&gt;
&lt;p&gt;Not all the columns in pre-harvested datasets are useful or interesting. To remove selected columns from Datasette-Lite, I added a &lt;code&gt;drop&lt;/code&gt; url parameter. Once again, you can submit multiple values separated by commas.&lt;/p&gt;
&lt;p&gt;In &lt;code&gt;webworker.js&lt;/code&gt; the &lt;code&gt;drop&lt;/code&gt; values are used with the SQLite-utils &lt;code&gt;transform()&lt;/code&gt; function to remove the columns from the database.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;drop_cols &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;$&lt;/span&gt;{JSON&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;stringify(settings&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;dropCols &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;)}
db[bit]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;transform(drop&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;set(drop_cols&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;split(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;,&amp;#34;&lt;/span&gt;)))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This url opens a CSV dataset with oral history metadata harvested from Trove and drops the &lt;code&gt;publisher&lt;/code&gt; and &lt;code&gt;work_type&lt;/code&gt; columns: &lt;a href=&#34;https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/trove-oral-histories-data/blob/main/trove-oral-histories.csv&amp;amp;drop=publisher,work_type&#34;&gt;https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/trove-oral-histories-data/blob/main/trove-oral-histories.csv&amp;amp;drop=publisher,work_type&lt;/a&gt;&lt;/p&gt;
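Because SQLite historically couldn't drop columns with ALTER TABLE, transform() works by rebuilding the table: create a new table without the dropped columns, copy the data across, then swap the names. A rough stdlib-only sketch of that pattern (table and column names invented for illustration):

```python
import sqlite3

# Set up a toy table with a column we want to drop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (title TEXT, publisher TEXT, work_type TEXT)")
conn.execute("INSERT INTO data VALUES ('A history', 'NLA', 'book')")

# Rebuild the table keeping only the surviving columns -- roughly
# what sqlite-utils' transform(drop=...) does behind the scenes.
keep = ["title"]
cols = ", ".join(keep)
conn.execute(f"CREATE TABLE data_new ({cols})")
conn.execute(f"INSERT INTO data_new SELECT {cols} FROM data")
conn.execute("DROP TABLE data")
conn.execute("ALTER TABLE data_new RENAME TO data")

# PRAGMA table_info confirms only the kept column remains.
remaining = [r[1] for r in conn.execute("PRAGMA table_info(data)")]
```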
</description>
    </item>
    
    <item>
      <title>What’s going on?</title>
      <link>https://updates.timsherratt.org/2024/01/04/whats-going-on.html</link>
      <pubDate>Thu, 04 Jan 2024 18:21:41 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/01/04/whats-going-on.html</guid>
      <description>&lt;p&gt;The hardest part of developing tools and resources like the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; is getting information about them to the people who might benefit. The collapse of Twitter has only added to the difficulty, as has the reluctance of GLAM organisations to share new resources with their users. I’d rather spend my time making new tools, but what’s the point if no-one knows they exist?&lt;/p&gt;
&lt;p&gt;Anyway, I thought I’d do a bit of a communications refresh for the new year. If you’re interested in GLAM Workbench and &lt;a href=&#34;https://tdg.glam-workbench.net/&#34;&gt;Trove Data Guide&lt;/a&gt; updates you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep an eye on the &lt;a href=&#34;https://updates.timsherratt.org/categories/glamworkbench/&#34;&gt;GLAM Workbench&lt;/a&gt; channel of my microblog (or add &lt;a href=&#34;https://updates.timsherratt.org/categories/glamworkbench/feed.xml&#34;&gt;the feed&lt;/a&gt; to your RSS reader)&lt;/li&gt;
&lt;li&gt;Follow the &lt;a href=&#34;https://www.facebook.com/glamworkbench&#34;&gt;GLAM Workbench Facebook page&lt;/a&gt; for cross-posted updates from the RSS feed&lt;/li&gt;
&lt;li&gt;Follow the &lt;a href=&#34;https://www.linkedin.com/company/glam-workbench/&#34;&gt;GLAM Workbench LinkedIn page&lt;/a&gt; for cross-posted updates from the RSS feed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;m also working on an email newsletter thing that&amp;rsquo;ll compile the updates at regular intervals.&lt;/p&gt;
&lt;p&gt;For more social socials, as well as questions, requests and problems, you can always find me on Mastodon: &lt;a href=&#34;https://hcommons.social/@wragge&#34;&gt;@wragge@hcommons.social&lt;/a&gt;. My email address is not too hard to find, but, honestly, your chances of getting a reply are slim.&lt;/p&gt;
&lt;p&gt;If you’ve got a bug report, or a suggestion for a new notebook or data source, feel free to &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench.github.io/issues&#34;&gt;create an issue on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And, of course, everything I do is openly licensed, so you are very welcome to modify and share! See the GLAM Workbench for &lt;a href=&#34;https://glam-workbench.net/get-involved/&#34;&gt;other ways to get involved&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/gw-hex.png&#34; width=&#34;250&#34; height=&#34;289&#34; alt=&#34;GLAM Workbench Logo&#34;&gt;
</description>
    </item>
    
    <item>
      <title>Exploring oral histories in Trove</title>
      <link>https://updates.timsherratt.org/2024/01/04/exploring-oral-histories.html</link>
      <pubDate>Thu, 04 Jan 2024 12:20:30 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/01/04/exploring-oral-histories.html</guid>
      <description>&lt;p&gt;The National Library of Australia holds over &lt;a href=&#34;https://www.nla.gov.au/collections/what-we-collect/oral-history-and-folklore&#34;&gt;55,000 hours of oral history and folklore recordings&lt;/a&gt; dating back to the 1950s. This collection is being made available online, and many recordings can now be listened to using &lt;a href=&#34;https://trove.nla.gov.au/help/navigating/audio-player&#34;&gt;Trove’s audio player&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;However, the oral history collection is not easy to find in Trove. You need to go to the ‘Music, Audio, &amp;amp; Video’ category and check the ‘Sound/Interview, lecture, talk’ format facet. To limit results to oral histories that have been digitised, you can &lt;a href=&#34;https://trove.nla.gov.au/search/category/music?keyword=%22nla.obj%22&amp;amp;l-format=Sound%2FInterview,%20lecture,%20talk&amp;amp;l-availability=y&#34;&gt;add &lt;code&gt;“nla.obj”&lt;/code&gt; to your query and set the ‘Access’ facet to ‘Online’&lt;/a&gt;. But what’s actually &lt;em&gt;in&lt;/em&gt; the oral history collection and what can you do with it?&lt;/p&gt;
&lt;p&gt;To help researchers explore and analyse the NLA’s oral history collection, I’ve added some notebooks to the &lt;a href=&#34;https://glam-workbench.net/trove-music/&#34;&gt;Music, sound, and oral histories&lt;/a&gt; section of the GLAM Workbench:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/harvest-oral-histories/&#34;&gt;Harvest oral histories metadata&lt;/a&gt; – harvests metadata describing the NLA&amp;rsquo;s oral history collection from Trove and saves the results as a CSV file&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/save-series/&#34;&gt;Save a list of oral history collections and projects&lt;/a&gt; – extracts a list of series from metadata describing oral histories held by the NLA and described in Trove&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/download-transcripts/&#34;&gt;Download summaries and transcripts from oral histories&lt;/a&gt; – download all the available transcripts and summaries from digitised oral histories available in Trove&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are also a couple of associated datasets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/trove-oral-histories/&#34;&gt;NLA oral histories metadata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/trove-oral-history-series/&#34;&gt;List of NLA oral history collections and projects&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;em&gt;Trove Data Guide&lt;/em&gt; uses these datasets &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/oral-histories/overview.html&#34;&gt;to create an overview of the collection&lt;/a&gt;. For example, here’s how the oral histories are distributed over time.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/visualization42.png&#34; width=&#34;600&#34; height=&#34;276&#34; alt=&#34;Chart showing the number of oral histories per year and online status&#34;&gt;
&lt;p&gt;And here’s the top ten subjects of digitised oral histories.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;subject&lt;/th&gt;
&lt;th&gt;count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Painters &amp;ndash; Australia &amp;ndash; Interviews&lt;/td&gt;
&lt;td&gt;193&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Politicians &amp;ndash; Australia&lt;/td&gt;
&lt;td&gt;192&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prime ministers &amp;ndash; Australia &amp;ndash; Quotations&lt;/td&gt;
&lt;td&gt;188&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Older people &amp;ndash; New South Wales &amp;ndash; Biography&lt;/td&gt;
&lt;td&gt;187&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Menzies, Robert, Sir, 1894-1978. Speeches&lt;/td&gt;
&lt;td&gt;185&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Federal politicians&lt;/td&gt;
&lt;td&gt;184&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Politicians &amp;ndash; Australia &amp;ndash; Quotations&lt;/td&gt;
&lt;td&gt;183&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Australia &amp;ndash; Politics and government &amp;ndash; 1945-1965&lt;/td&gt;
&lt;td&gt;172&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Politicians &amp;ndash; Australia &amp;ndash; Interviews&lt;/td&gt;
&lt;td&gt;171&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Academics&lt;/td&gt;
&lt;td&gt;126&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The Trove Data Guide also includes information on &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/oral-histories/accessing-data.html&#34;&gt;the types of data from the oral histories and how you can access it&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Mapping MARC Geographic Area codes to Wikidata</title>
      <link>https://updates.timsherratt.org/2024/01/03/mapping-marc-geographic.html</link>
      <pubDate>Wed, 03 Jan 2024 18:03:48 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/01/03/mapping-marc-geographic.html</guid>
      <description>&lt;p&gt;Trove uses codes from the &lt;a href=&#34;https://www.loc.gov/marc/geoareas/gacs_code.html&#34;&gt;MARC Geographic Areas list&lt;/a&gt; to identify locations in metadata records. I couldn&amp;rsquo;t find any mappings of these codes to other sources of geospatial information, so I fired up &lt;a href=&#34;https://openrefine.org/&#34;&gt;OpenRefine&lt;/a&gt; and reconciled the geographic area names against &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:Main_Page&#34;&gt;Wikidata&lt;/a&gt;. Once I&amp;rsquo;d linked as many as possible, I copied additional information from Wikidata, such as &lt;a href=&#34;https://en.wikipedia.org/wiki/List_of_ISO_3166_country_code&#34;&gt;ISO country codes&lt;/a&gt;, &lt;a href=&#34;https://www.geonames.org/&#34;&gt;GeoNames&lt;/a&gt; identifiers, and geographic coordinates.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve saved the resulting dataset in two formats – as a flattened CSV file (handy for loading as a dataframe), and as a JSON file that uses the geographic area codes as keys (handy for looking up values). You can &lt;a href=&#34;https://github.com/GLAM-Workbench/marc-geographicareas&#34;&gt;download the datasets from this GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve also written the codes back into the Wikidata records, so you can now find them with a &lt;a href=&#34;https://w.wiki/8iKM&#34;&gt;SPARQL query like this&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The columns in the CSV file are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;code&lt;/code&gt; – MARC geographic areas code (without any trailing dashes)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;place&lt;/code&gt; – name of geographic area from the MARC list&lt;/li&gt;
&lt;li&gt;&lt;code&gt;wikidata_label&lt;/code&gt; – name of geographic area from Wikidata&lt;/li&gt;
&lt;li&gt;&lt;code&gt;wikidata_id&lt;/code&gt; – Wikidata identifier&lt;/li&gt;
&lt;li&gt;&lt;code&gt;coordinates&lt;/code&gt; – pair of decimal coordinates in the form &lt;code&gt;latitude,longitude&lt;/code&gt; (multiple values are pipe &lt;code&gt;|&lt;/code&gt; separated)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iso_country_code&lt;/code&gt; – ISO two letter country code (multiple values are pipe &lt;code&gt;|&lt;/code&gt; separated)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iso_numeric_country_code&lt;/code&gt; – ISO numeric country code (multiple values are pipe &lt;code&gt;|&lt;/code&gt; separated)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;geonames_id&lt;/code&gt; – GeoNames identifier (multiple values are pipe &lt;code&gt;|&lt;/code&gt; separated)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that some fields can contain multiple values. For example the area &lt;code&gt;Mediterranean Region&lt;/code&gt; is linked to 22 countries, so there will be multiple values in the ISO code fields.&lt;/p&gt;
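If you're loading the CSV yourself, the pipe-separated fields are easy to unpack into lists. A small stdlib sketch — the sample row is invented, not taken from the dataset, and only the column names come from the list above:

```python
import csv
import io

# Invented sample mirroring the CSV layout described above, with a
# pipe-separated multi-value field.
sample = (
    "code,place,iso_country_code\n"
    "mm,Mediterranean Region,ES|FR|IT\n"
)
multi_value = {"iso_country_code"}  # fields that can hold multiple values

rows = []
for row in csv.DictReader(io.StringIO(sample)):
    for field in multi_value:
        # Split on the pipe separator; empty fields become empty lists.
        row[field] = row[field].split("|") if row[field] else []
    rows.append(row)
```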
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-01-03-18-01-07.png&#34; width=&#34;600&#34; height=&#34;335&#34; alt=&#34;Choropleth map showing the countries associated with oral history records&#34;&gt;
&lt;p&gt;For an example of this dataset in use, see &lt;a href=&#34;https://tdg.glam-workbench.net/other-digitised-resources/oral-histories/overview.html#which-countries-do-the-oral-histories-relate-to&#34;&gt;Which countries do the oral histories relate to?&lt;/a&gt; in the &lt;em&gt;Trove Data Guide&lt;/em&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>National Archives of Australia in 2023 – digitisation of files</title>
      <link>https://updates.timsherratt.org/2024/01/03/national-archives-of.html</link>
      <pubDate>Wed, 03 Jan 2024 11:18:21 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/01/03/national-archives-of.html</guid>
      <description>&lt;p&gt;In 2023 the National Archives of Australia digitised 416,602 files (down from 575,597 in 2022). This chart shows the number of files digitised per day in 2023.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-01-03-10-59-46.png&#34; width=&#34;600&#34; height=&#34;250&#34; alt=&#34;Chart showing the number of files digitised per day across 2023&#34;&gt;
&lt;p&gt;These files were drawn from 1,423 different series, but the vast bulk (81%) were from 4 series of World War Two service records. (This &lt;a href=&#34;https://www.naa.gov.au/about-us/media-and-publications/media-releases/large-scale-effort-sees-1-million-second-world-war-records-digitised&#34;&gt;media release&lt;/a&gt; includes some details about the funding of the WW2 digitisation.)&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s the top twenty series by number of items digitised in 2023.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;series&lt;/th&gt;
&lt;th&gt;series_title&lt;/th&gt;
&lt;th&gt;total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;B883&lt;/td&gt;
&lt;td&gt;Second Australian Imperial Force Personnel Dossiers, 1939-1947&lt;/td&gt;
&lt;td&gt;201,511&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A9301&lt;/td&gt;
&lt;td&gt;RAAF Personnel files of Non-Commissioned Officers (NCOs) and other ranks, 1921-1948&lt;/td&gt;
&lt;td&gt;111,673&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A9300&lt;/td&gt;
&lt;td&gt;RAAF Officers Personnel files, 1921-1948&lt;/td&gt;
&lt;td&gt;14,125&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B884&lt;/td&gt;
&lt;td&gt;Citizen Military Forces Personnel Dossiers, 1939-1947&lt;/td&gt;
&lt;td&gt;11,265&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A14435&lt;/td&gt;
&lt;td&gt;Stanley Fowler photographs showing the Australian fishing industry and coastline, numerical series with ‘LA’ prefix&lt;/td&gt;
&lt;td&gt;10,512&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D3481&lt;/td&gt;
&lt;td&gt;Photographs (black and white, colour) of buildings, installations, sites, etc&lt;/td&gt;
&lt;td&gt;8,295&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A1&lt;/td&gt;
&lt;td&gt;Correspondence files, annual single number series [Main correspondence files series of the agency]&lt;/td&gt;
&lt;td&gt;7,571&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;K1145&lt;/td&gt;
&lt;td&gt;Certificates of Exemption from Dictation Test, annual certificate number order&lt;/td&gt;
&lt;td&gt;4,169&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A13150&lt;/td&gt;
&lt;td&gt;Specifications, examiners reports and correspondence relating to the Registration of Victorian Patents - Second system&lt;/td&gt;
&lt;td&gt;3,941&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;J853&lt;/td&gt;
&lt;td&gt;Architectural plans, annual single number series with alpha (denoting Papua New Guinea and discipline) prefix and/or alpha/numeric (denoting size and amendment) suffix&lt;/td&gt;
&lt;td&gt;3,322&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A2571&lt;/td&gt;
&lt;td&gt;Name Index Cards, Migrants Registration [Bonegilla]&lt;/td&gt;
&lt;td&gt;2,204&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D1423&lt;/td&gt;
&lt;td&gt;Original plans (negatives), single number series with alpha prefix denoting discipline&lt;/td&gt;
&lt;td&gt;2,102&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AP67/1&lt;/td&gt;
&lt;td&gt;Personal documents of British migrants (including ex-service) in receipt of free and assisted passages&lt;/td&gt;
&lt;td&gt;2,058&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E1652&lt;/td&gt;
&lt;td&gt;Northern Territory Pastoral Applications (Pastoral Claims)&lt;/td&gt;
&lt;td&gt;1,822&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D5440&lt;/td&gt;
&lt;td&gt;Photographs of post office buildings, personnel and equipment, single number series (with variations)&lt;/td&gt;
&lt;td&gt;1,488&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C609&lt;/td&gt;
&lt;td&gt;Payment cards for employees&#39; entitlements claims, alphabetical series&lt;/td&gt;
&lt;td&gt;1,317&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B3104&lt;/td&gt;
&lt;td&gt;Photographs, Trans-Australian Railway, single number series&lt;/td&gt;
&lt;td&gt;1,222&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MP1117/2&lt;/td&gt;
&lt;td&gt;Microfilm reels of RAAF Engineering Drawings&lt;/td&gt;
&lt;td&gt;1,168&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A2572&lt;/td&gt;
&lt;td&gt;Name Index Cards, Migrants Registration [Bonegilla]&lt;/td&gt;
&lt;td&gt;1,130&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B6295&lt;/td&gt;
&lt;td&gt;Photographs and negatives of Commonwealth building sites and Works departmental activities, single number series&lt;/td&gt;
&lt;td&gt;1,113&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For more data, see the &lt;a href=&#34;https://github.com/wragge/naa-recently-digitised&#34;&gt;naa-recently-digitised&lt;/a&gt; GitHub repository which runs a process every Sunday to save details of files digitised in the previous week.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Trove newspapers in 2023</title>
      <link>https://updates.timsherratt.org/2024/01/02/trove-newspapers-in.html</link>
      <pubDate>Tue, 02 Jan 2024 17:56:58 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2024/01/02/trove-newspapers-in.html</guid>
      <description>&lt;p&gt;I’ve been capturing weekly snapshots of the Trove newspaper corpus for the last couple of years. You can see the latest results in the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;Trove Newspaper Data Dashboard&lt;/a&gt;. Using this data I’ve compiled a quick summary of changes over the last year.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;7,518,764&lt;/strong&gt; digitised newspaper articles were added to Trove in 2023. The total number of articles increased from 236,530,127 to 244,048,891. The chart below shows how the number of articles varied across the year. You&amp;rsquo;ll notice that the rate of digitisation increased about the same time the government announced new funding for Trove. Were more articles digitised because of the funding, or were articles in the digitisation pipeline held back until the funding was announced? Or both?&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2023-12-31-17-57-21.png&#34; width=&#34;600&#34; height=&#34;306&#34; alt=&#34;Number of digitised newspaper articles in Trove by week, 2023&#34;&gt;
&lt;p&gt;Most of the new articles were published in either Victoria or NSW – both these states had an increase of more than 3 million articles each! There were smaller increases for WA and SA. This chart shows the distribution of articles by state.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2023-12-31-18-37-59.png&#34; width=&#34;600&#34; height=&#34;239&#34; alt=&#34;Change in the number of articles in 2023 by state&#34;&gt;
&lt;p&gt;Fifty-seven new newspaper titles were added to Trove in 2023:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1832&#34;&gt;Bairnsdale Advertiser and East Gippsland Stock and Station Journal (Vic. : 1946 - 1954)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1864&#34;&gt;Cessnock Express and Mining and Farming Representative (NSW : 1905 - 1910; 1925 - 1928)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1824&#34;&gt;Coolamon Echo (NSW : 1898 - 1905)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1867&#34;&gt;Coolamon Farmers&#39; Review (NSW : 1910 - 1917)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1866&#34;&gt;Coolamon-Ganmain Farmers&#39; Review (NSW : 1906 - 1910)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1846&#34;&gt;Daily Commercial News and Shipping List (Perth, WA : 1927 - 1934)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1852&#34;&gt;Daily Mirror (Sydney, NSW : 1941 - 1955)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1869&#34;&gt;Direct Action (Adelaide, SA : 1928 - 1930)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1849&#34;&gt;Dubbo Dispatch (NSW : 1942)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1841&#34;&gt;Essendon Gazette (Moonee Ponds, Vic. : 1900 - 1905)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1826&#34;&gt;Essendon and Flemington Chronicle (Vic. : 1882; 1884 - 1894)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1851&#34;&gt;Hills Gazette (Port Adelaide, SA: 1973 - 1984)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1838&#34;&gt;Kyabram Free Press (Vic. : 1892 - 1894)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1293&#34;&gt;Lawloit Times (Kaniva, Vic. : 1910 - 1929) &lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1828&#34;&gt;Myrtleford Times and Ovens Valley Advertiser (Vic. : 1930 - 1955)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1856&#34;&gt;Northern Planter (Ingham, Qld. : 1907 - 1908)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1835&#34;&gt;Omeo Standard (Vic. : 1928)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1860&#34;&gt;Pastoral Times and Deniliquin and Echuca Chronicle (NSW : 1862)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1812&#34;&gt;Pastoral Times and Echuca and Moama Chronicle (Deniliquin, NSW : 1863 - 1866)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1829&#34;&gt;Rutherglen Miner and Howlong and Wahgunyah Times (Vic. : 1903 - 1912)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1854&#34;&gt;South Sydney News (NSW : 1940)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1853&#34;&gt;South Sydney Sentinel (NSW : 1932 - 1935)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1842&#34;&gt;Sportsman (Perth, WA : 1903)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1865&#34;&gt;The Araluen Star and Miners&#39; Right (NSW : 1863 - 1864)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1833&#34;&gt;The Bairnsdale Liberal News and North Gippsland District Advertiser (Vic. : 1879 - 1880)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1878&#34;&gt;The Braidwood Express and People&amp;rsquo;s Advocate (NSW : 1904 - 1907)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1850&#34;&gt;The Braidwood and Araluen Express (NSW : 1899 - 1907)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1877&#34;&gt;The Braidwood and Araluen Express and People&amp;rsquo;s Advocate (NSW : 1899 - 1904)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1831&#34;&gt;The Corryong Courier and Walwa District News (Vic. : 1946 - 1952)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1827&#34;&gt;The Essendon Gazette and Keilor, Bulla and Broadmeadows Reporter (Moonee Ponds, Vic. : 1888 - 1900)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1834&#34;&gt;The Gippsland Daily News (Bairnsdale, Vic. : 1890 - 1894)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1870&#34;&gt;The Gippsland Farmers&#39; and Glengarry, Toongabbie and Cowwarr Journal (Traralgon, Vic. : 1922 - 1923)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1848&#34;&gt;The Hampden Guardian and Western Province Advertiser (Camperdown, Vic. : 1871 - 1872 ; 1874 - 1877)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1840&#34;&gt;The Irishman (Melbourne, Vic. : 1872 - 1873)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1871&#34;&gt;The Journal : Glengarry, Toongabbie and Cowwarr Journal (Traralgon, Vic. : 1923 - 1929)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1815&#34;&gt;The Manning River Chronicle (Wingham, NSW : 1886 - 1888) &lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1830&#34;&gt;The North Eastern Despatch (Wangaratta, Vic. : 1907 - 1913)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1813&#34;&gt;The Pastoral Times (South Deniliquin, NSW : 1866 - 1950)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1858&#34;&gt;The Pastoral Times : incorporated with the Southern Courier (South Deniliquin, NSW : 1861)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1857&#34;&gt;The Pastoral Times and Deniliquin Telegraph (Deniliquin, NSW : 1859 - 1861)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1861&#34;&gt;The Pastoral Times and Deniliquin and Moama Reporter (NSW : 1863)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1859&#34;&gt;The Pastoral Times and Southern Courier (Deniliquin, N.S.W : 1861 - 1862)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1874&#34;&gt;The People&amp;rsquo;s Weekly (Moonta, SA : 1890 - 1926)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1727&#34;&gt;The Rutherglen Sun (Vic. : 1885)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1728&#34;&gt;The Rutherglen Sun and Murray Valley Advertiser (Vic. : 1886)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1836&#34;&gt;The Shepparton Advertiser and Moira and Rodney Farmers&#39; Chronicle (Vic. : 1886 - 1887)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1511&#34;&gt;The Skipton Standard and Streatham Gazette (Vic. : 1914 - 1928)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1843&#34;&gt;The Sportsman (Perth, WA : 1904)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1873&#34;&gt;The Standard (Port Adelaide, SA : 1959 - 1965)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1512&#34;&gt;The Tatura Guardian (Vic. : 1895 - 1903)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1823&#34;&gt;The Wingham Chronicle and Manning Advertiser (NSW : 1888)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1862&#34;&gt;The Yarrawonga Mercury and Lake Rowan, Tungamah and Mulwala (N.S.W.) News (Vic. : 1882)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1863&#34;&gt;The Yarrawonga Mercury and Mulwala (N.S.W.) News (Vic. : 1882 - 1897)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1825&#34;&gt;The Yarrawonga Mercury and Southern Riverina Advertiser (Vic. : 1897 - 1905; 1913 - 1920)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1837&#34;&gt;Tungamah and Lake Rowan Express (Vic. : 1882 - 1883)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1847&#34;&gt;Western Press and Camperdown, Colac, Mortlake and Terang Representative (Vic. : 1866-1867 ; 1870)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/1839&#34;&gt;Woodend Star and Macedon Advocate (Vic. : 1942 - 1955)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Trove Data Guide update – accessing data from newspapers and gazettes</title>
      <link>https://updates.timsherratt.org/2023/09/15/trove-data-guide.html</link>
      <pubDate>Fri, 15 Sep 2023 15:53:20 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2023/09/15/trove-data-guide.html</guid>
      <description>&lt;p&gt;I’m continuing to slog away at the &lt;a href=&#34;https://wragge.github.io/trove-data-guide/home.html&#34;&gt;Trove Data Guide&lt;/a&gt; (part of the &lt;a href=&#34;https://ardc.edu.au/project/ardc-community-data-lab/&#34;&gt;ARDC’s HASS Community Data Lab&lt;/a&gt;) – dumping everything I know about Trove into a format that I hope will be useful for researchers.&lt;/p&gt;
&lt;p&gt;I’ve just finished a first pass through the section on &lt;a href=&#34;https://wragge.github.io/trove-data-guide/accessing-data/newspapers-and-gazettes.html&#34;&gt;accessing data from newspapers and gazettes&lt;/a&gt;, and it’s online if you want to have a look. There are still lots of things to add, update, and reorganise, but getting the basic content of the section defined is a bit of a milestone, so I’ll allow myself a little moment of celebration. Yay!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/screenshot-from-2023-09-15-14-47-24.png&#34; width=&#34;600&#34; height=&#34;466&#34; alt=&#34;Screencapture from the Trove Data Guide&#34;&gt;
&lt;p&gt;Of course it took longer than I expected, but that’s largely due to the fact that I was sketching out related sections as I went along. You’ll see lots of pages in the navigation that only contain a list of dot points, but they’ll get filled out over the next couple of months.&lt;/p&gt;
&lt;p&gt;The ‘Accessing data’ section is going to be the most code heavy, as it’s focused on using the API to develop reusable methods for harvesting machine-readable data. Other sections, such as ‘Understanding search’ and ‘Collections and contexts’ will be more discursive, aimed at helping all Trove users better understand what Trove is and how it works.&lt;/p&gt;
&lt;p&gt;Comments and suggestions are welcome! You can &lt;a href=&#34;https://github.com/wragge/trove-data-guide/issues&#34;&gt;add issues on GitHub&lt;/a&gt;, or &lt;a href=&#34;https://web.hypothes.is/&#34;&gt;use Hypothes.is&lt;/a&gt; to annotate the text.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Some important updates for the Trove Newspaper &amp; Gazette Harvester </title>
      <link>https://updates.timsherratt.org/2023/08/31/some-important-updates.html</link>
      <pubDate>Thu, 31 Aug 2023 18:00:54 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2023/08/31/some-important-updates.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-api-v3/&#34;&gt;Version 3&lt;/a&gt; of the Trove API is out, and version 2 is scheduled to be decommissioned in 2024 – that means I have a lot of code to update! First cab off the rank is the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-harvester/&#34;&gt;Trove Newspaper &amp;amp; Gazette Harvester&lt;/a&gt; with version &lt;a href=&#34;https://github.com/wragge/trove-newspaper-harvester/releases/tag/v0.7.1&#34;&gt;0.7.1&lt;/a&gt; now available.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/screenshot-from-2023-08-31-16-54-15.png&#34; width=&#34;600&#34; height=&#34;543&#34; alt=&#34;Screenshot of the Trove Newspaper and Gazette Harvester documentation page.&#34;&gt;
&lt;p&gt;The Harvester is a Python package that can be used as either a library or a command-line tool. It’s been around in some form for more than 10 years. The latest updates include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;support for version 3 of the Trove API&lt;/li&gt;
&lt;li&gt;automatic creation of a metadata file describing each harvest according to the &lt;a href=&#34;https://www.researchobject.org/ro-crate/&#34;&gt;RO-Crate&lt;/a&gt; format&lt;/li&gt;
&lt;li&gt;automatic creation of a harvester config file, capturing the query parameters sent to Trove as well as the Harvester options&lt;/li&gt;
&lt;li&gt;the ability to initiate a harvest from an existing config file&lt;/li&gt;
&lt;li&gt;more memory-friendly generation of CSV result files (no loading everything into Pandas)&lt;/li&gt;
&lt;/ul&gt;
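&lt;p&gt;The memory-friendly CSV generation boils down to streaming – write each row as it’s read, rather than accumulating everything in a dataframe first. A minimal sketch of the idea (with hypothetical, made-up fields – this isn’t the Harvester’s actual code):&lt;/p&gt;

```python
import csv
import io
import json

def ndjson_to_csv(ndjson_lines, csv_file, fields):
    # Stream one record at a time instead of loading the whole harvest
    # into a dataframe, so memory use stays constant however big the file is.
    writer = csv.DictWriter(csv_file, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for line in ndjson_lines:
        writer.writerow(json.loads(line))

# Hypothetical records with made-up fields; a real harvest has many more.
records = [
    '{"id": "1", "heading": "FLOODS.", "category": "Article"}',
    '{"id": "2", "heading": "SHIPPING.", "category": "Advertising"}',
]
out = io.StringIO()
ndjson_to_csv(records, out, ["id", "heading", "category"])
print(out.getvalue())
```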
&lt;p&gt;The &lt;a href=&#34;https://www.researchobject.org/ro-crate/&#34;&gt;RO-Crate&lt;/a&gt; integration was part of my work for the ARDC’s &lt;a href=&#34;https://ardc.edu.au/project/hass-community-data-lab/&#34;&gt;HASS Community Data Lab&lt;/a&gt;. The Harvester was already generating a simple metadata file that captured &lt;em&gt;some&lt;/em&gt; of the harvest parameters, but now it documents the context of the harvest in much more detail, and saves it in a standard, Linked Open Data based, format.&lt;/p&gt;
&lt;p&gt;Every harvest now creates an &lt;code&gt;ro-crate-metadata.json&lt;/code&gt; file. This  file includes details of the datasets created by the Harvester, such as the &lt;code&gt;results.csv&lt;/code&gt; file that includes article metadata, and the &lt;code&gt;text&lt;/code&gt; directory that contains the OCRd text. It also captures contextual information about the Harvester itself. The Harvester and the datasets are linked through a &lt;code&gt;CreateAction&lt;/code&gt; that describes the harvesting process. The &lt;code&gt;harvester_config.json&lt;/code&gt; file that saves the query parameters and Harvester options is also linked as an input to this process. In this way, all the components of the harvest are described and linked.&lt;/p&gt;
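&lt;p&gt;In outline, that linking can be sketched like this (a hypothetical, heavily abridged example following RO-Crate 1.1 conventions – not the Harvester’s actual output):&lt;/p&gt;

```python
import json

# Heavily abridged sketch of an ro-crate-metadata.json structure: the root
# dataset lists the harvested files, and a CreateAction links the Harvester
# (instrument), the config file (object), and the datasets (result).
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "results.csv"}, {"@id": "text/"}]},
        {"@id": "results.csv", "@type": "File", "description": "Article metadata"},
        {"@id": "text/", "@type": "Dataset", "description": "OCRd text"},
        {
            "@id": "#harvest",
            "@type": "CreateAction",
            "instrument": {"@id": "https://github.com/wragge/trove-newspaper-harvester"},
            "object": {"@id": "harvester_config.json"},
            "result": [{"@id": "results.csv"}, {"@id": "text/"}],
        },
    ],
}
print(json.dumps(crate, indent=2))
```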
&lt;p&gt;Here’s an &lt;a href=&#34;https://gist.github.com/wragge/23f88bf4c5945174a0986dad91691b06&#34;&gt;example RO-Crate file&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Trove is changing all the time. By capturing information such as the query, the harvester version, the date, and the number of results, the RO-Crate file will help researchers document, manage, and share their research. And now that you can start a new harvest with an existing config file, it’s easy for researchers to re-run a harvest to see what changes over time.&lt;/p&gt;
&lt;p&gt;As well as updating the Python package, I’ve also updated the &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper &amp;amp; Gazette Harvester&lt;/a&gt; section of the GLAM Workbench. Here you’ll find examples of the Harvester in action, as well as some ways of exploring the harvested data. If you’d like to take the Harvester for a spin, the easiest way to start is with the &lt;a href=&#34;https://glam-workbench.net/trove-harvester/harvester-web-app/&#34;&gt;web app version&lt;/a&gt; – no software to install, no code to navigate! If you’re an Australian university researcher you can spin it up on the new &lt;a href=&#34;https://updates.timsherratt.org/2023/08/31/run-glam-workbench.html&#34;&gt;ARDC Binder service&lt;/a&gt; in seconds.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Run GLAM Workbench notebooks on the ARDC’s new Binder service</title>
      <link>https://updates.timsherratt.org/2023/08/31/run-glam-workbench.html</link>
      <pubDate>Thu, 31 Aug 2023 13:52:11 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2023/08/31/run-glam-workbench.html</guid>
      <description>&lt;p&gt;There are a number of different ways to &lt;a href=&#34;https://glam-workbench.net/getting-started/#using-the-glam-workbench&#34;&gt;run the Jupyter notebooks&lt;/a&gt; in the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; depending on your needs and technical skills. But the easiest and quickest has always been the public, &lt;a href=&#34;https://mybinder.org/&#34;&gt;international Binder service&lt;/a&gt;, based in Europe. One click in the GLAM Workbench and Binder prepares a customised computing environment and loads up the Jupyter notebooks ready for you to explore. Unfortunately, the public Binder service has been having some capacity issues in the last few months, and sometimes repositories fail to run. The good news is that Australian university researchers now have another option with the launch of the &lt;a href=&#34;https://ardc.edu.au/services/ardc-nectar-research-cloud/ardc-binderhub-service/&#34;&gt;Australian Research Data Commons Binder service&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;The big difference between the ARDC’s Binder service and the international version is that you need to log in using your university credentials. While that’s an extra hassle, the service itself should be faster and more reliable for Australian researchers. For this reason, I’ve started making ARDC Binder links the default in a number of GLAM Workbench sections. Of course, not all GLAM Workbench users are attached to Australian universities, so the international Binder links remain – it’s just a matter of emphasis.&lt;/p&gt;
&lt;p&gt;For example, near the top of many GLAM Workbench pages you’ll see &lt;strong&gt;Explore live on Binder&lt;/strong&gt; buttons that launch the current repository. I’ve now added an &lt;strong&gt;Explore live on ARDC Binder&lt;/strong&gt; option.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/screenshot-from-2023-08-31-12-36-10.png&#34; width=&#34;600&#34; alt=&#34;Screenshot from GLAM Workbench page showing two buttons –  the first is labelled *Explore live on ARDC Binder*, while the second is *Explore live on Binder*&#34;&gt;
&lt;p&gt;Most notebooks in the GLAM Workbench now have their own documentation page with a big blue button to launch the notebook on Binder. I’ve started changing these buttons to use the ARDC Binder service but, as you can see in the screenshot below, there’s also a link to run the notebook on the original Binder service, with no authentication required.&lt;/p&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2023/f23a54bfbf.png&#34; width=&#34;600&#34; height=&#34;397&#34; alt=&#34;Screenshot of a notebook documentation page showing the big blue *Run live on ARDC Binder button*&#34;&gt;
&lt;p&gt;I’ve added some &lt;a href=&#34;https://glam-workbench.net/using-ardc-binder/&#34;&gt;information on using the ARDC Binder service&lt;/a&gt; to the GLAM Workbench help pages.&lt;/p&gt;
&lt;p&gt;I’ll be continuing to explore new options for running GLAM Workbench notebooks (I’m particularly interested in the possibilities of &lt;a href=&#34;https://jupyterlite.readthedocs.io/en/latest/&#34;&gt;Jupyter Lite&lt;/a&gt;). Also the ARDC’s &lt;a href=&#34;https://ardc.edu.au/project/ardc-community-data-lab/&#34;&gt;HASS Community Data Lab project&lt;/a&gt; is currently investigating ways of adding more authentication options to the Binder service to open it up to researchers outside of universities.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Trove Query Parser updated!</title>
      <link>https://updates.timsherratt.org/2023/08/26/trove-query-parser.html</link>
      <pubDate>Sat, 26 Aug 2023 18:21:23 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2023/08/26/trove-query-parser.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve just updated the &lt;a href=&#34;https://github.com/wragge/trove_query_parser/&#34;&gt;Trove Query Parser&lt;/a&gt; to work with version 3 of the Trove API. You just give it the url of a search in Trove&amp;rsquo;s newspapers, and it translates the search into a set of parameters that the API will understand. So this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;parse_query(&amp;quot;https://trove.nla.gov.au/search/category/newspapers?keyword=wragge&amp;amp;l-artType=newspapers&amp;amp;l-state=Queensland&amp;amp;l-category=Article&amp;amp;l-illustrationType=Cartoon&amp;quot;, 3)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Produces this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;{&#39;q&#39;: &#39;wragge&#39;, &#39;l-artType&#39;: &#39;newspapers&#39;, &#39;l-state&#39;: [&#39;Queensland&#39;], &#39;l-category&#39;: [&#39;Article&#39;], &#39;l-illustrated&#39;: &#39;true&#39;, &#39;l-illustrationType&#39;: [&#39;Cartoon&#39;], &#39;category&#39;: &#39;newspaper&#39;}&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;You can then feed the parameters to the Trove API with your API key and you&amp;rsquo;ll get data back. Easy! It&amp;rsquo;s simple but handy – I use the Query Parser in other tools like the &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper Harvester&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/querypic/&#34;&gt;QueryPic&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This version adds a second parameter to the &lt;code&gt;parse_query()&lt;/code&gt; function so you can specify the version of the Trove API you&amp;rsquo;re using. The default value is &lt;code&gt;2&lt;/code&gt; for backwards compatibility. See &lt;a href=&#34;https://wragge.github.io/trove_query_parser/&#34;&gt;the documentation&lt;/a&gt; for more information.&lt;/p&gt;
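&lt;p&gt;Under the hood it’s doing the kind of parameter mapping you could sketch like this (a simplified illustration of the idea – the real parser handles many more cases):&lt;/p&gt;

```python
# Simplified illustration of translating Trove web-interface search
# parameters into API parameters; not the trove_query_parser source.
def web_to_api(web_params):
    api = {"category": "newspaper"}
    for key, value in web_params.items():
        if key == "keyword":
            api["q"] = value  # free-text search becomes the API's 'q' parameter
        elif key == "l-artType":
            api[key] = value  # single-valued facet
        elif key.startswith("l-"):
            api[key] = [value]  # most facets can take multiple values
    return api

print(web_to_api({"keyword": "wragge", "l-artType": "newspapers", "l-state": "Queensland"}))
```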
</description>
    </item>
    
    <item>
      <title>Family history resources in the GLAM Workbench</title>
      <link>https://updates.timsherratt.org/2023/08/18/family-history-resources.html</link>
      <pubDate>Fri, 18 Aug 2023 15:06:21 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2023/08/18/family-history-resources.html</guid>
      <description>&lt;p&gt;It’s Family History Month, so I thought a brief post was in order describing some of the family history related resources in the GLAM Workbench.&lt;/p&gt;
&lt;h2 id=&#34;glam-name-index-search&#34;&gt;GLAM Name Index Search&lt;/h2&gt;
&lt;p&gt;This is the biggie (in more ways than one). I’ve brought 263 datasets from 10 Australian GLAM organisations together into a single search interface. All these datasets index collections by people’s names, so with one search you can find information about individuals across a broad range of records, locations, and periods. There are more than 10 million rows of data to explore!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;&lt;strong&gt;Try it now!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2023/screenshot-from-2023-08-18-13-19-53.png&#34; width=&#34;600&#34; height=&#34;809&#34; alt=&#34;Screenshot of the home page of the GLAM Name Index Search showing the list of GLAM organisations contributing data.&#34;&gt;
&lt;h2 id=&#34;nsw-post-office-and-sydney-telephone-directories&#34;&gt;NSW Post Office and Sydney Telephone Directories&lt;/h2&gt;
&lt;p&gt;Many volumes of the NSW Post Office and Sydney Telephone Directories have been digitised and made available through Trove. However, they’re not easy to search. I’ve taken the text from these volumes and indexed it by line to make it easier to find people and places. There are two databases to explore:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/nsw-post-office-directories/&#34;&gt;&lt;strong&gt;New South Wales Post Office Directories&lt;/strong&gt;&lt;/a&gt; (54 volumes from 1886 to 1950)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/sydney-telephone-directories/&#34;&gt;&lt;strong&gt;Sydney Telephone Directories&lt;/strong&gt;&lt;/a&gt; (44 volumes from 1926 to 1954)&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/8c1b80625f.png&#34; width=&#34;600&#34; height=&#34;335&#34; alt=&#34;Screenshot of the NSW Post Office Directories search, showing how matching results on a page are displayed.&#34;&gt;
&lt;h2 id=&#34;tasmanian-post-office-directories&#34;&gt;Tasmanian Post Office Directories&lt;/h2&gt;
&lt;p&gt;The Tasmanian Post Office Directories have been digitised by Libraries Tasmania, but each volume is available as a separate PDF, making it difficult to search across the collection. I’ve downloaded all the PDFs, extracted the text and images, and indexed the content by line. Now you can search all &lt;strong&gt;48 volumes, from 1890 to 1948&lt;/strong&gt;, at once!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/tasmanian-post-office-directories/&#34;&gt;&lt;strong&gt;Try it now!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/91aff3faba.png&#34; width=&#34;600&#34; height=&#34;326&#34; alt=&#34;Screenshot of the search screen for Tasmanian Post Office Directories.&#34;&gt;
&lt;h2 id=&#34;trove-places&#34;&gt;Trove Places&lt;/h2&gt;
&lt;p&gt;If your family history research is focused on a particular region, it can be useful to know what newspapers from that region are digitised in Trove. Trove Places is a map interface to Trove’s digitised newspapers – just click on a place to find newspapers published or distributed nearby.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://troveplaces.herokuapp.com/map/&#34;&gt;&lt;strong&gt;Try it now!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/cccbceaddd.png&#34; width=&#34;600&#34; height=&#34;316&#34; alt=&#34;Screenshot of Trove Places showing how a search for ‘Walhalla&#39; in Victoria displays a number of markers on the map, each marker providing information about Trove newspapers published or distributed in that location.&#34;&gt;
&lt;h2 id=&#34;save-a-trove-newspaper-article-as-an-image&#34;&gt;Save a Trove newspaper article as an image&lt;/h2&gt;
&lt;p&gt;You might have noticed that Trove’s download option for digitised newspaper articles can slice articles up in unfortunate ways. This simple web app saves the complete article as a single image (or multiple images if the article is split across different pages). Simple, but very useful.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/Save-Trove-newspaper-article-as-image-app/&#34;&gt;&lt;strong&gt;Try it now!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2024/screenshot-from-2024-04-18-12-08-39.png&#34; width=&#34;600&#34; height=&#34;410&#34; alt=&#34;Screenshot of the web app to save a newspaper article as an image&#34;&gt;
&lt;h2 id=&#34;trove-newspapers-data-dashboard&#34;&gt;Trove Newspapers Data Dashboard&lt;/h2&gt;
&lt;p&gt;Just about every week more digitised newspaper articles are added to Trove. The search you tried a couple of months ago might now produce different results. How can you keep track of these changes? The Trove Data Dashboard uses weekly snapshots of the digitised newspapers to visualise changes over time. It’s updated every Sunday!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;&lt;strong&gt;Try it now!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/screenshot-from-2023-08-18-13-56-41.png&#34; width=&#34;600&#34; height=&#34;470&#34; alt=&#34;Screenshot of Trove Data Dashboard showing how the total number of newspaper articles changes over time.&#34;&gt;
&lt;h2 id=&#34;and-more&#34;&gt;And more!&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;&lt;strong&gt;GLAM Workbench&lt;/strong&gt;&lt;/a&gt; provides a wide range of tools and examples to help you work with data from libraries, archives, and museums.&lt;/p&gt;
&lt;p&gt;The resources above cost money to keep online. If you find them useful, you might like to &lt;a href=&#34;https://github.com/sponsors/wragge&#34;&gt;sponsor me on GitHub&lt;/a&gt; or &lt;a href=&#34;https://www.buymeacoffee.com/wragge&#34;&gt;buy me a coffee&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;2024-update&#34;&gt;2024 update!&lt;/h2&gt;
&lt;p&gt;If you want to know more about using Trove, head to the new &lt;a href=&#34;https://tdg.glam-workbench.net/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;. The Guide explores the different types of data available from Trove – what it is, how you can access it, and what you can do with it. It aims to give users a critical understanding of Trove data, both its limits and its possibilities.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Exploring the front pages of newspapers (10 years on)</title>
      <link>https://updates.timsherratt.org/2023/08/08/exploring-the-front.html</link>
      <pubDate>Tue, 08 Aug 2023 17:28:18 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2023/08/08/exploring-the-front.html</guid>
      <description>&lt;p&gt;Way back in 2012, I used the &lt;em&gt;brand new&lt;/em&gt; Trove API to download the details of 4 million  articles published on the front pages of newspapers. I did it for two reasons: first, I wanted to see how the content of front pages changed over time; and second, I wanted to show that large-scale data wrangling was entirely possible using nothing more than a laptop and a home broadband connection. I described my adventures in &lt;a href=&#34;https://discontents.com.au/4-million-articles-later/index.html&#34;&gt;this blog post&lt;/a&gt;, but if you look at it now you’ll see lots of sad, empty boxes where live charts used to be. This is because I shared my results though a custom web application which, 10 years later, seems like a really, really dumb idea. Needless to say the app fell foul of web hosting changes and is no more. Nowadays I use GitHub and Jupyter notebooks to share my data noodling, so I thought it was time to revisit the topic of newspaper front pages, and create something a bit more robust.&lt;/p&gt;
&lt;p&gt;If you just want the data and code, head over to the &lt;a href=&#34;https://github.com/wragge/newspaper-front-pages/&#34;&gt;GitHub repository&lt;/a&gt;. I’ve shared the notebooks used to &lt;a href=&#34;https://github.com/wragge/newspaper-front-pages/blob/main/large-harvest-example.ipynb&#34;&gt;harvest&lt;/a&gt;, &lt;a href=&#34;https://github.com/wragge/newspaper-front-pages/blob/main/convert-front-pages-harvest.ipynb&#34;&gt;convert&lt;/a&gt;, and &lt;a href=&#34;https://github.com/wragge/newspaper-front-pages/blob/main/explore-front-pages.ipynb&#34;&gt;explore&lt;/a&gt; the data, as well as two parquet formatted datasets.&lt;/p&gt;
&lt;p&gt;As you might expect, there are a &lt;strong&gt;lot more&lt;/strong&gt; articles now, as new articles and newspapers are added to Trove every week (see the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;Trove Newspaper Data Dashboard&lt;/a&gt; for details). Instead of 4 million articles, I ended up downloading details of more than &lt;strong&gt;19 million articles&lt;/strong&gt;! I used my trusty &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper Harvester&lt;/a&gt; as a library so I could more easily manage the way the data was saved. After several days I had a 14.6GB newline-delimited JSON file. I pared the data down to only the necessary fields, and used DuckDB to create two parquet files – &lt;a href=&#34;https://github.com/wragge/newspaper-front-pages/blob/main/front_pages.parquet&#34;&gt;one with the article data&lt;/a&gt;, and &lt;a href=&#34;https://github.com/wragge/newspaper-front-pages/blob/main/front_pages_totals.parquet&#34;&gt;another that added up the number of words on each page in the different article categories&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One nifty thing about the Trove API is that it tells you the number of words in each newspaper article. Articles are also assigned to a series of categories, such as ‘Article’ (your standard news-type piece) and ‘Advertising’. By adding up the number of words in each category I could explore how the front page’s mix of articles and advertising changed over time. Here, for example, is what happened to the front page of the &lt;em&gt;Sydney Morning Herald&lt;/em&gt;.&lt;/p&gt;
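&lt;p&gt;The aggregation itself is straightforward – something along these lines (a simplified sketch with made-up numbers, not the DuckDB query I actually used):&lt;/p&gt;

```python
from collections import defaultdict

def words_per_category(articles):
    # Total the API-reported word counts for each article category on a page.
    totals = defaultdict(int)
    for article in articles:
        totals[article["category"]] += article["wordCount"]
    return dict(totals)

# Hypothetical front page: two news articles and one block of advertising.
page = [
    {"category": "Article", "wordCount": 1200},
    {"category": "Advertising", "wordCount": 3400},
    {"category": "Article", "wordCount": 800},
]
print(words_per_category(page))  # {'Article': 2000, 'Advertising': 3400}
```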
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2023/screenshot-from-2023-08-08-15-55-41.png&#34; width=&#34;600&#34; height=&#34;411&#34; alt=&#34;Screenshot from Jupyter notebook displaying two charts that show changed in the number of words in articles and advertising on front pages over time&#34;&gt;
&lt;p&gt;The first chart shows the average number of words per page in the ‘Article’ and ‘Advertising’ categories across the full span of the &lt;em&gt;SMH’s&lt;/em&gt; publication run digitised in Trove (advertising is blue, and articles orange). The second focuses on the point where the number of words in articles overtakes the number of words in advertisements. You can see that for most of the publication run, the front page was dominated by advertising. But when change came it was quite abrupt. Sydney-siders awoke on 15 April 1944 to a new-look newspaper. Here’s what the front page looked like at the beginning and end of the period represented by the second chart.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/screenshot-from-2023-08-08-16-06-55.png&#34; width=&#34;600&#34; height=&#34;371&#34; alt=&#34;Images of the SMH’s front page on 1 January 1943 and 31 December 1946&#34;&gt;
&lt;p&gt;Different newspapers have different histories. I’ve created &lt;a href=&#34;https://nbviewer.org/github/wragge/newspaper-front-pages/blob/main/explore-front-pages.ipynb&#34;&gt;a notebook that generates more of these visualisations&lt;/a&gt; for a range of newspapers, and tries to provide a bit more context around the changes taking place.&lt;/p&gt;
&lt;p&gt;As I continue my work on the &lt;a href=&#34;https://wragge.github.io/trove-data-guide/home.html&#34;&gt;Trove Data Guide&lt;/a&gt; for the &lt;a href=&#34;https://ardc.edu.au/project/hass-community-data-lab/&#34;&gt;HASS Community Data Lab&lt;/a&gt;, I’m discovering more and more inconsistencies, oddities, and undocumented possibilities. As I was finishing up the work on front pages, I realised there was another way of exploring the same topic – by looking at the &lt;em&gt;space&lt;/em&gt; articles take up. This data isn’t in the API, but it can be scraped from the web site. Hmmm, interesting…&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Trove API Console updates</title>
      <link>https://updates.timsherratt.org/2023/07/18/trove-api-console.html</link>
      <pubDate>Tue, 18 Jul 2023 12:27:11 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2023/07/18/trove-api-console.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://troveconsole.herokuapp.com/&#34;&gt;Trove API Console&lt;/a&gt; provides examples of the Trove API in action that you can run, edit, and share. It’s been online for 9 years now, and I’ve just updated it to use version 3 of the Trove API by default. I’ve also added a new ‘Share’ button that makes it easier to share and embed examples.&lt;/p&gt;
&lt;p&gt;If you click on the ‘Share’ button, a box will pop up.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/screenshot-from-2023-07-18-10-55-23.png&#34; width=&#34;600&#34; height=&#34;256&#34; alt=&#34;Screenshot of the Share dialogue box showing a text box where you can enter a comment and two buttons: ‘Copy share url’, and ‘Copy Markdown button’&#34;&gt;
&lt;p&gt;If you add a comment, this will appear above the example query when users follow the shared link. You can use this to provide them with some context or a description.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/9a91947ca6.png&#34; width=&#34;600&#34; height=&#34;343&#34; alt=&#34;Screenshot of an example query with a comment highlighted.&#34;&gt;
&lt;p&gt;There are two buttons providing different share options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Copy share url&lt;/strong&gt; – copies the full url to the shared example&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Copy Markdown button&lt;/strong&gt; – copies a Markdown snippet that embeds a button like this &lt;img src=&#34;https://wragge.github.io/trove-data-guide/_images/try-trove-api-console.svg&#34; alt=&#34;Try it!&#34;&gt; linked to your example. Just paste it into your Markdown-formatted documentation!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other improvements include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The parameters used in any request are now displayed in a table for easy reference.&lt;/li&gt;
&lt;li&gt;The Console includes a link to an &lt;a href=&#34;http://glam-workbench.net/trove-api-test/&#34;&gt;API Status page&lt;/a&gt;. This page runs all the example queries in the Console and checks the results to make sure the API is working as expected. It’s updated every 6 hours.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Version 3 of the Trove API includes a standard &lt;a href=&#34;https://api.trove.nla.gov.au/v3/index.html&#34;&gt;Swagger UI&lt;/a&gt; that you can use to build queries, and provides limited anonymous access (without the need for an API key). But if, like me, you learn best by looking at examples, then you should find the API Console handy. In particular, the API Console makes it easy to share live examples which can be very useful in training, troubleshooting, and writing documentation. The &lt;a href=&#34;https://wragge.github.io/trove-data-guide/home.html&#34;&gt;Trove Data Guide&lt;/a&gt;, which I’m working on at the moment, includes lots of ‘Try it!’ buttons.&lt;/p&gt;
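&lt;p&gt;If you want to move from the Console to code, a Console-style query can be reproduced in a few lines of Python. This is just a minimal sketch: the base URL is the documented v3 search endpoint, but the query term and category shown are purely illustrative.&lt;/p&gt;

```python
import urllib.parse

# Base URL for the version 3 search endpoint of the Trove API.
API_BASE = "https://api.trove.nla.gov.au/v3/result"

def build_query_url(q, category="newspaper", encoding="json", key=None):
    # Assemble a search URL like the examples in the API Console.
    params = {"q": q, "category": category, "encoding": encoding}
    if key:
        # Supplying an API key lifts the limits of anonymous access.
        params["key"] = key
    return f"{API_BASE}?{urllib.parse.urlencode(params)}"

# An illustrative search for 'wragge' in the newspaper category.
url = build_query_url("wragge")
```

&lt;p&gt;You could paste the resulting url into the Console, the Swagger UI, or an HTTP client of your choice.&lt;/p&gt;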
</description>
    </item>
    
    <item>
      <title>Updated harvest of NSW State Archives indexes – more than 2 million rows of data!</title>
      <link>https://updates.timsherratt.org/2023/05/08/updated-harvest-of.html</link>
      <pubDate>Mon, 08 May 2023 13:25:09 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2023/05/08/updated-harvest-of.html</guid>
      <description>&lt;p&gt;The NSW State Archives (now part of &lt;a href=&#34;https://mhnsw.au/&#34;&gt;Museums of History NSW&lt;/a&gt;) publishes a series of &lt;a href=&#34;https://mhnsw.au/archive/subjects/?filter=indexes&#34;&gt;useful indexes&lt;/a&gt; to its collections. The indexes include basic data transcribed from the records, such as names, dates, and places, providing fine-grained access to the collections. But when they’re explored as data, the indexes also suggest new ways of analysing, visualising, and linking sets of records. (For some of the possibilities and challenges of using this sort of data see &lt;a href=&#34;http://doi.org/10.1353/jwh.2021.0025&#34;&gt;Missing Links: Data Stories from the Archive of British Settler Colonial Citizenship&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;In 2016, I started harvesting the index data from the NSW State Archives website to make each dataset available as an easily downloadable CSV file. In 2019, changes to the website made it impossible to access the complete indexes, so I was unable to update the CSV files. Fortunately, the website has changed again, and I’ve been able to reharvest all the indexes, capturing any changes since 2019. There are currently &lt;strong&gt;75 indexes containing 2,481,881 rows of data&lt;/strong&gt;.&lt;/p&gt;
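&lt;p&gt;The download links in the table all follow the same pattern, so you can also construct the URL for any index from its file slug and work with the data programmatically. A minimal sketch (the slug used here comes from one of the links in the table):&lt;/p&gt;

```python
# The CSV downloads share a common base path in the srnsw-indexes repository.
CSV_BASE = "https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data"

def csv_url(slug):
    # Build the download URL for an index dataset from its file slug.
    return f"{CSV_BASE}/{slug}.csv"

# e.g. the 1841 census index listed in the table below
url = csv_url("census-1841")
```

&lt;p&gt;From there you could load the data straight into a dataframe with something like pandas’ &lt;code&gt;read_csv(url)&lt;/code&gt;.&lt;/p&gt;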
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Title&lt;/th&gt;
&lt;th&gt;Number of rows&lt;/th&gt;
&lt;th&gt;Download data&lt;/th&gt;
&lt;th&gt;View at State Archives&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Aboriginal People in the Register of Aboriginal Reserves 1875-1904&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/aboriginal-people-in-the-register-of-aboriginal-reserves.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/first-nations/aboriginal-people-in-the-register-of-aboriginal-reserves/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Assisted Immigrants Index 1839-1896&lt;/td&gt;
&lt;td&gt;200,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/assisted-immigrants-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/immigration-and-shipping/assisted-immigrants-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Australian Railway Supply Detachment 1914&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/australian-railway-supply-detachment.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/wwi/australian-railway-supply-detachment/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bankruptcy index 1888-1929&lt;/td&gt;
&lt;td&gt;30,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/bankruptcy-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/bankruptcy-and-insolvency/bankruptcy-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bench of Magistrates Index 1788-1820&lt;/td&gt;
&lt;td&gt;4,442&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/bench-of-magistrates-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/courts-lower/bench-of-magistrates-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Botanic Gardens and government domains employees&lt;/td&gt;
&lt;td&gt;916&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/botanic-gardens-and-government-domains-employees-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/professions-and-occupations/botanic-gardens-and-government-domains-employees-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bubonic plague index 1900-1908&lt;/td&gt;
&lt;td&gt;567&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/bubonic-plague-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/bubonic-plague/bubonic-plague-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Census - 1841&lt;/td&gt;
&lt;td&gt;9,355&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/census-1841.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/census-and-musters/census-1841/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chemists, druggists and pharmacists index 1876-1920&lt;/td&gt;
&lt;td&gt;2,967&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/chemists-druggists-and-pharmacists-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/professions-and-occupations/chemists-druggists-and-pharmacists-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Child care and protection index 1817-1942&lt;/td&gt;
&lt;td&gt;21,292&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/child-care-and-protection-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/child-care-and-protection/child-care-and-protection-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Colonial (Government) Architect index 1837-1970&lt;/td&gt;
&lt;td&gt;2,373&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/colonial-architect-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/architecture-and-design/colonial-architect-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Colonial Secretary Letters Received, 1826-1896&lt;/td&gt;
&lt;td&gt;205,863&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/colonial-secretary-letters-received-1826-1896.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/colonial-secretary/colonial-secretary-letters-received-1826-1896/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Colonial Secretary&amp;rsquo;s Papers 1788-1825&lt;/td&gt;
&lt;td&gt;144,572&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/colonial-secretarys-papers-1788-1825.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/colonial-secretary/colonial-secretarys-papers-1788-1825/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Colonial Secretary&amp;rsquo;s letters relating to land 1826-1856&lt;/td&gt;
&lt;td&gt;20,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/colonial-secretarys-letters-re-land.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/land/colonial-secretarys-letters-re-land/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Colonial Secretary&amp;rsquo;s main series of letters received&lt;/td&gt;
&lt;td&gt;7,638&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/colonial-secretarys-main-series-of-letters-received.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/colonial-secretary/colonial-secretarys-main-series-of-letters-received/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convict assignments index 1821-1825&lt;/td&gt;
&lt;td&gt;6,156&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/convict-assignments-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/convicts/convict-assignments-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convict exiles index 1849-1850&lt;/td&gt;
&lt;td&gt;3,004&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/convict-exiles-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/convicts/convict-exiles-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convict indents (digitised) index 1788-1801&lt;/td&gt;
&lt;td&gt;20,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/convict-indents-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/convicts/convict-indents-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convicts applications to marry 1825-1851&lt;/td&gt;
&lt;td&gt;14,327&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/convicts-applications-to-marry.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/convicts/convicts-applications-to-marry/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convicts index 1791-1873&lt;/td&gt;
&lt;td&gt;150,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/convicts-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/convicts/convicts-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coroners&#39; inquests index 1796-1824&lt;/td&gt;
&lt;td&gt;808&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/coroners-inquests-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/inquests-and-coronial-inquiries/coroners-inquests-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Court of Civil Jurisdiction index 1799-1814&lt;/td&gt;
&lt;td&gt;2,876&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/court-of-civil-jurisdiction-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/courts-lower/court-of-civil-jurisdiction-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Court of Claims (Land) index 1833-1922&lt;/td&gt;
&lt;td&gt;2,966&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/court-of-claims-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/land/court-of-claims-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Crew and passengers 1828-1841&lt;/td&gt;
&lt;td&gt;2,560&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/crew-and-passengers-index-1828-1841.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/immigration-and-shipping/crew-and-passengers-index-1828-1841/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Criminal court records index 1788-1833&lt;/td&gt;
&lt;td&gt;5,028&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/criminal-court-records-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/criminal-courts/criminal-court-records-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Criminal depositions (Deposition Books) index 1849-1949&lt;/td&gt;
&lt;td&gt;117,508&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/criminal-depositions-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/criminal-courts/criminal-depositions-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Criminal indictments index 1863-1919&lt;/td&gt;
&lt;td&gt;20,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/criminal-indictments-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/criminal-courts/criminal-indictments-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deceased estates index 1880-1958&lt;/td&gt;
&lt;td&gt;577,891&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/deceased-estates-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/deceased-estates/deceased-estates-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Depasturing licenses index 1837-1851&lt;/td&gt;
&lt;td&gt;7,449&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/depasturing-licenses-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/land/depasturing-licenses-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependent children registers 1883-1923&lt;/td&gt;
&lt;td&gt;28,910&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/dependent-children-registers.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/child-care-and-protection/dependent-children-registers/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Devonshire Street Cemetery reinterment index&lt;/td&gt;
&lt;td&gt;9,559&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/devonshire-street-cemetery-reinterment-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/cemeteries-and-burials/devonshire-street-cemetery-reinterment-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Divorce records index 1873-1923&lt;/td&gt;
&lt;td&gt;21,239&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/divorce-records-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/divorce/divorce-records-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fire Commissioners Personnel&lt;/td&gt;
&lt;td&gt;3,767&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/fire-commissioners-personnel.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/firefighters-fires-and-fire-brigades/fire-commissioners-personnel/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gaol inmates &amp;amp; prisoners photos index 1870-1930&lt;/td&gt;
&lt;td&gt;52,055&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/gaol-inmates-prisoners-photos-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/gaol-inmates-and-prisoners/gaol-inmates-prisoners-photos-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gold (auriferous) lease registers 1874-1953&lt;/td&gt;
&lt;td&gt;60,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/gold-lease-registers.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/goldmining/gold-lease-registers/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Indigenous colonial court cases 1788-1838&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/indigenous-colonial-court-cases.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/first-nations/indigenous-colonial-court-cases/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infirm &amp;amp; destitute (Government) asylums index 1880-1896&lt;/td&gt;
&lt;td&gt;20,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/infirm-destitute-asylums-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/asylums/infirm-destitute-asylums-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inquest index 1942-1963&lt;/td&gt;
&lt;td&gt;45,547&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/inquest-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/inquests-and-coronial-inquiries/inquest-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Insolvency index 1842-1887&lt;/td&gt;
&lt;td&gt;23,108&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/insolvency-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/bankruptcy-and-insolvency/insolvency-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intestate estates index 1821-1913&lt;/td&gt;
&lt;td&gt;30,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/intestate-estates-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/intestates/intestate-estates-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Land grants and leases (registers) 1792-1865&lt;/td&gt;
&lt;td&gt;5,627&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/land-grants-and-leases.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/land/land-grants-and-leases/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Letters re migration to NSW 1838-1857&lt;/td&gt;
&lt;td&gt;22,771&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/letters-re-migration-to-nsw-1838-1857.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/immigration-and-shipping/letters-re-migration-to-nsw-1838-1857/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintenance registers - Metropolitan Children&amp;rsquo;s Court 1915-1917&lt;/td&gt;
&lt;td&gt;1,372&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/maintenance-registers-metropolitan-childrens-court.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/child-care-and-protection/maintenance-registers-metropolitan-childrens-court/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Miscellaneous immigrants index 1828-1843&lt;/td&gt;
&lt;td&gt;8,821&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/miscellaneous-immigrants-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/immigration-and-shipping/miscellaneous-immigrants-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NSW Government employees granted military leave&lt;/td&gt;
&lt;td&gt;20,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/nsw-government-employees-granted-military-leave.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/military-and-war/nsw-government-employees-granted-military-leave/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NSW King’s / Queen’s Counsel appointment correspondence&lt;/td&gt;
&lt;td&gt;2,083&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/nsw-kings-queens-counsel-appointment-correspondence.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/professions-and-occupations/nsw-kings-queens-counsel-appointment-correspondence/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Naturalization index 1834-1903&lt;/td&gt;
&lt;td&gt;9,860&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/naturalization-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/naturalisation-and-citizenship/naturalization-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nominal Roll of the First Railway Section (AIF)&lt;/td&gt;
&lt;td&gt;416&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/nominal-roll-of-the-first-railway-section-aif.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/wwi/nominal-roll-of-the-first-railway-section-aif/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Norfolk Island special bundles index 1794-1813&lt;/td&gt;
&lt;td&gt;216&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/norfolk-island-special-bundles-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/norfolk-island/norfolk-island-special-bundles-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nurses index 1926-1954&lt;/td&gt;
&lt;td&gt;46,499&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/nurses-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/nurses-and-midwives/nurses-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Police service registers 1852-1913&lt;/td&gt;
&lt;td&gt;20,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/police-service-registers.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/police/police-service-registers/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Port Macquarie Small Debts Register, 1845-1887&lt;/td&gt;
&lt;td&gt;2,036&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/port-macquarie-small-debts-register-1845-1887.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/courts-lower/port-macquarie-small-debts-register-1845-1887/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Probate records - supplementary index 1790-1875&lt;/td&gt;
&lt;td&gt;1,626&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/probate-records-supplementary-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/probates-and-wills/probate-records-supplementary-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Public Works Salary Registers&lt;/td&gt;
&lt;td&gt;523&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/public-works-salary-registers.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/professions-and-occupations/public-works-salary-registers/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Publicans&#39; licenses index 1830-1861&lt;/td&gt;
&lt;td&gt;20,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/publicans-licenses-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/publicans-hoteliers-innkeepers/publicans-licenses-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quarter sessions cases 1824-1837&lt;/td&gt;
&lt;td&gt;6,232&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/quarter-sessions-cases-1824-1837.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/criminal-courts/quarter-sessions-cases-1824-1837/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Railway employment records 1856-1917&lt;/td&gt;
&lt;td&gt;763&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/railway-employment-records.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/railways-and-railway-workers/railway-employment-records/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Railways and Tramways Roll of Honour&lt;/td&gt;
&lt;td&gt;1,214&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/railways-and-tramways-roll-of-honour.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/military-and-war/railways-and-tramways-roll-of-honour/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Register of Firms index 1903-1922&lt;/td&gt;
&lt;td&gt;50,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/register-of-firms-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/business-and-company-records/register-of-firms-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;School teachers&#39; rolls 1869-1908&lt;/td&gt;
&lt;td&gt;20,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/teachers-rolls.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/teachers/teachers-rolls/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schools and related records 1876-1979&lt;/td&gt;
&lt;td&gt;30,181&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/schools-and-related-records.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/schools-and-education/schools-and-related-records/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soldier (Closer) Settlement - Returned Soldiers Transfer files 1907-1951&lt;/td&gt;
&lt;td&gt;9,656&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/soldier-settlement-returned-soldiers-transfer-files.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/soldier-settlement/soldier-settlement-returned-soldiers-transfer-files/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soldier (Closer) Settlement transfer registers 1919-1925&lt;/td&gt;
&lt;td&gt;4,957&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/soldier-settlement-transfer-registers.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/soldier-settlement/soldier-settlement-transfer-registers/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soldier (Closer) settlement promotion files index 1913-1958&lt;/td&gt;
&lt;td&gt;4,354&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/soldier-settlement-promotion-files-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/soldier-settlement/soldier-settlement-promotion-files-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soldier Settlement loan files index 1906-1960&lt;/td&gt;
&lt;td&gt;7,642&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/soldier-settlement-loan-files-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/soldier-settlement/soldier-settlement-loan-files-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soldier Settlement miscellaneous files index 1916&lt;/td&gt;
&lt;td&gt;1,050&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/soldier-settlement-miscellaneous-files-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/soldier-settlement/soldier-settlement-miscellaneous-files-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soldier Settlement purchases index 1905-1937&lt;/td&gt;
&lt;td&gt;9,776&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/soldier-settlement-purchases-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/soldier-settlement/soldier-settlement-purchases-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Squatters and graziers index 1837-1849&lt;/td&gt;
&lt;td&gt;9,003&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/squatters-and-graziers-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/squatters-and-graziers/squatters-and-graziers-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Surveyor General&amp;rsquo;s crown plans 1792-1886&lt;/td&gt;
&lt;td&gt;5,455&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/surveyor-generals-crown-plans.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/maps-and-plans/surveyor-generals-crown-plans/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Surveyors&#39; field books 1794-1860&lt;/td&gt;
&lt;td&gt;813&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/surveyors-field-books.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/maps-and-plans/surveyors-field-books/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Surveyors’ letters 1822-1855&lt;/td&gt;
&lt;td&gt;157&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/surveyors-letters-1822-1855.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/surveyor-general/surveyors-letters-1822-1855/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tramway employees 1879-1911&lt;/td&gt;
&lt;td&gt;10,606&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/tramway-employees.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/tramways/tramway-employees/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unassisted immigrants index 1842-1855&lt;/td&gt;
&lt;td&gt;140,000&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/unassisted-immigrants-index.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/immigration-and-shipping/unassisted-immigrants-index/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unemployed in Sydney 1866&lt;/td&gt;
&lt;td&gt;3,222&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/unemployed-in-sydney.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/unemployment/unemployed-in-sydney/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vessels arrived in Sydney 1837-1925&lt;/td&gt;
&lt;td&gt;129,999&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://media.githubusercontent.com/media/wragge/srnsw-indexes/master/data/vessels-arrived-in-sydney.csv&#34;&gt;CSV file&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&#34;https://mhnsw.au/indexes/immigration-and-shipping/vessels-arrived-in-sydney/&#34;&gt;Browse index&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The list above is also available as &lt;a href=&#34;https://glam-workbench.net/nsw-state-archives/index-list/&#34;&gt;a CSV file&lt;/a&gt;. The &lt;a href=&#34;https://github.com/wragge/srnsw-indexes&#34;&gt;complete repository of CSV files is available on GitHub&lt;/a&gt;, and the methods used to harvest the data are &lt;a href=&#34;https://glam-workbench.net/nsw-state-archives/&#34;&gt;documented in the GLAM Workbench&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;But what’s in all these wonderfully rich datasets? You can find out using &lt;a href=&#34;https://glam-workbench.net/nsw-state-archives/index-explorer/&#34;&gt;the Index Explorer&lt;/a&gt; in the GLAM Workbench. Just select an index from the dropdown list, and the Index Explorer will analyse every column, summarising the contents and building visualisations to help you understand the range of values.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/059fb0fddf.gif&#34; width=&#34;600&#34; height=&#34;549&#34; alt=&#34;&#34; /&gt;
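&lt;p&gt;The kind of per-column profiling the Index Explorer performs can be sketched with a few lines of standard-library Python. This is only an illustration of the general idea, not the Explorer&amp;rsquo;s actual code:&lt;/p&gt;

```python
import csv
import io
from collections import Counter

def summarise_columns(csv_text, top=5):
    """Build a simple per-column summary of a CSV: how many rows have
    a value, how many distinct values there are, and the most common
    values. A rough sketch of the kind of analysis the Index Explorer
    performs, not its actual code."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        # Ignore empty cells when counting values.
        values = [row[column] for row in rows if row[column]]
        summary[column] = {
            "filled": len(values),
            "distinct": len(set(values)),
            "top_values": Counter(values).most_common(top),
        }
    return summary

sample = "surname,year\nSmith,1920\nJones,1921\nSmith,1920\n"
print(summarise_columns(sample))
```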
&lt;p&gt;Data from the NSW State Archives Indexes is also included in the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt;, which helps you find people in 263 different indexes from 10 Australian GLAM organisations.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>A big milestone, Trove contributor data, and the coming of API v3 – recent GLAM Workbench updates</title>
      <link>https://updates.timsherratt.org/2023/03/24/a-big-milestone.html</link>
      <pubDate>Fri, 24 Mar 2023 12:40:58 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2023/03/24/a-big-milestone.html</guid>
      <description>&lt;p&gt;There have been quite a few GLAM Workbench updates over the last month; here are some notes. (See &lt;a href=&#34;https://updates.timsherratt.org/2023/02/17/maps-people-lists.html&#34;&gt;February’s update&lt;/a&gt; for earlier changes…)&lt;/p&gt;
&lt;h2 id=&#34;general-developments&#34;&gt;General developments&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;After many months of work, &lt;strong&gt;all thirteen Trove repositories&lt;/strong&gt; within the GLAM Workbench have been updated to include standard configurations, integrations, and basic tests. This will make ongoing development and maintenance much easier. Docker images of every repository are now built automatically whenever the code changes. These images can be used across multiple computing environments, including cloud services such as &lt;a href=&#34;https://glam-workbench.net/using-binder/&#34;&gt;Binder&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/using-nectar/&#34;&gt;Nectar&lt;/a&gt;, and &lt;a href=&#34;https://glam-workbench.net/using-reclaim-cloud/&#34;&gt;Reclaim Cloud&lt;/a&gt;, as well as on a &lt;a href=&#34;https://glam-workbench.net/using-docker/&#34;&gt;local computer&lt;/a&gt;. This means users have more options for running the notebooks within a consistent, pre-configured, and tested environment.&lt;/li&gt;
&lt;li&gt;With all the Trove repositories now Docker-ised, I worked with the &lt;a href=&#34;https://ardc.edu.au/services/ardc-nectar-research-cloud/&#34;&gt;Nectar Cloud&lt;/a&gt; team to update the &lt;a href=&#34;https://support.ehelp.edu.au/support/solutions/articles/6000253108-nectar-applications-glam-workbench&#34;&gt;&lt;strong&gt;GLAM Workbench app&lt;/strong&gt;&lt;/a&gt;. Now when you install the app, you can select from any of the Trove repositories (excluding API intro and Random items), as well as the &lt;a href=&#34;https://glam-workbench.net/recordsearch/&#34;&gt;NAA RecordSearch&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;Web Archives&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/digitalnz/&#34;&gt;Digital NZ&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/tepapa/&#34;&gt;Te Papa&lt;/a&gt; repositories.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Datasette Lite integration!&lt;/strong&gt; &lt;a href=&#34;https://github.com/simonw/datasette-lite&#34;&gt;Datasette Lite&lt;/a&gt; turns CSV and JSON files into fully searchable databases running within your browser (no server required). I spent some time creating a &lt;a href=&#34;https://github.com/GLAM-Workbench/datasette-lite&#34;&gt;customised version of Datasette Lite&lt;/a&gt; for the GLAM Workbench. Now I can just point the urls for CSV datasets to my Datasette Lite repository to open them up for quick exploration – &lt;a href=&#34;https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/trove-journals/blob/master/digital-journals-with-text-20220831.csv#/data/digital-journals-with-text-20220831&#34;&gt;like this&lt;/a&gt;! I’ve started adding &lt;strong&gt;Explore in Datasette&lt;/strong&gt; buttons to dataset pages in the GLAM Workbench. Some examples are mentioned below.&lt;/li&gt;
&lt;/ul&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2023/6bfdff462d.png&#34; width=&#34;600&#34; height=&#34;193&#34; alt=&#34;Screenshot showing an example of an ‘Explore in Datasette&#39; button.&#34; /&gt;
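&lt;p&gt;The URL pattern is simple enough to generate programmatically. The sketch below is inferred from the example link above (the &lt;code&gt;?csv=&lt;/code&gt; parameter and &lt;code&gt;#/data/&lt;/code&gt; fragment are assumptions based on that one example, so check a current link if it stops working):&lt;/p&gt;

```python
def datasette_lite_url(csv_url):
    """Build a link that opens a CSV file in the GLAM Workbench's
    customised Datasette Lite. The URL pattern is inferred from the
    example links on the site, not from any official spec."""
    # Datasette names the table after the file, minus its extension.
    table = csv_url.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return f"https://glam-workbench.net/datasette-lite/?csv={csv_url}#/data/{table}"

url = datasette_lite_url(
    "https://github.com/GLAM-Workbench/trove-journals/blob/master/digital-journals-with-text-20220831.csv"
)
print(url)
```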
&lt;h2 id=&#34;new-sections&#34;&gt;New sections&lt;/h2&gt;
&lt;h3 id=&#34;trove-contributorshttpsglam-workbenchnettrove-contributors&#34;&gt;&lt;a href=&#34;https://glam-workbench.net/trove-contributors/&#34;&gt;Trove contributors&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Trove aggregates metadata from thousands of organisations and projects around Australia. Data about contributors is available from the &lt;code&gt;/contributor&lt;/code&gt; endpoint of the Trove API. This section includes examples of harvesting and exploring this data.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/93226e3f32.png&#34; width=&#34;600&#34; height=&#34;735&#34; alt=&#34;Table showing the top five metadata contributors per Trove zone.&#34; /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Notebooks:&lt;/strong&gt; One notebook &lt;a href=&#34;https://glam-workbench.net/trove-contributors/get_contributors/&#34;&gt;converts the nested data&lt;/a&gt; available from the &lt;code&gt;/contributor&lt;/code&gt; endpoint into a single flat list of contributors. Another uses this list &lt;a href=&#34;https://glam-workbench.net/trove-contributors/get_contributors_totals_zone_format/&#34;&gt;to find out the number of records contributed by each organisation&lt;/a&gt;, aggregated by zone and format.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Datasets&lt;/strong&gt;: Three datasets, generated by the code in the notebooks above, are being updated weekly – a &lt;a href=&#34;https://glam-workbench.net/trove-contributors/trove-contributors-list/&#34;&gt;list of organisations contributing metadata to Trove&lt;/a&gt;, a &lt;a href=&#34;https://glam-workbench.net/trove-contributors/trove-contributors-zones/&#34;&gt;count of records by contributor and zone&lt;/a&gt;, and a &lt;a href=&#34;https://glam-workbench.net/trove-contributors/trove-contributors-formats/&#34;&gt;count of records by contributor, zone, and format&lt;/a&gt;. You can explore them using Datasette Lite.&lt;/li&gt;
&lt;/ul&gt;
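&lt;p&gt;The flattening step is straightforward to sketch. This is just an illustration of the general approach, not the notebook&amp;rsquo;s actual code, and the field names (&lt;code&gt;children&lt;/code&gt;, &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;) are assumptions – check the JSON actually returned by the &lt;code&gt;/contributor&lt;/code&gt; endpoint:&lt;/p&gt;

```python
def flatten_contributors(records, parent=None):
    """Recursively flatten nested contributor records into a single
    flat list, keeping track of each record's parent. Field names
    ('children', 'id', 'name') are illustrative, not guaranteed to
    match the real /contributor response."""
    flat = []
    for record in records:
        flat.append({
            "id": record.get("id"),
            "name": record.get("name"),
            "parent": parent,
        })
        # Child agencies are nested inside their parent record.
        flat.extend(
            flatten_contributors(record.get("children", []), parent=record.get("id"))
        )
    return flat

nested = [
    {"id": "NLA", "name": "National Library of Australia",
     "children": [{"id": "NLA:PA", "name": "People Australia"}]}
]
print(flatten_contributors(nested))
```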
&lt;h3 id=&#34;trove-api-v3httpsglam-workbenchnettrove-api-v3&#34;&gt;&lt;a href=&#34;https://glam-workbench.net/trove-api-v3/&#34;&gt;Trove API v3&lt;/a&gt;&lt;/h3&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/fffe8f9381.png&#34; width=&#34;600&#34; height=&#34;411&#34; alt=&#34;Screenshot of the new v3 beta section of the Trove API Console&#34; /&gt;
&lt;p&gt;This is a temporary section of the GLAM Workbench created to bring together information and examples relating to version 3 beta of the Trove API. It will probably disappear once the new version is officially released and I reorganise the Trove sections of the GLAM Workbench accordingly. It currently includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A link to the new &lt;a href=&#34;https://troveconsole.herokuapp.com/v3/&#34;&gt;v3 beta section of the Trove API Console&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A &lt;a href=&#34;https://glam-workbench.net/trove-api-v3/#summary-of-breaking-changes&#34;&gt;summary of breaking changes&lt;/a&gt; in the new version – these are the things you&amp;rsquo;ll need to change in your code before v2 of the API is switched off in early 2024.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Over the coming months I’ll be adding updated notebooks and examples from other Trove sections.&lt;/p&gt;
&lt;h2 id=&#34;updated-sections&#34;&gt;Updated sections&lt;/h2&gt;
&lt;h3 id=&#34;trove-newspaper--gazette-harvesterhttpsglam-workbenchnettrove-harvester&#34;&gt;&lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove newspaper &amp;amp; gazette harvester&lt;/a&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;New notebook!&lt;/strong&gt; When I overhauled the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-harvester/&#34;&gt;trove-newspaper-harvester&lt;/a&gt; Python package last year, I made it possible to use the harvester as a library as well as a command line tool. This means you can integrate the harvester into your own tools or workflows. This new notebook, &lt;a href=&#34;https://glam-workbench.net/trove-harvester/harvest-specific-days/&#34;&gt;Harvesting articles that mention &amp;ldquo;Anzac Day&amp;rdquo; on Anzac Day&lt;/a&gt;, gives an example of a complex search query where it might be easier to use the harvester as a library – in this case, finding newspaper articles with particular keywords, published on a particular day, over a span of years!&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;trove-unpublished-works-diaries-letters-and-archiveshttpsglam-workbenchnettrove-unpublished&#34;&gt;&lt;a href=&#34;https://glam-workbench.net/trove-unpublished/&#34;&gt;Trove unpublished works (diaries, letters, and archives)&lt;/a&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Repository upgraded&lt;/strong&gt;: Python packages updated, configuration files standardised, basic tests added, automated Docker builds configured, integrations with Zenodo and Reclaim Cloud added.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New notebooks!&lt;/strong&gt; I’ve added a series of notebooks related to finding, using, &amp;amp; analysing the NLA&amp;rsquo;s digitised manuscript finding aids (which are somewhat submerged beneath other content). One notebook, &lt;a href=&#34;https://glam-workbench.net/trove-unpublished/find-finding-aids/&#34;&gt;finds the finding aids&lt;/a&gt;, another one &lt;a href=&#34;https://glam-workbench.net/trove-unpublished/get-info-finding-aids/&#34;&gt;gathers some summary information&lt;/a&gt; about all 2,337 finding aids, and a third helps you &lt;a href=&#34;https://glam-workbench.net/trove-unpublished/convert-fa-to-json/&#34;&gt;reconstruct the hierarchy from the HTML&lt;/a&gt; of a single finding aid &amp;amp; saves the content as JSON.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New datasets!&lt;/strong&gt; I used the new finding aids notebooks to generate a couple of datasets. One is just a &lt;a href=&#34;https://glam-workbench.net/trove-unpublished/finding-aids-urls/&#34;&gt;list of Trove urls&lt;/a&gt; pointing to finding aids. The other dataset provides some &lt;a href=&#34;https://glam-workbench.net/trove-unpublished/finding-aids-summary/&#34;&gt;summary information about each finding aid&lt;/a&gt; – this includes the number of items described, digitised, and searchable. You can &lt;a href=&#34;https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/nla-finding-aids-data/blob/main/finding-aids-totals.csv&#34;&gt;explore the summary data&lt;/a&gt; using Datasette Lite.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;trove-music--soundhttpsglam-workbenchnettrove-music&#34;&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/&#34;&gt;Trove music &amp;amp; sound&lt;/a&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Repository upgraded&lt;/strong&gt;: Python packages updated, configuration files standardised, basic tests added, automated Docker builds configured, integrations with Zenodo and Reclaim Cloud added.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Updated harvest&lt;/strong&gt; of &lt;a href=&#34;https://glam-workbench.net/trove-music/abcrn-data/&#34;&gt;ABC Radio National program metadata&lt;/a&gt;: there are now 421,277 records from about 163 programs, though there don&amp;rsquo;t seem to have been any additions since early 2022.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;random-items-from-trovehttpsglam-workbenchnettrove-random&#34;&gt;&lt;a href=&#34;https://glam-workbench.net/trove-random/&#34;&gt;Random items from Trove&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This section documents some ways of retrieving &lt;em&gt;random-ish&lt;/em&gt; works and newspaper articles from Trove.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Repository upgraded&lt;/strong&gt;: Python packages updated, configuration files standardised, basic tests added, automated Docker builds configured, integrations with Zenodo and Reclaim Cloud added.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;related-developments&#34;&gt;Related developments&lt;/h2&gt;
&lt;p&gt;I’ve also created a couple of new ‘git scrapers’ to capture information about Trove. These are just bits of code that run on a schedule using GitHub Actions and save their results into a GitHub repository. Along with the &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;Trove Newspaper Data Dashboard&lt;/a&gt; and other &lt;a href=&#34;https://zenodo.org/communities/trove-historical-data/?page=1&amp;amp;size=20&#34;&gt;historical datasets&lt;/a&gt;, these help researchers understand how the contents of Trove change over time.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-zone-totals&#34;&gt;trove-zone-totals&lt;/a&gt;: saves data about the contents of Trove&amp;rsquo;s zones (this will need to be changed to categories with the release of v3 of the Trove API)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-contributor-totals&#34;&gt;trove-contributor-totals&lt;/a&gt;: saves details of organisations and projects that contribute metadata to Trove&lt;/li&gt;
&lt;/ul&gt;
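&lt;p&gt;The core of a git scraper is tiny: harvest some data, write it to a dated file, and let a scheduled workflow commit the result so the repository&amp;rsquo;s history becomes the time series. A minimal sketch of the save step (illustrative only – the real scrapers fetch their data from the Trove API):&lt;/p&gt;

```python
import json
from datetime import date
from pathlib import Path

def save_snapshot(data, data_dir="data"):
    """Save today's harvested totals as a dated JSON file. A GitHub
    Actions workflow with a cron schedule would run this and then
    commit any changes back to the repository -- that commit history
    is what makes it a 'git scraper'."""
    path = Path(data_dir)
    path.mkdir(exist_ok=True)
    outfile = path / f"totals-{date.today().isoformat()}.json"
    outfile.write_text(json.dumps(data, indent=2))
    return outfile

# In a real scraper the data would come from the Trove API.
snapshot = save_snapshot({"newspapers": 1000}, data_dir="example-data")
print(snapshot)
```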
</description>
    </item>
    
    <item>
      <title>Maps, people, lists &amp; more – recent updates to Trove resources in the GLAM Workbench</title>
      <link>https://updates.timsherratt.org/2023/02/17/maps-people-lists.html</link>
      <pubDate>Fri, 17 Feb 2023 16:53:07 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2023/02/17/maps-people-lists.html</guid>
      <description>&lt;p&gt;Once again I’ve gotten a bit behind in noting &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; updates, so here’s a quick catch up on some Trove-related changes from the last couple of months.&lt;/p&gt;
&lt;h2 id=&#34;trove-api-introduction&#34;&gt;Trove API introduction&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/trove/&#34;&gt;section that introduces the Trove API&lt;/a&gt; (or APIs!) hasn’t had much love over recent years. I’m hoping to add some more content in the coming months, but for now I just did a bit of maintenance – updating Python packages and config files, including tests, and setting up automated builds of Docker containers. The documentation pages have also had a bit of a refresh. More soon!&lt;/p&gt;
&lt;h2 id=&#34;trove-lists-and-tags&#34;&gt;Trove lists and tags&lt;/h2&gt;
&lt;p&gt;For the Everyday Heritage digital workshop last November, I added a new notebook to &lt;a href=&#34;https://glam-workbench.net/trove-lists/convert-list-to-cb-exhibition/&#34;&gt;convert a Trove list into a CollectionBuilder exhibition&lt;/a&gt;. The notebook harvests metadata and images from the items in a Trove list, then packages everything up in a form that can be uploaded to a &lt;a href=&#34;https://github.com/CollectionBuilder/collectionbuilder-gh&#34;&gt;CollectionBuilder-GH&lt;/a&gt; repository. Just upload the files to create your own instant exhibition! For example, &lt;a href=&#34;https://wragge.github.io/trove-wragge-list-demo/&#34;&gt;this exhibition&lt;/a&gt; was generated from this &lt;a href=&#34;https://trove.nla.gov.au/list/83777&#34;&gt;Trove list&lt;/a&gt;.&lt;/p&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2023/6588d32798.png&#34; width=&#34;600&#34; height=&#34;374&#34; alt=&#34;Screenshot of CollectionBuilder exhibition front page, showing the image viewer and subject navigation.&#34; /&gt;
&lt;p&gt;More recently, I updated and reorganised &lt;a href=&#34;https://glam-workbench.net/trove-lists/&#34;&gt;the documentation pages&lt;/a&gt;. In particular, I updated the links to the pre-harvested datasets which are now all saved in Zenodo:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.6827253&#34;&gt;Trove lists metadata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.6814722&#34;&gt;Trove public tags&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.7563995&#34;&gt;Trove tag counts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These datasets have also been added to the &lt;a href=&#34;https://zenodo.org/communities/trove-historical-data/?page=1&amp;amp;size=20&#34;&gt;Trove historical data collection&lt;/a&gt; in Zenodo.&lt;/p&gt;
&lt;h2 id=&#34;trove-maps&#34;&gt;Trove maps&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/trove-maps/&#34;&gt;Trove Maps&lt;/a&gt; section also needed a refresh and rebuild. The &lt;a href=&#34;https://glam-workbench.net/trove-maps/exploring-digitised-maps/&#34;&gt;existing notebook&lt;/a&gt; harvested metadata about digitised maps to build an overview of what was available. I updated the harvesting code to capture a bit more information, including spatial coordinates where available. These coordinates aren’t available through the API, and aren’t visible on a map’s web page, but they &lt;em&gt;are&lt;/em&gt; embedded within the HTML (this is a &lt;a href=&#34;https://glam-workbench.net/trove-books/metadata-for-digital-works/&#34;&gt;little trick&lt;/a&gt; I’ve used with other digitised materials to get additional metadata). You can download the &lt;a href=&#34;https://glam-workbench.net/trove-maps/single-maps-data/&#34;&gt;updated dataset&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I also added a new notebook that attempts to parse the spatial data strings and convert the coordinates into decimal values that we can display on a map. Some of the coordinates were bounding boxes, while others were just points. If there was a bounding box, I calculated the centre point and saved that as well. I ended up with decimal coordinates for 26,591 digitised maps. You can download the &lt;a href=&#34;https://glam-workbench.net/trove-maps/single-maps-coordinates-data/&#34;&gt;parsed and converted coordinates&lt;/a&gt; as a dataset.&lt;/p&gt;
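&lt;p&gt;The conversion itself is simple once the degrees, minutes, and seconds have been pulled out of the string. This sketch shows only that final step (the actual parsing in the notebook has to cope with the many variant formats found in the metadata):&lt;/p&gt;

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to a
    decimal value. Southern and western hemispheres become negative."""
    value = degrees + minutes / 60 + seconds / 3600
    if hemisphere in ("S", "W"):
        value = -value
    return value

def bbox_centre(west, east, north, south):
    """Return the centre point of a bounding box as (latitude, longitude),
    with all values already in decimal degrees."""
    return ((north + south) / 2, (west + east) / 2)

# e.g. a map covering longitudes 144-150 E and latitudes 34-38 S
print(bbox_centre(144.0, 150.0, -34.0, -38.0))
```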
&lt;p&gt;From this data I was able to generate some interesting visualisations. I used &lt;a href=&#34;https://python-visualization.github.io/folium/index.html&#34;&gt;Folium&lt;/a&gt; and the &lt;a href=&#34;https://python-visualization.github.io/folium/plugins.html#folium.plugins.FastMarkerCluster&#34;&gt;FastMarkerCluster&lt;/a&gt; plugin to map all of the centre points. I’ve had trouble before displaying lots of markers in Jupyter, but FastMarkerCluster handled it easily. I also saved the Folium map as &lt;a href=&#34;https://glam-workbench.net/trove-maps/trove-map-clusters.html&#34;&gt;an HTML page&lt;/a&gt; to make it easy for people to explore. Just zoom around and click on markers to display the map’s title and a link to Trove.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-maps/trove-map-clusters.html&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/0b83b60e03.png&#34; width=&#34;600&#34; height=&#34;468&#34; alt=&#34;Screenshot of cluster map.&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Where there are bounding boxes, you can overlay the map images themselves on a modern map. Of course, this isn’t as accurate as georectifying the map, particularly if the map doesn’t fill the whole image, but it’s still pretty fun. There’s a demonstration in the notebook that selects a random map and overlays the image on a modern basemap using &lt;a href=&#34;https://ipyleaflet.readthedocs.io/en/latest/&#34;&gt;IPyleaflet&lt;/a&gt;. It includes a widget to adjust the opacity of the map image (something I didn’t seem to be able to include using Folium?).&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/c9f69f7544.png&#34; width=&#34;600&#34; height=&#34;292&#34; alt=&#34;Screenshot of a historical map of the Nile overlaid on a modern map of Egypt.&#34; /&gt;
&lt;h2 id=&#34;trove-people--organisations&#34;&gt;Trove People &amp;amp; Organisations&lt;/h2&gt;
&lt;p&gt;I’ve finally added a &lt;a href=&#34;https://glam-workbench.net/trove-people/&#34;&gt;section for the Trove People &amp;amp; Organisations zone&lt;/a&gt;! This has been in the works for a while, but thanks to the &lt;a href=&#34;https://www.acd-engine.org/&#34;&gt;Australian Cultural Data Engine&lt;/a&gt; I was able to devote some time to it. Trove&amp;rsquo;s People and Organisations zone aggregates information about individuals and organisations, bringing multiple sources together under a single identifier. Data is available from a series of APIs, which are not well-documented. The notebooks show you how to &lt;a href=&#34;https://glam-workbench.net/trove-people/complete_harvest/&#34;&gt;harvest all the available people and organisations data&lt;/a&gt; as EAC-CPF encoded XML files. Once you have the data, you can extract some summary information about sources and occupations, and use this to &lt;a href=&#34;https://glam-workbench.net/trove-people/intersections/&#34;&gt;explore the way that records from different sources have been merged into unique identities&lt;/a&gt;. For example, you can create a network graph of relationships between sources.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/a33ef94fc9.png&#34; width=&#34;600&#34; height=&#34;585&#34; alt=&#34;Screenshot of a network graph displaying links between data sources.&#34; /&gt;
&lt;p&gt;Or use an UpSet chart to show the most common groupings of sources.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2023/50fe7a2836.png&#34; width=&#34;600&#34; height=&#34;301&#34; alt=&#34;Screenshot of an UpSet chart.&#34; /&gt;
&lt;p&gt;There’s also a couple of notebooks with some handy examples of code to &lt;a href=&#34;https://glam-workbench.net/trove-people/get_sru_results_as_json/&#34;&gt;convert the XML results from the SRU API to JSON&lt;/a&gt;, and to &lt;a href=&#34;https://glam-workbench.net/trove-people/viaf/&#34;&gt;find extra identity links through VIAF&lt;/a&gt;. I’ve also shared a pre-harvested version of the &lt;a href=&#34;https://glam-workbench.net/trove-people/complete_harvest_dataset/&#34;&gt;complete dataset&lt;/a&gt; and the &lt;a href=&#34;https://glam-workbench.net/trove-people/aggregated_datasets/&#34;&gt;extracted summaries&lt;/a&gt;.&lt;/p&gt;
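&lt;p&gt;The XML-to-JSON conversion can be sketched with the standard library. This is a deliberately naive illustration of the idea, not the notebook&amp;rsquo;s code – real SRU responses are EAC-CPF XML with namespaces, attributes, and repeated elements that need more careful handling:&lt;/p&gt;

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(element):
    """Recursively convert an XML element (e.g. part of an SRU search
    result) into plain Python dicts, ready for json.dumps. A very
    simple approach -- attributes and repeated sibling elements with
    the same tag are not handled."""
    children = list(element)
    if not children:
        return element.text
    return {child.tag: element_to_dict(child) for child in children}

# Build a tiny stand-in for an SRU record (real responses are
# EAC-CPF XML with many more fields).
record = ET.Element("record")
name = ET.SubElement(record, "name")
name.text = "Tim Sherratt"
print(json.dumps({record.tag: element_to_dict(record)}))
```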
</description>
    </item>
    
    <item>
      <title>Recent presentations – Library of Congress Data Jam, Everyday Heritage, Wikidata, and GLAM Workbench!</title>
      <link>https://updates.timsherratt.org/2022/12/10/recent-presentations-library.html</link>
      <pubDate>Sat, 10 Dec 2022 14:00:33 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/12/10/recent-presentations-library.html</guid>
      <description>&lt;p&gt;October and November brought a flurry of presentations from which I’m still recovering. Here’s a few details and links.&lt;/p&gt;
&lt;h2 id=&#34;library-of-congress-data-jam&#34;&gt;Library of Congress Data Jam&lt;/h2&gt;
&lt;p&gt;In October, the &lt;em&gt;Computing Cultural Heritage in the Cloud&lt;/em&gt; project at the Library of Congress organised a &lt;a href=&#34;https://blogs.loc.gov/thesignal/2022/12/now-playing-the-cchc-data-jam/&#34;&gt;Data Jam&lt;/a&gt;. I was invited to spend a couple of weeks playing around with one of their datasets and to report on the results. I ended up trying to find references to countries in a collection of 90,000 OCRd books. Of course I struck a few problems along the way, and didn’t get nearly as much done as I’d hoped, but that was really part of the point – to find the problems and explore the possibilities. You can watch a &lt;a href=&#34;https://vimeo.com/763639279&#34;&gt;video of my presentation&lt;/a&gt;, or the &lt;a href=&#34;https://www.loc.gov/item/webcast-10669/?loclr=blogsig&#34;&gt;whole Data Jam&lt;/a&gt;.&lt;/p&gt;
&lt;div style=&#34;padding:56.25% 0 0 0;position:relative;&#34;&gt;&lt;iframe src=&#34;https://player.vimeo.com/video/763639279?h=847e5ae3ce&amp;badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479&#34; frameborder=&#34;0&#34; allow=&#34;autoplay; fullscreen; picture-in-picture&#34; allowfullscreen style=&#34;position:absolute;top:0;left:0;width:100%;height:100%;&#34; title=&#34;LoC Data Jam Project&#34;&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;script src=&#34;https://player.vimeo.com/api/player.js&#34;&gt;&lt;/script&gt;
&lt;p&gt;You can also:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://slides.com/wragge/loc-datajam&#34;&gt;View the slides of my presentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/loc-datajam&#34;&gt;Examine the Jupyter notebooks I used to process the data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://loc-books-yajhxrvxsa-ts.a.run.app/&#34;&gt;Explore the data I extracted using Datasette&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://wragge.github.io/loc-books-demo/&#34;&gt;Play with a simple app based on the extracted data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/loc-books-demo&#34;&gt;View the app code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;wikimedia-australia-community-meeting&#34;&gt;Wikimedia Australia Community Meeting&lt;/h2&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2022/071c7cad60.png&#34; width=&#34;600&#34; height=&#34;571&#34; alt=&#34;Network visualisation of Australian government departments consisting of a series of hierarchically-ordered coloured nodes connected by curved lines. Each node represents a department, the colour indicates the decade in which it was created and the size indicates its lifespan. The connecting lines join departments and their successors. The chart is ordered chronologically with 1901 at the top and the present day at the bottom.&#34; /&gt;
&lt;p&gt;Thanks to a grant from Wikimedia Australia, I’ve been able to spend some time this year working to align information about government agencies in Wikidata with details from the National Archives of Australia’s RecordSearch database. You can read about the project in this &lt;a href=&#34;https://wikimedia.org.au/wiki/Exploring_government_departments_by_linking_Wikidata_to_the_National_Archives_of_Australia&#34;&gt;blog post&lt;/a&gt;. In November, I gave a report on the project at Wikimedia Australia’s monthly Community Meeting. You can watch the &lt;a href=&#34;https://youtu.be/XkQ_-QgUOBw&#34;&gt;video on YouTube&lt;/a&gt;. You can also explore the &lt;a href=&#34;https://glam-workbench.net/wikidata/&#34;&gt;new Wikidata section&lt;/a&gt; of the GLAM Workbench.&lt;/p&gt;
&lt;h2 id=&#34;everyday-heritage-workshop-and-symposium&#34;&gt;Everyday Heritage workshop and symposium&lt;/h2&gt;
&lt;p&gt;November also brought the first of what will be a series of annual symposia relating to the ARC-funded &lt;a href=&#34;https://everydayheritage.au/&#34;&gt;Everyday Heritage project&lt;/a&gt;. On 9 November, Kate Bagnall and I ran a &lt;a href=&#34;https://everydayheritage.au/events/workshop-november/&#34;&gt;Connecting People and Place&lt;/a&gt; workshop at the University of Canberra. We walked participants through some ways of finding and using digital collections such as Trove, and worked with them to create projects based on their own research using &lt;a href=&#34;https://storymap.knightlab.com/&#34;&gt;StoryMapJS&lt;/a&gt; and &lt;a href=&#34;https://collectionbuilder.github.io/&#34;&gt;CollectionBuilder&lt;/a&gt;. You can view the &lt;a href=&#34;https://slides.com/wragge/trove-newspapers-tips-tricks&#34;&gt;slides of my Trove tips &amp;amp; tricks presentation&lt;/a&gt;. In preparation for the workshop I also created a new notebook in the GLAM Workbench that &lt;a href=&#34;https://glam-workbench.net/trove-lists/#convert-a-trove-list-into-a-collectionbuilder-exhibition&#34;&gt;converts a Trove List into a CollectionBuilder exhibition&lt;/a&gt;.&lt;/p&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2022/4c1592995e.png&#34; width=&#34;600&#34; height=&#34;672&#34; alt=&#34;Screenshot of tweet showing participants at the Connecting People and Place workshop.&#34; /&gt;
&lt;p&gt;The following day the Everyday Heritage Symposium, &lt;a href=&#34;https://everydayheritage.au/news/everyday-heritage-project-2022-public-workshop-and-symposium/&#34;&gt;Connecting digital archives people &amp;amp; place&lt;/a&gt;, was held at the National Film and Sound Archive in Canberra. You can view the slides from my presentation &lt;a href=&#34;https://slides.com/wragge/everyday-heritage-2022&#34;&gt;Access to the everyday through digital collections&lt;/a&gt;. You can also &lt;a href=&#34;https://vimeo.com/776032956&#34;&gt;watch a video of the full symposium&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;building-dh&#34;&gt;Building DH&lt;/h2&gt;
&lt;p&gt;On 15 November I took part in a panel discussion on &lt;a href=&#34;https://web.cvent.com/event/811e389e-78de-46cd-877d-b20b9ae9ed85/websitePage:ab53bd08-f535-4c29-b2f9-af0b5e31c2d2?RefId=15%20NOV%20Session%2002&#34;&gt;Designing user-friendly Platforms and Toolkits for Digital Humanities&lt;/a&gt; as part of the Building DH online conference. You can &lt;a href=&#34;https://youtu.be/CvWRBjl0VYs&#34;&gt;watch the video of the session&lt;/a&gt; on YouTube.&lt;/p&gt;
&lt;h2 id=&#34;capos-2022&#34;&gt;CAPOS 2022&lt;/h2&gt;
&lt;p&gt;On 16 November I gave the feature presentation at &lt;a href=&#34;https://inke.ca/reviewing-revising-and-refining-open-social-scholarship-australasia/&#34;&gt;Reviewing, Revising, and Refining Open Social Scholarship: Australasia&lt;/a&gt;, an event organised by the Canadian-Australian Partnership for Open Scholarship. My talk, ‘DIY Infrastructure – Building the GLAM Workbench’, described work I’ve been doing over the past year or so to make the GLAM Workbench more sustainable by automating and standardising basic tasks, and integrating it with a range of existing services and tools. You can view &lt;a href=&#34;https://echo360.ca/media/14297fe8-2a18-4f60-9da6-5986fde1fcfb/public&#34;&gt;the video of my talk&lt;/a&gt;, or &lt;a href=&#34;https://slides.com/wragge/capos22&#34;&gt;browse the slides&lt;/a&gt;.&lt;/p&gt;
&lt;iframe src=&#34;https://slides.com/wragge/capos22/embed&#34; width=&#34;600&#34; height=&#34;420&#34; title=&#34;DIY Infrastructure: Building the GLAM Workbench&#34; scrolling=&#34;no&#34; frameborder=&#34;0&#34; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;
&lt;h2 id=&#34;worlds-of-wikimedia-2022&#34;&gt;Worlds of Wikimedia 2022&lt;/h2&gt;
&lt;p&gt;Finally, on 17 November, I gave a keynote presentation at the &lt;a href=&#34;https://www.wow2022.net/&#34;&gt;Worlds of Wikimedia Conference&lt;/a&gt;. My talk, ‘Portals, platforms, and participation: building online collaboration around GLAM collections’, revisited my &lt;a href=&#34;https://doi.org/10.5281/zenodo.3563238&#34;&gt;Portals to platforms&lt;/a&gt; paper from 2014, looking at ways that people can engage and create in the space around GLAM collections. You can &lt;a href=&#34;https://slides.com/wragge/wow2022&#34;&gt;view my slides online&lt;/a&gt;.&lt;/p&gt;
&lt;iframe src=&#34;https://slides.com/wragge/wow2022/embed&#34; width=&#34;600&#34; height=&#34;420&#34; title=&#34;Portals, platforms, and participation&#34; scrolling=&#34;no&#34; frameborder=&#34;0&#34; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;
</description>
    </item>
    
    <item>
      <title>Do you want your Trove newspaper articles in bulk? Meet the new Trove Newspaper Harvester Python package!</title>
      <link>https://updates.timsherratt.org/2022/09/22/do-you-want.html</link>
      <pubDate>Thu, 22 Sep 2022 14:53:49 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/09/22/do-you-want.html</guid>
      <description>&lt;p&gt;The Trove Newspaper Harvester has been around in different forms for more than a decade. It helps you download all the articles in a &lt;a href=&#34;https://trove.nla.gov.au/&#34;&gt;Trove&lt;/a&gt; newspaper search, opening up new possibilities for large-scale analysis. You can use it as a command-line tool by installing a Python package, or through the &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper Harvester&lt;/a&gt; section of the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;I’ve just overhauled development of the Python package. The new &lt;a href=&#34;https://github.com/wragge/trove-newspaper-harvester&#34;&gt;trove-newspaper-harvester&lt;/a&gt; replaces the old &lt;a href=&#34;https://github.com/wragge/troveharvester&#34;&gt;troveharvester&lt;/a&gt; repository. The command-line interface remains the same (with a few new options), so it’s really a drop-in replacement. &lt;a href=&#34;https://wragge.github.io/trove-newspaper-harvester/&#34;&gt;&lt;strong&gt;Read the full documentation&lt;/strong&gt;&lt;/a&gt; of the new package for more details.&lt;/p&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2022/7b5c0bea8e.png&#34; width=&#34;600&#34; height=&#34;483&#34; alt=&#34;Screenshot of the trove-newspaper-harvester documentation describing its use as a Python library.&#34; /&gt;
&lt;p&gt;Here’s a summary of the changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the package can &lt;strong&gt;now be used as a &lt;a href=&#34;https://wragge.github.io/trove-newspaper-harvester/core.html&#34;&gt;library&lt;/a&gt;&lt;/strong&gt; (that you incorporate into your own code) as well as a standalone &lt;a href=&#34;https://wragge.github.io/trove-newspaper-harvester/cli.html&#34;&gt;command-line tool&lt;/a&gt; – this means you can embed the harvester in your own tools or workflows&lt;/li&gt;
&lt;li&gt;both the library and the CLI now let you set the names of the directories in which your harvests will be saved – this makes it easier to organise your harvests into groups and give them meaningful names&lt;/li&gt;
&lt;li&gt;the harvesting process now &lt;a href=&#34;https://wragge.github.io/trove-newspaper-harvester/core.html#results&#34;&gt;saves results&lt;/a&gt; into a newline-delimited JSON file (one JSON object per line) – the library has a &lt;code&gt;save_csv()&lt;/code&gt; option to convert this to a CSV file, while the CLI automatically converts the results to CSV to maintain compatibility with previous versions&lt;/li&gt;
&lt;li&gt;behind the scenes, the package is now developed and maintained using &lt;a href=&#34;https://nbdev.fast.ai/&#34;&gt;nbdev&lt;/a&gt; – this means the code and documentation are all generated from a set of Jupyter notebooks&lt;/li&gt;
&lt;li&gt;the Jupyter notebooks include a variety of automatic tests which should make maintenance and development much easier in the future&lt;/li&gt;
&lt;/ul&gt;
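&lt;p&gt;The newline-delimited JSON format is easy to work with in plain Python. Here’s a minimal sketch of the NDJSON-to-CSV conversion described above – it’s not the package’s own &lt;code&gt;save_csv()&lt;/code&gt; implementation, just an illustration of the idea using the standard library, with made-up article records:&lt;/p&gt;

```python
import csv
import io
import json

def ndjson_to_csv(ndjson_text, csv_file):
    """Convert newline-delimited JSON (one object per line) to CSV."""
    records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    # Collect every field name so rows with missing keys still align
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return len(records)

# Two harvested-article-like records (hypothetical data)
ndjson = '{"article_id": "123", "title": "Flood news"}\n{"article_id": "456", "title": "Drought", "page": 4}'
output = io.StringIO()
count = ndjson_to_csv(ndjson, output)
```

&lt;p&gt;Because field names are collected across all records, later records can introduce new fields without breaking earlier rows – one reason NDJSON makes a forgiving intermediate format for harvests.&lt;/p&gt;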
&lt;p&gt;I’ve also updated the &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper Harvester section&lt;/a&gt; of the GLAM Workbench to use the new package. The new core library will make it easier to develop more complex harvesting examples – for example, searching for articles from a specific day across a range of years. If you find any problems, or want to suggest improvements, please &lt;a href=&#34;https://github.com/wragge/trove-newspaper-harvester/issues/new&#34;&gt;raise an issue&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>From 48 PDFs to one searchable database – opening up the Tasmanian Post Office Directories with the GLAM Workbench</title>
      <link>https://updates.timsherratt.org/2022/09/15/from-pdfs-to.html</link>
      <pubDate>Thu, 15 Sep 2022 23:15:21 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/09/15/from-pdfs-to.html</guid>
      <description>&lt;p&gt;A few weeks ago I &lt;a href=&#34;https://updates.timsherratt.org/2022/09/01/making-nsw-postal.html&#34;&gt;created a new search interface&lt;/a&gt; to the &lt;a href=&#34;https://glam-workbench.net/trove-journals/nsw-post-office-directories/&#34;&gt;NSW Post Office Directories from 1886 to 1950&lt;/a&gt;. Since then, I’ve used the same process on the &lt;a href=&#34;https://glam-workbench.net/trove-journals/sydney-telephone-directories/&#34;&gt;Sydney Telephone Directories from 1926 to 1954&lt;/a&gt;. Both of these publications had been digitised by the State Library of NSW and made available through Trove. To build the new interfaces I downloaded the text from Trove, indexed it by line, and linked it back to the online page images.&lt;/p&gt;
&lt;p&gt;But there are similar directories from other states that are not available through Trove. The Tasmanian Post Office Directory, for example, has been &lt;a href=&#34;https://stors.tas.gov.au/ILS/SD_ILS-981598&#34;&gt;digitised between 1890 and 1948&lt;/a&gt; and made available as 48 individual PDF files from Libraries Tasmania. While it’s great that they’ve been digitised, it’s not really possible to search them without downloading all the PDFs.&lt;/p&gt;
&lt;p&gt;As part of the &lt;a href=&#34;https://everydayheritage.au/&#34;&gt;Everyday Heritage&lt;/a&gt; project, &lt;a href=&#34;https://katebagnall.com/&#34;&gt;Kate Bagnall&lt;/a&gt; and I are working on mapping Tasmania’s Chinese history – finding new ways of connecting people and places. The Tasmanian Post Office Directories will be a useful source for us, so I thought I’d try converting them into a database as I had with the NSW directories. But how?&lt;/p&gt;
&lt;p&gt;There were several stages involved:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Downloading the 48 PDF files&lt;/li&gt;
&lt;li&gt;Extracting the text and images from the PDFs&lt;/li&gt;
&lt;li&gt;Making the separate images available online so they could be integrated with the search interface&lt;/li&gt;
&lt;li&gt;Loading all the text and image links into a SQLite database for online delivery using Datasette&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And here’s the result!&lt;/p&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2022/91aff3faba.png&#34; width=&#34;600&#34; height=&#34;326&#34; alt=&#34;Screenshot of search interface for the Tasmanian Post Office Directories.&#34; /&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/tasmanian-post-office-directories/&#34;&gt;&lt;strong&gt;Search for people and places in Tasmania from 1890 to 1948!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The complete process is documented in a &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/#tasmanian-post-office-directories&#34;&gt;series of notebooks&lt;/a&gt;, shared through the brand new &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/&#34;&gt;Libraries Tasmania section&lt;/a&gt; of the GLAM Workbench. As with the NSW directories, the processing pipeline I developed could be reused with similar publications in PDF form. Any suggestions?&lt;/p&gt;
&lt;h2 id=&#34;some-technical-details&#34;&gt;Some technical details&lt;/h2&gt;
&lt;p&gt;There were some interesting challenges in connecting up all the pieces. &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/tas-pod-save-text-images/&#34;&gt;Extracting the text and images from the PDFs&lt;/a&gt; was remarkably easy using &lt;a href=&#34;https://pymupdf.readthedocs.io/en/latest/&#34;&gt;PyMuPDF&lt;/a&gt;, but the quality of the text wasn’t great. In particular, I had trouble with columns – values from neighbouring columns would be munged together, upsetting the order of the text. I tried working with the positional information provided by PyMuPDF to improve column detection, but every improvement seemed to raise another issue. I was also worried that too much processing might result in some text being lost completely.&lt;/p&gt;
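&lt;p&gt;For the curious, one common approach to the column problem is to cluster words by their left-hand x-coordinates. This is just an illustrative sketch – the word tuples are simplified stand-ins for the positional data PyMuPDF returns, and the fixed gap threshold is exactly the kind of assumption that kept raising new issues:&lt;/p&gt;

```python
def assign_columns(words, gap=50):
    """Group words into columns by clustering their left x-coordinates.

    words: list of (x0, y0, text) tuples, simplified from the output of
    PyMuPDF's page.get_text("words"). A new column starts wherever the
    gap between sorted x0 values exceeds `gap` (an assumed threshold).
    """
    xs = sorted({x0 for x0, _, _ in words})
    boundaries = [xs[0]]
    for prev, curr in zip(xs, xs[1:]):
        if curr - prev > gap:
            boundaries.append(curr)
    columns = [[] for _ in boundaries]
    for x0, y0, text in words:
        # Assign the word to the right-most boundary at or left of it
        idx = max(i for i, b in enumerate(boundaries) if b <= x0)
        columns[idx].append((y0, text))
    # Order each column top to bottom
    return [[t for _, t in sorted(col)] for col in columns]

# Two columns of a directory-style page: (x0, y0, text)
words = [
    (20, 10, "Smith"), (25, 30, "Jones"),
    (300, 10, "grocer"), (305, 30, "baker"),
]
cols = assign_columns(words)
```

&lt;p&gt;A fixed threshold like this works on clean pages but fails on indented entries or skewed scans, which is why tuning it tends to trade one error for another.&lt;/p&gt;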
&lt;p&gt;I tried a few experiments re-OCRing the images with &lt;a href=&#34;https://aws.amazon.com/textract/&#34;&gt;Textract&lt;/a&gt; (a paid service from Amazon) and &lt;a href=&#34;https://github.com/tesseract-ocr/tesseract&#34;&gt;Tesseract&lt;/a&gt;. The basic Textract product provides good OCR, but again I needed to work with the positional information to try and reassemble the columns. On the other hand, Tesseract’s automatic layout detection seemed to work pretty well with just the default settings. It wasn’t perfect, but good enough to support search and navigation. So I decided to &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/tas-pod-ocr-with-tesseract/&#34;&gt;re-OCR all the images using Tesseract&lt;/a&gt;. I’m pretty happy with the result.&lt;/p&gt;
&lt;p&gt;The search interfaces for the NSW directories display page images loaded directly from Trove into an &lt;a href=&#34;https://openseadragon.github.io/&#34;&gt;OpenSeadragon viewer&lt;/a&gt;. The Tasmanian directories have no online images to integrate in this way, so I had to set up some hosting for the images I extracted from the PDFs. I could have just loaded them from an Amazon s3 bucket, but I wanted to use IIIF to deliver the images. Fortunately there’s a great project that uses Amazon’s Lambda service to provide a &lt;a href=&#34;https://github.com/samvera-labs/serverless-iiif&#34;&gt;Serverless IIIF Image API&lt;/a&gt;. To prepare the images for IIIF, you convert them to pyramidal TIFFs (a format that contains an image at a number of different resolutions) using &lt;a href=&#34;https://github.com/libvips/libvips&#34;&gt;VIPS&lt;/a&gt;. Then you upload the TIFFs to an s3 bucket and point the Serverless IIIF app at the bucket. There’s &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/tas-pod-upload-images/&#34;&gt;more details in this notebook&lt;/a&gt;. It’s very easy and seems to deliver images amazingly quickly.&lt;/p&gt;
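&lt;p&gt;Once the IIIF service is running, image requests are just predictable URLs following the IIIF Image API pattern. This little helper shows the URL structure – the endpoint and identifier here are hypothetical, not the actual addresses used by the search interface:&lt;/p&gt;

```python
def iiif_image_url(base, identifier, region="full", size="full",
                   rotation=0, quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    The path follows the Image API pattern:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Hypothetical endpoint and image id for a directory page
url = iiif_image_url(
    "https://iiif.example.org/iiif/2",
    "tas-pod-1890-p001",
    size="600,",  # scale to 600px wide, preserving aspect ratio
)
```

&lt;p&gt;Viewers like OpenSeadragon build exactly these sorts of requests behind the scenes, asking for just the tiles and resolutions they need from the pyramidal TIFFs.&lt;/p&gt;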
&lt;p&gt;The rest of the processing followed the process I used with the NSW directories – &lt;a href=&#34;https://glam-workbench.net/libraries-tasmania/tas-pod-add-to-datasette/&#34;&gt;using SQLite-utils and Datasette to package the data&lt;/a&gt; and deliver it online via Google Cloudrun.&lt;/p&gt;
&lt;h2 id=&#34;postscript-time-and-money&#34;&gt;Postscript: Time and money&lt;/h2&gt;
&lt;p&gt;I thought I should add a little note about costs (time and money) in case anyone was interested in using this workflow on other publications. I started working on this on Sunday afternoon and had a full working version up about 24 hours later – that includes a fair bit of work that I didn’t end up using, but doesn’t include the time I spent re-OCRing the text a day or so later. This was possible because I was reusing bits of code from other projects, and taking advantage of some awesome open-source software. Now that the processing pipeline is pretty well-defined and documented it should be even faster.&lt;/p&gt;
&lt;p&gt;The search interface uses cloud services from Amazon and Google. It’s a bit tricky to calculate the precise costs of these, but here’s a rough estimate.&lt;/p&gt;
&lt;p&gt;I uploaded 63.9gb of images to Amazon s3. These should cost about US$1.47 per month to store.&lt;/p&gt;
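&lt;p&gt;If you want to estimate storage costs for your own collection, the arithmetic is simple. This assumes Amazon’s standard s3 rate of about US$0.023 per gb per month – rates vary by region and storage tier, so check current pricing:&lt;/p&gt;

```python
# Rough s3 standard-storage estimate; the per-gb rate is an assumption,
# check current pricing for your region and tier
gb_stored = 63.9
rate_per_gb_month = 0.023
monthly_cost = round(gb_stored * rate_per_gb_month, 2)
```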
&lt;p&gt;The Serverless IIIF API uses Amazon’s Lambda service. At the moment my usage is within the free tier, so $0 so far.&lt;/p&gt;
&lt;p&gt;The Datasette instance uses Google Cloudrun. Costs for this service are based on a combination of usage, storage space, and the configuration of the environment. The size of the database for the Tasmanian directories is about 600mb, so I can get away with 2gb of application memory. (The NSW Post Office directory currently uses 8gb.) These services scale to zero – so basically they shut down if they’re not being used. This saves a lot of money, but means there can be a pause if they need to start up again. I’m running the Tasmanian and NSW directories, as well as the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index search&lt;/a&gt;, within the same Google Cloud account, and I’m not quite sure how to itemise the costs. But overall, it’s costing me about US$4.00 a month to run them all. Of course if usage increases, so will the costs!&lt;/p&gt;
&lt;p&gt;So I suppose the point is that these sorts of approaches can be quite a practical and cost-effective way of improving access to digitised resources, and don’t need huge investments in time or infrastructure.&lt;/p&gt;
&lt;p&gt;If you want to contribute to the running costs of the NSW and Tasmanian directories you can &lt;a href=&#34;https://github.com/sponsors/wragge?o=esb&#34;&gt;sponsor me on GitHub&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Fresh harvest of OCRd text from Trove&#39;s digitised periodicals – 9gb of text to explore and analyse!</title>
      <link>https://updates.timsherratt.org/2022/09/05/fresh-harvest-of.html</link>
      <pubDate>Mon, 05 Sep 2022 17:13:16 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/09/05/fresh-harvest-of.html</guid>
      <description>&lt;p&gt;I’ve updated the GLAM Workbench’s harvest of OCRd text from Trove&amp;rsquo;s digitised periodicals. This is a completely fresh harvest, so should include any corrections made in recent months. It includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1,430 periodicals&lt;/li&gt;
&lt;li&gt;OCRd text from 41,645 issues&lt;/li&gt;
&lt;li&gt;About 9gb of text&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The easiest way to explore the harvest is probably this &lt;a href=&#34;https://glam-workbench.net/trove-journals/journals-with-ocr/&#34;&gt;human-readable list&lt;/a&gt;. The list of periodicals with OCRd text is also &lt;a href=&#34;https://glam-workbench.net/trove-journals/csv-journals-with-ocr/&#34;&gt;available as a CSV&lt;/a&gt;. You can find &lt;a href=&#34;https://glam-workbench.net/trove-journals/ocrd-text-all-journals/&#34;&gt;more details&lt;/a&gt; in the Trove journals section of the GLAM Workbench, and &lt;a href=&#34;https://cloudstor.aarnet.edu.au/plus/s/QOmnqpGQCNCSC2h&#34;&gt;download the complete corpus&lt;/a&gt; from CloudStor.&lt;/p&gt;
&lt;p&gt;Finding which periodical issues in Trove have OCRd text you can download is not as easy as it should be. The &lt;code&gt;fullTextInd&lt;/code&gt; index doesn&amp;rsquo;t seem to distinguish between digitised works (with OCR) and born-digital publications (like PDFs) without downloadable text. You can use &lt;code&gt;has:correctabletext&lt;/code&gt; to find articles with OCR, but you can&amp;rsquo;t get a full list of the periodicals the articles come from using the &lt;code&gt;title&lt;/code&gt; facet. As &lt;a href=&#34;https://glam-workbench.net/trove-journals/create-list-digitised-journals/&#34;&gt;this notebook explains&lt;/a&gt;, you can search for &lt;code&gt;nla.obj&lt;/code&gt;, but this returns both digitised works and publications supplied through edeposit. In previous harvests of OCRd text I processed all of the titles returned by the &lt;code&gt;nla.obj&lt;/code&gt; search, finding out whether there was any OCRd text by just requesting it and seeing what came back. But the number of non-digitised works on the list of periodicals in digital form has skyrocketed through the edeposit scheme and this approach is no longer practical. It just means you waste a lot of time asking for things that don&amp;rsquo;t exist.&lt;/p&gt;
&lt;p&gt;For the latest harvest I took a different approach. I only processed periodicals in digital form that &lt;em&gt;weren&amp;rsquo;t&lt;/em&gt; identified as coming through edeposit. These are the publications with a &lt;code&gt;fulltext_url_type&lt;/code&gt; value of either &amp;lsquo;digitised&amp;rsquo; or &amp;lsquo;other&amp;rsquo; in my &lt;a href=&#34;https://glam-workbench.net/trove-journals/csv-digital-journals/&#34;&gt;dataset of digital periodicals&lt;/a&gt;. Is it possible that there&amp;rsquo;s some downloadable text in edeposit works that&amp;rsquo;s now missing from the harvest? Yep, but I think this is a much more sensible, straightforward, and reproducible approach.&lt;/p&gt;
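&lt;p&gt;Filtering the dataset down to the non-edeposit titles is straightforward. Here’s a sketch using the standard library – the column name and values come from the dataset described above, but the titles are made up:&lt;/p&gt;

```python
import csv
import io

# A tiny stand-in for the dataset of digital periodicals
data = """title,fulltext_url_type
The Bulletin,digitised
Some Report,edeposit
Another Journal,other
"""

# Keep only titles identified as 'digitised' or 'other'
keep = [
    row for row in csv.DictReader(io.StringIO(data))
    if row["fulltext_url_type"] in ("digitised", "other")
]
```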
&lt;p&gt;That&amp;rsquo;s not the only problem. As I noted when creating the list of periodicals in digital form, there are duplicates in the list, so they have to be removed. You then have to find information about the issues available for each title. This is not provided by the Trove API, but there is an internal API used in the web interface that you can access – see &lt;a href=&#34;https://glam-workbench.net/trove-journals/get-ocrd-text-from-digitised-journal/&#34;&gt;this notebook for details&lt;/a&gt;. I also noticed that sometimes where there&amp;rsquo;s a single issue of a title, it&amp;rsquo;s presented as if each page is an issue. I think I&amp;rsquo;ve found a workaround for that as well.&lt;/p&gt;
&lt;p&gt;All these doubts, inconsistencies and workarounds mean that I&amp;rsquo;m fairly certain I don&amp;rsquo;t have &lt;em&gt;everything&lt;/em&gt;. But I do think I have &lt;em&gt;most&lt;/em&gt; of the OCRd text available from digitised periodicals, and I do have a methodology, &lt;a href=&#34;https://glam-workbench.net/trove-journals/get-ocrd-text-from-all-journals/&#34;&gt;documented in this notebook&lt;/a&gt;, that at least provides a starting point for further investigation. As I noted in my &lt;a href=&#34;https://updates.timsherratt.org/2022/05/11/my-trove-researcher.html&#34;&gt;wishlist for a Trove Researcher Platform&lt;/a&gt;, it would be great if more metadata for digitised works, other than newspapers, was made available through the API.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Explore Trove&#39;s digitised newspapers by place</title>
      <link>https://updates.timsherratt.org/2022/09/05/explore-troves-digitised.html</link>
      <pubDate>Mon, 05 Sep 2022 16:34:05 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/09/05/explore-troves-digitised.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve updated my map displaying places where Trove digitised newspapers were published or distributed. You can view &lt;a href=&#34;https://troveplaces.herokuapp.com/all/&#34;&gt;all the places on a single map&lt;/a&gt; – zoom in for more markers, and click on a marker for title details and a link back to Trove.&lt;/p&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2022/e91962f3b3.png&#34; width=&#34;600&#34; height=&#34;552&#34; alt=&#34;A map of Australia with coloured markers indicating the number of Trove’s digitised newspapers published in different locations around the country.&#34; /&gt;
&lt;p&gt;If you want to find newspapers from a particular area, just click on a location &lt;a href=&#34;https://troveplaces.herokuapp.com/map/&#34;&gt;using this map&lt;/a&gt; to view the 10 closest titles.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/cccbceaddd.png&#34; width=&#34;600&#34; height=&#34;316&#34; alt=&#34;A map section focused on Walhalla in eastern Victoria with markers indicating nearby places where Trove’s digitised newspapers were published. A column on the right lists the newspaper titles.&#34; /&gt;
&lt;p&gt;You can &lt;a href=&#34;https://docs.google.com/spreadsheets/d/1rURriHBSf3MocI8wsdl1114t0YeyU0BVSXWeg232MZs/edit?usp=sharing&#34;&gt;view or download the dataset&lt;/a&gt; used to construct the map. Place names were extracted from the newspaper titles using the Geoscience Gazetteer.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Making NSW Postal Directories (and other digitised directories) easier to search with the GLAM Workbench and Datasette</title>
      <link>https://updates.timsherratt.org/2022/09/01/making-nsw-postal.html</link>
      <pubDate>Thu, 01 Sep 2022 17:22:09 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/09/01/making-nsw-postal.html</guid>
      <description>&lt;p&gt;As part of my work on the &lt;a href=&#34;https://everydayheritage.au/about/&#34;&gt;Everyday Heritage&lt;/a&gt; project I’m looking at how we can make better use of digitised collections to explore the everyday experiences woven around places such as Parramatta Road in Sydney. For example, the NSW Postal Directories from &lt;a href=&#34;https://nla.gov.au/nla.obj-518308191&#34;&gt;1886 to 1908&lt;/a&gt; and &lt;a href=&#34;https://nla.gov.au/nla.obj-522689844&#34;&gt;1909 to 1950&lt;/a&gt; have been digitised by the State Library of NSW and made available through Trove. The directories list residences and businesses by name and street location. Using them we can explore changes in the use of Parramatta Road across 60 years of history. But there’s a problem. While you can browse the directories page by page, searching is clunky. Trove’s main search indexes the contents of the directories by ‘article’. Each ‘article’ can be many pages long, so it’s difficult to focus in on the matching text. Clicking through from the search results to the digitised volume lands you in another set of search results, showing all the matches in the volume. However, the internal search index works differently to the main Trove index. In particular it doesn’t seem to understand phrase or boolean searches. If you start off searching for “parramatta road”, Trove tells you there are 50 matching articles, but if you click through to a volume you’re told there are no results. If you remove the quotes you get every match for ‘parramatta’ or ‘road’. It’s all pretty confusing.&lt;/p&gt;
&lt;p&gt;The idea of ‘articles’ is really not very useful for publications like the Post Office Directories where information is mostly organised by column, row or line. You want to be able to search for a name, and go directly to the line in the directory where that name is mentioned. And now you can! Using Datasette, I’ve created an interface that searches &lt;em&gt;by line&lt;/em&gt; across all 54 volumes of the NSW Post Office Directory from 1886 to 1950 (that’s over 30 million lines of text).&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/nsw-post-office-directories/&#34;&gt;&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2022/cb7e107b84.png&#34; width=&#34;600&#34; height=&#34;400&#34; alt=&#34;Screenshot of home page for the NSW Post Office Directories. There’s a search box to search across all 54 volumes, some search tips, and a summary of the data that notes there are more than 30 million rows.&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/nsw-post-office-directories/&#34;&gt;&lt;strong&gt;Try it now!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;basic-features&#34;&gt;Basic features&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The full text search supports phrase, boolean, and wildcard searches. Just enter your query in the main search box to get results from all 54 volumes in a flash.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Each search result is a single line of text. Click on the link to view this line in context – it’ll show you 5 lines above and below your match, as well as a zoomable image of the digitised page from Trove.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For more context, you can click on &lt;strong&gt;View full page&lt;/strong&gt; to see all the lines of text extracted from that page. You can then use the &lt;strong&gt;Next&lt;/strong&gt; and &lt;strong&gt;Previous&lt;/strong&gt; buttons to browse page by page.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To view the full digitised volume, just click on the &lt;strong&gt;View page in Trove&lt;/strong&gt; button.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
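&lt;p&gt;The ‘5 lines above and below’ context view boils down to a simple list slice. An illustrative sketch, not the interface’s actual code:&lt;/p&gt;

```python
def context_window(lines, idx, span=5):
    """Return the line at idx with up to `span` lines above and below."""
    start = max(0, idx - span)  # clamp at the top of the page
    return lines[start:idx + span + 1]

lines = [f"line {n}" for n in range(20)]
window = context_window(lines, 2)  # near the top, so fewer lines above
```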
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/8c1b80625f.png&#34; width=&#34;600&#34; height=&#34;335&#34; alt=&#34;Screenshot of information about a single row in the NSW Post Office Directories. The row is highlighted in yellow, and displayed in context with five rows above and below. There’s a button to view the full page, and box displaying a zoomable image of the page from Trove.&#34; /&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;How it works&lt;/h2&gt;
&lt;p&gt;There were a few stages involved in creating this resource, but mostly I was able to reuse bits of code from the GLAM Workbench’s Trove &lt;a href=&#34;https://glam-workbench.net/trove-journals/&#34;&gt;journals&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/trove-books/&#34;&gt;books&lt;/a&gt; sections, and other related projects such as the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt;. Here’s a summary of the processing steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I started with the two top-level entries for the NSW Postal Directories, harvesting details of the 54 volumes under them.&lt;/li&gt;
&lt;li&gt;For each of these 54 volumes, I downloaded the OCRd text &lt;strong&gt;page by page&lt;/strong&gt;. Downloading the text by page, rather than volume, was very slow, but I thought it was important to be able to link each line of text back to its original page.&lt;/li&gt;
&lt;li&gt;To create links back to pages, I also needed the individual identifiers for each page. A list of page identifiers is embedded as a JSON string within each volume’s HTML, so I extracted this data and matched the page ids to the text.&lt;/li&gt;
&lt;li&gt;Using &lt;a href=&#34;https://sqlite-utils.datasette.io/&#34;&gt;sqlite-utils&lt;/a&gt;, I created a SQLite database with a separate table for every volume. Then I processed the text by volume, page, and line – adding each line of text and its page details as an individual record in the database.&lt;/li&gt;
&lt;li&gt;I then ran full text indexing across each line to make it easily searchable.&lt;/li&gt;
&lt;li&gt;Using &lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt; and its &lt;a href=&#34;https://github.com/simonw/datasette-search-all&#34;&gt;search-all plugin&lt;/a&gt;, I loaded up the database and BINGO! More than 30 million lines of text across 54 digitised volumes were instantly searchable.&lt;/li&gt;
&lt;li&gt;To make it all public, I used Datasette’s &lt;code&gt;publish&lt;/code&gt; function to push the database to Google’s Cloudrun service.&lt;/li&gt;
&lt;/ul&gt;
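&lt;p&gt;The heart of the pipeline is SQLite’s built-in full-text search. Here’s a minimal, self-contained sketch of the line-per-row approach using Python’s &lt;code&gt;sqlite3&lt;/code&gt; module – the schema and sample rows are simplified stand-ins for what sqlite-utils actually builds:&lt;/p&gt;

```python
import sqlite3

# One row per line of OCRd text, keyed back to its source page
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE lines_fts USING "
    "fts5(page_id UNINDEXED, line_num UNINDEXED, text)"
)
rows = [
    ("nla.obj-111", 1, "Smith John, grocer, 12 Parramatta rd"),
    ("nla.obj-111", 2, "Jones Mary, dressmaker, George st"),
    ("nla.obj-112", 1, "Brown James, ironmonger, Parramatta rd"),
]
conn.executemany("INSERT INTO lines_fts VALUES (?, ?, ?)", rows)

# Phrase search across every indexed line, as in the Datasette interface
matches = conn.execute(
    "SELECT page_id, line_num FROM lines_fts "
    "WHERE lines_fts MATCH '\"parramatta rd\"'"
).fetchall()
```

&lt;p&gt;Each match points straight back to a page identifier, which is what makes it possible to link search results to the digitised page images in Trove.&lt;/p&gt;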
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/create-text-db-indexed-by-line/&#34;&gt;All the code is available in the journals section&lt;/a&gt; of the GLAM Workbench.&lt;/p&gt;
&lt;h2 id=&#34;future-developments&#34;&gt;Future developments&lt;/h2&gt;
&lt;p&gt;One of the most exciting things to me is that this processing pipeline can be used with any digitised publication on Trove where it would be easier to search by line rather than article. Any suggestions?&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/08/29/interested-in-victorian.html</link>
      <pubDate>Mon, 29 Aug 2022 16:55:30 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/08/29/interested-in-victorian.html</guid>
      <description>&lt;p&gt;Interested in Victorian shipwrecks? Kim Doyle and Mitchell Harrop have added a new notebook to the Heritage Council of Victoria section of the GLAM Workbench exploring shipwrecks in the Victorian Heritage Database: &lt;a href=&#34;https://glam-workbench.net/heritage-council-of-victoria/&#34;&gt;glam-workbench.net/heritage-&amp;hellip;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/08/29/updates-troveharvester-python.html</link>
      <pubDate>Mon, 29 Aug 2022 14:53:26 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/08/29/updates-troveharvester-python.html</guid>
      <description>&lt;p&gt;Updates!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;troveharvester Python package updated to v0.5.1: &lt;a href=&#34;https://github.com/wragge/troveharvester/releases/tag/v0.5.1&#34;&gt;github.com/wragge/tr&amp;hellip;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Trove Newspaper Harvester section of #GLAMWorkbench updated to v1.1.1 to use latest troveharvester: &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;glam-workbench.net/trove-har&amp;hellip;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/08/25/minor-update-to.html</link>
      <pubDate>Thu, 25 Aug 2022 14:29:39 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/08/25/minor-update-to.html</guid>
      <description>&lt;p&gt;Minor update to RecordSearch Data Scraper – now captures &amp;lsquo;institution title&amp;rsquo; for agencies if it is present. &lt;a href=&#34;https://pypi.org/project/recordsearch-data-scraper/0.0.17/&#34;&gt;pypi.org/project/r&amp;hellip;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Many thanks to the British Library – sponsors of the GLAM Workbench’s web archives section!</title>
      <link>https://updates.timsherratt.org/2022/08/16/many-thanks-to.html</link>
      <pubDate>Tue, 16 Aug 2022 11:21:59 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/08/16/many-thanks-to.html</guid>
      <description>&lt;p&gt;You might have noticed some changes to the web archives section of the GLAM Workbench.&lt;/p&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2022/07d12ab5f7.png&#34; width=&#34;600&#34; height=&#34;293&#34; alt=&#34;Screenshot of the web archives section showing the acknowledgement of the British Library&#39;s sponsorship.&#34; /&gt;
&lt;p&gt;I’m very excited to announce that the &lt;a href=&#34;https://www.bl.uk/&#34;&gt;British Library&lt;/a&gt; is now sponsoring the &lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;web archives section&lt;/a&gt;! Many thanks to the British Library and the &lt;a href=&#34;https://www.webarchive.org.uk/&#34;&gt;UK Web Archive&lt;/a&gt; for their support – it really makes a difference.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;web archives section&lt;/a&gt; was developed in 2020 with the support of the International Internet Preservation Consortium&amp;rsquo;s &lt;a href=&#34;http://netpreserve.org/projects/&#34;&gt;Discretionary Funding Programme&lt;/a&gt;, in collaboration with the British Library, the National Library of Australia, and the National Library of New Zealand. It’s intended to help historians and other researchers understand what sort of data is available through web archives, how to get it, and what you can do with it. It provides a series of tools and examples that document existing APIs, and explore questions such as how web pages change over time. The notebooks focus on four particular web archives: the &lt;a href=&#34;https://www.webarchive.org.uk/&#34;&gt;UK Web Archive&lt;/a&gt;, the &lt;a href=&#34;https://trove.nla.gov.au/website&#34;&gt;Australian Web Archive&lt;/a&gt; (National Library of Australia), the &lt;a href=&#34;https://natlib.govt.nz/collections/a-z/new-zealand-web-archive&#34;&gt;New Zealand Web Archive&lt;/a&gt; (National Library of New Zealand), and the &lt;a href=&#34;https://archive.org/web/&#34;&gt;Internet Archive&lt;/a&gt;. However, the tools and approaches could be easily extended to other web archives (and soon will be!). I introduced the web archives section of the GLAM Workbench in this seminar for the IIPC in August 2020:&lt;/p&gt;
&lt;iframe width=&#34;560&#34; height=&#34;315&#34; src=&#34;https://www.youtube.com/embed/rVidh_wexoo&#34; title=&#34;YouTube video player&#34; frameborder=&#34;0&#34; allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&#34; allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;According to the &lt;a href=&#34;https://updates.timsherratt.org/2021/06/13/some-glam-workbench.html&#34;&gt;Binder launch stats&lt;/a&gt;, the web archives section is the most heavily used part of the GLAM Workbench. In December 2020, it &lt;a href=&#34;https://updates.timsherratt.org/2020/12/16/glam-workbench-wins.html&#34;&gt;won the British Library Labs Research Award&lt;/a&gt;. Last year I &lt;a href=&#34;https://updates.timsherratt.org/2021/05/17/web-archives-section.html&#34;&gt;updated the repository&lt;/a&gt;, automating the build of Docker images, and adding integrations with Zenodo, Reclaim Cloud, and &lt;a href=&#34;https://updates.timsherratt.org/2021/10/21/glam-workbench-now.html&#34;&gt;Australia’s Nectar research cloud&lt;/a&gt;. I’m also thinking about some new notebooks – watch this space!&lt;/p&gt;
&lt;p&gt;The GLAM Workbench &lt;a href=&#34;https://glam-workbench.net/about/#how-is-the-glam-workbench-funded&#34;&gt;receives no direct funding&lt;/a&gt; from government or research agencies, and so the support of sponsors like the British Library and all my other &lt;a href=&#34;https://glam-workbench.net/get-involved/supporters/&#34;&gt;GitHub sponsors&lt;/a&gt; is really important. &lt;strong&gt;Thank you!&lt;/strong&gt; If you think this work is valuable, have a look at the &lt;a href=&#34;https://glam-workbench.net/get-involved/&#34;&gt;Get involved!&lt;/a&gt; page to see how you can contribute. And if your organisation would like to sponsor a section of the GLAM Workbench, let me know!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>New GLAM data to search, visualise and explore using the GLAM Workbench!</title>
      <link>https://updates.timsherratt.org/2022/08/15/new-glam-data.html</link>
      <pubDate>Mon, 15 Aug 2022 17:43:44 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/08/15/new-glam-data.html</guid>
      <description>&lt;p&gt;There’s lots of GLAM data out there if you know where to look! For the past few years I’ve been harvesting a list of datasets published by Australian galleries, libraries, archives, and museums through open government data portals. I’ve just updated the harvest and there’s now 463 datasets containing 1,192 files. There’s a &lt;a href=&#34;https://glam-workbench.net/glam-datasets-from-gov-portals/&#34;&gt;human-readable version of the list&lt;/a&gt; that you can browse. If you just want the data you can &lt;a href=&#34;https://github.com/GLAM-Workbench/ozglam-data/blob/master/glam-datasets-from-gov-portals.csv&#34;&gt;download it as a CSV&lt;/a&gt;. Or if you’d like to search the list there’s a &lt;a href=&#34;https://ozglam-datasets.glitch.me/data/glam-datasets&#34;&gt;database version hosted on Glitch&lt;/a&gt;. The harvesting and processing code is &lt;a href=&#34;https://glam-workbench.net/glam-data-portals/#harvesting-glam-data-from-government-portals&#34;&gt;available in this notebook&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/glam-data-portals/&#34;&gt;GLAM data from government portals&lt;/a&gt; section of the GLAM Workbench provides more information and a summary of results.  For example, here’s a list of the number of data files by GLAM institution.&lt;/p&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2022/ee819455a2.png&#34; width=&#34;600&#34; height=&#34;302&#34; alt=&#34;Table showing the number of datasets published by each GLAM organisation. Queensland State Archives is on top with 108 datasets.&#34; /&gt;
&lt;p&gt;Most of the datasets are in CSV format, and most have a CC-BY licence.&lt;/p&gt;
&lt;h2 id=&#34;whats-inside&#34;&gt;What’s inside?&lt;/h2&gt;
&lt;p&gt;Obviously it’s great that GLAM organisations are sharing lots of open data, but what’s actually inside all of those CSV files? To help you find out, I created the &lt;a href=&#34;https://glam-workbench.net/csv-explorer/&#34;&gt;GLAM CSV Explorer&lt;/a&gt;. Click on the blue button to run it in Binder, then just select a dataset from the dropdown list. The CSV Explorer will download the file, and examine the contents of every field to try and determine the type of data it holds – such as text, dates, or numbers. It then summarises the results and builds a series of visualisations to give you an overview of the dataset.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/755e6bddaf.png&#34; width=&#34;600&#34; height=&#34;312&#34; alt=&#34;Screenshot of GLAM CSV Explorer. A series of dropdown boxes allow you to select a dataset to analyse.&#34; /&gt;
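&lt;p&gt;As a rough illustration of the idea – this is a much-simplified sketch with pandas, not the CSV Explorer&amp;rsquo;s actual code – field-type profiling might look something like this:&lt;/p&gt;

```python
import io

import pandas as pd

def profile_fields(csv_text):
    """Classify each column of a CSV as 'number', 'date', or 'text'.

    Illustrative sketch only: the real CSV Explorer does much more,
    including summary statistics and visualisations.
    """
    df = pd.read_csv(io.StringIO(csv_text))
    profile = {}
    for col in df.columns:
        values = df[col].dropna().astype(str)
        if pd.to_numeric(values, errors="coerce").notna().all():
            profile[col] = "number"
        elif pd.to_datetime(values, errors="coerce").notna().all():
            profile[col] = "date"
        else:
            profile[col] = "text"
    return profile

sample = "name,year,visited\nAlice,1901,1901-05-12\nBob,1902,1902-06-01\n"
print(profile_fields(sample))
```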
&lt;h2 id=&#34;search-for-names&#34;&gt;Search for names&lt;/h2&gt;
&lt;p&gt;Many of the datasets are name indexes to collections of records – GLAM staff or volunteers have transcribed the names of people mentioned in records as an aid to users. For Family History Month last year I aggregated all of the name indexes and made them searchable through a single interface using Datasette. The &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt; has been updated as well – it searches across 10.3 million records in 253 indexes from 10 GLAM organisations. And it’s free!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/f3291b49e2.png&#34; width=&#34;600&#34; height=&#34;555&#34; alt=&#34;Screenshot of GLAM Name Index Search showing the list of GLAM organisations with the number of tables and rows you can search from each.&#34; /&gt;
&lt;h2 id=&#34;and-a-bit-of-maintenance&#34;&gt;And a bit of maintenance…&lt;/h2&gt;
&lt;p&gt;As well as updating the data, I also updated &lt;a href=&#34;https://github.com/GLAM-Workbench/ozglam-data&#34;&gt;the code repository&lt;/a&gt;, adding the features that I’m rolling out across the whole of the GLAM Workbench. This includes automated Docker builds saved to Quay.io, integrations with Reclaim Cloud and Zenodo, and some basic quality controls through testing and code format checks.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Zotero now saves links to digitised items in Trove from the NLA catalogue!</title>
      <link>https://updates.timsherratt.org/2022/08/09/zotero-now-saves.html</link>
      <pubDate>Tue, 09 Aug 2022 10:52:53 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/08/09/zotero-now-saves.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve made a small change to the Zotero translator for the National Library of Australia&amp;rsquo;s catalogue. Now, if there&amp;rsquo;s a link to a digitised version of the work in Trove, that link will be saved in Zotero&amp;rsquo;s &lt;code&gt;url&lt;/code&gt; field. This makes it quicker and easier to view digitised items – just click on the &amp;lsquo;URL&amp;rsquo; label in Zotero to open the link.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also handy if you&amp;rsquo;re viewing a digitised work in Trove and want to capture the metadata about it. Just click on the &amp;lsquo;View catalogue&amp;rsquo; link in the details tab of a Trove item, then use Zotero to save the details from the catalogue.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>View embedded JSON metadata for Trove&#39;s digitised books and journals</title>
      <link>https://updates.timsherratt.org/2022/08/01/view-embedded-json.html</link>
      <pubDate>Mon, 01 Aug 2022 19:22:47 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/08/01/view-embedded-json.html</guid>
      <description>&lt;p&gt;The metadata for digitised books and journals in Trove can seem a bit sparse, but there&amp;rsquo;s quite a lot of useful metadata embedded within Trove&amp;rsquo;s web pages that isn&amp;rsquo;t displayed to users or made available through the Trove API. This &lt;a href=&#34;https://glam-workbench.net/trove-books/metadata-for-digital-works/&#34;&gt;notebook&lt;/a&gt; in the GLAM Workbench shows you how you can access it. To make it even easier, I&amp;rsquo;ve added a new endpoint to my &lt;a href=&#34;https://trove-proxy.herokuapp.com/&#34;&gt;Trove Proxy&lt;/a&gt; that returns the metadata in JSON format.&lt;/p&gt;
&lt;p&gt;Just pass the url of a digitised book or journal as a parameter named &lt;code&gt;url&lt;/code&gt; to &lt;code&gt;https://trove-proxy.herokuapp.com/metadata/&lt;/code&gt;. For example:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://trove-proxy.herokuapp.com/metadata/?url=https://nla.gov.au/nla.obj-2906940941&#34;&gt;https://trove-proxy.herokuapp.com/metadata/?url=https://nla.gov.au/nla.obj-2906940941&lt;/a&gt;&lt;/p&gt;
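&lt;p&gt;Building the request yourself is just a matter of URL-encoding the work&amp;rsquo;s address as the &lt;code&gt;url&lt;/code&gt; parameter. A minimal Python sketch using only the standard library (the function name here is just for illustration):&lt;/p&gt;

```python
from urllib.parse import urlencode

# Proxy endpoint as described above
PROXY_ENDPOINT = "https://trove-proxy.herokuapp.com/metadata/"

def proxy_metadata_url(work_url):
    """Return the proxy URL that delivers a digitised work's embedded metadata as JSON.

    urlencode() percent-encodes the work URL, which keeps the parameter
    safe even if the work URL contains its own query string.
    """
    return PROXY_ENDPOINT + "?" + urlencode({"url": work_url})

print(proxy_metadata_url("https://nla.gov.au/nla.obj-2906940941"))
# → https://trove-proxy.herokuapp.com/metadata/?url=https%3A%2F%2Fnla.gov.au%2Fnla.obj-2906940941
```

&lt;p&gt;You can then fetch that URL in your browser, or with a library like &lt;code&gt;requests&lt;/code&gt;, to retrieve the JSON.&lt;/p&gt;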
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2022/65a7caed7d.png&#34; width=&#34;600&#34; height=&#34;978&#34; alt=&#34;Screenshot of the collapsed JSON metadata returned from the url above. It includes fields such as &#39;title&#39;, &#39;accessConditions&#39;, &#39;marcData&#39;, and &#39;children&#39;.&#34; /&gt;
&lt;p&gt;I&amp;rsquo;ve created a simple bookmarklet to make it easier to open the proxy. To use it just:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Drag this link to your bookmarks toolbar: &lt;a href=&#34;javascript:(function(){let o=window.location.href,e=&#39;https://trove-proxy.herokuapp.com/metadata/?url=&#39;+encodeURIComponent(o);window.location.href=e})();&#34; &gt;Get Trove work metadata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;View a digitised book or journal in Trove.&lt;/li&gt;
&lt;li&gt;Click on the bookmarklet to view the metadata in JSON format.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To view the JSON data in your browser you might need to install an extension like &lt;a href=&#34;https://jsonview.com/&#34;&gt;JSONView&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Where did all those NSW articles go? Trove Newspapers Data Dashboard update!</title>
      <link>https://updates.timsherratt.org/2022/07/29/where-did-all.html</link>
      <pubDate>Fri, 29 Jul 2022 14:20:34 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/07/29/where-did-all.html</guid>
      <description>&lt;p&gt;I was looking at my &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;Trove Newspapers Data Dashboard&lt;/a&gt; again last night trying to figure out why the number of newspaper articles from NSW seemed to have dropped by more than 700,000 since my harvesting began. It took me a while to figure out, but it seems that the search index was rebuilt on 31 May, and that caused some major shifts in the distribution of articles by state, as reported by the main &lt;code&gt;result&lt;/code&gt; API. So the indexing of the articles changed, not the actual number of articles. Interestingly, the number of articles by state reported by the &lt;code&gt;newspaper&lt;/code&gt; API doesn&amp;rsquo;t show the same fluctuations.&lt;/p&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2022/c20b4dbf89.png&#34; width=&#34;600&#34; height=&#34;421&#34; alt=&#34;Screenshot of data dashboard that compares the number of articles by state as reported by the results and newspapers APIs. There are major differences in the column that shows the change since April 2022.&#34; /&gt;
&lt;p&gt;This adds another layer of complexity to understanding how Trove changes over time. To try and document such things, I&amp;rsquo;ve added a &amp;lsquo;Significant events&amp;rsquo; section to the Dashboard. I&amp;rsquo;ve also included a new &amp;lsquo;Total articles by publication state&amp;rsquo; section that compares results from the &lt;code&gt;result&lt;/code&gt; and &lt;code&gt;newspaper&lt;/code&gt; APIs. This should make it easier to identify such issues in the future.&lt;/p&gt;
&lt;p&gt;Stay alert people – remember, search interfaces lie!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Catching up – some recent GLAM Workbench updates!</title>
      <link>https://updates.timsherratt.org/2022/07/28/catching-up-some.html</link>
      <pubDate>Thu, 28 Jul 2022 15:59:13 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/07/28/catching-up-some.html</guid>
      <description>&lt;p&gt;There’s been lots of small updates to the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; over the last couple of months and I’ve fallen behind in sharing details. So here’s an omnibus list of everything I can remember…&lt;/p&gt;
&lt;h3 id=&#34;data&#34;&gt;Data&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals&#34;&gt;Weekly harvests of basic Trove newspaper data&lt;/a&gt; continue – there&amp;rsquo;s now about three months&amp;rsquo; worth. You can view a summary of the harvested data through the &lt;strong&gt;brand new&lt;/strong&gt; &lt;a href=&#34;https://wragge.github.io/trove-newspaper-totals/&#34;&gt;Trove Newspaper Data Dashboard&lt;/a&gt;. The Dashboard is generated from a Jupyter notebook and is updated whenever there’s a new data harvest.&lt;/li&gt;
&lt;li&gt;There&amp;rsquo;s also &lt;a href=&#34;https://github.com/wragge/naa-recently-digitised&#34;&gt;weekly harvests of files digitised by the NAA&lt;/a&gt;, now 16 months worth of data.&lt;/li&gt;
&lt;li&gt;Updated harvest of &lt;a href=&#34;https://doi.org/10.5281/zenodo.6814722&#34;&gt;Trove public tags&lt;/a&gt; (Zenodo) – includes 2,201,090 unique public tags added to 9,370,614 resources in Trove between August 2008 and July 2022.&lt;/li&gt;
&lt;li&gt;I&amp;rsquo;ve started moving other pre-harvested datasets out of the GLAM Workbench code repositories, into their own data repositories. This means better versioning and citability. The first example is the list of Trove newspapers with articles post the 1955 copyright cliff of death – here&amp;rsquo;s the &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers-data-post-54/&#34;&gt;GH repo&lt;/a&gt;, and the &lt;a href=&#34;https://doi.org/10.5281/zenodo.6812811&#34;&gt;Zenodo record&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;To bring together datasets that provide historical data about Trove itself, I&amp;rsquo;ve created a &lt;a href=&#34;https://zenodo.org/communities/trove-historical-data/?page=1&amp;amp;size=20&#34;&gt;Trove historical data&lt;/a&gt; community on Zenodo. Anyone&amp;rsquo;s welcome to contribute. There&amp;rsquo;s much more to come.&lt;/li&gt;
&lt;/ul&gt;
&lt;img class=&#34;u-photo&#34; src=&#34;https://cdn.uploads.micro.blog/8371/2022/98dd084059.png&#34; width=&#34;600&#34; height=&#34;401&#34; alt=&#34;Tag cloud showing the frequency of the two hundred most commonly-used tags in Trove.&#34; /&gt;
&lt;p&gt;&lt;em&gt;Tag cloud generated from the latest harvest of Trove Tags&lt;/em&gt;&lt;/p&gt;
&lt;h3 id=&#34;code&#34;&gt;Code&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Big thanks to Mitchell Harrop who contributed a new &lt;a href=&#34;https://glam-workbench.net/heritage-council-of-victoria/&#34;&gt;Heritage Council of Victoria section&lt;/a&gt; to the GLAM Workbench providing examples using the Victorian Heritage Database API.&lt;/li&gt;
&lt;li&gt;The &lt;a href=&#34;https://pypi.org/project/troveharvester/&#34;&gt;&lt;code&gt;troveharvester&lt;/code&gt; Python package&lt;/a&gt; has been updated. Mainly to remove annoying Pandas warnings and to make use of the &lt;a href=&#34;https://pypi.org/project/trove-query-parser/&#34;&gt;&lt;code&gt;trove-query-parser&lt;/code&gt;&lt;/a&gt; package.&lt;/li&gt;
&lt;li&gt;As a result of the above, the &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper &amp;amp; Gazette Harvester section&lt;/a&gt; of the GLAM Workbench has been updated. No major changes to notebooks, but I&amp;rsquo;ve implemented basic testing and linting to improve code quality.&lt;/li&gt;
&lt;li&gt;The &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove newspapers section&lt;/a&gt; of the GW has been updated. There were a few bug fixes and minor improvements. In particular there was a problem downloading data and HTML files from QueryPic, and some date queries in QueryPic were returning no results.&lt;/li&gt;
&lt;li&gt;The tool to &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#download-a-page-image&#34;&gt;download complete, high-res newspaper page images&lt;/a&gt; has been updated so that you now no longer need to supply an API key. Also fixed a problem with displaying the images in Voila.&lt;/li&gt;
&lt;li&gt;The &lt;a href=&#34;https://pypi.org/project/recordsearch-data-scraper/&#34;&gt;&lt;code&gt;recordsearch_data_scraper&lt;/code&gt; Python package&lt;/a&gt; has been updated. This fixes a bug where agency and series searches with only one result weren&amp;rsquo;t being captured properly.&lt;/li&gt;
&lt;li&gt;The &lt;a href=&#34;https://glam-workbench.net/recordsearch/&#34;&gt;RecordSearch section&lt;/a&gt; of the GW has been updated. This incorporates the above update, but I took the opportunity to update all packages, and implement basic testing and linting. The &lt;strong&gt;Harvest items from a search in RecordSearch&lt;/strong&gt; notebook has been simplified and reorganised. There are two new notebooks: &lt;strong&gt;Exploring harvested series data, 2022&lt;/strong&gt; – generates some basic statistics from the harvest of series data in 2022 and compares the results to the previous year; &lt;strong&gt;Summary of records digitised in the previous week&lt;/strong&gt; – run this notebook to analyse the most recent dataset of recently digitised files, summarising the results by series.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://updates.timsherratt.org/2022/07/14/calling-all-tasmanian.html&#34;&gt;A new Zotero translator for Libraries Tasmania&lt;/a&gt; has been developed.&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/07/13/updated-dataset-harvests.html</link>
      <pubDate>Thu, 14 Jul 2022 00:04:24 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/07/13/updated-dataset-harvests.html</guid>
      <description>&lt;p&gt;Updated dataset! Harvests of Trove list metadata from 2018, 2020, and 2022 are now available on Zenodo: &lt;a href=&#34;https://doi.org/10.5281/zenodo.6827077&#34;&gt;doi.org/10.5281/z&amp;hellip;&lt;/a&gt; Another addition to the growing collection of historical Trove data. #GLAMWorkbench&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/9d215dd3f6.png&#34; width=&#34;600&#34; height=&#34;547&#34; alt=&#34;Screen capture of version information from Zenodo showing that there are three available versions, v1.0, v1.1, and v1.2.&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/07/09/coz-i-love.html</link>
      <pubDate>Sat, 09 Jul 2022 18:34:26 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/07/09/coz-i-love.html</guid>
      <description>&lt;p&gt;Coz I love making work for myself, I&amp;rsquo;ve started pulling datasets out of #GLAMWorkbench code repos &amp;amp; creating new data repos for them. This way they&amp;rsquo;ll have their own version histories in Zenodo. Here&amp;rsquo;s the first: &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers-data-post-54/&#34;&gt;github.com/GLAM-Work&amp;hellip;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/06/28/ahead-of-my.html</link>
      <pubDate>Tue, 28 Jun 2022 09:48:32 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/06/28/ahead-of-my.html</guid>
      <description>&lt;p&gt;Ahead of my session at #OzHA2022 tomorrow, I&amp;rsquo;ve updated the NAA section of the  #GLAMWorkbench. Come along to find out how to harvest file details, digitsed images, and PDFs, from a search in RecordSearch! &lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/releases/tag/v1.1.0&#34;&gt;github.com/GLAM-Work&amp;hellip;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/06/26/noticed-that-querypic.html</link>
      <pubDate>Sun, 26 Jun 2022 12:48:52 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/06/26/noticed-that-querypic.html</guid>
      <description>&lt;p&gt;Noticed that QueryPic was having a problem with some date queries. Should be fixed in the latest release of the Trove Newspapers section of the #GLAMWorkbench: &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;glam-workbench.net/trove-new&amp;hellip;&lt;/a&gt; #maintenance #researchinfrastructure&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/06/24/the-trove-newspapers.html</link>
      <pubDate>Fri, 24 Jun 2022 17:38:04 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/06/24/the-trove-newspapers.html</guid>
      <description>&lt;p&gt;The Trove Newspapers section of the #GLAMWorkbench has been updated! Voilá was causing a problem in QueryPic, stopping results from being downloaded. A package update did the trick! Everything now updated &amp;amp; tested. &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;glam-workbench.net/trove-new&amp;hellip;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/06/24/some-more-glamworkbench.html</link>
      <pubDate>Fri, 24 Jun 2022 15:40:33 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/06/24/some-more-glamworkbench.html</guid>
      <description>&lt;p&gt;Some more #GLAMWorkbench maintenance – this app to download a high-res page images from Trove newspapers now doesn&amp;rsquo;t require an API key if you have a url, &amp;amp; some display problems have been fixed. &lt;a href=&#34;https://trove-newspaper-apps.herokuapp.com/voila/render/Save-page-image.ipynb&#34;&gt;trove-newspaper-apps.herokuapp.com/voila/ren&amp;hellip;&lt;/a&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/d174ff5c1a.png&#34; width=&#34;600&#34; height=&#34;238&#34; alt=&#34;Screen shot of app --  Download a page image  The Trove web interface doesn&#39;t provide a way of getting high-resolution page images from newspapers. This simple app lets you download page images as complete, high-resolution JPG files.&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/06/23/the-trove-newspaper.html</link>
      <pubDate>Thu, 23 Jun 2022 17:11:35 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/06/23/the-trove-newspaper.html</guid>
      <description>&lt;p&gt;The Trove Newspaper and Gazette Harvester section of the #GLAMWorkbench has been updated! No major changes to notebooks, just lots of background maintenance stuff such as updating packages, testing, linting notebooks etc. &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;glam-workbench.net/trove-har&amp;hellip;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/06/01/ordering-some-glamworkbench.html</link>
      <pubDate>Wed, 01 Jun 2022 17:43:27 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/06/01/ordering-some-glamworkbench.html</guid>
      <description>&lt;p&gt;Ordering some #GLAMWorkbench stickers&amp;hellip;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/da543e0090.png&#34; width=&#34;324&#34; height=&#34;340&#34; alt=&#34;Proof image of a hexagonal sticker. The sticker has white lettering on a blue background which reads GLAM Workbench. In the centre is a crossed hammer and wrench icon.&#34; /&gt;
</description>
    </item>
    
    <item>
      <title>Using Datasette on Nectar</title>
      <link>https://updates.timsherratt.org/2022/05/26/using-datasette-on.html</link>
      <pubDate>Thu, 26 May 2022 16:24:49 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/05/26/using-datasette-on.html</guid>
      <description>&lt;p&gt;If you have a dataset that you want to share as a searchable online database then check out &lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt; – it’s a fabulous tool that provides an ever-growing range of options for exploring and publishing data. I particularly like how easy Datasette makes it to publish datasets on cloud services like Google’s Cloudrun and Heroku. A couple of weekends ago I migrated the &lt;a href=&#34;https://resources.chineseaustralia.org/tung_wah_newspaper_index&#34;&gt;TungWah Newspaper Index&lt;/a&gt; to Datasette. It’s now running on Heroku, and I can push updates to it in seconds.&lt;/p&gt;
&lt;p&gt;I’m also using Datasette as the platform for sharing data from the &lt;a href=&#34;https://glam-workbench.net/anu-archives/&#34;&gt;Sydney Stock Exchange Project&lt;/a&gt; that I’m working on with the ANU Archives. There’s a lot of data – more than 20 million rows – but getting it running on Google Cloudrun was pretty straightforward with Datasette’s &lt;code&gt;publish&lt;/code&gt; command. The problem was, however, that Datasette is configured to run on most cloud services in ‘immutable’ mode and we want authenticated users to be able to improve the data. So I needed to explore alternatives.&lt;/p&gt;
&lt;p&gt;I’ve been working with &lt;a href=&#34;https://ardc.edu.au/services/nectar-research-cloud/how-to-access-the-ardc-nectar-research-cloud/&#34;&gt;Nectar&lt;/a&gt; over the past year to develop &lt;a href=&#34;https://glam-workbench.net/using-nectar/&#34;&gt;a GLAM Workbench application&lt;/a&gt; that helps researchers do things like harvesting newspaper articles from a Trove search. So I thought I’d have a go at setting up Datasette in a Nectar instance, and it works! Here’s a few notes on what I did…&lt;/p&gt;
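&lt;p&gt;For reference, the &lt;code&gt;systemd&lt;/code&gt; service file mentioned in the notes below follows the pattern in the Datasette documentation. A minimal sketch – the paths, user, and secret values here are illustrative placeholders, not my actual configuration – looks something like:&lt;/p&gt;

```ini
[Unit]
Description=Datasette
After=network.target

[Service]
Type=simple
User=ubuntu
# Secrets for plugins like datasette-github-auth are passed as
# environment variables (placeholder values shown).
Environment=GITHUB_CLIENT_ID=xxx
Environment=GITHUB_CLIENT_SECRET=xxx
# Configuration directory mode: Datasette picks up the database,
# metadata, settings.json, templates etc. from this directory.
WorkingDirectory=/home/ubuntu/datasette-root
ExecStart=/home/ubuntu/.local/bin/datasette serve . --host 127.0.0.1 --port 8000
Restart=on-failure

[Install]
WantedBy=multi-user.target
```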
&lt;ul&gt;
&lt;li&gt;First, of course, you need to &lt;a href=&#34;https://tutorials.rc.nectar.org.au/allocation-management/01-overview&#34;&gt;get yourself a resource allocation&lt;/a&gt; on Nectar. I’ve also got a persistent volume storage allocation that I’m using for the data.&lt;/li&gt;
&lt;li&gt;From the Nectar Dashboard I made sure that I had an &lt;a href=&#34;https://tutorials.rc.nectar.org.au/keypairs&#34;&gt;SSH keypair configured&lt;/a&gt;, and created a &lt;a href=&#34;https://tutorials.rc.nectar.org.au/sec-groups-101&#34;&gt;security group&lt;/a&gt; to allow access via SSH, HTTP and HTTPS. I also &lt;a href=&#34;https://tutorials.rc.nectar.org.au/volume-storage/01-overview&#34;&gt;set up a new storage volume&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;I then &lt;a href=&#34;https://tutorials.rc.nectar.org.au/launching-virtual-machines/01-overview&#34;&gt;created a new Virtual Machine&lt;/a&gt; using the Ubuntu 22.04 image, attaching the keypair, security group, and volume storage. For the stock exchange data I’m currently using the ‘m3.medium’ flavour of virtual machine, which provides 8gb of RAM and 4 VCPUs. This might be overkill, but I went with the bigger machine because of the size of the SQLite database (around 2gb). This is similar to what I used on Cloudrun after I ran into problems with the memory limit. I think most projects would run perfectly well using one of the ‘small’ flavours. In any case, it’s easy to resize if you run into problems.&lt;/li&gt;
&lt;li&gt;Once the new machine was running I grabbed the IP address. Because I have DNS configured on my Nectar project, I also created a ‘datasette’ subdomain from the DNS dashboard by pointing an ‘A’ (alias) record to the IP address.&lt;/li&gt;
&lt;li&gt;Using the IP address I &lt;a href=&#34;https://tutorials.rc.nectar.org.au/connecting/02-terminal-and-ssh&#34;&gt;logged into the new machine via SSH&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;With all the Nectar config done, it was time to set up Datasette. I mainly just followed the &lt;a href=&#34;https://docs.datasette.io/en/stable/deploying.html&#34;&gt;excellent instructions in the Datasette documentation&lt;/a&gt; for deploying Datasette using &lt;code&gt;systemd&lt;/code&gt;. This involved installing &lt;code&gt;datasette&lt;/code&gt; via &lt;code&gt;pip&lt;/code&gt;, creating a folder for the Datasette data and configuration files, and creating a &lt;code&gt;datasette.service&lt;/code&gt; file for &lt;code&gt;systemd&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;I also used the &lt;code&gt;datasette install&lt;/code&gt; command to add a couple of Datasette plugins. One of these is the &lt;code&gt;datasette-github-auth&lt;/code&gt; plugin, which needs a couple of secret tokens set. I added these as environment variables in the &lt;code&gt;datasette.service&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;systemd&lt;/code&gt; setup uses Datasette’s &lt;a href=&#34;https://docs.datasette.io/en/stable/settings.html#configuration-directory-mode&#34;&gt;configuration directory mode&lt;/a&gt;. This means you can put your database, metadata definitions, custom templates and CSS, and any other settings all together in a single directory and Datasette will find and use them. I’d previously passed runtime settings via the command line, so I had to create a &lt;code&gt;settings.json&lt;/code&gt; for these.&lt;/li&gt;
&lt;li&gt;Then I just uploaded all my Datasette database and configuration files to the folder I created on the virtual machine using &lt;code&gt;rsync&lt;/code&gt; and started the Datasette service. It worked!&lt;/li&gt;
&lt;li&gt;The next step was to use the persistent volume storage for my Datasette files. The persistent storage exists independently of the virtual machine, so you don’t need to worry about losing data if there’s a change to the instance. I mounted the storage volume as &lt;code&gt;/pvol&lt;/code&gt; in the virtual machine &lt;a href=&#34;https://tutorials.rc.nectar.org.au/volume-storage/04-format-mount&#34;&gt;as the Nectar documentation describes&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;I created a &lt;code&gt;datasette-root&lt;/code&gt; folder under &lt;code&gt;/pvol&lt;/code&gt;, copied the Datasette files to it, and changed the &lt;code&gt;datasette.service&lt;/code&gt; file to point to it. This didn’t seem to work, and I’m not sure why. So instead I created a symbolic link between &lt;code&gt;/home/ubuntu/datasette-root&lt;/code&gt; and &lt;code&gt;/pvol/datasette-root&lt;/code&gt; and set the path in the service file back to &lt;code&gt;/home/ubuntu/datasette-root&lt;/code&gt;. This worked! So now the database and configuration files are sitting in the persistent storage volume.&lt;/li&gt;
&lt;li&gt;To make the new Datasette instance visible to the outside world, I &lt;a href=&#34;https://www.digitalocean.com/community/tutorials/how-to-install-nginx-on-ubuntu-20-04&#34;&gt;installed nginx&lt;/a&gt;, and configured it as a Datasette proxy using the example in the Datasette documentation.&lt;/li&gt;
&lt;li&gt;Finally I configured HTTPS &lt;a href=&#34;https://certbot.eff.org/&#34;&gt;using certbot&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
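&lt;p&gt;To give a rough idea of what the &lt;code&gt;systemd&lt;/code&gt; setup looks like, here’s a minimal sketch of a &lt;code&gt;datasette.service&lt;/code&gt; file. The paths, port, and environment variable names are illustrative only (my actual plugin secrets obviously aren’t shown, and the variable names your plugin expects may differ) – the Datasette deployment documentation has the canonical version.&lt;/p&gt;

```ini
# /etc/systemd/system/datasette.service (illustrative values)
[Unit]
Description=Datasette
After=network.target

[Service]
Type=simple
User=ubuntu
# Plugin secrets (e.g. for GitHub auth) passed as environment variables
Environment=GITHUB_CLIENT_ID=xxx
Environment=GITHUB_CLIENT_SECRET=xxx
# Configuration directory mode: database, metadata, templates,
# and settings.json all live in this one folder
WorkingDirectory=/home/ubuntu/datasette-root
ExecStart=datasette serve . -h 127.0.0.1 -p 8000
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

&lt;p&gt;With Datasette bound to localhost like this, nginx handles the outside world as a reverse proxy.&lt;/p&gt;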
&lt;p&gt;Although the steps above might seem complicated, it was mainly just a matter of copying and pasting commands from the existing documentation. The new Datasette instance is &lt;a href=&#34;https://datasette.glamworkbench.cloud.edu.au&#34;&gt;running here&lt;/a&gt;, but this is just for testing and will disappear soon. If you’d like to know more about the Stock Exchange Project, check out the &lt;a href=&#34;https://glam-workbench.net/anu-archives/&#34;&gt;ANU Archives&lt;/a&gt; section of the GLAM Workbench.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Convert your Trove newspaper searches to an API query with just one click!</title>
      <link>https://updates.timsherratt.org/2022/05/20/convert-your-trove.html</link>
      <pubDate>Fri, 20 May 2022 16:43:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/05/20/convert-your-trove.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;m thinking about the Trove Researcher Platform discussions &amp;amp; ways of integrating Trove with other apps and platforms (like the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;As a simple demo I modified my &lt;a href=&#34;https://trove-proxy.herokuapp.com/&#34;&gt;Trove Proxy app&lt;/a&gt; to convert a newspaper search url from the Trove web interface into an API query (using the &lt;a href=&#34;https://pypi.org/project/trove-query-parser/&#34;&gt;trove-query-parser&lt;/a&gt; package). The proxy app then redirects you to the &lt;a href=&#34;https://troveconsole.herokuapp.com/&#34;&gt;Trove API Console&lt;/a&gt; so you can see the results of the API query without needing a key.&lt;/p&gt;
&lt;p&gt;To make it easy to use, I created a bookmarklet that encodes your current url and feeds it to the proxy. To use it just:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Drag this link to your bookmarks toolbar: &lt;a href=&#39;javascript:(function(){let o=window.location.href,e=&#34;https://trove-proxy.herokuapp.com/parse/?url=&#34;+encodeURIComponent(o);window.location.href=e})();&#39;&gt;Open Trove API Console&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Run a search in Trove&amp;rsquo;s newspapers.&lt;/li&gt;
&lt;li&gt;Click on the bookmarklet.&lt;/li&gt;
&lt;/ul&gt;
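&lt;p&gt;If you’re curious what the bookmarklet is doing, here’s the same logic unminified (the helper function name is just for readability, it’s not part of the bookmarklet itself):&lt;/p&gt;

```javascript
// Build the proxy URL that converts the current Trove search into an API query.
function makeProxyUrl(searchUrl) {
  return "https://trove-proxy.herokuapp.com/parse/?url=" + encodeURIComponent(searchUrl);
}

// The bookmarklet simply redirects the browser to the proxy:
// window.location.href = makeProxyUrl(window.location.href);
```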
&lt;p&gt;This little hack provides a bit of &amp;lsquo;glue&amp;rsquo; to help researchers think about their search results as data, and explore other possibilities for download and analysis. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>My Trove researcher platform wishlist</title>
      <link>https://updates.timsherratt.org/2022/05/11/my-trove-researcher.html</link>
      <pubDate>Wed, 11 May 2022 14:26:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/05/11/my-trove-researcher.html</guid>
      <description>&lt;p&gt;The ARDC is collecting user requirements for the &lt;a href=&#34;https://ardc.edu.au/collaborations/strategic-activities/hass-and-indigenous-research-data-commons/project-plans/trove-researcher-platform-for-advanced-research-project-plan/&#34;&gt;Trove researcher platform for advanced research&lt;/a&gt;. This is a chance to start from scratch, and think about the types of data, tools, or interface enhancements that would support innovative research in the humanities and social sciences. The ARDC will be holding &lt;a href=&#34;https://ardc.edu.au/events/trove-researcher-platform-roundtables/&#34;&gt;two public roundtables&lt;/a&gt;, on 13 and 20 May, to gather ideas. I created a list of possible API improvements in my response to last year&amp;rsquo;s draft plan, and thought it might be useful to expand that a bit, and add in a few other annoyances, possibilities, and long-held dreams.&lt;/p&gt;
&lt;p&gt;My focus is again on the data; this is for two reasons. First because public access to consistent, good quality data makes all other things possible. But, of course, it&amp;rsquo;s never just a matter of OPENING ALL THE DATA. There will be questions about priorities, about formats, about delivery, about normalisation and enrichment. Many of these questions will arise as people try to make use of the data. There needs to be an ongoing conversation between data providers, research tool makers, and research tool users. This is the second reason I think the data is critically important – our focus should be on developing communities and skills, not products. A series of one-off tools for researchers might be useful, but the benefits will wane. Building tools through networks of collaboration and information sharing based on good quality data offers much more. Researchers should be participants in these processes, not consumers.&lt;/p&gt;
&lt;p&gt;Anyway, here&amp;rsquo;s my current wishlist&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;apis-and-data&#34;&gt;APIs and data&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Bring the web interface and main public API back into sync, so that researchers can easily transfer queries between the two.&lt;/strong&gt; The Trove interface update in 2020 reorganised resources into &amp;lsquo;categories&amp;rsquo;, replacing the original &amp;lsquo;zones&amp;rsquo;. The API, however, is still organised by zone and knows nothing about these new categories. Why does this matter? The web interface allows researchers to explore the collection and develop research questions. Some of these questions might be answered by downloading data from the API for analysis or visualisation. But, except for the newspapers, there is currently no one-to-one correspondence between searches in the web interface and searches using the API. There&amp;rsquo;s no way of transferring your questions – you need to start again.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Expand the metadata available for digitised resources other than newspapers.&lt;/strong&gt; In recent years, the NLA has digitised huge quantities of books, journals, images, manuscripts, and maps. The digitisation process has generated new metadata describing these resources, but most of this is not available through the public API. We can get an idea of what&amp;rsquo;s missing by comparing the digitised journals to the newspapers. The API includes &lt;a href=&#34;https://troveconsole.herokuapp.com/#get-a-list-of-all-newspaper-titles&#34;&gt;a &lt;code&gt;newspaper&lt;/code&gt; endpoint&lt;/a&gt; that provides data on all the newspapers in Trove. You can use it to get a list of available issues for any newspaper. There is no comparable way of retrieving a list of digitised journals, or the issues that have been digitised. The data’s somewhere – there&amp;rsquo;s an internal API that’s used to generate lists of issues in the browse interface and I&amp;rsquo;ve &lt;a href=&#34;https://glam-workbench.net/trove-journals/get-ocrd-text-from-digitised-journal/&#34;&gt;scraped this to harvest issue details&lt;/a&gt;. But this information should be in the public API. Manuscripts are described using finding aids, themselves generated from EAD-formatted XML files, but none of this important structured data is available from the API, or for download. There’s also &lt;a href=&#34;https://glam-workbench.net/trove-books/metadata-for-digital-works/&#34;&gt;other resource metadata&lt;/a&gt;, such as parent/child relationships between different levels in the object hierarchy (eg publication &amp;gt; pages). These are embedded in web pages but not exposed in the API. The main point is that when it comes to data-driven research, &lt;strong&gt;digitised books, journals, manuscripts, images, and maps are second-class citizens&lt;/strong&gt;, trailing far behind the newspapers in research possibilities. There needs to be a thorough stocktake of available metadata, and a plan to make this available in machine-actionable form.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Standardise the delivery of text, images, and PDFs and provide download links through the API.&lt;/strong&gt; As noted above, digitised resources are treated differently depending on where they sit in Trove. There are no standard mechanisms for downloading the products of digitisation, such as OCRd text and images. OCRd text is available directly though the API for newspaper and journal articles, but to download text from a book or journal issue you need to hack the download mechanisms from the web interface. Links to these should be included in the API. Similarly, machine access to images requires various hacks and workarounds. There should be a consistent approach that allows researchers to compile image datasets from digitised resources using the API. Ideally IIIF standard APIs should be used for the delivery of images and maps. This would enable the use of the growing ecosystem of IIIF compliant tools for integration, analysis, and annotation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Provide an option to exclude search results in tags and comments.&lt;/strong&gt; The Trove advanced search used to give you the option of excluding search results which only matched tags or comments, rather than the content of the resource. Back when I was working at Trove, the IT folks said this feature would be added to a future version of the API, but instead it disappeared from the web interface with the 2020 update! Why is this important? If you&amp;rsquo;re trying to analyse the occurrence of search terms within a collection, such as Trove&amp;rsquo;s digitised newspapers, you want to be sure that the result reflects the actual content, and not a recent annotation by Trove users.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Finally add the People &amp;amp; Organisations data to the main API.&lt;/strong&gt; Trove&amp;rsquo;s People &amp;amp; Organisations section was ahead of the game in providing machine-readable access, but the original API is out-of-date and uses a completely different query language. Some work was done on adding it to the main RESTful API, but it was never finished. With a bit of long-overdue attention, the People &amp;amp; Organisations data could power new ways of using and linking biographical resources.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Improve the web archives CDX API.&lt;/strong&gt; Although the NLA does little to inform researchers of the possibilities, the web archives software it uses (Pywb) includes some baked-in options for retrieving machine-readable data. This includes &lt;a href=&#34;https://glam-workbench.net/web-archives/timegates-timemaps-mementos/&#34;&gt;support for the Memento protocol&lt;/a&gt;, and the provision of a CDX API that delivers basic metadata about individual web page captures. The current CDX API has some limitations (&lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/web-archives/blob/master/comparing_cdx_apis.ipynb&#34;&gt;documented here&lt;/a&gt;). In particular, there&amp;rsquo;s no pagination of results, and no support for domain-level queries. Addressing these limitations would make the existing CDX API much more useful.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Provide new data sources for web archives analysis.&lt;/strong&gt; There needs to be a constructive, ongoing discussion about the types of data that could be extracted and shared from the Australian web archive. For example, a search API, or downloadable datasets of word frequencies. The scale is a challenge, but some pilot studies could help us all understand both the limits and the possibilities.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Provide a Write API for annotations.&lt;/strong&gt; Integration between components in the HASS RDC would be greatly enhanced if other projects could automatically add structured annotations to existing Trove resources. Indeed, this would create exciting possibilities for embedding Trove resources within systems of scholarly analysis, allowing insights gained through research to be automatically fed back into Trove to enhance discovery and understanding.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Provide historical statistics on Trove resources.&lt;/strong&gt; It&amp;rsquo;s important for researchers to understand how Trove itself changes over time. There used to be a page that provided regularly-updated statistics on the number of resources and user annotations, but this was removed by the interface upgrade in 2020. I&amp;rsquo;ve &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals&#34;&gt;started harvesting some basic stats&lt;/a&gt; relating to Trove newspapers, but access to general statistics should be reinstated.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reassess key authentication and account limits.&lt;/strong&gt; DigitalNZ recently &lt;a href=&#34;https://digitalnz.org/blog/posts/accessing-the-digitalnz-api&#34;&gt;changed their policy around API authentication&lt;/a&gt;, allowing public access without a key. Authentication requirements hinder exploration and limit opportunities for using the API in teaching and workshops. Similarly, I don&amp;rsquo;t think the account usage limits have been changed since the API was released, even though the capacity of the systems has increased. It seems like time that both of these were reassessed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
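&lt;p&gt;To make the journals/newspapers asymmetry concrete, here’s a minimal sketch of building a request against the v2 newspaper titles endpoint. The URL shape follows the public v2 API; you’d need your own key, and the point is precisely that there’s no equivalent endpoint name you could substitute for digitised journals:&lt;/p&gt;

```python
# Build a request URL for a Trove v2 'titles' endpoint.
# 'newspaper' (and 'gazette') exist today; a journals equivalent doesn't.

def trove_titles_url(endpoint: str, key: str) -> str:
    """Return the v2 API URL listing all titles for the given endpoint."""
    return f"https://api.trove.nla.gov.au/v2/{endpoint}/titles?encoding=json&key={key}"

# Works now, returns every newspaper title in Trove:
newspapers_url = trove_titles_url("newspaper", "YOUR_API_KEY")

# What researchers would like to be able to do (hypothetical endpoint):
# journals_url = trove_titles_url("journal", "YOUR_API_KEY")
```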
&lt;p&gt;Ok, I&amp;rsquo;ll admit, that&amp;rsquo;s a pretty long list, and not everything can be done immediately! I think this would be a good opportunity for the NLA to develop and share an API and Data Roadmap that is regularly updated and invites comments and suggestions. This would help researchers plan for future projects, and build a case for further government investment.&lt;/p&gt;
&lt;h2 id=&#34;integration&#34;&gt;Integration&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unbreak Zotero integration.&lt;/strong&gt; The 2020 interface upgrade broke the existing Zotero integration and there&amp;rsquo;s no straightforward way of fixing it without changes at the Trove end. Zotero used to be able to capture search results, metadata and images from most of the zones in Trove. Now it can only capture individual newspaper articles. This greatly limits the ability of researchers to assemble and manage their own research collections. More generally, a program to examine and support &lt;a href=&#34;https://updates.timsherratt.org/2022/01/29/zotero-support-in.html&#34;&gt;Zotero integration across the GLAM sector&lt;/a&gt; would be a useful way of spending some research infrastructure dollars.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Provide useful page metadata.&lt;/strong&gt; Zotero is just one example of a tool that can extract structured metadata from web pages. Such metadata supports reuse and integration, without the need for separate API requests. Only Trove&amp;rsquo;s newspaper articles currently provide embedded metadata. Libraries used to lead the way in promoting the use of standardised, structured, embedded page metadata (Dublin Core anyone?), but now?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Explore annotation frameworks.&lt;/strong&gt; I&amp;rsquo;ve mentioned the possibility of a Write API for annotations above, but there are other possibilities for supporting web scale annotations, such as &lt;a href=&#34;https://web.hypothes.is/&#34;&gt;Hypothesis&lt;/a&gt;. Again, the current Trove interface makes the use of Hypothesis difficult, and again this sort of integration would be usefully assessed across the whole GLAM sector.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;tools--interfaces&#34;&gt;Tools &amp;amp; interfaces&lt;/h2&gt;
&lt;p&gt;Obviously any discussion of new tools or interfaces needs to start by looking at what&amp;rsquo;s already available. This is difficult when the NLA won&amp;rsquo;t even link to existing resources such as the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;. Sharing information about existing tools needs to be the starting point from which to plan investment in the Trove Researcher Platform. From there we can identify gaps and develop processes and collaborations to meet specific research needs. Here&amp;rsquo;s a list of some &lt;a href=&#34;https://updates.timsherratt.org/2022/05/02/working-with-trove.html&#34;&gt;Trove-related tools and resources&lt;/a&gt; currently available through the GLAM Workbench.&lt;/p&gt;
&lt;h2 id=&#34;update-18-may-some-extra-bonus-bugs&#34;&gt;Update (18 May): some extra bonus bugs&lt;/h2&gt;
&lt;p&gt;I forgot to add these annoying bugs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;newspaper&lt;/code&gt; endpoint returns both newspaper and gazette titles, even though there&amp;rsquo;s a separate &lt;code&gt;gazette&lt;/code&gt; endpoint. This forces you to do silly workarounds &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#get-a-list-of-trove-newspapers-that-doesnt-include-government-gazettes&#34;&gt;like this in the GLAM Workbench&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;list&lt;/code&gt; zone has recurring problems. At the moment it&amp;rsquo;s impossible to &lt;a href=&#34;https://glam-workbench.net/trove-lists/#harvest-summary-data-from-trove-lists&#34;&gt;harvest a complete set of Trove lists&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
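&lt;p&gt;The gazette workaround boils down to fetching both lists and subtracting. A minimal sketch (the record structure is an assumption based on the v2 API’s JSON responses, so check against a live call):&lt;/p&gt;

```python
# Workaround sketch: the 'newspaper' endpoint also returns gazette titles,
# so to get newspapers only you have to exclude anything that also appears
# in the 'gazette' endpoint's results.

def exclude_gazettes(newspapers: list[dict], gazettes: list[dict]) -> list[dict]:
    """Return only the titles that aren't gazettes, matching on 'id'."""
    gazette_ids = {g["id"] for g in gazettes}
    return [n for n in newspapers if n["id"] not in gazette_ids]
```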
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/05/10/spending-the-evening.html</link>
      <pubDate>Tue, 10 May 2022 22:26:35 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/05/10/spending-the-evening.html</guid>
      <description>&lt;p&gt;Spending the evening updating the NAA section of the #GLAMWorkbench. Here&amp;rsquo;s a fresh harvest of the agency functions currently being used in RecordSearch&amp;hellip; &lt;a href=&#34;https://gist.github.com/wragge/d1612daa4c87c4a0f1eeb27c6adeb4ab&#34;&gt;gist.github.com/wragge/d1&amp;hellip;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Working with Trove data – a collection of tools and resources</title>
      <link>https://updates.timsherratt.org/2022/05/02/working-with-trove.html</link>
      <pubDate>Mon, 02 May 2022 15:23:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/05/02/working-with-trove.html</guid>
      <description>&lt;p&gt;The ARDC is organising &lt;a href=&#34;https://ardc.edu.au/events/trove-researcher-platform-roundtables/&#34;&gt;a couple of public forums&lt;/a&gt; to help gather researcher requirements for the &lt;a href=&#34;https://ardc.edu.au/collaborations/strategic-activities/hass-and-indigenous-research-data-commons/project-plans/trove-researcher-platform-for-advanced-research-project-plan/&#34;&gt;Trove component of the HASS RDC&lt;/a&gt;. One of the roundtables will look at &amp;lsquo;Existing tools that utilise Trove data and APIs&amp;rsquo;. Last year I wrote a summary of what &lt;a href=&#34;https://updates.timsherratt.org/2021/08/26/glam-workbench-a.html&#34;&gt;the GLAM Workbench can contribute to the development of humanities research infrastructure&lt;/a&gt;, particularly in regard to Trove. I thought it might be useful to update that list to include recent additions to the GLAM Workbench, as well as a range of other datasets, software, tools, and interfaces that exist outside of the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;Since last year&amp;rsquo;s post I&amp;rsquo;ve also been working hard to integrate the GLAM Workbench with other eResearch services such as &lt;a href=&#34;https://glam-workbench.net/using-nectar/&#34;&gt;Nectar&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/glam-tools/&#34;&gt;CloudStor&lt;/a&gt;, and to document and support the ways that &lt;a href=&#34;https://updates.timsherratt.org/2022/03/02/the-glam-workbench.html&#34;&gt;individuals and institutions can contribute&lt;/a&gt; code and documentation.&lt;/p&gt;
&lt;h2 id=&#34;getting-and-moving-data&#34;&gt;Getting and moving data&lt;/h2&gt;
&lt;p&gt;There’s lots of fabulous data in Trove and other GLAM collections. In fact, there’s so much data that it can be difficult for researchers to find and collect what’s relevant to their interests. There are many tools in the GLAM Workbench to help researchers assemble their own datasets. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Get newspaper articles in bulk with the Trove Newspaper and Gazette Harvester&lt;/a&gt;&lt;/strong&gt; – This has been around in some form for more than ten years (it pre-dates the Trove API!). Give it the url of a search in Trove’s newspapers and gazettes and the harvester will save all the metadata in a CSV file, and optionally download the complete articles as OCRd text, images, or PDFs. The amount of data you harvest is really only limited by your patience and disk space. I’ve harvested more than a million articles in the past. The GLAM Workbench includes a web app version of the harvester that runs live in the cloud – just paste in your Trove API key and the search url, and click the button.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#harvest-information-about-newspaper-issues&#34;&gt;&lt;strong&gt;Harvest information about newspaper issues&lt;/strong&gt;&lt;/a&gt; – When you search Trove&amp;rsquo;s newspapers, you find articles – these articles are grouped by page, and all the pages from a particular date make up an issue. But how do you find out what issues are available? On what dates were newspapers published? This notebook shows how you can get information about issues from the Trove API.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Get Trove newspaper pages as images&lt;/strong&gt; – If you need a nice, high-resolution version of a newspaper page you can &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#download-a-page-image&#34;&gt;use this web app&lt;/a&gt;. If you want to harvest every front page (or some other particular page) here’s an example that &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#harvest-australian-womens-weekly-covers-or-the-front-pages-of-any-newspaper&#34;&gt;gets all the covers of the &lt;em&gt;Australian Women’s Weekly&lt;/em&gt;&lt;/a&gt;. A pre-harvested &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#australian-womens-weekly-front-covers-1933-to-1982&#34;&gt;collection of the AWW covers&lt;/a&gt; is included as a bonus extra.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#save-a-trove-newspaper-article-as-an-image&#34;&gt;Get Trove newspaper articles as images&lt;/a&gt;&lt;/strong&gt; – The Trove web interface makes it difficult to download complete images of articles, but this tool will do the job. There’s a handy web app to grab individual images, but the code from this tool is reused in other places such as the Trove Newspaper Harvester and the Omeka uploader, and could be built-in to your own research workflows.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#harvest-the-issues-of-a-newspaper-as-pdfs&#34;&gt;&lt;strong&gt;Harvest the issues of a newspaper as PDFs&lt;/strong&gt;&lt;/a&gt; – This notebook harvests &lt;em&gt;whole issues&lt;/em&gt; of newspapers as PDFs – one PDF per issue.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#upload-trove-newspaper-articles-to-omeka-s&#34;&gt;Upload Trove newspaper articles to Omeka&lt;/a&gt;&lt;/strong&gt; – Whether you’re creating on online exhibition or building a research database, Omeka can be very useful. This notebook connects Trove’s newspapers to Omeka for easy upload. Your selected articles can come from a search query, a Trove list, a Zotero library, or just a list of article ids. Metadata records are created in Omeka for each article and newspaper, and an image of each article is attached. My &lt;a href=&#34;https://wragge.github.io/omeka_s_tools/api.html&#34;&gt;Omeka S Tools software package&lt;/a&gt; also includes an example using Trove newspapers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/#get-ocrd-text-from-a-digitised-journal-in-trove&#34;&gt;Get OCRd text from digitised periodicals in Trove&lt;/a&gt;&lt;/strong&gt; – They’re often overshadowed by the newspapers, but there’s now lots of digitised journals, magazines, and parliamentary papers in Trove. You can get article-level data from the API, but not issue data. This notebook enables researchers to get metadata and OCRd text from every available issue of a periodical. To make researchers’ lives even easier, I regularly harvest &lt;a href=&#34;https://glam-workbench.net/trove-journals/#ocrd-text-from-trove-digitised-journals&#34;&gt;&lt;strong&gt;all&lt;/strong&gt; the available OCRd text&lt;/a&gt; from digitised periodicals in Trove. The latest harvest downloaded 51,928 issues from 1,163 periodicals – that’s about 10gb of text. You can &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-journals/blob/master/digital-journals-with-text.md&#34;&gt;browse the list of periodicals&lt;/a&gt; with OCRd text, or &lt;a href=&#34;https://trove-digital-periodicals.glitch.me/data/trove-digital-journals&#34;&gt;search this database&lt;/a&gt;. All the OCRd text is stored in a public repository on CloudStor.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/#get-covers-or-any-other-pages-from-a-digitised-journal-in-trove&#34;&gt;Get page images from digitised periodicals in Trove&lt;/a&gt;&lt;/strong&gt; – There’s more than text in digitised periodicals, and you might want to download images of pages for visual analysis. This notebook shows you how to get cover images, but could be easily modified to get another page, or a PDF. I used a modified version of this to create &lt;a href=&#34;https://glam-workbench.net/trove-journals/#editorial-cartoons-from-the-bulletin-1886-to-1952&#34;&gt;a collection of 3,471 full page editorial cartoons&lt;/a&gt; from &lt;em&gt;The Bulletin&lt;/em&gt;, 1886 to 1952 – all available to download from CloudStor.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-books/#harvesting-the-text-of-digitised-books-and-ephemera&#34;&gt;Get OCRd text from digitised books in Trove&lt;/a&gt;&lt;/strong&gt; – Yep, there’s digitised books as well as newspapers and periodicals. You can download OCRd text from an individual book using the Trove web interface, but how do you make a collection of books without all that pointing and clicking? This notebook downloads all the available OCRd text from digitised books in Trove. The latest harvest includes &lt;a href=&#34;https://glam-workbench.net/trove-books/#ocrd-text-from-trove-books-and-ephemera&#34;&gt;text from 26,762 works&lt;/a&gt;. You can explore the results &lt;a href=&#34;https://trove-digital-books.glitch.me/data/trove-digital-books&#34;&gt;using this database&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/#harvest-parliament-press-releases-from-trove&#34;&gt;Harvest parliamentary press releases from Trove&lt;/a&gt;&lt;/strong&gt; – Trove includes more than 380,000 press releases, speeches, and interview transcripts issued by Australian federal politicians and saved by the Parliamentary Library. This notebook shows you how to harvest both metadata and fulltext from a search of the parliamentary press releases. For example, here’s a collection of &lt;a href=&#34;https://glam-workbench.net/trove-journals/#politicians-talking-about-immigrants-and-refugees&#34;&gt;politicians talking about ‘refugees’&lt;/a&gt;, and another &lt;a href=&#34;https://glam-workbench.net/trove-journals/#politicians-talking-about-covid&#34;&gt;relating to COVID-19&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/#harvest-abc-radio-national-records-from-trove&#34;&gt;Harvest details of Radio National programs from Trove&lt;/a&gt;&lt;/strong&gt; – Trove creates records for programs broadcast on ABC Radio National; for the major current affairs programs, these records are at segment level. Even though they don’t provide full transcripts, this data provides a rich, fine-grained record of Australia’s recent political, social, and economic history. This notebook shows you how to download the Radio National data. If you just want to dive straight in, there’s also a &lt;a href=&#34;https://glam-workbench.net/trove-music/#abc-radio-national-programs&#34;&gt;pre-harvested collection&lt;/a&gt; containing more than 400,000 records, with separate downloads for some of the main programs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/#find-all-the-archived-versions-of-a-web-page&#34;&gt;Find all the versions of an archived web page in Trove&lt;/a&gt;&lt;/strong&gt; – Many of the tools in the &lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;Web Archives section&lt;/a&gt; of the GLAM Workbench will work with the Australian Web Archive, which is part of Trove. This notebook shows you how to get data about the number of times a web page has been archived over time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/#harvesting-collections-of-text-from-archived-web-pages&#34;&gt;Harvesting collections of text from archived web pages in Trove&lt;/a&gt;&lt;/strong&gt; – If you want to explore how the content of a web page changes over time, you can use this notebook to capture the text content of every archived version of a web page.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-lists/#convert-a-trove-list-into-a-csv-file&#34;&gt;Convert a Trove list into a CSV file&lt;/a&gt;&lt;/strong&gt; – While Trove provides a data download option for lists, it leaves out a lot of useful data. This notebook downloads full details of newspaper articles and other works in a list and saves them as CSV files. Like the Trove Newspaper Harvester, it lets you download OCRd text and images from newspaper articles.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Collecting information about Trove user activity&lt;/strong&gt; – It’s not just the content of Trove that provides interesting research data; it’s also the way people engage with it. Using the Trove API it’s possible to harvest details of &lt;a href=&#34;https://glam-workbench.net/trove-lists/&#34;&gt;all user-created lists and tags&lt;/a&gt;. And yes, there are pre-harvested collections of &lt;a href=&#34;https://glam-workbench.net/trove-lists/#trove-lists-metadata&#34;&gt;lists&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/trove-lists/#trove-public-tags&#34;&gt;tags&lt;/a&gt; for the impatient.&lt;/li&gt;
&lt;/ul&gt;
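&lt;p&gt;For the web archive notebooks, the underlying data source is a Memento TimeMap: a machine-readable list of every capture of a URL. Here&amp;rsquo;s a minimal sketch of the kind of parsing involved; the sample TimeMap below is illustrative, not a real response.&lt;/p&gt;

```python
import re
from datetime import datetime

def parse_timemap(link_text):
    """Return (datetime, url) pairs for each memento in a link-format TimeMap."""
    pattern = r'<([^>]+)>;\s*rel="memento";\s*datetime="([^"]+)"'
    return [
        (datetime.strptime(dt, "%a, %d %b %Y %H:%M:%S %Z"), url)
        for url, dt in re.findall(pattern, link_text)
    ]

# An illustrative sample of a TimeMap (the URL patterns are made up)
sample = (
    '<http://example.org/>; rel="original",'
    '<https://web.archive.org.au/awa/19990101000000/http://example.org/>;'
    ' rel="memento"; datetime="Fri, 01 Jan 1999 00:00:00 GMT",'
    '<https://web.archive.org.au/awa/20050601120000/http://example.org/>;'
    ' rel="memento"; datetime="Wed, 01 Jun 2005 12:00:00 GMT"'
)

versions = parse_timemap(sample)
print(len(versions))  # 2 captures
```

&lt;p&gt;Counting the parsed versions over time is exactly the sort of data the &amp;lsquo;find all the archived versions&amp;rsquo; notebook charts.&lt;/p&gt;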
&lt;p&gt;While I’m focusing here on Trove, there are also tools to create datasets from the &lt;a href=&#34;https://glam-workbench.net/recordsearch/&#34;&gt;National Archives of Australia&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/digitalnz/&#34;&gt;Digital NZ and Papers Past&lt;/a&gt;, the &lt;a href=&#34;https://glam-workbench.net/nma/&#34;&gt;National Museum of Australia&lt;/a&gt; and more. And there’s a &lt;a href=&#34;https://glam-workbench.net/glam-data-list/&#34;&gt;big list of readily downloadable datasets&lt;/a&gt; from Australian GLAM organisations.&lt;/p&gt;
&lt;h2 id=&#34;visualisation-and-analysis&#34;&gt;Visualisation and analysis&lt;/h2&gt;
&lt;p&gt;Many of the notebooks listed above include examples that demonstrate ways of exploring and analysing your harvested data. There are also a number of companion notebooks that examine some possibilities in more detail, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-harvester/#exploring-your-troveharvester-data&#34;&gt;Explore your Trove newspaper harvests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-harvester/#display-the-results-of-a-harvest-as-a-searchable-database-using-datasette&#34;&gt;Load your Trove newspaper harvest in Datasette&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/#exploring-abc-radio-national-metadata&#34;&gt;Exploring ABC Radio National metadata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-lists/#analyse-public-tags-added-to-trove&#34;&gt;Analyse public tags added to Trove&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But there are also many other notebooks that demonstrate methods for analysing Trove’s content, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#querypic&#34;&gt;QueryPic&lt;/a&gt;&lt;/strong&gt; – Another tool that’s been around in different forms for a decade, QueryPic visualises searches in Trove’s newspapers. The latest web app couldn’t be simpler: just paste in your API key and a search URL to create charts showing the number of matching articles over time. You can combine queries, change time scales, and download the data and visualisations. Interested to see how other researchers have used it? Here&amp;rsquo;s a &lt;a href=&#34;https://updates.timsherratt.org/2021/08/30/some-research-projects.html&#34;&gt;Twitter thread with links to some publications&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#visualise-trove-newspaper-searches-over-time&#34;&gt;Visualise Trove newspaper searches over time&lt;/a&gt;&lt;/strong&gt; – This is like a deconstructed version of QueryPic that walks you through the process of using Trove’s facets to assemble a dataset of results over time. It provides a lot of detail on the sorts of data available, and the questions we can ask of it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#visualise-the-total-number-of-newspaper-articles-in-trove-by-year-and-state&#34;&gt;Visualise the total number of newspaper articles in Trove by year and state&lt;/a&gt;&lt;/strong&gt; – This notebook uses a modified version of the code above to analyse the construction and context of Trove’s newspaper corpus itself. What are you actually searching? Meet the WWI effect and the copyright cliff of death! This is a great place to start if you want to get people thinking critically about how digital resources are constructed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/examples/trove_newspaper_issues_per_day.html&#34;&gt;Trove newspapers – number of issues per day, 1803–2020&lt;/a&gt;&lt;/strong&gt; – A visualisation of the number of newspaper issues published &lt;strong&gt;every day&lt;/strong&gt; in Trove.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#analyse-rates-of-ocr-correction&#34;&gt;Analyse rates of OCR correction&lt;/a&gt;&lt;/strong&gt; – Some more meta-analysis of the Trove corpus itself, this time focusing on patterns of OCR correction by Trove users.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#finding-non-english-newspapers-in-trove&#34;&gt;Identifying non-English language newspapers in Trove&lt;/a&gt;&lt;/strong&gt; – There are a growing number of non-English language newspapers digitised in Trove. However, if you&amp;rsquo;re only searching using English keywords, you might never know that they&amp;rsquo;re there. This notebook analyses a sample of articles from every newspaper in Trove to identify non-English content.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#beyond-the-copyright-cliff-of-death&#34;&gt;Beyond the copyright cliff of death&lt;/a&gt;&lt;/strong&gt; – Most of the newspaper articles on Trove were published before 1955, but there are some from the later period. This notebook helps you find out how many, and which newspapers they were published in.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#map-trove-newspaper-results-by-state&#34;&gt;Map Trove newspaper results by state&lt;/a&gt;&lt;/strong&gt; – This notebook uses the Trove &lt;code&gt;state&lt;/code&gt; facet to create a choropleth map that visualises the number of search results per state.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#map-trove-newspaper-results-by-place-of-publication&#34;&gt;Map Trove newspaper results by place of publication&lt;/a&gt;&lt;/strong&gt; – This notebook uses the Trove &lt;code&gt;title&lt;/code&gt; facet to find the number of results per newspaper, then merges the results with a dataset of geolocated newspapers to map where articles were published.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/#compare-two-versions-of-an-archived-web-page&#34;&gt;Compare two versions of an archived web page&lt;/a&gt;&lt;/strong&gt; – This notebook demonstrates a number of different ways of comparing versions of archived web pages. Just choose a repository, enter a url, and select two dates to see comparisons based on: page metadata, basic statistics such as file size and number of words, numbers of internal and external links, cosine similarity of text, line by line differences in text or code, and screenshots.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/#display-changes-in-the-text-of-an-archived-web-page-over-time&#34;&gt;Display changes in the text of an archived web page over time&lt;/a&gt;&lt;/strong&gt; – This web app gathers all the available versions of a web page and then visualises changes in its content between versions – what’s been added, removed, and changed?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/#using-screenshots-to-visualise-change-in-a-page-over-time&#34;&gt;Use screenshots to visualise change in a page over time&lt;/a&gt;&lt;/strong&gt; – Create a series of full page screenshots of a web page over time, then assemble them into a time series.&lt;/li&gt;
&lt;/ul&gt;
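&lt;p&gt;Most of the time-series notebooks above share one core move: ask the API for a &lt;code&gt;year&lt;/code&gt; or &lt;code&gt;decade&lt;/code&gt; facet instead of individual records, then reshape the facet terms into a dataset. Here&amp;rsquo;s a minimal sketch of that reshaping step, using made-up facet terms; in a real response they&amp;rsquo;re nested inside the API&amp;rsquo;s JSON.&lt;/p&gt;

```python
# Reshape facet terms into a sorted (year, count) time series.
# These terms are invented for illustration; in a real Trove response
# they sit inside nested JSON and may arrive as strings.
facet_terms = [
    {"display": "1915", "count": "1850"},
    {"display": "1914", "count": "1200"},
    {"display": "1916", "count": "1700"},
]

# Convert to integers and sort chronologically
series = sorted((int(t["display"]), int(t["count"])) for t in facet_terms)
total = sum(count for _, count in series)

print(series)  # [(1914, 1200), (1915, 1850), (1916, 1700)]
print(total)   # 4750
```

&lt;p&gt;A series like this is what QueryPic and the &amp;lsquo;searches over time&amp;rsquo; notebook chart, one point per year.&lt;/p&gt;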
&lt;p&gt;There are also possibilities for using Trove data creatively. For example you can &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#create-scissors-and-paste-messages-from-trove-newspaper-articles&#34;&gt;create &amp;lsquo;scissors and paste&amp;rsquo; messages from Trove newspaper articles&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;documentation-and-examples&#34;&gt;Documentation and examples&lt;/h2&gt;
&lt;p&gt;All the Trove notebooks in the GLAM Workbench help document the possibilities and limits of the Trove API. The examples above can be modified and reworked to suit different research interests. Some notebooks also explore particular aspects of the API, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove/&#34;&gt;Trove API Introduction&lt;/a&gt;&lt;/strong&gt; – Some very basic examples of making requests and understanding results.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#todays-news-yesterday&#34;&gt;Today’s news yesterday&lt;/a&gt;&lt;/strong&gt; – Uses the &lt;code&gt;date&lt;/code&gt; index and the &lt;code&gt;firstpageseq&lt;/code&gt; parameter to find articles from exactly 100 years ago that were published on the front page. It then selects one of the articles at random and downloads and displays an image of the front page.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-images/#the-use-of-standard-licences-and-rights-statements-in-trove-image-records&#34;&gt;The use of standard licences and rights statements in Trove image records&lt;/a&gt;&lt;/strong&gt; – Version 2.1 of the Trove API introduced a new rights index that you can use to limit your search results to records that include one of a list of standard licences and rights statements. We can also use this index to build a picture of which rights statements are currently being used, and by whom.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-random/&#34;&gt;Random items from Trove&lt;/a&gt;&lt;/strong&gt; – Changes to the Trove API meant that techniques you could previously use to select resources at random no longer work. This section documents some alternative ways of retrieving random-ish works and newspaper articles from Trove.&lt;/li&gt;
&lt;/ul&gt;
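&lt;p&gt;All of these notebooks start from the same building block: a parameterised request to the Trove API. Here&amp;rsquo;s a rough sketch of assembling one; the endpoint and parameter names follow the v2 API documentation, and the key is a placeholder for your own.&lt;/p&gt;

```python
from urllib.parse import urlencode

# Endpoint for the Trove API v2 (per the NLA's API documentation)
API_URL = "https://api.trove.nla.gov.au/v2/result"

params = {
    "q": "refugees",       # search query
    "zone": "newspaper",   # search the digitised newspapers zone
    "encoding": "json",    # ask for JSON rather than the default XML
    "facet": "year",       # include per-year counts in the response
    "n": 0,                # no individual records, just the facet data
    "key": "YOUR_API_KEY", # placeholder – use your own key
}

query_url = f"{API_URL}?{urlencode(params)}"
print(query_url)
```

&lt;p&gt;Pasting the resulting URL into a browser returns the JSON response, which is a quick way to check a query before building it into a notebook.&lt;/p&gt;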
&lt;p&gt;And while it’s not officially part of the GLAM Workbench, I also maintain the &lt;a href=&#34;https://troveconsole.herokuapp.com/&#34;&gt;Trove API Console&lt;/a&gt; which provides lots of examples of the API in action.&lt;/p&gt;
&lt;h2 id=&#34;videos&#34;&gt;Videos&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve started making videos to help you get started with the GLAM Workbench.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://youtu.be/vdyKNowv9gw&#34;&gt;&lt;strong&gt;GLAM Workbench – use QueryPic to visualise searches in Trove&amp;rsquo;s digitised newspapers&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://youtu.be/J_LgNL2EM4M&#34;&gt;&lt;strong&gt;GLAM Workbench – use QueryPic to visualise searches in Trove&amp;rsquo;s digitised newspapers (part 2)&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://youtu.be/WKFuJR6lLF4&#34;&gt;&lt;strong&gt;GLAM Workbench – using the Trove Newspaper &amp;amp; Gazette Harvester (the web app version)&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;datasets&#34;&gt;Datasets&lt;/h2&gt;
&lt;p&gt;A number of pre-harvested datasets are noted above in the &amp;lsquo;Getting and moving data&amp;rsquo; section. Here&amp;rsquo;s a fairly complete list of ready-to-download datasets harvested from Trove.&lt;/p&gt;
&lt;h3 id=&#34;newspapers&#34;&gt;Newspapers&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.6471544&#34;&gt;Trove – newspaper totals (historical datasets harvested between 2011 and 2022)&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals&#34;&gt;Trove – current newspaper totals (harvested weekly since April 2022)&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#total-number-of-issues-per-year-for-every-newspaper-in-trove&#34;&gt;Trove – Total number of issues per year for every newspaper in Trove&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#complete-list-of-issues-for-every-newspaper-in-trove&#34;&gt;Trove – Complete list of issues for every newspaper in Trove&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#csv-formatted-lists-of-newspaper-titles-in-trove&#34;&gt;Trove – data from web archives showing when digitised newspaper titles were added to Trove&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://docs.google.com/spreadsheets/d/1rURriHBSf3MocI8wsdl1114t0YeyU0BVSXWeg232MZs/edit?usp=sharing&#34;&gt;Trove – geolocated newspaper titles&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#trove-newspapers-with-articles-published-after-1954&#34;&gt;Trove – newspapers with articles published after 1954&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#trove-newspapers-with-non-english-language-content&#34;&gt;Trove – newspapers with non-English language content&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://doi.org/10.6084/m9.figshare.1439432.v1&#34;&gt;Trove – faces extracted from Trove newspaper photographs&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#csv-formatted-list-of-australian-womens-weekly-issues-1933-to-1982&#34;&gt;Trove – Australian Women&amp;rsquo;s Weekly issues, 1933 to 1982&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#australian-womens-weekly-front-covers-1933-to-1982&#34;&gt;Trove – Australian Women&amp;rsquo;s Weekly front covers, 1933 to 1982&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;books-and-periodicals&#34;&gt;Books and periodicals&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-books/#csv-formatted-list-of-books-available-in-digital-form&#34;&gt;Trove – books available in digital form&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.github.io/trove-journals/#csv-formatted-list-of-journals-available-from-trove-in-digital-form&#34;&gt;Trove – periodicals available in digital form&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.github.io/trove-journals/#csv-formatted-list-of-journals-with-ocrd-text&#34;&gt;Trove – periodicals in Trove with OCRd text&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-books/#government-publications-in-digital-form&#34;&gt;Trove – government publications available in digital form&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.github.io/trove-books/#ocrd-text-from-trove-books-and-ephemera&#34;&gt;Trove – OCRd text from digitised books (and ephemera)&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.github.io/trove-journals/#ocrd-text-from-trove-digitised-journals&#34;&gt;Trove – OCRd text of digitised journals&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.github.io/trove-books/#ocrd-text-from-the-internet-archive-of-australian-books-listed-in-trove&#34;&gt;Trove &amp;amp; Internet Archive – OCRd text from the Internet Archive of &amp;lsquo;Australian&amp;rsquo; books listed in Trove&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/#politicians-talking-about-covid&#34;&gt;Trove &amp;amp; Parliamentary Library – parliamentary press releases relating to COVID-19&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.github.io/trove-journals/#politicians-talking-about-immigrants-and-refugees&#34;&gt;Trove &amp;amp; Parliamentary Library – parliamentary press releases relating to immigrants and refugees&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.github.io/trove-journals/#editorial-cartoons-from-the-bulletin-1886-to-1952&#34;&gt;Trove – editorial cartoons from The Bulletin, 1886 to 1952&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;other&#34;&gt;Other&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.github.io/trove-lists/#trove-lists-metadata&#34;&gt;Trove – public lists metadata&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-lists/#trove-tag-counts&#34;&gt;Trove – public tag counts&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.5094314&#34;&gt;Trove – public tags added to resources, 2008 to 2021&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/#abc-radio-national-programs&#34;&gt;Trove – Radio National program data&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.github.io/trove-maps/#csv-formatted-list-of-maps-with-high-resolution-downloads&#34;&gt;Trove – maps with high-resolution downloads&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See also &lt;a href=&#34;https://glam-workbench.net/glam-data-list/&#34;&gt;Sources of Australian GLAM data&lt;/a&gt; in the GLAM Workbench.&lt;/p&gt;
&lt;h2 id=&#34;software&#34;&gt;Software&lt;/h2&gt;
&lt;p&gt;The GLAM Workbench makes use of a number of Python packages that I&amp;rsquo;ve created to work with Trove data. These are openly licensed and available for installation from PyPI.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://pypi.org/project/troveharvester/&#34;&gt;&lt;strong&gt;Trove Harvester&lt;/strong&gt;&lt;/a&gt; – harvest newspaper and gazette articles&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://pypi.org/project/trove-query-parser/&#34;&gt;Trove Query Parser&lt;/a&gt;&lt;/strong&gt; – convert search parameters from the Trove web interface into a form the API understands&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://pypi.org/project/trove-newspaper-images/&#34;&gt;&lt;strong&gt;Trove Newspaper Images&lt;/strong&gt;&lt;/a&gt; – tools for downloading images from Trove&amp;rsquo;s digitised newspapers and gazettes&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;other-tools-and-interfaces&#34;&gt;Other tools and interfaces&lt;/h2&gt;
&lt;p&gt;Over the years I&amp;rsquo;ve developed many tools and interfaces using Trove data. Some have been replaced by the GLAM Workbench, but others keep chugging along, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://trove-titles.herokuapp.com/&#34;&gt;Explore Trove&amp;rsquo;s Digitised Journals&lt;/a&gt;&lt;/strong&gt; – a tool to help you browse, search, and explore Trove&amp;rsquo;s digitised journals&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://headlineroulette.net/&#34;&gt;Headline Roulette&lt;/a&gt;&lt;/strong&gt; – a simple, customisable game using Trove newspapers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://inaword.herokuapp.com/&#34;&gt;In a word&amp;hellip;&lt;/a&gt;&lt;/strong&gt; – currents in Australian affairs, 2003–2013, using data from ABC Radio National&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://troveconsole.herokuapp.com/&#34;&gt;Trove API Console&lt;/a&gt;&lt;/strong&gt; – use this to learn how to construct API queries&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://www.timsherratt.org/shed/trovetraces/traces/index.html&#34;&gt;Trove Traces&lt;/a&gt;&lt;/strong&gt; – archived version of a 2014 experiment to see who was citing Trove newspapers&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://twitter.com/trovenewsbot&#34;&gt;&lt;strong&gt;@TroveNewsBot&lt;/strong&gt;&lt;/a&gt; – unlike most GLAM Twitter bots, @TroveNewsBot doesn&amp;rsquo;t just tweet random stuff: you can use it to search Trove from Twitter. See the &lt;a href=&#34;https://wragge.github.io/trovenewsbot2019/&#34;&gt;full operating instructions&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://troveplaces.herokuapp.com/&#34;&gt;&lt;strong&gt;Trove Places&lt;/strong&gt;&lt;/a&gt; – click on a map to find Trove newspapers by place&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://ozglam.chat/t/capturing-trove-newspaper-articles-with-zotero/21&#34;&gt;&lt;strong&gt;Trove Zotero Translator&lt;/strong&gt;&lt;/a&gt; – lets you capture metadata, OCRd text, and a PDF from an article in Trove newspapers, installed as part of &lt;a href=&#34;https://www.zotero.org/&#34;&gt;Zotero&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://ozglam.chat/t/easy-browsing-of-trove-newspapers-with-these-keyboard-shortcuts/105&#34;&gt;Keyboard shortcuts for Trove newspapers&lt;/a&gt;&lt;/strong&gt; – this userscript activates your arrow keys to help you navigate newspapers by page and issue.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See also &lt;a href=&#34;https://glam-workbench.net/glam-tools-interfaces/&#34;&gt;More GLAM tools and interfaces&lt;/a&gt; in the GLAM Workbench. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/04/30/and-so-it.html</link>
      <pubDate>Sat, 30 Apr 2022 16:39:07 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/04/30/and-so-it.html</guid>
      <description>&lt;p&gt;And so it starts&amp;hellip; #GLAMWorkbench&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/94f142b02c.png&#34; width=&#34;600&#34; height=&#34;215&#34; alt=&#34;Screenshot of GLAM Workbook welcome page. Text states: &#39;This is a companion to the GLAM Workbench. Here you&#39;ll find documentation, tips, tutorials, and exercises to help you work with digital collections from galleries, libraries, archives, and museums (the GLAM sector).&#39;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/04/28/ok-ive-created.html</link>
      <pubDate>Thu, 28 Apr 2022 23:49:31 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/04/28/ok-ive-created.html</guid>
      <description>&lt;p&gt;Ok, I&amp;rsquo;ve created a new #GLAMWorkbench meta issue to bring together all the things I&amp;rsquo;m trying to do to improve &amp;amp; automate the code &amp;amp; documentation. This should help me keep track of things&amp;hellip; &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench.github.io/issues/53&#34;&gt;github.com/GLAM-Work&amp;hellip;&lt;/a&gt; #DayofDH2022&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2022/04/28/a-couple-of.html</link>
      <pubDate>Thu, 28 Apr 2022 22:33:41 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/04/28/a-couple-of.html</guid>
      <description>&lt;p&gt;A couple of hours of #DayofDH2022 left – feeling a bit uninspired, so I&amp;rsquo;m going to do some pruning &amp;amp; reorganising of the #GLAMWorkbench issues list: &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench.github.io/issues&#34;&gt;github.com/GLAM-Work&amp;hellip;&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Tracking Trove changes over time</title>
      <link>https://updates.timsherratt.org/2022/04/20/tracking-trove-changes.html</link>
      <pubDate>Wed, 20 Apr 2022 15:49:46 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/04/20/tracking-trove-changes.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been doing a bit of cleaning up, trying to make some old datasets more easily available. In particular I&amp;rsquo;ve been pulling together harvests of the number of newspaper articles in Trove by year and state. My &lt;a href=&#34;https://timsherratt.org/shed/trove/graphs/&#34;&gt;first harvests&lt;/a&gt; date all the way back to 2011, before there was even a Trove API. Unfortunately, I didn&amp;rsquo;t run the harvests as often as I should&amp;rsquo;ve and there are some big gaps. Nonetheless, if you&amp;rsquo;re interested in how Trove&amp;rsquo;s newspaper corpus has grown and changed over time, you might find them useful. They&amp;rsquo;re available in &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals-historical&#34;&gt;this repository&lt;/a&gt; and also &lt;a href=&#34;https://doi.org/10.5281/zenodo.6471544&#34;&gt;in Zenodo&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/2c1e7dd328.png&#34; alt=&#34;Chart showing number of newspaper articles per year available in Trove – harvested multiple times from 2011 to 2022&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This chart shows how the number of newspaper articles per year in Trove has changed from 2011 to 2022. Note the rapid growth between 2011 and 2015.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;To make sure there&amp;rsquo;s a more consistent record from now on, I&amp;rsquo;ve also created a new Git Scraper – &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals&#34;&gt;a GitHub repository&lt;/a&gt; that automatically harvests and saves data at weekly intervals. As well as the number of articles by &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_year.csv&#34;&gt;year&lt;/a&gt; and &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_year_and_state.csv&#34;&gt;state&lt;/a&gt;, it also harvests the number of articles by &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_newspaper.csv&#34;&gt;newspaper&lt;/a&gt; and &lt;a href=&#34;https://github.com/wragge/trove-newspaper-totals/blob/master/data/total_articles_by_category.csv&#34;&gt;category&lt;/a&gt;. If you want to get all the changes over time, you can retrieve earlier versions from the repository&amp;rsquo;s commit history.&lt;/p&gt;
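&lt;p&gt;Because each harvest is a commit, you can check out two snapshots of a file like &lt;code&gt;total_articles_by_year.csv&lt;/code&gt; and compare them to see where the corpus grew. Here&amp;rsquo;s a toy sketch of the comparison step; the totals below are invented, and the real numbers come from the repository&amp;rsquo;s CSV files.&lt;/p&gt;

```python
# Compare two harvest snapshots of yearly article totals.
# These values are invented for illustration; real snapshots come from
# the current CSV files plus earlier versions in the commit history.
harvest_2021 = {1914: 1200000, 1915: 1310000, 1954: 900000}
harvest_2022 = {1914: 1250000, 1915: 1330000, 1954: 905000, 1960: 15000}

# Growth per year: years missing from the earlier harvest count from zero
growth = {
    year: count - harvest_2021.get(year, 0)
    for year, count in sorted(harvest_2022.items())
}
print(growth)  # {1914: 50000, 1915: 20000, 1954: 5000, 1960: 15000}
```

&lt;p&gt;Applied to the real data, a diff like this shows both newly digitised titles and articles added to existing years.&lt;/p&gt;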
&lt;p&gt;All the datasets are CC-0 licensed and validated with Frictionless.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also a &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#visualise-the-total-number-of-newspaper-articles-in-trove-by-year-and-state&#34;&gt;notebook in the GLAM Workbench&lt;/a&gt; that explores this sort of data.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The GLAM Workbench wants you!</title>
      <link>https://updates.timsherratt.org/2022/03/02/the-glam-workbench.html</link>
      <pubDate>Wed, 02 Mar 2022 15:52:20 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/03/02/the-glam-workbench.html</guid>
      <description>&lt;p&gt;Over the past few months I&amp;rsquo;ve been doing a lot of behind-the-scenes work on the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; – &lt;a href=&#34;https://updates.timsherratt.org/2021/12/01/digitalnz-te-papa.html&#34;&gt;automating&lt;/a&gt;, &lt;a href=&#34;https://updates.timsherratt.org/2022/01/28/testing-testing.html&#34;&gt;standardising&lt;/a&gt;, and documenting processes for developing and managing repositories. These sorts of things ease the maintenance burden on me and help make the GLAM Workbench &lt;a href=&#34;https://glam-workbench.net/about/#is-the-glam-workbench-sustainable&#34;&gt;sustainable&lt;/a&gt;, even as it continues to grow. But these changes are also aimed at making it easier for &lt;strong&gt;you&lt;/strong&gt; to contribute to the GLAM Workbench!&lt;/p&gt;
&lt;p&gt;Perhaps you&amp;rsquo;re part of a GLAM organisation that wants to help researchers explore its collection data – why not create your own section of the GLAM Workbench? It would be a great opportunity for staff to develop their own digital skills and learn about the possibilities of Jupyter notebooks. I&amp;rsquo;ve developed a &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench-template&#34;&gt;repository template&lt;/a&gt; and some &lt;a href=&#34;https://glam-workbench.net/get-involved/developing-repositories/&#34;&gt;detailed documentation&lt;/a&gt; to get you started. The repository template includes everything you need to create and test notebooks, as well as built-in integration with Binder, Docker, Reclaim Cloud, and Zenodo. And, of course, I&amp;rsquo;ll be around to help you through the process.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2022/a46891dbeb.png&#34; alt=&#34;Screenshot of documentation&#34;&gt;&lt;/p&gt;
&lt;p&gt;Or perhaps you&amp;rsquo;re a researcher who wants to &lt;a href=&#34;https://glam-workbench.net/get-involved/contribute-code/&#34;&gt;share some code you&amp;rsquo;ve developed&lt;/a&gt; that extends or improves an existing GLAM Workbench repository. Yes please!  Or maybe you&amp;rsquo;re a GLAM Workbench user who has &lt;a href=&#34;https://glam-workbench.net/get-involved/add-links/&#34;&gt;something to add&lt;/a&gt; to one of the lists of resources; or you&amp;rsquo;ve noticed a problem with some of the documentation that &lt;a href=&#34;https://glam-workbench.net/get-involved/editing-documentation/&#34;&gt;you&amp;rsquo;d like to fix&lt;/a&gt;. All contributions welcome!&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/get-involved/&#34;&gt;Get involved!&lt;/a&gt; page includes links to all this information, as well as some other possibilities such as becoming a &lt;a href=&#34;https://glam-workbench.net/get-involved/supporters/&#34;&gt;sponsor&lt;/a&gt;, or sharing &lt;a href=&#34;https://updates.timsherratt.org/categories/glamworkbench/&#34;&gt;news&lt;/a&gt;. And to recognise those who make a contribution to the code or documentation there&amp;rsquo;s also a brand new &lt;a href=&#34;https://glam-workbench.net/contributors/&#34;&gt;contributors&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m looking forward to exploring how we can build the GLAM Workbench together. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Omeka S Tools – new Python package</title>
      <link>https://updates.timsherratt.org/2022/02/17/omeka-s-tools.html</link>
      <pubDate>Thu, 17 Feb 2022 10:31:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/02/17/omeka-s-tools.html</guid>
      <description>&lt;p&gt;Over the last couple of years I&#39;ve been fiddling with bits of Python code to work with the &lt;a href=&#34;https://omeka.org/s/&#34;&gt;Omeka S&lt;/a&gt; REST API. The &lt;a href=&#34;https://omeka.org/s/docs/developer/api/rest_api/&#34;&gt;Omeka S API&lt;/a&gt; is powerful, but the documentation is patchy, and doing basic things like uploading images can seem quite confusing. My code was an attempt to simplify common tasks, like creating new items.&lt;/p&gt;
&lt;p&gt;In case it&#39;s of use to others, I&#39;ve now &lt;a href=&#34;https://github.com/wragge/omeka_s_tools&#34;&gt;shared my code as a Python package&lt;/a&gt;. So you can just &lt;code&gt;pip install omeka-s-tools&lt;/code&gt; to get started. The code helps you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;download lists of resources&lt;/li&gt;
&lt;li&gt;search and filter lists of items&lt;/li&gt;
&lt;li&gt;create new items&lt;/li&gt;
&lt;li&gt;create new items based on a resource template&lt;/li&gt;
&lt;li&gt;update and delete resources&lt;/li&gt;
&lt;li&gt;add media to items&lt;/li&gt;
&lt;li&gt;add map markers to items (assuming the &lt;a href=&#34;https://omeka.org/s/modules/Mapping/&#34;&gt;Mapping&lt;/a&gt; module is installed)&lt;/li&gt;
&lt;li&gt;upload templates exported from one Omeka instance to a new instance&lt;/li&gt;
&lt;/ul&gt;
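&lt;p&gt;To give a sense of what the package is smoothing over: the Omeka S API expects item data as JSON-LD. Here&#39;s a minimal sketch of the kind of payload involved in creating an item (illustrative only, not the package&#39;s own code; the &lt;code&gt;property_id&lt;/code&gt; of 1 for &lt;code&gt;dcterms:title&lt;/code&gt; assumes a default Omeka S installation):&lt;/p&gt;

```python
# Sketch of the JSON-LD payload Omeka S expects when creating an item.
# Illustrative only -- omeka-s-tools builds payloads like this for you.
# The property_id of 1 for dcterms:title assumes a default install.
import json

def make_item_payload(title, property_id=1):
    """Build a minimal Omeka S item payload with a single title value."""
    return {
        "dcterms:title": [
            {
                "type": "literal",
                "property_id": property_id,
                "@value": title,
            }
        ]
    }

payload = make_item_payload("Glen Innes Examiner, 28 May 1932")
body = json.dumps(payload)
```

&lt;p&gt;The package constructs and posts structures like this for you, so you can work with plain titles and values instead of raw JSON-LD.&lt;/p&gt;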
&lt;p&gt;There&#39;s quite &lt;a href=&#34;https://wragge.github.io/omeka_s_tools/&#34;&gt;detailed documentation&lt;/a&gt; available, including an example of adding a newspaper article from Trove to Omeka. If you want to see the code in action, there&#39;s also a &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#upload-trove-newspaper-articles-to-omeka-s&#34;&gt;notebook&lt;/a&gt; in the &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove newspapers&lt;/a&gt; section of the GLAM Workbench that uploads newspaper articles (including images and OCRd text) to Omeka from a variety of sources, including Trove searches, Trove lists, and Zotero libraries.&lt;/p&gt;
&lt;p&gt;
  If you find any problems, or would like additional features, feel free to &lt;a href=&#34;https://github.com/wragge/omeka_s_tools/issues&#34;&gt;create an issue&lt;/a&gt; in the GitHub repository. #dhhacks
  &lt;br /&gt;
&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Testing, testing...</title>
      <link>https://updates.timsherratt.org/2022/01/28/testing-testing.html</link>
      <pubDate>Fri, 28 Jan 2022 15:01:15 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2022/01/28/testing-testing.html</guid>
      <description>&lt;p&gt;I regularly update the Python packages used in the different sections of the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;; though probably not as often as I should. Part of the problem is that once I&#39;ve updated the packages, I have to run all the notebooks to make sure I haven&#39;t inadvertently broken something -- and this takes time. And in those cases where the notebooks need an API key to run, I have to copy and paste the key in at the appropriate spots, then remember to delete them afterwords. They&#39;re little niggles, but they add up, particularly as the GLAM Workbench itself expands.&lt;/p&gt;
&lt;p&gt;I&#39;ve been looking around at Jupyter notebook automated testing options for a while. There&#39;s &lt;a href=&#34;https://github.com/treebeardtech/nbmake&#34;&gt;nbmake&lt;/a&gt;, &lt;a href=&#34;https://github.com/nteract/testbook&#34;&gt;testbook&lt;/a&gt;, and &lt;a href=&#34;https://github.com/computationalmodelling/nbval&#34;&gt;nbval&lt;/a&gt;, as well as custom solutions involving things like &lt;a href=&#34;https://github.com/nteract/papermill&#34;&gt;papermill&lt;/a&gt; and &lt;a href=&#34;https://github.com/jupyter/nbconvert&#34;&gt;nbconvert&lt;/a&gt;. After much wavering, I finally decided to give &lt;code&gt;nbval&lt;/code&gt; a go. The thing that I like about &lt;code&gt;nbval&lt;/code&gt; is that I can start simple, then increase the complexity of my testing as required. The &lt;code&gt;--nbval-lax&lt;/code&gt; option just checks to make sure that all the cells in a notebook run without generating exceptions. You can also tag individual cells that you want to exclude from testing. This gives me a testing baseline -- this notebook runs without errors -- it might not do exactly what I think it&#39;s doing, but at least it&#39;s not exploding in flames. Working from this baseline, I can start tagging individual cells where I want the output of the cell to be checked. This will let me test whether a cell is doing what it&#39;s supposed to.&lt;/p&gt;
&lt;p&gt;This approach means that I can start testing without making major changes to existing notebooks. The main thing I had to think about is how to handle API keys or other variables which are manually set by users. I decided the easiest approach was to store them in a &lt;code&gt;.env&lt;/code&gt; file and use &lt;a href=&#34;https://github.com/theskumar/python-dotenv&#34;&gt;dotenv&lt;/a&gt; to load them within the notebook. This also makes it easy for users to save their own credentials and use them across multiple notebooks -- no more cutting and pasting of keys! Some notebooks are designed to run as web apps using Voila, so they expect human interaction. In these cases, I added extra cells that only run in the testing environment -- they populate the necessary fields and simulate button clicks to start.&lt;/p&gt;
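&lt;p&gt;The dotenv pattern is simple enough to sketch with the standard library -- &lt;code&gt;python-dotenv&lt;/code&gt; does this (and more robustly) for you, so treat this as a simplified stand-in:&lt;/p&gt;

```python
# Simplified stand-in for what python-dotenv's load_dotenv() does:
# read KEY=value lines from a .env file into os.environ, so notebooks
# can pick up credentials without any copying and pasting of keys.
import os

def load_env(path=".env"):
    """Load KEY=value pairs from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # skip blank lines and comments
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip().strip('"')

# In a notebook you'd then read the key with os.environ.get("TROVE_API_KEY")
```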
&lt;p&gt;While I was in a QA frame of mind, I also started playing with &lt;a href=&#34;https://github.com/nbQA-dev/nbQA&#34;&gt;nbqa&lt;/a&gt; -- a framework for all sorts of code formatting, linting, and checking tools. I decided I&#39;d try to standardise the formatting of my notebook code by running &lt;a href=&#34;https://github.com/PyCQA/isort&#34;&gt;isort&lt;/a&gt;, &lt;a href=&#34;https://github.com/psf/black&#34;&gt;black&lt;/a&gt;, and &lt;a href=&#34;https://flake8.pycqa.org/en/latest/&#34;&gt;flake8&lt;/a&gt;. As well as making the code cleaner and more readable, they pick up things like unused imports or variables. To further automate this process, I configured the &lt;code&gt;nbqa&lt;/code&gt; checks to run when I try to commit any notebook code changes using &lt;code&gt;git&lt;/code&gt;. This was made easy by the &lt;a href=&#34;https://pre-commit.com/&#34;&gt;pre-commit&lt;/a&gt; package.&lt;/p&gt;
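&lt;p&gt;For reference, wiring &lt;code&gt;nbqa&lt;/code&gt; into &lt;code&gt;pre-commit&lt;/code&gt; only takes a few lines of configuration -- something like the following &lt;code&gt;.pre-commit-config.yaml&lt;/code&gt; (the &lt;code&gt;rev&lt;/code&gt; shown is illustrative; pin whichever release you&#39;re actually using):&lt;/p&gt;

```yaml
# .pre-commit-config.yaml -- run nbqa checks on notebooks before each commit.
# The rev below is illustrative; pin the release you're actually using.
repos:
  - repo: https://github.com/nbQA-dev/nbQA
    rev: 1.2.2
    hooks:
      - id: nbqa-isort
      - id: nbqa-black
      - id: nbqa-flake8
```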
&lt;p&gt;This is all set up and running in the &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove newspapers&lt;/a&gt; repository -- you can &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers/releases/tag/v1.3.0&#34;&gt;see the changes here&lt;/a&gt;. Now if I update the Python packages or make any other changes to the repository, I can just run &lt;code&gt;pytest --nbval-lax&lt;/code&gt; to test &lt;b&gt;every&lt;/b&gt; notebook at once. And if I make changes to an individual notebook, &lt;code&gt;nbqa&lt;/code&gt; will automatically give the changes a code quality check before I save them to the repository. I&#39;m planning to roll these changes out across the whole of the GLAM Workbench in coming months.&lt;/p&gt;
&lt;p&gt;
  Developments like these are not very exciting for users, but they&#39;re important for the management and sustainability of the GLAM Workbench, and help create a solid foundation for future development and collaboration. Last year I created a &lt;a href=&#34;https://updates.timsherratt.org/2021/11/11/a-template-for.html&#34;&gt;GLAM Workbench repository template&lt;/a&gt; to help people or organisations thinking about contributing new sections. I can now add these testing and QA steps into the template to further share and standardise the work of developing the GLAM Workbench.
  &lt;br /&gt;
&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Some big pictures of newspapers in Trove and DigitalNZ</title>
      <link>https://updates.timsherratt.org/2021/12/09/some-big-pictures.html</link>
      <pubDate>Thu, 09 Dec 2021 10:44:24 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/12/09/some-big-pictures.html</guid>
      <description>&lt;p&gt;One of the things I really like about Jupyter is the fact that I can share notebooks in a variety of different formats. Tools like &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#querypic&#34;&gt;QueryPic&lt;/a&gt; can run as simple web apps using Voila, static versions of notebooks can be viewed using NBViewer, and live versions can be spun up as required on Binder. It’s also possible to export notebooks at PDFs, slideshows, or just plain-old HTML pages. Just recently I realised I could export notebooks to HTML using the same template I use for Voila. This gives me &lt;em&gt;another&lt;/em&gt; way of sharing – static web pages delivered via the main &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; site.&lt;/p&gt;
&lt;p&gt;Here’s a couple of examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/digitalnz-views/papers_past_newspapers.html&#34;&gt;Papers Past newspapers in DigitalNZ&lt;/a&gt; – showing which Papers Past newspapers are available through DigitalNZ&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/examples/trove_newspaper_issues_per_day.html&#34;&gt;Trove newspapers – number of issues per day, 1803–2020&lt;/a&gt; – the number of newspaper issues published &lt;strong&gt;every day&lt;/strong&gt; in Trove&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/83610c54c6.png&#34; alt=&#34;&#34; title=&#34;Chart showing the number of newspaper issues in Trove published on each day in 1834.&#34;&gt;&lt;/p&gt;
&lt;p&gt;Both are HTML pages that embed visualisations created using &lt;a href=&#34;https://altair-viz.github.io/&#34;&gt;Altair&lt;/a&gt;. The visualisations are rendered using javascript, and even though the notebook isn’t running in a live computing environment, there’s some basic interactivity built-in – for example, you can hover for more details, and click on the DigitalNZ chart to search for articles from a newspaper. More to come! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Exploring GLAM data at ResBaz</title>
      <link>https://updates.timsherratt.org/2021/12/09/exploring-glam-data.html</link>
      <pubDate>Thu, 09 Dec 2021 10:13:48 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/12/09/exploring-glam-data.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://vimeo.com/652696272&#34;&gt;video&lt;/a&gt; of my key story presentation at &lt;a href=&#34;https://resbaz.github.io/resbaz2021qld/&#34;&gt;ResBaz Queensland&lt;/a&gt; (simulcast via &lt;a href=&#34;https://resbaz.github.io/resbaz2021/sydney/&#34;&gt;ResBaz Sydney&lt;/a&gt;) is now available on Vimeo. In it, I explore some of the possibilities of GLAM data by retracing my own journey through WWI service records, &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/&#34;&gt;The Real Face of White Australia&lt;/a&gt;, #redactionart, and Trove – ending up at the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;, which brings together a lot of my tools and resources in a form that anyone can use. The &lt;a href=&#34;https://slides.com/wragge/resbaz-2021&#34;&gt;slides&lt;/a&gt; are also available, and there’s an &lt;a href=&#34;https://doi.org/10.5281/zenodo.5760042&#34;&gt;archived version&lt;/a&gt; of everything in Zenodo.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://vimeo.com/652696272&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/34f5c3763d.png&#34; alt=&#34;&#34; title=&#34;Screencap of video showing #redactionart.&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This and many other presentations about the GLAM Workbench are &lt;a href=&#34;https://glam-workbench.net/presentations/&#34;&gt;listed here&lt;/a&gt;. It seems I’ve given at least 11 talks and workshops this year! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>GLAM Workbench Nectar Cloud Application updated!</title>
      <link>https://updates.timsherratt.org/2021/12/01/glam-workbench-nectar.html</link>
      <pubDate>Wed, 01 Dec 2021 11:20:53 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/12/01/glam-workbench-nectar.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://updates.timsherratt.org/2021/12/01/digitalnz-te-papa.html&#34;&gt;newly-updated&lt;/a&gt; &lt;a href=&#34;https://glam-workbench.net/digitalnz/&#34;&gt;DigitalNZ&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/tepapa/&#34;&gt;Te Papa&lt;/a&gt; sections of the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench &lt;/a&gt;have been added to the list of available repositories in the &lt;a href=&#34;https://dashboard.rc.nectar.org.au/&#34;&gt;Nectar Research Cloud&lt;/a&gt;’s GLAM Workbench Application. This means you can create your very own version of these repositories running in the Nectar Cloud, simply by choosing them from the app’s dropdown list. See the &lt;a href=&#34;https://glam-workbench.net/using-nectar/&#34;&gt;Using Nectar&lt;/a&gt; help page for more information.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/ad2303f5e6.png&#34; alt=&#34;&#34; title=&#34;Screenshot of Nectar app showing dropdown list&#34;&gt;&lt;/p&gt;
&lt;p&gt;I’ve also taken the opportunity to make use of the new container registry service developed by the ARDC as part of the &lt;a href=&#34;https://arcos.ardc.edu.au/&#34;&gt;ARCOS&lt;/a&gt; project. The app now pulls the GLAM Workbench Docker images from &lt;a href=&#34;https://quay.io/organization/glamworkbench&#34;&gt;Quay.io&lt;/a&gt; via the container registry’s cache. This means that copies of the images are cached locally, speeding things up and saving on data transfers. Yay for integration!&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/cb4e1280cb.png&#34; alt=&#34;&#34; title=&#34;Screenshot of container registry interface showing cached images&#34;&gt;&lt;/p&gt;
&lt;p&gt;Thanks again to Andy and the Nectar Cloud staff for their help! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>DigitalNZ &amp; Te Papa sections of the GLAMWorkbench updated!</title>
      <link>https://updates.timsherratt.org/2021/12/01/digitalnz-te-papa.html</link>
      <pubDate>Wed, 01 Dec 2021 10:41:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/12/01/digitalnz-te-papa.html</guid>
      <description>&lt;p&gt;In preparation for my talk at &lt;a href=&#34;https://resbaz.auckland.ac.nz/&#34;&gt;ResBaz Aotearoa&lt;/a&gt;, I updated the &lt;a href=&#34;https://glam-workbench.net/digitalnz/&#34;&gt;DigitalNZ&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/tepapa/&#34;&gt;Te Papa&lt;/a&gt; sections of the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;. Most of the changes are related to management, maintenance, and integration of the repositories. Things like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Setting up GitHub actions to automatically generate Docker images when the repositories change, and to upload the images to the &lt;a href=&#34;https://quay.io/organization/glamworkbench&#34;&gt;Quay.io container registry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Automatic generation of an &lt;code&gt;index.ipynb&lt;/code&gt; file based on &lt;code&gt;README.md&lt;/code&gt; to act as a front page within Jupyter Lab&lt;/li&gt;
&lt;li&gt;Addition of a &lt;code&gt;reclaim-manifest.jps&lt;/code&gt; file to allow for one-click installation of the repository in &lt;a href=&#34;https://reclaim.cloud/&#34;&gt;Reclaim Cloud&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Additional documentation in &lt;code&gt;README.md&lt;/code&gt; with instructions on how to run the repository via &lt;a href=&#34;https://mybinder.org/&#34;&gt;Binder&lt;/a&gt;, &lt;a href=&#34;https://reclaim.cloud/&#34;&gt;Reclaim Cloud&lt;/a&gt;, &lt;a href=&#34;https://dashboard.rc.nectar.org.au/&#34;&gt;Nectar Research Cloud&lt;/a&gt;, and Docker Desktop.&lt;/li&gt;
&lt;li&gt;Addition of a &lt;code&gt;.zenodo.json&lt;/code&gt; metadata file so that new releases are preserved in Zenodo&lt;/li&gt;
&lt;li&gt;Switch to using &lt;code&gt;pip-tools&lt;/code&gt; for generating &lt;code&gt;requirements.txt&lt;/code&gt; files, with unpinned requirements listed in &lt;code&gt;requirements.in&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Update of all Python packages&lt;/li&gt;
&lt;/ul&gt;
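&lt;p&gt;The &lt;code&gt;index.ipynb&lt;/code&gt; generation is less magic than it might sound: a notebook file is just JSON, so the simplest possible version wraps the whole of &lt;code&gt;README.md&lt;/code&gt; in a single markdown cell. A rough sketch (the actual build step in the repositories may do more than this):&lt;/p&gt;

```python
# Minimal sketch of turning README.md into an index.ipynb front page.
# A .ipynb file is just JSON, so the simplest approach wraps the whole
# README in a single markdown cell. The real build step in the GLAM
# Workbench repositories may well be more sophisticated than this.
import json

def readme_to_notebook(markdown_text):
    """Wrap markdown text in a minimal nbformat 4 notebook structure."""
    return {
        "cells": [
            {
                "cell_type": "markdown",
                "metadata": {},
                "source": markdown_text.splitlines(keepends=True),
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }

nb = readme_to_notebook("# My GLAM Workbench section\n\nWelcome!\n")
nb_json = json.dumps(nb, indent=1)
```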
&lt;p&gt;From the user’s point of view, the main benefit of these changes is the ability to run the repositories in a variety of different environments depending on your needs and skills. The Docker images, generated using &lt;a href=&#34;https://github.com/jupyterhub/repo2docker&#34;&gt;repo2docker&lt;/a&gt;, are used by Binder, Reclaim Cloud, Nectar, and Docker Desktop. Same image, multiple environments! See ‘Run these notebooks’ in the &lt;a href=&#34;https://glam-workbench.net/digitalnz/&#34;&gt;DigitalNZ&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/tepapa/&#34;&gt;Te Papa&lt;/a&gt; sections of the GLAM Workbench for more information.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/d40a7d95cb.png&#34; alt=&#34;&#34; title=&#34;Screenshot of visualisation displaying information about Papers Past newspapers available through DigitalNZ.&#34;&gt;&lt;/p&gt;
&lt;p&gt;Of course, I’ve also re-run all of the notebooks to make sure everything works and to update any statistics, visualisations, and datasets. As a bonus, there’s a couple of new notebooks in the DigitalNZ repository:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/digitalnz/#querypic-digitalnz&#34;&gt;QueryPic DigitalNZ&lt;/a&gt; – a web app to visualise searches in Papers Past over time (I’ll post some further info about this shortly)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/digitalnz/#papers-past-newspapers-in-digitalnz&#34;&gt;Papers Past newspapers in DigitalNZ&lt;/a&gt; – displays details of the Papers Past newspapers available through DigitalNZ (you can &lt;a href=&#34;https://glam-workbench.net/digitalnz-views/papers_past_newspapers.html&#34;&gt;view the results&lt;/a&gt; as an HTML page)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;#dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>A template for GLAM Workbench development</title>
      <link>https://updates.timsherratt.org/2021/11/11/a-template-for.html</link>
      <pubDate>Thu, 11 Nov 2021 17:03:43 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/11/11/a-template-for.html</guid>
      <description>&lt;p&gt;I’m hoping that the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; will encourage GLAM organisations and GLAM data nerds (like me) to create their own Jupyter notebooks. If they do, they can put a link to them in the list of &lt;a href=&#34;https://glam-workbench.net/more-glam-notebooks/&#34;&gt;GLAM Jupyter resources&lt;/a&gt;. But what if they want to add the notebooks to the GLAM Workbench itself?&lt;/p&gt;
&lt;p&gt;To make this easier, I’ve been working on &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench-template&#34;&gt;a template repository for the GLAM Workbench&lt;/a&gt;. It generates a new skeleton repository with all the files you need to develop and manage your own section of the GLAM Workbench. It uses GitHub’s built-in templating feature, together with &lt;a href=&#34;https://cookiecutter.readthedocs.io/en/1.7.2/&#34;&gt;Cookiecutter&lt;/a&gt;, and this &lt;a href=&#34;https://github.com/stefanbuck/cookiecutter-template/blob/main/.github/workflows/setup-repository.yml&#34;&gt;GitHub Action&lt;/a&gt; by Stefan Buck. Stefan has also written a &lt;a href=&#34;https://stefanbuck.com/blog/repository-templates-meets-github-actions&#34;&gt;very helpful blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The new repository is configured to do various things automatically, such as generate and save Docker images, and integrate with Reclaim Cloud and Zenodo. Lurking inside the &lt;code&gt;dev&lt;/code&gt; folder of each new repository, you’ll find &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench-template/tree/master/%7B%7Bcookiecutter.temp_directory%7D%7D/dev#readme&#34;&gt;some basic details&lt;/a&gt; on how to set up and manage your development environment.&lt;/p&gt;
&lt;p&gt;This is just the first step. There’s more documentation to come, but you’re very welcome to try it out. And, of course, if you &lt;em&gt;are&lt;/em&gt; interested in contributing to the development of the GLAM Workbench, let me know and I’ll help get you set up!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Coming up! GLAM Workbench at ResBaz(s)</title>
      <link>https://updates.timsherratt.org/2021/11/04/coming-up-glam.html</link>
      <pubDate>Thu, 04 Nov 2021 15:28:30 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/11/04/coming-up-glam.html</guid>
      <description>&lt;p&gt;Want a bit of added GLAM with your digital research skills? You’re in luck, as I’ll be speaking at not one, but three ResBaz events in November. If you haven’t heard of it before, ResBaz (Research Bazaar) is ‘a worldwide festival promoting the digital literacy at the centre of modern research’.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;On Wednesday, 24 November I’ll be giving a key story presentation (like a keynote, but with more story!) entitled &lt;a href=&#34;https://resbaz.github.io/resbaz2021qld/schedule/#session-100&#34;&gt;Exploring GLAM data&lt;/a&gt; for &lt;a href=&#34;https://resbaz.github.io/resbaz2021qld/&#34;&gt;ResBaz Queensland&lt;/a&gt;. This presentation will also be simulcast through &lt;a href=&#34;https://resbaz.github.io/resbaz2021/sydney/&#34;&gt;ResBaz Sydney&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;On Thursday, 25 November I’ll be giving a presentation on the &lt;a href=&#34;https://resbaz.auckland.ac.nz/schedule/#session-1479&#34;&gt;GLAM Workbench&lt;/a&gt; for &lt;a href=&#34;https://resbaz.auckland.ac.nz/&#34;&gt;ResBaz Aotearoa&lt;/a&gt; – focusing in particular on NZ GLAM data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The programs of all three ResBaz events are chock full of excellent opportunities to develop your digital skills, learn new research methods, and explore digital tools. If you’re an HDR student you should check out what’s on offer.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>New video – using the Trove Newspaper &amp; Gazette Harvester</title>
      <link>https://updates.timsherratt.org/2021/11/01/new-video-using.html</link>
      <pubDate>Mon, 01 Nov 2021 10:32:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/11/01/new-video-using.html</guid>
      <description>&lt;p&gt;The latest help video for the GLAM Workbench walks through the web app version of the &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper &amp;amp; Gazette Harvester&lt;/a&gt;. Just paste in your search url and Trove API key and you can harvest thousands of digitised newspaper articles in minutes!&lt;/p&gt;
&lt;iframe width=&#34;100%&#34; height=&#34;400px&#34; src=&#34;https://www.youtube.com/embed/WKFuJR6lLF4&#34; title=&#34;YouTube video player&#34; frameborder=&#34;0&#34; allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&#34; allowfullscreen&gt;&lt;/iframe&gt;
</description>
    </item>
    
    <item>
      <title>Harvest newspaper issues as PDFs</title>
      <link>https://updates.timsherratt.org/2021/11/01/harvest-newspaper-issues.html</link>
      <pubDate>Mon, 01 Nov 2021 09:53:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/11/01/harvest-newspaper-issues.html</guid>
      <description>&lt;p&gt;An inquiry on Twitter prompted me to put together &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#harvest-the-issues-of-a-newspaper-as-pdfs&#34;&gt;a notebook&lt;/a&gt; that you can use to &lt;strong&gt;download all available issues of a newspaper as PDFs&lt;/strong&gt;. It was really just a matter of copying code from other tools and making a few modifications. The first step harvests a list of available issues for a particular newspaper from Trove. You can then download the PDFs of those issues, supplying an optional date range. Beware – this could consume a lot of disk space!&lt;/p&gt;
&lt;p&gt;The PDF file names have the following structure:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[newspaper identifier]-[issue date as YYYYMMDD]-[issue identifier].pdf
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;903-19320528-1791051.pdf
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;903&lt;/code&gt; – the &lt;a href=&#34;https://trove.nla.gov.au/newspaper/title/903&#34;&gt;Glen Innes Examiner&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;19320528&lt;/code&gt; – 28 May 1932&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1791051&lt;/code&gt; – to view in Trove just add this to &lt;code&gt;http://nla.gov.au/nla.news-issue&lt;/code&gt;, eg &lt;a href=&#34;http://nla.gov.au/nla.news-issue1791051&#34;&gt;http://nla.gov.au/nla.news-issue1791051&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
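&lt;p&gt;Those components are easy to pull back apart programmatically. A small sketch (it assumes every filename follows the structure above):&lt;/p&gt;

```python
# Parse a harvested PDF filename like '903-19320528-1791051.pdf' back
# into its components, and build the corresponding Trove issue url.
# Assumes the filename always follows the structure described above.
from datetime import datetime

def parse_issue_filename(filename):
    """Split a harvested PDF filename into newspaper id, date, and issue id."""
    stem = filename.rsplit(".", 1)[0]
    newspaper_id, date_str, issue_id = stem.split("-")
    return {
        "newspaper_id": newspaper_id,
        "date": datetime.strptime(date_str, "%Y%m%d").date().isoformat(),
        "issue_id": issue_id,
        "url": f"http://nla.gov.au/nla.news-issue{issue_id}",
    }

details = parse_issue_filename("903-19320528-1791051.pdf")
# details["date"] is "1932-05-28"
```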
&lt;p&gt;I also took the opportunity to create a new &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#harvesting-data&#34;&gt;Harvesting data&lt;/a&gt; heading in the &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove newspapers&lt;/a&gt; section of the GLAM Workbench. #dhhacks&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/af229b4fa2.png&#34; alt=&#34;&#34; title=&#34;Screenshot of the Harvesting data section in Trove newspapers&#34;&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>GLAM Workbench now in the Nectar Research Cloud!</title>
      <link>https://updates.timsherratt.org/2021/10/21/glam-workbench-now.html</link>
      <pubDate>Thu, 21 Oct 2021 10:18:11 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/10/21/glam-workbench-now.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; isn’t dependent on one big piece of technological infrastructure. It’s basically a collection of Jupyter notebooks, and those notebooks can be used within a variety of different environments. This helps make the GLAM Workbench more sustainable – new components can be swapped in and out as required. It also makes it possible to create different pathways for users, depending on their digital skills, institutional support, and research needs. For example, links to &lt;a href=&#34;https://glam-workbench.net/using-binder/&#34;&gt;Binder&lt;/a&gt; make it easy for users to explore the possibilities of the GLAM Workbench and accomplish quick tasks. But Binder has limits. Where do you go when your research project scales up?&lt;/p&gt;
&lt;p&gt;Earlier this year I added one-click installation of GLAM Workbench repositories in &lt;a href=&#34;https://glam-workbench.net/using-reclaim-cloud/&#34;&gt;Reclaim Cloud&lt;/a&gt;. &lt;strong&gt;Today I’m very pleased to announce that selected GLAM Workbench repositories can be installed as applications within the &lt;a href=&#34;https://ardc.edu.au/services/nectar-research-cloud/&#34;&gt;Nectar Research Cloud&lt;/a&gt;.&lt;/strong&gt; Using nationally-funded digital infrastructure, researchers in Australian universities can now create their own workbenches in minutes. So whether you’re harvesting truckloads of data from Trove or analysing web archives at scale, you can move beyond Binder and set up an environment dedicated to your research project. Cool huh?&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/e10ef8218a.png&#34; alt=&#34;You can install GLAM Workbench repositories using this simple application in Nectar!&#34;&gt;&lt;/p&gt;
&lt;p&gt;Currently four repositories can be installed on Nectar in this way:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove newspapers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove newspaper and gazette harvester&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/recordsearch/&#34;&gt;RecordSearch&lt;/a&gt; (National Archives of Australia)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;Web archives&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But more will be added in the future as I update repositories to generate the necessary Docker images. Nectar installation information is now included in each of these four repositories, and I’ve added a &lt;a href=&#34;https://glam-workbench.net/using-nectar/&#34;&gt;Using the Nectar Cloud&lt;/a&gt; section to the help documentation that includes a detailed walkthrough of the installation process. If you strike any problems either &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench.github.io/issues&#34;&gt;raise an issue&lt;/a&gt; on GitHub, or ask a question at &lt;a href=&#34;https://ozglam.chat/c/glam-workbench/8&#34;&gt;OzGLAM Chat&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Huge thanks to Andy, Jacob, and Jo&lt;/strong&gt; at the &lt;a href=&#34;https://ardc.edu.au/&#34;&gt;Australian Research Data Commons&lt;/a&gt; (ARDC) who responded enthusiastically to my tweeted query, and packaged the repositories up into an easy-to-install, &lt;a href=&#34;https://github.com/NeCTAR-RC/murano-glamworkbench&#34;&gt;reusable application&lt;/a&gt;. After all the work I’ve put into the GLAM Workbench, it’s really exciting to see it embedded within Australia’s digital research infrastructure. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>More GLAM Name Index updates from Queensland State Archives and SLWA</title>
      <link>https://updates.timsherratt.org/2021/10/18/more-glam-name.html</link>
      <pubDate>Mon, 18 Oct 2021 10:40:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/10/18/more-glam-name.html</guid>
      <description>&lt;p&gt;A new version of the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt; is available. An additional 49 indexes have been added, bringing the total to 246. You can now search for names in more than &lt;strong&gt;10.2 million records&lt;/strong&gt; from 9 organisations.&lt;/p&gt;
&lt;p&gt;The new indexes come from Queensland State Archives and the State Library of WA. QSA &lt;a href=&#34;https://twitter.com/QSArchives/status/1448891637116067840&#34;&gt;announced on Friday&lt;/a&gt; that they’d added two new indexes to their site. When I went to harvest them, I realised there was another 25 indexes that I hadn’t previously picked up. It seems that some QSA datasets are tagged as ‘Queensland State Archives’ in the data.qld.gov.au portal, but others are tagged as ‘queensland state archives’ – and the tag search is case sensitive! I now search for both the upper and lower case tags.&lt;/p&gt;
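&lt;p&gt;Harvesting under both tag variants means the same dataset can turn up twice, so the results need to be merged by dataset id. The pattern looks something like this (the &lt;code&gt;fetch&lt;/code&gt; function is a placeholder for whatever actually queries the data.qld.gov.au API):&lt;/p&gt;

```python
# Query both case variants of a tag and merge the results by dataset id,
# since the data.qld.gov.au tag search is case sensitive. The fetch
# function here is a placeholder for a real API call.
def harvest_datasets(fetch, tags=("Queensland State Archives", "queensland state archives")):
    """Fetch datasets for each tag variant, deduplicating by dataset id."""
    seen = {}
    for tag in tags:
        for dataset in fetch(tag):
            seen[dataset["id"]] = dataset
    return list(seen.values())

# Example with a stubbed fetch function standing in for the portal API:
def fake_fetch(tag):
    if tag.islower():
        return [{"id": "b", "title": "Index B"}]
    return [{"id": "a", "title": "Index A"}, {"id": "b", "title": "Index B"}]

results = harvest_datasets(fake_fetch)
# results contains two unique datasets, not three
```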
&lt;p&gt;There’s also a number of additions from the State Library of WA. These datasets were already in my harvest, but because of some oddities in their formatting, I hadn’t included them in the Index Search. Looking at them again, I realised they were right to go, so I’ve added them in.&lt;/p&gt;
&lt;p&gt;Here’s the list of additions:&lt;/p&gt;
&lt;h3 id=&#34;queensland-state-archives&#34;&gt;Queensland State Archives&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Australian South Sea Islanders 1867 to 1908 - A-K&lt;/li&gt;
&lt;li&gt;Australian South Sea Islanders 1867 to 1908 L-Z&lt;/li&gt;
&lt;li&gt;Beaudesert Shire Burials - Logan Village 1878-2000 - Beaudesert Shire and Logan Village Burials 1878-2000&lt;/li&gt;
&lt;li&gt;Immigrants, Bowen Immigration Depot 1885-1892&lt;/li&gt;
&lt;li&gt;Brisbane Gaol Hospital Admission registers 1889-1911 - Index to Brisbane Gaol Hospital Admission Registers 1889-1911&lt;/li&gt;
&lt;li&gt;Index to Correspondence of Queensland Colonial Secretary 1859-1861 - Index to Colonial Secretary s Correspondence Bundles 1859 - 1861.csv&lt;/li&gt;
&lt;li&gt;Dunwich Benevolent Asylum records - Index to Dunwich Benevolent Asylum 1859-1948&lt;/li&gt;
&lt;li&gt;Dunwich Benevolent Asylum records - Index to Dunwich Benevolent Asylum 1885-1907&lt;/li&gt;
&lt;li&gt;Immigrants and Crew 1860-1865 (COL/A) - Index to Immigrants and Crew 1860 - 1964&lt;/li&gt;
&lt;li&gt;Index to Immigration 1909-1932&lt;/li&gt;
&lt;li&gt;Outdoor Relief 1900-1904 - Index to Outdoor Relief 1892-1920&lt;/li&gt;
&lt;li&gt;Pensions 1908-1919 - Index to Pensions 1908-1909&lt;/li&gt;
&lt;li&gt;Cases &amp;amp; treatment Moreton Bay Hospital 1830-1862 - Index to Register of Cases and treatment at Moreton Bay Hospital 1830-1862&lt;/li&gt;
&lt;li&gt;Index to Registers of Agricultural Lessees 1885-1908&lt;/li&gt;
&lt;li&gt;Index to Registers of Immigrants, Rockhampton 1882-1915&lt;/li&gt;
&lt;li&gt;Pneumonic influenza patients, Wallangarra Quarantine Compound - Index to Wallangarra Flu Camp 1918-1919&lt;/li&gt;
&lt;li&gt;Land selections 1885-1981&lt;/li&gt;
&lt;li&gt;Lazaret patient registers - Lazaret Patient Registers&lt;/li&gt;
&lt;li&gt;Leases, Selections and Pastoral Runs and other related records 1850-2014&lt;/li&gt;
&lt;li&gt;Perpetual Lease Selections of soldier settlements 1917 - 1929 - Perpetual Lease Selections of soldier settlements 1917-1929&lt;/li&gt;
&lt;li&gt;Photographic records of prisoners 1875-1913 - Photographic Records of Prisoners 1875-1913&lt;/li&gt;
&lt;li&gt;Redeemed land orders 1860-1906 - Redeemed land orders 1860-1907&lt;/li&gt;
&lt;li&gt;Register of the Engagement of Immigrants at the Immigration Depot - Bowen 1873-1912&lt;/li&gt;
&lt;li&gt;Registers of Applications by Selectors 1868-1885&lt;/li&gt;
&lt;li&gt;Registers of Immigrants Promissory Notes (Maryborough)&lt;/li&gt;
&lt;li&gt;Education Office Gazette Scholarships 1900 - 1940 - Scholarships in the Education Office Gazette 1900 - 1940&lt;/li&gt;
&lt;li&gt;Teachers in the Education Office Gazettes 1899-1925&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;state-library-of-western-australia&#34;&gt;State Library of Western Australia&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;WABI Subset: Eastern Goldfields - Eastern Goldfields&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with A&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with B&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with C&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with D and E&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with F&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with G&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with H&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with I and J&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with K&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with L&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with M&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with N&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with O&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with P and Q&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with R&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with S&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with T&lt;/li&gt;
&lt;li&gt;Western Australian Biographical Index (WABI) - Index entries beginning with U-Z&lt;/li&gt;
&lt;li&gt;Digital Photographic Collection - Pictorial collection_csv&lt;/li&gt;
&lt;li&gt;WABI subset: Police - WABI police subset&lt;/li&gt;
&lt;li&gt;WABI subset: York - York and districts subset&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;bonus-update&#34;&gt;Bonus update&lt;/h3&gt;
&lt;p&gt;After a bit more work last night I added in a dataset from the State Library of Victoria:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Melbourne and metropolitan hotels, pubs and publicans&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&amp;rsquo;s an extra 21,000 records, and takes the total number of indexes to 247 from 10 different GLAM organisations!&lt;/p&gt;
&lt;p&gt;#dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Getting data about newspaper issues in Trove</title>
      <link>https://updates.timsherratt.org/2021/10/15/getting-data-about.html</link>
      <pubDate>Fri, 15 Oct 2021 10:44:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/10/15/getting-data-about.html</guid>
      <description>&lt;p&gt;When you search Trove&amp;rsquo;s newspapers, you find articles – these articles are grouped by page, and all the pages from a particular date make up an issue. But how do you find out what issues are available? How do you get a list of dates when newspapers were published? This &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#harvest-information-about-newspaper-issues&#34;&gt;notebook in the GLAM Workbench&lt;/a&gt; shows how you can get information about issues from the Trove API.&lt;/p&gt;
&lt;p&gt;Using the notebook, I’ve created a couple of datasets ready for download and use.&lt;/p&gt;
&lt;h3 id=&#34;total-number-of-issues-per-year-for-every-newspaper-in-trove&#34;&gt;Total number of issues per year for every newspaper in Trove&lt;/h3&gt;
&lt;p&gt;Harvested 10 October 2021&lt;/p&gt;
&lt;p&gt;CSV formatted dataset containing the number of newspaper issues available on Trove, totalled by title and year – comprises 27,604 rows with the fields:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;title&lt;/code&gt; – newspaper title&lt;/li&gt;
&lt;li&gt;&lt;code&gt;title_id&lt;/code&gt; – newspaper id&lt;/li&gt;
&lt;li&gt;&lt;code&gt;state&lt;/code&gt; – place of publication&lt;/li&gt;
&lt;li&gt;&lt;code&gt;year&lt;/code&gt; – year published&lt;/li&gt;
&lt;li&gt;&lt;code&gt;issues&lt;/code&gt; – number of issues&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Download from Cloudstor: &lt;a href=&#34;https://cloudstor.aarnet.edu.au/plus/s/oEkqztgGELlvluQ&#34;&gt;newspaper_issues_totals_by_year_20211010.csv&lt;/a&gt; (2.1mb)&lt;/p&gt;
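&lt;p&gt;As a quick example of working with the fields above, here&amp;rsquo;s a sketch that totals issues by place of publication using only the Python standard library. The filename is the one given above; adjust the path to wherever you&amp;rsquo;ve saved the download.&lt;/p&gt;

```python
import csv
from collections import Counter

def issues_per_state(csv_path):
    """Total the number of newspaper issues by place of publication."""
    totals = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            totals[row["state"]] += int(row["issues"])
    return totals

# totals = issues_per_state("newspaper_issues_totals_by_year_20211010.csv")
```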
&lt;h3 id=&#34;complete-list-of-issues-for-every-newspaper-in-trove&#34;&gt;Complete list of issues for every newspaper in Trove&lt;/h3&gt;
&lt;p&gt;Harvested 10 October 2021&lt;/p&gt;
&lt;p&gt;CSV formatted dataset containing a complete list of newspaper issues available on Trove – comprises 2,654,020 rows with the fields:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;title&lt;/code&gt; – newspaper title&lt;/li&gt;
&lt;li&gt;&lt;code&gt;title_id&lt;/code&gt; – newspaper id&lt;/li&gt;
&lt;li&gt;&lt;code&gt;state&lt;/code&gt; – place of publication&lt;/li&gt;
&lt;li&gt;&lt;code&gt;issue_id&lt;/code&gt; – issue identifier&lt;/li&gt;
&lt;li&gt;&lt;code&gt;issue_date&lt;/code&gt; – date of publication (YYYY-MM-DD)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To keep the file size down, I haven&amp;rsquo;t included an &lt;code&gt;issue_url&lt;/code&gt; in this dataset, but these are easily generated from the &lt;code&gt;issue_id&lt;/code&gt;. Just add the &lt;code&gt;issue_id&lt;/code&gt; to the end of &lt;code&gt;http://nla.gov.au/nla.news-issue&lt;/code&gt;. For example: &lt;a href=&#34;http://nla.gov.au/nla.news-issue495426&#34;&gt;http://nla.gov.au/nla.news-issue495426&lt;/a&gt;. Note that when you follow an issue url, you actually get redirected to the url of the first page in the issue.&lt;/p&gt;
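&lt;p&gt;In Python, generating the urls is a one-liner:&lt;/p&gt;

```python
def issue_url(issue_id):
    """Build a persistent URL for a Trove newspaper issue.

    Following the URL redirects to the first page of the issue.
    """
    return f"http://nla.gov.au/nla.news-issue{issue_id}"

# issue_url(495426) returns "http://nla.gov.au/nla.news-issue495426"
```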
&lt;p&gt;Download from Cloudstor: &lt;a href=&#34;https://cloudstor.aarnet.edu.au/plus/s/BWVyJDsdrXQbQAg&#34;&gt;newspaper_issues_20211010.csv&lt;/a&gt; (222mb)&lt;/p&gt;
&lt;p&gt;For more information see the &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove newspapers&lt;/a&gt; section of the GLAM Workbench.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>GLAM Workbench at eResearch Australasia 2021</title>
      <link>https://updates.timsherratt.org/2021/10/15/glam-workbench-at.html</link>
      <pubDate>Fri, 15 Oct 2021 09:50:58 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/10/15/glam-workbench-at.html</guid>
      <description>&lt;p&gt;Way back in 2013, I went to the eResearch Australasia conference as the manager of Trove to talk about &lt;a href=&#34;https://www.slideshare.net/wragge/beyond-discovery&#34;&gt;new research possibilities using the Trove API&lt;/a&gt;. Eight years years later &lt;a href=&#34;https://conference.eresearch.edu.au/events/a-glam-workbench-for-humanities-researchers/&#34;&gt;I was back&lt;/a&gt;, still spruiking the possibilities of Trove data. This time, however, I was discussing Trove in the broader context of &lt;strong&gt;GLAM&lt;/strong&gt; data – all the exciting possibilities that have emerged as galleries, libraries, archives and museums make more of their collections available in machine-readable form. The &lt;strong&gt;big question&lt;/strong&gt; is, of course, how do researchers, particularly those in the humanities, make use of that data? The &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; is my attempt to address that question – to provide humanities researchers with both the tools and information they need, and an understanding of the possibilities that might emerge if they invest a bit of time in working with GLAM data. My eResearch Australasia 2021 presentation provides a quick introduction to the GLAM Workbench, here’s the &lt;a href=&#34;https://vimeo.com/631475562&#34;&gt;video&lt;/a&gt;, and the &lt;a href=&#34;https://slides.com/wragge/eresearch2021&#34;&gt;slides&lt;/a&gt;.&lt;/p&gt;
&lt;div style=&#34;padding:56.25% 0 0 0;position:relative;&#34;&gt;&lt;iframe src=&#34;https://player.vimeo.com/video/631475562?h=6b2e2b5636&#34; style=&#34;position:absolute;top:0;left:0;width:100%;height:100%;&#34; frameborder=&#34;0&#34; allow=&#34;autoplay; fullscreen; picture-in-picture&#34; allowfullscreen&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;script src=&#34;https://player.vimeo.com/api/player.js&#34;&gt;&lt;/script&gt;
&lt;p&gt;&lt;a href=&#34;https://vimeo.com/631475562&#34;&gt;A GLAM Workbench for humanities researchers&lt;/a&gt; from &lt;a href=&#34;https://vimeo.com/wragge&#34;&gt;Tim Sherratt&lt;/a&gt; on &lt;a href=&#34;https://vimeo.com&#34;&gt;Vimeo&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The presentation was pre-recorded, but I managed to sneak in an update via chat for those who attended the session. More news on this next week… 🥳&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/29b72b05bc.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>New Python package to download Trove newspaper images</title>
      <link>https://updates.timsherratt.org/2021/10/05/new-python-package.html</link>
      <pubDate>Tue, 05 Oct 2021 12:03:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/10/05/new-python-package.html</guid>
      <description>&lt;p&gt;There&amp;rsquo;s no reliable way of downloading an image of a Trove newspaper article from the web interface. The image download option produces an HTML page with embedded images, and the article is often sliced into pieces to fit the page.&lt;/p&gt;
&lt;p&gt;This &lt;a href=&#34;https://pypi.org/project/trove-newspaper-images/&#34;&gt;Python package&lt;/a&gt; includes tools to download articles as complete JPEG images. If an article is printed across multiple newspaper pages, multiple images will be downloaded – one for each page. It&amp;rsquo;s intended for integration into other tools and processing workflows, or for people who like working on the command line.&lt;/p&gt;
&lt;p&gt;You can use it as a library:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from trove_newspaper_images.articles import download_images

images = download_images(&#39;107024751&#39;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or from the command line:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;trove_newspaper_images.download 107024751 --output_dir images
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you just want to quickly download an article as an image without installing anything, you can use &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#save-a-trove-newspaper-article-as-an-image&#34;&gt;this web app&lt;/a&gt; in the GLAM Workbench. To download images of all articles returned by a search in Trove, you can also use the &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper and Gazette Harvester&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;See the &lt;a href=&#34;https://wragge.github.io/trove_newspaper_images/&#34;&gt;documentation&lt;/a&gt; for more information. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>More records for the GLAM Name Index Search</title>
      <link>https://updates.timsherratt.org/2021/09/29/more-records-for.html</link>
      <pubDate>Wed, 29 Sep 2021 12:17:35 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/09/29/more-records-for.html</guid>
      <description>&lt;p&gt;Two more datasets have been added to the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Index Search&lt;/a&gt;! From the &lt;a href=&#34;https://history.sa.gov.au/&#34;&gt;History Trust of South Australia&lt;/a&gt; and &lt;a href=&#34;https://collab.sa.gov.au/&#34;&gt;Collab&lt;/a&gt;, I’ve added:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://data.sa.gov.au/data/dataset/passengers-in-history&#34;&gt;Passengers in History&lt;/a&gt; – that’s 371,894 records of people arriving in South Australia from 1836 to 1961&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://data.sa.gov.au/data/dataset/3f6fab54-8cc8-4732-9c1e-fb3f73df53b0&#34;&gt;Women’s Suffrage Petition 1894 (South Australia)&lt;/a&gt; – another 10,638 names&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/7eadb95bde.png&#34; alt=&#34;&#34; title=&#34;Screenshot of GLAM Name Index Search home page&#34;&gt;&lt;/p&gt;
&lt;p&gt;In total there are 9.67 million name records to search across 197 datasets provided by 9 GLAM organisations!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>More QueryPic in action</title>
      <link>https://updates.timsherratt.org/2021/09/29/more-querypic-in.html</link>
      <pubDate>Wed, 29 Sep 2021 11:30:26 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/09/29/more-querypic-in.html</guid>
      <description>&lt;p&gt;Recently I created a &lt;a href=&#34;https://updates.timsherratt.org/2021/08/30/some-research-projects.html&#34;&gt;list of publications&lt;/a&gt; that made use of &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#querypic&#34;&gt;QueryPic&lt;/a&gt;, my tool to visualise searches in Trove’s digitised newspapers. Here’s another example of the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; and QueryPic in action, in Professor Julian Meyrick’s recent keynote lecture, &amp;lsquo;Looking Forward to the 1950s: A Hauntological Method for Investigating Australian Theatre History’.&lt;/p&gt;
&lt;iframe width=&#34;560&#34; height=&#34;315&#34; src=&#34;https://www.youtube.com/embed/iOLmEBlKeQs&#34; title=&#34;YouTube video player&#34; frameborder=&#34;0&#34; allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&#34; allowfullscreen&gt;&lt;/iframe&gt;
</description>
    </item>
    
    <item>
      <title>Some thoughts on the ‘Trove Researcher Platform for Advanced Research’ draft plan</title>
      <link>https://updates.timsherratt.org/2021/09/10/some-thoughts-on.html</link>
      <pubDate>Fri, 10 Sep 2021 11:12:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/09/10/some-thoughts-on.html</guid>
      <description>&lt;p&gt;Late last year the Federal Government &lt;a href=&#34;https://ministers.dese.gov.au/tehan/improving-hass-and-indigenous-research-infrastructure&#34;&gt;announced&lt;/a&gt; it was making an $8.9 million investment in HASS and Indigenous research infrastructure. This program is being managed by the ARDC and will lead to the development of a &lt;a href=&#34;https://ardc.edu.au/collaborations/strategic-activities/hass-and-indigenous-research-data-commons/&#34;&gt;HASS Research Data Commons&lt;/a&gt;. According to the ARDC, a research data commons:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;brings together people, skills, data, and related resources such as storage, compute, software, and models to enable researchers to conduct world class data-intensive research&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sounds awesome!&lt;/p&gt;
&lt;p&gt;Based on scoping studies commissioned by the Department of Education, Skills, and Employment (which have not yet been made public), &lt;a href=&#34;https://ardc.edu.au/collaborations/strategic-activities/hass-and-indigenous-research-data-commons/recommendations-for-co-investment-in-humanities-arts-and-social-sciences-research-data-commons-program/&#34;&gt;four activities were selected for initial funding&lt;/a&gt; under this program. Draft project plans for these four activities have now been &lt;a href=&#34;https://ardc.edu.au/collaborations/strategic-activities/hass-and-indigenous-research-data-commons/project-plans/&#34;&gt;released for public comment&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One of these activities aims to develop a ‘Trove researcher platform for advanced research’:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Augmenting existing National Library of Australia resources, this platform will enable a focus on the delivery of researcher portals accessible through Trove, Australia’s unique public heritage site. The platform will create tools for visualisation, entity recognition, transcription and geocoding across Trove content and other corpora.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You can &lt;a href=&#34;https://ardc.edu.au/wp-content/uploads/2021/09/Research_Platform_TROVE.pdf&#34;&gt;download the draft project plan&lt;/a&gt; for the Trove platform. Funding for this activity will be capped at $2,301,185 across 2021-23. In this post I’ll try to pull together some of my own thoughts on this plan.&lt;/p&gt;
&lt;p&gt;I suppose I’d better start with a disclaimer – I’m not a neutral observer in this. I started scraping data from Trove newspapers way back in 2010, building the first versions of tools like QueryPic and the Trove Newspaper Harvester. While I was manager of Trove, from 2013 to 2016, I argued for recognition of Trove as a key part of Australia’s humanities research infrastructure, and highlighted possible research uses of Trove data available through the API. Since then I’ve worked to bring a range of digital tools, examples, tutorials, and hacks together for researchers in the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; – a &lt;a href=&#34;https://updates.timsherratt.org/2021/08/26/glam-workbench-a.html&#34;&gt;large number of these&lt;/a&gt; work with data from Trove.&lt;/p&gt;
&lt;p&gt;I strongly believe that Trove should receive ongoing funding through &lt;a href=&#34;https://www.dese.gov.au/ncris&#34;&gt;NCRIS&lt;/a&gt; as a piece of national research infrastructure. Unfortunately though, the draft project plan does not make a strong case for investment – it’s vague, unimaginative, and makes little attempt to integrate with existing tools and services. I think it scores poorly against the ARDC’s &lt;a href=&#34;https://ardc.edu.au/wp-content/uploads/2021/09/Evaluation-Criteria-HASS-RDC-and-Indigenous-Research-Capability-program.pdf&#34;&gt;evaluation criteria&lt;/a&gt;, and doesn’t seem to offer good value for money. As someone who has championed the use of Trove data for research across the last decade, I’m very disappointed.&lt;/p&gt;
&lt;h2 id=&#34;whats-planned&#34;&gt;What’s planned?&lt;/h2&gt;
&lt;p&gt;So what is being proposed? There seem to be three main components:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Authenticated ‘project’ spaces for researchers where datasets relating to a particular research topic can be stored&lt;/li&gt;
&lt;li&gt;The ability to create custom datasets from a search in Trove&lt;/li&gt;
&lt;li&gt;Tools to visualise stored datasets.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There’s no doubt that these are all useful functions for researchers, but many problems arise when we look at how they’re going to be implemented.&lt;/p&gt;
&lt;h3 id=&#34;1-authenticated-project-spaces&#34;&gt;1. Authenticated project spaces&lt;/h3&gt;
&lt;p&gt;The draft plan indicates that authentication of users through the &lt;a href=&#34;https://aaf.edu.au/&#34;&gt;Australian Access Federation&lt;/a&gt; is preferred. Why? Trove already has a system for the creation of user accounts. Using AAF would limit use of the new platform to those attached to universities or research agencies. I don’t understand what the use of AAF adds to the project, except perhaps to provide an example of integration with existing infrastructure services.&lt;/p&gt;
&lt;p&gt;The plan notes that project spaces could be ‘public’ or ‘private’. Presumably a ‘public’ space would give access to stored datasets, but what sort of access controls would be available in relation to individual datasets? It’s also noted (Deliverable 7) that researchers would have ‘an option to “publish” their research findings for public consumption‘. Does this mean datasets and visualisations would be assigned a DOI (or other persistent identifier) and preserved indefinitely? How might these spaces integrate with existing data repositories?&lt;/p&gt;
&lt;h3 id=&#34;2-create-custom-datasets&#34;&gt;2. Create custom datasets&lt;/h3&gt;
&lt;p&gt;The lack of detail in the plan makes it difficult to assess what’s being proposed here. But it seems that users would be able to construct a search using the Trove web interface (or a new search interface?) and save the results as a dataset.&lt;/p&gt;
&lt;p&gt;What data would be searched? It’s not clear, but in reference to the visualisations it’s stated that data would come from ‘Trove’s existing full text collections (newspapers and gazettes, magazines and newsletters, books)’. So no web archives, and no metadata from any of Trove’s aggregated collections (even without full text, collection metadata can create interesting research possibilities, see for example the &lt;a href=&#34;https://glam-workbench.net/trove-music/&#34;&gt;Radio National records&lt;/a&gt; in the GLAM Workbench).&lt;/p&gt;
&lt;p&gt;What will be included in each dataset? There are few details, but at a minimum you’d expect something like a CSV containing the metadata of all the matching records, and files containing the full text content of the items. These could potentially be &lt;em&gt;very&lt;/em&gt; large. There’s no indication about how storage and processing demands would be managed, but presumably there would be some per user, or per project, limits.&lt;/p&gt;
&lt;p&gt;Deliverable 8, ‘Data and visual download’, states that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All query results must be available as downloadable files, this would include CSV, JSON and XML for the query results list.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But there’s no mention of the full text content at all. Will it be included in downloadable datasets?&lt;/p&gt;
&lt;p&gt;As well as the record metadata and full text, you’d want there to be some metadata captured about the dataset itself – the search query used, when it was captured, the number of records, etc. To support integration and reuse, it would be good to align this with something like &lt;a href=&#34;https://www.researchobject.org/ro-crate/&#34;&gt;RO Crate&lt;/a&gt;.&lt;/p&gt;
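&lt;p&gt;To make the idea concrete, here&amp;rsquo;s a rough sketch of what minimal RO-Crate-style metadata for a harvested dataset might look like. The property names are illustrative only – the actual profile such a platform adopted would need to be worked out against the RO-Crate specification.&lt;/p&gt;

```python
import json
from datetime import date

def dataset_metadata(query, num_records, api_query_url):
    """Rough sketch of RO-Crate-style metadata for a harvested dataset.

    The property names here are illustrative, not a settled profile.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                "@id": "./",
                "@type": "Dataset",
                "description": f"Records harvested from a Trove search for '{query}'",
                "dateCreated": date.today().isoformat(),  # capture date
                "size": num_records,                      # number of records
                "isBasedOn": api_query_url,               # the query used
            }
        ],
    }

# print(json.dumps(dataset_metadata("radio astronomy", 1200,
#       "https://api.trove.nla.gov.au/v2/result"), indent=2))
```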
&lt;p&gt;How will searches be constructed? It’s not clear if this will be integrated with the existing search interface, or be something completely separate; however, the plan does note that ‘limitations are put onto the dataset like keyword search terms and filters corresponding to the filters currently available in the interface’. So it seems that the new platform will be using the existing search indexes. It’s obviously important for the relationship between existing search functions and the new dataset creation tool to be explicit and transparent so that researchers understand what they’re getting.&lt;/p&gt;
&lt;p&gt;It’s also worth noting that changes to the search interface last year removed some useful options from the advanced search form. In particular, you can no longer exclude matches in tags or comments. If you’re a researcher looking for the occurrence of a particular word, you generally don’t want to include records where that word only appears in a user added tag (I have a story about ‘Word War I’ that illustrates this!).&lt;/p&gt;
&lt;p&gt;This raises a broader issue. There doesn’t seem to be any mention in the project plan of work to improve the metadata and indexing in response to research needs. Even just identifying digitised books in the current web interface can be a bit of a challenge, and digitised books and periodicals can be grouped into work records with other versions. We need to recognise that the needs of discovery sometimes compromise specific research uses.&lt;/p&gt;
&lt;p&gt;I’m trying to be constructive in my responses here, but at this point I just have to scream – WHAT ABOUT THE &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;TROVE NEWSPAPER HARVESTER&lt;/a&gt;? A tool has existed for &lt;em&gt;ten years&lt;/em&gt; that lets users create a dataset containing metadata and full text from a search in Trove’s newspapers and gazettes. I’ve spent a lot of time over recent years adding features and making it easier to use. Now you can download not only full text, but also PDFs and images of articles. The latest web app version in the GLAM Workbench runs in the cloud. Just one click to start it up, then all you need to do is paste in your Trove API key and the url of your search. It can’t get much easier.&lt;/p&gt;
&lt;p&gt;The GLAM Workbench also includes tools to create datasets and download OCRd text from Trove’s &lt;a href=&#34;https://glam-workbench.net/trove-books/&#34;&gt;books&lt;/a&gt; and digitised &lt;a href=&#34;https://glam-workbench.net/trove-journals/&#34;&gt;journals&lt;/a&gt;. These are still in notebook form, so are not as easy to use, but I have created pre-harvested datasets of all books and periodicals with OCRd text, and stored them on CloudStor. What’s missing at the moment is something to harvest a collection of journal articles, but this would not be difficult. As an added bonus, the GLAM Workbench has tools to &lt;a href=&#34;https://glam-workbench.net/web-archives/#harvesting-collections-of-text-from-archived-web-pages&#34;&gt;create full text datasets&lt;/a&gt; from the Australian Web Archive.&lt;/p&gt;
&lt;p&gt;So what is this project really adding? And why is there no attempt to leverage existing tools and resources?&lt;/p&gt;
&lt;h3 id=&#34;3-visualise-datasets&#34;&gt;3. Visualise datasets&lt;/h3&gt;
&lt;p&gt;Again, there’s a fair bit of hand waving in the plan, but it seems that users will be able to select a stored dataset and then choose a form of visualisation. The plan says that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;An initial pilot would allow users to create line graphs that plot the frequency of a search term over time and maps that display results based on state-level geolocation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Up to three additional visualisations would be created later based on research feedback. It’s not clear which researchers will be consulted and when their feedback will be sought.&lt;/p&gt;
&lt;p&gt;The value of these sorts of visualisations is obviously dependent on the quality and consistency of the metadata. There’s nothing built into this plan that would, for example, allow a researcher to clean or normalise any of the saved data. You have to take what you’re given. The newspaper metadata is generally consistent, but books and periodicals less so.&lt;/p&gt;
&lt;p&gt;It’s also important to clarify what’s meant by ‘the frequency of a search term over time’. Does this mean the number of records matching a search term, or the number of times that the search term actually appears in the full text of all matched records? If the latter, then this &lt;em&gt;would&lt;/em&gt; be a major enrichment of the available data. Though if this data was available it should be pushed through the API and/or made available as a downloadable dataset for integration with other platforms (perhaps along the lines of the Hathi Trust’s &lt;a href=&#34;https://analytics.hathitrust.org/datasets&#34;&gt;Extracted Features Dataset&lt;/a&gt;). I suspect, however, that what is actually meant is the number of matching search results.&lt;/p&gt;
&lt;p&gt;Again, the value of any geospatial visualisation depends on what is actually being visualised! The &lt;code&gt;state&lt;/code&gt; facet in newspapers indicates place of publication; it’s not clear what the &lt;code&gt;place&lt;/code&gt; facet in other categories represents. For this sort of visualisation to be useful in a research context, there would need to be some explanation of how these values were created, and any gaps or uncertainties.&lt;/p&gt;
&lt;p&gt;Time for another scream of frustration — WHAT ABOUT &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#querypic&#34;&gt;QUERYPIC&lt;/a&gt;? Another long-standing tool which has already been &lt;a href=&#34;https://updates.timsherratt.org/2021/08/30/some-research-projects.html&#34;&gt;cited a number of times&lt;/a&gt; in research literature. &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#querypic&#34;&gt;QueryPic&lt;/a&gt; visualises searches in Trove’s newspapers and gazettes over time. You can adjust time scales and intervals, and download the results as images, a CSV file, and an HTML page. The project plan makes a point of claiming that its tools would not require any coding, but neither does QueryPic. Just plug in an API key and a search URL. I even made &lt;a href=&#34;https://www.youtube.com/playlist?list=PLAclcciEeCD2z2BWQ2r3xD_Q8c05HppfP&#34;&gt;some videos&lt;/a&gt; about it! The GLAM Workbench also includes a number of examples of how you can &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#map-trove-newspaper-results-by-state&#34;&gt;visualise places of publication&lt;/a&gt; of newspaper articles.&lt;/p&gt;
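&lt;p&gt;For the curious, here&amp;rsquo;s roughly what a QueryPic-style chart does under the hood – ask the Trove API for the &lt;code&gt;year&lt;/code&gt; facet rather than for the articles themselves. This is a sketch against the v2 API as I understand it (check the current API documentation before relying on it); note that the newspaper zone returns year facets one decade at a time, so you loop over decades to build a full chart.&lt;/p&gt;

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.trove.nla.gov.au/v2/result"

def parse_year_facet(data):
    """Pull sorted (year, count) pairs out of a Trove API facet response."""
    terms = data["response"]["zone"][0]["facets"]["facet"]["term"]
    return sorted((int(t["display"]), int(t["count"])) for t in terms)

def year_counts(query, api_key, decade):
    """Number of matching articles per year for one decade of a search.

    The newspaper zone applies the year facet within a decade
    (e.g. decade="191" for the 1910s), so a QueryPic-style chart
    loops over decades and concatenates the results.
    """
    params = urlencode({
        "q": query,
        "zone": "newspaper",
        "facet": "year",
        "l-decade": decade,
        "encoding": "json",
        "n": 0,  # facet counts only, no article records
        "key": api_key,
    })
    with urlopen(f"{API}?{params}") as response:
        return parse_year_facet(json.load(response))
```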
&lt;p&gt;But it’s not just the GLAM Workbench. The &lt;a href=&#34;https://ardc.edu.au/wp-content/uploads/2021/09/Language_Commons_Australia.pdf&#34;&gt;Linguistics Data Commons of Australia&lt;/a&gt;, another activity to be funded as part of the HASS Research Data Commons, will include tools for text analysis and visualisation. The &lt;a href=&#34;https://www.tlcmap.org/&#34;&gt;Time Layered Cultural Map&lt;/a&gt; is developing tools for geospatial visualisation of Australian collections. Surely the focus should be on connecting and reusing what’s available. Again I’m wondering what this project is really adding.&lt;/p&gt;
&lt;h2 id=&#34;portals-and-platforms&#34;&gt;Portals and platforms&lt;/h2&gt;
&lt;p&gt;The original language describing the funded activity is interesting — it is intended to ‘focus on the delivery of researcher portals accessible through Trove’.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Portals&lt;/em&gt; (plural) accessible &lt;em&gt;through&lt;/em&gt; (not in) Trove.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The NLA could meet a fair proportion of its stated objectives right now, simply by including links to QueryPic and the Trove Newspaper and Gazette Harvester. Done! There’s a million dollars saved.&lt;/p&gt;
&lt;p&gt;More seriously, there’s no reason why the outcome of this activity should be a new interface attached to Trove and managed by the NLA. Indeed, such an approach works against integration, reuse, and data sharing. I believe the basic assumptions of the draft plan are seriously flawed. We need to separate out the strands of what’s meant by a ‘platform for advanced research’, and think more creatively and collaboratively about how we could achieve something useful, flexible, and sustainable.&lt;/p&gt;
&lt;h2 id=&#34;wheres-the-api&#34;&gt;Where’s the API?&lt;/h2&gt;
&lt;p&gt;I think the primary role of the NLA in the development of this research platform should be as the data provider. There are numerous ways in which Trove’s data might be improved and enriched in support of new research uses. These improvements could then be pushed through the API to integrate with a range of tools and resources. Which raises the question — where is the API in this plan?&lt;/p&gt;
&lt;p&gt;The only mention of the API comes as an option for a user with ‘high technical expertise’ to extend the analysis provided by the built-in visualisations. This is all backwards. The API is the key pipeline for data-sharing and integration and should be at the heart of this plan.&lt;/p&gt;
&lt;p&gt;This program offers an opportunity to make some much-needed improvements to the API. Here are a few possibilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Bring the web interface and API back into sync so that researchers can easily transfer queries between the two (Trove’s interface update introduced new categories, while the API still groups resources by the original zones).&lt;/li&gt;
&lt;li&gt;Provide public API access to additional data about digitised items. For example, you can get lists of newspaper titles and issues from the API, but there’s no comparable method to get titles and issues for digitised periodicals. The data’s there – it’s used to generate lists of issues in the browse interface – but it’s not in the API. There’s also other resource metadata, such as parent/child relationships, which are embedded in web pages but not exposed in the API.&lt;/li&gt;
&lt;li&gt;Standardise the delivery of OCRd text for different resource types.&lt;/li&gt;
&lt;li&gt;Finally add the People &amp;amp; Organisations data to the main RESTful API.&lt;/li&gt;
&lt;li&gt;Fix the limitations of the web archives CDX API (&lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/web-archives/blob/master/comparing_cdx_apis.ipynb&#34;&gt;documented here&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Add a search API for the web archives.&lt;/li&gt;
&lt;li&gt;And what about a Write API? Integration between components in the HASS RDC would be greatly enhanced if other projects could automatically add structured annotations to existing Trove resources.&lt;/li&gt;
&lt;/ul&gt;
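&lt;p&gt;To make the first couple of gaps concrete: newspapers already have a titles endpoint, while digitised periodicals have nothing comparable. Here’s a minimal sketch of using the existing endpoint (the state parameter and the response structure are my assumptions, so check them against the API documentation):&lt;/p&gt;

```python
# A sketch of requesting the list of digitised newspaper titles from the
# v2 API -- the sort of per-title data this post argues should also exist
# for digitised periodicals. Parameter and response details are assumptions.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def titles_url(api_key, state=None):
    """Build a request for the list of digitised newspaper titles."""
    params = {"encoding": "json", "key": api_key}
    if state:
        params["state"] = state  # optionally limit to one state
    return "https://api.trove.nla.gov.au/v2/newspaper/titles?" + urlencode(params)

# Example (needs a real API key and network access):
# data = json.load(urlopen(titles_url("YOUR_API_KEY", state="vic")))
# for t in data["response"]["records"]["newspaper"]:
#     print(t["id"], t["title"])
```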
&lt;p&gt;I think the HASS RDC would benefit greatly by thinking much more about the role of the Trove API in establishing reusable data flows, and connecting up components.&lt;/p&gt;
&lt;h2 id=&#34;pathways&#34;&gt;Pathways&lt;/h2&gt;
&lt;p&gt;Anyone who’s been to one of my &lt;a href=&#34;https://glam-workbench.net/presentations/&#34;&gt;GLAM Workbench talks&lt;/a&gt; will know that I talk a lot about ‘pathways’. My concern is not just to provide useful tools and examples, but to try and connect them in ways that encourage researchers to develop their skills and confidence. So a researcher with limited digital skills can spin up QueryPic and start making visualisations without any specialised knowledge. But if they want to explore the data and assumptions behind QueryPic, they can view a notebook that walks them through the process of getting data from facets and assembling a time series. If they find something interesting in QueryPic, they can go to the Newspaper Harvester and assemble a dataset that helps them zoom into a particular period. There are places to go.&lt;/p&gt;
&lt;p&gt;Similarly, users can start making use of the GLAM Workbench in the cloud using &lt;a href=&#34;https://glam-workbench.net/using-binder/&#34;&gt;Binder&lt;/a&gt; – one click and it’s running. But as their research develops they might find Binder a bit limiting, so there are options to spin up the GLAM Workbench using &lt;a href=&#34;https://glam-workbench.net/using-reclaim-cloud/&#34;&gt;Reclaim Cloud&lt;/a&gt; or &lt;a href=&#34;https://glam-workbench.net/using-docker/&#34;&gt;Docker&lt;/a&gt;. As a researcher’s skills, needs, and questions change, so does their use of the GLAM Workbench. At least that’s the plan – I’m very aware that there’s much, much more to do to build and document these pathways.&lt;/p&gt;
&lt;p&gt;The developments described in the draft plan are focused on providing simple tools for non-technical users. That’s fair enough, but you have to give those users somewhere to go, some path beyond, or else it just becomes another dead end. Users can download their data or visualisation, but then what?&lt;/p&gt;
&lt;p&gt;Of course you don’t point a non-coder to API documentation and say ‘there you go’. But coders can use the API to build and share a range of tools that introduce people to the possibilities of data, and scaffold their learning. Why should there be just one interface? It’s not too difficult to imagine a range of introductory visualisation tools aimed at different humanities disciplines. Instead of focusing inward on a single Trove Viz Lite tool, why not look outwards at ways of embedding Trove data within a range of research training contexts?&lt;/p&gt;
&lt;h2 id=&#34;integration&#34;&gt;Integration&lt;/h2&gt;
&lt;p&gt;A number of the HASS RDC Evaluation Criteria focus on issues of integration, collaboration, and reuse of existing resources. For example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Project plans should display robust proposal planning including the maximisation of the use or re-use of existing research infrastructure, platforms, tools, services, data storage and compute.&lt;/li&gt;
&lt;li&gt;Project plans should display integrated infrastructure layers with other HASS RDC activities, in particular by linking together elements such as data storage, tools, authentication, licensing, networks, cloud and high-performance computing, and access to data resources for reuse.&lt;/li&gt;
&lt;li&gt;Project plans must be robust and contribute to the HASS RDC as a coherent whole that capitalises on existing data collections, adheres to the F.A.I.R. principles, develops collaborative tools, utilises shared underlying infrastructure and has appropriate governance planning.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;There’s little evidence of this sort of thinking in the draft project plan. I’ve mentioned a few obvious opportunities for integration above, but there are many more. Overall, I think the proposed ‘platform for advanced research’ needs to be designed as a series of interconnected components, and not be seen as the product of a single institution.&lt;/p&gt;
&lt;p&gt;We could imagine, for example, a system where the NLA focused on the delivery of research-ready data via the Trove API. A layer of data filtering, cleaning, and packaging tools could be built on top of the API to help users assemble actionable datasets. The packaging processes could use standards such as RO-Crate to prepare datasets for ingest into data repositories. Existing storage services, such as CloudStor, could be used for saving and sharing working datasets. Another layer of visualisation and analysis tools could either process these datasets, or integrate directly with the API. These tools could be spread across different projects including LDaCA, TLCMap, and the GLAM Workbench — using standards such as Jupyter to encourage sharing and reuse of individual components, and running on a variety of cloud-hosted platforms. Instead of just adding another component to Trove, we’d be building a collaborative network of tool builders and data wranglers — developing capacities across the research sector, and spreading the burden of maintenance.&lt;/p&gt;
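&lt;p&gt;The packaging step needn’t be complicated. As a rough sketch, a minimal RO-Crate description of a harvested CSV file might be assembled like this (the properties shown are a small, illustrative subset of the RO-Crate 1.1 model):&lt;/p&gt;

```python
# A minimal sketch of the RO-Crate packaging step: build a
# ro-crate-metadata.json document describing a single harvested CSV file.
# Properties shown are an illustrative subset of the RO-Crate 1.1 model.
import json

def make_crate(dataset_name, data_file):
    """Build a minimal RO-Crate metadata document for one data file."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # the metadata file descriptor required by the spec
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # the root dataset
                "@id": "./",
                "@type": "Dataset",
                "name": dataset_name,
                "hasPart": [{"@id": data_file}],
            },
            {"@id": data_file, "@type": "File", "encodingFormat": "text/csv"},
        ],
    }

# Serialise, ready to write out alongside the data file
crate_json = json.dumps(make_crate("Trove newspaper harvest", "results.csv"), indent=2)
```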
&lt;h2 id=&#34;sustainability&#34;&gt;Sustainability&lt;/h2&gt;
&lt;p&gt;The draft project plan includes some pretty worrying comments about long-term support for the new platform. Work Package 5 notes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The developed product will require support post release which can be guaranteed for a period not exceeding the contracted period for this project&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;ARDC will be responsible for providing ongoing financial support for this phase. It has not been included in the proposal.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So once the project is over, the NLA will not support the new platform unless the ARDC provides ongoing funding. What researcher would want to ‘publish’ their data on a platform that could disappear at any time? We all know that sustainability is hard, but you would think that the NLA could at least offer to work collaboratively with the research sector to develop a plan for sustainability, instead of just asking for more money. Why would anyone invest so much for so little?&lt;/p&gt;
&lt;h2 id=&#34;leadership-and-community&#34;&gt;Leadership and community&lt;/h2&gt;
&lt;p&gt;The development of collaborations and communities also figures prominently in the HASS RDC Evaluation Criteria. For example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Project plans should clearly demonstrate that they enable collaboration and build communities across geographically dispersed research groups through facilitated sharing of high-quality data, particularly for computational analysis; the development of new platforms for collaboration and sharing; and, the encouragement of innovative methodologies through the use of analytic tools.&lt;/li&gt;
&lt;li&gt;Project plans must include a demonstrated commitment to ongoing community development to ensure the sustainability of the development is vital. The deliverables will act as ongoing national research infrastructure. They must be broadly usable by more than just the project partners and serve as input to a wide range of research.&lt;/li&gt;
&lt;li&gt;Project plans, and project leads in particular, should demonstrate the research leadership that will foster and encourage the uptake and use of the HASS RDC.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Once again the draft project plan falls short. There are no project partners listed. Instead the plan refers broadly to all of Trove’s content partners, none of whom have direct involvement in this project. Indeed, as noted above, data aggregated from project partners is excluded from the new platform.&lt;/p&gt;
&lt;p&gt;There are no new governance arrangements proposed for this project. Instead the plan refers to the Trove Strategic Advisory Committee which includes representatives from partner organisations. But there are no researcher representatives on this committee.&lt;/p&gt;
&lt;p&gt;The only consultation with the research sector undertaken in the ‘Consultation Phase’ of the project is that undertaken by the ARDC itself. Does that mean this current process whereby the ARDC is soliciting feedback on the project plans? Whoa, meta…&lt;/p&gt;
&lt;p&gt;The plan notes that during the testing phase described in Work Package 3, ‘HASS community members would gain access to a beta version of the product for comment’. However, later it is stated that access would be provided to ‘a subset of researchers’, and that only system bugs and ‘high priority improvements’ would be acted upon.&lt;/p&gt;
&lt;p&gt;Generally speaking, it seems that the NLA is seeking as little consultation as possible. It’s not exploring options for collaboration. It’s not engaging with the research community about these developments. That doesn’t seem like an effective way to build communities. Nor does it demonstrate leadership.&lt;/p&gt;
&lt;h2 id=&#34;summing-up&#34;&gt;Summing up&lt;/h2&gt;
&lt;p&gt;This project plan can’t be accepted in its current form. We’ve had failures and disappointments in the development of HASS research infrastructure in the past. The HASS RDC program gives us a chance to start afresh, and the focus on integration, data-sharing, and reuse gives hope that we can build something that will continue to grow and develop, and not wither through lack of engagement and support. So should the NLA be getting $2 million to add a new component to Trove that is not integrated with other HASS RDC projects, and substantially duplicates tools available elsewhere? No, I don’t think so. They need to go back to the drawing board, undertake some real consultation, and build collaborations, not products.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Some research projects that have used QueryPic</title>
      <link>https://updates.timsherratt.org/2021/08/30/some-research-projects.html</link>
      <pubDate>Mon, 30 Aug 2021 12:48:11 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/08/30/some-research-projects.html</guid>
      <description>&lt;p&gt;A Twitter thread about some of the research uses of QueryPic&amp;hellip;&lt;/p&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;QueryPic, my tool for visualising searches in &lt;a href=&#34;https://twitter.com/TroveAustralia?ref_src=twsrc%5Etfw&#34;&gt;@TroveAustralia&lt;/a&gt;’s digitised newspapers, has been around in different forms for more than 10 years. The latest version is part of the &lt;a href=&#34;https://twitter.com/hashtag/GLAMWorkbench?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#GLAMWorkbench&lt;/a&gt;: &lt;a href=&#34;https://t.co/qnY5tVDwgY&#34;&gt;https://t.co/qnY5tVDwgY&lt;/a&gt;  &lt;a href=&#34;https://twitter.com/hashtag/researchinfrastructure?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#researchinfrastructure&lt;/a&gt; &lt;a href=&#34;https://t.co/QyHWJwGV3u&#34;&gt;pic.twitter.com/QyHWJwGV3u&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431841378720370691?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;I thought I’d highlight some of the research publications that have made use of QueryPic over the years, so, in no particular order...&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431841710477242378?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;There’s &lt;a href=&#34;https://twitter.com/Airminded?ref_src=twsrc%5Etfw&#34;&gt;@Airminded&lt;/a&gt;’s article in &lt;a href=&#34;https://twitter.com/HistAustJournal?ref_src=twsrc%5Etfw&#34;&gt;@HistAustJournal&lt;/a&gt; – Brett Holman (2013) &amp;#39;Dreaming War: Airmindedness and the Australian Mystery Aeroplane Scare of 1918&amp;#39;, History Australia, 10:2, 180-201, DOI: &lt;a href=&#34;https://t.co/2wgiLueHGL&#34;&gt;https://t.co/2wgiLueHGL&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431842737041510403?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;A book! Simon Sleight, Young People and the Shaping of Public Space in Melbourne, 1870–1914, Ashgate Publishing, Ltd., 2013. &lt;a href=&#34;https://t.co/CPgGMrYYYq&#34;&gt;https://t.co/CPgGMrYYYq&lt;/a&gt; &lt;a href=&#34;https://t.co/XryAF0hJ5K&#34;&gt;pic.twitter.com/XryAF0hJ5K&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431843658664275973?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Yorick Smaal (2013) Keeping it in the family: prosecuting incest in colonial Queensland, Journal of Australian Studies, 37:3, 316-332, DOI: &lt;a href=&#34;https://t.co/n5tQlER9Vo&#34;&gt;https://t.co/n5tQlER9Vo&lt;/a&gt; &lt;a href=&#34;https://t.co/tKzpAosu1i&#34;&gt;pic.twitter.com/tKzpAosu1i&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431844330474409988?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;In &lt;a href=&#34;https://twitter.com/AHSjournal?ref_src=twsrc%5Etfw&#34;&gt;@AHSjournal&lt;/a&gt; there’s – Murray G. Phillips &amp;amp; Gary Osmond (2015) Australia&amp;#39;s Women Surfers: History, Methodology and the Digital Humanities, Australian Historical Studies, 46:2, 285-303, DOI: &lt;a href=&#34;https://t.co/Gxs1Ru6Ojt&#34;&gt;https://t.co/Gxs1Ru6Ojt&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431844996370419712?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Gary Osmond (2015) ‘Pink Tea and Sissy Boys’: Digitized Fragments of Male Homosexuality, Non-Heteronormativity and Homophobia in the Australian Sporting Press, 1845–1954, The International Journal of the History of Sport, 32:13, 1578-1592, DOI: &lt;a href=&#34;https://t.co/C6FndD7C4E&#34;&gt;https://t.co/C6FndD7C4E&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431845470419050499?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Murray G. Phillips, Gary Osmond &amp;amp; Stephen Townsend (2015) A Bird’s-Eye View of the Past: Digital History, Distant Reading and Sport History, The International Journal of the History of Sport, 32:15, 1725-1740, DOI: &lt;a href=&#34;https://t.co/4rB2hkmmDM&#34;&gt;https://t.co/4rB2hkmmDM&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431845844060282880?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Sarah Ailwood and Maree Sainsbury, ‘Copyright Law, Readers and Authors in Colonial Australia’, Journal of the Association for the Study of Australian Literature, vol. 14, no. 3, 2014. &lt;a href=&#34;https://t.co/XWqx8XJGLQ&#34;&gt;https://t.co/XWqx8XJGLQ&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431846330142314497?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Sarah Ailwood and Maree Sainsbury, ‘The Imperial Effect: Literary Copyright Law in Colonial Australia’, Law, Culture and the Humanities, vol. 12, no. 3, 1 October 2016, pp. 716–740. &lt;a href=&#34;https://t.co/s6HrBZmQ6N&#34;&gt;https://t.co/s6HrBZmQ6N&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431846863200677888?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;A book chapter by &lt;a href=&#34;https://twitter.com/JVLamond?ref_src=twsrc%5Etfw&#34;&gt;@JVLamond&lt;/a&gt; – Lamond, J, 2016, &amp;#39;Zones of Connection: Common Reading in a Regional Australian Library&amp;#39;, in Print Culture Histories Beyond the Metropolis, University of Toronto Press, Toronto, pp. 355-374. &lt;a href=&#34;https://t.co/o3oAmreYne&#34;&gt;https://t.co/o3oAmreYne&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431848340660965378?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Not just history – Scifleet, P., Henninger, M. &amp;amp; Albright, K.H. (2013). When social media are your source. Information Research, 18(3) paper C41. &lt;a href=&#34;https://t.co/qOYbZ3TMTf&#34;&gt;https://t.co/qOYbZ3TMTf&lt;/a&gt; &lt;a href=&#34;https://t.co/GDP2TmeUzp&#34;&gt;pic.twitter.com/GDP2TmeUzp&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431849020478033926?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;There’s also a number of references to QueryPic as a tool in the DH &amp;amp; library literature, that I won’t list.&lt;br&gt;&lt;br&gt;There’s probably more – citation of tools like QueryPic can be a bit hit and miss.&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431850352396029955?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;The latest version of QueryPic is designed to be both easy-to-use and flexible – click a link to start it up, paste in your &lt;a href=&#34;https://twitter.com/TroveAustralia?ref_src=twsrc%5Etfw&#34;&gt;@TroveAustralia&lt;/a&gt; API key, and a search url from Trove… and bingo!&lt;br&gt;&lt;br&gt;For a quick intro, see this video: &lt;a href=&#34;https://t.co/Hh1oDIOh9a&#34;&gt;https://t.co/Hh1oDIOh9a&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431851213847347203?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;But even though it’s easy to get started, QueryPic can do interesting things like compare queries. You can also adjust facets, date ranges, and time scales.&lt;br&gt;&lt;br&gt;This video shows you how to create more complex queries: &lt;a href=&#34;https://t.co/0CoJhO7vaJ&#34;&gt;https://t.co/0CoJhO7vaJ&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431851859065507844?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
&lt;blockquote class=&#34;twitter-tweet&#34; data-conversation=&#34;none&#34; data-cards=&#34;hidden&#34; data-partner=&#34;tweetdeck&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;As I often say, not all &lt;a href=&#34;https://twitter.com/hashtag/researchinfrastructure?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#researchinfrastructure&lt;/a&gt; has to be big. A simple tool like this can help researchers see their topics in new ways.&lt;br&gt;&lt;br&gt;And from this starting point, there’s all sorts of pathways to follow in the &lt;a href=&#34;https://twitter.com/hashtag/GLAMWorkbench?src=hash&amp;amp;ref_src=twsrc%5Etfw&#34;&gt;#GLAMWorkbench&lt;/a&gt; &lt;a href=&#34;https://t.co/AC2tipN8eY&#34;&gt;https://t.co/AC2tipN8eY&lt;/a&gt;&lt;/p&gt;&amp;mdash; Tim Sherratt (@wragge) &lt;a href=&#34;https://twitter.com/wragge/status/1431853486564536320?ref_src=twsrc%5Etfw&#34;&gt;August 29, 2021&lt;/a&gt;&lt;/blockquote&gt;
</description>
    </item>
    
    <item>
      <title>Government publications in Trove</title>
      <link>https://updates.timsherratt.org/2021/08/30/government-publications-in.html</link>
      <pubDate>Mon, 30 Aug 2021 12:21:46 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/08/30/government-publications-in.html</guid>
      <description>&lt;p&gt;Over the last few weeks I’ve been updating my harvests of OCRd text from digitised &lt;a href=&#34;https://updates.timsherratt.org/2021/08/16/explore-troves-digitised.html&#34;&gt;books&lt;/a&gt; and &lt;a href=&#34;https://updates.timsherratt.org/2021/08/06/updated-lots-and.html&#34;&gt;periodicals&lt;/a&gt; in Trove. As part of the harvesting process, I’ve created lists of both that are available in digital form – this includes digitised works, as well as those that are born-digital (such as PDFs or epubs). I’ve published the full lists of &lt;a href=&#34;https://trove-digital-books.glitch.me/data/trove-digital-books&#34;&gt;books&lt;/a&gt; and &lt;a href=&#34;https://trove-digital-periodicals.glitch.me/data/trove-digital-journals&#34;&gt;periodicals&lt;/a&gt; as searchable databases to make them easy to explore.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/51c3edce9a.png&#34; alt=&#34;&#34; title=&#34;Screenshot of database of Trove&#39;s government publications.&#34;&gt;&lt;/p&gt;
&lt;p&gt;One thing that you might notice is that works with the format ‘Government publication’ pop up in both lists – sometimes it’s not clear whether something is a ‘book’ or ‘periodical’. To make it easier to find these items, no matter what their format, I’ve combined data from my two harvests and created a &lt;a href=&#34;https://trove-government-publications.glitch.me/data/trove-government-publications&#34;&gt;searchable dataset of government publications&lt;/a&gt;. It includes links to download OCRd text from CloudStor if available.&lt;/p&gt;
&lt;p&gt;All three databases make use of Datasette, which I’ve also used for the &lt;a href=&#34;https://updates.timsherratt.org/2021/08/23/a-family-history.html&#34;&gt;GLAM Name Index Search&lt;/a&gt;. One of the cool things about &lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt; is that it provides its own API, so if you find something interesting in any of these databases, you can easily download the machine-readable data for further analysis. #dhhacks&lt;/p&gt;
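&lt;p&gt;Every Datasette table also has a JSON endpoint, so getting machine-readable rows is trivial. A quick sketch (the .json extension and the _shape/_search parameters are standard Datasette, but the search term is just an example, and _search only works where full-text search is enabled):&lt;/p&gt;

```python
# A sketch of pulling machine-readable rows from one of the Datasette
# databases mentioned above. The .json extension and _shape/_search
# parameters are standard Datasette; the example search term is arbitrary.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def table_json_url(table_url, search=None):
    """Turn a Datasette table URL into its JSON API equivalent."""
    params = {"_shape": "array"}  # return a bare list of row objects
    if search:
        params["_search"] = search  # needs full-text search enabled
    return table_url + ".json?" + urlencode(params)

# Example (needs network access):
# url = table_json_url(
#     "https://trove-government-publications.glitch.me/data/trove-government-publications",
#     search="census",
# )
# rows = json.load(urlopen(url))
# print(len(rows), "matching publications")
```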
</description>
    </item>
    
    <item>
      <title>GLAM Workbench – a platform for digital HASS research</title>
      <link>https://updates.timsherratt.org/2021/08/26/glam-workbench-a.html</link>
      <pubDate>Thu, 26 Aug 2021 17:31:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/08/26/glam-workbench-a.html</guid>
      <description>&lt;p&gt;We’re in the midst of planning for the &lt;a href=&#34;https://ardc.edu.au/collaborations/strategic-activities/hass-and-indigenous-research-data-commons/&#34;&gt;HASS Research Data Commons&lt;/a&gt;, which will deliver some much-needed investment in digital research infrastructure for the humanities and social sciences. Amongst the funded programs are tools for text analysis as part of the Linguistics Data Commons, and a platform for more advanced research using Trove. I’m hoping that this will be an opportunity to take stock of existing tools and resources, and build flexible pathways for researchers that enable them to collect, move, analyse, preserve, and share data across different platforms and services.&lt;/p&gt;
&lt;p&gt;To this end, I thought it might be useful to try and summarise what the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; offers, particularly for &lt;a href=&#34;https://trove.nla.gov.au/&#34;&gt;Trove&lt;/a&gt; researchers. The GLAM Workbench doesn’t really have an institutional home, and is mostly unfunded – it’s my passion project. That means that it’s easy to overlook, particularly when the big grants are being doled out. But I think it has a lot to offer and I’m looking forward to exploring ways it can connect with these new initiatives.&lt;/p&gt;
&lt;h2 id=&#34;getting-and-moving-data&#34;&gt;Getting and moving data&lt;/h2&gt;
&lt;p&gt;There’s lots of fabulous data in Trove and other GLAM collections. In fact, there’s so much data that it can be difficult for researchers to find and collect what’s relevant to their interests. There are many tools in the GLAM Workbench to help researchers assemble their own datasets. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Get newspaper articles in bulk with the Trove Newspaper and Gazette Harvester&lt;/a&gt;&lt;/strong&gt; – This has been around in some form for more than ten years (it pre-dates the Trove API!). Give it the url of a search in Trove’s newspapers and gazettes and the harvester will save all the metadata in a CSV file, and optionally download the complete articles as OCRd text, images, or PDFs. The amount of data you harvest is really only limited by your patience and disk space. I’ve harvested more than a million articles in the past. The GLAM Workbench includes a web app version of the harvester that runs live in the cloud – just paste in your Trove API key and the search url, and click the button.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Get Trove newspaper pages as images&lt;/strong&gt; – If you need a nice, high-resolution version of a newspaper page you can &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#download-a-page-image&#34;&gt;use this web app&lt;/a&gt;. If you want to harvest every front page (or some other particular page) here’s an example that &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#harvest-australian-womens-weekly-covers-or-the-front-pages-of-any-newspaper&#34;&gt;gets all the covers of the &lt;em&gt;Australian Women’s Weekly&lt;/em&gt;&lt;/a&gt;. A pre-harvested &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#australian-womens-weekly-front-covers-1933-to-1982&#34;&gt;collection of the AWW covers&lt;/a&gt; is included as a bonus extra.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#save-a-trove-newspaper-article-as-an-image&#34;&gt;Get Trove newspaper articles as images&lt;/a&gt;&lt;/strong&gt; – The Trove web interface makes it difficult to download complete images of articles, but this tool will do the job. There’s a handy web app to grab individual images, but the code from this tool is reused in other places such as the Trove Newspaper Harvester and the Omeka uploader, and could be built-in to your own research workflows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#upload-trove-newspaper-articles-to-omeka-s&#34;&gt;Upload Trove newspaper articles to Omeka&lt;/a&gt;&lt;/strong&gt; – Whether you’re creating on online exhibition or building a research database, Omeka can be very useful. This notebook connects Trove’s newspapers to Omeka for easy upload. Your selected articles can come from a search query, a Trove list, a Zotero library, or just a list of article ids. Metadata records are created in Omeka for each article and newspaper, and an image of each article is attached.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/#get-ocrd-text-from-a-digitised-journal-in-trove&#34;&gt;Get OCRd text from digitised periodicals in Trove&lt;/a&gt;&lt;/strong&gt; – They’re often overshadowed by the newspapers, but there’s now lots of digitised journals, magazines, and parliamentary papers in Trove. You can get article-level data from the API, but not issue data. This notebook enables researchers to get metadata and OCRd text from every available issue of a periodical. To make researchers’ lives even easier, I regularly harvest &lt;a href=&#34;https://glam-workbench.net/trove-journals/#ocrd-text-from-trove-digitised-journals&#34;&gt;&lt;strong&gt;all&lt;/strong&gt; the available OCRd text&lt;/a&gt; from digitised periodicals in Trove. The latest harvest downloaded 51,928 issues from 1,163 periodicals – that’s about 10gb of text. You can &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-journals/blob/master/digital-journals-with-text.md&#34;&gt;browse the list of periodicals&lt;/a&gt; with OCRd text, or &lt;a href=&#34;https://trove-digital-periodicals.glitch.me/data/trove-digital-journals&#34;&gt;search this database&lt;/a&gt;. All the OCRd text is stored in a public repository on CloudStor.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/#get-covers-or-any-other-pages-from-a-digitised-journal-in-trove&#34;&gt;Get page images from digitised periodicals in Trove&lt;/a&gt;&lt;/strong&gt; – There’s more than text in digitised periodicals, and you might want to download images of pages for visual analysis. This notebook shows you how to get cover images, but could be easily modified to get another page, or a PDF. I used a modified version of this to create &lt;a href=&#34;https://glam-workbench.net/trove-journals/#editorial-cartoons-from-the-bulletin-1886-to-1952&#34;&gt;a collection of 3,471 full page editorial cartoons&lt;/a&gt; from &lt;em&gt;The Bulletin&lt;/em&gt;, 1886 to 1952 – all available to download from CloudStor.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-books/#harvesting-the-text-of-digitised-books-and-ephemera&#34;&gt;Get OCRd text from digitised books in Trove&lt;/a&gt;&lt;/strong&gt; – Yep, there’s digitised books as well as newspapers and periodicals. You can download OCRd text from an individual book using the Trove web interface, but how do you make a collection of books without all that pointing and clicking? This notebook downloads all the available OCRd text from digitised books in Trove. The latest harvest includes &lt;a href=&#34;https://glam-workbench.net/trove-books/#ocrd-text-from-trove-books-and-ephemera&#34;&gt;text from 26,762 works&lt;/a&gt;. You can explore the results &lt;a href=&#34;https://trove-digital-books.glitch.me/data/trove-digital-books&#34;&gt;using this database&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-journals/#harvest-parliament-press-releases-from-trove&#34;&gt;Harvest parliamentary press releases from Trove&lt;/a&gt;&lt;/strong&gt; – Trove includes more than 380,000 press releases, speeches, and interview transcripts issued by Australian federal politicians and saved by the Parliamentary Library. This notebook shows you how to harvest both metadata and fulltext from a search of the parliamentary press releases. For example, here’s a collection of &lt;a href=&#34;https://glam-workbench.net/trove-journals/#politicians-talking-about-immigrants-and-refugees&#34;&gt;politicians talking about ‘refugees’&lt;/a&gt;, and another &lt;a href=&#34;https://glam-workbench.net/trove-journals/#politicians-talking-about-covid&#34;&gt;relating to COVID-19&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/#harvest-abc-radio-national-records-from-trove&#34;&gt;Harvest details of Radio National programs from Trove&lt;/a&gt;&lt;/strong&gt; – Trove creates records for programs broadcast on ABC Radio National; for the major current affairs programs, these records are at segment level. Even though they don’t provide full transcripts, this data provides a rich, fine-grained record of Australia’s recent political, social, and economic history. This notebook shows you how to download the Radio National data. If you just want to dive straight in, there’s also a &lt;a href=&#34;https://glam-workbench.net/trove-music/#abc-radio-national-programs&#34;&gt;pre-harvested collection&lt;/a&gt; containing more than 400,000 records, with separate downloads for some of the main programs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/#find-all-the-archived-versions-of-a-web-page&#34;&gt;Find all the versions of an archived web page in Trove&lt;/a&gt;&lt;/strong&gt; – Many of the tools in the &lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;Web Archives section&lt;/a&gt; of the GLAM Workbench will work with the Australian Web Archive, which is part of Trove. This notebook shows you how to get data about the number of times a web page has been archived over time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/#harvesting-collections-of-text-from-archived-web-pages&#34;&gt;Harvesting collections of text from archived web pages in Trove&lt;/a&gt;&lt;/strong&gt; – If you want to explore how the content of a web page changes over time, you can use this notebook to capture the text content of every archived version of a web page.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-lists/#convert-a-trove-list-into-a-csv-file&#34;&gt;Convert a Trove list into a CSV file&lt;/a&gt;&lt;/strong&gt; – While Trove provides a data download option for lists, it leaves out a lot of useful data. This notebook downloads full details of newspaper articles and other works in a list and saves them as CSV files. Like the Trove Newspaper Harvester, it lets you download OCRd text and images from newspaper articles.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Collecting information about Trove user activity&lt;/strong&gt; – It’s not just the content of Trove that provides interesting research data, it’s also the way people engage with it. Using the Trove API it’s possible to harvest details of &lt;a href=&#34;https://glam-workbench.net/trove-lists/&#34;&gt;all user created lists and tags&lt;/a&gt;. And yes, there’s pre-harvested collections of &lt;a href=&#34;https://glam-workbench.net/trove-lists/#trove-lists-metadata&#34;&gt;lists&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/trove-lists/#trove-public-tags&#34;&gt;tags&lt;/a&gt; for the impatient.&lt;/li&gt;
&lt;/ul&gt;
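&lt;p&gt;Most of the harvesting tools above follow the same underlying pattern: build a request to the Trove API and page through the JSON results. Here’s a minimal sketch of that first step – the &lt;code&gt;build_search_url&lt;/code&gt; helper and the placeholder API key are illustrative only, not part of the Workbench code:&lt;/p&gt;

```python
# Sketch of the common first step: building a Trove API v2 search request.
# "YOUR_API_KEY" is a placeholder -- you need to register for a real key.
from urllib.parse import urlencode

API_BASE = "https://api.trove.nla.gov.au/v2/result"

def build_search_url(query, api_key, zone="newspaper", page_size=20):
    """Return a Trove search URL that asks for JSON results."""
    params = {
        "q": query,          # search query
        "zone": zone,        # e.g. newspaper, book, picture
        "encode": "json",    # the API returns XML by default
        "n": page_size,      # results per request
        "key": api_key,
    }
    return f"{API_BASE}?{urlencode(params)}"

print(build_search_url("weather wragge", "YOUR_API_KEY"))
```

&lt;p&gt;The notebooks wrap this basic request in code that handles paging, rate limits, and saving the results.&lt;/p&gt;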
&lt;p&gt;While I’m focusing here on Trove, there’s also tools to create datasets from the &lt;a href=&#34;https://glam-workbench.net/recordsearch/&#34;&gt;National Archives of Australia&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/digitalnz/&#34;&gt;Digital NZ and Papers Past&lt;/a&gt;, the &lt;a href=&#34;https://glam-workbench.net/nma/&#34;&gt;National Museum of Australia&lt;/a&gt; and more. And there’s a &lt;a href=&#34;https://glam-workbench.net/glam-data-list/&#34;&gt;big list of readily downloadable datasets&lt;/a&gt; from Australian GLAM organisations.&lt;/p&gt;
&lt;h2 id=&#34;visualisation-and-analysis&#34;&gt;Visualisation and analysis&lt;/h2&gt;
&lt;p&gt;Many of the notebooks listed above include examples that demonstrate ways of exploring and analysing your harvested data. There are also a number of companion notebooks that examine some possibilities in more detail, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-harvester/#exploring-your-troveharvester-data&#34;&gt;Explore your Trove newspaper harvests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-harvester/#display-the-results-of-a-harvest-as-a-searchable-database-using-datasette&#34;&gt;Load your Trove newspaper harvest in Datasette&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-music/#exploring-abc-radio-national-metadata&#34;&gt;Exploring ABC Radio National metadata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-lists/#analyse-public-tags-added-to-trove&#34;&gt;Analyse public tags added to Trove&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But there are also many other notebooks that demonstrate methods for analysing Trove’s content, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#querypic&#34;&gt;QueryPic&lt;/a&gt;&lt;/strong&gt; – Another tool that’s been around in different forms for a decade, QueryPic visualises searches in Trove’s newspapers. The latest web app couldn’t be simpler: just paste in your API key and a search url and create charts showing the number of matching articles over time. You can combine queries, change time scales, and download the data and visualisations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#visualise-trove-newspaper-searches-over-time&#34;&gt;Visualise Trove newspaper searches over time&lt;/a&gt;&lt;/strong&gt; – This is like a deconstructed version of QueryPic that walks you through the process of using Trove’s facets to assemble a dataset of results over time. It provides a lot of detail on the sorts of data available, and the questions we can ask of it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#visualise-the-total-number-of-newspaper-articles-in-trove-by-year-and-state&#34;&gt;Visualise the total number of newspaper articles in Trove by year and state&lt;/a&gt;&lt;/strong&gt; – This notebook uses a modified version of the code above to analyse the construction and context of Trove’s newspaper corpus itself. What are you actually searching? Meet the WWI effect and the copyright cliff of death! This is a great place to start if you want to get people thinking critically about how digital resources are constructed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#analyse-rates-of-ocr-correction&#34;&gt;Analyse rates of OCR correction&lt;/a&gt;&lt;/strong&gt; – Some more meta-analysis of the Trove corpus itself, this time focusing on patterns of OCR correction by Trove users.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#finding-non-english-newspapers-in-trove&#34;&gt;Identifying non-English language newspapers in Trove&lt;/a&gt;&lt;/strong&gt; – There are a growing number of non-English language newspapers digitised in Trove. However, if you&amp;rsquo;re only searching using English keywords, you might never know that they&amp;rsquo;re there. This notebook analyses a sample of articles from every newspaper in Trove to identify non-English content.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#beyond-the-copyright-cliff-of-death&#34;&gt;Beyond the copyright cliff of death&lt;/a&gt;&lt;/strong&gt; – Most of the newspaper articles on Trove were published before 1955, but there are some from the later period. This notebook helps you find out how many, and which newspapers they were published in.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#map-trove-newspaper-results-by-state&#34;&gt;Map Trove newspaper results by state&lt;/a&gt;&lt;/strong&gt; – This notebook uses the Trove &lt;code&gt;state&lt;/code&gt; facet to create a choropleth map that visualises the number of search results per state.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#map-trove-newspaper-results-by-place-of-publication&#34;&gt;Map Trove newspaper results by place of publication&lt;/a&gt;&lt;/strong&gt; – This notebook uses the Trove &lt;code&gt;title&lt;/code&gt; facet to find the number of results per newspaper, then merges the results with a dataset of geolocated newspapers to map where articles were published.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/#compare-two-versions-of-an-archived-web-page&#34;&gt;Compare two versions of an archived web page&lt;/a&gt;&lt;/strong&gt; – This notebook demonstrates a number of different ways of comparing versions of archived web pages. Just choose a repository, enter a url, and select two dates to see comparisons based on: page metadata, basic statistics such as file size and number of words, numbers of internal and external links, cosine similarity of text, line by line differences in text or code, and screenshots.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/#display-changes-in-the-text-of-an-archived-web-page-over-time&#34;&gt;Display changes in the text of an archived web page over time&lt;/a&gt;&lt;/strong&gt; – This web app gathers all the available versions of a web page and then visualises changes in its content between versions – what’s been added, removed, and changed?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/web-archives/#using-screenshots-to-visualise-change-in-a-page-over-time&#34;&gt;Use screenshots to visualise change in a page over time&lt;/a&gt;&lt;/strong&gt; – Create a series of full page screenshots of a web page over time, then assemble them into a time series.&lt;/li&gt;
&lt;/ul&gt;
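&lt;p&gt;Under the hood, tools like QueryPic reshape the API’s &lt;code&gt;year&lt;/code&gt; facet data into a time series of article counts. As a minimal sketch, using a simplified stand-in for the facet JSON (real API responses nest these terms more deeply, so treat the structure here as an assumption):&lt;/p&gt;

```python
# Turn a list of Trove year-facet terms into a sorted time series.
# `sample_facets` is a simplified stand-in for the nested JSON the
# Trove API actually returns -- field names may differ in real responses.

def facet_terms_to_series(terms):
    """Convert facet terms to (year, count) pairs sorted by year."""
    series = [(int(t["display"]), int(t["count"])) for t in terms]
    return sorted(series)

sample_facets = [
    {"count": "120", "display": "1915"},
    {"count": "85", "display": "1914"},
    {"count": "210", "display": "1916"},
]

print(facet_terms_to_series(sample_facets))
# -> [(1914, 85), (1915, 120), (1916, 210)]
```

&lt;p&gt;Once the counts are in this shape, charting them with a library like Altair or Plotly is straightforward.&lt;/p&gt;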
&lt;p&gt;There are also possibilities for using Trove data creatively. For example you can &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#create-scissors-and-paste-messages-from-trove-newspaper-articles&#34;&gt;create &amp;lsquo;scissors and paste&amp;rsquo; messages from Trove newspaper articles&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;documentation-and-examples&#34;&gt;Documentation and examples&lt;/h2&gt;
&lt;p&gt;All the Trove notebooks in the GLAM Workbench help document the possibilities and limits of the Trove API. The examples above can be modified and reworked to suit different research interests. Some notebooks also explore particular aspects of the API, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove/&#34;&gt;Trove API Introduction&lt;/a&gt;&lt;/strong&gt; – Some very basic examples of making requests and understanding results.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#todays-news-yesterday&#34;&gt;Today’s news yesterday&lt;/a&gt;&lt;/strong&gt; – Uses the &lt;code&gt;date&lt;/code&gt; index and the &lt;code&gt;firstpageseq&lt;/code&gt; parameter to find articles from exactly 100 years ago that were published on the front page. It then selects one of the articles at random and downloads and displays an image of the front page.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-images/#the-use-of-standard-licences-and-rights-statements-in-trove-image-records&#34;&gt;The use of standard licences and rights statements in Trove image records&lt;/a&gt;&lt;/strong&gt; – Version 2.1 of the Trove API introduced a new rights index that you can use to limit your search results to records that include one of a list of standard licences and rights statements. We can also use this index to build a picture of which rights statements are currently being used, and by whom.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/trove-random/&#34;&gt;Random items from Trove&lt;/a&gt;&lt;/strong&gt; – Changes to the Trove API meant that techniques you could previously use to select resources at random no longer work. This section documents some alternative ways of retrieving random-ish works and newspaper articles from Trove.&lt;/li&gt;
&lt;/ul&gt;
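&lt;p&gt;To give a flavour of how a notebook like ‘Today’s news yesterday’ assembles its request, here’s a rough sketch of building the search parameters. The exact &lt;code&gt;date&lt;/code&gt; range syntax and the use of &lt;code&gt;firstpageseq&lt;/code&gt; are assumptions to check against the Trove API documentation, and the API key is a placeholder:&lt;/p&gt;

```python
# Build search parameters for front-page articles published exactly
# 100 years ago. The date-range string and the firstpageseq parameter
# follow the pattern described above -- verify both against the
# Trove API documentation before relying on them.
from datetime import date
from urllib.parse import urlencode

def century_ago_params(today, api_key):
    """Return query parameters targeting one day, 100 years before `today`."""
    then = today.replace(year=today.year - 100)
    stamp = then.strftime("%Y-%m-%dT00:00:00Z")
    return {
        "q": f"date:[{stamp} TO {stamp}]",
        "zone": "newspaper",
        "firstpageseq": "1",  # limit to articles on page one
        "encode": "json",
        "key": api_key,
    }

print(urlencode(century_ago_params(date(2021, 8, 6), "YOUR_API_KEY")))
```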
&lt;p&gt;And while it’s not officially part of the GLAM Workbench, I also maintain the &lt;a href=&#34;https://troveconsole.herokuapp.com/&#34;&gt;Trove API Console&lt;/a&gt; which provides lots of examples of the API in action.&lt;/p&gt;
&lt;h2 id=&#34;pathways&#34;&gt;Pathways&lt;/h2&gt;
&lt;p&gt;In developing the GLAM Workbench I’m very aware that people will arrive with different levels of digital skill, confidence, and experience. That’s why I’ve been putting a lot of thought and effort into ways of providing a range of entry points.&lt;/p&gt;
&lt;p&gt;Someone who might not identify as a ‘digital’ researcher can, with a single click, start up QueryPic and start exploring changes over time in Trove’s newspapers. This is possible because the GLAM Workbench is configured to &lt;a href=&#34;https://glam-workbench.net/using-binder/&#34;&gt;make use of Binder&lt;/a&gt;, a service that spins up customised computing environments as needed.&lt;/p&gt;
&lt;p&gt;Another researcher might start running the Trove Newspaper Harvester using Binder, but find that they want to run bigger and longer harvests. In that case, the GLAM Workbench offers a one-click installation of the Trove Newspaper Harvester on &lt;a href=&#34;https://glam-workbench.net/using-reclaim-cloud/&#34;&gt;Reclaim Cloud&lt;/a&gt;. Unlike Binder, Reclaim Cloud environments are persistent, so you can run the harvester for as long as you want without the worry of interruptions.&lt;/p&gt;
&lt;p&gt;Yet another researcher might want to understand how the Trove API works and the sorts of data that it makes available. By exploring the various notebooks they’ll find useful snippets of code they can try out in their own projects.&lt;/p&gt;
&lt;p&gt;The GLAM Workbench connects outwards to make use of a range of other services – the notebooks run in Binder, Reclaim Cloud, and Docker; the code is all openly licensed and publicly available through GitHub and Zenodo; data is hosted on GitHub, CloudStor, and Zenodo; datasets can be explored using Datasette running on Glitch or Google CloudRun. I’m hoping that the new investments in HASS research infrastructure will embed a similar philosophy, connecting up existing services rather than starting from scratch.&lt;/p&gt;
&lt;h2 id=&#34;the-future&#34;&gt;The future&lt;/h2&gt;
&lt;p&gt;This is just an outline of what the GLAM Workbench currently offers researchers wanting to make use of the data available from Trove. It&amp;rsquo;s all there now, publicly accessible, openly licensed, and ready to use – take it, use it, change it, share it. But there&amp;rsquo;s &lt;strong&gt;much more&lt;/strong&gt; I&amp;rsquo;d like to do, both in regard to Trove and to encourage use of GLAM data more generally. I&amp;rsquo;m also interested in your ideas for new tools, examples, or data sources – what would help your research? You can &lt;a href=&#34;https://glam-workbench.net/suggest-a-topic/&#34;&gt;add a suggestion&lt;/a&gt; in GitHub, or post a comment in the &lt;a href=&#34;https://ozglam.chat/c/glam-workbench/8&#34;&gt;GLAM Workbench channel&lt;/a&gt; of OzGLAM Help.&lt;/p&gt;
&lt;p&gt;See the &lt;a href=&#34;https://glam-workbench.net/getting-started/&#34;&gt;Getting Started section&lt;/a&gt; of the GLAM Workbench for more hints and examples. And keep an eye on the &lt;a href=&#34;https://updates.timsherratt.org/categories/glamworkbench/&#34;&gt;news feed&lt;/a&gt; for the latest additions and updates.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>A Family History Month experiment – search millions of name records from GLAM organisations</title>
      <link>https://updates.timsherratt.org/2021/08/23/a-family-history.html</link>
      <pubDate>Mon, 23 Aug 2021 11:05:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/08/23/a-family-history.html</guid>
      <description>&lt;p&gt;There’s a lot of rich historical data contained within the indexes that Australian GLAM organisations provide to help people navigate their records. These indexes, often created by volunteers, allow access by key fields such as name, date or location. They aid discovery, but also allow new forms of analysis and visualisation. Kate Bagnall and I wrote about some of the possibilities, and the difficulties, in this &lt;a href=&#34;http://doi.org/10.1353/jwh.2021.0025&#34;&gt;recently published article&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Many of these indexes can be downloaded from government data portals. The GLAM Workbench demonstrates &lt;a href=&#34;https://glam-workbench.net/glam-data-portals/&#34;&gt;how these can be harvested&lt;/a&gt;, and provides a &lt;a href=&#34;https://glam-workbench.net/glam-datasets-from-gov-portals/&#34;&gt;list of available datasets&lt;/a&gt; to browse. But what’s inside them? The &lt;a href=&#34;https://glam-workbench.net/csv-explorer/&#34;&gt;GLAM CSV Explorer&lt;/a&gt; visualises the contents of the indexes to give you a sneak peek and encourage you to dig deeper.&lt;/p&gt;
&lt;p&gt;There’s even more indexes available from the NSW State Archives. Most of these aren’t accessible through the NSW government data portal yet, but I managed to &lt;a href=&#34;https://glam-workbench.net/nsw-state-archives/&#34;&gt;scrape them from the website&lt;/a&gt; a couple of years ago and made them &lt;a href=&#34;https://glam-workbench.net/nsw-state-archives/#nsw-state-archives-online-indexes&#34;&gt;available as CSVs&lt;/a&gt; for easy download.&lt;/p&gt;
&lt;p&gt;It’s &lt;a href=&#34;https://familyhistorymonth.org.au/&#34;&gt;Family History Month&lt;/a&gt; at the moment, and the other night I thought of an interesting little experiment using the indexes. I’ve been playing round with &lt;a href=&#34;https://datasette.io/&#34;&gt;Datasette&lt;/a&gt; lately. It’s a fabulous tool for exploring tabular data, like CSVs. I also noticed that Datasette’s creator Simon Willison had added a &lt;a href=&#34;https://simonwillison.net/2020/Mar/9/datasette-search-all/&#34;&gt;search-all plugin&lt;/a&gt; that enabled you to run a full text search across multiple databases and tables. Hmmm, I wondered, would it be possible to use Datasette to provide a way of searching for names across &lt;strong&gt;all&lt;/strong&gt; those GLAM indexes?&lt;/p&gt;
&lt;p&gt;After a few nights’ work, I found the answer was &lt;strong&gt;yes&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;Try out my new aggregated search interface here!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;(The cloud service it uses runs on demand, so if it has gone to sleep, it might take a little while to wake up again – just be patient for a few seconds.)&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/bf9da8208a.png&#34; alt=&#34;&#34; title=&#34;Screenshot of GLAM Name index search&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Currently, the &lt;a href=&#34;https://glam-workbench.net/name-search/&#34;&gt;GLAM Name Search&lt;/a&gt; interface lets you search for names across 195 indexes from eight GLAM organisations. All together, there’s a total of more than &lt;strong&gt;9.2 million rows of data&lt;/strong&gt; to explore!&lt;/p&gt;
&lt;p&gt;It’s simple to use – just enter a name in the search box and Datasette will search each index in turn, displaying the first five matching results. You can click through to view all results from a specific index. Not surprisingly, the aggregated name search only searches columns containing names. However, once you click through to an individual table, you can apply additional filters or facets.&lt;/p&gt;
&lt;p&gt;To create the aggregated search interface I worked through the &lt;a href=&#34;https://github.com/GLAM-Workbench/ozglam-data/blob/master/glam-datasets-from-gov-portals-csvs.csv&#34;&gt;list of CSVs&lt;/a&gt; I’d harvested from government data portals to identify those that contained names of people, and discard those that contained administrative, rather than historical data. I also made a note of the columns that contained the names so I could index their contents once they’d been added to the database. Usually these were fields such as &lt;code&gt;Surname&lt;/code&gt; or &lt;code&gt;Given names&lt;/code&gt;, but sometimes names were in the record title or notes.&lt;/p&gt;
&lt;p&gt;Datasette uses SQLite databases to store its data. I decided to create one database for each GLAM organisation. I wrote some code to work through my list of datasets, saving them into an SQLite database, indexing the name columns, and writing information about the dataset to a &lt;code&gt;metadata.json&lt;/code&gt; file. This file is used by Datasette to display information such as the title, source, licence, and last modified date of each of the indexes.&lt;/p&gt;
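&lt;p&gt;The indexing step might look something like this rough sketch using Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The table and column names are made up for illustration, and the real code also handles the CSV parsing and writes the &lt;code&gt;metadata.json&lt;/code&gt; entries:&lt;/p&gt;

```python
# Minimal sketch of the indexing step: load name rows into SQLite and
# build a full-text index over just the name columns, as described above.
# The table and column names here are illustrative, not the real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE convicts (surname TEXT, given_names TEXT, year INTEGER)")
rows = [
    ("Wragge", "Clement", 1887),
    ("Smith", "Jane", 1852),
]
conn.executemany("INSERT INTO convicts VALUES (?, ?, ?)", rows)

# An FTS5 virtual table indexing only the name columns, backed by
# the content table so the data isn't duplicated.
conn.execute(
    "CREATE VIRTUAL TABLE convicts_fts USING fts5(surname, given_names, content='convicts')"
)
conn.execute("INSERT INTO convicts_fts(convicts_fts) VALUES ('rebuild')")

matches = conn.execute(
    "SELECT surname, given_names FROM convicts_fts WHERE convicts_fts MATCH 'wragge'"
).fetchall()
print(matches)  # -> [('Wragge', 'Clement')]
```

&lt;p&gt;In practice a tool like &lt;code&gt;sqlite-utils&lt;/code&gt; (from the Datasette ecosystem) takes care of most of this plumbing for you.&lt;/p&gt;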
&lt;p&gt;Once that was done, I could fire up Datasette and feed it all the SQLite databases. Amazingly it all worked – searching across all the indexes was remarkably quick! To make it publicly available I used the Datasette &lt;a href=&#34;https://docs.datasette.io/en/stable/publish.html&#34;&gt;&lt;code&gt;publish&lt;/code&gt;&lt;/a&gt; command to push everything to Google CloudRun (about 1.4 gb of data). The first time I used CloudRun it took some time to get the authentication and other settings working properly. This time was much smoother. Before long it was live!&lt;/p&gt;
&lt;p&gt;Once I knew it all worked, I decided to add in another 59 indexes from the NSW State Archives. I also plugged in a few extra indexes from the Public Record Office of Victoria. These datasets are stored as ZIP files in the Victorian government data portal, so it took a little bit of extra manual processing to get everything sorted. But finally I had all &lt;strong&gt;195 indexes&lt;/strong&gt; loaded.&lt;/p&gt;
&lt;p&gt;What now? That depends on whether people find this experiment useful. I have a few ideas for improvements. But if people do use it, then the costs will go up. I’m going to have to monitor this over the next couple of months to see if I can afford to keep it going. If you want to help with the running costs, you might like to sign up as a &lt;a href=&#34;https://github.com/sponsors/wragge?o=esb&#34;&gt;GitHub sponsor&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;And please let me know if you think it’s worth developing!&lt;/strong&gt; #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Explore Trove’s digitised books</title>
      <link>https://updates.timsherratt.org/2021/08/16/explore-troves-digitised.html</link>
      <pubDate>Mon, 16 Aug 2021 16:40:15 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/08/16/explore-troves-digitised.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/trove-books/&#34;&gt;Trove books section of the GLAM Workbench &lt;/a&gt;has been updated! There’s freshly-harvested data, as well as updated Python packages, integration with Reclaim Cloud, and automated Docker builds.&lt;/p&gt;
&lt;p&gt;Included is &lt;a href=&#34;https://glam-workbench.net/trove-books/#harvesting-the-text-of-digitised-books-and-ephemera&#34;&gt;a notebook to harvest details of all books&lt;/a&gt; available from Trove in digital form. This includes both digitised books, that have been scanned and OCRd, as well as born digital publications, such as PDFs and epubs. The definition of ‘books’ is pretty loose – I’ve harvested details of anything that has been assigned the format ‘Book’ in Trove, but this includes &lt;a href=&#34;https://updates.timsherratt.org/2021/08/13/a-miscellany-of.html&#34;&gt;ephemera, such as posters, pamphlets, and advertising&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the latest harvest, I ended up with details of 42,174 ‘books’. This includes some duplicates, because multiple metadata entries can point to the same digital object. I thought it was best to preserve the duplicates, rather than discard the metadata.&lt;/p&gt;
&lt;p&gt;Once I’d harvested the details of the books, I tried to see if there was any OCRd text available for download. If there was, I saved it to a &lt;a href=&#34;https://cloudstor.aarnet.edu.au/plus/s/ugiw3gdijSKaoTL&#34;&gt;public folder on CloudStor&lt;/a&gt;. In total, I was able to download 26,762 files of OCRd text.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/7b5774a071.png&#34; alt=&#34;Screenshot of database showing details of digital book&#34;&gt;&lt;/p&gt;
&lt;p&gt;The easiest way to explore the books is &lt;a href=&#34;https://trove-digital-books.glitch.me/data/trove-digital-books&#34;&gt;using this searchable database&lt;/a&gt;. It’s created using Datasette and is running on Glitch. Full text search is available on the ‘title’ and ‘contributors’ fields, and you can filter on things like date, copyright status, number of pages, and whether OCRd text is available for download. If there is OCRd text, a direct link to the file on CloudStor is included. You can use the database to filter the titles, creating your own dataset that you can download in CSV or JSON format.&lt;/p&gt;
&lt;p&gt;If you just want the full list of books as a CSV file, you can &lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/trove-books/blob/master/trove_digitised_books_with_ocr.csv&#34;&gt;download it here&lt;/a&gt;. And if you want &lt;em&gt;all&lt;/em&gt; the OCRd text, you can go straight to the &lt;a href=&#34;https://cloudstor.aarnet.edu.au/plus/s/ugiw3gdijSKaoTL&#34;&gt;public folder on CloudStor&lt;/a&gt; – there’s about 3.6gb of text files to explore! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>A miscellany of ephemera, oddities, &amp; estrays</title>
      <link>https://updates.timsherratt.org/2021/08/13/a-miscellany-of.html</link>
      <pubDate>Fri, 13 Aug 2021 12:02:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/08/13/a-miscellany-of.html</guid>
      <description>&lt;p&gt;I’m just in the midst of updating my harvest of OCRd text from Trove’s digitised books (more about that soon!). But amongst the items catalogued as ‘books’ are a wide assortment of ephemera, posters, advertisements, and other oddities. There’s no consistent way of identifying these items through the search interface, but because I’ve found the number of pages in each ‘book’ as part of the harvesting process, I can limit results to items with just a single digitised page – there’s more than 1,500! To make it easy to explore this collection of odds and ends, I’ve downloaded all the single page images and compiled them into &lt;a href=&#34;https://www.dropbox.com/s/xi84y12zz6iryfu/trove-ephemera.pdf?dl=0&#34;&gt;one big PDF&lt;/a&gt; with links back to their entries in Trove. Enjoy your browsing!&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/1f185ad85f.png&#34; alt=&#34;&#34; title=&#34;Screenshot from PDF showing a &#39;To let&#39; poster&#34;&gt;&lt;/p&gt;
&lt;p&gt;This is another example of the ways in which we can extend and enrich existing collection interfaces using simple technologies like PDFs and CSVs. We can create slices across existing categories to expose interesting features, and provide new entry points for researchers. Some other examples in the GLAM Workbench are the collection of &lt;a href=&#34;https://glam-workbench.net/trove-journals/#editorial-cartoons-from-the-bulletin-1886-to-1952&#34;&gt;editorial cartoons from &lt;em&gt;The Bulletin&lt;/em&gt;&lt;/a&gt;, the list of Trove newspapers with &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#trove-newspapers-with-non-english-language-content&#34;&gt;non-English content&lt;/a&gt;, the harvest of &lt;a href=&#34;https://glam-workbench.net/trove-music/#abc-radio-national-programs&#34;&gt;ABC Radio National programs&lt;/a&gt;, and the recent collection of &lt;a href=&#34;https://glam-workbench.net/trove-journals/#politicians-talking-about-covid&#34;&gt;politicians talking about COVID&lt;/a&gt;. &lt;a href=&#34;https://glam-workbench.net/suggest-a-topic/&#34;&gt;Let me know&lt;/a&gt; if you have any ideas for additional slices! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Everyday heritage and the GLAM Workbench</title>
      <link>https://updates.timsherratt.org/2021/08/09/everyday-heritage-and.html</link>
      <pubDate>Mon, 09 Aug 2021 12:26:38 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/08/09/everyday-heritage-and.html</guid>
<description>&lt;p&gt;Some good news on the funding front with the success of the &lt;a href=&#34;https://dataportal.arc.gov.au/NCGP/Web/Grant/Grant/LP200301446&#34;&gt;Everyday Heritage project&lt;/a&gt; in the latest round of ARC Linkage grants. The project aims to look beyond the formal discourses of ‘national’ heritage to develop a more diverse range of heritage narratives. Working at the intersection of place, digital collections, and material culture, team members will develop a series of ‘heritage biographies’ that document everyday experience and provide new models for the heritage sector.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/37dbda004e.png&#34; alt=&#34;Screen capture of project details in ARC grants database&#34;&gt;&lt;/p&gt;
&lt;p&gt;Digital methods will play a major role in the project. I’ll be leading the ‘Heritage Hacks’ work package that will support the creation of the heritage biographies and develop a range of new tools and tutorials for use in heritage management contexts. All the tools, methods, and data generated through the project will be documented using Jupyter notebooks and published through the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;. Watch this space!&lt;/p&gt;
&lt;p&gt;The project is led by &lt;a href=&#34;https://researchprofiles.canberra.edu.au/en/persons/tracy-ireland&#34;&gt;Tracy Ireland&lt;/a&gt; (University of Canberra), with &lt;a href=&#34;https://research-repository.uwa.edu.au/en/persons/jane-lydon&#34;&gt;Jane Lydon&lt;/a&gt; (UWA), &lt;a href=&#34;https://www.utas.edu.au/profiles/staff/humanities/kate-bagnall&#34;&gt;Kate Bagnall&lt;/a&gt; (UTAS), and me as chief investigators. Our industry partner is &lt;a href=&#34;https://www.gml.com.au/&#34;&gt;GML Heritage&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Recent GLAM Workbench presentations</title>
      <link>https://updates.timsherratt.org/2021/08/06/recent-glam-workbench.html</link>
      <pubDate>Fri, 06 Aug 2021 18:42:17 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/08/06/recent-glam-workbench.html</guid>
      <description>&lt;p&gt;So far this year I’ve given eight workshops or presentations relating to the GLAM Workbench, with probably a few more yet to come. Here’s the latest:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://slides.com/wragge/gcscr-2021&#34;&gt;Introducing the GLAM Workbench&lt;/a&gt;, presentation for the Griffith University Centre for Social and Cultural Research, Digital Humanities Seminar Series, 6 August 2021&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://youtu.be/pkC-seP00Kc&#34;&gt;Exploring the GLAM Workbench&lt;/a&gt; (&lt;a href=&#34;https://slides.com/wragge/uts-dh-2021&#34;&gt;slides&lt;/a&gt;), presentation for the UTS Digital Histories Seminar Series, 8 July 2021&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.5121188&#34;&gt;The GLAM Workbench: A Labs approach?&lt;/a&gt;, presentation for the panel &amp;lsquo;Research use of web archives: A labs approach&amp;rsquo;, at the IIPC Web Archiving Conference, 15 June 2021&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://slides.com/wragge/pha-workshop-may-2021&#34;&gt;Hands-on introduction to the GLAM Workbench&lt;/a&gt;, workshop for the Professional Historians Association of Victoria and Tasmania, 27 May 2021&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.5121224&#34;&gt;Exploring collections through the GLAM Workbench&lt;/a&gt;, keynote presentation for the XVIII Congrés d&amp;rsquo;Arxvística i Gestió de Documents de Catalunya, 11 May 2021&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://slides.com/wragge/aha-ecr-workshop&#34;&gt;Quick hacks and DIY data: Innovations for the discerning historian&lt;/a&gt;, presentation for AHA ECR digital skills seminar, 5 March 2021&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can view &lt;a href=&#34;https://glam-workbench.net/presentations/&#34;&gt;all the GLAM Workbench presentations&lt;/a&gt; here.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Updated! Lots and lots of text freshly harvested from Trove periodicals</title>
      <link>https://updates.timsherratt.org/2021/08/06/updated-lots-and.html</link>
      <pubDate>Fri, 06 Aug 2021 10:49:54 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/08/06/updated-lots-and.html</guid>
      <description>&lt;p&gt;For a few years now I’ve been harvesting downloadable text from digitised periodicals in Trove and making it easily available for exploration and research. I’ve just completed the latest harvest – here’s the summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1,163 digitised periodicals had text available for download&lt;/li&gt;
&lt;li&gt;Text was downloaded from 51,928 individual issues&lt;/li&gt;
&lt;li&gt;Adding up to a total of around 12gb of text&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you want to dive straight in, here’s a &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-journals/blob/master/digital-journals-with-text.md&#34;&gt;list of all the harvested periodicals&lt;/a&gt;, with links to download a summary of available issues, as well as all the harvested text (there’s one file per issue). You’ll notice that the list includes a large number of parliamentary papers and government reports as well as published journals.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/008359faba.png&#34; alt=&#34;List of Trove periodicals with downloadable text&#34;&gt;&lt;/p&gt;
&lt;p&gt;All of the harvested text is available from a &lt;a href=&#34;https://cloudstor.aarnet.edu.au/plus/s/QOmnqpGQCNCSC2h&#34;&gt;public folder on CloudStor&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The harvesting process involves a few different steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;First I generate a list of periodicals available in digital form from Trove. This includes digitised titles, as well as born-digital titles submitted through e-Legal Deposit. This produces &lt;a href=&#34;https://glam-workbench.net/trove-journals/#csv-formatted-list-of-journals-available-from-trove-in-digital-form&#34;&gt;a CSV file&lt;/a&gt; containing the details of 7,270 titles. See &lt;a href=&#34;https://glam-workbench.net/trove-journals/#create-a-list-of-troves-digitised-journals&#34;&gt;this notebook&lt;/a&gt; for details.&lt;/li&gt;
&lt;li&gt;Then I work through this list of titles to find out how many issues of each title are available through Trove. This information isn’t accessible through the API, so I have to do some screen scraping.&lt;/li&gt;
&lt;li&gt;Next I work through the list of issues and try to download the text contents. Most of the born-digital titles don’t have downloadable text.&lt;/li&gt;
&lt;li&gt;Once I’ve downloaded all the text I can from a title, I create a CSV file for it that lists the available issues and notes whether text is available for each. This file is stored with the text on CloudStor.&lt;/li&gt;
&lt;li&gt;Once I’ve checked all the titles, I generate &lt;a href=&#34;https://glam-workbench.net/trove-journals/#csv-formatted-list-of-journals-with-ocrd-text&#34;&gt;another CSV file&lt;/a&gt; that lists the details of all the periodicals that have downloadable text.&lt;/li&gt;
&lt;li&gt;The code to harvest and document the downloaded text is &lt;a href=&#34;https://glam-workbench.net/trove-journals/#download-the-ocrd-text-for-all-the-digitised-journals-in-trove&#34;&gt;available in this notebook&lt;/a&gt;. #dhhacks&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>New dataset – Politicians talking about COVID</title>
      <link>https://updates.timsherratt.org/2021/08/02/new-dataset-politicians.html</link>
      <pubDate>Mon, 02 Aug 2021 11:23:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/08/02/new-dataset-politicians.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/trove-journals/&#34;&gt;Trove Journals&lt;/a&gt; section of the GLAM Workbench includes &lt;a href=&#34;https://glam-workbench.net/trove-journals/#harvest-parliament-press-releases-from-trove&#34;&gt;a notebook&lt;/a&gt; that helps you download press releases, speeches, and interview transcripts by Australian federal politicians. These documents are compiled and published by the Parliamentary Library, and the details are regularly harvested into Trove.&lt;/p&gt;
&lt;p&gt;Using this notebook, I’ve created a collection of documents that include the words ‘COVID’ or ‘Coronavirus’. It includes all the &lt;strong&gt;metadata&lt;/strong&gt; from Trove, as well as the &lt;strong&gt;full text&lt;/strong&gt; of each document downloaded from the Parliamentary Library. There are &lt;strong&gt;3,995 documents in total&lt;/strong&gt;, covering the period up until early April 2021. You can &lt;strong&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-journals/raw/6f4d805186716853b81c7d93cac6754685b384bf/press-releases/press-releases-coronavirus-or-covid.zip&#34;&gt;download them all as a zip file&lt;/a&gt;&lt;/strong&gt; (12 mb).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/44a7f87b2c.png&#34; alt=&#34;&#34; title=&#34;Screenshot showing a sample of the harvested metadata&#34;&gt;&lt;/p&gt;
&lt;p&gt;While I was compiling this dataset, I also made a few improvements to the notebook. You can now filter the results to weed out false positives, and identify duplicates. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>8 million Trove tags to explore!</title>
      <link>https://updates.timsherratt.org/2021/07/14/million-trove-tags.html</link>
      <pubDate>Wed, 14 Jul 2021 18:06:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/07/14/million-trove-tags.html</guid>
      <description>&lt;p&gt;I’ve always been interested in the way people add value to resources in Trove. OCR correction tends to get all the attention, but Trove users have also been busy organising resources using tags, lists, and comments. I used to refer to tagging quite often in &lt;a href=&#34;http://discontents.com.au/myths-mega-projects-and-making/&#34;&gt;presentations&lt;/a&gt;, pointing to the different ways they were used. For example, ‘TBD’ is a workflow marker, used by text correctors to label articles that are ‘To Be Done’. My favourite was ‘LRRSA’, one of the most heavily-used tags across the whole of Trove. What does it mean? It stands for the Light Rail Research Society of Australia, and the tag is used by members to mark items of shared interest. It’s a great example of how something as simple as plain text tags can be used to support collaboration and build communities.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/f76b2e110b.png&#34; alt=&#34;Word cloud showing the top 200 Trove tags&#34;&gt;&lt;/p&gt;
&lt;p&gt;Until its update last year, Trove used to provide some basic stats about user activity. There was also a tag cloud that let you explore the most commonly-used tags. It’s now much harder to access this sort of information. However, you can extract some basic information about tags from the Trove API. First of all, you can filter a search using ‘has:tags’ to limit the results to items that have tags attached to them. Then to find out what the tags actually are, you can add the &lt;code&gt;include=tags&lt;/code&gt; parameter. This embeds the tags within the item record, so you can work through a set of results, extracting all the tags as you go. To save you the trouble, I’ve done this for the whole of Trove, and ended up with &lt;strong&gt;a dataset containing more than 8 million tags&lt;/strong&gt;!&lt;/p&gt;
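&lt;p&gt;In code, the approach looks something like the sketch below. The &lt;code&gt;YOUR_KEY&lt;/code&gt; placeholder and the exact shape of the embedded tag records are assumptions here – check the Trove API documentation for current details:&lt;/p&gt;

```python
# Parameters for a Trove API search that returns only tagged items,
# with the tags embedded in each item record.
params = {
    "q": "has:tags",     # limit results to items that have tags attached
    "zone": "newspaper",
    "include": "tags",   # embed the tags within each item record
    "encoding": "json",
    "key": "YOUR_KEY",   # placeholder for your own Trove API key
}

def extract_tags(record):
    """Pull lower-cased tag values out of an item record.

    Assumes the tags are embedded as a list of dicts with a 'value' key –
    the shape used here is an assumption for illustration.
    """
    return [t["value"].lower() for t in record.get("tag", [])]
```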
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/6ef630ccef.png&#34; alt=&#34;Chart showing the number of tags per year and zone.&#34;&gt;&lt;/p&gt;
&lt;p&gt;The dataset is saved as a 500mb CSV file, and contains the following fields:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tag&lt;/code&gt; – lower-cased version of the tag&lt;/li&gt;
&lt;li&gt;&lt;code&gt;date&lt;/code&gt; – date the tag was added&lt;/li&gt;
&lt;li&gt;&lt;code&gt;zone&lt;/code&gt; – the API zone that contains the tagged resource&lt;/li&gt;
&lt;li&gt;&lt;code&gt;resource_id&lt;/code&gt; – the identifier of the tagged resource&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There’s a few things to note about the data:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Works (such as books) in Trove can have tags attached at either work or version level. This dataset aggregates all tags at the work level, removing any duplicates.&lt;/li&gt;
&lt;li&gt;A single resource in Trove can appear in multiple zones – for example, a book that includes maps and illustrations might appear in the &amp;lsquo;book&amp;rsquo;, &amp;lsquo;picture&amp;rsquo;, and &amp;lsquo;map&amp;rsquo; zones. This means that some of the tags will essentially be duplicates – harvested from different zones, but relating to the same resource. Depending on your interests, you might want to remove these duplicates.&lt;/li&gt;
&lt;li&gt;While most of the tags were added by Trove users, more than 500,000 tags were added by Trove itself in November 2009. I think these tags were automatically generated from related Wikipedia pages. Depending on your interests, you might want to exclude these by limiting the date range or zones.&lt;/li&gt;
&lt;li&gt;User content added to Trove, including tags, is available for reuse under a CC-BY-NC licence.&lt;/li&gt;
&lt;/ul&gt;
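&lt;p&gt;To illustrate the point about duplicates across zones, here's a small, hypothetical sketch that counts tag usage from rows of the dataset, optionally counting each tag once per resource rather than once per zone:&lt;/p&gt;

```python
from collections import Counter

def count_tags(rows, dedupe_across_zones=True):
    """Count tag usage from dataset rows (dicts with 'tag', 'zone', and
    'resource_id' keys). A resource appearing in multiple zones has its
    tags duplicated, so by default each (tag, resource_id) pair is
    counted only once."""
    if dedupe_across_zones:
        seen = {(r["tag"], r["resource_id"]) for r in rows}
        return Counter(tag for tag, _ in seen)
    return Counter(r["tag"] for r in rows)
```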
&lt;p&gt;You can download the complete dataset from &lt;a href=&#34;https://doi.org/10.5281/zenodo.5094314&#34;&gt;Zenodo&lt;/a&gt;, or from &lt;a href=&#34;https://cloudstor.aarnet.edu.au/plus/s/YiWStNrhnTo18JI&#34;&gt;CloudStor&lt;/a&gt;. For more information on how I harvested the data, and some of its limits and complexities, see the notebooks in the new &lt;a href=&#34;https://glam-workbench.net/trove-lists/#tags&#34;&gt;‘Tags’ section in the GLAM Workbench&lt;/a&gt;. There’s also some examples of analysing and visualising the tags. As an extra bonus, there’s a more compact &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-lists/blob/master/trove_tag_counts_20210710.csv&#34;&gt;50mb CSV dataset&lt;/a&gt; which lists each unique tag and the number of times it has been used.&lt;/p&gt;
&lt;p&gt;Of course, it’s worth remembering that this sort of dataset is out of date before the harvest is even finished. More tags are being added all the time! But hopefully this data will help us better understand the way people work to organise and enrich complex resources like Trove. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Integrating GLAM Workbench news and discussion</title>
      <link>https://updates.timsherratt.org/2021/07/01/integrating-glam-workbench.html</link>
      <pubDate>Thu, 01 Jul 2021 16:18:41 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/07/01/integrating-glam-workbench.html</guid>
      <description>&lt;p&gt;I’ve spent a lot of time this year working on ways of improving the GLAM Workbench’s documentation and its integration with other services. Last year I created &lt;a href=&#34;https://ozglam.chat/&#34;&gt;OzGLAM Help&lt;/a&gt; to provide a space where users of GLAM collections could ask questions and share discoveries – including a dedicated &lt;a href=&#34;https://ozglam.chat/c/glam-workbench/8&#34;&gt;GLAM Workbench channel&lt;/a&gt;. Earlier this year, I tweaked my Micro.blog powered updates to include a dedicated &lt;a href=&#34;https://updates.timsherratt.org/categories/glamworkbench&#34;&gt;GLAM Workbench news feed&lt;/a&gt;. Now I’ve brought the two together! What does this mean?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Any GLAM Workbench news that I post to my updates feed is now automatically added to OzGLAM Help&lt;/li&gt;
&lt;li&gt;Links are automatically added to items in the news feed that let you add comments or questions in OzGLAM Help&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So now there’s two-way communication between the services, providing more ways for people to discover and discuss how the GLAM Workbench can help them.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>GLAM Workbench now on YouTube!</title>
      <link>https://updates.timsherratt.org/2021/07/01/glam-workbench-now.html</link>
      <pubDate>Thu, 01 Jul 2021 15:51:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/07/01/glam-workbench-now.html</guid>
      <description>&lt;p&gt;I’ve started creating short videos to introduce or explain various components of the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;. The first video shows how you can visualise searches in Trove’s digitised newspapers using the latest version of QueryPic. It’s a useful introduction to the way access to collection data enables us to ask different types of questions of historical sources.&lt;/p&gt;
&lt;iframe width=&#34;560&#34; height=&#34;315&#34; src=&#34;https://www.youtube.com/embed/vdyKNowv9gw&#34; title=&#34;YouTube video player&#34; frameborder=&#34;0&#34; allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&#34; allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;As with all GLAM Workbench resources, the video is openly-licensed – so feel free to drop it into your own course materials or workshops. It could, for example, provide an interesting little digital methods task in an Australian history unit.&lt;/p&gt;
&lt;p&gt;I’ll be creating a second QueryPic video shortly, demonstrating how you can work with complex queries and differing timescales. Let me know if you find it useful, or if you have any ideas for future topics. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>GLAM Workbench office hours</title>
      <link>https://updates.timsherratt.org/2021/06/28/glam-workbench-office.html</link>
      <pubDate>Mon, 28 Jun 2021 15:04:56 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/06/28/glam-workbench-office.html</guid>
      <description>&lt;p&gt;To help you make use of the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;, I’ve set up an ‘office hours’ time slot every Friday when people can book in for 30 minute chats via Zoom. Want to talk about how you might use the GLAM Workbench in your latest research project? Are you having trouble getting started with GLAM data? Or perhaps you have some ideas for future notebooks you’d like to share? Just click on the ‘Book a chat’ link in the GLAM Workbench, or head straight to the &lt;a href=&#34;https://calendly.com/timsherratt/30minchat&#34;&gt;scheduling page&lt;/a&gt; to set up a time!&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/94822f8fd4.png&#34; alt=&#34;Book a chat!&#34; title=&#34;Screenshot of booking pop up in the GLAM Workbench&#34;&gt;&lt;/p&gt;
&lt;p&gt;This is yet another experiment to see how I can support the use of GLAM data and the development of digital skills with the GLAM Workbench. Let me know if you think it’s worthwhile. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>QueryPic: The Next Generation</title>
      <link>https://updates.timsherratt.org/2021/06/21/querypic-the-next.html</link>
      <pubDate>Mon, 21 Jun 2021 11:50:19 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/06/21/querypic-the-next.html</guid>
      <description>&lt;p&gt;QueryPic is a tool to visualise searches in Trove’s digitised newspapers. I created the first version &lt;a href=&#34;http://discontents.com.au/mining-the-treasures-of-trove-part-2/&#34;&gt;way back in 2011&lt;/a&gt;, and since then it’s taken a number of different forms. The latest version introduces some new features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Automatic query creation&lt;/strong&gt; – construct your search in the Trove web interface, then just copy and paste the url into QueryPic. This means you can take advantage of Trove’s advanced search and facets to build complex queries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multiple time scales&lt;/strong&gt; – previous versions only aggregated search results by year, but now you can also aggregate by month, or by day. QueryPic will automatically choose a time unit based on the date range of your query, but if you’re not happy with the result you can change it!&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Links back to Trove&lt;/strong&gt; – click on any of the points on the chart to search Trove within that time period. This enables you to zoom in and out of your results, from the high-level visualisation, to individual articles.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/b02780ed6c.png&#34; alt=&#34;Screenshot of QueryPic chart&#34;&gt;&lt;/p&gt;
&lt;p&gt;This version of QueryPic is built within a Jupyter notebook, and designed to run using Voila (which hides all the code and makes the notebook look like a web app). See the &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#querypic&#34;&gt;Trove Newspapers section&lt;/a&gt; of the GLAM Workbench for more information. If you’d like to give it a try, just click the button below to run it live using Binder.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://mybinder.org/v2/gh/GLAM-Workbench/trove-newspapers/master?urlpath=voila/render/querypic.ipynb&#34;&gt;&lt;img src=&#34;https://static.mybinder.org/badge_logo.svg&#34; alt=&#34;Binder badge&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hope you find it useful! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Everyone gets a Lab!</title>
      <link>https://updates.timsherratt.org/2021/06/21/everyone-gets-a.html</link>
      <pubDate>Mon, 21 Jun 2021 11:00:30 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/06/21/everyone-gets-a.html</guid>
      <description>&lt;p&gt;I recently took part in a panel at the &lt;a href=&#34;https://netpreserve.org/ga2021/&#34;&gt;IIPC Web Archiving Conference&lt;/a&gt; discussing ‘Research use of web archives: a Labs approach’. My fellow panellists described some amazing stuff going on in European cultural heritage organisations to support researchers who want to make use of web archives. My ‘lab’ doesn’t have a physical presence, or an institutional home, but it does provide a starting point for researchers, and with the latest Reclaim Cloud and Docker integrations, everyone can have their own web archives lab! Here’s my 8 minute video. The slides are &lt;a href=&#34;https://slides.com/wragge/wac-labs-panel&#34;&gt;available here&lt;/a&gt;.&lt;/p&gt;
&lt;iframe src=&#34;https://player.vimeo.com/video/563996783&#34; width=&#34;640&#34; height=&#34;360&#34; frameborder=&#34;0&#34; allow=&#34;autoplay; fullscreen; picture-in-picture&#34; allowfullscreen&gt;&lt;/iframe&gt;
</description>
    </item>
    
    <item>
      <title>Minor change to Reclaim Cloud config</title>
      <link>https://updates.timsherratt.org/2021/06/14/minor-change-to.html</link>
      <pubDate>Mon, 14 Jun 2021 15:44:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/06/14/minor-change-to.html</guid>
      <description>&lt;p&gt;When the 1-click installer for Reclaim Cloud works its magic and turns GLAM Workbench repositories into your own, personal digital labs, it creates a new &lt;code&gt;work&lt;/code&gt; directory mounted inside of your main Jupyter directory. This new directory is independent of the Docker image used to run Jupyter, so it’s a handy place to copy things if you ever want to update the Docker image. However, I just realised that there was a permissions problem with the &lt;code&gt;work&lt;/code&gt; directory which meant you couldn’t write files to it from within Jupyter.&lt;/p&gt;
&lt;p&gt;To fix the problem, I’ve added an extra line to the &lt;code&gt;reclaim-manifest.jps&lt;/code&gt; config file to make the Jupyter user the owner of the &lt;code&gt;work&lt;/code&gt; directory:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;	- cmd[cp]: chown -R jovyan:jovyan /home/jovyan/work
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This takes care of any new installations. If you have an existing installation, you can either just create a completely new environment using the updated config, or you can manually change the permissions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hover over the name of your environment in the control panel to display the option buttons.&lt;/li&gt;
&lt;li&gt;Click on the Settings button. A new box will open at the bottom of the control panel with all the settings options.&lt;/li&gt;
&lt;li&gt;Click on &amp;lsquo;SSH Access&amp;rsquo; in the left hand menu of the settings box.&lt;/li&gt;
&lt;li&gt;Click on the &amp;lsquo;SSH Connection&amp;rsquo; tab.&lt;/li&gt;
&lt;li&gt;Under &amp;lsquo;Web SSH&amp;rsquo; click on the Connect button and select the default node.&lt;/li&gt;
&lt;li&gt;A terminal session will open. At the command line enter the following:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;	chown -R jovyan:jovyan /home/jovyan/work
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Done! See the &lt;a href=&#34;https://glam-workbench.net/using-reclaim-cloud/&#34;&gt;Using Reclaim Cloud&lt;/a&gt; section of the GLAM Workbench for more information.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Trove Query Parser</title>
      <link>https://updates.timsherratt.org/2021/06/14/trove-query-parser.html</link>
      <pubDate>Mon, 14 Jun 2021 13:46:01 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/06/14/trove-query-parser.html</guid>
      <description>&lt;p&gt;Here’s a &lt;a href=&#34;https://github.com/wragge/trove_query_parser/&#34;&gt;new little Python package&lt;/a&gt; that you might find useful. It simply takes a search url from Trove’s Newspapers &amp;amp; Gazettes category and converts it into a set of parameters that you can use to request data from the Trove API. While some parameters are used both in the web interface and the API, there are a lot of variations – this package means you don’t have to keep track of all the differences!&lt;/p&gt;
&lt;p&gt;It’s very simple to use.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/5b7e85ea88.png&#34; alt=&#34;How to use the Trove Query Parser.&#34;&gt;&lt;/p&gt;
&lt;p&gt;The code for the parser has been basically lifted from the &lt;a href=&#34;https://pypi.org/project/troveharvester/&#34;&gt;Trove Newspaper Harvester&lt;/a&gt;. I wanted to separate it out so that I could use it at various spots in the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt; and in other projects.&lt;/p&gt;
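&lt;p&gt;As a rough illustration of what the parser does – this is not the package's actual code, and the parameter mappings below are simplified assumptions using only the standard library – the translation from a web search url to API-style parameters looks something like:&lt;/p&gt;

```python
from urllib.parse import urlparse, parse_qs

def web_to_api_params(url):
    """Simplified, illustrative translation of a Trove Newspapers & Gazettes
    search url into API-style parameters. The real trove_query_parser
    package handles many more parameters and edge cases."""
    qs = parse_qs(urlparse(url).query)
    params = {"zone": "newspaper", "encoding": "json"}
    if "keyword" in qs:
        # The web interface's 'keyword' becomes the API's 'q' parameter
        params["q"] = qs["keyword"][0]
    if "l-decade" in qs:
        # Facet names sometimes carry across; only the first value is
        # used here for simplicity
        params["l-decade"] = qs["l-decade"][0]
    return params
```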
&lt;p&gt;This package, the documentation, and the tests were all created using &lt;a href=&#34;https://github.com/fastai/nbdev&#34;&gt;nbdev&lt;/a&gt;, which is really quite a fun way to develop Python packages. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Some GLAM Workbench stats</title>
      <link>https://updates.timsherratt.org/2021/06/13/some-glam-workbench.html</link>
      <pubDate>Sun, 13 Jun 2021 19:01:23 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/06/13/some-glam-workbench.html</guid>
      <description>&lt;p&gt;I deliberately don’t keep any stats about GLAM Workbench visits, because I think they’re pretty meaningless. On the other hand, I’m always interested to see how often GLAM Workbench repositories are launched on &lt;a href=&#34;https://archive.analytics.mybinder.org/&#34;&gt;Binder&lt;/a&gt;. Rather than just random clicks, these numbers represent the number of times users started new computing sessions using the GLAM Workbench. I just compiled these stats for the past year, and I was very pleased to see that the &lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;Web Archives&lt;/a&gt; section has been launched over 1,000 times in the past twelve months! The &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove Newspapers&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper Harvester&lt;/a&gt; repositories are also well used – on average these are both being launched more than once a day.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/3aa3ee045b.png&#34; alt=&#34;Binder launches of GLAM Workbench repositories, 1 June 2020 to 2 June 2021.&#34;&gt;&lt;/p&gt;
&lt;p&gt;The GLAM Workbench is never going to attract massive numbers of users – it’s all about &lt;em&gt;being there&lt;/em&gt; when a researcher needs help to use GLAM collections. One or two launches per day means one or two researchers from somewhere around the world are able to explore new datasets, or ask new questions. I think that’s pretty important.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>More Reclaim Cloud integrations!</title>
      <link>https://updates.timsherratt.org/2021/06/13/more-reclaim-cloud.html</link>
      <pubDate>Sun, 13 Jun 2021 18:28:12 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/06/13/more-reclaim-cloud.html</guid>
      <description>&lt;p&gt;Five of the GLAM Workbench repositories now have automatically built Docker images and 1-click integration with &lt;a href=&#34;https://reclaim.cloud/&#34;&gt;Reclaim Cloud&lt;/a&gt; – &lt;a href=&#34;https://glam-workbench.net/anu-archives/&#34;&gt;ANU Archives&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove Newspapers&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/trove-harvester/&#34;&gt;Trove Newspaper Harvester&lt;/a&gt;, &lt;a href=&#34;https://glam-workbench.net/recordsearch/&#34;&gt;NAA RecordSearch&lt;/a&gt;, &amp;amp; &lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;Web Archives&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/45605b531f.png&#34; alt=&#34;&#34; title=&#34;Screencap showing Reclaim Cloud details&#34;&gt;&lt;/p&gt;
&lt;p&gt;This means you can launch your very own version of these GLAM Workbench repositories in the cloud, where all your downloads and experiments will be saved! Find out more on the &lt;a href=&#34;https://glam-workbench.net/using-reclaim-cloud/&#34;&gt;Using Reclaim Cloud&lt;/a&gt; page.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Get your GLAM datasets here!</title>
      <link>https://updates.timsherratt.org/2021/06/13/get-your-glam.html</link>
      <pubDate>Sun, 13 Jun 2021 18:12:37 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/06/13/get-your-glam.html</guid>
      <description>&lt;p&gt;I’ve updated my harvest of Australian GLAM datasets from state/national government open data portals. There’s now 387 datasets, containing 1049 files (including 684 CSVs). &lt;a href=&#34;https://glam-workbench.net/glam-datasets-from-gov-portals/&#34;&gt;There’s a list&lt;/a&gt; if you want to browse, and &lt;a href=&#34;https://github.com/GLAM-Workbench/ozglam-data/blob/master/glam-datasets-from-gov-portals.csv&#34;&gt;a CSV file&lt;/a&gt; if you want to download all the metadata. For more more information see the &lt;a href=&#34;https://glam-workbench.net/glam-data-portals/&#34;&gt;data portals section&lt;/a&gt; of the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/ef977dfcfc.png&#34; alt=&#34;Number of datatsets by institution&#34; title=&#34;Screencap showing number of datasets per institution&#34;&gt;&lt;/p&gt;
&lt;p&gt;If you’re interested in finding out what’s inside all those 684 CVS files, take the &lt;a href=&#34;https://glam-workbench.net/csv-explorer/&#34;&gt;GLAM CSV Explorer&lt;/a&gt; for a spin! It’s also been given a refresh, with new data and a new interface. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>NAA RecordSearch section of the GLAM Workbench updated!</title>
      <link>https://updates.timsherratt.org/2021/05/24/naa-recordsearch-section.html</link>
      <pubDate>Mon, 24 May 2021 11:50:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/05/24/naa-recordsearch-section.html</guid>
      <description>&lt;p&gt;If you work with the collections of the National Archives of Australia, you might find the &lt;a href=&#34;https://glam-workbench.net/recordsearch/&#34;&gt;RecordSearch section&lt;/a&gt; of the GLAM Workbench helpful. I’ve just updated the repository to add new options for running the notebooks, including 1-click installation on Reclaim Cloud. There’s also a few new notebooks.&lt;/p&gt;
&lt;h2 id=&#34;new-notebooks-and-datasets&#34;&gt;New notebooks and datasets&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/recordsearch/#harvest-details-of-all-series-in-recordsearch&#34;&gt;Harvest details of all series in RecordSearch&lt;/a&gt; – get details of all series registered in RecordSearch, also generates a summary dataset with the total number of items digitised, described and in each access category&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/recordsearch/#exploring-harvested-series-data&#34;&gt;Exploring harvested series data&lt;/a&gt;  – generates some basic statistics from the harvest of series data&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/series_totals_May_2021.csv&#34;&gt;Summary data about all series in RecordSearch&lt;/a&gt;  (15MB CSV) – contains basic descriptive information about all the series currently registered on RecordSearch (May 2021) as well as the total number of items described, digitised, and in each access category&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/4239169ca4.png&#34; width=&#34;594&#34; height=&#34;678&#34; alt=&#34;&#34; /&gt;
&lt;h2 id=&#34;updated&#34;&gt;Updated&lt;/h2&gt;
&lt;p&gt;I’ve started (but not completed) updating all the notebooks in this repository to use my new &lt;a href=&#34;https://wragge.github.io/recordsearch_data_scraper/&#34;&gt;RecordSearch Data Scraper&lt;/a&gt;. The new scraper is simpler and more efficient, and enables me to get rid of a lot of boilerplate code. Updated notebooks include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/recordsearch/#harvest-items-from-a-search-in-recordsearch&#34;&gt;Harvest items from a search in RecordSearch&lt;/a&gt; – save the results of an item search in RecordSearch as a downloadable dataset; you can also save images and PDFs from digitised files (PDF saving is new!)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/recordsearch/#harvest-files-with-the-access-status-of-closed&#34;&gt;Harvest files with the access status of ‘closed’&lt;/a&gt; – find out what we’re not allowed to see by harvesting details of ‘closed’ files&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other updates include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Python packages updated&lt;/li&gt;
&lt;li&gt;Integration with Reclaim Cloud allowing 1-click installation of the whole repository and environment&lt;/li&gt;
&lt;li&gt;Automatic creation of Docker images when the repository is updated&lt;/li&gt;
&lt;li&gt;Updated &lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch#readme&#34;&gt;README&lt;/a&gt; and repository index with list of all notebooks&lt;/li&gt;
&lt;li&gt;Notebooks intended to run as apps now use Voila rather than Appmode for better integration with Jupyter Lab&lt;/li&gt;
&lt;li&gt;&lt;code&gt;requirements-unpinned.txt&lt;/code&gt; added to repository for people who want to develop the notebooks in their own clean environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hope you find these changes useful! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Web archives section of GLAM Workbench updated!</title>
      <link>https://updates.timsherratt.org/2021/05/17/web-archives-section.html</link>
      <pubDate>Mon, 17 May 2021 13:22:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/05/17/web-archives-section.html</guid>
      <description>&lt;p&gt;My program of rolling out new features and integrations across the GLAM Workbench continues. The latest section to be updated is the &lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;Web Archives&lt;/a&gt; section!&lt;/p&gt;
&lt;p&gt;There are no new notebooks with this update, but some important changes under the hood. If you haven’t used it before, the Web Archives section contains 16 notebooks providing documentation, tools, apps, and examples to help you make use of web archives in your research. The notebooks are grouped by the following topics: &lt;strong&gt;Types of data&lt;/strong&gt;, &lt;strong&gt;Harvesting data and creating datasets&lt;/strong&gt;, and &lt;strong&gt;Exploring change over time&lt;/strong&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/51d0d97ca3.png&#34; width=&#34;721&#34; height=&#34;622&#34; alt=&#34;&#34; /&gt;
&lt;p&gt;I’ve updated all the Python packages used in this repository and changed the app-ified notebooks to run using Voila (which is better integrated with Jupyter Lab than Appmode). But most importantly, you can now install the repository into your own persistent environment using &lt;a href=&#34;https://reclaim.cloud/&#34;&gt;Reclaim Cloud&lt;/a&gt; or Docker.&lt;/p&gt;
&lt;p&gt;As &lt;a href=&#34;https://circulatingnow.nlm.nih.gov/2021/05/13/exploring-the-data-of-web-archives-as-part-of-data-science-nlm/&#34;&gt;Christie Moffatt noted recently&lt;/a&gt;, harvesting data from web archives can take a long time, and you might hit the limits of the free Binder service. These new integrations mean you don’t have to worry about your notebooks timing out. Just click on the &lt;strong&gt;Launch on Reclaim Cloud&lt;/strong&gt; button and you can have your own fully-provisioned, persistent environment up and running in minutes!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/bf594af78e.png&#34; width=&#34;553&#34; height=&#34;343&#34; alt=&#34;&#34; /&gt;
&lt;p&gt;This is possible because every change to the Web Archives repository now triggers the build of a new Docker image with all the software that you need pre-installed. You can also run this Docker image on your own computer, or using another cloud service.&lt;/p&gt;
&lt;p&gt;The Web Archives section now &lt;a href=&#34;https://glam-workbench.net/web-archives/#run-these-notebooks&#34;&gt;includes documentation&lt;/a&gt; on running the notebooks using Binder, Reclaim Cloud, or Docker. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Using web archives to find out when newspapers were added to Trove</title>
      <link>https://updates.timsherratt.org/2021/05/12/using-web-archives.html</link>
      <pubDate>Wed, 12 May 2021 12:36:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/05/12/using-web-archives.html</guid>
      <description>&lt;p&gt;There’s no doubt that Trove’s digitised newspapers have had a significant impact on the practice of history in Australia. But analysing that impact is difficult when Trove itself is always changing – more newspapers and articles are being added all the time.&lt;/p&gt;
&lt;p&gt;In an attempt to chart the development of Trove, I’ve created a dataset that shows (approximately) when particular newspaper titles were first added. This gives a rough snapshot of what Trove contained at any point in the last 12 years.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/a272374b48.png&#34; width=&#34;519&#34; height=&#34;242&#34; alt=&#34;&#34; /&gt;
&lt;p&gt;I say &lt;em&gt;approximately&lt;/em&gt; because the only public sources of this information are web archives like the &lt;a href=&#34;https://archive.org/web/&#34;&gt;Internet Archive’s Wayback Machine&lt;/a&gt; and &lt;a href=&#34;https://webarchive.nla.gov.au/collection&#34;&gt;Trove itself&lt;/a&gt;. By downloading &lt;a href=&#34;https://web.archive.org/web/*/http://trove.nla.gov.au/ndp/del/titles&#34;&gt;captures of Trove’s browse page&lt;/a&gt;, I was able to extract a list of newspaper titles available &lt;strong&gt;when that capture was made&lt;/strong&gt;. Depending on the frequency of captures, the titles may have been first made available some time earlier.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#gathering-historical-data-about-the-addition-of-newspaper-titles-to-trove&#34;&gt;method I used&lt;/a&gt; to create the dataset is documented in the Trove Newspapers section of the GLAM Workbench. I used the Internet Archive as my source rather than Trove just because there were more captures available. Most of the code I could conveniently copy from the &lt;a href=&#34;https://glam-workbench.net/web-archives/&#34;&gt;Web Archives&lt;/a&gt; section of the GLAM Workbench, in particular the &lt;a href=&#34;https://glam-workbench.net/web-archives/#find-all-the-archived-versions-of-a-web-page&#34;&gt;Find all the archived versions of a particular web page&lt;/a&gt; notebook.&lt;/p&gt;
&lt;p&gt;The result was actually two datasets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers/blob/master/trove_newspaper_titles_2009_2021.csv&#34;&gt;trove_newspaper_titles_2009_2021.csv&lt;/a&gt;  – complete dataset of captures and titles&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers/blob/master/trove_newspaper_titles_first_appearance_2009_2021.csv&#34;&gt;trove_newspaper_titles_first_appearance_2009_2021.csv&lt;/a&gt;  – filtered dataset, showing only the first appearance of each title / place / date range combination&lt;/li&gt;
&lt;/ul&gt;
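The filtered dataset can be derived from the complete one by keeping only the earliest capture for each title / place / date range combination. Here's a minimal sketch of that filtering step – the tuple layout and sample values are illustrative assumptions, not the exact columns of the CSVs above:

```python
def first_appearances(rows):
    """Keep the earliest capture for each (title, place, date_range) combo.

    rows: iterable of (capture_date, title, place, date_range) tuples,
    with capture_date in a sortable format like 'YYYY-MM-DD'.
    """
    seen = {}
    # Sorting by capture date means the first time we meet a combination
    # is its earliest appearance in the web archive captures.
    for capture_date, title, place, date_range in sorted(rows):
        key = (title, place, date_range)
        if key not in seen:
            seen[key] = capture_date
    return seen

rows = [
    ("2011-02-01", "The Argus", "Melbourne", "1848-1957"),
    ("2009-11-03", "The Argus", "Melbourne", "1848-1957"),
    ("2015-06-10", "The Argus", "Melbourne", "1846-1957"),  # date range changed
]
print(first_appearances(rows))
```

A change to a title's date range shows up as a new combination, which is why the filtered dataset records those changes as well as brand new titles.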
&lt;p&gt;There’s also an &lt;a href=&#34;https://gist.github.com/wragge/7d80507c3e7957e271c572b8f664031a&#34;&gt;alphabetical list of newspaper titles&lt;/a&gt; for easy browsing. The list shows the date of the capture in which the title was first recorded, as well as any changes to its date range. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>GLAM Jupyter Resources</title>
      <link>https://updates.timsherratt.org/2021/05/12/glam-jupyter-resources.html</link>
      <pubDate>Wed, 12 May 2021 11:58:29 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/05/12/glam-jupyter-resources.html</guid>
      <description>&lt;p&gt;To make it easier for people to suggest additions, I’ve created a &lt;a href=&#34;https://github.com/GLAM-Workbench/GLAM-jupyter-resources&#34;&gt;GitHub repository&lt;/a&gt; for my list of GLAM Jupyter examples and resources. Contributions are welcome!&lt;/p&gt;
&lt;p&gt;This list is automatically pulled into the &lt;a href=&#34;https://glam-workbench.net/more-glam-notebooks/&#34;&gt;GLAM Workbench&amp;rsquo;s help documentation&lt;/a&gt;. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Running notebooks – a sign of things to come in the GLAM Workbench</title>
      <link>https://updates.timsherratt.org/2021/05/12/running-notebooks-a.html</link>
      <pubDate>Wed, 12 May 2021 11:51:11 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/05/12/running-notebooks-a.html</guid>
      <description>&lt;p&gt;I recently made some changes in the GLAM Workbench’s Help documentation, adding a new &lt;strong&gt;Running notebooks&lt;/strong&gt; section. This section provides detailed information of running and managing GLAM Workbench repositories using &lt;a href=&#34;https://glam-workbench.net/using-reclaim-cloud/&#34;&gt;Reclaim Cloud&lt;/a&gt; and &lt;a href=&#34;https://glam-workbench.net/using-docker/&#34;&gt;Docker&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I’m still rolling out this functionality across all the repositories, but it’s going to take a while. When I’m finished you’ll be able to create your own persistent environment on Reclaim Cloud from any repository with just the click of a button. See the &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove Newspapers&lt;/a&gt; section to try this out now! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Sponsor my work on GitHub!</title>
      <link>https://updates.timsherratt.org/2021/05/12/sponsor-my-work.html</link>
      <pubDate>Wed, 12 May 2021 11:25:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/05/12/sponsor-my-work.html</guid>
      <description>&lt;p&gt;As I &lt;a href=&#34;https://updates.timsherratt.org/2021/03/26/moving-on-from.html&#34;&gt;foreshadowed some weeks ago&lt;/a&gt;, I’ve shut down my Patreon page. Thanks to everyone who has supported me there over the last few years!&lt;/p&gt;
&lt;p&gt;I’ve now shifted across to GitHub Sponsors, which is focused on supporting open source projects. This seems like a much better fit for the things that I do, which are all &lt;strong&gt;free&lt;/strong&gt; and &lt;strong&gt;open&lt;/strong&gt; by default.&lt;/p&gt;
&lt;p&gt;So if you think things like the &lt;a href=&#34;https://glam-workbench.net/&#34;&gt;GLAM Workbench&lt;/a&gt;, &lt;a href=&#34;https://historichansard.net/&#34;&gt;Historic Hansard&lt;/a&gt;, &lt;a href=&#34;https://ozglam.chat/&#34;&gt;OzGLAM Help&lt;/a&gt;, and &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/&#34;&gt;The Real Face of White Australia&lt;/a&gt; are worth supporting, you can sign up using my &lt;a href=&#34;https://github.com/sponsors/wragge?o=esb&#34;&gt;&lt;strong&gt;GitHub Sponsors page&lt;/strong&gt;&lt;/a&gt;. Sponsorship tiers start at just $1 a month. Financially, your contributions help pay some of my cloud hosting bills and keep everything online. But just as important is the encouragement and motivation I get from knowing that there are people out there who think this work is important and useful.&lt;/p&gt;
&lt;iframe src=&#34;https://github.com/sponsors/wragge/card&#34; title=&#34;Sponsor wragge&#34; height=&#34;225&#34; width=&#34;600&#34; style=&#34;border: 0;&#34;&gt;&lt;/iframe&gt;
&lt;p&gt;To recognise my GitHub sponsors, I&amp;rsquo;ve also created a &lt;a href=&#34;https://glam-workbench.net/supporters/&#34;&gt;new Supporters page&lt;/a&gt; in the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;Thanks!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Updates to the Trove Newspapers section of GLAM Workbench</title>
      <link>https://updates.timsherratt.org/2021/05/12/updates-to-the.html</link>
      <pubDate>Wed, 12 May 2021 10:52:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/05/12/updates-to-the.html</guid>
      <description>&lt;p&gt;I’ve updated, refreshed, and reorganised the &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/&#34;&gt;Trove newspapers section&lt;/a&gt; of the GLAM Workbench.  There’s currently 22 Jupyter notebooks organised under the following headings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#trove-newspapers-in-context&#34;&gt;&lt;strong&gt;Trove newspapers in context&lt;/strong&gt;&lt;/a&gt; – Notebooks in this section look at the Trove newspaper corpus as a whole, to try and understand what’s there, and what’s not.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#visualising-searches&#34;&gt;&lt;strong&gt;Visualising searches&lt;/strong&gt;&lt;/a&gt; – Notebooks in this section demonstrate some ways of visualising searches in Trove newspapers – seeing everything rather than just a list of search results.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#useful-tools&#34;&gt;&lt;strong&gt;Useful tools&lt;/strong&gt;&lt;/a&gt; – Notebooks in this section provide useful tools that extend or enhance the Trove web interface and API.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#tips-and-tricks&#34;&gt;&lt;strong&gt;Tips and tricks&lt;/strong&gt;&lt;/a&gt; – Notebooks in this section provide some useful hints to use with the Trove API.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#get-creative&#34;&gt;&lt;strong&gt;Get creative&lt;/strong&gt;&lt;/a&gt; – Notebooks in this section look at ways you can use data from Trove newspapers in creative ways.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There’s also a number of pre-harvested datasets.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/78c376493d.png&#34; width=&#34;702&#34; height=&#34;550&#34; alt=&#34;&#34; /&gt;
&lt;p&gt;Recently refreshed analyses, visualisations, and datasets include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/trove-newspapers/blob/master/visualise-total-newspaper-articles-by-state-year.ipynb&#34;&gt;Number of Trove newspaper articles by year and state&lt;/a&gt; (notebook)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/trove-newspapers/blob/master/Analysing_OCR_corrections.ipynb&#34;&gt;Analysing OCR correction in Trove’s newspapers&lt;/a&gt; (notebook)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://gist.github.com/wragge/9aa385648cff5f0de0c7d4837896df97&#34;&gt;List of Trove newspapers in languages other than English&lt;/a&gt; (markdown formatted list)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers/blob/master/newspapers_post_54.csv&#34;&gt;Newspapers with content from beyond the 1954 copyright ‘cliff of death’&lt;/a&gt; (CSV file)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As part of the update, notebooks that are intended to run as apps (with all the code hidden) have been updated to use Voila. But perhaps the thing I’m most excited about are the new options for &lt;strong&gt;running&lt;/strong&gt; the notebooks. As well as being able to launch the notebooks on Binder, you can now create your very own, &lt;strong&gt;persistent&lt;/strong&gt; environment on Reclaim Cloud with just a click of a button.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/c39e0d2441.png&#34; width=&#34;532&#34; height=&#34;337&#34; alt=&#34;&#34; /&gt;
&lt;p&gt;There’s also an automatically-built Docker image of this repository, containing everything you need to run the notebooks on your own computer. Check out the new &lt;a href=&#34;https://glam-workbench.net/trove-newspapers/#run-these-notebooks&#34;&gt;Run these notebooks&lt;/a&gt;  section for details. I’m gradually rolling this out across all the repositories in the GLAM Workbench. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Introducing the new, improved RecordSearch Data Scraper!</title>
      <link>https://updates.timsherratt.org/2021/04/27/introducing-the-new.html</link>
      <pubDate>Tue, 27 Apr 2021 10:55:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/04/27/introducing-the-new.html</guid>
      <description>&lt;p&gt;It was way back in 2009 that I created &lt;a href=&#34;http://discontents.com.au/some-archives-hacking/&#34;&gt;my first scraper&lt;/a&gt; for getting machine-readable data out of the National Archives of Australia&amp;rsquo;s online database, RecordSearch. Since then I’ve used versions of this scraper in a number of different projects such as &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/&#34;&gt;The Real Face of White Australia&lt;/a&gt;, &lt;a href=&#34;https://closedaccess.herokuapp.com/&#34;&gt;Closed Access&lt;/a&gt;, and &lt;a href=&#34;https://owebrowse.herokuapp.com/redactions/&#34;&gt;Redacted&lt;/a&gt; (including the &lt;a href=&#34;https://updates.timsherratt.org/2021/04/21/secrets-and-lives.html&#34;&gt;recent update&lt;/a&gt;). The scraper is also embedded in many of the notebooks that I’ve created for the &lt;a href=&#34;https://glam-workbench.net/recordsearch/&#34;&gt;RecordSearch section&lt;/a&gt; of the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;However, the scraper was showing its age. The main problem was that one of its dependencies, Robobrowser, is no longer maintained. This made it difficult to update. I&amp;rsquo;d put off a major rewrite, thinking that RecordSearch itself might be getting a much-needed overhaul, but I could wait no longer. Introducing the brand new &lt;a href=&#34;https://github.com/wragge/recordsearch_data_scraper&#34;&gt;RecordSearch Data Scraper&lt;/a&gt;.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/ab67b3d0c3.png&#34; width=&#34;806&#34; height=&#34;422&#34; alt=&#34;&#34; /&gt;
&lt;p&gt;Just like the old version, the new scraper delivers machine-readable data relating to Items, Series and Agencies – both from individual records, and search results. It also adds a little extra to the basic metadata, for example, if an Item is digitised, the data includes the number of pages in the file. Series records can include the number of digitised files, and the breakdown of files by access category.&lt;/p&gt;
&lt;p&gt;The new scraper adds some additional search parameters for Series and Agencies. It also uses a simple caching system to improve speed and efficiency. RecordSearch makes use of an odd assortment of sessions, redirects, and hidden forms, which makes scraping a challenge. Hopefully I’ve nailed down the idiosyncrasies, but I expect to be catching bugs for a while.&lt;/p&gt;
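The caching idea is simple even if the scraper's internals differ: remember each page you've already fetched so repeat requests (common when paging through search results) don't hit RecordSearch again. A generic sketch of the pattern – this is not the scraper's actual implementation, and the fetch function is injected here purely for illustration:

```python
class CachedFetcher:
    """Cache fetched pages by URL so repeat requests skip the network.

    The fetch callable is injected so the sketch stays self-contained;
    in real use it would wrap an HTTP GET (ideally with polite delays
    between requests to the live site).
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.misses = 0  # how many times we actually had to fetch

    def get(self, url):
        if url not in self._cache:
            self.misses += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]

# Hypothetical URL, stand-in fetch function.
fetcher = CachedFetcher(lambda url: "page for " + url)
fetcher.get("https://example.com/series/B13")
fetcher.get("https://example.com/series/B13")  # served from the cache
print(fetcher.misses)
```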
&lt;p&gt;I created the new scraper in Jupyter using &lt;a href=&#34;https://github.com/fastai/nbdev&#34;&gt;NBDev&lt;/a&gt;. NBDev helps you to keep your code, examples, tests, and documentation all together in Jupyter notebooks. When you&amp;rsquo;re ready, it converts the code from the notebooks  into distributable Python libraries, runs all your tests, and builds a &lt;a href=&#34;https://wragge.github.io/recordsearch_data_scraper/&#34;&gt;documentation site&lt;/a&gt;. It’s very cool.&lt;/p&gt;
&lt;p&gt;Having updated the scraper, I now need to update the notebooks in the GLAM Workbench – more on that soon. The maintenance never ends! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Recently digitised files in the National Archives of Australia</title>
      <link>https://updates.timsherratt.org/2021/03/29/recently-digitised-files.html</link>
      <pubDate>Mon, 29 Mar 2021 10:00:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/03/29/recently-digitised-files.html</guid>
<description>&lt;p&gt;I’m interested in understanding what gets digitised and when by our cultural institutions, but accessible data is scarce. The National Archives of Australia lists ‘newly scanned’ records in RecordSearch, so I thought I’d see if I could convert that list into a machine-readable form for analysis. I’ve had a lot of experience trying to &lt;a href=&#34;https://glam-workbench.github.io/recordsearch/&#34;&gt;get data out of RecordSearch&lt;/a&gt;, but even so it took me a while to figure out how the ‘newly scanned’ page worked. Eventually I was able to extract all the file metadata from the list and save it to a CSV file. The details are in &lt;a href=&#34;https://glam-workbench.github.io/recordsearch/#harvest-recently-digitised-files-from-recordsearch&#34;&gt;this notebook in the GLAM Workbench&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I used the code to create a dataset of &lt;a href=&#34;https://github.com/GLAM-Workbench/recordsearch/blob/master/data/recently-digitised-20210327&#34;&gt;all the files digitised in the past month&lt;/a&gt;. The ‘newly scanned’ list only displays a month&amp;rsquo;s worth of additions, so that&amp;rsquo;s as much as I could get in one hit. In the past month, 24,039 files were digitised. 22,500 of these (about 93%) come from just four series of military records. This is no surprise, as the NAA is currently undertaking a major project to digitise WW2 service records. What is perhaps more interesting is the long tail of series from which a small number of files were digitised. 357 of the 375 series represented in the dataset (about 95%) appear 20 or fewer times. 210 series have had only one file digitised in the last month. I’m assuming that this diversity represents research interests, refracted through the digitisation on demand service. But this really needs more data, and more analysis.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/53a63e0de5.jpg&#34; width=&#34;600&#34; height=&#34;266&#34; alt=&#34;&#34; /&gt;
&lt;p&gt;As I mentioned, only one month&amp;rsquo;s data is available from RecordSearch at any time. To try and capture a longer record of the digitisation process, I’ve set up an automated &lt;a href=&#34;https://simonwillison.net/2020/Oct/9/git-scraping/&#34;&gt;‘git scraper’&lt;/a&gt; that runs every Sunday and captures metadata of all the files digitised in the preceding week. The weekly datasets are saved as CSV files in a &lt;a href=&#34;https://github.com/wragge/naa-recently-digitised&#34;&gt;public GitHub repository&lt;/a&gt;. Over time, this should become a useful dataset for exploring long-term patterns in digitisation. #dhhacks&lt;/p&gt;
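The git-scraping pattern itself is simple: a scheduled job harvests the current list, writes it to a dated CSV, and commits the result, so the repository's history becomes the time series. A minimal sketch of the save step – the column names, filename pattern, and sample metadata are illustrative assumptions, not necessarily what the actual repository uses:

```python
import csv
from datetime import date
from pathlib import Path

def save_weekly_snapshot(rows, data_dir, snapshot_date=None):
    """Write this week's harvested file metadata to a dated CSV.

    rows: list of dicts of harvested item metadata. Committing the
    file afterwards (e.g. from a scheduled CI job running git commit)
    preserves each week's snapshot in the repository history.
    """
    snapshot_date = snapshot_date or date.today()
    path = Path(data_dir) / f"digitised-{snapshot_date.isoformat()}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path

# Usage with hypothetical sample metadata.
sample = [{"item_id": "123456", "series": "B883", "title": "Service record"}]
out = save_weekly_snapshot(sample, ".", snapshot_date=date(2021, 3, 28))
print(out.name)
```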
</description>
    </item>
    
    <item>
      <title>Moving on from Patreon...</title>
      <link>https://updates.timsherratt.org/2021/03/26/moving-on-from.html</link>
      <pubDate>Fri, 26 Mar 2021 13:37:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/03/26/moving-on-from.html</guid>
      <description>&lt;p&gt;Over the last few years, I&amp;rsquo;ve been very grateful for the support of my Patreon subscribers. Financially, their contributions have helped me cover a substantial proportion of the cloud hosting costs associated with projects like &lt;a href=&#34;https://historichansard.net/&#34;&gt;Historic Hansard&lt;/a&gt; and &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/&#34;&gt;The Real Face of White Australia&lt;/a&gt;. But, more importantly, just knowing that they thought my work was of value has helped keep me going, and inspired me to develop a range of new resources.&lt;/p&gt;
&lt;p&gt;However, while I&amp;rsquo;ve been grateful for the platform provided by Patreon, I&amp;rsquo;ve increasingly felt that it&amp;rsquo;s not a good fit for the sort of work I do. Patreon is geared towards providing special content to supporters, but, as you know, all my work is open. And that&amp;rsquo;s really important to me.&lt;/p&gt;
&lt;p&gt;Recently GitHub opened up its own sponsorship program for the development of open source software. This program seems to align more closely with what I do. I already share and manage my code through GitHub, so integrating sponsorship seems to make a lot of sense. It&amp;rsquo;s worth noting too, that, unlike Patreon, GitHub charges no fees and takes no cut of your contributions. As a result I&amp;rsquo;ve decided to close my Patreon account by the end of April, and create a GitHub sponsors page.&lt;/p&gt;
&lt;h2 id=&#34;what-does-this-mean-for-you&#34;&gt;What does this mean for you?&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;If you&amp;rsquo;re a Patreon subscriber and you&amp;rsquo;d like to keep supporting me&lt;/strong&gt;, you should cancel your Patreon contribution, then head over to my brand new &lt;a href=&#34;https://github.com/sponsors/wragge&#34;&gt;&lt;strong&gt;GitHub sponsors page&lt;/strong&gt;&lt;/a&gt; and sign up! Thanks for your continued support!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If you&amp;rsquo;d prefer to let your contributions lapse&lt;/strong&gt;, just do nothing. Your payments will stop when I close the account at the end of April. I understand that circumstances change – thank you so much for your support over the years, and I hope you will continue to make use of the things I create.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If you make use of any of my tools or resources and would like to support their continued development&lt;/strong&gt;, please think about &lt;a href=&#34;https://github.com/sponsors/wragge&#34;&gt;becoming a sponsor&lt;/a&gt;. For a sample of the sorts of things I&amp;rsquo;ve been working on lately, see my &lt;a href=&#34;https://updates.timsherratt.org/&#34;&gt;updates feed&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;the-future&#34;&gt;The future!&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m very excited about the possibilities ahead. The &lt;a href=&#34;https://glam-workbench.github.io/&#34;&gt;GLAM Workbench&lt;/a&gt; has received a lot of attention around the world (including a &lt;a href=&#34;https://glam-workbench.github.io/awards/#british-library-lab-awards-2020&#34;&gt;Research Award&lt;/a&gt; from the British Library Labs), and I&amp;rsquo;m planning some &lt;a href=&#34;https://updates.timsherratt.org/2021/03/25/reclaim-cloud-integration.html&#34;&gt;major developments&lt;/a&gt; over coming months. And, of course, I won&amp;rsquo;t forget all my other resources – I spent a lot of time in 2020 migrating databases and platforms to keep everything chugging along.&lt;/p&gt;
&lt;p&gt;On my &lt;a href=&#34;https://github.com/sponsors/wragge&#34;&gt;GitHub sponsors page&lt;/a&gt;, I&amp;rsquo;ve set an initial target of 50 sponsors. That might be ambitious, but as I said above, it&amp;rsquo;s not just about money. Being able to point to a group of people who use and value this work will help me argue for new ways of enabling digital research in the humanities. So please help me spread the word – let&amp;rsquo;s make things together!&lt;/p&gt;
&lt;iframe src=&#34;https://github.com/sponsors/wragge/card&#34; title=&#34;Sponsor wragge&#34; height=&#34;225&#34; width=&#34;600&#34; style=&#34;border: 0;&#34;&gt;&lt;/iframe&gt;
</description>
    </item>
    
    <item>
      <title>What can you do with the GLAM Workbench?</title>
      <link>https://updates.timsherratt.org/2021/03/25/what-can-you.html</link>
      <pubDate>Thu, 25 Mar 2021 11:43:47 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/03/25/what-can-you.html</guid>
      <description>&lt;p&gt;You might have noticed some changes to the &lt;a href=&#34;https://glam-workbench.github.io/&#34;&gt;GLAM Workbench&lt;/a&gt; home page recently. One of the difficulties has always been trying to explain what the GLAM Workbench actually is, so I thought it might be useful to put more examples up front. The home page now lists about 25 notebooks under the headings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.github.io/#finding-glam-data&#34;&gt;Finding GLAM data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.github.io/#asking-different-questions&#34;&gt;Asking different questions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.github.io/#hacking-heritage&#34;&gt;Hacking heritage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://glam-workbench.github.io/#bringing-documentation-alive&#34;&gt;Bringing documentation alive&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hopefully they give a decent representation of the sorts of things you can do using the GLAM Workbench. I’ve also included a little rotating slideshow built using &lt;a href=&#34;https://slides.com&#34;&gt;Slides.com&lt;/a&gt;.&lt;/p&gt;
&lt;iframe src=&#34;https://slides.com/wragge/gw-highlights/embed?byline=hidden&amp;share=hidden&#34; width=&#34;576&#34; height=&#34;420&#34; scrolling=&#34;no&#34; frameborder=&#34;0&#34; webkitallowfullscreen mozallowfullscreen allowfullscreen&gt;&lt;/iframe&gt;
&lt;p&gt;Other recent additions include a new &lt;a href=&#34;https://glam-workbench.github.io/awards/&#34;&gt;Grants and Awards&lt;/a&gt; page. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Reclaim Cloud integration coming soon to the GLAM Workbench</title>
      <link>https://updates.timsherratt.org/2021/03/25/reclaim-cloud-integration.html</link>
      <pubDate>Thu, 25 Mar 2021 11:18:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/03/25/reclaim-cloud-integration.html</guid>
      <description>&lt;p&gt;I’ve been doing a bit of work behind the scenes lately to prepare for a major update to the &lt;a href=&#34;https://glam-workbench.github.io/&#34;&gt;GLAM Workbench&lt;/a&gt;. My plan is to provide one click installation of any of the GLAM Workbench repositories on the &lt;a href=&#34;https://reclaim.cloud/&#34;&gt;Reclaim Cloud&lt;/a&gt; platform. This will provide a useful step up from Binder for any researcher who wants to do large-scale or sustained work using the GLAM Workbench. Reclaim Cloud is a paid service, but they do a great job supporting digital scholarship in the humanities, and it’s fairly easy to minimise your costs by shutting down environments when they&amp;rsquo;re not in use.&lt;/p&gt;
&lt;p&gt;I’ve still got a lot of work to do to roll this out across the GLAM Workbench&amp;rsquo;s 40 repositories, but if you&amp;rsquo;d like a preview head to the &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspaper-harvester&#34;&gt;Trove Newspaper and Gazette Harvester repository&lt;/a&gt; on GitHub. Get yourself a Reclaim Cloud account and click on the &lt;strong&gt;Launch on Reclaim Cloud&lt;/strong&gt; button. It&amp;rsquo;s that easy!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/a2c7a079eb.jpg&#34; width=&#34;600&#34; height=&#34;270&#34; alt=&#34;&#34; /&gt;
&lt;p&gt;There&amp;rsquo;s &lt;a href=&#34;https://community.reclaimhosting.com/t/a-one-click-jupyter-install-example-for-the-glam-workbench/3676&#34;&gt;some technical notes&lt;/a&gt; in the Reclaim Hosting forum, and &lt;a href=&#34;https://bavatuesdays.com/reclaim-clouds-got-glam/&#34;&gt;a post&lt;/a&gt; by Reclaim Hosting guru Jim Groom describing his own experience spinning up the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;Watch this space for more news! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Some recent GLAM Workbench presentations</title>
      <link>https://updates.timsherratt.org/2021/03/25/some-recent-glam.html</link>
      <pubDate>Thu, 25 Mar 2021 10:40:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/03/25/some-recent-glam.html</guid>
      <description>&lt;p&gt;I’ve given a couple of talks lately on the &lt;a href=&#34;https://glam-workbench.github.io/&#34;&gt;GLAM Workbench&lt;/a&gt; and some of my other work relating to the construction of online access to GLAM collections. Videos and slides are available for both:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;From collections as data to collections as infrastructure: Building the GLAM Workbench&lt;/strong&gt;, seminar for the Centre for Creative and Cultural Research, University of Canberra, 22 February 2021 – &lt;a href=&#34;https://vimeo.com/528145007&#34;&gt;video&lt;/a&gt; (40 minutes) and &lt;a href=&#34;https://slides.com/wragge/uc-cccr-glamworkbench&#34;&gt;slides&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;iframe src=&#34;https://player.vimeo.com/video/528145007&#34; width=&#34;640&#34; height=&#34;360&#34; frameborder=&#34;0&#34; allow=&#34;autoplay; fullscreen; picture-in-picture&#34; allowfullscreen&gt;&lt;/iframe&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Building the GLAM Workbench&lt;/strong&gt; (and various other projects such as &lt;a href=&#34;https://www.realfaceofwhiteaustralia.net/&#34;&gt;The Real Face of White Australia&lt;/a&gt;, &lt;a href=&#34;http://closedaccess.herokuapp.com/&#34;&gt;Closed Access&lt;/a&gt;, and &lt;a href=&#34;https://owebrowse.herokuapp.com/redactions/&#34;&gt;redacted&lt;/a&gt;), guest lecture for the &lt;a href=&#34;https://wiki.epfl.ch/cultural.data.sculpting&#34;&gt;Cultural Data Sculpting&lt;/a&gt; course, EPFL, Switzerland, 18 March 2021 – &lt;a href=&#34;https://vimeo.com/525872948&#34;&gt;video&lt;/a&gt; (1hr 40mins) and &lt;a href=&#34;https://slides.com/wragge/data-sculpting-2021/&#34;&gt;slides&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I’ve also updated the &lt;a href=&#34;https://glam-workbench.github.io/presentations/&#34;&gt;presentations&lt;/a&gt; page in the GLAM Workbench. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Some GLAM Workbench datasets to explore for Open Data Day</title>
      <link>https://updates.timsherratt.org/2021/03/08/some-glam-workbench.html</link>
      <pubDate>Mon, 08 Mar 2021 14:54:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/03/08/some-glam-workbench.html</guid>
      <description>&lt;p&gt;It was Open Data Day on Saturday 6 March – here are some of the ready-to-go datasets you can find in the &lt;a href=&#34;https://glam-workbench.github.io/&#34;&gt;GLAM Workbench&lt;/a&gt; – there’s something for historians, humanities researchers, teachers &amp;amp; more!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;First here’s a &lt;a href=&#34;https://glam-workbench.github.io/glam-data-list/&#34;&gt;list of Australian GLAM (that’s galleries, libraries, archives &amp;amp; museums) data sources&lt;/a&gt;. It includes APIs, portals, and downloadable datasets. Suggested additions welcome!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There’s also a &lt;a href=&#34;https://glam-workbench.github.io/glam-datasets-from-gov-portals/&#34;&gt;list of Australian GLAM datasets that are available through government open data portals&lt;/a&gt;. There are hundreds of them, but they’re not always easy to find. Convicts, immigration, hospitals, WWI – includes lots of useful biographical data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you’re not sure where to start with a list of 600 CSV files, have a look at the &lt;a href=&#34;https://glam-workbench.github.io/csv-explorer/&#34;&gt;GLAM CSV Explorer&lt;/a&gt;! Select a file and this Jupyter-powered app will build a series of visualisations based on the contents of each column.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;While they’re not yet in an open data portal, NSW State Archives has a rich collection of indexes transcribed by volunteers. I’ve scraped 64 indexes, with over 1.4 million rows of data and &lt;a href=&#34;https://glam-workbench.github.io/nsw-state-archives/#nsw-state-archives-online-indexes&#34;&gt;put them in a repository for easy download&lt;/a&gt;. There’s even a &lt;a href=&#34;https://glam-workbench.github.io/nsw-state-archives/#nsw-state-archives-index-explorer&#34;&gt;version of the CSV Explorer&lt;/a&gt;, just for the NSW State Archives indexes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Here’s a &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/#csv-formatted-list-of-australian-womens-weekly-issues-1933-to-1982&#34;&gt;CSV file&lt;/a&gt; containing details of every issue of the Australian Women’s Weekly in Trove.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/#australian-womens-weekly-front-covers-1933-to-1982&#34;&gt;collection of front covers from the Australian Women’s Weekly from 1933 to 1982&lt;/a&gt;! That’s 2,566 images you can download from Cloudstor or browse in a series of convenient PDFs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/e829a7a078.jpg&#34; width=&#34;600&#34; height=&#34;376&#34; alt=&#34;&#34; /&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Here’s a &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/#trove-newspapers-with-non-english-language-content&#34;&gt;list of non-English language newspapers&lt;/a&gt; in Trove.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;And another &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers/blob/master/newspapers_post_54.csv&#34;&gt;list of newspapers in Trove&lt;/a&gt; with articles available from beyond the 1954 copyright cliff of death.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;While we’re on newspapers, here’s a &lt;a href=&#34;https://docs.google.com/spreadsheets/d/1rURriHBSf3MocI8wsdl1114t0YeyU0BVSXWeg232MZs/edit?usp=sharing&#34;&gt;spreadsheet&lt;/a&gt; that identifies places of publication or circulation of Trove newspapers, and provides geocoordinates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What about some text? Here’s &lt;a href=&#34;https://glam-workbench.github.io/trove-books/#ocrd-text-from-trove-books-and-ephemera&#34;&gt;24,620 files of OCRd text from digitised books and ephemera&lt;/a&gt; in Trove. There’s also a &lt;a href=&#34;https://glam-workbench.github.io/trove-books/#csv-formatted-list-of-books-with-ocrd-text&#34;&gt;CSV-formatted list&lt;/a&gt; with the basic details of each book.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;More text! Here’s &lt;a href=&#34;https://glam-workbench.github.io/trove-journals/#ocrd-text-from-trove-digitised-journals&#34;&gt;OCRd text from 26,234 issues of 397 digitised journals&lt;/a&gt; in Trove.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Something different – a &lt;a href=&#34;https://glam-workbench.github.io/trove-journals/#politicians-talking-about-immigrants-and-refugees&#34;&gt;collection of 12,619 press releases &amp;amp; speeches by Australian politicians&lt;/a&gt; that include any of the terms ‘immigrant’, ‘asylum seeker’, ‘boat people’, ‘illegal arrivals’, or ‘boat arrivals’. From the Parliamentary Library via Trove.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Some more images – a &lt;a href=&#34;https://glam-workbench.github.io/trove-journals/#editorial-cartoons-from-the-bulletin-1886-to-1952&#34;&gt;collection of 3,471 full-page editorial cartoons from The Bulletin&lt;/a&gt;, 1886 to 1952 (with a warning for racist content). Available both as individual images and compiled into PDFs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From the ABC via Trove, there’s &lt;a href=&#34;https://glam-workbench.github.io/trove-music/#abc-radio-national-programs&#34;&gt;400,000 records from Radio National programs broadcast since the late 1990s&lt;/a&gt;. That includes every segment broadcast on AM, PM, RN Breakfast etc.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This might be handy – from some work I’m doing with ANU Archives, here’s a &lt;a href=&#34;https://github.com/GLAM-Workbench/anu-archives/blob/master/nsw_holidays_1900_1950.csv&#34;&gt;CSV file containing details of holidays in NSW from 1901 to 1950&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Department of Prime Minister and Cabinet provides XML versions of more than 20,000 speeches &amp;amp; interviews from recent PMs for download. I’ve &lt;a href=&#34;https://glam-workbench.github.io/pm-transcripts/&#34;&gt;saved them to a repository and compiled some indexes&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;And finally – Commonwealth Hansard from the Parliamentary Library – lots of well-structured XML files! I’ve &lt;a href=&#34;https://glam-workbench.github.io/hansard/&#34;&gt;created a repo&lt;/a&gt; with one file for each sitting day from 1901 to 1980 &amp;amp; 1998 to 2005 (hopefully the gap will be filled soon). There’s also a &lt;a href=&#34;https://glam-workbench.github.io/hansard/#list-of-sitting-days-1901-to-2005&#34;&gt;CSV index to sitting days&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And if that’s not enough data, the GLAM Workbench provides tools to help you create your own datasets from Trove, the National Archives of Australia, the National Museum of Australia, Archives NZ, DigitalNZ, &amp;amp; more! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2021/02/11/the-naa-recently.html</link>
      <pubDate>Thu, 11 Feb 2021 10:24:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/02/11/the-naa-recently.html</guid>
      <description>&lt;p&gt;The NAA recently changed field labels in RecordSearch, so that ‘Barcode’ is now ‘Item ID’. This required an update to my &lt;a href=&#34;https://github.com/wragge/recordsearch_tools&#34;&gt;&lt;code&gt;recordsearch_tools&lt;/code&gt; screen scraper&lt;/a&gt;. I also had to make a few changes in the &lt;a href=&#34;https://glam-workbench.github.io/recordsearch/&#34;&gt;RecordSearch section&lt;/a&gt; of the GLAM Workbench. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>New! DigitalNZ API Query Builder added to GLAM Workbench</title>
      <link>https://updates.timsherratt.org/2021/02/03/new-digitalnz-api.html</link>
      <pubDate>Wed, 03 Feb 2021 10:08:17 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/02/03/new-digitalnz-api.html</guid>
      <description>&lt;p&gt;I’ve added an API Query Builder to the &lt;a href=&#34;https://glam-workbench.github.io/digitalnz/&#34;&gt;DigitalNZ section of the GLAM Workbench&lt;/a&gt;. You can use it to learn about the different parameters available from the search API, and experiment with different queries. Just get your API key from DigitalNZ, then try entering keywords and selecting options. Once you understand how the API works, you can start thinking about how you can make use of it in your own projects.&lt;/p&gt;
&lt;p&gt;👉🏻 &lt;a href=&#34;https://mybinder.org/v2/gh/GLAM-Workbench/digitalnz/master?urlpath=voila%2Frender%2Fbuild_api_query.ipynb&#34;&gt;Try it out live on Binder!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Under the hood the API Query Builder is a Jupyter notebook (of course), but it uses &lt;a href=&#34;https://ipyvuetify.readthedocs.io/en/latest/index.html&#34;&gt;ipyvuetify&lt;/a&gt; to create good-looking, responsive form widgets. It’s intended to be run using &lt;a href=&#34;https://voila.readthedocs.io/en/stable/index.html&#34;&gt;Voilà&lt;/a&gt;, which turns notebooks into interactive apps and dashboards. You can now run any Jupyter notebook &lt;a href=&#34;https://voila.readthedocs.io/en/stable/deploy.html#deployment-on-binder&#34;&gt;using Voilà on Binder&lt;/a&gt;, just by changing the URL.&lt;/p&gt;
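&lt;p&gt;As a rough sketch (not part of the GLAM Workbench itself), that URL rewriting is simple enough to script. This assumes Binder&amp;rsquo;s standard &lt;code&gt;/v2/gh/&lt;/code&gt; launch pattern and the &lt;code&gt;urlpath&lt;/code&gt; parameter shown in the link above; the function name is just for illustration:&lt;/p&gt;

```python
from urllib.parse import quote

def binder_voila_url(org, repo, notebook, branch="master"):
    """Build a Binder link that opens a notebook as a Voila app.

    The urlpath parameter routes the request through Voila's
    renderer instead of the normal Jupyter interface.
    """
    # Percent-encode the slashes so the whole path travels as one value
    urlpath = quote(f"voila/render/{notebook}", safe="")
    return f"https://mybinder.org/v2/gh/{org}/{repo}/{branch}?urlpath={urlpath}"

print(binder_voila_url("GLAM-Workbench", "digitalnz", "build_api_query.ipynb"))
# https://mybinder.org/v2/gh/GLAM-Workbench/digitalnz/master?urlpath=voila%2Frender%2Fbuild_api_query.ipynb
```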
&lt;p&gt;If this app seems useful (let me know!) I might put a version on Heroku so the start up time is reduced. I’m also thinking of using this sort of pattern to create apps for other APIs in the GLAM Workbench. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/48f0490cce.jpg&#34; width=&#34;600&#34; height=&#34;445&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title>OpenGLAM fireworks! Finding open collections in DigitalNZ</title>
      <link>https://updates.timsherratt.org/2021/01/28/openglam-fireworks-finding.html</link>
      <pubDate>Thu, 28 Jan 2021 11:29:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/01/28/openglam-fireworks-finding.html</guid>
      <description>&lt;p&gt;Lately I’ve been updating and expanding the notebooks in the &lt;a href=&#34;https://glam-workbench.github.io/digitalnz/&#34;&gt;DigitalNZ section&lt;/a&gt; of the GLAM Workbench. In particular, I’ve been looking at the &lt;code&gt;usage&lt;/code&gt; facet to understand how much of the aggregated content is ‘open’. What do I mean by ‘open’? The &lt;a href=&#34;https://opendefinition.org/&#34;&gt;Open Knowledge Foundation definition&lt;/a&gt; states that ‘open data and content can be freely used, modified, and shared by anyone for any purpose’. Obviously things that are in the public domain, such as out-of-copyright resources, are open. But so are resources with an &lt;a href=&#34;https://opendefinition.org/licenses/&#34;&gt;open licence&lt;/a&gt; such as &lt;a href=&#34;https://opendefinition.org/licenses/cc-by&#34;&gt;CC-BY&lt;/a&gt; or &lt;a href=&#34;https://opendefinition.org/licenses/cc-by-sa&#34;&gt;CC-BY-SA&lt;/a&gt;. The Creative Commons ‘Non-commercial’ and ‘No derivatives’ licences are &lt;em&gt;not&lt;/em&gt; open because they put limits on how you can use resources.&lt;/p&gt;
&lt;p&gt;How does this definition map to DigitalNZ? The &lt;code&gt;usage&lt;/code&gt; facet includes &lt;a href=&#34;https://github.com/GLAM-Workbench/digitalnz/blob/master/facets/usage.csv&#34;&gt;five values&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Share&lt;/li&gt;
&lt;li&gt;Modify&lt;/li&gt;
&lt;li&gt;Use commercially&lt;/li&gt;
&lt;li&gt;All rights reserved&lt;/li&gt;
&lt;li&gt;Unknown&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These values have been assigned by DigitalNZ based on the &lt;a href=&#34;https://github.com/GLAM-Workbench/digitalnz/blob/master/facets/rights.csv&#34;&gt;35,000 different rights statements&lt;/a&gt; and &lt;a href=&#34;https://github.com/GLAM-Workbench/digitalnz/blob/master/facets/copyright.csv&#34;&gt;30 different copyright statements&lt;/a&gt; that are included in DigitalNZ metadata records. I find I have to turn the &lt;code&gt;usage&lt;/code&gt; values inside out to really understand them. A resource that only allows you to ‘Share’ excludes the ‘Modify’ and ‘Use commercially’ permissions, and so is roughly equivalent to a CC-BY-ND-NC licence. The only open value, according to the definition above, is ‘Use commercially’, which is like CC-BY. I’m assuming that ‘Use commercially’ has been assigned to resources that are either out of copyright (or have no known copyright restrictions) or are openly licensed.&lt;/p&gt;
&lt;p&gt;It’s also worth noting that the ‘usage’ values are not mutually exclusive. A record with a ‘usage’ value of ‘Use commercially’ will also be assigned ‘Share’ and ‘Modify’ values. This is because ‘Use commercially’ includes the &amp;lsquo;Share&amp;rsquo; and ‘Modify’ permissions. This seems a bit counter-intuitive, but makes sense if you think about doing a search for everything you&amp;rsquo;re allowed to share.&lt;/p&gt;
&lt;p&gt;A rough calculation based on the usage facet indicates that 71.76% of the resources aggregated by DigitalNZ are open. That seems pretty good, though a lot of those are probably out-of-copyright newspaper articles from Papers Past. For a more fine-grained analysis, I decided to look at the ‘usage’ data for each combination of ‘content_partner’ and ‘primary_collection’. How open is each individual collection in DigitalNZ?&lt;/p&gt;
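&lt;p&gt;The calculation itself is straightforward. Here&amp;rsquo;s a minimal sketch with made-up facet counts (the real numbers come from the DigitalNZ search API, or the harvested CSVs linked above), assuming that only ‘Use commercially’ counts as open:&lt;/p&gt;

```python
# Hypothetical usage facet counts -- substitute the real values
# harvested from the DigitalNZ search API.
total_records = 32_000_000
usage_counts = {
    "Share": 25_000_000,
    "Modify": 24_000_000,
    "Use commercially": 23_000_000,
    "All rights reserved": 6_000_000,
    "Unknown": 3_000_000,
}

# The usage values overlap ('Use commercially' implies 'Share' and
# 'Modify'), so the facet counts can't just be summed. Only
# 'Use commercially' meets the Open Definition, so the open share is:
open_share = usage_counts["Use commercially"] / total_records * 100
print(f"{open_share:.2f}% open")
```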
&lt;p&gt;For added excitement, and to stretch my knowledge of what &lt;a href=&#34;https://altair-viz.github.io/&#34;&gt;Altair&lt;/a&gt; can do, I decided to visualise the results as a display of colourful fireworks. The higher the explosion, the more open the collection! I’m pretty pleased with the result.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://glam-workbench.github.io/images/dnz-fireworks.png&#34;&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/533add5b13.jpg&#34; width=&#34;600&#34; height=&#34;174&#34; alt=&#34;&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve saved &lt;a href=&#34;http://timsherratt.org/shed/digitalnz/open_collections_digitalnz.html&#34;&gt;an HTML version of the chart&lt;/a&gt; so you can mouse over the explosions for more details. All the code is &lt;a href=&#34;https://glam-workbench.github.io/digitalnz/#visualising-open-collections-in-digitalnz&#34;&gt;included in this notebook&lt;/a&gt;, along with a &lt;a href=&#34;https://github.com/GLAM-Workbench/digitalnz/blob/master/facets/usage_by_collection_and_partner.csv&#34;&gt;CSV file&lt;/a&gt; containing all the harvested facet data. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>New dataset and notebooks – twenty years of ABC Radio National</title>
      <link>https://updates.timsherratt.org/2021/01/18/new-dataset-twenty.html</link>
      <pubDate>Mon, 18 Jan 2021 10:27:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2021/01/18/new-dataset-twenty.html</guid>
      <description>&lt;p&gt;There’s a &lt;a href=&#34;https://glam-workbench.github.io/trove-music/&#34;&gt;new GLAM Workbench section&lt;/a&gt; for working with data from Trove’s Music &amp;amp; Sound zone!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Inside you&amp;rsquo;ll find out how to harvest all the metadata from ABC Radio National program records – that&amp;rsquo;s 400,000+ records, from 160 Radio National programs, over more than 20 years.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It’s metadata only, so not full transcripts or audio, though there are links back to the ABC site where you might find transcripts. Most records should at least have a title, a date, the name of the program it was broadcast on, a list of contributors, and perhaps a brief abstract/summary. It&amp;rsquo;s also worth noting that many of these records, particularly those from the main current affairs programs, represent individual stories or segments – so they provide a detailed record of the major news stories for the last couple of decades!&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/43f5a3e067.jpg&#34; width=&#34;600&#34; height=&#34;365&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/c1f9a22191.jpg&#34; width=&#34;600&#34; height=&#34;243&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.github.io/trove-music/#harvest-abc-radio-national-records-from-trove&#34;&gt;harvesting notebook&lt;/a&gt; shows you how to get the data from the Trove API. There are a number of duplicate records, and some inconsistencies in the way the data is formatted, so the harvesting code tries to clean things up a bit. You can of course adjust this to meet your own needs.&lt;/p&gt;
&lt;p&gt;If you don&amp;rsquo;t want to do the harvesting yourself, there are &lt;a href=&#34;https://glam-workbench.github.io/trove-music/#abc-radio-national-programs&#34;&gt;pre-harvested datasets&lt;/a&gt; that you can download immediately from Cloudstor and start exploring. The complete harvest of all 400,000+ records is available both in JSONL (newline-separated JSON) and CSV formats. There&amp;rsquo;s also a series of separate datasets for the most frequently occurring programs: RN Breakfast, RN Drive, AM, PM, The World Today, Late Night Live, Life Matters, and the Science Show.&lt;/p&gt;
&lt;p&gt;There’s also a &lt;a href=&#34;https://glam-workbench.github.io/trove-music/#exploring-abc-radio-national-metadata&#34;&gt;notebook&lt;/a&gt; that demonstrates a few possible ways you might start to play with the data – looking at the range of programs, the distribution of records over time, the people involved in each story, and words in the titles of each segment.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/228b9e1e74.jpg&#34; width=&#34;600&#34; height=&#34;506&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/61b8d46dbe.jpg&#34; width=&#34;600&#34; height=&#34;300&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This is a very rich source of data for examining Australia&amp;rsquo;s political and social history over the last twenty years. Dive in and see what you can find! #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2021/2e05accfd1.jpg&#34; width=&#34;403&#34; height=&#34;600&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title>GLAM Workbench wins British Library Labs Research Award!</title>
      <link>https://updates.timsherratt.org/2020/12/16/glam-workbench-wins.html</link>
      <pubDate>Wed, 16 Dec 2020 11:46:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/12/16/glam-workbench-wins.html</guid>
      <description>&lt;p&gt;&lt;strong&gt;Asking questions with web archives – introductory notebooks for historians&lt;/strong&gt; has won the British Library Labs Research Award for 2020. &lt;a href=&#34;https://data.bl.uk/bl_labs_awards/index.html&#34;&gt;The awards&lt;/a&gt; recognise &amp;lsquo;exceptional projects that have used the Library’s digital collections and data&amp;rsquo;.&lt;/p&gt;
&lt;p&gt;This project gave me a chance to work with web archives collections and staff from the British Library, the National Library of Australia, and the National Library of New Zealand, and was &lt;a href=&#34;https://netpreserve.org/projects/jupyter-notebooks-for-historians/&#34;&gt;supported&lt;/a&gt; by the International Internet Preservation Consortium&amp;rsquo;s Discretionary Funding Program.&lt;/p&gt;
&lt;p&gt;We developed a range of tools, examples, and documentation to help researchers use and explore the vast historical resources available through web archives. A new &lt;a href=&#34;https://glam-workbench.github.io/web-archives/&#34;&gt;web archives section&lt;/a&gt; was added to the GLAM Workbench, and 16 Jupyter notebooks, combining text, images, and live code, were created.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a &lt;a href=&#34;https://youtu.be/qhaRQ0LxNAo&#34;&gt;30 second summary&lt;/a&gt; of the project!&lt;/p&gt;
&lt;p&gt;The judges noted:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“The panel were impressed with the level of documentation and thought that went into how to work computationally through Jupyter notebooks with web archives which are challenging to work with because of their size. These tools were some of the first of their kind.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;“The Labs Advisory Board wanted to acknowledge and reward the incredible work of Tim Sherratt in particular. Tim you have been a pioneer as a one-person lab over many years and these 16 notebooks are a fine addition to your already extensive suite in your GLAM Workbench. Your work has inspired so many in GLAM, the humanities community, and BL Labs to develop their own notebooks. To our audience, we strongly recommend that you look at the GLAM Workbench if you’re interested in doing computational experiments with many institutions’ data sources.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Thanks to Andy, Olga, Alex, and Ben for your advice and support. And thanks to the British Library Labs for the award! #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The GLAM Workbench as research infrastructure (some basic stats)</title>
      <link>https://updates.timsherratt.org/2020/12/15/the-glam-workbench.html</link>
      <pubDate>Tue, 15 Dec 2020 10:43:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/12/15/the-glam-workbench.html</guid>
      <description>&lt;p&gt;Repositories in the &lt;a href=&#34;https://glam-workbench.github.io/&#34;&gt;GLAM Workbench&lt;/a&gt; have been launched on &lt;a href=&#34;https://mybinder.org/&#34;&gt;Binder&lt;/a&gt; 3,529 times since the start of this year (according to data from the &lt;a href=&#34;https://archive.analytics.mybinder.org&#34;&gt;Binder Events log&lt;/a&gt;). That’s repository launches, not notebooks. Having launched a repository, users might use multiple notebooks. And of course these stats don’t include people using the notebooks in contexts other than Binder – on their own machines, servers, or services like AARNet’s SWAN. Or just viewing the notebooks in GitHub and copying code into their own projects.&lt;/p&gt;
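&lt;p&gt;If you want to poke at the launch data yourself, the events log is just newline-delimited JSON. Here&amp;rsquo;s a rough sketch of counting launches per repository – the field names (&lt;code&gt;spec&lt;/code&gt;, &lt;code&gt;provider&lt;/code&gt;) are assumed from the Binder events archive, and the sample events here are invented:&lt;/p&gt;

```python
import json
from collections import Counter

# Invented sample events in the newline-delimited JSON layout of the
# mybinder.org events archive (one launch per line).
events = """\
{"timestamp": "2020-06-01T01:00:00+00:00", "provider": "GitHub", "spec": "GLAM-Workbench/trove-harvester/master", "status": "success"}
{"timestamp": "2020-06-01T02:00:00+00:00", "provider": "GitHub", "spec": "GLAM-Workbench/web-archives/master", "status": "success"}
{"timestamp": "2020-06-01T03:00:00+00:00", "provider": "GitHub", "spec": "GLAM-Workbench/trove-harvester/master", "status": "success"}
""".splitlines()

# Count launches of GLAM Workbench repositories; spec is 'org/repo/ref'.
launches = Counter()
for line in events:
    event = json.loads(line)
    org, repo, _ref = event["spec"].split("/", 2)
    if org == "GLAM-Workbench":
        launches[repo] += 1

print(launches.most_common())
# [('trove-harvester', 2), ('web-archives', 1)]
```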
&lt;p&gt;I’m suspicious of web stats, but the Binder data indicates that people have actually done more than ‘visit’ – they’ve spun up a Binder session ready to do some exploration.&lt;/p&gt;
&lt;p&gt;Every Jupyter notebook in the GLAM Workbench has a link that opens the notebook in Binder. If you click on the link, Binder reads configuration details from the repository and loads a customised computing environment. All in your browser! That means you can start using the GLAM Workbench without installing any software. Just click on the Binder link and start exploring!&lt;/p&gt;
&lt;p&gt;There are about &lt;a href=&#34;https://github.com/GLAM-Workbench/&#34;&gt;40 different repositories&lt;/a&gt; in the GLAM Workbench, helping you work with data from Trove, DigitalNZ, NAA, SLNSW, NSW Archives, NMA, ArchivesNZ, ANU Archives &amp;amp; more! The image below shows them ranked by number of Binder launches this year.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.github.io/web-archives/&#34;&gt;web archives section&lt;/a&gt; was added this year in collaboration with the IIPC, the UK Web Archive, the Australian Web Archive, and the NZ Web Archive. Its annual number of launches is inflated a bit by the development process. But there have been 426 launches since it went public in June.&lt;/p&gt;
&lt;p&gt;I’m really pleased to see the &lt;a href=&#34;https://glam-workbench.github.io/trove-harvester/&#34;&gt;Trove newspaper harvester&lt;/a&gt; up near the top. At least once a day (on average) someone’s been firing up the repository to grab Trove newspaper articles in bulk.&lt;/p&gt;
&lt;p&gt;Overall, that’s about 11 GLAM Workbench repository launches a day on Binder. It might not seem like much, but that’s 11 research opportunities that didn’t exist before, 11 GLAM collections opened to exploration, 11 researchers building their digital skills…&lt;/p&gt;
&lt;p&gt;As humanities researchers continue to learn about the possibilities of GLAM data and develop their digital skills, the numbers will grow. It’s a start. And a reminder that not all research infrastructure needs to be built in Go8 unis, by large teams, with $millions. We can all contribute by sharing our tools and methods. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/5db095f0fe.jpg&#34; width=&#34;600&#34; height=&#34;592&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/11/27/earlier-this-year.html</link>
      <pubDate>Fri, 27 Nov 2020 16:02:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/11/27/earlier-this-year.html</guid>
      <description>&lt;p&gt;Earlier this year I gave a seminar for the International Internet Preservation Consortium (IIPC) introducing the &lt;a href=&#34;https://glam-workbench.github.io/web-archives/&#34;&gt;web archives section&lt;/a&gt; of the GLAM Workbench. The seminar is now available online: &lt;a href=&#34;https://youtu.be/rVidh_wexoo&#34;&gt;youtu.be/rVidh_wex&amp;hellip;&lt;/a&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/51f7e056dd.jpg&#34; width=&#34;600&#34; height=&#34;397&#34; alt=&#34;&#34; /&gt;
&lt;p&gt;Here are &lt;a href=&#34;https://slides.com/wragge/iipc-jupyter&#34;&gt;the slides&lt;/a&gt; if you want to follow along. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Harvest text from the Australian Women&#39;s Weekly!</title>
      <link>https://updates.timsherratt.org/2020/11/25/harvest-text-from.html</link>
      <pubDate>Wed, 25 Nov 2020 15:52:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/11/25/harvest-text-from.html</guid>
      <description>&lt;p&gt;The Trove Newspaper &amp;amp; Gazette Harvester has been updated to version 0.4.0. The major change is that if the OCRd text for an article isn&amp;rsquo;t available through the API, it will be automatically downloaded via the web interface. What does this mean in practice? Well previously you couldn&amp;rsquo;t harvest OCRd text from the &lt;em&gt;Australian Women&amp;rsquo;s Weekly&lt;/em&gt; because it&amp;rsquo;s not included in API results, but now you can!&lt;/p&gt;
&lt;p&gt;You don&amp;rsquo;t need to do anything differently. If there are AWW articles in your search, and you ask for all the OCRd text using the &lt;code&gt;--text&lt;/code&gt; option, the AWW text files will automagically appear in your harvest.&lt;/p&gt;
&lt;p&gt;Under the hood, I&amp;rsquo;ve started using &lt;a href=&#34;https://pypi.org/project/html2text/&#34;&gt;html2text&lt;/a&gt; to remove tags from the OCRd text. I think this should produce more consistent results. As previously, line breaks are removed by default from the OCRd text files. However, I&amp;rsquo;ve now added a &lt;code&gt;--include_linebreaks&lt;/code&gt; option if you&amp;rsquo;d like to keep them. This generally produces text that is more human-readable, but note that the line breaks produced by OCR aren&amp;rsquo;t always accurate.&lt;/p&gt;
&lt;p&gt;Head to the &lt;a href=&#34;https://glam-workbench.github.io/trove-harvester/&#34;&gt;GLAM Workbench to try it out&lt;/a&gt;, or &lt;a href=&#34;https://pypi.org/project/troveharvester/0.4.0/&#34;&gt;download the code from PyPi&lt;/a&gt;. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Beyond the copyright cliff of death</title>
      <link>https://updates.timsherratt.org/2020/11/13/beyond-the-copyright.html</link>
      <pubDate>Fri, 13 Nov 2020 09:56:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/11/13/beyond-the-copyright.html</guid>
      <description>&lt;p&gt;If you&amp;rsquo;ve done any searching in Trove&amp;rsquo;s digitised newspapers, you&amp;rsquo;ve probably noticed that there aren&amp;rsquo;t many results after 1954. This is basically because of copyright restrictions (though given the complexities of Australia&amp;rsquo;s copyright system, you can&amp;rsquo;t be sure that everything published before 1955 is &lt;em&gt;out&lt;/em&gt; of copyright). We can visualise the impact of this by looking at the number of newspaper articles in Trove by year.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/dcf55d9941.jpg&#34; width=&#34;600&#34; height=&#34;333&#34; alt=&#34;&#34; /&gt;
&lt;p&gt;You can see why I started referring to it as the &lt;strong&gt;copyright cliff of death&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;But you can also see a little trickle of articles continuing post-1954. The number of newspapers from beyond the copyright cliff of death continues to increase as agreements are made with publishers to put them online. I just checked and there are now 83 newspapers that have at least &lt;em&gt;some&lt;/em&gt; post-1954 articles available. Here&amp;rsquo;s the top 10 (by number of articles).&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/95e9d27bd6.jpg&#34; width=&#34;600&#34; height=&#34;223&#34; alt=&#34;&#34; /&gt;
&lt;p&gt;If you&amp;rsquo;d like to browse the full list of post-1954 newspapers, &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers/blob/master/newspapers_post_54.csv&#34;&gt;here&amp;rsquo;s the data&lt;/a&gt; as a CSV (spreadsheet) file.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;d like to see how I generated this list, have a look &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/#beyond-the-copyright-cliff-of-death&#34;&gt;at this notebook&lt;/a&gt; in the Trove Newspapers section of the GLAM Workbench.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;d like to know how I created the chart above, have a look at &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/#visualise-the-total-number-of-newspaper-articles-in-trove-by-year-and-state&#34;&gt;Visualise the total number of newspaper articles in Trove by year and state&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Questions? Ask away at &lt;a href=&#34;https://ozglam.chat/t/trove-newspapers-and-the-copyright-cliff-of-death/94?u=wragge&#34;&gt;OzGLAM Help&lt;/a&gt;. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/10/26/ive-added-a.html</link>
      <pubDate>Mon, 26 Oct 2020 17:15:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/10/26/ive-added-a.html</guid>
      <description>&lt;p&gt;I’ve added a &lt;a href=&#34;https://glam-workbench.github.io/anu-archives/&#34;&gt;new section to the GLAM Workbench&lt;/a&gt; for the ANU Archives. The first set of notebooks relates to the &lt;a href=&#34;http://archivescollection.anu.edu.au/index.php/or59j&#34;&gt;Sydney Stock exchange stock and share lists&lt;/a&gt;. As the content note describes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;These are large format bound volumes of the official lists that were posted up for the public to see - 3 times a day - forenoon, noon and afternoon - at the close of the trading session in the call room at the Sydney Stock Exchange. The closing prices of stocks and shares were entered in by hand on pre-printed sheets.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The volumes have been digitised, resulting in a collection of 70,000+ high resolution images. You can browse the details of each volume using &lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/anu-archives/blob/master/stock-exchange-details-by-volume.ipynb&#34;&gt;this notebook&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I’ve been exploring ways of getting useful, machine-readable data out of the images. There’s more information about the processes involved in &lt;a href=&#34;https://github.com/wragge/sydney-stock-exchange&#34;&gt;this repository&lt;/a&gt;. I’ve also been working on improving the metadata and have managed to assign a date and session (Morning, Noon, or Afternoon) to each page. With these, we can start to explore the content!&lt;/p&gt;
&lt;p&gt;One of the notebooks creates a &lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/anu-archives/blob/master/stock-exchange-pages-calendar.ipynb&#34;&gt;calendar-like view&lt;/a&gt; of the whole collection, showing the number of pages surviving from each trading day. This makes it easy to find the gaps and changes in process. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/00cfdb0dde.jpg&#34; width=&#34;600&#34; height=&#34;470&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/10/26/ive-added-more.html</link>
      <pubDate>Mon, 26 Oct 2020 16:52:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/10/26/ive-added-more.html</guid>
      <description>&lt;p&gt;I’ve added more years to my &lt;a href=&#34;https://github.com/wragge/hansard-xml&#34;&gt;repository of Commonwealth Hansard&lt;/a&gt;! The repository now includes XML-formatted text files for both houses from 1901 to 1980, and 1998 to 2005. I’ve done some more checking and confirmed that the XML files for 1981 to 1997 aren&amp;rsquo;t currently available through ParlInfo, however, the Parliamentary Library are looking into it. I’ve also created a &lt;a href=&#34;https://github.com/GLAM-Workbench/australian-commonwealth-hansard/blob/master/data/all-sitting-days.csv&#34;&gt;CSV-formatted list of sitting days&lt;/a&gt; from 1901 to 2005 (based on ParlInfo search results). Details of the harvesting process are available &lt;a href=&#34;https://glam-workbench.github.io/hansard/&#34;&gt;in the GLAM Workbench&lt;/a&gt;. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/08/14/another-glamworkbench-update.html</link>
      <pubDate>Fri, 14 Aug 2020 18:29:50 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/08/14/another-glamworkbench-update.html</guid>
      <description>&lt;p&gt;Another #GLAMWorkbench update! Snip words out of @TroveAustralia newspaper pages and create big composite images. OCR art! &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/#create-large-composite-images-from-snipped-words&#34;&gt;glam-workbench.github.io/trove-new&amp;hellip;&lt;/a&gt; #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/84a098fba1.jpg&#34; width=&#34;600&#34; height=&#34;380&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/07/30/ok-so-do.html</link>
      <pubDate>Thu, 30 Jul 2020 12:37:16 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/07/30/ok-so-do.html</guid>
      <description>&lt;p&gt;Ok, so do you want to make your own ‘scissors &amp;amp; paste’ messages using words from @TroveAustralia  newspaper articles? Go to &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/#create-scissors-and-paste-messages-from-trove-newspaper-articles&#34;&gt;the notebook&lt;/a&gt; in #GLAMWorkbench &amp;amp; click on ‘Run live on Binder in Appmode’. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/f81fa29de5.jpg&#34; width=&#34;600&#34; height=&#34;534&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/07/29/another-glamworkbench-update.html</link>
      <pubDate>Wed, 29 Jul 2020 14:17:56 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/07/29/another-glamworkbench-update.html</guid>
      <description>&lt;p&gt;Another #GLAMWorkbench update! The Trove Harvester will now download both newspaper &lt;em&gt;and gazette&lt;/em&gt; articles in bulk. You can optionally include full text, and save copies of the articles as images and PDFs. #dhhacks &lt;a href=&#34;https://glam-workbench.github.io/trove-harvester/&#34;&gt;glam-workbench.github.io/trove-har&amp;hellip;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/6517b75846.jpg&#34; width=&#34;600&#34; height=&#34;401&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/f24a2a85a7.jpg&#34; width=&#34;600&#34; height=&#34;418&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/8512bd6dcc.jpg&#34; width=&#34;600&#34; height=&#34;426&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/07/28/interested-in-using.html</link>
      <pubDate>Tue, 28 Jul 2020 10:27:51 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/07/28/interested-in-using.html</guid>
      <description>&lt;p&gt;Interested in using web archives in your research? Join us on 5/6 August for a free @netpreserve webinar introducing the tools and examples available in the new #webarchives section of the #GLAMWorkbench. There are two timeslots to cover multiple timezones: &lt;a href=&#34;https://www.eventbrite.com/e/iipc-rss-webinar-jupyter-notebooks-for-web-archives-i-tickets-111349651806&#34;&gt;www.eventbrite.com/e/iipc-rs&amp;hellip;&lt;/a&gt; and &lt;a href=&#34;https://www.eventbrite.com/e/iipc-rss-webinar-jupyter-notebooks-for-web-archives-ii-tickets-112728556146&#34;&gt;www.eventbrite.com/e/iipc-rs&amp;hellip;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/7c05a95c47.jpg&#34; width=&#34;600&#34; height=&#34;600&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/2b04bdefc6.jpg&#34; width=&#34;600&#34; height=&#34;534&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/97c4d924f2.jpg&#34; width=&#34;600&#34; height=&#34;255&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/07/27/introducing-a-brand.html</link>
      <pubDate>Mon, 27 Jul 2020 18:46:40 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/07/27/introducing-a-brand.html</guid>
      <description>&lt;p&gt;Introducing a brand new section of the #GLAMWorkbench, exploring the @MuseumsVictoria collection API. Harvest species records, display random images, and download ALL THE ANTECHINUSES! &lt;a href=&#34;https://glam-workbench.github.io/museumsvictoria/&#34;&gt;glam-workbench.github.io/museumsvi&amp;hellip;&lt;/a&gt; #dhhacks&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/49e33f96fa.jpg&#34; width=&#34;504&#34; height=&#34;600&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/4c51985cb0.jpg&#34; width=&#34;600&#34; height=&#34;510&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/8e81a5f34f.jpg&#34; width=&#34;600&#34; height=&#34;369&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/07/27/new-additions-to.html</link>
      <pubDate>Mon, 27 Jul 2020 16:32:34 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/07/27/new-additions-to.html</guid>
      <description>&lt;p&gt;New additions to the @TroveAustralia books section of the #GLAMWorkbench – word frequency examples with OCRd text from digitised books, and a random recipe generator powered by a 19th C cook book! &lt;a href=&#34;https://glam-workbench.github.io/trove-books/&#34;&gt;glam-workbench.github.io/trove-boo&amp;hellip;&lt;/a&gt; #dhhacks&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/70f278df90.jpg&#34; width=&#34;600&#34; height=&#34;485&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/5fdbedbcea.jpg&#34; width=&#34;600&#34; height=&#34;229&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/07/27/with-the-recent.html</link>
      <pubDate>Mon, 27 Jul 2020 11:52:18 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/07/27/with-the-recent.html</guid>
      <description>&lt;p&gt;With the recent changes to @TroveAustralia, the Australian Women’s Weekly cover browser was retired. As a low-tech alternative, I’ve harvested all the cover images from the Women&amp;rsquo;s Weekly and saved them into PDFs for easy browsing, one for each decade. There are 2,566 images from 1933 to 1982.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/0j6zpeuw6tbey5k/aww-1933-1939.pdf?dl=0&#34;&gt;1933 to 1939&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/y1he8dd6h655weu/aww-1940-1949.pdf?dl=0&#34;&gt;1940 to 1949&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/i9gp9i51nofmlqo/aww-1950-1959.pdf?dl=0&#34;&gt;1950 to 1959&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/2of63tovcnphijo/aww-1960-1969.pdf?dl=0&#34;&gt;1960 to 1969&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/f2yxpg8u4dx5uf2/aww-1970-1979.pdf?dl=0&#34;&gt;1970 to 1979&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/xanohtas1fi7eu4/aww-1980-1982.pdf?dl=0&#34;&gt;1980 to 1982&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Just click on the link below each image to explore the complete issue on Trove. You can also download the full collection of images &lt;a href=&#34;https://cloudstor.aarnet.edu.au/plus/s/NaKjoKNFOGXXDNN&#34;&gt;from Cloudstor&lt;/a&gt;. There&amp;rsquo;s a &lt;a href=&#34;https://github.com/GLAM-Workbench/trove-newspapers/blob/58307d3ccae4d2c939ecb6aff59944f27d213842/data/aww-issues.csv&#34;&gt;CSV file&lt;/a&gt; containing all the issue metadata.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/#harvest-australian-womens-weekly-covers-or-the-front-pages-of-any-newspaper&#34;&gt;notebook used to harvest the images&lt;/a&gt; is in the Trove newspapers section of the GLAM Workbench. You could easily adapt the notebook to harvest the front pages of any newspaper. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/a8730e6b43.jpg&#34; width=&#34;447&#34; height=&#34;600&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/07/17/the-trove-books.html</link>
      <pubDate>Fri, 17 Jul 2020 23:11:35 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/07/17/the-trove-books.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.github.io/trove-books/&#34;&gt;Trove books section&lt;/a&gt; of the #GLAMWorkbench has been updated. There&amp;rsquo;s a fresh harvest of OCRd text &amp;amp; the notebooks have been changed to work with the new @TroveAustralia interface. Download &amp;amp; explore 24,620 files (3gb) of OCRd text! #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/766d9155ba.jpg&#34; width=&#34;600&#34; height=&#34;527&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/07/17/revisiting-my-historic.html</link>
      <pubDate>Fri, 17 Jul 2020 17:07:44 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/07/17/revisiting-my-historic.html</guid>
      <description>&lt;p&gt;Revisiting my Historic Hansard XML repository &amp;amp; realising how easy it is to load files as needed via the GitHub API &amp;amp; explore with Pandas &amp;amp; Jupyter. This #GLAMWorkbench &lt;a href=&#34;https://glam-workbench.github.io/hansard/#convert-a-years-worth-of-historic-hansard-into-a-dataframe-for-analysis&#34;&gt;notebook&lt;/a&gt; helps you explore a particular year/house. #dhhacks&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/2839d8e003.jpg&#34; width=&#34;600&#34; height=&#34;357&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/ec52714756.jpg&#34; width=&#34;600&#34; height=&#34;265&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/a4c3f7667d.jpg&#34; width=&#34;600&#34; height=&#34;396&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/a788b76bb3.jpg&#34; width=&#34;600&#34; height=&#34;436&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/07/14/the-trove-journals.html</link>
      <pubDate>Tue, 14 Jul 2020 14:31:47 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/07/14/the-trove-journals.html</guid>
      <description>&lt;p&gt;The &lt;a href=&#34;https://glam-workbench.github.io/trove-journals/&#34;&gt;Trove Journals section&lt;/a&gt; of the #GLAMWorkbench has been updated to work with the new @TroveAustralia interface! I’ve also re-harvested ALL the OCRd text from digitised journals — 6gb of text from 397 journals now downloadable in bulk from CloudStor. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/d9b0c2e64d.jpg&#34; width=&#34;600&#34; height=&#34;264&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/07/12/new-in-glamworkbench.html</link>
      <pubDate>Sun, 12 Jul 2020 14:18:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/07/12/new-in-glamworkbench.html</guid>
      <description>&lt;p&gt;New in #GLAMWorkbench! After you’ve used the @TroveAustralia Newspaper Harvester to download lots &amp;amp; lots of articles, try exploring the results in Datasette. &lt;a href=&#34;https://glam-workbench.github.io/trove-harvester/#display-the-results-of-a-harvest-as-a-searchable-database-using-datasette&#34;&gt;This notebook&lt;/a&gt; sets everything up, you can even add full text search &amp;amp; images! #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/43c8c5e1e7.jpg&#34; width=&#34;600&#34; height=&#34;428&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/06/29/download-newspaper-articles.html</link>
      <pubDate>Mon, 29 Jun 2020 10:48:53 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/06/29/download-newspaper-articles.html</guid>
      <description>&lt;p&gt;Download newspaper articles in bulk! The Trove Newspaper Harvester has been updated to work with the new @TroveAustralia interface. I’ve also added the ability to save articles as .jpg images! The easiest way to get started is &lt;a href=&#34;https://glam-workbench.github.io/trove-harvester/&#34;&gt;via the #GLAMWorkbench&lt;/a&gt;. #dhhacks&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/fe9cb9a58b.jpg&#34; width=&#34;600&#34; height=&#34;487&#34; alt=&#34;Screenshot of Trove Harvester page in GLAM Workbench&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/5b8395e147.jpg&#34; width=&#34;600&#34; height=&#34;255&#34; alt=&#34;Screenshot of TroveHarvester web app&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/39a828a8c3.jpg&#34; width=&#34;600&#34; height=&#34;270&#34; alt=&#34;Details of image file naming scheme&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/af98a9856b.jpg&#34; width=&#34;600&#34; height=&#34;302&#34; alt=&#34;Thumbnails of newspaper articles saved as images&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>New GLAM Workbench section on web archives!</title>
      <link>https://updates.timsherratt.org/2020/05/27/new-glam-workbench.html</link>
      <pubDate>Wed, 27 May 2020 14:11:25 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/05/27/new-glam-workbench.html</guid>
      <description>&lt;p&gt;We tend to think of a web archive as a site we go to when links are broken – a useful fallback, rather than a source of new research data. But web archives don&amp;rsquo;t just store old web pages, they capture multiple versions of web resources over time. Using web archives we can observe change – we can ask historical questions.  But web archives store &lt;strong&gt;huge&lt;/strong&gt; amounts of data, and access is often limited for legal reasons. Just knowing what data is available and how to get to it can be difficult. Where do you start?&lt;/p&gt;
&lt;p&gt;The GLAM Workbench’s &lt;a href=&#34;https://glam-workbench.github.io/web-archives/&#34;&gt;new web archives section&lt;/a&gt; can help! Here you’ll find a collection of Jupyter notebooks that  document web archive data sources and standards, and walk through methods of harvesting, analysing, and visualising that data. It’s a mix of examples, explorations, apps and tools. The notebooks  use existing APIs to get data in manageable chunks, but many of the examples demonstrated can also be scaled up to build substantial datasets for research – you just have to be patient!&lt;/p&gt;
&lt;p&gt;Have you ever wanted to find when a particular fragment of text first appeared in a web page?  Or compare full-page screenshots  of archived sites?  Perhaps you want to explore how the text content of a page has changed over time, or create a side-by-side comparison of web archive captures. There are notebooks to help you with all of these.&lt;/p&gt;
&lt;p&gt;To dig deeper you might want to assemble a dataset of text extracted from archived web pages, construct your own database of archived Powerpoint files, or explore patterns within a whole domain. The notebooks provide a range of approaches that can be extended or modified according to your research questions.&lt;/p&gt;
&lt;p&gt;The development of these notebooks was supported by the International Internet Preservation Consortium&amp;rsquo;s &lt;a href=&#34;http://netpreserve.org/projects/&#34;&gt;Discretionary Funding Programme 2019-2020&lt;/a&gt;, with the participation of the British Library, the National Library of Australia, and the National Library of New Zealand. #dhhacks&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/b678824058.jpg&#34; width=&#34;600&#34; height=&#34;569&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/7db7dee281.jpg&#34; width=&#34;600&#34; height=&#34;600&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/deac772182.jpg&#34; width=&#34;600&#34; height=&#34;132&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/5bfddf3b1a.jpg&#34; width=&#34;600&#34; height=&#34;255&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/05/08/thanks-to-netpreserve.html</link>
      <pubDate>Fri, 08 May 2020 12:18:15 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/05/08/thanks-to-netpreserve.html</guid>
      <description>&lt;p&gt;Thanks to @NetPreserve, I’ve been spending time lately working on a set of web archive exploration notebooks for the #GLAMWorkbench. Here’s &lt;a href=&#34;https://mybinder.org/v2/gh/GLAM-Workbench/web-archives/master?urlpath=%2Fapps%2Fsave_screenshot.ipynb&#34;&gt;an example&lt;/a&gt; to create/compare screenshots of captures. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/3c45289d77.jpg&#34; width=&#34;600&#34; height=&#34;534&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/04/03/the-glam-csv.html</link>
      <pubDate>Fri, 03 Apr 2020 00:19:24 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/04/03/the-glam-csv.html</guid>
      <description>&lt;p&gt;The GLAM CSV Explorer has had a few updates — you can now filter by organisation, and upload your own CSV files! #GLAMWorkbench &lt;a href=&#34;https://mybinder.org/v2/gh/GLAM-Workbench/csv-explorer/master?urlpath=%2Fapps%2Fcsv-explorer.ipynb&#34;&gt;Try it live on Binder.&lt;/a&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/7bdc1f9c9a.jpg&#34; width=&#34;600&#34; height=&#34;523&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title>Buildings might be closed, but the data is open – explore hundreds of datasets from Australian GLAM organisations!</title>
      <link>https://updates.timsherratt.org/2020/03/31/buildings-might-be.html</link>
      <pubDate>Tue, 31 Mar 2020 13:48:07 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/03/31/buildings-might-be.html</guid>
      <description>&lt;p&gt;For a couple of years I’ve been harvesting datasets created or published by Australian GLAM organisations through government data portals. I’ve just completed &lt;a href=&#34;https://glam-workbench.github.io/glam-data-portals/#results-30-march-2020&#34;&gt;the latest harvest&lt;/a&gt;, and there’s now 369 datasets, containing 983 files, from 23 GLAM organisations. 628 of these files are in CSV (spreadsheet) format.&lt;/p&gt;
&lt;p&gt;There’s a number of ways that you can explore the harvested data. You can &lt;a href=&#34;https://glam-workbench.github.io/glam-datasets-from-gov-portals/&#34;&gt;browse a big list of datasets&lt;/a&gt;, or &lt;a href=&#34;https://github.com/GLAM-Workbench/ozglam-data/blob/master/glam-datasets-from-gov-portals.csv&#34;&gt;download a CSV&lt;/a&gt; containing all the harvested data or &lt;a href=&#34;https://github.com/GLAM-Workbench/ozglam-data/blob/master/glam-datasets-from-gov-portals-csvs.csv&#34;&gt;just those formatted as CSVs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;With this harvest I’ve added a &lt;a href=&#34;https://ozglam-datasets.glitch.me/data/glam-datasets-from-gov-portals&#34;&gt;new way of searching and filtering the harvested data&lt;/a&gt; using &lt;a href=&#34;https://github.com/simonw/datasette&#34;&gt;Datasette&lt;/a&gt; running on Glitch. This interface lets you narrow your queries by field or facet, and search text fields for keywords.&lt;/p&gt;
&lt;p&gt;But what’s actually in all those CSV files? If you’d like to start exploring the &lt;em&gt;content&lt;/em&gt; of the datasets, then give my &lt;a href=&#34;https://glam-workbench.github.io/csv-explorer/&#34;&gt;GLAM CSV Explorer&lt;/a&gt; a go! The CSV Explorer looks at each column in the dataset and tries to identify the type of data inside. It then attempts to tell you something useful about it.&lt;/p&gt;
&lt;p&gt;For all the details, links, and harvesting code, &lt;a href=&#34;https://glam-workbench.github.io/glam-data-portals/&#34;&gt;see the #GLAMWorkbench&lt;/a&gt;. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/9b14bfb26b.jpg&#34; width=&#34;448&#34; height=&#34;600&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/03/11/my-harvest-of.html</link>
      <pubDate>Wed, 11 Mar 2020 20:05:04 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/03/11/my-harvest-of.html</guid>
      <description>&lt;p&gt;My harvest of OCRd text from @TroveAustralia digitised books, ephemera, and parliamentary papers has been updated! There&amp;rsquo;s now 19,795 text files (about 3gb) to explore! Harvesting details and links to browse/download files from Cloudstor are &lt;a href=&#34;https://glam-workbench.github.io/trove-books/&#34;&gt;in the #GLAMWorkbench&lt;/a&gt;. #dhhacks&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/bc31f3a83b.jpg&#34; width=&#34;600&#34; height=&#34;278&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/cf8c124f49.jpg&#34; width=&#34;585&#34; height=&#34;600&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/6ca36757e7.jpg&#34; width=&#34;600&#34; height=&#34;341&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/03/03/ive-added-some.html</link>
      <pubDate>Tue, 03 Mar 2020 11:09:17 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/03/03/ive-added-some.html</guid>
      <description>&lt;p&gt;I’ve added some more documentation to the &lt;a href=&#34;https://glam-workbench.github.io/trove-harvester/https://glam-workbench.github.io/trove-harvester/&#34;&gt;Trove Newspaper Harvester&lt;/a&gt; page in the #GLAMWorkbench. Get your @TroveAustralia newspaper articles in bulk! #dhhacks #collectionsasdata&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/6538bf262c.jpg&#34; width=&#34;600&#34; height=&#34;462&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/02/27/new-section-added.html</link>
      <pubDate>Thu, 27 Feb 2020 12:47:18 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/02/27/new-section-added.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://glam-workbench.github.io/slv/&#34;&gt;New section added&lt;/a&gt; to the #GLAMWorkbench with examples from @Library_Vic! #slvdata #dhhacks #collectionsasdata&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/33f7643329.jpg&#34; width=&#34;600&#34; height=&#34;468&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/02/27/more-fun-with.html</link>
      <pubDate>Thu, 27 Feb 2020 12:18:57 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/02/27/more-fun-with.html</guid>
      <description>&lt;p&gt;More fun with @iiif_io and images from @library_vic – resize, rotate, crop and more! Try it out with this &lt;a href=&#34;https://github.com/GLAM-Workbench/state-library-victoria/blob/086f17821d0ffcb0d7d6db4251a6208d1da6a146/more_fun_with_iiif.ipynb&#34;&gt;new notebook&lt;/a&gt; in the #GLAMWorkbench. #slvdata #dhhacks&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/10c0427d8b.jpg&#34; width=&#34;600&#34; height=&#34;320&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/284c6f2898.jpg&#34; width=&#34;600&#34; height=&#34;435&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/671198b3cf.jpg&#34; width=&#34;600&#34; height=&#34;382&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/02/26/new-glamworkbench-notebook.html</link>
      <pubDate>Wed, 26 Feb 2020 22:39:10 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/02/26/new-glamworkbench-notebook.html</guid>
      <description>&lt;p&gt;New #GLAMWorkbench &lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/state-library-victoria/blob/master/download_image_from_iiif.ipynb&#34;&gt;notebook&lt;/a&gt;! Download images from @Library_Vic using IIIF and Handle… #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/2029641121.jpg&#34; width=&#34;600&#34; height=&#34;593&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/02/21/want-to-save.html</link>
      <pubDate>Fri, 21 Feb 2020 10:10:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/02/21/want-to-save.html</guid>
      <description>&lt;p&gt;Want to save @TroveAustralia newspaper articles as images (that aren&amp;rsquo;t sliced up in annoying ways)? There&amp;rsquo;s an &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/#save-a-trove-newspaper-article-as-an-image&#34;&gt;app for that&lt;/a&gt; in the #GLAMWorkbench. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/d6dedfab88.jpg&#34; width=&#34;600&#34; height=&#34;530&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/02/17/new-trove-images.html</link>
      <pubDate>Mon, 17 Feb 2020 21:08:39 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/02/17/new-trove-images.html</guid>
      <description>&lt;p&gt;New &lt;a href=&#34;https://glam-workbench.github.io/trove-images/&#34;&gt;‘Trove images&#39; section&lt;/a&gt; added to the #GLAMWorkbench! Here you’ll find my latest Jupyter notebook harvesting data about the use of standard licences &amp;amp; rights statements in Trove&amp;rsquo;s picture zone. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/117cf2e18f.jpg&#34; width=&#34;600&#34; height=&#34;463&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2020/02/14/voting-in-the.html</link>
      <pubDate>Fri, 14 Feb 2020 09:09:06 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2020/02/14/voting-in-the.html</guid>
      <description>&lt;p&gt;Voting in the 2019 @dhawards is &lt;a href=&#34;http://dhawards.org/dhawards2019/voting/&#34;&gt;now open&lt;/a&gt;! Go and check out all the cool #DigitalHumanities projects from around the world. And while you’re there, you might like to vote for my #GLAMWorkbench in the ‘Tools’ category!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2020/f9821a9126.jpg&#34; width=&#34;600&#34; height=&#34;532&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/11/20/new-glamworkbench-section.html</link>
      <pubDate>Wed, 20 Nov 2019 22:24:46 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/11/20/new-glamworkbench-section.html</guid>
      <description>&lt;p&gt;New #GLAMWorkbench &lt;a href=&#34;https://glam-workbench.github.io/trove-random/&#34;&gt;section&lt;/a&gt; with examples of how to get &lt;em&gt;random-ish&lt;/em&gt; works and newspaper articles from @TroveAustralia. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/ea26af5df6.jpg&#34; width=&#34;600&#34; height=&#34;394&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/09/04/the-naagovau-recordsearch.html</link>
      <pubDate>Thu, 05 Sep 2019 00:03:04 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/09/04/the-naagovau-recordsearch.html</guid>
      <description>&lt;p&gt;The @naagovau RecordSearch section of the #GLAMWorkbench has been updated with more notebooks to help you get Australian archives data in a usable form. &lt;a href=&#34;https://glam-workbench.github.io/recordsearch/&#34;&gt;glam-workbench.github.io/recordsea&amp;hellip;&lt;/a&gt; Useful for #twitterstorians, #ozhist, &amp;amp; #govhack!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/2443ab1a76.jpg&#34; width=&#34;600&#34; height=&#34;474&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/08/25/ive-updated-my.html</link>
      <pubDate>Sun, 25 Aug 2019 14:33:18 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/08/25/ive-updated-my.html</guid>
      <description>&lt;p&gt;I’ve updated my harvest of OCRd text from digitised journals in @TroveAustralia. The complete dataset now includes 33,035 issues from 720 titles – about 8gb of text to explore. Details in the #GLAMWorkbench: &lt;a href=&#34;https://glam-workbench.github.io/trove-journals/#data-and-text&#34;&gt;glam-workbench.github.io/trove-jou&amp;hellip;&lt;/a&gt; #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/799cee870c.jpg&#34; width=&#34;600&#34; height=&#34;277&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/08/09/theres-a-new.html</link>
      <pubDate>Fri, 09 Aug 2019 23:55:23 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/08/09/theres-a-new.html</guid>
      <description>&lt;p&gt;There’s a &lt;a href=&#34;https://glam-workbench.github.io/nma/&#34;&gt;new section of the GLAM Workbench&lt;/a&gt; devoted to the National Museum of Australia collection API! Harvest @nma data, then explore it by time and place. #dhhacks&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/600a9eb2f5.jpg&#34; width=&#34;600&#34; height=&#34;359&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/07e1f50468.jpg&#34; width=&#34;600&#34; height=&#34;341&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/5ffd854d09.jpg&#34; width=&#34;600&#34; height=&#34;472&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/07/24/updates-to-the.html</link>
      <pubDate>Wed, 24 Jul 2019 18:09:56 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/07/24/updates-to-the.html</guid>
      <description>&lt;p&gt;Updates to the &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/&#34;&gt;Trove newspapers section&lt;/a&gt; of GLAM Workbench – adding links to app-ified versions of some notebooks, &amp;amp; direct links to @mybinderteam for everything. If you work with @TroveAustralia newspapers you might find it useful.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/10876f160b.jpg&#34; width=&#34;600&#34; height=&#34;547&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title>Download &amp; explore 1,499,259 rows of open data from NSW State Archives Online Indexes</title>
      <link>https://updates.timsherratt.org/2019/07/24/download-explore-rows.html</link>
      <pubDate>Wed, 24 Jul 2019 14:34:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/07/24/download-explore-rows.html</guid>
      <description>&lt;p&gt;NSW State Archives publishes a number of &lt;a href=&#34;https://www.records.nsw.gov.au/archives/collections-and-research/guides-and-indexes/indexes-a-z&#34;&gt;detailed indexes&lt;/a&gt; containing data manually extracted from their records. These provide additional entry points to the records, such as a person&amp;rsquo;s name, or a place. But they also provide useful data for analysis. However, to explore the index data we need to get it out of the web interface and into a form that can be easily downloaded and manipulated.&lt;/p&gt;
&lt;p&gt;I’ve created a series of Jupyter notebooks to harvest all the indexes and save the data as CSV files. I’ve also updated my repository containing all the harvested CSV files. It’s available from the new &lt;a href=&#34;https://glam-workbench.github.io/nsw-state-archives/&#34;&gt;NSW State Archives section&lt;/a&gt; of my GLAM Workbench. There are currently 64 different index datasets, containing 1,499,259 rows of data.&lt;/p&gt;
&lt;p&gt;And to help you get a sense of what&amp;rsquo;s actually in all those CSV files, I’ve created an interactive Index Explorer. Just select an index from the list, and the Index Explorer will generate a series of tables and visualisations that provide an overview of the data. Try &lt;a href=&#34;https://mybinder.org/v2/gh/GLAM-Workbench/nsw-state-archives/master?urlpath=%2Fapps%2Findex-explorer.ipynb&#34;&gt;running it live&lt;/a&gt; on Binder.&lt;/p&gt;
&lt;p&gt;Thanks to the State Archives staff and volunteers for preparing all this most excellent data. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/61527f660d.gif&#34; width=&#34;600&#34; height=&#34;497&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/07/11/new-in-glam.html</link>
      <pubDate>Thu, 11 Jul 2019 22:00:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/07/11/new-in-glam.html</guid>
      <description>&lt;p&gt;New in GLAM Workbench! &lt;a href=&#34;https://glam-workbench.github.io/pm-transcripts/&#34;&gt;Notebooks&lt;/a&gt; to harvest, index, analyse, and aggregate transcripts of speeches &amp;amp; interviews by Australian prime ministers. Plus links to harvested data and aggregated files. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/72363568ec.jpg&#34; width=&#34;600&#34; height=&#34;233&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/07/11/reorganising-things-a.html</link>
      <pubDate>Thu, 11 Jul 2019 12:26:29 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/07/11/reorganising-things-a.html</guid>
      <description>&lt;p&gt;Reorganising things a little at &lt;a href=&#34;https://glam-workbench.github.io/&#34;&gt;GLAM Workbench&lt;/a&gt;. @statelibrarynsw gets its own section. Hansard and @datagovau GLAM datasets now under ‘Australian government’. Making some space for further additions…&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/a32652f0ce.jpg&#34; width=&#34;600&#34; height=&#34;448&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/06/17/kicked-off-a.html</link>
      <pubDate>Mon, 17 Jun 2019 18:17:43 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/06/17/kicked-off-a.html</guid>
      <description>&lt;p&gt;Kicked off a new GLAM Workbench repository dedicated to @SLSA with a &lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/slsa/blob/master/Getting-higher-res-images.ipynb&#34;&gt;quick notebook hack&lt;/a&gt; to get higher res versions of digitised photos. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/7d61a71399.jpg&#34; width=&#34;600&#34; height=&#34;474&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/06/07/recent-additions-to.html</link>
      <pubDate>Fri, 07 Jun 2019 14:57:28 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/06/07/recent-additions-to.html</guid>
      <description>&lt;p&gt;Recent additions to the Trove Newspapers section of the GLAM Workbench: getting images from @TroveAustralia newspaper articles, and uploading articles to @Omeka-S: &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/&#34;&gt;glam-workbench.github.io/trove-new&amp;hellip;&lt;/a&gt;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/9990b25c5b.jpg&#34; width=&#34;600&#34; height=&#34;498&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/05/26/more-glam-workbench.html</link>
      <pubDate>Sun, 26 May 2019 17:39:22 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/05/26/more-glam-workbench.html</guid>
      <description>&lt;p&gt;More GLAM Workbench updates! More full text of Australian books! I&amp;rsquo;ve added &lt;a href=&#34;https://glam-workbench.github.io/trove-books/&#34;&gt;the notebook &amp;amp; data&lt;/a&gt; from my harvest of @TroveAustralia books in the @InternetArchive. There&amp;rsquo;s metadata and text of 1,153 books to explore. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/1f1ac0e8f8.jpg&#34; width=&#34;600&#34; height=&#34;440&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/05/19/some-overdue-updates.html</link>
      <pubDate>Sun, 19 May 2019 23:22:47 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/05/19/some-overdue-updates.html</guid>
      <description>&lt;p&gt;Some overdue updates to the GLAM Workbench. First here&amp;rsquo;s &lt;a href=&#34;https://glam-workbench.github.io/glam-data-portals/&#34;&gt;details, data, and code&lt;/a&gt; from a harvest of GLAM datasets on @datagovau. Includes details of more than 400 CSV datasets. #dhhacks&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/8ecb3eba56.jpg&#34; width=&#34;600&#34; height=&#34;438&#34; alt=&#34;&#34; /&gt;&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/58c8fa43a4.jpg&#34; width=&#34;554&#34; height=&#34;600&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/05/09/over-the-last.html</link>
      <pubDate>Thu, 09 May 2019 20:14:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/05/09/over-the-last.html</guid>
      <description>&lt;p&gt;Over the last week I&amp;rsquo;ve been downloading editorial cartoons published in &lt;em&gt;The Bulletin&lt;/em&gt; from @TroveAustralia. There&amp;rsquo;s 3,471 cartoons – at least one from every issue published between 4 Sep 1886 and 17 Sep 1952. And you can browse them all&amp;hellip;&lt;/p&gt;
&lt;p&gt;To make it easier to explore the images, I&amp;rsquo;ve compiled them into a series of PDFs – one PDF for each decade. The PDFs include lower resolution versions of the images together with their publication details and a link to Trove. They&amp;rsquo;re all &lt;a href=&#34;https://www.dropbox.com/sh/rulkbsqgfe8cyhv/AABel9b95buJSG5hZrVCvaQsa?dl=0&#34;&gt;available from DropBox&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/altjl6jixwv5pt0/bulletin-1886-1889.pdf?dl=0&#34;&gt;1886 to 1889&lt;/a&gt; (45mb PDF)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/p15swmact2c9euf/bulletin-1890-1899.pdf?dl=0&#34;&gt;1890 to 1899&lt;/a&gt; (139mb PDF)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/0rivg50s8qam2et/bulletin-1900-1909.pdf?dl=0&#34;&gt;1900 to 1909&lt;/a&gt; (147mb PDF)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/pdsj6xjot0l928w/bulletin-1910-1919.pdf?dl=0&#34;&gt;1910 to 1919&lt;/a&gt; (153mb PDF)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/64x9y5nvgez1q3o/bulletin-1920-1929.pdf?dl=0&#34;&gt;1920 to 1929&lt;/a&gt; (159mb PDF)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/8mytp5qhqcrctt3/bulletin-1930-1939.pdf?dl=0&#34;&gt;1930 to 1939&lt;/a&gt; (151mb PDF)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/go3vyuqq0td6oqd/bulletin-1940-1949.pdf?dl=0&#34;&gt;1940 to 1949&lt;/a&gt; (146mb PDF)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dropbox.com/s/klcv0gyjs81c0pm/bulletin-1950-1952.pdf?dl=0&#34;&gt;1950 to 1952&lt;/a&gt; (42mb PDF)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The complete collection of high resolution images (about 60gb in total) can be &lt;a href=&#34;https://cloudstor.aarnet.edu.au/plus/s/bI7hJREvO0oJLGL&#34;&gt;downloaded from CloudStor&lt;/a&gt;. The names of each image file provide useful contextual metadata. For example, the file name &lt;code&gt;19330412-2774-nla.obj-606969767-7.jpg&lt;/code&gt; tells you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;19330412&lt;/code&gt; – the cartoon was published on 12 April 1933&lt;/li&gt;
&lt;li&gt;&lt;code&gt;2774&lt;/code&gt; – it was published in issue number 2774&lt;/li&gt;
&lt;li&gt;&lt;code&gt;nla.obj-606969767&lt;/code&gt; – the Trove identifier for the issue, which can be used to make a url, eg &lt;a href=&#34;https://nla.gov.au/nla.obj-606969767&#34;&gt;nla.gov.au/nla.obj-606969767&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;7&lt;/code&gt; – on page 7&lt;/li&gt;
&lt;/ul&gt;
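&lt;p&gt;As a rough illustration (not part of the original harvest code), the file-name pattern described above can be unpacked with a few lines of Python:&lt;/p&gt;

```python
# Hypothetical sketch: unpack the metadata encoded in a Bulletin image
# file name, assuming the date-issue-identifier-page pattern described above.

def parse_bulletin_filename(filename):
    stem = filename.rsplit(".", 1)[0]       # drop the .jpg extension
    date, issue, rest = stem.split("-", 2)  # '19330412', '2774', 'nla.obj-606969767-7'
    identifier, page = rest.rsplit("-", 1)  # 'nla.obj-606969767', '7'
    return {
        "date": f"{date[:4]}-{date[4:6]}-{date[6:]}",  # YYYY-MM-DD
        "issue": int(issue),
        "trove_id": identifier,
        "url": f"https://nla.gov.au/{identifier}",
        "page": int(page),
    }

print(parse_bulletin_filename("19330412-2774-nla.obj-606969767-7.jpg"))
```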
&lt;p&gt;There&amp;rsquo;s some details of the method that I used to find the cartoons &lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/trove-journals/blob/master/Finding_editorial_cartoons_in_the_Bulletin.ipynb&#34;&gt;in this notebook&lt;/a&gt;. I&amp;rsquo;ve also documented everything in the &lt;a href=&#34;https://glam-workbench.github.io/trove-journals/&#34;&gt;Trove Journals section&lt;/a&gt; of my GLAM Workbench.&lt;/p&gt;
&lt;p&gt;Be warned – the language, images, and ideas presented in &lt;em&gt;The Bulletin&lt;/em&gt; were often racist, anti-Semitic, and sexist. You won’t have to look far within this collection to find something offensive. This was, after all, the journal whose slogan for many years was ‘Australia for the white man’. This is our history&amp;hellip; #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/a92c91b746.jpg&#34; width=&#34;380&#34; height=&#34;600&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/04/27/and-now-my.html</link>
      <pubDate>Sat, 27 Apr 2019 12:44:19 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/04/27/and-now-my.html</guid>
      <description>&lt;p&gt;And now my GLAM Workbench has a &amp;lsquo;Trove Maps&amp;rsquo; section to document examples and explorations using data from @TroveAustralia&amp;rsquo;s &amp;lsquo;map&amp;rsquo; zone: &lt;a href=&#34;https://glam-workbench.github.io/trove-maps/&#34;&gt;glam-workbench.github.io/trove-map&amp;hellip;&lt;/a&gt; Includes a list of 20,158 maps with high-res downloads. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/4b7e1bf53e.jpg&#34; width=&#34;600&#34; height=&#34;545&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/04/23/ive-been-busy.html</link>
      <pubDate>Tue, 23 Apr 2019 14:49:00 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/04/23/ive-been-busy.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been busy lately harvesting LOTS of full text data from @TroveAustralia&amp;rsquo;s digitised journals – so many opportunities for research! You should be able to get to all the code &amp;amp; data from the new &lt;a href=&#34;https://glam-workbench.github.io/trove-journals/&#34;&gt;Trove journals section&lt;/a&gt; of my GLAM Workbench. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/9f44f2edae.jpg&#34; width=&#34;600&#34; height=&#34;571&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/04/22/ive-added-a.html</link>
      <pubDate>Mon, 22 Apr 2019 23:07:01 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/04/22/ive-added-a.html</guid>
      <description>&lt;p&gt;I’ve &lt;a href=&#34;https://glam-workbench.github.io/trove-books/&#34;&gt;added a section&lt;/a&gt; for the @TroveAustralia ‘book’ zone to the GLAM Workbench.&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/452579e763.jpg&#34; width=&#34;600&#34; height=&#34;483&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/04/22/all-ocrd-text.html</link>
      <pubDate>Mon, 22 Apr 2019 12:11:06 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/04/22/all-ocrd-text.html</guid>
      <description>&lt;p&gt;All 9,738 OCRd text files harvested from books, pamphlets and leaflets in @TroveAustralia&amp;rsquo;s ‘book’ zone have been uploaded to @aarnet’s CloudStor for &lt;a href=&#34;https://cloudstor.aarnet.edu.au/plus/s/ugiw3gdijSKaoTL&#34;&gt;easy browsing/download&lt;/a&gt;. There&amp;rsquo;s also a &lt;a href=&#34;https://cloudstor.aarnet.edu.au/plus/s/SrNqP4IOwF1fMBz&#34;&gt;400mb zip file&lt;/a&gt; if you want the whole lot.&lt;/p&gt;
&lt;p&gt;The harvesting method and code are available &lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/trove-books/blob/master/Harvesting-digitised-books.ipynb&#34;&gt;in this notebook&lt;/a&gt;. All this and more will be documented soon in my &lt;a href=&#34;http://glam-workbench.github.io/&#34;&gt;GLAM Workbench&lt;/a&gt;. #dhhacks&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/03/31/train-from-canberra.html</link>
      <pubDate>Sun, 31 Mar 2019 21:23:43 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/03/31/train-from-canberra.html</guid>
      <description>&lt;p&gt;Train from Canberra to Melbourne booked for #VALATechCamp. I&amp;rsquo;ll be hanging around both days, so let me know if you&amp;rsquo;d like to chat about the &lt;a href=&#34;https://glam-workbench.github.io/&#34;&gt;GLAM Workbench&lt;/a&gt;, Jupyter, Trove data, or any of the other things I fiddle with&amp;hellip;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/02/24/ive-updated-the.html</link>
      <pubDate>Sun, 24 Feb 2019 20:15:56 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/02/24/ive-updated-the.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve updated the notebook for harvesting records from @archivesnz&amp;rsquo;s Archway database in &lt;a href=&#34;https://glam-workbench.github.io/archway/&#34;&gt;my GLAM Workbench&lt;/a&gt;. I just used it to harvest more than 8,000 records from series 8333 relating to naturalisation. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/b97644687c.jpg&#34; width=&#34;600&#34; height=&#34;313&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/02/21/new-section-added.html</link>
      <pubDate>Thu, 21 Feb 2019 11:46:17 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/02/21/new-section-added.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://glam-workbench.github.io/qsa/&#34;&gt;New section added&lt;/a&gt; to my GLAM Workbench for the Queensland State Archives (@qsarchives). Includes a notebook to add series information to their Naturalisations 1851-1904 index. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/23a43eec3b.jpg&#34; width=&#34;600&#34; height=&#34;363&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/02/17/suggestions-of-new.html</link>
      <pubDate>Sun, 17 Feb 2019 14:33:44 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/02/17/suggestions-of-new.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://glam-workbench.github.io/suggest-a-topic/&#34;&gt;Suggestions of new topics and collections&lt;/a&gt; for my GLAM workbench are welcome!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/bc41952023.jpg&#34; width=&#34;600&#34; height=&#34;352&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/02/17/ive-added-a.html</link>
      <pubDate>Sun, 17 Feb 2019 12:40:14 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/02/17/ive-added-a.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve added a &lt;a href=&#34;https://glam-workbench.github.io/lac/&#34;&gt;section for Library and Archives Canada&lt;/a&gt; to my GLAM workbench. The first notebook extracts records of people from a specific country from their naturalisations database and saves the results as a CSV file. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/6dac64955d.jpg&#34; width=&#34;600&#34; height=&#34;334&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/02/15/current-status-extracting.html</link>
      <pubDate>Fri, 15 Feb 2019 23:59:24 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/02/15/current-status-extracting.html</guid>
      <description>&lt;p&gt;Current status — extracting data from Library and Archives Canada&amp;rsquo;s 1915-1946 naturalisation database. Coming soon to my GLAM Workbench&amp;hellip;&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/a43f9aa170.jpg&#34; width=&#34;600&#34; height=&#34;479&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/02/01/ive-added-a.html</link>
      <pubDate>Fri, 01 Feb 2019 22:26:26 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/02/01/ive-added-a.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve added a &amp;lsquo;save chart&amp;rsquo; option to the &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/&#34;&gt;QueryPic app in my GLAM Workbench&lt;/a&gt;. Visualise your searches in @TroveAustralia newspapers, then save the results as HTML for easy download. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/7e313ccec9.jpg&#34; width=&#34;600&#34; height=&#34;545&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/01/23/one-more-and.html</link>
      <pubDate>Wed, 23 Jan 2019 23:17:20 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/01/23/one-more-and.html</guid>
      <description>&lt;p&gt;One more and I&amp;rsquo;m done for the night&amp;hellip; New GLAM Workbench page for the &lt;a href=&#34;https://glam-workbench.github.io/trove/&#34;&gt;&amp;lsquo;Trove API introduction&amp;rsquo;&lt;/a&gt; notebooks.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/01/23/ive-finished-putting.html</link>
      <pubDate>Wed, 23 Jan 2019 22:22:10 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/01/23/ive-finished-putting.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve finished putting details of all the current GLAM Workbench repositories into the &lt;a href=&#34;https://glam-workbench.github.io/&#34;&gt;new documentation site&lt;/a&gt;. Still a few notebooks to migrate from the original workbench, but getting there! There&amp;rsquo;s about 50 Jupyter notebooks so far. #dhhacks&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/d1161383c0.jpg&#34; width=&#34;600&#34; height=&#34;457&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/01/23/added-a-data.html</link>
      <pubDate>Wed, 23 Jan 2019 17:07:30 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/01/23/added-a-data.html</guid>
      <description>&lt;p&gt;Added a &amp;lsquo;data&amp;rsquo; section to the GLAM Workbench docs, with info on &lt;a href=&#34;https://glam-workbench.github.io/glam-data-portals/&#34;&gt;harvests from government data portals&lt;/a&gt;, as well as series from @naagovau relating to ASIO and the White Australia Policy.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/01/23/and-now-a.html</link>
      <pubDate>Wed, 23 Jan 2019 11:23:08 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/01/23/and-now-a.html</guid>
      <description>&lt;p&gt;And now a &lt;a href=&#34;https://glam-workbench.github.io/tepapa/&#34;&gt;GLAM Workbench page&lt;/a&gt; for @Te_Papa&amp;hellip;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/01/23/added-a-page.html</link>
      <pubDate>Wed, 23 Jan 2019 10:49:01 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/01/22/added-a-page.html</guid>
      <description>&lt;p&gt;Added a page for @ArchivesNZ&amp;rsquo;s &lt;a href=&#34;https://glam-workbench.github.io/archway/&#34;&gt;Archway&lt;/a&gt; to the GLAM Workbench docs&amp;hellip;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/01/22/so-heres-some.html</link>
      <pubDate>Tue, 22 Jan 2019 17:30:49 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/01/22/so-heres-some.html</guid>
      <description>&lt;p&gt;So here&amp;rsquo;s &lt;a href=&#34;https://glam-workbench.github.io/trove-newspapers/&#34;&gt;some fun things to do&lt;/a&gt; with @TroveAustralia newspapers&amp;hellip; (via GLAM Workbench)&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/dca9e5a815.jpg&#34; width=&#34;600&#34; height=&#34;170&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/01/22/ok-more-documentation.html</link>
      <pubDate>Tue, 22 Jan 2019 15:06:50 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/01/22/ok-more-documentation.html</guid>
      <description>&lt;p&gt;Ok, &lt;a href=&#34;https://glam-workbench.github.io/digitalnz/&#34;&gt;more documentation&lt;/a&gt; for you — page for the @DigitalNZ API in GLAM Workbench updated!&lt;/p&gt;
&lt;img src=&#34;https://cdn.uploads.micro.blog/8371/2019/47fb133507.jpg&#34; width=&#34;600&#34; height=&#34;231&#34; alt=&#34;&#34; /&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/01/22/slowly-working-my.html</link>
      <pubDate>Tue, 22 Jan 2019 13:11:33 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/01/22/slowly-working-my.html</guid>
      <description>&lt;p&gt;Slowly working my way through the documentation for my GLAM Workbench. Still &lt;strong&gt;lots&lt;/strong&gt; to do, but I think the page for @naagovau&amp;rsquo;s &lt;a href=&#34;https://glam-workbench.github.io/recordsearch/&#34;&gt;RecordSearch&lt;/a&gt; is now up-to-date.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/01/22/if-there-are.html</link>
      <pubDate>Tue, 22 Jan 2019 12:38:15 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/01/22/if-there-are.html</guid>
      <description>&lt;p&gt;If there are APIs or other data sources you&amp;rsquo;d like me to add to my GLAM Workbench, feel free to &lt;a href=&#34;https://github.com/GLAM-Workbench/glam-workbench.github.io/issues&#34;&gt;create an issue&lt;/a&gt;. You could also describe what sorts of tools or examples using that data source would be useful.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title></title>
      <link>https://updates.timsherratt.org/2019/01/16/new-notebook-added.html</link>
      <pubDate>Wed, 16 Jan 2019 17:41:45 +1100</pubDate>
      
      <guid>http://wragge.micro.blog/2019/01/16/new-notebook-added.html</guid>
      <description>&lt;p&gt;New notebook added to the #GLAMWorkbench RecordSearch repository — get the basic details of agencies associated with all government functions used in @naagovau&amp;rsquo;s RecordSearch and save to a single JSON data file. &lt;a href=&#34;https://nbviewer.jupyter.org/github/GLAM-Workbench/recordsearch/blob/master/get_all_agencies_by_function.ipynb&#34;&gt;View code and data&lt;/a&gt;. #dhhacks&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>