Archive

October 2025

Creating bounding boxes for parish maps in the SLV collection: The State Library of Victoria holds a collection of 8,804 parish maps. As part of my residency at the SLV LAB, I’ve been poking around in the metadata. SLV staff have geocoded many of the parish maps using the Composite Gazetteer of Australia, which provides coordinates for Victorian parishes …

September 2025

Exploring SLV urls: I like urls. They take you places. And if you know how to read them, they can tell you things about the systems that created them. One of the first things I did when I started my residency at SLV LAB was to try and understand how their collection urls work. There’s a couple of well-worn …

Creative Technologist-in-Residence at the State Library of Victoria!: I’m very excited to be the new Creative Technologist-in-Residence at the SLV LAB. For the next few months I get to play around with metadata and images, think about online access, experiment with different technologies, and build things to help people to explore the State Library’s …

August 2025

WikiFest at the State Library of Victoria: This week I was lucky enough to participate in WikiFest at the State Library of Victoria. Organised by the State Library’s new innovation LAB and Wikimedia Australia, WikiFest was a hands-on, participant-led workshop focused on the possibilities of connecting SLV’s collections to (and …

July 2025

GLAM hacking with userscripts: In teaching and workshops I used to get students to question the idea that websites are ‘published’. They’re not released into the world in a fixed, immutable form – they’re a set of blueprints which only reach their final form in your browser window. This makes it possible …

The rebirth of Wragge Labs (and moving my Heroku apps): It looks like some paid work I was counting on won’t be going ahead, so I’m trying to save a bit of money on cloud hosting. As I previously noted, this resulted in the resurrection of The future of the past, but I’ve also been continuing to slog away at migrating all my old Flask …

The future of the past... in the present: I’ve been on a bit of a self-archiving binge lately. It started because I needed to cut back some of my web hosting costs, and was looking at ways of bringing together a group of separately hosted Heroku apps onto a single Digital Ocean droplet. While taking stock of my various apps and …

June 2025

Mining for meanings: In 2012, I was lucky enough to be awarded a Harold White Fellowship by the National Library of Australia. I used my time to explore ways of using Trove’s digitised newspapers as data, and presented my work at a public lecture in May 2012. I spoke from notes and never got round to writing it …

A brief and biased history of Trove Twitter bots: The socials recently alerted me to an interesting article by Dominique Carlon, Jean Burgess, and Kateryna Kasianenko on the history of community-created Twitter bots. The article explores bot-making within the context of Twitter’s rise and fall, and provides a handy taxonomy of bot species. …

Some Archives Week goodies: It’s International Archives Week and I’m feeling a bit crook after being double-vaxxed yesterday, so instead of doing something productive, I’m just going to make a list of potentially handy archives-related resources from the Wonderful World of Wragge(TM). The theme of Archives …

New dataset – Trove links shared on Twitter, 2009 to 2020: A few years ago, I harvested the details of tweets that included links to Trove. The data has just been sitting on my computer, so I thought I should package it up and share, in case it’s of use to anyone. The story is that back in 2021, I was working on the article ‘More than …

GLAM Workbench – preprint for 'Building User-Friendly Toolkits and Platforms for Digital Humanities': This is a preprint of my contribution to the publication ‘Building User-Friendly Toolkits and Platforms for Digital Humanities’. It provides a brief overview of the GLAM Workbench. I had to leave a lot out, but hopefully it provides a useful summary of what the GLAM Workbench is, and …

May 2025

No more harvesting data from the National Archives of Australia: A couple of weeks ago I bid farewell to Trove due to the cancellation of my API keys and the NLA’s lack of transparency around changes to API access. Now it seems I have to wave goodbye to 16+ years of work on RecordSearch, the National Archives of Australia’s online database. I noticed …

Farewell Trove: Over the last few months I’ve been grappling with the cancellation of my Trove API keys by the National Library of Australia. It may seem like a minor technical hiccup from the outside, but it’s had a major personal impact. For the sake of my health, I’ve decided to stop work on …

SLV LAB and GLAM Workbench updates: Last week the State Library of Victoria launched SLV LAB, a prototyping and innovation lab that ‘experiment[s] with technology to open access to collections, data and spaces’. The SLV LAB encourages collaboration, and is sharing code, datasets, and tutorials. It’s an exciting …

April 2025

New PROV section added to the GLAM Workbench: There’s a brand new GLAM Workbench section to help you work with data from the Public Record Office Victoria! Over the past couple of months, I’ve been poking around in the PROV’s collection API. The API provides data about PROV’s archival holdings in a machine readable …

The GLAM Workbench introduction to how notebooks work now runs in Jupyter Lite: I’ve just updated my introduction to using Jupyter notebooks in the GLAM Workbench so that it runs in Jupyter Lite – that means no more waiting for cloud services to spin up, it all happens in your browser! All the Jupyter notebooks in GLAM Workbench can be run in the cloud using the free …

Update on Trove data access and my suspended API keys: On 21 February, my Trove API keys were cancelled without warning. A week later, I met with NLA staff and was shocked to be told that downloading ‘content’, such as the text of digitised newspaper articles, was regarded as a breach of the API terms of use. Without API access I can’t …

Using the Public Record Office Victoria's API to build an overview of their collection: Over the past few weeks I’ve been exploring the Public Record Office Victoria’s public API. There’s not a lot of documentation, but there is a lot of data! What’s not immediately obvious is that the API includes information about a variety of different entities within the …

More than 6 million rows of data from Public Record Office Victoria added to the GLAM Name Index Search: The GLAM Name Index Search now includes more than 6 million rows of data from the Public Record Office Victoria, downloaded using their public API. The GLAM Name Index Search brings together records that include the names of people from 10 Australian GLAM organisations. With a single search, you can …

Introducing PROVBot – sharing photos from Public Record Office Victoria: With poor old TroveNewsBot killed by the NLA, my Mastodon feed has had less GLAM goodness of late. To try and fill the void I’ve created PROVBot, sharing photos from the Public Record Office Victoria. PROVBot makes use of the Public Record Office Victoria’s public API. At this stage it …

March 2025

Trove API users beware! – the latest in the saga of my cancelled API keys: After my Trove API keys were cancelled without warning on 21 February, I reluctantly agreed to a meeting with the National Library of Australia. They had provided so little information in their emails, that it seemed to be the only way to find out what was really going on. I came out of the meeting …

February 2025

15 years of work on Trove threatened by the NLA: See my latest post for an update! On Friday, without warning, I received an email from the National Library of Australia informing me that my Trove API keys had been suspended. This threatens the future of 15 years of work helping people use and understand the possibilities of Trove for new types of …

The Primary Source – GLAM collection news and help: I’ve created a new site (or in fact, renovated an old site) to aggregate news from GLAM collections (that’s galleries, libraries, archives, and museums) and help researchers using those collections. It’s called The Primary Source which is a bit of a bad history pun. Why is it …

National Archives of Australia Digitisation Dashboard: Since March 2021, I’ve been harvesting details of newly-digitised files in the National Archives of Australia to help document long-term changes to online access. A few weeks ago, I summarised the data from 2024, and published annual compilations in Zenodo. I’ve now created an …

Search the content of periodicals uploaded to Trove through the National eDeposit service: I’ve added a notebook to the GLAM Workbench that walks through the steps involved in creating a fully searchable database of content extracted from a periodical uploaded to Trove through the National eDeposit service (NED). Why is this needed? I was contacted recently by a member of the team …

Ten years of data! The files you're not allowed to see in the National Archives of Australia: I’ve created a new dataset containing 10 years of data that can be used to explore the workings of the National Archives of Australia’s access examination system. Australian government records become available for public access after 20 years. But before being opened to the public, …

January 2025

A Community Data Lab (CDL) wishlist: The ARDC is holding an event on 18 February to begin shaping the next phase of the Community Data Lab. If you’re interested in the development of digital tools and resources to support HASS research, I’d suggest you go along. I worked on the first phase of the Community Data Lab, …

Files digitised by the National Archives of Australia in 2024: In 2024, the National Archives of Australia digitised 254,953 files (down from 416,602 in 2023). This chart shows the number of files digitised per day in 2024. The decrease in the total number of files digitised is probably related to the completion of the NAA’s five year project to digitise …

Changes to Trove newspapers in 2024: Every Sunday I harvest information about the number of digitised newspaper articles in Trove. You can view the current results in the Trove Data Dashboard. By compiling all the data from 2024, you can find out what changed last year. 6,241,739 digitised newspaper articles were added to Trove in …

December 2024

@trovenewsbot has a new home: @trovenewsbot has been around for more than eleven years now – originally sharing Trove newspaper articles on Twitter, and now on the Fediverse. But with the imminent closure of the botsin.space Mastodon instance, I’ve had to find it a new home. Say hello to the latest version: …

November 2024

Six more volumes added to the searchable database of Tasmanian Post Office Directories!: A couple of months ago I realised my big, searchable database of Tasmanian Post Office Directories was missing the volume from 1920. It took a bit of work to add it in, as described in this post. Unfortunately, I’d barely finished when I realised that a number of other years were also missing! …

September 2024

Where's 1920? Missing volume added to Tasmanian Post Office Directories!: Visualisation is a great way to find problems in your data. As part of the Everyday Heritage project, I’m working with a team to document the lives of Tasmania’s Chinese residents in the 19th and early 20th centuries. We’re using a variety of sources such as Trove’s …

Major update for the Trove Newspapers section of the GLAM Workbench: The Trove newspapers section of the GLAM Workbench was updated last week. Over the last year I’ve been gradually updating notebooks to use version 3 of the Trove API, but when version 2 suddenly disappeared a couple of weeks ago I had to hurriedly pull everything together. The Trove newspapers …

Preserving the history of online collections (my love letter to future historians): It’s pretty obvious that access to digitised resources, like Trove’s newspapers, has changed the practice of history in Australia. But how? I’m certain that the historiographical implications of the growth and development of online collections will become a topic of increasing …

Saving Trove's digitised periodicals as PDFs: I was recently contacted by a researcher who wanted to be able to automatically download the issues of a digitised periodical in Trove as PDFs. There was already a notebook in the GLAM Workbench that downloads the issues of a digitised newspaper as PDFs, but newspapers work differently to other …

August 2024

The future (and past) of Historic Hansard: Don’t panic! Historic Hansard is not closing down – on the contrary, I’m planning a major update in the next few months. But as I look to the future, I thought it was a good time to pull together a few threads documenting my adventures with Commonwealth Hansard. The past Commonwealth …

Join the Research Data Alliance's new Collections as Data Interest Group!: If you’re interested in opening up GLAM collections for use in research, you might like to join the new Collections as Data Interest Group, part of the Research Data Alliance. According to the group description: This group is aimed at collections professionals such as archivists, librarians, …

More datasets added to GLAM Name Index Search – now almost 12 million rows of data!: The GLAM Name Index Search brings datasets from 10 Australian GLAM organisations together into a single search interface. All these datasets index collections by people’s names, so with one search you can find information about individuals across a broad range of records, locations, and periods. It …

New Zotero translators for PROV and Queensland State Archives: Good news for Australian archives users – you can now use Zotero to capture item details and digitised files from the collections of the Public Record Office Victoria and the Queensland State Archives! What is Zotero? According to the Zotero website: Zotero is a free, easy-to-use tool to help you …

Explore Trove's digitised maps: Trove contains thousands of digitised maps from the collections of the National Library of Australia, but they’re not always easy to find because of the way they’re arranged and described. To help you explore these maps I’ve created a new database and published it using Datasette. …

July 2024

Share your spreadsheet as a searchable online database using Datasette-Lite: HASS researchers often compile data in spreadsheets. Sometimes they want to ‘publish’ this data online in a form that encourages others to use and explore – but how? I’ve just added a simple tool to the GLAM Workbench that helps you construct a url that will open a CSV file as a …

Home renovations at timsherratt.au: I had to update my sadly-neglected CV, so of course I ended up renovating the whole of my personal website at timsherratt.au. To start with, I migrated my CV from Pages to Markdown. This made it easy to integrate the CV’s content into the site’s about me page. As I was updating the CV, I …

Updated datasets describing Trove's digitised newspapers: The Trove newspapers section of the GLAM Workbench includes a number of notebooks and datasets that document the context and content of the newspaper corpus. I’ve just updated a few of these datasets: total number of issues per year for each newspaper in Trove; complete list of issues for …

Digital History stream at the AHA conference: The recently finished Australian Historical Association conference in Adelaide included a digital history stream sponsored by the Australian Research Data Commons. I’ve listed the details of all the presentations below. I also thought it might be useful to try and bring together links to the …

Understanding Trove at the AHA annual conference: A fairly intensive period of work came to an end today as I delivered a workshop on ‘Understanding Trove’ at the Australian Historical Association’s annual conference in Adelaide. In effect, the workshop was also the launch of the Trove Data Guide, which I’ve been developing …

June 2024

Who is the Trove Data Guide for?: The Trove Data Guide aims to help researchers understand, access, and use data from Trove. But just because it’s about ‘data’ doesn’t mean you need to be able to code. To understand Trove data and its possibilities for research, you first need to understand Trove itself – its history, its structure, …

Loading locations of Trove's digitised maps into the Gazetteer of Historical Australian Placenames: For this part of the ARDC’s Community Data Lab project, I’ve been focusing in particular on adding a series of researcher pathways to the Trove Data Guide. These pathways link data from Trove to a variety of tools and approaches and include five detailed tutorials. The first four were: …

Instant exhibitions with Trove and CollectionBuilder: You’ve been collecting and annotating items relating to your research project in a Trove List. You’d like to display the contents of your list as an online exhibition for others to explore. But how? One possible approach is now documented in the Trove Data Guide. I’ve added a tutorial which …

Keyword analysis of Trove newspapers with the GLAM Workbench & ATAP: There’s a new draft tutorial in the development version of the Trove Data Guide. It walks through the process of harvesting a collection of digitised newspaper articles from Trove, reshaping the harvest to create sub-collections, and then loading the data into the Keyword Analysis Tool …

Running Mirador on GitHub Pages: I’ve just created a GitHub repository template that you can use to get your own Mirador version 3 installation running in minutes. You can also configure it to display local or remote IIIF manifests. I was thinking that it could be useful for researchers who want to create their own customised …

May 2024

Commonwealth Hansard XML repository updates: Hey Australian Hansard fans, I’ve done a complete reharvest of all of the Commonwealth Hansard XML files from 1901 to 1980 from ParlInfo. There’s been lots of improvements/corrections, and most of the file names have changed (they now have a version flag). The improvements seem to be …

More tools for harvesting Trove newspaper articles: I’ve just added a couple of new notebooks to the Trove Newspaper & Gazette Harvester section of the GLAM Workbench. Using the Trove Harvester as a Python package provides a basic example of using the trove-newspaper-harvester Python package. While there’s already a simple web app …

Trove to Tropy via IIIF – documenting data pathways in the Trove Data Guide: Last week I added a notebook to the GLAM Workbench that saves a collection of images from Trove as an IIIF manifest. This week I’ve written a tutorial that shows how you can use the notebook to load the collection data in Tropy – a desktop tool for managing and annotating images for research. …

Using IIIF to explore Trove's digitised images: I’ve just added a new notebook to the Trove images section of the GLAM Workbench. It helps you save a collection of digitised images as an IIIF manifest. But what does that mean? It means the notebook packages up all the metadata describing the images in a standard form that can be used with a …

Using Pandora's collection of archived websites: There’s a brand new section of the GLAM Workbench to help you use data from Pandora’s collection of archived websites. What’s Pandora? Pandora is an initiative of the National Library of Australia which has been selecting web sites and online resources for preservation since 1996. …

April 2024

How to download all the images from a digitised collection in Trove (& learn some cool Trove tricks): Digitised resources in Trove are sometimes grouped into collections – an album of photographs, a set of posters, a bundle of letters. I’ve just added a notebook to the GLAM Workbench that downloads all the images in a collection at the highest available resolution. A sample of the 3,048 …

What do you want to do with Trove data?: In my work on the Trove Data Guide I’ve started sketching out a series of research pathways. These are intended as ways of connecting Trove data to tools and questions – providing examples of the steps involved in gathering, preparing, and using data to explore particular research topics. …

Update! Saving Trove newspaper articles and pages as images: You probably know that when you select the Download as Image option for a digitised newspaper article in Trove what you get back is not actually an image – it’s an HTML document, in which the original image has been sliced up to try and fit on an A4 page when printed. So this article: Ends up …

Getting to know NED – born-digital periodicals in Trove: I spend a lot of my time trying to highlight the wealth of resources available through Trove – whether that’s 25,000 digitised Parliamentary Papers, 6,000 oral histories you can listen to online, or 3,471 full-page editorial cartoons from The Bulletin. Most recently I’ve been working on …

March 2024

More tools and data for working with Trove's digitised periodicals: The Trove Periodicals section of the GLAM Workbench has been updated! Some changes were necessary to make use of version 3 of the Trove API, but I’ve also taken the chance to reorganise things a bit – starting with the name. This section used to be called ‘Trove journals’, …

A new way to explore editorial cartoons from *The Bulletin*: About five years ago I created a collection of full-page editorial cartoons from The Bulletin, harvested from Trove. Through a process that might be politely described as ‘iterative’, I fiddled with an assortment of queries and methods until I had at least one cartoon from every issue …

February 2024

New GLAM Workbench section for working with government publications in Trove: The GLAM Workbench has a brand new section aimed at helping you find and use government publications in Trove. Most of the GLAM Workbench’s existing sections focus on a particular resource format, or are related to one of Trove’s top-level categories. This didn’t quite work for …

Digital history stream at AHA annual conference in July: This year the annual conference of the Australian Historical Association will include a digital history stream, sponsored by the Australian Research Data Commons (ARDC), and convened by me! The call for papers is available here or through the Conference website. The list of possible topics is …

Some recent presentations on the GLAM Workbench and Trove Data Guide: Last week I attended the ARDC Workshop on Repositories & Workspaces where I gave a quick intro to the GLAM Workbench and the Community Data Lab. Then it was off to the ARDC HASS&I Research Data Commons Summer School where I explored some of the mysteries of Trove in a walk-through of the …

January 2024

Exploring Trove’s digitised periodicals: While Trove’s digitised newspapers get all the attention, there are many other digitised periodicals to explore. But it’s not easy to find them from the Trove web interface – unlike the newspapers, there’s no list of digitised titles. So to help researchers find and use Trove’s digitised …

The Trove Newspaper Data Dashboard now has an archive!: Since July 2022 I’ve been generating weekly snapshots of the contents of the Trove newspaper corpus. Every Sunday a new version of the Trove Newspaper Data Dashboard is created, highlighting what’s changed over the previous week, and visualising trends since April 2022 (when I first started regular …

Customising Datasette-Lite to explore datasets in the GLAM Workbench: As well as tools and code, the GLAM Workbench includes a number of pre-harvested datasets for researchers to play with. But just including a link to a CSV file in GitHub or Zenodo isn’t very useful – it doesn’t help researchers understand what’s in the dataset, and why it might be useful. That’s why …

What’s going on?: The hardest part of developing tools and resources like the GLAM Workbench is getting information about them to the people who might benefit. The collapse of Twitter has only added to the difficulty, as has the reluctance of GLAM organisations to share new resources with their users. I’d rather …

Exploring oral histories in Trove: The National Library of Australia holds over 55,000 hours of oral history and folklore recordings dating back to the 1950s. This collection is being made available online, and many recordings can now be listened to using Trove’s audio player. However, the oral history collection is not easy to find …

Mapping MARC Geographic Area codes to Wikidata: Trove uses codes from the MARC Geographic Areas list to identify locations in metadata records. I couldn’t find any mappings of these codes to other sources of geospatial information, so I fired up OpenRefine and reconciled the geographic area names against Wikidata. Once I’d linked as …

National Archives of Australia in 2023 – digitisation of files: In 2023 the National Archives of Australia digitised 416,602 files (down from 575,597 in 2022). This chart shows the number of files digitised per day in 2023. These files were drawn from 1,423 different series, but the vast bulk (81%) were from 4 series of World War Two service records. (This media …

Trove newspapers in 2023: I’ve been capturing weekly snapshots of the Trove newspaper corpus for the last couple of years. You can see the latest results in the Trove Newspaper Data Dashboard. Using this data I’ve compiled a quick summary of changes over the last year. 7,518,764 digitised newspaper articles were added to …

September 2023

Trove Data Guide update – accessing data from newspapers and gazettes: I’m continuing to slog away at the Trove Data Guide (part of the ARDC’s HASS Community Data Lab) – dumping everything I know about Trove into a format that I hope will be useful for researchers. I’ve just finished a first pass through the section on accessing data from newspapers and gazettes, and …

August 2023

Some important updates for the Trove Newspaper & Gazette Harvester: Version 3 of the Trove API is out, and version 2 is scheduled to be decommissioned in early 2023 – that means I have a lot of code to update! First cab off the rank is the Trove Newspaper & Gazette Harvester with version 0.7.1 now available. The Harvester is a Python package that can be used as …

Run GLAM Workbench notebooks on the ARDC’s new Binder service: There are a number of different ways to run the Jupyter notebooks in the GLAM Workbench depending on your needs and technical skills. But the easiest and quickest has always been the public, international Binder service, based in Europe. One click in the GLAM Workbench and Binder prepares a …

Trove Query Parser updated!: I’ve just updated the Trove Query Parser to work with version 3 of the Trove API. You just give it the url of a search in Trove’s newspapers, and it translates the search into a set of parameters that the API will understand. So this: …

Family history resources in the GLAM Workbench: It’s Family History Month, so I thought a brief post was in order describing some of the family history related resources in the GLAM Workbench. GLAM Name Index Search: This is the biggie (in more ways than one). I’ve brought 263 datasets from 10 Australian GLAM organisations together into a single …

Bye bye birdsite: In early June I pinned a “nobody’s home” post to my profile and said goodbye to Twitter. After 15 years, I was sad to leave behind friends and colleagues, but glad to get away from the hate, the nazis, and the transphobes. I hadn’t been posting much since Elno took over …

Exploring the front pages of newspapers (10 years on): Way back in 2012, I used the brand new Trove API to download the details of 4 million articles published on the front pages of newspapers. I did it for two reasons: first, I wanted to see how the content of front pages changed over time; and second, I wanted to show that large-scale data wrangling …

July 2023

Trove API Console updates: The Trove API Console provides examples of the Trove API in action that you can run, edit, and share. It’s been online for 9 years now, and I’ve just updated it to use version 3 of the Trove API by default. I’ve also added a new ‘Share’ button that makes it easier to share and embed examples. If you …

Getting to work on the Trove Data Guide: The ARDC has started work on the development of a HASS Community Data Lab to support digital research in the humanities. I’m part of the team of contractors, and my work package is focused on the development of a Trove Data Guide. My aim is to give researchers what they need to use and understand …

May 2023

Updated harvest of NSW State Archives indexes – more than 2 million rows of data!: The NSW State Archives (now part of Museums of History NSW) publishes a series of useful indexes to its collections. The indexes include basic data transcribed from the records, such as names, dates, and places, providing fine-grained access to the collections. But when they’re explored as data, the …

March 2023

A big milestone, Trove contributor data, and the coming of API v3 – recent GLAM Workbench updates: There have been quite a few GLAM Workbench updates over the last month, here’s some notes. (See February’s update for more recent changes…) General developments: After many months of work, all thirteen Trove repositories within the GLAM Workbench have been updated to include standard configurations, …

February 2023

Maps, people, lists & more – recent updates to Trove resources in the GLAM Workbench: Once again I’ve gotten a bit behind in noting GLAM Workbench updates, so here’s a quick catch up on some Trove-related changes from the last couple of months. Trove API introduction: The section that introduces the Trove API (or APIs!) hasn’t had much love over recent years. I’m hoping to add some …

December 2022

Real Face of White Australia – updated site to transcribe records from the National Archives of Australia: Back in 2017, I worked with students from my ‘Exploring Digital Heritage’ class at the University of Canberra to develop and launch a site to transcribe records from the National Archives of Australia relating to the administration of the White Australia Policy. The highlight was a weekend-long …

Recent presentations – Library of Congress Data Jam, Everyday Heritage, Wikidata, and GLAM Workbench!: October and November brought a flurry of presentations from which I’m still recovering. Here’s a few details and links. Library of Congress Data Jam: In October, the Computing Cultural Heritage in the Cloud project at the Library of Congress organised a Data Jam. I was invited to spend a couple of …

November 2022

The Australian history industry and the impact of digitisation (open access preprint chapter): The Australian History Industry was published recently. Edited by Paul Ashton and Paula Hamilton, the book ‘explores the complex, multi-roomed house of Australian history’, exploring academic, school, and public history, the impact of digital technologies, and the relationship of history to memory, …

Recent updates to trove-newspaper-harvester and trove-newspaper-images: Catching up on some software package updates over the last few months. The trove-newspaper-harvester package is now at v0.6.5. Recent changes include: fix to handle articles with missing metadata; don’t try to re-download existing text and PDF files on restart; better error messages for CLI …

September 2022

Do you want your Trove newspaper articles in bulk? Meet the new Trove Newspaper Harvester Python package!: The Trove Newspaper Harvester has been around in different forms for more than a decade. It helps you download all the articles in a Trove newspaper search, opening up new possibilities for large-scale analysis. You can use it as a command-line tool by installing a Python package, or through the …

From 48 PDFs to one searchable database – opening up the Tasmanian Post Office Directories with the GLAM Workbench: A few weeks ago I created a new search interface to the NSW Post Office Directories from 1886 to 1950. Since then, I’ve used the same process on the Sydney Telephone Directories from 1926 to 1954. Both of these publications had been digitised by the State Library of NSW and made available through …

Fresh harvest of OCRd text from Trove's digitised periodicals – 9gb of text to explore and analyse!: I’ve updated the GLAM Workbench’s harvest of OCRd text from Trove’s digitised periodicals. This is a completely fresh harvest, so should include any corrections made in recent months. It includes: 1,430 periodicals OCRd text from 41,645 issues About 9gb of text The easiest way to explore the …

Explore Trove's digitised newspapers by place: I’ve updated my map displaying places where Trove digitised newspapers were published or distributed. You can view all the places on a single map – zoom in for more markers, and click on a marker for title details and a link back to Trove. If you want to find newspapers from a particular area, …

Making NSW Postal Directories (and other digitised directories) easier to search with the GLAM Workbench and Datasette: As part of my work on the Everyday Heritage project I’m looking at how we can make better use of digitised collections to explore the everyday experiences woven around places such as Parramatta Road in Sydney. For example, the NSW Postal Directories from 1886 to 1908 and 1909 to 1950 have been …

August 2022

Interested in Victorian shipwrecks? Kim Doyle and Mitchell Harrop have added a new notebook to the Heritage Council of Victoria section of the GLAM Workbench exploring shipwrecks in the Victorian Heritage Database: glam-workbench.net/heritage-…

Updates! troveharvester Python package updated to v0.5.1: github.com/wragge/tr… Trove Newspaper Harvester section of #GLAMWorkbench updated to v1.1.1 to use latest troveharvester: glam-workbench.net/trove-har…

Minor update to RecordSearch Data Scraper – now captures ‘institution title’ for agencies if it is present. pypi.org/project/r…

Many thanks to the British Library – sponsors of the GLAM Workbench’s web archives section!: You might have noticed some changes to the web archives section of the GLAM Workbench. I’m very excited to announce that the British Library is now sponsoring the web archives section! Many thanks to the British Library and the UK Web Archive for their support – it really makes a difference. The web …

New GLAM data to search, visualise and explore using the GLAM Workbench!: There’s lots of GLAM data out there if you know where to look! For the past few years I’ve been harvesting a list of datasets published by Australian galleries, libraries, archives, and museums through open government data portals. I’ve just updated the harvest and there’s now 463 datasets …

Zotero now saves links to digitised items in Trove from the NLA catalogue!: I’ve made a small change to the Zotero translator for the National Library of Australia’s catalogue. Now, if there’s a link to a digitised version of the work in Trove, that link will be saved in Zotero’s url field. This makes it quicker and easier to view digitised items – …

View embedded JSON metadata for Trove's digitised books and journals: The metadata for digitised books and journals in Trove can seem a bit sparse, but there’s quite a lot of useful metadata embedded within Trove’s web pages that isn’t displayed to users or made available through the Trove API. This notebook in the GLAM Workbench shows you how you …

July 2022

Where did all those NSW articles go? Trove Newspapers Data Dashboard update!: I was looking at my Trove Newspapers Data Dashboard again last night trying to figure out why the number of newspaper articles from NSW seemed to have dropped by more than 700,000 since my harvesting began. It took me a while to figure out, but it seems that the search index was rebuilt on 31 May, …

Catching up – some recent GLAM Workbench updates!: There’s been lots of small updates to the GLAM Workbench over the last couple of months and I’ve fallen behind in sharing details. So here’s an omnibus list of everything I can remember… Data Weekly harvests of basic Trove newspaper data continue, there’s now about three months’ worth. You can …

Calling all Tasmanian historians – you can now save resources from Libraries Tasmania into Zotero!: I’ve created a Zotero translator for the Libraries Tasmania catalogue. Using it, you can save metadata and digital resources to your own research database with a single click. Libraries Tasmania actually has three catalogues rolled into one – the main library catalogue, the Archives catalogue, and …

Updated dataset! Harvests of Trove list metadata from 2018, 2020, and 2022 are now available on Zenodo: doi.org/10.5281/z… Another addition to the growing collection of historical Trove data. #GLAMWorkbench

Updated dataset! Details of 2,201,090 unique public tags added to 9,370,614 resources in Trove between August 2008 and July 2022. Useful for exploring folksonomies, and the way people organise and use massive online resources like Trove. doi.org/10.5281/z…

Ok, I’ve created a Zenodo community for datasets documenting changes in the content and structure of Trove. Lots more to add… zenodo.org/communiti…

Coz I love making work for myself, I’ve started pulling datasets out of #GLAMWorkbench code repos & creating new data repos for them. This way they’ll have their own version histories in Zenodo. Here’s the first: github.com/GLAM-Work…

June 2022

Ahead of my session at #OzHA2022 tomorrow, I’ve updated the NAA section of the #GLAMWorkbench. Come along to find out how to harvest file details, digitised images, and PDFs, from a search in RecordSearch! github.com/GLAM-Work…

55,633 items digitised by the National Archives of Australia last week. Including: Bonegilla name index cards (A2571 & A2572): +42,434 CMF Personnel Dossiers (B884): +10,150 Aust Women’s Land Army personnel cards (C610): +961 github.com/wragge/na…

Newspapers added to Trove last week Freelance (WA) The Standard (WA) Berrigan Advocate (NSW) Baileys Sporting & Dramatic Weekly (WA) Farmers' Weekly (WA) Harvey-Waroona Mail (WA) W.A. Family Sphere (WA) Coonabarabran Times (NSW) github.com/wragge/tr…

Noticed that QueryPic was having a problem with some date queries. Should be fixed in the latest release of the Trove Newspapers section of the #GLAMWorkbench: glam-workbench.net/trove-new… #maintenance #researchinfrastructure

The Trove Newspapers section of the #GLAMWorkbench has been updated! Voilà was causing a problem in QueryPic, stopping results from being downloaded. A package update did the trick! Everything now updated & tested. glam-workbench.net/trove-new…

Some more #GLAMWorkbench maintenance – this app to download high-res page images from Trove newspapers now doesn’t require an API key if you have a url, & some display problems have been fixed. trove-newspaper-apps.herokuapp.com/voila/ren…

The Trove Newspaper and Gazette Harvester section of the #GLAMWorkbench has been updated! No major changes to notebooks, just lots of background maintenance stuff such as updating packages, testing, linting notebooks etc. glam-workbench.net/trove-har…

Main changes to individual Trove newspapers last week: +19,862 articles in Daily News (WA) +10,822 articles in Dalgety’s Review (WA) +13,352 articles in Manning River News… (NSW) See: github.com/wragge/tr…

Changes to Trove newspapers last week: +86,761 articles +16,367 articles with corrections +5,355 articles with tags +417 articles with comments See: github.com/wragge/tr…

42,472 files were digitised by the National Archives of Australia last week. 36,238 of these were migrant registration cards from Bonegilla (A2571 & A2572). Here’s a screenshot of the top twenty series. More info: github.com/wragge/na…

I wrote up something for the #GLAMWorkbook on ‘Empty searches and hacking urls’: glam-workbook.net/url-hacki…

Under development – a Zotero translator for Libraries Tasmania!: I’ve created a Zotero translator for the Libraries Tasmania catalogue. Using it, you can save metadata and digital resources to your own research database with a single click. Libraries Tasmania actually has three catalogues rolled into one – the main library catalogue, the archives catalogue, and …

Ok, I’ve submitted my Libraries Tasmania translator to the Zotero repository for inclusion. No doubt a bit of additional tweaking will be required. github.com/zotero/tr…

Getting to work migrating the Real Face of White Australia transcription site from scribeAPI (no longer maintained) to Zooniverse Panoptes. First the workflows and config, then the subjects, then the current transcription data…

It’s getting there – new Real Face of White Australia site using Datasette, IIIF, and Universal Viewer…

Ordering some #GLAMWorkbench stickers…

May 2022

After much faffing about today I’ve got the latest version of the UniversalViewer building locally, and I also know how to listen for changes in the canvas index. Next, I have to do something about the CSS name clashes with Bulma…

Files digitised by the National Archives of Australia this week: 25,981 Top series: +14,025 in B884 (CMF personnel dossiers) +7,652 in A2572 (Bonegilla name index cards) See: github.com/wragge/na…

New in Trove this week: Berrigan Advocate (NSW) Tingha Spectator & North Western Journal (NSW) Tingha Miner & North Western Advocate (NSW) Boggy Camp Tingha & Bora Creek (NSW) See: github.com/wragge/tr…

Major changes to Trove newspaper titles this week: +30,713 in Queanbeyan Age (NSW, end date now 1944) +10,014 in Daily News (WA) +179 in Inverell Argus (NSW) See: github.com/wragge/tr…

Added to Trove newspapers this week: +43,015 articles +10,263 articles with corrections +7,499 articles with tags +243 articles with comments See: github.com/wragge/tr…

Using Datasette on Nectar: If you have a dataset that you want to share as a searchable online database then check out Datasette – it’s a fabulous tool that provides an ever-growing range of options for exploring and publishing data. I particularly like how easy Datasette makes it to publish datasets on cloud services like …

This week the National Archives of Australia digitised 21,488 files: +10,305 in B884 (CMF personnel dossiers) +7,339 in A2572 & A2571 (Bonegilla name index cards) +1,345 in A9301 (RAAF personnel files) See: github.com/wragge/na…

Major changes to individual Trove newspapers this week: +52,543 articles from The Daily News (WA) +22,316 articles from Sunraysia Daily (Vic) See: github.com/wragge/tr…

Changes to Trove newspapers this week: +74,859 articles +15,852 articles with corrections +8,668 articles with tags +592 articles with comments See: github.com/wragge/tr…

Convert your Trove newspaper searches to an API query with just one click!: I’m thinking about the Trove Researcher Platform discussions & ways of integrating Trove with other apps and platforms (like the GLAM Workbench). As a simple demo I modified my Trove Proxy app to convert a newspaper search url from the Trove web interface into an API query (using the …

Also on tonight’s episode of ‘In GLAM This Week’, the NAA digitised 20,517 files. Of these: 13,850 were name index cards from the Bonegilla Migrant Camp (Series A2571 & A2572). 4,938 were CMF personnel files (Series B884) See: github.com/wragge/na…

This week’s changes in Trove newspapers: +7,451 articles +16,628 articles with corrections +8,148 articles with tags +491 articles with comments github.com/wragge/tr…

Another overdue maintenance task completed! The Tung Wah Newspaper index has been migrated from a custom Django app on a now defunct web host, to a nice, neat Datasette instance on Heroku. https://resources.chineseaustralia.org/tung_wah_newspaper_index

TIL you can do date maths in Trove. Searching for date:[NOW-10YEAR TO NOW] in newspapers & gazettes returns articles from the last 10 years. Try it: trove.nla.gov.au/search/ca…

My Trove researcher platform wishlist: The ARDC is collecting user requirements for the Trove researcher platform for advanced research. This is a chance to start from scratch, and think about the types of data, tools, or interface enhancements that would support innovative research in the humanities and social sciences. The ARDC will be …

Spending the evening updating the NAA section of the #GLAMWorkbench. Here’s a fresh harvest of the agency functions currently being used in RecordSearch… gist.github.com/wragge/d1…

This morning has been all bug hunting. But at least I’ve now found & fixed the problem, & released version 0.0.14 of the RecordSearch Data Scraper! https://github.com/wragge/recordsearch_data_scraper/releases/tag/v0.0.14 Also updated on Pypi.

Changes in Trove newspaper titles in the last week: +26,731 articles from Daily News (WA) +88,485 articles from Sunraysia Daily (Vic) +6,810 articles from Dalgety’s Review (WA) See: github.com/wragge/tr…

Changes in Trove newspapers in the past week: +124,404 articles available +17,306 articles with corrections +8,269 articles with tags +523 articles with comments See: github.com/wragge/tr…

Somewhat unexpectedly the US National Archives & Records Administration catalogue includes some historic photos of Tasmania… catalog.archives.gov/search

I’ve got a site, I suppose I need to add some content now… glam-workbook.net #GLAMWorkbook

Working with Trove data – a collection of tools and resources: The ARDC is organising a couple of public forums to help gather researcher requirements for the Trove component of the HASS RDC. One of the roundtables will look at ‘Existing tools that utilise Trove data and APIs’. Last year I wrote a summary of what the GLAM Workbench can contribute to …

New articles available this week from specific Trove newspapers: +39,296 from Daily News (WA) +39,810 from Australian Jewish Herald (Vic) +104,190 from Sunraysia Daily (Vic) More: github.com/wragge/tr…

Changes to Trove newspapers in the last week: +211,393 articles available +15,738 articles with corrections +8,233 articles with tags +499 articles with comments More here: github.com/wragge/tr…

April 2022

And so it starts… #GLAMWorkbench

Followed up my last FOI request about HASS research infrastructure to find out why the appendix was missing, & now it’s available! Includes an interesting (& selective) list of HASS infrastructure projects. www.dropbox.com/sh/uq7vz4…

And with that I think I’ll call it quits for #DayofDH2022 in this part of the world. Time to go check the backyard for pademelons before turning in. Hope all the DH-y types around the world have an interesting, inspiring & productive day!

Ok, I’ve created a new #GLAMWorkbench meta issue to try and bring together all the things I’m trying to do to improve & automate the code & documentation. This should help me keep track of things… github.com/GLAM-Work… #DayofDH2022

A couple of hours of #DayofDH2022 left – feeling a bit uninspired, so I’m going to do some pruning & reorganising of the #GLAMWorkbench issues list: github.com/GLAM-Work…

Hmm, didn’t get any writing done and now it’s time to cook dinner for the family. Bit of a frustrating day really. Might do a bit of coding later. #DayofDH2022

Workspace photo for #DayofDH2022. Old iMac for the socials & meetings. Newish laptop running Pop!_Os for dev work. To my left is a big window facing the backyard through which I can see visiting rosellas (Green & Eastern).

Now after coding, bug chasing, meeting, & prototype demo, I have to try and get into a headspace to do some writing about government surveillance records in the National Archives of Australia. But first some late lunch! #DayofDH2022

Meeting and demo done. Love working with ANU Archives, & really excited about the latest version of the Sydney Stock Exchange database. Still WIP but you can play with the prototype here: sydney-stock-exchange-xqtkxtd5za-ts.a.run.app/stock_exc… #DayofDH2022

I’ve hit a bit of a brick wall with my Datasette deployment. I’m probably missing something obvious. Suggestions welcome… github.com/simonw/da… #DayofDH2022

And Cloudstor is down for maintenance. I suppose I won’t be doing that demo then… #DayofDH2022

Perhaps appropriately for #DayofDH2022, I’ve spent most of the morning trying to hunt down a bug. Everything works perfectly running locally, but fails mysteriously when deployed to the cloud…

Micro.blog offers another alternative for people wanting more control over their socials. I’m posting this from my Micro account – it’ll be cross-posted to Mastodon & Twitter, saved to GitHub, syndicated via RSS, and accessible from updates.timsherratt.org

Tracking Trove changes over time: I’ve been doing a bit of cleaning up, trying to make some old datasets more easily available. In particular I’ve been pulling together harvests of the number of newspaper articles in Trove by year and state. My first harvests date all the way back to 2011, before there was even a Trove …

March 2022

Adventures in FOI – HASS RDC Scoping Studies: So my FOI request to release the scoping studies that informed investments in the current round of ARDC-managed HASS research infrastructure development was partially successful. As I’ve previously noted, reports from the ARDC and Academy of Humanities are now publicly available. There was a …

The GLAM Workbench wants you!: Over the past few months I’ve been doing a lot of behind-the-scenes work on the GLAM Workbench – automating, standardising, and documenting processes for developing and managing repositories. These sorts of things ease the maintenance burden on me and help make the GLAM Workbench sustainable, …

February 2022

Omeka S Tools – new Python package: Over the last couple of years I've been fiddling with bits of Python code to work with the Omeka S REST API. The Omeka S API is powerful, but the documentation is patchy, and doing basic things like uploading images can seem quite confusing. My code was an attempt to simplify common tasks, like …

January 2022

Zotero support in Australian GLAMs: Last year I started compiling information about the level of Zotero integration provided by Australian GLAM organisations through their online collections. The basic test is, can Zotero capture useful, structured information about an item from the collection interface. The results are not great. …

Testing, testing...: I regularly update the Python packages used in the different sections of the GLAM Workbench; though probably not as often as I should. Part of the problem is that once I've updated the packages, I have to run all the notebooks to make sure I haven't inadvertently broken something -- and this takes …

December 2021

Some big pictures of newspapers in Trove and DigitalNZ: One of the things I really like about Jupyter is the fact that I can share notebooks in a variety of different formats. Tools like QueryPic can run as simple web apps using Voila, static versions of notebooks can be viewed using NBViewer, and live versions can be spun up as required on Binder. It’s …

Exploring GLAM data at ResBaz: The video of my key story presentation at ResBaz Queensland (simulcast via ResBaz Sydney) is now available on Vimeo. In it, I explore some of the possibilities of GLAM data by retracing my own journey through WWI service records, The Real Face of White Australia, #redactionart, and Trove – ending up …

GLAM Workbench Nectar Cloud Application updated!: The newly-updated DigitalNZ and Te Papa sections of the GLAM Workbench have been added to the list of available repositories in the Nectar Research Cloud’s GLAM Workbench Application. This means you can create your very own version of these repositories running in the Nectar Cloud, simply by …

DigitalNZ & Te Papa sections of the GLAMWorkbench updated!: In preparation for my talk at ResBaz Aotearoa, I updated the DigitalNZ and Te Papa sections of the GLAM Workbench. Most of the changes are related to management, maintenance, and integration of the repositories. Things like: Setting up GitHub actions to automatically generate Docker images when the …

November 2021

A template for GLAM Workbench development: I’m hoping that the GLAM Workbench will encourage GLAM organisations and GLAM data nerds (like me) to create their own Jupyter notebooks. If they do, they can put a link to them in the list of GLAM Jupyter resources. But what if they want to add the notebooks to the GLAM Workbench itself? To make …

More thoughts on the Trove researcher platform for advanced research: Previously on ‘What could we do with $2.3 million?’, the National Library of Australia produced a draft plan for an ‘Advanced Researcher Platform’ that was thoroughly inadequate. Rather than submit this plan to the ARDC for consideration as part of the HASS RDC process, the NLA wisely decided to …

Coming up! GLAM Workbench at ResBaz(s): Want a bit of added GLAM with your digital research skills? You’re in luck, as I’ll be speaking at not one, but three ResBaz events in November. If you haven’t heard of it before, ResBaz (Research Bazaar) is ‘a worldwide festival promoting the digital literacy at the centre of modern research’. On …

New video – using the Trove Newspaper & Gazette Harvester: The latest help video for the GLAM Workbench walks through the web app version of the Trove Newspaper & Gazette Harvester. Just paste in your search url and Trove API key and you can harvest thousands of digitised newspaper articles in minutes!

Harvest newspaper issues as PDFs: An inquiry on Twitter prompted me to put together a notebook that you can use to download all available issues of a newspaper as PDFs. It was really just a matter of copying code from other tools and making a few modifications. The first step harvests a list of available issues for a particular …

October 2021

GLAM Workbench now in the Nectar Research Cloud!: The GLAM Workbench isn’t dependent on one big piece of technological infrastructure. It’s basically a collection of Jupyter notebooks, and those notebooks can be used within a variety of different environments. This helps make the GLAM Workbench more sustainable – new components can be swapped in …

More GLAM Name Index updates from Queensland State Archives and SLWA: A new version of the GLAM Name Index Search is available. An additional 49 indexes have been added, bringing the total to 246. You can now search for names in more than 10.2 million records from 9 organisations. The new indexes come from Queensland State Archives and the State Library of WA. QSA …

Getting data about newspaper issues in Trove: When you search Trove’s newspapers, you find articles – these articles are grouped by page, and all the pages from a particular date make up an issue. But how do you find out what issues are available? How do you get a list of dates when newspapers were published? This notebook in the GLAM …

GLAM Workbench at eResearch Australasia 2021: Way back in 2013, I went to the eResearch Australasia conference as the manager of Trove to talk about new research possibilities using the Trove API. Eight years later I was back, still spruiking the possibilities of Trove data. This time, however, I was discussing Trove in the broader …

New Python package to download Trove newspaper images: There’s no reliable way of downloading an image of a Trove newspaper article from the web interface. The image download option produces an HTML page with embedded images, and the article is often sliced into pieces to fit the page. This Python package includes tools to download articles as …

September 2021

More records for the GLAM Name Index Search: Two more datasets have been added to the GLAM Name Index Search! From the History Trust of South Australia and Collab, I’ve added: Passengers in History – that’s 371,894 records of people arriving in South Australia from 1836 to 1961 Women’s Suffrage Petition 1894 (South Australia) – another 10,638 …

New preprint – ‘More than newspapers’: Here’s the preprint version of an article, ‘More than newspapers’, that I’ve submitted for a forum about Trove in a forthcoming issue of History Australia.

More QueryPic in action: Recently I created a list of publications that made use of QueryPic, my tool to visualise searches in Trove’s digitised newspapers. Here’s another example of the GLAM Workbench and QueryPic in action, in Professor Julian Meyrick’s recent keynote lecture, ‘Looking Forward to the 1950s: A …

Some thoughts on the ‘Trove Researcher Platform for Advanced Research’ draft plan: Late last year the Federal Government announced it was making an $8.9 million investment in HASS and Indigenous research infrastructure. This program is being managed by the ARDC and will lead to the development of a HASS Research Data Commons. According to the ARDC, a research data commons: brings …

August 2021

Some research projects that have used QueryPic: A Twitter thread about some of the research uses of QueryPic… QueryPic, my tool for visualising searches in @TroveAustralia’s digitised newspapers, has been around in different forms for more than 10 years. The latest version is part of the #GLAMWorkbench: https://t.co/qnY5tVDwgY …

Government publications in Trove: Over the last few weeks I’ve been updating my harvests of OCRd text from digitised books and periodicals in Trove. As part of the harvesting process, I’ve created lists of both that are available in digital form – this includes digitised works, as well as those that are born-digital (such as PDFs or …

GLAM Workbench – a platform for digital HASS research: We’re in the midst of planning for the HASS Research Data Commons, which will deliver some much-needed investment in digital research infrastructure for the humanities and social sciences. Amongst the funded programs are tools for text analysis as part of the Linguistics Data Commons, and a platform …

A Family History Month experiment – search millions of name records from GLAM organisations: There’s a lot of rich historical data contained within the indexes that Australian GLAM organisations provide to help people navigate their records. These indexes, often created by volunteers, allow access by key fields such as name, date or location. They aid discovery, but also allow new forms of …

Explore Trove’s digitised books: The Trove books section of the GLAM Workbench has been updated! There’s freshly-harvested data, as well as updated Python packages, integration with Reclaim Cloud, and automated Docker builds. Included is a notebook to harvest details of all books available from Trove in digital form. This includes …

A miscellany of ephemera, oddities, & estrays: I’m just in the midst of updating my harvest of OCRd text from Trove’s digitised books (more about that soon!). But amongst the items catalogued as ‘books’ are a wide assortment of ephemera, posters, advertisements, and other oddities. There’s no consistent way of identifying these items through the …

Everyday heritage and the GLAM Workbench: Some good news on the funding front with the success of the Everyday Heritage project in the latest round of ARC Linkage grants. The project aims to look beyond the formal discourses of ‘national’ heritage to develop a more diverse range of heritage narratives. Working at the intersection of place, …

Recent GLAM Workbench presentations: So far this year I’ve given eight workshops or presentations relating to the GLAM Workbench, with probably a few more yet to come. Here’s the latest: Introducing the GLAM Workbench, presentation for the Griffith University Centre for Social and Cultural Research, Digital Humanities Seminar Series, …

Updated! Lots and lots of text freshly harvested from Trove periodicals: For a few years now I’ve been harvesting downloadable text from digitised periodicals in Trove and making it easily available for exploration and research. I’ve just completed the latest harvest – here’s the summary: 1,163 digitised periodicals had text available for download Text was downloaded …

New dataset – Politicians talking about COVID: The Trove Journals section of the GLAM Workbench includes a notebook that helps you download press releases, speeches, and interview transcripts by Australian federal politicians. These documents are compiled and published by the Parliamentary Library, and the details are regularly harvested into …

July 2021

8 million Trove tags to explore!: I’ve always been interested in the way people add value to resources in Trove. OCR correction tends to get all the attention, but Trove users have also been busy organising resources using tags, lists, and comments. I used to refer to tagging quite often in presentations, pointing to the different …

Integrating GLAM Workbench news and discussion: I’ve spent a lot of time this year working on ways of improving the GLAM Workbench’s documentation and its integration with other services. Last year I created OzGLAM Help to provide a space where users of GLAM collections could ask questions and share discoveries – including a dedicated GLAM …

GLAM Workbench now on YouTube!: I’ve started creating short videos to introduce or explain various components of the GLAM Workbench. The first video shows how you can visualise searches in Trove’s digitised newspapers using the latest version of QueryPic. It’s a useful introduction to the way access to collection data enables us …

June 2021

GLAM Workbench office hours: To help you make use of the GLAM Workbench, I’ve set up an ‘office hours’ time slot every Friday when people can book in for 30 minute chats via Zoom. Want to talk about how you might use the GLAM Workbench in your latest research project? Are you having trouble getting started with GLAM data? Or …

‘Missing Links’ – new open access article!: An article written by Kate Bagnall and me has just been published in a special issue of the Journal of World History focusing on digital history. And it’s open access! The article is ‘Missing Links: Data Stories from the Archive of British Settler Colonial Citizenship’. In it we document our efforts …

QueryPic: The Next Generation: QueryPic is a tool to visualise searches in Trove’s digitised newspapers. I created the first version way back in 2011, and since then it’s taken a number of different forms. The latest version introduces some new features: Automatic query creation – construct your search in the Trove web …

Everyone gets a Lab!: I recently took part in a panel at the IIPC Web Archiving Conference discussing ‘Research use of web archives: a Labs approach’. My fellow panellists described some amazing stuff going on in European cultural heritage organisations to support researchers who want to make use of web archives. My …

Minor change to Reclaim Cloud config: When the 1-click installer for Reclaim Cloud works its magic and turns GLAM Workbench repositories into your own, personal digital labs, it creates a new work directory mounted inside of your main Jupyter directory. This new directory is independent of the Docker image used to run Jupyter, so it’s a …

Preprint! The limits and affordances of online collections: I’ve been working on an essay for publication in a forthcoming edited collection. I wanted to explore how the practice of history in Australia had been changed by GLAM organisations making their collections available online – both the new possibilities that had emerged, and the problems that …

Trove Query Parser: Here’s a new little Python package that you might find useful. It simply takes a search url from Trove’s Newspapers & Gazettes category and converts it into a set of parameters that you can use to request data from the Trove API. While some parameters are used both in the web interface and the …

Some GLAM Workbench stats: I deliberately don’t keep any stats about GLAM Workbench visits, because I think they’re pretty meaningless. On the other hand, I’m always interested to see how often GLAM Workbench repositories are launched on Binder. Rather than just random clicks, these numbers represent the number of times users …

More Reclaim Cloud integrations!: Five of the GLAM Workbench repositories now have automatically built Docker images and 1-click integration with Reclaim Cloud – ANU Archives, Trove Newspapers, Trove Newspaper Harvester, NAA RecordSearch, & Web Archives. This means you can launch your very own version of these GLAM Workbench …

Get your GLAM datasets here!: I’ve updated my harvest of Australian GLAM datasets from state/national government open data portals. There’s now 387 datasets, containing 1049 files (including 684 CSVs). There’s a list if you want to browse, and a CSV file if you want to download all the metadata. For more information see the …

May 2021

NAA RecordSearch section of the GLAM Workbench updated!: If you work with the collections of the National Archives of Australia, you might find the RecordSearch section of the GLAM Workbench helpful. I’ve just updated the repository to add new options for running the notebooks, including 1-click installation on Reclaim Cloud. There’s also a few new …

Web archives section of GLAM Workbench updated!: My program of rolling out new features and integrations across the GLAM Workbench continues. The latest section to be updated is the Web Archives section! There are no new notebooks with this update, but some important changes under the hood. If you haven’t used it before, the Web Archives section …

Using web archives to find out when newspapers were added to Trove: There’s no doubt that Trove’s digitised newspapers have had a significant impact on the practice of history in Australia. But analysing that impact is difficult when Trove itself is always changing – more newspapers and articles are being added all the time. In an attempt to chart the development of …

GLAM Jupyter Resources: To make it easier for people to suggest additions, I’ve created a GitHub repository for my list of GLAM Jupyter examples and resources. Contributions are welcome! This list is automatically pulled into the GLAM Workbench’s help documentation. #dhhacks

Running notebooks – a sign of things to come in the GLAM Workbench: I recently made some changes in the GLAM Workbench’s Help documentation, adding a new Running notebooks section. This section provides detailed information on running and managing GLAM Workbench repositories using Reclaim Cloud and Docker. I’m still rolling out this functionality across all the …

Sponsor my work on GitHub!: As I foreshadowed some weeks ago, I’ve shut down my Patreon page. Thanks to everyone who has supported me there over the last few years! I’ve now shifted across to GitHub Sponsors, which is focused on supporting open source projects. This seems like a much better fit for the things that I do, which …

Updates to the Trove Newspapers section of GLAM Workbench: I’ve updated, refreshed, and reorganised the Trove newspapers section of the GLAM Workbench. There’s currently 22 Jupyter notebooks organised under the following headings: Trove newspapers in context – Notebooks in this section look at the Trove newspaper corpus as a whole, to try and understand …

April 2021

Introducing the new, improved RecordSearch Data Scraper!: It was way back in 2009 that I created my first scraper for getting machine-readable data out of the National Archives of Australia’s online database, RecordSearch. Since then I’ve used versions of this scraper in a number of different projects such as The Real Face of White Australia, Closed …

Secrets and lives: Here’s the video of my presentation, ‘Secrets and lives’, for the (Re)create symposium at the University of Canberra, 21 April 2021. It’s mainly about finding and resting redactions in ASIO surveillance files held by the National Archives of Australia. Secrets and lives from Tim Sherratt on Vimeo. …

March 2021

Recently digitised files in the National Archives of Australia: I’m interested in understanding what gets digitised and when by our cultural institutions, but accessible data is scarce. The National Archives of Australia lists ‘newly scanned’ records in RecordSearch, so I thought I’d see if I could convert that list into a machine-readable form for analysis. …

Moving on from Patreon...: Over the last few years, I’ve been very grateful for the support of my Patreon subscribers. Financially, their contributions have helped me cover a substantial proportion of the cloud hosting costs associated with projects like Historic Hansard and The Real Face of White Australia. But, more …

What can you do with the GLAM Workbench?: You might have noticed some changes to the GLAM Workbench home page recently. One of the difficulties has always been trying to explain what the GLAM Workbench actually is, so I thought it might be useful to put more examples up front. The home page now lists about 25 notebooks under the headings: …

Reclaim Cloud integration coming soon to the GLAM Workbench: I’ve been doing a bit of work behind the scenes lately to prepare for a major update to the GLAM Workbench. My plan is to provide one click installation of any of the GLAM Workbench repositories on the Reclaim Cloud platform. This will provide a useful step up from Binder for any researcher who …

Some recent GLAM Workbench presentations: I’ve given a couple of talks lately on the GLAM Workbench and some of my other work relating to the construction of online access to GLAM collections. Videos and slides are available for both: From collections as data to collections as infrastructure: Building the GLAM Workbench, seminar for the …

Some GLAM Workbench datasets to explore for Open Data Day: It was Open Data Day on Saturday 6 March – here’s some of the ready-to-go datasets you can find in the GLAM Workbench – there’s something for historians, humanities researchers, teachers & more! First here’s a list of Australian GLAM (that’s galleries, libraries, archives & museums) data …

February 2021

Zotero translator for NAA RecordSearch updated: The recent change of labels from ‘Barcode’ to ‘Item ID’ in the National Archives of Australia’s RecordSearch database broke the Zotero translator. I’ve now updated the translator, and the new version has been merged into the Zotero translators repository. It should be updated when you restart …

TroveNewsBot upgraded – now sharing articles published ‘on this day’!: @TroveNewsBot has been sharing Trove newspaper articles on Twitter for over 7 years. With its latest upgrade the bot now has an ‘on this day’ function. Every day at 9.00am AEST, TroveNewsBot will share an article published on that day in the past. Even better, you can make your own ‘on this day’ …

The NAA recently changed field labels in RecordSearch, so that ‘Barcode’ is now ‘Item ID’. This required an update to my recordsearch_tools screen scraper. I also had to make a few changes in the RecordSearch section of the GLAM Workbench. #dhhacks

Open access publishing for Australian historians: After some recent investigations of the availability of open access versions of articles published in paywalled Australian history journals, I’ve started a Google doc to capture useful links and information for Australian historians wanting to make their research open access. Comments and additions …

Who was linking to Trove newspapers in 2014?: In 2014 I pulled together a sample of web pages that included links back to digitised newspaper articles in Trove and created the ‘Trove Traces’ app. It was interesting, and sometimes disturbing, to see the diversity of sites that made use of Trove. Amongst the family and local history enthusiasts …

New! DigitalNZ API Query Builder added to GLAM Workbench: I’ve added an API Query Builder to the DigitalNZ section of the GLAM Workbench. You can use it to learn about the different parameters available from the search API, and experiment with different queries. Just get your API key from DigitalNZ, then try entering keywords and selecting options. Once …

January 2021

OpenGLAM fireworks! Finding open collections in DigitalNZ: Lately I’ve been updating and expanding the notebooks in the DigitalNZ section of the GLAM Workbench. In particular, I’ve been looking at the usage facet to understand how much of the aggregated content is ‘open’. What do I mean by ‘open’? The Open Knowledge Foundation definition states that ‘open …

Easy browsing of Trove newspapers with these keyboard shortcuts!: If you like browsing Trove’s digitised newspapers page by page, you might have found that the current interface is a bit clunky. To move between pages you have to hover over the page number and click on ‘Next’ or ‘Previous’. Wouldn’t it be good if you could just …

New dataset and notebooks – twenty years of ABC Radio National: There’s a new GLAM Workbench section for working with data from Trove’s Music & Sound zone! Inside you’ll find out how to harvest all the metadata from ABC Radio National program records – that’s 400,000+ records, from 160 Radio National programs, over more than 20 years. It’s …

Finding non-English newspapers in Trove: There are a growing number of non-English newspapers in Trove, but how do you know what’s there? After trying a few different approaches, I generated a list of 48 newspapers with non-English content. The full details are in this notebook. As the notebook describes, I found the language …

Open access versions of Australian history articles: Last year I did some analysis of the availability of open access versions of research articles published between 2008 and 2018 in Australian Historical Studies. I’ve now broadened this out to cover all individual articles (with a DOI) across a number of journals. It’s pretty grim. …

A long thread exploring files in the National Archives of Australia with the access status of ‘closed’. This is the 6th consecutive year I’ve harvested ‘closed’ files on or about 1 January. It’s January 1, the day each year when our minds turn to newly released Cabinet records from …

More updates from The Real Face of White Australia – running facial detection code over NAA: SP42/1. Finished! NAA: SP42/1 is a general correspondence series from the Collector of Customs in Sydney. It includes many files relating to the administration of the White Australia Policy. 3,375 files have …

I reharvested NAA: ST84/1 and ended up with 14,545 images from 461 digitised files (about 17% of the total series). In these images I found 9,970 faces – this is a couple of thousand more than when I used OpenCV in 2010/11 for the original wall of faces. https://t.co/BAnkX7u83S — Tim Sherratt …

December 2020

GLAM Workbench wins British Library Labs Research Award!: Asking questions with web archives – introductory notebooks for historians has won the British Library Labs Research Award for 2020. The awards recognise ‘exceptional projects that have used the Library’s digital collections and data’. This project gave me a chance to work with web …

Want to relive the early days of digital humanities in Australia? I’ve archived the websites created for THATCamp Canberra in 2010, 2011, and 2014. They’re now static sites so search and commenting won’t work, but all the content should be there! #dhhacks

The Invisible Australians website has been given a much needed overhaul, and we’ve brought all our related projects together under the title The real face of White Australia. This includes an updated version of the wall of faces. #dhhacks

The GLAM Workbench as research infrastructure (some basic stats): Repositories in the GLAM Workbench have been launched on Binder 3,529 times since the start of this year (according to data from the Binder Events log). That’s repository launches, not notebooks. Having launched a repository, users might use multiple notebooks. And of course these stats don’t …

November 2020

Earlier this year I gave a seminar for the International Internet Preservation Consortium (IIPC) introducing the web archives section of the GLAM Workbench. The seminar is now available online: youtu.be/rVidh_wex… Here are the slides if you want to follow along. #dhhacks

Harvest text from the Australian Women's Weekly!: The Trove Newspaper & Gazette Harvester has been updated to version 0.4.0. The major change is that if the OCRd text for an article isn’t available through the API, it will be automatically downloaded via the web interface. What does this mean in practice? Well previously you …

Beyond the copyright cliff of death: If you’ve done any searching in Trove’s digitised newspapers, you’ve probably noticed that there aren’t many results after 1954. This is basically because of copyright restrictions (though given the complexities of Australia’s copyright system, you can’t be sure …

Updated! Find Trove newspapers by place of publication by using this simple interface – just click on the map to find the 10 closest newspapers. Now including newspapers added to Trove since June. You can also browse the locations of all newspapers across Australia. The underlying data file is …

October 2020

I’ve added a new section to the GLAM Workbench for the ANU Archives. The first set of notebooks relates to the Sydney Stock Exchange stock and share lists. As the content note describes: These are large format bound volumes of the official lists that were posted up for the public to see - 3 times a …

Any regular user of RecordSearch, the National Archives of Australia’s online database, will understand its frustrations. But here’s a handy little hack to fix a couple of annoying problems and add some useful functionality! The RecordSearch Show Pages userscript updates links to digitised files in …

I’ve added more years to my repository of Commonwealth Hansard! The repository now includes XML-formatted text files for both houses from 1901 to 1980, and 1998 to 2005. I’ve done some more checking and confirmed that the XML files for 1981 to 1997 aren’t currently available through ParlInfo, …

It was Open Access Week last week, so I tried a little experiment. How many research articles published in Australian Historical Studies between 2008 and 2018 are available via Open Access? Just 9.5% (23 out of 242). This is despite the fact that all articles published in 2018 or earlier are outside …

September 2020

The Trove Newspaper and Gazette Harvester has been updated to include the snippet field in the harvested metadata. https://ozglam.chat/t/trove-newspaper-gazette-harvester-updated-to-version-0-3-3/56 #dhhacks

Calling users of Australian galleries, libraries, archives, & museums – OzGLAM Help is now live! Ask a question or simply share your latest discoveries. There’s handy tips, news about recent developments, & links to useful tools. Please use & share! #dhhacks

The Zotero translator for RecordSearch (the National Archives of Australia’s online database) has been updated. There’s many fixes and enhancements — see the full details. #dhhacks

If you try to share or bookmark the url of an item in RecordSearch (the National Archives of Australia’s online database), you’ll often get a ‘Session time out’ error when you access it. That’s because the urls only work within the current active RecordSearch session. So how can you create a …

The Zotero translator for Trove was failing on newspaper articles with tags. I’ve submitted a fix for approval: github.com/zotero/tr… I’m not sure yet whether the capture of works and search results can be fixed following the Trove redesign. React is not very scraper friendly…

August 2020

Another #GLAMWorkbench update! Snip words out of @TroveAustralia newspaper pages and create big composite images. OCR art! glam-workbench.github.io/trove-new… #dhhacks

Just in time for #GovHack, I’ve given the Trove API Console a major overhaul. It’s been updated for the latest API versions and has MANY MANY more examples. Explore all the data you can get from @TroveAustralia! troveconsole.herokuapp.com #dhhacks

July 2020

Ok, so do you want to make your own ‘scissors & paste’ messages using words from @TroveAustralia newspaper articles? Go to the notebook in #GLAMWorkbench & click on ‘Run live on Binder in Appmode’. #dhhacks

Another #GLAMWorkbench update! The Trove Harvester will now download both newspaper and gazette articles in bulk. You can optionally include full text, and save copies of the articles as images and PDFs. #dhhacks glam-workbench.github.io/trove-har…

Interested in using web archives in your research? Join us on 5/6 August for a free @netpreserve webinar introducing the tools and examples available in the new #webarchives section of the #GLAMWorkbench. There are two timeslots to cover multiple timezones: www.eventbrite.com/e/iipc-rs… and …

Introducing a brand new section of the #GLAMWorkbench, exploring the @MuseumsVictoria collection API. Harvest species records, display random images, and download ALL THE ANTECHINUSES! glam-workbench.github.io/museumsvi… #dhhacks

New additions to the @TroveAustralia books section of the #GLAMWorkbench – word frequency examples with OCRd text from digitised books, and a random recipe generator powered by a 19th C cook book! glam-workbench.github.io/trove-boo… #dhhacks

With the recent changes to @TroveAustralia, the Australian Women’s Weekly cover browser was retired. As a low-tech alternative, I’ve harvested all the cover images from the Women’s Weekly and saved them into PDFs for easy browsing, one for each decade. There are 2,566 images from 1933 to 1982. …

The Trove books section of the #GLAMWorkbench has been updated. There’s a fresh harvest of OCRd text & the notebooks have been changed to work with the new @TroveAustralia interface. Download & explore 24,620 files (3gb) of OCRd text! #dhhacks

Revisiting my Historic Hansard XML repository & realising how easy it is to load files as needed via the GitHub API & explore with Pandas & Jupyter. This #GLAMWorkbench notebook helps you explore a particular year/house. #dhhacks

The Trove Journals section of the #GLAMWorkbench has been updated to work with the new @TroveAustralia interface! I’ve also re-harvested ALL the OCRd text from digitised journals — 6gb of text from 397 journals now downloadable in bulk from CloudStor. #dhhacks

New in #GLAMWorkbench! After you’ve used the @TroveAustralia Newspaper Harvester to download lots & lots of articles, try exploring the results in Datasette. This notebook sets everything up, you can even add full text search & images! #dhhacks

June 2020

Download newspaper articles in bulk! The Trove Newspaper Harvester has been updated to work with the new @TroveAustralia interface. I’ve also added the ability to save articles as .jpg images! The easiest way to get started is via the #GLAMWorkbench. #dhhacks

My app for searching in @TroveAustralia’s digitised journals has been updated to work with the new Trove interface. You’ll need to have switched over to the new interface before you try searching (just click the link on the Trove home page). #dhhacks

Another db migrated and app updated! Have you ever wondered what interjections in historic hansard would look like as tweets? Well I did… Now with longer interjections & more emojis! hansard-interjections.herokuapp.com/tweets/ #dhhacks

Here’s a map of places where @TroveAustralia digitised newspapers were published/circulated. Click on the map to find the closest newspapers to a place. Updated with new titles from the last year! troveplaces.herokuapp.com/map/ #dhhacks

May 2020

New GLAM Workbench section on web archives!: We tend to think of a web archive as a site we go to when links are broken – a useful fallback, rather than a source of new research data. But web archives don’t just store old web pages, they capture multiple versions of web resources over time. Using web archives we can observe change – we …

Thanks to @NetPreserve, I’ve been spending time lately working on a set of web archive exploration notebooks for the #GLAMWorkbench. Here’s an example to create/compare screenshots of captures. #dhhacks

April 2020

Do you have a CSV file you’d like to make searchable, maybe even share online? New on #dhhacks, I show you how with @simonw’s awesome Datasette tool & @Glitch. Give it a try! 101dhhacks.net/share-sea…

New on #dhhacks – make your own @TroveAustralia newspaper game! Thanks to @glitch, just edit a couple of files to create your own customised edition of Headline Roulette – make it about cats, or Queensland, or Communist Party newspapers, or whatever!

I’ve given my #dhhacks site a refresh, and updated my @TroveAustralia Twitter bot tutorial to link to the latest versions of the bots on @glitch. The new code is actually easier to customise, so plenty of opportunities to play around! More DHHacks coming soon…

If you’d ever wished you could get a random(ish) newspaper article from @TroveAustralia’s API, here’s a hack for you! I’ve added an option to return a random article to my Trove proxy app. You can filter by normal API facets. Go to: trove-proxy.herokuapp.com #dhhacks

The GLAM CSV Explorer has had a few updates — you can now filter by organisation, and upload your own CSV files! #GLAMWorkbench Try it live on Binder.

March 2020

Buildings might be closed, but the data is open – explore hundreds of datasets from Australian GLAM organisations!: For a couple of years I’ve been harvesting datasets created or published by Australian GLAM organisations through government data portals. I’ve just completed the latest harvest, and there’s now 369 datasets, containing 983 files, from 23 GLAM organisations. 628 of these files are in CSV …

Updated! My notebook to upload digitised newspapers from @TroveAustralia to an @Omeka-S site has been improved — no longer trips over non-newspaper articles in Zotero collections, and does a check to avoid uploading existing articles. #dhhacks

My data file of public holidays in NSW from 1900-1950 has been updated – now including variations in the King’s Birthday holiday and extra days like VE Day. #dhhacks

My harvest of OCRd text from @TroveAustralia digitised books, ephemera, and parliamentary papers has been updated! There’s now 19,795 text files (about 3gb) to explore! Harvesting details and links to browse/download files from CloudStor are in the #GLAMWorkbench. #dhhacks

The simple Trove proxy that you can use to get to download links for PDFs of newspaper articles from @TroveAustralia has been updated to Python 3, and Trove API version 2. It’s used in @Zotero and elsewhere… #dhhacks

I’ve added some more documentation to the Trove Newspaper Harvester page in the #GLAMWorkbench. Get your @TroveAustralia newspaper articles in bulk! #dhhacks #collectionsasdata

February 2020

New section added to the #GLAMWorkbench with examples from @Library_Vic! #slvdata #dhhacks #collectionsasdata

More fun with @iiif_io and images from @library_vic – resize, rotate, crop and more! Try it out with this new notebook in the #GLAMWorkbench. #slvdata #dhhacks

New #GLAMWorkbench notebook! Download images from @Library_Vic using IIIF and Handle… #dhhacks

And for a taste of the recent additions to @TroveAustralia’s digitised journals, check out this thread.

Explore @TroveAustralia’s digitised journals with this simple app. Now updated with the latest additions to Trove, and a new page for government publications. #dhhacks

Want to save @TroveAustralia newspaper articles as images (that aren’t sliced up in annoying ways)? There’s an app for that in the #GLAMWorkbench. #dhhacks

I’ve updated the repository of data transcribed from White Australia policy records in @naagovau. Remember to follow @invisibleaus for daily snapshots.

I’ve added a few updates to my ‘Digital tools and such like…’ list for Australian historians. Hope you find it useful! #ozhist

New ‘Trove images’ section added to the #GLAMWorkbench! Here you’ll find my latest Jupyter notebook harvesting data about the use of standard licences & rights statements in Trove’s picture zone. #dhhacks

Ok, the LODBook Jekyll plugin is cleaned & commented enough for me to put it aside for a while and work on the theme and data structure. I also took the opportunity to improve the way context strings are created…

Voting in the 2019 @dhawards is now open! Go and check out all the cool #DigitalHumanities projects from around the world. And while you’re there, you might like to vote for my #GLAMWorkbench in the ‘Tools’ category!

January 2020

Gathering together videos of past presentations. Here they are on YT: https://www.youtube.com/playlist?list=PLAclcciEeCD3INz0o1t_E9-bkW_BWTmSv and Vimeo: vimeo.com/showcase/…

December 2019

Archived via Zenodo – ‘Inigo Jones: the weather prophet’ – exploring our desire for certainty amidst a highly variable climate. doi.org/10.5281/z…

Archived via Zenodo – ‘Civilization versus the giant, winged lizards’ – a thing I wrote about the climate emergency in 2006. doi.org/10.5281/z…

One of my favourite things this year was finally publishing ‘The People Inside’ with @baibi — describing the start of our @invisibleaus experiments eight years ago with records relating to the White Australia policy held by @naagovau. doi.org/10.5281/z…

November 2019

New #GLAMWorkbench section with examples of how to get random-ish works and newspaper articles from @TroveAustralia. #dhhacks

Did you make a DIY Exhibition from a @TroveAustralia list using my GitHub starter kit? If so, it’s probably broken. Never fear, here’s how you can upgrade your exhibition to use version 2 of the Trove API. #DHHacks

The death and (hopefully) resurrection of Trove Twitter bots: Today version 1 of the Trove API was decommissioned. As I explained elsewhere, this meant that a number of Trove Twitter bots also died. The problem is that version 2 of the API provides no easy way to randomly select records. Bots, and other apps that share random content, require major reworking. …

October 2019

Updated my NSW public holidays data to include a few extras proclaimed by the government: nbviewer.jupyter.org/github/wr…

Creators and users of my Trove Twitter Bots, please read and share this update!: tl;dr Version 1 of the Trove API will be discontinued soon so Trove Twitter bots need to be upgraded. Unfortunately, Version 2 of the Trove API doesn’t support the random selection of resources, so the current behaviour of many bots will change. The problem In January 2018, I created a series of …

September 2019

Over the last few weeks I’ve been exploring ways of recording dates for 70,000 digitised pages from Sydney Stock Exchange records in the @TheANUArchives. Here’s the progress so far…

Here’s my attempt to calculate NSW holidays from 1900 to 1950. It’s probably incomplete, but it’s a start… nbviewer.jupyter.org/github/wr…

A couple of years ago I gave a talk in which I tried to justify what I do as research. I was going to turn it into an article, but never did. So here’s ‘The multiplication of contexts’ as a blog post.

The @naagovau RecordSearch section of the #GLAMWorkbench has been updated with more notebooks to help you get Australian archives data in a usable form. glam-workbench.github.io/recordsea… Useful for #twitterstorians, #ozhist, & #govhack!

Crikey, my notebook for getting useful data out of @naagovau keeps growing! Now with sections on tackling large series, and harvesting ALL the images from digitised files. https://nbviewer.jupyter.org/github/GLAM-Workbench/recordsearch/blob/master/harvesting_items_from_a_search.ipynb

Want to save searches for items in @naagovau’s RecordSearch as CSVs for exploration & analysis? This notebook walks through the process of constructing, managing, and saving data harvests. #dhhacks

August 2019

I’ve updated my harvest of OCRd text from digitised journals in @TroveAustralia. The complete dataset now includes 33,035 issues from 720 titles – about 8gb of text to explore. Details in the #GLAMWorkbench: glam-workbench.github.io/trove-jou… #dhhacks

My app to browse & search @TroveAustralia’s digitised journals has been updated! Since 4 July, 112 new titles & 86,211 new articles have been added to Trove. Many of these new titles are parliamentary papers. Explore here: trove-titles.herokuapp.com #dhhacks

Another WIP notebook in need of additional documentation… This one explores the stats around volunteer correction of OCR errors in @TroveAustralia’s newspapers. More to come!

And this notebook uses TF-IDF to explore the OCRd text of a digitised journal from Trove. Get the top TF-IDF scores for each year across a journal’s life and see how they change. More documentation coming!

This notebook lets you download the OCRd text of a digitised journal from @TroveAustralia (via CloudStor) and then explore word frequencies over time. More documentation coming soon!

A new notebook looking at the data about digitised journals on @TroveAustralia. #dhhacks

There’s a new section of the GLAM Workbench devoted to the National Museum of Australia collection API! Harvest @nma data, then explore it by time and place. #dhhacks

July 2019

The second new notebook looks at @TroveAustralia’s newspapers as a whole, visualising both by time and by state. Along the way it looks at favourites such as the WWI effect and the copyright cliff of death. #dhhacks

Some brand new Jupyter notebooks for those interested in #ozhist & digital exploration of @TroveAustralia’s newspapers. The first walks through different ways of visualising newspaper searches over time. #dhhacks

I’ve updated the @invisibleaus data repository with latest transcriptions/markings from White Australia Policy records in @naagovau.

According to my last harvest, @TroveAustralia’s digitised journals comprise 31,216 separate issues. Here’s the number of issues by year.

Updates to the Trove newspapers section of GLAM Workbench – adding links to app-ified versions of some notebooks, & direct links to @mybinderteam for everything. If you work with @TroveAustralia newspapers you might find it useful.

Download & explore 1,499,259 rows of open data from NSW State Archives Online Indexes: NSW State Archives publishes a number of detailed indexes containing data manually extracted from their records. These provide additional entry points to the records, such as a person’s name, or a place. But they also provide useful data for analysis. However, to explore the index data we need …

Visualising CV-detected column widths across 100 volumes (30,000+ pages) of Sydney Stock Exchange records from @TheANUArchives…

New in GLAM Workbench! Notebooks to harvest, index, analyse, and aggregate transcripts of speeches & interviews by Australian prime ministers. Plus links to harvested data and aggregated files. #dhhacks

I’ve updated my harvest of the PM Transcripts site — 22,814 XML files with transcripts of speeches, interviews, media releases etc by Australian Prime Ministers. Now with added @TurnbullMalcolm… Repository includes an index of the files, and aggregations by PM. #dhhacks

Reorganising things a little at GLAM Workbench. @statelibrarynsw gets its own section. Hansard and @datagovau GLAM datasets now under ‘Australian government’. Making some space for further additions…

What’s that? You want MORE GLAM data? Well, I’ve started a list of sources for Australian GLAM data. Metadata, full text, images & more. Contributions welcome! #dhhacks

I’ve updated my harvest of GLAM datasets from data.gov.au. Now there’s 584 CSV files available for download! #dhhacks

I’ve put a copy of my article on using @TroveAustralia for digital research/play, written for the @HTANSW journal, up on my blog. #dhhacks

I’ve updated the list of orgs who have supported the digitisation of @TroveAustralia’s journals. As usual @statelibrarynsw leads the way, but great to see @dvaaus supporting online access.

Today I finished updating a harvest of all OCRd text available from Trove’s digitised journals. That’s about 7gb of text from 30,462 issues of 384 different journals — a fab corpus for text analysis! Here’s all the metadata, links, and harvesting code. @TDHASSN #dhhacks

Update time! Yesterday I updated my Trove digitised journals app to include all the exciting new titles added to Trove in the last few months. This includes ABC Weekly, Current Notes on International Affairs & much more. #dhhacks

A quick interactive view of newspaper articles in @TroveAustralia by state and year. Click on the bars or legend to filter by state. Jupyter notebook on its way… vega.github.io/editor/

Anyone who’s been to one of my Trove workshops will be pleased to know that the WWI effect is still evident when viewing the total number of @TroveAustralia newspaper articles by year. As is the copyright cliff of death…

So there are now almost twice as many newspaper articles in @TroveAustralia from NSW as there are from any other state. (cc @statelibrarynsw)

June 2019

Well look at that! – a selection of my @TroveAustralia related Jupyter notebooks turned into simple apps using Voila and delivered via Heroku. Save complete newspaper articles as images, create thumbnails, or download pages! #dhhacks

Kicked off a new GLAM Workbench repository dedicated to @SLSA with a quick notebook hack to get higher res versions of digitised photos. #dhhacks

Search @TroveAustralia newspapers without leaving Twitter using the updated and enhanced @TroveNewsBot! After 6 years of regular tweeting, TroveNewsBot needed an upgrade. Check out all its new features, including article thumbnails, here. #dhhacks

Recent additions to the Trove Newspapers section of the GLAM Workbench: getting images from @TroveAustralia newspaper articles, and uploading articles to @Omeka-S: glam-workbench.github.io/trove-new…

Want to upload @TroveAustralia newspaper articles to @Omeka-S to create an exhibition or populate a research database? This notebook collects article references from a search, a Trove list, or Zotero, & uploads metadata, images & PDF to your Omeka site. #dhhacks

Slides ready for tomorrow’s workshop at @unicanberra – Trove as a platform for digital research & creativity. Here’s a sneak peek: slides.com/wragge/tr…

May 2019

Ever wanted to save a @TroveAustralia newspaper article as an image? This notebook lets you do just that. Paste in an article url and you get nice big JPGs to download. It even works with articles spread across multiple pages. #dhhacks

More GLAM Workbench updates! More full text of Australian books! I’ve added the notebook & data from my harvest of @TroveAustralia books in the @InternetArchive. There’s metadata and text of 1,153 books to explore. #dhhacks

2019 has been pretty busy so far! I just compiled a list of tools, updates, and examples from the last few months for my @Patreon supporters.

Here’s how you can get the text of Australian books in @TroveAustralia from the Internet Archive (via the Open Library). #dhhacks

I’ve updated the data that sits behind my Trove Places app and added more than 140 newspaper titles. To find @TroveAustralia newspapers published in any region of Australia simply click on the map! #dhhacks

If you’re researching foreign policy using @naagovau you might find this little tool useful – it tries to find the file containing a specific numbered cable. Good for tracking down rogue references! Try it live! #dhhacks

And what can you do with 400 CSV files? Well, you could explore their contents using my GLAM CSV Explorer. Select one of the files to peek inside, or upload your own CSV. #dhhacks

Some overdue updates to the GLAM Workbench. First here’s details, data, and code from a harvest of GLAM datasets on @datagovau. Includes details of more than 400 CSV datasets. #dhhacks

Over the last week I’ve been downloading editorial cartoons published in The Bulletin from @TroveAustralia. There’s 3,471 cartoons – at least one from every issue published between 4 Sep 1886 and 17 Sep 1952. And you can browse them all… To make it easier to explore the images, …

After a number of unsuccessful attempts, I seem to be getting The Bulletin title art fairly reliably now. Tricky because it’s not always on the same page…

April 2019

Here’s the notebook-ified version of the code I used to harvest all the Australian Commonwealth Hansard XML files from 1901 to 1980: nbviewer.jupyter.org/github/GL…

I’ve reharvested Commonwealth Hansard from 1901 to 1980 and updated my repository of XML files. This should pick up the work of @ParlLibrary staff over recent years to improve the XML output. #dhhacks

And now my GLAM Workbench has a ‘Trove Maps’ section to document examples and explorations using data from @TroveAustralia’s ‘map’ zone: glam-workbench.github.io/trove-map… Includes a list of 20,158 maps with high-res downloads. #dhhacks

The other night @OpenGLAM was sharing collections of high-res images from GLAM orgs that are free to download. That got me thinking about @TroveAustralia’s digitised maps because there’s lots of them, most are out of copyright, and the images are BIG. #dhhacks

If you’d like to make your own big, composite images from lots of @TroveAustralia newspaper thumbnails, here’s a notebook that shows you how.

Australian pilots, aviators, airmen, and flyers — 4,950 thumbnails from a search in @TroveAustralia’s newspapers combined into one very big, zoomable image.

I’ve been busy lately harvesting LOTS of full text data from @TroveAustralia’s digitised journals – so many opportunities for research! You should be able to get to all the code & data from the new Trove journals section of my GLAM Workbench. #dhhacks

I’ve added a section for the @TroveAustralia ‘book’ zone to the GLAM Workbench.

Ok, so I’ve downloaded the OCRd text from 27,426 issues of 358 digitised journals/series in @TroveAustralia. That’s 6.6gb of full text. Tune in tomorrow for full details…

All 9,738 OCRd text files harvested from books, pamphlets and leaflets in @TroveAustralia’s ‘book’ zone have been uploaded to @aarnet’s CloudStor for easy browsing/download. There’s also a 400mb zip file if you want the whole lot. The harvesting method and code is available in this …

So @TroveAustralia includes more than 370,000 press releases, speeches, and interview transcripts issued by Aust federal politicians & saved by the Parliamentary Library. Learn how to harvest metadata & full text to create your own datasets in this notebook. #dhhacks

Among the OCRd texts I’m currently harvesting from Trove’s journals zone are things like the NSW Post Office Directories from 1886 onwards. Useful sources for compiling data about occupations, locations etc?

Wow, there are now over 371,000 press releases, interview transcripts and more from the @ParlLibrary available through @TroveAustralia. Just working on a new notebook to harvest sets for research…

Another collection of OCRd text from @TroveAustralia is on its way…

Playing with @TroveAustralia newspaper results. Here’s illustrated articles with ‘White Australia Policy’ in their title…

The final tally – after much tweaking I’ve downloaded OCRd text from 9,738 works in the @TroveAustralia books zone. This includes ephemera such as pamphlets and posters as well as more booky books. Here’s the full metadata, all the text files, & harvesting code. #dhhacks

I’m looking for books in @TroveAustralia, but there’s lots of ephemera (pamphlets, posters etc) in the book zone. So I tried grabbing the images of ‘books’ with one page & found some nice stuff including this collection of playbills. #dhhacks

Text of over 3 thousand digitised books and pamphlets downloaded so far from @TroveAustralia…

After talking to @PrimahadiWijaya today about work at @MonashLing, I started harvesting metadata & full text from digitised books in @TroveAustralia. OCRd text from about 2,000 books downloaded so far. More soon… #dhhacks

What I did at #valatechcamp! Here’s a CSV with basic details of 7,719 digitised books available through @TroveAustralia. I’m not sure if they all have OCRd text available, but if they do I’ll attempt to download it once I’m back home.

TIL that the web pages for digitised works (like books and journal issues) on @TroveAustralia embed a lot of useful metadata that you can’t get through the API. Here’s how to extract it.

Just posting the link to my ‘Introducing APIs’ slides for #VALATechCamp again, so that they show up in my MicroBlog feed…

Hmm, it occurs to me that the method I used to generate newspaper article thumbnails from Trove, could also be used to extract illustrations (cartoons, drawings, photos etc)…

So, I’ve finally figured out a way to automatically generate nice-looking thumbnails from @TroveAustralia newspaper articles. Demo notebook here. #dhhacks

So I put the recent report into Australia’s national cultural institutions into the @TDHASSN instance of @VoyantTools. Here’s the contexts of the word ‘story’…

March 2019

Train from Canberra to Melbourne booked for #VALATechCamp. I’ll be hanging around both days, so let me know if you’d like to chat about the GLAM Workbench, Jupyter, Trove data, or any of the other things I fiddle with…

Sneak preview of my GLAM CSV Explorer now live on @MyBinderTeam! Select one of 447 GLAM-related CSVs from @datagovau for analysis, or load your own. Coming soon to @TDHASSN. #dhhacks

Having ripped out a lot of code and simplified a mess of conditionals, I think this CSV Explorer thingy is getting there…

Now to load that new CSV of GLAM CSVs into my CSV Explorer…

Quick notebook to harvest GLAM datasets via the new(ish) @datagovau API. Includes 447 CSVs from 19 institutions.

This is why we can’t have nice things…

Still plenty to do, but my CSV Explorer is taking shape… (coming soon to @TDHASSN & elsewhere!) I’ll be giving a demo at the @HumanitiesAU data summit on Friday. Now with animated gif…

Doesn’t take much to show when there’s a problem with dates in metadata… (Yep, post 1900 dates have all jumped 100 years into the future.)

Fun day talking to the @dhpanu team at ANU about digital history possibilities. Slides/links are all online.

Currently working on a CSV explorer to give researchers an overview of the contents of GLAM datasets. Sort of like WTFCSV, but in a Jupyter notebook…

A bit more progress. Having found columns with OpenCV, I can use Tesseract to help me find the rows…

After much OpenCV fiddling & tweaking, sorry… iteration, I’m pretty pleased with this. Columns and headers being detected accurately despite lots of variation in the images.

So right around now I think I’m talking (via video) about my adventures with #HistoricHansard for the ‘Between Cyberutopia and Cyberphobia’ workshop at @witswiser in South Africa. You can follow along: vimeo.com/321657685

More updates! Latest data and images from The Real Face of White Australia transcription project are up on GitHub. #dhhacks

I’ve finally updated the @TroveAustralia API Console to use version 2 of the API & https by default. (Also updated to Python3 & latest Heroku stack.) More examples coming soon… #dhhacks

Lots of exciting new stuff has been added to @TroveAustralia’s digitised journals in the last few months. To explore it all, head here and click on the ‘New titles’ button. #dhhacks

Art & Architecture: the journal of the Institute of Architects NSW, 51 issues from 1905 to 1912 now on @TroveAustralia.

Only 12 issues, but check out the fabulous covers on The New Triad from 1927-8. Now on @TroveAustralia.

Want some arts? 130 issues of RealTime from 1994 to 2016 now on @TroveAustralia.

Hey #ozhist, 295 issues of the journal of the Royal Australian Historical Society from 1918 to 1954 are now on @TroveAustralia.

Also amongst the latest batch of digitised journals on @TroveAustralia, 39 issues of Camp Ink from 1970 to 1977.

There’s more literary journals digitised in @TroveAustralia as well. Including 18 issues of the Bookfellow from 1907.

But wait, there’s more — the KCC Kennel Gazette was renamed, wait for it, Dogs. Another 94 issues from 1962 to 1969 in @TroveAustralia here.

People, you need to know that @TroveAustralia has digitised 360 issues of the KCC Kennel Gazette from 1932 to 1962. 13/10 would browse.

Updating my list of digitised journals on @TroveAustralia this morning and seeing what’s new. Highlights to follow…

I’ll be running some more @TroveAustralia workshops for @UniCanberraReD this year. On 13 May it’s ‘Trove tips & tricks’, followed on 27 May with ‘Trove as a platform for digital research & creativity’. See here for details.

February 2019

#dhhacks — Save a page image from the State Archives of NSW's Bubonic Plague Register: So NSW State Archives has digitised the Register of Cases of Bubonic Plague 1900-1908. Great work! Unfortunately though, they’ve put the digitised page images in one of those annoying page-turnery things, without any obvious way of downloading them (please correct me if I’m wrong!). Even …

I’ve updated the notebook for harvesting records from @archivesnz’s Archway database in my GLAM Workbench. I just used it to harvest more than 8,000 records from series 8333 relating to naturalisation. #dhhacks

Uh, ok — so an advanced search for keywords only in Archway gives me a maximum of 1000 results. But if I add a date range I can get 10,000 results.

Looks like I’ll be heading to the VALA Tech Camp in April to talk APIs. See you there!

New section added to my GLAM Workbench for the Queensland State Archives (@qsarchives). Includes a notebook to add series information into their Naturalisations 1851-1904 index. #dhhacks

So in case you’re wondering, the @qsarchives ‘Naturalisations 1851 to 1904’ index actually collates names from these 10 series (and therefore has multiple entries for a single person).

Whoops. Here’s the actual full list of countries of origin from the @nswarchives NSW naturalisations data (and not just the screenshot!).

Here’s the full list of countries of origin from the NSW naturalisations data, 1834-1903.

NSW naturalisations 1834 to 1903. The sudden rise in Chinese naturalisations followed the introduction of the poll tax. More restrictions soon followed… Using (deduped) naturalisations data from @nswarchives.

Suggestions of new topics and collections for my GLAM workbench are welcome!

Here’s an example dataset harvested from Library and Archives Canada’s naturalisation database. It’s all the people with ‘China’ as their country of origin, supplemented with wives and children (who are not included in a country search).

I’ve added a section for Library and Archives Canada to my GLAM workbench. The first notebook extracts records of people from a specific country from their naturalisations database and saves the results as a CSV file. #dhhacks

Current status — extracting data from Library and Archives Canada’s 1915-1946 naturalisation database. Coming soon to my GLAM Workbench…

The full text of ‘Who belongs? Reading identity, ownership, and legitimacy’, my talk for #text2data last week, is now online. Includes slides, code, data & more… #dhhacks

My talk for #text2data at the National Library of Sweden looks at occurrences of the words ‘aliens’ & ‘immigrants’ in @TroveAustralia newspapers, The Bulletin, & Hansard. The slides, code & data are online. #dhhacks

Back to school report — what I did on my holidays…

Another slide for Sweden — this one comparing words appearing before ‘aliens’ in The Bulletin and Commonwealth Hansard (1901-1980).

Working on my slides for From Text to Data in Stockholm this week…

I’ve added a ‘save chart’ option to the QueryPic app in my GLAM Workbench. Visualise your searches in @TroveAustralia newspapers, then save the results as HTML for easy download. #dhhacks

January 2019

Pleased and proud to see the chapter @baibi & I wrote on the Real Face of White Australia now published as part of an awesome collection. Buy now or read the CC-BY version online!

In a bit over a week I’ll be heading to Stockholm for the ‘From text to data’ conference. Preparing myself for the 40 degree temperature difference…

Talking about 'immigrants' in Trove's digitised newspapers: I’m giving a talk in a week or so (eep!) that looks at some of the changing contexts in which the word ‘aliens’ has been used in Australia. I thought, by way of comparison, it would be useful to do the same for ‘immigrants’. While I was playing around with the data last …

In case you’re wondering, it took about 13 hours to download the metadata and full text of more than 2,000,000 @TroveAustralia articles including the word ‘Chinese’ using my Trove Newspaper Harvester. You can try it here.

Ok, so let’s see how I go harvesting 2 million newspaper articles from @TroveAustralia containing the word ‘Chinese’…

30,000+ occurrences of the word ‘Chinese’ in the OCRd full text of The Bulletin, 1880-1968.

One more and I’m done for the night… New GLAM Workbench page for the ‘Trove API introduction’ notebooks.

I’ve finished putting details of all the current GLAM Workbench repositories into the new documentation site. Still a few notebooks to migrate from the original workbench, but getting there! There’s about 50 Jupyter notebooks so far. #dhhacks

Added a ‘data’ section to the GLAM Workbench docs, with info on harvests from government data portals, as well as series from @naagovau relating to ASIO and the White Australia Policy.

And now a GLAM Workbench page for @Te_Papa…

Added a page for @ArchivesNZ’s Archway to the GLAM Workbench docs…

So here’s some fun things to do with @TroveAustralia newspapers… (via GLAM Workbench)

Ok, more documentation for you — page for the @DigitalNZ API in GLAM Workbench updated!

Slowly working my way through the documentation for my GLAM Workbench. Still lots to do, but I think the page for @naagovau’s RecordSearch is now up-to-date.

If there are APIs or other data sources you’d like me to add to my GLAM Workbench, feel free to create an issue. You could also describe what sorts of tools or examples using that data source would be useful.

Updated list of the fifty most common words occurring before the word ‘aliens’ in @TroveAustralia newspapers (with no capitalisation and stopwords removed). 274,157 occurrences in 213,151 articles.
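
A minimal sketch of how a count like this could be produced, assuming the article texts are already available as plain strings. The stopword list, the sample sentences, and the `words_before` helper are all made up for illustration; they are not the list, data, or code behind the figures above. Here a stopword in the preceding position is simply skipped, which is only one way to read ‘stopwords removed’.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the list used for the chart)
STOPWORDS = {"the", "of", "and", "a", "an", "to", "in", "for", "that", "these", "all", "by"}


def words_before(texts, target="aliens", stopwords=STOPWORDS):
    """Count the word immediately preceding each occurrence of `target`."""
    counts = Counter()
    for text in texts:
        # Lowercase first so 'Aliens' and 'aliens' are treated alike
        tokens = re.findall(r"[a-z]+", text.lower())
        # Walk through adjacent pairs of tokens: (previous word, current word)
        for prev, word in zip(tokens, tokens[1:]):
            if word == target and prev not in stopwords:
                counts[prev] += 1
    return counts


# Made-up sample sentences, just to show the shape of the result
sample = [
    "The Act applied to Coloured aliens and enemy aliens alike.",
    "Registration of enemy aliens began this week.",
    "All the aliens were counted.",
]

top = words_before(sample).most_common(50)
print(top)
```

With real harvested text you would pass in the full text of each article instead of `sample`; `most_common(50)` then gives the top fifty.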

Just updated my harvest of metadata and full text from The Bulletin in @TroveAustralia. There’s about 2gb of OCRd text from 4,534 issues (1880-1968). Full text for about 60 issues has been added since my last harvest. 111 have no OCRd text. Download it all from GitHub #dhhacks

Fifty most common words occurring before the word ‘aliens’ in @TroveAustralia newspapers (213,000 articles)…

You want big data? I just harvested 213,340 newspaper articles (including full OCRd text) from @TroveAustralia in 82 minutes, at about 40 articles a second. https://mybinder.org/v2/gh/GLAM-Workbench/trove-newspaper-harvester/master?urlpath=%2Fapps%2Fnewspaper_harvester_app.ipynb
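
A quick check of the arithmetic behind ‘about 40 articles a second’, using only the two figures quoted in the post. The `articles_per_second` helper is just a name made up for this sketch.

```python
def articles_per_second(articles, minutes):
    """Average harvest rate, given a total count and elapsed minutes."""
    return articles / (minutes * 60)


# Figures quoted in the post: 213,340 articles in 82 minutes
rate = articles_per_second(213_340, 82)
print(f"{rate:.1f} articles per second")
```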

So now I’ve updated TroveHarvester and built a new interface I can get back to the task I wanted the TroveHarvester for a couple of days ago — harvesting all references to ‘aliens’ in newspapers… #yakshaving

Want an easy way to download @TroveAustralia newspaper articles in bulk? No installation? Point and click? I’ve created a simple web app version of my TroveHarvester using a Jupyter notebook & running on @mybinderteam. Try it live! #dhhacks

And version 0.2.2 of TroveHarvester quickly follows 0.2.1 as I squash a bug when downloading PDFs… Also managed to get the README displaying properly on PyPI. pypi.org/project/t…

TroveHarvester 0.2.1 — updated to work with version 2 of the @TroveAustralia API. Now on pypi! More details shortly…

Ok, that’s more like it. Full text and metadata of 29,203 newspaper articles harvested using the @TroveAustralia API in under 10 minutes. Testing nearly done…

Ah ok, I forgot about the new ‘bulkHarvest’ parameter in the @TroveAustralia API. Setting that to ‘true’ seems to make all the difference…
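
For anyone wondering where that parameter goes, here’s a rough sketch that builds a request url with `bulkHarvest` set to ‘true’. Only the `bulkHarvest` parameter comes from the post. The endpoint and the other parameter names are my assumptions about version 2 of the Trove API, and `build_harvest_url` is a hypothetical helper, not part of the Trove Newspaper Harvester. The code only assembles the url; it doesn’t send a request.

```python
from urllib.parse import urlencode

# Assumed endpoint for version 2 of the Trove API (not stated in the post)
API_URL = "https://api.trove.nla.gov.au/v2/result"


def build_harvest_url(query, api_key, start="*"):
    """Build a newspaper search url with bulkHarvest switched on."""
    params = {
        "q": query,
        "zone": "newspaper",    # assumed name for the newspapers zone
        "encoding": "json",     # assumed parameter for JSON responses
        "bulkHarvest": "true",  # the parameter mentioned in the post
        "s": start,             # assumed paging parameter
        "key": api_key,         # placeholder, you need your own API key
    }
    return f"{API_URL}?{urlencode(params)}"


url = build_harvest_url("aliens", "YOUR_API_KEY")
print(url)
```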

Uh, never come across one of these before from the @TroveAustralia API. Needless to say it causes the Newspaper Harvester to die. It’s easy to check for these things once you know they exist, but…

Testing the updated Trove Newspaper Harvester… Run into a problem with the @TroveAustralia API not returning the complete result set in large harvests, trying to figure out why…

Thanks to the @TroveAustralia API upgrade, the new version of the Trove Newspaper Harvester should be a lot faster. For harvests with full text (but not PDFs which slow things down a lot) I’m getting 40-50 articles a second.

Since I’m updating the Trove Newspaper Harvester to work with version 2 of the @TroveAustralia API thought I might as well fix up a few other things as well… Now with added progress bars!

I’m enjoying using micro.blog as a way of capturing what I’m working on: updates.timsherratt.org Just need to get the GitHub mirror site working…

Finally biting the bullet and getting to work on updating the TroveHarvester to work with version 2 of the API…

That’s cool: just realised I can easily share live versions of Altair charts from Jupyter notebooks using Vega. Here’s the complete ‘aliens’ chart.

And also “coloured alien” which, not surprisingly, peaks in 1901 when the Immigration Restriction Act is passed…

Exploring some of the adjectives attached to ‘alien’ in @TroveAustralia newspapers… You can create these sorts of comparisons yourself using this app. #dhhacks

Just to emphasise my point the other day about the impact of stemming on searches for naturalisation/naturalization in @TroveAustralia. Compare these — the stemming on/off results for ‘naturalisation’ are pretty much in proportion, but not for ‘naturalization’…

Nothing like browsing the databases of another country’s national/state archives to make you realise how useful the series system is…

The Australian version of ‘Who’s responsible?’ is up! Just select a government function and explore the different agencies associated with it over time. It’s built with data from @naagovau’s RecordSearch. Try it live! #dhhacks

New notebook added to the #GLAMWorkbench RecordSearch repository — get the basic details of agencies associated with all government functions used in @naagovau’s RecordSearch and save to a single JSON data file. View code and data. #dhhacks

Hmm, wondering why the ‘National Council of Women of the Australian Capital Territory’ is assigned the function ‘CITIZENSHIP’ in @naagovau’s RecordSearch…

As well as cross-posting updates to Twitter and Mastodon, I’ve now set up IFTTT to keep an eye on my micro.blog feed and post anything with the hashtag #dhhacks to my 101 DH Hacks FB page!

Adventures in stemming, or what happens when you search Trove for 'naturalization': Fun fact — the Porter stemming algorithm treats the words ‘naturalisation’ and ‘naturalization’ differently. Naturalisation is stemmed to ‘naturalis’, naturalization to ‘natur’. You can try this yourself using this NLTK stemming demo. Why does this …

I have a brand new updates page powered by micro.blog!