Tim Sherratt

Sharing recent updates and work-in-progress

glamworkbench

17 May 2021

Web archives section of GLAM Workbench updated!

My program of rolling out new features and integrations across the GLAM Workbench continues. The latest section to be updated is the Web Archives section! There are no new notebooks with this update, but some important changes under the hoo...
12 May 2021

Using web archives to find out when newspapers were added to Trove

There’s no doubt that Trove’s digitised newspapers have had a significant impact on the practice of history in Australia. But analysing that impact is difficult when Trove itself is always changing – more newspapers and articles are being a...
12 May 2021

GLAM Jupyter Resources

To make it easier for people to suggest additions, I’ve created a GitHub repository for my list of GLAM Jupyter examples and resources. Contributions are welcome! This list is automatically pulled into the GLAM Workbench’s help documentation. #dhhacks
12 May 2021

Running notebooks – a sign of things to come in the GLAM Workbench

I recently made some changes in the GLAM Workbench’s Help documentation, adding a new Running notebooks section. This section provides detailed information of running and managing GLAM Workbench repositories using Reclaim Cloud and Docker. ...
12 May 2021

Sponsor my work on GitHub!

As I foreshadowed some weeks ago, I’ve shut down my Patreon page. Thanks to everyone who has supported me there over the last few years! I’ve now shifted across to GitHub Sponsors, which is focused on supporting open source projects. This s...
12 May 2021

Updates to the Trove Newspapers section of GLAM Workbench

I’ve updated, refreshed, and reorganised the Trove newspapers section of the GLAM Workbench. There’s currently 22 Jupyter notebooks organised under the following headings: Trove newspapers in context – Notebooks in this section look at the...
27 Apr 2021

Introducing the new, improved RecordSearch Data Scraper!

It was way back in 2009 that I created my first scraper for getting machine-readable data out of the National Archives of Australia’s online database, RecordSearch. Since then I’ve used versions of this scraper in a number of different proj...
29 Mar 2021

Recently digitised files in the National Archives of Australia

I’m interested in understanding what gets digitised and when by our cultural institutions, but accessible data is scarce. The National Archives of Australia lists ‘newly scanned' records in RecordSearch, so I thought I’d see if I could conv...
26 Mar 2021

Moving on from Patreon...

Over the last few years, I’ve been very grateful for the support of my Patreon subscribers. Financially, their contributions have helped me cover a substantial proportion of the cloud hosting costs associated with projects like Historic Han...
25 Mar 2021

What can you do with the GLAM Workbench?

You might have noticed some changes to the GLAM Workbench home page recently. One of the difficulties has always been trying to explain what the GLAM Workbench actually is, so I thought it might be useful to put more examples up front. The ...
25 Mar 2021

Reclaim Cloud integration coming soon to the GLAM Workbench

I’ve been doing a bit of work behind the scenes lately to prepare for a major update to the GLAM Workbench. My plan is to provide one click installation of any of the GLAM Workbench repositories on the Reclaim Cloud platform. This will prov...
25 Mar 2021

Some recent GLAM Workbench presentations

I’ve given a couple of talks lately on the GLAM Workbench and some of my other work relating to the construction of online access to GLAM collections. Videos and slides are available for both: From collections as data to collections as inf...
08 Mar 2021

Some GLAM Workbench datasets to explore for Open Data Day

It was Open Data Day on Saturday 6 March – here’s some of the ready-to-go datasets you can find in the GLAM Workbench – there’s something for historians, humanities researchers, teachers & more! First here’s a list of Australian GLAM (tha...
11 Feb 2021

The NAA recently changed field labels in RecordSearch, so that ‘Barcode' is now ‘Item ID’. This required an update to my recordsearch_tools screen scraper. I also had to make a few changes in the RecordSearch section of the GLAM Workbench. #dhhacks

03 Feb 2021

New! DigitalNZ API Query Builder added to GLAM Workbench

I’ve added an API Query Builder to the DigitalNZ section of the GLAM Workbench. You can use it to learn about the different parameters available from the search API, and experiment with different queries. Just get your API key from DigitalN...
28 Jan 2021

OpenGLAM fireworks! Finding open collections in DigitalNZ

Lately I’ve been updating and expanding the notebooks in the DigitalNZ section of the GLAM Workbench. In particular, I’ve been looking at the usage facet to understand how much of the aggregated content is ‘open’. What do I mean by ‘open’? ...
18 Jan 2021

New dataset and notebooks – twenty years of ABC Radio National

There’s a new GLAM Workbench section for working with data from Trove’s Music & Sound zone! Inside you’ll find out how to harvest all the metadata from ABC Radio National program records – that’s 400,000+ records, from 160 Radio National pr...
16 Dec 2020

GLAM Workbench wins British Library Labs Research Award!

Asking questions with web archives – introductory notebooks for historians has won the British Library Labs Research Award for 2020. The awards recognise ‘exceptional projects that have used the Library’s digital collections and data’. This...
15 Dec 2020

The GLAM Workbench as research infrastructure (some basic stats)

Repositories in the GLAM Workbench have been launched on Binder 3,529 times since the start of this year (according to data from the Binder Events log). That’s repository launches, not notebooks. Having launched a repository, users might us...
27 Nov 2020

Earlier this year I gave a seminar for the International Internet Preservation Consortium (IIPC) introducing the web archives section of the GLAM Workbench. The seminar is now available online: youtu.be/rVidh_wex…

Here are the slides if you want to follow along. #dhhacks

25 Nov 2020

Harvest text from the Australian Women's Weekly!

The Trove Newspaper & Gazette Harvester has been updated to version 0.4.0. The major change is that if the OCRd text for an article isn’t available through the API, it will be automatically downloaded via the web interface. What does this m...
13 Nov 2020

Beyond the copyright cliff of death

If you’ve done any searching in Trove’s digitised newspapers, you’ve probably noticed that there aren’t many results after 1954. This is basically because of copyright restrictions (though given the complexities of Australia’s copyright sys...
26 Oct 2020

I’ve added a new section to the GLAM Workbench for the ANU Archives. The first set of notebooks relates to the Sydney Stock exchange stock and share lists. As the content note describes:

These are large format bound volumes of the official lists that were posted up for the public to see - 3 times a day - forenoon, noon and afternoon - at the close of the trading session in the call room at the Sydney Stock Exchange. The closing prices of stocks and shares were entered in by hand on pre-printed sheets.

The volumes have been digitised, resulting in a collection of 70,000+ high resolution images. You can browse the details of each volume using this notebook.

I’ve been exploring ways of getting useful, machine-readable data out of the images. There’s more information about the processes involved in this repository. I’ve also been working on improving the metadata and have managed to assign a date and session (Morning, Noon, or Afternoon) to each page. We these, we can start to explore the content!

One of the notebooks creates a calendar-like view of the whole collection, showing the number of pages surviving from each trading day. This makes it easy to find the gaps and changes in process. #dhhacks

26 Oct 2020

I’ve added more years to my repository of Commonwealth Hansard! The repository now includes XML-formatted text files for both houses from 1901 to 1980, and 1998 to 2005. I’ve done some more checking and confirmed that the XML files for 1981 to 1997 aren’t currently available through ParlInfo, however, the Parliamentary Library are looking into it. I’ve also created a CSV-formatted list of sitting days from 1901 to 2005 (based on ParlInfo search results). Details of the harvesting process are available in the GLAM Workbench. #dhhacks

14 Aug 2020

Another #GLAMWorkbench update! Snip words out of @TroveAustralia newspaper pages and create big composite images. OCR art! glam-workbench.github.io/trove-new… #dhhacks