glamworkbench

Working with Trove data – a collection of tools and resources

Monday, May 2, 2022

The ARDC is organising a couple of public forums to help gather researcher requirements for the Trove component of the HASS RDC. One of the roundtables will look at ‘Existing tools that utilise Trove data and APIs’. Last year I wrote a summary of what the GLAM Workbench can contribute to the development of humanities research infrastructure, particularly in regard to Trove. I thought it might be useful to update that list to include recent additions to the GLAM Workbench, as well as a range of other datasets, software, tools, and interfaces that exist outside of the GLAM Workbench.

Continue reading →

Saturday, April 30, 2022 →

And so it starts… #GLAMWorkbench

Screenshot of GLAM Workbook welcome page. Text states: 'This is a companion to the GLAM Workbench. Here you'll find documentation, tips, tutorials, and exercises to help you work with digital collections from galleries, libraries, archives, and museums (the GLAM sector).'

Thursday, April 28, 2022 →

Ok, I’ve created a new #GLAMWorkbench meta issue to try and bring together all the things I’m trying to do to improve & automate the code & documentation. This should help me keep track of things… github.com/GLAM-Work… #DayofDH2022

Thursday, April 28, 2022 →

A couple of hours of #DayofDH2022 left – feeling a bit uninspired, so I’m going to do some pruning & reorganising of the #GLAMWorkbench issues list: github.com/GLAM-Work…

Tracking Trove changes over time

Wednesday, April 20, 2022

I’ve been doing a bit of cleaning up, trying to make some old datasets more easily available. In particular I’ve been pulling together harvests of the number of newspaper articles in Trove by year and state. My first harvests date all the way back to 2011, before there was even a Trove API. Unfortunately, I didn’t run the harvests as often as I should’ve and there are some big gaps. Nonetheless, if you’re interested in how Trove’s newspaper corpus has grown and changed over time, you might find them useful.
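As a rough illustration of how two such harvests might be compared, here is a minimal Python sketch. The data layout (a mapping from state and year to an article count) and the numbers are invented for this example; they are not taken from the actual datasets.

```python
def count_changes(earlier, later):
    """Return the change in article counts between two harvests.

    Both arguments map (state, year) -> number of articles.
    Keys missing from one harvest are treated as zero.
    """
    keys = set(earlier) | set(later)
    return {
        key: later.get(key, 0) - earlier.get(key, 0)
        for key in sorted(keys)
    }


# Placeholder values, not real Trove totals.
harvest_2011 = {("NSW", 1900): 100, ("Victoria", 1900): 80}
harvest_2022 = {("NSW", 1900): 150, ("Victoria", 1900): 80, ("Tasmania", 1900): 20}

changes = count_changes(harvest_2011, harvest_2022)
```

A positive value means articles were added for that state and year between the two harvests; a negative value would point to articles being removed or reassigned.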

Continue reading →

The GLAM Workbench wants you!

Wednesday, March 2, 2022

Over the past few months I’ve been doing a lot of behind-the-scenes work on the GLAM Workbench – automating, standardising, and documenting processes for developing and managing repositories. These sorts of things ease the maintenance burden on me and help make the GLAM Workbench sustainable, even as it continues to grow. But these changes are also aimed at making it easier for you to contribute to the GLAM Workbench! Perhaps you’re part of a GLAM organisation that wants to help researchers explore its collection data – why not create your own section of the GLAM Workbench?

Continue reading →

Omeka S Tools – new Python package

Thursday, February 17, 2022

Over the last couple of years I've been fiddling with bits of Python code to work with the Omeka S REST API. The Omeka S API is powerful, but the documentation is patchy, and doing basic things like uploading images can seem quite confusing. My code was an attempt to simplify common tasks, like creating new items. In case it's of use to others, I've now shared my code as a Python package.
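To show the kind of detail involved in creating an item, here is a minimal sketch that builds a request by hand. It does not use the package described above and is not its API. The example URL, key values, and property ids are assumptions (property ids vary between installations), and nothing is actually sent.

```python
import json
from urllib.parse import urlencode


def build_item_payload(values, property_ids):
    """Build a JSON-LD style payload for a new Omeka S item.

    values: maps a property term (e.g. 'dcterms:title') to a list of strings.
    property_ids: maps each term to its numeric id in the target installation.
    """
    payload = {}
    for term, texts in values.items():
        payload[term] = [
            {"type": "literal", "property_id": property_ids[term], "@value": text}
            for text in texts
        ]
    return payload


def items_endpoint(api_url, key_identity, key_credential):
    """Return the items endpoint with the API key pair as query parameters."""
    query = urlencode({"key_identity": key_identity, "key_credential": key_credential})
    return f"{api_url.rstrip('/')}/items?{query}"


# Hypothetical ids and URL; look up the real ones in your own installation.
property_ids = {"dcterms:title": 1, "dcterms:description": 4}
payload = build_item_payload(
    {"dcterms:title": ["A test item"], "dcterms:description": ["Created via the API"]},
    property_ids,
)
url = items_endpoint("https://example.org/api/", "my-identity", "my-credential")
body = json.dumps(payload)  # this string would be POSTed as application/json
```

Each property needs both its term and its numeric id, which is one reason a wrapper for common tasks is handy.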

Continue reading →

Testing, testing...

Friday, January 28, 2022

I regularly update the Python packages used in the different sections of the GLAM Workbench, though probably not as often as I should. Part of the problem is that once I've updated the packages, I have to run all the notebooks to make sure I haven't inadvertently broken something – and this takes time. And in those cases where the notebooks need an API key to run, I have to copy and paste the key in at the appropriate spots, then remember to delete them afterwards.
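One common way to avoid pasting keys into notebooks is to read them from an environment variable. The sketch below illustrates that general pattern only; the excerpt does not say which approach was adopted, and the variable name TROVE_API_KEY is an assumption.

```python
import os


def get_api_key(env=None, name="TROVE_API_KEY"):
    """Look up an API key in the environment instead of pasting it into a notebook.

    The variable name is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running")
    return key


# A plain dict stands in for os.environ so the example is reproducible.
api_key = get_api_key({"TROVE_API_KEY": "not-a-real-key"})
```

Because the key never appears in the notebook itself, there is nothing to delete after a test run.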

Continue reading →

Some big pictures of newspapers in Trove and DigitalNZ

Thursday, December 9, 2021

One of the things I really like about Jupyter is the fact that I can share notebooks in a variety of different formats. Tools like QueryPic can run as simple web apps using Voila, static versions of notebooks can be viewed using NBViewer, and live versions can be spun up as required on Binder. It’s also possible to export notebooks as PDFs, slideshows, or just plain-old HTML pages. Just recently I realised I could export notebooks to HTML using the same template I use for Voila.

Continue reading →

Exploring GLAM data at ResBaz

Thursday, December 9, 2021

The video of my key story presentation at ResBaz Queensland (simulcast via ResBaz Sydney) is now available on Vimeo. In it, I explore some of the possibilities of GLAM data by retracing my own journey through WWI service records, The Real Face of White Australia, #redactionart, and Trove – ending up at the GLAM Workbench, which brings together a lot of my tools and resources in a form that anyone can use.

Continue reading →

GLAM Workbench Nectar Cloud Application updated!

Wednesday, December 1, 2021

The newly-updated DigitalNZ and Te Papa sections of the GLAM Workbench have been added to the list of available repositories in the Nectar Research Cloud’s GLAM Workbench Application. This means you can create your very own version of these repositories running in the Nectar Cloud, simply by choosing them from the app’s dropdown list. See the Using Nectar help page for more information. I’ve also taken the opportunity to make use of the new container registry service developed by the ARDC as part of the ARCOS project.

Continue reading →

DigitalNZ & Te Papa sections of the GLAM Workbench updated!

Wednesday, December 1, 2021

In preparation for my talk at ResBaz Aotearoa, I updated the DigitalNZ and Te Papa sections of the GLAM Workbench. Most of the changes are related to management, maintenance, and integration of the repositories. Things like: setting up GitHub Actions to automatically generate Docker images when the repositories change, and to upload the images to the Quay.io container registry; automatic generation of an index.ipynb file based on README.md to act as a front page within Jupyter Lab; addition of a reclaim-manifest.
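The idea of generating an index.ipynb from README.md can be sketched with nothing more than the standard library. This is an assumption about one possible approach, not a description of the script actually used: it wraps the Markdown text in a minimal notebook (nbformat 4) containing a single Markdown cell.

```python
import json


def readme_to_notebook(markdown_text):
    """Wrap Markdown text in a minimal notebook (nbformat 4) with one Markdown cell."""
    return {
        "cells": [
            {
                "cell_type": "markdown",
                "metadata": {},
                "source": markdown_text.splitlines(keepends=True),
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Placeholder README content for illustration.
readme = "# My repository\n\nA short description.\n"
notebook = readme_to_notebook(readme)
notebook_json = json.dumps(notebook, indent=1)  # write this string to index.ipynb
```

Running something like this from an automated workflow would keep the Jupyter front page in step with the README.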

Continue reading →

A template for GLAM Workbench development

Thursday, November 11, 2021

I’m hoping that the GLAM Workbench will encourage GLAM organisations and GLAM data nerds (like me) to create their own Jupyter notebooks. If they do, they can put a link to them in the list of GLAM Jupyter resources. But what if they want to add the notebooks to the GLAM Workbench itself? To make this easier, I’ve been working on a template repository for the GLAM Workbench. It generates a new skeleton repository with all the files you need to develop and manage your own section of the GLAM Workbench.

Continue reading →

Coming up! GLAM Workbench at ResBaz(s)

Thursday, November 4, 2021

Want a bit of added GLAM with your digital research skills? You’re in luck, as I’ll be speaking at not one, but three ResBaz events in November. If you haven’t heard of it before, ResBaz (Research Bazaar) is ‘a worldwide festival promoting the digital literacy at the centre of modern research’. On Wednesday, 24 November I’ll be giving a key story presentation (like a keynote, but with more story!) entitled Exploring GLAM data for ResBaz Queensland.

Continue reading →

New video – using the Trove Newspaper & Gazette Harvester

Monday, November 1, 2021

The latest help video for the GLAM Workbench walks through the web app version of the Trove Newspaper & Gazette Harvester. Just paste in your search url and Trove API key and you can harvest thousands of digitised newspaper articles in minutes!

Continue reading →

Harvest newspaper issues as PDFs

Monday, November 1, 2021

An inquiry on Twitter prompted me to put together a notebook that you can use to download all available issues of a newspaper as PDFs. It was really just a matter of copying code from other tools and making a few modifications. The first step harvests a list of available issues for a particular newspaper from Trove. You can then download the PDFs of those issues, supplying an optional date range.
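The optional date range step can be sketched as a simple filter over the harvested list of issues. The record layout and field names below are hypothetical, and the sketch leaves out the harvesting and downloading steps entirely.

```python
from datetime import date


def issues_in_range(issues, start=None, end=None):
    """Keep issues whose date falls within an optional, inclusive date range."""
    selected = []
    for issue in issues:
        issue_date = date.fromisoformat(issue["date"])
        if start is not None and issue_date < start:
            continue
        if end is not None and issue_date > end:
            continue
        selected.append(issue)
    return selected


# Hypothetical records; the field names are assumptions for this sketch.
issues = [
    {"id": "1", "date": "1930-01-01"},
    {"id": "2", "date": "1930-06-15"},
    {"id": "3", "date": "1931-02-01"},
]
in_1930 = issues_in_range(issues, start=date(1930, 1, 1), end=date(1930, 12, 31))
```

Leaving out both start and end returns every issue, which matches the idea of the date range being optional.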

Continue reading →

GLAM Workbench now in the Nectar Research Cloud!

Thursday, October 21, 2021

The GLAM Workbench isn’t dependent on one big piece of technological infrastructure. It’s basically a collection of Jupyter notebooks, and those notebooks can be used within a variety of different environments. This helps make the GLAM Workbench more sustainable – new components can be swapped in and out as required. It also makes it possible to create different pathways for users, depending on their digital skills, institutional support, and research needs.

Continue reading →

More GLAM Name Index updates from Queensland State Archives and SLWA

Monday, October 18, 2021

A new version of the GLAM Name Index Search is available. An additional 49 indexes have been added, bringing the total to 246. You can now search for names in more than 10.2 million records from 9 organisations. The new indexes come from Queensland State Archives and the State Library of WA. QSA announced on Friday that they’d added two new indexes to their site. When I went to harvest them, I realised there were another 25 indexes that I hadn’t previously picked up.

Continue reading →

Getting data about newspaper issues in Trove

Friday, October 15, 2021

When you search Trove’s newspapers, you find articles – these articles are grouped by page, and all the pages from a particular date make up an issue. But how do you find out what issues are available? How do you get a list of dates when newspapers were published? This notebook in the GLAM Workbench shows how you can get information about issues from the Trove API. Using the notebook, I’ve created a couple of datasets ready for download and use.
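The article, page, and issue hierarchy described above can be pictured with a small grouping sketch. The article records are hypothetical and nothing here calls the Trove API; it only shows how a list of issue dates falls out once articles are grouped by date.

```python
from collections import defaultdict


def group_articles(articles):
    """Group article ids by issue date, then by page number."""
    issues = defaultdict(lambda: defaultdict(list))
    for article in articles:
        issues[article["date"]][article["page"]].append(article["id"])
    return {day: dict(pages) for day, pages in issues.items()}


# Hypothetical article records for a single newspaper title.
articles = [
    {"id": "a1", "date": "1930-01-01", "page": 1},
    {"id": "a2", "date": "1930-01-01", "page": 1},
    {"id": "a3", "date": "1930-01-01", "page": 2},
    {"id": "a4", "date": "1930-01-02", "page": 1},
]
issues = group_articles(articles)
issue_dates = sorted(issues)
```

Working backwards from articles like this would be slow for a whole newspaper, which is why asking for issue information directly is the more practical route.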

Continue reading →

GLAM Workbench at eResearch Australasia 2021

Friday, October 15, 2021

Way back in 2013, I went to the eResearch Australasia conference as the manager of Trove to talk about new research possibilities using the Trove API. Eight years later I was back, still spruiking the possibilities of Trove data. This time, however, I was discussing Trove in the broader context of GLAM data – all the exciting possibilities that have emerged as galleries, libraries, archives and museums make more of their collections available in machine-readable form.

Continue reading →

New Python package to download Trove newspaper images

Tuesday, October 5, 2021

There’s no reliable way of downloading an image of a Trove newspaper article from the web interface. The image download option produces an HTML page with embedded images, and the article is often sliced into pieces to fit the page. This Python package includes tools to download articles as complete JPEG images. If an article is printed across multiple newspaper pages, multiple images will be downloaded – one for each page.

Continue reading →

More records for the GLAM Name Index Search

Wednesday, September 29, 2021

Two more datasets have been added to the GLAM Name Index Search! From the History Trust of South Australia and Collab, I’ve added: Passengers in History – that’s 371,894 records of people arriving in South Australia from 1836 to 1961; and Women’s Suffrage Petition 1894 (South Australia) – another 10,638 names. In total there are 9.67 million name records to search across 197 datasets provided by 9 GLAM organisations!

Continue reading →

More QueryPic in action

Wednesday, September 29, 2021

Recently I created a list of publications that made use of QueryPic, my tool to visualise searches in Trove’s digitised newspapers. Here’s another example of the GLAM Workbench and QueryPic in action, in Professor Julian Meyrick’s recent keynote lecture, ‘Looking Forward to the 1950s: A Hauntological Method for Investigating Australian Theatre History’.

Continue reading →

Some thoughts on the ‘Trove Researcher Platform for Advanced Research’ draft plan

Friday, September 10, 2021

Late last year the Federal Government announced it was making an $8.9 million investment in HASS and Indigenous research infrastructure. This program is being managed by the ARDC and will lead to the development of a HASS Research Data Commons. According to the ARDC, a research data commons ‘brings together people, skills, data, and related resources such as storage, compute, software, and models to enable researchers to conduct world class data-intensive research’.

Continue reading →

Some research projects that have used QueryPic

Monday, August 30, 2021

A Twitter thread about some of the research uses of QueryPic… ‘QueryPic, my tool for visualising searches in @TroveAustralia’s digitised newspapers, has been around in different forms for more than 10 years. The latest version is part of the #GLAMWorkbench: https://t.co/qnY5tVDwgY #researchinfrastructure’ (Tim Sherratt, @wragge, August 29, 2021). I thought I’d highlight some of the research publications that have made use of QueryPic over the years, so, in no particular order.

Continue reading →

Government publications in Trove

Monday, August 30, 2021

Over the last few weeks I’ve been updating my harvests of OCRd text from digitised books and periodicals in Trove. As part of the harvesting process, I’ve created lists of both that are available in digital form – this includes digitised works, as well as those that are born-digital (such as PDFs or epubs). I’ve published the full lists of books and periodicals as searchable databases to make them easy to explore.

Continue reading →

GLAM Workbench – a platform for digital HASS research

Thursday, August 26, 2021

We’re in the midst of planning for the HASS Research Data Commons, which will deliver some much-needed investment in digital research infrastructure for the humanities and social sciences. Amongst the funded programs are tools for text analysis as part of the Linguistics Data Commons, and a platform for more advanced research using Trove. I’m hoping that this will be an opportunity to take stock of existing tools and resources, and build flexible pathways for researchers that enable them to collect, move, analyse, preserve, and share data across different platforms and services.

Continue reading →

A Family History Month experiment – search millions of name records from GLAM organisations

Monday, August 23, 2021

There’s a lot of rich historical data contained within the indexes that Australian GLAM organisations provide to help people navigate their records. These indexes, often created by volunteers, allow access by key fields such as name, date or location. They aid discovery, but also allow new forms of analysis and visualisation. Kate Bagnall and I wrote about some of the possibilities, and the difficulties, in this recently published article. Many of these indexes can be downloaded from government data portals.

Continue reading →

Explore Trove’s digitised books

Monday, August 16, 2021

The Trove books section of the GLAM Workbench has been updated! There’s freshly-harvested data, as well as updated Python packages, integration with Reclaim Cloud, and automated Docker builds. Included is a notebook to harvest details of all books available from Trove in digital form. This includes both digitised books that have been scanned and OCRd, and born-digital publications, such as PDFs and epubs. The definition of ‘books’ is pretty loose – I’ve harvested details of anything that has been assigned the format ‘Book’ in Trove, but this includes ephemera, such as posters, pamphlets, and advertising.

Continue reading →

A miscellany of ephemera, oddities, & estrays

Friday, August 13, 2021

I’m just in the midst of updating my harvest of OCRd text from Trove’s digitised books (more about that soon!). But amongst the items catalogued as ‘books’ is a wide assortment of ephemera, posters, advertisements, and other oddities. There’s no consistent way of identifying these items through the search interface, but because I’ve found the number of pages in each ‘book’ as part of the harvesting process, I can limit results to items with just a single digitised page – there’s more than 1,500!
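Limiting results to single-page items amounts to a one-line filter once the page counts are known. The records and field names in this sketch are hypothetical stand-ins for the harvested data.

```python
def single_page_items(records):
    """Return records for 'books' that have exactly one digitised page."""
    return [record for record in records if record.get("pages") == 1]


# Hypothetical harvested records; the real dataset has its own field names.
records = [
    {"title": "A poster", "pages": 1},
    {"title": "A novel", "pages": 312},
    {"title": "A handbill", "pages": 1},
]
ephemera = single_page_items(records)
```

Records with no page count are skipped rather than assumed to be single pages.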

Continue reading →