Exploring the front pages of newspapers (10 years on)

Way back in 2012, I used the brand new Trove API to download the details of 4 million articles published on the front pages of newspapers. I did it for two reasons: first, I wanted to see how the content of front pages changed over time; and second, I wanted to show that large-scale data wrangling was entirely possible using nothing more than a laptop and a home broadband connection. I described my adventures in this blog post, but if you look at it now you’ll see lots of sad, empty boxes where live charts used to be.


Trove API Console updates

The Trove API Console provides examples of the Trove API in action that you can run, edit, and share. It’s been online for 9 years now, and I’ve just updated it to use version 3 of the Trove API by default. I’ve also added a new ‘Share’ button that makes it easier to share and embed examples. If you click on the ‘Share’ button, a box will pop up. If you add a comment, this will appear above the example query when users follow the shared link.


Getting to work on the Trove Data Guide

The ARDC has started work on the development of a HASS Community Data Lab to support digital research in the humanities. I’m part of the team of contractors, and my work package is focused on the development of a Trove Data Guide. My aim is to give researchers what they need to use and understand all the varieties of data that Trove makes available – from newspaper text to digitised, high-resolution maps.


Updated harvest of NSW State Archives indexes – more than 2 million rows of data!

The NSW State Archives (now part of Museums of History NSW) publishes a series of useful indexes to its collections. The indexes include basic data transcribed from the records, such as names, dates, and places, providing fine-grained access to the collections. But when they’re explored as data, the indexes also suggest new ways of analysing, visualising, and linking sets of records. (For some of the possibilities and challenges of using this sort of data, see Missing Links: Data Stories from the Archive of British Settler Colonial Citizenship.)


A big milestone, Trove contributor data, and the coming of API v3 – recent GLAM Workbench updates

There have been quite a few GLAM Workbench updates over the last month, so here are some notes. (See February’s update for more recent changes…)

General developments

After many months of work, all thirteen Trove repositories within the GLAM Workbench have been updated to include standard configurations, integrations, and basic tests. This will make ongoing development and maintenance much easier. Docker images of every repository are now built automatically whenever the code changes.


Maps, people, lists & more – recent updates to Trove resources in the GLAM Workbench

Once again I’ve gotten a bit behind in noting GLAM Workbench updates, so here’s a quick catch up on some Trove-related changes from the last couple of months.

Trove API introduction

The section that introduces the Trove API (or APIs!) hasn’t had much love over recent years. I’m hoping to add some more content in the coming months, but for now I just did a bit of maintenance – updating Python packages and config files, including tests, and setting up automated builds of Docker containers.


Real Face of White Australia – updated site to transcribe records from the National Archives of Australia

Back in 2017, I worked with students from my ‘Exploring Digital Heritage’ class at the University of Canberra to develop and launch a site to transcribe records from the National Archives of Australia relating to the administration of the White Australia Policy. The highlight was a weekend-long ‘transcribe-a-thon’ held in Kings Hall at Old Parliament House. This was part of the Real Face of White Australia project – an ongoing effort by Kate Bagnall and me to increase awareness and understanding of the White Australia Policy records held by the NAA.


Recent presentations – Library of Congress Data Jam, Everyday Heritage, Wikidata, and GLAM Workbench!

October and November brought a flurry of presentations from which I’m still recovering. Here are a few details and links.

Library of Congress Data Jam

In October, the Computing Cultural Heritage in the Cloud project at the Library of Congress organised a Data Jam. I was invited to spend a couple of weeks playing around with one of their datasets and to report on the results. I ended up trying to find references to countries in a collection of 90,000 OCRd books.


The Australian history industry and the impact of digitisation (open access preprint chapter)

The Australian History Industry was published recently. Edited by Paul Ashton and Paula Hamilton, the book ‘explores the complex, multi-roomed house of Australian history’, covering academic, school, and public history, the impact of digital technologies, and the relationship of history to memory, social justice, politics, and cultural practice. My chapter ‘Digital revolutions: The limits and affordances of online collections’ looks at how digitisation of GLAM collections has (and hasn’t) changed historical practice:


Recent updates to trove-newspaper-harvester and trove-newspaper-images

Catching up on some software package updates over the last few months.

The trove-newspaper-harvester package is now at v0.6.5. Recent changes include:

- Fix to handle articles with missing metadata
- Don’t try to re-download existing text and PDF files on restart
- Better error messages for CLI
- Better handling of exceptions

The trove-newspaper-images package is now at v0.2.1. Recent changes include:

- Minor changes to make it easier to use this package within the trove-newspaper-harvester
- Use argparse directly for the CLI, putting the initialisation within a function to avoid conflicts
- Remove the messages printed to stdout
- Updated the repository and documentation to use nbdev v2
- Don’t try to re-download existing images
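
The ‘don’t try to re-download existing files on restart’ behaviour mentioned for both packages can be sketched in a few lines. This is only an illustration of the general idea, not the code used in either package: the function name `save_if_missing` and the `fetch` callback are hypothetical, and writing to a temporary `.part` file first is my own assumption about how to avoid mistaking a half-written file for a finished one.

```python
from pathlib import Path
from typing import Callable


def save_if_missing(path: Path, fetch: Callable[[], bytes]) -> bool:
    """Save fetched content to `path` unless a non-empty file is already there.

    Returns True if a download happened, False if it was skipped.
    `fetch` is a hypothetical callback that returns the file content.
    """
    if path.exists() and path.stat().st_size > 0:
        return False  # already saved by an earlier run, so skip it
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first, so an interrupted run never leaves
    # a partial file that a later run would treat as complete.
    tmp_path = path.with_name(path.name + ".part")
    tmp_path.write_bytes(fetch())
    tmp_path.replace(path)
    return True
```

Calling the function twice with the same path only triggers one download, which is what makes restarting a long harvest cheap.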


Do you want your Trove newspaper articles in bulk? Meet the new Trove Newspaper Harvester Python package!

The Trove Newspaper Harvester has been around in different forms for more than a decade. It helps you download all the articles in a Trove newspaper search, opening up new possibilities for large-scale analysis. You can use it as a command-line tool by installing a Python package, or through the Trove Newspaper Harvester section of the GLAM Workbench. I’ve just overhauled development of the Python package. The new trove-newspaper-harvester replaces the old troveharvester repository.


From 48 PDFs to one searchable database – opening up the Tasmanian Post Office Directories with the GLAM Workbench

A few weeks ago I created a new search interface to the NSW Post Office Directories from 1886 to 1950. Since then, I’ve used the same process on the Sydney Telephone Directories from 1926 to 1954. Both of these publications had been digitised by the State Library of NSW and made available through Trove. To build the new interfaces I downloaded the text from Trove, indexed it by line, and linked it back to the online page images.
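
The ‘index it by line, and link it back to the page images’ step might look something like the sketch below. It is a simplified stand-in rather than the actual process: the `pages` dictionary, the table layout, and especially the `PAGE_URL` pattern are assumptions made for illustration, and a plain SQLite `LIKE` query stands in for a proper full-text index.

```python
import sqlite3

# Hypothetical URL pattern for page images: a placeholder, not Trove's real scheme.
PAGE_URL = "https://example.org/pages/{page_id}"


def build_line_index(pages: dict[str, str]) -> sqlite3.Connection:
    """Split each page's text into lines and store one row per non-empty line."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lines (page_id TEXT, line_number INTEGER, text TEXT)")
    for page_id, text in pages.items():
        # Number every line (including blank ones) so the stored number
        # matches the line's position in the original page text.
        rows = [
            (page_id, number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if line.strip()
        ]
        conn.executemany("INSERT INTO lines VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search_lines(conn: sqlite3.Connection, term: str) -> list[tuple[str, int, str]]:
    """Return (page URL, line number, line text) for every line containing `term`."""
    cursor = conn.execute(
        "SELECT page_id, line_number, text FROM lines "
        "WHERE text LIKE ? ORDER BY page_id, line_number",
        (f"%{term}%",),
    )
    return [(PAGE_URL.format(page_id=p), n, t) for p, n, t in cursor]
```

Because each row keeps its page identifier, every search hit can point straight back to the page it came from.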


Fresh harvest of OCRd text from Trove's digitised periodicals – 9gb of text to explore and analyse!

I’ve updated the GLAM Workbench’s harvest of OCRd text from Trove’s digitised periodicals. This is a completely fresh harvest, so it should include any corrections made in recent months. It includes:

- 1,430 periodicals
- OCRd text from 41,645 issues
- About 9gb of text

The easiest way to explore the harvest is probably this human-readable list. The list of periodicals with OCRd text is also available as a CSV. You can find more details in the Trove journals section of the GLAM Workbench, and download the complete corpus from CloudStor.


Explore Trove's digitised newspapers by place

I’ve updated my map displaying places where Trove digitised newspapers were published or distributed. You can view all the places on a single map – zoom in for more markers, and click on a marker for title details and a link back to Trove. If you want to find newspapers from a particular area, just click on a location using this map to view the 10 closest titles. You can view or download the dataset used to construct the map.
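
One way to find the ‘10 closest titles’ to a clicked location is to sort the places by great-circle distance. The sketch below uses the haversine formula; it is an assumption about how such a lookup could work, not a description of the map’s actual code, and the `latitude` and `longitude` field names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def closest_titles(places: list[dict], lat: float, lon: float, limit: int = 10) -> list[dict]:
    """Return up to `limit` places, nearest first.

    Each place is a dict with (hypothetical) 'latitude' and 'longitude' keys.
    """
    return sorted(
        places,
        key=lambda p: haversine_km(lat, lon, p["latitude"], p["longitude"]),
    )[:limit]
```

Sorting the whole list is fine for a few thousand places; a spatial index would only be worth the extra complexity for much larger datasets.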


Making NSW Postal Directories (and other digitised directories) easier to search with the GLAM Workbench and Datasette

As part of my work on the Everyday Heritage project I’m looking at how we can make better use of digitised collections to explore the everyday experiences woven around places such as Parramatta Road in Sydney. For example, the NSW Postal Directories from 1886 to 1908 and 1909 to 1950 have been digitised by the State Library of NSW and made available through Trove. The directories list residences and businesses by name and street location.


Interested in Victorian shipwrecks? Kim Doyle and Mitchell Harrop have added a new notebook to the Heritage Council of Victoria section of the GLAM Workbench exploring shipwrecks in the Victorian Heritage Database: glam-workbench.net/heritage-…

Updates!

Minor update to RecordSearch Data Scraper – now captures ‘institution title’ for agencies if it is present. pypi.org/project/r…

Many thanks to the British Library – sponsors of the GLAM Workbench’s web archives section!

You might have noticed some changes to the web archives section of the GLAM Workbench. I’m very excited to announce that the British Library is now sponsoring the web archives section! Many thanks to the British Library and the UK Web Archive for their support – it really makes a difference. The web archives section was developed in 2020 with the support of the International Internet Preservation Consortium’s Discretionary Funding Programme, in collaboration with the British Library, the National Library of Australia, and the National Library of New Zealand.


New GLAM data to search, visualise and explore using the GLAM Workbench!

There’s lots of GLAM data out there if you know where to look! For the past few years I’ve been harvesting a list of datasets published by Australian galleries, libraries, archives, and museums through open government data portals. I’ve just updated the harvest and there are now 463 datasets containing 1,192 files. There’s a human-readable version of the list that you can browse. If you just want the data, you can download it as a CSV.
