Tim Sherratt

The Australian history industry and the impact of digitisation (open access preprint chapter)

Monday, November 21, 2022

The Australian History Industry was published recently. Edited by Paul Ashton and Paula Hamilton, the book ‘explores the complex, multi-roomed house of Australian history’, covering academic, school, and public history, the impact of digital technologies, and the relationship of history to memory, social justice, politics, and cultural practice. My chapter, ‘Digital revolutions: The limits and affordances of online collections’, looks at how digitisation of GLAM collections has (and hasn’t) changed historical practice.

Continue reading →

Recent updates to trove-newspaper-harvester and trove-newspaper-images

Monday, November 21, 2022

Catching up on some software package updates over the last few months.

The trove-newspaper-harvester package is now at v0.6.5. Recent changes include:

Fix to handle articles with missing metadata
Don’t try to re-download existing text and PDF files on restart
Better error messages for CLI
Better handling of exceptions

The trove-newspaper-images package is now at v0.2.1. Recent changes include:

Minor changes to make it easier to use this package within the trove-newspaper-harvester
Use argparse directly for the CLI, putting the initialisation within a function to avoid conflicts
Remove the messages printed to stdout
Updated the repository and documentation to use nbdev v2
Don’t try to re-download existing images
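The idea of not re-downloading files that already exist can be sketched in a few lines. This is a generic illustration, not the packages' actual code: `save_if_missing` and the `fetch` callable are hypothetical names, and the check is simply whether the target path is already on disk.

```python
from pathlib import Path


def save_if_missing(path, fetch):
    """Write fetch() to path unless the file already exists.

    `fetch` is a hypothetical zero-argument callable that returns text;
    it is only called when the file is missing, so a restarted harvest
    skips work it has already done.
    """
    path = Path(path)
    if path.exists():
        return False  # already downloaded, nothing to do
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(fetch(), encoding="utf-8")
    return True
```

Calling it twice with the same path downloads once and skips the second time, which is the behaviour you want when restarting an interrupted harvest.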

Continue reading →

Do you want your Trove newspaper articles in bulk? Meet the new Trove Newspaper Harvester Python package!

Thursday, September 22, 2022

The Trove Newspaper Harvester has been around in different forms for more than a decade. It helps you download all the articles in a Trove newspaper search, opening up new possibilities for large-scale analysis. You can use it as a command-line tool by installing a Python package, or through the Trove Newspaper Harvester section of the GLAM Workbench. I’ve just overhauled development of the Python package. The new trove-newspaper-harvester replaces the old troveharvester repository.

Continue reading →

From 48 PDFs to one searchable database – opening up the Tasmanian Post Office Directories with the GLAM Workbench

Thursday, September 15, 2022

A few weeks ago I created a new search interface to the NSW Post Office Directories from 1886 to 1950. Since then, I’ve used the same process on the Sydney Telephone Directories from 1926 to 1954. Both of these publications had been digitised by the State Library of NSW and made available through Trove. To build the new interfaces I downloaded the text from Trove, indexed it by line, and linked it back to the online page images.
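As a rough illustration of the 'index by line, link back to the page' approach, here's a minimal sketch using Python's built-in sqlite3 module. The sample text, the page identifiers, and the `PAGE_URL` pattern are all invented for the example; they are not real Trove data or URLs, and the actual interfaces may be built quite differently.

```python
import sqlite3

# Hypothetical OCR text keyed by page identifier (not real Trove data).
pages = {
    "page-001": "SMITH John, baker, Parramatta rd\nSMYTHE Mary, draper, George st",
    "page-002": "JONES Ann, grocer, Parramatta rd",
}

# Placeholder URL pattern, an assumption for illustration only.
PAGE_URL = "https://example.org/pages/{page_id}"


def build_index(pages):
    """Store every non-empty line of text alongside its page and line number."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE lines (page_id TEXT, line_no INTEGER, text TEXT)")
    for page_id, text in pages.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            if line.strip():
                db.execute(
                    "INSERT INTO lines VALUES (?, ?, ?)",
                    (page_id, line_no, line.strip()),
                )
    db.commit()
    return db


def search(db, term):
    """Return matching lines, each paired with a link back to its page."""
    rows = db.execute(
        "SELECT page_id, line_no, text FROM lines "
        "WHERE text LIKE ? ORDER BY page_id, line_no",
        (f"%{term}%",),
    ).fetchall()
    return [(text, PAGE_URL.format(page_id=page_id)) for page_id, _, text in rows]
```

Because each row keeps its page identifier, a search result can always be traced back to the page image it came from.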

Continue reading →

Fresh harvest of OCRd text from Trove's digitised periodicals – 9gb of text to explore and analyse!

Monday, September 5, 2022

I’ve updated the GLAM Workbench’s harvest of OCRd text from Trove’s digitised periodicals. This is a completely fresh harvest, so should include any corrections made in recent months. It includes:

1,430 periodicals
OCRd text from 41,645 issues
About 9gb of text

The easiest way to explore the harvest is probably this human-readable list. The list of periodicals with OCRd text is also available as a CSV. You can find more details in the Trove journals section of the GLAM Workbench, and download the complete corpus from CloudStor.

Continue reading →

Explore Trove's digitised newspapers by place

Monday, September 5, 2022

I’ve updated my map displaying places where Trove digitised newspapers were published or distributed. You can view all the places on a single map – zoom in for more markers, and click on a marker for title details and a link back to Trove. If you want to find newspapers from a particular area, just click on a location using this map to view the 10 closest titles. You can view or download the dataset used to construct the map.
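One common way to rank places by closeness to a clicked point is great-circle (haversine) distance. The sketch below assumes that approach; it isn't necessarily how the map itself works. The titles are invented, and the coordinates are approximate city locations used only as sample data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def closest(places, lat, lon, n=10):
    """Return the n places nearest to the given point."""
    return sorted(
        places, key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"])
    )[:n]


# Hypothetical titles with approximate coordinates (sample data only).
places = [
    {"title": "Melbourne title", "lat": -37.81, "lon": 144.96},
    {"title": "Parramatta title", "lat": -33.81, "lon": 151.00},
    {"title": "Hobart title", "lat": -42.88, "lon": 147.33},
    {"title": "Sydney title", "lat": -33.87, "lon": 151.21},
]
```

For example, `closest(places, -33.87, 151.21, n=2)` returns the Sydney and Parramatta entries, in that order.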

Continue reading →

Making NSW Postal Directories (and other digitised directories) easier to search with the GLAM Workbench and Datasette

Thursday, September 1, 2022

As part of my work on the Everyday Heritage project I’m looking at how we can make better use of digitised collections to explore the everyday experiences woven around places such as Parramatta Road in Sydney. For example, the NSW Postal Directories from 1886 to 1908 and 1909 to 1950 have been digitised by the State Library of NSW and made available through Trove. The directories list residences and businesses by name and street location.

Continue reading →

Monday, August 29, 2022 →

Interested in Victorian shipwrecks? Kim Doyle and Mitchell Harrop have added a new notebook to the Heritage Council of Victoria section of the GLAM Workbench exploring shipwrecks in the Victorian Heritage Database: glam-workbench.net/heritage-…

Monday, August 29, 2022 →

Updates!

troveharvester Python package updated to v0.5.1: github.com/wragge/tr…
Trove Newspaper Harvester section of #GLAMWorkbench updated to v1.1.1 to use latest troveharvester: glam-workbench.net/trove-har…

Thursday, August 25, 2022 →

Minor update to RecordSearch Data Scraper – now captures ‘institution title’ for agencies if it is present. pypi.org/project/r…

Many thanks to the British Library – sponsors of the GLAM Workbench’s web archives section!

Tuesday, August 16, 2022

You might have noticed some changes to the web archives section of the GLAM Workbench. I’m very excited to announce that the British Library is now sponsoring the web archives section! Many thanks to the British Library and the UK Web Archive for their support – it really makes a difference. The web archives section was developed in 2020 with the support of the International Internet Preservation Consortium’s Discretionary Funding Programme, in collaboration with the British Library, the National Library of Australia, and the National Library of New Zealand.

Continue reading →

New GLAM data to search, visualise and explore using the GLAM Workbench!

Monday, August 15, 2022

There’s lots of GLAM data out there if you know where to look! For the past few years I’ve been harvesting a list of datasets published by Australian galleries, libraries, archives, and museums through open government data portals. I’ve just updated the harvest and there are now 463 datasets containing 1,192 files. There’s a human-readable version of the list that you can browse. If you just want the data you can download it as a CSV.

Continue reading →

Zotero now saves links to digitised items in Trove from the NLA catalogue!

Tuesday, August 9, 2022

I’ve made a small change to the Zotero translator for the National Library of Australia’s catalogue. Now, if there’s a link to a digitised version of the work in Trove, that link will be saved in Zotero’s url field. This makes it quicker and easier to view digitised items – just click on the ‘URL’ label in Zotero to open the link. It’s also handy if you’re viewing a digitised work in Trove and want to capture the metadata about it.

Continue reading →

View embedded JSON metadata for Trove's digitised books and journals

Monday, August 1, 2022

The metadata for digitised books and journals in Trove can seem a bit sparse, but there’s quite a lot of useful metadata embedded within Trove’s web pages that isn’t displayed to users or made available through the Trove API. This notebook in the GLAM Workbench shows you how you can access it. To make it even easier, I’ve added a new endpoint to my Trove Proxy that returns the metadata in JSON format.
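To give a general sense of what pulling embedded JSON out of a web page involves, here's a minimal sketch using only the Python standard library. The markup is invented: it assumes the JSON sits in a `<script type="application/json">` element, which may not be how Trove's pages embed their metadata. The notebook describes the real approach.

```python
import json
from html.parser import HTMLParser


class JSONScriptParser(HTMLParser):
    """Collect the contents of <script type="application/json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_json = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json:
            self.documents.append(json.loads("".join(self._chunks)))
            self._in_json = False


def extract_json(html):
    """Return a list of parsed JSON documents embedded in the HTML."""
    parser = JSONScriptParser()
    parser.feed(html)
    parser.close()
    return parser.documents
```

Scripts of other types are ignored, so ordinary JavaScript on the page doesn't get passed to the JSON parser.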

Continue reading →

Where did all those NSW articles go? Trove Newspapers Data Dashboard update!

Friday, July 29, 2022

I was looking at my Trove Newspapers Data Dashboard again last night trying to figure out why the number of newspaper articles from NSW seemed to have dropped by more than 700,000 since my harvesting began. It took me a while to figure out, but it seems that the search index was rebuilt on 31 May, and that caused some major shifts in the distribution of articles by state, as reported by the main result API.
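Tracking down a shift like this comes down to comparing consecutive harvests and looking for the biggest jump. Here's a minimal sketch of that comparison. The function name, the week labels, and the counts are all invented for illustration and bear no relation to the real harvested totals.

```python
def largest_shift(harvests):
    """Find the biggest change in any state's count between consecutive harvests.

    `harvests` is a list of (label, {state: count}) pairs in date order.
    Returns (label, state, change) for the largest absolute change.
    """
    best = None
    for (_, prev), (label, curr) in zip(harvests, harvests[1:]):
        for state in sorted(set(prev) | set(curr)):
            change = curr.get(state, 0) - prev.get(state, 0)
            if best is None or abs(change) > abs(best[2]):
                best = (label, state, change)
    return best


# Invented sample data: three weekly harvests of article totals by state.
harvests = [
    ("week-1", {"NSW": 1000, "Victoria": 500}),
    ("week-2", {"NSW": 1010, "Victoria": 505}),
    ("week-3", {"NSW": 700, "Victoria": 700}),
]
```

With this sample data, `largest_shift(harvests)` points to the NSW drop in the third harvest, which is the kind of signal that suggests something changed in the index rather than in the collection.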

Continue reading →

Catching up – some recent GLAM Workbench updates!

Thursday, July 28, 2022

There have been lots of small updates to the GLAM Workbench over the last couple of months and I’ve fallen behind in sharing details. So here’s an omnibus list of everything I can remember…

Data

Weekly harvests of basic Trove newspaper data continue; there’s now about three months’ worth. You can view a summary of the harvested data through the brand new Trove Newspaper Data Dashboard. The Dashboard is generated from a Jupyter notebook and is updated whenever there’s a new data harvest.

Continue reading →

Calling all Tasmanian historians – you can now save resources from Libraries Tasmania into Zotero!

Thursday, July 14, 2022

I’ve created a Zotero translator for the Libraries Tasmania catalogue. Using it, you can save metadata and digital resources to your own research database with a single click. Libraries Tasmania actually has three catalogues rolled into one – the main library catalogue, the Archives catalogue, and the Names Index. The translator works across all three. Features include:

Select and save items from a page of search results.
Save individual items across the full range of formats.

Continue reading →

Thursday, July 14, 2022 →

Updated dataset! Harvests of Trove list metadata from 2018, 2020, and 2022 are now available on Zenodo: doi.org/10.5281/z… Another addition to the growing collection of historical Trove data. #GLAMWorkbench

Screen capture of version information from Zenodo showing that there are three available versions, v1.0, v1.1, and v1.2.

Sunday, July 10, 2022 →

Updated dataset! Details of 2,201,090 unique public tags added to 9,370,614 resources in Trove between August 2008 and July 2022. Useful for exploring folksonomies, and the way people organise and use massive online resources like Trove. doi.org/10.5281/z…

Saturday, July 9, 2022 →

Ok, I’ve created a Zenodo community for datasets documenting changes in the content and structure of Trove. Lots more to add… zenodo.org/communiti…