The death and (hopefully) resurrection of Trove Twitter bots

Today version 1 of the Trove API was decommissioned. As I explained elsewhere, this meant that a number of Trove Twitter bots also died. The problem is that version 2 of the API provides no easy way to randomly select records. Bots, and other apps that share random content, require major reworking. After a lot of experimentation, I’ve settled on a few methods for selecting random-ish results. They’re far from perfect, but they seem to work reliably.

Updated my NSW public holidays data to include a few extras proclaimed by the government: nbviewer.jupyter.org/github/wr…

Creators and users of my Trove Twitter Bots, please read and share this update!

tl;dr Version 1 of the Trove API will be discontinued soon so Trove Twitter bots need to be upgraded. Unfortunately, Version 2 of the Trove API doesn’t support the random selection of resources, so the current behaviour of many bots will change.

The problem

In January 2018, I created a series of templates on Glitch that made it easy for people to build their own Trove Twitter bots. And they did!

Over the last few weeks I’ve been exploring ways of recording dates for 70,000 digitised pages from Sydney Stock Exchange records in the @TheANUArchives. Here’s the progress so far.

Here’s my attempt to calculate NSW holidays from 1900 to 1950. It’s probably incomplete, but it’s a start… nbviewer.jupyter.org/github/wr…

A couple of years ago I gave a talk in which I tried to justify what I do as research. I was going to turn it into an article, but never did. So here’s ‘The multiplication of contexts’ as a blog post.

The @naagovau RecordSearch section of the #GLAMWorkbench has been updated with more notebooks to help you get Australian archives data in a usable form. glam-workbench.github.io/recordsea… Useful for #twitterstorians, #ozhist, & #govhack!

Crikey, my notebook for getting useful data out of @naagovau keeps growing! Now with sections on tackling large series, and harvesting ALL the images from digitised files. https://nbviewer.jupyter.org/github/GLAM-Workbench/recordsearch/blob/master/harvesting_items_from_a_search.ipynb

Want to save searches for items in @naagovau’s RecordSearch as CSVs for exploration & analysis? This notebook walks through the process of constructing, managing, and saving data harvests. #dhhacks

I’ve updated my harvest of OCRd text from digitised journals in @TroveAustralia. The complete dataset now includes 33,035 issues from 720 titles – about 8GB of text to explore. Details in the #GLAMWorkbench: glam-workbench.github.io/trove-jou… #dhhacks

My app to browse & search @TroveAustralia’s digitised journals has been updated! Since 4 July, 112 new titles & 86,211 new articles have been added to Trove. Many of these new titles are parliamentary papers. Explore here: trove-titles.herokuapp.com #dhhacks

Another WIP notebook in need of additional documentation… This one explores the stats around volunteer correction of OCR errors in @TroveAustralia’s newspapers. More to come!

And this notebook uses TF-IDF to explore the OCRd text of a digitised journal from Trove. Get the top TF-IDF scores for each year across a journal’s life and see how they change. More documentation coming!
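
For anyone wondering what "top TF-IDF scores for each year" means in practice, here’s a minimal sketch in plain Python. It is not the notebook’s code: the function name `top_tfidf_by_year`, the input shape (a dict mapping each year to that year’s combined text), and the crude whitespace tokenising are all assumptions made for illustration. Each year is treated as one document, so a word scores highly when it is frequent in that year but absent from most other years.

```python
import math
from collections import Counter


def top_tfidf_by_year(texts_by_year, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each year.

    texts_by_year is a hypothetical input: a dict mapping a year to the
    combined OCRd text for that year. Each year counts as one document.
    """
    # Crude tokenising (an assumption): lowercase, split on whitespace.
    tokens_by_year = {
        year: text.lower().split() for year, text in texts_by_year.items()
    }
    n_docs = len(tokens_by_year)

    # Document frequency: the number of years in which each term appears.
    doc_freq = Counter()
    for tokens in tokens_by_year.values():
        doc_freq.update(set(tokens))

    results = {}
    for year, tokens in tokens_by_year.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores = {}
        for term, count in counts.items():
            tf = count / total                         # term frequency within the year
            idf = math.log(n_docs / doc_freq[term])    # zero if the term is in every year
            scores[term] = tf * idf
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        results[year] = ranked[:top_n]
    return results


sample = {
    1901: "wool wool sheep the",
    1902: "gold gold mine the",
}
for year, terms in top_tfidf_by_year(sample, top_n=2).items():
    print(year, [term for term, score in terms])
```

Words that turn up in every year (like "the" in the sample) get a score of zero, which is why the top terms tend to be the ones that make a particular year distinctive. A library implementation would normally add smoothing and better tokenising.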

This notebook lets you download the OCRd text of a digitised journal from @TroveAustralia (via CloudStor) and then explore word frequencies over time. More documentation coming soon!

A new notebook looking at the data about digitised journals on @TroveAustralia. #dhhacks

There’s a new section of the GLAM Workbench devoted to the National Museum of Australia collection API! Harvest @nma data, then explore it by time and place. #dhhacks

The second new notebook looks at @TroveAustralia’s newspapers as a whole, visualising both by time and by state. Along the way it looks at favourites such as the WWI effect and the copyright cliff of death. #dhhacks

Some brand new Jupyter notebooks for those interested in #ozhist & digital exploration of @TroveAustralia’s newspapers. The first walks through different ways of visualising newspaper searches over time. #dhhacks

I’ve updated the @invisibleaus data repository with latest transcriptions/markings from White Australia Policy records in @naagovau.

According to my last harvest, @TroveAustralia’s digitised journals comprise 31,216 separate issues. Here is the number of issues by year.