I’ve been doing a bit of work behind the scenes lately to prepare for a major update to the GLAM Workbench. My plan is to provide one-click installation of any of the GLAM Workbench repositories on the Reclaim Cloud platform. This will provide a useful step up from Binder for any researcher who wants to do large-scale or sustained work using the GLAM Workbench. Reclaim Cloud is a paid service, but they do a great job supporting digital scholarship in the humanities, and it’s fairly easy to minimise your costs by shutting down environments when they’re not in use.
I’ve given a couple of talks lately on the GLAM Workbench and some of my other work relating to the construction of online access to GLAM collections. Videos and slides are available for both:
From collections as data to collections as infrastructure: Building the GLAM Workbench, seminar for the Centre for Creative and Cultural Research, University of Canberra, 22 February 2021 – video (40 minutes) and slides
Building the GLAM Workbench (and various other projects such as The Real Face of White Australia, Closed Access, and redacted), guest lecture for the Cultural Data Sculpting course, EPFL, Switzerland, 18 March 2021 – video (1hr 40mins) and slides
I’ve also updated the presentations page in the GLAM Workbench.
It was Open Data Day on Saturday 6 March – here are some of the ready-to-go datasets you can find in the GLAM Workbench – there’s something for historians, humanities researchers, teachers & more!
First here’s a list of Australian GLAM (that’s galleries, libraries, archives & museums) data sources. It includes APIs, portals, and downloadable datasets. Suggested additions welcome!
There’s also a list of Australian GLAM datasets that are available through government open data portals.
The recent change of labels from ‘Barcode’ to ‘Item ID’ in the National Archives of Australia’s RecordSearch database broke the Zotero translator. I’ve now updated the translator, and the new version has been merged into the Zotero translators repository. It should be updated when you restart Zotero, but if not you can go to Preferences > Advanced > Files and folders and click on the Reset translators button.
The translator lets you:
@TroveNewsBot has been sharing Trove newspaper articles on Twitter for over 7 years. With its latest upgrade the bot now has an ‘on this day’ function. Every day at 9.00am AEST, TroveNewsBot will share an article published on that day in the past.
Even better, you can make your own ‘on this day’ queries by tweeting to @TroveNewsBot with the hashtag #onthisday. For example:
Tweeting ‘#onthisday #luckydip’ – will return a random article published on this day in the past.
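As a rough illustration of the date logic behind an ‘on this day’ lookup (this is not the bot’s actual code), the sketch below lists every past date that shares today’s day and month, then picks one at random for a lucky dip. The function names and the `first_year` default of 1803 are assumptions made for the example.

```python
from datetime import date
import random


def on_this_day(today, first_year=1803):
    """Return every earlier date that shares today's day and month.

    first_year is an assumed lower bound for the example, not a value
    taken from TroveNewsBot.
    """
    matches = []
    for year in range(first_year, today.year):
        try:
            matches.append(date(year, today.month, today.day))
        except ValueError:
            # 29 February does not exist in non-leap years, so skip them.
            continue
    return matches


def lucky_dip(today, rng=random):
    """Pick one matching past date at random."""
    return rng.choice(on_this_day(today))


if __name__ == "__main__":
    print(lucky_dip(date.today()))
```

A real bot would then need to search for articles published on the chosen date; that step depends on the Trove API and is left out here.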
The NAA recently changed field labels in RecordSearch, so that ‘Barcode’ is now ‘Item ID’. This required an update to my recordsearch_tools screen scraper. I also had to make a few changes in the RecordSearch section of the GLAM Workbench. #dhhacks
After some recent investigations of the availability of open access versions of articles published in paywalled Australian history journals, I’ve started a Google doc to capture useful links and information for Australian historians wanting to make their research open access. Comments and additions are welcome. #dhhacks
In 2014 I pulled together a sample of web pages that included links back to digitised newspaper articles in Trove and created the ‘Trove Traces’ app. It was interesting, and sometimes disturbing, to see the diversity of sites that made use of Trove. Amongst the family and local history enthusiasts were climate change deniers and racists who found ‘evidence’ for their views in past newspapers. And of course, the sample only includes links in web pages, not social media sharing.
I’ve added an API Query Builder to the DigitalNZ section of the GLAM Workbench. You can use it to learn about the different parameters available from the search API, and experiment with different queries. Just get your API key from DigitalNZ, then try entering keywords and selecting options. Once you understand how the API works, you can start thinking about how you can make use of it in your own projects.
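To give a sense of what a query builder assembles, here is a minimal sketch that turns keywords and options into a search URL. The endpoint and the parameter names (`api_key`, `text`, `per_page`) are assumptions here and should be checked against the DigitalNZ API documentation; `build_query` is a hypothetical helper, not part of the GLAM Workbench.

```python
from urllib.parse import urlencode

# Assumed search endpoint; confirm against the DigitalNZ API documentation.
API_URL = "https://api.digitalnz.org/records.json"


def build_query(api_key, text, **params):
    """Combine an API key, keywords, and optional parameters into a URL."""
    query = {"api_key": api_key, "text": text}
    query.update(params)
    # doseq=True lets a list value expand into repeated parameters.
    return f"{API_URL}?{urlencode(query, doseq=True)}"


if __name__ == "__main__":
    # Replace YOUR_API_KEY with a key obtained from DigitalNZ.
    print(build_query("YOUR_API_KEY", "kiwi birds", per_page=5))
```

The sketch only builds the URL, so you can inspect the query before sending any request.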
Lately I’ve been updating and expanding the notebooks in the DigitalNZ section of the GLAM Workbench. In particular, I’ve been looking at the usage facet to understand how much of the aggregated content is ‘open’. What do I mean by ‘open’? The Open Knowledge Foundation definition states that ‘open data and content can be freely used, modified, and shared by anyone for any purpose’. Obviously things that are in the public domain, such as out-of-copyright resources, are open.
If you like browsing Trove’s digitised newspapers page by page, you might have found that the current interface is a bit clunky. To move between pages you have to hover over the page number and click on ‘Next’ or ‘Previous’. Wouldn’t it be good if you could just use the arrow keys on your keyboard? Well now you can!
I’ve created a very simple script that allows you to use the arrows on your keyboard to move between pages in Trove’s digitised newspapers.
There’s a new GLAM Workbench section for working with data from Trove’s Music & Sound zone!
Inside you’ll find out how to harvest all the metadata from ABC Radio National program records – that’s 400,000+ records, from 160 Radio National programs, over more than 20 years.
It’s metadata only, so not full transcripts or audio, though there are links back to the ABC site where you might find transcripts. Most records should at least have a title, a date, the name of the program it was broadcast on, a list of contributors, and perhaps a brief abstract/summary.
There are a growing number of non-English newspapers in Trove, but how do you know what’s there? After trying a few different approaches, I generated a list of 48 newspapers with non-English content. The full details are in this notebook.
As the notebook describes, I found the language metadata for newspapers was incomplete, so I used some language detection code on a sample of articles from every newspaper to try and find those with non-English content.
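The general approach can be sketched as follows: detect the language of each sampled article, then flag newspapers where a large enough share of the sample is not English. This is an illustration only, not the notebook’s code. The tiny stopword-based `detect_language` function is a stand-in for a real language detection library, and the 0.5 threshold and sample data are made up for the example.

```python
from collections import Counter

# Toy stopword lists standing in for a real language detection library.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "was"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "von"},
    "it": {"il", "di", "che", "la", "per", "non", "una"},
}


def detect_language(text):
    """Guess a language by counting stopword matches (very rough)."""
    words = text.lower().split()
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def non_english_titles(samples, threshold=0.5):
    """Return newspapers whose share of non-English samples meets the threshold."""
    flagged = {}
    for title, articles in samples.items():
        if not articles:
            continue
        counts = Counter(detect_language(article) for article in articles)
        other = sum(
            n for lang, n in counts.items() if lang not in ("en", "unknown")
        )
        share = other / len(articles)
        if share >= threshold:
            flagged[title] = share
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data: newspaper title -> sampled article text.
    samples = {
        "Paper A": ["the cat was in the house", "the dog and the bone"],
        "Paper B": ["der Hund ist nicht von hier", "die Katze und das Haus"],
    }
    print(non_english_titles(samples))
```

Working from a sample rather than every article keeps the check quick, at the cost of possibly missing newspapers with only occasional non-English content.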
Last year I did some analysis of the availability of open access versions of research articles published between 2008 and 2018 in Australian Historical Studies. I’ve now broadened this out to cover all individual articles (with a DOI) across a number of journals. It’s pretty grim. Despite Green OA policies that allow researchers to share versions of their articles through institutional repositories, Australian history journals still seem to be about 94% closed.
A long thread exploring files in the National Archives of Australia with the access status of ‘closed’. This is the 6th consecutive year I’ve harvested ‘closed’ files on or about 1 January.
It’s January 1, the day each year when our minds turn to newly released Cabinet records from @naagovau. But while the media focuses on the records that have been made open, I’ll be spending the day looking at those that were closed. What weren’t you allowed to see in 2020?
More updates from The Real Face of White Australia – running facial detection code over NAA: SP42/1.
Finished! NAA: SP42/1 is a general correspondence series from the Collector of Customs in Sydney. It includes many files relating to the administration of the White Australia Policy. 3,375 files have been digitised (about 20% of the series), that’s 49,781 digital images. https://t.co/Y1ZoAYSXeP
I reharvested NAA: ST84/1 and ended up with 14,545 images from 461 digitised files (about 17% of the total series).
In these images I found 9,970 faces – this is a couple of thousand more than when I used OpenCV in 2010/11 for the original wall of faces. https://t.co/BAnkX7u83S
Asking questions with web archives – introductory notebooks for historians has won the British Library Labs Research Award for 2020. The awards recognise ‘exceptional projects that have used the Library’s digital collections and data’.
This project gave me a chance to work with web archives collections and staff from the British Library, the National Library of Australia, and the National Library of New Zealand, and was supported by the International Internet Preservation Consortium’s Discretionary Funding Program.
Want to relive the early days of digital humanities in Australia? I’ve archived the websites created for THATCamp Canberra in 2010, 2011, and 2014. They’re now static sites so search and commenting won’t work, but all the content should be there! #dhhacks
The Invisible Australians website has been given a much needed overhaul, and we’ve brought all our related projects together under the title The real face of White Australia. This includes an updated version of the wall of faces. #dhhacks