Sharing recent updates and work-in-progress
It was Open Data Day on Saturday 6 March – here’s some of the ready-to-go datasets you can find in the GLAM Workbench – there’s something for historians, humanities researchers, teachers & more!
First here’s a list of Australian GLAM (that’s galleries, libraries, archives & museums) data sources. It includes APIs, portals, and downloadable datasets. Suggested additions welcome!
There’s also a list of Australian GLAM datasets that are available through government open data portals. There’s hundreds of them, but they’re not always easy to find. They cover convicts, immigration, hospitals, WWI and more, and include lots of useful biographical data.
If you’re not sure where to start with a list of 600 CSV files, have a look at the GLAM CSV Explorer! Select a file and this Jupyter-powered app will build a series of visualisations based on the contents of each column.
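To give a rough idea of what column-by-column profiling involves, here’s a minimal Python sketch. It is not the GLAM CSV Explorer’s actual code: the function names `profile_columns` and `guess_kind` are hypothetical, the sample rows are made up, and it only produces a text summary (counts, a guessed type, most common values) of the kind you might feed into a chart.

```python
import csv
import io
from collections import Counter


def guess_kind(cells):
    """Guess whether a column's non-empty values are all numeric or just text."""
    if not cells:
        return "empty"
    try:
        for cell in cells:
            float(cell)
    except ValueError:
        return "text"
    return "number"


def profile_columns(csv_text, top_n=3):
    """Summarise each column of a CSV string (hypothetical helper, for illustration)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in values:
            cell = (row.get(name) or "").strip()
            if cell:  # skip blank cells
                values[name].append(cell)
    return {
        name: {
            "non_empty": len(cells),
            "unique": len(set(cells)),
            "kind": guess_kind(cells),
            "top": Counter(cells).most_common(top_n),
        }
        for name, cells in values.items()
    }


# Made-up sample data, not taken from any real index
SAMPLE = (
    "name,year,ship\n"
    "Mary Smith,1838,Lady Nugent\n"
    "John Brown,1838,\n"
    "Ann Lee,1841,Lady Nugent\n"
)

for column, stats in profile_columns(SAMPLE).items():
    print(column, stats)
```

A summary like this suggests which chart suits each column: numeric columns such as years lend themselves to histograms, while text columns with few unique values are good candidates for bar charts of the most common values.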
While they’re not yet in an open data portal, NSW State Archives has a rich collection of indexes transcribed by volunteers. I’ve scraped 64 indexes, with over 1.4 million rows of data, and put them in a repository for easy download. There’s even a version of the CSV Explorer, just for the NSW State Archives indexes.
Here’s a CSV file containing details of every issue of the Australian Women’s Weekly in Trove.
A collection of front covers from the Australian Women’s Weekly from 1933 to 1982! That’s 2,566 images you can download from Cloudstor or browse in a series of convenient PDFs.
Here’s a list of non-English language newspapers in Trove.
And another list of newspapers in Trove with articles available from beyond the 1954 copyright cliff of death.
While we’re on newspapers, here’s a spreadsheet that identifies places of publication or circulation of Trove newspapers, and provides geocoordinates.
What about some text? Here’s 24,620 files of OCRd text from digitised books and ephemera in Trove. There’s also a CSV-formatted list with the basic details of each book.
More text! Here’s OCRd text from 26,234 issues of 397 digitised journals in Trove.
Something different – a collection of 12,619 press releases & speeches by Australian politicians that include any of the terms ‘immigrant’, ‘asylum seeker’, ‘boat people’, ‘illegal arrivals’, or ‘boat arrivals’. From the Parliamentary Library via Trove.
Some more images – a collection of 3,471 full-page editorial cartoons from The Bulletin, 1886 to 1952 (with a warning for racist content). Available both as individual images and compiled into PDFs.
From the ABC via Trove, there’s 400,000 records from Radio National programs broadcast since the late 1990s. That includes every segment broadcast on AM, PM, RN Breakfast etc.
This might be handy – from some work I’m doing with ANU Archives, here’s a CSV file containing details of holidays in NSW from 1901 to 1950.
The Department of Prime Minister and Cabinet provides XML versions of more than 20,000 speeches & interviews from recent PMs for download. I’ve saved them to a repository and compiled some indexes.
And finally – Commonwealth Hansard from the Parliamentary Library – lots of well-structured XML files! I’ve created a repo with one file for each sitting day from 1901 to 1980 & 1998 to 2005 (hopefully the gap will be filled soon). There’s also a CSV index to sitting days.
And if that’s not enough data, the GLAM Workbench provides tools to help you create your own datasets from Trove, the National Archives of Australia, the National Museum of Australia, Archives NZ, DigitalNZ, & more! #dhhacks