I’ve added a ‘save chart’ option to the QueryPic app in my GLAM Workbench. Visualise your searches in @TroveAustralia newspapers, then save the results as HTML for easy download. #dhhacks

Pleased and proud to see the chapter @baibi & I wrote on the Real Face of White Australia now published as part of an awesome collection. Buy now or read the CC-BY version online!
In a bit over a week I’ll be heading to Stockholm for the ‘From text to data’ conference. Preparing myself for the 40 degree temperature difference…
I’m giving a talk in a week or so (eep!) that looks at some of the changing contexts in which the word ‘aliens’ has been used in Australia. I thought, by way of comparison, it would be useful to do the same for ‘immigrants’. While I was playing around with the data last night, I came across something interesting, so here’s a sneak preview…
Getting the data
Using my TroveHarvester I downloaded the full text of all newspaper articles in Trove that included the word ‘immigrants’.
In case you’re wondering, it took about 13 hours to download the metadata and full text of more than 2,000,000 @TroveAustralia articles including the word ‘Chinese’ using my Trove Newspaper Harvester. You can try it here.
Ok, so let’s see how I go harvesting 2 million newspaper articles from @TroveAustralia containing the word ‘Chinese’…
30,000+ occurrences of the word ‘Chinese’ in the OCRd full text of The Bulletin, 1880-1968.
One more and I’m done for the night… New GLAM Workbench page for the ‘Trove API introduction’ notebooks.
I’ve finished putting details of all the current GLAM Workbench repositories into the new documentation site. Still a few notebooks to migrate from the original workbench, but getting there! There are about 50 Jupyter notebooks so far. #dhhacks
Added a ‘data’ section to the GLAM Workbench docs, with info on harvests from government data portals, as well as series from @naagovau relating to ASIO and the White Australia Policy.
And now a GLAM Workbench page for @Te_Papa…
Added a page for @ArchivesNZ’s Archway to the GLAM Workbench docs…
So here’s some fun things to do with @TroveAustralia newspapers… (via GLAM Workbench)
Ok, more documentation for you — page for the @DigitalNZ API in GLAM Workbench updated!
Slowly working my way through the documentation for my GLAM Workbench. Still lots to do, but I think the page for @naagovau’s RecordSearch is now up-to-date.
If there are APIs or other data sources you’d like me to add to my GLAM Workbench, feel free to create an issue. You could also describe what sorts of tools or examples using that data source would be useful.
Updated list of the fifty most common words occurring before the word ‘aliens’ in @TroveAustralia newspapers (with no capitalisation and stopwords removed). 274,157 occurrences in 213,151 articles.
Just updated my harvest of metadata and full text from The Bulletin in @TroveAustralia. There’s about 2GB of OCRd text from 4,534 issues (1880-1968). Full text for about 60 issues has been added since my last harvest. 111 have no OCRd text. Download it all from GitHub #dhhacks
Fifty most common words occurring before the word ‘aliens’ in @TroveAustralia newspapers (213,000 articles)…
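For anyone curious how a ‘words before aliens’ count like this could be put together, here’s a rough sketch in Python. It’s not the code behind the list. The function name, the tiny stopword list, and the two sample sentences are all made up for illustration, and it assumes ‘stopwords removed’ means dropping a preceding word when it’s a stopword, rather than skipping back to the nearest non-stopword.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real run would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "all", "that", "for", "as", "by"}


def words_before(texts, target="aliens", stopwords=STOPWORDS):
    """Count the word immediately preceding `target` in each text.

    Text is lowercased first (no capitalisation), and a preceding word
    is ignored if it is a stopword.
    """
    counts = Counter()
    for text in texts:
        # Crude tokeniser: runs of letters only, so punctuation is dropped.
        tokens = re.findall(r"[a-z]+", text.lower())
        for prev, word in zip(tokens, tokens[1:]):
            if word == target and prev not in stopwords:
                counts[prev] += 1
    return counts


# Made-up sample text, standing in for harvested article text.
sample = [
    "The registration of enemy aliens began. Enemy aliens must report.",
    "Friendly Aliens and the undesirable aliens.",
]

counts = words_before(sample)
top_fifty = counts.most_common(50)
print(top_fifty)
```

Run over the full text files from a harvest instead of `sample`, the same `most_common(50)` call would produce a top fifty list.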
You want big data? I just harvested 213,340 newspaper articles (including full OCRd text) from @TroveAustralia in 82 minutes, at about 40 articles a second. https://mybinder.org/v2/gh/GLAM-Workbench/trove-newspaper-harvester/master?urlpath=%2Fapps%2Fnewspaper_harvester_app.ipynb
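A quick check of the harvest rate, using only the figures quoted above (213,340 articles in 82 minutes). The function name is just for illustration.

```python
def articles_per_second(articles, minutes):
    """Average harvest rate, given a total article count and elapsed minutes."""
    return articles / (minutes * 60)


rate = articles_per_second(213_340, 82)
print(f"{rate:.1f} articles per second")  # prints "43.4 articles per second"
```

So ‘about 40 articles a second’ is, if anything, slightly conservative.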