Slowly working my way through the documentation for my GLAM Workbench. Still lots to do, but I think the page for @naagovau’s RecordSearch is now up-to-date.
If there are APIs or other data sources you’d like me to add to my GLAM Workbench, feel free to create an issue. You could also describe what sorts of tools or examples using that data source would be useful.
Updated list of the fifty most common words occurring before the word ‘aliens’ in @TroveAustralia newspapers (with no capitalisation and stopwords removed). 274,157 occurrences in 213,151 articles.
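For anyone curious, the counting itself is straightforward. Here's a minimal sketch of the approach (not the actual GLAM Workbench code, and with only a placeholder stopword list), assuming the OCRd article text is available as plain strings:

```python
# Count words immediately preceding a target word across lowercased article texts,
# skipping common stopwords. A rough sketch only.
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "to", "a", "in", "all", "any", "as", "by", "for", "on", "that"}

def preceding_words(texts, target="aliens"):
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        for prev, word in zip(words, words[1:]):
            if word == target and prev not in STOPWORDS:
                counts[prev] += 1
    return counts

# counts.most_common(50) then gives the fifty most frequent preceding words.
```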
Just updated my harvest of metadata and full text from The Bulletin in @TroveAustralia. There’s about 2GB of OCRd text from 4,534 issues (1880-1968). Full text for about 60 issues has been added since my last harvest. 111 have no OCRd text. Download it all from GitHub #dhhacks
Fifty most common words occurring before the word ‘aliens’ in @TroveAustralia newspapers (213,000 articles)…
You want big data? I just harvested 213,340 newspaper articles (including full OCRd text) from @TroveAustralia in 82 minutes, at about 40 articles a second. https://mybinder.org/v2/gh/GLAM-Workbench/trove-newspaper-harvester/master?urlpath=%2Fapps%2Fnewspaper_harvester_app.ipynb
So now I’ve updated TroveHarvester and built a new interface I can get back to the task I wanted the TroveHarvester for a couple of days ago — harvesting all references to ‘aliens’ in newspapers… #yakshaving
Want an easy way to download @TroveAustralia newspaper articles in bulk? No installation? Point and click? I’ve created a simple web app version of my TroveHarvester using a Jupyter notebook & running on @mybinderteam. Try it live! #dhhacks
And version 0.2.2 of TroveHarvester quickly follows 0.2.1 as I squash a bug when downloading PDFs… Also managed to get the README displaying properly on PyPI. pypi.org/project/t…
TroveHarvester 0.2.1 — updated to work with version 2 of the @TroveAustralia API. Now on pypi! More details shortly…
Ok, that’s more like it. Full text and metadata of 29,203 newspaper articles harvested using the @TroveAustralia API in under 10 minutes. Testing nearly done…
Ah ok, I forgot about the new ‘bulkHarvest’ parameter in the @TroveAustralia API. Setting that to ‘true’ seems to make all the difference…
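In case it's useful to anyone else, a request with bulkHarvest enabled looks something like this (a rough sketch; parameter names follow the v2 API docs, and YOUR_API_KEY is a placeholder):

```python
# Query the Trove API v2 with bulkHarvest turned on, so results come back
# in a stable order suitable for harvesting large result sets.
import requests

params = {
    "q": "aliens",
    "zone": "newspaper",
    "encoding": "json",
    "bulkHarvest": "true",  # stable ordering for complete harvests
    "n": 100,               # records per request
    "key": "YOUR_API_KEY",  # placeholder
}
response = requests.get("https://api.trove.nla.gov.au/v2/result", params=params)
data = response.json()
```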
Uh, never come across one of these before from the @TroveAustralia API. Needless to say it causes the Newspaper Harvester to die.
It’s easy to check for these things once you know they exist, but…
Testing the updated Trove Newspaper Harvester…
Run into a problem with the @TroveAustralia API not returning the complete result set in large harvests, trying to figure out why…
Thanks to the @TroveAustralia API upgrade, the new version of the Trove Newspaper Harvester should be a lot faster. For harvests with full text (but not PDFs which slow things down a lot) I’m getting 40-50 articles a second.
Since I’m updating the Trove Newspaper Harvester to work with version 2 of the @TroveAustralia API thought I might as well fix up a few other things as well…
Now with added progress bars!
I’m enjoying using micro.blog as a way of capturing what I’m working on: updates.timsherratt.org
Just need to get the GitHub mirror site working…
Finally biting the bullet and getting to work on updating the TroveHarvester to work with version 2 of the API…
That’s cool — just realised I can easily share live versions of Altair charts from Jupyter notebooks using Vega. Here’s the complete ‘aliens’ chart.
And also “coloured alien” which, not surprisingly, peaks in 1901 when the Immigration Restriction Act is passed…
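For anyone wondering what the Altair-to-Vega sharing step looks like, here's a minimal sketch (the DataFrame below is a made-up placeholder, not the real ‘aliens’ data):

```python
# Export an Altair chart as a Vega-Lite spec (or standalone HTML) so it can be
# embedded as a live visualisation elsewhere.
import altair as alt
import pandas as pd

df = pd.DataFrame({"year": [1899, 1900, 1901, 1902], "articles": [120, 180, 450, 300]})

chart = alt.Chart(df).mark_line().encode(x="year:O", y="articles:Q")

chart.save("aliens_chart.json")  # Vega-Lite spec, embeddable with vega-embed
chart.save("aliens_chart.html")  # standalone HTML page
```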