Tim Sherratt - Sharing recent updates and work-in-progress

18 Jan 2019

Uh, never come across one of these before from the @TroveAustralia API. Needless to say it causes the Newspaper Harvester to die.

It’s easy to check for these things once you know they exist, but…

18 Jan 2019

Testing the updated Trove Newspaper Harvester…

I’ve run into a problem with the @TroveAustralia API not returning the complete result set in large harvests. Trying to figure out why…
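
One way to spot an incomplete harvest like this is to compare the number of records actually collected with the total the API reports. A rough sketch, assuming each page of results carries a total and a cursor for the next page; the `fetch_page` callable, the dictionary keys, and the fake data are all hypothetical stand-ins for illustration, not the Harvester's actual code or the API's exact response format:

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical page shape, assumed for illustration only:
# {"total": 5, "articles": [...], "nextStart": "cursor-or-None"}
Page = Dict[str, Any]


def harvest_all(fetch_page: Callable[[str], Page], start: str = "*") -> Dict[str, Any]:
    """Follow the cursor until it runs out, then compare counts."""
    articles: List[Any] = []
    expected: Optional[int] = None
    seen = set()  # guard against a cursor that repeats forever
    cursor: Optional[str] = start
    while cursor is not None and cursor not in seen:
        seen.add(cursor)
        page = fetch_page(cursor)
        if expected is None:
            expected = int(page["total"])  # total reported on the first page
        articles.extend(page.get("articles", []))
        cursor = page.get("nextStart")  # missing or None means no more pages
    return {
        "expected": expected,
        "harvested": len(articles),
        "complete": expected == len(articles),
    }


def fake_fetch(cursor: str) -> Page:
    """Made-up data: the API claims 5 articles but only ever hands over 4."""
    pages = {
        "*": {"total": 5, "articles": [1, 2], "nextStart": "a"},
        "a": {"total": 5, "articles": [3, 4], "nextStart": None},
    }
    return pages[cursor]


print(harvest_all(fake_fetch))
# {'expected': 5, 'harvested': 4, 'complete': False}
```

Passing the fetch function in as an argument means the check can be tried against fake pages like these before pointing it at the real API.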

18 Jan 2019

Thanks to the @TroveAustralia API upgrade, the new version of the Trove Newspaper Harvester should be a lot faster. For harvests with full text (but not PDFs, which slow things down a lot) I’m getting 40-50 articles a second.

18 Jan 2019

Since I’m updating the Trove Newspaper Harvester to work with version 2 of the @TroveAustralia API, I thought I might as well fix up a few other things too…

Now with added progress bars!

17 Jan 2019

I’m enjoying using micro.blog as a way of capturing what I’m working on: updates.timsherratt.org

Just need to get the GitHub mirror site working…

17 Jan 2019

Finally biting the bullet and getting to work on updating the TroveHarvester to work with version 2 of the API…

17 Jan 2019

That’s cool: I just realised I can easily share live versions of Altair charts from Jupyter notebooks using Vega. Here’s the complete ‘aliens’ chart.

17 Jan 2019

And also “coloured alien” which, not surprisingly, peaks in 1901 when the Immigration Restriction Act was passed…

17 Jan 2019

Exploring some of the adjectives attached to ‘alien’ in @TroveAustralia newspapers…

You can create these sorts of comparisons yourself using this app. #dhhacks

17 Jan 2019

Just to emphasise my point the other day about the impact of stemming on searches for naturalisation/naturalization in @TroveAustralia. Compare these — the stemming on/off results for ‘naturalisation’ are pretty much in proportion, but not for ‘naturalization’…

17 Jan 2019

Nothing like browsing the databases of another country’s national/state archives to make you realise how useful the series system is…

16 Jan 2019

The Australian version of ‘Who’s responsible?’ is up! Just select a government function and explore the different agencies associated with it over time. It’s built with data from @naagovau’s RecordSearch. Try it live! #dhhacks

16 Jan 2019

New notebook added to the #GLAMWorkbench RecordSearch repository — get the basic details of agencies associated with all government functions used in @naagovau’s RecordSearch and save to a single JSON data file. View code and data. #dhhacks

16 Jan 2019

Hmm, wondering why the ‘National Council of Women of the Australian Capital Territory’ is assigned the function ‘CITIZENSHIP’ in @naagovau’s RecordSearch…

15 Jan 2019

As well as cross-posting updates to Twitter and Mastodon, I’ve now set up IFTTT to keep an eye on my micro.blog feed and post anything with the hashtag #dhhacks to my 101 DH Hacks FB page!

15 Jan 2019

Adventures in stemming, or what happens when you search Trove for 'naturalization'

Fun fact — the Porter stemming algorithm treats the words ‘naturalisation’ and ‘naturalization’ differently. Naturalisation is stemmed to ‘naturalis’, naturalization to ‘natur’. You can try this yourself using this NLTK stemming demo. Why d...
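
To see where the two spellings part ways, here is a cut-down sketch that applies only the handful of Porter rules these two words trigger. It is not a full stemmer (the real algorithm has many more rules and steps), and the function names are my own; for real work, something like NLTK's `PorterStemmer` is the better choice.

```python
VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: 'y' counts as a vowel when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Count vowel-to-consonant transitions: the m in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        vowel = not is_consonant(stem, i)
        if prev_vowel and not vowel:
            m += 1
        prev_vowel = vowel
    return m


# Only the rules needed here, longest suffix first within each step.
# Each step: (rules, minimum measure the remaining stem must exceed).
STEPS = [
    ([("ization", "ize"), ("ation", "ate")], 0),  # subset of step 2
    ([("alize", "al")], 0),                       # subset of step 3
    ([("ate", ""), ("al", "")], 1),               # subset of step 4
]


def apply_step(word: str, rules, min_m: int) -> str:
    """Apply the first (longest) matching suffix rule, if its condition holds."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if measure(stem) > min_m:
                return stem + replacement
            return word  # the suffix matched but the condition failed
    return word


def stem_subset(word: str) -> str:
    for rules, min_m in STEPS:
        word = apply_step(word, rules, min_m)
    return word


print(stem_subset("naturalisation"))  # naturalis
print(stem_subset("naturalization"))  # natur
```

The ‘z’ spelling matches the ‘ization’ rule, becomes ‘naturalize’, and is then whittled down through ‘natural’ to ‘natur’. The ‘s’ spelling only matches ‘ation’, so it stops at ‘naturalis’.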

14 Jan 2019

I have a brand new updates page powered by micro.blog!