Tim Sherratt - Sharing recent updates and work-in-progress

20 Apr 2019

Among the OCRd texts I’m currently harvesting from Trove’s journals zone are things like the NSW Post Office Directories from 1886 onwards. Useful sources for compiling data about occupations, locations etc?

20 Apr 2019

Wow, there are now over 371,000 press releases, interview transcripts and more from the @ParlLibrary available through @TroveAustralia. Just working on a new notebook to harvest sets for research…

19 Apr 2019

Another collection of OCRd text from @TroveAustralia is on its way…

17 Apr 2019

Playing with @TroveAustralia newspaper results. Here are illustrated articles with ‘White Australia Policy’ in their title…

15 Apr 2019

The final tally – after much tweaking I’ve downloaded OCRd text from 9,738 works in the @TroveAustralia books zone. This includes ephemera such as pamphlets and posters as well as more booky books. Here’s the full metadata, all the text files, & harvesting code. #dhhacks

14 Apr 2019

I’m looking for books in @TroveAustralia, but there’s lots of ephemera (pamphlets, posters etc) in the book zone. So I tried grabbing the images of ‘books’ with one page & found some nice stuff including this collection of playbills. #dhhacks

12 Apr 2019

Text of over 3 thousand digitised books and pamphlets downloaded so far from @TroveAustralia…

11 Apr 2019

After talking to @PrimahadiWijaya today about work at @MonashLing, I started harvesting metadata & full text from digitised books in @TroveAustralia. OCRd text from about 2,000 books downloaded so far. More soon… #dhhacks

11 Apr 2019

What I did at #valatechcamp! Here’s a CSV with basic details of 7,719 digitised books available through @TroveAustralia. I’m not sure if they all have OCRd text available, but if they do I’ll attempt to download it once I’m back home.

11 Apr 2019

TIL that the web pages for digitised works (like books and journal issues) on @TroveAustralia embed a lot of useful metadata that you can’t get through the API. Here’s how to extract it.

11 Apr 2019

Just posting the link to my ‘Introducing APIs’ slides for #VALATechCamp again, so that they show up in my MicroBlog feed…

07 Apr 2019

Hmm, it occurs to me that the method I used to generate newspaper article thumbnails from Trove could also be used to extract illustrations (cartoons, drawings, photos etc)…

07 Apr 2019

So, I’ve finally figured out a way to automatically generate nice-looking thumbnails from @TroveAustralia newspaper articles. Demo notebook here. #dhhacks

06 Apr 2019

So I put the recent report into Australia’s national cultural institutions into the @TDHASSN instance of @VoyantTools. Here are the contexts of the word ‘story’.

31 Mar 2019

Train from Canberra to Melbourne booked for #VALATechCamp. I’ll be hanging around both days, so let me know if you’d like to chat about the GLAM Workbench, Jupyter, Trove data, or any of the other things I fiddle with…

26 Mar 2019

Sneak preview of my GLAM CSV Explorer now live on @MyBinderTeam! Select one of 447 GLAM-related CSVs from @datagovau for analysis, or load your own. Coming soon to @TDHASSN. #dhhacks

26 Mar 2019

Having ripped out a lot of code and simplified a mess of conditionals, I think this CSV Explorer thingy is getting there…

25 Mar 2019

Now to load that new CSV of GLAM CSVs into my CSV Explorer…

25 Mar 2019

Quick notebook to harvest GLAM datasets via the new(ish) @datagovau API. Includes 447 CSVs from 19 institutions.
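
A rough sketch of the filtering step in a harvest like this, not the notebook's actual code. It assumes the API returns a CKAN-style `package_search` response (datasets under `result.results`, each with a `resources` list); the function name and the sample response are made up for illustration.

```python
def csv_resources(search_response):
    """Collect CSV resources from a CKAN-style package_search response."""
    rows = []
    datasets = search_response.get("result", {}).get("results", [])
    for dataset in datasets:
        # Organisation details can be missing, so fall back to an empty dict
        institution = (dataset.get("organization") or {}).get("title", "")
        for resource in dataset.get("resources", []):
            # Format labels vary in case and spacing, so normalise first
            if (resource.get("format") or "").strip().lower() == "csv":
                rows.append({
                    "institution": institution,
                    "dataset": dataset.get("title", ""),
                    "url": resource.get("url", ""),
                })
    return rows


# Hypothetical response, shaped like CKAN output but not real data
sample_response = {
    "result": {
        "results": [
            {
                "title": "Example collection list",
                "organization": {"title": "Example Library"},
                "resources": [
                    {"format": "CSV", "url": "https://example.org/a.csv"},
                    {"format": "PDF", "url": "https://example.org/a.pdf"},
                ],
            },
            {
                "title": "Example index",
                "organization": None,
                "resources": [
                    {"format": " csv ", "url": "https://example.org/b.csv"},
                ],
            },
        ]
    }
}

csvs = csv_resources(sample_response)
print(len(csvs))  # 2
```

Counting the distinct `institution` values in the result would give a tally of institutions alongside the tally of CSVs.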

23 Mar 2019

This is why we can’t have nice things…

22 Mar 2019

Still plenty to do, but my CSV Explorer is taking shape… (coming soon to @TDHASSN & elsewhere!)

I’ll be giving a demo at the @HumanitiesAU data summit on Friday.

Now with animated gif…

21 Mar 2019

Doesn’t take much to show when there’s a problem with dates in metadata… (Yep, post 1900 dates have all jumped 100 years into the future.)
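
A minimal sketch of that sort of sanity check, assuming the records are dicts with an ISO-formatted `date` field (the field name and sample values are made up for illustration). Anything dated after today gets flagged.

```python
from datetime import date


def future_dates(records, today=None, field="date"):
    """Return the records whose date falls after `today`."""
    today = today or date.today()
    flagged = []
    for record in records:
        try:
            parsed = date.fromisoformat(record[field])
        except (KeyError, TypeError, ValueError):
            continue  # skip missing or unparseable dates
        if parsed > today:
            flagged.append(record)
    return flagged


# Hypothetical sample records, not taken from any real dataset
sample = [
    {"title": "Item A", "date": "1895-06-01"},
    {"title": "Item B", "date": "2045-03-12"},
    {"title": "Item C", "date": "unknown"},
]

suspect = future_dates(sample, today=date(2019, 3, 21))
print([r["title"] for r in suspect])  # ['Item B']
```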

20 Mar 2019

Fun day talking to the @dhpanu team at ANU about digital history possibilities. Slides/links are all online.

16 Mar 2019

Currently working on a CSV explorer to give researchers an overview of the contents of GLAM datasets. Sort of like WTFCSV, but in a Jupyter notebook…

12 Mar 2019

A bit more progress. Having found columns with OpenCV, I can use Tesseract to help me find the rows…
