Tim Sherratt - Sharing recent updates and work-in-progress

29 Jul 2020

Another #GLAMWorkbench update! The Trove Harvester will now download both newspaper and gazette articles in bulk. You can optionally include full text, and save copies of the articles as images and PDFs. #dhhacks glam-workbench.github.io/trove-har…

28 Jul 2020

Interested in using web archives in your research? Join us on 5/6 August for a free @netpreserve webinar introducing the tools and examples available in the new #webarchives section of the #GLAMWorkbench. There are two timeslots to cover multiple timezones: www.eventbrite.com/e/iipc-rs… and www.eventbrite.com/e/iipc-rs…

27 Jul 2020

Introducing a brand new section of the #GLAMWorkbench, exploring the @MuseumsVictoria collection API. Harvest species records, display random images, and download ALL THE ANTECHINUSES! glam-workbench.github.io/museumsvi… #dhhacks

27 Jul 2020

New additions to the @TroveAustralia books section of the #GLAMWorkbench – word frequency examples with OCRd text from digitised books, and a random recipe generator powered by a 19th C cook book! glam-workbench.github.io/trove-boo… #dhhacks
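A minimal sketch of what a word frequency count over OCRd text can look like. This is not the notebook's actual code: the stop-word list, the function name, and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "with"}


def word_frequencies(text, top_n=5):
    """Return the top_n most common words in text, ignoring case and stop words."""
    # Lowercase the text, then keep runs of letters (allowing internal apostrophes).
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(top_n)


# Invented sample text standing in for a page of OCRd cook book text.
sample = "Boil the rice. Stir the rice with butter, and serve the rice hot."
print(word_frequencies(sample, top_n=3))
```

OCRd text is usually noisy, so in practice you would probably also want to drop very short tokens or obvious OCR errors before counting.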

27 Jul 2020

With the recent changes to @TroveAustralia, the Australian Women’s Weekly cover browser was retired. As a low-tech alternative, I’ve harvested all the cover images from the Women’s Weekly and saved them into PDFs for easy browsing, one for each decade. There are 2,566 images from 1933 to 1982.

Just click on the link below each image to explore the complete issue on Trove. You can also download the full collection of images from CloudStor. There’s a CSV file containing all the issue metadata.

The notebook used to harvest the images is in the Trove newspapers section of the GLAM Workbench. You could easily adapt the notebook to harvest the front pages of any newspaper. #dhhacks

17 Jul 2020

The Trove books section of the #GLAMWorkbench has been updated. There’s a fresh harvest of OCRd text & the notebooks have been changed to work with the new @TroveAustralia interface. Download & explore 24,620 files (3 GB) of OCRd text! #dhhacks

17 Jul 2020

Revisiting my Historic Hansard XML repository & realising how easy it is to load files as needed via the GitHub API & explore with Pandas & Jupyter. This #GLAMWorkbench notebook helps you explore a particular year/house. #dhhacks
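Here is a rough sketch of the "load files as needed" idea, using GitHub's repository contents endpoint to list a directory and pick out the XML files. It is not the notebook's code: the owner, repository, and path are placeholders you would fill in yourself, and the helper names are invented for this example.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

API_ROOT = "https://api.github.com"


def contents_url(owner, repo, path=""):
    """Build the GitHub 'repository contents' URL for a directory."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/contents"
    if path:
        url += "/" + quote(path)  # quote() leaves '/' separators intact
    return url


def fetch_listing(owner, repo, path=""):
    """Fetch a directory listing (a list of dicts) from the GitHub API."""
    request = Request(
        contents_url(owner, repo, path),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(request) as response:
        return json.load(response)


def xml_files(listing):
    """Keep only XML files, returning (name, download_url) pairs."""
    return [
        (item["name"], item["download_url"])
        for item in listing
        if item["type"] == "file" and item["name"].endswith(".xml")
    ]


# Example use (placeholder values, so left commented out):
# listing = fetch_listing("some-owner", "some-repo", "some/year/folder")
# for name, url in xml_files(listing):
#     print(name, url)
```

The resulting pairs can be passed straight to `pandas.DataFrame(pairs, columns=["name", "download_url"])` for exploring in Jupyter. Note that unauthenticated requests to the GitHub API are rate limited, so listing one year/house at a time is kinder than crawling the whole repository.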

14 Jul 2020

The Trove Journals section of the #GLAMWorkbench has been updated to work with the new @TroveAustralia interface! I’ve also re-harvested ALL the OCRd text from digitised journals – 6 GB of text from 397 journals now downloadable in bulk from CloudStor. #dhhacks

12 Jul 2020

New in #GLAMWorkbench! After you’ve used the @TroveAustralia Newspaper Harvester to download lots & lots of articles, try exploring the results in Datasette. This notebook sets everything up, and you can even add full-text search & images! #dhhacks

29 Jun 2020

Download newspaper articles in bulk! The Trove Newspaper Harvester has been updated to work with the new @TroveAustralia interface. I’ve also added the ability to save articles as .jpg images! The easiest way to get started is via the #GLAMWorkbench. #dhhacks

[Images: Screenshot of Trove Harvester page in GLAM Workbench; Screenshot of TroveHarvester web app; Details of image file naming scheme; Thumbnails of newspaper articles saved as images]

22 Jun 2020

My app for searching in @TroveAustralia’s digitised journals has been updated to work with the new Trove interface. You’ll need to have switched over to the new interface before you try searching (just click the link on the Trove home page). #dhhacks

09 Jun 2020

Another db migrated and app updated!

Have you ever wondered what interjections in Historic Hansard would look like as tweets? Well, I did… Now with longer interjections & more emojis!

hansard-interjections.herokuapp.com/tweets/ #dhhacks

09 Jun 2020

Here’s a map of places where @TroveAustralia digitised newspapers were published/circulated. Click on the map to find the closest newspapers to a place. Updated with new titles from the last year! troveplaces.herokuapp.com/map/ #dhhacks

27 May 2020

New GLAM Workbench section on web archives!

We tend to think of a web archive as a site we go to when links are broken – a useful fallback, rather than a source of new research data. But web archives don’t just store old web pages, they capture multiple versions of web resources over...

08 May 2020

Thanks to @NetPreserve, I’ve been spending time lately working on a set of web archive exploration notebooks for the #GLAMWorkbench. Here’s an example to create/compare screenshots of captures. #dhhacks

18 Apr 2020

Do you have a CSV file you’d like to make searchable, maybe even share online? New on #dhhacks, I show you how with @simonw’s awesome Datasette tool & @Glitch. Give it a try! 101dhhacks.net/share-sea…
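Datasette serves SQLite database files, so the first step is always getting your CSV into SQLite. The sketch below shows that step using only Python's standard library. It is an illustration rather than the method used in the tutorial, and the table name and sample data are invented.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(csv_file, connection, table="records"):
    """Load rows from an open CSV file into a new SQLite table; return the row count."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    # Quote column names so that headers containing spaces still work.
    quoted = ", ".join('"{}"'.format(c.replace('"', '""')) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    connection.execute(f'CREATE TABLE "{table}" ({quoted})')
    rows = [[row[c] for c in columns] for row in reader]
    connection.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    connection.commit()
    return len(rows)


# Invented sample data standing in for a real CSV file.
sample = io.StringIO("title,year\nThe Argus,1901\nThe Age,1902\n")
db = sqlite3.connect(":memory:")
loaded = csv_to_sqlite(sample, db)
matches = db.execute(
    "SELECT title FROM records WHERE year = ?", ("1901",)
).fetchall()
print(loaded, matches)
```

To explore the result in Datasette you would connect to a file such as `records.db` instead of `:memory:`, then run `datasette records.db`. The Datasette ecosystem has its own tools for this conversion, which handle column types and large files better than this sketch.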

13 Apr 2020

New on #dhhacks – make your own @TroveAustralia newspaper game! Thanks to @glitch, just edit a couple of files to create your own customised edition of Headline Roulette – make it about cats, or Queensland, or Communist Party newspapers, or whatever!

12 Apr 2020

I’ve given my #dhhacks site a refresh, and updated my @TroveAustralia Twitter bot tutorial to link to the latest versions of the bots on @glitch. The new code is actually easier to customise, so plenty of opportunities to play around! More DHHacks coming soon…

04 Apr 2020

If you’d ever wished you could get a random(ish) newspaper article from @TroveAustralia’s API, here’s a hack for you! I’ve added an option to return a random article to my Trove proxy app. You can filter by normal API facets. Go to: trove-proxy.herokuapp.com #dhhacks

02 Apr 2020

The GLAM CSV Explorer has had a few updates — you can now filter by organisation, and upload your own CSV files! #GLAMWorkbench Try it live on Binder.

31 Mar 2020

Buildings might be closed, but the data is open – explore hundreds of datasets from Australian GLAM organisations!

For a couple of years I’ve been harvesting datasets created or published by Australian GLAM organisations through government data portals. I’ve just completed the latest harvest, and there are now 369 datasets, containing 983 files, from 23 G...

30 Mar 2020

Updated! My notebook to upload digitised newspapers from @TroveAustralia to an @Omeka-S site has been improved – it no longer trips over non-newspaper articles in Zotero collections, and it checks for existing articles to avoid uploading them twice. #dhhacks

12 Mar 2020

My data file of public holidays in NSW from 1900-1950 has been updated – now including variations in the King’s Birthday holiday and extra days like VE Day. #dhhacks

11 Mar 2020

My harvest of OCRd text from @TroveAustralia digitised books, ephemera, and parliamentary papers has been updated! There are now 19,795 text files (about 3 GB) to explore! Harvesting details and links to browse/download files from CloudStor are in the #GLAMWorkbench. #dhhacks

09 Mar 2020

The simple Trove proxy that you can use to get download links for PDFs of newspaper articles from @TroveAustralia has been updated to Python 3 and version 2 of the Trove API. It’s used in @Zotero and elsewhere… #dhhacks