glamworkbench

Another #GLAMWorkbench update! The Trove Harvester will now download both newspaper and gazette articles in bulk. You can optionally include full text, and save copies of the articles as images and PDFs. #dhhacks glam-workbench.github.io/trove-har…

Interested in using web archives in your research? Join us on 5/6 August for a free @netpreserve webinar introducing the tools and examples available in the new #webarchives section of the #GLAMWorkbench. There are two timeslots to cover multiple timezones: www.eventbrite.com/e/iipc-rs… and www.eventbrite.com/e/iipc-rs…

Introducing a brand new section of the #GLAMWorkbench, exploring the @MuseumsVictoria collection API. Harvest species records, display random images, and download ALL THE ANTECHINUSES! glam-workbench.github.io/museumsvi… #dhhacks

New additions to the @TroveAustralia books section of the #GLAMWorkbench – word frequency examples with OCRd text from digitised books, and a random recipe generator powered by a 19th C cook book! glam-workbench.github.io/trove-boo… #dhhacks

With the recent changes to @TroveAustralia, the Australian Women’s Weekly cover browser was retired. As a low-tech alternative, I’ve harvested all the cover images from the Women’s Weekly and saved them into PDFs for easy browsing, one for each decade. There are 2,566 images from 1933 to 1982.

Just click on the link below each image to explore the complete issue on Trove. You can also download the full collection of images from Cloudstor. There’s a CSV file containing all the issue metadata.

The notebook used to harvest the images is in the Trove newspapers section of the GLAM Workbench. You could easily adapt the notebook to harvest the front pages of any newspaper. #dhhacks

The Trove books section of the #GLAMWorkbench has been updated. There’s a fresh harvest of OCRd text & the notebooks have been changed to work with the new @TroveAustralia interface. Download & explore 24,620 files (3gb) of OCRd text! #dhhacks

Revisiting my Historic Hansard XML repository & realising how easy it is to load files as needed via the GitHub API & explore with Pandas & Jupyter. This #GLAMWorkbench notebook helps you explore a particular year/house. #dhhacks

The Trove Journals section of the #GLAMWorkbench has been updated to work with the new @TroveAustralia interface! I’ve also re-harvested ALL the OCRd text from digitised journals — 6gb of text from 397 journals now downloadable in bulk from CloudStor. #dhhacks

New in #GLAMWorkbench! After you’ve used the @TroveAustralia Newspaper Harvester to download lots & lots of articles, try exploring the results in Datasette. This notebook sets everything up, you can even add full text search & images! #dhhacks

Download newspaper articles in bulk! The Trove Newspaper Harvester has been updated to work with the new @TroveAustralia interface. I’ve also added the ability to save articles as .jpg images! The easiest way to get started is via the #GLAMWorkbench. #dhhacks

Screenshot of Trove Harvester page in GLAM WorkbenchScreenshot of TroveHarvester web appDetails of image file naming schemeThumbnails of newspaper articles saved as images

New GLAM Workbench section on web archives!

We tend to think of a web archive as a site we go to when links are broken – a useful fallback, rather than a source of new research data. But web archives don’t just store old web pages, they capture multiple versions of web resources over time. Using web archives we can observe change – we can ask historical questions. But web archives store huge amounts of data, and access is often limited for legal reasons.

Continue reading →

Thanks to @NetPreserve, I’ve been spending time lately working on a set of web archive exploration notebooks for the #GLAMWorkbench. Here’s an example to create/compare screenshots of captures. #dhhacks

The GLAM CSV Explorer has had a few updates — you can now filter by organisation, and upload your own CSV files! #GLAMWorkbench Try it live on Binder.

Buildings might be closed, but the data is open – explore hundreds of datasets from Australian GLAM organisations!

For a couple of years I’ve been harvesting datasets created or published by Australian GLAM organisations through government data portals. I’ve just completed the latest harvest, and there’s now 369 datasets, containing 983 files, from 23 GLAM organisations. 628 of these files are in CSV (spreadsheet) format. There’s a number of ways that you can explore the harvested data. You can browse a big list of datasets, or download a CSV containing all the harvested data or just those formatted as CSVs.

Continue reading →

My harvest of OCRd text from @TroveAustralia digitised books, ephemera, and parliamentary papers has been updated! There’s now 19,795 text files (about 3gb) to explore! Harvesting details and links to browse/download files from Cloudstor are in the #GLAMWorkbench. #dhhacks

I’ve added some more documentation to the Trove Newspaper Harvester page in the #GLAMWorkbench. Get your @TroveAustralia newspaper articles in bulk! #dhhacks #collectionsasdata

New section added to the #GLAMWorkbench with examples from @Library_Vic! #slvdata #dhhacks #collectionsasdata

More fun with @iiif_io and images from @library_vic – resize, rotate, crop and more! Try it out with this new notebook in the #GLAMWorkbench. #slvdata #dhhacks

New #GLAMWorkbench notebook! Download images from @Library_Vic using IIIF and Handle… #dhhacks

Want to save @TroveAustralia newspaper articles as images (that aren’t sliced up in annoying ways)? There’s an app for that in the #GLAMWorkbench. #dhhacks

New ‘Trove images' section added to the #GLAMWorkbench! Here you’ll find my latest Jupyter notebook harvesting data about the use of standard licences & rights statements in Trove’s picture zone. #dhhacks

Voting in the 2019 @dhawards is now open! Go and check out all the cool #DigitalHumanities projects from around the world. And while you’re there, you might like to vote for my #GLAMWorkbench in the ‘Tools’ category!

New #GLAMWorkbench section with examples of how to get random-ish works and newspaper articles from @TroveAustralia. #dhhacks

The @naagovau RecordSearch section of the #GLAMWorkbench has been updated with more notebooks to help you get Australian archives data in a usable form. glam-workbench.github.io/recordsea… Useful for #twitterstorians, #ozhist, & #govhack!

I’ve updated my harvest of OCRd text from digitised journals in @TroveAustralia. The complete dataset now includes 33,035 issues from 720 titles – about 8gb of text to explore. Details in the #GLAMWorkbench: glam-workbench.github.io/trove-jou… #dhhacks

There’s a new section of the GLAM Workbench devoted to the National Museum of Australia collection API! Harvest @nma data, then explore it by time and place. #dhhacks

Updates to the Trove newspapers section of GLAM Workbench – adding links to app-ified versions of some notebooks, & direct links to @mybinderteam for everything. If you work with @TroveAustralia newspapers you might find it useful.

Download & explore 1,499,259 rows of open data from NSW State Archives Online Indexes

NSW State Archives publishes a number of detailed indexes containing data manually extracted from their records. These provide additional entry points to the records, such as a person’s name, or a place. But they also provide useful data for analysis. However, to explore the index data we need to get it out of the web interface and into a form that can be easily downloaded and manipulated. I’ve created a series of Jupyter notebooks to harvest the all the indexes and save the data in a series of CSV-formatted files.

Continue reading →

New in GLAM Workbench! Notebooks to harvest, index, analyse, and aggregate transcripts of speeches & interviews by Australian prime ministers. Plus links to harvested data and aggregated files. #dhhacks

Reorganising things a little at GLAM Workbench. @statelibrarynsw gets its own section. Hansard and @datagovau GLAM datasets now under ‘Australian government’. Making some space for further additions…