Zotero now saves links to digitised items in Trove from the NLA catalogue!

I’ve made a small change to the Zotero translator for the National Library of Australia’s catalogue. Now, if there’s a link to a digitised version of the work in Trove, that link will be saved in Zotero’s url field. This makes it quicker and easier to view digitised items – just click on the ‘URL’ label in Zotero to open the link.

It’s also handy if you’re viewing a digitised work in Trove and want to capture the metadata about it. Just click on the ‘View catalogue’ link in the details tab of a Trove item, then use Zotero to save the details from the catalogue.

View embedded JSON metadata for Trove's digitised books and journals

The metadata for digitised books and journals in Trove can seem a bit sparse, but there’s quite a lot of useful metadata embedded within Trove’s web pages that isn’t displayed to users or made available through the Trove API. This notebook in the GLAM Workbench shows you how you can access it. To make it even easier, I’ve added a new endpoint to my Trove Proxy that returns the metadata in JSON format.

Just pass the url of a digitised book or journal as a parameter named url to https://trove-proxy.herokuapp.com/metadata/. For example:

https://trove-proxy.herokuapp.com/metadata/?url=https://nla.gov.au/nla.obj-2906940941

Screenshot of the collapsed JSON metadata returned from the url above. It includes fields such as 'title', 'accessConditions', 'marcData', and 'children'.
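
If you'd rather build the proxy url in code than by hand, a minimal Python sketch might look like this. The function names are invented for illustration, and it assumes the proxy is still available at the address given above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROXY_URL = "https://trove-proxy.herokuapp.com/metadata/"


def build_metadata_url(item_url):
    """Return the proxy url that requests metadata for a digitised item."""
    # urlencode percent-encodes the item url so it is safe as a parameter value
    return f"{PROXY_URL}?{urlencode({'url': item_url})}"


def get_metadata(item_url):
    """Fetch the embedded metadata as a Python dict (needs a network connection)."""
    with urlopen(build_metadata_url(item_url)) as response:
        return json.load(response)


print(build_metadata_url("https://nla.gov.au/nla.obj-2906940941"))
```

Calling get_metadata() with the same url should return the JSON shown in the screenshot as a dictionary, provided the proxy is running.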

I’ve created a simple bookmarklet to make it easier to open the proxy. To use it, just:

  • Drag this link to your bookmarks toolbar: Get Trove work metadata
  • View a digitised book or journal in Trove.
  • Click on the bookmarklet to view the metadata in JSON format.

To view the JSON data in your browser you might need to install an extension like JSONView.

Where did all those NSW articles go? Trove Newspapers Data Dashboard update!

I was looking at my Trove Newspapers Data Dashboard again last night trying to figure out why the number of newspaper articles from NSW seemed to have dropped by more than 700,000 since my harvesting began. It took me a while to figure out, but it seems that the search index was rebuilt on 31 May, and that caused some major shifts in the distribution of articles by state, as reported by the main result API. So the indexing of the articles changed, not the actual number of articles. Interestingly, the number of articles by state reported by the newspaper API doesn’t show the same fluctuations.

Screenshot of data dashboard that compares the number of articles by state as reported by the results and newspapers APIs. There are major differences in the column that shows the change since April 2022.

This adds another layer of complexity to understanding how Trove changes over time. To try and document such things, I’ve added a ‘Significant events’ section to the Dashboard. I’ve also included a new ‘Total articles by publication state’ section that compares results from the result and newspaper APIs. This should make it easier to identify such issues in the future.
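
If you want to try this sort of comparison yourself, here's a rough Python sketch. Given article totals by state from each API at two harvest dates, it calculates the change reported by each and flags the states where the two disagree. The function name, the threshold, and all the numbers are placeholders made up for illustration; they aren't taken from the Dashboard or its code.

```python
def compare_changes(results_then, results_now, newspapers_then, newspapers_now, threshold=10_000):
    """Compare the change in article totals by state as reported by two APIs.

    Each argument is a dict mapping a state name to a total number of articles.
    Returns a list of (state, results_change, newspapers_change) tuples for
    states where the two reported changes differ by more than the threshold.
    """
    flagged = []
    for state in sorted(results_now):
        results_change = results_now[state] - results_then.get(state, 0)
        newspapers_change = newspapers_now.get(state, 0) - newspapers_then.get(state, 0)
        if abs(results_change - newspapers_change) > threshold:
            flagged.append((state, results_change, newspapers_change))
    return flagged


# Placeholder totals, NOT real Trove data
results_then = {"NSW": 1_000_000, "WA": 500_000}
results_now = {"NSW": 300_000, "WA": 520_000}
newspapers_then = {"NSW": 1_000_000, "WA": 500_000}
newspapers_now = {"NSW": 1_005_000, "WA": 520_000}

for state, results_change, newspapers_change in compare_changes(
    results_then, results_now, newspapers_then, newspapers_now
):
    print(f"{state}: results API {results_change:+,}, newspapers API {newspapers_change:+,}")
```

A large gap between the two columns for the same state suggests a change in indexing rather than a change in the number of articles.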

Stay alert people – remember, search interfaces lie!

Catching up – some recent GLAM Workbench updates!

There’s been lots of small updates to the GLAM Workbench over the last couple of months and I’ve fallen behind in sharing details. So here’s an omnibus list of everything I can remember…

Data

  • Weekly harvests of basic Trove newspaper data continue, and there’s now about three months’ worth. You can view a summary of the harvested data through the brand new Trove Newspaper Data Dashboard. The Dashboard is generated from a Jupyter notebook and is updated whenever there’s a new data harvest.
  • There are also weekly harvests of files digitised by the NAA, now with 16 months’ worth of data.
  • Updated harvest of Trove public tags (Zenodo) – includes 2,201,090 unique public tags added to 9,370,614 resources in Trove between August 2008 and July 2022.
  • I’ve started moving other pre-harvested datasets out of the GLAM Workbench code repositories, into their own data repositories. This means better versioning and citability. The first example is the list of Trove newspapers with articles post the 1955 copyright cliff of death – here’s the GH repo, and the Zenodo record.
  • To bring together datasets that provide historical data about Trove itself, I’ve created a Trove historical data community on Zenodo. Anyone’s welcome to contribute. There’s much more to come.

Tag cloud showing the frequency of the two hundred most commonly-used tags in Trove.

Tag cloud generated from the latest harvest of Trove Tags

Code

  • Big thanks to Mitchell Harrop who contributed a new Heritage Council of Victoria section to the GLAM Workbench providing examples using the Victorian Heritage Database API.
  • The troveharvester Python package has been updated, mainly to remove annoying Pandas warnings and to make use of the trove-query-parser package.
  • As a result of the above, the Trove Newspaper & Gazette Harvester section of the GLAM Workbench has been updated. No major changes to notebooks, but I’ve implemented basic testing and linting to improve code quality.
  • The Trove newspapers section of the GW has been updated. There were a few bug fixes and minor improvements. In particular there was a problem downloading data and HTML files from QueryPic, and some date queries in QueryPic were returning no results.
  • The tool to download complete, high-res newspaper page images has been updated so that you now no longer need to supply an API key. Also fixed a problem with displaying the images in Voila.
  • The recordsearch_data_scraper Python package has been updated. This fixes a bug where agency and series searches with only one result weren’t being captured properly.
  • The RecordSearch section of the GW has been updated. This incorporates the above update, but I took the opportunity to update all packages, and implement basic testing and linting. The Harvest items from a search in RecordSearch notebook has been simplified and reorganised. There are two new notebooks: Exploring harvested series data, 2022 – generates some basic statistics from the harvest of series data in 2022 and compares the results to the previous year; Summary of records digitised in the previous week – run this notebook to analyse the most recent dataset of recently digitised files, summarising the results by series.
  • A new Zotero translator for Libraries Tasmania has been developed.

Calling all Tasmanian historians – you can now save resources from Libraries Tasmania into Zotero!

I’ve created a Zotero translator for the Libraries Tasmania catalogue. Using it, you can save metadata and digital resources to your own research database with a single click. Libraries Tasmania actually has three catalogues rolled into one – the main library catalogue, the Archives catalogue, and the Names Index. The translator works across all three. Features include:

  • Select and save items from a page of search results.
  • Save individual items across the full range of formats. (By default, individual records in the catalogue open in a modal overlay. For Zotero to recognise the item you need to click on the Permalink button and open the record on a separate page.)
  • Automatically download digital images and PDFs attached to records. This works when the record points to a particular page – it won’t download multiple images from a single link. However, if a record contains multiple links to digitised pages (such as the Convict records in the Names Index), you’ll get them all!
  • Fields in the Archives catalogue and Names Index that don’t map to Zotero properties are saved as key/value pairs in Zotero’s ‘Extra’ field.
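
To give a sense of what that last feature produces, here's a rough Python sketch of turning unmapped fields into key/value lines. This isn't the translator's code (Zotero translators are written in JavaScript); the function name and the sample field names and values are invented for illustration.

```python
def format_extra(fields, mapped):
    """Build text for Zotero's 'Extra' field from fields with no Zotero equivalent.

    fields: dict of field labels and values scraped from a catalogue record.
    mapped: set of labels that already map to Zotero properties.
    """
    lines = [
        f"{label}: {value}"
        for label, value in fields.items()
        if label not in mapped and value
    ]
    return "\n".join(lines)


# Hypothetical record, field names and values are placeholders
record = {
    "Title": "Example record",
    "Ship": "Example ship",
    "Remarks": "Example remarks",
}
print(format_extra(record, mapped={"Title"}))
```

Each unmapped field ends up on its own line in the form "label: value", while fields that already have a Zotero equivalent are left out.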

Screenshot of Zotero interface showing captured Libraries Tasmania records.

The translator is now included in the main Zotero repository so should install and update itself automatically. If the Zotero browser extension doesn’t seem to be detecting Libraries Tasmania items you can force an update by right clicking on the Zotero icon in your browser toolbar and clicking on Preferences > Advanced > Update translators.

My work on this translator was not entirely altruistic – it’s going to be very useful in the Everyday Heritage project as Kate Bagnall and I try to bring together sources relating to Chinese heritage in Tasmania.

But I’m also very happy to be able to update my spreadsheet of Zotero support in Australian GLAM organisations and put Libraries Tasmania in the green! #dhhacks

Screen capture of spreadsheet showing full Zotero support for Libraries Tasmania

Updated dataset! Harvests of Trove list metadata from 2018, 2020, and 2022 are now available on Zenodo: doi.org/10.5281/z… Another addition to the growing collection of historical Trove data. #GLAMWorkbench

Screen capture of version information from Zenodo showing that there are three available versions, v1.0, v1.1, and v1.2.

Updated dataset! Details of 2,201,090 unique public tags added to 9,370,614 resources in Trove between August 2008 and July 2022. Useful for exploring folksonomies, and the way people organise and use massive online resources like Trove. doi.org/10.5281/z…

Ok, I’ve created a Zenodo community for datasets documenting changes in the content and structure of Trove. Lots more to add… zenodo.org/communiti…

Coz I love making work for myself, I’ve started pulling datasets out of #GLAMWorkbench code repos & creating new data repos for them. This way they’ll have their own version histories in Zenodo. Here’s the first: github.com/GLAM-Work…

Ahead of my session at #OzHA2022 tomorrow, I’ve updated the NAA section of the #GLAMWorkbench. Come along to find out how to harvest file details, digitised images, and PDFs, from a search in RecordSearch! github.com/GLAM-Work…

55,633 items digitised by the National Archives of Australia last week. Including:

  • Bonegilla name index cards (A2571 & A2572): +42,434
  • CMF Personnel Dossiers (B884): +10,150
  • Aust Women’s Land Army personnel cards (C610): +961

github.com/wragge/na…

  • A2571, Name Index Cards, Migrants Registration [Bonegilla], 33686 files digitised
  • B884, Citizen Military Forces Personnel Dossiers, 1939-1947, 10150 files digitised
  • A2572, Name Index Cards, Migrants Registration [Bonegilla], 8748 files digitised
  • C610, Australian Women's Land Army - personnel cards, alphabetical series, 961 files digitised
  • A9301, RAAF Personnel files of Non-Commissioned Officers (NCOs) and other ranks, 1921-1948, 735 files digitised
  • D874, Still photograph outdoor and studio negatives, annual single number series with N prefix (and progressive alpha infix A-K from 1948-1957), 624 files digitised
  • B883, Second Australian Imperial Force Personnel Dossiers, 1939-1947, 163 files digitised
  • J853, Architectural plans, annual single number series with alpha (denoting Papua New Guinea and discipline) prefix and/or alpha/numeric (denoting size and amendment) suffix, 161 files digitised
  • A14487, Royal Australian Air Force Air Board and Air Council Agendas, Submissions and Determinations - Master Copy, 102 files digitised
  • A2478, Non-British European migrant selection documents, 21 files digitised
  • D4881, Alien registration cards, alphabetical series, 18 files digitised
  • A471, Courts-Martial files [including war crimes trials], single number series, 10 files digitised
  • A1877, British migrants - Selection documents for free or assisted passage (Commonwealth nominees), 9 files digitised
  • A13860, Medical Documents - Army (Department of Defence Medical Documents), 9 files digitised
  • A1196, Correspondence files, multiple number series [Class 501] [501-539] [Classified] [Main correspondence files series of the agency], 9 files digitised
  • B78, Alien registration documents, 8 files digitised
  • A712, Letters received, annual single number series with letter prefix or infix, 6 files digitised
  • A12372, RAAF Personnel files - All Ranks [Main correspondence files series of the agency], 6 files digitised
  • AP476/4, Applications etc. for registration of copyright of literary, dramatic and musical productions, pictures etc., 6 files digitised
  • A714, Books of duplicate certificates of naturalization A(1)[Individual person] series, 6 files digitised

Newspapers added to Trove last week

  • Freelance (WA)
  • The Standard (WA)
  • Berrigan Advocate (NSW)
  • Baileys Sporting & Dramatic Weekly (WA)
  • Farmers’ Weekly (WA)
  • Harvey-Waroona Mail (WA)
  • W.A. Family Sphere (WA)
  • Coonabarabran Times (NSW)

github.com/wragge/tr…

Noticed that QueryPic was having a problem with some date queries. Should be fixed in the latest release of the Trove Newspapers section of the #GLAMWorkbench: glam-workbench.net/trove-new… #maintenance #researchinfrastructure

The Trove Newspapers section of the #GLAMWorkbench has been updated! Voilà was causing a problem in QueryPic, stopping results from being downloaded. A package update did the trick! Everything now updated & tested. glam-workbench.net/trove-new…

Some more #GLAMWorkbench maintenance – this app to download high-res page images from Trove newspapers now doesn’t require an API key if you have a url, & some display problems have been fixed. trove-newspaper-apps.herokuapp.com/voila/ren…

Screenshot of the app, headed ‘Download a page image’. The text reads: ‘The Trove web interface doesn't provide a way of getting high-resolution page images from newspapers. This simple app lets you download page images as complete, high-resolution JPG files.’

The Trove Newspaper and Gazette Harvester section of the #GLAMWorkbench has been updated! No major changes to notebooks, just lots of background maintenance stuff such as updating packages, testing, linting notebooks etc. glam-workbench.net/trove-har…

Main changes to individual Trove newspapers last week:

+19,862 articles in Daily News (WA)
+10,822 articles in Dalgety’s Review (WA)
+13,352 articles in Manning River News… (NSW)

See: github.com/wragge/tr…

Changes to Trove newspapers last week:

+86,761 articles
+16,367 articles with corrections
+5,355 articles with tags
+417 articles with comments

See: github.com/wragge/tr…

42,472 files were digitised by the National Archives of Australia last week.

36,238 of these were migrant registration cards from Bonegilla (A2571 & A2572).

Here’s a screenshot of the top twenty series. More info: github.com/wragge/na…

  • A2571, Name Index Cards, Migrants Registration [Bonegilla], 31307 files digitised
  • B884, Citizen Military Forces Personnel Dossiers, 1939-1947, 4981 files digitised
  • A2572, Name Index Cards, Migrants Registration [Bonegilla], 4931 files digitised
  • C610, Australian Women's Land Army - personnel cards, alphabetical series, 566 files digitised
  • J853, Architectural plans, annual single number series with alpha (denoting Papua New Guinea and discipline) prefix and/or alpha/numeric (denoting size and amendment) suffix, 292 files digitised
  • A9301, RAAF Personnel files of Non-Commissioned Officers (NCOs) and other ranks, 1921-1948, 188 files digitised
  • B883, Second Australian Imperial Force Personnel Dossiers, 1939-1947, 56 files digitised
  • K1145, Certificates of Exemption from Dictation Test, annual certificate number order, 17 files digitised
  • SP551/1, Log books of HMC [Her Majesty's Colonial], HM [His/Her Majesty's] and HMA [Her Majesty's Australian] Ships, 9 files digitised
  • F1, Correspondence files, annual single number series [Main correspondence files series of the agency], 7 files digitised
  • A1196, Correspondence files, multiple number series [Class 501] [501-539] [Classified] [Main correspondence files series of the agency], 6 files digitised
  • B78, Alien registration documents, 6 files digitised
  • A12372, RAAF Personnel files - All Ranks [Main correspondence files series of the agency], 4 files digitised
  • SP1122/1, General correspondence files, multiple number series and annual single number with 'N' (New South Wales) prefix, 4 files digitised
  • A9300, RAAF Officers Personnel files, 1921-1948, 4 files digitised
  • MP508/1, General correspondence files, multiple number series, 3 files digitised
  • C321, Case files, annual single number series with 'N' (NSW) prefix, 3 files digitised
  • B2458, Army Personnel Files, multiple number series, 3 files digitised
  • K47, Correspondence files, annual single number series with 'W' prefix, 3 files digitised
  • PP892/1, Correspondence [client] files, annual single number series with 'W' prefix, 2 files digitised

I wrote up something for the #GLAMWorkbook on ‘Empty searches and hacking urls’: glam-workbook.net/url-hacki…

Under development – a Zotero translator for Libraries Tasmania!

I’ve created a Zotero translator for the Libraries Tasmania catalogue. Using it, you can save metadata and digital resources to your own research database with a single click. Libraries Tasmania actually has three catalogues rolled into one – the main library catalogue, the archives catalogue, and the names index. The translator works across all three. Features include:

  • Select and save items from a page of search results.
  • Save individual items across the full range of formats. (By default, individual records in the catalogue open in a modal overlay. For Zotero to recognise the item you need to click on the Permalink button and open the record on a separate page.)
  • Automatically download digital images and PDFs attached to records. This works when the record points to a particular page – it won’t download multiple images from a single link. However, if a record contains multiple links to digitised pages (such as the Convict records in the Names Index), you’ll get them all!
  • Fields in the Archives catalogue and Names Index that don’t map to Zotero properties are saved as key/value pairs in Zotero’s ‘Extra’ field.

Screenshot of Zotero interface showing captured Libraries Tasmania records.

I’ve submitted the translator for inclusion in the official repository, but you’re welcome to take it for a test run in the meantime. Just follow these steps:

  • Find out where your Zotero data folder is. From Zotero go to Edit > Preferences > Advanced > Files and Folders and click on Show Data Directory.
  • Open the translator file here. Then right click and choose Save as, navigate to your Zotero data folder, and save it in the ‘translators’ folder.
  • Restart Zotero and your browser.
  • Head to the Libraries Tasmania catalogue and try it!
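
If you'd prefer to script the second step, something like the following Python sketch should do the copying. The function name is invented for illustration, and the paths in the commented-out usage example are placeholders; substitute the location of your downloaded file and the data directory that Zotero shows you.

```python
import shutil
from pathlib import Path


def install_translator(translator_file, zotero_data_dir):
    """Copy a downloaded translator file into Zotero's 'translators' folder."""
    translators_dir = Path(zotero_data_dir) / "translators"
    if not translators_dir.is_dir():
        raise FileNotFoundError(f"No 'translators' folder found in {zotero_data_dir}")
    # shutil.copy returns the path of the newly created file
    return Path(shutil.copy(translator_file, translators_dir))


# Placeholder paths, replace with your own before running:
# install_translator(Path.home() / "Downloads" / "Libraries Tasmania.js", Path.home() / "Zotero")
```

You'll still need to restart Zotero and your browser afterwards so that the new translator is picked up.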

Let me know if you strike any problems. #dhhacks

Ok, I’ve submitted my Libraries Tasmania translator to the Zotero repository for inclusion. No doubt a bit of additional tweaking will be required. github.com/zotero/tr…

Getting to work migrating the Real Face of White Australia transcription site from scribeAPI (no longer maintained) to Zooniverse Panoptes. First the workflows and config, then the subjects, then the current transcription data…

It’s getting there – new Real Face of White Australia site using Datasette, IIIF, and Universal Viewer…

Screenshot of new site for displaying records relating to the White Australia Policy from the National Archives of Australia. On the left is the Universal viewer showing page images. On the right is the file metadata in tabular format.

Ordering some #GLAMWorkbench stickers…

Proof image of a hexagonal sticker. The sticker has white lettering on a blue background which reads GLAM Workbench. In the centre is a crossed hammer and wrench icon.