Tim Sherratt

How to download all the images from a digitised collection in Trove (& learn some cool Trove tricks)

Wed, 24 Apr 2024 23:35:50 +1000

Digitised resources in Trove are sometimes grouped into collections – an album of photographs, a set of posters, a bundle of letters. I’ve just added a notebook to the GLAM Workbench that downloads all the images in a collection at the highest available resolution.

A sample of the 3,048 posters download from nla.obj-2590804313

Why is it necessary?

Trove’s digitised collection viewer includes a download option. But in most cases that seems to be limited to downloading 20 images at a time. Part of the reason for that is probably because the images are zipped up into a single file, which could get very large if 100, or 200 images were added.

Another limitation of the built-in download option is that the images are often fairly low resolution copies (many have a maximum width of 1000px). The quality of the images limits how you can use them.

For many research purposes, you’ll want a complete collection at the highest resolution possible. Trove makes that difficult. The new notebook makes it easy.

How does it work?

The Trove API is not much help. The only collections it knows about are articles in newspapers, and issues in periodicals. However, when you browse through a collection using the digitised collection viewer, a little internal API is called that delivers an HTML list of the next 20 items. By stepping through the collection page by page, you can eventually harvest details of all the items in a collection.

Once you have the nla.obj identifiers for each image, you can download a high-resolution version simply by adding /image to the url. Downloaded this way, the images generally seem to have a longest dimension equal to 5000px – a considerable improvement!

As well as downloading all the images, the notebook also generates an RO-Crate metadata file that describes the context of your harvest – when it was run, what collection you downloaded, what notebook you used, as well as the details of each image. This’ll help you in the future when you’ve forgotten where all the images came from!

Where can I learn all these Trove tricks?

This new notebook came about because I was documenting the method for harvesting collections in the Trove Data Guide. I realised that I needed to adapt my original code to work with complex collections that included multiple layers of nested sub-collections (like manuscript finding aids). Having done that, I thought it would be useful to create a working example in the GLAM Workbench.

The Trove Data Guide is being developed as part of the ARDC’s Community Lab. In it I’m trying to document as many of these problems and workarounds as I can to open Trove data to new research uses. Here, for example, is the page that discusses how to get a list of collection items, and here are some suggestions for downloading high-resolution images. One of my favourite discoveries from recent weeks was the internal API that delivers OCR layout information about book and periodical pages – find out how to save illustrations from books, visualise the layout of a periodical page, and even create some #redactionart poetry.

The content of the Trove Data Guide is changing and developing all the time. It’s all happening in the open, so feel free to explore the bleeding-edge development version or the latest published version. If there’s something you’d like to see, please post it on the GitHub ideas board.

Where is the new notebook?

The new notebook is part of the Trove images section in the GLAM Workbench. Because I was adding a new notebook, I also took the chance to update the whole repository. Changes include:

now using Trove API v3
now using Python 3.10
updated Python packages
now includes an ro-crate-metadata.json containing machine-readable metadata describing this repository

I also updated the datasets that provide information about the application of licences and rights statements to images by Trove contributors and moved them to their own GitHub repository.

What do you want to do with Trove data?

Thu, 18 Apr 2024 16:22:39 +1000

In my work on the Trove Data Guide I’ve started sketching out a series of research pathways. These are intended as ways of connecting Trove data to tools and questions – providing examples of the steps involved in gathering, preparing, and using data to explore particular research topics.

I’ve currently defined six pathways, roughly based on different types of data that you can get from Trove:

‘Creating collections’ is a bit different I suppose, as it’s meant to relate to the work of assembling research collections from data in Trove – for example, creating a collection of annotated newspaper articles in Omeka.

I have some ideas, of course, about the types of tutorials and examples to include in each pathway, but I’m wondering what you would like to see. What would you like to be able to do with Trove data?

You might get some inspiration by browsing through what’s already in the Trove Data Guide and the GLAM Workbench, or perhaps you have a research question that’s foundered because you couldn’t get the data you needed out of Trove. If you have any ideas please share them via the TDG’s ideas board. This is a chance to get some of your gnarly Trove data problems solved!

Note that the TDG links in this post go to the development version, which changes frequently. There is also a published version that doesn’t include the latest content.

Update! Saving Trove newspaper articles and pages as images

Thu, 18 Apr 2024 12:32:50 +1000

You probably know that when you select the Download as Image option for a digitised newspaper article in Trove what you get back is not actually an image – it’s an HTML document, in which the original image has been sliced up to try and fit on an A4 page when printed. So this article:

Ends up looking like this!!

So what do you do when you just want an image of an article as it appeared in the newspaper? Some years ago I figured out a workaround that involves scraping the OCR positional data that’s embedded in Trove’s newspaper viewer and cropping the article from a high-resolution image of the page. The method is documented in the GLAM Workbench and the Trove Data Guide, and I’ve packaged up the code in trove-newspapers-images so you can embed it in your own projects.

I also created a web app (using Jupyter and Voilá) to make it as simple as possible for people to download images from articles. Unlike most of the other notebooks in the GLAM Workbench which are spun up on demand, this web app was hosted on a constantly-running server. This made it faster to start and use, but it was relatively expensive, wasteful, and difficult to maintain. So I decided to make a change!

The new version of the Save Trove newspaper article as image web app is actually embedded with the GLAM Workbench. Behind the scenes, the page calls a AWS Lambda function which uses trove-newspapers-images to generate the image. So far it seems to be working pretty well. Try it now!

Even better, I’ve made some changes to image generation code to give users the option of masking the articles. The original version crops a rectangle from the page using the article coordinates. If an article extends over multiple columns with different lengths, the image will include content from neighbouring articles. It’s not a big problem, but it always annoyed me. Recently I realised that the solution was quite simple – instead of cropping one big box from the page, you can crop each individual OCR ‘zone’ and paste them into a new empty image with the same dimensions as the original. Once you’ve pasted all the zones, you crop the new page image using the article coordinates. Here’s an example of an article without masking:

And the same article with masking:

This enhancement has been pushed to the trove-newspapers-images package, and is available through the web app by simply checking the ‘mask image’ option.

Another frustrating feature of the Trove web interface is that there’s no way of saving a newspaper page as an image, only as a PDF. In this case the workaround is pretty simple, you just have to know the url pattern used to download page images. This is documented in the Trove Data Guide. Once again, I’ve been providing a web app to make this easy for users, and once again I’ve just updated it so that it’s embedded with the GLAM Workbench itself. Try it now!

Getting to know NED – born-digital periodicals in Trove

Wed, 10 Apr 2024 23:05:01 +1000

I spend a lot of my time trying to highlight the wealth of resources available through Trove – whether that’s 25,000 digitised Parliamentary Papers, 6,000 oral histories you can listen to online, or 3,471 full-page editorial cartoons from The Bulletin. Most recently I’ve been working on digitised periodicals, developing a new section for the Trove Data Guide. But as I was harvesting data about the 900 periodicals and 37,000 issues that had so far been digitised, I wondered about periodicals that were born digital – in particular, those that had been submitted to the National Library by publishers and authors through the National eDeposit Scheme (NED). It turns out, there’s a lot more than I realised.

I’ve added a new notebook to the Trove Periodicals section of the GLAM Workbench that harvests data about NED periodicals, and created a new dataset with lists of titles and issues. You can also explore the harvested data using Datasette-Lite. But here’s a quick overview.

There are at least 7,973 born-digital periodicals contributed through NED, comprising a total of 156,151 issues!

One of the 428 issues of the Palm Island Voice

What are they? Here’s the twenty titles with the most issues.

title_id	title	issues
nla.obj-1916881555	Western Australian government gazette.	1869
nla.obj-2940864261	The Australian Jewish News.	1067
nla.obj-2692666983	APSjobs-vacancies daily … daily gazette.	1043
nla.obj-2945379691	Tweed link	825
nla.obj-2541626239	Weekly notice	798
nla.obj-2940863963	The Australian Jewish News.	726
nla.obj-1252109725	Queensland Health services bulletin	700
nla.obj-1247944368	Hyden Karlgarin Householder News.	642
nla.obj-1775015332	E-record : your news from across the Archdioce…	640
nla.obj-638303044	Class ruling	580
nla.obj-2536144595	Plantagenet news.	574
nla.obj-1252305285	Clermont rag : Community newspaper.	514
nla.obj-2815835489	The Apollo Bay news.	513
nla.obj-1908935587	Assessment reports and exam papers	512
nla.obj-3125539859	The Peninsula community access news.	506
nla.obj-2859788676	Council news : weekly information from us to you	469
nla.obj-1252119874	Rot-Ayr-Ian [electronic resource] : the offici…	467
nla.obj-2994765231	Townsville Orchid Society Inc. bulletin.	442
nla.obj-1252246096	Palm Island Voice.	428
nla.obj-3267060622	News & views from George Cochrane.	399

These are born-digital, so they’re not images and OCRd text like the digitised periodicals and newspapers. Most of them are PDFs as we can see from the metadata.

format	number of issues
application/pdf	154,976
not specified	1,075
application/epub+zip	100

Not all NED periodicals can be viewed online. Publishers submitting periodicals through NED can place restrictions on access, specifying that that the publications can only be viewed on-site in a library. The three access categories are:

Unrestricted – you can view online and download
View Only – you can view online but not download
Onsite Only – you can only view when onsite at the designated libraries

Fortunately, the vast majority are Unrestricted.

Access status	Number of issues
Unrestricted	138,557
View Only	12,937
Onsite Only	4,657

One of the most important things about the digitised newspaper corpus is its diversity – it’s not just the metropolitan dailies, but many local, community, political, and religious newspapers as well. While local newspapers might be dying out in their traditional form, electronic publications are popping up. Look at the titles in the list above – the Apollo Bay News, the Palm Island Voice – while current historians mine the digitised newspapers for fragments of everyday life, future historians will be grateful for what’s being captured and preserved by NED.

But wait there’s more! Since 1996, the Australian Web Archive (previously Pandora) has been capturing online periodicals. My next task is to harvest some details of these as well.

More tools and data for working with Trove's digitised periodicals

Tue, 26 Mar 2024 14:35:02 +1000

The Trove Periodicals section of the GLAM Workbench has been updated! Some changes were necessary to make use of version 3 of the Trove API, but I’ve also taken the chance to reorganise things a bit – starting with the name. This section used to be called ‘Trove journals’, reflecting the naming of Trove’s ‘Journals’ zone. But zones have gone, and periodicals are now spread across multiple categories, so I thought a name change was necessary to better reflect the type of content being examined.

What periodicals have been digitised?

It’s surprising difficult to find out what periodicals have actually been digitised in Trove. There’s no straightforward list of titles as there is in the newspapers category. Over the years I’ve created a variety of lists and tools to try and overcome this. I’m now trying to consolidate these efforts into a single dataset which you can explore using Datasette-Lite. I’ve made a few improvements to this in recent weeks, in particular, title records now include a link to download all the OCRd text from periodical.

New notebooks

The notebook pages in the GLAM Workbench now include previews of the notebook’s content. There are a number of new notebooks:

Get details of periodicals from the /magazine/titles API endpoint – shows how you can get a list of titles from version 3 of the Trove API and explores some of the problems with the data
Enrich the list of periodicals from the Trove API – shows how to work around some of the problems with the titles data, adds some extra metadata, and generates the database described above
Harvest illustrations from periodicals – extract illustrations for periodical pages, issues, articles and searches using a OCR layout data

If you’d like an example of the sorts of illustrations you can extract from the digitised periodicals, here’s a collection of photos found by searching for periodical articles with cat or kitten in their titles.

Updated and reorganised datasets

I’ve moved all the datasets out of the main GitHub repository into their own separate repositories. Some large collections that were previously stored on the sadly-deceased Cloudstor service are now sitting in an Amazon s3 bucket. These include:

Details of digitised periodicals from the /magazine/titles API endpoint – these are the datasets created by harvesting and enriching titles and issues data from the Trove API
CSV formatted list of journals available from Trove in digital form – this is an update of an older dataset of titles created by searching for digitised works with the format Periodical
Editorial cartoons from The Bulletin, 1886 to 1952 – the cartoons haven’t been updated, but I’ve created a new metadata file and fixed up some problems with page numbering
OCRd text from Trove digitised journals – I’ve reharvested all of the OCRd text and made it available as individual zip files for each title, and one big zip file with everything!

As previously noted, I’ve also made the Bulletin cartoons available through Datasette-Lite for easy exploration.

A new way to explore editorial cartoons from The Bulletin

Tue, 19 Mar 2024 14:46:22 +1000

About five years ago I created a collection of full-page editorial cartoons from The Bulletin, harvested from Trove. Through a process that might be politely described as ‘iterative’, I fiddled with an assortment of queries and methods until I had at least one cartoon from every issue published between 4 September 1886 and 17 September 1952 – 3,471 cartoons in total. The details of the collection and how I created it are available in the Trove periodicals section of the GLAM Workbench.

Last night, as I was tidying up a new release of the Trove periodicals repository, I had a thought – why not put all of the details of the cartoons in a little database and make it available using Datasette-Lite for easy exploration? So I did.

Try it now!

One of the coolest new features is that I’ve harvested the OCRd text from each page containing a cartoon and created a full-text index. This means you can find cartoons by searching for words in their captions! Other features include embedded thumbnail images and links to download high-resolution versions of each page image.

In creating the database, I realised there were a few problems with the original metadata (dodgy page numbers), so I’ve fixed that up as well. I’ve also moved the mega zip download of every image (over 60gb) from the unfortunately deceased CloudStor service to AWS.

New GLAM Workbench section for working with government publications in Trove

Tue, 27 Feb 2024 13:34:18 +1000

The GLAM Workbench has a brand new section aimed at helping you find and use government publications in Trove. Most of the GLAM Workbench’s existing sections focus on a particular resource format, or are related to one of Trove’s top-level categories. This didn’t quite work for government publications, as things like Parliamentary Papers are spread across multiple categories, and can encompass a variety of formats. So I thought a new section was the best way of bringing it all together.

At the moment the Trove Government section includes two notebooks and three pre-harvested datasets.

It took a bit longer than I was originally expecting because I also made some changes in the way I store and display information about GLAM Workbench resources. You might notice, for example, that each of the datasets lives in its own separate GitHub repository, rather than being rolled together with the notebooks into one big repository. This makes it easier to manage and share information about individual datasets, and also trims down the size of the Docker images built from the code repository.

Each of these data and code repositories have their own machine-readable metadata following the RO-Crate standard. This continues work I’ve been doing with the ARDC Community Data Lab to describe GLAM Workbench resources and outputs using RO-Crate. Having this metadata in a standard format creates new possibilities for integration and automation. I’m now using the RO-Crate files to produce different, public-facing views of the resources they describe. The README files in each repository and all the GLAM Workbench pages in the Trove Government section are automatically generated from the RO-Crate data. In the latter case, I’ve extended my MkDocs setup using macros to pull in the RO-Crate JSON files and make the data available to the page templates. Connecting all the bits up took a lot of time, but I’m pretty happy with the result and will eventually extend this approach to the rest of the GLAM Workbench.

I also fiddled a bit with the way Jupyter notebooks are presented in the GLAM Workbench. The Trove Government pages include a notebook preview – basically an HTML rendering of the notebook in an iframe. This means you can browse the content of the notebook without having to do anything extra, or go anywhere else. In other sections you can view notebook content by following links to GitHub or NBViewer, but the embedded previews seem cleaner and more useful.

I’ve also changed the way options to run the notebook are presented. In the Trove Government section, these options are displayed as tabs beneath the preview – allowing you to choose, for example, between ARDC Binder and the public MyBinder service. In other sections I have a big blue button to launch the notebook using a specific service, with other options listed below. This new approach means I don’t have to prioritise one particular service – it’s left to the user to choose. It’s also expandable. In the future, I’m hoping to make some of the GLAM Workbench’s notebooks available using JupyterLite. As I do this, I can just add the JupyterLite option under another tab.

As with some other sections of the GLAM Workbench, the dataset pages are integrated with Datasette-Lite. If there’s a CSV file in the dataset, you’ll see a button to explore it using Datasette. For example, this link leads to a searchable database with details of 24,997 digitised Parliamentary Papers. That same dataset has also been used in the Trove Data Guide, to visualise Trove’s holdings of Parliamentary Papers. Yay for integration and reuse!

Digital history stream at AHA annual conference in July

Fri, 16 Feb 2024 10:44:32 +1000

This year the annual conference of the Australian Historical Association will include a digital history stream, sponsored by the Australian Research Data Commons (ARDC), and convened by me!

The call for papers is available here or through the Conference website. The list of possible topics is deliberately broad and inclusive – if you’re using digital tools or methods in the organisation, analysis, and visualisation of historical data we’d love to hear from you. Proposals are due on ~~23 February~~ 4 March and can be submitted through the Conference website.

We’re particularly keen for HDR and ECR scholars to be involved. To help meet registration and travel costs, the ARDC is funding up to four $1000 bursaries. More details are available here. Bursary applications close on 31 March.

There’s also likely to be a digital history workshop, as well as updates on the work of the HASS & Indigenous Research Data Commons and ARDC Community Data Lab, including the Trove Data Guide.

Time is short! Get your proposals in now! Contact me at tim@timsherratt.au if you have any questions.

Some recent presentations on the GLAM Workbench and Trove Data Guide

Tue, 13 Feb 2024 08:11:17 +1000

Last week I attended the ARDC Workshop on Repositories & Workspaces where I gave a quick intro to the GLAM Workbench and the Community Data Lab.

Then it was off to the ARDC HASS&I Research Data Commons Summer School where I explored some of the mysteries of Trove in a walk-through of the Trove Data Guide.

Exploring Trove’s digitised periodicals

Tue, 30 Jan 2024 21:53:14 +1000

While Trove’s digitised newspapers get all the attention, there are many other digitised periodicals to explore. But it’s not easy to find them from the Trove web interface – unlike the newspapers, there’s no list of digitised titles. So to help researchers find and use Trove’s digitised periodicals, I’ve created a searchable database using Datasette-Lite. Try it out!

Search for the titles of digitised periodicals.

View the details of an individual title (note the link to available issues at the bottom.

Browse a list of available issues.

The database currently contains details of 923 different titles, and over 37,000 individual issues. You can search for titles by keyword, then click through to view a full list of issues from a periodical. As well as basic descriptive metadata and links back to Trove, there’s a couple of other handy inclusions:

Titles include a ‘Search for articles in Trove’ link that opens up the Trove interface and pre-populates the search box with the title’s identifier. By adding some keywords you can search for articles within the publication.
Issues include a text_download_url link that downloads all the OCRd text from the issue.

Regular viewers might be thinking – wasn’t there already something like this? Yes indeed, for several years I’ve been maintaining the Trove Titles app, which provides a similar list. I’ve also provided harvests of OCRd text. So why the new database? First of all, I’ve harvested the data in a different way – making use of the new /magazine/titles API endpoint. This had several problems (see below), but I’m hoping that in the long term it will make updates easier.

Second, I’m exploring ways to make these sorts of resources available in a more sustainable way. The current Trove Titles app runs on the Heroku platform and there are costs associated with the app and the databases it uses. It just seems a bit silly for a relatively small amount of data. Datasette-Lite takes a very different approach – there’s no constantly running server, just a static site pointing at a dataset. All the magic happens within your browser!

I’ve written previously about how I’ve been customising Datasette-Lite for use within the GLAM Workbench, but I had to handle the periodicals data a bit differently. Because there’s a foreign key relationship between the titles and the issues (each issue is linked back to a title), I loaded the harvested data into a SQLite database (using sqlite-utils), defined the foreign key, and built a fulltext index on the periodical titles. Then I just saved the whole SQLite database to a GitHub repository and pointed Datasette-Lite at it.

I had to modify the GLAM Workbench template a bit to insert links back to the title when you view an individual issue. This happens automatically when you view a list of results, but not when you view an individual item. First I used the install parameter to tell Datasette-Lite to install the datasette-template-sql plugin. This plugin lets you run SQL queries within a template. Then I could run a query to see if there was a foreign key associated with the current item:

{% set fk = sql("SELECT * FROM pragma_foreign_key_list(?)", [table]) %}

If there is a foreign key, I run another query to get the title of the linked title:

{% if fk %}
    {% set flinks = sql("select title, " + fk.0.to + " from " + fk.0.table + " where id = ?", [display_rows.0[fk.0.from]]) %}
    {% set ftitle = flinks.0.title %}
{% endif %}

Then when rendering the column containing the linked value I can insert the title and a link:

{% if fk and cell.column == fk.0.from %}
	<a href="/{{database}}/{{fk.0.table}}/{{cell.value}}">{{ftitle}}</a> <em>{{cell.value}}</em>
{% else %}
	{{ row.display(cell.column) }}
{% endif %}

It seems to work ok, and doesn’t cause problems on databases where there’s no foreign keys.

I’m also using the datasette-json-html plugin to render the thumbnails. I’m also using the metadata parameter to point Datasette-Lite at a custom metadata file – this was primarily to define a custom sort order for the tables.

The data

I’ll write up more about the data and the harvesting process in coming weeks. There’ll also be a new section in the Trove Data Guide and some updated notebooks in the journals section of the GLAM Workbench. But a few notes about the /magazine/titles API endpoint:

there are a few hundred duplicate records – I’ve removed these from the dataset
the API doesn’t provide full information about issues, in particular undated issues are not returned – I’ve tried to fill these gaps
the data includes a thousand or more Parliamentary Papers – I’ve harvested these separately and thought it was best to exclude them from this dataset
some titles are actually nested collections, so the ‘issues’ are actually another level of title, while alternatively some titles are actually issues – I’ve tried to sort as much of this out as I can, but it gets confusing!

So I’m not confident that I’ve got everything, but I think it’s a useful start. I’ve reported the API problems to Trove but haven’t heard anything back yet.

The Trove Newspaper Data Dashboard now has an archive!

Mon, 15 Jan 2024 09:51:58 +1000

Since July 2022 I’ve been generating weekly snapshots of the contents of the Trove newspaper corpus. Every Sunday a new version of the Trove Newspaper Data Dashboard is created, highlighting what’s changed over the previous week, and visualising trends since April 2022 (when I first started regular data harvests).

All of the past versions of the dashboard are preserved in GitHub, but there wasn’t an easy way to browse them, until now. If you want to find out what changed in any week since July 2022, you can now visit the Trove Data Dashboard Archive and select a date from the list!

I created the archive by pulling all the versions from GitHub and saving them as individual files. I’ve also added some code to the weekly process that should automatically archive the past week from now on – we’ll see if it works next Sunday…

Customising Datasette-Lite to explore datasets in the GLAM Workbench

Fri, 12 Jan 2024 15:09:41 +1000

As well as tools and code, the GLAM Workbench includes a number of pre-harvested datasets for researchers to play with. But just including a link to a CSV file in GitHub or Zenodo isn’t very useful – it doesn’t help researchers understand what’s in the dataset, and why it might be useful. That’s why I’ve also started including links that open the CSV files in Datasette-Lite, enabling the contents to be searched, filtered, and faceted. Just look for the Explore in Datasette buttons!

Datasette is an excellent tool for sharing and exploring data. I’ve used it in a number of projects such as the GLAM Name Index Search and the Tasmanian Post Office Directories. Datasette-Lite is a version of Datasette that runs completely in the user’s web browser – no need for separate servers! All you do is point a Datasette-Lite Github repository at a publicly available CSV file, and it builds a searchable database in your browser. So instead of having to configure and maintain a series of servers running Datasette, I just have one static GitHub repository that only springs into action when needed.

For example, click this link to explore metadata describing oral history collections in Trove using Datasette-Lite.

I’ve made a few changes to the standard Datasette-Lite application for use with the GLAM Workbench. These are all included in the GLAM Workbench’s Datasette-Lite fork and described below.

Custom theme

I’d already created a custom Datasette theme for use in other projects. The question was how do I get it to work with Datasette-Lite? Just putting a templates folder in the repository wasn’t enough, as the virtual environment created within the browser doesn’t have direct access to all the files. I eventually figured out that I could zip up the templates folder, fetch the zip file using Javascript, and then unzip the folder into the browser’s virtual environment. This is the code in webworker.js that does all that:

let templateResponse = await fetch("templates.zip");
let templateBinary = await templateResponse.arrayBuffer();
pyodide.unpackArchive(templateBinary, "zip");

Then it’s just a matter of changing the Datasette initialisation to point to the templates directory:

ds = Datasette(
    names, 
    settings={
    	"num_sql_threads": 0, 
    	"truncate_cells_html": 100 # truncate cells
    }, 
    metadata=metadata, 
    template_dir="templates", # point to custom templates directory
    plugins_dir="plugins", # point to custom plugins directory
    memory=${settings.memory ? 'True' : 'False'}
)

As you can see, I’ve also added the "truncate_cells_html": 100 setting to truncate the contents of cells in the table view.

Custom plugins

Sometimes fields can contain multiple urls. While Datasette will make single urls clickable, multiple urls are just left as plain text. The datasette-multiline-links plugin fixes this for urls separated by line breaks, but I generally separate multiple values in CSV fields using the | character. It wasn’t hard to modify the plugin, but again it wasn’t clear how to make the modified plugin work with Datasette-Lite. You can use the install parameter to load plugins, but the plugins have to either be published in PyPI or available in GitHub as a Python wheel. That all seemed like overkill for my tiny plugin modification, but then I realised that I could use the same method as I was using for the custom template – zip, fetch, unzip, then point Datasette to the new plugins directory.

It also took me a while to figure out how to get the plugin to work nicely with the truncate_cells_html setting. Unless a cell-formatting plugin returns None, other cell format operations, such as truncation, aren’t applied. So I had to make sure that the plugin returned None if there were no urls in a cell.

Custom metadata

You can use the metadata parameter in Datasette-Lite to point to a metadata file in either JSON or YAML. I’ve added a custom metadata.json file to the GLAM Workbench repository, and adjusted the webworker.js code to load it by default.

Full text indexing

One really cool things about Datasette is the ability to run full text searches across specified columns. If Datasette detects a full text index, it automatically adds a keyword search box.

There wasn’t a way of adding full text indexes to CSV datasets in Datasette-Lite, so I added a new fts url parameter and used the value in webworker.js to modify the database using SQLite-utils.

fts_cols = ${JSON.stringify(settings.ftsCols || "")} 
try:
    db[bit].enable_fts(fts_cols.split(",")) # add full text indexes to columns
except sqlite3.OperationalError:
    print("Column not found")
    pass

For example, adding fts=title to a Datasette-Lite url will automatically add a full text index to the title column. You can also index multiple columns – just separate the column names with commas.

This url opens a CSV dataset with oral history metadata harvested from Trove and indexes the title and contributor columns: https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/trove-oral-histories-data/blob/main/trove-oral-histories.csv&fts=title,contributor

Datasette converts your CSV file to a SQLite database, and SQLite supports a number of advanced search options. These options aren’t enabled by default in Datasette – you need to set searchmode to raw in the table metadata. To enable advanced searches, I’ve added a line in webworker.js to modify the default metadata:

metadata["databases"]["data"]["tables"][bit] = {"searchmode": "raw"}

Drop unwanted columns

Not all the columns in pre-harvested datasets are useful or interesting. To remove selected columns from Datasette-Lite, I added a drop url parameter. Once again, you can submit multiple values separated by commas.

In webworker.js the drop values are used with the SQLite-utils transform() function to remove the columns from the database.

drop_cols = ${JSON.stringify(settings.dropCols || "")}
db[bit].transform(drop=set(drop_cols.split(",")))

This url opens a CSV dataset with oral history metadata harvested from Trove and drops the publisher and work_type columns: https://glam-workbench.net/datasette-lite/?csv=https://github.com/GLAM-Workbench/trove-oral-histories-data/blob/main/trove-oral-histories.csv&drop=publisher,work_type

What’s going on?

Thu, 04 Jan 2024 17:21:41 +1000

The hardest part of developing tools and resources like the GLAM Workbench is getting information about them to the people who might benefit. The collapse of Twitter has only added to the difficulty, as has the reluctance of GLAM organisations to share new resources with their users. I’d rather spend my time making new tools, but what’s the point if no-one knows they exist?

Anyway, I thought I’d do a bit of a communications refresh for the new year. If you’re interested in GLAM Workbench and Trove Data Guide updates you can:

Keep an eye on the GLAM Workbench channel of my microblog (or add the feed to your RSS reader)
Follow the GLAM Workbench Facebook page for cross-posted updates from the RSS feed
Follow the GLAM Workbench LinkedIn page for cross-posted updates from the RSS feed

I’m also working on an email newsletter thing that’ll compile the updates at regular intervals.

For more social socials, as well as questions, requests and problems, you can always find me on Mastodon: @wragge@hcommons.social. My email address is not too hard to find, but, honestly, your chances of getting a reply are slim.

If you’ve got a bug report, or a suggestion for a new notebook or data source, feel free to create an issue on GitHub.

And, of course, everything I do is openly-licensed, so you are very welcome to modify and share! See the GLAM Workbench for other ways to get involved.

Exploring oral histories in Trove

Thu, 04 Jan 2024 11:20:30 +1000

The National Library of Australia holds over 55,000 hours of oral history and folklore recordings dating back to the 1950s. This collection is being made available online, and many recordings can now be listened to using Trove’s audio player.

However, the oral history collection is not easy to find in Trove. You need to go the ‘Music, Audio, & Video’ category and check the ‘Sound/Interview, lecture, talk’ format facet. To limit results to oral histories that have been digitised, you can add “nla.obj” to your query and set the ‘Access’ facet to ‘Online’. But what’s actually in the oral history collection and what can you do with it?

To help researchers explore and analyse the NLA’s oral history collection, I’ve added some notebooks to the Music, sound, and oral histories section of the GLAM Workbench:

Harvest oral histories metadata – harvests metadata describing the NLA’s oral history collection from Trove and saves the results as a CSV file
Save a list of oral history collections and projects – extracts a list of series from metadata describing oral histories held by the NLA and described in Trove
Download summaries and transcripts from oral histories – download all the available transcripts and summaries from digitised oral histories available in Trove

There’s also a couple of associated datasets:

The Trove Data Guide uses these datasets to create an overview of the collection. For example, here’s how the oral histories are distributed over time.

And here’s the top ten subjects of digitised oral histories.

subject	count
Painters – Australia – Interviews	193
Politicians – Australia	192
Prime ministers – Australia – Quotations	188
Older people – New South Wales – Biography	187
Menzies, Robert, Sir, 1894-1978. Speeches	185
Federal politicians	184
Politicians – Australia – Quotations	183
Australia – Politics and government – 1945-1965	172
Politicians – Australia – Interviews	171
Academics	126

The Trove Data Guide also includes information on the types of data from the oral histories and how you can access it.

Mapping MARC Geographic Area codes to Wikidata

Wed, 03 Jan 2024 17:03:48 +1000

Trove uses codes from the MARC Geographic Areas list to identify locations in metadata records. I couldn’t find any mappings of these codes to other sources of geospatial information, so I fired up OpenRefine and reconciled the geographic area names against Wikidata. Once I’d linked as many as possible, I copied additional information from Wikidata, such as ISO country codes, GeoNames identifiers, and geographic coordinates.

I’ve saved the resulting dataset in two formats – as a flattened CSV file (handy for loading as a dataframe), and as a JSON file that uses the geographic area codes as keys (handy for looking up values). You can download the datasets from this GitHub repository.

I’ve also written the codes back into the Wikidata records, so you can now find them with a SPARQL query like this.

The columns in the CSV file are:

code – MARC geographic areas code (without any trailing dashes)
place – name of geographic area from the MARC list
wikidata_label – name of geographic area from Wikidata
wikidata_id – Wikidata identifier
coordinates – pair of decimal coordinates in the form latitude,longitude (multiple values are pipe | separated)
iso_country_code – ISO two letter country code (multiple values are pipe | separated)
iso_numeric_country_code – ISO numeric country code (multiple values are pipe | separated)
geonames_id – GeoNames identifier (multiple values are pipe | separated)

Note that some fields can contain multiple values. For example the area Mediterranean Region is linked to 22 countries, so there will be multiple values in the ISO code fields.

For an example of this dataset in use, see Which countries do the oral histories relate to? in the Trove Data Guide.

National Archives of Australia in 2023 – digitisation of files

Wed, 03 Jan 2024 10:18:21 +1000

In 2023 the National Archives of Australia digitised 416,602 files (down from 575,597 in 2022). This chart shows the number of files digitised per day in 2023.

These files were drawn from 1,423 different series, but the vast bulk (81%) were from 4 series of World War Two service records. (This media release includes some details about the funding of the WW2 digitisation.)

Here’s the top twenty series by number of items digitised in 2023.

series	series_title	total
B883	Second Australian Imperial Force Personnel Dossiers, 1939-1947	201,511
A9301	RAAF Personnel files of Non-Commissioned Officers (NCOs) and other ranks, 1921-1948	111,673
A9300	RAAF Officers Personnel files, 1921-1948	14,125
B884	Citizen Military Forces Personnel Dossiers, 1939-1947	11,265
A14435	Stanley Fowler photographs showing the Australian fishing industry and coastline, numerical series with ‘LA’prefix	10,512
D3481	Photographs (black and white, colour) of buildings, installations, sites, etc	8,295
A1	Correspondence files, annual single number series [Main correspondence files series of the agency]	7,571
K1145	Certificates of Exemption from Dictation Test, annual certificate number order	4,169
A13150	Specifications, examiners reports and correspondence relating to the Registration of Victorian Patents - Second system	3,941
J853	Architectural plans, annual single number series with alpha (denoting Papua New Guinea and discipline) prefix and/or alpha/numeric (denoting size and amendment) suffix	3,322
A2571	Name Index Cards, Migrants Registration [Bonegilla]	2,204
D1423	Original plans (negatives), single number series with alpha prefix denoting discipline	2,102
AP67/1	Personal documents of British migrants (including ex-service) in receipt of free and assisted passages	2,058
E1652	Northern Territory Pastoral Applications (Pastoral Claims)	1,822
D5440	Photographs of post office buildings, personnel and equipment, single number series (with variations)	1,488
C609	Payment cards for employees' entitlements claims, alphabetical series	1,317
B3104	Photographs, Trans-Australian Railway, single number series	1,222
MP1117/2	Microfilm reels of RAAF Engineering Drawings	1,168
A2572	Name Index Cards, Migrants Registration [Bonegilla]	1,130
B6295	Photographs and negatives of Commonwealth building sites and Works departmental activities, single number series	1,113

For more data, see the naa-recently-digitised GitHub repository which runs a process every Sunday to save details of files digitised in the previous week.

Trove newspapers in 2023

Tue, 02 Jan 2024 16:56:58 +1000

I’ve been capturing weekly snapshots of the Trove newspaper corpus for the last couple of years. You can see the latest results in the Trove Newspaper Data Dashboard. Using this data I’ve compiled a quick summary of changes over the last year.

7,518,764 digitised newspaper articles were added to Trove in 2023. The total number of articles increased from 236,530,127 to 244,048,891. The chart below shows how the number of articles varied across the year. You’ll notice that the rate of digitisation increased about the same time the government announced new funding for Trove. Were more articles digitised because of the funding, or were articles in the digitisation pipeline held back until the funding was announced? Or both?

Most of the new articles were published in either Victoria or NSW – both these states had an increase of more than 3 million articles each! There were smaller increases for WA and SA. This chart shows the distribution of articles by state.

Fifty-seven new newspaper titles were added to Trove in 2023:

Trove Data Guide update – accessing data from newspapers and gazettes

Fri, 15 Sep 2023 14:53:20 +1000

I’m continuing to slog away at the Trove Data Guide (part of the ARDC’s HASS Community Data Lab) – dumping everything I know about Trove into a format that I hope will be useful for researchers.

I’ve just finished a first pass through the section on accessing data from newspapers and gazettes, and it’s online if you want to have a look. There’s still lots of things to add, update, and reorganise, but getting the basic content of the section defined is a bit of a milestone, so I’ll allow myself a little moment of celebration. Yay!

Of course it took longer than I expected, but that’s largely due to the fact that I was sketching out related sections as I went along. You’ll see lots of pages in the navigation that only contain a list of dot points, but they’ll get filled out over the next couple of months.

The ‘Accessing data’ section is going to be the most code heavy, as it’s focused on using the API to develop reusable methods for harvesting machine-readable data. Other sections, such as ‘Understanding search’ and ‘Collections and contexts’ will be more discursive, aimed at helping all Trove users better understand what Trove is and how it works.

Comments and suggestions are welcome! You can add issues on GitHub, or use Hypothes.is to annotate the text.

Some important updates for the Trove Newspaper & Gazette Harvester

Thu, 31 Aug 2023 17:00:54 +1000

Version 3 of the Trove API is out, and version 2 is scheduled to be decommissioned in early 2023 – that means I have a lot of code to update! First cab of the rank is the Trove Newspaper & Gazette Harvester with version 0.7.1 now available.

The Harvester is a Python package that can be used as either a library or a command-line tool. It’s been around in some form for more than 10 years. The latest updates include:

support for version 3 of the Trove API
automatic creation of a metadata file describing each harvest according to the RO-Crate format
automatic creation of a harvester config file, capturing the query parameters sent to Trove as well as the Harvester options
the ability to initiate a harvest from an existing config file
more memory-friendly generation of CSV result files (no loading everything into Pandas)

The RO-Crate integration was part of my work for the ARDC’s HASS Community Data Lab. The Harvester was already generating a simple metadata file that captured some of the harvest parameters, but now it documents the context of the harvest in much more detail, and saves it in a standard, Linked Open Data based, format.

Every harvest now creates an ro-crate-metadata.json file. This file includes details of the datasets created by the Harvester, such as the results.csv file that includes article metadata, and the text directory that contains the OCRd text. It also captures contextual information about the Harvester itself. The Harvester and the datasets are linked through a CreateAction that describes the harvesting process. The harvester_config.json file that saves the query parameters and Harvester options is also linked as an input to this process. In this way, all the components of the harvest are described and linked.

Here’s an example RO-Crate file.

Trove is changing all the time. By capturing information such as the query, the harvester version, the date, and the number of results, the RO-Crate file will help researchers document, manage, and share their research. And now that you can start a new harvest with an existing config file, it’s easy for researchers to re-run a harvest to see what changes over time.

As well as updating the Python package, I’ve also updated the Trove Newspaper & Gazette Harvester section of the GLAM Workbench. Here you’ll find examples of the Harvester in action, as well as some ways of exploring the harvested data. If you’d like to take the Harvester for a spin, the easiest way to start is with web app version – no software to install, no code to navigate! If you’re an Australian university researcher you can spin it up on the new ARDC Binder service in seconds.

Run GLAM Workbench notebooks on the ARDC’s new Binder service

Thu, 31 Aug 2023 12:52:11 +1000

There are a number of different ways to run the Jupyter notebooks in the GLAM Workbench depending on your needs and technical skills. But the easiest and quickest has always been the public, international Binder service, based in Europe. One click in the GLAM Workbench and Binder prepares a customised computing environment and loads up the Jupyter notebooks ready for you to explore. Unfortunately, the public Binder service has been having some capacity issues in the last few months, and sometimes repositories fail to run. The good news is that Australian university researchers now have another option with the launch of the Australian Research Data Commons Binder service!

The big difference between the ARDC’s Binder service and the international version is that you need to log in using your university credentials. While that’s an extra hassle, the service itself should be faster and more reliable for Australian researchers. For this reason, I’ve started making ARDC Binder links the default in a number of GLAM Workbench sections. Of course, not all GLAM Workbench users are attached to Australian universities, so the international Binder links remain – it’s just a matter of emphasis.

For example, near the top of many GLAM Workbench pages you’ll see Explore live on Binder buttons that launch the current repository. I’ve now added an Explore live on ARDC Binder option.

Most notebooks in the GLAM Workbench now have their own documentation page with a big blue button to launch the notebook on Binder. I’ve started changing these buttons to use the ARDC Binder service but, as you can see in the screenshot below, there’s also a link to run the notebook on the original Binder service, with no authentication required.

I’ve added some information on using the ARDC Binder service to the GLAM Workbench help pages.

I’ll be continuing to explore new options for running GLAM Workbench notebooks (I’m particularly interested in the possibilities of Jupyter Lite). Also the ARDC’s HASS Community Data Lab project is currently investigating ways of adding more authentication options to the Binder service to open it up to researchers outside of universities.

Trove Query Parser updated!

Sat, 26 Aug 2023 17:21:23 +1000

I’ve just updated the Trove Query Parser to work with version 3 of the Trove API. You just give it the url of a search in Trove’s newspapers, and it translates the search into a set of parameters that the API will understand. So this:

parse_query("https://trove.nla.gov.au/search/category/newspapers?keyword=wragge&l-artType=newspapers&l-state=Queensland&l-category=Article&l-illustrationType=Cartoon", 3)

Produces this:

{'q': 'wragge', 'l-artType': 'newspapers', 'l-state': ['Queensland'], 'l-category': ['Article'], 'l-illustrated': 'true', 'l-illustrationType': ['Cartoon'], 'category': 'newspaper'}

You can then feed the parameters to the Trove API with your API key and you’ll get data back. Easy! It’s simple but handy – I use the Query Parser in other tools like the Trove Newspaper Harvester and QueryPic.

This version adds a second parameter to the parse_query() function so you can specify the version of the Trove API you’re using. The default value is 2 for backwards compatibility. See the documentation for more information.

Family history resources in the GLAM Workbench

Fri, 18 Aug 2023 14:06:21 +1000

It’s Family History Month, so I thought a brief post was in order describing some of the family history related resources in the GLAM Workbench.

GLAM Name Index Search

This is the biggie (in more ways than one). I’ve brought 263 datasets from 10 Australian GLAM organisations together into a single search interface. All these datasets index collections by people’s names, so with one search you can find information about individuals across a broad range of records, locations, and periods. There’s more than 10 million rows of data to explore!

Try it now!

NSW Post Office and Sydney Telephone Directories

Many volumes of the NSW Post Office and Sydney Telephone Directories have been digitised and made available through Trove. However, they’re not easy to search. I’ve taken the text from these volumes and indexed it by line to make it easier to find people and places. There’s two databases to explore:

New South Wales Post Office Directories (54 volumes from 1886 to 1950)
Sydney Telephone Directories (44 volumes from 1926 to 1954)

Tasmanian Post Office Directories

The Tasmanian Post Office Directories have been digitised by Libraries Tasmania, but each volume is available as a separate PDF, making it difficult to search across the collection. I’ve downloaded all the PDFs, extracted the text and images, and indexed the content by line. Now you can search all 48 volumes, from 1890 to 1948, at once!

Try it now!

Trove Places

If your family history research is focused on a particular region, it can be useful to know what newspapers from that region are digitised in Trove. Trove Places is a map interface to Trove’s digitised newspapers, just click on a place to find newspapers published or distributed nearby.

Try it now!

Save a Trove newspaper article as an image

You might have noticed that Trove’s download option for digitised newspaper articles can slice articles up in unfortunate ways. This simple web app saves the complete article as a single image (or multiple images if the article is split across different pages). Simple, but very useful.

Try it now!

Trove Newspapers Data Dashboard

Just about every week more digitised newspaper articles are added to Trove. The search you tried a couple of months ago might now produce different results. How can you keep track of these changes? The Trove Data Dashboard uses weekly snapshots of the digitised newspapers to visualise changes over time. It’s updated every Sunday!

Try it now!

And more!

The GLAM Workbench provides a wide range of tools and examples to help you work with data from libraries, archives, and museums.

The resources above cost money to keep online. If you find them useful, you might like to sponsor me on GitHub or buy me a coffee.

Bye bye birdsite

Thu, 17 Aug 2023 11:53:47 +1000

In early June I pinned a “nobody’s home” post to my profile and said goodbye to Twitter. After 15 years, I was sad to leave behind friends and colleagues, but glad to get away from the hate, the nazis, and the transphobes. I hadn’t been posting much since Elno took over anyway, and was happily building a new network over on Mastodon. This morning I finally removed the Twitter links from my home page. Bye bye birdsite.

If you want to find me now I’m @wragge@hcommons.social. If you’re still wondering about making the move to Mastodon, I can heartily recommend hcommons.social as a friendly, well run instance. They also have a really useful guide to Mastodon for humanities scholars.

Exploring the front pages of newspapers (10 years on)

Tue, 08 Aug 2023 16:28:18 +1000

Way back in 2012, I used the brand new Trove API to download the details of 4 million articles published on the front pages of newspapers. I did it for two reasons: first, I wanted to see how the content of front pages changed over time; and second, I wanted to show that large-scale data wrangling was entirely possible using nothing more than a laptop and a home broadband connection. I described my adventures in this blog post, but if you look at it now you’ll see lots of sad, empty boxes where live charts used to be. This is because I shared my results though a custom web application which, 10 years later, seems like a really, really dumb idea. Needless to say the app fell foul of web hosting changes and is no more. Nowadays I use GitHub and Jupyter notebooks to share my data noodling, so I thought it was time to revisit the topic of newspaper front pages, and create something a bit more robust.

If you just want the data and code, head over to the GitHub repository. I’ve shared the notebooks used to harvest, convert, and explore the data, as well as two parquet formatted datasets.

As you might expect, there are a lot more articles now, as new articles and newspapers are added to Trove every week (see the Trove Newspaper Data Dashboard for details). Instead of 4 million articles, I ended up downloading details of more than 19 million articles! I used my trusty Trove Newspaper Harvester as a library so I could more easily manage the way the data was saved. After several days I had a 14.6gb newline-delimited JSON file. I pared the data down to only the necessary fields, and used DuckDB to create two parquet files – one with the article data, and another that added up the number of words on each page in the different article categories.

One nifty thing about the Trove API is that it tells you the number of words in each newspaper article. Articles are also assigned to a series of categories, such as ‘Article’ (your standard news-type piece) and ‘Advertising’. By adding up the number of words in each category I could explore how the front page’s mix of articles and advertising changed over time. Here, for example, is what happened to the front page of the Sydney Morning Herald.

The first chart shows the average number of words per page in the ‘Article’ and ‘Advertising’ categories across the full span of the SMH’s publication run digitised in Trove (advertising is blue, and articles orange). The second focuses in on the point where the number of words in articles overtakes the number of words in advertisements. You can see that for most of the publication run, the front page was dominated by advertising. But when change came it was quite abrupt. Sydney-siders awoke on 15 April 1944 to a new look newspaper. Here’s what the front page looked like at the beginning and end of the period represented by the second chart.

Different newspapers have different histories. I’ve created a notebook that generates more of these visualisations for a range of newspapers, and tries to provide a bit more context around the changes taking place.

As I continue my work on the Trove Data Guide for the HASS Community Data Lab, I’m discovering more and more inconsistencies, oddities, and undocumented possibilities. As I was finishing up the work on front pages, I realised there was another way of exploring the same topic – by looking at the space articles take up. This data isn’t in the API, but it can be scraped from the web site. Hmmm, interesting…

Trove API Console updates

Tue, 18 Jul 2023 11:27:11 +1000

The Trove API Console provides examples of the Trove API in action that you can run, edit, and share. It’s been online for 9 years now, and I’ve just updated it to use version 3 of the Trove API by default. I’ve also added a new ‘Share’ button that makes it easier to share and embed examples.

If you click on the ‘Share’ button, a box will pop up.

If you add a comment, this will appear above the example query when users follow the shared link. You can use this to provide them with some context or a description.

There are two buttons providing different share options:

Copy share url – copies the full url to the shared example
Copy Markdown button – copies a Markdown snippet that embeds a button like this linked to your example. Just paste it into your Markdown-formatted documentation!

Other improvements include:

The parameters used in any request are now displayed in a table for easy reference.
The Console includes a link to an API Status page. This page runs all the example queries in the Console and checks the results to make sure the API is working as expected. It’s updated every 6 hours.

Version 3 of the Trove API includes a standard Swagger UI that you can use to build queries, and provides limited anonymous access (without the need for an API key). But if, like me, you learn best by looking at examples, then you should find the API Console handy. In particular, the API Console makes it easy to share live examples which can be very useful in training, troubleshooting, and writing documentation. The Trove Data Guide, which I’m working on at the moment, includes lots of ‘Try it!’ buttons.

Tim Sherratt

How to download all the images from a digitised collection in Trove (& learn some cool Trove tricks)

Why is it necessary?

How does it work?

Where can I learn all these Trove tricks?

Where is the new notebook?

What do you want to do with Trove data?

Update! Saving Trove newspaper articles and pages as images

Getting to know NED – born-digital periodicals in Trove

More tools and data for working with Trove's digitised periodicals

What periodicals have been digitised?

New notebooks

Updated and reorganised datasets

A new way to explore editorial cartoons from *The Bulletin*

New GLAM Workbench section for working with government publications in Trove

Digital history stream at AHA annual conference in July

Some recent presentations on the GLAM Workbench and Trove Data Guide

Exploring Trove’s digitised periodicals

The data

The Trove Newspaper Data Dashboard now has an archive!

Customising Datasette-Lite to explore datasets in the GLAM Workbench

Custom theme

Custom plugins

Custom metadata

Full text indexing

Drop unwanted columns

What’s going on?

Exploring oral histories in Trove

Mapping MARC Geographic Area codes to Wikidata

National Archives of Australia in 2023 – digitisation of files

Trove newspapers in 2023

Trove Data Guide update – accessing data from newspapers and gazettes

Some important updates for the Trove Newspaper & Gazette Harvester

Run GLAM Workbench notebooks on the ARDC’s new Binder service

Trove Query Parser updated!

Family history resources in the GLAM Workbench

GLAM Name Index Search

NSW Post Office and Sydney Telephone Directories

Tasmanian Post Office Directories

Trove Places

Save a Trove newspaper article as an image

Trove Newspapers Data Dashboard

And more!

Bye bye birdsite

Exploring the front pages of newspapers (10 years on)

Trove API Console updates

A new way to explore editorial cartoons from The Bulletin