Tim Sherratt

Sharing recent updates and work-in-progress

Oct 2021

Getting data about newspaper issues in Trove

When you search Trove’s newspapers, you find articles – these articles are grouped by page, and all the pages from a particular date make up an issue. But how do you find out what issues are available? How do you get a list of dates when newspapers were published? This notebook in the GLAM Workbench shows how you can get information about issues from the Trove API.
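The notebook's actual requests aren't reproduced in this post, so what follows is only a minimal sketch under several assumptions. It assumes the Trove API v2 newspaper title endpoint (`https://api.trove.nla.gov.au/v2/newspaper/title/<title_id>`) accepts `include=years` plus a `range` of `YYYYMMDD-YYYYMMDD` dates. It also assumes the response nests issues under `newspaper` → `year` → `issue`, with each issue carrying an `id` and a `date`. The endpoint, the parameter names, the response shape and the helper names (`build_issues_url`, `extract_issues`, `fetch_issues`) are all assumptions to check against the current Trove API documentation, since API versions change. You'll also need your own API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: Trove API v2 newspaper title endpoint; check the current docs.
API_BASE = "https://api.trove.nla.gov.au/v2/newspaper/title/"


def build_issues_url(title_id, start, end, api_key):
    """Build a request URL for one title's issues between two YYYYMMDD dates.

    ASSUMPTION: 'include=years' and 'range' are the parameters that make
    the API list individual issues.
    """
    params = {
        "encoding": "json",
        "include": "years",
        "range": f"{start}-{end}",
        "key": api_key,
    }
    return f"{API_BASE}{title_id}?{urlencode(params)}"


def extract_issues(data, title_id):
    """Flatten the (assumed) newspaper -> year -> issue nesting into rows."""
    rows = []
    for year in data.get("newspaper", {}).get("year", []):
        # Years outside the requested range may carry only a count, so
        # default to an empty list when there are no issue details.
        for issue in year.get("issue", []):
            rows.append({
                "title_id": title_id,
                "issue_id": issue["id"],
                "issue_date": issue["date"],
            })
    return rows


def fetch_issues(title_id, start, end, api_key):
    """Request and flatten one title's issues (needs network and a real key)."""
    with urlopen(build_issues_url(title_id, start, end, api_key)) as response:
        return extract_issues(json.load(response), title_id)
```

Keeping the URL building and response parsing separate from the network call makes it easy to test the parsing against a saved response before making live requests.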

Using the notebook, I’ve created a couple of datasets ready for download and use.

Total number of issues per year for every newspaper in Trove

Harvested 10 October 2021

A CSV-formatted dataset containing the number of newspaper issues available on Trove, totalled by title and year. It comprises 27,604 rows with the fields:

  • title – newspaper title
  • title_id – newspaper id
  • state – place of publication
  • year – year published
  • issues – number of issues

Download from Cloudstor: newspaper_issues_totals_by_year_20211010.csv (2.1 MB)

Complete list of issues for every newspaper in Trove

Harvested 10 October 2021

A CSV-formatted dataset containing a complete list of newspaper issues available on Trove. It comprises 2,654,020 rows with the fields:

  • title – newspaper title
  • title_id – newspaper id
  • state – place of publication
  • issue_id – issue identifier
  • issue_date – date of publication (YYYY-MM-DD)

To keep the file size down, I haven’t included an issue_url field in this dataset, but issue URLs are easily generated from the issue_id: just add the issue_id to the end of http://nla.gov.au/nla.news-issue. For example: http://nla.gov.au/nla.news-issue495426. Note that following an issue URL actually redirects you to the URL of the first page in that issue.
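Generating issue URLs from the issue_id column takes a single line of code. The sketch below (the helper names `issue_url` and `first_page_url` are hypothetical) builds the URL as described. It also shows how `urllib.request.urlopen`, which follows redirects by default, could report the first-page URL that an issue URL redirects to. The second function needs network access, and it makes no assumptions about the format of the page URL it returns.

```python
from urllib.request import urlopen

ISSUE_URL_PREFIX = "http://nla.gov.au/nla.news-issue"


def issue_url(issue_id):
    """Append an issue_id (int or str) to the issue URL prefix."""
    return f"{ISSUE_URL_PREFIX}{issue_id}"


def first_page_url(issue_id):
    """Follow the issue URL's redirect and return where it lands (needs network)."""
    # urlopen follows HTTP redirects by default; geturl() gives the final URL.
    with urlopen(issue_url(issue_id)) as response:
        return response.geturl()
```

For example, `issue_url(495426)` returns `"http://nla.gov.au/nla.news-issue495426"`, matching the example URL above.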

Download from Cloudstor: newspaper_issues_20211010.csv (222 MB)

For more information see the Trove newspapers section of the GLAM Workbench.