Tim Sherratt

Sharing recent updates and work-in-progress

Jul 2022

Where did all those NSW articles go? Trove Newspapers Data Dashboard update!

I was looking at my Trove Newspapers Data Dashboard again last night, trying to figure out why the number of newspaper articles from NSW seemed to have dropped by more than 700,000 since my harvesting began. It took me a while to work out, but it seems that the search index was rebuilt on 31 May, and that caused some major shifts in the distribution of articles by state, as reported by the main result API. So it was the indexing of the articles that changed, not the actual number of articles. Interestingly, the number of articles by state reported by the newspaper API doesn’t show the same fluctuations.

Screenshot of data dashboard that compares the number of articles by state as reported by the results and newspapers APIs. There are major differences in the column that shows the change since April 2022.

This adds another layer of complexity to understanding how Trove changes over time. To try and document such things, I’ve added a ‘Significant events’ section to the Dashboard. I’ve also included a new ‘Total articles by publication state’ section that compares results from the result and newspaper APIs. This should make it easier to identify such issues in the future.
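
The basic idea behind that comparison can be sketched in a few lines of Python. This is only an illustration, not the Dashboard's actual code: the function names, the 1% threshold, and the numbers in the example are all made up, and it assumes the per-state totals have already been harvested from each API into simple dictionaries.

```python
def compare_state_counts(result_counts, newspaper_counts):
    """Line up per-state article totals from two sources and compute the gap."""
    rows = []
    # Include every state that appears in either source.
    for state in sorted(set(result_counts) | set(newspaper_counts)):
        from_result = result_counts.get(state, 0)
        from_newspaper = newspaper_counts.get(state, 0)
        rows.append(
            {
                "state": state,
                "result": from_result,
                "newspaper": from_newspaper,
                "difference": from_result - from_newspaper,
            }
        )
    return rows


def flag_discrepancies(rows, threshold=0.01):
    """Return states where the gap exceeds `threshold` as a fraction of the larger total."""
    flagged = []
    for row in rows:
        larger = max(row["result"], row["newspaper"])
        if larger and abs(row["difference"]) / larger > threshold:
            flagged.append(row["state"])
    return flagged


# Illustrative numbers only, not real Trove totals.
result_totals = {"NSW": 1000, "Victoria": 500}
newspaper_totals = {"NSW": 1200, "Victoria": 501}

rows = compare_state_counts(result_totals, newspaper_totals)
print(flag_discrepancies(rows))
```

Running the same comparison against each new harvest would show when the two sources start to drift apart, which is the sort of signal that points to an indexing change rather than a real change in the number of articles.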

Stay alert, people – remember, search interfaces lie!