I was looking at my Trove Newspapers Data Dashboard again last night, trying to work out why the number of newspaper articles from NSW seemed to have dropped by more than 700,000 since my harvesting began. It took me a while, but it seems that the search index was rebuilt on 31 May, and that caused some major shifts in the distribution of articles by state, as reported by the main result API. So the indexing of the articles changed, not the actual number of articles. Interestingly, the number of articles by state reported by the newspaper API doesn’t show the same fluctuations.
This adds another layer of complexity to understanding how Trove changes over time. To try and document such things, I’ve added a ‘Significant events’ section to the Dashboard. I’ve also included a new ‘Total articles by publication state’ section that compares results from the result and newspaper APIs. This should make it easier to identify such issues in the future.
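For anyone who wants to try a similar check, here's a rough sketch of the comparison. It isn't the Dashboard's actual code: the function name, the tolerance value, and the numbers are all made up for illustration, and it assumes you've already harvested the per-state totals from each API into plain dictionaries.

```python
def compare_state_totals(result_totals, newspaper_totals, tolerance=0.01):
    """Flag states where the two sources disagree by more than `tolerance`.

    Both arguments map a state name to an article count. The difference is
    measured as a fraction of the larger of the two counts. Returns a dict
    of state -> (result count minus newspaper count) for flagged states.
    """
    flagged = {}
    for state in sorted(set(result_totals) | set(newspaper_totals)):
        from_result = result_totals.get(state, 0)
        from_newspaper = newspaper_totals.get(state, 0)
        larger = max(from_result, from_newspaper)
        if larger == 0:
            continue  # nothing reported by either source
        difference = from_result - from_newspaper
        if abs(difference) / larger > tolerance:
            flagged[state] = difference
    return flagged


# Hypothetical totals for illustration only, NOT real Trove counts.
result_totals = {"NSW": 900, "Victoria": 500}
newspaper_totals = {"NSW": 1000, "Victoria": 501}

print(compare_state_totals(result_totals, newspaper_totals))
```

Running the same check on each harvest would show the date on which the two sources start to diverge, which is the sort of thing worth recording as a significant event.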
Stay alert, people – remember, search interfaces lie!