The Trove Newspaper Harvester has been around in different forms for more than a decade. It helps you download all the articles in a Trove newspaper search, opening up new possibilities for large-scale analysis. You can use it as a command-line tool by installing a Python package, or through the Trove Newspaper Harvester section of the GLAM Workbench.
I’ve just overhauled development of the Python package. The new trove-newspaper-harvester replaces the old troveharvester repository. The command-line interface remains the same (with a few new options), so it’s really a drop-in replacement. Read the full documentation of the new package for more details.
Here’s a summary of the changes:
save_csv() option to convert this to a CSV file, while the CLI automatically converts the results to CSV to maintain compatibility with previous versions
I’ve also updated the Trove Newspaper Harvester section of the GLAM Workbench to use the new package. The new core library will make it easier to develop more complex harvesting examples, such as searching for articles from a specific day across a range of years. If you find any problems, or want to suggest improvements, please raise an issue.
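To give a rough sense of the "specific day across a range of years" idea, here is a minimal sketch that only builds the list of dates you would want to harvest. The function name same_day_across_years is hypothetical and is not part of trove-newspaper-harvester. Turning each date into a Trove search and handing it to the harvester is deliberately left out, so check the package documentation for that step.

```python
from datetime import date


def same_day_across_years(month, day, start_year, end_year):
    """Return one date per year for the given month and day.

    Hypothetical helper for illustration only. Years in which the date
    does not exist (29 February outside leap years) are skipped.
    """
    dates = []
    for year in range(start_year, end_year + 1):
        try:
            dates.append(date(year, month, day))
        except ValueError:
            # The requested day does not exist in this year, so skip it.
            continue
    return dates


# Example: 11 November for each year from 1918 to 1920.
for d in same_day_across_years(11, 11, 1918, 1920):
    print(d.isoformat())
```

Each date in the list could then drive a separate harvest, with the results kept apart by year.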