Tim Sherratt

Sharing recent updates and work-in-progress

Jul 2019

Today I finished updating a harvest of all the OCR’d text available from Trove’s digitised journals. That’s about 7 GB of text from 30,462 issues of 384 different journals, a fab corpus for text analysis! Here’s all the metadata, links, and harvesting code. @TDHASSN #dhhacks