Tim Sherratt

Follow @wragge on Micro.blog.

2019-08-13: Another WIP notebook in need of additional documentation… This one explores the stats around …

2019-08-13: And this notebook uses TF-IDF to explore the OCRd text of a digitised journal from Trove. Get the …
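For anyone unfamiliar with TF-IDF, here is a minimal sketch of the idea in plain Python. It is not the code from the notebook: the `tokenize` and `tf_idf` helpers and the three sample "issues" are made up for illustration, and a real analysis would more likely use a library such as scikit-learn. Each term is scored by its relative frequency within a document, multiplied by the log of how rare it is across all documents.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Score = (term count / tokens in document) * log(N / documents containing term).
    This is the plain textbook weighting, with no smoothing or normalisation.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Count how many documents each term appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical stand-ins for the OCRd text of three journal issues.
issues = [
    "wool prices rose and wool exports grew",
    "wheat prices fell as wheat harvests grew",
    "shipping news and prices",
]

scores = tf_idf(issues)
for number, issue_scores in enumerate(scores):
    top = sorted(issue_scores.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
    print(number, [term for term, _ in top])
```

A word that turns up in every document (here, "prices") scores zero, which is why TF-IDF is handy for surfacing the terms that make one issue distinctive.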

2019-08-13: This notebook lets you download the OCRd text of a digitised journal from @TroveAustralia (via …

2019-08-13: A new notebook looking at the data about digitised journals on @TroveAustralia. #dhhacks

2019-08-09: There’s a new section of the GLAM Workbench devoted to the National Museum of Australia collection …

2019-07-28: The second new notebook looks at @TroveAustralia’s newspapers as a whole, visualising both by time …

2019-07-28: Some brand new Jupyter notebooks for those interested in #ozhist & digital exploration of …

2019-07-27: I’ve updated the @invisibleaus data repository with latest transcriptions/markings from White …

2019-07-25: According to my last harvest, @TroveAustralia’s digitised journals comprise 31,216 separate issues. …

2019-07-24: Updates to the Trove newspapers section of GLAM Workbench – adding links to app-ified versions of …

2019-07-24: Download & explore 1,499,259 rows of open data from NSW State Archives Online Indexes. NSW State Archives publishes a number of detailed indexes containing data manually extracted from …

2019-07-12: Visualising CV-detected column widths across 100 volumes (30,000+ pages) of Sydney Stock Exchange …

2019-07-11: New in GLAM Workbench! Notebooks to harvest, index, analyse, and aggregate transcripts of speeches …

2019-07-11: I’ve updated my harvest of the PM Transcripts site — 22,814 XML files with transcripts of speeches, …

2019-07-11: Reorganising things a little at GLAM Workbench. @statelibrarynsw gets its own section. Hansard and …

2019-07-07: What’s that? You want MORE GLAM data? Well, I’ve started a list of sources for Australian GLAM data. …

2019-07-07: I’ve updated my harvest of GLAM datasets from data.gov.au. Now there’s 584 CSV files available …

2019-07-06: I’ve put a copy of my article on using @TroveAustralia for digital research/play, written for the …

2019-07-05: I’ve updated the list of orgs who have supported the digitisation of @TroveAustralia’s …

2019-07-05: Today I finished updating a harvest of all OCRd text available from Trove’s digitised journals. …

2019-07-05: Update time! Yesterday I updated my Trove digitised journals app to include all the exciting new …

2019-07-03: A quick interactive view of newspaper articles in @TroveAustralia by state and year. Click on the …

2019-07-03: Anyone who’s been to one of my Trove workshops will be pleased to know that the WWI effect is still …

2019-07-03: So there are now almost twice as many newspaper articles in @TroveAustralia from NSW as there are …

2019-06-23: Well look at that! – a selection of my @TroveAustralia related Jupyter notebooks turned into simple …

2019-06-17: Kicked off a new GLAM Workbench repository dedicated to @SLSA with a quick notebook hack to get …

2019-06-10: Search @TroveAustralia newspapers without leaving Twitter using the updated and enhanced …

2019-06-07: Recent additions to the Trove Newspapers section of the GLAM Workbench: getting images from …

2019-06-04: Want to upload @TroveAustralia newspaper articles to @Omeka-S to create an exhibition or populate a …

2019-06-02: Slides ready for tomorrow’s workshop at @unicanberra – Trove as a platform for digital research …

2019-05-31: Ever wanted to save a @TroveAustralia newspaper article as an image? This notebook lets you do just …

2019-05-26: More GLAM Workbench updates! More full text of Australian books! I’ve added the notebook & …

2019-05-25: 2019 has been pretty busy so far! I just compiled a list of tools, updates, and examples from the …

2019-05-24: Here’s how you can get the text of Australian books in @TroveAustralia from the Internet …

2019-05-22: I’ve updated the data that sits behind my Trove Places app and added more than 140 newspaper …

2019-05-21: If you’re researching foreign policy using @naagovau you might find this little tool useful – …

2019-05-19: And what can you do with 400 CSV files? Well, you could explore their contents using my GLAM CSV …

2019-05-19: Some overdue updates to the GLAM Workbench. First here’s details, data, and code from a …

2019-05-09: Over the last week I’ve been downloading editorial cartoons published in The Bulletin from …

2019-05-04: After a number of unsuccessful attempts, I seem to be getting The Bulletin title art fairly reliably …

2019-04-29: Here’s the notebook-ified version of the code I used to harvest all the Australian …

2019-04-28: I’ve reharvested Commonwealth Hansard from 1901 to 1980 and updated my repository of XML …

2019-04-27: And now my GLAM Workbench has a ‘Trove Maps’ section to document examples and …

2019-04-26: The other night @OpenGLAM was sharing collections of high-res images from GLAM orgs that are free to …

2019-04-25: If you’d like to make your own big, composite images from lots of @TroveAustralia newspaper …

2019-04-25: Australian pilots, aviators, airmen, and flyers — 4,950 thumbnails from a search in …

2019-04-23: I’ve been busy lately harvesting LOTS of full text data from @TroveAustralia’s digitised …

2019-04-22: I’ve added a section for the @TroveAustralia ‘book’ zone to the GLAM Workbench.

2019-04-22: Ok, so I’ve downloaded the OCRd text from 27,426 issues of 358 digitised journals/series in …

2019-04-22: All 9,738 OCRd text files harvested from books, pamphlets and leaflets in @TroveAustralia’s …

2019-04-21: So @TroveAustralia includes more than 370,000 press releases, speeches, and interview transcripts …

2019-04-20: Among the OCRd texts I’m currently harvesting from Trove’s journals zone are things like the …

2019-04-20: Wow, there are now over 371,000 press releases, interview transcripts and more from the @ParlLibrary …

2019-04-19: Another collection of OCRd text from @TroveAustralia is on its way…

2019-04-17: Playing with @TroveAustralia newspaper results. Here’s illustrated articles with ‘White Australia …

2019-04-15: The final tally – after much tweaking I’ve downloaded OCRd text from 9,738 works in the …

2019-04-14: I’m looking for books in @TroveAustralia, but there’s lots of ephemera (pamphlets, posters etc) in …

2019-04-12: Text of over 3 thousand digitised books and pamphlets downloaded so far from @TroveAustralia…

2019-04-11: After talking to @PrimahadiWijaya today about work at @MonashLing, I started harvesting metadata …

2019-04-11: What I did at #valatechcamp! Here’s a CSV with basic details of 7,719 digitised books …

2019-04-11: TIL that the web pages for digitised works (like books and journal issues) on @TroveAustralia embed …

2019-04-11: Just posting the link to my ‘Introducing APIs’ slides for #VALATechCamp again, so that …

2019-04-07: Hmm, it occurs to me that the method I used to generate newspaper article thumbnails from Trove, …

2019-04-07: So, I’ve finally figured out a way to automatically generate nice-looking thumbnails from …

2019-04-06: So I put the recent report into Australia’s national cultural institutions into the @TDHASSN …

2019-03-31: Train from Canberra to Melbourne booked for #VALATechCamp. I’ll be hanging around both days, …

2019-03-26: Sneak preview of my GLAM CSV Explorer now live on @MyBinderTeam! Select one of 447 GLAM-related CSVs …

2019-03-26: Having ripped out a lot of code and simplified a mess of conditionals, I think this CSV Explorer …

2019-03-25: Now to load that new CSV of GLAM CSVs into my CSV Explorer…

2019-03-25: Quick notebook to harvest GLAM datasets via the new(ish) @datagovau API. Includes 447 CSVs from 19 …

2019-03-23: This is why we can’t have nice things…

2019-03-22: Still plenty to do, but my CSV Explorer is taking shape… (coming soon to @TDHASSN & elsewhere!) …

2019-03-21: Doesn’t take much to show when there’s a problem with dates in metadata… (Yep, post 1900 …

2019-03-20: Fun day talking to the @dhpanu team at ANU about digital history possibilities. Slides/links are all …

2019-03-16: Currently working on a CSV explorer to give researchers an overview of the contents of GLAM …

2019-03-12: A bit more progress. Having found columns with OpenCV, I can use Tesseract to help me find the rows…

2019-03-11: After much OpenCV fiddling & tweaking, sorry… iteration, I’m pretty pleased with this. Columns …

2019-03-08: So right around now I think I’m talking (via video) about my adventures with #HistoricHansard …

2019-03-07: More updates! Latest data and images from The Real Face of White Australia transcription project are …

2019-03-07: I’ve finally updated the @TroveAustralia API Console to use version 2 of the API & https by …

2019-03-07: Lots of exciting new stuff has been added to @TroveAustralia’s digitised journals in the last few …

2019-03-07: Art & Architecture: the journal of the Institute of Architects NSW, 51 issues from 1905 to 1912 …

2019-03-07: Only 12 issues, but check out the fabulous covers on The New Triad from 1927-8. Now on …

2019-03-07: Want some arts? 130 issues of RealTime from 1994 to 2016 now on @TroveAustralia.

2019-03-07: Hey #ozhist, 295 issues of the journal of the Royal Australian Historical Society from 1918 to 1954 …

2019-03-07: Also amongst the latest batch of digitised journals on @TroveAustralia, 39 issues of Camp Ink from …

2019-03-07: There’s more literary journals digitised in @TroveAustralia as well. Including 18 issues of …

2019-03-07: But wait, there’s more — the KCC Kennel Gazette was renamed, wait for it, Dogs. Another 94 …

2019-03-07: People, you need to know that @TroveAustralia has digitised 360 issues of the KCC Kennel Gazette …

2019-03-07: Updating my list of digitised journals on @TroveAustralia this morning and seeing what’s new. …

2019-03-01: I’ll be running some more @TroveAustralia workshops for @UniCanberraReD this year. On 13 May …

2019-02-26: #dhhacks: Save a page image from the State Archives of NSW's Bubonic Plague Register. So NSW State Archives has digitised the Register of Cases of Bubonic Plague 1900-1908. Great work! …

2019-02-24: I’ve updated the notebook for harvesting records from @archivesnz’s Archway database in …

2019-02-24: Uh, ok — so an advanced search for keywords only in Archway gives me a maximum of 1000 results. But …

2019-02-21: Looks like I’ll be heading to the VALA Tech Camp in April to talk APIs. See you there!

2019-02-21: New section added to my GLAM Workbench for the Queensland State Archives (@qsarchives). Includes a …

2019-02-21: So in case you’re wondering, the @qsarchives ‘Naturalisations 1851 to 1904’ index …

2019-02-19: Whoops. Here’s the actual full list of countries of origin from the @nswarchives NSW …

2019-02-19: Here’s the full list of countries of origin from the NSW naturalisations data, 1834-1903.

2019-02-19: NSW naturalisations 1834 to 1903. The sudden rise in Chinese naturalisations followed the …

2019-02-17: Suggestions of new topics and collections for my GLAM workbench are welcome!

2019-02-17: Here’s an example dataset harvested from Library and Archives Canada’s naturalisation …

2019-02-17: I’ve added a section for Library and Archives Canada to my GLAM workbench. The first notebook …

2019-02-15: Current status — extracting data from Library and Archives Canada’s 1915-1946 naturalisation …

2019-02-14: The full text of ‘Who belongs? Reading identity, ownership, and legitimacy’, my talk for …

2019-02-08: My talk for #text2data at the National Library of Sweden looks at occurrence of the words …

2019-02-05: Back to school report — what I did on my holidays…

2019-02-04: Another slide for Sweden — this one comparing words appearing before ‘aliens’ in The …

2019-02-03: Working on my slides for From Text to Data in Stockholm this week…

2019-02-01: I’ve added a ‘save chart’ option to the QueryPic app in my GLAM Workbench. …

2019-01-30: Pleased and proud to see the chapter @baibi & I wrote on the Real Face of White Australia now …

2019-01-27: In a bit over a week I’ll be heading to Stockholm for the ‘From text to data’ …

2019-01-26: Talking about 'immigrants' in Trove's digitised newspapers. I’m giving a talk in a week or so (eep!) that looks at some of the changing contexts in which …

2019-01-26: In case you’re wondering, it took about 13 hours to download the metadata and full text of …

2019-01-25: Ok, so let’s see how I go harvesting 2 million newspaper articles from @TroveAustralia …

2019-01-25: 30,000+ occurrences of the word ‘Chinese’ in the OCRd full text of The Bulletin, …

2019-01-23: One more and I’m done for the night… New GLAM Workbench page for the ‘Trove API …

2019-01-23: I’ve finished putting details of all the current GLAM Workbench repositories into the new …

2019-01-23: Added a ‘data’ section to the GLAM Workbench docs, with info on harvests from government …

2019-01-23: And now a GLAM Workbench page for @Te_Papa…

2019-01-23: Added a page for @ArchivesNZ’s Archway to the GLAM Workbench docs…

2019-01-22: So here’s some fun things to do with @TroveAustralia newspapers… (via GLAM Workbench)

2019-01-22: Ok, more documentation for you — page for the @DigitalNZ API in GLAM Workbench updated!

2019-01-22: Slowly working my way through the documentation for my GLAM Workbench. Still lots to do, but I think …

2019-01-22: If there are APIs or other data sources you’d like me to add to my GLAM Workbench, feel free …

2019-01-21: Updated list of the fifty most common words occurring before the word ‘aliens’ in …
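Counting the words that come immediately before a target word is simple enough to sketch in a few lines. This is only an illustration, not the code behind the list: the `words_before` function and the sample sentence are hypothetical, and the tokenising is deliberately crude (it ignores sentence boundaries, for example).

```python
import re
from collections import Counter


def words_before(text, target, top_n=50):
    """Return the top_n most common words directly preceding `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair each token with the one that follows it, and keep the first
    # of the pair whenever the second is the target word.
    preceding = Counter(
        prev for prev, word in zip(tokens, tokens[1:]) if word == target
    )
    return preceding.most_common(top_n)


# Hypothetical sample text standing in for harvested newspaper articles.
sample = (
    "Enemy aliens were interned. The act covered coloured aliens "
    "and enemy aliens alike."
)

print(words_before(sample, "aliens"))
```

Run over the full text of a bulk harvest, the same counter gives a rough picture of the adjectives and labels most often attached to a word.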

2019-01-21: Just updated my harvest of metadata and full text from The Bulletin in @TroveAustralia. …

2019-01-20: Fifty most common words occurring before the word ‘aliens’ in @TroveAustralia newspapers …

2019-01-19: You want big data? I just harvested 213,340 newspaper articles (including full OCRd text) from …

2019-01-19: So now I’ve updated TroveHarvester and built a new interface I can get back to the task I …

2019-01-19: Want an easy way to download @TroveAustralia newspaper articles in bulk? No installation? Point and …

2019-01-19: And version 0.2.2 of TroveHarvester quickly follows 0.2.1 as I squash a bug when downloading …

2019-01-18: TroveHarvester 0.2.1 — updated to work with version 2 of the @TroveAustralia API. Now on pypi! More …

2019-01-18: Ok, that’s more like it. Full text and metadata of 29,203 newspaper articles harvested using …

2019-01-18: Ah ok, I forgot about the new ‘bulkHarvest’ parameter in the @TroveAustralia API. …
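As a rough illustration of where the parameter fits, here is a sketch that just builds a request URL for version 2 of the API without sending it. The `build_harvest_url` function, the sample query, and the placeholder key are all made up for the example; the endpoint and the other parameter names reflect my understanding of the version 2 documentation, so check them against the current docs before relying on them. As I understand it, `bulkHarvest=true` asks for results in a stable identifier order rather than by relevance, which matters when paging through a large result set.

```python
from urllib.parse import urlencode

# Assumed version 2 search endpoint; verify against the current API docs.
API_URL = "https://api.trove.nla.gov.au/v2/result"


def build_harvest_url(query, api_key, start="*", bulk_harvest=True):
    """Build (but do not send) a newspaper search URL for harvesting."""
    params = {
        "q": query,
        "zone": "newspaper",
        "encoding": "json",
        "n": 100,        # results per request
        "s": start,      # cursor for the next page of results
        "key": api_key,
    }
    if bulk_harvest:
        params["bulkHarvest"] = "true"
    return f"{API_URL}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_harvest_url("naturalisation", "YOUR_API_KEY")
print(url)
```

Nothing here touches the network, so the URL can be inspected or pasted into a browser before committing to a long harvest.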

2019-01-18: Uh, never come across one of these before from the @TroveAustralia API. Needless to say it causes …

2019-01-18: Testing the updated Trove Newspaper Harvester… Run into a problem with the @TroveAustralia …

2019-01-18: Thanks to the @TroveAustralia API upgrade, the new version of the Trove Newspaper Harvester should …

2019-01-18: Since I’m updating the Trove Newspaper Harvester to work with version 2 of the @TroveAustralia …

2019-01-17: I’m enjoying using micro.blog as a way of capturing what I’m working on: …

2019-01-17: Finally biting the bullet and getting to work on updating the TroveHarvester to work with version 2 …

2019-01-17: That’s cool, just realised I can easily share live versions of Altair charts from …

2019-01-17: And also “coloured alien” which, not surprisingly, peaks in 1901 when the Immigration …

2019-01-17: Exploring some of the adjectives attached to ‘alien’ in @TroveAustralia …

2019-01-17: Just to emphasise my point the other day about the impact of stemming on searches for …

2019-01-17: Nothing like browsing the databases of another country’s national/state archives to make you …

2019-01-16: The Australian version of ‘Who’s responsible?’ is up! Just select a government …

2019-01-16: New notebook added to the #GLAMWorkbench RecordSearch repository — get the basic details of agencies …

2019-01-16: Hmm, wondering why the ‘National Council of Women of the Australian Capital Territory’ …

2019-01-15: As well as cross-posting updates to Twitter and Mastodon, I’ve now set up IFTTT to keep an eye …

2019-01-15: Adventures in stemming, or what happens when you search Trove for 'naturalization'. Fun fact: the Porter stemming algorithm treats the words ‘naturalisation’ and …

2019-01-14: I have a brand new updates page powered by micro.blog!