Repositories in the GLAM Workbench have been launched on Binder 3,529 times since the start of this year (according to data from the Binder Events log). That’s repository launches, not notebooks. Having launched a repository, users might use multiple notebooks. And of course these stats don’t include people using the notebooks in contexts other than Binder – on their own machines, servers, or services like AARNet’s SWAN. Or just viewing the notebooks in GitHub and copying code into their own projects.
Earlier this year I gave a seminar for the International Internet Preservation Consortium (IIPC) introducing the web archives section of the GLAM Workbench. The seminar is now available online: youtu.be/rVidh_wex…
Here are the slides if you want to follow along. #dhhacks
The Trove Newspaper & Gazette Harvester has been updated to version 0.4.0. The major change is that if the OCRd text for an article isn’t available through the API, it will be automatically downloaded via the web interface. What does this mean in practice? Well, previously you couldn’t harvest OCRd text from the Australian Women’s Weekly because it’s not included in API results, but now you can!
You don’t need to do anything differently.
If you’ve done any searching in Trove’s digitised newspapers, you’ve probably noticed that there aren’t many results after 1954. This is basically because of copyright restrictions (though given the complexities of Australia’s copyright system, you can’t be sure that everything published before 1955 is out of copyright). We can visualise the impact of this by looking at the number of newspaper articles in Trove by year.
You can see why I started referring to it as the copyright cliff of death.
Updated! Find Trove newspapers by place of publication by using this simple interface – just click on the map to find the 10 closest newspapers. Now including newspapers added to Trove since June.
The underlying data file is available as a spreadsheet. Feel free to add a comment if you notice any problems. I’m geolocating place names found in newspaper titles, so it’s not always exact.
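For anyone curious how a 'click the map, get the 10 closest newspapers' lookup might work, here's a minimal sketch. It's not the code behind the interface, just one way of doing it. It assumes each row of the data file has been loaded as a dictionary with hypothetical `title`, `lat` and `lon` keys, and the sample rows are made up for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * r * math.asin(math.sqrt(a))


def closest_newspapers(places, lat, lon, n=10):
    """Return the n rows whose coordinates are nearest to the clicked point."""
    return sorted(
        places, key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"])
    )[:n]


# Made-up sample rows; the column names are assumptions, not the real file's.
sample = [
    {"title": "Example Paper A (Sydney)", "lat": -33.87, "lon": 151.21},
    {"title": "Example Paper B (Melbourne)", "lat": -37.81, "lon": 144.96},
    {"title": "Example Paper C (Hobart)", "lat": -42.88, "lon": 147.33},
]

# Pretend the user clicked on Canberra and asked for the 2 nearest titles.
nearest = closest_newspapers(sample, -35.28, 149.13, n=2)
for paper in nearest:
    print(paper["title"])
```

Sorting the whole list is fine for a few thousand titles. Because the coordinates come from geolocated place names, the distances are only as good as the geocoding.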
These are large format bound volumes of the official lists that were posted up for the public to see - 3 times a day - forenoon, noon and afternoon - at the close of the trading session in the call room at the Sydney Stock Exchange. The closing prices of stocks and shares were entered in by hand on pre-printed sheets.
The volumes have been digitised, resulting in a collection of 70,000+ high resolution images. You can browse the details of each volume using this notebook.
I’ve been exploring ways of getting useful, machine-readable data out of the images. There’s more information about the processes involved in this repository. I’ve also been working on improving the metadata and have managed to assign a date and session (Morning, Noon, or Afternoon) to each page. With these, we can start to explore the content!
One of the notebooks creates a calendar-like view of the whole collection, showing the number of pages surviving from each trading day. This makes it easy to find the gaps and changes in process. #dhhacks
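The counting behind a calendar view like this can be sketched in a few lines. This isn't the notebook's code. It assumes the page metadata is a list of records with hypothetical `date` and `session` keys, and the dates below are invented for illustration.

```python
from collections import Counter
from datetime import date, timedelta

# Invented page records; the real metadata will have different fields.
pages = [
    {"date": date(1930, 3, 3), "session": "Morning"},
    {"date": date(1930, 3, 3), "session": "Noon"},
    {"date": date(1930, 3, 3), "session": "Afternoon"},
    {"date": date(1930, 3, 4), "session": "Morning"},
    {"date": date(1930, 3, 4), "session": "Afternoon"},
    {"date": date(1930, 3, 6), "session": "Morning"},
    {"date": date(1930, 3, 6), "session": "Noon"},
    {"date": date(1930, 3, 6), "session": "Afternoon"},
]

# Number of surviving pages for each day.
pages_per_day = Counter(page["date"] for page in pages)


def missing_days(counts):
    """List every calendar day between the first and last date with no pages.

    This doesn't know about weekends or holidays, so non-trading days
    will show up as gaps too.
    """
    day, last = min(counts), max(counts)
    missing = []
    while day <= last:
        if counts[day] == 0:  # a Counter returns 0 for unseen keys
            missing.append(day)
        day += timedelta(days=1)
    return missing


print(pages_per_day[date(1930, 3, 3)])
print(missing_days(pages_per_day))
```

The per-day counts are what you'd feed into a calendar heatmap. The list of missing days gives a quick way to spot gaps worth investigating.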
Any regular user of RecordSearch, the National Archives of Australia’s online database, will understand its frustrations. But here’s a handy little hack to fix a couple of annoying problems and add some useful functionality!
The RecordSearch Show Pages userscript updates links to digitised files in search results and item details pages, inserting the number of pages in a file. This means that you can easily scan a list of search results to see where the big fat files are, without having to click through to each one individually.
But wait, there’s more! The script also rewrites the link to the digitised file viewer so that it opens in the current tab, as you would expect, and not in an annoying pop-up window!
And as an extra bonus if you install now, the script also inserts a link on the barcode of an item in the digitised file viewer that takes you back to the item details page. Links to the digitised file viewer are shareable (unlike most RecordSearch links), but they don’t give you a way to find more information about the item. That problem is also fixed by this handy little script.
I’ve added more years to my repository of Commonwealth Hansard! The repository now includes XML-formatted text files for both houses from 1901 to 1980, and 1998 to 2005. I’ve done some more checking and confirmed that the XML files for 1981 to 1997 aren’t currently available through ParlInfo, but the Parliamentary Library are looking into it. I’ve also created a CSV-formatted list of sitting days from 1901 to 2005 (based on ParlInfo search results). Details of the harvesting process are available in the GLAM Workbench. #dhhacks
It was Open Access Week last week, so I tried a little experiment. How many research articles published in Australian Historical Studies between 2008 and 2018 are available via Open Access? Just 9.5% (23 out of 242). This is despite the fact that all articles published in 2018 or earlier are outside of the journal’s embargo period and Green OA versions could be shared through repositories.
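To check the arithmetic, here's the percentage worked through using the two counts given above. The variable names are just for illustration.

```python
open_access = 23  # articles available via Open Access
total = 242       # research articles published 2008 to 2018

share = open_access / total * 100
print(f"{share:.1f}% ({open_access} out of {total})")
```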
Calling users of Australian galleries, libraries, archives, & museums – OzGLAM Help is now live! Ask a question or simply share your latest discoveries. There are handy tips, news about recent developments, & links to useful tools. Please use & share! #dhhacks
The Zotero translator for RecordSearch (the National Archives of Australia’s online database) has been updated. There are many fixes and enhancements; see the full details. #dhhacks
If you try to share or bookmark the url of an item in RecordSearch (the National Archives of Australia’s online database), you’ll often get a ‘Session time out’ error when you access it. That’s because the urls only work within the current active RecordSearch session. So how can you create a shareable link that works across sessions? I’ve created a simple app that helps you create shareable links: recordsearch-links.glitch.me #dhhacks
The Zotero translator for Trove was failing on newspaper articles with tags. I’ve submitted a fix for approval: github.com/zotero/tr…
I’m not sure yet whether the capture of works and search results can be fixed following the Trove redesign. React is not very scraper-friendly…
Another #GLAMWorkbench update! Snip words out of @TroveAustralia newspaper pages and create big composite images. OCR art! glam-workbench.github.io/trove-new… #dhhacks
Just in time for #GovHack, I’ve given the Trove API Console a major overhaul. It’s been updated for the latest API versions and has MANY MANY more examples. Explore all the data you can get from @TroveAustralia! troveconsole.herokuapp.com #dhhacks
Ok, so do you want to make your own ‘scissors & paste’ messages using words from @TroveAustralia newspaper articles? Go to the notebook in #GLAMWorkbench & click on ‘Run live on Binder in Appmode’. #dhhacks
Another #GLAMWorkbench update! The Trove Harvester will now download both newspaper and gazette articles in bulk. You can optionally include full text, and save copies of the articles as images and PDFs. #dhhacks glam-workbench.github.io/trove-har…
Interested in using web archives in your research? Join us on 5/6 August for a free @netpreserve webinar introducing the tools and examples available in the new #webarchives section of the #GLAMWorkbench. There are two timeslots to cover multiple timezones: www.eventbrite.com/e/iipc-rs… and www.eventbrite.com/e/iipc-rs…
Introducing a brand new section of the #GLAMWorkbench, exploring the @MuseumsVictoria collection API. Harvest species records, display random images, and download ALL THE ANTECHINUSES! glam-workbench.github.io/museumsvi… #dhhacks