glamworkbench

Some research projects that have used QueryPic

Monday, August 30, 2021

A Twitter thread about some of the research uses of QueryPic…

QueryPic, my tool for visualising searches in @TroveAustralia’s digitised newspapers, has been around in different forms for more than 10 years. The latest version is part of the #GLAMWorkbench: https://t.co/qnY5tVDwgY #researchinfrastructure pic.twitter.com/QyHWJwGV3u
– Tim Sherratt (@wragge) August 29, 2021

I thought I’d highlight some of the research publications that have made use of QueryPic over the years, so, in no particular order.

Continue reading →

Government publications in Trove

Monday, August 30, 2021

Over the last few weeks I’ve been updating my harvests of OCRd text from digitised books and periodicals in Trove. As part of the harvesting process, I’ve created lists of both that are available in digital form – this includes digitised works, as well as those that are born-digital (such as PDFs or epubs). I’ve published the full lists of books and periodicals as searchable databases to make them easy to explore.

Continue reading →

GLAM Workbench – a platform for digital HASS research

Thursday, August 26, 2021

We’re in the midst of planning for the HASS Research Data Commons, which will deliver some much-needed investment in digital research infrastructure for the humanities and social sciences. Amongst the funded programs are tools for text analysis as part of the Linguistics Data Commons, and a platform for more advanced research using Trove. I’m hoping that this will be an opportunity to take stock of existing tools and resources, and build flexible pathways for researchers that enable them to collect, move, analyse, preserve, and share data across different platforms and services.

Continue reading →

A Family History Month experiment – search millions of name records from GLAM organisations

Monday, August 23, 2021

There’s a lot of rich historical data contained within the indexes that Australian GLAM organisations provide to help people navigate their records. These indexes, often created by volunteers, allow access by key fields such as name, date or location. They aid discovery, but also allow new forms of analysis and visualisation. Kate Bagnall and I wrote about some of the possibilities, and the difficulties, in this recently published article. Many of these indexes can be downloaded from government data portals.

Continue reading →

Explore Trove’s digitised books

Monday, August 16, 2021

The Trove books section of the GLAM Workbench has been updated! There’s freshly-harvested data, as well as updated Python packages, integration with Reclaim Cloud, and automated Docker builds. Included is a notebook to harvest details of all books available from Trove in digital form. This includes both digitised books, that have been scanned and OCRd, as well as born digital publications, such as PDFs and epubs. The definition of ‘books’ is pretty loose – I’ve harvested details of anything that has been assigned the format ‘Book’ in Trove, but this includes ephemera, such as posters, pamphlets, and advertising.

Continue reading →

A miscellany of ephemera, oddities, & estrays

Friday, August 13, 2021

I’m just in the midst of updating my harvest of OCRd text from Trove’s digitised books (more about that soon!). But amongst the items catalogued as ‘books’ are a wide assortment of ephemera, posters, advertisements, and other oddities. There’s no consistent way of identifying these items through the search interface, but because I’ve found the number of pages in each ‘book’ as part of the harvesting process, I can limit results to items with just a single digitised page – there’s more than 1,500!
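The single-page filter described above can be sketched roughly as follows. This is a minimal illustration rather than the actual harvesting code: the sample rows and the column names `title` and `pages` are assumptions made up for the example.

```python
import csv
import io

# Hypothetical sample of harvested 'book' records.
# The column names (title, pages) are assumptions for illustration only.
SAMPLE = """title,pages
Election poster,1
A history of the colony,312
Theatre handbill,1
"""


def single_page_items(csv_text):
    """Return the rows whose page count is exactly one."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["pages"]) == 1]


for item in single_page_items(SAMPLE):
    print(item["title"])
```

With a real harvest you would read the rows from the downloaded file instead of an in-memory string, and adjust the column names to match.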

Continue reading →

Everyday heritage and the GLAM Workbench

Monday, August 9, 2021

Some good news on the funding front with the success of the Everyday Heritage project in the latest round of ARC Linkage grants. The project aims to look beyond the formal discourses of ‘national’ heritage to develop a more diverse range of heritage narratives. Working at the intersection of place, digital collections, and material culture, team members will develop a series of ‘heritage biographies’, that document everyday experience, and provide new models for the heritage sector.

Continue reading →

Recent GLAM Workbench presentations

Friday, August 6, 2021

So far this year I’ve given eight workshops or presentations relating to the GLAM Workbench, with probably a few more yet to come. Here’s the latest:

- Introducing the GLAM Workbench, presentation for the Griffith University Centre for Social and Cultural Research, Digital Humanities Seminar Series, 6 August 2021
- Exploring the GLAM Workbench (slides), presentation for the UTS Digital Histories Seminar Series, 8 July 2021
- The GLAM Workbench: A Labs approach?

Continue reading →

Updated! Lots and lots of text freshly harvested from Trove periodicals

Friday, August 6, 2021

For a few years now I’ve been harvesting downloadable text from digitised periodicals in Trove and making it easily available for exploration and research. I’ve just completed the latest harvest – here’s the summary:

- 1,163 digitised periodicals had text available for download
- Text was downloaded from 51,928 individual issues
- Adding up to a total of around 12GB of text

If you want to dive straight in, here’s a list of all the harvested periodicals, with links to download a summary of available issues, as well as all the harvested text (there’s one file per issue).

Continue reading →

New dataset – Politicians talking about COVID

Monday, August 2, 2021

The Trove Journals section of the GLAM Workbench includes a notebook that helps you download press releases, speeches, and interview transcripts by Australian federal politicians. These documents are compiled and published by the Parliamentary Library, and the details are regularly harvested into Trove. Using this notebook, I’ve created a collection of documents that include the words ‘COVID’ or ‘Coronavirus’. It includes all the metadata from Trove, as well as the full text of each document downloaded from the Parliamentary Library.

Continue reading →

8 million Trove tags to explore!

Wednesday, July 14, 2021

I’ve always been interested in the way people add value to resources in Trove. OCR correction tends to get all the attention, but Trove users have also been busy organising resources using tags, lists, and comments. I used to refer to tags quite often in presentations, pointing to the different ways they were used. For example, ‘TBD’ is a workflow marker, used by text correctors to label articles that are ‘To Be Done’.

Continue reading →

Integrating GLAM Workbench news and discussion

Thursday, July 1, 2021

I’ve spent a lot of time this year working on ways of improving the GLAM Workbench’s documentation and its integration with other services. Last year I created OzGLAM Help to provide a space where users of GLAM collections could ask questions and share discoveries – including a dedicated GLAM Workbench channel. Earlier this year, I tweaked my Micro.blog powered updates to include a dedicated GLAM Workbench news feed. Now I’ve brought the two together!

Continue reading →

GLAM Workbench now on YouTube!

Thursday, July 1, 2021

I’ve started creating short videos to introduce or explain various components of the GLAM Workbench. The first video shows how you can visualise searches in Trove’s digitised newspapers using the latest version of QueryPic. It’s a useful introduction to the way access to collection data enables us to ask different types of questions of historical sources. As with all GLAM Workbench resources, the video is openly licensed – so feel free to drop it into your own course materials or workshops.

Continue reading →

GLAM Workbench office hours

Monday, June 28, 2021

To help you make use of the GLAM Workbench, I’ve set up an ‘office hours’ time slot every Friday when people can book in for 30-minute chats via Zoom. Want to talk about how you might use the GLAM Workbench in your latest research project? Are you having trouble getting started with GLAM data? Or perhaps you have some ideas for future notebooks you’d like to share? Just click on the ‘Book a chat’ link in the GLAM Workbench, or head straight to the scheduling page to set up a time!

Continue reading →

QueryPic: The Next Generation

Monday, June 21, 2021

QueryPic is a tool to visualise searches in Trove’s digitised newspapers. I created the first version way back in 2011, and since then it’s taken a number of different forms. The latest version introduces some new features:

- Automatic query creation – construct your search in the Trove web interface, then just copy and paste the url into QueryPic. This means you can take advantage of Trove’s advanced search and facets to build complex queries.

Continue reading →

Everyone gets a Lab!

Monday, June 21, 2021

I recently took part in a panel at the IIPC Web Archiving Conference discussing ‘Research use of web archives: a Labs approach’. My fellow panellists described some amazing stuff going on in European cultural heritage organisations to support researchers who want to make use of web archives. My ‘lab’ doesn’t have a physical presence, or an institutional home, but it does provide a starting point for researchers, and with the latest Reclaim Cloud and Docker integrations, everyone can have their own web archives lab!

Continue reading →

Minor change to Reclaim Cloud config

Monday, June 14, 2021

When the 1-click installer for Reclaim Cloud works its magic and turns GLAM Workbench repositories into your own, personal digital labs, it creates a new work directory mounted inside of your main Jupyter directory. This new directory is independent of the Docker image used to run Jupyter, so it’s a handy place to copy things if you ever want to update the Docker image. However, I just realised that there was a permissions problem with the work directory which meant you couldn’t write files to it from within Jupyter.

Continue reading →

Trove Query Parser

Monday, June 14, 2021

Here’s a new little Python package that you might find useful. It simply takes a search url from Trove’s Newspapers & Gazettes category and converts it into a set of parameters that you can use to request data from the Trove API. While some parameters are used both in the web interface and the API, there are a lot of variations – this package means you don’t have to keep track of all the differences!
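To give a feel for the kind of conversion involved, here is a minimal sketch using only Python's standard library. It is not the package's actual code or API: the function name `web_url_to_api_params`, the `WEB_TO_API` mapping, the example url, and the `zone` value are all assumptions made for illustration, and the real package handles many more variations.

```python
from urllib.parse import urlparse, parse_qsl

# Hypothetical mapping from web-interface parameter names to API names.
# Illustrative only: a single renaming stands in for the many real differences.
WEB_TO_API = {"keyword": "q"}


def web_url_to_api_params(url):
    """Convert the query string of a search url into API-style parameters."""
    params = {}
    for key, value in parse_qsl(urlparse(url).query):
        api_key = WEB_TO_API.get(key, key)
        # Parameters such as facets can repeat, so collect values in lists
        params.setdefault(api_key, []).append(value)
    # Assumption: a newspaper search corresponds to a 'newspaper' zone
    params["zone"] = ["newspaper"]
    return params


# Example url in the general style of a web-interface search (an assumption)
example = "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge&l-state=Victoria"
print(web_url_to_api_params(example))
```

Keeping every value in a list means repeated facets survive the conversion, and the resulting dictionary can be passed directly to an HTTP client that accepts lists of parameter values.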

Continue reading →

Some GLAM Workbench stats

Sunday, June 13, 2021

I deliberately don’t keep any stats about GLAM Workbench visits, because I think they’re pretty meaningless. On the other hand, I’m always interested to see how often GLAM Workbench repositories are launched on Binder. Rather than just random clicks, these numbers represent the number of times users started new computing sessions using the GLAM Workbench. I just compiled these stats for the past year, and I was very pleased to see that the Web Archives section has been launched over 1,000 times in the past twelve months!

Continue reading →

More Reclaim Cloud integrations!

Sunday, June 13, 2021

Five of the GLAM Workbench repositories now have automatically built Docker images and 1-click integration with Reclaim Cloud – ANU Archives, Trove Newspapers, Trove Newspaper Harvester, NAA RecordSearch, & Web Archives. This means you can launch your very own version of these GLAM Workbench repositories in the cloud, where all your downloads and experiments will be saved! Find out more on the Using Reclaim Cloud page.

Continue reading →

Get your GLAM datasets here!

Sunday, June 13, 2021

I’ve updated my harvest of Australian GLAM datasets from state/national government open data portals. There are now 387 datasets, containing 1,049 files (including 684 CSVs). There’s a list if you want to browse, and a CSV file if you want to download all the metadata. For more information see the data portals section of the GLAM Workbench. If you’re interested in finding out what’s inside all those 684 CSV files, take the GLAM CSV Explorer for a spin!

Continue reading →

NAA RecordSearch section of the GLAM Workbench updated!

Monday, May 24, 2021

If you work with the collections of the National Archives of Australia, you might find the RecordSearch section of the GLAM Workbench helpful. I’ve just updated the repository to add new options for running the notebooks, including 1-click installation on Reclaim Cloud. There are also a few new notebooks.

New notebooks and datasets:

- Harvest details of all series in RecordSearch – gets details of all series registered in RecordSearch, and also generates a summary dataset with the total number of items digitised, described, and in each access category
- Exploring harvested series data – generates some basic statistics from the harvest of series data
- Summary data about all series in RecordSearch (15MB CSV) – contains basic descriptive information about all the series currently registered on RecordSearch (May 2021) as well as the total number of items described, digitised, and in each access category

Updated:

I’ve started (but not completed) updating all the notebooks in this repository to use my new RecordSearch Data Scraper.

Continue reading →

Web archives section of GLAM Workbench updated!

Monday, May 17, 2021

My program of rolling out new features and integrations across the GLAM Workbench continues. The latest section to be updated is the Web Archives section! There are no new notebooks with this update, but some important changes under the hood. If you haven’t used it before, the Web Archives section contains 16 notebooks providing documentation, tools, apps, and examples to help you make use of web archives in your research. The notebooks are grouped by the following topics: Types of data, Harvesting data and creating datasets, and Exploring change over time.

Continue reading →

Using web archives to find out when newspapers were added to Trove

Wednesday, May 12, 2021

There’s no doubt that Trove’s digitised newspapers have had a significant impact on the practice of history in Australia. But analysing that impact is difficult when Trove itself is always changing – more newspapers and articles are being added all the time. In an attempt to chart the development of Trove, I’ve created a dataset that shows (approximately) when particular newspaper titles were first added. This gives a rough snapshot of what Trove contained at any point in the last 12 years.
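As a rough sketch of how a dataset like this can answer "what was in Trove on a given date?", the example below filters titles by their first-seen date. The titles and dates are placeholders invented for the illustration, not values from the actual dataset, and the function name is hypothetical.

```python
from datetime import date

# Hypothetical records: these titles and dates are placeholders, not real data
FIRST_SEEN = {
    "Example Gazette": date(2009, 11, 30),
    "Sample Advertiser": date(2013, 6, 15),
    "Placeholder Times": date(2019, 2, 1),
}


def titles_available_on(first_seen, when):
    """Return the titles whose first-seen date is on or before `when`."""
    return sorted(title for title, seen in first_seen.items() if seen <= when)


print(titles_available_on(FIRST_SEEN, date(2015, 1, 1)))
```

Because the first-seen dates are only approximate, any snapshot built this way is approximate too.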

Continue reading →

GLAM Jupyter Resources

Wednesday, May 12, 2021

To make it easier for people to suggest additions, I’ve created a GitHub repository for my list of GLAM Jupyter examples and resources. Contributions are welcome! This list is automatically pulled into the GLAM Workbench’s help documentation. #dhhacks

Continue reading →

Running notebooks – a sign of things to come in the GLAM Workbench

Wednesday, May 12, 2021

I recently made some changes in the GLAM Workbench’s Help documentation, adding a new Running notebooks section. This section provides detailed information about running and managing GLAM Workbench repositories using Reclaim Cloud and Docker. I’m still rolling out this functionality across all the repositories, but it’s going to take a while. When I’m finished you’ll be able to create your own persistent environment on Reclaim Cloud from any repository with just the click of a button.

Continue reading →

Sponsor my work on GitHub!

Wednesday, May 12, 2021

As I foreshadowed some weeks ago, I’ve shut down my Patreon page. Thanks to everyone who has supported me there over the last few years! I’ve now shifted across to GitHub Sponsors, which is focused on supporting open source projects. This seems like a much better fit for the things that I do, which are all free and open by default. So if you think things like the GLAM Workbench, Historic Hansard, OzGLAM Help, and The Real Face of White Australia are worth supporting, you can sign up using my GitHub Sponsors page.

Continue reading →

Updates to the Trove Newspapers section of GLAM Workbench

Wednesday, May 12, 2021

I’ve updated, refreshed, and reorganised the Trove newspapers section of the GLAM Workbench. There are currently 22 Jupyter notebooks organised under the following headings:

- Trove newspapers in context – Notebooks in this section look at the Trove newspaper corpus as a whole, to try and understand what’s there, and what’s not.
- Visualising searches – Notebooks in this section demonstrate some ways of visualising searches in Trove newspapers – seeing everything rather than just a list of search results.

Continue reading →

Introducing the new, improved RecordSearch Data Scraper!

Tuesday, April 27, 2021

It was way back in 2009 that I created my first scraper for getting machine-readable data out of the National Archives of Australia’s online database, RecordSearch. Since then I’ve used versions of this scraper in a number of different projects such as The Real Face of White Australia, Closed Access, and Redacted (including the recent update). The scraper is also embedded in many of the notebooks that I’ve created for the RecordSearch section of the GLAM Workbench.

Continue reading →

Recently digitised files in the National Archives of Australia

Monday, March 29, 2021

I’m interested in understanding what our cultural institutions digitise, and when, but accessible data is scarce. The National Archives of Australia lists ‘newly scanned’ records in RecordSearch, so I thought I’d see if I could convert that list into a machine-readable form for analysis. I’ve had a lot of experience trying to get data out of RecordSearch, but even so it took me a while to figure out how the ‘newly scanned’ page worked.

Continue reading →