Tim Sherratt

Sharing recent updates and work-in-progress

Jan 2022

Testing, testing...

I regularly update the Python packages used in the different sections of the GLAM Workbench, though probably not as often as I should. Part of the problem is that once I've updated the packages, I have to run all the notebooks to make sure I haven't inadvertently broken something -- and this takes time. And in those cases where a notebook needs an API key to run, I have to copy and paste the key in at the appropriate spots, then remember to delete it afterwards. They're little niggles, but they add up, particularly as the GLAM Workbench itself expands.

I've been looking around at automated testing options for Jupyter notebooks for a while. There are `nbmake`, `testbook`, and `nbval`, as well as custom solutions involving things like `papermill` and `nbconvert`. After much wavering, I finally decided to give `nbval` a go. The thing I like about `nbval` is that I can start simple, then increase the complexity of my testing as required. The `--nbval-lax` option just checks that all the cells in a notebook run without generating exceptions. You can also tag individual cells that you want to exclude from testing. This gives me a testing baseline: the notebook runs without errors. It might not do exactly what I think it's doing, but at least it's not exploding in flames. Working from this baseline, I can start tagging individual cells where I want the output of the cell to be checked. This lets me test whether a cell is actually doing what it's supposed to.
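
For example, `nbval` recognises special comment markers at the top of a cell (there are matching cell tags as well). A cell starting with `# NBVAL_SKIP` is excluded from testing entirely, while in lax mode a cell's output is only compared if you explicitly ask for it. A minimal sketch of a checked cell:

```python
# NBVAL_CHECK_OUTPUT
# Even with --nbval-lax, nbval compares this cell's output against the
# output saved in the notebook, because of the marker above.
total_articles = 2 + 2
print(f"Total articles: {total_articles}")
```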

This approach means that I can start testing without making major changes to existing notebooks. The main thing I had to think about was how to handle API keys and other variables that are manually set by users. I decided the easiest approach was to store them in a `.env` file and use `dotenv` to load them within the notebook. This also makes it easy for users to save their own credentials and use them across multiple notebooks -- no more cutting and pasting of keys! Some notebooks are designed to run as web apps using Voila, so they expect human interaction. In these cases, I added extra cells that only run in the testing environment -- they populate the necessary fields and simulate the button clicks that set things going.
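
Here's a rough sketch of both patterns. The names used (`TROVE_API_KEY`, `GW_TESTING`, and the widget variables) are illustrative rather than the exact code in the repositories:

```python
import os

from dotenv import load_dotenv

# Read key=value pairs from a .env file into environment variables,
# e.g. a line like: TROVE_API_KEY=yourkeyhere
load_dotenv()

# Use the saved key if there is one; users without a .env file can
# still paste their key in manually.
API_KEY = os.getenv("TROVE_API_KEY", "")
```

And in a Voila-style notebook, the testing-only cell might look something like this:

```python
# A cell that stands in for the user during testing -- it fills the
# widgets and fires the search button's callbacks.
if os.getenv("GW_TESTING"):
    query_box.value = "wragge"  # an ipywidgets Text widget defined earlier
    search_button.click()       # Button.click() triggers the on_click handlers
```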

While I was in a QA frame of mind, I also started playing with `nbqa` -- a framework for running all sorts of code formatting, linting, and checking tools on notebooks. I decided I'd try to standardise the formatting of my notebook code by running `isort`, `black`, and `flake8`. As well as making the code cleaner and more readable, these tools pick up things like unused imports and variables. To further automate this process, I configured the `nbqa` checks to run whenever I try to commit notebook changes using `git`. This was made easy by the `pre-commit` package.
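
The `pre-commit` setup boils down to a short `.pre-commit-config.yaml` -- something like the sketch below, using the hooks that `nbqa` publishes (the `rev` pin here is illustrative; use whichever release is current):

```yaml
repos:
  - repo: https://github.com/nbQA-dev/nbQA
    rev: 1.2.3  # illustrative pin -- use a current nbqa release
    hooks:
      - id: nbqa-isort   # order imports inside notebook cells
      - id: nbqa-black   # format notebook code
      - id: nbqa-flake8  # lint for unused imports, undefined names, etc.
```

After running `pre-commit install`, these checks fire automatically on any staged notebooks each time you commit.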

This is all set up and running in the Trove newspapers repository -- you can see the changes here. Now if I update the Python packages or make any other changes to the repository, I can just run `pytest --nbval-lax` to test every notebook at once. And if I change an individual notebook, `nbqa` will automatically run its code quality checks before the changes are committed to the repository. I'm planning to roll these changes out across the whole of the GLAM Workbench in the coming months.

Developments like these aren't very exciting for users, but they're important for the management and sustainability of the GLAM Workbench, and they help create a solid foundation for future development and collaboration. Last year I created a GLAM Workbench repository template to help people or organisations thinking about contributing new sections. I can now add these testing and QA steps to the template to further share and standardise the work of developing the GLAM Workbench.