After my Trove API keys were cancelled without warning on 21 February, I reluctantly agreed to a meeting with the National Library of Australia. They had provided so little information in their emails that it seemed to be the only way to find out what was really going on. I came out of the meeting shocked by the NLA’s change in attitude towards API use.
All Trove API users need to be aware that the NLA now insists that accessing the ‘content’ of resources, rather than just the descriptive metadata, is a breach of the API terms of use. This includes the full text of digitised newspaper and journal articles that are included in API responses. Yes, that’s right, using the Trove API in the way that it has been designed and documented is a breach of its own terms of use. You can only download the full text of items using the API if you seek and obtain explicit permission from the NLA beforehand. Note also that the NLA is reviewing people’s use of the API and, as demonstrated by my case, they can and will suspend your API keys without warning.
I hate meetings and avoid conflict, but I didn’t see any alternative to meeting with the NLA to find the real reasons behind the cancellation of my API keys. As a classic introvert, I spent the days leading up to the meeting anxiously imagining every possible way the conversation might go. Or so I thought. The outcome was not one I predicted.
The meeting was attended by three Directors responsible for the delivery of ‘Trove business’: the Director of Trove Community Services, the Director of Trove Data and Platforms, and the Director of Strategy and Transformation. Feeling somewhat outnumbered, I took along a colleague. I’m glad I did, as afterwards we had to confirm with each other that what we thought had just happened had actually happened. We were both stunned.
The meeting started with an outline of the NLA’s change in API policy, as described above. It was like stepping through the looking glass. I had not imagined a world where the NLA would set itself up as gatekeeper to every use of the digitised newspaper corpus through the API.
At one point we were told that this change coincided with the release of version 2 of the API. But this doesn’t seem right. Checking the web archive, the terms of use page seems to have been updated when the whole Trove web interface changed in 2020. (You can see how it changed between February and September 2020.) In fact, version 2.1 of the API, released in September 2019, was described by the NLA as opening up ‘access to richer data for API users, particularly the rapidly growing corpus of 1.6 million digitised articles from Australian journals, magazines and newsletters’. There was no indication then that access to this data required special permission.
But when the change happened is less important than how it was, or wasn’t, communicated, and what it means for researchers. At multiple points throughout the meeting I stressed that if this is how they are interpreting and enforcing the API terms of use, they need to be explaining this change to API users. I’m certain that I’m not alone in being totally blindsided.
The reasons for the change are not clear. There was some talk of ‘data governance’, and the fact that the online world had changed – though I fail to see how researchers downloading newspaper articles from the 1890s can be seen as a possible cyber threat. If there are particular problems or concerns, I suggested that it would be useful to have a broad-ranging conversation with the research sector, to see if it might be possible to carve out space for research uses within the terms of use. In response I was told there already is such a carve-out: individual researchers can ask the NLA for permission.
I was so shaken by this turn away from open access that the question of my own API keys hardly seemed to matter. The immediate reason for the cancellation of my keys is still not clear. The NLA admitted that I hadn’t used the API to extract text from NED journals, as their original email claimed. Then it was suggested that I had used the API to find NED journals to extract text from. This is also not true. Discussion then broadened to the whole of the GLAM Workbench, and, yes, I readily admit that under the new interpretation much of my work breaches the API terms of use. The Trove Newspaper Harvester makes it easy for researchers to create datasets from the full text of newspaper articles in a set of search results. There are notebooks within the GLAM Workbench to help users access the full text of journals and books. In most cases researchers want and need content, not just metadata, and I’ve developed a range of tools to help them access it. But as I explained in my last post, none of this is new. I’ve been helping researchers in this way for 15 years. It’s the NLA that has changed, not me.
It was suggested that if I wanted to regain my API keys, an additional series of meetings would be necessary to help me bring my work within the bounds of what is now permissible.
So it seems I have a choice. Either I try to get out of API jail by submitting to the NLA’s re-education program, or I work with others in the research sector and beyond to try to change the NLA’s policy. I’m inclined to do the latter.
Ten years ago we celebrated the Trove API because of what it made possible. Everyone was free to explore and create, to analyse changes across 100 years of digitised newspapers, to shift scales and find new meanings. One of my most cited presentations from 2013 talks about how the API made Trove a platform we could all build upon. Openness was the key:
The more we become aware of the power of networked information, the more we become concerned with making and preserving its ‘openness’. To me open data is a process not a product – each visualisation, or interpretation can challenge our assumptions and help us to see things differently. Each use is an opening into new contexts.
Even if it does not directly impact their work, I think all researchers should be alarmed by the NLA’s turn away from openness. Experiments now will have to be approved, judged against criteria which are themselves not public. Is that what we want or expect from a major, publicly funded, cultural institution?
I also fear for the future of the API itself. This change makes it easier for the NLA to impose further limits over time. Perhaps researcher access to the API will be tied to future investment from the research sector. Perhaps it will be claimed the risks are too great and the API will be shut down completely. I’m no longer confident in the NLA’s commitment to providing researchers with long-term access to Trove data.
I want to thank everyone who has offered their support over the last couple of weeks. It’s been deeply encouraging to hear how my work over the past 15 years is valued, and how the GLAM Workbench has helped researchers and inspired new projects. Thanks too for making your views known to the NLA.
I’m now very doubtful that the Trove sections of the GLAM Workbench can be made acceptable to the NLA without major changes that would severely limit their usefulness. However, I’ll continue to maintain them as best I can without an API key, and I’ll continue to help researchers with their Trove questions.
To get myself back into a more positive frame of mind, I think I’ll also do some work with collections from organisations who value openness and are interested in new uses of their data. Suggestions are welcome!
But as I suggested above, the most important task ahead is to start talking about the implications of these changes at the NLA, particularly for the research sector.
Stay tuned…