08 Nov 2021

More thoughts on the Trove researcher platform for advanced research

Previously on ‘What could we do with $2.3 million?’, the National Library of Australia produced a draft plan for an ‘Advanced Researcher Platform’ that was thoroughly inadequate. Rather than submit this plan to the ARDC for consideration as part of the HASS RDC process, the NLA wisely decided to make some fundamental changes. The redrafted draft is now available for re-feedback. This is where we pick up the story…

So what has changed?

Generally speaking there seem to be two major changes.

There’s a much greater focus on consultation and collaboration.
There’s less detail about what will actually be developed.

These two changes can work together positively. The details of what’s being developed can be worked out through consultation, ensuring that what’s developed meets real research needs. But this assumes first that the consultation process is effective, and second that the overall scope of the project is appropriate. I’m not convinced on either of these two fronts.

TL;DR – it’s better than it was, but the hand waving around collaboration and integration isn’t convincing, the scope needs rethinking, and I still can’t see how what’s proposed provides good value for money.

Consultation

Given that there was almost no space for consultation in the previous draft, this version could only do better. There’s certainly lots of talk about consulting with the HASS community, and a new governance structure that includes researcher representatives, but what will the consultation process actually deliver?

The NLA is now partnering with the ANU (why the ANU? because it’s closest?), and the ANU will apparently be driving the consultation process. The whole process is complex and weirdly hierarchical. Any HASS researcher can participate in the initial rounds, but their numbers dwindle as you move up the hierarchy, until you reach the Project Board where only researchers with specific institutional affiliations are admitted. I’m imagining a Hunger Games scenario…

The process aims to gather feedback on ‘what developments would offer the best assistance to the majority and be feasibly achieved within the project timeframe’. That sounds ok, but earlier in the document the outcome of the consultation process is described in the following way:

The outcome will be expressed as one (or potentially two) goals articulated in high level requirements document(s) and align to the objectives as described in this project plan. Indicative goals might include: a graph visualising occurrence of keywords over time; an interface visualising place of publication as geospatial data on a map; or, or a concordance for exploring textual data such as with key word in context.

So all of this complex, hierarchical consultative structure is just to decide whether we have a line graph or a map (or both if we’re really lucky)? If you look at the project deliverables, it seems that the researcher feedback funnels into Deliverable 6 in Work Package 3 – ‘Analysing and visualising research data’. But what about the rest of the project? Will researcher feedback have any role in determining how datasets are created, for example?

I suppose even this very limited consultation is better than what was previously proposed (ie. nothing). But what then happens to the feedback from researchers? An Advisory Panel will be selected (by whom?) to collate the ideas and produce the high-level requirements. Detailed requirements will be generated from the high-level requirements (don’t you just love project management speak?), and then subjected to the argy bargy of the MoSCoW process where priorities will be set. It’s likely that these priorities will be whittled down further as development proceeds. These are crucial decision-making stages where important ideas can be relegated to the ‘nice to have’ category and never heard of again. It’s not clear from the plan who is involved in this, and where the final decision-making power lies.

Of course, some of these details can be worked out later. But given that the big sell of this version of the plan is the expanded consultative process, I think it’s important to know where the power really lies. What role will researchers actually play in determining the outcomes of the project? This is not at all clear.

Scope

But what is the project? In general terms it hasn’t really changed. There will be some sort of portal where researchers can create and share datasets and visualisations. Crucially, it’s assumed that this portal will be part of Trove itself. As noted the last time round, the original project description provided by the government made no such assumption. It was focused on ‘the delivery of researcher portals accessible through Trove’. The NLA has interpreted ‘through’ to mean ‘as part of’, and given the limits on the consultative process described above it seems this won’t change.

Or will it? I’m still puzzling over a few sections in the plan that talk about looking beyond the NLA to see whether there are existing options to meet user requirements. Deliverable 2 in Work Package 2 will:

Undertake an environmental scan for current research usage and tools such as (Glam Workbench) and a market scan to determine if these gaps can be filled by existing services that the HASS community and/or Trove support.

What’s with the weird brackets around ‘Glam Workbench’? Makes me think it was a last minute addition. I suppose I should be grateful that the NLA wants to spend some money to confirm that the GLAM Workbench actually exists. But then what? The next deliverable will:

determine which requirements will be delivered within the Trove Platform and which will be outsourced to other services.

So if the Trove Newspaper Harvester, for example, meets one of the requirements, will Trove simply link to it? Imagine that, Trove actually linking to one of the dozens of Trove tools and resources provided by the GLAM Workbench. Oh frabjous day! But then does the NLA still get the money to develop the thing that I’ve already developed, or will they share some of the project money with me (yeah right)? I really have no idea what’s envisaged here. How will the ‘solution architecture’ integrate existing tools and services? And what does that mean for the resourcing of the project as a whole?

Elsewhere the plan talks about services ‘dedicated specifically to Trove collections and/or to the Australian research community’ that could be ‘“plugged in” to the platform ecosystem’. That sounds hopeful, but if the platform is an ecosystem of tools and services from the NLA and beyond, then that changes the scope of the project completely. Why not start with that? Start with the idea of developing an ecosystem of tools and services making use of Trove data, rather than just developing a new Trove interface. Then we could work together to build something really useful.

It just seems that the scope of the project as a whole hasn’t been properly thought through. The original plan has been expanded in vague, hand wavy directions, without thinking through the implications and possibilities of that expansion. Tinkering around the edges isn’t enough; the nature of this project needs to be completely rethought.

Where’s the strategy?

Rather than have an open call for project funding, the HASS RDC process has focused instead on making strategic investments. But where’s the strategy? The current projects were identified through a number of scoping studies undertaken by the Department of Education, Skills, and Employment. But these scoping studies haven’t been publicly released, so we don’t really know why these projects were recommended for funding. Is giving the NLA buckets of money to develop a new interface really what was envisaged? Surely if you were thinking strategically, you’d be considering ways in which the rich data asset represented by Trove could be opened to new research uses. You’d look around at existing tools and resources and think about how they could be leveraged. You’d examine limitations in the delivery of Trove data, and think about what sort of plumbing was needed to connect up new and existing projects. So how did we end up here?

Perhaps we just need to take a step back and recognise that just because Trove provides the data, doesn’t mean it should direct the project. There needs to be another layer of strategic planning which identifies the necessary components and directs resources accordingly. As I noted before, there are plenty of ways in which Trove’s data and APIs could be improved. Give them money to do that. But should they be building tools for researchers to use that data? Nope. Absolutely not.

I attended the eResearch Australasia Conference recently, and was really impressed with all the activity around research software development. If the tool building component of this project was opened up, it could provide a really useful focus for developing collaboration across the research software community and building capacities and understanding in HASS. It would also encourage greater reuse and integration. This would seem to be a much more strategic intervention.

Assorted questions

I’m not going to go through the plan in detail again. I’ve already spent a couple of weeks engaged in, or worrying about, the HASS RDC process. I’m tired, and I’m frustrated, and I can’t shake the depressing thought that the NLA will end up being rewarded for its bad behaviour. Many of my comments on the earlier draft still apply, particularly those around the API and the development of pathways for researchers.

It’s worth noting, however, that the ‘sustainability’ section of the plan has disappeared completely – perhaps not surprisingly, as the only suggestion last time was for someone to give them more money. There are gestures towards integration, such as including representatives of the other HASS RDC projects on the Trove Project Board. But real integration would happen through technical interchange, not governance structures, and there’s no plan for that.

I’m also a bit confused about the role of ANU. It seems to be mostly focused on consultation, but then there are statements like:

The development phase will be completed as a collaboration between the ANU and NLA, with both institutions working on the development of their own systems to align the with the product goals and Trove.

What ANU systems are we talking about here? And why are they part of the project?

A couple of objectives were also added to the plan:

‘Explore opportunities to enrich the Trove corpus’ – sounds good, but it’s not picked up anywhere else in the plan, and has no deliverables associated with it. Is it just window dressing?
‘Develop Trove’s HASS community relationships and engagement capabilities’ – sorry if I’m cynical, but when your first plan doesn’t even bother to consult the HASS community, when you don’t even link to existing resources in the sector, why should we believe that this will now be an important, ongoing objective?

On the issue of community relationships, a few people in the previous round of feedback indicated to me that they didn’t want to criticise the NLA too harshly because they might want to work with them in the future. That’s not healthy community building.

Conclusion

After two attempts, the NLA has still not delivered a coherent project plan that demonstrates real value to the HASS sector and meets the ARDC’s assessment criteria. I think the project needs to be radically rethought, and leadership sought from outside the NLA to ensure that the available funding is used effectively.

I love Trove. I recognise the way it has transformed research, and was honoured to play a small part in its history. It should be appropriately funded. But it shouldn’t be funded to do everything.

In the end, we could do so much better, and so much more…

Have your say…

You can provide your own feedback on the new draft plan. There’ll be a roundtable event on 10 November when you can ask questions of the project participants. You can also submit your feedback by 17 November using the form at the bottom of this page. You might also want to remind yourself of the ARDC’s evaluation criteria.

I don’t think I’ll attend the roundtable, as this whole process has taken a bit of a toll, but I encourage you to do so. The more voices the better.

Update – 13 November

Here’s the final feedback that I submitted to the ARDC.

Tim Sherratt