OpenGLAM fireworks! Finding open collections in DigitalNZ

Lately I’ve been updating and expanding the notebooks in the DigitalNZ section of the GLAM Workbench. In particular, I’ve been looking at the usage facet to understand how much of the aggregated content is ‘open’. What do I mean by ‘open’? The Open Knowledge Foundation definition states that ‘open data and content can be freely used, modified, and shared by anyone for any purpose’. Obviously things that are in the public domain, such as out-of-copyright resources, are open. But so are resources with an open licence such as CC-BY or CC-BY-SA. The Creative Commons ‘Non commercial’ and ‘No derivatives’ licences are not open because they put limits on how you can use resources.

How does this definition map to DigitalNZ? The usage facet includes five values:

  • Share
  • Modify
  • Use commercially
  • All rights reserved
  • Unknown

These values have been assigned by DigitalNZ based on the 35,000 different rights statements and 30 different copyright statements that are included in DigitalNZ metadata records. I find I have to turn the usage values inside out to really understand them. A resource that only allows you to ‘Share’ excludes the ‘Modify’ and ‘Use commercially’ permissions, and so is roughly equivalent to a CC-BY-NC-ND licence. The only open value, according to the definition above, is ‘Use commercially’, which is like CC-BY. I’m assuming that ‘Use commercially’ has been assigned to resources that are either out of copyright (or have no known copyright restrictions) or are openly licensed.

It’s also worth noting that the ‘usage’ values are not mutually exclusive. A record with a ‘usage’ value of ‘Use commercially’ will also be assigned ‘Share’ and ‘Modify’ values. This is because ‘Use commercially’ includes the ‘Share’ and ‘Modify’ permissions. This seems a bit counter-intuitive, but makes sense if you think about doing a search for everything you’re allowed to share.
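To make the overlap concrete, here is a minimal Python sketch of how a record's list of ‘usage’ values could be reduced to a single answer. The function names (`is_open`, `most_permissive`) and the permissiveness ordering are assumptions made for illustration; they are not part of the DigitalNZ API or the notebook's code.

```python
# Usage values ordered from most to least permissive.
# This ordering is an assumption made for illustration.
USAGE_ORDER = [
    "Use commercially",
    "Modify",
    "Share",
    "All rights reserved",
    "Unknown",
]


def is_open(usage_values):
    """A record counts as 'open' only if it carries 'Use commercially'."""
    return "Use commercially" in set(usage_values)


def most_permissive(usage_values):
    """Return the most permissive usage value assigned to a record."""
    assigned = set(usage_values)
    for value in USAGE_ORDER:
        if value in assigned:
            return value
    # No recognised value at all: treat it the same as 'Unknown'.
    return "Unknown"


# A record tagged 'Use commercially' also carries 'Share' and 'Modify'.
open_record = ["Share", "Modify", "Use commercially"]
share_only_record = ["Share"]

print(is_open(open_record), most_permissive(open_record))
print(is_open(share_only_record), most_permissive(share_only_record))
```

Checking for ‘Use commercially’ directly, rather than counting how many values a record has, keeps the test tied to the open definition quoted earlier.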

A rough calculation based on the usage facet indicates that 71.76% of the resources aggregated by DigitalNZ are open. That seems pretty good, though a lot of those are probably out-of-copyright newspaper articles from Papers Past. For a more fine-grained analysis, I decided to look at the ‘usage’ data for each combination of ‘content_partner’ and ‘primary_collection’. How open is each individual collection in DigitalNZ?
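The calculation itself can be sketched in a few lines of Python. Because the usage values overlap, the denominator has to be the total number of records rather than the sum of the facet counts. Everything below is hypothetical: the `percent_open` helper, the partner and collection names, and all of the numbers are made up for illustration and are not the harvested DigitalNZ data.

```python
def percent_open(facet_counts, total_records):
    """Percentage of records that carry the 'Use commercially' value."""
    if total_records == 0:
        return 0.0
    return 100 * facet_counts.get("Use commercially", 0) / total_records


# Made-up facet counts for a whole (imaginary) aggregation of 1,000 records.
# Note the counts add up to more than 1,000 because the values overlap.
example_counts = {
    "Share": 800,
    "Modify": 750,
    "Use commercially": 700,
    "All rights reserved": 150,
    "Unknown": 50,
}
overall = percent_open(example_counts, 1000)
print(f"{overall:.2f}% open overall")

# The same calculation for each content_partner / primary_collection pair.
# These rows are hypothetical stand-ins for harvested facet data.
collections = [
    {
        "content_partner": "Partner A",
        "primary_collection": "Collection 1",
        "total": 200,
        "usage": {"Share": 200, "Modify": 200, "Use commercially": 200},
    },
    {
        "content_partner": "Partner B",
        "primary_collection": "Collection 2",
        "total": 50,
        "usage": {"Share": 10, "All rights reserved": 40},
    },
]

by_collection = {
    (row["content_partner"], row["primary_collection"]): percent_open(
        row["usage"], row["total"]
    )
    for row in collections
}

for (partner, collection), pct in by_collection.items():
    print(f"{partner} / {collection}: {pct:.2f}% open")
```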

For added excitement, and to stretch my knowledge of what Altair can do, I decided to visualise the results as a display of colourful fireworks. The higher the explosion, the more open the collection! I’m pretty pleased with the result.

I’ve saved an HTML version of the chart so you can mouse over the explosions for more details. All the code is included in this notebook, along with a CSV file containing all the harvested facet data. #dhhacks