Tim Sherratt

Sharing recent updates and work-in-progress

Jan 2024

Mapping MARC Geographic Area codes to Wikidata

Trove uses codes from the MARC Geographic Areas list to identify locations in metadata records. I couldn’t find any mappings of these codes to other sources of geospatial information, so I fired up OpenRefine and reconciled the geographic area names against Wikidata. Once I’d linked as many as possible, I copied additional information from Wikidata, such as ISO country codes, GeoNames identifiers, and geographic coordinates.

I’ve saved the resulting dataset in two formats – as a flattened CSV file (handy for loading as a dataframe), and as a JSON file that uses the geographic area codes as keys (handy for looking up values). You can download the datasets from this GitHub repository.

I’ve also written the codes back into the Wikidata records, so you can now find them with a SPARQL query like this.

The columns in the CSV file are:

  • code – MARC geographic areas code (without any trailing dashes)
  • place – name of geographic area from the MARC list
  • wikidata_label – name of geographic area from Wikidata
  • wikidata_id – Wikidata identifier
  • coordinates – pair of decimal coordinates in the form latitude,longitude (multiple values are pipe | separated)
  • iso_country_code – ISO two letter country code (multiple values are pipe | separated)
  • iso_numeric_country_code – ISO numeric country code (multiple values are pipe | separated)
  • geonames_id – GeoNames identifier (multiple values are pipe | separated)

Note that some fields can contain multiple values. For example the area Mediterranean Region is linked to 22 countries, so there will be multiple values in the ISO code fields.

Choropleth map showing the countries associated with oral history records

For an example of this dataset in use, see Which countries do the oral histories relate to? in the Trove Data Guide.