If you have a dataset that you want to share as a searchable online database, check out Datasette – it’s a fabulous tool that provides an ever-growing range of options for exploring and publishing data. I particularly like how easy Datasette makes it to publish datasets on cloud services like Google’s Cloudrun and Heroku. A couple of weekends ago I migrated the TungWah Newspaper Index to Datasette. It’s now running on Heroku, and I can push updates to it in seconds.
I’m also using Datasette as the platform for sharing data from the Sydney Stock Exchange Project that I’m working on with the ANU Archives. There’s a lot of data – more than 20 million rows – but getting it running on Google Cloudrun was pretty straightforward with Datasette’s publish command. The problem, however, is that on most cloud services Datasette runs in ‘immutable’ mode, and we want authenticated users to be able to improve the data. So I needed to explore alternatives.
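For comparison, publishing a read-only copy to Cloudrun really is close to a one-liner – something along these lines, where the database and service names are just placeholders:

```bash
# Publish a SQLite database to Google Cloud Run (needs the gcloud CLI installed
# and authenticated); the database and service names here are placeholders.
datasette publish cloudrun stock-exchange.db \
    --service stock-exchange \
    --title "Sydney Stock Exchange"
```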
I’ve been working with Nectar over the past year to develop a GLAM Workbench application that helps researchers do things like harvesting newspaper articles from a Trove search. So I thought I’d have a go at setting up Datasette in a Nectar instance, and it works! Here’s a few notes on what I did…
To begin with, I set Datasette up to run as a systemd service. This involved installing datasette via pip, creating a folder for the Datasette data and configuration files, and creating a datasette.service file for systemd.
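In rough terms that step looks something like this – a minimal sketch rather than my exact configuration, with the install path, user, and port as placeholders:

```bash
# Install Datasette and create a directory for its data and configuration
sudo pip install datasette
mkdir -p /home/ubuntu/datasette-root

# A minimal systemd unit file for Datasette
sudo tee /etc/systemd/system/datasette.service > /dev/null <<'EOF'
[Unit]
Description=Datasette
After=network.target

[Service]
User=ubuntu
# Serve everything found in the configuration directory on port 8000
ExecStart=/usr/local/bin/datasette serve /home/ubuntu/datasette-root --host 127.0.0.1 --port 8000
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
```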
I then used the datasette install command to add a couple of Datasette plugins. One of these is the datasette-auth-github plugin, which needs a couple of secret tokens set. I added these as environment variables in the datasette.service file.
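Something along these lines – the environment variable names below are just placeholders, as the exact configuration depends on the plugin’s settings:

```bash
# Install plugins into the same environment that the Datasette service uses
datasette install datasette-auth-github

# The secret tokens then go into the [Service] section of datasette.service
# as environment variables (names and values below are placeholders):
#
#   Environment=GITHUB_CLIENT_ID=xxxxxxxxxxxx
#   Environment=GITHUB_CLIENT_SECRET=xxxxxxxxxxxx
```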
The systemd setup uses Datasette’s configuration directory mode. This means you can put your database, metadata definitions, custom templates and CSS, and any other settings all together in a single directory and Datasette will find and use them. I’d previously passed runtime settings via the command line, so I had to create a settings.json file for these.
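So the configuration directory ends up holding everything in one place, something like this – the file names and settings values here are illustrative only:

```bash
# Layout of the configuration directory (configuration directory mode):
#
#   datasette-root/
#   ├── stock-exchange.db    # SQLite database(s)
#   ├── metadata.json        # titles, licences, descriptions
#   ├── settings.json        # runtime settings, instead of command-line options
#   ├── templates/           # custom templates
#   └── static/              # custom CSS and other assets

# Example settings.json – values are illustrative
cat > /home/ubuntu/datasette-root/settings.json <<'EOF'
{
    "default_page_size": 50,
    "sql_time_limit_ms": 3000,
    "allow_download": true
}
EOF
```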
I copied the data and configuration files across with rsync and started the Datasette service. It worked!
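The copy and start steps were along these lines (the instance address and paths are placeholders):

```bash
# From my local machine: copy the database and configuration files to the instance
rsync -avz datasette-root/ ubuntu@<instance-ip>:/home/ubuntu/datasette-root/

# On the instance: enable and start the Datasette service
sudo systemctl daemon-reload
sudo systemctl enable --now datasette
```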
Next I mounted persistent volume storage as /pvol in the virtual machine, as the Nectar documentation describes.
I created a datasette-root folder under /pvol, copied the Datasette files to it, and changed the datasette.service file to point to it. This didn’t seem to work and I’m not sure why. So instead I created a symbolic link between /home/ubuntu/datasette-root and /pvol/datasette-root, and set the path in the service file back to /home/ubuntu/datasette-root. This worked! So now the database and configuration files are sitting in the persistent storage volume.
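The symbolic link step boils down to something like this (the backup rename is just a precaution, not necessarily what I did):

```bash
# The Datasette files are already sitting in /pvol/datasette-root, so replace
# the original directory with a symbolic link pointing at the persistent volume
mv /home/ubuntu/datasette-root /home/ubuntu/datasette-root.bak
ln -s /pvol/datasette-root /home/ubuntu/datasette-root

# datasette.service keeps pointing at /home/ubuntu/datasette-root
sudo systemctl restart datasette
```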
Although the steps above might seem complicated, it was mainly just a matter of copying and pasting commands from the existing documentation. The new Datasette instance is running here, but this is just for testing and will disappear soon. If you’d like to know more about the Stock Exchange Project, check out the ANU Archives section of the GLAM Workbench.