Sharing recent updates and work-in-progress
If you have a dataset that you want to share as a searchable online database then check out Datasette – it’s a fabulous tool that provides an ever-growing range of options for exploring and publishing data. I particularly like how easy Datasette makes it to publish datasets on cloud services like Google’s Cloud Run and Heroku. A couple of weekends ago I migrated the Tung Wah Newspaper Index to Datasette. It’s now running on Heroku, and I can push updates to it in seconds.
I’m also using Datasette as the platform for sharing data from the Sydney Stock Exchange Project that I’m working on with the ANU Archives. There’s a lot of data – more than 20 million rows – but getting it running on Google Cloud Run was pretty straightforward with Datasette’s publish command. The problem, however, was that Datasette runs on most cloud services in ‘immutable’ mode, and we want authenticated users to be able to improve the data. So I needed to explore alternatives.
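To give a sense of how simple publishing is, here’s the kind of command involved – the database file, service, and app names below are placeholders, not my actual project names:

```shell
# Publish a SQLite database to Google Cloud Run
# (requires the gcloud CLI, authenticated against your project).
datasette publish cloudrun stock-exchange.db \
    --service=stock-exchange \
    --title="Sydney Stock Exchange"

# The equivalent for Heroku (requires the heroku CLI):
datasette publish heroku tungwah.db -n tungwah-index
```

Datasette packages up the database and a web app and deploys the whole thing for you.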
I’ve been working with Nectar over the past year to develop a GLAM Workbench application that helps researchers do things like harvesting newspaper articles from a Trove search. So I thought I’d have a go at setting up Datasette in a Nectar instance, and it works! Here are a few notes on what I did…
First I set Datasette up to run as a service using systemd. This involved installing Datasette with pip, creating a folder for the Datasette data and configuration files, creating a datasette.service file, and using the datasette install command to add a couple of Datasette plugins. One of these is the datasette-github-auth plugin, which needs a couple of secret tokens set. I added these as environment variables in the service file.
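The service definition looks something like the sketch below – the paths, port, and environment variable names are illustrative, not copied from my actual setup:

```ini
# /etc/systemd/system/datasette.service – a sketch, details will vary.
[Unit]
Description=Datasette
After=network.target

[Service]
Type=simple
User=ubuntu
# Secret tokens for the GitHub auth plugin, passed as environment variables
Environment=GITHUB_CLIENT_ID=xxx
Environment=GITHUB_CLIENT_SECRET=yyy
WorkingDirectory=/home/ubuntu/datasette-root
# Point Datasette at the configuration directory
ExecStart=/home/ubuntu/.local/bin/datasette /home/ubuntu/datasette-root --host 127.0.0.1 --port 8000
Restart=on-failure

[Install]
WantedBy=multi-user.target
```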
My systemd setup uses Datasette’s configuration directory mode. This means you can put your database, metadata definitions, custom templates and CSS, and any other settings all together in a single directory and Datasette will find and use them. I’d previously passed runtime settings via the command line, so I had to move these into a settings.json file in the configuration directory. I then copied everything across with rsync and started the Datasette service. It worked!
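A configuration directory along these lines is what I mean – the file and folder names follow Datasette’s documented conventions, though the database name here is just a placeholder:

```
datasette-root/
├── stock-exchange.db   # the SQLite database(s)
├── metadata.json       # titles, licences, plugin configuration
├── settings.json       # runtime settings previously passed on the command line
├── templates/          # custom page templates
└── static/             # custom CSS and other assets
```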
To keep the data safe, I wanted to move it onto persistent volume storage, mounted at /pvol in the virtual machine as the Nectar documentation describes. I created a datasette-root folder in /pvol, copied the Datasette files to it, and changed the datasette.service file to point to it. This didn’t seem to work and I’m not sure why. So instead I created a symbolic link between /pvol/datasette-root and /home/ubuntu/datasette-root, and set the path in the service file back to /home/ubuntu/datasette-root. This worked! So now the database and configuration files are sitting in the persistent storage volume.
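The workaround boils down to a few commands, roughly as follows – run on the VM with the volume already mounted at /pvol:

```shell
# Stop Datasette, move the files onto the persistent volume,
# then link the old location to the new one.
sudo systemctl stop datasette
mv /home/ubuntu/datasette-root /pvol/datasette-root
ln -s /pvol/datasette-root /home/ubuntu/datasette-root
sudo systemctl start datasette
```

Because the service file still points at /home/ubuntu/datasette-root, nothing else needed to change.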
Although the steps above might seem complicated, it was mainly just a matter of copying and pasting commands from the existing documentation. The new Datasette instance is running here, but this is just for testing and will disappear soon. If you’d like to know more about the Stock Exchange Project, check out the ANU Archives section of the GLAM Workbench.