As well as cross-posting updates to Twitter and Mastodon, I’ve now set up IFTTT to keep an eye on my micro.blog feed and post anything with the hashtag #dhhacks to my 101 DH Hacks FB page!

Adventures in stemming, or what happens when you search Trove for 'naturalization'

Fun fact: the Porter stemming algorithm treats the words ‘naturalisation’ and ‘naturalization’ differently. Naturalisation is stemmed to ‘naturalis’, naturalization to ‘natur’. You can try this yourself using this NLTK stemming demo. Why does this matter? If you try searching for ‘naturalization’ in Trove you get almost 14 million results, most of which aren’t relevant because stemming reduces the search term to ‘natur’, which also matches words like ‘nature’. Of course you can switch off stemming in Trove by using the text: modifier.
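
If you'd rather check the stems locally than in the demo, here's a minimal sketch. It assumes NLTK is installed and that its `PorterStemmer`, with default settings, behaves the same way as the demo. The helper name `stem_words` is just for illustration, and nothing here says anything about how Trove itself does its stemming.

```python
from nltk.stem import PorterStemmer


def stem_words(words):
    """Return a dict mapping each word to its Porter stem."""
    # Assumption: NLTK's default Porter variant matches the online demo.
    stemmer = PorterStemmer()
    return {word: stemmer.stem(word) for word in words}


if __name__ == "__main__":
    # Compare the two spellings, plus 'nature' to see the collision.
    words = ["naturalisation", "naturalization", "nature"]
    for word, stem in stem_words(words).items():
        print(f"{word} -> {stem}")
```

Running it should show ‘naturalization’ and ‘nature’ ending up with the same stem, while ‘naturalisation’ keeps its own.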


I have a brand new updates page powered by micro.blog!