A few years ago, I harvested the details of tweets that included links to Trove. The data has just been sitting on my computer, so I thought I should package it up and share, in case it’s of use to anyone.
The story is that back in 2021, I was working on the article ‘More than newspapers’ for a special section of History Australia focusing on Trove. I was thinking that I might include something about the way Trove newspaper articles were mobilised within online discussions about history – a topic I first explored in ‘Life on the outside: connections, contexts, and the wild, wild web’, my keynote for the Annual Conference of the Japanese Association of Digital Humanities in 2014. In the end, the article went in another direction, so I didn’t use the data.
I remembered this recently and thought I should do something with it. I’ve now created a dataset and shared it on Zenodo. I’m not working on Trove any more, but I’m hoping that someone else might find the data useful!
The dataset contains information about tweets from 2009 to 2020 that include links to Trove. The tweet data was compiled using Twarc in May 2021, under Twitter’s academic access program. The search queries used were:
url:nla.gov.au/nla.news
url:trove.nla.gov.au
url:newspapers.nla.gov.au
Many of the tweets were produced by bots. Fortunately, I’d been maintaining a list of Trove bots on Twitter, so I used the list to separate the tweets into two files, one for bots and one for ordinary users.
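A split like this can be sketched in a few lines of Python. This is only an illustration: it assumes each harvested tweet is a dictionary with a hypothetical `username` field, and the sample account names are made up rather than taken from the actual bot list.

```python
def split_by_bot_list(tweets, bot_names):
    """Separate tweets into (human, bot) lists using a set of known bot accounts."""
    # Compare in lower case, since Twitter usernames are not case sensitive.
    bots_lower = {name.lower() for name in bot_names}
    humans, bots = [], []
    for tweet in tweets:
        if tweet["username"].lower() in bots_lower:
            bots.append(tweet)
        else:
            humans.append(tweet)
    return humans, bots


# Made-up sample data for illustration only.
sample_tweets = [
    {"tweet_id": "1", "username": "ExampleTroveBot"},
    {"tweet_id": "2", "username": "a_historian"},
    {"tweet_id": "3", "username": "exampletrovebot"},
]
human_tweets, bot_tweets = split_by_bot_list(sample_tweets, {"ExampleTroveBot"})
```

Any bot missing from the list ends up in the human group, which is why the human file may still include some unidentified bots.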
To respect user intentions and comply with the Twitter API terms of use, I removed all the tweet information except for `tweet_id` and `tweet_date` from the files. If it hasn’t been deleted, the full data for each tweet can probably be obtained from the X API using the `tweet_id`, though you might need a paid subscription.
The two main files are:

- `trove_url_tweets.csv` – links shared by human users (although it may include some unidentified bots)
- `trove_url_tweets_bots.csv` – links shared by bots

I also created some additional data files:

- `trove_url_totals.csv` – the number of times each Trove link was shared by users (not including bots)
- `active_users_per_year.csv` – the number of unique users each year who shared a link to Trove
- `active_bots_per_year.csv` – the number of active bots each year sharing links to Trove

There’s more information about the structure and contents of the data files in the Zenodo record.
I haven’t explored the data in detail, but here are some quick summaries to give you a taste.
summary | count |
---|---|
number of unique users sharing Trove links | 9,294 |
number of bots sharing Trove links | 43 |
number of tweets by humans containing Trove links | 48,293 |
number of tweets by bots containing Trove links | 318,797 |
number of unique links shared by humans | 36,886 |
number of unique links shared by bots | 270,501 |
What types of links were people sharing?
types of link shared by humans | count |
---|---|
newspaper article | 34,548 |
other (search queries, home page etc) | 8,385 |
work (items other than newspapers – books, maps, photos etc) | 4,856 |
newspaper page | 1,377 |
newspaper title | 400 |
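A classification like this can be sketched with a few regular expressions. This is only an illustration: the URL patterns are assumptions about common Trove link forms, and may not match the rules used to produce the counts above.

```python
import re

# Assumed URL patterns, checked in order; anything unmatched falls through to "other".
LINK_PATTERNS = [
    ("newspaper article", re.compile(r"(?:nla\.news-article|/(?:newspaper|ndp/del)/article/)(\d+)")),
    ("newspaper page", re.compile(r"(?:nla\.news-page|/(?:newspaper|ndp/del)/page/)(\d+)")),
    ("newspaper title", re.compile(r"(?:nla\.news-title|/(?:newspaper|ndp/del)/title/)(\d+)")),
    ("work", re.compile(r"/work/(\d+)")),
]


def classify_trove_link(url):
    """Return (link_type, trove_id) for a Trove URL; trove_id is None for 'other'."""
    for link_type, pattern in LINK_PATTERNS:
        match = pattern.search(url)
        if match:
            return link_type, match.group(1)
    return "other", None


# Example URLs built from the assumed patterns.
example = classify_trove_link("http://nla.gov.au/nla.news-article75869223")
```

Search queries, the home page, and any link form not covered by the patterns all land in the "other" group.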
How did the number of links shared by humans vary across time?
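If you want to tally this yourself, here is a minimal sketch using only the `tweet_id` and `tweet_date` columns. It assumes that `tweet_date` starts with a four-digit year (as in an ISO 8601 timestamp) and uses made-up rows in place of the real `trove_url_tweets.csv`, so check the Zenodo record for the actual date format before relying on it.

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for trove_url_tweets.csv.
sample_csv = io.StringIO(
    "tweet_id,tweet_date\n"
    "1,2012-03-01T10:00:00+00:00\n"
    "2,2012-07-15T08:30:00+00:00\n"
    "3,2019-01-02T23:59:00+00:00\n"
)


def rows_per_year(csv_file):
    """Count rows per year, taking the year from the first four characters of tweet_date."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[row["tweet_date"][:4]] += 1
    # Return the years in ascending order.
    return dict(sorted(counts.items()))


per_year = rows_per_year(sample_csv)
```

To run it on the real data, pass an open file handle for the downloaded CSV instead of `sample_csv`.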
Which articles or pages were shared most often by humans? Here’s the top ten (click on the link to view).
trove_id | trove_type | tweets | retweets | quotes | total times shared |
---|---|---|---|---|---|
75869223 | article | 1,232 | 61 | 34 | 1,327 |
1298497 | article | 141 | 1,028 | 53 | 1,222 |
102074798 | article | 74 | 693 | 77 | 844 |
68141866 | article | 138 | 522 | 48 | 708 |
41602327 | article | 633 | 30 | 0 | 663 |
100645214 | article | 111 | 467 | 20 | 598 |
502650 | page | 1 | 513 | 12 | 526 |
60828173 | article | 48 | 444 | 19 | 511 |
4173156 | article | 53 | 321 | 10 | 384 |
79410604 | article | 2 | 303 | 69 | 374 |
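In the table, `total times shared` is the sum of the tweets, retweets, and quotes columns. A small sketch of that arithmetic, using the first three rows copied from the table (the variable names are just for illustration):

```python
# (trove_id, tweets, retweets, quotes) copied from the first three rows of the table.
rows = [
    ("75869223", 1232, 61, 34),
    ("1298497", 141, 1028, 53),
    ("102074798", 74, 693, 77),
]

# Total times shared = tweets + retweets + quotes.
totals = {
    trove_id: tweets + retweets + quotes
    for trove_id, tweets, retweets, quotes in rows
}

# Rank links from most to least shared.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
```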
The most shared article reports that PM Menzies had described Hitler as a ‘great man’ at a meeting in July 1939. However, most of the tweets sharing this link came from a single user. A number of the other articles relate to the weather, a reflection of the fact that Trove’s newspaper articles have been mobilised on both sides of the climate change debate.
How many Twitter users were sharing links to Trove each year?
I haven’t included any of the bot data in these summaries because I think I’ll write a second bot-themed post – coming soon!