🔬 Research Summary by Eduardo Graells-Garrido and Ricardo Baeza-Yates.
Eduardo Graells-Garrido is Assistant Professor at the Department of Computer Science of Universidad de Chile. He is interested in improving our understanding of how people inhabit their own informational spaces, including physical ones (cities) and virtual ones (social networks). His research lies at the intersection of Urban Informatics, Information Visualization, and the Social Sciences. He also writes autofiction and science fiction; he recently published a story collection under the title GAME OVER.
Ricardo Baeza-Yates is Director of Research at the Institute for Experiential AI of Northeastern University. He is also a part-time Professor at Universitat Pompeu Fabra in Barcelona and Universidad de Chile in Santiago. Before that, he was CTO of NTENT, a semantic search technology company based in California, and prior to these roles he was VP of Research at Yahoo Labs, based first in Barcelona, Spain, and later in Sunnyvale, California, from 2006 to 2016.
[Original paper by Eduardo Graells-Garrido and Ricardo Baeza-Yates]
Overview: In 2020, a national referendum was held in Chile as a consequence of the social outburst that began in 2019. This paper analyzes the political discussion on Twitter in the three months preceding the referendum. Using machine learning methods, we estimated the distribution of referendum stances on Twitter and quantified the influence of bots in the discussion.
Introduction
On October 18th, 2019, a “Social Outburst” exploded in Santiago, the capital of Chile, with demands regarding social and economic equality throughout the country. As of 2022, there are still protests tied to this movement, although there has been progress in fulfilling its demands. The most important outcome was the Chilean Constitutional Referendum, held on October 25th, 2020. In this election, voters were asked whether a new constitution should replace the current one, drafted in the 1980s during the Pinochet dictatorship and partially modified later under democracy. The result was that a Constitutional Convention would be formed to propose a new foundational text.
“Democratic sunset”: https://commons.wikimedia.org/wiki/File:Atardecer_democr%C3%A1tico.jpg
From 2019 onwards, there has been a deeply polarized discussion in all public spheres regarding the social outburst, the Constitutional process, Chile’s state of affairs during the pandemic, and other concurrent crises. Web platforms have been no exception. Twitter, in particular, has been important due to its reach and influence: it is common to see media reports and op-eds written in response to Twitter discussions. Hence, even though Twitter is not the most popular platform in Chile, what happens within Twitter reaches many people in the physical world. Thus, analyzing its discussion provides insights into how certain groups of people feel and what kind of information reaches all types of people through its several out-of-platform ramifications.
An article about the Twitter discussion regarding the referendum in the main newspaper in Chile, El Mercurio (October 23, 2020).
Through an analysis pipeline that includes account classification into political stances and unsupervised anomaly detection models, we quantified the volume produced by bots in the political discussion three months before the last national referendum held in Chile. Our key results are two-fold. On the one hand, although there are bots, their volume of content is rather small, and the stance distribution inferred from the Twitter discussion matches the final turnout of the referendum (bots don’t vote). On the other hand, we found evidence of bot coordination and of dissemination of bot content by non-bots (but they surely bother!), particularly within the right-leaning political stance. This has implications for how Twitter, as a platform, can influence the political discussion in both worlds, the virtual and the physical.
Key Insights
For about three months before the referendum, we listened to the Twitter Streaming API to fetch what was being discussed, with a focus on the mainstream political stances at the moment: to approve the drafting of a new constitution (#apruebo in Spanish), or to reject it and keep the current constitution (#rechazo in Spanish). Between August 1st, 2020, and October 25th, 2020, we obtained 2.3M tweets from 251K users, which represents about 10% of all Twitter users in Chile and 1.3% of the Chilean population. We analyzed these tweets and classified every account in the dataset into one of two stances, using a methodology previously published at ACM Web Science ’20. In this work, we add an additional step to quantify bot behavior.
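The collection step above amounts to keeping tweets that mention one of the stance hashtags. A minimal sketch of that filter, with invented sample tweets (the `user`/`text` field names follow the Twitter v1.1 payload shape, but are assumptions here, not the paper's code):

```python
# Stance hashtags tracked in the discussion (from the text above).
STANCE_TAGS = {"#apruebo", "#rechazo"}

def filter_discussion(tweets):
    """Keep only tweets that mention at least one stance hashtag."""
    kept = []
    for tweet in tweets:
        words = set(tweet["text"].lower().split())
        if words & STANCE_TAGS:  # any stance hashtag present
            kept.append(tweet)
    return kept

# Invented sample data for illustration.
sample = [
    {"user": "alice", "text": "Vamos con todo #Apruebo"},
    {"user": "bob", "text": "Yo voto #Rechazo"},
    {"user": "carol", "text": "Nothing political here"},
]

kept = filter_discussion(sample)
print(len(kept), len({t["user"] for t in kept}))  # 2 tweets from 2 users
```

In the actual study this filter ran against the Streaming API for the whole three-month window; here it only illustrates the matching logic.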
The methodology comprises the following steps: first, the data is pre-processed to extract content features, such as word and hashtag usage per account and who each account interacts with. Then, some accounts can be pre-assigned a stance, as they contain strong signals that disclose their political position. For instance, some accounts include the hashtags #apruebo or #rechazo in their profile names, in the same way that people hold a flag with their political preference at a physical demonstration.
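The pre-assignment step can be sketched as a simple check on the profile name. This is a hypothetical helper, not the paper's code; the ambiguity rule (both hashtags present means no clear stance) is an assumption:

```python
def disclosed_stance(profile_name):
    """Pre-assign a stance only when the profile name discloses it clearly."""
    name = profile_name.lower()
    has_apruebo = "#apruebo" in name
    has_rechazo = "#rechazo" in name
    if has_apruebo and not has_rechazo:
        return "apruebo"
    if has_rechazo and not has_apruebo:
        return "rechazo"
    return None  # no clear self-disclosed stance

print(disclosed_stance("María #Apruebo"))   # apruebo
print(disclosed_stance("Juan #Rechazo"))    # rechazo
print(disclosed_stance("Neutral account"))  # None
```

Accounts for which this returns a stance become the labeled set; the rest are handled by the classifier described next.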
Using all accounts with self-disclosed political preferences as training data, we use an XGBoost model to predict the stance of the rest of the dataset. As a result, 81.20% of the accounts in our dataset were classified as #apruebo, 17.34% as #rechazo, and 1.46% were marked as undisclosed because the classifier was not confident enough in its prediction. This matches the referendum results well: 78.31% voted in favor and 21.69% against the drafting of a new constitution.
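The train-on-disclosed, predict-the-rest scheme with a confidence cutoff can be sketched as follows. The paper used XGBoost; scikit-learn's `GradientBoostingClassifier` stands in for it here, and the features, labels, and the 0.7 confidence threshold are all invented for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy stand-ins for the content features of self-disclosed accounts.
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)

# Stand-in for XGBoost: a gradient-boosted tree ensemble.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Predict the remaining (undisclosed) accounts.
X_rest = rng.normal(size=(50, 5))
proba = model.predict_proba(X_rest)
confidence = proba.max(axis=1)

CONF_THRESHOLD = 0.7  # assumption, not taken from the paper
stance = np.where(confidence >= CONF_THRESHOLD,
                  proba.argmax(axis=1).astype(str), "undisclosed")
print(dict(zip(*np.unique(stance, return_counts=True))))
```

Low-confidence accounts end up in the "undisclosed" bucket, which is how the 1.46% undisclosed share in the study arises.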
Now we have published an additional step in this methodology to quantify bot behavior. How many of those accounts are automated, and what is their influence on the discussion? Here, we assumed that bots present anomalous behavior: although they may share content that appears to be created by humans, they behave differently. For instance, they may tweet (or retweet) in greater volume, or their usernames may be randomly generated rather than meaningful, unlike those of people, who choose their own usernames. Behavioral signals include relative metrics of published content, such as the total number of tweets published in the discussion divided by the number of distinct days on which those tweets were published, or the number of digits in the account username, among other signals that are not necessarily related to content.
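The two behavioral signals named above can be computed as follows; the account record format is invented for illustration:

```python
from datetime import date

def behavioral_features(account):
    """Two of the behavioral signals described in the text."""
    active_days = len(set(account["tweet_dates"]))  # distinct posting days
    tweets_per_day = len(account["tweet_dates"]) / active_days
    digits = sum(ch.isdigit() for ch in account["username"])
    return {"tweets_per_active_day": tweets_per_day,
            "username_digits": digits}

# Invented example: 60 tweets over 2 days, username with 8 digits.
acct = {"username": "user84621907",
        "tweet_dates": [date(2020, 8, 1)] * 40 + [date(2020, 8, 2)] * 20}
feats = behavioral_features(acct)
print(feats)  # {'tweets_per_active_day': 30.0, 'username_digits': 8}
```

Vectors of such per-account features are what the anomaly detector described next consumes.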
Rather than identifying accounts as anomalous outright, we calculate an anomaly score for each account in our dataset. We do so with the Isolation Forest model, which applies a bagging approach to anomaly detection: it builds many small decision trees that encode how hard it is to isolate samples in the data, under the assumption that anomalies are easier to isolate. In our context, this means that anomalous accounts are those that behave so differently from the others that expressing this difference in a decision tree takes only a few steps. The model builds many trees on random subsets of the data and then averages, over the trees, the number of decisions needed to isolate each sample. This average is the anomaly score of an account.
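A minimal Isolation Forest sketch on synthetic behavioral features (the data is invented; in scikit-learn, `score_samples` returns higher values for "normal" points, so we negate it to get a score where higher means more anomalous):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# 300 typical accounts around (2 tweets/day, 1 username digit)...
normal = rng.normal(loc=[2.0, 1.0], scale=0.5, size=(300, 2))
# ...plus one high-volume, digit-heavy account.
weird = np.array([[40.0, 9.0]])
X = np.vstack([normal, weird])

forest = IsolationForest(n_estimators=200, random_state=0).fit(X)
anomaly_score = -forest.score_samples(X)  # higher = easier to isolate

print(int(np.argmax(anomaly_score)))  # index 300: the injected outlier
```

The injected account is isolated in very few splits, so it receives the highest anomaly score, exactly the behavior the text describes.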
Isolation Forest scheme. https://www.mdpi.com/2072-4292/12/10/1678#
The anomaly score is a relative metric, so a threshold must be chosen as a prior proportion of anomalous accounts. Following a threshold from recent research, we considered the top 7.5% of accounts by anomaly score as potential bots. Note that we are interested in quantifying aggregated bot behavior rather than identifying individual bots, and this threshold may include false positives, as “real” accounts can also be anomalous; for instance, some people tweet at a much greater volume than the majority. Hence, the score by itself does not guarantee bot status, and we need to define a filter for bots within the anomaly group. To characterize the behavioral signals of anomalous and non-anomalous accounts, we divide the dataset into five groups: an anomalous group comprising the top 7.5% of anomaly scores, and four equal-sized groups covering the remaining scores in decreasing order.
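The thresholding and grouping step can be sketched as follows; the scores here are synthetic stand-ins for the per-account anomaly scores:

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.random(1000)  # stand-in anomaly scores, one per account

# Top 7.5% of scores form the anomalous group.
k = int(round(0.075 * len(scores)))
order = np.argsort(-scores)  # account indices, most anomalous first
anomalous, rest = order[:k], order[k:]

# The remaining accounts split into four equal-sized groups
# of decreasing anomaly score.
groups = [anomalous] + list(np.array_split(rest, 4))
print([len(g) for g in groups])  # [75, 232, 231, 231, 231]
```

Comparing feature distributions across these five groups is what surfaces the username-digit and registration-date signals discussed next.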
We focused on two features. On the one hand, we observed differences in the distribution of digits within account usernames. Fewer than four digits are common in regular usernames, appearing, for instance, in dates or other meaningful numbers. Having more than four digits, in contrast, strongly characterizes anomalous accounts and may signal randomly generated usernames.
On the other hand, we found a pattern in registration dates: accounts in the anomalous group surpassed the other groups in registration volume from August 8th onwards. Thus, we defined three criteria to consider an account a bot: it has to be in the anomalous group (i.e., it behaves very differently from other accounts), it has to have more than four digits in its username (i.e., the username seems randomly generated), and it has to have been registered after August 8th (i.e., it is a recent account). Using this conservative criterion, only 0.66% of accounts are likely bots. When we group the content generated in the discussion by political stance and bot status, we observe that the amount of content generated by these candidate bot accounts is rather small.
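The three criteria combine into a single predicate. The cutoff date comes from the text; the account record format and sample data are invented:

```python
from datetime import date

CUTOFF = date(2020, 8, 8)  # registration-date criterion from the text

def is_likely_bot(account, anomalous_usernames):
    """An account is a candidate bot only if all three criteria hold."""
    in_anomalous_group = account["username"] in anomalous_usernames
    many_digits = sum(ch.isdigit() for ch in account["username"]) > 4
    recent = account["created_at"] > CUTOFF
    return in_anomalous_group and many_digits and recent

# Invented examples: both accounts are in the anomalous group, but only
# the first also has >4 digits and a recent registration date.
accounts = [
    {"username": "bot_84621907", "created_at": date(2020, 9, 1)},
    {"username": "maria1987", "created_at": date(2015, 3, 2)},
]
anomalous = {"bot_84621907", "maria1987"}
bots = [a["username"] for a in accounts if is_likely_bot(a, anomalous)]
print(bots)  # ['bot_84621907']
```

Requiring all three criteria at once is what makes the filter conservative and keeps the bot share down to 0.66% of accounts.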
However, when looking at the information dissemination network built from retweets, we observed signs of coordination among #rechazo accounts. Applying community detection with a Hierarchical Stochastic Block Model, we found that bots tended to form small bot-dominated communities, and that #rechazo communities that were not bot-dominated tended to retweet content from bot-dominated communities.
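The paper fits a Hierarchical Stochastic Block Model (typically done with graph-tool); as a lightweight stand-in, this sketch builds a retweet graph and uses modularity-based community detection from networkx, then measures how bot-dominated each community is. The edges and the bot set are invented:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Invented retweet edges (retweeter, retweeted) and a known bot set.
retweets = [("u1", "b1"), ("u2", "b1"), ("b2", "b1"), ("b1", "b2"),
            ("u3", "u4"), ("u4", "u5"), ("u5", "u3")]
bots = {"b1", "b2"}

G = nx.Graph()
G.add_edges_from(retweets)

# Stand-in for the HSBM: greedy modularity community detection.
communities = greedy_modularity_communities(G)

# Fraction of bot accounts per community ("bot domination").
shares = {tuple(sorted(c)): len(c & bots) / len(c) for c in communities}
print(shares)
```

A community with a high bot share is "bot-dominated"; the coordination signal in the paper is that non-bot #rechazo communities retweeted content originating in such communities.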
Between the lines
In summary, the number of bots in the discussion is small, and in terms of produced content, bot accounts are similar to regular accounts. The difference lies instead in their network behavior: these small squads, in coordination with regular accounts, may influence what is being discussed. This influence has a clear political objective, as #rechazo (right-leaning) bots form larger communities than #apruebo bots. This is relevant for the assignment of trending topics, a procedure whose inner workings are unknown outside of Twitter, even though trending topics influence the discussion and political attitudes in the physical world.