🔬 Research Summary by Panagiotis (Panos) Papadopoulos, a Research Scientist at Telefonica Research, interested in Web Security and Privacy.
[Original paper by Emmanouil Papadogiannakis, Panagiotis Papadopoulos, Evangelos P. Markatos, Nicolas Kourtellis]
Overview: Fake news is an age-old phenomenon, widely assumed to be associated with political propaganda published to sway public opinion. Despite the many studies performed and countermeasures deployed by researchers and stakeholders, unreliable news sites have increased their share of engagement among the top performing news sources in recent years. In this study, we shed light on the revenue flows of fake news sites by investigating who supports and maintains their existence.
Introduction
The BBC recently interviewed a panel of 50 experts about the “grand challenges we face in the 21st century”, and many of them named propaganda and fake news [1] as a key challenge. Indeed, the tools people use to spread misinformation have improved dramatically in recent years with the Internet and social media, providing an ideal environment for fake news to thrive. Unlike the yellow newspapers of the past, social media and search engines pose a great threat to truth: the more successful a website’s content is at luring visitors, the more it is promoted by the algorithms underpinning these platforms.
Despite the various important actions of academics, advertisers and tech companies, unreliable news websites significantly increased (2.1x) their share of engagement among the top performing news sources in the past year alone [2]. There is no doubt that success in curbing fake news depends primarily on efforts to reduce or eliminate the incentives of fake news producers. Admittedly, however, little is known about the incentives and funding of fake news on the Web.
In this paper, we explore the following main questions: Who supports the existence of fake news websites via paid ads, either as an advertiser or an ad seller? Who owns these websites and what other Web businesses are they into?
We develop a novel ad detection methodology to identify the companies that advertise in fake news websites and the intermediary companies responsible for facilitating those ad revenues. We study more than 2400 popular fake and real news websites and show that big ad networks have a direct advertising relation with more than 40% of these fake news websites, and a reseller advertising relation with more than 60% of them.
Key Insights
Our dataset consists of 1,044 sites categorized as fake news and 1,368 sites categorized as real news.
Who buys ad space on fake news websites?
To understand who facilitates the monetization of news websites via ads, we study the entities responsible for selling ads in each website, by utilizing the ads.txt files served by websites:
Figure 1: Most popular authorized digital sellers with direct relationship in ads.txt files. We observe that the majority of news websites have business relationships with Google.
We see that 80.8% of fake news websites have a direct business relationship with google.com, 49.0% of fake news websites have a direct business relationship with indexexchange.com, and 52.5% of real news websites have a direct business relationship with appnexus.com. Note that before starting such a direct business relationship between an ad network and a website, there is a vetting process to be followed.
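As an illustration of how seller relationships can be read from ads.txt files, here is a minimal sketch in Python. It assumes the standard ads.txt record layout (ad system domain, publisher account ID, relationship type, optional certification authority ID). The function names, the sample sites, and the account IDs are hypothetical and are not taken from the study's code.

```python
from collections import Counter


def parse_ads_txt(text):
    """Return (ad_system_domain, relationship) pairs found in ads.txt content."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or "=" in line.split(",", 1)[0]:
            continue  # skip blank lines and variable lines such as contact=...
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # not a well-formed seller record
        domain, relationship = fields[0].lower(), fields[2].upper()
        if relationship in ("DIRECT", "RESELLER"):
            records.append((domain, relationship))
    return records


def direct_seller_share(ads_txt_by_site):
    """Fraction of sites that list each ad system as a DIRECT seller."""
    counts = Counter()
    for text in ads_txt_by_site.values():
        # Count each ad system at most once per site.
        counts.update({d for d, rel in parse_ads_txt(text) if rel == "DIRECT"})
    total = len(ads_txt_by_site)
    return {domain: n / total for domain, n in counts.items()}


# Hypothetical ads.txt contents for two made-up sites.
sample = {
    "site-a.example": (
        "google.com, pub-0000000000000001, DIRECT, f08c47fec0942fa0\n"
        "indexexchange.com, 12345, RESELLER\n"
    ),
    "site-b.example": (
        "# managed by the ad ops team\n"
        "contact=ads@site-b.example\n"
        "google.com, pub-0000000000000002, direct\n"
    ),
}
shares = direct_seller_share(sample)
```

Running the same aggregation separately over the fake and real news lists would give per-category shares comparable to the percentages reported above.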
By independently examining the top ad systems for fake and real news websites, we find that revcontent.com is the only ad system that is popular (i.e., ranked 5th) among the ad networks integrated with fake news websites, but ranked very low (i.e., 51st) among the ad networks of real news websites. In contrast, yahoo.com mainly works with real news websites (68% of them form a business relationship with yahoo.com), whereas it was found on only around 30% of fake news websites.
To investigate what kind of ads are being displayed in fake news websites, we develop a methodology to detect digital advertisements:
Figure 2: Ad detection methodology that combines both external block lists and network traffic monitoring.
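The block-list half of such a pipeline can be sketched roughly as follows. This is only an assumption about one plausible building block, not the detection method from the paper: it takes request URLs observed while a page loads and flags those whose host, or any parent domain of the host, appears in a set of block-listed domains. All names and domains below are hypothetical.

```python
from urllib.parse import urlsplit


def is_blocklisted(url, blocked_domains):
    """True if the URL's host or any of its parent domains is block-listed."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Check the host and each parent domain: a.b.example, b.example, example.
    return any(".".join(parts[i:]) in blocked_domains for i in range(len(parts)))


def ad_requests(request_urls, blocked_domains):
    """Keep only the observed requests that match the block list."""
    return [url for url in request_urls if is_blocklisted(url, blocked_domains)]


# Hypothetical block list and observed network requests.
blocked = {"adserver.example"}
observed = [
    "https://news.example/article.html",
    "https://cdn.adserver.example/banner.js",
    "https://notadserver.example/script.js",
]
flagged = ad_requests(observed, blocked)
```

Matching on whole domain labels, rather than on substrings, keeps a host such as notadserver.example from being flagged by accident.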
We manually verify our approach and find that it achieves both high precision (92% of “ads” marked in the websites are actual ads) and high recall (87% of actual ads in the websites were correctly detected).
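For readers less familiar with these two metrics, the short sketch below shows how they are computed from manually labelled detections. The counts are hypothetical, chosen only so that the results round to the reported rates; they are not the study's actual verification counts.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of flagged items that are real ads.
    Recall: share of real ads that were flagged."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts: 174 correctly flagged ads, 15 non-ads flagged as ads,
# and 26 real ads that were missed.
precision, recall = precision_recall(174, 15, 26)
print(f"precision={precision:.2f} recall={recall:.2f}")
```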
We apply our methodology across every website in our dataset and we extract the landing page of each detected advertisement. By using Cyren’s content categories, we find that about 70% of the fake news websites advertise “Business” products and services, and close to 40% of the fake news websites advertise “Entertainment” products and services.
Who owns fake news websites?
To understand who owns fake news sites and what other websites they own, we leverage Publisher-specific IDs (alphanumeric values that follow strict formats and uniquely identify user accounts in popular services, such as AdSense and Google Analytics) that are embedded in the websites (as described in more detail here [3]). We construct directed bipartite graphs for these sites and, by combining them, obtain a Metagraph in which websites that share an identifier are connected through an edge. The weight of an edge is proportional to the number of Publisher-specific IDs the two websites share. We integrate information from a 1M crawls dataset, which contains Publisher-specific IDs found in the top 1M most popular websites, and thus our Metagraph contains over 114.5K nodes (websites) and 443K edges. To detect clusters of websites operated or owned by the same entity, we use the Louvain method for graph-based community detection.
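The edge-weighting idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's implementation: it skips the intermediate bipartite graphs, takes the identifiers per site as already extracted, and uses the raw count of shared identifiers as the edge weight. The site names and identifier strings are made up.

```python
from collections import defaultdict
from itertools import combinations


def build_metagraph(ids_by_site):
    """Map each pair of sites to the number of identifiers they share."""
    sites_by_id = defaultdict(set)
    for site, ids in ids_by_site.items():
        for publisher_id in ids:
            sites_by_id[publisher_id].add(site)

    weights = defaultdict(int)
    for sites in sites_by_id.values():
        # Every pair of sites carrying the same identifier gains one unit of weight.
        for a, b in combinations(sorted(sites), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical identifiers found on four made-up sites.
ids_by_site = {
    "a.example": {"ca-pub-1", "UA-1"},
    "b.example": {"ca-pub-1", "UA-1"},
    "c.example": {"UA-1"},
    "d.example": {"ca-pub-9"},
}
edges = build_metagraph(ids_by_site)
# edges == {("a.example", "b.example"): 2,
#           ("a.example", "c.example"): 1,
#           ("b.example", "c.example"): 1}
```

A weighted edge list like this could then be handed to a community-detection routine; NetworkX, for example, provides `louvain_communities` for the Louvain method. The sketch stops before that step so that it stays dependency-free.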
We demonstrate the correctness and efficiency of our methodology by manually investigating communities that contain at least one fake news website. Our findings include, among others, a cluster of 6 websites of Sophia Media. Four of these websites are part of the Health Impact News Network and have been marked as “Pseudoscience” websites by MBFC. In addition, we find a pair of websites, freedomforceinternational.org and needtoknow.news, founded and powered by G. Edward Griffin. On his websites, he promotes not only right-wing beliefs, but also conspiracy theories and pseudoscientific treatments.
Finally, we see that owners of fake news websites own other types of websites as well, including “Entertainment”, “Business”, and “Politics”. This implies that the operation of an average fake news website is not an isolated or outlying event, focused on fast ad-profits, but instead is probably part of a wider business function.
Data & Code Availability
To support and enable further research on fake news, and extensibility of our work, we make publicly available:
- The lists of 1,044 fake and 1,368 real news websites
- Screenshots of ads collected on the top 50 most popular websites for each category of “News & Media” and “Sports”
- Code for the crawler and novel ad detection method used
Between the lines
Although fake news has been used more in recent years as a tool of political propaganda, there is no doubt that spreading misinformation has become a very lucrative business on the Web. Stifling the impact of fake news depends on efforts by society and the market to limit the economic incentives of fake news producers. Our study shows that the operation of a fake news website is rarely an isolated event; it is frequently part of a larger business function, with owners usually operating a variety of websites related to entertainment, business, politics, technology, etc. We hope our study and the material we make publicly available will enhance transparency and help curb the financial and advertising incentives that such websites have been enjoying so far.
References
[1] Richard Gray. Lies, propaganda and fake news: A challenge for our age. https://www.bbc.com/future/article/20170301-lies-propaganda-and-fake-news-a-grand-challenge-of-our-age, 2017.
[2] Sara Fischer. “Unreliable” news sources got more traction in 2020. https://www.axios.com/unreliable-news-sources-social-media-engagement-297bf046-c1b0-4e69-9875-05443b1dca73.html, 2020.
[3] Emmanouil Papadogiannakis, Panagiotis Papadopoulos, Nicolas Kourtellis, and Evangelos P. Markatos. Leveraging Google’s publisher-specific IDs to detect website administration. In Proceedings of the Web Conference, WWW’22, 2022.