Montreal AI Ethics Institute


Democratizing AI ethics literacy

Regional Differences in Information Privacy Concerns After the Facebook-Cambridge Analytica Data Scandal

June 5, 2022

🔬 Research summary by Felipe GonzÔlez-Pizarro, a Ph.D. Computer Science student at the University of British Columbia. His research focuses on natural language processing, information visualization, and computational social science.

[Original paper by Felipe GonzÔlez-Pizarro, Andrea Figueroa, Claudia López, Cecilia Aragon]


Overview: While data privacy is receiving increasing global attention, most of our understanding of it is based on research conducted in a few countries in North America and Europe. This paper proposes an approach to studying data privacy over a larger geographical scope. By analyzing Twitter content about the #CambridgeAnalytica scandal, we observe language and regional differences in privacy concerns that hint at a need to extend current information privacy frameworks.


Introduction

In 2018, the firm Cambridge Analytica was accused of collecting and using the personal information of more than 87 million Facebook users without their authorization. Opinions, facts, and stories related to it were shared on social media, including Twitter, where the hashtag #DeleteFacebook became a trending topic for several days.

This paper analyzes more than a million public tweets related to the scandal. First, we divide the dataset by language (Spanish and English) and region (Latin America, Europe, North America, and Asia). Using word embeddings and manual content analysis, we study and compare the semantic contexts in which privacy-related terms were used. Then, we contrast our results with one of the most widely used frameworks for information privacy concerns, the Internet Users' Information Privacy Concerns (IUIPC) framework. We pay special attention to differences in emphasis on privacy-related terms across languages and world regions.

We observe a greater emphasis on data collection in English than in Spanish. Additionally, data from North America exhibits a narrower focus on awareness compared to other regions under study. Some key concepts, such as regulations, are discussed online in all regions and both languages but have not yet been added to current information privacy frameworks. Our results call for more diverse sources of data and nuanced analysis of data privacy concerns around the globe.

Key Insights

Can information privacy concerns be present in a Twitter dataset?

Mining text from social media platforms such as Twitter is a fast and inexpensive way to gather opinions from individuals, and it can complement findings obtained from traditional polls or other research methods. Following this line of research, we investigate whether Twitter data can reveal people's information privacy concerns. Thus, our first research question is: Which information privacy concerns are present in social media content about a data-breach scandal?

To answer this question, we first retrieved the terms most closely related to four privacy-related keywords: “data”, “privacy”, “user”, and “company” in multiple word embeddings. Word embeddings represent words as vectors that encode their meaning, such that words that are closer in the vector space are expected to be related. For instance, the three terms most related to ‘privacy’ in our English word embedding are “data privacy”, “gdpr”, and “protection.”
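The "most related terms" lookup described above is typically a nearest-neighbor search by cosine similarity in the embedding space. Below is a minimal sketch of that operation; the vocabulary and vectors here are invented purely for illustration (the paper trains its embeddings on the tweet corpus, and those vectors are not shown in this summary).

```python
import numpy as np

# Toy example: a tiny embedding matrix (5 words, 4 dimensions).
# The vocabulary and vector values are made up for illustration only.
vocab = ["privacy", "data privacy", "gdpr", "protection", "banana"]
vectors = np.array([
    [0.9, 0.1, 0.3, 0.0],   # privacy
    [0.8, 0.2, 0.4, 0.1],   # data privacy
    [0.7, 0.1, 0.5, 0.0],   # gdpr
    [0.6, 0.3, 0.2, 0.1],   # protection
    [0.0, 0.9, 0.0, 0.8],   # banana (unrelated filler term)
])

def most_related(keyword, vocab, vectors, k=3):
    """Return the k vocabulary terms closest to `keyword` by cosine similarity."""
    idx = vocab.index(keyword)
    target = vectors[idx]
    # Cosine similarity between the keyword vector and every vocabulary vector.
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(target)
    sims = vectors @ target / norms
    sims[idx] = -np.inf               # exclude the keyword itself
    top = np.argsort(sims)[::-1][:k]  # indices of the k highest similarities
    return [vocab[i] for i in top]

print(most_related("privacy", vocab, vectors))
# With these toy vectors, "banana" is far from "privacy" and is never returned.
```

Libraries such as gensim expose the same operation directly (e.g., `KeyedVectors.most_similar`); the sketch above just makes the cosine-similarity mechanics explicit.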

Collecting and analyzing the semantic contexts of these privacy-related keywords allows us to observe the presence of terms related to information privacy concerns in the collected tweets. We systematically conducted open coding of these terms. After several iterations, we developed a set of categories to characterize them. Finally, to assess whether information privacy concerns were present, we contrasted these categories with a widely accepted framework for describing internet users' information privacy concerns (IUIPC). We find relationships between several of our categories, the three IUIPC concepts, and our initial keywords (see Figure 1).

Figure 1: We identify several categories that can be easily mapped to the three dimensions of the Internet User Information Privacy Concerns (IUIPC): collection, awareness, and control. In this way, we find evidence that social media content can reveal information about privacy concerns.

Current conceptualizations of information privacy concerns might need to be extended

Our results suggest a more granular categorization of an IUIPC concept. Awareness might include more specific sub-topics that users can be aware of, such as privacy and security terms (e.g., cybersecurity, confidentiality), security mechanisms (e.g., credentials, encrypted), and privacy and security risks (e.g., scams, grooming). The presence of terms that fit these categories reveals that they are already part of public online conversations around privacy. A distinction among broad privacy and security terms, mechanisms to protect data, and potential data risks might help further describe the kinds of knowledge people have. Additionally, awareness of some of these sub-topics might be more influential than others. For example, knowing about risks and mechanisms might be a sign of higher privacy concerns, while knowing broad privacy and security terms might not. The distinction between sub-topics could also guide users', educators', and practitioners' efforts to enhance information privacy literacy.

Regulations are not only a topic of data and law experts

In addition, the presence of the regulation category highlights its importance in relation to information privacy concerns. Regulation refers to laws or rules that aim to govern the use of personal data. The emergence of this category from our open coding, through its frequent appearance in public posts about a data-breach scandal, confirms its relevance. Regulation is not only a topic for data and law experts; it appears to be part of the public discourse around data privacy online.

Language and regional differences in emphasis on information privacy concerns

English speakers emphasize data collection more than Spanish speakers.

Our analysis reveals that English speakers place significantly more emphasis on data collection than Spanish speakers when expressing themselves freely online about privacy keywords. This difference could lead researchers and practitioners to explore the effectiveness of data privacy campaigns tailored to specific populations. For example, populations concerned about collection might need more information about the benefits of sharing their information.
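A claim like "significantly more emphasis" is usually backed by a test comparing term rates between the two corpora. As a hedged illustration (the counts below are hypothetical, not from the paper), a two-proportion z-test on the rate of collection-related terms per language might look like this:

```python
import math

# Hypothetical counts (invented for illustration): how often terms coded into
# the "collection" category appear among privacy-related terms in each corpus.
english = {"collection_terms": 420, "other_terms": 9580}
spanish = {"collection_terms": 250, "other_terms": 9750}

def two_proportion_z(x1, n1, x2, n2):
    """Two-proportion z-test: is the rate x1/n1 different from x2/n2?"""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error under H0
    return (p1 - p2) / se

n_en = english["collection_terms"] + english["other_terms"]
n_es = spanish["collection_terms"] + spanish["other_terms"]
z = two_proportion_z(english["collection_terms"], n_en,
                     spanish["collection_terms"], n_es)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests a difference at the 0.05 level
```

The actual statistical procedure in the paper may differ; this sketch only shows the general shape of such a comparison.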

North American privacy concerns are not generalizable to other regions.

We also observe significant regional differences in awareness. In particular, data from North America shows the smallest emphasis on awareness, while Latin America shows the highest. Given that most studies on information privacy concerns center on the USA, this finding is particularly important. It warns us against the (sometimes implicit) assumption that North American privacy concerns generalize to other regions. Our result provides observational evidence that more diverse populations must be included to better understand the phenomena around data privacy. This finding also invites practitioners to address other regions, such as Latin America, with different services and privacy-policy approaches. For example, populations more concerned about awareness might be more receptive to companies that communicate their use of personal data more transparently.

Between the lines

Our paper uses an alternative approach to study information privacy concerns over a large geographical scope. This approach aims to discover knowledge from a large-scale social media dataset on a topic for which no ground truth exists. Unfortunately, such ground truth is unlikely to emerge, because large-scale, multi-country, multi-language surveys are too expensive to conduct (Li et al., 2020).

We carefully analyzed more than a thousand terms of the semantic contexts, conducted open coding to formulate a data-grounded categorization, and contrasted our categorization with IUIPC (Malhotra et al., 2004), one of the well-accepted theoretical conceptualizations of information privacy concerns. 

In our paper, we discuss how our findings can extend current conceptualizations of information privacy concerns. Finally, we examine how they might relate to regulations about personal data usage in the regions we analyzed. 

Future work can dig deeper into the observed differences and study the potential causes. Future studies might build upon our work to examine privacy concerns considering more languages, geographical locations, or different information privacy frameworks. Using our methodology to compare datasets across more extended periods could be helpful to determine whether the semantic contexts of the privacy keywords change over time.
