🔬 Original article by Chloé Currie, Itai Epstein, Dane Malenfant and Gayathri Rajendran from Encode Justice Canada
This is a part of our Recess series in which university students from across Canada briefly explain key concepts in AI that young people should know about: specifically, what AI does, how it works, and what it means for you. The writers are members of Encode Justice Canada, a student-led advocacy organization dedicated to including Canadian youth in the essential conversations about the future of AI.
As the world becomes more digital and more reliant on advanced technologies in everyday life, concern about the security of our personal data is growing. Individual citizens are at a higher risk of being tracked and analyzed on the Internet than ever before, with the uses of personal data ranging from the mundane, like targeted ads on Facebook, to the more dangerous, such as credit card theft. Personal data is collected in many ways, including through third-party operators or public sources, such as Google, and collection can happen without the explicit consent of a user. On many websites, there are Terms of Service or ‘cookies’ to accept before entering the site. According to a 2017 Deloitte survey of 2000 American consumers, over 90% of people consent to terms and conditions without reading them. Internet users often unknowingly consent to their personal information being collected by ‘Big Data’ companies, but there are also instances of websites using undetectable trackers to collect data without consent. The lack of public knowledge about what is collected and how it is collected raises a significant number of questions about the ethics of personal data collection. As the number of ‘Big Data’ corporations increases and threats to data security become more apparent, the general public must understand how their personal data is collected and used to ensure the right to privacy is not violated.
How your personal data is collected
According to the UK’s Information Commissioner’s Office (ICO), personal data encompasses any “information that refers to a living individual who may be identified.”  Personal data includes several kinds of information that, when combined, lead to the identification of a certain person. Artificial Intelligence (AI) is a vital tool for data capture, analysis, and information collection that many businesses use for a variety of goals, such as better understanding day-to-day operations, making more educated business decisions, and learning about their customers.  Recently, customer data has become a larger focus area as firms acquire, store, and analyze vast volumes of quantitative and qualitative data about their customer base, ranging from consumer behaviour to predictive analytics. Some organizations have based their business model around customer data, whether they are selling personal information to a third party or developing targeted advertisements. 
Businesses use data collection to profile customers and target the sale of goods and services to them based on their traits and behaviours. According to the results of a 2019 survey conducted by the Pew Research Center, 77% of Americans had heard or read information about how firms and other organizations use personal data for targeted marketing. Furthermore, 61% of people who have seen advertisements based on their personal data believe the ads reflect their interests and characteristics at least somewhat accurately, yet only a minority of participants view company use of personal data as acceptable.
Additional examples of personal data use by corporations and firms include social media monitoring for mental illness and voice assistants sharing audio with law enforcement. Of those surveyed, 47% believe it is inappropriate for social media corporations to monitor individuals’ postings for indicators of specific mental illnesses such as depression in order to identify those who are in danger of self-harm and connect them to counselling services, while 27% support the practice. Similarly, when firms that create voice assistants share consumer audio recordings with law enforcement to aid criminal investigations, the same pattern emerges: 49% find it objectionable, while 25% find it acceptable. These two examples highlight the divide in public opinion on how customer data is used, as well as concerns about the non-consensual use of personal data.
Consent and collection
Personal data is collected at nearly every step of internet usage. When a user connects to the web, some company has likely made a log of multiple data points that can be used to identify them: where they connected from, the device being used, its operating system, and the browser they used. While much of this data may seem insignificant and impersonal, these surface-level data points make up only a small portion of what constitutes personal data. The amount of data that companies collect every day, along with the methods used to do so, can be alarming when one considers the sheer amount of data that exists on every individual.
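To make this concrete, here is a minimal sketch of how a handful of unremarkable data points could be combined into a single identifier that recognizes a returning visitor. It is an illustration only, not any particular company's method: the function name `fingerprint`, the choice of attributes, and all the sample values are assumptions made for this example.

```python
import hashlib


def fingerprint(ip_address: str, user_agent: str, language: str) -> str:
    """Combine individually harmless request details into one stable ID.

    Hypothetical helper: real trackers typically use many more signals.
    """
    raw = "|".join([ip_address, user_agent, language])
    # Hashing hides the raw values but still yields the same ID every visit.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


# Made-up values of the kind a web server can see on a request.
FIREFOX_ON_WINDOWS = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0"
SAFARI_ON_MAC = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15"

visit_monday = fingerprint("203.0.113.7", FIREFOX_ON_WINDOWS, "en-CA")
visit_friday = fingerprint("203.0.113.7", FIREFOX_ON_WINDOWS, "en-CA")
other_visitor = fingerprint("203.0.113.7", SAFARI_ON_MAC, "fr-CA")

print(visit_monday == visit_friday)   # same device is recognized again
print(visit_monday == other_visitor)  # a different device gets a different ID
```

No name or email address appears anywhere in the sketch, yet the two visits from the same device are linked. This is why data points that look impersonal on their own can still count toward identifying a person.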
Most personal data is collected with some form of consent, though how informed that consent is varies widely. When visiting a website, users are often prompted with a message asking whether they agree to the use of certain cookies or to a long and indecipherable Terms of Service (ToS) agreement. Almost all companies now use these techniques to get consent from users to track their online habits and browsing data. By giving consent to companies like Meta (previously Facebook), Google, and Amazon, users permit them to track and collect copious amounts of data, whether the individual knows it or not.
Furthermore, there are more covert ways in which data has been collected, such as Session Replay. Users often do not know about these methods of data collection because, unlike a ToS that must be agreed to, these trackers are enabled as soon as the website loads, logging how users interact with the site and recording every keystroke and mouse movement. For instance, Shoppers Drug Mart previously used the software company FullStory to track user input and interactions on its website. Though considered to be a simple tracker, FullStory was able to capture a consumer’s credit card details, including the card number, expiration date, and security code, on the website of the American online retailer Bonobos. This kind of activity is not limited to one company: others, such as WhatsApp, have also shared individual data non-consensually. WhatsApp is an international messaging service that guarantees the security of messages, but, in actuality, everything related to users’ devices and engagement with the app is collected and shared with Meta, WhatsApp’s parent company. Through these techniques, companies can harvest data from their users in ways unknown to the consumer.
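The keystroke logging described above can be sketched in a few lines. This is a simplified simulation written for this article, not FullStory's actual software: the names `InputEvent`, `record`, `replay`, and `SENSITIVE_FIELDS`, along with the form field names, are all hypothetical. It shows why a recorder that logs everything will also log payment details unless those fields are deliberately masked.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    field_name: str  # which form field the user typed into
    key: str         # the character that was typed


# Assumed field names; a real site would define its own.
SENSITIVE_FIELDS = {"card_number", "expiry", "security_code"}


def record(events, mask_sensitive=False):
    """Log every keystroke, optionally replacing sensitive ones with '*'."""
    log = []
    for event in events:
        hide = mask_sensitive and event.field_name in SENSITIVE_FIELDS
        log.append((event.field_name, "*" if hide else event.key))
    return log


def replay(log, field_name):
    """Rebuild what was typed into one field from the recorded log."""
    return "".join(key for name, key in log if name == field_name)


# A made-up session: a search followed by the start of a card number.
typed = [InputEvent("search", c) for c in "socks"]
typed += [InputEvent("card_number", c) for c in "4111"]

unmasked_log = record(typed)
masked_log = record(typed, mask_sensitive=True)

print(replay(unmasked_log, "card_number"))  # the digits are recoverable
print(replay(masked_log, "card_number"))    # only asterisks were stored
```

In this sketch the only difference between capturing a card number and protecting it is a single setting that the user never sees, which is why transparency about how such tools are configured matters.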
The ethical implications of data collection
Over the last decade, the collection of large datasets and ‘Big Data’ has revolutionized many fields by allowing analytics to use high-dimensional computing for scientific and business insights. However, public knowledge of what is collected is limited, and this has led to many privacy issues. Following many controversies regarding human subjects, the last century saw the implementation of robust and strict research ethics, specifically the practice of informed consent. With the arrival of the information age, these ethical questions carried over into the digital space. For example, MySpace was a social networking platform created in 2003 that differentiated itself from the competition at the time by introducing highly customizable user features. These personalizations led MySpace to become the largest social networking site from 2005 until 2009. Its large virtual community was a rich source of data for researchers, but the public nature of the social network raised the question of whether informed consent was required for data collection.
In her work Consent in Cyberspace, Merle Spriggs investigates this problem by considering various ethical issues surrounding online research and the collection of qualitative data. The paper mainly focuses on obtaining parental consent when users are underage, with the mitigation of potential harm or risk of exploitation as one of the prominent issues discussed. Spriggs concludes that there is no exact solution to the ethics of data collection, but emphasizes the importance of acknowledging subjects’ perceptions of privacy while including informed and voluntary consent where subjects are in control of their participation in research. With the introduction of large databases and the improvements in computation speed seen in the 2020s, these ethical concerns have resurfaced.
A key issue is the maintenance and security of data. Large databases are often targeted by malicious third-party actors, and therefore pose a substantial cyber risk. In 2020, Clearview AI, an American facial recognition company, had a major data breach, revealing the names of several international law enforcement agencies using the technology as well as the images collected. A data breach of this size is concerning to the general public because of the sensitive facial identity information that the company has amassed. The Clearview AI scandal emphasizes the complicated legal status of AI technology and the need to consider the ethics behind data collection. The situation with Clearview AI also highlights the need for transparency between technology companies and private citizens, especially when those citizens could be at risk.
Data security is a difficult concept to grasp, as there are multiple complexities one must understand in order to know how data collection works and what can be done to protect each individual. These complexities include the differentiation between personal and customer data, consensual and non-consensual methods of collection, and the influence of ‘Big Data’ companies. Data is collected in a number of ways, both consensually through ToS and cookies and non-consensually through trackers, which can seem daunting to the average consumer. To improve the comfort levels of the general public, a greater level of transparency and education surrounding the use of artificial intelligence in data collection is necessary, along with greater restrictions on companies that collect data with ill intent. With specific policies and laws governing data collection in place, citizens can feel more secure when using the internet, knowing that their personal data will not be used maliciously.
 Brooke Auxier et al., “Americans and Privacy: Concerned, Confused, and Feeling a Lack of Control over their Personal Information,” Pew Research Center: Internet, Science & Technology, published August 17, 2020: https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/.
 Caroline Cakebread, “You’re not alone, no one reads terms of service agreements,” Business Insider, published November 15, 2017: https://www.businessinsider.com/deloitte-study-91-percent-agree-terms-of-service-without-reading-2017-11.
 Information Commissioner’s Office, “Guide to the General Data Protection Regulation (GDPR),” ICO, published January 1, 2021: https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/.
 Max Freedman, “How and Why Businesses Collect Your Personal Data,” Business News Daily, published June 17, 2020: https://www.businessnewsdaily.com/10625-businesses-collecting-data.html.
 Direct Market Services Ltd., “How is Your Personal Data Collected?” DMSL WebSite, published May 18, 2018: https://www.dmsluk.co.uk/privacy-policy/personal-data-collected/.
 Auxier et al., “Americans and Privacy.”
 Louise Matsakis, “The Wired Guide to Your Personal Data (And Who Is Using It),” Wired, published February 15, 2019: https://www.wired.com/story/wired-guide-personal-data-collection.
 Robert Heaton, “How Does Online Tracking Actually Work?” Robert Heaton WebSite, published November 20, 2017: https://robertheaton.com/2017/11/20/how-does-online-tracking-actually-work/.
 Nitasha Tiku, “You’re Browsing a Website. These Companies May Be Recording Your Every Move,” Wired, published November 16, 2017: https://www.wired.com/story/the-dark-side-of-replay-sessions-that-record-your-every-move-online/.
 The FullStory Team, “Leading Canadian retailers Embrace Digital Experience Intelligence,” FullStory, published September 24, 2021: https://www.fullstory.com/blog/canadian-retailers-dxi/.
 Tiku, “You’re Browsing a Website.”
 Lily H. Newman, “WhatsApp Has Shared Your Data with Facebook for Years, Actually,” Wired, published January 8, 2021: https://www.wired.com/story/whatsapp-facebook-data-share-notification/.
 David B. Resnik, “Research Ethics Timeline,” National Institute of Environmental Health Sciences, reviewed November 21, 2021: https://www.niehs.nih.gov/research/resources/bioethics/timeline/index.cfm, and John P.A. Ioannidis, “Informed Consent, Big Data, and the Oxymoron of Research that is Not Research,” The American Journal of Bioethics 13, no.4 (2013): 40–42.
 Ross Dunn, “MySpace’s Rise and Fall – a Timeline of Highs and Lows,” StepForth Web Marketing Inc, published December 16, 2020: https://www.stepforth.com/blog/2011/myspace-timeline-history/.
 Merle Spriggs, “Consent in Cyberspace,” Monash Bioethics Review 28, no. 4 (2009): 25–39.
 Kate O’Flaherty, “Clearview AI, the Company whose Database has Amassed 3 Billion Photos, Hacked,” Forbes, published February 26, 2020: https://www.forbes.com/sites/kateoflahertyuk/2020/02/26/clearview-ai-the-company-whose-database-has-amassed-3-billion-photos-hacked/.