Selecting Privacy-Enhancing Technologies for Managing Health Data Use

June 1, 2023

🔬 Research Summary by Rachele Hendricks-Sturrup, Clara Fontaine, and Sara Jordan

Dr. Rachele Hendricks-Sturrup is the Research Director of Real-World Evidence (RWE) at the Duke-Margolis Center for Health Policy in Washington, DC, strategically leading and managing the Center’s RWE Collaborative.

Clara Fontaine is a Ph.D. student in the Centre for Quantum Technologies at NUS.

Dr. Sara R. Jordan was Senior Researcher, Artificial Intelligence and Ethics at the Future of Privacy Forum.

[Original paper by Rachele Hendricks-Sturrup, Clara Fontaine, and Sara Jordan]


Overview: In the 21st century, real-world data privacy is possible using privacy-enhancing technologies (PETs) or privacy engineering strategies. This paper draws on the literature to summarize privacy engineering strategies that have facilitated the use and exchange of health data across various practical use cases.


Introduction

Today, real-world data privacy remains controversial and elusive, driving ongoing debate among privacy researchers, health industry members, policymakers, and others about how best to safeguard both patient and consumer health data in more modern ways. Researchers and other data management experts have demonstrated how real-world data can be generated, linked, processed, and shared in both privacy-preserving and identity-revealing ways. Few, if any, have broadly explored strategies through which real-world data privacy can be preserved using PETs or privacy engineering.

In this paper, the authors scoped the state of the literature and knowledge on privacy engineering strategies that, to date, have facilitated the use and exchange of health data. Key findings yielded three general categories of PETs for health data: algorithmic, architectural, and augmentation-based PETs.

Often combined, those three general categories of PETs fill privacy, security, or data sovereignty needs across a range of practical use cases involving health data.

Key Insights

PETs Defined and Explained

Defined broadly, a privacy-enhancing technology is a technical means that can protect a user’s privacy through policy, interaction, and/or architecture. To enable more critical analysis and comparison of the applicability of these technologies to health data, we identified and categorized seven PETs.

Algorithmic PETs

These PETs represent data in a privacy-protecting but still useful way, providing mathematical rigor and measurability to privacy (a sketch of one appears after the list below).

  1. Homomorphic encryption
  2. Differential privacy
  3. Zero-knowledge proofs
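
To make the algorithmic category concrete, below is a minimal sketch of the Laplace mechanism, the textbook differential privacy primitive. The patient records and epsilon values are hypothetical illustrations, not material from the paper.

```python
import numpy as np

def dp_count(records, predicate, epsilon):
    """Differentially private count. A count query has sensitivity 1
    (adding or removing one record changes it by at most 1), so Laplace
    noise with scale 1/epsilon yields epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical patient records for illustration only.
patients = [
    {"age": 34, "diabetic": True},
    {"age": 61, "diabetic": False},
    {"age": 47, "diabetic": True},
    {"age": 29, "diabetic": False},
]

# Smaller epsilon means stronger privacy but noisier answers: the
# privacy-utility tradeoff noted in the summary table below.
for eps in (0.1, 1.0, 10.0):
    print(eps, dp_count(patients, lambda r: r["diabetic"], eps))
```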

Architectural PETs

These PETs enable confidential information exchange without sharing the underlying data, using a structured computation environment (a sketch appears after the list below).

  1. Federated learning
  2. Multi-party computation
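
As a rough sketch of the architectural idea, the toy example below does one-shot model averaging in the spirit of federated learning: each site fits a model on its own data, and only model parameters, never raw records, leave the site. The linear-regression setup and hospital data are hypothetical.

```python
import numpy as np

def local_fit(X, y):
    """Each site fits a least-squares model on its own data; raw
    records never leave the site, only these parameters do."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def federated_average(local_weights, sample_counts):
    """A central server combines site models, weighted by sample
    count (the FedAvg-style aggregation step)."""
    total = sum(sample_counts)
    return sum(w * (n / total) for w, n in zip(local_weights, sample_counts))

# Three hypothetical hospitals, each holding private data generated
# from the same underlying relationship y = 2*x0 + 1*x1.
rng = np.random.default_rng(0)
sites = []
for n in (50, 80, 30):
    X = rng.normal(size=(n, 2))
    y = X @ np.array([2.0, 1.0]) + rng.normal(scale=0.1, size=n)
    sites.append((X, y))

weights = [local_fit(X, y) for X, y in sites]
counts = [len(y) for _, y in sites]
print(federated_average(weights, counts))  # close to [2.0, 1.0]
```

Note that, as the summary table below cautions, this architecture alone provides no formal privacy guarantee for the shared parameters themselves.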

Augmentation PETs

These PETs generate realistic data to enhance small datasets or to produce fully synthetic datasets (a sketch appears after the list below).

  1. Synthetic data
  2. Digital twinning
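
A toy illustration of the augmentation idea: fit a simple generative model to real records, then sample synthetic ones that preserve aggregate statistics while corresponding to no actual individual. The multivariate Gaussian here is the simplest possible generator; realistic health data would need far richer models, and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "real" records: columns are age and systolic blood pressure.
real = rng.multivariate_normal(mean=[55, 130],
                               cov=[[100, 40], [40, 150]], size=500)

# Fit a minimal generative model (sample mean and covariance), then
# draw fully synthetic records from the fitted distribution.
mu = real.mean(axis=0)
sigma = np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean=mu, cov=sigma, size=500)

# Aggregate statistics track the real data, which is exactly why the
# representativeness of synthetic data must be validated.
print("means:", real.mean(axis=0), synthetic.mean(axis=0))
print("correlations:", np.corrcoef(real, rowvar=False)[0, 1],
      np.corrcoef(synthetic, rowvar=False)[0, 1])
```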

State of the Peer-Reviewed Literature

In the United States, real-world uses of health data require that the data be protected with the highest privacy standards set by HIPAA. However, HIPAA does not precisely define what constitutes these standards beyond removing 18 personal identifiers; any stronger protections must be established through expert determination.
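
As a toy illustration of the safe-harbor mechanics, the sketch below strips direct identifier fields from a record before release. The field list is a small hypothetical subset of HIPAA’s 18 identifier categories, not the full rule.

```python
# Hypothetical subset of HIPAA safe-harbor identifier fields; the
# actual rule enumerates 18 categories, including dates and geography.
IDENTIFIER_FIELDS = {"name", "address", "phone", "email", "ssn", "mrn"}

def strip_identifiers(record: dict) -> dict:
    """Drop direct identifiers. Real safe-harbor de-identification also
    covers quasi-identifiers, which this toy sketch does not handle."""
    return {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}

record = {"name": "Jane Doe", "mrn": "A12345",
          "age": 52, "diagnosis": "type 2 diabetes"}
print(strip_identifiers(record))  # {'age': 52, 'diagnosis': 'type 2 diabetes'}
```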

An expert could defensibly turn to peer-reviewed literature on the topic to make their determination. As domain experts in health data privacy, machine learning, and computer science, we sought to experience and assess this process of expert determination ourselves. We examined the state of peer-reviewed literature at the intersection of PETs and health data applications and evaluated the robustness, consistency, transparency, and usefulness of the results presented in relevant papers retrieved from ACM, IEEE, and PubMed.

Challenges for Expert Determination of Usefulness of Peer-Reviewed Literature on PETs and Health Data

Within our team of three experts, we independently evaluated each article against two criteria: 

  1. Applicability to the health data context
  2. The rigor of testing of the PET (i.e., quality, quantity, diversity of datasets)

Although we worked with the same rubric and consulted one another often for clarification, we evaluated the PET literature differently. Our expert determinations depended heavily on knowledge of health conditions, recognition of standard benchmark datasets, and understanding of machine learning performance metrics. Glaring disparities in the quality of performance characterization in these articles (details on privacy-utility tradeoffs, computational time, and hardware constraints) amplified the differences in our determinations. We achieved only two-thirds similarity on over 80% of the literature, raising the questions: What makes a relevant expert? And what are the limits of expert collaboration?

Key Characteristics and Considerations for Each PET

To move the health data privacy community forward on this topic, we provide the referenced review and the summary table titled “Key Characteristics and Considerations of Each PET.” The table gives an overview of each algorithmic, architectural, and data augmentation PET described, its specific use cases, the pros and cons of using it in health data contexts, and opportunities for future research.

Table: Key Characteristics and Considerations for Each PET

Differential Privacy
  • Description: Adds noise to a dataset to reduce an adversary’s ability to tell whether an individual is part of the dataset. Some variations improve data utility at the cost of weaker privacy protection.
  • Use cases: Publishing or sharing data to satisfy research needs.
  • Pros: Provides measurable privacy guarantees.
  • Cons: Privacy-utility tradeoff; inapplicable to time-series data; under-represented or “unique” minority data may not be well characterized.
  • Future research: Comparable and consistent reporting, across DP variations, of the types and granularity of at-risk private information.

Homomorphic Encryption
  • Description: Encryption scheme that enables private computation over encrypted sensitive data. Comes in partial, somewhat, and fully homomorphic variants.
  • Use cases: Third-party computation; data storage and processing.
  • Pros: Provides a high level of privacy; compatible with most data types.
  • Cons: Inefficient, expensive, and complex; not well suited to resource-constrained environments.
  • Future research: Explore more diverse and lightweight variations of HE, especially for resource-constrained environments; analyze performance-privacy tradeoffs carefully.

Zero-Knowledge Proofs
  • Description: Verification of sensitive data between collaborators without explicitly transferring the data.
  • Use cases: Identity and attribute verification.
  • Pros: No direct transfer of sensitive health data; efficient in space, power, and computation.
  • Cons: Applications are poorly characterized and infrequently discussed in health data research.
  • Future research: Explore practical applications with health data and characterize performance and privacy.

Federated Learning
  • Description: Collaborative ML modeling while keeping training data local to data owners. Can be decentralized or centralized for both data and model.
  • Use cases: Collaborative ML with theoretically any type of algorithm or data.
  • Pros: Enables ML training with more diverse data; reduced computational load for institutions or devices; private data never moves beyond the firewalls of institutions or devices; provides a high level of data sovereignty to owners.
  • Cons: No true privacy baseline across the learning system; scalability depends on collaboration and stable communication between otherwise sovereign and asymmetrical devices or institutions; aggregated data is not necessarily independent and identically distributed.
  • Future research: Identify when a federated approach is the best choice for the specific purpose of protecting data privacy; address challenges of interoperability; consistently characterize the tradeoffs between privacy, utility, and performance across different FL approaches to aid decision-making.

Multi-Party Computation
  • Description: Computation across multiple encrypted data sources while ensuring no party learns the private data of another. Includes secret sharing, garbled circuits, and oblivious transfer.
  • Use cases: Collaborative inference; third-party model training.
  • Pros: Strong privacy protections for all participating parties; no need for a trusted third party; high accuracy and precision.
  • Cons: Communication and computational complexity are too high for reasonable use at scale and in resource-constrained environments; privacy-accuracy tradeoff.
  • Future research: Develop more practical SMC solutions for resource-constrained environments and computations at scale.

Synthetic Data
  • Description: Synthesizing data to use instead of, or in addition to, real health data.
  • Use cases: Supports rapid development and benchmarking of ML algorithms; balancing data with uneven representation; augmenting datasets; measuring the utility loss of algorithmic PETs.
  • Pros: May be the most effective way to maximize privacy; increasingly easy and cost-efficient to implement.
  • Cons: Limited methods to generate realistic data; limited types of data that can be synthesized; need to validate that synthetic data is representative of real data; should be restricted to secondary uses.
  • Future research: Develop diverse methods to generate realistic synthetic data of all data types.

Digital Twinning
  • Description: Virtual representations of what has been manufactured.
  • Use cases: A virtual counterpart to persons or hospitals for testing tools such as ML models.
  • Pros: A real-time simulated environment without risk of exposing private data.
  • Cons: Application in healthcare is primarily theoretical; privacy protections (e.g., risk of re-identification) are not well characterized.
  • Future research: Develop practical applications of digital twins in healthcare; characterize privacy protections.

Expertly Choosing PETs for Health Data

Experts in health data management, public health, computer science, and privacy engineering may face certain challenges in their collaborative attempts to ascertain the state of the literature on PETs and health data. Because the HIPAA expert determination pathway continually requires experts to rely on their knowledge of state-of-the-art technical methods to balance disclosure risk and data utility, the following themes from the literature are essential to remember:

  1. Not all types of health data can be protected with each PET; the PETs vary drastically in performance, data utility, and/or the computational resources required.
  2. Architectural PETs create opportunities for data sovereignty and privacy in the aggregate, but they do not protect privacy locally. 
  3. The state of the art in PETs research for health data often relies on well-worn benchmark datasets from other domains or on specific health data types, so results cannot easily be extended to health data in the wild.
  4. Aside from fully synthetic data, combining architectural and algorithmic approaches is the emerging best practice in the research literature (see the sketch after this list).
  5. When to use a specific PET is context-dependent. The choice of PET in a pre-inferential setting will vary from choices made in a post-inferential setting. Continuous use of a single method will likely result in unintended privacy or performance loss.
  6. None of the algorithmic or architectural PETs can guarantee zero risk of reidentification. Fully synthetic data is the closest to providing such guarantees.
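
To illustrate theme 4, the sketch below combines an architectural pattern (raw values stay at each site; only summaries are shared) with an algorithmic one (each shared summary is made differentially private before release). The sites, value ranges, and epsilon are hypothetical, not drawn from the paper.

```python
import numpy as np

def noisy_local_mean(values, lower, upper, epsilon, rng):
    """Each site clips its values to a known range and releases a
    differentially private mean; the sensitivity of a mean of n
    values clipped to [lower, upper] is (upper - lower) / n."""
    v = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(v)
    return v.mean() + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(2)

# Hypothetical per-site cholesterol readings: raw values never leave
# the site (architectural layer); only noised means do (algorithmic layer).
site_data = [rng.normal(200, 25, size=n) for n in (120, 90, 150)]
noisy_means = [noisy_local_mean(d, 100, 300, epsilon=1.0, rng=rng)
               for d in site_data]

# A central aggregator combines the DP summaries, weighted by site size.
counts = [len(d) for d in site_data]
print(np.average(noisy_means, weights=counts))
```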

Between the lines

An important takeaway from this paper is that it offers a critical yet convenient starting point for experts collaborating across health data management, public health, computer science, and privacy engineering practices. We identified three sentinel examples of work with carefully written descriptions of the PETs used and their relevant applications, and with rigorously reported methods and findings.

Yet a great deal of work remains on intentionally integrating PETs into the day-to-day practices of health data privacy and security experts and on creating more robust and standardized guidance for relevant practitioners and stakeholders. Understanding the benefits and limitations of each PET is mission-critical for legal and data experts managing, or advising on the management of, sensitive health data.

Experts should continue to systematically explore a broad range of literature to develop formalized recommendations and disseminate them to policymakers and healthcare system stakeholders seeking to operationalize the potential, utility, and acceptability of PETs in support of public health research and practice.
