Safety and Security

The Case for Anticipating Undesirable Consequences of Computing Innovations Early, Often, and Across Computer Science

January 23, 2024

🔬 Research Summary by Rock Yuren Pang, whose focus is on using HCI methods, crowdsourcing, and large language models to support researchers in anticipating the social impact of their work. [Original paper by Rock …

DICES Dataset: Diversity in Conversational AI Evaluation for Safety

January 22, 2024

🔬 Research Summary by Ding Wang, a senior researcher from the Responsible AI Group in Google Research, specializing in responsible data practices with a specific focus on accounting for the human experience and …

Defending Against Authorship Identification Attacks

January 18, 2024

🔬 Research Summary by Haining Wang, a Ph.D. student at Indiana University Bloomington, specializing in natural language processing and large language models. [Original paper by Haining Wang] Overview: …

Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models

January 14, 2024

🔬 Research Summary by Leyang Cui, a senior researcher at Tencent AI Lab. [Original paper by Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, …

Down the Toxicity Rabbit Hole: Investigating PaLM 2 Guardrails

January 3, 2024

🔬 Research Summary by Ashique KhudaBukhsh, an assistant professor at the Rochester Institute of Technology specializing in natural language processing, computational social science, and responsible AI. [Original …
