Universal and Transferable Adversarial Attacks on Aligned Language Models
🔬 Research Summary by Andy Zou, a second-year PhD student at CMU, advised by Zico Kolter and Matt Fredrikson. He is also a cofounder of the Center for AI Safety (safe.ai). [Original paper by Andy Zou, Zifan …
Adding Structure to AI Harm
🔬 Research Summary by Mia Hoffmann and Heather Frase. Dr. Heather Frase is a Senior Fellow at the Center for Security and Emerging Technology, where she leads the line of research on AI Assessment. Together …
Dual Governance: The intersection of centralized regulation and crowdsourced safety mechanisms for Generative AI
🔬 Research Summary by Avijit Ghosh and Dhanya Lakshmi. Dr. Avijit Ghosh is a Research Data Scientist at AdeptID and a Lecturer in the Khoury College of Computer Sciences at Northeastern University. He works at …
A Holistic Assessment of the Reliability of Machine Learning Systems
🔬 Research Summary by Anthony Corso, Ph.D., Executive Director of the Stanford Center for AI Safety, who studies the use of AI in high-stakes settings such as transportation and sustainability. [Original paper by …
Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools
🔬 Research Summary by Shrestha Rath, a biosecurity researcher at Effective Ventures Foundation in Oxford. [Original paper by Jonas B. Sandbrink] Overview: Should ChatGPT be able to give you step-by-step …