Universal and Transferable Adversarial Attacks on Aligned Language Models
🔬 Research Summary by Andy Zou, a second-year PhD student at CMU, advised by Zico Kolter and Matt Fredrikson. He is also a cofounder of the Center for AI Safety (safe.ai). [Original paper by Andy Zou, Zifan …
Adding Structure to AI Harm
🔬 Research Summary by Mia Hoffmann and Heather Frase. Dr. Heather Frase is a Senior Fellow at the Center for Security and Emerging Technology, where she leads the line of research on AI Assessment. Together …
Dual Governance: The intersection of centralized regulation and crowdsourced safety mechanisms for Generative AI
🔬 Research Summary by Avijit Ghosh and Dhanya Lakshmi. Dr. Avijit Ghosh is a Research Data Scientist at AdeptID and a Lecturer in the Khoury College of Computer Sciences at Northeastern University. He works at …
A Holistic Assessment of the Reliability of Machine Learning Systems
🔬 Research Summary by Anthony Corso, Ph.D., Executive Director of the Stanford Center for AI Safety, who studies the use of AI in high-stakes settings such as transportation and sustainability. [Original paper by …
Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools
🔬 Research Summary by Shrestha Rath, a biosecurity researcher at Effective Ventures Foundation in Oxford. [Original paper by Jonas B. Sandbrink] Overview: Should ChatGPT be able to give you step-by-step …