🔬 Research Summary by Jeff Johnston, an independent researcher working on envisioning positive futures, AI safety and alignment via law, and Piaget-inspired constructivist approaches to artificial general … [Read more...] about A Case for AI Safety via Law
AI Deception: A Survey of Examples, Risks, and Potential Solutions
🔬 Research Summary by Dr. Peter S. Park and Aidan O’Gara. Dr. Peter S. Park is an MIT AI Existential Safety Postdoctoral Fellow and the Director of StakeOut.AI. Aidan O’Gara is a research engineer at the … [Read more...] about AI Deception: A Survey of Examples, Risks, and Potential Solutions
AI and Great Power Competition: Implications for National Security
🔬 Research Summary by Arun Teja Polcumpally, a Technology Policy Analyst at the Wadhwani Institute of Technology Policy (WITP), New Delhi, India. [Original paper by Eric Schmidt] Overview: This research … [Read more...] about AI and Great Power Competition: Implications for National Security
Universal and Transferable Adversarial Attacks on Aligned Language Models
🔬 Research Summary by Andy Zou, a second-year PhD student at CMU advised by Zico Kolter and Matt Fredrikson. He is also a co-founder of the Center for AI Safety (safe.ai). [Original paper by Andy Zou, Zifan … [Read more...] about Universal and Transferable Adversarial Attacks on Aligned Language Models
Adding Structure to AI Harm
🔬 Research Summary by Mia Hoffmann and Heather Frase. Dr. Heather Frase is a Senior Fellow at the Center for Security and Emerging Technology, where she leads its research on AI Assessment. Together … [Read more...] about Adding Structure to AI Harm