Research Summaries

Deployment corrections: An incident response framework for frontier AI models

January 25, 2024

🔬 Research Summary by Joe O’Brien, an Associate Researcher at the Institute for AI Policy and Strategy, focusing on corporate governance and accountability surrounding developing and deploying frontier AI …

Representation Engineering: A Top-Down Approach to AI Transparency

January 25, 2024

🔬 Research Summary by Andy Zou, a Ph.D. student at CMU, advised by Zico Kolter and Matt Fredrikson. He also cofounded the Center for AI Safety (safe.ai). [Original paper by Andy Zou, Long Phan, Sarah Chen, James …

Risky Analysis: Assessing and Improving AI Governance Tools

January 24, 2024

🔬 Research Summary by Kate Kaye, a researcher, author, award-winning journalist, and deputy director of the World Privacy Forum, a nonprofit, non-partisan, public-interest research group. Kate is a member of the OECD.AI …

Bridging Systems: Open Problems for Countering Destructive Divisiveness Across Ranking, Recommenders, and Governance

January 24, 2024

🔬 Research Summary by Luke Thorburn, a PhD student at King’s College London, where he works on the design of algorithms to mitigate conflict risks. [Original paper by Aviv Ovadya and Luke Thorburn] Overview: …

Levels of AGI: Operationalizing Progress on the Path to AGI

January 24, 2024

🔬 Research Summary by Meredith Ringel Morris, Director of Human-AI Interaction Research at Google DeepMind; she is also an Affiliate Professor at the University of Washington, and is an ACM Fellow and member of the ACM …
