<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Controlling powerful AI</title>
        <link>https://tube.grossholtz.net/videos/watch/d32d4279-ec9c-40b3-b197-146a2e08ec4e</link>
        <description>Anthropic researchers Ethan Perez, Joe Benton, and Akbir Khan discuss AI control: an approach to managing the risks of advanced AI systems. Topics include real-world evaluations showing how humans struggle to detect deceptive AI, the three major threat models researchers are working to mitigate, and the overall idea of controlling highly capable AI systems whose goals may differ from our own.

Chapters:
0:00 Introduction
0:33 What is AI control?
2:56 Control evaluations in practice
5:39 Results from evaluations
7:27 Monitoring protocols
13:18 How control differs from alignment
16:09 The challenge of alignment faking
23:10 Ensuring evaluations work for future models
26:09 Open questions in control research
34:15 Lessons learned from control
37:14 Why work on control now?
43:26 Key threat models
48:35 Optimistic signs</description>
        <lastBuildDate>Mon, 06 Apr 2026 03:04:39 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>PeerTube - https://tube.grossholtz.net</generator>
        <image>
            <title>Controlling powerful AI</title>
            <url>https://tube.grossholtz.net/client/assets/images/icons/icon-512x512.png</url>
            <link>https://tube.grossholtz.net/videos/watch/d32d4279-ec9c-40b3-b197-146a2e08ec4e</link>
        </image>
        <copyright>All rights reserved, unless otherwise stated in the terms at https://tube.grossholtz.net/about and in any licenses granted by each content's rights holder.</copyright>
        <atom:link href="https://tube.grossholtz.net/feeds/video-comments.xml?videoId=d32d4279-ec9c-40b3-b197-146a2e08ec4e" rel="self" type="application/rss+xml"/>
    </channel>
</rss>