ai-explained 18 hours ago

AI Explained | When Will AI Models Blackmail You, and Why?

In the last few days, Anthropic have released an impressively honest account of how all models blackmail, no matter what goal they have, and despite prompt warnings and other preventive measures. But do these models want this?

00:00 - Introduction

01:20 - What prompts blackmail?

02:44 - Blackmail walkthrough

06:04 - ‘American interests’

08:00 - Inherent desire?

10:45 - Switching Goals

11:35 - Murder

12:22 - Realizing it’s a scenario?

15:02 - Prompt engineering fix?

16:27 - Any fixes?

17:45 - Chekhov’s Gun

19:25 - Job implications

21:19 - Bonus Details
