AI Explained | The New Claude 3.5 Sonnet: Better, Yes, but...

A new state-of-the-art LLM (at least for creative writing and basic reasoning), but what lies behind the published numbers?

2024-10-23 19:00:00 - AI Explained

Is it for real, and are AI agents about to grab your mouse and shake your cursor? Plus, results on my own Simple Bench, and new tools from Runway (Act-One), HeyGen (Zoom Calls) and an updated NotebookLM. AI, without the hype.

00:00 – Introduction

00:57 – Claude 3.5 Sonnet (New) Paper

02:06 – Demo

02:58 – OSWorld

04:29 – Benchmarks compared + OpenAI Response

08:30 – Tau-Bench

13:09 – SimpleBench Results

17:05 – Yellowstone Detour

17:29 – Runway Act-One

18:44 – HeyGen Interactive Avatars + Demo

21:06 – NotebookLM Update
