Gemini sets a new record on Simple Bench and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, and may have a universal ‘conceptual language’.
00:00 - Introduction
00:36 - Fiction Bench
02:41 - Practicality - YouTube URLs + Security - cut-off date
03:42 - Coding
06:22 - WeirdML Bench
07:01 - Simple Bench Record High
11:23 - Reverse Engineering!
13:22 - Anthropic Paper
17:49 - 3 Caveats