Dec 12, 2025



Jeff Morhous

Dec 12, 2025

The GPT-5.2 benchmarks are nuts. I'm very much enjoying comparing it to other models.

It's on my to-do list to take it for a spin on longer-running agentic tasks. At Pulley, we dreamed about setting up a model harness that could do cap table document analysis, and that's one of their demo examples!

That sounds like a really great use case! As a software engineer, I mostly watch the SWE-bench numbers, but I know there's a **ton** of emphasis on this model's performance in spreadsheets (especially BIG spreadsheets).


AI Roundup 148: GPT-5.2