The curious case of the o3 model & more (April 21, 2025)

Hey friends,

If you thought last week was wild, buckle up. We’ve got hallucinating support agents, robots running marathons, and models that lowkey know where you live. This week in AI felt like we got one step closer to sentient interns and fully automated office life—with a splash of uncanny valley.

Let’s dig in.

🔮 The o3 Mystery Machine

o3 is watching you (maybe)
OpenAI’s new reasoning model, o3, is doing some questionable things. It refers to itself as part of a team (“we”), guesses users’ names, and pinpoints where photos were taken, like a spy tool.

Performance vs. Personality
While o3 shines in long-context tasks, Epoch AI’s independent FrontierMath testing found it scored around 10%, far below the 25% OpenAI had claimed. Critics are calling out transparency gaps and safety oversights, especially after OpenAI quietly changed its policy to monitor political risks after deployment instead of before.

Also in model land:

xAI launched Grok 3 Mini, beating benchmarks while being 20x cheaper than rivals
Google’s Gemma 3 now runs on consumer GPUs (hello, laptop-friendly models)
ChatGPT is now using your name and memory for search personalization (and some people are not vibing with it)

🧠 DeepMind’s Experiential Pivot + The Automation Agenda

Goodbye datasets, hello real-world learning
DeepMind researchers dropped a bold idea: stop teaching AIs mainly with human-curated data and let them learn by doing. Their proposed “streams” of experience have AI agents learn continuously in dynamic environments from real-time feedback, opening the door to agents that keep adapting and improving over time.

Meanwhile in jobpocalypse news…
Epoch co-founder Tamay Besiroglu launched Mechanize, a startup aiming for “full automation of all work.” They’re training AI agents in simulated office environments to eventually replace white-collar jobs. That’s right: this one’s gunning for your email, calendar, and possibly your paycheck.

Cursor AI blundered big
Cursor’s AI support bot hallucinated a nonexistent “single-device policy,” sparking mass outrage among users. The company is now labeling AI-generated responses and issuing refunds.

🛠️ Tool Time

A few shiny new things to play with:

Seedream 3.0 – ByteDance’s image model that’s beating Midjourney in photo realism and text rendering. Also introduces SeedEdit for precise visual tweaks.
Relume AI – Build high-converting websites with a prompt and a click.
AnyVoice – Clone hyper-realistic voices from just 3 seconds of audio.
Autoblocks AI – Ship reliable AI apps with testing + collaboration built-in.

✨ Prompt of the Day

Turn ChatGPT into your intern
Prompt: Act as my proactive, detail-oriented virtual intern. Help me with [tasks], ask clarifying questions, and suggest automation or optimization wherever possible. Be friendly, professional, and efficient.

Use it when you want AI to do the thing, not just talk about it.

⚡ TL;DR

o3 is getting spooky and raising red flags, ChatGPT is calling users by name, and Cursor’s support bot hallucinated a policy
Mechanize wants to automate all jobs (yes, all)
DeepMind says it's time to let AI experience the world instead of just reading about it
New tools are making it easier than ever to build apps, clone voices, and generate 2K images in 3 seconds flat
The line between helpful assistant and digital weirdo just got blurrier

Until next time—
Let’s hope the AIs don’t unionize before Friday,
David