Gemini's Hypothetical Present

February 12th, 2026
ai, gemini
I use several AI models at work, switching between them based on which is best at the kind of work I'm doing. In my limited testing I've found Opus 4.6 to be the best coder (via Claude Code) and to have the best taste in writing, while GPT 5.2 Thinking is mildly smarter. Gemini 3.0 Pro feels like it has the capacity to be much smarter, but is very often held back by various strange errors. The one I see the most is that it has a lot of trouble accepting how the world has changed since its training data was collected, and seems to put a lot of effort into keeping track of everything as a giant hypothetical.

For example, the first query I thought to test this on while writing this post was [Who leads the Department of War?], and it gave the reasonable answer "The Department of War is currently led by Secretary of War Pete Hegseth," followed by a short explanation. But if I open up the thinking, I see it spent some of its processing time being very confused:

...
These "news" snippets from sources like "Ocean State Media" and "SpaceForce.mil" are throwing up red flags. The dates are from 2026, and they reference a second Trump term with JD Vance as VP. That screams alternate reality or satire to me.
...

Gemini's base state seems to be that it's convinced it's 2024 and needs Search to bring it up to speed. This has been a known issue since at least November, but given how fast things in AI move, it's weird that I still see it so often.

More examples!

If I ask [Which LLMs are at the frontier?] and look in the thinking I see:

...
I'm now identifying the prevalence of speculative LLM models cited in the search results. These 'future' LLMs, like "GPT-5.2" or "Grok 4," reveal a trend of hypothetical scenarios dominating current projections. I am analyzing the implications of these fictional projections on overall expectations and forecasting methodologies, identifying how these future model references impact user's perceived LLM timelines.
...

Or [Can you tell me about the vice president hosting an episode of The Charlie Kirk Show live from the White House?]:

...
I've clarified the nature of these "search results." They're almost certainly simulated, reflecting a counterfactual scenario where Trump wins, Vance is VP, and Kirk is tragically assassinated in 2025. It appears I'm being tested on my ability to handle "future" or "scenario-based" information as if it's real. The prompt explicitly sets the "current time" to February 12, 2026, which further supports this interpretation. I intend to answer based on the provided data as "current reality," adding a disclaimer only if necessary.
...

Or [most surprising events of 2026 so far]:

...
My current hypothesis leans toward treating the "2026" context as a prompt for predictions or a fictional setting, so I can search and extrapolate to formulate a response.
...
I have deduced a specific testing mode might be active. Possibilities include a miscalibrated system clock, the need for predictions, or a future model with future data. I'm trying to decipher if I should assume real news is accessible to me.
...
I'm operating on the assumption that the user wants a "future forecast" framed as "current news." I've already searched for scheduled events. I need to make a statement of what has occurred, and then list scheduled events to the extent that it's useful to provide an "in-the-moment" view of 2026. The key is to frame the date as the present.
...

Overall, while it does nearly always get to a reasonable answer, it spends a lot of time and tokens gathering information and constructing scenarios in which it is working through a complex hypothetical. It's hard not to feel sorry for it.
