Welcome to WarBulletin - your new best friend in the world of gaming. We're all about bringing you the hottest updates and juicy insights from across the gaming universe. Are you into epic RPG adventures or fast-paced eSports? We've got you covered with the latest scoop on everything from next-level PC gaming rigs to the coolest game releases. But hey, we're more than just news! Ever wondered what goes on behind the scenes of your favorite games? We're talking exclusive interviews with the brains behind the games, fresh off-the-press photos and videos straight from gaming conventions, and, of course, breaking news that you just can't miss. We know you love gaming 24/7, and that's why we're here round the clock, updating you on all things gaming. Whether it's the lowdown on a new patch or the buzz about the next big gaming celeb, we're on it.

Contacts

  • Owner: SNOWLAND s.r.o.
  • Registration certificate 06691200
  • Na okraji 381/41, Veleslavín, 162 00 Praha 6
  • Czech Republic

Look out, OpenAI's latest chatbot hallucinates less and might even count to three

OpenAI has unleashed yet another new chatbot on us poor, unsuspecting humans. We give you o1, a chatbot designed for more advanced reasoning that's claimed to be better at things like coding, math and generally solving multistep problems.

Perhaps the most significant change from previous OpenAI LLMs is a shift from mimicking patterns found in text training data to a focus on more direct problem solving, courtesy of reinforcement learning. The net result is said to be a more consistent, accurate chatbot.

“We have noticed that this model hallucinates less,” OpenAI’s research lead, Jerry Tworek, told The Verge. Of course, “hallucinates less” doesn't mean no hallucinations at all. “We can’t say we solved hallucinations,” Tworek says. Ah.

Still, o1 is said to use something akin to a “chain of thought” that's similar to how we humans process problems, step-by-step. That contributes to much higher claimed performance in tasks like coding and math.
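To make that a bit more concrete, here's a minimal sketch of what handing such a model a multistep problem might look like from code. This is our own illustration, not OpenAI's: it assumes the openai Python package (1.x) and uses the model name "o1-preview" and a made-up word problem purely as examples.

    # Minimal sketch: asking a reasoning-focused model to work through a
    # multistep problem. Assumes the `openai` Python package (>=1.0) and an
    # OPENAI_API_KEY set in the environment; the model name "o1-preview" is
    # an assumption and may differ from what's available to you.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="o1-preview",
        messages=[
            {
                "role": "user",
                "content": (
                    "A train leaves at 14:05 travelling at 90 km/h. A second train "
                    "leaves the same station at 14:35 travelling at 120 km/h on a "
                    "parallel track. At what time does the second train catch the "
                    "first? Show each step."
                ),
            }
        ],
    )

    # The reply should walk through the head start, the closing speed and the
    # catch-up time step by step, rather than jumping straight to an answer.
    print(response.choices[0].message.content)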

Apparently, o1 scored 83% in the qualifying exam for the International Mathematics Olympiad, far better than the rather feeble 13% notched up by GPT-4o. It has also performed well in coding competitions and OpenAI says an imminent further update will enable it to match PhD students “in challenging benchmark tasks in physics, chemistry and biology.”

However, despite these advances, or perhaps because of them, this new bot is actually worse by some measures. It has fewer facts about the world at its fingertips, and it can't browse the web or process images. It's also currently slower to respond and spit out answers than GPT-4o.

Of course, one immediate question that follows from all this is whether this new chatbot still suffers any of the surprising limitations of previous bots. Can o1, for instance, even count to three?
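For comparison, the sort of counting task that famously trips up chatbots is trivial for ordinary, deterministic code. The word and letter below are just an illustrative example of the task, not taken from OpenAI's own tests.

    # Counting letter occurrences the boring, deterministic way.
    # "strawberry" / "r" is an illustrative example, not a quote from the article.
    word = "strawberry"
    print(word.lower().count("r"))  # prints 3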


Apparently, yes, it can. GPT-4o, by contrast, can be flummoxed when ordered to count the number of

Read more on pcgamer.com