OpenAI’s New AI Models Struggle More with Hallucinations 👀

Google shows new AR glasses, VR headset at TED

Welcome to another edition of Horizon AI!

The AI industry has been pivoting to focus on reasoning models, yet this approach may be worsening a long-standing issue AI has faced since the start: hallucinations.

Let’s jump right in!

Read Time: 4.5 min

Here's what's new today in Horizon AI:

  • Chart of the week: AI Job Growth Projection Through 2033

  • OpenAI’s New Reasoning Models Show Higher Hallucination Rates

  • Free Resources

  • AI tools to check out

  • Video of the week

Chart of the week

AI Job Growth Projection Through 2033

  • Generative AI is reshaping many sectors, with the potential to displace workers in specific roles while also driving demand for those skilled in navigating and leveraging these new technologies.

AI News

OPENAI

OpenAI’s New Reasoning Models Show Higher Hallucination Rates

Hallucinations have long been one of AI's most persistent and challenging issues, and OpenAI's new reasoning models, o3 and o4-mini, have only deepened the concern: both exhibit higher hallucination rates than older models.

Details:

  • According to OpenAI's internal evaluations, the o3 model hallucinated in response to 33% of questions on PersonQA, the company's benchmark for assessing a model's knowledge about people—double the rate of earlier models like o1 and o3-mini, and even worse than OpenAI’s traditional, "non-reasoning" models.

  • The o4-mini model performed even worse, hallucinating 48% of the time on the same benchmark.

  • OpenAI states in its technical report that "more research is needed" to understand why hallucinations are increasing as reasoning models are scaled up.

  • One hypothesis is that the reinforcement learning techniques used for the o-series models may amplify issues typically mitigated by standard post-training processes.

  • Third-party testing by the AI research lab Transluce supports these findings, noting that the o3 model sometimes fabricates actions it supposedly took to arrive at an answer.

For now, the reasoning approach seems to be the way forward, as these models demonstrate significant improvements in complex tasks compared to traditional models. However, if scaling up reasoning models continues to worsen hallucinations, the search for a solution becomes much more urgent.

Resources

👀 This entrepreneur used AI to transform their business and create multiple revenue streams — Here's exactly how they did it

🎓 Guernsey headteachers adapt to AI use in education

✈️ AI-controlled fighter jets may be closer than we think — and would change the face of warfare

AI Tools to check out

 Bolt: Prompt, run, edit, and deploy full-stack web apps — all without leaving your browser or writing a single line of code.

👉 Figma: A collaborative design tool for creating user interfaces, mobile apps, and websites — with a wide range of features, including AI-powered tools.

🤝 Intercom: An AI-first customer support platform.

🌐 Supabase: An open-source backend-as-a-service that provides real-time databases, authentication, and API services.

👨‍💻 Devin: The first AI software engineer.

Video of the week

Google shows new AR glasses, VR headset at TED

Google is joining Meta, Apple, and others in exploring the ability to overlay a digital display on the real world.

At the TED Conference in Vancouver, Android XR head Shahram Izadi and his colleague Nishtha Bhatia showed off the glasses, which feature a camera, microphones, and speakers—like the Ray-Ban Meta glasses—but also include a "tiny high-resolution in-lens display that's full color."

The demo focused on Google’s Gemini multimodal conversational AI system, including the Project Astra capability, which allows it to remember what it sees by "continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall."

That’s a wrap!

Thanks for sticking with us to the end! Let’s stay connected on LinkedIn and Twitter.

We'd love to hear your thoughts on today's email!

Your feedback helps us improve our content

Not subscribed yet? Sign up here and send it to a colleague or friend!

See you in our next edition!

Gina 👩🏻‍💻