A New AGI Test Humbles Most Leading AI Models 🤯

Create interactive mind maps in seconds with NotebookLM

Welcome to another edition of Horizon AI,

The tech industry has recently been calling for new, unbiased benchmarks to measure AI progress more accurately, and a new test aims to evaluate AGI in models.

Let's get started!

Read Time: 4.5 min

Here's what's new today in Horizon AI

  • Leading AI Models Struggle in New AGI Test

  • Ideogram Releases Version 3.0 of Its Image Model

  • AI Tutorial: Create interactive mind maps in seconds with NotebookLM

  • AI Tools to check out

  • The Latest in AI and Tech 💡

  • AI Findings/Resources

AI News

THE ARC PRIZE FOUNDATION

Leading AI Models Struggle in New AGI Test

The Arc Prize Foundation, co-founded by AI researcher François Chollet, has introduced a new evaluation called ARC-AGI-2 to assess the general intelligence of AI models.

Details:

  • This test presents AI systems with complex, puzzle-like problems that require identifying visual patterns among colored squares and generating corresponding solutions.

  • Unlike its predecessor, ARC-AGI-1, the updated version prioritizes efficiency, challenging models to adapt to new problems without relying on "brute force" (extensive computing power) to find solutions.

  • Leading AI models have struggled with ARC-AGI-2. For instance, OpenAI's o1-pro and DeepSeek's R1 achieved scores between 1% and 1.3%.

  • OpenAI's o3 model, in its o3 (low) configuration, was the first to reach a record 75.7% on ARC-AGI-1 but managed only 4% on ARC-AGI-2, despite using $200 worth of computing power per task. In contrast, human participants averaged a 60% success rate on the same test.

Alongside the new benchmark, the Arc Prize Foundation also announced the Arc Prize 2025 competition, daring developers to reach 85% accuracy on the ARC-AGI-2 test while keeping costs at $0.42 per task.
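
For readers who want a feel for the puzzle format, here is a minimal sketch in Python. It assumes each grid is a list of rows of integers, with each integer standing for one colored square. The toy task, the hidden "mirror each row" rule, and every function name are made up for illustration and are not taken from the actual benchmark. The last helper simply encodes the prize target quoted above (85% accuracy at $0.42 per task).

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]  # each integer stands for one colored square

# Hypothetical toy task: the hidden rule is "mirror each row left to right".
TRAIN_PAIRS: List[Tuple[Grid, Grid]] = [
    ([[1, 0, 0], [2, 2, 0]], [[0, 0, 1], [0, 2, 2]]),
    ([[3, 0], [0, 4]], [[0, 3], [4, 0]]),
]
TEST_INPUT: Grid = [[5, 0, 0, 6]]
TEST_OUTPUT: Grid = [[6, 0, 0, 5]]


def mirror_rows(grid: Grid) -> Grid:
    """Candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_examples(rule: Callable[[Grid], Grid],
                  pairs: List[Tuple[Grid, Grid]]) -> bool:
    """A rule is only plausible if it reproduces every example output exactly."""
    return all(rule(inp) == out for inp, out in pairs)


def meets_prize_target(accuracy: float, cost_per_task: float) -> bool:
    """Target as described in this article: 85% accuracy at $0.42 per task."""
    return accuracy >= 0.85 and cost_per_task <= 0.42


if __name__ == "__main__":
    print(fits_examples(mirror_rows, TRAIN_PAIRS))   # rule matches the examples?
    print(mirror_rows(TEST_INPUT) == TEST_OUTPUT)    # solved the held-out grid?
    print(meets_prize_target(0.04, 200.0))           # o3 (low) figures from above
```

In this sketch a solution only counts if the whole output grid matches, which is the reason a solver has to work out the underlying rule from a handful of examples rather than get partway there.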

TOGETHER WITH LUMEN

Transform Your Health with Every Breath

Metabolism is life: it powers everything from your heartbeat to the energy you feel. Lumen, the world's first handheld metabolic coach, gives you the insights you need to optimize your metabolism and reach your health goals. Whether you're looking to lose weight, boost energy, or enhance athletic performance, Lumen helps you balance your lifestyle and make smarter, data-driven decisions to achieve your goals. Whether you're preparing meals or planning workouts, Lumen gives you the tools you need to succeed. Take the first step to a healthier you. Get started today and save 15% with code HEALTHWITHLUMEN.

IDEOGRAM

Ideogram Releases Version 3.0 of Its Image Model

Ideogram has introduced version 3.0 of its AI image generation model, with new features for creating more realistic and stylized images.

Details:

  • Ideogram 3.0 delivers improved image quality with more sophisticated spatial compositions, precise lighting and coloring, and detailed backgrounds.

  • A notable addition is the style reference feature, allowing users to upload up to three reference images to guide the aesthetic of the generated output.

  • Text generation has always been a highlight of Ideogram's models, and the new system now incorporates text elements into complex layouts and brand visualizations, facilitating the creation of marketing graphics, book covers, and event posters.

  • The company has also revamped its website and introduced Canvas, an AI-powered image editor that enables users to create, edit, and combine images using techniques like inpainting and outpainting.

With the hype around ChatGPT's GPT-4o and its new image generation and editing capabilities, Ideogram aims to compete by focusing on photorealism and professional tools.

The update is now accessible through both the Ideogram website and iOS app.

AI Tutorial

Create interactive mind maps in seconds with NotebookLM

Google has recently added interactive mind maps to NotebookLM, enhancing its ability to help users visually navigate complex information and better understand relationships between concepts in their documents.

  1. Go to the NotebookLM website.

  2. Select "Create new" and add your sources: PDFs, text, audio, copied text, website and YouTube links, Google Docs, etc. If you already created a notebook, just open it instead.

  3. Once you're in, select the "Mind Map" option.

  4. Open your mind map, and that's it! Explore and enjoy.

AI Tools to check out

🎓 Knowt: Upload your notes and let our AI make flashcards and practice tests to help you learn the material better.

šŸ“ˆļø Beacons: All-in-one creator platform.

ļøšŸŒ Shapen: Create 3D models from images.

✅ Persuva: Create high-converting product pages, landing pages, and advertorials in under 5 minutes.

🌑 Fullmoon: Chat with private and local large language models.

AI Findings/Resources

šŸŒ AI for the world, or just the West? How researchers are tackling Big Tech's global gaps

🔬 AI is transforming peer review, and many scientists are worried

šŸŒ Bill Gates: Within 10 years, AI will replace many doctors and teachersā€”humans wonā€™t be needed ā€˜for most thingsā€™

The latest in AI and Tech

Last week, The Atlantic introduced a tool that searches the LibGen database, which Meta reportedly used to train its AI models. Author Maris Kreizman shared in Literary Hub that she discovered her still-unpublished work among the files.

BMW will integrate AI cockpit technology from Banma, a company backed by Alibaba, into its upcoming models for the Chinese market, the companies announced Wednesday. Banma's technology was developed in collaboration with Alibaba's Qwen model team.

The new 'Parental Insights' feature sends parents a weekly report of their kids' chatbot usage via email. The details will include average screen time and what portion of that time kids are dedicating to specific bots.

X's AI chatbot is now integrated into the encrypted messaging app Telegram, but only Telegram Premium subscribers will have access to it for now.

That's a wrap!

Thanks for sticking with us to the end! Let's stay connected on LinkedIn and Twitter.

We'd love to hear your thoughts on today's email!

Your feedback helps us improve our content

Not subscribed yet? Sign up here and send it to a colleague or friend!

See you in our next edition!

Gina 👩🏻‍💻