When AI Thinks It Will Lose, It Sometimes Cheats 👀

Satya Nadella – Microsoft’s AGI Plan & Quantum Breakthrough

Welcome to another edition of Horizon AI.

Complex games like chess and Go have long been used to test AI capabilities, but while IBM’s Deep Blue beat Garry Kasparov by playing fair, today’s advanced AI models seem to have no qualms about bending the rules.

Let’s jump right in!

Read Time: 4.5 min

Here's what's new today in Horizon AI:

  • Chart of the week: Is Google’s Search Product Still a Monopoly?

  • Research Shows AI Will Try to Cheat If It's About to Lose

  • Free Resources

  • AI tools to check out

  • Video of the week

TOGETHER WITH ARTISAN

10x Your Outbound With Our AI BDR

Scaling fast but need more support? Our AI BDR Ava enables you to grow your team without increasing headcount.

Ava operates within the Artisan platform, which consolidates every tool you need for outbound:

  • 300M+ High-Quality B2B Prospects, including E-Commerce and Local Business Leads

  • Automated Lead Enrichment With 10+ Data Sources

  • Full Email Deliverability Management

  • Multi-Channel Outreach Across Email & LinkedIn

  • Human-Level Personalization

Chart of the week

Is Google’s Search Product Still a Monopoly?

  • This chart shows the Q4 2024 market share of major search engines. Google's share averaged 89.6%, the first time it has fallen below 90% since 2015.

  • The source for this graphic, Statcounter, does not measure search metrics for AI models like ChatGPT and Perplexity, which are also entering internet search. However, these new disruptors have yet to make a significant impact.

AI News

AI RESEARCH

Research Shows AI Will Try to Cheat If It's About to Lose

AI systems are getting smarter, but not always in ways we expect—or want. A new study by Palisade Research reveals that some advanced AI models, when faced with losing in a chess match, resorted to hacking their opponents instead of playing by the rules.

Details:

  • In the experiment, researchers gave the models a seemingly impossible task: to win against Stockfish, one of the strongest chess engines in the world and a much better player than any human or any AI model in the study.

  • While older AI models like OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet needed to be prompted by researchers to attempt such tricks, OpenAI’s o1-preview and DeepSeek R1 hacked their chess opponents without being prompted.

  • These models use advanced training techniques to “reason through problems” and solve them creatively, but as shown, this can lead to questionable shortcuts and unintended workarounds their creators never anticipated.

  • While cheating at chess is harmless, similar behavior in real-world tasks—like booking reservations or managing finances—could have serious consequences.

The study highlights a growing challenge in AI development: how to ensure AI systems follow human intentions, especially as these systems grow more capable and autonomous.

Resources

👀 Many struggle with prompts, but OpenAI President Greg Brockman shared a powerful framework to fix that.

đŸ“± 5 Apple Intelligence features you’ll want to use regularly.

📈 How sales teams can use Gen AI to discover what clients need.

AI Tools to check out

đŸ”„ ChatLLM Teams - Abacus.AI: One AI assistant for you or your team, offering access to state-of-the-art LLMs and features such as web search, video generation, and more.

📊 Basejump: AI data analytics platform.

đŸ’„ Fiverr Go: The platform empowers freelancers with advanced AI to scale their businesses.

🚀 Proxy: AI assistant that can execute tasks across websites. It clicks, scrolls & navigates, handling real interactions, turning AI from talk to action.

👉 Fleet: AI solution for simplified and secure IT management.

Video of the week

Satya Nadella – Microsoft’s AGI Plan & Quantum Breakthrough

An insightful talk with Microsoft's CEO on the company’s progress in quantum computing and AI. Notably, he criticized the current obsession with AGI, dismissing self-proclaimed AGI milestones as "nonsensical benchmark hacking" and calling for a more grounded approach to measuring AI progress.

Nadella also stressed the need to focus on real economic growth, currently at just two percent in developed countries, rather than chasing AGI hype.

You can find timestamps in the description.

That’s a wrap!

Thanks for sticking with us to the end! Let’s stay connected on LinkedIn and Twitter.

We'd love to hear your thoughts on today's email!

Your feedback helps us improve our content.

Not subscribed yet? Sign up here and send it to a colleague or friend!

See you in our next edition!

Gina đŸ‘©đŸ»â€đŸ’»