Amazon's New Nova AI Models 🔥

Change image styles with GPT-4o

Welcome to another edition of Horizon AI.

Amazon is showcasing its take on a more conversational voice model to compete with Google’s Gemini Live and OpenAI’s Advanced Voice Mode, along with an update to its AI video model.

Let’s get into it!

Read Time: 4.5 min

Here's what's new today in Horizon AI:

  • Amazon Launches Nova Sonic Voice Model

  • New AI Paper Generates 1-Minute Long Tom & Jerry Clips With Simple Text Prompts

  • AI Tutorial: Change image styles with GPT-4o

  • AI Tools to check out

  • The Latest in AI and Tech 💡

  • AI Findings/Resources

AI News

AMAZON

Amazon Launches Nova Sonic Voice Model

Amazon has debuted a new generative AI model, Nova Sonic, capable of natively processing voice and generating natural-sounding speech.

Details:

  • Nova Sonic is designed to let third-party developers build real-time, natural-sounding voice conversations into their apps through Amazon Bedrock, the company's platform for building generative AI applications.

  • The model reportedly matches leading speech models from OpenAI and Google on key metrics such as speed, speech recognition, and conversational quality, while costing about 80% less than OpenAI's GPT-4o.

  • Alongside Nova Sonic, Amazon introduced Nova Reel 1.1, an updated video generation model that delivers improved quality, reduced latency, and the ability to maintain consistent visual styles across multiple six-second scenes—allowing for the creation of coherent videos up to two minutes long.
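For a rough sense of scale, the arithmetic behind the two-minute limit can be sketched in Python. This assumes every scene is exactly six seconds long, as described above; the constant and function names are illustrative and not part of any Amazon API.

```python
import math

# Assumption taken from the description above: each scene lasts six seconds.
SCENE_SECONDS = 6


def scenes_needed(video_seconds: int) -> int:
    """Return how many six-second scenes are needed to cover a video."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    return math.ceil(video_seconds / SCENE_SECONDS)


# A full two-minute video is 120 seconds long.
print(scenes_needed(120))
```

In other words, a maximum-length video means keeping the visual style consistent across roughly 20 separate scenes.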

Developers can access both models through Amazon's Bedrock platform, and components of the Sonic model are already incorporated into the new Alexa Plus assistant.

AI RESEARCH

New AI Paper Generates 1-Minute Long Tom & Jerry Clips With Simple Text Prompts

Researchers from NVIDIA, Stanford University, UC San Diego, UC Berkeley, and the University of Texas at Austin have developed a method for generating longer, more coherent AI videos that can tell complex stories.

Details:

  • By incorporating Test-Time Training (TTT) layers into pre-trained Transformer models, they enabled the generation of videos up to one minute long, a substantial increase from previous limitations of 8 to 20 seconds.

  • Traditional Transformer models face challenges with longer videos due to their self-attention mechanisms, which require each element to relate to every other element, leading to quadratic increases in computational demands.

  • The introduced TTT layers address this by adding mini neural networks that learn during the video generation process, enhancing memory retention and consistency across longer sequences.
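To make the idea more concrete, here is a toy Python sketch of the two points above. It is not the researchers' code: the reconstruction loss, the learning rate, and all function names are simplifying assumptions chosen for illustration. The first function counts the token pairs that full self-attention compares; the second shows a "layer" whose memory is a tiny linear model that takes one gradient step per token while the sequence is being processed.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Token-to-token comparisons in full self-attention (grows quadratically)."""
    return num_tokens * num_tokens


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def ttt_linear_layer(tokens, lr=0.1):
    """Toy test-time-training layer (illustrative assumption, not the paper's code).

    The hidden state is the weight matrix W of a small linear model.
    For every token x, W takes one gradient step on the reconstruction
    loss ||W x - x||^2, and the updated W then produces the output.
    The state stays the same size no matter how long the sequence is.
    """
    dim = len(tokens[0])
    weights = [[0.0] * dim for _ in range(dim)]
    outputs = []
    for x in tokens:
        prediction = matvec(weights, x)
        error = [p - t for p, t in zip(prediction, x)]
        # Gradient of ||W x - x||^2 with respect to W is 2 * error * x^T.
        for i in range(dim):
            for j in range(dim):
                weights[i][j] -= lr * 2.0 * error[i] * x[j]
        outputs.append(matvec(weights, x))
    return outputs, weights


# Doubling the sequence length quadruples the attention comparisons.
print(attention_pair_count(1000), attention_pair_count(2000))

# The toy layer reconstructs a repeated token better the longer it runs.
outputs, _ = ttt_linear_layer([[1.0, 0.0]] * 10)
print(outputs[0][0], outputs[-1][0])
```

The contrast to notice is that the attention cost depends on the square of the sequence length, while the toy layer does a fixed amount of work per token and carries everything it has learned in a fixed-size set of weights.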

To demonstrate this technique, the team applied it to generate extended sequences of Tom and Jerry cartoons, producing coherent clips up to one minute in length, which you can check out on the project page. This progress opens new possibilities for AI in entertainment and other domains requiring extended video generation.

AI Tutorial

Change image styles with GPT-4o

  1. Go to ChatGPT and choose ‘GPT-4o’ as your model.

  2. Upload your image.

  3. Use the prompt: Recreate this image with a style: [Insert style]

Some options to try:

  • Recreate this image with a style: Studio Ghibli

  • Recreate this image with a style: Pixar

  • Recreate this image with a style: Dragon Ball

  • Recreate this image with a style: Lego

  • Recreate this image with a style: Hand-knitted doll

  • Recreate this image with a style: Funko Pop

  • Recreate this image with a style: Rick and Morty

  • Recreate this image with a style: Hanna-Barbera

  • Recreate this image with a style: Manga

  • Recreate this image with a style: Simpsons

  • Recreate this image with a style: South Park

  • Recreate this image with a style: Gothic Stop Motion

  • Recreate this image with a style: Barbie
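If you want to try several styles in one sitting, a small Python script can generate the prompts for you to paste into ChatGPT. This is only a convenience sketch: the names below are made up for illustration, and the script does not call any OpenAI API.

```python
# The prompt wording comes from step 3 of the tutorial above.
PROMPT_TEMPLATE = "Recreate this image with a style: {style}"


def build_style_prompts(styles):
    """Return one ready-to-paste prompt per non-empty style name."""
    prompts = []
    for style in styles:
        cleaned = style.strip()
        if cleaned:  # skip blank entries
            prompts.append(PROMPT_TEMPLATE.format(style=cleaned))
    return prompts


for prompt in build_style_prompts(["Studio Ghibli", "Pixar", "Lego"]):
    print(prompt)
```

Swap in any style names you like; each line printed is a prompt you can paste next to your uploaded image.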

AI Tools to check out

🗣 EverTutor Live: AI-powered voice tutor that teaches, adapts, and interacts.

DreamActor-M1: Upload your image and watch it come to life with our state-of-the-art animation technology.

💥 Paragon: All-in-one platform to build, ship, and manage product integrations.

🤖 Devin: A collaborative AI teammate built to help ambitious engineering teams achieve more.

GitSummarize: Turn any GitHub repository into a comprehensive AI-powered documentation hub.

AI Findings/Resources

👨‍💻 ‘Don’t study coding’ says Replit CEO

📷 5 ways to use Gemini Live with camera and screen sharing

👉 Tech’s big anxiety: fewer jobs, lower pay, more AI

The Latest in AI and Tech

Thinking Machines Lab has brought on former OpenAI leaders Bob McGrew and Alec Radford as advisors. They join other ex-OpenAI figures, including co-founder John Schulman and former post-training lead Barret Zoph.

Samsung is adding Google’s Gemini AI to its home robot Ballie through a partnership with Google Cloud. The AI will enable Ballie to handle audio and video inputs to answer different questions.

The European Commission has presented the “AI Continent Action Plan,” which will aim to “transform Europe’s strong traditional industries and its exceptional talent pool into powerful engines of AI innovation and acceleration.”

Rumors have circulated in recent days that robotaxi company Waymo might use data from interior vehicle cameras to train AI and serve targeted ads to riders. However, a company spokesperson has clarified that there are no such plans.

That’s a wrap!

Thanks for sticking with us to the end! Let’s stay connected on LinkedIn and Twitter.

We'd love to hear your thoughts on today's email!

Your feedback helps us improve our content.

Not subscribed yet? Sign up here and send it to a colleague or friend!

See you in our next edition!

Gina 👩🏻‍💻