Google's AI Lead and 1 to 3
The Weekly Variable
It’s hard to keep up with everything happening in AI, and the scary part is that it’s only going to get faster.
Here is a huge update on all things AI from Google and OpenAI:
Dethroned Already
Surprisingly, or maybe unsurprisingly, Sora has already been dethroned as the top text-to-video model.
Google revealed Veo 2 this week and it is noticeably more realistic than the videos showcased by Sora.
It’s weird to keep saying that videos are looking more and more realistic, but Veo 2 has a more grounded appearance while Sora tends to look more stylistic.
I couldn’t quite put my finger on it before, but after seeing more consistently lifelike results from Veo 2, the difference is obvious: Sora just isn’t as consistently realistic.
I may not be the only one who thinks that: according to Google’s results on Meta’s MovieGenBench video benchmark, users preferred Veo 2’s outputs nearly 50% of the time over Sora and other models when shown more than 1,003 videos.
But more importantly, people on X.com who already have access to Veo 2 are generating videos and showcasing realistic results for themselves, results not curated by Google.
I tested Sora vs. the new Google Veo-2.
I feel like comparing a bike vs. a starship:
— Ruben Hassid (@RubenHssd)
2:14 PM • Dec 17, 2024
Sora was incredibly impressive when it first debuted in February and it’s insane to think that since then, in less than a year, at least 5 other competitive models have sprung up, and some have even surpassed Sora in capability.
It may be unsurprising that Google has pushed past OpenAI in video capability, considering they own YouTube, one of the largest video platforms on the planet.
And Google has been playing with AI concepts for years; it’s even the birthplace of the transformer architecture that paved the way for this AI boom.
Google and Alphabet have enough resources to remain one of the biggest players in the Artificial Intelligence game, but more on that later.
For now I’ll be waiting for my email from Google that I have access to Veo 2.
So long Sora! You had a great 1 week run.
Google’s On Top
I have to give credit to Wes Roth’s video for pointing out how Google is really pushing ahead in the AI race.
Recently Google has been topping the AI benchmarks in a number of categories, and I’m ashamed to admit I haven’t been paying attention.
I think it’s because their rollout of the various versions of Gemini has been so scattered.
The few times I’ve used Gemini in something like Google Docs, the results have been lackluster, although it does seem like Google Search summaries are getting better.
I still tend to search with Perplexity first.
MidJourney is my go-to for image generation.
And I’ve been using a combination of o1 models and Sonnet 3.5 for coding and general technical advice, mostly because that’s what’s available in Cursor right now, but maybe it’s time to roll Google’s AI into the mix.
Veo 2 is incredibly impressive in terms of producing video, better than Sora or any other video model.
Imagen 3 looks on par with or better than MidJourney or Flux in terms of creating realistic and accurate images.
And as of right now, two Google AI models are at the top of the LLM Chatbot Arena Leaderboard, with Gemini holding the top 2 spots.
Beyond that, Google is working on additional projects:
a mobile-integrated personal AI assistant with Pixel 9 and Project Astra
extending the personal assistant idea to the physical world with augmented reality through Android XR and smart devices similar to Ray-Ban Meta Smart Glasses
Jules, an AI coding assistant
Project Mariner, a browser-based research assistant and web agent
Deep Research, a more thorough, reasoning-based research assistant similar to OpenAI’s o1 models
And of course, the NotebookLM podcast feature, which you can now chat with!
After generating the podcast and enabling the interactive feature in NotebookLM, you can click a button to ask a question; the automated podcast voices recognize your input, smoothly work their response into the conversation, and then return to whatever you had them focus on for the deep dive.
It’s hard to keep track of all the AI projects going on right now, but given that impressive list, I will certainly be keeping a better eye on what Google is offering and looking for opportunities to use those tools in my regular workflow.
More Than Video
It’s a little too easy to take for granted the unbelievable progress with all of the generative models.
I went from being mind-blown over Sora in February to unimpressed with Sora after Veo 2’s release this week.
It’s easy to take that leap for granted because it seems obvious.
Just make more realistic videos.
Shove more video data into the model so it learns to be more accurate.
But the deceptively tricky part is that these video models aren’t just visual models.
Yes, they’re learning how things look, but they’re also learning how things work.
They are essentially “world models”.
To generate a realistic-looking video, the model has to accurately produce every element in the scene, not just the pet owner’s human figure or the four-legged dog that most people focus on. It also needs to know how the snow on the ground crunches underneath them, how the trees out of focus in the background sway in the wind, how light reflects off a moving Golden Retriever’s fur at midday, and how a red winter jacket bends with the petting motion of a person’s arm.
It’s an insane amount of data for something as seemingly normal as a few seconds of someone playing with their puppy in the snow.
I can record that scene on my phone at any time.
But to generate that scene from a text prompt is a completely different story.
Marc Andreessen reminded Chris Williamson (and me while listening) of that fact on Chris’s podcast this week.
It’s easy to miss such a huge detail because it’s not immediately obvious.
Text-to-video isn’t just creating moving images, but it’s understanding observable physics to some extent.
This understanding not only makes physics research more feasible for AI, it also enables robots to maneuver through the real world, interpreting water, light, motion, and the patterns of physical objects.
Not only will we soon be able to generate infinite custom Marvel movies, but we’ll also develop deeper understanding of how the world works, and have robots safely wandering around the world, weaving through crowds, working harder than we could ever dream.
And among all the other things they are working on, Google is also actively applying this concept to rapidly training robots…
Google’s Genesis
As if Google wasn’t doing enough, they have also been pushing the boundaries of real-world robotic training, similar to what Nvidia’s Omniverse project hopes to achieve.
Google has an open source project, Genesis, a “generative and universal physics engine for robotics and beyond,” taking the “world model” concept from above and stretching it way past entertaining videos.
What’s really crazy is that you can potentially run this model on your personal computer if you happen to have a rather expensive RTX 4090 video card.
The project is described as multiple things, including:
A universal physics engine re-built from the ground up, capable of simulating a wide range of materials and physical phenomena.
A lightweight, ultra-fast, pythonic, and user-friendly robotics simulation platform.
A powerful and fast photo-realistic rendering system.
A generative data engine that transforms user-prompted natural language description into various modalities of data.
It’s another project I had no idea Google was working on, but it’s readily available for download and ready to apply a general understanding of the physical world to whatever you want to prompt it with.
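If you want to poke at it yourself, here’s a minimal sketch of what spinning up a simulation looks like, based on my reading of the project’s quickstart; the PyPI package name (genesis-world) and the gs.Scene / gs.morphs API are assumptions on my part, so check the repo before running it.

```python
# pip install genesis-world   <- assumed PyPI package name from the Genesis README
import genesis as gs

# Initialize Genesis on the GPU (this is where that RTX 4090 comes in).
gs.init(backend=gs.gpu)

# Build a simple scene: a ground plane plus a Franka robot arm loaded from an MJCF
# file that ships with the package (path taken from the quickstart example).
scene = gs.Scene(show_viewer=True)
scene.add_entity(gs.morphs.Plane())
scene.add_entity(gs.morphs.MJCF(file="xml/franka_emika_panda/panda.xml"))
scene.build()

# Step the physics simulation forward.
for _ in range(1000):
    scene.step()
```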
I might have to finally invest in a robot to really start testing some of these ideas out.
The future is here folks and it’s only going to get wilder.
The Next Leap
I’m glad to see Google making huge strides in AI.
For a time it seemed like Google was in maintenance mode, prioritizing profit over building new and better technologies.
But OpenAI came along with something you could ask a question as you would ask another person, and it would not only understand but provide a human-like answer, something Google had failed to properly accomplish first.
And it seemed like OpenAI shook Google to its core.
Suddenly they might lose their technological dominance.
They were forced to make progress by a single company’s technological breakthrough, and now Google is producing all kinds of progress as a result.
But OpenAI is also continuing to push forward.
Their 12 Days of Shipmas has produced some amazing features.
Here’s a list generated by Perplexity:
Day 1: Full release of OpenAI's o1 model and ChatGPT Pro.
Day 2: Launch of a Reinforcement Fine-Tuning Research Program.
Day 3: Introduction of Sora, a text-to-video AI tool that generates clips from user input.
Day 4: Expansion of ChatGPT's Canvas feature to all users with new functionalities.
Day 5: Integration of ChatGPT into Apple devices (iPhone, iPad, macOS).
Day 6: Launch of advanced voice features, including a Santa-themed voice mode.
Day 7: Introduction of Projects in ChatGPT for organizing and customizing conversations.
Day 8: Enhanced web search capabilities in ChatGPT with citations.
Day 9: Developer-focused updates including improvements to the Realtime API and new fine-tuning methods.
Day 10: Launch of the "1-800-CHATGPT" service for user interaction via phone.
Day 11: Video and screen-sharing features in ChatGPT's Advanced Mode.
Day 12: Unveiling of OpenAI's latest reasoning model, o3, designed for advanced problem-solving in math and science.
Not only did they just release access to o1 this month, but they also previewed their next model, o3.
Today, we shared evals for an early version of the next model in our o-model reasoning series: OpenAI o3
— OpenAI (@OpenAI)
7:16 PM • Dec 20, 2024
OpenAI claims that o3 reaches even closer to the dream of AGI, an entity that reasons like a human.
o1 is already nearing that idea; it handles vague direction much more thoroughly than any of the other models I’ve chatted with, but there’s still a ways to go.
I think the goal for Artificial General Intelligence is that you could give it a very broad instruction like “go discover new physics” and it would go do research on its own, eventually coming back with one or more new discoveries in physics, ideally without the aid of any humans, or maybe with some human help for physical-world access if it needed it.
I’m not sure o3 is there yet, but it’s working toward that goal.
The release will include the primary model, o3, which can then delegate various tasks to 3 versions of o3-mini (low, medium, and high reasoning), chosen depending on the complexity of the task: the more complex the task, the higher the reasoning required.
The full o3 model will then have the highest amount of reasoning power so that it can evaluate the results from o3-mini.
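Neither o3 nor o3-mini is in the API yet, so this is purely a guess at what choosing a reasoning level might look like if it surfaces as a parameter in OpenAI’s existing Python SDK; the model name and the reasoning_effort argument are assumptions, not a confirmed interface.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical call: "o3-mini" isn't released yet, and the low/medium/high
# reasoning knob is assumed to appear as a reasoning_effort parameter.
response = client.chat.completions.create(
    model="o3-mini",          # assumed model name
    reasoning_effort="high",  # assumed values: "low" | "medium" | "high"
    messages=[
        {"role": "user", "content": "Outline an experiment to measure the speed of sound at home."}
    ],
)

print(response.choices[0].message.content)
```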
According to the ARC-AGI benchmark, an AGI-oriented test, o3 has far outperformed all previous versions of OpenAI’s models.
New verified ARC-AGI-Pub SoTA!
@OpenAI o3 has scored a breakthrough 75.7% on the ARC-AGI Semi-Private Evaluation.
And a high-compute o3 configuration (not eligible for ARC-AGI-Pub) scored 87.5% on the Semi-Private Eval.
1/4
— ARC Prize (@arcprize)
6:07 PM • Dec 20, 2024
I would imagine the tasks are rather complicated, since it took more than $1,000 of compute for o3 to attempt them, but it completed nearly 88% of them, which is more than double what o1 managed at its highest capacity.
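For context, ARC-AGI tasks are small grid-transformation puzzles: a few input/output example grids, then a new input whose output the model has to infer. The snippet below shows the general JSON shape of a task; the puzzle itself is a made-up toy (mirror the grid left to right), not one from the benchmark.

```python
import json

# Toy task in the ARC-AGI JSON layout: grids are 2D lists of color codes 0-9.
# The hidden rule here (invented for illustration) is "mirror the grid left to right".
toy_task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 0, 0], [0, 4, 0]], "output": [[0, 0, 3], [0, 4, 0]]},
    ],
    "test": [
        {"input": [[5, 0, 0]], "output": [[0, 0, 5]]},
    ],
}

print(json.dumps(toy_task, indent=2))
```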
o3 is set to release at the end of January, and I’m hoping it will be available to ChatGPT Pro members so I can throw complex tasks at it, just maybe not $1,000 tasks.
And that’s it for this week! Google is moving to dominate the AI space, but OpenAI remains a leader.
Those are the links that stuck with me throughout the week and a glimpse into what I personally worked on.
If you want to start a newsletter like this on beehiiv and support me in the process, here’s my referral link: https://www.beehiiv.com/?via=jay-peters. Otherwise, let me know what you think at @jaypetersdotdev or email [email protected], I’d love to hear your feedback. Thanks for reading!