3, 3.7, or 4.5

The Weekly Variable

Grok 3 led the way for AI releases last week, with updates from OpenAI and Anthropic following close behind.

Unfortunately, I still have to double-check the output and do some coding myself, but hopefully not for too long.

Topics for this week:

  • Full Time AI Engineer

  • Grok 3

  • Claude Sonnet 3.7

  • GPT-4.5 Preview

  • Majorana 1

Full Time AI Engineer

I would consider myself a software engineer, and I fully believe AI will handle all software development eventually.

I’m looking forward to when that happens but we’re not quite there yet.

I keep seeing YouTube titles and X.com posts about AI building full apps that make millions of dollars. I'm sure it's possible with the right app and proper marketing, but so far my experience has been a little different.

I’ve been averaging 7+ hours per day building the backend of an app with Supabase and I’ve been using all kinds of AI the entire time.

Every AI model I can remember contributing in some way so far:

  • Claude Sonnet 3.5

  • Perplexity

  • gpt-4o

  • gpt-4o-mini

  • cursor-small

  • cursor-fast

  • o1-mini

  • o1 pro mode

  • o3-mini-high

  • DeepSeek R1

  • Grok 3

  • Claude Sonnet 3.7

  • Claude Sonnet 3.7 Thinking

Without help from all of these models, I'd be way further behind and have a much simpler app if I were building it on my own. But if you left the AI to chain-reason its way into building a full app from anything beyond a simple concept, I think you'd end up very disappointed.

Claude Sonnet 3.5 pretty much built the entire appearance of the app, which was great, and I think that's what people are selling right now: the visual.

Once you go behind the visual and try to have it generate something a little more complicated, like messaging between users, the results become lackluster.

I’m sure you could prompt your way into something that works, but it would be nowhere near efficient or scalable.
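To make "something a little more complicated" concrete, here's a minimal sketch of the kind of user-to-user messaging piece I mean, using the Supabase JS client. The `messages` table and its `sender_id`, `recipient_id`, and `content` columns are hypothetical, just to illustrate the shape of the problem, not my actual schema:

```typescript
import { createClient } from "@supabase/supabase-js";

// Hypothetical project URL and anon key -- swap in your own.
const supabase = createClient(
  "https://your-project.supabase.co",
  "public-anon-key"
);

// Send a message from the signed-in user to another user.
// Assumes a hypothetical `messages` table with sender_id, recipient_id, content.
async function sendMessage(recipientId: string, content: string) {
  const { data: { user } } = await supabase.auth.getUser();
  if (!user) throw new Error("Not signed in");

  const { error } = await supabase
    .from("messages")
    .insert({ sender_id: user.id, recipient_id: recipientId, content });
  if (error) throw error;
}

// Load the conversation between the signed-in user and another user,
// oldest first (assumes a created_at timestamp column).
async function loadConversation(otherUserId: string) {
  const { data: { user } } = await supabase.auth.getUser();
  if (!user) throw new Error("Not signed in");

  const { data, error } = await supabase
    .from("messages")
    .select("*")
    .or(
      `and(sender_id.eq.${user.id},recipient_id.eq.${otherUserId}),` +
      `and(sender_id.eq.${otherUserId},recipient_id.eq.${user.id})`
    )
    .order("created_at", { ascending: true });
  if (error) throw error;
  return data;
}
```

Even a toy version like this immediately raises questions about auth, row-level security, and indexing, and that's exactly the layer a one-shot prompt tends to gloss over.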

Given enough cycles, though, I do think AI could eventually build you a quality app, but it might take a while.

You’d need a team of AIs iterating constantly toward the goal, much like a team of developers…

Luckily, I don’t think we’re too far off from a single model that can be told “build this app” and eventually produce an enterprise-level product, but unfortunately that’s a long way from what’s being sold right now.

Hopefully not for long!

Grok 3

Up until last week, I had been using Sonnet 3.5 as my main programming buddy.

With a paid plan in Cursor, you get 500 prompts with paid models, including claude-sonnet-3.5, and this month I’ve used 483 out of 500.

But since I was running out of prompts and Grok 3 had just launched, last weekend I decided to give it a shot.

So far I’ve been impressed.

Claude’s been great, but it had some quirks where it would randomly change things it didn’t need to change, sometimes even after being told explicitly to “only change relevant code.”

Grok 3’s Thinking mode on the other hand has been really solid.

Multiple times now I’ve explained what I needed to happen, pasted in a few files from the repo for Grok to reference, and asked it to make the updates.

Nearly every time I’ve done that, Grok has spit out a complete file that I could copy and paste, and it worked on the first try.

It might be a little unfair to compare them, though, because Grok uses reasoning in Thinking mode, so it will process the answer for up to two minutes before printing it out, while Claude 3.5 gives you an answer right away.

But after a few positive attempts, I found myself using Grok all week to figure out how to build and interact with database tables instead of Claude or o3-mini-high.

I can see why Grok is currently in first place on the LLM leaderboard, from a programming perspective anyway.

Seems to be my number 1 right now, but it’s been a busy week…

Claude Sonnet 3.7

Grok 3 released last week and so far so good, but it seems to have kicked off a few major updates in the AI world.

Anthropic released Claude Sonnet 3.7, Claude Sonnet 3.7 Thinking, and Claude Code.

Sonnet 3.7 builds thinking into the base model, so it has some reasoning built in to begin with, but it also has a “thinking” option that allows it more time to reason about the problem first.
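If you’re hitting it outside of Cursor, that “thinking” option maps to an extended-thinking parameter on the API. Here’s a rough sketch with the Anthropic TypeScript SDK; the model string and token budget are just what was documented around the 3.7 release, so double-check them against the current docs:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment.
const anthropic = new Anthropic();

async function main() {
  const response = await anthropic.messages.create({
    model: "claude-3-7-sonnet-20250219", // model name as of the 3.7 release
    max_tokens: 4096,
    // Extended thinking: give the model a token budget to reason
    // before it starts writing the visible answer.
    thinking: { type: "enabled", budget_tokens: 2048 },
    messages: [
      {
        role: "user",
        content: "Only change relevant code: fix the null check in sendMessage().",
      },
    ],
  });

  // The reply interleaves "thinking" blocks with the final "text" blocks;
  // print just the answer text here.
  for (const block of response.content) {
    if (block.type === "text") console.log(block.text);
  }
}

main();
```

The budget caps how many tokens it can spend reasoning before it starts the answer, which is the same speed-versus-quality trade-off I mentioned with Grok’s Thinking mode.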

Nearing my Cursor cap, I only tried Claude a few times, and it does seem good. It fixed a few things I asked it to fix pretty reliably, but it still seems to have 3.5’s tendency to randomly update code that it shouldn’t.

The full “thinking” version was a little hit or miss.

The first time I tried it, it nailed the fix and I was really impressed, but the next few times it changed a random file that had nothing to do with what I asked, and another time it failed trying to search the web for an answer and just completely quit.

To be fair, this could be more of a Cursor integration issue than a Claude thing, so I’d like to give it another shot.

Luckily my cap should refresh tomorrow, so I’ll push Claude further and compare it with Grok next week.

And I’m not sure if I have access to Claude Code, but it’s supposed to be really helpful, “especially for test-driven development, debugging complex issues, and large-scale refactoring,” which would be amazing.

I’ll be anxious to see whether Claude dethrones Grok next week.

GPT-4.5 Preview

I have to wonder if this was inspired by Grok’s release, but last night I got an email from OpenAI announcing a preview version of GPT-4.5.

It’s way more expensive than their previous models, and it’s not a reasoning model, which is interesting.

The current trend seems to be reasoning models, which OpenAI helped start, but now they’re pushing out a non-reasoning, more expensive model.

“Preview” is probably the key word here.

They do tend to release something and then massively drop the price within a month, so I’m sure they’ll cut the cost of this model as they figure out how to further optimize it, but it’s surprising to see something priced at $68 per 1 million tokens when o1 is $15 per 1 million and o3-mini is $1.10 per 1 million.

That’s more than four times their most expensive current model, and it doesn’t have reasoning.

I’ve only prompted it twice so far, and it did provide a really solid answer without needing to reason through it, which was pretty impressive, but that’s all I have to go on.

I’m sure I’ll be chatting with it more this weekend so we’ll see where it stands next week as well.

Every day I fight the urge to build a nice custom app to chat with all of these AI models at once…

Majorana 1

Since compute power is going to be a limitation for all of these increasingly complex AI models, Microsoft threw its new quantum computing chip, the Majorana 1, into the ring to eventually help out.

Much like Google’s Willow, it is a self-contained, scalable quantum computing chip.

This version only has 8 qubits, which is why the article doesn’t talk about calculations that would take longer than the age of the universe the way Willow’s did, but once the Majorana chip reaches 100 or more qubits, that will certainly be a talking point.

Microsoft has found a completely different way to scale to 100+ qubits with the Majorana particle.

Other quantum processors typically put existing particles into an “in-between” quantum state so they can become entangled, then hopefully pull them out of that state reliably and predictably to produce an answer.

Instead, Majorana 1 basically forces the exotic Majorana particle into existence, because it can’t otherwise be found in nature.

In physics, there’s a theory of symmetry in which most particles have an antiparticle with the opposite charge.

Electrons have a negative charge, and their opposite, anti-electrons (positrons) have a positive charge.

If they meet, they annihilate each other, balancing out the charge.

The Majorana fermion is different because it is its own antiparticle: its charge isn’t opposite, it’s neutral.

Microsoft was able to leverage this property to create a more stable quantum state that’s harder for surrounding matter to interact with, which is the usual problem in quantum computing, but that state is also harder to achieve.

The chip has to create this particle state very precisely, but they found a way to do so that could potentially scale to 1 million qubits.

Hard to even imagine what that could mean, but we’re well on our way to the quantum computing age with more than one stable chip now in existence.

The Microsoft article is linked above, and Fireship explains this far better than I could in the video below:

And that’s it for this week! A busy week for AI and always exciting to see quantum computing updates.

Those are the links that stuck with me throughout the week and a glimpse into what I personally worked on.

If you want to start a newsletter like this on beehiiv and support me in the process, here’s my referral link: https://www.beehiiv.com/?via=jay-peters. Otherwise, let me know what you think at @jaypetersdotdev or email [email protected], I’d love to hear your feedback. Thanks for reading!