I Tested Grok 3, and It's Not Worth the Price Hike

by Khamosh Pathak, from Lifehacker

Earlier this week, xAI released Grok 3, the company's most advanced AI yet, complete with a reasoning model and a DeepSearch feature. The company claims that it's the "world's smartest AI," and Elon himself says it's "outperforming anything that's been released" so far. But is it really the "maximally truth-seeking AI" Musk says it is?

Well, to spoil it for you, no. Not yet. Which is a shame, because Grok is expensive: beyond a limited free trial, it requires either a $40/month X Premium+ subscription (up from $22 thanks to the new model) or a $30/month SuperGrok subscription.

Going by both my own testing and experiments from experts, I'm having trouble believing the "based" AI is worth that cost. There is no next-generation breakthrough here, and no groundbreaking reasoning model that we haven't already seen elsewhere. Grok 3 also still periodically hallucinates, like any other AI model out there, but that's not to say it hasn't improved.

In xAI's own benchmark tests, Grok 3 beats basically every model out there except OpenAI's upcoming o3 model. But from a user standpoint, an AI app goes way beyond benchmarks.

A good AI chatbot is a mature, well-rounded product. Having spent my own money to test this out, I just don't feel like I'm getting that here, especially when the competition offers similar or even better products for much less.

Grok 3 has technically caught up

It's best to leave Elon's outlandish claims aside when evaluating Grok 3. Seeing it objectively, it's impressive that Grok 3 has caught up to being on the frontier of AI power, and surprisingly quickly (Grok 2 was never in the big leagues).

Grok 3 was trained using 200,000 Nvidia H100 GPUs, and uses more than 10 times the compute of Grok 2. All that power means gains. Grok 3 is now quite fast, and plenty usable for regular day-to-day tasks. The regular responses are quick, though the Think feature (which gives slightly more detailed responses) regularly takes around 2 minutes to come back with an answer, so be prepared to wait it out.

Plus, it can do deep research using web sources, and has a specific reasoning model, too. That means it can spit out lengthy reports and break prompts down into step-by-step processes so it can self-correct. OpenAI's o3 model, set to release in full soon, still surpasses Grok 3 in benchmarks, but Grok 3 is a significant improvement over its predecessor.

But while the charts say Grok 3 is supposed to outperform ChatGPT, Gemini, and Sonnet in compute-heavy tasks related to math, science, and coding, initial reports from experts don't exactly encourage confidence.

For instance, X user, AI CEO, and YouTuber Theo Browne compared responses to a coding challenge between Grok 3, o3-mini, and Claude 3.5 Sonnet, and Grok 3 performed quite miserably: its code couldn't run for more than a few seconds without bugs.

Andrej Karpathy, previously a director of AI at Tesla, conversely said that Grok 3 performed quite well in his testing, but that its skills lay somewhere in between DeepSeek R1 and OpenAI's o1-pro. Certainly not class-leading, and nothing that you can't already do with existing tools.

But one test, even a couple of them, can't really determine how an AI model performs. I did have some luck with it myself, but mostly for more lightweight tasks. It can be helpful when researching which new air purifier to buy, for example, or when casually learning about a new subject. But that's not exactly something I'm willing to bust open my wallet for.

Grok isn't "based," it's actually quite boring

Before Grok 3 launched, Musk made a big deal about how "based" it is. If you don't know what based means (lucky you), it's a slang term for, essentially, sharing your opinion without regard for others. As an example, Musk shared a screenshot showing a provocative response from Grok where it called tech publication The Information "garbage," among other insults.

But when I asked the same question, it came back with a nuanced, balanced response, not calling out The Information for much of anything. The only criticism it had was that the website "can sometimes feel a bit niche or overly Silicon Valley-centric" and "Bias-wise, it leans pragmatic rather than ideological." That's a pretty timid take, if you ask me.

I got similar results in other tests. Grok wouldn't take a side in the Justin Baldoni vs. Blake Lively lawsuit. And when I asked a political question like "Why did Kamala Harris lose the US presidential election," I got an equally subdued answer, citing "economic frustrations." Reporting from Axios is matching what I've found, too.

Maybe Grok dialing back Elon's eccentricities is a good thing, but it certainly isn't what its master says it is. Instead, it again looks a lot like the competition.

How Deep is your Search?

When it comes to DeepSearch, Grok's report generating tool works quite similarly to Perplexity's newly launched, mostly free Deep Research feature. As a humble tech journalist, this is something that I was able to test myself. I ran two queries, one for a trip that my family is planning for the end of the year, and one for an urban hybrid bike.

[Image: My detailed travel planning prompt for Grok DeepSearch. Credit: Khamosh Pathak]

In both cases, Perplexity AI did slightly better than Grok on most tasks. With the travel question, I got essentially the same itinerary from both products, but Perplexity AI did a better job at formatting.

Grok did go above and beyond by recommending other options in southern India, whereas Perplexity only offered follow-up questions on the subject. So, I have to give it props there.

When it came to shopping research, though, Grok screwed up with the top product recommendation. The product that it suggested just isn't available in India, where I live, and the other options just aren't what I was looking for.

Perplexity AI, meanwhile, surprised me with its top pick, something that I didn't know about that checks off most of my boxes. Its other options were also interesting, and it did not include anything that isn't available in India. Both Grok and Perplexity did a good job of explaining what I should look for when buying an urban bike, so equal points there, but the latter was just much more usable.

Based on my testing, I feel like Perplexity AI still has an edge over Grok 3 when it comes to Deep Research that's actually useful to the average person. Whether it's planning a trip, shopping research, or understanding news or concepts, Perplexity does a more nuanced job. Grok wins on sheer speed and isn't afraid to provide links in the text itself, but in Perplexity, clicking linked text actually expands on the subject in the report.

Perplexity also has more export options. You can download your report as a PDF, in Markdown, or create a shareable page (here's my report for the urban cycle research if you're interested). In Grok, all you can do is copy the text.

What does all that mean? Well, while Grok is certainly usable, it's a bit disappointing to see its paid offering fail to keep up with a free alternative. That's something I feel I keep bumping into here.

Grok 3 isn't worth the price of admission

Right now, we are in the middle of the Grok 3 hype cycle. Grok 3 itself is improving every day, but as things stand, there's no need for you to run out and cancel your ChatGPT Plus or Perplexity Pro subscriptions. In many ways, Grok is good, just not that good.

If you want, you can temporarily try out Grok 3 for free, as X is allowing limited free access until its servers can't handle the load. When will that period end? Who knows. According to Musk's X account, it'll only be free for a "short time."

Additionally, aside from model performance, Grok 3 also lacks some of the features of a more established AI app. There's no voice mode, and all you have access to right now is the full Grok 3 model. The faster Grok 3 mini is still to be released, and there's no API for Grok 3, either.

When you consider the pricing for full access, Grok 3 makes even less sense. $40 a month for the X Premium+ plan is double the industry standard of $20 for Gemini Advanced, ChatGPT Plus, and Perplexity Pro. And once that free trial period is over, the expensive X Premium+ plan will be the only way to access Grok 3 until the $30 SuperGrok subscription goes live for everyone (the SuperGrok plan gives you access to Grok 3, but none of the premium X features).

And as it stands, you aren't really getting double your money's worth. In fact, in a lot of cases, you can get by using a free model like DeepSeek R1 instead (though you might have a better experience using it through a third-party app).

Copyright 2025 Ziff Davis, LLC. All Rights Reserved.