(New Research Proves It)
I’ve been watching the AI visibility tracking industry explode with a growing sense of dread. Every week, another vendor slides into my inbox promising to track my clients’ “AI rankings.” Every conference has a new panel on “optimizing for ChatGPT.”
And I’ve kept my mouth shut. Until now.
New research from SparkToro just dropped, and it confirms what my gut has been screaming for months. What are these AI tracking tools selling?
Complete and utter bullshit.
The Numbers Don’t Lie
Rand Fishkin and Patrick O’Donnell ran an experiment with 600 volunteers. They asked ChatGPT, Claude, and Google AI the same questions over and over again. We’re talking nearly 3,000 prompts across 12 different topics.
Here’s what they found:
- Less than a 1 in 100 chance that ChatGPT or Google AI will give you the same list of brands twice when asked the same question 100 times
- Less than a 1 in 1,000 chance of getting the same list in the same order
- List lengths jump around wildly. Sometimes you get 3 recommendations. Sometimes you get 10+. No rhyme or reason.
Read that again. It changes everything about how you should approach AI tracking.
“Ranking Position” Is Meaningless in AI Overviews
Here’s the thing.
If you’re paying for a tool that shows you “ranked #3 in ChatGPT for audience research software,” you’re being played. The data is so random that “ranking position” doesn’t exist in any meaningful way.
Think about it. You run a prompt 100 times. You get 100 different orderings. What exactly is your “rank”? It’s like trying to nail jello to a wall.
The SparkToro research puts it bluntly: these AI tools are “probability engines.” They’re designed to generate unique answers every single time. Treating them like Google search results from 2015 is provably nonsensical.
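You can see why “rank” collapses with a toy simulation. To be clear, this is not the SparkToro methodology — just a random stand-in that mimics the observed behavior, where membership, order, and list length all vary run to run:

```python
import random
from collections import Counter

random.seed(42)

# Illustrative brand pool — names are examples, not data from the study.
BRANDS = ["Bose", "Sony", "Apple", "Sennheiser", "JBL", "Anker", "Beats"]

def fake_llm_response():
    """Stand-in for a real LLM call: returns a brand list whose
    membership, order, and length all vary on every call."""
    k = random.randint(3, len(BRANDS))
    return random.sample(BRANDS, k)

# Ask the "same question" 100 times and record where one brand lands.
ranks = []
for _ in range(100):
    answer = fake_llm_response()
    if "Bose" in answer:
        ranks.append(answer.index("Bose") + 1)

print(Counter(ranks))  # ranks scatter across positions
```

Run it and the Counter shows the “position” smeared across the board. Collapsing that spread into a single rank number is exactly the move the vendors are selling you.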
What Actually Works (If Anything)
I’m not going to sugarcoat it. My initial assumption was that AI tracking was entirely useless.
But the research surprised me on one point.
Visibility percentage (how often your brand appears across dozens or hundreds of prompts) does seem to be a reasonable metric. When SparkToro ran their headphones prompts 994 times, brands like Bose and Sony consistently showed up 55-77% of the time.
The key difference:
- BS metric: “You rank #3 in ChatGPT”
- Valid metric: “Your brand appears in 67% of AI responses for this topic”
See what I’m getting at?
One pretends precision that doesn’t exist. The other acknowledges the chaotic reality while still measuring something useful.
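The valid metric is also trivial to compute yourself. A minimal sketch, with made-up toy responses standing in for real prompt runs:

```python
def visibility_pct(responses, brand):
    """Share of responses that mention the brand at all --
    ignores ordering entirely, which is the point."""
    hits = sum(1 for r in responses if brand in r)
    return 100.0 * hits / len(responses)

# Toy data standing in for ~100 runs of the same prompt.
runs = [
    ["Sony", "Bose", "JBL"],
    ["Bose", "Apple"],
    ["Sennheiser", "Sony", "Bose", "Anker"],
    ["Sony", "JBL"],
]
print(f"Bose: {visibility_pct(runs, 'Bose'):.0f}%")  # 75%
```

No ranking algorithm, no secret sauce. Just counting appearances across many runs.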
The Prompt Problem Nobody Talks About
Here’s where it gets worse for the AI tracking vendors.
Even when people have the exact same intent, they craft wildly different prompts. The research found a semantic similarity score of just 0.081 across 142 human-written prompts about the same topic.
That’s like comparing Kung Pao Chicken to Peanut Butter. Sure, both have peanuts. But they’re not remotely the same dish.
So even if a tool could perfectly track visibility (spoiler: it can’t), it would need to anticipate the infinite ways real humans actually phrase their questions. Good luck with that.
What This Means For Your Budget
Let me break down what you should actually do with this information.
Stop buying:
- Any tool claiming to track your “AI ranking position”
- Services promising to “optimize your rank in ChatGPT”
- Reports showing you moved from position 5 to position 3 (meaningless noise)
Consider buying (with caution):
- Visibility percentage tracking, IF the vendor runs prompts 60-100+ times per query
- Tools that are transparent about their methodology
- Services that acknowledge the statistical limitations
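Why 60-100+ runs? Basic sampling math. If you treat each run as an independent coin flip (a simplifying assumption on my part — the research doesn’t frame it this way), the margin of error on a visibility percentage shrinks with the square root of the run count:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% CI half-width for a proportion from n runs
    (normal approximation to the binomial)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (10, 60, 100, 994):
    moe = margin_of_error(0.67, n)
    print(f"n={n:4d}: 67% +/- {moe * 100:.1f} pts")
```

At 10 runs, a “67% visibility” claim is roughly ±29 points — pure noise. At 100 runs it tightens to under ±10, and at SparkToro’s 994 runs it’s about ±3. That’s why the run count is the first question to ask a vendor.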
Your sector matters too. Narrow niches (local car dealerships, B2B SaaS providers) show more consistency than broad consumer spaces (novels, headphones). The research found much tighter correlations in spaces with fewer competitors.
The Red Flags I Wish I’d Known
I’ve been in SEO since 2013. I’ve seen snake oil salespeople weaponize every algorithm update to sell garbage services.
This feels exactly the same!
Watch out for vendors who:
- Won’t publish their methodology
- Claim proprietary “AI ranking algorithms”
- Show suspiciously precise ranking data
- Can’t explain how many times they run each prompt
- Refuse to answer questions about statistical validity
As Fishkin pointed out, sketchy AI visibility “experts” could easily weaponize this randomness. Run a prompt until you get a favorable result. Screenshot it. Claim success.
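The cherry-picking math is ugly. Suppose your brand appears in just 20% of responses (an illustrative figure, not from the study) — the odds of a “consultant” landing a screenshot-worthy result climb fast with retries:

```python
# Probability of at least one favorable response in k independent tries,
# assuming a 20% per-run appearance rate (illustrative number).
p = 0.20
for k in (1, 5, 10):
    print(f"{k} tries: {1 - (1 - p) ** k:.0%} chance of at least one hit")
```

One try gets you 20%. Five tries, 67%. Ten tries, 89%. Re-roll, screenshot, invoice.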
I’ve seen some shit in this industry. This has all the hallmarks of the next big scam.
The Bottom Line
AI tools don’t give consistent lists of brand recommendations. Period.
The writing’s on the wall. If you want useful data, demand transparency from your vendors. Ask how many times they run each prompt. Ask about their statistical methodology. If they can’t answer, walk away.
And if someone tries to sell you on improving your “AI ranking position”? Run.
I’ve been feeding the algorithm since 1996. The algorithm keeps changing. The snake oil stays the same.
