Schema Markup Does Not Buy You AI Citations (Here Is What the 1,885-Page Data Actually Shows)
Walk into almost any “GEO” or “AI SEO” sales call right now and you will hear the same promise: bolt on enough JSON-LD and the AI engines will start citing you. The numbers get thrown around with suspicious confidence: “36% more likely to be cited”, “60% visibility loss without schema”. Ask for the source and the room goes quiet, because there isn’t one. These figures are invented, then repeated until they sound like consensus.
So let us do the thing the schema vendors avoid: look at actual data.
What the largest test to date found
Ahrefs tracked 1,885 pages that added JSON-LD schema between August 2025 and March 2026, and matched them against control pages with similar citation histories that never touched schema. This is the closest thing the industry has to a controlled experiment on the question.
The result, across three AI platforms:

- Google AI Mode citations moved +2.4%
- ChatGPT citations moved +2.2%
- Google AI Overviews citations moved -4.6% (yes, negative)
Those are not the numbers of a ranking lever. Those are the numbers of noise. If adding schema were the citation cheat code it is sold as, you would not see a platform go down after implementation.
The “3x more likely” trap
Here is where most people get fooled, and where the vendors get their ammunition. In the same dataset, pages that were cited by AI were almost three times more likely to have JSON-LD than pages that were not cited. Sounds damning, right? Implement schema, triple your odds.
No. That is correlation wearing a causation costume.
Schema markup tends to live on better-maintained, more technically sophisticated sites, the same sites that publish stronger content, build more authority, and earn more links. AI systems retrieve that content. The schema is a symptom of a competent site, not the cause of the citation. Confusing the two is the single most expensive mistake in the GEO space right now, because it sends people optimising the marker instead of the thing the marker happens to sit next to.
If you have read our breakdown of how topical authority actually compounds, this pattern will look familiar: the visible artifact gets the credit that belongs to the underlying work.
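To see how a confounder alone can manufacture a "cited pages are far more likely to have schema" statistic, here is a minimal toy model. Every probability in it is invented for illustration and none comes from the Ahrefs dataset. Site quality drives both schema adoption and citations, and schema itself has zero effect on being cited.

```python
# Illustrative only: every probability below is invented for the example,
# not taken from the Ahrefs dataset.
TIERS = {
    # tier: (share of pages, P(has schema), P(cited by AI))
    "well_maintained": (0.30, 0.80, 0.60),
    "neglected": (0.70, 0.20, 0.05),
}


def schema_rate_by_citation(tiers):
    """Return (P(schema | cited), P(schema | not cited)).

    Within each tier, schema and citation are independent, so schema
    has zero causal effect on citation in this toy model.
    """
    schema_and_cited = schema_and_uncited = cited = uncited = 0.0
    for share, p_schema, p_cited in tiers.values():
        cited += share * p_cited
        uncited += share * (1 - p_cited)
        schema_and_cited += share * p_schema * p_cited
        schema_and_uncited += share * p_schema * (1 - p_cited)
    return schema_and_cited / cited, schema_and_uncited / uncited


p_schema_cited, p_schema_uncited = schema_rate_by_citation(TIERS)
ratio = p_schema_cited / p_schema_uncited
print(f"P(schema | cited)     = {p_schema_cited:.2f}")
print(f"P(schema | not cited) = {p_schema_uncited:.2f}")
print(f"ratio                 = {ratio:.1f}x")
```

With these made-up inputs, cited pages come out more than twice as likely to carry schema, even though the model gives schema no influence at all. The gap comes entirely from which kind of site tends to implement it.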
The caveat the headline writers ignored
There is a detail in the study that matters enormously and almost nobody quoted: every page in the dataset already had more than 100 AI Overview citations before any schema was added.
So the study answers a precise question: does schema boost pages that are already visible to AI?
No, not meaningfully. What it does not test is whether schema helps an invisible page become visible in the first place. That is a real open question, and an honest practitioner should say so rather than overclaiming in either direction. As of mid-2026 there are zero peer-reviewed studies settling it. Anyone quoting a precise percentage is selling, not citing.
What schema actually does (and it is not nothing)
Dismissing schema entirely would be its own form of amateurism. Structured data has real, documented jobs:
- Entity disambiguation. It tells search engines that “Mercury” is the planet, the element, or the car, and connects your brand to a known entity in the Knowledge Graph.
- Rich result eligibility. Review stars, FAQ accordions, product price and stock, recipe cards, breadcrumb trails. These are SERP features, not AI citations, and they still drive clicks.
- Machine-readable facts. Clean Product, Article, and Organization markup removes ambiguity about price, author, and publish date.
None of those are “make the content worth citing.” That is the part schema cannot do, and the part the pitch quietly skips.
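For reference, this is roughly what the "machine-readable facts" job looks like in practice. The sketch below builds a schema.org Organization object and wraps it in the script tag that carries JSON-LD. The brand name and URLs are hypothetical placeholders, and the helper names are ours, not part of any library.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build a schema.org Organization object as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs points at profiles that identify the same entity.
        "sameAs": list(same_as),
    }


def to_script_tag(data):
    """Serialise the dict into a JSON-LD <script> tag for the page."""
    # Escape "</" so a stray "</script>" in the data cannot close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical brand and URLs, used only to show the shape of the markup.
markup = organization_jsonld(
    name="Example Co",
    url="https://www.example.com/",
    same_as=["https://www.linkedin.com/company/example-co"],
)
print(to_script_tag(markup))
```

Note what the output contains: a name, a URL, and an identity link. It states facts about the page's publisher; it says nothing about whether the page is worth citing.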
What actually drives AI citations
If schema is downstream of site quality, then the work is upstream. The levers that the data and the platforms’ own behaviour point to:
- Being the original source. Proprietary data, first-hand testing, a framework or benchmark nobody else has. AI engines have a reason to cite the page that contains the fact, not the tenth rewrite of it.
- Answer-first structure in raw HTML. A direct answer in the opening lines that is present in the initial HTML payload, not injected by JavaScript a retrieval bot will never execute.
- Topical authority and links. The same authority signals that have always separated cited sources from the crowd.
- Freshness. AI engines weigh recency; a 2024 page with no updates loses to a maintained 2026 one.
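The answer-first point is easy to check mechanically. The sketch below, using only Python's standard library, extracts the text present in the HTML as delivered (ignoring anything inside script or style tags) and tests whether a given answer appears near the top. The 500-character window and the two sample pages are assumptions made for the example, not thresholds any AI platform has published.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text from HTML as delivered, skipping script/style/title bodies."""

    SKIP = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data)


def answer_in_opening(raw_html, answer, window=500):
    """True if `answer` appears within the first `window` characters of text."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    # Collapse runs of whitespace so line breaks do not hide a match.
    text = " ".join(" ".join(parser.chunks).split())
    return answer.lower() in text[:window].lower()


# Two invented pages: one ships the answer in HTML, one injects it with JS.
static_page = (
    "<html><body><h1>Does schema boost AI citations?</h1>"
    "<p>No, not meaningfully.</p></body></html>"
)
js_page = (
    "<html><body><div id='app'></div>"
    "<script>render('No, not meaningfully.')</script></body></html>"
)

print(answer_in_opening(static_page, "not meaningfully"))  # True
print(answer_in_opening(js_page, "not meaningfully"))      # False
```

To run it against a live page, feed it the response body from a plain HTTP fetch rather than the DOM from a browser, since the point is to see what a non-rendering client sees.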
Stop blindly trusting vendor stats. Test it on your own corpus.
You do not have to take Ahrefs’ word or mine. Run the experiment on your own site:
- Pick 20 to 40 pages with a known AI-citation baseline (track via AI Overview appearances, ChatGPT/Perplexity citations, or a Share-of-Model tool).
- Add or upgrade JSON-LD on half of them. Leave the other half untouched as a control. Match the two groups on traffic and authority so you are comparing like with like.
- Change nothing else during the test window: no content rewrites, no new links.
- Wait 8 to 12 weeks; AI retrieval indexes are slow.
- Compare citation deltas between the two groups, not before-and-after on the same group (that confounds schema with everything else that moved).
If your test shows a real lift, brilliant, you have evidence specific to your niche. If it shows what the 1,885-page study showed, you just saved yourself a quarter of misdirected effort.
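The comparison in the final step is a simple difference of deltas. A minimal sketch follows; the citation counts are placeholders invented for the demo, and each page is represented as a (before, after) pair.

```python
from statistics import mean


def mean_delta(pages):
    """Average change in citations (after minus before) across a group."""
    return mean(after - before for before, after in pages)


def schema_effect(treated, control):
    """Difference in mean deltas: treated group minus control group."""
    return mean_delta(treated) - mean_delta(control)


# Placeholder (before, after) citation counts per page, invented for the demo.
treated = [(12, 15), (30, 29), (8, 10), (21, 22)]
control = [(11, 13), (28, 28), (9, 10), (20, 22)]

print(f"treated delta: {mean_delta(treated):+.2f}")
print(f"control delta: {mean_delta(control):+.2f}")
print(f"estimated schema effect: {schema_effect(treated, control):+.2f}")
```

In this made-up example the schema group gains 1.25 citations per page on average, which would look like a win in a before-and-after chart. The control group gains exactly the same amount, so the estimated effect of schema is zero. With only 20 to 40 pages, treat a small non-zero difference with the same suspicion: it may well be noise.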
The verdict
Schema markup is hygiene and an eligibility ticket for rich results. It is not an AI-citation lever, and the largest dataset we have says so plainly. The agencies quoting “36% more citations” are reading a correlation backwards and hoping you do not check. Implement schema because it disambiguates your entities and earns SERP features; do not implement it expecting ChatGPT to suddenly notice you. The thing that earns the citation is the thing schema cannot fake: content worth citing.
