AI-generated TLDR
CDN logs expose the hidden middle of the AI visibility pipeline: crawl -> retrieval -> citation. A hit from ChatGPT-User, Claude-User, or Perplexity-User means your page was pulled into a live answer's consideration pool, even if it was never cited. Pages that get retrieval hits but no citations need to be made more quotable, and important pages that never get retrieval hits at all are often the clearest optimization targets. Vendor docs tell you what each request means; your own logs tell you what is actually happening.
AI SEO is hard to measure because the most important step is usually the least visible.
You can see crawler activity. You can track citations and referrals. But the middle of the pipeline, when an AI system actually pulls your page into an answer, is usually hidden.
CDN logs are one of the few places where that layer is visible: they expose signals that sit between crawler access and a visible citation, which makes them one of the few ways to get a better read on what is actually happening.
If a page gets a hit from the ChatGPT-User, Claude-User, or Perplexity-User agents, your content was fetched during a live user session inside an AI product. You can't tell whether you were cited, but it is a definitive signal that your page was part of the consideration pool.
The model to use: crawl -> retrieval -> citation
Easy way to think about AI visibility:
crawl -> retrieval -> citation
A crawl event means the page was discovered, refreshed, or indexed for some later use.
A retrieval event means the page was pulled into a live workflow from a RAG system in response to a user question.
A citation means the page made it to the surface through a visible link, or a downstream referral.
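The three stages can be sketched as a simple user-agent classifier. This is a minimal sketch, assuming the stages map cleanly onto agent names: the *-User names come from this article, while the crawler names (GPTBot, ClaudeBot, PerplexityBot) are assumptions based on each vendor's published bot lists.

```python
# The *-User agents signal retrieval (user-triggered fetches);
# the crawler names are assumed here and should be checked
# against each vendor's current bot documentation.
RETRIEVAL_AGENTS = ("ChatGPT-User", "Claude-User", "Perplexity-User")
CRAWL_AGENTS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

def pipeline_stage(user_agent: str) -> str:
    """Return 'retrieval', 'crawl', or 'other' for a log line's UA string."""
    if any(name in user_agent for name in RETRIEVAL_AGENTS):
        return "retrieval"
    if any(name in user_agent for name in CRAWL_AGENTS):
        return "crawl"
    return "other"
```

Citations do not show up in the classifier because they are not visible in CDN logs at all; you see them only as referral traffic or links in answers.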
We tend to focus on the first and third, but the second is the most actionable.
What retrieval hits can help you diagnose
If a URL gets repeated *-User fetches over time, that page is likely relevant: it keeps showing up in the consideration set.
If a page gets retrieval hits but never earns citations or referrals, start asking whether it is quotable enough to warrant inclusion. Is the page too broad? Does it bury the key fact under a long intro? Does it lack a clean definition, a direct comparison table, a timestamp, or a strong opening summary?
And if an important page never gets retrieval hits at all, that is often the clearest optimization target. The page may be misaligned with the way real user questions are phrased, too vague to rank as a source, or too thin to survive retrieval.
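Once you have per-URL retrieval and citation counts, the two failure modes above can be expressed as a simple triage. This is a hedged sketch, not a standard tool: it assumes you have already aggregated counts from your logs and analytics, and the `triage` function and bucket names are hypothetical.

```python
from collections import Counter

def triage(retrievals: Counter, citations: Counter,
           priority_pages: set) -> dict:
    """Bucket pages by where they fall out of the crawl -> retrieval -> citation funnel."""
    # Quotability problem: in the candidate pool, but never surfaced.
    retrieved_not_cited = [u for u in retrievals if citations.get(u, 0) == 0]
    # Relevance problem: important pages that never enter the pool.
    never_retrieved = [u for u in priority_pages if retrievals.get(u, 0) == 0]
    return {
        "check_quotability": sorted(retrieved_not_cited),
        "check_relevance": sorted(never_retrieved),
    }
```

The first bucket points at rewriting for quotability; the second points at realigning the page with how real user questions are phrased.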
How to track retrievals
First, filter CDN logs for these user agents:
- ChatGPT-User
- Claude-User
- Perplexity-User
Then group by URL, count repeat hits, and compare those URLs against the pages you actually want to win with.
In practice, that gives you a sensible workflow:
- Match on *-User in your logs.
- Look for repeated patterns by URL or section.
- Then compare retrieval activity against citations, referrals, and business-priority pages.
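The filtering and grouping steps above can be sketched in a few lines. This is a minimal sketch that assumes CDN logs in a combined-log-style format; the regex is a simplification and will need adjusting to your CDN's actual log layout.

```python
import re
from collections import Counter

# Assumed log shape: combined log format, where the quoted request line
# holds the path and the final quoted field holds the user agent.
LOG_LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[^"]*"'
                      r'.*"(?P<ua>[^"]*)"$')
RETRIEVAL_AGENTS = ("ChatGPT-User", "Claude-User", "Perplexity-User")

def retrieval_hits_by_url(lines):
    """Count *-User fetches per URL from raw CDN log lines."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and any(a in m.group("ua") for a in RETRIEVAL_AGENTS):
            hits[m.group("path")] += 1
    return hits
```

`hits.most_common()` then gives you the URLs to compare against your citation, referral, and business-priority lists.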
Use docs
If you're unsure what a request means, use the docs to interpret it; vendor documentation is the best source of truth. Then use your own logs to see what is actually happening.
Notes
- In a March 2026 ChatGPT study from Airops (https://www.airops.com/report/influence-of-retrieval-fanout-and-google-serps-in-chatgpt), 548,534 pages were retrieved during answer generation, but only 15% were cited in the final response. That is exactly why CDN logs matter: a ChatGPT-User, Claude-User, or Perplexity-User hit is not telling you that you won the citation. It is telling you something earlier, and often more useful: that your page made it into the candidate pool at all.
- OpenAI explicitly separates OAI-SearchBot from ChatGPT-User, saying OAI-SearchBot is used to surface sites in ChatGPT search while ChatGPT-User is used for certain user-triggered actions and is not used for automatic crawling or search inclusion. Anthropic and Perplexity document similar distinctions between search crawlers and user-triggered fetch agents in their own bot documentation (Anthropic, Perplexity).
Notes
- Published: March 19, 2026
- Author: Ves Ivanov
- Source URL: https://vesivanov.com/blog/cdn-logs-ai-seo