Tech SEO for AI Agents

By Ves Ivanov

AI agents are no longer hypothetical. OpenClaw, an open-source personal assistant, hit over 150k GitHub stars in weeks. People are using it to manage email, build software, book flights, and automate workflows across dozens of services.

These agents don't browse your website like we do. They don't even browse like traditional crawlers. They want to act, and that changes everything about how we should think about building for them.

Two Types of Bots, Two Different Problems

Static crawlers (ClaudeBot, GPTBot, Googlebot) fetch your pages via HTTP, parse the HTML, and extract text. Many don't execute JavaScript; they don't click buttons. They read.

Interactive agents (OpenClaw, browser-use, custom automation) control a browser or call APIs. They navigate, click, fill forms, submit data. They do things on behalf of users.

Traditional tech SEO—clean HTML, semantic markup, crawlable URLs—serves the first group. This article focuses on the second: agents that need to do things on your site (submit forms, book, update data), not just read and index it.

The good news: many of the same principles apply. Let's dive in.

The Two Paths

Agents interact with your site in two ways:

UI automation: The agent controls a browser, clicks buttons, fills forms, scrapes results.

Structured access: The agent calls an API, connects via MCP, or reads a feed. Clean data in, clean data out.

UI automation works when no API exists, and sometimes it's the only option, but it's brittle: the agent is interpreting layouts and copy meant for humans, not structured data meant for programs.

Structured access is what agents prefer. OpenClaw connects to dozens of services through MCP, not screen scraping. When an agent books a flight or updates a calendar, it's calling an API, not clicking through a UI.

The opportunity: If you give agents a clean, structured way to interact with your service, they'll use it. If you don't, they'll scrape your UI, or skip you entirely.

Structured Access

When to build what

| Access type | Best for | Effort |
|---|---|---|
| MCP server | Interactive services, actions, real-time data | High |
| REST API | Complex queries, CRUD operations | Medium |
| JSON feed | Content updates (blog, products, events) | Low |
| RSS/Atom | Chronological content | Very low |

MCP servers

MCP (Model Context Protocol) is a standard that lets AI agents connect to external services through a unified interface. Instead of building custom integrations for every agent, you build one MCP server and any MCP-compatible agent can use it.

An MCP server exposes three building blocks: Tools (actions: send email, create task, query database), Resources (read-only data: documents, records, state), and Prompts (pre-built interaction templates).
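To make that concrete, here is a minimal sketch using the official MCP Python SDK (the mcp package and its FastMCP helper). The task service, tool, and resource names are hypothetical placeholders, not a prescribed design:

from mcp.server.fastmcp import FastMCP

# Hypothetical task service exposed to any MCP-compatible agent
mcp = FastMCP("task-service")

TASKS = {"1": {"title": "Write agents.md", "done": False}}

@mcp.tool()
def create_task(title: str) -> str:
    """Action: create a task and return its ID."""
    task_id = str(len(TASKS) + 1)
    TASKS[task_id] = {"title": title, "done": False}
    return task_id

@mcp.resource("tasks://open")
def open_tasks() -> str:
    """Read-only data: open tasks as plain text."""
    return "\n".join(t["title"] for t in TASKS.values() if not t["done"])

if __name__ == "__main__":
    mcp.run()

Any MCP-compatible agent that connects to this server can discover the create_task tool and the tasks://open resource without a custom integration.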

When MCP makes sense:

  • Your service has actions, not just content
  • You want agents to interact in real-time
  • You're building for the OpenClaw/agent ecosystem specifically

For implementation details, see Anthropic's MCP documentation.

REST APIs

If you already have an API, you're ahead. The question is whether it's agent-friendly.

Agent-friendly APIs need to be more explicit than what human developers require, because agents can't infer or improvise the way a developer reading the docs can. In practice, that means (a sketch follows this list):

  • Predictable response shapes. Same structure every time, no surprises.
  • Explicit errors. {"error": "email_required", "field": "email"} not {"error": "Something went wrong"}.
  • Clear pagination. Cursor-based or explicit next/prev links. No "figure out the page parameter" puzzles.
  • Documented rate limits. Agents will hit them. Tell them what they are.
  • No browser-based auth flows. API keys or OAuth with clear token endpoints.
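Here is a minimal Flask sketch of explicit errors and cursor-based pagination; the endpoint, field names, and in-memory data are placeholders, not a reference design:

from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder data; a real API would query a database
USERS = [{"id": i, "name": f"user{i}"} for i in range(42)]

@app.route("/users")
def list_users():
    cursor = request.args.get("cursor", "0")
    if not cursor.isdigit():
        # Explicit, machine-readable error instead of "something went wrong"
        return jsonify({"error": "invalid_cursor", "field": "cursor"}), 400
    start = int(cursor)
    page = USERS[start:start + 10]
    next_cursor = str(start + 10) if start + 10 < len(USERS) else None
    # Same response shape every time, with an explicit cursor for the next page
    return jsonify({"data": page, "next_cursor": next_cursor})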

JSON and RSS feeds

The lowest-effort option for content-based sites.

JSON Feed (jsonfeed.org) is the modern alternative to RSS:

  • Native JSON - no XML parsing
  • Clean, predictable structure
  • Easy to generate from any backend

What to include:

  • Full content (not just summaries) when practical
  • Timestamps for updates
  • Stable IDs for each item
  • Clear content types

The point is to offer a structured feed rather than forcing agents to scrape HTML.
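As a sketch, here is one way to generate a JSON Feed in Python. The post structure, titles, and URLs are placeholders; in practice the items would come from your CMS or database:

import json
from datetime import datetime, timezone

# Hypothetical in-memory posts standing in for your content store
POSTS = [
    {"id": "2024-01-hello", "title": "Hello", "body": "Full post text...",
     "url": "https://example.com/blog/hello",
     "updated": datetime(2024, 1, 5, tzinfo=timezone.utc)},
]

def build_json_feed() -> str:
    return json.dumps({
        "version": "https://jsonfeed.org/version/1.1",
        "title": "Example Blog",
        "home_page_url": "https://example.com/",
        "feed_url": "https://example.com/feed.json",
        "items": [
            {
                "id": post["id"],                      # stable ID
                "url": post["url"],
                "title": post["title"],
                "content_text": post["body"],          # full content, not a summary
                "date_modified": post["updated"].isoformat(),
            }
            for post in POSTS
        ],
    }, indent=2)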

Agent-Friendly Documentation

Humans skim docs and experiment. Agents need everything spelled out precisely.

Example sites that get it right: Stripe API docs (comprehensive, consistent, every edge case documented), GitHub API (clear structure, good examples), and most headless CMS platforms (content via API by default).

What to include

  • Every endpoint, every parameter, every response field
  • Request/response examples for every operation
  • Error responses with codes and meanings
  • Auth flow, step by step
  • Rate limits and quotas, explicitly stated

Format

Markdown is preferred. LLMs parse it cleanly. OpenAPI specs work for structured tooling.

Example

Bad:

Pass the user ID to get user info.

Good:

GET /users/{id}

Response: { "id": string, "name": string, "email": string }

Errors:
- 404: User not found
- 401: Missing or invalid Authorization header

Requires: Authorization: Bearer <token>

Content Negotiation

You can serve the same page in different formats depending on who's asking. That's content negotiation: same URL, different response based on the client's Accept header. Standard HTTP.

A browser sends Accept: text/html and gets your webpage. An agent sends Accept: text/markdown and gets clean, structured content—no nav, no ads, no chrome.

Implementation

Your middleware checks the Accept header:

if "text/markdown" in request.headers.get("Accept", ""):
    return markdown_response(content)
return html_response(content)

Strip everything except the content itself. Preserve structure—headings, lists, links.
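Here is what that could look like as a small Flask app. The PAGES store and the route are placeholders; in practice the markdown would be generated from the same source that renders your HTML:

from flask import Flask, Response, request

app = Flask(__name__)

# Hypothetical pre-rendered content for one page, in both formats
PAGES = {
    "pricing": {
        "html": "<html><body><nav>...</nav><h1>Pricing</h1>...</body></html>",
        "markdown": "# Pricing\n\nPlans start at ...",
    }
}

@app.route("/<slug>")
def page(slug):
    content = PAGES.get(slug)
    if content is None:
        return Response("Not found", status=404)
    if "text/markdown" in request.headers.get("Accept", ""):
        # Agents asking for markdown get clean content: no nav, no ads, no chrome
        return Response(content["markdown"], mimetype="text/markdown")
    return Response(content["html"], mimetype="text/html")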

The agents.md pattern

Agents need one predictable place to discover what your site does and how to use it. A dedicated file at /agents.md gives you that: a single URL where you document capabilities instead of leaving agents to guess.

Create it at /agents.md with:

  • What your site/service does
  • What agents can do here
  • Links to API documentation
  • Authentication requirements
  • Rate limits

Think of it as robots.txt for capabilities, not restrictions.

See vesivanov.com/agents.md for a working example.

Note: This pattern is still experimental. There's no established standard yet—llms.txt was proposed but didn't gain traction. The concept is sound.

Discoverability

You've built structured access. Now agents need to find it.

Sitemap

Include machine-readable resources:

  • /agents.md
  • API documentation URLs
  • Feed endpoints

Consider a separate sitemap for agent-facing resources.

Homepage signals

Link to agents.md from your homepage. A footer link works; so does a link element in the document head:

<link rel="alternate" type="text/markdown" href="/agents.md" title="Agent instructions">

Crawlable and Automation-Friendly HTML

Not every site can offer an API, and not every agent will use one. Many agents, and all static crawlers, still rely on your HTML. The same basics that make pages crawlable (so crawlers and simple tools see content) also make them work with browser automation when an agent drives a real browser.

When this matters

  • Legacy systems without APIs
  • Third-party sites you don't control
  • Agents that default to browser-based interaction before checking for structured alternatives
  • Any site that hasn't implemented structured access yet (most of them)

If you do nothing else, getting these basics right means crawlers can read your content and agents can still operate your site when they fall back to the UI.

Server-side rendering and the JavaScript problem

Static crawlers and many automation tools can't execute JavaScript. If your content only exists after client-side rendering, they see an empty page.

Check what agents actually see:

curl https://yoursite.com/page

If the response is just <div id="root"></div> and script tags, your content is invisible to crawlers.

Solutions:

  • Server-side rendering (SSR)
  • Static site generation
  • Hybrid approaches (Next.js, Nuxt, etc.)

The same page that works in a browser might be completely empty to a bot. Test with curl, not DevTools.
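The same check can be scripted. A quick sketch with Python's standard library, where the URL and marker text are placeholders for one of your pages and a phrase that should appear in its content:

import urllib.request

# Fetch the page the way a non-JavaScript crawler sees it: raw HTML, no rendering
url = "https://yoursite.com/page"        # placeholder URL
marker = "Pricing plans"                 # placeholder text that should be in the content

html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
if marker in html:
    print("Content is present in the raw HTML")
else:
    print("Content missing: it likely only appears after client-side rendering")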

Semantic HTML

Use "real" elements, not JS divs:

<!-- Bad: agent can't identify this as a button -->
<div class="btn" onclick="submit()">Submit</div>

<!-- Good: semantic button -->
<button type="submit">Submit</button>

Agents understand roles (button, textbox, link) and names (labels). Semantic HTML speaks their language.

Use <button>, <a href>, <input>, <select>. Use <main>, <nav>, <article> for page structure. Don't skip heading levels.

Labels on everything

Agents identify form fields by their labels, not their position on screen.

<!-- Bad: agent can't identify this field -->
<input type="text" placeholder="Email">

<!-- Good: explicit label -->
<label for="email">Email address</label>
<input type="email" id="email" name="email">

Every input needs a label. Every button needs clear text. If it's interactive, it needs a name.

Stable selectors

Add data-testid attributes for automation hooks:

<button type="submit" data-testid="checkout-button">Complete Purchase</button>

Class names change. IDs get refactored. data-testid is an explicit contract with automation tools.

Real navigation

Links should be links:

<!-- Bad: not a real link -->
<span onclick="navigate('/about')">About</span>

<!-- Good: real link -->
<a href="/about">About</a>

Clear success and failure states

Agents need to know if an action worked. Transient toasts that disappear after 3 seconds don't cut it.

<!-- Bad: disappearing toast -->
<script>showToast("Success!", 3000);</script>

<!-- Good: persistent status -->
<div role="status" class="success-message">
  Order submitted. Confirmation #12345.
</div>

Show explicit, persistent feedback. Include relevant details (confirmation numbers, next steps). Make errors specific and actionable.

How to Test

Can someone using curl and a text editor understand your page structure? Can Playwright script your core user flows without fragile workarounds?

If not, agents will struggle too.
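As a sketch of the Playwright side, assuming the hypothetical markup from the earlier examples (the email label, the checkout-button test ID, and a role="status" message):

from playwright.sync_api import sync_playwright

# Hypothetical smoke test for a checkout flow; it leans on labels, roles,
# and data-testid hooks rather than CSS classes
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://yoursite.com/checkout")            # placeholder URL
    page.get_by_label("Email address").fill("test@example.com")
    page.get_by_test_id("checkout-button").click()
    # A persistent role="status" region makes success programmatically checkable
    assert "Order submitted" in page.get_by_role("status").inner_text()
    browser.close()

If a script like this needs fragile workarounds to get through your core flows, agents will hit the same walls.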

We're Early

Some of the patterns in this article (MCP servers, agents.md, content negotiation for AI) are emerging, not established. Six months from now, the specifics may look different.

But some things won't change: agents are becoming real users of the web, they prefer structured data over screen scraping, and they need explicit documentation. This is just the beginning.