Last active
February 6, 2026 07:56
Save vimalk78/1e13b56736a6840810e8a933754671d2 to your computer and use it in GitHub Desktop.
agent-friendly content check
| ❯ python3 ./agent_friendly_check.py | |
| ============================================================================== | |
| AGENT-FRIENDLY CONTENT CHECK | |
| Do websites already serve agent-friendly content? | |
| ============================================================================== | |
| Hypothesis: If AI agents browse the web on behalf of users, websites | |
| should offer agent-friendly content instead of HTML. This program | |
| checks 34 major websites to see if any already do. | |
| For each site, we probe for: | |
| 1. /llms.txt, /llms-full.txt — emerging standard for LLM-readable pages | |
| 2. Content negotiation — requesting JSON, Markdown, or plain text | |
| via the HTTP Accept header | |
| 3. Machine endpoints — /.well-known/ai-plugin.json, OpenAPI specs | |
| 4. RSS/Atom feeds — legacy machine-readable format | |
| 5. robots.txt AI rules — do they mention GPTBot, Claude, etc.? | |
| 6. Sitemap & API endpoints — /sitemap.xml, /api | |
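Probe 2 (content negotiation) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code of `agent_friendly_check.py`, which is not shown here; the names `ACCEPT_TYPES`, `honors_accept`, and `probe` are hypothetical, and the rule "the response Content-Type equals the requested media type" is an assumed success criterion.

```python
import urllib.request

# Accept values taken from the matrix legend (JSON, MD, TXT).
ACCEPT_TYPES = {
    "JSON": "application/json",
    "Markdown": "text/markdown",
    "Plain text": "text/plain",
}

def honors_accept(requested: str, content_type_header: str) -> bool:
    """True if the response Content-Type matches the requested media type."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == requested.lower()

def probe(url: str, timeout: float = 10.0) -> list[str]:
    """Return the labels of non-HTML formats the URL serves when asked."""
    served = []
    for label, media_type in ACCEPT_TYPES.items():
        req = urllib.request.Request(url, headers={"Accept": media_type})
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                content_type = resp.headers.get("Content-Type", "")
                if honors_accept(media_type, content_type):
                    served.append(label)
        except OSError:
            # Covers URLError, HTTPError and timeouts: count as "not served".
            continue
    return served
```

Calling `probe("https://docs.example.com/")` (a placeholder URL) would return a list such as the "Markdown" or "JSON, Plain text" entries in the transcript. A stricter check might also inspect the body, since some servers echo the requested type while still returning HTML.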
| Probing docs.anthropic.com... done (15.4s) — 1: Markdown | |
| Probing platform.openai.com... done (12.5s) — 3: llms.txt, llms-full.txt, sitemap | |
| Probing ai.google.dev... done (11.1s) — 1: sitemap | |
| Probing docs.mistral.ai... done (1.9s) — 3: llms.txt, llms-full.txt, OpenAPI | |
| Probing docs.cohere.com... done (16.9s) — 4: llms.txt, llms-full.txt, text/plain, sitemap | |
| Probing docs.perplexity.ai... done (4.8s) — 5: llms.txt, llms-full.txt, Markdown, OpenAPI, sitemap | |
| Probing huggingface.co... done (6.9s) — 3: OpenAPI, sitemap, API | |
| Probing docs.aws.amazon.com... done (2.6s) — 2: llms.txt, llms-full.txt | |
| Probing learn.microsoft.com... done (7.5s) — none | |
| Probing docs.together.ai... done (10.5s) — 5: llms.txt, llms-full.txt, Markdown, OpenAPI, sitemap | |
| Probing docs.fireworks.ai... done (10.5s) — 4: llms.txt, llms-full.txt, Markdown, sitemap | |
| Probing docs.groq.com... done (10.1s) — none | |
| Probing docs.langchain.com... done (6.5s) — 4: llms.txt, llms-full.txt, Markdown, sitemap | |
| Probing docs.llamaindex.ai... done (4.0s) — none | |
| Probing docs.cursor.com... done (4.8s) — none | |
| Probing docs.python.org... done (1.1s) — 1: sitemap | |
| Probing docs.stripe.com... done (7.7s) — 5: llms.txt, JSON, text/plain, sitemap, API | |
| Probing developer.mozilla.org... done (10.9s) — 2: RSS, sitemap | |
| Probing docs.github.com... done (1.0s) — 1: llms.txt | |
| Probing vercel.com... done (10.3s) — 3: llms.txt, Markdown, sitemap | |
| Probing nextjs.org... done (2.5s) — 2: llms.txt, sitemap | |
| Probing react.dev... done (2.1s) — 2: llms.txt, RSS | |
| Probing www.bbc.com... done (3.0s) — 2: sitemap, robots.txt:AI | |
| Probing www.nytimes.com... done (4.9s) — 2: RSS, robots.txt:AI | |
| Probing www.theguardian.com... done (5.5s) — 2: RSS, robots.txt:AI | |
| Probing en.wikipedia.org... done (6.6s) — 1: RSS | |
| Probing stackoverflow.com... done (2.7s) — none | |
| Probing www.amazon.com... done (17.5s) — 1: robots.txt:AI | |
| Probing www.ebay.com... done (17.7s) — 1: robots.txt:AI | |
| Probing www.usa.gov... done (17.2s) — 1: sitemap | |
| Probing data.gov... done (19.8s) — 1: sitemap | |
| Probing www.reddit.com... done (11.0s) — 1: RSS | |
| Probing github.com... done (7.2s) — 1: JSON | |
| Probing www.cloudflare.com... done (5.5s) — 1: sitemap | |
| ============================================================================== | |
| RESULTS MATRIX | |
| ============================================================================== | |
| Site                    llms JSON MD   TXT  RSS  API  OAI  Map | |
| ------------------------------------------------------------------------------ | |
| docs.anthropic.com      -    -    YES  -    -    -    -    - | |
| platform.openai.com     YES  -    -    -    -    -    -    YES | |
| ai.google.dev           -    -    -    -    -    -    -    YES | |
| docs.mistral.ai         YES  -    -    -    -    -    YES  - | |
| docs.cohere.com         YES  -    -    YES  -    -    -    YES | |
| docs.perplexity.ai      YES  -    YES  -    -    -    YES  YES | |
| huggingface.co          -    -    -    -    -    YES  YES  YES | |
| docs.aws.amazon.com     YES  -    -    -    -    -    -    - | |
| learn.microsoft.com     -    -    -    -    -    -    -    - | |
| docs.together.ai        YES  -    YES  -    -    -    YES  YES | |
| docs.fireworks.ai       YES  -    YES  -    -    -    -    YES | |
| docs.groq.com           -    -    -    -    -    -    -    - | |
| docs.langchain.com      YES  -    YES  -    -    -    -    YES | |
| docs.llamaindex.ai      -    -    -    -    -    -    -    - | |
| docs.cursor.com         -    -    -    -    -    -    -    - | |
| docs.python.org         -    -    -    -    -    -    -    YES | |
| docs.stripe.com         YES  YES  -    YES  -    YES  -    YES | |
| developer.mozilla.org   -    -    -    -    YES  -    -    YES | |
| docs.github.com         YES  -    -    -    -    -    -    - | |
| vercel.com              YES  -    YES  -    -    -    -    YES | |
| nextjs.org              YES  -    -    -    -    -    -    YES | |
| react.dev               YES  -    -    -    YES  -    -    - | |
| www.bbc.com             -    -    -    -    -    -    -    YES | |
| www.nytimes.com         -    -    -    -    YES  -    -    - | |
| www.theguardian.com     -    -    -    -    YES  -    -    - | |
| en.wikipedia.org        -    -    -    -    YES  -    -    - | |
| stackoverflow.com       -    -    -    -    -    -    -    - | |
| www.amazon.com          -    -    -    -    -    -    -    - | |
| www.ebay.com            -    -    -    -    -    -    -    - | |
| www.usa.gov             -    -    -    -    -    -    -    YES | |
| data.gov                -    -    -    -    -    -    -    YES | |
| www.reddit.com          -    -    -    -    YES  -    -    - | |
| github.com              -    YES  -    -    -    -    -    - | |
| www.cloudflare.com      -    -    -    -    -    -    -    YES | |
| llms = /llms.txt | JSON = Accept: application/json | MD = Accept: text/markdown | |
| TXT = Accept: text/plain | RSS = RSS/Atom feed | API = /api endpoint | |
| OAI = OpenAPI spec | Map = sitemap.xml | |
| ============================================================================== | |
| llms.txt ADOPTION (13/34 sites) | |
| ============================================================================== | |
| platform.openai.com: | |
| /llms.txt — 18,764 bytes | |
| /llms-full.txt — 1,525,306 bytes | |
| docs.mistral.ai: | |
| /llms.txt — 14,660 bytes | |
| /llms-full.txt — 991,794 bytes | |
| docs.cohere.com: | |
| /llms.txt — 102,615 bytes | |
| /llms-full.txt — 2,886,704 bytes | |
| docs.perplexity.ai: | |
| /llms.txt — 20,490 bytes | |
| /llms-full.txt — 957,176 bytes | |
| docs.aws.amazon.com: | |
| /llms.txt — 282,390 bytes | |
| /llms-full.txt — 577,989 bytes | |
| docs.together.ai: | |
| /llms.txt — 30,585 bytes | |
| /llms-full.txt — 1,233,716 bytes | |
| docs.fireworks.ai: | |
| /llms.txt — 42,853 bytes | |
| /llms-full.txt — 820,341 bytes | |
| docs.langchain.com: | |
| /llms.txt — 84,102 bytes | |
| /llms-full.txt — 7,108,302 bytes | |
| docs.stripe.com: | |
| /llms.txt — 88,769 bytes | |
| docs.github.com: | |
| /llms.txt — 1,914 bytes | |
| vercel.com: | |
| /llms.txt — 222,839 bytes | |
| nextjs.org: | |
| /llms.txt — 6,668 bytes | |
| react.dev: | |
| /llms.txt — 14,347 bytes | |
| ============================================================================== | |
| CONTENT NEGOTIATION (9/34 sites respond to Accept headers) | |
| ============================================================================== | |
| docs.anthropic.com: Markdown | |
| docs.cohere.com: Plain text | |
| docs.perplexity.ai: Markdown | |
| docs.together.ai: Markdown | |
| docs.fireworks.ai: Markdown | |
| docs.langchain.com: Markdown | |
| docs.stripe.com: JSON, Plain text | |
| vercel.com: Markdown | |
| github.com: JSON | |
| ============================================================================== | |
| robots.txt AI AGENT RULES (5/34 sites mention AI bots) | |
| ============================================================================== | |
| www.bbc.com: | |
| GPTBot: User-agent: GPTBot; Disallow: / | |
| ChatGPT-User: User-agent: ChatGPT-User; Disallow: / | |
| anthropic: User-agent: anthropic-ai; Disallow: / | |
| Claude-Web: User-agent: Claude-Web; Disallow: / | |
| Amazonbot: User-agent: Amazonbot; Disallow: / | |
| www.nytimes.com: | |
| GPTBot: User-agent: GPTBot; Disallow: / | |
| ChatGPT-User: User-agent: ChatGPT-User; Disallow: / | |
| anthropic: User-agent: anthropic-ai; Disallow: / | |
| Claude-Web: User-agent: Claude-Web; Disallow: / | |
| CCBot: User-agent: CCBot; Disallow: / | |
| www.theguardian.com: | |
| anthropic: User-agent: anthropic-ai | |
| Amazonbot: User-agent: Amazonbot | |
| CCBot: User-agent: CCBot | |
| Bytespider: User-agent: Bytespider | |
| PerplexityBot: User-Agent: PerplexityBot | |
| www.amazon.com: | |
| GPTBot: User-agent: GPTBot; Disallow: / | |
| ChatGPT-User: User-agent: ChatGPT-User; Disallow: / | |
| CCBot: User-agent: CCBot; Disallow: / | |
| Google-Extended: User-agent: Google-Extended; Disallow: / | |
| Bytespider: User-agent: Bytespider; Disallow: / | |
| www.ebay.com: | |
| anthropic: User-agent: anthropic-ai; Disallow: / | |
| Amazonbot: User-agent: AmazonBot; Disallow: /; | |
| CCBot: User-agent: CCBot; Disallow: / | |
| Bytespider: User-agent: Bytespider; Disallow: / | |
| PerplexityBot: User-agent: PerplexityBot; Disallow: / | |
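The section above reports which AI user agents a robots.txt mentions. A related but stricter question is which of them are actually barred from the site root. The sketch below answers that with the standard-library `urllib.robotparser`; it is an illustration rather than the script's own logic, the agent list is simply collected from the names in the transcript, and `blocked_ai_agents` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens that appear in the transcript above.
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "anthropic-ai", "Claude-Web", "Amazonbot",
    "CCBot", "Google-Extended", "Bytespider", "PerplexityBot",
]

def blocked_ai_agents(robots_txt: str, agents=AI_AGENTS) -> list[str]:
    """Return the agents that may not fetch "/" under the given robots.txt."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, "/")]

# Small hand-written sample, not a copy of any real site's file.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_ai_agents(SAMPLE))
```

Distinguishing "mentioned" from "blocked" matters for entries like the www.theguardian.com rows, where the transcript shows a `User-agent` line without the rule that follows it.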
| ============================================================================== | |
| SCORECARD | |
| ============================================================================== | |
| Have /llms.txt [███████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 13/34 | |
| Have /llms-full.txt [████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 8/34 | |
| Serve JSON via Accept header [██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 2/34 | |
| Serve Markdown via Accept header [██████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 6/34 | |
| Serve plain text via Accept header [██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 2/34 | |
| Have RSS/Atom feeds [██████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 6/34 | |
| Have OpenAPI spec [████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 4/34 | |
| Have /api endpoint [██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 2/34 | |
| Have sitemap.xml [███████████████████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 17/34 | |
| Have AI-plugin manifest [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 0/34 | |
| Mention AI bots in robots.txt [███████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 5/34 | |
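Each scorecard row is a proportional bar followed by a count. A minimal way to render such a row is sketched below; the function name `bar` and the default width are assumptions, and the real script's width and rounding may differ.

```python
def bar(count: int, total: int, width: int = 34) -> str:
    """Render a fixed-width progress bar such as "[███░░░] 3/6"."""
    # Scale the count to the bar width; an empty total yields an empty bar.
    filled = round(width * count / total) if total else 0
    return "[" + "█" * filled + "░" * (width - filled) + f"] {count}/{total}"

# Counts taken from the scorecard above.
for label, count in [("Have /llms.txt", 13), ("Have sitemap.xml", 17)]:
    print(f"{label:<20} {bar(count, 34)}")
```

Choosing a width that is a multiple of the total (34 sites here) keeps every bar exact, so no row is distorted by rounding.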
| ============================================================================== | |
| CONCLUSION | |
| ============================================================================== | |
| Of 34 major sites tested: | |
| - 13 offer /llms.txt (purpose-built for LLMs) | |
| - 9 respond to content negotiation with non-HTML formats | |
| - 5 acknowledge AI agents in robots.txt (mostly to BLOCK them) | |
| - 6 have RSS feeds (legacy machine-readable format) |