Alternative Markdown format for AI Tools

Paulchen at work with AI

I started reading “AI Engineering” by Chip Huyen and learned about the studies that NewsGuard did on AI-generated content. The conclusion, and I am paraphrasing, was:

LLMs are slowly running out of human-created content to learn from. Many publishers are shutting AI tools out of crawling their sites, so LLMs are feeding on AI-generated content, which will quickly degrade their quality. The risk for the industry is that AI won’t be able to deliver on its promise.

I publish on the Gutenberg Times, and I want to make sure that AI tools are able to crawl the content so they can learn from the updates published there. The good part is that Gutenberg Times does not depend on advertising or sponsors that need eyeballs.

Earlier this week, I read Dries Buytaert’s blog post The Third Audience. In it, Buytaert outlines how he made the content of his site available in Markdown, the favorite format of AI tools. He implemented it on his favorite CMS, Drupal, and described his methods:

First, I added content negotiation to my site. When a request includes Accept: text/markdown in the HTTP headers, my site returns the Markdown instead of the rendered HTML.

Second, I made it possible to append .md to any URL. For example, https://dri.es/principles-for-life.md gives you clean Markdown with metadata like title, date, and tags. You can also try adding .md to the URL of this post.

But how did those crawlers find the Markdown version so fast? I borrowed a pattern from RSS: RSS auto-discovery. Many sites include a link tag with rel="alternate" pointing to their RSS feed. I applied the same idea to Markdown: every HTML page now includes a link tag announcing that an alternative Markdown version exists at the .md URL.
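To make the three mechanisms concrete, here is a minimal Python sketch of the decision a server has to make. It is not Buytaert’s Drupal code and not the WordPress plugin; the names `Choice`, `accepts_markdown`, `choose_representation`, and `markdown_link_tag` are made up for illustration, and the `type="text/markdown"` attribute on the link tag is my assumption. The Accept check is also simplified: it only asks whether `text/markdown` is listed with a non-zero quality value and does not rank it against other types.

```python
import html
from dataclasses import dataclass


@dataclass
class Choice:
    """Which representation to serve, and for which canonical path."""
    fmt: str   # "markdown" or "html"
    path: str  # request path with any ".md" suffix removed


def accepts_markdown(accept_header: str) -> bool:
    """Return True if the Accept header lists text/markdown with q > 0."""
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != "text/markdown":
            continue
        q = 1.0  # a media type without an explicit q counts as q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def choose_representation(path: str, accept_header: str = "") -> Choice:
    """Pick Markdown or HTML from the URL suffix and the Accept header."""
    # Assumption: a ".md" suffix wins regardless of the Accept header.
    if path.endswith(".md"):
        return Choice("markdown", path[: -len(".md")])
    if accepts_markdown(accept_header):
        return Choice("markdown", path)
    return Choice("html", path)


def markdown_link_tag(page_url: str) -> str:
    """Build the auto-discovery tag for the <head> of the HTML page."""
    md_url = page_url.rstrip("/") + ".md"
    return (
        '<link rel="alternate" type="text/markdown" '
        f'href="{html.escape(md_url, quote=True)}">'
    )
```

A request for `/principles-for-life.md` and a request for `/principles-for-life` with `Accept: text/markdown` both resolve to the Markdown version of the same post; everything else falls through to HTML.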

This shouldn’t be too hard to implement for WordPress as well, so I consulted with Claude Code on creating a plugin.

It’s now running on Gutenberg Times and on this site. So far, I have not seen any significant changes in traffic patterns or visits. I also suspect that Jetpack Stats won’t pick up the visits by AI crawlers.
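Since the stats don’t tell me much, one way to at least confirm that a page advertises its Markdown version is to look for the auto-discovery tag the way a crawler might. The sketch below uses only the Python standard library. `MarkdownAlternateFinder` and `find_markdown_alternates` are hypothetical names, and I am assuming the tag carries `type="text/markdown"`; a site that marks the alternate differently would need a different check.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MarkdownAlternateFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="text/markdown">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None for bare attributes; normalize them.
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_markdown = a.get("type", "").lower() == "text/markdown"
        if "alternate" in rels and is_markdown and a.get("href"):
            self.hrefs.append(a["href"])


def find_markdown_alternates(html_text, base_url):
    """Return absolute URLs of the Markdown alternates found in html_text."""
    finder = MarkdownAlternateFinder()
    finder.feed(html_text)
    return [urljoin(base_url, href) for href in finder.hrefs]
```

To try it on a live page, fetch the HTML with `urllib.request.urlopen(url).read().decode()` and pass the text and the page URL to `find_markdown_alternates`. An empty list means no tag of that form was found.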

I have not packaged the plugin for the WordPress repository, but if you are interested you can grab it from GitHub.

https://github.com/bph/wp-markdown-endpoint

There are contributors working on a built-in method for WordPress Core to provide Markdown out of the box, here and here.
So I won’t put effort into improving this plugin. It’s just here for the interim.

I code for a purpose
@pauli@icodeforapurpose.com

Personal tech blog of Birgit Pauli-Haack
