How jpfchang.org Is Becoming Readable to AI Search

A public engineering note on canonical URLs, structured data, sitemaps, robots policy, RSS, and llms.txt for SEO and AI discovery.

Date: June 30, 2026
Reading time: 4 min read
Screenshot of the public llms.txt route for jpfchang.org showing canonical facts, priority pages, and citation guidance.

GEO, or generative engine optimization, is easiest to misunderstand when it is treated as a new bag of tricks. The current work on jpfchang.org takes the opposite approach: make the public site technically clear, first-hand, crawlable, and easy to cite.

Google’s own guidance now frames GEO and AEO (answer engine optimization) as part of the broader search experience: useful content, accessible pages, structured context, and high-quality media still matter. The practical version for this site is straightforward: write clearly, expose canonical public facts, and help both humans and answer systems know which pages should be trusted as citation targets.

Canonical pages first

The site now states more clearly which page is the canonical source for each kind of public information:

  • / for the portfolio overview.
  • /about/ for identity, skills, role fit, and public profiles.
  • /projects/ for portfolio evidence and public product context.
  • /blog/ for writing and dated progress notes.
  • /contact/ for hiring, product, and secure-contact routing.
  • /gpg/ for signed or sensitive correspondence claims.

The goal is to reduce duplicate or misleading citation paths. A terminal interface can still be delightful, but it should not become the primary source for role-fit claims. A retired product page can still redirect, but it should not be treated as portfolio evidence.

This is especially important for AI search. If an answer engine is building a short summary, it needs stable targets. Clear canonical pages make that summary more likely to be accurate.
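
One way to enforce that mapping is a small normalization step at build or render time. The sketch below is a minimal illustration in Python, assuming the trailing-slash convention of the routes listed above; the retired route in RETIRED_REDIRECTS, the /terminal/ example, and the rule that every /blog/ subpath is canonical are hypothetical, not the site's actual routing table.

```python
from typing import Optional
from urllib.parse import urlsplit

SITE = "https://jpfchang.org"

# Canonical public routes named in this note.
CANONICAL_PATHS = {"/", "/about/", "/projects/", "/blog/", "/contact/", "/gpg/"}

# Hypothetical retired route and its redirect target (illustration only).
RETIRED_REDIRECTS = {"/old-product/": "/projects/"}


def canonical_url(url: str) -> Optional[str]:
    """Return the canonical absolute URL for a citable page, or None when the
    path should not be used as a citation target."""
    # Drop scheme, host, query string, and fragment; keep only the path.
    path = urlsplit(url).path or "/"
    # Normalize to the trailing-slash form used by the canonical routes.
    if not path.endswith("/"):
        path += "/"
    # Retired routes resolve to their redirect target, never to themselves.
    path = RETIRED_REDIRECTS.get(path, path)
    # Assumption: every post under /blog/ is its own canonical page.
    if path in CANONICAL_PATHS or path.startswith("/blog/"):
        return SITE + path
    return None


print(canonical_url("https://jpfchang.org/about?ref=x"))  # https://jpfchang.org/about/
print(canonical_url("https://jpfchang.org/terminal/"))  # None (hypothetical non-canonical route)
```

Returning None instead of guessing keeps non-canonical surfaces out of citation lists rather than quietly mapping them to some other page.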

Structured data as public scaffolding

jpfchang.org already uses JSON-LD for major public surfaces. The current work strengthens that layer with:

  • ProfilePage and Person context for Pengfan Chang.
  • Organization context for jpfchang.org and public founder context.
  • ItemList context for project and writing archives.
  • BlogPosting context for individual articles.
  • BreadcrumbList context for navigable page hierarchy.
  • Contact points where the public page is meant to support professional inquiries.

Structured data should not invent claims. It should mirror what a reader can already see. That is the rule I am following here: schema is a map, not a billboard.
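
A minimal sketch of the map-not-billboard rule, assuming a Python build step: Person markup is built only from an allowlist of fields the visible page already shows, and breadcrumbs come from the page's own navigation trail. The field allowlist, the helper names, and the choice of /about/ as the ProfilePage URL are assumptions for illustration.

```python
import json

SITE = "https://jpfchang.org"

# Person fields a (hypothetical) renderer may copy from the visible /about/ page.
# Anything the page does not show is simply never emitted.
PERSON_FIELDS = ("name", "alternateName", "description", "sameAs")


def profile_page_jsonld(visible: dict) -> dict:
    """Build ProfilePage + Person JSON-LD, mirroring only visible fields."""
    person = {"@type": "Person"}
    person.update({key: visible[key] for key in PERSON_FIELDS if key in visible})
    return {
        "@context": "https://schema.org",
        "@type": "ProfilePage",
        "url": SITE + "/about/",  # assumption: /about/ is the profile page
        "mainEntity": person,
    }


def breadcrumb_jsonld(trail: list) -> dict:
    """Build BreadcrumbList JSON-LD from (name, path) pairs, root first."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": SITE + path}
            for i, (name, path) in enumerate(trail, start=1)
        ],
    }


page = profile_page_jsonld({"name": "Pengfan Chang", "alternateName": "John"})
crumbs = breadcrumb_jsonld([("Home", "/"), ("About", "/about/")])
# Each object would be embedded in its own <script type="application/ld+json"> tag.
print(json.dumps(page, indent=2))
print(json.dumps(crumbs, indent=2))
```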

Google’s article structured data documentation emphasizes accurate article metadata such as headline, date, author, and representative image. The blog layout already supports that pattern through frontmatter, canonical URLs, article dates, tags, and optional image fields.
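
As an illustration, the frontmatter-to-BlogPosting mapping might look like this. The frontmatter key names (title, description, date, author, slug, tags, image) and the /blog/<slug>/ URL pattern are assumptions, not the blog layout's actual schema; optional fields are emitted only when the post has them.

```python
import json

SITE = "https://jpfchang.org"


def blog_posting_jsonld(fm: dict) -> dict:
    """Map hypothetical frontmatter keys to schema.org BlogPosting properties."""
    url = f"{SITE}/blog/{fm['slug']}/"
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": fm["title"],
        "description": fm["description"],
        "datePublished": fm["date"],  # ISO 8601, e.g. "2026-06-30"
        "author": {"@type": "Person", "name": fm["author"]},
        "mainEntityOfPage": url,
        "url": url,
    }
    # Optional fields are emitted only when the post actually has them.
    if fm.get("tags"):
        data["keywords"] = fm["tags"]
    if fm.get("image"):
        data["image"] = SITE + fm["image"]
    return data


post = blog_posting_jsonld({
    "title": "How jpfchang.org Is Becoming Readable to AI Search",
    "description": "A public engineering note on canonical URLs, structured data, "
                   "sitemaps, robots policy, RSS, and llms.txt for SEO and AI discovery.",
    "date": "2026-06-30",
    "author": "Pengfan Chang",
    "slug": "example-post",  # hypothetical slug
})
print(json.dumps(post, indent=2))
```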

llms.txt for answer systems

The public /llms.txt route is a compact, machine-readable guide to jpfchang.org.

It lists canonical facts, priority pages, recent writing, public profiles, preferred citation targets, routes to avoid, and machine-readable resources like RSS, sitemap, robots policy, and the llms.txt URL itself.

That does not replace normal SEO. It complements it. Search crawlers still need indexable pages. Readers still need useful writing. But llms.txt gives answer systems a low-friction summary of how the public site wants to be understood.
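
The llms.txt proposal describes a small Markdown layout: an H1 with the site name, a blockquote summary, and H2 sections containing lists of [name](url) links with optional notes. A minimal generator sketch in Python follows; the section names echo the categories listed above, but the headings and entries are illustrative, not the contents of the live /llms.txt.

```python
SITE = "https://jpfchang.org"


def link(name: str, path: str, note: str = "") -> str:
    """Format one llms.txt list entry: [name](url) with optional notes."""
    entry = f"[{name}]({SITE}{path})"
    return f"{entry}: {note}" if note else entry


def build_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render an H1 title, a blockquote summary, and H2 sections of bullets."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, items in sections.items():
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)


# Illustrative content only; the real file's wording is not reproduced here.
llms_txt = build_llms_txt(
    "jpfchang.org",
    "Portfolio, writing, and contact routes for Pengfan Chang (John).",
    {
        "Canonical facts": ["Site owner: Pengfan Chang (John)"],
        "Priority pages": [
            link("About", "/about/", "identity, skills, role fit, public profiles"),
            link("Projects", "/projects/", "portfolio evidence"),
            link("Blog", "/blog/", "writing and dated progress notes"),
            link("Contact", "/contact/", "hiring, product, and secure-contact routing"),
        ],
        "Machine-readable resources": [link("llms.txt", "/llms.txt")],
    },
)
print(llms_txt)
```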

The useful part is the discipline behind it: if a fact belongs in llms.txt, it should also be true on the public pages. If it is confidential, uncertain, or not ready to publish, it does not belong there.
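
That rule can be checked mechanically. A minimal sketch, assuming the build can extract plain text from each rendered public page: every fact destined for llms.txt must appear, after whitespace and case normalization, in at least one page. Verbatim substring matching is a deliberately crude, hypothetical check; paraphrased facts would still need human review.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_facts(facts: list, public_pages: dict) -> list:
    """Return facts that do not appear (after normalization) in any public page.
    An empty result means every llms.txt fact is backed by visible text."""
    corpus = [normalize(text) for text in public_pages.values()]
    return [fact for fact in facts if not any(normalize(fact) in page for page in corpus)]


# Hypothetical page text extracted from rendered HTML.
pages = {
    "/about/": "Pengfan Chang (John)\nHacker, Researcher, and Outdoor Enthusiast",
}
facts = ["Hacker, researcher, and outdoor enthusiast", "Hypothetical unpublished claim"]
print(unsupported_facts(facts, pages))  # ['Hypothetical unpublished claim']
```

A non-empty result could fail the build, so an unsupported claim never ships in llms.txt.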

Robots policy with a sharper distinction

The robots policy now separates user-directed answer retrieval from bulk model-training collection.

In plain language: answer and search retrieval crawlers can access the public site, while training-oriented crawlers are blocked. That matches the public posture of the site. I want useful public pages to be discoverable and cited. I do not want private or retired surfaces to become raw material for the wrong kind of reuse.

OpenAI documents distinct crawlers for search (OAI-SearchBot), user-initiated requests (ChatGPT-User), and model training (GPTBot). The site’s robots policy applies the same kind of distinction at the public robots.txt level without disclosing private infrastructure.
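
A minimal sketch of that split, generating robots.txt in Python and verifying it with the standard library's urllib.robotparser. GPTBot, OAI-SearchBot, and ChatGPT-User are user-agent tokens OpenAI publishes; the private prefixes and the sitemap URL are hypothetical stand-ins, since the real policy deliberately does not expose private routes.

```python
from urllib.robotparser import RobotFileParser

# OpenAI's documented crawler tokens, grouped by purpose.
TRAINING_CRAWLERS = ["GPTBot"]
RETRIEVAL_CRAWLERS = ["OAI-SearchBot", "ChatGPT-User"]

# Hypothetical non-public prefixes (illustration only).
PRIVATE_PREFIXES = ["/api/", "/private/"]


def build_robots_txt(sitemap_url: str) -> str:
    lines = []
    # Training crawlers get a blanket disallow.
    for agent in TRAINING_CRAWLERS:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Retrieval crawlers and everyone else: private prefixes first, then allow the rest.
    for agent in RETRIEVAL_CRAWLERS + ["*"]:
        lines.append(f"User-agent: {agent}")
        lines += [f"Disallow: {prefix}" for prefix in PRIVATE_PREFIXES]
        lines += ["Allow: /", ""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt("https://jpfchang.org/sitemap.xml")  # assumed sitemap path

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://jpfchang.org/blog/"))  # False
print(parser.can_fetch("OAI-SearchBot", "https://jpfchang.org/blog/"))  # True
print(parser.can_fetch("OAI-SearchBot", "https://jpfchang.org/api/x"))  # False
```

Under RFC 9309 a crawler that matches a named group ignores the * group entirely, which is why each named group repeats the private-prefix rules. Listing the Disallow lines before Allow: / also keeps first-match parsers such as urllib.robotparser in agreement with longest-match parsers.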

Sitemap, RSS, and images

The sitemap now treats primary public routes and blog posts as the important discovery layer. Utility files, private routes, retired routes, API paths, and duplicate surfaces are excluded.
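
A minimal sitemap sketch following the sitemaps.org protocol, using Python's xml.etree.ElementTree. The excluded prefixes are hypothetical stand-ins for the utility, private, retired, and API routes described above, and the dates in the example call are illustrative.

```python
import xml.etree.ElementTree as ET

SITE = "https://jpfchang.org"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical prefixes for routes that should never be listed.
EXCLUDED_PREFIXES = ("/api/", "/private/", "/retired/")


def build_sitemap(pages: list) -> str:
    """Render (path, lastmod) pairs as a sitemap, skipping excluded routes."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path, lastmod in pages:
        if path.startswith(EXCLUDED_PREFIXES):
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = SITE + path
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. 2026-06-30
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    ("/", "2026-06-30"),
    ("/blog/", "2026-06-30"),
    ("/api/health", "2026-06-30"),  # dropped by EXCLUDED_PREFIXES
]))
```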

RSS remains the syndication path for writing. Blog posts carry titles, descriptions, dates, tags, and canonical links. Images are local, descriptive, and representative of the content.

That matters for both SEO and GEO because media can support how a page is understood. A screenshot of a real public page is safer and more useful than a decorative image that suggests facts the article cannot support.
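
A minimal RSS 2.0 sketch covering the fields named above (title, description, date, tags, canonical link), again using only the Python standard library. The frontmatter keys, the /blog/<slug>/ pattern, the channel values, and the example slug are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

SITE = "https://jpfchang.org"


def rss_item(post: dict) -> ET.Element:
    """Build one <item> from hypothetical frontmatter keys."""
    link = f"{SITE}/blog/{post['slug']}/"
    item = ET.Element("item")
    ET.SubElement(item, "title").text = post["title"]
    ET.SubElement(item, "link").text = link
    # The canonical URL doubles as a stable, permalink-style GUID.
    ET.SubElement(item, "guid", isPermaLink="true").text = link
    ET.SubElement(item, "description").text = post["description"]
    # RSS 2.0 expects RFC 822-style dates; format_datetime produces that form.
    ET.SubElement(item, "pubDate").text = format_datetime(post["date"])
    for tag in post.get("tags", []):
        ET.SubElement(item, "category").text = tag
    return item


def build_feed(title: str, description: str, posts: list) -> str:
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = SITE + "/blog/"
    ET.SubElement(channel, "description").text = description
    # Newest posts first.
    for post in sorted(posts, key=lambda p: p["date"], reverse=True):
        channel.append(rss_item(post))
    return ET.tostring(rss, encoding="unicode")


feed = build_feed(
    "jpfchang.org blog",  # illustrative channel title
    "Writing and dated progress notes.",
    [{
        "slug": "example-post",  # hypothetical slug
        "title": "How jpfchang.org Is Becoming Readable to AI Search",
        "description": "A public engineering note on canonical URLs, structured data, "
                       "sitemaps, robots policy, RSS, and llms.txt for SEO and AI discovery.",
        "date": datetime(2026, 6, 30, tzinfo=timezone.utc),
    }],
)
print(feed)
```

Using the canonical post URL as both link and guid keeps feed readers pointed at the same citation target as the sitemap and structured data.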

The practical rule

The current direction is not “write for robots.” It is:

  1. Make the public site useful for people.
  2. Make the public facts consistent across pages.
  3. Make the technical structure crawlable.
  4. Make citations point to the right pages.
  5. Keep private data out of public progress writing.

That is the kind of SEO/GEO work I trust: boring in the best way, because it makes the truth easier to find.

Pengfan Chang (John)

Written by Pengfan Chang (John)

Hacker, Researcher, and Outdoor Enthusiast