How static-site migrations silently break SEO, and the build-time check that catches it

In late February, I migrated rajiv.com from Hugo to Astro. The new build pipeline was cleaner, the site was faster, and the deploy went without incident. Three months later, organic traffic was down meaningfully and recent posts were not ranking the way older ones did.

The site looked fine. Search Console did not flag anything obvious. The sitemap validated. Then I ran one diagnostic that revealed a regression class I had not documented. I now run that check at build time before any static-site deploy.

I wrote this for engineers, technical leads, and anyone running their own static site. If you are on Hugo, Astro, Eleventy, Jekyll, or any tool that emits a sitemap as part of the build, this failure mode applies to you.

What “silent” means here

The sitemap was technically valid. It returned XML, parsed cleanly, and contained the URLs it should have. By every quick test, it looked correct.

But Search Console’s Sitemaps panel told a different story. The most recent successful fetch had been months earlier, right around the migration date. Since then, the status had been flipping between “Couldn’t fetch” and “Temporary processing error.” Recent posts were not appearing in the Index Coverage report. Old archive URLs from the previous URL structure were still being indexed and outranking the canonical articles in their own SERPs.

The site reported healthy status while the search ecosystem reported degraded status. That gap is the silent failure mode. The site itself worked fine; what broke was the integration with Google’s crawler, and that was only visible inside Search Console.

The diagnosis

I pulled URL Inspection in Search Console for a representative sample of recent posts. The pattern was immediate:

  • An article from November: “Crawled - currently not indexed.” The Sitemaps row read “Temporary processing error.”
  • A second article from the same week: “Discovered - currently not indexed.” All crawl fields N/A. Google had seen the URL existed but had never crawled it.
  • A December article that the GSC dashboard had flagged with a 58 percent impressions drop: indexed but degraded.

Two different not-indexed states across articles that should have been routine. The Sitemaps “Temporary processing error” row was the smoking gun. The story it told: Google saw the URLs, attempted to process the sitemap entries, hit an error, and parked the articles in a not-indexed state. The visible 58 percent impressions drop was an undercount; articles that fall out of the index entirely do not show up in impressions data at all.

The cause, once I looked at the build output, was small and ugly. The Astro sitemap plugin emits sitemap-index.xml and sitemap-N.xml files. My robots.txt and Search Console submission both still pointed at /sitemap.xml: the URL the previous Hugo build had emitted, and the URL every external system expected. After the migration, /sitemap.xml had been returning 404 for three months. Google had been hitting it on every crawl cycle and silently degrading the site’s crawl priority.

The audit was bigger than the sitemap

The initial diagnosis surfaced only the sitemap defect, and a fix that small did not match the size of the symptom: three months of indexing degradation is not usually a one-line problem. So I expanded the audit across all three of my sites (rajiv.com plus my two synthesis sites that share infrastructure), and across the full SEO surface, not just the reported issue.

I directed Claude Code to run parallel scans across the three sites: robots.txt directives, sitemap structure and lastmod presence, schema completeness, canonical correctness, archive indexability, Open Graph article tags, image alt text, <h1> count per page. Five agents running in parallel, each with a narrow scope, all writing into a single consolidated findings document. I reviewed every finding, adjusted severities, and made the call on what got included in the deploy.

The breadth was the lesson. Ten CRITICAL findings, plus a long tail of HIGH and MEDIUM ones:

  • /sitemap.xml returning 404 across all three sites
  • Zero JSON-LD Article schema on any blog post (the Astro plugin does not auto-generate it)
  • No <lastmod> in any sitemap entry, so Google could not tell content had been updated
  • Archive pages (categories, tags, the /articles/ index) indexable, crowding canonical articles in SERPs
  • Title duplication on the homepage <title> tag, with the site name appearing twice; the same pattern on the synthesis sites
  • Two <h1> tags on every page (the site brand template was using <h1>)
  • Body images missing alt attributes
  • Some articles emitting relative canonical URLs instead of absolute
  • One synthesis site’s robots.txt missing the Sitemap: directive entirely
  • Articles missing og:image on one site

Every one of these is the kind of thing that should have shown up in a serious pre-deploy audit before flipping the DNS over to the new build. None did. The pre-deploy check I had at the time was “does the site render and pass astro build,” which catches none of the issues above.

One coordinated deploy

I shipped all of the CRITICAL and HIGH findings in a single coordinated deploy across all three Astro repos. The reasoning was structural: each issue was small in isolation, but the combination was producing the cliff drop. Fixing the sitemap alone would have moved the metric somewhat and still left a degraded site for Google to re-evaluate. Fixing all of them at once gives the search ecosystem one clean rebuild signal.

The pieces:

Sitemap URL contract. I kept /sitemap.xml as the canonical public URL. Every external tool (Search Console, Bing Webmaster, archive.org, third-party SEO checkers) expects /sitemap.xml. I added a Cloudflare _redirects rewrite that serves the Astro-emitted sitemap-index.xml content at that path with a 200 response (a rewrite, not a 301). The implementation detail is invisible to crawlers; the URL contract is preserved.
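
For concreteness, the rule is a couple of lines in the Cloudflare Pages _redirects file. This is a sketch; the destination path depends on what your sitemap plugin actually emits:

# Serve the Astro-emitted sitemap index at the URL crawlers expect.
# Status 200 makes this a rewrite (content is served at this path), not a redirect.
/sitemap.xml  /sitemap-index.xml  200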

Sitemap freshness. Configured @astrojs/sitemap with a serialize function that pulls lastmod from each post’s frontmatter modified or date field. Without lastmod, Google cannot tell that content has been updated, and the crawler under-prioritizes the URL.
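
A sketch of that configuration, assuming a lastmodByUrl lookup built from the content collection's frontmatter (the helper and its module path are illustrative; the serialize hook itself is part of @astrojs/sitemap):

// astro.config.mjs (sketch)
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';
import { lastmodByUrl } from './scripts/lastmod-map.js'; // hypothetical helper: URL -> frontmatter modified/date

export default defineConfig({
  site: 'https://rajiv.com',
  integrations: [
    sitemap({
      serialize(item) {
        const lastmod = lastmodByUrl[item.url];
        if (lastmod) item.lastmod = new Date(lastmod).toISOString();
        return item;
      },
    }),
  ],
});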

JSON-LD article schema. Added Article, Person, WebSite, and BreadcrumbList structured data, generated in BaseLayout from frontmatter. Required for rich-result eligibility, and one of the strongest signals available for AI-driven retrieval (the GEO surface).
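
As an illustration, the Article object can be assembled from frontmatter in the layout and serialized into a single script tag; the field names on the post object are illustrative, not the exact shape of my layout props:

// Sketch: build Article JSON-LD from frontmatter values (schema.org property names).
function articleJsonLd(post, canonicalUrl) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Article',
    headline: post.title,
    description: post.description,
    datePublished: new Date(post.date).toISOString(),
    dateModified: new Date(post.modified ?? post.date).toISOString(),
    mainEntityOfPage: canonicalUrl,
    author: { '@type': 'Person', name: post.author, url: 'https://rajiv.com/' },
  };
}
// In the Astro layout, emit it with:
// <script type="application/ld+json" set:html={JSON.stringify(articleJsonLd(post, canonical))} />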

Archive page noindex. Marked /blog/, /blog/category/*, /blog/tag/*, and the /articles/ index page on the synthesis sites as noindex, follow. Filtered them out of the sitemap. Archive pages are useful for human navigation and bad for search results. They crowd canonical articles and dilute the signal.
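
A sketch of both halves, assuming the archive routes listed above: the filter option of @astrojs/sitemap drops matching pages from the generated sitemap, and the layout emits the robots meta tag for the same routes.

// astro.config.mjs (sketch): keep archive routes out of the sitemap,
// alongside the serialize hook shown earlier.
const ARCHIVE_PATTERNS = [/\/blog\/$/, /\/blog\/category\//, /\/blog\/tag\//, /\/articles\/$/];

sitemap({
  filter: (page) => !ARCHIVE_PATTERNS.some((re) => re.test(page)),
});

// BaseLayout (sketch): for those same routes, render
// <meta name="robots" content="noindex, follow" /> instead of the default.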

Canonicals absolute. Forced absolute canonical URLs via a makeAbsolute() helper in BaseLayout. Relative canonicals get interpreted as path-relative by some crawlers and fail in surprising ways.
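
The helper is small; a sketch, assuming the public origin is available as a constant:

// Sketch of makeAbsolute(): resolve any canonical value against the public origin.
const SITE_ORIGIN = 'https://rajiv.com';

function makeAbsolute(pathOrUrl) {
  // An already-absolute URL is returned unchanged; a relative path is resolved.
  return new URL(pathOrUrl, SITE_ORIGIN).toString();
}

// makeAbsolute('/posts/example/')  -> 'https://rajiv.com/posts/example/'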

HTML semantics. Changed the site-title <h1> to a <span> so each page has exactly one <h1> (the article title). Removed the generic site-wide <meta name="keywords"> tag entirely. The keywords tag is dead as a ranking signal, and the same string on every page is its own kind of signal degradation.

Apex redirect. Added https://www.rajiv.com/* → https://rajiv.com/:splat 301 so the www subdomain stops serving as a parallel indexable host.

Open Graph article tags. Added article:published_time, article:modified_time, article:author to article layouts.

That is the substantive list. Each piece took an evening or less. The combined deploy was a session of focused work.

The build-time guardrail

The fix is not durable unless the contract is enforced. The whole reason this happened in the first place is that the migration regressed the SEO contract silently. There was no automated check between “build succeeded” and “deploy approved.” So I wrote one.

scripts/seo-check.js runs after astro build and before deploy. It scans dist/ and fails the build if any of these checks fail:

// Twelve checks, all on the built output:
//  1. Every article in content/posts/ appears in dist/sitemap-0.xml
//  2. Every URL in the sitemap has a <lastmod> element
//  3. No article HTML contains <meta name="robots" content="noindex">
//  4. Category/tag/index pages DO contain noindex
//  5. Every article HTML contains <script type="application/ld+json">
//     with @type: Article
//  6. Every article's <link rel="canonical"> href starts with https://
//  7. Every page's <title> is between 30 and 65 characters
//  8. No <title> contains the site name twice
//  9. Every article has a non-empty meta description
// 10. Every article emits an og:image URL (or has explicit fallback flag)
// 11. Every <img> tag has a non-empty alt attribute
// 12. Every HTML page has exactly one <h1>

The full script is about 250 lines of Node.js. It runs in under a second on a 340-article site. I wired it into build.sh with an SEO_STRICT=1 env var, currently in warn-only mode while I clean up a handful of pre-existing content issues. Once the build is clean, the env var flips to strict and any regression fails the deploy.
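
The real script is not reproduced here; this is a compressed sketch of two of the twelve checks and the warn-versus-strict switch, with the content path and URL scheme illustrative:

// scripts/seo-check.js (sketch) -- runs after `astro build`, before deploy.
import { readFileSync, readdirSync } from 'node:fs';
import { join } from 'node:path';

const DIST = 'dist';
const STRICT = process.env.SEO_STRICT === '1';
const failures = [];

// Check 1 (sketch): every article in content/posts/ appears in the built sitemap.
const sitemap = readFileSync(join(DIST, 'sitemap-0.xml'), 'utf8');
for (const file of readdirSync('src/content/posts')) {
  const urlPath = `/posts/${file.replace(/\.(md|mdx)$/, '')}/`; // illustrative URL scheme
  if (!sitemap.includes(urlPath)) failures.push(`missing from sitemap: ${urlPath}`);
}

// Check 12 (sketch): every built HTML page has exactly one <h1>.
function* htmlFiles(dir) {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const p = join(dir, entry.name);
    if (entry.isDirectory()) yield* htmlFiles(p);
    else if (entry.name.endsWith('.html')) yield p;
  }
}
for (const page of htmlFiles(DIST)) {
  const count = (readFileSync(page, 'utf8').match(/<h1[\s>]/g) || []).length;
  if (count !== 1) failures.push(`${page}: expected 1 <h1>, found ${count}`);
}

// Warn-only until the backlog is clean; strict mode fails the deploy.
if (failures.length) {
  console.error(failures.join('\n'));
  if (STRICT) process.exit(1);
}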

The principle: any SEO contract worth having is worth enforcing in CI. Writing the check is an evening of focused work; reintroducing the regression silently costs months of indexing damage you might not notice.

What every static-site operator should verify

The minimum SEO contract before any static-site migration ships:

  • The sitemap URL crawlers expect (/sitemap.xml) returns 200 and contains every published article
  • Sitemap entries include <lastmod> from a real source field
  • All canonical URLs are absolute and match the public URL
  • Archive and index pages (/blog/, /categories/*, /tags/*) are noindex, follow
  • Every article emits Article JSON-LD; the homepage emits WebSite JSON-LD; an author Person schema is present site-wide
  • Every page has exactly one <h1> (the page title, not the site brand)
  • Every body image has alt text
  • <title> tags do not double the site name
  • The www subdomain redirects to the apex (or vice versa, but pick one)
  • All of the above are checked at build time and fail the deploy on regression

If any of these are not enforced in your build pipeline, you have a silent-regression risk. The check script does not have to be elegant (mine is a few hundred lines of Node), but it has to fail the deploy.
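
The one item on that list a dist/ scan cannot verify is the live URL contract. A few lines against the deployed site cover it; this sketch assumes Node 18+ for the built-in fetch, run as an .mjs file, and the slug is a placeholder:

// Quick live check (sketch): the public sitemap URL resolves and lists a recent post.
const res = await fetch('https://rajiv.com/sitemap.xml');
console.log(res.status); // expect 200, not 404
const xml = await res.text();
console.log(xml.includes('/posts/a-recent-slug/')); // expect true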

On migrations more generally

The deeper lesson here is migration discipline.

A static-site migration is a contract change with the search ecosystem. URL structures, sitemap formats, and canonical patterns all shift. The build pipeline that produced the previous site’s contract gets replaced with a new pipeline producing, hopefully, the same contract. Without a checkable definition of that contract, “the new site builds successfully” is not the same statement as “the new site preserves the SEO surface.”

The build-time check is the way I now define the contract. It runs twelve assertions; any failure stops the deploy. I should have had this kind of check at the time of the first migration. Now I do.


Update (May 7, 2026): I wrote a companion piece on the operational side of this recovery, applying the Direction Dynamic to multi-tool GUI ops.