A technical SEO audit is a systematic review of the engineering and architecture layer of a website to identify issues that prevent search engines from crawling, indexing, rendering, or ranking the site’s pages. It sits underneath content and backlink work – neither matters if the site cannot be crawled cleanly or rendered correctly – and it is the audit type that produces the most important fixes when the technical layer has been neglected.
The methodology has stabilised over the last decade into a defined sequence: crawl, indexability, site architecture, page speed and Core Web Vitals, schema, internal-link analysis, and (briefly) log-file review. The order matters because the findings cascade – a crawl problem affects what you can even measure for indexability, and an indexability problem affects what is worth speed-optimising.
This article is a practitioner methodology. It assumes the reader is an SEO, developer, or product owner running or commissioning a technical audit and wants the operational sequence, what each layer checks, and how to prioritise the findings into a remediation plan that engineering can actually ship.
Key Takeaways
- Run a full crawl first; the crawl is the foundation that every other audit layer depends on for accurate findings.
- Indexability checks (robots.txt, meta robots, canonicals, hreflang, sitemap) catch the issues that hide pages from search engines entirely.
- Findings should be triaged by impact and engineering effort; ship the high-impact, low-effort fixes first to demonstrate audit value before tackling architectural debt.
Crawl: the foundation every other audit layer depends on
The audit starts with a full crawl of the site, run with a crawler tool that simulates how a search engine bot would behave. The crawl produces the inventory: every URL the crawler reached, the HTTP status returned, the meta robots directives, the canonical declared, the title and meta description, the H1, the response time, and the depth from the home page.
Crawler configuration. Configure the crawl with a desktop user-agent and a mobile user-agent (run separately or in parallel; mobile is the primary surface for indexing). Respect robots.txt by default but also run a crawl that ignores robots.txt to see what is being blocked. Render JavaScript if the site is JavaScript-heavy – comparing the rendered crawl against the raw-HTML crawl reveals content that is invisible to engines that do not execute scripts.
What to extract. URL list with status codes; orphaned pages (pages in the sitemap but not linked from anywhere); pages with non-200 status (4xx errors, 5xx errors, redirect chains); pages with thin content (low word count, low text-to-HTML ratio); pages with duplicate or missing titles, meta descriptions, or H1s; pages with non-canonical canonical declarations; redirect chains and loops.
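Redirect chains and loops in particular are worth tracing hop by hop rather than trusting a single aggregated report. A minimal sketch in Python using the requests library (the URL and hop limit are illustrative):

```python
import requests

def trace_redirects(url, max_hops=10):
    """Follow a redirect chain hop by hop, returning (status, url) pairs.

    max_hops guards against infinite redirect loops.
    """
    hops, seen, current = [], set(), url
    for _ in range(max_hops):
        if current in seen:
            hops.append(("LOOP", current))
            return hops
        seen.add(current)
        resp = requests.get(current, allow_redirects=False, timeout=10)
        hops.append((resp.status_code, current))
        if resp.status_code not in (301, 302, 303, 307, 308):
            return hops  # final destination reached
        # Location may be relative; resolve against the current URL
        current = requests.compat.urljoin(current, resp.headers["Location"])
    hops.append(("TOO_MANY_HOPS", current))
    return hops

# Chains longer than one hop waste crawl budget and dilute link equity.
for status, url in trace_redirects("https://example.com/old-page"):
    print(status, url)
```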
Crawl-budget signals. Large sites need crawl-budget analysis: how much time the engine spends crawling unhelpful URLs (faceted-navigation explosions, parameter-driven duplicates, paginated archives) versus the canonical commercial pages. The crawl reveals where the budget is leaking.
Compare crawl to sitemap to indexed-pages. The three lists should overlap meaningfully. Pages in the crawl but not in the sitemap suggest sitemap coverage is incomplete. Pages in the sitemap but not in the indexed-pages list suggest indexability or quality issues. Pages indexed but not in the crawl suggest orphaned content the engines found via external links.
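This three-way comparison is straightforward to script once you have the three exports. A sketch assuming a crawler CSV with an Address column, a search-console indexed-pages export with a URL column, and a standard XML sitemap (file names and column headers will vary by tool):

```python
import csv
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def load_sitemap_urls(path):
    """Read <loc> values from a sitemap XML file."""
    tree = ET.parse(path)
    return {loc.text.strip() for loc in tree.iterfind(".//sm:loc", NS)}

def load_csv_urls(path, column):
    """Read a URL column from a crawler or search-console export."""
    with open(path, newline="") as f:
        return {row[column] for row in csv.DictReader(f)}

crawled = load_csv_urls("crawl_export.csv", "Address")   # crawler export
indexed = load_csv_urls("indexed_pages.csv", "URL")      # search-console export
sitemap = load_sitemap_urls("sitemap.xml")

print("Crawled but missing from sitemap:", len(crawled - sitemap))  # coverage gaps
print("In sitemap but not indexed:", len(sitemap - indexed))        # indexability/quality issues
print("Indexed but not crawled:", len(indexed - crawled))           # likely orphans found externally
```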
Indexability: robots, canonicals, sitemaps, hreflang
Indexability is whether a page is eligible to appear in search results. The most common technical-SEO failure mode is that pages that should be indexable are accidentally blocked or canonicalised away.
robots.txt. Read every line. Common mistakes: a Disallow pattern written for /wp-admin/ that is broad enough to catch unrelated directories; a leftover Disallow: / from a staging deployment that was not removed; user-agent-specific blocks that affect Googlebot or other engines unintentionally. Test with the search-console robots.txt tester or equivalent.
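Python's standard library ships a robots.txt parser that makes a first-pass check scriptable. Note that urllib.robotparser does not implement Google's full wildcard semantics, so treat a pass here as provisional and confirm in the search-console tester:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# URLs that *should* be crawlable: a leftover "Disallow: /" or an
# over-broad pattern shows up immediately as a False here.
must_be_crawlable = [
    "https://example.com/",
    "https://example.com/products/widget",
    "https://example.com/blog/some-post",
]

for url in must_be_crawlable:
    for agent in ("Googlebot", "Bingbot", "*"):
        if not rp.can_fetch(agent, url):
            print(f"BLOCKED for {agent}: {url}")
```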
Meta robots and X-Robots-Tag. Pages with noindex or nofollow directives that should not have them. Templates that inject noindex on pages that should be indexed (a common bug after CMS updates or template inheritance). X-Robots-Tag in HTTP headers can override the on-page meta robots and is easy to miss because it does not appear in the HTML.
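Because X-Robots-Tag lives in the response headers, a quick header sweep catches what an HTML-only review misses. A sketch with requests (some servers mishandle HEAD requests; fall back to GET if results look wrong):

```python
import requests

def robots_header(url):
    """Return the X-Robots-Tag header, which lives in the HTTP response
    and never appears in the HTML source."""
    resp = requests.head(url, allow_redirects=True, timeout=10)
    return resp.headers.get("X-Robots-Tag", "")

for url in ["https://example.com/", "https://example.com/downloads/file.pdf"]:
    tag = robots_header(url)
    if "noindex" in tag.lower():
        print(f"noindex via HTTP header: {url} ({tag})")
```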
Canonicals. Every page should declare a self-referencing canonical or a canonical to the preferred version. Issues to find: pages declaring canonicals to other pages (cross-canonicalisation that hides them); pages with no canonical at all (dependent on the engine’s inference); canonical declarations that do not match the page actually served; pages with multiple canonical declarations.
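A per-URL canonical check is easy to script against the live site. A sketch using requests and BeautifulSoup; the trailing-slash normalisation is deliberately naive, and a real audit should normalise scheme, host case, and query strings too:

```python
import requests
from bs4 import BeautifulSoup

def audit_canonical(url):
    """Flag missing, multiple, or cross-pointing canonical declarations."""
    html = requests.get(url, timeout=10).text
    links = BeautifulSoup(html, "html.parser").find_all("link", rel="canonical")
    if not links:
        return "no canonical (dependent on the engine's inference)"
    if len(links) > 1:
        return f"multiple canonicals: {[l.get('href') for l in links]}"
    href = links[0].get("href", "").rstrip("/")
    if href != url.rstrip("/"):
        return f"canonicalised away to {href}"
    return "OK (self-referencing)"

print(audit_canonical("https://example.com/products/widget"))
```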
Sitemaps. XML sitemaps should list every indexable URL and exclude every non-indexable URL. Check sitemap freshness, lastmod accuracy, sitemap-index structure for large sites, and separate sitemaps per content type (news, video, image where applicable). Submit to each engine's search console and monitor coverage status.
hreflang for international sites. If the site serves multiple languages or regions, hreflang declarations must be reciprocal (each variant references all others), use correct language and region codes, and match the actual content served. hreflang errors are common and often invisible in the user-facing experience, but they surface as search results serving the wrong locale to users.
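Reciprocity is mechanically checkable once the crawl has extracted each page's hreflang declarations. A pure-Python sketch over a hardcoded map (in practice the map comes from the crawl export):

```python
# Map of page URL -> {lang_code: target URL} as declared in its hreflang tags.
hreflang = {
    "https://example.com/en/page":  {"en": "https://example.com/en/page",
                                     "de": "https://example.com/de/seite"},
    "https://example.com/de/seite": {"de": "https://example.com/de/seite"},  # missing en back-reference
}

def check_reciprocity(hreflang):
    """Every variant a page references must reference that page back."""
    errors = []
    for page, variants in hreflang.items():
        for lang, target in variants.items():
            if target == page:
                continue  # self-reference is fine
            back = hreflang.get(target, {})
            if page not in back.values():
                errors.append(f"{page} -> {target} ({lang}) is not reciprocated")
    return errors

for e in check_reciprocity(hreflang):
    print(e)
```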
Pagination and faceted navigation. Decide which paginated pages are indexable (often only page 1; rel=next/prev is historical and now mostly informational) and which are canonicalised. Faceted navigation should not produce indexable URL explosions; common patterns are parameter-handling rules, robots.txt blocks for query-string-driven facets, or canonicalisation back to the unfaceted view.
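Canonicalisation rules for facets usually reduce to a whitelist of parameters that may survive. A sketch of that rule (the parameter list is illustrative; derive the real one from the site's parameter inventory):

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Parameters that legitimately change page content and may stay indexable;
# everything else is treated as a facet and canonicalised away.
INDEXABLE_PARAMS = {"page"}

def canonical_target(url):
    """Return the URL a faceted variant should canonicalise to."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in INDEXABLE_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonical_target("https://example.com/shoes?color=red&size=9&page=2"))
# -> https://example.com/shoes?page=2
```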
Site architecture, internal linking, and topical clusters
Site architecture is how pages are organised and linked. Architecture audits look at depth, internal-link distribution, and the cluster structure that signals topical authority to engines.
Depth from home page. Pages reachable in two or three clicks from the home page get crawled and indexed more reliably than pages buried at depth six or seven. Audit the depth distribution. Important commercial pages should not be deeper than three clicks; if they are, the architecture is not surfacing them.
Internal-link distribution. Most pages should have at least three to five inbound internal links. Pages with one or zero inbound internal links (orphans or near-orphans) often underperform because the engines have no internal-link signal supporting them. Pillar pages should have many inbound internal links from supporting articles.
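Both click depth and inbound-link counts fall out of a breadth-first pass over the internal-link graph, which most crawlers export as a source/target list. A sketch over a toy graph:

```python
from collections import deque

# Directed internal-link graph: source URL -> list of link targets.
# In practice this comes straight out of the crawler's link export.
links = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widget"],
    "/blog": ["/blog/post-1", "/products/widget"],
    "/blog/post-1": [],
    "/products/widget": [],
    "/orphan": [],  # no inbound links anywhere
}

def click_depths(links, root="/"):
    """Breadth-first search from the home page gives click depth per URL."""
    depth, queue = {root: 0}, deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

def inbound_counts(links):
    counts = {page: 0 for page in links}
    for targets in links.values():
        for t in targets:
            counts[t] = counts.get(t, 0) + 1
    return counts

depth, inbound = click_depths(links), inbound_counts(links)
for page in links:
    print(page, "depth:", depth.get(page, "unreachable"), "inbound:", inbound[page])
```

Pages that print "unreachable" are the orphans; pages at depth four or more with one inbound link are the near-orphans worth re-linking.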
Cluster structure. Topical authority is signalled when a pillar article on a topic has supporting articles that link to it and to each other. The audit should map clusters: identify the pillar candidates, identify the supporting article inventory, identify gaps where clusters are incomplete or where supporting articles do not actually link back to the pillar.
Anchor-text patterns. Internal anchor text should describe the destination accurately. Generic anchors (‘click here,’ ‘read more’) waste the signal. Over-optimised exact-match anchors on every link can read as manipulative; varied descriptive anchors are the operational pattern.
Breadcrumbs. Breadcrumb navigation, with BreadcrumbList schema, helps users and engines understand hierarchy. Audit for presence on category pages and content pages, correct schema, consistent hierarchy.
URL structure. URLs should be readable, predictable, and use hyphens, not underscores. Deep URL nesting (/category/subcategory/sub-subcategory/page) is fine if the hierarchy is real; arbitrary nesting is noise. URL changes (during migrations) need redirect maps to preserve internal-link equity.
Page speed, Core Web Vitals, and rendering
Page speed and Core Web Vitals are user-experience signals that search engines use as part of ranking. The audit covers both lab data (synthetic snapshots) and field data (real-user metrics from the Chrome User Experience Report or equivalent), with field data being the primary signal for ranking purposes.
Largest Contentful Paint (LCP). Time to render the largest above-the-fold element. Target: under 2.5 seconds at the 75th percentile of mobile field data. Common causes of poor LCP: unoptimised hero images, large above-the-fold JavaScript bundles, render-blocking CSS, slow server response time. Audit per-page-template, not just the home page.
Interaction to Next Paint (INP). Latency of user interactions across the page lifetime. Target: under 200ms at the 75th percentile. Causes of poor INP: heavy main-thread JavaScript, third-party scripts firing on every interaction, large hydration costs on framework-heavy sites.
Cumulative Layout Shift (CLS). Sum of unexpected layout shifts. Target: under 0.1. Causes: images and ads without dimensions, web fonts that swap and reflow, dynamically-injected content. Audit by template; the home page may pass while a content template fails.
Server response time. Time to First Byte should be under 600ms at the 75th percentile of mobile field data. High TTFB suggests origin-server, database, or CDN issues. Audit by URL pattern – content pages may serve quickly while dynamic search pages are slow.
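All four field metrics above are queryable at the 75th percentile through Google's Chrome UX Report API. A sketch with requests (requires an API key from the Google Cloud console; metric availability depends on the URL having enough field traffic):

```python
import requests

CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"
API_KEY = "YOUR_API_KEY"  # from the Google Cloud console

def p75_metrics(url):
    """Fetch 75th-percentile mobile field metrics for a URL from CrUX."""
    resp = requests.post(
        f"{CRUX_ENDPOINT}?key={API_KEY}",
        json={"url": url, "formFactor": "PHONE"},
        timeout=10,
    )
    resp.raise_for_status()
    metrics = resp.json()["record"]["metrics"]
    return {name: data["percentiles"]["p75"]
            for name, data in metrics.items() if "percentiles" in data}

# Thresholds from the section above: LCP < 2500 ms, INP < 200 ms, CLS < 0.1.
for name, value in p75_metrics("https://example.com/").items():
    print(name, value)
```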
Render audit for JavaScript-heavy sites. Compare the raw-HTML crawl against the rendered crawl. Content visible only after JavaScript execution may not be indexed reliably, especially on sites where the first render is blocked on large client-side bundles. The fix is server-side rendering, static generation, or careful hydration patterns.
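The raw-versus-rendered comparison can be spot-checked per template with a headless browser. A sketch using requests for the raw fetch and Playwright for the rendered one (Playwright is one choice among several; the marker string is illustrative):

```python
import requests
from playwright.sync_api import sync_playwright

def raw_vs_rendered(url, marker):
    """Check whether a key content string exists in the raw HTML or only
    appears after JavaScript execution."""
    raw = requests.get(url, timeout=10).text
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        rendered = page.content()
        browser.close()
    return marker in raw, marker in rendered

in_raw, in_rendered = raw_vs_rendered("https://example.com/", "Add to basket")
if in_rendered and not in_raw:
    print("Content is JavaScript-dependent: consider SSR or static generation.")
```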
Resource hints and asset budgets. Audit the resource-hint usage (preload, preconnect, prefetch) and set explicit byte budgets per template (HTML, CSS, JavaScript, images). Pages that consistently blow the budget need bundle audits and third-party-tag review.
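Byte budgets are checkable with nothing more than a HEAD sweep over a page's assets. A rough sketch (the budgets are illustrative; Content-Length is missing on some responses, so treat totals as a lower bound):

```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Illustrative per-template byte budgets (transfer size, bytes).
BUDGETS = {"script": 300_000, "css": 100_000, "img": 500_000}

def asset_weights(url):
    """Sum Content-Length of scripts, stylesheets, and images on a page."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    assets = {
        "script": [t["src"] for t in soup.find_all("script", src=True)],
        "css": [t["href"] for t in soup.find_all("link", rel="stylesheet")],
        "img": [t["src"] for t in soup.find_all("img", src=True)],
    }
    totals = {}
    for kind, srcs in assets.items():
        total = 0
        for src in srcs:
            head = requests.head(urljoin(url, src), allow_redirects=True, timeout=10)
            total += int(head.headers.get("Content-Length", 0))
        totals[kind] = total
    return totals

for kind, size in asset_weights("https://example.com/").items():
    flag = "OVER BUDGET" if size > BUDGETS[kind] else "ok"
    print(f"{kind}: {size} bytes ({flag})")
```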
Schema validation, log-file review, and prioritisation
The final layers of the audit cover structured data, server-log behaviour, and the synthesis of findings into a prioritised remediation plan.
Schema validation. Check that every page that should carry schema (Article, BlogPosting, FAQPage, Product, Organization, BreadcrumbList, LocalBusiness depending on content type) ships valid JSON-LD or microdata. Validate with a structured-data testing tool, plus the search console's Rich Results report for indexed-page schema status. Common issues: schema present but with missing required fields; schema declarations that contradict on-page content; schema on pages that no longer exist; FAQPage schema on Q&A that is not visible to users (a violation that risks manual action).
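A first-pass JSON-LD audit (extraction, parse validation, required-field presence) is scriptable; full rich-result eligibility still needs the dedicated validators. A sketch with an illustrative required-field subset (the authoritative lists are in the engines' structured-data documentation):

```python
import json
import requests
from bs4 import BeautifulSoup

# Illustrative required fields per schema type.
REQUIRED = {
    "Article": {"headline", "datePublished", "author"},
    "Product": {"name"},
    "BreadcrumbList": {"itemListElement"},
}

def audit_jsonld(url):
    """Extract JSON-LD blocks and flag missing required fields."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    findings = []
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
        except json.JSONDecodeError:
            findings.append("invalid JSON-LD block")
            continue
        for item in data if isinstance(data, list) else [data]:
            t = item.get("@type")
            missing = REQUIRED.get(t, set()) - item.keys()
            if missing:
                findings.append(f"{t}: missing {sorted(missing)}")
    return findings or ["all checked blocks pass"]

print(audit_jsonld("https://example.com/blog/some-post"))
```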
Log-file review (briefly). Server logs show how engines actually crawled the site versus how a crawler tool simulates. For large sites with crawl-budget concerns, log-file analysis reveals: which URL patterns engines spend time on, the crawl frequency on commercial pages versus low-value pages, status codes engines actually received (which sometimes differ from what a crawler tool reports), and whether specific bots are over- or under-crawling. Log-file review is more involved than the other audit layers and is reserved for sites where crawl budget is a real constraint (typically large e-commerce, news, or marketplace sites).
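Even a basic pass over a combined-format access log answers the crawl-budget question. A sketch that counts Googlebot hits per top-level URL section (user-agent strings can be spoofed; rigorous work verifies bots via reverse DNS):

```python
import re
from collections import Counter

# Matches the request, status, and user-agent fields of a combined-format line.
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP[^"]*" (?P<status>\d{3}) .*"(?P<ua>[^"]*)"$'
)

hits_by_section = Counter()
status_counts = Counter()

with open("access.log") as f:
    for line in f:
        m = LINE.search(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        section = "/" + m.group("path").lstrip("/").split("/", 1)[0]
        hits_by_section[section] += 1
        status_counts[m.group("status")] += 1

# Crawl budget leaking into faceted or parameter URLs shows up here.
for section, hits in hits_by_section.most_common(10):
    print(section, hits)
print("Status codes served to Googlebot:", dict(status_counts))
```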
Findings synthesis. Every issue identified gets logged with severity (critical, high, medium, low), affected URLs or templates, the root-cause hypothesis, and the proposed fix. The audit deliverable is a prioritised list, not a raw findings dump.
Prioritisation matrix. Plot findings on impact (how much commercial traffic or ranking is affected) versus engineering effort (how much developer work to fix). Ship the high-impact, low-effort fixes first to demonstrate audit value and unblock the higher-effort architectural work. Items in the high-impact, high-effort quadrant become quarter-scale projects with explicit forecasts and stakeholder buy-in. Low-impact items can be deferred or deprecated.
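The matrix can live in a spreadsheet, but the sorting logic is worth making explicit. A sketch with illustrative findings, where impact and effort are 1-5 judgments assigned during the audit, not measurements:

```python
findings = [
    {"issue": "staging Disallow: / left in robots.txt", "impact": 5, "effort": 1},
    {"issue": "faceted URLs indexable",                  "impact": 4, "effort": 3},
    {"issue": "site-wide IA restructure",                "impact": 5, "effort": 5},
    {"issue": "missing alt text on archive images",      "impact": 1, "effort": 2},
]

# High impact first; within equal impact, lowest effort first.
for f in sorted(findings, key=lambda f: (-f["impact"], f["effort"])):
    quadrant = ("quick win" if f["impact"] >= 4 and f["effort"] <= 2
                else "project" if f["impact"] >= 4
                else "deferrable")
    print(f'{quadrant:10} impact={f["impact"]} effort={f["effort"]}  {f["issue"]}')
```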
Re-audit cadence. Technical SEO is not a one-off audit; the site changes constantly with deployments, content publishes, and third-party tag additions. A quarterly re-crawl with a delta report against the previous audit catches regressions early. CI integration of crawl checks (a smoke crawl on staging before deployment) catches them earliest.
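A CI smoke crawl needs only a short list of critical URLs and two assertions: the page returns 200 and carries no stray noindex. A sketch intended to run against staging (the URLs are illustrative; the HTML noindex check is deliberately crude and should be refined per site):

```python
# smoke_crawl.py -- run against staging before deployment, e.g. in CI.
import sys
import requests

CRITICAL_URLS = [
    "https://staging.example.com/",
    "https://staging.example.com/products/widget",
    "https://staging.example.com/blog/some-post",
]

failures = []
for url in CRITICAL_URLS:
    resp = requests.get(url, timeout=10)
    if resp.status_code != 200:
        failures.append(f"{url}: status {resp.status_code}")
    if 'name="robots"' in resp.text and "noindex" in resp.text:
        failures.append(f"{url}: possible noindex in HTML")
    if "noindex" in resp.headers.get("X-Robots-Tag", ""):
        failures.append(f"{url}: noindex via X-Robots-Tag")

if failures:
    print("\n".join(failures))
    sys.exit(1)  # fail the CI job
print("smoke crawl passed")
```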
Conclusion
A technical SEO audit is a sequenced methodology, not a checklist run in any order. Crawl the site to build the inventory; check indexability so the pages that should be in the index actually can be; audit site architecture and internal linking so authority flows to commercial pages and topical clusters are coherent; drive page-speed and Core Web Vitals work from field data rather than lab data; validate schema on every page that should carry it; and reserve log-file analysis for sites where crawl budget is a genuine constraint. The deliverable is not a findings dump but a prioritised remediation plan, plotted by impact versus engineering effort, with the high-impact, low-effort fixes shipped first to demonstrate value before the architectural projects begin. Technical SEO is the layer underneath content and links – when it is neglected, the rest of the work compounds against a degraded foundation. A clean audit cadence (quarterly re-crawl, CI smoke-crawl on staging) is what prevents regressions from accumulating between full audits.
Frequently Asked Questions
What is a technical SEO audit?
What is the right order for a technical SEO audit?
How long does a technical SEO audit take?
Should I crawl with a desktop or mobile user-agent?
What Core Web Vitals targets should I aim for?
Do I need log-file analysis for every audit?
How should I prioritise audit findings?
If you want a structured technical SEO audit – full crawl, indexability, architecture, Core Web Vitals, schema, prioritised remediation plan – we can scope it.