CrawlAudit

A bot that audits your site like Google does

See exactly which pages Google's Helpful Content System would flag.

Free preview scans the first 100 URLs. No signup required to preview.

35+
distinct finding types
5,000
URLs per scan
10–15 min
full-site scan time
$9.99
per month, cancel anytime

What a real scan finds — seennabis.com, 100-URL preview

A live preview scan against one cannabis marketplace surfaced this, in under a minute. Every finding links to the offending page, the rule it tripped, and a fix.

48

ORPHAN_PAGE — no other scanned page links to them. Google deprioritises pages with no internal support.

33

NO_H1 — every page should have exactly one. Mobile + desktop duplicates are the usual cause.

31

NEAR_DUPLICATE — SimHash distance ≤3. Strong cannibalisation signal between near-identical templated pages.

65

HEADING_SKIP — H1 → H3 with no intermediate H2. Common after blog template refactors.

26

IMG_NO_DIMENSIONS — missing width/height. Cumulative Layout Shift hits Core Web Vitals.

6

VERY_THIN — under 150 words. Two of these were SSR-bailout pages serving Googlebot only the literal text "Loading…".

The full report ranks these by total risk reduction so the highest-leverage fixes surface first. Subscribers download it as Markdown and CSV.
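
To make that ranking concrete, here is a minimal sketch, in Python, of what a finding record and the risk-reduction ranking could look like. The field names and the severity weights are illustrative assumptions, not CrawlAudit's actual schema or numbers.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Finding:
    url: str          # page the finding was raised on
    type: str         # e.g. "NO_H1", "NEAR_DUPLICATE"
    severity: str     # "critical" | "warn" | "info"
    value: str        # what was observed, e.g. "title is 47 chars"
    threshold: str    # what it was compared against, e.g. "60 chars"
    why: str          # one-line explanation of why it matters
    fix: str          # suggested remediation

# Illustrative severity weights, not CrawlAudit's real numbers.
RISK = {"critical": 10, "warn": 5, "info": 1}

def rank_by_risk_reduction(findings: list[Finding]) -> list[tuple[str, int]]:
    """Aggregate findings by type and rank by the total risk removed if fixed."""
    totals: dict[str, int] = defaultdict(int)
    for f in findings:
        totals[f.type] += RISK.get(f.severity, 1)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```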

Every check we run, grouped by intent

Each finding carries a severity (critical / warn / info), the value we observed, the threshold we compared against, and a fix suggestion. Below is the full set — no surprises hiding behind a paywall.

Content quality

  • VERY_THIN — Pages under 150 words (60 on listing pages)
  • THIN — Pages under 400 words (180 on listing pages)
  • LOW_UNIQUE — Distinct-word ratio under 30%
  • AI_TELLS / HEAVY_AI_TELLS — 40+ stock-AI phrases; three hits warn, five flag heavy
  • STALE_YEAR — Outdated year in title, e.g. a 2024 listicle still indexed in 2026
  • SOFT_404 — Returns 200 but body looks like a not-found page
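
A rough sketch of how the word-count and phrase thresholds above could be applied to extracted page text. The phrase list is a placeholder subset, and the listing-page flag and function name are assumptions made for illustration.

```python
import re

# Placeholder subset; the real list has 40+ stock phrases.
AI_TELL_PHRASES = ["in today's fast-paced world", "it's important to note", "delve into"]

def content_quality_findings(text: str, is_listing: bool = False) -> list[str]:
    words = re.findall(r"[\w'-]+", text.lower())
    findings = []

    # VERY_THIN / THIN word-count thresholds, lower on listing pages
    very_thin, thin = (60, 180) if is_listing else (150, 400)
    if len(words) < very_thin:
        findings.append("VERY_THIN")
    elif len(words) < thin:
        findings.append("THIN")

    # LOW_UNIQUE: distinct-word ratio under 30%
    if words and len(set(words)) / len(words) < 0.30:
        findings.append("LOW_UNIQUE")

    # AI_TELLS: three phrase hits warn, five or more flag heavy
    hits = sum(text.lower().count(p) for p in AI_TELL_PHRASES)
    if hits >= 5:
        findings.append("HEAVY_AI_TELLS")
    elif hits >= 3:
        findings.append("AI_TELLS")

    return findings
```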

Page structure

  • NO_H1 / MULTI_H1 — Missing or duplicate H1 (mobile + desktop dupes are common)
  • HEADING_SKIP — H1 → H3 with no H2 in between
  • REPEATED_HEADING — Multiple identical H2 sections on one page
  • NO_TITLE / SHORT_TITLE / LONG_TITLE — Missing, under 15 chars, or over 65 chars
  • NO_META_DESC / SHORT_META_DESC — Missing or under 50 chars; Google generates its own, usually poorly
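
For illustration, a minimal heading and title audit over raw HTML, using only the standard library. The thresholds mirror the list above; the class and function names and the parsing approach are assumptions, not CrawlAudit's implementation.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collects heading levels and the <title> text in document order."""
    def __init__(self):
        super().__init__()
        self.levels: list[int] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.levels.append(int(tag[1]))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def structure_findings(html: str) -> list[str]:
    p = HeadingCollector()
    p.feed(html)
    findings = []

    h1_count = p.levels.count(1)
    if h1_count == 0:
        findings.append("NO_H1")
    elif h1_count > 1:
        findings.append("MULTI_H1")

    # HEADING_SKIP: a heading jumps more than one level deeper than the previous one
    for prev, cur in zip(p.levels, p.levels[1:]):
        if cur > prev + 1:
            findings.append("HEADING_SKIP")
            break

    title = p.title.strip()
    if not title:
        findings.append("NO_TITLE")
    elif len(title) < 15:
        findings.append("SHORT_TITLE")
    elif len(title) > 65:
        findings.append("LONG_TITLE")
    return findings
```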

Indexing & canonical

  • CANONICAL_MISMATCH — Canonical points to a different URL
  • CANONICAL_TO_NOINDEX — Canonical target is noindex, so Google indexes neither
  • CANONICAL_CHAIN — A→B→C chain; Google may stop following
  • NOINDEX_IN_SITEMAP — Page is in the sitemap but has noindex, a contradiction
  • META_ROBOTS_CONFLICT — index and noindex both present
  • META_NOFOLLOW / NOARCHIVE / UNAVAILABLE_AFTER — Less-known robots directives, reported when present
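
A simplified sketch of how the canonical and indexing contradictions above could be detected once each page's canonical target and robots data have been collected. The input shapes and function name are assumptions made for the example.

```python
def canonical_findings(url: str,
                       canonical_of: dict[str, str],
                       noindex: set[str],
                       sitemap_urls: set[str]) -> list[str]:
    """Canonical and indexing checks over already-crawled data.
    canonical_of maps each URL to its declared canonical (itself if self-canonical)."""
    findings = []
    target = canonical_of.get(url, url)

    if target != url:
        findings.append("CANONICAL_MISMATCH")
        if target in noindex:
            findings.append("CANONICAL_TO_NOINDEX")
        # CANONICAL_CHAIN: the canonical target itself canonicalises elsewhere (A -> B -> C)
        if canonical_of.get(target, target) != target:
            findings.append("CANONICAL_CHAIN")

    if url in sitemap_urls and url in noindex:
        findings.append("NOINDEX_IN_SITEMAP")

    return findings
```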

Schema & rich results

  • BROKEN_JSON_LD — JSON-LD block fails to parse, so Google extracts nothing
  • DUPLICATE_JSON_LD — Two Organization blocks (page + layout) is the classic example
  • OG_IMAGE_404 — og:image URL returns 4xx, so social shares render blank
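
A hedged sketch of these schema checks: parse each JSON-LD block, flag parse failures and repeated @type values, and HEAD-check the og:image URL. The regex-based extraction and helper names are illustrative, not the production parser.

```python
import json
import re
import urllib.error
import urllib.request

JSON_LD_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.S | re.I,
)

def schema_findings(html: str) -> list[str]:
    findings, types_seen = [], []
    for block in JSON_LD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            findings.append("BROKEN_JSON_LD")   # nothing can be extracted from this block
            continue
        for node in (data if isinstance(data, list) else [data]):
            if isinstance(node, dict) and node.get("@type"):
                types_seen.append(str(node["@type"]))
    if len(types_seen) != len(set(types_seen)):   # e.g. two Organization blocks
        findings.append("DUPLICATE_JSON_LD")
    return findings

def og_image_404(og_image_url: str) -> bool:
    """HEAD-check the og:image URL; True means it returned a 4xx."""
    req = urllib.request.Request(og_image_url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            return 400 <= resp.status < 500
    except urllib.error.HTTPError as exc:
        return 400 <= exc.code < 500
```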

Cross-page architecture

  • ORPHAN_PAGE — No other scanned page links here; Google deprioritises such pages
  • DEEP_PAGE — More than 3 clicks from the homepage via internal links
  • NEAR_DUPLICATE — SimHash distance ≤3; a cannibalisation candidate
  • DUPLICATE_TITLE — Identical title on multiple pages (excluding pagination)
  • HREFLANG_ASYMMETRY — Same-host alternates that don't reciprocate
  • LINK_BLOAT / LINK_STARVED — Over 200 or under 3 internal links per page
  • GENERIC_ANCHOR — 'Click here' / 'read more' as a meaningful share of links
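
The near-duplicate check is easy to picture with a small SimHash sketch: fingerprint each page's text into 64 bits, then pair up URLs whose fingerprints differ by 3 bits or fewer. The hashing choices below are illustrative; only the distance threshold comes from the check above.

```python
import hashlib
import re

def simhash64(text: str) -> int:
    """64-bit SimHash over word tokens."""
    vector = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for bit in range(64):
            vector[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if vector[bit] > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def near_duplicates(pages: dict[str, str], max_distance: int = 3) -> list[tuple[str, str]]:
    """Pairs of URLs whose fingerprints differ by <= max_distance bits.
    Brute-force O(n^2) comparison; fine for a sketch, not for huge sites."""
    fp = {url: simhash64(text) for url, text in pages.items()}
    urls = sorted(fp)
    return [(a, b) for i, a in enumerate(urls) for b in urls[i + 1:]
            if hamming(fp[a], fp[b]) <= max_distance]
```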

Site-wide hygiene

  • NO_ROBOTS_TXT / ROBOTS_NO_SITEMAP — robots.txt missing or doesn't declare a Sitemap:
  • ROBOTS_DISALLOW_ALL — 'Disallow: /' is blocking the whole site
  • STALE_LLMS_TXT — TLD-locale mismatch, e.g. a .de site mentions Canadian content
  • IMG_NO_ALT / IMG_DUPLICATE_ALT — Image alt audit, including cross-image duplicates
  • IMG_NO_DIMENSIONS — Images without width/height, a Cumulative Layout Shift risk
  • NO_IMAGES — Long-form page (>300 words) with no images at all
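
A minimal sketch of the robots.txt portion of these checks, operating on the raw file contents. It ignores user-agent grouping for brevity, so treat it as an approximation rather than the real parser.

```python
import re

def robots_findings(robots_txt: str | None) -> list[str]:
    """Site-wide robots.txt checks; None means the file was missing."""
    if robots_txt is None:
        return ["NO_ROBOTS_TXT"]
    findings = []
    lines = [line.strip() for line in robots_txt.splitlines()]

    # ROBOTS_NO_SITEMAP: no "Sitemap:" declaration anywhere in the file
    if not any(line.lower().startswith("sitemap:") for line in lines):
        findings.append("ROBOTS_NO_SITEMAP")

    # ROBOTS_DISALLOW_ALL: a bare "Disallow: /" blocks the whole site
    if any(re.fullmatch(r"disallow:\s*/", line, re.I) for line in lines):
        findings.append("ROBOTS_DISALLOW_ALL")

    return findings
```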

Composite: every page also gets an indexability score (0–100) rolling all signals into one number — useful for sorting big result sets.
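
One way such a composite could be computed is sketched below. The penalty values are invented for illustration; only the anchor points described elsewhere on this page (noindex scores 0, soft 404s score 5, thin pages 10–40, clean pages 95+) come from the product description, and the NOINDEX finding name is an assumption.

```python
def indexability_score(findings: set[str]) -> int:
    """Roll a page's findings into one sortable 0-100 number (illustrative weights)."""
    if "NOINDEX" in findings:        # hypothetical finding name for a noindex directive
        return 0
    if "SOFT_404" in findings:
        return 5
    score = 100
    penalties = {"VERY_THIN": 85, "THIN": 60, "NO_H1": 10, "NO_TITLE": 10,
                 "CANONICAL_MISMATCH": 15, "ORPHAN_PAGE": 10}
    for finding in findings:
        score -= penalties.get(finding, 2)   # small default penalty for anything else
    return max(0, min(100, score))
```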

How a scan works

  1. Submit your URL

    Paste a domain. CrawlAudit pulls /sitemap.xml — and falls back to /sitemap_index.xml or /sitemap-0.xml — to discover URLs. Same-host filtering keeps the scan focused on the property you submitted.

  2. Chunked crawl

    A worker claims 200 URLs at a time and fetches them at 20 concurrent requests. We use the user-agent CrawlAuditBot/1.0, follow up to 5 redirects, never execute JavaScript, and never accept cookies. A 5,000-URL scan finishes in 10–15 minutes.

  3. Per-page scoring

    Each page produces structured findings: type, severity, value, threshold, why, fix. Findings are severity-weighted, and because each carries the observed value and threshold, the report says 'Title 47 chars (Google truncates at 60)' rather than just labelling something SHORT_TITLE.

  4. Cross-page + site-level checks

    When the queue empties we run duplicate-title detection, hreflang reciprocity, canonical-chain detection, og:image HEAD-checks, SimHash near-duplicate clustering, BFS crawl-depth from your homepage, orphan-page detection, and robots.txt + llms.txt inspection. A sketch of the crawl-depth and orphan pass follows this list.

  5. Fix recommendations

    Findings aggregated by type and ranked by total risk reduction. The top 10 surface at the report header — 'Fix duplicate titles on 47 pages → risk drops 235' beats a 4,000-line CSV.

  6. Download

    Subscribers download the full Markdown report (human-readable, includes every finding's diagnosis + fix) and CSV (spreadsheet-ready). Re-run after fixes ship: baseline diff shows +new / −resolved.
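
As referenced in step 4, here is a compact sketch of the crawl-depth and orphan pass: a BFS over the internal-link graph from the homepage, then flags for pages more than 3 clicks deep or with no inbound internal links. The input shapes and names are assumptions made for the example.

```python
from collections import deque

def crawl_depths(homepage: str, links: dict[str, set[str]]) -> dict[str, int]:
    """BFS over the internal-link graph: clicks from the homepage to each page."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, set()):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def graph_findings(all_pages: set[str], homepage: str,
                   links: dict[str, set[str]]) -> dict[str, list[str]]:
    depths = crawl_depths(homepage, links)
    linked_to = {t for targets in links.values() for t in targets}
    out: dict[str, list[str]] = {}
    for page in all_pages:
        flags = []
        if page != homepage and page not in linked_to:
            flags.append("ORPHAN_PAGE")   # no other scanned page links here
        if depths.get(page, 99) > 3:      # unreachable pages are treated as maximally deep
            flags.append("DEEP_PAGE")     # more than 3 clicks from the homepage
        if flags:
            out[page] = flags
    return out
```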

Versus other crawlers

Generalist SEO tools cover technical signals (broken links, status codes) well. They don't score content quality against Helpful Content signals. We do.

| Static HTML scanning                      | CrawlAudit        | Screaming Frog | Sitebulb | Ahrefs Site Audit |
|-------------------------------------------|-------------------|----------------|----------|-------------------|
| Thin-content + listing-page-aware scoring | Yes               | No             | Partial  | No                |
| AI-tell phrase detection                  | Yes (40+ phrases) | No             | No       | No                |
| SimHash near-duplicate clustering         | Yes               | No             | No       | Partial           |
| BFS crawl depth + orphan detection        | Yes               | Yes            | Yes      | Yes               |
| Canonical-chain + canonical-to-noindex    | Yes               | Partial        | Yes      | Yes               |
| og:image 404 check via HEAD               | Yes               | No             | No       | No                |
| Stale-llms.txt detection                  | Yes               | No             | No       | No                |
| Per-finding why + suggested fix           | Yes               | No             | Yes      | Partial           |
| Baseline diff between scans               | Yes               | Partial        | Yes      | Yes               |
| Runs in the cloud (no laptop required)    | Yes               | No             | No       | Yes               |
| Price                                     | $9.99/mo          | $259/year      | $15+/mo  | $129+/mo          |

Frequently asked questions

Does CrawlAudit execute JavaScript like Googlebot does?
No — static HTML only, by design. Googlebot's first-pass index uses static HTML; the rendered DOM gets a second pass much later. Most ranking decisions happen on the first pass. If your content is invisible to us, it's invisible to Google's initial crawl too — and that's exactly what we want to surface.
How is this different from Screaming Frog or Sitebulb?
Those tools cover technical SEO — broken links, status codes, redirect chains. CrawlAudit specifically scores content quality against Helpful Content signals: thin pages, AI footprints, templated intros, near-duplicates, missing structured data. We score what they don't. We also run in the cloud, so a 5,000-URL scan doesn't tie up your laptop for 20 minutes.
Why should I trust the AI-tell detector?
It detects pattern-presence, not 'AI vs human.' We don't claim to identify which model wrote what. We flag 40+ stock phrases that real human editors strip out — filler transitions, generic summary tags, vague intensifiers. If your page hits five of them, it reads like an unedited draft, whether a person, a model, or both produced it. The fix is the same: edit it. The full pattern list is on /about.
Will the bot hammer my server?
No. 20 concurrent requests max, 15-second per-request timeout. A 5,000-URL scan is roughly equivalent to one user browsing for 15 minutes. The user-agent is CrawlAuditBot/1.0 — block or rate-limit if you want.
What about indexability — can I see which pages Google won't rank?
Every page gets an indexability score from 0 to 100. Pages with noindex score 0. Soft 404s score 5. Thin pages score 10-40. Pages with clean structure, sitemap inclusion, and a self-canonical score 95+. The composite rolls up signals into one number you can sort by.
Can I scan a site bigger than 5,000 URLs?
Rotate scans across sections by pointing CrawlAudit at sub-sitemaps separately (e.g. /sitemaps/blog.xml then /sitemaps/products.xml). Email us if you need a higher cap built into the plan.
Do unused scans carry over?
No — the 5-scan allotment resets monthly. Most sites scan once after a content push, then once more two weeks later to verify fixes.

Stop guessing. Get the list.

You don't need another opinion on whether your content is "good enough." You need the specific 47 pages Google would skip, ranked by which fixes move the needle most.

Free 100-URL preview, no card, no signup. $9.99/month unlocks the full site.