Pro · Methodology · Rubric 2026.06.5 · Open for review

How we measure AI visibility.

They score what your content looks like. We measure what AI actually says.

SetpointHQ Pro is the audit and retainer engine behind The SetpointHQ Index, a longitudinal dataset of UK employer AI visibility. This page documents the methodology in full: the dimensions we score, how they roll up to a composite, and the version-pinning discipline that makes every published score reproducible from source.

Current version: 2026.06.5

Dimensions scored: 5

LLM providers probed: 4

Probe cadence: Per audit

Open for review: Yes

01 · Why this matters

AI visibility is not SEO.

SEO measures whether Google ranks your page. AI visibility measures whether ChatGPT, Claude, Perplexity and Gemini cite your company when a candidate asks them about employers. The mechanisms diverge: search engines crawl and rank pages by intent matching; LLMs absorb a corpus and synthesise answers.

What SEO measures

Your rank on the page

A strong position, on a page the candidate may never open.

What we measure

Whether the answer names you

Who are the strong employers in my field?

Worth looking at: Acme, [Your brand], and Initech.

A mention in the answer they actually read.

A company can rank #1 on Google for its careers page and never be named when a candidate asks an AI for the “best fintechs in London for software engineers”. We measure the second of those, which is increasingly the question candidates ask.

Existing tools score whether content looks citable: passage length, statistical density, formatting heuristics. Pro scores whether the company is actually cited when an AI is asked. The two are correlated but not identical, and only one is what candidates see.

02 · The approach

Empirical, not heuristic.

Citation likelihood is the centrepiece of the rubric. We probe four major LLMs (Anthropic Claude, OpenAI GPT, Perplexity Sonar, and Google Gemini) with candidate-research queries and observe whether the client is named in the response. This is empirical, not heuristic: we are not scoring whether the content looks citation-ready, we are measuring whether AI cites it.

Multi-engine probing is deliberate. AI visibility means visibility across engines, not against a single model. Each provider has different training data, different retrieval surfaces, and different citation behaviours; an audit that probes only one of them measures one model’s opinion, not AI visibility in any meaningful sense. We run the same query set across all four and aggregate per-provider results into a cross-provider mean.

We do not measure raw search traffic: that is SEO. We do not measure LinkedIn engagement metrics: that is social listening. We measure whether AI cites you when candidates ask AI about employers, which is a question the existing tooling stack does not answer.

03 · The dimensions

Five dimensions, one composite.

Each dimension scores 0 to 100 against an empirical or structural signal. The composite is a weighted average. Weights below come from the current published rubric (2026.06.5); they update when the rubric does.

DIM · 01

Citation likelihood

Weight · 35%

Empirical: do four major LLMs cite the client when asked about employers?

We run a standardised query set across Anthropic Claude, OpenAI GPT, Perplexity Sonar, and Google Gemini. We parse each response for mentions of the client by canonical name and known aliases. Position matters: citations at the top of an answer score higher than citations at the end. We average per-provider scores across providers that responded. Providers that errored on every probe drop from the mean rather than score as zero. Citation is the IP centrepiece of the rubric.
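
The scoring steps above can be sketched roughly as follows. This is a minimal illustration, not the production scorer: the function names, the linear position decay (100 at the top of an answer, 50 at the very end), and the plain substring matching are all assumptions made for the example. A `None` response stands for a probe that errored.

```python
from statistics import mean
from typing import Optional


def position_score(response: str, names: list[str]) -> float:
    """Return 0-100: higher when the earliest mention is nearer the top."""
    text = response.lower()
    # Plain substring match on canonical name and aliases (assumption);
    # a real parser would likely need word boundaries.
    hits = [text.find(name.lower()) for name in names]
    hits = [h for h in hits if h >= 0]
    if not hits:
        return 0.0
    # Assumed linear decay: 100 at the first character, 50 at the last.
    return 100.0 - 50.0 * (min(hits) / max(len(text), 1))


def provider_score(responses: list[Optional[str]], names: list[str]) -> Optional[float]:
    """Mean over probes that returned text; None if every probe errored."""
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    return mean(position_score(r, names) for r in answered)


def citation_likelihood(
    by_provider: dict[str, list[Optional[str]]], names: list[str]
) -> Optional[float]:
    """Cross-provider mean; providers that errored on every probe are dropped."""
    scores = [provider_score(r, names) for r in by_provider.values()]
    scores = [s for s in scores if s is not None]
    return mean(scores) if scores else None


probes = {
    "provider_a": ["Acme is a strong employer.", "No clear favourites here."],
    "provider_b": [None, None],  # errored on every probe, so it is excluded
}
print(citation_likelihood(probes, ["Acme", "Acme Ltd"]))  # 50.0
```

Note that `provider_b` is left out of the mean entirely, so an outage does not read as "not cited".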

DIM · 02

Schema completeness

Weight · 20%

Structural: does the careers site expose the structured data AI crawlers expect?

We check JSON-LD presence and validity for Organization, JobPosting, Person, ProfilePage, BreadcrumbList, and WebSite types against schema.org. Each type's sub-signals weight by which fields it declares. Missing required fields cost more than missing optional ones.
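
As a rough sketch of how required and optional fields might weigh differently, consider the following. The field lists, the 3:1 weight ratio, and the function names are hypothetical; the rubric's actual per-type lists are not reproduced on this page, and only two of the six types are shown.

```python
import json

# Hypothetical field lists for two of the six types (assumption).
EXPECTED = {
    "Organization": {"required": ["name", "url"], "optional": ["logo", "sameAs"]},
    "JobPosting": {
        "required": ["title", "description", "datePosted"],
        "optional": ["baseSalary"],
    },
}
REQUIRED_WEIGHT, OPTIONAL_WEIGHT = 3.0, 1.0  # assumed ratio


def type_score(node: dict, spec: dict) -> float:
    """Share of weighted fields that are present and non-empty, 0-100."""
    earned = total = 0.0
    for weight, fields in (
        (REQUIRED_WEIGHT, spec["required"]),
        (OPTIONAL_WEIGHT, spec["optional"]),
    ):
        for field in fields:
            total += weight
            if node.get(field) not in (None, "", []):
                earned += weight
    return 100.0 * earned / total


def schema_completeness(jsonld_blocks: list[str]) -> float:
    """Average type score; absent or unparseable types score zero."""
    nodes = {}
    for raw in jsonld_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # invalid JSON-LD earns nothing
        if isinstance(data, dict) and isinstance(data.get("@type"), str):
            nodes[data["@type"]] = data
    scores = [
        type_score(nodes[t], spec) if t in nodes else 0.0
        for t, spec in EXPECTED.items()
    ]
    return sum(scores) / len(scores)


blocks = [
    '{"@type": "Organization", "name": "Acme", "url": "https://example.com"}',
    "not valid json",
]
print(schema_completeness(blocks))  # 37.5
```

Here the Organization block has both required fields but neither optional one (75), and JobPosting is absent (0), giving 37.5.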

DIM · 03

Knowledge-graph presence

Weight · 15%

External: is the company present in the structured knowledge sources LLMs draw on?

We score Wikipedia entry, Wikidata Q-number, and the completeness of sameAs cross-references in Organization schema. Knowledge-graph presence drives baseline AI awareness independent of the careers site itself. Companies with strong KG presence appear in LLM responses even when their careers content is sparse.

DIM · 04

LLM crawlability

Weight · 15%

Access: can AI crawlers reach and read the content at all?

We score robots.txt permissiveness against a fourteen-crawler list (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and equivalents). We check for /llms.txt. Render-readiness matters too: whether the careers content sits in raw HTML or needs JavaScript to materialise. AI crawlers that cannot render never see the content, regardless of how good it is. The same holds for crawlers the site blocks in robots.txt.
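
The robots.txt part of this check can be approximated with Python's standard `urllib.robotparser`. The sketch below uses only the four crawlers named above, not the full fourteen-entry list, and the "share of crawlers allowed" scoring is an assumption for illustration.

```python
from urllib.robotparser import RobotFileParser

# The four crawlers named in the text; the full list is not reproduced here.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def crawler_access(
    robots_txt: str, url: str, agents: tuple[str, ...] = AI_CRAWLERS
) -> dict[str, bool]:
    """Map each crawler to whether robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


def permissiveness(access: dict[str, bool]) -> float:
    """Assumed scoring: percentage of listed crawlers that are allowed."""
    return 100.0 * sum(access.values()) / len(access)


robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
access = crawler_access(robots, "https://example.com/careers")
print(access["GPTBot"], permissiveness(access))  # False 75.0
```

This covers robots.txt only; the /llms.txt check and render-readiness would need separate probes.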

DIM · 05

Social-signal density

Weight · 15%

Distributed: is the employer brand present where LLMs absorb employer signal?

We score multi-platform presence against a curated platform list. v1 currently covers Glassdoor only, probed via web search: review count, recency, average rating. Indeed, LinkedIn, Reddit, and other platforms are queued for v2. Until then, the dimension scores the Glassdoor sub-signal alone and is weighted accordingly inside the composite.

04 · The composite

How the dimensions add up.

Citation · 35%
Schema · 20%
Knowledge graph · 15%
Crawlability · 15%
Social signal · 15%

The composite is the weighted average of the five dimension scores, on the same 0 to 100 scale. A composite of 50 means the average dimension scored at the midpoint after weighting; a composite of 80 means most dimensions are scoring strongly. The scale is linear, not logarithmic: moving from 70 to 80 represents the same absolute improvement as moving from 30 to 40.

Two design choices worth flagging. First, the citation dimension is weighted highest because it is the closest direct measure of AI visibility. Schema and crawl signals are necessary upstream conditions, but a score on those without a corresponding citation result tells you the plumbing works without telling you the audience hears anything. Second, no dimension can disappear from the composite even when its score is zero: a company that scores 0 on citations because no LLM names them does not get a partial pass for having good schema. The weights are fixed; the composite reflects the full rubric every time.
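
A worked example makes the arithmetic concrete. The weights are the published ones for rubric 2026.06.5; the dictionary keys and the example scores are hypothetical.

```python
# Weights in percent, as published for rubric 2026.06.5.
# The key names are labels chosen for this sketch.
WEIGHTS = {
    "citation": 35,
    "schema": 20,
    "knowledge_graph": 15,
    "crawlability": 15,
    "social_signal": 15,
}


def composite(scores: dict[str, float]) -> float:
    """Weighted average on the 0-100 scale; every dimension must be present."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS) / sum(WEIGHTS.values())


# Hypothetical employer: the plumbing works, but no LLM names them.
example = {
    "citation": 0,
    "schema": 90,
    "knowledge_graph": 60,
    "crawlability": 100,
    "social_signal": 40,
}
print(composite(example))  # 48.0
```

The zero on citation stays in the sum at its full 35% weight: (0×35 + 90×20 + 60×15 + 100×15 + 40×15) / 100 = 48. Dropping the dimension instead would have produced about 73.8, which is the partial pass the rubric is designed to refuse.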

Composite scores are point-in-time. A run executed today produces a score against today’s rubric; the same site re-audited next week may score differently both because the site changed and because LLM responses are non-deterministic. The longitudinal value comes from running audits at a steady cadence and tracking the trend.

05 · Reproducibility

Versioned methodology, hashed inputs.

Every audit pins the rubric version it scored against and a SHA256 methodology hash of the underlying rubric definition, query templates, and reference data. The current rubric is 2026.06.5; the hash for an audit run under that rubric uniquely identifies the exact rubric content, query set, AI-crawler list, and brand-platform list that produced the score.
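
One plausible way to derive such a hash is to serialise the inputs into a canonical form and digest the result. The sketch below assumes the inputs can be represented as JSON-compatible data; the function name, the payload layout, and the sample values are illustrative, not the production format.

```python
import hashlib
import json


def methodology_hash(rubric: dict, query_templates: list, reference_data: dict) -> str:
    """SHA-256 over a canonical JSON serialisation of the methodology inputs."""
    payload = {
        "rubric": rubric,
        "query_templates": query_templates,
        "reference_data": reference_data,
    }
    # Sorted keys and fixed separators keep the bytes stable across runs.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


rubric = {"version": "2026.06.5", "weights": {"citation": 35, "schema": 20}}
queries = ["Who are the strong employers in my field?"]
reference = {"ai_crawlers": ["GPTBot", "ClaudeBot"], "brand_platforms": ["Glassdoor"]}

digest = methodology_hash(rubric, queries, reference)
print(len(digest))  # 64 hex characters
```

Canonical serialisation matters here: without sorted keys, two semantically identical rubrics could hash differently depending on dictionary ordering.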

Two integrity rules govern version changes. First, a published rubric is never mutated: when the methodology changes, the rubric version bumps and the hash changes with it. An audit run scored under 2026.05.5 remains scored under 2026.05.5 forever; it does not silently re-score against a newer rubric. Second, an audit run whose methodology hash cannot be reproduced from the current source state is not allowed to persist: the trail must always be reproducible from the codebase.
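
The two rules could be enforced with guards along these lines. This is a sketch under assumptions: `RubricTrail`, `IntegrityError`, and the in-memory storage are hypothetical names and structures, and the hash is a simple SHA-256 over canonical JSON.

```python
import hashlib
import json


class IntegrityError(Exception):
    """Raised when an operation would break the audit trail."""


def content_hash(definition: dict) -> str:
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RubricTrail:
    def __init__(self) -> None:
        self._published: dict[str, str] = {}  # version -> hash
        self.runs: list[dict] = []

    def publish(self, version: str, definition: dict) -> str:
        """Rule 1: a published version can never change content."""
        digest = content_hash(definition)
        existing = self._published.get(version)
        if existing is not None and existing != digest:
            raise IntegrityError(f"{version} is already published; bump the version")
        self._published[version] = digest
        return digest

    def persist_run(self, run: dict, source: dict[str, dict]) -> None:
        """Rule 2: the run's hash must be reproducible from current source."""
        definition = source.get(run["rubric_version"])
        if definition is None or content_hash(definition) != run["methodology_hash"]:
            raise IntegrityError("methodology hash not reproducible from source")
        self.runs.append(run)


trail = RubricTrail()
source = {"2026.06.5": {"weights": {"citation": 35}}}
digest = trail.publish("2026.06.5", source["2026.06.5"])
trail.persist_run({"rubric_version": "2026.06.5", "methodology_hash": digest}, source)
print(len(trail.runs))  # 1
```

A run that names a version missing from source, or carries a stale hash, is rejected before it reaches the trail.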

These rules have teeth. Twelve rubric versions are currently published in the trail, with one earlier version reverted before publication when a methodology change failed its own integrity check. The reverted version’s audit data was deleted from the trail rather than allowed to claim a methodology that no longer exists in source. This is the discipline that makes “published score” mean something.

Year-over-year comparisons require methodology stability. When the rubric does change between time points, comparisons surface the methodology delta explicitly rather than papering over it. This is how The SetpointHQ Index handles longitudinal claims: the published number is always tagged with the version that produced it, and cross-version comparisons carry the version-change context with them.
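
In data terms, this amounts to never returning a bare delta. A minimal sketch, with hypothetical type and field names and made-up scores:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedScore:
    composite: float
    rubric_version: str  # every published number carries its version


def compare(earlier: PublishedScore, later: PublishedScore) -> dict:
    """Return the score delta together with the methodology context."""
    return {
        "delta": later.composite - earlier.composite,
        "methodology_changed": earlier.rubric_version != later.rubric_version,
        "versions": (earlier.rubric_version, later.rubric_version),
    }


result = compare(
    PublishedScore(52.0, "2026.05.5"),
    PublishedScore(58.0, "2026.06.5"),
)
print(result["delta"], result["methodology_changed"])  # 6.0 True
```

A consumer of `result` sees that the six-point move spans a rubric change and can decide how much of it to attribute to the employer.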

06 · The Index

Why this all exists.

The methodology described above produces individual audit scores. The methodology compounds when the same rubric is run across a structured cohort: 102 UK employer brands in known sectors, audited at a steady cadence, scored under the same versioned rubric and reproducible from source. That cohort is The SetpointHQ Index.

The Index is a public quarterly dataset, re-run each quarter under the rubric pinned at the date of each run. The v0 cohort ran on 7 May 2026: 102 UK employer brands across 11 sectors, drawn from a target list of 103 (1 excluded at fetch time for a broken URL or an anti-bot block). The full ranked table is browsable at /the-index. The methodology page documents how the score is built; /the-index surfaces what the score actually says about the cohort.

Retainer clients of SetpointHQ Pro are audited under the same methodology. The Index is the public reference cohort; client audits are the private deep work. Both run on the same engine, the same dimensions, the same rubric versions. Cohort comparisons in client reports (“you score in the top quartile for fintech”) draw from the Index dataset directly.

07 · FAQ

Questions we get asked.

Why four LLMs and not just one?

AI visibility means visibility across engines. Claude, GPT, Perplexity and Gemini have different training data, retrieval surfaces, and citation behaviours. A score against one of them is a score against one model’s opinion.

A score across all four is a score against AI as candidates use the term. We aggregate per-provider results into a cross-provider mean.

Providers that errored on every probe drop from that mean rather than score as zero. Auth failures are not citation signals.

How do you handle LLM non-determinism?

LLM responses vary run-to-run. We accept this rather than fight it. Single audits produce point-in-time scores; the value comes from running audits at a steady cadence and tracking the trend.

Day-to-day variance is real, but it is noise; week-over-week and month-over-month trends are the durable measurement. The methodology hash pins the rubric and inputs per score. When a number moves, we can tell whether the methodology moved or the LLMs moved.
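
One simple way to read a trend through that noise is to average runs by ISO week before comparing. The bucketing choice, the function name, and the sample scores below are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_means(runs: list[tuple[date, float]]) -> dict[tuple[int, int], float]:
    """Average composite scores per (ISO year, ISO week), in date order."""
    buckets: dict[tuple[int, int], list[float]] = defaultdict(list)
    for day, score in runs:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(score)
    return {week: mean(scores) for week, scores in sorted(buckets.items())}


# Hypothetical runs: two in one week, one the following Monday.
runs = [
    (date(2026, 5, 4), 60.0),
    (date(2026, 5, 6), 64.0),
    (date(2026, 5, 11), 70.0),
]
trend = weekly_means(runs)
print(list(trend.values()))  # [62.0, 70.0]
```

The four-point swing inside the first week averages out; the week-over-week move from 62 to 70 is the figure worth tracking.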

What’s measured versus what’s recommended?

The rubric measures empirical signals only: LLM probe results, structured data presence, knowledge-graph entries, crawl accessibility, social-platform presence. Recommendations are a separate layer. When an audit identifies a gap, a play library suggests UK-recruitment-specific remediation steps with effort tags and expected impact.

Measurements stay strictly empirical: mixing recommendation logic into scoring would compromise the signal. We enforce the boundary at the architectural level.

Why do we version the methodology at all?

Methodology drifts. New dimensions get added, query templates get refined, reference data lists get updated. Without versioning, comparing a January score against a June score compares two different rubrics without saying so.

Versioning makes the comparison legible: a January score under 2026.05.1 and a June score under 2026.05.7 are explicitly different methodologies. We can inspect the change. Audit-trail integrity is the constraint that justifies the version-pinning machinery.

Can I see a sample audit?

We walk through sample audits on discovery calls rather than publish static artefacts. The questions worth answering depend on the prospect’s sector, current visibility, and what they’re trying to fix. Published examples will land alongside The SetpointHQ Index v0 in Q2 2026.

The cohort’s individual employer pages will provide a public reference set. In the meantime, get in touch.

Get in touch

Audit your employer brand against this methodology.

Discovery calls run thirty minutes. We’ll walk through how the methodology applies to your sector, what your current scores would look like, and where the leverage is. No pitch deck, no slides: the methodology is the pitch.

Get in touch → · See a sample audit →

SetpointHQ Pro · operated from the United Kingdom