What AI Search Engines Actually Check Before Citing Your Business

AI-powered search engines don't randomly select which businesses to cite. They follow a systematic evaluation process — checking identity, structure, and external evidence before deciding whether to include your business in a generated response. Here's what that process looks like, and what it means for your Generative Engine Optimization strategy.

When someone asks ChatGPT to recommend a web design agency, or asks Perplexity to compare accounting firms, or receives a Google AI Overview about local restaurants, the AI system doesn't just search the web and summarise what it finds. It runs a multi-stage evaluation that determines which sources are trustworthy, which entities are real, and which information is worth citing.

Understanding this evaluation process is the foundation of effective Generative Engine Optimization (GEO). And it maps directly to the three pillars that AISC audits: Identity Clarity, Structural Support, and Evidence Consistency.

How retrieval-augmented generation works

Most AI search systems use a technique called Retrieval-Augmented Generation (RAG). Instead of relying solely on training data, they retrieve current information from the web in real time and use it to generate responses.

The process works in stages. First, the AI system decomposes the user's query into sub-queries — breaking a complex question into searchable fragments. Then it retrieves candidate sources for each sub-query. Then it evaluates those sources for relevance, authority, and consistency. Finally, it synthesises a response, citing the sources it deemed most trustworthy.

Your business can be excluded at any stage. If your website isn't crawlable, you're excluded at retrieval. If your entity identity is ambiguous, you're excluded at evaluation. If your information conflicts with external sources, you're downranked at synthesis.
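As a rough illustration of the paragraph above, the exclusion points can be modelled as a chain of checks. This is a toy sketch, not how any particular AI search engine is implemented: the `Site` class, its three fields, and the `pipeline_outcome` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical summary of one candidate source (illustrative only)."""
    crawlable: bool          # could the retriever fetch the pages?
    identity_clear: bool     # is the business entity unambiguous?
    matches_external: bool   # do external sources agree with the site?


def pipeline_outcome(site: Site) -> str:
    """Return where a candidate drops out, following the stages in the text."""
    if not site.crawlable:
        return "excluded at retrieval"
    if not site.identity_clear:
        return "excluded at evaluation"
    if not site.matches_external:
        return "downranked at synthesis"
    return "eligible for citation"


# A crawlable site with an ambiguous identity never reaches synthesis.
print(pipeline_outcome(Site(crawlable=True, identity_clear=False, matches_external=True)))
```

The order of the checks is the point: a failure early in the chain means the later signals are never considered.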

Check 1: Can the AI identify your business?

The first evaluation is entity recognition. The AI system needs to determine: is this a real business? What kind of business is it? What are its canonical properties?

This is where JSON-LD structured data is decisive. A website with comprehensive Organisation schema (the schema.org type itself is spelled Organization, and must be written that way in markup) provides the AI system with explicit, typed, machine-readable facts: the business name, its type, its URL, its logo, its social profiles, its address, its description.

A website without this structured data forces the AI to infer these facts from unstructured page copy. The AI might parse a sentence like "We're a Melbourne-based digital agency" and extract a location and a business type — but with far less confidence than if the same facts were declared in JSON-LD.

What AISC Checks — Identity Clarity (40% of your score)

Organisation schema

Does your homepage declare an Organisation entity in JSON-LD? This is the single most important structured data signal for AI identity recognition.

sameAs properties

Do you link to your official social profiles (LinkedIn, Facebook, Twitter/X)? AI systems use sameAs to cross-reference your entity across platforms and build confidence in your identity.

Logo declaration

Is your logo URL declared in structured data? This helps AI systems match your visual identity to your entity declaration.

Website schema

Is there a WebSite entity declaration? This establishes the canonical URL that AI systems use as the authoritative source for your business.

Missing any of these signals degrades the AI system's confidence in your entity identity. And confidence determines citation. AI systems cite businesses they can identify with certainty; they skip businesses where the identity is ambiguous.
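The four identity signals listed above could be declared and checked along the following lines. This is a minimal sketch under stated assumptions: the business name and every URL are placeholders, the `identity_signals` helper is hypothetical, and it assumes each entity has a single string `@type` (real JSON-LD also allows a list). It is not AISC's implementation.

```python
import json

# Placeholder values: every name and URL below is hypothetical.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",  # schema.org spells the type with a "z"
    "@id": "https://example.com/#organization",
    "name": "Example Web Design",
    "url": "https://example.com/",
    "logo": "https://example.com/logo.png",
    "sameAs": [
        "https://www.linkedin.com/company/example-web-design",
        "https://www.facebook.com/examplewebdesign",
    ],
}
website = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    "@id": "https://example.com/#website",
    "url": "https://example.com/",
    "name": "Example Web Design",
    "publisher": {"@id": "https://example.com/#organization"},
}


def identity_signals(entities: list[dict]) -> dict[str, bool]:
    """Report which of the four identity signals are present."""
    by_type = {entity.get("@type"): entity for entity in entities}
    org = by_type.get("Organization", {})
    return {
        "organization": "Organization" in by_type,
        "same_as": bool(org.get("sameAs")),
        "logo": bool(org.get("logo")),
        "website": "WebSite" in by_type,
    }


# The JSON a page would embed in <script type="application/ld+json">.
print(json.dumps([organization, website], indent=2))
print(identity_signals([organization, website]))
```

Linking the WebSite entity back to the Organization through a shared `@id` is one common way to tell parsers that both declarations describe the same business.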

Check 2: Can the AI access your content?

Even if your structured data is perfect, AI systems need to be able to reach it. This is the structural support evaluation.

Several common technical configurations can block or hinder AI crawlers without the website owner realising it: a robots.txt file that blocks AI user agents, a missing sitemap that prevents efficient crawling, invalid HTML that confuses parsers, or bot protection applied at the CDN level. Cloudflare, for example, changed its defaults in 2025 so that known AI crawlers are blocked on newly added domains unless the owner opts in.

Many websites are accidentally invisible to AI crawlers because of default security settings they never reviewed.
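One way to review the robots.txt part of this yourself is Python's standard `urllib.robotparser`. The sketch below is an assumption-laden example: the list of AI user-agent tokens is illustrative and incomplete, the sample robots.txt is made up, and `blocked_agents` is a hypothetical helper. It also says nothing about blocking that happens at the firewall or CDN level, which robots.txt cannot reveal.

```python
from urllib.robotparser import RobotFileParser

# robots.txt tokens published by the crawler operators; illustrative, not complete.
AI_USER_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

# Hypothetical robots.txt that blocks one AI crawler and allows everything else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt: str, url: str) -> list[str]:
    """Return the AI user agents that robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_USER_AGENTS if not parser.can_fetch(ua, url)]


print(blocked_agents(SAMPLE_ROBOTS_TXT, "https://example.com/"))  # ['GPTBot']
```

To check a live site, the same parser can load a real file with `set_url(...)` followed by `read()` instead of `parse(...)`.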

The presence of JSON-LD itself is also a structural signal. If your website has no JSON-LD at all — not even a basic WebPage declaration — AI systems have to fall back entirely on unstructured content parsing. This is slower, less reliable, and produces lower-confidence entity data.
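Whether a page carries any JSON-LD at all can be tested with the standard library alone. The following is a sketch, not a full validator: `JsonLdExtractor` and `has_jsonld` are hypothetical names, the sample HTML is invented, and malformed JSON is simply treated as absent.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[object] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is treated as absent


def has_jsonld(html: str) -> bool:
    """True if the page contains at least one parseable JSON-LD block."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return bool(parser.blocks)


SAMPLE_HTML = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "WebPage"}</script>
</head><body>Hello</body></html>"""

print(has_jsonld(SAMPLE_HTML))                            # True
print(has_jsonld("<html><body>No markup</body></html>"))  # False
```

Note that this only sees JSON-LD present in the HTML as delivered. Markup injected later by client-side JavaScript would not be detected by a parser like this one.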

AISC checks four structural signals: JSON-LD presence, robots.txt accessibility, sitemap availability, and HTML validity. Each one represents a potential barrier between your business and AI systems trying to evaluate it.

Check 3: Do external sources corroborate your identity?

AI systems don't trust any single source unconditionally. They cross-reference. When your website declares that you're a web design agency in Melbourne, the AI system checks whether Google's search results agree, whether your Google Business Profile confirms your address and phone number, and whether your business appears in local directory listings with consistent information.

This is the evidence consistency evaluation — and it's where many businesses fail without knowing it.

Common failure modes include: your business name appears differently in your structured data than in your Google Business Profile; your phone number on your website doesn't match directory listings; your address is outdated in one source but current in another; or your brand doesn't appear in Google search results for its own name in your target market.

AISC checks evidence consistency by querying three Google surfaces — organic search, Local Finder, and Maps — with your brand name in your specific geographic market. The audit looks for brand presence in search results, knowledge panel availability, rich result rendering, and NAP (name, address, phone) consistency across sources.
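The NAP part of that comparison can be approximated with simple normalisation before comparing fields. This is a simplified sketch: the two records are invented, the function names are hypothetical, and the normalisation rules (lowercasing, stripping punctuation, keeping only digits in phone numbers) are assumptions rather than the rules any audit tool actually applies.

```python
import re


def normalize_text(value: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", value.lower()).split())


def normalize_phone(phone: str) -> str:
    """Keep digits only, so formatting differences do not count as mismatches."""
    return re.sub(r"\D", "", phone)


def nap_mismatches(website: dict, listing: dict) -> list[str]:
    """Return the NAP fields whose normalised values differ between two sources."""
    fields = {"name": normalize_text, "address": normalize_text, "phone": normalize_phone}
    return [f for f, norm in fields.items() if norm(website[f]) != norm(listing[f])]


# Hypothetical records for the same business from two sources.
website = {
    "name": "Example Web Design",
    "address": "1 Sample St, Melbourne VIC 3000",
    "phone": "+61 3 5550 0100",
}
listing = {
    "name": "Example Web Design Pty. Ltd.",
    "address": "1 Sample St., Melbourne VIC 3000",
    "phone": "+61 (3) 5550-0100",
}

print(nap_mismatches(website, listing))  # ['name']
```

Here the punctuation differences in the address and phone number are ignored, but the extra "Pty. Ltd." in the listing is flagged as a name mismatch. A phone number written with a country code in one source and in local format in the other would also be flagged by this simple digit comparison.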

Why geography matters

A critical detail that most GEO discussions overlook: AI search evaluations are location-dependent. Google shows different results in different cities. A business that appears in the knowledge panel when searched from Melbourne might be absent when searched from Sydney.

This is why every AISC audit is anchored to a specific geographic market. Your AI visibility in New York is a different question from your AI visibility in London. Auditing from a global default gives you a score that doesn't represent what your actual customers see.

Geographic specificity is fundamental to meaningful GEO measurement. A GEO strategy that doesn't account for location is optimising for an audience that may not exist in the market you actually serve.

Why deterministic scoring matters

One of the risks in the emerging GEO tooling landscape is the temptation to use AI to evaluate AI visibility. It seems intuitive — use a language model to assess how other language models perceive your business.

The problem is reproducibility. LLM responses are typically non-deterministic: most systems sample their output, so the same question asked twice may get different answers. A GEO score derived from AI evaluation inherits that instability. You can't reliably measure progress if your baseline measurement changes every time you check it.

AISC takes the opposite approach. Every signal is binary — present or absent. Every reason code maps to an observable fact. Every score is computed from a deterministic formula. Run the same audit twice and you get the same result. This makes it possible to measure real progress over time, not noise.
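A deterministic score of this kind can be sketched as a weighted sum over binary signals. The formula below is an assumption made for illustration, not AISC's actual formula: only the 40% weight for Identity Clarity comes from the text above, the 30/30 split for the other two pillars is invented, and the signal names are shorthand for the checks described earlier.

```python
# Pillar weights in percent. The 40 comes from the text; the 30/30 split is assumed.
WEIGHTS = {"identity": 40, "structure": 30, "evidence": 30}


def visibility_score(signals: dict[str, dict[str, bool]]) -> float:
    """Weighted share of binary signals that are present, on a 0-100 scale."""
    total = 0.0
    for pillar, weight in WEIGHTS.items():
        checks = signals[pillar]
        total += weight * sum(checks.values()) / len(checks)
    return round(total, 1)


# Hypothetical audit result: each signal is simply present (True) or absent (False).
audit = {
    "identity": {"organization": True, "same_as": True, "logo": False, "website": True},
    "structure": {"jsonld": True, "robots": True, "sitemap": True, "html_valid": True},
    "evidence": {"brand": True, "knowledge_panel": False, "rich_result": False, "nap": True},
}

print(visibility_score(audit))                              # 75.0
print(visibility_score(audit) == visibility_score(audit))   # True
```

Because the function has no randomness and no model call, the same inputs always produce the same score, which is the property the section argues for.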

If your measurement tool produces different results every time you use it, you can't measure progress. Deterministic scoring is not a philosophical preference — it's a practical requirement.

From evaluation to action

Understanding what AI systems check gives you a clear roadmap for your GEO infrastructure investment. The three checks map directly to the three pillars of the AI Visibility Score.

Identity Clarity answers "Can AI identify you?" — fix your JSON-LD, declare your Organisation entity, add sameAs properties, declare your logo.

Structural Support answers "Can AI reach you?" — ensure your robots.txt doesn't block AI crawlers, publish a sitemap, have valid HTML, and make sure JSON-LD is present in every page response.

Evidence Consistency answers "Do external sources agree?" — align your Google Business Profile, update directory listings, and ensure your NAP data is consistent across the web.

These aren't content tasks. They're infrastructure tasks. And they're the prerequisite for every content-level GEO strategy that follows.