What sources do AI Search Engines cite? Analysis of 40k responses and 250k sources

An in-depth analysis of 40,000 AI search engine responses and 250,000 sources to understand what content gets cited and why.

Beeri Amiel

Introduction

AI search engines like ChatGPT, Perplexity, and Google's Gemini are radically changing how buyers look for information—often within a single conversational thread. Because we track performance across these engines, we ran a study analyzing 250,000 citations from 40,000 AI responses to identify which sources AI tools prefer. Below are our key findings.

Our Approach

Data Collection

We analyzed 40,000 answers from ChatGPT, Perplexity, and Gemini, capturing 250,000 unique outbound links. We generated the questions ourselves. Throughout the study, we used the models available through each engine's user interface, which occasionally changed during the measurement period as the platforms evolved their offerings.

Citation Types

  • Owned: A company's own domain.
  • Competitor: Direct competitor's domain.
  • Earned: Third‐party or affiliate sites, often requiring some external listing or coverage.
  • UGC: User‐generated content (e.g., Reddit, G2, Trustpilot).
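
A minimal sketch of how a cited URL could be assigned to one of these four types. The domain lists, the precedence order (Owned, then Competitor, then UGC, with Earned as the fallback), and the function names are all assumptions made for illustration; the study does not publish its classification rules.

```python
from urllib.parse import urlparse

# Hypothetical UGC list built from the examples named above; not the study's list.
UGC_DOMAINS = {"reddit.com", "g2.com", "trustpilot.com"}


def normalized_host(url: str) -> str:
    """Return the lowercased host without a leading 'www.' (a simplification)."""
    host = urlparse(url).hostname or ""  # empty string if the URL has no host
    return host[4:] if host.startswith("www.") else host


def matches(host: str, domains: set[str]) -> bool:
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_citation(url: str, owned: set[str], competitors: set[str],
                      ugc: set[str] = UGC_DOMAINS) -> str:
    """Assign a citation type; the precedence order here is an assumption."""
    host = normalized_host(url)
    if matches(host, owned):
        return "Owned"
    if matches(host, competitors):
        return "Competitor"
    if matches(host, ugc):
        return "UGC"
    return "Earned"  # everything else is treated as third-party coverage


if __name__ == "__main__":
    # acme.com and rival.io are made-up domains.
    print(classify_citation("https://www.reddit.com/r/saas", {"acme.com"}, {"rival.io"}))
```

In this sketch anything that is not explicitly listed falls through to Earned, which matches the broad "third-party or affiliate" definition above.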

Domain Authority

We grouped each domain's Domain Authority (from Moz) into ranges: 0–19, 20–39, 40–59, 60–79, or 80–100.
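
The grouping can be sketched as a small bucketing function. The function name and the input validation are assumptions; only the five ranges come from the text above.

```python
def da_bucket(score: int) -> str:
    """Map a Domain Authority score to one of the five ranges used in this analysis."""
    if not 0 <= score <= 100:
        raise ValueError(f"Domain Authority must be between 0 and 100, got {score}")
    if score >= 80:
        return "80–100"  # the top bucket is wider so that 100 is included
    lower = (score // 20) * 20  # 0, 20, 40, or 60
    return f"{lower}–{lower + 19}"


if __name__ == "__main__":
    for score in (5, 20, 59, 79, 80, 100):
        print(score, "->", da_bucket(score))
```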

Buyer Journey Stages

Each question was categorized into: problem_exploration, solution_education, solution_comparison, final_research, or solution_evaluation.

Average Citations by AI Engine

Average Citations Per Response by Engine
  • Perplexity: ~6.61 citations per answer
  • Google Gemini: ~6.1 citations per answer
  • ChatGPT: ~2.62 citations per answer
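
For readers who want to reproduce this kind of metric on their own data, here is a minimal sketch of a per-engine average. The record shape (an engine name paired with the list of cited URLs) and the sample data are hypothetical, not taken from the study.

```python
from collections import defaultdict


def average_citations(responses):
    """Return the mean number of citations per response for each engine.

    `responses` is an iterable of (engine, cited_urls) pairs; this shape is assumed.
    """
    totals = defaultdict(lambda: [0, 0])  # engine -> [citation count, response count]
    for engine, cited_urls in responses:
        totals[engine][0] += len(cited_urls)
        totals[engine][1] += 1
    return {engine: cites / n for engine, (cites, n) in totals.items()}


if __name__ == "__main__":
    # Toy data for illustration only.
    sample = [
        ("perplexity", ["a.com", "b.com", "c.com"]),
        ("perplexity", ["d.com"]),
        ("chatgpt", []),  # a response with no citations still counts
        ("chatgpt", ["e.com", "f.com"]),
    ]
    print(average_citations(sample))
```

Responses without any citations are kept in the denominator, which matters for an engine that decides on its own when to cite.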

According to recent traffic data, ChatGPT leads the AI search market with 3.8 billion monthly visits, while Google Gemini receives approximately 267.70 million visits, and Perplexity attracts 99.50 million monthly visitors.

Takeaway

Perplexity and Gemini appear to include citations by default, pulling in a broader range of sources. We let ChatGPT decide when to use citations (rather than purposefully turning on Search or DeepResearch mode). In this setup, ChatGPT cited fewer sources, although this is rapidly evolving as search and deep research capabilities are integrated more deeply. We are also seeing Gemini shift its inclusion of citations: instead of including citations in every response, the model now picks when to include them.

"Scaling Gemini on the consumer side will be our biggest focus next year."

Domain Authority Still Matters

Percentage of Total Citations by Domain Authority

From the 250k citations we analyzed:

  • 80–100 Domain Authority: ~31.53%
  • 40–59 Domain Authority: ~26.32%
  • 20–39 Domain Authority: ~22.07%
  • 60–79 Domain Authority: ~15.33%
  • 0–19 Domain Authority: ~4.76%

Takeaway

While very low‐authority sites (0–19) rarely appear, domains at 20+ DA show up more consistently. There is a definite preference for stronger domains, but as soon as a site crosses a moderate DA threshold, it tends to be fairly well represented—with top‐tier authority leading the pack. This includes many affiliate sites and blogs, which hold notable influence in AI search results.
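
To make the "20+ DA" point concrete, the shares listed above can be summed cumulatively. This is a small arithmetic sketch over the published percentages; the helper name is hypothetical.

```python
# Share of total citations by DA range, keyed by each range's lower bound
# (values copied from the list above).
SHARE_BY_LOWER_BOUND = {80: 31.53, 60: 15.33, 40: 26.32, 20: 22.07, 0: 4.76}


def share_at_or_above(min_da: int) -> float:
    """Sum the shares of every range whose lower bound is at least min_da."""
    total = sum(share for lower, share in SHARE_BY_LOWER_BOUND.items() if lower >= min_da)
    return round(total, 2)


if __name__ == "__main__":
    print("DA 20+:", share_at_or_above(20))  # 95.25
    print("DA 60+:", share_at_or_above(60))  # 46.86
```

Roughly 95% of citations come from domains with a DA of 20 or higher, while a little under half come from domains at 60 or higher. The published figures sum to 100.01 because of rounding.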

Citation Types: Earned Dominates, UGC on the Rise

Distribution of Citation Type by Engine

Earned (third‐party, editorial, or affiliate) remains the most frequent citation type across ChatGPT, Gemini, and Perplexity.

Takeaway

The data shows a strong presence of affiliate sites and blogs alongside UGC forums. As AI engines refine their results, these sources carry substantial weight in how final answers get assembled, something marketers need to be aware of as AI search engines gain momentum.

Citations Across the Buyer Journey

Citation Type by Buyer Journey Phase
  • Early Stages (problem_exploration, solution_education): Heavy reliance on Earned (press, third‐party sites).
  • Mid Stages (solution_comparison): Noticeable increase in UGC, indicating buyers want peer reviews or firsthand experiences.
  • Later Stages (final_research, solution_evaluation): Greater mention of Owned and Competitor domains for direct product details.
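
The early/mid/late grouping above can be sketched as a mapping from journey stage to funnel phase, followed by a per-phase share of each citation type. The phase labels, function name, and sample rows are assumptions for illustration; only the stage names and their grouping come from this article.

```python
from collections import Counter, defaultdict

# Stage-to-phase grouping as described in the list above.
PHASE_BY_STAGE = {
    "problem_exploration": "early",
    "solution_education": "early",
    "solution_comparison": "mid",
    "final_research": "late",
    "solution_evaluation": "late",
}


def type_share_by_phase(citations):
    """Return {phase: {citation_type: fraction of that phase's citations}}.

    `citations` is an iterable of (stage, citation_type) pairs; this shape is assumed.
    """
    counts = defaultdict(Counter)
    for stage, citation_type in citations:
        counts[PHASE_BY_STAGE[stage]][citation_type] += 1
    return {
        phase: {ctype: n / sum(counter.values()) for ctype, n in counter.items()}
        for phase, counter in counts.items()
    }


if __name__ == "__main__":
    # Toy rows for illustration only.
    rows = [
        ("problem_exploration", "Earned"),
        ("solution_education", "Earned"),
        ("solution_comparison", "UGC"),
        ("solution_comparison", "Earned"),
        ("final_research", "Owned"),
    ]
    print(type_share_by_phase(rows))
```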

Takeaway

The data suggests that, from top to bottom of the funnel, AI engines shift from broad, third‐party perspectives (Earned) toward more user‐centric or brand‐specific sources (UGC, Owned). It's an evolving mix that shows how conversation flows from problem identification to final decision.

Company Size and Source Preference

Distribution of Source by Company Size (employee count)

Analysis spanned organizations ranging from <100 employees to 50,000+. In all segments, Earned remains the most frequently cited type. However, as we analyzed larger companies, there was a gradual rise in AI search engines citing sources from the company's own domains.

Takeaway

The larger the company, the more likely AI engines are to cite that brand's own site or close competitors. This pattern suggests that major enterprises have more recognized domains, leading engines to surface those larger corporate sites more frequently in responses.

Where UGC Citations Come From

In general, user‐generated content appears predominantly on platforms like Reddit, YouTube, G2, GitHub, and Medium. However, different AI search engines favor different UGC hotspots, reflecting each platform's indexing approach.

Takeaway

Even within the broad category of "UGC," each engine relies on a unique cluster of review platforms, forums, and social sites. Perplexity skews heavily toward YouTube and PeerSpot, Gemini highlights Medium, Reddit, and YouTube, and ChatGPT frequently mentions LinkedIn, G2, and Gartner Peer Reviews. This diversity suggests that AI crawlers differ in their indexing strategies and may integrate new platforms over time.

Conclusion

In examining 250,000 citations across 40,000 AI responses, we've seen how domain authority, earned media, and user‐generated content all shape what AI search engines choose to reference. While Perplexity and Gemini rely extensively on real‐time crawl data, ChatGPT's approach—though evolving—still showcases a unique blend of sources, including UGC and affiliated domains.

As these AI experiences continue to mature, we anticipate ongoing shifts in how web content is discovered and cited. By understanding the data behind the results, marketers and SEO teams can better influence how their brand appears (or doesn't) in customer buying journeys within AI search engines.

Want to know how to improve your brand's visibility on AI Search? We'd be happy to help.