White paper

AI Content & Search

Jose Luis Paredes, PhD

Data Specialist

Gregory Druck, PhD

Chief AI Officer

Ethan Smith

CEO

I. Motivation

We are frequently asked about using AI to automate organic content creation. Companies are excited about the potential to dramatically reduce content creation costs and scale content more efficiently. There is also a perception that everyone is already using AI to create content, and Google is now flooded with AI-generated content.

On the other hand, there are widely circulated stories of sites that initially succeeded with AI-generated content and soon afterward lost most of their traffic. Google also continues to emphasize expertise and personal experience as key indicators of content quality.

In this white paper, we first investigate how much AI content currently appears in Google search results. (Note that this paper focuses on organic search results containing AI-generated content, not on the AI Overviews generated by Google.) Then, we analyze the performance of AI-generated content relative to human-written content. We find that purely AI-generated content makes up about 3% of organic search results today and generally ranks lower than human-written content.

Our results suggest exercising caution about using a purely AI-generated content strategy.

II. Experiment Setup

1. Keyword Selection

We select an initial list of 2200 keywords evenly distributed in the following categories:

  • Tech
  • Productivity
  • News
  • Food
  • Finance
  • Entertainment
  • Education
  • Crypto
  • Commerce
  • Local

More specifically, we select the top non-branded keywords for the top five domains in each category. We then de-duplicate the keywords, filter out nonsense or irrelevant keywords, and use that as our base list per category. For each category, we randomly select 220 keywords.
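The per-category sampling step can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name `build_keyword_sample`, the normalization rule (lowercasing and trimming), and the fixed seed are assumptions, and the manual filtering of nonsense or irrelevant keywords is not modeled.

```python
import random


def build_keyword_sample(keywords_by_category, per_category=220, seed=0):
    """De-duplicate each category's keywords, then draw a fixed-size random sample.

    `keywords_by_category` maps a category name to a list of candidate keywords.
    The normalization used for de-duplication is an assumption of this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = {}
    for category, keywords in keywords_by_category.items():
        # Normalize, drop empty strings, and de-duplicate while keeping first-seen order.
        normalized = (k.strip().lower() for k in keywords)
        unique = list(dict.fromkeys(k for k in normalized if k))
        if len(unique) < per_category:
            raise ValueError(f"{category}: only {len(unique)} unique keywords available")
        sample[category] = rng.sample(unique, per_category)
    return sample


# Toy input (made-up keywords); the study draws 220 per category, the toy run draws 2.
toy = {
    "Tech": ["best laptop", "Best Laptop ", "vpn review", "usb c hub"],
    "Food": ["banana bread", "air fryer wings", "banana bread", "pho recipe"],
}
picked = build_keyword_sample(toy, per_category=2)
```

With ten categories and 220 keywords each, the same call would return the 2200 keywords described above.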

2. URL Selection and Filtering

We start with the top 20 organic results for each keyword. AI detectors work best on long-form, editorial content, so we limit our analysis to articles and listicles, excluding product, category, landing, video, and other pages. To reduce costs, we also remove PDFs and very long pages.
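A filter of this kind might look like the sketch below. The `SearchResult` record, the page-type labels, and the `max_words` cutoff are all hypothetical: the paper does not state how page types were assigned or what length counts as "very long".

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Page types kept for analysis; the label strings are assumptions of this sketch.
KEPT_PAGE_TYPES = {"article", "listicle"}


@dataclass
class SearchResult:
    keyword: str
    position: int   # 1-based organic rank
    url: str
    page_type: str  # e.g. "article", "listicle", "product", "video"
    word_count: int


def keep_for_detection(result, max_position=20, max_words=5000):
    """Return True if the result should be sent to the AI detector.

    max_words=5000 is a placeholder; the actual length cutoff is not given.
    """
    if result.position > max_position:
        return False
    if result.page_type not in KEPT_PAGE_TYPES:
        return False
    if urlparse(result.url).path.lower().endswith(".pdf"):
        return False
    return result.word_count <= max_words


results = [
    SearchResult("pho recipe", 2, "https://example.com/pho", "article", 1800),
    SearchResult("pho recipe", 4, "https://example.com/menu.pdf", "article", 900),
    SearchResult("pho recipe", 6, "https://example.com/shop/pho-kit", "product", 300),
    SearchResult("pho recipe", 25, "https://example.com/late", "listicle", 1200),
]
kept = [r for r in results if keep_for_detection(r)]
```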

The AI detector could not download the content of some pages, for example, in cases where the website blocks access.

After filtering and processing with the AI detector, we have 20280 URLs.

3. AI Detection

We use Originality.ai as the AI detection tool. It returns a score between 0 and 1 indicating the likelihood that the page text was produced by a generative AI tool (such as ChatGPT, GPT-4, Gemini Advanced, or Llama 3, among others) rather than by a human writer. Numerous studies have shown that Originality.ai is one of the most accurate AI-content detectors available, with an accuracy of over 90% on multiple data sets [1]. Furthermore, several top digital content creators, news media, publishers, and writing agencies rely on Originality.ai as their primary AI-content detection tool.

More specifically, Originality.ai returns a global score for the entire page as well as per-paragraph scores. Let ai_score(k) and wc(k) be the AI score and the word count, respectively, of the kth paragraph of the page content. Originality.ai computes the global ai_score as the average of the paragraph-level scores ai_score(k) for k = 1, 2, ..., P, where P is the number of paragraphs on the page.
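As a small illustration of the averaging described above, the page-level score is an unweighted mean, so a short paragraph counts as much as a long one. The function name `global_ai_score` is ours, and this is not the Originality.ai API.

```python
from statistics import fmean


def global_ai_score(paragraph_scores):
    """Unweighted mean of the per-paragraph AI scores for one page."""
    if not paragraph_scores:
        raise ValueError("page has no paragraphs")
    return fmean(paragraph_scores)


# Three paragraphs with made-up scores.
score = global_ai_score([0.9, 0.1, 0.5])
```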

Since our analysis relies heavily on the accuracy of the AI-content detector, we favor a conservative approach: we keep only the URLs for which the detector outputs a high-confidence score for most paragraphs, and we exclude URLs with uncertain or ambiguous results. First, we classify each paragraph into one of the following three groups:

  1. AI-generated paragraph if ai_score(k) ≥ 0.85
  2. Human-created paragraph if ai_score(k) < 0.15
  3. Uncertain paragraph otherwise
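The three-way rule above translates directly into code. The sketch below uses the thresholds stated in the list; the function name and label strings are illustrative.

```python
AI_THRESHOLD = 0.85     # ai_score(k) >= 0.85 -> AI-generated
HUMAN_THRESHOLD = 0.15  # ai_score(k) <  0.15 -> human-created


def classify_paragraph(ai_score):
    """Map a per-paragraph AI score in [0, 1] to 'ai', 'human', or 'uncertain'."""
    if not 0.0 <= ai_score <= 1.0:
        raise ValueError(f"score out of range: {ai_score}")
    if ai_score >= AI_THRESHOLD:
        return "ai"
    if ai_score < HUMAN_THRESHOLD:
        return "human"
    return "uncertain"


labels = [classify_paragraph(s) for s in (0.97, 0.85, 0.50, 0.15, 0.02)]
```

Note that the boundaries are asymmetric: a score of exactly 0.85 is labeled AI-generated, while a score of exactly 0.15 is labeled uncertain.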

Selecting the threshold values (0.85 and 0.15) is a trade-off between increasing the detector's precision and avoiding filtering out many URLs. We then compute the percentage of content for each content type as follows.

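Percentage of AI-generated content:

\[
\text{pct}_{\text{AI}} = 100 \times \frac{\sum_{k \in \Omega} wc(k)}{\sum_{k=1}^{P} wc(k)}
\]

where Ω denotes the subset of AI-generated paragraph indices, wc(k) is the number of words in the kth paragraph, and P is the number of paragraphs on the page.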

The percentages of human-created and uncertain content are computed analogously. Notice that the longer the paragraph, the more it contributes to the page-level prediction.
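A minimal sketch of the word-count-weighted percentages, assuming each paragraph is available as an (ai_score, word_count) pair. The function name and the input layout are assumptions of this example; the thresholds are the ones given above.

```python
def content_percentages(paragraphs, ai_threshold=0.85, human_threshold=0.15):
    """Return the word-weighted share (in percent) of each paragraph class.

    `paragraphs` is a list of (ai_score, word_count) pairs for one page.
    """
    words = {"ai": 0, "human": 0, "uncertain": 0}
    for ai_score, wc in paragraphs:
        if ai_score >= ai_threshold:
            label = "ai"
        elif ai_score < human_threshold:
            label = "human"
        else:
            label = "uncertain"
        words[label] += wc  # longer paragraphs contribute more
    total = sum(words.values())
    if total == 0:
        raise ValueError("page has no words")
    return {label: 100.0 * count / total for label, count in words.items()}


# Made-up page: 120 AI words, 60 human words, 20 uncertain words (200 total).
page = [(0.95, 120), (0.05, 60), (0.50, 20)]
shares = content_percentages(page)
# shares == {"ai": 60.0, "human": 30.0, "uncertain": 10.0}
```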

Finally, we remove pages containing a substantial proportion of ambiguous content. Doing so ensures that our final URL set contains only URLs where the AI-content detector yields high confidence across multiple paragraphs. We found that capping the percentage of uncertain content at 30% yields a more reliable data set for further study. This last filtering stage removes about 40% of the URLs processed by the AI detector, leaving 11994 URLs.
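The reliability filter and the reported counts can be checked with a few lines. Whether a page at exactly 30% uncertain content is kept is not stated, so the inclusive comparison below is an assumption, as are the function name and the example URLs.

```python
MAX_UNCERTAIN_PCT = 30.0


def is_reliable(uncertain_pct, max_uncertain_pct=MAX_UNCERTAIN_PCT):
    """Keep a page only if its share of uncertain content is within the cap.

    Treating exactly 30% as acceptable is an assumption of this sketch.
    """
    return uncertain_pct <= max_uncertain_pct


# Made-up pages mapped to their percentage of uncertain content.
pages = {
    "https://example.com/a": 12.5,
    "https://example.com/b": 30.0,
    "https://example.com/c": 47.0,
}
kept = [url for url, pct in pages.items() if is_reliable(pct)]

# Sanity check on the counts reported in the text.
processed, remaining = 20280, 11994
removed_share = (processed - remaining) / processed  # roughly 0.41, i.e. about 40%
```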

4. AI Content Taxonomy

We divide the URLs into categories based on the proportion of AI-generated content. To be more specific, we categorize URLs into four types:

  • Human-created: < 10% AI content
  • AI-generated: ≥ 90% AI content
  • Mixed: low AI content: 10-50% AI content
  • Mixed: high AI content: 50-90% AI content

Note that AI-generated content that a human subsequently edits might lie in either Mixed category.
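The taxonomy can be expressed as a simple lookup on the page's percentage of AI-generated content. The listed ranges share the 50% endpoint, so assigning exactly 50% to the high-AI bucket is an assumption of this sketch, as is the function name.

```python
def content_category(ai_pct):
    """Map a page's percentage of AI-generated content to one of four types.

    Assumption: a page at exactly 50% falls into 'Mixed: high AI content'.
    """
    if not 0.0 <= ai_pct <= 100.0:
        raise ValueError(f"percentage out of range: {ai_pct}")
    if ai_pct < 10.0:
        return "Human-created"
    if ai_pct >= 90.0:
        return "AI-generated"
    if ai_pct < 50.0:
        return "Mixed: low AI content"
    return "Mixed: high AI content"


categories = [content_category(p) for p in (0.0, 9.9, 10.0, 49.9, 50.0, 89.9, 90.0, 100.0)]
```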

III. Results

How Much AI Content Appears in Search?

Contrary to the claim that AI content is flooding the web, only a small percentage of the results on the first two SERPs are AI-generated.

FIGURE 1

Pure AI-generated content makes up about 3% of pages, while pages with more than 50% AI content make up 12%. URLs with minimal to no AI content dominate the search results for the first 20 positions, making up 88% of the total URLs used in our study.

FIGURE 2

Some categories, like Food, contain minimal purely AI-generated content, with less than 1% of the URLs. Across most categories, around 3% of the URLs are AI-generated, although Commerce stands out with 8% AI-generated. Moreover, in categories like Crypto, Commerce, Finance, and Local, roughly 20% of the URLs contain more than 50% AI-generated content.

Evaluating Rank Performance for AI-generated content

How well does AI-generated content rank? Is there a correlation between the quantity of AI-generated content and rank?

It’s worth noting that ranking depends on many variables, and isolating the effect of a particular variable on the rank is challenging.

With that caveat, we compare the ranks of each content type. More specifically, for each keyword, we select the top-ranking page for each AI content category. For example, for a particular keyword, the top-ranking article page confidently classified as human-created may appear at position 3, while the top-ranking article page confidently classified as AI-generated may appear at position 5. Then, we compute the average and the median of those best positions over all keywords.
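The best-position comparison can be sketched as below, assuming the data is available as (keyword, position, category) rows. The function name and row layout are illustrative. Keywords with no page in a given category simply do not contribute to that category's statistics, which is an assumption about how such gaps are handled.

```python
from collections import defaultdict
from statistics import mean, median


def best_position_stats(rows):
    """Average and median of the best (lowest) position per keyword, by category.

    `rows` is an iterable of (keyword, position, category) tuples.
    """
    best = {}  # (keyword, category) -> best position seen so far
    for keyword, position, category in rows:
        key = (keyword, category)
        best[key] = min(position, best.get(key, position))

    by_category = defaultdict(list)
    for (_, category), position in best.items():
        by_category[category].append(position)

    return {
        category: {"mean": mean(positions), "median": median(positions)}
        for category, positions in by_category.items()
    }


# Made-up rankings for three keywords.
rows = [
    ("kw1", 3, "Human-created"),
    ("kw1", 7, "Human-created"),
    ("kw1", 5, "AI-generated"),
    ("kw2", 1, "Human-created"),
    ("kw2", 12, "AI-generated"),
    ("kw3", 4, "Human-created"),
]
stats = best_position_stats(rows)
```

In this toy data the best human-created positions are 3, 1, and 4 (median 3), and the best AI-generated positions are 5 and 12 (mean 8.5).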

Our results suggest that pages with more AI-generated content rank lower.

FIGURE 3

The average and median best positions increase (i.e., pages rank lower) as AI content increases. The best position for human-created content is in the first five positions 50% of the time, while the best position for AI-generated content is on the second SERP 50% of the time. The results suggest that human-written content outranks AI-generated content.

IV. Conclusion

Our findings suggest that AI-generated content comprises a small percentage of search results and ranks lower than human-created content, and that pages rank lower on average as the amount of AI-generated content increases. The AI landscape is evolving rapidly, but at the moment, our results caution against a purely AI-generated content strategy. While AI can reduce costs, when comparing the ROI of different content creation strategies, it is important to note that human-written content performs better in search.

Things are changing quickly: AI is improving and Google's algorithms are adapting, so we will continue to study the performance of AI content over time.
