Key Takeaways
- The quantity of AI-generated articles has surpassed human-written articles on the internet.
- However, the proportion of articles fully generated with AI has plateaued since May 2024.
- Despite the prevelance of AI generated content on the internet, we show in a separate study that this content largely does not appear in Google or ChatGPT. This study did not evaluate whether AI generated content is viewed in proportion by real users, and we suspect that it does not.
- Our study did not evaluate the prevalence of mixed human/AI content, and we believe human-edited AI content may be more prevalent.
Motivation
Since ChatGPT launched in November 2022, many companies have explored publishing content generated by LLMs such as ChatGPT, Claude, and Gemini, in order to grow their traffic across channels such as Google Search, social, and advertising. This is a cost-effective alternative to spending hundreds of dollars for humans to write content.
The quality of AI content is rapidly improving. In many cases, AI-generated content is as good or better than content written by humans (MIT Study). It is often hard to distinguish whether content is created by AI vs. a human (Originality AI Study).
We seek to evaluate the prevalence of article content generated by AI.
Results
We find that starting in November 2024, the quantity of AI-generated articles on the web surpassed human-written articles.
We observe significant growth in AI content starting with the launch of ChatGPT in November 2022. In only 12 months, AI content reached nearly half (39% ) of all content published on the internet.
AI-Generated Article Content Has Surpassed Human-Written Content on the Web
While AI-generated content grew dramatically after ChatGPT launched, we do not see that trend continuing. Instead, we see the amount of AI content vs. human content stay flat over the last 12 months.

AI-Generated Content Is Not Performing In Search
In a separate study, we show that while there is more AI-generated content now published on the internet, this content does not appear in Google Search. In fact, only 12% of content on Google is AI-generated. So, we do not see that AI-generated content has overtaken human content in terms of what real people see.
Methodology
Selection of URLs
Our goal was to find a representative sample of 65k English-language articles on the internet. To do so, we randomly selected URLs from CommonCrawl, and confirmed that each matched the criteria: was in English-language, had an article schema markup, contained greater than 100 words, and was published between January 2020 and May 2025. Lastly, we ran the webpages through the Graphite pagetype classifier to validate that each was an article or listicle.
CommonCrawl
Common Crawl maintains one of the largest publicly available web archives. It provides billions of URLs and is used by research, developers, and is a key data source for training early versions of large language models.
AI Detection Algorithm
Accurate detection of AI content is required to make claims about the prevalence of AI content on the internet. There is a considerable disagreement about the accuracy of AI detection algorithms, and many argue that detecting AI is impossible, or at best, highly inaccurate. Many companies offer AI detection algorithms, including GPTZero, Grammarly, and SurferSEO, with a varying degree of accuracy.
We selected SurferSEO's AI detection algorithm based on its high accuracy in distinguishing fully human-written from AI-generated content. We evaluated the accuracy of SurferSEO’s AI detection algorithm.
Evaluation of False Positive Rate
To evaluate the false positive rate (human-written content classified as AI-generated), we ran SurferSEO’s AI detection algorithm on 15894 articles published between January 2020 and November 2022. Given that the articles were published prior to the launch of ChatGPT in November 30, 2022, there is a very high probability that they were human-written. SurferSEO’s AI detection tool classified 5% of the content as AI-written, suggesting a 5% false positive rate.
Evaluation of False Negative Rate
In order to evaluate the false negative rate (content generated by AI is falsely scored as created by humans), we used OpenAI’s GPT-4o model to generate 6,009 articles on a wide range of topics from projects at Graphite, including commerce, finance, consumer, and b2b enterprise.
We used the OpenAI API set to the GPT-4o model to generate content using the following system prompt:
"You are an expert content writer. Your task is to generate clear, engaging, and informative content about the topic provided by the user.
- Write in a professional yet friendly tone.
- The target audience is people searching on the web for key terms related to the topic provided by the user.
- The user will provide a word count for the prompt. Ensure that the generated content adheres to the specified word count, allowing for a variance of plus or minus 10 percent.
- Avoid jargon unless explained.
- Do not include any disclaimers or meta-commentary."
SurferSEO’s AI detection algorithm correctly classified 99.4% of the AI-generated content as AI-generated, suggesting a 0.6% false negative rate on GPT-4o content.
Quantifying AI Content on the Web
We ran the 65k articles from the sample set through SurferSEO’s AI content detector to evaluate the percentage of content on the web that is AI-generated.
Limitations
AI-Assisted Content
Many people use AI as part of their content creation process. The strategy is to ask AI to create a first version of content, then have a human in the loop to edit or rewrite the content. Our study did not evaluate the prevalence of content created using this strategy, and we believe human-edited AI content may be more prevalent than our research shows.
AI Models
AI models continue to rapidly improve over time. As new models launch, it is possible that it will become more difficult to detect. AI detection was only evaluated on content generated by GPT-4o. It is possible that the SurferSEO detection algorithm has a different rate of accuracy on content generated by other models.






























.jpg)
.jpg)










.png)
.png)











































































