Key Takeaways

  • The share of articles published on the internet that are primarily AI-generated (about 50%) is roughly equal to the share written by humans (about 50%).
  • ChatGPT launched in November 2022. Within the first 12 months, the percentage of primarily AI-generated articles jumped to 36%, and reached 48% by 24 months.
  • However, since Q1 2025 the percentage of primarily AI-generated articles has plateaued at roughly 50%. We previously published this finding with data up to May 2025, and new data confirms this trend.
  • We build on our prior research by using three different AI detectors (Pangram, GPTZero, Copyleaks). We independently evaluate each to show that the false positive rates and average false negative rates are consistently below 2%. Each AI detector shows a similar trend.
  • While the trend is the same, our previous study estimated the proportion of primarily AI-generated articles to be 3.3 percentage points higher. This relatively small difference is the result of averaging three AI detectors rather than relying on the accuracy of a single detector.
  • Despite the prevalence of AI-generated articles on the web, we show in a separate study that these articles largely do not appear in Google and ChatGPT. We do not evaluate whether AI-generated articles get as much traffic as human-written articles, but we suspect that they do not.

Motivation 

Since ChatGPT launched in November 2022, many companies have explored publishing content generated by LLMs such as ChatGPT, Claude, and Gemini to grow their traffic across channels such as Google Search, social, and advertising. This is a cost-effective alternative to spending hundreds of dollars for humans to write content.

The quality of AI content is rapidly improving. In many cases, AI-generated content is as good as or better than content written by humans (MIT Study). It is often hard for people to tell whether content was created by AI (Originality.ai Study).

We seek to evaluate the prevalence of AI-generated articles.

Results

We observe significant growth in primarily AI-generated articles, coinciding with the launch of ChatGPT in November 2022. After only 12 months, primarily AI-generated articles accounted for 35.9% of articles published. 

In Q1 2025, the quantity of primarily AI-generated articles being published on the web nearly equaled the quantity of human-written articles, 49.6% vs. 50.4%. In Q4 2025, primarily AI-generated articles surpassed human-written articles at 50.9%, before returning to 49.9% in Q1 2026.

Source Data

Primarily AI-Generated Article Growth Has Plateaued 

While primarily AI-generated articles grew dramatically after ChatGPT launched, we do not see that trend continuing. Instead, the proportion of primarily AI-generated articles has remained relatively stable, near 50%, over the last five quarters. We hypothesize that this is because practitioners found that primarily AI-generated articles do not perform well in search, as shown in a separate study.

Methodology

Common Crawl

Common Crawl maintains one of the largest publicly available web archives. It contains billions of pages and is used by researchers and developers. It is a key data source for training large language models.

Selection of Articles

We need a representative sample of English-language articles on the web. While Common Crawl does not crawl every page, its archive is the best free and publicly available proxy for the web. We want to measure the proportion of all articles being published that are primarily AI-generated, so we do not filter by traffic or use a curated subset. We randomly select 55.4k URLs from Common Crawl and confirm that each page is in English, has article schema markup, contains at least 100 words, has a publish date between January 2020 and March 2026, and is an article or listicle as classified by the Graphite page type classifier.
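The selection criteria above can be read as a single eligibility check per page. The following is a minimal sketch under stated assumptions, not the pipeline used in the study: the `Page` fields, the `"en"` language code, the whitespace-based word count, and the `"article"` / `"listicle"` labels are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    # All field names are hypothetical; they are not a Common Crawl schema.
    language: str             # detected language code, e.g. "en"
    has_article_schema: bool  # page carries article schema markup
    text: str                 # extracted body text
    publish_date: date
    page_type: str            # label from a page type classifier


# Publish-date window stated in the text: January 2020 through March 2026.
WINDOW_START = date(2020, 1, 1)
WINDOW_END = date(2026, 3, 31)


def is_eligible(page: Page) -> bool:
    """Return True if the page meets every selection criterion."""
    return (
        page.language == "en"
        and page.has_article_schema
        and len(page.text.split()) >= 100  # assumption: words split on whitespace
        and WINDOW_START <= page.publish_date <= WINDOW_END
        and page.page_type in {"article", "listicle"}
    )
```

Every criterion must hold, so a page failing any one of them (for example, one published in 2019) is excluded from the sample.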

AI Detection

We classify each article using three AI detectors: Pangram, Copyleaks, and GPTZero. The detectors produce different outputs. Below, we describe the output of each detector and how we transform it into a binary classification: primarily AI or primarily human.

Pangram and Copyleaks provide the proportion of the article’s content that is AI-generated.

Pangram

  • Output: Proportion of the article that is Human, AI-assisted, AI
  • Classify as primarily AI if: proportion AI + proportion AI-assisted > proportion Human

Copyleaks

  • Output: Proportion of the article that is Human, AI
  • Classify as primarily AI if: proportion AI > proportion Human

In contrast, GPTZero provides an article-level prediction. (Its Advanced Sentence Scanning output includes sentences that most impact the classification, but it does not directly provide the proportion of AI-generated content. We prefer to use its article-level output rather than devising our own method for computing the proportions.)

GPTZero

  • Output: Prediction (Human, Mixed, AI) and confidence score
  • Classify as primarily AI if: prediction is AI or Mixed

Note that the labels indicating a mixture of AI and human writing are rarely predicted on our dataset: GPTZero tags 6.4% of articles as Mixed, and Pangram tags 1.9% of articles as having AI-assisted text.
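The three binarization rules listed above can be sketched as small functions. This is an illustration of the rules only: the function names and argument names are hypothetical, and the code does not call or reproduce any detector's actual API or response format.

```python
def pangram_is_primarily_ai(human: float, ai_assisted: float, ai: float) -> bool:
    """Pangram rule: AI plus AI-assisted proportion exceeds the human proportion."""
    return ai + ai_assisted > human


def copyleaks_is_primarily_ai(human: float, ai: float) -> bool:
    """Copyleaks rule: AI proportion exceeds the human proportion."""
    return ai > human


def gptzero_is_primarily_ai(prediction: str) -> bool:
    """GPTZero rule: the article-level prediction is AI or Mixed."""
    return prediction in {"AI", "Mixed"}
```

Note that the Pangram and GPTZero rules both count the intermediate label (AI-assisted, Mixed) toward the primarily AI class.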

Accurate detection of AI-generated content is required to make claims about the prevalence of AI-generated articles on the web. There is considerable disagreement about the accuracy of AI detection algorithms, and many argue that detecting AI is impossible or, at best, highly inaccurate. Therefore, before classifying the articles in our dataset, we evaluate the accuracy of the AI detectors.

Evaluation of False Positive Rates

To evaluate the false positive rate (the percentage of human-written articles classified as primarily AI-generated), we need a dataset of human-written articles. Since the large-scale adoption of AI tools began with ChatGPT, we argue that, with high probability, articles published before its release were written by humans. Therefore, we run each detector on the 15.7k articles in our Common Crawl dataset that were published between January 2020 and November 2022. In the table below, we see that all the AI detectors have low false-positive rates. 

Source: 15.7k Articles from Common Crawl (3/2026) Published Between 1/2020 and 11/2022
AI Detections from Pangram (4/2026), Copyleaks (5/2026), GPTZero (4/2026)
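As a sketch of how this false positive rate is computed for one detector: every article in the pre-ChatGPT set is assumed to be human-written, so any primarily-AI flag counts as a false positive. The function name and the toy flags below are illustrative and are not data from the study.

```python
def false_positive_rate(flags: list[bool]) -> float:
    """Share of assumed-human articles that a detector flags as primarily AI.

    flags[i] is True when the detector classified article i as primarily AI.
    """
    if not flags:
        raise ValueError("need at least one article")
    return sum(flags) / len(flags)


# Toy example (made-up flags): 1 of 4 assumed-human articles is flagged.
toy_flags = [False, True, False, False]
toy_fpr = false_positive_rate(toy_flags)
```

Because a small number of pre-November 2022 articles may in fact have used earlier AI tools, a rate computed this way is, if anything, a slight overestimate of the true false positive rate.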

Evaluation of False Negative Rates

To evaluate the false negative rate (the percentage of primarily AI-generated articles classified as human-written), we generate 2,000 articles with each of GPT-5, Gemini 3.1 Pro, and Claude Opus 4.6, covering the same topics as a set of reference articles published before November 2022. For each reference article, we first generate a 100-word summary of the article using GPT-5, then use the summary to generate an article with the system prompt:

You are an expert content writer. Your task is to generate clear, engaging, and informative content about the topic provided by the user.

  • Write in a professional yet friendly tone.
  • The target audience is people searching on the web for key terms related to the topic provided by the user.
  • The user will provide a word count for the prompt. Ensure that the generated content adheres to the specified word count, allowing for a variance of plus or minus 10 percent.
  • Avoid jargon unless explained.
  • Do not include any disclaimers or meta-commentary.

and prompt: 

Write an online article based on the summary provided below with approximately {word_count} words. Use plain text only (no markdown). Add section headings if needed.

SUMMARY: {summary}

where word_count is the word count of the reference article. 
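A minimal sketch of how the user prompt could be filled in for one reference article follows. The template text is the prompt quoted above; the function name and the whitespace-based word count are assumptions, since the text does not say how words were counted.

```python
# User prompt template, copied from the prompt quoted in the text.
USER_PROMPT_TEMPLATE = (
    "Write an online article based on the summary provided below with "
    "approximately {word_count} words. Use plain text only (no markdown). "
    "Add section headings if needed.\n\n"
    "SUMMARY: {summary}"
)


def build_user_prompt(reference_text: str, summary: str) -> str:
    """Fill the template using the reference article's word count."""
    # Assumption: words are counted by splitting on whitespace.
    word_count = len(reference_text.split())
    return USER_PROMPT_TEMPLATE.format(word_count=word_count, summary=summary)
```

Matching the word count to the reference article keeps the generated set comparable in length to the human-written set it mirrors.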

All detectors have low false negative rates, especially for GPT-5, the most popular LLM as of May 2026. 

Source: 2k Articles Generated by Each of GPT-5, Gemini 3.1 Pro, and Claude Opus 4.6
AI Detections from Pangram (4/2026), Copyleaks (5/2026), GPTZero (4/2026)
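The per-model false negative rates, and the average across models referred to in the key takeaways, can be sketched as below. Here every generated article is known to be AI-written, so any article not flagged as primarily AI is a false negative. The function names and model keys are hypothetical, and the example in the usage note uses made-up flags rather than study data.

```python
def false_negative_rates(flags_by_model: dict[str, list[bool]]) -> dict[str, float]:
    """Per-model share of AI-generated articles NOT flagged as primarily AI.

    flags_by_model[model][i] is True when the detector flagged article i
    (generated by that model) as primarily AI.
    """
    rates = {}
    for model, flags in flags_by_model.items():
        if not flags:
            raise ValueError(f"no articles for model {model!r}")
        misses = sum(1 for flagged in flags if not flagged)
        rates[model] = misses / len(flags)
    return rates


def average_rate(rates: dict[str, float]) -> float:
    """Unweighted mean of the per-model rates."""
    return sum(rates.values()) / len(rates)
```

For example, if a detector misses 1 of 4 articles from one model and 0 of 2 from another, the per-model rates are 0.25 and 0.0, and the unweighted average is 0.125.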

The raw data for this evaluation is available here.

Quantifying Primarily AI-Generated Articles on the Web

Finally, we classify all 55.4k articles in our dataset using each detector to evaluate the percentage of articles that are primarily AI-generated. First, we compute the percentage of articles published in each quarter that are primarily AI-generated using each AI detector. Then, we simply take the average of those AI detector-level estimates.  
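The two-step aggregation described above (a per-detector percentage for each quarter, then an unweighted average across detectors) can be sketched as follows. The record layout, function names, and quarter label format are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d: date) -> str:
    """Label a date with its calendar quarter, e.g. 2025-02-01 -> '2025-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def quarterly_ai_share(records):
    """Average, across detectors, of each detector's quarterly AI percentage.

    records: iterable of (publish_date, {detector_name: is_primarily_ai}).
    Returns {quarter_label: percentage}.
    """
    # quarter -> detector -> [primarily_ai_count, total_count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for publish_date, flags in records:
        quarter = quarter_of(publish_date)
        for detector, is_ai in flags.items():
            tally = counts[quarter][detector]
            tally[0] += int(is_ai)
            tally[1] += 1

    result = {}
    for quarter, per_detector in counts.items():
        # Step 1: each detector's share for the quarter.
        shares = [ai / total for ai, total in per_detector.values()]
        # Step 2: unweighted mean across detectors, as a percentage.
        result[quarter] = 100 * sum(shares) / len(shares)
    return result
```

Averaging detector-level percentages, rather than taking a majority vote per article, means no single detector's errors determine the estimate for a quarter.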


The raw data with classifications is available here. Note that we do not include the URLs to avoid identifying specific companies that may be publishing AI-generated articles.

Comparison with Our Prior Study

We previously published a study on the same topic in October 2025. The differences from our prior study are:

  • We extended our Common Crawl sample to include articles published through March 2026.
  • We used three AI detectors instead of one, and averaged their detections. This method is preferable because we do not rely on the accuracy of a single detector. 
    • The methodology is similar in that for Pangram and Copyleaks, we consider an article primarily AI-generated when a majority of its content is detected as using AI. For GPTZero, we use its article-level predictions.

The overall story is the same: a steep rise in primarily AI-generated articles after ChatGPT’s release and a plateau near 50% more recently. However, the percentage of primarily AI-generated articles we find by using multiple detectors is slightly lower than before (3.3 percentage points, on average), due to the more robust averaging method.

Limitations

AI-Assisted Articles

Many people incorporate AI into their content creation process. One strategy is to ask AI to create a first draft, then have a human edit or rewrite it. We did not evaluate the accuracy of the AI detectors on articles produced with this strategy.

AI Models

AI models continue to improve and may become harder to detect. We only evaluate the false negative rate on articles generated by GPT-5, Gemini 3.1 Pro, and Claude Opus 4.6. The AI detectors may have lower accuracy when applied to articles generated by other models.

Acknowledgements: 

We are grateful to Pangram, Copyleaks, and GPTZero for allowing us to use their AI detectors for this study.

  • Pangram: AI detection platform that identifies AI-generated and AI-assisted writing with detailed authenticity analysis.
  • Copyleaks: Content integrity platform offering AI detection and plagiarism checking across text, code, and documents.
  • GPTZero: Detects AI content from ChatGPT, GPT-5, Claude, Gemini, and checks writing quality to make every word worth reading.

We are also grateful to Common Crawl for providing free web crawl data to researchers since 2008.

Appendix

Results by AI detector:

Source Data

Jose Luis Paredes, PhD
Data Specialist
José L. Paredes is a Senior Data Scientist at Graphite. He holds a Ph.D. and a Master’s from the University of Delaware and was a professor at the Universidad de Los Andes for over two decades. He has authored more than 50 research papers, holds 5 U.S. patents, and previously served as Head of Data Science at GoToDigital.
Gregory Druck, PhD
Chief AI Officer
Gregory Druck is Chief AI Officer at Graphite.io, where he leads a team of scientists and engineers building AI tools for growth and researching how AI is reshaping marketing. Previously, he was the Chief Data Scientist at Yummly, where he built NLP and computer vision systems for the smart kitchen. Before that, he was an NLP and search researcher at Yahoo! Research, with internships at Google and Microsoft. He earned a Ph.D. from the University of Massachusetts Amherst, where he worked on semi-supervised and active machine learning with Andrew McCallum.
Bevin Benson
Growth Advisor
Bevin is a growth advisor focused on AEO, SEO, and AI based marketing with companies such as Character.AI and Barkbus. She previously worked in operations at Tesla.
Ethan Smith
CEO
Ethan Smith is CEO of Graphite.io, a research-driven growth agency that works with companies like Webflow, Adobe, and Upwork. He is an adjunct professor at IE Business School and teaches SEO and AEO at Reforge. His research has been published in ACM, Axios, Financial Times, and The Atlantic. Prior to founding Graphite, Ethan was a growth advisor to Masterclass, Robinhood, and Honey. Ethan was a research assistant focused on human-computer interaction and psychology at UC Santa Barbara and University College London.