This is a highly condensed version of Demystifying Randomness in AI, focusing on position.

Key Takeaways:

1. Position is the rank of an entity within a list of entities, averaged over a sample of AI responses. 

2. Measuring position is straightforward. A sample of 10 responses is enough for a quick estimate.

3. For cases that require greater precision, statistical tools can quantify the accuracy of position estimates, and sequential sampling can efficiently determine the right number of responses based on the desired precision and our risk tolerance.

Position Is an Entity’s Rank in a List of Entities

AI often responds differently to the same prompt, but it is not incoherent: it randomly samples responses from a probability distribution. This means prompt tracking metrics are measurable using a sample of responses.

When AI responses include a list of entities, we can measure an entity’s position in that list. 

For example, in these responses to the prompt “What are the best flavors of ice cream?”, “Cookies & Cream” appears at positions 5, 4, and 5, while “Coffee” appears at position 9 in the third response and does not appear in the other two.

“Best” is subjective — but these flavors are consistently loved and top‑rated around the world:

🍨 Classic Favorites

  • Vanilla – Simple, versatile, and never goes out of style
  • Chocolate – Rich, smooth, and satisfying
  • Strawberry – Sweet with a fruity freshness
  • Mint Chocolate Chip – Cool mint + crunchy chocolate bits

🍫 Indulgent & Decadent

  • Cookies & Cream – Creamy vanilla with chocolate cookie chunks
  • Salted Caramel – Sweet with a savory edge
  • Chocolate Fudge Brownie – For serious chocolate lovers
  • Peanut Butter Cup – Sweet + salty perfection

🍓 Fruity & Refreshing

  • Mango – Bright and tropical
  • Raspberry Swirl – Tart and sweet contrast
  • Pistachio – Nutty, subtly sweet
  • Lemon Sorbet – Light and refreshing (dairy‑free option)

🌍 Unique & Gourmet

  • Matcha (Green Tea) – Earthy and smooth
  • Tiramisu – Coffee and mascarpone flavors
  • Lavender Honey – Floral and delicate
  • Black Sesame – Nutty with a toasted depth

The “best” ice cream flavors really depend on your taste, but here are some of the most loved and highly rated ones across the world:

🍦 Classic Favorites

  • Vanilla – Simple, creamy, and perfect with toppings.
  • Chocolate – Rich, smooth, and timeless.
  • Strawberry – Sweet, fruity, and refreshing.
  • Cookies & Cream – Vanilla with crunchy chocolate cookie pieces.
  • Mint Chocolate Chip – Cool mint with chocolate bits.

🍨 Indulgent & Decadent

  • Chocolate Fudge Brownie – Chocolate base with brownie chunks.
  • Salted Caramel – Sweet and salty balance.
  • Rocky Road – Chocolate, marshmallows, and nuts.
  • Peanut Butter Cup – Chocolate and peanut butter swirls.
  • Cookie Dough – Vanilla with chunks of raw cookie dough.

🌍 Unique & Gourmet

  • Matcha (Green Tea) – Earthy and slightly sweet.
  • Pistachio – Nutty and creamy.
  • Mango – Bright and tropical.
  • Tiramisu – Coffee and mascarpone flavors.
  • Lavender or Honeycomb – Floral and subtly sweet.

The “best” ice cream flavors really depend on your taste, but these are some of the most loved and highly rated worldwide:

🍦 Classic Favorites

  • Vanilla – Simple, creamy, and perfect with toppings.
  • Chocolate – Rich and timeless.
  • Strawberry – Sweet and fruity.
  • Mint Chocolate Chip – Cool and refreshing with chocolate crunch.
  • Cookies & Cream – Vanilla with crushed chocolate cookies.

🍨 Gourmet / Premium Picks

  • Salted Caramel – Sweet with a hint of savory.
  • Pistachio – Nutty and smooth.
  • Rocky Road – Chocolate with marshmallows and nuts.
  • Coffee – Bold and slightly bitter.
  • Cookie Dough – Chunks of dough in creamy vanilla.

🍓 Fruity & Refreshing

  • Mango – Tropical and smooth.
  • Raspberry Swirl – Tart and sweet.
  • Lemon Sorbet – Light and zesty.
  • Coconut – Creamy with a tropical vibe.

🍫 For Serious Sweet Tooths

  • Chocolate Fudge Brownie
  • Peanut Butter Cup
  • S’mores
  • Dulce de Leche

The ordered entity lists extracted from the three responses:

Response 1: vanilla, chocolate, strawberry, mint chocolate chip, cookies & cream, salted caramel, fudge brownie, peanut butter cup, mango, raspberry swirl, pistachio, lemon sorbet, matcha, tiramisu, lavender honey, black sesame

Response 2: vanilla, chocolate, strawberry, cookies & cream, mint chocolate chip, fudge brownie, salted caramel, rocky road, peanut butter cup, cookie dough, matcha, pistachio, mango, tiramisu

Response 3: vanilla, chocolate, strawberry, mint chocolate chip, cookies & cream, salted caramel, pistachio, rocky road, coffee, cookie dough, mango, raspberry swirl, lemon sorbet, coconut, fudge brownie, peanut butter cup, s’mores, dulce de leche
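To make the definition concrete, here is a minimal sketch that recovers these positions (the `position` helper is ours; the entity lists are transcribed from the example responses above):

```python
def position(entity, entities):
    """1-based rank of entity in an ordered list of entities, or None if absent."""
    try:
        return entities.index(entity) + 1
    except ValueError:
        return None

# Ordered entity lists, transcribed from the three example responses above.
response_1 = ["vanilla", "chocolate", "strawberry", "mint chocolate chip",
              "cookies & cream", "salted caramel", "fudge brownie", "peanut butter cup",
              "mango", "raspberry swirl", "pistachio", "lemon sorbet", "matcha",
              "tiramisu", "lavender honey", "black sesame"]
response_2 = ["vanilla", "chocolate", "strawberry", "cookies & cream",
              "mint chocolate chip", "fudge brownie", "salted caramel", "rocky road",
              "peanut butter cup", "cookie dough", "matcha", "pistachio", "mango",
              "tiramisu"]
response_3 = ["vanilla", "chocolate", "strawberry", "mint chocolate chip",
              "cookies & cream", "salted caramel", "pistachio", "rocky road", "coffee",
              "cookie dough", "mango", "raspberry swirl", "lemon sorbet", "coconut",
              "fudge brownie", "peanut butter cup", "s'mores", "dulce de leche"]

responses = [response_1, response_2, response_3]
print([position("cookies & cream", r) for r in responses])  # [5, 4, 5]
print([position("coffee", r) for r in responses])           # [None, None, 9]
```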

Position Histograms Help Us Visualize the Position Distribution

A good way to visualize position over a sample of responses is with a position histogram. It shows how often an entity appears at each position, as well as how often the entity is not mentioned at all, which is the complement of its visibility (the percentage of responses that mention the entity).
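As a rough sketch of how such a histogram can be tallied (the helper name and the 10-response sample are hypothetical):

```python
from collections import Counter

def position_histogram(positions):
    """positions: one entry per response; None means the entity was not mentioned.
    Returns counts per position plus a 'not mentioned' bucket."""
    return dict(Counter("not mentioned" if p is None else p for p in positions))

# Hypothetical sample of 10 responses: visibility is 8/10, positions cluster at 4-5.
sample = [5, 4, 5, None, 5, 4, 5, 5, None, 4]
print(position_histogram(sample))  # {5: 5, 4: 3, 'not mentioned': 2}
```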

Cookies & Cream has a relatively stable position distribution. 

In contrast, Coffee only appears 49% of the time, and appears at a range of different positions.

Average Position Summarizes the Position Distribution

How do we aggregate this into a single value? We take the average position over all the responses where an entity appears. We could instead assign a fake position, like the maximum position in the list plus one, when an entity does not appear, but this makes the position more difficult to interpret. 

In the histograms above, the average position for “Cookies & Cream” is 4.7, and the average position for “Coffee” is 15.5. 

To help summarize the variability, it is also useful to view the standard deviation of the position across the sample. The standard deviation of position is 0.58 for “Cookies & Cream” and 3.41 for “Coffee”.
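A minimal sketch of this computation, using only the three example positions of “Cookies & Cream” (5, 4, 5); the figures quoted above come from a larger sample:

```python
from statistics import mean, stdev

# Positions of "Cookies & Cream" in the three example responses; None would mark
# a response with no mention and is excluded before averaging.
positions = [5, 4, 5]
observed = [p for p in positions if p is not None]

print(round(mean(observed), 2))   # 4.67
print(round(stdev(observed), 2))  # 0.58 (sample standard deviation)
```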

This Is Similar to Position in Google Search Console (GSC)

Note that despite many claims that AI is very different from traditional search, our definition of position is similar to how position is reported in GSC. A search result can appear in different positions at different times and for different users. GSC provides the average position when the URL appears, as shown in the screenshot below.

When the URL appears occasionally, the average position ignores cases where the URL does not rank, as shown in the screenshot below.

Focus on Visibility First, Then Position

Visibility and position are complementary. Visibility tells us how often an entity appears, and position provides the average rank when it appears. It does not make sense to focus on position when visibility is low. The first priority is to appear frequently, and the second priority is to appear at the top of the response, as the first recommended item. 

Higher Visibility Implies Better Position

Visibility and position are correlated. Higher visibility generally implies better position. 

Top Positions Are More Stable

The top of the list is generally more stable than the bottom. 

Although position can be highly variable, these two facts combine to make position easier to estimate than we might expect: high-visibility entities, the ones whose position we care about, have better and more stable positions.

Tracking Position Is Straightforward

We can compute empirical estimation errors to develop rules of thumb for coarse tracking and leverage statistical tools when more precision is needed. 

Note that because position is defined only when an entity occurs, the effective sample size for measuring the average position depends on the entity's visibility. Tracking the position of low-visibility entities yields a low effective sample size and unstable estimates, which is another reason to avoid position until visibility is high. 

Because we only compute position for entities that appear, entities that do not appear in a sample will not get a position estimate. This means we may underestimate error with very small sample sizes. But by n=10 responses, if an entity does not appear, the upper bound of its 95% visibility confidence interval is 27.8%. In this case, we should focus on increasing visibility before considering position. 
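For reference, the 27.8% figure matches the upper bound of a Wilson score interval for 0 appearances in 10 responses; a sketch under that assumption:

```python
import math

def wilson_upper(successes, n, z=1.96):
    """Upper bound of the Wilson score confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center + half

# An entity absent from all of n=10 responses could still have visibility up to ~27.8%.
print(round(100 * wilson_upper(0, 10), 1))  # 27.8
```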

Position Estimation Error Decreases Rapidly with the Number of Responses

We conduct an experiment to measure the mean absolute error in position estimates as a function of the number of responses. For example, if the estimated position is 4.2 and the true position is 5.5, the absolute error is 1.3. The mean absolute error for a prompt is the average absolute error across all entities that appear in the sample. We measure error for entities with visibility of at least 10% in the full set of responses, and we discard entities that do not appear in the sub-sample. 
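A minimal sketch of the error metric (entity names and position values here are hypothetical):

```python
def mean_absolute_error(estimated, true):
    """MAE across entities; both arguments map entity -> average position.
    Entities with no estimate in the sub-sample are simply absent from `estimated`."""
    errors = [abs(estimated[e] - true[e]) for e in estimated if e in true]
    return sum(errors) / len(errors)

# Hypothetical estimates from a small sub-sample vs. the full-sample "true" positions.
estimated = {"cookies & cream": 4.2, "coffee": 14.0}
true = {"cookies & cream": 5.5, "coffee": 15.5, "vanilla": 1.1}  # vanilla not in sub-sample
print(round(mean_absolute_error(estimated, true), 2))  # 1.4
```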

The dataset consists of 200 entity comparison question prompts from three sources: prompts tracked in Graphite’s prompt tracking tool, prompts generated from editorial search keywords, and manually written prompts on topics of general interest, like “What are the best science fiction books right now?” We obtain 400 responses for each prompt from the gpt-5.2-chat-latest model via the OpenAI API and extract entities that are answers to the question. We provide the full dataset of all responses and the entities.

By 10 responses, the position estimation error is just over one position, at 1.06 (averaged across the qualifying entities, 23.4 per prompt on average), despite the smaller effective sample size.

Source Data

In the full paper, we show similar results when: 

  • Using OpenAI’s API + web search tool 
  • Using Gemini 3 Flash via API 
  • Using scraped responses from Gemini

Below, we show the percentage of prompts whose mean absolute error is ≤ 2 or ≤ 1 position after different numbers of responses, using the OpenAI API data. With 10 responses, 90.6% of prompts have a mean absolute error ≤ 2 positions.

Source Data

Many of the prompts that do not achieve an MAE ≤ 1 position after 100 samples are “What are the best things to do in CITY?” questions, which have less stable orderings. In contrast, prompts like “What is a good online calendar scheduling tool?” achieve very low MAE (0.27) at 10 responses.

Here are examples of two prompts and their position estimates with different sample sizes. The first stabilizes more quickly than the second. 

Confidence Intervals Can Give Us Statistical Margins of Error for Our Estimates

Confidence intervals are statistical margins of error for estimates. We can compute a confidence interval for the average position using the t-distribution, which accounts for uncertainty in both the sample mean and the sample standard deviation (formula in the full paper Appendix). For this calculation, we need to compute the standard deviation of observed positions. We can see some example confidence intervals in the table below.
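A sketch of the interval computation (the 10-position sample is hypothetical, and the t critical value for 9 degrees of freedom is hardcoded to avoid a SciPy dependency):

```python
import math
from statistics import mean, stdev

def t_confidence_interval(observed, t_crit):
    """95% CI for the mean position; t_crit is the two-sided 97.5% t quantile, df = n - 1."""
    n = len(observed)
    m, s = mean(observed), stdev(observed)
    half = t_crit * s / math.sqrt(n)
    return m - half, m + half

# Hypothetical positions from 10 responses; t_{0.975, df=9} ~ 2.262.
positions = [5, 4, 5, 5, 6, 4, 5, 5, 4, 5]
low, high = t_confidence_interval(positions, t_crit=2.262)
print(round(low, 2), round(high, 2))  # 4.35 5.25
```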

Google Sheets Calculator

Sequential Sampling Can Help Decide How Many Responses a Prompt Needs

We can use sequential sampling to decide when an estimate is “good enough”, so that different prompts get different numbers of responses, rather than the common strategy of using a fixed number of responses per prompt, such as 100. 

For position, we use a t-distribution-based confidence interval for the mean.                                                                                                                                  

If we want a 95% confidence interval of width less than 2.0 for an entity's mean position, we can iteratively add 10 responses and stop when the width falls below the target. For example, for an entity with an observed position standard deviation of 3.0:                                                                                                                      

  • n=10 => width: 4.29
  • n=20 => width: 2.81                      
  • n=30 => width: 2.24
  • n=40 => width: 1.92 STOP
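The widths above can be reproduced with the t-based formula (t critical values hardcoded for the relevant degrees of freedom, so no SciPy is needed):

```python
import math

# Two-sided 97.5% t quantiles, keyed by degrees of freedom (n - 1).
T_CRIT = {9: 2.262, 19: 2.093, 29: 2.045, 39: 2.023}

def ci_width(s, n):
    """Full width of the 95% t-based confidence interval for a mean."""
    return 2 * T_CRIT[n - 1] * s / math.sqrt(n)

# Reproduce the example: observed position standard deviation 3.0, target width 2.0.
s, target = 3.0, 2.0
for n in (10, 20, 30, 40):
    w = ci_width(s, n)
    print(f"n={n} => width: {w:.2f}" + (" STOP" if w < target else ""))
```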

The savings provided by stopping early depend on the standard deviation of the entity's position. Entities with stable position (low standard deviation) converge much faster than entities with highly variable position. For example, to achieve a 95% confidence interval of width 2.0, an entity with standard deviation 0.97 (the 10th percentile in our dataset) needs only 7 responses, whereas an entity with standard deviation 5.87 (the 90th percentile) requires 135; stopping early for the stable entity saves 95% of the samples.

In our data set, 37% of entities with visibility ≥ 10% have position standard deviation below 2.0, and only 16% have a standard deviation above 5.0. On average, achieving a 95% confidence interval of width 2.0 would require 55 responses per entity, a reduction of 59% relative to the worst-case 135. For a confidence interval of width 3.0, the reduction is 58%, from 62 responses to 26.

We could also stop when the average or max width of the confidence interval across all entities reaches a target error width.                                                                                                                                                                                                                                            

For example, for entities with visibility ≥ 10%, suppose we want the average 95% confidence interval width on each prompt to be less than 2.0. Capping sampling at 200 responses per prompt, fixed sampling uses 200 responses on every prompt and meets the target on 88% of prompts. Sequential sampling achieves the same meet-target rate with an average of 85.1 responses per prompt, a 57% reduction.

Conclusion

Like visibility, position is probabilistic, but measurable using a sample of responses. The full paper Demystifying Randomness in AI covers this (and many other topics) in much more depth.

Gregory Druck, PhD
Chief AI Officer
Gregory Druck is Chief AI Officer at Graphite.io, where he leads a team of scientists and engineers building AI tools for growth and researching how AI is reshaping marketing. Previously, he was the Chief Data Scientist at Yummly, where he built NLP and computer vision systems for the smart kitchen. Prior to Yummly, he was an NLP and search researcher at Yahoo! Research, with internships at Google and Microsoft. He earned a Ph.D. from the University of Massachusetts Amherst, where he worked on semi-supervised and active machine learning with Andrew McCallum.
Ethan Smith
CEO
Ethan Smith is CEO of Graphite.io, a research-driven growth agency that works with companies like Webflow, Adobe, and Upwork. He is an adjunct professor at IE Business School and teaches SEO and AEO at Reforge. His research has been published in ACM, Axios, Financial Times, and The Atlantic. Prior to founding Graphite, Ethan was a growth advisor to Masterclass, Robinhood, and Honey. Ethan was a research assistant focused on human-computer interaction and psychology at UC Santa Barbara and University College London.