AI/ML Workflow Engineering Pattern: Deterministic Quoting

Mitigate hallucination, improve performance, and reduce cost with solid engineering patterns

March 3, 2026 · 3 min read
ai · ml · patterns · software-engineering · deterministic-quoting · LangChain · LangGraph

Introduction

In the rush to get an AI/ML-augmented application to production, many engineers disregard new and emerging patterns in workflow engineering and applied AI. Often, LLMs are tasked with selecting quotes by rewriting/extracting them, or with writing long selectors to interact with a complex interface. In both scenarios, the raw information (a quote, or an XPath location) already exists and does not need to be generated; it only needs to be identified and pointed to. Enter: Deterministic Quoting.

The core pattern behind deterministic quoting is this:

Don't make the LLM generate the solution; make it select from an indexed/pre-processed set, then execute deterministically.

Let's begin by first evaluating how to handle these situations incorrectly...

How to Incorrectly Engineer This Workflow

Situation: You're writing an LLM workflow, and the LLM needs to identify specific topic-based passages from an essay.

The most common way people will write this is by setting up a structured output for quote selection, and tasking the LLM with populating it.

Below is an example of the target model the LLM would be tasked with populating.

Structured Output Models

from pydantic import BaseModel, Field


class FreeFormQuotes(BaseModel):
    """Output model for free-form quote extraction.
 
    WARNING: This approach allows the LLM to generate or modify text,
    which may result in inaccurate or hallucinated quotes.
    """
    quotes: list[str] = Field(
        description="Relevant quotes extracted from the essay about the topic"
    )
    reasoning: str = Field(
        description="Brief explanation of why these quotes were selected"
    )

Here's the gist of how it would be invoked and populated:

Quote Extraction Logic

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


def extract_quotes_wrong_way(
    essay_text: str,
    topic: str,
    model: str = "gpt-4o-mini",
    temperature: float = 0
) -> FreeFormQuotes:
    """Extract quotes using the WRONG way: free-form generation.
 
    This function demonstrates why you should NOT ask an LLM to generate
    quotes directly. The LLM may paraphrase, hallucinate, or otherwise
    modify the original text.
 
    Args:
        essay_text: The full text of the essay to extract quotes from.
        topic: The topic to find relevant quotes about.
        model: The OpenAI model to use for extraction.
        temperature: Sampling temperature (0 for deterministic output).
 
    Returns:
        FreeFormQuotes object containing the generated quotes and reasoning.
 
    Example:
        >>> with open("essay.txt") as f:  # doctest: +SKIP
        ...     essay = f.read()
        >>> result = extract_quotes_wrong_way(essay, "artificial intelligence")  # doctest: +SKIP
        >>> print(result.quotes)  # doctest: +SKIP
        ['AI is transforming healthcare...']  # May not match original exactly!
    """
    # Initialize the LLM
    llm = ChatOpenAI(model=model, temperature=temperature)
 
    # Create structured output
    structured_llm = llm.with_structured_output(FreeFormQuotes)
 
    # Build the prompt - this is the WRONG approach
    # We're asking the LLM to GENERATE quotes, not select them
    prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a helpful assistant that extracts relevant quotes from essays.
 
Your task is to extract quotes about a specific topic from the provided essay.
 
IMPORTANT: Extract the exact text as it appears in the essay. Do not paraphrase
or modify the quotes in any way."""),
        ("human", """Essay:
{essay}
 
Please extract relevant quotes about: {topic}
 
Return the quotes exactly as they appear in the text.""")
    ])
 
    # Chain the prompt with the structured LLM
    chain = prompt | structured_llm
 
    # Invoke the chain
    result = chain.invoke({
        "essay": essay_text,
        "topic": topic
    })
 
    return result

Notice that in the above, we're building an llm.with_structured_output, which is a nice touch when interacting with LLMs. However, under the hood, the LLM is rewriting the identified quotes in its response to populate the FreeFormQuotes response model.

Let's think about this mechanism.

  • The LLM can partially hallucinate the quote it's rewriting -- this is what we see most commonly: the LLM is 90% right, but the result is not a direct quote.
  • Output tokens cost substantially more than input tokens -- populating quotes in the response model by rewriting is an output-token-heavy operation.
  • LLMs are slow! -- the more output the LLM is tasked with generating, the longer our users and applications wait for a response.
  • We're tasking an LLM with rewriting information that we already have in memory -- this alone should set off alarm bells in an engineer's mind.
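
To put the output-token point in rough numbers, here is a back-of-the-envelope sketch. The prices and token counts below are illustrative assumptions for this article, not actual provider pricing:

```python
# Rough cost comparison: rewriting quotes vs. returning indices.
# NOTE: prices and token counts are illustrative assumptions only.
INPUT_PRICE_PER_1K = 0.00015   # assumed $/1K input tokens
OUTPUT_PRICE_PER_1K = 0.0006   # assumed $/1K output tokens (typically several x input)


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a single LLM call's cost from its token counts."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + \
           (output_tokens / 1000) * OUTPUT_PRICE_PER_1K


# The essay is in the prompt either way (~3000 input tokens assumed).
# Rewriting ten quotes might emit ~800 output tokens;
# selecting ten indices with brief reasoning might emit ~150.
rewrite_cost = call_cost(3000, 800)
select_cost = call_cost(3000, 150)
print(f"rewrite: ${rewrite_cost:.6f}  select: ${select_cost:.6f}")
```

Whatever the real prices are on a given day, the asymmetry holds: the input side is identical in both designs, so shrinking the output is pure savings.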

The code looks like an AI engineer wrote it: structured outputs/output parsers are being used along with ChatPromptTemplates. But when you consider the action under the hood, and the task actually given to the LLM, the issues with this design should be clear.

How to Correctly Engineer This Workflow

Now that we see how to approach this problem incorrectly, let's take a look at the deterministic quoting pattern in action.

Again, we'll begin with our target result object models.

Structured Output Models

from pydantic import BaseModel, Field


class SelectedQuote(BaseModel):
    """A single quote selection: an index paired with its reasoning."""
    index: int = Field(
        description="Index of the selected sentence (0-based)"
    )
    reasoning: str = Field(
        description="Brief explanation of why this sentence is relevant"
    )
 
 
class QuoteSelection(BaseModel):
    """Structured output for index-based quote selection.
 
    The LLM returns a list of selections, each pairing a sentence index
    with a reason for its relevance — rather than generating text itself.
    """
    selections: list[SelectedQuote] = Field(
        description="List of selected sentences with per-quote reasoning"
    )

Notice that our target structured response is a list[SelectedQuote], and that the SelectedQuote class encapsulates an index (an int), not a quote, this time.

We'll need an object that's programmatically populated, not LLM populated:

from dataclasses import dataclass, field


@dataclass
class IndexedSentences:
    """Indexed sentences from a document, ready for LLM selection.
 
    Attributes:
        sentences: Mapping of index to exact sentence text.
        formatted_context: Pre-built "[0] Sentence..." string for LLM prompts.
    """
    sentences: dict[int, str] = field(default_factory=dict)
    formatted_context: str = ""
 
    def get(self, indices: list[int]) -> list[str]:
        """Look up sentences by index. Invalid indices are skipped."""
        return [self.sentences[i] for i in indices if i in self.sentences]
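
A quick demonstration of the lookup behavior (the class is redefined inline so the snippet is self-contained, and the sentences are made up for illustration):

```python
from dataclasses import dataclass, field


@dataclass
class IndexedSentences:
    sentences: dict[int, str] = field(default_factory=dict)
    formatted_context: str = ""

    def get(self, indices: list[int]) -> list[str]:
        """Look up sentences by index. Invalid indices are skipped."""
        return [self.sentences[i] for i in indices if i in self.sentences]


indexed = IndexedSentences(
    sentences={0: "AI is everywhere.", 1: "Costs are falling.", 2: "Latency matters."},
)

# Valid indices resolve to exact text; out-of-range ones are skipped, not raised.
print(indexed.get([2, 0, 99]))  # ['Latency matters.', 'AI is everywhere.']
```

Skipping invalid indices (rather than raising) is a deliberate choice here: a stray index from the LLM degrades to a missing quote instead of a crashed workflow.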

Finally, we need a mechanism to pre-process and split() the text our LLM is tasked with evaluating.

import stanza


def index_sentences(text: str, language: str = "en") -> IndexedSentences:
    """Split text into sentences and return an indexed lookup.

    Uses Stanford's stanza library for robust sentence segmentation
    that handles abbreviations, quotations, and edge cases.

    Args:
        text: The full text to segment.
        language: Stanza language code (default: "en").

    Returns:
        IndexedSentences with a sentence dict and formatted context string.
    """
    nlp = stanza.Pipeline(lang=language, processors="tokenize", verbose=False)
    doc = nlp(text)

    sentences = {i: s.text.strip() for i, s in enumerate(doc.sentences)}
    formatted_context = "\n".join(f"[{i}] {s}" for i, s in sentences.items())

    return IndexedSentences(sentences=sentences, formatted_context=formatted_context)

The above function uses Stanford's stanza library to split the essay into sentences. Each sentence is then assigned an index number and populated into the IndexedSentences object. Lastly, a formatted_context is prepared for insertion into the quote-extraction prompt.
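
If pulling in stanza feels heavy for a quick experiment, the same indexing contract can be sketched with a naive regex splitter. This is far less robust (it will mangle abbreviations like "Dr."), but it shows the shape of the output without the model download:

```python
import re


def index_sentences_naive(text: str) -> tuple[dict[int, str], str]:
    """Naive stand-in for the stanza pipeline: split on ., !, or ? + whitespace."""
    parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    sentences = {i: s for i, s in enumerate(parts)}
    formatted = "\n".join(f"[{i}] {s}" for i, s in sentences.items())
    return sentences, formatted


sentences, formatted = index_sentences_naive("AI is fast. AI is cheap. Is it accurate?")
print(formatted)
# [0] AI is fast.
# [1] AI is cheap.
# [2] Is it accurate?
```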

Now, we're ready to move on to the LLM call logic.

def extract_quotes_right_way(
    essay_text: str,
    topic: str,
    model: str = "gpt-4o-mini",
    temperature: float = 0
) -> dict[str, Any]:
    """Extract quotes using the RIGHT way: index-based selection.
 
    This function demonstrates the correct approach to quote extraction:
    1. Split essay into indexed sentences using stanza
    2. Present indexed sentences to LLM
    3. LLM selects relevant indices
    4. Retrieve exact sentences deterministically
 
    This guarantees 100% accuracy - the quotes are retrieved exactly
    as they appear in the original text, with no possibility of
    hallucination or modification.
 
    Note: The selected quote can't be hallucinated, but the selection mechanism could.
 
    Args:
        essay_text: The full text of the essay to extract quotes from.
        topic: The topic to find relevant quotes about.
        model: The OpenAI model to use for selection.
        temperature: Sampling temperature (0 for deterministic output).
 
    Returns:
        Dictionary containing:
        - quotes: List of exact quotes from the original text
        - selections: List of SelectedQuote objects (index + reasoning)
        - indexed: The IndexedSentences lookup for verification
    """
    # Step 1: Index the sentences
    indexed = index_sentences(essay_text)
 
    # Step 2: Initialize LLM with structured output
    llm = ChatOpenAI(model=model, temperature=temperature)
    structured_llm = llm.with_structured_output(QuoteSelection)
 
    # Step 3: Build the prompt
    # CRITICAL: We ask the LLM to SELECT indices, not generate text
    prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a helpful assistant that selects relevant sentences from essays.
 
Your task is to identify which sentences are relevant to a given topic.
 
You will be provided with a list of sentences, each prefixed with an index number
in the format [0], [1], [2], etc.
 
IMPORTANT: Return ONLY the indices of relevant sentences. Do NOT generate or
modify any text. Simply select the numbers of sentences that are relevant."""),
        ("human", """Here are the sentences from the essay:
 
{sentences}
 
Please select the indices of sentences that are relevant to the topic: {topic}
 
Return the indices as a list of numbers.""")
    ])
 
    # Step 4: Run the chain
    chain = prompt | structured_llm
    selection: QuoteSelection = chain.invoke({
        "sentences": indexed.formatted_context,
        "topic": topic
    })
 
    # Step 5: Deterministically retrieve exact sentences by index
    indices = [s.index for s in selection.selections]
    quotes = indexed.get(indices)
 
    return {
        "quotes": quotes,
        "selections": selection.selections,
        "indexed": indexed
    }

Notice in the above, the LLM is no longer attempting to write the content of the quote it's selecting. Instead, it's only tasked with writing a list of indices (plus optional justifications); our application logic matches each index to the direct quote, and we can move on with our processing hallucination-free!
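
Because retrieval is deterministic, verification is trivial: every returned quote must be a verbatim substring of the source. A small guard like this (a sketch, not part of the workflow above) can assert that invariant before downstream use:

```python
def verify_quotes(quotes: list[str], source_text: str) -> list[str]:
    """Return only the quotes that appear verbatim in the source text.

    With index-based retrieval this should never filter anything out;
    with free-form generation it frequently would.
    """
    return [q for q in quotes if q in source_text]


essay = "Models hallucinate. Indices do not. Retrieval is exact."
good = verify_quotes(["Indices do not.", "Models confabulate."], essay)
print(good)  # ['Indices do not.']
```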

Additional Use Cases

Our example above was pretty straightforward. Similar patterns are also employed by offerings like Playwright MCP, which uses the accessibility tree with indexed elements:

For example:

Accessibility snapshot provided to LLM

- role: button
  name: Login
  index: 0
 
- role: textbox
  name: Username
  index: 1
 
- role: link
  name: Sign Up
  index: 2

Instead of writing XPath, or even the name of the element, the LLM output would be a verb-and-index listing:

{
  "action": "click",
  "element_index": 0
}

This accomplishes a lot: it eliminates the chance for the LLM to hallucinate an element name, reduces output tokens (thus saving cost), and keeps the LLM's context lean.
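
The same select-then-execute split applies on the browser side. A hypothetical dispatcher (the names below are illustrative, not Playwright MCP's actual API) resolves the LLM's verb-plus-index into a concrete element deterministically:

```python
from dataclasses import dataclass


@dataclass
class Element:
    role: str
    name: str


# Built programmatically from the accessibility tree -- never LLM-written.
elements = {
    0: Element("button", "Login"),
    1: Element("textbox", "Username"),
    2: Element("link", "Sign Up"),
}


def dispatch(action: dict, registry: dict[int, Element]) -> str:
    """Resolve an LLM action like {'action': 'click', 'element_index': 0}."""
    el = registry[action["element_index"]]  # deterministic lookup, no name matching
    return f"{action['action']} -> {el.role} '{el.name}'"


print(dispatch({"action": "click", "element_index": 0}, elements))
# click -> button 'Login'
```

The LLM never touches element names or selectors; it only picks a number, and the registry does the rest.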

Conclusion

If you made it this far in the article, hopefully you're at least more sensitive to the tasks we give LLMs as we weave between deterministic code paths and inference. This pattern is invaluable; rarely can you find a solution that:

  • Saves Money
  • Improves Performance
  • Eliminates Hallucination (tactically)

Thoughtfully applying deterministic quoting in your workflows will dramatically improve the quality and responsiveness of your application while simultaneously reducing costs. It's a dream come true!
