Prompt: Format Enforcement (Structured Output Control)

Metadata

  • Category: Constraint
  • Technique: Explicit format instruction with bracketed output
  • Model Tested: GPT-3.5, GPT-4, various local models
  • Contributor: User-appl2613 (monkeyrithms)
  • Status: Proven (production in ChatBot RPG)
  • Implementation: ChatBot RPG

Purpose

Force the LLM to output in a specific, parseable format (typically bracketed, comma-separated values) that can be reliably extracted and processed by code. This technique turns the LLM from a creative writer into a structured data extractor, enabling deterministic parsing and game logic.

Core Principle: Tell the LLM you’re expecting structured output, not creative output.

Template

Read the following text. Answer with only {{format_description}}.

Rules:
- Output format: {{format_example}}
- {{item_format_rule}}
- {{multiplicity_rule}}
- No additional text, commentary, or explanation
- If {{nothing_found_condition}}, output: {{empty_format}}

Text to analyze:
{{input_text}}

Output:

Variables

Variable | Description | Example
--- | --- | ---
{{format_description}} | What to output | "the singular form of consumed objects in brackets"
{{format_example}} | Example of correct format | [WATER],[BERRY JUICE]
{{item_format_rule}} | How to format individual items | "Write in singular form, uppercase"
{{multiplicity_rule}} | How to handle duplicates | "If multiple of the same item, write it once only"
{{nothing_found_condition}} | When to return empty | "no objects are found", "character performs no action"
{{empty_format}} | What to return if nothing found | "[NONE]", "[]", "NONE"
{{input_text}} | The text to extract from | Player message, chat log, narration
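
As a concrete illustration, here is a minimal sketch of filling the template in Python with plain str.format substitution. The name CONSUMPTION_PROMPT and the sample message are illustrative, not part of the technique:

player_message = 'I drink a soda to quench my thirst.'

CONSUMPTION_PROMPT = """Read the following text. Answer with only {format_description}.

Rules:
- Output format: {format_example}
- {item_format_rule}
- {multiplicity_rule}
- No additional text, commentary, or explanation
- If {nothing_found_condition}, output: {empty_format}

Text to analyze:
{input_text}

Output:"""

prompt = CONSUMPTION_PROMPT.format(
    format_description="the singular form of consumed objects in brackets",
    format_example="[ITEM1],[ITEM2]",
    item_format_rule="Write each item in singular form, uppercase",
    multiplicity_rule="If multiple of the same item, write it once only",
    nothing_found_condition="no consumption occurs",
    empty_format="[NONE]",
    input_text=player_message,
)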

Usage Examples

Example 1: Item Consumption Detection

Prompt:

Read the following text. Answer with only the singular form of the object or objects that the character is consuming in brackets.

Rules:
- Output format: [ITEM1],[ITEM2],[ITEM3]
- Write each item in singular form, uppercase
- If multiple of the same item, write it once only (e.g., "two apples" = [APPLE])
- No additional text, commentary, or explanation
- If no consumption occurs, output: [NONE]

Text to analyze:
"I drink a soda to quench my thirst, then eat two apples."

Output:

Expected Output:

[SODA],[APPLE]

Code Integration:

import re

def extract_consumed_items(player_message):
    prompt = format_consumption_prompt(player_message)
    raw_output = llm.generate(prompt, temperature=0.2)

    # Parse bracketed items, e.g. "[SODA],[APPLE]" -> ["SODA", "APPLE"]
    items = re.findall(r'\[([^\]]+)\]', raw_output)

    # Drop the explicit empty marker so it is never treated as an item
    items = [item for item in items if item.upper() != 'NONE']

    # Convert to lowercase, check inventory
    for item in items:
        item_name = item.lower()
        if player_inventory.get(item_name, 0) > 0:
            player_inventory[item_name] -= 1
            apply_item_effect(item_name)
        else:
            notify_player(f"You don't have {item_name}")

    return items

Example 2: Location/Keyword Detection

Prompt:

Read the following text. Answer with only location names or places of interest mentioned, in brackets.

Rules:
- Output format: [LOCATION1],[LOCATION2]
- Write each location name exactly as mentioned, capitalized
- Include only explicitly named locations (not "there" or "here")
- No additional text, commentary, or explanation
- If no locations mentioned, output: [NONE]

Text to analyze:
"I leave the house and head to the office. On the way, I stop by the park."

Output:

Expected Output:

[HOUSE],[OFFICE],[PARK]

Code Integration:

import re

def update_location_keywords(chat_message):
    prompt = format_location_prompt(chat_message)
    raw_output = llm.generate(prompt, temperature=0.1)

    # Parse bracketed locations, skipping the explicit empty marker
    locations = [loc for loc in re.findall(r'\[([^\]]+)\]', raw_output)
                 if loc.upper() != 'NONE']

    # Add to setting keywords
    for location in locations:
        if location.lower() not in excluded_words:
            setting.add_keyword(location.lower())
            trigger_narrator_for_location(location)

    return locations

Example 3: Action Classification

Prompt:

Read the following text. Classify the action type with a single word in brackets.

Rules:
- Output format: [ACTION_TYPE]
- Valid types: [ATTACK],[DEFEND],[MOVE],[INTERACT],[SPEAK],[WAIT]
- Choose only ONE action type (the primary action)
- No additional text, commentary, or explanation
- If unclear or no action, output: [WAIT]

Text to analyze:
"I draw my sword and charge at the goblin!"

Output:

Expected Output:

[ATTACK]
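
Code Integration (a sketch in the same style as the earlier examples; format_action_prompt and the ambient llm object are hypothetical helpers):

import re

VALID_ACTIONS = {'ATTACK', 'DEFEND', 'MOVE', 'INTERACT', 'SPEAK', 'WAIT'}

def classify_action(player_message):
    prompt = format_action_prompt(player_message)
    raw_output = llm.generate(prompt, temperature=0.1)

    # Take the first bracketed token; fall back to WAIT on anything invalid,
    # mirroring the prompt's own "if unclear" rule
    match = re.search(r'\[([^\]]+)\]', raw_output)
    action = match.group(1).upper() if match else 'WAIT'
    return action if action in VALID_ACTIONS else 'WAIT'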

Example 4: Yes/No Decision

Prompt:

Read the following text. Determine if the character is interacting with the object "{{object_name}}".

Rules:
- Output format: [YES] or [NO]
- Answer [YES] only if explicit interaction with {{object_name}}
- Mere mention without interaction = [NO]
- No additional text, commentary, or explanation

Text to analyze:
{{player_message}}

Output:

Expected Output:

[YES]

or

[NO]

Code Integration:

def check_object_interaction(player_message, object_name):
    prompt = format_interaction_prompt(player_message, object_name)
    raw_output = llm.generate(prompt, temperature=0.1)
 
    if '[YES]' in raw_output.upper():
        trigger_object_event(object_name)
        return True
    return False

Format Markers: Why Brackets?

User-appl2613’s Explanation:

“You probably don’t need something like brackets but it just tells the LLM that you’re expecting a structured output vs. a creative one.”

Alternative Markers:

  • [ITEM] - Brackets (most common)
  • {ITEM} - Braces
  • <ITEM> - Angle brackets (risky: the model may treat them as HTML/XML tags)
  • |ITEM| - Pipes
  • ITEM:: - Colons (less reliable)
  • JSON format - {"item": "value"} (good for complex structures)

Why Brackets Work Best:

  1. Visual distinction: Clearly not natural prose
  2. Easy parsing: Simple regex: \[([^\]]+)\] (demonstrated in the sketch after this list)
  3. Convention: LLMs trained on code/data use brackets
  4. Unambiguous: Unlikely to appear naturally in text
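
A quick self-contained demonstration of that regex, using only the standard library. Note it tolerates extra prose around the brackets, which is the most common formatting failure:

import re

raw_output = "Sure! Here are the items: [SODA],[APPLE]"

# Extracts only the bracketed payloads, ignoring surrounding chatter
items = re.findall(r'\[([^\]]+)\]', raw_output)
print(items)  # ['SODA', 'APPLE']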

Effectiveness Notes

What Works Well

  • Deterministic parsing: Reliable extraction even from inconsistent LLMs
  • Model-agnostic: Works across GPT, Claude, local models
  • Low temperature effective: Temperature 0.1-0.3 gives consistent formatting
  • Clear instruction: Reduces hallucination by constraining creativity
  • Error handling: Can detect malformed output and retry
  • Production-proven: Used in ChatBot RPG for inventory, locations, actions

Known Limitations

  • Hallucination risk: Small models may invent items not in text
  • Context sensitivity: format adherence depends on supplied context; sparse prompts drift from the format more often
  • Ambiguity issues: LLM may interpret differently than intended
  • Model drift: Each model requires prompt tuning
  • Parsing brittleness: Extra text around brackets breaks simple parsers
  • Cost: Every format enforcement = an LLM call

Hallucination Warning

User-appl2613 (February 2024):

“this approach has a major weakness: hallucination, especially in small models. The more sensory-deprived the model is, the more it will hallucinate, because it has to (like our brains will in a sensory-depravation tank). Small models love to hallucinate”

Example of Hallucination:

Player input: "I eat my apple."

Expected: [APPLE]

Small model output: "[Post office] John eats his apple and then heads to the post office to pick up his package."

Mitigation:

  • Use lower temperature (0.1-0.2)
  • Provide more context (recent events, location, inventory); see the sketch after this list
  • Add explicit “only output what’s in the text” constraint
  • Use validation: check if extracted items make sense given context
  • Retry with rephrased prompt if output is malformed
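
A sketch of the "provide more context" mitigation, assuming simple string assembly; the location and inventory arguments stand in for whatever game state you track:

def format_consumption_prompt_with_context(player_message, location, inventory):
    # Grounding the model in known state gives it less room to invent
    context = (
        f"Current location: {location}\n"
        f"Inventory: {', '.join(inventory) or 'empty'}\n"
    )
    return (
        "Read the following text. Answer with only the singular form of "
        "consumed objects in brackets.\n\n"
        f"{context}\n"
        "Rules:\n"
        "- Output format: [ITEM1],[ITEM2]\n"
        "- Only output objects that appear in the text below\n"
        "- If no consumption occurs, output: [NONE]\n\n"
        f"Text to analyze:\n\"{player_message}\"\n\nOutput:"
    )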

Model Considerations

  • GPT-3.5/GPT-4: Very reliable, rarely breaks format
  • Local 7B-13B models: Need stricter instructions, more examples
  • Creative models (Hathor, etc.): May resist structured output, need firm constraints
  • Instruction-tuned models: Best for this task

Temperature Settings

Recommended Settings for Format Enforcement:

  • Temperature: 0.1-0.3 (lower = more consistent format)
  • Top-p: 0.85-0.9
  • Repetition Penalty: 1.0-1.1 (higher values can suppress legitimately repeated format markers)
  • Max Tokens: 50-100 (short output expected)

Note: Format enforcement is one of the few cases where temperature near 0 is beneficial.
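
Expressed as a reusable settings dict for the generic llm.generate interface used throughout this page (the parameter names are assumptions; adapt them to your backend):

FORMAT_ENFORCEMENT_PARAMS = {
    "temperature": 0.2,         # low: consistency over creativity
    "top_p": 0.9,
    "repetition_penalty": 1.0,  # don't punish repeated brackets/commas
    "max_tokens": 100,          # short, structured output expected
}

raw_output = llm.generate(prompt, **FORMAT_ENFORCEMENT_PARAMS)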

Variations

1. With Few-Shot Examples

Show correct format multiple times:

Read the text and extract consumed items in brackets.

Example 1:
Text: "I drink water and eat bread."
Output: [WATER],[BREAD]

Example 2:
Text: "She consumes two healing potions."
Output: [HEALING POTION]

Example 3:
Text: "He walks to the door."
Output: [NONE]

Now analyze:
Text: {{input_text}}
Output:
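
The shots can be assembled programmatically, which makes it easy to tune how many a given model needs (a sketch; FEW_SHOT_EXAMPLES is illustrative data, not part of the technique):

FEW_SHOT_EXAMPLES = [
    ('I drink water and eat bread.', '[WATER],[BREAD]'),
    ('She consumes two healing potions.', '[HEALING POTION]'),
    ('He walks to the door.', '[NONE]'),
]

def build_few_shot_prompt(input_text, examples=FEW_SHOT_EXAMPLES):
    parts = ["Read the text and extract consumed items in brackets.\n"]
    for i, (text, output) in enumerate(examples, 1):
        parts.append(f'Example {i}:\nText: "{text}"\nOutput: {output}\n')
    parts.append(f'Now analyze:\nText: {input_text}\nOutput:')
    return '\n'.join(parts)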

2. JSON Format (Complex Data)

For multi-field extraction:

Read the text and extract information as JSON only.

Format:
{
  "action": "verb describing action",
  "target": "object or character affected",
  "location": "where it occurs",
  "items_used": ["item1", "item2"]
}

Text: {{input_text}}

JSON:
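
Parsing this output needs a guard, because models sometimes wrap the JSON in prose. A sketch that isolates the outermost braces before decoding:

import json
import re

def parse_json_output(raw_output):
    # Grab the outermost brace pair in case the model added commentary
    match = re.search(r'\{.*\}', raw_output, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None  # caller can retry or fall back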

3. Multi-Stage Format

Extract multiple things in one pass:

Read the text and extract:

ITEMS: [item1],[item2]
LOCATIONS: [loc1],[loc2]
NPCS: [npc1],[npc2]
ACTION: [action_type]

Text: {{input_text}}

Output:
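
One regex per labeled line turns this into a dict in a single pass (a sketch; the field names match the prompt above):

import re

def parse_multi_stage(raw_output):
    result = {}
    for field in ('ITEMS', 'LOCATIONS', 'NPCS', 'ACTION'):
        # Find the line for this field, then collect its bracketed values
        line = re.search(rf'{field}:\s*(.+)', raw_output)
        values = re.findall(r'\[([^\]]+)\]', line.group(1)) if line else []
        result[field.lower()] = values
    return result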

4. Confidence Scoring

Include confidence with format:

Extract consumed items with confidence (0-10).

Format: [ITEM:confidence]

Example: [WATER:10],[MAYBE_BREAD:4]

Text: {{input_text}}

Output:
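
Parsing the [ITEM:confidence] pairs, with a threshold so low-confidence guesses can be dropped (a sketch; the threshold value is arbitrary):

import re

def parse_scored_items(raw_output, min_confidence=5):
    items = []
    for name, score in re.findall(r'\[([^\]:]+):(\d+)\]', raw_output):
        if int(score) >= min_confidence:
            items.append(name.strip())
    return items

# parse_scored_items("[WATER:10],[MAYBE_BREAD:4]") -> ['WATER']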

Validation and Error Handling

import re

class ParseError(Exception):
    """Raised when the LLM output contains no parseable format."""
    pass

def robust_format_extraction(text, max_retries=3):
    """
    Extract formatted data with retries and validation.
    """
    prompt = format_prompt(text)
    for attempt in range(max_retries):
        output = llm.generate(prompt, temperature=0.1)

        # Try to parse
        try:
            items = parse_bracketed_items(output)

            # Validate
            if validate_items(items, text):
                return items
            # Validation failed: carry the stricter prompt into the next attempt
            prompt = add_strictness(prompt, attempt)
        except ParseError:
            # Malformed output, retry
            continue

    # All retries failed, fall back to deterministic method
    return fallback_parser(text)

def parse_bracketed_items(output):
    """Extract items from [ITEM] format."""
    matches = re.findall(r'\[([^\]]+)\]', output)
    if not matches:
        raise ParseError("No bracketed items found")
    return [m.strip() for m in matches]

def validate_items(items, original_text):
    """Check if extracted items make sense."""
    if items == ['NONE']:
        return True

    # Check if items appear in original text
    text_lower = original_text.lower()
    for item in items:
        if item.lower() not in text_lower:
            # Item hallucinated, reject
            return False

    return True

Integration with Game Logic

import re

class FormatEnforcer:
    """Helper class for format-enforced LLM queries"""

    def __init__(self, llm, prompts):
        self.llm = llm
        self.prompts = prompts  # maps format_type -> template with {input_text}

    def extract_items(self, text, format_type='consumption'):
        prompt = self.prompts[format_type].format(input_text=text)
        output = self.llm.generate(prompt, temperature=0.2)
        return self.parse(output)

    def extract_yes_no(self, text, question):
        prompt = (
            f"{question}\n"
            "Answer only [YES] or [NO].\n"
            f"Text: {text}\n"
            "Output:"
        )
        output = self.llm.generate(prompt, temperature=0.1)
        return '[YES]' in output.upper()

    def extract_classification(self, text, categories):
        cat_list = ','.join(categories)
        prompt = (
            f"Classify the following text into one category: {cat_list}\n"
            "Output format: [CATEGORY]\n"
            f"Text: {text}\n"
            "Output:"
        )
        output = self.llm.generate(prompt, temperature=0.2)
        match = re.search(r'\[([^\]]+)\]', output)
        return match.group(1) if match else None

    def parse(self, output):
        return re.findall(r'\[([^\]]+)\]', output)

Testing Checklist

  • Test with clear, unambiguous input
  • Test with ambiguous input
  • Test with no relevant content (should return [NONE])
  • Test with multiple items
  • Test with duplicate items
  • Test for hallucination (items not in input)
  • Test parsing robustness (extra text before/after)
  • Test across different models
  • Test at different temperatures
  • Test with malformed input
  • Measure retry rate
  • Validate accuracy vs. ground truth (a harness sketch covering several of these checks follows)
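
A small harness covering several checklist items at once; the labeled cases and the format_prompt helper are hypothetical, but the compliance and hallucination metrics mirror the ones reported below:

import re

TEST_CASES = [
    ('I drink a soda to quench my thirst.', ['SODA']),
    ('I eat two apples.', ['APPLE']),    # duplicates collapse to one
    ('I walk to the door.', ['NONE']),   # no consumption occurs
]

def run_format_tests(llm, format_prompt):
    passed = hallucinated = 0
    for text, expected in TEST_CASES:
        output = llm.generate(format_prompt(text), temperature=0.2)
        items = re.findall(r'\[([^\]]+)\]', output)
        if items == expected:
            passed += 1
        # Hallucination check: an extracted item absent from the input text
        elif any(i.lower() not in text.lower() and i != 'NONE' for i in items):
            hallucinated += 1
    total = len(TEST_CASES)
    print(f"compliance: {passed}/{total}, hallucinated: {hallucinated}/{total}")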

Performance Metrics

From ChatBot RPG Production Use:

  • Format compliance: ~95% with GPT-3.5 at temp 0.2
  • Hallucination rate: ~5-10% with small local models, ~1% with GPT-3.5
  • Parse success rate: ~98% with bracketed format
  • Retry rate: ~2% need second attempt
  • Latency: 200-500ms per extraction (model-dependent)

Source

Original Discussion: Discord transcript lines 6550-6599 (February 10, 2024)

Context: User-appl2613 explaining ChatBot RPG's item consumption and location detection system

Key Quotes:

“You probably don’t need something like brackets but it just tells the LLM that you’re expecting a structured output vs. a creative one.”

“Then I’ll parse the results into: ‘soda’ and it checks if ‘soda’ is in the inventory. If it is, it subtracts 1 soda from the inventory and cures the player’s thirst.”

Warning on Hallucination:

“this approach has a major weakness: hallucination, especially in small models. Small models love to hallucinate… so it can take a fair bit of fanangling to get it to play nice”

Advanced: Function Calling vs Format Enforcement

Modern Alternative: OpenAI/Claude function calling

import json

# Instead of a format enforcement prompt, declare the schema
# (legacy OpenAI-style function calling; newer APIs expose the
# same idea through "tools"):
functions = [{
    "name": "extract_consumed_items",
    "parameters": {
        "type": "object",
        "properties": {
            "items": {
                "type": "array",
                "items": {"type": "string"}
            }
        }
    }
}]

response = llm.chat_completion(
    messages=[{"role": "user", "content": text}],
    functions=functions,
    function_call={"name": "extract_consumed_items"}
)

# function_call.arguments arrives as a JSON string, not a dict
items = json.loads(response.function_call.arguments)["items"]

Pros of Function Calling:

  • More reliable structure
  • Native support
  • Better for complex schemas

Pros of Format Enforcement:

  • Works with any model
  • Simpler implementation
  • More control over exact format
  • No special API features needed

Recommendation: Use function calling if available, format enforcement otherwise.
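
That recommendation can be encoded as a simple dispatch, falling back to the bracket technique when the backend has no native structured output. This is a sketch: supports_function_calling is a hypothetical capability flag, and both extract_via_* helpers stand in for the approaches described above:

def extract_items_any_backend(llm, text):
    if getattr(llm, 'supports_function_calling', False):
        # Native structured output: schema-constrained, most reliable
        return extract_via_function_call(llm, text)
    # Fallback: bracket-based format enforcement works with any model
    return extract_via_format_enforcement(llm, text)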