Prompt: Binary Classification (Yes/No Validation)

Metadata

Category: Reasoning
Technique: Binary classification with CoT
Models Tested: GPT-3.5, Llama 3 (8B), TinyLlama, Mistral 7B, various local models
Contributor: User-monkeyrithms, User-vali98
Status: Proven in production (ChatBot RPG game engine)

Purpose

Extract binary decisions from natural language input to drive game logic. Used for validating player actions, detecting intent, and triggering programmatic responses. This pattern enables deterministic game state management while using LLMs for flexible natural language understanding.

Key Innovation: Combines human-readable questions with machine-parseable answers. The LLM explains its reasoning (helpful for debugging) but commits to a binary [YES]/[NO] response that code can reliably parse.

Template

{{input_context}}

Is {{condition_to_check}}? Answer with [YES] or [NO]. Explain your answer step-by-step.
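
A minimal sketch of filling this template from code. The helper name `build_binary_prompt` and its `explain` flag are assumptions made for illustration, not part of the original project:

```python
def build_binary_prompt(input_context: str, condition_to_check: str,
                        explain: bool = True) -> str:
    """Fill the binary-classification template with concrete values."""
    # Drop any trailing "?" so the template's own question mark is not doubled.
    condition = condition_to_check.strip().rstrip("?")
    prompt = f"{input_context.strip()}\n\nIs {condition}? Answer with [YES] or [NO]."
    if explain:
        prompt += " Explain your answer step-by-step."
    return prompt


prompt = build_binary_prompt(
    "Player eats the apple.",
    "player interacting with an object right now",
)
print(prompt)
```

Keeping the template in one function means every question in the game uses the same answer format, so a single parser can handle all of them.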

Grammar-Constrained Version

For backends that support grammar constraints (koboldcpp, llama.cpp):

Grammar: root ::= "[YES]" | "[NO]"

{{input_context}}

Is {{condition_to_check}}?
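
As a hedged sketch, the grammar-constrained prompt could be sent to a llama.cpp server roughly like this. The field names (`prompt`, `grammar`, `n_predict`, `temperature`) and the `content` response key follow the llama.cpp server's `/completion` endpoint as I understand it; the URL, port, and helper names are assumptions, and koboldcpp uses a different request shape, so check the documentation for your backend and version:

```python
import json
import urllib.request

# GBNF grammar that only allows the two bracketed answers.
YES_NO_GRAMMAR = 'root ::= "[YES]" | "[NO]"'


def build_completion_payload(input_context: str, condition_to_check: str) -> dict:
    """Build a JSON body for a grammar-constrained completion request."""
    return {
        "prompt": f"{input_context}\n\nIs {condition_to_check}?",
        "grammar": YES_NO_GRAMMAR,
        "n_predict": 8,        # a few tokens are enough for "[YES]" or "[NO]"
        "temperature": 0.0,
    }


def ask_server(payload: dict, url: str = "http://127.0.0.1:8080/completion") -> str:
    """POST the payload and return the generated text (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["content"]


payload = build_completion_payload(
    "Player eats the apple.", "player interacting with an object"
)
print(json.dumps(payload, indent=2))
```

Only the payload is built here; `ask_server` is defined but not called, so the sketch runs without a server.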

Variables

Variable	Description	Example
{{input_context}}	The text to analyze	Player’s action, dialogue, game log excerpt
{{condition_to_check}}	The binary question to answer	“player interacting with an object”, “quest completed”, “combat initiated”

Usage Examples

Example 1: Object Interaction Detection

Input:

{{{INPUT}}}
"Hmm, that looks good. I think I want to eat that apple." Player eats the apple.

Is player interacting with an object right now? Answer with [YES] or [NO]. Explain your answer step-by-step.

Expected Output:

{{{OUTPUT}}}
[YES]: Yes, the player is interacting with an object. The text explicitly states "Player eats the apple," which is a direct interaction with the object "apple". The player is performing the action of eating, which requires physical interaction with the food item.

Code Parsing:

import re

def parse_binary_response(llm_output: str) -> bool:
    """Extract YES/NO from LLM response. Unparseable output counts as NO."""
    text = llm_output.strip()
    # Fast path: the answer is at the very start, as the prompt requests
    if text.startswith("[YES]"):
        return True
    if text.startswith("[NO]"):
        return False
    # Fallback: use the first bracketed answer found anywhere in the output
    match = re.search(r'\[(YES|NO)\]', text, re.IGNORECASE)
    return match is not None and match.group(1).upper() == "YES"

response = llm.generate(prompt)
is_interacting = parse_binary_response(response)

if is_interacting:
    # Extract item name with follow-up prompt
    item_name = extract_item(context)
    game_state.use_item(item_name)

Example 2: Item Extraction

Input:

{{{INPUT}}}
"Hmm, that looks good. I think I want to eat that apple." Player eats the apple.

What is the item the Player is using? Answer with the item inside square brackets.

Expected Output:

{{{OUTPUT}}}
[apple]
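
A small sketch of parsing this kind of answer. The function name `parse_bracketed_item` is hypothetical, and lowercasing the result is an assumption made so the value can be compared against inventory keys:

```python
import re
from typing import Optional


def parse_bracketed_item(llm_output: str) -> Optional[str]:
    """Return the first non-empty [bracketed] value that is not YES/NO, else None."""
    for match in re.finditer(r"\[([^\[\]]+)\]", llm_output):
        value = match.group(1).strip()
        if value and value.upper() not in {"YES", "NO"}:
            return value.lower()
    return None


print(parse_bracketed_item("[apple]"))                       # apple
print(parse_bracketed_item("The player uses the [Apple]."))  # apple
print(parse_bracketed_item("No item mentioned."))            # None
```

Returning `None` instead of raising lets the caller decide whether to retry the question or fall back to generic narration.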

Example 3: Quest Completion Check

Input:

{{{INPUT}}}
Player: "Here is the ancient artifact you requested, wise elder."
Old Man: "Ah, thank you brave adventurer! You have saved our village."

Player hands over the Amulet of Zephyr to the Old Man.

Did the player return the artifact to the old man? Answer with [YES] or [NO].

Expected Output:

{{{OUTPUT}}}
[YES]: The player explicitly handed over the Amulet of Zephyr to the Old Man, which is the artifact requested. The Old Man's response confirms receipt.

Code Integration:

if check_quest_completion(game_log, "return_artifact_to_elder"):
    game_state.complete_quest("village_savior")
    game_state.advance_quest_stage("main_story", stage=2)
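
The page does not show `check_quest_completion` itself. One way it might look, as a sketch only: the `QUEST_QUESTIONS` mapping, the extra `generate` parameter, and the `fake_generate` stand-in are all assumptions added so the example runs without a model:

```python
from typing import Callable

# Hypothetical mapping from quest-condition ids to binary questions.
QUEST_QUESTIONS = {
    "return_artifact_to_elder": "Did the player return the artifact to the old man?",
}


def check_quest_completion(game_log: str, condition_id: str,
                           generate: Callable[[str], str]) -> bool:
    """Ask the model whether the quest condition is met in the given log excerpt."""
    question = QUEST_QUESTIONS[condition_id]
    prompt = f"{game_log}\n\n{question} Answer with [YES] or [NO]."
    answer = generate(prompt).strip().upper()
    # Anything that does not clearly start with [YES] counts as "not completed".
    return answer.startswith("[YES]")


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "[YES]: The player handed over the Amulet of Zephyr."


log = "Player hands over the Amulet of Zephyr to the Old Man."
print(check_quest_completion(log, "return_artifact_to_elder", fake_generate))  # True
```

Treating unparseable output as "not completed" is the conservative choice here: a missed completion can be re-checked on the next turn, while a false completion advances the story incorrectly.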

Example 4: Combat Detection

Input:

{{{INPUT}}}
Player: "I draw my sword and charge at the bandit!"
The player rushes forward with weapon raised.

Has combat been initiated? Answer with [YES] or [NO].

Expected Output:

{{{OUTPUT}}}
[YES]: Combat has been initiated. The player has drawn a weapon and is performing an aggressive action ("charge at the bandit"), which are clear indicators of entering combat.

Effectiveness Notes

What Works Well

[YES]/[NO] Brackets: Makes parsing trivial with startswith() checks
“Explain step-by-step” suffix: Improves accuracy on smaller models via Chain of Thought
Token limiting: Restrict to 10-50 tokens to prevent rambling while preserving reasoning
Grammar constraints: Forces exact [YES] or [NO] output on compatible backends
Simple questions work best: Binary, unambiguous conditions
Context isolation: Feed only relevant text (not entire game history)

Known Limitations

False positives on intent: LLMs may answer YES if player intends to do something but hasn’t done it yet
- Solution: Exclude quoted dialogue, focus on action descriptions only
Complex multi-part questions fail: “Did player drop item AND pick up new item?”
- Solution: Break into separate binary questions
Negation confusion: “Did player NOT eat the apple?” is harder than “Did player eat the apple?”
- Solution: Frame questions positively
Model-specific quirks: Some questions are inexplicably hard for certain models
- Example: “Did player drop item?” consistently fails on Llama 2 7B
- Solution: Test each question, route difficult ones to stronger models
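
The first solution above (exclude quoted dialogue) can be sketched as a preprocessing step. This is an illustrative helper, not the project's actual code; it assumes dialogue is always wrapped in straight or curly double quotes:

```python
import re

# Matches text enclosed in straight ("...") or curly (“...”) double quotes.
QUOTED_DIALOGUE = re.compile(r'"[^"]*"|“[^”]*”')


def strip_dialogue(text: str) -> str:
    """Remove quoted dialogue so only narrated actions remain."""
    without_quotes = QUOTED_DIALOGUE.sub(" ", text)
    # Collapse the whitespace left behind by the removed quotes.
    return " ".join(without_quotes.split())


raw = '"Hmm, that looks good. I think I want to eat that apple." Player eats the apple.'
print(strip_dialogue(raw))  # Player eats the apple.
```

Speaker labels such as `Player:` are left in place by this sketch; a real pipeline might want to drop lines that become empty apart from the label.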

The Question Tree Pattern

From User-monkeyrithms’s architecture:

Binary questions form a decision tree:

Q1: "Is player interacting with object?" → [YES]
  ↓
Q2: "What item?" → [apple]
  ↓
Q3: "Is [apple] in inventory?" → Check game state
  ↓
If YES: Remove item, update hunger status
If NO: Generate "You don't have that item" response

Each question is self-contained. The program parses answers and decides which question to ask next.

Model-Specific Performance

Model	Accuracy	Notes
GPT-3.5	99%+	Rarely fails, can handle complex questions
GPT-4	99%+	Overkill for this task, use for complex reasoning
Llama 3 (8B)	90-95%	Very good with clear questions
Mistral 7B	85-90%	Good balance of speed and accuracy
TinyLlama	70-80%	Works for simple questions, unreliable for complex
Llama 2 (7B)	60-80%	Inconsistent, avoid for production

Variations

Minimal (Token-Efficient)

{{context}}

{{question}}? [YES] or [NO]:

Output: [YES] or [NO] only (no explanation)

Confidence Scoring

{{context}}

{{question}}? Answer format:
[YES/NO]: [Confidence 0-100%]: [Explanation]

Output: [YES]: 95%: The text clearly states...
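
A sketch of parsing this three-part format. The names are hypothetical, and the confidence value is whatever the model reports about itself, which should not be assumed to be a calibrated probability:

```python
import re
from typing import Optional, Tuple

CONFIDENCE_PATTERN = re.compile(
    r"\[(YES|NO)\]\s*:\s*(\d{1,3})\s*%\s*:?\s*(.*)",
    re.IGNORECASE | re.DOTALL,
)


def parse_confidence_response(llm_output: str) -> Optional[Tuple[bool, int, str]]:
    """Return (answer, confidence_percent, explanation), or None if malformed."""
    match = CONFIDENCE_PATTERN.search(llm_output)
    if match is None:
        return None
    confidence = int(match.group(2))
    if confidence > 100:
        return None  # out-of-range value; treat the whole response as malformed
    answer = match.group(1).upper() == "YES"
    return answer, confidence, match.group(3).strip()


result = parse_confidence_response(
    "[YES]: 95%: The text clearly states the player eats the apple."
)
print(result)  # (True, 95, 'The text clearly states the player eats the apple.')
```

A caller could, for example, re-ask the question on a stronger model when the reported confidence falls below a threshold chosen by testing.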

Multi-Choice (Beyond Binary)

{{context}}

What action is the player performing?
[ATTACK] [DEFEND] [MOVE] [INTERACT] [DIALOGUE] [NONE]
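
A sketch of validating a multi-choice answer against the allowed labels, plus a GBNF grammar string built from the same list for backends that support grammars. The names `ACTION_LABELS`, `ACTION_GRAMMAR`, and `parse_action_label` are assumptions for illustration:

```python
import re
from typing import Optional

ACTION_LABELS = ("ATTACK", "DEFEND", "MOVE", "INTERACT", "DIALOGUE", "NONE")

# Grammar allowing exactly one of the bracketed labels.
ACTION_GRAMMAR = "root ::= " + " | ".join(f'"[{label}]"' for label in ACTION_LABELS)


def parse_action_label(llm_output: str) -> Optional[str]:
    """Return the first bracketed label that is in ACTION_LABELS, else None."""
    for candidate in re.findall(r"\[([A-Za-z]+)\]", llm_output):
        label = candidate.upper()
        if label in ACTION_LABELS:
            return label
    return None


print(ACTION_GRAMMAR)
print(parse_action_label("[ATTACK]: The player charges at the bandit."))  # ATTACK
print(parse_action_label("The player is talking."))                       # None
```

Checking against the known label set guards against a model inventing a new category such as `[FLEE]`.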

Grammar-Constrained Item Extraction

Grammar: root ::= "[" item-name "]"
         item-name ::= [a-z]+

What item is the player using?

Forces output like [apple], [sword], etc.

Integration Patterns

Multi-Model Routing

From User-monkeyrithms’s approach:

# Easy questions → Fast local model
if question_type == "object_interaction":
    response = local_model_7b.generate(prompt, max_tokens=10)
 
# Hard questions → Smarter model
elif question_type == "quest_logic":
    response = gpt35.generate(prompt, max_tokens=50)
 
# Creative narration → Creative model
elif question_type == "narration":
    response = hathor_13b.generate(prompt, max_tokens=200, temperature=0.9)

Validation Chain

def validate_player_action(player_input: str, game_state: GameState) -> ActionResult:
    """Multi-step validation using binary classification."""
    item_name = None  # Stays None when the action does not involve an item

    # Step 1: Is this an action or just dialogue?
    is_action = classify_binary(
        f"{player_input}\n\nIs this a game action? Answer with [YES] or [NO]."
    )
    if not is_action:
        return ActionResult(type="dialogue", valid=True)

    # Step 2: Does action involve an item?
    uses_item = classify_binary(
        f"{player_input}\n\nDoes this action use an item? Answer with [YES] or [NO]."
    )

    if uses_item:
        # Step 3: Extract item name
        item_name = extract_item(player_input)

        # Step 4: Validate inventory
        if not game_state.player.has_item(item_name):
            return ActionResult(
                type="action",
                valid=False,
                error=f"You don't have {item_name}"
            )

    return ActionResult(type="action", valid=True, item=item_name)

Temperature Settings

Recommended: 0.1 - 0.3 (low temperature for deterministic binary output)
With CoT explanation: 0.3 - 0.5 (allow some reasoning variation)
Grammar-constrained: 0.0 (the grammar fixes the output format; greedy decoding makes the choice between [YES] and [NO] repeatable)
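
These settings could be kept in one place rather than scattered across call sites. A sketch, where the specific values are picked from inside the ranges above and the names and token limits are assumptions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPreset:
    temperature: float
    max_tokens: int


# Temperatures chosen from the recommended ranges; token limits are assumptions.
PRESETS = {
    "binary": SamplingPreset(temperature=0.2, max_tokens=10),
    "binary_with_cot": SamplingPreset(temperature=0.4, max_tokens=50),
    "grammar": SamplingPreset(temperature=0.0, max_tokens=10),
}


def preset_for(explain: bool, use_grammar: bool) -> SamplingPreset:
    """Pick sampling settings for one classification call."""
    if use_grammar:
        return PRESETS["grammar"]
    return PRESETS["binary_with_cot" if explain else "binary"]


print(preset_for(explain=True, use_grammar=False))
```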

Testing Checklist

Test your binary classification prompts:

Answers consistently start with [YES] or [NO]
False positives: Doesn’t confuse intent with action
False negatives: Catches all valid cases
Handles edge cases (empty input, ambiguous actions)
Works on target model (test small models separately)
Parsing code handles malformed responses gracefully
Token limit prevents excessive output
Questions are unambiguous and binary
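
Several checklist items (false positives, false negatives, edge cases) can be measured with a small labelled test set. The harness below is a sketch: the test cases are illustrative, and `keyword_classifier` is a toy stand-in for a real LLM-backed classifier so the example runs offline:

```python
from typing import Callable, Dict, List, Tuple

# (context, question, expected answer)
TestCase = Tuple[str, str, bool]


def evaluate(classify: Callable[[str, str], bool],
             cases: List[TestCase]) -> Dict[str, float]:
    """Run labelled cases and count false positives and false negatives."""
    if not cases:
        raise ValueError("at least one test case is required")
    false_positives = false_negatives = correct = 0
    for context, question, expected in cases:
        predicted = classify(context, question)
        if predicted == expected:
            correct += 1
        elif predicted:
            false_positives += 1
        else:
            false_negatives += 1
    return {
        "accuracy": correct / len(cases),
        "false_positives": false_positives,
        "false_negatives": false_negatives,
    }


def keyword_classifier(context: str, question: str) -> bool:
    """Toy stand-in for the LLM call: answers YES when the context mentions 'eats'."""
    return "eats" in context


cases = [
    ("Player eats the apple.", "Is player interacting with an object", True),
    ('"I want to eat that apple."', "Is player interacting with an object", False),
    ("", "Is player interacting with an object", False),
    ("Player picks up the sword.", "Is player interacting with an object", True),
]
print(evaluate(keyword_classifier, cases))
```

Running the same cases against each target model gives a per-question picture of which questions need to be routed to a stronger model.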

Performance Metrics

From ChatBot RPG production use:

Accuracy: 95%+ with GPT-3.5, 85-90% with Llama 3 8B
Latency: 50-200ms per classification (local 7B models)
Token cost: 10-50 tokens per question (with explanation)
Token cost (grammar): 1-5 tokens (forced binary output)

Related Prompts

format-enforcement - General structured output
chain-of-thought - Reasoning patterns
action-narration - What happens after validation succeeds
anti-hallucination - Prevents inventing false states

Source

Discussion context: Lines 1330-1398 of transcript. User-monkeyrithms explaining the “question tree” approach used in ChatBot RPG to extract game state from natural language.

Key insight from monkeyrithms:

“So its a combination of function calls, and being clever about prompting. What I mean by that last part is you really have to think everything through… asking the object/apple question would likely result in a false positive, especially with smaller ‘dumber’ models, because the context suggests that the intent might be to eat an apple. So in this case, you have to check all the text that is not the enclosed-quotes dialogue.”

On model quirks:

“What I’ve found is that the questions models find easier/harder to answer, are not the same as what -we- find easier or harder to answer. There’s a lot of quirks you just have to work with. Like I have one question with multiple possible answers and multiple moving parts to it, so you’d think that’d be harder for a model to get right, but even the little ones do that one well. Then I have another question, ‘did the player drop an item,’ and for some unknown reason, LLMs just really really struggle with that one.”

Screenshot: Media/image-D3FB9.png shows TinyLlama successfully handling binary classification with step-by-step explanation.

Related threads: 01-Architecture-and-Design, 02-Prompt-Engineering

Best Practices

DO:

✅ Use [YES] and [NO] with square brackets for easy parsing
✅ Ask for step-by-step explanation to improve accuracy (CoT)
✅ Limit tokens (10-50 range) to prevent rambling
✅ Frame questions positively (“Did X happen?” not “Did X not happen?”)
✅ Test each question individually on target model
✅ Isolate relevant context (don’t feed entire chat history)
✅ Use grammar constraints when available
✅ Route hard questions to stronger models

DON’T:

❌ Ask multi-part questions (“Did X and Y happen?”)
❌ Include player dialogue in action detection context
❌ Use Yes/No without brackets (harder to parse)
❌ Assume all models handle all questions equally
❌ Trust high temperature outputs
❌ Skip testing on production model
❌ Parse without fallback for malformed output

Code Example: Complete Implementation

from typing import Optional
import re
 
class BinaryClassifier:
    """LLM-based binary classification for game state validation."""
 
    def __init__(self, model, use_grammar: bool = False):
        self.model = model
        self.use_grammar = use_grammar
 
    def classify(
        self,
        context: str,
        question: str,
        explain: bool = True,
        max_tokens: int = 50
    ) -> tuple[bool, Optional[str]]:
        """
        Perform binary classification.
 
        Returns:
            (result: bool, explanation: Optional[str])
        """
        if self.use_grammar:
            prompt = f'{context}\n\n{question}?'
            grammar = 'root ::= "[YES]" | "[NO]"'
            response = self.model.generate(
                prompt,
                grammar=grammar,
                max_tokens=10,
                temperature=0.0
            )
        else:
            explain_suffix = " Explain your answer step-by-step." if explain else ""
            prompt = f'{context}\n\n{question}? Answer with [YES] or [NO].{explain_suffix}'
            response = self.model.generate(
                prompt,
                max_tokens=max_tokens,
                temperature=0.2
            )
 
        # Parse response
        result = self._parse_binary(response)
        explanation = self._extract_explanation(response) if explain else None
 
        return result, explanation
 
    def _parse_binary(self, response: str) -> bool:
        """Extract YES/NO from response."""
        response = response.strip()
 
        # Direct prefix match (fastest)
        if response.startswith("[YES]"):
            return True
        if response.startswith("[NO]"):
            return False
 
        # Regex fallback
        if re.search(r'\[YES\]', response, re.IGNORECASE):
            return True
        if re.search(r'\[NO\]', response, re.IGNORECASE):
            return False
 
        # Default to False if unparseable
        return False
 
    def _extract_explanation(self, response: str) -> Optional[str]:
        """Extract explanation after [YES]/[NO]."""
        match = re.search(r'\[(YES|NO)\]:?\s*(.+)', response, re.IGNORECASE | re.DOTALL)
        if match:
            return match.group(2).strip()
        return None
 
# Usage
classifier = BinaryClassifier(model=llm, use_grammar=False)
 
player_input = '"That apple looks tasty." Player eats the apple.'
is_action, explanation = classifier.classify(
    context=player_input,
    question="Is player interacting with an object right now",
    explain=True
)
 
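if is_action:
    print(f"Action detected: {explanation}")
    # Follow-up: extract the item name. This is not a YES/NO question, so
    # build the prompt directly instead of going through classify().
    item_prompt = (
        f"{player_input}\n\n"
        "What is the item the Player is using? "
        "Answer with the item inside square brackets."
    )
    item_response = llm.generate(item_prompt, max_tokens=10, temperature=0.2)
    item_match = re.search(r'\[([^\[\]]+)\]', item_response)
    item = item_match.group(1).strip() if item_match else None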

Advanced: Question Tree Framework

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Question:
    """A node in the validation question tree."""
    prompt: str  # Binary question without the trailing "?" (classify() adds it)
    on_yes: Optional['Question'] = None
    on_no: Optional['Question'] = None
    action_yes: Optional[Callable[[str], None]] = None  # Called with the context
    action_no: Optional[Callable[[str], None]] = None   # Called with the context

# Define question tree. Item extraction is not a YES/NO question, so it runs
# inside the action callback (via extract_item) rather than as a tree node.
q_interact = Question(
    prompt="Is player interacting with an object",
    action_yes=lambda context: game.use_item(extract_item(context)),
    action_no=lambda context: game.generate_generic_narration()
)

# Execute tree
def execute_question_tree(root: Question, context: str, classifier: BinaryClassifier):
    """Traverse question tree based on LLM answers."""
    current = root

    while current:
        result, explanation = classifier.classify(context, current.prompt)

        if result:  # [YES]
            if current.action_yes:
                current.action_yes(context)
            current = current.on_yes
        else:  # [NO]
            if current.action_no:
                current.action_no(context)
            current = current.on_no
