ChatBotRPG - Production Lessons

Analysis Date: 2026-01-18 Source: Discord discussions and design decisions by appl2613


Overview

This document captures key insights and lessons learned from real-world ChatBotRPG usage, including design trade-offs, performance optimizations, and UX philosophy.


Lesson 1: The 170-Token Sweet Spot

Discovery

Finding: Shorter, more frequent messages (170 tokens ≈ 2-3 sentences) feel more responsive than longer narrations, despite using the same total token count.

From Discord (appl2613):

“a rapid series of shorter messages, but a more responsive world, and characters who talk in short lines or in a conversational manner that feels very realtime”

The Problem with Long Narrations

Before (avg 500 tokens per turn):

You enter the Golden Oak Inn, a cozy establishment with a massive oak
tree growing through the center of the common room. The warm firelight
flickers across worn wooden tables where patrons sit drinking and
talking. The smell of roasted meat and fresh bread fills the air,
making your stomach rumble. A bard strums a lute in the corner, playing
a melancholic tune. The bartender, a gruff-looking man with a thick
beard, wipes down glasses behind the bar. Several patrons glance your
way as you enter, but most return to their conversations. You notice
a notice board near the entrance covered in various postings.

Issues:

  • Feels slow and ponderous
  • Player waits 3-5 seconds for response
  • Information overload
  • Less engaging pacing
  • Higher cost ($0.10+ per session)

The Solution: 170-Token Limit

After (170 tokens per turn):

You enter the Golden Oak Inn. The warm firelight flickers across worn
wooden tables. The smell of roasted meat and ale fills the air.

Benefits:

  1. Faster response time: 1-2 seconds vs. 3-5 seconds
  2. More responsive feeling: Rapid back-and-forth like real conversation
  3. Better pacing: Players stay engaged
  4. 50-66% cost reduction: 0.10 per session
  5. Front-loaded coherence: Shorter outputs = model focuses on most important tokens

Implementation Strategy

Three-layer enforcement:

  1. API-level limit: max_tokens=170
  2. Prompt instruction: “Write EXACTLY 2-3 sentences”
  3. Post-processing: Regex to enforce sentence count
def enforce_170_token_limit(prompt: str, llm_client) -> str:
    # Layer 1: API limit
    response = llm_client.complete(prompt, max_tokens=170)
 
    # Layer 2: Prompt instruction (already in prompt)
    # "Write EXACTLY 2-3 sentences"
 
    # Layer 3: Post-processing
    sentences = re.split(r'(?<=[.!?])\s+', response)
    if len(sentences) > 3:
        sentences = sentences[:3]
 
    return ' '.join(sentences)

Cost Analysis

Token LimitTokens/TurnTokens/Session (200 turns)Cost (Gemini 2.5)Savings
No limit (500)500100,000$0.100-
300 tokens30060,000$0.06040%
170 tokens17034,000$0.03466%

ROI: 66% cost reduction with improved UX

When to Break the Rule

Exceptions where longer narrations are justified:

  1. Scene transitions: First time entering major location (250 tokens)
  2. Important revelations: Plot twists or major discoveries (300 tokens)
  3. Combat finishers: Dramatic final blows (200 tokens)
  4. Cutscenes: Scripted story moments (unlimited)

Example exception:

if is_first_visit(location) and location.importance == "major":
    max_tokens = 250  # Allow richer description
else:
    max_tokens = 170  # Standard limit

Lesson 2: The “Diamond Horses” Problem

Discovery

Finding: Creative models (like Hathor) hallucinate impossible events without explicit constraints, even when given character sheets and world state.

From Discord (yukidaore testing Hathor):

“Before constraints: Death Knight Mara with massive battleaxe appeared instead of bartender. Diamond horses manifested. Teleportation happened spontaneously. Guard gained magic powers mid-combat.”

The Problem

Creative models are too creative:

  • Invent items not in inventory
  • Give NPCs abilities they don’t have
  • Violate physics without magic
  • Change predetermined outcomes
  • Introduce new characters spontaneously

Example failure:

Player: "I talk to the bartender"
Game state: {location: "inn", npcs: ["bartender"]}

LLM (Hathor, no constraints):
"You approach the bar, but instead of the bartender, a towering Death
Knight named Mara stands behind it, her massive battleaxe gleaming.
Diamond horses burst through the windows, their crystalline hooves
sparking. You suddenly teleport to the roof."

The Solution: Explicit Constraint System

Core constraints (injected into every prompt):

{{char}} is a logical and realistic text adventure game.

CRITICAL RULES:
1. Player has NO equipment/items/powers unless in character sheet
2. Impossible actions MUST fail with appropriate narration
3. Do NOT invent new items, locations, or characters
4. Do NOT change dice roll outcomes or predetermined results
5. Do NOT violate setting physics or logic
6. NPCs cannot spontaneously gain abilities they don't have
7. NO teleportation, flight, or magic unless explicitly permitted

VALIDATION PROCESS:
Before narrating any action:
1. Check if action is physically possible
2. Verify player has required items/abilities
3. Confirm NPCs are acting within their capabilities
4. Ensure world rules are respected

Results

ModelBefore ConstraintsAfter ConstraintsCompliance Rate
Hathor❌ Frequent hallucinations✅ Rare issues~90%
Gemini 2.5⚠️ Occasional invention✅ Excellent~98%
EstopianMaid⚠️ Minor inconsistencies✅ Very good~95%

Constraint Evolution

Phase 1: Implicit expectations (failed)

  • “Respect the character sheet”
  • “Stay consistent with the world”
  • Result: Models ignored hints

Phase 2: Explicit rules (better)

  • “Player only has items in inventory”
  • “NPCs can’t use magic”
  • Result: ~70% compliance

Phase 3: Validation instructions (best)

  • “Before narrating, check if action is possible”
  • Numbered rules (easier to follow)
  • Explicit “DO NOT” statements
  • Result: ~95% compliance

Implementation Pattern

def build_constrained_prompt(event: dict, context: dict) -> str:
    return f"""
{ANTI_HALLUCINATION_CONSTRAINTS}
 
Current State:
- Location: {context['location']}
- Player inventory: {context['player_inventory']}
- Player abilities: {context['player_abilities']}
- NPCs present: {context['npcs']}
 
Event to narrate:
{format_event(event)}
 
VALIDATION CHECKLIST:
☐ Action is physically possible
☐ Player has required items/abilities
☐ NPCs acting within capabilities
☐ World rules respected
 
Narrate this event (2-3 sentences):
"""

When Constraints Fail

Remaining edge cases (~5% of cases):

  1. Novel action combinations not covered by rules
  2. Ambiguous physics (e.g., “Can I climb that?“)
  3. Creative interpretation of abilities
  4. Complex multi-step actions

Fallback strategy:

  • Backend validation catches invalid outputs
  • Reject and regenerate with stronger constraints
  • Use lower temperature (0.3) for problematic cases

Lesson 3: Desktop GUI vs. Web Interface Trade-off

The Decision

Choice: PyQt5 desktop application instead of web interface

Rationale:

  1. Direct file access: .world/.save files locally stored
  2. Offline capability: No server dependency
  3. Visual editor feasibility: Complex UI easier in desktop framework
  4. Faster iteration: Game designers can work without deployment

Advantages of Desktop Approach

For Game Designers:

  • ✅ Edit files directly with text editor as backup
  • ✅ No hosting costs
  • ✅ Instant iteration (no deploy cycle)
  • ✅ Full control over data
  • ✅ Works without internet

For Players:

  • ✅ Privacy (no server-side data)
  • ✅ Offline play
  • ✅ Portable saves
  • ✅ Faster load times (local)

For Development:

  • ✅ Easier debugging (local execution)
  • ✅ No backend infrastructure
  • ✅ Simpler deployment (just distribute .exe)
  • ✅ Visual rule editor more feasible

Trade-offs and Limitations

What we lose:

  • ❌ No built-in multiplayer
  • ❌ No cloud saves (unless added separately)
  • ❌ Installation friction (download & install vs. web link)
  • ❌ Platform-specific builds (Windows, Mac, Linux)
  • ❌ No automatic updates (must download new version)
  • ❌ Larger download size (includes Python runtime)

What we gain:

  • ✅ Full control over execution
  • ✅ Zero latency (no network)
  • ✅ No server costs
  • ✅ Privacy-first design
  • ✅ Modding-friendly (direct file access)

Comparison to Web-Based Alternatives

AspectDesktop (ChatBotRPG)Web (e.g., ReallmCraft)
InstallationDownload & installClick link
Offline✅ Full functionality❌ Requires server
Multiplayer❌ Not built-in✅ Natural fit
DistributionShare .world filesHosted on server
UpdatesManual downloadAutomatic
Modding✅ Easy file editing⚠️ Limited access
Privacy✅ Fully local⚠️ Server-side data
CostOne-time devOngoing hosting

When Desktop Makes Sense

Desktop is better for:

  • Single-player RPGs
  • Creative tools (world editors)
  • Privacy-sensitive applications
  • Offline-first use cases
  • Modding communities
  • Zero hosting budget

Web is better for:

  • Multiplayer games
  • Collaborative editing
  • Cross-platform (no install)
  • Automatic updates
  • Social features
  • Monetization (subscriptions)

ChatBotRPG’s Unique Hybrid Approach

Innovation: .world files as “game cartridges”

Game Designer:
1. Create world in desktop editor
2. Export fantasy_realm.world file
3. Share file on Discord/forums

Player:
1. Download ChatBotRPG desktop app (once)
2. Download fantasy_realm.world (once per game)
3. Load world and play
4. Save progress in fantasy_realm_playthrough_001.save

Benefits:

  • Distribution like ROM files (easy sharing)
  • Portability (email a .world file)
  • Version control (git track .world files)
  • Community content (share on marketplaces)

Lesson 4: UX Philosophy - Consequential Decision-Making

The Design Choice

Philosophy: Deliberately restrict features to enforce consequences

From Discord (appl2613):

“This tool is a game engine, not an experimental chatbot interface”

Intentionally Removed Features

What’s NOT included (on purpose):

  1. No copy/paste chat history

    • Rationale: Prevent “optimal path” meta-gaming
    • Force players to remember events naturally
  2. No redo/undo for actions

    • Rationale: Enforce consequences
    • Actions have permanent effects
  3. Limited save slots

    • Rationale: Discourage save scumming
    • Make decisions matter
  4. No “best ending” guide mode

    • Rationale: Emergent storytelling
    • Each playthrough is unique
  5. No rewind to previous turn

    • Rationale: Time moves forward
    • Accept outcomes

The Psychology

Goal: Create tension and investment

Without restrictions:

Player sees difficult choice
  ↓
Player saves game
  ↓
Player tries option A
  ↓
Player doesn't like result
  ↓
Player reloads and tries option B
  ↓
Player optimizes for "best" outcome
  ↓
No real tension or consequences

With restrictions:

Player sees difficult choice
  ↓
Player knows choice is permanent
  ↓
Player carefully considers options
  ↓
Player makes decision
  ↓
Player accepts outcome (good or bad)
  ↓
Player emotionally invested in result

Case Study: The Bartender Incident

Scenario: Player accidentally insults bartender

Without restrictions:

  • Player: “Oops, let me undo that”
  • Player: rewinds to previous turn
  • Player: tries friendlier approach
  • Result: No consequence

With restrictions:

  • Player: “Oh no, the bartender is angry at me now”
  • Player: must deal with consequences
  • Player: finds creative solution or accepts it
  • Result: Memorable story moment

Trade-off Analysis

What we lose:

  • ❌ Experimentation without consequences
  • ❌ “Perfect playthrough” optimization
  • ❌ Safety net for mistakes
  • ❌ Exploration of all branches

What we gain:

  • ✅ Emotional investment
  • ✅ Natural tension
  • ✅ Memorable moments (good and bad)
  • ✅ Replayability (different choices each time)
  • ✅ Authentic role-playing

When to Allow Exceptions

Legitimate reasons to permit undo:

  1. Misunderstanding command: “I meant to say ‘look’ not ‘attack’”
  2. Technical glitch: LLM generated nonsense
  3. Tutorial mode: Let new players experiment
  4. Debugging: Testing game mechanics

Implementation:

class ActionHistory:
    def __init__(self, undo_enabled: bool = False):
        self.history = []
        self.undo_enabled = undo_enabled
 
    def can_undo(self, reason: str) -> bool:
        if self.undo_enabled:
            return True
 
        # Allow undo for legitimate reasons
        legitimate_reasons = [
            "command_misunderstood",
            "technical_error",
            "tutorial_mode"
        ]
        return reason in legitimate_reasons

Player Feedback

Positive reactions:

“I love that my choices actually matter. Every decision feels important.”

“Got my favorite NPC killed by accident. Devastating but made the story better.”

Negative reactions:

“I wish I could undo that. I didn’t know the bartender would attack me.”

Design decision: Accept some negative feedback to preserve core tension


Lesson 5: Performance Characteristics

Response Time Optimization

Goal: < 2 seconds for narration generation

Achieved metrics:

TaskTargetActualModel
Narration< 2s1-2sGemini 2.5 Flash Lite
Dialogue< 1s0.5-1sGemini 2.5 Flash Lite
Intent extraction< 0.5s0.3-0.5sGemini 2.5 Flash Lite
Scene description< 2s1-2sGemini 2.5 Flash Lite

Optimization strategies:

  1. Model selection: Fast model (Gemini 2.5 Flash Lite) over quality

    • Flash Lite: 1-2s response
    • Pro: 3-5s response
    • Trade-off: Slightly lower quality for much better UX
  2. Token limiting: 170-token limit reduces generation time

    • Fewer tokens = faster generation
    • Less computation needed
  3. Streaming disabled: Complete responses only

    • Rationale: Full sentence enforcement easier
    • Trade-off: No progressive display
  4. Caching: Scene descriptions cached locally

    • Golden Oak Inn description reused
    • Only regenerate on state changes

Cost Management

Production costs (200-turn session):

ComponentCost
Narration (200 turns)$0.034
Dialogue (50 turns)$0.003
Intent extraction (200 turns)$0.002
Scene descriptions (10 turns)$0.002
Total per session$0.041

Optimization strategies:

  1. 170-token limit: 66% cost reduction
  2. Intent extraction: Use tiny model (0.1 temp, 5 tokens)
  3. Cached descriptions: Reuse when possible
  4. Dialogue compression: 50 tokens max
  5. Batch nothing: Turn-by-turn only (no batch API)

Scale projection:

  • 1,000 sessions/month: $41
  • 10,000 sessions/month: $410
  • 100,000 sessions/month: $4,100

Comparison to alternatives:

  • Without optimizations: $100 per 1,000 sessions (2.4x more)
  • With streaming generation: Same cost but better UX (could improve)

Scalability Considerations

Desktop architecture scales differently than web:

Advantages:

  • ✅ No server scaling needed (runs on player’s machine)
  • ✅ No database scaling (local SQLite)
  • ✅ No bandwidth costs (local execution)
  • ✅ Linear cost scaling (per-user API calls only)

Limitations:

  • ⚠️ Player hardware dependent (Python + PyQt5 + LLM calls)
  • ⚠️ API rate limits (if many users)
  • ⚠️ Distribution bandwidth (download .exe)

Lesson 6: Evolution from JSON to SQLite

The Problem with JSON Files

Initial architecture:

/data/worlds/fantasy_realm/
  ├── settings.json
  ├── locations/
  │   ├── tavern.json
  │   ├── castle.json
  │   └── forest.json
  ├── npcs/
  │   ├── bartender.json
  │   ├── guard.json
  │   └── merchant.json
  └── items/
      ├── sword.json
      └── potion.json

Issues:

  1. File clutter: Dozens of files per world
  2. Distribution complexity: Zip entire folder
  3. Query difficulty: Can’t easily find “all NPCs in tavern”
  4. Version control conflicts: Git merge conflicts on folder structures
  5. No referential integrity: Broken references between files
  6. Slow loading: Read many files sequentially

The Solution: SQLite

New architecture:

/saves/
  ├── fantasy_realm.world    # Complete game in one file
  └── playthrough_001.save   # Player's progress in one file

Benefits:

  1. Single-file distribution: Share one .world file
  2. Fast queries: SQL joins for complex queries
  3. Referential integrity: Foreign keys enforce consistency
  4. Transactional: ACID guarantees
  5. Smaller file size: Binary format + compression
  6. Git-friendly: Single file easier to version

Migration Strategy

Backward compatibility:

def load_world(path: str) -> WorldState:
    if path.endswith('.world'):
        return load_from_sqlite(path)
    elif os.path.isdir(path):
        return load_from_json_folder(path)
    else:
        raise ValueError("Unknown world format")
 
def migrate_json_to_sqlite(json_folder: str, output_world: str):
    """Convert old JSON format to new SQLite format"""
    world = load_from_json_folder(json_folder)
    save_to_sqlite(world, output_world)

Performance Comparison

OperationJSON FolderSQLite .worldImprovement
Load world2.5s0.3s8.3x faster
Find NPCs in location1.2s0.05s24x faster
Save changes3.1s0.1s31x faster
Distribution size25 MB8 MB3.1x smaller

Key Takeaways

Design Principles

  1. Constraints enable creativity - Limiting LLM output (170 tokens) improved UX
  2. Explicit > Implicit - Clear constraints prevent hallucinations
  3. Trade-offs are features - Removing undo creates tension
  4. Local-first has benefits - Desktop app enables unique distribution model
  5. Evolution is necessary - JSON → SQLite improved everything

Technical Insights

  1. Token limiting is powerful - 66% cost reduction + better UX
  2. Model selection matters - Fast model > quality model for narration
  3. Validation before generation - Check action validity first
  4. Three-tier persistence works - World, playthrough, session separation
  5. Visual programming scales - StarCraft-inspired triggers accessible

Product Insights

  1. Target game designers - Not just developers
  2. Distribution model matters - .world files like game cartridges
  3. Consequences create investment - Limited saves = emotional engagement
  4. Desktop still viable - Not everything needs to be web-based
  5. Community content - Easy sharing enables user-generated worlds

Cross-References


Tags

production-lessons insights design-decisions optimization ux-philosophy trade-offs performance chatbotrpg real-world-usage


Next: Anti-Hallucination Deep Dive

See 08-Anti-Hallucination-System for detailed implementation of constraint validation system.