ChatBotRPG - Production Lessons
Analysis Date: 2026-01-18 Source: Discord discussions and design decisions by appl2613
Overview
This document captures key insights and lessons learned from real-world ChatBotRPG usage, including design trade-offs, performance optimizations, and UX philosophy.
Lesson 1: The 170-Token Sweet Spot
Discovery
Finding: Shorter, more frequent messages (170 tokens ≈ 2-3 sentences) feel more responsive than longer narrations, despite using the same total token count.
From Discord (appl2613):
“a rapid series of shorter messages, but a more responsive world, and characters who talk in short lines or in a conversational manner that feels very realtime”
The Problem with Long Narrations
Before (avg 500 tokens per turn):
You enter the Golden Oak Inn, a cozy establishment with a massive oak
tree growing through the center of the common room. The warm firelight
flickers across worn wooden tables where patrons sit drinking and
talking. The smell of roasted meat and fresh bread fills the air,
making your stomach rumble. A bard strums a lute in the corner, playing
a melancholic tune. The bartender, a gruff-looking man with a thick
beard, wipes down glasses behind the bar. Several patrons glance your
way as you enter, but most return to their conversations. You notice
a notice board near the entrance covered in various postings.
Issues:
- Feels slow and ponderous
- Player waits 3-5 seconds for response
- Information overload
- Less engaging pacing
- Higher cost ($0.10+ per session)
The Solution: 170-Token Limit
After (170 tokens per turn):
You enter the Golden Oak Inn. The warm firelight flickers across worn
wooden tables. The smell of roasted meat and ale fills the air.
Benefits:
- Faster response time: 1-2 seconds vs. 3-5 seconds
- More responsive feeling: Rapid back-and-forth like real conversation
- Better pacing: Players stay engaged
- 50-66% cost reduction: 0.10 per session
- Front-loaded coherence: Shorter outputs = model focuses on most important tokens
Implementation Strategy
Three-layer enforcement:
- API-level limit:
max_tokens=170 - Prompt instruction: “Write EXACTLY 2-3 sentences”
- Post-processing: Regex to enforce sentence count
def enforce_170_token_limit(prompt: str, llm_client) -> str:
# Layer 1: API limit
response = llm_client.complete(prompt, max_tokens=170)
# Layer 2: Prompt instruction (already in prompt)
# "Write EXACTLY 2-3 sentences"
# Layer 3: Post-processing
sentences = re.split(r'(?<=[.!?])\s+', response)
if len(sentences) > 3:
sentences = sentences[:3]
return ' '.join(sentences)Cost Analysis
| Token Limit | Tokens/Turn | Tokens/Session (200 turns) | Cost (Gemini 2.5) | Savings |
|---|---|---|---|---|
| No limit (500) | 500 | 100,000 | $0.100 | - |
| 300 tokens | 300 | 60,000 | $0.060 | 40% |
| 170 tokens | 170 | 34,000 | $0.034 | 66% |
ROI: 66% cost reduction with improved UX
When to Break the Rule
Exceptions where longer narrations are justified:
- Scene transitions: First time entering major location (250 tokens)
- Important revelations: Plot twists or major discoveries (300 tokens)
- Combat finishers: Dramatic final blows (200 tokens)
- Cutscenes: Scripted story moments (unlimited)
Example exception:
if is_first_visit(location) and location.importance == "major":
max_tokens = 250 # Allow richer description
else:
max_tokens = 170 # Standard limitLesson 2: The “Diamond Horses” Problem
Discovery
Finding: Creative models (like Hathor) hallucinate impossible events without explicit constraints, even when given character sheets and world state.
From Discord (yukidaore testing Hathor):
“Before constraints: Death Knight Mara with massive battleaxe appeared instead of bartender. Diamond horses manifested. Teleportation happened spontaneously. Guard gained magic powers mid-combat.”
The Problem
Creative models are too creative:
- Invent items not in inventory
- Give NPCs abilities they don’t have
- Violate physics without magic
- Change predetermined outcomes
- Introduce new characters spontaneously
Example failure:
Player: "I talk to the bartender"
Game state: {location: "inn", npcs: ["bartender"]}
LLM (Hathor, no constraints):
"You approach the bar, but instead of the bartender, a towering Death
Knight named Mara stands behind it, her massive battleaxe gleaming.
Diamond horses burst through the windows, their crystalline hooves
sparking. You suddenly teleport to the roof."
The Solution: Explicit Constraint System
Core constraints (injected into every prompt):
{{char}} is a logical and realistic text adventure game.
CRITICAL RULES:
1. Player has NO equipment/items/powers unless in character sheet
2. Impossible actions MUST fail with appropriate narration
3. Do NOT invent new items, locations, or characters
4. Do NOT change dice roll outcomes or predetermined results
5. Do NOT violate setting physics or logic
6. NPCs cannot spontaneously gain abilities they don't have
7. NO teleportation, flight, or magic unless explicitly permitted
VALIDATION PROCESS:
Before narrating any action:
1. Check if action is physically possible
2. Verify player has required items/abilities
3. Confirm NPCs are acting within their capabilities
4. Ensure world rules are respected
Results
| Model | Before Constraints | After Constraints | Compliance Rate |
|---|---|---|---|
| Hathor | ❌ Frequent hallucinations | ✅ Rare issues | ~90% |
| Gemini 2.5 | ⚠️ Occasional invention | ✅ Excellent | ~98% |
| EstopianMaid | ⚠️ Minor inconsistencies | ✅ Very good | ~95% |
Constraint Evolution
Phase 1: Implicit expectations (failed)
- “Respect the character sheet”
- “Stay consistent with the world”
- Result: Models ignored hints
Phase 2: Explicit rules (better)
- “Player only has items in inventory”
- “NPCs can’t use magic”
- Result: ~70% compliance
Phase 3: Validation instructions (best)
- “Before narrating, check if action is possible”
- Numbered rules (easier to follow)
- Explicit “DO NOT” statements
- Result: ~95% compliance
Implementation Pattern
def build_constrained_prompt(event: dict, context: dict) -> str:
return f"""
{ANTI_HALLUCINATION_CONSTRAINTS}
Current State:
- Location: {context['location']}
- Player inventory: {context['player_inventory']}
- Player abilities: {context['player_abilities']}
- NPCs present: {context['npcs']}
Event to narrate:
{format_event(event)}
VALIDATION CHECKLIST:
☐ Action is physically possible
☐ Player has required items/abilities
☐ NPCs acting within capabilities
☐ World rules respected
Narrate this event (2-3 sentences):
"""When Constraints Fail
Remaining edge cases (~5% of cases):
- Novel action combinations not covered by rules
- Ambiguous physics (e.g., “Can I climb that?“)
- Creative interpretation of abilities
- Complex multi-step actions
Fallback strategy:
- Backend validation catches invalid outputs
- Reject and regenerate with stronger constraints
- Use lower temperature (0.3) for problematic cases
Lesson 3: Desktop GUI vs. Web Interface Trade-off
The Decision
Choice: PyQt5 desktop application instead of web interface
Rationale:
- Direct file access: .world/.save files locally stored
- Offline capability: No server dependency
- Visual editor feasibility: Complex UI easier in desktop framework
- Faster iteration: Game designers can work without deployment
Advantages of Desktop Approach
For Game Designers:
- ✅ Edit files directly with text editor as backup
- ✅ No hosting costs
- ✅ Instant iteration (no deploy cycle)
- ✅ Full control over data
- ✅ Works without internet
For Players:
- ✅ Privacy (no server-side data)
- ✅ Offline play
- ✅ Portable saves
- ✅ Faster load times (local)
For Development:
- ✅ Easier debugging (local execution)
- ✅ No backend infrastructure
- ✅ Simpler deployment (just distribute .exe)
- ✅ Visual rule editor more feasible
Trade-offs and Limitations
What we lose:
- ❌ No built-in multiplayer
- ❌ No cloud saves (unless added separately)
- ❌ Installation friction (download & install vs. web link)
- ❌ Platform-specific builds (Windows, Mac, Linux)
- ❌ No automatic updates (must download new version)
- ❌ Larger download size (includes Python runtime)
What we gain:
- ✅ Full control over execution
- ✅ Zero latency (no network)
- ✅ No server costs
- ✅ Privacy-first design
- ✅ Modding-friendly (direct file access)
Comparison to Web-Based Alternatives
| Aspect | Desktop (ChatBotRPG) | Web (e.g., ReallmCraft) |
|---|---|---|
| Installation | Download & install | Click link |
| Offline | ✅ Full functionality | ❌ Requires server |
| Multiplayer | ❌ Not built-in | ✅ Natural fit |
| Distribution | Share .world files | Hosted on server |
| Updates | Manual download | Automatic |
| Modding | ✅ Easy file editing | ⚠️ Limited access |
| Privacy | ✅ Fully local | ⚠️ Server-side data |
| Cost | One-time dev | Ongoing hosting |
When Desktop Makes Sense
Desktop is better for:
- Single-player RPGs
- Creative tools (world editors)
- Privacy-sensitive applications
- Offline-first use cases
- Modding communities
- Zero hosting budget
Web is better for:
- Multiplayer games
- Collaborative editing
- Cross-platform (no install)
- Automatic updates
- Social features
- Monetization (subscriptions)
ChatBotRPG’s Unique Hybrid Approach
Innovation: .world files as “game cartridges”
Game Designer:
1. Create world in desktop editor
2. Export fantasy_realm.world file
3. Share file on Discord/forums
Player:
1. Download ChatBotRPG desktop app (once)
2. Download fantasy_realm.world (once per game)
3. Load world and play
4. Save progress in fantasy_realm_playthrough_001.save
Benefits:
- Distribution like ROM files (easy sharing)
- Portability (email a .world file)
- Version control (git track .world files)
- Community content (share on marketplaces)
Lesson 4: UX Philosophy - Consequential Decision-Making
The Design Choice
Philosophy: Deliberately restrict features to enforce consequences
From Discord (appl2613):
“This tool is a game engine, not an experimental chatbot interface”
Intentionally Removed Features
What’s NOT included (on purpose):
-
No copy/paste chat history
- Rationale: Prevent “optimal path” meta-gaming
- Force players to remember events naturally
-
No redo/undo for actions
- Rationale: Enforce consequences
- Actions have permanent effects
-
Limited save slots
- Rationale: Discourage save scumming
- Make decisions matter
-
No “best ending” guide mode
- Rationale: Emergent storytelling
- Each playthrough is unique
-
No rewind to previous turn
- Rationale: Time moves forward
- Accept outcomes
The Psychology
Goal: Create tension and investment
Without restrictions:
Player sees difficult choice
↓
Player saves game
↓
Player tries option A
↓
Player doesn't like result
↓
Player reloads and tries option B
↓
Player optimizes for "best" outcome
↓
No real tension or consequences
With restrictions:
Player sees difficult choice
↓
Player knows choice is permanent
↓
Player carefully considers options
↓
Player makes decision
↓
Player accepts outcome (good or bad)
↓
Player emotionally invested in result
Case Study: The Bartender Incident
Scenario: Player accidentally insults bartender
Without restrictions:
- Player: “Oops, let me undo that”
- Player: rewinds to previous turn
- Player: tries friendlier approach
- Result: No consequence
With restrictions:
- Player: “Oh no, the bartender is angry at me now”
- Player: must deal with consequences
- Player: finds creative solution or accepts it
- Result: Memorable story moment
Trade-off Analysis
What we lose:
- ❌ Experimentation without consequences
- ❌ “Perfect playthrough” optimization
- ❌ Safety net for mistakes
- ❌ Exploration of all branches
What we gain:
- ✅ Emotional investment
- ✅ Natural tension
- ✅ Memorable moments (good and bad)
- ✅ Replayability (different choices each time)
- ✅ Authentic role-playing
When to Allow Exceptions
Legitimate reasons to permit undo:
- Misunderstanding command: “I meant to say ‘look’ not ‘attack’”
- Technical glitch: LLM generated nonsense
- Tutorial mode: Let new players experiment
- Debugging: Testing game mechanics
Implementation:
class ActionHistory:
def __init__(self, undo_enabled: bool = False):
self.history = []
self.undo_enabled = undo_enabled
def can_undo(self, reason: str) -> bool:
if self.undo_enabled:
return True
# Allow undo for legitimate reasons
legitimate_reasons = [
"command_misunderstood",
"technical_error",
"tutorial_mode"
]
return reason in legitimate_reasonsPlayer Feedback
Positive reactions:
“I love that my choices actually matter. Every decision feels important.”
“Got my favorite NPC killed by accident. Devastating but made the story better.”
Negative reactions:
“I wish I could undo that. I didn’t know the bartender would attack me.”
Design decision: Accept some negative feedback to preserve core tension
Lesson 5: Performance Characteristics
Response Time Optimization
Goal: < 2 seconds for narration generation
Achieved metrics:
| Task | Target | Actual | Model |
|---|---|---|---|
| Narration | < 2s | 1-2s | Gemini 2.5 Flash Lite |
| Dialogue | < 1s | 0.5-1s | Gemini 2.5 Flash Lite |
| Intent extraction | < 0.5s | 0.3-0.5s | Gemini 2.5 Flash Lite |
| Scene description | < 2s | 1-2s | Gemini 2.5 Flash Lite |
Optimization strategies:
-
Model selection: Fast model (Gemini 2.5 Flash Lite) over quality
- Flash Lite: 1-2s response
- Pro: 3-5s response
- Trade-off: Slightly lower quality for much better UX
-
Token limiting: 170-token limit reduces generation time
- Fewer tokens = faster generation
- Less computation needed
-
Streaming disabled: Complete responses only
- Rationale: Full sentence enforcement easier
- Trade-off: No progressive display
-
Caching: Scene descriptions cached locally
- Golden Oak Inn description reused
- Only regenerate on state changes
Cost Management
Production costs (200-turn session):
| Component | Cost |
|---|---|
| Narration (200 turns) | $0.034 |
| Dialogue (50 turns) | $0.003 |
| Intent extraction (200 turns) | $0.002 |
| Scene descriptions (10 turns) | $0.002 |
| Total per session | $0.041 |
Optimization strategies:
- 170-token limit: 66% cost reduction
- Intent extraction: Use tiny model (0.1 temp, 5 tokens)
- Cached descriptions: Reuse when possible
- Dialogue compression: 50 tokens max
- Batch nothing: Turn-by-turn only (no batch API)
Scale projection:
- 1,000 sessions/month: $41
- 10,000 sessions/month: $410
- 100,000 sessions/month: $4,100
Comparison to alternatives:
- Without optimizations: $100 per 1,000 sessions (2.4x more)
- With streaming generation: Same cost but better UX (could improve)
Scalability Considerations
Desktop architecture scales differently than web:
Advantages:
- ✅ No server scaling needed (runs on player’s machine)
- ✅ No database scaling (local SQLite)
- ✅ No bandwidth costs (local execution)
- ✅ Linear cost scaling (per-user API calls only)
Limitations:
- ⚠️ Player hardware dependent (Python + PyQt5 + LLM calls)
- ⚠️ API rate limits (if many users)
- ⚠️ Distribution bandwidth (download .exe)
Lesson 6: Evolution from JSON to SQLite
The Problem with JSON Files
Initial architecture:
/data/worlds/fantasy_realm/
├── settings.json
├── locations/
│ ├── tavern.json
│ ├── castle.json
│ └── forest.json
├── npcs/
│ ├── bartender.json
│ ├── guard.json
│ └── merchant.json
└── items/
├── sword.json
└── potion.json
Issues:
- File clutter: Dozens of files per world
- Distribution complexity: Zip entire folder
- Query difficulty: Can’t easily find “all NPCs in tavern”
- Version control conflicts: Git merge conflicts on folder structures
- No referential integrity: Broken references between files
- Slow loading: Read many files sequentially
The Solution: SQLite
New architecture:
/saves/
├── fantasy_realm.world # Complete game in one file
└── playthrough_001.save # Player's progress in one file
Benefits:
- ✅ Single-file distribution: Share one .world file
- ✅ Fast queries: SQL joins for complex queries
- ✅ Referential integrity: Foreign keys enforce consistency
- ✅ Transactional: ACID guarantees
- ✅ Smaller file size: Binary format + compression
- ✅ Git-friendly: Single file easier to version
Migration Strategy
Backward compatibility:
def load_world(path: str) -> WorldState:
if path.endswith('.world'):
return load_from_sqlite(path)
elif os.path.isdir(path):
return load_from_json_folder(path)
else:
raise ValueError("Unknown world format")
def migrate_json_to_sqlite(json_folder: str, output_world: str):
"""Convert old JSON format to new SQLite format"""
world = load_from_json_folder(json_folder)
save_to_sqlite(world, output_world)Performance Comparison
| Operation | JSON Folder | SQLite .world | Improvement |
|---|---|---|---|
| Load world | 2.5s | 0.3s | 8.3x faster |
| Find NPCs in location | 1.2s | 0.05s | 24x faster |
| Save changes | 3.1s | 0.1s | 31x faster |
| Distribution size | 25 MB | 8 MB | 3.1x smaller |
Key Takeaways
Design Principles
- Constraints enable creativity - Limiting LLM output (170 tokens) improved UX
- Explicit > Implicit - Clear constraints prevent hallucinations
- Trade-offs are features - Removing undo creates tension
- Local-first has benefits - Desktop app enables unique distribution model
- Evolution is necessary - JSON → SQLite improved everything
Technical Insights
- Token limiting is powerful - 66% cost reduction + better UX
- Model selection matters - Fast model > quality model for narration
- Validation before generation - Check action validity first
- Three-tier persistence works - World, playthrough, session separation
- Visual programming scales - StarCraft-inspired triggers accessible
Product Insights
- Target game designers - Not just developers
- Distribution model matters - .world files like game cartridges
- Consequences create investment - Limited saves = emotional engagement
- Desktop still viable - Not everything needs to be web-based
- Community content - Easy sharing enables user-generated worlds
Cross-References
Related Analysis
Related Discussions
Tags
production-lessons insights design-decisions optimization ux-philosophy trade-offs performance chatbotrpg real-world-usage
Next: Anti-Hallucination Deep Dive
See 08-Anti-Hallucination-System for detailed implementation of constraint validation system.