ChatBotRPG - Failed Experiments & Lessons Learned
Developer: appl2613
Analysis Period: June 17 - August 27, 2025
Total Commits Analyzed: 183
Revert Commits: 0
Deletion Commits: 3
Overview
ChatBotRPG exhibits remarkably few failed experiments: only 3 deletion commits (2 of them meaningful) across 183 commits, and zero explicit reverts. This suggests one or more of the following:
- Strong architectural vision - Correct decisions made early
- Extensive local testing - Failures filtered before GitHub
- Iterative refinement - Fix problems incrementally rather than abandon approaches
Key Finding: Most “failures” were not abandoned approaches, but rather continuous improvements until success.
Explicit Failures (Deletions)
1. Test File Cleanup
Commit 042d34f (August 27, 2025)
Delete src/testMapping.py
Analysis: Final cleanup before release
Evidence: Last commit of the development period
Conclusion: Not a failed experiment, just housekeeping
2. Module Reorganization
Commit c671687 (July 13, 2025)
Delete src/add_tab.py
Context: On the same day as the deletion, the file was re-added in a different location.
Commit 53a45e3 (July 11, 2025):
Create add_tab.py
Commit c671687 (July 13, 2025):
Delete src/add_tab.py
[Followed by upload to src/core/add_tab.py]
Analysis: Module moved to the correct package, not abandoned
Conclusion: Refactoring, not failure
3. No Other Deletions
Total File Deletions: 2 meaningful deletions across 183 commits
Revert Commits: 0
Percentage: 1.1% of commits involved deletions (2 / 183)
Interpretation: Extraordinarily stable development process
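The 1.1% figure is plain arithmetic (2 deletion commits out of 183). A minimal sketch of how such rates could be computed, assuming the history has already been exported into simple per-commit records (for example from `git log --diff-filter=D --name-only`, which lists the files each commit deleted). `CommitSummary` and `churn_rates` are hypothetical names, not part of ChatBotRPG or any git tooling, and treating a message that starts with "Revert" as a revert relies on git's default `git revert` message.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class CommitSummary:
    """Hypothetical per-commit record (not a git or GitHub API type)."""
    sha: str
    message: str
    deleted_files: int  # number of files removed by this commit


def churn_rates(commits: Iterable[CommitSummary]) -> Dict[str, float]:
    """Percentage of commits that delete files or are reverts."""
    commits = list(commits)
    total = len(commits)
    if total == 0:
        return {'total': 0, 'deletion_pct': 0.0, 'revert_pct': 0.0}
    deletions = sum(1 for c in commits if c.deleted_files > 0)
    # git revert's default message starts with: Revert "<original subject>"
    reverts = sum(1 for c in commits if c.message.startswith('Revert'))
    return {
        'total': total,
        'deletion_pct': round(100 * deletions / total, 1),
        'revert_pct': round(100 * reverts / total, 1),
    }


# Illustrative history: 181 ordinary commits plus the two deletions above
history = [CommitSummary(f'c{i}', 'Update file', 0) for i in range(181)]
history += [
    CommitSummary('042d34f', 'Delete src/testMapping.py', 1),
    CommitSummary('c671687', 'Delete src/add_tab.py', 1),
]
print(churn_rates(history))
# {'total': 183, 'deletion_pct': 1.1, 'revert_pct': 0.0}
```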
Implicit Failures (Iteration as Recovery)
1. World Editor Performance Issues
The Problem
From README (August 27, 2025):
“Why is the map editor so CPU heavy?”
“I do plan to prioritize optimizations in the future - but for now, just getting it working at all was a really huge task for me.”
Known Issues:
“Small window popup on game switch or loading.”
“Windows might twitch or adjust upon receiving some chat messages and other events. This is being worked on.”
What Didn’t Work
Hypothesis: Initial rendering approach too heavyweight
Evidence:
- World Editor present in initial codebase (July 13)
- Only 6 updates over 40 days (low iteration count)
- Performance issues acknowledged in README
What Was Tried:
- PyQt5 canvas rendering
- Real-time map updates
- Complex visual effects
The Outcome
Status: Shipped with known performance issues
Philosophy: Functional > perfect
Trade-off: Usability vs. optimization
Lesson: “Good enough” ships, “perfect” doesn’t
Cross-Reference: World Editor in Refactorings
2. Rules Engine Complexity
The Problem
Evidence: 40+ commits refining the rules engine over 46 days
Hotspot: Most actively evolved system
What Didn’t Work Initially:
- Simple condition evaluation - Couldn’t handle nested logic
- Flat rule structure - No support for complex chains
- Validation timing - Errors caught too late
Evolution of Solutions
Attempt 1: Simple Comparisons (July 12)
def evaluate_rule(rule, state):
    if rule['type'] == 'comparison':
        left = state[rule['left']]
        right = rule['right']
        return left == right  # Only == supported
Problem: Real games need <, >, !=, AND, OR
Attempt 2: Multiple Comparison Types (July 18)
def evaluate_rule(rule, state):
    operators = {
        '==': lambda l, r: l == r,
        '!=': lambda l, r: l != r,
        '<': lambda l, r: l < r,
        '>': lambda l, r: l > r,
    }
    # Resolve operands before dispatching on the operator
    left = state[rule['left']]
    right = rule['right']
    return operators[rule['operator']](left, right)
Problem: No support for AND/OR chains
Attempt 3: Compound Conditions (July 20)
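def evaluate_rule(rule, state):
    if rule['type'] == 'compound':
        # Sub-conditions go to the flat comparison evaluator from Attempt 2
        # (called evaluate_comparison here for illustration), so a compound
        # cannot contain another compound
        if rule['operator'] == 'AND':
            return all(evaluate_comparison(sub, state)
                       for sub in rule['conditions'])
        elif rule['operator'] == 'OR':
            return any(evaluate_comparison(sub, state)
                       for sub in rule['conditions'])
    return evaluate_comparison(rule, state)
Problem: Only one level of nesting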
Attempt 4: Recursive Evaluation (August 27)
def evaluate_rule(rule, state):
    """Recursive evaluator with arbitrary nesting"""
    if rule['type'] == 'atomic':
        return evaluate_atomic(rule, state)
    elif rule['operator'] == 'AND':
        return all(evaluate_rule(sub, state)
                   for sub in rule['conditions'])
    elif rule['operator'] == 'OR':
        return any(evaluate_rule(sub, state)
                   for sub in rule['conditions'])
    elif rule['operator'] == 'NOT':
        return not evaluate_rule(rule['condition'], state)
    # Fail loudly instead of silently returning None
    raise ValueError(f"Unknown operator: {rule['operator']!r}")
Success: Arbitrary nesting, short-circuit evaluation (all() and any() stop at the first deciding result), NOT support
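To see Attempt 4 work end to end, here is a self-contained sketch with a concrete `evaluate_atomic`. The rule schema (`type`, `operator`, `left`, `right`, `conditions`, `condition`) is inferred from the snippets above, and the `<=`/`>=` comparators are an added assumption; this is an illustration, not the actual ChatBotRPG rules-engine code.

```python
import operator
from typing import Any, Mapping

# Comparator table; <= and >= are assumed additions to Attempt 2's set
COMPARATORS = {
    '==': operator.eq,
    '!=': operator.ne,
    '<': operator.lt,
    '>': operator.gt,
    '<=': operator.le,
    '>=': operator.ge,
}


def evaluate_atomic(rule: Mapping[str, Any], state: Mapping[str, Any]) -> bool:
    """Compare a state variable against a literal value."""
    left = state[rule['left']]
    return COMPARATORS[rule['operator']](left, rule['right'])


def evaluate_rule(rule: Mapping[str, Any], state: Mapping[str, Any]) -> bool:
    """Recursively evaluate atomic, AND, OR and NOT nodes."""
    if rule['type'] == 'atomic':
        return evaluate_atomic(rule, state)
    op = rule['operator']
    if op == 'AND':
        return all(evaluate_rule(sub, state) for sub in rule['conditions'])
    if op == 'OR':
        return any(evaluate_rule(sub, state) for sub in rule['conditions'])
    if op == 'NOT':
        return not evaluate_rule(rule['condition'], state)
    raise ValueError(f"Unknown operator: {op!r}")


# Usage: gold >= 10 AND NOT (location == 'tavern' OR hp < 5)
state = {'gold': 12, 'location': 'forest', 'hp': 20}
rule = {
    'type': 'compound', 'operator': 'AND',
    'conditions': [
        {'type': 'atomic', 'left': 'gold', 'operator': '>=', 'right': 10},
        {'type': 'compound', 'operator': 'NOT',
         'condition': {
             'type': 'compound', 'operator': 'OR',
             'conditions': [
                 {'type': 'atomic', 'left': 'location',
                  'operator': '==', 'right': 'tavern'},
                 {'type': 'atomic', 'left': 'hp', 'operator': '<', 'right': 5},
             ]}},
    ],
}
print(evaluate_rule(rule, state))  # True
```

Because `all()` and `any()` consume a generator, evaluation stops at the first deciding sub-condition, so later conditions that reference missing state keys are never touched.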
Lessons Learned
- Start simple - Don’t build recursion until you need it
- Iterate based on use cases - Real games revealed complexity
- 40 commits is normal - Complex domains take time
Quote from Discord (veritasr, 2024):
“Fixed some bugs, and did a little refactoring / cleanup.”
Pattern: Acknowledge complexity, tackle incrementally
3. Character Inference (Hallucination Control)
The Problem
Evidence: 11 commits over 10 days refining narration
What Didn’t Work Initially:
- No output length control - Narrations varied wildly (10-500 tokens)
- No anti-hallucination constraints - LLM invented new locations/NPCs
- Inconsistent perspective - Switched between 1st/2nd/3rd person
- Over-creativity - Purple prose instead of functional narration
Evolution of Solutions
Attempt 1: Basic Prompt (July 13)
prompt = f"{context}\n\nPlayer: {player_action}\n\nNarrator:"
narration = llm_api.complete(prompt)
Problems:
- Length varied (10-500 tokens)
- Hallucinations common
- Repetitive phrasing (“Elara smiles warmly”)
Attempt 2: Token Limiting (July 18)
narration = llm_api.complete(
    prompt,
    max_tokens=170  # Hard cap: bounds the maximum length, not the minimum
)
Problems:
- Still hallucinating
- Inconsistent perspective
- Too creative (flowery language)
Attempt 3: System Prompt Constraints (July 19)
system_prompt = """
You are the narrator for a text adventure game.
RULES:
- Only narrate what the player can observe
- Do not invent new locations, items, or characters
- Stick to provided context
- Maximum 170 tokens
"""
narration = llm_api.complete(
    # Separate the sections so context and action don't run together
    "\n\n".join([system_prompt, context, player_action]),
    max_tokens=170
)
Problems:
- Still some hallucinations
- Perspective drift
Attempt 4: Explicit Perspective Enforcement (July 21)
system_prompt = """
You are the narrator for a text adventure game.
RULES:
- Only narrate what the player can observe
- Do not invent new locations, items, or characters
- Stick to provided context
- Maximum 170 tokens
- Use present tense, second person ("You see...")
- Do not reveal character thoughts unless stated
"""Success: Reduced hallucinations to acceptable levels
Attempt 5: Stop Sequences (July 22)
narration = llm_api.complete(
    "\n\n".join([system_prompt, context, player_action]),
    max_tokens=170,
    stop_sequences=["\n\n", "Player:", "You:"]
)
Success: Cleaner paragraph breaks, no run-on narration
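`llm_api.complete` above is a placeholder. Assuming an OpenAI-compatible chat completions request body (the format OpenRouter accepts), the constraints from Attempts 4 and 5 map onto a system message plus the `max_tokens` and `stop` fields. A minimal sketch that only builds the request body; `build_narration_request` and the model ID are hypothetical, and no network call is made.

```python
import json

# Narrator rules from Attempt 4
NARRATOR_RULES = """You are the narrator for a text adventure game.
RULES:
- Only narrate what the player can observe
- Do not invent new locations, items, or characters
- Stick to provided context
- Maximum 170 tokens
- Use present tense, second person ("You see...")
- Do not reveal character thoughts unless stated"""


def build_narration_request(model: str, context: str, player_action: str,
                            max_tokens: int = 170) -> dict:
    """Assemble a chat completions body (OpenAI-style format assumed)."""
    return {
        'model': model,
        'messages': [
            # Constraints live in the system message, apart from game data
            {'role': 'system', 'content': NARRATOR_RULES},
            {'role': 'user', 'content': f"{context}\n\nPlayer: {player_action}"},
        ],
        'max_tokens': max_tokens,               # provider-enforced length cap
        'stop': ["\n\n", "Player:", "You:"],    # cut at paragraph/turn breaks
    }


payload = build_narration_request(
    model='example/model-id',  # hypothetical placeholder model ID
    context='You stand in a quiet tavern. Elara wipes the bar.',
    player_action='I wave at Elara.',
)
print(json.dumps(payload, indent=2))
```

Keeping the rules in a separate system message, instead of string-concatenating them with the context, avoids the section run-together that plain concatenation invites.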
Lessons Learned
- Token limits work - 170-token cap in place from July 18 (Attempt 2) onward
- System prompts critical - Explicit constraints reduce hallucinations
- Perspective must be enforced - LLMs drift without reminders
- Stop sequences prevent bleeding - The LLM stops at logical breaks instead of running on into the player's turn
Quote from Discord (veritasr, July 2024):
“Lol. this legit happened to me on my game that I stopped last night, before I rewrote the characters.. They literally said.. ‘Will you help us hold back the encroaching darkness’ or some such nonsense.. I was like .. WTF”
Pattern: Everyone struggles with hallucinations, iteration is normal
Cross-Reference: Extracted Prompts
4. Main Orchestrator Monolith
The Problem
Evidence from Discord (veritasr, July 2024):
“main entry point is currently 2159 lines.. lol”
“Also backend is starting to get sorta monolithic on the main orchestrator, so I might start carving stuff out”
appl2613’s Approach: src/chatBotRPG.py - 241 KB (estimated ~2000 lines, which would imply unusually long lines of ~120 bytes each; the actual line count was not measured)
What Didn’t Work:
- All logic in one file initially
- Hard to test individual components
- Difficult to navigate
Evolution of Solutions
Attempt 1: Monolithic File (July 13)
chatBotRPG.py (241 KB)
- Game loop
- State management
- LLM interface
- UI management
- Rule execution
- Event handling
Problem: Hard to maintain, hard to test
Attempt 2: Extract Utilities (July 14-21)
13 updates over 8 days
# Extracted to src/core/utils.py
def validate_config(config):
    """Centralized validation"""
    pass

def build_context(game_state):
    """Context assembly"""
    pass

# Extracted to src/core/memory.py
def summarize_old_turns(turns):
    """Memory compression"""
    pass
Problem: Still large, but more navigable
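The stubs above only name the extracted helpers. As one illustration of why extraction makes testing easier, here is a hypothetical `summarize_old_turns` that keeps the newest turns verbatim and collapses older ones into a single summary line. The strategy, the `keep_recent` parameter, and the injected `summarize` callable are assumptions, not the repository's implementation.

```python
from typing import Callable, List


def summarize_old_turns(turns: List[str],
                        summarize: Callable[[List[str]], str],
                        keep_recent: int = 6) -> List[str]:
    """Collapse all but the newest `keep_recent` turns into one summary line.

    `summarize` is injected (in a real game it might wrap an LLM call),
    so this helper can be tested without any network access.
    """
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    if len(turns) <= keep_recent:
        return list(turns)
    cut = len(turns) - keep_recent  # avoids the turns[-0:] pitfall
    old, recent = turns[:cut], turns[cut:]
    return [f"[Summary] {summarize(old)}"] + recent


def toy_summary(old: List[str]) -> str:
    # Stand-in for an LLM summarizer: just reports how much was compressed
    return f"{len(old)} earlier turns omitted"


history = [f"turn {i}" for i in range(1, 11)]
print(summarize_old_turns(history, toy_summary, keep_recent=3))
# ['[Summary] 7 earlier turns omitted', 'turn 8', 'turn 9', 'turn 10']
```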
Attempt 3: Modular Packages (July 13-22)
src/
    core/           # Game loop utilities
    rules/          # Rule engine
    editor_panel/   # UI components
    generate/       # Content generation
    scribe/         # AI agent
    player_panel/   # Player UI
Success: Separation of concerns achieved
Lessons Learned
- Monoliths are OK early - Ship first, refactor later
- Extract incrementally - Don’t rewrite, extract functions
- Packages > folders - Clear module boundaries
Quote from Discord (monkeyrithms, March 2024):
“yea i never followed any conventions really because i just started writing code and adding and adding to it, refactoring and breaking into new files when i can but its been mostly a free-flow thing.”
Pattern: Start messy, clean up as you go
5. JSON vs. SQLite Data Persistence
The “Failed” Experiment That Isn’t in Git
Evidence: No migration commits in git history
Observation: The README describes JSON storage, while an earlier repository overview describes SQLite
From README:
“While other approaches to data persistence might be explored in the future, game data is currently stored as various .json files within various subfolders”
From Repository Overview (existing analysis):
“Initial: JSON files (nested folders)”
“Current: SQLite database”
“Format: .world files (complete game data)”
The Mystery
Question: Did the SQLite migration happen?
Git Evidence: No commits showing migration
Possible Explanations:
- Pre-GitHub Migration: SQLite implemented locally before first push
- Dual Support: Both formats coexist
- Planned but Not Executed: README describes future state
- Documentation Lag: README not updated after migration
Discord Context
veritasr (March 28, 2024):
“I’d probably configure the locations as database entries that referenced other DB entries.”
veritasr (March 28, 2024):
“I’m essentially storing data in tinydb. You can think of it as me having an overaching config, and that config reference a series of databases”
appl2613’s Direction (per the repository overview): SQLite, not TinyDB
What We Can Infer
Likely Scenario: JSON used during development, SQLite planned
Evidence Supporting This:
- No SQLite files in repository (would show .db or .world files)
- No SQLite import statements in visible code
- README explicitly says “currently stored as .json files”
Lesson: Sometimes “failures” are hidden in local development
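Nothing in the repository shows what a JSON-to-SQLite move would look like. If the deferred migration does happen, one low-risk first step is to copy each existing JSON document into a single key/value table without changing its contents. A minimal standard-library sketch; the `documents` table, its columns, and both function names are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def load_json_tree(root: Path) -> dict:
    """Read every .json file under root, keyed by its path relative to root."""
    return {
        p.relative_to(root).as_posix(): json.loads(p.read_text(encoding='utf-8'))
        for p in sorted(root.rglob('*.json'))
    }


def migrate_to_sqlite(root: Path, db_path: str = ':memory:') -> sqlite3.Connection:
    """Copy each JSON document, unchanged, into one key/value table."""
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE IF NOT EXISTS documents '
                 '(path TEXT PRIMARY KEY, body TEXT NOT NULL)')
    with conn:  # commits on success, rolls back on error (does not close)
        conn.executemany(
            'INSERT OR REPLACE INTO documents (path, body) VALUES (?, ?)',
            [(path, json.dumps(doc)) for path, doc in load_json_tree(root).items()],
        )
    return conn


# Usage: a throwaway folder standing in for a game's data directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'locations').mkdir()
    (root / 'locations' / 'tavern.json').write_text(
        json.dumps({'name': 'Tavern'}), encoding='utf-8')
    conn = migrate_to_sqlite(root)
    row = conn.execute('SELECT body FROM documents WHERE path = ?',
                       ('locations/tavern.json',)).fetchone()
    print(json.loads(row[0]))  # {'name': 'Tavern'}
```

Storing whole documents keeps the existing JSON format intact; a normalized schema in which, for example, locations reference other entries (as veritasr suggested on Discord) could follow once the data shape settles.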
Experiments That Succeeded Immediately
1. Program-First Architecture
Introduced: July 12 (Rules Engine first feature)
Iterations: 0 (correct from day one)
Outcome: Validates LLM World Engine discussions
2. API Key Security
Introduced: July 14 (Day 2 of code)
Iterations: 1 (got it right immediately)
Outcome: Secure from first public demo
3. Scribe AI Agent
Introduced: July 13
Iterations: 6 (minimal refinement)
Outcome: Stable quickly, valuable feature
4. Inventory System
Introduced: July 13
Iterations: 5 (minimal refinement)
Outcome: Clear domain, few surprises
Anti-Patterns Successfully Avoided
1. The “Big Rewrite” Trap
Avoided: No commits rewriting entire systems
Evidence: 0 “rewrite” commits, 140+ “update” commits
Benefit: Always-deployable state
2. The “Feature Creep” Trap
Avoided: 12 features in 72 days, then stopped
Evidence: Clear feature set, no scope expansion
Benefit: A feature set that is 80-90% complete rather than feature-bloated and 50% complete
3. The “Perfect Before Ship” Trap
Avoided: Shipped with known performance issues
Evidence: README acknowledges World Editor CPU usage
Benefit: Real users, real feedback
4. The “Premature Optimization” Trap
Avoided: Performance deferred to future
Evidence: “I do plan to prioritize optimizations in the future”
Benefit: Features complete instead of fast but incomplete
Patterns of Successful Recovery
Pattern 1: Iterate Don’t Abandon
Evidence: 40+ commits on Rules Engine
Philosophy: Fix problems, don’t restart
Outcome: Complex features eventually stabilize
Pattern 2: Ship and Learn
Evidence: World Editor shipped with known issues
Philosophy: Real users reveal real problems
Outcome: Prioritize based on actual pain points
Pattern 3: Test Harnesses Prevent Failures
Evidence: standalone_character_inference.py created early
Philosophy: Fail fast in testing, not production
Outcome: Fewer production failures
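Pattern 3 credits an early standalone harness. The contents of standalone_character_inference.py are not analyzed here, so the following is only a hypothetical sketch of the idea: run fixed scenarios through a narration function (faked, so no API calls are needed) and flag violations of the narrator rules with cheap heuristics. The checks, the 130-word budget (an assumed rough stand-in for the 170-token cap), and all names are assumptions.

```python
import re
from typing import Callable, Iterable, List, Set, Tuple

# Heuristic pattern for first-person slips
FIRST_PERSON = re.compile(r"\b(I|me|my|mine)\b")


def check_narration(text: str, known_names: Set[str],
                    max_words: int = 130) -> List[str]:
    """Return heuristic rule violations for one narration."""
    problems = []
    # Rough word budget standing in for the 170-token cap (assumed ratio)
    if len(text.split()) > max_words:
        problems.append('too long')
    if FIRST_PERSON.search(text):
        problems.append('first-person drift')
    # Flag capitalized words that are neither sentence-initial nor known
    for sentence in re.split(r'(?<=[.!?])\s+', text.strip()):
        words = re.findall(r"[A-Za-z']+", sentence)
        for word in words[1:]:
            if word[0].isupper() and len(word) > 1 and word not in known_names:
                problems.append(f'unknown name: {word}')
    return problems


def run_harness(narrate: Callable[[str, str], str],
                scenarios: Iterable[Tuple[str, str, Set[str]]]
                ) -> List[Tuple[str, List[str]]]:
    """Run each (context, action, known_names) scenario; collect failures."""
    failures = []
    for context, action, known in scenarios:
        problems = check_narration(narrate(context, action), known)
        if problems:
            failures.append((action, problems))
    return failures


# Usage with a fake narrator that invents a location
def fake_narrate(context: str, action: str) -> str:
    return "You wave. Elara nods and points toward Castle Grimholt."


scenarios = [("A quiet tavern.", "I wave at Elara.", {"Elara"})]
print(run_harness(fake_narrate, scenarios))
# [('I wave at Elara.', ['unknown name: Castle', 'unknown name: Grimholt'])]
```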
Pattern 4: Clear Vision Prevents Detours
Evidence: Only 2 file deletions in 183 commits
Philosophy: Know what you’re building
Outcome: Fewer wasted efforts
Lessons for LLM Game Engine Development
1. Rules Engines Are Complex
Evidence: 40+ commits refining rule evaluation
Lesson: Allocate time for edge cases
Recommendation: Start with simple conditions, add complexity as needed
2. Hallucination Control Takes Iteration
Evidence: 11 commits refining narration prompts
Lesson: Explicit constraints > implicit hopes
Recommendation: System prompts, token limits, stop sequences
3. Performance Can Wait
Evidence: World Editor shipped with CPU issues
Lesson: Functional > fast
Recommendation: Optimize based on real bottlenecks, not theoretical ones
4. Monoliths Are OK Initially
Evidence: 241 KB chatBotRPG.py
Lesson: Refactor when pain is real, not theoretical
Recommendation: Extract when testing becomes hard, not before
5. Data Migration Can Be Deferred
Evidence: JSON → SQLite migration not in git (or deferred)
Lesson: Data format less critical than features
Recommendation: Start simple, migrate when distribution matters
The “Failure” That Wasn’t
Low Commit Deletion Rate = Success?
Metric: 1.1% of commits involved deletions
Rough Benchmark (unsourced estimate): ~5-10% of commits involve deletions/reverts
Interpretation: Unusually low failure rate
Possible Explanations:
- Strong architectural vision - Correct decisions early
- Local testing filters failures - Only working code pushed
- Incremental development - Fix don’t abandon
- Solo developer - No conflicting approaches
Most Likely: Combination of all four
What We Don’t See (Hidden Failures)
Local Development “Dark Period”
Timeline: June 17 (initial commit) - July 12 (first code)
Duration: 26 days of silence
Hypothesis: Extensive local experimentation before first push
What Might Have Failed Locally:
- Alternative UI frameworks (before PyQt5)
- Different data formats (before JSON)
- Various LLM providers (before OpenRouter)
- Prompt engineering experiments (many iterations likely)
Evidence: Day 1 codebase (July 13) was already mature (60+ files)
Lesson: Git history shows success, not struggle
Comparative Analysis: Discord vs. ChatBotRPG
veritasr’s Struggles (from Discord)
March 28, 2024:
“Fixed some bugs, and did a little refactoring / cleanup.”
“main entry point is currently 2159 lines.. lol”
July 2024:
“Lol. this legit happened to me… They literally said.. ‘Will you help us hold back the encroaching darkness’”
Pattern: Acknowledged struggles with hallucinations, monolithic code
appl2613’s Approach (from Git)
- Monolithic code: Also present (241 KB main file)
- Hallucination issues: Also present (11 commits refining)
- Refactoring approach: Incremental extraction
Conclusion: Same problems, same solutions, different timelines
Tags
failed-experiments lessons-learned iteration refactoring development-process chatbotrpg