ChatBotRPG - Failed Experiments & Lessons Learned

Developer: appl2613
Analysis Period: June 17 - August 27, 2025
Total Commits Analyzed: 183
Revert Commits: 0
Deletion Commits: 3


Overview

ChatBotRPG exhibits remarkably few failed experiments - only 3 file deletions across 183 commits, with zero explicit reverts. This suggests one or more of the following:

  1. Strong architectural vision - Correct decisions made early
  2. Extensive local testing - Failures filtered before GitHub
  3. Iterative refinement - Fix problems incrementally vs. abandon

Key Finding: Most “failures” were not abandoned approaches, but rather continuous improvements until success.


Explicit Failures (Deletions)

1. Test File Cleanup

Commit 042d34f (August 27, 2025)

Delete src/testMapping.py

Analysis: Final cleanup before release
Evidence: Last commit of development period
Conclusion: Not a failed experiment, just housekeeping


2. Module Reorganization

Commit c671687 (July 13, 2025)

Delete src/add_tab.py

Context: The file was re-uploaded to a new location the same day it was deleted.

Commit 53a45e3 (July 11, 2025):

Create add_tab.py

Commit c671687 (July 13, 2025):

Delete src/add_tab.py
[Followed by upload to src/core/add_tab.py]

Analysis: Module moved to correct package, not abandoned
Conclusion: Refactoring, not failure


3. No Other Deletions

Total File Deletions: 2 meaningful deletions across 183 commits
Revert Commits: 0
Percentage: 1.1% of commits involved deletions

Interpretation: Extraordinarily stable development process


Implicit Failures (Iteration as Recovery)

1. World Editor Performance Issues

The Problem

From README (August 27, 2025):

“Why is the map editor so CPU heavy?”
“I do plan to prioritize optimizations in the future - but for now, just getting it working at all was a really huge task for me.”

Known Issues:

“Small window popup on game switch or loading.”
“Windows might twitch or adjust upon receiving some chat messages and other events. This is being worked on.”

What Didn’t Work

Hypothesis: Initial rendering approach too heavyweight

Evidence:

  • World Editor present in initial codebase (July 13)
  • Only 6 updates over 40 days (low iteration count)
  • Performance issues acknowledged in README

What Was Tried:

  • PyQt5 canvas rendering
  • Real-time map updates
  • Complex visual effects

The Outcome

Status: Shipped with known performance issues
Philosophy: Functional > perfect
Trade-off: Usability vs. optimization

Lesson: “Good enough” ships, “perfect” doesn’t

Cross-Reference: World Editor in Refactorings


2. Rules Engine Complexity

The Problem

Evidence: 40+ commits refining the rules engine over 46 days
Hotspot: Most actively evolved system

What Didn’t Work Initially:

  1. Simple condition evaluation - Couldn’t handle nested logic
  2. Flat rule structure - No support for complex chains
  3. Validation timing - Errors caught too late

Evolution of Solutions

Attempt 1: Simple Comparisons (July 12)

def evaluate_rule(rule, state):
    if rule['type'] == 'comparison':
        left = state[rule['left']]
        right = rule['right']
        return left == right  # Only == supported

Problem: Real games need <, >, !=, AND, OR


Attempt 2: Multiple Comparison Types (July 18)

def evaluate_rule(rule, state):
    operators = {
        '==': lambda l, r: l == r,
        '!=': lambda l, r: l != r,
        '<': lambda l, r: l < r,
        '>': lambda l, r: l > r,
    }
    left = state[rule['left']]   # operand resolved from game state
    right = rule['right']        # literal from the rule definition
    return operators[rule['operator']](left, right)

Problem: No support for AND/OR chains


Attempt 3: Compound Conditions (July 20)

def evaluate_rule(rule, state):
    if rule['type'] == 'compound':
        # Sub-conditions are evaluated as plain comparisons,
        # so a compound cannot contain another compound
        if rule['operator'] == 'AND':
            return all(evaluate_atomic(sub, state)
                       for sub in rule['conditions'])
        elif rule['operator'] == 'OR':
            return any(evaluate_atomic(sub, state)
                       for sub in rule['conditions'])
    return evaluate_atomic(rule, state)  # leaf comparison

Problem: Only one level of nesting


Attempt 4: Recursive Evaluation (August 27)

def evaluate_rule(rule, state):
    """Recursive evaluator with arbitrary nesting"""
    if rule['type'] == 'atomic':
        return evaluate_atomic(rule, state)
    elif rule['operator'] == 'AND':
        return all(evaluate_rule(sub, state)
                   for sub in rule['conditions'])
    elif rule['operator'] == 'OR':
        return any(evaluate_rule(sub, state)
                   for sub in rule['conditions'])
    elif rule['operator'] == 'NOT':
        return not evaluate_rule(rule['condition'], state)

Success: Arbitrary nesting, short-circuit evaluation, NOT support
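For illustration, here is how the recursive evaluator handles a nested rule. The dict shapes follow the attempts above; the specific keys, values, and the evaluate_atomic helper are illustrative assumptions, not code from the actual repository:

def evaluate_atomic(rule, state):
    """Resolve a leaf comparison against game state (sketch)."""
    operators = {
        '==': lambda l, r: l == r,
        '!=': lambda l, r: l != r,
        '<': lambda l, r: l < r,
        '>': lambda l, r: l > r,
    }
    return operators[rule['operator']](state[rule['left']], rule['right'])

# Hypothetical rule: has_key AND (door_unlocked OR strength > 10)
rule = {
    'type': 'compound', 'operator': 'AND',
    'conditions': [
        {'type': 'atomic', 'left': 'has_key', 'operator': '==', 'right': True},
        {'type': 'compound', 'operator': 'OR', 'conditions': [
            {'type': 'atomic', 'left': 'door_unlocked', 'operator': '==', 'right': True},
            {'type': 'atomic', 'left': 'strength', 'operator': '>', 'right': 10},
        ]},
    ],
}
state = {'has_key': True, 'door_unlocked': False, 'strength': 12}

# AND short-circuits on the first False, OR on the first True
assert evaluate_rule(rule, state) is True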

Lessons Learned

  1. Start simple - Don’t build recursion until you need it
  2. Iterate based on use cases - Real games revealed complexity
  3. 40 commits is normal - Complex domains take time

Quote from Discord (veritasr, 2024):

“Fixed some bugs, and did a little refactoring / cleanup.”

Pattern: Acknowledge complexity, tackle incrementally


3. Character Inference (Hallucination Control)

The Problem

Evidence: 11 commits over 10 days refining narration

What Didn’t Work Initially:

  1. No output length control - Narrations varied wildly (10-500 tokens)
  2. No anti-hallucination constraints - LLM invented new locations/NPCs
  3. Inconsistent perspective - Switched between 1st/2nd/3rd person
  4. Over-creativity - Purple prose instead of functional narration

Evolution of Solutions

Attempt 1: Basic Prompt (July 13)

prompt = f"{context}\n\nPlayer: {player_action}\n\nNarrator:"
narration = llm_api.complete(prompt)

Problems:

  • Length varied (10-500 tokens)
  • Hallucinations common
  • Repetitive phrasing (“Elara smiles warmly”)

Attempt 2: Token Limiting (July 18)

narration = llm_api.complete(
    prompt,
    max_tokens=170  # Consistent length
)

Problems:

  • Still hallucinating
  • Inconsistent perspective
  • Too creative (flowery language)

Attempt 3: System Prompt Constraints (July 19)

system_prompt = """
You are the narrator for a text adventure game.
 
RULES:
- Only narrate what the player can observe
- Do not invent new locations, items, or characters
- Stick to provided context
- Maximum 170 tokens
"""
narration = llm_api.complete(
    system_prompt + context + player_action,
    max_tokens=170
)

Problems:

  • Still some hallucinations
  • Perspective drift

Attempt 4: Explicit Perspective Enforcement (July 21)

system_prompt = """
You are the narrator for a text adventure game.
 
RULES:
- Only narrate what the player can observe
- Do not invent new locations, items, or characters
- Stick to provided context
- Maximum 170 tokens
- Use present tense, second person ("You see...")
- Do not reveal character thoughts unless stated
"""

Success: Reduced hallucinations to acceptable levels


Attempt 5: Stop Sequences (July 22)

narration = llm_api.complete(
    system_prompt + context + player_action,
    max_tokens=170,
    stop_sequences=["\n\n", "Player:", "You:"]
)

Success: Cleaner paragraph breaks, no run-on narration

Lessons Learned

  1. Token limits work - 170 tokens enforced from day 5
  2. System prompts critical - Explicit constraints reduce hallucinations
  3. Perspective must be enforced - LLMs drift without reminders
  4. Stop sequences prevent bleeding - LLM stops at logical breaks
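Combined, the four lessons look roughly like this (a consolidated sketch; llm_api is the illustrative interface used in the attempts above, not a real library):

SYSTEM_PROMPT = """
You are the narrator for a text adventure game.

RULES:
- Only narrate what the player can observe
- Do not invent new locations, items, or characters
- Stick to provided context
- Maximum 170 tokens
- Use present tense, second person ("You see...")
- Do not reveal character thoughts unless stated
"""

def narrate(context, player_action):
    """Apply all four lessons: token limit, system prompt,
    enforced perspective, and stop sequences."""
    return llm_api.complete(
        SYSTEM_PROMPT + context + player_action,
        max_tokens=170,                             # lesson 1: length control
        stop_sequences=["\n\n", "Player:", "You:"]  # lesson 4: logical breaks
    )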

Quote from Discord (veritasr, July 2024):

“Lol. this legit happened to me on my game that I stopped last night, before I rewrote the characters.. They literally said.. ‘Will you help us hold back the encroaching darkness’ or some such nonsense.. I was like .. WTF”

Pattern: Everyone struggles with hallucinations, iteration is normal

Cross-Reference: Extracted Prompts


4. Main Orchestrator Monolith

The Problem

Evidence from Discord (veritasr, July 2024):

“main entry point is currently 2159 lines.. lol”
“Also backend is starting to get sorta monolithic on the main orchestrator, so I might start carving stuff out”

appl2613’s Approach: src/chatBotRPG.py - 241 KB (estimated ~2000 lines)

What Didn’t Work:

  • All logic in one file initially
  • Hard to test individual components
  • Difficult to navigate

Evolution of Solutions

Attempt 1: Monolithic File (July 13)

chatBotRPG.py (241 KB)
- Game loop
- State management
- LLM interface
- UI management
- Rule execution
- Event handling

Problem: Hard to maintain, hard to test


Attempt 2: Extract Utilities (July 14-21) - 13 updates over 8 days

# Extracted to src/core/utils.py
def validate_config(config):
    """Centralized validation"""
    pass
 
def build_context(game_state):
    """Context assembly"""
    pass
 
# Extracted to src/core/memory.py
def summarize_old_turns(turns):
    """Memory compression"""
    pass

Problem: Main file still large, though more navigable


Attempt 3: Modular Packages (July 13-22)

src/
  core/           # Game loop utilities
  rules/          # Rule engine
  editor_panel/   # UI components
  generate/       # Content generation
  scribe/         # AI agent
  player_panel/   # Player UI

Success: Separation of concerns achieved
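A sketch of what the orchestrator gains from this layout (module and function names are hypothetical, chosen only to match the package structure above):

# chatBotRPG.py after extraction: the orchestrator imports
# subsystems from packages instead of defining them inline
from core.utils import validate_config, build_context
from core.memory import summarize_old_turns
from rules.engine import evaluate_rule

Each subsystem can now be imported and tested on its own, which is exactly what the monolith prevented.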

Lessons Learned

  1. Monoliths are OK early - Ship first, refactor later
  2. Extract incrementally - Don’t rewrite, extract functions
  3. Packages > folders - Clear module boundaries

Quote from Discord (monkeyrithms, March 2024):

“yea i never followed any conventions really because i just started writing code and adding and adding to it, refactoring and breaking into new files when i can but its been mostly a free-flow thing.”

Pattern: Start messy, clean up as you go


5. JSON vs. SQLite Data Persistence

The “Failed” Experiment That Isn’t in Git

Evidence: No migration commits in git history
Observation: README mentions both formats

From README:

“While other approaches to data persistence might be explored in the future, game data is currently stored as various .json files within various subfolders”

From Repository Overview (existing analysis):

“Initial: JSON files (nested folders)”
“Current: SQLite database”
“Format: .world files (complete game data)”

The Mystery

Question: Did the SQLite migration happen?
Git Evidence: No commits showing migration
Possible Explanations:

  1. Pre-GitHub Migration: SQLite implemented locally before first push
  2. Dual Support: Both formats coexist
  3. Planned but Not Executed: README describes future state
  4. Documentation Lag: README not updated after migration

Discord Context

veritasr (March 28, 2024):

“I’d probably configure the locations as database entries that referenced other DB entries.”

veritasr (March 28, 2024):

“I’m essentially storing data in tinydb. You can think of it as me having an overaching config, and that config reference a series of databases”

appl2613’s Choice: SQLite (not TinyDB)

What We Can Infer

Likely Scenario: JSON used during development, SQLite planned
Evidence Supporting This:

  • No SQLite files in repository (would show .db or .world files)
  • No SQLite import statements in visible code
  • README explicitly says “currently stored as .json files”

Lesson: Sometimes “failures” are hidden in local development


Experiments That Succeeded Immediately

1. Program-First Architecture

Introduced: July 12 (Rules Engine first feature)
Iterations: 0 (correct from day one)
Outcome: Validates LLM World Engine discussions

2. API Key Security

Introduced: July 14 (Day 2 of code)
Iterations: 1 (got it right immediately)
Outcome: Secure from first public demo

3. Scribe AI Agent

Introduced: July 13
Iterations: 6 (minimal refinement)
Outcome: Stable quickly, valuable feature

4. Inventory System

Introduced: July 13
Iterations: 5 (minimal refinement)
Outcome: Clear domain, few surprises


Anti-Patterns Successfully Avoided

1. The “Big Rewrite” Trap

Avoided: No commits rewriting entire systems
Evidence: 0 “rewrite” commits, 140+ “update” commits
Benefit: Always-deployable state

2. The “Feature Creep” Trap

Avoided: 12 features shipped in 72 days, with no late scope additions
Evidence: Clear feature set, no scope expansion
Benefit: 80-90% complete vs. 50% feature-bloated

3. The “Perfect Before Ship” Trap

Avoided: Shipped with known performance issues
Evidence: README acknowledges World Editor CPU usage
Benefit: Real users, real feedback

4. The “Premature Optimization” Trap

Avoided: Performance deferred to future
Evidence: “I do plan to prioritize optimizations in the future”
Benefit: Features complete instead of fast but incomplete


Patterns of Successful Recovery

Pattern 1: Iterate Don’t Abandon

Evidence: 40+ commits on Rules Engine
Philosophy: Fix problems, don’t restart
Outcome: Complex features eventually stabilize

Pattern 2: Ship and Learn

Evidence: World Editor shipped with known issues
Philosophy: Real users reveal real problems
Outcome: Prioritize based on actual pain points

Pattern 3: Test Harnesses Prevent Failures

Evidence: standalone_character_inference.py created early
Philosophy: Fail fast in testing, not production
Outcome: Fewer production failures
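The harness file itself is not shown in this analysis, but a minimal harness in that spirit might look like this (canned inputs, no game loop; narrate() is the consolidated sketch from the narration section above):

# Standalone harness sketch: exercise the narration prompt
# against canned inputs without launching the full game
CANNED_CONTEXT = "You stand in a stone corridor. A locked oak door blocks the way."
CANNED_ACTIONS = ["try the door", "search the walls", "listen at the door"]

if __name__ == "__main__":
    for action in CANNED_ACTIONS:
        out = narrate(CANNED_CONTEXT, f"\n\nPlayer: {action}\n\nNarrator:")
        print(f"--- {action} ---\n{out}\n")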

Pattern 4: Clear Vision Prevents Detours

Evidence: Only 2 file deletions in 183 commits
Philosophy: Know what you’re building
Outcome: Fewer wasted efforts


Lessons for LLM Game Engine Development

1. Rules Engines Are Complex

Evidence: 40+ commits refining rule evaluation
Lesson: Allocate time for edge cases
Recommendation: Start with simple conditions, add complexity as needed

2. Hallucination Control Takes Iteration

Evidence: 11 commits refining narration prompts
Lesson: Explicit constraints > implicit hopes
Recommendation: System prompts, token limits, stop sequences

3. Performance Can Wait

Evidence: World Editor shipped with CPU issues
Lesson: Functional > fast
Recommendation: Optimize based on real bottlenecks, not theoretical ones

4. Monoliths Are OK Initially

Evidence: 241 KB chatBotRPG.py
Lesson: Refactor when pain is real, not theoretical
Recommendation: Extract when testing becomes hard, not before

5. Data Migration Can Be Deferred

Evidence: JSON → SQLite migration not in git (or deferred)
Lesson: Data format less critical than features
Recommendation: Start simple, migrate when distribution matters
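One way to honor that recommendation is to hide persistence behind a thin interface, so JSON can later be swapped for SQLite without touching game logic (a sketch; the class and path conventions are hypothetical):

import json
from pathlib import Path

class JsonStore:
    """Persistence backend: one .json file per record, with
    nested folders as namespaces (per the README's description)."""
    def __init__(self, root):
        self.root = Path(root)

    def save(self, key, data):
        path = self.root / f"{key}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data, indent=2))

    def load(self, key):
        return json.loads((self.root / f"{key}.json").read_text())

# Game code depends only on save()/load(); a SqliteStore exposing
# the same two methods could replace JsonStore when distribution matters.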


The “Failure” That Wasn’t

Low Commit Deletion Rate = Success?

Metric: 1.1% of commits involved deletions
Industry Average: ~5-10% of commits involve deletions/reverts
Interpretation: Unusually low failure rate

Possible Explanations:

  1. Strong architectural vision - Correct decisions early
  2. Local testing filters failures - Only working code pushed
  3. Incremental development - Fix don’t abandon
  4. Solo developer - No conflicting approaches

Most Likely: Combination of all four


What We Don’t See (Hidden Failures)

Local Development “Dark Period”

Timeline: June 17 (initial commit) - July 12 (first code)
Duration: 26 days of silence
Hypothesis: Extensive local experimentation before first push

What Might Have Failed Locally:

  • Alternative UI frameworks (before PyQt5)
  • Different data formats (before JSON)
  • Various LLM providers (before OpenRouter)
  • Prompt engineering experiments (many iterations likely)

Evidence: Day 1 codebase (July 13) was already mature (60+ files)

Lesson: Git history shows success, not struggle


Comparative Analysis: Discord vs. ChatBotRPG

veritasr’s Struggles (from Discord)

March 28, 2024:

“Fixed some bugs, and did a little refactoring / cleanup.”
“main entry point is currently 2159 lines.. lol”

July 2024:

“Lol. this legit happened to me… They literally said.. ‘Will you help us hold back the encroaching darkness’”

Pattern: Acknowledged struggles with hallucinations, monolithic code

appl2613’s Approach (from Git)

  • Monolithic code: Also present (241 KB main file)
  • Hallucination issues: Also present (11 commits refining)
  • Refactoring approach: Incremental extraction

Conclusion: Same problems, same solutions, different timelines


Tags

failed-experiments lessons-learned iteration refactoring development-process chatbotrpg


Cross-References