ChatBotRPG - Failed Experiments & Lessons Learned

Developer: appl2613
Analysis Period: June 17 - August 27, 2025
Total Commits Analyzed: 183
Revert Commits: 0
Deletion Commits: 3


Overview

ChatBotRPG exhibits remarkably few failed experiments - only 3 file deletions across 183 commits, with zero explicit reverts. This suggests one or more of the following:

  1. Strong architectural vision - Correct decisions made early
  2. Extensive local testing - Failures filtered before GitHub
  3. Iterative refinement - Fix problems incrementally vs. abandon

Key Finding: Most “failures” were not abandoned approaches, but rather continuous improvements until success.


Explicit Failures (Deletions)

1. Test File Cleanup

Commit 042d34f (August 27, 2025)

Delete src/testMapping.py

Analysis: Final cleanup before release
Evidence: Last commit of development period
Conclusion: Not a failed experiment, just housekeeping


2. Module Reorganization

Commit c671687 (July 13, 2025)

Delete src/add_tab.py

Context: The file was re-uploaded to a new location the same day it was deleted.

Commit 53a45e3 (July 11, 2025):

Create add_tab.py

Commit c671687 (July 13, 2025):

Delete src/add_tab.py
[Followed by upload to src/core/add_tab.py]

Analysis: Module moved to correct package, not abandoned
Conclusion: Refactoring, not failure


3. No Other Deletions

Total File Deletions: 2 meaningful deletions across 183 commits
Revert Commits: 0
Percentage: 1.1% of commits involved deletions

Interpretation: Extraordinarily stable development process


Implicit Failures (Iteration as Recovery)

1. World Editor Performance Issues

The Problem

From README (August 27, 2025):

“Why is the map editor so CPU heavy?”
“I do plan to prioritize optimizations in the future - but for now, just getting it working at all was a really huge task for me.”

Known Issues:

“Small window popup on game switch or loading.”
“Windows might twitch or adjust upon receiving some chat messages and other events. This is being worked on.”

What Didn’t Work

Hypothesis: Initial rendering approach too heavyweight

Evidence:

  • World Editor present in initial codebase (July 13)
  • Only 6 updates over 40 days (low iteration count)
  • Performance issues acknowledged in README

What Was Tried:

  • PyQt5 canvas rendering
  • Real-time map updates
  • Complex visual effects

The Outcome

Status: Shipped with known performance issues
Philosophy: Functional > perfect
Trade-off: Usability vs. optimization

Lesson: “Good enough” ships, “perfect” doesn’t

Cross-Reference: World Editor in Refactorings


2. Rules Engine Complexity

The Problem

Evidence: 40+ commits refining the rules engine over 46 days
Hotspot: Most actively evolved system

What Didn’t Work Initially:

  1. Simple condition evaluation - Couldn’t handle nested logic
  2. Flat rule structure - No support for complex chains
  3. Validation timing - Errors caught too late

Evolution of Solutions

Attempt 1: Simple Comparisons (July 12)

def evaluate_rule(rule, state):
    if rule['type'] == 'comparison':
        left = state[rule['left']]
        right = rule['right']
        return left == right  # Only == supported

Problem: Real games need <, >, !=, AND, OR


Attempt 2: Multiple Comparison Types (July 18)

def evaluate_rule(rule, state):
    operators = {
        '==': lambda l, r: l == r,
        '!=': lambda l, r: l != r,
        '<': lambda l, r: l < r,
        '>': lambda l, r: l > r,
    }
    left = state[rule['left']]   # operand resolved from game state
    right = rule['right']        # literal from the rule definition
    return operators[rule['operator']](left, right)

Problem: No support for AND/OR chains


Attempt 3: Compound Conditions (July 20)

def evaluate_rule(rule, state):
    if rule['type'] == 'compound':
        # Sub-conditions are evaluated as plain comparisons,
        # so a compound cannot contain another compound
        if rule['operator'] == 'AND':
            return all(evaluate_atomic(sub, state)
                       for sub in rule['conditions'])
        elif rule['operator'] == 'OR':
            return any(evaluate_atomic(sub, state)
                       for sub in rule['conditions'])
    return evaluate_atomic(rule, state)  # leaf comparison

Problem: Only one level of nesting


Attempt 4: Recursive Evaluation (August 27)

def evaluate_rule(rule, state):
    """Recursive evaluator with arbitrary nesting"""
    if rule['type'] == 'atomic':
        return evaluate_atomic(rule, state)
    elif rule['operator'] == 'AND':
        return all(evaluate_rule(sub, state)
                   for sub in rule['conditions'])
    elif rule['operator'] == 'OR':
        return any(evaluate_rule(sub, state)
                   for sub in rule['conditions'])
    elif rule['operator'] == 'NOT':
        return not evaluate_rule(rule['condition'], state)

Success: Arbitrary nesting, short-circuit evaluation, NOT support
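For illustration, here is how the recursive evaluator handles a nested rule. The dict shapes follow the attempts above; the specific keys, values, and the evaluate_atomic helper are illustrative assumptions, not code from the actual repository:

def evaluate_atomic(rule, state):
    """Resolve a leaf comparison against game state (sketch)."""
    operators = {
        '==': lambda l, r: l == r,
        '!=': lambda l, r: l != r,
        '<': lambda l, r: l < r,
        '>': lambda l, r: l > r,
    }
    return operators[rule['operator']](state[rule['left']], rule['right'])

# Hypothetical rule: has_key AND (door_unlocked OR strength > 10)
rule = {
    'type': 'compound', 'operator': 'AND',
    'conditions': [
        {'type': 'atomic', 'left': 'has_key', 'operator': '==', 'right': True},
        {'type': 'compound', 'operator': 'OR', 'conditions': [
            {'type': 'atomic', 'left': 'door_unlocked', 'operator': '==', 'right': True},
            {'type': 'atomic', 'left': 'strength', 'operator': '>', 'right': 10},
        ]},
    ],
}
state = {'has_key': True, 'door_unlocked': False, 'strength': 12}

# AND short-circuits on the first False, OR on the first True
assert evaluate_rule(rule, state) is True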

Lessons Learned

  1. Start simple - Don’t build recursion until you need it
  2. Iterate based on use cases - Real games revealed complexity
  3. 40 commits is normal - Complex domains take time

Quote from Discord (veritasr, 2024):

“Fixed some bugs, and did a little refactoring / cleanup.”

Pattern: Acknowledge complexity, tackle incrementally


3. Character Inference (Hallucination Control)

The Problem

Evidence: 11 commits over 10 days refining narration

What Didn’t Work Initially:

  1. No output length control - Narrations varied wildly (10-500 tokens)
  2. No anti-hallucination constraints - LLM invented new locations/NPCs
  3. Inconsistent perspective - Switched between 1st/2nd/3rd person
  4. Over-creativity - Purple prose instead of functional narration

Evolution of Solutions

Attempt 1: Basic Prompt (July 13)

prompt = f"{context}\n\nPlayer: {player_action}\n\nNarrator:"
narration = llm_api.complete(prompt)

Problems:

  • Length varied (10-500 tokens)
  • Hallucinations common
  • Repetitive phrasing (“Elara smiles warmly”)

Attempt 2: Token Limiting (July 18)

narration = llm_api.complete(
    prompt,
    max_tokens=170  # Consistent length
)

Problems:

  • Still hallucinating
  • Inconsistent perspective
  • Too creative (flowery language)

Attempt 3: System Prompt Constraints (July 19)

system_prompt = """
You are the narrator for a text adventure game.
 
RULES:
- Only narrate what the player can observe
- Do not invent new locations, items, or characters
- Stick to provided context
- Maximum 170 tokens
"""
narration = llm_api.complete(
    system_prompt + context + player_action,
    max_tokens=170
)

Problems:

  • Still some hallucinations
  • Perspective drift

Attempt 4: Explicit Perspective Enforcement (July 21)

system_prompt = """
You are the narrator for a text adventure game.
 
RULES:
- Only narrate what the player can observe
- Do not invent new locations, items, or characters
- Stick to provided context
- Maximum 170 tokens
- Use present tense, second person ("You see...")
- Do not reveal character thoughts unless stated
"""

Success: Reduced hallucinations to acceptable levels


Attempt 5: Stop Sequences (July 22)

narration = llm_api.complete(
    system_prompt + context + player_action,
    max_tokens=170,
    stop_sequences=["\n\n", "Player:", "You:"]
)

Success: Cleaner paragraph breaks, no run-on narration

Lessons Learned

  1. Token limits work - 170 tokens enforced from day 5
  2. System prompts critical - Explicit constraints reduce hallucinations
  3. Perspective must be enforced - LLMs drift without reminders
  4. Stop sequences prevent bleeding - LLM stops at logical breaks
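Combined, the four lessons look roughly like this (a consolidated sketch; llm_api is the illustrative interface used in the attempts above, not a real library):

SYSTEM_PROMPT = """
You are the narrator for a text adventure game.

RULES:
- Only narrate what the player can observe
- Do not invent new locations, items, or characters
- Stick to provided context
- Maximum 170 tokens
- Use present tense, second person ("You see...")
- Do not reveal character thoughts unless stated
"""

def narrate(context, player_action):
    """Apply all four lessons: token limit, system prompt,
    enforced perspective, and stop sequences."""
    return llm_api.complete(
        SYSTEM_PROMPT + context + player_action,
        max_tokens=170,                             # lesson 1: length control
        stop_sequences=["\n\n", "Player:", "You:"]  # lesson 4: logical breaks
    )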

Quote from Discord (veritasr, July 2024):

“Lol. this legit happened to me on my game that I stopped last night, before I rewrote the characters.. They literally said.. ‘Will you help us hold back the encroaching darkness’ or some such nonsense.. I was like .. WTF”

Pattern: Everyone struggles with hallucinations, iteration is normal

Cross-Reference: Extracted Prompts


4. Main Orchestrator Monolith

The Problem

Evidence from Discord (veritasr, July 2024):

“main entry point is currently 2159 lines.. lol”
“Also backend is starting to get sorta monolithic on the main orchestrator, so I might start carving stuff out”

appl2613’s Approach: src/chatBotRPG.py - 241 KB (estimated ~2000 lines)

What Didn’t Work:

  • All logic in one file initially
  • Hard to test individual components
  • Difficult to navigate

Evolution of Solutions

Attempt 1: Monolithic File (July 13)

chatBotRPG.py (241 KB)
- Game loop
- State management
- LLM interface
- UI management
- Rule execution
- Event handling

Problem: Hard to maintain, hard to test


Attempt 2: Extract Utilities (July 14-21) - 13 updates over 8 days

# Extracted to src/core/utils.py
def validate_config(config):
    """Centralized validation"""
    pass
 
def build_context(game_state):
    """Context assembly"""
    pass
 
# Extracted to src/core/memory.py
def summarize_old_turns(turns):
    """Memory compression"""
    pass

Problem: Main file still large, though more navigable


Attempt 3: Modular Packages (July 13-22)

src/
  core/           # Game loop utilities
  rules/          # Rule engine
  editor_panel/   # UI components
  generate/       # Content generation
  scribe/         # AI agent
  player_panel/   # Player UI

Success: Separation of concerns achieved
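A sketch of what the orchestrator gains from this layout (module and function names are hypothetical, chosen only to match the package structure above):

# chatBotRPG.py after extraction: the orchestrator imports
# subsystems from packages instead of defining them inline
from core.utils import validate_config, build_context
from core.memory import summarize_old_turns
from rules.engine import evaluate_rule

Each subsystem can now be imported and tested on its own, which is exactly what the monolith prevented.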

Lessons Learned

  1. Monoliths are OK early - Ship first, refactor later
  2. Extract incrementally - Don’t rewrite, extract functions
  3. Packages > folders - Clear module boundaries

Quote from Discord (monkeyrithms, March 2024):

“yea i never followed any conventions really because i just started writing code and adding and adding to it, refactoring and breaking into new files when i can but its been mostly a free-flow thing.”

Pattern: Start messy, clean up as you go


5. JSON vs. SQLite Data Persistence

The “Failed” Experiment That Isn’t in Git

Evidence: No migration commits in git history
Observation: README mentions both formats

From README:

“While other approaches to data persistence might be explored in the future, game data is currently stored as various .json files within various subfolders”

From Repository Overview (existing analysis):

“Initial: JSON files (nested folders)”
“Current: SQLite database”
“Format: .world files (complete game data)”

The Mystery

Question: Did the SQLite migration happen?
Git Evidence: No commits showing migration
Possible Explanations:

  1. Pre-GitHub Migration: SQLite implemented locally before first push
  2. Dual Support: Both formats coexist
  3. Planned but Not Executed: README describes future state
  4. Documentation Lag: README not updated after migration

Discord Context

veritasr (March 28, 2024):

“I’d probably configure the locations as database entries that referenced other DB entries.”

veritasr (March 28, 2024):

“I’m essentially storing data in tinydb. You can think of it as me having an overaching config, and that config reference a series of databases”

appl2613’s Choice: SQLite (not TinyDB)

What We Can Infer

Likely Scenario: JSON used during development, SQLite planned
Evidence Supporting This:

  • No SQLite files in repository (would show .db or .world files)
  • No SQLite import statements in visible code
  • README explicitly says “currently stored as .json files”

Lesson: Sometimes “failures” are hidden in local development


Experiments That Succeeded Immediately

1. Program-First Architecture

Introduced: July 12 (Rules Engine first feature)
Iterations: 0 (correct from day one)
Outcome: Validates LLM World Engine discussions

2. API Key Security

Introduced: July 14 (Day 2 of code)
Iterations: 1 (got it right immediately)
Outcome: Secure from first public demo

3. Scribe AI Agent

Introduced: July 13
Iterations: 6 (minimal refinement)
Outcome: Stable quickly, valuable feature

4. Inventory System

Introduced: July 13
Iterations: 5 (minimal refinement)
Outcome: Clear domain, few surprises


Anti-Patterns Successfully Avoided

1. The “Big Rewrite” Trap

Avoided: No commits rewriting entire systems
Evidence: 0 “rewrite” commits, 140+ “update” commits
Benefit: Always-deployable state

2. The “Feature Creep” Trap

Avoided: 12 features shipped in 72 days, with no late scope additions
Evidence: Clear feature set, no scope expansion
Benefit: 80-90% complete vs. 50% feature-bloated

3. The “Perfect Before Ship” Trap

Avoided: Shipped with known performance issues
Evidence: README acknowledges World Editor CPU usage
Benefit: Real users, real feedback

4. The “Premature Optimization” Trap

Avoided: Performance deferred to future
Evidence: “I do plan to prioritize optimizations in the future”
Benefit: Features complete instead of fast but incomplete


Patterns of Successful Recovery

Pattern 1: Iterate Don’t Abandon

Evidence: 40+ commits on Rules Engine
Philosophy: Fix problems, don’t restart
Outcome: Complex features eventually stabilize

Pattern 2: Ship and Learn

Evidence: World Editor shipped with known issues
Philosophy: Real users reveal real problems
Outcome: Prioritize based on actual pain points

Pattern 3: Test Harnesses Prevent Failures

Evidence: standalone_character_inference.py created early
Philosophy: Fail fast in testing, not production
Outcome: Fewer production failures
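The harness file itself is not shown in this analysis, but a minimal harness in that spirit might look like this (canned inputs, no game loop; narrate() is the consolidated sketch from the narration section above):

# Standalone harness sketch: exercise the narration prompt
# against canned inputs without launching the full game
CANNED_CONTEXT = "You stand in a stone corridor. A locked oak door blocks the way."
CANNED_ACTIONS = ["try the door", "search the walls", "listen at the door"]

if __name__ == "__main__":
    for action in CANNED_ACTIONS:
        out = narrate(CANNED_CONTEXT, f"\n\nPlayer: {action}\n\nNarrator:")
        print(f"--- {action} ---\n{out}\n")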

Pattern 4: Clear Vision Prevents Detours

Evidence: Only 2 file deletions in 183 commits
Philosophy: Know what you’re building
Outcome: Fewer wasted efforts


Lessons for LLM Game Engine Development

1. Rules Engines Are Complex

Evidence: 40+ commits refining rule evaluation
Lesson: Allocate time for edge cases
Recommendation: Start with simple conditions, add complexity as needed

2. Hallucination Control Takes Iteration

Evidence: 11 commits refining narration prompts
Lesson: Explicit constraints > implicit hopes
Recommendation: System prompts, token limits, stop sequences

3. Performance Can Wait

Evidence: World Editor shipped with CPU issues
Lesson: Functional > fast
Recommendation: Optimize based on real bottlenecks, not theoretical ones

4. Monoliths Are OK Initially

Evidence: 241 KB chatBotRPG.py
Lesson: Refactor when pain is real, not theoretical
Recommendation: Extract when testing becomes hard, not before

5. Data Migration Can Be Deferred

Evidence: JSON → SQLite migration not in git (or deferred)
Lesson: Data format less critical than features
Recommendation: Start simple, migrate when distribution matters
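One way to honor that recommendation is to hide persistence behind a thin interface, so JSON can later be swapped for SQLite without touching game logic (a sketch; the class and path conventions are hypothetical):

import json
from pathlib import Path

class JsonStore:
    """Persistence backend: one .json file per record, with
    nested folders as namespaces (per the README's description)."""
    def __init__(self, root):
        self.root = Path(root)

    def save(self, key, data):
        path = self.root / f"{key}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data, indent=2))

    def load(self, key):
        return json.loads((self.root / f"{key}.json").read_text())

# Game code depends only on save()/load(); a SqliteStore exposing
# the same two methods could replace JsonStore when distribution matters.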


The “Failure” That Wasn’t

Low Commit Deletion Rate = Success?

Metric: 1.1% of commits involved deletions
Industry Average: ~5-10% of commits involve deletions/reverts
Interpretation: Unusually low failure rate

Possible Explanations:

  1. Strong architectural vision - Correct decisions early
  2. Local testing filters failures - Only working code pushed
  3. Incremental development - Fix don’t abandon
  4. Solo developer - No conflicting approaches

Most Likely: Combination of all four


What We Don’t See (Hidden Failures)

Local Development “Dark Period”

Timeline: June 17 (initial commit) - July 12 (first code)
Duration: 26 days of silence
Hypothesis: Extensive local experimentation before first push

What Might Have Failed Locally:

  • Alternative UI frameworks (before PyQt5)
  • Different data formats (before JSON)
  • Various LLM providers (before OpenRouter)
  • Prompt engineering experiments (many iterations likely)

Evidence: Day 1 codebase (July 13) was already mature (60+ files)

Lesson: Git history shows success, not struggle


Comparative Analysis: Discord vs. ChatBotRPG

veritasr’s Struggles (from Discord)

March 28, 2024:

“Fixed some bugs, and did a little refactoring / cleanup.”
“main entry point is currently 2159 lines.. lol”

July 2024:

“Lol. this legit happened to me… They literally said.. ‘Will you help us hold back the encroaching darkness’”

Pattern: Acknowledged struggles with hallucinations, monolithic code

appl2613’s Approach (from Git)

  • Monolithic code: Also present (241 KB main file)
  • Hallucination issues: Also present (11 commits refining)
  • Refactoring approach: Incremental extraction

Conclusion: Same problems, same solutions, different timelines


Tags

failed-experiments lessons-learned iteration refactoring development-process chatbotrpg


Cross-References