ReallmCraft Project

Overview

Creator: User-veritasr Type: LLM-powered Minecraft-inspired world engine Architecture: Flask backend + React frontend Key Innovation: Natural Description Language (NDL) for controlled narrative generation Code Scale: ~7,500+ lines (July 2024) Status: Active development, ~12th iteration, not publicly released

Project Philosophy

Core Principle: “The program decides, NDL describes, LLM narrates”

ReallmCraft represents a paradigm shift in LLM game design:

Program-first architecture: Backend logic determines all game state changes
LLM as decorator: AI only generates narrative text, makes no decisions
NDL translation layer: Converts programmatic events to natural language markup
Constraint-based reasoning: Uses expert systems and rule engines over probability

“Turns out that when you take away decision making from the LLM it behaves much better.”

Architecture

High-Level System Design

graph TD
    A[Player Input] --> B[Backend Logic]
    B --> C[State Update]
    C --> D[Event Generation]
    D --> E[NDL Converter]
    E --> F[LLM Decorator]
    F --> G[Narrative Output]
    C --> H[Game State DB]
    H --> I[RAG Retrieval]
    I --> F

Technology Stack

Backend:

Framework: Flask (Python)
Database: PostgreSQL (production), SQLite (development)
Vector DB: ChromaDB for semantic search
State Management: JSON-based save files
Task Processing: Background workers for generation

Frontend:

Framework: React/Next.js
UI Paradigm: Selection-based menus (reduce typing fatigue)
State Machine: FSM-based contextual menus
Deployment: Considered Tauri for desktop

AI/LLM:

Primary Models: Gemma, Llama3, Mistral, Stheno (7B-9B range)
Inference Backends: Ooba, Kobold, OpenAI, OpenRouter, Mancer
Multi-model support: Different models for different tasks

Core Systems

1. Natural Description Language (NDL)

Purpose: Translate programmatic events into controlled narrative prompts

Flow:

Backend Event → NDL Markup → LLM Processing → Natural Narrative

Example NDL Markup:

[scene:tavern]
  [character:Elara|mood:concerned]
    [dialogue|intent:request_help]
      "The ancient ruins are dangerous..."
    [/dialogue]
  [/character]
  [action:player|type:accept_quest]
[/scene]

Results:

Consistent subtext and emotional beats
Reliable character motivation without explicit scripting
Works across small and large models
Prevents LLM hallucination of state changes

See: 08-NDL-Natural-Description-Language for complete specification

2. Hybrid RAG System

Retrieval Strategy: Multi-method fusion for context

Three Retrieval Types:

Exact Match: Direct name/ID lookups
Boolean Search: Tag-based filtering
Semantic Similarity: Vector search for descriptions

Key Innovation: Store experiences, not descriptions

“Semantic similarity isn’t meant to locate stuff like keywords… it’s meant to find similar paragraphs of text.”

Implementation:

def hybrid_retrieve(query, context):
    # Exact matches first (O(1) lookup)
    exact = exact_match(query)
 
    # Boolean tag search (O(n) with indexing)
    tags = boolean_search(extract_tags(query))
 
    # Semantic search (vector similarity)
    semantic = vector_search(query, top_k=10)
 
    # RRF-style fusion
    ranked = reciprocal_rank_fusion(exact, tags, semantic)
 
    return ranked[:context_limit]

Context Management:

Token estimation with 1/50 ratio padding
Priority-based inclusion (exact > tags > semantic)
Dynamic pruning to fit context window

See: 03-RAG-and-Memory for implementation details

3. World Generation System

Approach: Just-In-Time generation with template fixation

Principles:

Don’t generate everything upfront
Generate as needed (player proximity/interaction)
Fix values once created (consistency)
Cache generated content

Template System:

{
  "entity_type": "npc",
  "template": "innkeeper",
  "attributes": {
    "name": "${generate:name|culture:fantasy}",
    "personality": "${template:friendly_merchant}",
    "inventory": "${generate:shop_items|type:general_goods}",
    "schedule": "${template:innkeeper_schedule}"
  }
}

Generation Pipeline:

Player triggers generation (enters area, asks question)
Backend selects appropriate template
LLM fills in ${generate:} placeholders
Results cached and persisted
Future references use cached data

See: 04-World-Generation for generation patterns

4. Constraint Programming System

Paradigm Shift (July 2024): From probability to constraints

Four Component Types:

Generators: Create and cache options/choices

def generate_combat_actions(character):
    # Generate possible actions
    actions = []
    if character.has_weapon():
        actions.extend(weapon_attacks(character.weapon))
    if character.has_magic():
        actions.extend(spell_list(character.spells))
    return cache_actions(actions)

Assemblers: Compose elements from choices

def assemble_character(race, class_type):
    base_stats = race_template[race]
    class_mods = class_template[class_type]
    equipment = starting_gear[class_type]
    return Character(base_stats, class_mods, equipment)

Evaluators: Parse conditions, score, make choices

def evaluate_action(action, context):
    if not meets_prerequisites(action, context):
        return None
    score = utility_score(action, context)
    return (action, score)

Analyzers: Abstract data, derive metrics

def analyze_world_state():
    metrics = {
        'danger_level': calculate_threat(),
        'quest_progress': track_objectives(),
        'npc_moods': aggregate_relationships()
    }
    return metrics

5. Modding Framework

Design Goal: Support user-created content from day one

Mod Structure:

mods/
  my_mod/
    mod.json           # Manifest
    templates/         # Entity templates
    scripts/           # Game logic
    locales/           # Translations
    assets/            # Images, sounds

Template Inheritance:

{
  "extends": "core:npc",
  "overrides": {
    "name_generator": "my_mod:custom_names"
  },
  "additions": {
    "reputation_system": "my_mod:faction_rep"
  }
}

Tooling Focus:

LLM-generated templates (reviewed by humans)
GUI editors for non-coders
Crowdsourced content creation
Easy testing and validation

Development Timeline

Early 2024 (Jan-Mar)

Joined LLM World Engine Discord thread
Established core architecture principles
Built world generation system
Implemented initial RAG retrieval
Iteration 1-4: Exploration phase

Mid 2024 (Apr-Jun)

React frontend development
Micromamba integration (cross-platform support)
Workflow systems
Random table generation
Extensive playtesting (Zweihander-inspired world)
Iteration 5-8: Feature expansion

Late 2024 (Jul-Dec)

Character generation systems
Behavior systems
NDL initial development
Constraint programming exploration
Modding framework architecture
Iteration 9-11: Architectural pivot

2025 (Jan-Jul)

NDL refinement and validation
Advanced prompting workflows
Selection-based UI (FSM menus)
Subtext and implicit dialogue systems
Iteration 12: Current “expert system + LLM decorator” architecture
Still in “very WIP” state (~2 weeks into iteration 12 as of July 2025)

Key Features

Completed Features

✅ Modular world generation
✅ Template-based entity system
✅ Hybrid RAG retrieval (exact + boolean + semantic)
✅ NDL translation layer
✅ Multi-model support (7B-9B models work reliably)
✅ Context window management with token estimation
✅ Selection-based UI with FSM menus
✅ Save/load system with state persistence
✅ Mod framework architecture

In Development

🔄 Custom scripting language for game logic
🔄 Constraint programming full integration
🔄 Advanced behavior trees
🔄 Fractal location generation
🔄 Complete mod tooling and editors

Planned Features

📋 Public GitHub release (when ready)
📋 Multiplayer/shared world exploration
📋 Community content marketplace
📋 Visual node editor for logic
📋 Replay/spectator mode

Technical Challenges Overcome

1. Cross-Platform Support

Problem: Conda/Python environment complexity on Windows Solution: Micromamba integration with automated environment setup

2. Context Window Management

Problem: Limited context for complex worlds Solution: Dynamic token estimation (1/50 ratio), priority-based pruning, hybrid retrieval

3. Retrieval Quality

Problem: Vector search alone insufficient for game queries Solution: Hybrid exact + boolean + semantic with RRF fusion

4. UI Complexity

Problem: Typing fatigue in text-based games Solution: Selection-based menus, FSM-driven contextual options

5. LLM Consistency

Problem: Models hallucinate state changes and break game logic Solution: NDL system - program controls all state, LLM only narrates

6. Iteration Fatigue

Problem: Rebuilding same features multiple times Solution: Modular architecture, embrace iterative design, template everything

Philosophical Insights

On LLM Role

“Sort of like it just narrating what happens in the world rather than the LLM trying to continue the story”

“LLMs are terrible at state management - Keep state programmatic”

On Architecture

“Essentially half simulation half story”

On Development

“A lot of these ideas make it to partial implementations before throwing them out”

“I’m like on iteration 12 for this thing”

“I don’t typically share projects, since that means people want me to support them. I’ve been burned by that in the past.”

On Innovation

“It sort of feels like the industry is catching up to what we were doing in here last year”

Performance & Results

NDL Validation (July 2025)

✅ Consistent subtext in narrative
✅ Controlled emotional beats
✅ Reliable dialogue with implicit intent
✅ Works across Gemma, Llama3, Mistral, Stheno (7B-9B)
✅ No state hallucinations
✅ Character motivation preserved without explicit scripting

RAG Performance

Exact match: ~O(1) with dictionary lookup
Boolean search: ~O(n) with tag indexing
Semantic search: ~100ms for top-10 (ChromaDB)
Combined fusion: ~150ms total retrieval time

Code Scale

~7,500+ lines Python (backend)
Unknown lines React (frontend)
Template library: Extensive JSON data
12 major iterations over 18 months

Lessons Shared with Community

On Context Management

Token estimation with padding (1/50 ratio)
Priority-based context inclusion
Exact matches first, semantic search second

On Content Generation

LLM-generated templates work well when reviewed
Cache and refine generated content
Focus tooling on data creation, not code
Consider crowdsourcing for scale

On Retrieval

Three approaches: RAG, boolean, knowledge graphs
Hybrid works best for game engines
Store experiences, not just descriptions
Generate hypothetical questions for better matching

On Architecture

Backend work easier than frontend (for veritasr)
Security concerns only matter for multi-user
Local-first design simplifies considerably
Plan for modding from start, not as afterthought

On LLM Usage

Small models (7B-9B) work fine with NDL
Multiple inference passes > one-shotting
Constraint-based > probabilistic generation
Program decides, LLM decorates

Core Pages

User-veritasr - Creator profile and contributions
08-NDL-Natural-Description-Language - NDL complete specification
01-Architecture-and-Design - Architectural patterns
03-RAG-and-Memory - RAG implementation details
04-World-Generation - Generation system patterns

NDL Documentation

00-NDL-INDEX - NDL master index
02-grammar - EBNF grammar specification
scene-transitions - Scene transition patterns
intention - Character intention construct
ndl-to-llm-flow - How to integrate NDL with LLMs

Architectural Patterns

program-first-architecture - Core design pattern
ndl-bridge - NDL integration pattern
multi-model-routing - Multi-model strategy
jit-generation - JIT generation pattern

Prompts

scene-description - Scene narrative generation
format-enforcement - Constraint enforcement
character-generation - Character creation

Community Impact

Influence on Discussions

Set standards for architectural thinking
Provided working code examples
Debugged others’ implementations
Pushed boundaries of LLM game design

Contributions to Knowledge Base

Most prolific technical contributor (by message volume)
Introduced NDL paradigm (now formally specified)
Pioneered constraint programming approach
Validated small model viability (7B-9B)

Open Questions Answered

Can small models work? Yes, with NDL
How to prevent hallucination? Separate state from narration
What’s the right retrieval strategy? Hybrid (exact + boolean + semantic)
Should LLMs make decisions? No, program should decide

Comparison with Other Projects

vs ChatBotRPG (ChatBotRPG-Project)

Similarities:

LLM as narration layer only
Programmatic state management
Template-based content
Modular/extensible architecture

Differences:

ReallmCraft: Backend-first (Flask + React)
ChatBotRPG: Desktop-first (PyQt5 GUI)
ReallmCraft: NDL translation layer
ChatBotRPG: Visual rule editor (StarCraft-inspired)
ReallmCraft: Not publicly released
ChatBotRPG: GitHub release (July 2025)

Unique to ReallmCraft

Natural Description Language (NDL)
Constraint programming integration
Fractal world generation approach
Advanced RAG with RRF fusion
Multi-iteration evolutionary design (12+ iterations)

Future Directions

Immediate Priorities (veritasr’s focus)

Complete iteration 12 stabilization
Full constraint programming integration
Polish NDL implementation
Improve mod tooling

Long-term Vision

Public release when ready (no timeline)
Community-driven content creation
Potential collaboration on shared worlds
Advanced AI director systems
Procedural quest generation

Research Directions

Further constraint programming exploration
Knowledge graph integration
Multi-agent NPC systems
Emergent storytelling from rule interactions

Project Status

ReallmCraft remains in active development with no public release date. The project has gone through ~12 major iterations over 18 months, with the current iteration focusing on expert systems + LLM decorator architecture.

Key Achievement

Successfully demonstrated that small models (7B-9B) can generate consistent, high-quality narrative when combined with NDL translation layer and program-first architecture. This validates the “program decides, LLM decorates” paradigm.

Developer's Perspective

“Very WIP… like 2 weeks in or something like that” (re: iteration 12, July 2025)

The willingness to throw out and rebuild demonstrates commitment to finding the right architecture over rushing to release.

LLM World Engine Knowledge Base

Explorer

ReallmCraft-Project

ReallmCraft Project

Overview

Project Philosophy

Architecture

High-Level System Design

Technology Stack

Core Systems

1. Natural Description Language (NDL)

2. Hybrid RAG System

3. World Generation System

4. Constraint Programming System

5. Modding Framework

Development Timeline

Early 2024 (Jan-Mar)

Mid 2024 (Apr-Jun)

Late 2024 (Jul-Dec)

2025 (Jan-Jul)

Key Features

Completed Features

In Development

Planned Features

Technical Challenges Overcome

1. Cross-Platform Support

2. Context Window Management

3. Retrieval Quality

4. UI Complexity

5. LLM Consistency

6. Iteration Fatigue

Philosophical Insights

On LLM Role

On Architecture

On Development

On Sharing Work

On Innovation

Performance & Results

NDL Validation (July 2025)

RAG Performance

Code Scale

Lessons Shared with Community

On Context Management

On Content Generation

On Retrieval

On Architecture

On LLM Usage

Related Documentation

Core Pages

NDL Documentation

Architectural Patterns

Prompts

Community Impact

Influence on Discussions

Contributions to Knowledge Base

Open Questions Answered

Comparison with Other Projects

vs ChatBotRPG (ChatBotRPG-Project)

Unique to ReallmCraft

Future Directions

Immediate Priorities (veritasr’s focus)

Long-term Vision

Research Directions

Graph View

Table of Contents

Backlinks