Clean up docs, tests, patch writing bug

Reviewed a commit from rsarv3006's fork, which is very interesting; took some inspiration from it and noted it in the ideas file.
2026-06-14 10:08:37 +00:00 · 2026-05-22 21:04:54 -07:00 · e1e6803eb1
commit e1e6803eb1
parent 33b9de5441
13 changed files with 410 additions and 678 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -25,7 +25,10 @@ share/python-wheels/
 .installed.cfg
 *.egg
 MANIFEST
 # Codex working notes and generated analysis docs
 docs/codex/
 # PyInstaller
 #  Usually these files are written by a python script from a template
 #  before PyInstaller builds the exe, so as to inject date/other infos into it.
--- a/docs/2026-05-22/iteration1.md
+++ b/docs/2026-05-22/iteration1.md
@@ -1,119 +0,0 @@
 # Iteration 1: SCENE-002 Update State Failure
 Date: 2026-05-22
 ## Run Reviewed
 - Sandbox: `integ_runs/20260522T214944.385761Z`
 - Run: `.nightshift/runs/20260522T215005.188534Z`
 - Task: `SCENE-002`
 - Final status: failed
 - Failed stage: `update_state`
 ## What Happened
 The scene workflow mostly succeeded:
 - `draft_scene` wrote the scene.
 - `continuity_review` correctly failed the first draft for pronoun drift.
 - `edit_scene` repaired the pronoun issue.
 - `continuity_review` passed after edit.
 - `style_review` passed.
 The remaining failure happened in `update_state`.
 NightShift reported:
 ```text
 File writer error: no file blocks found. Expected FILE: path with ---CONTENT---/---END--- or fenced blocks like ```file:path.py.
 ```
 The model output did contain visible `FILE:` blocks, but it omitted the required `---END---` delimiter. It emitted:
 ```text
 FILE: story/plot-state.md
 ---CONTENT---
 ...
 FILE: story/characters.md
 ---CONTENT---
 ...
 ```
 The current parser requires `---END---`, so it rejected all of the blocks.
 ## Additional Risk Found
 The rejected state update also tried to rewrite character canon in unsafe ways:
 - It changed BLOODMONEY's pronoun reference to `he/him`.
 - It changed Cricket's pronoun reference to `they/them`.
 - It compressed/replaced larger parts of `story/characters.md`.
 That means simply accepting unterminated blocks is not enough. The parser can be more tolerant, but the state updater still needs stronger constraints so durable canon does not drift.
 ## Suggested Fixes
 Short-term fixes for this iteration:
 1. Make `parse_file_updates` tolerate delimiter blocks that omit `---END---` when a new `FILE:` block or EOF clearly terminates the previous block.
 2. Keep strict path validation and duplicate-file validation unchanged.
 3. Strengthen the state-updater prompt:
   - never edit `Pronouns / Reference` sections
   - preserve existing character profiles
   - prefer updating `plot-state.md`, `timeline.md`, and `unresolved-threads.md`
   - edit `characters.md` only for small additive current-status facts
 4. Add regression tests for unterminated delimiter parsing.
 Longer-term follow-up:
 - Add deterministic writing-state validation that rejects changes to protected canon sections such as `Pronouns / Reference`.
 - Move character canon into structured data so pronoun constraints can be validated directly.
 ## Planned Changes
 - Update delimiter block parsing in `nightshift/patches.py`.
 - Add parser tests in `tests/test_patches.py`.
 - Tighten `state-updater.md` in the tutorial novel template.
 - Run focused parser tests and the full suite.
 ## Changes Made
 - `parse_file_updates` now accepts delimiter-style file blocks that omit `---END---` when the next `FILE:` header or EOF clearly terminates the block.
 - Added regression coverage for:
  - unterminated delimiter blocks before another `FILE:`
  - mixed terminated and unterminated delimiter blocks
 - Strengthened the tutorial novel state updater prompt to protect character canon:
  - never change `Pronouns / Reference`
  - never change canonical pronouns, narrative reference, identity, or core wound
  - prefer state/timeline/thread files over `characters.md`
  - edit `characters.md` only for small additive current-status facts or new named characters
 - Added deterministic protection in file-block patch generation:
  - changes to existing `Pronouns / Reference` sections in `story/characters.md` are rejected before a patch is generated
 - Added regression coverage for rejecting protected pronoun canon changes.
 ## Verification
 Focused tests:
 ```powershell
 python -m pytest tests/test_patches.py tests/test_pipeline.py -q
 ```
 Result:
 ```text
 56 passed
 ```
 Full suite:
 ```powershell
 python -m pytest -q
 ```
 Result:
 ```text
 199 passed, 4 subtests passed
 ```
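The parser tolerance described in this note (accept a delimiter block that omits `---END---` when the next `FILE:` header or EOF clearly ends it) can be sketched as below. This is a minimal illustration under stated assumptions, not the code in `nightshift/patches.py`: the name `parse_file_blocks_tolerant` and the `(path, content)` tuple return shape are hypothetical, and path or duplicate-file validation is deliberately left out.

```python
import re

# Matches a header such as "FILE: story/plot-state.md".
FILE_HEADER = re.compile(r"^FILE:\s*(?P<path>\S.*?)\s*$")


def parse_file_blocks_tolerant(text: str) -> list[tuple[str, str]]:
    """Parse FILE:/---CONTENT--- blocks where ---END--- is optional.

    Hypothetical sketch: a block ends at ---END---, at the next FILE:
    header, or at end of input, whichever comes first.
    """
    blocks: list[tuple[str, str]] = []
    path = None
    body: list[str] = []
    in_content = False

    def flush() -> None:
        # Emit the current block only if its ---CONTENT--- marker was seen.
        nonlocal path, body, in_content
        if path is not None and in_content:
            blocks.append((path, "\n".join(body).strip("\n") + "\n"))
        path, body, in_content = None, [], False

    for line in text.splitlines():
        header = FILE_HEADER.match(line)
        if header:
            flush()  # a new header implicitly terminates the previous block
            path = header.group("path")
            continue
        if path is None:
            continue  # prose outside any block is ignored
        stripped = line.strip()
        if not in_content:
            if stripped == "---CONTENT---":
                in_content = True
            continue
        if stripped == "---END---":
            flush()
            continue
        body.append(line)
    flush()  # EOF terminates the last block
    return blocks
```

One trade-off of this approach: a content line that itself begins with `FILE:` is read as a new header, which is one reason to keep the strict path and duplicate-file checks unchanged.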
--- a/docs/codex/20260520-203827.md
+++ b/docs/codex/20260520-203827.md
@@ -1,73 +0,0 @@
 # NightShift Integration Failure Analysis
 ## Immediate Causes
 I would separate the failures into four buckets:
 1. The pastebin template is not truly incremental.
   `tests/test_pastebin.py` already tests listing/filtering and expiration, even though `TASK-001` only asks for create/view. The stock app also already has a fairly complete `create_app` implementation. So the task is not "build feature 1"; it is "modify an already-complete app without breaking future-task behavior."
 2. The retry stop policy is harsher than the config implies.
   Even with `stop_on_repeated_failure_signature_after: 6`, `nightshift/escalation.py` unconditionally stops after the last 3 entries have the same stage and cause. That explains the "same stage same reason" stop before the configured repeated-signature threshold.
 3. The model got bad or insufficient context early.
   In the run artifacts, the planner asked for `app/models.py` and `app/routes.py`, both outside the actual scoped repo. That pushed it toward a hallucinated Flask/SQLAlchemy architecture. Later repairs added `tests/test_snippets.py` importing nonexistent `app`, then tried to repair by deleting large amounts of code, which patch validation correctly rejected.
 4. The template and manual deletion created contradictory state.
   In the latest project, `src/pastebin_app/__init__.py` imports `create_app`, but `src/pastebin_app/app.py` no longer defines it. `tests/test_pastebin.py` is now empty, while generated `tests/test_snippets.py` expects a different app shape. That is exactly the kind of broken intermediate state a local model will churn on unless the orchestrator gives it a very explicit recovery path.
 ## On Pre-Generated Code
 I agree with your instinct: for this tutorial, pre-generated app code is hurting more than helping.
 A better template would include:
 - `pyproject.toml`
 - package directories and empty `__init__.py`
 - minimal templates if the task needs HTML later
 - no complete app logic
 - no future-task tests active during `TASK-001`
 - a small `tests/test_task001.py` for only create/view
 Then `TASK-002` adds list/filter tests, `TASK-003` adds expiration tests, etc. The AI should build forward, not preserve a hidden completed app.
 ## Why Claude/Codex Feel Different
 Production coding agents usually have an inner loop:
 - inspect files
 - edit narrowly
 - run targeted tests
 - read exact failure
 - inspect more files
 - edit again
 - rerun
 NightShift currently has a coarser loop: generate one patch, normalize, apply, run tests, summarize, retry. That is auditable, but it means each retry is another sampled patch rather than an interactive repair session. Swapping models does not fix bad task shape, bad context, or contradictory repo state.
 ## Best Options
 Option A: fix the current design conservatively.
 - Remove pre-generated pastebin app logic.
 - Split tests by task.
 - Run only task-relevant tests during the task, then full suite after success.
 - Move deterministic repo context before planning, or at least always include file tree plus full contents of likely target files.
 - Make churn stopping obey config; do not hard-stop after 3 same-stage failures unless configured.
 - Improve retry signatures to ignore pytest cache warnings and prefer project traceback lines.
 Option B: add a real repair micro-loop.
 For command/test failures, run a bounded repair loop before consuming another global retry:
 ```text
 failure -> classify -> inspect exact files -> produce small patch -> run targeted test -> repeat 2-4 times
 ```
 That would make NightShift behave more like Codex/Claude while preserving artifacts.
 Option C: delegate hard repairs to production agent backends.
 Add a `codex`/`claude-code` backend stage for implementation/repair. NightShift still owns task selection, safety, artifacts, tests, and reports, but lets a stronger tool run the inner edit/test loop.
 My recommendation: do A first, then B. The template/task mismatch is the largest avoidable failure source, and the unconditional churn stop is a real policy bug. Once those are fixed, the remaining failures will be much more informative.
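The bounded repair micro-loop from Option B might look roughly like the sketch below. Everything here is an assumption made for illustration: `repair_micro_loop`, `RepairOutcome`, and the two callables are hypothetical names, and none of them exist in NightShift. The point is only that the inner loop is capped and reports its last failure back to the outer retry policy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RepairOutcome:
    fixed: bool
    attempts: int
    last_failure: str


def repair_micro_loop(
    initial_failure: str,
    propose_patch: Callable[[str], None],
    run_targeted_test: Callable[[], Optional[str]],
    max_attempts: int = 3,
) -> RepairOutcome:
    """Try small patches against one targeted test, at most max_attempts times.

    run_targeted_test returns None on success or a failure message otherwise.
    """
    failure = initial_failure
    for attempt in range(1, max_attempts + 1):
        propose_patch(failure)          # produce and apply a small patch
        result = run_targeted_test()    # rerun only the failing test
        if result is None:
            return RepairOutcome(True, attempt, "")
        failure = result                # feed the exact new failure forward
    return RepairOutcome(False, max_attempts, failure)
```

Only when this loop returns `fixed=False` would the orchestrator consume another global retry.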
--- a/docs/ideas.md
+++ b/docs/ideas.md
@@ -256,3 +256,16 @@ Reason:
 - fallback makes artifacts harder to reason about
 - model variability is bad while debugging pipeline behavior
 - the default template should remain the reliability harness
 ## P2: Adopt Useful Fork Ideas From rsarv3006
 Source: https://github.com/rsarv3006/nightShift/commit/649eef65546a4ae648170bf29663f939eb031d2c
 Author: GitHub user `rsarv3006`
 Useful ideas to consider porting:
 - Add `on_status` stage routing so review stages can route `pass`, `retry`, `fail`, and `escalate` to different follow-up stages.
 - Add configurable repo lookup exclusions, for example `safety.skip_repo_parts`, so projects can hide generated or irrelevant directories from planner/reviewer context tools.
 - Add configurable agent timeout, for example `pipeline.agent_timeout_seconds`, so long local-model runs can be tuned per project.
 - Add docs and focused tests around status-based routing behavior.
--- a/docs/writer-idea.md
+++ b/docs/writer-idea.md
@@ -1,396 +0,0 @@
 # Agentic Novel Writing Workflow Idea
 NightShift could plausibly support non-coding workflows, especially long-form fiction, because the core abstraction is not actually "write code." It is:
 - read task context
 - call one or more agents
 - produce artifacts
 - validate outputs
 - update project state
 - move to the next task
 That maps surprisingly well to writing a novel.
 ## Core Realization
 A novel workflow should not ask one model to write the whole book, or even necessarily one whole chapter.
 The durable project files would act like the source of truth:
 - `worldbuilding.md`
 - `characters.md`
 - `plot-state.md`
 - `style-guide.md`
 - `outline.md`
 - `chapters/chapter-001.md`
 - `chapters/chapter-001-scene-001.md`
 - `tasks.md`
 The task file would drive the work, similar to coding tasks:
 ```text
 - [ ] SCENE-001: Opening scene at the border checkpoint
 Description:
 Write the opening scene where Mara tries to enter the city under a false work permit.
 Acceptance Criteria:
 - Introduces Mara's immediate goal
 - Shows the checkpoint culture without exposition dump
 - Mentions the salt tax conflict indirectly
 - Ends with the inspector noticing the forged seal
 - 900-1400 words
 - Maintains close third-person POV
 ```
 NightShift would run one scene or section at a time.
 ## What We Already Have
 NightShift already has several useful primitives:
 - task files for chunking the novel into scenes or chapter sections
 - scoped paths so agents only edit allowed writing/project files
 - artifact output so drafts, reviews, and notes are preserved
 - retry loops for revision
 - planner/reviewer/debugger-style roles
 - repo context and semantic context retrieval
 - command stages that could run deterministic checks
 - file-writer stages that can update Markdown files
 - `lookup_requests` so agents can ask to read worldbuilding or prior scenes
 That means this may not require a totally new engine. It may mostly need a new template and some writing-specific validation/review stages.
 ## Likely Workflow
 One practical pipeline:
 ```text
 plan_scene
 gather_context
 draft_scene
 validate_scene
 continuity_review
 style_review
 update_plot_state
 summarize
 ```
 Possible roles:
 - Planner: turns the scene task into a beat plan.
 - Context agent: pulls relevant worldbuilding, character, and plot-state excerpts.
 - Drafting agent: writes the scene.
 - Continuity reviewer: checks contradictions against known state.
 - Style reviewer: checks POV, tone, pacing, and prose constraints.
 - State updater: updates `plot-state.md`, `characters.md`, and maybe `timeline.md`.
 ## Chunking Strategy
 Do not make a task equal to "write chapter 4" unless chapters are short.
 Better units:
 - scene
 - scene fragment
 - chapter section
 - revision pass for one scene
 - continuity update after one scene
 - prose polish for one scene
 A chapter can be assembled from multiple scene files:
 ```text
 chapters/
  chapter-001/
    scene-001.md
    scene-002.md
    scene-003.md
  chapter-001.md
 ```
 Then a later command or agent stage can compile `chapter-001.md`.
 ## Durable State Files
 The most important design piece is explicit state.
 Recommended files:
 ```text
 story/
  worldbuilding.md
  style-guide.md
  characters.md
  timeline.md
  plot-state.md
  unresolved-threads.md
  continuity-rules.md
  outline.md
  chapters/
 ```
 `plot-state.md` should be updated after every completed scene.
 It should track:
 - current character locations
 - known secrets
 - promises made to the reader
 - unresolved questions
 - relationships
 - injuries/resources/items
 - timeline date/time
 - what each POV character currently knows
 This is the fiction equivalent of application state.
 ## Validation Ideas
 Some checks can be deterministic:
 - word count range
 - file exists
 - only allowed files changed
 - Markdown heading format
 - no forbidden placeholders like `TODO`, `[insert]`, or `TBD`
 - no accidental author notes in final prose
 - required task terms are present
 - output compiles into a chapter file
 Some checks need model review:
 - continuity with worldbuilding
 - character voice consistency
 - POV discipline
 - pacing
 - whether the scene satisfies the beat plan
 - whether exposition is too direct
 - whether the state update accurately reflects the scene
 The key is not to overtrust model review. It should produce actionable retry notes, not silently bless everything.
 ## What Might Be Missing
 ### 1. Better Non-Code Templates
 This likely needs a dedicated template:
 ```text
 tutorial-deaddrop
 tutorial-novel
 ```
 or:
 ```text
 writer-novel
 ```
 The template would include:
 - starter story files
 - writing prompts
 - task examples
 - validation commands
 - allowed paths
 - recommended pipeline
 ### 2. Better Markdown Patch/File Handling
 The current file-writer flow can work, but fiction output may be long. It may be safer to require complete file blocks for one scene file at a time.
 The workflow should avoid having an agent rewrite the whole novel or whole `plot-state.md` unless necessary.
 ### 3. Stronger State Update Governance
 The risky part is not drafting prose. The risky part is bad state updates.
 Example failure:
 - the scene says Mara never saw the prince
 - the state updater records that Mara recognized the prince
 - future scenes build on the wrong state
 A state update should probably be reviewed against the actual scene before being applied.
 Possible pipeline:
 ```text
 draft_scene -> review_scene -> propose_state_update -> review_state_update -> apply
 ```
 ### 4. Context Window Management
 Worldbuilding documents can get large.
 The agent should not receive the entire story bible every time. It should receive:
 - the current task
 - relevant worldbuilding excerpts
 - relevant character entries
 - recent scene summaries
 - current plot state
 - style guide
 Semantic search is probably enough for a first version, but a novel template may want a more explicit index:
 ```text
 world-index.md
 character-index.md
 location-index.md
 ```
 ### 5. Scene Dependency Tracking
 Coding tasks already have dependencies. Fiction tasks would need the same:
 ```text
 Dependencies:
 - SCENE-001
 - SCENE-002
 ```
 This prevents writing a later scene before the required earlier story state exists.
 ### 6. Revision Workflows
 Writing is not only forward generation.
 Useful task types:
 - draft new scene
 - revise scene for pacing
 - revise dialogue
 - continuity repair
 - line edit
 - chapter assembly
 - chapter-level review
 - update outline after discovery writing
 NightShift can already represent these as tasks, but the prompts should distinguish them clearly.
 ### 7. Output Length Controls
 Long fiction output needs explicit limits.
 Use:
 - scene word count bounds
 - `num_predict`
 - task acceptance criteria
 - smaller scene files
 Do not ask for "write chapter 12" unless the chapter has already been broken into beats.
 ## Suggested First Template
 Start with a minimal `writer-novel` template.
 Files:
 ```text
 nightshift.yaml
 .nightshift/tasks.md
 .nightshift/agents/planner.md
 .nightshift/agents/drafter.md
 .nightshift/agents/continuity-reviewer.md
 .nightshift/agents/style-reviewer.md
 .nightshift/agents/state-updater.md
 story/worldbuilding.md
 story/characters.md
 story/style-guide.md
 story/plot-state.md
 story/timeline.md
 story/unresolved-threads.md
 story/chapters/.gitkeep
 ```
 Pipeline:
 ```text
 plan
 semantic_context
 context
 draft
 validate_draft
 continuity_review
 style_review
 update_state
 validate_state
 summarize
 ```
 Allowed paths:
 ```yaml
 scoped_paths:
  - story
  - .nightshift/tasks.md
 ```
 Draft stage allowed paths:
 ```yaml
 allowed_paths:
  - story/chapters
 ```
 State update stage allowed paths:
 ```yaml
 allowed_paths:
  - story/plot-state.md
  - story/characters.md
  - story/timeline.md
  - story/unresolved-threads.md
 ```
 That separation matters. The drafter should not freely rewrite the world bible, and the state updater should not rewrite the scene prose.
 ## What We Should Not Do First
 Do not start with:
 - automatic full-plot generation
 - full chapter generation
 - global rewrites of all prior chapters
 - one giant `worldbuilding.md` dumped into every prompt
 - trusting the model to maintain continuity without explicit state files
 Those are likely to produce impressive-looking but unstable output.
 ## Practical First Experiment
 A good first test:
 1. Create a tiny worldbuilding document.
 2. Create three characters.
 3. Create five scene tasks.
 4. Have NightShift draft one scene at a time.
 5. After each scene, update `plot-state.md`.
 6. Run continuity review against only the scene, state files, and relevant worldbuilding.
 7. Inspect artifacts.
 Success criteria:
 - scenes land in the right files
 - word counts stay bounded
 - state updates are accurate
 - future scenes use prior state correctly
 - reviewers catch obvious contradictions
 ## Bottom Line
 Theoretically, NightShift already has many of the needed utilities.
 The missing piece is mostly a writing-oriented template with:
 - scene-sized tasks
 - durable story state files
 - strict path separation between prose and state updates
 - writing-specific prompts
 - lightweight deterministic validators
 - continuity/style review stages
 This is viable, but it should start as a constrained scene-writing workflow, not an autonomous novel generator.
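Two of the deterministic checks listed under Validation Ideas (word count range and forbidden placeholders) are small enough to sketch directly. This is an illustrative sketch only: `check_scene` is a hypothetical helper, the default bounds reuse the 900-1400 range from the sample task, and whitespace splitting is a rough stand-in for a real word counter.

```python
import re

# Placeholders named in the idea doc: TODO, [insert], TBD.
PLACEHOLDER = re.compile(r"\bTODO\b|\bTBD\b|\[insert\]", re.IGNORECASE)


def check_scene(text: str, min_words: int = 900, max_words: int = 1400) -> list[str]:
    """Return a list of human-readable problems; an empty list means pass."""
    problems: list[str] = []
    count = len(text.split())  # approximate word count
    if not min_words <= count <= max_words:
        problems.append(f"word count {count} outside {min_words}-{max_words}")
    match = PLACEHOLDER.search(text)
    if match:
        problems.append(f"forbidden placeholder: {match.group(0)}")
    return problems
```

Returning problem strings rather than raising keeps the output usable as retry notes for the drafting stage.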
--- a/nightshift/patches.py
+++ b/nightshift/patches.py
@@ -162,7 +162,6 @@ def generate_patch_from_file_updates(
        _validate_allowed_patch_path(normalized_path, root, allowed_paths)
        file_path = resolve_inside_root(root, normalized_path, f"file update '{normalized_path}'")
        old_text = file_path.read_text(encoding="utf-8", errors="replace") if file_path.exists() else ""
        _validate_protected_character_canon(normalized_path, old_text, update.content)
        if old_text == update.content:
            continue
        patch_parts.extend(_diff_for_file(normalized_path, old_text, update.content, file_path.exists()))
@@ -225,51 +224,6 @@ def _validate_allowed_patch_path(path_text: str, root: Path, allowed_paths: tupl
        )
 def _validate_protected_character_canon(path_text: str, old_text: str, new_text: str) -> None:
    if path_text.replace("\\", "/") != "story/characters.md" or not old_text:
        return
    old_sections = _pronoun_reference_sections(old_text)
    if not old_sections:
        return
    new_sections = _pronoun_reference_sections(new_text)
    changed = [
        character
        for character, old_section in old_sections.items()
        if new_sections.get(character) != old_section
    ]
    if changed:
        names = ", ".join(changed)
        raise PipelineError(
            "File writer error: protected character pronoun canon changed in "
            f"`story/characters.md` for: {names}."
        )
 def _pronoun_reference_sections(text: str) -> dict[str, str]:
    sections: dict[str, str] = {}
    current_character: str | None = None
    lines = text.splitlines()
    index = 0
    while index < len(lines):
        line = lines[index]
        if line.startswith("## "):
            current_character = line[3:].strip()
            index += 1
            continue
        if current_character and line.strip() == "### Pronouns / Reference":
            start = index
            index += 1
            while index < len(lines):
                candidate = lines[index]
                if candidate.startswith("## ") or candidate.startswith("### "):
                    break
                index += 1
            sections[current_character] = "\n".join(lines[start:index]).strip()
            continue
        index += 1
    return sections
 def format_validation_result(result: PatchValidationResult) -> str:
    return "\n".join(
        [
--- a/nightshift/pipeline.py
+++ b/nightshift/pipeline.py
@@ -48,6 +48,7 @@ from .runlog import RunLogger
 from .stages import StageResult
 from .tasks import Task, mark_task_completed
 from .telemetry import TelemetryEntry, format_telemetry_summary, telemetry_from_stage_output
 from .writing_validators import validate_writing_file_updates
@dataclass(frozen=True)
@@ -776,6 +777,8 @@ class PipelineRunner:
                    updates,
                    retry_count,
                )
                if _is_writing_file_writer_stage(stage):
                    validate_writing_file_updates(updates, self.config.project.root)
                patch = generate_patch_from_file_updates(
                    updates,
                    self.config.project.root,
@@ -794,6 +797,8 @@ class PipelineRunner:
                    and len(allowed_updates) < len(updates)
                    and "not allowed for this stage" in str(exc)
                ):
                    if _is_writing_file_writer_stage(stage):
                        validate_writing_file_updates(allowed_updates, self.config.project.root)
                    patch = generate_patch_from_file_updates(
                        allowed_updates,
                        self.config.project.root,
@@ -1322,8 +1327,9 @@ class PipelineRunner:
            "Previous review output was malformed. Return exactly four lines: status, reason, next_stage, context_update. Do not return prose, headings, or analysis.",
        ]
        strict_outputs = _review_previous_outputs(previous_outputs)
        malformed_stdout = self._read_agent_stdout(malformed_result.output_path).strip()
        strict_outputs["malformed_review_output"] = _compact_previous_output(
-            self._read_output(malformed_result.output_path),
+            malformed_stdout if malformed_stdout else self._read_output(malformed_result.output_path),
            max_chars=800,
        )
        result = self.agent_executor.run_stage(
@@ -1336,6 +1342,17 @@ class PipelineRunner:
            retry_context="\n".join(f"- {note}" for note in strict_notes),
        )
        if _is_malformed_review_result(result):
            if stage.id == "style_review" and _previous_continuity_review_passed(previous_outputs):
                return StageResult(
                    result.stage_id,
                    "pass",
                    (
                        "Style review output remained malformed after strict retry; "
                        "continuing because continuity review passed and deterministic validators already ran."
                    ),
                    output_path=result.output_path,
                    context_update="Style review was malformed twice; treated as soft-pass after continuity passed.",
                )
            return StageResult(
                result.stage_id,
                "fail",
@@ -1784,6 +1801,13 @@ def _failure_target_stage(stage: StageConfig, result: StageResult) -> str | None
    return stage.on_fail
 def _previous_continuity_review_passed(previous_outputs: dict[str, str]) -> bool:
    for name, output in previous_outputs.items():
        if "continuity" in name and re.search(r"(?im)^status:\s*pass\s*$", output):
            return True
    return False
 def _review_previous_outputs(previous_outputs: dict[str, str], max_chars: int = 1600) -> dict[str, str]:
    compacted: dict[str, str] = {}
    priority_names = {
@@ -1896,6 +1920,18 @@ def _is_scene_edit_stage(stage: StageConfig) -> bool:
    return stage.type == "file_writer" and stage.id.startswith("edit_") and "story/chapters" in allowed
 def _is_writing_file_writer_stage(stage: StageConfig) -> bool:
    allowed = {path.replace("\\", "/").rstrip("/") for path in stage.allowed_paths}
    writing_paths = {
        "story/chapters",
        "story/plot-state.md",
        "story/characters.md",
        "story/timeline.md",
        "story/unresolved-threads.md",
    }
    return stage.type == "file_writer" and bool(allowed & writing_paths)
 def _task_story_chapter_paths(task: Task) -> tuple[str, ...]:
    paths: list[str] = []
    seen: set[str] = set()
--- a/nightshift/project_templates/tutorial-novel/.nightshift/agents/continuity-reviewer.md
+++ b/nightshift/project_templates/tutorial-novel/.nightshift/agents/continuity-reviewer.md
@@ -23,6 +23,15 @@ Do not fail the scene because durable state files are not updated yet. State fil
 Wrong pronouns are a continuity failure. If a drafted scene uses non-canonical pronouns for a named character, return `status: fail` and explain which character drifted. Do not pass the scene with only `context_update` guidance.
 Pronoun canon quick reference:
 - Proxy: she/her
 - BLOODMONEY: narrative default they/them; he/him allowed only when dialogue or close character voice has a specific reason
 - Cricket: she/her
 - Saint: he/him
 - Miette: she/her
 If retry notes, previous reviewer output, or generated scene text conflict with `story/characters.md`, obey `story/characters.md`. Do not infer pronouns from a previous failure note. Before failing a pronoun issue, verify the character's `Pronouns / Reference` section.
 Output exactly:
 status: pass | fail | retry | escalate
--- a/nightshift/project_templates/tutorial-novel/.nightshift/agents/editor.md
+++ b/nightshift/project_templates/tutorial-novel/.nightshift/agents/editor.md
@@ -9,6 +9,8 @@ Rules:
 - Use `story/style-guide.md` for POV, tense, tone, and prose rules.
 - Use `story/characters.md`, especially `Pronouns / Reference`, as hard canon.
 - Wrong pronouns are mandatory fixes.
 - If retry notes or reviewer feedback conflict with `story/characters.md`, obey `story/characters.md`.
 - Never change correct canonical pronouns because a review note claims a different canon.
 - Do not edit state files, worldbuilding, outline, continuity rules, or style guide.
 - Do not resolve future plot threads unless the task explicitly asks for that.
 - Do not include author notes, TODOs, bracket placeholders, or analysis in the scene file.
--- a/nightshift/writing_validators.py
+++ b/nightshift/writing_validators.py
@@ -0,0 +1,175 @@
 """Writing-workflow validators.
 These checks are intentionally kept out of the generic patch generator so code
 generation can continue to treat file blocks as ordinary project files.
 """
 from __future__ import annotations
 from pathlib import Path
 import re
 from .errors import PipelineError
 from .patches import FileUpdate
 def validate_writing_file_updates(updates: tuple[FileUpdate, ...], project_root: Path) -> None:
    """Validate writing-specific invariants for novel scene/state file updates."""
    root = Path(project_root)
    characters_path = root / "story" / "characters.md"
    character_sections = (
        _pronoun_reference_sections(characters_path.read_text(encoding="utf-8", errors="replace"))
        if characters_path.is_file()
        else {}
    )
    for update in updates:
        normalized_path = update.path.replace("\\", "/").strip().strip("/")
        if normalized_path == "story/characters.md":
            _validate_protected_character_canon(normalized_path, character_sections, update.content)
        if normalized_path.startswith("story/chapters/") and normalized_path.endswith(".md"):
            _validate_scene_pronoun_canon(normalized_path, update.content, character_sections)
 def _validate_protected_character_canon(
    path_text: str,
    old_sections: dict[str, str],
    new_text: str,
 ) -> None:
    if path_text != "story/characters.md" or not old_sections:
        return
    new_sections = _pronoun_reference_sections(new_text)
    changed = [
        character
        for character, old_section in old_sections.items()
        if new_sections.get(character) != old_section
    ]
    if changed:
        names = ", ".join(changed)
        raise PipelineError(
            "File writer error: protected character pronoun canon changed in "
            f"`story/characters.md` for: {names}."
        )
 def _validate_scene_pronoun_canon(
    path_text: str,
    scene_text: str,
    sections: dict[str, str],
 ) -> None:
    if not sections:
        return
    rules = _pronoun_rules_from_sections(sections)
    if not rules:
        return
    aliases = {alias: character for character in rules for alias in _character_aliases(character)}
    active_character: str | None = None
    for sentence in _scene_sentences(scene_text):
        present = {
            character
            for alias, character in aliases.items()
            if re.search(rf"\b{re.escape(alias)}\b", sentence)
        }
        if len(present) > 1:
            active_character = None
            continue
        character = next(iter(present)) if present else active_character
        if character is None:
            continue
        forbidden = rules[character]
        if present:
            bad = _first_forbidden_pronoun(sentence, forbidden)
            active_character = character
        else:
            bad = _leading_forbidden_pronoun(sentence, forbidden)
            if not bad:
                active_character = None
        if bad:
            excerpt = sentence.strip()
            if len(excerpt) > 160:
                excerpt = excerpt[:157].rstrip() + "..."
            raise PipelineError(
                "File writer error: scene pronoun canon violation for "
                f"{character}: found `{bad}` near character reference. Excerpt: {excerpt}"
            )
 def _first_forbidden_pronoun(sentence: str, forbidden: tuple[str, ...]) -> str | None:
    return next(
        (
            pronoun
            for pronoun in forbidden
            if re.search(rf"\b{re.escape(pronoun)}\b", sentence, flags=re.IGNORECASE)
        ),
        None,
    )
 def _leading_forbidden_pronoun(sentence: str, forbidden: tuple[str, ...]) -> str | None:
    stripped = sentence.strip()
    return next(
        (
            pronoun
            for pronoun in forbidden
            if re.match(rf"^{re.escape(pronoun)}\b", stripped, flags=re.IGNORECASE)
        ),
        None,
    )
 def _pronoun_rules_from_sections(sections: dict[str, str]) -> dict[str, tuple[str, ...]]:
    rules: dict[str, tuple[str, ...]] = {}
    for character, section in sections.items():
        match = re.search(r"(?im)^-\s*Pronouns:\s*(?P<pronouns>.+?)\s*$", section)
        if not match:
            continue
        pronouns = match.group("pronouns").lower()
        forbidden: set[str] = set()
        if "she/her" not in pronouns:
            forbidden.update({"she", "her", "hers", "herself"})
        if "he/him" not in pronouns:
            forbidden.update({"he", "him", "his", "himself"})
        if "they/them" not in pronouns:
            forbidden.update({"they", "them", "their", "theirs", "themselves"})
        if forbidden:
            rules[character] = tuple(sorted(forbidden))
    return rules
 def _character_aliases(character: str) -> tuple[str, ...]:
    base = re.sub(r"\s*\([^)]*\)", "", character).strip()
    aliases = {base}
    if base.startswith("DJ "):
        aliases.add(base[3:].strip())
    if " aka " in base:
        aliases.update(part.strip() for part in base.split(" aka ") if part.strip())
    return tuple(alias for alias in aliases if alias)
 def _scene_sentences(text: str) -> tuple[str, ...]:
    return tuple(part for part in re.split(r"(?<=[.!?])\s+|\n{2,}", text) if part.strip())
 def _pronoun_reference_sections(text: str) -> dict[str, str]:
    sections: dict[str, str] = {}
    current_character: str | None = None
    lines = text.splitlines()
    index = 0
    while index < len(lines):
        line = lines[index]
        if line.startswith("## "):
            current_character = line[3:].strip()
            index += 1
            continue
        if current_character and line.strip() == "### Pronouns / Reference":
            start = index
            index += 1
            while index < len(lines):
                candidate = lines[index]
                if candidate.startswith("## ") or candidate.startswith("### "):
                    break
                index += 1
            sections[current_character] = "\n".join(lines[start:index]).strip()
            continue
        index += 1
    return sections
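For readers skimming the diff, here is a minimal standalone sketch of how a character's canon `- Pronouns:` line becomes a set of forbidden pronouns. It mirrors the logic of `_pronoun_rules_from_sections` above but is not the module's code: the names `FAMILIES` and `forbidden_pronouns` are hypothetical and exist only for this illustration.

```python
import re

# Pronoun families, copied from the sets used in the diff above.
# FAMILIES and forbidden_pronouns are illustrative names, not part of nightshift.
FAMILIES = {
    "she/her": {"she", "her", "hers", "herself"},
    "he/him": {"he", "him", "his", "himself"},
    "they/them": {"they", "them", "their", "theirs", "themselves"},
}


def forbidden_pronouns(section: str) -> tuple[str, ...]:
    """Return the sorted pronouns that a character's canon section rules out."""
    match = re.search(r"(?im)^-\s*Pronouns:\s*(?P<pronouns>.+?)\s*$", section)
    if not match:
        # No canon line means no rule is enforced for this character.
        return ()
    declared = match.group("pronouns").lower()
    forbidden: set[str] = set()
    for label, words in FAMILIES.items():
        # Any family the canon line does not declare becomes forbidden.
        if label not in declared:
            forbidden |= words
    return tuple(sorted(forbidden))


section = "### Pronouns / Reference\n- Pronouns: she/her\n- Narrative reference: Proxy; she/her"
print(forbidden_pronouns(section))
```

A character declared as `she/her` ends up with the `he/him` and `they/them` families forbidden, while a section with no `- Pronouns:` line produces no rule at all.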
--- a/tests/test_patches.py
+++ b/tests/test_patches.py
@ -375,48 +375,5 @@ new
            self.assertEqual(patch.count("diff --git a/app.py b/app.py"), 1)
    def test_file_updates_reject_character_pronoun_canon_changes(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)
            (root / "story").mkdir()
            (root / "story" / "characters.md").write_text(
                """# Characters
 ## Cricket
 ### Pronouns / Reference
 - Pronouns: she/her
 - Narrative reference: Cricket; she/her
 Scavenger.
 """,
                encoding="utf-8",
            )
            safety = SafetyConfig(
                require_clean_worktree=False,
                scoped_paths=("story",),
                allowed_commands=(),
                forbidden_commands=(),
            )
            updates = parse_file_updates(
                """FILE: story/characters.md
 ---CONTENT---
 # Characters
 ## Cricket
 ### Pronouns / Reference
 - Pronouns: they/them
 - Narrative reference: Cricket; they/them
 Scavenger.
 ---END---
 """
            )
            with self.assertRaisesRegex(PipelineError, "protected character pronoun canon changed"):
                generate_patch_from_file_updates(updates, root, safety)
 if __name__ == "__main__":
    unittest.main()
--- a/tests/test_pipeline.py
+++ b/tests/test_pipeline.py
@ -269,6 +269,48 @@ class PipelineRunnerTests(unittest.TestCase):
            self.assertIn("files", (task_dir / "review.md").read_text(encoding="utf-8"))
            self.assertIn("strict retry ok", (task_dir / "review-1.md").read_text(encoding="utf-8"))
    def test_malformed_review_retry_uses_stdout_summary_not_full_prompt_artifact(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)
            _write_common_files(root)
            (root / "fake_reviewer.py").write_text(
                "\n".join(
                    [
                        "import sys",
                        "prompt = sys.stdin.read()",
                        "if 'Previous review output was malformed' in prompt:",
                        "    open('retry-prompt.txt', 'w', encoding='utf-8').write(prompt)",
                        "    print('status: pass')",
                        "    print('reason: strict retry ok')",
                        "    print('next_stage:')",
                        "    print('context_update:')",
                        "else:",
                        "    print('No extra text. No JSON.')",
                    ]
                ),
                encoding="utf-8",
            )
            stages = (
                StageConfig(id="implement", type="agent", agent="planner", output="implementation-log.md"),
                StageConfig(id="review", type="agent_review", agent="reviewer", output="review.md"),
            )
            config = make_config(root, stages, max_retries=1)
            config.agents["reviewer"] = AgentConfig(
                id="reviewer",
                backend="command",
                command="python fake_reviewer.py",
                system_prompt=Path("reviewer.md"),
            )
            runner = PipelineRunner(config, ArtifactStore(root, ".nightshift", run_id="test-run"))
            result = runner.run_task(parse_tasks(TASK_MD)[0])
            retry_prompt = (root / "retry-prompt.txt").read_text(encoding="utf-8")
            self.assertEqual(result.status, "complete")
            self.assertIn("malformed_review_output", retry_prompt)
            self.assertIn("No extra text. No JSON.", retry_prompt)
            self.assertNotIn("## Prompt", retry_prompt)
    def test_malformed_review_stops_without_on_fail_redraft(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)
@ -304,6 +346,31 @@ class PipelineRunnerTests(unittest.TestCase):
            self.assertTrue((task_dir / "review.md").exists())
            self.assertTrue((task_dir / "review-1.md").exists())
    def test_malformed_style_review_soft_passes_after_continuity_pass(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)
            _write_common_files(root)
            (root / "fake_style.py").write_text("print('No extra text. No JSON.')\n", encoding="utf-8")
            stages = (
                StageConfig(id="continuity_review", type="agent_review", agent="reviewer", output="continuity-review.md"),
                StageConfig(id="style_review", type="agent_review", agent="style", output="style-review.md"),
                StageConfig(id="summarize", type="summarize", output="final-notes.md"),
            )
            config = make_config(root, stages, max_retries=1)
            config.agents["style"] = AgentConfig(
                id="style",
                backend="command",
                command="python fake_style.py",
                system_prompt=Path("reviewer.md"),
            )
            runner = PipelineRunner(config, ArtifactStore(root, ".nightshift", run_id="test-run"))
            result = runner.run_task(parse_tasks(TASK_MD)[0])
            self.assertEqual(result.status, "complete")
            self.assertIn("Style review output remained malformed", result.stage_results[1].reason)
            self.assertEqual([item.stage_id for item in result.stage_results], ["continuity_review", "style_review", "summarize"])
    def test_passing_review_next_stage_is_ignored(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)
--- a/tests/test_writing_validators.py
+++ b/tests/test_writing_validators.py
@ -0,0 +1,104 @@
 from pathlib import Path
 import tempfile
 import unittest
 from nightshift.errors import PipelineError
 from nightshift.patches import FileUpdate
 from nightshift.writing_validators import validate_writing_file_updates
 class WritingValidatorTests(unittest.TestCase):
    def test_rejects_character_pronoun_canon_changes(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)
            (root / "story").mkdir()
            (root / "story" / "characters.md").write_text(
                """# Characters
 ## Cricket
 ### Pronouns / Reference
 - Pronouns: she/her
 - Narrative reference: Cricket; she/her
 Scavenger.
 """,
                encoding="utf-8",
            )
            updates = (
                FileUpdate(
                    path="story/characters.md",
                    content="""# Characters
 ## Cricket
 ### Pronouns / Reference
 - Pronouns: they/them
 - Narrative reference: Cricket; they/them
 Scavenger.
 """,
                ),
            )
            with self.assertRaisesRegex(PipelineError, "protected character pronoun canon changed"):
                validate_writing_file_updates(updates, root)
    def test_rejects_scene_pronoun_drift(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)
            (root / "story" / "chapters").mkdir(parents=True)
            (root / "story" / "characters.md").write_text(
                """# Characters
 ## Proxy
 ### Pronouns / Reference
 - Pronouns: she/her
 - Narrative reference: Proxy; she/her
 """,
                encoding="utf-8",
            )
            updates = (
                FileUpdate(
                    path="story/chapters/chapter-001/scene-001.md",
                    content="Proxy checked the rack. He shut down the bad job.\n",
                ),
            )
            with self.assertRaisesRegex(PipelineError, "scene pronoun canon violation for Proxy"):
                validate_writing_file_updates(updates, root)
    def test_allows_scene_pronouns_when_multiple_characters_make_ambiguous_sentence(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)
            (root / "story" / "chapters" / "chapter-001").mkdir(parents=True)
            (root / "story" / "characters.md").write_text(
                """# Characters
 ## Proxy
 ### Pronouns / Reference
 - Pronouns: she/her
 - Narrative reference: Proxy; she/her
 ## Saint
 ### Pronouns / Reference
 - Pronouns: he/him
 - Narrative reference: Saint; he/him
 """,
                encoding="utf-8",
            )
            updates = (
                FileUpdate(
                    path="story/chapters/chapter-001/scene-001.md",
                    content="Proxy watched Saint as he picked up the phone.\n",
                ),
            )
            validate_writing_file_updates(updates, root)
 if __name__ == "__main__":
    unittest.main()
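The two scene tests above exercise the sentence scan: a sentence naming exactly one character is checked anywhere for that character's forbidden pronouns, the following sentence is checked only for a leading pronoun, and a sentence naming several characters is skipped as ambiguous. The sketch below is a simplified stand-in under stated assumptions: `find_drift` is a hypothetical name, it treats character names as their own aliases, and it returns the violation instead of raising `PipelineError`.

```python
from __future__ import annotations

import re


def split_sentences(text: str) -> list[str]:
    # Same split pattern as `_scene_sentences` in the diff.
    return [part for part in re.split(r"(?<=[.!?])\s+|\n{2,}", text) if part.strip()]


def find_drift(text: str, rules: dict[str, tuple[str, ...]]) -> tuple[str, str] | None:
    """Return (character, pronoun) for the first violation, or None if clean."""
    active: str | None = None
    for sentence in split_sentences(text):
        present = {name for name in rules if re.search(rf"\b{re.escape(name)}\b", sentence)}
        if len(present) > 1:
            # Several characters named: pronouns are ambiguous, so skip the sentence.
            active = None
            continue
        character = next(iter(present)) if present else active
        if character is None:
            continue
        # Named sentence: match anywhere. Carry-over sentence: leading pronoun only.
        anchor = r"\b" if present else r"^"
        bad = next(
            (
                pronoun
                for pronoun in rules[character]
                if re.search(rf"{anchor}{re.escape(pronoun)}\b", sentence.strip(), flags=re.IGNORECASE)
            ),
            None,
        )
        if bad:
            return character, bad
        # Carry the character forward for one sentence after a named reference only.
        active = character if present else None
    return None


rules = {
    "Proxy": ("he", "him", "himself", "his"),
    "Saint": ("her", "hers", "herself", "she"),
}
print(find_drift("Proxy checked the rack. He shut down the bad job.\n", rules))
print(find_drift("Proxy watched Saint as he picked up the phone.\n", rules))
```

The first call reports a violation for Proxy because the sentence after the named reference opens with `He`; the second returns `None` because both characters appear in the same sentence.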