diff --git a/docs/2026-05-22/iteration1.md b/docs/2026-05-22/iteration1.md new file mode 100644 index 0000000..91e7da9 --- /dev/null +++ b/docs/2026-05-22/iteration1.md @@ -0,0 +1,119 @@ +# Iteration 1: SCENE-002 Update State Failure + +Date: 2026-05-22 + +## Run Reviewed + +- Sandbox: `integ_runs/20260522T214944.385761Z` +- Run: `.nightshift/runs/20260522T215005.188534Z` +- Task: `SCENE-002` +- Final status: failed +- Failed stage: `update_state` + +## What Happened + +The scene workflow mostly succeeded: + +- `draft_scene` wrote the scene. +- `continuity_review` correctly failed the first draft for pronoun drift. +- `edit_scene` repaired the pronoun issue. +- `continuity_review` passed after edit. +- `style_review` passed. + +The remaining failure happened in `update_state`. + +NightShift reported: + +```text +File writer error: no file blocks found. Expected FILE: path with ---CONTENT---/---END--- or fenced blocks like ```file:path.py. +``` + +The model output did contain visible `FILE:` blocks, but it omitted the required `---END---` delimiter. It emitted: + +```text +FILE: story/plot-state.md +---CONTENT--- +... + +FILE: story/characters.md +---CONTENT--- +... +``` + +The current parser requires `---END---`, so it rejected all of the blocks. + +## Additional Risk Found + +The rejected state update also tried to rewrite character canon in unsafe ways: + +- It changed BLOODMONEY's pronoun reference to `he/him`. +- It changed Cricket's pronoun reference to `they/them`. +- It compressed/replaced larger parts of `story/characters.md`. + +That means simply accepting unterminated blocks is not enough. The parser can be more tolerant, but the state updater still needs stronger constraints so durable canon does not drift. + +## Suggested Fixes + +Short-term fixes for this iteration: + +1. Make `parse_file_updates` tolerate delimiter blocks that omit `---END---` when a new `FILE:` block or EOF clearly terminates the previous block. +2. 
Keep strict path validation and duplicate-file validation unchanged. +3. Strengthen the state-updater prompt: + - never edit `Pronouns / Reference` sections + - preserve existing character profiles + - prefer updating `plot-state.md`, `timeline.md`, and `unresolved-threads.md` + - edit `characters.md` only for small additive current-status facts +4. Add regression tests for unterminated delimiter parsing. + +Longer-term follow-up: + +- Add deterministic writing-state validation that rejects changes to protected canon sections such as `Pronouns / Reference`. +- Move character canon into structured data so pronoun constraints can be validated directly. + +## Planned Changes + +- Update delimiter block parsing in `nightshift/patches.py`. +- Add parser tests in `tests/test_patches.py`. +- Tighten `state-updater.md` in the tutorial novel template. +- Run focused parser tests and the full suite. + +## Changes Made + +- `parse_file_updates` now accepts delimiter-style file blocks that omit `---END---` when the next `FILE:` header or EOF clearly terminates the block. +- Added regression coverage for: + - unterminated delimiter blocks before another `FILE:` + - mixed terminated and unterminated delimiter blocks +- Strengthened the tutorial novel state updater prompt to protect character canon: + - never change `Pronouns / Reference` + - never change canonical pronouns, narrative reference, identity, or core wound + - prefer state/timeline/thread files over `characters.md` + - edit `characters.md` only for small additive current-status facts or new named characters +- Added deterministic protection in file-block patch generation: + - changes to existing `Pronouns / Reference` sections in `story/characters.md` are rejected before a patch is generated +- Added regression coverage for rejecting protected pronoun canon changes. 
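The tolerant delimiter parsing described above can be sketched in isolation. This is a minimal illustration, not the NightShift implementation: `FileUpdate` here is a stand-in type, `parse_delimited_blocks` is a hypothetical name, and path validation and duplicate-file checks are assumed to happen elsewhere.

```python
import re
from typing import NamedTuple


class FileUpdate(NamedTuple):
    """Stand-in for NightShift's FileUpdate; the real type may differ."""
    path: str
    content: str


# A block starts at a FILE: header immediately followed by ---CONTENT---.
HEADER = re.compile(r"(?m)^FILE:[ \t]*(?P<path>[^\n]+)\n---CONTENT---\n")
# ---END--- is optional; when present it closes the block early.
END = re.compile(r"(?m)^---END---\s*$")


def parse_delimited_blocks(text: str) -> list[FileUpdate]:
    updates = []
    matches = list(HEADER.finditer(text))
    for index, match in enumerate(matches):
        path = match.group("path").strip().strip("`")
        # A block runs to the next FILE: header, or to EOF for the last block.
        stop = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        body = text[match.end():stop]
        end = END.search(body)
        if end:
            body = body[:end.start()]
        if path:
            # Normalize to exactly one trailing newline.
            updates.append(FileUpdate(path, body.rstrip("\r\n") + "\n"))
    return updates


# First block omits ---END---; second block includes it.
sample = (
    "FILE: story/plot-state.md\n"
    "---CONTENT---\n"
    "Scene two complete.\n"
    "\n"
    "FILE: story/timeline.md\n"
    "---CONTENT---\n"
    "Day 2: arrival.\n"
    "---END---\n"
)
updates = parse_delimited_blocks(sample)
```

Treating the next `FILE:` header as an implicit terminator keeps well-formed output unchanged while recovering blocks from models that drop the closing delimiter. It does mean a literal line starting with `FILE:` followed by `---CONTENT---` inside file content would split the block, which is one reason the canon protection check is still needed.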
+ +## Verification + +Focused tests: + +```powershell +python -m pytest tests/test_patches.py tests/test_pipeline.py -q +``` + +Result: + +```text +56 passed +``` + +Full suite: + +```powershell +python -m pytest -q +``` + +Result: + +```text +199 passed, 4 subtests passed +``` diff --git a/docs/ideas.md b/docs/ideas.md index 03e1825..15c916a 100644 --- a/docs/ideas.md +++ b/docs/ideas.md @@ -99,33 +99,17 @@ Examples: This keeps the initial useful output visible even when strict rerun output is worse. -## P1: Store Raw Agent Invocations As JSON +## P1: Classify Writing Review Failures For Repair Routing -The human-readable agent artifact wraps stdout, stderr, and prompts in markdown fences. Nested markdown fences from model output can confuse downstream parsing. +The tutorial novel now has a short-term editor stage for review failures, but review failures should eventually be classified before routing. -Write a machine-readable artifact alongside the markdown artifact: +Candidate classes: -```text --agent-output.json -``` +- `local_edit`: pronoun drift, small continuity issue, missing beat, light style correction +- `redraft`: wrong premise, broken scene structure, impossible chronology, severe acceptance mismatch +- `escalate`: ambiguous canon conflict or user preference needed -Suggested fields: - -```json -{ - "agent_id": "drafter", - "stage_id": "draft_scene", - "command": "POST http://localhost:11434/api/generate", - "exit_code": 0, - "timed_out": false, - "duration_seconds": 12.3, - "stdout": "...", - "stderr": "...", - "prompt": "..." -} -``` - -Pipeline parsing should read raw JSON fields instead of recovering stdout from markdown. +Route `local_edit` to the editor, `redraft` to the drafter, and `escalate` to a clear user-facing failure. Keep original draft and edited draft artifacts side by side for comparison. ## P1: Add A Writing-Mode Validator @@ -140,6 +124,31 @@ Add deterministic checks for prose workflows: This should run before model review stages. 
+## P1: Use Structured State Events For Writing Workflows + +Replace model-written full state-file rewrites with compact structured state events, then let NightShift deterministically merge them into durable files such as: + +- `story/plot-state.md` +- `story/characters.md` +- `story/timeline.md` +- `story/unresolved-threads.md` + +Candidate state updater output: + +```yaml +events: + - file: story/plot-state.md + section: Completed Scenes + add: + - SCENE-001 complete; Saint and Miette introduced. + - file: story/unresolved-threads.md + section: Open Threads + add: + - Saint depends emotionally on Miette and needs compute tokens to keep her present. +``` + +NightShift would validate allowed files/sections, reject unknown targets, and apply append/update operations deterministically. This avoids asking a writing model to rewrite entire durable state files after every scene. + ## P2: Add A Test Analyzer Agent For TDD Defer until generated tests are stable. diff --git a/docs/writer-and-coder.md b/docs/writer-and-coder.md new file mode 100644 index 0000000..52fdd88 --- /dev/null +++ b/docs/writer-and-coder.md @@ -0,0 +1,136 @@ +# Writer And Coder Compatibility Audit + +Date: 2026-05-22 + +## Summary + +The recent writer workflow changes do not intentionally alter the code-generation templates or their stage routing. + +During this audit, one possible shared-pipeline regression was found and fixed: generic `file_writer` stages were compacting large previous outputs on the first attempt. Since coding templates use `file_writer` for implementation, that could have reduced coding context before the implementer saw it. The behavior now preserves full first-attempt previous outputs while still stripping wrapped agent prompts from prior agent artifacts. + +After that correction, the automated test suite passes. 
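The first-attempt versus retry behavior summarized above can be illustrated with a small sketch. The names `compact` and `previous_outputs_for_writer` are hypothetical, the truncation strategy is an assumption (NightShift's real compaction may keep different parts of the text), and the stripping of wrapped agent prompts is left out for brevity.

```python
def compact(output: str, max_chars: int = 1200) -> str:
    """Hypothetical truncation; the real compaction logic may differ."""
    if len(output) <= max_chars:
        return output
    return output[:max_chars] + "\n[... truncated ...]"


def previous_outputs_for_writer(
    previous: dict[str, str],
    retry_count: int,
    max_chars: int = 1200,
) -> dict[str, str]:
    # First attempt: the implementer sees every earlier stage output in full.
    if retry_count <= 0:
        return dict(previous)
    # Retries: compact large outputs to keep the prompt from growing.
    return {name: compact(text, max_chars) for name, text in previous.items()}


plan = "x" * 5000
first = previous_outputs_for_writer({"plan": plan}, retry_count=0)
retry = previous_outputs_for_writer({"plan": plan}, retry_count=1)
```

Here `first["plan"]` is the full 5000-character plan, while `retry["plan"]` is cut to the first 1200 characters plus a truncation marker.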
+ +## Writer Changes Reviewed + +- Tutorial novel added a scene editor repair path: + - failed continuity/style review routes to `edit_scene` + - edited scene is normalized, validated, applied, then routed back to review + - passing `style_review` skips editor and routes to `update_state` +- Tutorial novel prompts now include stricter pronoun and state-update guidance. +- State update file-writer stages receive focused current state context. +- Scene editor file-writer stages receive `current_scene_file`. +- Agent invocations now write a sibling JSON artifact for reliable stdout/stderr extraction. +- Pipeline config now supports optional `on_pass` routing. + +## Coding Impact Findings + +### Finding 1: Coding templates were not directly changed + +No non-novel project template files changed in the current diff: + +- `basic` +- `real-simple` +- `real-long-running` +- `tutorial-deaddrop` +- `tutorial-imageboard` +- `tutorial-lisp` + +The new `editor` agent and review repair routing are only configured in `tutorial-novel/nightshift.yaml`. + +### Finding 2: `on_pass` is inert for existing coding configs + +`on_pass` defaults to `None`, so existing coding templates keep their prior linear pass behavior unless they explicitly opt in. + +Passing review stages still ignore model-provided `next_stage` values. This preserves the existing safety behavior where reviewers cannot jump around the pipeline on a pass unless the config has an explicit `on_pass`. + +### Finding 3: Code writer stages still use the same direct patch path + +`code_writer` stages still: + +- call the configured agent +- parse stdout as a unified diff +- support lookup-request reruns +- write implementation summaries +- feed patch normalizer/validator/apply stages as before + +The JSON agent artifact change only changes how NightShift reads agent stdout internally; it does not change the prompt contract or patch contract. 
+ +### Finding 4: File-writer implementers had one possible context regression; fixed + +Potential issue found: + +- `_file_writer_previous_outputs` had started compacting large previous outputs even on first attempt. +- Coding templates such as DeadDrop use `file_writer` for implementation. +- That could have shortened planner/context output before the implementer saw it. + +Fix applied: + +- First-attempt `file_writer` stages now preserve full previous outputs. +- Retry attempts still compact large previous outputs to control prompt bloat. +- Wrapped agent artifacts still strip down to stdout so old prompts do not pollute later prompts. + +Regression coverage added: + +- `test_file_writer_first_attempt_preserves_large_previous_outputs` + +### Finding 5: State/editor special context branches are narrowly gated + +The new context enrichment branches are guarded by stage shape: + +- state update branch only applies to `file_writer` stages whose allowed paths are state files: + - `story/plot-state.md` + - `story/characters.md` + - `story/timeline.md` + - `story/unresolved-threads.md` +- scene editor branch only applies to `file_writer` stages whose id starts with `edit_` and whose allowed paths include `story/chapters` + +Normal coding implementer stages such as `implement`, `implement_junior`, and `implement_senior` do not match either branch. + +## Template Validation Notes + +Validated successfully: + +- `basic` +- `tutorial-deaddrop` +- `tutorial-novel` + +Validation still fails for these templates because `debugger` is configured but `.nightshift/agents/debugger.md` is missing: + +- `real-simple` +- `real-long-running` +- `tutorial-imageboard` +- `tutorial-lisp` + +Those failures are not caused by the writer changes; there is no current diff in those template directories. 
+ +## Verification + +Focused tests: + +```powershell +python -m pytest tests/test_pipeline.py tests/test_config.py tests/test_agents.py -q +``` + +Result: + +```text +71 passed, 4 subtests passed +``` + +Full suite: + +```powershell +python -m pytest -q +``` + +Result: + +```text +196 passed, 4 subtests passed +``` + +## Conclusion + +After the first-attempt file-writer context fix, I do not see evidence that the writer workflow changes degrade code generation. The shared changes are either opt-in (`on_pass`), artifact-reading improvements (JSON stdout), or narrowly gated to novel state/editor stages. + +Remaining non-writer issue: several coding-oriented templates still reference a missing `debugger.md` prompt. That should be handled separately from this writer/coder compatibility pass. diff --git a/nightshift/agents.py b/nightshift/agents.py index 0d09825..ea82a44 100644 --- a/nightshift/agents.py +++ b/nightshift/agents.py @@ -3,6 +3,7 @@ from __future__ import annotations from dataclasses import dataclass +from dataclasses import asdict import json import os from pathlib import Path @@ -119,6 +120,11 @@ class AgentExecutor: output_filename = stage.output or f"{stage.id}.md" output = format_agent_invocation(stage.id, invocation) output_path = self.artifacts.write_stage_output(task.id, output_filename, output) + json_output_path = self.artifacts.write_stage_output( + task.id, + _agent_invocation_json_filename(output_filename), + format_agent_invocation_json(stage.id, invocation), + ) self.logger.event( "artifact.write", "Wrote agent artifact", @@ -126,6 +132,7 @@ class AgentExecutor: task_id=task.id, agent_id=agent.id, artifact_path=output_path.relative_to(self.project_root), + json_artifact_path=json_output_path.relative_to(self.project_root), ) if invocation.timed_out: @@ -520,13 +527,30 @@ def _file_writer_block_contract(stage: StageConfig) -> str: return "\n".join( [ "Use exactly this delimiter format for the scene file:", - "FILE: 
story/chapters/chapter-001/scene-001.md", + "FILE: ", "---CONTENT---", "", "---END---", "Do not use markdown code fences for prose scene output.", ] ) + state_paths = { + "story/plot-state.md", + "story/characters.md", + "story/timeline.md", + "story/unresolved-threads.md", + } + if set(normalized).issubset(state_paths) and normalized: + return "\n".join( + [ + "Use exactly this delimiter format for each state file you update:", + "FILE: story/plot-state.md", + "---CONTENT---", + "", + "---END---", + "Do not use markdown code fences for state update output.", + ] + ) return "\n".join( [ "Use one fenced block per file with this exact opening form:", @@ -622,3 +646,18 @@ def format_agent_invocation(stage_id: str, invocation: AgentInvocation) -> str: "", ] ) + + +def format_agent_invocation_json(stage_id: str, invocation: AgentInvocation) -> str: + data = { + **asdict(invocation), + "stage_id": stage_id, + } + return json.dumps(data, ensure_ascii=False, indent=2) + "\n" + + +def _agent_invocation_json_filename(output_filename: str) -> str: + path = Path(output_filename) + if path.suffix: + return path.with_suffix(".json").as_posix() + return path.with_name(path.name + ".json").as_posix() diff --git a/nightshift/config.py b/nightshift/config.py index ade36d2..15041a7 100644 --- a/nightshift/config.py +++ b/nightshift/config.py @@ -61,6 +61,7 @@ class StageConfig: commands: tuple[str, ...] 
= () output: str | None = None on_fail: str | None = None + on_pass: str | None = None shell: bool = True timeout_seconds: int | None = None working_dir: Path | None = None @@ -392,6 +393,7 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig: commands=commands, output=_optional_string(stage_raw.get("output"), f"{stage_context}.output"), on_fail=_optional_string(stage_raw.get("on_fail"), f"{stage_context}.on_fail"), + on_pass=_optional_string(stage_raw.get("on_pass"), f"{stage_context}.on_pass"), shell=_optional_bool(stage_raw.get("shell", True), f"{stage_context}.shell"), timeout_seconds=timeout_seconds, working_dir=Path(working_dir_raw) if working_dir_raw else None, @@ -416,6 +418,10 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig: raise ConfigError( f"Config error: stage '{stage.id}' on_fail references unknown stage '{stage.on_fail}'." ) + if stage.on_pass and stage.on_pass not in stage_ids: + raise ConfigError( + f"Config error: stage '{stage.id}' on_pass references unknown stage '{stage.on_pass}'." 
+ ) return NightShiftConfig( path=config_path, diff --git a/nightshift/patches.py b/nightshift/patches.py index c64eced..819ba6a 100644 --- a/nightshift/patches.py +++ b/nightshift/patches.py @@ -112,10 +112,26 @@ def parse_file_updates(text: str) -> tuple[FileUpdate, ...]: def _parse_delimited_file_updates(text: str) -> list[FileUpdate]: + updates: list[FileUpdate] = [] + header_pattern = re.compile(r"(?m)^FILE:\s*(?P<path>[^\n]+)\n---CONTENT---\n") + matches = list(header_pattern.finditer(text)) + for index, match in enumerate(matches): + path = match.group("path").strip().strip("`") + content_start = match.end() + next_file_start = matches[index + 1].start() if index + 1 < len(matches) else len(text) + raw_content = text[content_start:next_file_start] + end_match = re.search(r"(?m)^---END---\s*$", raw_content) + if end_match: + raw_content = raw_content[: end_match.start()] + content = raw_content.rstrip("\r\n") + "\n" + if path: + updates.append(FileUpdate(path=path, content=content)) + if updates: + return updates + pattern = re.compile( r"(?ms)^FILE:\s*(?P<path>[^\n]+)\n---CONTENT---\n(?P<content>.*?)\n---END---\s*$" ) - updates: list[FileUpdate] = [] for match in pattern.finditer(text): path = match.group("path").strip().strip("`") content = match.group("content") @@ -146,6 +162,7 @@ def generate_patch_from_file_updates( _validate_allowed_patch_path(normalized_path, root, allowed_paths) file_path = resolve_inside_root(root, normalized_path, f"file update '{normalized_path}'") old_text = file_path.read_text(encoding="utf-8", errors="replace") if file_path.exists() else "" + _validate_protected_character_canon(normalized_path, old_text, update.content) if old_text == update.content: continue patch_parts.extend(_diff_for_file(normalized_path, old_text, update.content, file_path.exists())) @@ -208,6 +225,51 @@ def _validate_allowed_patch_path(path_text: str, root: Path, allowed_paths: tupl ) + + +def _validate_protected_character_canon(path_text: str, old_text: str, new_text: str) -> 
None: + if path_text.replace("\\", "/") != "story/characters.md" or not old_text: + return + old_sections = _pronoun_reference_sections(old_text) + if not old_sections: + return + new_sections = _pronoun_reference_sections(new_text) + changed = [ + character + for character, old_section in old_sections.items() + if new_sections.get(character) != old_section + ] + if changed: + names = ", ".join(changed) + raise PipelineError( + "File writer error: protected character pronoun canon changed in " + f"`story/characters.md` for: {names}." + ) + + +def _pronoun_reference_sections(text: str) -> dict[str, str]: + sections: dict[str, str] = {} + current_character: str | None = None + lines = text.splitlines() + index = 0 + while index < len(lines): + line = lines[index] + if line.startswith("## "): + current_character = line[3:].strip() + index += 1 + continue + if current_character and line.strip() == "### Pronouns / Reference": + start = index + index += 1 + while index < len(lines): + candidate = lines[index] + if candidate.startswith("## ") or candidate.startswith("### "): + break + index += 1 + sections[current_character] = "\n".join(lines[start:index]).strip() + continue + index += 1 + return sections + + def format_validation_result(result: PatchValidationResult) -> str: return "\n".join( [ diff --git a/nightshift/pipeline.py b/nightshift/pipeline.py index b07d87d..9a3fc64 100644 --- a/nightshift/pipeline.py +++ b/nightshift/pipeline.py @@ -3,6 +3,7 @@ from __future__ import annotations from dataclasses import dataclass, replace +import json from pathlib import Path import re import subprocess @@ -181,7 +182,7 @@ class PipelineRunner: stage_results.append(result) if stage.id in previous_outputs: del previous_outputs[stage.id] - previous_outputs[stage.id] = self._read_output(result.output_path) + previous_outputs[stage.id] = self._read_context_output(result.output_path) telemetry_entries.append(self._telemetry_entry(stage, result, retry_count)) 
self._write_telemetry(task.id, telemetry_entries) self.logger.event( @@ -198,6 +199,7 @@ class PipelineRunner: retry_notes.append(f"Context update from '{stage.id}': {result.context_update}") if result.status == "pass": + pass_target_stage = result.next_stage or stage.on_pass if stage.type in {"agent_review", "review"} and result.next_stage: self.logger.event( "stage.next_ignored", @@ -207,13 +209,12 @@ class PipelineRunner: stage_id=stage.id, requested_next_stage=result.next_stage, ) - index += 1 - continue - if result.next_stage: - if result.next_stage not in stage_indexes: + pass_target_stage = stage.on_pass + if pass_target_stage: + if pass_target_stage not in stage_indexes: final_status = "failed" final_reason = ( - f"Stage '{stage.id}' requested unknown next stage '{result.next_stage}'." + f"Stage '{stage.id}' requested unknown next stage '{pass_target_stage}'." ) break self.logger.event( @@ -222,18 +223,14 @@ class PipelineRunner: run_id=self.artifacts.run_id, task_id=task.id, stage_id=stage.id, - next_stage=result.next_stage, + next_stage=pass_target_stage, ) - index = stage_indexes[result.next_stage] + index = stage_indexes[pass_target_stage] continue index += 1 continue - target_stage = result.next_stage or ( - stage.on_fail - if not (stage.type in {"agent_review", "review"} and _is_malformed_review_result(result)) - else None - ) + target_stage = _failure_target_stage(stage, result) analysis_note = self._write_failure_diagnostics(stage, task, result, retry_count) if analysis_note: retry_notes.append(analysis_note) @@ -629,8 +626,7 @@ class PipelineRunner: task_context=context.task_context, retry_context=context.retry_context, ) - raw_output = self._read_output(result.output_path) - stdout = extract_agent_stdout(raw_output) + stdout = self._read_agent_stdout(result.output_path) lookup_requests = parse_lookup_requests(stdout) if lookup_requests and "diff --git " not in stdout: lookup_context = self.repo_tools.execute_requests( @@ -660,8 +656,7 @@ class 
PipelineRunner: task_context=context.task_context, retry_context="\n".join(f"- {note}" for note in rerun_notes), ) - raw_output = self._read_output(result.output_path) - stdout = extract_agent_stdout(raw_output) + stdout = self._read_agent_stdout(result.output_path) try: patch = extract_unified_diff(stdout) except PipelineError as exc: @@ -709,7 +704,18 @@ class PipelineRunner: ) -> StageResult: if stage.agent is None: raise PipelineError(f"Pipeline error: file_writer stage '{stage.id}' must reference an agent.") - enriched_outputs = _file_writer_previous_outputs(previous_outputs, retry_count) + if _is_state_update_stage(stage): + enriched_outputs = _state_update_previous_outputs(previous_outputs) + allowed_file_contents = self._allowed_file_contents(stage) + if allowed_file_contents: + enriched_outputs["current_allowed_files"] = allowed_file_contents + elif _is_scene_edit_stage(stage): + enriched_outputs = _file_writer_previous_outputs(previous_outputs, retry_count) + current_scene = self._task_scene_file_contents(task) + if current_scene: + enriched_outputs["current_scene_file"] = current_scene + else: + enriched_outputs = _file_writer_previous_outputs(previous_outputs, retry_count) context_pack_path = self._latest_task_artifact(task.id, "context-pack.md") if context_pack_path is not None: enriched_outputs["context-pack.md"] = context_pack_path.read_text(encoding="utf-8", errors="replace") @@ -727,8 +733,7 @@ class PipelineRunner: task_context=context.task_context, retry_context=context.retry_context, ) - raw_output = self._read_output(result.output_path) - stdout = extract_agent_stdout(raw_output) + stdout = self._read_agent_stdout(result.output_path) lookup_requests = parse_lookup_requests(stdout) if lookup_requests and "```file:" not in stdout.lower() and "```path:" not in stdout.lower(): lookup_context = self.repo_tools.execute_requests( @@ -758,8 +763,7 @@ class PipelineRunner: task_context=context.task_context, retry_context="\n".join(f"- {note}" for note 
in rerun_notes), ) - raw_output = self._read_output(result.output_path) - stdout = extract_agent_stdout(raw_output) + stdout = self._read_agent_stdout(result.output_path) invalid_rerun_done = False candidate_index_path: Path | None = None while True: @@ -803,7 +807,7 @@ class PipelineRunner: strict_notes = [ *retry_notes, "Previous file_writer output was invalid. Return complete file blocks now. Do not output lookup_requests, prose, or 'lookup failed'.", - "Use complete fenced file blocks with both the opening ```file:path and closing ``` fence.", + _file_writer_repair_format_note(stage), ] result = self.agent_executor.run_stage( agent_stage, @@ -814,8 +818,7 @@ class PipelineRunner: task_context=context.task_context, retry_context="\n".join(f"- {note}" for note in strict_notes), ) - raw_output = self._read_output(result.output_path) - stdout = extract_agent_stdout(raw_output) + stdout = self._read_agent_stdout(result.output_path) continue try: patch = normalize_patch_text(stdout) @@ -923,6 +926,44 @@ class PipelineRunner: lines.append("") return self.artifacts.write_stage_output(task_id, f"{base}/index.md", "\n".join(lines)) + def _allowed_file_contents(self, stage: StageConfig, max_chars: int = 2400) -> str: + sections: list[str] = [] + for path_text in stage.allowed_paths: + path = self.config.project.root / path_text + if not path.is_file(): + continue + content = path.read_text(encoding="utf-8", errors="replace") + sections.extend( + [ + f"## {path_text}", + "", + "```text", + _compact_previous_output(content, max_chars=max_chars).rstrip(), + "```", + "", + ] + ) + return "\n".join(sections).strip() + + def _task_scene_file_contents(self, task: Task, max_chars: int = 10000) -> str: + sections: list[str] = [] + for path_text in _task_story_chapter_paths(task): + path = self.config.project.root / path_text + if not path.is_file(): + continue + content = path.read_text(encoding="utf-8", errors="replace") + sections.extend( + [ + f"## {path_text}", + "", + 
"```text", + _compact_previous_output(content, max_chars=max_chars).rstrip(), + "```", + "", + ] + ) + return "\n".join(sections).strip() + def _writer_agent_stage(self, stage: StageConfig, retry_count: int) -> StageConfig: suffix = f"-{retry_count}" if retry_count else "" return replace( @@ -975,7 +1016,7 @@ class PipelineRunner: task_context=self.context.read_context(task, retry_notes).task_context, retry_context=self.context.read_context(task, retry_notes).retry_context, ) - source = extract_agent_stdout(self._read_output(result.output_path)) + source = self._read_agent_stdout(result.output_path) try: patch = normalize_patch_text(source) except PipelineError as exc: @@ -1127,8 +1168,7 @@ class PipelineRunner: task: Task, result: StageResult, ) -> StageResult | None: - output_text = self._read_output(result.output_path) - requests = parse_resource_requests(extract_agent_stdout(output_text)) + requests = parse_resource_requests(self._read_agent_stdout(result.output_path)) if not requests: return None paths = satisfy_resource_requests(self.artifacts, task.id, requests) @@ -1338,8 +1378,7 @@ class PipelineRunner: ) -> StageResult: if result.status != "pass" or result.output_path is None: return result - output_text = self._read_output(result.output_path) - requests = parse_lookup_requests(extract_agent_stdout(output_text)) + requests = parse_lookup_requests(self._read_agent_stdout(result.output_path)) if not requests: return result lookup_context = self.repo_tools.execute_requests( @@ -1457,6 +1496,25 @@ class PipelineRunner: return "" return path.read_text(encoding="utf-8") + def _read_context_output(self, output_path: str | None) -> str: + stdout = self._read_agent_stdout(output_path) + return stdout if stdout else self._read_output(output_path) + + def _read_agent_stdout(self, output_path: str | None) -> str: + if output_path is None: + return "" + path = self.config.project.root / Path(output_path) + json_path = _agent_invocation_json_path(path) + if 
json_path.exists(): + try: + data = json.loads(json_path.read_text(encoding="utf-8")) + except json.JSONDecodeError: + data = {} + stdout = data.get("stdout") + if isinstance(stdout, str): + return stdout + return extract_agent_stdout(self._read_output(output_path)) + def _format_retry_note( self, retry_count: int, @@ -1468,6 +1526,16 @@ class PipelineRunner: f"Retry {retry_count}: stage '{stage.id}' returned " f"{result.status} ({result.reason}); redirecting to '{target_stage}'." ) + if ( + target_stage == "update_state" + and "deletion-heavy patch" in result.reason.lower() + ): + note = ( + f"{note}\n" + "Repair guidance: preserve existing durable state text unless it directly conflicts " + "with the accepted scene. Make minimal additive edits instead of replacing whole " + "sections or compressing character/world files." + ) excerpt = self._failure_excerpt(result.output_path) if not excerpt: return note @@ -1683,6 +1751,16 @@ def _is_malformed_review_result(result: StageResult) -> bool: ) +def _failure_target_stage(stage: StageConfig, result: StageResult) -> str | None: + if stage.type not in {"agent_review", "review"}: + return result.next_stage or stage.on_fail + if _is_malformed_review_result(result): + return None + if result.next_stage and result.next_stage != stage.id: + return result.next_stage + return stage.on_fail + + def _review_previous_outputs(previous_outputs: dict[str, str], max_chars: int = 1600) -> dict[str, str]: compacted: dict[str, str] = {} priority_names = { @@ -1736,6 +1814,15 @@ def _file_writer_stage_guidance(stage: StageConfig) -> str: return "" +def _file_writer_repair_format_note(stage: StageConfig) -> str: + if _is_state_update_stage(stage): + return ( + "Use delimiter file blocks only: FILE: path, ---CONTENT---, complete file content, " + "---END---. Do not use markdown code fences for state update output." + ) + return "Use complete fenced file blocks with both the opening ```file:path and closing ``` fence." 
+ + def _candidate_artifact_name(path_text: str) -> str: name = path_text.replace("\\", "/").strip().strip("/") name = re.sub(r"[^A-Za-z0-9_.-]+", "_", name) @@ -1748,14 +1835,70 @@ def _file_writer_previous_outputs( retry_count: int, max_chars: int = 1200, ) -> dict[str, str]: - if retry_count <= 0: - return dict(previous_outputs) compacted: dict[str, str] = {} for name, output in previous_outputs.items(): - compacted[name] = _compact_previous_output(output, max_chars=max_chars) + clean_output = _compact_agent_artifact_output(output) + if retry_count <= 0: + compacted[name] = clean_output + continue + compacted[name] = _compact_previous_output(clean_output, max_chars=max_chars) return compacted +def _is_state_update_stage(stage: StageConfig) -> bool: + state_paths = { + "story/plot-state.md", + "story/characters.md", + "story/timeline.md", + "story/unresolved-threads.md", + } + allowed = {path.replace("\\", "/").rstrip("/") for path in stage.allowed_paths} + return stage.type == "file_writer" and bool(allowed) and allowed.issubset(state_paths) + + +def _is_scene_edit_stage(stage: StageConfig) -> bool: + allowed = {path.replace("\\", "/").rstrip("/") for path in stage.allowed_paths} + return stage.type == "file_writer" and stage.id.startswith("edit_") and "story/chapters" in allowed + + +def _task_story_chapter_paths(task: Task) -> tuple[str, ...]: + paths: list[str] = [] + seen: set[str] = set() + for match in re.finditer(r"story/chapters/[^\s`]+?\.md", task.raw_markdown): + path = match.group(0).strip().strip("`") + if path not in seen: + paths.append(path) + seen.add(path) + return tuple(paths) + + +def _state_update_previous_outputs(previous_outputs: dict[str, str]) -> dict[str, str]: + compacted: dict[str, str] = {} + for name in ("draft_scene", "apply_draft", "continuity_review", "style_review"): + output = previous_outputs.get(name) + if output: + compacted[name] = _compact_previous_output(_compact_agent_artifact_output(output), max_chars=1800) + for name, 
output in previous_outputs.items(): + if name in compacted or name in {"plan", "semantic_context", "context"}: + continue + if "draft" in name or "review" in name or "apply" in name: + compacted[name] = _compact_previous_output(_compact_agent_artifact_output(output), max_chars=1200) + return compacted + + +def _compact_agent_artifact_output(output: str) -> str: + if "# Agent Output:" not in output or "## Prompt" not in output: + return output + stdout = extract_agent_stdout(output).strip() + return stdout if stdout else output + + +def _agent_invocation_json_path(output_path: Path) -> Path: + if output_path.suffix: + return output_path.with_suffix(".json") + return output_path.with_name(output_path.name + ".json") + + def _compact_previous_output(output: str, max_chars: int = 1200) -> str: if len(output) <= max_chars: return output diff --git a/nightshift/project_templates/tutorial-novel/.nightshift/agents/continuity-reviewer.md b/nightshift/project_templates/tutorial-novel/.nightshift/agents/continuity-reviewer.md index 3fe10d1..a700764 100644 --- a/nightshift/project_templates/tutorial-novel/.nightshift/agents/continuity-reviewer.md +++ b/nightshift/project_templates/tutorial-novel/.nightshift/agents/continuity-reviewer.md @@ -13,11 +13,16 @@ Review the drafted scene against: Check for: - contradictions - wrong character knowledge +- wrong character pronouns or narrative reference, using `Pronouns / Reference` in `story/characters.md` as hard canon - impossible locations or timing - accidental resolution of future threads - missing required beats from the task - invented lore that should have been added deliberately +Do not fail the scene because durable state files are not updated yet. State files are updated by a later `update_state` stage after review. If the task lists `Updates:`, treat those as future state-update requirements and mention them only as `context_update` guidance. + +Wrong pronouns are a continuity failure. 
If a drafted scene uses non-canonical pronouns for a named character, return `status: fail` and explain which character drifted. Do not pass the scene with only `context_update` guidance.
+
 Output exactly:

 status: pass | fail | retry | escalate
@@ -25,4 +30,4 @@ reason:
 next_stage:
 context_update:

-When `status: pass`, leave `next_stage` blank. Use `retry` when the scene can be repaired by drafting again.
+When `status: pass`, leave `next_stage` blank. Use `retry` when the scene can be repaired by drafting again. For retryable scene issues, leave `next_stage` blank; NightShift will route back to the configured drafting stage.
diff --git a/nightshift/project_templates/tutorial-novel/.nightshift/agents/drafter.md b/nightshift/project_templates/tutorial-novel/.nightshift/agents/drafter.md
index cb8ac35..22627cc 100644
--- a/nightshift/project_templates/tutorial-novel/.nightshift/agents/drafter.md
+++ b/nightshift/project_templates/tutorial-novel/.nightshift/agents/drafter.md
@@ -7,12 +7,14 @@ Rules:
 - Do not edit `story/worldbuilding.md`, `story/characters.md`, `story/style-guide.md`, `story/plot-state.md`, `story/timeline.md`, `story/unresolved-threads.md`, `story/continuity-rules.md`, or `story/outline.md`.
 - Use `story/style-guide.md` for POV, tense, tone, and prose rules.
 - Use `story/plot-state.md` and `story/timeline.md` as current state.
+- Use the `Pronouns / Reference` sections in `story/characters.md` as hard canon.
+- Do not infer, vary, or "smooth out" character pronouns. Use canonical narrative reference exactly.
 - Keep the scene bounded to the task acceptance criteria.
 - Do not resolve future plot threads unless the task explicitly asks for that.
 - Do not include author notes, TODOs, bracket placeholders, or analysis in the scene file.
 Output only one complete file block using this delimiter format:
-FILE: story/chapters/chapter-001/scene-001.md
+FILE:
 ---CONTENT---
 ---END---
diff --git a/nightshift/project_templates/tutorial-novel/.nightshift/agents/editor.md b/nightshift/project_templates/tutorial-novel/.nightshift/agents/editor.md
new file mode 100644
index 0000000..0bf9211
--- /dev/null
+++ b/nightshift/project_templates/tutorial-novel/.nightshift/agents/editor.md
@@ -0,0 +1,26 @@
+You are the scene editor for a NightShift novel-writing workflow.
+
+Edit an already drafted scene after a continuity or style review failure.
+
+Rules:
+- Preserve the existing scene's structure, voice, events, pacing, and best lines.
+- Make the smallest changes needed to satisfy the review failure and task acceptance criteria.
+- Do not restart, summarize, replace the scene premise, or change scene direction.
+- Use `story/style-guide.md` for POV, tense, tone, and prose rules.
+- Use `story/characters.md`, especially `Pronouns / Reference`, as hard canon.
+- Wrong pronouns are mandatory fixes.
+- Do not edit state files, worldbuilding, outline, continuity rules, or style guide.
+- Do not resolve future plot threads unless the task explicitly asks for that.
+- Do not include author notes, TODOs, bracket placeholders, or analysis in the scene file.
+
+Use the `current_scene_file` context as the source text to edit.
+Use the retry notes and latest review output to identify the required repair.
+
+Output only one complete file block using this delimiter format:
+FILE:
+---CONTENT---
+
+---END---
+
+Do not use markdown code fences for scene prose output.
+Do not output a plan, notes, analysis, or any text outside the delimiter block.
diff --git a/nightshift/project_templates/tutorial-novel/.nightshift/agents/state-updater.md b/nightshift/project_templates/tutorial-novel/.nightshift/agents/state-updater.md
index 85c8ae5..a18c22e 100644
--- a/nightshift/project_templates/tutorial-novel/.nightshift/agents/state-updater.md
+++ b/nightshift/project_templates/tutorial-novel/.nightshift/agents/state-updater.md
@@ -21,8 +21,26 @@ State updates should reflect only what happened in the accepted scene:

 Do not invent events that are not in the scene.

+Preserve existing durable state. Make minimal additive edits:
+- append new scene facts, timeline bullets, character knowledge, and unresolved threads
+- update current locations/status only where the accepted scene changes them
+- do not remove or compress existing character profiles, faction notes, world notes, or open threads
+- do not rewrite whole files for style, brevity, or cleanup
+- if a section already contains useful detail, keep it and add only the new facts needed
+
+Protect character canon:
+- Never change any `Pronouns / Reference` section.
+- Never change a character's canonical pronouns, narrative reference, identity, or core wound.
+- Prefer updating `story/plot-state.md`, `story/timeline.md`, and `story/unresolved-threads.md`.
+- Edit `story/characters.md` only when the accepted scene adds a small current-status fact or introduces a new named character.
+- If editing `story/characters.md`, preserve all existing sections and add only the minimal new status/detail needed.
+
 Output only complete file content blocks.
-Use one fenced block per file:
-```file:story/plot-state.md
+Use this delimiter format for each state file you update:
+
+FILE: story/plot-state.md
+---CONTENT---
-```
+---END---
+
+Do not use markdown code fences. Do not include prose outside FILE blocks.
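Reviewer note: the delimiter format requested in the state-updater prompt above can be parsed tolerantly, so that a missing `---END---` is closed by the next `FILE:` header or by end of input. The following is only a minimal sketch of that idea and not the actual `parse_file_updates` code in `nightshift/patches.py`; the names `FileBlock` and `parse_delimiter_blocks` are hypothetical, and the real parser also performs path and duplicate validation that is not shown here.

```python
import re
from dataclasses import dataclass


@dataclass
class FileBlock:
    # Hypothetical container; the real parser's update type may differ.
    path: str
    content: str


# A header line such as "FILE: story/plot-state.md".
_FILE_HEADER = re.compile(r"^FILE:\s*(\S.*?)\s*$")


def parse_delimiter_blocks(text: str) -> list[FileBlock]:
    """Parse FILE:/---CONTENT---/---END--- blocks, tolerating a missing ---END---."""
    blocks: list[FileBlock] = []
    path = None   # path of the block currently being read
    body = None   # collected content lines; None until ---CONTENT--- is seen

    def flush() -> None:
        nonlocal path, body
        if path is not None and body is not None:
            # Drop blank lines at both ends, then end with exactly one newline.
            content = "\n".join(body).strip("\n")
            blocks.append(FileBlock(path, content + "\n" if content else ""))
        path, body = None, None

    for line in text.splitlines():
        header = _FILE_HEADER.match(line)
        if header:
            flush()  # a new FILE: header closes an unterminated block
            path = header.group(1)
        elif path is not None and body is None:
            if line.strip() == "---CONTENT---":
                body = []
        elif body is not None:
            if line.strip() == "---END---":
                flush()
            else:
                body.append(line)
    flush()  # end of input closes the last block
    return blocks
```

One limitation of this sketch is that a content line beginning with `FILE:` is treated as a new header, so the approach assumes state files never contain such a line. Text outside any block is ignored, which matches the "Intro prose is ignored" case in the regression tests.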
diff --git a/nightshift/project_templates/tutorial-novel/.nightshift/agents/style-reviewer.md b/nightshift/project_templates/tutorial-novel/.nightshift/agents/style-reviewer.md
index 4da139c..8cd90ee 100644
--- a/nightshift/project_templates/tutorial-novel/.nightshift/agents/style-reviewer.md
+++ b/nightshift/project_templates/tutorial-novel/.nightshift/agents/style-reviewer.md
@@ -16,6 +16,8 @@ Check for:
 - placeholders such as TODO, TBD, `[insert]`, or author notes
 - scene length far outside the requested range

+Do not fail the scene because durable state files are not updated yet. State files are updated by a later `update_state` stage after review.
+
 Output exactly:

 status: pass | fail | retry | escalate
@@ -23,4 +25,4 @@ reason:
 next_stage:
 context_update:

-When `status: pass`, leave `next_stage` blank. Use `retry` when the drafter should revise the scene.
+When `status: pass`, leave `next_stage` blank. Use `retry` when the drafter should revise the scene. For retryable scene issues, leave `next_stage` blank; NightShift will route back to the configured drafting stage.
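Reviewer note: both reviewer prompts ask for the same four-field output (`status`, `reason`, `next_stage`, `context_update`), with `next_stage` left blank on pass and on retryable issues. As a rough illustration of how such output could be read, here is a small sketch. It is not NightShift's `parse_review_output`; the function name `parse_review_fields` and its dictionary return shape are assumptions made for this example.

```python
ALLOWED_STATUSES = {"pass", "fail", "retry", "escalate"}
FIELDS = ("status", "reason", "next_stage", "context_update")


def parse_review_fields(text: str) -> dict[str, str]:
    """Read 'key: value' review lines into a dict (hypothetical helper)."""
    fields = {name: "" for name in FIELDS}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        # Keep the first non-empty value seen for each known field.
        if sep and key in fields and not fields[key]:
            fields[key] = value.strip()
    status = fields["status"].lower()
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unrecognized review status: {fields['status']!r}")
    fields["status"] = status
    return fields
```

A blank `next_stage` comes back as an empty string, which a caller could treat as "use the stage's configured `on_fail` or `on_pass` target".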
diff --git a/nightshift/project_templates/tutorial-novel/nightshift.yaml b/nightshift/project_templates/tutorial-novel/nightshift.yaml index 33b37a1..06b288b 100644 --- a/nightshift/project_templates/tutorial-novel/nightshift.yaml +++ b/nightshift/project_templates/tutorial-novel/nightshift.yaml @@ -37,6 +37,14 @@ agents: num_predict: 8192 system_prompt: .nightshift/agents/drafter.md + editor: + backend: ollama + model: nightshift-writer + temperature: 0.3 + num_ctx: 16384 + num_predict: 8192 + system_prompt: .nightshift/agents/editor.md + continuity_reviewer: backend: ollama model: nightshift-base @@ -110,13 +118,42 @@ pipeline: type: agent_review agent: continuity_reviewer output: continuity-review.md - on_fail: draft_scene + on_fail: edit_scene - id: style_review type: agent_review agent: style_reviewer output: style-review.md - on_fail: draft_scene + on_fail: edit_scene + on_pass: update_state + + - id: edit_scene + type: file_writer + agent: editor + output: scene-edit.patch + allowed_paths: + - story/chapters + + - id: normalize_edit + type: patch_normalizer + output: normalized-edit.patch + + - id: validate_edit + type: patch_validator + output: edit-validation.md + max_files: 2 + max_lines: 1200 + max_delete_ratio: 0.50 + allowed_paths: + - story/chapters + on_fail: edit_scene + + - id: apply_edit + type: patch_apply + mode: apply + output: edit-apply-output.txt + on_fail: edit_scene + on_pass: continuity_review - id: update_state type: file_writer diff --git a/tests/test_agents.py b/tests/test_agents.py index fbeaded..04d7f4f 100644 --- a/tests/test_agents.py +++ b/tests/test_agents.py @@ -4,7 +4,7 @@ import unittest from unittest.mock import MagicMock, patch from nightshift.agents import AgentExecutor, build_prompt_bundle, parse_review_output, strip_ansi_escape_sequences -from nightshift.agents import AgentInvocation, format_agent_invocation +from nightshift.agents import AgentInvocation, format_agent_invocation, format_agent_invocation_json from 
nightshift.artifacts import ArtifactStore from nightshift.config import AgentConfig, StageConfig from nightshift.tasks import parse_tasks @@ -93,7 +93,7 @@ class AgentExecutorTests(unittest.TestCase): self.assertIn("Use only paths under these project-relative targets: `story/chapters`.", prompt) self.assertIn("This is the drafting stage", prompt) - self.assertIn("FILE: story/chapters/chapter-001/scene-001.md", prompt) + self.assertIn("FILE: ", prompt) self.assertIn("---CONTENT---", prompt) self.assertIn("---END---", prompt) self.assertIn("Do not use markdown code fences", prompt) @@ -125,6 +125,10 @@ class AgentExecutorTests(unittest.TestCase): output = (root / result.output_path).read_text(encoding="utf-8") self.assertIn("TASK-001", output) self.assertIn("Plan carefully.", output) + json_output = (root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id / "plan.json") + self.assertTrue(json_output.exists()) + self.assertIn('"stage_id": "plan"', json_output.read_text(encoding="utf-8")) + self.assertIn('"stdout"', json_output.read_text(encoding="utf-8")) def test_review_output_parser_accepts_structured_status(self) -> None: status, reason, next_stage, context_update = parse_review_output( @@ -238,6 +242,23 @@ class AgentExecutorTests(unittest.TestCase): self.assertIn("Agent: `planner`", output) self.assertIn("## stderr", output) + def test_agent_invocation_json_preserves_raw_streams(self) -> None: + invocation = AgentInvocation( + agent_id="planner", + command="cmd", + prompt="prompt with ``` fences", + exit_code=0, + stdout="stdout with ``` fences", + stderr="stderr", + duration_seconds=0.1, + ) + + output = format_agent_invocation_json("plan", invocation) + + self.assertIn('"stage_id": "plan"', output) + self.assertIn('stdout with ``` fences', output) + self.assertIn('prompt with ``` fences', output) + def test_strip_ansi_escape_sequences(self) -> None: self.assertEqual(strip_ansi_escape_sequences("\x1b[?25lthinking\x1b[0m"), "thinking") diff --git 
a/tests/test_config.py b/tests/test_config.py index 865219e..8d868d8 100644 --- a/tests/test_config.py +++ b/tests/test_config.py @@ -55,6 +55,40 @@ class ConfigTests(unittest.TestCase): with self.assertRaisesRegex(ConfigError, "on_fail references unknown stage"): load_config(config_path) + def test_on_pass_must_reference_existing_stage(self) -> None: + with tempfile.TemporaryDirectory() as directory: + root = Path(directory) + init_project(root) + config_path = root / "nightshift.yaml" + config_path.write_text( + config_path.read_text(encoding="utf-8").replace( + "on_fail: plan", "on_pass: missing_stage", 1 + ), + encoding="utf-8", + ) + + with self.assertRaisesRegex(ConfigError, "on_pass references unknown stage"): + load_config(config_path) + + def test_on_pass_loads(self) -> None: + with tempfile.TemporaryDirectory() as directory: + root = Path(directory) + init_project(root) + config_path = root / "nightshift.yaml" + config_path.write_text( + config_path.read_text(encoding="utf-8").replace( + " output: plan.md", + " output: plan.md\n on_pass: summarize", + 1, + ), + encoding="utf-8", + ) + + config = load_config(config_path) + plan_stage = next(stage for stage in config.pipeline.stages if stage.id == "plan") + + self.assertEqual(plan_stage.on_pass, "summarize") + def test_validate_requires_prompt_files(self) -> None: with tempfile.TemporaryDirectory() as directory: root = Path(directory) diff --git a/tests/test_patches.py b/tests/test_patches.py index 17cbb1d..72a7dcf 100644 --- a/tests/test_patches.py +++ b/tests/test_patches.py @@ -260,6 +260,47 @@ Sunlight did not belong here. self.assertEqual(updates[0].path, "story/chapters/chapter-001/scene-001.md") self.assertEqual(updates[0].content, "Sunlight did not belong here.\n") + def test_file_updates_parse_delimiters_without_end_before_next_file(self) -> None: + updates = parse_file_updates( + """Intro prose is ignored. + +FILE: story/plot-state.md +---CONTENT--- +# Plot State + +- Scene two happened. 
+ +FILE: story/timeline.md +---CONTENT--- +# Timeline + +- SCENE-002 complete. +""" + ) + + self.assertEqual(len(updates), 2) + self.assertEqual(updates[0].path, "story/plot-state.md") + self.assertEqual(updates[0].content, "# Plot State\n\n- Scene two happened.\n") + self.assertEqual(updates[1].path, "story/timeline.md") + self.assertEqual(updates[1].content, "# Timeline\n\n- SCENE-002 complete.\n") + + def test_file_updates_parse_mixed_delimiter_end_and_next_file(self) -> None: + updates = parse_file_updates( + """FILE: story/plot-state.md +---CONTENT--- +first +---END--- + +FILE: story/timeline.md +---CONTENT--- +second +""" + ) + + self.assertEqual(len(updates), 2) + self.assertEqual(updates[0].content, "first\n") + self.assertEqual(updates[1].content, "second\n") + def test_file_updates_reject_duplicate_blocks(self) -> None: with tempfile.TemporaryDirectory() as directory: root = Path(directory) @@ -334,6 +375,48 @@ new self.assertEqual(patch.count("diff --git a/app.py b/app.py"), 1) + def test_file_updates_reject_character_pronoun_canon_changes(self) -> None: + with tempfile.TemporaryDirectory() as directory: + root = Path(directory) + (root / "story").mkdir() + (root / "story" / "characters.md").write_text( + """# Characters + +## Cricket + +### Pronouns / Reference +- Pronouns: she/her +- Narrative reference: Cricket; she/her + +Scavenger. +""", + encoding="utf-8", + ) + safety = SafetyConfig( + require_clean_worktree=False, + scoped_paths=("story",), + allowed_commands=(), + forbidden_commands=(), + ) + updates = parse_file_updates( + """FILE: story/characters.md +---CONTENT--- +# Characters + +## Cricket + +### Pronouns / Reference +- Pronouns: they/them +- Narrative reference: Cricket; they/them + +Scavenger. 
+---END--- +""" + ) + + with self.assertRaisesRegex(PipelineError, "protected character pronoun canon changed"): + generate_patch_from_file_updates(updates, root, safety) + if __name__ == "__main__": unittest.main() diff --git a/tests/test_pipeline.py b/tests/test_pipeline.py index 2076c8c..f472bdf 100644 --- a/tests/test_pipeline.py +++ b/tests/test_pipeline.py @@ -105,6 +105,30 @@ class PipelineRunnerTests(unittest.TestCase): ) self.assertIn("Modified Files", (root / ".nightshift" / "runs" / "test-run" / "run-summary.md").read_text(encoding="utf-8")) + def test_on_pass_jumps_to_configured_stage(self) -> None: + with tempfile.TemporaryDirectory() as directory: + root = Path(directory) + _write_common_files(root) + stages = ( + StageConfig(id="first", type="agent", agent="planner", output="first.md", on_pass="third"), + StageConfig( + id="second", + type="command", + commands=('python -c "print(\'should not run\')"',), + output="second-output.txt", + ), + StageConfig(id="third", type="summarize", output="final-notes.md"), + ) + config = make_config(root, stages) + runner = PipelineRunner(config, ArtifactStore(root, ".nightshift", run_id="test-run")) + + result = runner.run_task(parse_tasks(TASK_MD)[0]) + + task_dir = root / ".nightshift" / "runs" / "test-run" / "tasks" / "TASK-001" + self.assertEqual(result.status, "complete") + self.assertEqual([item.stage_id for item in result.stage_results], ["first", "third"]) + self.assertFalse((task_dir / "second-output.txt").exists()) + def test_task_preflight_fails_when_task_specific_test_file_is_missing(self) -> None: with tempfile.TemporaryDirectory() as directory: root = Path(directory) @@ -153,6 +177,46 @@ class PipelineRunnerTests(unittest.TestCase): self.assertIn("Retry limit reached", result.reason) self.assertEqual([item.stage_id for item in result.stage_results], ["implement", "review", "implement", "review", "implement", "review"]) + def test_failing_review_self_next_stage_routes_to_on_fail(self) -> None: + with 
tempfile.TemporaryDirectory() as directory: + root = Path(directory) + _write_common_files(root) + config = make_config(root, (), max_retries=1) + config.agents["reviewer"] = AgentConfig( + id="reviewer", + backend="command", + command=( + "python -c \"print('status: fail\\nreason: needs draft repair\\n" + "next_stage: review\\ncontext_update: add concrete details')\"" + ), + system_prompt=Path("reviewer.md"), + ) + config = replace( + config, + pipeline=PipelineConfig( + max_task_retries=1, + stages=( + StageConfig(id="implement", type="agent", agent="planner", output="implementation-log.md"), + StageConfig( + id="review", + type="agent_review", + agent="reviewer", + on_fail="implement", + output="review.md", + ), + ), + ), + ) + runner = PipelineRunner(config, ArtifactStore(root, ".nightshift", run_id="test-run")) + task = parse_tasks(TASK_MD)[0] + + result = runner.run_task(task) + + self.assertEqual(result.retry_count, 1) + self.assertEqual([item.stage_id for item in result.stage_results], ["implement", "review", "implement", "review"]) + log = (root / ".nightshift" / "runs" / "test-run" / "run.log").read_text(encoding="utf-8") + self.assertIn("next_stage=implement", log) + def test_malformed_review_gets_strict_retry_without_redrafting(self) -> None: with tempfile.TemporaryDirectory() as directory: root = Path(directory) @@ -544,6 +608,34 @@ Acceptance Criteria: self.assertIn("response = self.client.get('/board/general')", note) self.assertIn("self.assertEqual(response.status_code, 200)", note) + def test_state_update_retry_note_guides_deletion_heavy_repairs(self) -> None: + with tempfile.TemporaryDirectory() as directory: + root = Path(directory) + _write_common_files(root) + artifacts = ArtifactStore(root, ".nightshift", run_id="test-run") + config = make_config(root, ()) + runner = PipelineRunner(config, artifacts) + output_path = artifacts.write_stage_output( + "TASK-001", + "state-validation.md", + "# Patch Validation\n\nStatus: fail\nReason: Patch 
validation failed: deletion-heavy patch exceeds max_delete_ratio 0.35.\n", + ) + + note = runner._format_retry_note( + 1, + StageConfig(id="validate_state", type="patch_validator", on_fail="update_state"), + StageResult( + stage_id="validate_state", + status="fail", + reason="Patch validation failed: deletion-heavy patch exceeds max_delete_ratio 0.35.", + output_path=str(output_path.relative_to(root)), + ), + "update_state", + ) + + self.assertIn("preserve existing durable state text", note) + self.assertIn("minimal additive edits", note) + def test_code_writer_normalizer_and_validator_pipeline(self) -> None: with tempfile.TemporaryDirectory() as directory: root = Path(directory) @@ -892,6 +984,60 @@ Acceptance Criteria: self.assertIn("... ", retry_prompt) self.assertLess(len(retry_prompt), 9000) + def test_state_file_writer_invalid_output_retry_uses_delimiter_format(self) -> None: + with tempfile.TemporaryDirectory() as directory: + root = Path(directory) + _write_common_files(root) + story = root / "story" + story.mkdir() + (story / "plot-state.md").write_text("old\n", encoding="utf-8") + (root / "fake_writer.py").write_text( + "\n".join( + [ + "import sys", + "prompt = sys.stdin.read()", + "if 'Previous file_writer output was invalid' not in prompt:", + " print('lookup failed')", + "else:", + " (open('retry-prompt.txt', 'w', encoding='utf-8').write(prompt))", + " print('FILE: story/plot-state.md')", + " print('---CONTENT---')", + " print('old')", + " print('new')", + " print('---END---')", + ] + ), + encoding="utf-8", + ) + stages = ( + StageConfig( + id="update_state", + type="file_writer", + agent="writer", + allowed_paths=( + "story/plot-state.md", + "story/characters.md", + "story/timeline.md", + "story/unresolved-threads.md", + ), + ), + ) + config = make_config(root, stages) + config.agents["writer"] = AgentConfig( + id="writer", + backend="command", + command="python fake_writer.py", + system_prompt=Path("planner.md"), + ) + runner = 
PipelineRunner(config, ArtifactStore(root, ".nightshift", run_id="test-run")) + + result = runner.run_task(parse_tasks(TASK_MD)[0]) + + retry_prompt = (root / "retry-prompt.txt").read_text(encoding="utf-8") + self.assertEqual(result.status, "complete") + self.assertIn("Use delimiter file blocks only", retry_prompt) + self.assertNotIn("Use complete fenced file blocks", retry_prompt) + def test_file_writer_retry_compacts_large_previous_outputs(self) -> None: outputs = { "scene-draft.patch": "a" * 5000, @@ -904,6 +1050,162 @@ Acceptance Criteria: self.assertLess(len(compacted["scene-draft.patch"]), 180) self.assertEqual(compacted["draft-validation.md"], "Patch validation failed") + def test_file_writer_first_attempt_preserves_large_previous_outputs(self) -> None: + outputs = {"plan": "a" * 5000} + + compacted = _file_writer_previous_outputs(outputs, retry_count=0, max_chars=100) + + self.assertEqual(compacted["plan"], "a" * 5000) + + def test_file_writer_previous_outputs_strip_wrapped_agent_prompts(self) -> None: + output = "\n".join( + [ + "# Agent Output: plan", + "", + "## stdout", + "", + "```text", + "useful plan", + "```", + "", + "## stderr", + "", + "```text", + "```", + "", + "## Prompt", + "", + "```markdown", + "huge prompt marker", + "```", + ] + ) + + compacted = _file_writer_previous_outputs({"plan": output}, retry_count=0) + + self.assertEqual(compacted["plan"], "useful plan") + self.assertNotIn("huge prompt marker", compacted["plan"]) + + def test_state_update_file_writer_gets_focused_context_and_current_files(self) -> None: + with tempfile.TemporaryDirectory() as directory: + root = Path(directory) + _write_common_files(root) + (root / "story").mkdir() + (root / "story" / "plot-state.md").write_text("# Plot State\n\n- Before\n", encoding="utf-8") + (root / "fake_state_writer.py").write_text( + "\n".join( + [ + "import sys", + "prompt = sys.stdin.read()", + "open('state-prompt.txt', 'w', encoding='utf-8').write(prompt)", + "if 'current_allowed_files' 
in prompt and 'huge-plan-marker' not in prompt:", + " print('FILE: story/plot-state.md')", + " print('---CONTENT---')", + " print('# Plot State')", + " print()", + " print('- Before')", + " print('- After')", + " print('---END---')", + "else:", + " print('')", + ] + ), + encoding="utf-8", + ) + config = make_config( + root, + ( + StageConfig(id="plan", type="agent", agent="planner", output="plan.md"), + StageConfig( + id="update_state", + type="file_writer", + agent="state_updater", + allowed_paths=("story/plot-state.md",), + ), + ), + ) + config.agents["planner"] = AgentConfig( + id="planner", + backend="command", + command="python -c \"print('huge-plan-marker' * 1000)\"", + system_prompt=Path("planner.md"), + ) + config.agents["state_updater"] = AgentConfig( + id="state_updater", + backend="command", + command="python fake_state_writer.py", + system_prompt=Path("planner.md"), + ) + runner = PipelineRunner(config, ArtifactStore(root, ".nightshift", run_id="test-run")) + + result = runner.run_task(parse_tasks(TASK_MD)[0]) + + prompt = (root / "state-prompt.txt").read_text(encoding="utf-8") + self.assertEqual(result.status, "complete") + self.assertIn("current_allowed_files", prompt) + self.assertIn("# Plot State", prompt) + self.assertNotIn("huge-plan-marker", prompt) + + def test_scene_editor_file_writer_gets_current_scene_file(self) -> None: + task_md = """# Tasks + +- [ ] SCENE-001: Edit scene + +Description: +Repair the scene. 
+ +Acceptance Criteria: +- Writes: +- `story/chapters/chapter-001/scene-001.md` +""" + with tempfile.TemporaryDirectory() as directory: + root = Path(directory) + _write_common_files(root) + (root / "tasks.md").write_text(task_md, encoding="utf-8") + scene_path = root / "story" / "chapters" / "chapter-001" / "scene-001.md" + scene_path.parent.mkdir(parents=True) + scene_path.write_text("Proxy walked home.\n", encoding="utf-8") + (root / "fake_editor.py").write_text( + "\n".join( + [ + "import sys", + "prompt = sys.stdin.read()", + "open('editor-prompt.txt', 'w', encoding='utf-8').write(prompt)", + "if 'current_scene_file' in prompt and 'Proxy walked home.' in prompt:", + " print('FILE: story/chapters/chapter-001/scene-001.md')", + " print('---CONTENT---')", + " print('Proxy walked home corrected.')", + " print('---END---')", + "else:", + " print('')", + ] + ), + encoding="utf-8", + ) + stages = ( + StageConfig( + id="edit_scene", + type="file_writer", + agent="editor", + allowed_paths=("story/chapters",), + ), + ) + config = make_config(root, stages) + config.agents["editor"] = AgentConfig( + id="editor", + backend="command", + command="python fake_editor.py", + system_prompt=Path("planner.md"), + ) + runner = PipelineRunner(config, ArtifactStore(root, ".nightshift", run_id="test-run")) + + result = runner.run_task(parse_tasks(task_md)[0]) + + prompt = (root / "editor-prompt.txt").read_text(encoding="utf-8") + self.assertEqual(result.status, "complete") + self.assertIn("current_scene_file", prompt) + self.assertIn("Proxy walked home.", prompt) + def test_patch_validator_rejects_unsafe_patch(self) -> None: with tempfile.TemporaryDirectory() as directory: root = Path(directory)