bugfixes after test run and terminal status

2026-06-14 18:18:36 +00:00 · 2026-05-21 04:14:05 -07:00 · 2026-05-21 04:14:05 -07:00 · 90e4c80116
commit 90e4c80116
parent fb575fc5f7
6 changed files with 533 additions and 281 deletions
--- a/docs/ideas.md
+++ b/docs/ideas.md
@@ -1,194 +1,17 @@
 # Ideas TODO
-This file is now prioritized inline. Priority scale:
+This file tracks open ideas only. Completed items should be removed after they land.
 Priority scale:
 - P0: do next; directly improves current feedback loop
 - P1: important after the current loop is usable
 - P2: useful, but only after basics are stable
 - P3: defer or maybe reject
 ## P0: Make Integration Tests Easy To Run
 Status: implemented.
 Implemented command:
 ```powershell
 python -m nightshift.cli integ-test --template tutorial-deaddrop --task TASK-001
 ```
 It creates the integration sandbox, sets up the venv, runs `nightshift validate` during setup, runs the task from the generated project directory, and prints the artifact root. Use `--dry-run` to preview the setup and task commands.
 Running integration tests is still too manual.
 Current process:
 - install the current version of NightShift
 - run `python -m nightshift.cli integ-run --template tutorial-deaddrop --setup`
 - copy the activation line from the output and run it
 - `cd` into the generated directory
 - run the task there, because running from the repo root does not find `nightshift.yaml`
 Recommendation: implement a wrapper command, not just a loose script.
 Target command:
 ```powershell
 python -m nightshift.cli integ-test --template tutorial-deaddrop --task TASK-001
 ```
 It should:
 1. create the integration run
 2. set up the venv
 3. install NightShift from the current checkout
 4. run `nightshift validate`
 5. run the selected task from the generated project directory
 6. print final status and artifact path
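A minimal sketch of the wrapper's core follows. It assumes step 1 has already generated the project directory and that the venv lives at `.venv` inside it. `IntegStep`, `plan_integ_test`, `run_plan`, the venv location, and the editable `pip install -e` are all hypothetical choices, not NightShift's implementation. The `validate` and `run --task` subcommands are the ones this document already uses. The point it illustrates is that every step carries the generated project as its `cwd`, so `nightshift.yaml` is found.

```python
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class IntegStep:
    """One command in the integ-test plan (hypothetical structure)."""
    name: str
    argv: tuple[str, ...]
    cwd: Path


def venv_python(venv_dir: Path) -> Path:
    # Windows venvs keep the interpreter in Scripts/, POSIX venvs in bin/.
    if sys.platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def plan_integ_test(checkout: Path, project_dir: Path, task_id: str) -> list[IntegStep]:
    """Steps 2-5 above; assumes step 1 already generated project_dir."""
    venv_dir = project_dir / ".venv"  # assumed venv location
    py = str(venv_python(venv_dir))
    return [
        IntegStep("create venv", (sys.executable, "-m", "venv", str(venv_dir)), project_dir),
        IntegStep("install nightshift", (py, "-m", "pip", "install", "-e", str(checkout)), project_dir),
        # Running from project_dir is what lets NightShift find nightshift.yaml.
        IntegStep("validate", (py, "-m", "nightshift.cli", "validate"), project_dir),
        IntegStep("run task", (py, "-m", "nightshift.cli", "run", "--task", task_id), project_dir),
    ]


def run_plan(steps: list[IntegStep], dry_run: bool = False) -> None:
    for step in steps:
        print(f"[{step.name}] cwd={step.cwd}: {' '.join(step.argv)}")
        if not dry_run:
            # check=True stops the wrapper at the first failing step.
            subprocess.run(step.argv, cwd=step.cwd, check=True)
```

Calling `run_plan(..., dry_run=True)` only prints the commands, roughly what `--dry-run` is described as doing. Step 6 (final status and artifact path) would follow the last step.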
 Useful variants:
 ```powershell
 python -m nightshift.cli integ-test --template tutorial-deaddrop --all
 python -m nightshift.cli integ-test --template tutorial-deaddrop --task TASK-002 --keep 3
 ```
 The base-directory config issue may not be a core bug, but it is bad UX. The wrapper should handle `cwd` correctly.
 ## P0/P1: Remove Multi-Candidate Workflow From Default DeadDrop
 Status: implemented for the default DeadDrop template and tutorial example.
 Original idea:
 - The multi-candidate workflow does not add as much as expected.
 - Keep it as an example, maybe `example-multiagent`.
 Recommendation: yes. Remove it from the default DeadDrop tutorial.
 Reason:
 - DeadDrop is becoming the reliability harness.
 - Multi-candidate fallback makes artifacts harder to reason about.
 - It adds model variability while we are still debugging pipeline behavior.
 Better split:
 ```text
 tutorial-deaddrop
 tutorial-deaddrop-multiagent
 ```
 or:
 ```text
 examples/templates/multiagent-fallback
 ```
 Default DeadDrop should be boring:
 ```text
 planner -> semantic_context -> context -> implement -> validate -> test -> review
 ```
 Use one strong implementer first. Add fallback only in a separate experiment template.
 ## P1: Add A Qwen3 / 30B DeadDrop Variant
 Status: implemented as the default DeadDrop model path using `qwen3-coder:30b`.
 Original idea:
 - Use a non-coder model for planner roles.
 - Try `qwen3.6:27b` for planning.
 - Use `qwen3-coder:30b` for implementer and code-heavy roles.
 Recommendation: viable, but make this a variant, not the default.
 kass reply: No, let's make this the default. `qwen3-coder:30b` is fast for me now, for some reason.
 Suggested template/config:
 ```text
 tutorial-deaddrop-qwen3
 ```
 Possible role split:
 - planner: `qwen3.6:27b`
 - reviewer/debugger: `qwen3.6:27b`
 - implementer: `qwen3-coder:30b` or exact local 30B coder model name
 Important: confirm exact model names with:
 ```powershell
 ollama list
 ```
 I did; it's `qwen3-coder:30b`.
 Use 30B where it pays:
 - first implementation for hard tasks
 - repair after concrete test failure
 - schema/database changes
 - multi-file changes
 Do not blindly make every stage 30B if it is slow.
 reply: It's not slow now! (`qwen3-coder:30b`)
 ## P2: Expose More Model Parameters
 Status: implemented for the practical first set.
 Supported optional Ollama fields now include `num_ctx`, `num_predict`, `seed`, and `stop`, in addition to existing `temperature`.
 Original question:
 - What else besides temperature is available?
 - Are any worth optimizing?
 Likely useful for Ollama:
 - `temperature`
 - `num_ctx`
 - `num_predict`
 - `seed`
 - `stop`
 - maybe `top_p`, `top_k`, `repeat_penalty`
 Recommendation: add only a small practical set first.
 Useful config shape:
 ```yaml
 temperature: 0.1
 num_ctx: 8192
 num_predict: 4096
 seed: 1
 ```
 Most useful:
 - `num_ctx`: larger repo/task context
 - `num_predict`: caps runaway output
 - `seed`: reproducibility, if supported consistently
 - `temperature`: already useful; keep low for code
 - `stop`: could help enforce file-block or diff-only contracts
 Defer tuning `top_p`, `top_k`, and `repeat_penalty` unless a specific model needs it.
 reply: Yup, let's put these in `nightshift.yaml` as optional parameters. If they aren't set, that's fine, but we should offer them.
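One way to keep these fields optional is sketched below. It copies only the keys present in an agent's config into the `options` object that Ollama's `/api/generate` and `/api/chat` endpoints accept. The function names and the flat agent-config dict are assumptions about how `nightshift.yaml` gets loaded.

```python
from typing import Any

# Optional fields from this section, forwarded to Ollama's "options" object.
OLLAMA_OPTION_FIELDS = ("temperature", "num_ctx", "num_predict", "seed", "stop")


def build_ollama_options(agent_config: dict[str, Any]) -> dict[str, Any]:
    """Copy only the optional fields that are set; missing fields are simply skipped."""
    options: dict[str, Any] = {}
    for name in OLLAMA_OPTION_FIELDS:
        value = agent_config.get(name)
        # Compare with None so that seed: 0 or temperature: 0 still get forwarded.
        if value is None:
            continue
        if name == "stop" and isinstance(value, str):
            # Ollama takes a list of stop sequences; accept a single string too.
            value = [value]
        options[name] = value
    return options


def build_generate_payload(model: str, prompt: str, agent_config: dict[str, Any]) -> dict[str, Any]:
    payload: dict[str, Any] = {"model": model, "prompt": prompt, "stream": False}
    options = build_ollama_options(agent_config)
    if options:
        payload["options"] = options
    return payload
```

Unknown keys such as `model` are ignored. Adding `top_p`, `top_k`, or `repeat_penalty` later would only mean extending the tuple.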
 ## P1: Add Test Governance For Generated Tests
-Original idea:
+Use this only for generated-test mode. Do not put generated tests back into the default DeadDrop fixed-test pipeline yet.
 - Have a test governance layer for when agents write tests.
 - A reviewer validates alignment with acceptance criteria.
 Recommendation: yes, but only for generated-test mode. Do not put generated tests back into default DeadDrop yet.
 The previous failures proved test-writing agents will:
@@ -208,19 +31,13 @@ Deterministic checks:
 - tests do not import undeclared dependencies
 - tests do not define Flask routes or app implementation
 - test names match current task id or current artifact
- no future-task keywords unless accepted by current task AC
+- no future-task keywords unless accepted by current task acceptance criteria
 Then optional model reviewer checks acceptance-criteria alignment.
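The deterministic checks above could be sketched with Python's `ast` module. `governance_problems` is hypothetical, not an existing NightShift function. It assumes the declared dependencies arrive as a set of top-level module names, and that test files follow the `tests/test_task002.py` naming used by the fixed tests. It only flags `.route` decorators and does not attempt the future-task keyword check.

```python
import ast
from pathlib import PurePosixPath


def governance_problems(source: str, filename: str, task_id: str, allowed_modules: set[str]) -> list[str]:
    """Deterministic checks for one generated test file; returns human-readable problems."""
    problems: list[str] = []
    # Assumed naming convention from the fixed tests: TASK-002 -> tests/test_task002.py.
    expected_name = f"test_{task_id.lower().replace('-', '')}.py"
    if PurePosixPath(filename).name != expected_name:
        problems.append(f"file name {filename} does not match {task_id}")
    for node in ast.walk(ast.parse(source)):
        # Collect absolute imports; relative imports (level > 0) stay inside the project.
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] not in allowed_modules:
                problems.append(f"undeclared import: {module}")
        # A decorator such as @app.route("/x") means the test file is defining app routes.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for decorator in node.decorator_list:
                target = decorator.func if isinstance(decorator, ast.Call) else decorator
                if isinstance(target, ast.Attribute) and target.attr == "route":
                    problems.append(f"defines a route: {node.name}")
    return problems
```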
 ## P2: Add A Test Analyzer Agent For TDD
-Original idea:
+Defer until generated tests are stable.
 - Analyze tests.
 - Translate them into direct instructions for the implementer.
 - Maybe implement using agent YAML definitions without new NightShift features.
 Recommendation: viable, but defer until generated tests are stable.
 Possible pipeline:
@@ -244,12 +61,7 @@ This may help smaller models, but it is another model output that can be wrong.
 ## P2/P3: Add A Test Planner
-Original idea:
+Maybe, but defer.
 - A test planner understands acceptance criteria and code.
 - Provides input to the next stage about constraints and code, especially for non-TDD.
 Recommendation: maybe, but defer.
 This overlaps with:
@@ -267,85 +79,8 @@ test_planner -> write_tests -> test_governance -> implement
 For now, fold this idea into the future test governance/analyzer work.
 ## P1: Add Fixed Tests For All DeadDrop Tasks
 Status: mostly implemented in the template.
 Current fixed tests:
 ```text
 tests/test_task001.py
 tests/test_task002.py
 tests/test_task003.py
 tests/test_task004.py
 tests/test_task005.py
 ```
 Important design:
 ```text
 python -m pytest -q tests/test_{task_id_compact}.py
 ```
 This lets all future task tests exist without breaking earlier tasks.
 Next step: validate these through integration runs, one task at a time.
 ## P1: Add `nightshift integ-report`
 Status: implemented as a first-pass artifact summarizer.
 New idea.
 Summarize latest integration run across tasks:
 ```text
 TASK-001 complete in 1 retry
 TASK-002 failed at validate_patch
 Root cause: protected tests modified
 Artifacts: ...
 ```
 Right now we inspect artifacts manually. NightShift should do more of that.
 Possible command:
 ```powershell
 python -m nightshift.cli integ-report --latest
 ```
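A first-pass formatter for that summary is sketched below. `TaskOutcome` and its fields are a hypothetical shape for what `integ-report` would pull out of run artifacts, and the status strings are assumed. Only the output lines mirror the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskOutcome:
    """Hypothetical per-task summary extracted from one integration run."""
    task_id: str
    status: str  # assumed values: "complete", "failed", ...
    retries: int = 0
    failed_stage: Optional[str] = None
    root_cause: Optional[str] = None
    artifacts: Optional[str] = None


def format_report(outcomes: list[TaskOutcome]) -> str:
    lines: list[str] = []
    for outcome in outcomes:
        if outcome.status == "complete":
            noun = "retry" if outcome.retries == 1 else "retries"
            lines.append(f"{outcome.task_id} complete in {outcome.retries} {noun}")
        else:
            stage = f" at {outcome.failed_stage}" if outcome.failed_stage else ""
            lines.append(f"{outcome.task_id} {outcome.status}{stage}")
            if outcome.root_cause:
                lines.append(f"Root cause: {outcome.root_cause}")
        # Point at the artifact directory so the summary is still one click from the details.
        if outcome.artifacts:
            lines.append(f"Artifacts: {outcome.artifacts}")
    return "\n".join(lines)
```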
 ## P1: Add Task-Test Preflight To `validate`
 Status: implemented.
 `nightshift validate` now renders task command placeholders for every task and fails early if a configured `tests/test_*.py` path is missing.
 Partially implemented at run time.
 Current behavior:
 - task command placeholders can render paths like `tests/test_task002.py`
 - `run_task` preflight fails before invoking agents if the task-specific test file is missing
 Better behavior:
 ```powershell
 nightshift validate
 ```
 should report each task's expected test file and fail early when one is missing:
 ```text
 TASK-003 expects tests/test_task003.py and it exists.
 TASK-004 expects tests/test_task004.py and it exists.
 ```
 This catches missing fixed tests earlier.
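A sketch of that preflight follows. It assumes `{task_id_compact}` means the lower-cased task id with its hyphen removed, which is inferred from `TASK-002` mapping to `tests/test_task002.py`. It also assumes commands are plain `str.format` templates. `render_command` and `missing_task_tests` are illustrative names, not the functions `nightshift validate` actually uses.

```python
import re
from pathlib import Path

# Matches task test paths such as tests/test_task002.py inside a rendered command.
TEST_PATH = re.compile(r"tests/test_\w+\.py")


def render_command(template: str, task_id: str) -> str:
    # Assumed convention: TASK-002 -> task002.
    compact = task_id.lower().replace("-", "")
    return template.format(task_id=task_id, task_id_compact=compact)


def missing_task_tests(project_root: Path, template: str, task_ids: list[str]) -> list[str]:
    """Return one message per task whose rendered command points at a missing test file."""
    problems: list[str] = []
    for task_id in task_ids:
        command = render_command(template, task_id)
        for match in TEST_PATH.findall(command):
            if not (project_root / match).is_file():
                problems.append(f"{task_id} expects {match} but it is missing")
    return problems
```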
 ## P2: Add Run Comparison
 New idea.
 Useful once comparing 14B vs 30B:
 ```powershell
@@ -364,3 +99,28 @@ Show:
 This should come after `integ-test` and `integ-report`.
 ## P2: Add A Separate Multiagent/Fallback DeadDrop Experiment
 Keep the default DeadDrop template boring and deterministic:
 ```text
 planner -> semantic_context -> context -> implement -> validate -> test -> review
 ```
 If fallback is useful, put it in a separate experiment template, for example:
 ```text
 tutorial-deaddrop-multiagent
 ```
 or:
 ```text
 examples/templates/multiagent-fallback
 ```
 Reason:
 - fallback makes artifacts harder to reason about
 - model variability is bad while debugging pipeline behavior
 - the default template should remain the reliability harness
--- a/docs/writer-idea.md
+++ b/docs/writer-idea.md
@@ -0,0 +1,396 @@
 # Agentic Novel Writing Workflow Idea
 NightShift could plausibly support non-coding workflows, especially long-form fiction, because the core abstraction is not actually "write code." It is:
 - read task context
 - call one or more agents
 - produce artifacts
 - validate outputs
 - update project state
 - move to the next task
 That maps surprisingly well to writing a novel.
 ## Core Realization
 A novel workflow should not ask one model to write the whole book, or even necessarily one whole chapter.
 The durable project files would act like the source of truth:
 - `worldbuilding.md`
 - `characters.md`
 - `plot-state.md`
 - `style-guide.md`
 - `outline.md`
 - `chapters/chapter-001.md`
 - `chapters/chapter-001-scene-001.md`
 - `tasks.md`
 The task file would drive the work, similar to coding tasks:
 ```text
 - [ ] SCENE-001: Opening scene at the border checkpoint
 Description:
 Write the opening scene where Mara tries to enter the city under a false work permit.
 Acceptance Criteria:
 - Introduces Mara's immediate goal
 - Shows the checkpoint culture without exposition dump
 - Mentions the salt tax conflict indirectly
 - Ends with the inspector noticing the forged seal
 - 900-1400 words
 - Maintains close third-person POV
 ```
 NightShift would run one scene or section at a time.
 ## What We Already Have
 NightShift already has several useful primitives:
 - task files for chunking the novel into scenes or chapter sections
 - scoped paths so agents only edit allowed writing/project files
 - artifact output so drafts, reviews, and notes are preserved
 - retry loops for revision
 - planner/reviewer/debugger-style roles
 - repo context and semantic context retrieval
 - command stages that could run deterministic checks
 - file-writer stages that can update Markdown files
 - `lookup_requests` so agents can ask to read worldbuilding or prior scenes
 That means this may not require a totally new engine. It may mostly need a new template and some writing-specific validation/review stages.
 ## Likely Workflow
 One practical pipeline:
 ```text
 plan_scene
 gather_context
 draft_scene
 validate_scene
 continuity_review
 style_review
 update_plot_state
 summarize
 ```
 Possible roles:
 - Planner: turns the scene task into a beat plan.
 - Context agent: pulls relevant worldbuilding, character, and plot-state excerpts.
 - Drafting agent: writes the scene.
 - Continuity reviewer: checks contradictions against known state.
 - Style reviewer: checks POV, tone, pacing, and prose constraints.
 - State updater: updates `plot-state.md`, `characters.md`, and maybe `timeline.md`.
 ## Chunking Strategy
 Do not make a task equal to "write chapter 4" unless chapters are short.
 Better units:
 - scene
 - scene fragment
 - chapter section
 - revision pass for one scene
 - continuity update after one scene
 - prose polish for one scene
 A chapter can be assembled from multiple scene files:
 ```text
 chapters/
  chapter-001/
    scene-001.md
    scene-002.md
    scene-003.md
  chapter-001.md
 ```
 Then a later command or agent stage can compile `chapter-001.md`.
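Compiling a chapter can be a deterministic command stage rather than an agent stage. The minimal sketch below assumes zero-padded `scene-NNN.md` names, so lexical order is story order, and puts a blank line between scenes. `compile_chapter` is a hypothetical helper.

```python
from pathlib import Path


def compile_chapter(chapter_dir: Path, output: Path, separator: str = "\n\n") -> int:
    """Join scene-*.md files from chapter_dir into output; return how many scenes were used."""
    # Zero-padded names (scene-001, scene-002, ...) make sorted() match story order.
    scenes = sorted(chapter_dir.glob("scene-*.md"))
    if not scenes:
        raise FileNotFoundError(f"no scene files in {chapter_dir}")
    parts = [scene.read_text(encoding="utf-8").strip() for scene in scenes]
    output.write_text(separator.join(parts) + "\n", encoding="utf-8")
    return len(scenes)
```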
 ## Durable State Files
 The most important design piece is explicit state.
 Recommended files:
 ```text
 story/
  worldbuilding.md
  style-guide.md
  characters.md
  timeline.md
  plot-state.md
  unresolved-threads.md
  continuity-rules.md
  outline.md
  chapters/
 ```
 `plot-state.md` should be updated after every completed scene.
 It should track:
 - current character locations
 - known secrets
 - promises made to the reader
 - unresolved questions
 - relationships
 - injuries/resources/items
 - timeline date/time
 - what each POV character currently knows
 This is the fiction equivalent of application state.
 ## Validation Ideas
 Some checks can be deterministic:
 - word count range
 - file exists
 - only allowed files changed
 - Markdown heading format
 - no forbidden placeholders like `TODO`, `[insert]`, or `TBD`
 - no accidental author notes in final prose
 - required task terms are present
 - output compiles into a chapter file
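Several of these checks fit in one small function, sketched here with a hypothetical `scene_problems` helper. Word count is a plain whitespace split. Placeholder matching is case-sensitive, so a name like "Todorov" does not trip `TODO`. Required terms are matched case-insensitively. The bounds and terms would come from the task's acceptance criteria.

```python
# Placeholders named in this section; extend per project.
FORBIDDEN_PLACEHOLDERS = ("TODO", "[insert]", "TBD")


def scene_problems(text: str, min_words: int, max_words: int, required_terms: tuple[str, ...] = ()) -> list[str]:
    problems: list[str] = []
    word_count = len(text.split())  # rough count: whitespace-separated tokens
    if not min_words <= word_count <= max_words:
        problems.append(f"word count {word_count} outside {min_words}-{max_words}")
    for placeholder in FORBIDDEN_PLACEHOLDERS:
        # Case-sensitive on purpose, so "Todorov" does not match "TODO".
        if placeholder in text:
            problems.append(f"forbidden placeholder: {placeholder}")
    lowered = text.lower()
    for term in required_terms:
        if term.lower() not in lowered:
            problems.append(f"missing required term: {term}")
    return problems
```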
 Some checks need model review:
 - continuity with worldbuilding
 - character voice consistency
 - POV discipline
 - pacing
 - whether the scene satisfies the beat plan
 - whether exposition is too direct
 - whether the state update accurately reflects the scene
 The key is not to overtrust model review. It should produce actionable retry notes, not silently bless everything.
 ## What Might Be Missing
 ### 1. Better Non-Code Templates
 This likely needs a dedicated template:
 ```text
 tutorial-deaddrop
 tutorial-novel
 ```
 or:
 ```text
 writer-novel
 ```
 The template would include:
 - starter story files
 - writing prompts
 - task examples
 - validation commands
 - allowed paths
 - recommended pipeline
 ### 2. Better Markdown Patch/File Handling
 The current file-writer flow can work, but fiction output may be long. It may be safer to require complete file blocks for one scene file at a time.
 The workflow should avoid having an agent rewrite the whole novel or whole `plot-state.md` unless necessary.
 ### 3. Stronger State Update Governance
 The risky part is not drafting prose. The risky part is bad state updates.
 Example failure:
 - the scene says Mara never saw the prince
 - the state updater records that Mara recognized the prince
 - future scenes build on the wrong state
 A state update should probably be reviewed against the actual scene before being applied.
 Possible pipeline:
 ```text
 draft_scene -> review_scene -> propose_state_update -> review_state_update -> apply
 ```
 ### 4. Context Window Management
 Worldbuilding documents can get large.
 The agent should not receive the entire story bible every time. It should receive:
 - the current task
 - relevant worldbuilding excerpts
 - relevant character entries
 - recent scene summaries
 - current plot state
 - style guide
 Semantic search is probably enough for a first version, but a novel template may want a more explicit index:
 ```text
 world-index.md
 character-index.md
 location-index.md
 ```
 ### 5. Scene Dependency Tracking
 Coding tasks already have dependencies. Fiction tasks would need the same:
 ```text
 Dependencies:
 - SCENE-001
 - SCENE-002
 ```
 This prevents writing a later scene before the required earlier story state exists.
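The coding side already checks dependencies in the CLI (`validate_task_dependencies`, `ensure_dependencies_satisfied`). For scenes, the rule could look like the sketch below. `SceneTask` and `runnable_scenes` are hypothetical, and an unknown dependency id is treated as unmet.

```python
from dataclasses import dataclass, field


@dataclass
class SceneTask:
    task_id: str
    done: bool = False
    dependencies: list[str] = field(default_factory=list)


def runnable_scenes(tasks: list[SceneTask]) -> list[str]:
    """Incomplete scenes whose dependencies are all complete, in task-file order."""
    by_id = {task.task_id: task for task in tasks}
    runnable: list[str] = []
    for task in tasks:
        if task.done:
            continue
        # An unknown dependency id counts as unmet rather than silently passing.
        if all(dep in by_id and by_id[dep].done for dep in task.dependencies):
            runnable.append(task.task_id)
    return runnable
```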
 ### 6. Revision Workflows
 Writing is not only forward generation.
 Useful task types:
 - draft new scene
 - revise scene for pacing
 - revise dialogue
 - continuity repair
 - line edit
 - chapter assembly
 - chapter-level review
 - update outline after discovery writing
 NightShift can already represent these as tasks, but the prompts should distinguish them clearly.
 ### 7. Output Length Controls
 Long fiction output needs explicit limits.
 Use:
 - scene word count bounds
 - `num_predict`
 - task acceptance criteria
 - smaller scene files
 Do not ask for "write chapter 12" unless the chapter has already been broken into beats.
 ## Suggested First Template
 Start with a minimal `writer-novel` template.
 Files:
 ```text
 nightshift.yaml
 .nightshift/tasks.md
 .nightshift/agents/planner.md
 .nightshift/agents/drafter.md
 .nightshift/agents/continuity-reviewer.md
 .nightshift/agents/style-reviewer.md
 .nightshift/agents/state-updater.md
 story/worldbuilding.md
 story/characters.md
 story/style-guide.md
 story/plot-state.md
 story/timeline.md
 story/unresolved-threads.md
 story/chapters/.gitkeep
 ```
 Pipeline:
 ```text
 plan
 semantic_context
 context
 draft
 validate_draft
 continuity_review
 style_review
 update_state
 validate_state
 summarize
 ```
 Allowed paths:
 ```yaml
 scoped_paths:
  - story
  - .nightshift/tasks.md
 ```
 Draft stage allowed paths:
 ```yaml
 allowed_paths:
  - story/chapters
 ```
 State update stage allowed paths:
 ```yaml
 allowed_paths:
  - story/plot-state.md
  - story/characters.md
  - story/timeline.md
  - story/unresolved-threads.md
 ```
 That separation matters. The drafter should not freely rewrite the world bible, and the state updater should not rewrite the scene prose.
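Enforcing that separation comes down to a path check on each stage's changed files. The sketch below uses the allowed paths above with hypothetical `is_allowed` and `disallowed_changes` helpers. It matches whole path components, so `story/chapters-old/` does not pass as `story/chapters`. It assumes paths are already normalized relative to the project root (no `..`).

```python
from pathlib import PurePosixPath

# Allowed paths per stage, copied from the YAML fragments above.
STAGE_ALLOWED_PATHS = {
    "draft": ("story/chapters",),
    "update_state": (
        "story/plot-state.md",
        "story/characters.md",
        "story/timeline.md",
        "story/unresolved-threads.md",
    ),
}


def is_allowed(path: str, allowed: tuple[str, ...]) -> bool:
    candidate = PurePosixPath(path)
    # An entry matches itself or anything beneath it, compared by path components.
    return any(candidate == PurePosixPath(entry) or PurePosixPath(entry) in candidate.parents for entry in allowed)


def disallowed_changes(stage: str, changed_files: list[str]) -> list[str]:
    allowed = STAGE_ALLOWED_PATHS[stage]
    return [path for path in changed_files if not is_allowed(path, allowed)]
```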
 ## What We Should Not Do First
 Do not start with:
 - automatic full-plot generation
 - full chapter generation
 - global rewrites of all prior chapters
 - one giant `worldbuilding.md` dumped into every prompt
 - trusting the model to maintain continuity without explicit state files
 Those are likely to produce impressive-looking but unstable output.
 ## Practical First Experiment
 A good first test:
 1. Create a tiny worldbuilding document.
 2. Create three characters.
 3. Create five scene tasks.
 4. Have NightShift draft one scene at a time.
 5. After each scene, update `plot-state.md`.
 6. Run continuity review against only the scene, state files, and relevant worldbuilding.
 7. Inspect artifacts.
 Success criteria:
 - scenes land in the right files
 - word counts stay bounded
 - state updates are accurate
 - future scenes use prior state correctly
 - reviewers catch obvious contradictions
 ## Bottom Line
 Theoretically, NightShift already has many of the needed utilities.
 The missing piece is mostly a writing-oriented template with:
 - scene-sized tasks
 - durable story state files
 - strict path separation between prose and state updates
 - writing-specific prompts
 - lightweight deterministic validators
 - continuity/style review stages
 This is viable, but it should start as a constrained scene-writing workflow, not an autonomous novel generator.
--- a/nightshift/cli.py
+++ b/nightshift/cli.py
@@ -55,7 +55,7 @@ def build_parser() -> argparse.ArgumentParser:
    run_parser.add_argument("--all", action="store_true", help="Run all runnable incomplete tasks.")
    run_parser.add_argument(
        "--animation",
-        default="agent_thinking",
+        default="status_dots",
        choices=tuple(sorted(HOTDOG_ANIMATIONS)),
        help="Terminal animation to show while the run is active.",
    )
@@ -210,13 +210,13 @@ def main(argv: list[str] | None = None) -> int:
            validate_task_dependencies(tasks)
            if args.all and args.task:
                parser.error("run accepts either --all or --task, not both.")
            runner = PipelineRunner(config, logger=RunLogger(console=print))
            if args.all:
                with TerminalAnimation(
                    args.animation,
-                    message="NightShift running all tasks",
+                    message="Starting all tasks",
                    enabled=not args.no_animation,
-                ):
+                ) as animation:
                    runner = PipelineRunner(config, logger=RunLogger(console=print, status=animation.update_message))
                    result = runner.run_tasks(tasks)
                print(f"Status: {result.status}")
                print(f"Tasks run: {len(result.task_results)}")
@@ -229,9 +229,10 @@ def main(argv: list[str] | None = None) -> int:
            ensure_dependencies_satisfied(tasks, task)
            with TerminalAnimation(
                args.animation,
-                message=f"NightShift running {task.id}",
+                message=f"Task: {task.id} | Starting",
                enabled=not args.no_animation,
-            ):
+            ) as animation:
                runner = PipelineRunner(config, logger=RunLogger(console=print, status=animation.update_message))
                result = runner.run_task(task)
            print(f"Task: {result.task_id}")
            print(style_text(f"Status: {result.status}", color=_status_color(result.status), bold=True))
--- a/nightshift/runlog.py
+++ b/nightshift/runlog.py
@@ -12,6 +12,7 @@ from .terminal import format_console_event_line, format_plain_event_line
 ConsoleWriter = Callable[[str], None]
 StatusWriter = Callable[[str], None]
 @dataclass(frozen=True)
@@ -24,8 +25,9 @@ class LogEvent:
 class RunLogger:
    """Write concise operational events to CLI and run log artifacts."""
-    def __init__(self, console: ConsoleWriter | None = None) -> None:
+    def __init__(self, console: ConsoleWriter | None = None, status: StatusWriter | None = None) -> None:
        self.console = console
        self.status = status
        self._run_log_path: Path | None = None
        self._aggregate_log_path: Path | None = None
        self._initialized_run_logs: set[Path] = set()
@@ -45,6 +47,10 @@ class RunLogger:
        line = format_plain_event_line(timestamp, event, message, safe_fields)
        if self.console is not None:
            self.console(format_console_event_line(timestamp, event, message, safe_fields))
        if self.status is not None:
            status_message = format_status_event_message(event, message, safe_fields)
            if status_message:
                self.status(status_message)
        for path in (self._run_log_path,):
            if path is None:
                continue
@@ -69,6 +75,39 @@ def format_log_line(log_event: LogEvent) -> str:
    return format_plain_event_line(timestamp, log_event.event, log_event.message, log_event.fields)
 def format_status_event_message(event: str, message: str, fields: dict[str, object]) -> str | None:
    task_id = str(fields.get("task_id", "") or "")
    retry = fields.get("retry_count")
    retry_text = f" retry {retry}" if retry not in (None, "") else ""
    stage_id = str(fields.get("stage_id", "") or "")
    stage_type = str(fields.get("stage_type", "") or "")
    agent_id = str(fields.get("agent_id", "") or "")
    model = str(fields.get("model", "") or "")
    command = str(fields.get("command", "") or "")
    status = str(fields.get("status", "") or "")
    next_stage = str(fields.get("next_stage", "") or "")
    prefix = f"Task: {task_id} | " if task_id else ""
    if event == "task.start":
        return f"Task: {task_id} | Starting" if task_id else "Starting task"
    if event == "stage.start" and stage_id:
        label = f"{stage_id} ({stage_type})" if stage_type else stage_id
        return f"{prefix}Stage: {label}{retry_text}"
    if event == "agent.start":
        model_text = f" | Model: {model}" if model else ""
        return f"{prefix}Agent: {agent_id or stage_id}{model_text}"
    if event == "command.start":
        return f"{prefix}Command: {command or stage_id}"
    if event == "stage.retry":
        return f"{prefix}Retrying after {stage_id} -> {next_stage}{retry_text}"
    if event in {"stage.finish", "task.finish"} and status:
        target = f"Stage: {stage_id}" if event == "stage.finish" and stage_id else "Task"
        return f"{prefix}{target} {status}"
    if event.endswith(".start"):
        return f"{prefix}{message}"
    return None
 def tail_lines(path: Path, limit: int = 100) -> list[str]:
    if limit <= 0:
        return []
--- a/nightshift/terminal.py
+++ b/nightshift/terminal.py
@@ -44,6 +44,13 @@ BANNER_MESSAGES = [
 quote = random.choice(BANNER_MESSAGES)
 HOTDOG_ANIMATIONS = {
    "status_dots": [
        "[.  ]",
        "[.. ]",
        "[...]",
        "[ ..]",
        "[  .]",
    ],
    "classic_dance": [
        "🌭",
        "ヽ(🌭)ﾉ",
@@ -158,6 +165,7 @@ class TerminalAnimation:
        self._stop = threading.Event()
        self._thread: threading.Thread | None = None
        self._width = 0
        self._lock = threading.Lock()
    def __enter__(self) -> "TerminalAnimation":
        self.start()
@@ -180,11 +188,17 @@ class TerminalAnimation:
        self._clear()
        self._thread = None
    def update_message(self, message: str) -> None:
        with self._lock:
            self.message = message
    def _run(self) -> None:
        index = 0
        while not self._stop.is_set():
            frame = self.frames[index % len(self.frames)]
-            text = f"{frame} {self.message}"
+            with self._lock:
                message = self.message
            text = f"{frame} | {message}"
            self._width = max(self._width, len(text))
            self.stream.write("\r" + text.ljust(self._width))
            self.stream.flush()
--- a/tests/test_terminal.py
+++ b/tests/test_terminal.py
@@ -5,7 +5,7 @@ import unittest
 from unittest.mock import patch
 from nightshift.artifacts import ArtifactStore
-from nightshift.runlog import RunLogger
+from nightshift.runlog import RunLogger, format_status_event_message
 from nightshift.terminal import (
    HOTDOG_ANIMATIONS,
    TerminalAnimation,
@@ -34,6 +34,7 @@ class TerminalStylingTests(unittest.TestCase):
    def test_animation_frames_fall_back_to_agent_thinking(self) -> None:
        self.assertEqual(animation_frames("missing"), tuple(HOTDOG_ANIMATIONS["agent_thinking"]))
        self.assertEqual(animation_frames("classic_dance"), tuple(HOTDOG_ANIMATIONS["classic_dance"]))
        self.assertEqual(animation_frames("status_dots"), tuple(HOTDOG_ANIMATIONS["status_dots"]))
    def test_terminal_animation_is_disabled_for_non_tty(self) -> None:
        stream = StringIO()
@@ -84,6 +85,47 @@ class TerminalStylingTests(unittest.TestCase):
            self.assertNotIn("\x1b[", run_log)
            self.assertNotIn("abc", run_log)
    def test_run_logger_status_callback_gets_compact_stage_message(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)
            artifacts = ArtifactStore(root, ".nightshift", run_id="test-run")
            statuses: list[str] = []
            logger = RunLogger(status=statuses.append)
            logger.bind(artifacts)
            logger.event(
                "stage.start",
                "Starting stage",
                task_id="TASK-001",
                stage_id="implement",
                stage_type="file_writer",
                retry_count=2,
            )
            logger.event(
                "agent.start",
                "Starting agent",
                task_id="TASK-001",
                agent_id="implementer",
                model="qwen3-coder:30b",
            )
            self.assertEqual(statuses[0], "Task: TASK-001 | Stage: implement (file_writer) retry 2")
            self.assertEqual(statuses[1], "Task: TASK-001 | Agent: implementer | Model: qwen3-coder:30b")
    def test_format_status_event_message_reports_retries(self) -> None:
        message = format_status_event_message(
            "stage.retry",
            "Redirecting after stage result",
            {
                "task_id": "TASK-001",
                "stage_id": "test",
                "next_stage": "implement",
                "retry_count": 1,
            },
        )
        self.assertEqual(message, "Task: TASK-001 | Retrying after test -> implement retry 1")
 if __name__ == "__main__":
    unittest.main()