Reliability improvements, integ test fixes

Isolate the editor from editing tests for the tutorial, hardcode tests for the integ test, and some fixes around isolation. We got the integ working!
2026-06-14 10:08:37 +00:00 · 2026-05-21 00:31:31 -07:00 · 2026-05-21 00:31:31 -07:00 · e3679296fd
commit e3679296fd
parent 8b07876552
24 changed files with 264 additions and 1128 deletions
--- a/docs/config-reference.md
+++ b/docs/config-reference.md
@@ -85,6 +85,7 @@ Patch validator stage options:
 - `max_files`: max files changed.
 - `max_lines`: max changed lines.
 - `max_delete_ratio`: reject deletion-heavy patches above this deleted-line share, from `0.0` to `1.0`.
+- `allowed_paths`: optional stage-specific allowlist. If set, every changed path must be inside one of these paths.
 - `forbidden_paths`: paths the patch must not touch.
 - Unified diff hunk line prefixes and hunk line counts are validated before patch apply.
 - The patch normalizer recomputes hunk line counts from hunk bodies for direct unified diff output.
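The `allowed_paths` rule added above ("every changed path must be inside one of these paths") can be illustrated with a minimal sketch. This is not the NightShift implementation: `is_allowed` is a hypothetical helper that works on relative POSIX-style paths only, whereas the real validator resolves paths against the project root before comparing them.

```python
from pathlib import PurePosixPath


def is_allowed(changed_path: str, allowed_paths: tuple[str, ...]) -> bool:
    """Hypothetical helper: True if changed_path sits inside an allowed path."""
    # An empty allowlist means the option is unset, so nothing is restricted.
    if not allowed_paths:
        return True
    target = PurePosixPath(changed_path)
    # This sketch does not resolve paths, so refuse any ".." segment outright.
    if ".." in target.parts:
        return False
    for allowed in allowed_paths:
        base = PurePosixPath(allowed)
        # Accept an exact match (a single allowed file) or any descendant.
        if target == base or base in target.parents:
            return True
    return False


# Assumed example values, mirroring the tutorial's stage configuration.
ALLOWED = ("src", "templates", "README.md")
```

Under these assumptions a patch touching `src/app.py` or `README.md` passes, while one touching `tests/test_task001.py` is rejected, which is how a stage can be kept from editing the fixed tests.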
--- a/docs/next_steps.md
+++ b/docs/next_steps.md
@@ -1,204 +0,0 @@
-# Next Steps: Editing Agent Support
-
-NightShift currently orchestrates agent prompts, command execution, artifacts, reports, retries, and context. It does not yet apply code changes by itself.
-
-To make NightShift actually edit code, add an implementation application layer between agent output and test execution.
-
-Current behavior:
-
-```text
-prompt -> model/command -> implementation-log.md
-```
-
-Desired editing behavior:
-
-```text
-prompt -> model/command -> proposed patch -> validate patch -> apply patch -> capture diff -> run tests
-```
-
-## 1. Define an Edit Contract
-
-The implementer needs to output a machine-usable edit format rather than freeform prose.
-
-Best first contract:
-
-```text
-unified diff patch
-```
-
-NightShift should accept one clear patch format and reject everything else. This matters because model output often includes commentary, markdown fences, partial files, or invalid patches.
-
-## 2. Add Patch Extraction and Validation
-
-NightShift needs to extract a proposed edit from implementer output and validate it before touching the repository.
-
-Validation should check:
-
-- patch is present
-- patch only touches paths inside the project root
-- patch only touches configured scoped paths
-- patch does not touch `.git/`, `.nightshift/`, secrets, or config unless allowed
-- patch does not delete large unrelated files
-- patch applies cleanly
-- patch size is reasonable
-- binary changes are rejected initially
-
-## 3. Add a Patch Applier
-
-Once a patch passes validation, NightShift applies it.
-
-Practical first option:
-
-```text
-git apply --check
-git apply
-```
-
-This is easier than writing a patch engine, but it means editing mode depends on git.
-
-Artifacts should include:
-
-```text
-proposed.patch
-patch-apply-output.txt
-diff.patch
-```
-
-## 4. Separate Implementation Generation From Patch Application
-
-Do not make the agent executor silently edit files.
-
-Better pipeline shape:
-
-```text
-plan
-review_plan
-implement
-apply_patch
-test
-static
-review
-```
-
-The implementer generates an artifact. A deterministic NightShift stage validates and applies it. This keeps model output separate from repository mutation.
-
-## 5. Define Failure and Retry Behavior
-
-If patch application succeeds but tests fail, NightShift needs an explicit policy.
-
-Safest early behavior:
-
-```text
-apply patch
-run tests
-if tests fail, keep changes and artifacts
-retry by generating another patch against current state
-```
-
-More advanced behavior could reverse failed patches, but that requires stronger state tracking.
-
-## 6. Feed Patch and Test Failures Into Retry Context
-
-Retry context should include compact facts such as:
-
-- previous patch failed to apply because X
-- tests failed with Y
-- reviewer objected to Z
-- files changed so far
-
-This makes retries useful without dumping full transcripts into prompts.
-
-## 7. Tighten Write Safety
-
-Editing needs stricter safety than logging.
-
-Add:
-
-- writable path allowlist
-- protected paths
-- max patch size
-- max files changed
-- max line count changed
-- no symlink following outside root
-- no writes to `.git`, `.nightshift`, virtualenvs, or lockfiles unless allowed
-- optional clean-worktree requirement before editing
-
-The current path safety is a start, but editing needs a dedicated write policy.
-
-## 8. Update Prompts
-
-The implementer prompt should require exact patch output.
-
-Example:
-
-```text
-Output only a unified diff.
-Do not include markdown fences.
-Do not include explanation.
-Only edit files needed for the task.
-Include tests when needed.
-```
-
-The reviewer should review the actual diff and test output, not just prose.
-
-## 9. Add Editing Safety Tests
-
-Important test cases:
-
-- valid patch applies
-- invalid patch fails cleanly
-- patch outside root is rejected
-- patch touching forbidden path is rejected
-- patch with no changes is rejected
-- failed apply writes artifacts
-- failed tests still produce reports
-- retry receives patch failure context
-- task is not marked complete unless patch, tests, and review pass
-
-## 10. Decide on Editing Modes
-
-There are two possible editing modes.
-
-### Patch Mode
-
-The model emits a patch. NightShift validates and applies it.
-
-Pros:
-
-- auditable
-- safer
-- deterministic application
-- easy to review
-
-Cons:
-
-- models sometimes emit malformed patches
-
-### Command Editing Mode
-
-An agent command directly edits files.
-
-Pros:
-
-- works with tools like Codex CLI or Claude Code
-- more capable
-
-Cons:
-
-- harder to sandbox
-- harder to know what happened without before/after diffs
-- needs stronger git and diff capture
-
-Recommended path:
-
-1. Implement patch mode first.
-2. Add command editing mode later behind stricter safety and artifact capture.
-
-Core principle:
-
-```text
-Agents propose.
-NightShift disposes.
-```
-
-The agent should not be trusted to mutate the repository directly until NightShift has a strong audit and safety layer around that mutation.
--- a/docs/vibe.md
+++ b/docs/vibe.md
@@ -1,831 +0,0 @@
-# NIGHTSHIFT_CODEX.md
-
-You are Codex working on **NightShift**, a local-first AI coding pipeline runner in python.
-
-This file is the implementation-driving context document. Treat it as the project brief, architectural guide, and task checklist.
-
---
-
-# 0. Project Identity
-
-## Name
-
-**NightShift**
-
-## Tagline
-
-Auditable local-first AI coding pipelines.
-
-## Core Thesis
-
-NightShift is not an autonomous coding god.
-
-NightShift is a deterministic pipeline runner that lets unreliable AI agents perform bounded coding work inside scoped, auditable, test-driven workflows.
-
-The user should be able to run NightShift overnight and wake up to:
-
-* a reviewable repository state
-* task artifacts
-* plans
-* logs
-* diffs
-* test output
-* review notes
-* a final report
-
-## Priority Order
-
-Optimize in this order:
-
-1. Cheapness
-2. Correctness
-3. Auditability
-4. Speed
-
-This means:
-
-* Prefer local models first.
-* Keep context compact.
-* Avoid token waste.
-* Make failure explicit.
-* Always produce artifacts.
-* Do not optimize for cleverness before trust.
-
---
-
-# 1. Product Summary
-
-NightShift runs long-running AI-assisted coding pipelines against a scoped project directory.
-
-A user provides:
-
-* a repository
-* a markdown task file
-* a declarative pipeline config
-* agent definitions
-* allowed test/static commands
-
-NightShift processes one task at a time:
-
-```text
-select task
-  -> plan
-  -> review plan
-  -> implement
-  -> run tests
-  -> run static checks
-  -> review result
-  -> retry or complete
-  -> write summary
-```
-
-The output is not automatically shipped.
-
-The output is a reviewable work package.
-
---
-
-# 2. Non-Negotiable Design Constraints
-
-## 2.1 Local-first
-
-The first implementation should assume local execution.
-
-Primary target backend:
-
-* local command-driven agent execution
-
-Future-compatible backends:
-
-* Ollama
-* Claude Code
-* Codex CLI
-* OpenAI API
-* Anthropic API
-
-Do not overbuild backend support in v1.
-
-Build a clean interface first.
-
---
-
-## 2.2 Scoped directory access
-
-NightShift must only operate inside a configured project root.
-
-It must not casually read/write arbitrary paths.
-
-All path resolution should:
-
-* normalize paths
-* reject path traversal
-* reject writes outside project root
-* prefer relative paths in artifacts
-
---
-
-## 2.3 One task at a time
-
-v1 runs one task at a time.
-
-No parallel task execution.
-
-No DAG executor yet.
-
---
-
-## 2.4 Declarative config first
-
-Use YAML for v1.
-
-Do not implement arbitrary Python config yet.
-
-The config should be expressive enough for:
-
-* agents
-* stages
-* commands
-* retries
-* artifact directory
-* task file location
-* scoped paths
-* allowlisted commands
-
---
-
-## 2.5 Auditable artifacts
-
-Every run should create a durable artifact tree.
-
-Artifacts are core product behavior, not debug leftovers.
-
---
-
-# 3. Architecture
-
-## 3.1 Conceptual Components
-
-```text
-NightShift CLI
-  |
-  v
-Config Loader
-  |
-  v
-Task Parser
-  |
-  v
-Pipeline Runner
-  |
-  +--> Agent Executor
-  |
-  +--> Command Executor
-  |
-  +--> Artifact Store
-  |
-  +--> Context Manager
-  |
-  v
-Run Summary
-```
-
---
-
-## 3.2 Suggested Module Layout
-
-Use this layout unless the existing repo already strongly implies another structure.
-
-```text
-nightshift/
-  __init__.py
-  cli.py
-  config.py
-  tasks.py
-  pipeline.py
-  stages.py
-  agents.py
-  commands.py
-  artifacts.py
-  context.py
-  safety.py
-  reports.py
-  errors.py
-
-tests/
-  test_config.py
-  test_tasks.py
-  test_pipeline.py
-  test_safety.py
-  test_artifacts.py
-
-examples/
-  pipeline.yaml
-  tasks.md
-  agents/
-    planner.md
-    implementer.md
-    reviewer.md
-
-NIGHTSHIFT_CODEX.md
-README.md
-```
-
-If this project is implemented in Rust instead of Python, preserve the same conceptual boundaries.
-
---
-
-# 4. Config Format
-
-## 4.1 Example `nightshift.yaml`
-
-```yaml
-project:
-  name: example-project
-  root: .
-  task_file: tasks.md
-  artifact_dir: .nightshift
-
-safety:
-  require_clean_worktree: false
-  scoped_paths:
-    - src/
-    - tests/
-  allowed_commands:
-    - cargo test
-    - cargo fmt --check
-    - cargo clippy -- -D warnings
-  forbidden_commands:
-    - rm -rf
-    - git push
-    - curl | bash
-
-agents:
-  planner:
-    backend: command
-    command: echo
-    system_prompt: examples/agents/planner.md
-
-  implementer:
-    backend: command
-    command: echo
-    system_prompt: examples/agents/implementer.md
-
-  reviewer:
-    backend: command
-    command: echo
-    system_prompt: examples/agents/reviewer.md
-
-pipeline:
-  max_task_retries: 3
-  stages:
-    - id: plan
-      type: agent
-      agent: planner
-      output: plan.md
-
-    - id: review_plan
-      type: agent_review
-      agent: reviewer
-      on_fail: plan
-      output: plan-review.md
-
-    - id: implement
-      type: agent
-      agent: implementer
-      output: implementation-log.md
-
-    - id: test
-      type: command
-      commands:
-        - cargo test
-      output: test-output.txt
-
-    - id: static
-      type: command
-      commands:
-        - cargo fmt --check
-        - cargo clippy -- -D warnings
-      output: static-output.txt
-
-    - id: review
-      type: agent_review
-      agent: reviewer
-      on_fail: implement
-      output: review.md
-
-    - id: summarize
-      type: summarize
-      output: final-notes.md
-```
-
---
-
-# 5. Task File Format
-
-## 5.1 Input Task Format
-
-Tasks are markdown checklist items with acceptance criteria.
-
-Example:
-
-```markdown
-# Tasks
-
-- [ ] TASK-001: Add YAML config loading
-
-Description:
-Implement config loading for NightShift.
-
-Acceptance Criteria:
-- Loads `nightshift.yaml`
-- Validates required fields
-- Returns typed config object
-- Includes tests for valid and invalid config
-
-- [ ] TASK-002: Add artifact directory creation
-
-Description:
-Create per-run and per-task artifact directories.
-
-Acceptance Criteria:
-- Creates `.nightshift/runs/<timestamp>/`
-- Creates task-specific folder
-- Writes task snapshot
-- Includes tests
-```
-
-## 5.2 Parser Requirements
-
-The parser should identify:
-
-* task id
-* task title
-* completion state
-* description
-* acceptance criteria
-* optional dependency notes
-
-For v1, parsing can be simple and documented.
-
-Do not try to support every markdown style.
-
---
-
-# 6. Pipeline Model
-
-## 6.1 State Machine, Not DAG
-
-v1 should use a configurable state machine.
-
-Reason:
-
-* one task at a time
-* retry loops matter
-* easier to audit
-* easier to debug
-* easier MVP
-
-A stage returns a `StageResult`.
-
-Suggested shape:
-
-```python
-@dataclass
-class StageResult:
-    stage_id: str
-    status: Literal["pass", "fail", "retry", "escalate"]
-    reason: str
-    output_path: str | None = None
-    next_stage: str | None = None
-    context_update: str | None = None
-```
-
-Equivalent Rust structs are fine if using Rust.
-
-## 6.2 Retry Behavior
-
-Retry behavior should be deterministic.
-
-Rules:
-
-* retries are counted per task
-* max retries come from config
-* failed review stages can redirect to configured `on_fail`
-* after max retries, task is marked failed
-* failure is summarized in artifacts
-
---
-
-# 7. Agent Model
-
-## 7.1 Agent Definition
-
-Agents have:
-
-* id
-* backend
-* command or model
-* system prompt file
-* role
-
-For v1, support a `command` backend first.
-
-This lets the user wrap:
-
-* Codex
-* Claude Code
-* Ollama scripts
-* local model scripts
-* fake test agents
-
-## 7.2 Agent Invocation
-
-The runner should construct a prompt/input bundle containing:
-
-* system prompt
-* task markdown
-* acceptance criteria
-* relevant project context
-* previous stage output
-* retry notes, if any
-* required output contract
-
-The agent should write output to the configured artifact path.
-
-Do not pass giant history blobs.
-
---
-
-# 8. Context System
-
-## 8.1 Context Layers
-
-There are three context layers:
-
-```text
-project context
-  long-lived, compact, shared across tasks
-
-task context
-  specific to the current task
-
-retry context
-  compact notes from failed attempts
-```
-
-## 8.2 Project Context
-
-Stored at:
-
-```text
-.nightshift/project-context.md
-```
-
-Contains:
-
-* architecture notes
-* repo conventions
-* summaries from completed tasks
-* high-value durable facts
-
-## 8.3 Task Context
-
-Stored per task:
-
-```text
-.nightshift/runs/<run-id>/tasks/<task-id>/context.md
-```
-
-## 8.4 Context Compaction
-
-After each task, write:
-
-```text
-context-out.md
-```
-
-Then selectively bubble useful durable information into project context.
-
-Do not automatically dump everything into project context.
-
---
-
-# 9. Artifact Layout
-
-Every run should create:
-
-```text
-.nightshift/
-  project-context.md
-  runs/
-    <run-id>/
-      run-summary.md
-      config.snapshot.yaml
-      tasks/
-        TASK-001/
-          task.md
-          plan.md
-          plan-review.md
-          implementation-log.md
-          test-output.txt
-          static-output.txt
-          review.md
-          final-notes.md
-          diff.patch
-          context.md
-          context-out.md
-```
-
-Artifacts should be written even on failure.
-
---
-
-# 10. Safety Rules
-
-## 10.1 Path Safety
-
-Implement helpers that:
-
-* resolve paths against project root
-* reject writes outside project root
-* reject `..` traversal that escapes root
-* prefer pathlib/path abstractions
-
-## 10.2 Command Safety
-
-For v1:
-
-* only run commands listed in `allowed_commands`
-* block commands containing known forbidden fragments
-* record all command output
-* record exit code
-* set timeouts when practical
-
-## 10.3 Git Safety
-
-v1 should support config option:
-
-```yaml
-require_clean_worktree: true | false
-```
-
-If true, abort when git working tree is dirty.
-
-Do not implement automatic branch creation in v1.
-
-Do not push.
-
---
-
-# 11. CLI Commands
-
-Recommended initial CLI:
-
-```bash
-nightshift init
-nightshift validate
-nightshift run
-nightshift run --task TASK-001
-nightshift status
-```
-
-## 11.1 `nightshift init`
-
-Creates example files:
-
-* `nightshift.yaml`
-* `tasks.md`
-* `agents/planner.md`
-* `agents/implementer.md`
-* `agents/reviewer.md`
-
-## 11.2 `nightshift validate`
-
-Validates:
-
-* config file exists
-* task file exists
-* scoped paths are inside root
-* agents exist
-* prompt files exist
-* allowed commands are valid strings
-* pipeline references valid agents
-
-## 11.3 `nightshift run`
-
-Runs the next incomplete task.
-
-## 11.4 `nightshift run --task TASK-001`
-
-Runs a specific task.
-
-## 11.5 `nightshift status`
-
-Prints:
-
-* current config
-* task count
-* completed/incomplete tasks
-* latest run directory
-
---
-
-# 12. Testing Strategy
-
-Write tests early.
-
-Minimum tests:
-
-* config loading happy path
-* config missing required fields
-* markdown task parsing
-* artifact directory creation
-* path traversal rejection
-* command allowlist behavior
-* forbidden command rejection
-* simple pipeline execution with fake agents
-* retry limit behavior
-
-Use fake agents for tests.
-
-Do not require real LLM calls in unit tests.
-
---
-
-# 14. Implementation Guidance
-
-## 14.1 Prefer boring code
-
-This project should be reliable.
-
-Do not make clever abstractions before the simple pipeline works.
-
-## 14.2 Tests are part of the product
-
-This is an AI automation safety tool.
-
-Tests are credibility.
-
-## 14.3 Make errors helpful
-
-Bad:
-
-```text
-ValueError: invalid config
-```
-
-Good:
-
-```text
-Config error: pipeline stage 'review_plan' references unknown agent 'critic'.
-Defined agents: planner, implementer, reviewer.
-```
-
-## 14.4 Do not assume real LLMs in tests
-
-Use fake command agents.
-
-Real model integration can come later.
-
-## 14.5 Keep artifacts human-readable
-
-Prefer markdown, YAML, and plain text.
-
---
-
-# 15. Suggested Agent Prompt Files
-
-## `agents/planner.md`
-
-```markdown
-You are the planning agent for NightShift.
-
-Your job is to create a conservative implementation plan for one coding task.
-
-Rules:
-- Do not write code.
-- Identify relevant files.
-- Preserve existing behavior.
-- Prefer small changes.
-- Include test strategy.
-- Include risks.
-
-Output:
-# Plan
-
-## Summary
-
-## Relevant Files
-
-## Steps
-
-## Test Strategy
-
-## Risks
-
-## Acceptance Criteria Mapping
-```
-
-## `agents/implementer.md`
-
-```markdown
-You are the implementation agent for NightShift.
-
-Your job is to implement the approved plan inside the scoped project directory.
-
-Rules:
-- Make the smallest correct change.
-- Do not edit files outside scope.
-- Do not skip tests intentionally.
-- Preserve existing style.
-- Write useful implementation notes.
-
-Output:
-# Implementation Notes
-
-## Changed Files
-
-## Summary
-
-## Tests Added or Updated
-
-## Risks
-
-## Follow-up Notes
-```
-
-## `agents/reviewer.md`
-
-```markdown
-You are the review agent for NightShift.
-
-Your job is to decide whether the current task should pass, retry implementation, retry planning, or fail.
-
-Priorities:
-1. Correctness
-2. Safety
-3. Acceptance criteria
-4. Maintainability
-5. Minimality
-
-Output exactly:
-
-status: pass | fail | retry | escalate
-reason: <short explanation>
-next_stage: <optional stage id>
-context_update: <compact useful note>
-```
-
---
-
-# 16. Definition of Done for MVP
-
-NightShift MVP is done when:
-
-* `nightshift init` creates a usable starter project
-* `nightshift validate` catches bad config
-* `nightshift run` can process one markdown task
-* pipeline stages execute in order
-* fake command agents work
-* command stages run safely
-* artifacts are written
-* retry limits work
-* final report is generated
-* tests cover core safety and pipeline behavior
-
---
-
-# 17. Future Features
-
-Do not implement these until MVP is stable:
-
-* DAG workflows
-* parallel tasks
-* Git branches per task
-* remote workers
-* cloud agent APIs
-* dashboard UI
-* prompt A/B testing
-* model cost telemetry
-* agent tournaments
-* constraint-language experiments
-* task dependency solver
-* self-improving prompt library
-
---
-
-# 18. Final Instruction to Codex
-
-Build this incrementally.
-
-Start with the smallest vertical slice:
-
-```text
-init -> validate -> parse one task -> create artifacts -> run fake pipeline -> write summary
-```
-
-Then add safety, retries, command execution, and real agent wrappers.
-
-Do not build the cathedral before the generator turns on.
-
-The goal is boring, auditable leverage.
--- a/examples/tutorial/03-pastebin/README.md
+++ b/examples/tutorial/03-pastebin/README.md
@@ -57,7 +57,7 @@ pyproject.toml
 README.md
 ```

-The template intentionally does not include a working Flask app or pre-generated task tests. For each task, NightShift first generates acceptance tests from the current task's acceptance criteria, reviews those tests for scope, and then asks the implementation agent to make them pass.
+The template includes a tiny Flask `create_app(database_path=None)` scaffold and fixed `TASK-001` tests. The default tutorial pipeline asks the implementation agent to make those deterministic tests pass before review.

 ## Prerequisites

@@ -86,7 +86,7 @@ NightShift uses Ollama's local HTTP API, normally at `http://localhost:11434`.

 ## Model Fallback

-The template writes tests with `qwen2.5-coder:14b`. The implementation stage uses this fallback order:
+The implementation stage uses this fallback order:

 1. `qwen2.5-coder:14b`
 2. `carstenuhlig/omnicoder-9b`
@@ -99,19 +99,20 @@ NightShift records which agent/model handled each stage in `telemetry-summary.md
 The task pipeline runs in this shape:

 ```text
-plan -> semantic_context -> context -> write_tests -> review_tests -> implement -> pytest -> review
+plan -> semantic_context -> context -> implement -> pytest -> review
 ```

-Generated tests should cover only the current task. They are expected to fail before implementation, so the pipeline reviews the test patch but does not run pytest until after the implementation patch is applied.
+The default template uses fixed task tests instead of model-generated tests. This keeps the tutorial focused on implementation and NightShift orchestration instead of letting a test-writing model invent an incompatible architecture.

 ## Task Plan

 The template writes the full task list to `.nightshift/tasks.md`. A copy is included here as [tasks.md](tasks.md).

 1. Snippet creation and viewing
-2. Snippet listing and filtering
-3. Expiration handling
-4. HTML forms and templates
+2. Snippet metadata fields
+3. Snippet listing and filtering
+4. Expiration handling
+5. HTML forms and templates

 Run one task first:

--- a/examples/tutorial/03-pastebin/nightshift.yaml
+++ b/examples/tutorial/03-pastebin/nightshift.yaml
@@ -13,7 +13,7 @@ safety:
    - pyproject.toml
    - README.md
  allowed_commands:
-    - python -m pytest -q
+    - python -m pytest -q tests/test_{task_id_compact}.py
  forbidden_commands:
    - rm -rf
    - git push
@@ -85,35 +85,6 @@ pipeline:
      type: repo_context
      output: context-pack.md

-    - id: write_tests
-      type: file_writer
-      agent: test_writer
-      output: proposed-tests.patch
-
-    - id: normalize_tests
-      type: patch_normalizer
-      output: normalized-tests.patch
-
-    - id: validate_tests_patch
-      type: patch_validator
-      output: test-patch-validation.md
-      max_files: 6
-      max_lines: 500
-      max_delete_ratio: 0.70
-      on_fail: write_tests
-
-    - id: apply_tests_patch
-      type: patch_apply
-      mode: apply
-      output: test-patch-apply-output.txt
-      on_fail: write_tests
-
-    - id: review_tests
-      type: agent_review
-      agent: reviewer
-      output: test-review.md
-      on_fail: write_tests
-
    - id: implement
      type: file_writer
      agent_pool:
@@ -132,6 +103,10 @@ pipeline:
      max_files: 12
      max_lines: 900
      max_delete_ratio: 0.70
+      allowed_paths:
+        - src
+        - templates
+        - README.md
      on_fail: implement

    - id: apply_patch
@@ -143,7 +118,7 @@ pipeline:
    - id: test
      type: command
      commands:
-        - python -m pytest -q
+        - python -m pytest -q tests/test_{task_id_compact}.py
      output: test-output.txt
      shell: true
      timeout_seconds: 25
--- a/examples/tutorial/03-pastebin/tasks.md
+++ b/examples/tutorial/03-pastebin/tasks.md
@@ -3,19 +3,32 @@
 - [ ] TASK-001: Snippet creation and viewing

 Description:
-Complete the pastebin service foundation. Support creating snippets with title, body, optional language, optional tags, and optional expiration date. Support viewing a single snippet by id.
+Complete the pastebin service foundation. Support creating snippets with title and body. Support viewing a single snippet by id.

 Acceptance Criteria:
 - POST `/snippets` creates a snippet with title and body
 - GET `/snippets/<id>` returns the snippet
-- Optional language, tags, and expires_at fields are persisted
 - Tests cover creation and viewing

-- [ ] TASK-002: Snippet listing and filtering
+- [ ] TASK-002: Snippet metadata fields

 Dependencies:
 - TASK-001

+Description:
+Persist optional language, tags, and expiration fields on snippets.
+
+Acceptance Criteria:
+- POST `/snippets` accepts optional language, tags, and expires_at fields
+- GET `/snippets/<id>` returns persisted metadata fields
+- Tags are serialized deterministically
+- Tests cover metadata persistence
+
+- [ ] TASK-003: Snippet listing and filtering
+
+Dependencies:
+- TASK-002
+
 Description:
 Add snippet listing with newest-first ordering and deterministic search/filter behavior.

@@ -26,10 +39,10 @@ Acceptance Criteria:
 - `tag` filters by tag
 - Tests cover listing, search, and filters

-- [ ] TASK-003: Expiration handling
+- [ ] TASK-004: Expiration handling

 Dependencies:
-- TASK-002
+- TASK-003

 Description:
 Hide expired snippets from list/search results while keeping direct lookup behavior explicit.
@@ -40,10 +53,10 @@ Acceptance Criteria:
 - Non-expiring snippets remain visible
 - Tests cover expired and active snippets

-- [ ] TASK-004: HTML forms and templates
+- [ ] TASK-005: HTML forms and templates

 Dependencies:
-- TASK-003
+- TASK-004

 Description:
 Add simple HTML pages for creating, listing, filtering, and viewing snippets.
@@ -54,4 +67,3 @@ Acceptance Criteria:
 - Creating a snippet redirects to the snippet view
 - Templates expose language, tags, and expiration fields
 - Tests cover HTML response status and redirects
-
--- a/nightshift/config.py
+++ b/nightshift/config.py
@@ -63,6 +63,7 @@ class StageConfig:
    max_files: int | None = None
    max_lines: int | None = None
    max_delete_ratio: float | None = None
+    allowed_paths: tuple[str, ...] = ()
    forbidden_paths: tuple[str, ...] = ()
    mode: str | None = None

@@ -381,6 +382,10 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
                max_files=max_files,
                max_lines=max_lines,
                max_delete_ratio=max_delete_ratio,
+                allowed_paths=_string_tuple(
+                    stage_raw.get("allowed_paths", []),
+                    f"{stage_context}.allowed_paths",
+                ),
                forbidden_paths=_string_tuple(
                    stage_raw.get("forbidden_paths", []),
                    f"{stage_context}.forbidden_paths",
--- a/nightshift/dependencies.py
+++ b/nightshift/dependencies.py
@@ -25,8 +25,20 @@ def diagnose_python_dependencies(project_root: Path, failure_output: str) -> Dep
        for relative in ("pyproject.toml", "requirements.txt", "poetry.lock", "uv.lock")
        if (project_root / relative).exists()
    )
+    local_imports = tuple(name for name in imports if _looks_like_local_module_name(name))
+    external_imports = tuple(name for name in imports if name not in local_imports)
    if not imports:
        recommendation = "No missing Python import was detected."
+    elif local_imports and not external_imports:
+        recommendation = (
+            "These look like local module import mistakes, not installable dependencies. "
+            "Use the configured package path or package-relative imports."
+        )
+    elif local_imports:
+        recommendation = (
+            "Some missing imports look like local module mistakes. Fix those imports first; "
+            "only add declared third-party packages for the remaining external imports."
+        )
    elif "pyproject.toml" in manifests:
        recommendation = "Add the missing package to pyproject.toml, then install with the configured tool."
    elif "requirements.txt" in manifests:
@@ -36,6 +48,11 @@ def diagnose_python_dependencies(project_root: Path, failure_output: str) -> Dep
    return DependencyDiagnostic(tuple(imports), manifests, recommendation)


+def _looks_like_local_module_name(name: str) -> bool:
+    root = name.split(".")[0].lower()
+    return root in {"app", "apps", "model", "models", "route", "routes", "view", "views", "main"}
+
+
 def format_dependency_diagnostic(diagnostic: DependencyDiagnostic) -> str:
    imports = "\n".join(f"- `{name}`" for name in diagnostic.missing_imports) or "- None"
    manifests = "\n".join(f"- `{name}`" for name in diagnostic.manifests) or "- None"
--- a/nightshift/failures.py
+++ b/nightshift/failures.py
@@ -8,6 +8,7 @@ import re

 FAILURE_CATEGORIES = (
    "syntax/import error",
+    "local import mismatch",
    "missing dependency",
    "missing resource/fixture",
    "environment/config issue",
@@ -37,11 +38,30 @@ def classify_failure(output: str, exit_code: int | None = None, modified_files:
    exception_name = _extract_exception_name(text)
    source_path, _ = _extract_traceback_location(text)

+    if re.search(r"\bno tests ran\b", text, re.IGNORECASE) or exit_code == 5:
+        return FailureClassification(
+            "test expectation mismatch",
+            "Pytest did not collect any tests; generated changes likely removed, renamed, or invalidated the test suite.",
+            0.84,
+            "Restore the expected tests or block the stage from editing test files.",
+            "repair test files or reject the patch that removed tests",
+            failing_tests,
+        )
+
    missing = re.search(r"No module named ['\"]([^'\"]+)['\"]", text, re.IGNORECASE)
    if not missing:
        missing = re.search(r"ModuleNotFoundError:\s*['\"]?([A-Za-z0-9_.-]+)", text, re.IGNORECASE)
    if missing:
        package = missing.group(1) or "unknown package"
+        if _looks_like_local_module_name(package):
+            return FailureClassification(
+                "local import mismatch",
+                f"Generated code imports local module `{package}` that does not match the project package layout.",
+                0.88,
+                "Repair imports to use the configured package path or package-relative imports.",
+                "retry the stage that introduced the bad import",
+                failing_tests,
+            )
        return FailureClassification(
            "missing dependency",
            f"Runtime cannot import required package `{package}`.",
@@ -194,6 +214,11 @@ def _looks_like_project_source(path: str) -> bool:
    return "/src/" in normalized or "/tests/" in normalized


+def _looks_like_local_module_name(name: str) -> bool:
+    root = name.split(".")[0].lower()
+    return root in {"app", "apps", "model", "models", "route", "routes", "view", "views", "main"}
+
+
 def _traceback_score(path: str) -> int:
    normalized = path.replace("\\", "/").lower()
    score = 0
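The ordering introduced in `classify_failure` above (the "no tests collected" check first, then local import mismatches before generic missing dependencies) can be condensed into a small standalone sketch. It assumes a simplified return type: the hypothetical `classify` below returns only the category string, and the `"unclassified"` fallback is a placeholder for everything the real function goes on to check.

```python
import re
from typing import Optional

# Same root names as the heuristic in the diff above.
LOCAL_ROOTS = {"app", "apps", "model", "models", "route", "routes", "view", "views", "main"}


def looks_like_local_module_name(name: str) -> bool:
    # Only the first dotted component decides whether a name looks local.
    root = name.split(".")[0].lower()
    return root in LOCAL_ROOTS


def classify(output: str, exit_code: Optional[int] = None) -> str:
    """Hypothetical condensed classifier returning only the category name."""
    # pytest exits with code 5 when it collects no tests.
    if re.search(r"\bno tests ran\b", output, re.IGNORECASE) or exit_code == 5:
        return "test expectation mismatch"
    missing = re.search(r"No module named ['\"]([^'\"]+)['\"]", output, re.IGNORECASE)
    if missing:
        if looks_like_local_module_name(missing.group(1)):
            return "local import mismatch"
        return "missing dependency"
    return "unclassified"  # placeholder; the real function checks more categories
```

The local check has to run before the generic one: a failed import of `models.snippet` would otherwise be reported as a missing third-party package and invite an install attempt instead of an import fix.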
--- a/nightshift/patches.py
+++ b/nightshift/patches.py
@@ -145,6 +145,7 @@ def validate_patch(
    max_files: int = DEFAULT_MAX_FILES,
    max_changed_lines: int = DEFAULT_MAX_CHANGED_LINES,
    max_delete_ratio: float | None = None,
+    allowed_paths: tuple[str, ...] = (),
    forbidden_paths: tuple[str, ...] = DEFAULT_FORBIDDEN_PATHS,
 ) -> PatchValidationResult:
    root = resolve_project_root(project_root)
@ -171,12 +172,26 @@ def validate_patch(

    for path_text in files:
        _validate_patch_path(path_text, root, scoped_roots, forbidden_paths)
+        _validate_allowed_patch_path(path_text, root, allowed_paths)
    _validate_hunk_lines(patch)
    _validate_hunk_counts(patch)
    _validate_file_states(patch, root)
    return PatchValidationResult(files=tuple(sorted(files)), changed_lines=changed_lines)


+def _validate_allowed_patch_path(path_text: str, root: Path, allowed_paths: tuple[str, ...]) -> None:
+    if not allowed_paths:
+        return
+    allowed_roots = validate_scoped_paths(root, allowed_paths)
+    target = resolve_inside_root(root, path_text, f"patch path '{path_text}'")
+    if not any(target == allowed_root or allowed_root in target.parents for allowed_root in allowed_roots):
+        allowed = ", ".join(allowed_paths)
+        raise PipelineError(
+            f"Patch validation failed: path `{path_text}` is not allowed for this stage. "
+            f"Allowed paths: {allowed}."
+        )
+
+
 def format_validation_result(result: PatchValidationResult) -> str:
    return "\n".join(
        [
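
The containment test used by `_validate_allowed_patch_path` can be illustrated on its own. This is a rough sketch under stated assumptions: `is_inside_allowed` is a hypothetical name, and plain `Path.resolve()` stands in for the project's `validate_scoped_paths` and `resolve_inside_root` helpers, whose bodies are not shown in this diff.

```python
from pathlib import Path


def is_inside_allowed(root: Path, path_text: str, allowed_paths: tuple[str, ...]) -> bool:
    # An empty allowlist means the stage imposes no extra restriction.
    if not allowed_paths:
        return True
    target = (root / path_text).resolve()
    # Assumption: allowed paths are relative to the project root.
    allowed_roots = [(root / allowed).resolve() for allowed in allowed_paths]
    # A path is allowed if it is exactly an allowed entry (a single file
    # such as README.md) or sits somewhere below an allowed directory.
    return any(
        target == allowed_root or allowed_root in target.parents
        for allowed_root in allowed_roots
    )
```

Comparing against `target.parents` rather than using a string prefix avoids accepting a sibling such as `src2/` when only `src` is allowed.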
--- a/nightshift/pipeline.py
+++ b/nightshift/pipeline.py
@ -879,6 +879,7 @@ class PipelineRunner:
                max_files=stage.max_files or DEFAULT_MAX_FILES,
                max_changed_lines=stage.max_lines or DEFAULT_MAX_CHANGED_LINES,
                max_delete_ratio=stage.max_delete_ratio,
+                allowed_paths=stage.allowed_paths,
                forbidden_paths=stage.forbidden_paths or DEFAULT_FORBIDDEN_PATHS,
            )
        except PipelineError as exc:
@ -923,6 +924,7 @@ class PipelineRunner:
                max_files=stage.max_files or DEFAULT_MAX_FILES,
                max_changed_lines=stage.max_lines or DEFAULT_MAX_CHANGED_LINES,
                max_delete_ratio=stage.max_delete_ratio,
+                allowed_paths=stage.allowed_paths,
                forbidden_paths=stage.forbidden_paths or DEFAULT_FORBIDDEN_PATHS,
            )
        except PipelineError as exc:
--- a/nightshift/project_templates/tutorial-pastebin/.nightshift/agents/implementer.md
+++ b/nightshift/project_templates/tutorial-pastebin/.nightshift/agents/implementer.md
@ -2,9 +2,13 @@ You are the implementation agent for the NightShift pastebin tutorial.

 Implement the smallest application change that satisfies the current task and the generated tests.
 Do not rewrite generated tests unless the retry context explicitly says they are inaccurate.
+Do not edit files under `tests/`. The tutorial tests are fixed; make the application satisfy them.
 Do not add behavior for future tasks unless needed to satisfy the current tests.
-Use Flask and sqlite from the standard library unless existing project files already introduce another framework.
+Use Flask and `sqlite3` from the Python standard library. Do not use SQLAlchemy, Flask-SQLAlchemy, or undeclared dependencies.
 Keep the public package name `pastebin_app`.
+Keep the public app entry point `create_app(database_path: str | None = None)`.
+Tests should interact through HTTP routes and `create_app`, not through ORM/session globals.
+Do not use `app.before_first_request`; recent Flask versions removed it. Initialize required database tables inside `create_app` or inside the route helper before use.

 Output only complete file content blocks.
 Use one fenced block per file:
--- a/nightshift/project_templates/tutorial-pastebin/.nightshift/agents/planner.md
+++ b/nightshift/project_templates/tutorial-pastebin/.nightshift/agents/planner.md
@ -9,6 +9,8 @@ Plan in this order:

 If repository context is needed, request it with lookup_requests.
 Prefer small edits and deterministic tests.
-Do not assume files outside the configured scoped paths exist.
-Do not propose SQLAlchemy unless existing repository files already use it.
+Use the actual package and files from repository context. For this tutorial the public app entry point is `pastebin_app.app:create_app`.
+Do not assume top-level modules such as `app`, `models`, `routes`, or `main` exist.
+Do not propose SQLAlchemy, Flask-SQLAlchemy, or ORM globals. Use Flask plus `sqlite3` from the Python standard library.
+Do not propose tests that import `session`, `Snippet`, `engine`, or other implementation internals.
 Do not write code.
--- a/nightshift/project_templates/tutorial-pastebin/.nightshift/agents/reviewer.md
+++ b/nightshift/project_templates/tutorial-pastebin/.nightshift/agents/reviewer.md
@ -2,6 +2,9 @@ You are the review agent for the NightShift pastebin tutorial.

 When reviewing generated tests, check that they map only to the current task acceptance criteria and do not require future-task behavior.
 When reviewing implementation, check that the change is small, deterministic, and satisfies the generated tests without unrelated rewrites.
+Fail generated tests if they touch files outside `tests/`.
+Fail generated tests if they import top-level `app`, `models`, `routes`, `session`, `Snippet`, `engine`, SQLAlchemy, or undeclared dependencies.
+Fail implementation if it removes `create_app`, introduces SQLAlchemy, or relies on app-level global database state instead of the configured database path.

 Output exactly:

--- a/nightshift/project_templates/tutorial-pastebin/.nightshift/agents/test-writer.md
+++ b/nightshift/project_templates/tutorial-pastebin/.nightshift/agents/test-writer.md
@ -3,6 +3,8 @@ You are the test-writing agent for the NightShift pastebin tutorial.
 Write only tests for the current task's acceptance criteria.
 Do not implement application code.
 Do not add tests for future tasks or behavior not named in the current task.
+Only output files under `tests/`.
+Never output files under `src/`, `templates/`, or project configuration paths.

 Output only complete file content blocks.
 Use one fenced block per file:
@ -13,4 +15,7 @@ Use one fenced block per file:
 Prefer pytest tests that describe the public behavior from the task.
 Keep tests deterministic and isolated with temporary databases or temporary paths.
 Use the existing package name `pastebin_app`.
-If the app factory does not exist yet, write tests for the expected public interface that the implementer should create.
+Import only the public app factory:
+`from pastebin_app.app import create_app`
+Do not import `app`, `session`, `Snippet`, `engine`, `models`, or top-level modules.
+Do not use SQLAlchemy or require undeclared dependencies.
--- a/nightshift/project_templates/tutorial-pastebin/.nightshift/tasks.md
+++ b/nightshift/project_templates/tutorial-pastebin/.nightshift/tasks.md
@ -3,19 +3,32 @@
 - [ ] TASK-001: Snippet creation and viewing

 Description:
-Complete the pastebin service foundation. Support creating snippets with title, body, optional language, optional tags, and optional expiration date. Support viewing a single snippet by id.
+Complete the pastebin service foundation. Support creating snippets with title and body. Support viewing a single snippet by id.

 Acceptance Criteria:
 - POST `/snippets` creates a snippet with title and body
 - GET `/snippets/<id>` returns the snippet
-- Optional language, tags, and expires_at fields are persisted
 - Tests cover creation and viewing

-- [ ] TASK-002: Snippet listing and filtering
+- [ ] TASK-002: Snippet metadata fields

 Dependencies:
 - TASK-001

+Description:
+Persist optional language, tags, and expiration fields on snippets.
+
+Acceptance Criteria:
+- POST `/snippets` accepts optional language, tags, and expires_at fields
+- GET `/snippets/<id>` returns persisted metadata fields
+- Tags are serialized deterministically
+- Tests cover metadata persistence
+
+- [ ] TASK-003: Snippet listing and filtering
+
+Dependencies:
+- TASK-002
+
 Description:
 Add snippet listing with newest-first ordering and deterministic search/filter behavior.

@ -26,10 +39,10 @@ Acceptance Criteria:
 - `tag` filters by tag
 - Tests cover listing, search, and filters

-- [ ] TASK-003: Expiration handling
+- [ ] TASK-004: Expiration handling

 Dependencies:
-- TASK-002
+- TASK-003

 Description:
 Hide expired snippets from list/search results while keeping direct lookup behavior explicit.
@ -40,10 +53,10 @@ Acceptance Criteria:
 - Non-expiring snippets remain visible
 - Tests cover expired and active snippets

-- [ ] TASK-004: HTML forms and templates
+- [ ] TASK-005: HTML forms and templates

 Dependencies:
-- TASK-003
+- TASK-004

 Description:
 Add simple HTML pages for creating, listing, filtering, and viewing snippets.
--- a/nightshift/project_templates/tutorial-pastebin/nightshift.yaml
+++ b/nightshift/project_templates/tutorial-pastebin/nightshift.yaml
@ -13,7 +13,7 @@ safety:
    - pyproject.toml
    - README.md
  allowed_commands:
-    - python -m pytest -q
+    - python -m pytest -q tests/test_{task_id_compact}.py
  forbidden_commands:
    - rm -rf
    - git push
@ -85,35 +85,6 @@ pipeline:
      type: repo_context
      output: context-pack.md

-    - id: write_tests
-      type: file_writer
-      agent: test_writer
-      output: proposed-tests.patch
-
-    - id: normalize_tests
-      type: patch_normalizer
-      output: normalized-tests.patch
-
-    - id: validate_tests_patch
-      type: patch_validator
-      output: test-patch-validation.md
-      max_files: 6
-      max_lines: 500
-      max_delete_ratio: 0.70
-      on_fail: write_tests
-
-    - id: apply_tests_patch
-      type: patch_apply
-      mode: apply
-      output: test-patch-apply-output.txt
-      on_fail: write_tests
-
-    - id: review_tests
-      type: agent_review
-      agent: reviewer
-      output: test-review.md
-      on_fail: write_tests
-
    - id: implement
      type: file_writer
      agent_pool:
@ -132,6 +103,10 @@ pipeline:
      max_files: 12
      max_lines: 900
      max_delete_ratio: 0.70
+      allowed_paths:
+        - src
+        - templates
+        - README.md
      on_fail: implement

    - id: apply_patch
@ -143,7 +118,7 @@ pipeline:
    - id: test
      type: command
      commands:
-        - python -m pytest -q
+        - python -m pytest -q tests/test_{task_id_compact}.py
      output: test-output.txt
      shell: true
      timeout_seconds: 25
--- a/nightshift/project_templates/tutorial-pastebin/pyproject.toml
+++ b/nightshift/project_templates/tutorial-pastebin/pyproject.toml
@ -10,3 +10,8 @@ dependencies = ["flask"]

 [tool.setuptools.packages.find]
 where = ["src"]
+
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+pythonpath = ["src"]
+cache_dir = ".pytest_cache"
--- a/nightshift/project_templates/tutorial-pastebin/src/pastebin_app/app.py
+++ b/nightshift/project_templates/tutorial-pastebin/src/pastebin_app/app.py
@ -1 +1,11 @@
-"""Application code is generated by the NightShift tutorial tasks."""
+"""Pastebin tutorial application scaffold."""
+
+from __future__ import annotations
+
+from flask import Flask
+
+
+def create_app(database_path: str | None = None) -> Flask:
+    app = Flask(__name__)
+    app.config["DATABASE_PATH"] = database_path
+    return app
--- a/nightshift/project_templates/tutorial-pastebin/tests/test_task001.py
+++ b/nightshift/project_templates/tutorial-pastebin/tests/test_task001.py
@ -0,0 +1,49 @@
+from pastebin_app.app import create_app
+
+
+def test_create_snippet_returns_created_snippet_id(tmp_path):
+    app = create_app(database_path=str(tmp_path / "snippets.db"))
+    client = app.test_client()
+
+    response = client.post(
+        "/snippets",
+        json={
+            "title": "Example",
+            "body": "hello",
+        },
+    )
+
+    assert response.status_code == 201
+    data = response.get_json()
+    assert isinstance(data["id"], int)
+
+
+def test_view_snippet_returns_persisted_fields(tmp_path):
+    app = create_app(database_path=str(tmp_path / "snippets.db"))
+    client = app.test_client()
+
+    created = client.post(
+        "/snippets",
+        json={
+            "title": "View me",
+            "body": "stored body",
+        },
+    ).get_json()
+
+    response = client.get(f"/snippets/{created['id']}")
+
+    assert response.status_code == 200
+    assert response.get_json() == {
+        "id": created["id"],
+        "title": "View me",
+        "body": "stored body",
+    }
+
+
+def test_view_missing_snippet_returns_404(tmp_path):
+    app = create_app(database_path=str(tmp_path / "snippets.db"))
+    client = app.test_client()
+
+    response = client.get("/snippets/999")
+
+    assert response.status_code == 404
--- a/tests/test_config.py
+++ b/tests/test_config.py
@ -214,7 +214,7 @@ class ConfigTests(unittest.TestCase):
            config_path.write_text(
                config_path.read_text(encoding="utf-8").replace(
                    "    - id: summarize",
-                    "    - id: validate_patch\n      type: patch_validator\n      max_files: 2\n      max_lines: 100\n      forbidden_paths:\n        - secrets\n\n    - id: summarize",
+                    "    - id: validate_patch\n      type: patch_validator\n      max_files: 2\n      max_lines: 100\n      allowed_paths:\n        - tests\n      forbidden_paths:\n        - secrets\n\n    - id: summarize",
                    1,
                ),
                encoding="utf-8",
@ -225,6 +225,7 @@ class ConfigTests(unittest.TestCase):

            self.assertEqual(patch_stage.max_files, 2)
            self.assertEqual(patch_stage.max_lines, 100)
+            self.assertEqual(patch_stage.allowed_paths, ("tests",))
            self.assertEqual(patch_stage.forbidden_paths, ("secrets",))

    def test_file_writer_stage_requires_agent(self) -> None:
--- a/tests/test_init.py
+++ b/tests/test_init.py
@ -61,7 +61,7 @@ class InitProjectTests(unittest.TestCase):
        self.assertIn("tutorial-imageboard", available_templates())
        self.assertIn("tutorial-pastebin", available_templates())

-    def test_init_pastebin_template_creates_skeleton_and_tdd_model_fallback_config(self) -> None:
+    def test_init_pastebin_template_creates_skeleton_and_model_fallback_config(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)

@ -71,11 +71,14 @@ class InitProjectTests(unittest.TestCase):
            self.assertTrue((root / ".nightshift" / "tasks.md").exists())
            self.assertTrue((root / ".nightshift" / "agents" / "test-writer.md").exists())
            self.assertTrue((root / "src" / "pastebin_app" / "app.py").exists())
+            self.assertTrue((root / "tests" / "test_task001.py").exists())
            self.assertTrue((root / "tests" / ".gitkeep").exists())
            self.assertFalse((root / "tests" / "test_pastebin.py").exists())
+            self.assertIn("def create_app(database_path", (root / "src" / "pastebin_app" / "app.py").read_text(encoding="utf-8"))
            self.assertIn("type: semantic_context", config)
-            self.assertIn("id: write_tests", config)
-            self.assertIn("id: review_tests", config)
+            self.assertNotIn("id: write_tests", config)
+            self.assertNotIn("id: review_tests", config)
+            self.assertIn("python -m pytest -q tests", config)
            self.assertIn("max_task_retries: 6", config)
            self.assertIn("implementer_qwen", config)
            self.assertIn("carstenuhlig/omnicoder-9b", config)
--- a/tests/test_patches.py
+++ b/tests/test_patches.py
@ -58,6 +58,21 @@ class PatchTests(unittest.TestCase):
            with self.assertRaisesRegex(PipelineError, "forbidden path"):
                validate_patch(patch, root, safety)

+    def test_validate_patch_enforces_stage_allowed_paths(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            (root / "src").mkdir()
+            (root / "tests").mkdir()
+            safety = SafetyConfig(
+                require_clean_worktree=False,
+                scoped_paths=("src", "tests"),
+                allowed_commands=(),
+                forbidden_commands=(),
+            )
+
+            with self.assertRaisesRegex(PipelineError, "not allowed for this stage"):
+                validate_patch(PATCH, root, safety, allowed_paths=("tests",))
+
    def test_validate_patch_rejects_malformed_hunk_line(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)
--- a/tests/test_reliability_features.py
+++ b/tests/test_reliability_features.py
@ -5,6 +5,7 @@ import unittest

 from nightshift.artifacts import ArtifactStore
 from nightshift.config import parse_config, StageConfig
+from nightshift.dependencies import diagnose_python_dependencies
 from nightshift.escalation import evaluate_retry_churn
 from nightshift.failures import build_failure_signature, classify_failure
 from nightshift.integ import cleanup_integration_runs, create_integration_run
@ -38,6 +39,38 @@ class ReliabilityFeatureTests(unittest.TestCase):
        self.assertEqual(result.category, "missing dependency")
        self.assertIn("pastebin_app", result.probable_root_cause)

+    def test_failure_classifier_detects_local_import_mismatch(self) -> None:
+        result = classify_failure(
+            "\n".join(
+                [
+                    "ImportError while importing test module 'tests/test_snippets.py'.",
+                    "tests/test_snippets.py:2: in <module>",
+                    "    from app import app, session, Snippet",
+                    "E   ModuleNotFoundError: No module named 'app'",
+                ]
+            ),
+            exit_code=2,
+        )
+
+        self.assertEqual(result.category, "local import mismatch")
+        self.assertIn("project package layout", result.probable_root_cause)
+
+    def test_dependency_diagnostic_does_not_treat_local_imports_as_packages(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            (root / "pyproject.toml").write_text("[project]\nname = 'demo'\n", encoding="utf-8")
+
+            result = diagnose_python_dependencies(root, "ModuleNotFoundError: No module named 'models'")
+
+            self.assertEqual(result.missing_imports, ("models",))
+            self.assertIn("local module import mistakes", result.recommendation)
+
+    def test_failure_classifier_detects_no_tests_ran(self) -> None:
+        result = classify_failure("no tests ran in 0.19s", exit_code=5)
+
+        self.assertEqual(result.category, "test expectation mismatch")
+        self.assertIn("did not collect any tests", result.probable_root_cause)
+
    def test_failure_classifier_treats_traceback_into_source_as_logic_bug(self) -> None:
        result = classify_failure(
            "\n".join(