Update design.md

2026-06-14 10:08:37 +00:00 · 2026-05-17 09:45:49 -07:00 · 2026-05-17 09:45:49 -07:00 · 86aa7dd13c
commit 86aa7dd13c
parent caac39b090
1 changed file with 220 additions and 244 deletions
--- a/docs/design.md
+++ b/docs/design.md
@@ -844,291 +844,267 @@ Mitigation:

 ---

-# 16. MVP Definition
+# 16. Implemented Baseline

-The minimum viable NightShift implementation should:
+The MVP and post-MVP phases through phase 22 are implemented.

-1. Parse markdown tasks
-2. Execute a declarative pipeline
-3. Support local agents
-4. Generate plans
-5. Generate implementations
-6. Run tests
-7. Run static analysis
-8. Run review agents
-9. Retry failed stages
-10. Produce artifacts
-11. Produce an overnight summary
-12. Restrict repository access
+NightShift currently provides:

-This MVP is sufficient to:
+* `nightshift init` for starter project generation
+* `nightshift validate` for config, prompt, task, dependency, path, and command validation
+* `nightshift status` for read-only project inspection
+* `nightshift run` for the next runnable incomplete task
+* `nightshift run --task TASK-ID` for a specific task
+* `nightshift run --all` for sequential multi-task execution
+* `nightshift web` for a read-only artifact dashboard
+* Markdown task parsing with descriptions, acceptance criteria, completion state, and dependency bullets
+* Dependency validation for missing references and simple cycles
+* Dependency-aware task selection and task blocking
+* Declarative YAML pipeline execution
+* Command, agent, agent-review, review, and summarize stage handling
+* Retry redirection with a configured task retry limit
+* Command-backed agents
+* Ollama-backed local model agents
+* Prompt bundle construction with project, task, retry, and previous-stage context
+* Prompt snapshots and run metadata for experiment comparison
+* Optional experiment labels and prompt variant metadata
+* Command allowlists and forbidden-fragment checks
+* Optional shell-free command execution
+* Per-stage command timeouts
+* Project-root-restricted command working directories
+* Environment variable allowlists for command stages
+* Scoped path and artifact path safety checks
+* Optional clean-worktree enforcement
+* Pre-run and post-run git status artifacts
+* Per-task `diff.patch` artifacts
+* Task completion mutation for successful runs
+* Per-run and per-task markdown/text artifacts
+* Project, task, retry, and context-out files
+* Final task notes, stage summaries, task completion artifacts, and run summaries
+* Documentation for config, artifact review, troubleshooting, and quickstart workflows
+* A complete fake-agent quickstart Lisp example under `examples/quickstart-lisp/`

-* Demonstrate orchestration architecture
-* Demonstrate AI pipeline engineering
-* Demonstrate safety-aware automation
-* Serve as a strong portfolio project
+The system remains sequential and local-first. It is designed to produce reviewable artifacts and repository state, not to deploy, push, or autonomously ship changes.
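
The command allowlist and forbidden-fragment checks in the list above could look roughly like the sketch below. The allowlist entries, the forbidden fragments, and the `check_command` name are illustrative assumptions, not NightShift's actual configuration keys or API.

```python
import shlex

# Hypothetical values; a real project would load these from its config file.
ALLOWED_COMMANDS = {"pytest", "ruff", "git"}
FORBIDDEN_FRAGMENTS = ("rm -rf", "git push", "sudo")


def check_command(command: str) -> list[str]:
    """Return the reasons a command is rejected; an empty list means allowed."""
    problems = []
    # Substring check first, so a forbidden fragment is caught anywhere in the string.
    for fragment in FORBIDDEN_FRAGMENTS:
        if fragment in command:
            problems.append(f"forbidden fragment: {fragment!r}")
    # Then check that the executable itself is on the allowlist.
    argv = shlex.split(command)
    if not argv:
        problems.append("empty command")
    elif argv[0] not in ALLOWED_COMMANDS:
        problems.append(f"executable not in allowlist: {argv[0]!r}")
    return problems
```

Returning reasons rather than a boolean lets a validation step report every problem at once, which fits the reviewable-artifact goal stated above.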

 ---

-# 17. MVP Implementation Status
+# 17. Current Product Shape

-The first MVP pass is implemented across phases 1 through 11.
+The implemented product is now a practical local runner rather than only a single-task MVP.

-Implemented capabilities:
+## 17.1 CLI Workflow

-* Project initialization
-* Config validation
-* Markdown task parsing
-* Path and command safety checks
-* Artifact storage
-* Command stage execution
-* Command-backed agent execution
-* Deterministic pipeline execution
-* Retry redirection and retry limits
-* Context file creation and prompt injection
-* Final task notes and run summaries
-* README documentation
+Common workflow:

-Known MVP limitations:
+```text
+nightshift init
+nightshift validate
+nightshift status
+nightshift run
+nightshift run --task TASK-001
+nightshift run --all
+nightshift web
+```

-* Only the `command` agent backend is implemented
-* `nightshift status` is still a placeholder
-* Clean worktree enforcement is not fully wired
-* Diff patch capture is not implemented
-* Task completion mutation is not implemented
-* Task dependency enforcement is not implemented
-* Multi-task overnight batching is not implemented
+The CLI can validate a project, select runnable tasks, enforce dependencies, run one or more tasks, and report artifact locations.
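
As a rough illustration of dependency-aware selection and blocking, the sketch below picks the first incomplete task whose dependencies are all complete. The `Task` dataclass and both function names are hypothetical; the real task model may carry more fields.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    # Hypothetical task shape used only for this sketch.
    task_id: str
    done: bool = False
    depends_on: list[str] = field(default_factory=list)


def next_runnable(tasks: list[Task]) -> Optional[Task]:
    """Return the first incomplete task whose dependencies are all complete."""
    done_ids = {t.task_id for t in tasks if t.done}
    for task in tasks:  # iterating in list order keeps selection deterministic
        if not task.done and all(dep in done_ids for dep in task.depends_on):
            return task
    return None


def blocked(tasks: list[Task]) -> list[str]:
    """List incomplete tasks waiting on an incomplete or missing dependency."""
    done_ids = {t.task_id for t in tasks if t.done}
    return [
        t.task_id
        for t in tasks
        if not t.done and any(dep not in done_ids for dep in t.depends_on)
    ]
```

With `TASK-001` complete, `TASK-002` depending on `TASK-003`, and `TASK-003` depending on `TASK-001`, this selects `TASK-003` and reports `TASK-002` as blocked.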
+
+## 17.2 Artifact Workflow
+
+Artifacts are still the primary audit surface.
+
+Current run artifacts include:
+
+```text
+.nightshift/
+  project-context.md
+  runs/
+    <run-id>/
+      run-summary.md
+      config.snapshot.yaml
+      run-metadata.md
+      prompts/
+        <agent-id>.md
+      tasks/
+        TASK-001/
+          task.md
+          context.md
+          plan.md
+          implementation-log.md
+          test-output.txt
+          review.md
+          stage-results.md
+          context-out.md
+          task-completion.md
+          git-status-before.txt
+          git-status-after.txt
+          diff.patch
+          final-notes.md
+```
+
+Exact task artifact names depend on configured stage `output` values.
+
+## 17.3 Dashboard Workflow
+
+The web dashboard is read-only and artifact-driven.
+
+It currently:
+
+* Lists runs from `.nightshift/runs/`
+* Shows run summaries
+* Links to text and markdown artifacts
+* Safely rejects artifact path traversal
+* Auto-refreshes
+
+It does not:
+
+* Start or stop runs
+* Mutate config or tasks
+* Provide approval gates
+* Stream live process output
+* Authenticate users
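
One common way to reject artifact path traversal is to resolve the requested path and confirm it still sits under the runs directory. The sketch below assumes that approach; `resolve_artifact` is a hypothetical helper, not the dashboard's actual code.

```python
from pathlib import Path
from typing import Optional


def resolve_artifact(runs_root: Path, requested: str) -> Optional[Path]:
    """Return the resolved artifact path, or None if it escapes runs_root."""
    root = runs_root.resolve()
    # Joining an absolute `requested` replaces root entirely, so the
    # containment check below also covers inputs like "/etc/passwd".
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        return None
    return candidate
```

A route handler could return a 404 whenever this yields `None`, so `../` segments and absolute paths never reach the file system read.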
+
+## 17.4 Known Limitations
+
+Current limitations:
+
+* Execution is sequential; there is no parallel task runner.
+* The web dashboard is read-only and artifact-oriented.
+* Live run progress is limited to basic CLI output and artifact inspection.
+* Flask is optional; `nightshift web` requires it to be installed.
+* Ollama support depends on the user's local Ollama installation and model availability.
+* Git artifacts can be unavailable or degraded in non-git repositories or repositories blocked by Git safe-directory rules.
+* Task mutation is intentionally minimal and only flips matching checklist lines.
+* Command configuration is safer than in the MVP, but commands are still configured as strings first for compatibility.
+* There is no branch isolation, resumable run state machine, approval workflow, or deployment integration.

 ---

 # 18. Next Major Update Plan

-The next major update should turn the single-task MVP into a more practical local runner while preserving the same safety and auditability model.
+The next major update should improve operational visibility while preserving the current artifact-first model.

-## Phase 12: Status Command
+## Phase 23: Improved Logging and Live Visibility

-* [ ] Implement `nightshift status`
-* [ ] Print config path and project root
-* [ ] Print task counts
-* [ ] Print next incomplete task
-* [ ] Print latest run directory
-* [ ] Print validation warnings where useful
-* [ ] Add tests
+NightShift should make active runs easier to observe from both the CLI and the web dashboard.
+
+Implementation tasks:
+
+* [ ] Add a small logging module with structured operational events.
+* [ ] Stream human-readable progress to the CLI during `run` and `run --all`.
+* [ ] Include run id, task id, stage id, agent/backend, command index, retry count, status, duration, and artifact path in each event where available.
+* [ ] Write a per-run log file such as `.nightshift/runs/<run-id>/run.log`.
+* [ ] Optionally write or rotate an aggregate `.nightshift/nightshift.log` for cross-run troubleshooting.
+* [ ] Keep logs operational; do not duplicate full prompts, full model responses, or full command output that already lives in artifacts.
+* [ ] Redact secrets in logged environment and config values, or avoid logging those values at all.
+* [ ] Add dashboard support for viewing the latest log tail.
+* [ ] Cap the dashboard log view at the last 100 lines by default.
+* [ ] Keep the full per-run log file available as an artifact unless a later size cap is configured.
+* [ ] Auto-refresh the dashboard log view with the existing dashboard refresh model.
+* [ ] Add tests for log writing, CLI progress hooks, dashboard log rendering, missing log files, and the 100-line cap.
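
A structured operational event with basic secret redaction might be sketched as follows. The JSON-lines shape, the `SECRET_MARKERS` heuristic, and the function names are assumptions made for illustration; the planned logging module may choose a different format.

```python
import json
import time

# Hypothetical heuristic: key names containing these markers are treated as secrets.
SECRET_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def redact(values: dict[str, str]) -> dict[str, str]:
    """Mask values whose key name suggests a secret."""
    return {
        key: "***" if any(m in key.upper() for m in SECRET_MARKERS) else value
        for key, value in values.items()
    }


def format_event(run_id: str, task_id: str, stage_id: str, status: str, **extra) -> str:
    """Build one JSON log line describing an operational event."""
    event = {
        "ts": round(time.time(), 3),
        "run_id": run_id,
        "task_id": task_id,
        "stage_id": stage_id,
        "status": status,
    }
    event.update(extra)  # optional fields such as retry_count or artifact path
    return json.dumps(event, sort_keys=True)
```

Each line stays small because it carries identifiers and an artifact path rather than prompt or command output, matching the "keep logs operational" task above.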

 Acceptance Criteria:

-* User can inspect project state without running a pipeline
-* Missing or malformed inputs produce clear errors
-* Latest artifacts are discoverable from the CLI
-
----
-
-## Phase 13: Git Safety and Diff Artifacts
-
-* [ ] Implement clean-worktree enforcement when configured
-* [ ] Capture pre-run git status
-* [ ] Capture post-run git status
-* [ ] Write `diff.patch`
-* [ ] Include changed files in final reports
-* [ ] Handle non-git repositories gracefully
-* [ ] Add tests with temporary git repositories where practical
-
-Acceptance Criteria:
-
-* `require_clean_worktree: true` blocks dirty repositories
-* Diffs are persisted after task execution
-* Reports identify modified files without requiring users to inspect every artifact
-
----
-
-## Phase 14: Task Completion Updates
-
-* [ ] Mark completed tasks in `tasks.md`
-* [ ] Preserve task file formatting where practical
-* [ ] Avoid marking failed tasks complete
-* [ ] Record task completion decisions in artifacts
-* [ ] Add tests
-
-Acceptance Criteria:
-
-* Successful runs can mark `[ ]` tasks as `[x]`
-* Failed runs leave tasks incomplete
-* Task file updates are reviewable and minimal
-
----
-
-## Phase 15: Multi-Task Run Mode
-
-* [ ] Add `nightshift run --all`
-* [ ] Process incomplete tasks in file order
-* [ ] Stop or continue on failure based on config
-* [ ] Create per-task artifact directories under one run
-* [ ] Generate aggregate run summary
-* [ ] Add tests
-
-Acceptance Criteria:
-
-* User can run more than one task unattended
-* Each task remains independently reviewable
-* Aggregate summary shows completed and failed tasks
-
----
-
-## Phase 16: Dependency Handling
-
-* [ ] Parse dependency bullets into structured task dependencies
-* [ ] Block tasks whose dependencies are incomplete
-* [ ] Detect missing dependency references
-* [ ] Detect simple dependency cycles
-* [ ] Report blocked tasks in status and run summaries
-* [ ] Add tests
-
-Acceptance Criteria:
-
-* Tasks do not run before declared dependencies are complete
-* Dependency errors are clear and actionable
-* Task ordering remains deterministic
-
----
-
-## Phase 17: Local Model Backend
-
-* [ ] Add an Ollama-compatible agent backend
-* [ ] Keep the existing command backend
-* [ ] Reuse prompt bundle construction
-* [ ] Persist request/response metadata
-* [ ] Handle model errors and timeouts
-* [ ] Add fake backend tests without requiring Ollama
-
-Acceptance Criteria:
-
-* Users can configure a local model backend for agent stages
-* Tests do not require real model calls
-* Agent artifacts remain comparable across backends
-
----
-
-## Phase 18: Prompt and Pipeline Experiments
-
-* [ ] Add prompt variant identifiers
-* [ ] Snapshot prompt files per run
-* [ ] Record agent backend metadata
-* [ ] Add optional experiment labels to config
-* [ ] Include experiment metadata in reports
-* [ ] Add tests
-
-Acceptance Criteria:
-
-* Users can compare prompt/pipeline runs from artifacts
-* Reports show which prompts and backend settings produced a result
-* Experiment metadata does not change execution semantics
-
----
-
-## Phase 19: Stronger Command Execution
-
-* [ ] Replace shell-string execution where possible with parsed argv execution
-* [ ] Preserve compatibility with explicit shell command stages when configured
-* [ ] Add per-command timeout config
-* [ ] Add environment variable allowlists
-* [ ] Add working-directory restrictions
-* [ ] Add tests
-
-Acceptance Criteria:
-
-* Command execution is safer by default
-* Shell execution is explicit rather than implicit
-* Command behavior remains auditable
-
----
-
-## Phase 20: Documentation and Examples Refresh
-
-* [ ] Add complete example project
-* [ ] Add example fake-agent pipeline
-* [ ] Add example local-model pipeline
-* [ ] Document artifact review workflow
-* [ ] Document troubleshooting
-* [ ] Add config reference
-
-Acceptance Criteria:
-
-* New users can run a complete demo from a fresh checkout
-* Documentation distinguishes implemented features from planned features
-* Examples remain safe to run locally
-
----
-
-## Phase 21: Read-Only Web Dashboard
-
-* [ ] Add a Flask-based `nightshift web` command
-* [ ] Read run state from `.nightshift/runs/`
-* [ ] Show latest run summary
-* [ ] Show task status and retry count
-* [ ] Show stage results and artifact links
-* [ ] Render markdown/plain-text artifacts safely
-* [ ] Add simple auto-refresh
-* [ ] Keep the dashboard read-only
-* [ ] Add tests for route rendering and missing artifact handling
-
-Acceptance Criteria:
-
-* User can monitor a run from a browser without controlling execution
-* Dashboard works from existing artifact files
-* Missing or partial run artifacts do not crash the server
-* No config, task, command, or pipeline mutation is exposed from the UI
+* A user running NightShift from a terminal can tell which task and stage are active.
+* Long Ollama or command stages show enough lifecycle information that the process does not appear hung.
+* The latest run log is visible from `nightshift web`.
+* The web client displays at most the last 100 log lines by default.
+* Logs point users to detailed artifacts instead of replacing them.
+* Missing or partial log files do not crash the dashboard.

 Notes:

-* This phase should avoid websockets and process control at first.
-* The dashboard should be artifact-driven so it remains decoupled from pipeline internals.
-* Start/stop controls, authentication, live log streaming, and approval gates are separate future work.
+* This phase should not add process control, websockets, authentication, or write actions to the web client.
+* If future live streaming is needed, the first version can still use file tailing plus refresh before introducing websockets.
+* Operational logs should complement artifacts: artifacts remain the source of detailed prompts, responses, command output, diffs, and summaries.
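
The 100-line cap and the missing-log requirement can both be met with a bounded `deque`. This is a minimal sketch; `tail_log` is a hypothetical helper name.

```python
from collections import deque
from pathlib import Path


def tail_log(log_path: Path, max_lines: int = 100) -> list[str]:
    """Return up to the last max_lines lines; a missing log yields an empty list."""
    try:
        with log_path.open("r", encoding="utf-8", errors="replace") as handle:
            # deque(maxlen=...) keeps only the newest lines while streaming the file.
            return [line.rstrip("\n") for line in deque(handle, maxlen=max_lines)]
    except FileNotFoundError:
        return []
```

Because the file is re-read on each call, this works with the existing refresh model and needs no websockets. A partially written final line is returned as-is.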

----
+## Phase 24: Per-Agent Model Parameters

-## Phase 22: Quickstart Test Project
+- [ ] Add `temperature` to agent config.
+- [ ] Pass temperature to Ollama/OpenAI-compatible backends.
+- [ ] Fall back to a safe default when `temperature` is omitted.
+- [ ] Add config validation tests.
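
Validation of the planned `temperature` field could be sketched as below. The 0.0 to 2.0 range and the choice to return `None` (leaving the backend's own default in effect) are assumptions; the design does not yet specify either.

```python
from typing import Optional


def parse_temperature(agent_config: dict) -> Optional[float]:
    """Return the configured temperature, or None when the field is omitted."""
    if "temperature" not in agent_config:
        return None  # assumed meaning of "default safely": defer to the backend
    value = agent_config["temperature"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("temperature must be a number")
    if not 0.0 <= value <= 2.0:  # assumed valid range
        raise ValueError("temperature must be between 0.0 and 2.0")
    return float(value)
```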

-* [ ] Add a guided quickstart project to `QUICKSTART.md`
-* [ ] Recommend a small Python Lisp interpreter as the default test project
-* [ ] Provide a multi-task `tasks.md` example
-* [ ] Provide a matching `nightshift.yaml` example
-* [ ] Provide suggested planner, implementer, and reviewer prompt files
-* [ ] Include dependency examples across tasks
-* [ ] Include commands for validation, `run --task`, and `run --all`
-* [ ] Explain what artifacts the user should inspect after each run
+## Phase 25: Repo Lookup Tools MVP

-Acceptance Criteria:
+- [ ] Add tool interface for repo operations.
+- [ ] Implement scoped `list_files`.
+- [ ] Implement scoped `read_file`.
+- [ ] Implement scoped `grep`.
+- [ ] Enforce existing path safety rules.
+- [ ] Log tool calls as artifacts.
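
A scoped `grep` tool could reuse the same containment check used for other path safety rules. The sketch below is one possible shape; the function name, return type, and scope semantics are hypothetical.

```python
import re
from pathlib import Path


def scoped_grep(project_root: Path, scope: str, pattern: str) -> list[tuple[str, int, str]]:
    """Search files under project_root/scope; return (path, line number, line) hits."""
    root = project_root.resolve()
    base = (root / scope).resolve()
    if base != root and root not in base.parents:
        raise ValueError(f"scope escapes project root: {scope}")
    regex = re.compile(pattern)
    hits = []
    for path in sorted(base.rglob("*")):  # sorted for deterministic output
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path.relative_to(root).as_posix(), number, line))
    return hits
```

This sketch does not follow up on symlinks that point outside the scope; a production tool would likely need to resolve each file before reading it.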

-* A new user can create a small target repo and exercise NightShift end to end
-* The project has multiple independently reviewable tasks
-* Tasks are small enough for local/fake agents but realistic enough to test planning, implementation, tests, retries, artifacts, and dependencies
-* The quickstart does not require external services
+## Phase 26: Planner Code-Discovery Support

-Recommended Project:
+- [ ] Teach the planner prompt to request the code context it needs.
+- [ ] Add structured planner output for lookup requests.
+- [ ] Execute requested lookup tools.
+- [ ] Save `files-inspected.md`.
+- [ ] Re-run planner with retrieved context.

-* A minimal Lisp interpreter in Python is a good test project because it is compact, incremental, testable, and naturally splits into parser, evaluator, environment, builtins, and error-handling tasks.
+## Phase 27: Context Pack Builder

-Alternative Projects:
+- [ ] Add `repo_context` stage.
+- [ ] Generate `context-pack.md`.
+- [ ] Include task, acceptance criteria, relevant files, snippets, and constraints.
+- [ ] Add line-numbered excerpts.
+- [ ] Add context-size caps.
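
Line-numbered excerpts with a size cap might be produced along these lines. The character-based cap, the truncation marker, and the `numbered_excerpt` name are illustrative assumptions.

```python
def numbered_excerpt(text: str, start: int, end: int, max_chars: int = 2000) -> str:
    """Render lines start..end (1-based, inclusive) with line numbers, within a cap."""
    lines = text.splitlines()
    picked = []
    used = 0
    for number in range(max(start, 1), min(end, len(lines)) + 1):
        rendered = f"{number:>4}: {lines[number - 1]}"
        # Count one extra character per line for the joining newline.
        if used + len(rendered) + 1 > max_chars:
            picked.append("... (truncated)")
            break
        picked.append(rendered)
        used += len(rendered) + 1
    return "\n".join(picked)
```

Keeping the original line numbers in the excerpt gives later stages a stable way to refer back to the source file.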

-* If the Lisp interpreter feels too language-theory focused, use a small INI/TOML-like config parser or a markdown todo CLI. Both are also compact and testable, but the Lisp interpreter gives better coverage of multi-step implementation and test generation.
+## Phase 28: Project Context Chart MVP

----
+- [ ] Generate `.nightshift/project-context-chart.md`.
+- [ ] Include files, responsibilities, functions/classes, entry points, tests.
+- [ ] Use a simple regex- or parser-based MVP.
+- [ ] Update chart during planning.
+- [ ] Store anchors/line numbers/search terms.

-## Phase 17-22 Implementation Status
+## Phase 29: Code Writer Stage

-Phases 17 through 22 are implemented.
+- [ ] Add `code_writer` stage type.
+- [ ] Feed it the task and the context pack.
+- [ ] Require unified diff output.
+- [ ] Save `proposed.patch`.
+- [ ] Save `implementation-summary.md`.

-Implemented capabilities:
+## Phase 30: Patch Normalization

-* Ollama agent backend
-* Experiment metadata and prompt snapshots
-* Stronger command execution options
-* Config reference, artifact review, and troubleshooting docs
-* Read-only Flask dashboard entry point
-* Complete quickstart Lisp example project
+- [ ] Add `patch_normalizer` stage.
+- [ ] Support a low-temperature formatter model.
+- [ ] Convert messy model output to a valid unified diff.
+- [ ] Reject missing/ambiguous edits.
+- [ ] Save `normalized.patch`.

-See `docs/devlog/phase17.md` through `docs/devlog/phase22.md` for implementation notes and decisions.
+## Phase 31: Patch Validation

+- [ ] Parse unified diffs.
+- [ ] Reject malformed patches.
+- [ ] Enforce scoped paths.
+- [ ] Reject path traversal.
+- [ ] Enforce max files/max lines changed.
+- [ ] Reject forbidden files.
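
The validation rules above could be prototyped with a small header scan before any real patch parser is introduced. This sketch is deliberately simplified: the limits and the `validate_patch` name are assumptions, and it would misread an added or removed content line that itself begins with `++ ` or `-- `.

```python
from pathlib import PurePosixPath


def validate_patch(patch_text: str, max_files: int = 5, max_lines: int = 200) -> list[str]:
    """Return validation errors for a unified diff; an empty list means it passed."""
    errors = []
    files = []
    changed = 0
    for line in patch_text.splitlines():
        if line.startswith("+++ "):
            target = line[4:].split("\t", 1)[0].strip()
            if target == "/dev/null":
                continue  # file deletion has no target path
            if target.startswith("b/"):
                target = target[2:]
            path = PurePosixPath(target)
            if path.is_absolute() or ".." in path.parts:
                errors.append(f"unsafe path: {target}")
            files.append(target)
        elif line.startswith("--- "):
            continue  # old-file header, not a changed line
        elif line.startswith(("+", "-")):
            changed += 1
    if not files:
        errors.append("no file headers found")
    if len(files) > max_files:
        errors.append(f"too many files: {len(files)} > {max_files}")
    if changed > max_lines:
        errors.append(f"too many changed lines: {changed} > {max_lines}")
    return errors
```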
+
+## Phase 32: Patch Apply / Dry Run
+
+- [ ] Add `patch_apply` stage.
+- [ ] Support `mode: dry_run`.
+- [ ] Support `mode: apply`.
+- [ ] Save `applied.patch`.
+- [ ] Preserve pre/post git status.
+- [ ] Fail cleanly on apply errors.
+
+## Phase 33: Test Feedback Repair Loop
+
+- [ ] Feed test and static-analysis failure output back into the implementer.
+- [ ] Add bounded repair attempts.
+- [ ] Save each repair patch.
+- [ ] Save repair summaries.
+- [ ] Stop after the maximum retry count.
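
The bounded repair loop can be sketched as a plain loop over two callables. `run_tests` and `repair` are hypothetical hooks standing in for the test stage and the implementer stage.

```python
from typing import Callable


def repair_loop(
    run_tests: Callable[[], tuple[bool, str]],
    repair: Callable[[str, int], None],
    max_attempts: int = 3,
) -> tuple[bool, int]:
    """Run tests, feeding failure output to repair() until they pass or the cap is hit."""
    passed, output = run_tests()
    attempts = 0
    while not passed and attempts < max_attempts:
        attempts += 1
        repair(output, attempts)  # e.g. prompt the implementer and save a repair patch
        passed, output = run_tests()
    return passed, attempts
```

The return value records both the final status and how many repair attempts were used, which is the kind of detail a repair summary artifact could report.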
+
+## Phase 34: End-to-End Coding Quickstart
+
+- [ ] Update quickstart to modify real code.
+- [ ] Include fake-agent test fixture.
+- [ ] Demonstrate lookup → context pack → patch → apply → test.
+- [ ] Document dry-run vs apply mode.
 ---

 # Appendix A: Design Decisions and Rationale