Add Ollama backend support, experiment metadata and prompt snapshots, stronger command execution controls, refreshed docs/examples, a read-only Flask dashboard, and a runnable quickstart Lisp project.

2026-06-14 10:08:37 +00:00 · 2026-05-17 01:39:44 -07:00 · 2026-05-17 01:39:44 -07:00 · 957dc7d25b
commit 957dc7d25b
parent 57608e9660
33 changed files with 1181 additions and 27 deletions
--- a/QUICKSTART.md
+++ b/QUICKSTART.md
@@ -80,3 +80,227 @@ tasks/TASK-001/final-notes.md

 Example run files are available in `templates/`.
 They are safe starter examples and use command-backed fake agents.
+
+The repository also includes a complete sample target project:
+
+```text
+examples/quickstart-lisp/
+```
+
+Copy that directory elsewhere if you want to test NightShift against a multi-task project.
+
+## Quickstart Test Project
+
+A good first real target project is a tiny Lisp interpreter in Python. It is small enough to review, but it naturally breaks into multiple tasks that test NightShift's planning, implementation, command execution, artifacts, reports, and dependency handling.
+
+If you do not want a language interpreter, use a small config parser or markdown todo CLI instead. The Lisp interpreter is the recommended default because it has clear incremental milestones and simple tests.
+
+### 1. Create a Target Project
+
+```bash
+mkdir tiny-lisp
+cd tiny-lisp
+mkdir agents tests
+touch lisp.py tests/test_lisp.py
+```
+
+### 2. Add `nightshift.yaml`
+
+```yaml
+project:
+  name: tiny-lisp
+  root: .
+  task_file: tasks.md
+  artifact_dir: .nightshift
+
+safety:
+  require_clean_worktree: false
+  scoped_paths:
+    - .
+  allowed_commands:
+    - python -m unittest discover -v
+  forbidden_commands:
+    - rm -rf
+    - git push
+    - curl | bash
+
+agents:
+  planner:
+    backend: command
+    command: echo
+    system_prompt: agents/planner.md
+
+  implementer:
+    backend: command
+    command: echo
+    system_prompt: agents/implementer.md
+
+  reviewer:
+    backend: command
+    command: python -c "print('status: pass'); print('reason: quickstart reviewer accepted artifacts')"
+    system_prompt: agents/reviewer.md
+
+pipeline:
+  max_task_retries: 1
+  continue_on_task_failure: false
+  stages:
+    - id: plan
+      type: agent
+      agent: planner
+      output: plan.md
+
+    - id: implement
+      type: agent
+      agent: implementer
+      output: implementation-log.md
+
+    - id: test
+      type: command
+      commands:
+        - python -m unittest discover -v
+      output: test-output.txt
+
+    - id: review
+      type: agent_review
+      agent: reviewer
+      on_fail: implement
+      output: review.md
+
+    - id: summarize
+      type: summarize
+      output: final-notes.md
+```
+
+This config uses fake command agents, so the pipeline is safe and deterministic. Later, replace `command: echo` with your real local agent wrapper.
+
+### 3. Add `tasks.md`
+
+```markdown
+# Tasks
+
+- [ ] TASK-001: Parse Lisp expressions
+
+Description:
+Implement tokenization and parsing for a tiny Lisp subset.
+
+Acceptance Criteria:
+- Parses numbers
+- Parses symbols
+- Parses nested lists
+- Raises useful errors for unbalanced parentheses
+- Includes unit tests
+
+- [ ] TASK-002: Evaluate arithmetic forms
+
+Dependencies:
+- TASK-001
+
+Description:
+Evaluate parsed arithmetic expressions.
+
+Acceptance Criteria:
+- Supports `+`, `-`, `*`, and `/`
+- Evaluates nested arithmetic
+- Includes unit tests
+
+- [ ] TASK-003: Add variables and definitions
+
+Dependencies:
+- TASK-002
+
+Description:
+Add an environment and support variable lookup and definitions.
+
+Acceptance Criteria:
+- Supports symbol lookup
+- Supports `(define name value)`
+- Keeps environment behavior tested
+
+- [ ] TASK-004: Add conditionals
+
+Dependencies:
+- TASK-003
+
+Description:
+Implement simple truthiness and `if` expressions.
+
+Acceptance Criteria:
+- Supports `(if condition then else)`
+- Handles false-like values consistently
+- Includes tests for both branches
+```
+
+### 4. Add Prompt Files
+
+`agents/planner.md`:
+
+```markdown
+You are the planning agent. Create a small, conservative plan for the task.
+Do not write code. Include files to edit, tests to add, and risks.
+```
+
+`agents/implementer.md`:
+
+```markdown
+You are the implementation agent. Implement the smallest correct change.
+Preserve existing behavior and include tests.
+```
+
+`agents/reviewer.md`:
+
+```markdown
+You are the review agent. Decide whether the task should pass, fail, retry, or escalate.
+
+Output:
+status: pass | fail | retry | escalate
+reason: <short explanation>
+next_stage: <optional stage id>
+context_update: <compact useful note>
+```
+
+### 5. Add an Initial Passing Test File
+
+```python
+# tests/test_lisp.py
+import unittest
+
+
+class SmokeTests(unittest.TestCase):
+    def test_smoke(self):
+        self.assertTrue(True)
+
+
+if __name__ == "__main__":
+    unittest.main()
+```
+
+### 6. Validate and Run
+
+```bash
+nightshift validate
+nightshift status
+nightshift run --task TASK-001
+```
+
+Run all currently runnable tasks:
+
+```bash
+nightshift run --all
+```
+
+Because the example uses fake agents, it will not actually implement the Lisp interpreter by itself. It is meant to verify the pipeline, dependency handling, reports, and artifacts before you connect a real command-backed agent.
+
+### 7. Review Artifacts
+
+After a run, inspect:
+
+```text
+.nightshift/runs/<run-id>/run-summary.md
+.nightshift/runs/<run-id>/tasks/TASK-001/plan.md
+.nightshift/runs/<run-id>/tasks/TASK-001/implementation-log.md
+.nightshift/runs/<run-id>/tasks/TASK-001/test-output.txt
+.nightshift/runs/<run-id>/tasks/TASK-001/review.md
+.nightshift/runs/<run-id>/tasks/TASK-001/final-notes.md
+```
+
+The useful signal is whether NightShift selected the right task, respected dependencies, ran the command stage, wrote artifacts, updated task completion, and produced a clear summary.
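The artifact checklist above can be automated with a small script. This is a minimal sketch, not part of NightShift: the helper name `missing_artifacts` is hypothetical, and the file names are copied from the list in the quickstart.

```python
from pathlib import Path

# File names taken from the artifact list in the quickstart.
RUN_LEVEL = ("run-summary.md",)
TASK_LEVEL = (
    "plan.md",
    "implementation-log.md",
    "test-output.txt",
    "review.md",
    "final-notes.md",
)


def missing_artifacts(run_dir: Path, task_id: str) -> list[str]:
    """Return expected artifact paths (relative to run_dir) that do not exist."""
    expected = [Path(name) for name in RUN_LEVEL]
    expected += [Path("tasks") / task_id / name for name in TASK_LEVEL]
    return [p.as_posix() for p in expected if not (run_dir / p).is_file()]
```

Call it with the run directory, for example `missing_artifacts(Path(".nightshift/runs/<run-id>"), "TASK-001")`; an empty list means every listed artifact was written.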
--- a/README.md
+++ b/README.md
@@ -17,6 +17,7 @@ The core MVP is implemented:
 - `nightshift run` executes the next incomplete task.
 - `nightshift run --task TASK-001` executes a specific task.
 - Command-backed agents receive compact prompt bundles on stdin.
+- Ollama-backed agents can call local models with `backend: ollama`.
 - Command stages run through allowlist and forbidden-fragment checks.
 - Runs create `.nightshift/` artifacts, task context, retry context, command output, agent output, final notes, and run summaries.
 - Unit tests cover config, safety, tasks, artifacts, commands, agents, pipeline retries, context, and reports.
@@ -179,7 +180,7 @@ pipeline:

 ## Agent Backends

-The MVP supports `backend: command`.
+NightShift supports `backend: command` and `backend: ollama`.

 NightShift builds a prompt bundle containing:

@@ -193,7 +194,17 @@ NightShift builds a prompt bundle containing:
 - retry notes
 - output contract

-The prompt is passed to the configured command on stdin. stdout, stderr, exit code, duration, and the prompt are persisted as artifacts.
+The prompt is passed to the configured command or local Ollama model on stdin. stdout, stderr, exit code, duration, and the prompt are persisted as artifacts.
+
+Ollama example:
+
+```yaml
+agents:
+  planner:
+    backend: ollama
+    model: qwen2.5-coder:14b
+    system_prompt: agents/planner.md
+```

 Review agents should emit:

@@ -264,17 +275,29 @@ Compile-check modules:
 python -m compileall nightshift tests
 ```

+Optional read-only dashboard:
+
+```bash
+pip install flask
+nightshift web
+```
+
+Additional docs:
+
+- [Config reference](docs/config-reference.md)
+- [Artifact review workflow](docs/artifact-review.md)
+- [Troubleshooting](docs/troubleshooting.md)
+- [Quickstart](QUICKSTART.md)
+- [Quickstart Lisp example](examples/quickstart-lisp/)
+
 ## Roadmap

 Next major work:

-- real local model wrappers
-- stronger git safety and diff capture
-- task completion updates
-- dependency handling
-- richer status command
-- prompt and model experimentation
+- richer local backend support beyond Ollama
 - optional branch isolation
-- longer-run multi-task reports
+- live dashboard enhancements
+- stronger structured command definitions
+- longer-run reporting and resumability

 NightShift remains oriented around reviewable output, not blind autonomy.
--- a/docs/artifact-review.md
+++ b/docs/artifact-review.md
@@ -0,0 +1,34 @@
+# Artifact Review Workflow
+
+Start with:
+
+```text
+.nightshift/runs/<run-id>/run-summary.md
+```
+
+Then inspect the task directory:
+
+```text
+.nightshift/runs/<run-id>/tasks/<task-id>/
+```
+
+Useful artifacts:
+
+- `task.md`: task snapshot.
+- `context.md`: compact task context.
+- `plan.md`: planning agent output.
+- `implementation-log.md`: implementation agent output.
+- `test-output.txt`: command stage transcript.
+- `review.md`: review agent output.
+- `stage-results.md`: structured stage status summary.
+- `context-out.md`: retry/context summary.
+- `final-notes.md`: final task report.
+- `diff.patch`: git diff when available.
+- `git-status-before.txt` / `git-status-after.txt`: git state snapshots.
+- `task-completion.md`: whether the task was marked complete.
+
+Run-level artifacts:
+
+- `config.snapshot.yaml`
+- `run-metadata.md`
+- `prompts/<agent>.md`
--- a/docs/config-reference.md
+++ b/docs/config-reference.md
@@ -0,0 +1,61 @@
+# NightShift Config Reference
+
+NightShift config is YAML.
+
+## `project`
+
+- `name`: project display name.
+- `root`: project root, resolved relative to the config file.
+- `task_file`: markdown task file inside the project root.
+- `artifact_dir`: artifact directory inside the project root.
+
+## `safety`
+
+- `require_clean_worktree`: when true, blocks runs if `git status --short` reports changes or git is unavailable.
+- `scoped_paths`: paths that must resolve inside the project root.
+- `allowed_commands`: exact command-stage allowlist entries after whitespace normalization.
+- `forbidden_commands`: dangerous fragments blocked before allowlist acceptance.
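+- `allowed_env`: optional environment variable names to pass to command stages. When set, command stages receive only these variables plus `PATH`; when unset, they inherit the full environment.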
+
+## `experiment`
+
+- `label`: optional run experiment label.
+- `prompt_variant`: optional prompt variant label.
+
+## `agents`
+
+Supported backends:
+
+- `command`: runs a local command with the prompt on stdin.
+- `ollama`: runs `ollama run <model>` with the prompt on stdin.
+
+Command agent:
+
+```yaml
+planner:
+  backend: command
+  command: echo
+  system_prompt: agents/planner.md
+```
+
+Ollama agent:
+
+```yaml
+planner:
+  backend: ollama
+  model: qwen2.5-coder:14b
+  system_prompt: agents/planner.md
+```
+
+## `pipeline`
+
+- `max_task_retries`: task retry limit.
+- `continue_on_task_failure`: for `run --all`, continue after failed/blocked tasks.
+- `stages`: ordered state-machine stages.
+
+Command stage options:
+
+- `commands`: command strings.
+- `shell`: defaults to true. Set it to false to split the command into an argv list and run it without a shell.
+- `timeout_seconds`: per-stage timeout override.
+- `working_dir`: command working directory inside project root.
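The `safety` rules in this reference (forbidden fragments checked first, then an exact allowlist match after whitespace normalization) can be illustrated with a short sketch. This is an assumption-based illustration, not NightShift's `ensure_command_allowed`; the names `normalize` and `check_command` are hypothetical.

```python
def normalize(command: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(command.split())


def check_command(command: str, allowed: list[str], forbidden: list[str]) -> str:
    """Return the normalized command, or raise ValueError if it is rejected."""
    normalized = normalize(command)
    # Forbidden fragments are checked first, so they win over the allowlist.
    for fragment in forbidden:
        needle = normalize(fragment)
        if needle and needle in normalized:
            raise ValueError(f"command contains forbidden fragment: {fragment!r}")
    # Allowlist entries must match the whole command exactly after normalization.
    if normalized not in {normalize(entry) for entry in allowed}:
        raise ValueError(f"command is not allowlisted: {normalized!r}")
    return normalized
```

With the quickstart config, `python  -m unittest discover -v` (extra spaces) is accepted, while `python -m unittest` is rejected because it is not an exact match.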
--- a/docs/design.md
+++ b/docs/design.md
@@ -1086,6 +1086,51 @@ Notes:

 ---

+## Phase 22: Quickstart Test Project
+
+* [ ] Add a guided quickstart project to `QUICKSTART.md`
+* [ ] Recommend a small Python Lisp interpreter as the default test project
+* [ ] Provide a multi-task `tasks.md` example
+* [ ] Provide a matching `nightshift.yaml` example
+* [ ] Provide suggested planner, implementer, and reviewer prompt files
+* [ ] Include dependency examples across tasks
+* [ ] Include commands for validation, `run --task`, and `run --all`
+* [ ] Explain what artifacts the user should inspect after each run
+
+Acceptance Criteria:
+
+* A new user can create a small target repo and exercise NightShift end to end
+* The project has multiple independently reviewable tasks
+* Tasks are small enough for local/fake agents but realistic enough to test planning, implementation, tests, retries, artifacts, and dependencies
+* The quickstart does not require external services
+
+Recommended Project:
+
+* A minimal Lisp interpreter in Python is a good test project because it is compact, incremental, testable, and naturally splits into parser, evaluator, environment, builtins, and error-handling tasks.
+
+Alternative Projects:
+
+* If the Lisp interpreter feels too language-theory focused, use a small INI/TOML-like config parser or a markdown todo CLI. Both are also compact and testable, but the Lisp interpreter gives better coverage of multi-step implementation and test generation.
+
+---
+
+## Phase 17-22 Implementation Status
+
+Phases 17 through 22 are implemented.
+
+Implemented capabilities:
+
+* Ollama agent backend
+* Experiment metadata and prompt snapshots
+* Stronger command execution options
+* Config reference, artifact review, and troubleshooting docs
+* Read-only Flask dashboard entry point
+* Complete quickstart Lisp example project
+
+See `docs/devlog/phase17.md` through `docs/devlog/phase22.md` for implementation notes and decisions.
+
+---
+
 # Appendix A: Design Decisions and Rationale

 ## A.1 Local-first architecture
--- a/docs/devlog/phase17.md
+++ b/docs/devlog/phase17.md
@ -0,0 +1,21 @@
+# Phase 17 Devlog: Local Model Backend
+
+## Implemented
+
+- Added first-class `backend: ollama` agent config support.
+- Required `model` for Ollama agents.
+- Kept `backend: command` unchanged.
+- Reused the existing prompt bundle for Ollama.
+- Invoked Ollama as `ollama run <model>` with prompt input on stdin.
+- Persisted Ollama responses through the same agent artifact format.
+- Added tests with mocked subprocess calls so Ollama is not required.
+
+## Decisions Made
+
+- Ollama is implemented as a local subprocess backend instead of an HTTP API wrapper.
+- Missing Ollama executable returns a failed agent invocation artifact rather than crashing.
+- Backend artifacts remain comparable across command and Ollama agents.
+
+## Notes
+
+- Real model quality and model availability are user environment concerns; tests do not require a running Ollama daemon.
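The "mocked subprocess" testing approach noted in this devlog can be sketched as follows. This is a simplified stand-in, not the code in `nightshift/agents.py`: `run_ollama` and `demo_with_mock` are hypothetical names, `example-model` is a placeholder, and timeout handling is left out for brevity.

```python
import subprocess
from unittest import mock


def run_ollama(model: str, prompt: str, timeout: float = 60.0) -> tuple[int, str, str]:
    """Run `ollama run <model>` with the prompt on stdin; return (exit_code, stdout, stderr)."""
    try:
        completed = subprocess.run(
            ["ollama", "run", model],
            input=prompt,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError as exc:
        # A missing executable becomes a failed result instead of a crash.
        return 127, "", str(exc)
    return completed.returncode, completed.stdout, completed.stderr


def demo_with_mock() -> tuple[int, str, str]:
    """Call run_ollama with subprocess.run replaced, so no Ollama install is needed."""
    fake = subprocess.CompletedProcess(
        args=["ollama", "run", "example-model"], returncode=0, stdout="plan text\n", stderr=""
    )
    with mock.patch("subprocess.run", return_value=fake) as patched:
        result = run_ollama("example-model", "prompt bundle")
    # The model name and prompt should reach subprocess.run unchanged.
    args, kwargs = patched.call_args
    assert args[0] == ["ollama", "run", "example-model"]
    assert kwargs["input"] == "prompt bundle"
    return result
```

Patching `subprocess.run` with `side_effect=FileNotFoundError(...)` exercises the missing-executable path the same way.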
--- a/docs/devlog/phase18.md
+++ b/docs/devlog/phase18.md
@ -0,0 +1,18 @@
+# Phase 18 Devlog: Prompt and Pipeline Experiments
+
+## Implemented
+
+- Added optional `experiment.label` and `experiment.prompt_variant` config fields.
+- Snapshotted agent prompt files into `runs/<run-id>/prompts/`.
+- Wrote `run-metadata.md` with project, experiment, agent backend, model, command, and prompt metadata.
+- Included experiment metadata in final task reports and run summaries.
+- Added tests for experiment config loading and prompt/metadata artifact creation.
+
+## Decisions Made
+
+- Experiment metadata is descriptive only and does not alter execution semantics.
+- Prompt snapshots are per-run, not per-task, because agent definitions are run-level configuration.
+
+## Notes
+
+- This creates enough metadata to compare prompt/backend runs from artifacts without adding a database.
--- a/docs/devlog/phase19.md
+++ b/docs/devlog/phase19.md
@ -0,0 +1,20 @@
+# Phase 19 Devlog: Stronger Command Execution
+
+## Implemented
+
+- Added command stage `shell` option, defaulting to true for backward compatibility.
+- Added command stage `timeout_seconds` override.
+- Added command stage `working_dir` restricted to the project root.
+- Added `safety.allowed_env` for optional environment variable pass-through.
+- Added argv-style execution path when `shell: false`.
+- Added tests for shell-free execution and working-directory restrictions.
+
+## Decisions Made
+
+- Existing string command config remains valid.
+- `shell: false` still uses the same exact allowlist check before splitting into argv.
+- `PATH` is preserved when an environment allowlist is configured so common executables remain discoverable.
+
+## Notes
+
+- Future hardening can move toward structured command definitions, but this phase avoids breaking current configs.
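The two mechanisms described in this devlog, argv splitting for `shell: false` and environment filtering with `PATH` preserved, can be shown in isolation. This is a sketch under stated assumptions: `to_argv` and `filtered_env` are hypothetical helper names, and the allowlist check is assumed to have already passed.

```python
import shlex
from typing import Optional


def to_argv(command: str) -> list[str]:
    """Split an already-allowlisted command string into argv for shell-free execution."""
    return shlex.split(command)


def filtered_env(allowed_env: list[str], environ: dict[str, str]) -> Optional[dict[str, str]]:
    """Return None (inherit everything) when no allowlist is set, else a filtered copy."""
    if not allowed_env:
        return None
    env = {name: environ[name] for name in allowed_env if name in environ}
    # PATH is kept even if it is not listed, so executables stay discoverable.
    if "PATH" in environ:
        env.setdefault("PATH", environ["PATH"])
    return env
```

Note that `shlex.split` applies POSIX quoting rules on every platform, so a quoted argument such as `python -c "print(1); print(2)"` becomes three argv entries.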
--- a/docs/devlog/phase20.md
+++ b/docs/devlog/phase20.md
@ -0,0 +1,19 @@
+# Phase 20 Devlog: Documentation and Examples Refresh
+
+## Implemented
+
+- Added `docs/config-reference.md`.
+- Added `docs/artifact-review.md`.
+- Added `docs/troubleshooting.md`.
+- Added a complete `examples/quickstart-lisp/` project.
+- Updated quickstart docs to point users at the example project.
+
+## Decisions Made
+
+- Documentation now distinguishes command and Ollama agent backends.
+- The example project uses fake command agents so it can run without external services.
+- The quickstart Lisp project is included as a target repo example rather than baked into NightShift runtime behavior.
+
+## Notes
+
+- The example is intended for pipeline testing and artifact review, not as a full Lisp implementation.
--- a/docs/devlog/phase21.md
+++ b/docs/devlog/phase21.md
@@ -0,0 +1,21 @@
+# Phase 21 Devlog: Read-Only Web Dashboard
+
+## Implemented
+
+- Added `nightshift/web.py`.
+- Added `nightshift web` CLI command.
+- Implemented read-only artifact dashboard rendering.
+- Listed runs from `.nightshift/runs/`.
+- Rendered run summaries with simple auto-refresh.
+- Added safe artifact reading that rejects path traversal.
+- Added tests for missing runs, run listing, and artifact path handling.
+
+## Decisions Made
+
+- Flask is an optional dependency. The CLI gives a clear error if Flask is missing.
+- The dashboard is artifact-driven and does not control pipeline execution.
+- No websockets, authentication, mutation, or live process control were added.
+
+## Notes
+
+- This is intentionally a monitoring entry point, not an operations console.
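"Safe artifact reading that rejects path traversal" usually means resolving the requested path and confirming it stays under the artifact root. The sketch below shows one common way to do that; it is not the code in `nightshift/web.py`, and `resolve_artifact` is a hypothetical name.

```python
from pathlib import Path


def resolve_artifact(artifact_root: Path, requested: str) -> Path:
    """Resolve a requested artifact path, refusing anything outside artifact_root."""
    root = artifact_root.resolve()
    # Resolving collapses ".." segments and follows symlinks before the check.
    candidate = (root / requested).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise PermissionError(f"path escapes artifact root: {requested!r}") from None
    return candidate
```

A request such as `run-1/../../secret.txt` resolves outside the root and is rejected before any file is opened.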
--- a/docs/devlog/phase22.md
+++ b/docs/devlog/phase22.md
@@ -0,0 +1,19 @@
+# Phase 22 Devlog: Quickstart Test Project
+
+## Implemented
+
+- Added a guided Lisp interpreter quickstart project to `QUICKSTART.md`.
+- Added concrete quickstart project files under `examples/quickstart-lisp/`.
+- Included multi-task `tasks.md` with dependencies.
+- Included a matching `nightshift.yaml`.
+- Included planner, implementer, and reviewer prompt files.
+- Included an initial passing unittest smoke test.
+
+## Decisions Made
+
+- Kept the Lisp interpreter as the recommended test project because it is compact, incremental, and testable.
+- Fake agents are used in the example so users can validate NightShift before connecting a real local model or coding agent.
+
+## Notes
+
+- Users can copy `examples/quickstart-lisp/` to a scratch directory and run `nightshift validate`, `nightshift status`, and `nightshift run --all`.
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -0,0 +1,29 @@
+# Troubleshooting
+
+## `command is not allowlisted`
+
+Add the exact command to `safety.allowed_commands`. NightShift normalizes whitespace but otherwise expects exact matches.
+
+## Command works in PowerShell but fails in NightShift
+
+Command stages use Python subprocess execution. By default `shell: true` uses the platform shell, which is usually `cmd.exe` on Windows, so PowerShell-only syntax fails. Prefer shell-neutral commands such as `python -m ...`, or invoke the shell you need explicitly in the command string.
+
+## No runnable tasks
+
+Check `nightshift status`. A task may be blocked by dependencies listed under `Dependencies:`.
+
+## Git clean worktree failure
+
+If `require_clean_worktree: true`, NightShift blocks dirty repositories before creating artifacts. Commit or stash your changes, or set `require_clean_worktree: false`.
+
+## Ollama backend fails
+
+The `ollama` backend requires the `ollama` executable to be installed and the configured model to be available. Tests do not require Ollama.
+
+## Flask dashboard fails
+
+Install Flask:
+
+```bash
+pip install flask
+```
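An optional dependency such as Flask is typically imported lazily so that a missing package produces an actionable message instead of a traceback at startup. The sketch below shows that pattern in general terms; `require_optional` is a hypothetical helper, and how `nightshift web` actually reports the error is not shown in this diff.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or raise a RuntimeError that says how to install it."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            f"Optional dependency '{module_name}' is not installed. {install_hint}"
        ) from exc
```

For the dashboard this might be called as `require_optional("flask", "Install it with: pip install flask")` only when the web command runs.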
--- a/examples/quickstart-lisp/agents/implementer.md
+++ b/examples/quickstart-lisp/agents/implementer.md
@@ -0,0 +1,3 @@
+You are the implementation agent.
+
+Implement the smallest correct change and include tests.
--- a/examples/quickstart-lisp/agents/planner.md
+++ b/examples/quickstart-lisp/agents/planner.md
@@ -0,0 +1,7 @@
+You are the planning agent. Create a small conservative plan.
+
+Include:
+- relevant files
+- implementation steps
+- tests
+- risks
--- a/examples/quickstart-lisp/agents/reviewer.md
+++ b/examples/quickstart-lisp/agents/reviewer.md
@@ -0,0 +1,7 @@
+You are the review agent.
+
+Output:
+status: pass | fail | retry | escalate
+reason: <short explanation>
+next_stage: <optional stage id>
+context_update: <compact useful note>
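The reviewer output contract above is a set of `key: value` lines, which makes it easy to parse. This is a minimal sketch of such a parser, assuming one field per line and that the first occurrence of a key wins; it is not NightShift's own review parsing, and `parse_review` is a hypothetical name.

```python
VALID_STATUSES = {"pass", "fail", "retry", "escalate"}
KNOWN_KEYS = {"status", "reason", "next_stage", "context_update"}


def parse_review(text: str) -> dict[str, str]:
    """Parse `key: value` lines from reviewer output; unknown lines are ignored."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in KNOWN_KEYS and key not in fields:
            fields[key] = value.strip()
    status = fields.get("status", "").lower()
    if status not in VALID_STATUSES:
        raise ValueError(f"missing or invalid status: {status!r}")
    fields["status"] = status
    return fields
```

The quickstart's fake reviewer prints `status: pass` and a `reason:` line, which this parser turns into a two-key dictionary.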
--- a/examples/quickstart-lisp/lisp.py
+++ b/examples/quickstart-lisp/lisp.py
@@ -0,0 +1,4 @@
+"""Tiny Lisp quickstart target.
+
+NightShift tasks in this example are intended to fill this module in.
+"""
--- a/examples/quickstart-lisp/nightshift.yaml
+++ b/examples/quickstart-lisp/nightshift.yaml
@@ -0,0 +1,68 @@
+project:
+  name: tiny-lisp
+  root: .
+  task_file: tasks.md
+  artifact_dir: .nightshift
+
+safety:
+  require_clean_worktree: false
+  scoped_paths:
+    - .
+  allowed_commands:
+    - python -m unittest discover -v
+  forbidden_commands:
+    - rm -rf
+    - git push
+    - curl | bash
+
+experiment:
+  label: quickstart-lisp
+  prompt_variant: fake-agent-v1
+
+agents:
+  planner:
+    backend: command
+    command: echo
+    system_prompt: agents/planner.md
+
+  implementer:
+    backend: command
+    command: echo
+    system_prompt: agents/implementer.md
+
+  reviewer:
+    backend: command
+    command: python -c "print('status: pass'); print('reason: quickstart reviewer accepted artifacts')"
+    system_prompt: agents/reviewer.md
+
+pipeline:
+  max_task_retries: 1
+  continue_on_task_failure: false
+  stages:
+    - id: plan
+      type: agent
+      agent: planner
+      output: plan.md
+
+    - id: implement
+      type: agent
+      agent: implementer
+      output: implementation-log.md
+
+    - id: test
+      type: command
+      commands:
+        - python -m unittest discover -v
+      output: test-output.txt
+      shell: true
+      timeout_seconds: 60
+
+    - id: review
+      type: agent_review
+      agent: reviewer
+      on_fail: implement
+      output: review.md
+
+    - id: summarize
+      type: summarize
+      output: final-notes.md
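One way to read the `stages`, `on_fail`, and `max_task_retries` settings in this config is as a small state machine: stages run in order, and a failing stage with `on_fail` jumps back to the named stage while retries remain. The sketch below encodes that reading only; the real pipeline may differ (for example, it may still run `summarize` after a failure), and `run_pipeline` and the `run_stage` callback are hypothetical.

```python
from typing import Callable, Optional


def run_pipeline(
    stages: list[dict],
    run_stage: Callable[[str], bool],
    max_task_retries: int,
) -> tuple[str, list[str]]:
    """Run stages in order; on failure, jump to `on_fail` while retries remain.

    Returns the final status ("pass" or "fail") and the ids of the stages that ran.
    """
    index_by_id = {stage["id"]: i for i, stage in enumerate(stages)}
    executed: list[str] = []
    retries_used = 0
    position = 0
    while position < len(stages):
        stage = stages[position]
        executed.append(stage["id"])
        if run_stage(stage["id"]):
            position += 1
            continue
        target: Optional[str] = stage.get("on_fail")
        # Without an on_fail target, or with retries exhausted, the task fails.
        if target is None or retries_used >= max_task_retries:
            return "fail", executed
        retries_used += 1
        position = index_by_id[target]
    return "pass", executed
```

Under this reading, a review that fails once with `max_task_retries: 1` re-runs `implement`, `test`, and `review` before `summarize`.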
--- a/examples/quickstart-lisp/tasks.md
+++ b/examples/quickstart-lisp/tasks.md
@@ -0,0 +1,39 @@
+# Tasks
+
+- [ ] TASK-001: Parse Lisp expressions
+
+Description:
+Implement tokenization and parsing for a tiny Lisp subset.
+
+Acceptance Criteria:
+- Parses numbers
+- Parses symbols
+- Parses nested lists
+- Raises useful errors for unbalanced parentheses
+- Includes unit tests
+
+- [ ] TASK-002: Evaluate arithmetic forms
+
+Dependencies:
+- TASK-001
+
+Description:
+Evaluate parsed arithmetic expressions.
+
+Acceptance Criteria:
+- Supports `+`, `-`, `*`, and `/`
+- Evaluates nested arithmetic
+- Includes unit tests
+
+- [ ] TASK-003: Add variables and definitions
+
+Dependencies:
+- TASK-002
+
+Description:
+Add an environment and support variable lookup and definitions.
+
+Acceptance Criteria:
+- Supports symbol lookup
+- Supports `(define name value)`
+- Keeps environment behavior tested
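The task file format above (checkbox headers plus an optional `Dependencies:` list) is enough to work out which tasks are runnable. The sketch below parses that format and applies the rule "incomplete, with every dependency complete". It is an illustration under those assumptions, not NightShift's task parser; `Task`, `parse_tasks`, and `runnable` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

TASK_RE = re.compile(r"^- \[( |x|X)\] (TASK-\d+): (.+)$")
DEP_RE = re.compile(r"^- (TASK-\d+)$")


@dataclass
class Task:
    task_id: str
    title: str
    done: bool
    dependencies: list[str] = field(default_factory=list)


def parse_tasks(markdown: str) -> list[Task]:
    """Collect task headers and the ids listed under each task's Dependencies section."""
    tasks: list[Task] = []
    in_deps = False
    for raw in markdown.splitlines():
        line = raw.strip()
        header = TASK_RE.match(line)
        if header:
            tasks.append(Task(header.group(2), header.group(3), header.group(1) != " "))
            in_deps = False
        elif line == "Dependencies:":
            in_deps = True
        elif in_deps and tasks and (dep := DEP_RE.match(line)):
            tasks[-1].dependencies.append(dep.group(1))
        elif line:
            # Any other non-blank line (e.g. "Description:") ends the dependency list.
            in_deps = False
    return tasks


def runnable(tasks: list[Task]) -> list[str]:
    """Return ids of incomplete tasks whose dependencies are all complete."""
    done = {t.task_id for t in tasks if t.done}
    return [t.task_id for t in tasks if not t.done and all(d in done for d in t.dependencies)]
```

For the example file, only `TASK-001` is runnable at first; once it is checked off, `TASK-002` becomes runnable and `TASK-003` stays blocked.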
--- a/examples/quickstart-lisp/tests/__init__.py
+++ b/examples/quickstart-lisp/tests/__init__.py
@@ -0,0 +1 @@
+"""Quickstart Lisp test suite."""
--- a/examples/quickstart-lisp/tests/test_lisp.py
+++ b/examples/quickstart-lisp/tests/test_lisp.py
@@ -0,0 +1,10 @@
+import unittest
+
+
+class SmokeTests(unittest.TestCase):
+    def test_smoke(self):
+        self.assertTrue(True)
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/nightshift/agents.py
+++ b/nightshift/agents.py
@@ -33,8 +33,8 @@ class AgentInvocation:
 class AgentExecutor:
    """Execute configured agents.

-    v1 supports the `command` backend only. The command receives the prompt
-    bundle on stdin and its stdout/stderr are persisted as the stage artifact.
+    Supports command-backed agents and a first-class Ollama backend. Both
+    receive the same prompt bundle on stdin and persist comparable artifacts.
    """

    def __init__(
@@ -64,12 +64,14 @@ class AgentExecutor:
        agent = self.agents.get(stage.agent)
        if agent is None:
            raise AgentError(f"Agent error: unknown agent '{stage.agent}' for stage '{stage.id}'.")
-        if agent.backend != "command":
+        if agent.backend not in {"command", "ollama"}:
            raise AgentError(
                f"Agent error: agent '{agent.id}' uses unsupported backend '{agent.backend}'."
            )
-        if not agent.command:
+        if agent.backend == "command" and not agent.command:
            raise AgentError(f"Agent error: command backend agent '{agent.id}' has no command.")
+        if agent.backend == "ollama" and not agent.model:
+            raise AgentError(f"Agent error: ollama backend agent '{agent.id}' has no model.")

        system_prompt = self._read_system_prompt(agent)
        prompt = build_prompt_bundle(
@@ -131,6 +133,13 @@ class AgentExecutor:
        return self.artifacts.project_context_path.read_text(encoding="utf-8")

    def _invoke(self, agent: AgentConfig, prompt: str) -> AgentInvocation:
+        if agent.backend == "ollama":
+            return self._invoke_ollama(agent, prompt)
+        return self._invoke_command(agent, prompt)
+
+    def _invoke_command(self, agent: AgentConfig, prompt: str) -> AgentInvocation:
+        if not agent.command:
+            raise AgentError(f"Agent error: command backend agent '{agent.id}' has no command.")
        started = time.monotonic()
        try:
            completed = subprocess.run(
@@ -165,6 +174,54 @@ class AgentExecutor:
                timed_out=True,
            )

+    def _invoke_ollama(self, agent: AgentConfig, prompt: str) -> AgentInvocation:
+        if not agent.model:
+            raise AgentError(f"Agent error: ollama backend agent '{agent.id}' has no model.")
+        command = f"ollama run {agent.model}"
+        started = time.monotonic()
+        try:
+            completed = subprocess.run(
+                ["ollama", "run", agent.model],
+                cwd=self.project_root,
+                input=prompt,
+                capture_output=True,
+                text=True,
+                timeout=self.timeout_seconds,
+            )
+            duration = time.monotonic() - started
+            return AgentInvocation(
+                agent_id=agent.id,
+                command=command,
+                prompt=prompt,
+                exit_code=completed.returncode,
+                stdout=completed.stdout,
+                stderr=completed.stderr,
+                duration_seconds=duration,
+            )
+        except FileNotFoundError as exc:
+            duration = time.monotonic() - started
+            return AgentInvocation(
+                agent_id=agent.id,
+                command=command,
+                prompt=prompt,
+                exit_code=127,
+                stdout="",
+                stderr=str(exc),
+                duration_seconds=duration,
+            )
+        except subprocess.TimeoutExpired as exc:
+            duration = time.monotonic() - started
+            return AgentInvocation(
+                agent_id=agent.id,
+                command=command,
+                prompt=prompt,
+                exit_code=-1,
+                stdout=_coerce_output(exc.stdout),
+                stderr=_coerce_output(exc.stderr),
+                duration_seconds=duration,
+                timed_out=True,
+            )
+

 def build_prompt_bundle(
    system_prompt: str,
--- a/nightshift/artifacts.py
+++ b/nightshift/artifacts.py
@@ -68,6 +68,34 @@ class ArtifactStore:
        shutil.copyfile(source, self.config_snapshot_path)
        return self.config_snapshot_path

+    def write_prompt_snapshots(self, prompt_paths: dict[str, Path]) -> list[Path]:
+        """Copy agent prompt files into the run artifact directory."""
+
+        self.initialize_run()
+        prompts_dir = self.run_dir / "prompts"
+        prompts_dir.mkdir(parents=True, exist_ok=True)
+        written: list[Path] = []
+        for agent_id, prompt_path in sorted(prompt_paths.items()):
+            source = prompt_path.resolve()
+            try:
+                source.relative_to(self.project_root)
+            except ValueError as exc:
+                raise ArtifactError(
+                    f"Artifact error: prompt path is outside project root: {source}"
+                ) from exc
+            if not source.exists():
+                raise ArtifactError(f"Artifact error: prompt path does not exist: {source}")
+            target = prompts_dir / f"{_safe_artifact_segment(agent_id, 'agent id')}.md"
+            shutil.copyfile(source, target)
+            written.append(target)
+        return written
+
+    def write_run_metadata(self, content: str, filename: str = "run-metadata.md") -> Path:
+        self.initialize_run()
+        path = self.run_dir / filename
+        path.write_text(content, encoding="utf-8")
+        return path
+
    def create_task_dir(self, task_id: str) -> TaskArtifactPaths:
        """Create the artifact directory for one task."""

--- a/nightshift/cli.py
+++ b/nightshift/cli.py
@@ -18,6 +18,7 @@ from .tasks import (
    select_task_by_id,
    validate_task_dependencies,
 )
+from .web import create_app


 def build_parser() -> argparse.ArgumentParser:
@@ -41,6 +42,11 @@ def build_parser() -> argparse.ArgumentParser:
    status_parser = subparsers.add_parser("status", help="Inspect NightShift project status.")
    status_parser.add_argument("--config", default="nightshift.yaml", help="Config file to inspect.")

+    web_parser = subparsers.add_parser("web", help="Start a read-only artifact dashboard.")
+    web_parser.add_argument("--config", default="nightshift.yaml", help="Config file to inspect.")
+    web_parser.add_argument("--host", default="127.0.0.1", help="Host to bind.")
+    web_parser.add_argument("--port", type=int, default=8765, help="Port to bind.")
+
    return parser


@@ -101,6 +107,12 @@ def main(argv: list[str] | None = None) -> int:
            print(format_status(build_status(config, tasks)))
            return 0

+        if args.command == "web":
+            config = validate_config(args.config)
+            app = create_app(config.project.root, config.project.artifact_dir)
+            app.run(host=args.host, port=args.port)
+            return 0
+
    except NightShiftError as exc:
        print(str(exc), file=sys.stderr)
        return 1
--- a/nightshift/commands.py
+++ b/nightshift/commands.py
@@ -3,14 +3,16 @@
 from __future__ import annotations

 from dataclasses import dataclass
+import os
 from pathlib import Path
+import shlex
 import subprocess
 import time

 from .artifacts import ArtifactStore
 from .config import SafetyConfig, StageConfig
 from .errors import CommandError, SafetyError
-from .safety import ensure_command_allowed, resolve_project_root
+from .safety import ensure_command_allowed, resolve_inside_root, resolve_project_root
 from .stages import StageResult


@@ -55,11 +57,17 @@ class CommandExecutor:
        reason = "All commands passed."

        for command in stage.commands:
-            run = self.run_command(command)
+            run = self.run_command(
+                command,
+                shell=stage.shell,
+                timeout_seconds=stage.timeout_seconds,
+                working_dir=stage.working_dir,
+            )
            runs.append(run)
            if run.timed_out:
                status = "fail"
-                reason = f"Command timed out after {self.timeout_seconds}s: {run.command}"
+                timeout = stage.timeout_seconds or self.timeout_seconds
+                reason = f"Command timed out after {timeout}s: {run.command}"
                break
            if run.exit_code != 0:
                status = "fail"
@@ -79,7 +87,13 @@ class CommandExecutor:
            output_path=str(output_path.relative_to(self.project_root)),
        )

-    def run_command(self, command: str) -> CommandRun:
+    def run_command(
+        self,
+        command: str,
+        shell: bool = True,
+        timeout_seconds: int | None = None,
+        working_dir: Path | None = None,
+    ) -> CommandRun:
        try:
            normalized = ensure_command_allowed(
                command,
@ -89,15 +103,30 @@ class CommandExecutor:
        except SafetyError as exc:
            raise CommandError(str(exc)) from exc

+        cwd = self.project_root
+        if working_dir is not None:
+            try:
+                cwd = resolve_inside_root(self.project_root, working_dir, "command working_dir")
+            except SafetyError as exc:
+                raise CommandError(str(exc)) from exc
+        timeout = timeout_seconds or self.timeout_seconds
+        args: str | list[str] = normalized if shell else shlex.split(normalized)
+        env = None
+        if self.safety.allowed_env:
+            env = {name: os.environ[name] for name in self.safety.allowed_env if name in os.environ}
+            if "PATH" in os.environ:
+                env.setdefault("PATH", os.environ["PATH"])
+
        started = time.monotonic()
        try:
            completed = subprocess.run(
-                normalized,
-                cwd=self.project_root,
-                shell=True,
+                args,
+                cwd=cwd,
+                shell=shell,
                capture_output=True,
                text=True,
-                timeout=self.timeout_seconds,
+                timeout=timeout,
+                env=env,
            )
            duration = time.monotonic() - started
            return CommandRun(
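
Review note on the `nightshift/commands.py` hunks above: a minimal standalone sketch of how the new `allowed_env` filter and the `shell=False` argument splitting appear to behave. `build_env` and `build_args` are hypothetical helper names used only for illustration; they are not functions in the repository, and the logic simply mirrors the added lines under that assumption.

```python
import shlex


def build_env(allowed_env, environ):
    """Hypothetical helper: return None (inherit everything) or a filtered copy."""
    if not allowed_env:
        # An empty allow-list leaves env=None, so the child inherits the full environment.
        return None
    env = {name: environ[name] for name in allowed_env if name in environ}
    # PATH is carried over even when not listed, so executables can still be located.
    if "PATH" in environ:
        env.setdefault("PATH", environ["PATH"])
    return env


def build_args(command, shell):
    """Hypothetical helper: keep the raw string for shell=True, split into argv otherwise."""
    return command if shell else shlex.split(command)


# Illustrative values only; not taken from a real environment.
fake_environ = {"PATH": "/usr/bin", "HOME": "/home/dev", "API_TOKEN": "secret"}
filtered = build_env(("HOME",), fake_environ)
argv = build_args('python -c "print(1)"', shell=False)
print(filtered)
print(argv)
```

With `shell=False`, shell syntax such as pipes, redirects, and globs is not interpreted; those tokens reach the program as literal arguments, so commands that rely on them likely need `shell: true` in the stage config.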
--- a/nightshift/config.py
+++ b/nightshift/config.py
@ -32,6 +32,7 @@ class SafetyConfig:
    scoped_paths: tuple[str, ...]
    allowed_commands: tuple[str, ...]
    forbidden_commands: tuple[str, ...]
+    allowed_env: tuple[str, ...] = ()


@dataclass(frozen=True)
@ -52,6 +53,15 @@ class StageConfig:
    commands: tuple[str, ...] = ()
    output: str | None = None
    on_fail: str | None = None
+    shell: bool = True
+    timeout_seconds: int | None = None
+    working_dir: Path | None = None
+
+
+@dataclass(frozen=True)
+class ExperimentConfig:
+    label: str | None = None
+    prompt_variant: str | None = None


@dataclass(frozen=True)
@ -68,6 +78,7 @@ class NightShiftConfig:
    safety: SafetyConfig
    agents: dict[str, AgentConfig]
    pipeline: PipelineConfig
+    experiment: ExperimentConfig = ExperimentConfig()


 AGENT_STAGE_TYPES = {"agent", "agent_review", "review"}
@ -110,6 +121,11 @@ def validate_config(path: str | Path = "nightshift.yaml") -> NightShiftConfig:
            )

    for stage in config.pipeline.stages:
+        if stage.working_dir is not None:
+            try:
+                resolve_inside_root(root, stage.working_dir, f"stage '{stage.id}' working_dir")
+            except SafetyError as exc:
+                raise ConfigError(f"Config error: {exc}") from exc
        for command in stage.commands:
            try:
                ensure_command_allowed(
@ -153,6 +169,7 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
        forbidden_commands=_string_tuple(
            safety_raw.get("forbidden_commands", []), "safety.forbidden_commands"
        ),
+        allowed_env=_string_tuple(safety_raw.get("allowed_env", []), "safety.allowed_env"),
    )

    agents_raw = _require_mapping(raw["agents"], "agents")
@ -163,25 +180,41 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
        agent_raw = _require_mapping(agent_raw_value, f"agents.{agent_id}")
        backend = _require_string(agent_raw, "backend", f"agents.{agent_id}")
        command = _optional_string(agent_raw.get("command"), f"agents.{agent_id}.command")
-        if backend != "command":
+        model = _optional_string(agent_raw.get("model"), f"agents.{agent_id}.model")
+        if backend not in {"command", "ollama"}:
            raise ConfigError(
                f"Config error: agent '{agent_id}' uses unsupported backend '{backend}'. "
-                "Supported backends: command."
+                "Supported backends: command, ollama."
            )
-        if command is None:
+        if backend == "command" and command is None:
            raise ConfigError(
                f"Config error: command backend agent '{agent_id}' must define command."
            )
+        if backend == "ollama" and model is None:
+            raise ConfigError(
+                f"Config error: ollama backend agent '{agent_id}' must define model."
+            )
        system_prompt = Path(_require_string(agent_raw, "system_prompt", f"agents.{agent_id}"))
        agents[str(agent_id)] = AgentConfig(
            id=str(agent_id),
            backend=backend,
            command=command,
            system_prompt=system_prompt,
-            model=_optional_string(agent_raw.get("model"), f"agents.{agent_id}.model"),
+            model=model,
            role=_optional_string(agent_raw.get("role"), f"agents.{agent_id}.role"),
        )

+    experiment_raw = raw.get("experiment", {})
+    if experiment_raw is None:
+        experiment_raw = {}
+    experiment_raw = _require_mapping(experiment_raw, "experiment")
+    experiment = ExperimentConfig(
+        label=_optional_string(experiment_raw.get("label"), "experiment.label"),
+        prompt_variant=_optional_string(
+            experiment_raw.get("prompt_variant"), "experiment.prompt_variant"
+        ),
+    )
+
    pipeline_raw = _require_mapping(raw["pipeline"], "pipeline")
    max_task_retries = _optional_int(
        pipeline_raw.get("max_task_retries", 0),
@ -218,6 +251,13 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:

        agent = _optional_string(stage_raw.get("agent"), f"{stage_context}.agent")
        commands = _string_tuple(stage_raw.get("commands", []), f"{stage_context}.commands")
+        timeout_seconds = _optional_int_or_none(
+            stage_raw.get("timeout_seconds"),
+            f"{stage_context}.timeout_seconds",
+        )
+        if timeout_seconds is not None and timeout_seconds <= 0:
+            raise ConfigError(f"Config error: {stage_context}.timeout_seconds must be greater than zero.")
+        working_dir_raw = _optional_string(stage_raw.get("working_dir"), f"{stage_context}.working_dir")

        if stage_type in AGENT_STAGE_TYPES:
            if agent is None:
@ -244,6 +284,9 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
                commands=commands,
                output=_optional_string(stage_raw.get("output"), f"{stage_context}.output"),
                on_fail=_optional_string(stage_raw.get("on_fail"), f"{stage_context}.on_fail"),
+                shell=_optional_bool(stage_raw.get("shell", True), f"{stage_context}.shell"),
+                timeout_seconds=timeout_seconds,
+                working_dir=Path(working_dir_raw) if working_dir_raw else None,
            )
        )

@ -264,6 +307,7 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
            stages=tuple(stages),
            continue_on_task_failure=continue_on_task_failure,
        ),
+        experiment=experiment,
    )


@ -442,6 +486,12 @@ def _optional_int(value: Any, context: str) -> int:
    return value


+def _optional_int_or_none(value: Any, context: str) -> int | None:
+    if value is None:
+        return None
+    return _optional_int(value, context)
+
+
 def _string_tuple(value: Any, context: str) -> tuple[str, ...]:
    if value is None:
        return ()
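
Review note on the `timeout_seconds` handling in `nightshift/config.py`: the executor resolves the timeout with `stage.timeout_seconds or self.timeout_seconds`, which treats `0` the same as `None`. That looks safe only because the config parser rejects values `<= 0`. The sketch below illustrates that interaction; `parse_stage_timeout` and `effective_timeout` are hypothetical names, and the explicit `bool` check is an assumption, since the body of `_optional_int` is not shown in this diff.

```python
def parse_stage_timeout(value, context):
    """Hypothetical stand-in for the stage timeout validation."""
    if value is None:
        return None
    # bool is a subclass of int in Python, so it is rejected explicitly here (assumption).
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{context} must be an integer.")
    if value <= 0:
        raise ValueError(f"{context} must be greater than zero.")
    return value


def effective_timeout(stage_timeout, default_timeout):
    # `or` falls back on any falsy value; harmless here because 0 never passes validation.
    return stage_timeout or default_timeout


print(effective_timeout(parse_stage_timeout(None, "stage.timeout_seconds"), 120))
print(effective_timeout(parse_stage_timeout(30, "stage.timeout_seconds"), 120))
```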
--- a/nightshift/pipeline.py
+++ b/nightshift/pipeline.py
@ -50,7 +50,12 @@ class PipelineRunner:
        self.config = config
        self.artifacts = artifacts or ArtifactStore.from_config(config)
        self.context = ContextManager(self.artifacts)
-        self.reports = ReportGenerator(config.project.root, self.artifacts)
+        self.reports = ReportGenerator(
+            config.project.root,
+            self.artifacts,
+            experiment_label=config.experiment.label,
+            prompt_variant=config.experiment.prompt_variant,
+        )
        self.agent_executor = AgentExecutor(
            config.project.root,
            config.agents,
@ -68,6 +73,13 @@ class PipelineRunner:
        ensure_clean_worktree(self.config.project.root, self.config.safety.require_clean_worktree)
        self.artifacts.initialize_run()
        self.artifacts.write_config_snapshot(self.config.path)
+        self.artifacts.write_prompt_snapshots(
+            {
+                agent_id: self.config.project.root / agent.system_prompt
+                for agent_id, agent in self.config.agents.items()
+            }
+        )
+        self.artifacts.write_run_metadata(format_run_metadata(self.config))
        self.artifacts.write_task_snapshot(task)
        write_git_artifacts(self.artifacts, task.id, "before")
        self.context.ensure_project_context()
@ -333,3 +345,29 @@ def format_aggregate_run_summary(results: list[PipelineResult], status: str, rea
        )
    lines.append("")
    return "\n".join(lines)
+
+
+def format_run_metadata(config: NightShiftConfig) -> str:
+    lines = [
+        "# Run Metadata",
+        "",
+        f"Project: {config.project.name}",
+        f"Experiment label: {config.experiment.label or ''}",
+        f"Prompt variant: {config.experiment.prompt_variant or ''}",
+        "",
+        "## Agents",
+        "",
+    ]
+    for agent in config.agents.values():
+        lines.extend(
+            [
+                f"### {agent.id}",
+                "",
+                f"- Backend: {agent.backend}",
+                f"- Model: {agent.model or ''}",
+                f"- Command: {agent.command or ''}",
+                f"- System prompt: {agent.system_prompt}",
+                "",
+            ]
+        )
+    return "\n".join(lines)
--- a/nightshift/reports.py
+++ b/nightshift/reports.py
@ -21,9 +21,17 @@ class TaskReport:
 class ReportGenerator:
    """Write task and run summaries from pipeline results."""

-    def __init__(self, project_root: Path, artifacts: ArtifactStore) -> None:
+    def __init__(
+        self,
+        project_root: Path,
+        artifacts: ArtifactStore,
+        experiment_label: str | None = None,
+        prompt_variant: str | None = None,
+    ) -> None:
        self.project_root = project_root
        self.artifacts = artifacts
+        self.experiment_label = experiment_label
+        self.prompt_variant = prompt_variant

    def write_reports(
        self,
@ -51,6 +59,8 @@ class ReportGenerator:
                modified_files=modified_files,
                stage_results_path=stage_results_path,
                context_out_path=context_out_path,
+                experiment_label=self.experiment_label,
+                prompt_variant=self.prompt_variant,
            ),
        )
        self.artifacts.run_summary_path.write_text(
@ -62,6 +72,8 @@ class ReportGenerator:
                modified_files=modified_files,
                final_notes_path=final_notes_path,
                stage_results_path=stage_results_path,
+                experiment_label=self.experiment_label,
+                prompt_variant=self.prompt_variant,
            ),
            encoding="utf-8",
        )
@ -109,6 +121,8 @@ def format_task_report(
    modified_files: list[str],
    stage_results_path: Path,
    context_out_path: Path | None,
+    experiment_label: str | None = None,
+    prompt_variant: str | None = None,
 ) -> str:
    stage_lines = "\n".join(
        f"- `{result.stage_id}`: {result.status} ({result.reason})" for result in stage_results
@ -130,6 +144,11 @@ def format_task_report(
            f"Retry count: {retry_count}",
            f"Reason: {reason}",
            "",
+            "## Experiment",
+            "",
+            f"- Label: {experiment_label or ''}",
+            f"- Prompt variant: {prompt_variant or ''}",
+            "",
            "## Acceptance Criteria",
            "",
            "\n".join(f"- {item}" for item in task.acceptance_criteria),
@ -158,6 +177,8 @@ def format_run_summary(
    modified_files: list[str],
    final_notes_path: Path,
    stage_results_path: Path,
+    experiment_label: str | None = None,
+    prompt_variant: str | None = None,
 ) -> str:
    modified = "\n".join(f"- `{path}`" for path in modified_files) if modified_files else "- Unavailable or none detected"
    return "\n".join(
@ -168,6 +189,8 @@ def format_run_summary(
            f"- Status: {status}",
            f"- Retry count: {retry_count}",
            f"- Reason: {reason}",
+            f"- Experiment label: {experiment_label or ''}",
+            f"- Prompt variant: {prompt_variant or ''}",
            "",
            "## Modified Files",
            "",
--- a/nightshift/web.py
+++ b/nightshift/web.py
@ -0,0 +1,81 @@
+"""Read-only web dashboard for NightShift artifacts."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from html import escape
+from pathlib import Path
+
+from .errors import NightShiftError
+
+
+@dataclass(frozen=True)
+class RunInfo:
+    name: str
+    path: Path
+    summary: str
+
+
+def list_runs(artifact_dir: str | Path) -> list[RunInfo]:
+    runs_dir = Path(artifact_dir) / "runs"
+    if not runs_dir.exists():
+        return []
+    runs: list[RunInfo] = []
+    for path in sorted((item for item in runs_dir.iterdir() if item.is_dir()), reverse=True):
+        summary_path = path / "run-summary.md"
+        summary = summary_path.read_text(encoding="utf-8") if summary_path.exists() else "No run summary yet."
+        runs.append(RunInfo(name=path.name, path=path, summary=summary))
+    return runs
+
+
+def read_artifact(run_path: Path, relative_path: str) -> str:
+    candidate = (run_path / relative_path).resolve()
+    try:
+        candidate.relative_to(run_path.resolve())
+    except ValueError:
+        return "Artifact path escapes run directory."
+    if not candidate.exists() or not candidate.is_file():
+        return "Artifact not found."
+    return candidate.read_text(encoding="utf-8", errors="replace")
+
+
+def render_dashboard(artifact_dir: str | Path) -> str:
+    runs = list_runs(artifact_dir)
+    body = ["<h1>NightShift Dashboard</h1>"]
+    if not runs:
+        body.append("<p>No runs found.</p>")
+    for run in runs:
+        body.extend(
+            [
+                f"<section><h2>{escape(run.name)}</h2>",
+                "<pre>",
+                escape(run.summary),
+                "</pre>",
+                "</section>",
+            ]
+        )
+    return "\n".join(["<!doctype html>", '<html><head><meta http-equiv="refresh" content="5"></head><body>', *body, "</body></html>"])
+
+
+def create_app(project_root: str | Path = ".", artifact_dir: str | Path = ".nightshift"):
+    try:
+        from flask import Flask, Response
+    except ModuleNotFoundError as exc:
+        raise NightShiftError(
+            "Web dashboard requires Flask. Install it with `pip install flask`."
+        ) from exc
+
+    root = Path(project_root).resolve()
+    artifacts = root / artifact_dir
+    app = Flask(__name__)
+
+    @app.get("/")
+    def index():
+        return Response(render_dashboard(artifacts), mimetype="text/html")
+
+    @app.get("/runs/<run_id>/<path:artifact_path>")
+    def artifact(run_id: str, artifact_path: str):
+        content = read_artifact(artifacts / "runs" / run_id, artifact_path)
+        return Response(f"<pre>{escape(content)}</pre>", mimetype="text/html")
+
+    return app
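
Review note on `read_artifact` in `nightshift/web.py`: the containment check relies on resolving the joined path and then calling `Path.relative_to`, which raises `ValueError` when the candidate is not under the base. A minimal standalone sketch of that technique follows; `is_inside` is a hypothetical name, not a function in the repository.

```python
from pathlib import Path
import tempfile


def is_inside(base: Path, relative_path: str) -> bool:
    """Hypothetical helper: True when base / relative_path resolves under base."""
    candidate = (base / relative_path).resolve()
    try:
        # relative_to raises ValueError if candidate is not located under base.
        candidate.relative_to(base.resolve())
    except ValueError:
        return False
    return True


with tempfile.TemporaryDirectory() as directory:
    run_dir = Path(directory) / "runs" / "test-run"
    run_dir.mkdir(parents=True)
    inside = is_inside(run_dir, "run-summary.md")
    escaped = is_inside(run_dir, "../project-context.md")
    absolute = is_inside(run_dir, "/etc/passwd")
    print(inside, escaped, absolute)
```

Resolving both sides before comparing matters because a temporary directory may itself sit behind a symlink. Note that the Flask route builds the base as `artifacts / "runs" / run_id`, so the check constrains `artifact_path` relative to whatever `run_id` resolves to; validating `run_id` separately may be worth considering.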
--- a/tests/test_agents.py
+++ b/tests/test_agents.py
@ -1,6 +1,7 @@
 from pathlib import Path
 import tempfile
 import unittest
+from unittest.mock import patch

 from nightshift.agents import AgentExecutor, build_prompt_bundle, parse_review_output
 from nightshift.artifacts import ArtifactStore
@ -94,6 +95,42 @@ class AgentExecutorTests(unittest.TestCase):
        self.assertEqual(next_stage, "implement")
        self.assertEqual(context_update, "Fix tests")

+    def test_ollama_agent_invocation_uses_model_without_real_ollama(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            prompt_path = root / "planner.md"
+            prompt_path.write_text("Plan carefully.", encoding="utf-8")
+            artifacts = ArtifactStore(root, ".nightshift", run_id="test-run")
+            executor = AgentExecutor(
+                root,
+                {
+                    "planner": AgentConfig(
+                        id="planner",
+                        backend="ollama",
+                        command=None,
+                        model="tiny-model",
+                        system_prompt=Path("planner.md"),
+                    )
+                },
+                artifacts,
+            )
+            task = parse_tasks(TASK_MD)[0]
+            stage = StageConfig(id="plan", type="agent", agent="planner", output="plan.md")
+
+            completed = type(
+                "Completed",
+                (),
+                {"returncode": 0, "stdout": "ollama output", "stderr": ""},
+            )()
+            with patch("nightshift.agents.subprocess.run", return_value=completed) as run:
+                result = executor.run_stage(stage, task)
+
+            self.assertEqual(result.status, "pass")
+            run.assert_called_once()
+            self.assertEqual(run.call_args.args[0], ["ollama", "run", "tiny-model"])
+            output = (root / result.output_path).read_text(encoding="utf-8")
+            self.assertIn("ollama run tiny-model", output)
+

 if __name__ == "__main__":
    unittest.main()
--- a/tests/test_commands.py
+++ b/tests/test_commands.py
@ -119,6 +119,37 @@ class CommandExecutorTests(unittest.TestCase):
            output = (root / result.output_path).read_text(encoding="utf-8")
            self.assertIn("Timed out: true", output)

+    def test_command_stage_can_run_without_shell_and_with_working_dir(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            work = root / "work"
+            work.mkdir()
+            command = 'python -c "import pathlib; print(pathlib.Path.cwd().name)"'
+            executor = CommandExecutor(
+                root,
+                SafetyConfig(
+                    require_clean_worktree=False,
+                    scoped_paths=(".",),
+                    allowed_commands=(command,),
+                    forbidden_commands=("rm -rf",),
+                ),
+                ArtifactStore(root, ".nightshift", run_id="test-run"),
+            )
+            stage = StageConfig(
+                id="test",
+                type="command",
+                commands=(command,),
+                output="test-output.txt",
+                shell=False,
+                working_dir=Path("work"),
+            )
+
+            result = executor.run_stage(stage, "TASK-001")
+
+            self.assertEqual(result.status, "pass")
+            output = (root / result.output_path).read_text(encoding="utf-8")
+            self.assertIn("work", output)
+

 if __name__ == "__main__":
    unittest.main()
--- a/tests/test_config.py
+++ b/tests/test_config.py
@ -128,6 +128,66 @@ class ConfigTests(unittest.TestCase):
            with self.assertRaisesRegex(ConfigError, "must define command"):
                load_config(config_path)

+    def test_ollama_backend_requires_model(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            init_project(root)
+            config_path = root / "nightshift.yaml"
+            config_path.write_text(
+                config_path.read_text(encoding="utf-8").replace(
+                    "backend: command\n    command: echo",
+                    "backend: ollama",
+                    1,
+                ),
+                encoding="utf-8",
+            )
+
+            with self.assertRaisesRegex(ConfigError, "must define model"):
+                load_config(config_path)
+
+    def test_ollama_backend_and_experiment_metadata_load(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            init_project(root)
+            config_path = root / "nightshift.yaml"
+            text = config_path.read_text(encoding="utf-8").replace(
+                "backend: command\n    command: echo",
+                "backend: ollama\n    model: qwen2.5-coder:14b",
+                1,
+            )
+            text = text.replace(
+                "agents:",
+                "experiment:\n  label: local-test\n  prompt_variant: v1\n\nagents:",
+            )
+            config_path.write_text(text, encoding="utf-8")
+
+            config = load_config(config_path)
+
+            self.assertEqual(config.agents["planner"].backend, "ollama")
+            self.assertEqual(config.agents["planner"].model, "qwen2.5-coder:14b")
+            self.assertEqual(config.experiment.label, "local-test")
+
+    def test_command_stage_options_load(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            init_project(root)
+            config_path = root / "nightshift.yaml"
+            config_path.write_text(
+                config_path.read_text(encoding="utf-8").replace(
+                    "      output: test-output.txt",
+                    "      output: test-output.txt\n      shell: false\n      timeout_seconds: 30\n      working_dir: .",
+                    1,
+                ),
+                encoding="utf-8",
+            )
+
+            config = load_config(config_path)
+            test_stage = next(stage for stage in config.pipeline.stages if stage.id == "test")
+
+            self.assertFalse(test_stage.shell)
+            self.assertEqual(test_stage.timeout_seconds, 30)
+            self.assertEqual(test_stage.working_dir, Path("."))
+
    def test_non_command_stage_cannot_define_commands(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
            root = Path(directory)
--- a/tests/test_pipeline.py
+++ b/tests/test_pipeline.py
@ -93,6 +93,8 @@ class PipelineRunnerTests(unittest.TestCase):
            self.assertEqual(result.retry_count, 0)
            self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id / "plan.md").exists())
            self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id / "stage-results.md").exists())
+            self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "prompts" / "planner.md").exists())
+            self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "run-metadata.md").exists())
            self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id / "context.md").exists())
            self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id / "context-out.md").exists())
            self.assertIn(
--- a/tests/test_web.py
+++ b/tests/test_web.py
@ -0,0 +1,33 @@
+from pathlib import Path
+import tempfile
+import unittest
+
+from nightshift.artifacts import ArtifactStore
+from nightshift.web import list_runs, read_artifact, render_dashboard
+
+
+class WebDashboardTests(unittest.TestCase):
+    def test_render_dashboard_handles_missing_runs(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            html = render_dashboard(Path(directory) / ".nightshift")
+
+            self.assertIn("No runs found", html)
+
+    def test_lists_runs_and_reads_artifacts_safely(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            artifacts = ArtifactStore(root, ".nightshift", run_id="test-run")
+            artifacts.initialize_run()
+            artifacts.run_summary_path.write_text("# Summary\n\nok", encoding="utf-8")
+
+            runs = list_runs(root / ".nightshift")
+            content = read_artifact(root / ".nightshift" / "runs" / "test-run", "run-summary.md")
+            escaped = read_artifact(root / ".nightshift" / "runs" / "test-run", "../project-context.md")
+
+            self.assertEqual(len(runs), 1)
+            self.assertIn("ok", content)
+            self.assertIn("escapes", escaped)
+
+
+if __name__ == "__main__":
+    unittest.main()