Add Ollama backend support, experiment metadata and prompt snapshots, stronger command execution controls, refreshed docs and examples, a read-only Flask dashboard, and a runnable quickstart Lisp project.

K. Hodges 2026-05-17 01:39:44 -07:00
parent 57608e9660
commit 957dc7d25b
33 changed files with 1181 additions and 27 deletions


@ -80,3 +80,227 @@ tasks/TASK-001/final-notes.md
Example run files are available in `templates/`.
They are safe starter examples and use command-backed fake agents.
The repository also includes a complete sample target project:
```text
examples/quickstart-lisp/
```
Copy that directory elsewhere if you want to test NightShift against a multi-task project.
## Quickstart Test Project
A good first real target project is a tiny Lisp interpreter in Python. It is small enough to review, but it naturally breaks into multiple tasks that test NightShift's planning, implementation, command execution, artifacts, reports, and dependency handling.
If you do not want a language interpreter, use a small config parser or markdown todo CLI instead. The Lisp interpreter is the recommended default because it has clear incremental milestones and simple tests.
### 1. Create a Target Project
```bash
mkdir tiny-lisp
cd tiny-lisp
mkdir agents tests
touch lisp.py tests/test_lisp.py
```
### 2. Add `nightshift.yaml`
```yaml
project:
  name: tiny-lisp
  root: .
  task_file: tasks.md
  artifact_dir: .nightshift

safety:
  require_clean_worktree: false
  scoped_paths:
    - .
  allowed_commands:
    - python -m unittest discover -v
  forbidden_commands:
    - rm -rf
    - git push
    - curl | bash

agents:
  planner:
    backend: command
    command: echo
    system_prompt: agents/planner.md

  implementer:
    backend: command
    command: echo
    system_prompt: agents/implementer.md

  reviewer:
    backend: command
    command: python -c "print('status: pass'); print('reason: quickstart reviewer accepted artifacts')"
    system_prompt: agents/reviewer.md

pipeline:
  max_task_retries: 1
  continue_on_task_failure: false
  stages:
    - id: plan
      type: agent
      agent: planner
      output: plan.md

    - id: implement
      type: agent
      agent: implementer
      output: implementation-log.md

    - id: test
      type: command
      commands:
        - python -m unittest discover -v
      output: test-output.txt

    - id: review
      type: agent_review
      agent: reviewer
      on_fail: implement
      output: review.md

    - id: summarize
      type: summarize
      output: final-notes.md
```
This uses fake command agents so the pipeline is safe and deterministic. Replace `command: echo` later with your real local agent wrapper.
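
As a rough sketch of what such a wrapper might look like, the script below reads a prompt from one stream and writes a reply to another, which mirrors the stdin/stdout contract the `command` backend relies on. The `respond` and `main` helpers are hypothetical and not part of NightShift; a real wrapper would call your model where `respond` builds its canned reply.

```python
import io
import sys


def respond(prompt: str) -> str:
    """Return a deterministic reply; a real wrapper would call a model here."""
    lines = prompt.splitlines()
    first = lines[0] if lines else "(empty prompt)"
    return f"received {len(lines)} prompt lines\nfirst line: {first}\n"


def main(stdin=sys.stdin, stdout=sys.stdout) -> int:
    """Read the whole prompt bundle, then write the reply in one go."""
    stdout.write(respond(stdin.read()))
    return 0


# Demonstration with in-memory streams instead of the real stdin/stdout.
demo_out = io.StringIO()
main(io.StringIO("Task: TASK-001\nPlan the parser."), demo_out)
print(demo_out.getvalue(), end="")
```

When used as a real script you would call `main()` with its defaults so the prompt comes from stdin, and point `command:` at that script.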
### 3. Add `tasks.md`
```markdown
# Tasks
- [ ] TASK-001: Parse Lisp expressions
Description:
Implement tokenization and parsing for a tiny Lisp subset.
Acceptance Criteria:
- Parses numbers
- Parses symbols
- Parses nested lists
- Raises useful errors for unbalanced parentheses
- Includes unit tests
- [ ] TASK-002: Evaluate arithmetic forms
Dependencies:
- TASK-001
Description:
Evaluate parsed arithmetic expressions.
Acceptance Criteria:
- Supports `+`, `-`, `*`, and `/`
- Evaluates nested arithmetic
- Includes unit tests
- [ ] TASK-003: Add variables and definitions
Dependencies:
- TASK-002
Description:
Add an environment and support variable lookup and definitions.
Acceptance Criteria:
- Supports symbol lookup
- Supports `(define name value)`
- Keeps environment behavior tested
- [ ] TASK-004: Add conditionals
Dependencies:
- TASK-003
Description:
Implement simple truthiness and `if` expressions.
Acceptance Criteria:
- Supports `(if condition then else)`
- Handles false-like values consistently
- Includes tests for both branches
```
### 4. Add Prompt Files
`agents/planner.md`:
```markdown
You are the planning agent. Create a small, conservative plan for the task.
Do not write code. Include files to edit, tests to add, and risks.
```
`agents/implementer.md`:
```markdown
You are the implementation agent. Implement the smallest correct change.
Preserve existing behavior and include tests.
```
`agents/reviewer.md`:
```markdown
You are the review agent. Decide whether the task should pass, retry, or fail.
Output:
status: pass | fail | retry | escalate
reason: <short explanation>
next_stage: <optional stage id>
context_update: <compact useful note>
```
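
To make the output contract concrete, here is a minimal sketch of how `key: value` lines like these could be parsed. It is illustrative only and is not NightShift's actual parser: the `parse_review` helper, and its choice to fall back to `escalate` when the status is missing or unrecognised, are assumptions.

```python
ALLOWED_STATUSES = {"pass", "fail", "retry", "escalate"}
KNOWN_KEYS = {"status", "reason", "next_stage", "context_update"}


def parse_review(text: str) -> dict[str, str]:
    """Collect known `key: value` lines; later duplicates overwrite earlier ones."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in KNOWN_KEYS:
            fields[key] = value.strip()
    if fields.get("status", "") not in ALLOWED_STATUSES:
        # Assumption: a missing or unrecognised status needs a human decision.
        fields["status"] = "escalate"
    return fields


review = parse_review("status: pass\nreason: quickstart reviewer accepted artifacts\n")
print(review["status"], "-", review["reason"])
```

Lines that do not match a known key are ignored, so a reviewer can add free-form commentary around the structured fields without breaking the parse.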
### 5. Add an Initial Passing Test File
```python
# tests/test_lisp.py
import unittest


class SmokeTests(unittest.TestCase):
    def test_smoke(self):
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```
### 6. Validate and Run
```bash
nightshift validate
nightshift status
nightshift run --task TASK-001
```
Run all currently runnable tasks:
```bash
nightshift run --all
```
Because the example uses fake agents, it will not actually implement the Lisp interpreter by itself. It is meant to verify the pipeline, dependency handling, reports, and artifacts before you connect a real command-backed agent.
### 7. Review Artifacts
After a run, inspect:
```text
.nightshift/runs/<run-id>/run-summary.md
.nightshift/runs/<run-id>/tasks/TASK-001/plan.md
.nightshift/runs/<run-id>/tasks/TASK-001/implementation-log.md
.nightshift/runs/<run-id>/tasks/TASK-001/test-output.txt
.nightshift/runs/<run-id>/tasks/TASK-001/review.md
.nightshift/runs/<run-id>/tasks/TASK-001/final-notes.md
```
The useful signal is whether NightShift selected the right task, respected dependencies, ran the command stage, wrote artifacts, updated task completion, and produced a clear summary.
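
If you want to check a run mechanically before reading it, a small script can report which of the files listed above are missing. This is a sketch, not a NightShift feature: `missing_artifacts` is a hypothetical helper, and the demonstration builds a throwaway directory rather than reading a real run.

```python
from pathlib import Path
import tempfile

# File names taken from the artifact list above.
TASK_ARTIFACTS = (
    "plan.md",
    "implementation-log.md",
    "test-output.txt",
    "review.md",
    "final-notes.md",
)


def missing_artifacts(run_dir: Path, task_id: str) -> list[str]:
    """Return expected artifact paths (relative to run_dir) that do not exist."""
    expected = ["run-summary.md"]
    expected += [f"tasks/{task_id}/{name}" for name in TASK_ARTIFACTS]
    return [rel for rel in expected if not (run_dir / rel).is_file()]


# Demonstration: only the summary and the plan exist in this fake run.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    task_dir = run_dir / "tasks" / "TASK-001"
    task_dir.mkdir(parents=True)
    (run_dir / "run-summary.md").write_text("# Summary\n", encoding="utf-8")
    (task_dir / "plan.md").write_text("plan\n", encoding="utf-8")
    demo_missing = missing_artifacts(run_dir, "TASK-001")

print(demo_missing)
```

For a real run you would pass `Path(".nightshift/runs/<run-id>")` with the run id filled in.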


@ -17,6 +17,7 @@ The core MVP is implemented:
- `nightshift run` executes the next incomplete task.
- `nightshift run --task TASK-001` executes a specific task.
- Command-backed agents receive compact prompt bundles on stdin.
- Ollama-backed agents can call local models with `backend: ollama`.
- Command stages run through allowlist and forbidden-fragment checks.
- Runs create `.nightshift/` artifacts, task context, retry context, command output, agent output, final notes, and run summaries.
- Unit tests cover config, safety, tasks, artifacts, commands, agents, pipeline retries, context, and reports.
@ -179,7 +180,7 @@ pipeline:
## Agent Backends
The MVP supports `backend: command`. NightShift supports `backend: command` and `backend: ollama`.
NightShift builds a prompt bundle containing:
@ -193,7 +194,17 @@ NightShift builds a prompt bundle containing:
- retry notes
- output contract
The prompt is passed to the configured command on stdin. stdout, stderr, exit code, duration, and the prompt are persisted as artifacts. The prompt is passed to the configured command or local Ollama model on stdin. stdout, stderr, exit code, duration, and the prompt are persisted as artifacts.
Ollama example:
```yaml
agents:
planner:
backend: ollama
model: qwen2.5-coder:14b
system_prompt: agents/planner.md
```
Review agents should emit:
@ -264,17 +275,29 @@ Compile-check modules:
python -m compileall nightshift tests python -m compileall nightshift tests
``` ```
Optional read-only dashboard:
```bash
pip install flask
nightshift web
```
Additional docs:
- [Config reference](docs/config-reference.md)
- [Artifact review workflow](docs/artifact-review.md)
- [Troubleshooting](docs/troubleshooting.md)
- [Quickstart](QUICKSTART.md)
- [Quickstart Lisp example](examples/quickstart-lisp/)
## Roadmap
Next major work:
- real local model wrappers - richer local backend support beyond Ollama
- stronger git safety and diff capture
- task completion updates
- dependency handling
- richer status command
- prompt and model experimentation
- optional branch isolation - optional branch isolation
- longer-run multi-task reports - live dashboard enhancements
- stronger structured command definitions
- longer-run reporting and resumability
NightShift remains oriented around reviewable output, not blind autonomy.

34
docs/artifact-review.md Normal file

@ -0,0 +1,34 @@
# Artifact Review Workflow
Start with:
```text
.nightshift/runs/<run-id>/run-summary.md
```
Then inspect the task directory:
```text
.nightshift/runs/<run-id>/tasks/<task-id>/
```
Useful artifacts:
- `task.md`: task snapshot.
- `context.md`: compact task context.
- `plan.md`: planning agent output.
- `implementation-log.md`: implementation agent output.
- `test-output.txt`: command stage transcript.
- `review.md`: review agent output.
- `stage-results.md`: structured stage status summary.
- `context-out.md`: retry/context summary.
- `final-notes.md`: final task report.
- `diff.patch`: git diff when available.
- `git-status-before.txt` / `git-status-after.txt`: git state snapshots.
- `task-completion.md`: whether the task was marked complete.
Run-level artifacts:
- `config.snapshot.yaml`
- `run-metadata.md`
- `prompts/<agent>.md`

61
docs/config-reference.md Normal file

@ -0,0 +1,61 @@
# NightShift Config Reference
NightShift config is YAML.
## `project`
- `name`: project display name.
- `root`: project root, resolved relative to the config file.
- `task_file`: markdown task file inside the project root.
- `artifact_dir`: artifact directory inside the project root.
## `safety`
- `require_clean_worktree`: when true, block runs if `git status --short` is dirty or unavailable.
- `scoped_paths`: paths that must resolve inside the project root.
- `allowed_commands`: exact command-stage allowlist entries after whitespace normalization.
- `forbidden_commands`: dangerous fragments blocked before allowlist acceptance.
- `allowed_env`: optional environment variable names to pass to command stages.
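
The ordering described above (normalize whitespace, reject forbidden fragments, then require an exact allowlist match) can be sketched as follows. This is an illustration of the rule rather than NightShift's `ensure_command_allowed`; the `check_command` helper and the use of plain substring matching for fragments are assumptions.

```python
def normalize(command: str) -> str:
    """Collapse runs of whitespace so spacing differences do not matter."""
    return " ".join(command.split())


def check_command(command: str, allowed: list[str], forbidden: list[str]) -> str:
    """Return the normalized command, or raise ValueError if it is rejected."""
    normalized = normalize(command)
    # Forbidden fragments are checked first, so they win over the allowlist.
    for fragment in forbidden:
        if normalize(fragment) in normalized:
            raise ValueError(f"forbidden fragment {fragment!r} in command")
    # The allowlist requires an exact match, not a prefix or substring.
    if normalized not in {normalize(entry) for entry in allowed}:
        raise ValueError(f"command is not allowlisted: {normalized}")
    return normalized


allowed = ["python -m unittest discover -v"]
forbidden = ["rm -rf", "git push", "curl | bash"]
print(check_command("python  -m unittest   discover -v", allowed, forbidden))
```

Under this scheme `python -m unittest` alone would be rejected, because it is only a prefix of the allowlisted entry.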
## `experiment`
- `label`: optional run experiment label.
- `prompt_variant`: optional prompt variant label.
## `agents`
Supported backends:
- `command`: runs a local command with the prompt on stdin.
- `ollama`: runs `ollama run <model>` with the prompt on stdin.
Command agent:
```yaml
planner:
backend: command
command: echo
system_prompt: agents/planner.md
```
Ollama agent:
```yaml
planner:
backend: ollama
model: qwen2.5-coder:14b
system_prompt: agents/planner.md
```
## `pipeline`
- `max_task_retries`: task retry limit.
- `continue_on_task_failure`: for `run --all`, continue after failed/blocked tasks.
- `stages`: ordered state-machine stages.
Command stage options:
- `commands`: command strings.
- `shell`: defaults to true. Set false for argv-style execution.
- `timeout_seconds`: per-stage timeout override.
- `working_dir`: command working directory inside project root.
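
As an illustration of how these options fit together, the sketch below runs a command without a shell, applies a timeout, and refuses a working directory that escapes the project root. The `resolve_inside` and `run_argv` helpers are hypothetical stand-ins, not NightShift's implementation, and `shlex.split` follows POSIX quoting rules.

```python
from pathlib import Path
import shlex
import subprocess
import sys


def resolve_inside(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root`, rejecting paths that escape it."""
    root = root.resolve()
    candidate = (root / relative).resolve()
    candidate.relative_to(root)  # raises ValueError when outside root
    return candidate


def run_argv(command: str, root: Path, working_dir: str = ".", timeout_seconds: int = 60):
    """Run `command` with shell=False by splitting it into an argv list."""
    argv = shlex.split(command)
    return subprocess.run(
        argv,
        cwd=resolve_inside(root, working_dir),
        shell=False,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )


# Demonstration: run the current interpreter so no extra tools are needed.
demo_command = shlex.join([sys.executable, "-c", "print(1 + 1)"])
result = run_argv(demo_command, Path.cwd())
print(result.stdout.strip())
```

With `shell=False`, shell features such as pipes and redirection in the command string are passed along as literal arguments instead of being interpreted.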


@ -1086,6 +1086,51 @@ Notes:
---
## Phase 22: Quickstart Test Project
* [ ] Add a guided quickstart project to `QUICKSTART.md`
* [ ] Recommend a small Python Lisp interpreter as the default test project
* [ ] Provide a multi-task `tasks.md` example
* [ ] Provide a matching `nightshift.yaml` example
* [ ] Provide suggested planner, implementer, and reviewer prompt files
* [ ] Include dependency examples across tasks
* [ ] Include commands for validation, `run --task`, and `run --all`
* [ ] Explain what artifacts the user should inspect after each run
Acceptance Criteria:
* A new user can create a small target repo and exercise NightShift end to end
* The project has multiple independently reviewable tasks
* Tasks are small enough for local/fake agents but realistic enough to test planning, implementation, tests, retries, artifacts, and dependencies
* The quickstart does not require external services
Recommended Project:
* A minimal Lisp interpreter in Python is a good test project because it is compact, incremental, testable, and naturally splits into parser, evaluator, environment, builtins, and error-handling tasks.
Alternative Projects:
* If the Lisp interpreter feels too language-theory focused, use a small INI/TOML-like config parser or a markdown todo CLI. Both are also compact and testable, but the Lisp interpreter gives better coverage of multi-step implementation and test generation.
---
## Phase 17-22 Implementation Status
Phases 17 through 22 are implemented.
Implemented capabilities:
* Ollama agent backend
* Experiment metadata and prompt snapshots
* Stronger command execution options
* Config reference, artifact review, and troubleshooting docs
* Read-only Flask dashboard entry point
* Complete quickstart Lisp example project
See `docs/devlog/phase17.md` through `docs/devlog/phase22.md` for implementation notes and decisions.
---
# Appendix A: Design Decisions and Rationale
## A.1 Local-first architecture

21
docs/devlog/phase17.md Normal file

@ -0,0 +1,21 @@
# Phase 17 Devlog: Local Model Backend
## Implemented
- Added first-class `backend: ollama` agent config support.
- Required `model` for Ollama agents.
- Kept `backend: command` unchanged.
- Reused the existing prompt bundle for Ollama.
- Invoked Ollama as `ollama run <model>` with prompt input on stdin.
- Persisted Ollama responses through the same agent artifact format.
- Added tests with mocked subprocess calls so Ollama is not required.
## Decisions Made
- Ollama is implemented as a local subprocess backend instead of an HTTP API wrapper.
- Missing Ollama executable returns a failed agent invocation artifact rather than crashing.
- Backend artifacts remain comparable across command and Ollama agents.
## Notes
- Real model quality and model availability are user environment concerns; tests do not require a running Ollama daemon.

18
docs/devlog/phase18.md Normal file

@ -0,0 +1,18 @@
# Phase 18 Devlog: Prompt and Pipeline Experiments
## Implemented
- Added optional `experiment.label` and `experiment.prompt_variant` config fields.
- Snapshotted agent prompt files into `runs/<run-id>/prompts/`.
- Wrote `run-metadata.md` with project, experiment, agent backend, model, command, and prompt metadata.
- Included experiment metadata in final task reports and run summaries.
- Added tests for experiment config loading and prompt/metadata artifact creation.
## Decisions Made
- Experiment metadata is descriptive only and does not alter execution semantics.
- Prompt snapshots are per-run, not per-task, because agent definitions are run-level configuration.
## Notes
- This creates enough metadata to compare prompt/backend runs from artifacts without adding a database.

20
docs/devlog/phase19.md Normal file

@ -0,0 +1,20 @@
# Phase 19 Devlog: Stronger Command Execution
## Implemented
- Added command stage `shell` option, defaulting to true for backward compatibility.
- Added command stage `timeout_seconds` override.
- Added command stage `working_dir` restricted to the project root.
- Added `safety.allowed_env` for optional environment variable pass-through.
- Added argv-style execution path when `shell: false`.
- Added tests for shell-free execution and working-directory restrictions.
## Decisions Made
- Existing string command config remains valid.
- `shell: false` still uses the same exact allowlist check before splitting into argv.
- `PATH` is preserved when an environment allowlist is configured so common executables remain discoverable.
## Notes
- Future hardening can move toward structured command definitions, but this phase avoids breaking current configs.

19
docs/devlog/phase20.md Normal file
View File

@ -0,0 +1,19 @@
# Phase 20 Devlog: Documentation and Examples Refresh
## Implemented
- Added `docs/config-reference.md`.
- Added `docs/artifact-review.md`.
- Added `docs/troubleshooting.md`.
- Added a complete `examples/quickstart-lisp/` project.
- Updated quickstart docs to point users at the example project.
## Decisions Made
- Documentation now distinguishes command and Ollama agent backends.
- The example project uses fake command agents so it can run without external services.
- The quickstart Lisp project is included as a target repo example rather than baked into NightShift runtime behavior.
## Notes
- The example is intended for pipeline testing and artifact review, not as a full Lisp implementation.

21
docs/devlog/phase21.md Normal file
View File

@ -0,0 +1,21 @@
# Phase 21 Devlog: Read-Only Web Dashboard
## Implemented
- Added `nightshift/web.py`.
- Added `nightshift web` CLI command.
- Implemented read-only artifact dashboard rendering.
- Listed runs from `.nightshift/runs/`.
- Rendered run summaries with simple auto-refresh.
- Added safe artifact reading that rejects path traversal.
- Added tests for missing runs, run listing, and artifact path handling.
## Decisions Made
- Flask is an optional dependency. The CLI gives a clear error if Flask is missing.
- The dashboard is artifact-driven and does not control pipeline execution.
- No websockets, authentication, mutation, or live process control were added.
## Notes
- This is intentionally a monitoring entry point, not an operations console.
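
One way to picture the "safe artifact reading that rejects path traversal" item is the sketch below. It is not the code in `nightshift/web.py`; the `read_artifact` helper and its choice of exceptions are assumptions made for illustration.

```python
from pathlib import Path
import tempfile


def read_artifact(runs_dir: Path, relative: str) -> str:
    """Read a file under runs_dir; refuse anything that resolves outside it."""
    base = runs_dir.resolve()
    target = (base / relative).resolve()
    if base != target and base not in target.parents:
        raise PermissionError(f"path escapes artifact directory: {relative}")
    if not target.is_file():
        raise FileNotFoundError(relative)
    return target.read_text(encoding="utf-8")


# Demonstration against a throwaway runs directory.
with tempfile.TemporaryDirectory() as tmp:
    runs = Path(tmp) / "runs"
    (runs / "run-1").mkdir(parents=True)
    (runs / "run-1" / "run-summary.md").write_text("# Run summary\n", encoding="utf-8")
    demo_text = read_artifact(runs, "run-1/run-summary.md")
    try:
        read_artifact(runs, "../outside.txt")
        demo_blocked = False
    except PermissionError:
        demo_blocked = True

print(demo_text.strip(), demo_blocked)
```

Resolving both paths before comparing them means `..` segments and absolute paths are judged by where they actually point, not by how they are spelled.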

19
docs/devlog/phase22.md Normal file
View File

@ -0,0 +1,19 @@
# Phase 22 Devlog: Quickstart Test Project
## Implemented
- Added a guided Lisp interpreter quickstart project to `QUICKSTART.md`.
- Added concrete quickstart project files under `examples/quickstart-lisp/`.
- Included multi-task `tasks.md` with dependencies.
- Included a matching `nightshift.yaml`.
- Included planner, implementer, and reviewer prompt files.
- Included an initial passing unittest smoke test.
## Decisions Made
- Kept the Lisp interpreter as the recommended test project because it is compact, incremental, and testable.
- Fake agents are used in the example so users can validate NightShift before connecting a real local model or coding agent.
## Notes
- Users can copy `examples/quickstart-lisp/` to a scratch directory and run `nightshift validate`, `nightshift status`, and `nightshift run --all`.

29
docs/troubleshooting.md Normal file
View File

@ -0,0 +1,29 @@
# Troubleshooting
## `command is not allowlisted`
Add the exact command to `safety.allowed_commands`. NightShift normalizes whitespace but otherwise expects exact matches.
## Command works in PowerShell but fails in NightShift
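Command stages run through Python's `subprocess` module. With the default `shell: true`, the command goes to the platform shell, which on Windows is usually `cmd.exe` rather than PowerShell. Prefer shell-independent commands such as `python -m ...`, or name the intended shell explicitly in the command.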
## No runnable tasks
Check `nightshift status`. A task may be blocked by dependencies listed under `Dependencies:`.
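
The blocking rule can be pictured with a small sketch. It assumes a task becomes runnable only once every task listed under its `Dependencies:` is complete; `runnable_tasks` is a hypothetical helper, not NightShift's scheduler.

```python
def runnable_tasks(dependencies: dict[str, list[str]], completed: set[str]) -> list[str]:
    """Return incomplete tasks whose dependencies are all complete, in input order."""
    return [
        task_id
        for task_id, deps in dependencies.items()
        if task_id not in completed and all(dep in completed for dep in deps)
    ]


deps = {
    "TASK-001": [],
    "TASK-002": ["TASK-001"],
    "TASK-003": ["TASK-002"],
}
print(runnable_tasks(deps, completed=set()))
print(runnable_tasks(deps, completed={"TASK-001"}))
```

In this model a dependency id that never completes, for example a misspelled one, keeps every downstream task blocked.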
## Git clean worktree failure
If `require_clean_worktree: true` is set, NightShift blocks dirty repositories before creating artifacts. Commit or stash your changes, or set `require_clean_worktree` to false.
## Ollama backend fails
The `ollama` backend requires the `ollama` executable to be installed and the configured model to be available. Tests do not require Ollama.
## Flask dashboard fails
Install Flask:
```bash
pip install flask
```


@ -0,0 +1,3 @@
You are the implementation agent.
Implement the smallest correct change and include tests.


@ -0,0 +1,7 @@
You are the planning agent. Create a small conservative plan.
Include:
- relevant files
- implementation steps
- tests
- risks


@ -0,0 +1,7 @@
You are the review agent.
Output:
status: pass | fail | retry | escalate
reason: <short explanation>
next_stage: <optional stage id>
context_update: <compact useful note>


@ -0,0 +1,4 @@
"""Tiny Lisp quickstart target.
NightShift tasks in this example are intended to fill this module in.
"""


@ -0,0 +1,68 @@
project:
  name: tiny-lisp
  root: .
  task_file: tasks.md
  artifact_dir: .nightshift

safety:
  require_clean_worktree: false
  scoped_paths:
    - .
  allowed_commands:
    - python -m unittest discover -v
  forbidden_commands:
    - rm -rf
    - git push
    - curl | bash

experiment:
  label: quickstart-lisp
  prompt_variant: fake-agent-v1

agents:
  planner:
    backend: command
    command: echo
    system_prompt: agents/planner.md

  implementer:
    backend: command
    command: echo
    system_prompt: agents/implementer.md

  reviewer:
    backend: command
    command: python -c "print('status: pass'); print('reason: quickstart reviewer accepted artifacts')"
    system_prompt: agents/reviewer.md

pipeline:
  max_task_retries: 1
  continue_on_task_failure: false
  stages:
    - id: plan
      type: agent
      agent: planner
      output: plan.md

    - id: implement
      type: agent
      agent: implementer
      output: implementation-log.md

    - id: test
      type: command
      commands:
        - python -m unittest discover -v
      output: test-output.txt
      shell: true
      timeout_seconds: 60

    - id: review
      type: agent_review
      agent: reviewer
      on_fail: implement
      output: review.md

    - id: summarize
      type: summarize
      output: final-notes.md


@ -0,0 +1,39 @@
# Tasks
- [ ] TASK-001: Parse Lisp expressions
Description:
Implement tokenization and parsing for a tiny Lisp subset.
Acceptance Criteria:
- Parses numbers
- Parses symbols
- Parses nested lists
- Raises useful errors for unbalanced parentheses
- Includes unit tests
- [ ] TASK-002: Evaluate arithmetic forms
Dependencies:
- TASK-001
Description:
Evaluate parsed arithmetic expressions.
Acceptance Criteria:
- Supports `+`, `-`, `*`, and `/`
- Evaluates nested arithmetic
- Includes unit tests
- [ ] TASK-003: Add variables and definitions
Dependencies:
- TASK-002
Description:
Add an environment and support variable lookup and definitions.
Acceptance Criteria:
- Supports symbol lookup
- Supports `(define name value)`
- Keeps environment behavior tested


@ -0,0 +1 @@
"""Quickstart Lisp test suite."""


@ -0,0 +1,10 @@
import unittest


class SmokeTests(unittest.TestCase):
    def test_smoke(self):
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()


@ -33,8 +33,8 @@ class AgentInvocation:
class AgentExecutor: class AgentExecutor:
"""Execute configured agents. """Execute configured agents.
v1 supports the `command` backend only. The command receives the prompt Supports command-backed agents and a first-class Ollama backend. Both
bundle on stdin and its stdout/stderr are persisted as the stage artifact. receive the same prompt bundle on stdin and persist comparable artifacts.
""" """
def __init__( def __init__(
@ -64,12 +64,14 @@ class AgentExecutor:
agent = self.agents.get(stage.agent) agent = self.agents.get(stage.agent)
if agent is None: if agent is None:
raise AgentError(f"Agent error: unknown agent '{stage.agent}' for stage '{stage.id}'.") raise AgentError(f"Agent error: unknown agent '{stage.agent}' for stage '{stage.id}'.")
if agent.backend != "command": if agent.backend not in {"command", "ollama"}:
raise AgentError( raise AgentError(
f"Agent error: agent '{agent.id}' uses unsupported backend '{agent.backend}'." f"Agent error: agent '{agent.id}' uses unsupported backend '{agent.backend}'."
) )
if not agent.command: if agent.backend == "command" and not agent.command:
raise AgentError(f"Agent error: command backend agent '{agent.id}' has no command.") raise AgentError(f"Agent error: command backend agent '{agent.id}' has no command.")
if agent.backend == "ollama" and not agent.model:
raise AgentError(f"Agent error: ollama backend agent '{agent.id}' has no model.")
system_prompt = self._read_system_prompt(agent) system_prompt = self._read_system_prompt(agent)
prompt = build_prompt_bundle( prompt = build_prompt_bundle(
@ -131,6 +133,13 @@ class AgentExecutor:
return self.artifacts.project_context_path.read_text(encoding="utf-8") return self.artifacts.project_context_path.read_text(encoding="utf-8")
def _invoke(self, agent: AgentConfig, prompt: str) -> AgentInvocation: def _invoke(self, agent: AgentConfig, prompt: str) -> AgentInvocation:
        if agent.backend == "ollama":
            return self._invoke_ollama(agent, prompt)
        return self._invoke_command(agent, prompt)

    def _invoke_command(self, agent: AgentConfig, prompt: str) -> AgentInvocation:
        if not agent.command:
            raise AgentError(f"Agent error: command backend agent '{agent.id}' has no command.")
started = time.monotonic() started = time.monotonic()
try: try:
completed = subprocess.run( completed = subprocess.run(
@ -165,6 +174,54 @@ class AgentExecutor:
timed_out=True, timed_out=True,
) )
    def _invoke_ollama(self, agent: AgentConfig, prompt: str) -> AgentInvocation:
        if not agent.model:
            raise AgentError(f"Agent error: ollama backend agent '{agent.id}' has no model.")
        command = f"ollama run {agent.model}"
        started = time.monotonic()
        try:
            completed = subprocess.run(
                ["ollama", "run", agent.model],
                cwd=self.project_root,
                input=prompt,
                capture_output=True,
                text=True,
                timeout=self.timeout_seconds,
            )
            duration = time.monotonic() - started
            return AgentInvocation(
                agent_id=agent.id,
                command=command,
                prompt=prompt,
                exit_code=completed.returncode,
                stdout=completed.stdout,
                stderr=completed.stderr,
                duration_seconds=duration,
            )
        except FileNotFoundError as exc:
            duration = time.monotonic() - started
            return AgentInvocation(
                agent_id=agent.id,
                command=command,
                prompt=prompt,
                exit_code=127,
                stdout="",
                stderr=str(exc),
                duration_seconds=duration,
            )
        except subprocess.TimeoutExpired as exc:
            duration = time.monotonic() - started
            return AgentInvocation(
                agent_id=agent.id,
                command=command,
                prompt=prompt,
                exit_code=-1,
                stdout=_coerce_output(exc.stdout),
                stderr=_coerce_output(exc.stderr),
                duration_seconds=duration,
                timed_out=True,
            )
def build_prompt_bundle( def build_prompt_bundle(
system_prompt: str, system_prompt: str,


@ -68,6 +68,34 @@ class ArtifactStore:
shutil.copyfile(source, self.config_snapshot_path) shutil.copyfile(source, self.config_snapshot_path)
return self.config_snapshot_path return self.config_snapshot_path
    def write_prompt_snapshots(self, prompt_paths: dict[str, Path]) -> list[Path]:
        """Copy agent prompt files into the run artifact directory."""
        self.initialize_run()
        prompts_dir = self.run_dir / "prompts"
        prompts_dir.mkdir(parents=True, exist_ok=True)
        written: list[Path] = []
        for agent_id, prompt_path in sorted(prompt_paths.items()):
            source = prompt_path.resolve()
            try:
                source.relative_to(self.project_root)
            except ValueError as exc:
                raise ArtifactError(
                    f"Artifact error: prompt path is outside project root: {source}"
                ) from exc
            if not source.exists():
                raise ArtifactError(f"Artifact error: prompt path does not exist: {source}")
            target = prompts_dir / f"{_safe_artifact_segment(agent_id, 'agent id')}.md"
            shutil.copyfile(source, target)
            written.append(target)
        return written

    def write_run_metadata(self, content: str, filename: str = "run-metadata.md") -> Path:
        self.initialize_run()
        path = self.run_dir / filename
        path.write_text(content, encoding="utf-8")
        return path
def create_task_dir(self, task_id: str) -> TaskArtifactPaths: def create_task_dir(self, task_id: str) -> TaskArtifactPaths:
"""Create the artifact directory for one task.""" """Create the artifact directory for one task."""


@ -18,6 +18,7 @@ from .tasks import (
select_task_by_id, select_task_by_id,
validate_task_dependencies, validate_task_dependencies,
) )
from .web import create_app
def build_parser() -> argparse.ArgumentParser: def build_parser() -> argparse.ArgumentParser:
@ -41,6 +42,11 @@ def build_parser() -> argparse.ArgumentParser:
status_parser = subparsers.add_parser("status", help="Inspect NightShift project status.") status_parser = subparsers.add_parser("status", help="Inspect NightShift project status.")
status_parser.add_argument("--config", default="nightshift.yaml", help="Config file to inspect.") status_parser.add_argument("--config", default="nightshift.yaml", help="Config file to inspect.")
    web_parser = subparsers.add_parser("web", help="Start a read-only artifact dashboard.")
    web_parser.add_argument("--config", default="nightshift.yaml", help="Config file to inspect.")
    web_parser.add_argument("--host", default="127.0.0.1", help="Host to bind.")
    web_parser.add_argument("--port", type=int, default=8765, help="Port to bind.")
return parser return parser
@ -101,6 +107,12 @@ def main(argv: list[str] | None = None) -> int:
print(format_status(build_status(config, tasks))) print(format_status(build_status(config, tasks)))
return 0 return 0
        if args.command == "web":
            config = validate_config(args.config)
            app = create_app(config.project.root, config.project.artifact_dir)
            app.run(host=args.host, port=args.port)
            return 0
except NightShiftError as exc: except NightShiftError as exc:
print(str(exc), file=sys.stderr) print(str(exc), file=sys.stderr)
return 1 return 1


@ -3,14 +3,16 @@
from __future__ import annotations from __future__ import annotations
from dataclasses import dataclass from dataclasses import dataclass
import os
from pathlib import Path from pathlib import Path
import shlex
import subprocess import subprocess
import time import time
from .artifacts import ArtifactStore from .artifacts import ArtifactStore
from .config import SafetyConfig, StageConfig from .config import SafetyConfig, StageConfig
from .errors import CommandError, SafetyError from .errors import CommandError, SafetyError
from .safety import ensure_command_allowed, resolve_project_root from .safety import ensure_command_allowed, resolve_inside_root, resolve_project_root
from .stages import StageResult from .stages import StageResult
@ -55,11 +57,17 @@ class CommandExecutor:
reason = "All commands passed." reason = "All commands passed."
for command in stage.commands: for command in stage.commands:
run = self.run_command(command) run = self.run_command(
command,
shell=stage.shell,
timeout_seconds=stage.timeout_seconds,
working_dir=stage.working_dir,
)
runs.append(run) runs.append(run)
if run.timed_out: if run.timed_out:
status = "fail" status = "fail"
reason = f"Command timed out after {self.timeout_seconds}s: {run.command}" timeout = stage.timeout_seconds or self.timeout_seconds
reason = f"Command timed out after {timeout}s: {run.command}"
break break
if run.exit_code != 0: if run.exit_code != 0:
status = "fail" status = "fail"
@ -79,7 +87,13 @@ class CommandExecutor:
output_path=str(output_path.relative_to(self.project_root)), output_path=str(output_path.relative_to(self.project_root)),
) )
def run_command(self, command: str) -> CommandRun: def run_command(
self,
command: str,
shell: bool = True,
timeout_seconds: int | None = None,
working_dir: Path | None = None,
) -> CommandRun:
try: try:
normalized = ensure_command_allowed( normalized = ensure_command_allowed(
command, command,
@ -89,15 +103,30 @@ class CommandExecutor:
except SafetyError as exc: except SafetyError as exc:
raise CommandError(str(exc)) from exc raise CommandError(str(exc)) from exc
        cwd = self.project_root
        if working_dir is not None:
            try:
                cwd = resolve_inside_root(self.project_root, working_dir, "command working_dir")
            except SafetyError as exc:
                raise CommandError(str(exc)) from exc
        timeout = timeout_seconds or self.timeout_seconds
        args: str | list[str] = normalized if shell else shlex.split(normalized)
        env = None
        if self.safety.allowed_env:
            env = {name: os.environ[name] for name in self.safety.allowed_env if name in os.environ}
            if "PATH" in os.environ:
                env.setdefault("PATH", os.environ["PATH"])
started = time.monotonic() started = time.monotonic()
try: try:
completed = subprocess.run( completed = subprocess.run(
normalized, args,
cwd=self.project_root, cwd=cwd,
shell=True, shell=shell,
capture_output=True, capture_output=True,
text=True, text=True,
timeout=self.timeout_seconds, timeout=timeout,
env=env,
) )
duration = time.monotonic() - started duration = time.monotonic() - started
return CommandRun( return CommandRun(


@@ -32,6 +32,7 @@ class SafetyConfig:
     scoped_paths: tuple[str, ...]
     allowed_commands: tuple[str, ...]
     forbidden_commands: tuple[str, ...]
+    allowed_env: tuple[str, ...] = ()


 @dataclass(frozen=True)
@@ -52,6 +53,15 @@ class StageConfig:
     commands: tuple[str, ...] = ()
     output: str | None = None
     on_fail: str | None = None
+    shell: bool = True
+    timeout_seconds: int | None = None
+    working_dir: Path | None = None
+
+
+@dataclass(frozen=True)
+class ExperimentConfig:
+    label: str | None = None
+    prompt_variant: str | None = None


 @dataclass(frozen=True)
@@ -68,6 +78,7 @@ class NightShiftConfig:
     safety: SafetyConfig
     agents: dict[str, AgentConfig]
     pipeline: PipelineConfig
+    experiment: ExperimentConfig = ExperimentConfig()


 AGENT_STAGE_TYPES = {"agent", "agent_review", "review"}
@@ -110,6 +121,11 @@ def validate_config(path: str | Path = "nightshift.yaml") -> NightShiftConfig:
         )

     for stage in config.pipeline.stages:
+        if stage.working_dir is not None:
+            try:
+                resolve_inside_root(root, stage.working_dir, f"stage '{stage.id}' working_dir")
+            except SafetyError as exc:
+                raise ConfigError(f"Config error: {exc}") from exc
         for command in stage.commands:
             try:
                 ensure_command_allowed(
@@ -153,6 +169,7 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
         forbidden_commands=_string_tuple(
             safety_raw.get("forbidden_commands", []), "safety.forbidden_commands"
         ),
+        allowed_env=_string_tuple(safety_raw.get("allowed_env", []), "safety.allowed_env"),
     )

     agents_raw = _require_mapping(raw["agents"], "agents")
@@ -163,25 +180,41 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
         agent_raw = _require_mapping(agent_raw_value, f"agents.{agent_id}")
         backend = _require_string(agent_raw, "backend", f"agents.{agent_id}")
         command = _optional_string(agent_raw.get("command"), f"agents.{agent_id}.command")
-        if backend != "command":
+        model = _optional_string(agent_raw.get("model"), f"agents.{agent_id}.model")
+        if backend not in {"command", "ollama"}:
             raise ConfigError(
                 f"Config error: agent '{agent_id}' uses unsupported backend '{backend}'. "
-                "Supported backends: command."
+                "Supported backends: command, ollama."
             )
-        if command is None:
+        if backend == "command" and command is None:
             raise ConfigError(
                 f"Config error: command backend agent '{agent_id}' must define command."
             )
+        if backend == "ollama" and model is None:
+            raise ConfigError(
+                f"Config error: ollama backend agent '{agent_id}' must define model."
+            )
         system_prompt = Path(_require_string(agent_raw, "system_prompt", f"agents.{agent_id}"))
         agents[str(agent_id)] = AgentConfig(
             id=str(agent_id),
             backend=backend,
             command=command,
             system_prompt=system_prompt,
-            model=_optional_string(agent_raw.get("model"), f"agents.{agent_id}.model"),
+            model=model,
             role=_optional_string(agent_raw.get("role"), f"agents.{agent_id}.role"),
         )

+    experiment_raw = raw.get("experiment", {})
+    if experiment_raw is None:
+        experiment_raw = {}
+    experiment_raw = _require_mapping(experiment_raw, "experiment")
+    experiment = ExperimentConfig(
+        label=_optional_string(experiment_raw.get("label"), "experiment.label"),
+        prompt_variant=_optional_string(
+            experiment_raw.get("prompt_variant"), "experiment.prompt_variant"
+        ),
+    )
+
     pipeline_raw = _require_mapping(raw["pipeline"], "pipeline")
     max_task_retries = _optional_int(
         pipeline_raw.get("max_task_retries", 0),
@@ -218,6 +251,13 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
         agent = _optional_string(stage_raw.get("agent"), f"{stage_context}.agent")
         commands = _string_tuple(stage_raw.get("commands", []), f"{stage_context}.commands")
+        timeout_seconds = _optional_int_or_none(
+            stage_raw.get("timeout_seconds"),
+            f"{stage_context}.timeout_seconds",
+        )
+        if timeout_seconds is not None and timeout_seconds <= 0:
+            raise ConfigError(f"Config error: {stage_context}.timeout_seconds must be greater than zero.")
+        working_dir_raw = _optional_string(stage_raw.get("working_dir"), f"{stage_context}.working_dir")

         if stage_type in AGENT_STAGE_TYPES:
             if agent is None:
@@ -244,6 +284,9 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
                 commands=commands,
                 output=_optional_string(stage_raw.get("output"), f"{stage_context}.output"),
                 on_fail=_optional_string(stage_raw.get("on_fail"), f"{stage_context}.on_fail"),
+                shell=_optional_bool(stage_raw.get("shell", True), f"{stage_context}.shell"),
+                timeout_seconds=timeout_seconds,
+                working_dir=Path(working_dir_raw) if working_dir_raw else None,
             )
         )
@@ -264,6 +307,7 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
             stages=tuple(stages),
             continue_on_task_failure=continue_on_task_failure,
         ),
+        experiment=experiment,
     )
@@ -442,6 +486,12 @@ def _optional_int(value: Any, context: str) -> int:
     return value


+def _optional_int_or_none(value: Any, context: str) -> int | None:
+    if value is None:
+        return None
+    return _optional_int(value, context)
+
+
 def _string_tuple(value: Any, context: str) -> tuple[str, ...]:
     if value is None:
         return ()


@@ -50,7 +50,12 @@ class PipelineRunner:
         self.config = config
         self.artifacts = artifacts or ArtifactStore.from_config(config)
         self.context = ContextManager(self.artifacts)
-        self.reports = ReportGenerator(config.project.root, self.artifacts)
+        self.reports = ReportGenerator(
+            config.project.root,
+            self.artifacts,
+            experiment_label=config.experiment.label,
+            prompt_variant=config.experiment.prompt_variant,
+        )
         self.agent_executor = AgentExecutor(
             config.project.root,
             config.agents,
@@ -68,6 +73,13 @@ class PipelineRunner:
         ensure_clean_worktree(self.config.project.root, self.config.safety.require_clean_worktree)
         self.artifacts.initialize_run()
         self.artifacts.write_config_snapshot(self.config.path)
+        self.artifacts.write_prompt_snapshots(
+            {
+                agent_id: self.config.project.root / agent.system_prompt
+                for agent_id, agent in self.config.agents.items()
+            }
+        )
+        self.artifacts.write_run_metadata(format_run_metadata(self.config))
         self.artifacts.write_task_snapshot(task)
         write_git_artifacts(self.artifacts, task.id, "before")
         self.context.ensure_project_context()
@@ -333,3 +345,29 @@ def format_aggregate_run_summary(results: list[PipelineResult], status: str, rea
         )
     lines.append("")
     return "\n".join(lines)
+
+
+def format_run_metadata(config: NightShiftConfig) -> str:
+    lines = [
+        "# Run Metadata",
+        "",
+        f"Project: {config.project.name}",
+        f"Experiment label: {config.experiment.label or ''}",
+        f"Prompt variant: {config.experiment.prompt_variant or ''}",
+        "",
+        "## Agents",
+        "",
+    ]
+    for agent in config.agents.values():
+        lines.extend(
+            [
+                f"### {agent.id}",
+                "",
+                f"- Backend: {agent.backend}",
+                f"- Model: {agent.model or ''}",
+                f"- Command: {agent.command or ''}",
+                f"- System prompt: {agent.system_prompt}",
+                "",
+            ]
+        )
+    return "\n".join(lines)


@@ -21,9 +21,17 @@ class TaskReport:
 class ReportGenerator:
     """Write task and run summaries from pipeline results."""

-    def __init__(self, project_root: Path, artifacts: ArtifactStore) -> None:
+    def __init__(
+        self,
+        project_root: Path,
+        artifacts: ArtifactStore,
+        experiment_label: str | None = None,
+        prompt_variant: str | None = None,
+    ) -> None:
         self.project_root = project_root
         self.artifacts = artifacts
+        self.experiment_label = experiment_label
+        self.prompt_variant = prompt_variant

     def write_reports(
         self,
@@ -51,6 +59,8 @@ class ReportGenerator:
                 modified_files=modified_files,
                 stage_results_path=stage_results_path,
                 context_out_path=context_out_path,
+                experiment_label=self.experiment_label,
+                prompt_variant=self.prompt_variant,
             ),
         )
         self.artifacts.run_summary_path.write_text(
@@ -62,6 +72,8 @@ class ReportGenerator:
                 modified_files=modified_files,
                 final_notes_path=final_notes_path,
                 stage_results_path=stage_results_path,
+                experiment_label=self.experiment_label,
+                prompt_variant=self.prompt_variant,
             ),
             encoding="utf-8",
         )
@@ -109,6 +121,8 @@ def format_task_report(
     modified_files: list[str],
     stage_results_path: Path,
     context_out_path: Path | None,
+    experiment_label: str | None = None,
+    prompt_variant: str | None = None,
 ) -> str:
     stage_lines = "\n".join(
         f"- `{result.stage_id}`: {result.status} ({result.reason})" for result in stage_results
@@ -130,6 +144,11 @@ def format_task_report(
             f"Retry count: {retry_count}",
             f"Reason: {reason}",
             "",
+            "## Experiment",
+            "",
+            f"- Label: {experiment_label or ''}",
+            f"- Prompt variant: {prompt_variant or ''}",
+            "",
             "## Acceptance Criteria",
             "",
             "\n".join(f"- {item}" for item in task.acceptance_criteria),
@@ -158,6 +177,8 @@ def format_run_summary(
     modified_files: list[str],
     final_notes_path: Path,
     stage_results_path: Path,
+    experiment_label: str | None = None,
+    prompt_variant: str | None = None,
 ) -> str:
     modified = "\n".join(f"- `{path}`" for path in modified_files) if modified_files else "- Unavailable or none detected"
     return "\n".join(
@@ -168,6 +189,8 @@ def format_run_summary(
             f"- Status: {status}",
             f"- Retry count: {retry_count}",
             f"- Reason: {reason}",
+            f"- Experiment label: {experiment_label or ''}",
+            f"- Prompt variant: {prompt_variant or ''}",
             "",
             "## Modified Files",
             "",

nightshift/web.py (new file, 81 lines)

@@ -0,0 +1,81 @@
+"""Read-only web dashboard for NightShift artifacts."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from html import escape
+from pathlib import Path
+
+from .errors import NightShiftError
+
+
+@dataclass(frozen=True)
+class RunInfo:
+    name: str
+    path: Path
+    summary: str
+
+
+def list_runs(artifact_dir: str | Path) -> list[RunInfo]:
+    runs_dir = Path(artifact_dir) / "runs"
+    if not runs_dir.exists():
+        return []
+    runs: list[RunInfo] = []
+    for path in sorted((item for item in runs_dir.iterdir() if item.is_dir()), reverse=True):
+        summary_path = path / "run-summary.md"
+        summary = summary_path.read_text(encoding="utf-8") if summary_path.exists() else "No run summary yet."
+        runs.append(RunInfo(name=path.name, path=path, summary=summary))
+    return runs
+
+
+def read_artifact(run_path: Path, relative_path: str) -> str:
+    candidate = (run_path / relative_path).resolve()
+    try:
+        candidate.relative_to(run_path.resolve())
+    except ValueError:
+        return "Artifact path escapes run directory."
+    if not candidate.exists() or not candidate.is_file():
+        return "Artifact not found."
+    return candidate.read_text(encoding="utf-8", errors="replace")
+
+
+def render_dashboard(artifact_dir: str | Path) -> str:
+    runs = list_runs(artifact_dir)
+    body = ["<h1>NightShift Dashboard</h1>", '<meta http-equiv="refresh" content="5">']
+    if not runs:
+        body.append("<p>No runs found.</p>")
+    for run in runs:
+        body.extend(
+            [
+                f"<section><h2>{escape(run.name)}</h2>",
+                "<pre>",
+                escape(run.summary),
+                "</pre>",
+                "</section>",
+            ]
+        )
+    return "\n".join(["<!doctype html>", "<html><body>", *body, "</body></html>"])
+
+
+def create_app(project_root: str | Path = ".", artifact_dir: str | Path = ".nightshift"):
+    try:
+        from flask import Flask, Response
+    except ModuleNotFoundError as exc:
+        raise NightShiftError(
+            "Web dashboard requires Flask. Install it with `pip install flask`."
+        ) from exc
+
+    root = Path(project_root).resolve()
+    artifacts = root / artifact_dir
+    app = Flask(__name__)
+
+    @app.get("/")
+    def index():
+        return Response(render_dashboard(artifacts), mimetype="text/html")
+
+    @app.get("/runs/<run_id>/<path:artifact_path>")
+    def artifact(run_id: str, artifact_path: str):
+        content = read_artifact(artifacts / "runs" / run_id, artifact_path)
+        return Response(f"<pre>{escape(content)}</pre>", mimetype="text/html")
+
+    return app


@@ -1,6 +1,7 @@
 from pathlib import Path
 import tempfile
 import unittest
+from unittest.mock import patch

 from nightshift.agents import AgentExecutor, build_prompt_bundle, parse_review_output
 from nightshift.artifacts import ArtifactStore
@@ -94,6 +95,42 @@ class AgentExecutorTests(unittest.TestCase):
         self.assertEqual(next_stage, "implement")
         self.assertEqual(context_update, "Fix tests")

+    def test_ollama_agent_invocation_uses_model_without_real_ollama(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            prompt_path = root / "planner.md"
+            prompt_path.write_text("Plan carefully.", encoding="utf-8")
+            artifacts = ArtifactStore(root, ".nightshift", run_id="test-run")
+            executor = AgentExecutor(
+                root,
+                {
+                    "planner": AgentConfig(
+                        id="planner",
+                        backend="ollama",
+                        command=None,
+                        model="tiny-model",
+                        system_prompt=Path("planner.md"),
+                    )
+                },
+                artifacts,
+            )
+            task = parse_tasks(TASK_MD)[0]
+            stage = StageConfig(id="plan", type="agent", agent="planner", output="plan.md")
+            completed = type(
+                "Completed",
+                (),
+                {"returncode": 0, "stdout": "ollama output", "stderr": ""},
+            )()
+
+            with patch("nightshift.agents.subprocess.run", return_value=completed) as run:
+                result = executor.run_stage(stage, task)
+
+            self.assertEqual(result.status, "pass")
+            run.assert_called_once()
+            self.assertEqual(run.call_args.args[0], ["ollama", "run", "tiny-model"])
+            output = (root / result.output_path).read_text(encoding="utf-8")
+            self.assertIn("ollama run tiny-model", output)
+

 if __name__ == "__main__":
     unittest.main()


@@ -119,6 +119,37 @@ class CommandExecutorTests(unittest.TestCase):
             output = (root / result.output_path).read_text(encoding="utf-8")
             self.assertIn("Timed out: true", output)

+    def test_command_stage_can_run_without_shell_and_with_working_dir(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            work = root / "work"
+            work.mkdir()
+            command = 'python -c "import pathlib; print(pathlib.Path.cwd().name)"'
+            executor = CommandExecutor(
+                root,
+                SafetyConfig(
+                    require_clean_worktree=False,
+                    scoped_paths=(".",),
+                    allowed_commands=(command,),
+                    forbidden_commands=("rm -rf",),
+                ),
+                ArtifactStore(root, ".nightshift", run_id="test-run"),
+            )
+            stage = StageConfig(
+                id="test",
+                type="command",
+                commands=(command,),
+                output="test-output.txt",
+                shell=False,
+                working_dir=Path("work"),
+            )
+
+            result = executor.run_stage(stage, "TASK-001")
+
+            self.assertEqual(result.status, "pass")
+            output = (root / result.output_path).read_text(encoding="utf-8")
+            self.assertIn("work", output)
+

 if __name__ == "__main__":
     unittest.main()


@@ -128,6 +128,66 @@ class ConfigTests(unittest.TestCase):
             with self.assertRaisesRegex(ConfigError, "must define command"):
                 load_config(config_path)

+    def test_ollama_backend_requires_model(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            init_project(root)
+            config_path = root / "nightshift.yaml"
+            config_path.write_text(
+                config_path.read_text(encoding="utf-8").replace(
+                    "backend: command\n command: echo",
+                    "backend: ollama",
+                    1,
+                ),
+                encoding="utf-8",
+            )
+
+            with self.assertRaisesRegex(ConfigError, "must define model"):
+                load_config(config_path)
+
+    def test_ollama_backend_and_experiment_metadata_load(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            init_project(root)
+            config_path = root / "nightshift.yaml"
+            text = config_path.read_text(encoding="utf-8").replace(
+                "backend: command\n command: echo",
+                "backend: ollama\n model: qwen2.5-coder:14b",
+                1,
+            )
+            text = text.replace(
+                "agents:",
+                "experiment:\n label: local-test\n prompt_variant: v1\n\nagents:",
+            )
+            config_path.write_text(text, encoding="utf-8")
+
+            config = load_config(config_path)
+
+            self.assertEqual(config.agents["planner"].backend, "ollama")
+            self.assertEqual(config.agents["planner"].model, "qwen2.5-coder:14b")
+            self.assertEqual(config.experiment.label, "local-test")
+
+    def test_command_stage_options_load(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            init_project(root)
+            config_path = root / "nightshift.yaml"
+            config_path.write_text(
+                config_path.read_text(encoding="utf-8").replace(
+                    " output: test-output.txt",
+                    " output: test-output.txt\n shell: false\n timeout_seconds: 30\n working_dir: .",
+                    1,
+                ),
+                encoding="utf-8",
+            )
+
+            config = load_config(config_path)
+
+            test_stage = next(stage for stage in config.pipeline.stages if stage.id == "test")
+            self.assertFalse(test_stage.shell)
+            self.assertEqual(test_stage.timeout_seconds, 30)
+            self.assertEqual(test_stage.working_dir, Path("."))
+
     def test_non_command_stage_cannot_define_commands(self) -> None:
         with tempfile.TemporaryDirectory() as directory:
             root = Path(directory)


@@ -93,6 +93,8 @@ class PipelineRunnerTests(unittest.TestCase):
             self.assertEqual(result.retry_count, 0)
             self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id / "plan.md").exists())
             self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id / "stage-results.md").exists())
+            self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "prompts" / "planner.md").exists())
+            self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "run-metadata.md").exists())
             self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id / "context.md").exists())
             self.assertTrue((root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id / "context-out.md").exists())
             self.assertIn(

tests/test_web.py (new file, 33 lines)

@@ -0,0 +1,33 @@
+from pathlib import Path
+import tempfile
+import unittest
+
+from nightshift.artifacts import ArtifactStore
+from nightshift.web import list_runs, read_artifact, render_dashboard
+
+
+class WebDashboardTests(unittest.TestCase):
+    def test_render_dashboard_handles_missing_runs(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            html = render_dashboard(Path(directory) / ".nightshift")
+
+        self.assertIn("No runs found", html)
+
+    def test_lists_runs_and_reads_artifacts_safely(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            artifacts = ArtifactStore(root, ".nightshift", run_id="test-run")
+            artifacts.initialize_run()
+            artifacts.run_summary_path.write_text("# Summary\n\nok", encoding="utf-8")
+
+            runs = list_runs(root / ".nightshift")
+            content = read_artifact(root / ".nightshift" / "runs" / "test-run", "run-summary.md")
+            escaped = read_artifact(root / ".nightshift" / "runs" / "test-run", "../project-context.md")
+
+        self.assertEqual(len(runs), 1)
+        self.assertIn("ok", content)
+        self.assertIn("escapes", escaped)
+
+
+if __name__ == "__main__":
+    unittest.main()