Repo Lookup, Request Context, Planner, Context Stage, QoL improvements

- Added operational run logging via nightshift/runlog.py.
  - CLI now streams progress during run / run --all.
  - Runs write .nightshift/runs/<run-id>/run.log and aggregate .nightshift/nightshift.log.
  - Web dashboard now shows the last 100 run log lines.
  - Added agent temperature config.
  - Added minimal openai_compatible backend and temperature passing for it.
  - Added Ollama temperature handling.
  - Added scoped repo lookup tools in nightshift/repo_tools.py: list_files, read_file, grep.
  - Planner agents can request lookup context with lookup_requests; NightShift saves files-inspected.md and reruns the planner with retrieved context.
  - Added repo_context stage type that writes context-pack.md.
  - Marked phases 23-27 complete in docs/design.md:990.
This commit is contained in:
K. Hodges 2026-05-17 09:56:28 -07:00
parent 86aa7dd13c
commit 646c655314
15 changed files with 1042 additions and 45 deletions

View File

@ -989,18 +989,18 @@ NightShift should make active runs easier to observe from both the CLI and the w
Implementation tasks: Implementation tasks:
* [ ] Add a small logging module with structured operational events. * [x] Add a small logging module with structured operational events.
* [ ] Stream human-readable progress to the CLI during `run` and `run --all`. * [x] Stream human-readable progress to the CLI during `run` and `run --all`.
* [ ] Include run id, task id, stage id, agent/backend, command index, retry count, status, duration, and artifact path where available. * [x] Include run id, task id, stage id, agent/backend, command index, retry count, status, duration, and artifact path where available.
* [ ] Write a per-run log file such as `.nightshift/runs/<run-id>/run.log`. * [x] Write a per-run log file such as `.nightshift/runs/<run-id>/run.log`.
* [ ] Optionally write or rotate an aggregate `.nightshift/nightshift.log` for cross-run troubleshooting. * [x] Optionally write or rotate an aggregate `.nightshift/nightshift.log` for cross-run troubleshooting.
* [ ] Keep logs operational; do not duplicate full prompts, full model responses, or full command output that already lives in artifacts. * [x] Keep logs operational; do not duplicate full prompts, full model responses, or full command output that already lives in artifacts.
* [ ] Redact or avoid secrets from logged environment/config values. * [x] Redact or avoid secrets from logged environment/config values.
* [ ] Add dashboard support for viewing the latest log tail. * [x] Add dashboard support for viewing the latest log tail.
* [ ] Cap the dashboard log view to the last 100 lines by default. * [x] Cap the dashboard log view to the last 100 lines by default.
* [ ] Keep the full per-run log file available as an artifact unless a later size cap is configured. * [x] Keep the full per-run log file available as an artifact unless a later size cap is configured.
* [ ] Auto-refresh the dashboard log view with the existing dashboard refresh model. * [x] Auto-refresh the dashboard log view with the existing dashboard refresh model.
* [ ] Add tests for log writing, CLI progress hooks, dashboard log rendering, missing log files, and the 100-line cap. * [x] Add tests for log writing, CLI progress hooks, dashboard log rendering, missing log files, and the 100-line cap.
Acceptance Criteria: Acceptance Criteria:
@ -1019,35 +1019,35 @@ Notes:
## Phase 24: Per-Agent Model Parameters ## Phase 24: Per-Agent Model Parameters
- [ ] Add `temperature` to agent config. - [x] Add `temperature` to agent config.
- [ ] Pass temperature to Ollama/OpenAI-compatible backends. - [x] Pass temperature to Ollama/OpenAI-compatible backends.
- [ ] Default safely if omitted. - [x] Default safely if omitted.
- [ ] Add config validation tests. - [x] Add config validation tests.
## Phase 25: Repo Lookup Tools MVP ## Phase 25: Repo Lookup Tools MVP
- [ ] Add tool interface for repo operations. - [x] Add tool interface for repo operations.
- [ ] Implement scoped `list_files`. - [x] Implement scoped `list_files`.
- [ ] Implement scoped `read_file`. - [x] Implement scoped `read_file`.
- [ ] Implement scoped `grep`. - [x] Implement scoped `grep`.
- [ ] Enforce existing path safety rules. - [x] Enforce existing path safety rules.
- [ ] Log tool calls as artifacts. - [x] Log tool calls as artifacts.
## Phase 26: Planner Code-Discovery Support ## Phase 26: Planner Code-Discovery Support
- [ ] Teach planner prompt to request needed code context. - [x] Teach planner prompt to request needed code context.
- [ ] Add structured planner output for lookup requests. - [x] Add structured planner output for lookup requests.
- [ ] Execute requested lookup tools. - [x] Execute requested lookup tools.
- [ ] Save `files-inspected.md`. - [x] Save `files-inspected.md`.
- [ ] Re-run planner with retrieved context. - [x] Re-run planner with retrieved context.
## Phase 27: Context Pack Builder ## Phase 27: Context Pack Builder
- [ ] Add `repo_context` stage. - [x] Add `repo_context` stage.
- [ ] Generate `context-pack.md`. - [x] Generate `context-pack.md`.
- [ ] Include task, acceptance criteria, relevant files, snippets, and constraints. - [x] Include task, acceptance criteria, relevant files, snippets, and constraints.
- [ ] Add line-numbered excerpts. - [x] Add line-numbered excerpts.
- [ ] Add context-size caps. - [x] Add context-size caps.
## Phase 28: Project Context Chart MVP ## Phase 28: Project Context Chart MVP

View File

@ -3,13 +3,18 @@
from __future__ import annotations from __future__ import annotations
from dataclasses import dataclass from dataclasses import dataclass
import json
import os
from pathlib import Path from pathlib import Path
import subprocess import subprocess
import time import time
from urllib import request
from urllib.error import URLError
from .artifacts import ArtifactStore from .artifacts import ArtifactStore
from .config import AgentConfig, StageConfig from .config import AgentConfig, StageConfig
from .errors import AgentError, SafetyError from .errors import AgentError, SafetyError
from .runlog import NullRunLogger, RunLogger
from .safety import resolve_inside_root, resolve_project_root from .safety import resolve_inside_root, resolve_project_root
from .stages import StageResult, StageStatus from .stages import StageResult, StageStatus
from .tasks import Task from .tasks import Task
@ -43,11 +48,13 @@ class AgentExecutor:
agents: dict[str, AgentConfig], agents: dict[str, AgentConfig],
artifacts: ArtifactStore, artifacts: ArtifactStore,
timeout_seconds: int = DEFAULT_AGENT_TIMEOUT_SECONDS, timeout_seconds: int = DEFAULT_AGENT_TIMEOUT_SECONDS,
logger: RunLogger | None = None,
) -> None: ) -> None:
self.project_root = resolve_project_root(project_root) self.project_root = resolve_project_root(project_root)
self.agents = agents self.agents = agents
self.artifacts = artifacts self.artifacts = artifacts
self.timeout_seconds = timeout_seconds self.timeout_seconds = timeout_seconds
self.logger = logger or NullRunLogger()
def run_stage( def run_stage(
self, self,
@ -64,7 +71,7 @@ class AgentExecutor:
agent = self.agents.get(stage.agent) agent = self.agents.get(stage.agent)
if agent is None: if agent is None:
raise AgentError(f"Agent error: unknown agent '{stage.agent}' for stage '{stage.id}'.") raise AgentError(f"Agent error: unknown agent '{stage.agent}' for stage '{stage.id}'.")
if agent.backend not in {"command", "ollama"}: if agent.backend not in {"command", "ollama", "openai_compatible"}:
raise AgentError( raise AgentError(
f"Agent error: agent '{agent.id}' uses unsupported backend '{agent.backend}'." f"Agent error: agent '{agent.id}' uses unsupported backend '{agent.backend}'."
) )
@ -72,6 +79,10 @@ class AgentExecutor:
raise AgentError(f"Agent error: command backend agent '{agent.id}' has no command.") raise AgentError(f"Agent error: command backend agent '{agent.id}' has no command.")
if agent.backend == "ollama" and not agent.model: if agent.backend == "ollama" and not agent.model:
raise AgentError(f"Agent error: ollama backend agent '{agent.id}' has no model.") raise AgentError(f"Agent error: ollama backend agent '{agent.id}' has no model.")
if agent.backend == "openai_compatible" and not agent.model:
raise AgentError(f"Agent error: openai_compatible backend agent '{agent.id}' has no model.")
if agent.backend == "openai_compatible" and not agent.base_url:
raise AgentError(f"Agent error: openai_compatible backend agent '{agent.id}' has no base_url.")
system_prompt = self._read_system_prompt(agent) system_prompt = self._read_system_prompt(agent)
prompt = build_prompt_bundle( prompt = build_prompt_bundle(
@ -84,10 +95,37 @@ class AgentExecutor:
retry_notes=retry_notes or [], retry_notes=retry_notes or [],
retry_context=retry_context, retry_context=retry_context,
) )
self.logger.event(
"agent.start",
"Starting agent",
stage_id=stage.id,
agent_id=agent.id,
backend=agent.backend,
model=agent.model,
temperature=agent.temperature,
)
invocation = self._invoke(agent, prompt) invocation = self._invoke(agent, prompt)
self.logger.event(
"agent.finish",
"Finished agent",
stage_id=stage.id,
agent_id=agent.id,
backend=agent.backend,
exit_code=invocation.exit_code,
duration=f"{invocation.duration_seconds:.3f}s",
timed_out=str(invocation.timed_out).lower(),
)
output_filename = stage.output or f"{stage.id}.md" output_filename = stage.output or f"{stage.id}.md"
output = format_agent_invocation(stage.id, invocation) output = format_agent_invocation(stage.id, invocation)
output_path = self.artifacts.write_stage_output(task.id, output_filename, output) output_path = self.artifacts.write_stage_output(task.id, output_filename, output)
self.logger.event(
"artifact.write",
"Wrote agent artifact",
stage_id=stage.id,
task_id=task.id,
agent_id=agent.id,
artifact_path=output_path.relative_to(self.project_root),
)
if invocation.timed_out: if invocation.timed_out:
status: StageStatus = "fail" status: StageStatus = "fail"
@ -135,6 +173,8 @@ class AgentExecutor:
def _invoke(self, agent: AgentConfig, prompt: str) -> AgentInvocation: def _invoke(self, agent: AgentConfig, prompt: str) -> AgentInvocation:
if agent.backend == "ollama": if agent.backend == "ollama":
return self._invoke_ollama(agent, prompt) return self._invoke_ollama(agent, prompt)
if agent.backend == "openai_compatible":
return self._invoke_openai_compatible(agent, prompt)
return self._invoke_command(agent, prompt) return self._invoke_command(agent, prompt)
def _invoke_command(self, agent: AgentConfig, prompt: str) -> AgentInvocation: def _invoke_command(self, agent: AgentConfig, prompt: str) -> AgentInvocation:
@ -180,12 +220,15 @@ class AgentExecutor:
if not agent.model: if not agent.model:
raise AgentError(f"Agent error: ollama backend agent '{agent.id}' has no model.") raise AgentError(f"Agent error: ollama backend agent '{agent.id}' has no model.")
command = f"ollama run {agent.model}" command = f"ollama run {agent.model}"
prompt_input = prompt
if agent.temperature is not None:
prompt_input = f"/set parameter temperature {agent.temperature}\n{prompt}"
started = time.monotonic() started = time.monotonic()
try: try:
completed = subprocess.run( completed = subprocess.run(
["ollama", "run", agent.model], ["ollama", "run", agent.model],
cwd=self.project_root, cwd=self.project_root,
input=prompt, input=prompt_input,
capture_output=True, capture_output=True,
text=True, text=True,
encoding="utf-8", encoding="utf-8",
@ -196,7 +239,7 @@ class AgentExecutor:
return AgentInvocation( return AgentInvocation(
agent_id=agent.id, agent_id=agent.id,
command=command, command=command,
prompt=prompt, prompt=prompt_input,
exit_code=completed.returncode, exit_code=completed.returncode,
stdout=_coerce_output(completed.stdout), stdout=_coerce_output(completed.stdout),
stderr=_coerce_output(completed.stderr), stderr=_coerce_output(completed.stderr),
@ -207,7 +250,7 @@ class AgentExecutor:
return AgentInvocation( return AgentInvocation(
agent_id=agent.id, agent_id=agent.id,
command=command, command=command,
prompt=prompt, prompt=prompt_input,
exit_code=127, exit_code=127,
stdout="", stdout="",
stderr=str(exc), stderr=str(exc),
@ -218,7 +261,7 @@ class AgentExecutor:
return AgentInvocation( return AgentInvocation(
agent_id=agent.id, agent_id=agent.id,
command=command, command=command,
prompt=prompt, prompt=prompt_input,
exit_code=-1, exit_code=-1,
stdout=_coerce_output(exc.stdout), stdout=_coerce_output(exc.stdout),
stderr=_coerce_output(exc.stderr), stderr=_coerce_output(exc.stderr),
@ -226,6 +269,63 @@ class AgentExecutor:
timed_out=True, timed_out=True,
) )
def _invoke_openai_compatible(self, agent: AgentConfig, prompt: str) -> AgentInvocation:
if not agent.model or not agent.base_url:
raise AgentError(f"Agent error: openai_compatible backend agent '{agent.id}' is incomplete.")
url = agent.base_url.rstrip("/") + "/chat/completions"
command = f"POST {url}"
body: dict[str, object] = {
"model": agent.model,
"messages": [{"role": "user", "content": prompt}],
}
if agent.temperature is not None:
body["temperature"] = agent.temperature
headers = {"Content-Type": "application/json"}
api_key_env = agent.api_key_env or "OPENAI_API_KEY"
api_key = os.environ.get(api_key_env)
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
started = time.monotonic()
try:
payload = json.dumps(body).encode("utf-8")
req = request.Request(url, data=payload, headers=headers, method="POST")
with request.urlopen(req, timeout=self.timeout_seconds) as response:
raw = response.read().decode("utf-8", errors="replace")
duration = time.monotonic() - started
return AgentInvocation(
agent_id=agent.id,
command=command,
prompt=prompt,
exit_code=0,
stdout=_extract_openai_content(raw),
stderr="",
duration_seconds=duration,
)
except TimeoutError:
duration = time.monotonic() - started
return AgentInvocation(
agent_id=agent.id,
command=command,
prompt=prompt,
exit_code=-1,
stdout="",
stderr="Request timed out.",
duration_seconds=duration,
timed_out=True,
)
except (OSError, URLError) as exc:
duration = time.monotonic() - started
return AgentInvocation(
agent_id=agent.id,
command=command,
prompt=prompt,
exit_code=1,
stdout="",
stderr=str(exc),
duration_seconds=duration,
)
def build_prompt_bundle( def build_prompt_bundle(
system_prompt: str, system_prompt: str,
@ -294,6 +394,20 @@ def _coerce_output(value: str | bytes | None) -> str:
return value return value
def _extract_openai_content(raw: str) -> str:
try:
data = json.loads(raw)
choices = data.get("choices", [])
if choices:
message = choices[0].get("message", {})
content = message.get("content")
if isinstance(content, str):
return content
except (json.JSONDecodeError, AttributeError):
pass
return raw
def output_contract_for(stage: StageConfig) -> str: def output_contract_for(stage: StageConfig) -> str:
if stage.type in {"agent_review", "review"}: if stage.type in {"agent_review", "review"}:
return "\n".join( return "\n".join(
@ -305,6 +419,20 @@ def output_contract_for(stage: StageConfig) -> str:
"context_update: <compact useful note>", "context_update: <compact useful note>",
] ]
) )
if stage.type == "agent" and ("plan" in stage.id.lower() or stage.agent == "planner"):
return "\n".join(
[
"Write the requested stage output in concise markdown.",
"",
"If you need repository context before finalizing the plan, include:",
"lookup_requests:",
"- tool: list_files | read_file | grep",
" path: <relative path>",
" pattern: <glob for list_files or regex for grep>",
"",
"NightShift will run these read-only lookup tools, save files-inspected.md, and re-run this planner stage with the retrieved context.",
]
)
return "Write the requested stage output in concise markdown." return "Write the requested stage output in concise markdown."

View File

@ -39,6 +39,8 @@ class ArtifactStore:
self.project_context_path = self.artifact_root / "project-context.md" self.project_context_path = self.artifact_root / "project-context.md"
self.run_summary_path = self.run_dir / "run-summary.md" self.run_summary_path = self.run_dir / "run-summary.md"
self.config_snapshot_path = self.run_dir / "config.snapshot.yaml" self.config_snapshot_path = self.run_dir / "config.snapshot.yaml"
self.run_log_path = self.run_dir / "run.log"
self.aggregate_log_path = self.artifact_root / "nightshift.log"
@classmethod @classmethod
def from_config(cls, config: NightShiftConfig, run_id: str | None = None) -> "ArtifactStore": def from_config(cls, config: NightShiftConfig, run_id: str | None = None) -> "ArtifactStore":

View File

@ -10,6 +10,7 @@ from .config import validate_config
from .errors import NightShiftError from .errors import NightShiftError
from .init import init_project from .init import init_project
from .pipeline import PipelineRunner from .pipeline import PipelineRunner
from .runlog import RunLogger
from .status import build_status, format_status from .status import build_status, format_status
from .tasks import ( from .tasks import (
ensure_dependencies_satisfied, ensure_dependencies_satisfied,
@ -80,7 +81,7 @@ def main(argv: list[str] | None = None) -> int:
validate_task_dependencies(tasks) validate_task_dependencies(tasks)
if args.all and args.task: if args.all and args.task:
parser.error("run accepts either --all or --task, not both.") parser.error("run accepts either --all or --task, not both.")
runner = PipelineRunner(config) runner = PipelineRunner(config, logger=RunLogger(console=print))
if args.all: if args.all:
selected = [task for task in tasks if not task.completed] selected = [task for task in tasks if not task.completed]
result = runner.run_tasks(selected) result = runner.run_tasks(selected)

View File

@ -12,6 +12,7 @@ import time
from .artifacts import ArtifactStore from .artifacts import ArtifactStore
from .config import SafetyConfig, StageConfig from .config import SafetyConfig, StageConfig
from .errors import CommandError, SafetyError from .errors import CommandError, SafetyError
from .runlog import NullRunLogger, RunLogger
from .safety import ensure_command_allowed, resolve_inside_root, resolve_project_root from .safety import ensure_command_allowed, resolve_inside_root, resolve_project_root
from .stages import StageResult from .stages import StageResult
@ -38,11 +39,13 @@ class CommandExecutor:
safety: SafetyConfig, safety: SafetyConfig,
artifacts: ArtifactStore, artifacts: ArtifactStore,
timeout_seconds: int = DEFAULT_COMMAND_TIMEOUT_SECONDS, timeout_seconds: int = DEFAULT_COMMAND_TIMEOUT_SECONDS,
logger: RunLogger | None = None,
) -> None: ) -> None:
self.project_root = resolve_project_root(project_root) self.project_root = resolve_project_root(project_root)
self.safety = safety self.safety = safety
self.artifacts = artifacts self.artifacts = artifacts
self.timeout_seconds = timeout_seconds self.timeout_seconds = timeout_seconds
self.logger = logger or NullRunLogger()
def run_stage(self, stage: StageConfig, task_id: str) -> StageResult: def run_stage(self, stage: StageConfig, task_id: str) -> StageResult:
if stage.type != "command": if stage.type != "command":
@ -56,7 +59,14 @@ class CommandExecutor:
status = "pass" status = "pass"
reason = "All commands passed." reason = "All commands passed."
for command in stage.commands: for index, command in enumerate(stage.commands, start=1):
self.logger.event(
"command.start",
"Starting command",
stage_id=stage.id,
command_index=index,
command=command,
)
run = self.run_command( run = self.run_command(
command, command,
shell=stage.shell, shell=stage.shell,
@ -64,6 +74,15 @@ class CommandExecutor:
working_dir=stage.working_dir, working_dir=stage.working_dir,
) )
runs.append(run) runs.append(run)
self.logger.event(
"command.finish",
"Finished command",
stage_id=stage.id,
command_index=index,
exit_code=run.exit_code,
duration=f"{run.duration_seconds:.3f}s",
timed_out=str(run.timed_out).lower(),
)
if run.timed_out: if run.timed_out:
status = "fail" status = "fail"
timeout = stage.timeout_seconds or self.timeout_seconds timeout = stage.timeout_seconds or self.timeout_seconds
@ -80,6 +99,13 @@ class CommandExecutor:
output_filename, output_filename,
format_command_runs(stage.id, runs), format_command_runs(stage.id, runs),
) )
self.logger.event(
"artifact.write",
"Wrote command artifact",
stage_id=stage.id,
task_id=task_id,
artifact_path=output_path.relative_to(self.project_root),
)
return StageResult( return StageResult(
stage_id=stage.id, stage_id=stage.id,
status=status, # type: ignore[arg-type] status=status, # type: ignore[arg-type]

View File

@ -43,6 +43,9 @@ class AgentConfig:
system_prompt: Path system_prompt: Path
model: str | None = None model: str | None = None
role: str | None = None role: str | None = None
temperature: float | None = None
base_url: str | None = None
api_key_env: str | None = None
@dataclass(frozen=True) @dataclass(frozen=True)
@ -83,7 +86,7 @@ class NightShiftConfig:
AGENT_STAGE_TYPES = {"agent", "agent_review", "review"} AGENT_STAGE_TYPES = {"agent", "agent_review", "review"}
COMMAND_STAGE_TYPES = {"command"} COMMAND_STAGE_TYPES = {"command"}
SUPPORTED_STAGE_TYPES = AGENT_STAGE_TYPES | COMMAND_STAGE_TYPES | {"summarize"} SUPPORTED_STAGE_TYPES = AGENT_STAGE_TYPES | COMMAND_STAGE_TYPES | {"repo_context", "summarize"}
def load_config(path: str | Path = "nightshift.yaml") -> NightShiftConfig: def load_config(path: str | Path = "nightshift.yaml") -> NightShiftConfig:
@ -181,10 +184,20 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
backend = _require_string(agent_raw, "backend", f"agents.{agent_id}") backend = _require_string(agent_raw, "backend", f"agents.{agent_id}")
command = _optional_string(agent_raw.get("command"), f"agents.{agent_id}.command") command = _optional_string(agent_raw.get("command"), f"agents.{agent_id}.command")
model = _optional_string(agent_raw.get("model"), f"agents.{agent_id}.model") model = _optional_string(agent_raw.get("model"), f"agents.{agent_id}.model")
if backend not in {"command", "ollama"}: base_url = _optional_string(agent_raw.get("base_url"), f"agents.{agent_id}.base_url")
api_key_env = _optional_string(agent_raw.get("api_key_env"), f"agents.{agent_id}.api_key_env")
temperature = _optional_float_or_none(
agent_raw.get("temperature"),
f"agents.{agent_id}.temperature",
)
if temperature is not None and temperature < 0:
raise ConfigError(
f"Config error: agents.{agent_id}.temperature must be zero or greater."
)
if backend not in {"command", "ollama", "openai_compatible"}:
raise ConfigError( raise ConfigError(
f"Config error: agent '{agent_id}' uses unsupported backend '{backend}'. " f"Config error: agent '{agent_id}' uses unsupported backend '{backend}'. "
"Supported backends: command, ollama." "Supported backends: command, ollama, openai_compatible."
) )
if backend == "command" and command is None: if backend == "command" and command is None:
raise ConfigError( raise ConfigError(
@ -194,6 +207,14 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
raise ConfigError( raise ConfigError(
f"Config error: ollama backend agent '{agent_id}' must define model." f"Config error: ollama backend agent '{agent_id}' must define model."
) )
if backend == "openai_compatible" and model is None:
raise ConfigError(
f"Config error: openai_compatible backend agent '{agent_id}' must define model."
)
if backend == "openai_compatible" and base_url is None:
raise ConfigError(
f"Config error: openai_compatible backend agent '{agent_id}' must define base_url."
)
system_prompt = Path(_require_string(agent_raw, "system_prompt", f"agents.{agent_id}")) system_prompt = Path(_require_string(agent_raw, "system_prompt", f"agents.{agent_id}"))
agents[str(agent_id)] = AgentConfig( agents[str(agent_id)] = AgentConfig(
id=str(agent_id), id=str(agent_id),
@ -202,6 +223,9 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
system_prompt=system_prompt, system_prompt=system_prompt,
model=model, model=model,
role=_optional_string(agent_raw.get("role"), f"agents.{agent_id}.role"), role=_optional_string(agent_raw.get("role"), f"agents.{agent_id}.role"),
temperature=temperature,
base_url=base_url,
api_key_env=api_key_env,
) )
experiment_raw = raw.get("experiment", {}) experiment_raw = raw.get("experiment", {})
@ -444,6 +468,8 @@ def _parse_scalar(value: str) -> Any:
return None return None
if re.fullmatch(r"-?\d+", value): if re.fullmatch(r"-?\d+", value):
return int(value) return int(value)
if re.fullmatch(r"-?(\d+\.\d*|\d*\.\d+)", value):
return float(value)
if (value.startswith('"') and value.endswith('"')) or ( if (value.startswith('"') and value.endswith('"')) or (
value.startswith("'") and value.endswith("'") value.startswith("'") and value.endswith("'")
): ):
@ -492,6 +518,14 @@ def _optional_int_or_none(value: Any, context: str) -> int | None:
return _optional_int(value, context) return _optional_int(value, context)
def _optional_float_or_none(value: Any, context: str) -> float | None:
if value is None:
return None
if isinstance(value, bool) or not isinstance(value, (int, float)):
raise ConfigError(f"Config error: '{context}' must be a number when set.")
return float(value)
def _string_tuple(value: Any, context: str) -> tuple[str, ...]: def _string_tuple(value: Any, context: str) -> tuple[str, ...]:
if value is None: if value is None:
return () return ()

View File

@ -4,6 +4,7 @@ from __future__ import annotations
from dataclasses import dataclass from dataclasses import dataclass
from pathlib import Path from pathlib import Path
import re
from .agents import AgentExecutor from .agents import AgentExecutor
from .artifacts import ArtifactStore from .artifacts import ArtifactStore
@ -14,6 +15,8 @@ from .errors import PipelineError
from .errors import NightShiftError from .errors import NightShiftError
from .git import ensure_clean_worktree, write_diff_artifact, write_git_artifacts from .git import ensure_clean_worktree, write_diff_artifact, write_git_artifacts
from .reports import ReportGenerator from .reports import ReportGenerator
from .repo_tools import RepoTools, extract_agent_stdout, parse_lookup_requests
from .runlog import RunLogger
from .stages import StageResult from .stages import StageResult
from .tasks import Task, mark_task_completed from .tasks import Task, mark_task_completed
@ -46,9 +49,11 @@ class PipelineRunner:
artifacts: ArtifactStore | None = None, artifacts: ArtifactStore | None = None,
agent_timeout_seconds: int = 600, agent_timeout_seconds: int = 600,
command_timeout_seconds: int = 300, command_timeout_seconds: int = 300,
logger: RunLogger | None = None,
) -> None: ) -> None:
self.config = config self.config = config
self.artifacts = artifacts or ArtifactStore.from_config(config) self.artifacts = artifacts or ArtifactStore.from_config(config)
self.logger = logger or RunLogger()
self.context = ContextManager(self.artifacts) self.context = ContextManager(self.artifacts)
self.reports = ReportGenerator( self.reports = ReportGenerator(
config.project.root, config.project.root,
@ -61,17 +66,33 @@ class PipelineRunner:
config.agents, config.agents,
self.artifacts, self.artifacts,
timeout_seconds=agent_timeout_seconds, timeout_seconds=agent_timeout_seconds,
logger=self.logger,
) )
self.command_executor = CommandExecutor( self.command_executor = CommandExecutor(
config.project.root, config.project.root,
config.safety, config.safety,
self.artifacts, self.artifacts,
timeout_seconds=command_timeout_seconds, timeout_seconds=command_timeout_seconds,
logger=self.logger,
)
self.repo_tools = RepoTools(
config.project.root,
config.safety,
self.artifacts,
logger=self.logger,
) )
def run_task(self, task: Task) -> PipelineResult: def run_task(self, task: Task) -> PipelineResult:
ensure_clean_worktree(self.config.project.root, self.config.safety.require_clean_worktree) ensure_clean_worktree(self.config.project.root, self.config.safety.require_clean_worktree)
self.artifacts.initialize_run() self.artifacts.initialize_run()
self.logger.bind(self.artifacts)
self.logger.event(
"task.start",
"Starting task",
run_id=self.artifacts.run_id,
task_id=task.id,
task_title=task.title,
)
self.artifacts.write_config_snapshot(self.config.path) self.artifacts.write_config_snapshot(self.config.path)
self.artifacts.write_prompt_snapshots( self.artifacts.write_prompt_snapshots(
{ {
@ -97,6 +118,15 @@ class PipelineRunner:
while index < len(stages): while index < len(stages):
stage = stages[index] stage = stages[index]
self.logger.event(
"stage.start",
"Starting stage",
run_id=self.artifacts.run_id,
task_id=task.id,
stage_id=stage.id,
stage_type=stage.type,
retry_count=retry_count,
)
try: try:
result = self._run_stage(stage, task, previous_outputs, retry_notes) result = self._run_stage(stage, task, previous_outputs, retry_notes)
except NightShiftError as exc: except NightShiftError as exc:
@ -113,6 +143,16 @@ class PipelineRunner:
) )
stage_results.append(result) stage_results.append(result)
previous_outputs[stage.id] = self._read_output(result.output_path) previous_outputs[stage.id] = self._read_output(result.output_path)
self.logger.event(
"stage.finish",
"Finished stage",
run_id=self.artifacts.run_id,
task_id=task.id,
stage_id=stage.id,
status=result.status,
reason=result.reason,
artifact_path=result.output_path,
)
if result.context_update: if result.context_update:
retry_notes.append(f"Context update from '{stage.id}': {result.context_update}") retry_notes.append(f"Context update from '{stage.id}': {result.context_update}")
@ -135,6 +175,16 @@ class PipelineRunner:
) )
break break
retry_count += 1 retry_count += 1
self.logger.event(
"stage.retry",
"Redirecting after stage result",
run_id=self.artifacts.run_id,
task_id=task.id,
stage_id=stage.id,
status=result.status,
retry_count=retry_count,
next_stage=target_stage,
)
retry_notes.append( retry_notes.append(
f"Retry {retry_count}: stage '{stage.id}' returned " f"Retry {retry_count}: stage '{stage.id}' returned "
f"{result.status} ({result.reason}); redirecting to '{target_stage}'." f"{result.status} ({result.reason}); redirecting to '{target_stage}'."
@ -179,6 +229,16 @@ class PipelineRunner:
stage_results, stage_results,
context_out_path=context_out_path, context_out_path=context_out_path,
) )
self.logger.event(
"task.finish",
"Finished task",
run_id=self.artifacts.run_id,
task_id=task.id,
status=final_status,
retry_count=retry_count,
reason=final_reason,
artifact_path=self.artifacts.create_task_dir(task.id).directory.relative_to(self.config.project.root),
)
return PipelineResult( return PipelineResult(
task_id=task.id, task_id=task.id,
@ -191,6 +251,8 @@ class PipelineRunner:
def run_tasks(self, tasks: list[Task] | tuple[Task, ...]) -> MultiTaskResult: def run_tasks(self, tasks: list[Task] | tuple[Task, ...]) -> MultiTaskResult:
self.artifacts.initialize_run() self.artifacts.initialize_run()
self.logger.bind(self.artifacts)
self.logger.event("run.start", "Starting multi-task run", run_id=self.artifacts.run_id)
results: list[PipelineResult] = [] results: list[PipelineResult] = []
known_ids = {task.id for task in tasks} known_ids = {task.id for task in tasks}
completed_ids = {task.id for task in tasks if task.completed} completed_ids = {task.id for task in tasks if task.completed}
@ -216,6 +278,13 @@ class PipelineRunner:
reason="Task blocked by " + "; ".join(reason_parts), reason="Task blocked by " + "; ".join(reason_parts),
) )
results.append(blocked) results.append(blocked)
self.logger.event(
"task.blocked",
"Task blocked by dependencies",
run_id=self.artifacts.run_id,
task_id=task.id,
reason=blocked.reason,
)
if not self.config.pipeline.continue_on_task_failure: if not self.config.pipeline.continue_on_task_failure:
break break
continue continue
@ -234,6 +303,14 @@ class PipelineRunner:
format_aggregate_run_summary(results, status, reason), format_aggregate_run_summary(results, status, reason),
encoding="utf-8", encoding="utf-8",
) )
self.logger.event(
"run.finish",
"Finished multi-task run",
run_id=self.artifacts.run_id,
status=status,
completed_count=completed_count,
failed_count=failed_count,
)
return MultiTaskResult( return MultiTaskResult(
status=status, status=status,
task_results=tuple(results), task_results=tuple(results),
@ -251,7 +328,7 @@ class PipelineRunner:
) -> StageResult: ) -> StageResult:
if stage.type in {"agent", "agent_review", "review"}: if stage.type in {"agent", "agent_review", "review"}:
context = self.context.read_context(task, retry_notes) context = self.context.read_context(task, retry_notes)
return self.agent_executor.run_stage( result = self.agent_executor.run_stage(
stage, stage,
task, task,
previous_outputs, previous_outputs,
@ -260,8 +337,39 @@ class PipelineRunner:
task_context=context.task_context, task_context=context.task_context,
retry_context=context.retry_context, retry_context=context.retry_context,
) )
if stage.type == "agent":
return self._maybe_rerun_agent_with_repo_lookup(
stage,
task,
result,
previous_outputs,
retry_notes,
context.project_context,
context.task_context,
context.retry_context,
)
return result
if stage.type in COMMAND_STAGE_TYPES: if stage.type in COMMAND_STAGE_TYPES:
return self.command_executor.run_stage(stage, task.id) return self.command_executor.run_stage(stage, task.id)
if stage.type == "repo_context":
output_path = self.artifacts.write_stage_output(
task.id,
stage.output or "context-pack.md",
self._build_context_pack(task),
)
self.logger.event(
"artifact.write",
"Wrote context pack",
stage_id=stage.id,
task_id=task.id,
artifact_path=output_path.relative_to(self.config.project.root),
)
return StageResult(
stage_id=stage.id,
status="pass",
reason="Context pack written.",
output_path=str(output_path.relative_to(self.config.project.root)),
)
if stage.type == "summarize": if stage.type == "summarize":
output_path = self.artifacts.write_stage_output( output_path = self.artifacts.write_stage_output(
task.id, task.id,
@ -276,6 +384,103 @@ class PipelineRunner:
) )
raise PipelineError(f"Pipeline error: unsupported stage type '{stage.type}'.") raise PipelineError(f"Pipeline error: unsupported stage type '{stage.type}'.")
def _maybe_rerun_agent_with_repo_lookup(
self,
stage: StageConfig,
task: Task,
result: StageResult,
previous_outputs: dict[str, str],
retry_notes: list[str],
project_context: str,
task_context: str,
retry_context: str | None,
) -> StageResult:
if result.status != "pass" or result.output_path is None:
return result
output_text = self._read_output(result.output_path)
requests = parse_lookup_requests(extract_agent_stdout(output_text))
if not requests:
return result
lookup_context = self.repo_tools.execute_requests(
task.id,
requests,
filename="files-inspected.md",
)
self.logger.event(
"agent.rerun",
"Re-running agent with repo lookup context",
stage_id=stage.id,
task_id=task.id,
lookup_count=len(requests),
)
rerun_outputs = dict(previous_outputs)
rerun_outputs["repo_lookup_results"] = lookup_context
rerun_result = self.agent_executor.run_stage(
stage,
task,
rerun_outputs,
retry_notes,
project_context=project_context,
task_context=task_context,
retry_context=retry_context,
)
return StageResult(
stage_id=rerun_result.stage_id,
status=rerun_result.status,
reason=(
"Agent completed after repo lookup."
if rerun_result.status == "pass"
else rerun_result.reason
),
output_path=rerun_result.output_path,
next_stage=rerun_result.next_stage,
context_update=rerun_result.context_update,
)
def _build_context_pack(self, task: Task) -> str:
terms = _task_search_terms(task)
files = self.repo_tools.list_files(".", pattern="*.py", max_files=80)
grep_sections: list[str] = []
for term in terms[:5]:
grep_sections.extend(
[
f"### Search: {term}",
"",
"```text",
self.repo_tools.grep(re.escape(term), ".", max_matches=20),
"```",
"",
]
)
return "\n".join(
[
"# Context Pack",
"",
f"Task: `{task.id}`",
f"Title: {task.title}",
"",
"## Acceptance Criteria",
"",
"\n".join(f"- {item}" for item in task.acceptance_criteria) or "- None",
"",
"## Constraints",
"",
f"- Scoped paths: {', '.join(self.config.safety.scoped_paths) or '.'}",
"- Repository lookups are read-only.",
"- Excerpts are line-numbered where files are read directly.",
"",
"## Relevant Files",
"",
"```text",
files,
"```",
"",
"## Search Results",
"",
*grep_sections,
]
)
def _read_output(self, output_path: str | None) -> str: def _read_output(self, output_path: str | None) -> str:
if output_path is None: if output_path is None:
return "" return ""
@ -365,9 +570,37 @@ def format_run_metadata(config: NightShiftConfig) -> str:
"", "",
f"- Backend: {agent.backend}", f"- Backend: {agent.backend}",
f"- Model: {agent.model or ''}", f"- Model: {agent.model or ''}",
f"- Temperature: {agent.temperature if agent.temperature is not None else ''}",
f"- Base URL: {agent.base_url or ''}",
f"- Command: {agent.command or ''}", f"- Command: {agent.command or ''}",
f"- System prompt: {agent.system_prompt}", f"- System prompt: {agent.system_prompt}",
"", "",
] ]
) )
return "\n".join(lines) return "\n".join(lines)
def _task_search_terms(task: Task) -> list[str]:
source = " ".join([task.id, task.title, *task.acceptance_criteria])
words = re.findall(r"[A-Za-z_][A-Za-z0-9_]{2,}", source)
ignored = {
"the",
"and",
"for",
"with",
"that",
"this",
"task",
"add",
"use",
"can",
"should",
"must",
}
terms: list[str] = []
for word in words:
lowered = word.lower()
if lowered in ignored or lowered in terms:
continue
terms.append(lowered)
return terms or [task.id]

250
nightshift/repo_tools.py Normal file
View File

@ -0,0 +1,250 @@
"""Scoped repository lookup tools."""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
import fnmatch
import re
from .artifacts import ArtifactStore
from .config import SafetyConfig
from .errors import SafetyError
from .runlog import NullRunLogger, RunLogger
from .safety import resolve_inside_root, resolve_project_root, validate_scoped_paths
DEFAULT_MAX_BYTES = 20_000
DEFAULT_MAX_MATCHES = 100
@dataclass(frozen=True)
class ToolCall:
name: str
arguments: dict[str, str]
output: str
class RepoTools:
"""Read-only repo tools constrained to configured project scope."""
def __init__(
self,
project_root: str | Path,
safety: SafetyConfig,
artifacts: ArtifactStore,
logger: RunLogger | None = None,
) -> None:
self.project_root = resolve_project_root(project_root)
self.safety = safety
self.artifacts = artifacts
self.logger = logger or NullRunLogger()
self.scoped_roots = validate_scoped_paths(
self.project_root,
safety.scoped_paths or (".",),
)
def list_files(self, path: str = ".", pattern: str = "*", max_files: int = 200) -> str:
root = self._resolve_scoped(path, "list_files path")
if not root.exists():
return f"Path not found: {path}"
if root.is_file():
candidates = [root]
else:
candidates = [item for item in root.rglob("*") if item.is_file()]
relative_files = [
_relative(item, self.project_root)
for item in sorted(candidates)
if fnmatch.fnmatch(item.name, pattern)
]
lines = relative_files[:max_files]
if len(relative_files) > max_files:
lines.append(f"... truncated {len(relative_files) - max_files} files")
return "\n".join(lines) or "No files found."
def read_file(self, path: str, max_bytes: int = DEFAULT_MAX_BYTES) -> str:
file_path = self._resolve_scoped(path, "read_file path")
if not file_path.exists() or not file_path.is_file():
return f"File not found: {path}"
data = file_path.read_bytes()[:max_bytes + 1]
truncated = len(data) > max_bytes
text = data[:max_bytes].decode("utf-8", errors="replace")
numbered = _line_number(text)
if truncated:
numbered += "\n... truncated"
return numbered
def grep(
self,
pattern: str,
path: str = ".",
max_matches: int = DEFAULT_MAX_MATCHES,
) -> str:
root = self._resolve_scoped(path, "grep path")
regex = re.compile(pattern)
files = [root] if root.is_file() else [item for item in root.rglob("*") if item.is_file()]
matches: list[str] = []
for file_path in sorted(files):
try:
text = file_path.read_text(encoding="utf-8", errors="replace")
except OSError:
continue
for line_number, line in enumerate(text.splitlines(), start=1):
if regex.search(line):
matches.append(f"{_relative(file_path, self.project_root)}:{line_number}: {line}")
if len(matches) >= max_matches:
matches.append("... truncated")
return "\n".join(matches)
return "\n".join(matches) or "No matches found."
def write_tool_artifact(self, task_id: str, calls: list[ToolCall], filename: str = "repo-tools.md") -> Path:
content = format_tool_calls(calls)
path = self.artifacts.write_stage_output(task_id, filename, content)
self.logger.event(
"artifact.write",
"Wrote repo tool artifact",
task_id=task_id,
artifact_path=path.relative_to(self.project_root),
)
return path
def execute_requests(self, task_id: str, requests: list[ToolCall], filename: str = "repo-tools.md") -> str:
completed: list[ToolCall] = []
for request in requests:
self.logger.event(
"tool.call",
"Running repo lookup tool",
task_id=task_id,
tool=request.name,
**request.arguments,
)
try:
output = self._execute_request(request)
except (SafetyError, re.error) as exc:
output = str(exc)
completed.append(ToolCall(request.name, request.arguments, output))
self.write_tool_artifact(task_id, completed, filename=filename)
return format_tool_calls(completed)
def _execute_request(self, request: ToolCall) -> str:
if request.name == "list_files":
return self.list_files(
path=request.arguments.get("path", "."),
pattern=request.arguments.get("pattern", "*"),
)
if request.name == "read_file":
path = request.arguments.get("path")
if not path:
return "Missing required argument: path"
return self.read_file(path)
if request.name == "grep":
pattern = request.arguments.get("pattern")
if not pattern:
return "Missing required argument: pattern"
return self.grep(pattern, path=request.arguments.get("path", "."))
return f"Unsupported repo lookup tool: {request.name}"
def _resolve_scoped(self, path: str, context: str) -> Path:
resolved = resolve_inside_root(self.project_root, path, context)
for scoped_root in self.scoped_roots:
try:
resolved.relative_to(scoped_root)
return resolved
except ValueError:
continue
scopes = ", ".join(_relative(item, self.project_root) for item in self.scoped_roots)
raise SafetyError(f"Safety error: {context} is outside configured scoped paths: {path}. Scopes: {scopes}")
def format_tool_calls(calls: list[ToolCall]) -> str:
lines = ["# Repo Tool Calls", ""]
if not calls:
lines.append("No tool calls.")
return "\n".join(lines)
for index, call in enumerate(calls, start=1):
lines.extend(
[
f"## {index}. {call.name}",
"",
"Arguments:",
]
)
for key, value in sorted(call.arguments.items()):
lines.append(f"- {key}: `{value}`")
lines.extend(["", "Output:", "", "```text", call.output.rstrip(), "```", ""])
return "\n".join(lines)
def parse_lookup_requests(text: str) -> list[ToolCall]:
"""Parse a small YAML-like lookup request list from model output."""
lines = text.splitlines()
in_section = False
current: dict[str, str] = {}
requests: list[ToolCall] = []
def flush() -> None:
nonlocal current
if not current:
return
name = current.pop("tool", "").strip()
if name:
requests.append(ToolCall(name=name, arguments=dict(current), output=""))
current = {}
for raw_line in lines:
stripped = raw_line.strip()
if stripped in {"lookup_requests:", "repo_lookup:", "repo_lookups:"}:
in_section = True
continue
if not in_section:
continue
if not stripped:
continue
if not raw_line.startswith((" ", "-", "\t")) and not stripped.endswith(":"):
break
if stripped.startswith("- "):
flush()
stripped = stripped[2:].strip()
if ":" not in stripped:
continue
key, value = stripped.split(":", 1)
key = key.strip()
value = value.strip().strip('"').strip("'")
if key == "tool" and current:
flush()
current[key] = value
flush()
return requests
def extract_agent_stdout(artifact_text: str) -> str:
lines = artifact_text.splitlines()
for index, line in enumerate(lines):
if line.strip() != "## stdout":
continue
start = None
for cursor in range(index + 1, len(lines)):
if lines[cursor].strip().startswith("```"):
start = cursor + 1
break
if start is None:
return ""
end = len(lines)
for cursor in range(start, len(lines)):
if lines[cursor].strip().startswith("```"):
end = cursor
break
return "\n".join(lines[start:end])
return artifact_text
def _line_number(text: str) -> str:
return "\n".join(f"{index}: {line}" for index, line in enumerate(text.splitlines(), start=1))
def _relative(path: Path, root: Path) -> str:
try:
return path.relative_to(root).as_posix()
except ValueError:
return path.as_posix()

91
nightshift/runlog.py Normal file
View File

@ -0,0 +1,91 @@
"""Operational run logging for NightShift."""
from __future__ import annotations
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable
from .artifacts import ArtifactStore
ConsoleWriter = Callable[[str], None]
@dataclass(frozen=True)
class LogEvent:
event: str
message: str
fields: dict[str, object]
class RunLogger:
"""Write concise operational events to CLI and run log artifacts."""
def __init__(self, console: ConsoleWriter | None = None) -> None:
self.console = console
self._run_log_path: Path | None = None
self._aggregate_log_path: Path | None = None
def bind(self, artifacts: ArtifactStore) -> None:
artifacts.initialize_run()
self._run_log_path = artifacts.run_log_path
self._aggregate_log_path = artifacts.aggregate_log_path
def event(self, event: str, message: str, **fields: object) -> None:
safe_fields = _redact_fields(fields)
line = format_log_line(LogEvent(event=event, message=message, fields=safe_fields))
if self.console is not None:
self.console(line)
for path in (self._run_log_path, self._aggregate_log_path):
if path is None:
continue
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("a", encoding="utf-8") as handle:
handle.write(line + "\n")
class NullRunLogger(RunLogger):
def __init__(self) -> None:
super().__init__(console=None)
def bind(self, artifacts: ArtifactStore) -> None:
return None
def event(self, event: str, message: str, **fields: object) -> None:
return None
def format_log_line(log_event: LogEvent) -> str:
timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
parts = [timestamp, log_event.event, log_event.message]
for key, value in sorted(log_event.fields.items()):
if value is None or value == "":
continue
parts.append(f"{key}={_format_value(value)}")
return " | ".join(parts)
def tail_lines(path: Path, limit: int = 100) -> list[str]:
if limit <= 0:
return []
if not path.exists() or not path.is_file():
return []
return path.read_text(encoding="utf-8", errors="replace").splitlines()[-limit:]
def _format_value(value: object) -> str:
text = str(value).replace("\n", " ").replace("\r", " ")
return text if text else ""
def _redact_fields(fields: dict[str, object]) -> dict[str, object]:
redacted: dict[str, object] = {}
for key, value in fields.items():
lowered = key.lower()
if any(marker in lowered for marker in ("secret", "token", "password", "key")):
redacted[key] = "<redacted>"
else:
redacted[key] = value
return redacted

View File

@ -7,6 +7,7 @@ from html import escape
from pathlib import Path from pathlib import Path
from .errors import NightShiftError from .errors import NightShiftError
from .runlog import tail_lines
@dataclass(frozen=True) @dataclass(frozen=True)
@ -14,6 +15,7 @@ class RunInfo:
name: str name: str
path: Path path: Path
summary: str summary: str
log_tail: tuple[str, ...] = ()
def list_runs(artifact_dir: str | Path) -> list[RunInfo]: def list_runs(artifact_dir: str | Path) -> list[RunInfo]:
@ -24,7 +26,14 @@ def list_runs(artifact_dir: str | Path) -> list[RunInfo]:
for path in sorted((item for item in runs_dir.iterdir() if item.is_dir()), reverse=True): for path in sorted((item for item in runs_dir.iterdir() if item.is_dir()), reverse=True):
summary_path = path / "run-summary.md" summary_path = path / "run-summary.md"
summary = summary_path.read_text(encoding="utf-8") if summary_path.exists() else "No run summary yet." summary = summary_path.read_text(encoding="utf-8") if summary_path.exists() else "No run summary yet."
runs.append(RunInfo(name=path.name, path=path, summary=summary)) runs.append(
RunInfo(
name=path.name,
path=path,
summary=summary,
log_tail=tuple(tail_lines(path / "run.log", limit=100)),
)
)
return runs return runs
@ -51,6 +60,10 @@ def render_dashboard(artifact_dir: str | Path) -> str:
"<pre>", "<pre>",
escape(run.summary), escape(run.summary),
"</pre>", "</pre>",
"<h3>Log Tail</h3>",
"<pre>",
escape("\n".join(run.log_tail) if run.log_tail else "No run log yet."),
"</pre>",
"</section>", "</section>",
] ]
) )

View File

@ -1,7 +1,7 @@
from pathlib import Path from pathlib import Path
import tempfile import tempfile
import unittest import unittest
from unittest.mock import patch from unittest.mock import MagicMock, patch
from nightshift.agents import AgentExecutor, build_prompt_bundle, parse_review_output from nightshift.agents import AgentExecutor, build_prompt_bundle, parse_review_output
from nightshift.agents import AgentInvocation, format_agent_invocation from nightshift.agents import AgentInvocation, format_agent_invocation
@ -132,6 +132,43 @@ class AgentExecutorTests(unittest.TestCase):
output = (root / result.output_path).read_text(encoding="utf-8") output = (root / result.output_path).read_text(encoding="utf-8")
self.assertIn("ollama run tiny-model", output) self.assertIn("ollama run tiny-model", output)
def test_openai_compatible_agent_sends_temperature(self) -> None:
with tempfile.TemporaryDirectory() as directory:
root = Path(directory)
prompt_path = root / "planner.md"
prompt_path.write_text("Plan carefully.", encoding="utf-8")
artifacts = ArtifactStore(root, ".nightshift", run_id="test-run")
executor = AgentExecutor(
root,
{
"planner": AgentConfig(
id="planner",
backend="openai_compatible",
command=None,
model="tiny-model",
base_url="http://localhost:11434/v1",
temperature=0.2,
system_prompt=Path("planner.md"),
)
},
artifacts,
)
task = parse_tasks(TASK_MD)[0]
stage = StageConfig(id="plan", type="agent", agent="planner", output="plan.md")
response = MagicMock()
response.__enter__.return_value.read.return_value = (
b'{"choices":[{"message":{"content":"api output"}}]}'
)
with patch("nightshift.agents.request.urlopen", return_value=response) as urlopen:
result = executor.run_stage(stage, task)
self.assertEqual(result.status, "pass")
request_obj = urlopen.call_args.args[0]
body = request_obj.data.decode("utf-8")
self.assertIn('"temperature": 0.2', body)
self.assertIn("api output", (root / result.output_path).read_text(encoding="utf-8"))
def test_agent_artifact_format_tolerates_missing_streams(self) -> None: def test_agent_artifact_format_tolerates_missing_streams(self) -> None:
invocation = AgentInvocation( invocation = AgentInvocation(
agent_id="planner", agent_id="planner",

View File

@ -167,6 +167,24 @@ class ConfigTests(unittest.TestCase):
self.assertEqual(config.agents["planner"].model, "qwen2.5-coder:14b") self.assertEqual(config.agents["planner"].model, "qwen2.5-coder:14b")
self.assertEqual(config.experiment.label, "local-test") self.assertEqual(config.experiment.label, "local-test")
def test_openai_compatible_backend_loads(self) -> None:
with tempfile.TemporaryDirectory() as directory:
root = Path(directory)
init_project(root)
config_path = root / "nightshift.yaml"
text = config_path.read_text(encoding="utf-8").replace(
"backend: command\n command: echo",
"backend: openai_compatible\n model: local-model\n base_url: http://localhost:11434/v1\n temperature: 0.1",
1,
)
config_path.write_text(text, encoding="utf-8")
config = load_config(config_path)
self.assertEqual(config.agents["planner"].backend, "openai_compatible")
self.assertEqual(config.agents["planner"].base_url, "http://localhost:11434/v1")
self.assertEqual(config.agents["planner"].temperature, 0.1)
def test_command_stage_options_load(self) -> None: def test_command_stage_options_load(self) -> None:
with tempfile.TemporaryDirectory() as directory: with tempfile.TemporaryDirectory() as directory:
root = Path(directory) root = Path(directory)
@ -188,6 +206,41 @@ class ConfigTests(unittest.TestCase):
self.assertEqual(test_stage.timeout_seconds, 30) self.assertEqual(test_stage.timeout_seconds, 30)
self.assertEqual(test_stage.working_dir, Path(".")) self.assertEqual(test_stage.working_dir, Path("."))
def test_agent_temperature_loads(self) -> None:
with tempfile.TemporaryDirectory() as directory:
root = Path(directory)
init_project(root)
config_path = root / "nightshift.yaml"
config_path.write_text(
config_path.read_text(encoding="utf-8").replace(
" system_prompt: agents/planner.md",
" system_prompt: agents/planner.md\n temperature: 0.2",
1,
),
encoding="utf-8",
)
config = load_config(config_path)
self.assertEqual(config.agents["planner"].temperature, 0.2)
def test_agent_temperature_must_be_number(self) -> None:
with tempfile.TemporaryDirectory() as directory:
root = Path(directory)
init_project(root)
config_path = root / "nightshift.yaml"
config_path.write_text(
config_path.read_text(encoding="utf-8").replace(
" system_prompt: agents/planner.md",
" system_prompt: agents/planner.md\n temperature: low",
1,
),
encoding="utf-8",
)
with self.assertRaisesRegex(ConfigError, "temperature"):
load_config(config_path)
def test_non_command_stage_cannot_define_commands(self) -> None: def test_non_command_stage_cannot_define_commands(self) -> None:
with tempfile.TemporaryDirectory() as directory: with tempfile.TemporaryDirectory() as directory:
root = Path(directory) root = Path(directory)

View File

@ -230,6 +230,79 @@ Acceptance Criteria:
self.assertEqual(result.status, "failed") self.assertEqual(result.status, "failed")
self.assertEqual(result.task_results[0].status, "blocked") self.assertEqual(result.task_results[0].status, "blocked")
def test_run_writes_operational_log(self) -> None:
with tempfile.TemporaryDirectory() as directory:
root = Path(directory)
_write_common_files(root)
stages = (StageConfig(id="plan", type="agent", agent="planner", output="plan.md"),)
artifacts = ArtifactStore(root, ".nightshift", run_id="test-run")
config = make_config(root, stages)
runner = PipelineRunner(config, artifacts)
task = parse_tasks(TASK_MD)[0]
runner.run_task(task)
log = (root / ".nightshift" / "runs" / "test-run" / "run.log").read_text(encoding="utf-8")
self.assertIn("task.start", log)
self.assertIn("stage.start", log)
self.assertIn("agent.finish", log)
def test_planner_lookup_requests_write_files_inspected_and_rerun(self) -> None:
with tempfile.TemporaryDirectory() as directory:
root = Path(directory)
_write_common_files(root)
(root / "target.py").write_text("VALUE = 1\n", encoding="utf-8")
(root / "fake_planner.py").write_text(
"\n".join(
[
"import sys",
"prompt = sys.stdin.read()",
"if 'repo_lookup_results' in prompt:",
" print('final plan with context')",
"else:",
" print('lookup_requests:')",
" print('- tool: read_file')",
" print(' path: target.py')",
]
),
encoding="utf-8",
)
stages = (StageConfig(id="plan", type="agent", agent="planner", output="plan.md"),)
config = make_config(root, stages)
config.agents["planner"] = AgentConfig(
id="planner",
backend="command",
command="python fake_planner.py",
system_prompt=Path("planner.md"),
)
runner = PipelineRunner(config, ArtifactStore(root, ".nightshift", run_id="test-run"))
task = parse_tasks(TASK_MD)[0]
result = runner.run_task(task)
task_dir = root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id
self.assertEqual(result.status, "complete")
self.assertTrue((task_dir / "files-inspected.md").exists())
self.assertIn("1: VALUE = 1", (task_dir / "files-inspected.md").read_text(encoding="utf-8"))
self.assertIn("final plan with context", (task_dir / "plan.md").read_text(encoding="utf-8"))
def test_repo_context_stage_writes_context_pack(self) -> None:
with tempfile.TemporaryDirectory() as directory:
root = Path(directory)
_write_common_files(root)
(root / "app.py").write_text("def run_pipeline():\n return True\n", encoding="utf-8")
stages = (StageConfig(id="context", type="repo_context", output="context-pack.md"),)
config = make_config(root, stages)
runner = PipelineRunner(config, ArtifactStore(root, ".nightshift", run_id="test-run"))
task = parse_tasks(TASK_MD)[0]
result = runner.run_task(task)
pack = root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id / "context-pack.md"
self.assertEqual(result.status, "complete")
self.assertIn("Context Pack", pack.read_text(encoding="utf-8"))
self.assertIn("app.py", pack.read_text(encoding="utf-8"))
def _write_common_files(root: Path) -> None: def _write_common_files(root: Path) -> None:
(root / "nightshift.yaml").write_text("project:\n name: test\n", encoding="utf-8") (root / "nightshift.yaml").write_text("project:\n name: test\n", encoding="utf-8")

47
tests/test_repo_tools.py Normal file
View File

@ -0,0 +1,47 @@
from pathlib import Path
import tempfile
import unittest
from nightshift.artifacts import ArtifactStore
from nightshift.config import SafetyConfig
from nightshift.repo_tools import RepoTools, parse_lookup_requests
class RepoToolsTests(unittest.TestCase):
def test_repo_tools_are_scoped_and_line_numbered(self) -> None:
with tempfile.TemporaryDirectory() as directory:
root = Path(directory)
(root / "src").mkdir()
(root / "src" / "app.py").write_text("def hello():\n return 'hi'\n", encoding="utf-8")
safety = SafetyConfig(
require_clean_worktree=False,
scoped_paths=("src",),
allowed_commands=(),
forbidden_commands=(),
)
tools = RepoTools(root, safety, ArtifactStore(root, ".nightshift", run_id="test-run"))
self.assertIn("src/app.py", tools.list_files("src", "*.py"))
self.assertIn("1: def hello():", tools.read_file("src/app.py"))
self.assertIn("src/app.py:1", tools.grep("hello", "src"))
def test_parse_lookup_requests(self) -> None:
output = """Plan needs context.
lookup_requests:
- tool: read_file
path: nightshift/pipeline.py
- tool: grep
path: nightshift
pattern: PipelineRunner
"""
requests = parse_lookup_requests(output)
self.assertEqual([request.name for request in requests], ["read_file", "grep"])
self.assertEqual(requests[0].arguments["path"], "nightshift/pipeline.py")
self.assertEqual(requests[1].arguments["pattern"], "PipelineRunner")
if __name__ == "__main__":
unittest.main()

View File

@ -19,14 +19,23 @@ class WebDashboardTests(unittest.TestCase):
artifacts = ArtifactStore(root, ".nightshift", run_id="test-run") artifacts = ArtifactStore(root, ".nightshift", run_id="test-run")
artifacts.initialize_run() artifacts.initialize_run()
artifacts.run_summary_path.write_text("# Summary\n\nok", encoding="utf-8") artifacts.run_summary_path.write_text("# Summary\n\nok", encoding="utf-8")
artifacts.run_log_path.write_text(
"\n".join(f"line {index}" for index in range(120)),
encoding="utf-8",
)
runs = list_runs(root / ".nightshift") runs = list_runs(root / ".nightshift")
content = read_artifact(root / ".nightshift" / "runs" / "test-run", "run-summary.md") content = read_artifact(root / ".nightshift" / "runs" / "test-run", "run-summary.md")
escaped = read_artifact(root / ".nightshift" / "runs" / "test-run", "../project-context.md") escaped = read_artifact(root / ".nightshift" / "runs" / "test-run", "../project-context.md")
dashboard = render_dashboard(root / ".nightshift")
self.assertEqual(len(runs), 1) self.assertEqual(len(runs), 1)
self.assertEqual(len(runs[0].log_tail), 100)
self.assertIn("ok", content) self.assertIn("ok", content)
self.assertIn("escapes", escaped) self.assertIn("escapes", escaped)
self.assertIn("Log Tail", dashboard)
self.assertIn("line 119", dashboard)
self.assertNotIn("line 19\n", dashboard)
if __name__ == "__main__": if __name__ == "__main__":