The ollama backend now uses Ollama’s HTTP API instead of ollama run

K. Hodges 2026-05-17 14:23:31 -07:00
parent db9b24379e
commit 42564c6867
10 changed files with 132 additions and 114 deletions

@ -83,6 +83,8 @@ tasks/TASK-001/context-out.md
tasks/TASK-001/final-notes.md
```
Retry attempts preserve separate artifacts with numeric suffixes, such as `repair-1.patch`, `normalized-1.patch`, `patch-validation-1.md`, `applied-1.patch`, and `patch-apply-output-1.txt`.
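As a rough illustration of the suffix rule, the sketch below derives per-attempt names from a base filename. The helper name `attempt_filename` is hypothetical and only mirrors the behavior described here: the first attempt keeps the base name, and later attempts insert `-N` before the extension. `repair-N.patch` is a separate case, since the code-writer stage names its retry patch directly instead of deriving it from `proposed.patch`.

```python
from pathlib import Path


def attempt_filename(filename: str, retry_count: int) -> str:
    """Return the artifact name for one attempt (hypothetical helper)."""
    if retry_count <= 0:
        return filename  # the first attempt keeps the base name
    path = Path(filename)
    suffix = "".join(path.suffixes)  # e.g. ".patch" or ".md"
    stem = path.name[: -len(suffix)] if suffix else path.name
    # Insert the attempt number between the stem and the extension.
    return path.with_name(f"{stem}-{retry_count}{suffix}").as_posix()


for base in ("normalized.patch", "patch-validation.md", "patch-apply-output.txt"):
    print(base, "->", attempt_filename(base, 1))
```

Because the attempt number sits before the extension, editors and diff viewers still recognize the file type of every retry artifact.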
## Example Templates
Example run files are available in `examples/templates/`.
@ -363,4 +365,6 @@ After a run, inspect:
.nightshift/runs/<run-id>/tasks/TASK-001/final-notes.md
```
If validation or later stages retry implementation, inspect the suffixed retry artifacts too, for example `repair-1.patch` and `patch-validation-1.md`.
The useful signal is whether NightShift selected the right task, respected dependencies, generated context, validated and applied a patch, ran tests, wrote artifacts, updated task completion, and produced a clear summary.

@ -195,10 +195,13 @@ agents:
implementer:
backend: ollama
model: qwen2.5-coder:14b
base_url: http://localhost:11434
temperature: 0.2
system_prompt: agents/implementer.md
```
The Ollama backend uses the local HTTP API instead of `ollama run`, which keeps exact patch output away from terminal rendering and line wrapping.
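A minimal sketch of what such a non-streaming call can look like, using only the standard library. The function names `build_generate_request`, `extract_response`, and `generate` are illustrative assumptions, not NightShift's API; the endpoint path, the `stream: false` flag, and the `options.temperature` field follow the request shape described in this change.

```python
import json
from urllib import request


def build_generate_request(model, prompt, base_url="http://localhost:11434", temperature=None):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = {"model": model, "prompt": prompt, "stream": False}
    if temperature is not None:
        body["options"] = {"temperature": temperature}
    url = base_url.rstrip("/") + "/api/generate"
    return request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_response(raw: str) -> str:
    """Return the "response" field of the JSON reply, or the raw text as a fallback."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    if isinstance(data, dict) and isinstance(data.get("response"), str):
        return data["response"]
    return raw


def generate(model, prompt, timeout=600, **kwargs):
    """Send the request and return the model text (needs a running Ollama server)."""
    req = build_generate_request(model, prompt, **kwargs)
    with request.urlopen(req, timeout=timeout) as response:
        return extract_response(response.read().decode("utf-8", errors="replace"))
```

With a local server running, `generate("qwen2.5-coder:14b", "Return a unified diff only.", temperature=0.2)` would return the model text exactly as it appears in the JSON reply, with no terminal layer in between.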
Example OpenAI-compatible agent:
```yaml
@ -212,7 +215,7 @@ agents:
system_prompt: agents/implementer.md
```
NightShift passes prompt bundles to agents and persists stdout, stderr, exit code, duration, and prompt artifacts. Code writer agents should return unified diffs.
NightShift passes prompt bundles to agents and persists stdout, stderr, exit code, duration, and prompt artifacts. Code writer agents should return unified diffs. On retries, patch artifacts are versioned by attempt, for example `repair-1.patch`, `normalized-1.patch`, and `patch-validation-1.md`.
Review agents should emit:
@ -274,10 +277,15 @@ A run creates human-readable artifacts:
context-pack.md
plan.md
proposed.patch
repair-1.patch
normalized.patch
normalized-1.patch
patch-validation.md
patch-validation-1.md
applied.patch
applied-1.patch
patch-apply-output.txt
patch-apply-output-1.txt
test-output.txt
review.md
stage-results.md

@ -27,7 +27,8 @@ NightShift config is YAML.
Supported backends:
- `command`: runs a local command with the prompt on stdin.
- `ollama`: runs `ollama run <model>` with the prompt on stdin.
- `ollama`: calls the local Ollama HTTP API at `http://localhost:11434/api/generate` by default.
- `openai_compatible`: calls a Chat Completions-compatible HTTP API.
Command agent:
@ -44,6 +45,7 @@ Ollama agent:
planner:
backend: ollama
model: qwen2.5-coder:14b
base_url: http://localhost:11434
system_prompt: agents/planner.md
```

@ -865,7 +865,7 @@ NightShift currently provides:
* Command, agent, agent-review, review, summarize, repo-context, code-writer, patch-normalizer, patch-validator, and patch-apply stage handling
* Retry redirection with a configured task retry limit
* Command-backed agents
* Ollama-backed local model agents
* Ollama-backed local model agents through the local HTTP API
* OpenAI-compatible local/server model agents
* Per-agent temperature settings
* Scoped repo lookup tools: `list_files`, `read_file`, and `grep`
@ -874,6 +874,7 @@ NightShift currently provides:
* Context pack generation
* Unified diff code-writing contract
* Patch normalization, validation, dry-run, and apply modes
* Per-attempt retry patch artifacts such as `repair-1.patch`, `normalized-1.patch`, and `patch-validation-1.md`
* Test/static failure repair loops via bounded stage retries
* Prompt bundle construction with project, task, retry, and previous-stage context
* Prompt snapshots and run metadata for experiment comparison
@ -1014,13 +1015,13 @@ The next important additions are:
Move max files, max lines, forbidden paths, allowed file types, binary rejection, and protected files into a reusable project-level write policy.
5. Better model backend support
Expand OpenAI-compatible behavior, add request metadata artifacts, support response format hints, and document local server patterns. Prefer non-terminal APIs for machine-readable model output. In particular, avoid relying on interactive CLI streaming paths such as `ollama run` when exact patch text matters; use the Ollama HTTP API or OpenAI-compatible endpoint so terminal rendering, spinners, and line-wrapping behavior cannot corrupt artifacts.
Expand OpenAI-compatible behavior, add request metadata artifacts, support response format hints, and document local server patterns. Machine-readable Ollama output now uses the HTTP API instead of the interactive `ollama run` terminal path; keep this non-terminal capture policy for future model backends where exact patch text matters.
6. Deterministic diff generation
Reduce direct reliance on models emitting perfect unified diffs. Add a workflow where the model returns complete file contents or a structured edit description, then NightShift writes the unified diff deterministically from before/after file snapshots. Keep the existing unified-diff contract for advanced agents, but make deterministic diff generation the preferred path for smaller local models.
7. Retry artifact versioning
Preserve per-attempt artifacts instead of overwriting fixed filenames such as `proposed.patch`, `normalized.patch`, and `patch-validation.md`. Retry artifacts should include attempt numbers, while summary artifacts can point to the latest attempt. This makes repeated validation and repair failures diagnosable.
Continue improving per-attempt artifact preservation. Patch retries now preserve files such as `repair-1.patch`, `normalized-1.patch`, and `patch-validation-1.md`; future work should add richer latest-attempt indexes and dashboard navigation.
8. Patch repair stage
Add an explicit patch repair or strict normalizer stage that receives the invalid patch, validation error, and relevant source excerpts, then returns a complete replacement patch. This stage should remain bounded by strict validation and should not silently guess intent for arbitrary malformed hunks.
@ -1042,7 +1043,7 @@ The next important additions are:
Implementation note:
Recent local-model patch experiments exposed repeated line-fragment artifacts where long generated lines were split and the tail was duplicated on the following line. This affected prose and unified diffs, producing malformed hunk lines that strict validation correctly rejected. Treat this as a backend/output-capture and patch-contract problem before adding editor or linter agents: remove terminal streaming from model capture, preserve retry artifacts, and prefer deterministic diff generation when exact syntax matters.
Recent local-model patch experiments exposed repeated line-fragment artifacts where long generated lines were split and the tail was duplicated on the following line. This affected prose and unified diffs, producing malformed hunk lines that strict validation correctly rejected. Treat this as a backend/output-capture and patch-contract problem before adding editor or linter agents: avoid terminal streaming for machine output, preserve retry artifacts, and prefer deterministic diff generation when exact syntax matters.
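To make the deterministic diff idea concrete, here is a minimal sketch that builds a unified diff from before and after file snapshots with the standard library's `difflib`. The function name `make_unified_diff` is hypothetical, and the sketch assumes text files that end with a newline; it is one possible shape for this roadmap item, not the NightShift implementation.

```python
import difflib


def make_unified_diff(path: str, before: str, after: str) -> str:
    """Build a git-style unified diff from two complete file snapshots."""
    body = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    text = "".join(body)
    if not text:
        return ""  # identical snapshots: nothing to apply
    # Assumption: both snapshots end with a newline. Files without a trailing
    # newline would need a "\ No newline at end of file" marker, not handled here.
    return f"diff --git a/{path} b/{path}\n{text}"


print(make_unified_diff("app.py", "old\n", "new\n"), end="")
```

The model only has to return complete file contents; hunk headers and line prefixes come from `difflib`, so split or duplicated line fragments cannot produce malformed hunks.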
---
# Appendix A: Design Decisions and Rationale

@ -18,7 +18,7 @@ If `require_clean_worktree: true`, NightShift blocks dirty repositories before c
## Ollama backend fails
The `ollama` backend requires the `ollama` executable to be installed and the configured model to be available. Tests do not require Ollama.
The `ollama` backend uses Ollama's local HTTP API, normally at `http://localhost:11434/api/generate`. Confirm that Ollama is running and that the configured model appears in `ollama list`; if it is missing, fetch it with `ollama pull <model>`. Tests do not require Ollama.
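The same check can be scripted. The sketch below assumes Ollama's `/api/tags` endpoint, which lists locally available models as a JSON object with a `models` array whose entries carry a `name` field; that response shape is an assumption taken from Ollama's API documentation, and the helper names `model_names` and `check_ollama` are hypothetical.

```python
import json
from urllib import request


def model_names(tags_json: str) -> list[str]:
    """Extract model names from an /api/tags reply (assumed response shape)."""
    data = json.loads(tags_json)
    if not isinstance(data, dict):
        return []
    return [m.get("name", "") for m in data.get("models", []) if isinstance(m, dict)]


def check_ollama(model: str, base_url: str = "http://localhost:11434", timeout: float = 5.0) -> str:
    """Return a one-line diagnosis of the Ollama server and model state."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with request.urlopen(url, timeout=timeout) as response:
            names = model_names(response.read().decode("utf-8", errors="replace"))
    except (OSError, ValueError) as exc:
        # URLError and timeouts are OSError subclasses; invalid JSON raises ValueError.
        return f"Ollama is not reachable at {base_url}: {exc}"
    if model in names:
        return f"Model {model} is available."
    return f"Model {model} is missing; try: ollama pull {model}"
```

Calling `check_ollama("qwen2.5-coder:14b")` before a run separates "server not running" from "model not pulled", which otherwise both surface as an agent failure.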
## Flask dashboard fails

@ -29,10 +29,10 @@ Install and start Ollama, then make sure the model is available:
```bash
ollama pull qwen2.5-coder:14b
ollama run qwen2.5-coder:14b
ollama list
```
Stop the interactive `ollama run` session after confirming the model responds. NightShift will invoke Ollama itself.
Keep Ollama running. NightShift uses Ollama's local HTTP API, normally at `http://localhost:11434`, rather than the interactive `ollama run` terminal path.
## 1. Create a Scratch Target Project
@ -189,6 +189,8 @@ Inspect these artifacts:
.nightshift/runs/<run-id>/tasks/TASK-001/final-notes.md
```
If a later stage routes back to `implement`, retry artifacts are written with attempt suffixes such as `repair-1.patch`, `normalized-1.patch`, `patch-validation-1.md`, `applied-1.patch`, and `patch-apply-output-1.txt`.
In dry-run mode, the patch should be validated and checked with `git apply --check`, but files should not change.
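The dry-run check can be reproduced by hand. This is a minimal sketch, assuming `git` is on `PATH`; the helper names `git_apply_args` and `check_patch` are hypothetical, and `"apply"` as the non-dry-run mode name is an assumption made for illustration.

```python
import subprocess


def git_apply_args(patch_path, mode="dry_run"):
    """Build the git command; dry-run adds --check so no files are modified."""
    args = ["git", "apply"]
    if mode == "dry_run":
        args.append("--check")
    args.append(str(patch_path))
    return args


def check_patch(patch_path, repo_root, mode="dry_run"):
    """Run git apply in repo_root and return the completed process."""
    return subprocess.run(
        git_apply_args(patch_path, mode),
        cwd=repo_root,
        capture_output=True,
        text=True,
    )
```

A zero `returncode` from `check_patch` means the patch would apply cleanly; with `--check`, `git apply` reports problems on stderr and leaves the working tree untouched.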
## 5. Apply The Patch
@ -215,6 +217,7 @@ If the model generates a valid patch, NightShift will:
- apply the patch with `git apply`
- run `python -m unittest discover -v`
- retry through the implementer if the test stage fails and `max_task_retries` allows it
- preserve per-attempt retry patch artifacts with numeric suffixes
- mark the task complete only if the pipeline completes
## 6. Monitor From The Web Dashboard
@ -277,13 +280,13 @@ Once you trust the workflow, consider setting `require_clean_worktree: true` in
## Troubleshooting
If Ollama is not found:
If Ollama is unavailable:
```text
Agent exited with code 127
Agent exited with code 1
```
Confirm `ollama` is installed and available on `PATH`.
Confirm Ollama is running at the configured `base_url` and the model appears in `ollama list`.
If the model returns prose instead of a patch, tighten `agents/implementer.md`. The implementation stage requires a unified diff.
@ -291,8 +294,11 @@ If patch validation fails, inspect:
```text
patch-validation.md
patch-validation-1.md
normalized.patch
normalized-1.patch
proposed.patch
repair-1.patch
```
If patch apply fails, inspect:

@ -8,7 +8,6 @@ import os
from pathlib import Path
import re
import subprocess
import tempfile
import time
from urllib import request
from urllib.error import URLError
@ -23,7 +22,6 @@ from .tasks import Task
DEFAULT_AGENT_TIMEOUT_SECONDS = 600
OLLAMA_HEARTBEAT_SECONDS = 30.0
@dataclass(frozen=True)
@ -222,96 +220,58 @@ class AgentExecutor:
def _invoke_ollama(self, agent: AgentConfig, prompt: str) -> AgentInvocation:
if not agent.model:
raise AgentError(f"Agent error: ollama backend agent '{agent.id}' has no model.")
command = f"ollama run {agent.model}"
prompt_input = prompt
base_url = (agent.base_url or "http://localhost:11434").rstrip("/")
url = base_url + "/api/generate"
command = f"POST {url}"
body: dict[str, object] = {
"model": agent.model,
"prompt": prompt,
"stream": False,
}
if agent.temperature is not None:
prompt_input = f"/set parameter temperature {agent.temperature}\n{prompt}"
body["options"] = {"temperature": agent.temperature}
headers = {"Content-Type": "application/json"}
started = time.monotonic()
self.logger.event(
"ollama.start",
"Starting Ollama model invocation",
"Starting Ollama HTTP model invocation",
agent_id=agent.id,
model=agent.model,
timeout_seconds=self.timeout_seconds,
)
try:
with tempfile.TemporaryFile("w+", encoding="utf-8", errors="replace") as stdout_file:
with tempfile.TemporaryFile("w+", encoding="utf-8", errors="replace") as stderr_file:
process = subprocess.Popen(
["ollama", "run", agent.model],
cwd=self.project_root,
stdin=subprocess.PIPE,
stdout=stdout_file,
stderr=stderr_file,
text=True,
encoding="utf-8",
errors="replace",
)
assert process.stdin is not None
process.stdin.write(prompt_input)
process.stdin.close()
last_heartbeat = started
timed_out = False
while process.poll() is None:
now = time.monotonic()
elapsed = now - started
if elapsed > self.timeout_seconds:
process.kill()
timed_out = True
break
if now - last_heartbeat >= OLLAMA_HEARTBEAT_SECONDS:
self.logger.event(
"ollama.wait",
"Ollama invocation still running",
agent_id=agent.id,
model=agent.model,
elapsed=f"{elapsed:.0f}s",
)
last_heartbeat = now
time.sleep(1.0)
process.wait()
payload = json.dumps(body).encode("utf-8")
req = request.Request(url, data=payload, headers=headers, method="POST")
with request.urlopen(req, timeout=self.timeout_seconds) as response:
raw = response.read().decode("utf-8", errors="replace")
duration = time.monotonic() - started
stdout_file.seek(0)
stderr_file.seek(0)
stdout = stdout_file.read()
stderr = stderr_file.read()
if timed_out:
return AgentInvocation(
agent_id=agent.id,
command=command,
prompt=prompt_input,
prompt=prompt,
exit_code=0,
stdout=_extract_ollama_response(raw),
stderr="",
duration_seconds=duration,
)
except TimeoutError:
duration = time.monotonic() - started
return AgentInvocation(
agent_id=agent.id,
command=command,
prompt=prompt,
exit_code=-1,
stdout=stdout,
stderr=stderr,
stdout="",
stderr="Request timed out.",
duration_seconds=duration,
timed_out=True,
)
return AgentInvocation(
agent_id=agent.id,
command=command,
prompt=prompt_input,
exit_code=process.returncode or 0,
stdout=stdout,
stderr=stderr,
duration_seconds=duration,
)
except FileNotFoundError as exc:
except (OSError, URLError) as exc:
duration = time.monotonic() - started
return AgentInvocation(
agent_id=agent.id,
command=command,
prompt=prompt_input,
exit_code=127,
stdout="",
stderr=str(exc),
duration_seconds=duration,
)
except OSError as exc:
duration = time.monotonic() - started
return AgentInvocation(
agent_id=agent.id,
command=command,
prompt=prompt_input,
prompt=prompt,
exit_code=1,
stdout="",
stderr=str(exc),
@ -461,11 +421,22 @@ def _extract_openai_content(raw: str) -> str:
return raw
def _extract_ollama_response(raw: str) -> str:
try:
data = json.loads(raw)
response = data.get("response")
if isinstance(response, str):
return response
except (json.JSONDecodeError, AttributeError):
pass
return raw
def output_contract_for(stage: StageConfig) -> str:
if stage.type == "code_writer":
return "\n".join(
[
"Return a unified diff only, suitable for saving as proposed.patch.",
"Return a unified diff only, suitable for saving as proposed.patch or repair-N.patch.",
"Do not include prose outside the patch.",
"Use diff --git headers and hunk headers.",
"For existing files, do not use new file mode or /dev/null headers.",

@ -370,11 +370,11 @@ class PipelineRunner:
if stage.type == "code_writer":
return self._run_code_writer_stage(stage, task, previous_outputs, retry_notes, retry_count)
if stage.type == "patch_normalizer":
return self._run_patch_normalizer_stage(stage, task, previous_outputs, retry_notes)
return self._run_patch_normalizer_stage(stage, task, previous_outputs, retry_notes, retry_count)
if stage.type == "patch_validator":
return self._run_patch_validator_stage(stage, task, previous_outputs)
return self._run_patch_validator_stage(stage, task, previous_outputs, retry_count)
if stage.type == "patch_apply":
return self._run_patch_apply_stage(stage, task, previous_outputs)
return self._run_patch_apply_stage(stage, task, previous_outputs, retry_count)
if stage.type == "repo_context":
output_path = self.artifacts.write_stage_output(
task.id,
@ -488,7 +488,7 @@ class PipelineRunner:
f"# Implementation Summary\n\nStatus: fail\nReason: {exc}\n",
)
return StageResult(stage.id, "fail", str(exc), output_path=result.output_path)
patch_filename = stage.output or ("proposed.patch" if retry_count == 0 else f"repair-{retry_count}.patch")
patch_filename = "repair-{0}.patch".format(retry_count) if retry_count else (stage.output or "proposed.patch")
summary_filename = "implementation-summary.md" if retry_count == 0 else f"repair-summary-{retry_count}.md"
proposed_path = self.artifacts.write_stage_output(task.id, patch_filename, patch)
summary_path = self.artifacts.write_stage_output(
@ -522,6 +522,7 @@ class PipelineRunner:
task: Task,
previous_outputs: dict[str, str],
retry_notes: list[str],
retry_count: int = 0,
) -> StageResult:
source = _latest_patch_like_output(previous_outputs)
if stage.agent is not None:
@ -539,7 +540,11 @@ class PipelineRunner:
patch = normalize_patch_text(source)
except PipelineError as exc:
return StageResult(stage.id, "fail", str(exc))
output_path = self.artifacts.write_stage_output(task.id, stage.output or "normalized.patch", patch)
output_path = self.artifacts.write_stage_output(
task.id,
_attempt_filename(stage.output or "normalized.patch", retry_count),
patch,
)
self.logger.event(
"artifact.write",
"Wrote normalized patch",
@ -559,7 +564,9 @@ class PipelineRunner:
stage: StageConfig,
task: Task,
previous_outputs: dict[str, str],
retry_count: int = 0,
) -> StageResult:
output_filename = _attempt_filename(stage.output or "patch-validation.md", retry_count)
source = _latest_patch_like_output(previous_outputs)
try:
patch = normalize_patch_text(source)
@ -574,7 +581,7 @@ class PipelineRunner:
except PipelineError as exc:
output_path = self.artifacts.write_stage_output(
task.id,
stage.output or "patch-validation.md",
output_filename,
f"# Patch Validation\n\nStatus: fail\nReason: {exc}\n",
)
return StageResult(
@ -585,7 +592,7 @@ class PipelineRunner:
)
output_path = self.artifacts.write_stage_output(
task.id,
stage.output or "patch-validation.md",
output_filename,
format_validation_result(result),
)
return StageResult(
@ -600,7 +607,9 @@ class PipelineRunner:
stage: StageConfig,
task: Task,
previous_outputs: dict[str, str],
retry_count: int = 0,
) -> StageResult:
output_filename = _attempt_filename(stage.output or "patch-apply-output.txt", retry_count)
source = _latest_patch_like_output(previous_outputs)
try:
patch = normalize_patch_text(source)
@ -615,7 +624,7 @@ class PipelineRunner:
except PipelineError as exc:
output_path = self.artifacts.write_stage_output(
task.id,
stage.output or "patch-apply-output.txt",
output_filename,
f"# Patch Apply\n\nStatus: fail\nReason: {exc}\n",
)
return StageResult(
@ -625,14 +634,18 @@ class PipelineRunner:
output_path=str(output_path.relative_to(self.config.project.root)),
)
applied_path = self.artifacts.write_stage_output(task.id, "applied.patch", patch)
applied_path = self.artifacts.write_stage_output(
task.id,
_attempt_filename("applied.patch", retry_count),
patch,
)
write_git_artifacts(self.artifacts, task.id, "before-patch-apply")
mode = stage.mode or "dry_run"
apply_result = apply_patch_with_git(applied_path, self.config.project.root, mode=mode)
write_git_artifacts(self.artifacts, task.id, "after-patch-apply")
output_path = self.artifacts.write_stage_output(
task.id,
stage.output or "patch-apply-output.txt",
output_filename,
format_patch_apply_result(
apply_result,
applied_path.relative_to(self.config.project.root).as_posix(),
@ -839,6 +852,19 @@ def _latest_patch_like_output(previous_outputs: dict[str, str]) -> str:
raise PipelineError("Patch error: no previous patch output found.")
def _attempt_filename(filename: str, retry_count: int) -> str:
if retry_count <= 0:
return filename
path = Path(filename)
suffix = "".join(path.suffixes)
if suffix:
stem = path.name[: -len(suffix)]
name = f"{stem}-{retry_count}{suffix}"
else:
name = f"{path.name}-{retry_count}"
return path.with_name(name).as_posix()
def format_aggregate_run_summary(results: list[PipelineResult], status: str, reason: str) -> str:
lines = [
"# Run Summary",

@ -1,5 +1,4 @@
from pathlib import Path
import io
import tempfile
import unittest
from unittest.mock import MagicMock, patch
@ -119,27 +118,20 @@ class AgentExecutorTests(unittest.TestCase):
task = parse_tasks(TASK_MD)[0]
stage = StageConfig(id="plan", type="agent", agent="planner", output="plan.md")
class FakePopen:
def __init__(self, args, cwd=None, stdin=None, stdout=None, stderr=None, **kwargs):
self.args = args
self.stdin = io.StringIO()
self.returncode = 0
stdout.write("ollama output")
response = MagicMock()
response.__enter__.return_value.read.return_value = b'{"response":"ollama output"}'
def poll(self):
return self.returncode
def wait(self):
return self.returncode
with patch("nightshift.agents.subprocess.Popen", side_effect=FakePopen) as popen:
with patch("nightshift.agents.request.urlopen", return_value=response) as urlopen:
result = executor.run_stage(stage, task)
self.assertEqual(result.status, "pass")
popen.assert_called_once()
self.assertEqual(popen.call_args.args[0], ["ollama", "run", "tiny-model"])
request_obj = urlopen.call_args.args[0]
body = request_obj.data.decode("utf-8")
self.assertIn('"model": "tiny-model"', body)
self.assertIn('"stream": false', body)
output = (root / result.output_path).read_text(encoding="utf-8")
self.assertIn("ollama run tiny-model", output)
self.assertIn("POST http://localhost:11434/api/generate", output)
self.assertIn("ollama output", output)
def test_openai_compatible_agent_sends_temperature(self) -> None:
with tempfile.TemporaryDirectory() as directory:

@ -484,7 +484,7 @@ Acceptance Criteria:
encoding="utf-8",
)
stages = (
StageConfig(id="write", type="code_writer", agent="writer"),
StageConfig(id="write", type="code_writer", agent="writer", output="proposed.patch"),
StageConfig(id="normalize", type="patch_normalizer"),
StageConfig(id="validate", type="patch_validator", on_fail="write"),
)
@ -506,6 +506,10 @@ Acceptance Criteria:
any("creates existing file" in stage.reason for stage in result.stage_results)
)
self.assertTrue((task_dir / "repair-1.patch").exists())
self.assertTrue((task_dir / "normalized.patch").exists())
self.assertTrue((task_dir / "normalized-1.patch").exists())
self.assertTrue((task_dir / "patch-validation.md").exists())
self.assertTrue((task_dir / "patch-validation-1.md").exists())
def test_patch_apply_stage_applies_patch(self) -> None:
with tempfile.TemporaryDirectory() as directory:
@ -615,6 +619,10 @@ Acceptance Criteria:
self.assertEqual((root / "app.py").read_text(encoding="utf-8"), "new\n")
self.assertTrue((task_dir / "repair-1.patch").exists())
self.assertTrue((task_dir / "repair-summary-1.md").exists())
self.assertTrue((task_dir / "normalized-1.patch").exists())
self.assertTrue((task_dir / "patch-validation-1.md").exists())
self.assertTrue((task_dir / "applied-1.patch").exists())
self.assertTrue((task_dir / "patch-apply-output-1.txt").exists())
def _write_common_files(root: Path) -> None: