The ollama backend now uses Ollama’s HTTP API instead of ollama run

2026-06-14 10:08:37 +00:00 · 2026-05-17 14:23:31 -07:00 · 2026-05-17 14:23:31 -07:00 · 42564c6867
commit 42564c6867
parent db9b24379e
10 changed files with 132 additions and 114 deletions
--- a/QUICKSTART.md
+++ b/QUICKSTART.md
@@ -83,6 +83,8 @@ tasks/TASK-001/context-out.md
 tasks/TASK-001/final-notes.md
 ```
+Retry attempts preserve separate artifacts with numeric suffixes, such as `repair-1.patch`, `normalized-1.patch`, `patch-validation-1.md`, `applied-1.patch`, and `patch-apply-output-1.txt`.
 ## Example Templates
 Example run files are available in `examples/templates/`.
@@ -363,4 +365,6 @@ After a run, inspect:
 .nightshift/runs/<run-id>/tasks/TASK-001/final-notes.md
 ```
+If validation or later stages retry implementation, inspect the suffixed retry artifacts too, for example `repair-1.patch` and `patch-validation-1.md`.
 The useful signal is whether NightShift selected the right task, respected dependencies, generated context, validated and applied a patch, ran tests, wrote artifacts, updated task completion, and produced a clear summary.
--- a/README.md
+++ b/README.md
@@ -195,10 +195,13 @@ agents:
  implementer:
    backend: ollama
    model: qwen2.5-coder:14b
+    base_url: http://localhost:11434
    temperature: 0.2
    system_prompt: agents/implementer.md
 ```
+The Ollama backend uses the local HTTP API instead of `ollama run`, which keeps exact patch output away from terminal rendering and line wrapping.
 Example OpenAI-compatible agent:
 ```yaml
@@ -212,7 +215,7 @@ agents:
    system_prompt: agents/implementer.md
 ```
-NightShift passes prompt bundles to agents and persists stdout, stderr, exit code, duration, and prompt artifacts. Code writer agents should return unified diffs.
+NightShift passes prompt bundles to agents and persists stdout, stderr, exit code, duration, and prompt artifacts. Code writer agents should return unified diffs. On retries, patch artifacts are versioned by attempt, for example `repair-1.patch`, `normalized-1.patch`, and `patch-validation-1.md`.
 Review agents should emit:
@@ -274,10 +277,15 @@ A run creates human-readable artifacts:
          context-pack.md
          plan.md
          proposed.patch
+          repair-1.patch
          normalized.patch
+          normalized-1.patch
          patch-validation.md
+          patch-validation-1.md
          applied.patch
+          applied-1.patch
          patch-apply-output.txt
+          patch-apply-output-1.txt
          test-output.txt
          review.md
          stage-results.md
--- a/docs/config-reference.md
+++ b/docs/config-reference.md
@@ -27,7 +27,8 @@ NightShift config is YAML.
 Supported backends:
 - `command`: runs a local command with the prompt on stdin.
- `ollama`: runs `ollama run <model>` with the prompt on stdin.
+- `ollama`: calls the local Ollama HTTP API at `http://localhost:11434/api/generate` by default.
 - `openai_compatible`: calls a Chat Completions-compatible HTTP API.
 Command agent:
@@ -44,6 +45,7 @@ Ollama agent:
 planner:
  backend: ollama
  model: qwen2.5-coder:14b
+  base_url: http://localhost:11434
  system_prompt: agents/planner.md
 ```
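Reviewer sketch (not part of the commit): the endpoint and request body that the `ollama` backend appears to derive from this config can be approximated as below. `build_generate_request` and `DEFAULT_BASE_URL` are hypothetical names used only for illustration; the URL handling and the `stream`/`options` fields mirror what the `nightshift/agents.py` hunk in this commit shows. The request is built but never sent.

```python
import json
from urllib import request

# Assumption: same default as the fallback shown in the agents.py hunk.
DEFAULT_BASE_URL = "http://localhost:11434"


def build_generate_request(model, prompt, base_url=None, temperature=None):
    """Build (but do not send) a non-streaming POST to Ollama's /api/generate."""
    # A trailing slash on base_url is tolerated, mirroring rstrip("/") in the diff.
    url = (base_url or DEFAULT_BASE_URL).rstrip("/") + "/api/generate"
    body = {"model": model, "prompt": prompt, "stream": False}
    if temperature is not None:
        # Sampling settings travel in the "options" object of the request body.
        body["options"] = {"temperature": temperature}
    return request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("qwen2.5-coder:14b", "Say hi", "http://localhost:11434/", 0.2)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

With `stream` set to `False`, the server is expected to answer with a single JSON object whose `response` field holds the full completion, which is what `_extract_ollama_response` reads.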
--- a/docs/design.md
+++ b/docs/design.md
@@ -865,7 +865,7 @@ NightShift currently provides:
 * Command, agent, agent-review, review, summarize, repo-context, code-writer, patch-normalizer, patch-validator, and patch-apply stage handling
 * Retry redirection with a configured task retry limit
 * Command-backed agents
-* Ollama-backed local model agents
+* Ollama-backed local model agents through the local HTTP API
 * OpenAI-compatible local/server model agents
 * Per-agent temperature settings
 * Scoped repo lookup tools: `list_files`, `read_file`, and `grep`
@@ -874,6 +874,7 @@ NightShift currently provides:
 * Context pack generation
 * Unified diff code-writing contract
 * Patch normalization, validation, dry-run, and apply modes
+* Per-attempt retry patch artifacts such as `repair-1.patch`, `normalized-1.patch`, and `patch-validation-1.md`
 * Test/static failure repair loops via bounded stage retries
 * Prompt bundle construction with project, task, retry, and previous-stage context
 * Prompt snapshots and run metadata for experiment comparison
@@ -1014,13 +1015,13 @@ The next important additions are:
   Move max files, max lines, forbidden paths, allowed file types, binary rejection, and protected files into a reusable project-level write policy.
 5. Better model backend support
-   Expand OpenAI-compatible behavior, add request metadata artifacts, support response format hints, and document local server patterns. Prefer non-terminal APIs for machine-readable model output. In particular, avoid relying on interactive CLI streaming paths such as `ollama run` when exact patch text matters; use the Ollama HTTP API or OpenAI-compatible endpoint so terminal rendering, spinners, and line-wrapping behavior cannot corrupt artifacts.
+   Expand OpenAI-compatible behavior, add request metadata artifacts, support response format hints, and document local server patterns. Machine-readable Ollama output now uses the HTTP API instead of the interactive `ollama run` terminal path; keep this non-terminal capture policy for future model backends where exact patch text matters.
 6. Deterministic diff generation
   Reduce direct reliance on models emitting perfect unified diffs. Add a workflow where the model returns complete file contents or a structured edit description, then NightShift writes the unified diff deterministically from before/after file snapshots. Keep the existing unified-diff contract for advanced agents, but make deterministic diff generation the preferred path for smaller local models.
 7. Retry artifact versioning
-   Preserve per-attempt artifacts instead of overwriting fixed filenames such as `proposed.patch`, `normalized.patch`, and `patch-validation.md`. Retry artifacts should include attempt numbers, while summary artifacts can point to the latest attempt. This makes repeated validation and repair failures diagnosable.
+   Continue improving per-attempt artifact preservation. Patch retries now preserve files such as `repair-1.patch`, `normalized-1.patch`, and `patch-validation-1.md`; future work should add richer latest-attempt indexes and dashboard navigation.
 8. Patch repair stage
   Add an explicit patch repair or strict normalizer stage that receives the invalid patch, validation error, and relevant source excerpts, then returns a complete replacement patch. This stage should remain bounded by strict validation and should not silently guess intent for arbitrary malformed hunks.
@@ -1042,7 +1043,7 @@ The next important additions are:
 Implementation note:
-Recent local-model patch experiments exposed repeated line-fragment artifacts where long generated lines were split and the tail was duplicated on the following line. This affected prose and unified diffs, producing malformed hunk lines that strict validation correctly rejected. Treat this as a backend/output-capture and patch-contract problem before adding editor or linter agents: remove terminal streaming from model capture, preserve retry artifacts, and prefer deterministic diff generation when exact syntax matters.
+Recent local-model patch experiments exposed repeated line-fragment artifacts where long generated lines were split and the tail was duplicated on the following line. This affected prose and unified diffs, producing malformed hunk lines that strict validation correctly rejected. Treat this as a backend/output-capture and patch-contract problem before adding editor or linter agents: avoid terminal streaming for machine output, preserve retry artifacts, and prefer deterministic diff generation when exact syntax matters.
 --- 
 # Appendix A: Design Decisions and Rationale
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -18,7 +18,7 @@ If `require_clean_worktree: true`, NightShift blocks dirty repositories before c
 ## Ollama backend fails
-The `ollama` backend requires the `ollama` executable to be installed and the configured model to be available. Tests do not require Ollama.
+The `ollama` backend uses Ollama's local HTTP API, normally at `http://localhost:11434/api/generate`. Confirm Ollama is running and the configured model is available with `ollama list` or `ollama pull <model>`. Tests do not require Ollama.
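Reviewer sketch (not part of the commit): the manual check described above could also be scripted against the same HTTP API. This assumes Ollama's `GET /api/tags` endpoint, which lists locally available models; `model_names` and `check_ollama` are hypothetical helper names, not NightShift functions.

```python
import json
from urllib import request


def model_names(tags_json):
    """Extract model names from the JSON body returned by GET /api/tags."""
    data = json.loads(tags_json)
    if not isinstance(data, dict):
        return []
    return [m["name"] for m in data.get("models", []) if isinstance(m, dict) and "name" in m]


def check_ollama(base_url, model, timeout=5.0):
    """Return a short diagnosis string instead of raising on transport errors."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with request.urlopen(url, timeout=timeout) as response:
            names = model_names(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers URLError, refused connections, and timeouts;
        # ValueError covers a body that is not valid JSON.
        return f"Ollama not reachable at {base_url}: {exc}"
    if model not in names:
        return f"Ollama is running, but '{model}' is not listed. Available: {names}"
    return f"Ollama is running and '{model}' is available."


# Example (performs a real HTTP request when uncommented):
# print(check_ollama("http://localhost:11434", "qwen2.5-coder:14b"))
```

The comparison is an exact string match, so a configured model without a tag (for example `llama3`) would not match a listed `llama3:latest`; whether NightShift should normalize that is left open here.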
 ## Flask dashboard fails
--- a/examples/tutorial/01-intro.md
+++ b/examples/tutorial/01-intro.md
@@ -29,10 +29,10 @@ Install and start Ollama, then make sure the model is available:
 ```bash
 ollama pull qwen2.5-coder:14b
-ollama run qwen2.5-coder:14b
+ollama list
 ```
-Stop the interactive `ollama run` session after confirming the model responds. NightShift will invoke Ollama itself.
+Keep Ollama running. NightShift uses Ollama's local HTTP API, normally at `http://localhost:11434`, rather than the interactive `ollama run` terminal path.
 ## 1. Create a Scratch Target Project
@@ -189,6 +189,8 @@ Inspect these artifacts:
 .nightshift/runs/<run-id>/tasks/TASK-001/final-notes.md
 ```
+If a later stage routes back to `implement`, retry artifacts are written with attempt suffixes such as `repair-1.patch`, `normalized-1.patch`, `patch-validation-1.md`, `applied-1.patch`, and `patch-apply-output-1.txt`.
 In dry-run mode, the patch should be validated and checked with `git apply --check`, but files should not change.
 ## 5. Apply The Patch
@@ -215,6 +217,7 @@ If the model generates a valid patch, NightShift will:
 - apply the patch with `git apply`
 - run `python -m unittest discover -v`
 - retry through the implementer if the test stage fails and `max_task_retries` allows it
+- preserve per-attempt retry patch artifacts with numeric suffixes
 - mark the task complete only if the pipeline completes
 ## 6. Monitor From The Web Dashboard
@@ -277,13 +280,13 @@ Once you trust the workflow, consider setting `require_clean_worktree: true` in
 ## Troubleshooting
-If Ollama is not found:
+If Ollama is unavailable:
 ```text
-Agent exited with code 127
+Agent exited with code 1
 ```
-Confirm `ollama` is installed and available on `PATH`.
+Confirm Ollama is running at the configured `base_url` and the model appears in `ollama list`.
 If the model returns prose instead of a patch, tighten `agents/implementer.md`. The implementation stage requires a unified diff.
@@ -291,8 +294,11 @@ If patch validation fails, inspect:
 ```text
 patch-validation.md
+patch-validation-1.md
 normalized.patch
+normalized-1.patch
 proposed.patch
+repair-1.patch
 ```
 If patch apply fails, inspect:
--- a/nightshift/agents.py
+++ b/nightshift/agents.py
@@ -8,7 +8,6 @@ import os
 from pathlib import Path
 import re
 import subprocess
-import tempfile
 import time
 from urllib import request
 from urllib.error import URLError
@@ -23,7 +22,6 @@ from .tasks import Task
 DEFAULT_AGENT_TIMEOUT_SECONDS = 600
-OLLAMA_HEARTBEAT_SECONDS = 30.0
@dataclass(frozen=True)
@@ -222,96 +220,58 @@ class AgentExecutor:
    def _invoke_ollama(self, agent: AgentConfig, prompt: str) -> AgentInvocation:
        if not agent.model:
            raise AgentError(f"Agent error: ollama backend agent '{agent.id}' has no model.")
-        command = f"ollama run {agent.model}"
+        base_url = (agent.base_url or "http://localhost:11434").rstrip("/")
-        prompt_input = prompt
+        url = base_url + "/api/generate"
+        command = f"POST {url}"
+        body: dict[str, object] = {
+            "model": agent.model,
+            "prompt": prompt,
+            "stream": False,
+        }
        if agent.temperature is not None:
-            prompt_input = f"/set parameter temperature {agent.temperature}\n{prompt}"
+            body["options"] = {"temperature": agent.temperature}
+        headers = {"Content-Type": "application/json"}
        started = time.monotonic()
        self.logger.event(
            "ollama.start",
-            "Starting Ollama model invocation",
+            "Starting Ollama HTTP model invocation",
            agent_id=agent.id,
            model=agent.model,
            timeout_seconds=self.timeout_seconds,
        )
        try:
-            with tempfile.TemporaryFile("w+", encoding="utf-8", errors="replace") as stdout_file:
+            payload = json.dumps(body).encode("utf-8")
-                with tempfile.TemporaryFile("w+", encoding="utf-8", errors="replace") as stderr_file:
+            req = request.Request(url, data=payload, headers=headers, method="POST")
-                    process = subprocess.Popen(
+            with request.urlopen(req, timeout=self.timeout_seconds) as response:
-                        ["ollama", "run", agent.model],
+                raw = response.read().decode("utf-8", errors="replace")
-                        cwd=self.project_root,
-                        stdin=subprocess.PIPE,
-                        stdout=stdout_file,
-                        stderr=stderr_file,
-                        text=True,
-                        encoding="utf-8",
-                        errors="replace",
-                    )
-                    assert process.stdin is not None
-                    process.stdin.write(prompt_input)
-                    process.stdin.close()
-                    last_heartbeat = started
-                    timed_out = False
-                    while process.poll() is None:
-                        now = time.monotonic()
-                        elapsed = now - started
-                        if elapsed > self.timeout_seconds:
-                            process.kill()
-                            timed_out = True
-                            break
-                        if now - last_heartbeat >= OLLAMA_HEARTBEAT_SECONDS:
-                            self.logger.event(
-                                "ollama.wait",
-                                "Ollama invocation still running",
-                                agent_id=agent.id,
-                                model=agent.model,
-                                elapsed=f"{elapsed:.0f}s",
-                            )
-                            last_heartbeat = now
-                        time.sleep(1.0)
-                    process.wait()
+            duration = time.monotonic() - started
-                    stdout_file.seek(0)
-                    stderr_file.seek(0)
-                    stdout = stdout_file.read()
-                    stderr = stderr_file.read()
-                    if timed_out:
+            return AgentInvocation(
+                agent_id=agent.id,
+                command=command,
-                            prompt=prompt_input,
+                prompt=prompt,
+                exit_code=0,
+                stdout=_extract_ollama_response(raw),
+                stderr="",
+                duration_seconds=duration,
+            )
+        except TimeoutError:
+            duration = time.monotonic() - started
+            return AgentInvocation(
+                agent_id=agent.id,
+                command=command,
+                prompt=prompt,
+                exit_code=-1,
-                            stdout=stdout,
+                stdout="",
-                            stderr=stderr,
+                stderr="Request timed out.",
+                duration_seconds=duration,
+                timed_out=True,
+            )
-                    return AgentInvocation(
+        except (OSError, URLError) as exc:
-                        agent_id=agent.id,
-                        command=command,
-                        prompt=prompt_input,
-                        exit_code=process.returncode or 0,
-                        stdout=stdout,
-                        stderr=stderr,
-                        duration_seconds=duration,
-                    )
        except FileNotFoundError as exc:
            duration = time.monotonic() - started
            return AgentInvocation(
                agent_id=agent.id,
                command=command,
-                prompt=prompt_input,
+                prompt=prompt,
                exit_code=127,
                stdout="",
                stderr=str(exc),
                duration_seconds=duration,
            )
        except OSError as exc:
            duration = time.monotonic() - started
            return AgentInvocation(
                agent_id=agent.id,
                command=command,
                prompt=prompt_input,
                exit_code=1,
                stdout="",
                stderr=str(exc),
@@ -461,11 +421,22 @@ def _extract_openai_content(raw: str) -> str:
    return raw
+def _extract_ollama_response(raw: str) -> str:
+    try:
+        data = json.loads(raw)
+        response = data.get("response")
+        if isinstance(response, str):
+            return response
+    except (json.JSONDecodeError, AttributeError):
+        pass
+    return raw
 def output_contract_for(stage: StageConfig) -> str:
    if stage.type == "code_writer":
        return "\n".join(
            [
-                "Return a unified diff only, suitable for saving as proposed.patch.",
+                "Return a unified diff only, suitable for saving as proposed.patch or repair-N.patch.",
                "Do not include prose outside the patch.",
                "Use diff --git headers and hunk headers.",
                "For existing files, do not use new file mode or /dev/null headers.",
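Reviewer sketch (not part of the commit): the new `_invoke_ollama` handles `TimeoutError` before `(OSError, URLError)`. Two standard-library facts seem relevant when reading that ordering: `URLError` is already a subclass of `OSError`, and a timeout raised while connecting is typically wrapped in a `URLError` whose `reason` holds the original error, so it would not match a bare `except TimeoutError`. The helper below, with the hypothetical name `classify_request_error`, illustrates one way to tell the cases apart; it is an illustration under those assumptions, not the project's behavior.

```python
import socket
from urllib.error import URLError


def classify_request_error(exc):
    """Map an exception from urlopen()/read() to 'timeout', 'unreachable', or 'other'."""
    # A timeout while waiting for or reading the response usually surfaces
    # directly (socket.timeout is an alias of TimeoutError on Python 3.10+).
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    # A timeout while connecting is wrapped: URLError.reason holds the original error.
    if isinstance(exc, URLError) and isinstance(exc.reason, (TimeoutError, socket.timeout)):
        return "timeout"
    # URLError subclasses OSError, so this single check also covers refused
    # connections, DNS failures, and similar transport errors.
    if isinstance(exc, OSError):
        return "unreachable"
    return "other"


samples = [
    TimeoutError("timed out"),
    URLError(TimeoutError("timed out")),
    URLError(ConnectionRefusedError(111, "Connection refused")),
    ValueError("bad JSON"),
]
for sample in samples:
    print(type(sample).__name__, "->", classify_request_error(sample))
```

If wrapped timeouts should also set `timed_out=True` on the `AgentInvocation`, a check on `exc.reason` inside the `OSError` handler would be one place to do it.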
--- a/nightshift/pipeline.py
+++ b/nightshift/pipeline.py
@@ -370,11 +370,11 @@ class PipelineRunner:
        if stage.type == "code_writer":
            return self._run_code_writer_stage(stage, task, previous_outputs, retry_notes, retry_count)
        if stage.type == "patch_normalizer":
-            return self._run_patch_normalizer_stage(stage, task, previous_outputs, retry_notes)
+            return self._run_patch_normalizer_stage(stage, task, previous_outputs, retry_notes, retry_count)
        if stage.type == "patch_validator":
-            return self._run_patch_validator_stage(stage, task, previous_outputs)
+            return self._run_patch_validator_stage(stage, task, previous_outputs, retry_count)
        if stage.type == "patch_apply":
-            return self._run_patch_apply_stage(stage, task, previous_outputs)
+            return self._run_patch_apply_stage(stage, task, previous_outputs, retry_count)
        if stage.type == "repo_context":
            output_path = self.artifacts.write_stage_output(
                task.id,
@@ -488,7 +488,7 @@ class PipelineRunner:
                f"# Implementation Summary\n\nStatus: fail\nReason: {exc}\n",
            )
            return StageResult(stage.id, "fail", str(exc), output_path=result.output_path)
-        patch_filename = stage.output or ("proposed.patch" if retry_count == 0 else f"repair-{retry_count}.patch")
+        patch_filename = "repair-{0}.patch".format(retry_count) if retry_count else (stage.output or "proposed.patch")
        summary_filename = "implementation-summary.md" if retry_count == 0 else f"repair-summary-{retry_count}.md"
        proposed_path = self.artifacts.write_stage_output(task.id, patch_filename, patch)
        summary_path = self.artifacts.write_stage_output(
@@ -522,6 +522,7 @@ class PipelineRunner:
        task: Task,
        previous_outputs: dict[str, str],
        retry_notes: list[str],
+        retry_count: int = 0,
    ) -> StageResult:
        source = _latest_patch_like_output(previous_outputs)
        if stage.agent is not None:
@@ -539,7 +540,11 @@ class PipelineRunner:
            patch = normalize_patch_text(source)
        except PipelineError as exc:
            return StageResult(stage.id, "fail", str(exc))
-        output_path = self.artifacts.write_stage_output(task.id, stage.output or "normalized.patch", patch)
+        output_path = self.artifacts.write_stage_output(
+            task.id,
+            _attempt_filename(stage.output or "normalized.patch", retry_count),
+            patch,
+        )
        self.logger.event(
            "artifact.write",
            "Wrote normalized patch",
@@ -559,7 +564,9 @@ class PipelineRunner:
        stage: StageConfig,
        task: Task,
        previous_outputs: dict[str, str],
+        retry_count: int = 0,
    ) -> StageResult:
+        output_filename = _attempt_filename(stage.output or "patch-validation.md", retry_count)
        source = _latest_patch_like_output(previous_outputs)
        try:
            patch = normalize_patch_text(source)
@@ -574,7 +581,7 @@ class PipelineRunner:
        except PipelineError as exc:
            output_path = self.artifacts.write_stage_output(
                task.id,
-                stage.output or "patch-validation.md",
+                output_filename,
                f"# Patch Validation\n\nStatus: fail\nReason: {exc}\n",
            )
            return StageResult(
@@ -585,7 +592,7 @@ class PipelineRunner:
            )
        output_path = self.artifacts.write_stage_output(
            task.id,
-            stage.output or "patch-validation.md",
+            output_filename,
            format_validation_result(result),
        )
        return StageResult(
@@ -600,7 +607,9 @@ class PipelineRunner:
        stage: StageConfig,
        task: Task,
        previous_outputs: dict[str, str],
+        retry_count: int = 0,
    ) -> StageResult:
+        output_filename = _attempt_filename(stage.output or "patch-apply-output.txt", retry_count)
        source = _latest_patch_like_output(previous_outputs)
        try:
            patch = normalize_patch_text(source)
@@ -615,7 +624,7 @@ class PipelineRunner:
        except PipelineError as exc:
            output_path = self.artifacts.write_stage_output(
                task.id,
-                stage.output or "patch-apply-output.txt",
+                output_filename,
                f"# Patch Apply\n\nStatus: fail\nReason: {exc}\n",
            )
            return StageResult(
@@ -625,14 +634,18 @@ class PipelineRunner:
                output_path=str(output_path.relative_to(self.config.project.root)),
            )
-        applied_path = self.artifacts.write_stage_output(task.id, "applied.patch", patch)
+        applied_path = self.artifacts.write_stage_output(
+            task.id,
+            _attempt_filename("applied.patch", retry_count),
+            patch,
+        )
        write_git_artifacts(self.artifacts, task.id, "before-patch-apply")
        mode = stage.mode or "dry_run"
        apply_result = apply_patch_with_git(applied_path, self.config.project.root, mode=mode)
        write_git_artifacts(self.artifacts, task.id, "after-patch-apply")
        output_path = self.artifacts.write_stage_output(
            task.id,
-            stage.output or "patch-apply-output.txt",
+            output_filename,
            format_patch_apply_result(
                apply_result,
                applied_path.relative_to(self.config.project.root).as_posix(),
@@ -839,6 +852,19 @@ def _latest_patch_like_output(previous_outputs: dict[str, str]) -> str:
    raise PipelineError("Patch error: no previous patch output found.")
+def _attempt_filename(filename: str, retry_count: int) -> str:
+    if retry_count <= 0:
+        return filename
+    path = Path(filename)
+    suffix = "".join(path.suffixes)
+    if suffix:
+        stem = path.name[: -len(suffix)]
+        name = f"{stem}-{retry_count}{suffix}"
+    else:
+        name = f"{path.name}-{retry_count}"
+    return path.with_name(name).as_posix()
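Reviewer sketch (not part of the commit): to see which names the suffixing rule in `_attempt_filename` produces, the function can be copied into a standalone script. The logic below is transcribed from the hunk above under the hypothetical name `attempt_filename`; the sample filenames other than the NightShift artifact names are made up for illustration.

```python
from pathlib import Path


def attempt_filename(filename, retry_count):
    """Standalone copy of the suffixing rule from the diff, for experimentation."""
    if retry_count <= 0:
        return filename  # the first attempt keeps the plain name
    path = Path(filename)
    suffix = "".join(path.suffixes)  # e.g. ".patch" or ".tar.gz"
    if suffix:
        stem = path.name[: -len(suffix)]
        name = f"{stem}-{retry_count}{suffix}"
    else:
        name = f"{path.name}-{retry_count}"
    return path.with_name(name).as_posix()


for name, attempt in [
    ("normalized.patch", 0),
    ("normalized.patch", 1),
    ("patch-validation.md", 2),
    ("patch-apply-output.txt", 1),
    ("notes", 1),
    ("v1.2-notes.md", 1),
]:
    print(f"{name!r:28} attempt {attempt} -> {attempt_filename(name, attempt)}")
```

Because every dot-separated part after the first counts as a suffix, a custom `output` name with an extra dot, such as `v1.2-notes.md`, gets the attempt number after the first segment (`v1-1.2-notes.md`). That may or may not be the intended behavior for configured output names.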
 def format_aggregate_run_summary(results: list[PipelineResult], status: str, reason: str) -> str:
    lines = [
        "# Run Summary",
--- a/tests/test_agents.py
+++ b/tests/test_agents.py
@@ -1,5 +1,4 @@
 from pathlib import Path
-import io
 import tempfile
 import unittest
 from unittest.mock import MagicMock, patch
@@ -119,27 +118,20 @@ class AgentExecutorTests(unittest.TestCase):
            task = parse_tasks(TASK_MD)[0]
            stage = StageConfig(id="plan", type="agent", agent="planner", output="plan.md")
-            class FakePopen:
+            response = MagicMock()
-                def __init__(self, args, cwd=None, stdin=None, stdout=None, stderr=None, **kwargs):
+            response.__enter__.return_value.read.return_value = b'{"response":"ollama output"}'
-                    self.args = args
-                    self.stdin = io.StringIO()
-                    self.returncode = 0
-                    stdout.write("ollama output")
-                def poll(self):
+            with patch("nightshift.agents.request.urlopen", return_value=response) as urlopen:
-                    return self.returncode
-                def wait(self):
-                    return self.returncode
-            with patch("nightshift.agents.subprocess.Popen", side_effect=FakePopen) as popen:
                result = executor.run_stage(stage, task)
            self.assertEqual(result.status, "pass")
-            popen.assert_called_once()
+            request_obj = urlopen.call_args.args[0]
-            self.assertEqual(popen.call_args.args[0], ["ollama", "run", "tiny-model"])
+            body = request_obj.data.decode("utf-8")
+            self.assertIn('"model": "tiny-model"', body)
+            self.assertIn('"stream": false', body)
            output = (root / result.output_path).read_text(encoding="utf-8")
-            self.assertIn("ollama run tiny-model", output)
+            self.assertIn("POST http://localhost:11434/api/generate", output)
            self.assertIn("ollama output", output)
    def test_openai_compatible_agent_sends_temperature(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
--- a/tests/test_pipeline.py
+++ b/tests/test_pipeline.py
@@ -484,7 +484,7 @@ Acceptance Criteria:
                encoding="utf-8",
            )
            stages = (
-                StageConfig(id="write", type="code_writer", agent="writer"),
+                StageConfig(id="write", type="code_writer", agent="writer", output="proposed.patch"),
                StageConfig(id="normalize", type="patch_normalizer"),
                StageConfig(id="validate", type="patch_validator", on_fail="write"),
            )
@@ -506,6 +506,10 @@ Acceptance Criteria:
                any("creates existing file" in stage.reason for stage in result.stage_results)
            )
            self.assertTrue((task_dir / "repair-1.patch").exists())
+            self.assertTrue((task_dir / "normalized.patch").exists())
+            self.assertTrue((task_dir / "normalized-1.patch").exists())
+            self.assertTrue((task_dir / "patch-validation.md").exists())
+            self.assertTrue((task_dir / "patch-validation-1.md").exists())
    def test_patch_apply_stage_applies_patch(self) -> None:
        with tempfile.TemporaryDirectory() as directory:
@@ -615,6 +619,10 @@ Acceptance Criteria:
            self.assertEqual((root / "app.py").read_text(encoding="utf-8"), "new\n")
            self.assertTrue((task_dir / "repair-1.patch").exists())
            self.assertTrue((task_dir / "repair-summary-1.md").exists())
+            self.assertTrue((task_dir / "normalized-1.patch").exists())
+            self.assertTrue((task_dir / "patch-validation-1.md").exists())
+            self.assertTrue((task_dir / "applied-1.patch").exists())
+            self.assertTrue((task_dir / "patch-apply-output-1.txt").exists())
 def _write_common_files(root: Path) -> None: