diff --git a/QUICKSTART.md b/QUICKSTART.md
index ceaa198..01fb210 100644
--- a/QUICKSTART.md
+++ b/QUICKSTART.md
@@ -255,7 +255,7 @@ Acceptance Criteria:
 - Includes tests for both branches
 ```
 
-### 4. Add Prompt Files
+### 4. Add Prompt Files and Fake Agent Fixtures
 
 `agents/planner.md`:
 
@@ -271,7 +271,30 @@ You are the implementation agent.
 Output only a unified diff. Preserve existing behavior and include tests when needed.
 ```
 
-For deterministic local fixtures, add `agents/fake_planner.py` that requests file lookups and `agents/fake_code_writer.py` that prints a unified diff. The included `examples/quickstart-lisp/` project contains working fixtures.
+The config above also expects two deterministic Python fixtures:
+
+```text
+agents/fake_planner.py
+agents/fake_code_writer.py
+```
+
+If you have the NightShift checkout locally, copy the working fixtures from the included example project.
+
+PowerShell:
+
+```powershell
+Copy-Item C:\Users\metis\Documents\GitHub\nightShift\examples\quickstart-lisp\agents\fake_planner.py agents\
+Copy-Item C:\Users\metis\Documents\GitHub\nightShift\examples\quickstart-lisp\agents\fake_code_writer.py agents\
+```
+
+Bash:
+
+```bash
+cp /path/to/nightShift/examples/quickstart-lisp/agents/fake_planner.py agents/
+cp /path/to/nightShift/examples/quickstart-lisp/agents/fake_code_writer.py agents/
+```
+
+These fixtures make the manual project behave like `examples/quickstart-lisp/`: the fake planner requests repository context, and the fake code writer emits a real unified diff.
 
 `agents/reviewer.md`:
 
diff --git a/README.md b/README.md
index 48d6524..0c14aa6 100644
--- a/README.md
+++ b/README.md
@@ -68,7 +68,13 @@ python -m nightshift.cli --help
 
 NightShift uses the Python standard library for runtime behavior where practical. PyYAML is used automatically if installed, but starter configs work with the built-in YAML subset parser.
 
-## Quickstart
+## Getting Started
+
+Start with the [Quickstart](QUICKSTART.md). It uses deterministic fake agents so you can verify lookup, context generation, patch validation, patch apply, tests, and artifacts without installing a model.
+
+After that works, continue with [Tutorial 01: Running NightShift With Real Local Models](docs/tutorial/01-intro.md). It swaps the fake agents for Ollama-backed agents such as `qwen2.5-coder:14b` and walks through dry-run and apply-mode patch generation.
+
+### Quickstart Commands
 
 Validate the included end-to-end patch example:
 
@@ -298,6 +304,7 @@ python -m compileall nightshift tests
 Additional docs:
 
 - [Quickstart](QUICKSTART.md)
+- [Tutorial: running real local models](docs/tutorial/01-intro.md)
 - [Config reference](docs/config-reference.md)
 - [Artifact review workflow](docs/artifact-review.md)
 - [Troubleshooting](docs/troubleshooting.md)
diff --git a/docs/tutorial/01-intro.md b/docs/tutorial/01-intro.md
new file mode 100644
index 0000000..add93d6
--- /dev/null
+++ b/docs/tutorial/01-intro.md
@@ -0,0 +1,318 @@
+# Tutorial 01: Running NightShift With Real Local Models
+
+This tutorial starts after the quickstart. The quickstart uses fake command agents so you can verify the pipeline deterministically. Here, you will replace those fake agents with real Ollama-backed agents and let a model generate a real patch.
+
+The examples use `qwen2.5-coder:14b`, but any local coding model that can follow a strict unified-diff contract can be used.
+
+## What You Will Build
+
+You will run NightShift against a copy of the tiny Lisp example and use a local model to:
+
+1. Inspect task and repository context.
+2. Produce a plan.
+3. Generate a unified diff.
+4. Normalize and validate that patch.
+5. Dry-run the patch.
+6. Optionally apply the patch and run tests.
+
+NightShift still controls the workflow. The model proposes code; NightShift validates and applies the patch.
+
+## Prerequisites
+
+Install NightShift from this repository:
+
+```bash
+python -m pip install -e .
+```
+
+Install and start Ollama, then make sure the model is available:
+
+```bash
+ollama pull qwen2.5-coder:14b
+ollama run qwen2.5-coder:14b
+```
+
+Stop the interactive `ollama run` session after confirming the model responds. NightShift will invoke Ollama itself.
+
+## 1. Create a Scratch Target Project
+
+Do not run apply-mode experiments directly against the checked-in example. Copy it somewhere disposable.
+
+PowerShell:
+
+```powershell
+Copy-Item -Recurse C:\Users\metis\Documents\GitHub\nightShift\examples\quickstart-lisp C:\Users\metis\Documents\tiny-lisp-model
+Set-Location C:\Users\metis\Documents\tiny-lisp-model
+```
+
+Bash:
+
+```bash
+cp -r /path/to/nightShift/examples/quickstart-lisp ~/tiny-lisp-model
+cd ~/tiny-lisp-model
+```
+
+Validate the copied project:
+
+```bash
+python -m nightshift.cli validate --config nightshift.yaml
+```
+
+## 2. Replace Fake Agents With Ollama Agents
+
+Edit `nightshift.yaml`.
+
+Replace the `agents:` section with:
+
+```yaml
+agents:
+  planner:
+    backend: ollama
+    model: qwen2.5-coder:14b
+    temperature: 0.2
+    system_prompt: agents/planner.md
+
+  implementer:
+    backend: ollama
+    model: qwen2.5-coder:14b
+    temperature: 0.1
+    system_prompt: agents/implementer.md
+
+  reviewer:
+    backend: ollama
+    model: qwen2.5-coder:14b
+    temperature: 0.1
+    system_prompt: agents/reviewer.md
+```
+
+Then update the experiment labels:
+
+```yaml
+experiment:
+  label: quickstart-lisp-real-model
+  prompt_variant: ollama-qwen25-coder-14b-v1
+```
+
+## 3. Strengthen The Prompts
+
+Real models need stricter instructions than fake fixtures.
+
+Use this for `agents/planner.md`:
+
+```markdown
+You are the planning agent for NightShift.
+
+Create a concise implementation plan for the current task.
+
+If you need repository context before planning, output lookup requests exactly like this:
+
+lookup_requests:
+- tool: read_file
+  path: relative/path.py
+- tool: grep
+  path: .
+  pattern: search_regex
+
+After context is provided, write a short plan with:
+- files to edit
+- tests to add or update
+- risks
+
+Do not write code.
+```
+
+Use this for `agents/implementer.md`:
+
+```markdown
+You are the implementation agent for NightShift.
+
+Output only a unified diff.
+Do not wrap the patch in markdown fences.
+Do not include explanations before or after the patch.
+Use diff --git headers.
+Include tests when needed.
+Keep the change as small as possible.
+Only edit files needed for the task.
+```
+
+Use this for `agents/reviewer.md`:
+
+```markdown
+You are the review agent for NightShift.
+
+Review the task, plan, patch artifacts, test output, and final state.
+
+Output exactly:
+
+status: pass | fail | retry | escalate
+reason:
+next_stage:
+context_update:
+
+Use retry when the implementation is close but needs another patch.
+Use fail when the patch is unsafe, unrelated, or clearly broken.
+Use pass only when the acceptance criteria are satisfied.
+```
+
+## 4. Start With Dry Run Mode
+
+Before letting a model edit files, set patch apply to dry run.
+
+In `nightshift.yaml`:
+
+```yaml
+- id: apply_patch
+  type: patch_apply
+  mode: dry_run
+  output: patch-apply-output.txt
+```
+
+Run one task:
+
+```bash
+python -m nightshift.cli run --config nightshift.yaml --task TASK-001
+```
+
+Inspect these artifacts:
+
+```text
+.nightshift/runs//run.log
+.nightshift/runs//tasks/TASK-001/plan.md
+.nightshift/runs//tasks/TASK-001/context-pack.md
+.nightshift/runs//tasks/TASK-001/proposed.patch
+.nightshift/runs//tasks/TASK-001/normalized.patch
+.nightshift/runs//tasks/TASK-001/patch-validation.md
+.nightshift/runs//tasks/TASK-001/patch-apply-output.txt
+.nightshift/runs//tasks/TASK-001/final-notes.md
+```
+
+In dry-run mode, the patch should be validated and checked with `git apply --check`, but files should not change.
+
+## 5. Apply The Patch
+
+If the dry run looks good, switch to apply mode:
+
+```yaml
+- id: apply_patch
+  type: patch_apply
+  mode: apply
+  output: patch-apply-output.txt
+```
+
+Run again:
+
+```bash
+python -m nightshift.cli run --config nightshift.yaml --task TASK-001
+```
+
+If the model generates a valid patch, NightShift will:
+
+- write `applied.patch`
+- apply the patch with `git apply`
+- run `python -m unittest discover -v`
+- retry through the implementer if the test stage fails and `max_task_retries` allows it
+- mark the task complete only if the pipeline completes
+
+## 6. Monitor From The Web Dashboard
+
+Install Flask if needed:
+
+```bash
+python -m pip install flask
+```
+
+Start the read-only dashboard:
+
+```bash
+python -m nightshift.cli web --config nightshift.yaml
+```
+
+Open the displayed local URL. The dashboard reads artifacts from `.nightshift/runs/` and shows the latest run summary and log tail.
+
+## 7. Recommended First Settings
+
+For real models, start conservatively:
+
+```yaml
+pipeline:
+  max_task_retries: 1
+  continue_on_task_failure: false
+```
+
+Patch validator:
+
+```yaml
+- id: validate_patch
+  type: patch_validator
+  output: patch-validation.md
+  max_files: 4
+  max_lines: 400
+  forbidden_paths:
+    - .git
+    - .nightshift
+    - .env
+```
+
+Safety:
+
+```yaml
+safety:
+  require_clean_worktree: false
+  scoped_paths:
+    - .
+  allowed_commands:
+    - python -m unittest discover -v
+  forbidden_commands:
+    - rm -rf
+    - git push
+    - curl | bash
+```
+
+Once you trust the workflow, consider setting `require_clean_worktree: true` in real repositories.
+
+## Troubleshooting
+
+If Ollama is not found:
+
+```text
+Agent exited with code 127
+```
+
+Confirm `ollama` is installed and available on `PATH`.
+
+If the model returns prose instead of a patch, tighten `agents/implementer.md`. The implementation stage requires a unified diff.
+
+If patch validation fails, inspect:
+
+```text
+patch-validation.md
+normalized.patch
+proposed.patch
+```
+
+If patch apply fails, inspect:
+
+```text
+patch-apply-output.txt
+applied.patch
+```
+
+If tests fail, inspect:
+
+```text
+test-output.txt
+repair-1.patch
+repair-summary-1.md
+```
+
+Repair artifacts only appear when a later stage routes back to `implement` and the retry limit allows another attempt.
+
+## What To Try Next
+
+After `TASK-001` works:
+
+```bash
+python -m nightshift.cli run --config nightshift.yaml --all
+```
+
+Keep reviewing patches before trusting longer runs. The point of NightShift is not blind autonomy; it is controlled, reviewable leverage.
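
Review note: the tutorial asks readers to keep reviewing patches and sets `max_files: 4`, `max_lines: 400`, and three `forbidden_paths` on the `validate_patch` stage. A rough standalone check of a `proposed.patch` against those limits might look like the sketch below. This is not NightShift's validator: `summarize_patch` and `check_limits` are hypothetical helper names, counting added plus removed lines is an assumption about what `max_lines` measures, and treating a forbidden path as "that exact path or anything under it" is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PatchSummary:
    files: list = field(default_factory=list)
    changed_lines: int = 0


def summarize_patch(patch_text):
    """Collect touched paths and count added/removed lines in a git-style diff."""
    summary = PatchSummary()
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            # Header shape: diff --git a/<path> b/<path>
            # Simplification: paths containing " b/" are not handled.
            in_hunk = False
            summary.files.append(line.split(" b/", 1)[-1])
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith(("+", "-")):
            # Only lines inside a hunk count, so the ---/+++ file
            # headers that precede the first @@ are skipped.
            summary.changed_lines += 1
    return summary


def check_limits(summary, max_files=4, max_lines=400,
                 forbidden_paths=(".git", ".nightshift", ".env")):
    """Return a list of human-readable problems; empty means within limits."""
    problems = []
    if len(summary.files) > max_files:
        problems.append(f"too many files: {len(summary.files)} > {max_files}")
    if summary.changed_lines > max_lines:
        problems.append(f"too many lines: {summary.changed_lines} > {max_lines}")
    for path in summary.files:
        for blocked in forbidden_paths:
            # Assumed rule: the exact path or anything beneath it is blocked.
            if path == blocked or path.startswith(blocked + "/"):
                problems.append(f"forbidden path: {path}")
    return problems


SAMPLE_PATCH = """\
diff --git a/src/a.py b/src/a.py
--- a/src/a.py
+++ b/src/a.py
@@ -1,2 +1,2 @@
-old
+new
 ctx
diff --git a/.env b/.env
--- a/.env
+++ b/.env
@@ -0,0 +1 @@
+SECRET=1
"""

if __name__ == "__main__":
    for problem in check_limits(summarize_patch(SAMPLE_PATCH)):
        print(problem)
```

To try it on a real run, read the `proposed.patch` artifact into a string and pass it to `summarize_patch`. The authoritative result is still whatever NightShift writes to `patch-validation.md`.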