mirror of
https://github.com/khodges42/nightShift.git
synced 2026-06-14 18:18:36 +00:00
Add tutorial integration workflow helpers
- Add `integ-test` to create, set up, validate, and run integration template tasks
- Add `integ-report` to summarize latest integration run artifacts
- Switch default pastebin template from model fallback to single `qwen3-coder:30b`
- Support optional Ollama fields: `num_ctx`, `num_predict`, `seed`, and `stop`
- Add `nightshift validate` preflight for task-specific test files
- Update pastebin docs, config reference, and ideas tracking
- Add tests for integration helpers, task-test validation, config parsing, and template expectations
parent e3679296fd
commit f7fed4535b
|
|
@ -1,195 +0,0 @@
|
||||||
# Bugfix TODO
|
|
||||||
|
|
||||||
## Some issues going on with run --all
|
|
||||||
reason=Stage 'review' requested unknown next stage 'None'. Not every time. I think there's a pattern that is out of place here. Maybe it's related to the last task success? Or the last run?
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
## Going from individual tasks to --all fails
|
|
||||||
|
|
||||||
If you run `nightshift run --task TASK-001`, let it complete, and then run `nightshift run --all`, the second run fails with `blocked by missing dependencies: TASK-001`. I think this is because the tasks get reset at the top of the run, but something that marks completion of TASK-001 requires a manual reset.
|
|
||||||
|
|
||||||
run --all should start at the first not done task (seems like it does)
|
|
||||||
|
|
||||||
## Some kind of tool install feature
|
|
||||||
|
|
||||||
Continually fails on flask_sqlalchemy until I install that.
|
|
||||||
|
|
||||||
## Tutorial needs to include . directory for imageboard
|
|
||||||
|
|
||||||
## Git status artifacts are noisy for non-git repositories
|
|
||||||
|
|
||||||
Observed artifact:
|
|
||||||
|
|
||||||
```text
|
|
||||||
# Git Status before
|
|
||||||
|
|
||||||
Available: false
|
|
||||||
Exit code: 128
|
|
||||||
|
|
||||||
fatal: not a git repository (or any of the parent directories): .git
|
|
||||||
```
|
|
||||||
|
|
||||||
Current behavior:
|
|
||||||
|
|
||||||
- NightShift continues when `require_clean_worktree: false`.
|
|
||||||
- `git-status-before.txt`, `git-status-after.txt`, and `diff.patch` may contain git errors.
|
|
||||||
- This is technically safe, but confusing for users running quickstart/demo projects outside git.
|
|
||||||
|
|
||||||
Desired behavior:
|
|
||||||
|
|
||||||
- Detect non-git repositories explicitly.
|
|
||||||
- Write a clearer artifact message such as:
|
|
||||||
|
|
||||||
```text
|
|
||||||
Git repository: false
|
|
||||||
Clean-worktree enforcement: skipped because require_clean_worktree is false
|
|
||||||
Diff artifact: unavailable because project is not a git repository
|
|
||||||
```
|
|
||||||
|
|
||||||
- Avoid treating non-git as a scary-looking failure when clean worktree is not required.
|
|
||||||
|
|
||||||
Acceptance criteria:
|
|
||||||
|
|
||||||
- Non-git projects produce readable git artifacts without fatal-looking output.
|
|
||||||
- `require_clean_worktree: true` still fails safely in non-git projects.
|
|
||||||
- Reports mention that git metadata/diff is unavailable because the project is not a git repo.
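
The desired artifact text above could be produced by a small renderer. This is a minimal sketch, not NightShift's implementation: the `GitProbe` structure, the `render_git_artifact` name, and the wording of the `require_clean_worktree: true` failure line are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GitProbe:
    """Hypothetical result of probing the project root for a git repo."""
    is_repo: bool
    require_clean_worktree: bool


def render_git_artifact(probe: GitProbe) -> str:
    """Return readable artifact text instead of raw `fatal:` git output."""
    if probe.is_repo:
        return "Git repository: true"
    lines = ["Git repository: false"]
    if probe.require_clean_worktree:
        # Assumed wording: enforcement cannot succeed without git,
        # so the run should still fail safely.
        lines.append(
            "Clean-worktree enforcement: failed because project is not a git repository"
        )
    else:
        lines.append(
            "Clean-worktree enforcement: skipped because require_clean_worktree is false"
        )
    lines.append("Diff artifact: unavailable because project is not a git repository")
    return "\n".join(lines)
```

The probe itself could be filled in by running `git rev-parse --is-inside-work-tree` in the project root and treating a non-zero exit code as "not a repository".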
|
|
||||||
|
|
||||||
## Git safe.directory / ownership conflicts on Windows
|
|
||||||
|
|
||||||
Observed context:
|
|
||||||
|
|
||||||
- Git can report dubious ownership or safe-directory errors when a repo was created or managed by a different Windows user identity.
|
|
||||||
- This may happen when using GitHub Desktop, WSL, admin shells, or multiple Windows accounts.
|
|
||||||
|
|
||||||
Current behavior:
|
|
||||||
|
|
||||||
- NightShift records the raw git error in artifacts.
|
|
||||||
- If `require_clean_worktree: true`, NightShift blocks execution.
|
|
||||||
- If `require_clean_worktree: false`, NightShift continues but git status/diff artifacts can look like hard failures.
|
|
||||||
|
|
||||||
Desired behavior:
|
|
||||||
|
|
||||||
- Detect common `dubious ownership` / `safe.directory` messages.
|
|
||||||
- Write a clearer explanation in artifacts and reports.
|
|
||||||
- Suggest the exact remediation outside NightShift, for example:
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
git config --global --add safe.directory <project-root>
|
|
||||||
```
|
|
||||||
|
|
||||||
Acceptance criteria:
|
|
||||||
|
|
||||||
- Safe-directory failures are classified separately from ordinary git failures.
|
|
||||||
- Users get actionable guidance.
|
|
||||||
- NightShift does not attempt to change global git config automatically.
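
One way to classify these failures separately is to match on git's stderr text. The sketch below is illustrative only: the function names and category labels are hypothetical, and the matched substrings are the phrases quoted in this document.

```python
from typing import Optional


def classify_git_error(stderr: str) -> str:
    """Map raw git stderr to a coarse category (labels are hypothetical)."""
    text = stderr.lower()
    if "dubious ownership" in text or "safe.directory" in text:
        return "safe_directory"
    if "not a git repository" in text:
        return "not_a_repo"
    return "other"


def remediation_hint(kind: str, project_root: str) -> Optional[str]:
    """Return a command the user can run themselves; never run it for them."""
    if kind == "safe_directory":
        return f'git config --global --add safe.directory "{project_root}"'
    return None
```

Returning the command as text, rather than executing it, keeps the acceptance criterion that NightShift never changes global git config automatically.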
|
|
||||||
|
|
||||||
## Clarify docs around git requirements
|
|
||||||
|
|
||||||
Add to `QUICKSTART.md` and troubleshooting:
|
|
||||||
|
|
||||||
- Git is optional when `require_clean_worktree: false`.
|
|
||||||
- Git is required for clean-worktree enforcement and useful diffs.
|
|
||||||
- Non-git projects can still run pipelines.
|
|
||||||
- Git ownership/safe-directory errors affect git artifacts, not core task execution, unless clean-worktree enforcement is enabled.
|
|
||||||
|
|
||||||
## Console appears idle during long agent calls
|
|
||||||
|
|
||||||
Current behavior:
|
|
||||||
|
|
||||||
- Long Ollama calls can make `nightshift run` look frozen.
|
|
||||||
- Progress is only visible by inspecting `.nightshift/` artifacts or `ollama ps`.
|
|
||||||
|
|
||||||
Desired behavior:
|
|
||||||
|
|
||||||
- Print stage start/finish messages to the console.
|
|
||||||
- Include agent id, stage id, task id, and artifact path when available.
|
|
||||||
- Do not stream model output yet; just show lifecycle progress.
|
|
||||||
|
|
||||||
Acceptance criteria:
|
|
||||||
|
|
||||||
- User can tell which stage is running.
|
|
||||||
- Long-running model calls no longer look like a hung process.
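
Lifecycle-only progress output could be wrapped around each stage call. This is a rough sketch under assumptions: `stage_progress`, its parameters, and the message format are hypothetical and not taken from NightShift.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage_progress(task_id, stage_id, agent_id=None, artifact_path=None, out=print):
    """Print start/finish messages around a stage without streaming model output."""
    label = f"[{task_id}] stage={stage_id}"
    if agent_id:
        label += f" agent={agent_id}"
    out(f"{label} started")
    started = time.monotonic()
    try:
        yield
    finally:
        # The finish line is printed even if the stage raises.
        elapsed = time.monotonic() - started
        suffix = f" artifact={artifact_path}" if artifact_path else ""
        out(f"{label} finished in {elapsed:.1f}s{suffix}")
```

A caller would write `with stage_progress("TASK-001", "implement", agent_id="implementer"): ...` around the long-running agent call.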
|
|
||||||
|
|
||||||
## Ollama output can make review stages fail if not structured
|
|
||||||
|
|
||||||
Current behavior:
|
|
||||||
|
|
||||||
- Review stages require `status: pass | fail | retry | escalate`.
|
|
||||||
- General-purpose model output may include prose before/after the structured fields.
|
|
||||||
- If no valid status is found, the review stage fails.
|
|
||||||
|
|
||||||
Desired behavior:
|
|
||||||
|
|
||||||
- Keep strict structured review parsing, but improve prompt templates and error messages.
|
|
||||||
- Artifact should clearly say the review output was unparseable and show the expected contract.
|
|
||||||
|
|
||||||
Acceptance criteria:
|
|
||||||
|
|
||||||
- Failed review parsing is easy to diagnose from `review.md` and `stage-results.md`.
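
Strict parsing with a clearer error might look like the sketch below. The function name, regular expression, and error text are assumptions for illustration. Only the `status: pass | fail | retry | escalate` contract comes from the notes above.

```python
import re

VALID_STATUSES = ("pass", "fail", "retry", "escalate")

# A status line must stand alone, so prose before or after it is tolerated
# but a line such as "status: pass | fail" is not accepted.
_STATUS_RE = re.compile(r"^\s*status:\s*(\w+)\s*$", re.IGNORECASE | re.MULTILINE)


def parse_review_status(output: str):
    """Return (status, None) on success or (None, error_message) on failure."""
    for match in _STATUS_RE.finditer(output):
        status = match.group(1).lower()
        if status in VALID_STATUSES:
            return status, None
    error = (
        "Review output was unparseable. Expected a line of the form:\n"
        "status: pass | fail | retry | escalate"
    )
    return None, error
```

The error string is what would be written to `review.md` and `stage-results.md`, so the expected contract is visible next to the failure.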
|
|
||||||
|
|
||||||
## `echo` fake agents do not behave consistently across shells
|
|
||||||
|
|
||||||
Current behavior:
|
|
||||||
|
|
||||||
- Starter templates use `command: echo`.
|
|
||||||
- Depending on shell/platform, `echo` may not preserve stdin or may only echo arguments.
|
|
||||||
- This can make fake agent artifacts less useful.
|
|
||||||
|
|
||||||
Desired behavior:
|
|
||||||
|
|
||||||
- Replace fake-agent defaults with small Python one-liners or documented fake-agent scripts.
|
|
||||||
- Keep examples cross-platform.
|
|
||||||
|
|
||||||
Acceptance criteria:
|
|
||||||
|
|
||||||
- Starter project produces predictable fake-agent output on Windows PowerShell/cmd and Unix shells.
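
A documented fake-agent script could replace `echo` so that stdin is always preserved. The following is a minimal sketch: the function names and the header line are hypothetical, and how the script would be wired into the agent `command` setting is not shown here.

```python
import sys


def fake_agent_output(prompt: str) -> str:
    """Return deterministic fake-agent output: a marker line plus the prompt."""
    return "# fake agent output\n" + prompt


def main() -> int:
    """Copy stdin to stdout so the artifact records what the agent received."""
    sys.stdout.write(fake_agent_output(sys.stdin.read()))
    return 0
```

Saved as a script with an `if __name__ == "__main__": raise SystemExit(main())` guard at the bottom, this behaves the same under PowerShell, cmd, and Unix shells because it does not rely on the shell's own `echo`.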
|
|
||||||
|
|
||||||
## `unittest discover` behavior depends on test package layout
|
|
||||||
|
|
||||||
Current behavior:
|
|
||||||
|
|
||||||
- Python 3.14 returned `NO TESTS RAN` with exit code 5 for an example project until `tests/__init__.py` was added.
|
|
||||||
- Users may hit the same issue in fresh target repos.
|
|
||||||
|
|
||||||
Desired behavior:
|
|
||||||
|
|
||||||
- Document this in troubleshooting.
|
|
||||||
- Consider making quickstart templates include `tests/__init__.py`.
|
|
||||||
|
|
||||||
Acceptance criteria:
|
|
||||||
|
|
||||||
- Quickstart test command works in a fresh copied example.
|
|
||||||
- Troubleshooting mentions what to do if `NO TESTS RAN` appears.
|
|
||||||
|
|
||||||
## Task completion can mark tasks complete even if no source changed
|
|
||||||
|
|
||||||
Current behavior:
|
|
||||||
|
|
||||||
- A pipeline can pass with fake agents and passing tests, then mark the task complete.
|
|
||||||
- This is expected for fake/demo mode but surprising when users expect code edits.
|
|
||||||
|
|
||||||
Desired behavior:
|
|
||||||
|
|
||||||
- Add a warning when a task completes and git/diff detects no source changes, where git is available.
|
|
||||||
- Documentation should explain fake-agent mode vs editing-agent mode.
|
|
||||||
|
|
||||||
Acceptance criteria:
|
|
||||||
|
|
||||||
- Users are less likely to mistake artifact generation for code modification.
|
|
||||||
|
|
||||||
## Dashboard requires Flask but dependency is optional
|
|
||||||
|
|
||||||
Current behavior:
|
|
||||||
|
|
||||||
- `nightshift web` fails with a helpful message if Flask is missing.
|
|
||||||
- README mentions `pip install flask`, but install extras are not defined.
|
|
||||||
|
|
||||||
Desired behavior:
|
|
||||||
|
|
||||||
- Add an optional dependency group such as `nightshift[web]` later.
|
|
||||||
- Keep graceful error behavior.
|
|
||||||
|
|
||||||
Acceptance criteria:
|
|
||||||
|
|
||||||
- Users have one documented install command for dashboard support.
|
|
||||||
|
|
@ -62,11 +62,19 @@ Ollama agent:
|
||||||
```yaml
|
```yaml
|
||||||
planner:
|
planner:
|
||||||
backend: ollama
|
backend: ollama
|
||||||
model: qwen2.5-coder:14b
|
model: qwen3-coder:30b
|
||||||
base_url: http://localhost:11434
|
base_url: http://localhost:11434
|
||||||
system_prompt: agents/planner.md
|
system_prompt: agents/planner.md
|
||||||
|
temperature: 0.2
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
|
seed: 1
|
||||||
|
stop:
|
||||||
|
- STOP
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Optional Ollama generation options currently supported by NightShift are `temperature`, `num_ctx`, `num_predict`, `seed`, and `stop`.
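
As a rough illustration of how these optional fields could map onto an Ollama generate request, the sketch below copies only the fields that are set into the request's `options` object. `AgentConfig`, `build_ollama_options`, and `build_generate_payload` are hypothetical names. NightShift's own `_ollama_options` helper may be structured differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AgentConfig:
    """Hypothetical stand-in for a parsed agent entry in nightshift.yaml."""
    model: str
    temperature: Optional[float] = None
    num_ctx: Optional[int] = None
    num_predict: Optional[int] = None
    seed: Optional[int] = None
    stop: Optional[List[str]] = None


OPTION_FIELDS = ("temperature", "num_ctx", "num_predict", "seed", "stop")


def build_ollama_options(agent: AgentConfig) -> Dict[str, Any]:
    """Collect only the options the user actually configured."""
    options: Dict[str, Any] = {}
    for name in OPTION_FIELDS:
        value = getattr(agent, name)
        if value is not None:
            options[name] = value
    return options


def build_generate_payload(agent: AgentConfig, prompt: str) -> Dict[str, Any]:
    """Build a non-streaming request body; omit `options` when nothing is set."""
    payload: Dict[str, Any] = {"model": agent.model, "prompt": prompt, "stream": False}
    options = build_ollama_options(agent)
    if options:
        payload["options"] = options
    return payload
```

Leaving unset fields out of the payload means Ollama's own defaults apply, which matches the idea that every one of these settings is optional.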
|
||||||
|
|
||||||
## `pipeline`
|
## `pipeline`
|
||||||
|
|
||||||
- `max_task_retries`: task retry limit.
|
- `max_task_retries`: task retry limit.
|
||||||
|
|
@ -76,6 +84,7 @@ planner:
|
||||||
Command stage options:
|
Command stage options:
|
||||||
|
|
||||||
- `commands`: command strings.
|
- `commands`: command strings.
|
||||||
|
- Command strings may use task placeholders: `{task_id}`, `{task_id_lower}`, `{task_id_slug}`, and `{task_id_compact}`.
|
||||||
- `shell`: defaults to true. Set false for argv-style execution.
|
- `shell`: defaults to true. Set false for argv-style execution.
|
||||||
- `timeout_seconds`: per-stage timeout override.
|
- `timeout_seconds`: per-stage timeout override.
|
||||||
- `working_dir`: command working directory inside project root.
|
- `working_dir`: command working directory inside project root.
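
The task placeholders can be pictured with a small rendering sketch. The only mapping confirmed by this commit is that `{task_id_compact}` turns `TASK-002` into `task002`. The values chosen here for `{task_id_lower}` and `{task_id_slug}`, and both function names, are assumptions.

```python
import re
from typing import Dict


def task_placeholders(task_id: str) -> Dict[str, str]:
    """Derive placeholder values from a task id such as 'TASK-002'."""
    lower = task_id.lower()
    return {
        "task_id": task_id,
        "task_id_lower": lower,  # assumed: plain lowercase
        "task_id_slug": re.sub(r"[^a-z0-9]+", "-", lower).strip("-"),  # assumed
        "task_id_compact": re.sub(r"[^a-z0-9]", "", lower),
    }


def render_command(command: str, task_id: str) -> str:
    """Substitute placeholders without touching other braces in the command."""
    rendered = command
    for name, value in task_placeholders(task_id).items():
        rendered = rendered.replace("{" + name + "}", value)
    return rendered
```

Plain string replacement is used instead of `str.format` so that shell commands containing unrelated `{...}` text are left alone.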
|
||||||
|
|
@ -141,6 +150,12 @@ Create a local integration sandbox from the NightShift repository root:
|
||||||
python -m nightshift.cli integ-run --template tutorial-pastebin
|
python -m nightshift.cli integ-run --template tutorial-pastebin
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Create, set up, validate, and run one task from the generated project directory:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m nightshift.cli integ-test --template tutorial-pastebin --task TASK-001
|
||||||
|
```
|
||||||
|
|
||||||
Set up the generated Python project:
|
Set up the generated Python project:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|
@ -161,6 +176,12 @@ Preview commands without running them:
|
||||||
python -m nightshift.cli integ-setup --project integ_runs/<timestamp>/project --dry-run
|
python -m nightshift.cli integ-setup --project integ_runs/<timestamp>/project --dry-run
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Summarize the latest integration artifact run:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m nightshift.cli integ-report --latest
|
||||||
|
```
|
||||||
|
|
||||||
To clean up old sandboxes before creating a new one, keep only the newest three existing runs:
|
To clean up old sandboxes before creating a new one, keep only the newest three existing runs:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|
@ -169,8 +190,4 @@ python -m nightshift.cli integ-run --template tutorial-pastebin --keep 3
|
||||||
|
|
||||||
## Pastebin Tutorial
|
## Pastebin Tutorial
|
||||||
|
|
||||||
`nightshift init --template tutorial-pastebin` creates a small Flask snippet-hosting target with deterministic tests and incremental NightShift tasks. Its pipeline includes semantic context retrieval, telemetry, debugger support, and implementation fallback order:
|
`nightshift init --template tutorial-pastebin` creates a small Flask snippet-hosting target with deterministic tests and incremental NightShift tasks. Its pipeline includes semantic context retrieval, telemetry, debugger support, fixed task-specific tests, and a single default `qwen3-coder:30b` model path.
|
||||||
|
|
||||||
- `qwen2.5-coder:14b`
|
|
||||||
- `carstenuhlig/omnicoder-9b`
|
|
||||||
- `deepseek-coder-v2:16b`
|
|
||||||
|
|
|
||||||
17
docs/future_ideas.md
Normal file
|
|
@ -0,0 +1,17 @@
|
||||||
|
### Future Ideas
|
||||||
|
Not to be implemented until we get successful long-running runs.
|
||||||
|
|
||||||
|
## I am realizing "templates" are abstracted from the user
|
||||||
|
* I think templates will be a first class citizen, a package for deployments, and a harness for performance tests
|
||||||
|
* These should live external to nightshift/project_templates as users will likely create their own
|
||||||
|
* One solution would be to reference two directories when looking up templates: built-in ones live in nightshift/project_templates, and users can define a templates directory in their nightshift config
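
A two-directory lookup could be as small as the sketch below. Nothing here exists in NightShift yet: `find_template` is a hypothetical name, and the search order (user directory first or built-in first) is left to the caller.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_template(name: str, search_dirs: Iterable[str]) -> Optional[Path]:
    """Return the first directory named `name` found under the search dirs."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_dir():
            return candidate
    return None
```

Passing the user's templates directory ahead of `nightshift/project_templates` would let a user template override a built-in one with the same name. Reversing the order would protect the built-ins instead.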
|
||||||
|
|
||||||
|
## nightshift config
|
||||||
|
* store user settings in ~/.nightshift/config.yaml
|
||||||
|
* things like templates folder (can also live here)
|
||||||
|
* maybe this is later
|
||||||
|
|
||||||
|
## A way to easily make A/B tests to benchmark models?
|
||||||
|
* Right now I can do this manually. For example, I want to run the tutorial-pastebin with qwen3.6:27b as the planner and qwen2.5-coder:14b as the coder, then another run with qwen3.6:27b as both, etc.
|
||||||
|
* Maybe there is a way to make it easier to do that, possibly by creating a template that can be controlled by a larger multi-run file?
|
||||||
|
* This is probably for way later.
|
||||||
366
docs/ideas.md
Normal file
|
|
@ -0,0 +1,366 @@
|
||||||
|
# Ideas TODO
|
||||||
|
|
||||||
|
This file is now prioritized inline. Priority scale:
|
||||||
|
|
||||||
|
- P0: do next; directly improves current feedback loop
|
||||||
|
- P1: important after the current loop is usable
|
||||||
|
- P2: useful, but only after basics are stable
|
||||||
|
- P3: defer or maybe reject
|
||||||
|
|
||||||
|
## P0: Make Integration Tests Easy To Run
|
||||||
|
|
||||||
|
Status: implemented.
|
||||||
|
|
||||||
|
Implemented command:
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
python -m nightshift.cli integ-test --template tutorial-pastebin --task TASK-001
|
||||||
|
```
|
||||||
|
|
||||||
|
It creates the integration sandbox, sets up the venv, runs validation through setup, runs the task from the generated project directory, and prints the artifact root. Use `--dry-run` to preview the setup and task command.
|
||||||
|
|
||||||
|
Running integration tests is still too manual.
|
||||||
|
|
||||||
|
Current process:
|
||||||
|
|
||||||
|
- install the current version of NightShift
|
||||||
|
- run `python -m nightshift.cli integ-run --template tutorial-pastebin --setup`
|
||||||
|
- copy the activation line from the output and run it
|
||||||
|
- `cd` into the generated directory
|
||||||
|
- run the task there, because running from the repo root does not find `nightshift.yaml`
|
||||||
|
|
||||||
|
Recommendation: implement a wrapper command, not just a loose script.
|
||||||
|
|
||||||
|
Target command:
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
python -m nightshift.cli integ-test --template tutorial-pastebin --task TASK-001
|
||||||
|
```
|
||||||
|
|
||||||
|
It should:
|
||||||
|
|
||||||
|
1. create the integration run
|
||||||
|
2. set up the venv
|
||||||
|
3. install NightShift from the current checkout
|
||||||
|
4. run `nightshift validate`
|
||||||
|
5. run the selected task from the generated project directory
|
||||||
|
6. print final status and artifact path
|
||||||
|
|
||||||
|
Useful variants:
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
python -m nightshift.cli integ-test --template tutorial-pastebin --all
|
||||||
|
python -m nightshift.cli integ-test --template tutorial-pastebin --task TASK-002 --keep 3
|
||||||
|
```
|
||||||
|
|
||||||
|
The base-directory config issue may not be a core bug, but it is bad UX. The wrapper should handle `cwd` correctly.
|
||||||
|
|
||||||
|
## P0/P1: Remove Multi-Candidate Workflow From Default Pastebin
|
||||||
|
|
||||||
|
Status: implemented for the default pastebin template and tutorial example.
|
||||||
|
|
||||||
|
Original idea:
|
||||||
|
|
||||||
|
- The multi-candidate workflow does not add as much as expected.
|
||||||
|
- Keep it as an example, maybe `example-multiagent`.
|
||||||
|
|
||||||
|
Recommendation: yes. Remove it from the default pastebin tutorial.
|
||||||
|
|
||||||
|
Reason:
|
||||||
|
|
||||||
|
- Pastebin is becoming the reliability harness.
|
||||||
|
- Multi-candidate fallback makes artifacts harder to reason about.
|
||||||
|
- It adds model variability while we are still debugging pipeline behavior.
|
||||||
|
|
||||||
|
Better split:
|
||||||
|
|
||||||
|
```text
|
||||||
|
tutorial-pastebin
|
||||||
|
tutorial-pastebin-multiagent
|
||||||
|
```
|
||||||
|
|
||||||
|
or:
|
||||||
|
|
||||||
|
```text
|
||||||
|
examples/templates/multiagent-fallback
|
||||||
|
```
|
||||||
|
|
||||||
|
Default pastebin should be boring:
|
||||||
|
|
||||||
|
```text
|
||||||
|
planner -> semantic_context -> context -> implement -> validate -> test -> review
|
||||||
|
```
|
||||||
|
|
||||||
|
Use one strong implementer first. Add fallback only in a separate experiment template.
|
||||||
|
|
||||||
|
## P1: Add A Qwen3 / 30B Pastebin Variant
|
||||||
|
|
||||||
|
Status: implemented as the default pastebin model path using `qwen3-coder:30b`.
|
||||||
|
|
||||||
|
Original idea:
|
||||||
|
|
||||||
|
- Use a non-coder model for planner roles.
|
||||||
|
- Try `qwen3.6:27b` for planning.
|
||||||
|
- Use `qwen3-coder:30b` for implementer and code-heavy roles.
|
||||||
|
|
||||||
|
Recommendation: viable, but make this a variant, not the default.
|
||||||
|
|
||||||
|
kass reply: No, let's make this the default. `qwen3-coder:30b` is fast for me now, for some reason.
|
||||||
|
|
||||||
|
Suggested template/config:
|
||||||
|
|
||||||
|
```text
|
||||||
|
tutorial-pastebin-qwen3
|
||||||
|
```
|
||||||
|
|
||||||
|
Possible role split:
|
||||||
|
|
||||||
|
- planner: `qwen3.6:27b`
|
||||||
|
- reviewer/debugger: `qwen3.6:27b`
|
||||||
|
- implementer: `qwen3-coder:30b` or exact local 30B coder model name
|
||||||
|
|
||||||
|
Important: confirm exact model names with:
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
ollama list
|
||||||
|
```
|
||||||
|
|
||||||
|
I did. It's `qwen3-coder:30b`.
|
||||||
|
|
||||||
|
Use 30B where it pays:
|
||||||
|
|
||||||
|
- first implementation for hard tasks
|
||||||
|
- repair after concrete test failure
|
||||||
|
- schema/database changes
|
||||||
|
- multi-file changes
|
||||||
|
|
||||||
|
Do not blindly make every stage 30B if it is slow.
|
||||||
|
|
||||||
|
reply: It's not slow now! (`qwen3-coder:30b`)
|
||||||
|
|
||||||
|
## P2: Expose More Model Parameters
|
||||||
|
|
||||||
|
Status: implemented for the practical first set.
|
||||||
|
|
||||||
|
Supported optional Ollama fields now include `num_ctx`, `num_predict`, `seed`, and `stop`, in addition to existing `temperature`.
|
||||||
|
|
||||||
|
Original question:
|
||||||
|
|
||||||
|
- What else besides temperature is available?
|
||||||
|
- Are any worth optimizing?
|
||||||
|
|
||||||
|
Likely useful for Ollama:
|
||||||
|
|
||||||
|
- `temperature`
|
||||||
|
- `num_ctx`
|
||||||
|
- `num_predict`
|
||||||
|
- `seed`
|
||||||
|
- `stop`
|
||||||
|
- maybe `top_p`, `top_k`, `repeat_penalty`
|
||||||
|
|
||||||
|
Recommendation: add only a small practical set first.
|
||||||
|
|
||||||
|
Useful config shape:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
temperature: 0.1
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
|
seed: 1
|
||||||
|
```
|
||||||
|
|
||||||
|
Most useful:
|
||||||
|
|
||||||
|
- `num_ctx`: larger repo/task context
|
||||||
|
- `num_predict`: caps runaway output
|
||||||
|
- `seed`: reproducibility, if supported consistently
|
||||||
|
- `temperature`: already useful; keep low for code
|
||||||
|
- `stop`: could help enforce file-block or diff-only contracts
|
||||||
|
|
||||||
|
Defer tuning `top_p`, `top_k`, and `repeat_penalty` unless a specific model needs it.
|
||||||
|
|
||||||
|
reply: Yup, let's put this in the nightshift.yaml (optional parameters; if they aren't in there that's fine, but we should offer them).
|
||||||
|
|
||||||
|
## P1: Add Test Governance For Generated Tests
|
||||||
|
|
||||||
|
Original idea:
|
||||||
|
|
||||||
|
- Have a test governance layer for when agents write tests.
|
||||||
|
- A reviewer validates alignment with acceptance criteria.
|
||||||
|
|
||||||
|
Recommendation: yes, but only for generated-test mode. Do not put generated tests back into default pastebin yet.
|
||||||
|
|
||||||
|
The previous failures proved test-writing agents will:
|
||||||
|
|
||||||
|
- edit app code
|
||||||
|
- import nonexistent modules
|
||||||
|
- require undeclared dependencies
|
||||||
|
- inspect implementation internals
|
||||||
|
- write tests for future behavior
|
||||||
|
|
||||||
|
Governance should be deterministic first, model-reviewed second.
|
||||||
|
|
||||||
|
Deterministic checks:
|
||||||
|
|
||||||
|
- test-writing stage may only touch `tests/`
|
||||||
|
- tests compile
|
||||||
|
- tests import only allowed public interfaces
|
||||||
|
- tests do not import undeclared dependencies
|
||||||
|
- tests do not define Flask routes or app implementation
|
||||||
|
- test names match current task id or current artifact
|
||||||
|
- no future-task keywords unless accepted by current task AC
|
||||||
|
|
||||||
|
Then optional model reviewer checks acceptance-criteria alignment.
|
||||||
|
|
||||||
|
## P2: Add A Test Analyzer Agent For TDD
|
||||||
|
|
||||||
|
Original idea:
|
||||||
|
|
||||||
|
- Analyze tests.
|
||||||
|
- Translate them into direct instructions for the implementer.
|
||||||
|
- Maybe implement using agent YAML definitions without new NightShift features.
|
||||||
|
|
||||||
|
Recommendation: viable, but defer until generated tests are stable.
|
||||||
|
|
||||||
|
Possible pipeline:
|
||||||
|
|
||||||
|
```text
|
||||||
|
write_tests -> validate_tests -> analyze_tests -> implement
|
||||||
|
```
|
||||||
|
|
||||||
|
Analyzer output should be concrete:
|
||||||
|
|
||||||
|
```text
|
||||||
|
Implementation requirements:
|
||||||
|
- create_app(database_path) must return a Flask app.
|
||||||
|
- POST /snippets must return 201 and JSON id.
|
||||||
|
- GET /snippets/<id> must return persisted fields.
|
||||||
|
|
||||||
|
Do not modify:
|
||||||
|
- tests/test_task001.py
|
||||||
|
```
|
||||||
|
|
||||||
|
This may help smaller models, but it is another model output that can be wrong. Add it only after the fixed-test pipeline works through all pastebin tasks.
|
||||||
|
|
||||||
|
## P2/P3: Add A Test Planner
|
||||||
|
|
||||||
|
Original idea:
|
||||||
|
|
||||||
|
- A test planner understands acceptance criteria and code.
|
||||||
|
- Provides input to the next stage about constraints and code, especially for non-TDD.
|
||||||
|
|
||||||
|
Recommendation: maybe, but defer.
|
||||||
|
|
||||||
|
This overlaps with:
|
||||||
|
|
||||||
|
- planner
|
||||||
|
- test analyzer
|
||||||
|
- test governance
|
||||||
|
|
||||||
|
Too many planning-ish stages can make the pipeline bloated and contradictory.
|
||||||
|
|
||||||
|
If implemented later, keep it focused:
|
||||||
|
|
||||||
|
```text
|
||||||
|
test_planner -> write_tests -> test_governance -> implement
|
||||||
|
```
|
||||||
|
|
||||||
|
For now, fold this idea into the future test governance/analyzer work.
|
||||||
|
|
||||||
|
## P1: Add Fixed Tests For All Pastebin Tasks
|
||||||
|
|
||||||
|
Status: mostly implemented in the template.
|
||||||
|
|
||||||
|
Current fixed tests:
|
||||||
|
|
||||||
|
```text
|
||||||
|
tests/test_task001.py
|
||||||
|
tests/test_task002.py
|
||||||
|
tests/test_task003.py
|
||||||
|
tests/test_task004.py
|
||||||
|
tests/test_task005.py
|
||||||
|
```
|
||||||
|
|
||||||
|
Important design:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
python -m pytest -q tests/test_{task_id_compact}.py
|
||||||
|
```
|
||||||
|
|
||||||
|
This lets all future task tests exist without breaking earlier tasks.
|
||||||
|
|
||||||
|
Next step: validate these through integration runs, one task at a time.
|
||||||
|
|
||||||
|
## P1: Add `nightshift integ-report`
|
||||||
|
|
||||||
|
Status: implemented as a first-pass artifact summarizer.
|
||||||
|
|
||||||
|
New idea.
|
||||||
|
|
||||||
|
Summarize latest integration run across tasks:
|
||||||
|
|
||||||
|
```text
|
||||||
|
TASK-001 complete in 1 retry
|
||||||
|
TASK-002 failed at validate_patch
|
||||||
|
Root cause: protected tests modified
|
||||||
|
Artifacts: ...
|
||||||
|
```
|
||||||
|
|
||||||
|
Right now we inspect artifacts manually. NightShift should do more of that.
|
||||||
|
|
||||||
|
Possible command:
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
python -m nightshift.cli integ-report --latest
|
||||||
|
```
|
||||||
|
|
||||||
|
## P1: Add Task-Test Preflight To `validate`
|
||||||
|
|
||||||
|
Status: implemented.
|
||||||
|
|
||||||
|
`nightshift validate` now renders task command placeholders for every task and fails early if a configured `tests/test_*.py` path is missing.
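
The preflight described here can be sketched as follows. This is an illustration under assumptions, not the code added by this commit: the function names, the regular expression, and the message wording are hypothetical, and only the `{task_id_compact}` placeholder is handled.

```python
import re
from pathlib import Path
from typing import List, Sequence

_TEST_PATH_RE = re.compile(r"tests/test_\w+\.py")


def compact_id(task_id: str) -> str:
    """'TASK-002' -> 'task002', matching the tests/test_task002.py naming."""
    return re.sub(r"[^a-z0-9]", "", task_id.lower())


def missing_task_tests(
    project_root: str, task_ids: Sequence[str], commands: Sequence[str]
) -> List[str]:
    """Render each command for each task and report test files that do not exist."""
    problems: List[str] = []
    for task_id in task_ids:
        for command in commands:
            rendered = command.replace("{task_id_compact}", compact_id(task_id))
            for rel_path in _TEST_PATH_RE.findall(rendered):
                if not (Path(project_root) / rel_path).is_file():
                    problems.append(f"{task_id} expects {rel_path} but it is missing.")
    return problems
```

An empty result means every task has its fixed test file. A non-empty result can be printed and turned into a non-zero exit before any agent is invoked.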
|
||||||
|
|
||||||
|
Partially implemented at run time.
|
||||||
|
|
||||||
|
Current behavior:
|
||||||
|
|
||||||
|
- task command placeholders can render paths like `tests/test_task002.py`
|
||||||
|
- `run_task` preflight fails before invoking agents if the task-specific test file is missing
|
||||||
|
|
||||||
|
Better behavior:
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
nightshift validate
|
||||||
|
```
|
||||||
|
|
||||||
|
should warn or fail:
|
||||||
|
|
||||||
|
```text
|
||||||
|
TASK-003 expects tests/test_task003.py and it exists.
|
||||||
|
TASK-004 expects tests/test_task004.py and it exists.
|
||||||
|
```
|
||||||
|
|
||||||
|
This catches missing fixed tests earlier.
|
||||||
|
|
||||||
|
## P2: Add Run Comparison
|
||||||
|
|
||||||
|
New idea.
|
||||||
|
|
||||||
|
Useful once comparing 14B vs 30B:
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
nightshift compare-runs --latest 5
|
||||||
|
```
|
||||||
|
|
||||||
|
Show:
|
||||||
|
|
||||||
|
- model
|
||||||
|
- task
|
||||||
|
- retries
|
||||||
|
- failure stage
|
||||||
|
- final reason
|
||||||
|
- runtime
|
||||||
|
- token estimate
|
||||||
|
|
||||||
|
This should come after `integ-test` and `integ-report`.
|
||||||
|
|
||||||
|
|
@ -1,4 +1,4 @@
|
||||||
# Tutorial 03: Pastebin With Model Fallback And Telemetry
|
# Tutorial 03: Pastebin With Fixed Tests And Telemetry
|
||||||
|
|
||||||
This tutorial uses the `tutorial-pastebin` template: a small Flask snippet-hosting service designed for deterministic NightShift orchestration tests.
|
This tutorial uses the `tutorial-pastebin` template: a small Flask snippet-hosting service designed for deterministic NightShift orchestration tests.
|
||||||
|
|
||||||
|
|
@ -19,6 +19,12 @@ For an isolated local integration run, use the integration sandbox command from
|
||||||
python -m nightshift.cli integ-run --template tutorial-pastebin
|
python -m nightshift.cli integ-run --template tutorial-pastebin
|
||||||
```
|
```
|
||||||
|
|
||||||
|
To create, set up, validate, and run one task in a single command:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m nightshift.cli integ-test --template tutorial-pastebin --task TASK-001
|
||||||
|
```
|
||||||
|
|
||||||
To create the sandbox and set up the Python project immediately:
|
To create the sandbox and set up the Python project immediately:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|
@ -57,7 +63,7 @@ pyproject.toml
|
||||||
README.md
|
README.md
|
||||||
```
|
```
|
||||||
|
|
||||||
The template includes a tiny Flask `create_app(database_path=None)` scaffold and fixed `TASK-001` tests. The default tutorial pipeline asks the implementation agent to make those deterministic tests pass before review.
|
The template includes a tiny Flask `create_app(database_path=None)` scaffold and fixed tests for each tutorial task. The default tutorial pipeline asks the implementation agent to make only the current task's deterministic tests pass before review.
|
||||||
|
|
||||||
## Prerequisites
|
## Prerequisites
|
||||||
|
|
||||||
|
|
@ -73,26 +79,22 @@ Install target dependencies:
|
||||||
python -m pip install -e . pytest flask
|
python -m pip install -e . pytest flask
|
||||||
```
|
```
|
||||||
|
|
||||||
Install and start Ollama, then pull the fallback models you want available:
|
Install and start Ollama, then pull the default pastebin model:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ollama pull qwen2.5-coder:14b
|
ollama pull qwen3-coder:30b
|
||||||
ollama pull carstenuhlig/omnicoder-9b
|
|
||||||
ollama pull deepseek-coder-v2:16b
|
|
||||||
ollama list
|
ollama list
|
||||||
```
|
```
|
||||||
|
|
||||||
NightShift uses Ollama's local HTTP API, normally at `http://localhost:11434`.
|
NightShift uses Ollama's local HTTP API, normally at `http://localhost:11434`.
|
||||||
|
|
||||||
## Model Fallback
|
## Model
|
||||||
|
|
||||||
The implementation stage uses this fallback order:
|
The default pastebin pipeline uses one strong local coder model:
|
||||||
|
|
||||||
1. `qwen2.5-coder:14b`
|
- `qwen3-coder:30b`
|
||||||
2. `carstenuhlig/omnicoder-9b`
|
|
||||||
3. `deepseek-coder-v2:16b`
|
|
||||||
|
|
||||||
NightShift records which agent/model handled each stage in `telemetry-summary.md`.
|
NightShift records which agent/model handled each stage in `telemetry-summary.md`. Multi-candidate fallback belongs in a separate experiment template, not the default pastebin reliability harness.
|
||||||
|
|
||||||
## TDD Pipeline
|
## TDD Pipeline
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -20,51 +20,49 @@ safety:
|
||||||
- curl | bash
|
- curl | bash
|
||||||
|
|
||||||
experiment:
|
experiment:
|
||||||
label: pastebin-model-fallback
|
label: pastebin-qwen3-coder
|
||||||
prompt_variant: tdd-qwen-omnicoder-deepseek-v2
|
prompt_variant: fixed-tests-qwen3-coder-30b-v1
|
||||||
|
|
||||||
agents:
|
agents:
|
||||||
planner:
|
planner:
|
||||||
backend: ollama
|
backend: ollama
|
||||||
model: qwen2.5-coder:14b
|
model: qwen3-coder:30b
|
||||||
temperature: 0.2
|
temperature: 0.2
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
system_prompt: .nightshift/agents/planner.md
|
system_prompt: .nightshift/agents/planner.md
|
||||||
|
|
||||||
implementer_qwen:
|
implementer:
|
||||||
backend: ollama
|
backend: ollama
|
||||||
model: qwen2.5-coder:14b
|
model: qwen3-coder:30b
|
||||||
temperature: 0.1
|
temperature: 0.1
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
system_prompt: .nightshift/agents/implementer.md
|
system_prompt: .nightshift/agents/implementer.md
|
||||||
|
|
||||||
test_writer:
|
test_writer:
|
||||||
backend: ollama
|
backend: ollama
|
||||||
model: qwen2.5-coder:14b
|
model: qwen3-coder:30b
|
||||||
temperature: 0.1
|
temperature: 0.1
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
system_prompt: .nightshift/agents/test-writer.md
|
system_prompt: .nightshift/agents/test-writer.md
|
||||||
|
|
||||||
implementer_omnicoder:
|
|
||||||
backend: ollama
|
|
||||||
model: carstenuhlig/omnicoder-9b
|
|
||||||
temperature: 0.1
|
|
||||||
system_prompt: .nightshift/agents/implementer.md
|
|
||||||
|
|
||||||
implementer_deepseek:
|
|
||||||
backend: ollama
|
|
||||||
model: deepseek-coder-v2:16b
|
|
||||||
temperature: 0.1
|
|
||||||
system_prompt: .nightshift/agents/implementer.md
|
|
||||||
|
|
||||||
debugger:
|
debugger:
|
||||||
backend: ollama
|
backend: ollama
|
||||||
model: qwen2.5-coder:14b
|
model: qwen3-coder:30b
|
||||||
role: debugger
|
role: debugger
|
||||||
temperature: 0.1
|
temperature: 0.1
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
system_prompt: .nightshift/agents/debugger.md
|
system_prompt: .nightshift/agents/debugger.md
|
||||||
|
|
||||||
reviewer:
|
reviewer:
|
||||||
backend: ollama
|
backend: ollama
|
||||||
model: qwen2.5-coder:14b
|
model: qwen3-coder:30b
|
||||||
temperature: 0.1
|
temperature: 0.1
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
system_prompt: .nightshift/agents/reviewer.md
|
system_prompt: .nightshift/agents/reviewer.md
|
||||||
|
|
||||||
pipeline:
|
pipeline:
|
||||||
|
|
@ -87,10 +85,7 @@ pipeline:
|
||||||
|
|
||||||
- id: implement
|
- id: implement
|
||||||
type: file_writer
|
type: file_writer
|
||||||
agent_pool:
|
agent: implementer
|
||||||
- implementer_qwen
|
|
||||||
- implementer_omnicoder
|
|
||||||
- implementer_deepseek
|
|
||||||
output: proposed.patch
|
output: proposed.patch
|
||||||
|
|
||||||
- id: normalize
|
- id: normalize
|
||||||
|
|
|
||||||
|
|
@ -228,8 +228,9 @@ class AgentExecutor:
|
||||||
"prompt": prompt,
|
"prompt": prompt,
|
||||||
"stream": False,
|
"stream": False,
|
||||||
}
|
}
|
||||||
if agent.temperature is not None:
|
options = _ollama_options(agent)
|
||||||
body["options"] = {"temperature": agent.temperature}
|
if options:
|
||||||
|
body["options"] = options
|
||||||
headers = {"Content-Type": "application/json"}
|
headers = {"Content-Type": "application/json"}
|
||||||
started = time.monotonic()
|
started = time.monotonic()
|
||||||
self.logger.event(
|
self.logger.event(
|
||||||
|
|
@ -395,6 +396,21 @@ def build_prompt_bundle(
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _ollama_options(agent: AgentConfig) -> dict[str, object]:
|
||||||
|
options: dict[str, object] = {}
|
||||||
|
if agent.temperature is not None:
|
||||||
|
options["temperature"] = agent.temperature
|
||||||
|
if agent.num_ctx is not None:
|
||||||
|
options["num_ctx"] = agent.num_ctx
|
||||||
|
if agent.num_predict is not None:
|
||||||
|
options["num_predict"] = agent.num_predict
|
||||||
|
if agent.seed is not None:
|
||||||
|
options["seed"] = agent.seed
|
||||||
|
if agent.stop:
|
||||||
|
options["stop"] = list(agent.stop)
|
||||||
|
return options
|
||||||
|
|
||||||
|
|
||||||
def _coerce_output(value: str | bytes | None) -> str:
|
def _coerce_output(value: str | bytes | None) -> str:
|
||||||
if value is None:
|
if value is None:
|
||||||
return ""
|
return ""
|
||||||
|
|
|
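A minimal sketch of how the optional agent fields end up in an Ollama request body, under stated assumptions: `AgentSketch` and `build_body` are hypothetical stand-ins for `AgentConfig` and the request-building code in `AgentExecutor`. Only the option names (`temperature`, `num_ctx`, `num_predict`, `seed`, `stop`) and the "omit `options` when empty" behaviour are taken from the change above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class AgentSketch:
    """Hypothetical stand-in for AgentConfig, limited to the Ollama fields."""

    model: str
    temperature: Optional[float] = None
    num_ctx: Optional[int] = None
    num_predict: Optional[int] = None
    seed: Optional[int] = None
    stop: Tuple[str, ...] = ()


def ollama_options(agent: AgentSketch) -> Dict[str, object]:
    """Collect only the fields that were explicitly configured."""
    options: Dict[str, object] = {}
    for name in ("temperature", "num_ctx", "num_predict", "seed"):
        value = getattr(agent, name)
        if value is not None:  # 0 and 0.0 are valid values and are kept
            options[name] = value
    if agent.stop:
        options["stop"] = list(agent.stop)
    return options


def build_body(agent: AgentSketch, prompt: str) -> Dict[str, object]:
    """Assemble a non-streaming request body; 'options' is added only if non-empty."""
    body: Dict[str, object] = {"model": agent.model, "prompt": prompt, "stream": False}
    options = ollama_options(agent)
    if options:
        body["options"] = options
    return body
```

With the template values from this commit (`num_ctx: 8192`, `num_predict: 4096`), the body carries those two keys alongside `temperature`. An agent with none of the fields set sends no `options` key at all.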
||||||
|
|
@ -7,13 +7,16 @@ from pathlib import Path
|
||||||
import sys
|
import sys
|
||||||
|
|
||||||
from .config import validate_config
|
from .config import validate_config
|
||||||
from .errors import NightShiftError
|
from .errors import ConfigError, NightShiftError
|
||||||
from .init import available_templates, init_project
|
from .init import available_templates, init_project
|
||||||
from .integ import create_integration_run
|
from .integ import create_integration_run
|
||||||
|
from .integ_report import build_integration_report, format_integration_report
|
||||||
from .integ_setup import format_setup_result, setup_python_project
|
from .integ_setup import format_setup_result, setup_python_project
|
||||||
|
from .integ_test import format_integration_test_result, run_integration_test
|
||||||
from .pipeline import PipelineRunner
|
from .pipeline import PipelineRunner
|
||||||
from .runlog import RunLogger
|
from .runlog import RunLogger
|
||||||
from .status import build_status, format_status
|
from .status import build_status, format_status
|
||||||
|
from .task_tests import check_task_test_files, format_task_test_checks, missing_task_test_paths
|
||||||
from .terminal import HOTDOG_ANIMATIONS, TerminalAnimation, format_banner, style_text
|
from .terminal import HOTDOG_ANIMATIONS, TerminalAnimation, format_banner, style_text
|
||||||
from .tasks import (
|
from .tasks import (
|
||||||
ensure_dependencies_satisfied,
|
ensure_dependencies_satisfied,
|
||||||
|
|
@ -105,6 +108,33 @@ def build_parser() -> argparse.ArgumentParser:
|
||||||
help="Print --setup commands without running them.",
|
help="Print --setup commands without running them.",
|
||||||
)
|
)
|
||||||
|
|
||||||
|
integ_test_parser = subparsers.add_parser(
|
||||||
|
"integ-test",
|
||||||
|
help="Create, set up, validate, and run an integration template task.",
|
||||||
|
)
|
||||||
|
integ_test_parser.add_argument("--root", default=".", help="Repository root where integ_runs/ is created.")
|
||||||
|
integ_test_parser.add_argument(
|
||||||
|
"--template",
|
||||||
|
default="tutorial-pastebin",
|
||||||
|
choices=available_templates(),
|
||||||
|
help="Template to initialize inside the sandbox.",
|
||||||
|
)
|
||||||
|
integ_test_parser.add_argument("--task", help="Specific task id to run.")
|
||||||
|
integ_test_parser.add_argument("--all", action="store_true", help="Run all runnable incomplete tasks.")
|
||||||
|
integ_test_parser.add_argument("--keep", type=int, help="Keep only the newest N old integration runs before creating a new one.")
|
||||||
|
integ_test_parser.add_argument(
|
||||||
|
"--setup-extra",
|
||||||
|
action="append",
|
||||||
|
default=["pytest"],
|
||||||
|
help="Extra package to install during setup. May be repeated. Defaults to pytest.",
|
||||||
|
)
|
||||||
|
integ_test_parser.add_argument("--setup-skip-validate", action="store_true", help="Skip validation during setup.")
|
||||||
|
integ_test_parser.add_argument("--dry-run", action="store_true", help="Print commands without running setup or tasks.")
|
||||||
|
|
||||||
|
integ_report_parser = subparsers.add_parser("integ-report", help="Summarize the latest integration run.")
|
||||||
|
integ_report_parser.add_argument("--root", default=".", help="Repository root where integ_runs/ is located.")
|
||||||
|
integ_report_parser.add_argument("--latest", action="store_true", help="Report the latest integration run.")
|
||||||
|
|
||||||
setup_parser = subparsers.add_parser(
|
setup_parser = subparsers.add_parser(
|
||||||
"integ-setup",
|
"integ-setup",
|
||||||
help="Set up a Python integration project venv and dependencies.",
|
help="Set up a Python integration project venv and dependencies.",
|
||||||
|
|
@ -160,12 +190,18 @@ def main(argv: list[str] | None = None) -> int:
|
||||||
config = validate_config(args.config)
|
config = validate_config(args.config)
|
||||||
tasks = parse_task_file(config.project.root, config.project.task_file)
|
tasks = parse_task_file(config.project.root, config.project.task_file)
|
||||||
validate_task_dependencies(tasks)
|
validate_task_dependencies(tasks)
|
||||||
|
task_test_checks = check_task_test_files(config, tasks)
|
||||||
|
missing_task_tests = missing_task_test_paths(task_test_checks)
|
||||||
|
if missing_task_tests:
|
||||||
|
details = format_task_test_checks(task_test_checks)
|
||||||
|
raise ConfigError(f"Config error: missing configured task test files.\n{details}")
|
||||||
incomplete = sum(1 for task in tasks if not task.completed)
|
incomplete = sum(1 for task in tasks if not task.completed)
|
||||||
print(f"Config valid: {config.path}")
|
print(f"Config valid: {config.path}")
|
||||||
print(f"Project: {config.project.name}")
|
print(f"Project: {config.project.name}")
|
||||||
print(f"Stages: {len(config.pipeline.stages)}")
|
print(f"Stages: {len(config.pipeline.stages)}")
|
||||||
print(f"Tasks: {len(tasks)}")
|
print(f"Tasks: {len(tasks)}")
|
||||||
print(f"Incomplete tasks: {incomplete}")
|
print(f"Incomplete tasks: {incomplete}")
|
||||||
|
print(format_task_test_checks(task_test_checks))
|
||||||
return 0
|
return 0
|
||||||
|
|
||||||
if args.command == "run":
|
if args.command == "run":
|
||||||
|
|
@ -256,6 +292,25 @@ def main(argv: list[str] | None = None) -> int:
|
||||||
print(format_setup_result(result))
|
print(format_setup_result(result))
|
||||||
return 0
|
return 0
|
||||||
|
|
||||||
|
if args.command == "integ-test":
|
||||||
|
result = run_integration_test(
|
||||||
|
args.root,
|
||||||
|
template=args.template,
|
||||||
|
task=args.task,
|
||||||
|
all_tasks=args.all,
|
||||||
|
keep=args.keep,
|
||||||
|
setup_extras=tuple(args.setup_extra or ()),
|
||||||
|
skip_setup_validate=args.setup_skip_validate,
|
||||||
|
dry_run=args.dry_run,
|
||||||
|
)
|
||||||
|
print(format_integration_test_result(result))
|
||||||
|
return result.exit_code
|
||||||
|
|
||||||
|
if args.command == "integ-report":
|
||||||
|
report = build_integration_report(args.root, latest=True)
|
||||||
|
print(format_integration_report(report))
|
||||||
|
return 0
|
||||||
|
|
||||||
except NightShiftError as exc:
|
except NightShiftError as exc:
|
||||||
print(str(exc), file=sys.stderr)
|
print(str(exc), file=sys.stderr)
|
||||||
return 1
|
return 1
|
||||||
|
|
|
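One property of the `--setup-extra` definition above is easy to overlook. With `action="append"` and a non-empty list default, argparse appends user-supplied values to the default instead of replacing it. The sketch below uses a reduced, standalone parser (`build_sketch_parser` is a hypothetical name, not part of NightShift) to show this.

```python
import argparse


def build_sketch_parser() -> argparse.ArgumentParser:
    """Reduced copy of the integ-test option, for illustration only."""
    parser = argparse.ArgumentParser(prog="integ-test-sketch")
    parser.add_argument(
        "--setup-extra",
        action="append",
        default=["pytest"],
        help="Extra package to install during setup. May be repeated.",
    )
    return parser


parser = build_sketch_parser()
defaults = parser.parse_args([])
custom = parser.parse_args(["--setup-extra", "flask"])

print(defaults.setup_extra)  # ['pytest']
print(custom.setup_extra)    # ['pytest', 'flask']
```

In other words, `--setup-extra flask` installs both `pytest` and `flask`. The default entry cannot be dropped through this flag alone.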
||||||
|
|
@ -5,6 +5,7 @@ from __future__ import annotations
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
import os
|
import os
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
import re
|
||||||
import shlex
|
import shlex
|
||||||
import subprocess
|
import subprocess
|
||||||
import sys
|
import sys
|
||||||
|
|
@ -68,11 +69,16 @@ class CommandExecutor:
|
||||||
command_index=index,
|
command_index=index,
|
||||||
command=command,
|
command=command,
|
||||||
)
|
)
|
||||||
|
rendered_command = render_command_template(command, task_id)
|
||||||
|
rendered_allowed_commands = tuple(
|
||||||
|
render_command_template(allowed, task_id) for allowed in self.safety.allowed_commands
|
||||||
|
)
|
||||||
run = self.run_command(
|
run = self.run_command(
|
||||||
command,
|
rendered_command,
|
||||||
shell=stage.shell,
|
shell=stage.shell,
|
||||||
timeout_seconds=stage.timeout_seconds,
|
timeout_seconds=stage.timeout_seconds,
|
||||||
working_dir=stage.working_dir,
|
working_dir=stage.working_dir,
|
||||||
|
allowed_commands=rendered_allowed_commands,
|
||||||
)
|
)
|
||||||
runs.append(run)
|
runs.append(run)
|
||||||
self.logger.event(
|
self.logger.event(
|
||||||
|
|
@ -120,11 +126,12 @@ class CommandExecutor:
|
||||||
shell: bool = True,
|
shell: bool = True,
|
||||||
timeout_seconds: int | None = None,
|
timeout_seconds: int | None = None,
|
||||||
working_dir: Path | None = None,
|
working_dir: Path | None = None,
|
||||||
|
allowed_commands: tuple[str, ...] | None = None,
|
||||||
) -> CommandRun:
|
) -> CommandRun:
|
||||||
try:
|
try:
|
||||||
normalized = ensure_command_allowed(
|
normalized = ensure_command_allowed(
|
||||||
command,
|
command,
|
||||||
self.safety.allowed_commands,
|
allowed_commands if allowed_commands is not None else self.safety.allowed_commands,
|
||||||
self.safety.forbidden_commands,
|
self.safety.forbidden_commands,
|
||||||
)
|
)
|
||||||
except SafetyError as exc:
|
except SafetyError as exc:
|
||||||
|
|
@ -210,6 +217,27 @@ def format_command_runs(stage_id: str, runs: list[CommandRun]) -> str:
|
||||||
return "\n".join(lines)
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
def render_command_template(command: str, task_id: str) -> str:
|
||||||
|
task_id_lower = task_id.lower()
|
||||||
|
task_id_slug = task_id_lower.replace("-", "_")
|
||||||
|
task_id_compact = task_id_lower.replace("-", "")
|
||||||
|
return command.format(
|
||||||
|
task_id=task_id,
|
||||||
|
task_id_lower=task_id_lower,
|
||||||
|
task_id_slug=task_id_slug,
|
||||||
|
task_id_compact=task_id_compact,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def extract_test_file_paths(command: str) -> tuple[str, ...]:
|
||||||
|
paths: list[str] = []
|
||||||
|
for match in re.finditer(r"(?<![\w./\\-])(tests[\\/][^\s`'\"<>|&;]+\.py)", command):
|
||||||
|
path = match.group(1).replace("\\", "/")
|
||||||
|
if path not in paths:
|
||||||
|
paths.append(path)
|
||||||
|
return tuple(paths)
|
||||||
|
|
||||||
|
|
||||||
def _coerce_output(value: str | bytes | None) -> str:
|
def _coerce_output(value: str | bytes | None) -> str:
|
||||||
if value is None:
|
if value is None:
|
||||||
return ""
|
return ""
|
||||||
|
|
|
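The two helpers added above can be exercised on their own. The sketch below repeats their logic in a standalone form (same regular expression, same placeholders) and applies it to a task-specific pytest command. The command string and the task id `TASK-001` are illustrative values.

```python
import re
from typing import Tuple


def render_command_template(command: str, task_id: str) -> str:
    """Fill the task placeholders in a configured command."""
    task_id_lower = task_id.lower()
    return command.format(
        task_id=task_id,
        task_id_lower=task_id_lower,
        task_id_slug=task_id_lower.replace("-", "_"),
        task_id_compact=task_id_lower.replace("-", ""),
    )


def extract_test_file_paths(command: str) -> Tuple[str, ...]:
    """Return unique tests/...py paths mentioned in a command, with forward slashes."""
    paths = []
    for match in re.finditer(r"(?<![\w./\\-])(tests[\\/][^\s`'\"<>|&;]+\.py)", command):
        path = match.group(1).replace("\\", "/")
        if path not in paths:
            paths.append(path)
    return tuple(paths)


template = "python -m pytest tests/test_{task_id_compact}.py -q"
rendered = render_command_template(template, "TASK-001")
print(rendered)
print(extract_test_file_paths(rendered))
```

Because rendering goes through `str.format`, a configured command that contains literal braces would need them doubled (`{{` and `}}`), otherwise `format` raises an error.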
||||||
|
|
@ -46,6 +46,10 @@ class AgentConfig:
|
||||||
temperature: float | None = None
|
temperature: float | None = None
|
||||||
base_url: str | None = None
|
base_url: str | None = None
|
||||||
api_key_env: str | None = None
|
api_key_env: str | None = None
|
||||||
|
num_ctx: int | None = None
|
||||||
|
num_predict: int | None = None
|
||||||
|
seed: int | None = None
|
||||||
|
stop: tuple[str, ...] = ()
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
|
|
@ -207,10 +211,18 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
|
||||||
agent_raw.get("temperature"),
|
agent_raw.get("temperature"),
|
||||||
f"agents.{agent_id}.temperature",
|
f"agents.{agent_id}.temperature",
|
||||||
)
|
)
|
||||||
|
num_ctx = _optional_int_or_none(agent_raw.get("num_ctx"), f"agents.{agent_id}.num_ctx")
|
||||||
|
num_predict = _optional_int_or_none(agent_raw.get("num_predict"), f"agents.{agent_id}.num_predict")
|
||||||
|
seed = _optional_int_or_none(agent_raw.get("seed"), f"agents.{agent_id}.seed")
|
||||||
|
stop = _string_tuple(agent_raw.get("stop", []), f"agents.{agent_id}.stop")
|
||||||
if temperature is not None and temperature < 0:
|
if temperature is not None and temperature < 0:
|
||||||
raise ConfigError(
|
raise ConfigError(
|
||||||
f"Config error: agents.{agent_id}.temperature must be zero or greater."
|
f"Config error: agents.{agent_id}.temperature must be zero or greater."
|
||||||
)
|
)
|
||||||
|
if num_ctx is not None and num_ctx <= 0:
|
||||||
|
raise ConfigError(f"Config error: agents.{agent_id}.num_ctx must be greater than zero.")
|
||||||
|
if num_predict is not None and num_predict <= 0:
|
||||||
|
raise ConfigError(f"Config error: agents.{agent_id}.num_predict must be greater than zero.")
|
||||||
if backend not in {"command", "ollama", "openai_compatible"}:
|
if backend not in {"command", "ollama", "openai_compatible"}:
|
||||||
raise ConfigError(
|
raise ConfigError(
|
||||||
f"Config error: agent '{agent_id}' uses unsupported backend '{backend}'. "
|
f"Config error: agent '{agent_id}' uses unsupported backend '{backend}'. "
|
||||||
|
|
@ -243,6 +255,10 @@ def parse_config(raw: dict[str, Any], config_path: Path) -> NightShiftConfig:
|
||||||
temperature=temperature,
|
temperature=temperature,
|
||||||
base_url=base_url,
|
base_url=base_url,
|
||||||
api_key_env=api_key_env,
|
api_key_env=api_key_env,
|
||||||
|
num_ctx=num_ctx,
|
||||||
|
num_predict=num_predict,
|
||||||
|
seed=seed,
|
||||||
|
stop=stop,
|
||||||
)
|
)
|
||||||
|
|
||||||
experiment_raw = raw.get("experiment", {})
|
experiment_raw = raw.get("experiment", {})
|
||||||
|
|
|
||||||
71
nightshift/integ_report.py
Normal file
|
|
@ -0,0 +1,71 @@
|
||||||
|
"""Summarize integration run artifacts."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
import re
|
||||||
|
|
||||||
|
from .errors import NightShiftError
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class IntegrationReport:
|
||||||
|
integration_run: Path
|
||||||
|
nightshift_run: Path | None
|
||||||
|
lines: tuple[str, ...]
|
||||||
|
|
||||||
|
|
||||||
|
def build_integration_report(root: str | Path = ".", *, latest: bool = True) -> IntegrationReport:
|
||||||
|
base = Path(root).resolve() / "integ_runs"
|
||||||
|
if not base.exists():
|
||||||
|
raise NightShiftError(f"Integration report error: no integ_runs directory found: {base}")
|
||||||
|
runs = sorted((path for path in base.iterdir() if path.is_dir()), key=lambda path: path.name, reverse=True)
|
||||||
|
if not runs:
|
||||||
|
raise NightShiftError(f"Integration report error: no integration runs found under: {base}")
|
||||||
|
integration_run = runs[0] if latest else runs[0]
|
||||||
|
artifacts_root = integration_run / "project" / ".nightshift" / "runs"
|
||||||
|
if not artifacts_root.exists():
|
||||||
|
return IntegrationReport(
|
||||||
|
integration_run,
|
||||||
|
None,
|
||||||
|
("No NightShift run artifacts found. Setup may have failed before task execution.",),
|
||||||
|
)
|
||||||
|
nightshift_runs = sorted((path for path in artifacts_root.iterdir() if path.is_dir()), key=lambda path: path.name, reverse=True)
|
||||||
|
if not nightshift_runs:
|
||||||
|
return IntegrationReport(integration_run, None, ("No NightShift run directories found.",))
|
||||||
|
nightshift_run = nightshift_runs[0]
|
||||||
|
summaries = sorted(nightshift_run.glob("tasks/*/run-summary.md"))
|
||||||
|
if not summaries and (nightshift_run / "run-summary.md").exists():
|
||||||
|
summaries = [nightshift_run / "run-summary.md"]
|
||||||
|
lines = [_summarize_run_summary(path, integration_run) for path in summaries]
|
||||||
|
return IntegrationReport(integration_run, nightshift_run, tuple(lines or ("No task summaries found.",)))
|
||||||
|
|
||||||
|
|
||||||
|
def format_integration_report(report: IntegrationReport) -> str:
|
||||||
|
lines = [f"Integration run: {report.integration_run}"]
|
||||||
|
if report.nightshift_run is not None:
|
||||||
|
lines.append(f"NightShift run: {report.nightshift_run}")
|
||||||
|
lines.append("")
|
||||||
|
lines.extend(f"- {line}" for line in report.lines)
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
def _summarize_run_summary(path: Path, integration_run: Path) -> str:
|
||||||
|
text = path.read_text(encoding="utf-8", errors="replace")
|
||||||
|
task = _field(text, "Task") or path.parent.name
|
||||||
|
status = _field(text, "Status") or "unknown"
|
||||||
|
retries = _field(text, "Retry count") or "unknown"
|
||||||
|
reason = _field(text, "Reason") or "no reason recorded"
|
||||||
|
try:
|
||||||
|
relative = path.relative_to(integration_run)
|
||||||
|
except ValueError:
|
||||||
|
relative = path
|
||||||
|
return f"{task} {status} after {retries} retries. Reason: {reason}. Artifacts: {relative.parent}"
|
||||||
|
|
||||||
|
|
||||||
|
def _field(text: str, name: str) -> str | None:
|
||||||
|
match = re.search(rf"^- {re.escape(name)}:\s*(.+)$", text, flags=re.MULTILINE)
|
||||||
|
if not match:
|
||||||
|
return None
|
||||||
|
return match.group(1).strip()
|
||||||
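To make the report format concrete, here is a small sketch of the field extraction used by `_summarize_run_summary`. The sample summary text is an assumption: it follows the `- Name: value` bullet shape that the regular expression in `_field` expects, but the real `run-summary.md` may contain more fields.

```python
import re
from typing import Optional


def field(text: str, name: str) -> Optional[str]:
    """Return the value of a '- Name: value' bullet, or None if it is absent."""
    match = re.search(rf"^- {re.escape(name)}:\s*(.+)$", text, flags=re.MULTILINE)
    if not match:
        return None
    return match.group(1).strip()


# Hypothetical run-summary.md content, for illustration only.
SAMPLE_SUMMARY = """# Run Summary

- Task: TASK-001
- Status: failed
- Retry count: 2
- Reason: pytest exited with code 1
"""

task = field(SAMPLE_SUMMARY, "Task") or "unknown"
status = field(SAMPLE_SUMMARY, "Status") or "unknown"
retries = field(SAMPLE_SUMMARY, "Retry count") or "unknown"
reason = field(SAMPLE_SUMMARY, "Reason") or "no reason recorded"
line = f"{task} {status} after {retries} retries. Reason: {reason}."
print(line)
```

Missing fields fall back to placeholder strings rather than raising, which keeps the report usable when a run stopped early.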
71
nightshift/integ_test.py
Normal file
|
|
@ -0,0 +1,71 @@
|
||||||
|
"""End-to-end integration test wrapper."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
import subprocess
|
||||||
|
|
||||||
|
from .errors import NightShiftError
|
||||||
|
from .integ import IntegrationRun, create_integration_run
|
||||||
|
from .integ_setup import IntegrationSetupResult, setup_python_project
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class IntegrationTestResult:
|
||||||
|
run: IntegrationRun
|
||||||
|
setup: IntegrationSetupResult
|
||||||
|
command: tuple[str, ...]
|
||||||
|
exit_code: int
|
||||||
|
dry_run: bool
|
||||||
|
|
||||||
|
|
||||||
|
def run_integration_test(
|
||||||
|
root: str | Path = ".",
|
||||||
|
*,
|
||||||
|
template: str = "tutorial-pastebin",
|
||||||
|
task: str | None = None,
|
||||||
|
all_tasks: bool = False,
|
||||||
|
keep: int | None = None,
|
||||||
|
setup_extras: tuple[str, ...] = ("pytest",),
|
||||||
|
skip_setup_validate: bool = False,
|
||||||
|
dry_run: bool = False,
|
||||||
|
) -> IntegrationTestResult:
|
||||||
|
if task and all_tasks:
|
||||||
|
raise NightShiftError("Integration test error: use either --task or --all, not both.")
|
||||||
|
if not task and not all_tasks:
|
||||||
|
raise NightShiftError("Integration test error: provide --task or --all.")
|
||||||
|
|
||||||
|
run = create_integration_run(Path(root), template=template, keep=keep)
|
||||||
|
project = run.directory / "project"
|
||||||
|
setup = setup_python_project(
|
||||||
|
project,
|
||||||
|
extras=setup_extras,
|
||||||
|
validate=not skip_setup_validate,
|
||||||
|
dry_run=dry_run,
|
||||||
|
)
|
||||||
|
command = [str(setup.python), "-m", "nightshift.cli", "run", "--no-animation"]
|
||||||
|
if all_tasks:
|
||||||
|
command.append("--all")
|
||||||
|
else:
|
||||||
|
command.extend(["--task", task or ""])
|
||||||
|
|
||||||
|
exit_code = 0
|
||||||
|
if not dry_run:
|
||||||
|
completed = subprocess.run(command, cwd=project, text=True, encoding="utf-8", errors="replace")
|
||||||
|
exit_code = completed.returncode
|
||||||
|
return IntegrationTestResult(run, setup, tuple(command), exit_code, dry_run)
|
||||||
|
|
||||||
|
|
||||||
|
def format_integration_test_result(result: IntegrationTestResult) -> str:
|
||||||
|
lines = [
|
||||||
|
f"Integration run: {result.run.directory}",
|
||||||
|
f"Project: {result.run.directory / 'project'}",
|
||||||
|
f"Venv: {result.run.venv_dir}",
|
||||||
|
f"Run command: {' '.join(result.command)}",
|
||||||
|
f"Exit code: {result.exit_code}",
|
||||||
|
f"Artifacts: {result.run.directory / 'project' / '.nightshift'}",
|
||||||
|
]
|
||||||
|
if result.dry_run:
|
||||||
|
lines.insert(3, "Dry run: true")
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
@ -9,7 +9,7 @@ import subprocess
|
||||||
|
|
||||||
from .agents import AgentExecutor
|
from .agents import AgentExecutor
|
||||||
from .artifacts import ArtifactStore
|
from .artifacts import ArtifactStore
|
||||||
from .commands import CommandExecutor
|
from .commands import CommandExecutor, extract_test_file_paths, render_command_template
|
||||||
from .config import COMMAND_STAGE_TYPES, NightShiftConfig, StageConfig
|
from .config import COMMAND_STAGE_TYPES, NightShiftConfig, StageConfig
|
||||||
from .context import ContextManager
|
from .context import ContextManager
|
||||||
from .dependencies import diagnose_python_dependencies, format_dependency_diagnostic
|
from .dependencies import diagnose_python_dependencies, format_dependency_diagnostic
|
||||||
|
|
@ -145,6 +145,12 @@ class PipelineRunner:
|
||||||
index = 0
|
index = 0
|
||||||
final_status = "complete"
|
final_status = "complete"
|
||||||
final_reason = "Pipeline completed."
|
final_reason = "Pipeline completed."
|
||||||
|
preflight_result = self._preflight_task(task, stages)
|
||||||
|
if preflight_result:
|
||||||
|
stage_results.append(preflight_result)
|
||||||
|
final_status = "failed"
|
||||||
|
final_reason = preflight_result.reason
|
||||||
|
index = len(stages)
|
||||||
|
|
||||||
while index < len(stages):
|
while index < len(stages):
|
||||||
stage = stages[index]
|
stage = stages[index]
|
||||||
|
|
@ -248,6 +254,13 @@ class PipelineRunner:
|
||||||
"retry-memory.md",
|
"retry-memory.md",
|
||||||
summarize_retry_memory(tuple(retry_memory)),
|
summarize_retry_memory(tuple(retry_memory)),
|
||||||
)
|
)
|
||||||
|
if _repeated_protected_path_violation(tuple(retry_memory)):
|
||||||
|
final_status = "failed"
|
||||||
|
final_reason = (
|
||||||
|
"Escalation policy stopped retries: implementation repeatedly "
|
||||||
|
"attempted to modify paths outside the stage allowlist."
|
||||||
|
)
|
||||||
|
break
|
||||||
decision = evaluate_retry_churn(
|
decision = evaluate_retry_churn(
|
||||||
tuple(retry_memory),
|
tuple(retry_memory),
|
||||||
retry_budget=self.config.pipeline.max_task_retries + 1,
|
retry_budget=self.config.pipeline.max_task_retries + 1,
|
||||||
|
|
@ -334,6 +347,45 @@ class PipelineRunner:
|
||||||
reason=final_reason,
|
reason=final_reason,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
def _preflight_task(self, task: Task, stages: list[StageConfig]) -> StageResult | None:
|
||||||
|
missing_paths: list[str] = []
|
||||||
|
for stage in stages:
|
||||||
|
if stage.type not in COMMAND_STAGE_TYPES:
|
||||||
|
continue
|
||||||
|
for command in stage.commands:
|
||||||
|
rendered = render_command_template(command, task.id)
|
||||||
|
for path_text in extract_test_file_paths(rendered):
|
||||||
|
if not (self.config.project.root / path_text).exists():
|
||||||
|
missing_paths.append(path_text)
|
||||||
|
if not missing_paths:
|
||||||
|
return None
|
||||||
|
unique_paths = tuple(dict.fromkeys(missing_paths))
|
||||||
|
details = "\n".join(f"- `{path}`" for path in unique_paths)
|
||||||
|
output_path = self.artifacts.write_stage_output(
|
||||||
|
task.id,
|
||||||
|
"preflight.md",
|
||||||
|
"\n".join(
|
||||||
|
[
|
||||||
|
"# Task Preflight",
|
||||||
|
"",
|
||||||
|
"Status: fail",
|
||||||
|
"Reason: configured task test file is missing.",
|
||||||
|
"",
|
||||||
|
"## Missing Files",
|
||||||
|
"",
|
||||||
|
details,
|
||||||
|
"",
|
||||||
|
]
|
||||||
|
),
|
||||||
|
)
|
||||||
|
return StageResult(
|
||||||
|
"preflight",
|
||||||
|
"fail",
|
||||||
|
"Task preflight failed: configured task test file is missing: "
|
||||||
|
+ ", ".join(unique_paths),
|
||||||
|
output_path=str(output_path.relative_to(self.config.project.root)),
|
||||||
|
)
|
||||||
|
|
||||||
def run_tasks(self, tasks: list[Task] | tuple[Task, ...]) -> MultiTaskResult:
|
def run_tasks(self, tasks: list[Task] | tuple[Task, ...]) -> MultiTaskResult:
|
||||||
self.artifacts.initialize_run()
|
self.artifacts.initialize_run()
|
||||||
self.logger.bind(self.artifacts)
|
self.logger.bind(self.artifacts)
|
||||||
|
|
@ -1428,6 +1480,18 @@ def _extract_exit_code(text: str) -> int | None:
|
||||||
return None
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _repeated_protected_path_violation(entries: tuple[RetryMemoryEntry, ...]) -> bool:
|
||||||
|
recent = entries[-2:]
|
||||||
|
if len(recent) < 2:
|
||||||
|
return False
|
||||||
|
return all(_is_protected_path_violation(entry.cause) for entry in recent)
|
||||||
|
|
||||||
|
|
||||||
|
def _is_protected_path_violation(text: str) -> bool:
|
||||||
|
lowered = text.lower()
|
||||||
|
return "not allowed for this stage" in lowered and "tests/" in lowered.replace("\\", "/")
|
||||||
|
|
||||||
|
|
||||||
def format_aggregate_run_summary(results: list[PipelineResult], status: str, reason: str) -> str:
|
def format_aggregate_run_summary(results: list[PipelineResult], status: str, reason: str) -> str:
|
||||||
lines = [
|
lines = [
|
||||||
"# Run Summary",
|
"# Run Summary",
|
||||||
|
|
|
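The escalation check added in this file can be illustrated in isolation. In the sketch below, `RetryEntrySketch` is a hypothetical stand-in for `RetryMemoryEntry` (only its `cause` attribute is used by the check), and the cause strings are invented examples. The matching logic mirrors the two helper functions above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RetryEntrySketch:
    """Hypothetical stand-in for RetryMemoryEntry; only `cause` matters here."""

    cause: str


def is_protected_path_violation(text: str) -> bool:
    """True when the cause mentions a rejected write under tests/ (either slash style)."""
    lowered = text.lower()
    return "not allowed for this stage" in lowered and "tests/" in lowered.replace("\\", "/")


def repeated_protected_path_violation(entries: Tuple[RetryEntrySketch, ...]) -> bool:
    """True only when the two most recent attempts both hit a protected path."""
    recent = entries[-2:]
    if len(recent) < 2:
        return False
    return all(is_protected_path_violation(entry.cause) for entry in recent)


violation = RetryEntrySketch("Path tests/test_task001.py is not allowed for this stage.")
other = RetryEntrySketch("pytest exited with code 1")

print(repeated_protected_path_violation((violation,)))
print(repeated_protected_path_violation((other, violation, violation)))
print(repeated_protected_path_violation((violation, other)))
```

A single violation does not stop the retry loop. Two in a row do, and an unrelated failure in between resets the condition because only the last two entries are inspected.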
||||||
|
|
@ -1,9 +1,11 @@
|
||||||
You are the debugger agent for the NightShift pastebin tutorial.
|
You are the debugger agent for the NightShift pastebin tutorial.
|
||||||
|
|
||||||
Diagnose failed attempts without editing files.
|
Diagnose failed attempts without editing files.
|
||||||
Distinguish inaccurate generated tests from implementation bugs.
|
Distinguish fixed-test/template problems from implementation bugs.
|
||||||
If tests are inaccurate for the current task, recommend retrying `write_tests`.
|
This tutorial uses fixed task tests and task-specific pytest commands. Do not recommend `write_tests` unless the configured pipeline actually has a `write_tests` stage.
|
||||||
|
If a current task appears to lack tests, report a template or test-selection problem.
|
||||||
If implementation is wrong, recommend the smallest implementation repair and name files that should not be modified.
|
If implementation is wrong, recommend the smallest implementation repair and name files that should not be modified.
|
||||||
|
Implementation agents must not edit files under `tests/`.
|
||||||
Return:
|
Return:
|
||||||
- concise diagnosis
|
- concise diagnosis
|
||||||
- recommended next action
|
- recommended next action
|
||||||
|
|
|
||||||
|
|
@ -7,8 +7,10 @@ Do not add behavior for future tasks unless needed to satisfy the current tests.
|
||||||
Use Flask and `sqlite3` from the Python standard library. Do not use SQLAlchemy, Flask-SQLAlchemy, or undeclared dependencies.
|
Use Flask and `sqlite3` from the Python standard library. Do not use SQLAlchemy, Flask-SQLAlchemy, or undeclared dependencies.
|
||||||
Keep the public package name `pastebin_app`.
|
Keep the public package name `pastebin_app`.
|
||||||
Keep the public app entry point `create_app(database_path: str | None = None)`.
|
Keep the public app entry point `create_app(database_path: str | None = None)`.
|
||||||
|
Respect `database_path`; do not hard-code `snippets.db` when a database path is supplied.
|
||||||
Tests should interact through HTTP routes and `create_app`, not through ORM/session globals.
|
Tests should interact through HTTP routes and `create_app`, not through ORM/session globals.
|
||||||
Do not use `app.before_first_request`; recent Flask versions removed it. Initialize required database tables inside `create_app` or inside the route helper before use.
|
Do not use `app.before_first_request`; recent Flask versions removed it. Initialize required database tables inside `create_app` or inside the route helper before use.
|
||||||
|
When adding columns to an existing sqlite table, handle existing databases idempotently with `ALTER TABLE` checks or a simple migration helper. `CREATE TABLE IF NOT EXISTS` does not add columns to an existing table.
|
||||||
|
|
||||||
Output only complete file content blocks.
|
Output only complete file content blocks.
|
||||||
Use one fenced block per file:
|
Use one fenced block per file:
|
||||||
|
|
|
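The idempotent column addition that the implementer prompt asks for could look like the sketch below. It uses only `sqlite3` from the standard library. The table name `snippets`, the column `title`, and the helper names `ensure_column` and `init_db` are assumptions chosen for illustration, not names taken from the tutorial's tests.

```python
import sqlite3


def ensure_column(connection: sqlite3.Connection, table: str, column: str, definition: str) -> None:
    """Add `column` to `table` only if it is missing.

    `table`, `column`, and `definition` are interpolated into SQL, so they
    must be trusted constants, never user input.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in connection.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        connection.execute(f"ALTER TABLE {table} ADD COLUMN {column} {definition}")


def init_db(database_path: str) -> sqlite3.Connection:
    """Create the base table, then apply additive migrations; safe to call repeatedly."""
    connection = sqlite3.connect(database_path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS snippets (id INTEGER PRIMARY KEY, content TEXT NOT NULL)"
    )
    ensure_column(connection, "snippets", "title", "TEXT")
    connection.commit()
    return connection
```

Calling `init_db` from inside `create_app` with the supplied `database_path` would cover both a fresh database and one created by an earlier task, since `CREATE TABLE IF NOT EXISTS` alone leaves an existing table's columns unchanged.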
||||||
|
|
@ -14,6 +14,12 @@ Or create an isolated integration sandbox from the NightShift repository root:
|
||||||
python -m nightshift.cli integ-run --template tutorial-pastebin
|
python -m nightshift.cli integ-run --template tutorial-pastebin
|
||||||
```
|
```
|
||||||
|
|
||||||
|
To create, set up, validate, and run one task in a single command:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m nightshift.cli integ-test --template tutorial-pastebin --task TASK-001
|
||||||
|
```
|
||||||
|
|
||||||
To create the sandbox and set it up in one step:
|
To create the sandbox and set it up in one step:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|
@ -48,12 +54,8 @@ nightshift what-happened
|
||||||
|
|
||||||
When running from an integration sandbox, the same commands are run inside `integ_runs/<timestamp>/project`.
|
When running from an integration sandbox, the same commands are run inside `integ_runs/<timestamp>/project`.
|
||||||
|
|
||||||
The pipeline uses model fallback ordering for implementation attempts:
|
The default pastebin pipeline uses `qwen3-coder:30b` for planning, implementation, debugging, test review, and final review. It intentionally does not use multi-candidate fallback; pastebin is the deterministic reliability harness.
|
||||||
|
|
||||||
1. `qwen2.5-coder:14b`
|
|
||||||
2. `carstenuhlig/omnicoder-9b`
|
|
||||||
3. `deepseek-coder-v2:16b`
|
|
||||||
|
|
||||||
Telemetry artifacts record which agent/model handled each stage and estimate token usage.
|
Telemetry artifacts record which agent/model handled each stage and estimate token usage.
|
||||||
|
|
||||||
This template uses a TDD-oriented pipeline. It starts with a skeletal package, generates task-specific pytest tests from the current task acceptance criteria, reviews those tests for scope, and then implements only enough application code to pass them.
|
This template uses fixed task-specific pytest files. The pipeline starts with a skeletal package, implements only the current task, runs `tests/test_{task_id_compact}.py`, and then reviews the result.
|
||||||
|
|
|
||||||
|
|
@ -20,51 +20,49 @@ safety:
|
||||||
- curl | bash
|
- curl | bash
|
||||||
|
|
||||||
experiment:
|
experiment:
|
||||||
label: pastebin-model-fallback
|
label: pastebin-qwen3-coder
|
||||||
prompt_variant: tdd-qwen-omnicoder-deepseek-v2
|
prompt_variant: fixed-tests-qwen3-coder-30b-v1
|
||||||
|
|
||||||
agents:
|
agents:
|
||||||
planner:
|
planner:
|
||||||
backend: ollama
|
backend: ollama
|
||||||
model: qwen2.5-coder:14b
|
model: qwen3-coder:30b
|
||||||
temperature: 0.2
|
temperature: 0.2
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
system_prompt: .nightshift/agents/planner.md
|
system_prompt: .nightshift/agents/planner.md
|
||||||
|
|
||||||
implementer_qwen:
|
implementer:
|
||||||
backend: ollama
|
backend: ollama
|
||||||
model: qwen2.5-coder:14b
|
model: qwen3-coder:30b
|
||||||
temperature: 0.1
|
temperature: 0.1
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
system_prompt: .nightshift/agents/implementer.md
|
system_prompt: .nightshift/agents/implementer.md
|
||||||
|
|
||||||
test_writer:
|
test_writer:
|
||||||
backend: ollama
|
backend: ollama
|
||||||
model: qwen2.5-coder:14b
|
model: qwen3-coder:30b
|
||||||
temperature: 0.1
|
temperature: 0.1
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
system_prompt: .nightshift/agents/test-writer.md
|
system_prompt: .nightshift/agents/test-writer.md
|
||||||
|
|
||||||
implementer_omnicoder:
|
|
||||||
backend: ollama
|
|
||||||
model: carstenuhlig/omnicoder-9b
|
|
||||||
temperature: 0.1
|
|
||||||
system_prompt: .nightshift/agents/implementer.md
|
|
||||||
|
|
||||||
implementer_deepseek:
|
|
||||||
backend: ollama
|
|
||||||
model: deepseek-coder-v2:16b
|
|
||||||
temperature: 0.1
|
|
||||||
system_prompt: .nightshift/agents/implementer.md
|
|
||||||
|
|
||||||
debugger:
|
debugger:
|
||||||
backend: ollama
|
backend: ollama
|
||||||
model: qwen2.5-coder:14b
|
model: qwen3-coder:30b
|
||||||
role: debugger
|
role: debugger
|
||||||
temperature: 0.1
|
temperature: 0.1
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
system_prompt: .nightshift/agents/debugger.md
|
system_prompt: .nightshift/agents/debugger.md
|
||||||
|
|
||||||
reviewer:
|
reviewer:
|
||||||
backend: ollama
|
backend: ollama
|
||||||
model: qwen2.5-coder:14b
|
model: qwen3-coder:30b
|
||||||
temperature: 0.1
|
temperature: 0.1
|
||||||
|
num_ctx: 8192
|
||||||
|
num_predict: 4096
|
||||||
system_prompt: .nightshift/agents/reviewer.md
|
system_prompt: .nightshift/agents/reviewer.md
|
||||||
|
|
||||||
pipeline:
|
pipeline:
|
||||||
|
|
@ -87,10 +85,7 @@ pipeline:
|
||||||
|
|
||||||
- id: implement
|
- id: implement
|
||||||
type: file_writer
|
type: file_writer
|
||||||
agent_pool:
|
agent: implementer
|
||||||
- implementer_qwen
|
|
||||||
- implementer_omnicoder
|
|
||||||
- implementer_deepseek
|
|
||||||
output: proposed.patch
|
output: proposed.patch
|
||||||
|
|
||||||
- id: normalize
|
- id: normalize
|
||||||
|
|
|
||||||
|
|
@ -16,6 +16,7 @@ def test_create_snippet_returns_created_snippet_id(tmp_path):
|
||||||
assert response.status_code == 201
|
assert response.status_code == 201
|
||||||
data = response.get_json()
|
data = response.get_json()
|
||||||
assert isinstance(data["id"], int)
|
assert isinstance(data["id"], int)
|
||||||
|
assert (tmp_path / "snippets.db").exists()
|
||||||
|
|
||||||
|
|
||||||
def test_view_snippet_returns_persisted_fields(tmp_path):
|
def test_view_snippet_returns_persisted_fields(tmp_path):
|
||||||
|
|
@ -38,6 +39,7 @@ def test_view_snippet_returns_persisted_fields(tmp_path):
|
||||||
"title": "View me",
|
"title": "View me",
|
||||||
"body": "stored body",
|
"body": "stored body",
|
||||||
}
|
}
|
||||||
|
assert (tmp_path / "snippets.db").exists()
|
||||||
|
|
||||||
|
|
||||||
def test_view_missing_snippet_returns_404(tmp_path):
|
def test_view_missing_snippet_returns_404(tmp_path):
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,50 @@
|
||||||
|
from pastebin_app.app import create_app
|
||||||
|
|
||||||
|
|
||||||
|
def test_create_snippet_accepts_optional_metadata(tmp_path):
|
||||||
|
app = create_app(database_path=str(tmp_path / "snippets.db"))
|
||||||
|
client = app.test_client()
|
||||||
|
|
||||||
|
response = client.post(
|
||||||
|
"/snippets",
|
||||||
|
json={
|
||||||
|
"title": "Tagged",
|
||||||
|
"body": "metadata body",
|
||||||
|
"language": "python",
|
||||||
|
"tags": ["alpha", "beta"],
|
||||||
|
"expires_at": "2030-01-01T00:00:00",
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
assert response.status_code == 201
|
||||||
|
assert isinstance(response.get_json()["id"], int)
|
||||||
|
assert (tmp_path / "snippets.db").exists()
|
||||||
|
|
||||||
|
|
||||||
|
def test_view_snippet_returns_optional_metadata(tmp_path):
|
||||||
|
app = create_app(database_path=str(tmp_path / "snippets.db"))
|
||||||
|
client = app.test_client()
|
||||||
|
|
||||||
|
created = client.post(
|
||||||
|
"/snippets",
|
||||||
|
json={
|
||||||
|
"title": "Tagged",
|
||||||
|
"body": "metadata body",
|
||||||
|
"language": "python",
|
||||||
|
"tags": ["alpha", "beta"],
|
||||||
|
"expires_at": "2030-01-01T00:00:00",
|
||||||
|
},
|
||||||
|
).get_json()
|
||||||
|
|
||||||
|
response = client.get(f"/snippets/{created['id']}")
|
||||||
|
|
||||||
|
assert response.status_code == 200
|
||||||
|
assert response.get_json() == {
|
||||||
|
"id": created["id"],
|
||||||
|
"title": "Tagged",
|
||||||
|
"body": "metadata body",
|
||||||
|
"language": "python",
|
||||||
|
"tags": ["alpha", "beta"],
|
||||||
|
"expires_at": "2030-01-01T00:00:00",
|
||||||
|
}
|
||||||
|
assert (tmp_path / "snippets.db").exists()
|
||||||
|
|
@ -0,0 +1,47 @@
|
||||||
|
from pastebin_app.app import create_app
|
||||||
|
|
||||||
|
|
||||||
|
def _create(client, title, body, **metadata):
|
||||||
|
response = client.post("/snippets", json={"title": title, "body": body, **metadata})
|
||||||
|
assert response.status_code == 201
|
||||||
|
return response.get_json()["id"]
|
||||||
|
|
||||||
|
|
||||||
|
def test_list_snippets_newest_first(tmp_path):
|
||||||
|
app = create_app(database_path=str(tmp_path / "snippets.db"))
|
||||||
|
client = app.test_client()
|
||||||
|
|
||||||
|
first_id = _create(client, "First", "older")
|
||||||
|
second_id = _create(client, "Second", "newer")
|
||||||
|
|
||||||
|
response = client.get("/snippets")
|
||||||
|
|
||||||
|
assert response.status_code == 200
|
||||||
|
ids = [snippet["id"] for snippet in response.get_json()]
|
||||||
|
assert ids[:2] == [second_id, first_id]
|
||||||
|
|
||||||
|
|
||||||
|
def test_search_filters_by_title_or_body(tmp_path):
|
||||||
|
app = create_app(database_path=str(tmp_path / "snippets.db"))
|
||||||
|
client = app.test_client()
|
||||||
|
_create(client, "Python note", "ordinary body")
|
||||||
|
_create(client, "Other", "contains needle")
|
||||||
|
|
||||||
|
response = client.get("/snippets?q=python")
|
||||||
|
assert [snippet["title"] for snippet in response.get_json()] == ["Python note"]
|
||||||
|
|
||||||
|
response = client.get("/snippets?q=needle")
|
||||||
|
assert [snippet["title"] for snippet in response.get_json()] == ["Other"]
|
||||||
|
|
||||||
|
|
||||||
|
def test_language_and_tag_filters(tmp_path):
|
||||||
|
app = create_app(database_path=str(tmp_path / "snippets.db"))
|
||||||
|
client = app.test_client()
|
||||||
|
_create(client, "Python", "body", language="python", tags=["code", "demo"])
|
||||||
|
_create(client, "Text", "body", language="text", tags=["notes"])
|
||||||
|
|
||||||
|
response = client.get("/snippets?language=python")
|
||||||
|
assert [snippet["title"] for snippet in response.get_json()] == ["Python"]
|
||||||
|
|
||||||
|
response = client.get("/snippets?tag=notes")
|
||||||
|
assert [snippet["title"] for snippet in response.get_json()] == ["Text"]
|
||||||
|
|
@ -0,0 +1,43 @@
|
||||||
|
from pastebin_app.app import create_app
|
||||||
|
|
||||||
|
|
||||||
|
def test_expired_snippets_are_excluded_from_listing(tmp_path):
|
||||||
|
app = create_app(database_path=str(tmp_path / "snippets.db"))
|
||||||
|
client = app.test_client()
|
||||||
|
client.post(
|
||||||
|
"/snippets",
|
||||||
|
json={"title": "Expired", "body": "old", "expires_at": "2000-01-01T00:00:00"},
|
||||||
|
)
|
||||||
|
active = client.post(
|
||||||
|
"/snippets",
|
||||||
|
json={"title": "Active", "body": "new", "expires_at": "2999-01-01T00:00:00"},
|
||||||
|
).get_json()
|
||||||
|
|
||||||
|
response = client.get("/snippets")
|
||||||
|
|
||||||
|
assert response.status_code == 200
|
||||||
|
assert [snippet["id"] for snippet in response.get_json()] == [active["id"]]
|
||||||
|
|
||||||
|
|
||||||
|
def test_direct_lookup_of_expired_snippet_returns_410(tmp_path):
|
||||||
|
app = create_app(database_path=str(tmp_path / "snippets.db"))
|
||||||
|
client = app.test_client()
|
||||||
|
expired = client.post(
|
||||||
|
"/snippets",
|
||||||
|
json={"title": "Expired", "body": "old", "expires_at": "2000-01-01T00:00:00"},
|
||||||
|
).get_json()
|
||||||
|
|
||||||
|
response = client.get(f"/snippets/{expired['id']}")
|
||||||
|
|
||||||
|
assert response.status_code == 410
|
||||||
|
|
||||||
|
|
||||||
|
def test_non_expiring_snippet_remains_visible(tmp_path):
|
||||||
|
app = create_app(database_path=str(tmp_path / "snippets.db"))
|
||||||
|
client = app.test_client()
|
||||||
|
created = client.post("/snippets", json={"title": "Forever", "body": "body"}).get_json()
|
||||||
|
|
||||||
|
response = client.get(f"/snippets/{created['id']}")
|
||||||
|
|
||||||
|
assert response.status_code == 200
|
||||||
|
assert response.get_json()["title"] == "Forever"
|
||||||
|
|
@ -0,0 +1,46 @@
|
||||||
|
from pastebin_app.app import create_app
|
||||||
|
|
||||||
|
|
||||||
|
def test_root_shows_snippet_list_html(tmp_path):
|
||||||
|
app = create_app(database_path=str(tmp_path / "snippets.db"))
|
||||||
|
client = app.test_client()
|
||||||
|
client.post("/snippets", json={"title": "Visible", "body": "body"})
|
||||||
|
|
||||||
|
response = client.get("/")
|
||||||
|
|
||||||
|
assert response.status_code == 200
|
||||||
|
assert "Visible" in response.get_data(as_text=True)
|
||||||
|
|
||||||
|
|
||||||
|
def test_new_snippet_form_loads(tmp_path):
|
||||||
|
app = create_app(database_path=str(tmp_path / "snippets.db"))
|
||||||
|
client = app.test_client()
|
||||||
|
|
||||||
|
response = client.get("/new")
|
||||||
|
|
||||||
|
assert response.status_code == 200
|
||||||
|
html = response.get_data(as_text=True)
|
||||||
|
assert 'name="title"' in html
|
||||||
|
assert 'name="body"' in html
|
||||||
|
assert 'name="language"' in html
|
||||||
|
assert 'name="tags"' in html
|
||||||
|
assert 'name="expires_at"' in html
|
||||||
|
|
||||||
|
|
||||||
|
def test_form_post_redirects_to_snippet_view(tmp_path):
|
||||||
|
app = create_app(database_path=str(tmp_path / "snippets.db"))
|
||||||
|
client = app.test_client()
|
||||||
|
|
||||||
|
response = client.post(
|
||||||
|
"/new",
|
||||||
|
data={
|
||||||
|
"title": "Form title",
|
||||||
|
"body": "Form body",
|
||||||
|
"language": "text",
|
||||||
|
"tags": "forms,html",
|
||||||
|
"expires_at": "",
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
assert response.status_code == 302
|
||||||
|
assert response.headers["Location"].endswith("/snippets/1")
|
||||||
48
nightshift/task_tests.py
Normal file
|
|
@ -0,0 +1,48 @@
|
||||||
|
"""Task-specific test file validation."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from .commands import extract_test_file_paths, render_command_template
|
||||||
|
from .config import COMMAND_STAGE_TYPES, NightShiftConfig
|
||||||
|
from .tasks import Task
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class TaskTestCheck:
|
||||||
|
task_id: str
|
||||||
|
path: str
|
||||||
|
exists: bool
|
||||||
|
|
||||||
|
|
||||||
|
def check_task_test_files(config: NightShiftConfig, tasks: tuple[Task, ...] | list[Task]) -> tuple[TaskTestCheck, ...]:
|
||||||
|
checks: list[TaskTestCheck] = []
|
||||||
|
for task in tasks:
|
||||||
|
seen: set[str] = set()
|
||||||
|
for stage in config.pipeline.stages:
|
||||||
|
if stage.type not in COMMAND_STAGE_TYPES:
|
||||||
|
continue
|
||||||
|
for command in stage.commands:
|
||||||
|
rendered = render_command_template(command, task.id)
|
||||||
|
for path_text in extract_test_file_paths(rendered):
|
||||||
|
if path_text in seen:
|
||||||
|
continue
|
||||||
|
seen.add(path_text)
|
||||||
|
checks.append(TaskTestCheck(task.id, path_text, (config.project.root / path_text).exists()))
|
||||||
|
return tuple(checks)
|
||||||
|
|
||||||
|
|
||||||
|
def format_task_test_checks(checks: tuple[TaskTestCheck, ...]) -> str:
|
||||||
|
if not checks:
|
||||||
|
return "Task test files: no task-specific test paths detected."
|
||||||
|
lines = ["Task test files:"]
|
||||||
|
for check in checks:
|
||||||
|
status = "ok" if check.exists else "missing"
|
||||||
|
lines.append(f"- {check.task_id}: {check.path} ({status})")
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
def missing_task_test_paths(checks: tuple[TaskTestCheck, ...]) -> tuple[Path, ...]:
|
||||||
|
return tuple(Path(check.path) for check in checks if not check.exists)
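
As a rough illustration of how a caller might act on these checks, the sketch below re-declares a stand-in `Check` record (mirroring the fields of `TaskTestCheck`) and a hypothetical `preflight_reason` helper. Neither the helper name nor the exact message layout is taken from NightShift; only the phrase "configured task test file is missing" appears in the accompanying tests.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """Stand-in with the same fields as TaskTestCheck."""

    task_id: str
    path: str
    exists: bool


def preflight_reason(checks: tuple[Check, ...], task_id: str) -> str | None:
    """Return a failure reason if a test file for task_id is missing, else None."""
    missing = [c.path for c in checks if c.task_id == task_id and not c.exists]
    if not missing:
        return None
    # Hypothetical message layout; only the leading phrase comes from the tests.
    return "configured task test file is missing: " + ", ".join(missing)


checks = (
    Check("TASK-001", "tests/test_task001.py", True),
    Check("TASK-002", "tests/test_task002.py", False),
)
print(preflight_reason(checks, "TASK-002"))
```

Failing before any model call is made keeps a missing fixture from being misread as an implementation failure and from consuming retries.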
|
||||||
|
|
@ -6,6 +6,7 @@ from nightshift.artifacts import ArtifactStore
|
||||||
from nightshift.commands import CommandExecutor
|
from nightshift.commands import CommandExecutor
|
||||||
from nightshift.commands import CommandRun, format_command_runs
|
from nightshift.commands import CommandRun, format_command_runs
|
||||||
from nightshift.commands import _command_env
|
from nightshift.commands import _command_env
|
||||||
|
from nightshift.commands import render_command_template
|
||||||
from nightshift.config import SafetyConfig, StageConfig
|
from nightshift.config import SafetyConfig, StageConfig
|
||||||
from nightshift.errors import CommandError
|
from nightshift.errors import CommandError
|
||||||
import sys
|
import sys
|
||||||
|
|
@ -16,6 +17,13 @@ FAILING_COMMAND = 'python -c "import sys; print(\'bad\'); sys.exit(7)"'
|
||||||
|
|
||||||
|
|
||||||
class CommandExecutorTests(unittest.TestCase):
|
class CommandExecutorTests(unittest.TestCase):
|
||||||
|
def test_render_command_template_includes_task_id_variants(self) -> None:
|
||||||
|
command = "python -m pytest -q tests/test_{task_id_compact}.py # {task_id_slug} {task_id}"
|
||||||
|
|
||||||
|
rendered = render_command_template(command, "TASK-001")
|
||||||
|
|
||||||
|
self.assertEqual(rendered, "python -m pytest -q tests/test_task001.py # task_001 TASK-001")
|
||||||
|
|
||||||
def test_passing_command_stage_returns_pass_and_writes_output(self) -> None:
|
def test_passing_command_stage_returns_pass_and_writes_output(self) -> None:
|
||||||
with tempfile.TemporaryDirectory() as directory:
|
with tempfile.TemporaryDirectory() as directory:
|
||||||
root = Path(directory)
|
root = Path(directory)
|
||||||
|
|
@ -46,6 +54,33 @@ class CommandExecutorTests(unittest.TestCase):
|
||||||
self.assertIn("Exit code: 0", output)
|
self.assertIn("Exit code: 0", output)
|
||||||
self.assertIn("ok", output)
|
self.assertIn("ok", output)
|
||||||
|
|
||||||
|
def test_command_stage_renders_task_id_before_allowlist_check(self) -> None:
|
||||||
|
with tempfile.TemporaryDirectory() as directory:
|
||||||
|
root = Path(directory)
|
||||||
|
artifacts = ArtifactStore(root, ".nightshift", run_id="test-run")
|
||||||
|
executor = CommandExecutor(
|
||||||
|
root,
|
||||||
|
SafetyConfig(
|
||||||
|
require_clean_worktree=False,
|
||||||
|
scoped_paths=(".",),
|
||||||
|
allowed_commands=('python -c "print(\'{task_id_compact}\')"',),
|
||||||
|
forbidden_commands=("rm -rf",),
|
||||||
|
),
|
||||||
|
artifacts,
|
||||||
|
)
|
||||||
|
stage = StageConfig(
|
||||||
|
id="test",
|
||||||
|
type="command",
|
||||||
|
commands=('python -c "print(\'{task_id_compact}\')"',),
|
||||||
|
output="test-output.txt",
|
||||||
|
)
|
||||||
|
|
||||||
|
result = executor.run_stage(stage, "TASK-002")
|
||||||
|
|
||||||
|
self.assertEqual(result.status, "pass")
|
||||||
|
output = (root / result.output_path).read_text(encoding="utf-8")
|
||||||
|
self.assertIn("task002", output)
|
||||||
|
|
||||||
def test_failing_command_stage_returns_fail_and_writes_output(self) -> None:
|
def test_failing_command_stage_returns_fail_and_writes_output(self) -> None:
|
||||||
with tempfile.TemporaryDirectory() as directory:
|
with tempfile.TemporaryDirectory() as directory:
|
||||||
root = Path(directory)
|
root = Path(directory)
|
||||||
|
|
|
||||||
|
|
@ -282,6 +282,27 @@ class ConfigTests(unittest.TestCase):
|
||||||
|
|
||||||
self.assertEqual(config.agents["planner"].temperature, 0.2)
|
self.assertEqual(config.agents["planner"].temperature, 0.2)
|
||||||
|
|
||||||
|
def test_agent_ollama_options_load(self) -> None:
|
||||||
|
with tempfile.TemporaryDirectory() as directory:
|
||||||
|
root = Path(directory)
|
||||||
|
init_project(root)
|
||||||
|
config_path = root / "nightshift.yaml"
|
||||||
|
config_path.write_text(
|
||||||
|
config_path.read_text(encoding="utf-8").replace(
|
||||||
|
" system_prompt: agents/planner.md",
|
||||||
|
" system_prompt: agents/planner.md\n num_ctx: 8192\n num_predict: 4096\n seed: 1\n stop:\n - STOP",
|
||||||
|
1,
|
||||||
|
),
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
|
||||||
|
config = load_config(config_path)
|
||||||
|
|
||||||
|
self.assertEqual(config.agents["planner"].num_ctx, 8192)
|
||||||
|
self.assertEqual(config.agents["planner"].num_predict, 4096)
|
||||||
|
self.assertEqual(config.agents["planner"].seed, 1)
|
||||||
|
self.assertEqual(config.agents["planner"].stop, ("STOP",))
|
||||||
|
|
||||||
def test_agent_temperature_must_be_number(self) -> None:
|
def test_agent_temperature_must_be_number(self) -> None:
|
||||||
with tempfile.TemporaryDirectory() as directory:
|
with tempfile.TemporaryDirectory() as directory:
|
||||||
root = Path(directory)
|
root = Path(directory)
|
||||||
|
|
|
||||||
|
|
@ -61,7 +61,7 @@ class InitProjectTests(unittest.TestCase):
|
||||||
self.assertIn("tutorial-imageboard", available_templates())
|
self.assertIn("tutorial-imageboard", available_templates())
|
||||||
self.assertIn("tutorial-pastebin", available_templates())
|
self.assertIn("tutorial-pastebin", available_templates())
|
||||||
|
|
||||||
def test_init_pastebin_template_creates_skeleton_and_model_fallback_config(self) -> None:
|
def test_init_pastebin_template_creates_skeleton_and_qwen3_config(self) -> None:
|
||||||
with tempfile.TemporaryDirectory() as directory:
|
with tempfile.TemporaryDirectory() as directory:
|
||||||
root = Path(directory)
|
root = Path(directory)
|
||||||
|
|
||||||
|
|
@ -78,11 +78,15 @@ class InitProjectTests(unittest.TestCase):
|
||||||
self.assertIn("type: semantic_context", config)
|
self.assertIn("type: semantic_context", config)
|
||||||
self.assertNotIn("id: write_tests", config)
|
self.assertNotIn("id: write_tests", config)
|
||||||
self.assertNotIn("id: review_tests", config)
|
self.assertNotIn("id: review_tests", config)
|
||||||
self.assertIn("python -m pytest -q tests", config)
|
self.assertIn("python -m pytest -q tests/test_{task_id_compact}.py", config)
|
||||||
self.assertIn("max_task_retries: 6", config)
|
self.assertIn("max_task_retries: 6", config)
|
||||||
self.assertIn("implementer_qwen", config)
|
self.assertIn("implementer:", config)
|
||||||
self.assertIn("carstenuhlig/omnicoder-9b", config)
|
self.assertIn("qwen3-coder:30b", config)
|
||||||
self.assertIn("deepseek-coder-v2:16b", config)
|
self.assertIn("num_ctx: 8192", config)
|
||||||
|
self.assertIn("num_predict: 4096", config)
|
||||||
|
self.assertNotIn("agent_pool:", config)
|
||||||
|
self.assertNotIn("carstenuhlig/omnicoder-9b", config)
|
||||||
|
self.assertNotIn("deepseek-coder-v2:16b", config)
|
||||||
|
|
||||||
def test_pastebin_example_tutorial_docs_exist(self) -> None:
|
def test_pastebin_example_tutorial_docs_exist(self) -> None:
|
||||||
root = Path(__file__).resolve().parents[1]
|
root = Path(__file__).resolve().parents[1]
|
||||||
|
|
|
||||||
51
tests/test_integ_test.py
Normal file
|
|
@ -0,0 +1,51 @@
|
||||||
|
from pathlib import Path
|
||||||
|
import tempfile
|
||||||
|
import unittest
|
||||||
|
|
||||||
|
from nightshift.integ_report import build_integration_report, format_integration_report
|
||||||
|
from nightshift.integ_test import format_integration_test_result, run_integration_test
|
||||||
|
|
||||||
|
|
||||||
|
class IntegrationTestCommandTests(unittest.TestCase):
|
||||||
|
def test_run_integration_test_dry_run_builds_task_command(self) -> None:
|
||||||
|
with tempfile.TemporaryDirectory() as directory:
|
||||||
|
result = run_integration_test(
|
||||||
|
directory,
|
||||||
|
template="tutorial-pastebin",
|
||||||
|
task="TASK-001",
|
||||||
|
dry_run=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
rendered = format_integration_test_result(result)
|
||||||
|
self.assertIn("Dry run: true", rendered)
|
||||||
|
self.assertIn("TASK-001", " ".join(result.command))
|
||||||
|
self.assertTrue((result.run.directory / "project" / "nightshift.yaml").exists())
|
||||||
|
|
||||||
|
def test_build_integration_report_summarizes_latest_task_summary(self) -> None:
|
||||||
|
with tempfile.TemporaryDirectory() as directory:
|
||||||
|
root = Path(directory)
|
||||||
|
summary = root / "integ_runs" / "20260521T000000.000000Z" / "project" / ".nightshift" / "runs" / "run1" / "tasks" / "TASK-001" / "run-summary.md"
|
||||||
|
summary.parent.mkdir(parents=True)
|
||||||
|
summary.write_text(
|
||||||
|
"\n".join(
|
||||||
|
[
|
||||||
|
"# Run Summary",
|
||||||
|
"",
|
||||||
|
"- Task: TASK-001",
|
||||||
|
"- Status: complete",
|
||||||
|
"- Retry count: 1",
|
||||||
|
"- Reason: Done.",
|
||||||
|
]
|
||||||
|
),
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
|
||||||
|
report = build_integration_report(root)
|
||||||
|
rendered = format_integration_report(report)
|
||||||
|
|
||||||
|
self.assertIn("TASK-001 complete after 1 retries", rendered)
|
||||||
|
self.assertIn("Reason: Done.", rendered)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
unittest.main()
|
||||||
|
|
@ -105,6 +105,29 @@ class PipelineRunnerTests(unittest.TestCase):
|
||||||
)
|
)
|
||||||
self.assertIn("Modified Files", (root / ".nightshift" / "runs" / "test-run" / "run-summary.md").read_text(encoding="utf-8"))
|
self.assertIn("Modified Files", (root / ".nightshift" / "runs" / "test-run" / "run-summary.md").read_text(encoding="utf-8"))
|
||||||
|
|
||||||
|
def test_task_preflight_fails_when_task_specific_test_file_is_missing(self) -> None:
|
||||||
|
with tempfile.TemporaryDirectory() as directory:
|
||||||
|
root = Path(directory)
|
||||||
|
_write_common_files(root)
|
||||||
|
stages = (
|
||||||
|
StageConfig(
|
||||||
|
id="test",
|
||||||
|
type="command",
|
||||||
|
commands=("python -m pytest -q tests/test_{task_id_compact}.py",),
|
||||||
|
output="test-output.txt",
|
||||||
|
),
|
||||||
|
)
|
||||||
|
config = make_config(root, stages, max_retries=0)
|
||||||
|
runner = PipelineRunner(config, ArtifactStore(root, ".nightshift", run_id="test-run"))
|
||||||
|
task = parse_tasks(TASK_MD)[0]
|
||||||
|
|
||||||
|
result = runner.run_task(task)
|
||||||
|
|
||||||
|
self.assertEqual(result.status, "failed")
|
||||||
|
self.assertIn("configured task test file is missing", result.reason)
|
||||||
|
task_dir = root / ".nightshift" / "runs" / "test-run" / "tasks" / task.id
|
||||||
|
self.assertIn("tests/test_task001.py", (task_dir / "preflight.md").read_text(encoding="utf-8"))
|
||||||
|
|
||||||
def test_review_can_retry_implementation_until_limit(self) -> None:
|
def test_review_can_retry_implementation_until_limit(self) -> None:
|
||||||
with tempfile.TemporaryDirectory() as directory:
|
with tempfile.TemporaryDirectory() as directory:
|
||||||
root = Path(directory)
|
root = Path(directory)
|
||||||
|
|
|
||||||
77
tests/test_task_tests.py
Normal file
|
|
@ -0,0 +1,77 @@
|
||||||
|
from pathlib import Path
|
||||||
|
import tempfile
|
||||||
|
import unittest
|
||||||
|
|
||||||
|
from nightshift.config import validate_config
|
||||||
|
from nightshift.task_tests import check_task_test_files, missing_task_test_paths
|
||||||
|
from nightshift.tasks import parse_task_file
|
||||||
|
|
||||||
|
|
||||||
|
class TaskTestValidationTests(unittest.TestCase):
|
||||||
|
def test_check_task_test_files_renders_task_placeholder(self) -> None:
|
||||||
|
with tempfile.TemporaryDirectory() as directory:
|
||||||
|
root = Path(directory)
|
||||||
|
(root / "agents").mkdir()
|
||||||
|
(root / "agents" / "planner.md").write_text("Prompt", encoding="utf-8")
|
||||||
|
(root / "tests").mkdir()
|
||||||
|
(root / "tests" / "test_task001.py").write_text("def test_ok():\n assert True\n", encoding="utf-8")
|
||||||
|
(root / "nightshift.yaml").write_text(
|
||||||
|
"\n".join(
|
||||||
|
[
|
||||||
|
"project:",
|
||||||
|
" name: task-test-validation",
|
||||||
|
" root: .",
|
||||||
|
" task_file: tasks.md",
|
||||||
|
" artifact_dir: .nightshift",
|
||||||
|
"",
|
||||||
|
"safety:",
|
||||||
|
" require_clean_worktree: false",
|
||||||
|
" scoped_paths:",
|
||||||
|
" - .",
|
||||||
|
" allowed_commands:",
|
||||||
|
" - python -m pytest -q tests/test_{task_id_compact}.py",
|
||||||
|
" forbidden_commands:",
|
||||||
|
" - rm -rf",
|
||||||
|
"",
|
||||||
|
"agents:",
|
||||||
|
" planner:",
|
||||||
|
" backend: command",
|
||||||
|
" command: python -c \"print('ok')\"",
|
||||||
|
" system_prompt: agents/planner.md",
|
||||||
|
"",
|
||||||
|
"pipeline:",
|
||||||
|
" stages:",
|
||||||
|
" - id: test",
|
||||||
|
" type: command",
|
||||||
|
" commands:",
|
||||||
|
" - python -m pytest -q tests/test_{task_id_compact}.py",
|
||||||
|
]
|
||||||
|
),
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
(root / "tasks.md").write_text(
|
||||||
|
"""# Tasks
|
||||||
|
|
||||||
|
- [ ] TASK-001: One
|
||||||
|
|
||||||
|
Acceptance Criteria:
|
||||||
|
- passes
|
||||||
|
|
||||||
|
- [ ] TASK-002: Two
|
||||||
|
|
||||||
|
Acceptance Criteria:
|
||||||
|
- reports missing test
|
||||||
|
""",
|
||||||
|
encoding="utf-8",
|
||||||
|
)
|
||||||
|
|
||||||
|
config = validate_config(root / "nightshift.yaml")
|
||||||
|
tasks = parse_task_file(config.project.root, config.project.task_file)
|
||||||
|
checks = check_task_test_files(config, tasks)
|
||||||
|
|
||||||
|
self.assertEqual([check.path for check in checks], ["tests/test_task001.py", "tests/test_task002.py"])
|
||||||
|
self.assertEqual(tuple(path.as_posix() for path in missing_task_test_paths(checks)), ("tests/test_task002.py",))
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
unittest.main()
|
||||||