mirror of
https://github.com/khodges42/nightShift.git
synced 2026-06-14 18:18:36 +00:00
bugfixes after test run and terminal status
parent
fb575fc5f7
commit
90e4c80116
304
docs/ideas.md
|
|
@@ -1,194 +1,17 @@
|
||||||
# Ideas TODO
|
# Ideas TODO
|
||||||
|
|
||||||
This file is now prioritized inline. Priority scale:
|
This file tracks open ideas only. Completed items should be removed after they land.
|
||||||
|
|
||||||
|
Priority scale:
|
||||||
|
|
||||||
- P0: do next; directly improves current feedback loop
|
- P0: do next; directly improves current feedback loop
|
||||||
- P1: important after the current loop is usable
|
- P1: important after the current loop is usable
|
||||||
- P2: useful, but only after basics are stable
|
- P2: useful, but only after basics are stable
|
||||||
- P3: defer or maybe reject
|
- P3: defer or maybe reject
|
||||||
|
|
||||||
## P0: Make Integration Tests Easy To Run
|
|
||||||
|
|
||||||
Status: implemented.
|
|
||||||
|
|
||||||
Implemented command:
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
python -m nightshift.cli integ-test --template tutorial-deaddrop --task TASK-001
|
|
||||||
```
|
|
||||||
|
|
||||||
It creates the integration sandbox, sets up the venv, runs validation through setup, runs the task from the generated project directory, and prints the artifact root. Use `--dry-run` to preview the setup and task command.
|
|
||||||
|
|
||||||
Running integration tests is still too manual.
|
|
||||||
|
|
||||||
Current process:
|
|
||||||
|
|
||||||
- install the current version of NightShift
|
|
||||||
- run `python -m nightshift.cli integ-run --template tutorial-deaddrop --setup`
|
|
||||||
- copy the activation line from the output and run it
|
|
||||||
- `cd` into the generated directory
|
|
||||||
- run the task there, because running from the repo root does not find `nightshift.yaml`
|
|
||||||
|
|
||||||
Recommendation: implement a wrapper command, not just a loose script.
|
|
||||||
|
|
||||||
Target command:
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
python -m nightshift.cli integ-test --template tutorial-deaddrop --task TASK-001
|
|
||||||
```
|
|
||||||
|
|
||||||
It should:
|
|
||||||
|
|
||||||
1. create the integration run
|
|
||||||
2. set up the venv
|
|
||||||
3. install NightShift from the current checkout
|
|
||||||
4. run `nightshift validate`
|
|
||||||
5. run the selected task from the generated project directory
|
|
||||||
6. print final status and artifact path
|
|
||||||
|
|
||||||
Useful variants:
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
python -m nightshift.cli integ-test --template tutorial-deaddrop --all
|
|
||||||
python -m nightshift.cli integ-test --template tutorial-deaddrop --task TASK-002 --keep 3
|
|
||||||
```
|
|
||||||
|
|
||||||
The base-directory config issue may not be a core bug, but it is bad UX. The wrapper should handle `cwd` correctly.
|
|
||||||
|
|
||||||
## P0/P1: Remove Multi-Candidate Workflow From Default DeadDrop
|
|
||||||
|
|
||||||
Status: implemented for the default DeadDrop template and tutorial example.
|
|
||||||
|
|
||||||
Original idea:
|
|
||||||
|
|
||||||
- The multi-candidate workflow does not add as much as expected.
|
|
||||||
- Keep it as an example, maybe `example-multiagent`.
|
|
||||||
|
|
||||||
Recommendation: yes. Remove it from the default DeadDrop tutorial.
|
|
||||||
|
|
||||||
Reason:
|
|
||||||
|
|
||||||
- DeadDrop is becoming the reliability harness.
|
|
||||||
- Multi-candidate fallback makes artifacts harder to reason about.
|
|
||||||
- It adds model variability while we are still debugging pipeline behavior.
|
|
||||||
|
|
||||||
Better split:
|
|
||||||
|
|
||||||
```text
|
|
||||||
tutorial-deaddrop
|
|
||||||
tutorial-deaddrop-multiagent
|
|
||||||
```
|
|
||||||
|
|
||||||
or:
|
|
||||||
|
|
||||||
```text
|
|
||||||
examples/templates/multiagent-fallback
|
|
||||||
```
|
|
||||||
|
|
||||||
Default DeadDrop should be boring:
|
|
||||||
|
|
||||||
```text
|
|
||||||
planner -> semantic_context -> context -> implement -> validate -> test -> review
|
|
||||||
```
|
|
||||||
|
|
||||||
Use one strong implementer first. Add fallback only in a separate experiment template.
|
|
||||||
|
|
||||||
## P1: Add A Qwen3 / 30B DeadDrop Variant
|
|
||||||
|
|
||||||
Status: implemented as the default DeadDrop model path using `qwen3-coder:30b`.
|
|
||||||
|
|
||||||
Original idea:
|
|
||||||
|
|
||||||
- Use a non-coder model for planner roles.
|
|
||||||
- Try `qwen3.6:27b` for planning.
|
|
||||||
- Use `qwen3-coder:30b` for implementer and code-heavy roles.
|
|
||||||
|
|
||||||
Recommendation: viable, but make this a variant, not the default.
|
|
||||||
|
|
||||||
kass reply: No, let's make this the default. `qwen3-coder:30b` is fast for me now, for some reason.
|
|
||||||
|
|
||||||
Suggested template/config:
|
|
||||||
|
|
||||||
```text
|
|
||||||
tutorial-deaddrop-qwen3
|
|
||||||
```
|
|
||||||
|
|
||||||
Possible role split:
|
|
||||||
|
|
||||||
- planner: `qwen3.6:27b`
|
|
||||||
- reviewer/debugger: `qwen3.6:27b`
|
|
||||||
- implementer: `qwen3-coder:30b` or exact local 30B coder model name
|
|
||||||
|
|
||||||
Important: confirm exact model names with:
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
ollama list
|
|
||||||
```
|
|
||||||
|
|
||||||
I did. It's `qwen3-coder:30b`.
|
|
||||||
|
|
||||||
Use 30B where it pays:
|
|
||||||
|
|
||||||
- first implementation for hard tasks
|
|
||||||
- repair after concrete test failure
|
|
||||||
- schema/database changes
|
|
||||||
- multi-file changes
|
|
||||||
|
|
||||||
Do not blindly make every stage 30B if it is slow.
|
|
||||||
|
|
||||||
reply: `qwen3-coder:30b` is not slow now!
|
|
||||||
|
|
||||||
## P2: Expose More Model Parameters
|
|
||||||
|
|
||||||
Status: implemented for the practical first set.
|
|
||||||
|
|
||||||
Supported optional Ollama fields now include `num_ctx`, `num_predict`, `seed`, and `stop`, in addition to existing `temperature`.
|
|
||||||
|
|
||||||
Original question:
|
|
||||||
|
|
||||||
- What else besides temperature is available?
|
|
||||||
- Are any worth optimizing?
|
|
||||||
|
|
||||||
Likely useful for Ollama:
|
|
||||||
|
|
||||||
- `temperature`
|
|
||||||
- `num_ctx`
|
|
||||||
- `num_predict`
|
|
||||||
- `seed`
|
|
||||||
- `stop`
|
|
||||||
- maybe `top_p`, `top_k`, `repeat_penalty`
|
|
||||||
|
|
||||||
Recommendation: add only a small practical set first.
|
|
||||||
|
|
||||||
Useful config shape:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
temperature: 0.1
|
|
||||||
num_ctx: 8192
|
|
||||||
num_predict: 4096
|
|
||||||
seed: 1
|
|
||||||
```
|
|
||||||
|
|
||||||
Most useful:
|
|
||||||
|
|
||||||
- `num_ctx`: larger repo/task context
|
|
||||||
- `num_predict`: caps runaway output
|
|
||||||
- `seed`: reproducibility, if supported consistently
|
|
||||||
- `temperature`: already useful; keep low for code
|
|
||||||
- `stop`: could help enforce file-block or diff-only contracts
|
|
||||||
|
|
||||||
Defer tuning `top_p`, `top_k`, and `repeat_penalty` unless a specific model needs it.
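
As a rough illustration of "optional, but offered", the sketch below copies only the fields that are actually set into an Ollama-style `options` mapping. The helper name `build_ollama_options` and the flat config dict are assumptions for illustration, not NightShift's actual config loader.

```python
from typing import Any

# Optional fields named in this section; anything absent from the config is omitted.
OPTIONAL_OLLAMA_FIELDS = ("temperature", "num_ctx", "num_predict", "seed", "stop")


def build_ollama_options(model_config: dict[str, Any]) -> dict[str, Any]:
    """Copy only the optional fields that are set (hypothetical helper)."""
    return {
        key: model_config[key]
        for key in OPTIONAL_OLLAMA_FIELDS
        if model_config.get(key) is not None
    }


# Hypothetical per-agent config, shaped like the YAML example above.
config = {"model": "qwen3-coder:30b", "temperature": 0.1, "num_ctx": 8192, "seed": 1}
payload = {"model": config["model"], "options": build_ollama_options(config)}
print(payload)
```

Leaving unset fields out of the payload, rather than sending `None`, lets the model's own defaults apply.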
|
|
||||||
|
|
||||||
reply: Yup, let's put this in `nightshift.yaml` as optional parameters. If they aren't set, that's fine, but we should offer them.
|
|
||||||
|
|
||||||
## P1: Add Test Governance For Generated Tests
|
## P1: Add Test Governance For Generated Tests
|
||||||
|
|
||||||
Original idea:
|
Use this only for generated-test mode. Do not put generated tests back into the default DeadDrop fixed-test pipeline yet.
|
||||||
|
|
||||||
- Have a test governance layer for when agents write tests.
|
|
||||||
- A reviewer validates alignment with acceptance criteria.
|
|
||||||
|
|
||||||
Recommendation: yes, but only for generated-test mode. Do not put generated tests back into default DeadDrop yet.
|
|
||||||
|
|
||||||
The previous failures proved test-writing agents will:
|
The previous failures proved test-writing agents will:
|
||||||
|
|
||||||
|
|
@@ -208,19 +31,13 @@ Deterministic checks:
|
||||||
- tests do not import undeclared dependencies
|
- tests do not import undeclared dependencies
|
||||||
- tests do not define Flask routes or app implementation
|
- tests do not define Flask routes or app implementation
|
||||||
- test names match current task id or current artifact
|
- test names match current task id or current artifact
|
||||||
- no future-task keywords unless accepted by current task AC
|
- no future-task keywords unless accepted by current task acceptance criteria
|
||||||
|
|
||||||
Then optional model reviewer checks acceptance-criteria alignment.
|
Then optional model reviewer checks acceptance-criteria alignment.
|
||||||
|
|
||||||
## P2: Add A Test Analyzer Agent For TDD
|
## P2: Add A Test Analyzer Agent For TDD
|
||||||
|
|
||||||
Original idea:
|
Defer until generated tests are stable.
|
||||||
|
|
||||||
- Analyze tests.
|
|
||||||
- Translate them into direct instructions for the implementer.
|
|
||||||
- Maybe implement using agent YAML definitions without new NightShift features.
|
|
||||||
|
|
||||||
Recommendation: viable, but defer until generated tests are stable.
|
|
||||||
|
|
||||||
Possible pipeline:
|
Possible pipeline:
|
||||||
|
|
||||||
|
|
@@ -244,12 +61,7 @@ This may help smaller models, but it is another model output that can be wrong.
|
||||||
|
|
||||||
## P2/P3: Add A Test Planner
|
## P2/P3: Add A Test Planner
|
||||||
|
|
||||||
Original idea:
|
Maybe, but defer.
|
||||||
|
|
||||||
- A test planner understands acceptance criteria and code.
|
|
||||||
- Provides input to the next stage about constraints and code, especially for non-TDD.
|
|
||||||
|
|
||||||
Recommendation: maybe, but defer.
|
|
||||||
|
|
||||||
This overlaps with:
|
This overlaps with:
|
||||||
|
|
||||||
|
|
@@ -267,85 +79,8 @@ test_planner -> write_tests -> test_governance -> implement
|
||||||
|
|
||||||
For now, fold this idea into the future test governance/analyzer work.
|
For now, fold this idea into the future test governance/analyzer work.
|
||||||
|
|
||||||
## P1: Add Fixed Tests For All DeadDrop Tasks
|
|
||||||
|
|
||||||
Status: mostly implemented in the template.
|
|
||||||
|
|
||||||
Current fixed tests:
|
|
||||||
|
|
||||||
```text
|
|
||||||
tests/test_task001.py
|
|
||||||
tests/test_task002.py
|
|
||||||
tests/test_task003.py
|
|
||||||
tests/test_task004.py
|
|
||||||
tests/test_task005.py
|
|
||||||
```
|
|
||||||
|
|
||||||
Important design:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
python -m pytest -q tests/test_{task_id_compact}.py
|
|
||||||
```
|
|
||||||
|
|
||||||
This lets all future task tests exist without breaking earlier tasks.
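
A small sketch of how the placeholder might render. The mapping `TASK-001` -> `task001` is inferred from the fixed test file names listed above; the helper names are hypothetical and are not taken from the NightShift source.

```python
def task_id_compact(task_id: str) -> str:
    """Assumed convention: 'TASK-001' -> 'task001' (lowercase, hyphen removed)."""
    return task_id.replace("-", "").lower()


def render_test_command(template: str, task_id: str) -> str:
    """Substitute the {task_id_compact} placeholder in a command template."""
    return template.replace("{task_id_compact}", task_id_compact(task_id))


template = "python -m pytest -q tests/test_{task_id_compact}.py"
print(render_test_command(template, "TASK-002"))
# -> python -m pytest -q tests/test_task002.py
```

Since each task only runs its own file, a not-yet-passing `tests/test_task005.py` cannot fail a `TASK-001` run.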
|
|
||||||
|
|
||||||
Next step: validate these through integration runs, one task at a time.
|
|
||||||
|
|
||||||
## P1: Add `nightshift integ-report`
|
|
||||||
|
|
||||||
Status: implemented as a first-pass artifact summarizer.
|
|
||||||
|
|
||||||
New idea.
|
|
||||||
|
|
||||||
Summarize latest integration run across tasks:
|
|
||||||
|
|
||||||
```text
|
|
||||||
TASK-001 complete in 1 retry
|
|
||||||
TASK-002 failed at validate_patch
|
|
||||||
Root cause: protected tests modified
|
|
||||||
Artifacts: ...
|
|
||||||
```
|
|
||||||
|
|
||||||
Right now we inspect artifacts manually. NightShift should do more of that.
|
|
||||||
|
|
||||||
Possible command:
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
python -m nightshift.cli integ-report --latest
|
|
||||||
```
|
|
||||||
|
|
||||||
## P1: Add Task-Test Preflight To `validate`
|
|
||||||
|
|
||||||
Status: implemented.
|
|
||||||
|
|
||||||
`nightshift validate` now renders task command placeholders for every task and fails early if a configured `tests/test_*.py` path is missing.
|
|
||||||
|
|
||||||
Partially implemented at run time.
|
|
||||||
|
|
||||||
Current behavior:
|
|
||||||
|
|
||||||
- task command placeholders can render paths like `tests/test_task002.py`
|
|
||||||
- `run_task` preflight fails before invoking agents if the task-specific test file is missing
|
|
||||||
|
|
||||||
Better behavior:
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
nightshift validate
|
|
||||||
```
|
|
||||||
|
|
||||||
should warn or fail:
|
|
||||||
|
|
||||||
```text
|
|
||||||
TASK-003 expects tests/test_task003.py, but it is missing.
|
|
||||||
TASK-004 expects tests/test_task004.py, but it is missing.
|
|
||||||
```
|
|
||||||
|
|
||||||
This catches missing fixed tests earlier.
|
|
||||||
|
|
||||||
## P2: Add Run Comparison
|
## P2: Add Run Comparison
|
||||||
|
|
||||||
New idea.
|
|
||||||
|
|
||||||
Useful once comparing 14B vs 30B:
|
Useful once comparing 14B vs 30B:
|
||||||
|
|
||||||
```powershell
|
```powershell
|
||||||
|
|
@@ -364,3 +99,28 @@ Show:
|
||||||
|
|
||||||
This should come after `integ-test` and `integ-report`.
|
This should come after `integ-test` and `integ-report`.
|
||||||
|
|
||||||
|
## P2: Add A Separate Multiagent/Fallback DeadDrop Experiment
|
||||||
|
|
||||||
|
Keep the default DeadDrop template boring and deterministic:
|
||||||
|
|
||||||
|
```text
|
||||||
|
planner -> semantic_context -> context -> implement -> validate -> test -> review
|
||||||
|
```
|
||||||
|
|
||||||
|
If fallback is useful, put it in a separate experiment template, for example:
|
||||||
|
|
||||||
|
```text
|
||||||
|
tutorial-deaddrop-multiagent
|
||||||
|
```
|
||||||
|
|
||||||
|
or:
|
||||||
|
|
||||||
|
```text
|
||||||
|
examples/templates/multiagent-fallback
|
||||||
|
```
|
||||||
|
|
||||||
|
Reason:
|
||||||
|
|
||||||
|
- fallback makes artifacts harder to reason about
|
||||||
|
- model variability is bad while debugging pipeline behavior
|
||||||
|
- the default template should remain the reliability harness
|
||||||
|
|
|
||||||
396
docs/writer-idea.md
Normal file
|
|
@@ -0,0 +1,396 @@
|
||||||
|
# Agentic Novel Writing Workflow Idea
|
||||||
|
|
||||||
|
NightShift could plausibly support non-coding workflows, especially long-form fiction, because the core abstraction is not actually "write code." It is:
|
||||||
|
|
||||||
|
- read task context
|
||||||
|
- call one or more agents
|
||||||
|
- produce artifacts
|
||||||
|
- validate outputs
|
||||||
|
- update project state
|
||||||
|
- move to the next task
|
||||||
|
|
||||||
|
That maps surprisingly well to writing a novel.
|
||||||
|
|
||||||
|
## Core Realization
|
||||||
|
|
||||||
|
A novel workflow should not ask one model to write the whole book, or even necessarily one whole chapter.
|
||||||
|
|
||||||
|
The durable project files would act like the source of truth:
|
||||||
|
|
||||||
|
- `worldbuilding.md`
|
||||||
|
- `characters.md`
|
||||||
|
- `plot-state.md`
|
||||||
|
- `style-guide.md`
|
||||||
|
- `outline.md`
|
||||||
|
- `chapters/chapter-001.md`
|
||||||
|
- `chapters/chapter-001-scene-001.md`
|
||||||
|
- `tasks.md`
|
||||||
|
|
||||||
|
The task file would drive the work, similar to coding tasks:
|
||||||
|
|
||||||
|
```text
|
||||||
|
- [ ] SCENE-001: Opening scene at the border checkpoint
|
||||||
|
|
||||||
|
Description:
|
||||||
|
Write the opening scene where Mara tries to enter the city under a false work permit.
|
||||||
|
|
||||||
|
Acceptance Criteria:
|
||||||
|
- Introduces Mara's immediate goal
|
||||||
|
- Shows the checkpoint culture without exposition dump
|
||||||
|
- Mentions the salt tax conflict indirectly
|
||||||
|
- Ends with the inspector noticing the forged seal
|
||||||
|
- 900-1400 words
|
||||||
|
- Maintains close third-person POV
|
||||||
|
```
|
||||||
|
|
||||||
|
NightShift would run one scene or section at a time.
|
||||||
|
|
||||||
|
## What We Already Have
|
||||||
|
|
||||||
|
NightShift already has several useful primitives:
|
||||||
|
|
||||||
|
- task files for chunking the novel into scenes or chapter sections
|
||||||
|
- scoped paths so agents only edit allowed writing/project files
|
||||||
|
- artifact output so drafts, reviews, and notes are preserved
|
||||||
|
- retry loops for revision
|
||||||
|
- planner/reviewer/debugger-style roles
|
||||||
|
- repo context and semantic context retrieval
|
||||||
|
- command stages that could run deterministic checks
|
||||||
|
- file-writer stages that can update Markdown files
|
||||||
|
- `lookup_requests` so agents can ask to read worldbuilding or prior scenes
|
||||||
|
|
||||||
|
That means this may not require a totally new engine. It may mostly need a new template and some writing-specific validation/review stages.
|
||||||
|
|
||||||
|
## Likely Workflow
|
||||||
|
|
||||||
|
One practical pipeline:
|
||||||
|
|
||||||
|
```text
|
||||||
|
plan_scene
|
||||||
|
gather_context
|
||||||
|
draft_scene
|
||||||
|
validate_scene
|
||||||
|
continuity_review
|
||||||
|
style_review
|
||||||
|
update_plot_state
|
||||||
|
summarize
|
||||||
|
```
|
||||||
|
|
||||||
|
Possible roles:
|
||||||
|
|
||||||
|
- Planner: turns the scene task into a beat plan.
|
||||||
|
- Context agent: pulls relevant worldbuilding, character, and plot-state excerpts.
|
||||||
|
- Drafting agent: writes the scene.
|
||||||
|
- Continuity reviewer: checks contradictions against known state.
|
||||||
|
- Style reviewer: checks POV, tone, pacing, and prose constraints.
|
||||||
|
- State updater: updates `plot-state.md`, `characters.md`, and maybe `timeline.md`.
|
||||||
|
|
||||||
|
## Chunking Strategy
|
||||||
|
|
||||||
|
Do not make a task equal to "write chapter 4" unless chapters are short.
|
||||||
|
|
||||||
|
Better units:
|
||||||
|
|
||||||
|
- scene
|
||||||
|
- scene fragment
|
||||||
|
- chapter section
|
||||||
|
- revision pass for one scene
|
||||||
|
- continuity update after one scene
|
||||||
|
- prose polish for one scene
|
||||||
|
|
||||||
|
A chapter can be assembled from multiple scene files:
|
||||||
|
|
||||||
|
```text
|
||||||
|
chapters/
|
||||||
|
chapter-001/
|
||||||
|
scene-001.md
|
||||||
|
scene-002.md
|
||||||
|
scene-003.md
|
||||||
|
chapter-001.md
|
||||||
|
```
|
||||||
|
|
||||||
|
Then a later command or agent stage can compile `chapter-001.md`.
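
The compile step could be a plain deterministic command stage. The sketch below assumes zero-padded `scene-NNN.md` names, so lexical order matches reading order, and uses a hypothetical `compile_chapter` helper.

```python
from pathlib import Path


def compile_chapter(chapter_dir: Path, output_path: Path) -> int:
    """Concatenate scene-*.md files in name order; return the number of scenes."""
    scene_files = sorted(chapter_dir.glob("scene-*.md"))
    parts = [scene.read_text(encoding="utf-8").strip() for scene in scene_files]
    # Separate scenes with one blank line and end the file with a newline.
    output_path.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return len(scene_files)
```

For example, `compile_chapter(Path("chapters/chapter-001"), Path("chapters/chapter-001.md"))` would rebuild the chapter file from its scenes without involving a model.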
|
||||||
|
|
||||||
|
## Durable State Files
|
||||||
|
|
||||||
|
The most important design piece is explicit state.
|
||||||
|
|
||||||
|
Recommended files:
|
||||||
|
|
||||||
|
```text
|
||||||
|
story/
|
||||||
|
worldbuilding.md
|
||||||
|
style-guide.md
|
||||||
|
characters.md
|
||||||
|
timeline.md
|
||||||
|
plot-state.md
|
||||||
|
unresolved-threads.md
|
||||||
|
continuity-rules.md
|
||||||
|
outline.md
|
||||||
|
chapters/
|
||||||
|
```
|
||||||
|
|
||||||
|
`plot-state.md` should be updated after every completed scene.
|
||||||
|
|
||||||
|
It should track:
|
||||||
|
|
||||||
|
- current character locations
|
||||||
|
- known secrets
|
||||||
|
- promises made to the reader
|
||||||
|
- unresolved questions
|
||||||
|
- relationships
|
||||||
|
- injuries/resources/items
|
||||||
|
- timeline date/time
|
||||||
|
- what each POV character currently knows
|
||||||
|
|
||||||
|
This is the fiction equivalent of application state.
|
||||||
|
|
||||||
|
## Validation Ideas
|
||||||
|
|
||||||
|
Some checks can be deterministic:
|
||||||
|
|
||||||
|
- word count range
|
||||||
|
- file exists
|
||||||
|
- only allowed files changed
|
||||||
|
- Markdown heading format
|
||||||
|
- no forbidden placeholders like `TODO`, `[insert]`, or `TBD`
|
||||||
|
- no accidental author notes in final prose
|
||||||
|
- required task terms are present
|
||||||
|
- output compiles into a chapter file
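
Two of these checks, word count range and forbidden placeholders, fit in a few lines. This is a minimal sketch: `validate_scene` is a hypothetical helper, and the placeholder match is a crude case-sensitive substring test that a real validator would likely refine.

```python
FORBIDDEN_PLACEHOLDERS = ("TODO", "[insert]", "TBD")


def validate_scene(text: str, min_words: int, max_words: int) -> list[str]:
    """Return human-readable problems; an empty list means the scene passes."""
    problems: list[str] = []
    word_count = len(text.split())
    if not min_words <= word_count <= max_words:
        problems.append(f"word count {word_count} outside {min_words}-{max_words}")
    for placeholder in FORBIDDEN_PLACEHOLDERS:
        if placeholder in text:
            problems.append(f"forbidden placeholder found: {placeholder}")
    return problems


print(validate_scene("Mara waited at the checkpoint. TODO describe the seal.", 900, 1400))
```

Returning a list of problems rather than a boolean gives the retry loop concrete notes to hand back to the drafter.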
|
||||||
|
|
||||||
|
Some checks need model review:
|
||||||
|
|
||||||
|
- continuity with worldbuilding
|
||||||
|
- character voice consistency
|
||||||
|
- POV discipline
|
||||||
|
- pacing
|
||||||
|
- whether the scene satisfies the beat plan
|
||||||
|
- whether exposition is too direct
|
||||||
|
- whether the state update accurately reflects the scene
|
||||||
|
|
||||||
|
The key is not to overtrust model review. It should produce actionable retry notes, not silently bless everything.
|
||||||
|
|
||||||
|
## What Might Be Missing
|
||||||
|
|
||||||
|
### 1. Better Non-Code Templates
|
||||||
|
|
||||||
|
This likely needs a dedicated template:
|
||||||
|
|
||||||
|
```text
|
||||||
|
tutorial-deaddrop
|
||||||
|
tutorial-novel
|
||||||
|
```
|
||||||
|
|
||||||
|
or:
|
||||||
|
|
||||||
|
```text
|
||||||
|
writer-novel
|
||||||
|
```
|
||||||
|
|
||||||
|
The template would include:
|
||||||
|
|
||||||
|
- starter story files
|
||||||
|
- writing prompts
|
||||||
|
- task examples
|
||||||
|
- validation commands
|
||||||
|
- allowed paths
|
||||||
|
- recommended pipeline
|
||||||
|
|
||||||
|
### 2. Better Markdown Patch/File Handling
|
||||||
|
|
||||||
|
The current file-writer flow can work, but fiction output may be long. It may be safer to require complete file blocks for one scene file at a time.
|
||||||
|
|
||||||
|
The workflow should avoid having an agent rewrite the whole novel or whole `plot-state.md` unless necessary.
|
||||||
|
|
||||||
|
### 3. Stronger State Update Governance
|
||||||
|
|
||||||
|
The risky part is not drafting prose. The risky part is bad state updates.
|
||||||
|
|
||||||
|
Example failure:
|
||||||
|
|
||||||
|
- the scene says Mara never saw the prince
|
||||||
|
- the state updater records that Mara recognized the prince
|
||||||
|
- future scenes build on the wrong state
|
||||||
|
|
||||||
|
A state update should probably be reviewed against the actual scene before being applied.
|
||||||
|
|
||||||
|
Possible pipeline:
|
||||||
|
|
||||||
|
```text
|
||||||
|
draft_scene -> review_scene -> propose_state_update -> review_state_update -> apply
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4. Context Window Management
|
||||||
|
|
||||||
|
Worldbuilding documents can get large.
|
||||||
|
|
||||||
|
The agent should not receive the entire story bible every time. It should receive:
|
||||||
|
|
||||||
|
- the current task
|
||||||
|
- relevant worldbuilding excerpts
|
||||||
|
- relevant character entries
|
||||||
|
- recent scene summaries
|
||||||
|
- current plot state
|
||||||
|
- style guide
|
||||||
|
|
||||||
|
Semantic search is probably enough for a first version, but a novel template may want a more explicit index:
|
||||||
|
|
||||||
|
```text
|
||||||
|
world-index.md
|
||||||
|
character-index.md
|
||||||
|
location-index.md
|
||||||
|
```
|
||||||
|
|
||||||
|
### 5. Scene Dependency Tracking
|
||||||
|
|
||||||
|
Coding tasks already have dependencies. Fiction tasks would need the same:
|
||||||
|
|
||||||
|
```text
|
||||||
|
Dependencies:
|
||||||
|
- SCENE-001
|
||||||
|
- SCENE-002
|
||||||
|
```
|
||||||
|
|
||||||
|
This prevents writing a later scene before the required earlier story state exists.
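
A sketch of that gate, assuming the plain-text `Dependencies:` block shown above. The parsing rules and helper names are illustrative; the existing coding-task parser may already cover this case.

```python
def parse_dependencies(task_block: str) -> list[str]:
    """Collect '- SCENE-XXX' items listed under a 'Dependencies:' line."""
    deps: list[str] = []
    in_deps = False
    for raw in task_block.splitlines():
        line = raw.strip()
        if line == "Dependencies:":
            in_deps = True
        elif in_deps and line.startswith("- "):
            deps.append(line[2:].strip())
        elif in_deps and line:
            in_deps = False  # a non-list, non-blank line ends the block
    return deps


def unmet_dependencies(dependencies: list[str], completed: set[str]) -> list[str]:
    """Dependencies that have not been completed yet, in declared order."""
    return [dep for dep in dependencies if dep not in completed]


block = "Dependencies:\n- SCENE-001\n- SCENE-002\n\nAcceptance Criteria:\n- 900-1400 words\n"
print(unmet_dependencies(parse_dependencies(block), completed={"SCENE-001"}))
# -> ['SCENE-002']
```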
|
||||||
|
|
||||||
|
### 6. Revision Workflows
|
||||||
|
|
||||||
|
Writing is not only forward generation.
|
||||||
|
|
||||||
|
Useful task types:
|
||||||
|
|
||||||
|
- draft new scene
|
||||||
|
- revise scene for pacing
|
||||||
|
- revise dialogue
|
||||||
|
- continuity repair
|
||||||
|
- line edit
|
||||||
|
- chapter assembly
|
||||||
|
- chapter-level review
|
||||||
|
- update outline after discovery writing
|
||||||
|
|
||||||
|
NightShift can already represent these as tasks, but the prompts should distinguish them clearly.
|
||||||
|
|
||||||
|
### 7. Output Length Controls
|
||||||
|
|
||||||
|
Long fiction output needs explicit limits.
|
||||||
|
|
||||||
|
Use:
|
||||||
|
|
||||||
|
- scene word count bounds
|
||||||
|
- `num_predict`
|
||||||
|
- task acceptance criteria
|
||||||
|
- smaller scene files
|
||||||
|
|
||||||
|
Do not ask for "write chapter 12" unless the chapter has already been broken into beats.
|
||||||
|
|
||||||
|
## Suggested First Template
|
||||||
|
|
||||||
|
Start with a minimal `writer-novel` template.
|
||||||
|
|
||||||
|
Files:
|
||||||
|
|
||||||
|
```text
|
||||||
|
nightshift.yaml
|
||||||
|
.nightshift/tasks.md
|
||||||
|
.nightshift/agents/planner.md
|
||||||
|
.nightshift/agents/drafter.md
|
||||||
|
.nightshift/agents/continuity-reviewer.md
|
||||||
|
.nightshift/agents/style-reviewer.md
|
||||||
|
.nightshift/agents/state-updater.md
|
||||||
|
story/worldbuilding.md
|
||||||
|
story/characters.md
|
||||||
|
story/style-guide.md
|
||||||
|
story/plot-state.md
|
||||||
|
story/timeline.md
|
||||||
|
story/unresolved-threads.md
|
||||||
|
story/chapters/.gitkeep
|
||||||
|
```
|
||||||
|
|
||||||
|
Pipeline:
|
||||||
|
|
||||||
|
```text
|
||||||
|
plan
|
||||||
|
semantic_context
|
||||||
|
context
|
||||||
|
draft
|
||||||
|
validate_draft
|
||||||
|
continuity_review
|
||||||
|
style_review
|
||||||
|
update_state
|
||||||
|
validate_state
|
||||||
|
summarize
|
||||||
|
```
|
||||||
|
|
||||||
|
Allowed paths:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
scoped_paths:
|
||||||
|
- story
|
||||||
|
- .nightshift/tasks.md
|
||||||
|
```
|
||||||
|
|
||||||
|
Draft stage allowed paths:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
allowed_paths:
|
||||||
|
- story/chapters
|
||||||
|
```
|
||||||
|
|
||||||
|
State update stage allowed paths:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
allowed_paths:
|
||||||
|
- story/plot-state.md
|
||||||
|
- story/characters.md
|
||||||
|
- story/timeline.md
|
||||||
|
- story/unresolved-threads.md
|
||||||
|
```
|
||||||
|
|
||||||
|
That separation matters. The drafter should not freely rewrite the world bible, and the state updater should not rewrite the scene prose.
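
The per-stage check could look roughly like this. `is_allowed` is a hypothetical helper, and NightShift's real scoped-path enforcement may differ; the sketch only shows the intended semantics of the two `allowed_paths` lists above.

```python
from pathlib import PurePosixPath


def is_allowed(path: str, allowed_paths: list[str]) -> bool:
    """True if `path` equals an allowed entry or sits under an allowed directory."""
    candidate = PurePosixPath(path)
    # Reject absolute paths and parent-directory escapes outright.
    if candidate.is_absolute() or ".." in candidate.parts:
        return False
    for allowed in allowed_paths:
        root = PurePosixPath(allowed)
        if candidate == root or root in candidate.parents:
            return True
    return False


DRAFT_PATHS = ["story/chapters"]
STATE_PATHS = ["story/plot-state.md", "story/characters.md",
               "story/timeline.md", "story/unresolved-threads.md"]

print(is_allowed("story/chapters/chapter-001/scene-001.md", DRAFT_PATHS))  # True
print(is_allowed("story/worldbuilding.md", DRAFT_PATHS))                   # False
```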
|
||||||
|
|
||||||
|
## What We Should Not Do First
|
||||||
|
|
||||||
|
Do not start with:
|
||||||
|
|
||||||
|
- automatic full-plot generation
|
||||||
|
- full chapter generation
|
||||||
|
- global rewrites of all prior chapters
|
||||||
|
- one giant `worldbuilding.md` dumped into every prompt
|
||||||
|
- trusting the model to maintain continuity without explicit state files
|
||||||
|
|
||||||
|
Those are likely to produce impressive-looking but unstable output.
|
||||||
|
|
||||||
|
## Practical First Experiment
|
||||||
|
|
||||||
|
A good first test:
|
||||||
|
|
||||||
|
1. Create a tiny worldbuilding document.
|
||||||
|
2. Create three characters.
|
||||||
|
3. Create five scene tasks.
|
||||||
|
4. Have NightShift draft one scene at a time.
|
||||||
|
5. After each scene, update `plot-state.md`.
|
||||||
|
6. Run continuity review against only the scene, state files, and relevant worldbuilding.
|
||||||
|
7. Inspect artifacts.
|
||||||
|
|
||||||
|
Success criteria:
|
||||||
|
|
||||||
|
- scenes land in the right files
|
||||||
|
- word counts stay bounded
|
||||||
|
- state updates are accurate
|
||||||
|
- future scenes use prior state correctly
|
||||||
|
- reviewers catch obvious contradictions
|
||||||
|
|
||||||
|
## Bottom Line
|
||||||
|
|
||||||
|
Theoretically, NightShift already has many of the needed utilities.
|
||||||
|
|
||||||
|
The missing piece is mostly a writing-oriented template with:
|
||||||
|
|
||||||
|
- scene-sized tasks
|
||||||
|
- durable story state files
|
||||||
|
- strict path separation between prose and state updates
|
||||||
|
- writing-specific prompts
|
||||||
|
- lightweight deterministic validators
|
||||||
|
- continuity/style review stages
|
||||||
|
|
||||||
|
This is viable, but it should start as a constrained scene-writing workflow, not an autonomous novel generator.
|
||||||
|
|
@@ -55,7 +55,7 @@ def build_parser() -> argparse.ArgumentParser:
|
||||||
run_parser.add_argument("--all", action="store_true", help="Run all runnable incomplete tasks.")
|
run_parser.add_argument("--all", action="store_true", help="Run all runnable incomplete tasks.")
|
||||||
run_parser.add_argument(
|
run_parser.add_argument(
|
||||||
"--animation",
|
"--animation",
|
||||||
default="agent_thinking",
|
default="status_dots",
|
||||||
choices=tuple(sorted(HOTDOG_ANIMATIONS)),
|
choices=tuple(sorted(HOTDOG_ANIMATIONS)),
|
||||||
help="Terminal animation to show while the run is active.",
|
help="Terminal animation to show while the run is active.",
|
||||||
)
|
)
|
||||||
|
|
@@ -210,13 +210,13 @@ def main(argv: list[str] | None = None) -> int:
|
||||||
validate_task_dependencies(tasks)
|
validate_task_dependencies(tasks)
|
||||||
if args.all and args.task:
|
if args.all and args.task:
|
||||||
parser.error("run accepts either --all or --task, not both.")
|
parser.error("run accepts either --all or --task, not both.")
|
||||||
runner = PipelineRunner(config, logger=RunLogger(console=print))
|
|
||||||
if args.all:
|
if args.all:
|
||||||
with TerminalAnimation(
|
with TerminalAnimation(
|
||||||
args.animation,
|
args.animation,
|
||||||
message="NightShift running all tasks",
|
message="Starting all tasks",
|
||||||
enabled=not args.no_animation,
|
enabled=not args.no_animation,
|
||||||
):
|
) as animation:
|
||||||
|
runner = PipelineRunner(config, logger=RunLogger(console=print, status=animation.update_message))
|
||||||
result = runner.run_tasks(tasks)
|
result = runner.run_tasks(tasks)
|
||||||
print(f"Status: {result.status}")
|
print(f"Status: {result.status}")
|
||||||
print(f"Tasks run: {len(result.task_results)}")
|
print(f"Tasks run: {len(result.task_results)}")
|
||||||
|
|
@@ -229,9 +229,10 @@ def main(argv: list[str] | None = None) -> int:
|
||||||
ensure_dependencies_satisfied(tasks, task)
|
ensure_dependencies_satisfied(tasks, task)
|
||||||
with TerminalAnimation(
|
with TerminalAnimation(
|
||||||
args.animation,
|
args.animation,
|
||||||
message=f"NightShift running {task.id}",
|
message=f"Task: {task.id} | Starting",
|
||||||
enabled=not args.no_animation,
|
enabled=not args.no_animation,
|
||||||
):
|
) as animation:
|
||||||
|
runner = PipelineRunner(config, logger=RunLogger(console=print, status=animation.update_message))
|
||||||
result = runner.run_task(task)
|
result = runner.run_task(task)
|
||||||
print(f"Task: {result.task_id}")
|
print(f"Task: {result.task_id}")
|
||||||
print(style_text(f"Status: {result.status}", color=_status_color(result.status), bold=True))
|
print(style_text(f"Status: {result.status}", color=_status_color(result.status), bold=True))
|
||||||
|
|
|
||||||
|
|
@@ -12,6 +12,7 @@ from .terminal import format_console_event_line, format_plain_event_line
|
||||||
|
|
||||||
|
|
||||||
ConsoleWriter = Callable[[str], None]
|
ConsoleWriter = Callable[[str], None]
|
||||||
|
StatusWriter = Callable[[str], None]
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
|
|
@@ -24,8 +25,9 @@ class LogEvent:
|
||||||
class RunLogger:
|
class RunLogger:
|
||||||
"""Write concise operational events to CLI and run log artifacts."""
|
"""Write concise operational events to CLI and run log artifacts."""
|
||||||
|
|
||||||
def __init__(self, console: ConsoleWriter | None = None) -> None:
|
def __init__(self, console: ConsoleWriter | None = None, status: StatusWriter | None = None) -> None:
|
||||||
self.console = console
|
self.console = console
|
||||||
|
self.status = status
|
||||||
self._run_log_path: Path | None = None
|
self._run_log_path: Path | None = None
|
||||||
self._aggregate_log_path: Path | None = None
|
self._aggregate_log_path: Path | None = None
|
||||||
self._initialized_run_logs: set[Path] = set()
|
self._initialized_run_logs: set[Path] = set()
|
||||||
|
|
@@ -45,6 +47,10 @@ class RunLogger:
|
||||||
line = format_plain_event_line(timestamp, event, message, safe_fields)
|
line = format_plain_event_line(timestamp, event, message, safe_fields)
|
||||||
if self.console is not None:
|
if self.console is not None:
|
||||||
self.console(format_console_event_line(timestamp, event, message, safe_fields))
|
self.console(format_console_event_line(timestamp, event, message, safe_fields))
|
||||||
|
if self.status is not None:
|
||||||
|
status_message = format_status_event_message(event, message, safe_fields)
|
||||||
|
if status_message:
|
||||||
|
self.status(status_message)
|
||||||
for path in (self._run_log_path,):
|
for path in (self._run_log_path,):
|
||||||
if path is None:
|
if path is None:
|
||||||
continue
|
continue
|
||||||
|
|
@@ -69,6 +75,39 @@ def format_log_line(log_event: LogEvent) -> str:
|
||||||
return format_plain_event_line(timestamp, log_event.event, log_event.message, log_event.fields)
|
return format_plain_event_line(timestamp, log_event.event, log_event.message, log_event.fields)
|
||||||
|
|
||||||
|
|
||||||
|
def format_status_event_message(event: str, message: str, fields: dict[str, object]) -> str | None:
|
||||||
|
task_id = str(fields.get("task_id", "") or "")
|
||||||
|
retry = fields.get("retry_count")
|
||||||
|
retry_text = f" retry {retry}" if retry not in (None, "") else ""
|
||||||
|
stage_id = str(fields.get("stage_id", "") or "")
|
||||||
|
stage_type = str(fields.get("stage_type", "") or "")
|
||||||
|
agent_id = str(fields.get("agent_id", "") or "")
|
||||||
|
model = str(fields.get("model", "") or "")
|
||||||
|
command = str(fields.get("command", "") or "")
|
||||||
|
status = str(fields.get("status", "") or "")
|
||||||
|
next_stage = str(fields.get("next_stage", "") or "")
|
||||||
|
|
||||||
|
prefix = f"Task: {task_id} | " if task_id else ""
|
||||||
|
if event == "task.start":
|
||||||
|
return f"Task: {task_id} | Starting" if task_id else "Starting task"
|
||||||
|
if event == "stage.start" and stage_id:
|
||||||
|
label = f"{stage_id} ({stage_type})" if stage_type else stage_id
|
||||||
|
return f"{prefix}Stage: {label}{retry_text}"
|
||||||
|
if event == "agent.start":
|
||||||
|
model_text = f" | Model: {model}" if model else ""
|
||||||
|
return f"{prefix}Agent: {agent_id or stage_id}{model_text}"
|
||||||
|
if event == "command.start":
|
||||||
|
return f"{prefix}Command: {command or stage_id}"
|
||||||
|
if event == "stage.retry":
|
||||||
|
return f"{prefix}Retrying after {stage_id} -> {next_stage}{retry_text}"
|
||||||
|
if event in {"stage.finish", "task.finish"} and status:
|
||||||
|
target = f"Stage: {stage_id}" if event == "stage.finish" and stage_id else "Task"
|
||||||
|
return f"{prefix}{target} {status}"
|
||||||
|
if event.endswith(".start"):
|
||||||
|
return f"{prefix}{message}"
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
def tail_lines(path: Path, limit: int = 100) -> list[str]:
|
def tail_lines(path: Path, limit: int = 100) -> list[str]:
|
||||||
if limit <= 0:
|
if limit <= 0:
|
||||||
return []
|
return []
|
||||||
|
|
|
||||||
|
|
@@ -44,6 +44,13 @@ BANNER_MESSAGES = [
 quote = random.choice(BANNER_MESSAGES)
 
 HOTDOG_ANIMATIONS = {
+    "status_dots": [
+        "[. ]",
+        "[.. ]",
+        "[...]",
+        "[ ..]",
+        "[ .]",
+    ],
     "classic_dance": [
         "🌭",
         "ヽ(🌭)ノ",
@@ -158,6 +165,7 @@ class TerminalAnimation:
         self._stop = threading.Event()
         self._thread: threading.Thread | None = None
         self._width = 0
+        self._lock = threading.Lock()
 
     def __enter__(self) -> "TerminalAnimation":
         self.start()
@@ -180,11 +188,17 @@ class TerminalAnimation:
         self._clear()
         self._thread = None
 
+    def update_message(self, message: str) -> None:
+        with self._lock:
+            self.message = message
+
     def _run(self) -> None:
         index = 0
         while not self._stop.is_set():
             frame = self.frames[index % len(self.frames)]
-            text = f"{frame} {self.message}"
+            with self._lock:
+                message = self.message
+            text = f"{frame} | {message}"
             self._width = max(self._width, len(text))
             self.stream.write("\r" + text.ljust(self._width))
             self.stream.flush()

@@ -5,7 +5,7 @@ import unittest
 from unittest.mock import patch
 
 from nightshift.artifacts import ArtifactStore
-from nightshift.runlog import RunLogger
+from nightshift.runlog import RunLogger, format_status_event_message
 from nightshift.terminal import (
     HOTDOG_ANIMATIONS,
     TerminalAnimation,
@@ -34,6 +34,7 @@ class TerminalStylingTests(unittest.TestCase):
     def test_animation_frames_fall_back_to_agent_thinking(self) -> None:
         self.assertEqual(animation_frames("missing"), tuple(HOTDOG_ANIMATIONS["agent_thinking"]))
         self.assertEqual(animation_frames("classic_dance"), tuple(HOTDOG_ANIMATIONS["classic_dance"]))
+        self.assertEqual(animation_frames("status_dots"), tuple(HOTDOG_ANIMATIONS["status_dots"]))
 
     def test_terminal_animation_is_disabled_for_non_tty(self) -> None:
         stream = StringIO()
@@ -84,6 +85,47 @@ class TerminalStylingTests(unittest.TestCase):
             self.assertNotIn("\x1b[", run_log)
             self.assertNotIn("abc", run_log)
 
+    def test_run_logger_status_callback_gets_compact_stage_message(self) -> None:
+        with tempfile.TemporaryDirectory() as directory:
+            root = Path(directory)
+            artifacts = ArtifactStore(root, ".nightshift", run_id="test-run")
+            statuses: list[str] = []
+            logger = RunLogger(status=statuses.append)
+            logger.bind(artifacts)
+
+            logger.event(
+                "stage.start",
+                "Starting stage",
+                task_id="TASK-001",
+                stage_id="implement",
+                stage_type="file_writer",
+                retry_count=2,
+            )
+            logger.event(
+                "agent.start",
+                "Starting agent",
+                task_id="TASK-001",
+                agent_id="implementer",
+                model="qwen3-coder:30b",
+            )
+
+            self.assertEqual(statuses[0], "Task: TASK-001 | Stage: implement (file_writer) retry 2")
+            self.assertEqual(statuses[1], "Task: TASK-001 | Agent: implementer | Model: qwen3-coder:30b")
+
+    def test_format_status_event_message_reports_retries(self) -> None:
+        message = format_status_event_message(
+            "stage.retry",
+            "Redirecting after stage result",
+            {
+                "task_id": "TASK-001",
+                "stage_id": "test",
+                "next_stage": "implement",
+                "retry_count": 1,
+            },
+        )
+
+        self.assertEqual(message, "Task: TASK-001 | Retrying after test -> implement retry 1")
+
 
 if __name__ == "__main__":
     unittest.main()
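
Note on wiring: the diff adds a `status` callback to `RunLogger` and a lock-guarded `update_message` to `TerminalAnimation`, but this part of the commit does not show where the two are connected. A minimal sketch of one plausible wiring follows. `StatusLine`, `render`, and `emit` are hypothetical stand-ins written for illustration; they are not NightShift's classes or functions.

```python
from __future__ import annotations

import io
import threading
from typing import Callable

StatusWriter = Callable[[str], None]


class StatusLine:
    """Hypothetical stand-in for TerminalAnimation: one message behind a lock."""

    def __init__(self, stream: io.StringIO, message: str = "") -> None:
        self.stream = stream
        self.message = message
        self._width = 0
        self._lock = threading.Lock()

    def update_message(self, message: str) -> None:
        # The logging side swaps the message while holding the lock.
        with self._lock:
            self.message = message

    def render(self, frame: str) -> str:
        # The drawing side copies the message under the lock, then formats it.
        with self._lock:
            message = self.message
        text = f"{frame} | {message}"
        # Pad to the widest line seen so a shorter status overwrites a longer one.
        self._width = max(self._width, len(text))
        line = "\r" + text.ljust(self._width)
        self.stream.write(line)
        self.stream.flush()
        return line


def emit(status: StatusWriter | None, message: str | None) -> None:
    """Same guard as the diff: skip a missing callback or an empty message."""
    if status is not None and message:
        status(message)


stream = io.StringIO()
status_line = StatusLine(stream, "Starting")
emit(status_line.update_message, "Task: TASK-001 | Stage: implement (file_writer)")
emit(status_line.update_message, None)  # ignored; the previous message stays
first = status_line.render("[...]")
emit(status_line.update_message, "Task: TASK-001 | Agent: implementer")
second = status_line.render("[ ..]")
```

Passing the bound method as the callback keeps the logger unaware of the terminal. The lock matters here because the animation thread reads `message` while the logging thread writes it.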