diff --git a/README.md b/README.md new file mode 100644 index 0000000..78b2f6e --- /dev/null +++ b/README.md @@ -0,0 +1,669 @@ +# NightShift + +> Auditable local-first AI coding pipelines. +> +> Wake up to reviewable work, not chaos. + +NightShift is a deterministic pipeline runner for long-running AI-assisted coding workflows. + +It is designed for overnight or unattended execution against a scoped project repository using local or external coding agents. + +NightShift is not an autonomous coding god. + +It is a safety-aware orchestration system that treats LLMs like unreliable distributed systems. + +Agents are bounded by: + +* scoped repository access +* structured pipeline stages +* tests and static analysis +* retry limits +* review stages +* context compaction +* durable artifacts + +The output is: + +* reviewable code +* plans +* logs +* diffs +* test output +* review notes +* overnight summaries + +Not blind autonomous shipping. + +--- + +# Why? + +Most "AI coding agents" optimize for: + +* autonomy +* demo magic +* speed +* vibes + +NightShift optimizes for: + +1. Cheapness +2. Correctness +3. Auditability +4. Speed + +NightShift is also intended to serve as an experimentation platform for AI-assisted software engineering workflows. + +The system is intentionally designed to facilitate testing and comparison of: + +* different models +* different agent roles +* prompt structures +* system prompts +* retry strategies +* review strategies +* context compaction techniques +* pipeline structures +* reasoning formats +* constraint-driven workflows + +The pipeline architecture should make these experiments reproducible, auditable, and configurable rather than hidden inside opaque agent behavior. + +The assumption is simple: + +> AI systems are useful but unreliable. + +NightShift embraces this reality by building deterministic orchestration around nondeterministic agents. 
+ +--- + +# Features + +## Local-first execution + +Designed primarily for: + +* Ollama +* local models +* Codex CLI +* Claude Code +* command-driven wrappers + +Use cheap local models for most work. +Escalate to expensive models only where useful. + +--- + +## Declarative pipelines + +Define workflows in YAML: + +```yaml +pipeline: + stages: + - id: plan + type: agent + agent: planner + + - id: implement + type: agent + agent: implementer + + - id: test + type: command + commands: + - cargo test + + - id: review + type: review + agent: reviewer +``` + +Pipelines are intentionally portable and configurable so users can experiment with: + +* model routing +* review loops +* retry logic +* prompt engineering +* reasoning formats +* planning strategies +* context structures +* cost/performance tradeoffs + +NightShift is designed to make these workflow experiments measurable and repeatable rather than ad-hoc. + +--- + +## Review-first workflows + +NightShift is designed around: + +```text +plan + -> review + -> implement + -> test + -> static analysis + -> review + -> retry or complete +``` + +The goal is: + +> Wake up to a useful review package. + +--- + +## Durable artifacts + +Every run creates a full audit trail.
+ +Example: + +```text +.nightshift/ + runs/ + 2026-05-16-overnight/ + run-summary.md + + tasks/ + TASK-001/ + plan.md + review.md + implementation-log.md + test-output.txt + diff.patch +``` + +This makes: + +* debugging easier +* prompt experimentation possible +* retries understandable +* failures inspectable +* portfolio demos stronger + +--- + +## Scoped repository safety + +NightShift can: + +* restrict writable directories +* allowlist commands +* block dangerous shell operations +* require clean git worktrees + +The system is intentionally conservative. + +--- + +# Philosophy + +NightShift follows a few core principles. + +## Deterministic orchestration + +Agents are probabilistic. + +The pipeline runner should not be. + +--- + +## Context compaction + +Do not dump infinite history into prompts. + +Use: + +* project context +* task context +* retry summaries + +Keep context compact and intentional. + +--- + +## Reviewability over autonomy + +NightShift is optimized to produce: + +* reviewable work +* reviewable reasoning +* reviewable failure + +Not autonomous deployment. 
+ +--- + +## Boring reliability beats magical demos + +The system should: + +* fail clearly +* retry explicitly +* preserve artifacts +* avoid spooky hidden behavior + +--- + +# Architecture Overview + +```text +Task Parser + ↓ +Pipeline Runner + ↓ +Stage Executor + ┌────┴────┐ + ↓ ↓ +Agents Commands +``` + +Core components: + +* Task parser +* Pipeline runner +* Stage executor +* Agent wrappers +* Command runner +* Artifact store +* Context manager +* Safety layer + +--- + +# Example Workflow + +Input: + +* repository +* tasks.md +* nightshift.yaml +* agent prompt files + +Execution: + +```text +TASK-001 + ↓ +plan + ↓ +review_plan + ↓ +implement + ↓ +test + ↓ +static analysis + ↓ +review + ↓ +complete or retry +``` + +Output: + +* modified repository +* task artifacts +* overnight report +* review notes + +--- + +# Installation + +## Status + +NightShift is currently an early-stage project. + +The MVP focuses on: + +* local-first execution +* declarative pipelines +* task orchestration +* artifact generation +* safe command execution +* reviewable workflows + +--- + +## Planned Installation + +Python version: + +```bash +pip install nightshift +``` + +Development install: + +```bash +git clone +cd nightshift +pip install -e . +``` + +--- + +# Quickstart + +## 1. Initialize a project + +```bash +nightshift init +``` + +Creates: + +```text +nightshift.yaml +tasks.md +agents/ +``` + +--- + +## 2. Define tasks + +Example: + +```markdown +- [ ] TASK-001: Add YAML config loading + +Description: +Implement config loading for NightShift. + +Acceptance Criteria: +- Loads `nightshift.yaml` +- Validates required fields +- Includes tests +``` + +--- + +## 3. Configure pipeline + +Example: + +```yaml +project: + root: . + task_file: tasks.md + artifact_dir: .nightshift + +pipeline: + max_task_retries: 3 +``` + +--- + +## 4. Run NightShift + +```bash +nightshift run +``` + +Or: + +```bash +nightshift run --task TASK-001 +``` + +--- + +## 5. 
Review artifacts + +```text +.nightshift/runs// +``` + +Contains: + +* plans +* logs +* diffs +* test output +* review notes +* summaries + +--- + +# Example Config + +```yaml +project: + name: example-project + root: . + task_file: tasks.md + artifact_dir: .nightshift + +safety: + require_clean_worktree: true + + scoped_paths: + - src/ + - tests/ + + allowed_commands: + - cargo test + - cargo fmt --check + + forbidden_commands: + - rm -rf + - git push + +agents: + planner: + backend: command + command: codex + system_prompt: agents/planner.md + + implementer: + backend: command + command: codex + system_prompt: agents/implementer.md + + reviewer: + backend: command + command: codex + system_prompt: agents/reviewer.md + +pipeline: + max_task_retries: 3 + + stages: + - id: plan + type: agent + agent: planner + + - id: implement + type: agent + agent: implementer + + - id: test + type: command + commands: + - cargo test + + - id: review + type: review + agent: reviewer +``` + +--- + +# Safety Model + +NightShift intentionally limits agent freedom. + +## Repository scope restrictions + +Agents should only operate within configured project paths. + +--- + +## Command allowlists + +Commands must be explicitly permitted. + +Example: + +```yaml +allowed_commands: + - cargo test + - cargo fmt --check +``` + +--- + +## Dangerous command blocking + +NightShift may block commands such as: + +```text +rm -rf +git push +curl | bash +``` + +--- + +## Review-first workflow + +The system assumes: + +> Humans remain the final authority. 
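
The command allowlist and dangerous-command blocking described in the Safety Model could be enforced along the following lines. This is a minimal sketch, not NightShift's actual implementation: the names `is_command_allowed`, `ALLOWED_COMMANDS`, and `FORBIDDEN_PATTERNS` are hypothetical, and the policy of exact matching on whitespace-normalized commands is an assumption chosen for strictness.

```python
import shlex

# Hypothetical values mirroring the example config in this README.
ALLOWED_COMMANDS = ("cargo test", "cargo fmt --check")
FORBIDDEN_PATTERNS = ("rm -rf", "git push", "curl | bash")


def _normalize(command: str) -> str:
    """Collapse a shell command to single-space-separated tokens."""
    return " ".join(shlex.split(command))


def is_command_allowed(command: str,
                       allowed=ALLOWED_COMMANDS,
                       forbidden=FORBIDDEN_PATTERNS) -> bool:
    """Return True only if the command exactly matches an allowlist entry.

    Assumption: a command must equal an allowlisted command after
    whitespace normalization; extra arguments or chained commands fail.
    """
    try:
        normalized = _normalize(command)
    except ValueError:
        # Unparseable input (for example an unclosed quote) is rejected.
        return False
    # Forbidden patterns are checked first so they always win.
    if any(pattern in normalized for pattern in forbidden):
        return False
    return normalized in {_normalize(entry) for entry in allowed}
```

Under these assumptions, `is_command_allowed("cargo test")` is accepted, while `cargo test; rm -rf /` is rejected because the chained command matches a forbidden pattern and is not an exact allowlist entry. Exact matching is deliberately conservative; a real policy might permit extra arguments for specific commands.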
+ +--- + +# Roadmap + +## MVP + +* [ ] YAML config loading +* [ ] Markdown task parsing +* [ ] Pipeline execution +* [ ] Fake command agents +* [ ] Artifact generation +* [ ] Safe command execution +* [ ] Retry handling +* [ ] Overnight reports + +## Future + +* [ ] Ollama integration +* [ ] Claude Code integration +* [ ] Codex integration +* [ ] Parallel execution +* [ ] DAG workflows +* [ ] Prompt A/B testing +* [ ] Cost telemetry +* [ ] Git branch isolation +* [ ] Dashboard UI +* [ ] Constraint-language experimentation + +--- + +# Inspiration + +NightShift is inspired by: + +* CI/CD systems +* build pipelines +* state machines +* agent orchestration research +* distributed systems thinking +* local-first tooling +* practical AI skepticism + +--- + +# Philosophy Statement + +NightShift rejects two extremes: + +## Fully manual engineering + +Too slow. + +## Reckless autonomous agents + +Too unreliable. + +Instead: + +> NightShift treats AI systems as bounded workers inside deterministic workflows. + +The goal is not artificial software gods. + +The goal is trustworthy leverage. + +--- + +# License + +Planned: + +GPLv3 + +Rationale: + +NightShift is licensed under GPLv3 because AI-assisted software engineering is rapidly becoming dependent on opaque, vendor-controlled tooling. As agent systems become part of the actual software production process, users deserve the freedom to inspect, modify, audit, and reproduce the systems operating on their codebases. GPLv3 helps ensure that improvements to NightShift and its orchestration layer remain part of a transparent, inspectable ecosystem rather than disappearing into proprietary black boxes. The goal is not just open source for its own sake, but preserving user autonomy, local-first experimentation, and the ability to understand how automated systems are making decisions inside increasingly critical engineering workflows. 
+ +* encourages community contribution +* protects local-first ecosystem +* aligns with hacker/free software ethos + +[Read more here, GPLv3 saves the world.](https://www.gnu.org/licenses/rms-why-gplv3.html) + +--- + +# Contributing + +NightShift is intentionally early and experimental. + +Good contributions: + +* safety improvements +* pipeline reliability +* better artifact systems +* better context compaction +* local model integrations +* tests +* docs + +Bad contributions: + +* adding magical autonomy before reliability exists +* removing safety boundaries +* overcomplicated abstractions before MVP stability + +--- + +# Final Note + +AI coding tools are currently optimized for demos. + +NightShift is optimized for surviving the night. diff --git a/docs/design.md b/docs/design.md new file mode 100644 index 0000000..b87c875 --- /dev/null +++ b/docs/design.md @@ -0,0 +1,1039 @@ +# NightShift + +## Auditable Local-First AI Coding Pipelines + +Version: v0.1 Draft +Author: K455 +Status: Design Proposal + +--- + +# 1. Executive Summary + +NightShift is a local-first AI pipeline runner designed to execute long-running coding workflows against a constrained project workspace. + +The system is intended to run overnight or unattended for extended periods while remaining: + +* Cheap +* Correct +* Auditable +* Safe +* Reviewable + +NightShift is not designed to be a fully autonomous "AI software engineer." +Instead, it is a deterministic orchestration system that allows fallible AI agents to operate within constrained, test-driven, auditable workflows. + +The core philosophy is: + +> Treat LLMs like unreliable distributed systems. + +Agents are bounded by: + +* Scoped repository access +* Structured stage contracts +* Explicit retry behavior +* Tests and static checks +* Review stages +* Context compaction +* Artifact logging + +The intended workflow is: + +1. User provides: + + * Repository + * Task list + * Pipeline configuration + * Agent definitions + +2. 
NightShift: + + * Selects the next task + * Generates a plan + * Reviews the plan + * Implements changes + * Runs tests/static analysis + * Reviews results + * Retries if necessary + * Produces an overnight report + +The result is a reviewable repository state and a full audit trail of AI behavior. + +--- + +# 2. Goals + +## 2.1 Primary Goals + +### Local-first execution + +The system should work primarily with local models and local execution environments. + +Examples: + +* Ollama +* Local transformers +* Local agent runtimes +* Claude Code +* Codex CLI + +### Long-running unattended workflows + +NightShift should support: + +* Overnight execution +* Large task chains +* Multi-stage workflows +* Automated retries +* Context handoff between stages + +### Auditability + +Every important action should be recorded. + +Users should be able to inspect: + +* Prompts +* Plans +* Reviews +* Command outputs +* Diffs +* Test results +* Retry reasoning +* Final summaries + +### Cheapness-first execution + +The orchestration layer should assume: + +* Cheap local models handle most work +* Expensive models are escalation layers +* Context size matters +* Token usage matters +* Retry cost matters + +### Safe repository boundaries + +The system should: + +* Restrict file access +* Restrict shell commands +* Avoid destructive operations +* Minimize repository damage + +--- + +## 2.2 Non-Goals (v1) + +The following are intentionally out of scope for v1: + +* Fully autonomous software development +* Parallel distributed execution +* Automatic deployment +* Cloud-native orchestration +* Dynamic self-modifying pipelines +* Autonomous internet access +* Agent swarms +* Arbitrary Python execution hooks +* Automatic git pushes +* Full DAG orchestration + +--- + +# 3. Design Philosophy + +NightShift is built around several core principles. + +## 3.1 Deterministic orchestration + +Agents are nondeterministic. + +The orchestration system should not be. 
+ +Pipeline behavior should be: + +* Predictable +* Reproducible +* Configurable +* Explicit + +--- + +## 3.2 Structured state transitions + +NightShift uses a state-machine workflow model. + +A task moves through defined stages: + +```text +Task Queue + -> Plan + -> Plan Review + -> Implement + -> Test + -> Static Check + -> Review + -> Retry / Complete +``` + +Each stage produces: + +```yaml +status: pass | fail | retry | escalate +reason: string +next_stage: optional +context_update: optional +``` + +This allows the pipeline runner to remain deterministic even while agents are probabilistic. + +--- + +## 3.3 Context compaction + +Agents should not inherit unlimited history. + +Instead: + +* Project-level context is persistent and compact +* Task-level context is scoped +* Retry context is summarized +* Stage context is minimized + +This reduces: + +* Token costs +* Context poisoning +* Hallucination drift +* Recursive confusion + +--- + +## 3.4 Reviewability over autonomy + +NightShift is optimized to produce: + +* Reviewable code +* Reviewable reports +* Reviewable reasoning + +The primary output is: + +> A useful morning review state. + +Not: + +> Fully autonomous shipping. + +--- + +# 4. 
Architecture Overview + +## 4.1 High-Level Components + +```text ++-------------------+ +| Task Parser | ++-------------------+ + | + v ++-------------------+ +| Pipeline Runner | ++-------------------+ + | + v ++-------------------+ +| Stage Executor | ++-------------------+ + | | + | +----------------+ + | | + v v ++-----------+ +----------------+ +| Agent API | | Command Runner | ++-----------+ +----------------+ + | | + v v ++-----------+ +----------------+ +| LLM Model | | Test/Lint/etc | ++-----------+ +----------------+ +``` + +--- + +## 4.2 Core Components + +### Task Parser + +Responsible for: + +* Reading markdown task files +* Parsing acceptance criteria +* Tracking completion state +* Determining dependencies + +--- + +### Pipeline Runner + +Responsible for: + +* Stage orchestration +* Retry logic +* State transitions +* Artifact management +* Context propagation + +--- + +### Stage Executor + +Responsible for: + +* Executing stage definitions +* Calling agents +* Running commands +* Collecting outputs + +--- + +### Agent Layer + +Responsible for: + +* Prompt construction +* Model backend integration +* Structured output parsing +* Context injection + +--- + +### Command Runner + +Responsible for: + +* Executing tests +* Static analysis +* Formatting +* Shell command restrictions +* Sandboxing + +--- + +# 5. Workflow Model + +## 5.1 State Machine Model + +NightShift uses a configurable state-machine workflow. + +This was selected over: + +* DAG orchestration +* Arbitrary scripting + +because: + +* v1 executes one task at a time +* Retry loops are first-class +* Auditability is easier +* Deterministic transitions are simpler + +--- + +## 5.2 Default Pipeline + +```text +PLAN + ↓ +REVIEW_PLAN + ↓ +IMPLEMENT + ↓ +TEST + ↓ +STATIC_ANALYSIS + ↓ +REVIEW + ↓ +DECISION +``` + +Decision outcomes: + +* COMPLETE +* RETRY_IMPLEMENTATION +* RETRY_PLANNING +* FAIL + +--- + +## 5.3 Configurable Pipelines + +Pipelines are defined declaratively. 
+ +Users may: + +* Swap stage orders +* Add/remove stages +* Define retry behavior +* Use different models +* A/B test prompts +* Experiment with reasoning structures + +--- + +# 6. Configuration System + +## 6.1 Configuration Format + +NightShift uses YAML configuration files. + +Reasons: + +* Human-readable +* Good nested structure support +* Easier workflow representation than TOML +* Safer than arbitrary Python execution + +--- + +## 6.2 Example Configuration + +```yaml +project: + name: my-project + root: . + task_file: tasks.md + artifact_dir: .nightshift + +safety: + require_clean_worktree: true + + scoped_paths: + - src/ + - tests/ + + forbidden_commands: + - rm -rf + - git push + + allowed_commands: + - cargo test + - cargo fmt + - cargo clippy + +agents: + planner: + backend: ollama + model: qwen2.5-coder:14b + system_prompt: agents/planner.md + + implementer: + backend: claude-code + model: sonnet + system_prompt: agents/implementer.md + + reviewer: + backend: ollama + model: deepseek-r1:32b + system_prompt: agents/reviewer.md + +pipeline: + max_task_retries: 3 + + stages: + - id: plan + type: agent + agent: planner + + - id: review_plan + type: review + agent: reviewer + on_fail: plan + + - id: implement + type: agent + agent: implementer + + - id: test + type: command + commands: + - cargo test + + - id: static + type: command + commands: + - cargo fmt --check + - cargo clippy -- -D warnings + + - id: review + type: review + agent: reviewer + on_fail: implement +``` + +--- + +# 7. Task System + +## 7.1 Task Format + +Tasks are defined in markdown. + +Example: + +```markdown +- [ ] TASK-001: Add retry support to pipeline runner + +Acceptance Criteria: +- Retries configurable per stage +- Retry summaries persisted +- Retry count visible in final report +``` + +--- + +## 7.2 Task Lifecycle + +Each task: + +1. Is parsed +2. Is assigned a workspace +3. Receives planning +4. Receives implementation +5. Is validated +6. Is reviewed +7. Produces artifacts +8. 
Is marked complete or failed + +--- + +## 7.3 Task Dependencies + +Future versions may support: + +```text +TASK-003 depends on TASK-001 +``` + +However: + +* Tasks should remain independently testable when possible +* Pipelines should maintain a buildable repository state + +--- + +# 8. Agent Model + +## 8.1 Agent Roles + +Agents are specialized. + +Example roles: + +* planner +* implementer +* reviewer +* summarizer +* test-writer + +--- + +## 8.2 Agent Definitions + +Agents are configurable. + +Each agent defines: + +* Backend +* Model +* System prompt +* Constraints +* Output schema + +--- + +## 8.3 Multi-Backend Support + +NightShift should support: + +* Ollama +* Claude Code +* Codex CLI +* Future local runners + +This allows: + +* Cheap local planning +* Expensive selective escalation +* Hybrid pipelines + +--- + +## 8.4 Structured Outputs + +Agents should emit machine-readable results. + +Example: + +```yaml +status: pass +summary: | + Tests succeeded. +issues: + - None +next_stage: review +``` + +--- + +# 9. Context System + +## 9.1 Context Layers + +NightShift uses layered context. + +### Project Context + +Long-lived information: + +* Architecture +* Coding standards +* Constraints +* Previous summaries + +--- + +### Task Context + +Task-specific information: + +* Acceptance criteria +* Relevant files +* Prior retries +* Implementation notes + +--- + +### Retry Context + +Compact summaries of: + +* Previous failures +* Previous reviews +* Previous test errors + +--- + +## 9.2 Context Compaction + +Every stage should summarize output. + +This prevents: + +* Infinite context growth +* Token explosion +* Recursive hallucination +* Low-signal history accumulation + +--- + +# 10. Safety Model + +## 10.1 Repository Scope Restrictions + +NightShift should restrict: + +* Accessible directories +* Writable paths +* Executable commands + +--- + +## 10.2 Command Restrictions + +Commands are allowlisted. + +Potentially dangerous commands are forbidden. 
+ +Examples: + +```text +Forbidden: +- rm -rf +- git push +- curl | bash +``` + +--- + +## 10.3 Clean Worktree Requirement + +v1 may optionally require: + +```text +git status == clean +``` + +before execution. + +This simplifies: + +* Auditability +* Recovery +* Diff inspection + +--- + +# 11. Testing and Validation + +## 11.1 Validation Pipeline + +Validation occurs in multiple stages: + +```text +Tests + ↓ +Static Analysis + ↓ +Review Agent + ↓ +Decision +``` + +--- + +## 11.2 Global Test Suite + +Tests are global. + +Rationale: + +* New changes must not break old functionality +* Pipeline should maintain cumulative stability + +--- + +## 11.3 Generated Tests + +Agents may generate tests for features. + +Generated tests become part of the persistent suite. + +--- + +# 12. Artifact System + +## 12.1 Artifact Goals + +Artifacts provide: + +* Auditability +* Replayability +* Debugging +* Historical inspection +* Prompt experimentation + +--- + +## 12.2 Example Layout + +```text +.nightshift/ + project-context.md + + runs/ + 2026-05-16-overnight/ + run-summary.md + config.snapshot.yaml + + tasks/ + TASK-001/ + task.md + plan.md + plan-review.md + implementation-log.md + test-output.txt + static-output.txt + review.md + final-notes.md + diff.patch + context-out.md +``` + +--- + +# 13. Overnight Report + +At completion NightShift generates: + +* Completed tasks +* Failed tasks +* Retry counts +* Files modified +* Test results +* Reviewer summaries +* Remaining issues +* Suggested follow-up work + +The goal is: + +> Wake up to a review package. + +--- + +# 14. Future Directions + +Potential future features: + +* Parallel task execution +* DAG workflows +* Distributed workers +* Sandboxed containers +* Git branch isolation +* Agent tournaments +* Constraint language experimentation +* Prompt A/B testing +* Semantic memory systems +* Multi-repo orchestration +* Web dashboard +* Cost telemetry +* Human approval gates + +--- + +# 15. 
Risks + +## 15.1 Context poisoning + +Mitigation: + +* Context compaction +* Retry summarization +* Structured stage boundaries + +--- + +## 15.2 Agent loops + +Mitigation: + +* Explicit retry counts +* Deterministic transitions +* Timeout handling + +--- + +## 15.3 Repository damage + +Mitigation: + +* Scoped directories +* Command restrictions +* Validation stages + +--- + +## 15.4 Cost explosion + +Mitigation: + +* Local-first execution +* Context minimization +* Escalation-only expensive models + +--- + +# 16. MVP Definition + +The minimum viable NightShift implementation should: + +1. Parse markdown tasks +2. Execute a declarative pipeline +3. Support local agents +4. Generate plans +5. Generate implementations +6. Run tests +7. Run static analysis +8. Run review agents +9. Retry failed stages +10. Produce artifacts +11. Produce an overnight summary +12. Restrict repository access + +This MVP is sufficient to: + +* Demonstrate orchestration architecture +* Demonstrate AI pipeline engineering +* Demonstrate safety-aware automation +* Serve as a strong portfolio project + +--- + +# Appendix A: Design Decisions and Rationale + +## A.1 Local-first architecture + +Decision: + +* Prefer local models and local execution + +Reasoning: + +* Cheapness-first design +* Better experimentation +* Better privacy +* Reduced vendor dependency +* Better overnight scalability + +--- + +## A.2 State machine over DAG + +Decision: + +* Use configurable state-machine workflows + +Reasoning: + +* One-task-at-a-time execution +* Retry loops are primary workflow behavior +* Easier auditing +* Easier debugging +* Simpler MVP + +--- + +## A.3 YAML configuration + +Decision: + +* Use declarative YAML config + +Reasoning: + +* Human-readable +* Easier nested workflow representation +* Safer than arbitrary Python +* Better portability + +--- + +## A.4 Cheapness-first model routing + +Decision: + +* Use expensive models selectively + +Reasoning: + +* Overnight pipelines can become 
token-expensive +* Local models are sufficient for many stages +* Review stages benefit more from premium models + +--- + +## A.5 Strict repository scoping + +Decision: + +* Limit writable paths and executable commands + +Reasoning: + +* Prevent accidental damage +* Maintain trust in unattended execution +* Improve auditability + +--- + +## A.6 Reviewable output over autonomy + +Decision: + +* Produce review packages rather than autonomous shipping + +Reasoning: + +* Human review remains critical +* Improves safety +* Improves correctness +* Keeps architecture grounded and practical + +--- + +## A.7 Layered context model + +Decision: + +* Separate project, task, and retry context + +Reasoning: + +* Reduces token usage +* Prevents context explosion +* Improves signal quality +* Prevents recursive drift + +--- + +## A.8 Artifact-heavy architecture + +Decision: + +* Persist plans, logs, reviews, outputs, and summaries + +Reasoning: + +* Debugging +* Prompt experimentation +* A/B testing +* Replayability +* Portfolio visibility + +--- + +## A.9 No parallelism in v1 + +Decision: + +* Execute one task at a time + +Reasoning: + +* Simpler correctness model +* Easier debugging +* Easier repository safety +* Easier context management + +--- + +## A.10 Declarative pipelines first + +Decision: + +* No arbitrary Python hooks in v1 + +Reasoning: + +* Safer execution +* Easier reproducibility +* Easier auditing +* Easier portability + +--- + +# Closing Statement + +NightShift is intended to explore a practical middle ground between: + +* Fully manual software engineering +* Reckless autonomous agent systems + +The system assumes that AI agents are useful but unreliable. + +NightShift therefore treats agents as bounded workers inside deterministic, auditable, test-driven workflows. + +The primary output is not blind autonomy. + +The primary output is trustworthy leverage. 
diff --git a/docs/vibe.md b/docs/vibe.md new file mode 100644 index 0000000..d58ba5c --- /dev/null +++ b/docs/vibe.md @@ -0,0 +1,1031 @@ +# NIGHTSHIFT_CODEX.md + +You are Codex working on **NightShift**, a local-first AI coding pipeline runner. + +This file is the implementation-driving context document. Treat it as the project brief, architectural guide, and task checklist. + +--- + +# 0. Project Identity + +## Name + +**NightShift** + +## Tagline + +Auditable local-first AI coding pipelines. + +## Core Thesis + +NightShift is not an autonomous coding god. + +NightShift is a deterministic pipeline runner that lets unreliable AI agents perform bounded coding work inside scoped, auditable, test-driven workflows. + +The user should be able to run NightShift overnight and wake up to: + +* a reviewable repository state +* task artifacts +* plans +* logs +* diffs +* test output +* review notes +* a final report + +## Priority Order + +Optimize in this order: + +1. Cheapness +2. Correctness +3. Auditability +4. Speed + +This means: + +* Prefer local models first. +* Keep context compact. +* Avoid token waste. +* Make failure explicit. +* Always produce artifacts. +* Do not optimize for cleverness before trust. + +--- + +# 1. Product Summary + +NightShift runs long-running AI-assisted coding pipelines against a scoped project directory. + +A user provides: + +* a repository +* a markdown task file +* a declarative pipeline config +* agent definitions +* allowed test/static commands + +NightShift processes one task at a time: + +```text +select task + -> plan + -> review plan + -> implement + -> run tests + -> run static checks + -> review result + -> retry or complete + -> write summary +``` + +The output is not automatically shipped. + +The output is a reviewable work package. + +--- + +# 2. Non-Negotiable Design Constraints + +## 2.1 Local-first + +The first implementation should assume local execution. 
+ +Primary target backend: + +* local command-driven agent execution + +Future-compatible backends: + +* Ollama +* Claude Code +* Codex CLI +* OpenAI API +* Anthropic API + +Do not overbuild backend support in v1. + +Build a clean interface first. + +--- + +## 2.2 Scoped directory access + +NightShift must only operate inside a configured project root. + +It must not casually read/write arbitrary paths. + +All path resolution should: + +* normalize paths +* reject path traversal +* reject writes outside project root +* prefer relative paths in artifacts + +--- + +## 2.3 One task at a time + +v1 runs one task at a time. + +No parallel task execution. + +No DAG executor yet. + +--- + +## 2.4 Declarative config first + +Use YAML for v1. + +Do not implement arbitrary Python config yet. + +The config should be expressive enough for: + +* agents +* stages +* commands +* retries +* artifact directory +* task file location +* scoped paths +* allowlisted commands + +--- + +## 2.5 Auditable artifacts + +Every run should create a durable artifact tree. + +Artifacts are core product behavior, not debug leftovers. + +--- + +# 3. Architecture + +## 3.1 Conceptual Components + +```text +NightShift CLI + | + v +Config Loader + | + v +Task Parser + | + v +Pipeline Runner + | + +--> Agent Executor + | + +--> Command Executor + | + +--> Artifact Store + | + +--> Context Manager + | + v +Run Summary +``` + +--- + +## 3.2 Suggested Module Layout + +Use this layout unless the existing repo already strongly implies another structure. 
+
+```text
+nightshift/
+  __init__.py
+  cli.py
+  config.py
+  tasks.py
+  pipeline.py
+  stages.py
+  agents.py
+  commands.py
+  artifacts.py
+  context.py
+  safety.py
+  reports.py
+  errors.py
+
+tests/
+  test_config.py
+  test_tasks.py
+  test_pipeline.py
+  test_safety.py
+  test_artifacts.py
+
+examples/
+  pipeline.yaml
+  tasks.md
+  agents/
+    planner.md
+    implementer.md
+    reviewer.md
+
+NIGHTSHIFT_CODEX.md
+README.md
+```
+
+If this project is implemented in Rust instead of Python, preserve the same conceptual boundaries.
+
+---
+
+# 4. Config Format
+
+## 4.1 Example `nightshift.yaml`
+
+```yaml
+project:
+  name: example-project
+  root: .
+  task_file: tasks.md
+  artifact_dir: .nightshift
+
+safety:
+  require_clean_worktree: false
+  scoped_paths:
+    - src/
+    - tests/
+  allowed_commands:
+    - cargo test
+    - cargo fmt --check
+    - cargo clippy -- -D warnings
+  forbidden_commands:
+    - rm -rf
+    - git push
+    - curl | bash
+
+agents:
+  planner:
+    backend: command
+    command: echo
+    system_prompt: examples/agents/planner.md
+
+  implementer:
+    backend: command
+    command: echo
+    system_prompt: examples/agents/implementer.md
+
+  reviewer:
+    backend: command
+    command: echo
+    system_prompt: examples/agents/reviewer.md
+
+pipeline:
+  max_task_retries: 3
+  stages:
+    - id: plan
+      type: agent
+      agent: planner
+      output: plan.md
+
+    - id: review_plan
+      type: agent_review
+      agent: reviewer
+      on_fail: plan
+      output: plan-review.md
+
+    - id: implement
+      type: agent
+      agent: implementer
+      output: implementation-log.md
+
+    - id: test
+      type: command
+      commands:
+        - cargo test
+      output: test-output.txt
+
+    - id: static
+      type: command
+      commands:
+        - cargo fmt --check
+        - cargo clippy -- -D warnings
+      output: static-output.txt
+
+    - id: review
+      type: agent_review
+      agent: reviewer
+      on_fail: implement
+      output: review.md
+
+    - id: summarize
+      type: summarize
+      output: final-notes.md
+```
+
+---
+
+# 5. Task File Format
+
+## 5.1 Input Task Format
+
+Tasks are markdown checklist items with acceptance criteria.
+
+Example:
+
+```markdown
+# Tasks
+
+- [ ] TASK-001: Add YAML config loading
+
+Description:
+Implement config loading for NightShift.
+
+Acceptance Criteria:
+- Loads `nightshift.yaml`
+- Validates required fields
+- Returns typed config object
+- Includes tests for valid and invalid config
+
+- [ ] TASK-002: Add artifact directory creation
+
+Description:
+Create per-run and per-task artifact directories.
+
+Acceptance Criteria:
+- Creates `.nightshift/runs//`
+- Creates task-specific folder
+- Writes task snapshot
+- Includes tests
+```
+
+## 5.2 Parser Requirements
+
+The parser should identify:
+
+* task id
+* task title
+* completion state
+* description
+* acceptance criteria
+* optional dependency notes
+
+For v1, parsing can be simple and documented.
+
+Do not try to support every markdown style.
+
+---
+
+# 6. Pipeline Model
+
+## 6.1 State Machine, Not DAG
+
+v1 should use a configurable state machine.
+
+Reason:
+
+* one task at a time
+* retry loops matter
+* easier to audit
+* easier to debug
+* easier MVP
+
+A stage returns a `StageResult`.
+
+Suggested shape:
+
+```python
+from dataclasses import dataclass
+from typing import Literal
+
+
+@dataclass
+class StageResult:
+    stage_id: str
+    status: Literal["pass", "fail", "retry", "escalate"]
+    reason: str
+    output_path: str | None = None
+    next_stage: str | None = None
+    context_update: str | None = None
+```
+
+Equivalent Rust structs are fine if using Rust.
+
+## 6.2 Retry Behavior
+
+Retry behavior should be deterministic.
+
+Rules:
+
+* retries are counted per task
+* max retries come from config
+* failed review stages can redirect to configured `on_fail`
+* after max retries, task is marked failed
+* failure is summarized in artifacts
+
+---
+
+# 7. Agent Model
+
+## 7.1 Agent Definition
+
+Agents have:
+
+* id
+* backend
+* command or model
+* system prompt file
+* role
+
+For v1, support a `command` backend first.
+
+This lets the user wrap:
+
+* Codex
+* Claude Code
+* Ollama scripts
+* local model scripts
+* fake test agents
+
+## 7.2 Agent Invocation
+
+The runner should construct a prompt/input bundle containing:
+
+* system prompt
+* task markdown
+* acceptance criteria
+* relevant project context
+* previous stage output
+* retry notes, if any
+* required output contract
+
+The agent should write output to the configured artifact path.
+
+Do not pass giant history blobs.
+
+---
+
+# 8. Context System
+
+## 8.1 Context Layers
+
+There are three context layers:
+
+```text
+project context
+  long-lived, compact, shared across tasks
+
+task context
+  specific to the current task
+
+retry context
+  compact notes from failed attempts
+```
+
+## 8.2 Project Context
+
+Stored at:
+
+```text
+.nightshift/project-context.md
+```
+
+Contains:
+
+* architecture notes
+* repo conventions
+* summaries from completed tasks
+* high-value durable facts
+
+## 8.3 Task Context
+
+Stored per task:
+
+```text
+.nightshift/runs//tasks//context.md
+```
+
+## 8.4 Context Compaction
+
+After each task, write:
+
+```text
+context-out.md
+```
+
+Then selectively bubble useful durable information into project context.
+
+Do not automatically dump everything into project context.
+
+---
+
+# 9. Artifact Layout
+
+Every run should create:
+
+```text
+.nightshift/
+  project-context.md
+  runs/
+    /
+      run-summary.md
+      config.snapshot.yaml
+      tasks/
+        TASK-001/
+          task.md
+          plan.md
+          plan-review.md
+          implementation-log.md
+          test-output.txt
+          static-output.txt
+          review.md
+          final-notes.md
+          diff.patch
+          context.md
+          context-out.md
+```
+
+Artifacts should be written even on failure.
+
+---
+
+# 10. Safety Rules
+
+## 10.1 Path Safety
+
+Implement helpers that:
+
+* resolve paths against project root
+* reject writes outside project root
+* reject `..` traversal that escapes root
+* prefer pathlib/path abstractions
+
+## 10.2 Command Safety
+
+For v1:
+
+* only run commands listed in `allowed_commands`
+* block commands containing known forbidden fragments
+* record all command output
+* record exit code
+* set timeouts when practical
+
+## 10.3 Git Safety
+
+v1 should support config option:
+
+```yaml
+require_clean_worktree: true | false
+```
+
+If true, abort when git working tree is dirty.
+
+Do not implement automatic branch creation in v1.
+
+Do not push.
+
+---
+
+# 11. CLI Commands
+
+Recommended initial CLI:
+
+```bash
+nightshift init
+nightshift validate
+nightshift run
+nightshift run --task TASK-001
+nightshift status
+```
+
+## 11.1 `nightshift init`
+
+Creates example files:
+
+* `nightshift.yaml`
+* `tasks.md`
+* `agents/planner.md`
+* `agents/implementer.md`
+* `agents/reviewer.md`
+
+## 11.2 `nightshift validate`
+
+Validates:
+
+* config file exists
+* task file exists
+* scoped paths are inside root
+* agents exist
+* prompt files exist
+* allowed commands are valid strings
+* pipeline references valid agents
+
+## 11.3 `nightshift run`
+
+Runs the next incomplete task.
+
+## 11.4 `nightshift run --task TASK-001`
+
+Runs a specific task.
+
+## 11.5 `nightshift status`
+
+Prints:
+
+* current config
+* task count
+* completed/incomplete tasks
+* latest run directory
+
+---
+
+# 12. Testing Strategy
+
+Write tests early.
+
+Minimum tests:
+
+* config loading happy path
+* config missing required fields
+* markdown task parsing
+* artifact directory creation
+* path traversal rejection
+* command allowlist behavior
+* forbidden command rejection
+* simple pipeline execution with fake agents
+* retry limit behavior
+
+Use fake agents for tests.
+
+Do not require real LLM calls in unit tests.
+
+---
+
+# 13. MVP Task Checklist
+
+## Phase 1: Skeleton
+
+* [ ] Create project package/module layout
+* [ ] Add CLI entry point
+* [ ] Add `nightshift init`
+* [ ] Generate example `nightshift.yaml`
+* [ ] Generate example `tasks.md`
+* [ ] Generate example agent prompt files
+
+Acceptance Criteria:
+
+* User can run init command
+* Expected files are created
+* Existing files are not overwritten without confirmation or force flag
+
+---
+
+## Phase 2: Config Loading
+
+* [ ] Implement YAML config loader
+* [ ] Define typed config objects
+* [ ] Validate required sections
+* [ ] Validate agent references
+* [ ] Validate pipeline stages
+* [ ] Add tests
+
+Acceptance Criteria:
+
+* Valid config loads
+* Invalid config fails with clear error
+* Pipeline stages cannot reference missing agents
+
+---
+
+## Phase 3: Safety Layer
+
+* [ ] Implement project root resolution
+* [ ] Implement scoped path validation
+* [ ] Implement safe artifact path creation
+* [ ] Implement command allowlist check
+* [ ] Implement forbidden command fragment check
+* [ ] Add tests for path traversal
+* [ ] Add tests for forbidden commands
+
+Acceptance Criteria:
+
+* Cannot write outside project root
+* Cannot execute commands outside allowlist
+* Dangerous command fragments are blocked
+
+---
+
+## Phase 4: Task Parser
+
+* [ ] Parse markdown task checklist
+* [ ] Extract task id
+* [ ] Extract title
+* [ ] Extract description
+* [ ] Extract acceptance criteria
+* [ ] Support selecting next incomplete task
+* [ ] Support selecting specific task id
+* [ ] Add tests
+
+Acceptance Criteria:
+
+* Parser handles documented task format
+* Parser returns useful errors for malformed tasks
+* Task selection works
+
+---
+
+## Phase 5: Artifact Store
+
+* [ ] Create `.nightshift/`
+* [ ] Create per-run directory
+* [ ] Create per-task directory
+* [ ] Write config snapshot
+* [ ] Write task snapshot
+* [ ] Write stage outputs
+* [ ] Write command outputs
+* [ ] Write final task notes
+* [ ] Add tests
+
+Acceptance Criteria:
+
+* Every run creates deterministic artifact structure
+* Artifacts are present even when stages fail
+
+---
+
+## Phase 6: Command Executor
+
+* [ ] Implement command stage execution
+* [ ] Capture stdout
+* [ ] Capture stderr
+* [ ] Capture exit code
+* [ ] Persist command output
+* [ ] Return structured stage result
+* [ ] Add tests with harmless commands
+
+Acceptance Criteria:
+
+* Passing command returns pass
+* Failing command returns fail
+* Output is written to artifact file
+
+---
+
+## Phase 7: Agent Executor
+
+* [ ] Implement `command` backend agent
+* [ ] Load system prompt file
+* [ ] Build prompt bundle
+* [ ] Pass prompt to command backend
+* [ ] Capture output
+* [ ] Persist output
+* [ ] Return structured stage result
+* [ ] Add fake-agent tests
+
+Acceptance Criteria:
+
+* Fake command agent can produce stage output
+* Prompt includes task and acceptance criteria
+* Agent output is stored in artifacts
+
+---
+
+## Phase 8: Pipeline Runner
+
+* [ ] Execute configured stages in order
+* [ ] Stop on unrecoverable failure
+* [ ] Support `on_fail` stage redirection
+* [ ] Track retry count
+* [ ] Enforce max task retries
+* [ ] Write per-stage summaries
+* [ ] Add tests
+
+Acceptance Criteria:
+
+* Happy path pipeline completes
+* Failed review can retry implementation
+* Retry limit is enforced
+* Final task status is recorded
+
+---
+
+## Phase 9: Context Manager
+
+* [ ] Create project context file if absent
+* [ ] Create task context file
+* [ ] Include project context in agent prompt bundle
+* [ ] Include prior stage notes in retry prompt
+* [ ] Write `context-out.md`
+* [ ] Add tests
+
+Acceptance Criteria:
+
+* Context files are created
+* Agent prompt receives compact context
+* Context output is persisted
+
+---
+
+## Phase 10: Reports
+
+* [ ] Generate task final report
+* [ ] Generate run summary
+* [ ] Include task status
+* [ ] Include retry count
+* [ ] Include modified files if available
+* [ ] Include test/static results
+* [ ] Include artifact paths
+* [ ] Add tests
+
+Acceptance Criteria:
+
+* User can inspect one summary after run
+* Summary explains what happened without reading every artifact
+
+---
+
+## Phase 11: README
+
+* [ ] Explain what NightShift is
+* [ ] Explain what it is not
+* [ ] Add quickstart
+* [ ] Add config example
+* [ ] Add task file example
+* [ ] Add safety model explanation
+* [ ] Add MVP status
+
+Acceptance Criteria:
+
+* A new user can understand and run the MVP
+* README emphasizes reviewable output, not blind autonomy
+
+---
+
+# 14. Implementation Guidance
+
+## 14.1 Prefer boring code
+
+This project should be reliable.
+
+Do not make clever abstractions before the simple pipeline works.
+
+## 14.2 Tests are part of the product
+
+This is an AI automation safety tool.
+
+Tests are credibility.
+
+## 14.3 Make errors helpful
+
+Bad:
+
+```text
+ValueError: invalid config
+```
+
+Good:
+
+```text
+Config error: pipeline stage 'review_plan' references unknown agent 'critic'.
+Defined agents: planner, implementer, reviewer.
+```
+
+## 14.4 Do not assume real LLMs in tests
+
+Use fake command agents.
+
+Real model integration can come later.
+
+## 14.5 Keep artifacts human-readable
+
+Prefer markdown, YAML, and plain text.
+
+---
+
+# 15. Suggested Agent Prompt Files
+
+## `agents/planner.md`
+
+```markdown
+You are the planning agent for NightShift.
+
+Your job is to create a conservative implementation plan for one coding task.
+
+Rules:
+- Do not write code.
+- Identify relevant files.
+- Preserve existing behavior.
+- Prefer small changes.
+- Include test strategy.
+- Include risks.
+
+Output:
+# Plan
+
+## Summary
+
+## Relevant Files
+
+## Steps
+
+## Test Strategy
+
+## Risks
+
+## Acceptance Criteria Mapping
+```
+
+## `agents/implementer.md`
+
+```markdown
+You are the implementation agent for NightShift.
+
+Your job is to implement the approved plan inside the scoped project directory.
+
+Rules:
+- Make the smallest correct change.
+- Do not edit files outside scope.
+- Do not skip tests intentionally.
+- Preserve existing style.
+- Write useful implementation notes.
+
+Output:
+# Implementation Notes
+
+## Changed Files
+
+## Summary
+
+## Tests Added or Updated
+
+## Risks
+
+## Follow-up Notes
+```
+
+## `agents/reviewer.md`
+
+```markdown
+You are the review agent for NightShift.
+
+Your job is to decide whether the current task should pass, retry implementation, retry planning, or fail.
+
+Priorities:
+1. Correctness
+2. Safety
+3. Acceptance criteria
+4. Maintainability
+5. Minimality
+
+Output exactly:
+
+status: pass | fail | retry | escalate
+reason:
+next_stage:
+context_update:
+```
+
+---
+
+# 16. Definition of Done for MVP
+
+NightShift MVP is done when:
+
+* `nightshift init` creates a usable starter project
+* `nightshift validate` catches bad config
+* `nightshift run` can process one markdown task
+* pipeline stages execute in order
+* fake command agents work
+* command stages run safely
+* artifacts are written
+* retry limits work
+* final report is generated
+* tests cover core safety and pipeline behavior
+
+---
+
+# 17. Future Features
+
+Do not implement these until MVP is stable:
+
+* DAG workflows
+* parallel tasks
+* Git branches per task
+* remote workers
+* cloud agent APIs
+* dashboard UI
+* prompt A/B testing
+* model cost telemetry
+* agent tournaments
+* constraint-language experiments
+* task dependency solver
+* self-improving prompt library
+
+---
+
+# 18. Final Instruction to Codex
+
+Build this incrementally.
+
+Start with the smallest vertical slice:
+
+```text
+init -> validate -> parse one task -> create artifacts -> run fake pipeline -> write summary
+```
+
+Then add safety, retries, command execution, and real agent wrappers.
+
+Do not build the cathedral before the generator turns on.
+
+The goal is boring, auditable leverage.
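Note on the safety step: the path rules in section 10.1 (resolve against the project root, reject `..` traversal that escapes it) could be sketched roughly as follows. This is only an illustrative sketch and not part of the diff. The names `resolve_inside_root` and `PathEscapeError` are hypothetical and do not appear in the spec. It assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Hypothetical error type: the candidate path resolves outside the project root."""


def resolve_inside_root(root: Path, candidate: str) -> Path:
    """Resolve `candidate` against `root`; raise if the result escapes `root`."""
    root_path = Path(root).resolve()
    # Joining with an absolute candidate discards root_path entirely, so
    # absolute paths outside the root are rejected by the same check below.
    resolved = (root_path / candidate).resolve()
    if not resolved.is_relative_to(root_path):
        raise PathEscapeError(
            f"{candidate} resolves outside project root {root_path}"
        )
    return resolved
```

A helper along these lines would accept `src/main.py` and reject `../outside.txt` or `/etc/passwd`. Because it resolves symlinks before comparing, a symlink inside the root that points outside it is rejected as well. Whether that is the desired policy for v1 is a design choice the spec leaves open.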