# NIGHTSHIFT_CODEX.md

You are Codex working on **NightShift**, a local-first AI coding pipeline runner.

This file is the implementation-driving context document. Treat it as the project brief, architectural guide, and task checklist.

---

# 0. Project Identity

## Name

**NightShift**

## Tagline

Auditable local-first AI coding pipelines.

## Core Thesis

NightShift is not an autonomous coding god.

NightShift is a deterministic pipeline runner that lets unreliable AI agents perform bounded coding work inside scoped, auditable, test-driven workflows.

The user should be able to run NightShift overnight and wake up to:

* a reviewable repository state
* task artifacts
* plans
* logs
* diffs
* test output
* review notes
* a final report

## Priority Order

Optimize in this order:

1. Cheapness
2. Correctness
3. Auditability
4. Speed

This means:

* Prefer local models first.
* Keep context compact.
* Avoid token waste.
* Make failure explicit.
* Always produce artifacts.
* Do not optimize for cleverness before trust.

---

# 1. Product Summary

NightShift runs long-running AI-assisted coding pipelines against a scoped project directory.

A user provides:

* a repository
* a markdown task file
* a declarative pipeline config
* agent definitions
* allowed test/static commands

NightShift processes one task at a time:

```text
select task
  -> plan
  -> review plan
  -> implement
  -> run tests
  -> run static checks
  -> review result
  -> retry or complete
  -> write summary
```

The output is not automatically shipped.

The output is a reviewable work package.

---

# 2. Non-Negotiable Design Constraints

## 2.1 Local-first

The first implementation should assume local execution.

Primary target backend:

* local command-driven agent execution

Future-compatible backends:

* Ollama
* Claude Code
* Codex CLI
* OpenAI API
* Anthropic API

Do not overbuild backend support in v1.

Build a clean interface first.

---

## 2.2 Scoped directory access

NightShift must only operate inside a configured project root.

It must not read or write paths outside that root.

All path resolution should:

* normalize paths
* reject path traversal
* reject writes outside project root
* prefer relative paths in artifacts

---

## 2.3 One task at a time

v1 runs one task at a time.

No parallel task execution.

No DAG executor yet.

---

## 2.4 Declarative config first

Use YAML for v1.

Do not implement arbitrary Python config yet.

The config should be expressive enough for:

* agents
* stages
* commands
* retries
* artifact directory
* task file location
* scoped paths
* allowlisted commands

---

## 2.5 Auditable artifacts

Every run should create a durable artifact tree.

Artifacts are core product behavior, not debug leftovers.

---

# 3. Architecture

## 3.1 Conceptual Components

```text
NightShift CLI
  |
  v
Config Loader
  |
  v
Task Parser
  |
  v
Pipeline Runner
  |
  +--> Agent Executor
  |
  +--> Command Executor
  |
  +--> Artifact Store
  |
  +--> Context Manager
  |
  v
Run Summary
```

---

## 3.2 Suggested Module Layout

Use this layout unless the existing repo already strongly implies another structure.

```text
nightshift/
  __init__.py
  cli.py
  config.py
  tasks.py
  pipeline.py
  stages.py
  agents.py
  commands.py
  artifacts.py
  context.py
  safety.py
  reports.py
  errors.py

tests/
  test_config.py
  test_tasks.py
  test_pipeline.py
  test_safety.py
  test_artifacts.py

examples/
  nightshift.yaml
  tasks.md
  agents/
    planner.md
    implementer.md
    reviewer.md

NIGHTSHIFT_CODEX.md
README.md
```

If this project is implemented in Rust instead of Python, preserve the same conceptual boundaries.

---

# 4. Config Format

## 4.1 Example `nightshift.yaml`

```yaml
project:
  name: example-project
  root: .
  task_file: tasks.md
  artifact_dir: .nightshift

safety:
  require_clean_worktree: false
  scoped_paths:
    - src/
    - tests/
  allowed_commands:
    - cargo test
    - cargo fmt --check
    - cargo clippy -- -D warnings
  forbidden_commands:
    - rm -rf
    - git push
    - curl | bash

agents:
  planner:
    backend: command
    command: echo
    system_prompt: examples/agents/planner.md

  implementer:
    backend: command
    command: echo
    system_prompt: examples/agents/implementer.md

  reviewer:
    backend: command
    command: echo
    system_prompt: examples/agents/reviewer.md

pipeline:
  max_task_retries: 3
  stages:
    - id: plan
      type: agent
      agent: planner
      output: plan.md

    - id: review_plan
      type: agent_review
      agent: reviewer
      on_fail: plan
      output: plan-review.md

    - id: implement
      type: agent
      agent: implementer
      output: implementation-log.md

    - id: test
      type: command
      commands:
        - cargo test
      output: test-output.txt

    - id: static
      type: command
      commands:
        - cargo fmt --check
        - cargo clippy -- -D warnings
      output: static-output.txt

    - id: review
      type: agent_review
      agent: reviewer
      on_fail: implement
      output: review.md

    - id: summarize
      type: summarize
      output: final-notes.md
```

---

# 5. Task File Format

## 5.1 Input Task Format

Tasks are markdown checklist items with acceptance criteria.

Example:

```markdown
# Tasks

- [ ] TASK-001: Add YAML config loading

Description:
Implement config loading for NightShift.

Acceptance Criteria:
- Loads `nightshift.yaml`
- Validates required fields
- Returns typed config object
- Includes tests for valid and invalid config

- [ ] TASK-002: Add artifact directory creation

Description:
Create per-run and per-task artifact directories.

Acceptance Criteria:
- Creates `.nightshift/runs/<timestamp>/`
- Creates task-specific folder
- Writes task snapshot
- Includes tests
```

## 5.2 Parser Requirements

The parser should identify:

* task id
* task title
* completion state
* description
* acceptance criteria
* optional dependency notes

For v1, the parser can be simple, as long as the supported format is documented.

Do not try to support every markdown style.
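
A minimal sketch of a parser for the format in 5.1 follows. The names `Task`, `parse_tasks`, and `next_incomplete` are hypothetical, as is the assumption that task ids look like `TASK-001` (uppercase letters, a hyphen, digits). Optional dependency notes are not handled here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Matches "- [ ] TASK-001: Title" or "- [x] TASK-001: Title" (assumed id shape).
TASK_LINE = re.compile(r"^- \[(?P<done>[ xX])\] (?P<id>[A-Z]+-\d+): (?P<title>.+)$")


@dataclass
class Task:
    task_id: str
    title: str
    done: bool
    description: str = ""
    acceptance_criteria: list[str] = field(default_factory=list)


def parse_tasks(markdown: str) -> list[Task]:
    """Parse the documented checklist format; other markdown styles are ignored."""
    tasks: list[Task] = []
    section = None  # "description", "criteria", or None
    for raw in markdown.splitlines():
        line = raw.strip()
        match = TASK_LINE.match(line)
        if match:
            # A new checklist item starts a new task and resets the section.
            tasks.append(Task(match["id"], match["title"].strip(), match["done"] != " "))
            section = None
        elif not tasks or not line:
            continue  # skip blank lines and anything before the first task
        elif line == "Description:":
            section = "description"
        elif line == "Acceptance Criteria:":
            section = "criteria"
        elif section == "description":
            current = tasks[-1]
            current.description = (current.description + " " + line).strip()
        elif section == "criteria" and line.startswith("- "):
            tasks[-1].acceptance_criteria.append(line[2:])
    return tasks


def next_incomplete(tasks: list[Task]) -> Task | None:
    """Return the first task whose checkbox is not ticked, or None."""
    return next((t for t in tasks if not t.done), None)
```

Keeping the parser line-oriented makes malformed input easy to report by line, which fits the "simple and documented" goal.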

---

# 6. Pipeline Model

## 6.1 State Machine, Not DAG

v1 should use a configurable state machine.

Reason:

* one task at a time
* retry loops matter
* easier to audit
* easier to debug
* simpler to build for an MVP

A stage returns a `StageResult`.

Suggested shape:

```python
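from dataclasses import dataclass
from typing import Literal


@dataclass
class StageResult:
    stage_id: str  # id of the stage that produced this result
    status: Literal["pass", "fail", "retry", "escalate"]
    reason: str  # short human-readable explanation
    output_path: str | None = None  # artifact written by the stage, if any
    next_stage: str | None = None  # optional redirect target
    context_update: str | None = None  # compact note for later stages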
```

Equivalent Rust structs are fine if using Rust.

## 6.2 Retry Behavior

Retry behavior should be deterministic.

Rules:

* retries are counted per task
* max retries come from config
* failed review stages can redirect to configured `on_fail`
* after max retries, task is marked failed
* failure is summarized in artifacts
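
The rules above can be sketched as a small loop. This is an illustration under assumptions, not the required design: `run_task` and the tuple-based stage shape are hypothetical, handlers return only `"pass"` or `"fail"`, and `max_task_retries` is read as the number of redirects allowed after the first attempt.

```python
from __future__ import annotations

from typing import Callable

# A stage handler returns "pass" or "fail"; richer results are omitted here.
StageFn = Callable[[], str]


def run_task(
    stages: list[tuple[str, StageFn, str | None]],
    max_task_retries: int,
) -> tuple[str, int, list[str]]:
    """Run stages in order; return (final_status, retries_used, visited_stage_ids).

    Each stage is (stage_id, handler, on_fail_stage_id_or_None).
    """
    index_of = {stage_id: i for i, (stage_id, _, _) in enumerate(stages)}
    retries = 0
    visited: list[str] = []
    i = 0
    while i < len(stages):
        stage_id, handler, on_fail = stages[i]
        visited.append(stage_id)
        if handler() == "pass":
            i += 1
            continue
        # A failure without an on_fail target is unrecoverable.
        if on_fail is None:
            return "failed", retries, visited
        # Retries are counted per task, not per stage.
        if retries >= max_task_retries:
            return "failed", retries, visited
        retries += 1
        i = index_of[on_fail]  # jump back to the configured stage
    return "completed", retries, visited
```

Because the loop has a single retry counter and a single jump rule, the visited-stage list alone is enough to reconstruct what happened in a run.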

---

# 7. Agent Model

## 7.1 Agent Definition

Agents have:

* id
* backend
* command or model
* system prompt file
* role

For v1, support a `command` backend first.

This lets the user wrap:

* Codex
* Claude Code
* Ollama scripts
* local model scripts
* fake test agents

## 7.2 Agent Invocation

The runner should construct a prompt/input bundle containing:

* system prompt
* task markdown
* acceptance criteria
* relevant project context
* previous stage output
* retry notes, if any
* required output contract

The agent should write output to the configured artifact path.

Do not pass giant history blobs.
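
One way to assemble such a bundle is sketched below. The function name `build_prompt_bundle`, the section titles, and the `max_section_chars` cap are assumptions made for illustration; truncating from the head is only one possible compaction policy.

```python
def build_prompt_bundle(
    system_prompt: str,
    task_markdown: str,
    acceptance_criteria: list[str],
    project_context: str = "",
    previous_output: str = "",
    retry_notes: str = "",
    output_contract: str = "",
    max_section_chars: int = 4000,
) -> str:
    """Assemble labeled sections, skipping empty ones and capping long ones."""
    criteria = "\n".join(f"- {item}" for item in acceptance_criteria)
    sections = [
        ("System Prompt", system_prompt),
        ("Task", task_markdown),
        ("Acceptance Criteria", criteria),
        ("Project Context", project_context),
        ("Previous Stage Output", previous_output),
        ("Retry Notes", retry_notes),
        ("Output Contract", output_contract),
    ]
    parts: list[str] = []
    for title, body in sections:
        body = body.strip()
        if not body:
            continue  # omit empty sections, e.g. retry notes on a first attempt
        if len(body) > max_section_chars:
            # Hypothetical cap so no single section turns into a history blob.
            body = body[:max_section_chars] + "\n[truncated]"
        parts.append(f"## {title}\n\n{body}")
    return "\n\n".join(parts)
```

Writing the assembled bundle to the task's artifact directory before invoking the agent would keep the exact input auditable.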

---

# 8. Context System

## 8.1 Context Layers

There are three context layers:

```text
project context
  long-lived, compact, shared across tasks

task context
  specific to the current task

retry context
  compact notes from failed attempts
```

## 8.2 Project Context

Stored at:

```text
.nightshift/project-context.md
```

Contains:

* architecture notes
* repo conventions
* summaries from completed tasks
* high-value durable facts

## 8.3 Task Context

Stored per task:

```text
.nightshift/runs/<run-id>/tasks/<task-id>/context.md
```

## 8.4 Context Compaction

After each task, write:

```text
context-out.md
```

Then selectively bubble useful durable information into project context.

Do not automatically dump everything into project context.

---

# 9. Artifact Layout

Every run should create:

```text
.nightshift/
  project-context.md
  runs/
    <run-id>/
      run-summary.md
      config.snapshot.yaml
      tasks/
        TASK-001/
          task.md
          plan.md
          plan-review.md
          implementation-log.md
          test-output.txt
          static-output.txt
          review.md
          final-notes.md
          diff.patch
          context.md
          context-out.md
```

Artifacts should be written even on failure.
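
A minimal sketch of creating this tree with `pathlib` follows. The helper names and the UTC timestamp format used for the run id are assumptions; the brief only specifies the directory shape.

```python
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import Path


def create_task_artifact_dir(
    project_root: Path, task_id: str, run_id: str | None = None
) -> Path:
    """Create .nightshift/runs/<run-id>/tasks/<task-id>/ and return it."""
    if run_id is None:
        # Assumed run id format: a sortable UTC timestamp.
        run_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    task_dir = project_root / ".nightshift" / "runs" / run_id / "tasks" / task_id
    task_dir.mkdir(parents=True, exist_ok=True)
    return task_dir


def write_artifact(task_dir: Path, name: str, content: str) -> Path:
    """Write one text artifact (e.g. task.md) into the task directory."""
    path = task_dir / name
    path.write_text(content, encoding="utf-8")
    return path
```

Creating the directory and the `task.md` snapshot before any stage runs is one way to guarantee that a failed run still leaves artifacts behind.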

---

# 10. Safety Rules

## 10.1 Path Safety

Implement helpers that:

* resolve paths against project root
* reject writes outside project root
* reject `..` traversal that escapes root
* prefer pathlib/path abstractions
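
A small helper along these lines could cover the rules above. `PathSafetyError` and `resolve_inside_root` are hypothetical names, and `Path.is_relative_to` assumes Python 3.9 or newer.

```python
from pathlib import Path


class PathSafetyError(ValueError):
    """Raised when a path would escape the project root."""


def resolve_inside_root(project_root: Path, candidate: str) -> Path:
    """Resolve `candidate` against the root and reject anything outside it."""
    root = project_root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below runs against the real target. An absolute candidate replaces
    # the root when joined and is rejected unless it points inside the root.
    resolved = (root / candidate).resolve()
    if not resolved.is_relative_to(root):
        raise PathSafetyError(
            f"Path '{candidate}' resolves outside project root '{root}'."
        )
    return resolved
```

Note that `src/../tests/x.py` is accepted because it stays inside the root; only traversal that actually escapes is rejected.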

## 10.2 Command Safety

For v1:

* only run commands listed in `allowed_commands`
* block commands containing known forbidden fragments
* record all command output
* record exit code
* set timeouts when practical
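
These rules might be sketched as follows. `check_command` and `run_checked` are hypothetical names, the 600-second default timeout is an arbitrary assumption, and exact string matching against `allowed_commands` is only one possible interpretation of the allowlist.

```python
import shlex
import subprocess


def check_command(command: str, allowed: list[str], forbidden_fragments: list[str]) -> None:
    """Raise PermissionError unless the command is allowlisted and clean."""
    normalized = " ".join(command.split())  # collapse repeated whitespace
    for fragment in forbidden_fragments:
        if fragment in normalized:
            raise PermissionError(
                f"Command '{command}' contains forbidden fragment '{fragment}'."
            )
    if normalized not in allowed:
        raise PermissionError(f"Command '{command}' is not in allowed_commands.")


def run_checked(
    command: str,
    allowed: list[str],
    forbidden_fragments: list[str],
    timeout_seconds: float = 600.0,
) -> tuple[int, str]:
    """Run an allowlisted command without a shell; return (exit_code, output)."""
    check_command(command, allowed, forbidden_fragments)
    completed = subprocess.run(
        shlex.split(command),  # no shell=True, so pipes and globs are inert
        capture_output=True,
        text=True,
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired on overrun
    )
    return completed.returncode, completed.stdout + completed.stderr
```

A plain substring check only catches fragments that appear verbatim, so a fragment such as `curl | bash` would not match `curl <url> | bash`. The allowlist is the stronger guard.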

## 10.3 Git Safety

v1 should support config option:

```yaml
require_clean_worktree: true  # or false
```

If true, abort when git working tree is dirty.
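
A hedged sketch of that check, using `git status --porcelain`, which prints nothing when the tree is clean. The helper names are hypothetical, and treating untracked files as dirty is an assumption.

```python
import subprocess
from pathlib import Path


def is_porcelain_clean(porcelain_output: str) -> bool:
    """Return True when `git status --porcelain` reported no changes."""
    return porcelain_output.strip() == ""


def ensure_clean_worktree(project_root: Path) -> None:
    """Abort with a clear error when the working tree has changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=project_root,
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit (e.g. not a git repo) raises CalledProcessError
    )
    if not is_porcelain_clean(result.stdout):
        raise RuntimeError(
            "Git working tree is dirty. Commit or stash changes, "
            "or set require_clean_worktree: false."
        )
```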

Do not implement automatic branch creation in v1.

Do not push.

---

# 11. CLI Commands

Recommended initial CLI:

```bash
nightshift init
nightshift validate
nightshift run
nightshift run --task TASK-001
nightshift status
```
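
If the implementation is in Python, the command surface above could be wired up with `argparse` roughly as follows. This is a sketch: `build_parser` is a hypothetical name and the help strings are illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing init, validate, run, and status."""
    parser = argparse.ArgumentParser(
        prog="nightshift",
        description="Auditable local-first AI coding pipelines.",
    )
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("init", help="create example config, task, and prompt files")
    sub.add_parser("validate", help="check config, task file, and agent references")
    run = sub.add_parser("run", help="run the next incomplete task")
    run.add_argument(
        "--task",
        metavar="TASK_ID",
        default=None,
        help="run a specific task instead of the next incomplete one",
    )
    sub.add_parser("status", help="print config, task counts, and latest run")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.command)
```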

## 11.1 `nightshift init`

Creates example files:

* `nightshift.yaml`
* `tasks.md`
* `agents/planner.md`
* `agents/implementer.md`
* `agents/reviewer.md`

## 11.2 `nightshift validate`

Validates:

* config file exists
* task file exists
* scoped paths are inside root
* agents exist
* prompt files exist
* allowed commands are non-empty strings
* pipeline references valid agents

## 11.3 `nightshift run`

Runs the next incomplete task.

## 11.4 `nightshift run --task TASK-001`

Runs a specific task.

## 11.5 `nightshift status`

Prints:

* current config
* task count
* completed/incomplete tasks
* latest run directory

---

# 12. Testing Strategy

Write tests early.

Minimum tests:

* config loading happy path
* config missing required fields
* markdown task parsing
* artifact directory creation
* path traversal rejection
* command allowlist behavior
* forbidden command rejection
* simple pipeline execution with fake agents
* retry limit behavior

Use fake agents for tests.

Do not require real LLM calls in unit tests.

---

# 13. MVP Task Checklist

## Phase 1: Skeleton

* [ ] Create project package/module layout
* [ ] Add CLI entry point
* [ ] Add `nightshift init`
* [ ] Generate example `nightshift.yaml`
* [ ] Generate example `tasks.md`
* [ ] Generate example agent prompt files

Acceptance Criteria:

* User can run init command
* Expected files are created
* Existing files are not overwritten without confirmation or force flag

---

## Phase 2: Config Loading

* [ ] Implement YAML config loader
* [ ] Define typed config objects
* [ ] Validate required sections
* [ ] Validate agent references
* [ ] Validate pipeline stages
* [ ] Add tests

Acceptance Criteria:

* Valid config loads
* Invalid config fails with clear error
* Pipeline stages cannot reference missing agents

---

## Phase 3: Safety Layer

* [ ] Implement project root resolution
* [ ] Implement scoped path validation
* [ ] Implement safe artifact path creation
* [ ] Implement command allowlist check
* [ ] Implement forbidden command fragment check
* [ ] Add tests for path traversal
* [ ] Add tests for forbidden commands

Acceptance Criteria:

* Cannot write outside project root
* Cannot execute commands outside allowlist
* Dangerous command fragments are blocked

---

## Phase 4: Task Parser

* [ ] Parse markdown task checklist
* [ ] Extract task id
* [ ] Extract title
* [ ] Extract description
* [ ] Extract acceptance criteria
* [ ] Support selecting next incomplete task
* [ ] Support selecting specific task id
* [ ] Add tests

Acceptance Criteria:

* Parser handles documented task format
* Parser returns useful errors for malformed tasks
* Task selection works

---

## Phase 5: Artifact Store

* [ ] Create `.nightshift/`
* [ ] Create per-run directory
* [ ] Create per-task directory
* [ ] Write config snapshot
* [ ] Write task snapshot
* [ ] Write stage outputs
* [ ] Write command outputs
* [ ] Write final task notes
* [ ] Add tests

Acceptance Criteria:

* Every run creates a deterministic artifact structure
* Artifacts are present even when stages fail

---

## Phase 6: Command Executor

* [ ] Implement command stage execution
* [ ] Capture stdout
* [ ] Capture stderr
* [ ] Capture exit code
* [ ] Persist command output
* [ ] Return structured stage result
* [ ] Add tests with harmless commands

Acceptance Criteria:

* Passing command returns pass
* Failing command returns fail
* Output is written to artifact file

---

## Phase 7: Agent Executor

* [ ] Implement `command` backend agent
* [ ] Load system prompt file
* [ ] Build prompt bundle
* [ ] Pass prompt to command backend
* [ ] Capture output
* [ ] Persist output
* [ ] Return structured stage result
* [ ] Add fake-agent tests

Acceptance Criteria:

* Fake command agent can produce stage output
* Prompt includes task and acceptance criteria
* Agent output is stored in artifacts

---

## Phase 8: Pipeline Runner

* [ ] Execute configured stages in order
* [ ] Stop on unrecoverable failure
* [ ] Support `on_fail` stage redirection
* [ ] Track retry count
* [ ] Enforce max task retries
* [ ] Write per-stage summaries
* [ ] Add tests

Acceptance Criteria:

* Happy path pipeline completes
* Failed review can retry implementation
* Retry limit is enforced
* Final task status is recorded

---

## Phase 9: Context Manager

* [ ] Create project context file if absent
* [ ] Create task context file
* [ ] Include project context in agent prompt bundle
* [ ] Include prior stage notes in retry prompt
* [ ] Write `context-out.md`
* [ ] Add tests

Acceptance Criteria:

* Context files are created
* Agent prompt receives compact context
* Context output is persisted

---

## Phase 10: Reports

* [ ] Generate task final report
* [ ] Generate run summary
* [ ] Include task status
* [ ] Include retry count
* [ ] Include modified files if available
* [ ] Include test/static results
* [ ] Include artifact paths
* [ ] Add tests

Acceptance Criteria:

* User can inspect one summary after run
* Summary explains what happened without reading every artifact

---

## Phase 11: README

* [ ] Explain what NightShift is
* [ ] Explain what it is not
* [ ] Add quickstart
* [ ] Add config example
* [ ] Add task file example
* [ ] Add safety model explanation
* [ ] Add MVP status

Acceptance Criteria:

* A new user can understand and run the MVP
* README emphasizes reviewable output, not blind autonomy

---

# 14. Implementation Guidance

## 14.1 Prefer boring code

This project should be reliable.

Do not make clever abstractions before the simple pipeline works.

## 14.2 Tests are part of the product

This is an AI automation safety tool.

Tests are credibility.

## 14.3 Make errors helpful

Bad:

```text
ValueError: invalid config
```

Good:

```text
Config error: pipeline stage 'review_plan' references unknown agent 'critic'.
Defined agents: planner, implementer, reviewer.
```
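
A validation helper that produces the "good" message might look like this. It is a sketch that assumes the YAML has already been loaded into plain dictionaries; `ConfigError` and `validate_agent_references` are hypothetical names.

```python
class ConfigError(Exception):
    """Raised for invalid configuration, with a message the user can act on."""


def validate_agent_references(config: dict) -> None:
    """Check that every stage with an `agent` key names a defined agent."""
    agents = config.get("agents", {})
    for stage in config.get("pipeline", {}).get("stages", []):
        agent = stage.get("agent")
        if agent is not None and agent not in agents:
            # Name the stage, the bad reference, and the valid choices.
            defined = ", ".join(agents) or "(none)"
            raise ConfigError(
                f"Config error: pipeline stage '{stage.get('id')}' references "
                f"unknown agent '{agent}'.\nDefined agents: {defined}."
            )
```

Listing the defined agents in the message lets the user fix a typo without opening the config first.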

## 14.4 Do not assume real LLMs in tests

Use fake command agents.

Real model integration can come later.

## 14.5 Keep artifacts human-readable

Prefer markdown, YAML, and plain text.

---

# 15. Suggested Agent Prompt Files

## `agents/planner.md`

```markdown
You are the planning agent for NightShift.

Your job is to create a conservative implementation plan for one coding task.

Rules:
- Do not write code.
- Identify relevant files.
- Preserve existing behavior.
- Prefer small changes.
- Include test strategy.
- Include risks.

Output:
# Plan

## Summary

## Relevant Files

## Steps

## Test Strategy

## Risks

## Acceptance Criteria Mapping
```

## `agents/implementer.md`

```markdown
You are the implementation agent for NightShift.

Your job is to implement the approved plan inside the scoped project directory.

Rules:
- Make the smallest correct change.
- Do not edit files outside scope.
- Do not skip tests intentionally.
- Preserve existing style.
- Write useful implementation notes.

Output:
# Implementation Notes

## Changed Files

## Summary

## Tests Added or Updated

## Risks

## Follow-up Notes
```

## `agents/reviewer.md`

```markdown
You are the review agent for NightShift.

Your job is to decide whether the current task should pass, retry (implementation or planning), fail, or escalate.

Priorities:
1. Correctness
2. Safety
3. Acceptance criteria
4. Maintainability
5. Minimality

Output exactly:

status: pass | fail | retry | escalate
reason: <short explanation>
next_stage: <optional stage id>
context_update: <compact useful note>
```
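
The runner needs to turn that reviewer output into a structured result. A minimal sketch follows; `ReviewVerdict` and `parse_review` are hypothetical names, and treating unparseable output as a failure is an assumption chosen to keep failure explicit.

```python
from __future__ import annotations

from dataclasses import dataclass

VALID_STATUSES = {"pass", "fail", "retry", "escalate"}
KNOWN_KEYS = {"status", "reason", "next_stage", "context_update"}


@dataclass
class ReviewVerdict:
    status: str
    reason: str
    next_stage: str | None = None
    context_update: str | None = None


def parse_review(text: str) -> ReviewVerdict:
    """Parse `key: value` lines from reviewer output; first occurrence wins."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in KNOWN_KEYS:
            fields.setdefault(key, value.strip())
    status = fields.get("status", "")
    if status not in VALID_STATUSES:
        # Assumed policy: malformed reviewer output never counts as a pass.
        return ReviewVerdict("fail", f"Unparseable review status: {status!r}")
    return ReviewVerdict(
        status=status,
        reason=fields.get("reason", ""),
        next_stage=fields.get("next_stage") or None,
        context_update=fields.get("context_update") or None,
    )
```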

---

# 16. Definition of Done for MVP

NightShift MVP is done when:

* `nightshift init` creates a usable starter project
* `nightshift validate` catches bad config
* `nightshift run` can process one markdown task
* pipeline stages execute in order
* fake command agents work
* command stages run safely
* artifacts are written
* retry limits work
* final report is generated
* tests cover core safety and pipeline behavior

---

# 17. Future Features

Do not implement these until MVP is stable:

* DAG workflows
* parallel tasks
* Git branches per task
* remote workers
* cloud agent APIs
* dashboard UI
* prompt A/B testing
* model cost telemetry
* agent tournaments
* constraint-language experiments
* task dependency solver
* self-improving prompt library

---

# 18. Final Instruction to Codex

Build this incrementally.

Start with the smallest vertical slice:

```text
init -> validate -> parse one task -> create artifacts -> run fake pipeline -> write summary
```

Then add safety, retries, command execution, and real agent wrappers.

Do not build the cathedral before the generator turns on.

The goal is boring, auditable leverage.