LifeRPG_v2.0/modern/ops/RUNBOOK.md
Copilot 90750ee8df
Strip emoji from docs, fix XSS/hashing vulnerabilities, remediate all failing CI checks (#1)
2026-03-14 08:59:37 -04:00


LifeRPG Ops Runbook

This runbook summarizes common operational signals and actions.

Key metrics and dashboards

  • HTTP: request rate (http_requests_total), p95 latency (http_request_duration_seconds), in-progress gauge.
  • Jobs: jobs_processed_total{status}.
  • Integrations: integration_sync_total{provider,result}, integration_sync_by_integration_total{integration_id,result}.
  • Backpressure: sync_enqueue_skips_total{reason}, sync_queue_depth{provider}, sync_inflight{provider}.
  • Logs: structured JSON logs for requests and jobs; ship via Promtail to Loki.

Grafana dashboard: ops/grafana-dashboard.json (import into Grafana and configure PROM_DS and LOKI_DS).
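When a dashboard panel is missing or needs checking by hand, the metrics above can be queried directly in Prometheus. The following is a minimal sketch that builds the query strings. The function names and the 5m default window are illustrative, and the p95 query assumes http_request_duration_seconds is exported as a Prometheus histogram (with a _bucket series), which the runbook does not state.

```python
# Hypothetical helpers: none of these function names come from the LifeRPG codebase.
# They only assemble PromQL strings for the metric names listed in this runbook.


def request_rate_query(window: str = "5m") -> str:
    """Total HTTP request rate over the given window."""
    return f"sum(rate(http_requests_total[{window}]))"


def p95_latency_query(window: str = "5m") -> str:
    """p95 latency; ASSUMES the duration metric is a histogram with _bucket series."""
    return (
        "histogram_quantile(0.95, "
        f"sum by (le) (rate(http_request_duration_seconds_bucket[{window}])))"
    )


def enqueue_skip_rate_query(window: str = "5m") -> str:
    """Enqueue skip rate, broken down by the 'reason' label."""
    return f"sum by (reason) (rate(sync_enqueue_skips_total[{window}]))"


if __name__ == "__main__":
    for query in (request_rate_query(), p95_latency_query(), enqueue_skip_rate_query()):
        print(query)
```

The printed strings can be pasted into the Prometheus expression browser or a Grafana Explore panel.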

Common symptoms

  1. High enqueue skips
     • Symptom: sync_enqueue_skips_total rate > 0.2 for >10m.
     • Likely causes: provider concurrency cap, duplicate enqueues (guard), or downstream slowness.
     • Actions:
       • Check sync_inflight{provider} vs cap (env SYNC_MAX_CONCURRENCY_PER_PROVIDER).
       • Temporarily raise the cap if safe, or reduce scheduler cadence (sync_interval_seconds).
       • Inspect job logs in Loki for adapter errors or rate limits.
  2. Queue depth rising
     • Symptom: increase(sync_queue_depth[15m]) > 50.
     • Actions:
       • Scale workers or increase per-provider cap cautiously.
       • Pause non-critical providers by increasing intervals.
       • Check external API health/rate limits.
  3. Elevated request latency
     • Symptom: p95 > 500ms sustained.
     • Actions:
       • Inspect recent deployments, DB CPU/IO, and external dependencies.
       • Enable sampling/profiling; consider caching.
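The three symptom thresholds can be sketched as a small triage function. The Snapshot type, its field names, and the finding labels are hypothetical. The thresholds (0.2 for more than 10 minutes, 50 over 15 minutes, 500 ms) are taken from the list above. How long "sustained" is for latency is not specified, so the sketch only checks the current p95 value.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical container for values read off the dashboard."""
    enqueue_skip_rate: float          # rate of sync_enqueue_skips_total
    skip_rate_minutes: float          # minutes the skip rate has stayed at this level
    queue_depth_increase_15m: float   # increase(sync_queue_depth[15m])
    p95_latency_ms: float             # current p95 request latency in milliseconds


def triage(s: Snapshot) -> list[str]:
    """Return the labels of the runbook symptoms that the snapshot matches."""
    findings = []
    if s.enqueue_skip_rate > 0.2 and s.skip_rate_minutes > 10:
        findings.append("high_enqueue_skips")
    if s.queue_depth_increase_15m > 50:
        findings.append("queue_depth_rising")
    if s.p95_latency_ms > 500:
        findings.append("elevated_request_latency")
    return findings


if __name__ == "__main__":
    print(triage(Snapshot(0.35, 12, 80, 220)))
```

In practice these conditions would more likely live in alerting rules; the function is only meant to make the thresholds explicit.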

Configuration

  • Concurrency cap per provider: SYNC_MAX_CONCURRENCY_PER_PROVIDER (default 4).
  • Default scheduler interval: DEFAULT_SYNC_INTERVAL_SECONDS (default 900s). Per-integration override: integration.config.sync_interval_seconds.
  • Close mode: INTEGRATION_CLOSE_MODE (default archive; delete is opt-in).
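The way these settings combine can be sketched as follows. The environment variable names and defaults come from the list above. The function names, the dictionary keys, and the rule that a per-integration sync_interval_seconds simply replaces the default are assumptions for illustration, not the service's actual loader.

```python
import os

VALID_CLOSE_MODES = ("archive", "delete")


def load_settings(env=None) -> dict:
    """Read the three runbook settings, applying the documented defaults."""
    env = os.environ if env is None else env
    close_mode = env.get("INTEGRATION_CLOSE_MODE", "archive")
    if close_mode not in VALID_CLOSE_MODES:
        raise ValueError(f"unsupported INTEGRATION_CLOSE_MODE: {close_mode!r}")
    return {
        "max_concurrency_per_provider": int(env.get("SYNC_MAX_CONCURRENCY_PER_PROVIDER", "4")),
        "default_sync_interval_seconds": int(env.get("DEFAULT_SYNC_INTERVAL_SECONDS", "900")),
        "integration_close_mode": close_mode,
    }


def effective_interval(settings: dict, integration_config: dict) -> int:
    """ASSUMPTION: a per-integration override, when present, replaces the default."""
    override = integration_config.get("sync_interval_seconds")
    if override is None:
        return settings["default_sync_interval_seconds"]
    return int(override)


if __name__ == "__main__":
    settings = load_settings({})
    print(settings)
    print(effective_interval(settings, {"sync_interval_seconds": 3600}))
```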

On-call checklist

  • Confirm alerts and correlate with Grafana panels.
  • Review recent logs for event=enqueued|start|success|fail in Loki.
  • Take one mitigating action at a time; document in the incident log.
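For the log review step, a quick tally of job events from exported log lines can help show whether failures dominate. This is a minimal sketch that assumes each line is a JSON object with an "event" field holding one of the four values named above. The exact log schema is not given in this runbook, so treat the field name as an assumption.

```python
import json
from collections import Counter

# Event names taken from the checklist above.
JOB_EVENTS = {"enqueued", "start", "success", "fail"}


def count_events(lines) -> Counter:
    """Count job lifecycle events in an iterable of JSON log lines."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        event = record.get("event")  # ASSUMED field name
        if event in JOB_EVENTS:
            counts[event] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"event": "enqueued", "provider": "example"}',
        '{"event": "start", "provider": "example"}',
        '{"event": "fail", "provider": "example"}',
        "not json",
    ]
    print(count_events(sample))
```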

Playbooks

  • Raise provider cap:
    • Set SYNC_MAX_CONCURRENCY_PER_PROVIDER and restart worker.
  • Slow the scheduler:
    • PATCH integration config {"sync_interval_seconds": <value>} for noisy integrations.
  • Toggle close policy:
    • POST /api/v1/admin/settings { "integration_close_mode": "archive|delete" }.
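The request payloads for the last two playbooks can be sketched like this. The POST path and both JSON bodies are taken from the playbooks above. The base URL is a placeholder, authentication is omitted, and the PATCH endpoint path is not given in this runbook, so only its body is built. The sketch constructs the request without sending it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # placeholder; replace with the real API host


def interval_patch_body(seconds: int) -> bytes:
    """JSON body for the 'slow the scheduler' PATCH (endpoint path not documented here)."""
    if seconds <= 0:
        raise ValueError("sync_interval_seconds must be positive")
    return json.dumps({"sync_interval_seconds": seconds}).encode("utf-8")


def close_mode_request(mode: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST that toggles the close policy."""
    if mode not in ("archive", "delete"):
        raise ValueError("mode must be 'archive' or 'delete'")
    body = json.dumps({"integration_close_mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/admin/settings",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = close_mode_request("archive")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    print(interval_patch_body(1800).decode("utf-8"))
```

Passing the built request to urllib.request.urlopen would send it; any required admin credentials would need to be added as headers first.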