LifeRPG Ops Runbook
This runbook summarizes common operational signals and actions.
Key metrics and dashboards
- HTTP: request rate (http_requests_total), p95 latency (http_request_duration_seconds), in-progress gauge.
- Jobs: jobs_processed_total{status}.
- Integrations: integration_sync_total{provider,result}, integration_sync_by_integration_total{integration_id,result}.
- Backpressure: sync_enqueue_skips_total{reason}, sync_queue_depth{provider}, sync_inflight{provider}.
- Logs: structured JSON logs for requests and jobs; ship via Promtail to Loki.
Grafana dashboard: ops/grafana-dashboard.json (import into Grafana and configure PROM_DS and LOKI_DS).
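The metrics above can be turned into starting-point PromQL queries. The sketch below is a hypothetical helper, not part of LifeRPG: the metric and label names come from this runbook, while the function name, the 5m window, the aggregations, and the `_bucket` suffix (which assumes the latency metric is a Prometheus histogram) are assumptions to adjust against the real dashboard.

```python
# Hypothetical helper: maps each runbook signal to a PromQL expression.
# Metric and label names come from the runbook; the window, aggregations,
# and the _bucket suffix (histogram assumption) are illustrative guesses.


def promql_queries(window: str = "5m") -> dict[str, str]:
    """Return one candidate PromQL expression per signal."""
    return {
        "request_rate": f"sum(rate(http_requests_total[{window}]))",
        "p95_latency": (
            "histogram_quantile(0.95, sum by (le) "
            f"(rate(http_request_duration_seconds_bucket[{window}])))"
        ),
        "jobs_by_status": f"sum by (status) (rate(jobs_processed_total[{window}]))",
        "sync_by_provider_result": (
            f"sum by (provider, result) (rate(integration_sync_total[{window}]))"
        ),
        "enqueue_skips_by_reason": (
            f"sum by (reason) (rate(sync_enqueue_skips_total[{window}]))"
        ),
        # Gauges are read directly, without rate().
        "queue_depth": "sum by (provider) (sync_queue_depth)",
        "inflight": "sum by (provider) (sync_inflight)",
    }


if __name__ == "__main__":
    for name, expr in promql_queries().items():
        print(f"{name}: {expr}")
```

Each expression can be pasted into Grafana's Explore view to check that the metric exists before wiring it into a panel.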
Common symptoms
- High enqueue skips
  - Symptom: sync_enqueue_skips_total rate > 0.2 for >10m.
  - Likely causes: provider concurrency cap, duplicate enqueues (guard), or downstream slowness.
  - Actions:
    - Check sync_inflight{provider} vs cap (env SYNC_MAX_CONCURRENCY_PER_PROVIDER).
    - Temporarily raise the cap if safe, or reduce scheduler cadence (sync_interval_seconds).
    - Inspect job logs in Loki for adapter errors or rate limits.
- Queue depth rising
  - Symptom: increase(sync_queue_depth[15m]) > 50.
  - Actions:
    - Scale workers or increase per-provider cap cautiously.
    - Pause non-critical providers by increasing intervals.
    - Check external API health/rate limits.
- Elevated request latency
  - Symptom: p95 > 500ms sustained.
  - Actions:
    - Inspect recent deployments, DB CPU/IO, and external dependencies.
    - Enable sampling/profiling; consider caching.
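The three symptom thresholds above can be checked mechanically. The following is a minimal sketch, assuming the readings have already been fetched from Prometheus; `Signals` and `triage` are hypothetical names, and because the runbook does not quantify "sustained" for latency, the sketch compares only the current p95 value.

```python
from dataclasses import dataclass

# Thresholds are copied from the symptom list; everything else is illustrative.
SKIP_RATE_LIMIT = 0.2             # sync_enqueue_skips_total rate
SKIP_RATE_MINUTES = 10            # must hold for more than 10 minutes
QUEUE_DEPTH_INCREASE_LIMIT = 50   # increase(sync_queue_depth[15m])
P95_LATENCY_LIMIT_SECONDS = 0.5   # 500 ms


@dataclass
class Signals:
    """Hypothetical container for readings pulled from Prometheus."""
    skip_rate: float                 # current enqueue-skip rate
    skip_rate_minutes_above: float   # minutes the rate has stayed above the limit
    queue_depth_increase_15m: float  # growth in queue depth over 15 minutes
    p95_latency_seconds: float       # current p95 request latency


def triage(s: Signals) -> list[str]:
    """Return the runbook symptoms that the given readings match."""
    matched = []
    if s.skip_rate > SKIP_RATE_LIMIT and s.skip_rate_minutes_above > SKIP_RATE_MINUTES:
        matched.append("High enqueue skips")
    if s.queue_depth_increase_15m > QUEUE_DEPTH_INCREASE_LIMIT:
        matched.append("Queue depth rising")
    if s.p95_latency_seconds > P95_LATENCY_LIMIT_SECONDS:
        matched.append("Elevated request latency")
    return matched


if __name__ == "__main__":
    print(triage(Signals(0.3, 12, 60, 0.2)))
```

Keeping the thresholds as named constants makes it easier to keep the sketch aligned with the alert rules if they change.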
Configuration
- Concurrency cap per provider: SYNC_MAX_CONCURRENCY_PER_PROVIDER (default 4).
- Default scheduler interval: DEFAULT_SYNC_INTERVAL_SECONDS (default 900s). Per-integration override: integration.config.sync_interval_seconds.
- Close mode: INTEGRATION_CLOSE_MODE (archive default; delete opt-in).
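To illustrate how these settings and their defaults fit together, here is a minimal sketch of a loader. The environment variable names, defaults, and the `sync_interval_seconds` override key come from this section; the function names, the returned dictionary shape, and the choice to reject unknown close modes are assumptions, not the application's actual code.

```python
import os
from typing import Mapping, Optional

VALID_CLOSE_MODES = ("archive", "delete")


def load_sync_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the three settings, falling back to the documented defaults."""
    close_mode = env.get("INTEGRATION_CLOSE_MODE", "archive")
    if close_mode not in VALID_CLOSE_MODES:
        # Assumption: an unknown mode is treated as a configuration error.
        raise ValueError(f"unsupported INTEGRATION_CLOSE_MODE: {close_mode!r}")
    return {
        "max_concurrency_per_provider": int(
            env.get("SYNC_MAX_CONCURRENCY_PER_PROVIDER", "4")
        ),
        "default_sync_interval_seconds": int(
            env.get("DEFAULT_SYNC_INTERVAL_SECONDS", "900")
        ),
        "close_mode": close_mode,
    }


def effective_interval(settings: dict, integration_config: Optional[dict] = None) -> int:
    """Prefer the per-integration sync_interval_seconds override when present."""
    override = (integration_config or {}).get("sync_interval_seconds")
    if override is not None:
        return int(override)
    return settings["default_sync_interval_seconds"]


if __name__ == "__main__":
    settings = load_sync_settings()
    print(settings, effective_interval(settings, {"sync_interval_seconds": 1800}))
```

Passing the environment in as a mapping keeps the loader easy to exercise with a plain dictionary during an incident review.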
On-call checklist
- Confirm alerts and correlate with Grafana panels.
- Review recent logs for event=enqueued|start|success|fail in Loki.
- Take one mitigating action at a time; document in the incident log.
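When logs have been exported from Loki as JSON lines, the lifecycle events in the checklist can be tallied offline. This is a rough sketch: the `event` field and its four values come from the checklist, while the function name and the assumption that each log line is one JSON object are illustrative.

```python
import json
from collections import Counter
from typing import Iterable

# Event names taken from the checklist.
LIFECYCLE_EVENTS = {"enqueued", "start", "success", "fail"}


def count_lifecycle_events(lines: Iterable[str]) -> Counter:
    """Count job lifecycle events in an iterable of JSON log lines."""
    counts: Counter = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        event = record.get("event")
        if isinstance(event, str) and event in LIFECYCLE_EVENTS:
            counts[event] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"event": "enqueued"}',
        '{"event": "start"}',
        '{"event": "fail"}',
        "plain text line",
    ]
    print(count_lifecycle_events(sample))
```

A large gap between enqueued and success counts, or a rising fail count, is a reasonable cue to look at the matching Grafana panel next.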
Playbooks
- Raise provider cap:
  - Set SYNC_MAX_CONCURRENCY_PER_PROVIDER and restart worker.
- Slow the scheduler:
  - PATCH integration config {"sync_interval_seconds": <value>} for noisy integrations.
- Toggle close policy:
  - POST /api/v1/admin/settings { "integration_close_mode": "archive|delete" }.
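The two HTTP playbooks above can be sketched with the standard library. Only the POST path and the two JSON bodies come from this runbook; `BASE_URL` is a placeholder, the integration URL for the PATCH call must be supplied by the caller because the runbook does not state it, and authentication headers are omitted. The functions build requests without sending them.

```python
import json
import urllib.request

# Placeholder host; replace with the real API base URL.
BASE_URL = "http://localhost:8000"


def _json_request(method: str, url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request with the given method."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def slow_scheduler_request(integration_url: str, interval_seconds: int) -> urllib.request.Request:
    """PATCH an integration's config; integration_url is caller-supplied."""
    return _json_request(
        "PATCH", integration_url, {"sync_interval_seconds": interval_seconds}
    )


def close_mode_request(mode: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """POST the close policy to /api/v1/admin/settings."""
    if mode not in ("archive", "delete"):
        raise ValueError(f"unsupported close mode: {mode!r}")
    return _json_request(
        "POST", f"{base_url}/api/v1/admin/settings", {"integration_close_mode": mode}
    )


if __name__ == "__main__":
    req = close_mode_request("archive")
    print(req.get_method(), req.full_url, req.data)
    # To send it: urllib.request.urlopen(req), after adding any required auth.
```

Building the request separately from sending it leaves room to review the exact method, URL, and body before applying a change during an incident.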