System Architecture & Technical Documentation

1. Overview

The system consists of three primary components:

  1. AI C2 Server: A central hub that routes commands between the local client and the web browser.
  2. Unified Web Harness (Tampermonkey Script): A JavaScript payload injected into AI chat websites to hijack the UI and use the AI as a free API.
  3. Voice Assistant PoC (VoiceAssistantPoC_Win): The client application that listens to the user's voice, communicates with the C2 server, and executes SCL commands on the local OS.

2. AI C2 Server (Command and Control)

The C2 Server is a WinForms application running a local HttpListener on port 8080. It acts as a message broker.

  • State Management (OperationCenter.cs): Maintains a dictionary of active instances (browser tabs). It tracks whether an instance is busy, its turn count, and its chat history.
  • Relay Endpoints (/api/relay/*): Used exclusively by the Tampermonkey script. The paths below are relative to /api/relay.
    • GET /command/{id}: The browser polls this endpoint to see if the user has queued a prompt.
    • POST /result/{id}: The browser posts the extracted markdown response from the AI here.
    • POST /state/{id}: The browser sends heartbeats to keep the instance alive in the C2 UI.
  • Admin Endpoints (/api/admin/*): Used by the Voice Assistant PoC to queue prompts (/inject), reset chats (/action/new_chat), and fetch active sessions.
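
The broker behaviour described above can be sketched as a small in-memory model. This is a minimal illustration, not the server's actual code: the real C2 server is a C# WinForms application, and the class shape, method names, and field names below are all assumptions chosen to mirror the endpoints listed in this section.

```javascript
// Hypothetical in-memory model of the broker state. Every name here is an
// assumption for illustration; the real server is written in C#.
class OperationCenter {
  constructor() {
    this.instances = new Map(); // instance id -> state record
  }

  // Look up an instance, creating its record on first contact.
  getOrCreate(id) {
    if (!this.instances.has(id)) {
      this.instances.set(id, {
        busy: false,   // true while the browser is working on a prompt
        turnCount: 0,  // completed prompt/response pairs
        history: [],   // { role, text } entries
        pending: [],   // prompts queued by the client, not yet picked up
        lastSeen: 0,   // timestamp of the most recent heartbeat
      });
    }
    return this.instances.get(id);
  }

  // Admin side (/inject): the client queues a prompt for a browser tab.
  inject(id, prompt) {
    this.getOrCreate(id).pending.push(prompt);
  }

  // Relay side (GET /command/{id}): the browser polls for queued work.
  pollCommand(id) {
    const inst = this.getOrCreate(id);
    if (inst.busy || inst.pending.length === 0) return null;
    const prompt = inst.pending.shift();
    inst.busy = true;
    inst.history.push({ role: "user", text: prompt });
    return prompt;
  }

  // Relay side (POST /result/{id}): the browser returns the AI's response.
  postResult(id, markdown) {
    const inst = this.getOrCreate(id);
    inst.history.push({ role: "assistant", text: markdown });
    inst.turnCount += 1;
    inst.busy = false;
  }

  // Relay side (POST /state/{id}): heartbeat that keeps the instance listed.
  heartbeat(id, now = Date.now()) {
    this.getOrCreate(id).lastSeen = now;
  }
}
```

In this sketch a second prompt queued for the same instance stays in `pending` until `postResult` clears the busy flag, which is one plausible reading of the "busy" tracking described above.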

3. Unified Web Harness (Tampermonkey)

The harness (GoogleAI_Search_Deepseek_ChatGPT_UnifiedHarness.js) is a DOM-manipulation script that removes the need for paid API keys by driving the web interfaces of major AI providers directly.

  • CORS Bypass: Uses GM_xmlhttpRequest to communicate with localhost:8080, bypassing standard browser CORS restrictions.
  • DOM Tactics: Employs different strategies to inject text into the AI's chat box. For Google, it uses standard value setters. For Gemini and ChatGPT, it uses contenteditable tactics (manipulating the Selection and Range APIs to simulate typing).
  • Kinetic Wait & Semantic Lock: Because AI responses stream in dynamically, the script uses a "Kinetic Wait" (a hard delay to allow the UI to register the submit click) followed by a "Semantic Lock". The Semantic Lock monitors the DOM for text growth. Once the text length remains stable for a required "streak" (e.g., 3 seconds), the lock releases, and the script extracts the final response.
  • Smart Extraction: Custom HTML-to-Markdown parsers strip away UI clutter (buttons, SVGs, hidden elements) and extract only the AI's actual response text.
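
The Kinetic Wait and Semantic Lock described above can be sketched roughly as follows. This is a hedged illustration rather than the harness's actual code: the function names, the default delays, and the idea of passing in a `readLength` callback (for example, one that returns the length of the response element's text) are all assumptions. With a 500 ms poll interval, a streak of 6 unchanged readings corresponds to the 3-second example given above.

```javascript
// Hypothetical sketch of the stability check; names and defaults are assumptions.
// Returns an observe(length) function that reports true once the same
// non-zero length has been seen `streakRequired` times in a row after the
// first reading at that length.
function createSemanticLock(streakRequired) {
  let lastLength = -1;
  let streak = 0;
  return function observe(length) {
    if (length > 0 && length === lastLength) {
      streak += 1; // text did not grow since the previous poll
    } else {
      streak = 0;  // text grew (or is still empty): start over
    }
    lastLength = length;
    return streak >= streakRequired;
  };
}

// Polling wrapper: hard delay first, then poll until the lock releases.
// readLength is a caller-supplied function returning the current text length.
async function waitForStableResponse(readLength, options = {}) {
  const {
    kineticWaitMs = 1000, // "Kinetic Wait": let the UI register the submit
    pollMs = 500,
    streakRequired = 6,   // 6 polls x 500 ms = 3 seconds of stability
    maxPolls = 600,       // give up eventually instead of looping forever
  } = options;
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  const observe = createSemanticLock(streakRequired);

  await sleep(kineticWaitMs);
  for (let i = 0; i < maxPolls; i += 1) {
    if (observe(readLength())) return true; // lock released
    await sleep(pollMs);
  }
  return false; // timed out while the text was still changing
}
```

Keeping the streak logic in a separate function makes it easy to exercise without a browser or real timers; the `maxPolls` cap is an added safeguard that the source does not mention.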

4. Voice Assistant PoC

The client application brings the system together, acting as the bridge between the user's voice, the AI, and the Windows OS.

  • Speech-to-Text (Vosk): Uses the Vosk offline speech recognition engine, with microphone audio captured through NAudio. It listens continuously for the wake word ("Computer"). Once the wake word is detected, it captures the phrase that follows.
  • Base64 Encoding: To prevent the AI from confusing user input with system instructions, the user's prompt is Base64-encoded before it is sent to the C2 server. The AI is instructed to decode it before processing.
  • SCL Execution (SclProcessor.cs): When the AI responds with an SCL command (e.g., ~cmd[dir]), the SclProcessor parses it using a custom character-stepping algorithm that respects nested brackets and escape characters. It then executes the command via System.Diagnostics.Process (cmd.exe) and sends the result back to the AI in the form ^cmd[0|output]. This exchange repeats until the AI is satisfied.
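
The Base64 wrapping and the character-stepping parse described above can be sketched as follows. This is an illustration under stated assumptions, not a port of SclProcessor.cs: the real client is C#, the SCL grammar is not specified in this document, and the sketch assumes a command has the shape ~name[argument], that a backslash escapes the next character, and that the leading 0 in ^cmd[0|output] is an exit code. `Buffer` is the Node.js API; all function names are hypothetical.

```javascript
// Hypothetical helpers; names and grammar details are assumptions.

// Base64-encode a prompt as UTF-8 (Node.js Buffer API).
function encodePrompt(prompt) {
  return Buffer.from(prompt, "utf8").toString("base64");
}

// Assumed grammar: ~name[argument], where the argument may contain balanced
// [ ] pairs and a backslash makes the following character literal.
function parseScl(text) {
  const commands = [];
  let i = 0;
  while (i < text.length) {
    if (text[i] !== "~") { i += 1; continue; }
    // Read the command name: letters, digits, underscore.
    let j = i + 1;
    while (j < text.length && /[A-Za-z0-9_]/.test(text[j])) j += 1;
    const name = text.slice(i + 1, j);
    if (name === "" || text[j] !== "[") { i += 1; continue; }
    // Step through the argument one character at a time.
    let depth = 1;
    let arg = "";
    let k = j + 1;
    while (k < text.length && depth > 0) {
      const ch = text[k];
      if (ch === "\\" && k + 1 < text.length) {
        arg += text[k + 1]; // escaped character is taken literally
        k += 2;
        continue;
      }
      if (ch === "[") depth += 1;
      if (ch === "]") {
        depth -= 1;
        if (depth === 0) break; // closing bracket of the command itself
      }
      arg += ch;
      k += 1;
    }
    if (depth !== 0) break; // unterminated command: stop parsing
    commands.push({ name, arg });
    i = k + 1;
  }
  return commands;
}

// Format a result line, assuming the first field is an exit code.
function formatResult(name, exitCode, output) {
  return `^${name}[${exitCode}|${output}]`;
}
```

For example, `parseScl("~cmd[dir]")` yields a single command with name `cmd` and argument `dir`, and `formatResult("cmd", 0, "output")` produces `^cmd[0|output]`. Actually running the argument through a shell is left out here, since executing model-generated commands is the part that most needs care in a real client.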