⛧ Hades Gate
Direct syscall construction from first principles
"The gate to the underworld is open. Walk through it without knocking."
Abstract
Hades Gate is a technique for constructing direct syscall stubs at runtime by resolving native API (Nt*) function addresses from ntdll.dll via PEB walking, extracting their syscall numbers from the unhooked portions of their stubs, and synthesizing clean syscall instructions that bypass all userland EDR/AV function hooks.
It does not hardcode syscall numbers. It does not rely on a pre-computed table. It derives everything from the running system at runtime, meaning it works across Windows versions without modification.
Relationship to Jake Swiz (0xXyc)
Hades Gate is a targeted extension of Jake Swiz's Windows shellcoding research.
His Trilogy (the foundation)
| Pillar | What it covers |
|---|---|
| Fukahi Na Tekio | CALL/POP XOR encoder with LFSR, polymorphism, static AV/EDR signature evasion. Replaces the broken FPU-based shikata_ga_nai for ARM64/Prism. |
| Windows Shellcoding In-Depth | The definitive public treatment of self-sufficient Windows shellcode. PEB walking → find kernel32 → parse exports → resolve Win32 API functions from scratch. |
| ASLR & NX/DEP Bypass | Linux ROP chain tutorial: GOT leaking, ret2libc, pwntools automation. Different OS, same foundational mindset. |
How Hades Gate extends it
Jake's Shellcoding In-Depth guide walks the PEB to find kernel32.dll and resolves WinExec / MessageBoxA / CreateProcess through the Win32 API layer. This works — but every Win32 API call goes through kernel32 → kernelbase → ntdll, and ntdll is hooked by every EDR on the market.
Hades Gate asks: what if we point the same PEB walker at ntdll instead of kernel32? Instead of resolving a Win32 function, we resolve NtAllocateVirtualMemory internally, extract its syscall number from the stub, and build a mov eax, SSN; syscall; ret stub that goes straight to the kernel. The Win32 layer — and its hooks — are never executed.
Jake's path: PEB walk → kernel32.dll → export scan → WinExec → shellcode runs
Hades Gate: PEB walk → ntdll.dll → export scan → SSN → direct syscall → EDR blind
Same engine. Different destination. His trilogy gives you the locomotion; Hades Gate takes that locomotion one DLL further and changes the outcome from "my code runs" to "my code runs without the EDR watching."
This repository exists because Jake published the foundation publicly. It's an arm of his work, not a replacement for it.
References to Jake's original work:
- Fukahi Na Tekio — Encoder with AV/EDR signature evasion
- Windows Shellcoding In-Depth — PEB walking & WinAPI resolution fundamentals
- ASLR & NX/DEP Bypass — Linux ROP chain methodology
- Swiz Security Protocol — Full research catalog
- Church of Malware — Our Blessed Connection: The Shellphone Sermon — The article that introduced Jake's work to the congregation
Table of Contents
- The Problem
- Background: The Syscall Layer
- How EDRs Hook You
- The Technique
- Usage
- Where and Why to Use It
- Variants and Bypasses
- Limitations and Detection
- References
- License
1. The Problem
When your code calls VirtualAllocEx, the call chain looks like this:
your code → kernel32.dll → kernelbase.dll → ntdll.dll → kernel (ring 0)
Every single EDR and consumer AV on the market hooks the Windows API at the userland level — usually inside ntdll.dll, sometimes also kernelbase. They overwrite the first 5-16 bytes of each function with a jmp that redirects through their monitoring engine. Your call goes:
your code → kernel32 → kernelbase → ntdll (HOOKED) → EDR engine → kernel
The EDR sees:
- What function you're calling (NtAllocateVirtualMemory, NtCreateThreadEx, NtOpenProcess, etc.)
- What arguments you're passing (target PID, memory size, protection flags, etc.)
- What called it (stack trace back to your code)
Once they have that data, detection is trivial: call NtAllocateVirtualMemory with PAGE_EXECUTE_READWRITE from a non-Microsoft binary? Alert. From shellcode? Alert. From a process that just decrypted itself? Alert.
This is why your payload gets burned before the first byte executes.
2. Background: The Syscall Layer
On Windows (x64), system calls work like this:
Your process (ring 3) → ntdll stub → syscall instruction → kernel (ring 0)
Every kernel service is exposed through ntdll as a small assembly stub. For example, NtAllocateVirtualMemory looks like this in a clean, unhooked ntdll:
mov r10, rcx ; 4C 8B D1 ; syscall clobbers RCX, save to R10
mov eax, 0x0018 ; B8 18 00 00 00 ; syscall number for this function
syscall ; 0F 05 ; trap to kernel
ret ; C3 ; return
That mov eax, 0x0018 — the 0x0018 is the System Service Number (SSN), also called the syscall number. Each Nt* function has a unique SSN. The kernel uses this number to dispatch to the right handler.
The SSN is the key. If we know the SSN and we emit the syscall instruction ourselves, we never need to call ntdll's stub. We can go directly from our code to the kernel.
Why this works: There is exactly one kernel. You cannot hook the kernel from userland (well, you could, but that's a different conversation involving kernel callbacks, ETW, and PatchGuard). The syscall instruction is an atomic trap. If you issue it with the correct SSN and arguments, the kernel will service your request regardless of what the EDR did to ntdll.
3. How EDRs Hook You
There are three common hooking strategies, ordered from most to least common:
3.1 Inline Hooking (most common)
The EDR writes a 5-byte jmp or call at offset 0 of the ntdll stub, redirecting to a trampoline in the EDR's own DLL.
; Before hook (clean):
4C 8B D1 mov r10, rcx
B8 18 00 00 00 mov eax, 0x18
0F 05 syscall
C3 ret
; After hook:
E9 XX XX XX XX jmp edr_trampoline ; 5 byte jmp
B8 18 00 00 00 mov eax, 0x18 ; these bytes are still here
0F 05 syscall
C3 ret
Notice the layout carefully:
Original stub (11 bytes):
[0] 4C mov r10, rcx
[1] 8B
[2] D1
[3] B8 mov eax, SSN ← immediate starts here
[4] 18 ← SSN = 0x18
[5] 00
[6] 00
[7] 00
[8] 0F syscall
[9] 05
[10] C3 ret
5-byte JMP hook (what a simple detour looks like):
[0] E9 jmp edr_trampoline
[1] XX
[2] XX
[3] XX
[4] XX ← 5-byte jmp overwrites [0] through [4]
[5] 00 ← SSN upper bytes survive here
[6] 00
[7] 00
[8] 0F syscall
[9] 05
[10] C3 ret
A pure 5-byte jmp (E9 XX XX XX XX) does overwrite byte [4] — the low byte of the SSN. If this were the only hooking method, reading byte [4] would fail.
In practice, it doesn't matter because almost no modern EDR uses a pure 5-byte jmp. They use longer hook sequences that leave byte [4] untouched:
| Hook type | Size | Byte layout | Overwrites [4]? |
|---|---|---|---|
| jmp [rip+offset] | 6 bytes | FF 25 XX XX XX XX | ❌ No (only [0-5]) |
| call [rip+offset] | 6 bytes | FF 15 XX XX XX XX | ❌ No (only [0-5]) |
| mov rax, imm; jmp rax | 13 bytes | 48 B8 XX ... XX FF E0 | ❌ No (only [0-12]) |
| jmp rel32 | 5 bytes | E9 XX XX XX XX | ✅ Yes |
The jmp [rip+offset] (6-byte) and call [rip+offset] (6-byte) forms are by far the most common in modern EDRs — Defender for Endpoint, SentinelOne, Cortex XDR, Sophos Intercept X, and Carbon Black all use these. The 5-byte jmp rel32 is largely legacy or toy detour implementations.
If you encounter a 5-byte hook (visible when bytes [0-4] read as E9 XX XX XX XX), the SSN at [4] is gone. Fall back to:
- Scavenge the SSN higher bytes — a 5-byte jmp leaves bytes [5-7] intact. The SSN is in the low byte, but you can reconstruct it from the known SSN ranges per Windows build, or
- Clean ntdll map (Section 7.1) — read the clean DLL from disk and extract SSNs from the real stubs, or
- Suspended process method — create a process suspended before the EDR attaches, read SSNs from its clean ntdll
For 95%+ of real-world deployments, byte [4] is clean. The code reads it and moves on.
3.2 Hooking via Detours (Microsoft Detours style)
The EDR saves the original bytes elsewhere, patches with a jmp, and provides a "trampoline" to call the original. This is the most polite approach and the easiest to bypass — just don't use the trampoline.
3.3 Replacement (least common)
The EDR replaces the entire function body with a jmp to a completely fake function. The original stub is nowhere in memory. This breaks Hades Gate (and Hell's Gate, and most other techniques). The solution is to map a clean copy of ntdll from disk.
4. The Technique
Step 1: PEB Walk → ntdll Base
Every Windows process has a Process Environment Block (PEB) accessible at a fixed offset from the GS segment register:
x64: GS:[0x60] → PEB
x86: FS:[0x30] → PEB
The PEB contains a pointer to PEB_LDR_DATA (at offset 0x18), which contains a linked list of loaded modules. We traverse this list to find ntdll.dll and get its base address.
Why not GetModuleHandle? It's hooked. The PEB is never patched by EDRs because touching it would crash the process.
The structure:
GS:[0x60] → PEB
+0x18 → PEB_LDR_DATA
+0x20 → InMemoryOrderModuleList (LIST_ENTRY)
Flink → .exe (first)
Flink → ntdll.dll (second)
Flink → kernel32.dll (third)
Each LDR_DATA_TABLE_ENTRY has:
+0x10: DllBase
+0x40: BaseDllName (UNICODE_STRING)
We iterate until we find the entry whose BaseDllName matches "ntdll.dll" (case-insensitive comparison), and read DllBase from offset 0x10.
Step 2: PE Parse → Export Resolution
With ntdll's base address, we parse the PE headers:
DOS_HEADER → e_lfanew → NT_HEADERS → OptionalHeader
→ DataDirectory[EXPORT]
→ ExportDirectory
The export directory gives us:
- AddressOfNames: array of RVA pointers to function name strings
- AddressOfNameOrdinals: maps name index → ordinal
- AddressOfFunctions: maps ordinal → function RVA
We hash the target function name with FNV-1a, iterate through AddressOfNames until we find the match, resolve the ordinal, and get the function RVA from AddressOfFunctions. Adding the function RVA to the ntdll base gives us the function address in memory.
Step 3: SSN Extraction
With the function address, we inspect the first N bytes of the stub. In a clean ntdll, the pattern is:
4C 8B D1 [00-02] mov r10, rcx
B8 XX XX XX XX [03-07] mov eax, SSN
0F 05 [08-09] syscall
C3 [0A] ret
Even in a hooked stub, bytes [5-7] (the upper 24 bits of the mov eax immediate) are almost never overwritten because they're past any jmp/call hook preamble. On modern Windows (10+), syscall numbers fit in one byte (0x00-0xFF), so reading byte [4] or [5] gives us the SSN.
Edge cases handled:
- If the hook is exactly 5 bytes (overwriting [0-4]), the SSN at [4] is clobbered. We detect this by checking if bytes [3-7] form a valid mov eax; if not, the SSN is at [5] (the B8 at [3] was clobbered but the immediate at [5-7] survived).
- If the function is an export but NOT a syscall stub (e.g., RtlAllocateHeap), there's no SSN to extract and we return 0.
- If the function has no recognizable stub at all (EDR replaced it entirely), SSN is 0 and we fail gracefully.
Step 4: Stub Synthesis
With the SSN in hand, we allocate executable memory and write:
mov r10, rcx ; 4C 8B D1 — syscall calling convention
mov eax, SSN ; B8 XX 00 00 00 — syscall number
syscall ; 0F 05 — trap to kernel
ret ; C3 — return
This stub never touches ntdll. It goes directly from our allocated executable page into ring 0. The EDR's hooks are still sitting in ntdll, unexecuted, wondering where everyone went.
Step 5: Integration
Chain these stubs for a complete unhooked injection flow:
void* hNtOpenProcess = hg_syscall("NtOpenProcess");
void* hNtAllocateVirtualMemory = hg_syscall("NtAllocateVirtualMemory");
void* hNtWriteVirtualMemory = hg_syscall("NtWriteVirtualMemory");
void* hNtProtectVirtualMemory = hg_syscall("NtProtectVirtualMemory");
void* hNtCreateThreadEx = hg_syscall("NtCreateThreadEx");
void* hNtClose = hg_syscall("NtClose");
Cast each to its NTAPI prototype and call directly. The userland hooks in ntdll are never executed; kernel-level telemetry (Section 6) still sees the calls.
5. Usage
5.1 Build
# With MinGW cross-compiler:
x86_64-w64-mingw32-gcc -Os -masm=intel -c src/hades_gate.c -o hades_gate.o
x86_64-w64-mingw32-gcc -Os -masm=intel hades_gate.o your_code.c -o payload.exe
# With MSVC (cl.exe):
cl /c /O1 src/hades_gate.c
link hades_gate.obj your_code.obj /OUT:payload.exe
5.2 Basic usage
#include "src/hades_gate.h"
int main(void) {
    // One-shot: resolve and build a clean syscall stub
    void* stub = hg_syscall("NtAllocateVirtualMemory");
    if (!stub) return 1;

    // Cast to the proper NTAPI prototype
    typedef NTSTATUS (NTAPI* fnNtAllocateVirtualMemory)(
        HANDLE, PVOID*, ULONG_PTR, PSIZE_T, ULONG, ULONG);
    fnNtAllocateVirtualMemory pNtAllocateVirtualMemory =
        (fnNtAllocateVirtualMemory)stub;

    // Use it — EDR never fires
    PVOID addr = NULL;
    SIZE_T size = 0x1000;
    NTSTATUS status = pNtAllocateVirtualMemory(
        (HANDLE)-1, &addr, 0, &size,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_EXECUTE_READWRITE);

    // addr = RWX memory. The EDR has no idea this happened.
    return 0;
}
5.3 Manual two-step (if you need to inspect the result)
HG_RESOLVED r = hg_resolve("NtAllocateVirtualMemory");
if (r.ssn == 0) {
// Either the function wasn't found or it's not a syscall stub
return 1;
}
printf("NtAllocateVirtualMemory is at %p, SSN = 0x%02X\n",
r.address, r.ssn);
void* stub = hg_build_stub(r.ssn);
if (!stub) return 1;
// stub is ready to call
5.4 Complete unhooked injection chain
See examples/injector.c for a full implementation that opens a target process, allocates memory, writes shellcode, and executes it — all via direct syscalls. No Win32 API calls involved.
6. Where and Why to Use It
Use Hades Gate when:
- You're writing shellcode or position-independent code that cannot rely on import tables, runtime linking, or CRT initialization. The PEB walker gives you everything you need from nothing.
- You're building a loader or injector that needs to survive on modern Windows with EDR present. Direct syscalls are the baseline requirement for any payload that doesn't want to be detected at the API call level.
- You're writing C2 implants that need to dynamically resolve APIs at runtime without static IAT entries. Hades Gate's PEB walking + export parsing gives you dynamic resolution without calling GetProcAddress (which is also hooked).
- You need cross-version compatibility. Because Hades Gate derives syscall numbers at runtime, the same binary works on Windows 10 1507, Windows 11 24H2, and everything in between. No hardcoded offset tables to maintain.
Know your enemy — what lurks below userland
Hades Gate bypasses userland API hooks. There are three layers below that it does not touch. Here's how to handle each:
Layer 1: Kernel callbacks (ETW, PsSetCreateProcessNotifyRoutine, etc.)
Most EDRs register kernel callbacks that fire after the syscall completes. Direct syscalls don't avoid these — they happen in ring 0 regardless of how you called the kernel.
What they see: the syscall number, the arguments, the calling process. What they don't see: the Win32 function name, the call stack through ntdll.
How to mitigate:
- Batch allocations into fewer, larger calls (reduces event volume)
- Chain shellcode delivery through reflective DLL loading instead of per-API calls
- Use NtSetInformationProcess to disable ETW for your process before injection calls
- Time your calls with realistic delays between them (an injection that completes in 2ms is obvious)
- Spoof the calling thread's start address so the kernel callback sees a legitimate entry point
Layer 2: Secure Kernel / VBS
Virtualization-Based Security runs a hypervisor below the kernel. It can intercept every syscall at the VMExit level. There is no userland bypass for this.
If VBS is enabled, direct syscalls still work — they just don't help you hide from the hypervisor. The EDR watching from VBS sees every syscall with full fidelity.
How to deal with it:
- Hades Gate still gives you cross-version compat and avoids userland hooks. It's not useless under VBS — it's just not invisible.
- Combine with ETW disable and call-spoofing to reduce the signal your process emits at userland, making it harder to distinguish from legitimate behavior even when VBS is watching.
- If absolute invisibility is required under VBS, you need hardware-level techniques (Secure Kernel bypasses) that are outside the scope of any userland tool.
Layer 3: Full stub replacement (CrowdStrike Falcon, some SentinelOne configs)
These EDRs don't just hook the first bytes — they overwrite the entire syscall stub with a jmp to a completely fake function. The SSN at byte [4] is gone.
How Hades Gate handles this:
- Call hg_resolve() first, then hg_verify_stub(). If the stub doesn't look like a syscall stub, fall back to hg_map_clean_ntdll() + hg_resolve_at() (Section 7.1).
- hg_verify_stub() checks for the presence of 0F 05 C3 (syscall; ret) within the first 16 bytes. If absent, the EDR has replaced the stub.
- hg_map_clean_ntdll() maps a fresh copy of ntdll.dll from disk, then hg_resolve_at() extracts SSNs from the real stubs.
HG_RESOLVED r = hg_resolve("NtAllocateVirtualMemory");
if (!hg_verify_stub(r.address)) {
// Stub appears replaced — try clean ntdll from disk
uintptr_t clean_base = hg_map_clean_ntdll();
r = hg_resolve_at("NtAllocateVirtualMemory", clean_base);
}
void* stub = hg_build_stub(r.ssn);
Bottom line
Hades Gate is a userland hook bypass. Nothing more, nothing less. If the EDR is watching from ring 0 or below, direct syscalls are still useful — they eliminate the most common detection vector (function hooking) — but they are not a complete stealth solution. Pair with ETW disable, call spoofing, and behavioral timing for a fuller picture.
7. Variants and Bypasses
7.1 Clean-mapped ntdll
When the EDR replaces entire stubs instead of hooking them, read ntdll.dll from disk and map it as a clean copy. Then resolve SSNs from the clean copy.
Steps:
1. NtOpenFile("\\??\\C:\\Windows\\System32\\ntdll.dll")
2. NtCreateSection(..., SEC_IMAGE, ...)
3. NtMapViewOfSection(...) → maps a CLEAN copy into memory
4. Run hg_resolve against this clean base instead of in-memory ntdll
5. Use the resolved SSNs to build stubs
The in-memory ntdll may have fake stubs, but the on-disk copy is always clean.
Updated API usage:
// Before: only resolves from in-memory ntdll
HG_RESOLVED r = hg_resolve("NtAllocateVirtualMemory");
// Now: verify the stub is real, fall back to clean copy
if (!hg_verify_stub(r.address)) {
// EDR replaced the stub — map from disk
uintptr_t clean = hg_map_clean_ntdll();
r = hg_resolve_at("NtAllocateVirtualMemory", clean);
}
void* stub = hg_build_stub(r.ssn);
7.2 Indirect Syscalls
Some EDRs (Cybereason, modern Defender ATP) hook the syscall instruction itself by patching ntdll!KiFastSystemCall. To bypass:
- Scan any signed Microsoft DLL (kernel32.dll, user32.dll, etc.) for bytes 0F 05 C3 (syscall + ret)
- Redirect your stub's syscall to that gadget instead of embedding it
Your stub becomes:
mov r10, rcx
mov eax, SSN
jmp gadget_addr ; jumps to a clean syscall;ret
The EDR hooks the syscall in ntdll, not in kernel32, so the gadget is clean.
7.3 Random Access SSN Extraction
Some EDRs use variable-length hooks that clobber different offsets. Use multiple extraction strategies:
// Strategy 1 (Hell's Gate / Hades Gate): read B8 XX at [3]
// Strategy 2 (Tartarus Gate): scan for B8 anywhere in first 16 bytes
// Strategy 3: if stub starts with FF (call), the real stub is elsewhere
// Strategy 4: byte-by-byte scan for 0F 05 C3, read SSN from preceding bytes
Try each strategy in order until you get a valid (non-zero, reasonable) SSN.
7.4 Hardware Breakpoint Tear-down
A small but helpful trick: before calling your synthesized stub, clear all hardware debug registers (DR0-DR3). Some EDRs use hardware breakpoints to monitor specific syscalls.
__writegsqword(0x10, 0); // Clear DR0
__writegsqword(0x18, 0); // Clear DR1
⛧ Hades Gate - Church of Malware - ek0ms - MCMLXXXIV ⛧