In a previous post we documented how we used GitHub Copilot to solve a stubborn Objective-C runtime registration problem inside Patrick Wardle’s ReflectiveLoader project to enable loading of ObjC and Swift libraries entirely from memory. This post picks up where that one left off, describing the next capability we wanted: executing arbitrary native Mach-O executables in memory, with the goal of having one binary masquerade as another.
Everything Is Python
At a recent security conference I attended a talk that featured an interesting post-compromise evasion on Linux. The technique used ulexecve, a Python library that maps and executes whatever ELF binary you load into it directly from userland. From a monitoring perspective, the activity still appears to be the original parent process. For example, if the parent process was python, EDR tooling would simply observe repeated invocations of python even though entirely different binaries were actually being executed under the hood. The effect is conceptually similar to techniques like process doppelganging on Windows: the visible process identity remains the same, but the code being executed has changed.
The evasion implications are significant. EDRs, audit frameworks, and many blue-team detection pipelines tend to key heavily on execve/posix_spawn events. That’s where process lineage, code-signing validation, and behavioral baselines are typically enforced. If a technique avoids those APIs and instead executes binaries inside a forked process that retains the parent’s identity, much of that visibility disappears. Once you start exploring that path on Linux, the natural follow-on question becomes: can I do the same thing on macOS?
Extend ReflectiveLoader to Support DWARF Loading
The ReflectiveLoader codebase already contained a working in-memory Mach-O loader for dylibs. We had already extended it with ObjC/Swift runtime registration support. The core machinery, parsing load commands, mapping segments with vm_copy, resolving imported symbols, running initializers, was all there. The only restriction standing between us and executable loading seemed to be a single line of code:
if ( mh->filetype == MH_EXECUTE )
throw "can't load another MH_EXECUTE";
That guard existed because dyld never loads a second executable into a running process. For our purposes it was exactly the wrong behavior. The plan was to load the entire project into GitHub Copilot using Claude Sonnet as the model, point it at the dwarf_exec branch, and incrementally work through the implementation — letting the AI handle the mechanical parts of the low-level Mach-O surgery while we focused on the higher-level design decisions.
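To make the role of that guard concrete, here is a minimal sketch of the check a loader can run on a raw byte buffer before mapping anything. The struct is a local mirror of the 64-bit header in <mach-o/loader.h>, redeclared so the snippet builds without the macOS SDK. The helper name loader_accepts is hypothetical and not part of ReflectiveLoader, and the sketch assumes a little-endian host and a thin (non-fat) image.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the 64-bit Mach-O header from <mach-o/loader.h>. */
struct macho_header64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

#define MACHO_MAGIC_64 0xFEEDFACFu /* MH_MAGIC_64 */
#define MACHO_EXECUTE  0x2u        /* MH_EXECUTE  */
#define MACHO_DYLIB    0x6u        /* MH_DYLIB    */
#define MACHO_BUNDLE   0x8u        /* MH_BUNDLE   */

/* Hypothetical helper: returns 1 if the buffer starts with a 64-bit
 * Mach-O header whose file type an executable-aware loader would take,
 * 0 otherwise. */
static int loader_accepts(const void *buf, size_t len)
{
    struct macho_header64 mh;

    if (buf == NULL || len < sizeof mh)
        return 0;
    memcpy(&mh, buf, sizeof mh); /* copy out to avoid unaligned reads */
    if (mh.magic != MACHO_MAGIC_64)
        return 0;
    return mh.filetype == MACHO_EXECUTE ||
           mh.filetype == MACHO_DYLIB ||
           mh.filetype == MACHO_BUNDLE;
}
```

With the original guard in place, the MH_EXECUTE case would be the one that throws; everything after this check is shared with the dylib path.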
__PAGEZERO and the Executable Address Space
Removing the guard and trying a first run immediately surfaced the next problem. Executables carry a __PAGEZERO segment, an inaccessible region anchored at virtual address 0x0 (4 GB by default for 64-bit executables) whose entire purpose is to catch null-pointer dereferences. Its file size is zero; there is nothing to copy. The loader’s segment-mapping loop, written for dylibs, did not know about this and tried to vm_copy zero bytes to address zero, which the kernel rightfully rejects.
We asked Copilot to audit every loop in the loader that touched segments and identify where __PAGEZERO would cause problems. It returned three call sites in ImageLoaderMachO.cpp: the address-range calculation in assignSegmentAddresses, the vm_copy loop in mapSegments, and the permissions loop that follows it. Each needed a guard:
// Skip __PAGEZERO — unmapped guard region with no file content
if ( (strcmp(segName(i), "__PAGEZERO") == 0) && (segFileSize(i) == 0) )
continue;
With those three guards in place, plain C executables compiled against the older LC_DYLD_INFO_ONLY binding format loaded and ran correctly. We built a small test executable, loaded it from memory, called its main(), and it printed its arguments and exited with the expected code. That was the first real milestone: a Mach-O executable executing from a byte buffer inside another process.
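The effect of the guard on the address-range calculation can be sketched in isolation. The seg_info type and mapped_span function below are hypothetical stand-ins for the loader's parsed segment table, not the actual ImageLoaderMachO code. They only illustrate why skipping __PAGEZERO shrinks the reservation from roughly 4 GB to the size of the real segments.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for one parsed LC_SEGMENT_64 entry (hypothetical). */
struct seg_info {
    const char *name;
    uint64_t vmaddr, vmsize, filesize;
};

/* Same test as the guard in the text: named __PAGEZERO and no file bytes. */
static int is_pagezero(const struct seg_info *s)
{
    return strcmp(s->name, "__PAGEZERO") == 0 && s->filesize == 0;
}

/* Compute the [lo, hi) address span of the segments that need backing
 * memory and return how many segments were counted. */
static size_t mapped_span(const struct seg_info *segs, size_t n,
                          uint64_t *lo, uint64_t *hi)
{
    size_t counted = 0;

    *lo = UINT64_MAX;
    *hi = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t end = segs[i].vmaddr + segs[i].vmsize;

        if (is_pagezero(&segs[i]))
            continue; /* nothing to reserve, copy, or protect */
        if (segs[i].vmaddr < *lo)
            *lo = segs[i].vmaddr;
        if (end > *hi)
            *hi = end;
        counted++;
    }
    return counted;
}
```

For a typical layout (__PAGEZERO at 0 with a 4 GB vmsize, __TEXT starting at 0x100000000), the span starts at the __TEXT address instead of at zero.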
Modern Binaries and LC_DYLD_CHAINED_FIXUPS
The next test was a Homebrew binary — the kind of real-world target you would want to run post-compromise. It crashed immediately. The problem was that modern macOS toolchain defaults have shifted away from LC_DYLD_INFO_ONLY toward LC_DYLD_CHAINED_FIXUPS, a more compact encoding that chains pointer fixups through the data segment rather than using a separate binding opcode stream. The existing codebase, compiled in UNSIGN_TOLERANT mode (which cannot use Apple’s private fixup APIs), had a stub that simply called dyld::throwf("Unimplemented") when it encountered a chained fixups binary.
We loaded the ImageLoaderMachOCompressed.cpp source into context and asked Copilot to implement doApplyChainedFixupsManual(). It generated a from-scratch chained fixups walker that would build the import target table purely from public headers and then walk the per-segment fixup chains. The implementation went through several iterations. The key insight Claude surfaced was that the existing resolve() method already handled two-level namespace lookup, library ordinal resolution, and weak imports. This meant the new code only needed to build the ordered import target array and then dispatch pointer rewrites based on the chain pointer format field:
switch ( ptrFmt ) {
case DYLD_CHAINED_PTR_64_OFFSET: // format 6 — arm64 Homebrew/SDK
case DYLD_CHAINED_PTR_64: // format 2 — standard 64-bit
case DYLD_CHAINED_PTR_ARM64E: // formats 1,9,12 — arm64e
case DYLD_CHAINED_PTR_ARM64E_USERLAND:
case DYLD_CHAINED_PTR_ARM64E_USERLAND24:
// resolve bind vs rebase entries and patch in place
}
With that in place, Homebrew-compiled arm64 binaries, those using DYLD_CHAINED_PTR_64_OFFSET format 6, loaded cleanly. We tested with jq as a first real-world target: pass it a byte buffer, call custom_dlopen_executable_from_memory(), retrieve the entry point with custom_dlget_entry(), and call main() directly in a forked child with pipes capturing stdout. It worked.
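For readers who have not walked a fixup chain by hand, the following is a minimal sketch of the DYLD_CHAINED_PTR_64_OFFSET case only. The bit layout follows the dyld_chained_ptr_64_rebase and dyld_chained_ptr_64_bind structures in <mach-o/fixup-chains.h>. The function name and the flat import_targets array are assumptions made for illustration, standing in for the ordered import table the text describes. It is not the doApplyChainedFixupsManual() implementation, and it omits the page-start bookkeeping and bounds checks a real walker needs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit chained pointer layout (bit positions, low to high):
 *   rebase: target[0..35] high8[36..43] reserved[44..50] next[51..62] bind[63]
 *   bind:   ordinal[0..23] addend[24..31] reserved[32..50] next[51..62] bind[63]
 * 'next' counts 4-byte strides to the following fixup; 0 ends the chain. */

/* Hypothetical walker for one chain in DYLD_CHAINED_PTR_64_OFFSET format.
 * Rewrites each slot in place and returns the number of slots patched. */
static size_t walk_chain_64_offset(uint8_t *chain_start, uint64_t image_base,
                                   const uint64_t *import_targets,
                                   size_t n_imports)
{
    size_t patched = 0;
    uint8_t *loc = chain_start;

    for (;;) {
        uint64_t raw, value;
        unsigned next;

        memcpy(&raw, loc, sizeof raw);
        next = (unsigned)((raw >> 51) & 0xFFF);

        if (raw >> 63) { /* bind: look up the resolved import */
            uint64_t ordinal = raw & 0xFFFFFF;
            uint64_t addend = (raw >> 24) & 0xFF;

            if (ordinal >= n_imports)
                break; /* malformed chain, stop patching */
            value = import_targets[ordinal] + addend;
        } else {         /* rebase: target is an offset from the image base */
            uint64_t target = raw & 0xFFFFFFFFFULL; /* low 36 bits */
            uint64_t high8 = (raw >> 36) & 0xFF;

            value = (image_base + target) | (high8 << 56);
        }

        memcpy(loc, &value, sizeof value);
        patched++;
        if (next == 0)
            break;
        loc += (size_t)next * 4; /* stride is 4 bytes for this format */
    }
    return patched;
}
```

The plain DYLD_CHAINED_PTR_64 format uses the same fields, but its rebase target is an unslid virtual address rather than an offset, so the rebase branch would add the slide instead of the image base.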
Arm64e System Binaries and Pointer Authentication
The real prize was Apple’s own system binaries — whoami, id, ls, ps — because those are exactly what you reach for post-compromise to run recon without spawning new processes. System binaries on Apple Silicon are compiled for arm64e, which adds Pointer Authentication Code (PAC) instructions to the ISA. PAC uses CPU keys to cryptographically sign and verify pointers, and the signing key for a process is set at launch. This means a binary loaded into a foreign process will carry PAC instructions that reference keys it was never given.
The approach we settled on, with significant help from Claude walking through the arm64e ISA reference, was to patch PAC instructions out of the code before execution. The loader already has write access to the freshly vm_copy-ed text segment and, because we hold the allow-unsigned-executable-memory and disable-executable-page-protection entitlements (the same ones the dylib PoC requires), we can temporarily make the segment writable with vm_protect, walk every instruction, replace each PAC instruction with its non-authenticating equivalent, and restore the segment’s executable protection before jumping in.
The substitution rules Claude identified covered five groups:
// Group 1 — HINT-space stack PAC (exact encodings)
PACIASP / PACIBSP / AUTIASP / AUTIBSP / PACIAZ / ... → NOP (0xD503201F)
// Group 2 — Authenticated returns
RETAA / RETAB → RET (0xD65F03C0)
// Group 3 — Authenticated indirect branches, zero modifier (BRAAZ/BLRAAZ...)
(insn & 0xFFDFF81F) == 0xD61F081F → BR/BLR Xn
// Group 4 — Authenticated indirect branches, register modifier (BRAA/BLRAA...)
(insn & 0xFFDFF800) == 0xD71F0800 → BR/BLR Xn
// Group 5 — Register PAC ops (PACIA/AUTIA/XPACI/...)
(insn & 0xFFFF0000) == 0xDAC10000 → NOP
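The five groups can be expressed as one pure function over a 32-bit instruction word, which makes the rules easy to unit test away from a live process. The sketch below is an illustration, not the loader's code. The function names are hypothetical, and the exact set of HINT immediates treated as PAC in Group 1 is an assumption based on the A64 encodings (XPACLRI, the 1716 forms, and HINT #24 through #31). The branch masks clear bit 21 so that BR and BLR variants both match, and that bit is copied into the replacement.

```c
#include <stddef.h>
#include <stdint.h>

#define INSN_NOP 0xD503201Fu
#define INSN_RET 0xD65F03C0u

/* Hypothetical helper: return the non-authenticating replacement for a
 * PAC instruction, or the input unchanged if it is not one. */
static uint32_t strip_pac(uint32_t insn)
{
    /* Group 1: HINT #imm, imm in bits 5..11. */
    if ((insn & 0xFFFFF01Fu) == 0xD503201Fu) {
        unsigned imm = (insn >> 5) & 0x7F;

        if (imm == 7 ||                              /* XPACLRI           */
            imm == 8 || imm == 10 ||                 /* PACIA1716/IB1716  */
            imm == 12 || imm == 14 ||                /* AUTIA1716/IB1716  */
            (imm >= 24 && imm <= 31))                /* PACIAZ..AUTIBSP   */
            return INSN_NOP;
        return insn;
    }
    /* Group 2: RETAA / RETAB. */
    if (insn == 0xD65F0BFFu || insn == 0xD65F0FFFu)
        return INSN_RET;
    /* Groups 3 and 4: authenticated BR/BLR with zero or register modifier.
     * Keep the link bit (21) and Rn (bits 5..9); drop key and modifier. */
    if ((insn & 0xFFDFF81Fu) == 0xD61F081Fu ||
        (insn & 0xFFDFF800u) == 0xD71F0800u)
        return 0xD61F0000u | (insn & 0x00200000u) | (insn & 0x000003E0u);
    /* Group 5: register PAC ops (PACIA, AUTIA, XPACI, ...). */
    if ((insn & 0xFFFF0000u) == 0xDAC10000u)
        return INSN_NOP;
    return insn;
}

/* Patch a buffer of instruction words in place; returns how many changed. */
static size_t strip_pac_range(uint32_t *text, size_t count)
{
    size_t changed = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t out = strip_pac(text[i]);

        if (out != text[i]) {
            text[i] = out;
            changed++;
        }
    }
    return changed;
}
```

A word-by-word scan like this assumes the range holds only instructions. Any data embedded in the text section that happens to match one of the patterns would be rewritten too, so a careful implementation would restrict the walk to known code ranges.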
The first pass of the PAC stripping code had a subtle off-by-one in the Group 3 and Group 4 masks — the bit that distinguishes BR from BLR was being included in the match mask, causing authenticated branch variants to fall through unpatched. Copilot helped track down the encoding discrepancy by working directly from the arm64e ISA supplement: bit 21 is the link bit, not bit 25. A second off-by-one appeared in the arm64e chained fixup chain walk: the next stride field occupies bits 51–61, so the right-shift is 51, not 52. One bit of error there caused the walker to compute a stride of 1024 entries instead of 1, skipping the entire __auth_got section.
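The stride bug is easy to reproduce with plain integer arithmetic. In the arm64e chained pointer layouts from <mach-o/fixup-chains.h>, the top thirteen bits are next (bits 51 to 61), bind (bit 62), and auth (bit 63). The helper names below are hypothetical; the second extractor deliberately reproduces the wrong shift to show where the 1024 comes from.

```c
#include <stdint.h>

/* Top bits shared by the arm64e chained pointer layouts:
 *   next[51..61] bind[62] auth[63]
 * 'next' counts 8-byte strides for the userland arm64e formats. */
static unsigned arm64e_next(uint64_t raw)
{
    return (unsigned)((raw >> 51) & 0x7FF);
}

static unsigned arm64e_is_bind(uint64_t raw)
{
    return (unsigned)((raw >> 62) & 1);
}

static unsigned arm64e_is_auth(uint64_t raw)
{
    return (unsigned)((raw >> 63) & 1);
}

/* Deliberately wrong: shifting by 52 drops the lowest bit of 'next' and
 * pulls the bind bit into bit 10 of the result. */
static unsigned arm64e_next_buggy(uint64_t raw)
{
    return (unsigned)((raw >> 52) & 0x7FF);
}
```

For an authenticated bind whose next field is 1, the correct extractor returns 1, while the shifted-by-52 version loses that bit and reports the bind flag as a stride of 1024, which matches the behavior described above.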
Once both fixes landed, the full suite passed: whoami, id, ls, ps, uname, hostname, echo — all Apple system binaries, all arm64e, all loading and executing cleanly from memory.
A macOS Equivalent to ulexecve
The final public API surface is intentionally minimal. Three functions alongside the existing dylib loader:
// Load a Mach-O executable from a byte buffer into the current process
void *custom_dlopen_executable_from_memory(void *mh, int len);
// Retrieve the entry point (tries LC_MAIN first, falls back to LC_UNIXTHREAD)
void *custom_dlget_entry(void *handle);
// Free the in-memory image
int custom_dlclose(void *handle);
The included proof-of-concept (PoC_exec) implements a forked execution model: the loader forks, the child loads the target binary from the in-memory byte buffer, calls main(), and pipes stdout and stderr back to the parent. From an external observer’s perspective (process listings, audit logs, EDR telemetry), only the original program ever appears to run. The forked child gets its own PID, but it still reports the parent’s executable image and its ancestry shows the parent; there is no execve event, and no file is written to disk. The target binary is fetched, executed, its output captured, and the memory cleaned up, all inside the lifespan of the original process.
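The fork-and-capture part of that model is ordinary POSIX and can be sketched independently of the loader. In the sketch below, run_entry_in_child and demo_entry are hypothetical names. demo_entry stands in for the function pointer that custom_dlget_entry() would return, and the two-argument entry signature is a simplification. This is not the PoC_exec source.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*entry_fn)(int argc, char **argv);

/* Stand-in for the loaded image's main(); in the PoC this pointer would
 * come from custom_dlget_entry(). */
static int demo_entry(int argc, char **argv)
{
    printf("argc=%d argv0=%s\n", argc, argv[0]);
    return 7;
}

/* Fork, run entry() in the child with stdout/stderr redirected into a
 * pipe, and collect the output in the parent. Returns the child's exit
 * status, or -1 on error or abnormal termination. */
static int run_entry_in_child(entry_fn entry, int argc, char **argv,
                              char *out, size_t out_cap)
{
    int fds[2];
    int status = 0;
    size_t used = 0;
    ssize_t n;
    pid_t pid;

    if (out_cap == 0 || pipe(fds) != 0)
        return -1;
    fflush(NULL); /* avoid duplicating buffered output in the child */
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) { /* child: no exec, just call the entry point */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        dup2(fds[1], STDERR_FILENO);
        close(fds[1]);
        int rc = entry(argc, argv);
        fflush(NULL);
        _exit(rc);
    }
    close(fds[1]);
    while (used + 1 < out_cap &&
           (n = read(fds[0], out + used, out_cap - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running the target in a child also contains the damage if it calls exit() or crashes, since only the child terminates and the parent simply sees the status.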
Fat/universal binaries are handled transparently — the loader extracts the native architecture slice before passing it to the mapping code, so the same call works for binaries targeting both Intel and Apple Silicon.
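Slice extraction is mostly a matter of reading the big-endian fat header correctly. The sketch below shows one way to locate a slice by CPU type in a 32-bit fat file, following the fat_header and fat_arch layout in <mach-o/fat.h>. The function name find_fat_slice is hypothetical, and the sketch ignores the 64-bit fat variant and the cpusubtype field (which is what distinguishes arm64 from arm64e), so it is not a description of how the loader itself selects a slice.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC_BE 0xCAFEBABEu /* FAT_MAGIC, stored big-endian */
#define CPU_X86_64   0x01000007u /* CPU_TYPE_X86_64 */
#define CPU_ARM64    0x0100000Cu /* CPU_TYPE_ARM64  */

/* Read a big-endian 32-bit value regardless of host byte order. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: find the first slice with the given CPU type.
 * On success stores its offset and size and returns 1; otherwise 0. */
static int find_fat_slice(const uint8_t *buf, size_t len, uint32_t cputype,
                          size_t *off, size_t *size)
{
    uint32_t nfat;

    if (len < 8 || be32(buf) != FAT_MAGIC_BE)
        return 0;
    nfat = be32(buf + 4);
    for (uint32_t i = 0; i < nfat; i++) {
        /* fat_arch: cputype, cpusubtype, offset, size, align (20 bytes) */
        size_t entry = 8 + (size_t)i * 20;
        uint32_t o, s;

        if (entry + 20 > len)
            return 0;
        if (be32(buf + entry) != cputype)
            continue;
        o = be32(buf + entry + 8);
        s = be32(buf + entry + 12);
        if ((uint64_t)o + s > len)
            return 0; /* slice runs past the end of the buffer */
        *off = o;
        *size = s;
        return 1;
    }
    return 0;
}
```

The returned offset and size describe a thin Mach-O image inside the same buffer, which can then be handed to the mapping code unchanged.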
Lessons Working With Claude
As with the ObjC fix in Part 1, the most productive approach was not to ask the AI for working code outright but to use it as a structured sounding board with the full repository in context.
The approach is consistent with the broader theme from Part 1: the AI is most useful when the workspace is its context. Loading the relevant source files, the ABI documentation, and the concrete error output into the conversation collapses what would otherwise be an afternoon of manual binary diffing into a handful of focused prompts.
Limitations and Next Steps
A few constraints are worth calling out for anyone looking to use this operationally:
- Private library dependencies. Binaries that depend on @rpath libraries that are not already resident in the host process (e.g., cmake, Python-linked tools) will fail at load time, when their imports are resolved. The workaround is to either preload the dependency libraries or target binaries that link only against the system-provided frameworks.
- Required entitlements. The host process needs com.apple.security.cs.allow-unsigned-executable-memory and com.apple.security.cs.disable-executable-page-protection. These are the same entitlements the dylib loader already required, so any implant already using that loader does not need new entitlements.
- SIP-protected binaries. Loading a binary does not grant it the entitlements or privileges it would have under its own code signature. Binaries requiring specific entitlements to operate (TCC-gated APIs, com.apple.rootless, etc.) will hit those restrictions inside the host process.
The code is currently available on the dwarf_exec branch of the ReflectiveLoader fork. I have submitted a PR to have it merged into the main branch. The PoC_exec directory contains the proof of concept, the test payloads, and the Makefile.