Building Zyrox: A Custom LLVM Obfuscator (Part I)

Developers often need to protect their code from reverse engineering, especially when it contains proprietary algorithms or security-critical logic. While several LLVM-based obfuscators exist on GitHub, many are outdated, unreliable, or come with a complex build process.

This challenge led me to develop Zyrox, a custom LLVM obfuscator designed for modern C/C++ projects. My goal was to create a flexible tool that works as both a compile-time plugin (via clang) and a link-time plugin (via lld). A compile-time plugin is easy to use, requiring just an extra command-line argument. A link-time plugin is more powerful, as it can view the entire program as a single module, enabling more comprehensive and accurate obfuscations.

Setting Up the Plugin

The first step was to create the basic plugin structure. The following code registers Zyrox with LLVM's new PassManager, making it available during both compile-time and link-time optimization (LTO) pipelines. The std::atomic<bool> state{false}; flag is a simple way to ensure the plugin's passes are added only once.

#include <atomic>
#include <llvm/IR/PassManager.h>
#include <llvm/Passes/PassBuilder.h>
#include <llvm/Passes/PassPlugin.h>

class ZyroxPlugin : public llvm::PassInfoMixin<ZyroxPlugin>
{
public:
    llvm::PreservedAnalyses run(llvm::Module &m, llvm::ModuleAnalysisManager &mam);

    static bool isRequired() { return true; }
};

std::atomic<bool> state{false};

extern "C" LLVM_ATTRIBUTE_WEAK llvm::PassPluginLibraryInfo llvmGetPassPluginInfo()
{
    return {LLVM_PLUGIN_API_VERSION, "ZyroxPlugin", LLVM_VERSION_STRING,
            [](llvm::PassBuilder &pb)
            {
                // Register as a compile-time pass
                pb.registerPipelineEarlySimplificationEPCallback(
                    [](llvm::ModulePassManager &mpm, llvm::OptimizationLevel)
                    {
                        if (state.load()) return true;
                        state.store(true);
                        mpm.addPass(ZyroxPlugin());
                        return true;
                    });

                // Register as a link-time (Full LTO) pass
                pb.registerFullLinkTimeOptimizationEarlyEPCallback(
                    [](llvm::ModulePassManager &mpm, llvm::OptimizationLevel)
                    {
                        if (state.load()) return true;
                        state.store(true);
                        mpm.addPass(ZyroxPlugin());
                        return true;
                    });
            }};
}
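
To look at the once-only guard in isolation, here is a minimal sketch in plain C++ with no LLVM dependency. It is a slightly tighter variant of the same idea: exchange() tests and sets the flag in a single atomic step. The names registered, claim_registration and count_added_passes are hypothetical stand-ins, not part of Zyrox.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the plugin's global `state` flag.
std::atomic<bool> registered{false};

// Returns true for the first caller only. exchange() stores true and hands
// back the previous value in one atomic operation, so two racing callers
// cannot both observe "not yet registered".
bool claim_registration()
{
    return !registered.exchange(true);
}

// Usage sketch: however many callbacks fire, at most one adds the pass.
int count_added_passes(int callbacks_fired)
{
    assert(callbacks_fired >= 0);
    int added = 0;
    for (int i = 0; i < callbacks_fired; ++i)
        if (claim_registration())
            ++added;
    return added;
}
```

With the separate load() and store() calls the check and the update are two steps; whether that matters depends on whether the callbacks can ever run concurrently, which is not established here.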

With the plugin initialized, I started implementing the first obfuscation passes. The ZyroxPlugin::run method iterates through the functions in the module, applying each transformation. Using an iterator-based loop is important because subsequent passes might add new functions to the module.

// Inside ZyroxPlugin::run
auto &func_list = m.getFunctionList();
auto it = func_list.begin();
while (it != func_list.end())
{
    RunOnFunction(*it);
    ++it;
}

Pass 1: Mixed Boolean-Arithmetic (MBA) Substitution

The first technique implemented was Mixed Boolean-Arithmetic (MBA) Substitution. This pass replaces standard arithmetic and logical operations with more complex but functionally equivalent expressions. For example:

a ^ b becomes (a | b) & ~(a & b)

b * c becomes (((b | c) * (b & c)) + ((b & ~c) * (c & ~b)))

x + y becomes ~(x + (-x + (-x + ~y)))

While a single substitution might seem simple, applying them repeatedly and in combination can make the underlying logic very difficult to understand.
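
These identities are easy to check outside LLVM. The sketch below restates the three rewrites on 32-bit unsigned integers, where arithmetic wraps modulo 2^32. The function names are made up for illustration, and -x is written as 0u - x to keep the negation explicitly unsigned.

```cpp
#include <cassert>
#include <cstdint>

// a ^ b  =>  (a | b) & ~(a & b)
std::uint32_t mba_xor(std::uint32_t a, std::uint32_t b)
{
    return (a | b) & ~(a & b);
}

// b * c  =>  ((b | c) * (b & c)) + ((b & ~c) * (c & ~b))
std::uint32_t mba_mul(std::uint32_t b, std::uint32_t c)
{
    return ((b | c) * (b & c)) + ((b & ~c) * (c & ~b));
}

// x + y  =>  ~(x + (-x + (-x + ~y)))
std::uint32_t mba_add(std::uint32_t x, std::uint32_t y)
{
    const std::uint32_t neg_x = 0u - x; // -x in wraparound arithmetic
    return ~(x + (neg_x + (neg_x + ~y)));
}

// Asserts that all three rewrites agree with the plain operators.
void check_mba(std::uint32_t x, std::uint32_t y)
{
    assert(mba_xor(x, y) == (x ^ y));
    assert(mba_mul(x, y) == x * y);
    assert(mba_add(x, y) == x + y);
}
```

Calling check_mba on edge cases such as 0 and 0xFFFFFFFF is a cheap sanity test before wiring a new identity into the pass.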

Implementation

To keep the code clean and extensible, I used a macro to define the necessary functions for each operation (Add, Sub, Xor, etc.).

#define DEFINE_FN(Op)                                                               \
    static void runOn##Op(BasicBlock &BB);                                            \
    static Value *Obfuscate##Op(IRBuilder<> &Builder, BinaryOperator *BinOp)

class MBASub
{
    DEFINE_FN(Sub);
    DEFINE_FN(Add);
    DEFINE_FN(Xor);
    DEFINE_FN(Mul);
    DEFINE_FN(Or);

  public:
    static void RunOnFunction(Function &f);
};

For each operation, I created a list of callback functions, each providing a different MBA transformation. This allows the obfuscator to randomly pick a substitution, increasing the variability of the output. Here is an example for the XOR operation:

typedef std::function<Value *(IRBuilder<> *, BinaryOperator *)> Callback;

// An array of possible transformations for XOR
Callback xor_ops[] = {
    [](IRBuilder<> *builder, BinaryOperator *operation) -> Value *
    {
        // a ^ b => (a | b) & ~(a & b)
        Value *a = operation->getOperand(0);
        Value *b = operation->getOperand(1);
        return builder->CreateAnd(builder->CreateOr(a, b),
                                 builder->CreateNot(builder->CreateAnd(a, b)));
    },
    // More XOR substitutions could be added here
};
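
The same "table of variants, pick one at random" idea can be tried on plain integers. This is only a sketch: XorVariant, xor_variants and obfuscated_xor are hypothetical names, std::mt19937 stands in for the Random helper, and the second variant is an assumption chosen because it matches the shape of the one-pass decompiler output in the Results section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>

using XorVariant = std::function<std::uint32_t(std::uint32_t, std::uint32_t)>;

// Candidate rewrites for a ^ b on wraparound 32-bit integers.
const XorVariant xor_variants[] = {
    // a ^ b => (a | b) & ~(a & b)
    [](std::uint32_t a, std::uint32_t b) { return (a | b) & ~(a & b); },
    // a ^ b => a + b - 2 * (a & b)   (assumed second variant)
    [](std::uint32_t a, std::uint32_t b) { return a + b - 2u * (a & b); },
};

// Picks one variant uniformly at random and applies it.
std::uint32_t obfuscated_xor(std::mt19937 &rng, std::uint32_t a, std::uint32_t b)
{
    constexpr std::size_t count = sizeof(xor_variants) / sizeof(xor_variants[0]);
    std::uniform_int_distribution<std::size_t> pick(0, count - 1);
    const std::size_t index = pick(rng);
    assert(index < count);
    return xor_variants[index](a, b);
}
```

Whichever variant is drawn, the result has to equal a ^ b; only the shape of the emitted expression changes from run to run.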

Another macro, DEFINE_RUN, generates the functions that find and replace instructions within a basic block. It iterates through all instructions, finds the target opcode (e.g., Instruction::Xor), and replaces it with the result of a randomly selected obfuscation callback.

#define DEFINE_RUN(Op, Callbacks)                                                   \
    Value *MBASub::Obfuscate##Op(IRBuilder<> &Builder, BinaryOperator *BinOp)       \
    {                                                                               \
        return Callbacks[Random::IntRanged<size_t>(                                 \
            0, sizeof(Callbacks) / sizeof(Callbacks[0]) - 1)](&Builder,             \
                                                                 BinOp);            \
    }                                                                               \
                                                                                    \
    void MBASub::runOn##Op(BasicBlock &BB)                                          \
    {                                                                               \
        std::vector<Instruction *> instructions;                                    \
                                                                                    \
        for (Instruction & Instr : BB)                                              \
        {                                                                           \
            if (Instr.getOpcode() == Instruction::Op)                               \
                instructions.push_back(&Instr);                                     \
        }                                                                           \
                                                                                    \
        for (auto &Instr : instructions)                                            \
        {                                                                           \
            auto *BinOp = cast<BinaryOperator>(Instr);                              \
            IRBuilder<> Builder(Instr);                                             \
            Instr->replaceAllUsesWith(Obfuscate##Op(Builder, BinOp));               \
            Instr->eraseFromParent(); /* drop the now-unused original */            \
        }                                                                           \
    }

// Generate the functions for each operation
DEFINE_RUN(Xor, xor_ops)
DEFINE_RUN(Sub, sub_ops)
// ...and so on
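
The two-phase shape of runOn (collect first, rewrite second) is worth seeing on its own. Below is a toy model, not LLVM code: ToyInstr and replace_xors are hypothetical, and a std::list of opcode names stands in for a basic block. The NOT is deliberately emitted as another "xor", because LLVM IR has no dedicated NOT instruction and represents it as an xor with all-ones.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction: only the opcode name is modeled.
struct ToyInstr
{
    std::string opcode;
};

// Rewrites every "xor" in the block and returns how many were replaced.
std::size_t replace_xors(std::list<ToyInstr> &block)
{
    const std::size_t size_before = block.size();

    // Phase 1: snapshot the targets before touching the block.
    std::vector<std::list<ToyInstr>::iterator> targets;
    for (auto it = block.begin(); it != block.end(); ++it)
        if (it->opcode == "xor")
            targets.push_back(it);

    // Phase 2: emit (a | b) & ~(a & b) in front of each target, then drop it.
    for (auto it : targets)
    {
        block.insert(it, {ToyInstr{"or"}, ToyInstr{"and"},
                          ToyInstr{"xor"}, ToyInstr{"and"}});
        block.erase(it);
    }

    // Each replacement swaps one instruction for four.
    assert(block.size() == size_before + 3 * targets.size());
    return targets.size();
}
```

Because the targets are fixed before any rewriting, the freshly emitted xor is left alone within a single run. A second run picks it up, which is one reason repeated runs grow the code so quickly.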

Results

Let's see the effect on a simple XOR function.

Before:

int XOR(int a, int b) {
    return a ^ b;
}

After (1 Pass):

__int64 __fastcall XOR(unsigned int a1, int a2)
{
    return a2 + a1 - 2 * (a2 & a1);
}

This is still readable. However, running the pass just one more time produces a significantly more convoluted result that hides the original intent.

After (2 Passes):

__int64 __fastcall XOR(int a1, int a2) {
  // variables collapsed

  v2 = a2 + a1;
  v3 = a2 & a1;
  v4 = ~(~(a2 & a1 & 2 & a2 & a1) &
        (a2 & a1 & 2 & a2 & a1 ^ a2 & a1 & 2 ^ a2 & a1) & 2) &
       (~(a2 & a1 & 2) & (a2 & a1) & 2 ^
        ~(a2 & a1 & 2 & a2 & a1) &
          /* more of whatever that is... */
  return ~(~((-v7 & v2 & 0xFFFFFFFD) * (~(-v7 & v2) & 2) +
            (-v7 & v2 & 2) * ((-v7 & v2) + 2 - (-v7 & v2 & 2))) -
           (v2 -
            v7 -
            // ...
}

This demonstrates how quickly MBA can obscure simple arithmetic. Each extra pass also grows the generated code, so to balance complexity against performance the obfuscation should be controllable on a per-function basis.

Pass 2: Simple Indirect Branch (SIBR)

Next, I implemented Simple Indirect Branch (SIBR). This pass is designed to confuse static analysis tools like IDA Pro and Ghidra by breaking their control flow graphs. While it's not unbreakable (in fact, it is easy to deobfuscate), it effectively deters casual reverse engineering.

The idea is to replace direct branches with indirect ones. Instead of goto LABEL_A, the code stores the address of LABEL_A on the stack, loads it back, and jumps to the loaded address.

Conceptual Example:

Before:

if (x == 2) {
    goto LABEL_A;
} else {
    goto LABEL_B;
}

After:

// Create a jump table on the stack
@stack jump_table[2];

// Store jump targets (index 0 is the "true" target)
jump_table[0] = &LABEL_A;
jump_table[1] = &LABEL_B;

// Calculate index and jump: 0 when x == 2, 1 otherwise
int index = !(x == 2);
goto jump_table[index];

This simple indirection is often enough to break the analysis flow in decompilers.
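
The transformation can be imitated in portable C++ to get a feel for it. Standard C++ cannot take the address of a label, so in this sketch the two branch targets are modeled as functions and the jump table holds function pointers. The names label_a, label_b and branch_indirect are hypothetical, and the real pass works on LLVM IR rather than on source code.

```cpp
#include <cassert>
#include <string>

// Stand-ins for the two branch targets.
std::string label_a() { return "result is 2"; }
std::string label_b() { return "result is not 2"; }

// Equivalent of: if (x == 2) goto LABEL_A; else goto LABEL_B;
std::string branch_indirect(int x)
{
    using Target = std::string (*)();

    // volatile asks the compiler to really store and reload the addresses
    // instead of folding the table back into a direct branch.
    Target volatile jump_table[2] = {label_a, label_b};

    const int index = !(x == 2); // 0 when x == 2, 1 otherwise
    assert(index == 0 || index == 1);
    return jump_table[index]();
}
```

The condition survives only as an array index, so a tool that wants the original if/else back has to resolve the table contents first.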

Limitation: Switch Statements

A major limitation of this approach is that it only works on BranchInst instructions, not SwitchInst. To solve this, I wrote a utility pass that flattens switches into a series of equivalent if-else branches before the SIBR pass runs.

Flattening Logic:

Before:

switch (val) {
    case 0: goto A;
    case 1: goto B;
    default: goto D;
}

After:

if (val == 0) goto A;
else goto CHECK_B;

CHECK_B:
if (val == 1) goto B;
else goto D;

Once all switch statements are flattened, the SIBR pass can effectively process every jump in the function.
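
As a rough model of that lowering, the sketch below represents a switch as a list of (case value, target) pairs plus a default, and evaluates it as a chain of equality tests. ToySwitch, run_switch, run_lowered and check_lowering are hypothetical names for illustration; the actual utility pass rewrites SwitchInst into BranchInst chains in IR.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy model of a switch: (case value, target label) pairs plus a default.
struct ToySwitch
{
    std::vector<std::pair<int, char>> cases;
    char default_target;
};

// Reference behavior: what the original switch would select.
char run_switch(int val)
{
    switch (val)
    {
    case 0: return 'A';
    case 1: return 'B';
    default: return 'D';
    }
}

// Lowered form: one equality test per case, tried in order, with the
// default target as the final else.
char run_lowered(const ToySwitch &sw, int val)
{
    for (const auto &entry : sw.cases)
        if (val == entry.first)
            return entry.second;
    return sw.default_target;
}

// Self-check that the chain picks the same target as the switch.
void check_lowering(int val)
{
    const ToySwitch sw{{{0, 'A'}, {1, 'B'}}, 'D'};
    assert(run_lowered(sw, val) == run_switch(val));
}
```

Every case becomes its own conditional branch, which is exactly the instruction kind SIBR knows how to rewrite.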

Results

Let's apply SIBR to a simple if statement.

Before:

int main() {
    if (XOR(5, 7) == 2) {
        printf("result is 2\n");
    } else {
        printf("result is not 2\n");
    }
    return 0;
}

After:

int __fastcall main(int argc, const char **argv, const char **envp)
{
  __int64 v3; // kr00_8
  int result; // w0
  _QWORD v5[2]; // [xsp+10h] [xbp-10h]

  // This is our jump table
  v5[0] = sub_170C;
  v5[1] = &loc_172C;

  // The address is selected and stored in register X8
  v3 = v5[(unsigned int)XOR(5, 7) != 2];

  // An indirect branch is performed
  __asm { BR              X8 }
  return result; // Seems that the decompiler gave up, lol
}

The decompiler can no longer see the if-else structure. Instead, it sees a jump table being populated and an indirect branch (BR X8). It fails to connect main to the code blocks that print the results.

Pass 3: Basic Block Splitting (BBS)

What if a function contains important logic but has very few jumps? Passes like SIBR would have little effect. To address this, I created the Basic Block Splitter (BBS).

This pass splits large basic blocks into smaller ones, inserting unconditional jumps between them. This doesn't change the program's logic but creates many more branching instructions for other control-flow obfuscations to work with.

Implementation

The implementation identifies blocks that are larger than a configured threshold. It then uses a worklist to repeatedly split them until all resulting blocks are under a maximum size.

The core of the logic is a call to BasicBlock::splitBasicBlock, which takes an iterator pointing to an instruction and splits the block at that point, creating a new block and adding a jump from the original one.

// A simplified view of the splitting loop
while (!work_list.empty())
{
    BasicBlock *current = work_list.back();
    work_list.pop_back();

    if (current->size() <= max_block_size)
        continue;

    // Find a safe point to split the block
    BasicBlock::iterator split_it = find_split_point(current);

    // Perform the split
    BasicBlock *new_block = current->splitBasicBlock(&*split_it);

    // Add the resulting blocks back to the list for further splitting
    work_list.push_back(current);
    work_list.push_back(new_block);
}
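
The worklist behaviour can be simulated without LLVM by treating each block as nothing more than its instruction count. In this sketch split_blocks is a hypothetical function and "cut in the middle" is an assumed policy standing in for find_split_point. It does model one detail from the description above: a split adds a jump to the original block, so the first half grows by one instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: a block is just its instruction count (terminator included).
// Returns the sizes of all blocks once none exceeds max_block_size.
std::vector<std::size_t> split_blocks(std::vector<std::size_t> work_list,
                                      std::size_t max_block_size)
{
    // A split appends a branch to the first half, so a limit below 2 could
    // never be satisfied and the loop would not terminate.
    assert(max_block_size >= 2);

    std::vector<std::size_t> done;
    while (!work_list.empty())
    {
        const std::size_t size = work_list.back();
        work_list.pop_back();

        if (size <= max_block_size)
        {
            done.push_back(size);
            continue;
        }

        // Assumed split policy: cut in the middle.
        const std::size_t kept = size / 2;
        work_list.push_back(kept + 1);    // first half plus the new branch
        work_list.push_back(size - kept); // second half keeps the old terminator
    }
    return done;
}
```

For a single 10-instruction block and a limit of 4, this model ends with four blocks and three extra branches, which gives a feel for how much new control flow the pass hands to SIBR.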

Combining the Passes

Now, our obfuscation pipeline looks like this:

FlattenSwitches: Prepare the code for SIBR.

MBASub: Obfuscate arithmetic expressions.

BasicBlockSplitter: Create more control flow complexity.

SimpleIndirectBranch: Obfuscate the newly created control flow.

Let's see the final result on the heavily obfuscated XOR function from earlier. First, we apply MBA and BBS. This doesn't change the decompiled C code, but the assembly now contains many small blocks connected by jmp instructions.

Assembly after MBA + BBS:

XOR proc near
    ; ...
    jmp     loc_2196
loc_210F:
    mov     esi, ecx
    xor     esi, 0FFFFFFFFh
    ; ...
    jmp     loc_21C0
loc_2120:
    mov     r9d, r8d
    ; ...
    jmp     short loc_214D
; ... many more small blocks

Finally, we apply SIBR to this fragmented code. The result is a mess of indirect jumps that the decompiler can no longer stitch back into a single function.

Final Decompiled Code (MBA + BBS + SIBR):

void XOR()
{
  JUMPOUT(0x1808LL); // Decompiler hits a dead end immediately
}

__int64 __fastcall sub_179B(__int64 a1, __int64 a2, int a3, unsigned int a4)
{
  __int64 v4; // rbp

  *(_QWORD *)(v4 - 16) = sub_17B2;
  return (*(__int64 (__fastcall **)(__int64, _QWORD, _QWORD, _QWORD))(v4 - 16))(
           a1,
           a3 + a4,
           *(_QWORD *)(v4 - 16),
           a3 & a4);
}

__int64 __fastcall sub_1A00(__int64 a1, __int64 a2, __int64 a3,
        __int64 a4, __int64 a5, __int64 a6, __int64 a7)
{
  // variables collapsed

  v9 = a5 + a4;
  v10 = v9 ^ __ROL8__(a5, 13);
  v11 = v8 + a6;
  v12 = __ROL8__(v8, 16);
  v13 = v11
      + v12
      - (((unsigned __int8)v11 & (unsigned __int8)v12 & 2) * (v11 & v12 | 2)
       + ((unsigned __int8)v11
            & (unsigned __int8)v12 & 2 ^ 2LL) * (v11 & v12 & 0xFFFFFFFFFFFFFFFDLL));
  v14 = v13 + __ROL8__(v9, 32);
  retaddr = sub_1B10;
  return ((__int64 (__fastcall *)
            (__int64, unsigned __int64, __int64, unsigned __int64, unsigned __int64))
               *(&retaddr + ((v7 & 1) == 0)))(
                    a1,
                   v14 ^ __ROL8__(v13, 21),
                   a3,
                   v14,
                      ((v11 & 2) * (v11 | 2) + (v11 & 2 ^ 2) * (v11 & 0xFFFFFFFFFFFFFFFDLL)
                         + 2 * v10 - (v10 + v11)) ^ __ROL8__(v10, 17));
}

// ... and many more confusing, disconnected functions

The original XOR function is now completely hidden inside a web of obfuscated arithmetic and indirect control flow.

What's Next?

This article covers the foundational passes in Zyrox. In the future, I plan to write about more advanced techniques, including:

Control Flow Flattening: A more robust alternative to SIBR with multiple implementation strategies.

Indirect Branch V2: Using encrypted jump tables to raise the bar for analysis.

String Encryption: Protecting sensitive string literals at rest (on stack, and globally).

The Rewrite: Combines a JavaScript-based configuration system with annotate("") support, enabling precise, per-function control across the full obfuscation pipeline.

Stay tuned for Part II!


© 2025 peterr[dot]dev. All rights reserved.