Fuzzing

Overview

Fuzzing is a software testing technique that feeds a program randomly generated data to find corner-case errors. There are different types of fuzzers:

“Black-box” fuzzers, which run your program as-is, with no instrumentation and no knowledge of its internals.
“Grey-box” fuzzers, which instrument your program at compile time or run it inside QEMU, gaining insight into its control flow.

Grey-box fuzzers, also known as feedback-driven fuzzers, generally outperform black-box ones but require full access to the binary or source code. By leveraging control-flow knowledge, they can employ genetic algorithms for input mutation, save each newly discovered path, and explore your program step by step.
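
To make the feedback loop concrete, here is a toy sketch of it. This is not how any real fuzzer is implemented: the target, the 5-slot coverage map, the xorshift PRNG and all names (`toy_target`, `toy_fuzz`) are made up for illustration. It only shows the core idea of mutating a corpus entry and keeping the mutant when it reaches something new.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define INPUT_LEN  4
#define COV_SLOTS  5
#define MAX_CORPUS 8

// Hypothetical target: records which "branches" were reached in cov[].
// A real grey-box fuzzer gets this from compile-time instrumentation.
void toy_target(const uint8_t *data, uint8_t cov[COV_SLOTS]) {
    static const uint8_t magic[INPUT_LEN] = { 'F', 'U', 'Z', 'Z' };
    cov[0] = 1; // function entry
    for (size_t i = 0; i < INPUT_LEN; i++) {
        if (data[i] != magic[i]) { return; }
        cov[i + 1] = 1; // one more byte of the magic prefix matched
    }
}

// Small deterministic PRNG (xorshift32) so that runs are reproducible.
static uint32_t rng_state = 2463534242u;
static uint32_t rng_next(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

// Runs the loop for `iterations` rounds and returns the final corpus size.
size_t toy_fuzz(size_t iterations) {
    uint8_t corpus[MAX_CORPUS][INPUT_LEN] = { { 0 } }; // seed: four zero bytes
    uint8_t seen[COV_SLOTS] = { 0 };
    size_t corpus_len = 1;

    toy_target(corpus[0], seen); // coverage reached by the seed

    for (size_t it = 0; it < iterations && corpus_len < MAX_CORPUS; it++) {
        // Pick a random parent from the corpus and mutate one byte of it
        size_t parent = rng_next() % corpus_len;
        size_t pos = rng_next() % INPUT_LEN;
        uint8_t val = (uint8_t)(rng_next() >> 8);

        uint8_t child[INPUT_LEN];
        memcpy(child, corpus[parent], INPUT_LEN);
        child[pos] = val;

        uint8_t cov[COV_SLOTS] = { 0 };
        toy_target(child, cov);

        // Keep the child only if it reached a slot nobody reached before
        int is_new = 0;
        for (size_t s = 0; s < COV_SLOTS; s++) {
            if (cov[s] && !seen[s]) { seen[s] = 1; is_new = 1; }
        }
        if (is_new) { memcpy(corpus[corpus_len++], child, INPUT_LEN); }
    }
    return corpus_len;
}
```

A black-box fuzzer would have to guess all four magic bytes at once. With feedback, each correctly guessed byte is saved as a new corpus entry, so the search only has to get one more byte right at a time.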

They also have two different modes:

The program runs without code modifications, accepting generated input via files or CLI arguments. This method is slower due to program startup overhead, even though the fuzzer tries to minimize it (e.g., by forking before main()).
The so-called “Persistent Mode”, where you provide a function that the fuzzer calls repeatedly with arbitrary data.

Persistent Mode should be used for anything serious, because it is 2x-5x faster.
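
The difference between the two modes is easiest to see side by side. The sketch below is only an illustration: `parse_header` is a hypothetical function under test, and `file_mode_main` stands in for what would be the program's real `main()`. `LLVMFuzzerTestOneInput` is the entry point name used later in this post.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical code under test: returns 1 if the buffer starts with "HDR".
int parse_header(const uint8_t *data, size_t len) {
    return len >= 3 && data[0] == 'H' && data[1] == 'D' && data[2] == 'R';
}

// Mode 1: the unmodified program. In a real build this would be main();
// the fuzzer writes each input to a file and runs it in a fresh process.
int file_mode_main(int argc, char **argv) {
    if (argc < 2) { return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL) { return 1; }
    uint8_t buf[4096];
    size_t len = fread(buf, 1, sizeof(buf), f);
    fclose(f);
    parse_header(buf, len);
    return 0;
}

// Mode 2: persistent mode. The fuzzer calls this function in a loop inside
// one long-lived process, so there is no per-input startup cost.
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t len) {
    parse_header(data, len);
    return 0;
}
```

Note that in persistent mode the function must not leak state between calls (globals, open files, allocations), otherwise crashes stop being reproducible from a single input.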

Funny ways to utilize fuzzers & tips

This section is just a bunch of random tips and ideas.

Everything can be a bytecode

A fuzzer gives you an array of random bytes, but you can interpret it however you want. For example, if you’re fuzzing a binary tree, you could treat the fuzzer’s input as a sequence of commands like this:
```
struct cmd {
    enum command_type type; // insert, remove, etc

    union {
        struct cmd_insert ins; // for insert
        struct cmd_remove rem; // for remove
        // ...
    };
};
```
Of course, the data is still random, so you will have to deal with enum values that are out of range and with commands that violate your code’s input invariants.
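
Here is a minimal sketch of such an interpreter. Everything in it is an assumption made for illustration: the 2-byte `[type][key]` command encoding, and `toy_set`, which stands in for the binary tree so that the example is self-contained. It decodes the input byte by byte instead of casting it to `struct cmd`, which sidesteps alignment and padding issues.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum command_type { CMD_INSERT = 0, CMD_REMOVE = 1, CMD_COUNT };

// Hypothetical stand-in for the structure under test: a set of small keys.
// In real use these calls would go to your binary tree instead.
struct toy_set { uint8_t present[256]; size_t size; };

void toy_insert(struct toy_set *s, uint8_t key) {
    if (!s->present[key]) { s->present[key] = 1; s->size++; }
}

void toy_remove(struct toy_set *s, uint8_t key) {
    if (s->present[key]) { s->present[key] = 0; s->size--; }
}

// Interprets the input as a sequence of 2-byte commands: [type][key].
// Returns the number of commands executed.
size_t run_commands(struct toy_set *s, const uint8_t *data, size_t len) {
    size_t executed = 0;
    for (size_t i = 0; i + 2 <= len; i += 2) {
        // Fold the random byte into the valid range instead of rejecting it,
        // so no input is wasted on an "unknown command" early exit.
        enum command_type type = (enum command_type)(data[i] % CMD_COUNT);
        uint8_t key = data[i + 1];

        switch (type) {
        case CMD_INSERT: toy_insert(s, key); break;
        case CMD_REMOVE: toy_remove(s, key); break;
        default: break; // not reachable after the modulo above
        }
        executed++;
    }
    return executed;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t len) {
    struct toy_set s;
    memset(&s, 0, sizeof(s));
    run_commands(&s, data, len);

    // Check an invariant after every run; a failed assert() aborts,
    // which the fuzzer reports as a crash.
    size_t count = 0;
    for (size_t k = 0; k < 256; k++) { count += s.present[k]; }
    assert(count == s.size);
    return 0;
}
```

Mapping out-of-range values with a modulo is one option; the other is to return early on invalid commands, which is simpler but makes the fuzzer spend more time on inputs that do nothing.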

Adding the fuzzer to unit tests

The most valuable output of fuzzing is the corpus: a set of inputs, each unique in terms of code coverage, that together trigger every path the fuzzer has discovered. So you can integrate the interpreter plus the corpus data into your CI as unit tests, for fast checks with coverage close to what the fuzzer itself reached, while keeping the fuzzer running 24/7 on some VM. Just pull new code once a week or so to discover newly added code paths, and upload the updated corpus to the unit tests.
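
A corpus replay test can be as small as the sketch below. It assumes a POSIX system (for `dirent.h`) and a flat directory of regular files; `replay_corpus` is a hypothetical helper name, and the empty `LLVMFuzzerTestOneInput` is only a stand-in so that the example compiles on its own. In a real project you would link the actual fuzz target instead.

```c
#include <dirent.h>
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

// Stand-in for the real fuzz target so that the sketch is self-contained;
// in a real project this is the function the fuzzer already calls.
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t len) {
    (void)data;
    (void)len;
    return 0;
}

// Feeds every file in dir_path to the fuzz target once.
// Returns the number of files replayed, or -1 if the directory can't be opened.
int replay_corpus(const char *dir_path) {
    DIR *dir = opendir(dir_path);
    if (dir == NULL) { return -1; }

    int replayed = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.') { continue; } // ".", ".." and dotfiles

        char path[4096];
        snprintf(path, sizeof(path), "%s/%s", dir_path, entry->d_name);
        FILE *f = fopen(path, "rb");
        if (f == NULL) { continue; }

        // Corpus files are usually small; cap the size to keep this simple
        static uint8_t buf[1 << 20];
        size_t len = fread(buf, 1, sizeof(buf), f);
        fclose(f);

        LLVMFuzzerTestOneInput(buf, len); // a crash here fails the test run
        replayed++;
    }
    closedir(dir);
    return replayed;
}
```

Calling `replay_corpus("corpus")` from a test's `main()` and building with the same sanitizers as the fuzzing build gives a check that finishes in seconds, because there is no mutation involved, only replay.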

Corpus is everything

Never underestimate the corpus – especially if you’re fuzzing something complex, like a binary tree through an interpreter. Because code path discovery is kinda like a multidimensional optimization problem, fuzzers can get stuck in local minima. So you need to give the fuzzer multiple starting points by providing a well-seeded corpus.

I personally ran into this issue once: two fuzzers, both starting with empty corpora, discovered different code paths and never crossed over to the other.

List of fuzzers

Fuzzers differ in their mutation algorithms and heuristics, so there is no such thing as a perfect fuzzer. There are 3 main fuzzers for C that I have tested:

Honggfuzz – multithreaded, good results (best for my case), easy to set up, and provides clear coverage monitoring.
libFuzzer – multithreaded, part of LLVM, easy to use, but semi-dead (maintenance-only) and its status reporting is unclear.
AFL++ – something of an old legend. It has a lot of variations, but it is single-threaded. You can run it in a master-slave configuration across multiple cores, even with different strategies or different forks, but the setup is notoriously painful and there is no easy way around it.

Simple example with Honggfuzz

Let’s test a basic program that crashes when the first input byte is even:

```
// main.c

#include <stdint.h>
#include <stdlib.h>

// Function has to have this name
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t len) {
    if (len == 0) { return 0; }
    if (data[0] % 2 == 0) { abort(); }
    return 0;
}
```

First of all, clone and build Honggfuzz:

```
git clone https://github.com/google/honggfuzz
cd honggfuzz && make
```

Build main.c with the fuzzer’s compiler wrapper:

```
<honggfuzz_dir>/hfuzz_cc/hfuzz-cc -g -fsanitize=address,undefined -O3 main.c -o main
```

Run the fuzzer with an empty corpus (it will start from random data):

```
mkdir corpus crashes
<honggfuzz_dir>/honggfuzz -P -n $(nproc) -i corpus/ --crashdir crashes/ --max_file_size 10 -- ./main
```

It will almost instantly find the first crashes and save them in the crashes/ directory.

Reproduce and debug the bug:

```
./main crashes/<some_crash_file>
```

For debugging you will probably want to rebuild ./main with optimizations disabled and launch it under gdb.