A Look at AFL++ Under The Hood

How this post is structured

The objective of this post is to allow anyone to gain an understanding of AFL at the level they want. I want to cover AFL at both a usage level and an internals level.

At the end of this article, there are In-Depth sections that cover AFL in even more depth.

One additional note. In the code snippets, I often use ... to replace certain code. This is to increase readability by eliminating the edge case code. I have linked the source code on all code snippets if you are interested in reading that code.

This is not a user’s guide to AFL. For that, I highly recommend checking out Fuzzing101 by GitHub Security Lab. This post is more targeted at those interested in hacking on AFL or learning a little bit more about the world’s favorite fuzzer.

Disclaimer: I am not a developer on the AFL++ project. This is just my analysis of the source code.

What is AFL++ and Coverage-Based Fuzzing?

AFL, or the current variation AFL++, is a state-of-the-art fuzzer used to fuzz a wide variety of binaries. A large share of fuzzing campaigns today are run with AFL or some variation of AFL.

AFL is not a random-input fuzzer. Instead, AFL does something called coverage-based fuzzing. The idea behind coverage-based fuzzing is to keep track of which areas of the binary are executed, known as coverage. By keeping track of this information, we are able to figure out which inputs lead to which parts of the code executing. With this, we can develop a database of inputs that covers not just a small subset of the codebase but, ideally, the entire codebase. This would allow us to find errors everywhere in the code, not only in the most commonly used codepaths.

However, how do we build that database of inputs - or corpus? The answer is iteration. Typically, we start with a few manually generated seed inputs. These seed inputs are then mutated (randomly changed) to see if they cause a change in coverage. If they cause a change in coverage (among other potential factors), they are deemed interesting and stored in our corpus for further mutation. Eventually, after iteratively mutating our growing corpus, we will ideally have a corpus that covers the entire codebase.

Hopefully, we will be able to mutate an input in such a way that we will generate an error in the codepath targeted by that input.

Let’s see an example of coverage given some example code:

int main(){
	char *input = get_input(); //stub that represents getting the input. Could be via a file, via stdin, or some other means.
	if(condition A){
		...code A...
	}else if(condition B){
		...code B...
	}else{
		...code C...
	}
	return 0;
}

Suppose we enter our seed input into the program. This input - seed - leads to code C being executed.

Suppose we then mutate seed in some way to generate seed_A. This seed_A, upon being inputted, leads to code A being executed. This is new. With the knowledge that seed_A leads to code A being executed, we can store seed_A in our corpus and mutate it further to allow for new, different code to execute.

This iterative process of finding new, “interesting” input and building on it is what makes coverage-based fuzzing so powerful.
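To make the loop concrete, here is a minimal sketch of the idea in C. It is a toy, not AFL++'s actual algorithm: the names (toy_fuzz, target, cov, seen) are made up for illustration, the "coverage map" has one slot per branch of the example above, and an input counts as interesting only if it reaches a branch that has not been seen before.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAP_LEN    4   /* toy coverage map: slots 1..3 stand for code A/B/C */
#define INPUT_LEN  4
#define MAX_CORPUS 16

static uint8_t cov[MAP_LEN];   /* coverage recorded by the most recent run */

/* Toy target mirroring the example: the first byte picks the branch. */
static void target(const uint8_t *in) {
  if (in[0] == 'A')      cov[1]++;   /* code A */
  else if (in[0] == 'B') cov[2]++;   /* code B */
  else                   cov[3]++;   /* code C */
}

/* Mutates corpus entries `iters` times and returns the final corpus size. */
int toy_fuzz(unsigned seed, int iters) {
  uint8_t corpus[MAX_CORPUS][INPUT_LEN];
  uint8_t seen[MAP_LEN] = {0};   /* branches reached by any corpus entry */
  int n = 0;

  /* The seed input "xxxx" reaches code C. */
  memset(corpus[n++], 'x', INPUT_LEN);
  memset(cov, 0, sizeof cov);
  target(corpus[0]);
  for (int i = 0; i < MAP_LEN; i++)
    if (cov[i]) seen[i] = 1;

  srand(seed);
  for (int it = 0; it < iters && n < MAX_CORPUS; it++) {
    /* Pick a corpus entry and randomly change one byte of it. */
    uint8_t cand[INPUT_LEN];
    memcpy(cand, corpus[rand() % n], INPUT_LEN);
    cand[rand() % INPUT_LEN] = (uint8_t)(rand() % 256);

    memset(cov, 0, sizeof cov);
    target(cand);

    /* Keep the candidate only if it reached something new. */
    int interesting = 0;
    for (int i = 0; i < MAP_LEN; i++)
      if (cov[i] && !seen[i]) { seen[i] = 1; interesting = 1; }
    if (interesting) memcpy(corpus[n++], cand, INPUT_LEN);
  }
  return n;
}
```

Since the toy target has only three branches, the corpus can never grow beyond three entries, no matter how long the loop runs.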

AFL++ Architecture

AFL++ starts when the afl-fuzz binary is run. The command will usually look something like this:

afl-fuzz -i [input] -o [output] -- /path/to/fuzzed/binary [@@]

The @@ is a placeholder that afl-fuzz replaces with the path of the input file that the process reads from. If @@ is omitted, then AFL assumes that the process takes in input using stdin (or shared memory).

-i is the directory containing the seed inputs.

-o is the directory in which the output will be placed.

When afl-fuzz is run, a series of initialization functions are run. After that, afl-fuzz forks, and the child process execs the target binary. However, the fuzzing itself does not happen in this child process. Instead, the child process is stopped right before main and serves as a “forkserver”. This forkserver then forks again, and the fuzzing happens in the grandchild process. The reason for this is that forking is much faster than exec, so it is significantly faster to perform fuzzing in the grandchild than in the child. We will discuss this process in more detail later.
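The core trick can be sketched with plain fork and waitpid. This is a simplified illustration rather than AFL++'s forkserver code: run_in_child, ok_case, and crash_case are hypothetical names, and a real forkserver forks an already-initialized copy of the target instead of calling a function pointer.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run one test case in a forked child and return the
 * raw wait status, or -1 if fork/waitpid failed. The parent never runs the
 * test case itself, so a crash cannot take the parent down. */
int run_in_child(int (*test_case)(void)) {
  pid_t pid = fork();
  if (pid < 0) return -1;

  if (pid == 0) {
    /* Child: inherits the parent's already-initialized state for free. */
    _exit(test_case());
  }

  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return status;
}

/* Two made-up test cases: one exits cleanly, one crashes. */
int ok_case(void)    { return 0; }
int crash_case(void) { abort(); }
```

The parent can tell the two outcomes apart with WIFEXITED and WIFSIGNALED on the returned status, which is how a crash in the grandchild becomes visible without the forkserver itself dying.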

In order for afl-fuzz to communicate with the target binary, two pipes are created: a control pipe and a status pipe (Source). The control pipe is located at FORKSRV_FD (Usually 198) and is read-only for the target binary and write-only for afl-fuzz. The status pipe is located at FORKSRV_FD+1 and is write-only by the target binary and read-only by afl-fuzz.

The control pipe is used to send control messages to the target binary. The status pipe is used to send status messages to afl-fuzz.
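As a rough sketch of this two-pipe layout, the following C function creates a control pipe and a status pipe, forks, and exchanges one 4-byte message in each direction. Everything here is an assumption made for illustration: pipe_round_trip is a made-up name, the reply value is invented, and the pipes are used directly instead of being moved to FORKSRV_FD and FORKSRV_FD+1.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends ctl_msg over the control pipe and returns the 4-byte status the
 * child writes back over the status pipe, or -1 on error. */
int64_t pipe_round_trip(uint32_t ctl_msg) {
  int ctl[2], st[2];   /* [0] is the read end, [1] is the write end */
  if (pipe(ctl) < 0) return -1;
  if (pipe(st) < 0) { close(ctl[0]); close(ctl[1]); return -1; }

  pid_t pid = fork();
  if (pid < 0) {
    close(ctl[0]); close(ctl[1]); close(st[0]); close(st[1]);
    return -1;
  }

  if (pid == 0) {
    /* "Target" side: reads control, writes status. */
    close(ctl[1]);
    close(st[0]);
    uint32_t msg = 0;
    if (read(ctl[0], &msg, 4) != 4) _exit(1);
    uint32_t reply = msg ^ 0xffffffffu;   /* invented reply, for the demo */
    if (write(st[1], &reply, 4) != 4) _exit(1);
    _exit(0);
  }

  /* "afl-fuzz" side: writes control, reads status. */
  close(ctl[0]);
  close(st[1]);
  uint32_t status = 0;
  int ok = write(ctl[1], &ctl_msg, 4) == 4 &&
           read(st[0], &status, 4) == 4;
  close(ctl[1]);
  close(st[0]);
  waitpid(pid, NULL, 0);
  return ok ? (int64_t)status : -1;
}
```

Each side closes the pipe ends it does not use, which gives the one-directional behavior described above: the control pipe only flows toward the target, and the status pipe only flows back.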

Now that we have a high-level understanding of what afl-fuzz does, let’s take a closer look at what the binary looks like and what it is doing.

Coverage Instrumentation

In order for AFL++ to work properly (at least in its normal operating mode), the binary must be compiled using a special AFL compiler. This is so that AFL can insert special instructions to keep track of coverage. These special instructions are known as instrumentation. Typically, instrumentation instructions access what is known as a coverage map. This coverage map is a record of what parts of the code have been accessed.

AFL currently supports many forms of coverage, but the 2 most common and well-supported are PCGUARD and LTO.

Let’s start with PCGUARD. To use PCGUARD instrumentation, you must use the afl-clang-fast compiler. To understand PCGUARD, let’s take a look at a binary compiled with PCGUARD in Ghidra/lldb:

Decompiled C:

Assembly:

In AFL, the coverage map is __afl_area_ptr. __afl_area_ptr is an array that is accessed every time a new area is reached. This can be seen in the decompiled C. Notice how there is an access to __afl_area_ptr at the top of the function, in the if statement, and in the else statement.

In PCGUARD, the index of __afl_area_ptr accessed is determined at runtime. Specifically, the value of the DAT_* variable is populated at runtime. This can be seen in the assembly with the movsxd where the index is being loaded into rax. We will see this process in more detail when we discuss initialization.
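Written by hand in C, the effect of this instrumentation looks roughly like the sketch below. The names are stand-ins chosen for illustration: toy_area_ptr plays the role of __afl_area_ptr, and the guard_* variables play the role of the DAT_* variables, which are ordinary variables read at runtime rather than constants baked into the instruction.

```c
#include <stdint.h>

/* Stand-in for the coverage map that __afl_area_ptr points to. */
uint8_t  toy_area[16];
uint8_t *toy_area_ptr = toy_area;

/* Stand-ins for the DAT_* guard variables. The values 4, 5, 6 are set by
 * hand here; in a real binary they are filled in during initialization. */
uint32_t guard_entry = 4;
uint32_t guard_if    = 5;
uint32_t guard_else  = 6;

/* A small function with the same shape as the decompiled example: one map
 * access at the top, one in the if branch, one in the else branch. */
int instrumented(int x) {
  toy_area_ptr[guard_entry]++;
  if (x > 0) {
    toy_area_ptr[guard_if]++;
    return 1;
  } else {
    toy_area_ptr[guard_else]++;
    return -1;
  }
}
```

After calling instrumented(1) twice and instrumented(-3) once, the map holds 3 at index 4, 2 at index 5, and 1 at index 6, so the map alone tells us which branches ran and how often.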

As an aside, notice the adc dl, 0 in the assembly. This is to ensure that if add dl, 1 wraps around, dl != 0. If add dl, 1 wraps around then CF=1. So, adc dl, 0 will result in dl=1.
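The same add/adc pair can be modeled in C on an 8-bit counter. This is only a model of the two instructions, with never_zero_inc as a made-up name; the carry flag is simulated by checking whether the addition wrapped to zero.

```c
#include <stdint.h>

/* Models `add dl, 1` followed by `adc dl, 0` on an 8-bit counter. */
uint8_t never_zero_inc(uint8_t counter) {
  uint8_t sum   = (uint8_t)(counter + 1);  /* add dl, 1 (wraps at 255) */
  uint8_t carry = (sum == 0);              /* CF is set only on 255 -> 0 */
  return (uint8_t)(sum + carry);           /* adc dl, 0 */
}
```

For any counter below 255 this is a plain increment. For 255, the add wraps to 0 with the carry set, and the adc turns that 0 into 1, so a location that has been hit never shows up in the map as "not hit".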

Now let’s discuss LTO. LTO coverage can be achieved using the afl-clang-lto compiler. LTO coverage is very similar to PCGUARD except the index into __afl_area_ptr is populated at compile-time instead of runtime.

This can be seen in the decompiled C of an LTO binary:

Notice how the index into __afl_area_ptr is an immediate instead of a variable.

For more information on coverage, how it is inserted, and general implementation details I recommend checking out the In-Depth: Coverage Instrumentation and LLVM Hell section which covers the internals of the AFL compiler.

Initialization and Forkserver

Before we discuss the initialization process, we need to discuss a feature within ELFs known as the init array.

When a binary is run, main is not the first code to execute. Instead, we always start with the interpreter (usually ld.so). The interpreter sets up many key structures for processes like dynamic linking. However, a lesser known feature of the interpreter is its ability to run initialization functions. These initialization functions are located in a section called .init_array. This section contains an array of functions that are run in order right before main (Source 1, Source 2).

The functions in .init_array depend on whether the binary was compiled with PCGUARD or LTO (however, they are mostly the same).

As we discuss the functions in .init_array, there will be many variables, environment variables, and macros. For ease of use, I have placed all their definitions below:
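One common way to get a function into .init_array is the constructor attribute, a GCC/Clang extension. The sketch below assumes an ELF target built with one of those compilers; first_init, second_init, init_log, and init_calls are hypothetical names, and the priorities 101 and 102 are arbitrary values picked to force an order (lower numbers run first, and values up to 100 are reserved).

```c
/* Records the order in which the constructors ran. */
int init_log[2];
int init_calls = 0;

/* Runs before main; the lower priority number means it runs first. */
__attribute__((constructor(101))) void first_init(void) {
  init_log[init_calls++] = 1;
}

/* Also runs before main, after first_init. */
__attribute__((constructor(102))) void second_init(void) {
  init_log[init_calls++] = 2;
}
```

By the time main starts, init_calls is already 2 and init_log holds 1 followed by 2, even though nothing in main called either function.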

Regular Variables:

__afl_area_ptr - The coverage map pointer. By default this is __afl_area_initial. The size of __afl_area_initial is MAP_INITIAL_SIZE (Source)
__afl_final_loc - The last index in __afl_area_ptr accessed by instrumentation
__afl_map_addr - The address at which the coverage map will be mmap’d. As far as I know, this only really exists when AFL_LLVM_MAP_ADDR is set in LTO mode. Otherwise, it is 0.
__afl_map_size - The size of the coverage map
__afl_area_initial - The coverage map used before shared memory is mapped and if shared memory is not accessible (i.e. we are not running under AFL). This is created as an array in afl-compiler-rt.o.c

Environment Variables:

__AFL_SHM_ID (Aliased to SHM_ENV_VAR) - Shared memory ID for the coverage map.
__AFL_SHM_FUZZ_ID (Aliased to SHM_FUZZ_ENV_VAR) - Shared memory ID for shared memory fuzzing.
AFL_MAP_SIZE - Used to set the size of the shared memory buffer allocated by afl-fuzz.
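To show how these pieces relate, here is a simplified sketch of a target attaching to the coverage map. It is not the AFL++ runtime code: toy_map_coverage, toy_area_initial, and TOY_INITIAL_SIZE are made-up names, the size is arbitrary, and quietly falling back to the local array on any failure is an assumption made to keep the example short. Only getenv, atoi, and shmat are real library calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/shm.h>

#define TOY_INITIAL_SIZE 256   /* arbitrary; not AFL++'s MAP_INITIAL_SIZE */

/* Stand-in for __afl_area_initial: used when no shared memory is available. */
uint8_t toy_area_initial[TOY_INITIAL_SIZE];

/* Returns the coverage map to use: the shared segment named by the
 * __AFL_SHM_ID environment variable if it can be attached, otherwise the
 * local fallback array (for example when not running under afl-fuzz). */
uint8_t *toy_map_coverage(void) {
  const char *id_str = getenv("__AFL_SHM_ID");
  if (id_str == NULL) return toy_area_initial;

  int   shm_id = atoi(id_str);
  void *map    = shmat(shm_id, NULL, 0);
  if (map == (void *)-1) return toy_area_initial;

  return (uint8_t *)map;
}
```

Run on its own, without afl-fuzz setting the environment variable, the function hands back the local array, which matches the role described for __afl_area_initial above.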

Macros:

MAP_SIZE - a custom value that afl-fuzz can use to force the size of the shared memory map.
MAP_INITIAL_SIZE - size of __afl_area_initial

Let’s start with PCGUARD. The init functions are as follows:

Let’s walk through each function. All of these functions (Except for sancov.module_ctor_trace_pc_guard) can be found in afl-compiler-rt.o.c.

Let’s start with __afl_auto_first. This function does almost nothing: all it does is set __afl_already_initialized_first = 1.

The next function is __afl_auto_second. In the case of PCGUARD, this function does nothing. This is because __afl_final_loc = 0 at this point. We will see that this is actually initialized in the next function.

The next function is sancov.module_ctor_trace_pc_guard. This is actually just a call to __sanitizer_cov_trace_pc_guard_init as seen below. For more information on how this call is constructed, I recommend checking out the “In-Depth: Coverage Instrumentation and LLVM Hell” section.

__sanitizer_cov_trace_pc_guard_init is what initializes __afl_final_loc and the DAT_* variables we saw in the instrumentation. Let’s take a look at the function in more detail:

void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {

  u32   inst_ratio = 100;
  char *x;

  _is_sancov = 1;

  ...

  if (start == stop || *start) return;

  ...

  if (__afl_final_loc < 3) __afl_final_loc = 3;  // we skip the first 4 entries

  *(start++) = ++__afl_final_loc;  // start at 4

  while (start < stop) {

    if (likely(inst_ratio == 100) || R(100) < inst_ratio)
      *start = ++__afl_final_loc;
    else
      *start = 0;  // write to map[0]

    start++;

  }

  ...

}
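To see what values this produces, here is a standalone re-creation of the numbering logic that can be compiled on its own. It is a sketch, not the AFL++ function: toy_guard_init and toy_final_loc are stand-in names, the elided parts are left out, and inst_ratio is assumed to be 100 so every guard gets an index.

```c
#include <stdint.h>

/* Stand-in for __afl_final_loc. */
uint32_t toy_final_loc = 0;

/* Assigns each guard in [start, stop) the next free map index. */
void toy_guard_init(uint32_t *start, uint32_t *stop) {
  /* Nothing to do for an empty range or one that is already numbered. */
  if (start == stop || *start) return;

  /* The first four map entries (0..3) are skipped. */
  if (toy_final_loc < 3) toy_final_loc = 3;

  /* The first guard gets index 4 on the first call. */
  *(start++) = ++toy_final_loc;

  /* Every remaining guard gets the next index in sequence. */
  while (start < stop) *(start++) = ++toy_final_loc;
}
```

Calling it on a zeroed array of three guards fills the array with 4, 5, 6 and leaves toy_final_loc at 6. A second call on the same array returns immediately because the first guard is already non-zero, while a call on a fresh array continues from 7.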
