My favourite C tooling is the sanitizer suite from Google. Valgrind is still excellent, but I’ve been reaching for the sanitizers more often than not nowadays. Using the sanitizers does not require running a separate binary like valgrind (although you need to build your binary with the sanitizer libraries), and they run faster to boot.
On my Arch Linux machine, the sanitizer libraries are available through the gcc-libs package:
/usr/lib/libasan.so is owned by gcc-libs 12.2.1-2
/usr/lib/liblsan.so is owned by gcc-libs 12.2.1-2
/usr/lib/libtsan.so is owned by gcc-libs 12.2.1-2
/usr/lib/libubsan.so is owned by gcc-libs 12.2.1-2
Address Sanitizer
The address sanitizer detects memory errors, including but not limited to use-after-free, heap buffer overflows, and memory leaks.
Let’s take a look at some examples!
Use After Free and Heap Overflow
#include <stdlib.h>

int main(void) {
    char *array = calloc(10, sizeof(*array));
    free(array);
    return array[5];
}
Here we have a classic use-after-free scenario, with the array freed on line 5 yet an element of it used on line 6. After compiling with -fsanitize=address, executing the program results in a massive error output:
=================================================================
==41637==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000015 at pc 0x55e54c91e118 bp 0x7ffd7cb08f60 sp 0x7ffd7cb08f58
READ of size 1 at 0x602000000015 thread T0
#0 0x55e54c91e117 in main /path/to/prog.c:6:12
#1 0x7fb584abc78f (/usr/lib/libc.so.6+0x2378f) (BuildId: 4a4bec3d95a1804443e852958fe59ed461135ce9)
#2 0x7fb584abc849 in __libc_start_main (/usr/lib/libc.so.6+0x23849) (BuildId: 4a4bec3d95a1804443e852958fe59ed461135ce9)
#3 0x55e54c822064 in _start (/path/to/a.out+0x20064) (BuildId: 97282113be450691408ea99c37467b6d5620c439)
0x602000000015 is located 5 bytes inside of 10-byte region [0x602000000010,0x60200000001a)
freed by thread T0 here:
#0 0x55e54c8d85e2 in __interceptor_free.part.0 asan_malloc_linux.cpp.o
#1 0x55e54c91e0da in main /path/to/prog.c:5:5
#2 0x7fb584abc78f (/usr/lib/libc.so.6+0x2378f) (BuildId: 4a4bec3d95a1804443e852958fe59ed461135ce9)
previously allocated by thread T0 here:
#0 0x55e54c8d9951 in __interceptor_calloc (/path/to/a.out+0xd7951) (BuildId: 97282113be450691408ea99c37467b6d5620c439)
#1 0x55e54c91e0cd in main /path/to/prog.c:4:19
#2 0x7fb584abc78f (/usr/lib/libc.so.6+0x2378f) (BuildId: 4a4bec3d95a1804443e852958fe59ed461135ce9)
SUMMARY: AddressSanitizer: heap-use-after-free /path/to/prog.c:6:12 in main
Shadow bytes around the buggy address:
0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[fd]fd fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==41637==ABORTING
That is a lot of output, so I’ll break it down into pieces to help explain it. Here’s the first block:
==41637==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000015 at pc 0x55e54c91e118 bp 0x7ffd7cb08f60 sp 0x7ffd7cb08f58
READ of size 1 at 0x602000000015 thread T0
#0 0x55e54c91e117 in main /path/to/prog.c:6:12
#1 0x7fb584abc78f (/usr/lib/libc.so.6+0x2378f) (BuildId: 4a4bec3d95a1804443e852958fe59ed461135ce9)
#2 0x7fb584abc849 in __libc_start_main (/usr/lib/libc.so.6+0x23849) (BuildId: 4a4bec3d95a1804443e852958fe59ed461135ce9)
#3 0x55e54c822064 in _start (/path/to/a.out+0x20064) (BuildId: 97282113be450691408ea99c37467b6d5620c439)
The first line of the block informs us of some key addresses:
- Bad Memory Address
- 0x602000000015
- Program Counter (pc)
- 0x55e54c91e118
- Base Pointer (bp)
- 0x7ffd7cb08f60
- Stack Pointer (sp)
- 0x7ffd7cb08f58
Right after, we get info about the location of the “use” in “use after free”.
We see that we have an invalid read of 1 byte at the listed address 0x602000000015.
The stack trace given right below shows us exactly where the read occurred: line 6, column 12 of prog.c, in the main function. Neat!
If you do not get line numbers in the stack trace, make sure debug information is included during compilation (usually through the -g flag).
0x602000000015 is located 5 bytes inside of 10-byte region [0x602000000010,0x60200000001a)
freed by thread T0 here:
#0 0x55e54c8d85e2 in __interceptor_free.part.0 asan_malloc_linux.cpp.o
#1 0x55e54c91e0da in main /path/to/prog.c:5:5
#2 0x7fb584abc78f (/usr/lib/libc.so.6+0x2378f) (BuildId: 4a4bec3d95a1804443e852958fe59ed461135ce9)
previously allocated by thread T0 here:
#0 0x55e54c8d9951 in __interceptor_calloc (/path/to/a.out+0xd7951) (BuildId: 97282113be450691408ea99c37467b6d5620c439)
#1 0x55e54c91e0cd in main /path/to/prog.c:4:19
#2 0x7fb584abc78f (/usr/lib/libc.so.6+0x2378f) (BuildId: 4a4bec3d95a1804443e852958fe59ed461135ce9)
These two blocks show the offset of the invalid access in the previously allocated (then freed) block. We also get two stack traces representing the deallocation and allocation of the block respectively.
In the example, we see that the byte read at 0x602000000015 is located inside the 10-byte block allocated starting at 0x602000000010. This 10-byte block corresponds to the 10-element char array in the C code. We see that the array was freed on line 5, column 5 of main, and allocated on line 4, column 19.
Shadow bytes around the buggy address:
0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[fd]fd fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Lastly is the summary block, displaying what we’ve learned so far in a more graphical view. The block of bytes shown are shadow bytes, which are special bytes filled in by ASAN to catch invalid accesses.
The meaning of each byte value is given in the shadow byte legend, displayed at the end of the error output.
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==41637==ABORTING
Here, more information is revealed. We see that one shadow byte in the summary represents 8 bytes in the application. That also explains the weird addresses: they aren’t addresses of application memory, but addresses in shadow memory, each of which maps to 8 bytes of actual memory in the program!
The addresses are mapped like so:
Shadow address | Application addresses
---------------- | ---------------------------------
0x0c047fff8000 | 0x602000000000 - 0x602000000007
0x0c047fff8001 | 0x602000000008 - 0x60200000000F
0x0c047fff8002 | 0x602000000010 - 0x602000000017
...
The location of the invalid access (marked by square brackets) is at the shadow address 0x0c047fff8002, which corresponds to real addresses 0x602000000010 to 0x602000000017.
This range contains our invalid access at 0x602000000015, so we know we’re on the right track.
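The mapping in the table can be reproduced with a little arithmetic. The following is a minimal sketch, assuming the default x86-64 Linux layout (8 application bytes per shadow byte, shadow memory at offset 0x7fff8000); other platforms use different constants, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed x86-64 Linux defaults; other platforms use different values. */
#define SHADOW_SCALE  3                      /* 2^3 = 8 application bytes per shadow byte */
#define SHADOW_OFFSET UINT64_C(0x7fff8000)   /* where shadow memory starts */

/* Hypothetical helper: shadow address describing the given application address. */
uint64_t shadow_address(uint64_t app_address) {
    return (app_address >> SHADOW_SCALE) + SHADOW_OFFSET;
}

/* Hypothetical helper: first application address covered by a shadow address. */
uint64_t first_app_address(uint64_t shadow) {
    return (shadow - SHADOW_OFFSET) << SHADOW_SCALE;
}
```

Under these assumptions, shadow_address(0x602000000015) gives 0x0c047fff8002, the bracketed byte in the dump, and first_app_address(0x0c047fff8002) gives 0x602000000010, the start of the array.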
According to the shadow byte legend, the bytes we are accessing are fd bytes, representing “freed heap region”. This makes sense, as we did free the array before indexing it. The surrounding bytes are marked fa (“heap left redzone”), which represents inaccessible bytes around an allocation.
If we accessed these fa bytes, we would get a heap buffer overflow. We can see that in action by indexing past an allocated array:
#include <stdlib.h>

int main(void) {
    char *array = calloc(10, sizeof(*array));
    return array[25];
}
Removing uninteresting output previously covered:
=================================================================
==6899==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000029 at pc 0x558dd040d10f bp 0x7fffc3788d50 sp 0x7fffc3788d48
...
SUMMARY: AddressSanitizer: heap-buffer-overflow /path/to/prog.c:5:12 in main
Shadow bytes around the buggy address:
...
0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa 00 02 fa[fa]fa fa fa fa fa fa fa fa fa fa
0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
...
Notice that we are not freeing the array before indexing, but are indexing an invalid location in memory after the allocated block, in the fa shadow byte.
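For comparison, here is a minimal sketch of how both bugs above could be avoided: the index is checked against the array length, and the element is copied out before the array is freed. The function name read_element and the constant ARRAY_LEN are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>

#define ARRAY_LEN 10

/* Hypothetical helper: returns array[index], or -1 if the index is out of
 * bounds or the allocation fails. */
int read_element(size_t index) {
    char *array = calloc(ARRAY_LEN, sizeof(*array));
    if (array == NULL || index >= ARRAY_LEN) {
        free(array);  /* free(NULL) is a no-op */
        return -1;
    }
    int value = array[index];  /* read while the allocation is still live */
    free(array);               /* array is not touched after this point */
    return value;
}
```

With this version, read_element(5) reads a zeroed byte and read_element(25) is rejected instead of touching a redzone.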
Memory Leaks
Let’s leak some memory!
#include <stdlib.h>

void leaky(void) {
    char *array = calloc(10, sizeof(*array));
}

int main(void) {
    leaky();
    return EXIT_SUCCESS;
}
Here, an array of 10 bytes is dynamically allocated on line 4 but never freed before the program exits. Doing the ol’ compileroni:
=================================================================
==7871==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 10 byte(s) in 1 object(s) allocated from:
#0 0x56284121f951 in __interceptor_calloc (/path/to/a.out+0xd7951) (BuildId: 0fe30c0ddc3b7cc7322a206ccc544b39c1e6b8ce)
#1 0x5628412640c6 in leaky /path/to/prog.c:4:19
#2 0x5628412640f3 in main /path/to/prog.c:8:5
#3 0x7efcb6ca578f (/usr/lib/libc.so.6+0x2378f) (BuildId: 4a4bec3d95a1804443e852958fe59ed461135ce9)
SUMMARY: AddressSanitizer: 10 byte(s) leaked in 1 allocation(s).
We get the amount of memory leaked (10 bytes), as well as the stack trace involving the allocation that was leaked.
It might be surprising that the address sanitizer also checks for leaks in addition to checking for invalid memory accesses, but the leak sanitizer was integrated into Address Sanitizer a while back. To only use the leak sanitizer (usually for performance reasons), build with -fsanitize=leak instead of -fsanitize=address. This will link your program with liblsan instead of libasan.
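One way to make the leak report go away is to give the allocation a clear owner. The sketch below is only an illustration with hypothetical names (make_array, use_array): the allocating function hands the buffer to its caller, and the caller frees it.

```c
#include <stdlib.h>

/* Hypothetical replacement for leaky(): the caller owns the returned buffer. */
char *make_array(size_t count) {
    return calloc(count, sizeof(char));
}

/* Allocates, uses, and releases the buffer; returns 0 on success, -1 on failure. */
int use_array(void) {
    char *array = make_array(10);
    if (array == NULL) {
        return -1;
    }
    array[0] = 'x';
    free(array);  /* every successful allocation is paired with a free */
    return 0;
}
```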
Undefined Behavior Sanitizer
The undefined behavior sanitizer catches, you guessed it, undefined behavior at runtime. The list of UB detected is quite comprehensive, so I’ll just list a few noteworthy examples and the corresponding output. Compiling the examples is done similarly to the address sanitizer, just with -fsanitize=undefined.
- Signed integer overflow

    #include <stdlib.h>
    #include <stdint.h>

    int main(void) {
        int a = INT32_MAX;
        a++;
        return EXIT_SUCCESS;
    }

  Note: To get a stacktrace at runtime, set the environment variable UBSAN_OPTIONS to print_stacktrace=1.

    prog.c:6:6: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
    #0 0x564cd011f565 in main /path/to/prog.c:6:6
    #1 0x7f5cd7bda78f (/usr/lib/libc.so.6+0x2378f) (BuildId: 4a4bec3d95a1804443e852958fe59ed461135ce9)
    #2 0x7f5cd7bda849 in __libc_start_main (/usr/lib/libc.so.6+0x23849) (BuildId: 4a4bec3d95a1804443e852958fe59ed461135ce9)
    #3 0x564cd00ef084 in _start (/path/to/a.out+0x4084) (BuildId: 4cec1b5b0a609e14030e079b75a8865ec6700a91)
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior prog.c:6:6 in
- Invalid bitwise shifts

    prog.c:5:15: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'

    int b = 1 << 32;

    prog.c:5:15: runtime error: shift exponent 32 is too large for 32-bit type 'int'
- Use of misaligned pointer

    // Move pointer forward by one byte
    int* int_ptr = &local_int;
    int_ptr = (int*) ((char*) int_ptr + 1);
    // Try to dereference the pointer to an int
    int misaligned_read = *int_ptr;

    prog.c:11:27: runtime error: load of misaligned address 0x7ffd3f4e3865 for type 'int', which requires 4 byte alignment
    0x7ffd3f4e3865: note: pointer points here
    00 00 00 78 56 34 12 00 ab 51 ca bb 85 fe c1 01 00 00 00 00 00 00 00 90 e7 83 52 cb 7f 00 00 00
    ^
- Array index out of bounds

    int a[10] = {0};
    int b = a[11];

    prog.c:6:13: runtime error: index 11 out of bounds for type 'int[10]'
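The first two classes of error can be guarded against in plain C by checking operands before the operation. This is a rough sketch rather than a recommended library; the helper names checked_add and checked_shl are hypothetical, and checked_shl is deliberately conservative (it only accepts non-negative values and assumes int has no padding bits).

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *result and returns true, or returns false if the sum
 * would overflow int. The checks themselves never overflow. */
bool checked_add(int a, int b, int *result) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *result = a + b;
    return true;
}

/* Stores value << shift in *result and returns true, or returns false if the
 * value is negative, the shift count is too large, or bits would be lost. */
bool checked_shl(int value, unsigned shift, int *result) {
    const unsigned width = (unsigned)(sizeof(int) * CHAR_BIT);
    if (value < 0 || shift >= width - 1 || value > (INT_MAX >> shift)) {
        return false;
    }
    *result = value << shift;
    return true;
}
```

With 32-bit int, checked_add(INT_MAX, 1, &r) and checked_shl(1, 31, &r) both return false, matching the two runtime errors shown above, while ordinary inputs pass through unchanged.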
Thread Sanitizer
The thread sanitizer detects data races in multithreaded code. A data race happens when two threads access the same piece of data at the same time without synchronization, and at least one of the accesses is a write.
As usual, we will write buggy code and hope the sanitizer catches the errors we make.
Since the examples are all in C so far, we will use the pthread library to create multiple threads.
#include <stdlib.h>
#include <pthread.h>

int global_variable;

void* naughty_access(void* arg) {
    // Access global variable without mutual exclusion
    global_variable = 1;

    return NULL;
}

int main(void) {
    // Spawn naughty thread
    pthread_t child_thread = 0;
    pthread_create(&child_thread, NULL, naughty_access, NULL);

    // Access global variable without mutual exclusion
    global_variable = 2;

    // Wait for child to exit
    pthread_join(child_thread, NULL);

    return EXIT_SUCCESS;
}
Here, we have two threads that concurrently write to the same global variable without a lock. Thread sanitizer, unsurprisingly, catches this issue:
==================
WARNING: ThreadSanitizer: data race (pid=24684)
Write of size 4 at 0x55905a34b368 by main thread:
#0 main /path/to/prog.c:19:21 (a.out+0xe6f50) (BuildId: 86a58ccf01eab51a337e278cfe55e47aa9be92e9)
Previous write of size 4 at 0x55905a34b368 by thread T1:
#0 naughty_access /path/to/prog.c:8:21 (a.out+0xe6ed8) (BuildId: 86a58ccf01eab51a337e278cfe55e47aa9be92e9)
Location is global 'global_variable' of size 4 at 0x55905a34b368 (a.out+0x1490368)
Thread T1 (tid=24691, finished) created by main thread at:
#0 pthread_create <null> (a.out+0x671a6) (BuildId: 86a58ccf01eab51a337e278cfe55e47aa9be92e9)
#1 main /path/to/prog.c:16:5 (a.out+0xe6f44) (BuildId: 86a58ccf01eab51a337e278cfe55e47aa9be92e9)
SUMMARY: ThreadSanitizer: data race /path/to/prog.c:19:21 in main
==================
ThreadSanitizer: reported 1 warnings
The error shows two writes to the global global_variable: once by the main function at line 19, column 21, and once by the naughty_access function at line 8, column 21. Not only that, the output also shows where thread T1 was created in the main thread (line 16, column 5).
Let’s try to fix this by introducing a mutex!
#include <stdlib.h>
#include <pthread.h>

int global_variable;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void* naughty_access(void* arg) {
    // Access global variable with mutual exclusion
    pthread_mutex_lock(&lock);
    global_variable = 1;
    pthread_mutex_unlock(&lock);
    pthread_mutex_unlock(&lock);

    return NULL;
}

int main(void) {
    // Spawn naughty thread
    pthread_t child_thread = 0;
    pthread_create(&child_thread, NULL, naughty_access, NULL);

    // Access global variable with mutual exclusion
    pthread_mutex_lock(&lock);
    global_variable = 2;
    pthread_mutex_unlock(&lock);

    // Wait for child to exit
    pthread_join(child_thread, NULL);

    return EXIT_SUCCESS;
}
We screwed up; the function naughty_access unlocks the mutex twice. Fortunately, TSAN catches this issue:
==================
WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread) (pid=25009)
#0 pthread_mutex_unlock <null> (a.out+0x7b191) (BuildId: 7fbfa7c6e76b8a3891a8c08351cba1f5ae34ed6f)
#1 naughty_access /path/to/prog.c:12:5 (a.out+0xe6f06) (BuildId: 7fbfa7c6e76b8a3891a8c08351cba1f5ae34ed6f)
Location is global 'lock' of size 40 at 0x55f4e7e83368 (a.out+0x1491368)
Mutex M0 (0x55f4e7e83368) created at:
#0 pthread_mutex_lock <null> (a.out+0x94281) (BuildId: 7fbfa7c6e76b8a3891a8c08351cba1f5ae34ed6f)
#1 main /path/to/prog.c:23:5 (a.out+0xe6f74) (BuildId: 7fbfa7c6e76b8a3891a8c08351cba1f5ae34ed6f)
SUMMARY: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread) (/path/to/a.out+0x7b191) (BuildId: 7fbfa7c6e76b8a3891a8c08351cba1f5ae34ed6f) in pthread_mutex_unlock
==================
ThreadSanitizer: reported 1 warnings
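For reference, a version with exactly one unlock per lock might look like the sketch below. The names polite_access and run_locked_writers are hypothetical, and the two-writer scenario is wrapped in a function so it can be called from a test or from main.

```c
#include <pthread.h>
#include <stddef.h>

static int global_variable;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread entry point: writes under the mutex and unlocks exactly once. */
static void *polite_access(void *arg) {
    (void)arg;  /* unused */
    pthread_mutex_lock(&lock);
    global_variable = 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs the two-writer scenario. Returns the final value (1 or 2, depending
 * on scheduling), or -1 if the thread could not be created. */
int run_locked_writers(void) {
    pthread_t child_thread;
    if (pthread_create(&child_thread, NULL, polite_access, NULL) != 0) {
        return -1;
    }

    pthread_mutex_lock(&lock);
    global_variable = 2;
    pthread_mutex_unlock(&lock);

    /* After the join, no other thread can touch global_variable. */
    pthread_join(child_thread, NULL);
    return global_variable;
}
```

Which write lands last still depends on scheduling, but both writes are now ordered by the mutex, so neither the data race nor the double unlock remains.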
There are other bugs that thread sanitizer can catch. One dangerous and sometimes hard-to-catch bug is known as lock order inversion. The easiest way to demonstrate this bug is with two mutexes, where the lock order of the mutexes is opposite in each thread:
#include <stdlib.h>
#include <pthread.h>

int global_variable;
pthread_mutex_t lock_1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_2 = PTHREAD_MUTEX_INITIALIZER;

void* naughty_access(void* arg) {
    // Get lock 1, then lock 2
    pthread_mutex_lock(&lock_1);
    pthread_mutex_lock(&lock_2);
    global_variable = 1;
    pthread_mutex_unlock(&lock_2);
    pthread_mutex_unlock(&lock_1);

    return NULL;
}

int main(void) {
    // Spawn naughty thread
    pthread_t child_thread = 0;
    pthread_create(&child_thread, NULL, naughty_access, NULL);

    // Get lock 2, then lock 1
    pthread_mutex_lock(&lock_2);
    pthread_mutex_lock(&lock_1);
    global_variable = 2;
    pthread_mutex_unlock(&lock_1);
    pthread_mutex_unlock(&lock_2);

    // Wait for child to exit
    pthread_join(child_thread, NULL);

    return EXIT_SUCCESS;
}
With the option second_deadlock_stack=1, TSAN gives us a wealth of information:
==================
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=26510)
Cycle in lock order graph: M0 (0x56507bb81368) => M1 (0x56507bb81390) => M0
Mutex M1 acquired here while holding mutex M0 in thread T1:
#0 pthread_mutex_lock <null> (a.out+0x94281) (BuildId: 656741c529d9ded1df2ab184e54ea24e7bdd1b03)
#1 naughty_access /path/to/prog.c:11:5 (a.out+0xe6ee4) (BuildId: 656741c529d9ded1df2ab184e54ea24e7bdd1b03)
Mutex M0 previously acquired by the same thread here:
#0 pthread_mutex_lock <null> (a.out+0x94281) (BuildId: 656741c529d9ded1df2ab184e54ea24e7bdd1b03)
#1 naughty_access /path/to/prog.c:10:5 (a.out+0xe6ed8) (BuildId: 656741c529d9ded1df2ab184e54ea24e7bdd1b03)
Mutex M0 acquired here while holding mutex M1 in main thread:
#0 pthread_mutex_lock <null> (a.out+0x94281) (BuildId: 656741c529d9ded1df2ab184e54ea24e7bdd1b03)
#1 main /path/to/prog.c:26:5 (a.out+0xe6f94) (BuildId: 656741c529d9ded1df2ab184e54ea24e7bdd1b03)
Mutex M1 previously acquired by the same thread here:
#0 pthread_mutex_lock <null> (a.out+0x94281) (BuildId: 656741c529d9ded1df2ab184e54ea24e7bdd1b03)
#1 main /path/to/prog.c:25:5 (a.out+0xe6f84) (BuildId: 656741c529d9ded1df2ab184e54ea24e7bdd1b03)
Thread T1 (tid=26517, finished) created by main thread at:
#0 pthread_create <null> (a.out+0x671a6) (BuildId: 656741c529d9ded1df2ab184e54ea24e7bdd1b03)
#1 main /path/to/prog.c:22:5 (a.out+0xe6f74) (BuildId: 656741c529d9ded1df2ab184e54ea24e7bdd1b03)
SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) (/path/to/a.out+0x94281) (BuildId: 656741c529d9ded1df2ab184e54ea24e7bdd1b03) in __interceptor_pthread_mutex_lock
==================
ThreadSanitizer: reported 1 warnings
Looking at the output, we see a warning that thread T1 locks M1 while holding M0, but the main thread locks M0 while holding M1. Therefore, there is a risk of deadlock if both threads lock their first lock and wait on the second lock held by the other thread. Cool!
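The usual remedy for a lock order inversion is to pick one global order and follow it everywhere. Here is a minimal sketch of that idea, with hypothetical names (lock_a, lock_b, set_shared_value, run_ordered_writers): both threads go through a single helper that always takes the locks in the same order.

```c
#include <pthread.h>
#include <stddef.h>

static int shared_value;
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* Every caller takes lock_a first, then lock_b, and releases in reverse. */
static void set_shared_value(int value) {
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    shared_value = value;
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
}

/* Thread entry point for the child thread. */
static void *child_main(void *arg) {
    (void)arg;  /* unused */
    set_shared_value(1);
    return NULL;
}

/* Runs both writers. Returns the final value (1 or 2, depending on
 * scheduling), or -1 if the thread could not be created. */
int run_ordered_writers(void) {
    pthread_t child_thread;
    if (pthread_create(&child_thread, NULL, child_main, NULL) != 0) {
        return -1;
    }
    set_shared_value(2);
    pthread_join(child_thread, NULL);
    return shared_value;
}
```

Because no thread ever holds lock_b while waiting for lock_a, the lock order graph has no cycle and the two threads cannot deadlock on these mutexes.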
TSAN is not perfect, though. There are still trivial bugs that aren’t fixed, and as far as I can tell, development has slowed to a crawl. For example, this is valid and correct (but dumb) code that produces a warning from TSAN:
#include <stdlib.h>
#include <pthread.h>

int global_variable;
pthread_mutex_t lock_1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_2 = PTHREAD_MUTEX_INITIALIZER;

int main(void) {
    pthread_mutex_lock(&lock_2);
    pthread_mutex_lock(&lock_1);
    global_variable = 1;
    pthread_mutex_unlock(&lock_1);
    pthread_mutex_unlock(&lock_2);

    pthread_mutex_lock(&lock_1);
    pthread_mutex_lock(&lock_2);
    global_variable = 1;
    pthread_mutex_unlock(&lock_2);
    pthread_mutex_unlock(&lock_1);

    return EXIT_SUCCESS;
}
All in all, the C sanitizer libraries are worth using during development, with benefits far outweighing the drawbacks. Start using them in your projects today!