scvalex.net

16. Debugging Field Guide

I’ve been programming for over a decade now, and, although I don’t usually use debugging tools, there are a few instances where I’ve found them to be indispensable. In this post, we go over a few debugging scenarios where print statements just don’t cut it.

The techniques below work for compiled languages like C and C++ and, sometimes, OCaml and Haskell. They will not work for interpreted or VM-based languages like Python or Java; the interpreters and VMs those languages run on introduce too much abstraction.

Why did it crash?

gcc -ggdb, gdb, r, bt

We first make sure we compile our source with debugging symbols enabled:

% gcc -ggdb -o fib fib.c

Now, we run the program with the GNU debugger, and see where it crashed:

% gdb ./fib
...
Reading symbols from /home/scvalex/proj/tmp/fib...done.

(gdb) r
Starting program: /home/scvalex/proj/tmp/fib

Program received signal SIGSEGV, Segmentation fault.
0x000000000040055c in fib (n=<error reading variable: Cannot access memory at address 0x7fffff7feffc>)
    at fib.c:3
3   int fib(int n) {

Now that we know where, we will find out why. The backtrace is usually enough for this:

(gdb) bt
#0  0x000000000040055c in fib (
    n=<error reading variable: Cannot access memory at address 0x7fffff7feffc>) at fib.c:3
#1  0x000000000040056c in fib (n=-174658) at fib.c:4
#2  0x000000000040056c in fib (n=-174657) at fib.c:4
#3  0x000000000040056c in fib (n=-174656) at fib.c:4
#4  0x000000000040056c in fib (n=-174655) at fib.c:4
...

Every frame is another call to fib, and n has dropped far below zero, so the recursion never reached a base case: fib recursed until it exhausted the stack.

Why did it deadlock?

gcc -ggdb, gdb, r, C-c, i thr, thr X, bt

We compile our source with debugging symbols enabled, and run it in gdb:

% gcc -Wall -ggdb -o phil phil.c -lpthread
% gdb ./phil

After running it a few times, we see it deadlocking, and interrupt execution with C-c:

(gdb) r
Starting program: /home/scvalex/proj/tmp/phil
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7814700 (LWP 27980)]
Philosopher 2 is full.
[New Thread 0x7ffff7013700 (LWP 27981)]
[Thread 0x7ffff7814700 (LWP 27980) exited]
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7bc812f in pthread_join () from /lib64/libpthread.so.0

We can inspect the threads to see what’s going on:

(gdb) info threads
  Id   Target Id         Frame
  3    Thread 0x7ffff7013700 (LWP 27981) "phil" 0x00007ffff7bcd824 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
* 1    Thread 0x7ffff7fcd700 (LWP 27976) "phil" 0x00007ffff7bc812f in pthread_join ()
   from /lib64/libpthread.so.0

(gdb) bt
#0  0x00007ffff7bc812f in pthread_join () from /lib64/libpthread.so.0
#1  0x00000000004008ca in main (argc=1, argv=0x7fffffffde58) at phil.c:29

(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff7013700 (LWP 27981))]
#0  0x00007ffff7bcd824 in __lll_lock_wait () from /lib64/libpthread.so.0

(gdb) bt
#0  0x00007ffff7bcd824 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff7bc9155 in _L_lock_490 () from /lib64/libpthread.so.0
#2  0x00007ffff7bc8faa in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004007cc in philosopher1 (chopsticks=0x7fffffffdd20) at phil.c:5
#4  0x00007ffff7bc6ea7 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ffff790014d in clone () from /lib64/libc.so.6

We see that the main thread is waiting in pthread_join, that only one of the two child threads is still alive, and that it is blocked in pthread_mutex_lock. The other child has already exited, so it is now clear that it did so without releasing its locks.


GDB can do much more (some have even called it a REPL for C), but the two cases above are, by far, what I use it for most often.