I’ve been programming for over a decade now, and, although I don’t usually use debugging tools, there are a few instances where I’ve found them to be indispensable. In this post, we go over a few debugging scenarios where print statements just don’t cut it.
The following works for languages compiled to native code, such as C, C++, and, sometimes, OCaml and Haskell. It does not work for languages like Python or Java; the interpreters and VMs they run on introduce too much abstraction.
Why did it crash?
gcc -ggdb → gdb → r → bt
We first make sure we compile our source with debugging symbols enabled:
% gcc -ggdb -o fib fib.c
Now, we run the program with the GNU debugger, and see where it crashed:
% gdb ./fib
...
Reading symbols from /home/scvalex/proj/tmp/fib...done.
(gdb) r
Starting program: /home/scvalex/proj/tmp/fib
Program received signal SIGSEGV, Segmentation fault.
0x000000000040055c in fib (n=<error reading variable: Cannot access memory at address 0x7fffff7feffc>)
at fib.c:3
3 int fib(int n) {
Now that we know where, we will find out why. The backtrace is usually enough for this:
(gdb) bt
#0 0x000000000040055c in fib (
n=<error reading variable: Cannot access memory at address 0x7fffff7feffc>) at fib.c:3
#1 0x000000000040056c in fib (n=-174658) at fib.c:4
#2 0x000000000040056c in fib (n=-174657) at fib.c:4
#3 0x000000000040056c in fib (n=-174656) at fib.c:4
#4 0x000000000040056c in fib (n=-174655) at fib.c:4
...
We clearly see that fib recursed until it exhausted the stack: the ever-decreasing negative values of n show that it never stopped at a base case.
Why did it deadlock?
gcc -ggdb → gdb → r → C-c → i thr → thr X → bt
We compile our source with debugging symbols enabled, and run it in gdb:
% gcc -Wall -ggdb -o phil phil.c -lpthread
% gdb ./phil
After running it a few times, we see it deadlock, and interrupt execution with C-c:
(gdb) r
Starting program: /home/scvalex/proj/tmp/phil
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7814700 (LWP 27980)]
Philosopher 2 is full.
[New Thread 0x7ffff7013700 (LWP 27981)]
[Thread 0x7ffff7814700 (LWP 27980) exited]
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7bc812f in pthread_join () from /lib64/libpthread.so.0
We can inspect the threads to see what’s going on:
(gdb) info threads
Id Target Id Frame
3 Thread 0x7ffff7013700 (LWP 27981) "phil" 0x00007ffff7bcd824 in __lll_lock_wait ()
from /lib64/libpthread.so.0
* 1 Thread 0x7ffff7fcd700 (LWP 27976) "phil" 0x00007ffff7bc812f in pthread_join ()
from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007ffff7bc812f in pthread_join () from /lib64/libpthread.so.0
#1 0x00000000004008ca in main (argc=1, argv=0x7fffffffde58) at phil.c:29
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff7013700 (LWP 27981))]
#0 0x00007ffff7bcd824 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007ffff7bcd824 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffff7bc9155 in _L_lock_490 () from /lib64/libpthread.so.0
#2 0x00007ffff7bc8faa in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00000000004007cc in philosopher1 (chopsticks=0x7fffffffdd20) at phil.c:5
#4 0x00007ffff7bc6ea7 in start_thread () from /lib64/libpthread.so.0
#5 0x00007ffff790014d in clone () from /lib64/libc.so.6
We see that the main thread is waiting in pthread_join, that only one of the two child threads is still alive, and that it’s blocked in pthread_mutex_lock. In our case, it’s now clear that the child thread that already exited never released its locks.
GDB can do much more (some have even called it a REPL for C), but the above cases are, by far, the most frequent uses for it I have found.