Fortran debugging

Compiling with various gfortran flags

There are a number of flags you can use when compiling your code that will make it easier to debug.

Here’s a generic set of options you might try:

$ gfortran -g -Wall -Wextra -fcheck=bounds -pedantic-errors \
       -ffpe-trap=zero,invalid,overflow,underflow  program.f90

See Fortran Flags or the gfortran man page for more information. Most of these options indicate that the program should give warnings or die if certain bad things happen.

Compiling with the -g flag tells the compiler to generate debugging information (for example, which source line and variable each piece of machine code corresponds to) and to save it inside the executable. A debugger such as gdb or lldb uses this information, so you generally have to compile with -g to use a debugger effectively.
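As a rough illustration of what these flags react to, here is a minimal sketch. The file name flags_demo.f90 and all variable names are made up for this example and are not part of the course code. With -Wall, gfortran should warn about the unused variable at compile time. With -ffpe-trap=zero, the division should stop the program with a floating-point exception instead of quietly producing Infinity.

```fortran
! flags_demo.f90 (hypothetical file name, for illustration only)
program flags_demo
  implicit none
  real :: numerator, denominator
  integer :: unused_counter   ! declared but never used: -Wall should warn here

  numerator = 1.0
  denominator = 0.0

  ! Division by zero at run time. Without -ffpe-trap=zero this typically
  ! prints Infinity; with the flag the program should stop at this line.
  print *, numerator / denominator
end program flags_demo
```

Try compiling it once with plain gfortran and once with the full set of options above, and compare what you see at compile time and at run time.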

The gdb debugger

GDB is the GNU open source debugger for GNU compilers such as gfortran. Unfortunately it often works poorly on macOS and works better on Linux. You may find that lldb works better on a Mac; it functions in essentially the same way. See more on GDB commands. Also, take a look at this nice GDB tutorial.

Note

Due to the security policies on macOS, installing gdb there can be very painful. Reportedly, the situation has improved with the Catalina release of macOS, and installation from Homebrew should succeed. High Sierra had little to no compatibility, though you may try these instructions. I recommend using a Linux system for gdb if possible, or using lldb on macOS. The commands for lldb and gdb are not quite the same, but this command map should help.

Consider the following example:

 1  program segfault1
 2    implicit none
 3
 4    real, dimension(10) :: a
 5    integer :: i
 6    a = 0.
 7    do i = 1, 5000, 199
 8      a(i) = i
 9      print*,a(i)
10    end do
11
12  end program segfault1

First compile the code with

$ gfortran segfault1.f90

and run it. You should see something like

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f9115d13d9f in ???
#1  0x563962fa4202 in ???
#2  0x563962fa42f3 in ???
#3  0x7f9115cfeb24 in ???
#4  0x563962fa40bd in ???
#5  0xffffffffffffffff in ???
[1]    34879 segmentation fault (core dumped)  ./a.out

Now if you compile it with a -g flag

$ gfortran -g segfault1.f90 -o segfault.ex

and run it again you should see something like:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7fce73accd9f in ???
#1  0x55978c040202 in segfault1
        at /<some path>/LectureF22/Fortran/segfault1.f90:8
#2  0x55978c0402f3 in main
        at /<some path>/LectureF22/Fortran/segfault1.f90:12
[1]    34985 segmentation fault (core dumped)  ./segfault.ex

Now let’s see what happens if we run it inside of GDB. We do this by passing the executable name as an argument to GDB:

$ gfortran -g segfault1.f90 -o segfault.ex
$ gdb segfault.ex

Note that GDB does not start running your program immediately. Instead it gives you time to set up any breakpoints or mark any variables to watch. Let’s set a breakpoint at line 8 using break segfault1.f90:8 and then run the executable using r. You should see something similar to this:

(gdb) break segfault1.f90:8
Breakpoint 1 at 0x11ea: file segfault1.f90, line 8.
(gdb) r
Starting program: /<some path>/LectureF22/Fortran/segfault.ex

Breakpoint 1, segfault1 () at segfault1.f90:8
8           a(i) = i
(gdb) display a
1: a = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
(gdb) display i
2: i = 1
(gdb) c
Continuing.
   1.00000000

Breakpoint 1, segfault1 () at segfault1.f90:8
8           a(i) = i
1: a = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
2: i = 200
(gdb) c
Continuing.
   200.000000

Breakpoint 1, segfault1 () at segfault1.f90:8
8           a(i) = i
1: a = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
2: i = 399
(gdb) c
Continuing.
   399.000000

Continuing this, you will eventually see:

Breakpoint 1, segfault1 () at segfault1.f90:8
8           a(i) = i
1: a = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
2: i = 1195
(gdb) c
Continuing.
   1195.00000

Breakpoint 1, segfault1 () at segfault1.f90:8
8           a(i) = i
1: a = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
2: i = 1394
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000555555555202 in segfault1 () at segfault1.f90:8
8        a(i) = i
1: a = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
2: i = 1394

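This at least reveals that the error happened when the program tried to write to a(i) with i=1394, which is far beyond the bounds of the 10-element array. The earlier out-of-bounds writes, starting at i=200, did not crash because they happened to land in memory the program was allowed to access. The numbered 1: and 2: lines come from gdb's display command, which re-prints an expression every time the program stops; a plain print shows the value only once. The c command (short for continue) resumes execution until the next breakpoint, watchpoint, or signal, or until the program ends.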

You can use the next or n command to run the code one line at a time, stepping over function/subroutine calls rather than into them. The step or s command also runs one line at a time, but it attempts to step inside function/subroutine calls.
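The difference between next and step only shows up at a line that contains a call, which segfault1 does not have. Here is a minimal sketch for trying it out. The file name step_demo.f90 and the subroutine double_it are made up for this example.

```fortran
! step_demo.f90 (hypothetical file name, for illustration only)
program step_demo
  implicit none
  real :: r

  r = 3.0
  call double_it(r)   ! stopped here: n runs the whole call, s enters double_it
  print *, r

contains

  ! Doubles its argument in place.
  subroutine double_it(x)
    real, intent(inout) :: x
    x = 2.0 * x
  end subroutine double_it

end program step_demo
```

Compile it with -g, set a breakpoint on the call line, and run it twice: once answering with n and once with s. With s you should land on the first executable line inside double_it.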

In this example we’ve seen that the trouble comes from the i variable. Instead of writing print i all the time, we could use the watch command. A watchpoint is like a breakpoint, but it triggers every time the value of the variable changes, regardless of which line caused the change. Consider the following session:

(gdb) break segfault1.f90:8
Breakpoint 1 at 0x11ea: file segfault1.f90, line 8.
(gdb) r
Starting program: /<some path>/LectureF21/Fortran/segfault.ex

Breakpoint 1, segfault1 () at segfault1.f90:8
8           a(i) = i
(gdb) watch i
Hardware watchpoint 2: i
(gdb) disable 1
(gdb) c
Continuing.

<...>

(gdb) c
Continuing.
   1195.00000

Hardware watchpoint 2: i

Old value = 1195
New value = 1394
0x0000555555555291 in segfault1 () at segfault1.f90:7
7           do i = 1, 5000, 199
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000555555555202 in segfault1 () at segfault1.f90:8
8           a(i) = i
(gdb) c
Continuing.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7ffff799ed9f in ???
#1  0x555555555202 in segfault1
         at /<some path>/LectureF21/Fortran/segfault1.f90:8
#2  0x5555555552f3 in main
         at /<some path>/LectureF21/Fortran/segfault1.f90:12

Program received signal SIGSEGV, Segmentation fault.

Then, after the failure, you can examine the backtrace with the command bt and visit different stack frames with frame <num>. This is powerful indeed!

Note that some of the information we obtained interactively before can also be obtained from the stack frames. This is often easier than issuing print commands and stepping forward with s or n. Just use whatever suits the situation at hand!

Note

Let’s summarize the commands we’ve used:

  • break <source file>:<line number> sets a breakpoint at the specified line in that file. Execution will be paused each time that line is executed.

  • run or r starts the run of the executable.

  • print <var> or p <var> shows the current value of the variable. Note that you can also print more complex expressions such as print a(i). Use display <var> instead to have the value re-printed every time execution pauses.

  • watch <var> watches a variable and pauses execution every time its value changes. The variable must be in the current scope, so you may need a breakpoint to get there first.

  • c, n, and s all resume execution after it has been paused. c runs until the next break or watch point, n runs line by line, and s runs line by line as well as into function/subroutine calls.

  • bt shows the backtrace, and frame <num> lets you move around in that trace to look around.

  • q quits gdb, killing the process being debugged if it is still running.

On the other hand, if you compile it with the -g and -fcheck=bounds flags

$ gfortran -g -fcheck=bounds segfault1.f90 -o segfault

and run it again (even without gdb), you now see

$ ./segfault
   1.00000000
At line 8 of file segfault1.f90
Fortran runtime error: Index '200' of dimension 1 of array 'a' above upper bound of 10

Error termination. Backtrace:
#0  0x55b0a0bb424c in segfault1
   at /<some path>/LectureF22/Fortran/segfault1.f90:8
#1  0x55b0a0bb4396 in main
   at /<some path>/LectureF22/Fortran/segfault1.f90:12
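Once the bounds check has pinpointed the bad index, the fix depends on what the loop was supposed to do. The sketch below assumes that the intent was to fill each element of a exactly once; this is only an assumption, since the original example does not say. It bounds the loop with size(a) so the index can never leave the array. The program name segfault1_fixed is made up for this example.

```fortran
program segfault1_fixed
  implicit none

  real, dimension(10) :: a
  integer :: i

  a = 0.
  ! Assumption: the loop is meant to visit every element of a once.
  ! size(a) keeps the upper bound tied to the declared array length.
  do i = 1, size(a)
    a(i) = real(i)
    print *, a(i)
  end do

end program segfault1_fixed
```

Compiled with -g -fcheck=bounds, this version should run to completion without a runtime error.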

Valgrind

Valgrind is a freely available open source programming tool that detects many memory leaks and memory bugs, and also provides profiling information. It was originally designed as a free memory debugging tool for Linux and is now also available on macOS, Solaris, and even Android.

To install Valgrind, use one of the following two options:

  1. Use a package manager

    • On Linux use your distribution’s package manager (e.g., sudo apt install valgrind or sudo pacman -S valgrind)

    • On macOS try Homebrew (e.g., brew install valgrind, followed by brew link valgrind if needed). If this doesn’t work and generates complaints about Ruby updates, please follow the instructions here and here.

  2. Build from source: download a recent Valgrind release (e.g., 3.19, released April 11, 2022) from the Valgrind website. Untar it (e.g., tar -xvf valgrind-3.19.0.tar.bz2), open the README, and follow the steps therein. The installation should look something like this:

    $ ./configure --prefix=/usr/local/opt/valgrind
    $ make
    $ make install
    

    If the last command, make install, fails because of permissions, try sudo make install.

Note

The closed source nature of macOS means that Valgrind may or may not be fully supported, depending on your version of macOS. If installation using your package manager fails, try installing from source using the second option.

Recall our buggy code:

 1  program segfault1
 2    implicit none
 3
 4    real, dimension(10) :: a
 5    integer :: i
 6    a = 0.
 7    do i = 1, 5000, 199
 8      a(i) = i
 9      print*,a(i)
10    end do
11
12  end program segfault1

To use Valgrind, first ensure that you compile the code with various useful debugging flags, and at the very least use the -g flag. E.g.:

$ gfortran -g -Wall -Wextra -Wimplicit-interface segfault1.f90 -o segfault1

Then pass the executable to Valgrind. There are many flags to control how Valgrind behaves and what it reports. On macOS it is suggested to include --dsymutil=yes; this flag is unnecessary on Linux.

$ valgrind  --leak-check=full --dsymutil=yes --track-origins=yes ./segfault1

If you compile the code without the debugging flags above (-g is the most important one), you probably won’t get useful information (e.g., line numbers in the source files) from running Valgrind.
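The segfault1 example exercises Valgrind's reporting of invalid memory accesses, but it contains no leak for --leak-check=full to find. Here is a minimal sketch of a program that does leak. The file name leak_demo.f90 and the variable names are made up for this example. It uses a pointer array because, unlike an allocatable array, a pointer can lose track of the memory that was allocated through it.

```fortran
! leak_demo.f90 (hypothetical file name, for illustration only)
program leak_demo
  implicit none
  real, pointer :: p(:)

  allocate(p(1000))   ! memory obtained through a pointer
  p = 1.0
  print *, sum(p)

  ! The only handle to the block is dropped without a deallocate,
  ! so the memory can no longer be freed: this is the leak.
  nullify(p)
end program leak_demo
```

Compiled with -g and run under valgrind --leak-check=full, this should produce a leak report that points back to the allocate line. Replacing nullify(p) with deallocate(p) should make the report go away.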

Note

Valgrind and GDB satisfy different use cases. GDB is inherently interactive, letting you poke at the executable while it runs. It also has very low overhead, and can be used during fairly large runs of the code. However, you need to put more work into it to get very complete information out of it.

Valgrind has much higher overhead, and can be hard to use on large runs of the code. This overhead allows Valgrind to report back very comprehensive information, especially around memory leaks and errors. Valgrind also includes a host of other more advanced tools.

Together GDB and Valgrind (and good old print statements!) cover most debugging needs.

Question: Can you think of any types of errors/bugs that are best handled by print statements compared to these more powerful and capable tools?