.. _ch02-fortran-debugging:

=============================================================
Fortran debugging
=============================================================

Print statements
----------------

Adding print statements to a program is a tried and true method of debugging, and probably the
most ubiquitous method in use. This is not because it is the best method, but rather because
it is the simplest. That said, print statements are still incredibly useful, and in spite of
their simplicity they are worth looking at in a little more detail.

Print statements can be added almost anywhere in a Fortran code to print things out to the
terminal as it chugs along. You might want to put a special marker such as ``+++`` or ``DEBUG``
in your debugging statements. This makes it easier to see which output is your debug output, and
easier to find the statements again later when it is time to remove them from the code.

Recall that you can use the upper case file extension ``.F90``, which allows the use of the C-style
preprocessor. In this way, you can write your code to print out debugging statements only when
desired. Consider this example:

.. literalinclude:: ./codes/segfault.F90
   :language: fortran
   :linenos:

:download:`Download this code <./codes/segfault.F90>`
      
Here you can turn on the print statements at compile time by adding ``-DDEBUG_MODE``
to your compile flags (the leading ``-D`` stands for *define*, so this defines the macro
``DEBUG_MODE``).
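
As a rough illustration of the pattern, here is a minimal sketch of a preprocessor-guarded debug
print. The program name ``debug_demo`` and its variables are made up for this example and are not
taken from ``segfault.F90``; only the macro name ``DEBUG_MODE`` follows the text above.

.. code-block:: fortran

   program debug_demo
      implicit none
      integer :: i
      real :: total

      total = 0.0
      do i = 1, 5
         total = total + real(i)
   #ifdef DEBUG_MODE
         ! Compiled in only when -DDEBUG_MODE is passed to the compiler
         print *, '+++ DEBUG: i =', i, ' total =', total
   #endif
      end do

      print *, 'total =', total
   end program debug_demo

Saved as ``debug_demo.F90`` (note the upper case extension), this should print only the final
total when built with ``gfortran debug_demo.F90``, and the extra ``+++ DEBUG`` lines when built
with ``gfortran -DDEBUG_MODE debug_demo.F90``. The preprocessor directives must start in the first
column of the source file.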

Compiling with various gfortran flags
---------------------------------------

There are a number of flags you can use when compiling your code that will
make it easier to debug.

Here's a generic set of options you might try:

.. code-block:: console

   $ gfortran -g -Wall -Wextra -fcheck=bounds -pedantic-errors \
          -ffpe-trap=zero,invalid,overflow,underflow  program.f90

See :ref:`ch02-fortran-flags` or the `gfortran man page <http://linux.die.net/man/1/gfortran>`_ 
for more information. Most of these options indicate that the program should give warnings or
die if certain bad things happen.

Compiling with the ``-g`` flag tells the compiler to generate debugging information (for example,
which source line each instruction came from) and save it inside the executable. This information
is used by debuggers such as ``gdb`` or ``lldb``, so you generally have to compile with this
option to use a debugger.
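
To make the effect of ``-ffpe-trap`` a little more concrete, here is a minimal sketch of a program
that divides by zero at run time. The program name ``fpe_demo`` and its variables are made up for
this illustration and are not part of the course codes.

.. code-block:: fortran

   program fpe_demo
      implicit none
      ! volatile keeps the compiler from evaluating the division at compile time
      real, volatile :: denom
      real :: ratio

      denom = 0.0
      ratio = 1.0 / denom      ! division by zero happens here at run time
      print *, 'ratio =', ratio
   end program fpe_demo

Built with plain ``gfortran``, this should run to completion and print an infinite value. Built
with ``gfortran -g -ffpe-trap=zero``, it should instead stop with a floating-point exception at
the division, and the ``-g`` information lets the backtrace point at the offending source line.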

The ``gdb`` debugger
----------------------

`GDB <https://www.gnu.org/software/gdb/>`_ is the GNU open source debugger, and it works with
GNU compilers such as gfortran. Unfortunately it often works poorly on macOS (GDB works
better on Linux). You may find that `lldb <http://lldb.llvm.org>`_
works better on a Mac; it functions in essentially the same way.
See more on `GDB commands <http://www.yolinux.com/TUTORIALS/GDB-Commands.html>`_.
Also, take a look at this nice
`GDB tutorial <https://www.cs.umd.edu/~srhuang/teaching/cmsc212/gdb-tutorial-handout.pdf>`_.

.. note::

   Due to the security policies on macOS, installing ``gdb`` there can be very painful.
   Reportedly, the situation has improved with the release of
   the `Catalina` version of macOS, and installation from ``homebrew`` should
   succeed. `High Sierra` had little to no compatibility, though you may try these
   `instructions <https://sourceware.org/gdb/wiki/BuildingOnDarwin>`_.
   I recommend using ``gdb`` on a Linux system if possible,
   or using ``lldb`` on macOS. The commands for ``lldb`` and ``gdb`` are not quite
   the same, but this `command map <https://lldb.llvm.org/lldb-gdb.html>`_ should help.


Consider the following example:

.. literalinclude:: ./codes/segfault1.f90
   :language: fortran
   :linenos:

:download:`Download this code <./codes/segfault1.f90>`
      
First compile the code with

.. code-block:: console

   $ gfortran segfault1.f90

and run it. You should see something like

.. code-block:: console

   Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

   Backtrace for this error:
   #0  0x7f9115d13d9f in ???
   #1  0x563962fa4202 in ???
   #2  0x563962fa42f3 in ???
   #3  0x7f9115cfeb24 in ???
   #4  0x563962fa40bd in ???
   #5  0xffffffffffffffff in ???
   [1]    34879 segmentation fault (core dumped)  ./a.out

Now if you compile it with a ``-g`` flag

.. code-block:: console

   $ gfortran -g segfault1.f90 -o segfault.ex

and run it again you should see something like:

.. code-block:: console

   Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

   Backtrace for this error:
   #0  0x7fce73accd9f in ???
   #1  0x55978c040202 in segfault1
           at /<some path>/LectureF22/Fortran/segfault1.f90:8
   #2  0x55978c0402f3 in main
           at /<some path>/LectureF22/Fortran/segfault1.f90:12
   [1]    34985 segmentation fault (core dumped)  ./segfault.ex

Now let's see what happens if we run it inside of GDB. We do this by passing the
executable name as an argument to GDB:

.. code-block:: console

   $ gfortran -g segfault1.f90 -o segfault.ex
   $ gdb segfault.ex

Note that GDB does not start running your program immediately. Instead it gives
you time to set up any breakpoints or mark any variables to watch. Let's set a breakpoint
at line 8 using ``break segfault1.f90:8`` and then run the executable using ``r``.
You should see something similar to this:

.. code-block:: console

   (gdb) break segfault1.f90:8
   Breakpoint 1 at 0x11ea: file segfault1.f90, line 8.
   (gdb) r
   Starting program: /<some path>/LectureF22/Fortran/segfault.ex 

   Breakpoint 1, segfault1 () at segfault1.f90:8
   8	       a(i) = i
   (gdb) display a
   1: a = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
   (gdb) display i
   2: i = 1
   (gdb) c
   Continuing.
      1.00000000    

   Breakpoint 1, segfault1 () at segfault1.f90:8
   8	       a(i) = i
   1: a = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
   2: i = 200
   (gdb) c
   Continuing.
      200.000000

   Breakpoint 1, segfault1 () at segfault1.f90:8
   8           a(i) = i
   1: a = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
   2: i = 399
   (gdb) c
   Continuing.
      399.000000

Continuing this, you will eventually see:

.. code-block:: console

   Breakpoint 1, segfault1 () at segfault1.f90:8
   8	       a(i) = i
   1: a = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
   2: i = 1195
   (gdb) c
   Continuing.
      1195.00000    

   Breakpoint 1, segfault1 () at segfault1.f90:8
   8	       a(i) = i
   1: a = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
   2: i = 1394
   (gdb) c
   Continuing.

   Program received signal SIGSEGV, Segmentation fault.
   0x0000555555555202 in segfault1 () at segfault1.f90:8
   8	    a(i) = i
   1: a = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
   2: i = 1394

This at least reveals that the error happened when the program tried to
access ``a(i)`` with ``i=1394``, which is way beyond the limits of the array.
Note that ``print`` shows the value of a variable once, at the moment you type it;
output that is repeated at every stop in the numbered form ``1: a = ...`` is what gdb's
``display`` command produces. The ``c`` command (short for ``continue``) tells gdb
to resume running until the next breakpoint or watchpoint is hit, a signal is received,
or the program ends.

You can use the ``next`` or ``n`` command to run the code one line at a time,
stepping over function/subroutine calls (each call is executed in full as a single step).
Finally, the ``step`` or ``s`` command will also run one line at a time, but it attempts to
step inside function/subroutine calls.
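
The difference between ``next`` and ``step`` is easiest to see on a program that actually calls
a subroutine. Here is a minimal sketch you could experiment with; the names ``scale_mod``,
``scale_by_two``, and ``step_demo`` are made up for this illustration.

.. code-block:: fortran

   module scale_mod
      implicit none
   contains
      subroutine scale_by_two(x)
         real, intent(inout) :: x
         x = 2.0 * x          ! "step" from the call site should land here
      end subroutine scale_by_two
   end module scale_mod

   program step_demo
      use scale_mod
      implicit none
      real :: val

      val = 1.5
      call scale_by_two(val)  ! "next" runs the whole call, then stops on the print
      print *, 'val =', val
   end program step_demo

After compiling with ``-g`` and setting a breakpoint on the ``call`` line, typing ``n`` should
take you straight to the ``print`` statement, while typing ``s`` should take you into
``scale_by_two``.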

In this example we've seen that the trouble comes from the ``i`` variable. Instead
of writing ``print i`` all the time we could use the ``watch`` command. This is
kind of like a breakpoint, but it triggers every time the value of the variable changes,
regardless of what line caused the change. Consider the following session:

.. code-block:: console

   (gdb) break segfault1.f90:8
   Breakpoint 1 at 0x11ea: file segfault1.f90, line 8.
   (gdb) r
   Starting program: /<some path>/LectureF21/Fortran/segfault.ex 

   Breakpoint 1, segfault1 () at segfault1.f90:8
   8	       a(i) = i
   (gdb) watch i
   Hardware watchpoint 2: i
   (gdb) disable 1
   (gdb) c
   Continuing.

   <...>

   (gdb) c
   Continuing.
      1195.00000

   Hardware watchpoint 2: i

   Old value = 1195
   New value = 1394
   0x0000555555555291 in segfault1 () at segfault1.f90:7
   7           do i = 1, 5000, 199
   (gdb) c
   Continuing.

   Program received signal SIGSEGV, Segmentation fault.
   0x0000555555555202 in segfault1 () at segfault1.f90:8
   8           a(i) = i
   (gdb) c
   Continuing.

   Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
   
   Backtrace for this error:
   #0  0x7ffff799ed9f in ???
   #1  0x555555555202 in segfault1
            at /<some path>/LectureF21/Fortran/segfault1.f90:8
   #2  0x5555555552f3 in main
            at /<some path>/LectureF21/Fortran/segfault1.f90:12
   
   Program received signal SIGSEGV, Segmentation fault.

Then, after failure you can examine the backtrace with the command ``bt``, and
visit different stack frames with ``frame <num>``. This is powerful indeed!

Note that some of the same information that we got interactively before can also be
obtained from the stack frames. This will often be easier than specifying ``print``
and stepping forward with ``s`` or ``n``. Just use whatever suits the situation at hand!

.. note::
   Let's summarize the commands we've used:

   * ``break <source file>:<line number>`` sets a breakpoint at the specified line in that
     file. Execution will be paused each time that line is executed.

   * ``run`` or ``r`` starts the run of the executable.

   * ``print <var>`` or ``p <var>`` will show the current value of the variable. Note that
     you can display more complex queries like ``print a(i)``. Use ``display <var>`` instead
     if you want the value shown again at every stop.

   * ``watch <var>`` watches a variable and pauses execution every time its value changes.
     The variable must be in the current scope, so you may need a breakpoint to get there
     first.

   * ``c``, ``n``, and ``s`` all resume execution after it has been paused. ``c`` runs until
     the next break or watch point, ``n`` runs line by line, and ``s`` runs line by line as well
     as into function/subroutine calls.

   * ``bt`` shows the backtrace, and ``frame <num>`` lets you move around in that trace to look
     around.

   * ``q`` quits gdb, killing the process being debugged if it is still running.

On the other hand, if you compile it with the ``-g`` and ``-fcheck=bounds`` flags

.. code-block:: console

   $ gfortran -g -fcheck=bounds segfault1.f90 -o segfault

and run it again (even without gdb), you will now see

.. code-block:: console

   $ ./segfault
      1.00000000
   At line 8 of file segfault1.f90
   Fortran runtime error: Index '200' of dimension 1 of array 'a' above upper bound of 10

   Error termination. Backtrace:
   #0  0x55b0a0bb424c in segfault1
           at /<some path>/LectureF22/Fortran/segfault1.f90:8
   #1  0x55b0a0bb4396 in main
           at /<some path>/LectureF22/Fortran/segfault1.f90:12
  

Valgrind
--------------------
`Valgrind <http://valgrind.org>`_ is a freely available, open source programming tool that
detects many memory leaks and memory bugs, and also provides profiling information.
It was originally designed as a free memory debugging tool for Linux, and is now also
available on macOS, Solaris, and even Android.

To use Valgrind for debugging, first install it in one of the following two ways:

#. Use a package manager.
   
   * On Linux use your distribution's package manager (e.g. ``sudo apt install valgrind``,
     ``sudo pacman -S valgrind``, or whatever)
     
   * On macOS try ``homebrew`` (e.g., ``brew install valgrind``, followed by ``brew link valgrind``
     if needed). If this doesn't work, and generates some complaints about ruby updates, please
     follow the instructions `here <https://gorails.com/setup/osx/10.12-sierra>`_
     `and here <https://stackoverflow.com/questions/36485180/how-to-update-ruby-with-homebrew>`_.
      
#. Download a recent Valgrind release (e.g., 3.19, released April 11, 2022) from the Valgrind
   `website <http://valgrind.org/downloads/current.html>`_. Untar it (e.g.,
   ``tar -xvf valgrind-3.19.0.tar.bz2``), open the README, and follow the steps therein. The
   installation should look something like this:
   
   .. code-block:: console

      $ ./configure --prefix=/usr/local/opt/valgrind
      $ make
      $ make install

   If the last command, ``make install``, fails with a permissions error, try ``sudo make install``.

.. note::

   The closed source nature of macOS means that Valgrind may or may not be fully supported,
   depending on your version of macOS. If installation using your package manager fails, then
   you should try installing it from source using the second option.

Recall our buggy code:

.. literalinclude:: ./codes/segfault1.f90
   :language: fortran
   :linenos:

:download:`Download this code <./codes/segfault1.f90>`
	 
To use Valgrind, first ensure that you compile the code with various useful debugging flags,
and at the very least use the ``-g`` flag. E.g.:

.. code-block:: console

   $ gfortran -g -Wall -Wextra -Wimplicit-interface segfault1.f90 -o segfault1

Then pass the executable to Valgrind. There are many flags to control how Valgrind behaves and
what it reports. On macOS, it is suggested to include ``--dsymutil=yes``, but this is unnecessary
on Linux.

.. code-block:: console
     
   $ valgrind  --leak-check=full --dsymutil=yes --track-origins=yes ./segfault1

If you compile the code without the debugging flags shown above (``-g`` is the most important
one), then you probably won't get useful information (e.g., line numbers in the source files)
from running Valgrind.
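
Since Valgrind is also meant for finding memory leaks, here is a minimal sketch of a program that
leaks on purpose, which you could use to see what a leak report looks like. The names
``leak_mod``, ``leaky``, and ``leak_demo`` are made up for this illustration and are not part of
the course codes.

.. code-block:: fortran

   module leak_mod
      implicit none
   contains
      subroutine leaky(n)
         integer, intent(in) :: n
         real, pointer :: work(:)

         allocate(work(n))        ! heap allocation through a pointer
         work = 1.0
         print *, 'sum =', sum(work)
         nullify(work)            ! the only handle is dropped without a deallocate
      end subroutine leaky
   end module leak_mod

   program leak_demo
      use leak_mod
      implicit none
      call leaky(1000)
   end program leak_demo

Compiled with ``-g`` and run under ``valgrind --leak-check=full``, this should be reported as
lost memory allocated in ``leaky``. Replacing the ``nullify`` with ``deallocate(work)`` should
make the report go away. Note that a pointer is used here on purpose: local ``allocatable``
arrays are freed automatically when the subroutine returns.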

.. note::

   Valgrind and GDB satisfy different use cases. GDB is inherently interactive, letting you poke
   at the executable while it runs. It also has very low overhead, and can be used during fairly
   large runs of the code. However, you need to put more work into it to get very complete
   information out of it.

   Valgrind has much higher overhead, and can be hard to use on large runs of the code. This
   overhead allows Valgrind to report back very comprehensive information, especially around
   memory leaks and errors. Valgrind also includes a host of other more advanced tools.

   Together GDB and Valgrind (and good old print statements!) cover most debugging needs.

**Question:** Can you think of any types of errors/bugs that are *best* handled by print
statements compared to these more powerful and capable tools?