
Debugging your code

This is probably not your first program and not the first machine you have used. As with all of us, any time you write or modify a program, there is a good chance you will introduce some "bugs". Compilers differ in how they generate the executable, both in general and depending on the compiler options you choose. For example, increasing optimization can alter how calculations are ordered and how floating point numbers are calculated. A particular algorithm may not tolerate such adjustments and things can "blow up".

In this section we will describe the general options that are available on most compilers and on the Intel(R) compilers specifically. Options available only on the Intel compilers are marked with (Intel).

General debugging options

Several options give general debug information or get the debugging process started.

Option              Description
-g                  Tells the compiler to generate symbolic debug information in the executable. For example, while the compiler generally gives your variables its own internal name, this option causes it to retain copies of your names along with the routine names and line numbers at which they are referenced.
-traceback          Tells the compiler to generate extra information in the executable file to provide better traceback information (that is, tracing from the error back through the calling subroutines and procedures). (Intel)
-inline-debug-info  Tells the compiler to generate enhanced information for inlined code. This provides more accurate location information when reporting debug information and gives better traceback information. The -g option must also be specified in order for this to take effect. (Intel)

Compiler error messages

After decades of development, many compilers today are quite advanced. They produce very efficient executable code (sometimes despite our own lack of effort.) In order to do this they collect a great deal of information about everything from symbol names through how (non)standard your code may be. In the beginning you will probably just want to use the default compiler message levels. As time goes by, you may want to increase or decrease the level depending on the development stage of your code.

In this section we will talk about how to vary these message levels for debugging purposes.

Warning message settings are available across all compilers. In general, the -w parameter will suppress all warning messages. But for debugging purposes, you may want to increase them. Additional, helpful checks include:

  • Uninitialized variables - While some compilers/loaders automatically initialize variables, most do not. A variable that is used before it is set may also be a misspelling of another variable's name.
    • Fortran: Use -CU with the Intel compiler and -Wuninitialized with GNU compiler.
    • C: Use -Wuninitialized with both the Intel and GNU compilers.
  • Unused variables - A variable that is declared but never used could mean a typo.
    • Fortran: Use -warn unused with the Intel compiler and -Wunused with the GNU compiler.
    • C: Use -Wunused-variable with both the Intel and GNU compilers.
  • Undeclared variables - If you typically declare the type of all variables, then this could mean a typo.
    • Fortran: Use -warn declarations with the Intel compiler and -Wimplicit with the GNU compiler.
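    • C: An undeclared variable is already a compile error in C. For functions, use -Wmissing-declarations with both the Intel and GNU compilers to flag global functions that are defined without a prior declaration.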

These are some good settings to start with. Many more settings exist, particularly with the C compilers. For more information on those, see the associated compiler man pages.

Enhanced debugging information

The Intel compilers also provide the ability to request some enhanced information. (Note that as you ask the compiler to do more and more investigative work, compile time and run time will become longer.) For either C or Fortran (Intel), specify:

  • -debug all or -debug full for complete debugging information.
  • -debug minimal for line number information.
  • -debug variable-locations for enhanced debug information useful in finding scalar local variables.
  • -debug inline_debug_info for information for inlined code. It provides more information to debuggers for function call traceback.

Diagnostic messages

Optimizing compilers will often have components that perform the compilation in stages. For example, the Intel compilers include a preprocessor, vectorizer, and auto-parallelizer. You can also get some diagnostic information from these stages. For either C or Fortran (Intel), specify:

  • -diag-enable driver to get diagnostic messages issued by the compiler driver.
  • -diag-enable vec[n] to get diagnostic messages issued by the vectorizer.
  • -diag-enable par[n] to get diagnostic messages issued by the auto-parallelizer (parallel optimizer).

where n can be

1 = all critical errors
2 = all errors (default)
3 = all errors and warnings

You can also direct the compiler to put the diagnostics into a particular file (-diag-file[=<file>]) or to print the diagnostics and then stop the compilation (-diag-dump.)

Runtime checks

The Fortran compilers provide some extra options for setting up run-time checking. That is, extra code is inserted that will check certain conditions while the code is running.

  • Generate compile-time and run-time checks on array subscript and character substring expressions. This will help you determine if a subscript or substring expression has grown too large or too small, which may be causing other parts of memory to be overwritten.
    • For Intel Fortran: -check bounds
    • For GNU Fortran: -fbounds-check
  • Issue a fatal error when the data type of an item being formatted for output does not match the format descriptor being used.
    • For Intel Fortran: -check format
  • Enable run-time checking for disassociated or uninitialized Fortran pointers, unallocated allocatable objects, and integer pointers that are uninitialized.
    • For Intel Fortran: -check pointers
  • Generate code to check for uninitialized variables.
    • For Intel Fortran: -check uninit (This is the same as -CU.)
  • Enable all of the above.
    • For Intel Fortran: -check all

Floating point exceptions

A "floating point exception" (fpe) basically means that an element has either never been defined, has overflowed the register during a calculation, or has underflowed the register during a calculation. The error generally doesn't appear until you try to use it in another calculation (at which point you will likely see a program crash) or until you try to print it (at which point you may see results like "UNDEF" for an undefined number, "INF" for infinite or overflow, or "NaN" for not a number.) The spot where the error manifests itself may not be anywhere near where the calculation went astray. So floating point exception handling options come to the rescue.

The Intel Fortran compiler provides fpe checking with

-fpe<n>

where n can be

Value  Description
0      Floating-point invalid, divide-by-zero, and overflow exceptions are enabled. Execution is terminated.
1      All floating-point exceptions are disabled.
3      All floating-point exceptions are disabled. Floating-point underflow is gradual, unless you explicitly specify a compiler option that enables flush-to-zero.

See man ifort, under -fpe for more discussion of these options.

Several additional floating point checking tools that may be of interest to you include:

  • -fp-stack-check to check every function call to ensure that the floating-point stack is in the expected state.
  • -fmath-errno to test errno after calls to math library functions.
  • -fltconsistency to enable improved floating-point consistency. Floating point instructions are not reordered and are stored in their target variable after each operation.
  • -IPF-fltacc to disable optimizations that affect floating-point accuracy.

Each of these will give information toward helping you determine your coding problems. But note that they restrict optimization and increase overhead. Thus your code will run slower and, in some cases, significantly slower. Use these only as needed to improve accuracy or to discover and correct your coding problems.

Checking for non-standard Fortran or C

Most computing vendors offer special features in their processors to make them more attractive than those of "the other guy". These features do not always adhere to the various language standards and are generally called in via "compiler extensions" or additional options. If you are bringing your program over from another machine or if you are bringing in a code of unknown origin, it might not be a bad idea to check it for non-standard language uses. This is easily done with a compiler option.

For Fortran use

-e<version>

where version can be

Version  Description
03       causes the compiler to issue errors instead of warnings for nonstandard Fortran 2003.
90       causes the compiler to issue errors instead of warnings for nonstandard Fortran 90.
95       causes the compiler to issue errors instead of warnings for nonstandard Fortran 95.

Similarly for C use

-std=<version>

where version can be

Version  Description
c89      causes the compiler to check that the code conforms to the ISO/IEC 9899:1990 International Standard.
c99      causes the compiler to check that the code conforms to the ISO/IEC 9899:1999 International Standard.
gnu89    causes the compiler to check that the code conforms to ISO C90 plus GNU extensions. (Default for C.)
gnu++98  causes the compiler to check that the code conforms to the 1998 ISO C++ standard plus GNU extensions. (Default for C++.)
c++0x    causes the compiler to enable support for a number of C++0x features (see the Intel C++ Compiler Documentation for details).

Debugging parallel codes

When you bring your program code over to Palmetto, the first step is to run it successfully as a serial job on several test cases. Once you are sure it is running correctly on a single processor, it is time to try a parallel run.

As noted in the section Compiling your parallel code, there are several ways to approach parallelism. And when taking this step, it is easy to introduce programming and logic errors. As it stands there are only a few options that will help you debug your parallel code. (Intel)

Specifying

-O0 -openmp

will help to debug OpenMP applications.

Diagnostic messages are available from the vectorizer and auto-parallelizer by specifying

-diag-enable vec[n] to get diagnostic messages issued by the vectorizer.
-diag-enable par[n] to get diagnostic messages issued by the auto-parallelizer (parallel optimizer).

where n can be

1 = all critical errors
2 = all errors (default)
3 = all errors and warnings

Debugging examples

There are many ways to combine the options that have been shown in this section. We will provide a few simple examples to get you started.

In Fortran, we can look at an implementation of the Pade approximant

        subroutine pade(t,z,up,ud,dupdz,j,dumdz)
        implicit double precision (a-h,o-z)
        include 'param.h'
        double precision t(ns)
        double precision z(igamma)
        double precision a(igamma),b(igamma),c(igamma),cc(igamma),
     1     d(igamma),cb(igamma)
        double precision ud(igamma)
        double precision up(igamma)
        double precision dupdz(igamma)
        double precision dumdz(igamma)
        double precision f(6)
        common/delt/deltaz,deltat,b4
c
c       Calculate tridiagonal solution
c
        do i = 1,6
        f(i) = float(i)/0.0
        end do
        a(1) = 0.0
        do i = 2,igamma
        a(i) = f(1)
        end do
c       b(1) = f(4)
        b(1) = f(2)
        do i = 2,igamma-1
        b(i)=f(4)
        end do
c       b(igamma) = f(4)
        b(igamma) = f(2)
        do i = 1,igamma-1
        c(i) = f(1)
        end do
        c(igamma)=0.0
c
c       For the solution, define
c
        cc(1) = 1.0/b(1)
        do i = 2,igamma
        cc(i) = 1.0/(b(i)-a(i)*c(i-1)*cc(i-1))
        end do
        do i = 1,igamma
        d(i) = -c(i)*cc(i)
        end do
        cb(1) = cc(1)*dumdz(1)
        do i = 2,igamma
        cb(i) = cc(i)*(dumdz(i)-a(i)*cb(i-1))
        end do
c
c       Keep endpoint value fixed at this time.
c
        dupdz(igamma) = cb(igamma)
        do i = igamma-1, 1, -1
        dupdz(i) = d(i)*dupdz(i+1) + cb(i)
        end do
c
        return
        end

with a typo built in. Compiling it with

ifort -g -fpe0 -traceback -o pade pade.f

will tell the compiler to generate symbolic information in the executable and terminate with an error on any floating point exceptions. Call traceback information will be included. A possible outcome would be

[myuserid@user001 pade]$ pade
Function exp(2.0*(zp-tp))

Deltaz = 0.5000 Deltat = 0.0500
forrtl: error (73): floating divide by zero
Image              PC                Routine            Line        Source             
pade               00000000004040F0  Unknown               Unknown  Unknown
pade               00000000004032F7  Unknown               Unknown  Unknown
pade               0000000000402B98  Unknown               Unknown  Unknown
pade               0000000000402B02  Unknown               Unknown  Unknown
libc.so.6          0000003CDB41D8A4  Unknown               Unknown  Unknown
pade               0000000000402A29  Unknown               Unknown  Unknown
Aborted

In this example we get some information - that we have a divide by zero. But the location isn't given in easily readable form. See the section below on using gdb and idb for some additional help.

In C, we can take a look at Wikipedia's Bellman-Ford example

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Let INFINITY be an integer value not likely to be
   confused with a real weight, even a negative one. */
#define INFINITY ((1 << 14)-1)

typedef struct {
    int source;
    int dest;
    int weight;
} Edge;

void BellmanFord(Edge edges[], int edgecount, int nodecount, int source)
{
    int *distance = malloc(nodecount * sizeof *distance);
    int i, j;
    for (i=0; i < nodecount; ++i)
      distance[i] = INFINITY;
    distance[source] = 0;

    for (i=0; i < nodecount; ++i) {
        for (j=0; j < edgecount; ++j) {
            if (distance[edges[j].source] != INFINITY) {
                int new_distance = distance[edges[j].source] + edges[j].weight;
                if (new_distance < distance[edges[j].dest])
                  distance[edges[j].dest] = new_distance;
            }
        }
    }

    for (i=0; i < edgecount; ++i) {
        if (distance[edges[i].dest] > distance[edges[i].source] + edges[i].weight) {
            puts("Negative edge weight cycles detected!");
            free(distance);
            return;
        }
    }

    for (i=0; i < nodecount; ++i) {
        printf("The shortest distance between nodes %d and %d is %d\n",
            source, i, distance[i]);
    }
    free(distance);
    return;
}

int main(void)
{
    /* This test case should produce the distances 2, 4, 7, -2, and 0. */
    Edge edges[10] = {{0,1, 5}, {0,2, 8}, {0,3, -4}, {1,0, -2},
                      {2,1, -3}, {2,3, 9}, {3,1, 7}, {3,4, 2},
                      {4,0, 6}, {4,2, 7}};
    BellmanFord(edges, 100, 5, 4);
    return 0;
}

(with a little typo built in here as well.) Compiling it with

icc -g -traceback -o bf bf.c

will result in the error

[myuserid@user001 bellman-ford]$ bf
Segmentation fault

We need a little more help from gdb or idb.

Using gdb or idb

The examples above aren't too helpful. But if we run them through the gdb or idb debuggers, we have more options. In the Pade example,

[myuserid@user001 pade]$ gdb pade
GNU gdb Red Hat Linux (6.5-25.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
  Using host libthread_db library "/lib64/libthread_db.so.1".

(gdb) run
Starting program: /home/myuserid/pade/pade 
Function exp(2.0*(zp-tp))

Deltaz = 0.5000 Deltat = 0.0500

Program received signal SIGFPE, Arithmetic exception.
0x00000000004040f0 in pade (
    t=(0.05000000074505806, 0.10000000149011612, 0.15000000223517418, .....
    z=(0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10, ....
                        ...........
    dupdz=(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..... , 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))) at pade.f:19
19              f(i) = float(i)/0.0
Current language:  auto; currently fortran
(gdb)

Here we see that gdb will give us the values of all variables along with the location (routine, line number, and text) of the problematic area. It shows us the divide by zero at line 19. At this point (the gdb prompt) we can instruct gdb to do other things like set breakpoints and rerun, step line by line through the code, print out information and variable values, and so forth.

In the Bellman-Ford example, using idb we see

[myuserid@user001 bellman-ford]$ idb bf
Intel(R) Debugger for applications running on Intel(R) 64, Version 10.1-32 ,  Build 20070828
------------------ 
object file name: bf 
Reading symbols from /home/myuserid/bellman-ford/bf...done.
(idb) run
Starting program: /home/myuserid/bellman-ford/bf
Program received signal SIGSEGV
BellmanFord (edges=0x7fff0771ea10, edgecount=100, nodecount=5, source=4) at bfe.c:27
27                      if (new_distance < distance[edges[j].dest])
(idb) print j
$1 = 11
(idb)

Here we see that the program died on line 27 in a call to BellmanFord. The parameter values at the time of the call are listed. In the main code, edges was declared with 10 elements, but we passed in an edgecount of 100. The program died when the loop index j reached 11, so that edges[j] referred to memory past the end of the array.

For more information, see man gdb and man idb. The two are very similar. A Google search on "gdb tutorials" shows many good explanations on how to use it. An Intel® Debugger (IDB) Manual is also available.

Resources

Intel® Debugger (IDB) Manual on CITI server
Intel® Debugger (IDB) Manual at Intel site
Google search on "gdb tutorials"



Maintained by CITI web services                    Copyright ©2008 Clemson University, Clemson, S.C. 29634, (864) 656-331