Debugging your code
This is probably not your first program, and not the first machine you have used. Like all of us, any time you write or modify a program there is a good chance you will introduce some "bugs". Compilers differ in how they generate an executable, and the same compiler can produce quite different code depending on the options you choose. For example, increasing the optimization level can change the order in which calculations are performed and how floating point results are computed. A particular algorithm may not tolerate such changes, and things can "blow up".
In this section we describe the general options available on most compilers, along with some that are specific to the Intel(R) compilers. Options available only on the Intel compilers are marked (Intel).
General debugging options
Several options give general debug information or get the debugging process started.
| Option | Description |
|---|---|
| -g | Tells the compiler to generate symbolic debug information in the executable. For example, while the compiler generally gives your variables its own internal names, this option causes it to keep your names, along with the routine names and the line numbers at which they are referenced. |
| -traceback | Tells the compiler to generate extra information in the executable file to provide better traceback information (that is, tracing from the error back through the calling subroutines and procedures.) (Intel) |
| -inline-debug-info | Tells the compiler to generate enhanced information for inlined code. This provides more accurate location information when reporting debug information and gives better traceback information. The -g option must also be specified in order for this to take effect. (Intel) |
Compiler error messages
After decades of development, many of today's compilers are quite advanced. They produce very efficient executable code (sometimes despite our own lack of effort). To do this, they collect a great deal of information about everything from symbol names to how (non)standard your code is. In the beginning you will probably want to use the default compiler message levels. As your code matures, you may want to raise or lower them depending on its stage of development.
In this section we will talk about how to vary these message levels for debugging purposes.
Warning message settings are available on all compilers. In general, the -w option suppresses all warning messages, but for debugging you will usually want more warnings rather than fewer. Helpful additional checks include:
- Uninitialized variables - While some compilers/loaders automatically initialize variables, most do not. An uninitialized variable can also be the result of a typo.
  - Fortran: Use -CU with the Intel compiler and -Wuninitialized with the GNU compiler.
  - C: Use -Wuninitialized with both the Intel and GNU compilers.
- Unused variables - This could mean a typo.
  - Fortran: Use -warn unused with the Intel compiler and -Wunused with the GNU compiler.
  - C: Use -Wunused-variable with both the Intel and GNU compilers.
- Undeclared variables - If you normally declare the type of every variable, an undeclared one could mean a typo.
  - Fortran: Use -warn declarations with the Intel compiler and -fimplicit-none with the GNU compiler.
  - C: Use -Wmissing-declarations with both the Intel and GNU compilers. (In C, using an undeclared variable is already a compile-time error; this option flags global functions that are defined without a previous declaration.)
These are some good settings to start with. Many more settings exist, particularly with the C compilers. For more information on those, see the associated compiler man pages.
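Here is a minimal sketch of what these C warnings look for. The function sum_array is hypothetical and not from any library. The file is correct as written, and the comments note which small edit would trigger which warning when it is compiled with something like gcc -c -Wuninitialized -Wunused-variable -Wmissing-declarations. Depending on the compiler version, some uninitialized-variable cases are only reported when optimization (for example -O2) is also enabled.

```c
#include <stddef.h>

/* Prototype: delete it and gcc -Wmissing-declarations warns that the
   global function below is defined without a previous declaration. */
double sum_array(const double *v, size_t n);

double sum_array(const double *v, size_t n)
{
    /* Remove "= 0.0" and total is used uninitialized; -Wuninitialized
       can then warn (some compiler versions only at -O1 or higher). */
    double total = 0.0;
    size_t i;

    /* Adding an unused local here, such as "int count;", would
       trigger -Wunused-variable. */
    for (i = 0; i < n; ++i)
        total += v[i];
    return total;
}
```

For example, with double v[] = {1.0, 2.0, 3.5}, sum_array(v, 3) returns 6.5.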
Enhanced debugging information
The Intel compilers also provide the ability to request some enhanced information. (Note that as you ask the compiler to do more and more investigative work, compile time and run time will become longer.) For either C or Fortran (Intel), specify:
- -debug all or -debug full for complete debugging information.
- -debug minimal for line number information.
- -debug variable-locations for enhanced debug information useful in finding scalar local variables.
- -debug inline-debug-info to generate information for inlined code. This gives debuggers more information for function call traceback.
Diagnostic messages
Optimizing compilers often have components that perform the compilation in stages. For example, the Intel compilers include a preprocessor, a vectorizer, and an auto-parallelizer. You can also get diagnostic information from these stages. For either C or Fortran (Intel), specify:
- -diag-enable driver to get diagnostic messages issued by the compiler driver.
- -diag-enable vec[n] to get diagnostic messages issued by the vectorizer.
- -diag-enable par[n] to get diagnostic messages issued by the auto-parallelizer (parallel optimizer).
where n can be
- 1 = all critical errors
- 2 = all errors (default)
- 3 = all errors and warnings

You can also direct the compiler to put the diagnostics into a particular file (-diag-file[=<file>]) or to print the diagnostics and then stop the compilation (-diag-dump).
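To see what the vectorizer diagnostics report on, it can help to compile two small loops side by side. In the sketch below (the function names are hypothetical), the first loop has independent iterations. The second carries a dependence from one iteration to the next. The exact messages that -diag-enable vec produces for each will vary with the compiler version.

```c
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i] and c[i], so a
   vectorizer can usually process several elements per instruction
   (it may still have to rule out overlap between a, b and c). */
void add_arrays(double *a, const double *b, const double *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: each a[i] needs the a[i-1] computed in the
   previous iteration (a running prefix sum), so this loop generally
   cannot be vectorized as written. */
void prefix_sum(double *a, const double *b, size_t n)
{
    size_t i;
    if (n == 0)
        return;
    a[0] = b[0];
    for (i = 1; i < n; ++i)
        a[i] = a[i - 1] + b[i];
}
```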
Runtime checks
The Fortran compilers provide some extra options for run-time checking: the compiler inserts extra code that checks certain conditions while the program is running.
- Generate compile-time and run-time checks on array subscript and character substring expressions. This will help you determine if a subscript or substring expression has grown too large or too small, which may be causing other parts of memory to be overwritten.
  - For Intel Fortran: -check bounds
  - For GNU Fortran: -fbounds-check
- Issue a fatal error when the data type of an item being formatted for output does not match the format descriptor being used.
  - For Intel Fortran: -check format
- Enable run-time checking for disassociated or uninitialized Fortran pointers, unallocated allocatable objects, and integer pointers that are uninitialized.
  - For Intel Fortran: -check pointers
- Generate code to check for uninitialized variables.
  - For Intel Fortran: -check uninit (This is the same as -CU.)
- Enable all of the above.
  - For Intel Fortran: -check all
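The options above apply to Fortran. To picture what a compiler-inserted subscript check does, the C sketch below performs a similar test by hand. The names checked_index, CHECKED_INDEX and checked_sum are hypothetical, chosen for illustration, and the code a compiler actually generates will differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return idx unchanged if 0 <= idx < extent;
   otherwise report where the bad subscript was used and abort, much as
   a compiler-inserted bounds check stops the program. */
static long checked_index(long idx, long extent, const char *file, int line)
{
    if (idx < 0 || idx >= extent) {
        fprintf(stderr, "%s:%d: subscript %ld out of range [0, %ld)\n",
                file, line, idx, extent);
        abort();
    }
    return idx;
}

#define CHECKED_INDEX(idx, extent) \
    checked_index((idx), (extent), __FILE__, __LINE__)

/* Sum the first count elements of v, an array of length extent.
   If count exceeds extent, the check aborts with a message instead of
   silently reading past the end of the array. */
double checked_sum(const double *v, long extent, long count)
{
    double total = 0.0;
    long i;
    for (i = 0; i < count; ++i)
        total += v[CHECKED_INDEX(i, extent)];
    return total;
}
```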
Floating point exceptions
A "floating point exception" (fpe) basically means that a value has never been properly defined, that a calculation divided by zero, or that a calculation overflowed or underflowed the range a floating point register can hold. The error generally doesn't show up until you try to use the value in another calculation (at which point you will likely see a program crash) or try to print it (at which point you may see results like "UNDEF" for an undefined number, "INF" for infinity or overflow, or "NaN" for not a number). The spot where the error manifests itself may be nowhere near where the calculation went astray. This is where the floating point exception handling options come to the rescue.
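The C sketch below shows why the symptom can turn up far from the cause (delayed_nan and classify are hypothetical names). Under default IEEE arithmetic nothing stops the program. A division by zero quietly produces infinity, and a later subtraction turns that into NaN. This assumes the compiler honors IEEE semantics. Aggressive settings such as GCC's -ffast-math may simplify x - x to zero and hide the effect.

```c
#include <math.h>

/* Divide early, then use the result in a later calculation. x becomes
   +infinity when denominator is 0.0; inf - inf is NaN, and the NaN then
   contaminates every result computed from it. */
double delayed_nan(double denominator)
{
    double x = 1.0 / denominator;   /* +inf when denominator == 0.0 */
    double y = x - x;               /* inf - inf gives NaN */
    return y * 2.0 + 1.0;           /* still NaN, far from the division */
}

/* Label a value the way a printout might reveal the problem. */
const char *classify(double v)
{
    if (isnan(v))
        return "NaN";
    if (isinf(v))
        return "INF";
    return "finite";
}
```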
The Intel Fortran compiler provides fpe checking with
-fpe<n>

where n can be

| Value | Description |
|---|---|
| 0 | Floating-point invalid, divide-by-zero, and overflow exceptions are enabled. Execution is terminated. |
| 1 | All floating-point exceptions are disabled. |
| 3 | All floating-point exceptions are disabled. Floating-point underflow is gradual, unless you explicitly specify a compiler option that enables flush-to-zero. |
See man ifort, under -fpe for more discussion of these options.
Several additional floating point checking tools that may be of interest to you include:
- -fp-stack-check to check every function call to ensure that the floating-point stack is in the expected state.
- -fmath-errno to test errno after calls to math library functions.
- -fltconsistency to enable improved floating-point consistency. Floating point instructions are not reordered and are stored in their target variable after each operation.
- -IPF-fltacc to disable optimizations that affect floating-point accuracy.
Each of these will give information toward helping you determine your coding problems. But note that they restrict optimization and increase overhead. Thus your code will run slower and, in some cases, significantly slower. Use these only as needed to improve accuracy or to discover and correct your coding problems.
Checking for non-standard Fortran or C
Most computing vendors offer special features in their processors to make them more attractive than those of "the other guy". These features do not always adhere to the language standards and are generally enabled through "compiler extensions" or additional options. If you are bringing your program over from another machine, or bringing in code of unknown origin, it is a good idea to check it for non-standard language use. This is easily done with a compiler option.
For Fortran use
-e<version>

where version can be

| Version | Description |
|---|---|
| 03 | Causes the compiler to issue errors instead of warnings for nonstandard Fortran 2003. |
| 90 | Causes the compiler to issue errors instead of warnings for nonstandard Fortran 90. |
| 95 | Causes the compiler to issue errors instead of warnings for nonstandard Fortran 95. |
Similarly for C use
-std=<version>

where version can be

| Version | Description |
|---|---|
| c89 | Causes the compiler to check that the code conforms to the ISO/IEC 9899:1990 International Standard. |
| c99 | Causes the compiler to check that the code conforms to the ISO/IEC 9899:1999 International Standard. |
| gnu89 | Causes the compiler to check that the code conforms to ISO C90 plus GNU extensions. (Default for C.) |
| gnu++98 | Causes the compiler to check that the code conforms to the 1998 ISO C++ standard plus GNU extensions. (Default for C++.) |
| c++0x | Causes the compiler to enable support for a number of C++0x features (see the Intel C++ Compiler Documentation for details). |
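A quick way to confirm which C standard a -std= setting actually selected is the predefined macro __STDC_VERSION__. The sketch below (c_standard_in_use is a hypothetical name) maps that macro to a label. Compiled once with -std=c89 and once with -std=c99, it should report different standards. It assumes the Intel compiler defines the macro the same way GCC does.

```c
/* Describe the C standard the compiler is targeting, based on the
   predefined macros. Under -std=c89 (or gnu89), __STDC_VERSION__ is
   not defined at all. */
const char *c_standard_in_use(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    return "C11 or later";
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    return "C99";
#elif defined(__STDC_VERSION__)
    return "C89 with Amendment 1 (C94/C95)";
#elif defined(__STDC__)
    return "C89/C90";
#else
    return "pre-standard C";
#endif
}
```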
Debugging parallel codes
When you bring your program over to Palmetto, the first step is to run it successfully as a serial job on several test cases. Once you are sure it runs correctly on a single processor, it is time to try a parallel run.
As noted in the section Compiling your parallel code, there are several ways to approach parallelism. And when taking this step, it is easy to introduce programming and logic errors. As it stands there are only a few options that will help you debug your parallel code. (Intel)
Specifying -O0 -openmp will help to debug OpenMP applications. -O0 turns off optimization, so the code the compiler generates stays close to your source.
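A common bug when a serial loop is parallelized is a shared accumulator that several threads update at once. The sketch below (sum_inverse_squares is a hypothetical name) shows the usual OpenMP fix, a reduction clause. Comparing its result with a serial build is a simple first check when a parallel result looks wrong. For the serial build, compile without -openmp (or without -fopenmp with GCC) so that the pragma is ignored.

```c
/* Sum 1/i^2 for i = 1..n. The reduction clause gives each thread a
   private copy of sum and adds the copies together at the end; without
   it, all threads would update the shared sum at once (a data race) and
   the answer could change from run to run. Without OpenMP support the
   pragma is ignored and the loop runs serially. */
double sum_inverse_squares(long n)
{
    double sum = 0.0;
    long i;
#pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= n; ++i)
        sum += 1.0 / ((double)i * (double)i);
    return sum;
}
```

Small differences in the last few digits between runs are expected, because the threads' partial sums may be combined in a different order. Large differences point to a race.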
The vectorizer and auto-parallelizer diagnostics described earlier under Diagnostic messages (-diag-enable vec[n] and -diag-enable par[n], with the same message levels n) are also useful when debugging parallel code.
Debugging examples
There are many ways to combine the options that have been shown in this section. We will provide a few simple examples to get you started.
In Fortran, we can look at an implementation of the Pade approximant
```fortran
      subroutine pade(t,z,up,ud,dupdz,j,dumdz)
      implicit double precision (a-h,o-z)
      include 'param.h'
      double precision t(ns)
      double precision z(igamma)
      double precision a(igamma),b(igamma),c(igamma),cc(igamma),
     1                 d(igamma),cb(igamma)
      double precision ud(igamma)
      double precision up(igamma)
      double precision dupdz(igamma)
      double precision dumdz(igamma)
      double precision f(6)
      common/delt/deltaz,deltat,b4
c
c     Calculate tridiagonal solution
c
      do i = 1,6
         f(i) = float(i)/0.0
      end do
      a(1) = 0.0
      do i = 2,igamma
         a(i) = f(1)
      end do
c     b(1) = f(4)
      b(1) = f(2)
      do i = 2,igamma-1
         b(i)=f(4)
      end do
c     b(igamma) = f(4)
      b(igamma) = f(2)
      do i = 1,igamma-1
         c(i) = f(1)
      end do
      c(igamma)=0.0
c
c     For the solution, define
c
      cc(1) = 1.0/b(1)
      do i = 2,igamma
         cc(i) = 1.0/(b(i)-a(i)*c(i-1)*cc(i-1))
      end do
      do i = 1,igamma
         d(i) = -c(i)*cc(i)
      end do
      cb(1) = cc(1)*dumdz(1)
      do i = 2,igamma
         cb(i) = cc(i)*(dumdz(i)-a(i)*cb(i-1))
      end do
c
c     Keep endpoint value fixed at this time.
c
      dupdz(igamma) = cb(igamma)
      do i = igamma-1, 1, -1
         dupdz(i) = d(i)*dupdz(i+1) + cb(i)
      end do
c
      return
      end
```
with a typo built in. Compiling it with
```
ifort -g -fpe0 -traceback -o pade pade.f
```
will tell the compiler to generate symbolic information in the executable and terminate with an error on any floating point exception. Call traceback information will be included. A possible outcome is:
```
[myuserid@user001 pade]$ pade
 Function exp(2.0*(zp-tp))
 Deltaz = 0.5000 Deltat = 0.0500
forrtl: error (73): floating divide by zero
Image              PC                Routine    Line       Source
pade               00000000004040F0  Unknown    Unknown    Unknown
pade               00000000004032F7  Unknown    Unknown    Unknown
pade               0000000000402B98  Unknown    Unknown    Unknown
pade               0000000000402B02  Unknown    Unknown    Unknown
libc.so.6          0000003CDB41D8A4  Unknown    Unknown    Unknown
pade               0000000000402A29  Unknown    Unknown    Unknown
Aborted
```
In this example we get some information - that we have a divide by zero. But the location isn't given in easily readable form. See the section below on using gdb and idb for some additional help.
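Besides compiler options, a routine can check its own arithmetic. The C sketch below is an illustrative port, and tridiag_solve is a hypothetical name. It follows the same forward sweep and back substitution as the pade subroutine, but it tests each pivot before dividing by it. A zero or non-finite pivot is therefore reported by row number instead of showing up later as an unlabeled arithmetic exception. It complements -fpe0 rather than replacing it.

```c
#include <math.h>
#include <stdio.h>

/* Solve a tridiagonal system: a = sub-diagonal (a[0] unused),
   b = diagonal, c = super-diagonal (c[n-1] unused), r = right-hand side.
   cc and cb are caller-supplied scratch arrays of length n playing the
   same roles as cc and cb in the Fortran routine. Returns -1 on success,
   or the 0-based row whose pivot was zero, infinite or NaN. */
int tridiag_solve(int n, const double *a, const double *b, const double *c,
                  const double *r, double *x, double *cc, double *cb)
{
    int i;
    if (n <= 0)
        return -1;
    /* Forward sweep: check each pivot before taking its reciprocal. */
    for (i = 0; i < n; ++i) {
        double pivot = b[i];
        if (i > 0)
            pivot -= a[i] * c[i - 1] * cc[i - 1];
        if (pivot == 0.0 || !isfinite(pivot)) {
            fprintf(stderr, "tridiag_solve: bad pivot %g in row %d\n", pivot, i);
            return i;
        }
        cc[i] = 1.0 / pivot;
        cb[i] = cc[i] * (r[i] - (i > 0 ? a[i] * cb[i - 1] : 0.0));
    }
    /* Back substitution, as in the dupdz loop of the Fortran code. */
    x[n - 1] = cb[n - 1];
    for (i = n - 2; i >= 0; --i)
        x[i] = -c[i] * cc[i] * x[i + 1] + cb[i];
    return -1;
}
```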
In C, we can take a look at Wikipedia's Bellman-Ford example
```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Let INFINITY be an integer value not likely to be
   confused with a real weight, even a negative one. */
#define INFINITY ((1 << 14)-1)

typedef struct {
    int source;
    int dest;
    int weight;
} Edge;

void BellmanFord(Edge edges[], int edgecount, int nodecount, int source)
{
    int *distance = malloc(nodecount * sizeof *distance);
    int i, j;
    for (i=0; i < nodecount; ++i)
        distance[i] = INFINITY;
    distance[source] = 0;
    for (i=0; i < nodecount; ++i) {
        for (j=0; j < edgecount; ++j) {
            if (distance[edges[j].source] != INFINITY) {
                int new_distance = distance[edges[j].source] + edges[j].weight;
                if (new_distance < distance[edges[j].dest])
                    distance[edges[j].dest] = new_distance;
            }
        }
    }
    for (i=0; i < edgecount; ++i) {
        if (distance[edges[i].dest] > distance[edges[i].source] + edges[i].weight) {
            puts("Negative edge weight cycles detected!");
            free(distance);
            return;
        }
    }
    for (i=0; i < nodecount; ++i) {
        printf("The shortest distance between nodes %d and %d is %d\n",
               source, i, distance[i]);
    }
    free(distance);
    return;
}

int main(void)
{
    /* This test case should produce the distances 2, 4, 7, -2, and 0. */
    Edge edges[10] = {{0,1, 5}, {0,2, 8}, {0,3, -4}, {1,0, -2},
                      {2,1, -3}, {2,3, 9}, {3,1, 7}, {3,4, 2},
                      {4,0, 6}, {4,2, 7}};
    BellmanFord(edges, 100, 5, 4);
    return 0;
}
```
(with a little typo built in here as well). Compiling it with
```
icc -g -traceback -o bf bf.c
```
will result in the error
```
[myuserid@user001 bellman-ford]$ bf
Segmentation fault
```
We need a little more help from gdb or idb.
Using gdb or idb
The examples above aren't too helpful. But if we run them through the gdb or idb debuggers, we have more options. In the Pade example:
```
[myuserid@user001 pade]$ gdb pade
GNU gdb Red Hat Linux (6.5-25.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) run
Starting program: /home/myuserid/pade/pade
Function exp(2.0*(zp-tp))
Deltaz = 0.5000 Deltat = 0.0500
Program received signal SIGFPE, Arithmetic exception.
0x00000000004040f0 in pade (
t=(0.05000000074505806, 0.10000000149011612, 0.15000000223517418, .....
z=(0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10, ....
...........
dupdz=(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..... , 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))) at pade.f:19
19 f(i) = float(i)/0.0
Current language: auto; currently fortran
(gdb)
```
Here we see that gdb gives us the values of the routine's arguments along with the location (routine, line number, and text) of the problematic statement. It shows us the divide by zero at line 19. At this point (the gdb prompt) we can instruct gdb to do other things, such as set breakpoints and rerun, step line by line through the code, print out information and variable values, and so forth.
In the Bellman-Ford example, using idb we see:
```
[myuserid@user001 bellman-ford]$ idb bf
Intel(R) Debugger for applications running on Intel(R) 64, Version 10.1-32 , Build 20070828
------------------
object file name: bf
Reading symbols from /home/myuserid/bellman-ford/bf...done.
(idb) run
Starting program: /home/myuserid/bellman-ford/bf
Program received signal SIGSEGV
BellmanFord (edges=0x7fff0771ea10, edgecount=100, nodecount=5, source=4) at bfe.c:27
27      if (new_distance < distance[edges[j].dest])
(idb) print j
$1 = 11
(idb)
```
Here we see that the program died on line 27, inside a call to BellmanFord, and the parameter values at the time of the call are listed. In main, edges was declared with 10 elements, but we passed in an edgecount of 100. The program died when the index j into edges reached 11.
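One way to keep a count like this from drifting out of step with the array is to let the compiler compute it. The sketch below restates the example's algorithm as a function that fills a caller-supplied array, so its output can be checked. It uses the standard nodecount - 1 relaxation passes and adds a hypothetical COUNT_OF macro. Note that COUNT_OF only works on a true array in scope, such as edges in main. It does not work on the pointer that BellmanFord receives.

```c
#define INF ((1 << 14) - 1)   /* same sentinel value as the example */

typedef struct {
    int source;
    int dest;
    int weight;
} Edge;

/* Hypothetical macro: element count of a true array (not a pointer),
   computed by the compiler so it always matches the declaration. */
#define COUNT_OF(arr) (sizeof (arr) / sizeof (arr)[0])

/* Bellman-Ford writing shortest distances from source into distance.
   Returns 0 on success, or 1 if a negative-weight cycle is found. */
int bellman_ford(const Edge *edges, int edgecount, int nodecount,
                 int source, int *distance)
{
    int i, j;
    for (i = 0; i < nodecount; ++i)
        distance[i] = INF;
    distance[source] = 0;
    /* nodecount - 1 passes of relaxing every edge. */
    for (i = 0; i < nodecount - 1; ++i)
        for (j = 0; j < edgecount; ++j)
            if (distance[edges[j].source] != INF) {
                int nd = distance[edges[j].source] + edges[j].weight;
                if (nd < distance[edges[j].dest])
                    distance[edges[j].dest] = nd;
            }
    /* Any edge that can still be relaxed means a negative cycle. */
    for (j = 0; j < edgecount; ++j)
        if (distance[edges[j].source] != INF &&
            distance[edges[j].source] + edges[j].weight < distance[edges[j].dest])
            return 1;
    return 0;
}
```

In main, with int dist[5], the call would become bellman_ford(edges, (int)COUNT_OF(edges), 5, 4, dist). That fills dist with 2, 4, 7, -2 and 0, the distances the comment in the original main calls for.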
For more information, see man gdb and man idb; the two are very similar. A Google search on "gdb tutorials" turns up many good explanations of how to use gdb, and an Intel® Debugger (IDB) Manual is also available.
Resources
- Intel® Debugger (IDB) Manual on CITI server
- Intel® Debugger (IDB) Manual at Intel site
- Google search on "gdb tutorials"
