Translation units

We’ve already seen that splitting your code into multiple files keeps it organized and makes it easier to read. In Fortran we had multiple .f90 files, one of which declared a program using the program and end program statements. In Python we had multiple .py files, each of which could have a main section that is ignored when the file is imported.

In C we will also have one file that declares a main function, and then write other files that contain whatever specialized functions or data types we need. In C these extra files generally come in pairs of .h and .c files. The reason for this is best seen from a few examples.

Header only version

Consider moving the two supporting functions to their own file:

/* File: MatFuncs.h
 * Author: Ian May
 * Purpose: First look at writing your own header
 */

#ifndef MATFUNCS_H
#define MATFUNCS_H

void InitMatrix(int m, int n, double A[m][n])
{
  for (int i=0; i<m; i++) {
    for (int j=0; j<n; j++) {
      A[i][j] = ((double) i*j+1);
    }
  }
}

void MatVecMult(int m, int n, double A[m][n], double x[n], double b[m])
{
  for (int i=0; i<m; i++) {
    b[i] = 0;
    for (int j=0; j<n; j++) {
      b[i] += A[i][j]*x[j];
    }
  }
}

#endif


then modifying the main program to:

/* File: MatMain.c
 * Author: Ian May
 * Purpose: First look at including your own header
 */

#include <stdlib.h>
#include <stdio.h>

/* Notice that this one uses quotes, why? */
#include "MatFuncs.h"

int main()
{
  /* Array of 4 doubles, uninitialized */
  double b[4];

  /* Array of 3 doubles, set to initial values */
  double x[] = {3.2, 4.5, -6.2};

  /* 4x3 array of doubles */
  double A[4][3];

  /* Fill A with something interesting */
  InitMatrix(4,3,A);

  /* Store A*x into b, interpreted as matrix-vector product */
  MatVecMult(4,3,A,x,b);

  /* Print out result */
  printf("b = (%1.2f,%1.2f,%1.2f,%1.2f)\n",b[0],b[1],b[2],b[3]);

  return 0;
}


Save both files to the same directory. You can then compile them with:

gcc -g -Wall -Wextra -pedantic MatMain.c -o mat.ex

Running the executable ./mat.ex will give the same output as before. There are a few critical observations to discuss:

  • We did not compile each file separately

  • There are some funny preprocessor directives surrounding the contents of the header file

  • The include directive in the main file uses quotes instead of angle brackets

The last point is because quotes tell the preprocessor to look for the file MatFuncs.h locally first (in the directory of the file doing the including) rather than only in the system include directories. The first two points are due to essentially the same thing: the #include directive is effectively a fancy copy-paste operation.

The preprocessor, as its name might suggest, runs before the compiler actually gets called. It interacts with the system to find the file MatFuncs.h and directly pastes its contents into MatMain.c before giving control to the compiler, hence there is only one compiler call.

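The preprocessor directives in MatFuncs.h (#ifndef, #define, and #endif) form an include guard: if the header is pasted into the same file more than once, everything after the first copy is skipped.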
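As a rough sketch of what the guard does, the snippet below pastes a small header into a single file twice by hand, the way two #include lines would. The header name Point.h, the Point type, and the PointSum function are hypothetical and are not part of the course files.

```c
/* First (simulated) inclusion of a hypothetical header Point.h.
 * POINT_H is not defined yet, so the contents are kept. */
#ifndef POINT_H
#define POINT_H
typedef struct {
  double x, y;
} Point;
#endif

/* Second (simulated) inclusion. POINT_H is already defined, so the
 * preprocessor discards everything up to the matching #endif. */
#ifndef POINT_H
#define POINT_H
typedef struct {
  double x, y;
} Point;
#endif

/* Hypothetical function that uses the type exactly once defined above */
double PointSum(Point p)
{
  return p.x + p.y;
}
```

Removing the #ifndef, #define, and #endif lines from both copies leaves two competing definitions of Point, which the compiler should reject.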

Exercise: Simulate the setting where MatFuncs.h gets indirectly included by another file by just repeating the include call in the main program. Try compiling with, and without, the include guard in place.

Header/source file pair

The above header-only method is a clunky solution. All functions and their definitions are pasted into the main file. This means that any time we change the main file, we need to re-compile all of the functions defined in our include file.

To support incremental compiling (among other benefits) it is common to split the function declarations from their definitions. We’ll put the former into the header file, and the latter into a .c file. To avoid headaches, you should always give this .c file the same name as the header file it corresponds to. Make a new directory and consider the following files.

The header file:

/* File: MatFuncs.h
 * Author: Ian May
 * Purpose: First look at translation units
 */

#ifndef MATFUNCS_H
#define MATFUNCS_H

/* Just declare that a function with this name and these arguments will exist */
void InitMatrix(int m, int n, double A[m][n]);
void MatVecMult(int m,int n,double[m][n],double[n],double[m]);

#endif


The matching source file:

/* File: MatFuncs.c
 * Author: Ian May
 * Purpose: First look at translation units
 */

#include "MatFuncs.h"

void InitMatrix(int m, int n, double A[m][n])
{
  for (int i=0; i<m; i++) {
    for (int j=0; j<n; j++) {
      A[i][j] = ((double) i*j+1);
    }
  }
}

void MatVecMult(int m, int n, double A[m][n], double x[n], double b[m])
{
  for (int i=0; i<m; i++) {
    b[i] = 0;
    for (int j=0; j<n; j++) {
      b[i] += A[i][j]*x[j];
    }
  }
}


The main file (unchanged):

/* File: MatMain.c
 * Author: Ian May
 * Purpose: First look at translation units
 */

#include <stdlib.h>
#include <stdio.h>

/* Notice that this one uses quotes, why? */
#include "MatFuncs.h"

int main()
{
  /* Array of 4 doubles, uninitialized */
  double b[4];

  /* Array of 3 doubles, set to initial values */
  double x[] = {3.2, 4.5, -6.2};

  /* 4x3 array of doubles */
  double A[4][3];

  /* Fill A with something interesting */
  InitMatrix(4,3,A);

  /* Store A*x into b, interpreted as matrix-vector product */
  MatVecMult(4,3,A,x,b);

  /* Print out result */
  printf("b = (%1.2f,%1.2f,%1.2f,%1.2f)\n",b[0],b[1],b[2],b[3]);

  return 0;
}


Save all three files to the same directory. You can then compile them with:

gcc -g -Wall -Wextra -pedantic -c MatMain.c
gcc -g -Wall -Wextra -pedantic -c MatFuncs.c
gcc -g -Wall -Wextra -pedantic -o mat.ex MatMain.o MatFuncs.o

There are several observations to make:

  • The include guards are only in the header file, since this is the only thing that has any risk of multiple inclusion

  • The first two compiler calls create the object (.o) files, and can be called in either order

  • Each object file generated by the compiler is produced from one translation unit

  • The third call invokes the linker that generates the executable file

    • The order of the object files in this call doesn’t matter, but it is good practice to put dependencies after the things that depend on them. This will matter when linking external libraries.

The contents of each translation unit are (mostly) distinct. Each one may have undefined symbols that are resolved later. The linker is responsible for joining these units together and resolving symbols between them (linkage). Consider reading more at the Translation unit wiki page.
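The split between declaring and defining a function can be seen on a small scale even inside one file. In the sketch below, which uses hypothetical functions Dot and NormSquared that are not part of the course files, the compiler is able to handle the call to Dot having seen only its declaration. In a multi-file build the definition could sit in a different translation unit, and the linker would connect the two.

```c
/* Declaration only, as a header would provide. The compiler now knows the
 * name, return type, and argument types, but has not seen the body. */
double Dot(int n, const double x[], const double y[]);

/* This function compiles using nothing but the declaration above */
double NormSquared(int n, const double x[])
{
  return Dot(n, x, x);
}

/* Definition, as the matching .c file would provide. If this lived in
 * another translation unit, the call above would remain an undefined
 * symbol in the object file until link time. */
double Dot(int n, const double x[], const double y[])
{
  double sum = 0.0;
  for (int i = 0; i < n; i++) {
    sum += x[i] * y[i];
  }
  return sum;
}
```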

Exercise: Try removing the line #include "MatFuncs.h" from the main file and re-compiling. What happens and why?

Exercise: Add a static function to the file MatFuncs.c that prints the contents of the matrix. Try calling it from inside MatFuncs.c as well as MatMain.c.

A common usage for this header/source pattern is to declare a custom data type and functions that act on it in the header file, and then define the actual function bodies in the source file. This means that files including the header know what the custom data type looks like, and what functions act on it. This separation preserves the ability to build incrementally, and crucially allows file-local functions to be defined without risk of collision later.
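As a minimal sketch of a file-local function, consider a source file containing the two hypothetical functions below (neither is part of the course files). The static keyword gives IsEven internal linkage, so another translation unit could define its own IsEven without a clash, while CountEven keeps the default external linkage and would be the one declared in the header.

```c
/* Internal linkage: only code in this translation unit can call this */
static int IsEven(int v)
{
  return v % 2 == 0;
}

/* External linkage (the default): this is what a header would declare */
int CountEven(int n, const int v[])
{
  int count = 0;
  for (int i = 0; i < n; i++) {
    if (IsEven(v[i])) {
      count++;
    }
  }
  return count;
}
```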

Structures and custom data types

In the last section we alluded to defining custom data types, and the utility of placing them in a header/source pair of files. These are also called derived types since they are built up from primitive types (and perhaps other derived types). We’ve actually already seen one derived type, the array. We’ll understand the label derived in this context later.

Perhaps the most common derived type (other than the array) is the struct. These are essentially wrappers around a group of data members that should always be kept together. The generic format for declaring a struct is as follows:

/* Declare */
struct S {
  int n;
  char *s;
  double q;
};
/* Create several instances */
struct S a = {2, "Hello", 2.3e-4};
struct S b = {4, "World", -1.3e4};
/* Access entries */
printf("a contains: {%d, %s, %f}\n", a.n, a.s, a.q);
printf("b contains: {%d, %s, %f}\n", b.n, b.s, b.q);

Notice that we need to use struct S as the type identifier each time we declare another variable of this type. This can be avoided by instead writing the definition as

/* Declare */
typedef struct {
  int n;
  char *s;
  double q;
} S;

S a = {2, "Hello", 2.3e-4};

We’ll look at a pattern using this typedef approach later.
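The fragments above leave out the surrounding function, so they will not compile on their own. A small compilable sketch using the typedef form might look like the following. The helper PrintS is hypothetical, the string member is declared const char * here because it points at string literals, and %g is used in place of %f so that small values stay readable.

```c
#include <stdio.h>

/* Same members as the struct above, named through a typedef */
typedef struct {
  int n;
  const char *s;
  double q;
} S;

/* Hypothetical helper: print one instance using member access.
 * Returns whatever printf returns (the number of characters written). */
int PrintS(const char *name, S v)
{
  return printf("%s contains: {%d, %s, %g}\n", name, v.n, v.s, v.q);
}
```

Inside main one could then write S a = {2, "Hello", 2.3e-4}; followed by PrintS("a", a);.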

A full example

These ideas are best illustrated with an example. Let’s re-write the previous code using a struct that stores the matrix in column-major format instead of the default row-major form that C uses. This could make interoperability with Fortran easier (more on this later though). We’ll put the definition of the struct into a header:

/* File: ColMajorMat.h
 * Author: Ian May
 * Purpose: Second look at translation units, now splitting a struct
 *          definition apart from functions that act on it
 */

#ifndef COLMAJORMAT_H
#define COLMAJORMAT_H

/* Define a structure that holds the matrix dimensions and a flattened array */
typedef struct {
  int m, n;
  double *data;
} ColMajorMat;

/* Declare functions that will act on this structure */
void InitMatrix(ColMajorMat);
void MatVecMult(ColMajorMat,double[*],double[*]);

#endif


Then define some useful functions that act on this struct in the corresponding source file:

/* File: ColMajorMat.c
 * Author: Ian May
 * Purpose: Define several functions to interact with the column
 *          major array example
 */

#include "ColMajorMat.h"

void InitMatrix(ColMajorMat A)
{
  for (int i=0; i<A.m; i++) {
    for (int j=0; j<A.n; j++) {
      A.data[j*A.m + i] = ((double) i*j+1);
    }
  }
}

void MatVecMult(ColMajorMat A, double x[A.n], double b[A.m])
{
  for (int i=0; i<A.m; i++) {
    b[i] = 0;
    for (int j=0; j<A.n; j++) {
      b[i] += A.data[j*A.m + i]*x[j];
    }
  }
}


We’ll use these in a sample program:

/* File: Main.c
 * Author: Ian May
 * Purpose: Second look at translation units. Now bring in a column-major
 *          array format defined in a struct from another file
 */

#include <stdlib.h>
#include <stdio.h>

/* Notice that this one uses quotes, why? */
#include "ColMajorMat.h"

int main()
{
  /* Array of 4 doubles, uninitialized */
  double b[4];

  /* Array of 3 doubles, set to initial values */
  double x[] = {3.2, 4.5, -6.2};

  /* 4x3 array of doubles */
  double data[12];
  ColMajorMat A = {4, 3, data};

  /* Fill A with something interesting */
  InitMatrix(A);

  /* Store A*x into b, interpreted as matrix-vector product */
  MatVecMult(A,x,b);

  /* Print out result */
  printf("b = (%1.2f,%1.2f,%1.2f,%1.2f)\n",b[0],b[1],b[2],b[3]);

  return 0;
}


Finally, typing in all of these compile commands is getting tedious. Let’s use a makefile from here on out:

CC = gcc
CFLAGS = -g -Wall -Wextra -pedantic -Wno-vla-parameter

OBJ = Main.o ColMajorMat.o

ColMajor.ex: $(OBJ)
	$(CC) $(CFLAGS) -o $@ $(OBJ)

%.o: %.c
	$(CC) $(CFLAGS) -c $<

.PHONY: clean

clean:
	rm -f ColMajor.ex *.o *~


To compile, simply call the makefile via:

make

There are a few things introduced in this example that we haven’t seen so far. Some points worth noting:

  • We are flattening our matrix into a rank one array of length m*n. Can you see from the functions that deal with our matrix struct how the matrix layout is recovered?

  • The data field in our struct has a * in it. This says that the member data is a pointer to the type double. We’re going to cover pointers next.

  • The file ColMajorMat.c includes ColMajorMat.h, why?

  • We need to allocate the data array before creating an instance of the struct. Can we come up with a more comfortable way to accomplish this?

  • The instance A of our matrix is passed by value to each of the functions. Is this good practice? Why can we still write to the data array in the InitMatrix function?

    • It’s not terribly good practice, but we’re going to fix it momentarily.

  • We can refer to entries inside the struct while declaring VLA arguments.
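One way to explore the pass-by-value question is with a small experiment. The sketch below repeats the struct definition from ColMajorMat.h and adds a hypothetical function TryToModify, which is not part of the course files. The function receives a copy of the struct, so changing a dimension affects only that copy, but the copied pointer still refers to the caller's storage.

```c
/* Same layout as the struct in ColMajorMat.h */
typedef struct {
  int m, n;
  double *data;
} ColMajorMat;

/* Hypothetical experiment: A is a copy of the caller's struct */
void TryToModify(ColMajorMat A)
{
  A.m = 0;          /* changes only the local copy of the dimension */
  A.data[0] = 42.0; /* writes through the copied pointer into the caller's array */
}
```

After calling TryToModify(A) from main, the caller's A.m is unchanged while the first entry of its data array has been overwritten.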

Overall, packing this information into a struct has simplified how we deal with this custom column-major layout. We can place all functions that care directly about the layout of the array into the ColMajorMat.c source file, away from the end user.
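To make the layout arithmetic concrete, the sketch below compares where entry (i, j) of an m-by-n matrix lands in a flattened array under each convention. The function names RowMajorOffset and ColMajorOffset are hypothetical; the column-major formula is the same j*m + i expression used in ColMajorMat.c.

```c
#include <assert.h>

/* Row-major (C's native layout): each row is contiguous, so moving to
 * the next row skips n entries */
int RowMajorOffset(int m, int n, int i, int j)
{
  assert(0 <= i && i < m && 0 <= j && j < n);
  return i*n + j;
}

/* Column-major (Fortran's native layout): each column is contiguous, so
 * moving to the next column skips m entries */
int ColMajorOffset(int m, int n, int i, int j)
{
  assert(0 <= i && i < m && 0 <= j && j < n);
  return j*m + i;
}
```

For the 4x3 matrix used here, entry (1, 2) sits at offset 1*3 + 2 = 5 in row-major order and at offset 2*4 + 1 = 9 in column-major order.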

However, there are still some flaws in this programming example. We’ll resolve most of these after studying pointers in the next section.

Exercise:

  • Add a static function inside the ColMajorMat.c source file that takes care of the indexing into the special layout of our data here. Verify that you cannot use this function from the main file, despite having linked everything.

  • Try making the array dimensions incompatible. What errors can you generate? Are there things that don’t generate compiler warnings that you thought would?