Translation units
We’ve already seen that splitting your code into multiple files
keeps you more organized and makes your code easier to read.
In Fortran we had multiple .f90 files, one of which declared a program
using the program and end program statements. In Python we had multiple
.py files, each of which could have a main section that would get
ignored upon importation.
In C we will also have one file that defines a main function, and
then write other files that contain whatever specialized functions or
data types we need. In C these extra files generally come in pairs of
.h and .c files. The reason for this is best seen from a few
examples.
Header only version
Consider moving the two supporting functions to their own file:
/* File: MatFuncs.h
 * Author: Ian May
 * Purpose: First look at writing your own header
 */

#ifndef MATFUNCS_H
#define MATFUNCS_H

void InitMatrix(int m, int n, double A[m][n])
{
  for (int i=0; i<m; i++) {
    for (int j=0; j<n; j++) {
      A[i][j] = ((double) i*j+1);
    }
  }
}

void MatVecMult(int m, int n, double A[m][n], double x[n], double b[m])
{
  for (int i=0; i<m; i++) {
    b[i] = 0;
    for (int j=0; j<n; j++) {
      b[i] += A[i][j]*x[j];
    }
  }
}

#endif
then modifying the main program to:
/* File: MatMain.c
 * Author: Ian May
 * Purpose: First look at including your own header
 */

#include <stdlib.h>
#include <stdio.h>

/* Notice that this one uses quotes, why? */
#include "MatFuncs.h"

int main()
{
  /* Array of 4 doubles, uninitialized */
  double b[4];

  /* Array of 3 doubles, set to initial values */
  double x[] = {3.2, 4.5, -6.2};

  /* 4x3 array of doubles */
  double A[4][3];

  /* Fill A with something interesting */
  InitMatrix(4,3,A);

  /* Store A*x into b, interpreted as matrix-vector product */
  MatVecMult(4,3,A,x,b);

  /* Print out result */
  printf("b = (%1.2f,%1.2f,%1.2f,%1.2f)\n",b[0],b[1],b[2],b[3]);

  return 0;
}
Download these to the same directory. Then you can compile them by:
gcc -g -Wall -Wextra -pedantic MatMain.c -o mat.ex
Running the executable ./mat.ex
will give the same output as before.
There are a few critical observations to discuss:
- We did not compile each file separately
- There are some funny preprocessor directives surrounding the contents of the header file
- The include directive in the main file uses quotes instead of angle brackets
The last point is because quotes tell the preprocessor to look for the
file MatFuncs.h locally first (with gcc, in the directory of the file
doing the including) before falling back to the system include
directories that angle brackets search. The first two points are due to
essentially the same thing: the #include directive is effectively a
fancy copy-paste operation.
The preprocessor, as its name might suggest, runs before the compiler
actually gets called. It interacts with the system to find the file
MatFuncs.h and directly pastes its contents into MatMain.c
before giving control to the compiler, hence there is only one compiler
call.
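The preprocessor directives in MatFuncs.h (#ifndef, #define and #endif)
form an include guard. If the header gets pasted in a second time, the
macro MATFUNCS_H is already defined, so everything between #ifndef and
#endif is skipped and the functions are not defined twice.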
Exercise: Simulate the setting where MatFuncs.h
gets indirectly
included by another file by just repeating the include call in the main
program. Try compiling with, and without, the include guard in place.
Header/source file pair
The above header-only method is a clunky solution. All functions and their definitions are pasted into the main file. This means that any time we change the main file, we need to re-compile all of the functions defined in our include file.
To support incremental compiling (among other benefits) it is common to
split the function declarations from their definitions. We’ll put the
former into the header file, and the latter into a .c
file. To avoid
headaches, you should always give this .c
file the same name as the
header file it corresponds to. Make a new directory and consider the
following files.
The header file:
/* File: MatFuncs.h
 * Author: Ian May
 * Purpose: First look at translation units
 */

#ifndef MATFUNCS_H
#define MATFUNCS_H

/* Just declare that a function with this name and these arguments will exist */
void InitMatrix(int m, int n, double A[m][n]);
void MatVecMult(int m,int n,double[m][n],double[n],double[m]);

#endif
The matching source file:
/* File: MatFuncs.c
 * Author: Ian May
 * Purpose: First look at translation units
 */

#include "MatFuncs.h"

void InitMatrix(int m, int n, double A[m][n])
{
  for (int i=0; i<m; i++) {
    for (int j=0; j<n; j++) {
      A[i][j] = ((double) i*j+1);
    }
  }
}

void MatVecMult(int m, int n, double A[m][n], double x[n], double b[m])
{
  for (int i=0; i<m; i++) {
    b[i] = 0;
    for (int j=0; j<n; j++) {
      b[i] += A[i][j]*x[j];
    }
  }
}
The main file (unchanged):
/* File: MatMain.c
 * Author: Ian May
 * Purpose: First look at translation units
 */

#include <stdlib.h>
#include <stdio.h>

/* Notice that this one uses quotes, why? */
#include "MatFuncs.h"

int main()
{
  /* Array of 4 doubles, uninitialized */
  double b[4];

  /* Array of 3 doubles, set to initial values */
  double x[] = {3.2, 4.5, -6.2};

  /* 4x3 array of doubles */
  double A[4][3];

  /* Fill A with something interesting */
  InitMatrix(4,3,A);

  /* Store A*x into b, interpreted as matrix-vector product */
  MatVecMult(4,3,A,x,b);

  /* Print out result */
  printf("b = (%1.2f,%1.2f,%1.2f,%1.2f)\n",b[0],b[1],b[2],b[3]);

  return 0;
}
Download these to the same directory. Then you can compile them by:
gcc -g -Wall -Wextra -pedantic -c MatMain.c
gcc -g -Wall -Wextra -pedantic -c MatFuncs.c
gcc -g -Wall -Wextra -pedantic -o mat.ex MatMain.o MatFuncs.o
There are several observations to make:
- The include guards are only in the header file, since this is the only thing that has any risk of multiple inclusion
- The first two compiler calls create the object (.o) files, and can be called in either order
  - Each object file generated by the compiler is built from one translation unit
- The third call invokes the linker that generates the executable file
  - The order of the object files in this call doesn’t matter, but it is good practice to put dependencies after things that depend on them. This will matter when linking external libraries.
The contents of each translation unit are (mostly) distinct. Each one may have undefined symbols that are resolved later. The linker is responsible for joining these units together and resolving symbols between them (linkage). Consider reading more at the Translation unit wiki page.
Exercise: Try removing the line #include "MatFuncs.h"
from the
main file and re-compiling. What happens and why?
Exercise: Add a static function to the file MatFuncs.c that prints
the contents of the matrix. Try calling it from inside MatFuncs.c as well
as from MatMain.c.
A common usage for this header/source pattern is to declare a custom data type and functions that act on it in the header file, and then define the actual function bodies in the source file. This means that files including the header know what the custom data type looks like, and what functions act on it. This separation preserves the ability to build incrementally, and crucially allows file-local functions to be defined without risk of collision later.
Structures and custom data types
In the last section we alluded to defining custom data types, and the utility of placing them in a header/source pair of files. These are also called derived types since they are built up from primitive types (and perhaps other derived types). We’ve actually already seen one derived type, the array. We’ll understand the label derived in this context later.
Perhaps the most common derived type (other than the array) is the struct.
These are essentially wrappers around a group of data members that should
always be kept together. The generic format for declaring a struct is as
follows:
/* Declare */
struct S {
  int n;
  char *s;
  double q;
};

/* Create several instances */
struct S a = {2, "Hello", 2.3e-4};
struct S b = {4, "World", -1.3e4};

/* Access entries */
printf("a contains: {%d, %s, %f}\n", a.n, a.s, a.q);
printf("b contains: {%d, %s, %f}\n", b.n, b.s, b.q);
Notice that we need to use struct S
as the type identifier each time
we declare another variable of this type. This can be avoided by instead
writing the definition as
/* Declare */
typedef struct {
  int n;
  char *s;
  double q;
} S;

S a = {2, "Hello", 2.3e-4};
We’ll look at a pattern using this typedef
approach later.
A full example
These ideas are best illustrated with an example. Let’s re-write the previous code using a struct that stores the matrix in column-major format instead of the default row-major form that C uses. This could make interoperability with Fortran easier (more on this later though). We’ll put the definition of the struct into a header:
/* File: ColMajorMat.h
 * Author: Ian May
 * Purpose: Second look at translation units, now splitting a struct
 *          definition apart from functions that act on it
 */

#ifndef COLMAJORMAT_H
#define COLMAJORMAT_H

/* Define a structure that holds the matrix dimensions and a flattened array */
typedef struct {
  int m, n;
  double *data;
} ColMajorMat;

/* Declare functions that will act on this structure */
void InitMatrix(ColMajorMat);
void MatVecMult(ColMajorMat,double[*],double[*]);

#endif
Then define some useful functions that act on this struct in the corresponding source file:
/* File: ColMajorMat.c
 * Author: Ian May
 * Purpose: Define several functions to interact with the column
 *          major array example
 */

#include "ColMajorMat.h"

void InitMatrix(ColMajorMat A)
{
  for (int i=0; i<A.m; i++) {
    for (int j=0; j<A.n; j++) {
      A.data[j*A.m + i] = ((double) i*j+1);
    }
  }
}

void MatVecMult(ColMajorMat A, double x[A.n], double b[A.m])
{
  for (int i=0; i<A.m; i++) {
    b[i] = 0;
    for (int j=0; j<A.n; j++) {
      b[i] += A.data[j*A.m + i]*x[j];
    }
  }
}
We’ll use these in a sample program:
/* File: Main.c
 * Author: Ian May
 * Purpose: Second look at translation units. Now bring in a column-major
 *          array format defined in a struct from another file
 */

#include <stdlib.h>
#include <stdio.h>

/* Notice that this one uses quotes, why? */
#include "ColMajorMat.h"

int main()
{
  /* Array of 4 doubles, uninitialized */
  double b[4];

  /* Array of 3 doubles, set to initial values */
  double x[] = {3.2, 4.5, -6.2};

  /* 4x3 array of doubles */
  double data[12];
  ColMajorMat A = {4, 3, data};

  /* Fill A with something interesting */
  InitMatrix(A);

  /* Store A*x into b, interpreted as matrix-vector product */
  MatVecMult(A,x,b);

  /* Print out result */
  printf("b = (%1.2f,%1.2f,%1.2f,%1.2f)\n",b[0],b[1],b[2],b[3]);

  return 0;
}
Finally, typing in all of these compile commands is getting tedious. Let’s use a makefile from here on out:
CC = gcc
CFLAGS = -g -Wall -Wextra -pedantic -Wno-vla-parameter

OBJ = Main.o ColMajorMat.o

ColMajor.ex: $(OBJ)
	$(CC) $(CFLAGS) -o $@ $(OBJ)

%.o: %.c
	$(CC) $(CFLAGS) -c $<

.PHONY: clean

clean:
	rm -f ColMajor.ex *.o *~
To compile, simply call the makefile via:
make
There are a few things introduced in this example that we haven’t seen so far. Some useful commentary is:
- We are flattening our matrix into a rank one array of length m*n. Can you see from the functions that deal with our matrix struct how the matrix layout is recovered?
- The data field in our struct has a * in it. This says that the member data is a pointer to the type double. We’re going to cover pointers next.
- The file ColMajorMat.c includes ColMajorMat.h, why?
- We need to allocate the data array before creating an instance of the struct. Can we come up with a more comfortable way to accomplish this?
- The instance A of our matrix is passed by value to each of the methods. Is this good practice? Why can we still write to the data array in the InitMatrix function?
  - It’s not terribly good practice, but we’re going to fix it momentarily.
- We can refer to entries inside the struct while declaring VLA arguments
Overall, packing this information into a struct has simplified how we deal with this
custom column-major layout. We can place all functions that care directly about the
layout of the array into the ColMajorMat.c
source file, away from the end user.
However, there are still some flaws in this programming example. We’ll resolve most of these after studying pointers in the next section.
Exercise:
- Add a static function inside the ColMajorMat.c source file that takes care of the indexing into the special layout of our data here. Verify that you cannot use this function from the main file, despite having linked everything.
- Try making the array dimensions incompatible. What errors can you generate? Are there things that don’t generate compiler warnings that you thought would?