File input and output

Writing data to files works in essentially the same way as writing to the terminal. In fact, writing to files in C is very similar to writing to files in Python (or perhaps the reverse is more appropriate).

Just as we did in the Python section, we will limit our discussion to writing ASCII data. On POSIX systems a file opened in binary mode behaves identically to one opened in text mode. As usual, Windows does its own strange thing (text mode translates line endings), which doesn’t concern us here.

The stdio.h header that we’ve been including to write to the terminal is also the appropriate header to use for file I/O. This header provides a typedef called FILE that absorbs any platform-specific considerations. Opening a file with read permissions proceeds as:

const char* fname = "filename.ext";
FILE *fp1 = fopen(fname, "r");
/* Or */
FILE *fp2 = fopen("anotherOne.ext", "r");

Note that we are declaring a pointer to the type FILE. This allows the file pointer (fp1 or fp2 above) to be passed by reference to any function that interacts with it.

Closing the file associated with a file pointer proceeds as:

fclose(fp1);
fclose(fp2);

Note the similarity between fopen / fclose, and malloc / free. As mentioned in the dynamic memory section, I would advise always writing the function fclose as soon as you write fopen to help prevent (some) bugs from coming up.

Files can be opened in different modes by changing the second argument. The supported modes are:

  • "r" to read from a file

  • "w" to write to a file, creating it if needed and discarding any existing contents

  • "a" to append to a file

    • If no file exists with the given name, this operates as "w"

  • "r+" to read from a file (extended)

  • "w+" to write to a file (extended)

  • "a+" to append to a file (extended)

The extended modes allow mixed input/output access, but have some touchy stipulations that we aren’t going to talk about here.

Formatted input/output

Just as printf and scanf allow us to write to (read from) the terminal, the functions fprintf and fscanf allow us to write to (read from) files. They behave exactly as before, except that a file pointer is given as the first argument, before the format string.

Let’s start by writing some files out. Consider this sample code:

/* File: fileOut.c
 * Author: Ian May
 * Purpose: Demonstrate writing to a file
 */

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  /* Set filename and open */
  const char* fname = "sim.dat";
  FILE *fp = fopen(fname,"w");
  if (fp == NULL) {
    /* fopen returns NULL on failure, e.g. no write permission */
    perror(fname);
    return EXIT_FAILURE;
  }

  /* Write something to the file */
  fprintf(fp, "This file is %s\n", fname);
  fprintf(fp, "%d %le %lf\n", 7, 2.7e-3, 13.46);

  /* Close the file */
  fclose(fp);

  return 0;
}

Then you can compile it by:

gcc fileOut.c -o fileOut.ex

Try running this program and calling cat on the generated file.

Exercise: Try changing what gets written to the file, and try using the append mode.

Exercise: Write a program that creates 10 files with names sim_00.dat, sim_01.dat, … , sim_09.dat. Have your program write a timestamp into each file.

Formatted input proceeds exactly as output, just using the "r" file mode and fscanf in place of fprintf. Consider this code, which can read files generated by the previous example:

/* File: fileIn.c
 * Author: Ian May
 * Purpose: Demonstrate reading from a very specific file
 */

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  /* Set filename and open */
  const char* fname = "sim.dat";
  FILE *fp = fopen(fname,"r");
  if (fp == NULL) {
    perror(fname);
    return EXIT_FAILURE;
  }

  /* Read from the file */
  /* %15s reads at most 15 characters, leaving room for the null-terminator */
  char s0[16], s1[16], s2[16], s3[16];
  int n;
  double p, q;
  /* fscanf returns the number of items it successfully stored */
  if (fscanf(fp, "%15s %15s %15s %15s", s0, s1, s2, s3) != 4 ||
      fscanf(fp, "%d %le %lf\n", &n, &p, &q) != 3) {
    fprintf(stderr, "Unexpected contents in %s\n", fname);
    fclose(fp);
    return EXIT_FAILURE;
  }

  /* Close the file */
  fclose(fp);

  /* Report what was read in */
  printf("The file contained:\n");
  printf("%s %s %s %s\n", s0, s1, s2, s3);
  printf("%d %e %f\n", n, p, q);

  return 0;
}

Then you can compile it by:

gcc fileIn.c -o fileIn.ex

Running this will indeed echo back to you the file that the previous example wrote (assuming you didn’t change it too much). However, this probably doesn’t look like what you were expecting. One particular question arises.

Question: Why are there four %s format specifiers when reading the first line of our special file?

Reading whole lines from a file

It is frequently easier to read a whole line from a file, then decide how to process it. In the above example we saw that the %s conversion in fscanf stops reading at any whitespace character, so a line containing spaces has to be read one word at a time. Instead, we’ll use the function fgets. Consider this small re-write of the above example:

/* File: fileIn2.c
 * Author: Ian May
 * Purpose: Demonstrate reading from a very specific file, now with fgets
 */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 256

int main(void)
{
  /* Set filename and open */
  const char* fname = "sim.dat";
  FILE *fp = fopen(fname,"r");
  if (fp == NULL) {
    perror(fname);
    return EXIT_FAILURE;
  }

  /* Read the first line from the file */
  char l1[MAX_LINE_LENGTH];
  if (fgets(l1, MAX_LINE_LENGTH, fp) == NULL) {
    fprintf(stderr, "Could not read the first line of %s\n", fname);
    fclose(fp);
    return EXIT_FAILURE;
  }
  /* Remove newline character from string if present */
  /* Checking len > 0 avoids indexing before the start of the buffer */
  size_t len = strlen(l1);
  if (len > 0 && l1[len-1] == '\n') {
    l1[len-1] = '\0';
  }

  /* Read the second line from the file */
  char l2[MAX_LINE_LENGTH];
  if (fgets(l2, sizeof(l2), fp) == NULL) {
    fprintf(stderr, "Could not read the second line of %s\n", fname);
    fclose(fp);
    return EXIT_FAILURE;
  }

  /* Extract numbers from second line */
  int n = 0;
  double p = 0., q = 0.;
  sscanf(l2, "%d %le %lf\n", &n, &p, &q);

  /* Close the file */
  fclose(fp);

  /* Report what was read in */
  printf("The file contained:\n");
  printf("%s\n", l1);
  printf("%d %e %f\n", n, p, q);

  return 0;
}

Then you can compile it by:

gcc fileIn2.c -o fileIn2.ex

This is a little better than before: we can at least read full lines from the file without having to account for whitespace so explicitly. Observe the following about fgets:

  • The first argument is a character array, or buffer, to store the read line into

  • The function reads at most N-1 characters from the line, where N is the second argument. The final character will always be a null-terminator

    • The function stops reading if a newline character is encountered

    • The function stops reading if the end of the file (EOF) is reached

  • The buffer will contain the newline character if it was found

    • The example shows one way to remove this character

Typically we will want to continue reading a file regardless of how many lines are present. The function fgets actually has a return value that we can use to accomplish this. To see how, let’s write a simplified clone of the cat command:

/* File: catClone.c
 * Author: Ian May
 * Purpose: Use fgets to write a simplified version of cat
 */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 256

int main(int argc, char **argv)
{
  /* Exit if no file was supplied */
  if (argc < 2) {
    printf("You need to supply an input file!\n");
    return EXIT_FAILURE;
  }

  /* Set filename and open */
  FILE *fp = fopen(argv[1],"r");
  if (fp == NULL) {
    /* The file may not exist or may not be readable */
    perror(argv[1]);
    return EXIT_FAILURE;
  }

  /* Buffer to store lines in */
  char line[MAX_LINE_LENGTH];

  /* Read the lines from the file until EOF */
  while (fgets(line, MAX_LINE_LENGTH, fp) != NULL) {
    /* Print the line */
    printf("%s",line);
  }

  /* Close the file */
  fclose(fp);

  return 0;
}

Then you can compile it by:

gcc catClone.c -o catClone.ex

and run it like so:

$ ./catClone.ex sim.dat

This is a pretty weird way to write a while loop compared to what we’ve seen so far. Let’s walk through how this works:

  • The fgets function returns a pointer to char. This pointer is NULL when reading the line failed (e.g. there is nothing left in the file).

  • We put the call to fgets inside the conditional statement for the while loop

    • This means fgets is called, and line is filled every time the conditional is evaluated.

    • If fgets returns NULL, the line buffer is not filled, but that doesn’t matter since the loop body doesn’t run.

Exercise: Try printing line an additional time after the while loop. Does it do what you expected?

As a brief aside, did you know that gcc can compile from stdin? This means we can pass a file from cat to gcc and get an executable back. That is, we can use our clone of the cat command to compile itself:

./catClone.ex catClone.c | gcc -x c - -o catClone2.ex

It isn’t very useful, but it’s a fun little curiosity.

Putting it all together

As a more useful example, let’s write a program that can read the input files we used for the Mathieu example back in the Fortran chapter. Recall that these are structured like this:

num_points 101
q_index 40
run_name Mathieu_101_40

Every line consists of two entries, the name and value of each parameter we want to read in. Our goal is to write a program that can extract these values from the input file. Additionally, the program should work regardless of what order lines are specified in, and regardless of whether all lines are present.

We can accomplish this in the following:

/* File: readInit.c
 * Author: Ian May
 * Purpose: Short program to read init files from the Mathieu example
 */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 256

int main(int argc, char **argv)
{
  /* Exit if no file was supplied */
  if (argc < 2) {
    printf("You need to supply an init file!\n");
    return EXIT_FAILURE;
  }

  /* Declare the relevant parameters and give them default values */
  int N = 101;
  double q = 40.;
  /* Same size as the line buffers, so any value copied from back fits */
  char runName[MAX_LINE_LENGTH];
  strcpy(runName,"default_run"); /* why is this line here? */

  /* Set filename and open */
  FILE *fp = fopen(argv[1],"r");
  if (fp == NULL) {
    perror(argv[1]);
    return EXIT_FAILURE;
  }

  /* Buffers to store lines in */
  char line[MAX_LINE_LENGTH];
  char front[MAX_LINE_LENGTH];
  char back[MAX_LINE_LENGTH];

  /* Read the lines from the file until EOF */
  while (fgets(line, MAX_LINE_LENGTH, fp) != NULL) {
    /* Process the line to see what it specifies */
    int fields = sscanf(line,"%s %s\n",front,back);
    if (fields < 2) {
      /* Blank or incomplete line: back would hold stale data */
      if (fields == 1) {
        printf("No value given for: %s\n",front);
      }
      continue;
    }
    /* We only have three settings, so an if-else chain is easy */
    if (strcmp(front,"num_points") == 0) {
      N = atoi(back);
    } else if (strcmp(front,"q_index") == 0) {
      q = atof(back);
    } else if (strcmp(front,"run_name") == 0) {
      strcpy(runName,back);
    } else {
      printf("Specified parameter unrecognized: %s\n",front);
    }
  }

  /* Close the file */
  fclose(fp);

  /* Report back the found values */
  printf("We read in the following parameters:\n");
  printf("%d grid points\n",N);
  printf("A q value of %f\n",q);
  printf("A run name of %s\n",runName);

  return 0;
}

Then you can compile it by:

gcc readInit.c -o readInit.ex

Try running it as:

./readInit.ex mathieu.init

Before examining how this program works, try running it a few times. Try mixing up the lines in the init file, try messing up the spelling of the parameters. What happens if the supplied values don’t make sense? What happens if you specify the same parameter multiple times?

Now, let’s observe how this program actually works:

  • Just like in the catClone program, we read lines using fgets until we reach the end of the file

  • We split each given line into two strings

    • The first is compared to the parameter names we care about

    • If the first matches, then the second is used to set that parameter

    • If the line has more than two fields separated by spaces, then the later ones are ignored

  • Since we keep checking all possible matches, the order of the parameters doesn’t matter, and specifying them multiple times will just overwrite the earlier values.

  • Unrecognized lines are reported to the user, and ignored.