.. _ch06_c_io:

======================
File input and output
======================

Writing data to files works in essentially the same way as writing to the
terminal. In fact, writing to files in C is very similar to writing to files in
Python (or perhaps the reverse is more appropriate). Just as we did in the
Python section, we will limit our discussion to writing ASCII data. Writing
binary data behaves identically on all POSIX systems. As usual, Windows does
its own strange thing, which doesn't concern us here.

The ``stdio.h`` header that we've been including to write to the terminal is
also the appropriate header to use for file I/O. This header provides a
``typedef`` called ``FILE`` that absorbs any platform-specific considerations.
Opening a file with read permissions proceeds as:

.. code-block:: C

    const char* fname = "filename.ext";
    FILE *fp1 = fopen(fname, "r");
    /* Or */
    FILE *fp2 = fopen("anotherOne.ext", "r");

Note that we are declaring a *pointer* to the type ``FILE``. This allows the
file pointer to be passed by reference to any functions that interact with it.
Closing the file associated with a file pointer ``fp`` proceeds as:

.. code-block:: C

    fclose(fp);

Note the similarity between ``fopen`` / ``fclose``, and ``malloc`` / ``free``.
As mentioned in the dynamic memory section, I would advise always writing the
call to ``fclose`` as soon as you write the call to ``fopen`` to help prevent
(some) bugs from coming up.

Files can be opened in different modes by changing the second argument. The
supported modes are:

* ``"r"`` to read from a file
* ``"w"`` to write to a file
* ``"a"`` to append to a file

  - If no file exists with the given name this operates as ``"w"``

* ``"r+"`` to read from a file (extended)
* ``"w+"`` to write to a file (extended)
* ``"a+"`` to append to a file (extended)

The *extended* modes allow mixed input/output access, but have some touchy
stipulations that we aren't going to talk about here.

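
To see the difference between the ``"w"`` and ``"a"`` modes in practice, here is a minimal sketch. The helper names ``write_line`` and ``count_lines`` are made up for illustration and are not part of the course codes. The sketch also checks whether ``fopen`` returned ``NULL``, which it does when the file cannot be opened, and it uses ``fprintf``, which is covered in the next section.

```c
#include <stdio.h>

/* Hypothetical helper: open `path` in the given mode, write one line of
 * text, and close the file again. Returns 0 on success, -1 on failure. */
int write_line(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {            /* fopen returns NULL if the open failed */
        perror(path);
        return -1;
    }

    int ok = fprintf(fp, "%s\n", text) >= 0;

    /* Every successful fopen is paired with exactly one fclose */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/* Hypothetical helper: count the newline characters in a file.
 * Returns -1 if the file could not be opened for reading. */
long count_lines(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    long n = 0;
    int c;                       /* int, not char, so EOF can be detected */
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            n++;
    }

    fclose(fp);
    return n;
}
```

Calling ``write_line`` on the same file with ``"w"`` and then ``"a"`` leaves two lines in it, while a second call with ``"w"`` throws away the old contents and leaves only one.
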

^^^^^^^^^^^^^^^^^^^^^^^
Formatted input/output
^^^^^^^^^^^^^^^^^^^^^^^

Just as ``printf`` and ``scanf`` allow us to write to (read from) the terminal,
the functions ``fprintf`` and ``fscanf`` allow us to write to (read from)
files. They behave exactly as before, but now a file pointer is given before
the remaining arguments.

Let's start by writing some files out. Consider this sample code:

.. literalinclude:: ./codes/fileOut.c
    :language: c
    :linenos:

:download:`Download this code <./codes/fileOut.c>`

You can compile it with:

.. code-block:: console

    gcc fileOut.c -o fileOut.ex

Try running this program and calling ``cat`` on the generated file.

**Exercise:** Try changing what gets written to the file, and try using the
append mode.

**Exercise:** Write a program that creates 10 files with names ``sim_00.dat``,
``sim_01.dat``, ... , ``sim_09.dat``. Have your program write a timestamp into
each file.

Formatted input proceeds exactly as output, just using the ``"r"`` file mode
and ``fscanf`` in place of ``fprintf``. Consider this code, which can read
files generated by the previous example:

.. literalinclude:: ./codes/fileIn.c
    :language: c
    :linenos:

:download:`Download this code <./codes/fileIn.c>`

You can compile it with:

.. code-block:: console

    gcc fileIn.c -o fileIn.ex

Running this will indeed echo back to you the file that the previous example
wrote (assuming you didn't change it too much). However, the code probably
doesn't look like what you were expecting. One particular question arises.

**Question:** Why are there four ``%s`` format specifiers when reading the
first line of our special file?

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Reading whole lines from a file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is frequently easier to read a whole line from a file, then decide how to
process it. In the above example we saw that ``fscanf`` stops reading at any
whitespace character, and in particular stops at a space unless the format
string explicitly accounts for it.
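
To make the whitespace behavior concrete, here is a minimal sketch that reads a file one ``%s`` at a time. The helper name ``read_words`` and the ``WORD_LEN`` limit are assumptions made for illustration and do not come from the course codes.

```c
#include <stdio.h>

#define WORD_LEN 32   /* assumed maximum word length, including the '\0' */

/* Hypothetical helper: read up to max_words whitespace-separated words from
 * the file at `path` into `words`. Returns the number of words read, or -1
 * if the file could not be opened. */
int read_words(const char *path, char words[][WORD_LEN], int max_words)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int n = 0;
    /* "%31s" skips leading whitespace, stores at most 31 characters (one
     * less than WORD_LEN), and stops at the next space, tab, or newline.
     * fscanf returns the number of items it converted, so 1 means that
     * one word was read. */
    while (n < max_words && fscanf(fp, "%31s", words[n]) == 1)
        n++;

    fclose(fp);
    return n;
}
```

A line such as ``Hello from C`` comes back as three separate words, so reading it in a single ``fscanf`` call needs one ``%s`` for each word on the line.
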
To read whole lines we'll use the function ``fgets`` instead of ``fscanf``.
Consider this small rewrite of ``fileIn.c``:

.. literalinclude:: ./codes/fileIn2.c
    :language: c
    :linenos:

:download:`Download this code <./codes/fileIn2.c>`

You can compile it with:

.. code-block:: console

    gcc fileIn2.c -o fileIn2.ex

This is a little better than before: we can at least read full lines from the
file without having to account for whitespace so explicitly. Observe the
following about ``fgets``:

* The first argument is a character array, or buffer, to store the read line
  into
* The function reads at most ``N-1`` characters from the line, where ``N`` is
  the second argument. The stored string is always null-terminated

  - The function stops reading if a newline character is encountered
  - The function stops reading if the end of the file (EOF) is reached

* The buffer *will* contain the newline character if it was found

  - The example shows one way to remove this character

Typically we will want to continue reading a file regardless of how many lines
are present. The function ``fgets`` actually has a return value that we can use
to accomplish this. To see how, let's write a simplified clone of the ``cat``
command:

.. literalinclude:: ./codes/catClone.c
    :language: c
    :linenos:

:download:`Download this code <./codes/catClone.c>`

You can compile it with:

.. code-block:: console

    gcc catClone.c -o catClone.ex

and run it like so:

.. code-block:: console

    $ ./catClone.ex sim.dat

This is a pretty weird way to write a while loop compared to what we've seen so
far. Let's walk through how this works:

* The ``fgets`` function returns a pointer to ``char``. This pointer is
  ``NULL`` when reading the line failed (e.g. there is nothing left in the
  file).
* We put the call to ``fgets`` *inside* the conditional statement for the while
  loop

  - This means ``fgets`` is called, and ``line`` is filled, every time the
    conditional is evaluated.
  - If ``fgets`` returns ``NULL``, the ``line`` buffer is not filled, but that
    doesn't matter since the loop body doesn't run.

**Exercise:** Try printing ``line`` an additional time after the while loop.
Does it do what you expected?

As a brief aside, did you know that ``gcc`` can compile from ``stdin``? This
means we can pass a file from ``cat`` to ``gcc`` and get an executable back.
That is, we can use our clone of the cat command to compile itself:

.. code-block:: console

    ./catClone.ex catClone.c | gcc -x c - -o catClone2.ex

It isn't very useful, but it's a fun little curiosity.

^^^^^^^^^^^^^^^^^^^^^^^^
Putting it all together
^^^^^^^^^^^^^^^^^^^^^^^^

As a more useful example, let's write a program that can read the input files
we used for the Mathieu example back in the Fortran chapter. Recall that these
are structured like this:

.. code-block:: console

    num_points 101
    q_index 40
    run_name Mathieu_101_40

Every line consists of two entries: the name and the value of a parameter we
want to read in. Our goal is to write a program that can extract these values
from the input file. Additionally, the program should work regardless of the
order in which the lines are specified, and regardless of whether all lines are
present. We can accomplish this in the following:

.. literalinclude:: ./codes/readInit.c
    :language: c
    :linenos:

:download:`Download this code <./codes/readInit.c>`

You can compile it with:

.. code-block:: console

    gcc readInit.c -o readInit.ex

Try running it as:

.. code-block:: console

    ./readInit.ex mathieu.init

Before examining how this program works, try running it a few times. Try mixing
up the lines in the init file, and try messing up the spelling of the
parameters. What happens if the supplied values don't make sense? What happens
if you specify the same parameter multiple times?

Now, let's observe how this program actually works:

* Just like in the ``catClone`` program, we read lines using ``fgets`` until we
  reach the end of the file
* We split each given line into two strings

  - The first is compared to the parameter names we care about
  - If the first matches, then the second is used to set that parameter
  - If the line has more than two fields separated by spaces, then the later
    ones are ignored

* Since we keep checking all possible matches, the order of the parameters
  doesn't matter, and specifying them multiple times will just overwrite the
  earlier values.
* Unrecognized lines are reported to the user, and ignored.
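
The steps listed above can be sketched in a few lines of C. This is only one possible way to do it, and the actual ``readInit.c`` may split and compare the lines differently. The ``Params`` struct, the helper names ``apply_line`` and ``read_params``, the buffer sizes, and the use of ``sscanf``, ``strcmp``, and ``atoi`` are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three parameters in the init file */
typedef struct {
    int  num_points;
    int  q_index;
    char run_name[64];
} Params;

/* Apply one "name value" line to p.
 * Returns 1 if a known parameter was set, 0 if the line was not recognized. */
int apply_line(const char *line, Params *p)
{
    char name[64], value[64];

    /* Split the line into its first two whitespace-separated fields.
     * Anything after the second field is ignored. */
    if (sscanf(line, "%63s %63s", name, value) != 2)
        return 0;

    /* Check the name against every parameter we care about */
    if (strcmp(name, "num_points") == 0) {
        p->num_points = atoi(value);
    } else if (strcmp(name, "q_index") == 0) {
        p->q_index = atoi(value);
    } else if (strcmp(name, "run_name") == 0) {
        strcpy(p->run_name, value);   /* fits: value holds at most 63 chars */
    } else {
        return 0;
    }
    return 1;
}

/* Read every line of the file at `path` and apply it to p.
 * Returns the number of lines that set a parameter, or -1 if the file
 * could not be opened. */
int read_params(const char *path, Params *p)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char line[256];
    int applied = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (apply_line(line, p))
            applied++;
        else
            fprintf(stderr, "Ignoring line: %s", line);
    }

    fclose(fp);
    return applied;
}
```

Because each line is handled on its own, the lines can appear in any order, and a parameter that is given twice simply keeps the last value. Note that ``atoi`` does no error checking, so a value that doesn't make sense as an integer is quietly read as ``0`` in this sketch.
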