.. _ch06_c_io:

======================
File input and output
======================

Writing data to files works in essentially the same way as writing to the
terminal. In fact, writing to files in C is very similar to writing to files
in Python (or perhaps the reverse is more appropriate).

Just as we did in the Python section, we will limit our discussion to writing
ASCII data. On POSIX systems, reading and writing binary data behaves
identically to the text case. As usual, Windows does its own strange thing,
which doesn't concern us here.

The ``stdio.h`` header that we've been including to write to the terminal is
also the appropriate header to use for file I/O. This header provides a
``typedef`` called ``FILE`` that absorbs any platform-specific considerations.
Opening a file for reading proceeds as:

.. code-block:: C

   const char* fname = "filename.ext";
   FILE *fp1 = fopen(fname, "r");
   /* Or */
   FILE *fp2 = fopen("anotherOne.ext", "r");

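Note that we are declaring a *pointer* to the type ``FILE``. Every function
that interacts with the file takes this pointer, so the underlying ``FILE``
object is effectively passed by reference. If the file cannot be opened,
``fopen`` returns ``NULL``.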

Closing the file associated with a file pointer proceeds as:

.. code-block:: C

   fclose(fp1);

Note the similarity between ``fopen`` / ``fclose``, and ``malloc`` / ``free``.
As mentioned in the dynamic memory section, I would advise always writing the
function ``fclose`` as soon as you write ``fopen`` to help prevent (some) bugs
from coming up.
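
As a minimal sketch of this pairing, the following program checks the pointer
returned by ``fopen`` before using it, and closes the file as soon as it is
done with it. The helper name ``can_open_for_reading`` is made up for this
illustration, and the call to ``perror`` assumes a POSIX system, where a failed
``fopen`` records the reason for the failure.

.. code-block:: C

   #include <stdio.h>

   /* Hypothetical helper: returns 1 if the file could be opened for reading,
      0 otherwise. */
   int can_open_for_reading(const char *fname)
   {
       FILE *fp = fopen(fname, "r");
       if (fp == NULL) {
           return 0;          /* nothing was opened, so nothing to close */
       }
       fclose(fp);            /* pair every successful fopen with an fclose */
       return 1;
   }

   int main(void)
   {
       const char *fname = "filename.ext";
       if (can_open_for_reading(fname)) {
           printf("%s opened and closed successfully\n", fname);
       } else {
           perror(fname);     /* on POSIX, reports why fopen failed */
       }
       return 0;
   }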

Files can be opened in different modes by changing the second argument. The
supported modes are:

* ``"r"`` to read from a file

* ``"w"`` to write to a file

  - If a file with the given name already exists, its contents are discarded

* ``"a"`` to append to a file

  - If no file exists with the given name this operates as ``"w"``

* ``"r+"`` to read from a file (extended)

* ``"w+"`` to write to a file (extended)

* ``"a+"`` to append to a file (extended)

The *extended* modes allow mixed input/output access, but have some touchy
stipulations that we aren't going to talk about here.
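
To make the difference between ``"w"`` and ``"a"`` concrete, here is a small
sketch that writes to the same scratch file in both modes and counts the lines
after each step. The helpers ``write_line`` and ``count_lines`` and the file
name ``modes_demo.txt`` are invented for this illustration. It borrows
``fprintf``, which is introduced in the next section, and ``fgetc``, which
reads a single character.

.. code-block:: C

   #include <stdio.h>

   /* Hypothetical helper: writes one line of text to fname using the given
      mode. Returns 0 on success, -1 if the file could not be opened. */
   int write_line(const char *fname, const char *mode, const char *text)
   {
       FILE *fp = fopen(fname, mode);
       if (fp == NULL) {
           return -1;
       }
       fprintf(fp, "%s\n", text);
       fclose(fp);
       return 0;
   }

   /* Hypothetical helper: counts the newline characters in fname
      (-1 if the file could not be opened). */
   int count_lines(const char *fname)
   {
       FILE *fp = fopen(fname, "r");
       int c, n = 0;
       if (fp == NULL) {
           return -1;
       }
       while ((c = fgetc(fp)) != EOF) {
           if (c == '\n') {
               n++;
           }
       }
       fclose(fp);
       return n;
   }

   int main(void)
   {
       const char *fname = "modes_demo.txt";   /* assumed scratch file name */

       write_line(fname, "w", "first");    /* creates the file */
       write_line(fname, "a", "second");   /* adds to the end */
       printf("after w then a: %d lines\n", count_lines(fname));

       write_line(fname, "w", "third");    /* discards the earlier contents */
       printf("after another w: %d lines\n", count_lines(fname));

       remove(fname);                      /* clean up the scratch file */
       return 0;
   }

Running this should report two lines after the append and one line after the
second ``"w"``, since that mode starts the file over.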

^^^^^^^^^^^^^^^^^^^^^^^
Formatted input/output
^^^^^^^^^^^^^^^^^^^^^^^

Just as ``printf`` and ``scanf`` allow us to write to (read from) the
terminal, the functions ``fprintf`` and ``fscanf`` allow us to write to (read
from) files. They behave exactly as before, but now a file pointer is given
as the first argument, before the format string and the remaining arguments.
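
As a quick sketch of the argument order, the following program writes two
numbers to a scratch file and reads them straight back. The helper
``round_trip`` and the file name ``roundtrip_demo.txt`` are made up for this
illustration. The return value of ``fscanf`` is the number of values it
managed to convert, which is a handy thing to check.

.. code-block:: C

   #include <stdio.h>

   /* Hypothetical helper: writes an integer and a double to fname, reads them
      back, and returns how many values fscanf converted (2 on success). */
   int round_trip(const char *fname, int *n_out, double *x_out)
   {
       FILE *fp = fopen(fname, "w");
       int nread;
       if (fp == NULL) {
           return -1;
       }
       fprintf(fp, "%d %f\n", 42, 2.5);   /* like printf, with fp first */
       fclose(fp);

       fp = fopen(fname, "r");
       if (fp == NULL) {
           return -1;
       }
       nread = fscanf(fp, "%d %lf", n_out, x_out);   /* like scanf, with fp first */
       fclose(fp);
       return nread;
   }

   int main(void)
   {
       int n = 0;
       double x = 0.0;
       if (round_trip("roundtrip_demo.txt", &n, &x) == 2) {
           printf("read back n = %d, x = %f\n", n, x);
       }
       remove("roundtrip_demo.txt");      /* clean up the scratch file */
       return 0;
   }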

Let's start by writing some files out. Consider this sample code:

   .. literalinclude:: ./codes/fileOut.c
      :language: c
      :linenos:

:download:`Download this code <./codes/fileOut.c>`

Then you can compile it by:

.. code-block:: console
                
   gcc fileOut.c -o fileOut.ex

Try running this program and calling ``cat`` on the generated file.

**Exercise:** Try changing what gets written to the file, and try using the
append mode.

**Exercise:** Write a program that creates 10 files with names ``sim_00.dat``,
``sim_01.dat``, ... , ``sim_09.dat``. Have your program write a timestamp into
each file.

Formatted input proceeds exactly as output, just using the ``"r"`` file mode
and ``fscanf`` in place of ``fprintf``. Consider this code, which can read files
generated by the previous example:

   .. literalinclude:: ./codes/fileIn.c
      :language: c
      :linenos:

:download:`Download this code <./codes/fileIn.c>`

Then you can compile it by:

.. code-block:: console
                
   gcc fileIn.c -o fileIn.ex

Running this will indeed echo back to you the file that the previous example wrote
(assuming you didn't change it too much). However, this probably doesn't look like
what you were expecting. One particular question arises.

**Question:** Why are there four ``%s`` format specifiers when reading the first
line of our special file?

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Reading whole lines from a file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is frequently easier to read a whole line from a file, then decide how to
process it. In the above example we saw that the ``%s`` specifier of ``fscanf``
stops reading at any whitespace character, so a line containing spaces has to
be accounted for explicitly, one field at a time. Instead, we'll use the
function ``fgets``. Consider this small rewrite of the above example:

   .. literalinclude:: ./codes/fileIn2.c
      :language: c
      :linenos:

:download:`Download this code <./codes/fileIn2.c>`

Then you can compile it by:

.. code-block:: console
                
   gcc fileIn2.c -o fileIn2.ex

This is a little better than before: we can at least read full lines from the file
without having to account for whitespace so explicitly. Observe the following about
``fgets``:

* The first argument is a character array, or buffer, to store the read line into

* The function reads at most ``N-1`` characters from the line, where ``N`` is the
  second argument. The stored string is always null-terminated

  - The function stops reading after a newline character is encountered

  - The function stops reading if the end of the file (EOF) is reached

* The third argument is the file pointer to read from

* The buffer *will* contain the newline character if it was found

  - The example shows one way to remove this character
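
For comparison, here is one possible way to strip the trailing newline, which
may differ from the approach in the example above. It is a sketch using
``strcspn`` from ``string.h``, which returns the number of characters before
the first newline (or the full length of the string if there is none). The
helper name ``strip_newline`` and the file name ``fgets_demo.txt`` are invented
for this illustration.

.. code-block:: C

   #include <stdio.h>
   #include <string.h>

   /* Hypothetical helper: replaces the first newline in buf (if any)
      with a null-terminator. */
   void strip_newline(char *buf)
   {
       buf[strcspn(buf, "\n")] = '\0';
   }

   int main(void)
   {
       char line[64];                             /* fgets stores at most 63 chars */
       FILE *fp = fopen("fgets_demo.txt", "w");   /* assumed scratch file name */
       if (fp == NULL) {
           return 1;
       }
       fprintf(fp, "hello world\n");
       fclose(fp);

       fp = fopen("fgets_demo.txt", "r");
       if (fp == NULL) {
           return 1;
       }
       if (fgets(line, sizeof line, fp) != NULL) {
           strip_newline(line);
           printf("[%s]\n", line);                /* brackets show the newline is gone */
       }
       fclose(fp);
       remove("fgets_demo.txt");                  /* clean up the scratch file */
       return 0;
   }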

Typically we will want to continue reading a file regardless of how many lines are
present. The function ``fgets`` actually has a return value that we can use to
accomplish this. To see how, let's write a simplified clone of the ``cat`` command:

   .. literalinclude:: ./codes/catClone.c
      :language: c
      :linenos:

:download:`Download this code <./codes/catClone.c>`

Then you can compile it by:

.. code-block:: console
                
   gcc catClone.c -o catClone.ex

and run it like so:

.. code-block:: console

   ./catClone.ex sim.dat

This is a pretty weird way to write a while loop compared to what we've seen so far.
Let's walk through how this works:

* The ``fgets`` function returns a pointer to ``char``. This pointer is ``NULL``
  when reading the line failed (e.g. there is nothing left in the file).

* We put the call to ``fgets`` *inside* the conditional statement for the while loop

  - This means ``fgets`` is called, and ``line`` is filled every time the conditional
    is evaluated.

  - If ``fgets`` returns ``NULL``, the ``line`` buffer is not filled, but that doesn't
    matter since the loop body doesn't run.
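
The same loop shape works for other tasks too. As a sketch, the program below
uses it to count the characters in a text file instead of printing them. The
helper name ``count_chars`` is made up for this illustration, and the buffer
size of 128 is an arbitrary choice.

.. code-block:: C

   #include <stdio.h>
   #include <string.h>

   /* Hypothetical helper: returns the number of characters in a text file,
      or -1 if it cannot be opened. */
   long count_chars(const char *fname)
   {
       char line[128];
       long total = 0;
       FILE *fp = fopen(fname, "r");
       if (fp == NULL) {
           return -1;
       }
       /* fgets is called each time the condition is evaluated; the loop ends
          when it returns NULL. */
       while (fgets(line, sizeof line, fp) != NULL) {
           total += (long)strlen(line);   /* a long line simply arrives in pieces */
       }
       fclose(fp);
       return total;
   }

   int main(int argc, char *argv[])
   {
       if (argc < 2) {
           fprintf(stderr, "usage: %s filename\n", argv[0]);
           return 1;
       }
       printf("%ld characters\n", count_chars(argv[1]));
       return 0;
   }

Note that a line longer than the buffer is not a problem here: ``fgets`` just
returns it over several iterations.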

**Exercise:** Try printing ``line`` an additional time after the while loop. Does it do
what you expected?

As a brief aside, did you know that ``gcc`` can compile from ``stdin``? This means we can
pass a file from ``cat`` to ``gcc`` and get an executable back. That is, we can use our
clone of the cat command to compile itself:

.. code-block:: console

   ./catClone.ex catClone.c | gcc -x c - -o catClone2.ex

It isn't very useful, but it's a fun little curiosity.

^^^^^^^^^^^^^^^^^^^^^^^^
Putting it all together
^^^^^^^^^^^^^^^^^^^^^^^^

As a more useful example, let's write a program that can read the input files we used for
the Mathieu example back in the Fortran chapter. Recall that these are structured like this:

.. code-block:: text

   num_points 101
   q_index 40
   run_name Mathieu_101_40

Every line consists of two entries, the name and value of each parameter we want to read in.
Our goal is to write a program that can extract these values from the input file.
Additionally, the program should work regardless of what order lines are specified in, and
regardless of whether all lines are present.

We can accomplish this with the following program:

   .. literalinclude:: ./codes/readInit.c
      :language: c
      :linenos:

:download:`Download this code <./codes/readInit.c>`

Then you can compile it by:

.. code-block:: console
                
   gcc readInit.c -o readInit.ex

Try running it as:

.. code-block:: console

   ./readInit.ex mathieu.init

Before examining how this program works, try running it a few times. Try mixing up the lines
in the init file, and try messing up the spelling of the parameters. What happens if the supplied
values don't make sense? What happens if you specify the same parameter multiple times?

Now, let's observe how this program actually works:

* Just like in the ``catClone`` program, we read lines using ``fgets`` until we reach the end
  of the file

* We split each given line into two strings

  - The first is compared to the parameter names we care about

  - If the first matches, then the second is used to set that parameter

  - If the line has more than two fields separated by spaces, then the later ones are ignored

* Since we keep checking all possible matches, the order of the parameters doesn't matter,
  and specifying them multiple times will just overwrite the earlier values.

* Unrecognized lines are reported to the user, and ignored.
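
The split-and-compare step can be sketched on its own, without any file
handling. The version below is only an illustration and is not necessarily how
``readInit.c`` does it: it assumes ``sscanf`` for splitting, ``strcmp`` for
matching, and ``atoi`` for converting the integer values. The ``struct params``
container and the helper ``apply_line`` are invented names.

.. code-block:: C

   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>

   /* Hypothetical container for the three parameters in the init file. */
   struct params {
       int num_points;
       int q_index;
       char run_name[64];
   };

   /* Hypothetical helper: applies one "name value" line to p.
      Returns 1 if the line was recognized, 0 otherwise. */
   int apply_line(const char *line, struct params *p)
   {
       char name[64], value[64];

       /* Each %63s stops at whitespace, so any extra fields are ignored. */
       if (sscanf(line, "%63s %63s", name, value) != 2) {
           return 0;
       }
       if (strcmp(name, "num_points") == 0) {
           p->num_points = atoi(value);
       } else if (strcmp(name, "q_index") == 0) {
           p->q_index = atoi(value);
       } else if (strcmp(name, "run_name") == 0) {
           strcpy(p->run_name, value);   /* fits: both buffers hold 64 chars */
       } else {
           return 0;
       }
       return 1;
   }

   int main(void)
   {
       const char *lines[] = { "q_index 40\n", "num_points 101\n",
                               "run_name Mathieu_101_40\n", "bogus 7\n" };
       struct params p = { 0, 0, "" };
       int i;

       for (i = 0; i < 4; i++) {
           if (!apply_line(lines[i], &p)) {
               printf("ignoring unrecognized line: %s", lines[i]);
           }
       }
       printf("%d %d %s\n", p.num_points, p.q_index, p.run_name);
       return 0;
   }

In a full program, each ``line`` would come from the ``fgets`` loop rather than
from a hard-coded array. Keep in mind that ``atoi`` returns 0 for text that is
not a number, so this sketch cannot tell a bad value apart from a genuine zero.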