.. _ch03-python-dictionaries: =============================================================== Dictionaries =============================================================== * Link to `incomplete Jupyter Notebook for this section of the notes (for you to fill out while following along with lecture) `_ * Link to `completed Jupyter Notebook for this section of the notes `_ Another useful data type built into Python is the dictionary. A dictionary is like a list, but allows more general indices. In a list, the indices have to be integers; in a dictionary they can be (almost) any type and are now called ``keys``. .. _ch03-python-key-value: Relational storage ------------------------------------------- You can think of a dictionary as a mapping between two things: * **keys**: a set of indices, * **values**: a set of values corresponding to each key. Each key maps to a value. The association of a key to a value is called a **key-value pair**. You can define an empty dictionary in two ways. One way is to use a built-in function ``dict`` .. code-block:: python >>> eng2kor = dict() or alternatively, use an empty curly brackets, ``{}`` .. code-block:: python >>> eng2kor = {} In both cases you would see the following .. code-block:: python >>> type(eng2kor) >>> print(eng2kor) {} Let's add a new pair to the dictionary, ``eng2kor``. To add one, you can use square brackets .. code-block:: python >>> eng2kor['one'] = 'hana' This creates a new key-value pair that maps from the key ``'one'`` to the value ``'hana'``. If we print the dictionary again .. code-block:: python >>> print(eng2kor) {'one': 'hana'} You can initialize a dictionary with multiple items like this: .. code-block:: python >>> eng2kor={'one':'hana','two':'dool','three':'set','four':'net'} The ``append`` method cannot be invoked on a dictionary *directly* (i.e., ``eng2kor.append('five')`` won't work). Instead, one can keep adding new pairs using .. code-block:: python >>> eng2kor['five'] = 'dasut' or using ``update`` (try ``help(dict)`` or ``dir(dict)`` to see more options) .. code-block:: python >>> eng2kor.update({'five':'dasut'}) Let's now print to see what we have defined so far .. code-block:: python >>> print(eng2kor) {'four': 'net', 'three': 'set', 'five': 'dasut', 'two': 'dool', 'one': 'hana'} The order of the key-value pairs may not look like what you probably expected. In fact, they might look different on different computers. The order of pairs in a dictionary is unpredictable. This is because the elements of a dictionary are indexed through a ``hash`` function. In spite of this, dictionaries are still **iterable**. Traversing through the dictionary will show .. code-block:: python >>> for i in eng2kor: ... print(i) ... four three five two one Here we see that traversing a dictionary runs over the *keys*, and not the *values*. Note that keys must be unique, while values may not be (there is a surjection between keys and values): .. code-block:: python >>> eng2kor[5] = 'dasut' >>> eng2kor['five'] 'dasut' >>> eng2kor[5] 'dasut' The dictionary method ``items`` returns something akin to a list of tuples which gives a useful way to iterate over the keys and values together: .. code-block:: python >>> for eng, kor in eng2kor.items(): ... print(eng, kor) ... one hana two dool three set four net five dasut 5 dasut or, to just get keys .. code-block:: python >>> for eng in eng2kor.keys(): ... print(eng) ... one two three four five 5 and finally to get only the values .. code-block:: python >>> for kor in eng2kor.values(): ... print(kor) ... hana dool set net dasut dasut We can apply some of the built-in functions and operators we learned so far to dictionaries as well: .. code-block:: python >>> len(eng2kor) 6 >>> 'one' in eng2kor True >>> 'net' in eng2kor False The second example of the ``in`` operator tells us that Python checks if the search word appears as a ``key``, but not as a ``value`` in the dictionary. To see whether something appears as a value instead of a key, is to use the method ``values`` which returns the values as a list .. code-block:: python >>> print(eng2kor.values()) dict_values(['hana', 'dool', 'set', 'net']) With this we can now search .. code-block:: python >>> 'net' in eng2kor.values() True .. _ch03-python-counter: Dictionary as a set of counters ---------------------------------------------------------------- A common string processing problem is to examine the frequency of characters or substrings. Let's see how we could use a dictionary to help keep track of characters (you'll get to look at the substring version in HW4): .. literalinclude:: ./codes/examples/dictionaries/histogram.py :language: python :linenos: :download:`Download this code <./codes/examples/dictionaries/histogram.py>` Running this in the script mode will give .. code-block:: console $ python histogram.py (a): {'a': 1, 'p': 2, 'l': 1, 'e': 1} (b): {'a': 1, 'p': 2, 'l': 1, 'e': 1} (c): False (d): True (e): dict_keys(['a', 'p', 'l', 'e']) (f): True (g): dict_values([1, 2, 1, 1]) (h): True (i): 1 (j): 159 Dictionaries and lists ---------------------------------------------------------------- Lists can only appear as values in a dictionary, but **not** keys. For example, if you try .. code-block:: python >>> t=['a','e','l'] >>> type(t) >>> d = dict() >>> d[t] Traceback (most recent call last): File "", line 1, in TypeError: unhashable type: 'list' >>> d['1'] = t >>> d {'1': ['a', 'e', 'l']} The above example confirms that lists can only be used as values. For the most part, keys need to be immutable objects (more specifically they must have a ``__hash__()`` method). Now, let's consider an application of using lists as values. Take a look at what we obtained in the last outcome, ``{'1': ['a','e','l']}``. This looks like an inverse map of the output from ``(a)`` or ``(b)``! This example tells us that we could try to create an inverse map from values to keys in a dictionary (*note* this only works if the values are all hashable types). Here is a function that inverts a dictionary: .. literalinclude:: ./codes/examples/dictionaries/invert_dictionary.py :language: python :linenos: :download:`Download this code <./codes/examples/dictionaries/invert_dictionary.py>` The result look like .. code-block:: console $ python3 invert_dictionary.py {'a': 1, 'p': 2, 'e': 1, 'l': 1} {1: ['a', 'e', 'l'], 2: ['p']} Dictionaries and memoization ------------------------------ Dictionaries are an excellent way to store the results of expensive calculations using the inputs as keys. In this way, whenever a set of inputs that have already been used are queried one can fetch the result from the dictionary. This avoids repeating potentially expensive calculations, and trades off computational effort for a larger memory footprint. For example, a naive implementation of the `Fibonacci sequence `_ looks like: .. literalinclude:: ./codes/examples/dictionaries/fibonacci.py :language: python :linenos: :download:`Download this code <./codes/examples/dictionaries/fibonacci.py>` Notice that early terms in the sequence are re-evaluated a huge number of times through those recursive calls. A dictionary can be used to cache the result for previously encountered values, which can short-circuit those recursive calls quite efficiently. .. literalinclude:: ./codes/examples/dictionaries/fibonacci_dict.py :language: python :linenos: :download:`Download this code <./codes/examples/dictionaries/fibonacci_dict.py>` In the above example, ``known`` is a dictionary that stores the Fibonacci numbers we already know. It starts with the first two terms in the sequence: ``F0=0`` and ``F1=1``, or in other words, 0 maps to 0 and 1 maps to 1. This caches every value seen along way. Note that we could use a list or array for this version. The sparse version only caches the values we explicitly ask for. This is slower on the initial calls, but has a smaller memory footprint. This is particularly useful if you know that the function will be queried for the same inputs many times. To compare CPU runtime in seconds, we can do as follows: .. literalinclude:: ./codes/examples/dictionaries/run_fibonacci.py :language: python :linenos: :download:`Download this code <./codes/examples/dictionaries/run_fibonacci.py>` Running this will produce something akin to this: .. code-block:: console First runs through (sec): fibonacci = 0.7044386863708496 fibonacci_dict = 1.4066696166992188e-05 fibonacci_dict_sparse = 0.7033944129943848 Second runs through (sec): fibonacci = 0.7008662223815918 fibonacci_dict = 3.337860107421875e-06 fibonacci_dict_sparse = 1.430511474609375e-06 Global variables ---------------------------------------------------------------- Consider running ``fibonacci_dict.py`` directly. ``known`` is initialized *outside* any function, and thus belongs to the module namespace. Variables declared as such are global from the perspective of this file, and can be accessed by any function here (compare this to module variables in Fortran). When we import that file as a module in ``run_fibonacci.py``, the dictionary known will live in the ``fibonacci_dict`` namespace. The following example illustrates how global variables behave and how they could be modified within a local function: .. literalinclude:: ./codes/examples/dictionaries/global.py :language: python :linenos: :download:`Download this code <./codes/examples/dictionaries/global.py>` The result is: .. code-block:: console (a): True (b): False (c): True (d): True (e): False (f): True An example study ---------------------------------------------------------------- Consider the following example which has been originally adopted from `Dive Into Python 3 `_ and modified for the class. This routine takes a computer file size in kilobytes as an input and converts it approximately to a human-readable form, e.g., 1TB, or 931 GiB, etc. .. literalinclude:: ./codes/examples/dictionaries/humansize.py :language: python :linenos: :download:`Download this code <./codes/examples/dictionaries/humansize.py>` The output from running the routine looks like .. code-block:: console $ python3 humansize.py (a) with the multiple of 1000 bytes: 1 TB (b) with the multiple of 1024 bytes: 931 GiB