Python scripts and modules
A Python script is a collection of commands in a file designed to be executed like a program. The file can of course contain functions and import various modules, but the idea is that it will be run or executed from the command line or from within a Python interactive shell to perform a specific task. Often a script first contains a set of function definitions and then has the main program that might call the functions.
Consider the following example, saved as circle.py:
"""
A module for computing properties of circles

Functions in this module return properties of a circle as determined by
its radius

Included functions:
area -- Returns area of circle
circumference -- Returns circumference of circle
"""

import numpy as np

def area(radius=1.0):
    """
    Compute area from radius

    Arguments:
    radius -- radius of circle (default 1.0)
    """
    area = np.pi*radius**2
    return area

def circumference(radius=1.0):
    """
    Compute circumference from radius

    Arguments:
    radius -- radius of circle (default 1.0)
    """

    circumference = 2.*np.pi*radius
    return circumference

# This block only runs when the file is executed directly
if __name__ == "__main__":
    print('area of a circle with r = 1 is ', area())
    print('circumference of a circle with r = 1 is ', circumference())
    print('area of a circle with r = 2 is ', area(2.0))
    print('circumference of a circle with r = 2 is ', circumference(2.0))

# This line is not guarded, so it also runs on import
print('this is a call to the circle module!')
Note that in the above example there is a line checking if the file is being called as a script:
if __name__ == "__main__":
Here __name__ is a special variable set by the interpreter. If the file is run by the interpreter directly then __name__ will be set to the string '__main__'. Otherwise, it will be set to the module name, that is, the filename with the extension removed. This allows a *.py file to behave differently when imported as opposed to being run directly. We demonstrate this below.
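As a minimal sketch of the same check, the snippet below wraps it in a small helper. The name describe_context is hypothetical and used only for illustration; it is not part of circle.py.

```python
def describe_context(name):
    """Report how code with the given __name__ value is being used."""
    if name == "__main__":
        return "run directly as a script"
    return f"imported as the module {name!r}"


# The interpreter sets __name__ for us; here we simply report it.
print(describe_context(__name__))
```

Saved to a file and run with python, this should report that it was run directly. Imported from another file, it would report the module name instead.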
Let’s start by running circle.py directly in the terminal:
$ python circle.py
area of a circle with r = 1 is 3.141592653589793
circumference of a circle with r = 1 is 6.283185307179586
area of a circle with r = 2 is 12.566370614359172
circumference of a circle with r = 2 is 12.566370614359172
this is a call to the circle module!
Alternatively, you can import the file as a module (see Importing modules below for more about this):
>>> import circle
this is a call to the circle module!
Note that there is a slight difference in outputs depending on how you call this file. When it is imported as a module, any statements in the main section are not executed. However, any lines not guarded by this conditional will be executed, hence in the example above we still see the line this is a call to the circle module! printed out even when importing it. To make this special __name__ variable more clear try the following:
>>> import circle
>>> circle.__name__
'circle'
This means that the calculations of the area and circumference are not executed when imported.
You can also see that the example module has docstrings present throughout it. Try out the following in an interactive session to see how these are presented to the user:
>>> help(circle)
Help on module circle:

NAME
    circle - A module for computing properties of circles

DESCRIPTION
    Functions in this module return properties of a circle as determined by
    its radius

    Included functions:
    area -- Returns area of circle
    circumference -- Returns circumference of circle

FUNCTIONS
    area(radius=1.0)
        Compute area from radius

        Arguments:
        radius -- radius of circle (default 1.0)

    circumference(radius=1.0)
        Compute circumference from radius

        Arguments:
        radius -- radius of circle (default 1.0)

FILE
    /home/<some path>/circle.py
Similarly, you can access the help for individual functions:
>>> help(circle.area)
Help on function area in module circle:

area(radius=1.0)
    Compute area from radius

    Arguments:
    radius -- radius of circle (default 1.0)
If you refer back to circle.py, can you see how this was generated from the docstring(s)? A nice reference on how to structure docstrings can be found in PEP 257. Also try
>>> dir(circle)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'area', 'circumference', 'np']
>>> type(circle)
<class 'module'>
In addition, any variables or functions defined in the file are available as attributes of the module, e.g.,
>>> circle.area(3.)
28.274333882308138
If you don’t include the parentheses, you will see something like
>>> circle.area
<function circle.area(radius=1.0)>
Importing modules
When Python starts up there are a certain number of basic commands defined along with the general syntax of the language, but most useful things needed for specific purposes (such as working with webpages, or solving linear systems) are in modules that do not load by default. Otherwise it would take forever to start up Python, loading lots of things you don’t plan to use. So when you start using Python, either interactively or at the top of a script, often the first thing you do is import one or more modules.
A Python module is often defined simply by grouping a set of parameters and functions together in a single .py file.
Two useful modules are os and sys, which help you interact with the operating system and the Python interpreter itself. These are standard modules that should be available with any Python implementation, so you should be able to import them at the Python prompt
>>> import os, sys
Each module contains many different functions and parameters which are the methods and attributes of the module. Here we will only use a couple of these. The getcwd method of the os module is called to return the “current working directory” (the same thing pwd prints in Bash), e.g.
>>> os.getcwd()
'/Users/mickey_mouse/Documents/ucsc/2030_spring/am129/playground/python'
Note that this function is called with no arguments, but you need the open and close parentheses. If you type “os.getcwd” without these, Python will instead print what type of object this function is
>>> os.getcwd
<built-in function getcwd>
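The difference between calling a function and merely naming it can be sketched as follows. The variable names cwd and func are chosen here only for illustration.

```python
import os

# Calling the function (with parentheses) returns a string.
cwd = os.getcwd()
print(type(cwd).__name__)     # str
print(os.path.isabs(cwd))     # True: the working directory is a full path

# Without parentheses we only refer to the function object itself.
func = os.getcwd
print(callable(func))         # True
print(func() == cwd)          # True: calling it through the new name works too
```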
The Python Path
The sys module has an attribute sys.path, a variable that is set by default to the search path for modules. Whenever you perform an import, this is the set of directories that Python searches through looking for a file by that name (with a .py extension). If you print this, you will see a list of strings, each one of which is the full path to some directory. Sometimes the first thing in this list is the empty string, which means “the current directory”, so it looks for a module in your working directory first and if it doesn’t find it, searches through the other directories in order:
>>> print(sys.path)
['', '/usr/local/bin', ... ]
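A short sketch of how one might inspect the search path from a script is given below. It uses importlib.util.find_spec from the standard library to ask where an import would be resolved; the choice of the json module is arbitrary, and the paths printed will differ from system to system.

```python
import importlib.util
import sys

# sys.path is an ordinary list of strings, searched from front to back.
for position, entry in enumerate(sys.path[:3]):
    print(position, repr(entry))

# find_spec reports where an import would find a module, without importing it.
spec = importlib.util.find_spec("json")
print(spec.origin)  # full path to the file that "import json" would load
```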
If you try to import a module and it doesn’t find a file with this name on the path, then you will get an import error
>>> import junkname
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'junkname'
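In a script, a failed import can be handled rather than left to stop the program. The following is a minimal sketch, assuming that no module called junkname exists anywhere on your path.

```python
try:
    import junkname
except ImportError as err:
    # ModuleNotFoundError is a subclass of ImportError, so this catches it.
    print("import failed:", err)
    junkname = None

if junkname is None:
    print("continuing without junkname")
```

This pattern is sometimes used for optional dependencies, where the program can still do something useful without the missing module.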
When new Python software such as NumPy or SciPy is installed, it is normally placed in a directory that is already on the path (typically a site-packages directory), so it can be found without further setup. You can also add to the path if you have your own directory that you want Python to look in, e.g.
>>> sys.path.append("/Users/mickey_mouse/mypy")
will append the directory indicated to the path. To avoid having to do this each time you start Python, you can set a Unix environment variable that is used to modify the path every time Python is started. First print out the current value of this variable
$ echo $PYTHONPATH
It will probably be blank unless you’ve set this before or have installed software that sets this automatically. To append the above example directory to this path
$ export PYTHONPATH=$PYTHONPATH:/Users/mickey_mouse/mypy
This appends another directory to the search path already specified (if any). You can repeat this multiple times to add more directories, or put something like
export PYTHONPATH=$PYTHONPATH:dir1:dir2:dir3
in your .bashrc or .bash_profile file if these are the only three personal directories you always want to search.
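The effect of adding a directory to the search path can be sketched in a self-contained way. The example below creates a temporary directory holding a tiny module, appends that directory to sys.path, and imports the module. The module name path_demo_module and its ANSWER variable are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Write a one-line module into the temporary directory.
    Path(tmp, "path_demo_module.py").write_text("ANSWER = 42\n")

    # Make the directory visible to import, just like sys.path.append above.
    sys.path.append(tmp)
    importlib.invalidate_caches()

    import path_demo_module
    print(path_demo_module.ANSWER)  # 42

    # Tidy up so the temporary directory does not linger on the path.
    sys.path.remove(tmp)
```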
Note
This section regarding PYTHONPATH is here mostly for completeness’ sake. The purpose it fills has largely been superseded by virtual environments, which we’ll cover soon.
Note
If your system has both python2 and python3, setting up PYTHONPATH can be a little tricky. See this article about PYTHONPATH for more information.
Reloading modules
When you import a module, Python keeps track of the fact that it is imported, and if it encounters another statement to import the same module it will not bother to do so again (the modules that have already been imported are recorded in sys.modules). This is convenient since loading a module can be time consuming.
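This bookkeeping can be observed directly. The sketch below uses the standard json module as an arbitrary example and checks that a second import hands back the very same module object recorded in sys.modules.

```python
import sys

import json
first = sys.modules["json"]       # the module object recorded at first import

import json as json_again         # a repeated import does not load it again
print(json_again is first)        # True: both names refer to one object
```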
If you’re debugging a module in an interactive session (e.g. an IPython shell), you might think you’d need to completely close and re-open the session each time you update the module. Fortunately we have a way around this. You can force a module to be reloaded using the importlib module.
Suppose, for example, that we modify circle.py so that both the area and circumference are multiplied by 2 (why would we do this?). If we make this change and then try the following (in the same Python session as above, where circle was already imported as a module)
>>> import circle
>>> circle.area(3.)
28.274333882308138
we get the same results as above, even though we changed circle.py. We have to use the reload function from the importlib module to see the changes we made:
>>> import importlib
>>> importlib.reload(circle)
this is a call to the circle module!
<module 'circle' from 'circle.py'>
>>> circle.area(3.)
56.548667764616276
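The whole edit-and-reload cycle can be reproduced in one script. The sketch below is only an illustration under a few assumptions: it writes a throwaway module named reload_demo (a hypothetical name) into a temporary directory, uses 3.0 in place of pi to keep the numbers simple, and disables bytecode caching so that the quick rewrite of the file is certain to be picked up.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # avoid stale cached bytecode in this quick demo

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "reload_demo.py")
    source.write_text("def area(r):\n    return 3.0 * r**2\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    import reload_demo
    before = reload_demo.area(1.0)

    # Edit the file on disk, as you would in an editor.
    source.write_text("def area(r):\n    return 2 * 3.0 * r**2\n")

    import reload_demo                 # already imported, so nothing happens
    unchanged = reload_demo.area(1.0)

    importlib.reload(reload_demo)      # re-executes the file from disk
    after = reload_demo.area(1.0)

    sys.path.remove(tmp)

print(before, unchanged, after)  # 3.0 3.0 6.0
```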
Other forms of import
If all we want to use from the os module is getcwd, then another option is to do
>>> from os import getcwd
>>> getcwd()
'/Users/mickey_mouse/Documents/ucsc/2030_spring/am129/playground/python'
In this case we only imported one method from the module, not the whole thing. Note that now getcwd is called by just giving the name of the method, not module.method. The name getcwd is now in our namespace. If we only imported getcwd and tried typing os.getcwd() we’d get an error, since it wouldn’t find os in our namespace.
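A small sketch of this behavior follows. It assumes a fresh session in which os itself has not been imported; if os is already in your namespace, the except branch will simply not be reached.

```python
from os import getcwd

# The imported name can be used directly.
print(getcwd())

# The module name os was never bound, so referring to it fails.
try:
    os.getcwd()
except NameError as err:
    print("NameError:", err)
```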
You can rename things when you import them, which is sometimes useful if
different modules contain different objects with the same name.
For example, to compare the sqrt function in the standard Python math module with the numpy version:
>>> from math import sqrt as sqrtm
>>> from numpy import sqrt as sqrtn
>>> sqrtm(-1.)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: math domain error
>>> sqrtn(-1.)
__main__:1: RuntimeWarning: invalid value encountered in sqrt
nan
The standard function gives an error whereas the numpy version returns nan, a special IEEE 754 value indicating Not a Number.
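The same renaming trick works for any two modules that share a name. As a sketch that needs only the standard library, the example below swaps numpy for the cmath module, which also defines sqrt but works with complex numbers. The alias sqrtc is chosen here for illustration.

```python
from math import sqrt as sqrtm
from cmath import sqrt as sqrtc

print(sqrtm(4.0))    # 2.0
print(sqrtc(-1.0))   # 1j

# The real-valued version rejects negative input.
try:
    sqrtm(-1.0)
except ValueError as err:
    print("math.sqrt failed:", err)
```

Without the aliases, the second import would silently replace the first sqrt in the namespace.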
You can also import a module and give it a different name locally. This is particularly useful if you import a module with a long name, but even for numpy many examples you’ll find on the web abbreviate this as np
>>> import numpy as np
>>> theta = np.linspace(0., 2*np.pi, 5)
>>> theta
array([ 0. , 1.57079633, 3.14159265, 4.71238898, 6.28318531])
>>> np.cos(theta)
array([ 1.00000000e+00, 6.12323400e-17, -1.00000000e+00, -1.83697020e-16, 1.00000000e+00])
If you don’t like having to type the module name repeatedly you can import just the things you need into your namespace
>>> from numpy import pi, linspace, cos
>>> theta = linspace(0., 2*pi, 5)
>>> theta
array([ 0. , 1.57079633, 3.14159265, 4.71238898, 6.28318531])
>>> cos(theta)
array([ 1.00000000e+00, 6.12323400e-17, -1.00000000e+00, -1.83697020e-16, 1.00000000e+00])
If you’re going to be using lots of things from numpy you could import everything into your local namespace
>>> from numpy import *
Then linspace, pi, cos, and several hundred other things will be available without the prefix.
Note
When writing code it is often best to not do this, however, since then it is not clear to the reader (or even to the programmer sometimes) what methods or attributes are coming from which module if several different modules are being used. (They may define methods with the same names but that do very different things, for example.)
When using IPython, it is often convenient to start it with
$ ipython --pylab
This automatically imports everything from numpy into the namespace, and also all of the plotting tools from matplotlib.
Installing new modules and using virtual environments
We have mentioned the Python package manager pip a couple of times. There is a vast ecosystem of packages, most of which in turn depend on other packages. An unfortunately common problem is that separate packages will have a shared dependency between them, but require different versions of that same package. That is, if package A needs package C with version v2.1, and package B needs C with version v1.4, then we seemingly have no way to install A and B on the same system.
Fortunately there is a solution. We can use so-called virtual environments. You can create a new environment for a project by running
$ python -m venv venv
Here python -m venv says that we want Python to go find the venv module and run the corresponding .py file as a script. The second venv is an argument we supply to give a name to our new virtual environment. You can call it whatever you like, but calling it venv has become conventional. Now you can enter this environment through your terminal by calling
$ source venv/bin/activate
Now any calls to python or pip will resolve to the copies inside this new environment. When you are done using this environment you can call deactivate. In fact, on many recent Linux distributions pip will refuse to install packages outside of a virtual environment.
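If you are ever unsure whether an environment is active, you can ask the interpreter. The following sketch compares sys.prefix with sys.base_prefix, a common idiom for environments created with the venv module. The variable name in_venv is chosen only for illustration.

```python
import sys

# Inside a virtual environment sys.prefix points at the environment,
# while sys.base_prefix still points at the Python it was created from.
in_venv = sys.prefix != sys.base_prefix

print("virtual environment active:", in_venv)
print("environment root:", sys.prefix)
```

Run with the system Python this should report False, and after activating the environment it should report True.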
Note
In earlier chapters we saw that we do not want our version control system (git) to track intermediate files like object files. Similarly, we do not want to track the virtual environment itself. Make sure that you add the venv directory to your .gitignore file.
As a final note, you may also want to install Python packages at the system level, independent of any virtual environment. Frequently you can accomplish this through your system’s own package manager. For example, on Arch-derived systems you could run
$ sudo pacman -S python-numpy
to install numpy system-wide.