Programming Languages, Platforms, and Operating Systems¶
Scientific languages¶
One of the most crucial components of this course is writing computer programs for the homework sets and the final coding project.
The default reference language in this course will be Fortran 90. Fortran (of which Fortran 90 is one standard) has been one of the most widely used programming languages in high performance computing communities for over half a century. The syntax is simple enough that most of the language can be learned in just a couple of weeks, allowing you to start writing full applications early and perfect your skills later. Its simplicity also makes it easy for compilers to generate fast code, and it has been used for various benchmark tests that run on large-scale computing architectures and supercomputers.
Problems in scientific computing are rarely solved by a single language choice. As our main goal in this course is to build experience in the tools of scientific computing, we will also discuss C and Python in some detail. C++ is also a common choice within scientific computing, though we will not cover it here.
Computing platforms¶
In order to do scientific computing, you need access to a Linux/Unix computing platform.
There are several options to bring a Linux/Unix computing system to your daily scientific adventures.
If your machine is running Linux then you probably already know most of the material on this page.
If your computer is running macOS, you can use it to run your code locally, but you will have to install the Xcode command line tools. You can install them by opening a terminal and running the command
$ xcode-select --install
If your machine runs Windows 10 or later, try the Windows Subsystem for Linux (WSL). This approach allows you to install a Linux distribution directly within your Windows environment and run Linux binaries natively. Graphical support may be limited depending on your version of Windows, but this will not be an issue for this course. Please see Getting an appropriate environment on your machine for more information.
I strongly recommend using WSL 2 (as does Microsoft).
If your machine is a Windows PC that does not support WSL, you can remotely access a Linux computer using an SSH client such as PuTTY. PuTTY is an SSH client for Windows that lets you work on a remote Linux computer (more on SSH later).
If you would rather have a full Linux setup on your Windows PC but cannot use WSL, you can run a virtual Linux environment using free virtualization software such as VirtualBox. This also allows file sharing between your host operating system (e.g., Windows) and the virtual operating system (e.g., Linux).
Finally, you can dual-boot your machine to have a Linux distribution installed directly on your hardware. Upon boot-up you will be able to choose between Linux and Windows. This option is not recommended unless you really know what you are doing.
Remark If you need some help setting up your Unix-like computing environment for this class please come to office hours and we can help you get up and running. The sooner you reach out for help the better!
Remark The ubiquity of Linux means that there is a wealth of useful resources available to you. You can also learn useful tips from both Google searches and YouTube videos. However, please do not just copy the solutions or commands presented. Every single command input and code block has its own meaning/purpose, so please get informed on what you are doing. Do not blindly trust internet resources.
Remark Finally, don’t forget one critical thing: if you need help, please don’t be shy, and never hesitate to ask around. The instructor, the teaching assistant(s), and your peers can be invaluable!
Computing Resources on Campus¶
In addition to your own computing resources (with an appropriate UNIX style environment), you may use other computers available on campus. One good choice available to all students is Hummingbird. The Getting Started and Docs/Tutorials pages there have a wealth of information, and make good backup resources for the first part of this course.
Hummingbird, like (nearly) all compute clusters, runs a Linux operating system and is a remote machine, so using it will require you to:
Learn how to use Unix/Linux command line (BASH) (see Basic Unix/Linux Commands)
Access it remotely with an SSH client
Learn how to use a terminal-based text editor (vim, emacs, nano, etc.) or use the built-in SSH capabilities of modern text editors like VS Code.
As mentioned, connecting over SSH should be trivial if you are using either a Linux or Mac machine. If you are a Windows user, you can use the OpenSSH client built into Windows 10, or install PuTTY (discussed above). The Hummingbird getting started page lists a few others. There are even extensions for Firefox and Chrome that provide a usable terminal inside your browser.
If you haven’t had any chance to work on Linux/Unix type operating systems, please make sure you first familiarize yourself with basic Linux/Unix commands (See Basic Unix/Linux Commands). We will be going over this material soon!
Remark If you prefer to use your own laptop/desktop to program, it is your responsibility to install Fortran 90 and C compilers, the Python interpreter, and all libraries we will be using. The GNU Compiler Collection (GCC) is what we will be using in this course; it provides both a C compiler (gcc) and a Fortran compiler (gfortran). When we discuss compiler flags and the compilation process, GCC compilers will be assumed.
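Before the first homework, it can help to confirm that these tools are actually reachable from your shell. The sketch below is one way to check, assuming bash; the helper name check_tools is hypothetical, and the tool list (gcc, gfortran, python3) is an assumption based on the compilers and interpreter named above.

```shell
#!/usr/bin/env bash
# Report whether each named tool can be found on PATH.
# check_tools is a hypothetical helper name; the tool list below is an
# assumption based on the compilers and interpreter used in this course.
check_tools() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" > /dev/null 2>&1; then
      echo "$tool: found at $(command -v "$tool")"
    else
      echo "$tool: NOT found on PATH"
    fi
  done
}

check_tools gcc gfortran python3
```

Once a compiler is found, running gcc --version or gfortran --version prints which release you have.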
Getting an appropriate environment on your machine¶
There are a couple of different ways to run a Unix/Linux system on Windows. Here are my recommendations (you will need to do one of them).
(Recommended) If you want to keep your Windows PC but want to do some cool scientific computing, please do the following:
Read about how to use the Windows Subsystem for Linux. This is the recommended option for those of you running Windows 10 and above.
There are further instructions for getting it set up here. You can stop at the step that describes git. We won’t be using any of the remaining things in that article (Docker onward).
If you aren’t running Windows 10 or above (unlikely), then do the following (or, better yet, upgrade to Windows 10):
Install VirtualBox, and create a Linux virtual machine.
This is likely to be slightly slower than the WSL option, but you will have reliable graphical support.
You will also need a copy of an installer for a Linux distribution. Ubuntu is probably the easiest to use, though it is a bit on the large side so expect this download to take a while!
Microcenter has a pretty good tutorial for setting up VirtualBox here. The instructions should be essentially the same for hosts other than Windows 10.
Find an empty drive (or partition an existing drive if you know what you are doing), and install Linux alongside Windows. This dual-boot setup is not the easiest option, but is the best option that also keeps Windows around.
Access an external Linux machine over the network using OpenSSH (least recommended, but at least Windows finally added a built-in SSH client). If you can’t enable this client, consider using PuTTY (discussed above), or one of the clients discussed on the Hummingbird pages.
If you are interested in learning more about Linux and are tired of Windows:
Erase Windows and install a Linux OS. Ubuntu is particularly approachable for people migrating from Windows. If you become interested in the internals of Linux, then Arch Linux or Gentoo may be of interest (not recommended for first-time users). Alternatively, you can buy a machine that comes with Ubuntu preinstalled.
Get a Mac and install the Xcode command line tools on it. This can be done by opening a terminal and running the following command
$ xcode-select --install
Remark Installing Linux on your own system, either by removing Windows or dual-booting, means you will be messing with your hard drive(s). Please backup your data, and know what you are doing.
Remark Further details for installing Linux on PCs can easily be found online. Hopefully some of the information here can give you useful search terms (like WSL, dual-boot, virtual machine, etc.). You may also find online tutorials on YouTube. Please don’t forget to use these great resources, and of course, you’re always welcome to ask the instructor or your TA for help!
Scientific computing on Linux/Unix, macOS¶
Most projects within scientific computing rely on developers’ ability to build and interact with software on Unix-style systems, as well as manage and program their own software. An essential tool for this work is a terminal emulator running an appropriate shell. Indeed, the lack of a sane Unix-style shell is part of what long limited the usefulness of Windows for scientific computing (and part of why WSL exists in the first place).
Shell¶
The shell provides users with an interface to interact with the computer’s operating system and various target software packages in order to carry out computational tasks.
There are several Unix/Linux shells to choose from (see, for example, the pages on Unix shells, types of shells, and shell differences). We are going to cover the use of bash. Bash is available by default on essentially all Linux distributions. macOS used to have bash as its default shell, but since macOS Catalina it uses Zsh. These are mostly compatible with each other, and either one can be used for this course.
If your experience with computers has only been through a GUI (graphical user interface), this will be an opportunity to interact with computers in a very different way. It is true that Windows has two native shells, Command Prompt (cmd.exe, a descendant of DOS) and PowerShell. We will not use either of these shells for scientific computing (and no wise soul would). Command Prompt is archaic, and PowerShell is oriented more towards system administration than towards anything that we’ll be doing.
Basic Unix/Linux Commands¶
In this section we give a quick overview of some of the most common Bash commands. There are several rules to keep in mind when using the command line. Some important items are:
Commands are dependent on the shell being used. Bash is the most common and is assumed throughout the course.
Commands are case-sensitive.
If you are accessing a remote machine, ensure you always log out by typing
$ exit
when you’re done.
For more comprehensive information on the various options associated with any bash command, you can display a manual page using the man command followed by the command you are curious about. For instance, to learn more about the cp command, type
$ man cp
Here are some basic commands for managing files and directories:
List directory contents
$ ls
ls in long format
$ ls -l
ls all entries in the current directory, including hidden ones (see Linux/Unix file properties)
$ ls -a
You will see two special entries in all directories, . and .., which are aliases for the current directory and the parent directory. Other files and directories can start with a . to make them hidden.
Create a new directory called dirname
$ mkdir dirname
Change directory, meaning you go to a directory called dirname
$ cd dirname
Recalling the directory aliases from before, you travel up the directory tree by running
$ cd ..
By default, cd with no arguments sends you to your home directory. The following are equivalent
$ cd ~
$ cd
You could run cd . but it is a little pointless.
Report where you currently are in the directory tree
$ pwd
Copy filename1 to filename2 (this leaves filename1 in place)
$ cp filename1 filename2
You can also copy into a directory by typing
$ cp filename1 somedir
$ ls
$ ls somedir
Move (or rename) filename1 to filename2 (this will overwrite filename2 if it already exists!!)
$ mv filename1 filename2
Note that the same rule for moving into directories holds. We’ll circle back to copying and moving directories themselves later.
Remove a file named filename (there is no recycle bin here, so be careful!)
$ rm filename
Remove an empty directory: it will fail if the directory is not empty
$ rmdir dirname
Remove a non-empty directory and all of its files and subdirectories
$ rm -rf dirname
Be very careful with this one! What do you think those flags mean?
Moving (renaming) directories works the same way as moving files, e.g.
$ mv somedir otherdir
All of the contents of that directory will move too. Within the same filesystem this is a cheap operation, since it just changes the path for that directory and no data actually gets moved on the disk.
Copying directories is a little bit different. If you try to do it as above you will get this error
$ cp otherdir somedir
cp: -r not specified; omitting directory 'otherdir'
The -r flag means recursive, which needs to be present for directories to ensure that their contents also get copied. Try instead
$ cp -r otherdir somedir
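The commands above can be practiced without risking any real files. The following sketch, assuming bash and a mktemp that supports -d (current Linux and macOS versions do), creates a throwaway directory and walks through most of them; the names sandbox, filename1, somedir and so on are only examples.

```shell
#!/usr/bin/env bash
# Practice the file and directory commands inside a fresh scratch directory,
# so nothing outside it can be touched.
sandbox="$(mktemp -d)"          # create an empty temporary directory
cd "$sandbox" || exit 1
pwd                             # report where we are

mkdir somedir                   # a new, empty directory
touch filename1 .hidden         # .hidden starts with a dot, so it is hidden
ls                              # shows filename1 and somedir only
ls -a                           # also shows ., .., and .hidden

cp filename1 filename2          # copy; filename1 stays in place
cp filename1 somedir            # copy into a directory
mv filename2 renamed            # rename filename2 to renamed
mv somedir otherdir             # rename a directory; its contents move too
cp -r otherdir somedir          # copying a directory needs -r

cd somedir && pwd && cd ..      # go down one level, then back up with ..
rmdir otherdir 2> /dev/null || echo "rmdir refused: otherdir is not empty"
rm -r otherdir                  # remove a non-empty directory and its contents
ls -a
```

When you are done, cd ~ followed by rm -r "$sandbox" removes the scratch directory.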
Here are some commands for looking at and manipulating files:
Display the contents of a file named filename, as much as will fit on your screen
$ more filename
Similar to more, with extended navigation that allows both forward and backward paging
$ less filename
Print the entirety of filename all at once, rather than a page at a time
$ cat filename
Create an empty file named filename (multiple filenames after the touch command will create multiple empty files)
$ touch filename
Pop quiz: What do you think touch does to an existing file?
Get the number of lines, words and bytes (the same as characters for plain ASCII text) in filename
$ wc filename
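A quick way to see these commands in action is to make a tiny file and inspect it. This sketch assumes bash; notes.txt is an arbitrary example name, and it uses output redirection (>), which is covered just below.

```shell
#!/usr/bin/env bash
# Make a three-line example file in a scratch directory and inspect it.
dir="$(mktemp -d)"
cd "$dir" || exit 1
printf 'first line\nsecond line\nthird line\n' > notes.txt   # > writes output to a file

cat notes.txt       # print the whole file at once
wc notes.txt        # line, word, and byte counts, followed by the file name
wc -l notes.txt     # only the line count
wc -w notes.txt     # only the word count
```

more notes.txt and less notes.txt are interactive, so try those by hand (press q to quit either one).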
Redirection, piping, and command chaining
Redirection lets you capture output that would generally show up in your terminal and instead place it into a file, or forward it as input to another command.
> redirects standard output to a file, overwriting the file
$ cat filename1 filename2 > output
$ cat output
$ cat filename3 > output
$ cat output
>> redirects standard output and appends it to a file
$ cat filename2 >> output
Standard output (stdout) is not the only thing that can print to your screen. Error messages live in their own channel called standard error (stderr). You can redirect stderr by prefixing > or >> with a 2
$ cat filename2 2>> output
$ cat output
$ cat nonexistent 2> output
$ cat output
You can merge stdout and stderr and redirect both by prefixing > or >> with & (&>> requires bash 4 or newer; the portable equivalent of &>> output is >> output 2>&1)
$ cat filename2 &>> output
$ cat output
$ cat nonexistent &> output
$ cat output
You can combine redirection operators to send output to different places
$ cat filename2 >> output 2>> err.log
$ cat nonexistent >> output 2>> err.log
You can also redirect input using <
$ wc -l filename2
$ wc -l < filename2
$ wc -l < filename2 >> output
The pipe operator | lets you send the output of a command as input to another command (these examples assume a file called fruit with one fruit name per line)
$ sort fruit | head -3
$ sort -r fruit | head -3 > output_list
$ head -4 fruit | tail -1 >> output_list
The operators && and || let you run one command, then conditionally run another depending on how the first one exits. && runs the second command only if the first succeeds, while || runs the second only if the first fails. Finally, ; simply lets you put multiple commands on one line; they will all run unconditionally.
$ cat filename2 >> output && cat output
$ cat nonexistent >> output && cat output
$ cat filename2 >> output || cat output
$ cat nonexistent >> output || cat output
$ cat filename2 >> output ; cat output
$ cat nonexistent >> output ; cat output
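Let’s try the following example to appreciate the difference between ; and &&. Note the deliberate typo od in the last two lines, and think about which directory rm * runs in each time before you try this (preferably in a scratch directory).
$ mkdir backup
$ mkdir backup/old
$ touch backup/old/foo{,1,2}
$ cd backup/od && rm *
$ cd backup/od; rm *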
Did I mention that you should be careful with rm?
Quick summary:
$ >        #save output to a file
$ >>       #append output to a file
$ <        #read input from a file
$ |        #send the output from one program as input to another program
$ A; B     #Run A and then B, regardless of success of A
$ A && B   #Run B if A succeeded
$ A || B   #Run B if A failed
$ A &      #Run A in background
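Several of the examples above read a file called fruit that is never created. The sketch below, assuming bash, builds one (the fruit names are arbitrary sample data) in a scratch directory and exercises redirection, pipes, and conditional chaining there, so an accidental > cannot clobber anything important. It also prints $?, the exit status of the previous command, which is what && and || inspect.

```shell
#!/usr/bin/env bash
# Scratch directory plus a small "fruit" file with one name per line.
work="$(mktemp -d)"
cd "$work" || exit 1
printf '%s\n' banana apple cherry date elderberry > fruit

# > overwrites, >> appends
echo "first run"  > output
echo "second run" >> output

# 2> sends only error messages (stderr) to a file
cat nonexistent 2> err.log

# 2>&1 merges stderr into stdout; this form also works in older bash versions
cat fruit nonexistent > both.log 2>&1

# | feeds one command's output into the next
sort fruit | head -3 > top3          # apple, banana, cherry
head -4 fruit | tail -1              # the 4th line: date

# exit status: 0 means success, anything else means failure
cat nonexistent 2> /dev/null
echo "exit status of the failed cat: $?"

# && runs the second command only on success, || only on failure
cat nonexistent 2> /dev/null && echo "not printed"
cat nonexistent 2> /dev/null || echo "printed because cat failed"
```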
Searching for files or file contents
Find a file named filename in the current directory . and search recursively through all subdirectories
$ find . -name filename
Find a file named filename in the entire file system / (this could take some time)
$ find / -name filename
Note: The special directory / is the root of the whole filesystem. All directories are nested under / in one way or another. This includes external drives and networked drives (sort of).
Find a file named filename only under your personal directory ~/
$ find ~/ -name filename
Note: ~ is another special directory name. Like . and .., it is an alias for a particular directory on the system; here ~ refers to the user’s home directory. We’ll have more to say when we talk about environment variables.
Find a search keyword AM129 inside all files in the current directory
$ grep AM129 *
Try also
$ grep am129 foo.txt
$ grep -i am129 foo.txt
$ grep -in am129 *
$ grep -il am129 *
Note the presence of the * in the prior commands. This is part of Bash’s pattern matching (globbing) capabilities. Here * will match any string of characters (including the empty string), ? will match any single character, and [] will match any one character from a specified set or range. There is much more to globbing, which can be referenced here. Consider the following block:
$ touch file1.txt file2.sh errata.txt awesome.ex
$ ls *.txt
$ ls *.??
$ ls *.???
$ ls [e-f]*
$ ls [e-f]*.txt
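The search commands and glob patterns above can be tried together in a scratch directory. This sketch assumes bash; the directory layout and file contents are invented for the example, and grep -r (recursive search) is a standard option that is not shown above.

```shell
#!/usr/bin/env bash
# Build a small, made-up directory tree to search through.
search="$(mktemp -d)"
cd "$search" || exit 1
mkdir -p notes/old
echo "Homework for AM129" > notes/hw1.txt
echo "am129 lecture notes" > notes/old/lec1.txt
touch file1.txt file2.sh errata.txt awesome.ex

find . -name 'hw1.txt'      # search recursively from the current directory
find . -name '*.txt'        # quoted, so find (not the shell) expands the pattern
grep -r AM129 notes         # case-sensitive, recursive search
grep -ril am129 notes       # -i ignore case, -l list only the matching file names
ls *.txt                    # globbing: * matches any string of characters
ls [e-f]*                   # [e-f] matches one character in the range e to f
```

The quotes around '*.txt' in the find command matter: without them, the shell may expand the glob using files in the current directory before find ever sees it.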
The which and whereis commands are useful for finding out the full path to the binary that is actually being executed when you type a command. For example, if you have successfully installed gfortran on your computer, you should be able to see
$ which gfortran
/usr/bin/gfortran
$ whereis gfortran
gfortran: /usr/bin/gfortran /usr/share/man/man1/gfortran.1.gz /usr/share/info/gfortran.info.gz
$ which f77
f77 not found
In the latter case, no program called f77 can be found in the search path, either because it is not installed or because the directory containing it is not listed in PATH.
Also, we see there is a different outcome in using whereis as opposed to which. Can you guess what the difference is?
Note that whereis can also be used to locate libraries and header files that have been installed system-wide. For example:
$ whereis liblapack.so
liblapack.so: /usr/lib/liblapack.so
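Bash also has built-in ways to ask what will run when you type a name, and these behave the same on Linux and macOS (unlike which and whereis, whose output varies between systems). A short sketch, assuming bash:

```shell
#!/usr/bin/env bash
# Ask the shell itself what a command name refers to.
command -v ls        # prints the path (or name) that would run; prints nothing if not found
type -a ls           # lists every match: aliases, functions, builtins, and files on PATH
type cd              # cd is a shell builtin, so there is no separate file to find

# The exit status of command -v says whether the name exists, which is handy in scripts
if command -v f77 > /dev/null; then
  echo "f77 is available"
else
  echo "f77 is not on PATH"
fi
```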
Here are some commands for killing jobs:
Sometimes you need to kill a job that’s running, perhaps because you realize it’s going to run for too long, or you gave it the wrong input data. Perhaps you are running a program like the IPython shell and it freezes up on you with no way to get control back.
Many programs can be killed with <ctrl>-c. For this to work, the job must be running in the foreground, so you might need to first give the fg command.
Sometimes this doesn’t work, like when IPython freezes. Then try suspending it with <ctrl>-z (which should work), find out its PID (process ID) using ps, and use the kill command
$ ps
18917 ttys000    0:00.19 -bash
21181 ttys000    0:00.00 /bin/bash /Users/alberteinstein/anaconda/bin/python.app /Users/alberteinstein/anaconda/bin/ipython
21182 ttys000    0:00.19 /Users/alberteinstein/anaconda/python.app/Contents/MacOS/python /Users/alberteinstein/anaconda/bin/ipython
18921 ttys001    0:00.01 -bash
18925 ttys002    0:00.02 -bash
20647 ttys003    0:00.01 -bash
20656 ttys004    0:00.01 -bash
21171 ttys005    0:00.01 -bash
$ kill 21181
Hit return again and you will see
$
In [1]: Terminated: 15
$
If not, more drastic action is needed with the -9 flag
$ kill -9 21181
This almost always kills a process. Be careful what you kill. You can also see more options for the kill command by typing man kill.
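To practice finding and killing a process without waiting for something to freeze, you can start a harmless long-running command in the background and kill it yourself. A sketch, assuming bash; sleep 300 simply stands in for a misbehaving program.

```shell
#!/usr/bin/env bash
# Start a harmless background process, then find and kill it by PID.
sleep 300 &                  # & runs the command in the background
pid=$!                       # $! holds the PID of the most recent background job
ps -p "$pid"                 # confirm that the process exists
kill "$pid"                  # polite request to stop (SIGTERM, signal 15)
wait "$pid" 2> /dev/null     # collect its exit status so it does not linger
echo "sleep exited with status $?"   # 128 + 15 = 143 means it was terminated by SIGTERM
```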
Setting up environment variables (case sensitive!):
Environment variables: We saw before that there are several choices for Unix shells. These roughly fall into two categories, derivatives of the Bourne shell and derivatives of the C shell, where in each category there are a number of variant shells. In this class we have chosen the bash shell as our default choice.
If your default shell is not the bash shell, you can start bash by typing
$ bash
in a terminal. This will start a bash prompt.
Remark The shell zsh is also completely fine for this class and will behave in almost exactly the same way.
In Unix/Linux there are variables called environment variables, which define various properties that are important in the session/system. They include things like paths and shortcuts, which can be set by the system, by users (including you), by the startup of a shell, or even by some of the programs that are installed or used interactively with other programs.
The following list includes several important environment variables that users often encounter (note that they are all capitalized).
| Variable | Description |
|---|---|
| DISPLAY | Contains the identifier for the display that X11 programs should use by default. |
| HOME (abbreviated ~) | Indicates the home directory of the current user. It is the default argument for the cd built-in command, so typing cd will jump to HOME from anywhere. |
| LD_LIBRARY_PATH | On many Unix systems with a dynamic linker, contains a colon-separated list of directories that the dynamic linker should search for shared objects when building a process image after exec, before searching in any other directories. |
| PATH | Indicates the search path for commands. It is a colon-separated list of directories in which the shell looks for commands. |
| PWD | Indicates the current working directory as set by the cd command. |
| USER | The current user name. |
Please see more in the variables chapter of this Bash guide. Environment variables can be used to store information about the environment, and to provide a shorthand for long but useful strings such as absolute paths. They often become essential for defining the computer’s behavior when the user compiles programs and builds libraries from the command line. To see all of the environment variables that are active in your shell session, use the env command
$ env
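Individual variables are read by putting a $ in front of their name. Since PATH is a colon-separated list, a quick way to see the directories it contains, in the order the shell searches them, is to turn each colon into a newline. A sketch, assuming bash:

```shell
#!/usr/bin/env bash
# Read individual environment variables with $NAME.
echo "$HOME"
echo "$PWD"

# PATH is colon-separated; tr replaces each : with a newline,
# printing the search directories in the order the shell tries them.
echo "$PATH" | tr ':' '\n'

# env prints NAME=value pairs; grep picks out the line for one variable
env | grep '^HOME='
```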
Environment variables can be set using export
$ echo $MY_VAR
$ export MY_VAR="my cool variable"
$ echo $MY_VAR
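The difference between an ordinary shell variable and an exported one shows up when the shell starts another program: only exported variables are passed along. A sketch, assuming bash, where bash -c starts a child shell; MY_VAR comes from the example above, and OTHER_VAR is a made-up name.

```shell
#!/usr/bin/env bash
# A plain shell variable stays inside the current shell; export also
# places it in the environment of every program started from this shell.
OTHER_VAR="only in this shell"
export MY_VAR="my cool variable"

echo "$OTHER_VAR"                                    # the current shell sees both
bash -c 'echo "child sees MY_VAR=[$MY_VAR]"'         # the child sees the exported one
bash -c 'echo "child sees OTHER_VAR=[$OTHER_VAR]"'   # empty: OTHER_VAR was not exported
```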
Note that variables set using export only exist within that particular session (and in the programs started from it).
Custom environment variables can be put in .bashrc or .bash_profile (or whatever configuration files your shell uses). Such variables are then set every time you start a new bash shell, e.g., by opening a new terminal (in case bash is your default shell), logging in to the system, or typing bash. For an interactive shell that is not a login shell (such as one started by typing bash), the file .bashrc in your home directory is automatically sourced; login shells read .bash_profile instead, as discussed below. This means that you can make your custom variables persistent by modifying the .bashrc file. What you can do by modifying the file includes:
Setting paths
Defining aliases
Customizing your prompt, etc.
Here is an example of .bashrc
#-----------------------------------
# Print current user names
#-----------------------------------
echo "Current users:"
who -a

#-----------------------------------
# Export some paths and defaults
#-----------------------------------
export EDITOR="emacs -nw"
export IDL_DIR="/usr/local/itt/idl"
export PATH="/usr/local/Cellar/colordiff/1.0.13/bin:/usr/local/bin:$PATH"
export PATH="/usr/local/Cellar/valgrind/3.8.1/bin:$PATH"
.
.
.

#-----------------------------------
# Aliases
#-----------------------------------
# Bash
export LSCOLORS=gxBxhxDxfxhxhxhxhxcxcx # dark background
alias lls='ls -laghFG'
alias la='ls -lA'
alias ll='ls -lhv'
alias ls='ls -G'

# Common Mac programs
alias reload='source ~/.bash_profile'
alias sublime='/Applications/Sublime\ Text.app/Contents/SharedSupport/bin/subl'
alias text='open -a TextEdit'
alias pre='open -a Preview'
alias grepp='grep -in'
alias sshy='ssh -Y'
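One pattern that works well for setting paths in a .bashrc is to add a directory to PATH only if it exists and is not already there, so that re-sourcing the file (for example with the reload alias above) does not keep growing PATH. A sketch, assuming bash; the function name path_prepend and the ~/bin directory are illustrative choices, not something the course requires.

```shell
# Possible .bashrc snippet (hypothetical helper name: path_prepend).
# Prepend a directory to PATH if it exists and is not already listed.
path_prepend() {
  [ -d "$1" ] || return 0                 # skip directories that do not exist
  case ":$PATH:" in
    *":$1:"*) ;;                          # already on PATH: nothing to do
    *) export PATH="$1:$PATH" ;;          # otherwise search it first
  esac
}

path_prepend "$HOME/bin"                  # ~/bin is an example location
```

Wrapping $PATH in colons before matching means the first and last entries are checked the same way as the middle ones.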
There is another way to achieve the same thing using a different file called .bash_profile. You can put all of the above in .bash_profile instead of .bashrc, and customize your settings as you wish. The difference between these is subtle. The .bashrc file is sourced for interactive non-login shells (e.g., opening a new terminal), whereas .bash_profile is executed for login shells. See this nice article.
In the above .bashrc example, as a system admin you will see a list of the current users logged in every time you open a new shell terminal. Usually you want this information only once, when you first log in to the machine; displaying it in every new terminal would be unnecessary. To avoid this, you can instead put such monitoring/diagnostic tools in .bash_profile, which will only be executed at login.
This difference doesn’t show up in the same way on macOS, though: its Terminal app starts a login shell for each new window, so .bash_profile is invoked for each new terminal window instead of .bashrc.
You may not want to maintain two separate files for login and non-login shells. Critically, you want PATH set properly in both kinds of shell. A good way of consolidating the two files into one is to source .bashrc from .bash_profile. A good example of a .bash_profile could begin with
#-----------------------------------
# Source global definitions (if any)
#-----------------------------------
if [ -f ~/.bashrc ]; then
  source ~/.bashrc
fi
With this approach you can put all the paths, custom aliases and common settings only in .bashrc.
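You can check that sourcing .bashrc from .bash_profile does what you expect, without touching your real dotfiles, by pointing HOME at a temporary directory and starting a login shell there (bash -l). A sketch, assuming bash; GREETING is an arbitrary example variable, and system-wide startup files such as /etc/profile will still be read.

```shell
#!/usr/bin/env bash
# Build a throwaway .bashrc and .bash_profile in a temporary "home" directory.
fakehome="$(mktemp -d)"
echo 'export GREETING="hello from .bashrc"' > "$fakehome/.bashrc"
cat > "$fakehome/.bash_profile" << 'EOF'
if [ -f ~/.bashrc ]; then
  source ~/.bashrc
fi
EOF

# A login shell reads ~/.bash_profile, which in turn sources ~/.bashrc,
# so GREETING should be set even though only .bashrc defines it.
HOME="$fakehome" bash -l -c 'echo "$GREETING"'
```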