Please read this

  • HPC is a shared facility, run on almost no budget, by a few full-time admins (Joseph Farran, Harry Mangalam) and a few part-time elves.

  • HPC is NOT your personal machine. It’s shared by about 2000 users of whom 100 or more may be using it at any one time. (Once connected, type w into the terminal to see who’s on the machine at the same time as you.) Actions you take on HPC affect all other users.

  • HPC has finite resources and bandwidth. It remains a usable resource only via the consensual use of the GridEngine scheduler. It uses QDR Infiniband among most of the high-density nodes and 1 GbE to connect the others. QDR can support about 4GB/s max data rate; GbE can support about 100MB/s per connection. That sounds like a lot, but not when it’s being shared by 50 others, especially not when 15 of those others are all trying to copy 20GB files back and forth (see below), and even less when there are 100 batch jobs trying to move 60GB data files back and forth. Think before you engage in massive data movement or manipulation. Talk to one of us (email us first) if you think your batch job may cause problems.

If you are unfamiliar with the idea of a cluster, please read this brief description of cluster computing.

How to ask a question

Please see this separate web page:

Condo Nodes

HPC supports the use of condo nodes. These are privately owned, but integrated into the HPC infrastructure to take advantage of the shared applications and administration. These nodes are usually configured to allow public jobs to run on them when their owners are not using them. If the owners want to reclaim all the cores for a heavy analysis job, other jobs running on those nodes may be suspended, or even killed if RAM is limiting.

The free Qs (free64, free32, free*) are the Qs to which unaffiliated users can submit jobs to run on all free cores. Just be aware that your job may be suspended as described above.

How do I get an account?

By default, HPC is open to all postgrad UCI researchers, altho it is also available to undergrads with faculty sponsorship.

For non-condo owners, there is no cost to use HPC, but neither is there any right to use it. Your account may be terminated if we observe activity that runs counter to good cluster citizenship. This includes attempted hacking, using your account to pirate software or other proprietary digital content, cracking passwords, repeated attempts to jump the GridEngine queue, ignoring 'cease & desist' emails from admins, etc. See UC Irvine’s policies for complete guidelines.

How do I connect to HPC?

You must use ssh, an encrypted terminal protocol. Be sure to use the -Y or -X option if you want to view X11 graphics (see below).

On a Mac, you can use the Applications → Utilities → Terminal app, but a much better (and also free) alternative is iTerm2, which does a much better job of trapping mouse input and sending it on, and of forwarding the correct keyboard mappings. MacOSX (post-Mountain Lion) no longer includes its own X11 server, but it supports X11 graphics via XQuartz, which should be started before you start the X11-requiring application on HPC.
On Windows, use the excellent putty. To use X11 graphics, see also the section on Xming below.
On Linux, we assume that you know how to start a Terminal session with one of the bazillion terminal apps (konsole & terminator are 2 good ones).

Telnet access is NOT available, since it is not encrypted and can easily be packet-sniffed.

Use your UCINetID and associated password to log into one of the login nodes (they are all reached via a round-robin alias) via ssh.

To connect using a Mac or Linux, open the Terminal application and type:

ssh -Y
# the '-Y' requests that the X11 protocol be tunneled back to you, encrypted inside ssh.

How to set up passwordless ssh

Passwordless ssh among the nodes is now set up for you automatically when your account is activated, so you don’t have to do this manually. However, as a reference for those of you who want to set it up on other machines, I’ve moved the documentation to the Appendix.
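
For reference, the usual recipe on your own Mac or Linux machine looks like the sketch below (the remote hostname is a placeholder):

```shell
# generate a keypair if you don't already have one; accept the default
# location and (ideally) set a passphrase
ssh-keygen -t ed25519

# append your new public key to the remote machine's ~/.ssh/authorized_keys
ssh-copy-id user@remote.example.edu

# subsequent ssh logins to that machine should no longer ask for your password
```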


If a Mac or Linux user, you may also be interested in using ssh to execute commands on remote machines. This is described here.

As part of this setup, the admins' public key is also added to your ~/.ssh/authorized_keys file. If you do not want this, you’re welcome to comment it out, but unless it’s active, we can’t help you with problems that require a direct login.

ssh errors

Occasionally you may get the error below when you try to log into HPC or among the HPC nodes:

Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
Please contact your system administrator.
Add correct host key in /Users/joeuser/.ssh/known_hosts to get rid of this message.
Offending key in /Users/joeuser/.ssh/known_hosts:2
RSA host key for has changed and you have requested strict checking.
Host key verification failed.

The reason for this error is that the computer to which you’re connecting has changed its identification key. This might be due to the mentioned man-in-the-middle attack, but is far more likely to be an administrative change that has caused the HPC node’s ID to change: new hardware, a reconfiguration of the node, a reboot, an upgrade, etc.

The fix is buried in the error message itself.

Offending key in /Users/joeuser/.ssh/known_hosts:2

Simply edit that file and delete the line referenced. When you log in again, there will be a notification that the key has been added to your known_hosts file. More simply, you can also just delete your ~/.ssh/known_hosts file. The missing connection info will be regenerated when you ssh to new nodes.
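
Either fix can be done in one command; the host name below is an example, so use whatever host the error message names:

```shell
# remove every stored key for the named host, leaving the rest of the file intact
ssh-keygen -R compute-12-20 -f ~/.ssh/known_hosts

# or delete just the reported line (here line 2, from 'known_hosts:2'),
# keeping a backup copy in known_hosts.bak
sed -i.bak '2d' ~/.ssh/known_hosts
```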

Should you want to be able to log in regardless of this warning, you’ll have to edit the /etc/ssh/ssh_config file (or your per-user ~/.ssh/config) on your own Mac or Linux machine (sorry, Windows users) and add the 2 lines as shown below. There are good reasons for not doing this, but it’s a convenience that many of us use. Consider it the rolling stop of ssh security.

Host *
        StrictHostKeyChecking no

After you do that, you’ll still get the warning (which you should investigate) but you’ll be able to log in.

If you’re using putty on Windows, you won’t be able to effect this security skip-around. Read why here.

After you log in…

Logging in to HPC will give you access to a Linux shell (bash by default; tcsh and ksh are also available). If you are a complete Linux novice, you may want to look over the locally produced Linux Tutorials part 1 - connecting, simple commands and part 2 - More Intro to Linux, bash, Perl, R, which were written specifically for new HPC users.

Some bash pointers.

The default shell (or environment in which you type commands) for your HPC login is bash. It looks like the Windows CMD shell, but is MUCH more powerful. There’s a good exposition of some of the things you can do with the shell here, and a good cheatsheet here. If you’re going to spend some time working on HPC, it’s worth your while to learn some of the more advanced commands and tricks.

If you’re going to be using HPC more than a few times, it’s useful to set up a file of aliases for frequently used commands and then source that file from your ~/.bashrc. ie:

# the ~/.aliases file contains shortcuts for frequently used commands
# your ~/.bashrc file should source that file: '. ~/.aliases'
alias someh="ssh -Y somehost"  # ssh to 'somehost'
alias hg="history|grep "       # search history for this regex
alias pg="ps aux |grep "       # search processes for this regex
alias nu="ls -lt | head -11"   # what are the 11 newest files?
alias big="ls -lhS | head -20" # what are the 20 biggest files?
# and even some more complicated commands
alias edaccheck='cd /sys/devices/system/edac/mc &&  grep [0-9]* mc*/csrow*/[cu]e_count'

You can also customize your bash prompt to produce more info than the default user@host. While you’re waiting for your calculations to finish, check out the definitive bash prompt HOWTO and/or use bashish to customize your bash environment.

DirB is a set of bash functions that make it very easy to bookmark and skip back and forth to those bookmarks. Download the file from the URL above, source it early in your .bashrc and then read how to use it via this link. It’s very simple and very effective. Very briefly, s bookmark to set a bookmark, g bookmark to cd to bookmark, sl to list bookmarks. Recommended if you have deep dir trees and need to keep hopping among the leaves.

Make sure bash knows if this is an interactive login

If you have customized your .bashrc to spit out some useful data when you log in (such as the number of jobs you have running), make sure to wrap that command in a test for an interactive shell. Otherwise, when you try to scp or sftp or rsync data to your HPC account, your shell will unexpectedly vomit up the same text into the connecting program with unpleasant results. Wrap those commands with something like this in your .bashrc:

interactive=`echo $- | grep -c i `
if [ ${interactive} = 1 ] ; then
  # put all your interactive stuff in here:
  # ie tell me what my 22 newest files are
  ls -lt | head -22
fi
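
An equivalent test that uses only shell builtins (no grep subprocess) is a case statement on $-:

```shell
case "$-" in
*i*)
  # interactive-only commands go here, e.g.:
  ls -lt | head -22
  ;;
esac
```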

You will also have access to the resources of HPC via the Grid Engine (GE, aka SGE) commands. The most frequently used GE commands are qsub, to submit a batch job, and qstat, to check the status of your jobs; q displays the status of all GE queues. You can also check the status of various resources with the qconf command. See the SGE cheatsheet for more details.

The login node(s) should be considered your 1st stop in doing real work. You can copy files to and from your home directory via the login node, edit files, compile and test code, etc, but you shouldn’t run any long (>1 hr) jobs on the login node itself. If you do and it impacts the performance of the login node (and we notice), we’ll kill them off to keep the login node responsive. To do real work, please request a node from the interactive queue, like this:

# for a 64bit interactive node
hmangala@hpc:~ $ qrsh

# wait a few seconds...
Rocks Compute Node
Rocks 6.1 (Emerald Boa)
Profile built 17:23 04-Dec-2012

Kickstarted 17:38 04-Dec-2012

Thu Jan 03 14:56:27 [0.00 0.00 0.00]  hmangala@compute-12-20:~
1001 $
# ready to go...

Data Storage on HPC

Quotas for Regular users

Unlike some other clusters, a regular user (not part of a condo ownership group) will get 50GB of storage; condo owners will get whatever storage they have negotiated with OIT. Regular users can use arbitrary amounts of temporary storage on the /pub filesystem, altho this data is expected to be active; idle data may be deleted on short notice unless the user has notified us in advance.

We encourage you to use this temporary data storage, up to hundreds of GB, but we also warn you that if we detect large directories that have not been used in weeks, we retain the right to clean them out. The larger the dataset, the more scrutiny it will get. IF YOU HAVE LARGE DATASETS AND ARE NOT USING THEM, THEY MAY DISAPPEAR WITHOUT WARNING. We mean it when we say that if you generate valuable data, it is up to you to back it up elsewhere ASAP.

How to check your disk usage

Storage is always in short supply. The /pub filesystem is almost full; many of you are approaching your HOME quotas (50GB) on /data/users, and many of you are still generating Zillions of Tiny files (ZOTfiles), the scabies of storage systems. We have a few tools that can help you figure out how much storage you’re using, how many files it’s in, and in what way.

Commandline tools

These are utilities that can be used from your login shell - they require no X11 graphics nor a specialized connection like x2go.

df & du

df reports disk free or how much space is left on a particular filesystem in total. It does not break it down by user or dir.

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda5              87G   40G   43G  48% /
tmpfs                  32G  548K   32G   1% /dev/shm
/dev/sda1             870M  170M  656M  21% /boot
/dev/sda6             570G  7.6G  534G   2% /state/partition1
/dev/sdc              1.9T  280G  1.6T  16% /var
/dev/sdb              932G  199G  733G  22% /mirrors
zfs                   3.6T  168G  3.4T   5% /sge-zfs
nas-7-7.local:/data    15T  6.7T  7.9T  46% /data
beegfs_fast-scratch    13T  818G   12T   7% /fast-scratch
beegfs_dfs2           191T  106T   86T  56% /dfs2
beegfs_dfs1           464T  402T   63T  87% /dfs1
nas-7-2.ib:/pub        55T   51T  4.2T  93% /share/pub

$ df -h /dfs1   # specifying a filesystem reports only that one
Filesystem            Size  Used Avail Use% Mounted on
beegfs_dfs1           464T  402T   63T  87% /dfs1

du is disk usage and reports on specific dirs.

$ du -shc *
180K    dmc_halide_ion_water_clusters
8.0K    dmc_harmonic_oscillator
8.0K    dmc_quartic_oscillator
203M    dmc_sg_parahydrogen
2.8M    SRC_dmc_cg_true_gs
680K    SRC_dmc_constraints_threshold
3.5M    SRC_dmc_halide_ion_water_true_gs_dw
188K    SRC_dmc_parahydrogen_unconstrained
13M     SRC_mbnrg_O2_no_openmp_flags
7.3M    SRC_mbpol_O2_cppthresh60
4.2M    SRC_parallel_dmc_cg_true_gs_dw
3.7M    SRC_parallel_dmc_constrained_gs
68K     SRC_parallel_dmc_harmonic_quartic_oscillator
240K    SRC_parallel_dmc_parahydrogen_unconstrained_omp
3.9M    SRC_quenching
416K    SRC_ttm3f_O2
120M    water_mbpol
45M     water_tip4p
30M     water_ttm3
435M    total

du will by default recurse to the bottom of subdirs, tho you can restrict it to a certain depth with -d. See man du for more info.
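
For example, to see which of your top-level dirs are the big ones:

```shell
# one level deep, human-readable sizes, largest last
du -h -d 1 . | sort -h
```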


tree provides a text-based listing that displays the complete dir structure as a pseudographic. Deep dir trees are best piped into less to view them more easily. tree has many options (try tree --help or man tree).

$ tree -sh | less
|-- [ 596]
|-- [ 44] repos
| |-- [4.0K] ca1
| | |-- [1.7K] ANsyn.mod
| | |-- [2.3K] ExpGABAab.mod
| | |-- [5.4K] Gfluct2.mod
| | |-- [1.8K] MyExp2Sid.mod
| | |-- [1.8K] MyExp2Sidnw.mod
| | |-- [8.7K] README.txt
| | |-- [2.9K] STDPE2Sid.mod
| | |-- [2.7K] buff_Ca.mod
| | |-- [4.8K] burststim2.mod
| | |-- [ 22K] ca1.hoc
| | |-- [2.5K] cad.mod
| | |-- [4.0K] cellframes
| | | |-- [ 11K] class_axoaxoniccell.hoc
| | | |-- [8.8K] class_bistratifiedcell.hoc
| | | |-- [8.6K] class_cckcell.hoc
| | | |-- [4.5K] class_dgbasketcell.hoc
| | | |-- [4.5K] class_dgbistratifiedcell.hoc
| | | |-- [9.8K] class_ivycell.hoc

gt5 will generate an interactive view of the dir it’s invoked in. You can move up and down in the tree with the left and right arrows to see deeper or higher in the tree.

 ./:   [434MB in 47 files or directories]  -64MB

  203MB [100.00%] ./dmc_sg_parahydrogen/
  119MB [58.87%] ./water_mbpol/  -61MB
   44MB [21.87%] ./water_tip4p/  -2.2MB
   29MB [14.37%] ./water_ttm3/  -1.0MB
   12MB [ 5.98%] ./SRC_mbnrg_O2_no_openmp_flags/
  7.3MB [ 3.60%] ./SRC_mbpol_O2_cppthresh60/
  4.2MB [ 2.06%] ./SRC_parallel_dmc_cg_true_gs_dw/
  3.8MB [ 1.88%] ./SRC_quenching/
  3.7MB [ 1.82%] ./SRC_parallel_dmc_constrained_gs/
  3.4MB [ 1.70%] ./SRC_dmc_halide_ion_water_true_gs_dw/
  2.7MB [ 1.34%] ./SRC_dmc_cg_true_gs/
  680KB [ 0.33%] ./SRC_dmc_constraints_threshold/
  416KB [ 0.20%] ./SRC_ttm3f_O2/
  240KB [ 0.12%] ./SRC_parallel_dmc_parahydrogen_unconstrained_omp/

The trusty ls can also be used as an analytic tool. The -R flag forces it to recurse to the bottom of the dir, so ls -lR | wc will count how many files and dirs are in the current dir.

$ ls -lR | wc
17902  139311  997536

NB: wc output is 'lines words characters' so the above means
   17902 lines (or files + dirs)
  139311 words (lots of words for each line)
  997536 this many characters in total (in the listing)

# get a statistical profile of your files by passing them thru 'stats'
$ ls -lR | scut -f=4 | stats
Sum       1368480263        # sum of all the sizes
Number    17505             # number of files and dirs
Mean      78176.5360182805  # mean of all the sizes
Median    2904              # median of all the sizes
Mode      4096              # etc
NModes    622
Min       0
Max       24653774
Range     24653774
Variance  439628296365.231
Std_Dev   663044.716716173
SEM       5011.43107110892
Skew      19.5251254040379
Std_Skew  1054.62791925003
Kurtosis  511.917104796882

ls also has a sort by size option (-S) that lists the largest files first, which is useful for discovering unexpectedly large files lurking in dirs.

$ ls -lSh |head
total 541M
-rw-r--r--  1 hjm  hjm    85M Dec 23  2008 2sigma.tar.gz
-rw-r--r--  1 hjm  hjm    26M May 13  2013
-rw-r--r--  1 hjm  hjm    25M Mar 12  2013 red+blue_all.txt
-rw-r--r--  1 hjm  hjm    13M Jul 12 12:41 SVSManual.qch
-rw-r--r--  1 hjm  hjm    12M Dec  3  2010 LinuxJournal_01_2011_SysAdmin.pdf
-rw-r--r--  1 hjm  hjm   7.2M Jul 12 12:41 SVSManual.pdf
-rw-rw-r--  1 hjm  hjm   6.4M Jul 29  2011 BackupPC_Project.tar.gz

Graphical tools

Right now there’s really only one useful tool for this on HPC.


k4dirstat and the related qdirstat (also available for Mac and Windows) very quickly recurse thru the directory structure and make a graphic of the layout, even coloring it depending on what kind of file it is. The output is interactive and you can easily identify large files or dirs containing many files. You can click the different dirs to open and close them, and select files by clicking on the list up top or the icons below; the 2 panes will sync at that file.


To use k4dirstat, you’ll need a connection to HPC that can render X11/XWindow graphics. It can be a native X11 client like a recent Linux distro, an X11 client like XQuartz for the Mac, or an X11 compressor client like x2go (clients for Mac, Win, and Linux). The last is the best-performing over multiple hops. How to set it up for use on HPC.

How do I get my files to and from HPC?

Line endings in files from Windows and MacOS vs Linux/Unix/MacOSX

If you are creating data on Windows (or using an old Mac editor) and saving it as plain text for use on Linux, many applications will save the data with DOS end-of-line (EOL) characters (a Carriage Return plus a Line Feed, aka CRLF) as opposed to the Linux/MacOSX newline (a Line Feed alone, aka LF). This may cause problems on Linux, as only some applications will detect and automatically correct Windows newlines. Ditto visual editors, which you might think would give you an indication of this. Most editors will give you a choice as to which newline type you want when you save the file, but sometimes the choice is not obvious. In any case, unless you’re sure of how your data is formatted, you can pass it thru the Linux utility dos2unix, which will replace the Windows newline with a Linux newline:

$ dos2unix windows.file linux.file

Ditto for the case of the old MacOS editor. In this case the EOL is a CR only. Fix it by passing it thru mac2unix:

$ mac2unix macosfile linux.file

In both cases, if the linux.file argument is omitted, the original file will be converted in place.
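
If you’re not sure which line endings a file has, the file utility will tell you (the filename here is hypothetical):

```shell
$ file suspect.txt
suspect.txt: ASCII text, with CRLF line terminators
```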

This is covered in more detail in the document HOWTO Move Data. There are several ways to get your files to and from HPC. The most direct, most generally available way is via scp. Besides the commandline scp utility bundled with all Linux and OSX versions, there are GUI clients for MacOSX, Windows, and of course, Linux. Some other GUI clients are described below. If you have large collections of files or large individual files that change only partially, you might be interested in using rsync (included on Linux and OSX, with variants available for Windows).

Once you copy your data to your HPC $HOME directory, it is available to all the compute nodes via the same mount point on each, so if you need to refer to it in a SGE script, you can reference the same file in the same way on all nodes. ie: /data/users/hmangala/my/file will be the same file on all nodes.


The hands-down, no-question-about-it, go-to utility here is the free WinSCP, which gives you a graphical interface for SCP, SFTP and FTP. Cyberduck is also available for Windows now as well.


There may be others but it looks like the winner here is the oddly named, but freely available Cyberduck, which provides graphical file browsing via FTP, SCP/SFTP, WebDAV, and even Amazon S3(!).


The full range of high-speed data commandline utilities are available via the above-referenced HOWTO Move Data. Summary: For ease of use and general availability, it’s hard to beat scp. For updating data archives, rsync is a utility that all users should know (there’s a graphical version called grsync on HPC). And for moving large amounts of data over long distances, bbcp is an extraordinary tool.


Once you’ve generated some data on HPC, you may want to keep it handy for a short time while you’re further processing it. In order to keep it both compact and accessible, HPC supports the archivemount utility on the login/hpc node. This allows you to mount a compressed archive (tar.gz, tar.bz2, and zip archives) on a mountpoint as a fuse filesystem. You can cd into the archive, modify files in place, copy files out of the archive, or copy files into the archive. When you unmount the archive, the changes are saved into the archive. Here’s an extended article on it from Linux Mag.

Here’s an example of how to use archivemount with an 84MB data tarball ('') that you want to interact with.

# how big is this thang?
$ ls -lh
total 84M
-rw-r--r-- 1 hmangala hmangala 84M Jun 15 14:55

# OK - 84MB, which is fine.  Now let's make a mount point for it.

$ mkdir jk

$ ls

# so now we have a zipfile and a mountpoint.  That's all we need to archivemount
# let's time it just to see how long it takes to unpack and mount this archive:

$ time archivemount jk

real    0m0.810s  <-  less than a second wall clock time
user    0m0.682s
sys     0m0.112s

$ cd jk      # cd into the top of the file tree.

# lets see what the top of this file tree looks like.  All file utils can work on this data
$ tree |head -11
`-- kent
    |-- build
    |   |-- build.crontab
    |   |-- dosEolnCheck
    |   |-- kentBuild
    |   |-- kentGetNBuild
    |   `-- makeErrFilter
    |-- java
    |   |-- build
    |   |-- build.xml

# and the bottom of the file tree.
$ tree |tail
            |   |-- wabaCrude.h
            |   `-- wabaCrude.sql
            |-- xaShow
            |   |-- makefile
            |   `-- xaShow.c
            `-- xenWorm
                |-- makefile
                `-- xenWorm.c

2286 directories, 12793 files <- lots of files that don't take up any more 'real' space on the disk.

# how does it show up with 'df'?  See the last line..

$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md2             373484336  11607976 342598364   4% /
/dev/md1               1019144     47180    919356   5% /boot
tmpfs                  8254876         0   8254876   0% /dev/shm
/dev/sdc             12695180544 6467766252 6227414292  51% /data
                      66946520   8335072  55155872  14% /sge62
fuse                 1048576000         0 1048576000   0% /home/hmangala/build/fs/jk

# finally, !!IMPORTANTLY!! un-mount it.

$ cd ..   # cd out of the tree

$ fusermount -u jk    # unmount it with 'fusermount -u'
Don’t make huge archives if you’re going to use archivemount

archivemount has to "unpack" the archive before it mounts it, so trying to archivemount an enormous archive will be slow and frustrating. If you’re planning on using this approach, please restrict the size of your archives to ~100MB.

If you need to process huge files, please consider using netCDF or HDF5 formatted files and nco or pytables to process them. NetCDF and HDF5 are highly structured, binary formats that are both extremely compact and extremely fast to parse/process. HPC has a number of utilities for processing both types of files including R, nco, and VISIT.

If you can’t use HDF5 or netCDF, please keep your files compressed. Many domains allow large files to be processed as compressed archives (compressed bam format instead of uncompressed fastq format, for example).
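
Many standard utilities can read compressed data from a pipe, so you often never need an uncompressed copy on disk at all (the filenames below are hypothetical):

```shell
zcat big_input.txt.gz | wc -l            # count lines without unpacking to disk
zcat big_input.txt.gz | grep 'pattern'   # search it the same way
gzip -c results.txt > results.txt.gz     # and compress new output as you create it
```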


sshfs is a utility for OSX and Linux that allows you to mount remote directories in your HPC home dir. Since it operates in user-mode, you don’t have to be root or use sudo to use it. It’s very easy to use and you don’t have to ask us to use it, except to request to be added to the fuse group.

You have to be able to ssh to the machine with which you want to exchange files, typically the desktop or laptop you’re connecting to HPC from (ergo WinPCs cannot do this without much more effort). For MacOSX and Linux, in the example below, assume I’m connecting from a laptop named ringo to the HPC login node. I have a valid HPC login (hmangala) and my login on ringo is frodo.

frodo@ringo:~ $ ssh  # from ringo, ssh to HPC with passwordless ssh

 # <HPC login stuff deleted>

# make a dir named 'ringo' for the ringo filesystem mountpoint
hmangala@hpc:~  $ mkdir ringo

# sshfs-attach the remote filesystem to HPC on ~/ringo
# NOTE: you usually have to provide the FULL PATH to the remote dir, not '~'
# using '~' on the local side (the last arg) is OK.

# ie: this is WRONG:
# hmangala@hpc:~  $ sshfs ringo    # WRONG
#                                                         ^
# this is RIGHT:
hmangala@hpc:~  $ sshfs ~/ringo

hmangala@hpc:~  $ ls -l |head
total 4790888
drwxr-xr-x   2 hmangala hmangala          6 Dec 10 14:17 ringo/  # the new mountpoint for ringo
-rw-r--r--   1 hmangala hmangala       3388 Sep 22 16:25
-rw-r--r--   1 hmangala hmangala       4636 Dec  8 10:18 acct
-rw-r--r--   1 hmangala hmangala        501 Dec  8 10:20 acct.cpu.user
-rwxr-xr-x   1 hmangala hmangala        892 Nov 11 08:55 alias*
-rw-r--r--   1 hmangala hmangala        691 Sep 30 13:21 all3.needs

 <etc>         ^^^^^^^^^^^^^^^^^ note the ownership

# now I cd into the 'ringo' dir
hmangala@hpc:~  $ cd ringo

hmangala@hpc:~/ringo  $ ls -lt |head
total 4820212
drwxr-xr-x 1 frodo frodo       20480 2009-12-10 14:43 nacs/
drwxr-xr-x 1 frodo frodo        4096 2009-12-10 14:41 Mail/
-rw------- 1 frodo frodo          61 2009-12-10 12:54 ~Untitled
-rw-r--r-- 1 frodo frodo          42 2009-12-10 12:44 testfromclaw
-rw-r--r-- 1 frodo frodo      627033 2009-12-10 11:22 sun_virtualbox_3.1.pdf

#<etc>       ^^^^^^^^^^^ note the ownership.  Even tho I'm on hpc, the original ownership is intact
NB: If automatic UID mapping doesn’t work

I recently tried this from HPC to my laptop and the UIDs/GIDs did not automatically map correctly. If they don’t, you can specify which UID/GID you want the remote files to have on your side via the -o uid=LOCAL_UID,gid=LOCAL_GID option. Here’s an example of how to fix it.

# on ringo, my laptop
frodo@ringo:~  $ mkdir hpc       # make a dir to mount my HPC directory on.

# mounting my HOME files from HPC onto my laptop.
frodo@ringo:~  $ sshfs  hmangala@hpcs:/data/users/hmangala ~/hpc

# take a look at the ownership
frodo@ringo:~  $ ls -l ~/hpc | head
total 7703992
-rw-r--r-- 1  785  200      16986 Sep 22  2015
-rw-r--r-- 1  785  200     896184 Dec  9  2016 1CD3.pdb
-rw-r--r-- 1  785  200    2581796 Mar 26  2008 1D-Mangalam.tar.gz
-rw-r--r-- 1  785  200      28250 Sep 17  2015 1liner
-rw-r--r-- 1  785  200      28256 Sep 17  2015 1liner1
-rw-r--r-- 1  785  200          0 Jun 12 13:13 2
-rw-r--r-- 1  785  200    1599750 Jun 21  2006 3DM2-Linux-
-rw-r--r-- 1  785  200        636 Sep 12  2015 9-11-shutdown.txt

# THEY'RE WRONG!! (relative to my local IDs)
# They've been mapped directly across, so the UID/GID from HPC is being used here.

# in order to fix this, we do this:
# first, unmount the bad sshfs mount

frodo@ringo:~  $ fusermount -u hpc

# then use the sshfs option to re-map the UID/GID correctly.

# find your local UID/GID
frodo@ringo:~  $ id frodo  # or usually just 'id' by itself for your own id info
uid=1000(frodo) gid=1000(frodo)   # often there will be more groups, but this is all you need

# use those values to fill in the values in the sshfs option command.
frodo@ringo:~  $ sshfs -o uid=1000,gid=1000  hmangala@hpcs:/data/users/hmangala ~/hpc

frodo@ringo:~  $ ls -l ~/hpc | head
total 7703992
-rw-r--r-- 1 frodo frodo      16986 Sep 22  2015
-rw-r--r-- 1 frodo frodo     896184 Dec  9  2016 1CD3.pdb
-rw-r--r-- 1 frodo frodo    2581796 Mar 26  2008 1D-Mangalam.tar.gz
-rw-r--r-- 1 frodo frodo      28250 Sep 17  2015 1liner
-rw-r--r-- 1 frodo frodo      28256 Sep 17  2015 1liner1
-rw-r--r-- 1 frodo frodo          0 Jun 12 13:13 2
-rw-r--r-- 1 frodo frodo    1599750 Jun 21  2006 3DM2-Linux-
-rw-r--r-- 1 frodo frodo        636 Sep 12  2015 9-11-shutdown.txt

# the above files are from HPC, 're-owned' to my local UID/GID.

# the above technique works on HPC as well.

OK, Continuing

# Now, writing from HPC to ringo filesystem
hmangala@hpc:~/ringo  $ echo "testing testing" > test_from_bduc

hmangala@hpc:~/ringo  $ cat test_from_bduc
testing testing

hmangala@hpc:~/ringo  $ ls -lt |head
total 4820216
drwxr-xr-x 1 frodo frodo       20480 2009-12-10 14:47 nacs/
-rw-r--r-- 1 frodo frodo          16 2009-12-10 14:46 test_from_bduc
drwxr-xr-x 1 frodo frodo        4096 2009-12-10 14:41 Mail/
#            ^^^^^^^^^^^  even tho I wrote it as 'hmangala' on HPC, it's owned by 'frodo'

# and finally, unmount the sshfs mounted filesystem.
hmangala@hpc:~/ringo $ fusermount -u ringo

# get more info on sshfs with 'man sshfs'

YOU are responsible for your data

We do not have the resources to provide backups of your data. If you store valuable data on HPC, it is ENTIRELY your responsibility to protect it by backing it up elsewhere. You can do so via the mechanisms discussed above, especially (if using a Mac or Linux) rsync, which will copy only those bytes which have changed, making it extremely efficient. Using rsync (with examples) is described here.
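
As a minimal sketch, run from your own Mac or Linux machine (the login hostname is a placeholder, and mind the trailing slashes, which change rsync’s behavior):

```shell
# mirror your HPC home dir into a local backup dir; on later runs,
# only changed files/bytes are transferred
rsync -av hmangala@hpc.example.edu:/data/users/hmangala/  ~/hpc-backup/
```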

How do I do stuff?

On the login node, you shouldn’t do anything too strenuous (computationally). If you run something that takes more than an hour or so to complete, you should be running it on an interactive node (via qrsh) or submitting it to one of the batch queues (via qsub).

Can I compile code?

We have the full GNU toolchain available on both login nodes, so normal compilation tools such as autoconf, automake, libtool, make, ant, gcc, g++, gfortran, gdb, ddd, java, python, R, perl, etc are available to you. We also have some proprietary compilers and debuggers available - the Intel & PGI compilers and the TotalView Debugger (see the Modules section below for details). Please let us know if there are other tools or libraries you need that aren’t available.

Compiling your own code

You can always compile your own (or downloaded) code. Compile it in its own subdir, and when you’ve built it, install it into the usual lib, include, bin, man directories, except that they’re rooted in your $HOME dir (~/lib, ~/include, ~/bin, ~/man).

If the code is well-designed, it should have a configure shell script in the top-level dir. The ./configure --help command should then give you a list of all the parameters it accepts. Typically, all such scripts will accept the --prefix flag. You can use this to tell it to install everything in your $HOME dir. ie:

./configure --prefix=/data/users/you  ...other options..

configure, when it completes successfully, will generate a Makefile. At this point, you can type make (or make -j2 to compile on 2 CPUs) and the code will be compiled into whatever kind of executable is called for. Once the code has been compiled successfully (there may be a make test or make check target to verify this), you can install it in your $HOME directory tree with make install.
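
Putting those steps together, the whole sequence usually looks like this (options will vary by package; --prefix is the important part):

```shell
./configure --prefix=$HOME   # set up to install under your home dir
make -j2                     # compile on 2 CPUs
make check                   # run the package's self-tests, if it has any
make install                 # copy binaries, libs, and manpages under $HOME
```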

Then you can run it out of your ~/bin dir without interfering with other code. In order for you to be able to run it transparently, you will have to prepend your ~/bin to the PATH environment variable, typically by editing it into the appropriate line in your ~/.bashrc.

export PATH=~/bin:${PATH}

How do I find out what’s available?

Via the module command

We use the Tcl-based environment modules system to wrangle non-standard software versions and subsystems into submission. To find out what modules are available, simply type:

$ module avail # output is long & changes so much it's not useful to include it here

You can also list all modules that start with some letters:

$ module avail be

------- /data/modulefiles/SOFTWARE ---------
beagle-lib      beast/1.7.5     bedtools/2.15.0 bedtools/2.19.1
beast/1.7.4     bedops/2.4.14   bedtools/2.18.2 bedtools/2.23.0

To find out what a module does, use the whatis option:

$ module whatis bedops
bedops               :

BEDOPS is an open-source command-line toolkit that performs
highly efficient and scalable Boolean and other set operations,
statistical calculations, archiving, conversion and other
management of genomic data of arbitrary scale. Tasks can be
easily split by chromosome for distributing whole-genome analyses
across a computational cluster.


To LOAD a particular module, use the module load <module/version> command:

$ module load bedtools/2.15.0  # for example

(Note that loading a module does not start the application that it loads.)

If a module has a dependency, it should set it up for you automatically. Let us know if it doesn’t. If you note that a module has an update that we should install, tell us.

Also, if you omit the version number, it will load the numerically highest version, which is not necessarily the latest, since some groups use odd numbering schemes. For example, samtools/0.1.7 sorts numerically higher than (but is older than) samtools/0.1.18.

To LIST all modules that you have loaded in your session

$ module list
Currently Loaded Modulefiles:
  1) gmp/5.1.3                 5) gcc/4.8.2
  2) mpc/1.0.1                 6) openmpi-1.8.3/gcc-4.8.2
  3) mpfr/3.1.2                7) gdb/7.8
  4) binutils/2.23.2           8) Cluster_Defaults

To UNLOAD a particular module:

$ module unload bedtools/2.15.0  # for example

To UNLOAD ALL modules (start from a clean session):

$ module purge
$ module list
No Modulefiles Currently Loaded.

If you want an app upgraded/updated

If you need the newest version of an app, FIRST make sure that we don’t already have it installed (see module avail above). THEN please supply us with a link to the updated version so we don’t have to scour the internet for it. If it’s going to require a long dependency list, please also supply us with an indication of what that is. If it’s an app that few other people will ever use, consider downloading it and installing it in your own ~/bin directory. If after that you think it’s worthwhile, we’d certainly consider installing it system-wide. See the notes on setting up personal modules.

Via the shell

This is a bit tricky. There are literally thousands of applications available, and many of them have names that are entirely unrelated to their function. To determine whether a well-known application is already on the system, you can simply try typing its name. If it’s NOT installed or not on your PATH, the shell will return command not found.

All the interactive nodes have TAB completion enabled, at least in the bash shell. This means that if you type a few characters of the name and hit <TAB> twice, the system will try to complete the command for you. If multiple executables match those characters, the shell will present all the alternatives, e.g.:

$ jo<TAB><TAB>
jobs        jockey-kde  joe         join

You can then complete the command or enter enough characters to make the command unique and hit <TAB> again and the command will complete.

Via the YUM installer Database

The CentOS yum repositories will let you search all the applications in the repositories that we have enabled which are currently:

CentOS-Base.repo       elrepo.repo        mirrors-rpmforge-extras   rpmforge.repo
CentOS-Debuginfo.repo  epel.repo          mirrors-rpmforge-testing  x2go.repo
CentOS-Media.repo      epel-testing.repo  RCS
CentOS-Vault.repo      mirrors-rpmforge   rocks-local.repo

If you have favorites that supply notable apps or libs, let us know.

To search for the ones that can be installed direct from the repositories, use yum search:

$ yum search fasta
======================================= N/S Matched: fasta ========================================
perl-Tie-File-AnyData-Bio-Fasta.noarch : Accessing fasta records in a file via Perl array

To see a more detailed description of the application, use yum info:

$ yum info perl-Tie-File-AnyData-Bio-Fasta.noarch
Available Packages
Name        : perl-Tie-File-AnyData-Bio-Fasta
Arch        : noarch
Version     : 0.01
Release     : 1.el6.rf
Size        : 8.4 k
Repo        : rpmforge
Summary     : Accessing fasta records in a file via Perl array
URL         :
License     : Artistic/GPL
Description : Tie::File::AnyData::Bio::Fasta allows the management of fasta files via a Perl
            : array through Tie::File::AnyData, so read the documentation of this module for
            : further details on its internals.


Note that the Debian/Ubuntu repositories have about 5 times more entries than the yum repositories, so if you can find an Ubuntu host, you can search those repositories for applications that appear to do what you need and request that we acquire them. On Ubuntu machines, use apt-cache search <search term> to search and apt-cache show <specific entry name> to show full information.

HOWEVER, this only tells you that the application or library is available, not whether it’s installed. To find out whether it’s installed, you use yum list <rpm name>.

$ yum list zlib
Installed Packages
zlib.x86_64                    1.2.3-27.el6
Available Packages
zlib.i686                      1.2.3-27.el6                    Rocks-6.1

Via the Internet

Obviously, a much wider ocean to search. My first approach is to use a Google search constructed of the platform, application name, and/or function of the software. Something like

linux image photography hdr 'high dynamic range'  # '' enforces the exact phrase

Also, don’t be afraid to try Google’s Advanced Search or even Google’s Linux Search.

After evaluating the results, you’ll come to a package that seems to be what you’re after, pfstools, for example. If you didn’t find this in the previous searches of the application databases, you can look again, searching explicitly:

$ yum info rsync    # rsync used as an example here
Installed Packages
Name        : rsync
Arch        : x86_64
Version     : 3.0.6
Release     : 9.el6
Size        : 682 k
Repo        : installed
From repo   : anaconda-base-201211270324.x86_64
Summary     : A program for synchronizing files over a network
URL         :
License     : GPLv3+
Description : Rsync uses a reliable algorithm to bring remote and host files into
            : sync very quickly. Rsync is fast because it just sends the differences
            : in the files over the network instead of sending the complete
            : files. Rsync is often used as a very powerful mirroring process or
            : just as a more capable replacement for the rcp command. A technical
            : report which describes the rsync algorithm is included in this
            : package.

and then you can ask an admin to install it for you. Typically the apps found in the application repositories lag the latest releases by a few point versions, so if you really need the latest version, you’ll have to download the source code or binary package and install it from that package. You can compile your own version as a private package, but to install it as a system binary, you’ll have to ask one of the admins.

Interactive Use

Logging on to an interactive node may be all that you need. If you want to slice & dice data interactively, either with a graphical app like MATLAB, VISIT, JMP, or clustalx, or a commandline app like nco or scut or even hybrids like gnuplot or R, you can run them from any of the interactive nodes, read, analyze and save data to your $HOME directory. As long as you satisfy the graphics requirements, you can view the output of the X11 graphics programs as well.

bash Shortcuts

The bash shell allows nearly unlimited customization and shortcuts via scripts and the alias command. Should you wish to make use of such things (such as nu to show you the newest files in a directory or ll to show you the long ls output), you can define them yourself by typing them at the commandline:

alias nu="ls -lt |head -22" # gives you the 22 newest files in the dir
alias ll="ls -l"   # long 'ls' output
alias llh="ls -lh" # long 'ls' output in human (KB, MB, GB, etc) form
alias lll="ls -lh |less" # pipe the preceding one into the 'less' pager

# Note: for aliases, there can be no spaces around the '=' between the
# alias name and its definition:
#    alias myalias = "what it means"   # wrong
#    alias myalias="what it means"     # right

You can also place all your useful aliases into your ~/.bashrc file so that all of them are defined when you log in. Or separate them from the ~/.bashrc by placing them into a ~/.alias file and have it sourced from your ~/.bashrc file when you log in. That separation makes it easier to move your alias library from machine to machine.
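
As a sketch of that separation (the filename ~/.alias is just a convention; any name works), the sourcing line in your ~/.bashrc could look like this:

```shell
# in ~/.bashrc: pull in a separate alias file if it exists
# ('.' is the POSIX spelling of bash's 'source')
if [ -f ~/.alias ]; then
    . ~/.alias
fi
```

The guard means login still works cleanly on a machine where you haven’t yet copied your ~/.alias file over.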

byobu and screen: keeping a session alive between logins

In most cases, when you log out of an interactive session, the processes associated with that login will also be killed off, even if you’ve put them in the background (by appending & to the starting command). If you regularly need a process to continue after you’ve logged out, you should submit it to the GE scheduler with qsub (see immediately below).

However, sometimes it is convenient to continue a long-running process when you have to log out (as when you have to shut down your network connection to take your laptop home). In this case, you can use the under-appreciated screen program, which establishes a long-running proxy connection on the remote machine that you can detach from and then re-attach to without losing the connection. As far as the remote machine is concerned, you’ve never logged off, so your running processes aren’t killed off. When you re-establish the connection by logging in again, you can re-attach to the screen proxy and take up as if you’ve never been away.

The only downsides are that the terminal scrollback is usually lost and that you cannot start an X11 graphics session from a screen (or byobu) terminal, since the remote DISPLAY variable doesn’t get set correctly.

You can also use screen as a terminal multiplexer, allowing multiple terminal sessions from one login - especially useful if you’re using PuTTY on Windows, which doesn’t have a multiple-terminal function built in.

For these reasons, screen by itself is a very powerful and useful utility, but it is admittedly hard to use, even with a good cheatsheet and a video. To the rescue comes a screen wrapper called byobu which provides a much easier-to-use interface to the screen utility. byobu has been installed on all the interactive nodes on HPC and can be started by typing:

$ byobu

There will be a momentary screen flash as it refreshes and re-displays the login, and then the screen will look similar, except for 2 lines along the bottom that show the screen status. In the images below, the one at left (or on top) is without byobu; at right (or below) is with byobu. The byobu screen shows 3 active sessions: login, claw_1, and bowtie. The graphical tabs at the bottom are part of the KDE application konsole, which also supports multiplexed sessions (allowing you to multi-multiplex sessions (polyplex?)).

[screenshots: at left (or top), a terminal without byobu; at right (or below), the same terminal with byobu]

The help screen, shown below, can be called up at any time by hitting the <F9> key, followed by the <Enter> key.

Byobu 2.57 is an enhancement to GNU Screen, a command line
tool providing live system status, dynamic window management,
and some convenient keybindings:

F2    Create a new window    |  F6    Detach from the session
F3    Go to the prev window  |  F7    Enter scrollback mode
F4    Go to the next window  |  F8    Re-title a window
F5    Reload profile         |  F9    Configuration
                             |  F12   Lock this terminal
'screen -r'  - reattach      |  <ctrl-A> Escape sequence
'man screen' - screen's help | 'man byobu'  - byobu's help

Most usefully, you can create new sessions with the F2 key, switch between them with F3/F4 and detach from the screen session with F6. It depends on your OS and your terminal emulator whether the F keys will work correctly. The screen control keys almost always work. See the cheatsheet below.

Note that you must have started a screen session before you can detach from it, so to make sure you’re always in a screen session, you can have it start automatically on login by changing the state of the 'Byobu currently launches at login' flag (at the bottom of the screen after the first <F9>).

When you log back in after having detached, type byobu again to re-attach to all your running processes. If you set byobu to start automatically on login, there will be no need of this, of course, as it will have started.

Note that byobu is just a wrapper for screen and the native screen commands continue to work. As you become more familiar with byobu, you’ll probably find yourself using more of the native screen commands. See this very good screen cheatsheet.

Environment Variables

Environment variables (envvars) are those which are set for your session and can be modified for your use. They include directives to the shell as to which browser or editor you want started when needed, or application-specific paths to describe where some data, executables, or libraries are located. For example, here is some of my envvar list, generated by printenv:

$ printenv
SSH_CLIENT= 42655 22


Many of these are generated by the bash shell or by system login processes. Some that I set myself are:

EDITOR=joe                   # the text editor to be invoked from 'less' by typing 'v'
TACGLIB=/usr/local/lib/tacg  # a data dir for a particular application
XEDITOR=nedit                # my default GUI/X11 editor
BROWSER=/usr/bin/firefox     # my default web browser

Many applications require a set of envvars to define paths to particular libraries or to data sets. In bash, you define an envvar very simply by setting it with an =:

# for example, PATH is the list of directories through which the shell
# searches for executables

# you can append to it (search the new dir after the existing PATH):
export PATH=${PATH}:~/bin

# or prepend to it (search the new dir before the existing PATH):
export PATH=~/bin:${PATH}

Note that when you assign to these envvars, you use the bare name, and when you reference them in bash scripts, you use the $name version. In some cases you need the braced ${name} form instead - when the end of the variable name isn’t clear from the surrounding context, and when you want to do additional magic with parameter expansion (using the braced variable to extract or transform its value). Double parentheses (()) indicate that arithmetic is being performed on the variables; note that inside the parens, you don’t need the $name form:

# using $a, $b, & $c in an arithmetic expression:
$ a=56; b=35; c=1221
$ echo $((a + b * 4/c))

# note this will be integer math, so '56' is returned, not '56.1146601147'
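
The braced ${name} form mentioned above is worth a quick illustration; these parameter-expansion operators are standard bash:

```shell
# ${name} disambiguates the variable name from adjacent text and
# enables parameter expansion:
FILE=report.txt
echo "${FILE}_backup"   # without braces, bash would look up a variable named FILE_backup
echo "${FILE%.txt}"     # strip the '.txt' suffix -> report
echo "${#FILE}"         # length of the string -> 10
```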

See this bit on stackoverflow for a longer, but still brief explanation.

SGE Batch Submission & Queues

If you have jobs that are very long or require multiple nodes to run, you’ll have to submit jobs to an SGE Queue (aka Q).

qsub will submit the job described by your script to SGE, which will look for an appropriate Q and then start the job running via that Q. For more on the Qs available on HPC and who can use them and how, please see Running Jobs on The HPC Cluster, a description of the system Qs, and especially the free Qs.

Once you log into the login node (via ssh -Y <your_UCINetID>), you can get an idea of the hosts that are currently up by issuing the qhost command. You can find out the status of your jobs with qstat -u <your-login>, or the status of all jobs currently queued or running with qstat alone. A very useful PDF cheatsheet for the SGE q commands is here.

To get an idea of the overall cluster load, type q, which will display all the Qs with usage and available nodes shown. You can also run clusterload which will summarize the load in 1 line by summing the cores in use vs the total number of cores available.

What cluster resources to request?

All jobs require CPU cycles, RAM, and Input/Output (IO), typically to some storage device. In order to find out how much of each you need, and what would be the best resource to use, you should run your application on a small set of input data, prefixed by the /usr/bin/time -v command. That command will tell you a number of useful things that you can use to request resources that are well-matched to your jobs.

This is important since if you request too many resources, your jobs will linger in the Q longer, waiting for more resources to become available. And obviously, if you request too few resources, your jobs may fail.

Here’s an example using 2 input data sets: first human chromosome 1 (243M bases), and then a much smaller input (chromosome 21, 43Mb).

$ export SS=/data/apps/commondata/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes;
$ /usr/bin/time -v tacg -n6 -slLc -S -F2 < ${SS}/chr1.fa > chr1.tacg.out
        Command being timed: "tacg -n6 -slLc -S -F2"
      * User time (seconds): 72.76
      * System time (seconds): 3.28
      * Percent of CPU this job got: 93%
      * Elapsed (wall clock) time (h:mm:ss or m:ss): 1:21.48
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
      * Maximum resident set size (kbytes): 3745120
        Average resident set size (kbytes): 0
      * Major (requiring I/O) page faults: 0
      * Minor (reclaiming a frame) page faults: 233595
        Voluntary context switches: 17852
        Involuntary context switches: 24019
      * Swaps: 0
      * File system inputs:   496560
      * File system outputs: 2878576
      * Socket messages sent: 0
      * Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
      * Exit status: 0

/usr/bin/time -v output comparison

/usr/bin/time parameter                         Comment (Chr1, 243Mb vs Chr21, 47Mb)
----------------------------------------------  ----------------------------------------
Command being timed: "tacg -n6 -slLc -S -F2"    same command, different inputs
User time (seconds)                             5X input yields 5X execution time
System time (seconds)                           for system time as well
Percent of CPU this job got                     both got about the same amount of CPU
Elapsed (wall clock) time                       wall clock time also 5X
Maximum resident set size (kbytes)              5X the RAM requirements
Minor (reclaiming a frame) page faults
Voluntary context switches
Involuntary context switches
Swaps                                           no swaps; everything stays in RAM
File system inputs                              5X the number of reads, as expected
File system outputs                             4X the number of writes
Socket messages sent/received
Exit status
Output size                                     output is 4X, matching the # of writes

The above output shows both what CPU time is taken up by a particular run and, very roughly, how it scales with increasing input data. Particularly useful are the parameters in bold above. The combination of User & System time (seconds) shows how much CPU time is being taken by this application (mod the Percent of CPU this job got). The Maximum resident set size (kbytes) shows how much RAM it consumed during the run. These values allow you to see what runtime you should ask for if you’re running on a restricted Q or a machine with limited RAM (at least 4GB for the larger run, at least 1GB for the smaller run). If you were going to stage the output to another filesystem, the output size is also important.
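
If you save the /usr/bin/time -v output (it goes to stderr, so redirect it with 2> time.log), you can pull out the key numbers with standard tools. This is just a sketch; the log created below is a faked, minimal version of the output shown above:

```shell
# fake a minimal 'time -v' log for illustration; in real use you would run:
#   /usr/bin/time -v yourapp < input > output 2> time.log
cat > time.log <<'EOF'
	User time (seconds): 72.76
	System time (seconds): 3.28
	Maximum resident set size (kbytes): 3745120
EOF

# Maximum resident set size is reported in kbytes; convert it to GB
# (1048576 kbytes per GB) to decide how much RAM to request:
awk -F: '/Maximum resident/ {printf "%.1f GB\n", $2/1048576}' time.log
```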

SGE qstat state codes

When you type qstat, the State codes can tell you a lot about what’s happening. But only if you know what they mean. Here’s what most of them mean.

SGE status codes:

Category    State                                       SGE letter code
---------   -----------------------------------------   -----------------------------
Pending     pending                                     qw
            pending, user hold                          hqw
            pending, system hold                        hqw
            pending, user and system hold               hqw
            pending, user hold, re-queue                hRwq
            pending, system hold, re-queue              hRwq
            pending, user and system hold, re-queue     hRwq
Running     running                                     r
            transferring                                t
            running, re-submit                          Rr
            transferring, re-submit                     Rt
Suspended   job suspended                               s, ts
            queue suspended                             S, tS
            queue suspended by alarm                    T, tT
            all suspended with re-submit                Rs, Rts, RS, RtS, RT, RtT
Error       all pending states with error               Eqw, Ehqw, EhRqw
Deleted     all running and suspended states            dr, dt, dRr, dRt, ds, dS, dT,
            with deletion                               dRs, dRS, dRT

qsub scripts

Kevin Thornton, a knowledgeable cluster user and certified geek, has written his own Introduction to using the HPC cluster, especially describing preparing qsub scripts and creating array jobs. It is also worth a read.

The shell script that you submit should be written in bash and should completely describe the job, including where the inputs and outputs are to be written (if not specified, the default is your home directory). The following is a simple shell script that defines bash as the job shell, calls date, waits 20s, and then calls date again.


# request Bourne shell as shell for job
#$ -S /bin/bash

# print date and time
date
# sleep for 20 seconds
sleep 20
# print date and time again
date

Note that your script has to include (usually at the end) at least one line that executes something - generally a compiled program but it could also be a Perl or Python script (which could also invoke a number of other programs). Otherwise your SGE job won’t do anything.

Using qsub scripts to keep data local

HPC depends on a network-shared /data filesystem. The actual disks are on a network file server node that users see as local when they log in. However, when you submit an SGE job, unless otherwise specified, the compute nodes have to read the data over the network and write it back across the network. This is fine when the total data involved is a few MB, as is often the case with molecular dynamics runs - small data in, lots of computation, small data out. However, if your jobs involve 100s or 1000s of MB, the network traffic can grind the entire cluster to a halt.

To prevent this network armageddon, there is a /scratch directory on each node (writable by all users, but sticky - files written there can only be deleted by the user who wrote them).

$ ls -ld /scratch
drwxrwxrwt 6 root root 4096 Oct 29 18:20 /scratch/
         + the 't' indicates 'stickiness'

If there is a chance that your job will consume or emit lots of data, please use the local /scratch dir to stage your data, and especially your output.

This is dirt simple to do. Since your qsub script executes on each node, your script should copy the data from your $HOME dir to /scratch/$USER/input to stage the data, then specify /scratch/$USER/input as input, with your application writing to /scratch/$USER/output_node#. When the application has finished, copy the output files back to your $HOME dir, and finally clean up /scratch/$USER/whatever afterwards.
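
Here is a minimal sketch of such a staging script. The filenames (mydata, input.fa), the application name (myapp), and the free64 Q are all hypothetical placeholders, not real paths on HPC:

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -q free64

# all names below (mydata, input.fa, myapp) are hypothetical placeholders
IN=$HOME/mydata/input.fa
SCR=/scratch/$USER/$JOB_ID        # a per-job dir avoids collisions between your own jobs
mkdir -p $SCR

cp $IN $SCR/                      # stage the input to node-local scratch
cd $SCR
myapp < input.fa > output.txt     # all heavy I/O is now local to the node

cp output.txt $HOME/mydata/       # copy the results back to your $HOME
cd; rm -rf $SCR                   # clean up /scratch afterwards
```

Using $JOB_ID in the scratch path also makes it obvious, if a job dies before cleanup, which leftover directory belonged to which job.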

Here’s another page of information on using scratch space.

More example qsub scripts

  • is a slightly more elaborate sleeper script.

  • an annotated example script that does data copying to /scratch

  • another annotated example script that uses /scratch and collates and moves data back to $HOME after it’s done.

  • fsl_sub is a longer, much more elaborate one that uses a variety of parameters and tests to set up the run.

  • a longer annotated qsub script that demonstrates the use of md5 checksums.

  • is a qsub script that implements an array job - it uses SGE’s internal counter to vary the parameters to a command. This example also uses some primitive bash arithmetic to calculate the parameters.

  • is a Python script for generating serial qsubs, in a manner similar to the SGE array jobs. However, if you need more control over your inputs & outputs and /or are more familiar with Python, it may be useful.

  • a script that launches an MPI script in a way that allows it to suspend and restart. If you do not write your MPI scripts in this way and try to suspend them, they will be aborted and you’ll lose your intermediate data. (NB: it can take minutes for an MPI job to smoothly suspend; only seconds to restart).

Staging data - some important caveats

READING: Copying data to the remote node makes sense when you have large input data that has to be repeatedly parsed. It makes less sense when a lot of data has to be read once and is then ignored. (If the data is only read once, why copy it? Just read it in the script.) If you stage it to /scratch, it still traverses the network once, so there is little advantage. (If you have significant data to be re-read on an ongoing basis, contact me; depending on circumstances, we may be able to let you leave it on the /scratch system of a set of nodes for an extended period of time. Otherwise, we expect that all /scratch data will be cleaned up post-job.)

If it does make sense to stage your data, please try to follow the guidelines below. If the cluster locks up, offending jobs will be deleted without warning so ask me if you have questions.

Limit your staging bandwidth
If your job(s) are going to require a mass copy (for example, if you submit 20 jobs that each have to copy 1GB), then throttle your jobs appropriately by using a bandwidth-limited copy command like scp -C -l 2000 instead of cp. This scp command compresses the data and also limits the bandwidth, to ~250KB/s in the above case (2000 refers to kiloBITS, not kiloBYTES). scp will work without requiring passwords within the cluster, just like ssh, tho the syntax is slightly different.

# use scp to copy from your $HOME dir to a node-local /scratch dir, as would be required in a qsub script
scp -C -l 2000 <login-node>:<path/to/your/file> /scratch/hmangala

This prevents a few bandwidth-unlimited jobs from causing the available cluster bandwidth to drop to zero, locking up all users. If you have a single job that will copy a single 100MB file, then don’t worry about it; just copy it directly.

Assume the aggregate bandwidth of the cluster is about 100 MB/s. No set of jobs should exceed half of that, so if you’re submitting 50 jobs, the total bandwidth should be set to no more than 50MB/s, or 1 MB/s per job - in scp terms, -l 8000 (scp’s -l takes kilobits/s, and 1 MB/s = 8000 kbit/s).
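
The arithmetic can be scripted. The values below simply restate the example above, with the unit conversion made explicit (scp's -l takes kilobits/s, so 1 MB/s corresponds to 8000 kbit/s):

```shell
TOTAL_MBS=50                           # total staging budget in MB/s (half of ~100 MB/s)
NJOBS=50                               # number of simultaneous staging jobs
PER_JOB_MBS=$(( TOTAL_MBS / NJOBS ))   # MB/s available per job -> 1
SCP_LIMIT=$(( PER_JOB_MBS * 8000 ))    # scp's -l flag is in kbit/s -> 8000
echo "use: scp -C -l $SCP_LIMIT <src> <dest>"
```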

Check the network before you submit a job
While there’s no way to predict the cluster environment after you submit a job, there’s no reason to make an existing BAD situation WORSE. If the cluster is exhibiting network congestion, don’t add to it by submitting 100 staging jobs. (and if it does appear to be lagging, please let me know)

How to check for cluster congestion
On the login node, you can use a number of tools to see what the status is.

  • top gives you an updating summary of the top CPU-using processes on the node. If the top processes include nfsd, and the load average is above ~4 with no user processes exceeding 100%, then the cluster can be considered congested. Most users have a multi-colored prompt that shows the current 1m, 5m, & 15m load on the system in square brackets.

Fri Sep 23 14:56:15 [0.13 0.20 0.36]  hjm@bongo:~
617 $

(For those that don’t have the fancy prompt, you can add it by inserting the following lines into your ~/.profile or ~/.bashrc.)

PS1="\n\[\033[01;34m\]\d \t \[\033[00;33m\][\$(cat /proc/loadavg | cut -f1,2,3 -d' ')] \
\[\033[01;32m\]\u@\[\033[01;31m\]\h:\[\033[01;33m\]\w\n\! \$ \[\033[00m\]"
  • nfswatch produces a top-like output that can display a number of usage patterns on NFS, including top client by hostname, username, etc.

  • nethogs produces a top-like output that shows which processes are using the most bandwidth.

  • ifstat will produce a continuous, instantaneous chart of network interface output.

  • dstat will produce a similar readout of many system parameters including CPU, memory usage, network, and storage activity.

  • iotop will produce a very useful top-like display of who & what is using up disk bandwidth.

  • htop produces a colored, top-like output that is multiply sortable to debug what’s happening with the system.

  • atop produces yet another top-like output but highlights saturated systems. It provides more info to the root user, but is also useful for regular users.

  • iftop produces a very useful (but only available to root) text-based, updating diagram of network bandwidth by endpoints. Mentioned as it might be useful to users on their own machines.

  • etherape will produce a graphical ring picture of your network with connections colored by connection type and sized by amount of data flowing thru it.

Fixing qsub errors

Occasionally, a script will hiccup and put your job into an error state. This can be seen by the qstat state output:

$ qstat -u '*'

job-ID  prior    name  user      state  submit/start at      queue             slots  ja-task-ID
------------------------------------------------------------------------------------------------
  6868  0.62500        hmangala  E      06/08/2009 11:29:02  free@compute-1-1

The E in the state column means that the job is in an ERROR state. You can either delete the job with qdel:

qdel <Job ID> # deletes the job

or, often, change its status with the qmod command:

qmod -cj <Job ID> # clears the error state of the job

Some useful SGE script parameters

When you submit an SGE script, it is processed by both bash and SGE. In order to protect the SGE directives from being misinterpreted by bash, they are prefixed by #$. This prefix causes bash to ignore the rest of the line (it considers it a comment), but allows SGE to process the directive correctly.

So, the rules are:

  • If it’s a bash command, don’t prefix it at all.

  • If it’s an SGE directive, prefix it with both characters (#$).

  • If it’s a comment, prefix it only with a #.

Here are some of the most frequently used:

#$ -N job_name     # this name shows in qstat
#$ -S /bin/bash    # run with this shell
#$ -q free64     # run in this Q
#$ -l h_rt=50:00:00  # need 50 hour runtime
#$ -l mem_size=2G  # need 2GB free RAM
#$ -pe mpich 4     # define parallel env and request 4 CPU cores
#$ -cwd            # run the job out of the current directory
                   # (the one from which you ran the script)
#$ -o job_name.out # the name of the output file
#$ -e job_name.err # the name of the error file
#  or
#$ -o job_name.outerr -j y            # '-j y' merges stdout and stderr

#$ -t 0-10:2       # task index range (for looping); generates 0 2 4..10
#                    each task reads $SGE_TASK_ID to find out which task it is
#$ -notify         # send warning signals (USR1/USR2) before suspending/killing the job
#$ -M <email>      # send job-related mail to this <email> address
#$ -m beas         # send mail to the owner when the job
#                      begins (b), ends (e), aborts (a),
#                      or suspends (s)
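
For example, the body of an array job declared with -t typically uses $SGE_TASK_ID to select its own input; the filename scheme below is hypothetical:

```shell
# each array task sees its own $SGE_TASK_ID (0, 2, 4 .. 10 for '-t 0-10:2');
# outside SGE you can simulate this by exporting SGE_TASK_ID yourself
SGE_TASK_ID=${SGE_TASK_ID:-0}
INPUT=chunk_${SGE_TASK_ID}.fa      # hypothetical per-task input file
echo "task $SGE_TASK_ID would process $INPUT"
```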

When a job starts, a number of SGE environment variables are set and are available to the job script.

Here are most of them:

  • ARC - The Sun Grid Engine architecture name of the node on which the job is running; the name is compiled into the sge_execd binary

  • SGE_ROOT - The Sun Grid Engine root directory as set for sge_execd before start-up or the default /usr/SGE

  • SGE_CELL - The Sun Grid Engine cell in which the job executes

  • SGE_JOB_SPOOL_DIR - The directory used by sge_shepherd(8) to store job-related data during job execution

  • SGE_O_HOME - The home directory path of the job owner on the host from which the job was submitted

  • SGE_O_HOST - The host from which the job was submitted

  • SGE_O_LOGNAME - The login name of the job owner on the host from which the job was submitted

  • SGE_O_MAIL - The content of the MAIL environment variable in the context of the job submission command

  • SGE_O_PATH - The content of the PATH environment variable in the context of the job submission command

  • SGE_O_SHELL - The content of the SHELL environment variable in the context of the job submission command

  • SGE_O_TZ - The content of the TZ environment variable in the context of the job submission command

  • SGE_O_WORKDIR - The working directory of the job submission command

  • SGE_CKPT_ENV - Specifies the checkpointing environment (as selected with the qsub -ckpt option) under which a checkpointing job executes

  • SGE_CKPT_DIR - Only set for checkpointing jobs; contains path ckpt_dir (see the checkpoint manual page) of the checkpoint interface

  • SGE_STDERR_PATH - The path name of the file to which the standard error stream of the job is diverted; commonly used for enhancing the output with error messages from prolog, epilog, parallel environment start/stop or checkpointing scripts

  • SGE_STDOUT_PATH - The path name of the file to which the standard output stream of the job is diverted; commonly used for enhancing the output with messages from prolog, epilog, parallel environment start/stop or checkpointing scripts

  • SGE_TASK_ID - The task identifier in the array job represented by this task

  • ENVIRONMENT - Always set to BATCH; this variable indicates that the script is run in batch mode

  • HOME - The user’s home directory path from the passwd file

  • HOSTNAME - The host name of the node on which the job is running

  • JOB_ID - A unique identifier assigned by the sge_qmaster when the job was submitted; the job ID is a decimal integer in the range 1 to 99999

  • JOB_NAME - The job name, built from the qsub script filename, a period, and the digits of the job ID; this default may be overwritten by qsub -N

  • LOGNAME - The user’s login name from the passwd file

  • NHOSTS - The number of hosts in use by a parallel job

  • NQUEUES - The number of queues allocated for the job (always 1 for serial jobs)

  • NSLOTS - The number of queue slots in use by a parallel job

The above was extracted from this useful page. For more on SGE shell scripts, see here.
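
As a sketch of how a script might use these variables, the fragment below (filenames hypothetical) lets each task of an array job pick its own input and keep its output separate; the defaults let it also run outside SGE for testing:

```shell
#!/bin/bash
#$ -t 1-4    # array job; SGE runs this script once per task

# defaults so the sketch also runs outside SGE (names are hypothetical):
SGE_TASK_ID=${SGE_TASK_ID:-1}
JOB_ID=${JOB_ID:-0}

# each task derives its own input and output filenames
INFILE="input.${SGE_TASK_ID}.dat"
OUTFILE="results.${JOB_ID}.${SGE_TASK_ID}.out"
echo "task ${SGE_TASK_ID}: would process ${INFILE} -> ${OUTFILE}"
```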

For a sample SGE script that uses mpich2, see below

Where do I get more info on SGE?

Oracle's purchase of Sun has resulted in a major disorganization of the SGE (now OGE) documentation. If a link doesn’t work, it may be because of this kerfuffle. Tell me if a link doesn’t work anymore and I’ll try to fix it.

If you need to run an MPI parallel job, you can request the needed resources by Q as well as by specifying the resources inside the shell script (more on this later) or externally via the -q and -pe flags (type man sge_pe on one of the HPC nodes).

Special cases

Editing Huge Files

In a word, don’t. Many research domains generate or use multi-GB text files. Prime offenders are log files and High-Thruput Sequencing files such as those from Illumina. These are meant to be processed programmatically, not with an interactive editor. Most such editors typically try to load the entire file into memory and generate various cache files. (If you know of a text editor that handles such files without doing this, please let me know.)

Otherwise, use the following utilities, possibly in combination with Perl/Python, to peek into such files and/or change them:

  • head - dumps the 1st few lines of a file

  • tail - dumps the last few lines of a file

  • grep - searches for regular expressions

  • split - splits the file into smaller bits

  • less - a pager which allows you to page thru a text document

  • sed - a stream editor which allows you to replace one regex with another

  • tr - the translate utility which allows you to translate or delete character strings

grep especially is one of the most useful tools for text processing you’ll ever use.

For example, the following command starts at 2,000,000 lines into a file and stops at 2,500,000 lines and shows that range in the less pager.

$ perl -n -e 'print if ( 2000000 .. 2500000)' humongo.txt | less
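
For instance, the following sketch exercises several of the utilities above on a small stand-in file (the filenames are made up for the demo):

```shell
# make a small stand-in for a multi-GB file
seq 1 100000 > humongo.txt

head -5 humongo.txt                # the 1st 5 lines
tail -5 humongo.txt                # the last 5 lines
grep -c '9999' humongo.txt         # count lines matching a regex
split -l 20000 humongo.txt chunk_  # split into 20000-line pieces
ls chunk_*                         # chunk_aa chunk_ab .. chunk_ae
```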

In addition, please use the compression utilities gzip/gunzip, bzip2, zip, zcat, etc instead of the ark graphical utility on such files. ark apparently tries to store everything in RAM before dumping it.

NAMD scripts

namd is a molecular dynamics application that interfaces well with VMD. Both of these are available on HPC - see the output of the module avail command.

The qsub scripts to submit namd 2.7 jobs to the SGE Q’ing system are a bit tricky due to the way early namd is compiled - the specification of the worker nodes is provided by the charmrun executable and some complicated additional files supplied with the namd package. This means that namd2.7x is more complicated to set up and run than namd2.8x. The qsub scripts are provided separately below.

R on HPC

R is an object-oriented language for statistical computing, like SAS (see below). It is becoming increasingly popular among both academic and commercial users to the extent that it was noted in the New York Times in early 2009. For a very simple overview with links to other, better resources, see this link

There are multiple versions of R on HPC, and they do not all behave identically because of module requirements or simply due to time required to install. If you run across a situation where a library isn’t available, please let us know.

For most things, everything works identically. The things that don’t usually have to do with parallel processing in R and the underlying Message Passing Interface (MPI) technology. If a parallel library in R doesn’t work as expected, please let us know.

We also support RStudio on the HPC login node. You’ll need to module load your favorite R version and then type rstudio. It should pop up on your local screen as long as you’ve logged in with x2go or started an X11 server. See the connection section and the Graphics section to make sure you can view X11 graphics.

SAS 9.3 for Linux

We have a single node-locked license for SAS 9.3 on the login node. While the license is for that node only, as many instances of SAS can be run as there is RAM to support them.

To start SAS on the login node:

ssh -Y <Your_UCINETID>

# then change directories (cd) to where your data is
cd /dir/holding/data

# and start SAS
sas &

This will start an X11 SAS session, opening several windows on your monitor (as long as you have an active X11 server running). If you’re connecting from Mac or Windows, please see this link.

You can use the SAS program editor (one of the windows that opens automatically), or use any other editor you want and paste or import that code into SAS. The combination of emacs and ESS (Emacs Speaks Statistics) is a very powerful combination. It’s mostly targeted to the R language, but it also supports SAS and Stata.

Parallel jobs

HPC supports several MPI variants.


HPC provides mpich in 3 versions: mpich 1.2.7, mpich2 1.4.1, and mpich 3.0.4, in conjunction with a few compiler combinations. Please choose the best one via module avail.

  • To compile MPI programs, you’ll have to module load the correct MPICH/MPICH2 environment:

module load mpich2
  • you may need to create the file ~/.mpd.conf, as below:

# replace 'thisismysecretpassword' with something random.
# You won't have to remember it.
echo "MPD_SECRETWORD=thisismysecretpassword" > ~/.mpd.conf
chmod og-rw ~/.mpd.conf
  • your mpich2 qsub scripts have to include the 2 following lines in order to allow SGE to find the PATHs to executables and libraries

module load mpich2
export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID"

A full MPICH2 script is shown below. Note the #$ -pe mpich2 8 line which sets up the MPICH2 parallel environment for SGE and requests 8 slots (CPUs). (see above for more SGE script parameters)

#!/bin/bash
# good idea to be explicit about using /bin/bash (NOT /bin/sh).
# Some Linux distros symlink sh -> dash for a lighter weight
# shell, which works 99% of the time but causes unimaginable pain
# in those 1% of occasions.

# Note that SGE directives are prefixed by '#$' and plain comments are prefixed by '#'.
# Text after the '<-' should be removed before executing.

#$ -q long    <- the name of the Q you want to submit to
#$ -pe mpich2 8    <- load the mpich2 parallel env and ask for 8 slots
#$ -S /bin/bash    <- run the job under bash
#$ -M <email>      <- mail this address ..
#$ -m bea          <- .. when the script (b)egins, (e)nds, or (a)borts
#$ -N cells500     <- name of the job in the qstat output
#$ -o cells500.out <- name of the output file.
module load mpich2              <- load the mpich2 environment
export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID" <- this is REQUIRED for SGE to set it up.
module load neuron              <- load another env (specific for 'neuron')
export NRNHOME=/apps/neuron/7.0 <- ditto
cd /data/users/hmangala/newmodel      <- cd to this dir before executing
echo "calling mpiexec now"      <- some debugging text
mpiexec -np 8 nrniv -mpi -nobanner -nogui /data/users/hmangala/newmodel/model-2.1.hoc
# above, start the job with 'mpiexec -np 8', followed by the executable command.


HPC also supports OpenMPI versions 1.4.4, 1.6.0, 1.6.3, and 1.6.5, also in multiple compiler combinations. OpenMPI is more easily set up for runs than mpich, at least in the earlier versions. However, using them is fairly similar and the recent versions are very compatible.


MATLAB can be started from the login node by loading the appropriate module and typing matlab:

module load MATLAB

This will start the MATLAB Desktop on the login node which is fine to edit and check code but NOT to run computationally heavy jobs. If you need to do that, use qrsh to be moved to another machine and then use the above sequence to start MATLAB on the secondary node.

We have a few licenses for interactive MATLAB on the HPC cluster which are decremented from the campus MATLAB license pool. They are meant for running interactive, relatively short-term MATLAB jobs, typically less than a couple hours. If they go longer than that, or we see that you’ve launched several MATLAB jobs, they are liable to be killed off.

If you want to run long jobs using MATLAB code, the accepted practice is to compile your MATLAB .m code to a native executable using the MATLAB compiler mcc and then submit that code, along with your data to an SGE Q (see above for submitting batch jobs). This approach does not require a MATLAB license, so you can run as many instances of this compiled code for as long as you want without impacting the campus license pool.

The official mechanics of doing this is described here.

Some additional notes from someone who has done this is in the Appendix.

MATLAB license status

You can check the license status of the campus MATLAB pool with the following command (after you module load MATLAB):

$MATLAB/bin/glnxa64/lmutil lmstat -a -c

# Please include the above line in your qsub scripts if you're using MATLAB to make sure the license server is online.

# you can check more specifically by then grepping thru the output.
# For example to find the status of the Distributed Computing Toolbox licenses:

$MATLAB/bin/glnxa64/lmutil lmstat -a -c | grep Distrib_Computing_Toolbox

MATLAB Alternatives

There are a number of MATLAB alternatives, the most popular of which are available on HPC. Since these are Open Source, they aren’t limited in the number of simultaneous uses, altho you should always try to run batch jobs in the SGE queue if possible. See this doc for an overview of them and further links.


HPC has one node that contains 4 recent Nvidia GPUs. Please see this document for more information on the GPUs and how to use them.

Graphics

All the interactive nodes will have the full set of X11 graphical tools and libraries. However, since you’ll be running remotely, any application that requires OpenGL, while it will probably run, will run so slowly that you won’t want to run it for long. If you have an application that requires OpenGL, you’ll be much better off downloading the processed data to your own desktop and running the application locally.

If you connect using Linux

In order to have access to these X11 tools via Linux, your local Linux must have the X11 libraries available. Unless you have explicitly excluded them, all modern Linux distros include X11 runtime libraries. Don’t forget to use the -Y flag when you connect using ssh to tunnel the X11 display back to your machine:

ssh -Y

If you connect using MacOSX

MacOSX no longer supplies the previous X11 libraries and applications, so for modern Macs, you’ll have to install the (still free) XQuartz package by yourself. XQuartz is also required by the x2go package to view graphical applications remotely.

If you connect using Windows

There are quite a few ways to use a Linux system besides logging into it directly from the console.

  • remote shell access, using PuTTY, a free ssh client, which even allows X11 forwarding so that you can use it with Xming (below) to view graphical apps from HPC. PuTTY is a straight ssh terminal connection that allows you to securely connect to the Linux server and interact with it on a purely text-based basis. For shell/terminal cognoscenti, it’s considerably less capable than any of the terminal apps (konsole, eterm, gnome-terminal, etc) that come with Linux, but it’s fine for establishing the 1st connection to the Linux server. If you’re going to run anything that requires an X11 GUI, you’ll need to set PuTTY to do X11 forwarding. To enable this, double-click the PuTTY icon to bring up the PuTTY configuration window. In the left pane, follow the clickpath: Connection → SSH → X11 → check Enable X11 Forwarding. After setting this, click on Session at the top of the pane, set a name in Saved Sessions in the lower right pane, and click the [Save] button to save the connection information, so that the next time you need to connect, the correct settings will already be in place. You can customize PuTTY with a number of add-ons and config tweaks, some of which are described here.

  • x2go is a dramatic improvement on the NoMachine code (see below), in ease of installation performance, and features. You can download the clients for OSX, Windows, and Linux for free here. The server has been installed on the HPC login node and all you have to do is configure your client to connect to it. See this link for instructions to do so.

  • Xming, a lightweight and free X11 server (a client, in normal terminology). Xming provides only the X server, as opposed to Cygwin/X below. Xming provides the X server that displays the X11 GUI information that comes from the Linux machine. When started, it looks like it has done nothing, but it has started a hidden X11 window (note the Xming icon in the toolbar). When you start an X application on the Linux server (after logging in with PuTTY as described above), Xming will accept a connection from the Linux machine and display the X11 app as a single window that looks very much like a normal MS Windows window. You’ll be able to move it around, minimize it, maximize it and close it by clicking on the appropriate button in the title bar. There may be a slight lag in response in that window, but over the University network it should be acceptable.

  • if you have trouble setting up PuTTY and Xming, please see this page, which describes it in more detail, with screenshots

  • Cygwin/X, another free, but much larger and more capable X server (combined with an entire Linux-on-Windows implementation). Provides much more power and requires much more user configuration than Xming. Cygwin/X provides not only a free X server but nearly the entire Linux experience on Windows. This is more than what most normal users want (both in diskspace and configuration), especially if you have a real Linux server to use. The X11 server is very good tho, as you might expect.

  • VNC server and client. A decent way to connect to a server, but outclassed by the x2go system described above.

  • NoMachine Server and Clients, a system much like the VNC system but much more efficient and therefore better performing; better than VNC due to its compression routines. NoMachine still makes its client available for free but has closed its server source code, so it is no longer useful to HPC. The older source code has been forked and improved by the x2go group (above) and that is the solution we recommend now.

How to Manipulate Data on Linux

This is a topic for another document named Manipulating Data on Linux and the documents and sites referred to therein.

Frequently Asked Questions

OK, maybe not frequently, but cogently, and CAQ just doesn’t have the same ring. If you have other questions, please ask them. If they address a frequent theme, I’ll add them here. In any case, I’ll try to answer them.

What’s a node? Is it the same as a processor?

A node refers to a self-contained chassis that has its own power supply, motherboard (containing RAM, CPU, controllers, IO slots and devices (like ethernet ports), various wires and unidentifiable electrogrunge). It usually contains a disk, altho this is not necessary with boot-over-the-network. It’s not the same as a processor. Typical HPC nodes (from the Jurassic period) have 2-4 CPU cores per node. Modern nodes have 8 to >100 cores.

When I submit a .sh script with qsub, does the following line refer to 10 processors or 10 nodes or what?

#$ -pe openmpi 10

10 processor cores. Most modern physical CPUs (the thing that plugs into the motherboard socket) have multiple processor cores internally these days.

What about the call to mpiexec?

mpiexec -np 10 nrniv -mpi -nobanner -nogui modelbal.hoc

Same thing as above. That’s why they should be the same number.

Is it possible for the processors on one node to be working on different jobs?

Yes, altho the scheduler can be told to try to keep the jobs on 1 node (better for sharing memory objects like libs, but worse if there’s significant contention for other resources like disk & network IO). Most of the MPI environments on HPC are currently set to spread out the jobs rather than bunch them together on as few nodes as possible.

If CPU 1 (working on Job A) fails, does it bring down CPU 2 (working on Job B)?

No, and in fact it doesn’t typically work that way. A job does not run on a particular CPU; on a multi-core node, different threads of the same job can hop among CPU cores. The kernel allocates threads and processes to whatever resources it has to optimize the job.

Is the performance of processor 1 dependent on whether processor 2 is engaged in the same or different job?

It depends. The computational bits of a thread, when they are being executed on a CPU, don’t interfere much with the other processor. They do share memory, interrupts, and IO so if they’re doing roughly the same thing at roughly the same time, they’ll typically want to read and write at the same time and thus compete for those resources. That was the rationale for spreading out the MPI jobs rather than filling up nodes.

Is it possible for one processor to use more than its "share" of the memory available to the node?

i.e., is it wrong for me to count on having a certain amount of memory just because I’ve specified a certain number of processors (nodes?) for my job?

The CPU running prog1 will request the RAM that it needs independent of other CPUs running prog1 or prog2, prog3, etc. If the node gets close to running out of real RAM, it will start to swap idle (haven’t-been-accessed-recently) pages of RAM to the disk, freeing up more RAM for active programs. If the computer runs out of both RAM and swap, it will hopefully kill off the offending programs until it regains enough RAM to function, and then it will continue until it happens again. This is why you should try to estimate the amount of RAM your prog will use and indicate that to the scheduler with the -l mem_free directive. See the section above.

Why can I ssh to HPC but can’t scp files to it?

Probably because you edited your .bashrc (or .zrc or .tcshrc) to emit something useful when you log in. (Both scp and ssh have a useful option -v that puts them into verbose mode, which tells you much more about what the process is doing and why it fails.) You need to mask this output from non-interactive logins like scp and remote ssh execution by placing such commands inside a test for an interactive shell. When using bash, you would typically do something like this:

interactive=`echo $- | grep -c i`
if [ ${interactive} = 1 ] ; then
  # tell me what my 22 latest files are
  ls -lt | head -22
fi

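An equivalent, slightly more compact bash idiom for the same interactive-shell test:

```shell
# bash sets the 'i' flag in $- only for interactive shells
case $- in
  *i*)  # interactive shell: safe to print things
    ls -lt | head -22
    ;;
  *)    # non-interactive (scp, remote ssh command): stay silent
    ;;
esac
```
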
Where are the Perl/Python scripts that came with an application?

It’s often the case that an app is delivered with a number of scripts that make use of it in a particular way. If the application itself is written in that language and is delivered as a library that is supposed to be installed as part of the Python / Perl tree, we’ll install it directly into the Perl / Python libs (currently perl/5.16.2 or enthought_python/7.3.2).

If it’s a standalone script, which doesn’t require such integration, it’ll go in the app’s bin dir. In either case, the module should set up the paths so you can just call the script. For example, in the case of rseqc, if you module load rseqc, it will also module load enthought_python and set up all the paths:

$ module load rseqc

# is a script supplied with rseqc, but installed with enthought_python
$ which     # where is it installed?

# so it's installed in the enthought_python tree. If the scripts aren't automatically found,
# the module probably isn't written correctly, so let us know.

How do I install my own Python module?

Some modules are clearly not going to be used by most HPC users. For those Python modules and libs, we suggest that you install and maintain them locally. For most users, you’ll want to use the enthought_python module as a basis, so start from there and then use pip to install the package locally.

$ module load enthought_python
$ pip install --user PeachPy   # as an example
Downloading/unpacking PeachPy
  Running egg_info for package PeachPy

Installing collected packages: PeachPy
  Running install for PeachPy

Successfully installed PeachPy
Cleaning up...

This installs the module PeachPy into your local dir ~/.local/lib/python2.7/site-packages.
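
To see the exact per-user install directory for whichever Python is in your PATH, you can ask Python itself (shown with python3 here; the path will differ for the enthought_python 2.7 tree):

```shell
# show where 'pip install --user' puts packages for this Python;
# the exact path depends on the Python version in use
python3 -c 'import site; print(site.getusersitepackages())'
```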

NB: use pip instead of easy_install if there’s a choice. easy_install seems to be deprecated or at least is not as smooth and reversible as pip.

You might also use the package virtualenv to isolate your packages from the system versions.

Both pip and virtualenv are installed as part of the enthought_python module.

How do I write the shebang line so that the script is portable?

Many interpreted languages (Perl, Python, bash, Ruby, etc.) can be run like any other application by just making the script executable and naming it:

$ chmod +x /path/to/
$ --opt1=banana --scope=34 --infile=/path/to/my/file

This is accomplished by specifying the shebang line, the 1st line of the script that specifies the interpreter. It’s typically of the form:

#!/path/to/interpreter

... rest of script ...

This is usually the path to the system-supplied interpreter, which is generally fine for personal use, but on a cluster or for an app that is meant to be shared more widely, it can generate odd error messages if the system doesn’t have the interpreter in the expected place. Recent versions of bash (4.2.25, for example) will produce a useful error message if the interpreter is in the wrong place:

$ scut --opt1=this --opt2=that
bash: /home/hjm/bin/scut: /usr/local/bin/perl: bad interpreter: No such file or directory

The above error message diagrams the failure, like a traceback: you tried to execute scut, but it failed because the specified interpreter /usr/local/bin/perl didn’t exist.

The way to specify the shebang line portably is to use the env mechanism, which asks the environment what it knows about, rather than telling the system what to do and risking it not knowing.

#    so instead of telling the system to use a specific Perl
#!/usr/bin/perl
#    and risk it not being there, or conflicting with various libs that
#    the script needs that might be in a different installation..
#    you ask the environment to use the Perl it knows about
#!/usr/bin/env perl
#    so if you've 'module load'ed a different Perl, the environment
#    now knows about it and will direct the script to use it instead.
  1. You can’t use flags in an env shebang

The kernel only accepts one argument for #!/usr/bin/env [interpreter] so while #!/usr/bin/env perl is valid, additional parameters are not. Many coders use Perl’s -w flag to help debug their scripts and while you can specify it in the regular shebang, you will need to remove it in the env version.

Some workarounds: modify a calling bash script to prepend "perl -w" before calling your perl script if you want warnings, or modify your perl script internally to include: use warnings;

Where is my job running?

Use qstat.

$ qstat -u UCINETID
job-ID  prior   name       user    state submit/start at     queue             slots ja-task-ID
 978260 0.07021 ap1_fast   UCINETID   r  10/25/2013 16:09:13 cee@compute-4-5.local    1
 978262 0.07021 ap2_fast   UCINETID   r  10/25/2013 16:09:53 cee@compute-4-5.local    1
 978279 0.07021 ap3_fast   UCINETID   r  10/25/2013 16:10:43 cee@compute-4-5.local    1
 978281 0.07021 chm_rpt_fa UCINETID   r  10/25/2013 16:11:03 cee@compute-4-5.local    1
# your job is running on this node ------------------------------^^^^^^^^^^^^^^^^^

How do I tell how much RAM my application is using?

Use top. ssh to the node running your application (see above) and run top:

ssh -t compute-4-5 'top -M'

top will show you how much RAM the app is using and how much is available. The partial output below shows that there are multiple runs of Flexf running, the 1st one using 945MB of RAM (RES, for resident), which is 0.4% of the total RAM (252GB) on the machine - note the line Mem: 252.395G total. The VIRT (virtual) RAM use is the total of the RES, plus any shared memory, plus swapped mem, plus mapped memory from libraries. The other numbers to note are the used RAM (how much RAM is in use on the node) and the cached RAM. In the case below, the amount used (77.836G used) includes the amount cached (44.507G cached, the amount used for caching file IO), so the amount of RAM being actively used by applications and the OS is the difference (33GB). The node therefore has quite a lot of available RAM (220GB), more than the amount noted as free (174.559G).


top - 08:02:58 up 27 days, 11:45,  1 user,  load average: 16.00, 16.00, 15.99
Tasks: 1376 total,  17 running, 1359 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.0%us,  0.0%sy,  0.0%ni, 75.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   252.395G total,   77.836G used,  174.559G free,  220.191M buffers
Swap:   16.602G total,    0.000k used,   16.602G free,   44.507G cached

  959 aamelire  20   0 1423m 945m 2140 R 100.0  0.4  18283:44 Flexf
 5014 aamelire  20   0 2097m 1.0g 2144 R 100.0  0.4   1227:04 Flexf
 5448 aamelire  20   0 2050m 1.1g 2140 R 100.0  0.4   1227:00 Flexf
 5741 aamelire  20   0 1950m 843m 2140 R 100.0  0.3   1226:40 Flexf
 6218 aamelire  20   0 1924m 1.7g 2140 R 100.0  0.7  18257:42 Flexf
 7502 aamelire  20   0 5182m 4.4g 2140 R 100.0  1.7  18256:46 Flexf
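
For a quick one-shot check of a single process, rather than interactive top, ps reports the same resident/virtual numbers; here for the current shell:

```shell
# RSS (resident) and VSZ (virtual) sizes in KB, plus the command name,
# for the current shell ($$); substitute any PID from top's output
ps -o rss=,vsz=,comm= -p $$
```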


Cluster Computing

What is a cluster?

A compute cluster is typically composed of a pool of computers (aka nodes) that allow users (and there are usually several to several hundred simultaneous users) to spread compute jobs over them in a way that matches the maximum number of jobs to the number of computers. The cluster is often composed of specialized login nodes, compute nodes, storage nodes, and specialty nodes (ie: a large memory node, a GPU node, an FPGA node, a database server node, etc).

The HPC cluster consists of about 100 computers, each of which has 4-64 64bit CPU cores and 8-256GB RAM. All these nodes have a small amount of local disk storage (filesystems, or fs), directly attached to the node, that holds its Operating System, a few utilities and some scratch space (in /scratch). Some nodes have considerably larger local storage to provide more storage for a specific application or to the research group that bought it. All the nodes communicate with each other over a private 1 Gb/s ethernet network, via a few central switches. This means that each node can communicate at almost 100MB/s total bandwidth with all the other nodes, but there is a bottleneck at the switches and at frequently used nodes, such as the login node and the main storage nodes.

Additionally on HPC, most nodes also communicate over QDR Infiniband at about 4 GB/s, so traffic from our large filesystems to compute nodes is quite fast.

The difference between your HOME dir and gluster-based dirs

The main storage system for HPC was the /data filesystem, provided by the nas-1-1 node. The HOME filesystem is a 5.5TB RAID6. RAID6 means that it can lose 2 disks before it loses any data; however, if more than 2 disks are lost, ALL data will be lost. It has been supplemented by the BeeGFS filesystem, a distributed filesystem. On BeeGFS, the data is spread piecewise over 8 RAID6s on 4 different servers, each of which hosts 1/4 of the data, so even if a whole node is destroyed, 3/4 of the files will survive (but not necessarily entire files, since large files are striped across multiple arrays for better performance). That’s why we repeat the mantra: Back up your files if they are of value.

The Strongly Suggested approach is to put your code and small intermediate analyses on HOME and keep your large data and intermediate files on /dfsX if you can. In this way, you’ll be able to search thru your files quickly, but when you submit large jobs to the cluster via SGE, they won’t bog down the login node, nor will they interfere with other cluster jobs, since the /dfsX are distributed FSs. In other words, it scales well.

Some words about Big Data

To new users, especially to users who have never done BIG DATA work before: Understand what it is you’re trying to do and what that means to the system. Consider the size of your data, the pipes that you’re trying to force it thru and what analyses you’re trying to get it to perform.

It should not be necessary to posit this, but there are clearly users who don’t understand it. There is a 1000-fold difference between each of these:

  • 1,000 bytes, a KILOBYTE (KB) ~ an email

  • 1,000,000 bytes, a MEGABYTE (MB) ~ a PhD thesis

  • 1,000,000,000 bytes, a GIGABYTE (GB) ~ 30 X the 10 Volume The Story of Civilization.

  • 1,000,000,000,000 bytes, a TERABYTE (TB) ~ 1/10 of the text content of the Library of Congress.

  • 1,000,000,000,000,000 bytes, a PETABYTE (PB) ~ 100 X the text content of the Library of Congress

HPC has about 30TB of storage on /gl to be shared among 400 users, and the instantaneous needs of those users varies tremendously. We do not use disk quotas to enforce user limits to allow substantial dynamic storage use. However, if you use hundreds of GB, the onus is on you to clean up your files and decrease that usage as soon as you’re done with it.

1 Big File vs Zillions of Tiny Files

This subject - arcane as it might seem - is important enough to merit its own subsection. Because HPC is community infrastructure, efficient use of its resources is important. A tiny file requires almost the same amount of directory space as a large file, so if you have only 100 bytes to store, store it in a single file. However, the problems start compounding when there are many of them. Because of the way data is stored on disk, 10MB stored in ZOTfiles of 100 bytes each can easily take up NOT 10MB, but more than 400MB - 40 times more space. Worse, data stored in this manner makes many operations very slow - instead of looking up 1 directory entry, the OS has to look up 100,000. This means 100,000 times more disk head movement, with a concomitant decrease in performance and disk lifetime. If you are writing your own utilities, whether in Perl, C, Java, or Haskell, please use efficient data storage techniques: minimally, indexed file appending; preferably real data storage such as binary formats, HDF5 and netCDF. And don’t forget about in-memory data compression (for example, using the excellent free zlib library) or language-specific libraries that use compression, such as:

libio-compress-perl - bundle of IO::Compress modules
python-snappy - Python library for the snappy compression library from Google
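You can see the block-allocation overhead for yourself by comparing the on-disk footprint of the same data stored as many tiny files vs one file. A sketch (exact numbers depend on the filesystem's block size):

```shell
D=$(mktemp -d)
mkdir $D/zot
# 1000 files of 100 bytes each = 100KB of actual data
for i in $(seq 1 1000); do head -c 100 /dev/zero > $D/zot/f$i; done
cat $D/zot/f* > $D/one.file       # the same 100KB in a single file
du -sk $D/zot $D/one.file         # the ZOT dir consumes many times more blocks
rm -rf $D
```
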

If you are using someone else’s analytical tools and you find they are writing ZOTfiles, ask them, plead with them to fix this problem. Despite the sophistication of the routines that may be in the tools, it is a mark of a poor programmer to continue this practice.

Reducing your own ZOTfiles

Adam and I have written a utility that can help address this problem if you’re generating ZOTfiles. It can coordinate multiple writes into a single file from hundreds of processes via the use of file locking. It is described here in more detail, including a link to the utility.

HOWTO: Passwordless ssh

Passwordless ssh will allow you to ssh/scp to frequently used hosts without entering a passphrase each time. The process below works on Linux and Mac only. Windows clients can do it as well, but it’s a different procedure. However, regardless of your desktop machine, you can use passwordless ssh to log in to all the nodes of the HPC cluster once you’ve logged into the login node.

Note for HPC Parallel / MPICH2 Users

If you’re going to be running parallel jobs via some variant of MPI (MPICH, MPICH2, OpenMPI), or another parallel toolkit, you almost certainly will have to set this up on HPC so you (or your scripts) can passwordlessly ssh to other nodes. For HPC users running only serial programs it can still be useful, as it cuts down on the number of passwords you’ll have to type.

And it’s dead simple.

In a terminal on your Mac or Linux machine, type:

# for no passphrase, use
ssh-keygen -b 1024 -N ""

# if you want to use a passphrase:
ssh-keygen -b 1024 -N "your passphrase"
# but you probably /don't/ want a passphrase - else why would you be going thru this?

Save the keys to the default locations when prompted.

For the HPC cluster case: Since all cluster nodes share a common /home, all you have to do is copy the public key file (normally ~/.ssh/id_rsa.pub) to ~/.ssh/authorized_keys, or append it if that file already exists.
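On HPC itself that is a one-liner (a sketch, assuming an RSA key named id_rsa.pub; appending rather than overwriting preserves any keys already authorized):

```shell
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys    # ssh ignores keys in group/world-readable-writable files
```
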

For unrelated (non-cluster) hosts: Linux users, use the ssh-copy-id command, included as part of your ssh distribution. (Mac users will have to do it manually, as described just below.) ssh-copy-id does all the copying in one shot, using your ~/.ssh/id_rsa.pub key by default (use the -i option to specify another identity file, say ~/.ssh/id_dsa.pub, if you’re using DSA keys).

ssh-copy-id ${UCINETID}@hpc.oit.uci.edu
# you'll have to enter your password one last time to get it there.

What this does is scp your public key to the remote host (the ssh server you’re trying to log into) and append it to the remote file ~/.ssh/authorized_keys. If things don’t work, check that the file has been appended correctly.

Then verify that it’s worked by ssh’ing to HPC. You shouldn’t have to enter a password anymore.

If it does not work, check the permissions on the ~/.ssh dir and the files therein. In my case on the HPC side (where passwordless ssh works) my permissions are set to:

$ ls -ld ~/.ssh
drwx------ 2 hmangala staff 4096 Apr 20 09:08 /data/users/hmangala/.ssh

# the files inside:
ls -l ~/.ssh
total 92
# contains remote public keys
-rw------- 1 hmangala staff  2770 Apr 14 14:46 authorized_keys

# contains directives to ssh for local configs
-rw------- 1 hmangala staff    73 Jan  2  2013 config

# local private DSA key - MUST be set to private
-rw------- 1 hmangala staff   668 Jul 23  2013 id_dsa

#  local public DSA key - MUST be set to public read-all
-rw-r--r-- 1 hmangala staff   614 Jul 23  2013 id_dsa.pub

# ditto for RSA-based keys
-rw------- 1 hmangala staff   883 Oct 14  2013 id_rsa
-rw-r--r-- 1 hmangala staff   234 Oct 14  2013 id_rsa.pub

# contains the verified fingerprints of hosts to which you have connected
-rw-r--r-- 1 hmangala staff 23985 Aug  2 11:36 known_hosts
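If your permissions differ, the following sketch resets them to the values shown above (standard OpenSSH file names assumed; SSHDIR defaults to ~/.ssh, and files you don't have are silently skipped):

```shell
SSHDIR=${SSHDIR:-$HOME/.ssh}
chmod 700 "$SSHDIR"
chmod 600 "$SSHDIR"/authorized_keys "$SSHDIR"/config \
          "$SSHDIR"/id_dsa "$SSHDIR"/id_rsa         2>/dev/null
chmod 644 "$SSHDIR"/id_dsa.pub "$SSHDIR"/id_rsa.pub \
          "$SSHDIR"/known_hosts                     2>/dev/null
```
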

For Mac users, scp your public key to the remote host and append it to the remote ~/.ssh/authorized_keys. The commands are below; just modify the UCINETID value and mouse them into the Terminal window on your local Mac.

bash  # starts the bash shell just to make sure the rest of the commands work
cd    # makes sure you're in your local home dir
export UCINETID=""  # fill in the empty quotes with *your UCINETID*

# you'll need to enter the password manually for the next 2 commands

scp ~/.ssh/id_rsa.pub ${UCINETID}@hpc.oit.uci.edu:.ssh/
ssh ${UCINETID}@hpc.oit.uci.edu 'cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys'

# and now you should be able to ssh in without a password

First time challenge from ssh

If this is the 1st time you’re connecting to HPC from your Mac (or PC), you’ll get a challenge like this:

The authenticity of host ' (' can't be established.
RSA key fingerprint is 57:70:23:8e:e1:15:8c:51:b0:52:ca:c7:a8:e9:26:9b.
Are you sure you want to continue connecting (yes/no)?

and you have to type yes.

For MPI / Parallel users, you should set up a local ~/.ssh/config file to tell ssh to ignore such requests. The file should contain:

Host *
   StrictHostKeyChecking no

and must be chmod’ed to be readable only by you, i.e.:

chmod go-rw ~/.ssh/config

Notes on using the MATLAB compiler on the HPC cluster

(Thanks to Michael Vershinin and Fan Wang for their help and patience in debugging this procedure).

As noted, the official docs for compiling your MATLAB code are described here (note that many of the MATLAB links will require that you create a Mathworks account). Before you start hurling your .m code at the compiler, please read the following for some hints.

The following is a simple case where all the MATLAB code is in a single file, say test.m. Note that for the easiest path, you should write your MATLAB code to compile as a function. This means that the keyword function has to be used to define the MATLAB code (see example below). If you want to pass parameters to the function, you have to include a function parameter indicating this.

# Before you use any MATLAB utilities, you will have to load the
# MATLAB environment via the 'module' command

module load MATLAB/r2011b

# for a C file dependency, you compile it with 'mex'.  Note that mex doesn't like
# C++ style comments (//), so you'll have to change them to the C style /* comment */

mex some_C_code.c    # -> produces 'some_C_code.mexa64'

# then compile the MATLAB code for a standalone application.
# (type mcc -? for all mcc options)

# If the m-code has a C file dependency which has already been mex-compiled,
# mcc will detect the requirement and link the '.mexa64' file automatically.

mcc -m test.m  # -> 'test'  (can take a minute or more)

# !! if you have additional files that are dependencies, you may have to define
# !! them via the '-I /path/to/dir' flags to describe the dirs where your
# !! additional m code resides.

# for a _C_ shared lib (named libmymatlib.so) with multiple input .m files

mcc -B csharedlib:libmymatlib file1.m file2.m file3.m

# for a _C++_ shared lib (named libmymatlib.so) with multiple input .m files

mcc -B cpplib:libmymatlib file1.m file2.m file3.m

Passing variables to compiled MATLAB applications

Also, few programs will be useful with all the variables compiled statically. There are a few ways to pass variables to the program - the easiest for a single variable or a few variables is to use the MATLAB input function to read in a character, string, or vector and process it internally to provide the required variables.

Another way, especially if you have a large number of variables to pass, is to include the variables in a file and feed that file to the MATLAB app. This requires that the app be designed to read the file and parse it correctly.
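The file-driven pattern looks like this from the shell side (a sketch; the fake_app stub here merely stands in for your compiled MATLAB app, which would read and parse the file in MATLAB code):

```shell
# write the parameters to a file
echo "1:99" > params.txt
# a stub standing in for the compiled app, so the sketch is self-contained
printf '#!/bin/sh\necho "got params: $(cat "$1")"\n' > fake_app
chmod +x fake_app
./fake_app params.txt        # -> got params: 1:99
rm -f params.txt fake_app
```
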

Both are described in some detail in the official MATLAB documentation Passing Arguments to and from a Standalone Application.

More examples are described here, in the example function matlab_sim() and in the text following.

Files produced by the mcc compiler

In the standalone case, which will probably be the most popular approach on HPC, the mcc compilation will generate a number of files:

readme.txt  ...............  autogen'd description of the process
test   ....................  the 'semi-executable'
test.m  ...................  original 'm code'
test_main.c  ..............  C code wrapper for the converted m code
test_mcc_component_data.c .  m code translated into C code
run_test.sh  ..............  the script that wraps and runs the executable
test.prj  .................  XML description of the entire compilation
                               dependencies (Project file)

In order to run the executable to test it, you can run the auto-generated shell script run_test.sh. HOWEVER, to submit it to SGE, you should not write your qsub script to call run_test.sh. The fact that run_test.sh wraps the native executable shields it from SGE process control and can cause a lot of unexpected behavior. Instead, write your qsub script to call the native executable directly (you may have to inspect run_test.sh and copy some setup variables into the qsub script). Otherwise the shell wrapper will intercept the process control commands and usually misbehave.

So while you can test it for a few minutes like this on an interactive node:

./run_test.sh [matlab_root] [arguments]

# where the [matlab_root] would be '/data/apps/matlab/r2011b' for the
# matlab version that supports the compiler
# and [arguments] are inputs to the matlab function 'test' (separated by space
# if there are multiple input arguments).

For long/production runs, however, you have to run it via the scheduler. I.e., you will have to create a qsub script (call it, say, test_qsub.sh) like this:


#$ -S /bin/bash          # run with this shell

#$ -N comp_matlab_run    # this name shows in qstat
#$ -q free64             # run in this Q
#$ -l mem_free=2G        # need 2GB free RAM
#$ -cwd            # run the job out of the current directory;
                   # (the one from which you ran the script)
# be sure to load the MATLAB module, to define the PATHs to the
# various libs and resources that it needs.

module load MATLAB/r2014a

./test  [arguments]

and qsub it to SGE:

qsub test_qsub.sh    # use whatever name you gave your qsub script

MATLAB Compilation Example

Below is a very simple example showing how to compile and execute some MATLAB code. Save the following code to a file named average.m.

function y = average(x)
% AVERAGE Mean of vector elements.
% AVERAGE(X) is the mean of vector, where X is a vector of
% elements. Nonvector input results in an error.
[m,n] = size(x);
if (~((m == 1) | (n == 1)) | (m == 1 & n == 1))
    error('Input must be a vector')
end
y = sum(x)/length(x);      % Actual computation

Once the code is saved as average.m, compile it by copying and pasting the following into a terminal window.

module load MATLAB/r2011b   # load the MATLAB environment
mcc -m average.m;           # compile the code (takes many seconds)
z=1:99                      # assign the input vector to a shell variable
./average $z                # call the executable with the range (also very slow)
# or equivalently and more directly
./average 1:99

Note also that if you’re going to run this under SGE as multiple instances, each instance will have to run with the appropriate MATLAB environment so you will have to preface each exec with the module load MATLAB/r2011b directive.

Resolving Missing Libraries

Many of the problems we hear about are due to missing or incompatible library dependencies. A complicated program (like R) has many such dependencies:

$ ldd /apps/R/2.14.0/lib64/R/bin/exec/R
        linux-vdso.so.1 =>  (0x00007fff003fc000)
        libreadline.so.5 => /apps/readline/5.2/lib/libreadline.so.5 (0x00002b83c23ff000)
        [... a dozen more libraries resolved from /usr/lib64, /lib64, and /usr/NX/lib,
         plus one reported '=> not found' - for a library R does not actually use yet ...]
        /lib64/ld-linux-x86-64.so.2 (0x0000003fe7600000)

and each of them typically has more, so it’s fairly common for an update to break such dependency chains, if only due to a few missing or changed functions.
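You can run the same check yourself on any binary or shared library; a quick sketch:

```shell
# list unresolved dependencies; prints a fallback message if all of them resolve
ldd /bin/ls | grep 'not found' || echo "all dependencies resolved"
```
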

If you run into a problem that seems to be related to this, such as:

unable to load shared object '/apps/R/2.14.0/lib64/R/modules/': cannot open shared object file: No such file or directory

The above extract implies that the R module can’t find a shared library it needs to resolve its functions, so that library may be missing on the node that emitted the error.

If this error is emitted from a node during a batch job, it may be hard to tell which nodes are in error. To resolve this by yourself, it’s sometimes useful to use clusterfork (cf) to debug the problem.

In the above case, you would issue a command such as:

 cf --target=PERC 'module load R/2.14.0;  \
   ldd /apps/R/2.14.0/lib64/R/modules/ |grep found'

where the argument to ldd is the library in question. The results capture the STDERR and STDOUT from the single-quoted command in node-named files, in a subdir that begins with REMOTE_CMD- in the working directory. Examining those files usually identifies the offending nodes.
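Rather than opening each node file by hand, a grep over the output directory pinpoints the offenders. A self-contained sketch (the results dir, node name, and library name here are fabricated for the demo):

```shell
mkdir -p REMOTE_CMD-demo
# cf writes one file per node; fake one that reported the missing lib
echo "libdemo.so => not found" > REMOTE_CMD-demo/compute-1-13
grep -rl 'not found' REMOTE_CMD-demo    # lists the files (i.e. nodes) with the error
rm -rf REMOTE_CMD-demo
```
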

Please be careful in using cf since you can easily overwhelm the cluster if the command demands a lot of CPU or disk activity. Try the command on one node first to determine the effect and only issue the cf command after you’ve perfected it.

Release information & Latest version

The latest version of this document should always be available here. The asciidoc source is available here.

This document is released under the GNU Free Documentation License.