Important: New instructions for using HPC GPU Nodes

The normal and correct way to use GPU nodes with Grid Engine is through an SGE consumable, which lets you request one, two, or as many GPU cards as a given node has. That way, you and other users on the same node never collide by using the same GPU card.

This is done with an SGE consumable called RSMAP. Unfortunately, the RSMAP consumable is not available in the free SGE that HPC is using. It is only available in the commercial SGE from Univa, which is very expensive and which we cannot afford.

There is a workaround that avoids RSMAP consumables, BUT it requires a LOT of work to redo the HPC SGE prolog and epilog scripts, as our current scripts are very complex in order to support HPC Checkpoint and Restart.

Another reason for not working on HPC GPU SGE is that HPC3 will be replacing HPC, and HPC3 will probably use a very different scheduler called SLURM instead of SGE. So there is no sense in spending time and effort redoing HPC SGE, since both HPC and SGE will be going away ("adios") in the not too distant future.

So with all that said, we are going to use GPU nodes in their simplest form: you request a GPU node just like any other node on HPC, and you can then use one or all of the GPU cards on that node. Yes, there will be waste if only one GPU card is used on a given node, so try to use as many GPU cards as your program allows.
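As a rough illustration of the simple approach above, the following Python sketch builds the text of an SGE job script that requests a GPU queue and limits the program to a chosen set of GPU cards. It is only a sketch under assumptions: the helper name `gpu_job_script`, the job name, and the program `./my_gpu_program` are hypothetical, and the GPU card indices are whatever you decide to use. The `-N` and `-q` directives are standard `qsub` options and `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable; any further directives that HPC may require are not shown here.

```python
def gpu_job_script(queue, command, gpu_ids, job_name="gpu_job"):
    """Return the text of an SGE job script for a GPU queue.

    queue    -- GPU queue name, e.g. "gpu2"
    command  -- the program to run (hypothetical in the example below)
    gpu_ids  -- indices of the GPU cards the program should use
    job_name -- illustrative job name, not an HPC convention
    """
    if not gpu_ids:
        raise ValueError("request at least one GPU card")

    # De-duplicate and sort the card indices, then join them the way
    # CUDA_VISIBLE_DEVICES expects: a comma-separated list.
    devices = ",".join(str(i) for i in sorted(set(gpu_ids)))

    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",   # job name
        f"#$ -q {queue}",      # request the GPU queue like any other queue
        f"export CUDA_VISIBLE_DEVICES={devices}",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: use two GPU cards on a node in the gpu2 queue.
    print(gpu_job_script("gpu2", "./my_gpu_program", [0, 1]))
```

The printed text can be saved to a file and submitted with `qsub`. Listing more card indices in `gpu_ids` is one way to follow the advice above and use as many GPU cards as the program allows.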

Note: HPC Checkpoint does NOT work with GPU nodes. You can use HPC Restart, however.

Here is a table of the available GPU queues on the cluster:

Queue Name    Suspend-able?   Count   GPU Card Type
gpu           No              4       Tesla M2090
gpu2          No              4       Tesla K80
gpu1080       Yes             3       GeForce GTX 1080
free32i-gpu   Yes             2       Tesla K20c

Joseph Farran <jfarran@uci.edu>