HPC

ZERO WASTE COMPUTING

The UCI HPC Cluster is a large collection of computing hardware (nodes) contributed by several schools/departments at UCI.

HPC is managed by the OIT Research Computing Support (RCS) group as a single large cluster, with the majority of compute nodes connected together via a fast Mellanox InfiniBand network.

When node owners are not using their compute nodes, the idle compute nodes (the cores) become part of the Free Queue system on HPC. The Free Queue system is a set of queues that all HPC users can access to harvest free compute cycles that would otherwise go to waste, since unused CPU cycles cannot be stored.

Note Compute nodes are always running, consuming electricity and cooling, and when they are not in use all those not-in-use CPU cycles go to waste!

The Free Queue is basically a scavenger system: it looks for idle, not-in-use nodes on the cluster and allows access to them while their owners are NOT using them.

All HPC users can submit jobs to the Free Queues, and Grid Engine's Functional FairShare policy takes care of evenly distributing jobs among the available idle cores. Users/departments who have contributed nodes on HPC receive a SIGNIFICANTLY higher priority than non-contributors to discourage FREE LOADERS.

How Does it Work?

The HPC cluster uses Son of Grid Engine (GE), customized to allow the sharing of resources (the nodes) on HPC.

When owners of compute nodes are NOT using their nodes, GE makes those nodes available to the Free Queue system. When an owner starts using their nodes, GE suspends the Free Queue jobs running there so that the owner's jobs can run. When the owner is done, GE automatically resumes the suspended Free Queue jobs.
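Submitting to a Free Queue is an ordinary Grid Engine submission; a minimal sketch, assuming a hypothetical job script named myjob.sh:

 # Submit a job to the free64 queue (the script name is just an example)
 $ qsub -q free64 myjob.sh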

When you submit a job to the free64 queue, for example, the job will first sit in the queue waiting for a node to run on:

 job-ID    name   user      state submit/start  queue               slots
 ------------------------------------------------------------------------
   2411    TEST   jfarran   qw    08/13/2012                          1

As soon as a node is found, the job state changes from qw (queue wait) to r (running), and the listing shows the node it found (compute-4-1):

   2411    TEST   jfarran   r     08/13/2012   free64@compute-4-1     1
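You can watch these state transitions with standard Grid Engine commands as well (the listings above may come from HPC's local q wrapper); a minimal sketch:

 # Show your own jobs and their current states (qw, r, S, ...)
 $ qstat -u $USER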

If the owner of compute-4-1 submits a job, your job will be suspended and the state will change from r (running) to S (suspended):

   2411    TEST   jfarran   S     08/13/2012   free64@compute-4-1     1

If you use HPC Checkpoint (highly recommended), the job WILL NOT SUSPEND; instead it will be moved to another idle node and keep on running. If a node is not immediately available, the job will go back into the queue and resume running when a node becomes available.
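The exact HPC Checkpoint submission syntax is documented separately; as a sketch using standard Grid Engine checkpointing options, where the environment name blcr is only an assumption (use a name returned by qconf -sckptl on HPC):

 # List the checkpoint environments configured on the cluster
 $ qconf -sckptl

 # Submit to free64 with checkpointing; "blcr" is an assumed environment
 # name -- substitute one returned by qconf -sckptl
 $ qsub -q free64 -ckpt blcr myjob.sh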

What Free Nodes are Available Now?

Use the q queue command on HPC:

 $ q
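If the local q wrapper is not available in your environment, standard Grid Engine commands give a similar (if less polished) view; a minimal sketch:

 # Cluster-wide summary of each queue: total, used and available slots
 $ qstat -g c

 # Per-host load, core count and memory
 $ qhost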

For more information on HPC queues, see the HPC Queues documentation.

What is the drawback of the Free Queues?

TIME!

Your job running on the free queue may run uninterrupted to completion, or it may get suspended & resumed one or more times. It all depends on the usage of the node owner. Since you are getting a free ride, you can’t complain.

On the other hand, if you use HPC Checkpoint, your job will NOT suspend but will keep on running on a different node, as long as a node is available to run on.

What are the benefits of the Free Queues?

FREE COMPUTE CYCLES!

Free compute cycles on some very expensive and fast compute nodes that would otherwise sit idle collecting dust. The HPC cluster has over 5,000 cores, and a lot of computing can be done while nodes are not in use.

What is the difference between the Public & Free Queues?

The Public queues guarantee you exclusive access and time for your job without any interruptions. When you submit a job to a public queue, once your job starts no other job will be able to suspend it.

A job running on the free queue can however be suspended and resumed multiple times.
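In Grid Engine terms, the difference is simply which queue you submit to; a minimal sketch, assuming a hypothetical job script myjob.sh (free64 and pub64 are the queue names used elsewhere on this page):

 # Public queue: once started, the job runs to completion without suspension
 $ qsub -q pub64 myjob.sh

 # Free queue: the job may be suspended/resumed if the node owner needs the node
 $ qsub -q free64 myjob.sh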

I am a node Owner, do I have to wait to use my node?

NO!

When you submit a job, it will start on your node almost immediately (a minute or less). Anyone else running on that node will either be suspended (their job goes to sleep) or, if their job is running with Checkpoint, it will be pulled OUT of your node and migrated to a different idle node.

Note The checkpoint process takes a few minutes (1-10 minutes) to migrate a job out of the node. The majority of the time it takes less than 2 minutes.

Q/A: Free-Loaders & Node Owners

Are you encouraging Free-Loaders?

Excellent Question!

We are encouraging NOT WASTING CPU CYCLES, which cannot be stored and saved for use at a future date.

Important CPU cycles are either CONSUMED right now or they simply go to WASTE. Period.

By Free-Loaders we mean users who have NOT contributed nodes to HPC but are using the HPC Free Queue system.

OK but you ARE encouraging Free-Loaders!

We cannot dictate responsible behavior by forcing everyone to buy a node or nodes on HPC. Also, some users and/or departments do not have the immediate resources to purchase a $12,000 node (or nodes) for their work. They need the computing resources in order to get their grant submitted, so that they can then acquire the needed resources if it is approved.

It has happened several times over the years: an HPC user gets critical computing done on HPC via the Free Queues and, as a direct result, is able to convince their PI/department to purchase nodes on HPC for their private work. Those node(s) then go back to the Free Queue whenever they are not in use, helping the next person in need.

This may seem surprising, but over the span of a year I have never seen a node owner use their nodes 100% of the time, 24/7. The best I have seen is 70-80% utilization over a year, with the average node owner using their nodes around 50% of the time. That extra 20-50% of cycles is what goes back to the Free Queue for everyone to use. On a 5,500-core cluster like HPC, that is a lot of spare, not-in-use CPU cycles that everyone can use to get some free computing done.

OK, that’s very nice and also very naive

We were not born yesterday. Some users will use any and all free cycles without ever contributing to HPC even if it’s helping their cause.

A goldfish will keep on eating until it explodes, just like some users will keep on free-loading until they destroy the very thing they rely on: the HPC Cluster. If everyone free-loads, then in time there will be NO HPC.

So in order to keep HPC alive and also to be as fair as possible to those who have contributed nodes on HPC, Free-Loaders are restricted in the following ways:

  • A maximum of 256 cores running on the free64 queue ( out of 5,500 possible cores )

  • A GE FairShare job priority of around 5% ( see the qstat sketch after this list )

  • At most 25% access on the public pub64 queue

  • Depending on HPC load, the above may be reduced even further in real time
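If you want to see how FairShare is affecting your own jobs, standard Grid Engine can show the ticket/priority breakdown; a minimal sketch (output columns vary by Grid Engine version):

 # Extended job listing, including functional-share tickets per job
 $ qstat -ext -u $USER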

How will the Free Queues help me as a node owner?

Since not everyone is using their private nodes at any given point in time, there are almost always some cores that are idle. In real practice on HPC, at any given time around 50% of cores are not in use by their owners.

You have the advantage of not only running on all of your own private cores, but also running on other idle cores through the Free Queue system.

Note It is very common on HPC to be able to run with more cores than you own, simply because not everyone is 100% active at any given point in time.

Will I get my fair share of cores on the Free Queues as a node owner?

Yes. HPC Grid Engine is configured with the Functional FairShare scheduler so that everyone waiting to use the Free Queues gets equal access. So as a node owner, you can easily get more cores than you purchased.

Free-Loaders get significantly lower priority on the Free Queues.

Any other benefits as a node owner?

Aside from the obvious benefits, such as the large amount of software installed on HPC, the support, and the management of the cluster, some users have had success when writing grants by showing how computing resources are maximized and not wasted on the HPC cluster, thanks to the Free Queue system and Checkpoint.

Note Many funding agencies want to make sure that every penny they dish out is not wasted. HPC has this capability.

HPC is the only cluster on campus with the infrastructure in place to strive for ZERO WASTE COMPUTING. With the Free Queue system and the HPC Checkpoint facility, HPC efficiency is at a premium (minimal waste).

Why don’t other clusters have similar setup?

The short answer is that it is highly complicated and very time consuming to set up things like Checkpoint, and it takes considerable knowledge and experience to configure a cluster scheduler to maximize the resources of a cluster. Grid Engine on HPC has been heavily modified, and several home-grown scripts have been created to make it all work.

Joseph Farran, the senior HPC Architect who invented the HPC Free-Queue system and Checkpoint configuration, has 30+ years of experience as a System Administrator on various operating systems, including supercomputers. Of those 30+ years, 15 have been solely dedicated to cluster computing at UCI, starting with a 16-core cluster (MPC) in 2000 and continuing to today's HPC Cluster of almost 6,000 cores.