PGI Compilers:

The Portland Group (PGI) is a commercial supplier of high-performance scalar and parallel compilers and tools for workstations, servers, and high-performance computing systems.

On the HPC cluster we have the complete license suite for the PGI compilers, including the PGI Accelerator compilers.

To use the PGI compilers, load them with the module command. Example:

$ module load pgi/12.6

PGI compiler version 12.6 is now in your path, and you can run pgcc, for example, to compile. The manual (man) pages are also available (man pgcc).
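A quick way to confirm that the module load worked is to check that the compiler resolves on your PATH. The following is a minimal sketch, assuming a POSIX shell; the helper name have_compiler is hypothetical and is not part of PGI or the module system.

```shell
# have_compiler NAME: succeed if NAME resolves to a command on the current PATH.
# (Hypothetical helper, for illustration only.)
have_compiler() {
    command -v "$1" >/dev/null 2>&1
}

if have_compiler pgcc; then
    echo "pgcc found at: $(command -v pgcc)"
else
    echo "pgcc not found; run 'module load pgi/12.6' first"
fi
```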

Here is a short table of the various PGI software available:
Command          Description
pgcc             ANSI and K&R C compiler
pgCC             C++ compiler
pgfortran        Alias for pgf90 and pgf95
pgf90, pgf95     Fortran 90/95 compiler
pgf77            Fortran 77 compiler
pghpf            High Performance Fortran compiler
pgprof           Performance analysis tool
pgdbg            Graphical F77, F90, C, C++ debugger
pgcollect        Performance data collection tool

For documentation beyond the man pages on the HPC cluster, see the online PGI documentation.

Compiler Optimization:

When compiling, you can gain significant speedup in your binary executable by using the "-fast" optimization flag. Example:

pgcc -fast mycode.c
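To see what "-fast" buys for your own code, one option is to build the program both ways and time each executable. The following is only a sketch, assuming bash (for its built-in time keyword), a single-file C program, and a program that runs without arguments; the function name compare_fast and the output file names are illustrative.

```shell
# compare_fast SRC: build SRC with and without -fast, then time both builds.
# (Hypothetical helper; assumes SRC is a single C file that needs no arguments.)
compare_fast() {
    src=$1
    base=$(basename "$src" .c)                    # mycode.c -> mycode
    pgcc -o "$base.default" "$src" || return 1    # no optimization flags
    pgcc -fast -o "$base.fast" "$src" || return 1 # same source, with -fast
    echo "== without -fast =="
    time "./$base.default"
    echo "== with -fast =="
    time "./$base.fast"
}
```

After loading the pgi module, you would call it as: compare_fast mycode.c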

You can gain additional speedup by compiling for the CPU architecture your code will run on. If you are going to run on one of the AMD 64-core nodes, use the "-tp bulldozer" flag. For example:

pgcc -fast -tp bulldozer mycode.c

By default, the PGI compilers build the binary executable for the CPU architecture of the node on which the code is compiled, so you can omit the "-tp bulldozer" flag if you are compiling on one of the 64-core nodes on HPC.
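The choice between a target-specific build and a default build can be captured in a small wrapper. The following is a sketch, assuming a POSIX shell; the helper name pgi_cmd and the executable naming scheme are hypothetical. It only prints the command line, so you can inspect it before running it.

```shell
# pgi_cmd SRC [TARGET]: print a pgcc command line for SRC.
# With TARGET (e.g. bulldozer) the -tp flag is added and the executable name
# records the target; without it, pgcc targets the CPU of the compiling node.
# (Hypothetical helper, for illustration only.)
pgi_cmd() {
    src=$1
    target=${2:-}
    base=$(basename "$src" .c)
    if [ -n "$target" ]; then
        echo "pgcc -fast -tp $target -o $base.$target $src"
    else
        echo "pgcc -fast -o $base $src"
    fi
}

pgi_cmd mycode.c bulldozer   # prints: pgcc -fast -tp bulldozer -o mycode.bulldozer mycode.c
pgi_cmd mycode.c             # prints: pgcc -fast -o mycode mycode.c
```

Putting the target in the executable name is just one convention for keeping track of which binaries were built for which node type.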

With PGI you can find out the default target CPU architecture of any node by running the compiler with the "-V" flag on that node. For example, on the compute-1-7 node:

$ hostname
$ pgcc -V

pgcc 12.6-0 64-bit target on x86-64 Linux -tp bulldozer
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2012, STMicroelectronics, Inc.  All Rights Reserved.

This output shows that the compilation target for this node's CPU architecture is "-tp bulldozer". Compiling for the specific architecture may gain you roughly 5-10% additional speedup in your executable, or a lot more; it depends on many factors. The disadvantage is that your executable can then run only on an AMD 64-core node.
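If you want to pick up the target name in a script rather than by eye, the value after "-tp" can be pulled out of the version line. The following is a minimal sketch, assuming a POSIX shell with awk and assuming the version line has the format shown in the sample output; the helper name tp_from_version is hypothetical.

```shell
# tp_from_version: read `pgcc -V` output on stdin and print the word that
# follows "-tp" on the first line containing it.
# (Hypothetical helper, for illustration only.)
tp_from_version() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "-tp") { print $(i + 1); exit } }'
}

# Sample line copied from the pgcc -V output shown earlier; on a node you
# would instead run:  pgcc -V 2>&1 | tp_from_version
echo "pgcc 12.6-0 64-bit target on x86-64 Linux -tp bulldozer" | tp_from_version
# prints: bulldozer
```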

Other optimization flags to consider include:

-fast -Mipa=fast,inline -Mfprelaxed

To see a help summary of the optimization flags:

pgcc -help -fast