NVIDIA GPU Computing Documentation

The GPU Computing SDK includes 100+ code samples, utilities, whitepapers, and additional documentation to help you get started developing, porting, and optimizing your applications for the CUDA architecture. You can get quick access to many of the SDK resources on this page, or download the complete SDK.

Please note that you may need to install the latest NVIDIA drivers and CUDA Toolkit to compile and run the code samples.


CUDA Getting Started Guide (Windows) 

This guide will show you how to install and check the correct operation of the CUDA development tools in Windows.


CUDA Getting Started Guide (Linux) 

This guide will show you how to install and check the correct operation of the CUDA development tools in Linux.


CUDA Getting Started Guide (Mac OS X) 

This guide will show you how to install and check the correct operation of the CUDA development tools in Mac OS X.


Getting Started with CUDA SDK samples 

This guide covers the introductory CUDA SDK samples that beginning CUDA developers should review before developing their own projects.


SDK Code Sample Guide: New Features in CUDA Toolkit 4.2

This guide covers what is new in CUDA Toolkit 4.2 and the new code samples that are part of the CUDA SDK 4.2.


CUDA C Programming Guide 

This is a detailed programming guide for CUDA C developers.


CUDA C Best Practices Guide 

This is a manual to help developers obtain the best performance from the NVIDIA CUDA Architecture. It presents established optimization techniques and explains coding metaphors and idioms that can greatly simplify programming for the CUDA architecture.


CUDA Occupancy Calculator 

The CUDA Occupancy Calculator allows you to compute the multiprocessor occupancy of a GPU by a given CUDA kernel. This tool provides guidance for choosing the kernel launch configuration that gives the best possible occupancy for the GPU.
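
As a rough illustration of the arithmetic such a calculator performs, occupancy is the number of warps resident on a multiprocessor divided by the maximum number it can hold. The sketch below is not NVIDIA's tool: `SmLimits` and `estimate_occupancy` are hypothetical names, the default limits are the published per-multiprocessor limits for compute capability 2.0 (Fermi), and register and shared-memory allocation granularity is deliberately ignored.

```cpp
#include <algorithm>
#include <cassert>

// Per-multiprocessor resource limits. The defaults are the limits for
// compute capability 2.0 (Fermi); supply other values for other GPUs.
struct SmLimits {
    int warp_size = 32;
    int max_warps = 48;            // 1536 resident threads / 32 threads per warp
    int max_blocks = 8;            // resident blocks per multiprocessor
    int registers = 32768;         // 32-bit registers per multiprocessor
    int shared_mem_bytes = 49152;  // 48 KB shared-memory configuration
};

// Hypothetical helper: theoretical occupancy in [0, 1] for one kernel launch
// configuration. Simplification (an assumption of this sketch): allocation
// granularity for registers and shared memory is not modeled.
double estimate_occupancy(int threads_per_block,
                          int regs_per_thread,
                          int smem_per_block,
                          const SmLimits& sm = SmLimits{}) {
    assert(threads_per_block > 0 && regs_per_thread >= 0 && smem_per_block >= 0);

    // Threads are scheduled in whole warps, so round up.
    const int warps_per_block =
        (threads_per_block + sm.warp_size - 1) / sm.warp_size;

    // Resident blocks are capped by each resource in turn.
    int blocks = std::min(sm.max_blocks, sm.max_warps / warps_per_block);
    if (regs_per_thread > 0) {
        const int regs_per_block =
            regs_per_thread * warps_per_block * sm.warp_size;
        blocks = std::min(blocks, sm.registers / regs_per_block);
    }
    if (smem_per_block > 0) {
        blocks = std::min(blocks, sm.shared_mem_bytes / smem_per_block);
    }

    // Occupancy = resident warps / maximum resident warps.
    return static_cast<double>(blocks * warps_per_block) / sm.max_warps;
}
```

With the default limits, `estimate_occupancy(256, 32, 4096)` evaluates to roughly 0.67: the register file caps each multiprocessor at 4 blocks of 8 warps, which is 32 of a possible 48 warps. Dropping to 20 registers per thread allows 6 blocks and full occupancy.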


CUDA Developer Guide for Optimus Platforms 

This document provides guidance to CUDA developers and explains how NVIDIA CUDA APIs can be used to query for GPU capabilities in Optimus systems. It is strongly recommended to follow these guidelines to ensure CUDA applications are compatible with all notebooks featuring Optimus.


OpenCL Programming Guide 

This is a detailed programming guide for OpenCL developers.


OpenCL Best Practices Guide 

This is a manual to help developers obtain the best performance from OpenCL.


OpenCL Overview for the CUDA Architecture 

This whitepaper summarizes the guidelines for choosing the best implementations for NVIDIA GPUs.


OpenCL Implementation Notes 

This document describes the "Implementation Defined" behavior for the NVIDIA OpenCL implementation, as required by version 1.0 of the OpenCL specification. The implementation-defined behavior is listed in the order in which it is referenced in the OpenCL specification and is grouped by the section number of the specification.


CUDA API Reference Manual (PDF) 

This is the CUDA Runtime and Driver API reference manual in PDF format.


CUDA API Reference Manual (CHM) 

This is the CUDA Runtime and Driver API reference manual in CHM format (Microsoft Compiled HTML help).


The CUDA Compiler Driver (NVCC) 

The CUDA compiler driver, nvcc, compiles CUDA source files. Compilation involves several steps, some of which are subtly different for different modes of CUDA compilation (such as generation of device code repositories). The purpose of nvcc is to hide these intricate details of CUDA compilation from developers.


PTX: Parallel Thread Execution ISA Version 3.0 

This document describes PTX, a low-level parallel thread execution virtual machine and instruction set architecture (ISA). PTX exposes the GPU as a data-parallel computing device.


Compute Command Line Profiler User Guide 

The Compute Command Line Profiler is a command-line profiling tool that can be used to measure performance and find potential opportunities for CUDA and OpenCL optimizations, to achieve maximum performance from NVIDIA GPUs. The Compute Command Line Profiler provides metrics in the form of plots and counter values presented in tables and as graphs. It tracks events with hardware counters on signals in the chip; this is explained in detail in the chapter entitled "Compute Command Line Profiler Counters."


CUDA Fermi Compatibility Guide 

The Fermi Compatibility Guide for CUDA Applications is intended to help developers ensure that their NVIDIA CUDA applications will run effectively on GPUs based on the NVIDIA Fermi Architecture. This document provides guidance to developers who are already familiar with programming in CUDA C/C++ and want to make sure that their software applications are compatible with Fermi.


CUDA Fermi Tuning Guide 

This guide provides an overview of how to tune applications for Fermi to further increase their speedups. More details are available in the CUDA C Programming Guide (version 3.2 and later), as noted throughout the document.


CUBLAS Library User Guide 

The CUBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA runtime. It allows the user to access the computational resources of an NVIDIA Graphics Processing Unit (GPU), but does not auto-parallelize across multiple GPUs.


CUFFT Library User Guide 

This document describes CUFFT, the NVIDIA CUDA Fast Fourier Transform (FFT) library. The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets, and it is one of the most important and widely used numerical algorithms, with applications that include computational physics and general signal processing. The CUFFT library provides a simple interface for computing parallel FFTs on an NVIDIA GPU, which allows users to leverage the floating-point power and parallelism of the GPU without having to develop a custom, GPU-based FFT implementation.
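
To make the divide-and-conquer idea concrete, here is a minimal host-side sketch of a recursive radix-2 FFT. It does not use CUFFT and says nothing about how CUFFT is implemented; `fft_radix2` is a hypothetical helper name, and the sketch assumes the input length is a power of two.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical helper: forward DFT of a power-of-two-length sequence,
// computed by splitting it into even- and odd-indexed halves.
std::vector<Complex> fft_radix2(const std::vector<Complex>& x) {
    const std::size_t n = x.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // length must be a power of two
    if (n == 1) {
        return x;  // a single sample is its own transform
    }

    // Divide: separate even- and odd-indexed samples.
    std::vector<Complex> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }

    // Conquer: transform each half recursively.
    const std::vector<Complex> e = fft_radix2(even);
    const std::vector<Complex> o = fft_radix2(odd);

    // Combine: apply the twiddle factors exp(-2*pi*i*k/n).
    const double pi = std::acos(-1.0);
    std::vector<Complex> out(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const Complex t = std::polar(1.0, -2.0 * pi * k / n) * o[k];
        out[k] = e[k] + t;
        out[k + n / 2] = e[k] - t;
    }
    return out;
}
```

Each level of recursion does work proportional to n and there are log2(n) levels, which is where the efficiency over a direct O(n^2) transform comes from. As a quick check, the transform of the impulse {1, 0, 0, 0} is four ones, and the transform of {1, 1, 1, 1} is {4, 0, 0, 0} up to rounding.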


CUSPARSE Library User Guide 

The NVIDIA CUDA CUSPARSE library contains a set of basic linear algebra subroutines used for handling sparse matrices and is designed to be called from C or C++. These subroutines can be classified in four categories.


CURAND Library User Guide 

The NVIDIA CURAND library provides facilities that focus on the simple and efficient generation of high-quality pseudorandom and quasirandom numbers.


NVIDIA Performance Primitives (NPP) Library User Guide 

NVIDIA NPP is a library of functions for performing CUDA-accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NPP will evolve over time to encompass more of the compute-heavy tasks in a variety of problem domains. The NPP library is written to maximize flexibility, while maintaining high performance.


CUDA Profiler Tools SDK Interface (CUPTI) User Guide 

The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing tools that target CUDA applications. CUPTI provides four APIs: the Activity API, the Callback API, the Event API, and the Metric API. Using these APIs, you can develop profiling tools that give insight into the CPU and GPU behavior of CUDA applications. CUPTI is delivered as a dynamic library on all platforms supported by CUDA.


CUDA Profiler Tools SDK Interface Release Notes 

The CUDA Profiler Tools Interface Release Notes.


Thrust Quick Start Guide 

Thrust is a C++ template library for CUDA based on the Standard Template Library (STL). Thrust allows you to implement high performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with CUDA C.


NVIDIA CUDA H.264 Video Encoder Library User Guide 

The NVIDIA CUDA H.264 Video Encoder is a library for performing CUDA-accelerated video encoding. The functionality in the library takes raw YUV frames as input and generates NAL packets. This encoder supports various profiles up to High Profile @ Level 4.1.


NVIDIA CUDA Video Decoder Library User Guide  

The CUDA Video Decoder API gives developers access to hardware video decoding capabilities on NVIDIA GPUs. The actual hardware decode can run on either the Video Processor (VP) or CUDA hardware, depending on the hardware capabilities and the codecs. This API supports the following video stream formats for Linux and Windows platforms: MPEG-2, VC-1, and H.264 (AVCHD).


CUDA C SDK Release Notes 

CUDA C SDK Release Notes.


OpenCL SDK Release Notes 

OpenCL SDK Release Notes.


GPU Computing SDK End User License Agreement 

This is the Software License Agreement for developers or licensees.

Last Update: 1/10/2012