Installing and Updating GTX 1080 Ti Drivers / CUDA on Ubuntu

I recently had to set up a new Ubuntu 16.04 machine with NVIDIA's new GTX 1080 Ti graphics card for use with CUDA-enabled machine learning libraries such as TensorFlow and PyTorch. Since the card (as of this writing) is relatively new, the process was pretty involved. The same tricks should also work for the newer Titan Xp graphics card.

Edits
(02/01/2019):
I've updated the install instructions to use driver version 410 (necessary for CUDA 10, but should retain backwards compatibility with older CUDA versions).

(1/27/2018):
Tensorflow 1.5.0 and PyTorch 0.3 now have pre-built binaries for CUDA 9. If you install CUDA 9, the driver version that comes with it should be fully compatible with the 1080 Ti. You can easily install CUDA 9 on most Linux distributions with your package manager (see here for details).

If you want to use CUDA 8 for some reason (e.g. using an older Tensorflow), read on...

(5/10/2017):
Looks like driver version 381 is out of beta and on the PPA, so I've updated the recommended driver versions and install instructions accordingly.

1. Install CUDA without the driver

I couldn't just install CUDA and have it work, since certain CUDA versions (e.g., 8.0) come with a driver version (in the case of CUDA 8.0, driver version 375.26) that doesn't support the GTX 1080 Ti and other newer cards. As a result, installing CUDA with apt-get doesn't work, since it installs this driver version. Instead, you have to install with the runfile, which lets you opt out of installing the driver.

When running the installer, make sure to not install the driver that comes with CUDA. We'll install the driver with apt-get in the next step.
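If you'd rather skip the interactive prompts, here's a rough sketch of a non-interactive run that installs only the toolkit and samples. The --silent, --toolkit and --samples flags are the ones I know from the CUDA 8-era runfile, so check your installer's --help output before relying on them. The helper name, the CUDA_RUNFILE variable and the ./cuda_linux.run fallback are made up for illustration. The snippet only prints the command, so you can look it over before running it yourself.

```shell
# Build the installer command for a given runfile; nothing is executed here.
# Leaving out --driver is what keeps the bundled driver from being installed.
cuda_toolkit_only_cmd() {
    runfile=$1
    printf 'sudo sh %s --silent --toolkit --samples\n' "$runfile"
}

# CUDA_RUNFILE is a hypothetical variable; point it at the file you downloaded.
cuda_toolkit_only_cmd "${CUDA_RUNFILE:-./cuda_linux.run}"
```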

Post Install Notes (Thanks to Jake Boggan for mentioning this in the comments): After installing, check that the CUDA folders are where you expect them to be (usually /usr/local). The CUDA installer creates a symlink at /usr/local/cuda that automatically points to the version of CUDA installed.

Make sure to add /usr/local/cuda/bin to your $PATH. Also add /usr/local/cuda/lib64 to your $LD_LIBRARY_PATH if you're on a 64-bit machine, or /usr/local/cuda/lib if you're on a 32-bit machine. There's a bit more info in the CUDA docs, but the paths will likely differ based on version, so be sure to manually verify that the folders you're adding to the environment variables exist.
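Here's a minimal sketch of that setup for a 64-bit machine, with the "verify that the folder exists" check built in. prepend_if_dir is a helper I made up for this example, not something that ships with CUDA. It skips directories that don't exist and avoids adding the same entry twice.

```shell
# Hypothetical helper: prepend a directory to a colon-separated path variable,
# but only if the directory exists and is not already listed.
prepend_if_dir() {
    var_name=$1
    dir=$2
    if [ ! -d "$dir" ]; then
        echo "skipping $dir (directory not found)" >&2
        return 0
    fi
    eval "current=\${$var_name:-}"
    case ":$current:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) eval "export $var_name=\"\$dir\${current:+:\$current}\"" ;;
    esac
}

# Paths assume the default /usr/local/cuda symlink on a 64-bit machine.
prepend_if_dir PATH /usr/local/cuda/bin
prepend_if_dir LD_LIBRARY_PATH /usr/local/cuda/lib64
```

To make this stick across sessions, you could put it in your shell startup file (e.g. ~/.bashrc).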

2. Installing the driver with apt-get

To install the driver with apt-get, I used the Ubuntu graphics-drivers PPA. This method isn't officially supported by NVIDIA, but it seems to work well for many people.

At the graphics-drivers PPA homepage, there's a listing of the various graphics drivers that they offer; check the NVIDIA download website to figure out what version of the driver you need for your card. If it's in the PPA, great! If not, you unfortunately have to wait for them to add it. They're pretty timely, though.

Add the PPA to apt-get and update the index by running:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
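Once the index is updated, one way to see which driver versions you can actually install is to search the package names. This is a sketch that assumes the drivers follow the nvidia-NNN naming used in this post. latest_nvidia_package is a helper name I made up.

```shell
# Pick the highest-numbered nvidia-NNN package name from a list on stdin.
# Names that don't match the pattern (e.g. nvidia-settings) are ignored.
latest_nvidia_package() {
    sed -n '/^nvidia-[0-9][0-9]*$/p' | sort -t- -k2,2n | tail -n 1
}

if command -v apt-cache >/dev/null 2>&1; then
    # apt-cache search prints "name - description"; keep only the name column.
    apt-cache search --names-only '^nvidia-[0-9]+$' | awk '{print $1}' | latest_nvidia_package
fi
```

The highest number isn't automatically the right choice, so still compare it against the version NVIDIA lists for your card.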

Now, we use it to install the desired driver version (major version 410 as of this writing):

sudo apt-get install nvidia-410

Reboot your computer, and the GPU should run on the new driver. To verify, run nvidia-smi and confirm that the Driver Version at the top of the output is what you expect and that the rest of the information looks good.
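If you'd rather script that check than eyeball the table, here's a small sketch that uses nvidia-smi's query mode. The expected_major value and the driver_major helper are assumptions for this example; set the number to whatever you installed.

```shell
# Extract the major version from a driver version string such as 410.48.
driver_major() {
    printf '%s\n' "${1%%.*}"
}

expected_major=410  # assumption: the major version installed above

if command -v nvidia-smi >/dev/null 2>&1; then
    # nvidia-smi prints one line per GPU; only the first is checked here.
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if [ "$(driver_major "$installed")" = "$expected_major" ]; then
        echo "driver $installed matches expected major version $expected_major"
    else
        echo "driver $installed does not match expected major version $expected_major" >&2
    fi
else
    echo "nvidia-smi not found; the driver may not be installed" >&2
fi
```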

You should now be able to fire up Python and test that it works with Tensorflow or your favorite deep learning framework.

3. Verifying the installation worked

CUDA

To test the CUDA installation, you can run the deviceQuery example bundled with CUDA. If you navigate to the CUDA samples folder (/usr/local/cuda-#.#/samples or ~/NVIDIA_CUDA-#.#_Samples by default), you can find the deviceQuery example in <samples_dir>/1_Utilities/deviceQuery.

Running make in this directory should compile the CUDA source file into a binary that prints a variety of statistics about your GPU and runs some tests on it. Run the binary with ./deviceQuery, and you should see a bunch of output about your device; here's my output with a 1080 Ti for comparison.
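Put together, the steps look roughly like the sketch below. The candidate sample locations are assumptions based on the defaults mentioned above, and find_device_query is a helper name I made up. The grep at the end just pulls out the summary line, which should read "Result = PASS" on a working setup.

```shell
# Print the deviceQuery directory under the first candidate samples folder
# that contains it; return non-zero if none of the candidates does.
find_device_query() {
    for samples_dir in "$@"; do
        if [ -d "$samples_dir/1_Utilities/deviceQuery" ]; then
            printf '%s\n' "$samples_dir/1_Utilities/deviceQuery"
            return 0
        fi
    done
    return 1
}

# A glob that matches nothing is passed through literally and simply fails
# the directory test, so missing locations are harmless.
if dq_dir=$(find_device_query "$HOME"/NVIDIA_CUDA-*_Samples /usr/local/cuda/samples); then
    (cd "$dq_dir" && make && ./deviceQuery | grep 'Result = ')
else
    echo "deviceQuery sample not found; check where the samples were installed" >&2
fi
```

The copy in your home directory is listed first because the one under /usr/local is usually owned by root, so make may fail there without sudo.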

Drivers

If the driver installation went properly, you should be able to run nvidia-smi and get an output like the one below (the memory usage / temp / fan / GPU utilization will probably differ, since this was measured under load). Make sure that the version displayed in the top-left corner is the same as the one you expect:

Sun May  7 19:54:19 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.09                 Driver Version: 381.09                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 0000:02:00.0     Off |                  N/A |
| 42%   73C    P2   194W / 250W |   8417MiB / 11172MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

When I was using driver version 378, it oddly didn't show the name as GeForce GTX 108..., but rather as just Graphics Device. The card worked fine with TensorFlow, though.

It'd probably be a good idea to test that your GPU works with your machine learning library of choice; here are instructions for doing so on TensorFlow.

For the future: updating the apt-get drivers

It's pretty easy to upgrade the drivers to a different version.

First, remove the old drivers:

sudo apt-get purge 'nvidia*'

Now, just install the new driver with the PPA as detailed above and reboot.
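Before purging, it can be worth checking which NVIDIA packages are actually installed. Here's a small sketch that filters the output of dpkg -l. installed_nvidia_packages is a helper name I made up.

```shell
# From `dpkg -l` style input, print the names of fully installed ("ii")
# packages whose name starts with "nvidia".
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /^nvidia/ {print $2}'
}

if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | installed_nvidia_packages
fi
```

For a dry run of the removal itself, apt-get has a simulate flag: apt-get -s purge 'nvidia*' prints what would be removed without changing anything.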

Nelson Liu's Blog