CUDA Troubleshooting
Most of the time, installing TensorFlow, Keras, CUDA, and cuDNN by following the TensorFlow installation guide will just work. If it does not, the troubleshooting steps below address most of the problems we have seen.
These steps are specific to Windows, but the Anaconda steps should work on any OS.
- If installed from the NVIDIA installer files, uninstall CUDA Runtime & CUDA Development from Control Panel -> Programs.
- Uninstall the pip version of tensorflow:
pip uninstall tensorflow-gpu
- Install CUDA:
conda install cudatoolkit==9.0
- Install cuDNN:
conda install cudnn=7.1.4
- Install TensorFlow:
conda install tensorflow-gpu==1.12
If you're having issues importing Keras:
pip uninstall keras
conda install keras==2.2.4
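After reinstalling, it can help to confirm that the interpreter can locate the packages at all before digging into GPU problems. The following is a minimal sketch, not part of the original steps: the function name `missing_packages` is hypothetical, and it only checks that a package can be found, not that it imports cleanly (a package that is found can still fail to load its CUDA libraries).

```python
import importlib.util


def missing_packages(names=("tensorflow", "keras")):
    """Return the top-level package names this interpreter cannot locate.

    Hypothetical helper: find_spec() only looks the package up on the
    import path, it does not import it or load any GPU libraries.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If a package is listed as missing here but `conda list` shows it as installed, the interpreter being run is probably not the Anaconda one.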
If the GPU still isn't detected after reinstalling, first check whether the Python seen by MATLAB differs from the one on your system. In MATLAB, try this:
>> !python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
And similarly in a system terminal (Start Menu -> type 'Command Prompt'):
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
Both should print the same message, even if that message is an error. If they differ, MATLAB is probably picking up a different Python installation.
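When the two messages differ, it helps to see exactly which interpreter each side is running. The sketch below prints that information. The function name `interpreter_summary` and the idea of saving it to a script are assumptions for illustration, not part of the original steps.

```python
import platform
import sys


def interpreter_summary():
    """Describe the Python interpreter that is executing this script."""
    return {
        "executable": sys.executable,          # full path to python.exe
        "version": platform.python_version(),  # e.g. major.minor.patch
        "prefix": sys.prefix,                  # install or environment root
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print("{}: {}".format(key, value))
```

Saved under a name of your choosing, the same script can be run once from MATLAB with the `!python` prefix and once from the Command Prompt. If the `executable` paths differ, the two environments are not using the same Python.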
If both give the same message, next check the versions of the CUDA libraries that Anaconda reports. In MATLAB, you should see something similar to this:
>> !conda list cuda
# packages in environment at C:\Anaconda3:
#
# Name Version Build Channel
cudatoolkit 9.0 1
>> !conda list cudnn
# packages in environment at C:\Anaconda3:
#
# Name Version Build Channel
cudnn 7.1.4 cuda9.0_0
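The comparison against the expected versions can also be scripted. This is a rough sketch that parses the text printed by `conda list` and reports anything that does not match. The names `EXPECTED`, `installed_versions`, and `version_mismatches` are hypothetical, and the sample text is copied from the output above rather than read from a live environment.

```python
# Versions this page targets (taken from the install steps above).
EXPECTED = {"cudatoolkit": "9.0", "cudnn": "7.1.4"}

# Sample text in the shape printed by `conda list`; replace with real output.
SAMPLE = """\
# Name Version Build Channel
cudatoolkit 9.0 1
cudnn 7.1.4 cuda9.0_0
"""


def installed_versions(conda_list_output):
    """Map package name -> version from `conda list` text output."""
    versions = {}
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Skip blank lines and the '#' header lines.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        versions[fields[0]] = fields[1]
    return versions


def version_mismatches(conda_list_output, expected=EXPECTED):
    """Return {package: found_version_or_None} for every unexpected version."""
    found = installed_versions(conda_list_output)
    return {
        name: found.get(name)
        for name, wanted in expected.items()
        if found.get(name) != wanted
    }


if __name__ == "__main__":
    print("Mismatches:", version_mismatches(SAMPLE))
```

An empty result means both packages match. A value of `None` means the package did not appear in the output at all.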
Assuming those versions look right, let's check tensorflow:
>> !conda list tensorflow
# packages in environment at C:\Anaconda3:
#
# Name Version Build Channel
tensorflow 1.12.0 gpu_py36ha5f9131_0
tensorflow-base 1.12.0 gpu_py36h6e53903_0
tensorflow-gpu 1.12.0 h0d30ee6_0
Importantly, the regular "tensorflow" package (first line) should have the string "gpu" in its build, like the one above. If it does not, the CPU version of TensorFlow is probably installed, so remove every copy:
conda remove tensorflow-gpu
conda remove tensorflow
pip uninstall tensorflow-gpu
pip uninstall tensorflow
The last two may not do anything, but they are good to check. If you removed anything, reinstall it with:
conda install tensorflow-gpu==1.12
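The build-string check described above can be automated in the same way. The sketch below looks for the plain `tensorflow` row in `conda list tensorflow` output and tests whether its build column contains "gpu". The function name `tensorflow_build_is_gpu` is hypothetical, and the sample text is copied from the listing shown earlier.

```python
# Sample text in the shape printed by `conda list tensorflow`.
SAMPLE = """\
# Name Version Build Channel
tensorflow 1.12.0 gpu_py36ha5f9131_0
tensorflow-base 1.12.0 gpu_py36h6e53903_0
tensorflow-gpu 1.12.0 h0d30ee6_0
"""


def tensorflow_build_is_gpu(conda_list_output):
    """Check the build column of the plain 'tensorflow' package.

    Returns True or False depending on whether the build string contains
    'gpu', or None if no 'tensorflow' row is present in the output.
    """
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Columns are: name, version, build, (optional) channel.
        if len(fields) >= 3 and fields[0] == "tensorflow":
            return "gpu" in fields[2]
    return None


if __name__ == "__main__":
    print("GPU build installed:", tensorflow_build_is_gpu(SAMPLE))
```

Matching on the exact name `tensorflow` matters here, because the `tensorflow-gpu` row does not necessarily carry "gpu" in its build string, as the listing shows.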
Next, let's try checking the system cuDNN version. In MATLAB type this:
>> cd C:\
>> !where cudnn*
C:\Program Files\MATLAB\R2019a\bin\win64\cudnn64_7.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin\cudnn64_7.dll
If you see one outside the MATLAB folder, like the second one above, you can check its cuDNN version like this:
>> !type "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include\cudnn.h" | findstr CUDNN_MAJOR
#define CUDNN_MAJOR 7
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
>> !type "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include\cudnn.h" | findstr CUDNN_MINOR
#define CUDNN_MINOR 1
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
>> !type "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include\cudnn.h" | findstr CUDNN_PATCHLEVEL
#define CUDNN_PATCHLEVEL 4
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
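Reading the three values together as MAJOR.MINOR.PATCHLEVEL gives the cuDNN version, which is 7.1.4 here. Your versions should be CUDA 9.0 and cuDNN 7.1.4.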
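The three `findstr` lookups can be folded into one step by reading the header text and extracting the `#define` values directly. This is a minimal sketch assuming the header layout shown above. The function name `parse_cudnn_version` is hypothetical, and the sample text stands in for the contents of `cudnn.h`.

```python
import re

# Sample lines in the shape found in cudnn.h (copied from the output above).
SAMPLE_HEADER = """\
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 4
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""


def parse_cudnn_version(header_text):
    """Return 'MAJOR.MINOR.PATCHLEVEL' from cudnn.h text, or None if incomplete."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match only lines that define the macro as a plain integer,
        # which skips the CUDNN_VERSION line that merely mentions it.
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)\s*$"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    print("cuDNN version:", parse_cudnn_version(SAMPLE_HEADER))
```

To use it on a real installation, read the header file found in the previous step into a string and pass that string to the function in place of the sample.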
If NONE of that works, here are some more things to try:
- Check if your GPU has a driver update at: http://nvidia.com/drivers
- Uninstall anything from the Control Panel -> Programs that says NVIDIA CUDA (especially Runtime and Developer packages).
- Restart the PC, being sure to spin around in your chair a minimum of 3 times during boot. I'm not sure if it's superstition but if you've gotten to this point, it can't hurt to try.