Tensorflow: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Created on 21 Dec 2018  ·  181 Comments  ·  Source: tensorflow/tensorflow

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes and No (described below)
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Manjaro
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): tf-nightly-gpu (Dec 19, r1.13)
  • TensorFlow version (use command below): 1.13.0-dev20181219
  • Python version: 3.7.1
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: CUDA 10 with cuDNN 7.4.1
  • GPU model and memory: RTX 2070 8GB

Describe the current behavior
I'm running a CNN model on MNIST. When running on the GPU, I encounter
2018-12-20 20:09:13.644176: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I did some digging and realized that it is a memory issue (which shouldn't be the case, as I have 32GB of RAM and 64GB of swap). I ran htop while running the model and had 20+GB free, which is more than enough to fit the 8GB of vRAM mappings.

Using gpu_options.allow_growth = True gets the model to work properly, and setting os.environ['CUDA_VISIBLE_DEVICES'] = '-1' (i.e., running on the CPU only) also works. This means that I AM facing a memory issue, but I don't see how.

Also, using gpu_options.allow_growth = True does not fix the same issue when trying to run the tensorflow/models/official/mnist/ model, which should behave similarly to my code. A minimal sketch of both workarounds follows.
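For reference, a minimal sketch (my wording, not from the original report) of the two workarounds mentioned above, using the same TF 1.x session API as the reproduction code below:

import os
import tensorflow as tf

# Workaround 1: hide the GPU so TensorFlow falls back to the CPU.
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

# Workaround 2: let TensorFlow grow its GPU allocation on demand instead of
# reserving (nearly) all of the VRAM up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)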

Code to reproduce the issue

import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import math
import time
# Killing optional CPU driver warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
tf.logging.set_verbosity(tf.logging.ERROR)


class Model:

    def __init__(self, image, label):
        """
        A Model class contains a computational graph that classifies images
        to predictions. Each of its methods builds part of the graph
        on Model initialization. Do not modify the constructor, as doing so
        would break the autograder. You may, however, add class variables
        to use in your graph-building, e.g. a learning rate.

        image: the input image to the computational graph as a tensor
        label: the correct label of an image as a tensor
        prediction: the output prediction of the computational graph,
                    produced by self.forward_pass()
        optimize: the model's optimizing tensor produced by self.optimizer()
        loss: the model's loss produced by computing self.loss_function()
        accuracy: the model's prediction accuracy
        """
        self.image = image
        self.label = label

        # TO-DO: Add any class variables you want to use.

        self.prediction = self.forward_pass()
        self.loss = self.loss_function()
        self.optimize = self.optimizer()
        self.accuracy = self.accuracy_function()

    def forward_pass(self):
        """
        Predicts a label given an image using convolution layers

        :return: the prediction as a tensor
        """
        filter_1 = tf.Variable(tf.truncated_normal([3, 3, 1, 8], stddev=0.1))
        conv_1 = tf.nn.conv2d(self.image, filter_1, [1, 1, 1, 1], "SAME")

        reshaped = tf.reshape(conv_1, shape=[50, -1])

        L1 = reshaped.shape[1].value
        L2 = 500
        W1 = tf.Variable(tf.random_normal([L1, L2], mean=0, stddev=0.01))
        b1 = tf.Variable(tf.random_normal([L2], mean=0, stddev=0.01))
        relu_1 = tf.nn.relu(tf.matmul(reshaped, W1) + b1)

        W2 = tf.Variable(tf.random_normal([L2, 10], mean=0, stddev=0.01))
        b2 = tf.Variable(tf.random_normal([10], mean=0, stddev=0.01))
        logits = tf.nn.relu(tf.matmul(relu_1, W2) + b2)
        return logits

    def loss_function(self):
        """
        Calculates the model cross-entropy loss

        :return: the loss of the model as a tensor
        """
        loss = tf.losses.softmax_cross_entropy(onehot_labels=self.label, logits=self.prediction)
        return loss

    def optimizer(self):
        """
        Optimizes the model loss using an Adam Optimizer

        :return: the optimizer as a tensor
        """
        learning_rate = 0.1
        sgd = tf.train.GradientDescentOptimizer(learning_rate)
        train = sgd.minimize(self.loss)
        return train

    def accuracy_function(self):
        """
        Calculates the model's prediction accuracy by comparing
        predictions to correct labels – no need to modify this

        :return: the accuracy of the model as a tensor
        """
        correct_prediction = tf.equal(tf.argmax(self.prediction, 1),
                                      tf.argmax(self.label, 1))
        return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


def main():
    t_start = time.time()

    mnist = input_data.read_data_sets("data/mnist/", one_hot=True)
    batch_sz = 50
    batch = 2000

    inputs = tf.placeholder(shape=[batch_sz, 28, 28, 1], dtype=tf.float32)
    labels = tf.placeholder(shape=[batch_sz, 10], dtype=tf.float32)

    model = Model(inputs, labels)

    session_config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
    sess = tf.Session(config=session_config)

    # sess = tf.Session()

    sess.run(tf.global_variables_initializer())
    for i in range(batch):
        next_image, next_label = mnist.train.next_batch(batch_sz)
        next_image = next_image.reshape((batch_sz, 28, 28, 1))
        sess.run(model.optimize, feed_dict={inputs: next_image, labels: next_label})

    acc, test_images, test_labels = 0, mnist.test.images, mnist.test.labels
    test_batch = math.ceil(len(test_images) / batch_sz)
    for i in range(test_batch):
        batch_images = test_images[i * batch_sz: (i + 1) * batch_sz]
        batch_images = batch_images.reshape((batch_sz, 28, 28, 1))
        batch_labels = test_labels[i * batch_sz: (i + 1) * batch_sz]
        acc += sess.run(model.accuracy, feed_dict={inputs: batch_images, labels: batch_labels})
    acc /= test_batch
    print(acc)

    print(time.time() - t_start, 'seconds')

    return


if __name__ == '__main__':
    main()
TF 2.0 gpu bug

Most helpful comment

I did try compiling from source, but ran into the same issue. What finally fixed my problem was setting config.gpu_options.allow_growth = True.

All 181 comments

I've been running into the same issue with the same GPU: "CUDNN_STATUS_INTERNAL_ERROR".

RTX 2070 GPU
CUDA 10
cuDNN 7.4.2
Ubuntu 18.04
tf-nightly-gpu (r1.13, Jan 13)
Python 3.6.7

2019-01-15 05:01:03.503415: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcublas.so.10.0 locally
2019-01-15 05:01:03.752563: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcudnn.so.7 locally
2019-01-15 05:01:04.905618: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-15 05:01:04.908147: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-15 05:01:04.908191: W tensorflow/core/framework/op_kernel.cc:1412] OP_REQUIRES failed at conv_ops_fused.cc:801 : Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

I have the same problem running on

RTX2080 GPU
CUDA 10
cudnn 7.4.2

I tried the following TF versions: tf-nightly-gpu and a self-compiled version from master (060b6e32ad).
I found out that it's possible to set the following environment variables to get further debug info (a small Python sketch for setting them is included below).

CUDNN_LOGINFO_DBG=1;
CUDNN_LOGDEST_DBG=stdout
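A minimal sketch for setting these from Python (assumption: they must be set before the process loads libcudnn, so do it before importing TensorFlow):

import os

# cuDNN API logging, values taken from the comment above.
os.environ['CUDNN_LOGINFO_DBG'] = '1'
os.environ['CUDNN_LOGDEST_DBG'] = 'stdout'

import tensorflow as tf  # import only after the variables are set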

Then I get the following error:

I0117 14:11:24.441819 140433563125568 basic_session_run_hooks.py:594] Saving checkpoints for 0 into /tmp/mnist/model.ckpt.
2019-01-17 14:11:25.916269: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcublas.so.10.0 locally

I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.079184 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.

2019-01-17 14:11:26.079151: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcudnn.so.7 locally

I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.571897 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.

2019-01-17 14:11:26.571858: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-17 14:11:26.579375: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.579803 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.

2019-01-17 14:11:26.585818: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-17 14:11:26.585850: W ./tensorflow/stream_executor/stream.h:2109] attempting to perform DNN operation using StreamExecutor without DNN support
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1320, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1408, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node Discriminator_1/Conv/Conv2D}}]]
[[train/discriminator_train/train_op/control_dependency/_569]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 151, in
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 147, in main
get_hooks_fn=tfgan.get_joint_train_hooks())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 1200, in gan_train
config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/training/python/training/training.py", line 546, in train
loss = session.run(train_op, run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 693, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1188, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1287, in run
raise six.reraise(original_exc_info)
File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1272, in run
return self._sess.run(
args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1336, in run
feed_dict, options)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1362, in _call_hook_before_run
request = hook.before_run(run_context)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 1061, in before_run
run_context.session.run(self._train_ops)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 930, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1153, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1329, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1349, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node Discriminator_1/Conv/Conv2D (defined at home/dj/projects/gan/tf_models/research/gan/mnist/networks.py:152) ]]
[[train/discriminator_train/train_op/control_dependency/_569]]

Errors may have originated from an input operation.
Input Source operations connected to node Discriminator_1/Conv/Conv2D:
inputs/batch/n (defined at home/dj/projects/gan/tf_models/research/gan/mnist/data_provider.py:67)

Original stack trace for 'Discriminator_1/Conv/Conv2D':
File "home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 151, in
tf.app.run()
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 87, in main
[FLAGS.batch_size, FLAGS.noise_dims]))
File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 118, in gan_model
discriminator_real_outputs = discriminator_fn(real_data, generator_inputs)
File "home/dj/projects/gan/tf_models/research/gan/mnist/networks.py", line 176, in unconditional_discriminator
net = _discriminator_helper(img, False, None, weight_decay)
File "home/dj/projects/gan/tf_models/research/gan/mnist/networks.py", line 152, in _discriminator_helper
net = layers.conv2d(img, 64, [4, 4], stride=2)
File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(args, *current_args)
File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1155, in convolution2d
conv_dims=2)
File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(args, *current_args)
File "usr/local/lib/python3.6/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1058, in convolution
outputs = layer.apply(inputs)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1228, in apply
return self.__call__(inputs, args, *kwargs)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/base.py", line 531, in __call__
outputs = super(Layer, self).__call__(inputs, args, *kwargs)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 564, in __call__
outputs = self.call(inputs, args, *kwargs)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/convolutional.py", line 196, in call
outputs = self._convolution_op(inputs, self.kernel)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
return self.conv_op(inp, filter)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
return self.call(inp, filter)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
name=self.name)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 1578, in conv2d
name=name)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1040, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 501, in new_func
return func(args, *kwargs)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()

Any ideas, anybody? I am about to reinstall my complete environment :-(

Try to compile r1.13 from source. It would take a long time, but it should fix your problem. At least it fixed mine.

I did try compiling from source, but ran into the same issue. What finally fixed my problem was setting config.gpu_options.allow_growth = True.

I've been having the same issue (on an RTX 2060, Ubuntu 18.04, Python 3.6.7, CUDA 10.0.130, cuDNN 7.4.2, Tensorflow 1.13.0-rc0 from source). Thanks to @va-andrew's suggestion I have it working with the allow_growth option set.

FWIW, in the course of searching for solutions to this it seems that this issue is a common problem with the RTX series (although it might be a general problem with CUDA 10.0, since the new cards don't support the older versions). It would be great if the defaults could get updated in the release of 1.13 so that special options don't need to be set for these cards.

Chiming in to say I also experienced this under the following configuration:

Tensorflow Docker GPU containers with stable releases of everything don't work either (they straight up segfault rather than report CUDNN_STATUS_INTERNAL_ERROR).

Curiously, things work fine on Windows 10 with Tensorflow v1.12!

And as others have reported, setting allow_growth allows things to run properly.

Same problem here.

  • RTX 2070
  • Ubuntu 18.04
  • CudNN 7.4.2 (but I have tried compiling with other older versions with no luck)
  • Tensorflow 1.13.0-dev20190125 (also tried Tensorflow 1.12 compiled with Cuda 10)

And as others have reported, setting allow_growth=TRUE allows things to run.

Closing this issue since it's resolved. Thanks!

@ymodak Can you please reference the PR that fixed this bug?

I have a similar issue with tf-nightly-gpu-2.0-preview on the RTX 2080

Same issue with an RTX2080, spent two days recompiling and bug hunting until I found this fix.
(the allow_growth=true thing fixed it)

You made my day

How do you actually set allow_growth=true? I have tf-nightly-gpu-2.0-preview and tried:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

but get this error:

AttributeError Traceback (most recent call last)
in ()
1 import tensorflow as tf
----> 2 config = tf.ConfigProto()

AttributeError: module 'tensorflow' has no attribute 'ConfigProto'

How can I set allow_growth in tensorflow 2.0?

OK, I made it work in tf-nightly-gpu-2.0-preview and an IPython notebook by adding this to my code:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
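As a side note (not from this thread): TF 2.x builds also expose a native way to enable memory growth without the compat shims; a sketch, assuming tf.config.experimental is available in your build:

import tensorflow as tf

# Enable memory growth per GPU via the TF 2.x config API.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)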

Same issue; with gpu_options.allow_growth = True the issue is fixed.

@newhouseb how/where did you set that true for all benchmarks? Was it an easy change?

Is blanket allow_growth a solution?

It is turned off by default for a reason; see
https://www.tensorflow.org/guide/using_gpu#allowing_gpu_memory_growth

In my program, memory management is important.

I would like to limit the amount of GPU memory used by TF, because in my graphics application the GPU memory will be used for other things, and keeping TF within a limited space is important to prevent out-of-memory errors.

I am working in C++ under Windows.

Adding the allow growth option results in an OOM error.

Without this line of code the model runs fine on the same machine with the same card.

With OOM error

options.config.mutable_gpu_options()->set_allow_growth(true);
options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(fraction);

Without OOM error

//options.config.mutable_gpu_options()->set_allow_growth(true);
options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(fraction);

So attempting to solve this problem by setting allow_growth results in a segfault.

@ymodak This bug is not fixed. Arguably, using any sort of convnet should work in the default configuration. Either allow_growth should be true by default, it should be fixed so this works, or there should be a better error than CUDNN_STATUS_INTERNAL_ERROR.

@ymodak It looks like this issue was closed prematurely. While there is a workaround for this issue, it involves changing application code. As a result, the example code does not work _out of the box_ on RTX cards, and most recipes online will also need modification.

@samhodge can't you prevent OOM by using config.gpu_options.per_process_gpu_memory_fraction = 0.4 as suggested on the tensorflow documentation page you posted yourself?

I'm confused by this boolean hack to enable tensorflow-gpu on my RTX 2080: will allow_growth = True be an issue if I use my GPU solely for one tensorflow script/jupyter notebook at a time (in addition to standard GPU usage for the screen etc.)?

I intend to set up a static ML stack on a computer and would like to know whether this will end up in a mess at some point (big gridsearch, models with lots of parameters, etc.). I haven't figured out yet whether I definitely need to build from source to avoid this internal error, or whether changing this boolean is enough.

OK, I think I found the source of my issues: before I create my session, I measure the free GPU RAM, so if I am on an 8GB card and 6GB are free, I use a fraction of 0.75. Occasionally that ends in an OOM, but recently I have been experimenting with 0.95 * 0.75 and I have yet to hit an OOM. So if you push TensorFlow's allocation to the limit, it sometimes clashes. Obviously, if the inputs and outputs of an individual op don't fit it will OOM, but I measure against this and will use GPU or CPU depending on which fits. (A sketch of this measure-then-set-fraction approach is below.)
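A rough sketch of this (assumptions: the pynvml package is available for querying free memory, and the 0.95 * 0.75 factor is just the heuristic from the comment above):

import pynvml  # assumption: the NVML Python bindings are installed
import tensorflow as tf

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
pynvml.nvmlShutdown()

# Only ask TensorFlow for a slice of what is actually free right now.
fraction = 0.95 * 0.75 * (mem.free / mem.total)

config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=fraction))
sess = tf.Session(config=config)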

@samhodge great, so in the end the allow_growth boolean hack does provide a solution if no major GPU operation is launched in parallel and if what is processed _at a time_ by tensorflow (batch size would be critical) doesn't overflow the memory provided by the GPU... ?

Everything uses the GPU, even your browser.

Running into the same issue on a GTX 1050 using tensorflow-gpu 1.13.1 from pip with CUDA 10.0/cuDNN 7.4.2.24/Nvidia driver 410/Ubuntu 16.04.

Still having the same issue here but "config.gpu_options.allow_growth = True" doesn't fix the problem. Happens on both TF-gpu 1.14.1 and TF-gpu 2.0. RTX1070, CUDA 10.0, Ubuntu 18.04, Nvidia driver 430.09.

The descriptions of the problems you are seeing make me believe that (a particular version of) cuDNN tries to allocate GPU memory when creating the handle. If TensorFlow already took all the memory (either because config.gpu_options.allow_growth = false, or per_process_gpu_memory_fraction is close to 1.0), there is no memory left to allocate for cuDNN.

You could confirm this by running TensorFlow through nvprof and generating an API trace to inspect the failing cuMemAlloc call.

Issue #6698 seems to discuss the same problem. Some people noticed that they had accidentally used a cuDNN release that doesn't match their CUDA version. Could you please verify that you are using cuDNN for CUDA 10 when running with CUDA 10?
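One way to check which versions are actually loaded at runtime is to query the libraries directly; a sketch for Linux, assuming the shared library names below match your install (cudnnGetVersion() and cudaRuntimeGetVersion() are the underlying C APIs):

import ctypes

# cuDNN version as resolved by the dynamic loader, e.g. 7402 for 7.4.2.
cudnn = ctypes.CDLL('libcudnn.so.7')
cudnn.cudnnGetVersion.restype = ctypes.c_size_t
print('cuDNN version:', cudnn.cudnnGetVersion())

# CUDA runtime version, e.g. 10000 for CUDA 10.0.
cudart = ctypes.CDLL('libcudart.so.10.0')
version = ctypes.c_int()
cudart.cudaRuntimeGetVersion(ctypes.byref(version))
print('CUDA runtime version:', version.value)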

Turns out I didn't have cuDNN installed correctly because I am a great fool. Got it in, reinstalled TF2-nightly, added the lines to allow the growth, and all is good.

How to delete cudatoolkit and cudnn from Conda?

Since the Anaconda-included (or embedded) cuDNN produces the error below, I want to remove the conda-installed cudatoolkit and cudnn and install standalone CUDA and cuDNN from Nvidia's website.

Error: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

However, the following commands cannot remove them:
conda remove --name cuda --all
conda remove --name cudnn --all

I see the two packages, cudatoolkit-10.0.130-0 and cudnn-7.3.1-cuda10.0.0_0, in the following paths:

/home/anaconda3/pkgs/cudatoolkit-10.0.130-0
/home/anaconda3/pkgs/cudnn-7.3.1-cuda10.0.0_0

How can I delete (or remove) the CUDA and cuDNN that are included (or embedded) in Anaconda?

Thanks in advance,

Mike

@mikechen66 What is the output of conda? It may be because other packages depend on cuda and cudnn. Why would you want to delete them in the first place? If you want to get a custom environment, use miniconda rather than anaconda. Miniconda only comes with conda, and you need to install all the packages you need manually.

Hi tydlwav:

Thanks for your feedback. After checking the version compatibility and release dates of the core libraries, I installed the related dev environments, ran the simple MNIST test code, and got the output below.

I think Anaconda3 cannot even support the core cuDNN and TensorFlow libraries, which is a big problem for Anaconda3. So I want to delete the bundled cuDNN libraries from Anaconda and use the standalone Nvidia CUDA and cuDNN libraries to run the test code. Please give some suggestions.

  1. Installation environments

Nvidia GeForce RTX 2060
Graphics driver: NVIDIA-Linux-x86_64-415.27 (Jan 15, 2019), the first version that supports the RTX 2060
Anaconda3: Anaconda3-2019.03-Linux-x86_64.sh (2019-04-04)
-- cudatoolkit-10.0.130-0
-- cudnn-7.3.1-cuda10.0.0_0
-- TensorFlow 1.13.1
-- Jupyter Notebook and ipykernel
-- defaulted by Anaconda3

  2. MNIST test code:

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Flatten, MaxPooling2D, Conv2D
from keras.callbacks import TensorBoard

(X_train,y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000,28,28,1).astype('float32')
X_test = X_test.reshape(10000,28,28,1).astype('float32')

X_train /= 255
X_test /= 255

n_classes = 10
y_train = keras.utils.to_categorical(y_train, n_classes)
y_test = keras.utils.to_categorical(y_test, n_classes)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)) )
model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

tensor_board = TensorBoard('./logs/LeNet-MNIST-1')

model.fit(X_train, y_train, batch_size=128, epochs=15, verbose=1,
validation_data=(X_test,y_test), callbacks=[tensor_board])

  3. Output:

Using TensorFlow backend.

WARNING:tensorflow:From /home/mike/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/mike/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
WARNING:tensorflow:From /home/mike/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 60000 samples, validate on 10000 samples
Epoch 1/15

UnknownError Traceback (most recent call last)
in
34
35 model.fit(X_train, y_train, batch_size=128, epochs=15, verbose=1,
---> 36 validation_data=(X_test,y_test), callbacks=[tensor_board])

~/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
1037 initial_epoch=initial_epoch,
1038 steps_per_epoch=steps_per_epoch,
-> 1039 validation_steps=validation_steps)
1040
1041 def evaluate(self, x=None, y=None,

~/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/keras/engine/training_arrays.py in fit_loop(model, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
197 ins_batch[i] = ins_batch[i].toarray()
198
--> 199 outs = f(ins_batch)
200 outs = to_list(outs)
201 for l, o in zip(out_labels, outs):

~/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
2713 return self._legacy_call(inputs)
2714
-> 2715 return self._call(inputs)
2716 else:
2717 if py_any(is_tensor(x) for x in inputs):

~/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
2673 fetched = self._callable_fn(array_vals, run_metadata=self.run_metadata)
2674 else:
-> 2675 fetched = self._callable_fn(
array_vals)
2676 return fetched[:len(self.outputs)]
2677

~/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/tensorflow/python/client/session.py in __call__(self, args, *kwargs)
1437 ret = tf_session.TF_SessionRunCallable(
1438 self._session._session, self._handle, args, status,
-> 1439 run_metadata_ptr)
1440 if run_metadata:
1441 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
526 None, None,
527 compat.as_text(c_api.TF_Message(self.status.status)),
--> 528 c_api.TF_GetCode(self.status.status))
529 # Delete the underlying status object from memory otherwise it stays alive
530 # as there is a reference to status from this from the traceback due to

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
[[{{node metrics/acc/Mean}}]]

Hi tydlwav:

I used the following command to uninstall both cuda and cudnn. However, both libraries are still located in Anaconda3 even though they no longer work. I guess Anaconda3 intends to protect its core libraries from being removed; this might be core Continuum behavior, even though it has bugs. I will either try the standalone Nvidia CUDA (including nvcc) and cuDNN, or find newer cuda/cudnn packages to install with conda.

Uninstall Command:

conda uninstall cudatoolkit

Collecting package metadata: done
Solving environment: done

Package Plan

environment location: /home/mike/anaconda3/envs/tf-gpu

removed specs:
- cudatoolkit

The following packages will be REMOVED:

cudatoolkit-10.0.130-0
cudnn-7.3.1-cuda10.0_0
cupti-10.0.130-0
keras-2.2.4-0
tensorflow-1.13.1-gpu_py37hc158e3b_0
tensorflow-base-1.13.1-gpu_py37h8d69cac_0
tensorflow-gpu-1.13.1-h0d30ee6_0

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Notes:

After I uninstalled both of them, Jupyter Notebook showed "No module named 'tensorflow'", which means the uninstallation was successful. However, both cudatoolkit and cudnn are still found in Anaconda3. I think Continuum defaults to not deleting them, even though they no longer work.

/home/anaconda3/pkgs/cudatoolkit-10.0.130-0
/home/anaconda3/pkgs/cudnn-7.3.1-cuda10.0.0_0

You have already removed them. The files in pkgs are for installation; they are the downloaded cache for the installation. Also, this is not the place to discuss conda environment issues; it is not relevant to this issue. You may want to try Stack Overflow.

I'm a little confused by the state of this issue. I am using an RTX 2080, cuda 10.1, cudnn v7.5.1.10 and tensorflow 1.14.

Using the allow growth work around works, but maybe I have a different version mismatch?

Will there be a fix for this in tensorflow 1.14?

Thank you

Thanks. I see the compatibility issue among the RTX 20XX Turing series, TensorFlow, and Anaconda. It is obvious that the RTX 20XX series supports cuDNN 7.5.0, TensorFlow only supports cuDNN 7.4, and Anaconda includes a streamlined 7.3.1: a total mismatch among the three vendors. In addition, the RTX 20XX series has a big compatibility problem with Ubuntu 16.04 LTS. Sometimes Ubuntu 16.04 crashed, and I had to bring two bootable USB sticks to reinstall the OS. Therefore, I upgraded two PCs to Ubuntu 18.04 LTS and installed Miniconda. Next I will try a higher version of TensorFlow.

Notes:

Nvidia has its own custom Ubuntu 18.04 LTS for its Jetson TX1/TX2 and Jetson Nano mobile GPU platforms. Nvidia seems to target its new products, such as the RTX 20XX series, at compatibility with Ubuntu 18.04 LTS rather than the older Ubuntu 16.04. However, I do not know whether Continuum has an upgrade plan for the Nvidia RTX 20XX Turing series.

RTX series cards are well supported as of right now. I have used TF with an RTX 2070 through a conda environment on a non-Ubuntu distribution. This should be the worst-case scenario, and it's still working fine. CUDA and cuDNN are backwards compatible, and it should not be an issue if you use the newer versions. You should simply create a new Python 3.6 environment with conda create -n tf python==3.6.8 and run conda install tensorflow-gpu.

That is great. I have compiled from source and have had clients work with Tensorflow 1.12.0, CUDA 10.0 and cuDNN 7.4.2.24 on most hardware, but I have had issues with a handful of clients with RTX cards running a CNN with cuDNN on the GPU. I may have accidentally packaged the wrong cuDNN (the one for CUDA 9.0); the files are identically named.

Can anyone confirm that these versions work on RTX2080 and other Turing based cards?

Hi tydlwav:

I installed Miniconda and the related Python and TensorFlow environment according to your suggestion. It still has the error: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize...
Please help find a solution.

Please see the steps I followed.

  1. Install Python 3.6.8 according to your guideline.
    conda create -n tf python==3.6.8

  2. Activate the tf environment.
    conda activate tf

  3. Install tensorflow-gpu in the tf environment according to your guideline.
    conda install tensorflow-gpu

The installed package includes cudatoolkit and cudnn as follows.
....................................................................................................
cudatoolkit pkgs/main/linux-64::cudatoolkit-10.0.130-0
cudnn pkgs/main/linux-64::cudnn-7.3.1-cuda10.0_0
....................................................................................................

  4. Install Jupyter Notebook, ipykernel and the related environment for the webpage.

1). install jupyter notebook
conda install jupyter notebook

2). install ipykernel based on jupyter notebook
conda install ipykernel jupyter

3). create TensorFlow-GPU in the webpage of jupyter notebook
python -m ipykernel install --user --name tf-gpu --display-name "TensorFlow-GPU"

  5. Open Jupyter Notebook
    1). Command to open the Jupyter Notebook webpage:
    jupyter notebook

2). Click TensorFlow-GPU
After clicking "new" in the menu and selecting "TensorFlow-GPU" on the webpage, the cell shows up in the Jupyter Notebook webpage. The webpage is listed as follows.
http://localhost:8888/notebooks/Untitled3.ipynb?kernel_name=tf-gpu

  6. Paste and run the simple MNIST test code

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import Flatten, MaxPooling2D, Conv2D
from keras.callbacks import TensorBoard

(X_train,y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000,28,28,1).astype('float32')
X_test = X_test.reshape(10000,28,28,1).astype('float32')

X_train /= 255
X_test /= 255

n_classes = 10
y_train = keras.utils.to_categorical(y_train, n_classes)
y_test = keras.utils.to_categorical(y_test, n_classes)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)) )
model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

tensor_board = TensorBoard('./logs/LeNet-MNIST-1')

model.fit(X_train, y_train, batch_size=128, epochs=15, verbose=1,
validation_data=(X_test,y_test), callbacks=[tensor_board])

  7. Errors, the same as the previously mentioned message:

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
[[{{node metrics/acc/Mean}}]]

Thanks,

Mike

Hi tydlwav:

By the way, I also installed Keras with the following command.
conda install keras-gpu

Since every installation step is correct but I still get the error, I assume it is a version-compatibility issue between Miniconda and the RTX 20XX Turing series. The error is the same as with Anaconda; the cuDNN and CUDA versions in Miniconda and Anaconda are the same.

That's fairly interesting. I got cuda 10 and cudnn7.3 working with conda about a month and a half ago. I haven't used tensorflow since then. If it doesn't work for you, you can build from source. That always works for me. If you're just starting, I'd recommend using pytorch. You'd have a much easier time installing and getting things working.

Hi tydlwav:

I will try other methods such as PyTorch. Now that Google has released tensorflow-gpu 1.14, can I use Miniconda to install the standalone tensorflow-gpu 1.14 from the Google TensorFlow website as follows?

Google tensorflow: https://www.tensorflow.org/install/source

Notes:

Conda only has tensorflow-gpu builds from 1.0.1 to 1.13.1, as follows. The builds are so old that they cannot catch up with the official Google TensorFlow releases and the official Nvidia GeForce RTX 20XX (2060~2080) Turing series.

Command:
conda search tensorflow-gpu

Loading channels: done

Name Version Build Channel
tensorflow-gpu 1.0.1 py27_4 pkgs/free
tensorflow-gpu 1.0.1 py35_4 pkgs/free
tensorflow-gpu 1.0.1 py36_4 pkgs/free
tensorflow-gpu 1.1.0 np111py27_0 pkgs/free
tensorflow-gpu 1.1.0 np111py35_0 pkgs/free
tensorflow-gpu 1.1.0 np111py36_0 pkgs/free
tensorflow-gpu 1.1.0 np112py27_0 pkgs/free
tensorflow-gpu 1.1.0 np112py35_0 pkgs/free
tensorflow-gpu 1.1.0 np112py36_0 pkgs/free
tensorflow-gpu 1.2.1 py27cuda7.5cudnn5.1_0 pkgs/free
tensorflow-gpu 1.2.1 py27cuda7.5cudnn6.0_0 pkgs/free
tensorflow-gpu 1.2.1 py27cuda8.0cudnn5.1_0 pkgs/free
tensorflow-gpu 1.2.1 py27cuda8.0cudnn6.0_0 pkgs/free
tensorflow-gpu 1.2.1 py35cuda7.5cudnn5.1_0 pkgs/free
tensorflow-gpu 1.2.1 py35cuda7.5cudnn6.0_0 pkgs/free
tensorflow-gpu 1.2.1 py35cuda8.0cudnn5.1_0 pkgs/free
tensorflow-gpu 1.2.1 py35cuda8.0cudnn6.0_0 pkgs/free
tensorflow-gpu 1.2.1 py36cuda7.5cudnn5.1_0 pkgs/free
tensorflow-gpu 1.2.1 py36cuda7.5cudnn6.0_0 pkgs/free
tensorflow-gpu 1.2.1 py36cuda8.0cudnn5.1_0 pkgs/free
tensorflow-gpu 1.2.1 py36cuda8.0cudnn6.0_0 pkgs/free
tensorflow-gpu 1.3.0 0 pkgs/free
tensorflow-gpu 1.4.1 0 pkgs/main
tensorflow-gpu 1.5.0 0 pkgs/main
tensorflow-gpu 1.6.0 0 pkgs/main
tensorflow-gpu 1.7.0 0 pkgs/main
tensorflow-gpu 1.8.0 h7b35bdc_0 pkgs/main
tensorflow-gpu 1.9.0 hf154084_0 pkgs/main
tensorflow-gpu 1.10.0 hf154084_0 pkgs/main
tensorflow-gpu 1.11.0 h0d30ee6_0 pkgs/main
tensorflow-gpu 1.12.0 h0d30ee6_0 pkgs/main
tensorflow-gpu 1.13.1 h0d30ee6_0 pkgs/main

They are not old, as I've used conda's release of TF 1.12 with an RTX 2070. New hardware is usually backward compatible, and RTX is no different. Most likely there is some weird environment issue at play. I don't have access to an RTX machine until July, so I can't help with testing right now. Building from source should solve your problem. I've never failed to run convnets with TF built from source (assuming you have the correct configuration during the build).

Once again, this is not the right place to discuss the distribution issue of tensorflow. You can make a post on stack overflow or reddit and link it here. More people will be able to see it and help you this way.

Your issue is not a bug, and it is definitely not what this issue is discussing.

@chsigg your diagnosis that this is a problem with cuDNN attempting to allocate GPU memory resources that tensorflow has already allocated seems correct to me. Simply setting per_process_gpu_memory_fraction=0.9 instead of 0.95 was sufficient to resolve my issues.

I was also facing this issue. Fixed it by updating cuDNN to 7.6 version.

tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above

Tensorflow-gpu: 1.13.1
Cuda: 10.0
CuDNN: 7.3.1

Also, tensorflow and cuDNN were installed by Conda.
conda list cudnn

cudnn                     7.3.1                cuda10.0_0    anaconda

Things I did:

  1. Uninstalled conda tensorflow.
    conda remove tensorflow
  2. Uninstall conda cuDNN
    conda remove cudnn
  3. Install tensorflow with pip
    pip install tensorflow
  4. Download corresponding cuDNN 7.6 runtime deb file from https://developer.nvidia.com/cudnn
  5. Install it with sudo dpkg -i libcudnn_xxxxx_amd64.deb

@nluehr any comments? Can we make MinSystemMemory() cuda/cudnn version aware?

It is legitimately a memory error. If you are using tf.keras, then do the following at the top of your file:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))

I ran into this issue as well, and was able to solve it by using @va-andrew 's solution, and specifically, I used @colinsteidtmann 's implementation, since I use some of the tensorflow.keras functions in my code. I spent a long time trying to debug this problem, so thank you both for your contributions.

EDIT: I was just looking at tensorflow documentation (https://www.tensorflow.org/guide/using_gpu), and you can also tell it to allow memory growth by setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true. It also says that this configuration is platform specific, so YMMV (works for me with Ubuntu 18.04).
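A minimal sketch of that environment-variable route (assumption: it has to be set before TensorFlow initializes its GPU devices, e.g. at the very top of the script or in the shell):

import os

# Equivalent to allow_growth=True, without touching the session config.
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

import tensorflow as tf  # import only after the variable is set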

For reference, I am running:
Ubuntu 18.04.2 LTS, Gigabyte GeForce RTX 2080 Turbo, NVIDIA driver 430.26, CUDA 10.0.130, cuDNN 7.4.2.24, tensorflow-gpu 1.13.1, python 3.6. I run tensorflow from within a virtual environment, using spyder 3.3.4.

I have a 2nd computer with the exact same hardware, and I set it up following the same set of instructions, used the same files to do the install, and had this issue on that machine as well. No surprise there.

I have a 3rd computer with the exact same hardware, except that it has a 2080 Ti instead of the 2080, and I set it up following the same set of instructions, and again used the same files to do the install. But this time, there was no issue.

So, I'm led to believe it's not related to some conflict of CUDA, cuDNN, and driver version; it's not an incorrectly done installation, etc. Rather, it's related to the model of video card; I've only seen mention of this issue with RTX 2060, 2070, and 2080.

Fortunately, it's not a big inconvenience to use the workaround.

I was also facing this issue. Fixed it by updating cuDNN to 7.6 version.

tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above

Tensorflow: 1.13.1
Cuda: 10.0
CuDNN: 7.3.1

Also, tensorflow and cuDNN were installed by Conda.
conda list cudnn

cudnn                     7.3.1                cuda10.0_0    anaconda

Things I did:

1. Uninstalled conda tensorflow.
   `conda remove tensorflow`

2. Uninstall conda cuDNN
   `conda remove cudnn`

3. Install tensorflow with pip
   `pip install tensorflow`

4. Download corresponding cuDNN 7.6 runtime deb file from https://developer.nvidia.com/cudnn

5. Install it with `sudo dpkg -i libcudnn7_-1+cuda9.0_amd64.deb`

@alexforever86 After you did your update, are you sure that you are running on your GPU, and not the CPU? It seems you are using the GPU before you did your update (due to the error message referencing cuDNN), but I wonder about after. You use "pip install tensorflow", but it should be "pip install tensorflow-gpu", no? Also, you said you are using CUDA 10, but the cuDNN deb file you listed is for cuda9.0, so that shouldn't work.

So, I think it might be the case that you aren't actually using the GPU, and thus is not proof that updating to cuDNN 7.6 resolves the issue.

@synapse8 You are absolutely right about tensorflow-gpu and cuDNN version. I'm also very much confused by my comment now, and I don't remember the details anymore. Anyways, given below are the current versions in my system.

pip show tensorflow-gpu
Name: tensorflow-gpu
Version: 1.13.1

nvidia-smi
NVIDIA-SMI 430.26 Driver Version: 430.26 CUDA Version: 10.2

sudo apt search cudnn | grep installed
libcudnn7/now 7.6.0.64-1+cuda10.0 amd64

@alexforever86 with the configuration you mentioned now do you still see this problem? (I assume it works for you). I recently installed a system with cuda10, 410 driver, 7.6 cudnn and TF-gpu 1.14 (pip install) and have not seen the issue.

@robzor92 I've been using tensorflow-gpu 1.13, and out of curiosity, I just installed 1.14 to test if this resolved the issue (for me). I'm still getting the error, and still have to do the 'allow growth' workaround (again, not that big a deal).

What video card are you using?

@synapse8 Tried it with a GTX 1070.

@synapse8 I also tried the sample code provided by this thread's creator just now, and it worked without a problem. I would, however, not claim it is only a problem of the RTX line, as I saw the same problem on a GTX 1050 Ti with TF 1.13.1, using the same driver/cuda/cudnn combination I posted before.

@robzor92 I doubt the 1050 Ti's problem is with the small VRAM size. The RTX cards would encounter this on the basic CNN MNIST models. I suspect NVIDIA's tweaking of VRAM allocation on RTX cards somehow messed things up.

I have the same error on tensorflow 1.14.0 with an RTX 2080. But in my case, this error occurs only when I use a convolution layer.

2019-07-14 21:48:13.041683: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-14 21:48:13.064262: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-07-14 21:48:13.064955: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55abe99bcd30 executing computations on platform Host. Devices:
2019-07-14 21:48:13.064967: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-07-14 21:48:13.066219: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-07-14 21:48:13.153748: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-14 21:48:13.154195: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55abebb44f00 executing computations on platform CUDA. Devices:
2019-07-14 21:48:13.154207: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2019-07-14 21:48:13.154317: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-14 21:48:13.154707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
2019-07-14 21:48:13.154845: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-14 21:48:13.155504: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-14 21:48:13.156112: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-14 21:48:13.156265: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-14 21:48:13.157040: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-14 21:48:13.157646: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-14 21:48:13.159661: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-14 21:48:13.159730: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-14 21:48:13.160165: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-14 21:48:13.160542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-14 21:48:13.160559: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-14 21:48:13.161120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-14 21:48:13.161129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-07-14 21:48:13.161133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-07-14 21:48:13.161331: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-14 21:48:13.161730: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-14 21:48:13.162120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6794 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-07-14 21:48:13.497639: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-14 21:48:14.077729: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-07-14 21:48:14.080055: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "test.py", line 16, in <module>
    print(model.predict(test_inputs))
  File "/home/yudai/.local/share/virtualenvs/pipenv_practice-DKmRVcs4/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1078, in predict
    callbacks=callbacks)
  File "/home/yudai/.local/share/virtualenvs/pipenv_practice-DKmRVcs4/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 363, in model_iteration
    batch_outs = f(ins_batch)
  File "/home/yudai/.local/share/virtualenvs/pipenv_practice-DKmRVcs4/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/home/yudai/.local/share/virtualenvs/pipenv_practice-DKmRVcs4/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv2d/Conv2D}}]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv2d/Conv2D}}]]
     [[flatten/Reshape/_7]]
0 successful operations.
0 derived errors ignored.

I tried config.gpu_options.allow_growth = True, but it does not solve this error.

I want someone to help me.

Thank you.

Same issue with RTX 2070

I've made an interesting observation concerning this that might help track down this error or find a viable solution:
I also get the error Failed to get convolution algorithm with reference to Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR.
System: laptop machine with an Nvidia Quadro P2000, Ubuntu 18.04, tf 1.13.1, cuda10, cudnn 7.4.2
As mentioned, I can run the program smoothly using allow_growth, so thanks for that, good enough for me.

Interesting: I get this error only when using tf.layers.conv..., but switching to tf.keras.layers.... allows the program to run without allow_growth, so something in the keras code seems to work better than in the tf code (a minimal sketch of the two variants is below). Maybe somebody can use this information to track down a solution from keras.
I am sticking to tf.layers for now, as they provide easy weight sharing through variable scopes, which sadly is not supported by keras.
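For anyone who wants to reproduce that comparison, a minimal sketch (TF 1.x, hypothetical input shape) of the two variants; both build the same 2-D convolution:

import tensorflow as tf

inputs = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])

# Variant that reportedly triggers the error without allow_growth:
out_layers = tf.layers.conv2d(inputs, filters=32, kernel_size=3, padding='same')

# Variant that reportedly runs without allow_growth:
out_keras = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same')(inputs)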

@DavidS3141 It's interesting. In my case, the lone convolution layer does not work with either tf.layers or tf.keras.layers...

When I use PyTorch, torch.cuda.is_available() is True and I can use convolution layers without any trouble, so I believe the cause is TensorFlow, but I do not know what is wrong.

I agree with @Hayashi-Yudai: The same is true about MXNet. Identical configuration works fine when Tensorflow fails.

Environment:
RTX2080
Ubuntu 18.10
Driver 430.26
CUDA 10.0 (also 10.1, which isn't yet supported by TF)
cuDNN 7.6.1
mxnet-cu100 1.4.1
tensorflow-gpu 1.14.0

Hey guys, I am using the weights from the pre-trained model with a ResNet50 backbone on the COCO dataset to train on my CSV dataset. I am getting this error: Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. I am running the following command in a virtual environment on Ubuntu 16.04 for training:

keras-retinanet/keras_retinanet/bin/train.py --weights resnet50_coco_best_v2.1.0.h5
--batch-size 7 --steps 9 --epochs 4
--snapshot-path snapshots --tensorboard-dir tensorboard
csv dataset/train.csv dataset/classes.csv

I tried to resolve the problem with the following script on the command line in the virtual environment:
python

import tensorflow

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

as well as
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

but it did not resolve my error:

I am using:
Ubuntu 16.04
Cuda: 10.0
Tensorflow 1.14.0

Error:
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv1/convolution}}]]
     [[loss/add/_2377]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv1/convolution}}]]
0 successful operations.
0 derived errors ignored.
terminate called without an active exception
Aborted (core dumped)
Any help would be appreciated.
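One more thing that may be worth trying for the keras-retinanet case above is registering the growth-enabled session with the Keras backend before training starts, since keras-retinanet typically runs on standalone Keras. This is only a sketch, under the assumption of TF 1.14 with standalone Keras installed:

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf

# Sketch: create a growth-enabled session and hand it to the Keras backend,
# so the training script uses it instead of creating its own session.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))

This would need to run before keras-retinanet builds its model; whether the training script respects an externally set session is an assumption.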

Same problem here. Allow_growth workaround works. Otherwise I get this error on the most basic MNIST tensorflow dataset.

RTX2060 mobile here.

The issue occurs with tensorflow compiled from the r2.0 branch, as well as with TF 1.4 installed via conda (tensorflow-gpu).

@Hayashi-Yudai

I tried config.gpu_options.allow_growth = True, but it does not solve this error.

What were the exact commands you added to your code? Try the following instead if it's different ...

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))

@synapse8 Thank you for your comment. I tried but the result was the same.

By the way, I tried nvidia-docker and it went well, except that the python version is 3.5.
https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/running.html#running

As additional information: if you do not mind using python 3.6.8 and tensorflow-gpu 1.12.0, you can use anaconda.

conda create -n <virtual env name> python=3.6.8
conda install tensorflow-gpu==1.12.0
conda install cudnn==7.3.1    # By default, cudnn7.6 is installed but it causes the error

I tested building tf-2.0.0-beta1 from sources with CUDA-10.1 and CUDNN-7.6.2.4 and the error doesn't manifest.

You can find docker images for building a tf-gpu package and a tf-base package here:
https://github.com/edowson/docker-tensorflow

The anaconda channel doesn't have cudnn==7.6.2 at the time of writing this comment.

Windows 7, bashed my head against the wall over Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR for quite a while trying to get a new machine up.

Reinstalls, lots of other things in this and other threads didn't fix it.

To test whether a missing cudnn64_7.dll would cause a different error than the CUDNN_STATUS_INTERNAL_ERROR, I renamed the dll. After confirming the error indeed became a "cuDNN not installed" type error, I undid the file name change.

Magically, everything started working.

No idea why or how, but it does. Hopefully this helps someone else. If not, it only takes a few seconds to try.
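For anyone who wants to repeat that rename test, a minimal sketch in Python follows; the CUDA bin path is an assumption and will likely differ on your machine.

import os

# Hypothetical location of cudnn64_7.dll; adjust to your install.
cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin"
dll = os.path.join(cuda_bin, "cudnn64_7.dll")

os.rename(dll, dll + ".bak")   # the failing script should now report a missing-cuDNN error
# ... run the failing script once to confirm ...
os.rename(dll + ".bak", dll)   # restore the original file name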

I found this issue was caused by me erroneously making two calls to tf.Session

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# several lines of code later...

sess = tf.Session(config=config)

Probably not the root cause for most folks but it might be worth looking out for.

Just to share "allow_growth = True" solves the issue for my system below
rtx 2080ti, ubuntu18.04, cuda9.0, cudnn7, tf1.9

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

It has to do with the fraction of GPU memory available for creating the cudnn handle, controlled by per_process_gpu_memory_fraction.
Reducing this memory fraction yourself will resolve the error.

> sess_config = tf.ConfigProto(gpu_options =
> tf.GPUOptions(per_process_gpu_memory_fraction=0.7),
> allow_soft_placement = True)
> 
> with tf.Session(config=sess_config) as sess:
>      sess.run([whatever])

Use as small a fraction as fits in your memory. (In the code above I use 0.7; you can start with 0.3 or even smaller, then increase until you hit the same error again, and that's your limit.)
Pass it to your tf.Session() or tf.train.MonitoredTrainingSession() or Supervisor's sv.managed_session() as config.

This should allow your GPU to create a cudnn handle for your TensorFlow code.
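For completeness, here is a minimal sketch of passing that same config to tf.train.MonitoredTrainingSession, assuming the TF 1.x API; the 0.7 fraction is just the example value from above.

import tensorflow as tf

sess_config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.7),
    allow_soft_placement=True)

# The config keyword works the same way for MonitoredTrainingSession as for tf.Session
with tf.train.MonitoredTrainingSession(config=sess_config) as sess:
    pass  # sess.run(...) your training ops here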

As explained here, the new approach in TF 2.0 for setting config.gpu_options.allow_growth = True is:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Currently, memory growth needs to be the same across GPUs
  try:
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
  except RuntimeError as e:
    print(e)

With this code snippet and TF 2.0 RC1, the error no longer appears.
However, due to the number of people that have a 20XX Nvidia GPU, I think that it would be a good idea to address this problem natively before the final version of TF 2.0 is released.

I had the same issue with 1080Ti & TitanX on TF1.4 and the suggestions from @va-andrew and @oscarlinux saved the day! Which reminds me in the first place why I switched to pytorch and never coming back. Unfortunately there are still ppl using TF.... so I still have to go through this pain whenever I use their codebase... maybe it's time to play a bit with ONNX.

For anyone else finding this after upgrading to tensorflow 2.0, the API and the code are slightly different.

Ubuntu 18
Tensorflow 2.0
Tensorflow-gpu 2.0
GeForce RTX 2070

Updated code for this system.

import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)

This solution worked for me. (TF-GPU 2.0, Windows 10, GeForce RTX 2070)

physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
tf.config.experimental.set_memory_growth(physical_devices[0], True)

Adding an additional datapoint:
rtx 2080ti, ubuntu18.04, cuda10.0, cudnn7
In my case it does not work with either tf1.14 or 1.15rc3.

@w4nderlust, for 1.14 and 1.15 you will want to continue to set the session config option config.gpu_options.allow_growth = True. Is that what you are reporting does not work, or just the tf.config.experimental mechanism?

> @w4nderlust, for 1.14 and 1.15 you will want to continue to set the session config option config.gpu_options.allow_growth = True. Is that what you are reporting does not work, or just the tf.config.experimental mechanism?

Sorry should have been more precise, I'm reporting that without config.gpu_options.allow_growth = True it still doesn't work in my configuration with both 1.14 and 1.15rc3.

I think I found a better workaround than the config.gpu_options.allow_growth = True.

For my setup (_RTX 2070_, docker image _tensorflow:1.15.0-gpu-py3_), setting config as shown below avoids the _CUDNN_STATUS_INTERNAL_ERROR_ while still allocating the whole GPU memory.
This is very useful for large models that would not fit into memory in allow_growth mode but just fits when the whole memory is allocated.

To allocate the whole memory on RTX:
config.gpu_options.per_process_gpu_memory_fraction = 1.0
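In context, a minimal sketch of that workaround, assuming the TF 1.x session API (note there is deliberately no allow_growth line):

import tensorflow as tf

# Sketch: pre-allocate the whole GPU memory instead of growing it incrementally
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 1.0
session = tf.Session(config=config)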

> To allocate the whole memory on RTX:
> config.gpu_options.per_process_gpu_memory_fraction = 1.0

@PoloShock
I tried this with TF 2.0 and it does not seem to work.
Ubuntu18.04, RTX 2080, CUDA10, cudnn 7.6.

For TF 2.0 the API for limiting GPU memory usage has changed.

gpus = tf.config.experimental.list_physical_devices('GPU')

tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])

@nluehr do you understand why this issue only shows up on RTX? Could it be because we have other applications using it as a display GPU concurrently with TensorFlow?

It is difficult for me to debug this directly because I don't have access to an RTX GPU.

@sanjoy I am running display on integrated gpu. No other apps on my single RTX gpu while running TensorFlow.

I tried using that for tensorflow 2.0:

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    session = tf.compat.v1.Session(config=config)

It fixes cudnn error on my rtx2080, but the training is as fast as my 1050Ti on my laptop!
While training a CNN:

Tue Nov 12 19:22:35 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:2D:00.0 Off |                  N/A |
|  0%   37C    P2    75W / 265W |   2904MiB /  7979MiB |     27%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1026      G   /usr/lib/Xorg                                200MiB |
|    0      6420      G   cinnamon                                      43MiB |
|    0     21073      C   /home/clementpoiret/anaconda3/bin/python    2647MiB |
+-----------------------------------------------------------------------------+

Adding

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=7000)])

Didn't solve the issue; without allow_growth I'm getting the cudnn error, and anyway my RTX is only using something like 3GB of memory.

Any idea ?

I tried

    gpus = tf.config.experimental.list_physical_devices('GPU')
    tf.config.experimental.set_memory_growth(gpus[0], True)
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=7900)])

but cudnn is still throwing an error

I also get this error working in the tensorflow 1.15.0-py3-gpu Docker image (Ubuntu 18.04) with two Titan V GPUs (@sanjoy) - not RTXs. However, this error only seems to occur on my GPU0, which has Xorg and gnome-shell using GPU0 memory, while GPU1 only has python using GPU memory and does not throw this error. The error is also unfortunately intermittent -- sometimes I am able to remove the docker container, recreate it with the same settings and same code, and then the error will go away. Or not.

I was able to fix it using the Keras backend interface with:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
allow_growth_session = tf.Session(config=config)
tf.keras.backend.set_session(allow_growth_session)

Following is my nvidia-smi on both GPUs

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:01:00.0  On |                  N/A |
| 46%   63C    P2    51W / 250W |   7936MiB / 12065MiB |     31%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN V             Off  | 00000000:02:00.0 Off |                  N/A |
| 52%   70C    P2   131W / 250W |  12014MiB / 12066MiB |     60%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1564      G   /usr/lib/xorg/Xorg                            56MiB |
|    0      1607      G   /usr/bin/gnome-shell                          58MiB |
|    0      2428      G   /usr/lib/xorg/Xorg                           442MiB |
|    0      2574      G   /usr/bin/gnome-shell                         289MiB |
|    0      3292      G   ...p/pycharm-professional/167/jbr/bin/java    12MiB |
|    0      6794      G   anki                                          60MiB |
|    0     10336      G   /usr/lib/firefox/firefox                       6MiB |
|    0     16986      C   python                                      6981MiB |
|    1      4057      C   python                                     12001MiB |
+-----------------------------------------------------------------------------+

I'm having the same issue as @clementpoiret with TF 2.0 installed via conda. By using the allow_growth flag the issue disappears but that also makes the training very very slow, slower than what I had on TF 1.x... Eager first uh?

@clementpoiret and @EKami , does it speed up your training if you replace config.gpu_options.allow_growth = True with config.gpu_options.per_process_gpu_memory_fraction = 0.8? You can experiment to see what fraction makes the most use of your gpu.

@synapse8 I don't see something equivalent in tensorflow 2.0's documentation, any way to do so with tf.config.experimental ?

Edit: I'm gonna try to set memory this way, to see if it's solving the issue:

import subprocess
import tensorflow as tf


def get_gpus_memory():
    """Get the max gpu memory.

    Returns
    -------
    usage: list
        Returns a list of total memory for each gpus.
    """
    result = subprocess.check_output([
        "nvidia-smi", "--query-gpu=memory.total",
        "--format=csv,nounits,noheader"
    ]).decode("utf-8")

    gpus_memory = [int(x) for x in result.strip().split("\n")]
    return gpus_memory


def setup_gpus(allow_growth=True, memory_fraction=.9):
    """Setup GPUs.

    Parameters:
    allow_growth (Boolean)
    memory_fraction (Float): Set maximum memory usage, with 1 using
        maximum memory
    """
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            for i, gpu in enumerate(gpus):
                memory = get_gpus_memory()[i]

                tf.config.experimental.set_memory_growth(gpu, allow_growth)

                # Setting memory limit to max*fraction
                tf.config.experimental.set_virtual_device_configuration(
                    gpu, [
                        tf.config.experimental.VirtualDeviceConfiguration(
                            memory_limit=memory * memory_fraction)
                    ])

                logical_gpus = tf.config.experimental.list_logical_devices(
                    "GPU")
                print(len(gpus), "Physical GPUs,", len(logical_gpus),
                      "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)

This way we can conveniently just call setup_gpus(True, .9)

@clementpoiret: Please note that the tf.config.experimental.set_memory_growth call is unnecessary since tf.config.experimental.set_virtual_device_configuration overrides that flag since it slices up the GPU memory and pre-allocates the allocated memory.
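A minimal sketch of that simplification, assuming the TF 2.x experimental API (the 4096 MB limit is only a placeholder value):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # No set_memory_growth call here: the virtual-device memory limit below
    # already slices up and pre-allocates the GPU memory.
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])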

This issue isn't limited to the RTX. Or TF 2.0.

Adding:
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

Solves the "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR" issue with environment as follows:

nvidia-smi: NVIDIA-SMI 430.50, Driver Version: 430.50, CUDA Version: 10.1; GPU 0: GeForce GT 1030, fan 49%, 67C, P0, N/A / 30W, 1957MiB / 2000MiB, 94% utilization

python -c 'import tensorflow as tf; print(tf.__version__)'
1.14.0

Could this be a maximum contiguous block allocation issue with the NVIDIA drivers, where it's ok to allocate the same total amount of memory but in smaller blocks?

Hi,

I cannot reproduce this on my machine so I'll need some help root-causing this. Do we have someone here who can reproduce the problem and is willing to do some hands-on debugging?

As a starting point I'd like to understand why MinSystemMemory does not preserve enough memory for cuDNN. If someone with a setup that reproduces this issue can add some logging (as a local patch) to find out the amount of memory returned by MinSystemMemory, that would be great. And does increasing the magic 0.05 number in MinSystemMemory help the situation?

@sanjoy I have a version that exhibits this problem. How would I go about accessing MinSystemMemory or "setting the magic 0.05 number"? I have reverted to using cuda 9.1 for the most part, but I don't mind trying a few things.

@odinsbane you'll have to build TensorFlow from source to do what I suggest below.

First step is to add LOG(INFO) or std::cerr lines to MinSystemMemory to print out available_memory and the return value from MinSystemMemory. Does available_memory agree with what nvidia-smi prints? How much memory are we leaving for the system?

Secondly, does increasing the 0.05 magic number to, say, 0.07 help at all?

This one works! Thank you guys!

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.log_device_placement = True
sess = tf.Session(config=config)
set_session(sess)

We are facing a similar issue on our RTX 2070 (Ubuntu 18.04, TF2). We tried different combinations of CUDA 10.0 and libcudnn7.x.x.x versions, but the error keeps showing up.
On another machine we have a GTX 1080ti and this one runs without issue.
The nvidia-driver is 430.50 in both cases.

It is not caused by tf.keras.utils.plot_model; I removed it and this error still appears, but less frequently.
Update: I found this only happens when I use tf.keras.utils.plot_model. I'm not sure whether this is a coincidence. I'll keep trying.

============

I have a similar issue with RTX 2080 Ti on Ubuntu 18.04.3 LTS, tf 1.15, cuda 10.0.

What is weird in my case is that this only happens very occasionally; once it happens, it lasts for minutes to hours and then just disappears by itself.

I tried all the above solutions and none fixes it immediately. If I do nothing and just wait, it eventually disappears.

What I also tried and is not mentioned above:

  1. Remove the ~/.nv directory (a minimal sketch is shown after this list)
  2. Simply reboot
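A minimal sketch of the first item, assuming the cache lives in the default location under the home directory:

import os
import shutil

# Remove NVIDIA's per-user compute cache (~/.nv), which some users report clearing helps
nv_cache = os.path.expanduser("~/.nv")
if os.path.isdir(nv_cache):
    shutil.rmtree(nv_cache)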

FYI, error logs

2019-12-21 14:47:30.785233: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-12-21 14:47:30.959825: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-21 14:47:31.722238: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-21 14:47:31.749524: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "train_cifar.py", line 204, in <module>
    main()
  File "train_cifar.py", line 133, in main
    validation_data=(x_test, output_test), callbacks=callbacks, verbose=0)
  File "/home/xxx/anaconda3/envs/tf-1-gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/xxx/anaconda3/envs/tf-1-gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
    steps_name='steps_per_epoch')
  File "/home/xxx/anaconda3/envs/tf-1-gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/home/xxx/anaconda3/envs/tf-1-gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1017, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/home/xxx/anaconda3/envs/tf-1-gpu/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/home/xxx/anaconda3/envs/tf-1-gpu/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node stem_layer/conv2d/Conv2D}}]]
     [[metrics/classifier_acc/Identity/_1749]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node stem_layer/conv2d/Conv2D}}]]
0 successful operations.
0 derived errors ignored.

We are facing related issues

System specifications

  • Ubuntu 18.04.3 LTS
  • RTX 2070
  • python 3.7.1
  • tf-gpu 2.0.0
  • V10.0.130 CUDA
  • libcudnn7 7.6.2

The error is triggered when I try to use LSTM, GRU, RNN etc.

Actual error

2019-12-23 16:09:00.912238: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-23 16:09:01.408990: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-23 16:09:01.409043: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at cudnn_rnn_ops.cc:1491 : Unknown: Fail to find the dnn implementation.

File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/recurrent_v2.py", line 961, in call **cudnn_lstm_kwargs) File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/recurrent_v2.py", line 1174, in cudnn_lstm rnn_mode='lstm') File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 109, in cudnn_rnn ctx=_ctx) File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 198, in cudnn_rnn_eager_fallback attrs=_attrs, ctx=_ctx, name=name) File "/home/alex/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute six.raise_from(core._status_to_exception(e.code, message), None) File "<string>", line 3, in raise_from tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation. [Op:CudnnRNN]

Apparent problem

It seems all my memory is eaten up pretty fast. The problem seems to come up only in gpu mode; the same code works fine on cpu.

Trials

  • allow memory growth
  • create virtual device with limited memory

Both attempts produce the same error.

Any ideas?

I can't make progress on this issue because I cannot reproduce it. If you're able to reliably reproduce this on your machine you can help; here's how: https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-560963770, https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-561366750

Hi @sanjoy , I am very willing to help, but unfortunately I may not be able to build tf from source because I am using my university's machines for my experiments and my personal laptop is not equipped with a GPU. Is there any other way to obtain the log we need?

I found the following code on stack overflow, could it help?

from tensorflow.contrib.memory_stats.python.ops.memory_stats_ops import BytesInUse
with tf.device('/device:GPU:0'):  # Replace with device you are interested in
  bytes_in_use = BytesInUse()
with tf.Session() as sess:
  print(sess.run(bytes_in_use))

Is there any other way to obtain the log we need?

I'll check in a VLOG statement to get this information. Once that is done, will you be able to install and reproduce this with tf-nightly (with some extra flags, I'll let you know exactly which ones)?

Surely, I can install a package on that computer if it is available on pip or conda and I use a virtual environment. I'll try to reproduce the error.

> Surely, I can install a package on that computer if it is available on pip or conda and I use a virtual environment. I'll try to reproduce the error.

Can you please install tf-nightly (so that it picks up the commit that adds logging) and run with the environment variable TF_CPP_VMODULE set to gpu_device=5? That should print out two lines like

2019-12-26 12:07:37.196206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:837] available_memory = 12319588352                                             
2019-12-26 12:07:37.196221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] min_system_memory = 615979417                                              

Can you please report these numbers here?
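For reference, a sketch of setting that variable from Python itself; it must be set before TensorFlow is imported, and the assumption is that setting it in-process behaves the same as exporting it in the shell:

import os

# Must happen before the TensorFlow import so the C++ runtime sees it
os.environ["TF_CPP_VMODULE"] = "gpu_device=5"

import tensorflow as tf
print(tf.__version__)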

Sorry, my current code is not compatible with tf 2.0 (I use 1.15), I am trying to update it. Please give me some time.

This problem seems related to my RTX2080. I have a desktop GTX1080 where everything seems ok; then I used conda to clone that conda environment onto my RTX2080 notebook with tensorflow2.0.0-gpu. Once the application code uses Conv2D, LSTM or GRU, this trouble comes up.
Before, I used the following code to work around the problem:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

but since several days ago, the above method does not work any more.

I am having the same problem with gtx 960m

Hi @sanjoy , I just got this output:

2019-12-30 17:38:23.824323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:837] available_memory = 10840309760
2019-12-30 17:38:23.824328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] min_system_memory = 542015488

> Hi @sanjoy , I just got this output:
>
> 2019-12-30 17:38:23.824323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:837] available_memory = 10840309760
> 2019-12-30 17:38:23.824328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] min_system_memory = 542015488

Thanks!

Unfortunately this didn't help as much as I thought. If I clamp MinSystemMemory on a local build to 542015488 (i.e. min_system_memory = std::min(min_system_memory, 542015488ll)) resnet (for instance) seems to work just fine, and I don't get any errors from cuDNN.

@sanjoy I'm able to (mostly consistently) reproduce the issue on my end.

Relevant messages from the latest nightly:

With memory growth explicitly allowed

2019-12-30 22:51:06.846774: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
WARNING:tensorflow:Falling back to tensorflow client, its recommended to install the cloud tpu client directly with pip install cloud-tpu-client .
2019-12-30 22:51:08.851660: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-12-30 22:51:08.877811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.7715GHz coreCount: 15 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2019-12-30 22:51:08.887672: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2019-12-30 22:51:08.895277: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2019-12-30 22:51:08.906016: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2019-12-30 22:51:08.913767: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2019-12-30 22:51:08.921329: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2019-12-30 22:51:08.930208: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2019-12-30 22:51:08.941818: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-30 22:51:08.945713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
TF GPU device: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')



CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
Tensorflow Version: 2.1.0-dev20191230
Tensorflow_addons Version: 0.7.0-dev



Preparing data
Loading dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:03<00:00, 21.61it/s] 
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 68/68 [00:00<00:00, 447.32it/s] 
Performing NLP
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:00<00:00, 13332.71it/s] 
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 68/68 [00:00<?, ?it/s] 
Transforming dataset
Generating primitives and constructing vocabulary
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:00<00:00, 139.11it/s] 
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 68/68 [00:00<00:00, 4249.86it/s] 
Encoding primitives
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16654/16654 [00:00<00:00, 33640.74it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 805/805 [00:00<00:00, 33538.43it/s] 
2019-12-30 22:51:22.970554: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-12-30 22:51:22.977228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.7715GHz coreCount: 15 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2019-12-30 22:51:22.983571: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2019-12-30 22:51:22.986832: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2019-12-30 22:51:22.990667: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2019-12-30 22:51:22.993801: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2019-12-30 22:51:22.996967: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2019-12-30 22:51:23.002629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2019-12-30 22:51:23.006072: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-30 22:51:23.010482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
2019-12-30 22:51:23.557556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1087] TensorFlow compiled with CUDA 10.1 and cuDNN 7.6.5
2019-12-30 22:51:23.560870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1099] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-30 22:51:23.564144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105]      0 
2019-12-30 22:51:23.569159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 0:   N
2019-12-30 22:51:23.571310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:837] available_memory = 7038160076
2019-12-30 22:51:23.573861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] min_system_memory = 351908003
2019-12-30 22:51:23.576728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1370] GPUDevice PlatformGpuId 0 TfGpuId 0 on bus 1 numa: 0 pci: 0000:08:00.0 DeviceLocality: bus_id: 1
links {
}

2019-12-30 22:51:23.583814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6376 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:08:00.0, compute capability: 6.1)
2019-12-30 22:51:23.590034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:249] Created stream[0] = 000002093BAB9860
2019-12-30 22:51:23.594885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:268] Created host_to_device_stream[0] = 000002093BAB9360
2019-12-30 22:51:23.597951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:273] Created device_to_host_stream[0] = 000002093BABA960
2019-12-30 22:51:23.600920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:289] Created device_to_device_stream[0] = 000002093BAB8EE0

Without any changes to the GPU device's config

2019-12-30 22:54:47.762913: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
WARNING:tensorflow:Falling back to tensorflow client, its recommended to install the cloud tpu client directly with pip install cloud-tpu-client .
2019-12-30 22:54:50.073199: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-12-30 22:54:50.100339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.7715GHz coreCount: 15 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2019-12-30 22:54:50.105836: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2019-12-30 22:54:50.115940: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2019-12-30 22:54:50.127341: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2019-12-30 22:54:50.131871: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2019-12-30 22:54:50.139786: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2019-12-30 22:54:50.144940: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2019-12-30 22:54:50.159197: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-30 22:54:50.162685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
TF GPU device: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')



CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
Tensorflow Version: 2.1.0-dev20191230
Tensorflow_addons Version: 0.7.0-dev



Preparing data
Loading dataset
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:03<00:00, 21.71it/s] 
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 68/68 [00:00<00:00, 433.07it/s] 
Performing NLP
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:00<00:00, 13332.18it/s] 
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 68/68 [00:00<?, ?it/s] 
Transforming dataset
Generating primitives and constructing vocabulary
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:00<00:00, 140.34it/s] 
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 68/68 [00:00<00:00, 4249.55it/s] 
Encoding primitives
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16654/16654 [00:00<00:00, 33039.93it/s] 
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 805/805 [00:00<00:00, 33537.43it/s] 
2019-12-30 22:55:04.084880: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-12-30 22:55:04.088867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.7715GHz coreCount: 15 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2019-12-30 22:55:04.094516: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2019-12-30 22:55:04.097049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2019-12-30 22:55:04.099754: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2019-12-30 22:55:04.102329: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2019-12-30 22:55:04.105131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2019-12-30 22:55:04.108029: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2019-12-30 22:55:04.110629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-30 22:55:04.114339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
2019-12-30 22:55:04.655119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1087] TensorFlow compiled with CUDA 10.1 and cuDNN 7.6.5
2019-12-30 22:55:04.658124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1099] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-30 22:55:04.660826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105]      0
2019-12-30 22:55:04.662403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 0:   N
2019-12-30 22:55:04.664213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:837] available_memory = 7038160076
2019-12-30 22:55:04.666185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] min_system_memory = 351908003
2019-12-30 22:55:04.668490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1370] GPUDevice PlatformGpuId 0 TfGpuId 0 on bus 1 numa: 0 pci: 0000:08:00.0 DeviceLocality: bus_id: 1
links {
}

2019-12-30 22:55:04.672820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6376 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:08:00.0, compute capability: 6.1)
2019-12-30 22:55:04.677690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:249] Created stream[0] = 0000021EC0CF5840
2019-12-30 22:55:04.679747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:268] Created host_to_device_stream[0] = 0000021EC0CF58C0
2019-12-30 22:55:04.682343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:273] Created device_to_host_stream[0] = 0000021EC0CF5940
2019-12-30 22:55:04.685266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:289] Created device_to_device_stream[0] = 0000021EC0CF59C0

EDIT: Model information, if it helps.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
Feature_1 (InputLayer)          [(None, 150)]        0
__________________________________________________________________________________________________
Feature_2 (InputLayer)          [(None, 150)]        0
__________________________________________________________________________________________________
embedding (Embedding)           (None, 150, 64)      5632        Feature_1[0][0]
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 150, 64)      2944        Feature_2[0][0]
__________________________________________________________________________________________________
bidirectional (Bidirectional)   (None, 150, 128)     66048       embedding[0][0]
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 150, 128)     66048       embedding_1[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 150, 256)     0           bidirectional[0][0]
                                                                 bidirectional_1[0][0]
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, 64)           73984       concatenate[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 32)           2080        bidirectional_2[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1)            33          dense[0][0]
==================================================================================================
Total params: 216,769
Trainable params: 216,769
Non-trainable params: 0

A minimal example using TF 1.15 with which I get this error, on an RTX 2070 with NVIDIA driver 440.44 and CUDA version 10.2.

import tensorflow as tf
import tensorflow.keras.applications as applications
import tensorflow.keras.utils as utils
import numpy as np

num_samples = 1000
height = 224
width = 224
num_classes = 1000

model = applications.ResNet50(weights=None, input_shape=(height, width, 3), classes=num_classes)

parallel_model = utils.multi_gpu_model(model, gpus=2, cpu_relocation=True)
parallel_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

x = np.random.random((num_samples, height, width, 3))
y = np.random.random((num_samples, num_classes))

parallel_model.fit(x, y, epochs=20, batch_size=256)

print('all done')
Train on 1000 samples
Epoch 1/20
2020-02-06 15:06:40.524918: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-06 15:06:41.291528: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-06 15:06:41.329183: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 822083584 exceeds 10% of system memory.
2020-02-06 15:06:42.082319: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 851705856 exceeds 10% of system memory.
2020-02-06 15:06:42.293092: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 822083584 exceeds 10% of system memory.
2020-02-06 15:06:43.173764: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 822083584 exceeds 10% of system memory.
2020-02-06 15:06:43.820074: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-02-06 15:06:44.390897: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 822083584 exceeds 10% of system memory.
2020-02-06 15:06:45.839525: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-02-06 15:06:45.856793: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-02-06 15:06:45.883423: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "./test_tf.py", line 19, in <module>
    parallel_model.fit(x, y, epochs=20, batch_size=256)
  File "/nix/store/520352w3m8lyj2zgv647qfqrws5q798n-python3.7-tensorflow-gpu-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/nix/store/520352w3m8lyj2zgv647qfqrws5q798n-python3.7-tensorflow-gpu-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 675, in fit
    steps_name='steps_per_epoch')
  File "/nix/store/520352w3m8lyj2zgv647qfqrws5q798n-python3.7-tensorflow-gpu-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 394, in model_iteration
    batch_outs = f(ins_batch)
  File "/nix/store/520352w3m8lyj2zgv647qfqrws5q798n-python3.7-tensorflow-gpu-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/nix/store/520352w3m8lyj2zgv647qfqrws5q798n-python3.7-tensorflow-gpu-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[{{node replica_1/resnet50/conv1_conv/Conv2D}}]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[{{node replica_1/resnet50/conv1_conv/Conv2D}}]]
    [[training/RMSprop/gradients/gradients/Switch_482/_3893]]
0 successful operations.
1 derived errors ignored.

I want to point out in a separate issue https://github.com/tensorflow/tensorflow/issues/36501 that while using those options enables the code to run, observing the actual GPU memory usage shows that it is not really doing incremental memory allocation. So the option above fixes the error, but it doesn't actually do what it claims to be doing. I used to use the same model back in older TF versions like 1.2... etc., and those did actual incremental memory allocation.

I have the same problems as everyone here! After having installed tf 2.1 I couldn't get a simple MNIST example to run without adding memory growth to the GPU. I use a 2080 ti.

The major problem I face is that I cannot run tensorflow-probability together with tf 2.1 without getting the cursed CUDNN internal error, even with memory growth added to the code. I have tried installing tf 2.0, CUDA 10.0 and CUDA 10.1, and different CUDNN versions. I managed to get the simple MNIST example working without the growth option after completely reinstalling my ubuntu, but not the tensorflow-probability example. I finally tried an official tensorflow nightly docker and still got the same error when using tensorflow probability (tf 2.2 inside the container). Everything runs fine on CPU. I have also tried running the same docker on a machine with a 1080 ti and that worked... There is definitely something wrong with the RTX series, I feel.

error with tf docker and tensorflow-probability example and extra cudnn debug info:

TF VERSION: 2.2.0-dev20200208
2020-02-11 08:51:05.891560: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-11 08:51:05.912465: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3696000000 Hz
2020-02-11 08:51:05.913040: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x57b1fd0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-11 08:51:05.913052: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-02-11 08:51:05.914414: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-11 08:51:05.975016: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.975364: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5679220 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-11 08:51:05.975376: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-02-11 08:51:05.975477: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.975744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.75GiB deviceMemoryBandwidth: 573.69GiB/s
2020-02-11 08:51:05.975865: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-11 08:51:05.976745: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-11 08:51:05.977582: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-11 08:51:05.977722: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-11 08:51:05.978636: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-11 08:51:05.979165: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-11 08:51:05.981150: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-11 08:51:05.981216: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.981528: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.981792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
2020-02-11 08:51:05.981812: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-11 08:51:05.982323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1099] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-11 08:51:05.982331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105]      0 
2020-02-11 08:51:05.982335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 0:   N 
2020-02-11 08:51:05.982395: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.982687: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.982959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/device:GPU:0 with 9604 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-02-11 08:51:05.983594: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.983864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.75GiB deviceMemoryBandwidth: 573.69GiB/s
2020-02-11 08:51:05.983881: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-11 08:51:05.983889: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-11 08:51:05.983896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-11 08:51:05.983904: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-11 08:51:05.983912: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-11 08:51:05.983920: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-11 08:51:05.983928: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-11 08:51:05.983961: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.984238: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.984497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
2020-02-11 08:51:05.984508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1099] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-11 08:51:05.984512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105]      0 
2020-02-11 08:51:05.984516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 0:   N 
2020-02-11 08:51:05.984563: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.984842: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.985099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/device:GPU:0 with 9604 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
SUCCESS: Found GPU: /device:GPU:0
2020-02-11 08:51:05.989382: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.989649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.75GiB deviceMemoryBandwidth: 573.69GiB/s
2020-02-11 08:51:05.989663: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-11 08:51:05.989671: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-11 08:51:05.989678: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-11 08:51:05.989684: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-11 08:51:05.989691: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-11 08:51:05.989700: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-11 08:51:05.989709: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-11 08:51:05.989744: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.990021: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.990347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
2020-02-11 08:51:05.990544: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.990807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.75GiB deviceMemoryBandwidth: 573.69GiB/s
2020-02-11 08:51:05.990820: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-11 08:51:05.990828: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-11 08:51:05.990834: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-11 08:51:05.990841: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-11 08:51:05.990848: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-11 08:51:05.990854: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-11 08:51:05.990861: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-11 08:51:05.990892: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.991171: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.991426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0
2020-02-11 08:51:05.991437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1099] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-11 08:51:05.991441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105]      0 
2020-02-11 08:51:05.991444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 0:   N 
2020-02-11 08:51:05.991486: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.991763: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-11 08:51:05.992022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9604 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/linalg/linear_operator_lower_triangular.py:158: calling LinearOperator.__init__ (from tensorflow.python.ops.linalg.linear_operator) with graph_parents is deprecated and will be removed in a future version.
Instructions for updating:
Do not pass `graph_parents`.  They will  no longer be used.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/linalg/linear_operator_lower_triangular.py:158: calling LinearOperator.__init__ (from tensorflow.python.ops.linalg.linear_operator) with graph_parents is deprecated and will be removed in a future version.
Instructions for updating:
Do not pass `graph_parents`.  They will  no longer be used.
2020-02-11 08:51:06.822991: W tensorflow/python/util/util.cc:319] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Epoch 1/15
2020-02-11 08:51:07.907445: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-11 08:51:09.832694: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7

I! CuDNN (v7604) function cudnnCreate() called:
i! Time: 2020-02-11T08:51:09.832722 (0d+0h+0m+4s since start)
i! Process=205; Thread=269; GPU=NULL; Handle=NULL; StreamId=NULL.

2020-02-11 08:51:10.409902: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I! CuDNN (v7604) function cudnnCreate() called:
i! Time: 2020-02-11T08:51:10.410012 (0d+0h+0m+5s since start)
i! Process=205; Thread=269; GPU=NULL; Handle=NULL; StreamId=NULL.

2020-02-11 08:51:10.417952: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
      1/Unknown - 4s 4s/step
Traceback (most recent call last):
  File "VAE_MNIST_tfp.py", line 150, in <module>
    validation_data=eval_dataset)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 718, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 341, in fit
    total_epochs=epochs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 576, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 640, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2414, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1660, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1741, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node model/conv2d/Conv2D (defined at VAE_MNIST_tfp.py:150) ]] [Op:__inference_distributed_function_4291]

Errors may have originated from an input operation.
Input Source operations connected to node model/conv2d/Conv2D:
 model/lambda/sub (defined at VAE_MNIST_tfp.py:98)

Function call stack:
distributed_function

@sanjoy I have the same issue with RTX 2080 and can build from source if needed.

@odinsbane you'll have to build TensorFlow from source to do what I suggest below.

The first step is to add LOG(INFO) or std::cerr lines to MinSystemMemory to print out available_memory and the return value from MinSystemMemory. Does available_memory agree with what nvidia-smi prints? How much memory are we leaving for the system?

Secondly, does increasing the 0.05 magic number to, say, 0.07 help at all?

Can confirm that building from source with the 0.05 magic number changed to 0.1 seems to fix the issue (at least for 1.15.2)!

In an ocean of noisy posts, the minimum-system-memory magic number totally seems logical. Thanks for sharing!

@chsigg Any suggestions? Maybe we can try to initialize cuDNN, cuBLAS and other NVIDIA libraries _before_ we reserve all of the GPU memory?

We can also try to enable allow_growth by default, but that's going to take time.

This problem seems related to my RTX 2080. I have a desktop GTX 1080 where everything seems OK; then I used conda to clone the conda environment onto my RTX 2080 notebook and use tensorflow 2.0.0-gpu. As soon as the application code uses Conv2D, LSTM, or GRU, this trouble comes up.
Before, I used the following code to solve this problem:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

but since several days ago, the above method no longer works

I had been trying to run the Lambda Tensorflow2-tutorial basic-image-classification code for days and getting the same cudnn handle error, until I tried your solution. It is finally running now on an RTX 2070 Max-Q and using minimal GPU memory.

I also met this problem
anaconda cloud install of tensorflow-gpu 2.0

rtx2070s
tensorflow-gpu.2.0.0
cuda 10.0.13
cudnn 7.6.5
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

I also met this problem
anaconda cloud install of tensorflow-gpu 2.0

rtx2070s
tensorflow-gpu.2.0.0
cuda 10.0.13
cudnn 7.6.5
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

Did you insert:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)

at the top of your entry code?

After quite some time experimenting with an apparently different problem failing with tf.signal.stft
I finally came across this thread and tried the solution allowing the memory growth. It solved my problem as well.
I have installed tensorflow-gpu=2.1 with cudatoolkit=10.1 from anaconda, but tried as well installing
tensorflow-gpu via pip with exactly the same result. I can reproduce this under linux-ubuntu 18.04 and debian 9.12 with the cards

   GeForce GTX 1050 Ti with Max-Q Design   
   GeForce GTX 1050 Ti
   GeForce RTX 2080 Ti

I also tried two other cards in our lab

  GeForce GTX 1080 Ti
  TITAN Xp COLLECTORS EDITION

where the code runs fine with and without allowing memory growth

My minimal problem is below. Interestingly, the problem is not conv2d: I can change the order of these three commands, and it is always the third one that fails.

import sys
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus and len(sys.argv)> 1 and sys.argv[1].startswith("-a"):
    print("allowing growth")
    growth = True
else:
    print("nogrowth")
    growth = False

try:
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, growth)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
    print(e)

tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.matmul(tf.zeros((2,2,2)), tf.zeros((2,2,2)))
tf.nn.conv2d(tf.zeros((2,20,20,20), dtype=tf.float32),
             filters=tf.zeros((2,2,20,20), dtype=tf.float32),
             strides=(1,1,1,1), padding="VALID")
print("done")

I also met this problem
anaconda cloud install of tensorflow-gpu 2.0
rtx2070s
tensorflow-gpu.2.0.0
cuda 10.0.13
cudnn 7.6.5
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

Did you insert:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)

at the top of your entry code?

Yeah, I solved this problem that way. Thanks!!

I had the same problem and allow_growth = True was the solution. BUT, for TensorFlow 2, in order to do that you need to add the following lines:

gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices:
    tf.config.experimental.set_memory_growth(device, True)

Thanks to user @opcecco in this issue: https://github.com/tensorflow/tensorflow/issues/25446

Interestingly, the problem is not conv2d: I can change the order of these three commands, and it is always the third one that fails.

@roebel Can you please attach logs for a few of the six different permutations?

And what happens if you change the program to (say):

tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.matmul(tf.zeros((2,2,2)), tf.zeros((2,2,2)))
tf.nn.conv2d(tf.zeros((2,20,20,20), dtype=tf.float32),
             filters=tf.zeros((2,2,20,20), dtype=tf.float32),
             strides=(1,1,1,1), padding="VALID")

Does the failure still happen at the conv2d or does it happen at the third stft?

@sanjoy sure here three variations of the script above changing the order of commands and a fourth variant that starts with 4 stft and ends with conv2d

The four different logs use the script from
https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-593098386
replacing the last four lines.

In short the results depending on the order:

stft -> blas -> conv2d fails when executing conv2d
conv2d -> stft -> blas fails when executing stft (so not the third one; blas seems to be loaded already for conv2d)
matmul -> conv2d -> stft fails when executing stft
stft -> stft -> stft -> stft -> matmul -> conv2d fails when conv2d is executed. Please see the logs below.

Don't mind asking for other variants if needed.

conv2d last:

tf.matmul(tf.zeros((2,2,2)), tf.zeros((2,2,2)))
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.nn.conv2d(tf.zeros((2,20,20,20), dtype=tf.float32),
             filters=tf.zeros((2,2,20,20), dtype=tf.float32),
             strides=(1,1,1,1), padding="VALID")
print("done")

log.conv2d.last.txt

matmul last

tf.nn.conv2d(tf.zeros((2,20,20,20), dtype=tf.float32),
             filters=tf.zeros((2,2,20,20), dtype=tf.float32),
             strides=(1,1,1,1), padding="VALID")
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.matmul(tf.zeros((2,2,2)), tf.zeros((2,2,2)))
print("done")

log.matmul.last.txt

stft last

tf.matmul(tf.zeros((2,2,2)), tf.zeros((2,2,2)))
tf.nn.conv2d(tf.zeros((2,20,20,20), dtype=tf.float32),
             filters=tf.zeros((2,2,20,20), dtype=tf.float32),
             strides=(1,1,1,1), padding="VALID")
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
print("done")

log.stft.last.txt

4 stft first conv2d last:

tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.matmul(tf.zeros((2,2,2)), tf.zeros((2,2,2)))
tf.nn.conv2d(tf.zeros((2,20,20,20), dtype=tf.float32),
             filters=tf.zeros((2,2,20,20), dtype=tf.float32),
             strides=(1,1,1,1), padding="VALID")
print("done")

log.multi_stft.first.txt

Many thanks

I got the same problem with the following configuration:
TensorFlow installed from (source or binary): r1.13.1,r.1.13.2,r1.14
Python version: 3.6.1
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: CUDA 10 with cuDNN 7.4.1
GPU model and memory: RTX 2070 8GB.

I solved this problem with:
TensorFlow installed from (source or binary): r1.12.0
Python version: 3.6.9
GCC/Compiler version: 4.8
CUDA/cuDNN version: CUDA 9.0 with cuDNN 7.1.4
GPU model and memory: RTX 2070 8GB.
Hope this is helpful to you.

I've also faced such a problem, which was solved by adding an environment variable TF_FORCE_GPU_ALLOW_GROWTH=true.

The configuration is the following:
Windows 10
Tensorflow compiled from source r2.0
Bazel: 0.26.1
C++ compiler: MSVC 2017
CUDA: 10
cuDNN: 7.6.5

intel 4930 CPU, NVIDIA Titan Xp (Pascal)
Ubuntu 18.04.4, latest miniconda
`!conda list | grep "cud"` gives

    cudatoolkit               10.1.243             h6bb024c_0  
    cudnn                     7.6.5                cuda10.1_0  

`!conda list | grep "tensor"` gives

tensorboard               2.1.0                     py3_0  
tensorflow                2.1.0           gpu_py37h7a4bb67_0  
tensorflow-base           2.1.0           gpu_py37h6c5654b_0  
tensorflow-estimator      2.1.0              pyhd54b08b_0  
tensorflow-gpu            2.1.0                h0d30ee6_0  

first cell in jupyter notebook is:

import tensorflow as tf
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices: tf.config.experimental.set_memory_growth(device, True)

model is a variational autoencoder with Total params: 112,269
x_train.shape, y_train.shape, x_test.shape, y_test.shape gives
((106496, 32, 32, 1), (106496,), (12288, 32, 32, 1), (12288,))

code includes:

batch_size=64
var_auto_encoder.fit(x_train, x_train, verbose=1, 
                 batch_size=batch_size, epochs=100,
                 validation_data=(x_test, x_test))

and it fails. The console shows:

2020-03-18 15:46:03.019451: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-18 15:46:03.179472: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-18 15:46:03.566267: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-18 15:46:03.569842: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-18 15:46:03.569907: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv2d/Conv2D}}]]
2020-03-18 15:46:03.573206: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

If, instead of the first cell noted above, I use

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

then I get this error


2020-03-18 15:55:43.050094: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-18 15:55:43.050123: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-18 15:55:43.050150: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-18 15:55:43.050177: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-18 15:55:43.050209: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-18 15:55:43.050246: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-18 15:55:43.050273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-18 15:55:43.050337: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-18 15:55:43.050720: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-18 15:55:43.051063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-03-18 15:55:43.051097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-18 15:55:43.051108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-03-18 15:55:43.051116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-03-18 15:55:43.051201: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-18 15:55:43.051573: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-18 15:55:43.051915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 16 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-03-18 15:56:07.877181: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-18 15:56:07.882424: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-18 15:56:07.886148: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-18 15:56:07.889830: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR


Why am I having the problem if I allow memory growth? Do I need to reboot to reinitialize the gpu?

Interestingly, during my struggles, I got a message from a red 'no entry' sign in my menu bar that said 'error: broken count, you have unmet dependencies'.
I ran a software update, and it wants to remove libcudnn7-dev and libcudnn7-doc
as well as upgrade 57 other libraries related to linux.

EDIT: After reboot the model seems to train successfully using this:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

or this:

import tensorflow as tf
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices: tf.config.experimental.set_memory_growth(device, True)

memory utilization on the gpu is <700 MB with batch size 16 and
~1 gigabyte with batch size 256 (which trains 3x faster)

I did try compiling from source, but ran into the same issue. I was finally able to fix my problem by setting config.gpu_options.allow_growth = True.

But if I hit this issue on the command line, how can I add this code?
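One option that does not require editing the script body is a small sketch based on the TF_FORCE_GPU_ALLOW_GROWTH environment variable mentioned elsewhere in this thread (an assumption: your TensorFlow build honours that variable). Set it in the shell, or at the very top of the entry file before TensorFlow initialises the GPU:

# Sketch: enable allow-growth without editing the training code itself.
# Shell equivalent: export TF_FORCE_GPU_ALLOW_GROWTH=true
import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

import tensorflow as tf  # import only after the variable is set

# ... rest of the original command-line script ...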

I also met this problem
anaconda cloud install of tensorflow-gpu 2.0
rtx2070s
tensorflow-gpu.2.0.0
cuda 10.0.13
cudnn 7.6.5
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

Did you insert:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)

at the top of your entry code?

I had the exact same problem as above. Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

The solution from @robosmith fixed my problem completely!

My specs:
RTX 2070
Ubuntu 18.04 LTE
Tensorflow 2.1.0
Keras 2.3.0
cudnn 7.6.5
cuda10.1.0
conda 4.8.3
python 3.7.7

Built via conda install tensorflow-gpu keras

Thank you so much! This is the first time that I've gotten TF-2 to work at all! And TF-1 stopped working altogether, which is why I decided to upgrade and 'see what happens'!

Thank you!

config.gpu_options.allow_growth = True

When you use tensorflow 2.0, you can use
tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True)
Put this code after import tensorflow as tf but before your own code.

I did try compiling from source, but ran into the same issue. I was finally able to fix my problem by setting config.gpu_options.allow_growth = True.

This code is shared to make it quickly available for both tensorflow and keras users.
source from here

# Tensorflow
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)


#And for Keras
from keras.callbacks import ModelCheckpoint
from keras.models import Model, load_model, save_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking, TimeDistributed, LSTM, Conv1D
from keras.layers import GRU, Bidirectional, BatchNormalization, Reshape
from keras.optimizers import Adam
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
config.log_device_placement = True  # to log device placement (on which device the operation ran)
sess = tf.Session(config=config)
set_session(sess)  # set this TensorFlow session as the default session for Keras

Just wanted to chime in and say that the problem is still there;

My specs:
Ubuntu 20.04
NVIDIA RTX 2070
Nvidia_driver 440.64
Tensorflow-gpu 2.0.1 (Installed through conda, which automatically installs Cudatoolkit and CuDNN in same env)
cudatoolkit 10.1.243
cudnn 7.6.5

Problem is solved by tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True)

However this seems more like a work-around than an actual fix, and a lot of people have 20XX cards these days. Probably there should be an update in which this issue is addressed.

Update: Since I'm dual-booting, I tried to check for windows as well. Problem persists there.
Windows 10
Nvidia-driver 445.87
Other than that everything is similar

Installing the latest driver (445.87) for my RTX 2080 solved this issue for me.

@NBouman That is interesting but for me on Ubuntu 18.04 with GeForce GTX 1050 TI, I just updated to the last available driver 440.82. Still allowing memory growth is required to make it work.

Installing the latest driver (445.87) for my RTX 2080 solved this issue for me.

@NBouman What OS are you using? I'm on Ubuntu 20.04, and the latest available driver I could find is 440.82, and, like @roebel, the problem persists.

@roebel @eduardoscsouza I am on Windows 10 with the machine that earlier had this issue.

Just wanted to chime in and say that the problem is still there;

My specs:
Ubuntu 20.04
NVIDIA RTX 2070
Nvidia_driver 440.64
Tensorflow-gpu 2.0.1 (Installed through conda, which automatically installs Cudatoolkit and CuDNN in same env)
cudatoolkit 10.1.243
cudnn 7.6.5

Problem is solved by tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True)

However this seems more like a work-around than an actual fix, and a lot of people have 20XX cards these days. Probably there should be an update in which this issue is addressed.

Update: Since I'm dual-booting, I tried to check for windows as well. Problem persists there.
Windows 10
Nvidia-driver 445.87
Other than that everything is similar

For tensorflow 2.0.0, this worked:
tf.config.experimental.set_memory_growth(tf.config.experimental.list_physical_devices('GPU')[0],True)

Thank you!!! A thousand thanks!!!

OS: ubuntu 18.04 lts

Driver Version: 435.21

CUDA: cudatoolkit 10.1

CUDNN: cudnn-7.6.5-cuda10.1_0

I used anaconda to install tensorflow:

conda create -n tf-gpu tensorflow-gpu

The cudatoolkit and cudnn are auto-installed by anaconda through the command above.

I have the same problem. The error:

coreClock: 1.5315GHz coreCount: 3 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 44.76GiB/s
2020-05-12 17:58:44.119679: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-12 17:58:44.119694: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-12 17:58:44.119707: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-12 17:58:44.119719: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-12 17:58:44.119732: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-12 17:58:44.119744: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-12 17:58:44.119756: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-12 17:58:44.119819: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 17:58:44.120069: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 17:58:44.120277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-12 17:58:44.120308: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-12 17:58:44.174976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-12 17:58:44.175003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-05-12 17:58:44.175012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-05-12 17:58:44.175136: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 17:58:44.175392: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 17:58:44.175624: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-12 17:58:44.175844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1439 MB memory) -> physical GPU (device: 0, name: GeForce MX150, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-05-12 17:58:44.177113: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55abc3d20b80 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-12 17:58:44.177129: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce MX150, Compute Capability 6.1
2020-05-12 17:58:44.177749: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 376320000 exceeds 10% of system memory.
2020-05-12 17:58:44.787493: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 376320000 exceeds 10% of system memory.
WARNING:tensorflow:Layer my_model is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

2020-05-12 17:58:45.311821: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-12 17:58:45.467966: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-12 17:58:45.904025: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-05-12 17:58:45.913861: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-05-12 17:58:45.913978: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node my_model/conv2d/Conv2D}}]]

So we have here a problem that remains unsolved (apart from a workaround that goes against the official recommendation not to use memory growth, for more efficient memory handling). There has not been much feedback from the dev team. I wonder why?

This bug seems to affect quite a variety of tensorflow versions (1.13, 2.0, 2.1), and if I saw correctly, all problems are reported to happen with cuda 10. The code runs fine on many cards but not on others.
Could somebody on the dev team tell us whether this hints at a problem in the cuda driver rather than in the tensorflow layer? In that case it would certainly be helpful to forward the bug report to the NVIDIA support pages, wouldn't it?

Could somebody from the tensorflow dev team comment on how they see this bug? Is anybody looking into this?

Have people been checking whether there are two cuDNN 7 shared libraries on the PATH or LD_LIBRARY_PATH? The library name carries no minor or patch numbers, but version mismatches can lead to this error message.
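For anyone who wants to check this, here is a small sketch (an assumption-laden example: it is Linux-only, and the two extra search directories are just common default locations, not a definitive list) that lists every libcudnn.so* visible on LD_LIBRARY_PATH and in the usual system paths:

# Sketch: look for duplicate cuDNN 7 shared libraries that might shadow each other.
import glob
import os

search_dirs = os.environ.get("LD_LIBRARY_PATH", "").split(":")
search_dirs += ["/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64"]  # common defaults, adjust as needed

for d in filter(None, dict.fromkeys(search_dirs)):  # keep order, drop empty and duplicate entries
    for lib in sorted(glob.glob(os.path.join(d, "libcudnn.so*"))):
        print(lib, "->", os.path.realpath(lib))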

I opened a bug report at NVIDIA, I'll let you know what comes out of that.

@samhodge
Indeed there are many versions of libcudnn installed; each anaconda env has its own version.
Normally anaconda installs with rpath properly set up, so it is rather difficult to end up with the wrong libraries.

I ran strace and grepped the libraries that are opened when it fails.
They consistently come from the anaconda env dir that hosts the tensorflow package (see below),
besides libcuda, which is version 440.82 and which I installed with the NVIDIA installer.

I can set LD_LIBRARY_PATH to one of the other anaconda env lib dirs with different cudatoolkits and different libcudnn; the trace remains the same.
Note as well that it is not libcudnn that poses the problem. It is always the third libcuXYZ library that is
used, and only on specific GPUs (I have used the same install script on different machines with different GPUs; some work, some don't), and they all work if memory growth is enabled.

(tf2.1) m3088.roebel: (test_sd) 510> grep open trace.log  | grep libcu | grep -v -- -1
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libcuda.so.1", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcublas.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../.././libcublasLt.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcufft.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcurand.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcusolver.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcusparse.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcudnn.so.7", O_RDONLY|O_CLOEXEC) = 11

I got the same problem on Ubuntu 20.04 with a GeForce RTX 2060 SUPER. A NN with dense layers works well. But with CNN layers I'm getting Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
Adding tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True) makes no difference to the error.
I followed the installation according to https://www.tensorflow.org/install/gpu and nvidia-smi shows:
Driver Version: 440.64.00 CUDA Version: 10.2
My conda env has:

cudatoolkit               10.1.243             h6bb024c_0  
cudnn                     7.6.5                cuda10.1_0  
tensorflow-gpu            2.1.0                h0d30ee6_0

In a conda env with tf 1.15 I am getting the same error. It would be great if this could be fixed.

Update

After using export TF_FORCE_GPU_ALLOW_GROWTH=true it all works. I was under the impression that tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True) would do the same thing, but that's not the case. I think this should be clearly stated on the TensorFlow GPU support webpage.

@samhodge
Indeed there are many versions of libcudnn installed; each anaconda env has its own version.
Normally anaconda installs with rpath properly set up, so it is rather difficult to end up with the wrong libraries.

I ran strace and grepped the libraries that are opened when it fails.
They consistently come from the anaconda env dir that hosts the tensorflow package (see below),
besides libcuda, which is version 440.82 and which I installed with the NVIDIA installer.

I can set LD_LIBRARY_PATH to one of the other anaconda env lib dirs with different cudatoolkits and different libcudnn; the trace remains the same.
Note as well that it is not libcudnn that poses the problem. It is always the third libcuXYZ library that is
used, and only on specific GPUs (I have used the same install script on different machines with different GPUs; some work, some don't), and they all work if memory growth is enabled.

(tf2.1) m3088.roebel: (test_sd) 510> grep open trace.log  | grep libcu | grep -v -- -1
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libcuda.so.1", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcublas.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../.././libcublasLt.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcufft.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcurand.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcusolver.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcusparse.so.10", O_RDONLY|O_CLOEXEC) = 11
openat(AT_FDCWD, "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/../../../../libcudnn.so.7", O_RDONLY|O_CLOEXEC) = 11

So you are sort of illustrating my point: libcudnn.so.7 doesn't say 7.XXX.YYY, and on top of that, 7.XXX.YYY has a further dependency on CUDA 10.2, 10.1, 10.0, 9.2, 9.1, 9.0, etc.

I have not seen the error since I started managing the path well, checking the amount of memory available before initialising a graph of a known size, and making sure that the targeted GPU only uses enough memory for the graph plus enough memory to query how much CUDA memory is available.

I think it is a resources problem: how much memory is available when you start the process, and how much memory does your graph use?
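As a sketch of that idea (assumptions: the pynvml package is installed, and the 1 GB headroom is an arbitrary example rather than a recommendation), one can query the free memory at process start and cap TensorFlow's allocation accordingly via the TF 2.x experimental API:

# Sketch: check free GPU memory before building the graph and leave headroom for the CUDA libraries.
import pynvml
import tensorflow as tf

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
free_mb = pynvml.nvmlDeviceGetMemoryInfo(handle).free // (1024 * 1024)
pynvml.nvmlShutdown()

headroom_mb = 1024                         # assumed margin for cuDNN/cuBLAS handles
limit_mb = max(free_mb - headroom_mb, 256)

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Cap TensorFlow to limit_mb MB on the first GPU; must run before the GPU is initialised.
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=limit_mb)])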

@kognat-docs

So you are sort of illustrating my point: libcudnn.so.7 doesn't say 7.XXX.YYY, and on top of that, 7.XXX.YYY has a further dependency on CUDA 10.2, 10.1, 10.0, 9.2, 9.1, 9.0, etc.

The question you posed was "Have people been checking whether there are two cuDNN 7 shared libraries on the PATH or LD_LIBRARY_PATH?" And my answer was: I have checked this, and there is only one.
I've sent you the trace.

I have not seen the error since I started managing the path

What do you mean by managing the path?
I always manage my paths! I have installed a conda environment which I verified to be consistent! Everything is as it was packaged by anaconda; I verified this.

Anyway, you may believe I am too stupid to set up anaconda. Well,
I have now downloaded the official docker image
I now have downloaded the official docker image

tensorflow/tensorflow:2.1.0-gpu-py3

and run my script in there. It crashes if I don't have

export TF_FORCE_GPU_ALLOW_GROWTH=true

Can I manage paths any better?

and managing the amount of memory available before initialising a graph of a known size and making sure that the targeted GPU only used enough memory for the graph and enough memory to query how much CUDA memory is available.

I think it is a resources problem. How much memory is available when you start the process and how much memory does your graph use?

As I wrote above in my report, there is no graph (or better said, there is hardly a graph)! I just run these four lines:

import tensorflow as tf
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.matmul(tf.zeros((2,2,2)), tf.zeros((2,2,2)))
tf.nn.conv2d(tf.zeros((2,20,20,20), dtype=tf.float32), filters=tf.zeros((2,2,20,20), dtype=tf.float32), strides=(1,1,1,1), padding="VALID")

and it crashes. If I change the order of the three lines it always crashes after these three operations (I had explained this in my bug report).

Just for the fun of it, I counted the bytes: there is <83kB of data memory required. The GPU is empty, I don't use it for graphics, and there are no other processes running on it. On the various systems there are 4GB or 11GB available! Besides, I know how to run nvidia-smi! So the card is empty, and still I cannot run these 4 lines that require ~84kB!

Just for your information, an error due to exhausted memory looks quite different; I get those as well. For my real graphs, I am perfectly able to detect those and react accordingly.

Thanks for your efforts anyway.

@roebel Did you see @sanjoy's comment about debugging from the C++ side? https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-561366750

I haven't gotten around to recompiling tensorflow and trying it out. Their versions move so fast it would take me a bit to set up and compile everything. Plus, 1.15 dropped support for the gcc version I use, and 1.13 doesn't receive any updates, so it was somewhat pointless for me to debug this anyway.

@roebel I do not recall what triggered the problem for you.

see this https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-480549043

Which is why I thought it was memory related. This issue has not affected me for some time, nor the users of my software on a variety of platforms.

@samhodge

Yes, I understand; if there is a bug, it does seem to be triggered only by a rather particular situation.

@odinsbane

Thanks, no, I had not noticed that. I will see whether I manage to compile the most recent version, tf 2.2.0.

In fact I tried the docker image with tensorflow 2.2; it uses the same cuda version 10.1 and has the same problem.

I thought this was a Windows-only problem, so I installed an ubuntu environment from scratch, only to find out it's my graphics card (RTX 2080) that is the issue. Unfortunately, I think I'm going to pick a different machine learning platform because of this, since it seems to have been a problem since 2018.

@kognat-docs

and managing the amount of memory available before initialising a graph of a known size and making sure that the targeted GPU only used enough memory for the graph and enough memory to query how much CUDA memory is available.

I think it is a resources problem. How much memory is available when you start the process and how much memory does your graph use?

As I wrote above in my report, there is no graph (or better said, there is hardly a graph)! I just run these four lines:

import tensorflow as tf
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.matmul(tf.zeros((2,2,2)), tf.zeros((2,2,2)))
tf.nn.conv2d(tf.zeros((2,20,20,20), dtype=tf.float32), filters=tf.zeros((2,2,20,20), dtype=tf.float32), strides=(1,1,1,1), padding="VALID")

and it crashes. If I change the order of the three lines it always crashes after these three operations (I had explained this in my bug report).

Just for the fun of it, I counted the bytes: there is <83kB of data memory required. The GPU is empty, I don't use it for graphics, and there are no other processes running on it. On the various systems there are 4GB or 11GB available! Besides, I know how to run nvidia-smi! So the card is empty, and still I cannot run these 4 lines that require ~84kB!

Did you observe how much memory was in use by running watch on nvidia-smi, with an interval of 50 ms, while your process was running?
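For reference, a small polling sketch of that suggestion (assumptions: nvidia-smi is on the PATH, and this is meant to run in a second terminal while the training process is active), roughly equivalent to watch -n 0.05 nvidia-smi:

# Sketch: sample GPU memory usage every ~50 ms via nvidia-smi.
import subprocess
import time

for _ in range(200):  # about 10 seconds of samples
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,noheader"])
    print(out.decode().strip())
    time.sleep(0.05)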

See this fix that worked for other people

https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-497202806

So you can do the patch without touching the code just by altering your runtime environment.

Another way to enable this option is to set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform-specific.

@sanjoy @odinsbane

Good news!
Following
https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-561366750

I rebuilt the version 2.1 using the anaconda tensorflow recipe from here
https://github.com/AnacondaRecipes/tensorflow_recipes

I added two prints in MinSystemMemory showing available_memory and min_system_memory.
On my system with a GeForce GTX 1050 Ti, with the standard TF logging disabled,
I got this:

TF_CPP_MIN_LOG_LEVEL=2 python run_cuda.py 
=========================================================
MinSystemMemory: available_memory::4163764224
MinSystemMemory: min_system_memory::314572800
=========================================================
1 Physical GPUs, 1 Logical GPUs
2020-05-21 09:44:32.143642: E tensorflow/stream_executor/cuda/cuda_fft.cc:223] failed to make cuFFT batched plan:5
2020-05-21 09:44:32.143671: E tensorflow/stream_executor/cuda/cuda_fft.cc:426] Initialize Params: rank: 1 elem_count: 512 input_embed: 512 input_stride: 1 input_distance: 512 output_embed: 257 output_stride: 1 output_distance: 257 batch_count: 20
2020-05-21 09:44:32.143677: F tensorflow/stream_executor/cuda/cuda_fft.cc:435] failed to initialize batched cufft plan with customized allocator: Failed to make cuFFT batched plan.
Aborted

nvidia-smi reports the GPU has 4040MiB; on this system X is running on the card and uses 13MiB, so the numbers seem fine.

min_system_memory is set like this

    min_system_memory =
        std::max(int64{314572800}, static_cast<int64>(available_memory * 0.05));

So the larger of the two values is chosen anyway. Instead, I added a mechanism to force min_system_memory via the environment variable TF_FORCE_MIN_SYSTEM_MEMORY_MB.
Then running

TF_FORCE_MIN_SYSTEM_MEMORY_MB=310 TF_CPP_MIN_LOG_LEVEL=2 python run_cuda.py 
=========================================================
MinSystemMemory: available_memory::4163764224
MinSystemMemory: min_system_memory::314572800
MinSystemMemory: forced min_system_memory::325058560
=========================================================
1 Physical GPUs, 1 Logical GPUs
done

the problem is solved!

Unfortunately I don't currently have a system with a working RTX card, and I am not sure when those will be working again. If anybody is willing to test this on such a card, I could provide the pip package and the contents of the conda environment for ubuntu linux that needs to be installed to run it.

Nice one @roebel !

Might be worth suggesting that as a pull request and adding it to the docs.

@samhodge @sanjoy @odinsbane

Might be worth suggesting that as a pull request and add to the docs.

Sure, but the problem is that the solution will probably not work for other cards.
For my GTX 1050 the total memory is 4GB, and the default system memory retained
by tensorflow is max(300MB, 4GB*0.05). So for the GTX 1050 this is 300MB, which apparently is too small. As mentioned above, I need to increase it to 310MB.

Now for the RTX 2080 the total memory is 11GB, so max(300MB, 11GB*0.05)
will select the system memory to be about 550MB, which according to the findings on the 1050
should normally be enough.

I will have access to the RTX2080 GPUs again by the end of the week and will see
what I get there.

@samhodge @sanjoy @odinsbane

Finally I have been able to run the patched library on the RTX 2080 cards.
As expected, the patched version does not pass. Here again is the script:

import tensorflow as tf
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.matmul(tf.zeros((2,2,2)), tf.zeros((2,2,2)))
tf.nn.conv2d(tf.zeros((2,20,20,20), dtype=tf.float32), filters=tf.zeros((2,2,20,20), dtype=tf.float32), strides=(1,1,1,1), padding="VALID")

And here is the matrix of the available memory reported by gpu_device.cc, the
default value of min_system_memory as selected in gpu_device.cc, and the
minimum value of min_system_memory I need to select for the script not to abort:

Card | AvailMem | Def MinSysMem | Required MinSysMem
:-------|:-----------|:----------|:-----------------------
1050 TI | 4163764224 | 314572800 | 325058560
1080 TI | 11567431680 | 578371584 | 335544320
2080 TI | 11381964800 | 569098240 | 618659840

So while the 1050 and 1080 run the script with about the same system memory size,
the RTX 2080 requires nearly twice as much memory. This does not sound good
to me.

Any suggestions on what to try to get this down to a comparable value?
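As a quick numeric sanity check of the "Def MinSysMem" column above, here is a small sketch that just re-applies the max(300 MB, 5% of available memory) rule quoted earlier from gpu_device.cc to the reported available_memory values:

# Sketch: reproduce the default min_system_memory computation from gpu_device.cc.
def min_system_memory(available_bytes):
    return max(314572800, int(available_bytes * 0.05))

print(min_system_memory(4163764224))    # GTX 1050 Ti: 314572800 (the 300 MB floor wins)
print(min_system_memory(11381964800))   # RTX 2080 Ti: 569098240 (~543 MiB)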

@roebel

I have struggled with this in my C++ application for a number of iterations.

What it came down to in the end was the following.

Only run models on the GPU when enough memory is available to run the model.

So the amount of memory that the model will require is quantifiable.

So you need to have a GPU memory as a percentage which will fit that model.

Then you also need to know about how much memory is available on the card exactly before allocating the memory, which is subject to race conditions, because you don't know what else is using CUDA memory at the same time on the operating system.

But the race condition aside, you also need to measure the memory free.

This is done by using cudaMemInfo, which in itself uses memory.

So, provided you have enough memory to run cudaMemInfo once to measure, and you make sure that enough memory is free to fit the model and to run cudaMemInfo one more time, then and only then can you allocate a large enough share of the available VRAM on that card to run the model.

Anyway, the take-home from my random babbling is that cudaMemInfo is required to poll the amount of memory available to allocate, and it itself also uses some of that available memory.

Maybe the amount of memory used by cudaMemInfo is different on a Turing-based card compared to a Pascal-based card; I can get someone from NVIDIA to have a look if you wish.

Yeah, I cannot find a reference to cudaMemInfo at all, but that seems like the kind of footprint which would be covered by the max of 300MB and 5% of the card's memory.

Having a look at:

https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/core/common_runtime/gpu/gpu_process_state.cc

it doesn't seem like it is using this per se.

I don't think we should be playing cat-and-mouse with the amount of memory we need to reserve for system libraries -- as you've observed, there is no systematic way to get this right.

Instead, IMO we should try to initialize the system libraries before the BFC allocator has had a chance to allocate the rest of the GPU's memory.

CC @chsigg

Probably one should do this only if allow memory growth is off. Otherwise you will always need about 580MB on the 2080, even if you don't need all the operators.

I made a few more tests concerning the minimum system memory required for running combinations of the three operations from my test case. I compare only the 1080 and 2080 cards. You don't find Conv2D alone because it initializes BLAS in any case. The results:

GPU | MatMul | STFT | Conv2D+MatMul | MatMul+STFT | MatMul+STFT+Conv2D
:---|:---|:---|:---|:---|:---
1080 | 140MB | 130MB | 290MB | 170MB | 320MB
2080 | 190MB | 190MB | 520MB | 250MB | 580MB

One can see that on the 2080, CUDA requires a larger overhead for each operation, and that this overhead increases when using more libraries. In most cases the overhead is <100MB, but it becomes >220MB once Conv2D is involved.

If @samhodge has contacts at NVIDIA, I would personally find it interesting to hear whether this is intended.

Hello everyone!
I have solved a similar problem by limiting memory growth, and you can try it.

You can find the code in the section Limit memory growth.

(This is my first comment on GitHub)

I had a similar issue before. Limiting GPU memory manually helped. https://github.com/tensorflow/tensorflow/issues/25160#issuecomment-643703167

I got the same problem on Ubuntu 20.04 with a GeForce RTX 2060 SUPER. An NN with dense layers works well, but with CNN layers I'm getting Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
Adding tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True) makes no difference to the error.
I followed the installation according to https://www.tensorflow.org/install/gpu and nvidia-smi shows:
Driver Version: 440.64.00 CUDA Version: 10.2
My conda env has:

cudatoolkit               10.1.243             h6bb024c_0  
cudnn                     7.6.5                cuda10.1_0  
tensorflow-gpu            2.1.0                h0d30ee6_0

In a conda env with tf 1.15 I am getting the same error. It would be great if this could be fixed.

Update

After using export TF_FORCE_GPU_ALLOW_GROWTH=true it all works. I was under the impression that tf.config.experimental.set_memory_growth(tf.config.list_physical_devices('GPU')[0], True) would do the same thing, but that's not the case. I think this should be clearly stated on the TensorFlow GPU support webpage.
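In case it helps others: the environment variable has to be visible to the process before TensorFlow initializes the GPU device, so one sketch (if you cannot or do not want to export it in the shell) is to set it at the very top of the script:

import os
# Must be set before the GPU device is created, i.e. before the first GPU op;
# setting it before the TensorFlow import is the safest place.
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

import tensorflow as tf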

Dude, your solution saves my life.

Nvidia just released the 440.100 and 450.51(Beta) Linux display drivers.
I tried out the 440.100, and it didn't fix the issue. Has anyone tried out the beta 450.51?

@eduardoscsouza

Nvidia just released the 440.100 and 450.51(Beta) Linux display drivers.
I tried out the 440.100, and it didn't fix the issue. Has anyone tried out the beta 450.51?

I tried 450.36.06. Check https://github.com/tensorflow/tensorflow/issues/25160#issuecomment-643703167.

The code that worked for me:

import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)

> _(quotes the original issue report in full; omitted here — see the top of this thread)_

This worked for me.
RTX 2060
Ubuntu 18.04
Python 3.6

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
sess = InteractiveSession(config=config)
with sess.as_default():
    ...  # run your model here

Hello @bm777

Following my investigation from a few months ago, I summarize how I understand the problem:

GPU model and memory: RTX 2070 8GB
... which shouldn't be the case as I have 32GB of RAM and 64GB of

The problem is not the system memory, the problem is the GPU memory!

os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

works because it does not use the GPU!

A few explanations:

TF has two modes of operation:

  1. allow memory growth = false: In this case TF pre-allocates some memory for the system libraries using a rough guess of
    how much memory is needed. As you can read here https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-633953715, TF uses the formula max(300MB, GPU-MEM * fac) for this guess. For TF2.1, fac = 0.05; for TF2.2, if I
    remember right, fac = 0.07. So with your 8GB you get 400MB of pre-allocated GPU memory under TF2.1
    and 560MB under TF2.2.

    I have experimentally evaluated the necessary pre-allocated memory for a few GPUs under TF2.1 here: https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-637715002

    It turns out that for Conv2D operations I needed 520MB there; the pre-allocated memory would be less than that under TF2.1 but more under TF2.2. Unfortunately you don't mention your TF version, but I assume you use TF2.1. If you use TF2.2 and it still fails, this might be because you use a different GPU. Anyway, the fact is it fails. See below.

  2. allow memory growth = true: TF does not use any pre-allocated memory and loads the libraries as they come. In the TF documentation this is declared as problematic due to potential memory fragmentation and is therefore off by default.

My take:

Given the large range of memory required for the libraries, which depends on the operations you perform as well as on the GPU you have, it seems very difficult to get the mode allow memory growth = false right (see https://github.com/tensorflow/tensorflow/issues/24496#issuecomment-637950411). The current solution, increasing the size of the pre-allocated memory (which was done for TF2.2), is problematic if your GPU is rather small: it blocks memory from use under the assumption that you will need all available libraries (BLAS, Conv, FFT, and I don't know whether there are others). If you don't use all of these, the pre-allocated memory is wasted, in turn reducing the model size you can load for your application.

On the other hand, I believe the memory fragmentation problem can be prevented by creating models early, forcing the system libraries to load before starting the training. This seems to be what happens in most cases anyway, so it seems beneficial, especially for GPUs with small memory and especially for training a single model, not to pre-allocate but to use allow memory growth = true.

Personally I use GPUs with memory ranging from 4GB to 11GB and, following the argument above, I have set TF_FORCE_GPU_ALLOW_GROWTH=true for all of them. So far I have not had any problems with that.
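To make the "load the libraries early" idea concrete, here is a small sketch (my own, not an official recipe): with memory growth enabled, run one tiny op per library family right after startup, so cuBLAS/cuFFT/cuDNN get initialized before the model claims the rest of the card:

import tensorflow as tf

for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# Tiny warm-up ops touching cuBLAS, cuFFT and cuDNN respectively.
tf.matmul(tf.zeros((2, 2)), tf.zeros((2, 2)))
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.nn.conv2d(tf.zeros((1, 8, 8, 1)), tf.zeros((2, 2, 1, 1)),
             strides=1, padding="VALID")

# ... build and train the actual model afterwards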

Hello @roebel

Me too, I was thinking the allocation errors were about system memory. It is clear to me now: it is the GPU memory.
Now my GPU memory usage looks good.

In the past, I tested many options to pre-allocate memory 😢:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
try:
    # memory_limit is in MB; note the list around VirtualDeviceConfiguration
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5044)])
    """process...."""
except Exception as e:
    raise e

Personally I use a GPU with 6GB of memory.
And thank you @roebel for this new flag TF_FORCE_GPU_ALLOW_GROWTH=true to force my GPU to grow its allocation 😊.

I had this same issue. I can say with certainty that the problem only occurs on my RTX 2070, and NOT on a Titan RTX, running exactly the same code.

https://github.com/DeepLabCut/DeepLabCut/issues/837

Just upgrade to Tensorflow 2.3 with CUDA 11 and cudnn 8.0. It magically solved all my problems and I don't even need the workaround with config.gpu_options.allow_growth = True now.

Unfortunately, I need to run code that only supports TensorFlow 1.x.

Just upgrade to Tensorflow 2.3 with CUDA 11 and cudnn 8.0. It magically solved all my problems and I don't even need the workaround with config.gpu_options.allow_growth = True now.

Upgrading from 2.2 to 2.3, even with an explicit TF_FORCE_GPU_ALLOW_GROWTH=false, solved this for me as well (at least for now I am able to run the DELF demo code; I have not tested anything else).

I am still on CUDA 10.1, Cudnn 7.6.5.

Is there a fix for this issue with TensorFlow 2 and Python 3?

I have an RTX 2080 and I am getting this message:


2020-08-20 12:38:27.172496: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-08-20 12:38:27.177708: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/home/anantha/Desktop/RaspiCar/car.py", line 85, in <module>
    tnet.train(x, y)
  File "/home/anantha/Desktop/RaspiCar/car.py", line 65, in train
    self.model.fit(x, y, epochs=epochs)
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 848, in fit
    tmp_logs = train_function(iterator)
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1661, in _filtered_call
    return self._call_flat(
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1745, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 593, in call
    outputs = execute.execute(
  File "/home/anantha/.local/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node sequential/conv2d/Conv2D (defined at /Desktop/RaspiCar/car.py:65) ]] [Op:__inference_train_function_951]

Function call stack:
train_function

In case your problem has the same origin as the problems treated in the present issue (which I cannot know from your report), there are a few solutions that you can easily find by reading the last 10-20 posts in this thread.

I fixed it with this:

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
sess.as_default()

I had this same issue with an RTX 2080. Then the following code worked for me.

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

Thanks everyone

I think we can stop posting the allow_growth fix now :)

RTX 2070 here. I was getting this error, but running with TF_FORCE_GPU_ALLOW_GROWTH=true (which, as other commenters have pointed out, fixes it for them) changes the error message to an out-of-memory error (even though I've got plenty of memory):

2020-10-17 16:35:11.717658: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 3.87G (4159818752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory

But my GPU has 8GB and only about 250MB were in use before I started the process. So I don't understand why it can't allocate 3.87GB. (Lowering the batch size had no effect, and the weights HDF5 file is less than 200MB.)

TF_FORCE_GPU_ALLOW_GROWTH=true worked for me.
tf.config.experimental.set_memory_growth(gpu, True) worked too.

Here is my configuration:
GPU GTX 1650
cuda-10-1 10.1.243-1
libcudnn7 7.6.5.32-1+cuda10.1
Ubuntu 18.04.5 LTS

Whoever cannot set the environment variable could try this, as suggested in https://www.tensorflow.org/guide/gpu:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

Typing the command mentioned below into the terminal just worked for me.

https://github.com/tensorflow/tfjs/issues/671#issuecomment-494832790

Just upgrade to Tensorflow 2.3 with CUDA 11 and cudnn 8.0. It magically solved all my problems and I don't even need the workaround with config.gpu_options.allow_growth = True now.

It seems that the issue was noticed and solved in TensorFlow 2.3.0.

  • CUDA 10.1
  • GPU: Quadro RTX 6000
  • Tensorflow 2.2.0
  • cudnn 7.6.5

Same problem:
tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

And the workaround allow_growth = True does not help.

After I upgraded TensorFlow to 2.3.0, the problem disappeared, even without adding the line allow_growth = True.

OK, I made it work in tf-nightly-gpu-2.0-preview and an IPython notebook by adding this to my code:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

It works in my case.
