Tensorflow: error using multiple GPUs, related to tf.Variable pinned to CPU

Created on 9 May 2016  ·  3 Comments  ·  Source: tensorflow/tensorflow

Environment info

Operating System: Ubuntu 14.04

Installed version of CUDA and cuDNN: 7.5 and 4.0.7
(please attach the output of ls -l /path/to/cuda/lib/libcud* ):

If installed from sources, provide the commit hash: 4a4f2461533847dde239851ecebe5056088a828c

Steps to reproduce

Run the following code:

import tensorflow as tf

def main():
    a = tf.Variable(1)
    init_a = tf.initialize_all_variables()
    with tf.Session() as sess:
        sess.run(init_a)

    with tf.device("/gpu:0"):
        b = tf.constant(2)
        init_b = tf.initialize_all_variables()
    with tf.Session() as sess:
        sess.run(init_b)

    with tf.device("/cpu:0"):
        c = tf.Variable(2)
        init_c = tf.initialize_all_variables()
    with tf.Session() as sess:
        sess.run(init_c)

    with tf.device("/gpu:0"):
        d = tf.Variable(2)
        init_d = tf.initialize_all_variables()
    with tf.Session() as sess:
        sess.run(init_d)

if __name__ == '__main__':
    main()

Logs or other output that would be helpful

(If logs are large, please upload as attachment).

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.266
pciBusID 0000:05:00.0
Total memory: 12.00GiB
Free memory: 11.02GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties: 
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.2785
pciBusID 0000:09:00.0
Total memory: 4.00GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y N 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 1:   N Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:756] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:756] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 980, pci bus id: 0000:09:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:756] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:756] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 980, pci bus id: 0000:09:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:756] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:756] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 980, pci bus id: 0000:09:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:756] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:756] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 980, pci bus id: 0000:09:00.0)
Traceback (most recent call last):
  File "test_multi_gpu.py", line 30, in <module>
    main()
  File "test_multi_gpu.py", line 26, in main
    sess.run(init_d)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 332, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 572, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 652, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 672, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'Variable_2': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available
     [[Node: Variable_2 = Variable[container="", dtype=DT_INT32, shape=[], shared_name="", _device="/device:GPU:0"]()]]
Caused by op u'Variable_2', defined at:
  File "test_multi_gpu.py", line 30, in <module>
    main()
  File "test_multi_gpu.py", line 23, in main
    d = tf.Variable(2)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 211, in __init__
    dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 292, in _init_from_args
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/state_ops.py", line 139, in variable_op
    container=container, shared_name=shared_name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 351, in _variable
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 693, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2177, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1161, in __init__
    self._traceback = _extract_stack()

I also noticed that the documentation on using GPUs does not mention tf.Variable; it only covers tf.constant and tf.matmul.

OK, I found the relevant documentation in [Convolutional Neural Networks](https://www.tensorflow.org/versions/r0.8/tutorials/deep_cnn/index.html),
which says:

All variables are pinned to the CPU and accessed via tf.get_variable() in order to share them in a multi-GPU version. See how-to on Sharing Variables.

I want to ask: since tf.Variable is pinned to the CPU by TensorFlow, can this error be fixed? Or do we need to be very careful to keep every tf.Variable declaration outside of a with tf.device('/gpu:xx') scope, or to wrap it in a nested with tf.device(None)?
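To make the nested-scope idea concrete, here is a toy model in plain Python of how an inner device(None) could clear an outer device request. This is only a sketch and not TensorFlow code: the names device, current_device and make_op are hypothetical, and it assumes the innermost scope simply wins, which is simpler than how TensorFlow merges partial device specifications.

```python
import contextlib

# Toy model of a device-scope stack. This is NOT TensorFlow code;
# all names here are hypothetical and exist only for illustration.
_device_stack = []


@contextlib.contextmanager
def device(name):
    """Push a device request; None means "no device constraint"."""
    _device_stack.append(name)
    try:
        yield
    finally:
        _device_stack.pop()


def current_device():
    """Return the innermost device request, or None if unconstrained."""
    return _device_stack[-1] if _device_stack else None


def make_op(op_name):
    """Record which device an op is requested on at creation time."""
    return (op_name, current_device())


with device("/gpu:0"):
    matmul = make_op("MatMul")      # requested on /gpu:0
    with device(None):
        var = make_op("Variable")   # inner None clears the outer request
    relu = make_op("Relu")          # back under the /gpu:0 request

print(matmul, var, relu)
```

In this model only the op created inside the inner scope is left unconstrained, which is the pattern the question above describes.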

All 3 comments

So there are some operations that cannot be placed under tf.device(), for example tf.nn.local_response_normalization().
See the code below:

    with tf.device("/gpu:0"):
        d = tf.placeholder("float", shape=[100, 100, 100, 10])
        with tf.device(None):
            lrn1 = tf.nn.local_response_normalization(d, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75)
        lrn2 = tf.nn.local_response_normalization(d, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75)
        init_d = tf.initialize_all_variables()
    with tf.Session() as sess:
        sess.run(init_d)
        r = np.random.randn(100, 100, 100, 10)
        sess.run(lrn1, feed_dict={d: r})  # Runs OK: lrn1 was created under tf.device(None)
        sess.run(lrn2, feed_dict={d: r})  # Error: lrn2 is pinned to /gpu:0

The result is below:

Traceback (most recent call last):
  File "test_multi_gpu.py", line 44, in <module>
    main()
  File "test_multi_gpu.py", line 40, in main
    sess.run(lrn2, feed_dict={d: r})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 332, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 572, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 652, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 672, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'LRN_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available
     [[Node: LRN_1 = LRN[alpha=0.0001, beta=0.75, bias=1, depth_radius=5, _device="/device:GPU:0"](Placeholder)]]
Caused by op u'LRN_1', defined at:
  File "test_multi_gpu.py", line 44, in <module>
    main()
  File "test_multi_gpu.py", line 34, in main
    lrn2 = tf.nn.local_response_normalization(d, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 737, in lrn
    bias=bias, alpha=alpha, beta=beta, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 693, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2177, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1161, in __init__
    self._traceback = _extract_stack()

I think the cause of this error is fairly clear. tf.nn.local_response_normalization has an internal tf.Variable that we cannot reach from outside code, so we cannot keep the op as a compute node on the specified GPU while excluding its internal variables.

For now, I think TensorFlow should do one of the following two things:

  1. Make tf.Variable independent of tf.device(). (This may be preferable.)
  2. List the operations that need tf.device(None), to help users get their code working, right?

The high-level problem should be addressed by ongoing work. (Making tf.Variable ignore tf.device() would not work, because many of our users, especially in distributed settings, use it to configure parameter servers.) In the short term, try using soft placement in your session constructor:

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    # ...
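For readers unsure what soft placement changes, here is a toy model in plain Python. It is a sketch, not TensorFlow's actual placer: the KERNELS table and the place function are hypothetical, and the table only loosely mirrors the errors in this thread (an int32 Variable and LRN without a GPU kernel).

```python
# Toy model of device placement. This is NOT TensorFlow's real placer.
# Hypothetical kernel table, loosely mirroring the errors in this thread.
KERNELS = {
    "Variable[int32]": {"CPU"},
    "LRN": {"CPU"},
    "Const": {"CPU", "GPU"},
}


def place(op, requested=None, allow_soft_placement=False):
    """Return the device type an op ends up on under this toy model."""
    supported = KERNELS[op]
    if requested is None:
        # No constraint: prefer GPU when a kernel exists, else CPU.
        return "GPU" if "GPU" in supported else "CPU"
    if requested in supported:
        return requested
    if allow_soft_placement:
        # Soft placement: fall back to a device that has a kernel.
        return sorted(supported)[0]
    raise ValueError(
        "Cannot assign a device to node %r: no supported kernel for %s"
        % (op, requested))


print(place("Const", "GPU"))
print(place("Variable[int32]", "GPU", allow_soft_placement=True))
try:
    place("LRN", "GPU")
except ValueError as err:
    print("hard placement failed:", err)
```

With hard placement the explicit GPU request fails for ops without a GPU kernel, as in the tracebacks above; with soft placement the same request falls back to a device that does have one.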

Thanks for your suggestion; it looks like using allow_soft_placement=True solves the problem. As noted in #2292, it would be better to improve the relevant documentation so that users know about this.
