Detectron: RuntimeError: [enforce fail at conv_op_cudnn.cc:811] status == CUDNN_STATUS_SUCCESS. 8 vs 0. , Error at: /pytorch/caffe2/operators/conv_op_cudnn.cc:811: CUDNN_STATUS_EXECUTION_FAILED

Created on 2019-06-05  ·  3 comments  ·  Source: facebookresearch/Detectron

Ubuntu 18.04
CUDA 10.0
This error occurs when I run infer_simple.py. Does anyone know how to fix it?

Screenshot from 2019-06-05 11-37-03

All 3 comments

Caffe2 officially supports CUDA 9.0 and CUDA 8.0.
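
To compare an installed toolkit against the versions named in this comment, a minimal sketch might look like the following. It assumes the `release X.Y` wording that `nvcc --version` prints; the sample string and the `SUPPORTED_CUDA` set are illustrative and not an authoritative support matrix.

```python
import re

# Versions named in the comment above. This set is an assumption taken
# from that comment, not an official compatibility table.
SUPPORTED_CUDA = {"8.0", "9.0"}


def cuda_release(nvcc_output):
    """Return the 'X.Y' release found in `nvcc --version` text, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def is_supported(nvcc_output):
    """True if the reported CUDA release is one of the versions listed above."""
    return cuda_release(nvcc_output) in SUPPORTED_CUDA


# Illustrative sample line; on a real machine, capture the output of
# `nvcc --version` (for example with subprocess.run) and pass it in.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
print(cuda_release(sample), is_supported(sample))
```

With the CUDA 10.0 setup from the original report, `is_supported` returns `False`, which matches the point this comment is making.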

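I need help with this too. I just upgraded a machine from two 1070 GPUs in SLI to a single RTX 2080 Ti, and the configuration that worked with the two cards now throws the same error. This is not a CUDA 10.0 support problem, because I was already running CUDA 10.0 before the upgrade. I am also using cuDNN 7.5.1.0. Please help: I have already tried reinstalling CUDA, cuDNN, and PyTorch. I also rebuilt PyTorch from source, which went smoothly; the build found CUDA and cuDNN and finished without errors. (Ubuntu 16.04)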

INFO net.py: 133: res2_0_branch2a_b preserved in workspace (unused)
INFO net.py: 133: res4_9_branch2c_b preserved in workspace (unused)
INFO net.py: 133: res4_7_branch2a_b preserved in workspace (unused)
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000110464 secs
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 9.2788e-05 secs
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 1.9226e-05 secs
INFO infer_simple.py: 147: Processing demo/24274813513_0cfd2ce6d0_k.jpg -> /tmp/detectron-visualizations/24274813513_0cfd2ce6d0_k.jpg.pdf
[I net_async_base.h:205] Using specified CPU pool size: 4; device id: -1
[I net_async_base.h:210] Created new CPU pool, size: 4; device id: -1
[E net_async_base.cc:377] [enforce fail at conv_op_cudnn.cc:811] status == CUDNN_STATUS_SUCCESS. 8 vs 0. , Error at: /pytorch/caffe2/operators/conv_op_cudnn.cc:811: CUDNN_STATUS_EXECUTION_FAILED
Error from operator: 
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "stride" i: 2 } arg { name: "order" s: "NCHW" } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f143fc74441 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x49 (0x7f143fc74259 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #2: <unknown function> + 0x161e4ff (0x7f13e427c4ff in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: <unknown function> + 0x1620424 (0x7f13e427e424 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x456 (0x7f13e4284216 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x198 (0x7f13e4271278 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: <unknown function> + 0x157d9f5 (0x7f13e41db9f5 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f1418f1a1f4 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: <unknown function> + 0x18e7669 (0x7f1418f20669 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x253 (0x7f143fc6e723 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #10: <unknown function> + 0xb8c80 (0x7f1445134c80 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #11: <unknown function> + 0x76ba (0x7f144c18f6ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x6d (0x7f144bec541d in /lib/x86_64-linux-gnu/libc.so.6)
,  op Conv
[E net_async_base.cc:129] Rethrowing exception from the run of 'generalized_rcnn'
WARNING workspace.py: 218: Original python traceback for operator `0` in network `generalized_rcnn` in exception above (most recent call last):
WARNING workspace.py: 223:   File "tools/infer_simple.py", line 185, in <module>
WARNING workspace.py: 223:   File "tools/infer_simple.py", line 135, in main
WARNING workspace.py: 223:   File "/home/qbeer666/Detectron/detectron/core/test_engine.py", line 327, in initialize_model_from_cfg
WARNING workspace.py: 223:   File "/home/qbeer666/Detectron/detectron/modeling/model_builder.py", line 124, in create
WARNING workspace.py: 223:   File "/home/qbeer666/Detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
WARNING workspace.py: 223:   File "/home/qbeer666/Detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
WARNING workspace.py: 223:   File "/home/qbeer666/Detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
WARNING workspace.py: 223:   File "/home/qbeer666/Detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
WARNING workspace.py: 223:   File "/home/qbeer666/Detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
WARNING workspace.py: 223:   File "/home/qbeer666/Detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
WARNING workspace.py: 223:   File "/home/qbeer666/Detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
WARNING workspace.py: 223:   File "/home/qbeer666/Detectron/detectron/modeling/ResNet.py", line 99, in add_ResNet_convX_body
WARNING workspace.py: 223:   File "/home/qbeer666/Detectron/detectron/modeling/ResNet.py", line 252, in basic_bn_stem
WARNING workspace.py: 223:   File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/cnn.py", line 97, in Conv
WARNING workspace.py: 223:   File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/brew.py", line 108, in scope_wrapper
WARNING workspace.py: 223:   File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/helpers/conv.py", line 186, in conv
WARNING workspace.py: 223:   File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/helpers/conv.py", line 139, in _ConvBase
Traceback (most recent call last):
  File "tools/infer_simple.py", line 185, in <module>
    main(args)
  File "tools/infer_simple.py", line 153, in main
    model, im, None, timers=timers
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 111, in NamedCudaScope
    yield
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 118, in GpuNameScope
    yield
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/scope.py", line 48, in NameScope
    yield
  File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 118, in GpuNameScope
    yield
  File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 111, in NamedCudaScope
    yield
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 126, in CudaScope
    yield
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/scope.py", line 82, in DeviceScope
    yield
  File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 126, in CudaScope
    yield
  File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 111, in NamedCudaScope
    yield
  File "tools/infer_simple.py", line 153, in main
    model, im, None, timers=timers
  File "/home/qbeer666/Detectron/detectron/core/test.py", line 66, in im_detect_all
    model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
  File "/home/qbeer666/Detectron/detectron/core/test.py", line 158, in im_detect_bbox
    workspace.RunNet(model.net.Proto().name)
  File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/workspace.py", line 250, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/workspace.py", line 211, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at conv_op_cudnn.cc:811] status == CUDNN_STATUS_SUCCESS. 8 vs 0. , Error at: /pytorch/caffe2/operators/conv_op_cudnn.cc:811: CUDNN_STATUS_EXECUTION_FAILED
Error from operator: 
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "stride" i: 2 } arg { name: "order" s: "NCHW" } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f143fc74441 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x49 (0x7f143fc74259 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #2: <unknown function> + 0x161e4ff (0x7f13e427c4ff in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: <unknown function> + 0x1620424 (0x7f13e427e424 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x456 (0x7f13e4284216 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x198 (0x7f13e4271278 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: <unknown function> + 0x157d9f5 (0x7f13e41db9f5 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f1418f1a1f4 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: <unknown function> + 0x18e7669 (0x7f1418f20669 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x253 (0x7f143fc6e723 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #10: <unknown function> + 0xb8c80 (0x7f1445134c80 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #11: <unknown function> + 0x76ba (0x7f144c18f6ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x6d (0x7f144bec541d in /lib/x86_64-linux-gnu/libc.so.6)
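
The `8 vs 0.` in the enforce message is the returned cuDNN status compared with the expected one. As a rough sketch, the numbers can be mapped back to names as below. The table is only a subset of `cudnnStatus_t` as numbered in cuDNN 7.x (an assumption; other releases may differ), and `decode_enforce` is a hypothetical helper, not part of Caffe2.

```python
import re

# Subset of the cudnnStatus_t enum as numbered in cuDNN 7.x.
# Assumption: other cuDNN releases may number these differently.
CUDNN_STATUS = {
    0: "CUDNN_STATUS_SUCCESS",
    1: "CUDNN_STATUS_NOT_INITIALIZED",
    2: "CUDNN_STATUS_ALLOC_FAILED",
    3: "CUDNN_STATUS_BAD_PARAM",
    4: "CUDNN_STATUS_INTERNAL_ERROR",
    6: "CUDNN_STATUS_ARCH_MISMATCH",
    8: "CUDNN_STATUS_EXECUTION_FAILED",
    9: "CUDNN_STATUS_NOT_SUPPORTED",
}


def decode_enforce(line):
    """Return (actual, expected) status names from an 'A vs B.' enforce message."""
    match = re.search(r"(\d+) vs (\d+)\.", line)
    if match is None:
        return None
    actual, expected = (int(group) for group in match.groups())
    return (
        CUDNN_STATUS.get(actual, "unknown"),
        CUDNN_STATUS.get(expected, "unknown"),
    )


line = (
    "[enforce fail at conv_op_cudnn.cc:811] status == CUDNN_STATUS_SUCCESS. "
    "8 vs 0. , Error at: /pytorch/caffe2/operators/conv_op_cudnn.cc:811: "
    "CUDNN_STATUS_EXECUTION_FAILED"
)
print(decode_enforce(line))
```

The decoded pair agrees with the name printed at the end of the log line. The status says only that a cuDNN kernel failed to execute on the GPU; it does not identify the cause by itself.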

My environment:
=======================================================
Ubuntu 16.04
CUDA 10.0
protobuf==3.6.1
=======================================================
Reinstalling cudatoolkit 9.0 together with pytorch-nightly-1.0.0.dev20190328-py2.7_cuda9.0.176_cudnn7.4.2_0.tar.bz2 fixed it; everything now runs successfully.
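
The package filename above encodes the Python, CUDA, and cuDNN versions it was built against. The sketch below pulls them out so they can be compared with the local setup. The regular expression is written only for this filename layout (an assumption; other packages may be named differently), and `parse_build` is a hypothetical helper.

```python
import re

# Pattern for the build string in the filename quoted above.
# Assumption: other packages may use a different naming layout.
BUILD_RE = re.compile(
    r"-py(?P<python>\d+\.\d+)"
    r"_cuda(?P<cuda>[\d.]+)"
    r"_cudnn(?P<cudnn>[\d.]+)"
    r"_\d+\.tar\.bz2$"
)


def parse_build(filename):
    """Return a dict of python/cuda/cudnn versions, or None if nothing matches."""
    match = BUILD_RE.search(filename)
    return match.groupdict() if match else None


pkg = "pytorch-nightly-1.0.0.dev20190328-py2.7_cuda9.0.176_cudnn7.4.2_0.tar.bz2"
info = parse_build(pkg)
print(info)
```

Note that this particular build targets Python 2.7, while the traceback in the earlier comment shows python3.5 paths, so that setup would presumably need a build matching its own Python version.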
