Pytorch: [build/nccl] failed to build libnccl on Debian unstable

Created on 2 Feb 2017  ·  3 comments  ·  Source: pytorch/pytorch

Failed to build the CUDA version of pytorch (without CUDNN) from the latest source.

OS: Debian unstable/experimental
Compiler: gcc-5, g++-5
CUDA: 8.0.44 (package provided by Debian)

Build log: http://debomatic-amd64.debian.net/distribution#experimental/pytorch-contrib/0.1.7~1/buildlog

-- The C compiler identification is GNU 5.4.1
-- The CXX compiler identification is GNU 5.4.1
-- Check for working C compiler: /usr/bin/gcc-5
-- Check for working C compiler: /usr/bin/gcc-5 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/g++-5
-- Check for working CXX compiler: /usr/bin/g++-5 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDA: /usr (found suitable version "8.0", minimum required is "7.0") 
-- Configuring done
-- Generating done
-- Build files have been written to: /<<PKGBUILDDIR>>/torch/lib/build/nccl
make[2]: Entering directory '/<<PKGBUILDDIR>>/torch/lib/build/nccl'
make[3]: Entering directory '/<<PKGBUILDDIR>>/torch/lib/build/nccl'
make[4]: Entering directory '/<<PKGBUILDDIR>>/torch/lib/build/nccl'
Scanning dependencies of target nccl
make[4]: Leaving directory '/<<PKGBUILDDIR>>/torch/lib/build/nccl'
make[4]: Entering directory '/<<PKGBUILDDIR>>/torch/lib/build/nccl'
[100%] Generating lib/libnccl.so
make[5]: Entering directory '/<<PKGBUILDDIR>>/torch/lib/nccl'
ls: cannot access '/usr/lib64/libcudart.so.*': No such file or directory
ls: cannot access '/usr/lib64/libcudart.so.*': No such file or directory
Grabbing  src/nccl.h                > /<<PKGBUILDDIR>>/torch/lib/build/nccl/include/nccl.h
Compiling src/libwrap.cu            > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/libwrap.o
Compiling src/core.cu               > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/core.o
Compiling src/all_gather.cu         > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/all_gather.o
Compiling src/all_reduce.cu         > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/all_reduce.o
Compiling src/broadcast.cu          > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/broadcast.o
Compiling src/reduce.cu             > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/reduce.o
Compiling src/reduce_scatter.cu     > /<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/reduce_scatter.o
src/core.cu(724): error: expected an expression

src/core.cu(724): error: expected an expression

2 errors detected in the compilation of "/tmp/tmpxft_00002c02_00000000-13_core.compute_52.cpp1.ii".
Makefile:98: recipe for target '/<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/core.o' failed
make[5]: *** [/<<PKGBUILDDIR>>/torch/lib/build/nccl/obj/core.o] Error 2

๋‚˜๋Š” ์ด๊ฒƒ์— ๋Œ€ํ•ด ์•„๋ฌด ์ƒ๊ฐ์ด ์—†๋‹ค ...

All 3 comments

If you look here, you can see that it fails because CUDA_MAJOR and CUDA_MINOR are empty strings, which in turn happens because ls can't find libcudart.so. I don't know what the Debian CUDA package looks like, but it probably has no libcudart.so with a version extension.
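The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not a copy of the NCCL Makefile: `detect_cuda_version` and `cuda_defines` are hypothetical helpers, and passing the values as `-DCUDA_MAJOR`/`-DCUDA_MINOR` is an assumption about how they reach the compiler. It shows that when no `libcudart.so.*` exists in the searched directory, both values come out empty.

```shell
#!/bin/sh
# Hypothetical helper: print "MAJOR.MINOR" taken from the first
# libcudart.so.* found in the given directory, or nothing if none exists.
detect_cuda_version() {
    lib_dir=$1
    # stderr is discarded here; in the build log the equivalent lookup
    # printed "ls: cannot access ...: No such file or directory".
    first=$(ls "$lib_dir"/libcudart.so.* 2>/dev/null | head -n 1)
    [ -n "$first" ] || return 0
    # Drop the directory and the "libcudart.so." prefix, keep MAJOR.MINOR.
    version=${first##*/libcudart.so.}
    echo "$version" | cut -d . -f 1-2
}

# Hypothetical helper: print the preprocessor defines that would result
# (assumed flag names, for illustration only).
cuda_defines() {
    version=$(detect_cuda_version "$1")
    major=$(echo "$version" | cut -d . -f 1)
    minor=$(echo "$version" | cut -d . -f 2)
    echo "-DCUDA_MAJOR=$major -DCUDA_MINOR=$minor"
}
```

Calling `cuda_defines /usr/lib64` on a system where the library lives elsewhere prints `-DCUDA_MAJOR= -DCUDA_MINOR=`. A macro defined to nothing would be consistent with nvcc reporting `expected an expression` wherever the macro is used as a value.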

Would CUDA_VERSION=8.0 python setup.py build help?

@apaszke Thanks. The fix is to export two environment variables:

export CUDA_HOME=/usr
export CUDA_LIB=/usr/lib/$(shell dpkg-architecture -qDEB_HOST_MULTIARCH)
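The `$(shell ...)` form above is GNU make syntax, which suggests the two lines live in a makefile such as `debian/rules` (an assumption; the comment does not say). In a plain shell session, an equivalent might look like the sketch below. `multiarch_lib_dir` is a hypothetical helper, and the `/usr/lib` fallback for systems without `dpkg-architecture` is likewise an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the Debian multiarch library directory,
# e.g. /usr/lib/<multiarch triplet>.
multiarch_lib_dir() {
    if command -v dpkg-architecture >/dev/null 2>&1; then
        echo "/usr/lib/$(dpkg-architecture -qDEB_HOST_MULTIARCH)"
    else
        # Assumed fallback for non-Debian systems; adjust as needed.
        echo "/usr/lib"
    fi
}

# Same two variables as in the comment above, set from the shell.
CUDA_HOME=/usr
CUDA_LIB=$(multiarch_lib_dir)
export CUDA_HOME CUDA_LIB
```

Running `ls "$CUDA_LIB"/libcudart.so.*` afterwards is a quick way to check that the directory actually contains the versioned library before starting the build.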