Tensorflow: How to compile tensorflow using SSE4.1, SSE4.2, and AVX

Created on 3 Mar 2017  ·  44 Comments  ·  Source: tensorflow/tensorflow

I just got tensorflow up and running. Now I am running into this error.

I am currently on Mac Yosemite and downloaded tensorflow with pip3 through anaconda, using python 3.5.

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.

Since anaconda has its own set of commands, how do I get tensorflow to run with SSE4.1, SSE4.2, and AVX through the anaconda command system? I am really confused about how to go about this.

This isn't an error, just a warning that TensorFlow could run faster on your machine if you build it from source.

SO question about this: http://stackoverflow.com/questions/41293077/how-to-compile-tensorflow-with-sse4-2-and-avx-instructions
TensorFlow guide to building from source: https://www.tensorflow.org/install/install_sources

As @Carmezim said, these are simply warning messages.
You will see them only once for each of your programs.
And as the warnings say, you only need to compile TF with these flags if you need to make TF faster.

You can follow our guide to installing TensorFlow from source to compile TF with support for SIMD instruction sets.

Okay, thank you. I get it.

Is there a way we can silence this?

The only way to silence these warning messages is to build from source with the --config=opt option.

A sort of "workaround" (albeit an imperfect one) that redirects the messages on Unix/Linux/OSX:
python myscript.py 2>/dev/null
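
The same redirect can be applied from Python when the script is launched as a child process. This is only a minimal sketch: `run_quietly` is a hypothetical helper name, and the child command is a stand-in for `python myscript.py`. It also shows why the workaround is imperfect, since everything on stderr is dropped, genuine errors included.

```python
import subprocess
import sys


def run_quietly(argv):
    """Run a command with stderr discarded, like `2>/dev/null` in a shell."""
    # stderr=DEVNULL drops everything written to stderr, not only the
    # TensorFlow warnings, so real error messages disappear as well.
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=False,
    )
    return completed.stdout


if __name__ == "__main__":
    # Stand-in for `python myscript.py`: a child that writes to both streams.
    child = "import sys; sys.stderr.write('W noisy warning\\n'); print('result')"
    print(run_quietly([sys.executable, "-c", child]), end="")
```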

@CGTheLegend @ocampesato You can use the TF environment variable TF_CPP_MIN_LOG_LEVEL, which works as follows:

  • It defaults to 0, which shows all logs.
  • To filter out INFO logs, set it to 1.
  • To additionally filter out WARNING logs, set it to 2.
  • To additionally filter out ERROR logs, set it to 3.

So you can silence the warnings by doing the following:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf
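
To make the thresholds concrete, here is a small sketch that maps a TF_CPP_MIN_LOG_LEVEL value to the severities it hides. It models only the four values described in the comment above; `hidden_severities` is a hypothetical helper for illustration and is not a TensorFlow function.

```python
# Severities in increasing order, as described for levels 1, 2 and 3 above.
SEVERITIES = ["INFO", "WARNING", "ERROR"]


def hidden_severities(min_log_level):
    """Return the severities filtered out for a given TF_CPP_MIN_LOG_LEVEL."""
    level = int(min_log_level)  # the environment variable is a string
    if not 0 <= level <= 3:
        # Only 0..3 are documented in this thread, so reject anything else.
        raise ValueError("expected a level between 0 and 3")
    return SEVERITIES[:level]


if __name__ == "__main__":
    for value in ("0", "1", "2", "3"):
        print(value, hidden_severities(value))
```

The messages at the top of this thread start with `W`, so by this mapping they need level 2 to disappear; level 1 would hide only `I` lines.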

@gunan @mrry I have seen many people interested in silencing the warnings.

I installed by following the tensorflow install guide and got this warning too.

pip3 install --upgrade tensorflow

@jadeydi pip installs a prebuilt binary instead of compiling from source, so you will keep seeing these warnings.

I just compiled tensorflow with support for SSE4.1, SSE4.2, AVX, AVX2, and FMA. The build is available at https://github.com/lakshayg/tensorflow-build and I hope it is useful.

Hi @lakshayg, thanks for sharing. You may want to check out https://github.com/yaroslavvb/tensorflow-community-wheels

Roughly how much faster is the build compared with the standard pip install tensorflow-gpu on Ubuntu? Is it faster only for CPU computations, or is there a benefit for GPU computations too?

http://www.anandtech.com/show/2362/5

This came up on Google and has some decent technical details:

The test is a DivX encode using VirtualDub 1.7.6 and DivX 6.7. SSE4 comes in if you choose to enable a new full search algorithm for motion estimation, which is accelerated by two SSE4 instructions: MPSADBW and PHMINPOSUW. The idea is that motion estimation (figuring out what will happen in subsequent frames of video) requires a lot of computation of sums of absolute differences, as well as finding the minimum values of the results of such computations. The SSE2 instruction PSADBW can compute two sums of differences from a pair of 16B unsigned integers; the SSE4 instruction MPSADBW can do eight.

...

On the QX9650, the full search with SSE4 enabled runs about 45% faster than with SSE2 only.

I am not sure which functions tensorflow is using, but it may be worth the effort.

I'm sorry, but it is ridiculous to have this output in every TF script by default. Most people are not compiling TF from source, nor do they want to.

@Tomashley303, this is pretty awesome info to get! I don't plan to recompile from source. I don't want to. But the info tells me what to do if my model gets big and slow and needs a performance boost. It is generally cheaper to recompile with extensions than to buy new hardware, given that good walkthroughs (which we have) minimize the labor cost of recompiling (CPU time doesn't matter; it can run overnight).

I went through the process... It was straightforward and took no time at all. Not the usual cmake C++ nightmare.

I have a small bash script to compile TF on MacOS/Linux. It detects CPU features dynamically and passes them as build parameters. I thought about creating a PR but could not find a folder with scripts (helpers) for local builds, only ci_build. If it makes sense, I will do it.

gist
https://gist.github.com/venik/9ba962c8b301b0e21f99884cbd35082f
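
The linked gist is a bash script and is not reproduced here. As a rough illustration of the same idea (detect CPU features, then turn them into build parameters), here is a minimal Python sketch. It assumes a Linux machine where /proc/cpuinfo contains a `flags` line; the function names and the flag table are hypothetical and cover only the instruction sets mentioned in this thread.

```python
# Map Linux /proc/cpuinfo flag names to the matching GCC/Clang -m options.
CPUINFO_TO_COPT = {
    "sse4_1": "-msse4.1",
    "sse4_2": "-msse4.2",
    "avx": "-mavx",
    "avx2": "-mavx2",
    "fma": "-mfma",
}


def parse_cpuinfo_flags(cpuinfo_text):
    """Return the CPU flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def bazel_copts(cpu_flags):
    """Turn detected CPU flags into --copt arguments, in table order."""
    return [
        f"--copt={copt}"
        for name, copt in CPUINFO_TO_COPT.items()
        if name in cpu_flags
    ]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:  # Linux only
            text = handle.read()
    except OSError:
        text = ""  # for example macOS, where this file does not exist
    print(bazel_copts(parse_cpuinfo_flags(text)))
```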

Note to @gunan

I ran into this issue when I first installed TensorFlow. Now I have to figure out how to resolve it all over again because I am installing TensorFlow on a new machine. It is a pain in the neck, and the documentation you have provided is not clear at all.

The fact that I have to do this on my end is ridiculous and infuriating. It is no good making something available through pip/pip3 if it just throws warnings at you all day.

At the very least, you should edit https://www.tensorflow.org/install/install_sources and explain explicitly how to compile with SSE/AVX.

The solution that worked for me: enter "-mavx -msse4.1 -msse4.2" when prompted during the configuration process (when you run ./configure).

Is it that hard to add this to your installation instructions?

Following @Carmezim's answer, I got a CPU-accelerated build based on AVX and SSE. I tested fast-rcnn (resnet-101) on Intel. The run time improved by about 30%, which is really useful.

You can silence the warnings.
Just add this code at the top:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf
As mentioned here: https://stackoverflow.com/a/44984610

You can easily add a user variable to the system environment variables:
TF_CPP_MIN_LOG_LEVEL, value = 2. Then restart your IDE.

@mikalyoung No improvement can be expected for GPU computations, because these instruction sets are CPU-only and enable vectorized operations.
So if you compare two programs that run (ideally) 100% on the GPU, one on a Tensorflow instance compiled with SIMD support and the other on one compiled without it, you should get the same results in terms of speed.

I C:\tf_jenkinshome\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2

As you can see, the warning appears on my system too, but I don't understand the 'I' at the start of it. Can someone help me with that?

"I" is just shorthand for "INFO". The other letters you may see are E (error) and F (fatal).
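
The prefix letter can be pulled out of a log line mechanically. The sketch below is only an illustration built from the line shapes quoted in this thread (an optional timestamp, one letter, then `file:line]`); the regular expression and `parse_log_line` are hypothetical and may not cover every format TensorFlow emits.

```python
import re

# I, E and F are named in the reply above; W is the prefix on the
# warnings quoted at the top of this thread.
SEVERITY_NAMES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}

# Optional "YYYY-MM-DD HH:MM:SS.ffffff: " prefix, the letter, then "file:line]".
LOG_LINE = re.compile(
    r"^(?:\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: )?"
    r"(?P<sev>[IWEF]) (?P<file>\S+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_log_line(text):
    """Split one log line into severity, source file, line number, message."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None  # not a line in the shape shown in this thread
    return {
        "severity": SEVERITY_NAMES[match["sev"]],
        "file": match["file"],
        "line": int(match["line"]),
        "message": match["msg"],
    }


if __name__ == "__main__":
    sample = (
        "W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow "
        "library wasn't compiled to use AVX instructions"
    )
    print(parse_log_line(sample))
```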

So I installed using conda. If I now want to compile from source instead to take advantage of the speed boost, do I need to do anything to remove my conda install of tensorflow? Or does it live in its own little container, so that I can compile from source separately?

I installed DeepSpeech and a DeepSpeech server as well. When I went to start the server, I got an error message: "2018-01-17 08:21:49.120154: F tensorflow/core/platform/cpu_feature_guard.cc:35] The TensorFlow library was compiled to use AVX2 instructions, but these aren't available on your machine.
Aborted (core dumped)"

Apparently I need to compile TensorFlow on the same computer. Is there a list somewhere to match Kubuntu 17.10.1 and an HP Probook 4330S?
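
The fatal message above is the reverse of the usual warning: the binary expects an instruction set that the CPU lacks. A minimal sketch of that comparison follows. The function name is hypothetical, and both feature lists are supplied by the caller; the example values are illustrative and were not measured on any particular machine.

```python
def _normalise(names):
    """Upper-case and strip feature names so 'avx2' and 'AVX2' compare equal."""
    return {name.strip().upper() for name in names}


def missing_features(required, available):
    """Return the required instruction sets that the CPU does not offer."""
    return sorted(_normalise(required) - _normalise(available))


if __name__ == "__main__":
    # Illustrative values only: a binary built for AVX2 on a CPU without it.
    print(missing_features(["AVX2"], ["SSE4.1", "SSE4.2", "AVX"]))
```

A non-empty result corresponds to the fatal case in the message above; an empty one means the binary's requirements are covered.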

Why are there no Windows builds? I have the same problem, but instead of muting the warnings I would like to use my GPU. I also have an and graphics card, not Nvidia. What do I do?

*I do not have an Nvidia graphics card, I have an and one. What do I do?

*AMD graphics card.. autocorrect

These are not merely warnings, because they kill the process on my test boxes. Since I also use AMD GPUs, I spun up a Digital Ocean tensorflow box to give this a go, but there seems to be no GPU support there either, and it is failing miserably.

`# Job id 0

Loading hparams from /home/science/tf-demo/models/nmt-chatbot/model/hparams

saving hparams to /home/science/tf-demo/models/nmt-chatbot/model/hparams
saving hparams to /home/science/tf-demo/models/nmt-chatbot/model/best_bleu/hparams
attention=scaled_luong
attention_architecture=standard
batch_size=128
beam_width=10
best_bleu=0
best_bleu_dir=/home/science/tf-demo/models/nmt-chatbot/model/best_bleu
check_special_token=True
colocate_gradients_with_ops=True
decay_factor=1.0
decay_steps=10000
dev_prefix=/home/science/tf-demo/models/nmt-chatbot/data/tst2012
dropout=0.2
encoder_type=bi
eos=
epoch_step=0
forget_bias=1.0
infer_batch_size=32
init_op=uniform
init_weight=0.1
learning_rate=0.001
learning_rate_decay_scheme=
length_penalty_weight=1.0
log_device_placement=False
max_gradient_norm=5.0
max_train=0
metrics=['bleu']
num_buckets=5
num_embeddings_partitions=0
num_gpus=1
num_layers=2
num_residual_layers=0
num_train_steps=500000
num_translations_per_input=10
num_units=512
optimizer=adam
out_dir=/home/science/tf-demo/models/nmt-chatbot/model
output_attention=True
override_loaded_hparams=True
pass_hidden_state=True
random_seed=None
residual=False
share_vocab=False
sos=
source_reverse=False
src=from
src_max_len=50
src_max_len_infer=None
src_vocab_file=/home/science/tf-demo/models/nmt-chatbot/data/vocab.from
src_vocab_size=15003
start_decay_step=0
steps_per_external_eval=None
steps_per_stats=100
subword_option=
test_prefix=/home/science/tf-demo/models/nmt-chatbot/data/tst2013
tgt=to
tgt_max_len=50
tgt_max_len_infer=None
tgt_vocab_file=/home/science/tf-demo/models/nmt-chatbot/data/vocab.to
tgt_vocab_size=15003
time_major=True
train_prefix=/home/science/tf-demo/models/nmt-chatbot/data/train
unit_type=lstm
vocab_prefix=/home/science/tf-demo/models/nmt-chatbot/data/vocab
warmup_scheme=t2t
warmup_steps=0

creating train graph ...

num_bi_layers = 1, num_bi_residual_layers=0
cell 0 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
cell 0 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
cell 0 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
cell 1 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
learning_rate=0.001, warmup_steps=0, warmup_scheme=t2t
decay_scheme=,start_decay_step=0,decay_steps10000,decay_factor 1

Trainable variables

embeddings/encoder/embedding_encoder:0, (15003, 512),
embeddings/decoder/embedding_decoder:0, (15003, 512),
dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/kernel:0, (1024, 2048), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/kernel:0, (1024, 2048), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/decoder/memory_layer/kernel:0, (1024, 512),
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (1536, 2048), /device:GPU:0
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (1024, 2048), /device:GPU:0
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/decoder/attention/luong_attention/attention_g:0, (), /device:GPU:0
dynamic_seq2seq/decoder/attention/attention_layer/kernel:0, (1536, 512), /device:GPU:0
dynamic_seq2seq/decoder/output_projection/kernel:0, (512, 15003), /device:GPU:0

creating eval graph ...

num_bi_layers = 1, num_bi_residual_layers=0
cell 0 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 0 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 0 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 1 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0

Trainable variables

embeddings/encoder/embedding_encoder:0, (15003, 512),
embeddings/decoder/embedding_decoder:0, (15003, 512),
dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/kernel:0, (1024, 2048), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/kernel:0, (1024, 2048), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/decoder/memory_layer/kernel:0, (1024, 512),
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (1536, 2048), /device:GPU:0
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (1024, 2048), /device:GPU:0
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/decoder/attention/luong_attention/attention_g:0, (), /device:GPU:0
dynamic_seq2seq/decoder/attention/attention_layer/kernel:0, (1536, 512), /device:GPU:0
dynamic_seq2seq/decoder/output_projection/kernel:0, (512, 15003), /device:GPU:0

creating infer graph ...

num_bi_layers = 1, num_bi_residual_layers=0
cell 0 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 0 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 0 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 1 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0

Trainable variables

embeddings/encoder/embedding_encoder:0, (15003, 512),
embeddings/decoder/embedding_decoder:0, (15003, 512),
dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/kernel:0, (1024, 2048), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/kernel:0, (1024, 2048), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/decoder/memory_layer/kernel:0, (1024, 512),
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (1536, 2048), /device:GPU:0
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (1024, 2048), /device:GPU:0
dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (2048,), /device:GPU:0
dynamic_seq2seq/decoder/attention/luong_attention/attention_g:0, (), /device:GPU:0
dynamic_seq2seq/decoder/attention/attention_layer/kernel:0, (1536, 512), /device:GPU:0
dynamic_seq2seq/decoder/output_projection/kernel:0, (512, 15003),

log_file=/home/science/tf-demo/models/nmt-chatbot/model/log_1519669184

2018-02-26 18:19:44.862736: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Killed`

Please tell me which commands I need to run, and where and how to run them. I desperately need help.

But does that mean the system is not using the GPU for the process?

If you are running tensorflow in an accelerated setting, such as k-fold with KerasClassifier, you need to resolve this issue.
To resolve it, you will have to build tensorflow from source, as everyone recommends.
To build tensorflow from source you need the following tools:

  1. Install git on your machine if you have not already. On an Ubuntu machine:
    sudo apt-get install git
  2. Install bazel. Using the custom APT repository is recommended. Follow the instructions at this link: https://docs.bazel.build/versions/master/install-ubuntu.html
  3. Install the following Python dependencies (numpy, dev, and wheel) with the command below:
    sudo apt-get install python-numpy python-dev python-pip python-wheel
  4. Once all the dependencies are installed, clone the tensorflow GitHub repository to your local drive:
    git clone https://github.com/tensorflow/tensorflow
  5. Go to the location where you cloned tensorflow, cd into the tensorflow directory, and run the configure script:
    cd tensorflow
    ./configure

Follow the on-screen instructions to complete the tensorflow installation.
Once tensorflow is installed, it is a good idea to update your machine:
sudo apt-get update

Good luck and enjoy...

I would like to emphasize in this thread that these warnings should not be ignored. By building from source, my training time got about 43% faster. I think it is worth the effort.

  • TensorFlow's documentation on building from source
  • ...but it does not actually explain how to turn on SSE/AVX/FMA and so on, so use this thread to get an idea of how to set the Bazel build flags.

How do I install tensorflow using this file: "tensorflow-1.6.0-cp36-cp36m-win_amd64.whl"?

@anozele pip3 install --upgrade *path to wheel file*
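
Before installing, it can help to read what the wheel filename says, since pip rejects a wheel whose tags do not match the running interpreter and platform. The sketch below splits the filename from the question into its tags. `parse_wheel_filename` is a hypothetical helper that handles only the five-part form (no optional build tag).

```python
def parse_wheel_filename(filename):
    """Split 'name-version-python-abi-platform.whl' into named tags."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(suffix)].split("-")
    if len(parts) != 5:
        # A sixth part would be the optional build tag, not handled here.
        raise ValueError("expected name-version-python-abi-platform")
    keys = ("name", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))


if __name__ == "__main__":
    print(parse_wheel_filename("tensorflow-1.6.0-cp36-cp36m-win_amd64.whl"))
```

For the file in the question, `cp36` means CPython 3.6 and `win_amd64` means 64-bit Windows, so it needs a matching Python installation.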

@gunan --config=opt alone is not enough. You should also add, for example, --copt="-msse4.2" when you build TensorFlow from source.

According to Intel (https://software.intel.com/en-us/articles/intel-optimization-for-tensorflow-installation-guide), if you use the Intel build of Tensorflow, all available instruction sets are used by the MKL backend. Can anyone from Tensorflow confirm this?

"This isn't an error, just a warning that TensorFlow could run faster on your machine if you build it from source."

But it is not faster than without the FMA, AVX, and SSE flags: https://stackoverflow.com/questions/57197854/fma-avx-sse-flags-did-not-bring-me-good-performance

Hi. Sorry if I am beating a dead horse, but I am wondering why the default pip wheel is not a binary compiled with the advanced instructions.

This is because older CPU architectures do not support the advanced instruction sets. See the wiki for a detailed list of CPUs that support AVX, AVX2, or AVX512. If the default pip binary were compiled with these instruction sets, tensorflow would not run on older CPUs.

"But does that mean the system is not using the GPU for the process?"

No, the message is also shown when you use the GPU. If you have not silenced the messages, you should also see TensorFlow loading your GPU device in the command prompt.

Check this repository:

https://github.com/fo40225/tensorflow-windows-wheel

He has compiled almost every version of TF with SSE and AVX.

This article was a good tutorial on how to build from source, flags included:
https://medium.com/@pierreontech/setup-a-high-performance-conda-tensorflow-environment-976995158cb1

Try forcing the inclusion of the appropriate extensions with additional bazel options such as --copt=-mavx --copt=-msse4.1 --copt=-msse4.2
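
Putting the options from this thread together, a build command can be assembled as below. This is a sketch under assumptions: `bazel_build_command` is a hypothetical helper, and the target shown in the usage line is the pip-package target from TensorFlow's build-from-source guide of that period, which is not quoted anywhere in this thread. Check the guide for your version before running anything.

```python
import shlex


def bazel_build_command(target, copts, use_config_opt=True):
    """Assemble a `bazel build` command line from compiler options."""
    argv = ["bazel", "build"]
    if use_config_opt:
        argv.append("--config=opt")  # said above to be insufficient alone
    # Each compiler flag is forwarded through its own --copt option.
    argv.extend(f"--copt={flag}" for flag in copts)
    argv.append(target)
    return shlex.join(argv)  # requires Python 3.8 or newer


if __name__ == "__main__":
    # Assumed target name; verify it against the guide for your version.
    print(
        bazel_build_command(
            "//tensorflow/tools/pip_package:build_pip_package",
            ["-mavx", "-msse4.1", "-msse4.2"],
        )
    )
```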
