Kubernetes: k8s: v1.3.10, how do I use the GPU to run a container on the GPU?

Created on 27 Feb 2017  ·  3 comments  ·  Source: kubernetes/kubernetes

I have spent almost a week working on this question, but I have failed.
Environment: redhat7.2
k8s: v1.3.10, cuda: v7.5, kernel version: 367.44, tensorflow: 0.11, gpu: 1080
Our platform is based on tensorflow and k8s. It is for ML training.
Everything works on CPU, but it does not work on GPU, and I want to know why.
I tested many of the examples you mentioned, but they still failed.
My cluster: 1 master, 2 nodes. Every node has a GPU card; only the master has none.
First, I tested as @Hui-Zhi suggested:

vim  test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-gpu-test
spec:
  containers:
  - name: nvidia-gpu
    image: nginx
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1

Yes, I tested it and it works. If I change nvidia-gpu: 1 to 2, it fails and the pod stays Pending. Every node has only one GPU card, so no node can satisfy the request. So I think this part works.
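The Pending result is what one would expect if a GPU request has to fit on a single node, since capacities on different nodes are not added together. Below is a toy Python sketch of that check. It is not the scheduler's actual code, it ignores GPUs already allocated to other pods, and the node names and the helper `nodes_that_fit` are made up for illustration.

```python
# Toy model of a per-node capacity check; NOT the real scheduler logic.
GPU_RESOURCE = "alpha.kubernetes.io/nvidia-gpu"


def nodes_that_fit(capacity_by_node, requested_gpus):
    """Return the names of nodes whose GPU capacity covers the request."""
    return sorted(
        name
        for name, capacity in capacity_by_node.items()
        if capacity.get(GPU_RESOURCE, 0) >= requested_gpus
    )


# Hypothetical cluster shaped like the one in the post:
# a master without a GPU and two nodes with one GPU each.
cluster = {
    "master": {},
    "node-1": {GPU_RESOURCE: 1},
    "node-2": {GPU_RESOURCE: 1},
}

print(nodes_that_fit(cluster, 1))  # ['node-1', 'node-2']
print(nodes_that_fit(cluster, 2))  # [] -> no node qualifies, the pod stays Pending
```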
But here comes the question: how do I actually run on the GPU? This example only proves that k8s can see the GPU and knows about it. How do I use a yaml file to create a pod that runs on the GPU resource?

Then I found another way: nvidia-docker.
I pulled the GPU image gcr.io/tensorflow/tensorflow:0.11-gpu and ran the mnist.py demo with plain docker: docker run -it ${image} /bin/bash
But it failed with an error like "Couldn't open CUDA library libcuda.so, cannot find libcuda.so".
Has anyone run into the same problem?
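One way to narrow down an error like this is to ask the dynamic loader directly whether the driver library can be opened inside the container. The following is a minimal sketch, assuming Python is available in the image (the tensorflow image ships it). `libcuda.so.1` is the usual name of the NVIDIA driver library on Linux; it is installed with the host driver rather than baked into the image, which is why a container started with plain docker may not see it. The helper name `can_load` is illustrative.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False


if __name__ == "__main__":
    # Expected to report False in a container that has no access to the host driver files.
    print("libcuda loadable:", can_load("libcuda.so.1"))
```

Running the same script under docker and under nvidia-docker should show whether the driver library is the only difference between the two environments.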
Then I found someone saying that the GPU requires nvidia-docker.
Luckily, I had installed it as the tensorflow guide describes: https://www.tensorflow.org/install/install_linux#gpu_support. With nvidia-docker, I saw that the training ran on the GPU and that GPU memory usage was almost 7g, nearly 70%.
I ran it like this: nvidia-docker run -it ${image} /bin/bash
python mnist.py
Yes, it works. But now a new question comes up: should I use docker to run on CPU and nvidia-docker to run on GPU? So far I have only run on the GPU directly with docker (that is, nvidia-docker), so how do I run on the GPU in k8s?
k8s containers use docker, not nvidia-docker, so how can I do the same thing there? Can you help me? I want to know how to run on the GPU in k8s for real, not just a demo or test yaml that proves k8s supports GPUs.
I hope you can answer me. I am waiting....
Thanks.

All 3 comments

@tbchj #42116 has now been merged and should be released with 1.6.

@cmluciano Yes, thank you. You are probably right. I just read #42116 in full, and it seems to have what I need.

I just tested it and it worked. The volume I mounted before was wrong. The new yaml I used is below:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  volumes:
  - name: nvidia-driver
    hostPath:
      path: /var/lib/nvidia-docker/volumes/nvidia_driver/367.44
  containers:
  - name: tensorflow
    image: tensorflow:0.11.0-gpu
    ports:
    - containerPort: 8000
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
    volumeMounts:
    - name: nvidia-driver
      mountPath: /usr/local/nvidia/
      readOnly: true

๋‚˜๋Š” ๋‚ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐ, ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค
