Kubernetes v1.3.10: how to use a GPU to run a container?

Created on 27 Feb 2017  ·  3 Comments  ·  Source: kubernetes/kubernetes

I have been working on this question for almost a week, but I have failed.
environment: Red Hat 7.2
k8s: v1.3.10, CUDA: 7.5, NVIDIA driver: 367.44, TensorFlow: 0.11, GPU: GTX 1080
Our platform is based on TensorFlow and k8s; it is for ML training.
When using the CPU it's OK, but it can't work on the GPU, and I want to know why.
I tested many of the examples you mentioned, but they still failed.
My cluster: 1 master and 2 nodes. Every node has one GPU card; only the master has none.
First I tested what @Hui-Zhi suggested:

vim  test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-gpu-test
spec:
  containers:
  - name: nvidia-gpu
    image: nginx
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1

Yes, I tested it, and it works. If I change nvidia-gpu: 1 to 2, it fails: the pod keeps pending, and kubectl describe shows that no node can satisfy the request. Since every node has only one GPU card, I think that behaviour is correct.
But now the real question: how do I actually run on the GPU? This example only proves that k8s can see the GPU and schedule against it, but not how to run on it. How can I use a yaml file to create a pod that runs on the GPU?
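For reference, the scheduling behaviour described above can be inspected with kubectl (pod name taken from the test.yaml above):

```shell
# Create the pod from the spec above
kubectl create -f test.yaml

# Watch its status; requesting 2 GPUs on 1-GPU nodes leaves it Pending
kubectl get pod nvidia-gpu-test

# The Events section at the bottom explains why no node satisfies the request
kubectl describe pod nvidia-gpu-test
```

These commands need a running cluster, so they are only a sketch of the workflow, not something runnable standalone.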

Then I found another way: nvidia-docker.
I pulled the GPU image gcr.io/tensorflow/tensorflow:0.11-gpu and ran the mnist.py demo with plain docker: docker run -it ${image} /bin/bash
But it failed, with an error like "can't open CUDA library libcuda.so, can't find libcuda.so".
Has anyone encountered the same problem?
Then I found that someone said GPUs need nvidia-docker.
Luckily I had installed it as the TensorFlow docs describe (https://www.tensorflow.org/install/install_linux#gpu_support). With nvidia-docker my training ran on the GPU, using almost 7 GB of GPU memory, about 70%.
I ran it like this: nvidia-docker run -it ${image} /bin/bash
python mnist.py
Yes, it works. But a new question comes up: should I use docker to run on the CPU and nvidia-docker on the GPU? I can only reach the GPU through nvidia-docker, not plain docker; so how do I run on the GPU in k8s?
k8s containers use docker, not nvidia-docker, so how can I do the same thing there? Can you help me? I want to know how to run on a GPU in k8s, not just a demo or a test yaml that proves k8s supports GPUs.
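For what it's worth, nvidia-docker 1.x is essentially a wrapper around plain docker: it injects the NVIDIA device files and mounts a read-only volume containing the driver libraries. A roughly equivalent plain docker invocation, using the driver version 367.44 from the environment above (device names are assumptions for a single-GPU node), would be:

```shell
# Sketch of what nvidia-docker 1.x adds on top of plain docker.
# Device names and the driver volume path are assumptions for this environment.
docker run -it \
  --device=/dev/nvidiactl \
  --device=/dev/nvidia-uvm \
  --device=/dev/nvidia0 \
  -v /var/lib/nvidia-docker/volumes/nvidia_driver/367.44:/usr/local/nvidia:ro \
  gcr.io/tensorflow/tensorflow:0.11-gpu /bin/bash
```

Mounting the driver volume into the container is the key step, and the same idea carries over to a pod spec via a hostPath volume.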
Hopefully you can answer me; I'm waiting...
Thanks.

All 3 comments

@tbchj #42116 is now merged and should be released with 1.6

@cmluciano yes, thank you, maybe you are right. I just read #42116 in full; it seems to have something I need.

I just tested it, and it did work. The volume I mounted before was wrong. The new yaml I used is below:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  volumes:
  - name: nvidia-driver
    hostPath:
      path: /var/lib/nvidia-docker/volumes/nvidia_driver/367.44
  containers:
  - name: tensorflow
    image: tensorflow:0.11.0-gpu
    ports:
    - containerPort: 8000
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
    volumeMounts:
    - name: nvidia-driver
      mountPath: /usr/local/nvidia/
      readOnly: true

I solved my problem, thank you.
