How to Share GPUs in a Kubernetes Environment

With the rapid growth of AI and large models, sharing GPU resources in the cloud has become a necessity: it lowers hardware costs, improves resource utilization, and meets the massive parallel-compute demands of model training and inference.

Kubernetes' built-in resource scheduling can only allocate GPUs by device count, as whole cards. For deep-learning workloads, however, the resource that actually gets exhausted is usually GPU memory, so whole-device scheduling wastes a great deal of capacity.
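
For context, a pod scheduled through the built-in mechanism can only request whole devices. A minimal sketch (nvidia.com/gpu is the resource name exposed by the standard NVIDIA device plugin; the pod name is illustrative):

# Built-in GPU scheduling: the pod claims an entire physical GPU,
# even if the workload only needs a fraction of its memory.
apiVersion: v1
kind: Pod
metadata:
  name: whole-gpu-demo
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:11.6.2-base-ubuntu20.04
      resources:
        limits:
          nvidia.com/gpu: 1   # whole GPUs only; fractional requests are not allowed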

There are currently two approaches to GPU sharing. One splits a physical GPU into multiple virtual GPUs (vGPUs) and schedules by vGPU count; the other schedules by GPU memory.

This article describes how to install the Kubernetes components needed to schedule resources by GPU memory.

System Information

  • OS: CentOS Stream 8

  • Kernel: 4.18.0-490.el8.x86_64

  • Driver: NVIDIA-Linux-x86_64-470.182.03

  • Docker: 20.10.24

  • Kubernetes: 1.24.0

1. Driver Installation

Download and install the driver from the NVIDIA website: https://www.nvidia.com/Download/index.aspx?lang=en-us
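
Once the driver is installed, a quick sanity check from the shell confirms it loaded correctly:

# Should report driver version 470.182.03 and list the installed GPUs
nvidia-smi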

2. Docker Installation

Install Docker (or another container runtime) yourself. If you use a different runtime, follow the NVIDIA installation guide for step 3: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installation-guide

Note: NVIDIA officially supports Docker, containerd, and Podman, but this document has only been verified with Docker; watch for differences if you use another runtime.
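
For reference, a minimal Docker installation on CentOS Stream 8 might look like the following sketch, using Docker's official repository (adapt to your environment):

# Add Docker's upstream repository and install the engine
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io
# Start Docker now and on every boot
sudo systemctl enable --now docker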

3. Installing the NVIDIA Container Toolkit

  1. Set up the repository and GPG key
distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
  && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo \
  | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
  2. Install the toolkit
sudo dnf clean expire-cache --refresh
sudo dnf install -y nvidia-container-toolkit
  3. Configure Docker to add the NVIDIA container runtime
sudo nvidia-ctk runtime configure --runtime=docker 
  4. Edit /etc/docker/daemon.json to make nvidia the default container runtime (required)
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
  5. Restart Docker and verify that the configuration works
sudo systemctl restart docker
sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

If output like the following is returned, the configuration succeeded:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
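
You can also confirm that nvidia is now the default runtime reported by the daemon:

# Expect "Default Runtime: nvidia" in the output
sudo docker info | grep -i 'default runtime'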

4. Installing the K8s GPU Scheduler Extender

  1. First apply the following YAML to deploy the scheduler extender
# rbac.yaml
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: gpushare-schd-extender
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - update
      - patch
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - bindings
      - pods/binding
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gpushare-schd-extender
subjects:
  - kind: ServiceAccount
    name: gpushare-schd-extender
    namespace: kube-system

# deployment.yaml
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: gpushare
      component: gpushare-schd-extender
  template:
    metadata:
      labels:
        app: gpushare
        component: gpushare-schd-extender
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      hostNetwork: true
      tolerations:
        - effect: NoSchedule
          operator: Exists
          key: node-role.kubernetes.io/master
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
        - effect: NoSchedule
          operator: Exists
          key: node.cloudprovider.kubernetes.io/uninitialized
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      serviceAccount: gpushare-schd-extender
      containers:
        - name: gpushare-schd-extender
          image: registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpushare-schd-extender:1.11-d170d8a
          env:
            - name: LOG_LEVEL
              value: debug
            - name: PORT
              value: "12345"

# service.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
  labels:
    app: gpushare
    component: gpushare-schd-extender
spec:
  type: NodePort
  ports:
    - port: 12345
      name: http
      targetPort: 12345
      nodePort: 32766
  selector:
    app: gpushare
    component: gpushare-schd-extender
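
Before moving on, it is worth checking that the extender pod is up, using the labels defined in the Deployment above:

# The extender should be Running on a control-plane node
kubectl get pods -n kube-system -l app=gpushare,component=gpushare-schd-extender -o wide
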
  2. Add the scheduler policy configuration file under /etc/kubernetes
# scheduler-policy-config.yaml
---
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
extenders:
  # For unclear reasons, calling the extender via its in-cluster Service address
  # does not work; the NodePort must be used instead (see the note below)
  - urlPrefix: "http://gpushare-schd-extender.kube-system:12345/gpushare-scheduler"
    filterVerb: filter
    bindVerb: bind
    enableHTTPS: false
    nodeCacheCapable: true
    managedResources:
      - name: aliyun.com/gpu-mem
        ignoredByScheduler: false
    ignorable: false

Note: replace http://gpushare-schd-extender.kube-system:12345 above with {nodeIP}:{nodePort of the gpushare-schd-extender Service} from your deployment, otherwise the scheduler will not be able to reach the extender.

You can look up the NodePort with:

kubectl get service gpushare-schd-extender -n kube-system -o jsonpath='{.spec.ports[?(@.name=="http")].nodePort}' 
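
For example, assuming a (hypothetical) node IP of 192.168.0.10 and the nodePort 32766 defined in the Service above, the extender entry would become:

# 192.168.0.10 is a placeholder for your node's IP; 32766 is the nodePort from service.yaml
- urlPrefix: "http://192.168.0.10:32766/gpushare-scheduler"
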
  3. Modify the Kubernetes scheduler configuration in /etc/kubernetes/manifests/kube-scheduler.yaml
1. Add the following flag under command:
   - --config=/etc/kubernetes/scheduler-policy-config.yaml

2. Mount the policy file into the pod.
   Under volumeMounts: add:
   - mountPath: /etc/kubernetes/scheduler-policy-config.yaml
     name: scheduler-policy-config
     readOnly: true
   Under volumes: add:
   - hostPath:
       path: /etc/kubernetes/scheduler-policy-config.yaml
       type: FileOrCreate
     name: scheduler-policy-config

Note: be very careful when editing this file; a mistake here can cause puzzling scheduler failures.
An example of the edited manifest is sketched below (the original post showed screenshots here).
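
This sketch assumes a standard kubeadm-generated kube-scheduler.yaml; only the lines marked "# added" are new, and the surrounding flags, mounts, and volumes will differ in your cluster:

apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
    - command:
        - kube-scheduler
        - --kubeconfig=/etc/kubernetes/scheduler.conf
        - --leader-elect=true
        - --config=/etc/kubernetes/scheduler-policy-config.yaml   # added
      # ... image, probes, and other fields unchanged ...
      volumeMounts:
        - mountPath: /etc/kubernetes/scheduler.conf
          name: kubeconfig
          readOnly: true
        - mountPath: /etc/kubernetes/scheduler-policy-config.yaml # added
          name: scheduler-policy-config                           # added
          readOnly: true                                          # added
  volumes:
    - hostPath:
        path: /etc/kubernetes/scheduler.conf
        type: FileOrCreate
      name: kubeconfig
    - hostPath:                                                   # added
        path: /etc/kubernetes/scheduler-policy-config.yaml        # added
        type: FileOrCreate                                        # added
      name: scheduler-policy-config                               # added

Because kube-scheduler runs as a static pod, the kubelet restarts it automatically as soon as this file is saved.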

  4. Configure RBAC and install the device plugin
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml

5. Label the GPU Nodes

kubectl label node <target_node> gpushare=true 
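
The device plugin DaemonSet targets nodes carrying this label, so after labeling you should see one device-plugin pod per GPU node:

# One gpushare-device-plugin pod should appear on each labeled GPU node
kubectl get pods -n kube-system -o wide | grep gpushare-device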

6. Installing the kubectl GPU Plugin

cd /usr/bin/
wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
chmod u+x /usr/bin/kubectl-inspect-gpushare

7. Verification

  1. Query GPU usage with kubectl
# kubectl inspect gpushare
NAME                                IPADDRESS     GPU0(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.i-uf61h64dz1tmlob9hmtb  192.168.0.71  6/15                   6/15
cn-shanghai.i-uf61h64dz1tmlob9hmtc  192.168.0.70  3/15                   3/15
------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster: 9/30 (30%)
  2. Create a workload that requests GPU memory and observe how it is scheduled
apiVersion: apps/v1
kind: Deployment
metadata:
  name: binpack-1
  labels:
    app: binpack-1
spec:
  replicas: 1
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: binpack-1
  template: # define the pod specification
    metadata:
      labels:
        app: binpack-1
    spec:
      tolerations:
        - effect: NoSchedule
          key: cloudClusterNo
          operator: Exists
      containers:
        - name: binpack-1
          image: cheyang/gpu-player:v2
          resources:
            limits:
              # unit: GiB
              aliyun.com/gpu-mem: 3
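
After saving this manifest (the file name binpack-1.yaml below is arbitrary), apply it and re-run the inspection command; the cluster-wide allocation should grow by 3 GiB on whichever node the pod lands:

kubectl apply -f binpack-1.yaml
kubectl inspect gpushare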

8. Troubleshooting

If any component fails to come up during installation, check the device-plugin pod logs:

kubectl get po -n kube-system -o=wide | grep gpushare-device
kubectl logs -n kube-system <pod_name>
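
Because the kube-scheduler manifest was modified in step 4, it is also worth checking the scheduler's own logs if pods requesting aliyun.com/gpu-mem stay Pending:

kubectl get po -n kube-system | grep kube-scheduler
kubectl logs -n kube-system <scheduler_pod_name>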

References:
NVIDIA Container Toolkit installation guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
Alibaba Cloud GPU share scheduler extender installation guide: https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/install.md
