link icon replaced

govardhan
2025-06-19 14:09:10 +05:30
parent 60adbde60c
commit 172f8e2b34
158 changed files with 996 additions and 996 deletions


@@ -1,4 +1,4 @@
-Deploying vGPU workloads on CloudFerro Cloud Kubernetes[](#deploying-vgpu-workloads-on-brand-name-kubernetes "Permalink to this headline")
+Deploying vGPU workloads on CloudFerro Cloud Kubernetes[🔗](#deploying-vgpu-workloads-on-brand-name-kubernetes "Permalink to this headline")
===========================================================================================================================================
Utilizing GPUs (Graphics Processing Units) presents a highly efficient alternative for fast, highly parallel processing of demanding computational tasks such as image processing, machine learning and many others.
@@ -7,7 +7,7 @@ In cloud environment, virtual GPU units (vGPU) are available with certain Virtua
We will present three alternative ways of adding vGPU capability to your Kubernetes cluster, depending on your scenario. For each, you should be able to verify the vGPU installation and test it by running a vGPU workload.
-What Are We Going To Cover[](#what-are-we-going-to-cover "Permalink to this headline")
+What Are We Going To Cover[🔗](#what-are-we-going-to-cover "Permalink to this headline")
---------------------------------------------------------------------------------------
> * **Scenario No. 1** - Add vGPU nodes as a nodegroup on non-GPU Kubernetes clusters created **after** June 21st 2023
@@ -17,7 +17,7 @@ What Are We Going To Cover[](#what-are-we-going-to-cover "Permalink to this h
> * Test vGPU workload
> * Add non-GPU nodegroup to a GPU-first cluster
-Prerequisites[](#prerequisites "Permalink to this headline")
+Prerequisites[🔗](#prerequisites "Permalink to this headline")
-------------------------------------------------------------
No. 1 **Hosting**
@@ -44,7 +44,7 @@ No. 4 **Familiarity with the notion of nodegroups**
[Creating Additional Nodegroups in Kubernetes Cluster on CloudFerro Cloud OpenStack Magnum](Creating-Additional-Nodegroups-in-Kubernetes-Cluster-on-CloudFerro-Cloud-OpenStack-Magnum.html.md).
-vGPU flavors per cloud[](#vgpu-flavors-per-cloud "Permalink to this headline")
+vGPU flavors per cloud[🔗](#vgpu-flavors-per-cloud "Permalink to this headline")
-------------------------------------------------------------------------------
Below is the list of GPU flavors in each cloud, applicable for use with the Magnum Kubernetes service.
@@ -80,7 +80,7 @@ FRA1-2
> | **vm.l40s.2** | 8 | 29.8 GB | 80 GB | Yes |
> | **vm.l40s.8** | 32 | 119.22 GB | 320 GB | Yes |
-Hardware comparison between RTX A6000 and NVIDIA L40S[](#hardware-comparison-between-rtx-a6000-and-nvidia-l40s "Permalink to this headline")
+Hardware comparison between RTX A6000 and NVIDIA L40S[🔗](#hardware-comparison-between-rtx-a6000-and-nvidia-l40s "Permalink to this headline")
---------------------------------------------------------------------------------------------------------------------------------------------
The NVIDIA L40S is designed for 24x7 enterprise data center operations and optimized to deploy at scale. As compared to A6000, NVIDIA L40S is better for
@@ -90,7 +90,7 @@ The NVIDIA L40S is designed for 24x7 enterprise data center operations and optim
> * real-time ray tracing applications and is
> * faster in memory-intensive tasks.
-Table 1 Comparison of NVIDIA RTX A6000 vs NVIDIA L40S[](#id1 "Permalink to this table")
+Table 1 Comparison of NVIDIA RTX A6000 vs NVIDIA L40S[🔗](#id1 "Permalink to this table")
| Specification | NVIDIA RTX A6000¹ | NVIDIA L40S¹ |
| --- | --- | --- |
@@ -103,7 +103,7 @@ Table 1 Comparison of NVIDIA RTX A6000 vs NVIDIA L40S[](#id1 "Permalink to th
| **Performance** | Strong performance for diverse workloads | Superior AI and machine learning performance |
| **Use Cases** | 3D rendering, video editing, AI development | Data center, large-scale AI, enterprise applications |
-Scenario 1 - Add vGPU nodes as a nodegroup on a non-GPU Kubernetes clusters created after June 21st 2023[](#scenario-1-add-vgpu-nodes-as-a-nodegroup-on-a-non-gpu-kubernetes-clusters-created-after-june-21st-2023 "Permalink to this headline")
+Scenario 1 - Add vGPU nodes as a nodegroup on a non-GPU Kubernetes clusters created after June 21st 2023[🔗](#scenario-1-add-vgpu-nodes-as-a-nodegroup-on-a-non-gpu-kubernetes-clusters-created-after-june-21st-2023 "Permalink to this headline")
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
In order to create a new nodegroup called **gpu**, with one node of a vGPU flavor such as **vm.a6000.2**, we can use the following Magnum CLI command:
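The command itself is elided by the surrounding hunks; as a non-authoritative sketch using the standard `openstack coe nodegroup create` syntax (the cluster identifier below is a placeholder, not taken from this document):

```shell
# Sketch only: $CLUSTER_ID is a placeholder for your cluster's name or UUID.
# Creates a nodegroup named "gpu" with a single node of flavor vm.a6000.2.
openstack coe nodegroup create \
  --node-count 1 \
  --flavor vm.a6000.2 \
  --role worker \
  $CLUSTER_ID gpu
```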
@@ -140,7 +140,7 @@ We get:
The result is a new nodegroup called **gpu** in the cluster, using the GPU flavor.
-Scenario 2 - Add vGPU nodes as nodegroups on non-GPU Kubernetes clusters created before June 21st 2023[](#scenario-2-add-vgpu-nodes-as-nodegroups-on-non-gpu-kubernetes-clusters-created-before-june-21st-2023 "Permalink to this headline")
+Scenario 2 - Add vGPU nodes as nodegroups on non-GPU Kubernetes clusters created before June 21st 2023[🔗](#scenario-2-add-vgpu-nodes-as-nodegroups-on-non-gpu-kubernetes-clusters-created-before-june-21st-2023 "Permalink to this headline")
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The instructions are the same as in the previous scenario, with the exception of adding an additional label:
@@ -194,7 +194,7 @@ openstack coe nodegroup list $CLUSTER_ID_OLDER --max-width 120
```
-Scenario 3 - Create a new GPU-first Kubernetes cluster with vGPU-enabled default nodegroup[](#scenario-3-create-a-new-gpu-first-kubernetes-cluster-with-vgpu-enabled-default-nodegroup "Permalink to this headline")
+Scenario 3 - Create a new GPU-first Kubernetes cluster with vGPU-enabled default nodegroup[🔗](#scenario-3-create-a-new-gpu-first-kubernetes-cluster-with-vgpu-enabled-default-nodegroup "Permalink to this headline")
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
To create a new vGPU-enabled cluster, you can use the usual Horizon commands, selecting one of the existing templates with **vgpu** in their names:
@@ -226,7 +226,7 @@ openstack coe cluster create k8s-gpu-with_template \
```
-### Verify the vGPU installation[](#verify-the-vgpu-installation "Permalink to this headline")
+### Verify the vGPU installation[🔗](#verify-the-vgpu-installation "Permalink to this headline")
You can verify that vGPU-enabled nodes were properly added to your cluster by checking the **nvidia-device-plugin** deployed to the **nvidia-device-plugin** namespace. The command to list the contents of that namespace is:
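The exact command follows in the elided part of the file; a hedged sketch, assuming the namespace name stated in the prose (adjust it if your deployment differs):

```shell
# Sketch: lists pods, daemonsets, and other resources in the namespace
# where the NVIDIA device plugin is assumed to run.
kubectl get all -n nvidia-device-plugin
```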
@@ -282,7 +282,7 @@ kubectl describe node k8s-gpu-with-template-lfs5335ymxcn-node-0 | grep 'Taints'
```
-### Run test vGPU workload[](#run-test-vgpu-workload "Permalink to this headline")
+### Run test vGPU workload[🔗](#run-test-vgpu-workload "Permalink to this headline")
We can run a sample workload on vGPU. To do so, create a YAML manifest file **vgpu-pod.yaml**, with the following contents:
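The manifest body is elided here; a minimal sketch of such a test pod, assuming the standard NVIDIA device plugin resource name `nvidia.com/gpu` (the image name below is a hypothetical CUDA sample, not necessarily the one the document uses):

```yaml
# Hypothetical vgpu-pod.yaml sketch; the document's actual manifest may differ.
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      # Assumed CUDA sample image that prints "Done" on success.
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1
      resources:
        limits:
          nvidia.com/gpu: 1   # request one vGPU from the device plugin
```

Apply it with `kubectl apply -f vgpu-pod.yaml` and inspect the result with `kubectl logs vgpu-test`.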
@@ -339,7 +339,7 @@ Done
```
-Add non-GPU nodegroup to a GPU-first cluster[](#add-non-gpu-nodegroup-to-a-gpu-first-cluster "Permalink to this headline")
+Add non-GPU nodegroup to a GPU-first cluster[🔗](#add-non-gpu-nodegroup-to-a-gpu-first-cluster "Permalink to this headline")
---------------------------------------------------------------------------------------------------------------------------
We refer to clusters created with the **worker\_type=gpu** flag as GPU-first clusters. For example, in a cluster created following Scenario No. 3, the default nodegroup consists of vGPU nodes.
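As a hedged sketch, adding a CPU-only nodegroup to such a cluster mirrors Scenario No. 1 but with a non-GPU flavor (the flavor, nodegroup name, and cluster identifier below are placeholders; your cloud may also require additional labels):

```shell
# Sketch only: replace the flavor with any non-GPU flavor available in
# your cloud, and $CLUSTER_ID with your GPU-first cluster's name or UUID.
openstack coe nodegroup create \
  --node-count 2 \
  --flavor eo1.large \
  --role worker \
  $CLUSTER_ID cpu-workers
```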