MetaX GPU Component Installation and Usage
This chapter provides installation guidance for MetaX's gpu-extensions, gpu-operator, and other components, as well as usage methods for both the full GPU and vGPU modes.
Prerequisites
- The required tar package has been downloaded and installed from the MetaX Software Center. This article uses metax-gpu-k8s-package.0.7.10.tar.gz as an example.
- A basic Kubernetes environment has been prepared.
Component Introduction
MetaX provides two Helm chart packages: metax-extensions and gpu-operator. Choose the one that matches your usage scenario.
- metax-extensions: Includes two components, gpu-device and gpu-label. With the metax-extensions solution, the application container image must be built on the MXMACA® base image. metax-extensions supports only full-GPU scenarios.
- gpu-operator: Includes components such as gpu-device, gpu-label, driver-manager, container-runtime, and operator-controller. When using the gpu-operator solution, users can choose to create application container images that do not include the MXMACA® SDK. The gpu-operator is suitable for both full GPU and vGPU scenarios.
Operation Steps
- Extract the following from the /home/metax/metax-docs/k8s/metax-gpu-k8s-package.0.7.10.tar.gz file:
  - deploy-gpu-extensions.yaml # Deployment YAML
  - metax-gpu-extensions-0.7.10.tgz, metax-operator-0.7.10.tgz # Helm chart files
  - metax-k8s-images.0.7.10.run # Offline image
- Check if the system has the driver installed:
  $ lsmod | grep metax
  metax 1605632 0
  ttm 86016 3 drm_vram_helper,metax,drm_ttm_helper
  drm 618496 7 drm_kms_helper,drm_vram_helper,ast,metax,drm_ttm_helper,ttm
  - If the command prints nothing, the driver has not been installed. If it prints output like the above, the driver has been installed.
  - When using metax-operator, it is not recommended to pre-install the MXMACA kernel driver on worker nodes; if it has already been installed, there is no need to uninstall it.
- Install the driver.
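The extraction and driver-check steps above can be sketched as a few shell helpers. This is a minimal sketch, not an official MetaX script: the function names are hypothetical, and it assumes the package is a gzip-compressed tar archive whose internal directory layout is unknown (hence the recursive search). Note that `lsmod | grep metax` also matches modules such as ttm and drm, because metax appears in their "used by" column; the helper below checks the module name column only.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the extraction and driver-check steps.
# Assumption: the package is a gzip-compressed tar archive.

# extract_package ARCHIVE DEST: unpack ARCHIVE into DEST (created if missing).
extract_package() {
  local archive="$1" dest="$2"
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# missing_files DEST NAME...: print every NAME not found anywhere under DEST.
missing_files() {
  local dest="$1" name
  shift
  for name in "$@"; do
    if [ -z "$(find "$dest" -name "$name" -print -quit)" ]; then
      echo "$name"
    fi
  done
}

# metax_module_loaded: read `lsmod` output on stdin and succeed only if a
# module named exactly "metax" appears in the first column.
metax_module_loaded() {
  awk '$1 == "metax" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example invocation (paths and file names taken from the steps above):
#   extract_package /home/metax/metax-docs/k8s/metax-gpu-k8s-package.0.7.10.tar.gz ./metax-pkg
#   missing_files ./metax-pkg deploy-gpu-extensions.yaml metax-k8s-images.0.7.10.run
#   lsmod | metax_module_loaded && echo "metax driver loaded"
```

If `missing_files` prints nothing, every listed file was found in the extracted tree.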
gpu-extensions
- Push the image:
- Push the Helm Chart:
- Install metax-gpu-extensions on the AI computing platform.
  After successful deployment, resources can be viewed on the node.
- After successful modification, you can see the label with Metax GPU on the node.
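One way to check the node resources and labels mentioned above from the command line is to query the node object with kubectl and filter for entries that mention "metax". This is a hedged sketch: the helper name and the node name are hypothetical, the exact resource and label keys depend on the installed chart version, and the line splitting assumes a kubectl version that prints map fields as JSON.

```shell
#!/usr/bin/env bash
# Sketch only. KUBECTL may be overridden (for example with a wrapper script);
# it defaults to the kubectl binary on PATH.
KUBECTL="${KUBECTL:-kubectl}"

# node_field_metax NODE JSONPATH: print the entries of a node field that
# mention "metax" (case-insensitive). Fails if nothing matches.
node_field_metax() {
  local node="$1" path="$2"
  "$KUBECTL" get node "$node" -o "jsonpath=$path" | tr ',' '\n' | grep -i metax
}

# Example invocation (gpu-node-1 is a hypothetical node name):
#   node_field_metax gpu-node-1 '{.status.allocatable}'   # schedulable resources
#   node_field_metax gpu-node-1 '{.metadata.labels}'      # node labels
```

A non-zero exit status means no matching entry was found, which usually indicates that the device plugin or the labeling component is not running on that node.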
gpu-operator
Known issues when installing gpu-operator:
- The images for the components metax-operator, gpu-label, gpu-device, and container-runtime must have the amd64 suffix.
- The image for the metax-maca component is not included in the metax-k8s-images.0.7.13.run package and needs to be downloaded separately, such as maca-mxc500-2.23.0.23-ubuntu20.04-x86_64.tar.xz. After loading it, the image address for the metax-maca component needs to be modified as well.
- The image for the metax-driver component needs to be downloaded from https://pub-docstore.metax-tech.com:7001 as the k8s-driver-image.2.23.0.25.run file. Then run k8s-driver-image.2.23.0.25.run push {registry}/metax to push the image to the image repository. After pushing, modify the image address for the metax-driver component.
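For the separately downloaded metax-maca image, the load, retag, and push flow might look like the sketch below. It assumes the downloaded archive is a Docker image archive (docker load accepts xz-compressed tarballs) and that Docker is the local image tool; `retarget_image` is a hypothetical helper, and the image name printed by docker load is not known here, so it is left as a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: retarget_image is a hypothetical helper, not a MetaX tool.

# retarget_image IMAGE REGISTRY: print IMAGE re-pointed at REGISTRY, keeping
# only the final "name:tag" component of the original reference.
retarget_image() {
  local image="$1" registry="$2"
  echo "${registry%/}/${image##*/}"
}

# Example flow (assumes the archive is a Docker image archive):
#   docker load -i maca-mxc500-2.23.0.23-ubuntu20.04-x86_64.tar.xz
#   src="<name:tag printed by docker load>"
#   dst=$(retarget_image "$src" "{registry}/metax")
#   docker tag "$src" "$dst"
#   docker push "$dst"
# The value of $dst is then the image address to set for the metax-maca component.
```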
Using GPU
After installation, you can use MetaX GPUs in workloads. Note that after enabling the GPU for a workload, you need to select Metax GPU as the GPU type.
Enter the container and execute mx-smi to view the GPU usage.
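If you prefer not to open an interactive shell, mx-smi can also be run through kubectl exec. This is a minimal sketch: the helper name, pod name, and namespace are hypothetical, and it assumes mx-smi is on the PATH inside the container.

```shell
#!/usr/bin/env bash
# Sketch only. KUBECTL may be overridden; it defaults to kubectl on PATH.
KUBECTL="${KUBECTL:-kubectl}"

# run_mx_smi POD [NAMESPACE]: run mx-smi inside POD and print its output.
# NAMESPACE defaults to "default".
run_mx_smi() {
  local pod="$1" namespace="${2:-default}"
  "$KUBECTL" exec -n "$namespace" "$pod" -- mx-smi
}

# Example invocation (my-gpu-pod is a hypothetical pod name):
#   run_mx_smi my-gpu-pod
```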