Enabling MIG Features
This section describes how to enable NVIDIA MIG features. NVIDIA currently provides two strategies for exposing MIG devices on Kubernetes nodes:
- Single mode: Nodes expose a single type of MIG device on all their GPUs.
- Mixed mode: Nodes expose a mixture of MIG device types on all their GPUs.
For more details, refer to the NVIDIA GPU Usage Modes.
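As a rough illustration of the difference, the fragments below sketch what a node's `status.allocatable` might look like under each strategy. They assume a node with a single 80 GB GPU and the upstream NVIDIA device plugin resource naming (`nvidia.com/gpu` in single mode, `nvidia.com/mig-<profile>` in mixed mode). The counts are illustrative and not taken from a real cluster.

```yaml
# Single mode (assumed: one GPU partitioned into two 3g.40gb devices).
# Every MIG device is advertised under the ordinary GPU resource name.
status:
  allocatable:
    nvidia.com/gpu: "2"
---
# Mixed mode (assumed: one GPU partitioned into 2x 1g.10gb, 1x 2g.20gb, 1x 3g.40gb).
# Each MIG profile is advertised as its own resource.
status:
  allocatable:
    nvidia.com/mig-1g.10gb: "2"
    nvidia.com/mig-2g.20gb: "1"
    nvidia.com/mig-3g.40gb: "1"
```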
Prerequisites
- Check the system requirements for the GPU driver installation on the target node: GPU Support Matrix
- Ensure that the cluster nodes have GPUs of a supported model (NVIDIA H100, A100, or A30 Tensor Core GPUs). For more information, see the GPU Support Matrix.
- All GPUs on the nodes must belong to the same product line (e.g., A100-SXM-40GB).
Install GPU Operator Addon
Parameter Configuration
When installing the Operator, set the MigManager Config parameter to the name of the sharding policy configuration to use. The default is default-mig-parted-config. You can also supply a customized sharding policy configuration file, as described in the next section.
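If the addon forwards its values to the upstream NVIDIA GPU Operator Helm chart (an assumption; the addon form may name these fields differently), the MigManager Config parameter would correspond roughly to the following values:

```yaml
# Sketch of upstream GPU Operator chart values (assumed mapping, not confirmed for this addon).
mig:
  strategy: mixed                       # "single" or "mixed"
migManager:
  enabled: true
  config:
    name: default-mig-parted-config     # ConfigMap holding the sharding policies
```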
Custom Sharding Policy
## Custom GPU Instance (GI) configuration
all-disabled:
  - devices: all
    mig-enabled: false
all-enabled:
  - devices: all
    mig-enabled: true
    mig-devices: {}
all-1g.10gb:
  - devices: all
    mig-enabled: true
    mig-devices:
      1g.10gb: 7
all-1g.10gb.me:
  - devices: all
    mig-enabled: true
    mig-devices:
      1g.10gb+me: 1
all-1g.20gb:
  - devices: all
    mig-enabled: true
    mig-devices:
      1g.20gb: 4
all-2g.20gb:
  - devices: all
    mig-enabled: true
    mig-devices:
      2g.20gb: 3
all-3g.40gb:
  - devices: all
    mig-enabled: true
    mig-devices:
      3g.40gb: 2
all-4g.40gb:
  - devices: all
    mig-enabled: true
    mig-devices:
      4g.40gb: 1
all-7g.80gb:
  - devices: all
    mig-enabled: true
    mig-devices:
      7g.80gb: 1
all-balanced:
  - device-filter: ["0x233110DE", "0x232210DE", "0x20B210DE", "0x20B510DE", "0x20F310DE", "0x20F510DE"]
    devices: all
    mig-enabled: true
    mig-devices:
      1g.10gb: 2
      2g.20gb: 1
      3g.40gb: 1
# Custom policy: MIG instances are partitioned according to the specification below
custom-config:
  - devices: all
    mig-enabled: true
    mig-devices:
      3g.40gb: 2
In the YAML above, the custom-config entry defines a custom policy. Once it is selected, each GPU is partitioned according to its specification (here, two 3g.40gb instances per GPU).
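A user-defined policy file is typically delivered as a ConfigMap. The sketch below follows the layout used by the upstream NVIDIA MIG manager (`version` and `mig-configs` under a `config.yaml` key). The ConfigMap name `custom-mig-parted-config`, the `gpu-operator` namespace, and the `split-by-gpu` policy are hypothetical, and the per-GPU index lists assume a node with at least two 80 GB GPUs.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-mig-parted-config   # hypothetical name
  namespace: gpu-operator          # assumed namespace
data:
  config.yaml: |
    version: v1
    mig-configs:
      all-disabled:
        - devices: all
          mig-enabled: false
      # Same policy as the custom-config entry shown earlier
      custom-config:
        - devices: all
          mig-enabled: true
          mig-devices:
            3g.40gb: 2
      # Hypothetical policy: a different layout per GPU index
      split-by-gpu:
        - devices: [0]
          mig-enabled: true
          mig-devices:
            3g.40gb: 2
        - devices: [1]
          mig-enabled: false
```

The MigManager Config parameter would then point at this ConfigMap's name instead of default-mig-parted-config.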
Once these settings are in place, you can request GPU MIG resources when you deploy an application.
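For example, in mixed mode a workload could request one of the 3g.40gb devices produced by custom-config. This is a minimal sketch assuming the upstream `nvidia.com/mig-<profile>` resource naming; the Pod name and container image are placeholders. In single mode the request would use `nvidia.com/gpu: 1` instead.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-example                                # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image
      command: ["nvidia-smi", "-L"]                # lists the devices visible to the container
      resources:
        limits:
          nvidia.com/mig-3g.40gb: 1                # one 3g.40gb MIG device (mixed mode)
```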
Switch Node GPU Mode
After the GPU Operator is installed successfully, nodes are in full card mode by default, and the node management page displays an indicator of the current mode.
Click the ┇ on the right side of the node in the node list, choose to switch the GPU mode, and then select the MIG mode and sharding policy. This example uses Mixed mode.
There are two configurations here:
- MIG Policy: Mixed or Single.
- Sharding Policy: The value must match a key in the default-mig-parted-config (or user-defined sharding policy) configuration file, for example all-balanced or custom-config.
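Under the upstream GPU Operator, the selected policy is applied through node labels, and the platform presumably sets them on your behalf (an assumption; the UI internals are not described here). A node switched to the custom-config policy would carry labels along these lines:

```yaml
# Fragment of a Node object (illustrative)
metadata:
  labels:
    nvidia.com/mig.config: custom-config      # must match a key in the policy file
    nvidia.com/mig.config.state: success      # set by the MIG manager once reconfiguration finishes
```

If the state label does not reach `success`, a policy key that is missing from the configuration file is one likely cause.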
After clicking the OK button, wait about a minute and refresh the page. The node then shows the MIG mode you selected.