Enabling MIG Features

This section describes how to enable NVIDIA Multi-Instance GPU (MIG) features. NVIDIA currently provides two strategies for exposing MIG devices on Kubernetes nodes:

  • Single mode: Nodes expose a single type of MIG device on all of their GPUs.
  • Mixed mode: Nodes expose a mixture of MIG device types across their GPUs.

For more details, refer to the NVIDIA GPU Usage Modes.
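In practice, the strategy determines which extended resource name pods request. The following is a hypothetical helper (mig_resource_name is not an NVIDIA API), assuming the naming used by the upstream NVIDIA k8s-device-plugin: single mode advertises MIG devices under the generic nvidia.com/gpu name, while mixed mode advertises one resource per profile as nvidia.com/mig-<profile>.

```python
def mig_resource_name(strategy: str, profile: str) -> str:
    """Return the resource name a pod would request for a MIG device.

    Hypothetical helper; assumes upstream k8s-device-plugin naming.
    single: all MIG devices on the node share one profile, so they are
            advertised under the generic GPU resource name.
    mixed:  each profile gets its own resource, nvidia.com/mig-<profile>.
    """
    if strategy == "single":
        return "nvidia.com/gpu"
    if strategy == "mixed":
        return f"nvidia.com/mig-{profile}"
    raise ValueError(f"unknown MIG strategy: {strategy!r}")


if __name__ == "__main__":
    for strategy in ("single", "mixed"):
        print(strategy, "->", mig_resource_name(strategy, "3g.40gb"))
```

Under single mode the profile argument is ignored, because the node exposes only one MIG device type.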

Prerequisites

  • Check that the target node meets the GPU driver installation requirements listed in the GPU Support Matrix.
  • Ensure that the cluster nodes have MIG-capable GPUs (NVIDIA H100, A100, or A30 Tensor Core GPUs). For more information, see the GPU Support Matrix.
  • All GPUs on a node must be the same product model (e.g., A100-SXM-40GB).
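The two model-related prerequisites can be checked on a node with a short script. This is a minimal sketch, assuming the output of nvidia-smi --query-gpu=name --format=csv,noheader (one GPU name per line) is passed in as a string; check_node_gpus and the model list are illustrative, and the substring match is only a rough approximation of the GPU Support Matrix.

```python
# Illustrative substrings for the MIG-capable models named above (not exhaustive).
MIG_CAPABLE_MODELS = ("H100", "A100", "A30")


def check_node_gpus(nvidia_smi_output: str) -> list:
    """Return a list of problems found; an empty list means both checks passed.

    Expects one GPU name per line, as printed by
    nvidia-smi --query-gpu=name --format=csv,noheader
    """
    names = [line.strip() for line in nvidia_smi_output.splitlines() if line.strip()]
    if not names:
        return ["no GPUs reported"]
    problems = []
    distinct = sorted(set(names))
    # Prerequisite: every GPU on the node must be the same model.
    if len(distinct) > 1:
        problems.append(f"mixed GPU models: {distinct}")
    # Prerequisite: the model must support MIG (rough substring match).
    for name in distinct:
        if not any(model in name for model in MIG_CAPABLE_MODELS):
            problems.append(f"model not in MIG-capable list: {name}")
    return problems


if __name__ == "__main__":
    sample = "NVIDIA A100-SXM4-80GB\nNVIDIA A100-SXM4-80GB\n"
    print(check_node_gpus(sample))  # []
```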

Install GPU Operator Addon

Parameter Configuration

When installing the Operator, set the MigManager Config parameter. It defaults to default-mig-parted-config; you can also supply a custom sharding policy configuration file:


Custom Sharding Policy

  ## Custom GPU instance (GI) configurations
  all-disabled:
    - devices: all
      mig-enabled: false
  all-enabled:
    - devices: all
      mig-enabled: true
      mig-devices: {}
  all-1g.10gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        1g.10gb: 7
  all-1g.10gb.me:
    - devices: all
      mig-enabled: true
      mig-devices:
        1g.10gb+me: 1
  all-1g.20gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        1g.20gb: 4
  all-2g.20gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        2g.20gb: 3
  all-3g.40gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        3g.40gb: 2
  all-4g.40gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        4g.40gb: 1
  all-7g.80gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        7g.80gb: 1
  all-balanced:
    - device-filter: ["0x233110DE", "0x232210DE", "0x20B210DE", "0x20B510DE", "0x20F310DE", "0x20F510DE"]
      devices: all
      mig-enabled: true
      mig-devices:
        1g.10gb: 2
        2g.20gb: 1
        3g.40gb: 1
  # custom-config: when this policy is selected, GPUs are partitioned as specified below
  custom-config:
    - devices: all
      mig-enabled: true
      mig-devices:
        3g.40gb: 2

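As a quick sanity check on a custom policy, the compute slices and memory of all instances listed under mig-devices must not exceed one GPU. The sketch below assumes an 80 GB GPU with 7 compute slices (A100-80GB or H100-80GB class, matching the profiles above); fits_on_gpu is a hypothetical helper, and because real MIG placement rules are stricter than a simple sum, a True result does not guarantee the layout is accepted.

```python
import re

# Assumption: an 80 GB GPU (A100-80GB / H100-80GB class) with 7 compute
# slices and 80 GB of memory. Adjust for other models.
TOTAL_COMPUTE_SLICES = 7
TOTAL_MEMORY_GB = 80

# GPU-instance profile names such as "3g.40gb" or "1g.10gb+me".
GI_PROFILE_RE = re.compile(r"^(\d+)g\.(\d+)gb(?:\+me)?$")


def fits_on_gpu(mig_devices: dict) -> bool:
    """Necessary (not sufficient) check that a mig-devices mapping fits one GPU.

    Real MIG placement rules are stricter than a simple sum, so True does not
    guarantee the layout is valid; False means it cannot fit on this GPU.
    """
    compute = memory = 0
    for profile, count in mig_devices.items():
        match = GI_PROFILE_RE.match(profile)
        if match is None:
            raise ValueError(f"unrecognized GPU-instance profile: {profile!r}")
        compute += int(match.group(1)) * count
        memory += int(match.group(2)) * count
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_GB


if __name__ == "__main__":
    # The all-balanced policy: 2 + 2 + 3 = 7 slices and 20 + 20 + 40 = 80 GB.
    print(fits_on_gpu({"1g.10gb": 2, "2g.20gb": 1, "3g.40gb": 1}))  # True
    print(fits_on_gpu({"3g.40gb": 3}))  # False
```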
In the YAML above, custom-config defines a user-specified partitioning. To partition compute instances (CIs) within GPU instances, list CI profiles under custom-config instead, for example:

  custom-config:
    - devices: all
      mig-enabled: true
      mig-devices:
        1c.3g.40gb: 6

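A compute-instance profile such as 1c.3g.40gb reads as a 1-slice compute instance inside a 3g.40gb GPU instance. The arithmetic behind the count of 6 can be sketched as follows, assuming each GPU instance is split evenly into compute instances of the requested size: a 3g GPU instance then holds three 1c compute instances, so six of them need two 3g.40gb GPU instances. gpu_instances_needed is a hypothetical helper, not part of any NVIDIA tool.

```python
import math
import re

# "<c>c.<g>g.<mem>gb": a compute instance of <c> slices inside a
# GPU instance of <g> slices and <mem> GB of memory.
CI_PROFILE_RE = re.compile(r"^(\d+)c\.(\d+)g\.(\d+)gb$")


def gpu_instances_needed(profile: str, count: int) -> int:
    """Estimate how many GPU instances `count` compute instances require,
    assuming every GPU instance is split evenly into CIs of this size."""
    match = CI_PROFILE_RE.match(profile)
    if match is None:
        raise ValueError(f"not a compute-instance profile: {profile!r}")
    ci_slices, gi_slices = int(match.group(1)), int(match.group(2))
    if ci_slices > gi_slices:
        raise ValueError("compute instance is larger than its GPU instance")
    cis_per_gi = gi_slices // ci_slices
    return math.ceil(count / cis_per_gi)


if __name__ == "__main__":
    print(gpu_instances_needed("1c.3g.40gb", 6))  # 2
```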
After completing these settings, you can select GPU MIG resources when deploying an application.
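For illustration, a workload requests a MIG device through its container resource limits. The sketch below builds a minimal Pod manifest as a Python dict and prints it as JSON, which kubectl apply -f accepts. It assumes Mixed mode, where each profile is advertised as nvidia.com/mig-<profile>; the Pod name, image, and helper name are placeholders.

```python
import json


def mig_pod_manifest(name: str, image: str, resource: str, count: int = 1) -> dict:
    """Build a minimal Pod that requests `count` units of a MIG resource.

    Hypothetical helper; `resource` is assumed to be a Mixed-mode name such
    as "nvidia.com/mig-3g.40gb" (Single mode would use "nvidia.com/gpu").
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # List the GPU devices visible inside the container.
                    "command": ["nvidia-smi", "-L"],
                    # Extended resources such as MIG devices are requested via limits.
                    "resources": {"limits": {resource: count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = mig_pod_manifest(
        "mig-demo",  # placeholder Pod name
        "nvidia/cuda:12.2.0-base-ubuntu22.04",  # placeholder image
        "nvidia.com/mig-3g.40gb",
    )
    print(json.dumps(manifest, indent=2))
```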

Switch Node GPU Mode

After the GPU Operator is installed, nodes are in full-card mode by default, and the node management page shows an indicator of each node's current GPU mode.


Click the action icon on the right side of the node list, choose the option to switch the GPU mode, and then select the MIG mode and sharding policy. The following uses Mixed mode as an example.


There are two settings:

  1. MIG Policy: Mixed or Single.
  2. Sharding Policy: Must match a key in the default-mig-parted-config configuration file (or in your custom sharding policy file).
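The sharding policy rule above can be expressed directly in code. With the upstream NVIDIA GPU Operator, the MIG manager reads the selected key from the node label nvidia.com/mig.config; whether this UI sets that same label is an assumption. The sketch below (mig_config_label is hypothetical) rejects keys that are not defined in the configuration and returns the label to apply; the key set mirrors the example YAML in this section.

```python
# Policy keys from the example MIG configuration in this section.
AVAILABLE_POLICIES = frozenset({
    "all-disabled", "all-enabled", "all-1g.10gb", "all-1g.10gb.me",
    "all-1g.20gb", "all-2g.20gb", "all-3g.40gb", "all-4g.40gb",
    "all-7g.80gb", "all-balanced", "custom-config",
})


def mig_config_label(policy: str, available=AVAILABLE_POLICIES) -> dict:
    """Return the node label selecting `policy`, rejecting undefined keys."""
    if policy not in available:
        raise KeyError(f"sharding policy {policy!r} is not defined in the MIG config")
    return {"nvidia.com/mig.config": policy}


if __name__ == "__main__":
    print(mig_config_label("all-balanced"))  # {'nvidia.com/mig.config': 'all-balanced'}
```

On a cluster managed by the upstream GPU Operator, the equivalent manual step is kubectl label nodes <node-name> nvidia.com/mig.config=all-balanced --overwrite.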

After you click OK, wait about a minute and refresh the page. The node's GPU mode then switches to the selected MIG mode.
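Instead of refreshing by hand, the wait can be expressed as a poll with a timeout. This is a sketch under assumptions: get_state is a hypothetical callable returning the value of the nvidia.com/mig.config.state node label, which the upstream GPU Operator's MIG manager sets to values such as pending, success, or failed; the 120-second default simply gives headroom over the roughly one-minute wait mentioned above.

```python
import time
from typing import Callable, Optional


def wait_for_mig_state(
    get_state: Callable[[], Optional[str]],
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
) -> str:
    """Poll until the MIG reconfiguration reaches a terminal state.

    `get_state` is a hypothetical callable that reads the node label
    nvidia.com/mig.config.state (for example via the Kubernetes API).
    Returns "success" or "failed"; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in ("success", "failed"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"MIG reconfiguration still {state!r} after {timeout_s}s")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Simulated label values standing in for real API reads.
    states = iter(["pending", "pending", "success"])
    print(wait_for_mig_state(lambda: next(states), interval_s=0))  # success
```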