What is the correct way to enable MIG for the GPU card via GPU Operator? #1197
Comments
Hope the following will be helpful.
Refer to the manual and create the config profile based on your GPU model
to enable MIG on the node
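For illustration only, here is a rough sketch of the node-labelling step when MIG manager is enabled. It assumes MIG manager picks up the `nvidia.com/mig.config` node label; the node name `worker-0` and the profile name `all-1g.5gb` are placeholders, and the right profile depends on your GPU model and your own config.

```python
import shlex


def mig_label_command(node: str, profile: str) -> list[str]:
    """Build the kubectl command that requests a MIG config for one node.

    Assumption: MIG manager is enabled and watches the
    nvidia.com/mig.config label. `node` and `profile` are placeholders.
    """
    return [
        "kubectl", "label", "node", node,
        f"nvidia.com/mig.config={profile}",
        "--overwrite",  # replace the label value if it is already set
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(mig_label_command("worker-0", "all-1g.5gb")))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without cluster access.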
Thanks @shan100github. I was trying to add the following so that I can use this configuration to enable all the cards with MIG only. But this method requires MIG manager to be enabled, right?
For my use case, the exact MIG profile is to be managed by a 3rd-party application, which requires MIG manager to be disabled. So in this case, this method will not work, right?
@okyspace could you please share the 3rd-party application name?
Hi, I have an OpenShift cluster installed with the NVIDIA GPU Operator.
In this cluster, I need to set MIG manager to false and configure the GPU cards in the node to be MIG-enabled, while the MIG profiles are managed by a 3rd-party application. I was advised to run nvidia-smi -i -mig=1 directly on the node. I tried this and it seems to work, but after the node is rebooted, the NVIDIA pods on the node are stuck in the init stage. From the logs, it seems that the necessary labels are not added to the node by the GPU Operator, so the validation pods cannot complete their validations. As a result, all other pods are stuck at init.
Given the above, I suspect the issue might be in how we configure MIG, so I am trying to find the correct way to do so.
Looking for advice on whether there could be other possible issues.
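One way to narrow this down might be to check, after the reboot, whether each GPU still reports MIG mode as enabled. The sketch below is only an assumption-laden example: it relies on the `mig.mode.current` and `mig.mode.pending` fields of `nvidia-smi --query-gpu`, and it assumes the CSV output looks like `0, Enabled, Enabled`. The helper names are made up for this example.

```python
import subprocess

# Query the current and pending MIG mode for every GPU on the node.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,mig.mode.current,mig.mode.pending",
    "--format=csv,noheader",
]


def parse_mig_modes(csv_text: str) -> dict[int, tuple[str, str]]:
    """Map GPU index -> (current mode, pending mode) from the CSV output."""
    modes = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, current, pending = [field.strip() for field in line.split(",")]
        modes[int(index)] = (current, pending)
    return modes


def gpus_not_in_mig(modes: dict[int, tuple[str, str]]) -> list[int]:
    """Return the indices of GPUs whose current MIG mode is not 'Enabled'."""
    return sorted(i for i, (current, _) in modes.items() if current != "Enabled")


def read_mig_modes() -> dict[int, tuple[str, str]]:
    """Run nvidia-smi on the node and parse its output (needs the driver)."""
    out = subprocess.run(QUERY, check=True, capture_output=True, text=True).stdout
    return parse_mig_modes(out)
```

On the node itself, `gpus_not_in_mig(read_mig_modes())` would list any cards that are not in MIG mode after the reboot, which helps separate a MIG-mode problem from a labelling problem.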