Integer halves shuffle pattern produces worse codegen with canonical IR #122425
Author: Matt Arsenault (arsenm)
While investigating regressions related to lowering shufflevector as integer operations, I noticed that this code produces worse codegen when the IR is in canonical form.

On older AMDGPU targets without v_perm_b32, the canonical form costs one extra instruction. On targets with v_perm_b32 the net result is neutral, though I still prefer the non-canonical output: it takes fewer steps in codegen because it doesn't rely on the SDWA pass to clean up an extra instruction. X86 also emits one additional instruction in the canonical case.
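The reproducer IR is not captured here. As a hedged illustration of what an "integer halves" shuffle means, the Python sketch below models a `shufflevector` on `<2 x i16>` as operations on a single 32-bit integer, assuming little-endian lane order (lane 0 in the low 16 bits). The helper names (`shuffle`, `halves_and_or`, `halves_shift`) are hypothetical, and the two integer forms are generic examples, not the specific canonical and non-canonical IR compared in this issue.

```python
import random

MASK16 = 0xFFFF


def to_lanes(x: int) -> list[int]:
    """Split a 32-bit integer into two i16 lanes (assumed: lane 0 = low half)."""
    return [x & MASK16, (x >> 16) & MASK16]


def from_lanes(lanes: list[int]) -> int:
    """Pack two i16 lanes back into a 32-bit integer, lane 0 in the low half."""
    return (lanes[0] & MASK16) | ((lanes[1] & MASK16) << 16)


def shuffle(a: int, b: int, mask: tuple[int, int]) -> int:
    """Model `shufflevector <2 x i16> a, <2 x i16> b, mask`.

    Mask indices 0-1 select lanes of a, indices 2-3 select lanes of b.
    """
    pool = to_lanes(a) + to_lanes(b)
    return from_lanes([pool[i] for i in mask])


def halves_and_or(a: int, b: int) -> int:
    """Integer form of mask <0, 3>: keep the low half of a and the high half of b."""
    return (a & 0x0000FFFF) | (b & 0xFFFF0000)


def halves_shift(a: int, b: int) -> int:
    """Integer form of mask <1, 2>: move a's high half down and b's low half up."""
    return ((a >> 16) & MASK16) | ((b << 16) & 0xFFFFFFFF)


if __name__ == "__main__":
    # Randomized check that the lane shuffle and the integer forms agree.
    rng = random.Random(0)
    for _ in range(1000):
        a, b = rng.getrandbits(32), rng.getrandbits(32)
        assert shuffle(a, b, (0, 3)) == halves_and_or(a, b)
        assert shuffle(a, b, (1, 2)) == halves_shift(a, b)
    print(hex(shuffle(0xAAAA1111, 0xBBBB2222, (0, 3))))  # 0xbbbb1111
```

Each integer form computes exactly the same bits as its lane shuffle. The issue's point is that which of several equivalent integer forms the IR ends up in affects how well the AMDGPU and X86 backends select instructions for it.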