-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[X86] Possible missed optimization with truncating vector cast #81883
Labels
Comments
@llvm/issue-subscribers-backend-x86 Author: Collyn O'Kane (okaneco)
This function produces the following assembly
https://llvm.godbolt.org/z/q8YEhevbo
```llvm
define void @cast_i16x4_to_u8x4(ptr sret(<4 x i8>) %_0, ptr %x) unnamed_addr #0 {
%1 = load <4 x i16>, ptr %x
%2 = trunc <4 x i16> %1 to <4 x i8>
store <4 x i8> %2, ptr %_0
ret void
}
```
```asm
cast_i16x4_to_u8x4: # @cast_i16x4_to_u8x4
mov rax, rdi
mov rcx, qword ptr [rsi]
movq xmm0, rcx
movdqa xmmword ptr [rsp - 24], xmm0
movzx ecx, cl
movzx edx, byte ptr [rsp - 22]
shl edx, 8
or edx, ecx
movzx ecx, byte ptr [rsp - 20]
shl ecx, 16
or ecx, edx
movzx edx, byte ptr [rsp - 18]
shl edx, 24
or edx, ecx
mov dword ptr [rdi], edx
ret
```
but I expected something like this instead.
```asm
.LCPI0_0:
.short 255 # 0xff
.short 255 # 0xff
.short 255 # 0xff
.short 255 # 0xff
.short 255 # 0xff
.short 255 # 0xff
.short 255 # 0xff
.short 255 # 0xff
cast_i16x4_to_u8x4: # @cast_i16x4_to_u8x4
mov rax, rdi
movq xmm0, qword ptr [rsi]
pand xmm0, xmmword ptr [rip + .LCPI0_0]
packuswb xmm0, xmm0
movd dword ptr [rdi], xmm0
ret
```
Original Rust code https://rust.godbolt.org/z/WrKdjx9sa |
Even x86-64-v4 scalarizes this, despite it being able to have done a simple zextload/truncstore combo |
RKSimon
added a commit
to RKSimon/llvm-project
that referenced
this issue
Feb 27, 2024
…irectly We were scalarizing these truncations, but in most cases we can widen the source vector to 128-bits and perform the truncation as a shuffle directly (which will usually lower as a PACK or PSHUFB). For the cases where the widening and shuffle isn't legal we can leave it to generic legalization to scalarize for us. Fixes llvm#81883
RKSimon
added a commit
to RKSimon/llvm-project
that referenced
this issue
Feb 27, 2024
…irectly We were scalarizing these truncations, but in most cases we can widen the source vector to 128-bits and perform the truncation as a shuffle directly (which will usually lower as a PACK or PSHUFB). For the cases where the widening and shuffle isn't legal we can leave it to generic legalization to scalarize for us. Fixes llvm#81883
fadlyas07
pushed a commit
to greenforce-project/llvm-project
that referenced
this issue
Feb 27, 2024
…irectly (llvm#83120) We were scalarizing these truncations, but in most cases we can widen the source vector to 128-bits and perform the truncation as a shuffle directly (which will usually lower as a PACK or PSHUFB). For the cases where the widening and shuffle isn't legal we can leave it to generic legalization to scalarize for us. Fixes llvm#81883
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This function produces the following assembly
https://llvm.godbolt.org/z/q8YEhevbo
but I expected something like this instead.
Original Rust code https://rust.godbolt.org/z/WrKdjx9sa
Originally found in Rust
portable-simd
code for casting ani16x4
tou8x4
rust-lang/portable-simd#369 (comment)The text was updated successfully, but these errors were encountered: