[LoongArch] Fix fp_to_uint/fp_to_sint conversion errors for lasx #137129


Open · wants to merge 1 commit into main
Conversation

tangaac
Contributor

Previous fp_to_uint/fp_to_sint patterns for v4f64 -> v4i32 were wrong.
The conversion error was triggered after PR #126456.

@llvmbot
Member

@llvm/pr-subscribers-backend-loongarch

Author: None (tangaac)

Changes

Previous fp_to_uint/fp_to_sint patterns for v4f64 -> v4i32 were wrong.
The conversion error was triggered after PR #126456.


Full diff: https://github.com/llvm/llvm-project/pull/137129.diff

3 Files Affected:

  • (modified) llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td (+14-14)
  • (modified) llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptosi.ll (+2-2)
  • (modified) llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptoui.ll (+2-2)
diff --git a/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
index e4268920e0b27..fe08c1050b4d7 100644
--- a/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
+++ b/llvm/lib/Target/LoongArch/LoongArchLASXInstrInfo.td
@@ -1815,24 +1815,24 @@ def : Pat<(v4f32 (uint_to_fp v4i64:$vj)),
 // XVFTINTRZ_{W_S/L_D}
 def : Pat<(v8i32 (fp_to_sint v8f32:$vj)), (XVFTINTRZ_W_S v8f32:$vj)>;
 def : Pat<(v4i64 (fp_to_sint v4f64:$vj)), (XVFTINTRZ_L_D v4f64:$vj)>;
-def : Pat<(v4i64 (fp_to_sint v4f32:$vj)),
-          (VEXT2XV_D_W (SUBREG_TO_REG (i64 0), (VFTINTRZ_W_S v4f32:$vj),
-                       sub_128))>;
-def : Pat<(v4i32 (fp_to_sint (v4f64 LASX256:$vj))),
-          (EXTRACT_SUBREG (XVFTINTRZ_W_S (XVFCVT_S_D (XVPERMI_D v4f64:$vj, 238),
-                                          v4f64:$vj)),
-                          sub_128)>;
+def : Pat<(v4i64(fp_to_sint v4f32:$vj)), (VEXT2XV_D_W(SUBREG_TO_REG(i64 0),
+                                          (VFTINTRZ_W_S v4f32:$vj),
+                                          sub_128))>;
+def : Pat<(v4i32(fp_to_sint v4f64:$vj)),
+          (EXTRACT_SUBREG(XVPICKEV_W(XVPERMI_D(XVFTINTRZ_L_D v4f64:$vj), 238),
+                          (XVFTINTRZ_L_D v4f64:$vj)),
+          sub_128)>;
 // XVFTINTRZ_{W_SU/L_DU}
 def : Pat<(v8i32 (fp_to_uint v8f32:$vj)), (XVFTINTRZ_WU_S v8f32:$vj)>;
 def : Pat<(v4i64 (fp_to_uint v4f64:$vj)), (XVFTINTRZ_LU_D v4f64:$vj)>;
-def : Pat<(v4i64 (fp_to_uint v4f32:$vj)),
-          (VEXT2XV_DU_WU (SUBREG_TO_REG (i64 0), (VFTINTRZ_WU_S v4f32:$vj),
-                         sub_128))>;
-def : Pat<(v4i32 (fp_to_uint (v4f64 LASX256:$vj))),
-          (EXTRACT_SUBREG (XVFTINTRZ_W_S (XVFCVT_S_D (XVPERMI_D v4f64:$vj, 238),
-                                          v4f64:$vj)),
-                          sub_128)>;
+def : Pat<(v4i64(fp_to_uint v4f32:$vj)), (VEXT2XV_DU_WU(SUBREG_TO_REG(i64 0),
+                                          (VFTINTRZ_WU_S v4f32:$vj),
+                                          sub_128))>;
+def : Pat<(v4i32(fp_to_uint v4f64:$vj)),
+          (EXTRACT_SUBREG(XVPICKEV_W(XVPERMI_D(XVFTINTRZ_LU_D v4f64:$vj), 238),
+                          (XVFTINTRZ_LU_D v4f64:$vj)),
+          sub_128)>;
 // XVPERMI_Q
 foreach vt = [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64] in

diff --git a/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptosi.ll b/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptosi.ll
index 0d9f57b57ffae..ed333c303879c 100644
--- a/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptosi.ll
+++ b/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptosi.ll
@@ -31,9 +31,9 @@ define void @fptosi_v4f64_v4i32(ptr %res, ptr %in){
 ; CHECK-LABEL: fptosi_v4f64_v4i32:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    xvld $xr0, $a1, 0
+; CHECK-NEXT:    xvftintrz.l.d $xr0, $xr0
 ; CHECK-NEXT:    xvpermi.d $xr1, $xr0, 238
-; CHECK-NEXT:    xvfcvt.s.d $xr0, $xr1, $xr0
-; CHECK-NEXT:    xvftintrz.w.s $xr0, $xr0
+; CHECK-NEXT:    xvpickev.w $xr0, $xr1, $xr0
 ; CHECK-NEXT:    vst $vr0, $a0, 0
 ; CHECK-NEXT:    ret
   %v0 = load <4 x double>, ptr %in

diff --git a/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptoui.ll b/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptoui.ll
index 27d70f33cd34e..9c499ba71d646 100644
--- a/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptoui.ll
+++ b/llvm/test/CodeGen/LoongArch/lasx/ir-instruction/fptoui.ll
@@ -31,9 +31,9 @@ define void @fptoui_v4f64_v4i32(ptr %res, ptr %in){
 ; CHECK-LABEL: fptoui_v4f64_v4i32:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    xvld $xr0, $a1, 0
+; CHECK-NEXT:    xvftintrz.lu.d $xr0, $xr0
 ; CHECK-NEXT:    xvpermi.d $xr1, $xr0, 238
-; CHECK-NEXT:    xvfcvt.s.d $xr0, $xr1, $xr0
-; CHECK-NEXT:    xvftintrz.w.s $xr0, $xr0
+; CHECK-NEXT:    xvpickev.w $xr0, $xr1, $xr0
 ; CHECK-NEXT:    vst $vr0, $a0, 0
 ; CHECK-NEXT:    ret
   %v0 = load <4 x double>, ptr %in
 ; CHECK-NEXT:    xvpermi.d $xr1, $xr0, 238
-; CHECK-NEXT:    xvfcvt.s.d $xr0, $xr1, $xr0
-; CHECK-NEXT:    xvftintrz.w.s $xr0, $xr0
+; CHECK-NEXT:    xvpickev.w $xr0, $xr1, $xr0
 ; CHECK-NEXT:    vst $vr0, $a0, 0
Member

@heiher · Apr 24, 2025


The previous impl appears to convert f64 to f32 before performing the integer conversion, which introduces a loss of precision at the f64 -> f32 step.

d: float64  s: float32  w: int32
*a1: d0, d1, d2, d3
xr0: d3, d2, d1, d0                            // xvld $xr0, $a1, 0
xr1: d3, d2, d3, d2                            // xvpermi.d $xr1, $xr0, 238
xr0: s3, s2, s3, s2, s3, s2, s1, s0            // xvfcvt.s.d $xr0, $xr1, $xr0
                                  ^
                                  +-- Loss of precision at this step
xr0: w3, w2, w3, w2, w3, w2, w1, w0            // xvftintrz.w.s $xr0, $xr0
vr0: w3, w2, w1, w0                            // vst $vr0, $a0, 0
*a0: w0, w1, w2, w3

In the updated version, it seems the f64 values are first converted to i64 and then truncated to u32. This can produce incorrect results when the original f64 values exceed the range representable by (signed) i32.

Would it make sense to go with a direct f64 -> i32 conversion instead?

*a1: d0, d1, d2, d3
xr0: d3, d2, d1, d0                            // xvld $xr0, $a1, 0
xr1: d3, d2, d3, d2                            // xvpermi.d $xr1, $xr0, 238
xr0: w3, w2, w3, w2, w3, w2, w1, w0            // xvftintrz.w.d $xr0, $xr1, $xr0
vr0: w3, w2, w1, w0                            // vst $vr0, $a0, 0
*a0: w0, w1, w2, w3
Contributor (Author)

@tangaac · Apr 25, 2025


3A6000        | latency | throughput
xvftintrz.w.d | 5       | 2
xvftintrz.l.d | 4       | 4
xvpickev.w    | 1       | 4

Which one is faster: xvftintrz.w.d alone, or xvftintrz.l.d + xvpickev.w?

Member


Just to correct my earlier comment - it was inaccurate and has been deleted. :P

According to the LLVM IR fptosi spec:

If the value cannot fit in ty2, the result is a poison value.

So unless overflow/invalid recording is required, using xvftintrz.l.d + xvpickev.w is also valid, and this sequence may offer better throughput.

 ; CHECK-NEXT:    xvpermi.d $xr1, $xr0, 238
-; CHECK-NEXT:    xvfcvt.s.d $xr0, $xr1, $xr0
-; CHECK-NEXT:    xvftintrz.w.s $xr0, $xr0
+; CHECK-NEXT:    xvpickev.w $xr0, $xr1, $xr0
 ; CHECK-NEXT:    vst $vr0, $a0, 0
Member

@heiher · Apr 24, 2025


LGTM.

d: float64  l: int64
*a1: d0, d1, d2, d3
xr0: d3, d2, d1, d0                            // xvld $xr0, $a1, 0
xr0: l3, l2, l1, l0                            // xvftintrz.lu.d $xr0, $xr0
   = {l3h,l3l}, {l2h,l2l}, {l1h,l1l}, {l0h,l0l}
xr1: l3, l2, l3, l2                            // xvpermi.d $xr1, $xr0, 238
   = {l3h,l3l}, {l2h,l2l}, {l3h,l3l}, {l2h,l2l}
xr0: l3l, l2l, l3l, l2l, l3l, l2l, l1l, l0l    // xvpickev.w $xr0, $xr1, $xr0
vr0: l3l, l2l, l1l, l0l                        // vst $vr0, $a0, 0
*a0: l0l, l1l, l2l, l3l