
Improve llvm.ucmp.iN.i1 codegen (specifically the 1-bit inputs case) #129401

Open
@scottmcm

Description

(Context: I was making a rustc PR and accidentally regressed bool::cmp by having it use llvm.ucmp. Hence this issue: it would be nice if ucmp handled this case well on its own.)

llvm.ucmp.i8.i1(a, b) is actually the same as just zext(a) - zext(b): https://alive2.llvm.org/ce/z/oHq3bh

But today they don't codegen the same: https://llvm.godbolt.org/z/nxWdYhvTo

define noundef range(i8 -1, 2) i8 @src(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %0 = call i8 @llvm.ucmp.i8.i1(i1 %a, i1 %b)
  ret i8 %0
}

define noundef range(i8 -1, 2) i8 @tgt(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %aa = zext i1 %a to i8
  %bb = zext i1 %b to i8
  %0 = sub nsw i8 %aa, %bb
  ret i8 %0
}

On x64, this gives:

src:                                    # @src
        cmp     dil, sil
        seta    al
        sbb     al, 0
        ret
tgt:                                    # @tgt
        mov     eax, edi
        sub     al, sil
        ret

I don't know if it's better to InstSimplify the ucmp to sext(b) + zext(a) or to improve the codegen for the i1 case, but either way, it'd be nice if the intrinsic worked optimally for i1 in addition to the wider widths.
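For readers who want to check the equivalence without opening the Alive2 link, here is a minimal Rust sketch that enumerates all four 1-bit input pairs. The function names (`ucmp_i8_i1`, `zext_sub`, `sext_add`) are hypothetical helpers written for this illustration. They model the intended semantics in plain Rust and do not call the LLVM intrinsic, so this checks the arithmetic only and says nothing about what any backend emits.

```rust
use std::cmp::Ordering;

/// Reference model of a three-way unsigned compare on 1-bit inputs,
/// returning -1, 0, or 1 as an i8. Hypothetical helper: it mirrors the
/// documented result of llvm.ucmp.i8.i1 but does not invoke the intrinsic.
fn ucmp_i8_i1(a: bool, b: bool) -> i8 {
    match a.cmp(&b) {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}

/// zext(a) - zext(b): each bool widens to 0 or 1 before subtracting.
fn zext_sub(a: bool, b: bool) -> i8 {
    (a as i8) - (b as i8)
}

/// sext(b) + zext(a): sign-extending a 1-bit value yields 0 or -1,
/// so adding it is the same as subtracting zext(b).
fn sext_add(a: bool, b: bool) -> i8 {
    let sext_b: i8 = if b { -1 } else { 0 };
    sext_b + (a as i8)
}

fn main() {
    // Only four input combinations exist, so the check is exhaustive.
    for a in [false, true] {
        for b in [false, true] {
            let expected = ucmp_i8_i1(a, b);
            assert_eq!(zext_sub(a, b), expected);
            assert_eq!(sext_add(a, b), expected);
            println!("a={a} b={b} -> {expected}");
        }
    }
}
```

Both rewritten forms agree with the three-way compare on every input, which is why either one is a valid replacement for the 1-bit case.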

EDIT: based on the comments below, it sounds like it'd be better to have the codegen special-case this, rather than optimize it away in the middle-end.
