[VPlan] Introduce VPInstructionWithType, use instead of VPScalarCast(NFC) #129706
…(NFC)

There are some opcodes that currently require specialized recipes because their result type is not implied by their operands; casts are one example. This leads to duplication from defining multiple full recipes.

This patch introduces a new VPInstructionWithType subclass that also stores the result type. The general idea is that opcodes which need to specify a result type use this general recipe. The current patch replaces VPScalarCastRecipe with VPInstructionWithType; a similar patch for VPWidenCastRecipe will follow soon.

There are a few proposed opcodes that should also benefit, without the need for workarounds:
* llvm#129508
* llvm#119284
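The motivation in the description can be pictured with a small toy model. This is only a sketch under stated assumptions: `Ty`, `Op`, `ToyInst` and `ToyInstWithType` are hypothetical names invented for illustration and are not the LLVM classes touched by the patch.

```cpp
#include <cassert>

// Hypothetical toy types; these are not the LLVM classes.
enum class Ty { I32, I64 };
enum class Op { Add, ZExt, Trunc };

// A generic instruction whose result type follows from its operand type.
class ToyInst {
  Op Opcode;
  Ty OperandTy;

public:
  ToyInst(Op Opcode, Ty OperandTy) : Opcode(Opcode), OperandTy(OperandTy) {}
  virtual ~ToyInst() = default;

  Op getOpcode() const { return Opcode; }
  Ty getOperandType() const { return OperandTy; }

  // For opcodes such as Add the operand type is also the result type.
  virtual Ty getResultType() const { return OperandTy; }
};

// Same opcode/operand storage, plus an explicit result type for opcodes
// (casts here) where the operands alone do not determine it.
class ToyInstWithType : public ToyInst {
  Ty ResultTy;

public:
  ToyInstWithType(Op Opcode, Ty OperandTy, Ty ResultTy)
      : ToyInst(Opcode, OperandTy), ResultTy(ResultTy) {}

  Ty getResultType() const override { return ResultTy; }
};
```

In this toy, an `Add` on `I32` needs no extra state, while a `ZExt` from `I32` has to be told that it produces `I64`. Keeping the extra field in a subclass means code that only cares about opcode and operands can keep working with the base class.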
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms
Author: Florian Hahn (fhahn)
Patch is 34.01 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/129706.diff
15 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index ed3e45dd2c6c8..45e4fcad01b6c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -246,10 +246,10 @@ class VPBuilder {
new VPDerivedIVRecipe(Kind, FPBinOp, Start, Current, Step, Name));
}
- VPScalarCastRecipe *createScalarCast(Instruction::CastOps Opcode, VPValue *Op,
- Type *ResultTy, DebugLoc DL) {
+ VPInstruction *createScalarCast(Instruction::CastOps Opcode, VPValue *Op,
+ Type *ResultTy, DebugLoc DL) {
return tryInsertInstruction(
- new VPScalarCastRecipe(Opcode, Op, ResultTy, DL));
+ new VPInstructionWithType(Opcode, Op, ResultTy, DL));
}
VPWidenCastRecipe *createWidenCast(Instruction::CastOps Opcode, VPValue *Op,
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index cb860a472d8f7..1f15b96cd6518 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4502,7 +4502,6 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
switch (R.getVPDefID()) {
case VPDef::VPDerivedIVSC:
case VPDef::VPScalarIVStepsSC:
- case VPDef::VPScalarCastSC:
case VPDef::VPReplicateSC:
case VPDef::VPInstructionSC:
case VPDef::VPCanonicalIVPHISC:
@@ -10396,8 +10395,10 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
assert(all_of(IV->users(),
[](const VPUser *U) {
return isa<VPScalarIVStepsRecipe>(U) ||
- isa<VPScalarCastRecipe>(U) ||
isa<VPDerivedIVRecipe>(U) ||
+ Instruction::isCast(
+ cast<VPInstruction>(U)->getOpcode()) ||
+
cast<VPInstruction>(U)->getOpcode() ==
Instruction::Add;
}) &&
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index b1288c42b20f2..4ae34e3e4d552 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -533,7 +533,6 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
case VPRecipeBase::VPWidenIntOrFpInductionSC:
case VPRecipeBase::VPWidenPointerInductionSC:
case VPRecipeBase::VPReductionPHISC:
- case VPRecipeBase::VPScalarCastSC:
case VPRecipeBase::VPScalarPHISC:
case VPRecipeBase::VPPartialReductionSC:
return true;
@@ -1026,6 +1025,56 @@ class VPInstruction : public VPRecipeWithIRFlags,
StringRef getName() const { return Name; }
};
+/// A specialization of VPInstruction augmenting it with a dedicated result
+/// type, to be used when the opcode and operands of the VPInstruction don't
+/// directly determine the result type.
+class VPInstructionWithType : public VPInstruction {
+ /// Scalar result type produced by the recipe.
+ Type *ResultTy;
+
+ Value *generate(VPTransformState &State);
+
+public:
+ VPInstructionWithType(unsigned Opcode, ArrayRef<VPValue *> Operands,
+ Type *ResultTy, DebugLoc DL, const Twine &Name = "")
+ : VPInstruction(Opcode, Operands, DL, Name), ResultTy(ResultTy) {}
+
+ static inline bool classof(const VPRecipeBase *R) {
+ auto *VPI = dyn_cast<VPInstruction>(R);
+ return VPI && Instruction::isCast(VPI->getOpcode());
+ }
+
+ static inline bool classof(const VPUser *R) {
+ return isa<VPInstructionWithType>(cast<VPRecipeBase>(R));
+ }
+
+ VPInstruction *clone() override {
+ auto *New =
+ new VPInstructionWithType(getOpcode(), {getOperand(0)}, getResultType(),
+ getDebugLoc(), getName());
+ New->setUnderlyingValue(getUnderlyingValue());
+ return New;
+ }
+
+ void execute(VPTransformState &State) override;
+
+ /// Return the cost of this VPIRInstruction.
+ InstructionCost computeCost(ElementCount VF,
+ VPCostContext &Ctx) const override {
+ return 0;
+ }
+
+ Type *getResultType() const { return ResultTy; }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ /// Print the recipe.
+ void print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const override;
+#endif
+
+ bool onlyFirstLaneUsed(const VPValue *Op) const override;
+};
+
/// A recipe to wrap on original IR instruction not to be modified during
/// execution, execept for PHIs. For PHIs, a single VPValue operand is allowed,
/// and it is used to add a new incoming value for the single predecessor VPBB.
@@ -1183,54 +1232,6 @@ class VPWidenCastRecipe : public VPRecipeWithIRFlags {
Type *getResultType() const { return ResultTy; }
};
-/// VPScalarCastRecipe is a recipe to create scalar cast instructions.
-class VPScalarCastRecipe : public VPSingleDefRecipe {
- Instruction::CastOps Opcode;
-
- Type *ResultTy;
-
- Value *generate(VPTransformState &State);
-
-public:
- VPScalarCastRecipe(Instruction::CastOps Opcode, VPValue *Op, Type *ResultTy,
- DebugLoc DL)
- : VPSingleDefRecipe(VPDef::VPScalarCastSC, {Op}, DL), Opcode(Opcode),
- ResultTy(ResultTy) {}
-
- ~VPScalarCastRecipe() override = default;
-
- VPScalarCastRecipe *clone() override {
- return new VPScalarCastRecipe(Opcode, getOperand(0), ResultTy,
- getDebugLoc());
- }
-
- VP_CLASSOF_IMPL(VPDef::VPScalarCastSC)
-
- void execute(VPTransformState &State) override;
-
- /// Return the cost of this VPScalarCastRecipe.
- InstructionCost computeCost(ElementCount VF,
- VPCostContext &Ctx) const override {
- // TODO: Compute accurate cost after retiring the legacy cost model.
- return 0;
- }
-
-#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- void print(raw_ostream &O, const Twine &Indent,
- VPSlotTracker &SlotTracker) const override;
-#endif
-
- /// Returns the result type of the cast.
- Type *getResultType() const { return ResultTy; }
-
- bool onlyFirstLaneUsed(const VPValue *Op) const override {
- // At the moment, only uniform codegen is implemented.
- assert(is_contained(operands(), Op) &&
- "Op must be an operand of the recipe");
- return true;
- }
-};
-
/// A recipe for widening vector intrinsics.
class VPWidenIntrinsicRecipe : public VPRecipeWithIRFlags {
/// ID of the vector intrinsic to widen.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 6f6875f0e5e0e..bc81c6d1862d3 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -252,20 +252,17 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
VPPartialReductionRecipe>([this](const VPRecipeBase *R) {
return inferScalarType(R->getOperand(0));
})
+ .Case<VPInstructionWithType, VPWidenIntrinsicRecipe>(
+ [](const auto *R) { return R->getResultType(); })
.Case<VPBlendRecipe, VPInstruction, VPWidenRecipe, VPReplicateRecipe,
VPWidenCallRecipe, VPWidenMemoryRecipe, VPWidenSelectRecipe>(
[this](const auto *R) { return inferScalarTypeForRecipe(R); })
- .Case<VPWidenIntrinsicRecipe>([](const VPWidenIntrinsicRecipe *R) {
- return R->getResultType();
- })
.Case<VPInterleaveRecipe>([V](const VPInterleaveRecipe *R) {
// TODO: Use info from interleave group.
return V->getUnderlyingValue()->getType();
})
.Case<VPWidenCastRecipe>(
[](const VPWidenCastRecipe *R) { return R->getResultType(); })
- .Case<VPScalarCastRecipe>(
- [](const VPScalarCastRecipe *R) { return R->getResultType(); })
.Case<VPExpandSCEVRecipe>([](const VPExpandSCEVRecipe *R) {
return R->getSCEV()->getType();
})
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index d154d54c37862..ecdb5e6788bc9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -148,7 +148,6 @@ bool VPRecipeBase::mayHaveSideEffects() const {
switch (getVPDefID()) {
case VPDerivedIVSC:
case VPPredInstPHISC:
- case VPScalarCastSC:
case VPReverseVectorPointerSC:
return false;
case VPInstructionSC:
@@ -413,7 +412,7 @@ bool VPInstruction::doesGeneratePerAllLanes() const {
}
bool VPInstruction::canGenerateScalarForFirstLane() const {
- if (Instruction::isBinaryOp(getOpcode()))
+ if (Instruction::isBinaryOp(getOpcode()) || Instruction::isCast(getOpcode()))
return true;
if (isSingleScalar() || isVectorToScalar())
return true;
@@ -961,6 +960,43 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
}
#endif
+Value *VPInstructionWithType::generate(VPTransformState &State) {
+ State.setDebugLocFrom(getDebugLoc());
+ assert(vputils::onlyFirstLaneUsed(this) &&
+ "Codegen only implemented for first lane.");
+ switch (getOpcode()) {
+ case Instruction::SExt:
+ case Instruction::ZExt:
+ case Instruction::Trunc: {
+ // Note: SExt/ZExt not used yet.
+ Value *Op = State.get(getOperand(0), VPLane(0));
+ return State.Builder.CreateCast(Instruction::CastOps(getOpcode()), Op,
+ ResultTy);
+ }
+ default:
+ llvm_unreachable("opcode not implemented yet");
+ }
+}
+
+void VPInstructionWithType::execute(VPTransformState &State) {
+ State.set(this, generate(State), VPLane(0));
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPInstructionWithType::print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const {
+ O << Indent << "EMIT ";
+ printAsOperand(O, SlotTracker);
+ O << " = " << Instruction::getOpcodeName(getOpcode()) << " ";
+ printOperands(O, SlotTracker);
+ O << " to " << *ResultTy;
+}
+#endif
+
+bool VPInstructionWithType::onlyFirstLaneUsed(const VPValue *Op) const {
+ return vputils::onlyFirstLaneUsed(this);
+}
+
void VPIRInstruction::execute(VPTransformState &State) {
assert((isa<PHINode>(&I) || getNumOperands() == 0) &&
"Only PHINodes can have extra operands");
@@ -2436,38 +2472,6 @@ void VPReplicateRecipe::print(raw_ostream &O, const Twine &Indent,
}
#endif
-Value *VPScalarCastRecipe ::generate(VPTransformState &State) {
- State.setDebugLocFrom(getDebugLoc());
- assert(vputils::onlyFirstLaneUsed(this) &&
- "Codegen only implemented for first lane.");
- switch (Opcode) {
- case Instruction::SExt:
- case Instruction::ZExt:
- case Instruction::Trunc: {
- // Note: SExt/ZExt not used yet.
- Value *Op = State.get(getOperand(0), VPLane(0));
- return State.Builder.CreateCast(Instruction::CastOps(Opcode), Op, ResultTy);
- }
- default:
- llvm_unreachable("opcode not implemented yet");
- }
-}
-
-void VPScalarCastRecipe ::execute(VPTransformState &State) {
- State.set(this, generate(State), VPLane(0));
-}
-
-#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-void VPScalarCastRecipe ::print(raw_ostream &O, const Twine &Indent,
- VPSlotTracker &SlotTracker) const {
- O << Indent << "SCALAR-CAST ";
- printAsOperand(O, SlotTracker);
- O << " = " << Instruction::getOpcodeName(Opcode) << " ";
- printOperands(O, SlotTracker);
- O << " to " << *ResultTy;
-}
-#endif
-
void VPBranchOnMaskRecipe::execute(VPTransformState &State) {
State.setDebugLocFrom(getDebugLoc());
assert(State.Lane && "Branch on Mask works only on single instance.");
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
index 1a7322ec0aff6..e13ed0f6d986e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
@@ -113,7 +113,16 @@ bool vputils::isUniformAcrossVFsAndUFs(VPValue *V) {
all_of(R->operands(),
[](VPValue *Op) { return isUniformAcrossVFsAndUFs(Op); });
})
- .Case<VPScalarCastRecipe, VPWidenCastRecipe>([](const auto *R) {
+ .Case<VPInstruction>([](const auto *VPI) {
+ return Instruction::isCast(VPI->getOpcode())
+ ? all_of(VPI->operands(),
+ [](VPValue *Op) {
+ return isUniformAcrossVFsAndUFs(Op);
+ })
+ : false;
+ })
+
+ .Case<VPWidenCastRecipe>([](const auto *R) {
// A cast is uniform according to its operand.
return isUniformAcrossVFsAndUFs(R->getOperand(0));
})
diff --git a/llvm/lib/Transforms/Vectorize/VPlanValue.h b/llvm/lib/Transforms/Vectorize/VPlanValue.h
index 0a59b137bbd79..1777ab0b71093 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanValue.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanValue.h
@@ -332,7 +332,6 @@ class VPDef {
VPReductionSC,
VPPartialReductionSC,
VPReplicateSC,
- VPScalarCastSC,
VPScalarIVStepsSC,
VPVectorPointerSC,
VPReverseVectorPointerSC,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
index 1b3b69ea6a13d..882323d5a1e4f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
@@ -146,8 +146,8 @@ bool VPlanVerifier::verifyEVLRecipe(const VPInstruction &EVL) const {
.Case<VPWidenLoadEVLRecipe, VPReverseVectorPointerRecipe,
VPScalarPHIRecipe>(
[&](const VPRecipeBase *R) { return VerifyEVLUse(*R, 1); })
- .Case<VPScalarCastRecipe>(
- [&](const VPScalarCastRecipe *S) { return VerifyEVLUse(*S, 0); })
+ .Case<VPInstructionWithType>(
+ [&](const auto *S) { return VerifyEVLUse(*S, 0); })
.Case<VPInstruction>([&](const VPInstruction *I) {
if (I->getOpcode() != Instruction::Add) {
errs() << "EVL is used as an operand in non-VPInstruction::Add\n";
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-call-intrinsics.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-call-intrinsics.ll
index a213608857728..c81f650c338ed 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-call-intrinsics.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-call-intrinsics.ll
@@ -34,7 +34,7 @@ define void @vp_smax(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: CLONE ir<[[GEP3:%.+]]> = getelementptr inbounds ir<%a>, vp<[[ST]]>
; IF-EVL-NEXT: vp<[[PTR3:%[0-9]+]]> = vector-pointer ir<[[GEP3]]>
; IF-EVL-NEXT: WIDEN vp.store vp<[[PTR3]]>, ir<[[SMAX]]>, vp<[[EVL]]>
-; IF-EVL-NEXT: SCALAR-CAST vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
+; IF-EVL-NEXT: EMIT vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT]]> = add vp<[[CAST]]>, vp<[[EVL_PHI]]>
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT_EXIT:%.+]]> = add vp<[[IV]]>, vp<[[VFUF]]>
; IF-EVL-NEXT: EMIT branch-on-count vp<[[IV_NEXT_EXIT]]>, vp<[[VTC]]>
@@ -90,7 +90,7 @@ define void @vp_smin(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: CLONE ir<[[GEP3:%.+]]> = getelementptr inbounds ir<%a>, vp<[[ST]]>
; IF-EVL-NEXT: vp<[[PTR3:%[0-9]+]]> = vector-pointer ir<[[GEP3]]>
; IF-EVL-NEXT: WIDEN vp.store vp<[[PTR3]]>, ir<[[SMIN]]>, vp<[[EVL]]>
-; IF-EVL-NEXT: SCALAR-CAST vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
+; IF-EVL-NEXT: EMIT vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT]]> = add vp<[[CAST]]>, vp<[[EVL_PHI]]>
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT_EXIT:%.+]]> = add vp<[[IV]]>, vp<[[VFUF]]>
; IF-EVL-NEXT: EMIT branch-on-count vp<[[IV_NEXT_EXIT]]>, vp<[[VTC]]>
@@ -146,7 +146,7 @@ define void @vp_umax(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: CLONE ir<[[GEP3:%.+]]> = getelementptr inbounds ir<%a>, vp<[[ST]]>
; IF-EVL-NEXT: vp<[[PTR3:%[0-9]+]]> = vector-pointer ir<[[GEP3]]>
; IF-EVL-NEXT: WIDEN vp.store vp<[[PTR3]]>, ir<[[UMAX]]>, vp<[[EVL]]>
-; IF-EVL-NEXT: SCALAR-CAST vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
+; IF-EVL-NEXT: EMIT vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT]]> = add vp<[[CAST]]>, vp<[[EVL_PHI]]>
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT_EXIT:%.+]]> = add vp<[[IV]]>, vp<[[VFUF]]>
; IF-EVL-NEXT: EMIT branch-on-count vp<[[IV_NEXT_EXIT]]>, vp<[[VTC]]>
@@ -202,7 +202,7 @@ define void @vp_umin(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: CLONE ir<[[GEP3:%.+]]> = getelementptr inbounds ir<%a>, vp<[[ST]]>
; IF-EVL-NEXT: vp<[[PTR3:%[0-9]+]]> = vector-pointer ir<[[GEP3]]>
; IF-EVL-NEXT: WIDEN vp.store vp<[[PTR3]]>, ir<[[UMIN]]>, vp<[[EVL]]>
-; IF-EVL-NEXT: SCALAR-CAST vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
+; IF-EVL-NEXT: EMIT vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT]]> = add vp<[[CAST]]>, vp<[[EVL_PHI]]>
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT_EXIT:%.+]]> = add vp<[[IV]]>, vp<[[VFUF]]>
; IF-EVL-NEXT: EMIT branch-on-count vp<[[IV_NEXT_EXIT]]>, vp<[[VTC]]>
@@ -255,7 +255,7 @@ define void @vp_ctlz(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: CLONE ir<[[GEP2:%.+]]> = getelementptr inbounds ir<%a>, vp<[[ST]]>
; IF-EVL-NEXT: vp<[[PTR2:%[0-9]+]]> = vector-pointer ir<[[GEP2]]>
; IF-EVL-NEXT: WIDEN vp.store vp<[[PTR2]]>, ir<[[CTLZ]]>, vp<[[EVL]]>
-; IF-EVL-NEXT: SCALAR-CAST vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
+; IF-EVL-NEXT: EMIT vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT]]> = add vp<[[CAST]]>, vp<[[EVL_PHI]]>
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT_EXIT:%.+]]> = add vp<[[IV]]>, vp<[[VFUF]]>
; IF-EVL-NEXT: EMIT branch-on-count vp<[[IV_NEXT_EXIT]]>, vp<[[VTC]]>
@@ -306,7 +306,7 @@ define void @vp_cttz(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: CLONE ir<[[GEP2:%.+]]> = getelementptr inbounds ir<%a>, vp<[[ST]]>
; IF-EVL-NEXT: vp<[[PTR2:%[0-9]+]]> = vector-pointer ir<[[GEP2]]>
; IF-EVL-NEXT: WIDEN vp.store vp<[[PTR2]]>, ir<[[CTTZ]]>, vp<[[EVL]]>
-; IF-EVL-NEXT: SCALAR-CAST vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
+; IF-EVL-NEXT: EMIT vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT]]> = add vp<[[CAST]]>, vp<[[EVL_PHI]]>
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT_EXIT:%.+]]> = add vp<[[IV]]>, vp<[[VFUF]]>
; IF-EVL-NEXT: EMIT branch-on-count vp<[[IV_NEXT_EXIT]]>, vp<[[VTC]]>
@@ -359,7 +359,7 @@ define void @vp_lrint(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: CLONE ir<[[GEP2:%.+]]> = getelementptr inbounds ir<%a>, vp<[[ST]]>
; IF-EVL-NEXT: vp<[[PTR2:%[0-9]+]]> = vector-pointer ir<[[GEP2]]>
; IF-EVL-NEXT: WIDEN vp.store vp<[[PTR2]]>, ir<[[TRUNC]]>, vp<[[EVL]]>
-; IF-EVL-NEXT: SCALAR-CAST vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
+; IF-EVL-NEXT: EMIT vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT]]> = add vp<[[CAST]]>, vp<[[EVL_PHI]]>
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT_EXIT:%.+]]> = add vp<[[IV]]>, vp<[[VFUF]]>
; IF-EVL-NEXT: EMIT branch-on-count vp<[[IV_NEXT_EXIT]]>, vp<[[VTC]]>
@@ -414,7 +414,7 @@ define void @vp_llrint(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: CLONE ir<[[GEP2:%.+]]> = getelementptr inbounds ir<%a>, vp<[[ST]]>
; IF-EVL-NEXT: vp<[[PTR2:%[0-9]+]]> = vector-pointer ir<[[GEP2]]>
; IF-EVL-NEXT: WIDEN vp.store vp<[[PTR2]]>, ir<[[TRUNC]]>, vp<[[EVL]]>
-; IF-EVL-NEXT: SCALAR-CAST vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
+; IF-EVL-NEXT: EMIT vp<[[CAST:%[0-9]+]]> = zext vp<[[EVL]]> to i64
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT]]> = add vp<[[CAST]]>, vp<[[EVL_PHI]]>
; IF-EVL-NEXT: EMIT vp<[[IV_NEXT_EXIT:%.+]]> = add vp<[[IV]]>, vp<[[VFUF]]>
; IF-EVL-NEXT: EMIT branch-on-count vp<[[IV_NEXT_EXIT]]>, vp<[[VTC]]>
@@ -467,7 +467,7 @@ define void @vp_abs(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: CLONE ir<[[GEP2:%.+]]> = getelementptr inbounds ir<%a>, vp<[[ST]]>
; IF-EVL-NEXT: vp<[[PTR2:%[0-9]+]]> = vector-pointer ir<[[GEP2]]>
; IF-EVL-NEXT: WIDEN vp....
[truncated]
/// Return the cost of this VPIRInstruction.
InstructionCost computeCost(ElementCount VF,
VPCostContext &Ctx) const override {
return 0;
Return actual cost here?
For now this only represents scalar casts, which do not correspond to any input IR and hence are not yet costed (added a TODO). It will be fleshed out in #129712.
WIP as it depends on llvm#129706.
WIP patch to also use it to replace VPWidenCastRecipe: #129712
Nice, +1 on this change
LGTM
Gentle reverse ping!
/// Scalar result type produced by the recipe.
Type *ResultTy;

Value *generate(VPTransformState &State);
Document. May be a bit confusing being non-virtual nor override, worth a comment?
I folded it into execute directly, thanks
VPInstruction *clone() override {
auto *New =
new VPInstructionWithType(getOpcode(), {getOperand(0)}, getResultType(),
Assumed to have a single operand?
For now yes, but removed restriction, thanks!
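The point about not assuming a single operand can be illustrated with a toy clone routine. This is a hedged sketch, not the code that landed: `ToyTypedInst` is a hypothetical type, and its operands are plain integers standing in for operand pointers.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical toy recipe; operands are plain ints standing in for VPValue*.
struct ToyTypedInst {
  unsigned Opcode;
  std::vector<int> Operands;
  int ResultTy; // stand-in for a Type*

  // Copy every operand rather than only operand 0, so the clone stays
  // correct if a later opcode takes more than one operand.
  std::unique_ptr<ToyTypedInst> clone() const {
    return std::make_unique<ToyTypedInst>(
        ToyTypedInst{Opcode, Operands, ResultTy});
  }
};
```

Copying the whole operand list costs nothing for today's single-operand casts and avoids a silent bug once an opcode with more operands is routed through the same class.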
public:
VPInstructionWithType(unsigned Opcode, ArrayRef<VPValue *> Operands,
Type *ResultTy, DebugLoc DL, const Twine &Name = "")
: VPInstruction(Opcode, Operands, DL, Name), ResultTy(ResultTy) {}
What about FMF for, say, FPExt, or any other IRFlags supported by VPInstruction casts?
For now, none of the casts take or use them; can this be added as a follow-up?
void execute(VPTransformState &State) override;

/// Return the cost of this VPIRInstruction.
Suggested change:
- /// Return the cost of this VPIRInstruction.
+ /// Return the cost of this VPInstruction.
Fixed thanks
.Case<VPInstructionWithType, VPWidenIntrinsicRecipe>(
[](const auto *R) { return R->getResultType(); })
Worth noting that this case should appear before the next to catch VPInstructionWithType before its parent VPInstruction?
Done, thanks!
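The ordering concern raised in this thread applies to any first-match dispatch over a class hierarchy. The sketch below uses plain `dynamic_cast` as a stand-in for the `.Case<...>` chain in the patch; the `Toy*` types and both functions are hypothetical and exist only to show the effect.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy hierarchy mirroring "subclass of a generic instruction".
struct ToyRecipe {
  virtual ~ToyRecipe() = default;
};
struct ToyInst : ToyRecipe {};
struct ToyInstWithType : ToyInst {};

// First match wins: the subclass test must precede the parent test.
std::string classify(const ToyRecipe &R) {
  if (dynamic_cast<const ToyInstWithType *>(&R))
    return "stored type";
  if (dynamic_cast<const ToyInst *>(&R))
    return "inferred type";
  return "other";
}

// Same tests in the wrong order: the parent test shadows the subclass test,
// so the "stored type" branch is unreachable for any ToyInstWithType.
std::string classifyWrongOrder(const ToyRecipe &R) {
  if (dynamic_cast<const ToyInst *>(&R))
    return "inferred type";
  if (dynamic_cast<const ToyInstWithType *>(&R))
    return "stored type";
  return "other";
}
```

Because every `ToyInstWithType` is also a `ToyInst`, the parent check succeeds for both, which is why a comment documenting the required order is worth having next to such a chain.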
case Instruction::SExt:
case Instruction::ZExt:
case Instruction::Trunc: {
// Note: SExt/ZExt not used yet.
Then leave them as default until they are, to avoid dead code? Admittedly unrelated.
Limited to Trunc and ZExt, which both are used, thanks
Instruction::isCast(
cast<VPInstruction>(U)->getOpcode()) ||
Worth wrapping in VPInstruction::isCast()?
Done thanks!
/// A specialization of VPInstruction augmenting it with a dedicated result
/// type, to be used when the opcode and operands of the VPInstruction don't
/// directly determine the result type.
class VPInstructionWithType : public VPInstruction {
A recipe can "have" a single def, and support the interface of VPValue, by inheriting from it.
A (single def) recipe can have IRflags, by being VPRecipeWithFlags.
A (single def with IRFlags) recipe can have an opcode, by being a VPInstruction.
A (single def with IRFlags with opcode) recipe can have a Type, by being a VPInstructionWithType.
Would using multiple inheritance be better to cope with desired combinations of independent "with's", where IRFlags, opcode, Type, are provided as base classes? In particular, allowing recipes to have Type w/o having IRFlags.
Would the Type base class store the Type or just provide an interface for e.g. getScalarType()? Is this something that should be explored in a separate PR?
I think we should split off the flags from VPRecipeWithFlags and generally move in this direction.
For the type, this would just add a type field and a ::getScalarType? I'll check that, although for now, the only user would be VPInstructionWithType, so it might be simpler to do this as follow-up, once more users arise?
It would be nice if we could do something like d6eb747, but unfortunately I've not yet been able to figure out how to make this not actually crash.
I managed to sort this out now using a specialization of CastInfo, which allows isa/dyn_cast<VPWithResultType> to work, effectively turning it into a mixin/trait.
A sketch is here 5c54367, might be worth doing separately.
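The mixin idea discussed in this thread can be sketched in standard C++. This is explicitly not the CastInfo-based approach mentioned above, whose details are not shown here; it swaps in a `dynamic_cast` cross-cast, which needs RTTI enabled. All names (`WithResultType`, `ToyRecipe`, `ToyCast`, `ToyAdd`, `getTyped`) are hypothetical.

```cpp
#include <cassert>

enum class Ty { I32, I64 };

// Independent "capability" base: anything carrying an explicit result type.
struct WithResultType {
  Ty ResultTy;
  explicit WithResultType(Ty T) : ResultTy(T) {}
};

// Root of the toy recipe hierarchy; polymorphic so dynamic_cast works.
struct ToyRecipe {
  virtual ~ToyRecipe() = default;
};

// A recipe that has a stored type without inheriting any other optional
// state (no flags, no opcode) along the way.
struct ToyCast : ToyRecipe, WithResultType {
  explicit ToyCast(Ty T) : WithResultType(T) {}
};

// A recipe without the capability.
struct ToyAdd : ToyRecipe {};

// Cross-cast from the recipe hierarchy to the mixin; null if absent.
const WithResultType *getTyped(const ToyRecipe *R) {
  return dynamic_cast<const WithResultType *>(R);
}
```

The design point is that the type capability is queried independently of where a recipe sits in the main hierarchy, so a recipe can carry a type without also carrying IR flags or an opcode.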
Reverse ping
New changes LGTM
Type *ResultTy, DebugLoc DL, const Twine &Name = "")
: VPInstruction(Opcode, Operands, DL, Name), ResultTy(ResultTy) {}

static inline bool classof(const VPRecipeBase *R) { return isCast(R); }
Thanks for splitting out isCast. I needed to do something similar to make VPInstruction::StepVector a VPInstructionWithType in #129508
OK, this looks fine to me, the alternatives of inheriting Type rather than holding it as a field can be explored as follow-up. Adding several minor comments.
@@ -1023,6 +1022,56 @@ class VPInstruction : public VPRecipeWithIRFlags,

/// Returns the symbolic name assigned to the VPInstruction.
StringRef getName() const { return Name; }

/// Return true if \p U is a cast.
static bool isCast(const VPUser *U) {
Seems more natural to have VPUser::isCast() or VPRecipeBase::isCast(), analogous to Instruction::isCast()?
Rather than the static Instruction::isCast(opcode) which takes an opcode as parameter rather than a [VP]User.
Added VPRecipeBase::isScalarCast, thanks
Type *ResultTy, DebugLoc DL, const Twine &Name = "")
: VPInstruction(Opcode, Operands, DL, Name), ResultTy(ResultTy) {}

static inline bool classof(const VPRecipeBase *R) { return isCast(R); }
Currently VPInstructionWithType is used for cast recipes only, but if/when that changes the return isCast(R) here should be updated. Some comment should be added to explain that all VPInstructionWithType recipes should be identified here based on their opcode, rather than by checking VPDefID, because the latter is VPInstructionSC rather than VPInstructionWithTypeSC.
Done thanks
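The identification scheme described in this thread, a subclass that shares its parent's kind ID and is recognised by opcode instead, can be sketched with a hand-rolled `classof`. This toy does not use LLVM's casting headers; `RecipeKind`, `Opcode`, the `Toy*` classes, `isTypedOpcode` and `isaKind` are all hypothetical names.

```cpp
#include <cassert>

enum class RecipeKind { Instruction, Widen };
enum class Opcode { Add, ZExt, Trunc };

class ToyRecipe {
  RecipeKind Kind;

public:
  explicit ToyRecipe(RecipeKind K) : Kind(K) {}
  virtual ~ToyRecipe() = default;
  RecipeKind getKind() const { return Kind; }
};

class ToyInst : public ToyRecipe {
  Opcode Op;

public:
  explicit ToyInst(Opcode Op) : ToyRecipe(RecipeKind::Instruction), Op(Op) {}
  Opcode getOpcode() const { return Op; }

  // The parent is identified by its kind ID alone.
  static bool classof(const ToyRecipe *R) {
    return R->getKind() == RecipeKind::Instruction;
  }
};

class ToyInstWithType : public ToyInst {
public:
  using ToyInst::ToyInst;

  // Every opcode that is always built as ToyInstWithType must be listed here.
  static bool isTypedOpcode(Opcode Op) {
    return Op == Opcode::ZExt || Op == Opcode::Trunc;
  }

  // No dedicated kind ID exists for the subclass, so it is recognised by
  // "is a ToyInst" plus "has one of the typed opcodes".
  static bool classof(const ToyRecipe *R) {
    return ToyInst::classof(R) &&
           isTypedOpcode(static_cast<const ToyInst *>(R)->getOpcode());
  }
};

// Minimal isa-like helper built on classof.
template <typename To> bool isaKind(const ToyRecipe *R) {
  return To::classof(R);
}
```

The scheme is only sound if objects with a typed opcode are always constructed as the subclass; the opcode list in `classof` is the single place to update when a new typed opcode is added, which is the reason the reviewer asks for a comment there.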
- .Case<VPScalarCastRecipe, VPWidenCastRecipe>([](const auto *R) {
+ .Case<VPInstruction>([](const auto *VPI) {
+ return Instruction::isCast(VPI->getOpcode()) &&
+ all_of(VPI->operands(), isUniformAcrossVFsAndUFs);
If it is Cast, suffice to check first operand only, as before (and assert it has a single operand).
Done, thanks
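The suggestion in this thread, checking only the first operand of a cast and asserting that there is exactly one, can be sketched on a toy value graph. `ToyVal` and `isUniform` are hypothetical; the conservative `false` for non-cast instructions mirrors the behaviour shown in the diff for non-cast VPInstructions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy value: a leaf with a known uniformity, or an instruction
// with operands that is either a cast or something else.
struct ToyVal {
  bool IsCast;
  bool LeafUniform;
  std::vector<const ToyVal *> Operands;
};

bool isUniform(const ToyVal &V) {
  // Leaves carry their own answer.
  if (V.Operands.empty())
    return V.LeafUniform;
  if (V.IsCast) {
    // A cast is uniform exactly when its single operand is.
    assert(V.Operands.size() == 1 && "cast expected to have one operand");
    return isUniform(*V.Operands[0]);
  }
  // Conservatively treat every other instruction as non-uniform.
  return false;
}
```

The assertion documents the single-operand assumption at the point where it is relied on, instead of hiding it behind a loop over all operands.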
@@ -45,8 +45,7 @@ inline bool isUniformAfterVectorization(const VPValue *VPV) { | |||
return true; | |||
if (auto *Rep = dyn_cast<VPReplicateRecipe>(VPV)) | |||
return Rep->isUniform(); | |||
if (isa<VPWidenGEPRecipe, VPDerivedIVRecipe, VPScalarCastRecipe, | |||
VPBlendRecipe>(VPV)) | |||
if (isa<VPWidenGEPRecipe, VPDerivedIVRecipe, VPBlendRecipe>(VPV)) | |||
return all_of(VPV->getDefiningRecipe()->operands(), | |||
isUniformAfterVectorization); | |||
if (auto *VPI = dyn_cast<VPInstruction>(VPV)) |
This now also covers the VPInstructionWithType case, right?
Yep, that's one of the main advantages of sharing the VPInstruction base class.
.Case<VPScalarCastRecipe>( | ||
[&](const VPScalarCastRecipe *S) { return VerifyEVLUse(*S, 0); }) | ||
.Case<VPInstructionWithType>( | ||
[&](const auto *S) { return VerifyEVLUse(*S, 0); }) |
[&](const auto *S) { return VerifyEVLUse(*S, 0); }) | |
[&](const VPRecipeBase *S) { return VerifyEVLUse(*S, 0); }) |
consistency?
Changed to VPInstructionWithType, thanks.
|
||
public: | ||
VPInstructionWithType(unsigned Opcode, ArrayRef<VPValue *> Operands, | ||
Type *ResultTy, DebugLoc DL, const Twine &Name = "") |
Worth documenting somewhere that, unlike all other recipes, VPInstructionWithType does not have its own VPDef::VPInstructionWithTypeSC identifying it; it uses VPInstructionSC instead. VPInstructions should be identified according to their opcode, whether they are actually of class VPInstruction itself or a subclass thereof. I.e., there is no VP_CLASSOF_IMPL(VPDef::VPInstructionWithTypeSC) here.
Done, thanks.
@@ -142,7 +142,6 @@ bool VPRecipeBase::mayHaveSideEffects() const { | |||
switch (getVPDefID()) { | |||
case VPDerivedIVSC: | |||
case VPPredInstPHISC: | |||
case VPScalarCastSC: | |||
case VPVectorEndPointerSC: | |||
return false; | |||
case VPInstructionSC: |
Which now takes care of VPInstructionWithType recipes as well, right?
Yep
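As a rough illustration of why removing the dedicated case is safe: when the typed subclass shares the base class's ID, a switch over the ID routes both through the same case, and any finer distinction is made by opcode. Everything in the sketch is a hypothetical simplification (`Recipe`, `DefID`, `Opcode`, and the choice of `Call` as the only side-effecting opcode are assumptions, not the real mayHaveSideEffects logic).

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the real VPlan definitions.
enum class DefID { DerivedIVSC, InstructionSC, StoreSC };
enum class Opcode { Add, ZExt, Call };

struct Recipe {
  DefID ID;
  Opcode Op; // only meaningful when ID == DefID::InstructionSC
};

inline bool mayHaveSideEffects(const Recipe &R) {
  switch (R.ID) {
  case DefID::DerivedIVSC:
    return false;
  case DefID::InstructionSC:
    // Plain and typed instructions both land here, because the typed
    // subclass reuses this ID. The opcode decides the answer.
    // Assumption for this sketch: only Call may have side effects.
    return R.Op == Opcode::Call;
  case DefID::StoreSC:
    return true;
  }
  return true; // conservative answer for IDs not listed above
}
```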
@@ -4459,7 +4459,6 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF, | |||
switch (R.getVPDefID()) { | |||
case VPDef::VPDerivedIVSC: | |||
case VPDef::VPScalarIVStepsSC: | |||
case VPDef::VPScalarCastSC: | |||
case VPDef::VPReplicateSC: | |||
case VPDef::VPInstructionSC: |
VPInstructionSC now also holds for VPInstructionWithType recipes, instead of VPScalarCastSC.
@@ -531,7 +531,6 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue { | |||
case VPRecipeBase::VPWidenIntOrFpInductionSC: | |||
case VPRecipeBase::VPWidenPointerInductionSC: | |||
case VPRecipeBase::VPReductionPHISC: | |||
case VPRecipeBase::VPScalarCastSC: |
VPInstructionSC above now also holds for VPInstructionWithType recipes, instead of VPScalarCastSC.
…ScalarCast(NFC) (#129706) There are some opcodes that currently require specialized recipes, due to their result type not being implied by their operands, including casts. This leads to duplication from defining multiple full recipes. This patch introduces a new VPInstructionWithType subclass that also stores the result type. The general idea is to have opcodes needing to specify a result type to use this general recipe. The current patch replaces VPScalarCastRecipe with VInstructionWithType, a similar patch for VPWidenCastRecipe will follow soon. There are a few proposed opcodes that should also benefit, without the need of workarounds: * llvm/llvm-project#129508 * llvm/llvm-project#119284 PR: llvm/llvm-project#129706
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/23150 Here is the relevant piece of the build log for the reference
There are some opcodes that currently require specialized recipes, due to their result type not being implied by their operands, including casts.
This leads to duplication from defining multiple full recipes.
This patch introduces a new VPInstructionWithType subclass that also stores the result type. The general idea is to have opcodes that need to specify a result type use this general recipe. The current patch replaces VPScalarCastRecipe with VPInstructionWithType; a similar patch for VPWidenCastRecipe will follow soon.
There are a few proposed opcodes that should also benefit, without the need of workarounds: