[TTI] Simplify implementation (NFCI)#136674


Merged
merged 7 commits into from
Apr 26, 2025

Conversation

s-barannikov
Contributor

@s-barannikov commented Apr 22, 2025

Replace "concept based polymorphism" with simpler PImpl idiom.

This pursues two goals:

  • Enforce static type checking. Previously, target implementations hid base class methods, and type checking was impossible. Now that they override the methods, the compiler will complain about mismatched signatures.
  • Make the code easier to navigate. Previously, if you asked your favorite LSP server to show a method (e.g. getInstructionCost()), it would show you methods from TTI, TTI::Concept, TTI::Model, TTIImplBase, and target overrides. Now it is two fewer :)

The change is split into three commits to (hopefully) simplify the review.

The first commit removes TTI::Model. This is done by deriving TargetTransformInfoImplBase from TTI::Concept. This is possible because they implement the same set of interfaces with identical signatures.

The first commit also makes TargetTransformInfoImplBase polymorphic, which means all derived classes should override its methods. Those overrides are added in the second commit, to keep the first one smaller. It appeared infeasible to extract the first commit into a separate PR: landed on its own, it would produce tons of -Woverloaded-virtual warnings (and break -Werror builds).

The third commit eliminates TTI::Concept by merging it into its only derived class, TargetTransformInfoImplBase. This commit could be extracted into a separate PR, but it touches the same lines in TargetTransformInfoImpl.h (it removes the override added by the second commit and adds virtual), so it seemed to make sense to land these two commits together.

I don't expect any compile-time regressions/improvements here (there is still one virtual call per TTI query). Building might be slightly faster, as TargetTransformInfo.h now has three times fewer method declarations.

I plan to do more cleanup in follow-up PRs.

@llvmbot
Member

llvmbot commented Apr 22, 2025

@llvm/pr-subscribers-backend-nvptx
@llvm/pr-subscribers-backend-loongarch
@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-backend-aarch64

Author: Sergei Barannikov (s-barannikov)

Patch is 315.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/136674.diff

45 Files Affected:

  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+4-1354)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+399-329)
  • (modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+150-138)
  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+6-2)
  • (modified) llvm/lib/CodeGen/CodeGenTargetMachineImpl.cpp (+1-1)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+120-109)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h (+64-59)
  • (modified) llvm/lib/Target/AMDGPU/R600TargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/R600TargetTransformInfo.h (+12-11)
  • (modified) llvm/lib/Target/ARC/ARCTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/ARM/ARMTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.h (+86-76)
  • (modified) llvm/lib/Target/BPF/BPFTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/BPF/BPFTargetTransformInfo.h (+6-9)
  • (modified) llvm/lib/Target/DirectX/DirectXTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/DirectX/DirectXTargetTransformInfo.h (+4-4)
  • (modified) llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h (+55-47)
  • (modified) llvm/lib/Target/Lanai/LanaiTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h (+11-11)
  • (modified) llvm/lib/Target/LoongArch/LoongArchTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.h (+8-6)
  • (modified) llvm/lib/Target/Mips/MipsTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/Mips/MipsTargetTransformInfo.h (+2-2)
  • (modified) llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h (+31-29)
  • (modified) llvm/lib/Target/PowerPC/PPCTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (+5-1)
  • (modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h (+58-55)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+102-93)
  • (modified) llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/SPIRV/SPIRVTargetTransformInfo.h (+1-1)
  • (modified) llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h (+44-45)
  • (modified) llvm/lib/Target/VE/VETargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/VE/VETargetTransformInfo.h (+11-10)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h (+21-21)
  • (modified) llvm/lib/Target/X86/X86TargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/X86/X86TargetTransformInfo.h (+94-90)
  • (modified) llvm/lib/Target/XCore/XCoreTargetMachine.cpp (+1-1)
  • (modified) llvm/lib/Target/XCore/XCoreTargetTransformInfo.h (+1-1)
```diff
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index ab5306b7b614e..76a77737394b8 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -210,6 +210,7 @@ struct TailFoldingInfo {
 class TargetTransformInfo;
 typedef TargetTransformInfo TTI;
+class TargetTransformInfoImplBase;
 
 /// This pass provides access to the codegen interfaces that are needed
 /// for IR-level transformations.
@@ -226,7 +227,8 @@ class TargetTransformInfo {
   ///
   /// This is used by targets to construct a TTI wrapping their target-specific
   /// implementation that encodes appropriate costs for their target.
-  template <typename T> TargetTransformInfo(T Impl);
+  explicit TargetTransformInfo(
+      std::unique_ptr<const TargetTransformInfoImplBase> Impl);
 
   /// Construct a baseline TTI object using a minimal implementation of
   /// the \c Concept API below.
@@ -1915,1361 +1917,9 @@ class TargetTransformInfo {
       SmallVectorImpl<std::pair<StringRef, int64_t>> &LB) const;
 
 private:
-  /// The abstract base class used to type erase specific TTI
-  /// implementations.
-  class Concept;
-
-  /// The template model for the base class which wraps a concrete
-  /// implementation in a type erased interface.
-  template <typename T> class Model;
-
-  std::unique_ptr<const Concept> TTIImpl;
-};
-
-class TargetTransformInfo::Concept {
-public:
-  virtual ~Concept() = 0;
-  virtual const DataLayout &getDataLayout() const = 0;
-  virtual InstructionCost getGEPCost(Type *PointeeType, const Value *Ptr,
-                                     ArrayRef<const Value *> Operands,
-                                     Type *AccessType,
-                                     TTI::TargetCostKind CostKind) const = 0;
-  virtual InstructionCost
-  getPointersChainCost(ArrayRef<const Value *> Ptrs, const Value *Base,
-                       const TTI::PointersChainInfo &Info, Type *AccessTy,
-                       TTI::TargetCostKind CostKind) const = 0;
-  virtual unsigned getInliningThresholdMultiplier() const = 0;
-  virtual unsigned getInliningCostBenefitAnalysisSavingsMultiplier() const = 0;
-  virtual unsigned
-  getInliningCostBenefitAnalysisProfitableMultiplier() const = 0;
-  virtual int getInliningLastCallToStaticBonus() const = 0;
-  virtual unsigned adjustInliningThreshold(const CallBase *CB) const = 0;
-  virtual int getInlinerVectorBonusPercent() const = 0;
-  virtual unsigned getCallerAllocaCost(const CallBase *CB,
-                                       const AllocaInst *AI) const = 0;
-  virtual InstructionCost getMemcpyCost(const Instruction *I) const = 0;
-  virtual uint64_t getMaxMemIntrinsicInlineSizeThreshold() const = 0;
-  virtual unsigned
-  getEstimatedNumberOfCaseClusters(const SwitchInst &SI, unsigned &JTSize,
-                                   ProfileSummaryInfo *PSI,
-                                   BlockFrequencyInfo *BFI) const = 0;
-  virtual InstructionCost getInstructionCost(const User *U,
-                                             ArrayRef<const Value *> Operands,
-                                             TargetCostKind CostKind) const = 0;
-  virtual BranchProbability getPredictableBranchThreshold() const = 0;
-  virtual InstructionCost getBranchMispredictPenalty() const = 0;
-  virtual bool hasBranchDivergence(const Function *F = nullptr) const = 0;
-  virtual bool isSourceOfDivergence(const Value *V) const = 0;
-  virtual bool isAlwaysUniform(const Value *V) const = 0;
... [truncated]
```
@llvmbot
Member

@llvm/pr-subscribers-backend-spir-v

const = 0;- virtual bool isTruncateFree(Type *Ty1, Type *Ty2) const = 0;- virtual bool isProfitableToHoist(Instruction *I) const = 0;- virtual bool useAA() const = 0;- virtual bool isTypeLegal(Type *Ty) const = 0;- virtual unsigned getRegUsageForType(Type *Ty) const = 0;- virtual bool shouldBuildLookupTables() const = 0;- virtual bool shouldBuildLookupTablesForConstant(Constant *C) const = 0;- virtual bool shouldBuildRelLookupTables() const = 0;- virtual bool useColdCCForColdCall(Function &F) const = 0;- virtual bool- isTargetIntrinsicTriviallyScalarizable(Intrinsic::ID ID) const = 0;- virtual bool- isTargetIntrinsicWithScalarOpAtArg(Intrinsic::ID ID,- unsigned ScalarOpdIdx) const = 0;- virtual bool isTargetIntrinsicWithOverloadTypeAtArg(Intrinsic::ID ID,- int OpdIdx) const = 0;- virtual bool- isTargetIntrinsicWithStructReturnOverloadAtField(Intrinsic::ID ID,- int RetIdx) const = 0;- virtual InstructionCost- getScalarizationOverhead(VectorType *Ty, const APInt &DemandedElts,- bool Insert, bool Extract, TargetCostKind CostKind,- ArrayRef<Value *> VL = {}) const = 0;- virtual InstructionCost- getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,- ArrayRef<Type *> Tys,- TargetCostKind CostKind) const = 0;- virtual bool supportsEfficientVectorElementLoadStore() const = 0;- virtual bool supportsTailCalls() const = 0;- virtual bool supportsTailCallFor(const CallBase *CB) const = 0;- virtual bool enableAggressiveInterleaving(bool LoopHasReductions) const = 0;- virtual MemCmpExpansionOptions- enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const = 0;- virtual bool enableSelectOptimize() const = 0;- virtual bool shouldTreatInstructionLikeSelect(const Instruction *I) const = 0;- virtual bool enableInterleavedAccessVectorization() const = 0;- virtual bool enableMaskedInterleavedAccessVectorization() const = 0;- virtual bool isFPVectorizationPotentiallyUnsafe() const = 0;- virtual bool allowsMisalignedMemoryAccesses(LLVMContext &Context,- unsigned BitWidth,- 
unsigned AddressSpace,- Align Alignment,- unsigned *Fast) const = 0;- virtual PopcntSupportKind- getPopcntSupport(unsigned IntTyWidthInBit) const = 0;- virtual bool haveFastSqrt(Type *Ty) const = 0;- virtual bool- isExpensiveToSpeculativelyExecute(const Instruction *I) const = 0;- virtual bool isFCmpOrdCheaperThanFCmpZero(Type *Ty) const = 0;- virtual InstructionCost getFPOpCost(Type *Ty) const = 0;- virtual InstructionCost getIntImmCodeSizeCost(unsigned Opc, unsigned Idx,- const APInt &Imm,- Type *Ty) const = 0;- virtual InstructionCost getIntImmCost(const APInt &Imm, Type *Ty,- TargetCostKind CostKind) const = 0;- virtual InstructionCost- getIntImmCostInst(unsigned Opc, unsigned Idx, const APInt &Imm, Type *Ty,- TargetCostKind CostKind,- Instruction *Inst = nullptr) const = 0;- virtual InstructionCost- getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,- Type *Ty, TargetCostKind CostKind) const = 0;- virtual bool preferToKeepConstantsAttached(const Instruction &Inst,- const Function &Fn) const = 0;- virtual unsigned getNumberOfRegisters(unsigned ClassID) const = 0;- virtual bool hasConditionalLoadStoreForType(Type *Ty, bool IsStore) const = 0;- virtual unsigned getRegisterClassForType(bool Vector,- Type *Ty = nullptr) const = 0;- virtual const char *getRegisterClassName(unsigned ClassID) const = 0;- virtual TypeSize getRegisterBitWidth(RegisterKind K) const = 0;- virtual unsigned getMinVectorRegisterBitWidth() const = 0;- virtual std::optional<unsigned> getMaxVScale() const = 0;- virtual std::optional<unsigned> getVScaleForTuning() const = 0;- virtual bool isVScaleKnownToBeAPowerOfTwo() const = 0;- virtual bool- shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const = 0;- virtual ElementCount getMinimumVF(unsigned ElemWidth,- bool IsScalable) const = 0;- virtual unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const = 0;- virtual unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,- Type *ScalarValTy) const = 
0;- virtual bool shouldConsiderAddressTypePromotion(- const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const = 0;- virtual unsigned getCacheLineSize() const = 0;- virtual std::optional<unsigned> getCacheSize(CacheLevel Level) const = 0;- virtual std::optional<unsigned> getCacheAssociativity(CacheLevel Level)- const = 0;- virtual std::optional<unsigned> getMinPageSize() const = 0;-- /// \return How much before a load we should place the prefetch- /// instruction. This is currently measured in number of- /// instructions.- virtual unsigned getPrefetchDistance() const = 0;-- /// \return Some HW prefetchers can handle accesses up to a certain- /// constant stride. This is the minimum stride in bytes where it- /// makes sense to start adding SW prefetches. The default is 1,- /// i.e. prefetch with any stride. Sometimes prefetching is beneficial- /// even below the HW prefetcher limit, and the arguments provided are- /// meant to serve as a basis for deciding this for a particular loop.- virtual unsigned getMinPrefetchStride(unsigned NumMemAccesses,- unsigned NumStridedMemAccesses,- unsigned NumPrefetches,- bool HasCall) const = 0;-- /// \return The maximum number of iterations to prefetch ahead. If- /// the required number of iterations is more than this number, no- /// prefetching is performed.- virtual unsigned getMaxPrefetchIterationsAhead() const = 0;-- /// \return True if prefetching should also be done for writes.- virtual bool enableWritePrefetching() const = 0;-- /// \return if target want to issue a prefetch in address space \p AS.- virtual bool shouldPrefetchAddressSpace(unsigned AS) const = 0;-- /// \return The cost of a partial reduction, which is a reduction from a- /// vector to another vector with fewer elements of larger size. They are- /// represented by the llvm.experimental.partial.reduce.add intrinsic, which- /// takes an accumulator and a binary operation operand that itself is fed by- /// two extends. 
An example of an operation that uses a partial reduction is a- /// dot product, which reduces two vectors to another of 4 times fewer and 4- /// times larger elements.- virtual InstructionCost- getPartialReductionCost(unsigned Opcode, Type *InputTypeA, Type *InputTypeB,- Type *AccumType, ElementCount VF,- PartialReductionExtendKind OpAExtend,- PartialReductionExtendKind OpBExtend,- std::optional<unsigned> BinOp) const = 0;-- virtual unsigned getMaxInterleaveFactor(ElementCount VF) const = 0;- virtual InstructionCost- getArithmeticInstrCost(unsigned Opcode, Type *Ty,- TTI::TargetCostKind CostKind,- OperandValueInfo Opd1Info, OperandValueInfo Opd2Info,- ArrayRef<const Value *> Args,- const Instruction *CxtI = nullptr) const = 0;- virtual InstructionCost getAltInstrCost(- VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,- const SmallBitVector &OpcodeMask,- TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const = 0;-- virtual InstructionCost getShuffleCost(ShuffleKind Kind, VectorType *Tp,- ArrayRef<int> Mask,- TTI::TargetCostKind CostKind,- int Index, VectorType *SubTp,- ArrayRef<const Value *> Args,- const Instruction *CxtI) const = 0;- virtual InstructionCost getCastInstrCost(unsigned Opcode, Type *Dst,- Type *Src, CastContextHint CCH,- TTI::TargetCostKind CostKind,- const Instruction *I) const = 0;- virtual InstructionCost getExtractWithExtendCost(unsigned Opcode, Type *Dst,- VectorType *VecTy,- unsigned Index) const = 0;- virtual InstructionCost- getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,- const Instruction *I = nullptr) const = 0;- virtual InstructionCost- getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,- CmpInst::Predicate VecPred, TTI::TargetCostKind CostKind,- OperandValueInfo Op1Info, OperandValueInfo Op2Info,- const Instruction *I) const = 0;- virtual InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,- TTI::TargetCostKind CostKind,- unsigned Index, Value *Op0,- Value *Op1) const = 0;-- /// \param 
ScalarUserAndIdx encodes the information about extracts from a- /// vector with 'Scalar' being the value being extracted,'User' being the user- /// of the extract(nullptr if user is not known before vectorization) and- /// 'Idx' being the extract lane.- virtual InstructionCost getVectorInstrCost(- unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index,- Value *Scalar,- ArrayRef<std::tuple<Value *, User *, int>> ScalarUserAndIdx) const = 0;-- virtual InstructionCost getVectorInstrCost(const Instruction &I, Type *Val,- TTI::Targ... [truncated] 
@nikic (Contributor) left a comment:
Looks like this has already managed to bitrot :)

Generally this makes sense to me. Do you have any idea why this design was used in the first place? There are a few places where LLVM does this concept+model dance where plain virtual dispatch would be a lot simpler and cleaner, and I always feel like I must be missing something...
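For readers unfamiliar with the pattern under discussion, here is a minimal sketch of the two designs. All names (CostInfo, FastTarget, CostInfoImplBase, SlowTargetImpl) are made up for illustration, not LLVM's actual classes: the concept+model version type-erases any object with the right methods behind a hidden interface, while the plain-virtual version is a single abstract base that targets override directly.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// --- Type-erasure ("concept + model"), analogous to the old TTI design ---
class CostInfo {
  struct Concept {                   // hidden abstract interface
    virtual ~Concept() = default;
    virtual int getCost() const = 0;
  };
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    // Forwards to T::getCost(); nothing checks T's signature against the
    // interface until this template is instantiated.
    int getCost() const override { return Impl.getCost(); }
  };
  std::unique_ptr<Concept> Ptr;

public:
  template <typename T>
  explicit CostInfo(T Impl)
      : Ptr(std::make_unique<Model<T>>(std::move(Impl))) {}
  int getCost() const { return Ptr->getCost(); }
};

struct FastTarget { int getCost() const { return 1; } }; // needs no base class

// --- Plain virtual dispatch (PImpl), analogous to the new design ---
// Targets override one base directly, so a mismatched signature is caught
// at compile time (via `override` or -Woverloaded-virtual).
struct CostInfoImplBase {
  virtual ~CostInfoImplBase() = default;
  virtual int getCost() const { return 10; }
};
struct SlowTargetImpl final : CostInfoImplBase {
  int getCost() const override { return 3; }
};
```

Both versions pay one indirect call at the boundary; the difference is mainly in how much machinery sits between the public class and the target implementation.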

@davemgreen (Collaborator) commented:
I don't expect any compile-time regressions/improvements here (there is still one virtual call).

I have often thought that the cost model dance is more difficult than helpful and thought recently of simplifying it. Does this mean that all calls between functions like AArch64TTIImpl::getIntrinsicInstrCost and functions in BasicTTIImplBase now go through a virtual dispatch, so need a vtable lookup and cannot be inlined and whatnot? Same for calls from BasicTTIImplBase -> AArch64TTIImpl methods or even AArch64TTIImpl -> AArch64TTIImpl. (Some of those directions might be helped by final, if that does anything useful).

That had been my guess for the reason, that you pay one virtual call to get into the getIntrinsicInstrCost functions, but none to calls once you were inside AArch64TTIImpl and the base classes. Honestly it might be worth removing if we don't see much benefit from it, but it would be nice if the cost model was efficient (to allow us to do other things with it). It might be worth checking, and maybe marking more of the target TTI implementations as final.
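The devirtualization point can be illustrated with a toy hierarchy (TTIBase and Target64Impl are hypothetical names, not LLVM's classes): marking the concrete implementation `final` tells the compiler no further overrides exist, so calls made through the concrete type can be resolved statically and inlined.

```cpp
#include <cassert>

// A base-class helper that calls back into a virtual method, in the spirit
// of helpers in a TTI base class.
struct TTIBase {
  virtual ~TTIBase() = default;
  virtual int getIntrinsicCost() const { return 5; }
  // Non-virtual helper; the inner call is virtual dispatch unless the
  // compiler can prove the dynamic type of *this.
  int getTotalCost() const { return getIntrinsicCost() + 1; }
};

// `final` on the class means a call through a Target64Impl pointer or
// reference cannot hit any other override, enabling devirtualization.
struct Target64Impl final : TTIBase {
  int getIntrinsicCost() const override { return 7; }
};
```

Through a plain TTIBase reference of unknown dynamic type the call stays a vtable lookup; through a Target64Impl reference the optimizer may bind it directly.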

@fhahn (Contributor) left a comment:

> I don't expect any compile-time regressions/improvements here (there is still one virtual call).
>
> I have often thought that the cost model dance is more difficult than helpful and thought recently of simplifying it. Does this mean that all calls between functions like AArch64TTIImpl::getIntrinsicInstrCost and functions in BasicTTIImplBase now go through a virtual dispatch, so need a vtable lookup and cannot be inlined and whatnot? Same for calls from BasicTTIImplBase -> AArch64TTIImpl methods or even AArch64TTIImpl -> AArch64TTIImpl. (Some of those directions might be helped by final, if that does anything useful).
>
> That had been my guess for the reason, that you pay one virtual call to get into the getIntrinsicInstrCost functions, but none to calls once you were inside AArch64TTIImpl and the base classes. Honestly it might be worth removing if we don't see much benefit from it, but it would be nice if the cost model was efficient (to allow us to do other things with it). It might be worth checking, and maybe marking more of the target TTI implementations as final.

FWIW we now have a quite reliable way of getting a good idea of the compile-time impact via the compile-time-tracker ;)

@s-barannikov (Author) commented:

> Do you have any idea why this design was used in the first place?

This design was introduced by 705b185. I'm not sure I fully understand the motivation; the commit message mostly talks about benefits for the new pass manager (NewPM), which I'm not familiar with.

> Does this mean that all calls between functions like AArch64TTIImpl::getIntrinsicInstrCost and functions in BasicTTIImplBase now go through a virtual dispatch, so need a vtable lookup and cannot be inlined and whatnot? Same for calls from BasicTTIImplBase -> AArch64TTIImpl methods or even AArch64TTIImpl -> AArch64TTIImpl. (Some of those directions might be helped by final, if that does anything useful).

The honest answer: I'm not sure. I hope there will be no compile-time regressions, and if there are, I'm counting on final.
Here is how I see it:
BasicTTIImplBase still calls methods of the derived class via thisT()->. If that doesn't devirtualize the call, making the implementation final should help. I was hoping to remove the template base in a future patch if there is no compile-time difference.
Adding final should also help in the AArch64TTIImpl -> AArch64TTIImpl case; calling a TTIImpl method from TTI always goes through a virtual dispatch (before and after this change).
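The thisT() pattern referred to here can be sketched as follows (BasicImplBase and MyTargetImpl are illustrative names; LLVM's real CRTP base is BasicTTIImplBase):

```cpp
#include <cassert>

// CRTP: the base knows the concrete target type at compile time and calls
// its methods with a static (inlinable) call — no vtable involved.
template <typename T> struct BasicImplBase {
  const T *thisT() const { return static_cast<const T *>(this); }
  // Statically bound: resolves to T::getScalarCost at compile time.
  int getVectorCost() const { return 4 * thisT()->getScalarCost(); }
};

struct MyTargetImpl : BasicImplBase<MyTargetImpl> {
  int getScalarCost() const { return 2; }
};
```

This is why removing the template base (de-templatizing) can cost compile time: the base class then no longer knows the concrete type, and these calls become virtual.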

> Looks like this has already managed to bitrot :)

Yeah, quicker than I expected. Should be easy to fix though.

@nikic (Contributor) commented:

Compile-time is neutral: https://llvm-compile-time-tracker.com/compare.php?from=9efd798a278a7ddda3b88365558ceb655c329d11&to=ee2a57335ae3be03e9aa925734a6681b2363d628&stat=instructions:u

This is in terms of instruction count though, so it's possible that virtual dispatch overhead is underestimated.

Merged main into tti/remove-concept-model, resolving conflicts in: llvm/include/llvm/Analysis/TargetTransformInfo.h, llvm/include/llvm/Analysis/TargetTransformInfoImpl.h, llvm/include/llvm/CodeGen/BasicTTIImpl.h, llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@s-barannikov (Author) commented:

> Compile-time is neutral: https://llvm-compile-time-tracker.com/compare.php?from=9efd798a278a7ddda3b88365558ceb655c329d11&to=ee2a57335ae3be03e9aa925734a6681b2363d628&stat=instructions:u
>
> This is in terms of instruction count though, so it's possible that virtual dispatch overhead is underestimated.

That sounds optimistic. Can you please add my fork to the compile-time tracker? I'd like to experiment with final and with dropping the template base class.
https://github.com/s-barannikov/llvm-project

@s-barannikov (Author) commented on Apr 23, 2025:

I did some measurements adding commits on top this PR. The results are here, and I summarize them below.

Just adding final seems to change nothing (link). Measurements below don't include this commit.

  1. Removing TargetTransformInfoImplCRTPBase slightly worsens compile time (link).
  2. Removing template argument of BasicTTIImplBase worsens compile time more (link). This commit also makes getTLI() and getST() virtual for good measure. Two commits together: link.
  3. Adding final now recovers compile time, but not completely (link). Three commits together: link.
  4. So I thought that devirtualizing getST() and getTLI() by caching ST and TLI in both base and derived classes would recover more compile time, but instead it made things much worse (link). This puzzles me.

Three conclusions from this (YMMV):

  1. De-templatizing introduces more virtual calls, worsening compile time; virtual dispatch is therefore measurable even in instruction-count terms.
  2. After de-templatizing, adding final helps in the AArch64TTIImpl -> AArch64TTIImpl case, but doesn't help in the BasicTTIImplBase -> AArch64TTIImpl / BasicTTIImplBase -> BasicTTIImplBase cases. This is expected, since BasicTTIImplBase is no longer specialized for a concrete implementation, and calls from it can't be devirtualized (there are many derived classes).
  3. The above results explain why this PR is compile-time neutral: it doesn't introduce more virtual calls (well, maybe a few, but not two hundred). If it introduced many virtual calls, the measurements would show a regression.
@nikic (Contributor) left a comment:

LGTM

@s-barannikov (Author) commented:

Thanks, I'll wait for more approvals and land it this weekend if there are no objections.

@RKSimon (Collaborator) left a comment:

LGTM - I love the idea of CRTP, but so much of it always seems to get in the way :(

s-barannikov merged commit bb17651 into llvm:main on Apr 26, 2025
13 checks passed
s-barannikov deleted the tti/remove-concept-model branch on April 26, 2025 at 12:25
jyli0116 pushed a commit to jyli0116/llvm-project that referenced this pull request on Apr 28, 2025.