Performance regression in SLPVectorize between llvm 10.0 and 11.0

@alexey-bataev


Bugzilla Link	48486
Version	11.0
OS	Linux
CC	@alexey-bataev,@topperc,@LebedevRI,@RKSimon,@rscottmanley,@rotateright

Extended Description

With llvm 11.0, the change to the heuristics and/or instruction costs used in SLPVectorize.cpp (opt) has caused a 30% regression in overall application performance with routine __nv_MorphologyPrimitive_F1L2849_2 in the attached morphology.ll, as measured on an Intel Skylake 40-core Xeon server.

With llvm 10.0, SLPVectorize promotes some of the loops from xmm pd (128-bit packed double) to ymm pd (256-bit packed double) operations. Those same transformations do not happen with llvm 11.0.

Attached in SLPV.tar are:
morphology.ll (used as input for llvm opt releases 10 and 11)
morphology-10.llvm (output of opt using --opt-bisect-limit=778, just after the SLP pass), produced with exactly:

lim=778
opt -O2 -mcpu=skylake-avx512 --enable-unsafe-fp-math --enable-no-nans-fp-math --enable-no-infs-fp-math --enable-no-signed-zeros-fp-math --opt-bisect-limit=${lim} ./obj/magick/morphology.ll -S -o ./obj/magick/morphology-10.llvm

morphology-11.llvm
morphology-10.s output from llc invoked with:
-mcpu=skylake-avx512 -O2 --enable-unsafe-fp-math --enable-no-nans-fp-math --enable-no-infs-fp-math --enable-no-signed-zeros-fp-math -fast-isel=0 -non-global-value-max-name-size=4294967295 -x86-cmov-converter=0 -filetype=obj

perf-10.lst and perf-11.lst: snapshots of perf report for the most costly loop in routine __nv_MorphologyPrimitive_F1L2849_2
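
A quick way to see the difference in the two IR dumps is to count how many lines mention each vector type. This is only a rough sketch: it assumes the ymm pd operations show up as `<4 x double>` and the xmm pd operations as `<2 x double>` in the post-SLP IR, and the helper name `count_vec` is hypothetical, not part of any LLVM tool.

```shell
# Hypothetical helper: count the lines of an IR file that mention a vector type.
# $1 = IR file, $2 = literal vector type string, e.g. '<4 x double>'
count_vec() {
  # -F treats the type as a fixed string; -c prints the number of matching lines.
  # grep exits non-zero when there are no matches, so '|| true' keeps the
  # helper usable under 'set -e' while still printing 0.
  grep -c -F -- "$2" "$1" || true
}

# Compare the two attached dumps if they are present in the current directory.
for f in morphology-10.llvm morphology-11.llvm; do
  [ -f "$f" ] || continue
  echo "$f: <2 x double> lines: $(count_vec "$f" '<2 x double>'), <4 x double> lines: $(count_vec "$f" '<4 x double>')"
done
```

If the description above holds, morphology-10.llvm should report noticeably more `<4 x double>` lines than morphology-11.llvm. The counts are per line rather than per occurrence, so they are only an indicator and not a substitute for the perf data.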

Performance regression in SLPVectorize between llvm 10.0 and 11.0 #47830