# Graph fusion

Intel® Extension for TensorFlow* provides graph optimization to fuse specified operator patterns into a new single operator for better performance.

## Basic fusion

The basic list of supported fusions is shown below. These fusions require input and output of the same data type.

| Pattern | Operator number |
| --- | --- |
| (Equal, NotEqual, GreaterEqual, Greater, LessEqual, Less)+Cast | 2 |
| L2loss+AddN | 2 |
| BatchMatMul+Mul | 2 |
| Mul+AddN+TrainingOp | 3 |
| Conv+Bias | 2 |
| Conv+Bias+(Relu, Relu6, Elu, LeakyRelu, Gelu_erf, Gelu_tanh, Tanh, Sigmoid) | 3 |
| MatMul+Bias | 2 |
| MatMul+Bias+(Relu, Relu6, Elu, Gelu_erf, Gelu_tanh, Tanh, Sigmoid) | 3 |
| FusedBatchNorm+Relu | 2 |
| FusedBatchNormGrad+ReluGrad | 2 |
| Conv+Bias+Add | 3 |
| Conv+Bias+Add+(Relu, Relu6, Elu, LeakyRelu, Gelu_erf, Gelu_tanh, Tanh, Sigmoid) | 4 |
| MatMul+Bias+Add | 3 |
| MatMul+Bias+Add+(Relu, Relu6, Elu, Gelu_erf, Gelu_tanh, Tanh, Sigmoid) | 4 |
| MatMul+BiasAddGrad | 2 |
| ConvGradFilter+BiasAddGrad | 2 |
| Pad+Conv | 2 |
| BatchMatMul with variable post-op | 2+ |
| Swish | 2 |
| LayerNorm | 3+ |
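To illustrate what a fusion like MatMul+Bias+Relu replaces, here is a NumPy sketch (not ITEX code): the unfused form runs three separate ops and materializes two intermediate tensors, while the fused form computes the same result in a single kernel call. The helper name `fused_matmul_bias_relu` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)    # activations
w = rng.standard_normal((8, 16)).astype(np.float32)   # weights
b = rng.standard_normal(16).astype(np.float32)        # bias

# Unfused: three graph nodes, two intermediate tensors written to memory.
t1 = x @ w                       # MatMul
t2 = t1 + b                      # BiasAdd
y_unfused = np.maximum(t2, 0.0)  # Relu

# Fused: one call, no intermediates exposed to the graph.
def fused_matmul_bias_relu(x, w, b):
    return np.maximum(x @ w + b, 0.0)

y_fused = fused_matmul_bias_relu(x, w, b)
assert np.allclose(y_unfused, y_fused)
```

The numerical result is identical; the benefit of fusion is avoiding the memory traffic and kernel-launch overhead of the intermediate ops.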

## Mixed data type fusion

Because stock TensorFlow requires an operator's input and output to share the same data type, cast nodes inserted during BF16 inference and training can break the existing fusion patterns and hurt performance.

Intel® Extension for TensorFlow* provides mixed data type fusion, which removes the additional data type conversions on the graph level.

The supported mixed data type fusions are listed below, using MatMul as the running example.

| Pattern | Fused operator | Input data type | Output data type | oneDNN FP32 math mode |
| --- | --- | --- | --- | --- |
| MatMul + Cast | AccMatMul | BF16 | FP32 | N/A |
| FusedMatMul + Cast | FusedAccMatMul | BF16 | FP32 | N/A |
| AccMatMul + any MatMul fusion | FusedAccMatMul | BF16 | FP32 | N/A |
| Cast + MatMul + Cast | AccMatMul | FP32 | FP32 | BF16 |
| Cast + FusedMatMul + Cast | FusedAccMatMul | FP32 | FP32 | BF16 |

## Implementation Details

The Cast + (Fused)MatMul + Cast pattern is covered by the pattern matcher; the remaining patterns are covered by remapper fusion. The new kernels (AccMatMul and FusedAccMatMul(WithSum)) are implemented as extensions of the original MatMul with the following new attributes:

- `Tout`: Output data type ∈ {float32}.
- `Tpost`: Post-op data type ∈ {bfloat16, float32}.
- `is_bf16_math_mode`: A Boolean indicating whether to use oneDNN BF16 math mode when both input and output are FP32.
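The semantics of these attributes can be sketched in NumPy (this is an illustration of the behavior, not the ITEX kernel; the function `acc_matmul` and the BF16 emulation helper are hypothetical). BF16 inputs are emulated by truncating the low 16 mantissa bits of FP32 values, accumulation and output stay in FP32, and `is_bf16_math_mode` reduces input precision internally in place of the explicit Cast nodes that the fusion removed.

```python
import numpy as np

def to_bf16(x):
    # Emulate bfloat16 by zeroing the low 16 bits of a float32
    # (truncation, not round-to-nearest; good enough for a sketch).
    u = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (u & np.uint32(0xFFFF0000)).view(np.float32)

def acc_matmul(a, b, Tout=np.float32, is_bf16_math_mode=False):
    # Fused Cast + MatMul + Cast: FP32 in, FP32 out, with optional
    # BF16 math mode replacing the explicit input-side Cast nodes.
    if is_bf16_math_mode:
        a, b = to_bf16(a), to_bf16(b)
    return (a @ b).astype(Tout)  # accumulate and emit in FP32 (Tout)

rng = np.random.default_rng(1)
a = rng.standard_normal((2, 3)).astype(np.float32)
b = rng.standard_normal((3, 4)).astype(np.float32)

exact  = acc_matmul(a, b)                           # full FP32 math
approx = acc_matmul(a, b, is_bf16_math_mode=True)   # BF16 math, FP32 output
assert approx.dtype == np.float32
```

The fused result stays close to the FP32 reference while saving the two Cast nodes and running the inner product at BF16 precision.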

## Generic layout optimizer

Because the channels_first format is not supported by stock TensorFlow on CPU, it inserts transpose nodes before and after Conv3D/MaxPool3D nodes. This problem does not exist on GPU devices. To avoid the unnecessary layout transformations when running on a GPU device, Intel® Extension for TensorFlow* adds a separate layout optimizer.

| Pattern | Fused operator | Conv data format (before optimization) | Conv data format (after optimization) |
| --- | --- | --- | --- |
| Transpose + Conv3D + Transpose | Conv3D | NDHWC | NCDHW |
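The transpose pair that this optimization removes is a pure layout shuffle, shown here as a NumPy sketch (not ITEX code): the data round-trips unchanged between NCDHW and NDHWC, so on a device that supports channels_first directly, both transposes can be dropped and Conv3D run in NCDHW.

```python
import numpy as np

# A 5-D activation tensor in NCDHW (channels_first) layout.
x_ncdhw = np.arange(2 * 3 * 4 * 5 * 6, dtype=np.float32).reshape(2, 3, 4, 5, 6)

# Stock TensorFlow on CPU wraps Conv3D in this transpose pair:
x_ndhwc = x_ncdhw.transpose(0, 2, 3, 4, 1)  # NCDHW -> NDHWC (before Conv3D)
back    = x_ndhwc.transpose(0, 4, 1, 2, 3)  # NDHWC -> NCDHW (after Conv3D)

# The pair is an identity on the data, only a costly memory shuffle,
# which the generic layout optimizer eliminates on GPU.
assert np.array_equal(back, x_ncdhw)
```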