I was trying to understand some basics of TensorFlow and got stuck while reading the documentation for the max pooling 2D layer: https://www.tensorflow.org/tutorials/layers#pooling_layer_1
This is how max_pooling2d is specified:
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
where conv1 is a tensor with shape [batch_size, image_height, image_width, channels], concretely [batch_size, 28, 28, 32] in this case. So our input is a tensor with shape [batch_size, 28, 28, 32].
My understanding of a max pooling 2D layer is that it applies a window of size pool_size (2x2 in this case) and slides that window by strides (2 in both dimensions). This means that both the width and the height of the image are halved, i.e. we end up with 14x14 pixels per channel (32 channels in total), so our output should be a tensor with shape [batch_size, 14, 14, 32].
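To make that understanding concrete, here is a minimal pure-Python sketch of 2x2 max pooling with stride 2. It assumes "valid" padding (the default for tf.layers.max_pooling2d), drops the batch dimension, and stores one image as nested lists indexed image[row][col][channel]; the function name max_pool_2d and the dummy pixel values are hypothetical, not TensorFlow's implementation. Each channel is pooled on its own, so only height and width shrink.

```python
def max_pool_2d(image, pool_size=2, stride=2):
    """Hypothetical helper: max-pool one image laid out as image[row][col][channel], 'valid' padding."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    out_h = (height - pool_size) // stride + 1
    out_w = (width - pool_size) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Take the max over the pool_size x pool_size window separately for
            # every channel, so the number of channels is preserved.
            pixel = [
                max(
                    image[i * stride + di][j * stride + dj][c]
                    for di in range(pool_size)
                    for dj in range(pool_size)
                )
                for c in range(channels)
            ]
            row.append(pixel)
        output.append(row)
    return output

# Dummy 28x28 image with 32 channels, standing in for one example of conv1.
image = [[[(r * 28 + col + ch) % 7 for ch in range(32)] for col in range(28)] for r in range(28)]
pooled = max_pool_2d(image)
print(len(pooled), len(pooled[0]), len(pooled[0][0]))  # 14 14 32
```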
However, according to the above link, the shape of the output tensor is [batch_size, 14, 14, 1]:
Our output tensor produced by max_pooling2d() (pool1) has a shape of [batch_size, 14, 14, 1]: the 2x2 filter reduces width and height by 50%.
What am I missing here?
How was 32 converted to 1?
They apply the same logic later here: https://www.tensorflow.org/tutorials/layers#convolutional_layer_2_and_pooling_layer_2 but this time it's correct, i.e. [batch_size, 14, 14, 64] becomes [batch_size, 7, 7, 64] (the number of channels stays the same).
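The shape arithmetic for both pooling layers can be checked with a small sketch. It assumes channels-last (NHWC) input and "valid" padding, where each spatial output size is floor((n - pool) / stride) + 1; the helper name pooled_shape is hypothetical. Under these assumptions the channel axis passes through unchanged for both layers.

```python
def pooled_shape(input_shape, pool_size=(2, 2), strides=(2, 2)):
    """Hypothetical helper: output shape of 2D max pooling ('valid' padding) on an NHWC shape."""
    batch, height, width, channels = input_shape
    out_h = (height - pool_size[0]) // strides[0] + 1
    out_w = (width - pool_size[1]) // strides[1] + 1
    # Pooling only slides over height and width; the channel axis is untouched.
    return (batch, out_h, out_w, channels)

print(pooled_shape(("batch_size", 28, 28, 32)))  # ('batch_size', 14, 14, 32)
print(pooled_shape(("batch_size", 14, 14, 64)))  # ('batch_size', 7, 7, 64)
```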