$\begingroup$

Note: I've read How do subsequent convolution layers work? a few times, but it's still difficult to understand because of the parameters $k_1$, $k_2$, and the many proposals (1, 2.1, 2.2) in the question. This seems to be confusing for other people too, so I'm not the only one (see comments like "I have just struggled with this same question for a few hours"). So here is the question formulated with a specific example and no parameters, to grasp the idea more easily.

Let's say we have a CNN with:

  • input: 28x28x1 grayscale images (28x28 pixels, 1 channel)
  • 1st convolutional layer with kernel size 3x3, and 32 features
  • 2nd convolutional layer with kernel size 3x3, and 64 features

Keras implementation:

from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))

Question: how does the 2nd layer work?

More precisely:

  • for the 1st layer:

    • input size: (1, 28, 28, 1)
    • weights size: (3, 3, 1, 32) (good to know: the number of weights doesn't depend on input pixel size)
    • output size: (1, 26, 26, 32)
  • for the 2nd layer:

    • input size: (1, 26, 26, 32)
    • weights size: (3, 3, 1, 32, 64)
    • output size: (1, 24, 24, 64)

How is the latter possible? It seemed to me that, in the 2nd layer, each 26x26 input map would be convolved with the 3x3 kernel of each of the 64 features, and that this would be done for every one of the 32 channels!

Thus I had the feeling the output of 2nd layer should be (1, 24, 24, 32*64)

How does it work here?

$\endgroup$

    2 Answers

    $\begingroup$

    I think I found the reason: the correct description of the 2nd layer is:

    • for the 2nd layer:

      • input size: (1, 26, 26, 32)
      • weights size: (3, 3, 32, 64)
      • output size: (1, 24, 24, 64)

    So for each one of the 64 features, the (26, 26, 32) input is convolved with a (3, 3, 32)-sized kernel, producing a (24, 24) output.

    Since this is for each one of the 64 features, the output will finally be (1, 24, 24, 64).
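    To make the mechanism concrete, here is an illustrative NumPy sketch (not Keras's actual implementation, which is vectorized) of how the 64 three-dimensional filters turn the (26, 26, 32) input into a (24, 24, 64) output:

```python
import numpy as np

# Toy data standing in for the 2nd layer's input and weights.
x = np.random.rand(26, 26, 32)    # output of the 1st layer, for one image
w = np.random.rand(3, 3, 32, 64)  # weights of the 2nd layer

out = np.zeros((24, 24, 64))
for f in range(64):               # one 2D feature map per filter
    for i in range(24):
        for j in range(24):
            # each output value sums over the whole 3x3x32 receptive field,
            # i.e. across all 32 input channels at once
            out[i, j, f] = np.sum(x[i:i+3, j:j+3, :] * w[:, :, :, f])

print(out.shape)  # (24, 24, 64)
```

    The sum over the channel axis is why the 32 channels collapse into a single 2D map per filter, instead of producing 32x64 maps.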


    Code to display the shape of the weights:

    for l in model.layers:
        if len(l.get_weights()) > 0:
            print(l.get_weights()[0].shape)  # ...[1].shape would be for the biases
    $\endgroup$
      $\begingroup$

      I modified your code a bit. The input has 3 channels. The filter size is 5x5. No bias for each filter.

      from keras.models import Sequential
      from keras.layers import Convolution2D

      model = Sequential([
          Convolution2D(32, 5, activation='relu', use_bias=False, input_shape=(28, 28, 3)),
          Convolution2D(64, 5, activation='relu', use_bias=False),
      ])
      model.summary()

      The output is:

      _________________________________________________________________
      Layer (type)                 Output Shape              Param #
      =================================================================
      conv2d_5 (Conv2D)            (None, 24, 24, 32)        2400
      _________________________________________________________________
      conv2d_6 (Conv2D)            (None, 20, 20, 64)        51200
      =================================================================
      Total params: 53,600
      Trainable params: 53,600
      Non-trainable params: 0

      Notice that the output of the first layer has only 32 channels (32 feature maps), not 32x3 channels. This is because each filter is not a 2D filter. It is 3D! For the first layer, each filter has 5x5x3 weights. Therefore the total number of weights for the first layer is 5x5x3x32 = 2400.

      For the second layer, each filter has 5x5x32 weights. Therefore the total number of weights for the second layer is 5x5x32x64 = 51200.
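      The parameter counts reported by `model.summary()` can be checked with plain arithmetic (assuming, as above, no biases):

```python
# Each filter spans all of its input channels: height x width x in_channels.
first_layer = 5 * 5 * 3 * 32    # 32 filters of shape (5, 5, 3)
second_layer = 5 * 5 * 32 * 64  # 64 filters of shape (5, 5, 32)
print(first_layer, second_layer)  # 2400 51200
```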

      $\endgroup$
