I have the network architecture from the paper "Learning Fine-grained Image Similarity with Deep Ranking", and I am unable to figure out how the outputs from the three parallel networks are merged by the linear embedding layer. The only information the paper gives about this layer is:
> Finally, we normalize the embeddings from the three parts, and combine them with a linear embedding layer. The dimension of the embedding is 4096.
Can anyone help me figure out what exactly the authors mean by this layer?
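My current guess is that "linear embedding layer" means: L2-normalize each branch's output, concatenate the three vectors, and apply a learned fully connected (affine) layer that projects down to 4096 dimensions. Here is a minimal NumPy sketch of that interpretation; the per-branch dimensions and the random weight matrix are assumptions of mine, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical output sizes for the three parallel branches
# (the ConvNet stream and the two shallower subsampled streams);
# the real sizes depend on the architecture details.
branch_dims = [4096, 3072, 3072]
embeddings = [rng.standard_normal(d) for d in branch_dims]

# Step 1: L2-normalize each branch's embedding separately.
normalized = [e / np.linalg.norm(e) for e in embeddings]

# Step 2: concatenate the normalized vectors into one.
concat = np.concatenate(normalized)  # shape (10240,) under these dims

# Step 3: "linear embedding layer" = a learned linear projection
# down to 4096 dimensions (weights here are random placeholders;
# in training they would be learned with the rest of the network).
W = rng.standard_normal((4096, concat.shape[0])) * 0.01
final_embedding = W @ concat

print(final_embedding.shape)  # (4096,)
```

Is this reading correct, or does "combine with a linear embedding layer" mean something else (e.g. a weighted sum of the three embeddings)?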