How to Create a 1D Embedding from Tensors of Varying Sizes?

Question

I am new to AI and am experimenting with some computer vision algorithms.

I have three tensors of different sizes: a noise augmentation levels tensor of size (N, C, H, W), a diffusion timestep tensor of size (N, H), and pooled pose embeddings of size (N, C, H, W). I need to sum these tensors so that the resulting 1D embedding can be fed to a FiLM layer.

How can I apply the summation without losing data?

Thank you!

Alexander Wan · Accepted Answer · 2024-01-19 01:04:48Z

If you're mapping from a higher dimension to a smaller dimension, you're almost always going to be losing data. The question is how to decide which data you want to keep.

All of this is highly domain-specific.

I would start with one modality: how should you compress that into a reasonably sized 1D vector? Maybe look for past work that deals with similar dimensions of data: e.g., for video classification you might also want to convert a series of images (n x c x w x h) to a 1D vector that gets fed into a linear classifier. This textbook chapter seems particularly useful. One architecture they propose is a 3D CNN.
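To make the "one modality first" step concrete, here is a minimal sketch of a much simpler alternative to a 3D CNN: global average pooling over the spatial axes followed by a linear projection. It is only an illustration, not a recommendation for your data. The function name pool_to_vector, the output size D, and the random (untrained) weights are all assumptions made for the example; in a real model the projection would be a learned layer.

```python
import numpy as np

def pool_to_vector(x, weight, bias):
    """Map a (N, C, H, W) tensor to one (N, D) vector per sample.

    Averages over the spatial axes (this discards spatial layout),
    then applies a linear projection from C to D features.
    """
    pooled = x.mean(axis=(2, 3))      # (N, C)
    return pooled @ weight + bias     # (N, D)

rng = np.random.default_rng(0)
N, C, H, W, D = 4, 8, 16, 16, 32      # hypothetical sizes

x = rng.standard_normal((N, C, H, W))
weight = rng.standard_normal((C, D)) * 0.1   # stand-in for learned weights
bias = np.zeros(D)

vec = pool_to_vector(x, weight, bias)
print(vec.shape)                      # (4, 32)
```

Averaging throws away where things are in the image, which is exactly the kind of "which data do you keep" decision mentioned above. If spatial layout matters for your task, a convolutional encoder would likely preserve more of it.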

To incorporate multiple modalities, you'll probably have to figure out when to combine your data. E.g., you could encode each modality separately and concatenate at the very end (late fusion), or concatenate the inputs at the very beginning (early fusion). You should also look at past work for this (e.g., maybe video + audio classification?).
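As a rough sketch of the "encode separately, then combine" option applied to the three tensors in the question: each input is flattened per sample and projected to a shared size D, after which the embeddings can be summed (as the question asks) or concatenated. The last lines show the usual FiLM form, a per-channel scale and shift predicted from the conditioning vector. Everything named here (linear, D, Cf, the random untrained projections, the tensor sizes) is a placeholder chosen for illustration, not part of any specific library or of the asker's model.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, H, W = 2, 3, 8, 8    # hypothetical input sizes
D = 16                     # shared embedding size (assumption)
Cf = 5                     # channels of the feature map FiLM modulates (assumption)

# Stand-ins for the three inputs described in the question.
noise_aug = rng.standard_normal((N, C, H, W))
timestep = rng.standard_normal((N, H))
pose = rng.standard_normal((N, C, H, W))

def linear(in_dim, out_dim):
    """Random, untrained projection standing in for a learned linear layer."""
    w = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
    return lambda v: v @ w

# One encoder per modality: flatten each sample, then project to D.
enc_noise = linear(C * H * W, D)
enc_time = linear(H, D)
enc_pose = linear(C * H * W, D)

e_noise = enc_noise(noise_aug.reshape(N, -1))   # (N, D)
e_time = enc_time(timestep)                     # (N, D)
e_pose = enc_pose(pose.reshape(N, -1))          # (N, D)

# Late fusion, two variants.
summed = e_noise + e_time + e_pose                           # (N, D)
concat = np.concatenate([e_noise, e_time, e_pose], axis=1)   # (N, 3 * D)

# FiLM: predict a scale (gamma) and shift (beta) per channel from the
# conditioning vector and apply them to a feature map.
to_gamma = linear(D, Cf)
to_beta = linear(D, Cf)
features = rng.standard_normal((N, Cf, H, W))
gamma = to_gamma(summed)[:, :, None, None]      # (N, Cf, 1, 1)
beta = to_beta(summed)[:, :, None, None]        # (N, Cf, 1, 1)
modulated = gamma * features + beta             # (N, Cf, H, W)

print(summed.shape, concat.shape, modulated.shape)
```

Summing requires every embedding to have the same size and mixes the modalities irreversibly, while concatenation keeps them separate at the cost of a larger vector. Combining at the very beginning would instead need the raw tensors brought to compatible shapes first, which is awkward here because the timestep tensor is (N, H) rather than (N, C, H, W).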
