
I am wondering whether it is possible to write an asynchronous forward function in a subclass of nn.Module. When I came across the architecture in the attached image, I felt that training would be faster if the forward pass could run through each branch in parallel.

[Image: network architecture with multiple parallel branches]
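For illustration, here is a minimal sketch of one way to express branch-level parallelism explicitly with torch.jit.fork / torch.jit.wait (the module and layer sizes are made up; note that the branches only actually run concurrently when the module is compiled with torch.jit.script):

```python
import torch
import torch.nn as nn

class TwoBranch(nn.Module):
    """Hypothetical two-branch block, just to illustrate the pattern."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.branch_a = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.branch_b = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x):
        # fork returns a Future immediately; wait blocks until its result is ready.
        # True inter-op parallelism only happens when the module is scripted.
        fut_a = torch.jit.fork(self.branch_a, x)
        fut_b = torch.jit.fork(self.branch_b, x)
        return torch.jit.wait(fut_a) + torch.jit.wait(fut_b)

model = torch.jit.script(TwoBranch())
out = model(torch.randn(8, 64))
```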

Another scenario I have come across is when the input has shape (B, C, V, T), where B is the batch size, C the number of node features, V the number of nodes, and T the number of time stamps. So there is a graph for each time stamp of each instance in the batch. Now there is a module which computes the adjacency matrix of a graph from its node features. It becomes compute-intensive if we reshape (B, C, V, T) -> (B*T, C, V) and then treat B*T as the new batch dimension, where T ≈ 250 and B = 32.
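For context, this is the reshape I mean, as a minimal sketch; the adjacency module (here a placeholder called compute_adj) is assumed to take a (batch, C, V) tensor, and the concrete feature/node counts are made up:

```python
import torch

def fold_time_into_batch(x: torch.Tensor) -> torch.Tensor:
    """(B, C, V, T) -> (B*T, C, V): one graph per (instance, time stamp)."""
    B, C, V, T = x.shape
    # Move T next to B before flattening so each graph slice stays contiguous.
    x = x.permute(0, 3, 1, 2).contiguous()   # (B, T, C, V)
    return x.view(B * T, C, V)

x = torch.randn(32, 16, 25, 250)   # B=32, C=16 features, V=25 nodes, T=250 (C and V made up)
graphs = fold_time_into_batch(x)   # (8000, 16, 25)
# adj = compute_adj(graphs)        # hypothetical module producing (8000, V, V)
```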

Is there some workaround so that we can evaluate the adjacency matrix efficiently? This problem is not limited to computing the adjacency matrix with my custom module: to use any graph layer from PyG, we also need to merge B and T.

I'm not talking about training on multiple GPUs, but about making these calls asynchronously rather than sequentially.

Please answer both questions. Thanks in advance.


1 Answer


By default, CUDA computations (i.e. kernels) are asynchronous (see the CUDA docs and the PyTorch docs). Therefore, what you suggested is already happening under the hood.
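A quick way to see this is the sketch below (assuming a CUDA-capable GPU is available): kernel launches return control to Python almost immediately, and the real compute time only becomes visible once you synchronize.

```python
import time
import torch

assert torch.cuda.is_available()
x = torch.randn(4096, 4096, device="cuda")

torch.cuda.synchronize()
t0 = time.time()
for _ in range(100):
    y = x @ x                      # kernels are queued, not awaited
t_launch = time.time() - t0        # small: Python moved on before the GPU finished

torch.cuda.synchronize()           # block until all queued kernels complete
t_total = time.time() - t0
print(f"launch time: {t_launch:.4f}s, total time: {t_total:.4f}s")
```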

• Thanks, I didn't know about that. The way I write the forward function made me feel that things happen synchronously, since I never write async anywhere. I will go through the resources you provided. – Commented Mar 27, 2024 at 12:48
