$\begingroup$

For my Bachelor's thesis, I am working on a project named "Neural Networks for Matrix Inversion", in which deep learning methods are used to compute the inverse of a matrix and are compared against traditional methods. My objective is to select a suitable group of neural network architectures, implement them for this specific problem, and analyze their performance. The research is set in the context of CFD applications, so my personal goal is to train the models on large, sparse, non-symmetric matrices. This has proven challenging to implement.

Before this project, I had no serious experience with deep learning or with the coding side of the task, but I am gradually becoming more fluent. So far, I have built a simple MLP as a baseline. However, I keep running into overfitting even for small matrices such as 3x3 and 5x5, which I assume is due to the high ratio of trainable parameters to data samples. I also wanted to test ResNet18, but the model becomes quite large (~11.1M params), which means longer training times. No matter what I do, convergence is unstable (train/test loss bouncing up and down) and the overfitting is severe.
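To make the parameter-to-sample ratio concrete, here is a minimal sketch that counts the trainable parameters of a fully connected network on flattened 5x5 matrices. The hidden width of 256 and the two hidden layers are hypothetical values chosen for illustration, not the exact network I trained.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input and output included.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


n = 5            # matrix size, so input and output both have n * n entries
hidden = 256     # hypothetical hidden width, for illustration only
layers = [n * n, hidden, hidden, n * n]

params = mlp_param_count(layers)
n_samples = 10_000   # lower end of the dataset sizes mentioned in this post

print(f"trainable parameters: {params}")
print(f"parameters per training sample: {params / n_samples:.2f}")
```

Even this small network has several parameters per training sample at the 10k end of my dataset range, which is the ratio I suspect is behind the overfitting.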

For dataset generation, I was using a script that generates 10k-100k samples, but with a minimum matrix size of 16x16. After countless plots indicating overfitting, I wanted to try smaller matrices first, but that means changing the matrix generation code. I have been using make_spd_matrix from sklearn.datasets, which means the models are being trained on small, dense SPD matrices instead of large, non-symmetric, ill-conditioned ones.
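For reference, the current data pipeline looks roughly like the sketch below. The function name `make_dataset`, the seed handling, and the use of `np.linalg.inv` for the targets are assumptions made for this illustration; only `make_spd_matrix` is what I actually use. Printing the condition numbers is a quick way to see how far these samples are from the ill-conditioned case.

```python
import numpy as np
from sklearn.datasets import make_spd_matrix


def make_dataset(n_samples, n_dim, seed=0):
    """Build (matrix, inverse) pairs from random dense SPD matrices."""
    rng = np.random.RandomState(seed)
    X = np.empty((n_samples, n_dim, n_dim))
    for i in range(n_samples):
        # Each call draws a new symmetric positive-definite matrix.
        X[i] = make_spd_matrix(n_dim, random_state=rng)
    # np.linalg.inv operates on the whole stack of matrices at once.
    Y = np.linalg.inv(X)
    return X, Y


X, Y = make_dataset(n_samples=100, n_dim=3)

# Condition number of every sample (2-norm), computed over the stack.
cond = np.linalg.cond(X)
print("shape:", X.shape)
print("condition number range:", cond.min(), cond.max())
```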

I have been really stuck on this: I keep trying to make progress but always get negative results. I do not want to overcomplicate things, so my questions are as follows:

  • Is ResNet the right model for this? If not, how would I go about evaluating that? For example, given the repeated overfitting, how can I analyze whether ResNet is especially prone to overfitting when computing the matrix inverse?
  • What other models might be suited to this task and worth researching? (I would like to look at auto-encoders if I can make progress.)
  • How can I generate large, sparse, non-symmetric matrices, ideally with each of these properties as a controllable parameter? Where should I start, and what should I research?
  • What could be the source of my overfitting issue? How can I identify the source of the problem?
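To make the third question concrete, this is roughly the kind of generator I have in mind. It is only a sketch under assumptions: the function name `random_sparse_nonsymmetric` and its parameters are hypothetical, and the diagonal shift is a choice I made here to guarantee invertibility through strict diagonal dominance. That choice also tends to keep the matrices well-conditioned, so it does not cover the ill-conditioned case.

```python
import numpy as np
import scipy.sparse as sp


def random_sparse_nonsymmetric(n, density=0.01, diag_shift=1.0, seed=0):
    """Random sparse, non-symmetric, invertible n x n matrix in CSR format."""
    rng = np.random.default_rng(seed)
    # Random sparsity pattern with normally distributed nonzero values.
    A = sp.random(n, n, density=density, format="csr",
                  random_state=rng, data_rvs=rng.standard_normal)
    # Add each row's absolute sum plus a positive shift to the diagonal.
    # The result is strictly diagonally dominant and therefore invertible.
    row_abs_sum = np.asarray(abs(A).sum(axis=1)).ravel()
    return (A + sp.diags(row_abs_sum + diag_shift)).tocsr()


n = 200
A = random_sparse_nonsymmetric(n, density=0.02)

print("fill ratio:", A.nnz / n**2)
print("asymmetry:", abs(A - A.T).sum())
print("condition number:", np.linalg.cond(A.toarray()))
```

Here `n`, `density`, and `diag_shift` are the controllable knobs; what I do not know is how to control conditioning in a principled way, or whether matrices like these are representative of CFD systems.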

I hope that some of you can help me and provide some direction. Thank you for your time.

Arda Bulbul is a new contributor to this site.
$\endgroup$
