Look back: I don't know of "look back" as a hyperparameter per se, but in an LSTM, when you try to predict the next step, you need to arrange your data by "looking back" a certain number of time steps to prepare the dataset for training. For example, suppose you want to estimate the next value of a series that is sampled at every time t. You need to rearrange your data into a shape like:

{t1, t2, t3} -> t4
{t2, t3, t4} -> t5
{t3, t4, t5} -> t6

The network will learn this mapping and will be able to predict tx based on the previous time steps.
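As a minimal sketch of that windowing in plain NumPy (the function name and look_back value are just placeholders for illustration):

```python
import numpy as np

def make_windows(series, look_back=3):
    """Re-arrange a 1-D series into (samples, look_back) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])   # e.g. {t1, t2, t3}
        y.append(series[i + look_back])     # e.g. t4
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)         # toy data: 0, 1, 2, ..., 9
X, y = make_windows(series, look_back=3)
print(X[0], "->", y[0])                     # [0. 1. 2.] -> 3.0
```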
Batch size (this is not specific to LSTMs) is roughly how many samples are processed per training step; the bigger the batch size, the faster the training, but the more memory it needs. On a GPU it is better to use bigger batch sizes, because copying values between host memory and the GPU is slow.
LSTM units refers to how many "smart" neurons (memory cells) the layer will have. This is highly dependent on your dataset; usually you choose it based on the dimensionality of your input vectors.
No. of epochs: how many times the algorithm passes over the training data to approximate the observations. Too many epochs will usually overfit your model, and too few will leave you with an underfitted one.
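To show where the last three settings actually appear, here is a rough sketch assuming Keras (the answer isn't tied to any library, and the unit count, batch size, and epoch values below are arbitrary placeholders, not recommendations):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

look_back = 3                                  # time steps per input window
X = np.random.rand(100, look_back, 1)          # toy data: (samples, time steps, features)
y = np.random.rand(100, 1)                     # next-step targets

model = Sequential([
    LSTM(32, input_shape=(look_back, 1)),      # 32 "units" -> size of the LSTM's hidden state
    Dense(1),                                  # one output: the predicted next value
])
model.compile(optimizer="adam", loss="mse")

# batch_size: samples per gradient step; epochs: full passes over the training set
model.fit(X, y, batch_size=16, epochs=20, verbose=0)
```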