
I'm running into a scaling issue in a machine learning project. I'm predicting a target variable from an input sequence (and doing this for many sequences). The challenge is that the target variable sometimes falls outside the range of the input sequence.

Say the input sequence consists of values ranging between -5 and 5. Most of the time, my target variable is in that range, but occasionally it can go up to 10 or down to -10. Having scaled my x values with a MinMax scaler fit only to them, they end up between 0 and 1. If I then use that same scaler to transform a target value of 10, I get a value above 1, which is beyond the tanh activation function's range.
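Here's a minimal sketch of what I mean, assuming scikit-learn's MinMaxScaler (the numbers are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Inputs range from -5 to 5; the scaler is fit only to them.
x = np.array([[-5.0], [-2.0], [0.0], [3.0], [5.0]])
scaler = MinMaxScaler()  # maps the fitted min/max onto [0, 1]
scaler.fit(x)

# A target of 10 lies outside the fitted range, so the transform
# lands above 1 -- beyond what the output activation can reach.
print(scaler.transform([[10.0]]))  # [[1.5]]
```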

So of course, when training the model, I get a very high error because it rarely wants to predict above 1 or below 0. I also think I need to scale the target variables, because for two otherwise similar sequences the target variables can differ, since each sequence isn't necessarily using the same product.

Any ideas on how to scale this, or a way to bypass scaling entirely?


1 Answer


    "Interpolation is easy. Extrapolation is hard." Extrapolating might be the right thing to do. But always be suspect of a model that is leading you to big unexplored regions of the state space. Absent a solid theory of the underlying generative process, perhaps a theory from physics, there's an excellent chance that extrapolated values will be wildly wrong.


We scale input values for the convenience of distance functions (norms).

For example, if you gather (x, y, z) spatial coordinates at Yosemite National Park in terms of (longitude, latitude, mm of mercury), then computing Euclidean distances in 3-space will not go well for you, as the coordinates are not drawn from comparable scales. Expressing them as displacements in meters from a convenient origin, like the base of Vernal Falls, would make more sense.

Similarly, if input var1 in your project ranges from .04 to .07, and var2 is observed to go from -5 to 5, that's going to pose problems for some of the inference methods you might choose. Scaling both to the unit interval puts them on the same footing.
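As a minimal sketch with scikit-learn's MinMaxScaler (the sample values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales.
X = np.array([
    [0.04, -5.0],
    [0.05,  0.0],
    [0.07,  5.0],
])

# Each column is rescaled to [0, 1] independently, so both features
# contribute comparably to a Euclidean distance.
X_scaled = MinMaxScaler().fit_transform(X)
print(X_scaled)
# [[0.         0.        ]
#  [0.33333333 0.5       ]
#  [1.         1.        ]]
```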

"higher than the tanh activation function's range"

This is a purely technical concern. You might want to choose a more robust activation function, or simply clip the extreme values. The downside of clipping is that a clipped value has zero gradient, so the optimizer has nothing to chase in the desired direction.
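A minimal sketch of the clipping option, using numpy (the values here are illustrative; the same idea applies with torch.clamp or tf.clip_by_value):

```python
import numpy as np

# Scaled targets; 1.5 came from a target outside the fitted input range.
y_scaled = np.array([-0.3, 0.2, 0.9, 1.5])

# Clip into the range the output activation can actually produce.
y_clipped = np.clip(y_scaled, 0.0, 1.0)
print(y_clipped)  # [0.  0.2 0.9 1. ]

# Caveat: wherever a value is clipped, the gradient through the clip
# is zero, so training cannot push a saturated prediction back toward
# its true out-of-range value -- the zero-gradient issue noted above.
```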

