Regression - random error term

Question

When we apply a regression algorithm to our dataset, it is because we assume that there is a relation between our input data and some quantitative value. This is expressed as:

$y = f(x) + \varepsilon$, where $x$ is an input vector and $\varepsilon$ is the random error term.

Now, can this random error term have any probability distribution?

nsaura · Accepted Answer · 2018-06-22 14:58:29Z

$\varepsilon$ is referred to as a noise term with zero mean. In the real world its distribution is not known in advance, but you can make assumptions about it.

For example, Gaussian process regression assumes that it follows a Gaussian distribution, i.e. $\varepsilon \sim \mathcal{N}\left(0, \sigma^2\right)$.

The variance $\sigma^2$ of this distribution can be seen as a hyperparameter that we obtain by maximizing the likelihood function, or from prior information about the data set. You can find more information in this book: Rasmussen, C. E., and C. K. I. Williams (2006), Gaussian Processes for Machine Learning, MIT Press.
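To make the "maximize the likelihood" step concrete, here is a minimal sketch. It is not a Gaussian process: it assumes a simple linear $f(x) = 2x + 1$ and a true $\sigma = 0.5$, both hypothetical values chosen only for illustration. Under Gaussian noise, the maximum-likelihood estimate of $\sigma^2$ is the mean of the squared residuals.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical ground truth, chosen only for illustration: f(x) = 2x + 1
true_sigma = 0.5
xs = [rng.uniform(0.0, 10.0) for _ in range(5000)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, true_sigma) for x in xs]

# Ordinary least-squares fit of a line (closed form for one input variable)
x_mean = statistics.fmean(xs)
y_mean = statistics.fmean(ys)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sxx = sum((x - x_mean) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Under Gaussian noise, the maximum-likelihood estimate of sigma^2
# is the mean of the squared residuals.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
sigma2_mle = statistics.fmean([r * r for r in residuals])

print(f"estimated sigma^2 = {sigma2_mle:.3f} (true value {true_sigma ** 2:.3f})")
```

With enough data the estimate should land close to the true $\sigma^2 = 0.25$. In a Gaussian process the same idea applies, except that the noise variance is fitted jointly with the kernel hyperparameters.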

In some cases, by looking at your data beforehand and estimating the possible sources of error, you can form an a priori expectation of the type of error distribution, and then try to assess this assumption a posteriori (mainly using data-driven methods).
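One possible data-driven check is to fit the model and look at the shape of the residuals. The sketch below is only an illustration under stated assumptions: the linear model, the sample size, and the two noise types are hypothetical choices, and the helper names `fit_line` and `residual_moments` are made up for this example. For Gaussian noise, the skewness and excess kurtosis of the residuals should both be close to zero.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single input variable."""
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def residual_moments(xs, ys):
    """Mean, skewness and excess kurtosis of the residuals of a line fit."""
    slope, intercept = fit_line(xs, ys)
    res = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    mean = statistics.fmean(res)
    std = statistics.pstdev(res)
    skew = statistics.fmean([((r - mean) / std) ** 3 for r in res])
    excess_kurt = statistics.fmean([((r - mean) / std) ** 4 for r in res]) - 3.0
    return mean, skew, excess_kurt


rng = random.Random(1)  # fixed seed so the run is reproducible
xs = [rng.uniform(0.0, 10.0) for _ in range(20000)]

# Case 1: Gaussian noise with zero mean.
ys_gauss = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
# Case 2: zero-mean but skewed noise (exponential shifted by its mean of 1).
ys_skewed = [2.0 * x + 1.0 + (rng.expovariate(1.0) - 1.0) for x in xs]

gauss_stats = residual_moments(xs, ys_gauss)
skewed_stats = residual_moments(xs, ys_skewed)
print("gaussian noise (mean, skew, excess kurtosis):", gauss_stats)
print("skewed noise   (mean, skew, excess kurtosis):", skewed_stats)
```

Note that the residual mean is zero by construction whenever the fit includes an intercept, so only the higher moments say anything about the shape of the noise here. A histogram or a Q-Q plot of the residuals serves the same purpose visually.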

Why can we assume that the mean of the random error term is 0? To me, it seems really improbable that we can always assume this in real situations... — Qwerto, Commented Jun 24, 2018 at 9:32
I think the main point of this assumption is that the error is randomly distributed, which means that negative and positive errors are equally probable. Note that in a Gaussian process this error only affects the variance of the model's prediction, so the mean of the prediction remains the same despite the error. — nsaura, Commented Jun 25, 2018 at 9:47

Vincenzo Lavorini · Accepted Answer · 2018-06-22 09:18:59Z

No, if it's a random error it has to follow a normal distribution with mean value equal to zero.

This is because of the central limit theorem.

If instead the error does not follow a normal distribution, then it is not random: it is a systematic effect that you probably have to account for in the original function $f(x)$.

FYI: there is a similar question here*, and an answer the same as this one was accepted. * datascience.stackexchange.com/questions/33599/noise-has-0-mean — Vincenzo Lavorini, Commented Jun 25, 2018 at 12:29
