1
$\begingroup$

When we use a regression algorithm on our dataset, it's because we assume that there is a relationship between our input data and some quantitative value. This is expressed as:

$y = f(x) + \varepsilon$, where $x$ is an input vector and $\varepsilon$ is the random error term.

Now, can this random error term have any probability distribution?

$\endgroup$

    2 Answers

    2
    $\begingroup$

    $\varepsilon$ is referred to as a noise term with zero mean. Its true distribution in the real world is unknown, but you can make assumptions about it.
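A short worked step may help explain why the zero-mean convention costs nothing in this additive model. It is a sketch, assuming the error has a finite constant mean $\mu$ that does not depend on $x$ (the symbols $\mu$, $\tilde{f}$ and $\tilde{\varepsilon}$ are introduced here for illustration only):

$$
y = f(x) + \varepsilon
  = \underbrace{\bigl(f(x) + \mu\bigr)}_{\tilde{f}(x)} + \underbrace{\bigl(\varepsilon - \mu\bigr)}_{\tilde{\varepsilon}},
\qquad \mathbb{E}[\tilde{\varepsilon}] = \mathbb{E}[\varepsilon] - \mu = 0 .
$$

Under that assumption, any constant offset in the error is absorbed into the function being learned, so only the shape of the distribution around zero remains to be modeled.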

    For example, Gaussian process regression assumes that it follows a Gaussian distribution, i.e. $\varepsilon \sim \mathcal{N}\left(0, \sigma^2\right)$.

    The variance $\sigma^2$ of this distribution can be seen as a hyperparameter that we obtain by maximizing the likelihood function or from prior information about the data set. You can find more information in this book: Rasmussen, C. E., and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning.
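To make the idea of estimating the noise variance by maximum likelihood concrete, here is a minimal sketch. It does not implement a Gaussian process: it swaps in a plain linear model as a simpler stand-in, for which the maximum-likelihood estimate of $\sigma^2$ under Gaussian noise is the mean squared residual. The data, the linear form of $f$, and all variable names are assumptions made for illustration.

```python
import random

rng = random.Random(0)

# Synthetic data: y = f(x) + eps, with a hypothetical linear f(x) = 2x + 1
# and Gaussian noise eps ~ N(0, true_sigma^2).
n = 5000
true_sigma = 0.5
xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, true_sigma) for x in xs]

# Ordinary least squares for slope and intercept. Under the Gaussian noise
# assumption this coincides with the maximum-likelihood fit of f.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Maximum-likelihood estimate of the noise variance: the mean squared residual
# (dividing by n, not by n - 2 as the unbiased estimator would).
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
sigma2_mle = sum(r ** 2 for r in residuals) / n

print("estimated sigma^2:", sigma2_mle, "true sigma^2:", true_sigma ** 2)
```

In a Gaussian process the same principle applies to the marginal likelihood, with $\sigma^2$ optimized jointly with the kernel hyperparameters; the book cited above covers that case.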

    In some cases, by looking at your data beforehand and estimating the possible sources of error, you can form an a priori expectation of the type of error distribution, and then check this assumption a posteriori (mainly using data-driven methods).
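One possible data-driven check of this kind is to look at the sample moments of the residuals. The sketch below is only an illustration: the helper name `moment_diagnostics` is hypothetical, and for brevity it is applied to simulated error samples rather than to residuals from a fitted model. It compares a Gaussian error with a zero-mean but skewed error (a shifted exponential), both chosen here as assumptions.

```python
import random


def moment_diagnostics(values):
    """Return (mean, skewness, excess kurtosis) of a sample.

    For a Gaussian sample, skewness and excess kurtosis are both close to 0.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, skewness, excess_kurtosis


rng = random.Random(0)
n = 20000

# Two hypothetical error samples, both with zero mean:
gaussian_errors = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Exponential(1) has mean 1, so subtracting 1 gives a zero-mean, skewed error.
skewed_errors = [rng.expovariate(1.0) - 1.0 for _ in range(n)]

gauss_mean, gauss_skew, gauss_kurt = moment_diagnostics(gaussian_errors)
skew_mean, skew_skew, skew_kurt = moment_diagnostics(skewed_errors)

print("gaussian:", gauss_mean, gauss_skew, gauss_kurt)
print("skewed:  ", skew_mean, skew_skew, skew_kurt)
```

Both samples have a mean near zero, but only the first has skewness and excess kurtosis near zero, which is the kind of evidence that would lead you to keep or drop the Gaussian assumption. In practice the same function would be applied to the residuals of the fitted model, possibly alongside a histogram or a formal normality test.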

    $\endgroup$
    2
    • $\begingroup$Why can we assume that the mean of the random error term is 0? To me, it seems really improbable that we can always assume this in real situations...$\endgroup$
      – Qwerto
      CommentedJun 24, 2018 at 9:32
    • $\begingroup$I think the main point of this assumption is that the error is random, which suggests that negative and positive errors are equally likely. Note that in a Gaussian process this error only affects the variance of the model's prediction, so the mean of the prediction remains the same despite the error.$\endgroup$
      – nsaura
      CommentedJun 25, 2018 at 9:47
    -1
    $\begingroup$

    No. If it is a random error, it has to follow a normal distribution with a mean equal to zero.

    This is because of the central limit theorem.

    If instead the error does not follow a normal distribution, then it is not random: it is a systematic error that you probably have to incorporate into the original function $f(x)$.

    $\endgroup$