I am trying to implement a simple multivariate linear regression model without using any built-in machine learning libraries. So far, my gradient descent implementation reaches a training root mean squared error (RMSE) of about $2.93$, while the normal (closed-form) equation on the same data gives a training RMSE of roughly $2.3$. I am looking for ways to improve my implementation of the gradient descent algorithm. Below is my implementation:
My gradient descent update is $\theta := \theta - \frac{\alpha}{N} X^{T}(X\theta - Y)$, where $\theta$ is the parameter vector, $N$ is the number of training examples, $X$ is the input matrix, $Y$ is the target vector, and $\alpha$ is the step size. (The factor of $2$ in the $\frac{1}{2N}$ cost cancels when differentiating, which is why the update divides by $N$ rather than $2N$.)
```python
def gradientDescent(self):
    errors = []
    for i in range(self.iters):
        # Update: T = T - (alpha/N) * X^T (XT - Y)
        self.theta = self.theta - (self.alpha / len(self.X)) * np.sum(
            self.X * (self.X @ self.theta.T - self.Y), axis=0)
        # Track the cost each iteration so `errors` is defined on return
        errors.append(self.error_function())
    return errors
```
I set $\alpha = 0.1$ and the number of iterations to 1000. Gradient descent converges at around 700-800 iterations (I checked).
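For reference, a convergence check of the kind described can be done by stopping when the change in cost between iterations falls below a tolerance. The sketch below is a standalone version under assumed shapes ($X$ is $(N, d)$, $Y$ is $(N, 1)$); the threshold `tol` and the explicit `X.T @ residual` form are my choices, not taken from the original code:

```python
import numpy as np

def gradient_descent(X, Y, alpha=0.1, iters=1000, tol=1e-9):
    """Vectorized gradient descent with an early-stopping convergence check.

    X: (N, d) design matrix, Y: (N, 1) targets.
    Stops when the cost change between iterations drops below `tol`
    (an assumed convergence criterion).
    """
    N = len(X)
    theta = np.zeros((X.shape[1], 1))
    prev_cost = np.inf
    for i in range(iters):
        residual = X @ theta - Y                      # (N, 1)
        theta -= (alpha / N) * (X.T @ residual)       # gradient step
        cost = np.sum(residual ** 2) / (2 * N)        # (1/2N) * SSE
        if abs(prev_cost - cost) < tol:               # converged
            return theta, i
        prev_cost = cost
    return theta, iters
```

The function also returns the iteration index at which it stopped, which makes it easy to verify claims like "converges at around 700-800 iterations".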
My error function is:
```python
def error_function(self):
    # Cost: (1/2N) * sum((XT - Y)^2), where T is theta
    error_values = np.power((self.X @ self.theta.T) - self.Y, 2)
    return np.sum(error_values) / (2 * len(self.X))
```
I was expecting the training errors from gradient descent and the normal equation to be similar, but the gap between them is fairly large. So I would like to know whether I am doing something wrong.
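For comparison, the closed-form baseline mentioned above can be computed in a few lines. This is a sketch of the standard normal-equation solution; the use of `np.linalg.pinv` for numerical stability and the `rmse` helper are my additions, not details from the original post:

```python
import numpy as np

def normal_equation(X, Y):
    # theta = (X^T X)^{-1} X^T Y, via the pseudo-inverse for stability
    return np.linalg.pinv(X.T @ X) @ X.T @ Y

def rmse(X, Y, theta):
    # Root mean squared error of the fit
    return np.sqrt(np.mean((X @ theta - Y) ** 2))
```

Since the normal equation minimizes the same cost exactly, any sizeable RMSE gap relative to it points at gradient descent not having truly converged (or a bug in the update).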
PS: I have not normalized the data yet. Normalizing leads to a much lower RMSE ($\approx 0.22$).
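The exact normalization scheme behind that $\approx 0.22$ RMSE isn't stated; one common choice is z-score standardization of each feature, sketched below. Computing the statistics from the training set only (and reusing them for any test set) is an assumed best practice, not something taken from the original post:

```python
import numpy as np

def standardize(X_train, X_test=None):
    """Z-score each feature using training-set statistics only.

    Returns standardized copies. A bias column of ones, if used,
    should be appended *after* scaling.
    """
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    X_train_s = (X_train - mu) / sigma
    if X_test is None:
        return X_train_s
    return X_train_s, (X_test - mu) / sigma
```

Standardizing puts all features on a comparable scale, which conditions the problem better and lets a single step size $\alpha$ work well for every coordinate.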
```python
self.theta = self.theta - (self.alpha / len(self.X)) * np.sum(
    self.X * (self.X @ self.theta.T - self.Y), axis=0)
```

There are quite a few problems in this line of code.
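One way to rewrite that update so the matrix algebra matches the formula directly, replacing the elementwise-multiply-and-sum trick with an explicit `X.T @ residual`, is sketched below. The column-vector layout for `theta` (shape `(d, 1)`) is an assumption on my part; the original code keeps `theta` as a row vector:

```python
import numpy as np

def gd_step(theta, X, Y, alpha):
    """One explicit gradient-descent step: theta - (alpha/N) * X^T (X theta - Y).

    Assumed shapes (not confirmed by the original code):
    X is (N, d), Y is (N, 1), theta is (d, 1).
    """
    residual = X @ theta - Y              # (N, 1)
    gradient = (X.T @ residual) / len(X)  # (d, 1)
    return theta - alpha * gradient
```

Writing the gradient as `X.T @ residual` makes the shapes self-checking: any mismatch between `theta`, `X`, and `Y` raises immediately instead of broadcasting silently.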