I am trying to implement a simple multivariate linear regression model without using any built-in machine learning libraries. So far, my gradient descent implementation reaches a training root mean squared error (RMSE) of about $2.93$, while the normal (closed-form) equation on the same data gives a training RMSE of roughly $2.3$. I am looking for ways to improve my implementation of the gradient descent algorithm. Below is my implementation:
My gradient descent update is $\theta := \theta - \frac{\alpha}{N} X^{T}(X\theta - Y)$, where $\theta$ is the parameter vector, $N$ is the number of training examples, $X$ is the input matrix, $Y$ is the target vector, and $\alpha$ is the step size. (The factor of $2$ in the $\frac{1}{2N}$ cost cancels when differentiating, which is why the update divides by $N$ rather than $2N$.)
```python
def gradientDescent(self):
    errors = []
    for i in range(self.iters):
        # Update: T = T - (alpha/N) * X^T (XT - Y)
        self.theta = self.theta - (self.alpha / len(self.X)) * np.sum(
            self.X * (self.X @ self.theta.T - self.Y), axis=0)
        # Track the cost each iteration so `errors` is defined on return
        errors.append(self.error_function())
    return errors
```
I set $\alpha = 0.1$ and the number of iterations to 1000. Gradient descent converges at around 700-800 iterations (I checked).
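For reference, a convergence check of the kind described can be done by stopping when the change in cost between iterations falls below a tolerance. The sketch below is a standalone version under assumed shapes ($X$ is $(N, d)$, $Y$ is $(N, 1)$); the threshold `tol` and the explicit `X.T @ residual` form are my choices, not taken from the original code:

```python
import numpy as np

def gradient_descent(X, Y, alpha=0.1, iters=1000, tol=1e-9):
    """Vectorized gradient descent with an early-stopping convergence check.

    X: (N, d) design matrix, Y: (N, 1) targets.
    Stops when the cost change between iterations drops below `tol`
    (an assumed convergence criterion).
    """
    N = len(X)
    theta = np.zeros((X.shape[1], 1))
    prev_cost = np.inf
    for i in range(iters):
        residual = X @ theta - Y                      # (N, 1)
        theta -= (alpha / N) * (X.T @ residual)       # gradient step
        cost = np.sum(residual ** 2) / (2 * N)        # (1/2N) * SSE
        if abs(prev_cost - cost) < tol:               # converged
            return theta, i
        prev_cost = cost
    return theta, iters
```

The function also returns the iteration index at which it stopped, which makes it easy to verify claims like "converges at around 700-800 iterations".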
My error function is:
```python
def error_function(self):
    # Cost: (1/2N) * sum((XT - Y)^2), where T is theta
    error_values = np.power((self.X @ self.theta.T) - self.Y, 2)
    return np.sum(error_values) / (2 * len(self.X))
```
I was expecting the training errors from gradient descent and the normal equation to be similar, but the gap between them is fairly large. So I would like to know whether I am doing something wrong.
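For comparison, the closed-form baseline mentioned above can be computed in a few lines. This is a sketch of the standard normal-equation solution; the use of `np.linalg.pinv` for numerical stability and the `rmse` helper are my additions, not details from the original post:

```python
import numpy as np

def normal_equation(X, Y):
    # theta = (X^T X)^{-1} X^T Y, via the pseudo-inverse for stability
    return np.linalg.pinv(X.T @ X) @ X.T @ Y

def rmse(X, Y, theta):
    # Root mean squared error of the fit
    return np.sqrt(np.mean((X @ theta - Y) ** 2))
```

Since the normal equation minimizes the same cost exactly, any sizeable RMSE gap relative to it points at gradient descent not having truly converged (or a bug in the update).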
PS: I have not normalized the data yet. Normalizing leads to a much lower RMSE ($\approx 0.22$).
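The exact normalization scheme behind that $\approx 0.22$ RMSE isn't stated; one common choice is z-score standardization of each feature, sketched below. Computing the statistics from the training set only (and reusing them for any test set) is an assumed best practice, not something taken from the original post:

```python
import numpy as np

def standardize(X_train, X_test=None):
    """Z-score each feature using training-set statistics only.

    Returns standardized copies. A bias column of ones, if used,
    should be appended *after* scaling.
    """
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    X_train_s = (X_train - mu) / sigma
    if X_test is None:
        return X_train_s
    return X_train_s, (X_test - mu) / sigma
```

Standardizing puts all features on a comparable scale, which conditions the problem better and lets a single step size $\alpha$ work well for every coordinate.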
```python
self.theta = self.theta - (self.alpha / len(self.X)) * np.sum(
    self.X * (self.X @ self.theta.T - self.Y), axis=0)
```

There are quite a few problems in this line of code.
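One way to rewrite that update so the matrix algebra matches the formula directly, replacing the elementwise-multiply-and-sum trick with an explicit `X.T @ residual`, is sketched below. The column-vector layout for `theta` (shape `(d, 1)`) is an assumption on my part; the original code keeps `theta` as a row vector:

```python
import numpy as np

def gd_step(theta, X, Y, alpha):
    """One explicit gradient-descent step: theta - (alpha/N) * X^T (X theta - Y).

    Assumed shapes (not confirmed by the original code):
    X is (N, d), Y is (N, 1), theta is (d, 1).
    """
    residual = X @ theta - Y              # (N, 1)
    gradient = (X.T @ residual) / len(X)  # (d, 1)
    return theta - alpha * gradient
```

Writing the gradient as `X.T @ residual` makes the shapes self-checking: any mismatch between `theta`, `X`, and `Y` raises immediately instead of broadcasting silently.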