So I recently started Andrew Ng's ML course, and this is the formula Andrew lays out for the gradient descent update on a linear model:
$$ \theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad \text{simultaneously update } \theta_j \text{ for all } j$$
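For reference, here is a minimal sketch of what that update looks like in vectorized NumPy form (which is what my code below is attempting). The shapes, `X` being `(m, n)`, `y` being `(m,)`, and `theta` being `(n,)`, are my assumptions, not something stated in the formula itself:

```python
import numpy as np

# Minimal sketch of one vectorized gradient descent step.
# Assumed shapes (not stated in the formula): X is (m, n), y is (m,), theta is (n,).
def gradient_step(X, y, theta, alpha):
    m = y.shape[0]
    residual = X @ theta - y   # h_theta(x^(i)) - y^(i) for every row i, shape (m,)
    grad = X.T @ residual      # sums residual * x_j^(i) over i, one entry per theta_j, shape (n,)
    return theta - (alpha / m) * grad
```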
As we can see, the formula asks us to sum over all the rows in the data.
However, the code below doesn't work when I apply np.sum():
```python
def gradientDescent(X, y, theta, alpha, num_iters):
    # Initialize some useful values
    m = y.shape[0]  # number of training examples

    # make a copy of theta, to avoid changing the original array, since numpy arrays
    # are passed by reference to functions
    theta = theta.copy()

    J_history = []  # Use a python list to save cost in every iteration

    for i in range(num_iters):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - ((alpha / m) * np.sum(temp))

        # save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))

    return theta, J_history
```
On the other hand, if I get rid of the np.sum(), the code works perfectly:
```python
def gradientDescent(X, y, theta, alpha, num_iters):
    # Initialize some useful values
    m = y.shape[0]  # number of training examples

    # make a copy of theta, to avoid changing the original array, since numpy arrays
    # are passed by reference to functions
    theta = theta.copy()

    J_history = []  # Use a python list to save cost in every iteration

    for i in range(num_iters):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - ((alpha / m) * temp)

        # save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))

    return theta, J_history
```
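To show concretely what the two versions compute, here is a tiny shape check with made-up numbers (m = 3 examples, n = 2 parameters; the values are purely illustrative):

```python
import numpy as np

# Tiny illustrative example: 3 training examples, 2 parameters.
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
theta = np.zeros(2)
y = np.array([1.0, 2.0, 3.0])

temp = np.dot(X, theta) - y  # shape (3,): one residual per training example
temp = np.dot(X.T, temp)     # shape (2,): one entry per theta_j

print(temp.shape)    # (2,) -- a vector with one value per parameter
print(np.sum(temp))  # a single scalar: collapses across the parameters too
```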
Can someone please explain this?