$\begingroup$

So I recently started Andrew Ng's ML course, and this is the formula he gives for gradient descent on a linear regression model.

$$ \theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad \text{simultaneously update } \theta_j \text{ for all } j$$

As we can see, the formula asks us to sum over all the rows in the data.

However, the code below doesn't work if I apply np.sum():

def gradientDescent(X, y, theta, alpha, num_iters):
    # Initialize some useful values
    m = y.shape[0]  # number of training examples
    # make a copy of theta, to avoid changing the original array, since numpy arrays
    # are passed by reference to functions
    theta = theta.copy()
    J_history = []  # Use a python list to save cost in every iteration

    for i in range(num_iters):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - ((alpha / m) * np.sum(temp))
        # save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))

    return theta, J_history

On the other hand, if I get rid of np.sum(), the code works perfectly:

def gradientDescent(X, y, theta, alpha, num_iters):
    # Initialize some useful values
    m = y.shape[0]  # number of training examples
    # make a copy of theta, to avoid changing the original array, since numpy arrays
    # are passed by reference to functions
    theta = theta.copy()
    J_history = []  # Use a python list to save cost in every iteration

    for i in range(num_iters):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - ((alpha / m) * temp)
        # save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))

    return theta, J_history

Can someone please explain this?

$\endgroup$
  • $\begingroup$Could it be that the dot product is doing the relevant summing?$\endgroup$
    – Ben Reiniger
    Commented Sep 8, 2019 at 4:24
  • $\begingroup$I don't think so.$\endgroup$
    – tripma
    Commented Sep 8, 2019 at 7:29

2 Answers

$\begingroup$

Your goal is to compute the gradient for the whole theta vector of size $p$ (the number of parameters). Your temp is also a vector of size $p$; it contains the gradient of the cost function with respect to each of your theta values.

Therefore, you want to subtract the two vectors element-wise (scaled by the learning rate $\alpha$) to make an update, so there is no reason to sum the vector.
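
To see this concretely, here is a minimal sketch with made-up numbers (assuming NumPy is imported as np): each entry of np.dot(X.T, temp) already contains the sum over $i$ for one $\theta_j$, so wrapping it in np.sum collapses the $p$ separate gradient components into a single scalar.

import numpy as np

# Tiny made-up example: m = 3 training examples, p = 2 parameters.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
residuals = np.array([0.5, -1.0, 2.0])  # stands in for h_theta(x^(i)) - y^(i)

# Vectorized: one value per theta_j, each already summed over i.
vectorized = np.dot(X.T, residuals)     # shape (2,)

# The same sum written out explicitly, exactly as in the formula.
manual = np.array([sum(residuals[i] * X[i, j] for i in range(3)) for j in range(2)])

print(np.allclose(vectorized, manual))  # True
print(np.sum(vectorized))               # a single scalar -- the per-theta_j values are lost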

$\endgroup$
  • $\begingroup$I still don't understand why the formula then takes the sum over $i$. Even for the cost function, we have a sum component and we do take the sum: J = np.sum(np.square((np.dot(X, theta) - y))) / (2 * m).$\endgroup$
    – tripma
    Commented Sep 7, 2019 at 18:06
  • $\begingroup$@ManasTripathi if you refer to the $i$ in the formula, it’s just the sum over the training examples. The dot product already handles this (that’s why we say your code is “vectorized”); the shape check sketched after these comments spells out the difference.$\endgroup$
    – Elliot
    Commented Sep 9, 2019 at 8:03
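
Regarding the cost-function sum raised in the comments above, a minimal sketch with made-up data (assuming NumPy is imported as np) shows why np.sum is right for the cost but wrong for the gradient: the cost $J$ is a single number, so np.sum is what performs the sum over $i$, whereas the gradient has one entry per $\theta_j$ and np.dot(X.T, ...) has already done that sum.

import numpy as np

# Made-up data, just to compare the shapes of the cost and the gradient.
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.0, 3.0, 5.0])
theta = np.array([0.1, 0.2])
m = y.shape[0]

residuals = np.dot(X, theta) - y              # shape (m,)

# Cost: np.sum reduces the m squared residuals to one number.
J = np.sum(np.square(residuals)) / (2 * m)

# Gradient: np.dot(X.T, ...) already sums over i, so no extra np.sum is needed.
grad = np.dot(X.T, residuals) / m

print(np.shape(J))   # () -- a scalar
print(grad.shape)    # (2,) -- one value per parameter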
$\begingroup$

The commenters are correct: you are confusing vector and scalar operations.

The formula is a scalar one, and here is how you could implement it element by element:

for n in range(num_iters):
    theta_new = theta.copy()  # keep the old theta so all theta_j are updated simultaneously
    for j in range(len(theta)):
        sum_j = 0
        for i in range(len(X)):
            temp = np.dot(X[i], theta) - y[i]  # h_theta(x^(i)) - y^(i), using the full prediction
            temp = temp * X[i, j]
            sum_j += temp
        theta_new[j] = theta[j] - (alpha / m) * sum_j
    theta = theta_new
    J_history.append(computeCost(X, y, theta))

But you're trying to plug vectors into that scalar formula, and that's what causes the confusion.
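
For completeness, a quick check on made-up data (assuming NumPy is imported as np) that one step of this scalar loop matches one step of the vectorized update from the question:

import numpy as np

X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.0, 3.0, 5.0])
theta = np.array([0.1, 0.2])
alpha, m = 0.01, y.shape[0]

# Vectorized update (no np.sum): all theta_j in one shot.
theta_vec = theta - (alpha / m) * np.dot(X.T, np.dot(X, theta) - y)

# Scalar update: the same formula applied one theta_j at a time.
theta_scalar = theta.copy()
for j in range(len(theta)):
    sum_j = sum((np.dot(X[i], theta) - y[i]) * X[i, j] for i in range(m))
    theta_scalar[j] = theta[j] - (alpha / m) * sum_j

print(np.allclose(theta_vec, theta_scalar))  # True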

$\endgroup$
