Improve Performance of Comparing two Numpy Arrays

Question

I had a code challenge for a class I'm taking that involved building a NN algorithm. I got it to work, but I used really basic methods for solving it. There are two 1D NumPy arrays of equal length that hold the values 0-2. They represent two different sets of train and test data. The output is a confusion matrix that shows which received the right predictions and which received the wrong ones (doesn't matter which ;).

This code is correct. I just feel I took the lazy way out by working with lists and then turning those lists into an ndarray. I would love to see if people have some tips on utilizing NumPy for this. Anything clever?

    import numpy as np

    x = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0]
    y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

    testy = np.array(x)
    testy_fit = np.array(y)

    row_no = [0,0,0]
    row_dh = [0,0,0]
    row_sl = [0,0,0]

    # Code for the first row - NO
    for i in range(len(testy)):
        if testy.item(i) == 0 and testy_fit.item(i) == 0:
            row_no[0] += 1
        elif testy.item(i) == 0 and testy_fit.item(i) == 1:
            row_no[1] += 1
        elif testy.item(i) == 0 and testy_fit.item(i) == 2:
            row_no[2] += 1

    # Code for the second row - DH
    for i in range(len(testy)):
        if testy.item(i) == 1 and testy_fit.item(i) == 0:
            row_dh[0] += 1
        elif testy.item(i) == 1 and testy_fit.item(i) == 1:
            row_dh[1] += 1
        elif testy.item(i) == 1 and testy_fit.item(i) == 2:
            row_dh[2] += 1

    # Code for the third row - SL
    for i in range(len(testy)):
        if testy.item(i) == 2 and testy_fit.item(i) == 0:
            row_sl[0] += 1
        elif testy.item(i) == 2 and testy_fit.item(i) == 1:
            row_sl[1] += 1
        elif testy.item(i) == 2 and testy_fit.item(i) == 2:
            row_sl[2] += 1

    confusion = np.array([row_no,row_dh,row_sl])

    print(confusion)

The printed result is correct:

    [[16 10  0]
     [ 2 10  0]
     [ 2  0 22]]

Good thing this got an answer on SO before it was moved. Performance questions for numpy are routine on SO. – hpaulj, Commented May 6, 2019 at 0:15

Warren Weckesser · Accepted Answer · 2019-05-05 23:41:43Z

5

This can be implemented concisely by using numpy.add.at:

    In [2]: c = np.zeros((3, 3), dtype=int)

    In [3]: np.add.at(c, (x, y), 1)

    In [4]: c
    Out[4]:
    array([[16, 10,  0],
           [ 2, 10,  0],
           [ 2,  0, 22]])
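The reason for reaching for `np.add.at` rather than a plain `c[x, y] += 1` is that the fancy-indexed in-place add is buffered: when an index pair repeats, its cell is incremented only once. `np.add.at` performs the operation unbuffered, so every occurrence is accumulated. A minimal sketch with made-up toy labels (the names `actual`, `predicted`, `buffered` and `unbuffered` are illustrative, not from the question):

```python
import numpy as np

# Toy labels (hypothetical data, not the question's arrays).
actual = np.array([0, 0, 1, 2, 2, 2])
predicted = np.array([0, 0, 1, 2, 2, 0])

# Buffered fancy-index add: a repeated (row, col) pair is counted only once.
buffered = np.zeros((3, 3), dtype=int)
buffered[actual, predicted] += 1

# Unbuffered add: every occurrence of a (row, col) pair is accumulated.
unbuffered = np.zeros((3, 3), dtype=int)
np.add.at(unbuffered, (actual, predicted), 1)

print(buffered)    # cells hit at least once hold 1
print(unbuffered)  # cells hold the true pair counts
```

Only the unbuffered result sums to the number of samples, which is what a confusion matrix needs.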


Oh my! I thought there would be something better, but I didn't think 1 line of code! Wow. So glad I asked and thank you! – broepke, Commented May 6, 2019 at 2:04

Rule #1 of numpy is if you want to do something, check the docs first to check for a 1 line solution. – Oscar Smith, Commented May 6, 2019 at 5:39


Graipher · Accepted Answer · 2019-05-06 06:58:19Z

For now disregarding that there is a (way) better numpy solution to this, as explained in the answer by @WarrenWeckesser, here is a short code review of your actual code.

testy.item(i) is a very unusual way to say testy[i]. It is probably also slower as it involves an attribute lookup.

Don't repeat yourself. You test e.g. if testy.item(i) == 0 three times, each time with a different second condition. Just nest them in an if block:

    for i in range(len(testy)):
        if testy[i] == 0:
            if testy_fit[i] == 0:
                row_no[0] += 1
            elif testy_fit[i] == 1:
                row_no[1] += 1
            elif testy_fit[i] == 2:
                row_no[2] += 1

Loop like a native. Don't iterate over the indices of iterables, iterate over the iterable(s)! You can also use the fact that the value encodes the position you want to increment:
```
for test, fit in zip(testy, testy_fit):
    if test == 0 and fit in {0, 1, 2}:
        row_no[fit] += 1
```

You can even use the fact that the first value encodes the list you want to use and iterate only once. Or even better, make it a list of lists right away:

    n = 3
    confusion_matrix = [[0] * n for _ in range(n)]
    for test, fit in zip(testy, testy_fit):
        confusion_matrix[test][fit] += 1

    print(np.array(confusion_matrix))

Don't put everything into the global namespace, where it runs whenever you interact with the script at all. Put your code into functions, document them with a docstring, and execute them under an if __name__ == "__main__": guard, which lets you import from this script in another script without your code running:

    def confusion_matrix(x, y):
        """Return the confusion matrix for two vectors `x` and `y`.

        x and y must only have values from 0 to n and 0 to m, respectively.
        """
        n, m = np.max(x) + 1, np.max(y) + 1
        matrix = [[0] * m for _ in range(n)]
        for a, b in zip(x, y):
            matrix[a][b] += 1
        return matrix


    if __name__ == "__main__":
        x = ...
        y = ...
        print(np.array(confusion_matrix(x, y)))

Once you have come this far, you can just swap the implementation of this function to the faster numpy one without changing anything (except that it then directly returns a numpy.array instead of a list of lists).
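As a rough sketch of that swap, the function below keeps the same name and arguments but uses `np.add.at` from the other answer. It assumes `x` and `y` have equal length and contain only non-negative integer labels; the toy input in the usage lines is made up for illustration and is not the question's data.

```python
import numpy as np


def confusion_matrix(x, y):
    """Return the confusion matrix for two label vectors `x` and `y`.

    Assumes x and y have equal length and hold non-negative integers.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    # One row per distinct label in x, one column per distinct label in y.
    n, m = int(x.max()) + 1, int(y.max()) + 1
    matrix = np.zeros((n, m), dtype=int)
    # Unbuffered add, so repeated (x, y) pairs are all counted.
    np.add.at(matrix, (x, y), 1)
    return matrix


if __name__ == "__main__":
    # Toy labels (hypothetical data).
    print(confusion_matrix([0, 0, 1, 2, 2], [0, 1, 1, 2, 2]))
```

Callers that wrapped the old result in `np.array(...)` keep working unchanged, since wrapping an existing array just copies it.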
