
I am working on a relatively simple binning program, where I take a 5D array and bin it based on two 3D arrays to create a contour plot. See the sample code below. In actuality, my arrays are of size [27, 150, 20, 144, 288], so running a 4-nested for loop as shown below takes a LONG time.

Is there any way to speed up this loop and ideally avoid the need for all these loops? Apologies in advance if this isn't clear - I am new to all this!

```python
import numpy as np
import matplotlib.pyplot as plt

S_mean = np.random.rand(5, 10, 10, 10)
T_mean = np.random.rand(5, 10, 10, 10)
Volume_mean = np.random.rand(2, 5, 10, 10, 10)
T_bins = np.linspace(0, 1, 36)
S_bins = np.linspace(0, 1, 100)

int_temp = []
int_sal = []
for i in range(5):
    int_temp.append(np.digitize(np.ndarray.flatten(T_mean[i,:,:,:]), T_bins))
    int_sal.append(np.digitize(np.ndarray.flatten(S_mean[i,:,:,:]), S_bins))

volume_sum = np.zeros((2, 5, S_bins.size, T_bins.size))

# This is the problem loop
for k in range(2):
    for l in range(5):
        for i in range(T_bins.size):
            for j in range(S_bins.size):
                volume_sum[k,l,j,i] = (np.nansum(
                    np.ndarray.flatten(Volume_mean[k,l,:,:,:])
                    [np.argwhere(np.logical_and(int_temp[l] == i, int_sal[l] == j))]))

# The output I am trying to get
plt.pcolormesh(T_bins, S_bins, volume_sum[0,0,:,:])
plt.show()
```

1 Answer


    "Relatively simple" is in the eye of the beholder. Some ideas though:

    This code

```python
int_temp = []
int_sal = []
for i in range(5):
    int_temp.append(np.digitize(np.ndarray.flatten(T_mean[i,:,:,:]), T_bins))
    int_sal.append(np.digitize(np.ndarray.flatten(S_mean[i,:,:,:]), S_bins))
```

can be transformed into two list comprehensions:

```python
int_temp = np.array([np.digitize(T_mean[i, :, :, :].flatten(), T_bins)
                     for i in range(5)])
int_sal = np.array([np.digitize(S_mean[i, :, :, :].flatten(), S_bins)
                    for i in range(5)])
```

I also converted the results into numpy arrays, for reasons I'll come back to later. Also take note that np.ndarray.flatten(T_mean[i,:,:,:]) was rewritten as T_mean[i, :, :, :].flatten(), which is the usual way to use np.ndarray.&lt;fun&gt;: call whatever.&lt;fun&gt;() (&lt;fun&gt; is meant to be a placeholder here, not actual code).
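A quick way to convince yourself that the two spellings are equivalent, using a small made-up array:

```python
import numpy as np

# np.ndarray.flatten is the unbound method; calling it with the array as
# argument is the same as calling .flatten() on the array directly.
a = np.arange(8).reshape(2, 4)

flat_unbound = np.ndarray.flatten(a)  # unbound method, array passed explicitly
flat_method = a.flatten()             # the usual spelling

print(np.array_equal(flat_unbound, flat_method))  # True
```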

    The next part of the code that caught my eye was

```python
np.ndarray.flatten(volume_mean[k, l, :, :, :])[np.argwhere(
    np.logical_and(int_temp[l] == i, int_sal[l] == j))]
```

    This line is very hard to read, but can easily be rewritten as:

```python
indices = np.argwhere(np.logical_and(int_temp[l] == i, int_sal[l] == j))
volume_sum[k, l, j, i] = np.nansum(volume_mean[k, l, :, :, :].flatten()[indices])
```

which looks much less frightening. You can even get rid of np.argwhere and use np.logical_and(int_temp[l] == i, int_sal[l] == j) directly as a boolean mask:

```python
mask = np.logical_and(int_temp[l] == i, int_sal[l] == j)
volume_sum[k, l, j, i] = (
    np.nansum(volume_mean[k, l, :, :, :].flatten()[mask])
)
```
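If you want to convince yourself that indexing with np.argwhere and indexing with the boolean mask select the same elements, here is a small sketch with made-up data:

```python
import numpy as np

# Made-up flat values and digitized index arrays, standing in for the real ones.
rng = np.random.default_rng(0)
values = rng.random(20)
int_temp = rng.integers(0, 3, 20)
int_sal = rng.integers(0, 3, 20)

mask = np.logical_and(int_temp == 1, int_sal == 2)
via_mask = np.nansum(values[mask])                # boolean-mask indexing
via_argwhere = np.nansum(values[np.argwhere(mask)])  # index-array indexing

print(np.isclose(via_mask, via_argwhere))  # True
```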

    To see if there is a difference in performance, I put the original code and the refactored version into functions:

```python
def compute_volume_sum(volume_mean, s_mean, t_mean, s_bins, t_bins):
    """Original implementation"""
    int_temp = []
    int_sal = []
    for i in range(5):
        int_temp.append(np.digitize(np.ndarray.flatten(t_mean[i,:,:,:]), t_bins))
        int_sal.append(np.digitize(np.ndarray.flatten(s_mean[i,:,:,:]), s_bins))
    volume_sum = np.zeros((2, 5, s_bins.size, t_bins.size))
    for k in range(2):
        for l in range(5):
            for i in range(t_bins.size):
                for j in range(s_bins.size):
                    volume_sum[k, l, j, i] = (np.nansum(
                        np.ndarray.flatten(volume_mean[k, l, :, :, :])[np.argwhere(
                            np.logical_and(int_temp[l] == i, int_sal[l] == j))]))
    return volume_sum


def compute_volume_sum_ref(volume_mean, s_mean, t_mean, s_bins, t_bins):
    """Refactored implementation"""
    int_temp = np.array([np.digitize(t_mean[i, :, :, :].flatten(), t_bins)
                         for i in range(5)])
    int_sal = np.array([np.digitize(s_mean[i, :, :, :].flatten(), s_bins)
                        for i in range(5)])
    volume_sum = np.zeros((2, 5, s_bins.size, t_bins.size))
    for k in range(2):
        for l in range(5):
            for i in range(t_bins.size):
                for j in range(s_bins.size):
                    mask = np.logical_and(int_temp[l] == i, int_sal[l] == j)
                    volume_sum[k, l, j, i] = (
                        np.nansum(volume_mean[k, l, :, :, :].flatten()[mask])
                    )
    return volume_sum
```

BTW: You can confirm that both functions return the same values using np.allclose.
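np.allclose compares two arrays elementwise within a small tolerance, which is the right tool here because floating-point summation order can differ slightly between implementations. A minimal illustration:

```python
import numpy as np

# Two arrays that differ only by a tiny floating-point perturbation.
a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9

print(np.allclose(a, b))  # True: differences are within the default tolerance
```

For the real check, you would call np.allclose on the results of both functions evaluated on the same inputs.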

    I then measured how long it takes to run each function ten times. The results were as follows:

```
original:   16.860085s
refactored: 11.849922s
```

    Not bad, but those loops still hurt.

Enter numba, a just-in-time Python compiler. It works nicely with numpy and can make loops quite a bit faster since they are compiled to native code. I recommend using a scientific Python distribution such as Anaconda if you want to try it.

    Unfortunately, numba does not support the full feature set of Python (e.g. list comprehensions are not supported), so I had to resort to a little bit of trickery to get it working:

```python
import numba


def compute_volume_sum_nb(volume_mean, s_mean, t_mean, s_bins, t_bins):
    """numba version"""
    int_temp = np.array([np.digitize(t_mean[i, :, :, :].flatten(), t_bins)
                         for i in range(5)])
    int_sal = np.array([np.digitize(s_mean[i, :, :, :].flatten(), s_bins)
                        for i in range(5)])
    return _numba_inner(volume_mean, s_bins, t_bins, int_temp, int_sal)


@numba.njit()
def _numba_inner(volume_mean, s_bins, t_bins, int_temp, int_sal):
    volume_sum = np.zeros((2, 5, s_bins.size, t_bins.size))
    for k in range(2):
        for l in range(5):
            for i in range(t_bins.size):
                for j in range(s_bins.size):
                    mask = np.logical_and(int_temp[l] == i, int_sal[l] == j)
                    volume_sum[k, l, j, i] = (
                        np.nansum(volume_mean[k, l, :, :, :].flatten()[mask])
                    )
    return volume_sum
```

numba.njit() is a decorator that marks a function body for JIT compilation. The first time compute_volume_sum_nb is called, numba triggers the compilation process, so be sure to exclude the first call from any timings you do.
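One way to set up such a measurement is with a small warm-up-aware helper (a sketch; `timed` is a name I made up, and the real benchmark would pass compute_volume_sum_nb and its arguments instead of the cheap stand-in used here):

```python
import time

def timed(fn, *args, repeats=10):
    """Call fn once to warm up (triggering any JIT compilation),
    then time `repeats` further calls."""
    fn(*args)  # warm-up call: excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return time.perf_counter() - start

# Example with a cheap stand-in function.
elapsed = timed(sum, range(1000))
print(elapsed >= 0.0)  # True
```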

Let's see what the timings look like now:

```
original:   16.860085s
refactored: 11.849922s
numba:       0.833529s
```

    That's quite a bit faster, isn't it?

    I'm almost sure that some of the loops could be replaced by clever indexing and use of vectorized functions in numpy, but I don't have the time to dive deeper into the problem.
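For completeness, here is one possible direction (a sketch under the assumption that NaNs should simply be ignored, as np.nansum does; `compute_volume_sum_vec` is my own name): combining the two digitized index arrays into a single flat bin index lets np.bincount, with the volumes as weights, replace the two innermost loops entirely.

```python
import numpy as np

def compute_volume_sum_vec(volume_mean, s_mean, t_mean, s_bins, t_bins):
    """Replace the two innermost loops with one weighted bincount per (k, l)."""
    n_s, n_t = s_bins.size, t_bins.size
    volume_sum = np.zeros((2, 5, n_s, n_t))
    for k in range(2):
        for l in range(5):
            int_temp = np.digitize(t_mean[l].ravel(), t_bins)
            int_sal = np.digitize(s_mean[l].ravel(), s_bins)
            # Combine the two bin indices into one flat index; digitize can
            # return indices up to bins.size, hence the (n_t + 1) stride.
            flat = int_sal * (n_t + 1) + int_temp
            # Zero out NaNs so the weighted sum matches np.nansum behaviour.
            weights = np.nan_to_num(volume_mean[k, l].ravel())
            counts = np.bincount(flat, weights=weights,
                                 minlength=(n_s + 1) * (n_t + 1))
            # Drop the out-of-range bins, as the original loop bounds do.
            volume_sum[k, l] = counts.reshape(n_s + 1, n_t + 1)[:n_s, :n_t]
    return volume_sum
```

The remaining k/l loops run only ten times in total, so the bincount calls dominate; this avoids building a full boolean mask for every (i, j) pair.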


A few non-performance related notes: be sure to have a look at the Style Guide for Python Code (aka PEP 8). The code sometimes already follows those guidelines, but sometimes doesn't, e.g. the uppercase letters in variable names.

Maybe also think about how you can modularize your code, e.g. by using functions. Those separated parts are also easier to document. You want to write """documentation""" to save future you some headache, don't you?

• "This is great. Just wanted to point out that numpy.searchsorted is faster than numpy.digitize for large datasets. See here for more information." – user182606, Mar 13, 2020 at 5:31
• "@allthemikeysaretaken: That was fixed in 2014, which is also mentioned at the end of the issue you linked." – Graipher, Mar 14, 2020 at 16:12
