5
\$\begingroup\$

I've got a filter that I'd like to work out how to optimize. The most basic implementation, using loops is like the following:

import numpy as np nrow = 500 ncol = 500 refArray = np.random.randint( 0,100, (nrow,ncol) ) boundsArray = np.random.randint( 90,95, (2,nrow,ncol) ) window = 31 halfwindow = window/2 result = np.zeros_like(refArray) ## we'll ignore edges for the moment for i in range(halfwindow, nrow - halfwindow): for j in range(halfwindow, ncol - halfwindow): window = refArray[ (i-halfwindow):(i+1+halfwindow), (j-halfwindow):(j+1+halfwindow)] valid = (window >= boundsArray[0,i,j]) & (window <= boundsArray[1,i,j]) result[ (i-halfwindow):(i+1+halfwindow), (j-halfwindow):(j+1+halfwindow) ] \ += valid 

So, for each location (i,j) in an array, I find a window around this location, and identify all elements which meet a particular requirement, then move on. At each step I count how often element (i,j) has met the requirement.

I'm looking for suggestions on approaches to write this efficiently. I can't see how I can avoid the loops, so I'd consider options like Cython or f2py.

\$\endgroup\$
1
  • \$\begingroup\$Are boundsArray and refArray always random, or would these be specified in production code?\$\endgroup\$CommentedApr 9, 2015 at 12:50

1 Answer 1

1
\$\begingroup\$

Here's a partial solution:

as_strided = np.lib.stride_tricks.as_strided def fun2(refArray): y = len(range(-halfwindow,halfwindow+1)) II=slice(halfwindow, nrow - halfwindow) bA = boundsArray[:,II,II] x=len(range(halfwindow,nrow-halfwindow)) shape = (x,x) + (y,y) strides = refArray.strides strides = strides + strides print(shape, strides) A = as_strided(refArray, shape=shape,strides=strides) print(A.shape,(refArray[0,0],refArray[-1,-1]),(A[0,0,0,0],A[-1,-1,-1,-1])) valid = (A >= bA[0,:,:,None,None]) & \ (A <= bA[1,:,:,None,None]) valid = valid.sum((-2,-1)) # # valid pts per window print(valid.shape) return valid 

It uses np.lib.stride_tricks.as_strided to create overlapping windows. It's an idea that has been presented in several SO answers.

In timing it is 4-5x faster than your iteration.

A is a (470,470,31,31) view of refArray, i.e. 470x470 windows, each 31x31. And with the strides value they overlap.

But it is partial. valid.sum() gives the same values as your result.sum(). So it finds the same total number of 'valid' points. But valid (after summing on the 31x31 dimensions) is a (470,470) array, one value for each 'window', and that value is the number of valid points within the window. It is the equivalent of doing windows[i-halfwindow,j-halfwindow] = valid.sum() in your iteration.

I think your result is the number of windows in which each point is valid.

I haven't figured out how to map the (470,470,31,31)valid array back on to the (500,500) points. But I pretty sure there is such a mapping.


This is a temporary fix:

def fun2(refArray): y = len(range(-halfwindow,halfwindow+1)) II=slice(halfwindow, nrow - halfwindow) bA = boundsArray[:,II,II] x=len(range(halfwindow,nrow-halfwindow)) shape = (x,x) + (y,y) strides = refArray.strides strides = strides + strides A = as_strided(refArray, shape=shape,strides=strides) valid = (A >= bA[0,:,:,None,None]) & \ (A <= bA[1,:,:,None,None]) return valid def fun3(refArray,valid): result = np.zeros_like(refArray) n0,n1,n2,n3 = valid.shape for i in range(n0): for j in range(n1): result[ i:(i+n2),j:(j+n3) ] += valid[i,j] return result result = fun3(refArray, fun2(refArray)) 

fun3 has the same sort of iteration as your code, but there's still a 2x speedup.

While each window has the same number of points, the reverse is not true. Corner points occur in only one window each. Middle ones around 900 each. That may make a pure vectorized numpy solution impossible.

\$\endgroup\$

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.