Here's a partial solution:
as_strided = np.lib.stride_tricks.as_strided def fun2(refArray): y = len(range(-halfwindow,halfwindow+1)) II=slice(halfwindow, nrow - halfwindow) bA = boundsArray[:,II,II] x=len(range(halfwindow,nrow-halfwindow)) shape = (x,x) + (y,y) strides = refArray.strides strides = strides + strides print(shape, strides) A = as_strided(refArray, shape=shape,strides=strides) print(A.shape,(refArray[0,0],refArray[-1,-1]),(A[0,0,0,0],A[-1,-1,-1,-1])) valid = (A >= bA[0,:,:,None,None]) & \ (A <= bA[1,:,:,None,None]) valid = valid.sum((-2,-1)) # # valid pts per window print(valid.shape) return valid
It uses np.lib.stride_tricks.as_strided
to create overlapping windows. It's an idea that has been presented in several SO answers.
In timing it is 4-5x faster than your iteration.
A
is a (470,470,31,31)
view of refArray
, i.e. 470x470 windows, each 31x31. And with the strides
value they overlap.
But it is partial. valid.sum()
gives the same values as your result.sum()
. So it finds the same total number of 'valid' points. But valid
(after summing on the 31x31 dimensions) is a (470,470) array, one value for each 'window', and that value is the number of valid points within the window. It is the equivalent of doing windows[i-halfwindow,j-halfwindow] = valid.sum()
in your iteration.
I think your result
is the number of windows in which each point is valid.
I haven't figured out how to map the (470,470,31,31)
valid
array back on to the (500,500)
points. But I pretty sure there is such a mapping.
This is a temporary fix:
def fun2(refArray): y = len(range(-halfwindow,halfwindow+1)) II=slice(halfwindow, nrow - halfwindow) bA = boundsArray[:,II,II] x=len(range(halfwindow,nrow-halfwindow)) shape = (x,x) + (y,y) strides = refArray.strides strides = strides + strides A = as_strided(refArray, shape=shape,strides=strides) valid = (A >= bA[0,:,:,None,None]) & \ (A <= bA[1,:,:,None,None]) return valid def fun3(refArray,valid): result = np.zeros_like(refArray) n0,n1,n2,n3 = valid.shape for i in range(n0): for j in range(n1): result[ i:(i+n2),j:(j+n3) ] += valid[i,j] return result result = fun3(refArray, fun2(refArray))
fun3
has the same sort of iteration as your code, but there's still a 2x speedup.
While each window has the same number of points, the reverse is not true. Corner points occur in only one window each. Middle ones around 900 each. That may make a pure vectorized numpy solution impossible.
boundsArray
andrefArray
always random, or would these be specified in production code?\$\endgroup\$