I need a function that takes a numpy array and a row number as inputs and returns the array (or copy of the array) excluding the given row. I want to do this as efficiently as possible.
```python
import numpy as np

# Test array
x = np.repeat(range(10), 2).reshape([10, 2])
```
Indexing by slices is very fast in NumPy, but as far as I can tell it can only select a contiguous (or evenly strided) set of rows. For instance, I know I can use slicing to exclude the first row:
```python
def index_rows_by_exclusion_firstrow(arr):
    """
    Return slice of arr excluding first row using slice-based indexing
    """
    return arr[1:]

%timeit index_rows_by_exclusion_firstrow(x)
#The slowest run took 33.84 times longer than the fastest. This could mean that an intermediate result is being cached
#1000000 loops, best of 3: 204 ns per loop
```
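For context, a basic slice is fast because it returns a view: NumPy only builds a new array header (offset, shape, strides) over the same memory. The quick check below, which rebuilds the same test array, uses `np.shares_memory` to show that both a contiguous slice and an evenly strided slice share data with the original.

```python
import numpy as np

# Same 10x2 test array as above
x = np.repeat(range(10), 2).reshape([10, 2])

# Basic slices return views: no row data is copied
first_excluded = x[1:]   # rows 1..9, contiguous
every_other = x[::2]     # rows 0, 2, 4, 6, 8, constant stride

print(np.shares_memory(x, first_excluded))  # True
print(np.shares_memory(x, every_other))     # True
```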
There is a NumPy function, `numpy.delete`, that does the operation I'm looking for, but it always creates a new array, so it is much slower than slicing:
```python
def index_rows_by_exclusion_npdel(arr, i):
    """
    Return copy of arr excluding single row of position i using numpy delete function
    """
    return np.delete(arr, i, 0)

%timeit index_rows_by_exclusion_npdel(x, 1)
#The slowest run took 5.51 times longer than the fastest. This could mean that an intermediate result is being cached
#100000 loops, best of 3: 9.65 µs per loop
```
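To make the copy explicit: `np.delete` allocates a fresh array, so the result shares no memory with the input and writing to it leaves the original untouched. A small check on the same test array:

```python
import numpy as np

x = np.repeat(range(10), 2).reshape([10, 2])

# np.delete always allocates and returns a new array
deleted = np.delete(x, 1, axis=0)

print(deleted.shape)                 # (9, 2)
print(np.shares_memory(x, deleted))  # False

# Modifying the result does not affect the original array
deleted[0, 0] = -1
print(x[0, 0])                       # 0
```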
The best I have come up with is indexing by a list, which is roughly 1.7 times as fast as the `numpy.delete` version of the function (5.82 µs vs. 9.65 µs), but still ~30 times as slow as slicing:
```python
def index_rows_by_exclusion_list(arr, i):
    """
    Return copy of arr excluding single row of position i using list-based (fancy) indexing
    """
    return arr[[j for j in range(arr.shape[0]) if j != i]]

%timeit index_rows_by_exclusion_list(x, 1)
#The slowest run took 12.54 times longer than the fastest. This could mean that an intermediate result is being cached
#100000 loops, best of 3: 5.82 µs per loop
```
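A variant worth timing builds the kept row indices with `np.arange` instead of a Python list comprehension. Note that integer ("fancy") indexing always returns a copy, just like `np.delete`, so this only trims the Python-level overhead of building the index. `index_rows_by_exclusion_arange` is a hypothetical name, and I have not timed it here.

```python
import numpy as np

x = np.repeat(range(10), 2).reshape([10, 2])

def index_rows_by_exclusion_arange(arr, i):
    """Return a copy of arr without row i, using an integer index array.

    Hypothetical helper: the kept indices are 0..i-1 followed by i+1..n-1.
    """
    n = arr.shape[0]
    idx = np.concatenate((np.arange(i), np.arange(i + 1, n)))
    return arr[idx]  # integer indexing returns a copy, not a view

result = index_rows_by_exclusion_arange(x, 1)
print(np.array_equal(result, np.delete(x, 1, axis=0)))  # True
print(np.shares_memory(x, result))                       # False
```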
My question is: is there any faster way to index a numpy array like this? Is there any way to use slicing to index all but one row in an array?
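On the slicing question: a view is described by a single constant stride per axis, and skipping one row in the middle breaks that pattern, so in general (excluding the first or last row being the obvious exceptions) a single slice cannot express "all rows but i". The closest slice-based approach is to join two slices, each of which is a view, so the only copy is the one `np.concatenate` makes for the result. `index_rows_by_exclusion_concat` is a hypothetical name, and this is a sketch rather than a benchmarked answer.

```python
import numpy as np

x = np.repeat(range(10), 2).reshape([10, 2])

def index_rows_by_exclusion_concat(arr, i):
    """Return a copy of arr without row i by joining two basic slices.

    Hypothetical helper: arr[:i] and arr[i + 1:] are views; np.concatenate
    allocates the single output array.
    """
    return np.concatenate((arr[:i], arr[i + 1:]), axis=0)

# Produces the same rows as np.delete for every possible excluded row
for i in range(x.shape[0]):
    assert np.array_equal(index_rows_by_exclusion_concat(x, i),
                          np.delete(x, i, axis=0))
```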
One additional note: the code I'm writing has to be compatible with Cython, so I can't use boolean mask arrays, because NumPy in Cython does not support boolean arrays (that is what I was doing before I realized they don't work in Cython).
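Given the no-boolean-mask constraint, one pattern that uses only basic slicing is to copy the two halves into a preallocated output, optionally reusing the same buffer across calls to avoid allocating each time. This is plain Python/NumPy; I'm assuming, but have not verified here, that the same two slice assignments carry over to Cython typed memoryviews (an explicit row loop would be the fallback). `exclude_row_into` and its `out` parameter are hypothetical names.

```python
import numpy as np

x = np.repeat(range(10), 2).reshape([10, 2])

def exclude_row_into(arr, i, out=None):
    """Copy every row of arr except row i into out and return out.

    Hypothetical helper: only basic slicing and slice assignment are used
    (no boolean masks). Assumption: the same pattern ports to Cython memoryviews.
    """
    n = arr.shape[0]
    if out is None:
        out = np.empty((n - 1,) + arr.shape[1:], dtype=arr.dtype)
    out[:i] = arr[:i]       # rows before the excluded one
    out[i:] = arr[i + 1:]   # rows after it, shifted up by one
    return out

# Reuse one buffer across calls so no new array is allocated each time
buf = np.empty((x.shape[0] - 1, x.shape[1]), dtype=x.dtype)
for i in range(x.shape[0]):
    exclude_row_into(x, i, out=buf)
```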