I have the following NumPy array named 'data'. It consists of 15118 rows and 2 columns. The first column mostly increases in steps of 0.01, but sometimes there is an extra step in between (shown in red), which I would like to remove/filter out.

I achieved this with the following code:

# Create array [0, 0.01 .... 140], rounded 2 decimals to prevent floating point error
b = np.round(np.arange(0,140.01,0.01),2)

# New empty data array
new_data = np.empty(shape=[0, 2])

# Loop over values to remove/filter out data
for x in b:
    Index = np.where(x == data[:,0])[0][0]
    new_data = np.vstack([new_data,data[Index]])

I feel like this code is far from optimal and I was wondering if anyone knows a faster/better way of achieving this?

  • Does this answer your question? How to check if consecutive elements of array are evenly spaced?
    – albert
    Commented Apr 9, 2021 at 13:26
  • Is this question regarding the rounding error? I would probably do b = np.arange(0, 141, dtype=np.float32) / 100
    – Kevin
    Commented Apr 9, 2021 at 14:13
  • Using rounding to ensure equality is really unsafe unless both data[:,0] and b are rounded using an IEEE-754 compliant method. Using a strict floating-point equality is generally a bad idea. Use intervals or epsilon-based checking.
    Commented Apr 9, 2021 at 18:50

1 Answer

Here's a solution using pandas for resampling. You can probably achieve the same result in pure NumPy, but you are going to face a number of floating-point and rounding pitfalls, so it may be better to let a trusted library do the work for you.

Let's say arr is your data array, and assume its first column holds times in seconds. You can convert the array to a DataFrame with a timedelta index:

import pandas as pd

df = pd.DataFrame(arr[:,1], index=arr[:,0])
df.index = pd.to_timedelta(df.index, unit="s")

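Then resampling is pretty easy: 10ms is the frequency you want, and first() keeps the first record in each 10 ms bin. As long as a record exists at every 10 ms tick, that drops everything but the records at those ticks, which should give you the expected result. Feel free to experiment with other aggregation functions.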

df = df.resample("10ms").first() 
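To see what the resampling step does, here is a minimal sketch on a small made-up array. The values in arr are hypothetical and are not taken from the question's data; the rows at 0.013 and 0.027 stand in for the unwanted in-between steps.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: regular 0.01 s steps plus two off-grid rows (0.013 and 0.027).
arr = np.array([
    [0.000, 10.0],
    [0.010, 11.0],
    [0.013, 99.0],
    [0.020, 12.0],
    [0.027, 98.0],
    [0.030, 13.0],
])

# Build a DataFrame whose index is the first column interpreted as seconds.
df = pd.DataFrame(arr[:, 1], index=pd.to_timedelta(arr[:, 0], unit="s"))

# Group rows into 10 ms bins and keep the first row of each bin.
resampled = df.resample("10ms").first()
print(resampled)
```

Note that first() takes whatever row comes first in a bin, so if a tick is missing from the data, an off-grid row from that bin would be kept instead (or NaN if the bin is empty). It is worth checking the result for that case.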

Finally, you can get back to your array with something like:

np.vstack([pd.to_numeric(df.index, downcast="float").values / 1e9, df.values.squeeze()]).T 
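For comparison, a pure-NumPy version that follows the epsilon-based suggestion from the comments might look like the sketch below. The function name keep_on_grid and the parameters step and tol are my own assumptions, and the sample array is made up. Unlike the loop in the question, this keeps every row that lies on the grid (duplicates included) and does not fail when a tick is missing.

```python
import numpy as np

def keep_on_grid(data, step=0.01, tol=1e-6):
    """Keep rows whose first column is within `tol` of a multiple of `step`.

    `step` and `tol` are illustrative defaults, not values from the question.
    """
    t = data[:, 0]
    # Nearest multiple of `step` for every value in the first column.
    nearest = np.round(t / step) * step
    # Compare with a tolerance instead of strict floating-point equality.
    mask = np.abs(t - nearest) <= tol
    return data[mask]

# Hypothetical sample: regular 0.01 steps plus two off-grid rows.
data = np.array([
    [0.000, 10.0],
    [0.010, 11.0],
    [0.013, 99.0],
    [0.020, 12.0],
    [0.027, 98.0],
    [0.030, 13.0],
])

filtered = keep_on_grid(data)
print(filtered)
```

The boolean mask is computed in one vectorized pass, which avoids both the per-value np.where lookup and the repeated np.vstack from the original loop.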
