
I have a pandas DataFrame that I derive from a process like this:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1':['A','B','C','D','E'],'c2':[1,2,3,4,5]})
df2 = pd.DataFrame({'c1':['A','B','C'],'c2':[1,2,3],'c3': [np.array((1,2,3,4,5,6)),np.array((6,7,8,9,10,11)),np.full((6,),np.nan)]})
df3 = df1.merge(df2,how='left',on=['c1','c2'])

The result looks like this:

c1  c2  c3
A   1   [1,2,3,4,5,6]
B   2   [6,7,8,9,10,11]
C   3   [nan,nan,nan,nan,nan,nan]
D   4   NaN
E   5   NaN

To run the next step of my code, I need all of the arrays in c3 to have a consistent length. For the inputs that were present in the join (i.e. rows 1 through 3), this was already taken care of. However, for the rows that were missing from df2, where I now have only a single NaN value (rows 4 and 5), I need to replace those NaNs with an array of NaN values like the one in row 3. The problem is that I can't figure out how to do that.

I've tried a number of things, starting with the obvious:

df3.loc[pd.isnull(df3.c3),'c3'] = np.full((6,),np.nan) 

Which gave me a

ValueError: Must have equal len keys and value when setting with an iterable 

Fair enough; I understand this error and why Python is confused about what I'm trying to do. How about this?

for i in df3.index:
    df3.at[i,'c3'] = np.full((6,),np.nan) if all(pd.isnull(df3.c3)) else df3.c3

That code runs without error but then when I go to print out df3 (or use it) I get this error:

RecursionError: maximum recursion depth exceeded 

That one I don't understand, but moving on, what if I preassign a column full of my NaN arrays and then I can do some logic after the join:

for i in df1.index:
    df1.at[i,'c4'] = np.full((6,),np.nan)

This gives me the understandable error:

ValueError: setting an array element with a sequence 

How about another variation of the same idea:

df1['c4'] = np.full((6,),np.nan) 

This one gives a different, also understandable error:

ValueError: Length of values (6) does not match length of index (5) 

Hence, the question: How do I replace values in my DataFrame (in this case null values) with a NumPy array of NaN values of a given length?

For clarity, the desired final result is this:

c1  c2  c3
A   1   [1,2,3,4,5,6]
B   2   [6,7,8,9,10,11]
C   3   [nan,nan,nan,nan,nan,nan]
D   4   [nan,nan,nan,nan,nan,nan]
E   5   [nan,nan,nan,nan,nan,nan]
  • Maybe you should use df3.loc[i, 'c3'] or df3.at[i,'c3'] instead of df3.c3, because df3.c3 gives all the values in the column, but you need only the value from the current row.
    – furas
    Commented Apr 24 at 20:30
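
The RecursionError in the question is consistent with this comment: `all(pd.isnull(df3.c3))` tests the whole column rather than the current row, so the `else` branch runs and most likely stores the entire `df3.c3` Series inside each cell. Below is a minimal sketch of a per-row version. It assumes the target length is 6, as in the question, and it collects the cells in a list before assigning the column once; the names `n`, `cell` and `fixed` are illustrative only.

```python
import numpy as np
import pandas as pd

# Data from the question.
df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({
    'c1': ['A', 'B', 'C'],
    'c2': [1, 2, 3],
    'c3': [np.array((1, 2, 3, 4, 5, 6)),
           np.array((6, 7, 8, 9, 10, 11)),
           np.full((6,), np.nan)],
})
df3 = df1.merge(df2, how='left', on=['c1', 'c2'])

n = 6  # assumed array length, taken from the question
fixed = []
for i in df3.index:
    cell = df3.at[i, 'c3']  # value of the current row only
    if isinstance(cell, np.ndarray):
        fixed.append(cell)  # keep arrays that came from df2
    else:
        fixed.append(np.full((n,), np.nan))  # unmatched row: fresh NaN array

# Assign the whole column at once as an object Series.
df3['c3'] = pd.Series(fixed, index=df3.index, dtype=object)
```

Checking `isinstance(cell, np.ndarray)` instead of `pd.isnull(cell)` avoids getting an element-wise result back when the cell already holds an array.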

2 Answers


A possible solution:

# the array with the 6 nan values
arr_nan = np.full(
    df3['c3'].map(
        lambda x: np.size(x) if isinstance(x, np.ndarray) else 0).max(),
    np.nan)

df3.assign(c3 = df3['c3'].map(
    lambda y: arr_nan if not isinstance(y, np.ndarray) else y))

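This solution first determines the length of the arrays in c3 and then replaces every non-array entry in c3 with the array of 6 np.nan values. Note that assign returns a new DataFrame, so reassign the result (df3 = df3.assign(...)) to keep it.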

Output:

  c1  c2                              c3
0  A   1              [1, 2, 3, 4, 5, 6]
1  B   2            [6, 7, 8, 9, 10, 11]
2  C   3  [nan, nan, nan, nan, nan, nan]
3  D   4  [nan, nan, nan, nan, nan, nan]
4  E   5  [nan, nan, nan, nan, nan, nan]
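
One thing to be aware of with this approach: the lambda returns the same `arr_nan` object for every replaced row, so those cells share one array, and an in-place change to one of them would show up in the others. If independent arrays are needed, a possible variation is to return a copy per row. The sketch below rebuilds the `df3` from the question; the names `n` and `out` are illustrative.

```python
import numpy as np
import pandas as pd

# Data from the question.
df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({
    'c1': ['A', 'B', 'C'],
    'c2': [1, 2, 3],
    'c3': [np.array((1, 2, 3, 4, 5, 6)),
           np.array((6, 7, 8, 9, 10, 11)),
           np.full((6,), np.nan)],
})
df3 = df1.merge(df2, how='left', on=['c1', 'c2'])

# Length of the existing arrays in c3 (0 for non-array cells), as in the answer.
n = df3['c3'].map(lambda x: np.size(x) if isinstance(x, np.ndarray) else 0).max()
arr_nan = np.full(n, np.nan)

# Return a fresh copy for each non-array cell so rows do not share memory.
out = df3.assign(c3=df3['c3'].map(
    lambda y: y if isinstance(y, np.ndarray) else arr_nan.copy()))

# Modify row D in place; row E keeps its own, untouched NaN array.
out.at[3, 'c3'][0] = 0.0
```

Whether the extra copies matter depends on the next step: if the arrays are only read, sharing one NaN array is harmless.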

    Get the index of the rows where c3 is NaN, then create a Series with the same number of rows and the same index.

    idx = df3[df3['c3'].isna()].index
    df3.loc[idx, 'c3'] = pd.Series([np.full((6,), np.nan)] * len(idx), index=idx)

    End result:

    c1  c2  c3
    A   1   [1, 2, 3, 4, 5, 6]
    B   2   [6, 7, 8, 9, 10, 11]
    C   3   [nan, nan, nan, nan, nan, nan]
    D   4   [nan, nan, nan, nan, nan, nan]
    E   5   [nan, nan, nan, nan, nan, nan]
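
Since the stated goal is a consistent array length in c3, one quick way to confirm the result is to stack the column into a 2-D array, which only succeeds when every cell has the same shape. This is a sketch under the assumption that c3 has already been filled; the small `c3` Series here is a hypothetical stand-in for `df3['c3']`, and `stacked` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the filled df3['c3'] column.
c3 = pd.Series([
    np.array((1, 2, 3, 4, 5, 6)),
    np.array((6, 7, 8, 9, 10, 11)),
    np.full((6,), np.nan),
    np.full((6,), np.nan),
    np.full((6,), np.nan),
], dtype=object)

# np.stack raises ValueError if the arrays differ in shape,
# so a successful call confirms that every row has the same length.
stacked = np.stack(c3.tolist())
print(stacked.shape)  # (5, 6)
```

The stacked array is float-typed because the integer rows are combined with NaN rows.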
