Pandas: Fill in missing values with an empty numpy array

Question

I have a pandas DataFrame that I derive from a process like this:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1': ['A','B','C','D','E'], 'c2': [1,2,3,4,5]})
df2 = pd.DataFrame({'c1': ['A','B','C'],
                    'c2': [1,2,3],
                    'c3': [np.array((1,2,3,4,5,6)),
                           np.array((6,7,8,9,10,11)),
                           np.full((6,),np.nan)]})
df3 = df1.merge(df2, how='left', on=['c1','c2'])

This looks like this:

c1	c2	c3
A	1	`[1,2,3,4,5,6]`
B	2	`[6,7,8,9,10,11]`
C	3	`[nan,nan,nan,nan,nan,nan]`
D	4	`NaN`
E	5	`NaN`

In order to run the next step of my code, I need all of the arrays in c3 to have a consistent length. For the inputs that were present in the join (i.e. rows 1 through 3) this was already taken care of. However, for the rows that were missing from df2, where I now have only a single NaN value (rows 4 and 5), I need to replace those NaNs with an array of NaN values like the one in row 3. The problem is that I can't figure out how to do that.

I've tried a number of things, starting with the obvious:

df3.loc[pd.isnull(df3.c3),'c3'] = np.full((6,),np.nan)

Which gave me a

ValueError: Must have equal len keys and value when setting with an iterable

Fair enough; I understand this error and why Python is confused about what I'm trying to do. How about this?

for i in df3.index:
    df3.at[i,'c3'] = np.full((6,),np.nan) if all(pd.isnull(df3.c3)) else df3.c3

That code runs without error but then when I go to print out df3 (or use it) I get this error:

RecursionError: maximum recursion depth exceeded

That one I don't understand, but moving on: what if I preassign a column full of my NaN arrays, so that I can do some logic after the join?

for i in df1.index:
    df1.at[i,'c4'] = np.full((6,),np.nan)

This gives me the understandable error:

ValueError: setting an array element with a sequence

How about another variation of the same idea:

df1['c4'] = np.full((6,),np.nan)

This one gives a different, also understandable error:

ValueError: Length of values (6) does not match length of index (5)
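Both of these failures seem to come from pandas treating the 6-element array as a sequence of values to spread over rows, rather than as a single cell value. A minimal sketch of one way to preassign such a column, assuming one separate NaN array per row is wanted (the `lengths` variable is only there for illustration):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})

# Build one separate 6-element NaN array per row. Wrapping the list in a
# Series keeps each array as a single cell of an object-dtype column.
df1['c4'] = pd.Series([np.full(6, np.nan) for _ in df1.index], index=df1.index)

# Every cell now holds an array of the same length.
lengths = df1['c4'].map(len)
```

The list has exactly one element per row, so its length matches the index and each element is stored as a whole.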

Hence, the question: How do I replace values in my dataframe (in this case null values) with an empty numpy array of a given length?

For clarity, the desired final result is this:

c1	c2	c3
A	1	`[1,2,3,4,5,6]`
B	2	`[6,7,8,9,10,11]`
C	3	`[nan,nan,nan,nan,nan,nan]`
D	4	`[nan,nan,nan,nan,nan,nan]`
E	5	`[nan,nan,nan,nan,nan,nan]`

Maybe you should use df3.loc[i].c3 or df3.at[i,'c3'] instead of df3.c3, because df3.c3 gives all the values in the column, but you need only the value from the current row. - furas, commented Apr 24 at 20:30
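Following the comment above, here is a sketch of the loop that inspects only the current row's value. In the original loop, `all(pd.isnull(df3.c3))` is False because some rows are not null, so every cell is assigned the whole `df3.c3` Series. That self-reference is the likely cause of the RecursionError when printing. The `isinstance` check is an assumption about how to tell merged arrays from missing entries:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'c1': ['A', 'B', 'C'],
                    'c2': [1, 2, 3],
                    'c3': [np.array((1, 2, 3, 4, 5, 6)),
                           np.array((6, 7, 8, 9, 10, 11)),
                           np.full((6,), np.nan)]})
df3 = df1.merge(df2, how='left', on=['c1', 'c2'])

for i in df3.index:
    value = df3.at[i, 'c3']  # the value in this row only, not the whole column
    # Arrays that came from df2 are kept; the scalar NaN left by the
    # merge is replaced with a fresh 6-element NaN array.
    if not isinstance(value, np.ndarray):
        df3.at[i, 'c3'] = np.full(6, np.nan)
```

This works here because c3 already has object dtype after the merge, so a cell can hold a whole array.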

PaulS · Accepted Answer · 2025-04-24 21:02:17Z

A possible solution:

# the array with the 6 nan values
arr_nan = np.full(
    df3['c3'].map(
        lambda x: np.size(x) if isinstance(x, np.ndarray) else 0).max(),
    np.nan)

df3.assign(c3 = df3['c3'].map(
    lambda y: arr_nan if not isinstance(y, np.ndarray) else y))

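This solution first determines the length of the arrays in c3, and then replaces every non-array entry in c3 with an array of 6 np.nan values. Note that assign returns a new DataFrame rather than modifying df3 in place, so assign the result back (df3 = df3.assign(...)) if you want to keep it.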

Output:

  c1  c2                              c3
0  A   1              [1, 2, 3, 4, 5, 6]
1  B   2            [6, 7, 8, 9, 10, 11]
2  C   3  [nan, nan, nan, nan, nan, nan]
3  D   4  [nan, nan, nan, nan, nan, nan]
4  E   5  [nan, nan, nan, nan, nan, nan]
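A self-contained variant of this approach, as a sketch: it stores the result back in df3 and builds a fresh NaN array for each filled row instead of reusing one `arr_nan` object. The names `n` and `stacked` are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'c1': ['A', 'B', 'C'],
                    'c2': [1, 2, 3],
                    'c3': [np.array((1, 2, 3, 4, 5, 6)),
                           np.array((6, 7, 8, 9, 10, 11)),
                           np.full((6,), np.nan)]})
df3 = df1.merge(df2, how='left', on=['c1', 'c2'])

# Length of the longest array already present in c3 (6 for this data).
n = df3['c3'].map(lambda x: np.size(x) if isinstance(x, np.ndarray) else 0).max()

# Keep existing arrays; give every non-array entry its own NaN array.
df3 = df3.assign(c3=df3['c3'].map(
    lambda y: y if isinstance(y, np.ndarray) else np.full(n, np.nan)))

# With equal lengths, the column can be stacked into one 2-D array.
stacked = np.stack(df3['c3'].to_list())
```

Stacking is one way to confirm that every row now has the same length, which is what the next processing step in the question needs.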

Triky · Accepted Answer · 2025-04-25 06:59:19Z

Get the index of the rows where you have NaN values, then create a Series with the same number of rows and the same index.

idx = df3[df3['c3'].isna()].index
df3.loc[idx, 'c3'] = pd.Series([np.full((6,), np.nan)] * len(idx), index=idx)

End result:

c1	c2	c3
A	1	[1, 2, 3, 4, 5, 6]
B	2	[6, 7, 8, 9, 10, 11]
C	3	[nan, nan, nan, nan, nan, nan]
D	4	[nan, nan, nan, nan, nan, nan]
E	5	[nan, nan, nan, nan, nan, nan]
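One thing to keep in mind: `[arr] * len(idx)` repeats references to a single array, so the filled rows may end up sharing one object. If the arrays are modified in place later, a list comprehension that builds a separate array per row is a safer sketch of the same idea:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'c1': ['A', 'B', 'C'],
                    'c2': [1, 2, 3],
                    'c3': [np.array((1, 2, 3, 4, 5, 6)),
                           np.array((6, 7, 8, 9, 10, 11)),
                           np.full((6,), np.nan)]})
df3 = df1.merge(df2, how='left', on=['c1', 'c2'])

# Rows whose c3 entry is the scalar NaN left behind by the left merge.
idx = df3[df3['c3'].isna()].index

# One independent NaN array per missing row, aligned on the same index.
df3.loc[idx, 'c3'] = pd.Series([np.full(6, np.nan) for _ in idx], index=idx)
```

Because each filled row owns its array, changing one row's array in place leaves the others untouched.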
