I have a pandas DataFrame (called `base_mortality`) with 1 column and n rows, which is of the following form:
    age | death_prob
    ----------------
    60  | 0.005925
    61  | 0.006656
    62  | 0.007474
    63  | 0.008387
    64  | 0.009405
    65  | 0.010539
    66  | 0.0118
    67  | 0.013201
    68  | 0.014756
    69  | 0.016477
`age` is the index and `death_prob` is the probability that a person of a given age will die in the next year. I want to use these death probabilities to project the expected annuity payment that would be paid to an annuitant over the next t years.
Suppose I have 3 annuitants, whose names and ages are contained in a dictionary:
policy_holders = {'John' : 65, 'Mike': 67, 'Alan': 71}
Then I would want to construct a new dataframe whose index is time (rather than age) which has 3 columns (one for each annuitant) and t rows (one for each time step). Each column should specify the probability of death for each policy holder at that time step. For example:
             John      Mike      Alan
    0    0.010539  0.013201  0.020486
    1    0.011800  0.014756  0.022807
    2    0.013201  0.016477  0.025365
    3    0.014756  0.018382  0.028179
    4    0.016477  0.020486  0.031269
    ..        ...       ...       ...
    96   1.000000  1.000000  1.000000
    97   1.000000  1.000000  1.000000
    98   1.000000  1.000000  1.000000
    99   1.000000  1.000000  1.000000
    100  1.000000  1.000000  1.000000
At present, my code for doing this is as follows:
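    import pandas as pd

    # Load the mortality table, using the age column as the index
    base_mortality = pd.read_csv('/Users/joshchapman/PycharmProjects/VectorisedAnnuityModel/venv/assumptions/base_mortality.csv', index_col='age')
    policy_holders = {'John': 65, 'Mike': 67, 'Alan': 71}

    out = pd.DataFrame(index=range(0, 101))
    for name, age in policy_holders.items():
        # Take the rates from the holder's current age onwards, renumbered from 0
        out[name] = base_mortality.loc[age:].reset_index()['death_prob']
    out = out.fillna(1)  # past the end of the table, death is treated as certain
    print(out)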
However, my aim is to remove this loop and achieve this using vector operations (i.e. pandas and/or numpy functions). Any suggestions on how I might improve my code to work in this way would be great!
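For reference, here is a rough sketch of the kind of loop-free approach I have in mind. It is untested against my real CSV: the mortality table is rebuilt by hand from the ten rows shown above (so Alan, who starts at 71, falls entirely outside it), and `n_steps`, `start_ages` and `age_grid` are names I made up for illustration. The idea is to build a grid of "age at each time step" with numpy broadcasting and look all of those ages up in a single `reindex` call.

```python
import numpy as np
import pandas as pd

# Mortality rates copied from the sample table above (ages 60-69 only);
# the real table would come from the CSV instead.
base_mortality = pd.DataFrame(
    {"death_prob": [0.005925, 0.006656, 0.007474, 0.008387, 0.009405,
                    0.010539, 0.0118, 0.013201, 0.014756, 0.016477]},
    index=pd.Index(range(60, 70), name="age"),
)
policy_holders = {"John": 65, "Mike": 67, "Alan": 71}
n_steps = 101  # hypothetical name: rows 0..100, as in the desired output

start_ages = np.array(list(policy_holders.values()))

# Broadcasting (n_steps, 1) + (1, n_holders) gives each holder's age at each step
age_grid = np.arange(n_steps)[:, None] + start_ages[None, :]

# Look up every age in one call; ages not present in the table
# (here, anything above 69) are filled with probability 1
probs = (
    base_mortality["death_prob"]
    .reindex(age_grid.ravel(), fill_value=1.0)
    .to_numpy()
    .reshape(age_grid.shape)
)

out = pd.DataFrame(probs, index=range(n_steps), columns=list(policy_holders))
print(out.head())
```

With the full mortality table the same code should reproduce the output shown earlier, but I have not checked whether it is actually faster than the loop for a large number of policy holders.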