
I have a pandas dataframe (called base_mortality) with 1 column and n rows, which is of the following form:

    age | death_prob
    ----+-----------
     60 | 0.005925
     61 | 0.006656
     62 | 0.007474
     63 | 0.008387
     64 | 0.009405
     65 | 0.010539
     66 | 0.011800
     67 | 0.013201
     68 | 0.014756
     69 | 0.016477

age is the index and death_prob is the probability that a person who is a given age will die in the next year. I want to use these death probabilities to project the expected annuity payment that would be paid to an annuitant over the next t years.

Suppose I have 3 annuitants, whose names and ages are contained in a dictionary:

    policy_holders = {'John': 65, 'Mike': 67, 'Alan': 71}

Then I would want to construct a new dataframe whose index is time (rather than age) which has 3 columns (one for each annuitant) and t rows (one for each time step). Each column should specify the probability of death for each policy holder at that time step. For example:

             John      Mike      Alan
    0    0.010539  0.013201  0.020486
    1    0.011800  0.014756  0.022807
    2    0.013201  0.016477  0.025365
    3    0.014756  0.018382  0.028179
    4    0.016477  0.020486  0.031269
    ..        ...       ...       ...
    96   1.000000  1.000000  1.000000
    97   1.000000  1.000000  1.000000
    98   1.000000  1.000000  1.000000
    99   1.000000  1.000000  1.000000
    100  1.000000  1.000000  1.000000

At present, my code for doing this is as follows:

    import pandas as pd

    base_mortality = pd.read_csv(
        '/Users/joshchapman/PycharmProjects/VectorisedAnnuityModel/venv/assumptions/base_mortality.csv',
        index_col=['x'])
    policy_holders = {'John': 65, 'Mike': 67, 'Alan': 71}

    out = pd.DataFrame(index=range(0, 101))
    for name, age in policy_holders.items():
        # take the probabilities from this age onwards, renumbered 0, 1, 2, ...
        out[name] = base_mortality.loc[age:].reset_index()['death_prob']
    out = out.fillna(1)  # ages past the end of the table default to certain death
    print(out)

However, my aim is to remove this loop and achieve this using vector operations (i.e. pandas and/or numpy functions). Any suggestions on how I might improve my code to work in this way would be great!


1 Answer


Enter `pandas.cut`: it returns the bin into which each value falls, and you can even pass the labels directly. This way you can reduce the work to a single Python loop over the people:

    import pandas as pd
    import numpy as np

    age_bins = range(59, 70)  # one more edge than there are probabilities
    death_prob = [0.005925, 0.006656, 0.007474, 0.008387, 0.009405,
                  0.010539, 0.0118, 0.013201, 0.014756, 0.016477]
    policy_holders = {'John': 65, 'Mike': 67, 'Alan': 71}

    values = {name: pd.cut(range(age, age + 101), age_bins, labels=death_prob)
              for name, age in policy_holders.items()}
    out = pd.DataFrame(values, dtype=np.float64).fillna(1)
    print(out)
    #          John      Mike  Alan
    # 0    0.010539  0.013201   1.0
    # 1    0.011800  0.014756   1.0
    # 2    0.013201  0.016477   1.0
    # 3    0.014756  1.000000   1.0
    # 4    0.016477  1.000000   1.0
    # ..        ...       ...   ...
    # 96   1.000000  1.000000   1.0
    # 97   1.000000  1.000000   1.0
    # 98   1.000000  1.000000   1.0
    # 99   1.000000  1.000000   1.0
    # 100  1.000000  1.000000   1.0
    #
    # [101 rows x 3 columns]

Note that the bin edges need to be one more than the labels because, technically, the bins are interpreted as (59, 60], (60, 61], ..., i.e. including the right edge.
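A minimal illustration of that edge behavior (the edges and labels here are made up, not taken from the mortality table):

    import pandas as pd

    # bins are (59, 60], (60, 61], (61, 62]: right edge included, left excluded
    edges = [59, 60, 61, 62]
    result = pd.cut([60, 61, 62], edges, labels=['a', 'b', 'c'])
    print(list(result))  # ['a', 'b', 'c']

    # 59 sits on the open left edge of the first bin, so it is not binned at all
    print(pd.cut([59], edges, labels=['a', 'b', 'c']).isna().tolist())  # [True]

This is also why ages past the last edge come out as `NaN` in the answer above, which `fillna(1)` then turns into a death probability of 1.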

• Thanks for your help on this one! Quick question though: what if the probabilities are not unique? I've tried replacing the last probability with the second-to-last and this gives the error `Categorical categories must be unique` from `pd.cut`. – Jul 13, 2020 at 10:18
• @JRChapman In that case you will have to pass `labels=False` (or `None`, not quite sure atm) and use the resulting indices to index into `pd.Series(death_prob)`. See also the first revision of my answer for that. – Graipher, Jul 13, 2020 at 10:27
• @JRChapman: It is `False`, and here is the direct link to that revision: codereview.stackexchange.com/revisions/245225/1 – Graipher, Jul 13, 2020 at 13:30
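A minimal sketch of the `labels=False` approach the comments describe (the values and bin edges here are invented for illustration): with duplicate probabilities, `pd.cut` cannot use them as categorical labels, but it can return integer bin indices, which then index into a `pd.Series` of probabilities by position.

    import pandas as pd

    # duplicate probabilities are fine here: they are looked up by
    # position rather than used as (necessarily unique) category labels
    death_prob = pd.Series([0.005925, 0.006656, 0.006656])
    age_bins = [59, 60, 61, 62]  # one more edge than there are probabilities

    # labels=False returns the integer index of each bin instead of a label
    idx = pd.cut([60, 61, 62], age_bins, labels=False)
    probs = death_prob.iloc[idx].reset_index(drop=True)
    print(probs.tolist())  # [0.005925, 0.006656, 0.006656]
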
