Thanks for checking out the hands-on reinforcement exercises for this seminar. The goal of this homework is to give you a handful of questions requiring visualization that you might conceivably face on the job. There is not one "right" answer for the questions below, but some answers are more right than others. For example, if you were asked to visualize the trends in LTV over the course of a year, would plotting average LTV over time be a better visualization than building twelve violin plots of LTV, one for each month? Not necessarily. But would both of those be better than a single box-and-whisker plot of LTV for all originations in that year? Absolutely. It all depends on the context of the question and the information you intend to convey with your visualization.
When in doubt, ask yourself: am I clearly and powerfully communicating the relevant information with this visualization?
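To make the LTV comparison above concrete, here is a minimal sketch that draws all three options side by side. It assumes matplotlib is available and uses synthetic, made-up LTV values (the names `demo_df`, `monthly_mean`, and `ltv_by_month` are hypothetical), so it says nothing about the real seminar data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic LTV values for illustration only: NOT the seminar data.
rng = np.random.default_rng(seed=0)
months = np.repeat(np.arange(1, 13), 200)  # 200 fake loans per month
ltv = np.clip(rng.normal(loc=70 + 0.5 * months, scale=12), 10, 97)
demo_df = pd.DataFrame({"month": months, "ltv": ltv})

# Option 1 input: one average per month.
monthly_mean = demo_df.groupby("month")["ltv"].mean()
# Option 2 input: the full distribution for each month.
ltv_by_month = [grp["ltv"].to_numpy() for _, grp in demo_df.groupby("month")]

fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)

axes[0].plot(monthly_mean.index, monthly_mean.to_numpy(), marker="o")
axes[0].set_title("Average LTV by month")

axes[1].violinplot(ltv_by_month, positions=list(range(1, 13)))
axes[1].set_title("LTV distribution by month")

# Option 3: one box for the whole year, which hides any trend over time.
axes[2].boxplot(demo_df["ltv"].to_numpy())
axes[2].set_title("All originations in the year")

for ax in axes[:2]:
    ax.set_xlabel("Month")
axes[0].set_ylabel("LTV (%)")
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell. The first two panels both show the upward drift built into the fake data, while the single box plot cannot show it at all.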
We'll be using the same data we've been dealing with throughout the seminar: January and December 2017 FNMA originations. Remember, if you don't understand what some of the variables mean, all the information you need is in data_prep_nb.ipynb, including links to relevant glossaries and data dictionaries.
Note: For all questions below, you are free to use whatever Python visualization package you want. That said, some questions require a specific type of visualization, so choose your package accordingly (for example, if you know that you need an interactive visualization, don't start with a package that you know cannot build interactive visualizations).
Good luck!
# basic packages
import numpy as np
import pandas as pd
import datetime
# store the datetime of the most recent running of this notebook as a form of a log
most_recent_run_datetime = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
f"This notebook was last executed on {most_recent_run_datetime}"
'This notebook was last executed on 2019-09-08 20:42'
# pulling in our main data; for more info on the data, see the "data_prep_nb.ipynb" file
main_df = pd.read_csv(filepath_or_buffer='../data/jan_and_dec_17_acqs.csv')
# taking a peek at our data
main_df.head()
 | loan_id | orig_chn | seller_name | orig_rt | orig_amt | orig_trm | orig_dte | frst_dte | oltv | ocltv | ... | occ_stat | state | zip_3 | mi_pct | product_type | cscore_c | mi_type | relocation_flg | cscore_min | orig_val |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100020736692 | B | CALIBER HOME LOANS, INC. | 4.875 | 492000 | 360 | 12/2017 | 02/2018 | 75 | 75 | ... | I | CA | 920 | NaN | FRM | NaN | NaN | N | 757.0 | 656000.000000 |
1 | 100036136334 | R | OTHER | 2.750 | 190000 | 180 | 12/2017 | 01/2018 | 67 | 67 | ... | P | MD | 206 | NaN | FRM | 798.0 | NaN | N | 797.0 | 283582.089552 |
2 | 100043912941 | R | OTHER | 4.125 | 68000 | 360 | 12/2017 | 02/2018 | 66 | 66 | ... | P | OH | 432 | NaN | FRM | NaN | NaN | N | 804.0 | 103030.303030 |
3 | 100057175226 | R | OTHER | 4.990 | 71000 | 360 | 12/2017 | 02/2018 | 95 | 95 | ... | P | NC | 278 | 30.0 | FRM | NaN | 1.0 | N | 696.0 | 74736.842105 |
4 | 100060715643 | R | OTHER | 4.500 | 180000 | 360 | 12/2017 | 02/2018 | 75 | 75 | ... | I | WA | 983 | NaN | FRM | NaN | NaN | N | 726.0 | 240000.000000 |
5 rows × 27 columns
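The preview shows orig_dte and frst_dte stored as "MM/YYYY" strings. If you want to filter or order by date, one option is to convert them to real datetimes first. The sketch below does this on a tiny stand-in frame built from two rows of the preview (the names `preview_df` and `december_mask` are hypothetical); the same calls should apply to `main_df`, assuming every value in those columns follows the same format.

```python
import pandas as pd

# Stand-in frame using values copied from the preview above.
preview_df = pd.DataFrame({
    "loan_id": [100020736692, 100036136334],
    "orig_dte": ["12/2017", "12/2017"],
    "frst_dte": ["02/2018", "01/2018"],
})

# Parse "MM/YYYY" strings; each date becomes the first day of that month.
for col in ["orig_dte", "frst_dte"]:
    preview_df[col] = pd.to_datetime(preview_df[col], format="%m/%Y")

# Boolean mask selecting December 2017 originations.
december_mask = (
    (preview_df["orig_dte"].dt.year == 2017)
    & (preview_df["orig_dte"].dt.month == 12)
)
```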
A business partner of yours came to you to ask about how occupancy status relates to risk. They were wondering which occupancy status appears riskier in our data: principal homes (i.e. someone's primary residence), second homes, or investor-owned homes. There are obviously many ways of measuring risk. Here it's safe to assume your business partner means credit risk, so some variables you may want to consider would be the borrower's credit score, DTI, or LTV. You can use one or more of these variables in your analysis, or something else altogether if you see fit; just ensure that in the end you arrive at a single visualization to share with your business partner.
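Before choosing a chart, it can help to look at a small summary table per occupancy status. The following is only a possible preparation step, not the visualization itself. It uses a toy frame built from the five preview rows (`toy_df` and `risk_summary` are hypothetical names), and it assumes cscore_min and oltv are reasonable stand-ins for credit score and LTV. Only the codes "I" and "P" appear in the preview, so check the data dictionary for the remaining occupancy codes.

```python
import pandas as pd

# Toy frame with values copied from the five preview rows above.
toy_df = pd.DataFrame({
    "occ_stat": ["I", "P", "P", "P", "I"],
    "oltv": [75, 67, 66, 95, 75],
    "cscore_min": [757.0, 797.0, 804.0, 696.0, 726.0],
})

# One row per occupancy status: loan count plus median score and LTV.
risk_summary = toy_df.groupby("occ_stat").agg(
    n_loans=("oltv", "size"),
    median_cscore=("cscore_min", "median"),
    median_oltv=("oltv", "median"),
)
```

Medians are used here because they are less sensitive to outliers than means; with five rows the numbers are meaningless, so run this against `main_df` before drawing any conclusion.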
# code for visualization goes here
Explanation for why you chose this particular visualization goes here...
Imagine that news has just broken concerning mortgage insurance (MI). Even though we don't yet know exactly how that news will impact Fannie Mae's business, you've been asked to produce a visualization that communicates to what extent our December 2017 acquisitions were covered by MI.
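"Covered by MI" can be measured in more than one way, for example as a share of loans or as a share of origination balance. The sketch below computes both on a toy frame built from the five preview rows (`toy_df`, `dec_df`, and `has_mi` are hypothetical names). It assumes a missing mi_pct means the loan carries no MI, which you should confirm in the data dictionary before relying on it.

```python
import numpy as np
import pandas as pd

# Toy frame with values copied from the five preview rows above.
toy_df = pd.DataFrame({
    "orig_dte": ["12/2017"] * 5,
    "orig_amt": [492000, 190000, 68000, 71000, 180000],
    "mi_pct": [np.nan, np.nan, np.nan, 30.0, np.nan],
})

# Keep December 2017 only (orig_dte is still a "MM/YYYY" string here).
dec_df = toy_df[toy_df["orig_dte"] == "12/2017"]

# ASSUMPTION: NaN in mi_pct means "no mortgage insurance".
has_mi = dec_df["mi_pct"].notna()

# Share of loans with MI versus share of origination dollars with MI.
share_of_loans = has_mi.mean()
share_of_balance = dec_df.loc[has_mi, "orig_amt"].sum() / dec_df["orig_amt"].sum()
```

The two shares can differ noticeably when insured loans are smaller or larger than average, so it is worth deciding which one your audience cares about before plotting.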
# code for visualization goes here
Explanation for why you chose this particular visualization goes here...
One of your business partners is trying to learn more about the areas of the country where we are providing the highest-value loans in terms of origination amount. You've also been told that an interactive map of the United States would be optimal here, and they'd like you to add whatever data you think are relevant to the tooltip.
# code for visualization goes here
Explanation for why you chose this particular visualization goes here...
You've received a very open-ended question from an account manager hoping to learn more about how the seller with whom they work most closely compares to all sellers. Pick any seller (aside from "Other") and any two variables in our data (e.g. origination amount and origination value, but don't use that combo), and put together a visualization that communicates whether or not that seller is unique in any way as it pertains to the two variables you selected. The answer can be yes, no, or maybe; just justify your answer with your visualization.
# code for visualization goes here
Explanation for why you chose this particular visualization goes here...