How to visualize (make plot) of regression output against categorical input variable? [closed]

Question

I am doing linear regression with multiple variables. My data has n = 143 features and m = 13000 training examples. Some of my features are continuous or ordinal variables (area, year, number of rooms), but I also have categorical variables (district, color, type). So far I have plotted some of my features against predicted price. For example, here is the plot of area against predicted price:

Since area is a continuous variable, I had no trouble visualizing the data. But now I want to visualize how the predicted price depends on my categorical variables (such as district). For categorical variables I used one-hot (dummy) encoding.
For example, this kind of data:

was turned into this format:

If I were using ordinal encoding for districts this way:

DistrictA - 1
DistrictB - 2
DistrictC - 3
DistrictD - 4
DistrictE - 5

I could plot these values against predicted price quite easily by putting 1-5 on the X axis and price on the Y axis.

But I used dummy coding, and now I do not know how to show (visualize) the dependency between price and the categorical variable 'District', which is represented as a series of zeros and ones.

How can I make a plot showing a regression line of districts against predicted price when the districts are dummy coded?

Cross-posted on Stats.SE, SO, and DataScience.SE: stats.stackexchange.com/q/186027/2921, stackoverflow.com/q/34193685/781723, datascience.stackexchange.com/q/9301/8560. Please do not post the same question on multiple sites. Each community should have an honest shot at answering without anybody's time being wasted. (D.W., commented Aug 29, 2016 at 2:40)
I'm voting to close this question as off-topic because it's a duplicate of stackoverflow.com/questions/34193685/… (Sean Owen, commented Aug 29, 2016 at 9:34)

Marmite Bomber · Accepted Answer · 2015-12-10 22:42:41Z

One possible first step is to convert the data back to the original coding. This operation is called unpivot in SQL and melt in R.

Here is an R example:

    > my.df <- read.table(
    +   text = "DistrictA DistrictB DistrictC DistrictD DistrictE Price
    + 1 0 0 0 0 10000
    + 0 1 0 0 0 20000
    + 0 0 1 0 0 30000
    + 0 0 0 1 0 40000
    + 0 0 0 0 1 50000"
    + , header = TRUE)
    > my.df
      DistrictA DistrictB DistrictC DistrictD DistrictE Price
    1         1         0         0         0         0 10000
    2         0         1         0         0         0 20000
    3         0         0         1         0         0 30000
    4         0         0         0         1         0 40000
    5         0         0         0         0         1 50000
    > library(reshape)
    > subset(melt(my.df, id="Price", variable = "District"),value == 1)[,c(1,2)]
       Price  District
    1  10000 DistrictA
    7  20000 DistrictB
    13 30000 DistrictC
    19 40000 DistrictD
    25 50000 DistrictE
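If the reshape package is not available, the same un-pivot can be sketched in base R. This is only an illustration under assumptions: the data frame below is a toy copy of the example data, the names `dummy.cols` and `long.df` are made up here, and it assumes every row has exactly one indicator set to 1 (no reference level was dropped during encoding).

```r
# Toy one-hot data mirroring the example above (values are illustrative).
my.df <- data.frame(
  DistrictA = c(1, 0, 0, 0, 0),
  DistrictB = c(0, 1, 0, 0, 0),
  DistrictC = c(0, 0, 1, 0, 0),
  DistrictD = c(0, 0, 0, 1, 0),
  DistrictE = c(0, 0, 0, 0, 1),
  Price     = c(10000, 20000, 30000, 40000, 50000)
)

# Names of the columns that hold the dummy indicators (assumed naming pattern).
dummy.cols <- grep("^District", names(my.df), value = TRUE)

# For each row, find the column whose indicator is 1 and use that column's
# name as the factor level. Assumes exactly one 1 per row.
hot.index <- max.col(as.matrix(my.df[dummy.cols]), ties.method = "first")

long.df <- data.frame(
  Price    = my.df$Price,
  District = factor(dummy.cols[hot.index], levels = dummy.cols)
)
long.df
```

If one district was dropped as the reference level during encoding, rows with all zeros would need to be mapped to that level separately.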

After that, you can plot the price against the resulting factor variable. You may additionally consider ordering the factor levels by predicted price.
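A minimal sketch of the ordering idea in base R, assuming the data is already in long format with one row per house. The data frame `pred.df`, its column names, and all values are hypothetical and only serve as an illustration.

```r
# Hypothetical long-format data: a district factor plus the model's
# predicted price for each house (values are made up).
pred.df <- data.frame(
  District  = factor(c("DistrictA", "DistrictA", "DistrictB", "DistrictB",
                       "DistrictC", "DistrictC")),
  Predicted = c(30000, 34000, 12000, 15000, 21000, 25000)
)

# Reorder the factor levels by mean predicted price, so the X axis runs
# from the cheapest to the most expensive district.
pred.df$District <- reorder(pred.df$District, pred.df$Predicted, FUN = mean)
levels(pred.df$District)

# One point per house, grouped by district on the X axis.
stripchart(Predicted ~ District, data = pred.df, vertical = TRUE,
           method = "jitter", pch = 16,
           xlab = "District (ordered by mean predicted price)",
           ylab = "Predicted price")
```

Ordering by the mean is one choice among several; the median or the fitted coefficient of each dummy variable would work in the same way.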

I provide no further details, as you didn't tag your tool, but in addition to a scatter plot I would recommend considering a box plot and/or a density plot, always combined with the model's predicted value for each factor level.
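As one possible illustration in base R, the sketch below draws a box plot of prices per district and overlays the model's predicted value for each factor level. Everything here is assumed for the example: the simulated `houses` data, the column names, and a model with the district as its only predictor. In a model with further features, the predictions vary within a district, so one could box-plot the predicted values themselves instead.

```r
set.seed(1)  # reproducible toy data; all values are invented

houses <- data.frame(
  District = factor(rep(c("DistrictA", "DistrictB", "DistrictC"), each = 20)),
  Price    = c(rnorm(20, 30000, 3000),
               rnorm(20, 15000, 2000),
               rnorm(20, 22000, 2500))
)

# lm() dummy-codes the factor internally, so no manual one-hot columns are needed.
fit <- lm(Price ~ District, data = houses)

# Predicted price for each factor level.
level.df   <- data.frame(District = factor(levels(houses$District),
                                           levels = levels(houses$District)))
level.pred <- predict(fit, newdata = level.df)

# Box plot of observed prices per district, with the predictions overlaid
# as red diamonds at box positions 1, 2, 3.
boxplot(Price ~ District, data = houses, xlab = "District", ylab = "Price")
points(seq_along(level.pred), level.pred, pch = 18, col = "red", cex = 1.5)
```

With the district as the only predictor, each predicted value is simply the mean price of that district, which is why a horizontal marker per level, rather than a sloped regression line, is the natural picture for a dummy-coded variable.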
