Scatter plot for binary class dataset with two features in python

Question

I have a dataset with multiple features and a dependent variable that is either 0 or 1. I want a scatter plot in which all positive examples are marked with 'o' and all negative ones with 'x'. I am using Python, and here is the code I have so far.

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    # Importing the dataset
    dataset = pd.read_csv('/home/Dittu/Desktop/Project/creditcard.csv')

I already know how to make a scatter plot for two different classes:

    fig = plt.figure()
    ax1 = fig.add_subplot(111)
    ax1.scatter(x[:4], y[:4], s=10, c='b', marker="s", label='first')
    ax1.scatter(x[40:], y[40:], s=10, c='r', marker="o", label='second')
    plt.show()

But how do I separate the two classes of examples and then plot them, or plot them with distinct markers without separating them first?

I added an answer below; tweak it to suit your target variable. You will find the details in the docs. (Aditya, commented Jun 28, 2018 at 20:02)

Aditya · Accepted Answer · 2018-06-28 20:06:13Z

One approach is to plot the data as a scatter plot with a low alpha, so you can see the individual points as well as a rough measure of density.

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris

    iris = load_iris()
    features = iris.data.T  # one row per feature

    # x/y: sepal length and width; size: petal width; color: species
    plt.scatter(features[0], features[1], alpha=0.2,
                s=100*features[3], c=iris.target, cmap='viridis')
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])

We can see that this scatter plot has given us the ability to simultaneously explore four different dimensions of the data:

the (x, y) location of each point corresponds to the sepal length and width,
the size of the point is related to the petal width, and
the color is related to the particular species of flower, i.e. the target variable.

Multicolor and multifeature scatter plots like this can be useful for both exploration and presentation of data.
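For a binary target, the same idea reduces to passing the 0/1 column as the c argument. The following is only a minimal sketch on synthetic data: the column names feature_1, feature_2 and target are placeholders and are not taken from the question's dataset. Note that a single scatter call draws every point with the same marker, so this variant distinguishes the classes by color only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data; the column names are placeholders.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_1": rng.normal(size=200),
    "feature_2": rng.normal(size=200),
    "target": rng.integers(0, 2, size=200),
})

fig, ax = plt.subplots()
points = ax.scatter(df["feature_1"], df["feature_2"],
                    c=df["target"], cmap="viridis", alpha=0.5, s=30)

# legend_elements() yields one handle per distinct value of c,
# ordered by value (0 first, then 1).
handles, labels = points.legend_elements()
ax.legend(handles, ["class 0", "class 1"], title="target")
ax.set_xlabel("feature_1")
ax.set_ylabel("feature_2")
# plt.show()  # uncomment to display the figure interactively
```

If distinct markers are required, one scatter call per class is needed instead.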

Nitish · Accepted Answer · 2018-06-28 21:06:32Z

Found the answer. Thank you @Aditya

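    import matplotlib.pyplot as plt
    import seaborn as sns

    # Pass data, x and y by keyword; recent seaborn releases no longer
    # accept them positionally.
    sns.lmplot(data=dataset, x='Time', y='Amount', hue='Class', fit_reg=False)
    fig = plt.gcf()
    fig.set_size_inches(15, 10)
    plt.show()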

where Time and Amount are the two features I needed to plot. Class is the column of the dataset that has the dependent binary class value. And this is the plot I got as required.
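To get the 'o' and 'x' markers asked for in the question, lmplot also accepts a markers list with one entry per hue level. The sketch below assumes synthetic data in place of the real CSV; only the column names Time, Amount and Class come from the answer above, and the variable name demo is made up for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the credit card data (an assumption for this sketch).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Time": rng.uniform(0, 1000, size=200),
    "Amount": rng.exponential(50, size=200),
    "Class": rng.integers(0, 2, size=200),
})

# hue_order fixes the level order, so markers map as 0 -> 'x', 1 -> 'o'.
grid = sns.lmplot(data=demo, x="Time", y="Amount", hue="Class",
                  hue_order=[0, 1], markers=["x", "o"], fit_reg=False)
```

Setting hue_order explicitly avoids relying on the default ordering of the class labels when pairing them with markers.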

Community · Accepted Answer · 2020-03-28 18:52:14Z

Let's assume that the name of your dependent variable column is "target", and that you have stored the data in a variable named "dataset". You can split the dataset based on the value of target in the following way:

    import numpy as np

    # np.where returns a tuple of arrays; [0] extracts the row positions
    idx_1 = np.where(dataset.target == 1)[0]
    idx_0 = np.where(dataset.target == 0)[0]

The above code will return the row indices of the dataset where the target value is 1 and 0, respectively.

Now, to display the data, use:

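    import matplotlib.pyplot as plt

    # 'o' for positive examples (target == 1), 'x' for negative ones (target == 0)
    plt.scatter(dataset.iloc[idx_1].x, dataset.iloc[idx_1].y, s=10, c='b', marker="o", label='first')
    plt.scatter(dataset.iloc[idx_0].x, dataset.iloc[idx_0].y, s=10, c='r', marker="x", label='second')
    plt.ylabel('y')
    plt.xlabel('x')
    plt.legend()  # show the labels given above
    plt.show()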
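The same split can also be written with boolean masks instead of np.where. This is a minimal sketch under the same assumptions as the answer (a DataFrame called dataset with placeholder columns x, y and target), using synthetic data so that it runs on its own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data; x, y and target are placeholder column names.
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "x": rng.normal(size=100),
    "y": rng.normal(size=100),
    "target": rng.integers(0, 2, size=100),
})

# Boolean masks select the rows of each class directly.
positive = dataset[dataset["target"] == 1]
negative = dataset[dataset["target"] == 0]

fig, ax = plt.subplots()
# One scatter call per class, because each call uses a single marker style.
ax.scatter(positive["x"], positive["y"], s=30, c="b", marker="o", label="positive (1)")
ax.scatter(negative["x"], negative["y"], s=30, c="r", marker="x", label="negative (0)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
# plt.show()  # uncomment to display the figure interactively
```

The masks keep the original row labels, which can be convenient if the selected rows need to be traced back to the source data later.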
