I'm trying to visualize some customer data. Two of the variables I'm looking at are `verified_status` and `video_transcription_length`:
- `verified_status`: the verification status of each video, either verified or not verified
- `video_transcription_length`: the length of the video's transcription text
I'm plotting histograms to see whether the distributions of `video_transcription_length` differ between verified and not verified videos.
If I use matplotlib's `hist` (called through pandas' `Series.hist`), I get the following result:
```python
import matplotlib.pyplot as plt

# data_unsampled: my DataFrame, loaded earlier
status_name = data_unsampled['verified_status'].unique()
color = ['red', 'green']
for i, status in enumerate(status_name):
    subset = data_unsampled[data_unsampled['verified_status'] == status]
    # Each call draws onto the same current axes; bins=50 is computed from this group's own range
    ax = subset['video_transcription_length'].hist(
        figsize=(8, 4), bins=50, color=color[i], label=status)
ax.legend()
ax.set_title('distribution of video_transcription_text length for the 2 video statuses')
ax.set_xlabel('text length')
ax.set_ylabel('count')
plt.show()
```
However, if I use `sns.histplot`, I get a different result:
```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.histplot(data=data_unsampled, x="video_transcription_length", hue="verified_status",
             stat="count", multiple="stack", element="bars", kde=False,
             palette="pastel", legend=True)
plt.xlabel("video_transcription_text length (number of characters)")
plt.ylabel("Count")
plt.title("Distribution of video_transcription_text length for videos posted by verified accounts and videos posted by unverified accounts")
plt.show()
```
Why do seaborn and matplotlib give different results? Also, in the second picture the not verified bars are much higher than the verified ones, but in the first picture verified is higher than not verified.
What's the difference between these two?
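In `seaborn.histplot` you're using `multiple="stack"`, which means that the bars for the two categories are stacked on top of each other: the total height of each bar is the sum of the verified and not verified counts in that bin. `pyplot.hist` does not stack the bars; each call draws its group's bars from zero, layered on top of the bars drawn before it.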
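A minimal sketch to see the difference side by side, assuming synthetic data: the arrays `verified` and `not_verified`, their sizes, and their distributions are invented stand-ins for the two groups in `data_unsampled`. It bins both groups with one shared set of edges (two separate `hist` calls with `bins=50` each compute 50 bins over their own group's range, whereas `sns.histplot` shares bins across hue levels by default via `common_bins=True`), then draws them layered on the left and stacked on the right. On the seaborn side, `multiple="layer"` (the `histplot` default) gives the overlapping view instead of the stacked one.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs headless; drop this in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in data: NOT the real data_unsampled columns.
rng = np.random.default_rng(0)
verified = rng.normal(70, 20, size=300)       # smaller group
not_verified = rng.normal(90, 25, size=1500)  # larger group

# One shared set of 50 bin edges covering both groups.
bins = np.histogram_bin_edges(np.concatenate([verified, not_verified]), bins=50)

fig, (ax_layer, ax_stack) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Layered: one hist call per group, each drawn over the previous one
# (what repeated pyplot.hist / Series.hist calls do); alpha keeps both visible.
layer_counts = []
for values, label, color in [(verified, "verified", "green"),
                             (not_verified, "not verified", "red")]:
    counts, _, _ = ax_layer.hist(values, bins=bins, alpha=0.5,
                                 color=color, label=label)
    layer_counts.append(counts)
ax_layer.set_title("layered (like pyplot.hist)")
ax_layer.legend()

# Stacked: the second group's bars sit on top of the first group's,
# like sns.histplot(..., multiple="stack"). With stacked=True the returned
# counts are cumulative, so the last row is the combined count per bin.
stack_counts, _, _ = ax_stack.hist([verified, not_verified], bins=bins,
                                   stacked=True, color=["green", "red"],
                                   label=["verified", "not verified"])
ax_stack.set_title('stacked (like multiple="stack")')
ax_stack.legend()

for ax in (ax_layer, ax_stack):
    ax.set_xlabel("video_transcription_text length (number of characters)")
ax_layer.set_ylabel("Count")

fig.tight_layout()
plt.show()
```

In the stacked panel the top of each bar is the combined count, so the larger group dominates the picture; in the layered panel each group's bars start from zero, so their heights can be compared directly.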