Trying to create a grouped bar chart but the legend isn't displaying every category

I’m trying to make a simple categorical barplot to show how many extra-marital affairs people had based on their happiness rating (x axis) and age group (in the legend/key). This is my code.

import pandas as pd
import seaborn as sns

url = 'https://vincentarelbundock.github.io/Rdatasets/csv/AER/Affairs.csv'

affairs = pd.read_csv(url)
age_categorical = []
for row in affairs['age']:
    if 0 < row < 30:
        age_categorical.append("Under 30")
    elif 30 <= row <= 40:
        age_categorical.append("30 to 40 Years Old")
    elif 40 < row <= 55:
        age_categorical.append("41 to 55 years old")
    else:
        age_categorical.append("Older than 55")

affairs['age_categorical'] = age_categorical
# count number of affairs for each happiness rating 
affairs_subset = affairs.copy()
affairs_subset = affairs_subset[affairs_subset["affairs_dummy"] != 0]
affairs_rating = affairs_subset.groupby('rating').size()
# create dataframe table for plot
happiness = pd.DataFrame({
    'Happiness rating' : [1, 2, 3, 4, 5],
    'Number of Affairees': [8, 33, 27, 48, 34]
})
# put in barplot
plot = sns.barplot(x = 'Happiness rating', y='Number of Affairees', hue=affairs['age_categorical'],data = happiness).set(title='Happiness and extra-marital affairs')

The resulting bar plot doesn’t show all the categories in the legend.


Solution:

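The hue values need to be a column of the same DataFrame as x and y so that each bar is matched with the right category. In your code, data=happiness has only five rows, while hue=affairs['age_categorical'] comes from a different, much longer DataFrame, so the two do not line up row by row. You can also skip the hand-built table and create the table you want to plot with a groupby: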

affairs_subset = affairs_subset[affairs_subset["affairs"] != 0]
df = affairs_subset.groupby(['rating','age_categorical'])['affairs'].sum().reset_index()
sns.barplot(data=df, x="rating", y="affairs", hue="age_categorical")
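
For anyone who wants to try the idea without downloading the dataset, here is a minimal sketch on a small made-up table. The `toy` DataFrame, the `age_group` helper, and the `plot_counts` function are hypothetical names introduced only for this illustration; the age thresholds are copied from the loop in the question. Note that `.sum()` in the snippet above adds up the values in the affairs column, whereas counting people (the "Number of Affairees" in the question) would use the group size, so the sketch computes both.

```python
import pandas as pd

# Hypothetical stand-in for the Affairs data: only the columns used here.
toy = pd.DataFrame({
    "rating":  [1, 1, 2, 2, 2, 2, 3, 3, 5],
    "age":     [27, 32, 27, 22, 42, 57, 32, 32, 22],
    "affairs": [3, 0, 1, 4, 7, 2, 0, 12, 1],
})


def age_group(age):
    """Label an age using the same thresholds as the loop in the question."""
    if 0 < age < 30:
        return "Under 30"
    elif 30 <= age <= 40:
        return "30 to 40 Years Old"
    elif 40 < age <= 55:
        return "41 to 55 years old"
    return "Older than 55"


toy["age_categorical"] = toy["age"].map(age_group)

# Keep only people who reported at least one affair.
had_affair = toy[toy["affairs"] != 0]

# One tidy table: a row per (rating, age group) pair, so x, y and hue
# all come from the same DataFrame.
counts = (
    had_affair.groupby(["rating", "age_categorical"])
    .agg(people=("affairs", "size"), total_affairs=("affairs", "sum"))
    .reset_index()
)


def plot_counts(df, y="people"):
    """Draw the grouped bar chart; seaborn is imported only when plotting."""
    import seaborn as sns

    ax = sns.barplot(data=df, x="rating", y=y, hue="age_categorical")
    ax.set(title="Happiness and extra-marital affairs")
    return ax
```

Calling `plot_counts(counts)` plots the number of people per group, and `plot_counts(counts, y="total_affairs")` plots the summed affairs column, which corresponds to what the snippet above draws.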
