Boxplot in ggplot with N label appended just outside of graph

I can’t figure out ggplot. I’ve got the following data:

# generate some mock data
df <- data.frame(Category = c("black", "black", "brown", "brown"),
                 variable = c("Class1", "Class2", "Class1", "Class2"),
                 value = c(0.2, 0.8, 0.3, 0.7),
                 counts = c(200, 800, 301, 700),
                 total = c(1000, 1000, 1001, 1001))

head(df)
  Category variable value counts total
1    black   Class1   0.2    200  1000
2    black   Class2   0.8    800  1000
3    brown   Class1   0.3    301  1001
4    brown   Class2   0.7    700  1001
    
# generate new object containing total counts per category (otherwise "total" gets repeated per category, since there are two variables per category)
df_unique_counts_per_category <- df[df$variable=="Class1",]

head(df_unique_counts_per_category)
  Category variable value counts total
1    black   Class1   0.2    200  1000
3    brown   Class1   0.3    301  1001

I want to create a plot with one bar per Category, with each bar divided according to the percentages in the table. For example, the bar for Category "black" should be split into 20% for Class1 and 80% for Class2, and the bar for Category "brown" into 30% for Class1 and 70% for Class2. I want the Category label (black/brown) at the bottom of the graph, the percentage of each Class in the middle of its section of the bar, and the total count per Category (total) at either the top or the bottom of the graph.

I was able to come up with the following code:

library(ggplot2)

png(filename = "mock.png", height = 1400, width = 1400, units = "px")
ggplot(df, aes(x=Category, y=value, fill = variable)) +
  theme(text = element_text(size = 50), axis.text.x = element_text(angle = 45, hjust = 1)) + 
  geom_bar(position = "fill", stat = "identity",color='black',width=0.9) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = paste0(round(value*100),"%")), 
            position = position_stack(vjust = 0.5), size = 5) + # this is the % label
  geom_text(data = df_unique_counts_per_category, aes(label = paste("N = ", total)), position =  position_dodge(width = 1), vjust = -0.5, size = 4) + # this is the N label            
  labs(title = "Module composition (%)", y = "Percentage", x = "Module")
dev.off()

This has generated the following graph:
(image: the resulting plot)

It’s almost correct, but as you can see, the location of the "N = xxxx" label is misaligned. Can you help me understand where/how I can specify the position of the "N=" label, so that it’s either outside of the bars (e.g., just on top of the graph, or before the x axis legend)?

Solution:

I would simply use annotate here rather than geom_text. You know the x positions (1 and 2), and you know the y positions (both about 1.1). The labels can be obtained by using aggregate to get the sum of the counts per category:

library(ggplot2)

df <- data.frame(Category = c("black", "black", "brown", "brown"),
                 variable = c("Class1", "Class2", "Class1", "Class2"), 
                 value = c(0.2, 0.8, 0.3, 0.7), 
                 counts = c(200, 800, 301, 700), 
                 total = c(1000, 1000, 1001, 1001))

ggplot(df, aes(Category, value, fill = variable)) + 
  geom_col(position = "fill", color = 'black', width = 0.9) +
  geom_text(aes(label = scales::percent(value)), 
            position = position_stack(vjust = 0.5), size = 5) + 
  annotate('text', x = c(1, 2), y = 1.1, 
           label = paste('N =', aggregate(counts ~ Category, df, sum)$counts)) +
  scale_y_continuous("Percentage", labels = scales::percent, breaks = 0:4/4) +
  labs(title = "Module composition (%)", x = "Module") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
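If you would rather keep the labels data-driven than hard-code x = c(1, 2), here is a minimal sketch of a geom_text variant. It assumes the misalignment in the original code comes from the N layer inheriting y = value and fill = variable from the top-level aes(), so it sets inherit.aes = FALSE and supplies its own y. The helper data frame n_labels and the height y = 1.05 are my own choices for illustration, not something from the question.

```r
library(ggplot2)

df <- data.frame(Category = c("black", "black", "brown", "brown"),
                 variable = c("Class1", "Class2", "Class1", "Class2"),
                 value = c(0.2, 0.8, 0.3, 0.7),
                 counts = c(200, 800, 301, 700),
                 total = c(1000, 1000, 1001, 1001))

# One row per Category with the summed counts (n_labels is a hypothetical name)
n_labels <- aggregate(counts ~ Category, df, sum)
n_labels$label <- paste("N =", n_labels$counts)

p <- ggplot(df, aes(Category, value, fill = variable)) +
  geom_col(position = "fill", color = "black", width = 0.9) +
  geom_text(aes(label = scales::percent(value)),
            position = position_stack(vjust = 0.5), size = 5) +
  # inherit.aes = FALSE keeps this layer from picking up y = value and
  # fill = variable; y = 1.05 (an assumed height) sits just above the bars
  geom_text(data = n_labels,
            aes(x = Category, y = 1.05, label = label),
            inherit.aes = FALSE, size = 4) +
  scale_y_continuous("Percentage", labels = scales::percent, breaks = 0:4/4) +
  labs(title = "Module composition (%)", x = "Module") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p
```

Because the label positions come from the Category column, this version should keep working if categories are added or reordered, whereas annotate with fixed x positions would need updating by hand.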
