I can’t figure out ggplot. I’ve got the following data:
# generate some mock data
df <- data.frame(Category = c("black", "black", "brown", "brown"),
                 variable = c("Class1", "Class2", "Class1", "Class2"),
                 value = c(0.2, 0.8, 0.3, 0.7),
                 counts = c(200, 800, 301, 700),
                 total = c(1000, 1000, 1001, 1001))
head(df)
Category variable value counts total
1 black Class1 0.20 200 1000
2 black Class2 0.80 800 1000
3 brown Class1 0.30 301 1001
4 brown Class2 0.70 700 1001
# generate new object containing total counts per category (otherwise "total" gets repeated per category, since there are two variables per category)
df_unique_counts_per_category <- df[df$variable=="Class1",]
head(df_unique_counts_per_category)
Category variable value counts total
1 black Class1 0.2 200 1000
3 brown Class1 0.3 301 1001
I want to create a stacked bar plot with one bar per Category, each bar split according to the proportions in the table. For example, the bar for "black" should be 20% Class1 and 80% Class2, and the bar for "brown" should be 30% Class1 and 70% Class2. I want the Category label (black/brown) at the bottom of the graph, the percentage of each class in the middle of its section of the bar, and the total count per category (the "total" column) at either the top or the bottom of the graph.
I was able to come up with the following code:
library(ggplot2)

png(filename = "mock.png", height = 1400, width = 1400, units = "px")
ggplot(df, aes(x = Category, y = value, fill = variable)) +
  theme(text = element_text(size = 50), axis.text.x = element_text(angle = 45, hjust = 1)) +
  geom_bar(position = "fill", stat = "identity", color = "black", width = 0.9) +
  scale_y_continuous(labels = scales::percent) +
  # percentage label, centred in each section of the bar
  geom_text(aes(label = paste0(round(value * 100), "%")),
            position = position_stack(vjust = 0.5), size = 5) +
  # "N = ..." label (the one that ends up misaligned)
  geom_text(data = df_unique_counts_per_category, aes(label = paste("N = ", total)),
            position = position_dodge(width = 1), vjust = -0.5, size = 4) +
  labs(title = "Module composition (%)", y = "Percentage", x = "Module")
dev.off()
This has generated the following graph:

It’s almost correct, but as you can see, the location of the "N = xxxx" label is misaligned. Can you help me understand where/how I can specify the position of the "N=" label, so that it’s either outside of the bars (e.g., just on top of the graph, or before the x axis legend)?
>Solution :
I would simply use annotate here rather than geom_text. You know the x positions (on a discrete axis the two categories sit at 1 and 2), and you know the y position (the filled bars top out at 1, so about 1.1 clears them). The labels can be obtained by using aggregate to get the sum of the counts per category:
library(ggplot2)

df <- data.frame(Category = c("black", "black", "brown", "brown"),
                 variable = c("Class1", "Class2", "Class1", "Class2"),
                 value = c(0.2, 0.8, 0.3, 0.7),
                 counts = c(200, 800, 301, 700),
                 total = c(1000, 1000, 1001, 1001))

ggplot(df, aes(Category, value, fill = variable)) +
  geom_col(position = "fill", color = 'black', width = 0.9) +
  geom_text(aes(label = scales::percent(value)),
            position = position_stack(vjust = 0.5), size = 5) +
  annotate('text', x = c(1, 2), y = 1.1,
           label = paste('N =', aggregate(counts ~ Category, df, sum)$counts)) +
  scale_y_continuous("Percentage", labels = scales::percent, breaks = 0:4/4) +
  labs(title = "Module composition (%)", x = "Module") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
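If you would rather stay with geom_text, one possible variant is sketched below. It rests on my reading of why the original label was misplaced: the second geom_text inherited y = value and fill = variable from the ggplot() call, so the "N =" text was drawn at the Class1 height instead of above the bar. Here the layer gets its own one-row-per-category data frame and sets inherit.aes = FALSE. The name totals and the height 1.05 are illustrative choices, not something taken from the question.

```r
library(ggplot2)

df <- data.frame(Category = c("black", "black", "brown", "brown"),
                 variable = c("Class1", "Class2", "Class1", "Class2"),
                 value = c(0.2, 0.8, 0.3, 0.7),
                 counts = c(200, 800, 301, 700))

# One row per Category: summed counts plus the label text
# ("totals" is a hypothetical helper name)
totals <- aggregate(counts ~ Category, df, sum)
totals$label <- paste("N =", totals$counts)

p <- ggplot(df, aes(Category, value, fill = variable)) +
  geom_col(position = "fill", color = "black", width = 0.9) +
  geom_text(aes(label = scales::percent(value)),
            position = position_stack(vjust = 0.5), size = 5) +
  # inherit.aes = FALSE drops the plot-level y = value and fill = variable,
  # so the label is drawn at the fixed height given here (1.05 is arbitrary)
  geom_text(data = totals, aes(x = Category, y = 1.05, label = label),
            inherit.aes = FALSE, size = 4) +
  scale_y_continuous("Percentage", labels = scales::percent, breaks = 0:4/4) +
  labs(title = "Module composition (%)", x = "Module")

p
```

Compared with annotate, this keeps the labels tied to the Category values rather than to hard-coded x positions, which may be easier to maintain if categories are added or reordered.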
