This is the data I’m working with:
| Station | Salinity | CentricD | PennateD | Dinoflag | MarineFlag | Cilliates |
|---|---|---|---|---|---|---|
| A3 | 18.3 | 181000 | 26500 | 1000 | 15500 | 2250 |
| A6 | 27.4 | 584666.6667 | 4666.666667 | 11666.66667 | 0 | 61333.33333 |
| A8 | 25.7 | 625071.4286 | 2000 | 74000 | 294.1176471 | 1907.563025 |
| B | 29.77785714 | 503693.8776 | 2000 | 6642.857143 | 7642.857143 | 5622.44898 |
| C | 31.283 | 266991.5966 | 5285.714286 | 10714.28571 | 71352.94118 | 12067.22689 |
| D | 32.21625 | 349375 | 6437.5 | 6142.857143 | 39651.78571 | 4339.285714 |
| E | 32.23 | 379200 | 466.6666667 | 3714.285714 | 12228.57143 | 4504.761905 |
| F | 32.8 | 559000 | 0 | 333.3333333 | 0 | 11000 |
| G | 33.185 | 209276.7857 | 2125 | 5714.285714 | 27937.5 | 3062.5 |
| H | 33.67 | 98714.28571 | 1812.5 | 7125 | 6410.714286 | 7750 |
| I | 34.33294118 | 113302.521 | 1764.705882 | 40142.85714 | 5588.235294 | 9260.504202 |
| J | 34.537 | 68142.85714 | 1000 | 12842.85714 | 20228.57143 | 5271.428571 |
I want to make a stacked bar chart with ‘Station’ on the x-axis and each type of phytoplankton stacked on top of the others within each station, so that the chart shows both how many phytoplankton there are per station and what that total is made up of.
I just don’t know how to do that. Looking at geom_bar(), I need to specify a ‘fill’ variable, but I don’t have just one: I have 5 types of phytoplankton that I want to fill it with.
I’m sure that this is just a data formatting issue, but I can’t find any examples of how to properly format it. Thanks in advance.
>Solution :
First, pivot the data into long format. Then you can make the graph using the pivoted values as the y-axis values and the pivoted variable names as the fill variable. Here’s an example.
Original Data
library(dplyr)
library(tidyr)
library(ggplot2)
dat <- tibble::tribble(
~Station , ~Salinity , ~CentricD , ~PennateD , ~Dinoflag , ~MarineFlag , ~Cilliates ,
"A3" , 18.3 , 181000 , 26500 , 1000 , 15500 , 2250 ,
"A6" , 27.4 , 584666.6667 , 4666.666667 , 11666.66667 , 0 , 61333.33333 ,
"A8" , 25.7 , 625071.4286 , 2000 , 74000 , 294.1176471 , 1907.563025 ,
"B" , 29.77785714 , 503693.8776 , 2000 , 6642.857143 , 7642.857143 , 5622.44898 ,
"C" , 31.283 , 266991.5966 , 5285.714286 , 10714.28571 , 71352.94118 , 12067.22689 ,
"D" , 32.21625 , 349375 , 6437.5 , 6142.857143 , 39651.78571 , 4339.285714 ,
"E" , 32.23 , 379200 , 466.6666667 , 3714.285714 , 12228.57143 , 4504.761905 ,
"F" , 32.8 , 559000 , 0 , 333.3333333 , 0 , 11000 ,
"G" , 33.185 , 209276.7857 , 2125 , 5714.285714 , 27937.5 , 3062.5 ,
"H" , 33.67 , 98714.28571 , 1812.5 , 7125 , 6410.714286 , 7750 ,
"I" , 34.33294118 , 113302.521 , 1764.705882 , 40142.85714 , 5588.235294 , 9260.504202 ,
"J" , 34.537 , 68142.85714 , 1000 , 12842.85714 , 20228.57143 , 5271.428571 )
Here, I use pivot_longer() from tidyr to reshape the data, then plot the raw values by station and phytoplankton type. Note that if you are providing the y value directly (rather than letting geom_bar() count rows), you need to use stat="identity" in geom_bar(); geom_col() is shorthand for the same thing.
dat %>%
  pivot_longer(CentricD:Cilliates, names_to = "phyto", values_to = "val") %>%
  ggplot(aes(x = Station, y = val, fill = phyto)) +
  geom_bar(stat = "identity") +
  theme_bw()
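To check what the pivot step produces before plotting, it can help to look at the long data on its own. Below is a minimal sketch using only two stations (A3 and F) copied from the table in the question; the object names `wide` and `long` are made up for illustration.

```r
library(tidyr)

# Two rows copied from the question's table; `wide` and `long` are
# illustrative names, not part of the original answer
wide <- tibble::tribble(
  ~Station, ~Salinity, ~CentricD, ~PennateD, ~Dinoflag,   ~MarineFlag, ~Cilliates,
  "A3",     18.3,      181000,    26500,     1000,        15500,       2250,
  "F",      32.8,      559000,    0,         333.3333333, 0,           11000
)

# Each of the five phytoplankton columns becomes its own row per station:
# the column name goes into `phyto`, the cell value into `val`
long <- pivot_longer(wide, CentricD:Cilliates,
                     names_to = "phyto", values_to = "val")

long
```

Station and Salinity are not part of the pivoted range, so they are kept as identifier columns and repeated on every row. That single `phyto` column is what gives ggplot one variable to map to `fill`.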

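If you would rather normalize the figures so that every bar has the same height, first compute each phytoplankton type's share of its station total (group by Station, then divide each value by the group sum) and plot that variable instead. Note that pct here is a proportion between 0 and 1, not a percentage.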
dat %>%
  pivot_longer(CentricD:Cilliates, names_to = "phyto", values_to = "val") %>%
  group_by(Station) %>%
  mutate(pct = val / sum(val)) %>%
  ggplot(aes(x = Station, y = pct, fill = phyto)) +
  geom_bar(stat = "identity") +
  theme_bw()
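As an alternative to computing the share by hand, ggplot2 can rescale each stack itself with position = "fill". This is a sketch of that variant, not the method used in the answer above; it uses three stations copied from the question's table, and the names `dat_small`, `long_small`, and `p_fill` are hypothetical.

```r
library(tidyr)
library(ggplot2)

# Three stations copied from the question's table, to keep the sketch
# self-contained; the object names here are illustrative only
dat_small <- tibble::tribble(
  ~Station, ~CentricD, ~PennateD, ~Dinoflag,   ~MarineFlag, ~Cilliates,
  "A3",     181000,    26500,     1000,        15500,       2250,
  "D",      349375,    6437.5,    6142.857143, 39651.78571, 4339.285714,
  "F",      559000,    0,         333.3333333, 0,           11000
)

long_small <- pivot_longer(dat_small, CentricD:Cilliates,
                           names_to = "phyto", values_to = "val")

# position = "fill" rescales every stack to a total of 1, so no pct
# column is needed; geom_col() is geom_bar(stat = "identity")
p_fill <- ggplot(long_small, aes(x = Station, y = val, fill = phyto)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Share of station total") +
  theme_bw()

p_fill
```

The manual group_by() and mutate() route is still the better choice if you need the proportions as data, for example to label the segments or export them.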

Created on 2024-12-04 with reprex v2.1.0