Apologies if the question isn’t formulated correctly in the title, I am fairly new to this and still not used to the exact terminology.
I need to add a column to a data frame that contains the result of operations (e.g., mean, sum, etc.) grouped by values in other columns. At the same time, I would not like to lose any of the data in the data frame. However, I haven’t been able to find a function that does that, or a solution of any kind.
Let me illustrate with an example:
df
Name Round Points
1 John 1 6
2 John 2 4
3 John 3 6
4 John 4 8
5 Annie 1 7
6 Annie 2 4
7 Annie 3 3
8 Annie 4 6
9 Bill 1 4
10 Bill 2 5
11 Bill 3 8
12 Bill 4 5
I would like to add a column at to the dataframe that contains the average points of each player displayed in each row:
Name Round Points Avg.points
1 John 1 6 6.0
2 John 2 4 6.0
3 John 3 6 6.0
4 John 4 8 6.0
5 Annie 1 7 5.0
6 Annie 2 4 5.0
7 Annie 3 3 5.0
8 Annie 4 6 5.0
9 Bill 1 4 5.5
10 Bill 2 5 5.5
11 Bill 3 8 5.5
12 Bill 4 5 5.5
However, if I use group_by(Name)
I lose part of the data:
df <- df %>%
group_by(Name) %>%
summarise(Avg.points = mean(Points, na.rm = TRUE))
df
# A tibble: 3 × 2
Name Avg.points
<chr> <dbl>
1 Annie 5
2 Bill 5.5
3 John 6
If I use group_by(Name, Round, Points)
I don’t get the correct average:
df <- df %>%
group_by(Name, Round, Points) %>%
summarise(Avg.points = mean(Points, na.rm = TRUE))
df
# A tibble: 12 × 4
# Groups: Name, Round [12]
Name Round Points Avg.points
<chr> <dbl> <dbl> <dbl>
1 Annie 1 7 7
2 Annie 2 4 4
3 Annie 3 3 3
4 Annie 4 6 6
5 Bill 1 4 4
6 Bill 2 5 5
7 Bill 3 8 8
8 Bill 4 5 5
9 John 1 6 6
10 John 2 4 4
11 John 3 6 6
12 John 4 8 8
Thanks to anyone who could help me out with this.
>Solution :
You can use the across
function within summarize to easily keep any columns that you want to keep. Using across(everything())
will maintain everything in your dataframe.
Try:
library(dplyr)
df <- df %>%
group_by(Name) %>%
summarise(across(everything()), Avg.points = mean(Points, na.rm = TRUE))
df
#> # A tibble: 12 × 4
#> # Groups: Name [3]
#> Name Round Points Avg.points
#> <chr> <dbl> <dbl> <dbl>
#> 1 Annie 1 7 5
#> 2 Annie 2 4 5
#> 3 Annie 3 3 5
#> 4 Annie 4 6 5
#> 5 Bill 1 4 5.5
#> 6 Bill 2 5 5.5
#> 7 Bill 3 8 5.5
#> 8 Bill 4 5 5.5
#> 9 John 1 6 6
#> 10 John 2 4 6
#> 11 John 3 6 6
#> 12 John 4 8 6
If you want to order the dataframe by average number of points, add the arrange
function:
library(dplyr)
df <- df %>%
group_by(Name) %>%
summarise(across(everything()), Avg.points = mean(Points, na.rm = TRUE)) %>%
arrange(-Avg.points)
df
#> # A tibble: 12 × 4
#> # Groups: Name [3]
#> Name Round Points Avg.points
#> <chr> <dbl> <dbl> <dbl>
#> 1 John 1 6 6
#> 2 John 2 4 6
#> 3 John 3 6 6
#> 4 John 4 8 6
#> 5 Bill 1 4 5.5
#> 6 Bill 2 5 5.5
#> 7 Bill 3 8 5.5
#> 8 Bill 4 5 5.5
#> 9 Annie 1 7 5
#> 10 Annie 2 4 5
#> 11 Annie 3 3 5
#> 12 Annie 4 6 5
Check out the documentation for more info about across
, or see this release note from the creators of dplyr
.