group_by counts across multiple columns in R

November 24, 2021

I have a table with the information about shops and fruits (apples and oranges). Each shop has an inventory where these fruits are recorded by their IDs.

Apples and oranges can be of different types e.g., pink lady, royal gala, apple eve, apple jazz etc. When they are of different types, they are stored with different IDs.
But when two apples are of the same type, they will have the same ID.

I am interested in counting, for each shop, how many different types of apples and how many different types of oranges are there.

My input is:

Shop     Apple_id   Orange_id
Coles    12         201
Woolies  20         51
Walmart  13         16
Woolies  20         52
Coles    14         202
Target   19         81
M&S      75         99
Coles    16         203
M&S      71         99
Dunnes   56         101
M&S      72         91

My expected output is:

Shop    Apples  Oranges
Coles   3       3
Dunnes  1       1
M&S     3       2
Target  1       1
Walmart 1       1
Woolies 1       2

I can run the code one by one for each fruit using the dplyr() package:

# Extract information on Apples only
library(dplyr)

apples_by_shop = raw %>%
  group_by(shop) %>%
  distinct(Apple_id) %>%
  count()

Similarly, I can write code to extract information for oranges only:

# Extract information on Oranges only
oranges_by_shop = raw %>%
  group_by(shop) %>%
  distinct(Orange_id) %>%
  count()

My question is, can I merge the above two in a single line of code e.g., through the summarise function?

I would greatly appreciate any advice on this.

>Solution :

You may try

library(dplyr)

raw %>%
  group_by(Shop) %>%
  summarise(Apples = length(unique(Apple_id)),
            Oranges = length(unique(Orange_id)))

  Shop    Apples Oranges
  <chr>    <int>   <int>
1 Coles        3       3
2 Dunnes       1       1
3 M&S          3       2
4 Target       1       1
5 Walmart      1       1
6 Woolies      1       2

df %>%
  group_by(Shop) %>%
  summarise(Apples = n_distinct(Apple_id),
            Oranges = n_distinct(Orange_id))