I have a table with the information about shops and fruits (apples and oranges). Each shop has an inventory where these fruits are recorded by their IDs.
- Apples and oranges can be of different types e.g., pink lady, royal gala, apple eve, apple jazz etc. When they are of different types, they are stored with different IDs.
- But when two apples are of the same type, they will have the same ID.
I am interested in counting, for each shop, how many different types of apples and how many different types of oranges are there.
My input is:
Shop Apple_id Orange_id
Coles 12 201
Woolies 20 51
Walmart 13 16
Woolies 20 52
Coles 14 202
Target 19 81
M&S 75 99
Coles 16 203
M&S 71 99
Dunnes 56 101
M&S 72 91
My expected output is:
Shop Apples Oranges
Coles 3 3
Dunnes 1 1
M&S 3 2
Target 1 1
Walmart 1 1
Woolies 1 2
I can run the code one by one for each fruit using the dplyr() package:
# Extract information on Apples only
library(dplyr)
apples_by_shop = raw %>%
group_by(shop) %>%
distinct(Apple_id) %>%
count()
Similarly, I can write code to extract information for oranges only:
# Extract information on Oranges only
oranges_by_shop = raw %>%
group_by(shop) %>%
distinct(Orange_id) %>%
count()
My question is, can I merge the above two in a single line of code e.g., through the summarise function?
I would greatly appreciate any advice on this.
>Solution :
You may try
library(dplyr)
raw %>%
group_by(Shop) %>%
summarise(Apples = length(unique(Apple_id)),
Oranges = length(unique(Orange_id)))
Shop Apples Oranges
<chr> <int> <int>
1 Coles 3 3
2 Dunnes 1 1
3 M&S 3 2
4 Target 1 1
5 Walmart 1 1
6 Woolies 1 2
df %>%
group_by(Shop) %>%
summarise(Apples = n_distinct(Apple_id),
Oranges = n_distinct(Orange_id))