Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

group_by counts across multiple columns in R

I have a table with the information about shops and fruits (apples and oranges). Each shop has an inventory where these fruits are recorded by their IDs.

  • Apples and oranges can be of different types e.g., pink lady, royal gala, apple eve, apple jazz etc. When they are of different types, they are stored with different IDs.
  • But when two apples are of the same type, they will have the same ID.

I am interested in counting, for each shop, how many different types of apples and how many different types of oranges are there.

My input is:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Shop     Apple_id   Orange_id
Coles    12         201
Woolies  20         51
Walmart  13         16
Woolies  20         52
Coles    14         202
Target   19         81
M&S      75         99
Coles    16         203
M&S      71         99
Dunnes   56         101
M&S      72         91

My expected output is:

Shop    Apples  Oranges
Coles   3       3
Dunnes  1       1
M&S     3       2
Target  1       1
Walmart 1       1
Woolies 1       2

I can run the code one by one for each fruit using the dplyr() package:

# Extract information on Apples only
library(dplyr)

apples_by_shop = raw %>%
  group_by(shop) %>%
  distinct(Apple_id) %>%
  count()

Similarly, I can write code to extract information for oranges only:

# Extract information on Oranges only
oranges_by_shop = raw %>%
  group_by(shop) %>%
  distinct(Orange_id) %>%
  count()

My question is, can I merge the above two in a single line of code e.g., through the summarise function?

I would greatly appreciate any advice on this.

>Solution :

You may try

library(dplyr)

raw %>%
  group_by(Shop) %>%
  summarise(Apples = length(unique(Apple_id)),
            Oranges = length(unique(Orange_id)))

  Shop    Apples Oranges
  <chr>    <int>   <int>
1 Coles        3       3
2 Dunnes       1       1
3 M&S          3       2
4 Target       1       1
5 Walmart      1       1
6 Woolies      1       2

df %>%
  group_by(Shop) %>%
  summarise(Apples = n_distinct(Apple_id),
            Oranges = n_distinct(Orange_id))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading