Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Is there a way to count repeated observations using the summarize function in R?

I’m working with a data set that contains CustomerID, Sales_Rep, Product, and year columns. The problem I have with this dataset is that there is no unique Transaction Number. The data looks like this:

CustomerID        Sales Rep          Product            Year
301978        Richard Grayson       Product A           2017
302151        Maurin Thompkins      Product B           2018
301962        Wallace West          Product C           2019
301978        Richard Grayson       Product B           2018
402152        Maurin Thompkins      Product A           2017
501967        Wallace West          Product B           2017
301978        Richard Grayson       Product B           2018

What I’m trying to do is count how many transactions were made by each Sales Rep, per year by counting the number of Customer IDs that appear for each Sales Rep per year regardless if the customer ID is repeated, and then compile it into one data frame called "Count". I tried using the following functions in R:

Count <- Sales_Data %>% 
  group_by(Sales_Rep, year) %>%
  summarize(count(CustomerID))

but I get this error:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Error: Problem with `summarise()` input `..1`.
i `..1 = count(PatientID)`.
x no applicable method for 'count' applied to an object of class "c('integer', 'numeric')"

The result I want to produce is this:

Sales Rep                2017          2018            2019
Richard Grayson            1             2            
Maurin Thompkins           1             1            
Wallace West               1                             1

Can anybody help me?

>Solution :

There is no need to group and summarise, function count does that in one step. Then reshape to wide format.

Sales_Data <- read.table(text = "
CustomerID        'Sales Rep'          Product            Year
301978        'Richard Grayson'       'Product A'           2017
302151        'Maurin Thompkins'      'Product B'           2018
301962        'Wallace West'          'Product C'           2019
301978        'Richard Grayson'       'Product B'           2018
402152        'Maurin Thompkins'      'Product A'           2017
501967        'Wallace West'          'Product B'           2017
301978        'Richard Grayson'       'Product B'           2018
", header = TRUE, check.names = FALSE)

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

Sales_Data %>% count(CustomerID)
#>   CustomerID n
#> 1     301962 1
#> 2     301978 3
#> 3     302151 1
#> 4     402152 1
#> 5     501967 1

Sales_Data %>% 
  count(`Sales Rep`, Year) %>%
  pivot_wider(id_cols = `Sales Rep`, names_from = Year, values_from = n)
#> # A tibble: 3 x 4
#>   `Sales Rep`      `2017` `2018` `2019`
#>   <chr>             <int>  <int>  <int>
#> 1 Maurin Thompkins      1      1     NA
#> 2 Richard Grayson       1      2     NA
#> 3 Wallace West          1     NA      1

Created on 2022-04-03 by the reprex package (v2.0.1)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading