Home Is there a way to count repeated observations using the summarize function in R?

Questions

Is there a way to count repeated observations using the summarize function in R?

April 3, 2022

I’m working with a data set that contains CustomerID, Sales_Rep, Product, and year columns. The problem I have with this dataset is that there is no unique Transaction Number. The data looks like this:

CustomerID        Sales Rep          Product            Year
301978        Richard Grayson       Product A           2017
302151        Maurin Thompkins      Product B           2018
301962        Wallace West          Product C           2019
301978        Richard Grayson       Product B           2018
402152        Maurin Thompkins      Product A           2017
501967        Wallace West          Product B           2017
301978        Richard Grayson       Product B           2018

What I’m trying to do is count how many transactions were made by each Sales Rep, per year by counting the number of Customer IDs that appear for each Sales Rep per year regardless if the customer ID is repeated, and then compile it into one data frame called "Count". I tried using the following functions in R:

Count <- Sales_Data %>% 
  group_by(Sales_Rep, year) %>%
  summarize(count(CustomerID))

but I get this error:

Error: Problem with `summarise()` input `..1`.
i `..1 = count(PatientID)`.
x no applicable method for 'count' applied to an object of class "c('integer', 'numeric')"

The result I want to produce is this:

Sales Rep                2017          2018            2019
Richard Grayson            1             2            
Maurin Thompkins           1             1            
Wallace West               1                             1

Can anybody help me?

>Solution :

There is no need to group and summarise, function count does that in one step. Then reshape to wide format.

Sales_Data <- read.table(text = "
CustomerID        'Sales Rep'          Product            Year
301978        'Richard Grayson'       'Product A'           2017
302151        'Maurin Thompkins'      'Product B'           2018
301962        'Wallace West'          'Product C'           2019
301978        'Richard Grayson'       'Product B'           2018
402152        'Maurin Thompkins'      'Product A'           2017
501967        'Wallace West'          'Product B'           2017
301978        'Richard Grayson'       'Product B'           2018
", header = TRUE, check.names = FALSE)

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

Sales_Data %>% count(CustomerID)
#>   CustomerID n
#> 1     301962 1
#> 2     301978 3
#> 3     302151 1
#> 4     402152 1
#> 5     501967 1

Sales_Data %>% 
  count(`Sales Rep`, Year) %>%
  pivot_wider(id_cols = `Sales Rep`, names_from = Year, values_from = n)
#> # A tibble: 3 x 4
#>   `Sales Rep`      `2017` `2018` `2019`
#>   <chr>             <int>  <int>  <int>
#> 1 Maurin Thompkins      1      1     NA
#> 2 Richard Grayson       1      2     NA
#> 3 Wallace West          1     NA      1

^{Created on 2022-04-03 by the reprex package (v2.0.1)}