How to get the sum of a column by the uniqueness of another column in R

August 5, 2022

this is my data:

## # A tibble: 41,685 × 3

##    dow   rider_count     Date
##    <chr>       <dbl>    <int>
##  1 Fri            17 20160101
##  2 Fri            20 20160101
##  3 Fri            17 20160101
##  4 Fri            17 20160101
##  5 Fri            NA 20160101
##  6 Fri            63 20160101
##  7 Fri            12 20160101
##  8 Fri            14 20160101
##  9 Fri            NA 20160101
## 10 Fri            28 20160101
## # … with 41,675 more rows

There are many more rows as the date goes on. I want a dataframe that has the sum of rider_count for each date. So that it looks something like this:

## # A tibble: 41,685 × 3

##    dow   rider_count     Date
##    <chr>       <dbl>    <int>
##  1 Fri            .. 20160101
##  2 Sat            .. 20160102
##  3 Sun            .. 20160103
##  4 Mon            .. 20160104
##  5 Tue            .. 20160105
##  6 Wed            .. 20160106
##  7 Thu            .. 20160107
##  8 Fri            .. 20160108
##  9 Sat            .. 20160109
## 10 Sun            .. 20160110
## # … with 41,675 more rows

I have tried the below, but it doesn’t give me the sum of rider_count as it only removes those that had the same date instead of summing the rider_count as you can see here:

new_both$Date <- as.factor(new_both$Date)
new_both[!duplicated(new_both[,c('Date')]),]

## # A tibble: 1,097 × 3
##    dow   rider_count Date    

##    <chr>       <dbl> <fct>   
##  1 Fri            17 20160101
##  2 Sat            21 20160102
##  3 Sun           208 20160103
##  4 Mon           434 20160104
##  5 Tue           292 20160105
##  6 Wed           336 20160106
##  7 Thu           323 20160107
##  8 Fri            95 20160108
##  9 Sat           228 20160109
## 10 Sun           310 20160110
## # … with 1,087 more rows

Any tips I could apply? Thanks.

>Solution :

Maybe you need something like this. We could group then summarise:

library(dplyr)

df %>% 
  group_by(Date, dow) %>% 
  summarise(Sum_rider_count = sum(rider_count, na.rm = TRUE))


     Date dow   Sum_rider_count
     <int> <chr>           <int>
1 20160101 Fri               188