Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to get the sum of a column by the uniqueness of another column in R

this is my data:

## # A tibble: 41,685 × 3

##    dow   rider_count     Date
##    <chr>       <dbl>    <int>
##  1 Fri            17 20160101
##  2 Fri            20 20160101
##  3 Fri            17 20160101
##  4 Fri            17 20160101
##  5 Fri            NA 20160101
##  6 Fri            63 20160101
##  7 Fri            12 20160101
##  8 Fri            14 20160101
##  9 Fri            NA 20160101
## 10 Fri            28 20160101
## # … with 41,675 more rows

There are many more rows as the date goes on. I want a dataframe that has the sum of rider_count for each date. So that it looks something like this:

## # A tibble: 41,685 × 3

##    dow   rider_count     Date
##    <chr>       <dbl>    <int>
##  1 Fri            .. 20160101
##  2 Sat            .. 20160102
##  3 Sun            .. 20160103
##  4 Mon            .. 20160104
##  5 Tue            .. 20160105
##  6 Wed            .. 20160106
##  7 Thu            .. 20160107
##  8 Fri            .. 20160108
##  9 Sat            .. 20160109
## 10 Sun            .. 20160110
## # … with 41,675 more rows

I have tried the below, but it doesn’t give me the sum of rider_count as it only removes those that had the same date instead of summing the rider_count as you can see here:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

new_both$Date <- as.factor(new_both$Date)
new_both[!duplicated(new_both[,c('Date')]),]

## # A tibble: 1,097 × 3
##    dow   rider_count Date    

##    <chr>       <dbl> <fct>   
##  1 Fri            17 20160101
##  2 Sat            21 20160102
##  3 Sun           208 20160103
##  4 Mon           434 20160104
##  5 Tue           292 20160105
##  6 Wed           336 20160106
##  7 Thu           323 20160107
##  8 Fri            95 20160108
##  9 Sat           228 20160109
## 10 Sun           310 20160110
## # … with 1,087 more rows

Any tips I could apply? Thanks.

>Solution :

Maybe you need something like this. We could group then summarise:

library(dplyr)

df %>% 
  group_by(Date, dow) %>% 
  summarise(Sum_rider_count = sum(rider_count, na.rm = TRUE))


     Date dow   Sum_rider_count
     <int> <chr>           <int>
1 20160101 Fri               188
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading