dplyr::last – rows dropped if one variable can't be computed in summarise

I have a data frame that I want to summarise by group. For some groups, some variables should return NA, but instead the whole row is removed.
Toy example df:

df <- data.frame(button = c(1, 2, 3, 3, 3, 2), group = c(1, 1, 1, 2, 2, 2), RT = c(100, 110, 120, 130, 140, 150))

When I summarise without using last, I get the expected result:

df %>%
  dplyr::group_by(group) %>%
  dplyr::summarize(RT = mean(RT),
                   RT.button1 = mean(RT[button == 1]))
# A tibble: 2 x 3
  group    RT RT.button1
* <dbl> <dbl>      <dbl>
1     1   110        110
2     2   140        NaN

But when I use last, the row for group 2 is removed instead:

df %>%
  dplyr::group_by(group) %>%
  dplyr::summarize(RT = mean(RT),
                   RT.button1 = mean(RT[button == 1]),
                   RT.last.button1 = last(RT[button == 1]))
# A tibble: 1 x 4
# Groups:   group [1]
  group    RT RT.button1 RT.last.button1
  <dbl> <dbl>      <dbl>           <dbl>
1     1   110        110             110

Is there any way to get "last" to return NA instead of removing the row?
I’d be very grateful for any pointers!

Solution:

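This is most likely because the unqualified last is resolving to data.table::last instead of dplyr::last, which typically happens when data.table is attached after dplyr and masks it. For a group with no matching rows, data.table::last returns an empty vector, so summarise produces no row for that group, whereas dplyr::last returns NA.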
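To check which function the bare name last resolves to in a session, something like the following sketch may help. It assumes both dplyr and data.table are installed and attached in this order; the variable name empty is only for illustration.

```r
library(dplyr)
library(data.table)  # attached after dplyr, so its last() masks dplyr's

# Name of the package whose last() the bare name currently resolves to
owner <- environmentName(environment(last))
print(owner)

# Compare how the two functions treat an empty vector
empty <- numeric(0)
from_dt    <- data.table::last(empty)  # returns the empty vector unchanged
from_dplyr <- dplyr::last(empty)       # returns a missing value
print(length(from_dt))
print(from_dplyr)
```

A zero-length result inside summarise is what makes the group disappear from the output.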

With data.table::last:

df %>% 
  dplyr::group_by(group) %>% 
  dplyr::summarize(RT = mean(RT), 
                   RT.button1 = mean(RT[button == 1]),
                   RT.last.button1 = data.table::last(RT[button == 1]))

# Groups:   group [1]
  group    RT RT.button1 RT.last.button1
  <dbl> <dbl>      <dbl>           <dbl>
1     1   110        110             110

With dplyr::last:

df %>% 
  dplyr::group_by(group) %>% 
  dplyr::summarize(RT = mean(RT), 
                   RT.button1 = mean(RT[button == 1]),
                   RT.last.button1 = dplyr::last(RT[button == 1]))
# A tibble: 2 x 4
  group    RT RT.button1 RT.last.button1
  <dbl> <dbl>      <dbl>           <dbl>
1     1   110        110             110
2     2   140        NaN              NA
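If the result should not depend on which package is attached last, one option is a small helper that always returns a single value. This is only a sketch: last_or_na is a hypothetical name, not a function from dplyr or data.table.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or data.table):
# returns the last element of x, or an NA of the same type if x is empty
last_or_na <- function(x) {
  if (length(x) == 0L) x[NA_integer_] else x[[length(x)]]
}

df <- data.frame(button = c(1, 2, 3, 3, 3, 2),
                 group  = c(1, 1, 1, 2, 2, 2),
                 RT     = c(100, 110, 120, 130, 140, 150))

res <- df %>%
  dplyr::group_by(group) %>%
  dplyr::summarize(RT = mean(RT),
                   RT.button1 = mean(RT[button == 1]),
                   RT.last.button1 = last_or_na(RT[button == 1]))

print(res)  # both groups are kept; group 2 has NA in RT.last.button1
```

Writing dplyr::last explicitly, as in the answer above, is the simpler fix. The helper is only useful when a package-independent fallback is preferred.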