Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Count rows by group based on values

I have a dataframe, DATA, containing information on students and their test dates. I want to create a variable called WANT, where for all the STUDENTs you count the unique months (not the unique rows) as shown in the sample of the WANT variable below:

library(dplyr)

set.seed(0)

DATA <- data.frame("STUDENT" = sample(1:5, 100, r = T),
                   "TESTDATE" = sample(seq(as.Date('2010/01/01'), as.Date('2010/12/31'), by="day"), 100, r=T))
    
DATA <- DATA %>% arrange(STUDENT, TESTDATE)
    
DATA$WANT <- c(1,1,1,2,2,3,3,4,4,5,5,6,7,7,8,8,9,1,1,1,2,3,3,4,5,5,6,7,8,8,9,10,10, rep(NA, 67))

My attempt only does rows and it’s not what I wish for

DATA %>% group_by(STUDENT) %>% mutate(WANT = 1:n())

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

We may extract the month part and use match or as.integer(factor(WANT2 levels = unique(WANT2)))

library(dplyr)
out <- DATA %>% 
  group_by(STUDENT) %>% 
  mutate(WANT2 = as.integer(format(TESTDATE, '%m')),
        WANT2 = match(WANT2, unique(WANT2))) %>%
   ungroup

-output

> head(out, 22) %>% as.data.frame
   STUDENT   TESTDATE WANT WANT2
1        1 2010-01-20    1     1
2        1 2010-01-31    1     1
3        1 2010-02-10    2     2
4        1 2010-02-10    2     2
5        1 2010-03-27    3     3
6        1 2010-04-20    4     4
7        1 2010-04-21    4     4
8        1 2010-05-02    5     5
9        1 2010-05-06    5     5
10       1 2010-05-13    5     5
11       1 2010-05-20    5     5
12       1 2010-06-17    6     6
13       1 2010-08-22    7     7
14       1 2010-08-25    7     7
15       1 2010-08-27    7     7
16       1 2010-08-30    7     7
17       1 2010-09-06    8     8
18       1 2010-09-30    8     8
19       1 2010-10-27    9     9
20       1 2010-10-31    9     9
21       1 2010-12-10   10    10
22       1 2010-12-21   10    10

If we want the year-month to count as separate, then do

out <- DATA %>% 
  group_by(STUDENT) %>% 
  mutate(WANT2 = format(TESTDATE, '%Y-%m'),
        WANT2 = match(WANT2, unique(WANT2))) %>%
   ungroup
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading