Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Rank a dataframe based on two columns with same values in R

I have a database of animals that have been tested. Animals are grouped in herds. Each herd can be tested multiple times. I would like to make a new column that will tell me if it was the first time or the second that this herd has been tested.

This is an example of my database:

df <- data.frame(
        animal = c("Animal1", "Animal2", "Animal3", "Animal1", "Animal2", "Animal3", "Animal4", "Animal5", "Animal6", "Animal4", "Animal5", "Animal6"),
        herd = c("Herd1","Herd1","Herd1", "Herd1","Herd1","Herd1","Herd2","Herd2", "Herd2","Herd2","Herd2","Herd2"),
        date = c("2017-01-01", "2017-01-01", "2017-01-01", "2018-07-01" , "2018-07-01", "2018-07-01", "2017-05-01", "2017-05-01", "2017-05-01", "2019-07-01", "2019-07-01", "2019-07-01"))

So, i would like it to look like this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

     animal herd    date        testing
1   Animal1 Herd1   2017-01-01  1 
2   Animal2 Herd1   2017-01-01  1
3   Animal3 Herd1   2017-01-01  1
4   Animal1 Herd1   2018-07-01  2
5   Animal2 Herd1   2018-07-01  2
6   Animal3 Herd1   2018-07-01  2
7   Animal4 Herd2   2017-05-01  1
8   Animal5 Herd2   2017-05-01  1
9   Animal6 Herd2   2017-05-01  1
10  Animal4 Herd2   2019-07-01  2
11  Animal5 Herd2   2019-07-01  2
12  Animal6 Herd2   2019-07-01  2

I’ve tried this but is not exactly what I would like, and gets really messy with the whole database

df <- df %>%
  group_by(herd) %>%
  mutate(testing = rank(date))

> df
# A tibble: 12 x 4
# Groups:   herd [2]
   animal  herd  date       testing
   <fct>   <fct> <fct>        <dbl>
 1 Animal1 Herd1 2017-01-01       2
 2 Animal2 Herd1 2017-01-01       2
 3 Animal3 Herd1 2017-01-01       2
 4 Animal1 Herd1 2018-07-01       5
 5 Animal2 Herd1 2018-07-01       5
 6 Animal3 Herd1 2018-07-01       5
 7 Animal4 Herd2 2017-05-01       2
 8 Animal5 Herd2 2017-05-01       2
 9 Animal6 Herd2 2017-05-01       2
10 Animal4 Herd2 2019-07-01       5
11 Animal5 Herd2 2019-07-01       5
12 Animal6 Herd2 2019-07-01       5

Thanks for your help!

>Solution :

You can use dplyr::dense_rank:

df %>% 
  group_by(herd) %>% 
  mutate(testing = dense_rank(date))

output

   animal  herd  date       testing
   <chr>   <chr> <chr>        <int>
 1 Animal1 Herd1 2017-01-01       1
 2 Animal2 Herd1 2017-01-01       1
 3 Animal3 Herd1 2017-01-01       1
 4 Animal1 Herd1 2018-07-01       2
 5 Animal2 Herd1 2018-07-01       2
 6 Animal3 Herd1 2018-07-01       2
 7 Animal4 Herd2 2017-05-01       1
 8 Animal5 Herd2 2017-05-01       1
 9 Animal6 Herd2 2017-05-01       1
10 Animal4 Herd2 2019-07-01       2
11 Animal5 Herd2 2019-07-01       2
12 Animal6 Herd2 2019-07-01       2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading