Sequentially number across groups without restarting sequence in R

I want to create the "turn" column shown in the example data frame below; my real dataset has thousands of rows. The column should indicate the current speaking turn. Consecutive rows spoken by the same speaker count as one turn, even though the sentences sit on different rows. When the speaker changes, the counter increases by one, so the next time a person speaks it is a new (nth) turn.

df <- data.frame(
  line = c(1:9),
  speaker = c("nick", "nick", "nick", "bob", "nick", "ann", "ann", "nick", "bob"),
  sentence = c("hi", "how are you?", "what's up?", "i'm good", "me too", "hi guys", "any plans for the weekend", "no", "ya, the movies"),
  turn = c(1, 1, 1, 2, 3, 4, 4, 5, 6))

I have used:

  • group_by(speaker) %>% mutate(turn_curgroupid = cur_group_id()) – but this numbers speakers alphabetically by name, so the same speaker always gets the same number, e.g., Nick is always numbered 3 when his turns should be numbered 1, 3, and 5:
   line speaker sentence      turn turn_curgroupid
1     1 nick    hi               1               3
2     2 nick    how are you?     1               3
3     3 nick    what's up?       1               3
4     4 bob     i'm good         2               2
5     5 nick    me too           3               3
6     6 ann     hi guys          4               1
  • seq_along(speaker) – this counts the rows sequentially within each speaker even when they belong to the same turn, e.g., Nick's first turn should be 1 on all three rows but is numbered 1:3:
   line speaker sentence      turn turn_seqalong
1     1 nick    hi               1             1
2     2 nick    how are you?     1             2
3     3 nick    what's up?       1             3
4     4 bob     i'm good         2             1
5     5 nick    me too           3             4
6     6 ann     hi guys          4             1

Thanks for your help.

>Solution :

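library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

df |>
  mutate(turn2 = cumsum(speaker != lag(speaker, 1, "")),  # add 1 whenever the speaker differs from the previous row
         turn3 = consecutive_id(speaker))                 # same numbering via dplyr's built-in helper
         # H/T @andre-wildberg for mentioning this useful dplyr 1.1.0 function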

Result

  line speaker                  sentence turn turn2 turn3
1    1    nick                        hi    1     1     1
2    2    nick              how are you?    1     1     1
3    3    nick                what's up?    1     1     1
4    4     bob                  i'm good    2     2     2
5    5    nick                    me too    3     3     3
6    6     ann                   hi guys    4     4     4
7    7     ann any plans for the weekend    4     4     4
8    8    nick                        no    5     5     5
9    9     bob            ya, the movies    6     6     6
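
If dplyr is not available, the same idea can be sketched in base R. This is only an illustration of the logic behind turn2 and turn3, not part of the original answer: a new turn starts whenever the speaker differs from the previous row. The variable names (changed, turn_cumsum, runs, turn_rle) are made up for this example, and it assumes the speaker column is non-empty and has no NA values.

```r
# Same speaker sequence as in the example data frame
speaker <- c("nick", "nick", "nick", "bob", "nick", "ann", "ann", "nick", "bob")

# Option A: compare each row with the one before it.
# The first row always starts a turn, hence the leading TRUE.
changed <- c(TRUE, speaker[-1] != speaker[-length(speaker)])
turn_cumsum <- cumsum(changed)

# Option B: run-length encoding. Number the runs of identical
# speakers, then repeat each run number by the length of its run.
runs <- rle(speaker)
turn_rle <- rep(seq_along(runs$lengths), times = runs$lengths)

print(turn_cumsum)
# [1] 1 1 1 2 3 4 4 5 6
print(identical(turn_cumsum, turn_rle))
# [1] TRUE
```

Either vector can be assigned back with df$turn2 <- turn_cumsum. Option A mirrors the cumsum()/lag() line above, and Option B is roughly what a consecutive-run id does conceptually.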