Create unique row values in new column based on matching criteria in R

I have a dataframe with one identifier column of unique values, and one column which contains specific criteria.

I want to create a new identifier column of unique values, but where the value also contains information about which criteria it meets. In the example below, I have used case_when() and seq_along() to accomplish this:

library(dplyr)  # provides %>%, mutate() and case_when()

set.seed(1)
df <- data.frame(
    ID = LETTERS[1:10],
    Criteria = paste0("Crit ", floor(runif(10, min=1, max=4)))
)
df %>%
    mutate(
        ID2 = case_when(
            Criteria == "Crit 1" ~ paste0("x", seq_along(Criteria)),
            Criteria == "Crit 2" ~ paste0("y", seq_along(Criteria)),
            Criteria == "Crit 3" ~ paste0("z", seq_along(Criteria))
        )
    )

Output:

A data.frame: 10 × 3
ID  Criteria ID2
A   Crit 1   x1
B   Crit 2   y2
C   Crit 2   y3
D   Crit 3   z4
E   Crit 1   x5
F   Crit 3   z6
G   Crit 3   z7
H   Crit 2   y8
I   Crit 2   y9
J   Crit 1   x10

The new column, ID2, now has row values that are both unique (numbers 1 to 10) and identify the criterion (letters x, y and z). However, seq_along() numbers the rows across the whole column, regardless of criterion. I would rather have the count restart at one for each criterion (e.g. for the first criterion: x1, x2, x3, …, xn; for the second: y1, y2, y3, …, ym; etc.).

What I want:

A data.frame: 10 × 3
ID  Criteria ID2
A   Crit 1   x1
B   Crit 2   y1
C   Crit 2   y2
D   Crit 3   z1
E   Crit 1   x2
F   Crit 3   z2
G   Crit 3   z3
H   Crit 2   y3
I   Crit 2   y4
J   Crit 1   x3

Solution:

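Add group_by(Criteria) before mutate(), so that seq_along() is evaluated within each criterion and the count restarts at 1 for every group: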

library(dplyr)

df %>%
  group_by(Criteria) %>%
  mutate(
    ID2 = case_when(
      Criteria == "Crit 1" ~ paste0("x", seq_along(Criteria)),
      Criteria == "Crit 2" ~ paste0("y", seq_along(Criteria)),
      Criteria == "Crit 3" ~ paste0("z", seq_along(Criteria))
    )
  )

Output:

# A tibble: 10 × 3
# Groups:   Criteria [3]
   ID    Criteria ID2  
   <chr> <chr>    <chr>
 1 A     Crit 1   x1   
 2 B     Crit 2   y1   
 3 C     Crit 2   y2   
 4 D     Crit 3   z1   
 5 E     Crit 1   x2   
 6 F     Crit 3   z2   
 7 G     Crit 3   z3   
 8 H     Crit 2   y3   
 9 I     Crit 2   y4   
10 J     Crit 1   x3 
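
For comparison, here is a minimal base R sketch of the same per-group numbering, in case dplyr is not available. It is an alternative to the accepted approach, not a replacement for it. The `prefix` lookup vector and the `counter` variable are hypothetical names introduced for illustration; `ave()` is used here to run `seq_along()` within each criterion while keeping the original row order.

```r
# Same example data as in the question
set.seed(1)
df <- data.frame(
  ID = LETTERS[1:10],
  Criteria = paste0("Crit ", floor(runif(10, min = 1, max = 4)))
)

# Hypothetical lookup vector: one letter prefix per criterion
prefix <- c("Crit 1" = "x", "Crit 2" = "y", "Crit 3" = "z")

# ave() splits the row positions by Criteria, applies seq_along() to each
# group, and returns the per-group counters in the original row order
counter <- ave(seq_along(df$Criteria), df$Criteria, FUN = seq_along)

# Combine the prefix looked up by criterion name with the per-group counter
df$ID2 <- paste0(unname(prefix[df$Criteria]), counter)

print(df)
```

With the seed above, this should give the same ID2 values as the tibble in the answer. Keeping the prefixes in a lookup vector also means a new criterion only needs one extra entry rather than another case_when() branch.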