Merging two columns with condition?

I have a dataframe that looks like this:

> dput(df)
structure(list(Ethnicity = c("Non-Hispanic/Non-Latino", 
"Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino", NA, "Non-Hispanic/Non-Latino", 
"Non-Hispanic/Non-Latino", "Hispanic/Latino", "Non-Hispanic/Non-Latino", 
"Non-Hispanic/Non-Latino", NA), Race = structure(c(1L, 
1L, 1L, NA, 5L, 1L, 7L, 1L, 7L, NA), levels = c("White", "2+ Races", 
"American Indian or Alaska Native", "Asian", "Black or African American", 
"Native Hawaiian or Other Pacific Islander", "Other", "Refused/Unknown"
), class = "factor")), row.names = c(NA, -10L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x7fe0098120e0>, index = integer(0))

I want to combine the information in the Ethnicity and Race columns, so that if an individual's ethnicity is Hispanic/Latino, that is recorded in the Race column. If the individual is Non-Hispanic/Non-Latino, that information does not need to be copied into the Race column.

The dataframe should look like this:

> dput(r)
structure(list(Ethnicity = c("Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino", 
"Non-Hispanic/Non-Latino", NA, "Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino", 
"Hispanic/Latino", "Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino", 
NA), Race = c("White", "White", "White", NA, "Black or African American", 
"White", "Other (Hispanic/Latino)", "White", "Other", NA)), class = "data.frame", row.names = c(NA, 
-10L))

As you can see, the Race value in row 7 now records that the individual is Hispanic/Latino.

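Solution:

Since df is a data.table, we can use data.table syntax: select the rows in i with a logical expression, build the combined label with sprintf, and assign it to Race by reference with :=. Rows where Ethnicity is NA do not match the condition, so they are left unchanged.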

library(data.table)
df[Ethnicity == "Hispanic/Latino", Race := sprintf("%s (%s)", Race, Ethnicity)]

Output:

> df
                  Ethnicity                      Race
 1: Non-Hispanic/Non-Latino                     White
 2: Non-Hispanic/Non-Latino                     White
 3: Non-Hispanic/Non-Latino                     White
 4:                    <NA>                      <NA>
 5: Non-Hispanic/Non-Latino Black or African American
 6: Non-Hispanic/Non-Latino                     White
 7:         Hispanic/Latino   Other (Hispanic/Latino)
 8: Non-Hispanic/Non-Latino                     White
 9: Non-Hispanic/Non-Latino                     Other
10:                    <NA>                      <NA>
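
For comparison, the same conditional update can be sketched in base R for a plain data.frame. This is only an illustration under stated assumptions: df2, race_levels and hit are hypothetical names, the data is rebuilt by hand from the dput output in the question, and Race is converted to character first because the combined label is not one of the existing factor levels.

```r
# Rebuild the example as a plain data.frame (values copied from the question).
race_levels <- c("White", "2+ Races", "American Indian or Alaska Native",
                 "Asian", "Black or African American",
                 "Native Hawaiian or Other Pacific Islander",
                 "Other", "Refused/Unknown")
df2 <- data.frame(
  Ethnicity = c(rep("Non-Hispanic/Non-Latino", 3), NA,
                "Non-Hispanic/Non-Latino", "Non-Hispanic/Non-Latino",
                "Hispanic/Latino", "Non-Hispanic/Non-Latino",
                "Non-Hispanic/Non-Latino", NA),
  Race = factor(race_levels[c(1, 1, 1, NA, 5, 1, 7, 1, 7, NA)],
                levels = race_levels)
)

# Work on a character column so the new label is not treated as an
# unknown factor level.
df2$Race <- as.character(df2$Race)

# which() drops NA comparisons, so rows with a missing Ethnicity are skipped.
hit <- which(df2$Ethnicity == "Hispanic/Latino")

# Append the ethnicity in parentheses only on the matching rows.
df2$Race[hit] <- sprintf("%s (%s)", df2$Race[hit], df2$Ethnicity[hit])
```

One caveat with either version: if a Hispanic/Latino row had a missing Race, sprintf would write the literal text "NA (Hispanic/Latino)", so such rows may need separate handling.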