How can I create a new column in R that returns specific values based on the values in an existing column, using dplyr?

I have a data frame that looks like this:

names
MARY123L
MARYL123.00
MARYNLO
MARYNLA
JOHN330
JOHNNLA
JOHN123A
JOHN123n456.00
GEORGEJ
GEORGEJ
GEORGEJ
GEORGENLA

I want to create a new column, var, that checks each element of the names column and returns a word according to these conditions:

  1. if the value ends with a letter, return "table";
  2. if the value ends with a number, return "chair";
  3. if the value ends with "NLA" or "NLO", return "clothing" (even though these also end with a letter).

Ideally, I want the new data frame to look like this:

names var
MARY123L table
MARYL123.00 chair
MARYNLO clothing
MARYNLA clothing
JOHN330 chair
JOHNNLA clothing
JOHN123A table
JOHN123n456.00 chair
GEORGEJ table
GEORGEJ table
GEORGEJ table
GEORGENLA clothing

How can I do this in R using dplyr?

library(tidyverse)
# Example data: a single character column called `names`
names <- c("MARY123L", "MARYL123.00", "MARYNLO", "MARYNLA",
           "JOHN330", "JOHNNLA", "JOHN123A", "JOHN123n456.00",
           "GEORGEJ", "GEORGEJ", "GEORGEJ", "GEORGENLA")
DATA <- tibble(names)
DATA


Solution:

Essentially, the $ metacharacter, which anchors a match to the end of the string, is what you are looking for.

DATA |>
    mutate(
        var = case_when(
            # Checked first, because "NLA"/"NLO" also end in a letter
            grepl("NLA$|NLO$", names) ~ "clothing",
            grepl("[0-9]$", names) ~ "chair",
            grepl("[[:alpha:]]$", names) ~ "table",
            TRUE ~ "Something has gone wrong - this should never appear"
        )
    )

# A tibble: 12 x 2
#    names          var     
#    <chr>          <chr>
#  1 MARY123L       table
#  2 MARYL123.00    chair
#  3 MARYNLO        clothing
#  4 MARYNLA        clothing
#  5 JOHN330        chair
#  6 JOHNNLA        clothing
#  7 JOHN123A       table
#  8 JOHN123n456.00 chair
#  9 GEORGEJ        table
# 10 GEORGEJ        table
# 11 GEORGEJ        table
# 12 GEORGENLA      clothing
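
The order of the conditions matters: names ending in "NLA" or "NLO" also end in a letter, and case_when uses the first condition that matches. The sketch below reproduces the same logic in base R with nested ifelse() calls, purely to illustrate that point. The helper names classify_name and classify_wrong_order are hypothetical and not part of the answer above.

```r
# Hypothetical helper: test the most specific pattern ("NLA"/"NLO") first,
# then digits, then letters. Anything else becomes NA.
classify_name <- function(x) {
  ifelse(grepl("NL[AO]$", x), "clothing",
  ifelse(grepl("[0-9]$", x), "chair",
  ifelse(grepl("[[:alpha:]]$", x), "table", NA_character_)))
}

# Same patterns with the letter test first: "NLA"/"NLO" names end in a
# letter, so they are labelled "table" before the clothing test is reached.
classify_wrong_order <- function(x) {
  ifelse(grepl("[[:alpha:]]$", x), "table",
  ifelse(grepl("[0-9]$", x), "chair",
  ifelse(grepl("NL[AO]$", x), "clothing", NA_character_)))
}

x <- c("MARYNLO", "JOHN330", "GEORGEJ")

classify_name(x)
# [1] "clothing" "chair"    "table"

classify_wrong_order(x)
# [1] "table" "chair" "table"
```

With the letter test first, "MARYNLO" is never labelled "clothing", which is why the clothing pattern sits at the top of the case_when call.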

Difference between [[:alpha:]]$ and [a-zA-Z]$

Another answer, posted at the same time, is very similar. The two character classes can give different results depending on your locale. For example:

accented_sometimes  <- c(
    "This line ends with a letter", 
    "But this line ends with é"
)

grepl("[[:alpha:]]$", accented_sometimes)
# [1] TRUE TRUE
grepl("[a-zA-Z]$", accented_sometimes)
# [1]  TRUE FALSE

There can also be differences between \\d and [0-9]; see here for more. I suspect the results depend heavily on your R version and platform: I am using R 4.1 on Windows, which does not have native Unicode (UTF-8) support, whereas later versions, or the same version on Linux/Mac, do.
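
To see how your own R build treats these digit classes, something like the following can be run. It is only a sketch: the variable names are made up for illustration, and no output is shown for the non-ASCII case because it may vary with R version, platform, locale and regex engine.

```r
# Three ways of writing "ends with a digit"
patterns <- c(bracket = "[0-9]$", posix = "[[:digit:]]$", escape = "\\d$")

# ASCII digits: all three patterns agree
ascii_name <- "JOHN330"
sapply(patterns, grepl, x = ascii_name)
# bracket   posix  escape
#    TRUE    TRUE    TRUE

# A non-ASCII digit (ARABIC-INDIC DIGIT THREE, U+0663): results may differ
# between patterns and between systems, so none are shown here
unicode_name <- "JOHN\u0663"
sapply(patterns, grepl, x = unicode_name)

# The PCRE engine can be compared by passing perl = TRUE
sapply(patterns, grepl, x = unicode_name, perl = TRUE)
```

If the data only ever contains ASCII digits, as in the question, all three patterns should behave the same and the choice is a matter of taste.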
