I have data like this: (many more columns not shown here)
df<-structure(list(email = c("lbelcher@place.org", "bbelchery@place.org",
"b.smith@place.org", "jsmith1@place.org"), employee_number = c(123456,
654321, 664422, 321458)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
And I need to make a third column called "username". Username is usually just everything before the @ in their email, UNLESS there’s a period or a number in that part, in which case it should be their employee number.
In other words, I’m hoping to get results like this:
Any help would be appreciated!
>Solution :
We could use str_detect on the part of 'email' before the @ to check for a period or a digit. Where one is found, return 'employee_number' (converted to character); otherwise, strip the @ and everything after it from 'email' with str_remove.
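library(dplyr)
library(stringr)
df <- df %>%
  mutate(username = case_when(
    # trimws() with whitespace = "@.*" drops the domain, leaving the part before the @
    str_detect(trimws(email, whitespace = "@.*"), "[.0-9]") ~ as.character(employee_number),
    # no period or digit: keep the part before the @
    TRUE ~ str_remove(email, "@.*")
  ))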
-output
df
# A tibble: 4 × 3
  email               employee_number username
  <chr>                         <dbl> <chr>
1 lbelcher@place.org           123456 lbelcher
2 bbelchery@place.org          654321 bbelchery
3 b.smith@place.org            664422 664422
4 jsmith1@place.org            321458 321458
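
If adding dplyr and stringr is not an option, the same rule could probably be written in base R. This is only a sketch, not part of the original answer: it rebuilds the sample data as a plain data.frame, and the helper names local_part and use_number are made up for illustration.

```r
# Same sample data as in the question, as a plain data.frame
df <- data.frame(
  email = c("lbelcher@place.org", "bbelchery@place.org",
            "b.smith@place.org", "jsmith1@place.org"),
  employee_number = c(123456, 654321, 664422, 321458)
)

# Part of each email before the first "@" (hypothetical helper name)
local_part <- sub("@.*", "", df$email)

# TRUE where that part contains a period or a digit (hypothetical helper name)
use_number <- grepl("[.0-9]", local_part)

# Employee number (as character) where flagged, otherwise the part before the "@"
df$username <- ifelse(use_number, as.character(df$employee_number), local_part)
```

One thing to watch with either version: as.character() can render round values such as 100000 in scientific notation, so if employee numbers like that are possible, format(employee_number, scientific = FALSE) may be a safer conversion.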
