I want to create a second column (firstletter) containing a numeric value that depends on the first letter of the word in the first column (catname). In the sample dataset, column 1 holds a list of cats' names, and I want to assign 1 to cats whose names start with A, 2 to names starting with B, 3 to C, and so on through Z.
df <- data.frame(catname=c("Ave", "Ares", "Aze", "Bill", "Buz", "Chris", "Chase", "Charlie", "Coco"))
At the moment, the only approach I can think of is the case_when() function, e.g.:
library(dplyr)
library(stringr)
df %>% mutate(firstletter = case_when(str_starts(catname, "A") ~ 1,
                                      str_starts(catname, "B") ~ 2,
                                      str_starts(catname, "C") ~ 3))
The result I am hoping for is:
| catname | firstletter |
| -------- | -------------- |
| Ave | 1 |
| Ares | 1 |
| Aze | 1 |
| Bill | 2 |
| Buz | 2 |
| Chris | 3 |
| Chase | 3 |
| Charlie | 3 |
| Coco | 3 |
I would appreciate any suggestions for a different way to approach this problem, ideally one that does not require writing out a case for every letter.
Solution:
You can extract the first character with substr() and then match() it against the built-in LETTERS vector. This always maps A to 1 through Z to 26, so each letter keeps its alphabet position even if some letters do not occur in the data:
df %>% mutate(firstletter = match(substr(catname, 1, 1), LETTERS))
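A base R sketch of the same idea, run on the question's sample data. The toupper() call is an addition that is not part of the answer above: it assumes some names might start with a lowercase letter, because match() is case-sensitive and would otherwise return NA for those rows.

```r
# Sample data from the question
df <- data.frame(catname = c("Ave", "Ares", "Aze", "Bill", "Buz",
                             "Chris", "Chase", "Charlie", "Coco"))

# First character of each name, upper-cased (assumption) so "ave" and "Ave" map alike
first_char <- toupper(substr(df$catname, 1, 1))

# Position of that character in LETTERS: A = 1, B = 2, ..., Z = 26
df$firstletter <- match(first_char, LETTERS)
print(df)

# Hypothetical names: letters keep their alphabet position even when others are absent
gaps <- match(toupper(substr(c("ave", "Dora", "Zed"), 1, 1)), LETTERS)
print(gaps)  # 1 4 26
```

Any first character that is not a letter A to Z (a digit, or an accented letter) still comes back as NA, which makes such rows easy to spot.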
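If you only want consecutive numbers for the letters that actually occur, you can use the factor trick. factor() stores the sorted unique first letters as its levels, and as.numeric() returns each value's level index:
df %>% mutate(firstletter = as.numeric(factor(substr(catname, 1, 1))))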
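To make the difference between the two approaches concrete, here is a small base R comparison on a hypothetical vector of names whose initials skip B and C. It uses as.integer() rather than as.numeric(); the values are the same, only the storage type differs. The last line is an equivalent formulation of the factor trick without a factor, included as an illustration rather than as part of the original answer.

```r
# Hypothetical names whose initials skip B and C
cats <- c("Ave", "Dora", "Dex", "Zed")
initial <- substr(cats, 1, 1)

# Alphabet position: gaps between letters are preserved
by_alphabet <- match(initial, LETTERS)

# Factor trick: levels are the sorted unique initials ("A", "D", "Z"),
# so each initial gets its rank among the letters actually present
by_observed <- as.integer(factor(initial))

# Same ranking without a factor: match against the sorted unique initials
by_observed2 <- match(initial, sort(unique(initial)))

print(by_alphabet)   # 1 4 4 26
print(by_observed)   # 1 2 2 3
print(by_observed2)  # 1 2 2 3
```

On the question's own sample data both approaches give the same column, because A, B, and C all appear; they only diverge when some letters are missing.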