R: Add new column based on character vector and existing column in dataframe with unique items

December 7, 2021

I want to assign the elements of a character vector to the rows of a data frame, based on matching values in an existing column.

The data frame has one column, items:

head(df, 5)

tail(df, 5)

The character vector chr_v consists of 44 unique items.

chr_v <- c("T1_1", "C1_1", "T1_2", "A_1", "C_2", "C_3", "T1_3", "A_2", "C_4", 
"C_5", "C_6", "C_7", "C_8", "A_3", "C_9", "C_10", "C_11", "A_4", "C_12", "A_5", 
"C_13", "A_6", "A_7", "C_14", "C_15", "C_16", "T_4", "C_17", "C_18", "C_19", "T_5", 
"C_20", "C_21", "T_6", "A_8", "C_22", "C_23", "C_24", "C_25", "C_26", "T_7", "T_8", 
"C_27", "C_28")

The length of chr_v is:

length(chr_v)
[1] 44

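The items column of the data frame contains 44 unique values in ascending order, and the character vector has 44 elements. I want to create a new column that repeats each element of the character vector for every row of the corresponding item: the first element for all rows where items is 1, the second element for all rows where items is 2, and so on.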

Expected Output:
head(df, 5)

       items    newitem
1        1      T1_1
2        1      T1_1
3        1      T1_1
4        1      T1_1
5        1      T1_1

tail(df, 5)

      items    newitem
120001  44      C_28
120002  44      C_28
120003  44      C_28
120004  44      C_28
120005  44      C_28

I checked the number of rows for each item in df with table(), but the output was not in the order I expected (even after trying to sort it). Therefore, I cannot use those counts to simply repeat the elements of chr_v sequentially.
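One possible cause of the unordered table() output, assuming items is stored as character rather than integer: table() names its counts by sorting the values as text, so "10" comes before "2". The sketch below uses a small hypothetical vector ids and hypothetical labels to show this, and shows that converting to integer first restores numeric order, after which rep() with times = can rebuild the column. It still relies on the rows already being sorted by item.

```r
# Hypothetical stand-in for df$items, stored as character
ids <- c("1", "1", "2", "10", "10", "10")
# Hypothetical labels: one per unique id, in numeric order (1, 2, 10)
new_labels <- c("a", "b", "c")

# Character values are sorted as text, so "10" is placed before "2"
names(table(ids))
# -> "1" "10" "2"

# Converting to integer first gives numeric order
counts <- table(as.integer(ids))
names(counts)
# -> "1" "2" "10"

# rep() with times = repeats each label by the count of its id;
# this only lines up with the rows if they are sorted by id
rep(new_labels, times = as.vector(counts))
# -> "a" "a" "b" "c" "c" "c"
```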

Solution:

Martin provided a tidyverse solution. Here is a base R solution:

# items holds the integers 1 to 44, so each value is also a position in chr_v
df$newitem <- chr_v[df$items]
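The indexing above works because items holds the integers 1 to 44, so each value doubles as a position in the lookup vector. If the values were not 1 to n (for example character labels or non-consecutive numbers), one hedged alternative is to convert each value to its rank of first appearance with match() and unique(), assuming the rows are already grouped in the intended order. The data frame df_small and the vector new_labels below are hypothetical.

```r
# Hypothetical example: ids are not 1..n, but rows are grouped in the intended order
df_small <- data.frame(items = c(7L, 7L, 12L, 12L, 12L, 30L))
# One label per unique id, in order of first appearance
new_labels <- c("T1_1", "C1_1", "T1_2")

# Guard: there must be exactly one label per unique id
stopifnot(length(unique(df_small$items)) == length(new_labels))

# match() returns, for each row, the position of its id within unique(items):
# 7 -> 1, 12 -> 2, 30 -> 3
pos <- match(df_small$items, unique(df_small$items))
df_small$newitem <- new_labels[pos]
df_small
```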

And here is the dplyr equivalent:

library(dplyr)

df %>% 
  mutate(newitem = chr_v[items])
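A different technique, again assuming items holds the integers 1 to n, is to join against a small lookup table built from the character vector. left_join() keeps every row of the left data frame and adds the matching newitem. The names chr_short, df_small and lookup are illustrative only, with the vector shortened to three elements.

```r
library(dplyr)

# Illustrative data: a shortened label vector and a small df with items 1..3
chr_short <- c("T1_1", "C1_1", "T1_2")
df_small <- data.frame(items = c(1L, 1L, 2L, 3L, 3L))

# Lookup table: element k of chr_short belongs to items == k
lookup <- data.frame(items = seq_along(chr_short), newitem = chr_short,
                     stringsAsFactors = FALSE)

# left_join() keeps all rows of df_small and adds the matching newitem
df_small <- df_small %>% left_join(lookup, by = "items")
df_small
```

Compared with plain indexing, the join makes the item-to-label mapping explicit as data, which can be easier to inspect when the mapping is not simply "position k".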

output:

   items newitem
1      1    T1_1
2      1    T1_1
3      1    T1_1
4      1    T1_1
5      1    T1_1
6     44    C_28
7     44    C_28
8     44    C_28
9     44    C_28
10    44    C_28

data:

df <- structure(list(items = c(1L, 1L, 1L, 1L, 1L, 44L, 44L, 44L, 44L, 
44L), newitem = c("T1_1", "T1_1", "T1_1", "T1_1", "T1_1", "C_28", 
"C_28", "C_28", "C_28", "C_28")), row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"), class = "data.frame")