Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

New column based on rows of other dataframe

I am wondering how I can create a new column and use the information of the corresponding row in a different dataset as input for that column.

Suppose I have these two datasets:

newDf <- data.frame(c("Juice 1", "Juice 2", "Juice 3", "Juice 4","Juice 5"),
                    c("Banana", "Banana", "Orange", "Pear", "Apple"),
                    c("Pear", "Orange", "Pear", "Apple", "Pear"),
                    c("Orange", "Mango", "Banana", "Banana", "Banana"))
colnames(newDf) <- c("Juice", "Fruit 1", "Fruit 2", "Fruit 3")

newDf2 <- data.frame(c("Juice 6", "Juice 3", "Juice 2", "Juice 8","Juice 1"),
                     c(NA,NA,NA,NA,NA),
                     c(NA,NA,NA,NA,NA),
                     c(NA,NA,NA,NA,NA))
colnames(newDf2) <- c("Juice", "Fruit 1", "Fruit 2", "Fruit 3")

Resulting in:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

> print(newDf)
    Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 1  Banana    Pear  Orange
2 Juice 2  Banana  Orange   Mango
3 Juice 3  Orange    Pear  Banana
4 Juice 4    Pear   Apple  Banana
5 Juice 5   Apple    Pear  Banana

> print(newDf2)
    Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 6      NA      NA      NA
2 Juice 3      NA      NA      NA
3 Juice 2      NA      NA      NA
4 Juice 8      NA      NA      NA
5 Juice 1      NA      NA      NA

In my newDf2 I want to assign the fruits from newDf. However, as you can see the Juices in the first column are not in the same order, so I have to assign based on the Juice.

I want to get to this outcome:

    Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 6      NA      NA      NA
2 Juice 3  Orange    Pear  Banana
3 Juice 2  Banana  Orange   Mango
4 Juice 8      NA      NA      NA
5 Juice 1  Banana    Pear  Orange

Can somebody help me with this?

Thanks!

EDIT:

In my actual dataset, the columns do not all start with Fruit btw… So they all have different names. Could you help me find a solution that works per total column name instead of the ones only starting with ‘Fruit’?

>Solution :

We may use match within across – loop across the ‘Fruit’ columns in ‘newDf2’, match the ‘Juice’ column values from the ‘newDf’, use that index to extract the values of the corresponding columns (cur_column() – returns the column name of the looped column) from ‘newDf`

library(dplyr)
newDf2 <- newDf2 %>% 
  mutate(across(starts_with('Fruit'), 
       ~ newDf[[cur_column()]][match(Juice, newDf$Juice)]))

-output

newDf2
  Juice Fruit 1 Fruit 2 Fruit 3
1 Juice 6    <NA>    <NA>    <NA>
2 Juice 3  Orange    Pear  Banana
3 Juice 2  Banana  Orange   Mango
4 Juice 8    <NA>    <NA>    <NA>
5 Juice 1  Banana    Pear  Orange

If column names doesn’t have any pattern, then use column index. Assuming that the first column in both datasets are Juice, exclude that column in across with -1

newDf2 <- newDf2 %>% 
  mutate(across(-1, 
       ~ newDf[[cur_column()]][match(Juice, newDf$Juice)]))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading