Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Fill a data.frame column with values from other data.frame column with a condition R

I have this two data.frames:

  df1 = data.frame(
    "scientific_name"=c("Anchietea ballardii","Anemia patens","Cololobus argenteus"),
    "threat_status"=c("VU","EN","EN")
  )

> df1
      scientific_name threat_status
1 Anchietea ballardii            VU
2       Anemia patens            EN
3 Cololobus argenteus            EN
> 


  df2 = data.frame(
    "scientific_name"=c("Anchietea ballardii","Anemia patens","Cololobus argenteus","Coleocephalocereus pluricostatus", "Dyckia bracteata", "Merianthera parvifolia", "Kielmeyera rupestris"),
    "threat_status"=c(NA)
  )

> df2
                     scientific_name threat_status
1                Anchietea ballardii            NA
2                      Anemia patens            NA
3                Cololobus argenteus            NA
4   Coleocephalocereus pluricostatus            NA
5                   Dyckia bracteata            NA
6             Merianthera parvifolia            NA
7               Kielmeyera rupestris            NA
>

I need to fill the df2$threat_status with the correspondent df1$threat_status, but when there is not any value for the column df2$threat_status it can be filled with "LC". I’m trying with sqldf INSERT TO, but it’s not working. I’s like to have a solution in plain R, something tells me it would be more elegant. Somebody could help? Thanks a lot!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

In base R, you can do:

df2 = merge(df2[,1,drop=F],df1, by="scientific_name", all.x=T)
df2[is.na(df2$threat_status), 2] <- "LC"

Output:

                   scientific_name threat_status
1              Anchietea ballardii            VU
2                    Anemia patens            EN
3 Coleocephalocereus pluricostatus            LC
4              Cololobus argenteus            EN
5                 Dyckia bracteata            LC
6             Kielmeyera rupestris            LC
7           Merianthera parvifolia            LC

Another option in your example is to simply set all the threat_status values in df2 to "LC", and then just row bind df1 to the rows in df2 where df1$scientfic_name does not appear:

df2$threat_status="LC"
rbind(df1,df2[!df2$scientific_name %in% df1$scientific_name,])

Output:

                   scientific_name threat_status
1              Anchietea ballardii            VU
2                    Anemia patens            EN
3              Cololobus argenteus            EN
4 Coleocephalocereus pluricostatus            LC
5                 Dyckia bracteata            LC
6           Merianthera parvifolia            LC
7             Kielmeyera rupestris            LC
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading