I have two datasets like the following:
df1:
| gene_id | pvalue |
|---|---|
| ENSG00000000003 | 0.04 |
| ENSG00000000419 | 0.004 |
| ENSG00000111780 | 0.0004 |
| ENSG00000093100 | 0.00004 |
df2:
| gene_id | gene_name |
|---|---|
| ENSG00000000003 | TSPAN6 |
| ENSG00000000419 | DPM1 |
| ENSG00000111780 | |
| ENSG00000093100 |
I want to join the two dataframes by the gene_id column and fill the gene_name blanks with the corresponding gene_id.
The result I would like to obtain is as follows:
| gene_id | pvalue | gene_name |
|---|---|---|
| ENSG00000000003 | 0.04 | TSPAN6 |
| ENSG00000000419 | 0.004 | DPM1 |
| ENSG00000111780 | 0.0004 | ENSG00000111780 |
| ENSG00000093100 | 0.00004 | ENSG00000093100 |
How could I get the above result in R preferably using dplyr (or tidyverse)?
I tried to use left_join, but in gene_name there are blank spaces not filled.
new_df <- left_join(df1, df2, by = "gene_id")
>Solution :
After you do the join just use this. It will check for the empty values and place the gene_id for that row in gene_name:
new_df$gene_name <- ifelse(df$gene_name == '',df$gene_id,df$gene_name)