Convert a dataframe where each row has categorical data into a new dataframe with each category represented as a separate column

The following dataframe has one row per patient (the row IDs correspond to the patients) and a single column.

df <- data.frame(
  mutations = c('A497T', NA, 'C320T', 'A497T', NA, 'G621C', 'G621C')
)

This column tells whether the patient (row) has a given mutation or not (NA).

I want to create a new dataframe where every unique mutation corresponds to a column. For example, the first column will be "A497T", and every patient who has this mutation will have a "Yes" value in it.

Same for the rest of the columns.

Original dataframe

mutations
<chr>
A497T               
NA              
C320T               
A497T               
NA              
G621C               
G621C

Desired output

A497T | C320T | G621C
<chr> | <chr> | <chr>
Yes   | NA    | NA
NA    | NA    | NA
NA    | Yes   | NA
Yes   | NA    | NA
NA    | NA    | NA
NA    | NA    | Yes
NA    | NA    | Yes

Solution:

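You can use table() to cross-tabulate the row names against the mutations. Each cell is a count (1 if the patient has the mutation, 0 otherwise), and patients whose value is NA get a row of zeros: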

table(rownames(df), df$mutations)
    A497T C320T G621C
  1     1     0     0
  2     0     0     0
  3     0     1     0
  4     1     0     0
  5     0     0     0
  6     0     0     1
  7     0     0     1
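
The table above holds 0/1 counts rather than the "Yes"/NA values asked for in the question. One possible way to get from the counts to the desired dataframe is sketched below. It assumes each patient has at most one mutation, as in the example data; the names counts and wide are introduced here for illustration and are not part of the original answer.

```r
df <- data.frame(
  mutations = c('A497T', NA, 'C320T', 'A497T', NA, 'G621C', 'G621C')
)

# Cross-tabulate patients against mutations. table() drops NA values by
# default, but every patient keeps a row because the row index is a factor.
# Using a factor built from seq_len() also keeps rows in numeric order.
counts <- table(factor(seq_len(nrow(df))), df$mutations)

# Turn the contingency table into a regular dataframe with one column
# per mutation and one row per patient.
wide <- as.data.frame.matrix(counts)

# Replace counts with labels: "Yes" where the mutation is present, NA otherwise.
wide[] <- lapply(wide, function(x) ifelse(x > 0, "Yes", NA_character_))

print(wide)
```

The columns of wide are character vectors, matching the <chr> type shown in the desired output.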