Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Separate text into columns and pivot table

I have a dataframe where one of the columns has its items separated with commas. It looks like:

Data
a,b,c
a,c,d
d,e
a,e
a,b,c,d,e

My goal is to create a matrix that has as header all the unique values from column Data, meaning [a,b,c,d,e]. Then as rows a flag indicating if the value is at that particular row.
The matrix should look like this:

Data a b c d e
a,b,c 1 1 1 0 0
a,c,d 1 0 1 1 0
d,e 0 0 0 1 1
a,e 1 0 0 0 1
a,b,c,d,e 1 1 1 1 1

To separate column Data what I did is:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['data'].str.split(',', expand = True)

Then I dont know how to proceed to allocate the flags to each of the columns.

Can anybody help me with this?

Thank you very much
Regards

>Solution :

Maybe you can try this without pivot.

Create the dataframe.

import pandas as pd
import io

s = '''Data
a,b,c
a,c,d
d,e
a,e
a,b,c,d,e'''

df = pd.read_csv(io.StringIO(s), sep = "\s+")

We can use pandas.Series.str.split with expand argument equals to True. And value_counts each rows with axis = 1.

Finally fillna with zero and change the data into integer with astype(int).

df["Data"].str.split(pat = ",", expand=True).apply(lambda x : x.value_counts(), axis = 1).fillna(0).astype(int)

#
    a   b   c   d   e
0   1   1   1   0   0
1   1   0   1   1   0
2   0   0   0   1   1
3   1   0   0   0   1
4   1   1   1   1   1

And then merge it with the original column.

new = df["Data"].str.split(pat = ",", expand=True).apply(lambda x : x.value_counts(), axis = 1).fillna(0).astype(int)
pd.concat([df, new], axis = 1)

#
    Data        a   b   c   d   e
0   a,b,c       1   1   1   0   0
1   a,c,d       1   0   1   1   0
2   d,e         0   0   0   1   1
3   a,e         1   0   0   0   1
4   a,b,c,d,e   1   1   1   1   1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading