Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to split text file with Pipe delimiter in Python and then have number of columns equal to number of attribute values?

I have a text file as shown below with "|" delimiters, the number of values for an attribute can be any. Need new columns based equal to number of values for an attribute.

Attribute1|6
Attribute2|10|15|27
Attribute3|3|7

Required output in DataFrame should be:

Attribute1         6
Attribute2_val_1  10
Attribute2_val_2  15
Attribute3_val_3  27
Attribute3_val_1   3
Attribute3_val_2   7

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can use the following:

MAX_COL = 10
df = (pd.read_csv('filename.txt', sep=r'\|', names=range(MAX_COL+1))
        .dropna(how='all', axis=1)
        .set_index(0).rename_axis('id').stack()
        .reset_index(name='value')
        .assign(id=lambda d: (d['id'].where(d.groupby('id')['id'].transform('size').le(1),
                                            d['id']+'_val_'+d['level_1'].astype(str))
                             )
               )
        .drop(columns=['level_1'])
      )

output:

>>> df
                 id  value
0        Attribute1    6.0
1  Attribute2_val_1   10.0
2  Attribute2_val_2   15.0
3  Attribute2_val_3   27.0
4  Attribute3_val_1    3.0
5  Attribute3_val_2    7.0

alternative:

with open('filename.txt') as f:
    df = pd.DataFrame([[s[0], s[1:]] for s in
                        map(lambda s: s.strip().split('|'), f.readlines())],
                      columns=['id', 'value']
                     )

df = (df.explode('value')
        .assign(id=lambda d: (d['id'].where(d.groupby('id')['id'].transform('size').le(1),
                                            d['id']+'_val_'+d.groupby('id')['id'].cumcount().add(1).astype(str))
                             )
               )
      )

output:

                 id value
0        Attribute1     6
1  Attribute2_val_1    10
1  Attribute2_val_2    15
1  Attribute2_val_3    27
2  Attribute3_val_1     3
2  Attribute3_val_2     7
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading