Home How to split text file with Pipe delimiter in Python and then have number of columns equal to number of attribute values?

Questions

How to split text file with Pipe delimiter in Python and then have number of columns equal to number of attribute values?

byMR

November 18, 2021

I have a text file as shown below with "|" delimiters, the number of values for an attribute can be any. Need new columns based equal to number of values for an attribute.

Attribute1|6
Attribute2|10|15|27
Attribute3|3|7

Required output in DataFrame should be:

Attribute1         6
Attribute2_val_1  10
Attribute2_val_2  15
Attribute3_val_3  27
Attribute3_val_1   3
Attribute3_val_2   7

>Solution :

You can use the following:

MAX_COL = 10
df = (pd.read_csv('filename.txt', sep=r'\|', names=range(MAX_COL+1))
        .dropna(how='all', axis=1)
        .set_index(0).rename_axis('id').stack()
        .reset_index(name='value')
        .assign(id=lambda d: (d['id'].where(d.groupby('id')['id'].transform('size').le(1),
                                            d['id']+'_val_'+d['level_1'].astype(str))
                             )
               )
        .drop(columns=['level_1'])
      )

output:

>>> df
                 id  value
0        Attribute1    6.0
1  Attribute2_val_1   10.0
2  Attribute2_val_2   15.0
3  Attribute2_val_3   27.0
4  Attribute3_val_1    3.0
5  Attribute3_val_2    7.0

alternative:

with open('filename.txt') as f:
    df = pd.DataFrame([[s[0], s[1:]] for s in
                        map(lambda s: s.strip().split('|'), f.readlines())],
                      columns=['id', 'value']
                     )

df = (df.explode('value')
        .assign(id=lambda d: (d['id'].where(d.groupby('id')['id'].transform('size').le(1),
                                            d['id']+'_val_'+d.groupby('id')['id'].cumcount().add(1).astype(str))
                             )
               )
      )

output:

                 id value
0        Attribute1     6
1  Attribute2_val_1    10
1  Attribute2_val_2    15
1  Attribute2_val_3    27
2  Attribute3_val_1     3
2  Attribute3_val_2     7