Home Exploding multiple column list in pandas

Questions

Exploding multiple column list in pandas

September 2, 2024

I’ve already tried everything posted here but nothing is working, so please don’t mark this as duplicate because I think the problem is different.

I have a json like this:

[{'Id': 1,
  'Design': ["09",
   '10',
   '13'
  ],
  'Research': ['Eng',
   'Math']
  }]

Plus other non-list colums. This is repeated for 500 ids.

I need to explode the list columns. the final output should be an excel file, I don’t care if the explosion is done directly in json or in pandas.

Already tried:

    def lenx(x):
        return len(x) if isinstance(x,(list, tuple, np.ndarray, pd.Series)) else 1

    def cell_size_equalize2(row, cols='', fill_mode='internal', fill_value=''):
        jcols = [j for j,v in enumerate(row.index) if v in cols]
        if len(jcols)<1:
            jcols = range(len(row.index))
        Ls = [lenx(x) for x in row.values]
        if not Ls[:-1]==Ls[1:]:
            vals = [v if isinstance(v,list) else [v] for v in row.values]
            if fill_mode=='external':
                vals = [[e] + [fill_value]*(max(Ls)-1) if (not j in jcols) and (isinstance(row.values[j],list))
                        else e + [fill_value]*(max(Ls)-lenx(e))
                        for j,e in enumerate(vals)]
            elif fill_mode == 'internal':
                vals = [[e]+[e]*(max(Ls)-1) if (not j in jcols) and (isinstance(row.values[j],list))
                        else e+[e[-1]]*(max(Ls)-lenx(e)) 
                        for j,e in enumerate(vals)]
            else:
                vals = [e[0:min(Ls)] for e in vals]
            row = pd.Series(vals,index=row.index.tolist())
        return row

Leads to index error

df.explode(['B', 'C', 'D', 'E']).reset_index(drop=True)

Columns must have same lenght

df1 = pd.concat([df[x].explode().to_frame()
                      .assign(g=lambda x: x.groupby(level=0).cumcount())
                      .set_index('g', append=True) 
                for x in cols_to_explode], axis=1)

Somehow it creates lots of rows, I think it just explodes a column after another, and it leads to memory error.

Desired Output:

Id      Design     Research
1         09          Eng
1         10          Math
1         13

>Solution :

You can use json_normalize and a deduplication-explode (as presented here):

tmp = pd.json_normalize(json)

def explode_dedup(s):
    s = s.explode()
    return s.set_axis(
        pd.MultiIndex.from_arrays([s.index, s.groupby(level=0).cumcount()])
    )

ids = ['Id']
cols = tmp.columns.difference(ids)

out = (tmp[ids]
       .join(pd.concat({c: explode_dedup(tmp[c])
                        for c in cols}, axis=1)
               .droplevel(-1)
            )[tmp.columns]
      )

NB. if you know the columns to explode, you can alternatively use:

cols = ['Design', 'Research']
ids = tmp.columns.difference(cols)

Output:

   Id Design Research
0   1     09      Eng
0   1     10     Math
0   1     13      NaN

pandas

byMR

Published September 02, 2024

Add a comment

HQL : how to find sum of multiple columns in where clause

byMR

September 2, 2024

Questions

How to extract only unique values from string using regex in Python?

byMR

September 2, 2024

Questions

ReactJS and HTML – How to only allow letters on input?

byMR

September 2, 2024

Questions

Automatic marking bell Javascript

byMR

September 2, 2024

Questions

Max() and Min() have different behaviours?

byMR

September 2, 2024

Questions

How to render a single component for every route in Next.js app router?

byMR

September 2, 2024

Exploding multiple column list in pandas

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

HQL : how to find sum of multiple columns in where clause

How to extract only unique values from string using regex in Python?

ReactJS and HTML – How to only allow letters on input?

Automatic marking bell Javascript

Max() and Min() have different behaviours?

How to render a single component for every route in Next.js app router?

Keep Up to Date with the Most Important News

Exploding multiple column list in pandas

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

HQL : how to find sum of multiple columns in where clause

How to extract only unique values from string using regex in Python?

ReactJS and HTML – How to only allow letters on input?

Automatic marking bell Javascript

Max() and Min() have different behaviours?

How to render a single component for every route in Next.js app router?

Discover more from Dev solutions