Home How do I explode multiple lists with each having a random length and columns having random key names?

Questions

How do I explode multiple lists with each having a random length and columns having random key names?

April 12, 2024

I have a randomly generated Dataframe with random headers by using request apis from a website. I am trying to explode multiple lists with a random length with random key names at the same time so I can remove the duplicates values without any gaps.

Before:

Random Key 1   Random Key 2   Random Key 3   Random Key 4   Random Key 5
list1          list2          list3          list4          list5

Expected Results:

Random Key 1   Random Key 2   Random Key 3   Random Key 4   Random Key 5
string1.1      string2.1      string3.1      string4.1      string5.1
string1.2      string2.2      string3.2      string4.2      string5.2
string1.3      string2.3      string3.3      string4.3      
string1.4      string2.4                     string4.4      
               string2.5                     string4.5      
               string2.6                     string4.6      
                                             string4.7

I have very little experience on using Pandas Dataframes, but here is my code so far.

userList = pd.DataFrame.from_dict(data=user, orient="index")
userList = pd.DataFrame.transpose(userList)
allQuarters = list(set(allQuarters))


for quarters in allQuarters:
  userList = userList.explode(quarters)

  for headers in userList:
    userList.loc[userList[headers].duplicated(), headers] = ""

userList is the completed random dictionary of the random keys and random lists with unknown lengths.
allQuarters is a method I tried using to check the random keys (or columns)

>Solution :

Assuming a DataFrame with a single row of lists, like this:

df = pd.DataFrame({'Random Key 1': [['string1.1', 'string1.2', 'string1.3', 'string1.4']],
                   'Random Key 2': [['string2.1', 'string2.2', 'string2.3', 'string2.4', 'string2.5', 'string2.6']],
                   'Random Key 3': [['string3.1', 'string3.2', 'string3.3']],
                   'Random Key 4': [['string4.1', 'string4.2', 'string4.3', 'string4.4', 'string4.5', 'string4.6', 'string4.7']],
                   'Random Key 5': [['string5.1', 'string5.2']]})

You could use explode with ignore_index=True for each column with apply:

df.apply(lambda x: x.explode(ignore_index=True))

Output:

  Random Key 1 Random Key 2 Random Key 3 Random Key 4 Random Key 5
0    string1.1    string2.1    string3.1    string4.1    string5.1
1    string1.2    string2.2    string3.2    string4.2    string5.2
2    string1.3    string2.3    string3.3    string4.3          NaN
3    string1.4    string2.4          NaN    string4.4          NaN
4          NaN    string2.5          NaN    string4.5          NaN
5          NaN    string2.6          NaN    string4.6          NaN
6          NaN          NaN          NaN    string4.7          NaN

If you have several rows, you could use the approach that I described here with a custom function:

def explode_dedup(s):
    s = s.explode()
    return s.set_axis(
        pd.MultiIndex.from_arrays([s.index, s.groupby(level=0).cumcount()])
    )

out = df.apply(explode_dedup)

Output:

    Random Key 1 Random Key 2 Random Key 3 Random Key 4 Random Key 5
0 0    string1.1    string2.1    string3.1    string4.1    string5.1
  1    string1.2    string2.2    string3.2    string4.2    string5.2
  2    string1.3    string2.3    string3.3    string4.3          NaN
  3    string1.4    string2.4          NaN    string4.4          NaN
  4          NaN    string2.5          NaN    string4.5          NaN
  5          NaN    string2.6          NaN    string4.6          NaN
  6          NaN          NaN          NaN    string4.7          NaN
1 0    string1.1    string2.1    string3.1    string4.1    string5.1
  1    string1.2    string2.2    string3.2    string4.2    string5.2
  2    string1.3    string2.3    string3.3    string4.3          NaN
  3    string1.4    string2.4          NaN    string4.4          NaN
  4          NaN    string2.5          NaN    string4.5          NaN
  5          NaN    string2.6          NaN    string4.6          NaN
  6          NaN          NaN          NaN    string4.7          NaN

Alternative input:

df = pd.DataFrame({'Random Key 1': [['string1.1', 'string1.2', 'string1.3', 'string1.4'],
                                    ['string1.1', 'string1.2', 'string1.3', 'string1.4']],
                   'Random Key 2': [['string2.1', 'string2.2', 'string2.3', 'string2.4', 'string2.5', 'string2.6'],
                                    ['string2.1', 'string2.2', 'string2.3', 'string2.4', 'string2.5', 'string2.6']],
                   'Random Key 3': [['string3.1', 'string3.2', 'string3.3'], ['string3.1', 'string3.2', 'string3.3']],
                   'Random Key 4': [['string4.1', 'string4.2', 'string4.3', 'string4.4', 'string4.5', 'string4.6', 'string4.7'],
                                    ['string4.1', 'string4.2', 'string4.3', 'string4.4', 'string4.5', 'string4.6', 'string4.7']],                  
                   'Random Key 5': [['string5.1', 'string5.2'], ['string5.1', 'string5.2']]})