Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I explode multiple lists with each having a random length and columns having random key names?

I have a randomly generated Dataframe with random headers by using request apis from a website. I am trying to explode multiple lists with a random length with random key names at the same time so I can remove the duplicates values without any gaps.

Before:

Random Key 1   Random Key 2   Random Key 3   Random Key 4   Random Key 5
list1          list2          list3          list4          list5

Expected Results:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Random Key 1   Random Key 2   Random Key 3   Random Key 4   Random Key 5
string1.1      string2.1      string3.1      string4.1      string5.1
string1.2      string2.2      string3.2      string4.2      string5.2
string1.3      string2.3      string3.3      string4.3      
string1.4      string2.4                     string4.4      
               string2.5                     string4.5      
               string2.6                     string4.6      
                                             string4.7      

I have very little experience on using Pandas Dataframes, but here is my code so far.

userList = pd.DataFrame.from_dict(data=user, orient="index")
userList = pd.DataFrame.transpose(userList)
allQuarters = list(set(allQuarters))


for quarters in allQuarters:
  userList = userList.explode(quarters)

  for headers in userList:
    userList.loc[userList[headers].duplicated(), headers] = ""

userList is the completed random dictionary of the random keys and random lists with unknown lengths.
allQuarters is a method I tried using to check the random keys (or columns)

>Solution :

Assuming a DataFrame with a single row of lists, like this:

df = pd.DataFrame({'Random Key 1': [['string1.1', 'string1.2', 'string1.3', 'string1.4']],
                   'Random Key 2': [['string2.1', 'string2.2', 'string2.3', 'string2.4', 'string2.5', 'string2.6']],
                   'Random Key 3': [['string3.1', 'string3.2', 'string3.3']],
                   'Random Key 4': [['string4.1', 'string4.2', 'string4.3', 'string4.4', 'string4.5', 'string4.6', 'string4.7']],
                   'Random Key 5': [['string5.1', 'string5.2']]})

You could use explode with ignore_index=True for each column with apply:

df.apply(lambda x: x.explode(ignore_index=True))

Output:

  Random Key 1 Random Key 2 Random Key 3 Random Key 4 Random Key 5
0    string1.1    string2.1    string3.1    string4.1    string5.1
1    string1.2    string2.2    string3.2    string4.2    string5.2
2    string1.3    string2.3    string3.3    string4.3          NaN
3    string1.4    string2.4          NaN    string4.4          NaN
4          NaN    string2.5          NaN    string4.5          NaN
5          NaN    string2.6          NaN    string4.6          NaN
6          NaN          NaN          NaN    string4.7          NaN

If you have several rows, you could use the approach that I described here with a custom function:

def explode_dedup(s):
    s = s.explode()
    return s.set_axis(
        pd.MultiIndex.from_arrays([s.index, s.groupby(level=0).cumcount()])
    )

out = df.apply(explode_dedup)

Output:

    Random Key 1 Random Key 2 Random Key 3 Random Key 4 Random Key 5
0 0    string1.1    string2.1    string3.1    string4.1    string5.1
  1    string1.2    string2.2    string3.2    string4.2    string5.2
  2    string1.3    string2.3    string3.3    string4.3          NaN
  3    string1.4    string2.4          NaN    string4.4          NaN
  4          NaN    string2.5          NaN    string4.5          NaN
  5          NaN    string2.6          NaN    string4.6          NaN
  6          NaN          NaN          NaN    string4.7          NaN
1 0    string1.1    string2.1    string3.1    string4.1    string5.1
  1    string1.2    string2.2    string3.2    string4.2    string5.2
  2    string1.3    string2.3    string3.3    string4.3          NaN
  3    string1.4    string2.4          NaN    string4.4          NaN
  4          NaN    string2.5          NaN    string4.5          NaN
  5          NaN    string2.6          NaN    string4.6          NaN
  6          NaN          NaN          NaN    string4.7          NaN

Alternative input:

df = pd.DataFrame({'Random Key 1': [['string1.1', 'string1.2', 'string1.3', 'string1.4'],
                                    ['string1.1', 'string1.2', 'string1.3', 'string1.4']],
                   'Random Key 2': [['string2.1', 'string2.2', 'string2.3', 'string2.4', 'string2.5', 'string2.6'],
                                    ['string2.1', 'string2.2', 'string2.3', 'string2.4', 'string2.5', 'string2.6']],
                   'Random Key 3': [['string3.1', 'string3.2', 'string3.3'], ['string3.1', 'string3.2', 'string3.3']],
                   'Random Key 4': [['string4.1', 'string4.2', 'string4.3', 'string4.4', 'string4.5', 'string4.6', 'string4.7'],
                                    ['string4.1', 'string4.2', 'string4.3', 'string4.4', 'string4.5', 'string4.6', 'string4.7']],                  
                   'Random Key 5': [['string5.1', 'string5.2'], ['string5.1', 'string5.2']]})
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading