I have a randomly generated Dataframe with random headers by using request apis from a website. I am trying to explode multiple lists with a random length with random key names at the same time so I can remove the duplicates values without any gaps.
Before:
Random Key 1 Random Key 2 Random Key 3 Random Key 4 Random Key 5
list1 list2 list3 list4 list5
Expected Results:
Random Key 1 Random Key 2 Random Key 3 Random Key 4 Random Key 5
string1.1 string2.1 string3.1 string4.1 string5.1
string1.2 string2.2 string3.2 string4.2 string5.2
string1.3 string2.3 string3.3 string4.3
string1.4 string2.4 string4.4
string2.5 string4.5
string2.6 string4.6
string4.7
I have very little experience on using Pandas Dataframes, but here is my code so far.
userList = pd.DataFrame.from_dict(data=user, orient="index")
userList = pd.DataFrame.transpose(userList)
allQuarters = list(set(allQuarters))
for quarters in allQuarters:
userList = userList.explode(quarters)
for headers in userList:
userList.loc[userList[headers].duplicated(), headers] = ""
userList is the completed random dictionary of the random keys and random lists with unknown lengths.
allQuarters is a method I tried using to check the random keys (or columns)
>Solution :
Assuming a DataFrame with a single row of lists, like this:
df = pd.DataFrame({'Random Key 1': [['string1.1', 'string1.2', 'string1.3', 'string1.4']],
'Random Key 2': [['string2.1', 'string2.2', 'string2.3', 'string2.4', 'string2.5', 'string2.6']],
'Random Key 3': [['string3.1', 'string3.2', 'string3.3']],
'Random Key 4': [['string4.1', 'string4.2', 'string4.3', 'string4.4', 'string4.5', 'string4.6', 'string4.7']],
'Random Key 5': [['string5.1', 'string5.2']]})
You could use explode with ignore_index=True for each column with apply:
df.apply(lambda x: x.explode(ignore_index=True))
Output:
Random Key 1 Random Key 2 Random Key 3 Random Key 4 Random Key 5
0 string1.1 string2.1 string3.1 string4.1 string5.1
1 string1.2 string2.2 string3.2 string4.2 string5.2
2 string1.3 string2.3 string3.3 string4.3 NaN
3 string1.4 string2.4 NaN string4.4 NaN
4 NaN string2.5 NaN string4.5 NaN
5 NaN string2.6 NaN string4.6 NaN
6 NaN NaN NaN string4.7 NaN
If you have several rows, you could use the approach that I described here with a custom function:
def explode_dedup(s):
s = s.explode()
return s.set_axis(
pd.MultiIndex.from_arrays([s.index, s.groupby(level=0).cumcount()])
)
out = df.apply(explode_dedup)
Output:
Random Key 1 Random Key 2 Random Key 3 Random Key 4 Random Key 5
0 0 string1.1 string2.1 string3.1 string4.1 string5.1
1 string1.2 string2.2 string3.2 string4.2 string5.2
2 string1.3 string2.3 string3.3 string4.3 NaN
3 string1.4 string2.4 NaN string4.4 NaN
4 NaN string2.5 NaN string4.5 NaN
5 NaN string2.6 NaN string4.6 NaN
6 NaN NaN NaN string4.7 NaN
1 0 string1.1 string2.1 string3.1 string4.1 string5.1
1 string1.2 string2.2 string3.2 string4.2 string5.2
2 string1.3 string2.3 string3.3 string4.3 NaN
3 string1.4 string2.4 NaN string4.4 NaN
4 NaN string2.5 NaN string4.5 NaN
5 NaN string2.6 NaN string4.6 NaN
6 NaN NaN NaN string4.7 NaN
Alternative input:
df = pd.DataFrame({'Random Key 1': [['string1.1', 'string1.2', 'string1.3', 'string1.4'],
['string1.1', 'string1.2', 'string1.3', 'string1.4']],
'Random Key 2': [['string2.1', 'string2.2', 'string2.3', 'string2.4', 'string2.5', 'string2.6'],
['string2.1', 'string2.2', 'string2.3', 'string2.4', 'string2.5', 'string2.6']],
'Random Key 3': [['string3.1', 'string3.2', 'string3.3'], ['string3.1', 'string3.2', 'string3.3']],
'Random Key 4': [['string4.1', 'string4.2', 'string4.3', 'string4.4', 'string4.5', 'string4.6', 'string4.7'],
['string4.1', 'string4.2', 'string4.3', 'string4.4', 'string4.5', 'string4.6', 'string4.7']],
'Random Key 5': [['string5.1', 'string5.2'], ['string5.1', 'string5.2']]})