I wrote the following code (not sure if this is the best approach). Note that my data is split into two separate lists that are already in the correct order: steps holds the step strings and userids_list holds the lists of user ids. In each tuple z, z[0] is the step and z[1] is its list.
for i, z in enumerate(zip(steps, userids_list)):
    print(z)
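As an aside, the loop index i is never used, so enumerate isn't strictly needed. Unpacking each pair that zip yields gives the step and its list directly. A small sketch, using made-up sample data (the values below are hypothetical stand-ins for the real lists):

```python
# Hypothetical sample data standing in for the real steps / userids_list
steps = ['Step 1', 'Step 2']
userids_list = [['user a', 'user b'], ['user c']]

# zip pairs the i-th step with the i-th list of user ids;
# unpacking each pair avoids indexing z[0] and z[1]
pairs = []
for step, ids in zip(steps, userids_list):
    print(step, ids)
    pairs.append((step, ids))
```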
This prints one tuple per step, like the following:
# SAMPLE
('Step 1 string', [list of user ids for that step])
('Step 2 string', [list of user ids for that step])
('Step 3 string', [list of user ids for that step])
('Step n string', [list of user ids for that step])
My goal is to transform this data into the following pandas DataFrame, with one row per user id:
Column 1    Column 2
Step 1      User id
Step 1      User id
Step 2      User id
Step 2      User id
Step 3      User id
Step 3      User id
Unfortunately I couldn't find a way to get the data into this shape. Any ideas on what I could try?
Solution:
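DataFrame.explode is perfect for this: it turns each element of a list-like column into its own row and repeats the values of the other columns. Load both lists into a DataFrame, then explode the column that holds the lists: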
import pandas as pd

df = pd.DataFrame({
    'Column 1': steps,         # the step strings
    'Column 2': userids_list,  # one list of user ids per step
})
df = df.explode('Column 2')
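Two details of explode are worth knowing. An empty list produces a single row with NaN in the exploded column rather than disappearing, and the original index labels are repeated. The sketch below uses hypothetical data where one step has no user ids; ignore_index=True (available in pandas 1.1 and later) renumbers the rows, which has the same effect as reset_index(drop=True):

```python
import pandas as pd

# Hypothetical data where 'Step 2' has no user ids
df = pd.DataFrame({
    'step': ['Step 1', 'Step 2'],
    'user_id': [['user a', 'user b'], []],
})

# ignore_index=True (pandas >= 1.1) gives a fresh 0..n-1 index
exploded = df.explode('user_id', ignore_index=True)
print(exploded)

# The empty list became one NaN row; drop it if such steps should be excluded
cleaned = exploded.dropna(subset=['user_id'])
print(cleaned)
```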
For example:
import pandas as pd

steps = ['Step 1', 'Step 2', 'Step 3']
user_ids = [
    ['user a', 'user b'],
    ['user a', 'user b', 'user c'],
    ['user c'],
]
df = pd.DataFrame({
    'step': steps,
    'user_id': user_ids,
})
# explode repeats each source row's index label; reset_index renumbers 0..n-1
df = df.explode('user_id').reset_index(drop=True)
print(df)
Output:
step user_id
0 Step 1 user a
1 Step 1 user b
2 Step 2 user a
3 Step 2 user b
4 Step 2 user c
5 Step 3 user c
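For comparison, the same long format can also be built without explode, by flattening the two lists into (step, user_id) tuples with a nested list comprehension and passing them to the DataFrame constructor. This is a swapped-in alternative, not part of the answer above; the data and column names are hypothetical. Note one behavioral difference: a step with an empty list simply produces no rows here, whereas explode would give it a NaN row.

```python
import pandas as pd

# Hypothetical sample data
steps = ['Step 1', 'Step 2', 'Step 3']
userids_list = [['user a', 'user b'], ['user a', 'user b', 'user c'], ['user c']]

# One (step, user_id) tuple per user id, in the original order
rows = [(step, uid) for step, ids in zip(steps, userids_list) for uid in ids]
df = pd.DataFrame(rows, columns=['Column 1', 'Column 2'])
print(df)
```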