I have several Dataframes in a list
df1 = {'col1': ['a', 'a', 'b', 'c', 'c', 'd'], 'col2': [3, 4, 3, 8, 4, 5], 'col3': ['cat', 'cat', 'cat', 'cat', 'cat', 'cat']}
df2 = {'col1': ['a', 'a', 'b', 'c', 'c', 'd'], 'col2': [30, 40, 30, 80, 40, 50], 'col3': ['dog', 'dog', 'dog', 'dog', 'dog', 'dog']}
df3 = {'col1': ['a', 'a', 'b', 'c', 'c', 'd'], 'col2': [31, 41, 31, 81, 41, 51], 'col3': ['whale', 'whale', 'whale', 'whale', 'whale', 'whale']}
dfList = [df1, df2, df3]
What I want is a list of dataframes where, where each new dataframe for each value in col1, with col2 and col3 as columns. The values of col2 and col3 should come from each dataframe df1, df2, df3.
a
col2 col3
3 cat
30 dog
31 whale
a
col2 col3
4 cat
40 dog
41 whale
b
col2 col3
3 cat
30 dog
31 whale
c
col2 col3
8 cat
80 dog
81 whale
....
I would also be interested if it is possible to have that in one dataframe, something like
col2 col3
a 3 cat
30 dog
31 whale
a 4 cat
40 dog
41 whale
b 3 cat
30 dog
31 whale
c 8 cat
80 dog
81 whale
....
>Solution :
You can use concat and groupby:
df = (pd.concat(dfList)
.assign(id=lambda d: d.groupby(['col1', 'col3']).cumcount())
)
out = [d.drop(columns='id') for k,d in df.groupby(['col1', 'id'])]
output:
[ col1 col2 col3
0 a 3 cat
0 a 30 dog
0 a 31 whale,
col1 col2 col3
1 a 4 cat
1 a 40 dog
1 a 41 whale,
col1 col2 col3
2 b 3 cat
2 b 30 dog
2 b 31 whale,
col1 col2 col3
3 c 8 cat
3 c 80 dog
3 c 81 whale,
col1 col2 col3
4 c 4 cat
4 c 40 dog
4 c 41 whale,
col1 col2 col3
5 d 5 cat
5 d 50 dog
5 d 51 whale]
Alternative:
df.set_index(['col1', 'id']).sort_index()
output:
col2 col3
col1 id
a 0 3 cat
0 30 dog
0 31 whale
1 4 cat
1 40 dog
1 41 whale
b 0 3 cat
0 30 dog
0 31 whale
c 0 8 cat
0 80 dog
0 81 whale
1 4 cat
1 40 dog
1 41 whale
d 0 5 cat
0 50 dog
0 51 whale