Assume there is a pandas DataFrame such as:
import pandas as pd
df = pd.DataFrame({'items': [[101, 102], [102, 101], [102, 103],
                             [101, 103], [101, 101], [102, 102],
                             [103, 103]],
                   'value': [12, 13, 11, 15, 17, 8, 19]})
print(df)
        items  value
0  [101, 102]     12
1  [102, 101]     13
2  [102, 103]     11
3  [101, 103]     15
4  [101, 101]     17
5  [102, 102]      8
6  [103, 103]     19
I would like to group the rows by the first element of each list in df['items'] and sum value within each group, such that the rows with items [101, 102], [101, 103] and [101, 101] give 12 + 15 + 17 = 44. The same should be done for 102 and 103. The final DataFrame should look something like
0 101 44
1 102 32
2 103 19
This is my code, but it does not give the correct result:
df1 = df.groupby(df['items'][1]).agg({'value':sum})
Any suggestions? Many thanks.
>Solution :
In [168]: df.groupby(df["items"].str[0]).agg({"value": "sum"})
Out[168]:
       value
items
101       44
102       32
103       19
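df["items"][0] would select the 0th element of the Series (the whole list in the first row), not the 0th element of each list in the Series. To index into every list, use the .str accessor. Its name is short for "string", but its [...] indexing works element-wise on any values that support indexing, lists included (duck typing), so it can be used on a column of lists as well. Note that Python is 0-indexed, so the first element is at position 0, not 1.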
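
As a minimal end-to-end sketch of the above, the snippet below rebuilds the DataFrame from the question, shows the difference between df["items"][0] and df["items"].str[0], and then produces a frame shaped like the requested output. The variable names (whole_first_row, first_item, first_item_alt, result) and the reset_index step are my own additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "items": [[101, 102], [102, 101], [102, 103],
              [101, 103], [101, 101], [102, 102],
              [103, 103]],
    "value": [12, 13, 11, 15, 17, 8, 19],
})

# Plain indexing on the Series picks the row labelled 0, i.e. one whole list.
whole_first_row = df["items"][0]

# .str[0] indexes into every list, giving the first element of each row.
first_item = df["items"].str[0]

# An equivalent key built without the .str accessor (illustrative alternative).
first_item_alt = df["items"].map(lambda pair: pair[0])

# Group the values by that key and sum them; reset_index turns the
# group key back into an ordinary column.
result = df.groupby(first_item)["value"].sum().reset_index()
print(result)
```

Passing the Series first_item as the groupby key means the original items column does not need to be modified; the key only has to line up with df by index.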