I have the following dataframe:
import pandas as pd

V1 = ['a', 'a', 'c', 'd']
V2 = ['test1', 'test2', 'test3', 'test4']
df = pd.DataFrame({'V1': V1, 'V2': V2})
print(df.head())
  V1     V2
0  a  test1
1  a  test2
2  c  test3
3  d  test4
I would like to loop over it as follows:
for [unique element in V1 column]:
    for [corresponding elements in V2]:
I thought about building a dictionary with the following format:
dic = {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']}
for elt in dic:
    for i in dic[elt]:
        ...  # process i here
Is there a better or more efficient way to do this? If not, how can I build such a dictionary efficiently?
Many thanks for your help!
>Solution :
An option to build the dictionary using pandas would be:
dic = pd.Series(V2, index=V1).groupby(level=0).agg(list).to_dict()
output: {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']}
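If you already have the dataframe, you can probably skip the intermediate Series and group the V2 column directly. A minimal sketch, assuming the same V1 and V2 lists as in the question (the name dic_from_df is only for illustration):

```python
import pandas as pd

V1 = ['a', 'a', 'c', 'd']
V2 = ['test1', 'test2', 'test3', 'test4']
df = pd.DataFrame({'V1': V1, 'V2': V2})

# Group the V2 values by V1, collect each group into a list,
# then convert the resulting Series to a plain dict.
dic_from_df = df.groupby('V1')['V2'].agg(list).to_dict()
print(dic_from_df)
# {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']}
```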
In plain Python, use collections.defaultdict:
from collections import defaultdict
dic = defaultdict(list)
for k, v in zip(V1, V2):
    dic[k].append(v)
dict(dic)
# {'a': ['test1', 'test2'], 'c': ['test3'], 'd': ['test4']}
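Once the dictionary exists, the nested loop from the question can be written with dict.items(), which avoids the extra dic[elt] lookup. A small sketch under the same assumptions; the visited list is hypothetical and only records what the loop sees:

```python
from collections import defaultdict

V1 = ['a', 'a', 'c', 'd']
V2 = ['test1', 'test2', 'test3', 'test4']

# Build the key -> list-of-values mapping.
dic = defaultdict(list)
for k, v in zip(V1, V2):
    dic[k].append(v)

# Outer loop over unique V1 keys, inner loop over the matching V2 values.
visited = []
for key, values in dic.items():
    for value in values:
        visited.append((key, value))

print(visited)
# [('a', 'test1'), ('a', 'test2'), ('c', 'test3'), ('d', 'test4')]
```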
To loop over your values from the initial dataframe:
df = pd.DataFrame({'V1': V1, 'V2': V2})
for name, d in df.groupby('V1'):
    print(f'entering group {name}')
    for value in d['V2']:
        print(f' value {value}')
output:
entering group a
 value test1
 value test2
entering group c
 value test3
entering group d
 value test4
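If you only need the V2 values inside the loop, one variation is to select the column on the groupby object, so each iteration yields the group name and a Series of its values. This is a sketch, not the only way to do it; sort=False is an optional choice here that keeps the groups in order of first appearance instead of sorting the keys, and the groups list is just for illustration:

```python
import pandas as pd

V1 = ['a', 'a', 'c', 'd']
V2 = ['test1', 'test2', 'test3', 'test4']
df = pd.DataFrame({'V1': V1, 'V2': V2})

# Iterating a column-selected groupby yields (group name, Series of V2 values).
groups = []
for name, values in df.groupby('V1', sort=False)['V2']:
    groups.append((name, values.tolist()))

print(groups)
# [('a', ['test1', 'test2']), ('c', ['test3']), ('d', ['test4'])]
```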