I have a dataframe that looks like this:
import pandas as pd
a=['a','b','c','d']
b=['the','fox','the','then']
c=['quick','jumps','lazy','barks']
d=['brown','over','dog','loudly']
df=pd.DataFrame(zip(a,b,c,d),columns=['indexcol','col1','col2','col3'])
and a dictionary that looks like this:
keys=['a','b','c','d']
vals=[]
vals.append(['col1','col3'])
vals.append(['col1','col2'])
vals.append(['col1','col2','col3'])
vals.append(['col2','col3'])
newdict = {k: v for k, v in zip(keys, vals)}
What I’m trying to do is create a new column in df that holds a statement for each row. Taking the first row as an example, the sentence should look like this:
"col1 is 'the' | col3 is 'brown'"
Another example, using the 3rd row, just to make the task crystal clear:
"col1 is 'the' | col2 is 'lazy' | col3 is 'dog'"
Essentially, I want to use each dictionary key as the row reference (matched against indexcol in df) and each dictionary value as the list of columns to look up in that row.
Thanks in advance.
Solution:
I’m not sure if I understand you correctly, but you can try:
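df = df.set_index("indexcol")
for k, v in newdict.items():
    row = df.loc[k]  # the row whose indexcol matches the dict key
    # join "col is 'value'" for every column listed under that key
    df.loc[k, "new_column"] = " | ".join(f"{i} is '{row[i]}'" for i in v)
print(df.reset_index())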
Prints:
   indexcol  col1   col2    col3                                      new_column
0         a   the  quick   brown                 col1 is 'the' | col3 is 'brown'
1         b   fox  jumps    over                 col1 is 'fox' | col2 is 'jumps'
2         c   the   lazy     dog  col1 is 'the' | col2 is 'lazy' | col3 is 'dog'
3         d  then  barks  loudly              col2 is 'barks' | col3 is 'loudly'
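If you'd rather not set the index and loop over the dictionary, a row-wise DataFrame.apply is another option. This is a minimal sketch, not a drop-in replacement: build_statement is a hypothetical helper name, and it assumes that a row whose indexcol has no entry in newdict should get an empty string (the loop above would simply leave such a row as NaN).

```python
import pandas as pd

a = ['a', 'b', 'c', 'd']
b = ['the', 'fox', 'the', 'then']
c = ['quick', 'jumps', 'lazy', 'barks']
d = ['brown', 'over', 'dog', 'loudly']
df = pd.DataFrame(list(zip(a, b, c, d)),
                  columns=['indexcol', 'col1', 'col2', 'col3'])

newdict = {
    'a': ['col1', 'col3'],
    'b': ['col1', 'col2'],
    'c': ['col1', 'col2', 'col3'],
    'd': ['col2', 'col3'],
}

def build_statement(row):
    # Hypothetical helper: look up the column list for this row's indexcol;
    # assumption: rows with no dict entry get an empty string.
    cols = newdict.get(row['indexcol'], [])
    return " | ".join(f"{col} is '{row[col]}'" for col in cols)

# axis=1 passes each row (as a Series) to build_statement
df['new_column'] = df.apply(build_statement, axis=1)
print(df)
```

apply(axis=1) still calls the function once per row, so like the loop it is fine for small frames but won't be fast on very large ones.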