Let’s say I have a DataFrame…
data = {'PVOL': [190, 105, 100, 150, 100, 170], 'STKS': [2000, 2500, 3000, 3500, 4000, 4500],
'CVOL': [64, 179, 98, 281, 86, 90]}
df = pd.DataFrame(data)
Now I want to find 3 largest values for all other columns(PVOL, CVOL…etc as my df may have multiple other columns too) and their corresponding value of column STKS in following manner (as string/print):
PVOL - 2000[190], 4500[170], 3500[150]
CVOL - 3500[281], 2500[179], 3000[98]
I tried following to get 2 largest values in df format …
columns_name = list(df.columns)
columns_name.remove('STKS')
data_dict = {}
for col in columns_name:
temp=[]
data=df.sort_values(col, ascending=False)[:2][[col,'STKS']].values
for row in data:
temp.append(row[1])
temp.append(row[0])
data_dict[col]=temp
new_df1=pd.DataFrame(data_dict,index="STK VOL STK VOL".split())
new_df1.set_axis(["PVOL", "CVOL"], axis='columns', inplace=True)
Vol_df = new_df1[["PVOL", "CVOL"]]
print(Vol_df)
Is there any simple method to do that plz ??? I also read about..
df.nlargest()
Thanks.
>Solution :
Using a custom function with nlargest:
def f(s, n=3):
x = s.nlargest(n)
return ', '.join(f'{a}[{b}]' for a,b in zip(x.index, x))
df.set_index('STKS').apply(f)
Output:
PVOL 2000[190], 4500[170], 3500[150]
CVOL 3500[281], 2500[179], 3000[98]
dtype: object
If you want strings:
for key, col in df.set_index('STKS').items():
x = col.nlargest(3)
s = ', '.join(f'{a}[{b}]' for a,b in zip(x.index, x))
print(f'{key} - {s}')
Output:
PVOL - 2000[190], 4500[170], 3500[150]
CVOL - 3500[281], 2500[179], 3000[98]
only for a subset of columns:
cols = ['CCHOI', 'PIV']
for key, col in df.set_index('STKS')[cols].items():
x = col.nlargest(3)
s = ', '.join(f'{a}[{b}]' for a,b in zip(x.index, x))
print(f'{key} - {s}')