Can the .loc command be used with groupby's apply function

May 11, 2023

This question already has answers here:
Pandas conditional creation of a series/dataframe column (13 answers)
Your post has been associated with a similar question. If that question doesn’t answer your issue, edit your question to highlight the difference between the associated question and yours. If edited, your question will be reviewed and might be reopened.

Find out more about duplicates and why your question has been closed.

Closed yesterday.

Hi. I have a DataFrame with multiple columns where I have successfully assigned values in a new column (sent_inclination) given a condition. However, I want to change the output such that the values are given to the entire given group identified by the column "id". The condition is: if "freq_cent" > 0.5, then the new column should have the value from the column ‘Sentiment’ in it for the given id. That is whether the freq_cent is greater than 0.5 or less than 0.5 for a given observation, the new column should contain the sentiment value that qualifies for greater than 0.5 for that entire group.
I am able to assign values to the entire DataFrame but cannot make sure that the values are the same for the entire group.

Here is my sample DataFrame:

data = {'id': ['205', '205', '204', '204', '204'], 
         'Sentiment': ['Positive', 'Positive', 'Neutral', 'Positive', 'Positive']}
df = pd.DataFrame(data)

df['freq'] = df.groupby('Sentiment')['id'].transform(pd.Series.nunique)
df['freq_sum'] = df.groupby('id')['freq'].transform(pd.Series.count)
df['freq_cent'] = (df['freq']/df['freq_sum'])

where if I apply the code:

df['sent_inclination'] = df.loc[df['freq_cent'] >0.5, ['Sentiment']]

I get the output:

    id  Sentiment   freq    freq_sum    freq_cent   sent_inclination
0   205 Positive    2       2           1.000000    Positive
1   205 Positive    2       2           1.000000    Positive
2   204 Neutral     1       3           0.333333    NaN
3   204 Positive    2       3           0.666667    Positive
4   204 Positive    2       3           0.666667    Positive

The desired output should have ‘sent_inclination’ as Positive for all observations where id is 204, that is:

    id  Sentiment   freq    freq_sum    freq_cent   sent_inclination
0   205 Positive    2       2           1.000000    Positive
1   205 Positive    2       2           1.000000    Positive
2   204 Neutral     1       3           0.333333    Positive
3   204 Positive    2       3           0.666667    Positive
4   204 Positive    2       3           0.666667    Positive

How can I achieve this? Any suggestions will be highly appreciated. Unfortunately the groupby.filter method doesn’t work for me.

So far I have tried multiple codes, some of which are as follows:

df['sent_inclination'] = df.loc[df.groupby('id').apply(lambda x: df.loc[df['freq_cent'] >0.5, df['Sentiment']])]

df['sent_inclination'] = df.groupby('id').apply(lambda x: (df.query('freq_cent >0.5')['Sentiment']))

df.groupby('id').apply(lambda x: x['sent_inclination'] == x['Sentiment'] if (x['freq_cent'] > 0.5) else '')

df.groupby('id').apply(lambda x: x['sent_inclination'] == (df.query('freq_cent >0.5')['Sentiment']))

>Solution :

I recommend to use groupbyfrom pandas und wherefrom numpy:

import pandas as pd
import numpy as np

#this will get you a appended dataframe where the maximum per group is picked (you can also use "mean" instead of "max" to get the group average
df = pd.merge(df, df.groupby(['id'])['freq_cent'].max().reset_index(), on='id', how='left')

#this will check the value is greater then 0,5
df['sent_inclination'] = np.where(df['freq_cent_y'] >0.5, 'Positive', df['Sentiment'])

#cleaning and rename
df.rename(columns={"freq_cent_x": "freq_cent_x"}, inplace=True)
df = df[['id', 'freq', 'freq_sum', 'freq_cent_x', 'sent_inclination']]

output:

print(df)
   id  freq  freq_sum  freq_cent_x sent_inclination
0  205     2         2     1.000000         Positive
1  205     2         2     1.000000         Positive
2  204     1         3     0.333333         Positive
3  204     2         3     0.666667         Positive
4  204     2         3     0.666667         Positive