Frequency of categories in a column conditioned on another column for each observation

My dataset has several columns, two of which are "id" and "Sentiment". I want to find the frequency of each sentiment for each id and add it as a new column in the same dataset. I have tried several commands but have not been able to get the correct frequency. The sample data and the commands I tried are shown below.

Sample DataFrame:

import pandas as pd

data = {'id': ['205', '205', '204', '204', '204'], 'First_name': ['Jon','Bill','Maria','Emma', 'Bee'], 
     'Sentiment': ['Positive', 'Positive', 'Neutral', 'Positive', 'Positve']}
df = pd.DataFrame(data)

and the commands that I tried:

for x in df['id']:
  df['sent_freq'] = df.Sentiment.map(df.Sentiment.value_counts())

Or

df['sent_freq'] = df.groupby('Sentiment')['id'].transform('count')

The output that I get from both is:

    id   First_name  Sentiment  sent_freq
0   205  Jon         Positive   3
1   205  Bill        Positive   3
2   204  Maria       Neutral    1
3   204  Emma        Positive   3
4   204  Bee         Positve    1

which is wrong, as it should be

    id   First_name  Sentiment  sent_freq
0   205  Jon         Positive   2
1   205  Bill        Positive   2
2   204  Maria       Neutral    1
3   204  Emma        Positive   2
4   204  Bee         Positve    2

Any leads will be highly appreciated.

>Solution :

Example

Your example data contains a typo: the last Sentiment value is 'Positve' instead of 'Positive'. Here it is corrected:

import pandas as pd

data = {'id': ['205', '205', '204', '204', '204'], 'First_name': ['Jon','Bill','Maria','Emma', 'Bee'], 
     'Sentiment': ['Positive', 'Positive', 'Neutral', 'Positive', 'Positive']}
df = pd.DataFrame(data)
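
Stray labels such as the 'Positve' in the question's original data can be spotted by listing the distinct values of the column. The snippet below is only a sketch of that check; the `sentiments` and `counts` names are made up for illustration.

```python
import pandas as pd

# Sentiment values exactly as they appear in the question, typo included.
sentiments = pd.Series(['Positive', 'Positive', 'Neutral', 'Positive', 'Positve'])

# value_counts lists every distinct label with its number of occurrences,
# so a misspelled label shows up as its own category.
counts = sentiments.value_counts()
print(counts)
```

A label that appears only once and closely resembles another is a good candidate for a typo.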

Code

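Replace 'count' with pd.Series.nunique in the transform call, so that each row gets the number of distinct ids that share its Sentiment: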

df['sent_freq'] = df.groupby('Sentiment')['id'].transform(pd.Series.nunique)

output:

    id  First_name  Sentiment       sent_freq
0   205 Jon         Positive        2
1   205 Bill        Positive        2
2   204 Maria       Neutral         1
3   204 Emma        Positive        2
4   204 Bee         Positive        2
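
Note that nunique counts distinct ids per sentiment, which is not quite the same as "the frequency of each sentiment for each id". If the latter is what is wanted, one possible alternative (an assumption about the intent, not part of the original answer) is to group on both columns and count the rows in each (id, Sentiment) pair. On the corrected sample data this sketch gives the same sent_freq column as above:

```python
import pandas as pd

data = {
    'id': ['205', '205', '204', '204', '204'],
    'First_name': ['Jon', 'Bill', 'Maria', 'Emma', 'Bee'],
    'Sentiment': ['Positive', 'Positive', 'Neutral', 'Positive', 'Positive'],
}
df = pd.DataFrame(data)

# Count the rows in each (id, Sentiment) group and broadcast that count
# back onto every row of the group.
df['sent_freq'] = df.groupby(['id', 'Sentiment'])['Sentiment'].transform('count')
print(df)
```

The two approaches diverge on other data: if a third id had a single 'Positive' row, nunique would report 3 for every 'Positive' row, while the two-column grouping would report 1 for that new row and leave the others unchanged.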