I have multiple columns in my dataset, two of which are "id" and "Sentiment". I am trying to find the frequency of each sentiment for each id and add it as a new column in the same dataset. I have tried several commands, but none of them gives the correct frequency. The sample data and the commands that I expected to work are as follows:
Sample DataFrame:
import pandas as pd

data = {'id': ['205', '205', '204', '204', '204'], 'First_name': ['Jon','Bill','Maria','Emma', 'Bee'],
        'Sentiment': ['Positive', 'Positive', 'Neutral', 'Positive', 'Positve']}
df = pd.DataFrame(data)
and the commands that I tried:
for x in df['id']:
    df['sent_freq'] = df.Sentiment.map(df.Sentiment.value_counts())
Or
df['sent_freq'] = df.groupby('Sentiment')['id'].transform('count')
The output that I get from both is:
    id First_name Sentiment  sent_freq
0  205        Jon  Positive          3
1  205       Bill  Positive          3
2  204      Maria   Neutral          1
3  204       Emma  Positive          3
4  204        Bee   Positve          1
which is wrong, as it should be
    id First_name Sentiment  sent_freq
0  205        Jon  Positive          2
1  205       Bill  Positive          2
2  204      Maria   Neutral          1
3  204       Emma  Positive          2
4  204        Bee   Positve          2
Any leads will be highly appreciated.
>Solution :
Example
Your example data has a typo ('Positve' in the last row). I fixed it:
import pandas as pd

data = {'id': ['205', '205', '204', '204', '204'], 'First_name': ['Jon','Bill','Maria','Emma', 'Bee'],
        'Sentiment': ['Positive', 'Positive', 'Neutral', 'Positive', 'Positive']}
df = pd.DataFrame(data)
Code
Replace 'count' with pd.Series.nunique:
df['sent_freq'] = df.groupby('Sentiment')['id'].transform(pd.Series.nunique)
output:
    id First_name Sentiment  sent_freq
0  205        Jon  Positive          2
1  205       Bill  Positive          2
2  204      Maria   Neutral          1
3  204       Emma  Positive          2
4  204        Bee  Positive          2
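
Note that pd.Series.nunique counts the distinct ids within each Sentiment, which happens to give the expected numbers for this sample. If the goal is the number of rows per (id, Sentiment) pair, as the question describes, grouping on both columns may be closer to what is wanted. A minimal sketch, assuming the corrected sample data from this answer:

```python
import pandas as pd

# Corrected sample data (typo 'Positve' fixed to 'Positive').
data = {
    'id': ['205', '205', '204', '204', '204'],
    'First_name': ['Jon', 'Bill', 'Maria', 'Emma', 'Bee'],
    'Sentiment': ['Positive', 'Positive', 'Neutral', 'Positive', 'Positive'],
}
df = pd.DataFrame(data)

# Group on both columns, count the rows in each (id, Sentiment) group,
# and broadcast that count back onto every row of the group.
df['sent_freq'] = df.groupby(['id', 'Sentiment'])['Sentiment'].transform('count')

print(df)
```

On this sample both approaches give the same sent_freq column, but they can diverge on other data, for example when one id has the same sentiment on more rows than there are distinct ids with that sentiment.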