Add new column in dataframe based on multiple column conditions

I have the following dataframe with sentiments:

Text                                                      Negative  Neutral  Positive
I lost my phone. I am sad                                 0.8       0.15     0.05
How is your day?                                          0.1       0.8      0.1
Let’s go out for dinner today.                            0.06      0.55     0.39
I am super pissed at my friend for cancelling the party.  0.73      0.11     0.16
I am so happy  I want to dance                            0         0.1      0.9
I am not sure if I should laugh or just smile             0.08      0.24     0.68

This is based on the sentiment analysis I have completed. Now, each text can be tagged with exactly one of these five labels:

Very Negative, Negative, Neutral, Positive, Very Positive.

I want to add a new column to the dataframe that tags each text according to the following rules:

1. If Negative or Positive is the dominant (largest) score and it is >= 0.8 (80%), tag the text as Very Negative or Very Positive.

2. If Negative or Positive is the dominant score and it is >= 0.5 but less than 0.8, tag the text as just Negative or Positive.

3. If Neutral is >= 0.5, tag the text as Neutral. There is no such thing as Very Neutral.

For the above example, the result should look like below:

Text                                                      Negative  Neutral  Positive  Sentiment
I lost my phone. I am sad                                 0.8       0.15     0.05      Very Negative
How is your day?                                          0.1       0.8      0.1       Neutral
Let’s go out for dinner today.                            0.06      0.55     0.39      Neutral
I am super pissed at my friend for cancelling the party.  0.73      0.11     0.16      Negative
I am so happy  I want to dance                            0         0.1      0.9       Very Positive
I am not sure if I should laugh or just smile             0.08      0.24     0.68      Positive

How can I perform this operation on a dataframe? I then want to plot a graph showing the distribution of those 5 sentiments. That part I can do, but I am struggling to get the multiple conditions working in pandas.

Any help is greatly appreciated.

Solution:

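You can use np.select(). It takes a list of boolean conditions and a matching list of values and, for each row, returns the value that belongs to the first condition that is true.

import numpy as np

# Order matters: np.select uses the first condition that is True in each row
conditions = [df['Positive']>=0.80, df['Negative']>=0.80, ((df['Positive']>=0.50) & (df['Positive']<0.80)),
              ((df['Negative']>=0.50) & (df['Negative']<0.80)), df['Neutral']>=0.5]
values = ['Very Positive', 'Very Negative', 'Positive', 'Negative', 'Neutral']
# Placeholder label for rows matching no rule; default=np.nan with string values raises a TypeError in recent NumPy
df['Sentiment'] = np.select(conditions, values, default='Unclassified')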

OUTPUT

                                               Text  Negative  Neutral  Positive      Sentiment
0                          I lost my phone. I am sad      0.80     0.15      0.05  Very Negative
1                                   How is your day?      0.10     0.80      0.10        Neutral
2                     Let's go out for dinner today.      0.06     0.55      0.39        Neutral
3  I am super pissed at my friend for cancelling ...      0.73     0.11      0.16       Negative
4                     I am so happy  I want to dance      0.00     0.10      0.90  Very Positive
5      I am not sure if I should laugh or just smile      0.08     0.24      0.68       Positive
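
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample dataframe from the question, applies the same np.select() rules, and counts how many texts fall under each of the five tags, which is the input the distribution plot would need. The "Unclassified" default and the `order` and `counts` names are my own assumptions for illustration and are not part of the original question; the explicit `< 0.80` upper bounds are dropped here because np.select() already stops at the first matching condition.

```python
import numpy as np
import pandas as pd

# Sample scores copied from the question
df = pd.DataFrame({
    "Text": [
        "I lost my phone. I am sad",
        "How is your day?",
        "Let's go out for dinner today.",
        "I am super pissed at my friend for cancelling the party.",
        "I am so happy  I want to dance",
        "I am not sure if I should laugh or just smile",
    ],
    "Negative": [0.80, 0.10, 0.06, 0.73, 0.00, 0.08],
    "Neutral":  [0.15, 0.80, 0.55, 0.11, 0.10, 0.24],
    "Positive": [0.05, 0.10, 0.39, 0.16, 0.90, 0.68],
})

# Conditions are checked in order; the first True one decides the tag
conditions = [
    df["Positive"] >= 0.80,
    df["Negative"] >= 0.80,
    df["Positive"] >= 0.50,  # only reached when Positive < 0.80
    df["Negative"] >= 0.50,  # only reached when Negative < 0.80
    df["Neutral"] >= 0.50,
]
values = ["Very Positive", "Very Negative", "Positive", "Negative", "Neutral"]

# "Unclassified" is a hypothetical placeholder for rows where no score reaches 0.5
df["Sentiment"] = np.select(conditions, values, default="Unclassified")

# Count each tag in a fixed order, keeping tags that never occur as 0
order = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
counts = df["Sentiment"].value_counts().reindex(order, fill_value=0)
print(counts)
```

From here, something like counts.plot(kind="bar") should give the distribution chart, assuming matplotlib is installed.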