Loading Pandas Dataframe with skipped sentiment

I have this dataset for sentiment analysis and load it with this code:

import pandas as pd

url = 'https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/amazon_cells_labelled.tsv'
df = pd.read_csv(url, sep='\t', names=["Sentence", "Feeling"])

The issue is that the DataFrame contains rows where Feeling is NaN, but each of those rows is just one part of a longer sentence whose score appears on a later row.

The output right now looks like this:

sentence                      feeling
I do not like it.             NaN
I give it a bad score.        0

The output should look like this:

sentence                                    feeling
I do not like it. I give it a bad score     0

Can you help me concatenate the split rows, or load the dataset so that each sentence is grouped with its score?

>Solution :

Create virtual groups so that each group ends at a row with a non-NaN Feeling, then group by them and aggregate the rows:

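# Rows up to and including each labelled row share one group id
grp = df['Feeling'].notna().cumsum().shift(fill_value=0)
# Join the sentence fragments and keep the group's score ('last' skips NaN)
out = df.groupby(grp).agg({'Sentence': ' '.join, 'Feeling': 'last'})
print(out)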

# Output:
                                                  Sentence  Feeling
Feeling                                                            
0        I try not to adjust the volume setting to avoi...      0.0
1                              Good case, Excellent value.      1.0
2        I thought Motorola made reliable products!. Ba...      1.0
3        When I got this item it was larger than I thou...      0.0
4                                        The mic is great.      1.0
...                                                    ...      ...
996      But, it was cheap so not worth the expense or ...      0.0
997      Unfortunately, I needed them soon so i had to ...      0.0
998      The only thing that disappoint me is the infra...      0.0
999      No money back on this one. You can not answer ...      0.0
1000     It's rugged. Well this one is perfect, at the ...      NaN

[1001 rows x 2 columns]
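
To see why the shifted cumulative sum produces the right groups, here is a minimal sketch on a small in-memory DataFrame. The three rows are made-up toy data modelled on the example in the question, not rows taken from the real file, and the `reset_index` and `Int64` steps are optional tidying that the answer above does not include.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one sentence split over two rows, with the score
# only on the last fragment, followed by a normal single-row sentence.
df = pd.DataFrame({
    "Sentence": ["I do not like it.", "I give it a bad score.", "The mic is great."],
    "Feeling": [np.nan, 0, 1],
})

# notna()  -> [False, True, True]
# cumsum() -> [0, 1, 2]   (the counter increases on each labelled row)
# shift()  -> [0, 0, 1]   (so each labelled row closes its own group)
grp = df["Feeling"].notna().cumsum().shift(fill_value=0)

out = (
    df.groupby(grp)
      .agg({"Sentence": " ".join, "Feeling": "last"})
      .reset_index(drop=True)   # drop the group ids, keep a plain 0..n-1 index
)

# Optional: nullable integer dtype, so scores show as 0/1 instead of 0.0/1.0
out["Feeling"] = out["Feeling"].astype("Int64")
print(out)
```

On the real dataset, a final group with no score (like row 1000 in the output above) could be dropped with `out.dropna(subset=["Feeling"])`, assuming unlabelled trailing text is not needed.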