How to solve Python Pandas assign error when creating new column

I have a df containing home descriptions:

description
0   Beautiful, spacious skylit studio in the heart...
1   Enjoy 500 s.f. top floor in 1899 brownstone, w...
2   The spaceHELLO EVERYONE AND THANKS FOR VISITIN...
3   We welcome you to stay in our lovely 2 br dupl...
4   Please don’t expect the luxury here just a bas...
5   Our best guests are seeking a safe, clean, spa...
6   Beautiful house, gorgeous garden, patio, cozy ...
7   Comfortable studio apartment with super comfor...
8   A charming month-to-month home away from home ...
9   Beautiful peaceful healthy homeThe spaceHome i...

I’m trying to count the number of sentences in each row (using sent_tokenize from nltk.tokenize) and append those values as a new column, sentence_count, to the df. Since this is part of a larger data pipeline, I’m using pandas assign so that I can chain operations.

I can’t seem to get it to work, though. I’ve tried:


df.assign(sentence_count=lambda x: len(sent_tokenize(x['description'])))

and

df.assign(sentence_count=len(sent_tokenize(df['description'])))

but both return the following:

TypeError: expected string or bytes-like object

I’ve confirmed that each row has a str dtype. Perhaps it’s because description has dtype('O')?

What am I doing wrong? Using pipe with a custom function works fine, but I’d prefer to use assign.

Solution:

In your first example, the x['description'] you pass to sent_tokenize is a pandas.Series, not a string: it is a Series (similar to a list) of strings. The same is true of df['description'] in your second example, which is why both attempts raise the TypeError. sent_tokenize must be applied to each string individually.
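The failure can be reproduced without nltk: any function from Python's re module expects a single string, so handing it a whole Series raises the same error. A minimal sketch (the regex and sample data are illustrative, not from the question):

```python
import re

import pandas as pd

df = pd.DataFrame({"description": ["First one. Second one.", "Only one."]})

# re.split, like sent_tokenize, expects one string, not a Series of strings
try:
    df.assign(sentence_count=lambda x: len(re.split(r"(?<=[.!?])\s+", x["description"])))
except TypeError as e:
    print(e)  # expected string or bytes-like object ...
```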

So instead, apply sent_tokenize to each element and take the length of the result, since you want a count rather than the sentences themselves:

df.assign(sentence_count=df['description'].apply(lambda d: len(sent_tokenize(d))))

Or, if you need to pass extra parameters to sent_tokenize:

df.assign(sentence_count=df['description'].apply(lambda d: len(sent_tokenize(d, language='english'))))
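A self-contained sketch of the whole pattern, using a naive regex splitter as a stand-in for nltk's sent_tokenize (which needs the punkt model downloaded); naive_sent_tokenize and the sample rows are assumptions for illustration:

```python
import re

import pandas as pd

def naive_sent_tokenize(text):
    # stand-in for nltk.tokenize.sent_tokenize:
    # split on ., !, or ? followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

df = pd.DataFrame({"description": [
    "Beautiful, spacious skylit studio. In the heart of the city!",
    "Enjoy the top floor. Quiet street. Great light.",
]})

# apply the tokenizer element-wise, then count, inside assign so it chains
result = df.assign(
    sentence_count=df["description"].apply(lambda d: len(naive_sent_tokenize(d)))
)
print(result["sentence_count"].tolist())  # [2, 3]
```

Because assign returns a new DataFrame, this step slots directly into a method chain with the rest of the pipeline.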