Obtain the average length of words of sentences in a dataframe column

July 27, 2022

Context: I’m trying to obtain the average length of words for a column in a dataframe.

Basically if we have these 3 sentences in a dataframe:

Sentence1 = "This is a sentence"
Sentence2 = "This is a larger sentence"
Sentence3 = "This is an even larger sentence"

The output should be their average length, counted in words. So for sentence1 `len(x.split(" "))` would be 4, for sentence2 it would be 5 and for sentence3 it would be 6, and their average would be 5.

How could I do this in a dataframe?

I was trying this

avg = df['strings'].apply(lambda x: np.mean([len(words.split(" ")) for words in x if isinstance(x, str)]))

This doesn’t really make sense, since "x" is already the string, so "words" actually loops through its characters, which is not what I want (and splitting a single character says nothing about the number of words).
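To make the problem concrete, here is a minimal sketch on one plain string rather than a column (the variable names are only for illustration). Strictly speaking, each character is still a str, so calling split on it does work; it just returns a one-element list for a letter and a two-element list for a space, which is useless for counting words:

```python
x = "This is a sentence"

# Iterating over a str yields single characters, not words.
chars = [c for c in x]

# Each character is still a str, so .split(" ") runs, but a letter
# splits to [letter] (length 1) and a space splits to ['', ''] (length 2).
per_char = [len(c.split(" ")) for c in x]

# What was actually wanted: split the whole string once.
word_count = len(x.split(" "))

print(chars[:4])     # ['T', 'h', 'i', 's']
print(per_char[:5])  # [1, 1, 1, 1, 2]
print(word_count)    # 4
```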

Also, it would be nice to filter out "strings" that only contain floats/NaN/only numbers (hence the isinstance(x, str)).

How could I get the length of x.split(" ") if and only if x is a string, and then average the word counts over all the sentences?

Thank you in advance

Solution:

import pandas as pd

df = pd.DataFrame({'sentence':
                   ["This is a sentence",
                    "This is a larger sentence",
                    "This is an even larger sentence",
                    "",
                    1,
                    None]})

df = 
                          sentence
0               This is a sentence
1        This is a larger sentence
2  This is an even larger sentence
3                                 
4                                1
5                             None

df['length'] = df['sentence'].apply(
    lambda row: min(len(row.split(" ")), len(row)) if isinstance(row, str) else None
)

df['length'] = 
0    4.0
1    5.0
2    6.0
3    0.0
4    NaN
5    NaN

df['length'].mean() = 3.75
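The 3.75 comes from the four string rows only: pandas skips NaN by default when averaging, so the result is (4 + 5 + 6 + 0) / 4. A minimal check, rebuilding the length column by hand from the output above (the variable names are just for illustration), also shows that dropping the empty-string row gives the average of 5 that the question expected:

```python
import numpy as np
import pandas as pd

# Lengths copied from the output above; NaN marks the non-string rows.
length = pd.Series([4.0, 5.0, 6.0, 0.0, np.nan, np.nan])

# Series.mean() ignores NaN by default (skipna=True).
mean_skipping_nan = length.mean()

# Keeping only positive lengths also drops the empty-string row
# (NaN > 0 evaluates to False, so NaN rows are excluded too).
mean_nonempty = length[length > 0].mean()

print(mean_skipping_nan)  # 3.75
print(mean_nonempty)      # 5.0
```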

If you want an empty string "" to count as length 1, use len(row.split(" ")) instead of min(len(row.split(" ")), len(row)).
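The min(...) guard is there because "".split(" ") returns [''], a list of length 1. Calling split() with no argument splits on runs of whitespace instead and returns [] for an empty string, so no guard is needed. The sketch below relies on that and also drops numeric-only strings, which the question asked for but the snippet above does not handle. The helper name count_words, the extra "42" row, and the float()-based numeric test are assumptions made for illustration, not part of the original answer:

```python
import pandas as pd

def count_words(value):
    """Hypothetical helper: word count for text, NaN for anything else."""
    if not isinstance(value, str):
        return float("nan")      # ints, floats, None, NaN
    try:
        float(value)             # "42", "3.5", "1e5", "nan" all parse
        return float("nan")      # treat numeric-only strings as non-text
    except ValueError:
        pass
    words = value.split()        # splits on any whitespace; "" -> []
    return len(words) if words else float("nan")

df = pd.DataFrame({'sentence': [
    "This is a sentence",
    "This is a larger sentence",
    "This is an even larger sentence",
    "", "42", 1, None,
]})

df['length'] = df['sentence'].apply(count_words)
average = df['length'].mean()    # NaN rows are skipped by default
print(average)  # 5.0
```

Here empty strings are excluded rather than counted as 0; return 0 instead of NaN in the last line of count_words if they should pull the average down, as in the answer above.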

pandas

by MR
