pandas/Python: summing data gives a different result

I have a conceptual problem.

I am working through the pandas exercises on Kaggle to learn and practice a new skill.
I tried to solve an exercise, but I don't understand why the result
is different from what I expected.

Question:

"There are only so many words you can use when describing a bottle of wine. Is a wine more likely to be "tropical" or "fruity"? Create a Series descriptor_counts counting how many times each of these two words appears in the description column in the dataset. (For simplicity, let's ignore the capitalized versions of these words.)"

My answer:

tropical_count = reviews["description"].str.count(pat="tropical").sum()
fruity_count = reviews["description"].str.count(pat="fruity").sum()

descriptor_counts = pd.Series({"tropical": tropical_count, "fruity": fruity_count}, index=["tropical", "fruity"])

Kaggle answer:

n_trop = reviews.description.map(lambda desc: "tropical" in desc).sum()
n_fruity = reviews.description.map(lambda desc: "fruity" in desc).sum()
descriptor_counts = pd.Series([n_trop, n_fruity], index=['tropical', 'fruity'])

Both versions run without errors, but the results are different. Does anyone know why?

My result:

tropical    3703
fruity      9259
dtype: int64

Kaggle result:

tropical    3607
fruity      9090
dtype: int64

Solution:

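This output is expected. str.count counts every occurrence of the substring in each description, whereas the in operator only tests whether the substring is present at all, so its result for each row is just True or False. When booleans are summed, each True is treated as 1 and each False as 0. A description that mentions the word more than once therefore adds its full number of occurrences to your total but only 1 to Kaggle's total, which is why your numbers are larger.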

Sample:

import pandas as pd

reviews = pd.DataFrame(["Ttropical are tropical so fruity words you can",
                        "fruity ",
                        "fruity fruity",
                        "anythi"], columns=['description'])

tropical_count= reviews["description"].str.count(pat ="tropical")
fruity_count= reviews["description"].str.count(pat ="fruity")
print (tropical_count)
0    2
1    0
2    0
3    0
Name: description, dtype: int64
print (fruity_count)
0    1
1    1
2    2
3    0
Name: description, dtype: int64

n_trop = reviews.description.map(lambda desc: "tropical" in desc)
n_fruity = reviews.description.map(lambda desc: "fruity" in desc)
print (n_trop)
0     True
1    False
2    False
3    False
Name: description, dtype: bool

print (n_fruity)
0     True
1     True
2     True
3    False
Name: description, dtype: bool
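
To see both totals side by side, here is a minimal sketch that reuses the sample data above. It swaps the lambda with the in operator for str.contains with regex=False, which should perform the same per-row membership test; the names occurrences and rows_with_word are only illustrative and are not part of the exercise.

```python
import pandas as pd

# Same sample data as in the answer above.
reviews = pd.DataFrame(
    ["Ttropical are tropical so fruity words you can",
     "fruity ",
     "fruity fruity",
     "anythi"],
    columns=["description"],
)

# Total occurrences: a row that mentions the word twice contributes 2.
occurrences = pd.Series({
    "tropical": reviews["description"].str.count("tropical").sum(),
    "fruity": reviews["description"].str.count("fruity").sum(),
})

# Rows that mention the word at least once: each row contributes at most 1.
# regex=False makes this a plain substring test, like the `in` operator.
rows_with_word = pd.Series({
    "tropical": reviews["description"].str.contains("tropical", regex=False).sum(),
    "fruity": reviews["description"].str.contains("fruity", regex=False).sum(),
})

print(occurrences)
print(rows_with_word)
```

On this sample the first Series holds 2 and 4, while the second holds 1 and 3. The gap between them is exactly the number of repeated mentions inside a single description.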