pandas / Python: summing the data gives a different result

I have a conceptual problem.

I am working through the pandas exercises from Kaggle to learn and practice a new skill.
I tried to solve an exercise, but I don't understand why the result
is different from what I expected.


"There are only so many words you can use when describing a bottle of wine. Is a wine more likely to be "tropical" or "fruity"? Create a Series descriptor_counts counting how many times each of these two words appears in the description column in the dataset. (For simplicity, let's ignore the capitalized versions of these words.)"

my answer:

tropical_count = reviews["description"].str.count(pat="tropical").sum()
fruity_count = reviews["description"].str.count(pat="fruity").sum()

descriptor_counts = pd.Series({"tropical": tropical_count, "fruity": fruity_count}, index=["tropical", "fruity"])

Kaggle answer:

n_trop = reviews.description.map(lambda desc: "tropical" in desc).sum()
n_fruity = reviews.description.map(lambda desc: "fruity" in desc).sum()
descriptor_counts = pd.Series([n_trop, n_fruity], index=['tropical', 'fruity'])

Everything works great, but the results are different. Does anyone know why?

my result

tropical    3703
fruity      9259
dtype: int64

kaggle result

tropical    3607
fruity      9090
dtype: int64

>Solution :

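The output is expected. str.count counts every occurrence of the substring in each description, so a description that uses the word twice contributes 2 to the sum. The in operator only tests whether the substring is present, so each row yields just True or False. When you sum booleans, True is treated as 1 and False as 0, so the Kaggle answer counts the descriptions that contain the word at least once, not the total number of occurrences. That is why the two results differ.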


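import pandas as pd

# Small sample: row 0 contains "tropical" twice, row 2 contains "fruity" twice
reviews = pd.DataFrame(["Ttropical are tropical so fruity words you can",
                        "fruity ",
                        "fruity fruity",
                        "anythi"], columns=['description'])

# str.count returns the number of occurrences per row
tropical_count = reviews["description"].str.count(pat="tropical")
fruity_count = reviews["description"].str.count(pat="fruity")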
print (tropical_count)
0    2
1    0
2    0
3    0
Name: description, dtype: int64
print (fruity_count)
0    1
1    1
2    2
3    0
Name: description, dtype: int64

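# The in operator returns only True/False per row
n_trop = reviews.description.map(lambda desc: "tropical" in desc)
n_fruity = reviews.description.map(lambda desc: "fruity" in desc)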
print (n_trop)
0     True
1    False
2    False
3    False
Name: description, dtype: bool

print (n_fruity)
0     True
1     True
2     True
3    False
Name: description, dtype: bool
