Home Counting occurrences based on another column in a dataframe

Questions

Counting occurrences based on another column in a dataframe

November 26, 2021

In a database, I have data that looks like this (just a snippet):

itemId   userId     action       likeorDislike     timestamp
i1       u1         rate         0                 2021-06-09 10:43:57.827 UTC
i1       u1         rate         1                 2021-06-10 10:43:57.827 UTC
i1       u2         rate         1                 2021-06-09 11:43:57.827 UTC
i1       u3         rate         1                 2021-06-09 12:43:57.827 UTC
i2       u6         rate         1                 2021-06-09 10:43:57.827 UTC
i2       u6         rate         0                 2021-08-09 10:43:57.827 UTC
i2       u1         rate         0                 2021-06-12 10:43:57.827 UTC
i4       u1         rate         1                 2021-06-09 10:45:57.827 UTC
i4       u1         rate         1                 2021-06-09 10:48:57.827 UTC
i4       u3         rate         1                 2021-06-09 10:45:58.827 UTC
i1       u5         select                         2021-06-09 10:45:58.827 UTC

I want to compute a score for each itemId, which is the number of 1‘s over the sum of likeorDislike for that particular item, but only take into consideration the first attempt of a user to rate that particular item.

itemId     score
i1         0.66                 
i2         0.5
i4         1

What I did:
Only kept the data that interests me (ordered by timestamp, cleaned etc.), but I don’t know how to compute the above scores now:

itemId     likeOrDislike     
i1         0      
i1         1 
i2         1
i1         1
i2         0
i4         1
i4         1

I also wrote a SQL query that does compute these scores but doesn’t take into consideration the timestamp.

%%bigquery counting_answers_new
SELECT trueAnswers.itemId, IFNULL(falseAnswers.answerCount, 0) as falseAnswerCount, trueAnswers.answerCount as trueAnswerCount, IFNULL((trueAnswers.answerCount/(falseAnswers.answerCount+trueAnswers.answerCount)), 1) as score
FROM 
 ( SELECT itemId, likeorDislike, action, COUNT(*) as answerCount
 FROM `mydata` 
 GROUP BY itemId, likeorDislike, action
 HAVING action='rate' AND likeorDislike=false
 ORDER BY itemId) as falseAnswers
 RIGHT JOIN
 ( SELECT itemId, likeorDislike, acton, COUNT(*) as answerCount
 FROM `mydata` 
 GROUP BY itemId, likeorDislike, action
 HAVING action='rate' AND likeorDislike=true
 ORDER BY itemId) as trueAnswers
 ON falseAnswers.itemId = trueAnswers.itemId

I would like to do it in pandas and proceed from the dataframe with two columns that I have above because I already cleaned it/filtered it. I know how to count occurrences of 0’s and 1’s, but how do I do that pro item, and how do I compute these scores?

>Solution :

It seems like you just want the mean of the itemId’s likeOrDislike. In pandas you could do this:

df[["itemId", "likeOrDislike"]].groupby("itemId").mean()

google-bigquery

byMR

Published November 26, 2021

Add a comment

df.groupby(df.index.to_period('M')).last() generates a new date index exceeding the range of the initial date index

byMR

November 26, 2021

Questions

Compresssing Error Unexpected character `

byMR

November 26, 2021

Questions

Node Express dynamic route/path

byMR

November 26, 2021

Questions

deleting list element if the element is present on another column of a dataframe

byMR

November 26, 2021

Questions

Pointer to const object with emplace_back

byMR

November 26, 2021

Questions

Select index pandas Series by a specific value

byMR

November 26, 2021

Counting occurrences based on another column in a dataframe

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

df.groupby(df.index.to_period('M')).last() generates a new date index exceeding the range of the initial date index

Compresssing Error Unexpected character `

Node Express dynamic route/path

deleting list element if the element is present on another column of a dataframe

Pointer to const object with emplace_back

Select index pandas Series by a specific value

Keep Up to Date with the Most Important News

Counting occurrences based on another column in a dataframe

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

df.groupby(df.index.to_period('M')).last() generates a new date index exceeding the range of the initial date index

Compresssing Error Unexpected character `

Node Express dynamic route/path

deleting list element if the element is present on another column of a dataframe

Pointer to const object with emplace_back

Select index pandas Series by a specific value

Discover more from Dev solutions