I am working with the Pandas Library and learning more about it with data manipulation and analysis. With this Dataframe ITEM
ItemNo1 ItemNo2 ItemNo3 ItemNo4
0 Paper Paper Rock Paper
1 Scissor Scissor Rock NaN
2 Rock Rock Rock Paper
3 Scissor Paper Scissor Paper
4 Rock Paper Scissor Rock
I’d like to create a Dataframe with two columns. Item and Unique Item Row Count. The unique item row count should only have the unique row values of Item. Meaning if there are three of the same item in the row, it will only take into account the first instance for count. This would mean the following output:
Item UniqueItemRowCount
0 Paper 4
1 Rock 4
2 Scissor 3
There will be 4 for rock because for index 0 and 1 there is one rock. For index 2 and 4 there is already one Rock found so the rest are ignored/not added for the sum.
My main issue is trying to create a lambda function for this as I cannot make it properly get the sum. I have also tried to make the values NaN (null) if there is a duplicate found in the row but this does not work either.
>Solution :
Use DataFrame.melt with DataFrame.drop_duplicates for count duplicates per rows and then count values by Series.value_counts:
df1 = (df.melt()
.drop_duplicates()['value']
.value_counts()
.rename_axis('Item')
.reset_index(name='UniqueItemRowCount'))
print (df1)
Item UniqueItemRowCount
0 Rock 4
1 Scissor 3
2 Paper 3