So I have a pandas dataframe full of repeats. I’m trying to count how many times each row has been repeated and put that into a column.
Here’s my code – found it on stack overflow
def getUniqCounts2(dupDF ):
countARR=[]
uniqDF = getUniq(dupDF)
for i in range( 0,len(uniqDF)):
df2 = len(dupDF[(dupDF["variable1"]==uniqDF['variable1'][i]) &
(dupDF["variable2"]==uniqDF['variable2'][i])])
print(df2)
countARR.append(df2)
uniqDF['count'] = countARR
return uniqDF
Someone did recommend the pivot table function, but the problem is that the resulting DF has a column that is only partially filled even though the input data frame had all columns filled.
Now I have a DF of 1.5M rows. I just want to get the unique rows, and the no of repetitions found per row. What would be the fastest way to do this?
>Solution :
Suppose you have a DataFrame like named df, you can count the number of occurrences by simply using groupby() function like below:
counts = df.groupby(df.columns.tolist()).size().reset_index().rename(columns={0:'count'})
I guess this is much more efficient.