I have a dataframe like this:
| col1 | col2 | col3 | col N |
|---|---|---|---|
| x | y | z | f |
| y | x | z | f |
| f | none | none | none |
| z | y | x | f |
I need to count the rows that contain the same values, regardless of their order.
In this case, the output should be something like this:
| col1 | col2 | col3 | col N | freq |
|---|---|---|---|---|
| x | y | z | f | 3 |
| f | none | none | none | 1 |
This is because, in the input dataset, three rows contain the same set of values (line 1, line 2, and line 4).
I tried to use the "value_counts" function; however, it counts only identical rows, so rows with the same values in a different order are counted separately.
N.B. The initial dataset contains over 200 columns.
Any solution?
Thanks
>Solution :
You can aggregate each row's columns into a single object, whose form depends on the exact logic you need (a frozenset, a sorted tuple, etc.), then count the values or perform a groupby:
I would use:
out = df.agg(frozenset, axis=1).value_counts()
# or, if NaN should be ignored
out = df.agg(lambda x: frozenset(x.dropna()), axis=1).value_counts()
NB. if a row can contain duplicate values and you want to count them, use a sorted tuple in place of frozenset (a frozenset collapses duplicates).
Output:
(x, y, f, z) 3
(f) 1
dtype: int64
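To illustrate the NB above, a minimal runnable sketch of the sorted-tuple variant; the sample frame is reconstructed from the question (the column name colN stands in for "col N"):

```python
import pandas as pd

# Sample frame reconstructed from the question ("col N" written as colN).
df = pd.DataFrame({
    "col1": ["x", "y", "f", "z"],
    "col2": ["y", "x", None, "y"],
    "col3": ["z", "z", None, "x"],
    "colN": ["f", "f", None, "f"],
})

# A sorted tuple keeps duplicate values within a row (a frozenset would
# collapse them); NaN/None values are dropped before sorting.
out = df.agg(lambda x: tuple(sorted(x.dropna())), axis=1).value_counts()
print(out)
```

Since `value_counts` sorts by frequency, the three order-insensitive duplicates come first.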
If you really want a DataFrame with one of the original rows:
g = df.groupby(df.agg(frozenset, axis=1), sort=False)
out = g.first().assign(freq=g.size()).reset_index(drop=True)
Output:
col1 col2 col3 colN freq
0 x y z f 3
1 f None None None 1
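End to end, the groupby variant can be sketched as follows; the sample frame is again reconstructed from the question, and the freq column comes from the group sizes:

```python
import pandas as pd

# Sample frame reconstructed from the question ("col N" written as colN).
df = pd.DataFrame({
    "col1": ["x", "y", "f", "z"],
    "col2": ["y", "x", None, "y"],
    "col3": ["z", "z", None, "x"],
    "colN": ["f", "f", None, "f"],
})

# Group rows by their set of values, keep one representative row per
# group, and attach the group size as a "freq" column. sort=False keeps
# the groups in order of first appearance.
key = df.agg(frozenset, axis=1)
g = df.groupby(key, sort=False)
out = g.first().assign(freq=g.size()).reset_index(drop=True)
print(out)
```

Because `g.first()` and `g.size()` share the same group-key index, the assignment aligns correctly before the index is reset.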