Pandas filter/subset columns based on conditions

April 7, 2022

I have about 300 columns that are essentially encodings of categorical variables. I'd like to drop every column whose values sum to less than some threshold, say 3.

import pandas as pd

df = pd.DataFrame({
                   'id': [0, 1, 2, 3, 4, 5],
                   'col1': [0, 0, 0, 0, 0, 1],
                   'col2': [0, 1, 0, 0, 1, 0],
                   'col3': [1, 1, 0, 1, 1, 0],
                   'col4': [0, 1, 1, 1, 1, 0]
                 })

df.sum(axis=0)

Expected output:

id col3 col4
0     1    0
1     1    1
2     0    1
3     1    1
4     1    1
5     0    0

Solution:

You can use loc with boolean indexing on the columns. Dropping columns whose sum is below N means keeping those whose sum is at least N:

N = 3
out = df.loc[:, df.sum(axis=0) >= N]
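To see what the boolean indexing is doing, it can help to split the one-liner into its intermediate steps. The following is a minimal sketch using the question's sample data; the names col_sums, mask, kept and dropped are only illustrative, and the threshold is treated as inclusive (columns with a sum of exactly N are kept), which is an assumption based on the question's "less than 3" wording.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5],
    'col1': [0, 0, 0, 0, 0, 1],
    'col2': [0, 1, 0, 0, 1, 0],
    'col3': [1, 1, 0, 1, 1, 0],
    'col4': [0, 1, 1, 1, 1, 0],
})

N = 3

# One sum per column; the result is a Series indexed by column name.
col_sums = df.sum(axis=0)

# Boolean Series: True for columns to keep, False for columns to drop.
mask = col_sums >= N

# loc aligns the mask with the column labels and keeps the True ones.
kept = df.loc[:, mask]

# Names of the columns that did not reach the threshold.
dropped = mask[~mask].index.tolist()

print(col_sums)
print(dropped)
```

Note that id survives here only because its own sum (15) happens to be at least N.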

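If id is not numeric, or if N can be larger than the sum of the id column, id itself would be summed and compared against N, so it could be dropped or break the comparison. In that case, move id into the index with set_index first so it is excluded from the sums, apply the boolean indexing, then restore it as a column with reset_index:

df = df.set_index('id')
df = df.loc[:, df.sum(axis=0) >= 3].reset_index()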

Output:

   id  col3  col4
0   0     1     0
1   1     1     1
2   2     0     1
3   3     1     1
4   4     1     1
5   5     0     0
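An alternative that avoids changing the index is to compute the sums over the encoded columns only and then always keep id. This is a rough sketch, not part of the original answer: filter_columns_by_sum is a hypothetical helper name, and the string-valued id column is made up to show the non-numeric case.

```python
import pandas as pd


def filter_columns_by_sum(df, min_sum, id_col='id'):
    """Keep id_col plus every other column whose sum is at least min_sum.

    Hypothetical helper for illustration; assumes all columns other
    than id_col are numeric.
    """
    # Sum only the encoded columns so id_col never enters the comparison.
    sums = df.drop(columns=id_col).sum(axis=0)
    keep = sums[sums >= min_sum].index
    return df[[id_col, *keep]]


# Sample data with a non-numeric id (an assumption for this example).
df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd', 'e', 'f'],
    'col1': [0, 0, 0, 0, 0, 1],
    'col2': [0, 1, 0, 0, 1, 0],
    'col3': [1, 1, 0, 1, 1, 0],
    'col4': [0, 1, 1, 1, 1, 0],
})

out = filter_columns_by_sum(df, min_sum=3)
print(out)
```

Compared with the set_index approach, this leaves the original index untouched, at the cost of naming the id column explicitly.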

pandas

byMR

Published April 07, 2022
