Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to find the columns that contain consecutive values in a pandas DataFrame?

I have a pandas DataFrame like this

import pandas as pd
import numpy as np

data = {
    'col1': [0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, np.nan, np.nan, np.nan],
    'col2': [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    'col3': [1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]
}

df = pd.DataFrame(data)

print(df)
#     col1  col2  col3
# 0    0.0   1.0   1.0
# 1    1.0   1.0   1.0
# 2    1.0   1.0   0.0
# 3    1.0   1.0   1.0
# 4    1.0   1.0   1.0
# 5    0.0   0.0   0.0
# 6    1.0   0.0   0.0
# 7    1.0   0.0   0.0
# 8    0.0   0.0   0.0
# 9    0.0   1.0   1.0
# 10   0.0   1.0   1.0
# 11   NaN   1.0   0.0
# 12   NaN   1.0   1.0
# 13   NaN   1.0   1.0

How can I find the columns that have 4 or more consecutive 1.0?

In my example, col1 and col2 are what I want to find.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Because df['col1'][1:5] contains 4 1.0s and df['col2'][:5] contains 5 1.0s.

>Solution :

If large DataFrame you can avoid groupby and use this solution for count consecutive 1 values, last filter columns names:

m = df.eq(1)
b = m.cumsum()
s = b.sub(b.mask(m).ffill().fillna(0)).ge(4).any()
print (s)
col1     True
col2     True
col3    False
dtype: bool


out = s.index[s].tolist()
print (out)
['col1', 'col2']

Or:

df1 = df.loc[:, s]
print (df1)
    col1  col2
0    0.0   1.0
1    1.0   1.0
2    1.0   1.0
3    1.0   1.0
4    1.0   1.0
5    0.0   0.0
6    1.0   0.0
7    1.0   0.0
8    0.0   0.0
9    0.0   1.0
10   0.0   1.0
11   NaN   1.0
12   NaN   1.0
13   NaN   1.0

Details:

print (b.sub(b.mask(m).ffill().fillna(0)))
    col1  col2  col3
0    0.0   1.0   1.0
1    1.0   2.0   2.0
2    2.0   3.0   0.0
3    3.0   4.0   1.0
4    4.0   5.0   2.0
5    0.0   0.0   0.0
6    1.0   0.0   0.0
7    2.0   0.0   0.0
8    0.0   0.0   0.0
9    0.0   1.0   1.0
10   0.0   2.0   2.0
11   0.0   3.0   0.0
12   0.0   4.0   1.0
13   0.0   5.0   2.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading