Python pandas: how to pick out certain values using an internal run counter?

I have a dataframe that looks like this:

    Answers  all_answers  Score
0       0.0            0     72
1       0.0            0     73
2       0.0            0     74
3       1.0            1      1
4      -1.0            1      2
5       1.0            1      3
6      -1.0            1      4
7       1.0            1      5
8       0.0            0      1
9       0.0            0      2
10     -1.0            1      1
11      0.0            0      1
12      0.0            0      2
13      1.0            1      1
14      0.0            0      1
15      0.0            0      2
16      1.0            1      1

The first column signals that the sign changed during the calculation.

The second column is the first with the minus sign removed (its absolute value).

The third column is a running counter over the second: it restarts at 1 whenever the value switches between one and zero, so it shows how many ones or zeros have occurred in a row.
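
For reference, here is a minimal sketch of how the second and third columns could be derived from the first. It assumes the data starts at the first row shown, so the counter for the leading zeros comes out as 1, 2, 3 rather than 72, 73, 74 (the excerpt above evidently begins in the middle of a longer run). The name `run_id` is only illustrative.

```python
import pandas as pd

# Values copied from the Answers column of the table above.
answers = [0, 0, 0, 1, -1, 1, -1, 1, 0, 0, -1, 0, 0, 1, 0, 0, 1]
df = pd.DataFrame({"Answers": pd.Series(answers, dtype="float64")})

# Second column: the first column with the sign dropped.
df["all_answers"] = df["Answers"].abs().astype(int)

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those change flags gives one label per run.
run_id = df["all_answers"].ne(df["all_answers"].shift()).cumsum()

# Third column: 1-based position of each row inside its run.
df["Score"] = df.groupby(run_id).cumcount() + 1
```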

I want to add a fourth column that keeps the value from the first column, sign included, only for ones that occur in a row a given number of times, for example 5. Every other row should be 0.

The result should look something like this:

    Answers  all_answers  Score  New
0       0.0            0     72    0
1       0.0            0     73    0
2       0.0            0     74    0
3       1.0            1      1    1
4      -1.0            1      2   -1
5       1.0            1      3    1
6      -1.0            1      4   -1
7       1.0            1      5    1
8       0.0            0      1    0
9       0.0            0      2    0
10     -1.0            1      1    0
11      0.0            0      1    0
12      0.0            0      2    0
13      1.0            1      1    0
14      0.0            0      1    0
15      0.0            0      2    0
16      1.0            1      1    0
17      0.0            0      1    0

Is it possible to do this with pandas?

Solution:

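You can label each run of consecutive identical values, measure the size of each run, and keep `Answers` only where the run is long enough: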

# group by consecutive 0/1
g = df['all_answers'].ne(df['all_answers'].shift()).cumsum()

# get size of each group and compare to threshold
m = df.groupby(g)['all_answers'].transform('size').ge(5)

# mask small groups
df['New'] = df['Answers'].where(m, 0)

Output:

    Answers  all_answers  Score  New
0       0.0            0     72  0.0
1       0.0            0     73  0.0
2       0.0            0     74  0.0
3       1.0            1      1  1.0
4      -1.0            1      2 -1.0
5       1.0            1      3  1.0
6      -1.0            1      4 -1.0
7       1.0            1      5  1.0
8       0.0            0      1  0.0
9       0.0            0      2  0.0
10     -1.0            1      1  0.0
11      0.0            0      1  0.0
12      0.0            0      2  0.0
13      1.0            1      1  0.0
14      0.0            0      1  0.0
15      0.0            0      2  0.0
16      1.0            1      1  0.0
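
To try this end to end, the same steps can be wrapped in a small helper. This is only a sketch: the function name `keep_long_runs` and its `threshold` parameter are made up for illustration, and the sample frame is rebuilt from the `Answers` values in the question.

```python
import pandas as pd

def keep_long_runs(df, threshold=5):
    """Return Answers where the run of equal all_answers values has at
    least `threshold` rows, and 0 everywhere else (hypothetical helper)."""
    # One label per run of consecutive identical values.
    run_id = df["all_answers"].ne(df["all_answers"].shift()).cumsum()
    # Broadcast each run's length back onto its rows.
    run_size = df.groupby(run_id)["all_answers"].transform("size")
    # Keep Answers in long runs, replace the rest with 0.
    return df["Answers"].where(run_size.ge(threshold), 0)

answers = [0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0, 1.0, 0.0, 0.0,
           -1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
df = pd.DataFrame({"Answers": answers})
df["all_answers"] = df["Answers"].abs().astype(int)

df["New"] = keep_long_runs(df, threshold=5)
print(df)
```

Note that a run of five or more zeros also passes the size check, but `Answers` is already 0 on those rows, so the result is the same.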