Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to create a series of integers in a dataframe only when there are no NaNs, and integers increasing by 1 unit?

I have a dataframe that look like this:

data = pd.DataFrame([NaN, NaN, NaN, 0.5, 1.2, 2.7, 3.8, NaN, NaN, 0.1, 0.7, 2.3, NaN, NaN, NaN, NaN, NaN, 0.01, 0.4, 1.5, 2.8, 4.5, 5.6, NaN, NaN, NaN, NaN, NaN, NaN])

The extra column I need to create should look like this:

data = pd.DataFrame([NaN, NaN, NaN, 1, 1, 1, 1, NaN, NaN, 2, 2, 2, NaN, NaN, NaN, NaN, NaN, 3, 3, 3, 3, 3, 3, NaN, NaN, NaN, NaN, NaN, NaN])

Basically, when there are NaNs, the integer remains the same, and when not a NaN is found it should increment by 1 unit and be constant until another NaN is found.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Is there an elegant way to do this without going through every row individually? My dataframe is very large.

>Solution :

Let’s consider this input data

NaN = np.nan
data = pd.DataFrame(
    [NaN, NaN, NaN, 0.5, 1.2, 2.7, 3.8, NaN, NaN, 0.1, 0.7, 2.3, NaN, NaN, 
     NaN, NaN, NaN, 0.01, 0.4, 1.5, 2.8, 4.5, 5.6, NaN, NaN, NaN, NaN, NaN, NaN])

So the idea is to first create a Boolean column where it is not nan with notna.

m = data[0].notna()

Now, you can get the diff on this Boolean column to get True when it change from True to False or vice-versa. Keep the True only when it changes when notna with &m. Use cumsum to get the incremental integer and then where with the Boolean mask m to replace by NaN where it was Nan originally. So in one liner:

data['res'] = (m.diff()&m).cumsum().where(m)
print(data)
       0  res
0    NaN  NaN
1    NaN  NaN
2    NaN  NaN
3   0.50  1.0
4   1.20  1.0
5   2.70  1.0
6   3.80  1.0
7    NaN  NaN
8    NaN  NaN
9   0.10  2.0
10  0.70  2.0
11  2.30  2.0
12   NaN  NaN
13   NaN  NaN
...
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading