Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas interpolation adding rows by group with different ranges for each group

I am trying to add rows to a DataFrame interpolating values in a column by group, and fill with missing all other columns. My data looks something like this:

import pandas as pd 
import random

random.seed(42)
data = {'group':['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c' ],
        'value' : [1, 2, 5,  3, 4, 5, 7, 4, 7, 9],
        'other': random.sample(range(1, 100), 10)}

df = pd.DataFrame(data)
print(df)
  group  value  other
0     a      1     82
1     a      2     15
2     a      5      4
3     b      3     95
4     b      4     36
5     b      5     32
6     b      7     29
7     c      4     18
8     c      7     14
9     c      9     87

What I am trying to achieve is something like this:

   group  value  other
      a      1     82
      a      2     15
      a      3     NaN
      a      4     NaN
      a      5     NaN
      b      3     95
      b      4     36
      b      5     32
      b      6     NaN
      b      7     29
      c      4     18
      c      5     NaN
      c      6     NaN
      c      7     14
      c      8     NaN
      c      9     87

For example, group a has a range from 1 to 5, b from 3 to 7, and c from 4 to 9.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The issue I’m having is that each group has a different range. I found something that works assuming a single range for all groups. This could work using the global min and max and dropping extra rows in each group, but since my data is fairly large adding many rows per group quickly becomes unfeasible.

>Solution :

>>> df.groupby('group').apply(lambda x: x.set_index('value').reindex(np.arange(x['value'].min(), x['value'].max() + 1))).drop(columns='group').reset_index()
   group  value  other
0      a      1   82.0
1      a      2   15.0
2      a      3    NaN
3      a      4    NaN
4      a      5    4.0
5      b      3   95.0
6      b      4   36.0
7      b      5   32.0
8      b      6    NaN
9      b      7   29.0
10     c      4   18.0
11     c      5    NaN
12     c      6    NaN
13     c      7   14.0
14     c      8    NaN
15     c      9   87.0

We group on the group column and then re-index each group with the range from the min to the max of the value column

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading