Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Fill missing value with with value from another row and if not take the next similar value

I have a problem. I have missing numbers inside the column materialNumber. But if a similar price is it should take the exactly same materialNumber. If more than two materialNumber occur with the same price it should take the first. If no materialNumber is found with the same price it should take the next nearst materialnumber depending on the price.

Dataframe

   customerId  materialNumber  price
0           1          1234.0    100
1           1          4562.0     20
2           2             NaN    100
3           2          4562.0     30
4           3          1547.0     40
5           3             NaN     37

Code

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

import pandas as pd
d = {
    "customerId": [1, 1, 2, 2, 3, 3],
    "materialNumber": [
        1234,
        4562,
        None,
        4562,
        1547,
        None, 
    ],
    "price": [100, 20, 100, 30, 40, 37],
}
df = pd.DataFrame(data=d)
print(df)

import numpy as np
def find_next(x):
  if(x['materialNumber'] == None):
    #if price occurs only once it should finde the next nearst price
    if(x['price'].value_counts().shape[0] == 1): 
      return x.drop_duplicates(subset=['price'], keep="first")
    else:
      return x.iloc[(x['price']-input).abs().argsort()[:2]]

df['materialNumber'] = df.apply(lambda x: find_next(x), axis=1)

What I want

   customerId  materialNumber  price
0           1          1234.0    100
1           1          4562.0     20
2           2            1234    100 # of index 0: 1234.0, 100 (same value)
3           2          4562.0     30
4           3          1547.0     40
5           3            1547     37 # of index 4: 1547.0, 40 (next similar value)

>Solution :

Use merge_asof with match rows with missing values per materialNumber by rows without missing values and assign values in DataFrame.loc:

m = df['materialNumber'].isna()
new = pd.merge_asof(df[m].reset_index().sort_values('price'),
                    df[~m].sort_values('price'), on='price', direction='nearest')

df.loc[m, 'materialNumber'] = new.set_index('index')['materialNumber_y']
print(df)
   customerId  materialNumber  price
0           1          1234.0    100
1           1          4562.0     20
2           2          1234.0    100
3           2          4562.0     30
4           3          1547.0     40
5           3          1547.0     37
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading