Home Calculate list of missing columns id's

Questions

Calculate list of missing columns id's

November 17, 2021

I would want to calculate a list of missing columns id’s.

Current_dataframe:

data = {'DIVISION':  ['DENVER', 'JEWEL', 'JEWEL','DENVER', np.nan],
        'STORE_ID': [np.nan,np.nan,'0001','0001',np.nan],
        'FORECAST_DATE':['2021-08-15','2021-08-15','2021-08-15','2021-08-15','2021-08-15'],
        'START_TIME':[np.nan,np.nan,9.0,np.nan,np.nan],
        'FULFILLMENT_TYPE':['DUG','DELIVERY','DUG','DUG','DUG'],
        'MULTIPLIER':[1.04,1.10,1.04,1.10,0.90],
        'OVERWRITE':[np.nan,0,np.nan,1,2]
        }
df = pd.DataFrame(data)

Expected output:

data = {'DIVISION':  ['DENVER', 'JEWEL', 'JEWEL','DENVER', np.nan],
        'STORE_ID': [np.nan,np.nan,'0001','0001',np.nan],
        'FORECAST_DATE':['2021-08-15','2021-08-15','2021-08-15','2021-08-15','2021-08-15'],
        'START_TIME':[np.nan,np.nan,9.0,np.nan,np.nan],
        'FULFILLMENT_TYPE':['DUG','DELIVERY','DUG','DUG','DUG'],
        'MULTIPLIER':[1.04,1.10,1.04,1.10,0.90],
        'OVERWRITE':[np.nan,0,np.nan,1,2],
        'MISSING_ID':[[1,3],[1,3],np.nan,[3],[0,1,3]]
        }
expected = pd.DataFrame(data)

Where ‘MISSING_ID’ is based on missing values of columns DIVISION, STORE_ID, FORECAST_DATE, START_TIME and FULFILMENT_TYPE

I can achieve this by writing multiple np.where conditions and then using a list:

multiplier_df['DIV_MISSING'] = np.where(multiplier_df.DIVISION.isna(),1,0)
multiplier_df['STORE_MISSING'] = np.where(multiplier_df.STORE_ID.isna(),2,0)
multiplier_df['DATE_MISSING'] = np.where(multiplier_df.FORECAST_DATE.isna(),3,0)
multiplier_df['HOUR_MISSING'] = np.where(multiplier_df.START_TIME.isna(),4,0)
multiplier_df['FULFILLMENT_MISSING'] = np.where(multiplier_df.FULFILLMENT_TYPE.isna(),5,0)
multiplier_df['MISSING_ID'] = multiplier_df[['DIV_MISSING', 'STORE_MISSING','DATE_MISSING','HOUR_MISSING','FULFILLMENT_MISSING']].apply(list, axis=1)

However this results in 0’s which are unwanted and thus looking for a simpler way to do it. I see a similar solution in R here : " Return list of column names with missing (NA) data for each row of a data frame in R "
But need to do in Pandas

>Solution :

# 1) Take part of dataframe, only those columns in which we search for nans
df_part = df[['DIVISION', 'STORE_ID', 'FORECAST_DATE', 'START_TIME', 'FULFILLMENT_TYPE']]

# use pd.isna(df).values to get np.array of True/False indicating where nans are
# then use list comprehension and np.where to determine for each row which columns have nans
df['MISSING_ID'] = [np.where(row)[0] for row in pd.isna(df_part).values]
print(df)

  DIVISION STORE_ID FORECAST_DATE  ...  MULTIPLIER OVERWRITE  MISSING_ID
0   DENVER      NaN    2021-08-15  ...        1.04       NaN      [1, 3]
1    JEWEL      NaN    2021-08-15  ...        1.10       0.0      [1, 3]
2    JEWEL     0001    2021-08-15  ...        1.04       NaN          []
3   DENVER     0001    2021-08-15  ...        1.10       1.0         [3]
4      NaN      NaN    2021-08-15  ...        0.90       2.0   [0, 1, 3]

For now it gives [] instead of np.nan, but you can fix this easily.

df['MISSING_ID'] = df['MISSING_ID'].apply(lambda x: x if len(x) else np.nan)

numpy

byMR

Published November 17, 2021

Add a comment

Matching Columns in a dataframe and factoring in a List

byMR

November 17, 2021

Questions

ValueError: Length of values (4) does not match length of index (179) Pandas

byMR

November 17, 2021

Questions

Pandas show both unique counts and unique values?

byMR

November 17, 2021

Questions

Can't set max date on date input

byMR

November 17, 2021

Questions

what is the difference between <p> and text nodes?

byMR

November 17, 2021

Questions

Group by column on basis of occurrences

byMR

November 17, 2021

Calculate list of missing columns id's

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Matching Columns in a dataframe and factoring in a List

ValueError: Length of values (4) does not match length of index (179) Pandas

Pandas show both unique counts and unique values?

Can't set max date on date input

what is the difference between <p> and text nodes?

Group by column on basis of occurrences

Keep Up to Date with the Most Important News

Calculate list of missing columns id's

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Matching Columns in a dataframe and factoring in a List

ValueError: Length of values (4) does not match length of index (179) Pandas

Pandas show both unique counts and unique values?

Can't set max date on date input

what is the difference between <p> and text nodes?

Group by column on basis of occurrences

Discover more from Dev solutions