Home python code to extract a record from a data frame from excel based on condition and create and input as column value

Questions

python code to extract a record from a data frame from excel based on condition and create and input as column value

byMR

April 11, 2023

I am trying to extract a record from dataframe basing on condition and make it as a column. I have tried to work on below code .

import pandas as pd
df = pd.read_excel("C:/Users/0403 - Copy (3).xlsx")

def add_action(row): 
    if row["FILE TYPE"] == 'NaN': 
        return row['Row Labels']   

df = df.assign(Label_header=df.apply(add_action, axis=1))

Input dataframe

       |FILE TYPE |Row Labels               |TOTAL COUNT|    

0      | NaN      | CHANGE                  | 668   |
1      | 87I      | ei.cfc.h87.n5.20211     | 98    |   
2      | 87P      | ep.cfc.m87.n5.2023      | 570   |   
3      | NaN      | CHN                     | 5642  |   
4      | 87P      | NMMCMS.AC.2021_1_R.txt  | 1     |   
5      | 87P      | NS.AC.201_1_R.txt       | 1     |

Expected output dataframe

       |FILE TYPE |Row Labels               |TOTAL COUNT|    Label_header |

0      | 87I      | ei.cfc.h87.n5.20211     | 98    |        CHANGE
1      | 87P      | ep.cfc.m87.n5.2023      | 570   |        CHANGE
2      | 87P      | NMMCMS.AC.2021_1_R.txt  | 1     |        CHN
3      | 87P      | NS.AC.201_1_R.txt       | 1     |        CHN

>Solution :

Use Series.isna for test missing values with Series.where and Series.ffill for repeat values of Row Labels by mask, last use boolean indexing with inverted mask by ~:

m = df["FILE TYPE"].isna()

df['Label_header'] = df['Row Labels'].where(m).ffill()

out = df[~m].reset_index(drop=True)
print (out)
  FILE TYPE              Row Labels  TOTAL COUNT Label_header
0       87I     ei.cfc.h87.n5.20211           98       CHANGE
1       87P      ep.cfc.m87.n5.2023          570       CHANGE
2       87P  NMMCMS.AC.2021_1_R.txt            1          CHN
3       87P       NS.AC.201_1_R.txt            1          CHN

m = df["FILE TYPE"].isna()

out = df.assign(Label_header=df['Row Labels'].where(m).ffill())[~m].reset_index(drop=True)
print (out)
  FILE TYPE              Row Labels  TOTAL COUNT Label_header
0       87I     ei.cfc.h87.n5.20211           98       CHANGE
1       87P      ep.cfc.m87.n5.2023          570       CHANGE
2       87P  NMMCMS.AC.2021_1_R.txt            1          CHN
3       87P       NS.AC.201_1_R.txt            1          CHN