I am trying to extract a record from dataframe basing on condition and make it as a column. I have tried to work on below code .
import pandas as pd
df = pd.read_excel("C:/Users/0403 - Copy (3).xlsx")
def add_action(row):
if row["FILE TYPE"] == 'NaN':
return row['Row Labels']
df = df.assign(Label_header=df.apply(add_action, axis=1))
Input dataframe
|FILE TYPE |Row Labels |TOTAL COUNT|
0 | NaN | CHANGE | 668 |
1 | 87I | ei.cfc.h87.n5.20211 | 98 |
2 | 87P | ep.cfc.m87.n5.2023 | 570 |
3 | NaN | CHN | 5642 |
4 | 87P | NMMCMS.AC.2021_1_R.txt | 1 |
5 | 87P | NS.AC.201_1_R.txt | 1 |
Expected output dataframe
|FILE TYPE |Row Labels |TOTAL COUNT| Label_header |
0 | 87I | ei.cfc.h87.n5.20211 | 98 | CHANGE
1 | 87P | ep.cfc.m87.n5.2023 | 570 | CHANGE
2 | 87P | NMMCMS.AC.2021_1_R.txt | 1 | CHN
3 | 87P | NS.AC.201_1_R.txt | 1 | CHN
>Solution :
Use Series.isna for test missing values with Series.where and Series.ffill for repeat values of Row Labels by mask, last use boolean indexing with inverted mask by ~:
m = df["FILE TYPE"].isna()
df['Label_header'] = df['Row Labels'].where(m).ffill()
out = df[~m].reset_index(drop=True)
print (out)
FILE TYPE Row Labels TOTAL COUNT Label_header
0 87I ei.cfc.h87.n5.20211 98 CHANGE
1 87P ep.cfc.m87.n5.2023 570 CHANGE
2 87P NMMCMS.AC.2021_1_R.txt 1 CHN
3 87P NS.AC.201_1_R.txt 1 CHN
m = df["FILE TYPE"].isna()
out = df.assign(Label_header=df['Row Labels'].where(m).ffill())[~m].reset_index(drop=True)
print (out)
FILE TYPE Row Labels TOTAL COUNT Label_header
0 87I ei.cfc.h87.n5.20211 98 CHANGE
1 87P ep.cfc.m87.n5.2023 570 CHANGE
2 87P NMMCMS.AC.2021_1_R.txt 1 CHN
3 87P NS.AC.201_1_R.txt 1 CHN