I’m working up some data where I’ve looked at data and created a flag column because I want to provide a column (I’ve called it result although this is my aim) where it is empty top_tier is empty or returns the parent region.
I’ve tried ffill and apply but I can’t get the result.
Update: My original thinking was to use an apply function that if top_tier was null to return the region above that row and then have a ffill, after removing all the top_tier rows.
region admissions_total year top_tier result
England 292404 2014 TRUE
North East 17409 2014 TRUE
North East 17409 2014 TRUE
County Durham 2240 2014 North East
Darlington 283 2014 North East
North West 41358 2014 TRUE
North West 41358 2014 TRUE
Ashton, Leigh & Wigan 1943 2014 North West
Blackburn With Darwen Care Trust Plus 865 2014 North West
East Midlands 19256 2014 TRUE
East Midlands 19256 2014 TRUE
Bassetlaw 428 2014 East Midlands
Derby 2524 2014 East Midlands
>Solution :
Example Code
import pandas as pd
data1 = {'region': ['England', 'North East', 'North East', 'County Durham', 'Darlington', 'North West',
'North West', 'Ashton', 'Blackburn',
'East Midlands', 'East Midlands', 'Bassetlaw', 'Derby'],
'admissions_total': ['292404', '17409', '17409', '2240', '283', '41358', '41358', '1943',
'865', '19256', '19256', '428', '2524'],
'year': ['2014', '2014', '2014', '2014', '2014', '2014', '2014', '2014', '2015', '2014',
'2014', '2014', '2014'],
'top_tier': ['TRUE', 'TRUE', 'TRUE', None, None, 'TRUE', 'TRUE', None, None, 'TRUE',
'TRUE', None, None]}
df = pd.DataFrame(data1)
df
region admissions_total year top_tier
0 England 292404 2014 TRUE
1 North East 17409 2014 TRUE
2 North East 17409 2014 TRUE
3 County Durham 2240 2014 None
4 Darlington 283 2014 None
5 North West 41358 2014 TRUE
6 North West 41358 2014 TRUE
7 Ashton 1943 2014 None
8 Blackburn 865 2015 None
9 East Midlands 19256 2014 TRUE
10 East Midlands 19256 2014 TRUE
11 Bassetlaw 428 2014 None
12 Derby 2524 2014 None
Code
use where and ffill
cond = df['top_tier'].eq('TRUE')
df.assign(result=df['region'].where(cond).ffill().where(~cond))
result:
region admissions_total year top_tier result
0 England 292404 2014 TRUE NaN
1 North East 17409 2014 TRUE NaN
2 North East 17409 2014 TRUE NaN
3 County Durham 2240 2014 None North East
4 Darlington 283 2014 None North East
5 North West 41358 2014 TRUE NaN
6 North West 41358 2014 TRUE NaN
7 Ashton 1943 2014 None North West
8 Blackburn 865 2015 None North West
9 East Midlands 19256 2014 TRUE NaN
10 East Midlands 19256 2014 TRUE NaN
11 Bassetlaw 428 2014 None East Midlands
12 Derby 2524 2014 None East Midlands