I have pandas dataframe and I am trying to select multiple columns (column range starting from Test to Bio Ref). Selection has to start from column Name Test to any column name whose Name starts with Bio. Below is the sample dataframe.
In reality it can contain:
- any number of columns before
Testcolumn, - any number of columns between Test &
Bio Reflike 2,3,4,5 etc. - any number of columns after
Bio Ref. Bio Refcolumn can contain suffix in it butBio Refwill be there as start of column name always.
df_chunk = pd.DataFrame({'Waste':[None,None],
'Test':['something', 'something'],
'2':[None,None],
'3':[None,None],
'Bio Ref':['2-50','15-100'],
'None':[None,None]})
df_chunk
Waste Test 2 3 Bio Ref None
0 None something None None 2-50 None
1 None something None None 15-100 None
I have tried below codes that work:
df_chunk.columns.str.startswith('Bio')
df_chunk[df_chunk.columns[pd.Series(df_chunk.columns).str.startswith('Bio')==1]]
Issue: But when I try to use them for multiple column Selection then it doesn’t work:
df_chunk.loc[:,'Test':df_chunk.columns.str.startswith('Bio')]
>Solution :
You can creates masks for boolean indexing:
m1 = np.maximum.accumulate(df_chunk.columns=='Test')
# array([False, True, True, True, True, True])
m2 = np.maximum.accumulate(df_chunk.columns.str.startswith('Bio')[::-1])[::-1]
# array([ True, True, True, True, True, False])
# m1 & m2
# array([False, True, True, True, True, False])
out = df_chunk.loc[:, (m1&m2)]
Or identify the correct names to build a slice:
start = 'Test'
end = next(iter(df_chunk.columns[df_chunk.columns.str.startswith('Bio')]), None)
out = df_chunk.loc[:, slice(start, end)]
Output:
Test 2 3 Bio Ref
0 something None None 2-50
1 something None None 15-100