Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to select column range based on partial column names in Pandas Python?

I have pandas dataframe and I am trying to select multiple columns (column range starting from Test to Bio Ref). Selection has to start from column Name Test to any column name whose Name starts with Bio. Below is the sample dataframe.

In reality it can contain:

  1. any number of columns before Test column,
  2. any number of columns between Test & Bio Ref like 2,3,4,5 etc.
  3. any number of columns after Bio Ref.
  4. Bio Ref column can contain suffix in it but Bio Ref will be there as start of column name always.
df_chunk = pd.DataFrame({'Waste':[None,None],
              'Test':['something', 'something'],
              '2':[None,None],
              '3':[None,None],
              'Bio Ref':['2-50','15-100'],
              'None':[None,None]})

df_chunk
    Waste   Test           2      3  Bio Ref    None
0   None    something   None    None    2-50    None
1   None    something   None    None    15-100  None

I have tried below codes that work:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df_chunk.columns.str.startswith('Bio')

df_chunk[df_chunk.columns[pd.Series(df_chunk.columns).str.startswith('Bio')==1]]

Issue: But when I try to use them for multiple column Selection then it doesn’t work:

df_chunk.loc[:,'Test':df_chunk.columns.str.startswith('Bio')]

>Solution :

You can creates masks for boolean indexing:

m1 = np.maximum.accumulate(df_chunk.columns=='Test')
# array([False,  True,  True,  True,  True,  True])
m2 = np.maximum.accumulate(df_chunk.columns.str.startswith('Bio')[::-1])[::-1]
# array([ True,  True,  True,  True,  True, False])

# m1 & m2
# array([False,  True,  True,  True,  True, False])

out = df_chunk.loc[:, (m1&m2)]

Or identify the correct names to build a slice:

start = 'Test'
end = next(iter(df_chunk.columns[df_chunk.columns.str.startswith('Bio')]), None)

out = df_chunk.loc[:, slice(start, end)]

Output:

        Test     2     3 Bio Ref
0  something  None  None    2-50
1  something  None  None  15-100
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading