How can I filter a dataframe by columns for the first time a string is not present?

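I’m trying to subset a DataFrame so that it keeps the first question’s column plus the "Unnamed" columns that follow it, and stops before the next column whose name does not contain "Unnamed".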

Here’s an example:

import pandas as pd

# Missing answers are stored as the literal string 'nan' in this example
data = {'What is your favorite fruit': ['Banana', 'nan', 'Banana', 'nan', 'nan'],
        'Unnamed:12': ['nan', 'Strawberry', 'nan', 'nan', 'nan'],
        'Unnamed:13': ['nan', 'nan', 'nan', 'Blueberry', 'Blueberry'],
        'What is your favorite vegetable?': ['Carrot', 'nan', 'nan', 'nan', 'Carrot']}

df = pd.DataFrame(data)

df

What I want is to keep only the first 3 columns and exclude the new question (the fourth column). In my actual file the number of columns between questions varies, so a hard-coded iloc slice won’t work.

Solution:

To get every column up to and including the last column whose name contains "Unnamed", try:

>>> df.iloc[:, :max(i for i, c in enumerate(df.columns) if "Unnamed" in c)+1]

  What is your favorite fruit  Unnamed:12 Unnamed:13
0                      Banana         nan        nan
1                         nan  Strawberry        nan
2                      Banana         nan        nan
3                         nan         nan  Blueberry
4                         nan         nan  Blueberry
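
Note that the one-liner above slices up to the last "Unnamed" column in the whole frame, so it would also pick up later questions if they have their own "Unnamed" columns. A minimal sketch of an alternative that stops at the first non-"Unnamed" column after the first one is shown below. The helper name `first_question_block` and the extra `Unnamed:15` column are hypothetical, added only to illustrate that case.

```python
import pandas as pd

data = {
    "What is your favorite fruit": ["Banana", "nan", "Banana", "nan", "nan"],
    "Unnamed:12": ["nan", "Strawberry", "nan", "nan", "nan"],
    "Unnamed:13": ["nan", "nan", "nan", "Blueberry", "Blueberry"],
    "What is your favorite vegetable?": ["Carrot", "nan", "nan", "nan", "Carrot"],
    # Hypothetical extra column, not in the original example: it belongs to
    # the second question and should not end up in the first block.
    "Unnamed:15": ["nan", "Potato", "nan", "nan", "nan"],
}
df = pd.DataFrame(data)


def first_question_block(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the first column plus the run of 'Unnamed' columns after it."""
    cols = list(frame.columns)
    # Index of the first column (after position 0) whose name lacks "Unnamed";
    # fall back to len(cols) when every later column is "Unnamed".
    stop = next(
        (i for i, c in enumerate(cols) if i > 0 and "Unnamed" not in str(c)),
        len(cols),
    )
    return frame.iloc[:, :stop]


block = first_question_block(df)
print(list(block.columns))
# ['What is your favorite fruit', 'Unnamed:12', 'Unnamed:13']
```

On the original four-column example both approaches return the same three columns; they differ only when a later question is followed by its own "Unnamed" columns.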