Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filtering on multiple columns using iloc or loc

I have a dataframe I built by scraping multiple excel files. one of the problems is that the header from the excel file (the column names) gets repeated for every file that gets concatenated into the dataframe.

I would like to drop the rows where the value is equal to the column names.

The line below gets me the correct boolean mask but I just can’t seem to find a clean way to filter the data frame with the below condition

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

(df.iloc[:,0:13] == df.columns[0:13])

I’ve solved the issue with the below lines but I was hoping there was a cleaner or more pythonic way to write it because it feels like I’m missing something.

df1 = df[(df[(df.iloc[:,:13] == df.columns[:13])]).notna()].iloc[:,:13].copy()
df = df.drop(df1[df1['Name'].notnull()].index)

See the below sample table:

sample = pd.DataFrame(data = [['name', 'description', 'qty', 'units', 'wall height', 'xyz'],
                     [17,  8, 34,  5, 41,'xyz'],
                     [23, 16, 25, 43, 48,'abc'],
                     [15, 44, 21,  9,  3,'cde'],
                     [14, 16, 40, 26,  6, 'efg'],
                     ['name', 'description', 'qty', 'units', 'wall height','efg'],
                    ], 
             columns=['name', 'description', 'qty', 'units', 'wall height','address'])

With the above sample I would be trying to filter out index locations 0 and 5 where columns 0:5 are equal to column name 0:5. Note that column 5(‘address’) with never match any values in the field.

Hopefully someone has a more pythonic way to solve me above problem.

TIA

>Solution :

IIUC, you can filter the dataframe using boolean indexing:

mask = (sample.iloc[:, 0:5] == sample.columns[0:5]).all(axis=1)
print(sample[~mask])

Prints:

  name description qty units wall height address
1   17           8  34     5          41     xyz
2   23          16  25    43          48     abc
3   15          44  21     9           3     cde
4   14          16  40    26           6     efg
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading