Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Taking the column names from the first row that has less than x Nan's

I have data as follows:

import pandas as pd

url_cities="https://population.un.org/wup/Download/Files/WUP2018-F12-Cities_Over_300K.xls"
df_cities = pd.read_excel(url_cities)
print(df_cities.iloc[0:20,])

The column names can be found in row 15, but I would like this row number to be automatically determined. I thought the best way would be to take the first row for which the values are non-Nan for less than 10 items.

I combined this answer, to find this answer to do the following:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

amount_Nan = df_cities.shape[1] - df_cities.count(axis=1)
# OR df.isnull().sum(axis=1).tolist()

print(amount_Nan)
col_names_index = next(i for i in amount_Nan if i < 3)
print(col_names_index)

df_cities.columns = df_cities.iloc[col_names_index]

The problem is that col_names_index keeps returning 0, while it should be 15. I think it is because amount_Nan returns rows and columns because of which next(i for i in amount_Nan if i < 3) works differently than expected.

The thing is that I do not really understand why. Can anyone help?

>Solution :

IIUC you can get first index of non missing value per second column by DataFrame.iloc with Series.notna and Series.idxmax, set columns names by this row and filter out values before this row by index:

i = df_cities.iloc[:, 1].notna().idxmax()
              
df_cities.columns = df_cities.iloc[i].tolist()
df_cities = df_cities.iloc[i+1:]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading