Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

I have an excel sheet which has four columns in which each cell of the column is related to the next columns cell in a cascaded manner

I have an excel sheet which has four columns in which each cell of the column is related to the next columns cell in a cascaded manner.

The excel sheet looks like below:-

  Base Version   OS Package Name             Description  Version
0            A  NaN          NaN                     NaN      NaN
1          NaN    B          NaN                     NaN      NaN
2          NaN  NaN     b-01.zip  description about B-01      NaN
3          NaN  NaN     b-02.zip  description about B-02      NaN
4            X  NaN          NaN                     NaN      NaN
5          NaN    Y          NaN                     NaN      NaN
6          NaN  NaN     y-01.zip  description about Y-01      NaN
7          NaN  NaN     y-02.zip  description about Y-02      NaN

I want to have a DataFrame that should look like below when output to an excel sheet.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  Base Version   OS Package Name             Description  Version
2            A    B     b-01.zip  description about B-01      NaN
3          NaN  NaN     b-02.zip  description about B-02      NaN
6            X    Y     y-01.zip  description about Y-01      NaN
7          NaN  NaN     y-02.zip  description about Y-02      NaN

Is there a way in Pandas to achieve this?

>Solution :

You can use ffill and boolean indexing:

# identify rows with data
m = df['Package Name'].notna()

# ffill and keep first value
df[['Base Version', 'OS']] = df[['Base Version', 'OS']].ffill()[m].drop_duplicates()

# only select the rows of interest
df = df[m]

Alternative if you want a copy:

m = df['Package Name'].notna()
out = df[m]
out.loc[:, ['Base Version', 'OS']] = (df[['Base Version', 'OS']].ffill()
                                      [m].drop_duplicates()
                                     )

Output:

  Base Version   OS Package Name             Description  Version
2            A    B     b-01.zip  description about B-01      NaN
3          NaN  NaN     b-02.zip  description about B-02      NaN
6            X    Y     y-01.zip  description about Y-01      NaN
7          NaN  NaN     y-02.zip  description about Y-02      NaN

Reproducible input:

from numpy import nan

df = pd.DataFrame({'Base Version': ['A', nan, nan, nan, 'X', nan, nan, nan],
                   'OS': [nan, 'B', nan, nan, nan, 'Y', nan, nan],
                   'Package Name': [nan, nan, 'b-01.zip', 'b-02.zip', nan, nan, 'y-01.zip', 'y-02.zip'],
                   'Description': [nan, nan, 'description about B-01', 'description about B-02', nan, nan, 'description about Y-01', 'description about Y-02'],
                   'Version': [nan, nan, nan, nan, nan, nan, nan, nan]})
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading