Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Dataframe: create new columns and assign its values from existing column's values

I have a Dataset downloaded from Kaggle for my project, I would like to seek help in creating new columns and assigning its values based on an existing column.

My actual Dataset is complicated, I will give a similar but simpler dataset for easy discussion.

Input:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Month  | Fruit  | Weight
------- -------- --------
1-2020 | Orange | 0.2
1-2020 | Kiwi   | 0.9
2-2020 | Orange | 2.1
2-2020 | Kiwi   | 1.4
...... | .....  | ...

To be able to create a required line chart, I need to change this Dataset structure, making "Orange", "Kiwi" new columns with Weight values, so that "Month' is not repeated.

Desired output:

    Month  | Orange| Kiwi
    ------- -------- --------
    1-2020 | 0.2   | 0.9
    2-2020 | 2.1   | 1.4

Any help would be greatly appreciated

>Solution :

The basic thing you’re looking for is a pivot:

import pandas as pd

data = {
    'Month': ['1-2020', '1-2020', '2-2020', '2-2020'],
    'Fruit': ['Orange', 'Kiwi', 'Orange', 'Kiwi'],
    'Weight': [0.2, 0.9, 2.1, 1.4]
}
df = pd.DataFrame(data)

pivot_df = df.pivot_table(index='Month', columns='Fruit', values='Weight')

print(pivot_df)

Output:

Fruit   Kiwi  Orange
Month               
1-2020   0.9     0.2
2-2020   1.4     2.1

If you want Month as a column instead of the index:

pivot_df.reset_index(inplace=True)
pivot_df.columns.name = None

print(pivot_df)

Output:

    Month  Kiwi  Orange
0  1-2020   0.9     0.2
1  2-2020   1.4     2.1

If there are duplicate combinations of Month and Fruit in the original data, and you want to sum them (for example):

data = {
    'Month': ['1-2020', '1-2020', '1-2020', '2-2020', '2-2020'],
    'Fruit': ['Orange', 'Kiwi', 'Kiwi', 'Orange', 'Kiwi'],
    'Weight': [0.2, 0.9, 1.1, 2.1, 1.4]
}
df = pd.DataFrame(data)

pivot_df = df.pivot_table(index='Month', columns='Fruit', values='Weight', aggfunc='sum')
print(pivot_df)

Output:

Fruit   Kiwi  Orange
Month               
1-2020   2.0     0.2
2-2020   1.4     2.1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading