Creating a new column in a DataFrame from the values of another column in the same DataFrame

I am a scientific researcher and a beginner in Python.

I am trying to make a new column in the following dataframe:

                            x      y      z   bat      gradient
date                                                       
2022-04-15 10:17:14.721  0.125  0.016  1.032  NaN    0.0320
2022-04-15 10:17:39.721  0.125 -0.016  1.032  NaN    0.0000
2022-04-15 10:18:04.721  0.125  0.016  1.032  NaN    0.0000
2022-04-15 10:18:29.721  0.125 -0.016  1.032  NaN    0.0000
2022-04-15 10:18:54.721  0.125  0.016  1.032  NaN    0.0160
                       ...    ...    ...  ...       ...
2022-05-02 17:03:04.721 -0.750 -0.016  0.710  NaN    0.7855
2022-05-02 17:03:29.721 -0.750 -0.016  0.710  NaN    1.4420
2022-05-02 17:03:54.721  0.719 -0.302 -0.419  NaN    0.8690
2022-05-02 17:04:19.721 -0.625 -0.048 -0.871  NaN    1.1965
2022-05-02 17:04:44.721 -0.969  0.016 -0.032  NaN    1.2470

I also have certain limits/intervals (whiskers from a boxplot):

limit_start_A = 0.15
limit_end_A = 0.20

limit_start_B = 0.20
limit_end_B = 0.40

limit_start_C = 0.40
limit_end_C = 0.90

limit_start_D = 0.90
limit_end_D = 1.1

I would like to create a new column named "result" based on the values in the "gradient" column. For example, when a gradient value falls within the interval from limit_start_B to limit_end_B, that row should get the letter "B" in the new "result" column.

Thank you for reading!

>Solution :

Don't use so many variables; use a list and pandas.cut instead:

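import pandas as pd

# Bin edges: each adjacent pair is one interval, e.g. (0.15, 0.20] -> 'A'
limits = [0.15, 0.20, 0.40, 0.90, 1.1]
labels = ['A', 'B', 'C', 'D']

# Values outside the outermost edges get NaN
df['result'] = pd.cut(df['gradient'], bins=limits, labels=labels)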

Output (elided rows omitted):

                             x      y      z  bat  gradient result
date                                                              
2022-04-15 10:17:14.721  0.125  0.016  1.032  NaN    0.0320    NaN
2022-04-15 10:17:39.721  0.125 -0.016  1.032  NaN    0.0000    NaN
2022-04-15 10:18:04.721  0.125  0.016  1.032  NaN    0.0000    NaN
2022-04-15 10:18:29.721  0.125 -0.016  1.032  NaN    0.0000    NaN
2022-04-15 10:18:54.721  0.125  0.016  1.032  NaN    0.0160    NaN
2022-05-02 17:03:04.721 -0.750 -0.016  0.710  NaN    0.7855      C
2022-05-02 17:03:29.721 -0.750 -0.016  0.710  NaN    1.4420    NaN
2022-05-02 17:03:54.721  0.719 -0.302 -0.419  NaN    0.8690      C
2022-05-02 17:04:19.721 -0.625 -0.048 -0.871  NaN    1.1965    NaN
2022-05-02 17:04:44.721 -0.969  0.016 -0.032  NaN    1.2470    NaN
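
One detail worth checking is how pandas.cut treats values that sit exactly on a limit. The sketch below is self-contained and uses a small hypothetical DataFrame (the sample gradient values are made up for illustration, not taken from the question's data). It assumes the default behaviour, where each interval is open on the left and closed on the right, and compares it with right=False:

```python
import pandas as pd

# Hypothetical sample values, chosen to hit the limits and fall outside them
df = pd.DataFrame({"gradient": [0.016, 0.15, 0.20, 0.7855, 1.1, 1.4420]})

limits = [0.15, 0.20, 0.40, 0.90, 1.1]
labels = ["A", "B", "C", "D"]

# Default (right=True): intervals are (0.15, 0.20], (0.20, 0.40], ...
df["result"] = pd.cut(df["gradient"], bins=limits, labels=labels)

# right=False: intervals become [0.15, 0.20), [0.20, 0.40), ...
df["result_left_closed"] = pd.cut(
    df["gradient"], bins=limits, labels=labels, right=False
)

print(df)
```

With the defaults, a gradient of exactly 0.15 gets NaN (the first interval excludes its left edge) and 0.20 is labelled "A"; with right=False, 0.15 is labelled "A", 0.20 is labelled "B", and 1.1 falls outside the last interval. Which variant fits depends on how the whisker limits are meant to be read.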