Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to create a column of incrementing string labels based on values in a column?

I have a dataframe with three timeseries data columns. I want to add labels to the dataframe based on the binary values in one of the columns that increments. Below is a demonstration of the output that I desire (‘Rate labels’ built based on ‘Rate pct change’). I made this using excel but want to do it using python.

            Rate pct change      Sharpe Ratio PCA   long_only_ew    Rate_labels
28/06/2019        -1               0.000000             0.024448    neg rates 1
01/07/2019         1               0.000000             0.030487    pos rates 1
02/07/2019        -1               0.000000             0.036835    neg rates 2
03/07/2019        -1               0.000000             0.054662    neg rates 2
05/07/2019        -1               0.000000             0.055340    neg rates 2
08/07/2019         1               0.000000             0.050585    pos rates 2
09/07/2019         1               0.000000             0.059735    pos rates 2
10/07/2019         1               0.000000             0.064335    pos rates 2
11/07/2019         1               0.000000             0.066124    pos rates 2
12/07/2019         1              -0.002202             0.072657    pos rates 2
15/07/2019        -1               0.003897             0.074136    neg rates 3
16/07/2019        -1               0.003278             0.071436    neg rates 3
17/07/2019         1               0.012141             0.072065    pos rates 3
18/07/2019         1               0.007214             0.074099    pos rates 3
19/07/2019         1               0.006617             0.073397    pos rates 3
22/07/2019         1               0.009760             0.078266    pos rates 3
23/07/2019         1               0.003645             0.075539    pos rates 3
24/07/2019         1               0.016116             0.085452    pos rates 3
25/07/2019         1               0.007491             0.075281    pos rates 3
26/07/2019         1               0.016323             0.090989    pos rates 3
29/07/2019         1               0.011050             0.077088    pos rates 3
30/07/2019         1               0.011531             0.073027    pos rates 3

Rate pct change (interest rates) was derived based on whether (rates were > 0 == 1) and (rates were <0 == -1)

How would I go about creating ‘Rate_labels’ using python?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Try using a combination of shift and cumsum:

df['Rate_labels'] = df['Rate pct change'].lt(0).map({True:'neg', False:'pos'}) + ' rates ' + (df['Rate pct change'].shift(1).ne(df['Rate pct change']) & df['Rate pct change'].eq(-1)).cumsum().astype(str)

Output:

>>> df
            Rate pct change  Sharpe Ratio PCA  long_only_ew  Rate_labels
28/06/2019               -1          0.000000      0.024448  neg rates 1
01/07/2019                1          0.000000      0.030487  pos rates 1
02/07/2019               -1          0.000000      0.036835  neg rates 2
03/07/2019               -1          0.000000      0.054662  neg rates 2
05/07/2019               -1          0.000000      0.055340  neg rates 2
08/07/2019                1          0.000000      0.050585  pos rates 2
09/07/2019                1          0.000000      0.059735  pos rates 2
10/07/2019                1          0.000000      0.064335  pos rates 2
11/07/2019                1          0.000000      0.066124  pos rates 2
12/07/2019                1         -0.002202      0.072657  pos rates 2
15/07/2019               -1          0.003897      0.074136  neg rates 3
16/07/2019               -1          0.003278      0.071436  neg rates 3
17/07/2019                1          0.012141      0.072065  pos rates 3
18/07/2019                1          0.007214      0.074099  pos rates 3
19/07/2019                1          0.006617      0.073397  pos rates 3
22/07/2019                1          0.009760      0.078266  pos rates 3
23/07/2019                1          0.003645      0.075539  pos rates 3
24/07/2019                1          0.016116      0.085452  pos rates 3
25/07/2019                1          0.007491      0.075281  pos rates 3
26/07/2019                1          0.016323      0.090989  pos rates 3
29/07/2019                1          0.011050      0.077088  pos rates 3
30/07/2019                1          0.011531      0.073027  pos rates 3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading