Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Derive a Dataframe from existing one

I need some help here. In this DataFrame I need to perform a series of calculations by grouping column’cnpj’

What I have as input:

    cnpj    valor_cota     ln_ativo    ln_IBX        CDI    tracking
0   cat1    1114.7521   0.027152    -0.003659   23.893879   0.030811
1   cat1    1135.3557   0.018314    0.003691    23.904737   0.014623
2   cat1    1149.3381   0.012240    0.007960    23.915600   0.004281
3   cat2    500.0000    0.002002    -0.003659   23.893879   0.005661
4   cat2    501.0000    0.001998    0.003691    23.904737   -0.001693
5   cat2    510.0000    0.017805    0.007960    23.915600   0.009845
6   cat3    10000.0000  -0.006976   -0.003659   23.893879   -0.003317
7   cat3    10050.0000  0.004988    0.003691    23.904737   0.001297
8   cat3    10100.0000  0.004963    0.007960    23.915600   -0.002997

What I expect as an output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    cnpj    ret_5_CDI   t_5_Asset     anual_5         vol       alfa        beta       sharpe   tracking_err          R^2
0   cat1     0,000909    0,031026    0,006130    0,119035    0,022631   -1,274627    0,253007       0,013372     0,997950 
1   cat2     0,000909    0,020000    0,003968    0,144852    0,004041    1,211459    0,131796       0,005841     0,608783 
2   cat3     0,000909    0,010000    0,001992    0,109531    0,003323    1,093127    0,082999       0,002576     0,866884 

Here is what I got so far:

import numpy as np
import pandas as pd
from statistics import stdev

#Retorno CDI no periodo 5 anos (Dataframe)
def v1(g):
    first, last = g['CDI'].values[[0,-1]]
    return last/first-1
#Retorno Ativo no periodo 5 anos (Dataframe)
def v2(g):
    first, last = g['valor_cota'].values[[0,-1]]
    return last/first-1
#Retorno do ativo anualizado 5 anos
def v3(g):
    first, last = g['valor_cota'].values[[0,-1]]
    result = last/first-1
    return ((1+result)**(1/5))-1
#Volatilidade do ativo 5 anos
def v4(g):
    return stdev(g['ln_ativo'])*(252**0.5)   
#Alfa
def v5(g):
    slope, intercept = np.polyfit(g['ln_IBX'], g['ln_ativo'], 1)
    return intercept
#Beta
def v6(g):
    slope, intercept = np.polyfit(g['ln_IBX'], g['ln_ativo'], 1)
    return slope
#Sharpe
def v7(g):
    return (v2(g)-v1(g))/v4(g)
#Tracking Error
def v8(g):
    return stdev(g['tracking'])         
# R²
def v9(g):
    slope, intercept = np.polyfit(g['ln_ativo'], g['ln_IBX'], 1)
    r_squared = 1 - (sum((g['ln_IBX'] - (slope * g['ln_ativo'] + intercept))**2) / ((len(g['ln_IBX']) - 1) * np.var(g['ln_IBX'], ddof=1)))
    return r_squared

data = {'cnpj': ['cat1', 'cat1', 'cat1', 'cat2', 'cat2', 'cat2', 'cat3', 'cat3', 'cat3'],
        'valor_cota': [1114.7521, 1135.3557, 1149.3381, 500, 501, 510, 10000, 10050, 10100 ],
        'ln_ativo': [0.02715207, 0.01831395, 0.01224022, 0.00200200, 0.00199800, 0.01780462, -0.00697561, 0.00498754, 0.00496279 ],
        'ln_IBX': [-0.00365884, 0.00369080, 0.00795953, -0.00365884, 0.00369080, 0.00795953, -0.00365884, 0.00369080, 0.00795953 ],
        'CDI' : [23.89387945, 23.90473731, 23.9156001, 23.89387945, 23.90473731, 23.9156001, 23.89387945, 23.90473731, 23.9156001 ],
        'tracking' : [0.030810905, 0.014623144, 0.004280688, 0.005660841, -0.0016928, 0.009845095, -0.00331678, 0.001296739, -0.00299674]
    }
df =pd.DataFrame(data)

df.groupby('cnpj').apply(v1).reset_index()
df.groupby('cnpj').apply(v2).reset_index()
df.groupby('cnpj').apply(v3).reset_index()
df.groupby('cnpj').apply(v4).reset_index()
df.groupby('cnpj').apply(v5).reset_index()
df.groupby('cnpj').apply(v6).reset_index()
df.groupby('cnpj').apply(v7).reset_index()
df.groupby('cnpj').apply(v8).reset_index()
df.groupby('cnpj').apply(v9).reset_index()

I would like each call of these functions as a column in the result DataFrame as noted as expected result above. Hope to have achieved best practice questioning.

>Solution :

You can pass multiple functions to apply; then construct a DataFrame with the result:

grouped = df.groupby('cnpj').apply(lambda g: [v1(g),v2(g),v3(g),v4(g),v5(g),v6(g),v7(g),v8(g),v9(g)])
out = pd.DataFrame(grouped.tolist(), index=grouped.index, 
                   columns=['ret_5_CDI','t_5_Asset','anual_5','vol','alfa',
                            'beta','sharpe','tracking_err','R^2']).reset_index()

Output:

   cnpj  ret_5_CDI  t_5_Asset   anual_5       vol      alfa      beta    sharpe  tracking_err       R^2
0  cat1   0.000909   0.031026  0.006130  0.119035  0.022631 -1.274627  0.253007      0.013372  0.997950
1  cat2   0.000909   0.020000  0.003968  0.144852  0.004041  1.211459  0.131797      0.005841  0.608784
2  cat3   0.000909   0.010000  0.001992  0.109531 -0.001920  1.093126  0.082999      0.002576  0.866884
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading