Function to add a column based on the input from a specific column

I have the following dataframe:

import pandas as pd
import numpy as np
import yfinance as yf
from pandas_datareader import data as pdr
from datetime import date, timedelta

# Let pandas_datareader's get_data_yahoo() fetch its data through yfinance
yf.pdr_override()

end = date.today()
start = end - timedelta(days=7300)  # roughly 20 years back

# Download daily S&P 500 (^GSPC) prices into a DataFrame
data = pdr.get_data_yahoo('^GSPC', start=start, end=end)

Now that I have the dataframe, I want to write a function that adds the logarithmic return of a given column to the dataframe called 'data'. The calculation itself is:

data['log_return'] = np.log(data['Adj Close'] / data['Adj Close'].shift(1))
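To see what this line computes, here is a small sketch on a made-up price series (the prices are invented for illustration, not real S&P 500 data). The first row has no previous price, so shift(1) yields NaN there and the log return is NaN as well.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, used only for illustration
prices = pd.Series([100.0, 110.0, 99.0], name="Adj Close")

# Log return r_t = ln(P_t / P_{t-1}); the first value has no predecessor -> NaN
log_ret = np.log(prices / prices.shift(1))
print(log_ret)
```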

I think the function should look something like this:

def add_log_return(df):
    
    # add returns in a logarithmic fashion
    added = df.copy()
    added["log_return"] = np.log(df[column] / df[column].shift(1))
    added["log_return"] = added["log_return"].apply(lambda x: x*100)
    return added

How can I pass a specific column to add_log_return, e.g. add_log_return(df['Adj Close']), so that the function adds the logarithmic return to my 'data' dataframe? This is the call I tried:

data = add_log_return(df['Adj Close'])

Solution:

Just add a column parameter to your function and pass the whole dataframe plus the column name:

import numpy as np

def add_log_return(df, column):
    # add the percentage log return of df[column] as a new "log_return" column
    added = df.copy()
    added["log_return"] = np.log(df[column] / df[column].shift(1)) * 100
    return added

new_df = add_log_return(data, 'Adj Close')
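To check the function without downloading anything, here is a self-contained sketch that repeats add_log_return and applies it to a small hand-made DataFrame (the prices are invented and stand in for the Yahoo data). Because the function works on a copy, the input DataFrame is left unchanged.

```python
import numpy as np
import pandas as pd

def add_log_return(df, column):
    # Work on a copy so the caller's DataFrame is not modified
    added = df.copy()
    # Percentage log return, computed in one vectorized expression
    added["log_return"] = np.log(df[column] / df[column].shift(1)) * 100
    return added

# Hypothetical prices standing in for the downloaded data
data = pd.DataFrame({"Adj Close": [100.0, 105.0, 102.0]})
new_df = add_log_return(data, "Adj Close")

print(new_df)
print("log_return" in data.columns)  # False: the input was not modified
```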

Note that I removed the line that applied a lambda just to multiply by 100. It's much faster to do this in a vectorized way, by multiplying by 100 directly in the np.log(...) line.
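As a quick sanity check that dropping the lambda does not change the result, this sketch computes both forms on a made-up series and compares them. It only checks that the values agree; it does not measure speed.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, for illustration only
prices = pd.Series([100.0, 102.0, 101.0, 104.0])
raw = np.log(prices / prices.shift(1))

# One Python-level call per element, as in the question
via_apply = raw.apply(lambda x: x * 100)
# A single vectorized multiplication over the whole Series
vectorized = raw * 100

# equal_nan=True treats the leading NaN in both results as equal
print(np.allclose(via_apply, vectorized, equal_nan=True))
```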

However, if I were you, I'd just return the Series instead of copying the whole dataframe, modifying the copy, and returning it.

def log_return(col: pd.Series) -> pd.Series:
    # percentage log return between consecutive rows
    return np.log(col / col.shift(1)) * 100

Now, the caller can do what they want with it:

data['log_ret'] = log_return(data['Adj Close'])
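Here is a self-contained sketch of the Series-returning approach on invented prices. Since ln(P_t / P_{t-1}) = ln P_t - ln P_{t-1}, an equivalent formulation (not from the original answer) is np.log(col).diff() * 100; the sketch checks that the two agree.

```python
import numpy as np
import pandas as pd

def log_return(col: pd.Series) -> pd.Series:
    # Percentage log return between consecutive rows
    return np.log(col / col.shift(1)) * 100

# Hypothetical data standing in for the downloaded DataFrame
df = pd.DataFrame({"Adj Close": [100.0, 98.0, 103.0, 103.0]})
df["log_ret"] = log_return(df["Adj Close"])

# Alternative: difference of logs, since ln(a / b) = ln(a) - ln(b)
alt = np.log(df["Adj Close"]).diff() * 100
print(np.allclose(df["log_ret"], alt, equal_nan=True))
print(df)
```

Returning a Series keeps the function free of side effects, so the caller decides whether to store it as a column, plot it, or aggregate it.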