I have the following dataframe:
import pandas as pd
import numpy as np
import yfinance as yf
from pandas_datareader import data as pdr
from datetime import date, timedelta

yf.pdr_override()

end = date.today()
start = end - timedelta(days=7300)

# download dataframe
data = pdr.get_data_yahoo('^GSPC', start=start, end=end)

Now that I have the dataframe, I want to create a function that adds the logarithmic return of a column to the dataframe called ‘data’, which I currently compute with the following code:
data['log_return'] = np.log(data['Adj Close'] / data['Adj Close'].shift(1))
I think the function should look something like this:
def add_log_return(df):
    # add returns in a logarithmic fashion
    added = df.copy()
    added["log_return"] = np.log(df[column] / df[column].shift(1))
    added["log_return"] = added["log_return"].apply(lambda x: x*100)
    return added
How can I pass a specific column as an input to the function, e.g. add_log_return(df['Adj Close']), so that the function adds the logarithmic return to my ‘data’ dataframe?
data = add_log_return(df['Adj Close'])
Just add a column argument to your function!
def add_log_return(df, column):
    # add returns in a logarithmic fashion
    added = df.copy()
    added["log_return"] = np.log(df[column] / df[column].shift(1)) * 100
    return added

new_df = add_log_return(old_df, 'Adj Close')
Note that I removed the line in your function that applied a lambda just to multiply by 100. It’s much faster to do this in a vectorized manner, by including it in the same expression as the np.log call.
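As a rough check that the two approaches agree, here is a minimal sketch. The price values are made up for illustration and are not from the downloaded data.

```python
import numpy as np
import pandas as pd

# Hypothetical price series, made up for illustration only.
prices = pd.Series([100.0, 101.0, 99.5, 102.0], name="Adj Close")

log_ret = np.log(prices / prices.shift(1))

# Element-wise version: calls a Python lambda once per row.
scaled_apply = log_ret.apply(lambda x: x * 100)

# Vectorized version: a single operation on the whole Series.
scaled_vectorized = log_ret * 100

# Both give the same values (the first entry is NaN in each).
pd.testing.assert_series_equal(scaled_apply, scaled_vectorized)
```

The results are identical, so the lambda only adds per-row Python overhead.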
However, if I were you, I’d just return the Series object instead of copying the dataframe and modifying and returning the copy.
def log_return(col: pd.Series) -> pd.Series:
    return np.log(col / col.shift(1)) * 100
Now, the caller can do what they want with it:
df['log_ret'] = log_return(df['Adj Close'])
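If you want to try this without downloading anything, the sketch below runs the same idea on a tiny hand-made dataframe. The prices are hypothetical stand-ins for the Yahoo data, and the function is annotated as returning a Series on the assumption that np.log applied to a Series gives a Series back.

```python
import numpy as np
import pandas as pd


def log_return(col: pd.Series) -> pd.Series:
    """Percentage log return of each value relative to the previous one."""
    return np.log(col / col.shift(1)) * 100


# Hypothetical prices standing in for the downloaded data.
df = pd.DataFrame({"Adj Close": [100.0, 110.0, 99.0]})
df["log_ret"] = log_return(df["Adj Close"])

# The first row has no previous price, so its return is NaN.
first_is_nan = bool(np.isnan(df["log_ret"].iloc[0]))

# Second row: 100 * ln(110 / 100)
second = df["log_ret"].iloc[1]
```

Because the first row is always NaN, the caller may want to drop it afterwards, for example with df.dropna(subset=["log_ret"]).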