I imported data from a CSV into a pandas DataFrame. I removed the non-numeric characters from the strings (the "$" in front of every value) and then converted the columns to a float data type. Running print(df.dtypes) after the conversion shows every column as float64. After the print statement I try to subtract one column from another, but I get this error:
line 23, in <module>
Price_Diff = df["HTB_Price" - "McMaster_Price"]
TypeError: unsupported operand type(s) for -: 'str' and 'str'
Here is my code:
import pandas as pd
import matplotlib.pyplot as mp
import numpy as np
# Reads the csv and create a dataframe titled "df"
df = pd.read_csv('Example Price Dataset.csv', sep=r'\s*,\s*', engine='python')
# Removes the "$" from all columns using a left strip
df['HTB_Price'] = df['HTB_Price'].map(lambda x: x.lstrip('$'))
df['McMaster_Price'] = df['McMaster_Price'].map(lambda x: x.lstrip('$'))
df['Motion_Price'] = df['Motion_Price'].map(lambda x: x.lstrip('$'))
df['MRO_Price'] = df['MRO_Price'].map(lambda x: x.lstrip('$'))
# Converts each column to a float datatype instead of a string
df["HTB_Price"] = df["HTB_Price"].astype(float)
df["McMaster_Price"] = df["McMaster_Price"].astype(float)
df["Motion_Price"] = df["Motion_Price"].astype(float)
df["MRO_Price"] = df["MRO_Price"].astype(float)
print(df.dtypes)
#
Price_Diff = df["HTB_Price" - "McMaster_Price"]
# Prints the dataframe
# print(df.dtypes)
The error is on the Price_Diff line, and I’m not sure why it is throwing an error about not being able to subtract strings from each other, when right before that line I’m checking the data types and it says they are both floats.
I’m expecting the values in each column to be subtracted and placed in the variable Price_Diff
>Solution :
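The problem is the line that calculates Price_Diff. The expression inside the brackets, "HTB_Price" - "McMaster_Price", subtracts two string literals (the column names), and Python evaluates it before pandas ever looks up a column. That is why the TypeError mentions 'str' even though df.dtypes reports float64 for both columns. Index each column separately and subtract the resulting Series: df["HTB_Price"] - df["McMaster_Price"]. Here’s the corrected code: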
# Calculates the price difference between two columns
Price_Diff = df["HTB_Price"] - df["McMaster_Price"]
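To see both the failure and the fix without the CSV, here is a minimal sketch. The sample prices are made up for illustration and only two of the four columns are included; the loop over price_cols is just one possible way to shorten the repeated strip-and-convert lines, not a required change.

```python
import pandas as pd

# Hypothetical sample data standing in for 'Example Price Dataset.csv'
df = pd.DataFrame({
    "HTB_Price": ["$10.50", "$7.25", "$3.00"],
    "McMaster_Price": ["$9.00", "$8.25", "$2.50"],
})

# Strip the leading "$" and convert each price column to float
price_cols = ["HTB_Price", "McMaster_Price"]
for col in price_cols:
    df[col] = df[col].str.lstrip("$").astype(float)

# The original expression subtracts two string literals, so Python raises
# TypeError before pandas ever sees a column name
try:
    df["HTB_Price" - "McMaster_Price"]
except TypeError as exc:
    error_message = str(exc)

# Index each column first, then subtract the two float Series
price_diff = df["HTB_Price"] - df["McMaster_Price"]

print(error_message)
print(price_diff.tolist())
```

If the difference should live alongside the other data, it can also be assigned as a new column, for example df["Price_Diff"] = price_diff.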