I have two csv files lookup.csv and data.csv. I’m converting lookup.csv as dictionary and need to add new column in data.csv based on the column.
Input:
lookup.csv
1 first
2 second
...
data.csv
101 NYC 1
202 DC 2
Expected output:
data.csv
col1 col2 col3 col4
101 NYC 1 first
202 DC 2 second
...
Here for the first row new column col4 has first because the col3 has 1 and it’s corresponding value in lookup.csv is first.
I tried the below logic but failing here:
df = pd.read_csv("lookup.csv",header=None, index_col=0, squeeze=True).to_dict()
df1 = pd.read_csv("data.csv")
df1['col4'] = df.get(df1['col3'])
Error: TypeError: unhashable type: 'Series'
Can someone please help in resolving this issue?
>Solution :
get method expects a hashable key (i.e., a single value), but df1['col3'] is a Series object. Try apply method:
import pandas as pd
lookup_dict = pd.read_csv("lookup.csv", header=None, index_col=0).squeeze("columns").to_dict()
data_df = pd.read_csv("data.csv", header=None, index_col=False)
data_df.columns = ['col1', 'col2', 'col3']
data_df['col4'] = data_df['col3'].apply(lambda x: lookup_dict.get(x))
print(data_df)
Output:
col1 col2 col3 col4
0 101 NYC 1 first
1 202 DC 2 second