I have a look up table as a dataframe (1000 rows) consisting of codes and labels. I have another dataframe (2,00,000 rows) consisting of codes and geometries.
I need to get label names for each corresponding code by looking in the look up dataframe.
Output should be dataframe.
I tried it as follows.
df = pd.read_csv(filepath)
codes = df['codes'].values
labels = df['labels'].values
df2 = pd.read_csv(filepath)
print (df2.shape)
for ix in df2.index:
code = df2.loc[ix, 'code']
df2.loc[ix, 'label'] = labels[codes==code][0]
print (df2)
Result is correct, but it’s very slow… for looping is very slow
Can you help me?
>Solution :
You should use the merge method of DataFrames (https://pandas.pydata.org/docs/reference/api/pandas.merge.html). It allows to join two dataframes based on a common column. Your code should look like this:
df2 = df2.merge(df, left_on="code", right_on="codes", how="left")
# Check labels using df2["labels"]
The common column name is specified in the parameters left_on and right_on. The parameter how='left' indicates that all the rows from df2 are preserved even if there is no code for a row.