Apply Levenshtein distance from rapidfuzz.distance to dataframe with two columns

July 11, 2022

I have a CSV file that looks as follows:

ID; name1; name2
1; John Doe; John Does
2; Mike Johnson; Mike Jonson
3; Leon Mill; Leon Miller
4; Jack Jo; Jack Joe

Now I want to calculate the Levenshtein distance for each pair of names and put it into a new column: first compare "John Doe" with "John Does", then "Mike Johnson" with "Mike Jonson", and so on. The output should look like this:

ID; name1; name2;ld
1; John Doe; John Does;1
2; Mike Johnson; Mike Jonson;1
3; Leon Mill; Leon Miller;2
4; Jack Jo; Jack Joe;1
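The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. That is why "Leon Mill" vs "Leon Miller" scores 2 (two inserted letters). The following is a minimal pure-Python sketch of the classic dynamic-programming algorithm, for illustration only. The function name `levenshtein` is hypothetical and not part of rapidfuzz. It reproduces the ld column above:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Wagner-Fischer dynamic programming, keeping only one previous row."""
    # Row 0: distance between the empty prefix of a and b[:j] is j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string: i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


pairs = [
    ("John Doe", "John Does"),
    ("Mike Johnson", "Mike Jonson"),
    ("Leon Mill", "Leon Miller"),
    ("Jack Jo", "Jack Joe"),
]
for name1, name2 in pairs:
    print(name1, "|", name2, "|", levenshtein(name1, name2))
```

Keeping a single previous row means memory grows with len(b) instead of len(a) * len(b). For real data, rapidfuzz's Levenshtein.distance is the better choice.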

I tried it (see How do I calculate the Levenshtein distance between two Pandas DataFrame columns?) as follows:

from rapidfuzz.distance import Levenshtein
import pandas as pd

df = pd.read_csv(r'C:\Users\myuser\Downloads\Testfile.csv', sep=";")
print(df)

df['ld']=df.apply(lambda x: Levenshtein.distance(df['name1'], df['name2']), axis=1)

But I am getting an error:

KeyError: 'name1'

Where is my mistake?
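One likely cause of this particular KeyError is the file format, assuming the real file looks like the sample with a space after every semicolon. `sep=";"` splits only on the semicolon, so pandas keeps the space and the header becomes `' name1'` rather than `'name1'`. The sketch below uses `io.StringIO` as a stand-in for Testfile.csv. It shows the column names with and without the `skipinitialspace=True` option of `pd.read_csv`:

```python
import io

import pandas as pd

# Stand-in for Testfile.csv, copied from the sample above (note the space after each ';').
SAMPLE = """ID; name1; name2
1; John Doe; John Does
2; Mike Johnson; Mike Jonson
3; Leon Mill; Leon Miller
4; Jack Jo; Jack Joe
"""

# sep=";" alone keeps the leading spaces, so the header is ' name1', not 'name1'.
raw = pd.read_csv(io.StringIO(SAMPLE), sep=";")
print(list(raw.columns))  # ['ID', ' name1', ' name2']

# skipinitialspace=True drops whitespace right after each delimiter,
# in the header and in the values alike.
clean = pd.read_csv(io.StringIO(SAMPLE), sep=";", skipinitialspace=True)
print(list(clean.columns))  # ['ID', 'name1', 'name2']
```

For the real file, the same option applies to the call with the Windows path. Stripping the names after reading with `df.columns = df.columns.str.strip()` also works, but it leaves the leading space in the values.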

Solution:

With axis=1, apply calls the lambda once per row and passes that row in as x, so read the names from x instead of from the whole DataFrame df:

df['ld']=df.apply(lambda x: Levenshtein.distance(x['name1'], x['name2']), axis=1)

Published by MR on July 11, 2022
