Why doesn't fuzzywuzzy's process.extractBests give a 100% score when the tested string 100% contains the query string?

I'm testing fuzzywuzzy's process.extractBests() as follows:

from fuzzywuzzy import process

# Define the query string
query = "Apple"

# Define the list of choices
choices = ["Apple", "Apple Inc.", "Apple Computer", "Apple Records", "Apple TV"]

# Call the process.extractBests function
results = process.extractBests(query, choices)

# Print the results
for result in results:
    print(result)

It outputs:

('Apple', 100)
('Apple Inc.', 90)
('Apple Computer', 90)
('Apple Records', 90)
('Apple TV', 90)

Why didn’t the scorer give 100 to all strings since they all 100% contain the query string ("Apple")?

I use fuzzywuzzy==0.18.0 with Python 3.11.7.

Solution:

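extractBests() does not test whether a choice contains the query; it ranks the choices by a similarity score. When you don't pass a scorer, it uses fuzz.WRatio, which combines several ratios. Both strings first go through the default processor, which lowercases them and replaces punctuation with spaces. If one processed string is at least 1.5 times as long as the other, WRatio takes the partial (substring) match into account but multiplies it by 0.9, so a perfect substring match comes out as 90 rather than 100. That applies to every choice here except "Apple" itself, which is identical to the query and scores 100 through the plain ratio. If you want containment to score 100, pass a different scorer: add from fuzzywuzzy import fuzz and call process.extractBests(query, choices, scorer=fuzz.partial_ratio).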
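To make the arithmetic behind the 90 concrete, here is a minimal standard-library sketch of the partial-match branch of WRatio. It is not the library's code: the names simplified_processor and sketch_wratio are made up for illustration, the token-based scorers are left out, and the partial ratio is approximated by a plain substring check.

```python
from difflib import SequenceMatcher


def simplified_processor(s):
    # Rough stand-in for fuzzywuzzy's default processor (an assumption):
    # non-alphanumeric characters become spaces, then lowercase and strip.
    return "".join(c if c.isalnum() else " " for c in s).lower().strip()


def sketch_wratio(query, choice):
    # Hypothetical, simplified model of fuzz.WRatio for illustration only.
    p1, p2 = simplified_processor(query), simplified_processor(choice)
    if not p1 or not p2:
        return 0

    # Plain similarity of the two full strings, on a 0-100 scale.
    base = 100 * SequenceMatcher(None, p1, p2).ratio()

    shorter, longer = sorted((p1, p2), key=len)
    len_ratio = len(longer) / len(shorter)

    # Strings of similar length: the partial match is not considered.
    if len_ratio < 1.5:
        return round(base)

    # A partial match is scaled down, more heavily for very uneven lengths.
    partial_scale = 0.6 if len_ratio > 8 else 0.9

    # Simplification: a substring hit counts as a perfect partial match.
    partial = (100 if shorter in longer else 0) * partial_scale
    return round(max(base, partial))


choices = ["Apple", "Apple Inc.", "Apple Computer", "Apple Records", "Apple TV"]
for choice in choices:
    print(choice, sketch_wratio("Apple", choice))
```

Under these assumptions the exact match keeps its 100, while each longer choice is capped at 100 * 0.9 = 90, which matches the scores in the question.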
