I am trying to use agrepl to detect whether the Ingredients variable in my dataframe df contains any of a number of possible strings (food ingredients). I want to account for slight misspellings or errors. I am working in an environment where installing packages is difficult, so I am keen to use agrepl. df is a very simplified version of the actual data for illustration, and I’ve put the data for df at the end of this question.
These are the strings I want to check:
strings_to_check <- c("Molybdenum Salt",
                      "Mineral Salt \\(Molybdenum Sulfide)",
                      "Molybdenum Sulfide",
                      "Mineral Salt \\(444\\)",
                      "444")
I can detect the presence of these strings as expected with grepl:
ingredients_df <- df %>%
  mutate(Molybdenum = grepl(paste(strings_to_check, collapse = "|"), Ingredients))
And when I use agrepl with a single string, it is also working as expected:
one_string_df <- ingredients_df %>%
  mutate(One_String = agrepl("Molybdenum Sulfide", Ingredients, max.distance = 2, ignore.case = TRUE))
But agrepl with the full strings_to_check returns FALSE values for every case:
fuzzy_df <- ingredients_df %>%
  mutate(Fuzzy_Molybdenum = agrepl(paste(strings_to_check, collapse = "|"), Ingredients, max.distance = 2, ignore.case = TRUE))
Given the difference between supplying a single string versus strings_to_check, I think there must be an issue with the way agrepl is using strings_to_check. How should I pass the list of strings into agrepl so it works as expected?
My expected output is:
| Product_Name | Ingredients | Issue | Molybdenum | Fuzzy_Molybdenum |
|---|---|---|---|---|
| Cheesy Jalapeno Popcorn | Sugar \| Croutons (10%) (Wheat Flour \| Vegetable Oil \| Salt \| Yeast) \| Mineral Salt (Molybdenu Sulfide) \| Salt \| Natural Flavour | Minor Typo | FALSE | TRUE |
| Creamy Coconut Curry Soup | Premix [Salt \| Mineral Salts (451 \| 452 \| 444 \| 450) \| Sugar \| Vegetable Gum (407a) \| Flavour Enhancers (631 \| 627)} \| Natural Flavour | NA | TRUE | TRUE |
| Crunchy Cheddar Bites | Vegetable Oils (Palm \| Canola) \| Iodised Salt \| Yellow Pea Flour | NA | FALSE | FALSE |
| Exotic Thai Basil Noodles | Natural Cheese Flavour [Maltodextrin \| Salt \| Natural Flavour \| Dextrose \| Molybdenum Sulfide (444) \| Yeast Extract] | NA | TRUE | TRUE |
| Golden Honey Wheat Bread | Sesame Seeds (3%) \| Yeast \| Yellow Pea Flour \| molybdenum sulfide \| Vitamins (Thiamin \| Folic Acid) | Lower Case | FALSE | TRUE |
| Gourmet Truffle Macaroni & Cheese | Rice Flour \| Thickener (1412) \| Salt \| Molybdenum Sulfide (Natural Source) \| Herbs \| Mineral Salt (451) Preservative (223) | NA | TRUE | TRUE |
| Heavenly Hazelnut Delight Ice Cream | Acidity Regulator (339) \| Antioxidant (316) \| Mylabdenu Sulfini \| Colour Fixative (Sodium Nitrite) | Major Typo | FALSE | FALSE |
| Juicy Pineapple Burst Sorbet | Maltodextrin \| Salt \| Sugar \| Natural Flavours (Contains Wheat \| Soy) \| Dried Vegetables [Onion \| Carrot] \| Mineral Salt (444) | NA | TRUE | TRUE |
| Maple Glazed Pecan Granola | Dried Vegetables (9%) (Peas \| Vegetable Powder \| Sugar \| Mineral Salt (444) \| Yeast Extract \| Vegetable Oil \| Herbs & Spices \| Natural Colour (100) | NA | TRUE | TRUE |
| Mediterranean Herb Garden Hummus | Electrolytes 11.5% (Sodium Sulfide \| Tricalcium Phosphate) | NA | FALSE | FALSE |
| Roasted Garlic Parmesan Pretzels | Dextrose \| Rice Flour \| Wheat Flour \| Minerals (Zinc \| Iron) \| Vitamin (B12) | NA | FALSE | FALSE |
| Smoky BBQ Bliss Potato Chips | Minerals (Calcium Phosphate \| Magnesium Sulfide \| Mlybdenum ulfide \| Sodium Sulfide \| Ferrous Sulphate \| Sodium Selenate) | Minor Typo | FALSE | TRUE |
| Spicy Mango Tango Salsa | Maltodextrin \| Filtered Water \| Flavour \| Citric Acid (330) \| Molybdenum Sulfide \| Sodium Benzoate (211) \| Sodium Sulfide | NA | TRUE | TRUE |
| Sweet Cinnamon Swirl Pancakes | Bacon (15%) [Pork \| Salt \| Dextrose \| Sucrose \| Mineral Salts (450 \| 451 \| 452) \| Water \| Antioxidant (316) \| Sodium Nitrite (250)] | NA | FALSE | FALSE |
| Zesty Lemonade Infusion | Onion Powder (Yeast Extract \| Natural Flavours (Soy) \| Mineral \| Salt \| Molybdenum Sulfide) \| Cheese Powder (Milk) \| Mineral Salt (444) | Repeats two elements. | TRUE | TRUE |
Data for df:
structure(list(Product_Name = c("Cheesy Jalapeno Popcorn", "Creamy Coconut Curry Soup",
"Crunchy Cheddar Bites", "Exotic Thai Basil Noodles", "Golden Honey Wheat Bread",
"Gourmet Truffle Macaroni & Cheese", "Heavenly Hazelnut Delight Ice Cream",
"Juicy Pineapple Burst Sorbet", "Maple Glazed Pecan Granola",
"Mediterranean Herb Garden Hummus", "Roasted Garlic Parmesan Pretzels",
"Smoky BBQ Bliss Potato Chips", "Spicy Mango Tango Salsa", "Sweet Cinnamon Swirl Pancakes",
"Zesty Lemonade Infusion"), Ingredients = c("Sugar | Croutons (10%) (Wheat Flour | Vegetable Oil | Salt | Yeast) | Mineral Salt (Molybdenu Sulfide) | Salt | Natural Flavour",
"Premix [Salt | Mineral Salts (451 | 452 | 444 | 450) | Sugar | Vegetable Gum (407a) | Flavour Enhancers (631 | 627)} | Natural Flavour",
"Vegetable Oils (Palm | Canola) | Iodised Salt | Yellow Pea Flour",
"Natural Cheese Flavour [Maltodextrin | Salt | Natural Flavour | Dextrose | Molybdenum Sulfide (444) | Yeast Extract]",
"Sesame Seeds (3%) | Yeast | Yellow Pea Flour | molybdenum sulfide | Vitamins (Thiamin | Folic Acid)",
"Rice Flour | Thickener (1412) | Salt | Molybdenum Sulfide (Natural Source) | Herbs | Mineral Salt (451) Preservative (223)",
"Acidity Regulator (339) | Antioxidant (316) | Mylabdenu Sulfini | Colour Fixative (Sodium Nitrite)",
"Maltodextrin | Salt | Sugar | Natural Flavours (Contains Wheat | Soy) | Dried Vegetables [Onion | Carrot] | Mineral Salt (444)",
"Dried Vegetables (9%) (Peas | Vegetable Powder | Sugar | Mineral Salt (444) | Yeast Extract | Vegetable Oil | Herbs & Spices | Natural Colour (100)",
"Electrolytes 11.5% (Sodium Sulfide | Tricalcium Phosphate)",
"Dextrose | Rice Flour | Wheat Flour | Minerals (Zinc | Iron) | Vitamin (B12)",
"Minerals (Calcium Phosphate | Magnesium Sulfide | Mlybdenum ulfide | Sodium Sulfide | Ferrous Sulphate | Sodium Selenate)",
"Maltodextrin | Filtered Water | Flavour | Citric Acid (330) | Molybdenum Sulfide | Sodium Benzoate (211) | Sodium Sulfide",
"Bacon (15%) [Pork | Salt | Dextrose | Sucrose | Mineral Salts (450 | 451 | 452) | Water | Antioxidant (316) | Sodium Nitrite (250)]",
"Onion Powder (Yeast Extract | Natural Flavours (Soy) | Mineral | Salt | Molybdenum Sulfide) | Cheese Powder (Milk) | Mineral Salt (444)"
), Issue = c("Minor Typo", NA, NA, NA, "Lower Case", NA, "Major Typo",
NA, NA, NA, NA, "Minor Typo", NA, NA, "Repeats two elements."
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-15L))
>Solution :
The issue that’s tripped you up is that agrepl() and grepl() have opposite default values for the fixed argument (TRUE and FALSE respectively). With fixed = TRUE, agrepl() treats your concatenated terms as one literal string, including the | separators and the backslashes, rather than as a regular expression with several alternatives. Pass fixed = FALSE to agrepl() so the pattern is interpreted as a regular expression.
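library(dplyr)
# df and strings_to_check are the objects defined in the question
df %>%
  mutate(
    Fuzzy_Molybdenum = agrepl(
      paste(strings_to_check, collapse = "|"),
      Ingredients,
      max.distance = 2,
      ignore.case = TRUE,
      fixed = FALSE  # treat the pattern as a regex, so | separates alternatives
    )
  )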
# A tibble: 15 × 4
Product_Name Ingredients Issue Fuzzy_Molybdenum
<chr> <chr> <chr> <lgl>
1 Cheesy Jalapeno Popcorn Sugar | Croutons (10%) (Wheat Flour |… Mino… TRUE
2 Creamy Coconut Curry Soup Premix [Salt | Mineral Salts (451 | 4… NA TRUE
3 Crunchy Cheddar Bites Vegetable Oils (Palm | Canola) | Iodi… NA FALSE
4 Exotic Thai Basil Noodles Natural Cheese Flavour [Maltodextrin … NA TRUE
5 Golden Honey Wheat Bread Sesame Seeds (3%) | Yeast | Yellow Pe… Lowe… TRUE
6 Gourmet Truffle Macaroni & Cheese Rice Flour | Thickener (1412) | Salt … NA TRUE
7 Heavenly Hazelnut Delight Ice Cream Acidity Regulator (339) | Antioxidant… Majo… FALSE
8 Juicy Pineapple Burst Sorbet Maltodextrin | Salt | Sugar | Natural… NA TRUE
9 Maple Glazed Pecan Granola Dried Vegetables (9%) (Peas | Vegetab… NA TRUE
10 Mediterranean Herb Garden Hummus Electrolytes 11.5% (Sodium Sulfide | … NA FALSE
11 Roasted Garlic Parmesan Pretzels Dextrose | Rice Flour | Wheat Flour |… NA FALSE
12 Smoky BBQ Bliss Potato Chips Minerals (Calcium Phosphate | Magnesi… Mino… TRUE
13 Spicy Mango Tango Salsa Maltodextrin | Filtered Water | Flavo… NA TRUE
14 Sweet Cinnamon Swirl Pancakes Bacon (15%) [Pork | Salt | Dextrose |… NA TRUE
15 Zesty Lemonade Infusion Onion Powder (Yeast Extract | Natural… Repe… TRUE
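One caveat with the regex approach: row 14 (Sweet Cinnamon Swirl Pancakes) comes out TRUE above, whereas the question expects FALSE. A likely cause is that max.distance = 2 lets the three-character term 444 match any string containing a 4, such as 450. As a rough sketch of one workaround, assuming it is acceptable to match short codes exactly, you could fuzzy-match each longer term separately as a literal string and combine the results. fuzzy_any() and the toy vector x are hypothetical names for illustration, not part of the original answer. The terms are written without regex escapes because fixed = TRUE treats them literally.

```r
# Hypothetical helper (not from the original answer): TRUE where any term matches.
fuzzy_any <- function(fuzzy_terms, exact_terms, x, max.distance = 2) {
  # Fuzzy, case-insensitive match of each longer term as a literal string
  fuzzy_hits <- lapply(fuzzy_terms, function(term) {
    agrepl(term, x, max.distance = max.distance, ignore.case = TRUE, fixed = TRUE)
  })
  # Exact literal match for short terms, where 2 edits would match almost anything
  exact_hits <- lapply(exact_terms, function(term) grepl(term, x, fixed = TRUE))
  # Element-wise OR across all per-term results
  Reduce(`|`, c(fuzzy_hits, exact_hits))
}

# Toy inputs modelled on rows 12, 14 and 8 of the question's data
x <- c("Minerals (Mlybdenum ulfide | Sodium Sulfide)",
       "Mineral Salts (450 | 451 | 452)",
       "Mineral Salt (444)")

result <- fuzzy_any(c("Molybdenum Sulfide", "Molybdenum Salt"), "444", x)
print(result)
```

Inside mutate() the same call would take Ingredients in place of x. Whether an exact match on 444 is strict enough for the real data is a judgment call, since a bare 444 could also appear in unrelated contexts.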