Regex to catch similar matching word until it hits a number

I have this df:

data1 <- structure(list(attr = c("kind1", "kind2", "kind3", "price1", 
"price2", "packing1", "weight1", "weight2", "calorie1"), coef = c(-1.08908045977012, 
-0.732758620689656, -0.922413793103449, -0.570881226053641, 0.118773946360153, 
-0.0287356321839081, -0.168582375478927, 0.173371647509578, -0.646551724137931
), pval = c(0.0000000461586619475345, 0.000225855110699109, 0.00000354973103147522, 
0.000189625500287816, 0.506777189443937, 0.801713589134903, 0.269271977099465, 
0.33257496253009, 0.0000000192904668116847)), row.names = c(NA, 
-9L), class = "data.frame")

#      attr        coef             pval
#1    kind1 -1.08908046 0.00000004615866
#2    kind2 -0.73275862 0.00022585511070
#3    kind3 -0.92241379 0.00000354973103
#4   price1 -0.57088123 0.00018962550029
#5   price2  0.11877395 0.50677718944394
#6 packing1 -0.02873563 0.80171358913490
#7  weight1 -0.16858238 0.26927197709946
#8  weight2  0.17337165 0.33257496253009
#9 calorie1 -0.64655172 0.00000001929047

I’m trying to compute totals by group, where the groups are defined by a regex that matches the shared part of each name up to a certain point; in this case, until a digit appears.

For example, in the case of my variables, there would be 5 groups:

kind
Total = kind sum
price
Total = price sum
packing
Total = packing sum
weight
Total = weight sum
calorie
Total = calorie sum

I wrote the code below, but I don’t know what the regex should be or where to put it. I tried stringr but couldn’t get the result I want:

data1 %>%
  dplyr::arrange(attr) %>%
  split(f = .[,"attr"]) %>%
  purrr::map_df(., janitor::adorn_totals)

#     attr        coef             pval
# calorie1 -0.64655172 0.00000001929047
#    Total -0.64655172 0.00000001929047
#    kind1 -1.08908046 0.00000004615866
#    Total -1.08908046 0.00000004615866
#    kind2 -0.73275862 0.00022585511070
#    Total -0.73275862 0.00022585511070
#    kind3 -0.92241379 0.00000354973103
#    Total -0.92241379 0.00000354973103
# packing1 -0.02873563 0.80171358913490
#    Total -0.02873563 0.80171358913490
#   price1 -0.57088123 0.00018962550029
#    Total -0.57088123 0.00018962550029
#   price2  0.11877395 0.50677718944394
#    Total  0.11877395 0.50677718944394
#  weight1 -0.16858238 0.26927197709946
#    Total -0.16858238 0.26927197709946
#  weight2  0.17337165 0.33257496253009
#    Total  0.17337165 0.33257496253009

This totals each row on its own, because every attr value ends in a different number and so forms its own group. I need a regex that captures this:

kind
price
packing
weight
calorie

That is, it should capture the letters up to the first digit.

Solution:

You can create a grouping variable by removing the digits from the attr variable, and then use group_modify:

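library(dplyr)
library(stringr)

data1 %>%
  # grp is attr with every digit stripped, e.g. "kind1" -> "kind"
  group_by(grp = str_remove_all(attr, "[0-9]")) %>%
  # append a "Total" row to each group
  group_modify(janitor::adorn_totals, where = "row") %>%
  ungroup() %>%
  select(-grp)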

#  # A tibble: 14 × 3
#  attr         coef           pval
#  <chr>       <dbl>          <dbl>
#  1 calorie1 -0.647   0.0000000193
#  2 Total    -0.647   0.0000000193
#  3 kind1    -1.09    0.0000000462
#  4 kind2    -0.733   0.000226    
#  5 kind3    -0.922   0.00000355  
#  6 Total    -2.74    0.000229    
#  7 packing1 -0.0287  0.802       
#  8 Total    -0.0287  0.802       
#  9 price1   -0.571   0.000190    
# 10 price2    0.119   0.507       
# 11 Total    -0.452   0.507       
# 12 weight1  -0.169   0.269       
# 13 weight2   0.173   0.333       
# 14 Total     0.00479 0.602         
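
If you only need the grouping key and the sums, the same idea can be sketched in base R. This is a minimal illustration rather than part of the answer above: `df` is a hypothetical cut-down copy of the question's data with rounded values, and the pattern `"[0-9].*$"` is an assumed alternative that cuts each name at its first digit, whereas `str_remove_all(attr, "[0-9]")` deletes every digit wherever it occurs. For names like `kind1` the two give the same prefix.

```r
# Hypothetical miniature of the question's data (values rounded for brevity).
df <- data.frame(
  attr = c("kind1", "kind2", "kind3", "price1", "price2"),
  coef = c(-1.09, -0.73, -0.92, -0.57, 0.12)
)

# "[0-9].*$" matches from the first digit to the end of the string,
# so sub() leaves only the letters that come before it.
df$grp <- sub("[0-9].*$", "", df$attr)

# One total per prefix, using base R's aggregate().
totals <- aggregate(coef ~ grp, data = df, FUN = sum)
print(totals)
```

Cutting at the first digit matters only if a name could contain digits in the middle (for example `kind1a`); for the names in the question either pattern works.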