Shorten number of elements in comma-separated strings by vector

I have data with columns such as Area_bsl that contain strings of comma-separated values and a column diffr that states the number of elements by which Area_bsl must be shortened:

```r
df <- data.frame(
  id = 1:3,
  Area_bsl = c("155,199,198,195,100,112,177,199,188,144",
               "100,99,98,95,100,112,111,99",
               "131,166,155,111,100,117,166,188,101,101,105,166"),
  diffr = c(3,0,6)
)
```

So what I need to do is cut off:

  • the last 3 elements of Area_bsl where id == 1
  • no elements of Area_bsl where id == 2
  • the last 6 elements of Area_bsl where id == 3
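As a quick check of the arithmetic, base R's strsplit() and lengths() give the number of elements in each string, and subtracting diffr gives how many to keep. This is only an illustrative sketch; n_elements and n_keep are hypothetical names, and the data frame is the question's example rebuilt so the snippet runs on its own.

```r
# The question's example data
df <- data.frame(
  id = 1:3,
  Area_bsl = c("155,199,198,195,100,112,177,199,188,144",
               "100,99,98,95,100,112,111,99",
               "131,166,155,111,100,117,166,188,101,101,105,166"),
  diffr = c(3, 0, 6)
)

# Hypothetical helper vectors: elements per string, then elements to keep
n_elements <- lengths(strsplit(df$Area_bsl, ","))
n_keep <- n_elements - df$diffr

n_elements  # 10 8 12
n_keep      # 7 8 6
```

The kept counts (7, 8, 6) are the same numbers as the cutoff column in the desired result.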

I’ve been approaching the task as follows, but the last step, using slice_head(), throws an error:


```r
library(tidyverse)
df %>%
  # separate comma-separated values into rows:
  separate_rows(Area_bsl) %>%
  # for each `id`...:
  group_by(id) %>%
  #... create a row counter:
  mutate(rowid = row_number()) %>%
  # ...create the cutoff point:
  mutate(cutoff = last(rowid) - diffr) %>%
  # ...slice out as many as `cutoff` rows: <--- does not work!
  slice_head(n = cutoff[1])
```

```
Error in `slice_head()`:
! `n` must be a constant.
Caused by error in `force()`:
! object 'cutoff' not found
```
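The error message says that slice_head() needs n to be a constant and cannot see the cutoff column. One way to keep the grouped approach, sketched here as an alternative rather than the accepted answer's method, is to use filter(), which does evaluate inside each group, and keep rows whose position is at most n() - diffr:

```r
library(dplyr)
library(tidyr)

# The question's example data
df <- data.frame(
  id = 1:3,
  Area_bsl = c("155,199,198,195,100,112,177,199,188,144",
               "100,99,98,95,100,112,111,99",
               "131,166,155,111,100,117,166,188,101,101,105,166"),
  diffr = c(3, 0, 6)
)

result <- df %>%
  # one row per comma-separated value
  separate_rows(Area_bsl, sep = ",") %>%
  group_by(id) %>%
  # filter() is evaluated per group, so n() and the diffr column are visible;
  # diffr is constant within an id, so comparing row by row is safe
  filter(row_number() <= n() - diffr) %>%
  ungroup()

result
```

This returns 7, 8 and 6 rows for ids 1, 2 and 3, without the helper columns rowid and cutoff.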

The desired result is this:

```
      id Area_bsl diffr rowid cutoff
   <int> <chr>    <dbl> <int>  <dbl>
 1     1 155          3     1      7
 2     1 199          3     2      7
 3     1 198          3     3      7
 4     1 195          3     4      7
 5     1 100          3     5      7
 6     1 112          3     6      7
 7     1 177          3     7      7
11     2 100          0     1      8
12     2 99           0     2      8
13     2 98           0     3      8
14     2 95           0     4      8
15     2 100          0     5      8
16     2 112          0     6      8
17     2 111          0     7      8
18     2 99           0     8      8
19     3 131          6     1      6
20     3 166          6     2      6
21     3 155          6     3      6
22     3 111          6     4      6
23     3 100          6     5      6
24     3 117          6     6      6
```

Solution:

First, split each Area_bsl string with strsplit(), drop its last diffr elements with head(), and paste() the remaining elements back together. Finally, use separate_rows() to put each remaining value on its own row:

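```r
library(dplyr)
library(tidyr)

df %>%
  rowwise() %>%
  # drop the last `diffr` values; rows with diffr == 0 are left unchanged
  mutate(Area_bsl = ifelse(diffr == 0,
                           Area_bsl,
                           paste(head(strsplit(Area_bsl, ",")[[1]], -diffr),
                                 collapse = ","))) %>%
  # one row per remaining value
  separate_rows(Area_bsl, sep = ",") %>%
  data.frame()
```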
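The ifelse() above is there because head() with a negative n drops that many elements from the end, but -0 is simply 0, so head(x, -0) returns no elements instead of all of them. The sketch below shows this and a hypothetical helper, drop_last() (not part of the answer), that avoids the special case by passing the number of elements to keep; the max(..., 0) guard is an added assumption for strings shorter than diffr.

```r
x <- strsplit("100,99,98,95,100,112,111,99", ",")[[1]]

head(x, -2)  # drops the last two: 6 elements remain
head(x, -0)  # -0 is 0, so this is character(0), not all of x

# Hypothetical helper: drop the last k comma-separated values of one string
drop_last <- function(s, k) {
  parts <- strsplit(s, ",")[[1]]
  # keep length - k elements; never ask head() for a negative count
  paste(head(parts, max(length(parts) - k, 0)), collapse = ",")
}

drop_last("1,2,3,4", 0)  # "1,2,3,4"
drop_last("1,2,3,4", 3)  # "1"
```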

Or, using separate_longer_delim() (available in tidyr 1.3.0 and later):

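```r
library(dplyr)
library(tidyr)

df %>%
  rowwise() %>%
  # drop the last `diffr` values; rows with diffr == 0 are left unchanged
  mutate(Area_bsl = ifelse(diffr == 0,
                           Area_bsl,
                           paste(head(strsplit(Area_bsl, ",")[[1]], -diffr),
                                 collapse = ","))) %>%
  # one row per remaining value
  separate_longer_delim(Area_bsl, delim = ",")
```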
Output:

```
   id Area_bsl diffr
1   1      155     3
2   1      199     3
3   1      198     3
4   1      195     3
5   1      100     3
6   1      112     3
7   1      177     3
8   2      100     0
9   2       99     0
10  2       98     0
11  2       95     0
12  2      100     0
13  2      112     0
14  2      111     0
15  2       99     0
16  3      131     6
17  3      166     6
18  3      155     6
19  3      111     6
20  3      100     6
21  3      117     6
```
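If rowwise() is not wanted, the same per-string trimming can be vectorised with base R's mapply(), which walks Area_bsl and diffr in parallel. This is a sketch of a swapped-in technique, not the answer's code; the anonymous function and its max(..., 0) guard are assumptions added for illustration.

```r
library(dplyr)
library(tidyr)

# The question's example data
df <- data.frame(
  id = 1:3,
  Area_bsl = c("155,199,198,195,100,112,177,199,188,144",
               "100,99,98,95,100,112,111,99",
               "131,166,155,111,100,117,166,188,101,101,105,166"),
  diffr = c(3, 0, 6)
)

result <- df %>%
  # mapply() pairs each string with its diffr, so no rowwise() is needed
  mutate(Area_bsl = mapply(
    function(s, k) {
      parts <- strsplit(s, ",")[[1]]
      # keep length - k elements, so diffr == 0 needs no special case
      paste(head(parts, max(length(parts) - k, 0)), collapse = ",")
    },
    Area_bsl, diffr,
    USE.NAMES = FALSE
  )) %>%
  # one row per remaining value
  separate_rows(Area_bsl, sep = ",")

result
```

Because the helper asks head() for the number of elements to keep rather than a negative count, the diffr == 0 row passes through unchanged without an ifelse().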