Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R: Split a row into multiple rows, and then split the column into multiple columns

I am stuck in a seemingly easy task. Imagine the following data.table:

dt1 <- data.table(ID = as.factor(c("202E", "202E", "202E")), 
              timestamp = as.POSIXct(c("2017-05-02 00:00:00",
                                       "2017-05-02 00:15:00",
                                       "2017-05-02 00:30:00")), 
              acceleration_raw = c("-0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.727 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.727 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.141 -0.703 0.656 0.164 -0.703 0.656 0.141 -0.703 0.656 0.141 -0.703 0.656 0.141 -0.703 0.656 0.141",
                                   "-0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117", 
                                   "-0.750 0.586 0.117 -0.773 0.586 0.117 -0.773 0.609 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.750 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117"))

Created on 2022-11-17 with reprex v2.0.2

The idea is that I want to separate the acceleration_raw column into 3 different ones: acc_x, acc_y and acc_z. Each row of acceleration_raw is a string of characters, eventually leading to 120 numeric observations. I want to separate acceleration_raw and then take every the value from the first row and forth with a step of 3 and put it to acc_x, every value from the second row and forth and put it to acc_y, and finally every value from the third row and on and put it to acc_z.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I tried to first separate acceleration_raw with separate_rows from dplyr:

library('tidyverse')
library('data.table')

dt1 <- dt1 %>%
  separate_rows(acceleration_raw, sep = " ", convert = F)

Created on 2022-11-17 with reprex v2.0.2

And after that:

library('tidyverse')
library('data.table')

dt1 <- dt1 %>%
  separate_rows(acceleration_raw, sep = " ", convert = F) %>%
  mutate(acc_x = seq(acceleration_raw, from = 1, to = length(dt1), by = 3), 
         acc_y = seq(acceleration_raw, from = 2, to = length(dt1), by = 3), 
         acc_z = seq(acceleration_raw, from = 3, to = length(dt1), by = 3))
#> Warning in seq.default(acceleration_raw, from = 1, to = length(dt1), by = 3):
#> first element used of 'length.out' argument
#> Error in `mutate()`:
#> ! Problem while computing `acc_x = seq(acceleration_raw, from = 1, to =
#>   length(dt1), by = 3)`.
#> Caused by error in `ceiling()`:
#> ! non-numeric argument to mathematical function

Created on 2022-11-17 with reprex v2.0.2

Any suggestions on how to proceed?

>Solution :

You could use pivot_wider and unnest:

library(tidyverse)

dt1 %>%
  separate_rows(acceleration_raw, sep = " ", convert = F) %>%
  mutate(id = rep(c("acc_x", "acc_y", "acc_z"), times = nrow(.) / 3)) %>% 
  pivot_wider(names_from = id, values_from = acceleration_raw, values_fn = list) %>% 
  unnest(cols = c("acc_x", "acc_y", "acc_z"))

This returns

# A tibble: 120 × 5
   ID    timestamp           acc_x  acc_y acc_z
   <fct> <dttm>              <chr>  <chr> <chr>
 1 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 2 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 3 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 4 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 5 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 6 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 7 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 8 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 9 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
10 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
# … with 110 more rows
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading