Home R: Split a row into multiple rows, and then split the column into multiple columns

Questions

R: Split a row into multiple rows, and then split the column into multiple columns

November 17, 2022

I am stuck in a seemingly easy task. Imagine the following data.table:

dt1 <- data.table(ID = as.factor(c("202E", "202E", "202E")), 
              timestamp = as.POSIXct(c("2017-05-02 00:00:00",
                                       "2017-05-02 00:15:00",
                                       "2017-05-02 00:30:00")), 
              acceleration_raw = c("-0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.727 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.727 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.141 -0.703 0.656 0.164 -0.703 0.656 0.141 -0.703 0.656 0.141 -0.703 0.656 0.141 -0.703 0.656 0.141",
                                   "-0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117", 
                                   "-0.750 0.586 0.117 -0.773 0.586 0.117 -0.773 0.609 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.750 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117"))

^{Created on 2022-11-17 with reprex v2.0.2}

The idea is that I want to separate the acceleration_raw column into 3 different ones: acc_x, acc_y and acc_z. Each row of acceleration_raw is a string of characters, eventually leading to 120 numeric observations. I want to separate acceleration_raw and then take every the value from the first row and forth with a step of 3 and put it to acc_x, every value from the second row and forth and put it to acc_y, and finally every value from the third row and on and put it to acc_z.

I tried to first separate acceleration_raw with separate_rows from dplyr:

library('tidyverse')
library('data.table')

dt1 <- dt1 %>%
  separate_rows(acceleration_raw, sep = " ", convert = F)

^{Created on 2022-11-17 with reprex v2.0.2}

And after that:

library('tidyverse')
library('data.table')

dt1 <- dt1 %>%
  separate_rows(acceleration_raw, sep = " ", convert = F) %>%
  mutate(acc_x = seq(acceleration_raw, from = 1, to = length(dt1), by = 3), 
         acc_y = seq(acceleration_raw, from = 2, to = length(dt1), by = 3), 
         acc_z = seq(acceleration_raw, from = 3, to = length(dt1), by = 3))
#> Warning in seq.default(acceleration_raw, from = 1, to = length(dt1), by = 3):
#> first element used of 'length.out' argument
#> Error in `mutate()`:
#> ! Problem while computing `acc_x = seq(acceleration_raw, from = 1, to =
#>   length(dt1), by = 3)`.
#> Caused by error in `ceiling()`:
#> ! non-numeric argument to mathematical function

^{Created on 2022-11-17 with reprex v2.0.2}

Any suggestions on how to proceed?

>Solution :

You could use pivot_wider and unnest:

library(tidyverse)

dt1 %>%
  separate_rows(acceleration_raw, sep = " ", convert = F) %>%
  mutate(id = rep(c("acc_x", "acc_y", "acc_z"), times = nrow(.) / 3)) %>% 
  pivot_wider(names_from = id, values_from = acceleration_raw, values_fn = list) %>% 
  unnest(cols = c("acc_x", "acc_y", "acc_z"))

This returns

# A tibble: 120 × 5
   ID    timestamp           acc_x  acc_y acc_z
   <fct> <dttm>              <chr>  <chr> <chr>
 1 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 2 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 3 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 4 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 5 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 6 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 7 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 8 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 9 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
10 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
# … with 110 more rows