Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Split multiple columns into multiple new rows using condition R

My question seems similar to old ones, but I could not find an answer to my specific problem from those, so I will ask my question aswell:

I have a dataframe in R:

AT_ID <- c(1,2,3)
DEPARTURE_AIRPORT <- c("ZRH","ZRH","ZRH")
STOPOVER_1 <- c(NA, "BEL", "DUB")
STOPOVER_2 <- c(NA, "RUO", NA)
ARRIVAL_AIRPORT <- c("IAD", "LAX","BUD")
intinerary_id <- c(NA,NA,NA)

test_df <- data.frame(AT_ID, DEPARTURE_AIRPORT, STOPOVER_1, STOPOVER_2, ARRIVAL_AIRPORT, intinerary_id)

print (test_df)

This data frame should be split up so that the flights are separate for each segment. So for flight AT_ID 1 the flight segments would be itinerary_id = 1 from ZRH to IAD, AT_ID 2 would the segment 1 from ZRH to BEL, segment 2 from BEL to RUO and segment 3 from RUO to LAX, each segment in a different row but with the same intinerary_id.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The result would look something like that:

AT_ID <- c(1,2,3,4,5,6)

DEPARTURE_AIRPORT <- c("ZRH","ZRH","BEL","RUO","ZRH","DUB")
ARRIVAL_AIRPORT <- c("IAD", "BEL","RUO", "LAX","DUB","BUD")
intinerary_id <- c(1,2,2,2,3,3)

test_df_target <- data.frame(AT_ID, DEPARTURE_AIRPORT, ARRIVAL_AIRPORT, intinerary_id)

print(test_df_target)

The split()- function and separate_columns did not work out for me as the columns are not all the same, some have NA and some 3 segments.

I hope my question is clear, otherwise please let me know, so I can specify.

Thank you in advance!

>Solution :

We could use pivot_longer like this:

library(dplyr)
library(tidyr)

test_df %>% 
  pivot_longer(
    cols =c(DEPARTURE_AIRPORT, STOPOVER_1, STOPOVER_2),
    names_to = "name",
    values_to = "DEPARTURE_AIRPORT"
  ) %>% 
  filter(!is.na(DEPARTURE_AIRPORT)) %>% 
  mutate(intinerary_id = AT_ID,
         AT_ID = row_number())%>% 
  select(AT_ID, DEPARTURE_AIRPORT, ARRIVAL_AIRPORT, intinerary_id)
  AT_ID DEPARTURE_AIRPORT ARRIVAL_AIRPORT intinerary_id
  <int> <chr>             <chr>                   <dbl>
1     1 ZRH               IAD                         1
2     2 ZRH               LAX                         2
3     3 BEL               LAX                         2
4     4 RUO               LAX                         2
5     5 ZRH               BUD                         3
6     6 DUB               BUD                         3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading