Wide to long omitting the early NAs

August 15, 2022

I have this simple data frame,

colx <- c(2002-01, 2002-02, 2002-03, 2002-04)
cola <- c(10, 20, NA, 40)
colb <- c(NA, NA, NA, 400)
colc <- c(1000, 2000, 3000, 4000)
    
df <- data.frame(colx, cola, colb, colc)
df
> df
  colx cola colb colc
1 2001   10   NA 1000
2 2000   20   NA 2000
3 1999   NA   NA 3000
4 1998   40  400 4000

I want to convert it to long shape,

library(reshape2)
df_long <- melt(df, id.vars = c("colx"))
df_long

> df_long
   colx variable value
1  2001     cola    10
2  2000     cola    20
3  1999     cola    NA
4  1998     cola    40
5  2001     colb    NA
6  2000     colb    NA
7  1999     colb    NA
8  1998     colb   400
9  2001     colc  1000
10 2000     colc  2000
11 1999     colc  3000
12 1998     colc  4000

However, I want to avoid NAs if they exist at the beginning of the variable. E.g. in ‘colb’ the first 3 cases are NA. I want to avoid those NAs. However, if the NAs appear later in the variable, e.g. case 3 in cola, I don’t want to remove it. The expected output should look as follows,

> df_long
  colx variable value
1 2001     cola    10
2 2000     cola    20
3 1999     cola    NA
4 1998     cola    40
5 1998     colb   400
6 2001     colc  1000
7 2000     colc  2000
8 1999     colc  3000
9 1998     colc  4000

Any suggestions on the way to achive this?

>Solution :

With first and filter:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(-colx) %>% 
  group_by(name) %>%
  filter(!(is.na(first(value)) & is.na(value)))

    colx name  value
  <dbl> <chr> <dbl>
1  2001 cola     10
2  2000 cola     20
3  1999 cola     NA
4  1998 cola     40
5  1998 colb    400
6  2001 colc   1000
7  2000 colc   2000
8  1999 colc   3000
9  1998 colc   4000