I have this simple data frame,
colx <- c(2002-01, 2002-02, 2002-03, 2002-04)
cola <- c(10, 20, NA, 40)
colb <- c(NA, NA, NA, 400)
colc <- c(1000, 2000, 3000, 4000)
df <- data.frame(colx, cola, colb, colc)
df
> df
colx cola colb colc
1 2001 10 NA 1000
2 2000 20 NA 2000
3 1999 NA NA 3000
4 1998 40 400 4000
I want to convert it to long shape,
library(reshape2)
df_long <- melt(df, id.vars = c("colx"))
df_long
> df_long
colx variable value
1 2001 cola 10
2 2000 cola 20
3 1999 cola NA
4 1998 cola 40
5 2001 colb NA
6 2000 colb NA
7 1999 colb NA
8 1998 colb 400
9 2001 colc 1000
10 2000 colc 2000
11 1999 colc 3000
12 1998 colc 4000
However, I want to avoid NAs if they exist at the beginning of the variable. E.g. in ‘colb’ the first 3 cases are NA. I want to avoid those NAs. However, if the NAs appear later in the variable, e.g. case 3 in cola, I don’t want to remove it. The expected output should look as follows,
> df_long
colx variable value
1 2001 cola 10
2 2000 cola 20
3 1999 cola NA
4 1998 cola 40
5 1998 colb 400
6 2001 colc 1000
7 2000 colc 2000
8 1999 colc 3000
9 1998 colc 4000
Any suggestions on the way to achive this?
>Solution :
With first and filter:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(-colx) %>%
group_by(name) %>%
filter(!(is.na(first(value)) & is.na(value)))
colx name value
<dbl> <chr> <dbl>
1 2001 cola 10
2 2000 cola 20
3 1999 cola NA
4 1998 cola 40
5 1998 colb 400
6 2001 colc 1000
7 2000 colc 2000
8 1999 colc 3000
9 1998 colc 4000