I am new to R. I would like to calculate the mean for each row of a data frame, but using a different subset of columns for each row. Two extra columns give the names of the "start" and "end" columns that delimit the range to average in each row.
Take this example:
# a to d hold the values (numeric, so they can be averaged); e and f name the first and last column to use
dframe <- data.frame(a = c(2, 3, 4, 2), b = c(1, 3, 6, 2), c = c(4, 5, 6, 3), d = c(4, 2, 8, 5), e = c("a", "c", "a", "b"), f = c("c", "d", "d", "c"))
dframe
This gives the following data frame:
  a b c d e f
1 2 1 4 4 a c
2 3 3 5 2 c d
3 4 6 6 8 a d
4 2 2 3 5 b c
Columns e and f give the first and last columns to use when calculating the mean for each row.
For example, in row 1 the mean is calculated over columns a, b, and c: (2 + 1 + 4) / 3 ≈ 2.3.
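As a quick check of that arithmetic, here is a minimal sketch for row 1 only. The vector `row1` and the names `from` and `to` are made up for illustration, and the values are assumed to be numeric:

```r
# Row 1 of the example, written out by hand (values assumed numeric)
row1 <- c(a = 2, b = 1, c = 4, d = 4)

# Positions of the start ("a") and end ("c") columns
from <- match("a", names(row1))
to   <- match("c", names(row1))

# Average the values between the two positions: (2 + 1 + 4) / 3
row1_mean <- mean(row1[from:to])
row1_mean
```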
So I would like to obtain the following output:
  a b c d e f mean
1 2 1 4 4 a c  2.3
2 3 3 5 2 c d  3.5
3 4 6 6 8 a d  6.0
4 2 2 3 5 b c  2.5
I have worked out how to build the column indices, and I then want to use rowMeans, but I cannot find the right arguments.
dframe %>%
  mutate(e_indice = match(e, colnames(dframe))) %>%
  mutate(f_indice = match(f, colnames(dframe))) %>%
  mutate(mean = rowMeans(????, na.rm = TRUE))
Thanks a lot for your help
>Solution :
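One dplyr option, assuming columns a to d are numeric, could be:
library(dplyr)

dframe %>%
  rowwise() %>%
  # cur_data() is the current row; dplyr >= 1.1.0 deprecates it in favour of pick(everything())
  mutate(mean = rowMeans(cur_data()[match(e, names(.)):match(f, names(.))]))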
      a     b     c     d e     f      mean
  <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
1     2     1     4     4 a     c      2.33
2     3     3     5     2 c     d      3.5
3     4     6     6     8 a     d      6
4     2     2     3     5 b     c      2.5
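For comparison, the same result can be sketched in base R without dplyr. This is an alternative, not the approach used above. It assumes columns a to d are numeric and that each row's start column comes before its end column. The names `start_idx` and `end_idx` are introduced here for illustration:

```r
# Example data from the question, with numeric value columns (an assumption)
dframe <- data.frame(
  a = c(2, 3, 4, 2),
  b = c(1, 3, 6, 2),
  c = c(4, 5, 6, 3),
  d = c(4, 2, 8, 5),
  e = c("a", "c", "a", "b"),
  f = c("c", "d", "d", "c")
)

# Turn the start/end column names into column positions
start_idx <- match(dframe$e, names(dframe))
end_idx   <- match(dframe$f, names(dframe))

# For each row, average the values between its start and end positions
dframe$mean <- mapply(
  function(i, from, to) mean(unlist(dframe[i, from:to])),
  seq_len(nrow(dframe)), start_idx, end_idx
)

dframe
```

Here mapply() walks over the row numbers and the two index vectors in parallel, so each row gets its own column range.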