Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Calculate % Change Avoiding NA's, By Group

I have the following sample data:

df <- structure(list(person = c("a", "a", "a", "a", "a", "a", "a", 
"b", "b", "b", "b", "c", "c", "c", "c", "c", "c", "c", "d", "d", 
"d", "e", "f"), data = c(4.51646448148296, 6.50396548317118, 
NA, 9.84425670640439, NA, NA, 1.67850554309375, NA, 2.06950894895984, 
4.69548965710482, 5.12795074939009, 0.209845957108408, 6.47290211779118, 
6.26481009615426, NA, NA, 5.24543093106204, 4.75242871394851, 
6.8182202925863, NA, 5.3162641705813, 6.05862341995543, NA)), row.names = c(NA, 
-23L), class = "data.frame")

Which looks like this

  person     data
1       a 4.516464
2       a 6.503965
3       a       NA
4       a 9.844257
5       a       NA
6       a       NA
7       a 1.678506
8       b       NA
9       b 2.069509
10      b 4.695490
11      b 5.127951
12      c 0.209846
13      c 6.472902
14      c 6.264810
15      c       NA
16      c       NA
17      c 5.245431
18      c 4.752429
19      d 6.818220
20      d       NA
21      d 5.316264
22      e 6.058623
23      f       NA

I’m trying to be able to calculate the percent change from the latest and second latest entry in the data column based on person. The entries are logged from first to latest by person (i.e., row 1 was person A’s first entry, and row 7 was their latest entry).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I have tried various combinations of the following code with no luck:

df %>%
  group_by(person) %>%
  mutate(percChange = c(rep(NA, ((df %>% count(person))[2]) - 
                              length(((na.omit(data) - lag(na.omit(data)))/na.omit(data)*100))), 
                        (na.omit(data) - lag(na.omit(data)))/na.omit(data)*100)) %>%
  summarise(max = max(na.omit(data)),
            avg = mean(na.omit(data)),
            lastPercChange = last(na.omit(percChange)))

The desired output would look like this, with example calculation of the last % change:

 person     max    avg        lastPercChange                                                                       
1 a         9.84   5.64       33.93138    (1.678506-9.844257)/1.678506 * 100 
2 b         5.13   3.96       8.433407    (5.127951-4.695490)/5.127951 * 100 
3 c         6.47   4.59       -10.37368   (4.752429-5.245431)/4.752429 * 100
4 d         6.82   6.07       14.66658    (5.316264-6.818220)/5.316264 * 100 
5 e         6.06   6.06       NA          (since only 1 observation)                 
6 f      -Inf    NaN          NA          (since only 1 observation)

Any help would be greatly appreciated.

>Solution :

For simplicity I would drop all the NA rows first then it is easy to calculate the values that you want. Function nth(..) or head/tail can help us provide last and second last values.

If a person has all the NA values it would be dropped completely from the output to keep it intact we do a right_join with the original data.

library(dplyr)

df %>%
  filter(!is.na(data)) %>%
  group_by(person) %>%
  summarise(max = max(data), 
            avg = mean(data), 
            lastPercChange = (nth(data,-1) - nth(data,-2))/nth(data, -1) *100)%>%
  right_join(df %>% distinct(person), by = 'person')

# person   max   avg lastPercChange
#  <chr>  <dbl> <dbl>          <dbl>
#1 a       9.84  5.64        -486.  
#2 b       5.13  3.96           8.43
#3 c       6.47  4.59         -10.4 
#4 d       6.82  6.07         -28.3 
#5 e       6.06  6.06          NA   
#6 f      NA    NA             NA   
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading