What is the simplest way to compute the average of one variable grouped by a second variable, iterating over all second variables dplyr?

by MR

August 16, 2022

I have a data frame with a large number of variables. One of them, the probability of death (PoD), is to be predicted from all the others.
As a preliminary step, I want to estimate the PoD by computing the death rate within bins of each variable.

Let’s say df <- data.frame(age = c(25, 57, 60), weight = c(80, 92, 61), cigarettes_a_day = c(30, 2, 19), death_flag = c(1, 0, 1))

Then I can group by age (say, under 50 and over 50) and compute the PoD of each group as the number of death_flags divided by the number of people in the group, which is simply the average of death_flag. Grouping by weight (say, below and above 80) gives a different death rate, and thus a different PoD, for each binned variable, which is what I want. My problem arises when I try to iterate over all variables.

So far I’ve tried variants of the following code, which does not work:

for(n in names(df)) {

    df%>% group_by(n)%>%
      summarise(PoD_bin = mean(death_flag))
}

I haven’t figured out a way to run through all variables and perform the computation.

As a side note, I did the binning of the variables without dplyr:

for (v in names(df[-1])) {
    # Name of the new binned column, e.g. "weight_bin"
    newVar <- paste(v, "bin", sep = "_")
    df[newVar] <- cut(df[[v]], breaks = 100)
}

It puzzles me that I cannot refer to the variables by name for the grouping in the first for loop, while I can do so in the second loop to create new columns of df.

Help is greatly appreciated!

Solution:

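Your loop doesn’t work because n holds a character string, and group_by() does not interpret a string stored in a variable as a column name. You can convert the string to a symbol with sym() and unquote it with !!, which gives the desired result. Results are not printed automatically inside a for loop, so I have added print() to see the output.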

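library(dplyr)

for (n in names(df)) {
  df |>
    group_by(!!sym(n)) |>   # turn the column name stored in n into a symbol
    summarise(PoD_bin = mean(death_flag)) |>
    print()                 # results are not auto-printed inside a loop
}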

Output:

# A tibble: 3 × 2
    age PoD_bin
  <dbl>   <dbl>
1    25       1
2    57       0
3    60       1
# A tibble: 3 × 2
  weight PoD_bin
   <dbl>   <dbl>
1     61       1
2     80       1
3     92       0
# A tibble: 3 × 2
  cigarettes_a_day PoD_bin
             <dbl>   <dbl>
1                2       0
2               19       1
3               30       1
# A tibble: 2 × 2
  death_flag PoD_bin
       <dbl>   <dbl>
1          0       0
2          1       1
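If the summaries should be kept rather than only printed, one option is to collect them in a named list. The following is a sketch of an alternative, not part of the loop above: it selects the grouping column with across(all_of(n)) instead of !!sym(n), and it skips death_flag itself, since grouping the outcome by itself is not informative. The names predictors and pod_tables are made up for illustration.

```r
library(dplyr)

# Example data from the question
df <- tibble(
  age = c(25, 57, 60),
  weight = c(80, 92, 61),
  cigarettes_a_day = c(30, 2, 19),
  death_flag = c(1, 0, 1)
)

# Assumption: every column except the outcome is a grouping candidate
predictors <- setdiff(names(df), "death_flag")

# One summary table per predictor, stored in a list instead of printed
pod_tables <- lapply(predictors, function(n) {
  df |>
    group_by(across(all_of(n))) |>   # n is a string holding a column name
    summarise(PoD_bin = mean(death_flag))
})
names(pod_tables) <- predictors

pod_tables$age
```

Each element can then be inspected or processed further by name, for example pod_tables$weight.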

Data:

df <- tibble(age = c(25, 57, 60), weight = c(80, 92, 61), cigarettes_a_day = c(30, 2, 19), death_flag=c(1,0,1))
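With only one row per distinct value, grouping by the raw columns gives trivial rates, so the binning from the question has to happen before the grouping. A minimal sketch of doing both steps in one pass is shown here, assuming equal-width bins from cut(); breaks = 2 is chosen only because the example has three rows (the question used 100), and the names predictors, pod_by_bin and n_obs are hypothetical.

```r
library(dplyr)

# Example data from the question
df <- tibble(
  age = c(25, 57, 60),
  weight = c(80, 92, 61),
  cigarettes_a_day = c(30, 2, 19),
  death_flag = c(1, 0, 1)
)

# Assumption: every column except the outcome is binned
predictors <- setdiff(names(df), "death_flag")

# For each predictor: bin it, then average death_flag within each bin
pod_by_bin <- lapply(predictors, function(v) {
  df |>
    mutate(bin = cut(.data[[v]], breaks = 2)) |>   # equal-width bins
    group_by(bin) |>
    summarise(PoD_bin = mean(death_flag), n_obs = n())
})
names(pod_by_bin) <- predictors

pod_by_bin$age
```

The n_obs column records how many rows fall into each bin, which helps judge how reliable each bin's death rate is.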
