Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to use group by and a conditional mean at the same time in R?

I have a data set with the following columns: Ticker, Company_name, Forecast_period, Measure, Estimate, Forecast_date.

In the Measure column the value is either EPS or DPS and the Forecast_period is either 1 or 2.

I would like to calculate the mean of EPS1, (mean of Estimate if Measure = EPS & Forecast_period =1) mean of EPS2 (mean of Estimate if Measure = EPS & Forecast_period = 2) and the mean of DPS1 (mean of Estimate if Measure = DPS & Forecast_period =1).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

For that I wrote the following code:

IBESdatamean2 <- IBESdata %>%
  group_by(Company_name, Ticker, Forecast_period) %>%
  summarize(mean[Measure == "EPS" & Forecast_period == 1, 'Estimate'])

I need to group my data by Company_name, Ticker and Forecast_period that I am sure of, however I am not sure if I have written the right code for the mean calculation because I am getting the following error:

Caused by error in `mean[Measure == "EPS" & Forecast_period == 1, "Estimate"]`:
! object of type 'closure' is not subsettable

I want to have a conditional mean calculation as described above and if anyone could help me that would be greatly appreciated.

>Solution :

The summarise function works to implement a function on the entire dataset. In this case we want to filter to a specific subset and then extract a single value.

We can use the filter() function and then implement a pull() and mean()

library(dplyr)

IBESdatamean2 <- IBESdata %>%
  group_by(Company_name, Ticker, Forecast_period) %>%
  filter(Measure == "EPS", Forecast_period == 1) %>%
  pull(Estimate) %>%
  mean(na.rm = T)

This is how you have written your code.

However, what I think you are actually asking for is the mean of each individual measure and forecast period per company.

In which case it would be appropriate to implement a summarise(), however your group_by() is incorrect.

library(dplyr)

IBES_ALL_MEANS <- IBESdata %>%
  group_by(Company_name, Forecast_period, Measure) %>%
  summarise(Mean_Estimates = mean(Estimate, na.rm = T)

If you wanted to further filter down to a specific company, forecast period or measure you can now do so using the filter() function on the IBES_ALL_MEANS dataset.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading