I have this data below. I would like to impute the large value of income by 10 times the value of the median.
This is done using the following code:
df$income_imputed = ifelse(df$income > (10* median(df$income,na.rm = T)), (10* median(df$income,na.rm = T)), df$income)
However, i would like to do that for each country and year separately and not to the whole dataset. I know group_by could be helpful with such tasks, but i am not sure how to include both functions together.
country year income
<dbl> <dbl> <dbl>
1 1 1999 5000
2 1 1999 5000
3 1 1999 10000000
4 1 1999 3000
5 1 2000 4000
6 1 2000 4000
7 1 2000 20000000
8 1 2000 4000
9 2 1999 10000
10 2 1999 10000
11 2 1999 30000000
12 2 1999 4000
13 2 2000 12000
14 2 2000 12000
15 2 2000 40000000
df= structure(list(country = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
2, 2, 2), year = c(1999, 1999, 1999, 1999, 2000, 2000, 2000,
2000, 1999, 1999, 1999, 1999, 2000, 2000, 2000), income = c(5000,
5000, 1e+07, 3000, 4000, 4000, 2e+07, 4000, 10000, 10000, 3e+07,
4000, 12000, 12000, 4e+07), income2 = c(5000, 5000, 1e+05, 3000,
4000, 4000, 1e+05, 4000, 10000, 10000, 1e+05, 4000, 12000, 12000,
1e+05)), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
"data.frame"))
>Solution :
I believe this is what you expect :
my_df <- my_df %>% group_by(country, year) %>% mutate(income_imputed = ifelse(income > (10* median(income,na.rm = T)), (10* median(income,na.rm = T)), income))