Suppose my dataset is
| ID | Country | Type |
|---|---|---|
| 1 | Austria | A |
| 1 | Austria | A |
| 1 | Austria | A |
| 1 | Belgium | A |
| 2 | Czech | B |
| 2 | Czech | B |
| 2 | Denmark | B |
| 2 | Denmark | C |
I want to create a new variable of each type with the percent each country has within each ID. So
| ID | Country | Type | A | B | C |
|---|---|---|---|---|---|
| 1 | Austria | A | 0.75 | 0 | 0 |
| 1 | Austria | A | 0.75 | 0 | 0 |
| 1 | Austria | A | 0.75 | 0 | 0 |
| 1 | Belgium | A | 0.25 | 0 | 0 |
| 2 | Czech | B | 0 | 0.5 | 0 |
| 2 | Czech | B | 0 | 0.5 | 0 |
| 2 | Denmark | B | 0 | 0.25 | 0 |
| 2 | Denmark | C | 0 | 0 | 0.25 |
For example, we have four countries that belong to ID 1. It has three Austria’s with type A (therefore 0.75) and one Belgium with type A (0.25) therefore falling in to type A.
Any help would be appreciated!
>Solution :
I believe you need to group_by different variables when deriving the correct percentages.
library(tidyverse)
df |>
group_by_all() |>
mutate(value = n()) |>
group_by(ID) |>
mutate(value = value/n(),
rn = row_number(),
name = Type) |>
pivot_wider(values_fill = 0) |>
select(-rn) |>
ungroup()
# A tibble: 8 × 6
ID Country Type A B C
<int> <chr> <chr> <dbl> <dbl> <dbl>
1 1 Austria A 0.75 0 0
2 1 Austria A 0.75 0 0
3 1 Austria A 0.75 0 0
4 1 Belgium A 0.25 0 0
5 2 Czech B 0 0.5 0
6 2 Czech B 0 0.5 0
7 2 Denmark B 0 0.25 0
8 2 Denmark C 0 0 0.25