Conditional on first letter

June 30, 2023

Suppose my dataset is

ID	Country	Type
1	Austria	A
1	Austria	A
1	Austria	A
1	Belgium	A
2	Czech	B
2	Czech	B
2	Denmark	B
2	Denmark	C

I want to create a new variable of each type with the percent each country has within each ID. So

ID	Country	Type	A	B	C
1	Austria	A	0.75	0	0
1	Austria	A	0.75	0	0
1	Austria	A	0.75	0	0
1	Belgium	A	0.25	0	0
2	Czech	B	0	0.5	0
2	Czech	B	0	0.5	0
2	Denmark	B	0	0.25	0
2	Denmark	C	0	0	0.25

For example, we have four countries that belong to ID 1. It has three Austria’s with type A (therefore 0.75) and one Belgium with type A (0.25) therefore falling in to type A.

Any help would be appreciated!

>Solution :

I believe you need to group_by different variables when deriving the correct percentages.

library(tidyverse)

df |> 
  group_by_all() |> 
  mutate(value = n()) |> 
  group_by(ID) |> 
  mutate(value = value/n(),
         rn = row_number(),
         name = Type) |> 
  pivot_wider(values_fill = 0) |> 
  select(-rn) |> 
  ungroup()

# A tibble: 8 × 6
     ID Country Type      A     B     C
  <int> <chr>   <chr> <dbl> <dbl> <dbl>
1     1 Austria A      0.75  0     0   
2     1 Austria A      0.75  0     0   
3     1 Austria A      0.75  0     0   
4     1 Belgium A      0.25  0     0   
5     2 Czech   B      0     0.5   0   
6     2 Czech   B      0     0.5   0   
7     2 Denmark B      0     0.25  0   
8     2 Denmark C      0     0     0.25