I have a data frame as follows:
comment_id <- c(1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10)
cat <- c("acc_sp", "acc_lex", "org_gen", "acc_gen", "ran_lex", "arg_rel", "len", "org_lay", "org_spe", "org_gen", "coh_link")
df <- data.frame(comment_id, cat)
You’ll notice that there are two items with comment_id = 2.
I need to create a new column which uniquely numbers each occurrence of a comment_id. The first five lines of the new column would be as follows:
| comment_cat_id |
|---|
| 1_1 |
| 2_1 |
| 2_2 |
| 3_1 |
| 4_1 |
I’m thinking I can use:
df$comment_cat_id <- paste(comment_id, ?????, sep = "_")
to handle the creation of the new column. But I don’t know how to generate the unique count of each occurrence of each comment_id to place into the ????? slot in the above code.
Can anyone help?
>Solution :
Here is one way to do it, using a loop over the unique ids:
df$newcol <- NA # new column to fill
freq <- as.data.frame(table(df$comment_id)) # occurrences of each comment_id
for (i in unique(df$comment_id)) {
  # For each id, fill the new column with id_1, id_2, ...
  df[df$comment_id == i, "newcol"] <- paste(i, seq_len(freq[freq$Var1 == i, "Freq"]), sep = "_")
}
df # print the result
   comment_id      cat newcol
1           1   acc_sp    1_1
2           2  acc_lex    2_1
3           2  org_gen    2_2
4           3  acc_gen    3_1
5           4  ran_lex    4_1
6           5  arg_rel    5_1
7           6      len    6_1
8           7  org_lay    7_1
9           8  org_spe    8_1
10          9  org_gen    9_1
11         10 coh_link   10_1
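The same result can probably be reached without a loop. The sketch below is an alternative, not the approach from the answer above: it uses base R's `ave()` with `seq_along` to number the rows within each `comment_id`, which is the value the question wanted in the `?????` slot. The variable name `occurrence` is only illustrative, and the column is named `comment_cat_id` as in the question.

```r
# Rebuild the example data from the question.
comment_id <- c(1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10)
cat <- c("acc_sp", "acc_lex", "org_gen", "acc_gen", "ran_lex", "arg_rel",
         "len", "org_lay", "org_spe", "org_gen", "coh_link")
df <- data.frame(comment_id, cat)

# ave() applies FUN within each group of equal comment_id values and
# returns the results in the original row order; seq_along() numbers
# the rows of each group 1, 2, 3, ...
occurrence <- ave(df$comment_id, df$comment_id, FUN = seq_along)

# Combine the id and its within-group counter into the new column.
df$comment_cat_id <- paste(df$comment_id, occurrence, sep = "_")

head(df$comment_cat_id, 5)
# [1] "1_1" "2_1" "2_2" "3_1" "4_1"
```

Because `ave()` groups by value rather than by position, this should also number repeated ids correctly when the duplicates are not on adjacent rows.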