I have a dataset with three identifiers. Say one identifies the country, another identifies time, and another which identifies a person. Like the data frame below:
country person time
1 A John 1
2 A John 2
3 A John 3
4 A Peter 1
5 A Peter 2
6 A Peter 3
7 B David 1
8 B Thomas 2
9 B David 3
10 B Adam 1
11 B Adam 2
12 B Thomas 3
How can I create a variable that generates a sequence of letters that will identify each person by Country? The output should look like this:
country person time Letterseq
1 A John 1 A
2 A John 2 A
3 A John 3 A
4 A Peter 1 B
5 A Peter 2 B
6 A Peter 3 B
7 B David 1 A
8 B Thomas 2 B
9 B David 3 A
10 B Adam 1 C
11 B Adam 2 C
12 B Thomas 3 B
Let me know if you need more clarification.
>Solution :
With data.table:
dt[order(country,person),LetterSeq:=LETTERS[rleid(person)],by=country][]
country person time LetterSeq
<char> <char> <int> <char>
1: A John 1 A
2: A John 2 A
3: A John 3 A
4: A Peter 1 B
5: A Peter 2 B
6: A Peter 3 B
7: B David 1 B
8: B Thomas 2 C
9: B David 3 B
10: B Adam 1 A
11: B Adam 2 A
12: B Thomas 3 C