How to create an order/ranking for majors declared by each student

Advertisements

I have a dataset that has student ids and name of major along with semester. The main complicating factor, is that if a student returns to a previous major, I want it to count as another major

StudID   Major       Term
12345    History     202101
12345    History     202102
12345    Management  202201
12345    History     202202

Desired result

StudID   Major       Term    MajorNumber
12345    History     202101  1
12345    History     202102  1
12345    Management  202201  2
12345    History     202202  3

The only thing I have tried is pulling in the earliest term for each major then doing a ranking within StudentID based on each majors earliest term, but that means that final "history" becomes a 1 instead of 3

>Solution :

The data.table::rleid function does this. Specifically, it will give all consecutive values that are the same the same ID, and will increment whenever there is a change.

With data.table,

library(data.table)
your_dt = as.data.table(your_data)
your_dt[, MajorNumber := rleid(Major), by = .(StudID)]
your_dt
#    StudID      Major   Term MajorNumber
# 1:  12345    History 202101           1
# 2:  12345    History 202102           1
# 3:  12345 Management 202201           2
# 4:  12345    History 202202           3

Or with dplyr:

library(dplyr)
your_data %>%
  group_by(StudID) %>%
  mutate(MajorNumber = data.table::rleid(Major)) %>%
  ungroup()

Leave a Reply Cancel reply