I’m trying to count the new employees a manager gained between time 1 and time 2. For each manager, I have a string of all employee ids that roll up under that manager.
My code below always says there is 1 new employee, but as you can see, there are 2. How do I find out how many new employees there are? The ids aren’t guaranteed to always be in the same order, but they will always be separated by ", ".
library(dplyr)
library(stringr)
#First data set
mydata_q2 <- tibble(
leader = 1,
reports_q2 = "2222, 3333, 4444"
)
#Second dataset
mydata_q3 <- tibble(
leader = 1,
reports_q3 = "2222, 3333, 4444, 55555, 66666"
)
#Function to count number of new employees
calculate_number_new_emps <- function(reports_time1, reports_time2) {
time_1_reports <- ifelse(is.na(reports_time1), character(0), str_split(reports_time1, " ,\\s*")[[1]])
time_2_reports <- str_split(reports_time2, " ,\\s*")[[1]]
num_new_employees <- length(setdiff(time_1_reports, time_2_reports))
num_new_employees
}
#Join data and count number of new staff--get wrong answer
mydata_q2 %>%
left_join(mydata_q3) %>%
mutate(new_staff_count = calculate_number_new_emps(reports_q2, reports_q3))
EDIT:
The output that I want is for new_staff_count = 2 for this example.
That’s because there are 2 new employees (55555 and 66666) in q3 that weren’t in q2.
>Solution :
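The pattern in your str_split calls is not correct: " ,\\s*" requires a space before the comma, which never occurs in your strings, so nothing gets split. Just split on ", ". Then take the difference in length between the two vectors. Note that this difference is the net change in headcount, so it equals the number of new employees only when nobody from time 1 has left.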
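To see why the original function reports 1, here is a small check of the two pieces involved. It is only a sketch: the variable names are made up for illustration and the string is the q2 value from the question.

```r
library(stringr)

reports <- "2222, 3333, 4444"

# The original pattern needs a space BEFORE the comma, so it never matches
# and the whole string comes back as a single element.
bad_split <- str_split(reports, " ,\\s*")[[1]]

# Splitting on ", " gives one element per id.
good_split <- str_split(reports, ", ")[[1]]

# ifelse() returns a result shaped like its test (here length 1),
# so it keeps only the first id even when the split is right.
first_only <- ifelse(FALSE, character(0), good_split)

length(bad_split)   # 1
length(good_split)  # 3
first_only          # "2222"
```

With the unsplit strings, setdiff() compares one long string against another, which is where the count of 1 comes from.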
calculate_number_new_emps <- function(reports_time1, reports_time2) {
  # Use if/else rather than ifelse(): ifelse() returns a result the same
  # length as its test, so it would keep only the first id
  if (is.na(reports_time1)) {
    time_1_reports <- character(0)
  } else {
    time_1_reports <- str_split(reports_time1, ", ")[[1]]
  }
  time_2_reports <- str_split(reports_time2, ", ")[[1]]
  # Net change in headcount between the two time points
  num_new_employees <- length(time_2_reports) - length(time_1_reports)
  num_new_employees
}
#Join data and count number of new staff (new_staff_count is 2 for this example)
mydata_q2 %>%
left_join(mydata_q3) %>%
mutate(new_staff_count = calculate_number_new_emps(reports_q2, reports_q3))
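If reports can also leave between the two time points, or the data has more than one leader, a set-based count may be safer than a length difference. The following is a minimal sketch, not part of the original answer: count_new_reports, split_ids, and the two-row example data are hypothetical names and values invented for illustration. It assumes ids are separated by a comma plus optional whitespace and that an NA string means no reports.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: for each pair of strings, count the ids that appear
# at time 2 but not at time 1. Vectorized so it can be used inside mutate().
count_new_reports <- function(reports_time1, reports_time2) {
  split_ids <- function(x) {
    # Assumption: an NA string means the leader had no reports
    if (is.na(x)) character(0) else str_split(x, ",\\s*")[[1]]
  }
  vapply(
    seq_along(reports_time2),
    function(i) {
      # setdiff(time 2, time 1): ids that are new at time 2
      length(setdiff(split_ids(reports_time2[i]), split_ids(reports_time1[i])))
    },
    integer(1)
  )
}

# Made-up data: each leader loses one report and gains others
q2 <- tibble(leader = c(1, 2),
             reports_q2 = c("2222, 3333, 4444", "7777, 8888"))
q3 <- tibble(leader = c(1, 2),
             reports_q3 = c("4444, 2222, 55555, 66666", "8888, 9999"))

result <- q2 %>%
  left_join(q3, by = "leader") %>%
  mutate(new_staff_count = count_new_reports(reports_q2, reports_q3))

result$new_staff_count  # 2 1
```

Here leader 1 gains 55555 and 66666 while losing 3333, so the set difference gives 2 where the length difference would give 1. The order of the ids does not matter because setdiff() compares values, not positions.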