I am trying to automate the process of translating a questionnaire raw score to a percentile based on a table that references the normative population data.
These are my data frames:
id <- c(001, 002, 003, 004, 005, 006, 007, 008, 009, 010)
age <- c(3.5, 4, 7, 8, 11, 4, 4, 6, 7, 3)
domain_score <- c(sample(12:53, 10))
pd_frame <- data.frame(id, age, domain_score)
norms_table <- data.frame(
  percentile = rev(c(1, seq(from = 5, to = 100, by = 5))),
  score_start = c(46, 38, 36, 34, 33, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 19, 17, 15, 13, 12),
  score_end = c(53, 45, 37, 35, 33, 32, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 18, 16, 14, 12)
)
In order to solve this issue I have created a for-loop where I am aiming to match the domain_score from the pd_frame to the score bins I have in the norms_table.
Then if the domain_score falls within a given range in the norms_table, I want to retrieve the percentile in the norms_table and add it to a new percentile column in the pd_frame.
Here is what I have so far:
for (i in 1:nrow(pd_frame)) {
  current_score <- pd_frame$domain_score[i]
  bin_low <- norms_table$score_start[i]
  bin_high <- norms_table$score_end[i]
  current_percentile <- norms_table$percentile[i]
  if (current_score >= bin_low & current_score <= bin_high) {
    pd_frame$percentile[i] <- current_percentile[i]
  } else {
    pd_frame$percentile[i] <- NA
  }
}
This for-loop works for the 1st row in the pd_frame but it stops after that and I am not sure how to make it continue to iterate for the entirety of the pd_frame.
Any advice would be much appreciated.
>Solution :
Your loop does not work as intended because it never searches norms_table: on each iteration it compares the ith domain_score only with the ith row of norms_table, so a score is matched only if it happens to fall in the bin at the same row index.
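There is also a smaller indexing issue that may explain why only the 1st row appears to work: current_percentile is already a single value, so current_percentile[i] is NA for every i greater than 1, even when the score is inside the bin. A short illustration, using a made-up value rather than output from the real data:

```r
# Once extracted, current_percentile is a length-1 vector
current_percentile <- 95  # illustrative value only (an assumption)

first <- current_percentile[1]   # the stored value
second <- current_percentile[2]  # index past the end, so R returns NA

print(c(first, second))
```

Assigning current_percentile directly, without the [i], avoids this.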
You need to modify your loop so that for each domain_score in pd_frame, it checks against all rows in norms_table to find the appropriate percentile. Here’s how you can do it:
# Add a new percentile column to pd_frame, defaulting to NA
pd_frame$percentile <- NA

for (i in seq_len(nrow(pd_frame))) {
  current_score <- pd_frame$domain_score[i]

  # Check this score against every bin in norms_table
  for (j in seq_len(nrow(norms_table))) {
    bin_low <- norms_table$score_start[j]
    bin_high <- norms_table$score_end[j]

    if (current_score >= bin_low && current_score <= bin_high) {
      pd_frame$percentile[i] <- norms_table$percentile[j]
      break  # Exit the inner loop once the correct bin is found
    }
  }
}
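If you would rather avoid the nested loop, the same lookup can be sketched in vectorized form with base R's findInterval(). This is only one possible approach, not a replacement you have to adopt. It assumes the bins in norms_table do not overlap. The helper name lookup_percentile and the set.seed() call are my own additions for illustration and are not part of your original code.

```r
# Data from the question; set.seed() is added only to make the example reproducible
set.seed(42)
pd_frame <- data.frame(
  id = 1:10,
  age = c(3.5, 4, 7, 8, 11, 4, 4, 6, 7, 3),
  domain_score = sample(12:53, 10)
)
norms_table <- data.frame(
  percentile = rev(c(1, seq(from = 5, to = 100, by = 5))),
  score_start = c(46, 38, 36, 34, 33, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 19, 17, 15, 13, 12),
  score_end = c(53, 45, 37, 35, 33, 32, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 18, 16, 14, 12)
)

# Hypothetical helper: map each score to the percentile of the bin containing it
lookup_percentile <- function(scores, norms) {
  # findInterval() needs the lower bounds in ascending order
  norms <- norms[order(norms$score_start), ]

  # Index of the last bin whose lower bound is <= the score (0 if below all bins)
  idx <- findInterval(scores, norms$score_start)
  idx[idx == 0] <- NA

  out <- norms$percentile[idx]

  # A score above the upper bound of its candidate bin belongs to no bin
  out[!is.na(idx) & scores > norms$score_end[idx]] <- NA
  out
}

pd_frame$percentile <- lookup_percentile(pd_frame$domain_score, norms_table)
print(pd_frame)
```

Scores outside every bin (for example 11 or 54 with this table) come back as NA, matching the behaviour of the loop above.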