I am working with a dataset of about 2000 rows and 22 columns.
I need to assign a score of 0-4 into a new column, based on the contents of two others. Here’s the example:
SAMP_SMPT_USER_REFERENCE | MEAS_ANAL_METH_CODE | oP.Res | TP.Res | OP_METH | TP_METH |
---|---|---|---|---|---|
0031 | 21 | 0.02 | 0.04 | ||
0032 | 24 | 0.12 | 0.003 |
(with apologies for how the database generated that heading)
I am trying to build a loop that looks at MEAS_ANAL_METH_CODE and oP.Res. First it will see if MEAS_ANAL_METH_CODE is 21 or 24, then based on that answer it will look at the value in oP.Res and add a category (0,1,2,3,4) to the OP_METH column. Then it will so it again for TP.Res, looking at MEAS_ANAL_METH_CODE, TP.Res and assigning a value to TP_Meth.
I’m new at loops!
This is what I have tried:
for(oP.Res in Merged){
if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res < 0.01) {
"0"
} else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.01 && Merged$oP.Res < 0.0375) {
"1"
} else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.0375 && Merged$oP.Res < 0.0725) {
"2"
} else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.0725 && Merged$oP.Res < 0.1) {
"3"
} else if(Merged$MEAS_ANAL_METH_CODE == 24 && Merged$oP.Res >= 0.1) {
"4"
} else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res < 0.01) {
"4"
} else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res >= 0.01 && Merged$oP.Res < 0.0375) {
"3"
} else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res >= 0.0375 && Merged$oP.Res < 0.0725) {
"2"
} else if(Merged$MEAS_ANAL_METH_CODE == 21 && Merged$oP.Res >= 0.0725 && Merged$oP.Res < 0.1) {
"1"
} else {"0"}
}
I can see there is probably an issue about R not knowing where to put the outputs, but I cannot figure it out.
>Solution :
A solution using loops and a lookup list.
First store the cut breaks and labels for each code in a list.
tmp=list(
"21"=list(
"brk"=c(0,0.01,0.0375,0.0725,0.1,1),
"lab"=0:4
),
"24"=list(
"brk"=c(0,0.01,0.0375,0.0725,0.1,1),
"lab"=4:0
)
)
Then loop over the columns of interest and for each code apply the cut function.
for(cc in c("oP.Res","TP.Res")) {
Merged[paste0(cc,"_cut")]=NA
for (ctg in unique(Merged$MEAS_ANAL_METH_CODE)) {
Merged[Merged$MEAS_ANAL_METH_CODE==ctg,paste0(cc,"_cut")]=
as.character(
cut(
Merged[Merged$MEAS_ANAL_METH_CODE==ctg,cc],
tmp[[as.character(ctg)]][["brk"]],
tmp[[as.character(ctg)]][["lab"]]
)
)
}
}