I have two columns of protein data. One contains amino-acid modifications with their positions, and the other contains a position range. If a modification's position falls within the range in the same row, I would like to copy that modification into a new column. If nothing matches, the new cell should stay blank, and no rows should be deleted.
# Sample data
# String with the protein modifications and numbers
ProteinModificationMotifs <- c(
"Glycosylation_49, Glycosylation_255, Glycosylation_399, Glycosylation_437, Glycosylation_455, Glycosylation_536",
"Glycosylation_32, Glycosylation_101", "Glycosylation_555"
)
# String with the ranges
AA_Range <- c("400-637", "0-50", "0-444")
# Creating a dataframe
peptide_df <- data.frame(ProteinModificationMotifs = ProteinModificationMotifs, AA_Range = AA_Range)
# Displaying the dataframe
print(peptide_df)
# The desired result is a new column, PeptideModificationMotifs:
# print(peptide_df$PeptideModificationMotifs)
# [1] "Glycosylation_437, Glycosylation_455, Glycosylation_536"
# [2] "Glycosylation_32"
# [3] ""
Solution:
Here, the check_range function extracts the modification number from a protein string and compares it with the range given in the AA_Range column of the same row. Every modification whose number falls within that range (both bounds inclusive) is then pasted into the new third column, PeptideModificationMotifs. Rows without a match get an empty string, so no rows are dropped.
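To see what those steps do on a single row, here is a minimal base-R sketch that parses row 2 of the sample data by hand. The variable names mods, positions, bounds and kept are illustrative only and are not part of the answer's code.

```r
# Split the motif string of one row into individual modifications
mods <- strsplit("Glycosylation_32, Glycosylation_101", ", ")[[1]]

# Drop everything up to and including the last "_" to get the position number
positions <- as.numeric(sub(".*_", "", mods))

# Split the "start-end" range into its two numeric bounds
bounds <- as.numeric(strsplit("0-50", "-")[[1]])

# Inclusive comparison against the lower and upper bound
in_range <- positions >= bounds[1] & positions <= bounds[2]

# Keep only the matching modifications and join them back into one string
kept <- paste(mods[in_range], collapse = ", ")
print(kept)
# [1] "Glycosylation_32"
```

If no modification matched, mods[in_range] would be an empty vector and paste(..., collapse = ", ") would return "", which is what leaves the cell blank instead of removing the row.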
# Sample data
ProteinModificationMotifs <- c(
"Glycosylation_49, Glycosylation_255, Glycosylation_399, Glycosylation_437, Glycosylation_455, Glycosylation_536",
"Glycosylation_32, Glycosylation_101", "Glycosylation_555"
)
AA_Range <- c("400-637", "0-50", "0-444")
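# Creating a data frame; stringsAsFactors = FALSE keeps both columns as character
# (before R 4.0.0 the default converted them to factors, which strsplit() rejects)
peptide_df <- data.frame(ProteinModificationMotifs = ProteinModificationMotifs, AA_Range = AA_Range, stringsAsFactors = FALSE)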
# Function to check whether a modification's position falls within a "start-end" range (inclusive)
check_range <- function(modification, range) {
  # "Glycosylation_437" -> 437
  mod_numbers <- as.numeric(gsub(".*_(\\d+)", "\\1", strsplit(modification, ", ")[[1]]))
  # "400-637" -> c(400, 637)
  range_values <- as.numeric(strsplit(range, "-")[[1]])
  return(mod_numbers >= range_values[1] & mod_numbers <= range_values[2])
}
# Apply the function row by row to create the new column
peptide_df$PeptideModificationMotifs <- sapply(seq_len(nrow(peptide_df)), function(i) {
  proteins <- unlist(strsplit(peptide_df$ProteinModificationMotifs[i], ", "))
  # vapply() always returns a logical vector (even for a row with no modifications), unlike sapply()
  matching_proteins <- proteins[vapply(proteins, check_range, logical(1), range = peptide_df$AA_Range[i])]
  # A row without matches collapses to "", so it stays in the data frame with a blank cell
  return(paste(matching_proteins, collapse = ", "))
})
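# Displaying the updated data frame
print(peptide_df)
# View() opens the spreadsheet-style data viewer and is intended for interactive sessions such as RStudio
View(peptide_df)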
Output:
  ProteinModificationMotifs
1 Glycosylation_49, Glycosylation_255, Glycosylation_399, Glycosylation_437, Glycosylation_455, Glycosylation_536
2 Glycosylation_32, Glycosylation_101
3 Glycosylation_555
  AA_Range PeptideModificationMotifs
1  400-637 Glycosylation_437, Glycosylation_455, Glycosylation_536
2     0-50 Glycosylation_32
3    0-444
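A possible alternative, sketched under the assumption that every modification follows the same "Name_position" format and every range the "start-end" format: the row-wise logic can be folded into one helper passed to mapply(), which avoids indexing rows by i. The helper name filter_motifs() is hypothetical, and the fourth row (an empty motif string with range "1-10") is an invented example added only to show that a row without modifications is kept with a blank result.

```r
# Hypothetical helper: keep only the modifications whose position lies in the row's range
filter_motifs <- function(motifs, range) {
  mods <- strsplit(motifs, ", ", fixed = TRUE)[[1]]
  # An empty motif string or a missing range yields a blank cell instead of an error
  if (length(mods) == 0 || is.na(range)) return("")
  positions <- as.numeric(sub(".*_", "", mods))
  bounds <- as.numeric(strsplit(range, "-", fixed = TRUE)[[1]])
  # Inclusive bounds; no match gives paste(character(0), collapse = ", ") == ""
  paste(mods[positions >= bounds[1] & positions <= bounds[2]], collapse = ", ")
}

peptide_df <- data.frame(
  ProteinModificationMotifs = c(
    "Glycosylation_49, Glycosylation_255, Glycosylation_399, Glycosylation_437, Glycosylation_455, Glycosylation_536",
    "Glycosylation_32, Glycosylation_101",
    "Glycosylation_555",
    ""  # invented extra row with no modifications
  ),
  AA_Range = c("400-637", "0-50", "0-444", "1-10"),
  stringsAsFactors = FALSE
)

# mapply() walks both columns in parallel, one call per row
peptide_df$PeptideModificationMotifs <- mapply(
  filter_motifs,
  peptide_df$ProteinModificationMotifs,
  peptide_df$AA_Range,
  USE.NAMES = FALSE
)
print(peptide_df$PeptideModificationMotifs)
```

Because mapply() pairs the two columns element by element, each motif string is only compared with the range from its own row; USE.NAMES = FALSE stops the long motif strings from becoming names of the result vector.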