I would like to see if one string value fits in the range of another, in R

I have two columns of protein data, one that contains a modification of an amino acid and the location, and another with a range. If the modification number falls within this range, I would like to generate another column and copy that string there. If there are no matches I’d like it to remain blank and not delete any rows.

# Sample data

# String with the protein modifications and numbers

ProteinModificationMotifs <- c(
    "Glycosylation_49, Glycosylation_255, Glycosylation_399, Glycosylation_437, Glycosylation_455, Glycosylation_536",
    "Glycosylation_32, Glycosylation_101", "Glycosylation_555"
)

# String with the ranges
AA_Range <- c("400-637", "0-50", "0-444")

# Creating a dataframe
peptide_df <- data.frame(ProteinModificationMotifs = ProteinModificationMotifs, AA_Range = AA_Range)

# Displaying the dataframe
print(peptide_df)

# The output should be another column

print(peptide_df$PeptideModificationMotifs)

[1] "Glycosylation_437, Glycosylation_455, Glycosylation_536"
[2] "Glycosylation_32"
[3] ""


Solution:


The check_range function extracts the position number from a modification string and tests whether it lies within the range given in the AA_Range column of the same row (both bounds inclusive). The modifications whose positions fall within the range are then pasted together into a third column, PeptideModificationMotifs. Rows with no match get an empty string, and no rows are dropped.
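As a quick illustration, here are the two parsing steps applied to a single row of the sample data. The variable names (motifs, aa_range, parts, positions, bounds, in_range) are only for this sketch and are not part of the solution itself.

```r
# One row's worth of input, copied from the sample data
motifs <- "Glycosylation_32, Glycosylation_101"
aa_range <- "0-50"

# Split the motif string into individual modifications
parts <- strsplit(motifs, ", ")[[1]]

# Keep only the digits after the underscore and convert them to numbers
positions <- as.numeric(gsub(".*_(\\d+)", "\\1", parts))

# Split the range string into its lower and upper bounds
bounds <- as.numeric(strsplit(aa_range, "-")[[1]])

# Inclusive comparison: TRUE where the position lies within the range
in_range <- positions >= bounds[1] & positions <= bounds[2]

print(parts[in_range])
# [1] "Glycosylation_32"
```

Position 32 lies inside 0-50 and position 101 does not, so only the first modification is kept.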

# Data
ProteinModificationMotifs <- c(
  "Glycosylation_49, Glycosylation_255, Glycosylation_399, Glycosylation_437, Glycosylation_455, Glycosylation_536",
  "Glycosylation_32, Glycosylation_101", "Glycosylation_555"
)

AA_Range <- c("400-637", "0-50", "0-444")

# Creating a dataframe (stringsAsFactors = FALSE keeps the columns as character in R < 4.0)
peptide_df <- data.frame(ProteinModificationMotifs = ProteinModificationMotifs, AA_Range = AA_Range, stringsAsFactors = FALSE)

# Function to check if a modification number falls within the range (bounds inclusive)
check_range <- function(modification, range) {
  # Take the digits after the underscore, e.g. "Glycosylation_49" -> 49
  mod_numbers <- as.numeric(gsub(".*_(\\d+)", "\\1", strsplit(modification, ", ")[[1]]))
  # Split "400-637" into its lower and upper bounds
  range_values <- as.numeric(strsplit(range, "-")[[1]])
  return(mod_numbers >= range_values[1] & mod_numbers <= range_values[2])
}

# Apply the function row by row to create a new column
peptide_df$PeptideModificationMotifs <- vapply(seq_len(nrow(peptide_df)), function(i) {
  proteins <- unlist(strsplit(peptide_df$ProteinModificationMotifs[i], ", "))
  # vapply always returns a logical vector, even when a row has no modifications
  matching_proteins <- proteins[vapply(proteins, check_range, logical(1), range = peptide_df$AA_Range[i])]
  return(paste(matching_proteins, collapse = ", "))
}, character(1))

# Displaying the updated dataframe
print(peptide_df)
                                                                                        ProteinModificationMotifs
1 Glycosylation_49, Glycosylation_255, Glycosylation_399, Glycosylation_437, Glycosylation_455, Glycosylation_536
2                                                                             Glycosylation_32, Glycosylation_101
3                                                                                               Glycosylation_555
  AA_Range                               PeptideModificationMotifs
1  400-637 Glycosylation_437, Glycosylation_455, Glycosylation_536
2     0-50                                        Glycosylation_32
3    0-444                                                        
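
An alternative sketch, not part of the original answer: the same filtering can be written as one helper that handles a whole row at a time, applied to the two columns with mapply so that no row index is needed. The helper name filter_motifs_in_range is hypothetical, and the sketch assumes every motif ends in "_<number>" and every range has the form "lower-upper".

```r
# Hypothetical helper (not from the original answer): keep the motifs of one
# row whose position lies inside "lower-upper", both bounds inclusive.
filter_motifs_in_range <- function(motifs, aa_range) {
  parts <- strsplit(motifs, ", ", fixed = TRUE)[[1]]
  # Drop everything up to the last underscore, leaving the position number
  positions <- as.numeric(sub(".*_", "", parts))
  bounds <- as.numeric(strsplit(aa_range, "-", fixed = TRUE)[[1]])
  keep <- positions >= bounds[1] & positions <= bounds[2]
  # paste() on an empty selection yields "", so unmatched rows stay blank
  paste(parts[keep], collapse = ", ")
}

# Sample data from the question
ProteinModificationMotifs <- c(
  "Glycosylation_49, Glycosylation_255, Glycosylation_399, Glycosylation_437, Glycosylation_455, Glycosylation_536",
  "Glycosylation_32, Glycosylation_101", "Glycosylation_555"
)
AA_Range <- c("400-637", "0-50", "0-444")
peptide_df <- data.frame(
  ProteinModificationMotifs = ProteinModificationMotifs,
  AA_Range = AA_Range,
  stringsAsFactors = FALSE
)

# Walk both columns in parallel; USE.NAMES = FALSE keeps the result unnamed
peptide_df$PeptideModificationMotifs <- mapply(
  filter_motifs_in_range,
  peptide_df$ProteinModificationMotifs,
  peptide_df$AA_Range,
  USE.NAMES = FALSE
)

print(peptide_df$PeptideModificationMotifs)
```

On the sample data this gives the three values requested in the question, with an empty string for the third row.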