My data looks like this:
try=data.frame("histones"= c("encode3Ren_limb_H3K27me3_E10","encode3Ren_facial_prominence_H3K27me3_E10", "encode3Ren_liver_H3K27me3_E12", "encode3Ren_neural_tube_H3K27me3_E14", "encode3Ren_neural_tube_H3K4me1_E12" ,"encode3Ren_neural_tube_H3K27me3_E11", "encode3Ren_neural_tube_H3K4me1_E15", "encode3Ren_neural_tube_H3K4me2_E13" ), "a"= c(1,2,3,4,5,6,7,8))
try
histones a
1 encode3Ren_limb_H3K27me3_E10 1
2 encode3Ren_facial_prominence_H3K27me3_E10 2
3 encode3Ren_liver_H3K27me3_E12 3
4 encode3Ren_neural_tube_H3K27me3_E14 4
5 encode3Ren_neural_tube_H3K4me1_E12 5
6 encode3Ren_neural_tube_H3K27me3_E11 6
7 encode3Ren_neural_tube_H3K4me1_E15 7
8 encode3Ren_neural_tube_H3K4me2_E13 8
I would like to extract only the histone mark (e.g. H3K27me3, H3K4me2) from the column "histones" and put it in a new column. I’m not able to use regular expressions, so any help is very much appreciated.
>Solution :
Well actually regular expressions are a good choice here:
library(stringr)
try$mark <- str_extract(try$histones, "(?<=_)H\\d+K\\d+\\w+?(?=_)")
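If installing stringr is not an option, the same pattern should also work with base R's regmatches() and regexpr(). This is only a sketch: the vector x below is a hypothetical stand-in built from a few of the names in the question, and it assumes every name contains a mark, because regmatches() drops elements that do not match.

```r
# Example names taken from the question (x is a hypothetical stand-in
# for try$histones).
x <- c("encode3Ren_limb_H3K27me3_E10",
       "encode3Ren_neural_tube_H3K4me1_E12",
       "encode3Ren_neural_tube_H3K4me2_E13")

# perl = TRUE enables the lookbehind/lookahead used in the pattern.
# Note: regmatches() silently drops elements with no match, so the
# result can be shorter than x if some names lack a mark.
marks <- regmatches(x, regexpr("(?<=_)H\\d+K\\d+\\w+?(?=_)", x, perl = TRUE))
marks
```

Unlike str_extract(), which returns NA for non-matching elements, this version is only safe to assign back to a column when every row matches.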
If you really can’t use regex for some reason, here is an option using base R string functions:
x <- "encode3Ren_facial_prominence_H3K27me3_E10"
mark <- tail(unlist(strsplit(x, "_")), 2)[-2]
mark
[1] "H3K27me3"
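The snippet above handles a single string. To fill a new column for the whole data frame, the same split-and-pick idea can be applied to every row. The following is a rough sketch, assuming the mark is always the second-to-last underscore-separated field, as it is in the example data; the column name mark is just an illustrative choice.

```r
# Rebuild part of the example data from the question.
# stringsAsFactors = FALSE keeps the column as character in R < 4.0.
try <- data.frame(
  histones = c("encode3Ren_limb_H3K27me3_E10",
               "encode3Ren_facial_prominence_H3K27me3_E10",
               "encode3Ren_liver_H3K27me3_E12",
               "encode3Ren_neural_tube_H3K4me1_E12"),
  a = 1:4,
  stringsAsFactors = FALSE
)

# Split every name on the literal underscore (fixed = TRUE means the
# separator is not interpreted as a regular expression).
parts <- strsplit(try$histones, "_", fixed = TRUE)

# Keep the second-to-last piece of each name.
try$mark <- vapply(parts, function(p) p[length(p) - 1], character(1))
try$mark
```

If the column was read in as a factor, wrapping it in as.character() before strsplit() avoids a "non-character argument" error.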