Create a vector giving position for any change of levels

I am currently running code that generates a heatmap from a list of specific genes for different cell types. Each gene is assigned to a category (A, B, C, etc.). In my heatmap function (pheatmap package), I can add "breaks" by passing a vector of numbers specifying the rows where each break should be made.

However, I want the code to be flexible enough to work with a modified gene list/table. So I would like to create a vector specifying the "positions" where the category changes. Here is a dummy example:

df <- data.frame("Gene ID" = paste0("Gene", 1:10),
                 "Category" = c("A", "B", "B", "D", "D", "D", "D", "E", "E", "H"))
df

# which gives
#Gene.ID Category
#1    Gene1        A
#2    Gene2        B
#3    Gene3        B
#4    Gene4        D
#5    Gene5        D
#6    Gene6        D
#7    Gene7        D
#8    Gene8        E
#9    Gene9        E
#10  Gene10        H


My idea was to sort everything alphabetically (which is already done in my example) and extract the number of occurrences with the table() function:

table(factor(df$Category))
# which gives:
#A B D E H 
#1 2 4 2 1 

What I would like to do now is create a vector that adds each count to the sum of the previous ones, so that I have a vector indicating where the category changes. The output would be:

# "1", "3", "7", "9", "10"

This indicates that a break should occur after row 1, row 3, row 7, row 9 and "row 10" (which is the end of the heatmap). How can I achieve that?

Also, is there a better approach?

Thanks in advance

>Solution :

I think you need cumsum:

cumsum(table(df$Category))
#  A  B  D  E  H 
#  1  3  7  9 10 

This assumes the rows are already sorted by Category. table() reports its counts in sorted level order (A, B, etc., above), so the cumulative sums match row positions only when the raw data follow that same order.
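
If the rows might not be sorted, one possible alternative is to count runs of consecutive identical values in row order with rle() and take the cumulative sum of the run lengths. This is only a sketch: the helper name run_ends is hypothetical and is not part of the original answer.

```r
# Hypothetical helper: last row index of each run of identical values.
# as.character() is used because rle() does not accept factors.
run_ends <- function(x) {
  runs <- rle(as.character(x))
  ends <- cumsum(runs$lengths)
  names(ends) <- runs$values
  ends
}

# Same categories as the dummy example in the question
category <- c("A", "B", "B", "D", "D", "D", "D", "E", "E", "H")
run_ends(category)
#  A  B  D  E  H
#  1  3  7  9 10

# Unsorted input: "B" occurs in two separate runs, and each run gets its own break
run_ends(c("A", "B", "B", "D", "B"))
# A B D B
# 1 3 4 5
```

On sorted data this returns the same positions as cumsum(table(...)). With pheatmap, a vector like this (probably without its last element, since no gap is needed after the final row) looks like suitable input for the gaps_row argument when row clustering is turned off. That wiring is an assumption here and was not shown in the original post.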
