
Select a given percentage of the dataset or the closest value [R]

I would like to extract percentiles of the Size distribution in my dataset for different groups (in this example, let's say 10% and 50%). Within each group the sizes are sorted in decreasing order, and Cum is the cumulative percentage. My idea was to filter on the cumulative percentage and take the value at the edge: e.g. to get the 10% of biggest values, I keep the rows whose cumulative percentage is at most 10 and then take the minimal Size.
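Judging from the numbers, Cum is the cumulative percentage of rows within each SubStation after sorting Size from largest to smallest: the i-th of n rows gets 100 * i / n (e.g. 100 / 22 ≈ 4.545 for the first row of B). A minimal sketch of how such a column could be built with dplyr, assuming that definition and using M2's sizes in shuffled order as input:

```r
library(dplyr)

# Size values of SubStation "M2" from the question, deliberately unsorted
raw <- tibble(SubStation = "M2", Size = c(32, 36, 24, 34, 32, 34))

ranked <- raw %>%
  group_by(SubStation) %>%
  arrange(desc(Size), .by_group = TRUE) %>%  # largest sizes first
  mutate(Cum = 100 * row_number() / n()) %>% # cumulative % of rows in the group
  ungroup()

# Edge of the top 50%: smallest Size among the rows with Cum <= 50
edge50 <- ranked %>% filter(Cum <= 50) %>% pull(Size) %>% min()
edge50
```

Here the rows with Cum <= 50 are the sizes 36, 34 and 34, so the edge value is 34.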
However, when I try to filter on the cumulative percentage with this code: df %>% map(~filter(., Cum <= 10)) %>% map(~slice(., which.min(Size))), I get NA for two of the sites (M2 and M3). This is because neither of them has a cumulative percentage at or below 10: with only 6 rows, their first row already accounts for 16.7% of the data.
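A small reproduction of the failing case, rebuilding the M2 tibble from the question's values: filter(Cum <= 10) keeps no rows, which.min() on an empty vector returns integer(0), and slice() with that index therefore returns an empty tibble instead of a size.

```r
library(dplyr)

# The M2 tibble from the question: 6 rows, Size sorted largest first
m2 <- tibble(
  Size = c(36, 34, 34, 32, 32, 24),
  SubStation = "M2",
  Cum = 100 * (1:6) / 6  # 16.67, 33.33, ..., 100
)

top10 <- m2 %>% filter(Cum <= 10)
nrow(top10)                       # 0: the first row is already at 16.7%
which.min(top10$Size)             # integer(0)
top10 %>% slice(which.min(Size))  # so slice() also returns zero rows
```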

How can I select the row with the smallest cumulative percentage when there is no value at or below 10%?

df <- list(structure(list(Size = c(42, 40, 40, 37, 36, 36, 35, 35, 
35, 34, 34, 34, 33, 33, 33, 31, 30, 29, 29, 27, 26, 23), SubStation = c("B", 
"B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B"
), Cum = c(4.54545454545455, 9.09090909090909, 13.6363636363636, 
18.1818181818182, 22.7272727272727, 27.2727272727273, 31.8181818181818, 
36.3636363636364, 40.9090909090909, 45.4545454545455, 50, 54.5454545454545, 
59.0909090909091, 63.6363636363636, 68.1818181818182, 72.7272727272727, 
77.2727272727273, 81.8181818181818, 86.3636363636364, 90.9090909090909, 
95.4545454545455, 100)), row.names = c(NA, -22L), class = c("tbl_df", 
"tbl", "data.frame")), structure(list(Size = c(43, 42, 36, 36, 
35, 35, 34, 34, 34, 33, 31, 31, 30, 30, 28, 27, 27, 27, 25, 25, 
25, 25, 24, 23), SubStation = c("M1", "M1", 
"M1", "M1", "M1", "M1", 
"M1", "M1", "M1", "M1", 
"M1", "M1", "M1", "M1", 
"M1", "M1", "M1", "M1", 
"M1", "M1", "M1", "M1", 
"M1", "M1"), Cum = c(4.16666666666667, 8.33333333333333, 
12.5, 16.6666666666667, 20.8333333333333, 25, 29.1666666666667, 
33.3333333333333, 37.5, 41.6666666666667, 45.8333333333333, 50, 
54.1666666666667, 58.3333333333333, 62.5, 66.6666666666667, 70.8333333333333, 
75, 79.1666666666667, 83.3333333333333, 87.5, 91.6666666666667, 
95.8333333333333, 100)), row.names = c(NA, -24L), class = c("tbl_df", 
"tbl", "data.frame")), structure(list(Size = c(36, 34, 34, 32, 
32, 24), SubStation = c("M2", "M2", "M2", 
"M2", "M2", "M2"), Cum = c(16.6666666666667, 
33.3333333333333, 50, 66.6666666666667, 83.3333333333333, 100
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(Size = c(34, 33, 33, 28, 25, 24), SubStation = c("M3", 
"M3", "M3", "M3", "M3", 
"M3"), Cum = c(16.6666666666667, 33.3333333333333, 
50, 66.6666666666667, 83.3333333333333, 100)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame")))


Solution:

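Here is a solution using if/else inside filter(), which does the job: when a station has no cumulative percentage at or below the threshold, it keeps only that station's first row (the one with the smallest Cum). I computed both P10 and P50 and joined them into a single table (bind_rows() turns the list of tibbles back into one data frame).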

```r
library(dplyr)
library(purrr)
# Keep rows with Cum <= threshold; if there are none, keep the row with the smallest Cum
P10 <- df %>% map(~ filter(., if (any(Cum <= 10)) Cum <= 10 else row_number(Cum) <= 1)) %>%
  map(~ slice(., which.min(Size))) %>% bind_rows() %>% select(P10 = Size, SubStation)
P10P50 <- df %>% map(~ filter(., if (any(Cum <= 50)) Cum <= 50 else row_number(Cum) <= 1)) %>%
  map(~ slice(., which.min(Size))) %>% bind_rows() %>% select(P50 = Size, SubStation) %>%
  inner_join(P10, by = "SubStation") %>% relocate(SubStation, P10, P50)
```
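The same fallback could also be written as a small helper applied per group, which avoids running a separate pipeline per percentile and joining the results. This is a sketch under stated assumptions: edge_size() is a hypothetical helper (not part of dplyr), the stations are stacked into one data frame with bind_rows(), and Cum is recomputed as 100 * row_number() / n() from Size sorted largest first, which reproduces the question's Cum values. Only B and M2 are rebuilt here to keep it short.

```r
library(dplyr)

# Hypothetical helper: smallest Size among the rows whose cumulative
# percentage is <= p; if no row qualifies (e.g. p = 10 with only 6 rows),
# fall back to the row(s) with the smallest Cum.
edge_size <- function(size, cum, p) {
  keep <- cum <= p
  if (!any(keep)) keep <- cum == min(cum)
  min(size[keep])
}

# Two of the stations from the question, stacked into one data frame
stations <- bind_rows(
  tibble(SubStation = "B",
         Size = c(42, 40, 40, 37, 36, 36, 35, 35, 35, 34, 34,
                  34, 33, 33, 33, 31, 30, 29, 29, 27, 26, 23)),
  tibble(SubStation = "M2", Size = c(36, 34, 34, 32, 32, 24))
) %>%
  group_by(SubStation) %>%
  mutate(Cum = 100 * row_number() / n())  # Size is already sorted largest first

# One row per station, one column per percentile
result <- stations %>%
  summarise(P10 = edge_size(Size, Cum, 10),
            P50 = edge_size(Size, Cum, 50))
result
```

With these inputs, B gets P10 = 40 and P50 = 34, while M2, which has no row at or below 10%, falls back to its largest size (36) for P10 and gets P50 = 34. Adding another percentile is then one more line inside summarise().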