dplyr – arrange on missingness in two variables

I have been stuck on this problem for hours. I want to arrange some data so that the NAs appear first within a grouping structure. I can get part of the way there, but nothing I have tried gives the desired result.

With this code,

df <-  df |> 
  group_by(AESOC, AEPT) |> 
  arrange(!is.na(AEPT), !is.na(Severity), .by_group = TRUE)

I have been able to achieve what is shown in the image.

enter image description here

But I would still like to arrange further so that rows 9-12 appear before row 1 and rows 25-28 appear before row 13 (i.e., at the very beginning of the groups determined by AESOC and AEPT).

A small sample of the data is included here:

df <-  structure(list(AESOC = c("Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Blood and lymphatic system disorders", 
"Blood and lymphatic system disorders", "Cardiac disorders", 
"Cardiac disorders", "Cardiac disorders", "Cardiac disorders", 
"Cardiac disorders", "Cardiac disorders", "Cardiac disorders", 
"Cardiac disorders", "Cardiac disorders", "Cardiac disorders", 
"Cardiac disorders", "Cardiac disorders", "Cardiac disorders", 
"Cardiac disorders", "Cardiac disorders", "Cardiac disorders"
), AEPT = c("    Anaemia", "    Anaemia", "    Anaemia", "    Anaemia", 
"    Lymphopenia", "    Lymphopenia", "    Lymphopenia", "    Lymphopenia", 
NA, NA, NA, NA, "    Dizziness", "    Dizziness", "    Dizziness", 
"    Dizziness", "    Palpitations", "    Palpitations", "    Palpitations", 
"    Palpitations", "    Presyncope", "    Presyncope", "    Presyncope", 
"    Presyncope", NA, NA, NA, NA), Severity = c("        mild", 
"        moderate", "        severe", NA, "        mild", "        moderate", 
"        severe", NA, "    mild", "    moderate", "    severe", 
NA, "        mild", "        moderate", "        severe", NA, 
"        mild", "        moderate", "        severe", NA, "        moderate", 
"        mild", "        severe", NA, "    moderate", "    mild", 
"    severe", NA)), row.names = c(NA, -28L), class = c("tbl_df", 
"tbl", "data.frame"))

Any help would be greatly appreciated.

Solution:

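Grouping by AEPT with .by_group = TRUE sorts by the grouping columns first, and arrange always places NA last, so the rows with a missing AEPT end up at the bottom of each AESOC block. Drop the grouping and sort on !is.na(x) before each column instead. FALSE sorts before TRUE, so the missing values come first: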

library(dplyr)

df %>% arrange(AESOC, !is.na(AEPT), AEPT, !is.na(Severity), Severity)

which returns:

                              AESOC             AEPT         Severity
1  Blood and lymphatic system disorders             <NA>             <NA>
2  Blood and lymphatic system disorders             <NA>             mild
3  Blood and lymphatic system disorders             <NA>         moderate
4  Blood and lymphatic system disorders             <NA>           severe
5  Blood and lymphatic system disorders          Anaemia             <NA>
6  Blood and lymphatic system disorders          Anaemia             mild
7  Blood and lymphatic system disorders          Anaemia         moderate
8  Blood and lymphatic system disorders          Anaemia           severe
9  Blood and lymphatic system disorders      Lymphopenia             <NA>
10 Blood and lymphatic system disorders      Lymphopenia             mild
11 Blood and lymphatic system disorders      Lymphopenia         moderate
12 Blood and lymphatic system disorders      Lymphopenia           severe
13                    Cardiac disorders             <NA>             <NA>
14                    Cardiac disorders             <NA>             mild
15                    Cardiac disorders             <NA>         moderate
16                    Cardiac disorders             <NA>           severe
17                    Cardiac disorders        Dizziness             <NA>
18                    Cardiac disorders        Dizziness             mild
19                    Cardiac disorders        Dizziness         moderate
20                    Cardiac disorders        Dizziness           severe
21                    Cardiac disorders     Palpitations             <NA>
22                    Cardiac disorders     Palpitations             mild
23                    Cardiac disorders     Palpitations         moderate
24                    Cardiac disorders     Palpitations           severe
25                    Cardiac disorders       Presyncope             <NA>
26                    Cardiac disorders       Presyncope             mild
27                    Cardiac disorders       Presyncope         moderate
28                    Cardiac disorders       Presyncope           severe
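To see why sorting on !is.na(x) ahead of x puts the missing values first, here is a minimal base R sketch on a toy vector. The vector and variable names are made up for illustration and are not part of the question's data.

```r
# Toy vector (hypothetical, not taken from the question's data)
x <- c("mild", NA, "severe", "moderate")

# !is.na(x) is FALSE for the missing entry and TRUE for the rest
key <- !is.na(x)

# order() sorts FALSE before TRUE, so the NA position comes first;
# ties among the non-missing values are then broken by x itself
idx <- order(key, x)
sorted <- x[idx]
print(sorted)
```

The same logical-key idea applies column by column in the arrange call, which is why each column is preceded by its own !is.na() term.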

The same in base R:

df[with(df, order(AESOC, !is.na(AEPT), AEPT, !is.na(Severity), Severity)), ]
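As a possible shortcut in base R, order() has an na.last argument, and setting it to FALSE places missing values first. The sketch below assumes this applies to every sort key, and checks that assumption against the explicit !is.na() version on a small made-up data frame (toy, grp and val are hypothetical names).

```r
# Hypothetical toy data: two groups, each with one missing value
toy <- data.frame(
  grp = c("a", "a", "a", "b", "b", "b"),
  val = c("y", NA, "x", NA, "z", "w"),
  stringsAsFactors = FALSE
)

# Explicit version: logical key first, then the column itself
explicit <- toy[order(toy$grp, !is.na(toy$val), toy$val), ]

# Shortcut: let order() put NAs first in each key
na_first <- toy[order(toy$grp, toy$val, na.last = FALSE), ]

# Both approaches should give the same row order on this data
print(identical(explicit, na_first))
print(na_first)
```

The explicit form is still the more flexible one, since it lets you put NAs first in some columns and last in others.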