I have a dataset that I want to rearrange so that it is more consistent and so that means and frequencies are easier to calculate.
Take the following example: a dataset containing the most recent shopping expenditures of different models:
| Observation | Model | Date | Clothing | Price in $ | Store |
|---|---|---|---|---|---|
| # 1 | Amy | 14 / 01 | Top | 60 | X |
| # 2 | Amy | 17 / 03 | SKIRT | 35 | X |
| # 3 | Amy | 05 / 05 | Skirt | 40 | X |
| # 4 | Amy | 05 / 05 | Blouse | 70 | P |
| # 5 | Claudia | 17 / 02 | BLOUSE | 40 | B |
| # 6 | Claudia | 17 / 02 | Jeans | 90 | L |
| # 7 | Claudia | 21 / 04 | Jacket | 120 | L |
| # 8 | Claudia | 22 / 04 | TOP | 30 | X |
| # 9 | Estella | 05 / 05 | NA | 95 | L |
| # 10 | Estella | 07 / 06 | Skirt | 40 | X |
| # 11 | Estella | 08 / 07 | Dress | 150 | H |
| # 12 | Estella | 04 / 08 | Hat | 15 | X |
As you can see, some clothing items are the same but are written differently (this is on purpose). I want to rearrange this dataset so that the models stay in exactly the same order, but the clothing always starts in alphabetical order with missing values at the end (blouse, dress, hat, jacket, jeans, skirt, NA), regardless of how the word is capitalized.
I don't know which functions to use for this, so I can't provide any code.
>Solution :
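You can sort only the Clothing column and assign the result back to df$Clothing. Note that this reorders the Clothing values on their own: the other columns keep their original order, so each item is no longer paired with its own date, price and store (in the output below, for example, Blouse now carries the $60 price of Amy's top).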
```r
df$Clothing <- sort(df$Clothing, na.last = TRUE)
```

```
   Observation   Model    Date Clothing Price in $ Store
1          # 1     Amy 14 / 01   Blouse         60     X
2          # 2     Amy 17 / 03   BLOUSE         35     X
3          # 3     Amy 05 / 05    Dress         40     X
4          # 4     Amy 05 / 05      Hat         70     P
5          # 5 Claudia 17 / 02   Jacket         40     B
6          # 6 Claudia 17 / 02    Jeans         90     L
7          # 7 Claudia 21 / 04    Skirt        120     L
8          # 8 Claudia 22 / 04    Skirt         30     X
9          # 9 Estella 05 / 05    SKIRT         95     L
10        # 10 Estella 07 / 06      Top         40     X
11        # 11 Estella 08 / 07      TOP        150     H
12        # 12 Estella 04 / 08     <NA>         15     X
```
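Because `sort()` moves only the Clothing values, here is a minimal base-R sketch that reorders whole rows instead, so each item stays with its own date, price and store. It assumes the data frame is called `df`, rebuilds it by hand from the question's table, and renames `Price in $` to `Price` for convenience; `model_rank` and `sorted` are illustrative names. Sorting on `tolower(Clothing)` ignores capitalization, and `match(Model, unique(Model))` keeps the models in the order they first appear rather than alphabetical order.

```r
# Example data from the question; "Price in $" is renamed to Price here
# (an assumption for convenience), and the NA in Clothing is a real missing value
df <- data.frame(
  Observation = paste("#", 1:12),
  Model = rep(c("Amy", "Claudia", "Estella"), each = 4),
  Date = c("14 / 01", "17 / 03", "05 / 05", "05 / 05",
           "17 / 02", "17 / 02", "21 / 04", "22 / 04",
           "05 / 05", "07 / 06", "08 / 07", "04 / 08"),
  Clothing = c("Top", "SKIRT", "Skirt", "Blouse",
               "BLOUSE", "Jeans", "Jacket", "TOP",
               NA, "Skirt", "Dress", "Hat"),
  Price = c(60, 35, 40, 70, 40, 90, 120, 30, 95, 40, 150, 15),
  Store = c("X", "X", "X", "P", "B", "L", "L", "X", "L", "X", "H", "X"),
  stringsAsFactors = FALSE
)

# Rank each model by its first appearance, so models keep their original order
model_rank <- match(df$Model, unique(df$Model))

# Reorder whole rows: by model first, then by Clothing ignoring capitalization.
# na.last = TRUE puts a missing Clothing value at the end of its model's block;
# ties (e.g. "SKIRT" and "Skirt") keep their original relative order.
sorted <- df[order(model_rank, tolower(df$Clothing), na.last = TRUE), ]
rownames(sorted) <- NULL
print(sorted)
```

For this data the rows come out as Amy's Blouse, SKIRT, Skirt, Top, then Claudia's BLOUSE, Jacket, Jeans, TOP, then Estella's Dress, Hat, Skirt and the missing value, each still with its original price and store.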
UPDATE: It seems the OP wants to sort Clothing within each Model. Here is the code for that:
```r
library(dplyr)

# Sort rows by Clothing within each Model: .by_group = TRUE makes arrange()
# sort by the grouping variable (Model) first; missing values go last
df %>% group_by(Model) %>% arrange(Clothing, .by_group = TRUE)
```

```
# A tibble: 12 × 6
# Groups:   Model [3]
   Observation Model   Date    Clothing `Price in $` Store
   <chr>       <chr>   <chr>   <chr>           <int> <chr>
 1 # 4         Amy     05 / 05 Blouse             70 P
 2 # 3         Amy     05 / 05 Skirt              40 X
 3 # 2         Amy     17 / 03 SKIRT              35 X
 4 # 1         Amy     14 / 01 Top                60 X
 5 # 5         Claudia 17 / 02 BLOUSE             40 B
 6 # 7         Claudia 21 / 04 Jacket            120 L
 7 # 6         Claudia 17 / 02 Jeans              90 L
 8 # 8         Claudia 22 / 04 TOP                30 X
 9 # 11        Estella 08 / 07 Dress             150 H
10 # 12        Estella 04 / 08 Hat                15 X
11 # 10        Estella 07 / 06 Skirt              40 X
12 # 9         Estella 05 / 05 NA                 95 L
```
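One caveat with `arrange(Clothing)`: since dplyr 1.1.0, `arrange()` sorts character columns in the C locale by default, where every uppercase letter sorts before every lowercase one, so the result depends on capitalization (a hypothetical "JEANS" would come before "Jacket"). The sketch below assumes the data frame is rebuilt by hand from the question's table, with `Price in $` renamed to `Price`; `sorted` and `summary_by_item` are illustrative names. It sorts on `tolower(Clothing)` and uses a factor so the models keep their original order (with `group_by(Model)` and `.by_group = TRUE` they are sorted alphabetically, which only happens to match here). The same lowercasing also collapses spelling variants, which is what lets the frequencies and mean prices mentioned in the question be computed per item.

```r
library(dplyr)

# Example data from the question ("Price in $" renamed to Price for convenience)
df <- tibble(
  Observation = paste("#", 1:12),
  Model = rep(c("Amy", "Claudia", "Estella"), each = 4),
  Date = c("14 / 01", "17 / 03", "05 / 05", "05 / 05",
           "17 / 02", "17 / 02", "21 / 04", "22 / 04",
           "05 / 05", "07 / 06", "08 / 07", "04 / 08"),
  Clothing = c("Top", "SKIRT", "Skirt", "Blouse",
               "BLOUSE", "Jeans", "Jacket", "TOP",
               NA, "Skirt", "Dress", "Hat"),
  Price = c(60, 35, 40, 70, 40, 90, 120, 30, 95, 40, 150, 15),
  Store = c("X", "X", "X", "P", "B", "L", "L", "X", "L", "X", "H", "X")
)

# Sort case-insensitively within each model. factor(..., levels = unique(Model))
# keeps models in order of first appearance instead of alphabetical order;
# arrange() always puts NA at the end.
sorted <- df %>%
  arrange(factor(Model, levels = unique(Model)), tolower(Clothing))

# Collapse spelling variants ("SKIRT", "Skirt") into one lowercase label before
# computing the frequency and mean price of each clothing item
summary_by_item <- df %>%
  mutate(Clothing = tolower(Clothing)) %>%
  group_by(Clothing) %>%
  summarise(n = n(), mean_price = mean(Price), .groups = "drop")

print(sorted)
print(summary_by_item)
```

With the labels normalized, the three skirt rows are counted together (n = 3) and the two blouse rows share one mean price.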