I’m having some issues with pivot_wider() from tidyverse. I’ve looked at a few examples but can’t find anything that helps me with my issue. I keep getting NAs in my columns and was hoping to just have the data grouped.
The data is a group of grades from shoppers on various qualities of stores. It looks like this:
set.seed(4)
df <- tibble(
store = c("Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2", "Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2"),
location = c("West", "West", "West", "West", "West", "West", "East", "East", "East", "East", "East", "East", "West", "West", "West", "West", "West", "West", "East", "East", "East", "East", "East", "East"),
shopper = c("Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue"),
quality_group = c("Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection", "Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection","Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection", "Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection"),
within_group_quality = c("Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection", "Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection", "Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection", "Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection"),
grade = runif(n = 24, min = 35, max = 65),
overall_store_score = c(4,4,4,4,4,4,3,3,3,3,3,3,3,3,3,3,3,3,5,5,5,5,5,5)
)
df
# A tibble: 24 x 7
store location shopper quality_group within_group_quality grade overall_store_score
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 Store 1 West Bob Cleanliness Floors 52.6 4
2 Store 1 West Bob Cleanliness Produce Area 35.3 4
3 Store 1 West Bob Cleanliness Bathroom 43.8 4
4 Store 1 West Bob Selection Shelves Well Stocked 43.3 4
5 Store 1 West Bob Selection Lots of Dairy Options 59.4 4
6 Store 1 West Bob Selection Deli has a high selection 42.8 4
7 Store 2 East Bob Cleanliness Floors 56.7 3
8 Store 2 East Bob Cleanliness Produce Area 62.2 3
9 Store 2 East Bob Cleanliness Bathroom 63.5 3
10 Store 2 East Bob Selection Shelves Well Stocked 37.2 3
# ... with 14 more rows
I tried this, which obviously doesn’t work because it produces NA’s in the columns. I’d really like the values to be moved up so that I can compare grades across the stores for all of the shoppers.
df %>%
pivot_wider(names_from = store,
values_from = grade) %>%
as.data.frame()
# A tibble: 24 x 7
location shopper quality_group within_group_quality overall_store_score `Store 1` `Store 2`
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 West Bob Cleanliness Floors 4 52.6 NA
2 West Bob Cleanliness Produce Area 4 35.3 NA
3 West Bob Cleanliness Bathroom 4 43.8 NA
4 West Bob Selection Shelves Well Stocked 4 43.3 NA
5 West Bob Selection Lots of Dairy Options 4 59.4 NA
6 West Bob Selection Deli has a high selection 4 42.8 NA
7 East Bob Cleanliness Floors 3 NA 56.7
8 East Bob Cleanliness Produce Area 3 NA 62.2
9 East Bob Cleanliness Bathroom 3 NA 63.5
10 East Bob Selection Shelves Well Stocked 3 NA 37.2
# ... with 14 more rows
I’ve tried grouping by variables, doing a pivot_longer() before the pivot_wider() but just can’t seem to get it to work. Is there a useful work around for this?
>Solution :
I got a solution for you. Just group the data by location, shopper, quality_group, within_group_quality, and overall_store_score, and then calculate the mean grade. After that, use pivot_wider() to get the output you want. Here’s the code:
library(tidyverse)
df %>%
group_by(location, shopper, quality_group, within_group_quality, overall_store_score) %>%
summarize(mean_grade = mean(grade), .groups = "drop") %>%
pivot_wider(names_from = location, values_from = mean_grade) %>%
as.data.frame()
This should get you the result without any NA values, and you’ll be able to compare the grades across stores. Good luck!