pivot_wider — get rid of NA and move data up

April 8, 2023

I’m having some issues with pivot_wider() from tidyverse. I’ve looked at a few examples but can’t find anything that helps me with my issue. I keep getting NAs in my columns and was hoping to just have the data grouped.

The data is a group of grades from shoppers on various qualities of stores. It looks like this:

set.seed(4)
df <- tibble(
  store = c("Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2", "Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2"),
  location = c("West", "West", "West", "West", "West", "West", "East", "East", "East", "East", "East", "East", "West", "West", "West", "West", "West", "West", "East", "East", "East", "East", "East", "East"),
  shopper = c("Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue"),
  quality_group = c("Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection", "Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection","Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection", "Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection"),
  within_group_quality = c("Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection", "Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection", "Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection", "Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection"),
  grade = runif(n = 24, min = 35, max = 65),
  overall_store_score = c(4,4,4,4,4,4,3,3,3,3,3,3,3,3,3,3,3,3,5,5,5,5,5,5)
)

df

# A tibble: 24 x 7
   store   location shopper quality_group within_group_quality      grade overall_store_score
   <chr>   <chr>    <chr>   <chr>         <chr>                     <dbl>               <dbl>
 1 Store 1 West     Bob     Cleanliness   Floors                     52.6                   4
 2 Store 1 West     Bob     Cleanliness   Produce Area               35.3                   4
 3 Store 1 West     Bob     Cleanliness   Bathroom                   43.8                   4
 4 Store 1 West     Bob     Selection     Shelves Well Stocked       43.3                   4
 5 Store 1 West     Bob     Selection     Lots of Dairy Options      59.4                   4
 6 Store 1 West     Bob     Selection     Deli has a high selection  42.8                   4
 7 Store 2 East     Bob     Cleanliness   Floors                     56.7                   3
 8 Store 2 East     Bob     Cleanliness   Produce Area               62.2                   3
 9 Store 2 East     Bob     Cleanliness   Bathroom                   63.5                   3
10 Store 2 East     Bob     Selection     Shelves Well Stocked       37.2                   3
# ... with 14 more rows

I tried this, which obviously doesn’t work because it produces NA’s in the columns. I’d really like the values to be moved up so that I can compare grades across the stores for all of the shoppers.

df %>%
  pivot_wider(names_from = store,
              values_from = grade) %>%
  as.data.frame()

# A tibble: 24 x 7
   location shopper quality_group within_group_quality      overall_store_score `Store 1` `Store 2`
   <chr>    <chr>   <chr>         <chr>                                   <dbl>     <dbl>     <dbl>
 1 West     Bob     Cleanliness   Floors                                      4      52.6      NA  
 2 West     Bob     Cleanliness   Produce Area                                4      35.3      NA  
 3 West     Bob     Cleanliness   Bathroom                                    4      43.8      NA  
 4 West     Bob     Selection     Shelves Well Stocked                        4      43.3      NA  
 5 West     Bob     Selection     Lots of Dairy Options                       4      59.4      NA  
 6 West     Bob     Selection     Deli has a high selection                   4      42.8      NA  
 7 East     Bob     Cleanliness   Floors                                      3      NA        56.7
 8 East     Bob     Cleanliness   Produce Area                                3      NA        62.2
 9 East     Bob     Cleanliness   Bathroom                                    3      NA        63.5
10 East     Bob     Selection     Shelves Well Stocked                        3      NA        37.2
# ... with 14 more rows

I’ve tried grouping by variables, doing a pivot_longer() before the pivot_wider() but just can’t seem to get it to work. Is there a useful work around for this?

>Solution :

I got a solution for you. Just group the data by location, shopper, quality_group, within_group_quality, and overall_store_score, and then calculate the mean grade. After that, use pivot_wider() to get the output you want. Here’s the code:

library(tidyverse)

df %>%
  group_by(location, shopper, quality_group, within_group_quality, overall_store_score) %>%
  summarize(mean_grade = mean(grade), .groups = "drop") %>%
  pivot_wider(names_from = location, values_from = mean_grade) %>%
  as.data.frame()

This should get you the result without any NA values, and you’ll be able to compare the grades across stores. Good luck!