Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pivot_wider — get rid of NA and move data up

I’m having some issues with pivot_wider() from tidyverse. I’ve looked at a few examples but can’t find anything that helps me with my issue. I keep getting NAs in my columns and was hoping to just have the data grouped.

The data is a group of grades from shoppers on various qualities of stores. It looks like this:

set.seed(4)
df <- tibble(
  store = c("Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2", "Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 1", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2", "Store 2"),
  location = c("West", "West", "West", "West", "West", "West", "East", "East", "East", "East", "East", "East", "West", "West", "West", "West", "West", "West", "East", "East", "East", "East", "East", "East"),
  shopper = c("Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Bob", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue", "Sue"),
  quality_group = c("Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection", "Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection","Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection", "Cleanliness", "Cleanliness", "Cleanliness", "Selection", "Selection", "Selection"),
  within_group_quality = c("Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection", "Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection", "Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection", "Floors", "Produce Area", "Bathroom", "Shelves Well Stocked", "Lots of Dairy Options", "Deli has a high selection"),
  grade = runif(n = 24, min = 35, max = 65),
  overall_store_score = c(4,4,4,4,4,4,3,3,3,3,3,3,3,3,3,3,3,3,5,5,5,5,5,5)
)

df

# A tibble: 24 x 7
   store   location shopper quality_group within_group_quality      grade overall_store_score
   <chr>   <chr>    <chr>   <chr>         <chr>                     <dbl>               <dbl>
 1 Store 1 West     Bob     Cleanliness   Floors                     52.6                   4
 2 Store 1 West     Bob     Cleanliness   Produce Area               35.3                   4
 3 Store 1 West     Bob     Cleanliness   Bathroom                   43.8                   4
 4 Store 1 West     Bob     Selection     Shelves Well Stocked       43.3                   4
 5 Store 1 West     Bob     Selection     Lots of Dairy Options      59.4                   4
 6 Store 1 West     Bob     Selection     Deli has a high selection  42.8                   4
 7 Store 2 East     Bob     Cleanliness   Floors                     56.7                   3
 8 Store 2 East     Bob     Cleanliness   Produce Area               62.2                   3
 9 Store 2 East     Bob     Cleanliness   Bathroom                   63.5                   3
10 Store 2 East     Bob     Selection     Shelves Well Stocked       37.2                   3
# ... with 14 more rows

I tried this, which obviously doesn’t work because it produces NA’s in the columns. I’d really like the values to be moved up so that I can compare grades across the stores for all of the shoppers.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df %>%
  pivot_wider(names_from = store,
              values_from = grade) %>%
  as.data.frame()

# A tibble: 24 x 7
   location shopper quality_group within_group_quality      overall_store_score `Store 1` `Store 2`
   <chr>    <chr>   <chr>         <chr>                                   <dbl>     <dbl>     <dbl>
 1 West     Bob     Cleanliness   Floors                                      4      52.6      NA  
 2 West     Bob     Cleanliness   Produce Area                                4      35.3      NA  
 3 West     Bob     Cleanliness   Bathroom                                    4      43.8      NA  
 4 West     Bob     Selection     Shelves Well Stocked                        4      43.3      NA  
 5 West     Bob     Selection     Lots of Dairy Options                       4      59.4      NA  
 6 West     Bob     Selection     Deli has a high selection                   4      42.8      NA  
 7 East     Bob     Cleanliness   Floors                                      3      NA        56.7
 8 East     Bob     Cleanliness   Produce Area                                3      NA        62.2
 9 East     Bob     Cleanliness   Bathroom                                    3      NA        63.5
10 East     Bob     Selection     Shelves Well Stocked                        3      NA        37.2
# ... with 14 more rows

I’ve tried grouping by variables, doing a pivot_longer() before the pivot_wider() but just can’t seem to get it to work. Is there a useful work around for this?

>Solution :

I got a solution for you. Just group the data by location, shopper, quality_group, within_group_quality, and overall_store_score, and then calculate the mean grade. After that, use pivot_wider() to get the output you want. Here’s the code:

library(tidyverse)

df %>%
  group_by(location, shopper, quality_group, within_group_quality, overall_store_score) %>%
  summarize(mean_grade = mean(grade), .groups = "drop") %>%
  pivot_wider(names_from = location, values_from = mean_grade) %>%
  as.data.frame()

This should get you the result without any NA values, and you’ll be able to compare the grades across stores. Good luck!

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading