Unable to filter data frame by column value

March 18, 2022

I created a data frame of reviews scraped from a website. The three columns are Date, Rating, and Text. I want to see only the 1-star and 5-star reviews. I have tried each of the following and get roughly the same error every time:

df %>% filter(Rating = '1 star', Rating = '5 star')

df$Rating

df[df$rating == '1 star',]

None have worked. Here’s the full code. The bit with the df is at the very bottom:

library(rvest)
library(tidyverse)

# Create url object ---------------------------------
url = "https://www.yelp.com/biz/24th-st-pizzeria-san-antonio?osq=Worst+Restaurant"

# Convert url to html object ------------------------
page <- read_html(url)

# Number of pages -----------------------------------
pageNums = page %>%
  html_elements(xpath = "//div[@class=' border-color--default__09f24__NPAKY text-align--center__09f24__fYBGO']") %>%
  html_text() %>%
  str_extract('of.*') %>% 
  str_remove('of ') %>% 
  as.numeric() 

# Create page sequence ------------------------------
pageSequence <- seq(from=0, to=(pageNums * 10)-10, by = 10)

# Create empty vectors to store data ----------------
review_date_all = c()
review_rating_all = c()
review_text_all = c()

# Create for loop -----------------------------------
for (i in pageSequence){
  if (i==0){
    page <- read_html(url) 
  } else {
    page <- read_html(paste0(url, '&start=', i))
  }
  
  # Review date ----
  review_dates <- page %>%
    html_elements(xpath = "//*[@class=' css-chan6m']") %>%
    html_text() %>%
    .[str_detect(., "^\\d+[/]\\d+[/]\\d{4}$")]
  
  # Review Rating ----
  review_ratings <- page %>%
    html_elements(xpath = "//div[starts-with(@class, ' review')]") %>%
    html_elements(xpath = ".//div[contains(@aria-label, 'rating')]") %>%
    html_attr('aria-label') %>%
    str_remove('rating')
  
  # Review text ----
  review_text = page %>%
    html_elements(xpath = "//p[starts-with(@class, 'comment')]") %>%
    html_text()
  
  # For each page, append these to appropriate vectors----
  review_date_all = append(review_date_all, review_dates)
  review_rating_all = append(review_rating_all, review_ratings)
  review_text_all = append(review_text_all, review_text)
}

# Create data frame ---------------------------------
df <- data.frame('Date' = review_date_all,
                 'Rating' = review_rating_all,
                 'Text'= review_text_all)
View(df)

What am I overlooking?

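Solution:

The problem is in the Rating values of your df: every rating has an extra space at the end (for example '1 star ' rather than '1 star'), so a comparison against '1 star' never matches. Note also that filter() needs == for comparison rather than =, and that conditions separated by a comma are combined with AND, so use | to keep rows matching either value.

So you need to include the trailing space in the comparison, like this: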

df1 <- df %>%
  filter(Rating == '1 star ' | Rating == '5 star ')
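One way to confirm this kind of problem is to look at the raw values directly. The sketch below uses base R only and a small made-up vector; `ratings` is a hypothetical stand-in for `df$Rating`, not the real scraped data.

```r
# Made-up sample standing in for df$Rating (an assumption for illustration).
ratings <- c("1 star ", "5 star ", "3 star ")

# print() quotes character values, so a trailing space becomes visible.
print(unique(ratings))

# Each value is 7 characters long, one more than "1 star" (6 characters).
lengths_found <- nchar(ratings)

# Flag every value that ends in a whitespace character.
has_trailing_space <- grepl("\\s$", ratings)
```

If `has_trailing_space` is TRUE for the values you expected to match, whitespace is the likely culprit.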

Alternatively, remove the trailing whitespace with the stringr library first, then filter on the clean values:

library(stringr)
df1 <- df %>%
  mutate(Rating = str_squish(Rating)) %>%
  filter(Rating == '1 star' | Rating == '5 star')
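The trailing space most likely comes from the scraping step, where `str_remove('rating')` deletes the word but not the space in front of it. The sketch below reproduces that in base R and filters with `%in%` after trimming. It assumes the aria-label values look like "1 star rating"; the `labels` vector is made up for illustration.

```r
# Assumed aria-label format; these values are hypothetical.
labels <- c("1 star rating", "3 star rating", "5 star rating")

# Removing only the word "rating" leaves the space before it behind.
raw <- sub("rating", "", labels, fixed = TRUE)

# trimws() strips leading and trailing whitespace.
clean <- trimws(raw)

# %in% tests membership in a set, which avoids chaining == with |.
keep <- clean %in% c("1 star", "5 star")
extremes <- clean[keep]
```

Trimming once, right after scraping, means every later comparison can use the plain values.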

dataframe

byMR
