Encountered "UseMethod("xml_find_all")" when using "html_nodes" for a list

I got this error when I tried to use "html_nodes" on a list (profile_data_list).

library(tidyverse)
library(rvest)
list.mst <- c("0100111338", "0100105077", "0100110528", "0107464283", "0105342089")
url <- 'https://infodoanhnghiep.com/tim-kiem/ma-so-thue/'
link <- paste0(url, list.mst,'/')
profile_data_list <- lapply(link, function(x){search.result <- read_html(x)})
list <- profile_data_list %>% html_nodes(".company-name a") %>% html_attr('href') %>% unique()
com.page = paste0("https:",profile_data_list)

Error in UseMethod("xml_find_all") : no applicable method for 'xml_find_all' applied to an object of class "character"

I tried a for loop, but then I only get the result for the last value in the sequence. For example, with a for loop I only get the result for "0105342089". So I used lapply to call read_html on every element of list.mst, but now I am struggling with html_nodes. I also tried the following, which still failed:

list <- purrr::map(profile_data_list, ~ .x %>% html_nodes(".company-name a") %>% html_attr('href') %>% unique())

and

list <- lapply(profile_data_list, function(x) x %>% html_nodes(".company-name a") %>% html_attr('href') %>% unique())

I would really appreciate any suggestions. Thanks all!
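
The loop itself is not shown in the question, so the following is only a guess at the cause: a for loop that assigns to the same variable on every pass keeps just the final value. The sketch below contrasts that with storing each result in its own list slot. `fetch_page` is a hypothetical stand-in for `read_html()` so the example runs without network access.

```r
ids <- c("0100111338", "0100105077", "0105342089")

# Hypothetical stand-in for read_html(): returns a label instead of a parsed page
fetch_page <- function(id) paste0("page-for-", id)

# Overwriting: `result` is reassigned on every pass, so only the last id survives
for (id in ids) {
  result <- fetch_page(id)
}

# Accumulating: pre-allocate a list and fill one slot per id
results <- vector("list", length(ids))
names(results) <- ids
for (i in seq_along(ids)) {
  results[[i]] <- fetch_page(ids[i])
}
```

After the first loop `result` holds a single value, while `results` holds one entry per tax code. `lapply()` and `purrr::map()` do the same accumulation without the bookkeeping.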

Solution:

library(tidyverse)
library(rvest)

link <- c("0100111338", "0100105077", "0100110528", "0107464283", "0105342089") %>% 
  str_c("https://infodoanhnghiep.com/tim-kiem/ma-so-thue/", ., "/")

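# Scrape one search-result page and return one row per company listed on it.
# The argument is named page_url rather than link: inside mutate(), a bare
# `link` would resolve to the `link` column of the tibble, not to the argument.
scraper <- function(page_url) {
  cat("Scraping", page_url, "\n")
  page_url %>%
    read_html() %>%
    html_elements(".company-item") %>%
    map_dfr(~ tibble(
      # hrefs on the page are protocol-relative, so prepend the scheme
      link = .x %>%
        html_element(".company-name a") %>%
        html_attr("href") %>%
        str_c("https:", .),
      title = .x %>%
        html_element(".company-name") %>%
        html_text2(),
      city = .x %>%
        html_element(".description.hidden-xs") %>%
        html_text2()
    )) %>%
    # record which search URL each row came from
    mutate(source = page_url)
}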

map_dfr(link, scraper)

# A tibble: 26 × 4
   link                                                        title city  source
   <chr>                                                       <chr> <chr> <chr> 
 1 https://infodoanhnghiep.com/thong-tin/Cong-Ty-Co-Phan-My-T… "C\u… "H\u… https…
 2 https://infodoanhnghiep.com/thong-tin/Cong-ty-TNHH-hoi-cho… "C\u… "H\u… https…
 3 https://infodoanhnghiep.com/thong-tin/Chi-Nhanh-Cty-My-Thu… "Chi… "TP … https…
 4 https://infodoanhnghiep.com/thong-tin/Chi-nhanh-cong-ty-my… "Chi… "H\u… https…
 5 https://infodoanhnghiep.com/thong-tin/Chi-nhanh-cong-ty-my… "Chi… "Th\… https…
 6 https://infodoanhnghiep.com/thong-tin/Cong-Ty-Co-Phan-Xay-… "C\u… "H\u… https…
 7 https://infodoanhnghiep.com/thong-tin/Chi-Nhanh-Cong-Ty-Co… "Chi… "H\u… https…
 8 https://infodoanhnghiep.com/thong-tin/Chi-nhanh-cong-ty-co… "Chi… "H\u… https…
 9 https://infodoanhnghiep.com/thong-tin/CHI-NHANH-CONG-TY-CO… "CHI… "H\u… https…
10 https://infodoanhnghiep.com/thong-tin/CHI-NHANH-CONG-TY-CO… "CHI… "H\u… https…
# … with 16 more rows
# ℹ Use `print(n = ...)` to see more rows
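
For readers who only need the hrefs, as in the question, here is a minimal offline sketch of the same idea: apply the selector to each parsed page separately instead of to the list as a whole. `html_nodes()` ends up calling `xml_find_all()`, which has methods for parsed documents and nodes but not for lists or character vectors, which is what the `UseMethod` error reports. The two in-memory pages and their markup are assumptions modelled on the selectors above, not the real site.

```r
library(rvest)

# Two tiny in-memory pages standing in for the downloaded search results
# (assumed markup; the real pages are more complex)
pages <- list(
  minimal_html('<div class="company-item"><h3 class="company-name"><a href="//example.com/a">A</a></h3></div>'),
  minimal_html('<div class="company-item"><h3 class="company-name"><a href="//example.com/b">B</a></h3></div>')
)

# Run the selector on one page at a time; each call returns a character vector
hrefs <- lapply(pages, function(page) {
  page %>%
    html_elements(".company-name a") %>%
    html_attr("href") %>%
    unique()
})

# Paste the scheme onto the extracted hrefs, not onto the parsed pages
links <- paste0("https:", unlist(hrefs))
```

Note that the last line of the question pastes "https:" onto `profile_data_list` (the parsed pages) rather than onto the extracted hrefs, which may be where a character object reached `html_nodes` in a later step.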