How do I count icons using rvest?

I want to count the number of overall stars for each player on this page: https://cbgm.news/stats/CONN_Ratings.html

Here's my rvest code:

    library(tidyverse)
    library(rvest)

    url <- "https://cbgm.news/stats/CONN_Ratings.html"
    scrape <- url %>%
      read_html() %>%
      html_nodes("td:nth-child(19)")
    scrape

This returns:

    {xml_nodeset (14)}
    [1] <td>\n<i class="star yellow icon"></i><i class="star yellow ic …
    [2] <td>\n<i class="star yellow icon"></i><i class="star yellow ic …

…
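One common approach is to count the `<i>` icon elements inside each cell. A minimal sketch, using an inline stand-in for the page (the class `star yellow icon` is taken from the question's output; the real table layout may differ):

```r
library(rvest)

# Stand-in for the page: each <td> holds one <i class="star yellow icon"> per star
html <- minimal_html('
  <table>
    <tr><td><i class="star yellow icon"></i><i class="star yellow icon"></i><i class="star yellow icon"></i></td></tr>
    <tr><td><i class="star yellow icon"></i></td></tr>
  </table>')

# For each cell, count how many star icons it contains
cells <- html %>% html_elements("td")
star_counts <- sapply(cells, function(cell) length(html_elements(cell, "i.star")))
star_counts
#> [1] 3 1
```

On the real page the same idea applies to the nodeset returned by `html_nodes("td:nth-child(19)")`.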

R loop not looping through days

I have this code that should iterate through each object in "days". However, when I run the loop it only returns the dates found on the last day.

    days = seq(as.Date("2004-09-21"), as.Date("2004-09-25"), by = 1)
    for (i in days){
      link = paste0("https://alrai.com/search?date-from=", days[i])
      readlink <- read_html(link)
      link_maxpagenumbers_full <- readlink %>%
        html_elements(".roboto-b") %>%
        html_text2()
      link_maxpagenumbers_cut <- str_extract_all(link_maxpagenumbers_full, '\\d{1,3}')
      readlink…
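Two things are likely at work here: `for (i in days)` iterates over the *numeric* representation of the dates (so `days[i]` indexes with values like 12682 and returns `NA`), and the loop body overwrites its variables on every pass, so only the last iteration survives. A sketch of the usual pattern, iterating by index and collecting results in a list (the alrai.com scraping itself is left out, so only the URL construction is shown):

```r
days <- seq(as.Date("2004-09-21"), as.Date("2004-09-25"), by = 1)

# Pre-allocate a list and iterate by position, not by (coerced) value
links <- vector("list", length(days))
for (i in seq_along(days)) {
  # days[i] keeps its Date class here, so paste0() formats it as "YYYY-MM-DD"
  links[[i]] <- paste0("https://alrai.com/search?date-from=", days[i])
}
unlist(links)
#> [1] "https://alrai.com/search?date-from=2004-09-21" ... (5 links)
```

Each iteration's scrape result would be stored the same way, e.g. `results[[i]] <- link_maxpagenumbers_cut`, instead of being overwritten.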

How can I determine the delimiter being used in an infobox-data table on Wikipedia using R?

I am trying to scrape the infobox data for an Indonesian film from Wikipedia using R. In the infobox, there are several fields that contain multiple lines of data. For example, the "Pemeran" (or "Cast") field for the film "Kutunggu di Sudut Semanggi" (https://id.m.wikipedia.org/wiki/Kutunggu_di_Sudut_Semanggi) looks like this in the HTML:

    <tr>
      <th scope="row" class="infobox-label" style="white-space:nowrap;padding-right:0.65em;">Pemeran</th>…
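One way to sidestep guessing the delimiter is `html_text2()`, which renders block-level children (list items, `<br>`-separated lines) on their own lines, so the separator is always `"\n"`. A minimal sketch with a stand-in infobox row (the `plainlist`/`li` structure is an assumption based on typical Wikipedia infoboxes):

```r
library(rvest)

# Stand-in for one infobox row with a multi-valued "Pemeran" (cast) field
html <- minimal_html('
  <table class="infobox">
    <tr>
      <th class="infobox-label">Pemeran</th>
      <td class="infobox-data">
        <div class="plainlist"><ul>
          <li>Actor One</li>
          <li>Actor Two</li>
        </ul></div>
      </td>
    </tr>
  </table>')

# html_text2() puts each block element on its own line, so split on "\n"
cast <- html %>%
  html_element("td.infobox-data") %>%
  html_text2() %>%
  strsplit("\n") %>%
  unlist()
cast
#> [1] "Actor One" "Actor Two"
```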

webscraping: capture links of references with R

I want to capture the links to references from an article on this page: https://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S2448-76782022000100004&lang=es

I have tried this:

    library(rvest)
    library(dplyr)

    link <- "https://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S2448-76782022000100004&lang=es"
    page <- read_html(link)
    links <- page %>%
      html_nodes("a") %>%
      html_text()

But these are not the links I want. There are 68 references, so I want the 68 links attached…
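Note that `html_text()` returns the visible link *text*; the URL lives in the `href` attribute, retrieved with `html_attr("href")`. A sketch with a stand-in reference list (the `div.references` selector is hypothetical; the real page needs its own selector to narrow the match from all `<a>` tags to just the 68 references):

```r
library(rvest)

# Stand-in for an article's reference list
html <- minimal_html('
  <div class="references">
    <p><a href="https://doi.org/10.1000/1">Ref 1</a></p>
    <p><a href="https://doi.org/10.1000/2">Ref 2</a></p>
  </div>')

# html_attr("href") extracts the link target rather than the link text
refs <- html %>%
  html_elements("div.references a") %>%
  html_attr("href")
refs
#> [1] "https://doi.org/10.1000/1" "https://doi.org/10.1000/2"
```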

Rvest: xlsx download

I'm trying to download an xlsx file with the code below:

    library(rvest)

    file <- "tesouro.csv"
    site <- read_html("https://www.tesourotransparente.gov.br/publicacoes/boletim-resultado-do-tesouro-nacional-rtn/")
    link <- site %>%
      html_nodes(xpath = "//a[contains(text(), 'serie_historica_jun22.xlsx')]") %>%
      html_attr("href")
    download.file(
      url = link,
      mode = "w",
      destfile = file
    )

But the download is an empty xlsx with the html code inside the spreadsheet:

    <html>
    <head>
    <script>
    var…
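Two things to check here: binary files such as xlsx must be downloaded with `mode = "wb"` (with `mode = "w"` they are corrupted on Windows), and HTML appearing inside the file usually means the `href` points at a redirect/landing page rather than the file itself, which `download.file()` cannot follow through JavaScript. A self-contained sketch of the binary-mode point, using a local `file://` URL in place of the real site:

```r
# xlsx files are ZIP archives; their first bytes are the "PK" magic number
src <- tempfile(fileext = ".xlsx")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04)), src)

# mode = "wb" preserves the bytes exactly; mode = "w" can mangle them
dest <- tempfile(fileext = ".xlsx")
download.file(paste0("file://", src), destfile = dest, mode = "wb", quiet = TRUE)

identical(readBin(dest, "raw", 4), as.raw(c(0x50, 0x4b, 0x03, 0x04)))
#> [1] TRUE
```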

Web-Scraping using R (I want to extract some table like data from a website)

I'm having some problems scraping data from a website. I do not have a lot of experience with web scraping. My intended plan is to scrape some data using R from the following website: https://www.myfxbook.com/forex-broker-swaps

More precisely, I want to extract the Forex Brokers Swap Comparison for all the available pairs. My idea so far:

    library(XML)…
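If the comparison is rendered as an ordinary HTML `<table>`, rvest's `html_table()` can parse it directly into a data frame; if the site fills the table in with JavaScript, rvest alone will not see the data and a browser-driven tool would be needed. A sketch with a stand-in table (column names and values are invented for illustration):

```r
library(rvest)

# Stand-in for a broker-swap comparison table
html <- minimal_html('
  <table>
    <tr><th>Broker</th><th>EURUSD long</th><th>EURUSD short</th></tr>
    <tr><td>Broker A</td><td>-3.1</td><td>0.2</td></tr>
    <tr><td>Broker B</td><td>-2.7</td><td>0.1</td></tr>
  </table>')

# html_table() converts a <table> node into a tibble, coercing numeric columns
swaps <- html %>%
  html_element("table") %>%
  html_table()
swaps
```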

How to use `label_date_short`?

What am I doing wrong with label_date_short from the scales package?

    library(tidyverse)
    library(scales)

    date_taille <- tibble(
      Taille = rep(c("taille_hiver", "taille_ete"), times = 2),
      Date_taille = c("2016-08-01", "2016-02-01", "2018-08-01", "2018-02-01") %>% as.Date()
    )

    ggplot(date_taille) +
      aes(x = Date_taille, y = Taille) +
      geom_point() +
      scale_x_date(date_breaks = "month", date_labels = label_date_short()) # or label_date()
    #> Error in format(x, format =…
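The likely issue is the argument name: `date_labels =` expects a strftime format *string* (e.g. `"%b %Y"`), while `label_date_short()` returns a labelling *function*, which belongs in `labels =`. A sketch of the corrected call:

```r
library(ggplot2)
library(scales)

date_taille <- tibble::tibble(
  Taille = rep(c("taille_hiver", "taille_ete"), times = 2),
  Date_taille = as.Date(c("2016-08-01", "2016-02-01", "2018-08-01", "2018-02-01"))
)

# Pass the labelling function to `labels =`, not `date_labels =`
p <- ggplot(date_taille, aes(x = Date_taille, y = Taille)) +
  geom_point() +
  scale_x_date(date_breaks = "month", labels = label_date_short())
```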

Efficiency in extracting data from webscraping in R

This is no doubt very simple, so apologies, but I am new to web scraping and am trying to extract multiple datapoints in one call using rvest. Take for example the following code (NB: I have not used the actual website, which I have replaced in this code snippet with xxxxxx.com):

    univsalaries <- lapply(paste0('https://xxxxxx.com/job/p', 1:20, '/key=%F9%80%76&final=1&jump=1&PGTID=0d3408-0000-24gf-ac2b-810&ClickID=2'),…
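The usual efficient shape is: parse each page once, extract every field from that one parsed document, return a one-row-per-record data frame, and bind the pages together at the end. A sketch with stand-in pages (the `.title`/`.salary` selectors are invented; the real ones depend on the site's markup):

```r
library(rvest)
library(dplyr)

# Stand-in parsed pages; in practice these would come from lapply(urls, read_html)
pages <- list(
  minimal_html('<div class="job"><span class="title">Analyst</span><span class="salary">50000</span></div>'),
  minimal_html('<div class="job"><span class="title">Engineer</span><span class="salary">70000</span></div>')
)

# One parse per page, several fields extracted from the same document
scraped <- lapply(pages, function(pg) {
  tibble(
    title  = pg %>% html_elements(".title")  %>% html_text2(),
    salary = pg %>% html_elements(".salary") %>% html_text2() %>% as.numeric()
  )
}) %>%
  bind_rows()
scraped
```

This avoids re-downloading the page once per datapoint, which is usually where the time goes.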

How to save result as "ND" when there is no record? rvest and R

I have these two example html files: url1.html and url2.html. In url1.html there is no information (71) and in url2.html there is. I'm using this code in R:

    library(rvest)
    library(tidyverse)

    x <- data.frame(
      URL = c(1:2),
      page = c(paste(readLines("url1.html"), collapse = "\n"),
               paste(readLines("url2.html"), collapse = "\n"))
    )

    for (i in 1:nrow(x)) {
      html <- x$page[i] %>%
        unclass() %>%
        unlist()
      read_html(html, encoding = "ISO-8859-1") %>%
        rvest::html_elements(xpath = '//*[@id="principal"]/table[2]') %>%
        rvest::html_elements(xpath = '//div[@id="tituloContext"]')…
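One way to get "ND" for pages with no match is to test the length of the nodeset before extracting text: `html_elements()` returns an empty nodeset when the XPath matches nothing. A self-contained sketch with two stand-in documents (their content is invented; only the presence/absence of `div#tituloContext` matters):

```r
library(rvest)

# Two stand-in documents: one with the target node, one without
docs <- list(
  minimal_html('<div id="tituloContext">Record found</div>'),
  minimal_html('<div id="other">nothing here</div>')
)

# Empty nodeset (length 0) means no record: fall back to "ND"
results <- sapply(docs, function(doc) {
  nodes <- html_elements(doc, xpath = '//div[@id="tituloContext"]')
  if (length(nodes) == 0) "ND" else html_text2(nodes[[1]])
})
results
#> [1] "Record found" "ND"
```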