I want to count the number of overall stars for each player on this page: https://cbgm.news/stats/CONN_Ratings.html
Here’s my rvest code:
library(tidyverse)
library(rvest)
url <- "https://cbgm.news/stats/CONN_Ratings.html"
scrape <- url %>%
  read_html() %>%
  html_nodes("td:nth-child(19)")
scrape
This returns:
{xml_nodeset (14)}
[1] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[2] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[3] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[4] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[5] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[6] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[7] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[8] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[9] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[10] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[11] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[12] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[13] <td>\n<i class="star yellow icon"></i><i class="star yellow ic ...
[14] <td><i class="star half yellow icon"></i></td>\n
How do I convert the xml_nodeset to a df/tibble that allows for mutating and counting the number of star icons?
I appreciate any help with this puzzle!
Solution:
You could write a small function that counts the star icons in a cell, scoring each full star as 1 and each half star as 0.5. Then use mutate() to set the OVERALL column to the result of applying that function to each element of scrape.
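# Count the stars in one table cell: a full star is 1, a half star is 0.5.
f <- function(s) {
  html <- as.character(s)
  # "star yellow" does not match "star half yellow", so half stars are not double-counted.
  str_count(html, "star yellow") + str_count(html, "star half") / 2
}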
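To see why the two patterns do not overlap, here is a small base-R sketch of the same counting logic run on two made-up cells. The strings in `cells` and the helper `count_fixed()` are illustrative assumptions, not taken from the page itself.

```r
# Hypothetical cell markup, shaped like the nodes printed in the question
cells <- c(
  '<td><i class="star yellow icon"></i><i class="star yellow icon"></i><i class="star half yellow icon"></i></td>',
  '<td><i class="star half yellow icon"></i></td>'
)

# Count non-overlapping fixed-string matches of `pattern` in each element of `x`
count_fixed <- function(x, pattern) {
  lengths(regmatches(x, gregexpr(pattern, x, fixed = TRUE)))
}

# A half star reads "star half yellow", so "star yellow" never matches it
full <- count_fixed(cells, "star yellow")
half <- count_fixed(cells, "star half")

stars <- full + half / 2
stars
# [1] 2.5 0.5
```

The first cell has two full stars and one half star (2.5); the second has only a half star (0.5).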
Now use rvest::html_table() to read the table and mutate() to fill in the star counts:
rvest::html_table(read_html(url))[[1]] %>%
  mutate(OVERALL = sapply(scrape, f))
Output:
NUM POS PLAYER FGI FGJ FT SCR PAS HDL ORB DRB DEF BLK STL DRFL DI IQ ATH OVERALL
<int> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <dbl>
1 1 PG Marek … 36 76 54 73 81 50 66 72 65 72 75 21 53 42 58 4
2 14 PG Brian … 10 90 72 50 74 32 71 71 69 53 82 32 65 57 91 3.5
3 15 PG Morris… 25 85 56 71 53 60 10 53 76 10 53 28 72 47 76 2
4 12 SG Ryan M… 31 78 96 74 46 38 50 43 71 46 40 35 61 45 75 3.5
5 21 SG Lenny … 10 90 67 50 60 49 56 71 58 60 66 39 56 38 69 3
6 5 SG Fred M… 10 83 61 71 30 23 10 78 63 10 16 39 61 38 87 2
7 23 SF Will B… 35 73 58 74 66 38 70 72 52 60 74 21 42 46 30 4
8 51 SF Lyly L… 51 76 83 84 75 32 66 81 61 70 85 24 52 47 60 5
9 42 SF Joe Ch… 58 50 80 70 56 39 53 53 78 10 53 21 54 52 78 2
10 40 PF Richar… 63 50 41 72 71 32 79 78 65 71 71 54 39 43 72 4
11 30 PF Ammer … 56 54 81 63 60 23 72 72 78 66 56 35 50 54 58 3.5
12 54 C Xavier… 100 33 36 100 76 16 96 91 76 100 87 61 28 41 73 5
13 45 C Brad L… 91 38 56 60 63 19 75 76 78 82 70 58 30 28 68 4
14 10 C Ed Str… 68 40 45 10 10 10 13 10 10 10 10 24 17 16 10 0.5
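As an alternative to string matching, you could count the <i> elements in each cell with CSS selectors. This is only a sketch: it assumes every icon carries the class star and half stars additionally carry half, as in the printed nodeset, and it runs on a made-up fragment built with rvest::minimal_html() rather than on the live page. The name count_stars() is hypothetical.

```r
library(rvest)

# Hypothetical two-row fragment mimicking the star markup in the question
page <- minimal_html('
  <table>
    <tr><td><i class="star yellow icon"></i><i class="star yellow icon"></i><i class="star half yellow icon"></i></td></tr>
    <tr><td><i class="star half yellow icon"></i></td></tr>
  </table>')

cells <- html_elements(page, "td")

# Every icon counts as 1, then each half star gives back 0.5
count_stars <- function(cell) {
  n_icons <- length(html_elements(cell, "i.star"))
  n_half  <- length(html_elements(cell, "i.star.half"))
  n_icons - n_half / 2
}

stars <- vapply(cells, count_stars, numeric(1))
stars
```

On the real page, the same function could be applied to scrape in place of f, for example mutate(OVERALL = vapply(scrape, count_stars, numeric(1))).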