I have the following code:
read_prem_league <- function(year) {
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>%
    html_table() %>%
    getElement(5) %>%
    mutate(Season = year, .before = Pos)
}
read_prem_league(2015)
This generates the following tibble:
#> # A tibble: 20 x 12
#> Season Pos Team Pld W D L GF GA GD Pts
#> <int> <chr> <int> <int> <int> <int> <int> <int> <chr> <int>
#> 1 2015 1 Manchester City (C) 38 27 5 6 83 32 +51 86
#> 2 2015 2 Manchester United 38 21 11 6 73 44 +29 74
#> 3 2015 3 Liverpool 38 20 9 9 68 42 +26 69
#> 4 2015 4 Chelsea 38 19 10 9 58 36 +22 67
#> 5 2015 5 Leicester City 38 20 6 12 68 50 +18 66
#> 6 2015 6 West Ham United 38 19 8 11 62 47 +15 65
#> 7 2015 7 Tottenham Hotspur 38 18 8 12 68 45 +23 62
#> 8 2015 8 Arsenal 38 18 7 13 55 39 +16 61
#> 9 2015 9 Leeds United 38 18 5 15 62 54 +8 59
#> 10 2015 10 Everton 38 17 8 13 47 48 -1 59
#> 11 2015 11 Aston Villa 38 16 7 15 55 46 +9 55
#> 12 2015 12 Newcastle United 38 12 9 17 46 62 -16 45
#> 13 2015 13 Wolverhampton Wande~ 38 12 9 17 36 52 -16 45
#> 14 2015 14 Crystal Palace 38 12 8 18 41 66 -25 44
#> 15 2015 15 Southampton 38 12 7 19 47 68 -21 43
#> 16 2015 16 Brighton & Hove Alb~ 38 9 14 15 40 46 -6 41
#> 17 2015 17 Burnley 38 10 9 19 33 55 -22 39
#> 18 2015 18 Fulham (R) 38 5 13 20 27 53 -26 28
#> 19 2015 19 West Bromwich Albio~ 38 5 11 22 35 76 -41 26
#> 20 2015 20 Sheffield United (R) 38 7 2 29 20 63 -43 23
#> # ... with 1 more variable: `Qualification or relegation` <chr>
I now want to merge the seasons from 2004 to 2015 into one tibble using the map_df function.
Is something like map_df(read_prem_league(2004:2015)) on the right track? What I’m struggling with is how to pass a range of years to the function.
>Solution :
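map() calls your function once per year, so the vector 2004:2015 goes to map() rather than to read_prem_league() itself. I suspect that not all seasons' pages contain the data in the format your function expects. We can collect every season that works by wrapping the function in purrr::possibly(): setting the otherwise argument to NULL makes a failed call return NULL instead of stopping, and compact() then drops those NULL elements. After that we can combine the results with bind_rows(). For bind_rows() to work, the Pts column must have the same type in every season, so we convert it to character.
As a next step, you can inspect the years for which your function could not retrieve data.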
library(tidyverse)
library(rvest)
read_prem_league <- function(year) {
  "https://en.wikipedia.org/wiki/" %>%
    paste0(year - 1, "-", substr(as.character(year), 3, 4), "_Premier_League") %>%
    read_html() %>%
    html_table() %>%
    getElement(5) %>%
    mutate(Season = year, .before = Pos,
           Pts = as.character(Pts)) # convert Pts to `character` so all seasons share one type
}
test_ls <- map(set_names(2004:2015),
               # if `read_prem_league` throws an error, use `NULL` as the result
               possibly(read_prem_league, otherwise = NULL)) %>%
  # let's get rid of those `NULL` elements
  compact() %>%
  bind_rows()
test_ls
#> # A tibble: 180 × 12
#> Season Pos Team Pld W D L GF GA GD Pts
#> <int> <int> <chr> <int> <int> <int> <int> <int> <int> <chr> <chr>
#> 1 2005 1 Chelsea (C) 38 29 8 1 72 15 +57 95
#> 2 2005 2 Arsenal 38 25 8 5 87 36 +51 83
#> 3 2005 3 Manchester Unit… 38 22 11 5 58 26 +32 77
#> 4 2005 4 Everton 38 18 7 13 45 46 −1 61
#> 5 2005 5 Liverpool 38 17 7 14 52 41 +11 58
#> 6 2005 6 Bolton Wanderers 38 16 10 12 49 44 +5 58
#> 7 2005 7 Middlesbrough 38 14 13 11 53 46 +7 55
#> 8 2005 8 Manchester City 38 13 13 12 47 39 +8 52
#> 9 2005 9 Tottenham Hotsp… 38 14 10 14 47 41 +6 52
#> 10 2005 10 Aston Villa 38 12 11 15 45 52 −7 47
#> # … with 170 more rows, and 1 more variable:
#> # `Qualification or relegation` <chr>
Created on 2022-07-31 by the reprex package (v0.3.0)
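To see which years failed, one option is to keep the raw list from map() before calling compact() and look at the names of its NULL elements. The following is only a minimal sketch: fake_reader is a hypothetical stand-in for read_prem_league() that fails for two arbitrarily chosen years, so the example runs without network access.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for read_prem_league(): it errors for two
# arbitrarily chosen years so the example needs no network access.
fake_reader <- function(year) {
  if (year %in% c(2004, 2008)) stop("table not in the expected format")
  tibble(Season = year, Team = c("A", "B"), Pts = c("3", "1"))
}

years <- 2004:2008

# Keep the raw list, including the NULLs, so failures stay visible.
results <- map(set_names(years), possibly(fake_reader, otherwise = NULL))

# The names of the NULL elements are the years that failed.
failed_years <- names(keep(results, is.null))
failed_years

# Bind only the successful results, as in the answer above.
combined <- results %>%
  compact() %>%
  bind_rows()
combined
```

With the real function, you would replace fake_reader with read_prem_league and years with 2004:2015, then open the pages for the years in failed_years to check how their tables differ.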