I have two dataframes in R. The first is called sports_countries and it looks as such:
sports_countries <- structure(list(Sport = c("Basketball", "Baseball", "Badminton",
"Cricket", "Hockey", "Gymnastics", "Rugby", "Football", "Tennis",
"Volleyball"), Country = c("Turkey", "Nicaragua", "South Korea",
"Botswana", "Oman", "Brazil", "Samoa", "Saudi Arabia", "Belarus",
"Latvia"), Score = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-10L), class = "data.frame")
The second is called sports_countries_score and looks as such:
sports_countries_score <- structure(list(Country = c("Belarus", "Botswana", "Brazil", "Latvia",
"Nicaragua", "Oman", "Samoa", "Saudi Arabia", "South Korea",
"Turkey"), Basketball = c(100L, 100L, 13L, 25L, 88L, 100L, 100L,
68L, 12L, 11L), Baseball = c(49L, 44L, 18L, 100L, 17L, 100L,
100L, 100L, 5L, 51L), Badminton = c(100L, 100L, 52L, 100L, 100L,
100L, 100L, 100L, 2L, 48L), Cricket = c(100L, 40L, 30L, 100L,
100L, 18L, 44L, 32L, 100L, 100L), Hockey = c(14L, 100L, 35L,
10L, 100L, 27L, 66L, 100L, 9L, 28L), Gymnastics = c(100L, 100L,
4L, 100L, 100L, 100L, 100L, 100L, 8L, 11L), Rugby = c(100L, 54L,
10L, 50L, 100L, 100L, 3L, 100L, 30L, 13L), Football = c(57L,
100L, 3L, 100L, 100L, 73L, 97L, 53L, 17L, 41L), Tennis = c(2L,
100L, 13L, 17L, 100L, 100L, 100L, 100L, 99L, 100L), Volleyball = c(100L,
76L, 1L, 9L, 53L, 59L, 100L, 60L, 30L, 1L)), class = "data.frame", row.names = c(NA,
-10L))
Let’s first assume that each of these were different tabs in Excel of the same name. For reference, in the first table, "Sport" would represent Cell A1. In the second table, "Country" would represent Cell A1.
In order to fill out the empty cells in the "Score" column in the first table, I would create formulas that look as such (Starting with cell C2):
=VLOOKUP(B2,sports_countries_score!A:K,2,FALSE)
=VLOOKUP(B3,sports_countries_score!A:K,3,FALSE)
=VLOOKUP(B4,sports_countries_score!A:K,4,FALSE)
=VLOOKUP(B5,sports_countries_score!A:K,5,FALSE)
=VLOOKUP(B6,sports_countries_score!A:K,6,FALSE)
=VLOOKUP(B7,sports_countries_score!A:K,7,FALSE)
=VLOOKUP(B8,sports_countries_score!A:K,8,FALSE)
=VLOOKUP(B9,sports_countries_score!A:K,9,FALSE)
=VLOOKUP(B10,sports_countries_score!A:K,10,FALSE)
=VLOOKUP(B11,sports_countries_score!A:K,11,FALSE)
And that would create a final table that looks as such:
But I cannot for the life of me figure out how to do that equivalent in R. For example, I tried:
final_table <- sports_countries %>%
left_join(sports_countries_score, by = "Country") %>%
select(Sport, Country, Score = Basketball:Volleyball)
But that doesn’t work.
What would be the best way to go about accomplishing this?
>Solution :
To get the most out of tidyverse, it helps to use tidy layouts, on this basis your second frame was untidy, as it had parameters to join on in its column headings. This is easily solved with pivot_longer, to go from wide to long layout. Also an empty score field in the first table serves no purpose, so my code excludes it, then makes it from the values from the second table.
(final_table <- sports_countries |> select(-Score) |>
left_join(
sports_countries_score |>
pivot_longer(-Country),
by = c("Country",
"Sport" = "name"
)
) |>
rename("Score" = "value"))
further reading : https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html#:~:text=Tidy%20data%20is%20a%20standard,Every%20row%20is%20an%20observation.