I’ve got a table like this:
country continent date n_case Ex TD TC
--------------------------------------------------------------------------------
Italy Europe 2022-02-24 6 NA 2 90
Italy Europe 2022-01-17 12 87 2 86
USA America 2022-02-23 NA NA 3 65
USA America 2022-01-08 6 NA 5 67
USA America 2022-01-04 6 7 7 87
etc etc...
What I wish is a new data frame with one row per country that for each country will store as columns the country name (column = country) and continent (column = continent), and also the latest date reported for each value in the column(date, n_case, Ex, TD, TC):
the desirable data frame:
country continent date n_case Ex TD TC
--------------------------------------------------------------------------------
Italy Europe 2022-02-24 6 87 2 90
USA America 2022-02-23 6 7 3 65
etc etc..
values to ignore are either NA or " " (blank)
thank you!
>Solution :
With dplyr, you can sort the data by dates decreasingly and then select the first non-NA value in each column.
library(dplyr)
df %>%
group_by(country, continent) %>%
arrange(desc(date), .by_group = TRUE) %>%
summarise(across(everything(), ~ .x[!is.na(.x)][1])) %>%
ungroup()
# # A tibble: 2 × 7
# country continent date n_case Ex TD TC
# <chr> <chr> <date> <int> <int> <int> <int>
# 1 Italy Europe 2022-02-24 6 87 2 90
# 2 USA America 2022-02-23 6 7 3 65