For a dataset I am working on I generated the common name of birds based on their scientific name using the taxize package. This created a list variable for their common names. I want to change the list variable into a character variable, but I can’t get it working. I have used the following code:
mydf[] <- lapply(mydf, unlist)
Which gives me the error:
> Error in `[<-.data.frame`(`*tmp*`, , value = list(scientific_name = c("aegithalos caudatus", :
replacement element 2 has 16 rows, need 20
It seems this error occurs because four names are character(0), which happens when there is no match on the scientific name. Even after changing character(0) to NA or a random name using dplyr, I get the same error.
birds2 <- birds %>%
mutate(common_name = replace(common_name, common_name == "character(0)", "NA"))
My goal is to turn this list variable into a character variable so that I can export the data frame with the writexl package (if I do it now the column common_name is empty after exporting). I have added a dput of my data frame. This data frame also produced the aforementioned error message. The dput is only 20 rows of the original 2917 rows.
birds <- structure(list(scientific_name = c("aegithalos caudatus", "alcedo atthis",
"amandava amandava", "anthus pratensis", "cettia cetti", "cyanistes caeruleus",
"delichon urbicum", "erithacus rubecula", "estrilda astrild",
"euplectes afer", "ficedula hypoleuca", "luscinia megarhynchos",
"parus major", "passer domesticus", "passer hispaniolensis",
"phylloscopus collybita", "serinus serinus", "sylvia atricapilla",
"sylvia melanocephala", "turdus philomelos"), common_name = list(
`aegithalos caudatus` = "Northern long-tailed tit", `alcedo atthis` = "common kingfisher",
`amandava amandava` = "red avadavat", `anthus pratensis` = character(0),
`cettia cetti` = character(0), `cyanistes caeruleus` = "blue tit",
`delichon urbicum` = "Northern house-martin", `erithacus rubecula` = "European robin",
`estrilda astrild` = character(0), `euplectes afer` = "yellow-crowned bishop",
`ficedula hypoleuca` = character(0), `luscinia megarhynchos` = "nightingale",
`parus major` = "Great Tit", `passer domesticus` = "House sparrow",
`passer hispaniolensis` = "Spanish sparrow", `phylloscopus collybita` = "eurasian chiffchaff",
`serinus serinus` = "European serin", `sylvia atricapilla` = "blackcap",
`sylvia melanocephala` = "Sardinian warbler", `turdus philomelos` = "song thrush")), row.names = c(NA,
20L), class = "data.frame")
>Solution :
Your problem is that some of your common name entries are missing and have length 0. They disappear when you unlist(), leaving no placeholder for the missing value, which is why you end up with 16 values for 20 rows. The solution is to fill these in with a missing-value placeholder, say "" or NA, so that you have the right number of rows. I show one method below. However, it may be better to revise how the common name values were generated in the first place.
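To see the dropping behaviour in isolation before applying the fix, here is a minimal sketch on a toy list. The object `x` and its element names are made up for illustration; only the shape mirrors the common_name column.

```r
# Toy list shaped like common_name: one element is empty (illustrative values)
x <- list(a = "blue tit", b = character(0), c = "blackcap")

n_elements <- length(x)          # number of list elements
n_unlisted <- length(unlist(x))  # number of values left after unlist()

lengths(x)   # per-element lengths; the empty element has length 0
n_elements
n_unlisted   # one fewer than n_elements: the empty element contributes nothing
```

The same mismatch, scaled up to four empty elements, is what the "has 16 rows, need 20" message is reporting.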
birds$common_name[lengths(birds$common_name) == 0] = list(NA_character_)
birds$common_name = unlist(birds$common_name)
class(birds$common_name)
# [1] "character"
head(birds)
# scientific_name common_name
# 1 aegithalos caudatus Northern long-tailed tit
# 2 alcedo atthis common kingfisher
# 3 amandava amandava red avadavat
# 4 anthus pratensis <NA>
# 5 cettia cetti <NA>
# 6 cyanistes caeruleus blue tit
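As a possible variation, if any element ever held more than one common name (an assumption, since the 20-row sample has none), unlist() would return too many values instead of too few. A hedged sketch that always returns exactly one string per row is below. `flatten_names` is a hypothetical helper, and the `demo` data, including the second name for the blackcap, is invented for illustration.

```r
# Hypothetical helper: collapse each list element to exactly one string.
# Empty elements become NA; multi-value elements are pasted together.
flatten_names <- function(x, sep = "; ") {
  vapply(
    x,
    function(v) if (length(v) == 0) NA_character_ else paste(v, collapse = sep),
    character(1),
    USE.NAMES = FALSE
  )
}

# Small made-up data frame with a list column, similar in shape to birds
demo <- data.frame(
  scientific_name = c("parus major", "cettia cetti", "sylvia atricapilla")
)
demo$common_name <- list("Great Tit", character(0), c("blackcap", "Eurasian blackcap"))

# Replace the list column with a plain character vector
demo$common_name <- flatten_names(demo$common_name)
str(demo)
```

Because vapply() is told to expect character(1) from every element, it raises an error rather than silently returning a vector of the wrong length, which makes a row-count mismatch harder to miss.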