I have a data frame of plant Latin names, and a folder GBIF_data that stores the downloaded GBIF data as csv files named after the Latin names in the data frame. I want to mutate a new column that stores how many records have been downloaded from GBIF for each plant Latin name. Here is the code:
read.csv("data.csv") %>%
  mutate(OCCURRENCES = nrow(read.delim(CSVPATH))) # csv files downloaded from GBIF use tab as the delimiter, so read.delim is used here
The data frame looks like this (I show only the CSVPATH column, which was created with mutate by prepending the folder path to the plant Latin name and replacing the spaces in the name with underscores; other columns that are not relevant to the question have been omitted):
CSVPATH
../GBIF_data/Lycopodium_cernuum.csv
../GBIF_data/Lycopodium_japonicum.csv
../GBIF_data/Lycopodiastrum_casuarinoides.csv
../GBIF_data/Selaginella_uncinata.csv
../GBIF_data/Selaginella_doederleinii.csv
../GBIF_data/Equisetum_ramosissimum.csv
../GBIF_data/Ophioglossum_reticulatum.csv
../GBIF_data/Osmunda_vachellii.csv
../GBIF_data/Lygodium_japonicum.csv
../GBIF_data/Lygodium_microphyllum.csv
The csv files stored in the GBIF_data folder are named after the Latin names, with each space replaced by an underscore _. When I ran the code, it reported this error:
Error in `mutate()`:
! Problem while computing `OCCURRENCES = nrow(read.delim(CSVPATH))`.
Caused by error in `h()`:
! error in evaluating the argument 'x' in selecting a method for function 'nrow': invalid 'description' argument
Why does dplyr::mutate not work in this situation? It successfully created CSVPATH from the Latin names with string operations, but it fails when reading another csv file and counting its rows.
Thanks in advance!
>Solution :
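We may need rowwise, as read.delim is not vectorized, i.e. it reads only a single file at a time. Inside mutate, CSVPATH is the whole column (a character vector of all the paths), so read.delim receives several paths at once and fails with the invalid 'description' argument error. With rowwise, the expression is evaluated once per row, so each call gets a single path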
library(dplyr)
read.csv("data.csv") %>%
  rowwise() %>% # evaluate the following mutate() one row at a time
  mutate(OCCURRENCES = nrow(read.delim(CSVPATH))) %>%
  ungroup() # drop the row-wise grouping again
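As a rough illustration of why the original call fails, the sketch below passes two paths to read.delim at once and then reads them one by one. The temporary files and the names path_a, path_b and paths are hypothetical stand-ins for the GBIF downloads, not part of the original data.

```r
# Hypothetical tab-delimited files standing in for two GBIF downloads
path_a <- tempfile(fileext = ".csv")
path_b <- tempfile(fileext = ".csv")
write.table(data.frame(gbifID = 1:3), path_a, sep = "\t",
            row.names = FALSE, quote = FALSE)
write.table(data.frame(gbifID = 4L), path_b, sep = "\t",
            row.names = FALSE, quote = FALSE)
paths <- c(path_a, path_b)

# This is what mutate() does without rowwise(): the whole vector goes in at once.
# The error is captured instead of stopping the script.
err <- tryCatch(read.delim(paths), error = function(e) e)
print(conditionMessage(err))

# Reading one path per call works, which is what rowwise() or map_int() arrange
counts <- vapply(paths, function(p) nrow(read.delim(p)), integer(1),
                 USE.NAMES = FALSE)
print(counts)
```

The captured message should resemble the invalid 'description' argument error from the question, because the underlying file connection accepts only a single path.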
Another option is map_int from purrr, which calls read.delim once per path and returns the row counts as an integer vector
library(dplyr)
library(purrr)
read.csv('data.csv') %>%
  mutate(OCCURRENCES = map_int(CSVPATH, ~ read.delim(.x) %>% nrow()))
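For anyone who wants to try this without the real downloads, here is a minimal reproducible sketch of the map_int approach. It assumes dplyr and purrr are installed; the temporary folder, the two file names and the gbifID and year columns are made up for illustration and only mimic the tab-delimited layout described in the question.

```r
library(dplyr)
library(purrr)

# Hypothetical folder with two small tab-delimited files (stand-ins for GBIF data)
tmp_dir <- tempfile("GBIF_data_")
dir.create(tmp_dir)
path_a <- file.path(tmp_dir, "Species_a.csv")
path_b <- file.path(tmp_dir, "Species_b.csv")
write.table(data.frame(gbifID = 1:3, year = c(2001, 2005, 2010)), path_a,
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(gbifID = 4L, year = 1999), path_b,
            sep = "\t", row.names = FALSE, quote = FALSE)

# A data frame shaped like the one in the question: one path per row
paths <- data.frame(CSVPATH = c(path_a, path_b), stringsAsFactors = FALSE)

# map_int() calls read.delim() once per path and collects the row counts
result <- paths %>%
  mutate(OCCURRENCES = map_int(CSVPATH, ~ nrow(read.delim(.x))))

print(result$OCCURRENCES)
```

If some paths might not exist, one possible variation is to check file.exists(.x) inside the lambda and return NA_integer_ for missing files, so that one missing download does not abort the whole mutate.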