Add a row in data.frame by counting row numbers of another csv with names stored in the data.frame using dplyr

I have a data frame of plant Latin names and a folder, GBIF_data, that stores the downloaded GBIF data as csv files named after the Latin names in the data frame. I want to mutate a new column that stores how many records have been downloaded from GBIF for each plant Latin name. Here is the code:

read.csv("data.csv") %>%
  mutate(OCCURRENCES = nrow(read.delim(CSVPATH))) #csv files downloaded from GBIF use tab as delimiter so here read.delim should be used

The data frame looks like this. Only the CSVPATH column is shown; it was created by prepending the folder path to the plant Latin name and replacing the spaces in the name with underscores. Columns that are not relevant to the question have been omitted:

   CSVPATH                                                                            
 ../GBIF_data/Lycopodium_cernuum.csv          
 ../GBIF_data/Lycopodium_japonicum.csv        
 ../GBIF_data/Lycopodiastrum_casuarinoides.csv
 ../GBIF_data/Selaginella_uncinata.csv        
 ../GBIF_data/Selaginella_doederleinii.csv    
 ../GBIF_data/Equisetum_ramosissimum.csv      
 ../GBIF_data/Ophioglossum_reticulatum.csv    
 ../GBIF_data/Osmunda_vachellii.csv           
 ../GBIF_data/Lygodium_japonicum.csv          
 ../GBIF_data/Lygodium_microphyllum.csv   

Each csv file in the GBIF_data folder is named after the Latin name, with the space replaced by an underscore (_). When I ran the code, it reported this error:

Error in `mutate()`:
! Problem while computing `OCCURRENCES = nrow(read.delim(CSVPATH))`.
Caused by error in `h()`:
! error in evaluating the argument 'x' in selecting a method for function 'nrow': invalid 'description' argument

Why does dplyr::mutate not work in this situation? It successfully built CSVPATH from the Latin names with string operations, but it fails when reading each csv file and counting its rows.

Thanks in advance!

>Solution :

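We may need rowwise, as read.delim is not vectorized, i.e. it reads only a single file at a time. Without it, mutate passes the whole CSVPATH column to read.delim in one call, which causes the error.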

library(dplyr)
read.csv("data.csv") %>%
  rowwise() %>%
  mutate(OCCURRENCES = nrow(read.delim(CSVPATH))) %>%
  ungroup()
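
To see where the error comes from, here is a minimal base R sketch that reproduces it outside of mutate. The two files, their names, and their contents are made up for illustration and are written to a temporary directory. Passing the whole vector of paths fails, while passing one path at a time works.

```r
# Hypothetical demo files; names and contents are invented for illustration.
tmp_dir <- tempfile("gbif_demo_")
dir.create(tmp_dir)
paths <- file.path(tmp_dir, c("Species_a.csv", "Species_b.csv"))
writeLines(c("id\tname", "1\tx", "2\ty"), paths[1])          # 2 data rows
writeLines(c("id\tname", "1\tx", "2\ty", "3\tz"), paths[2])  # 3 data rows

# Without rowwise(), mutate() hands read.delim() the whole column at once.
# That is equivalent to this call, which fails; the error message is kept.
whole_vector <- tryCatch(
  nrow(read.delim(paths)),
  error = function(e) conditionMessage(e)
)
print(whole_vector)

# One path per call, which is what rowwise() arranges, works.
one_at_a_time <- vapply(
  paths,
  function(p) nrow(read.delim(p)),
  integer(1),
  USE.NAMES = FALSE
)
print(one_at_a_time)
```

The first call opens a single file connection, so it cannot accept more than one path; the second loop simply repeats the single-file call for each path.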

Another option is map_int from purrr:

library(dplyr)
library(purrr)
read.csv('data.csv') %>%
  mutate(OCCURRENCES = map_int(CSVPATH, ~ read.delim(.x) %>% nrow()))
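
As a quick sanity check, the sketch below runs the rowwise approach and the map_int approach on a small stand-in data frame. It assumes dplyr and purrr are installed; the demo data frame, file names, and file contents are hypothetical and only replace data.csv and the GBIF_data folder for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for data.csv: a data frame with a CSVPATH column.
tmp_dir <- tempfile("gbif_demo_")
dir.create(tmp_dir)
demo <- data.frame(
  CSVPATH = file.path(tmp_dir, c("Species_a.csv", "Species_b.csv")),
  stringsAsFactors = FALSE
)

# Tab-delimited files with a header line, mimicking the GBIF download layout.
writeLines(c("id\tname", "1\tx", "2\ty"), demo$CSVPATH[1])          # 2 data rows
writeLines(c("id\tname", "1\tx", "2\ty", "3\tz"), demo$CSVPATH[2])  # 3 data rows

# Approach 1: evaluate the expression once per row.
via_rowwise <- demo %>%
  rowwise() %>%
  mutate(OCCURRENCES = nrow(read.delim(CSVPATH))) %>%
  ungroup()

# Approach 2: loop over the column and collect one integer per path.
via_map <- demo %>%
  mutate(OCCURRENCES = map_int(CSVPATH, ~ nrow(read.delim(.x))))

print(via_rowwise$OCCURRENCES)
print(via_map$OCCURRENCES)
```

Both versions read each file once. map_int also checks that every result is a single integer, which can make unexpected input easier to spot.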