
How can I pass a data frame to a function and summarize it by group with tapply?

I am new to programming in R. I have written a function that returns some basic statistics for a list or vector that I pass to it. The problem comes when I want to pass a data frame.

The data frame I want to pass has 2 columns: the first identifies a group (1 or 2) and the second holds skull widths in cm (numeric values). I would like to compute the mean of each group separately so that I can compare groups 1 and 2, as well as the mode, median, quartiles … (everything I have inside the function).

My idea was to use the function I had written for lists and vectors and then do the grouping with tapply, but it prints this error in the console:

Error in tapply(archivo, archivo$`Época histórica`, descriptive_statistics) : 
  arguments must have same length
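To see which lengths tapply() compares, here is a small check on a made-up data frame; the column names group and width and all values are hypothetical stand-ins for the real data:

```r
# Hypothetical data mirroring the question's layout (names and values are assumptions)
archivo <- data.frame(
  group = c(1, 1, 1, 2, 2, 2),
  width = c(13.1, 12.8, 13.1, 13.9, 14.2, 13.9)
)

# tapply(X, INDEX, FUN) needs INDEX to be as long as X
length(archivo)        # 2: for a data frame, length() counts columns
length(archivo$group)  # 6: one group label per row

# A single column has one value per row, so it matches the grouping column
length(archivo$width)  # 6
```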

Here are the function and the tapply call I used:

descriptive_statistics <- function(x) {
  # modes() is the asker's own helper (not part of base R); its definition is not included in the question
  result <- list(
    mean(x), exp(mean(log(x))), median(x), modes(x),
    diff(range(x)), var(x), sd(x), sd(x) / mean(x)
  )
  names(result) <- c("Arithmetic mean", "Geometric mean", "Median", "Mode", "Range",
                     "Variance", "Standard deviation", "Pearson's coefficient of variation")
  result
}

tapply(archivo, archivo$`Época histórica`, descriptive_statistics)
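The function above calls modes(), which is not a base R function and is not shown in the question. For anyone trying to run it, a hypothetical helper that returns every most-frequent value might look like this (an assumption, not the asker's actual code):

```r
# Hypothetical helper: returns all values tied for the highest frequency
modes <- function(x) {
  counts <- table(x)
  vals <- names(counts)[counts == max(counts)]
  # table() stores values as character names; convert back for numeric input
  if (is.numeric(x)) as.numeric(vals) else vals
}

modes(c(13.1, 12.8, 13.1, 14.2))  # 13.1
modes(c(1, 2, 2, 3, 3))           # 2 3
```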


How could I change my function so that it accepts data frames? Or what could I do in the tapply call to make it work? Can someone give me a hand with this? I am also open to other ideas: I have tried aggregate and summary, but they do not give me all the statistics I want, such as Pearson's coefficient of variation.
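On the aggregate() attempt: aggregate() can report Pearson's coefficient of variation as well, as long as FUN returns a named vector. A minimal sketch, assuming hypothetical column names group and width and made-up values:

```r
# Hypothetical data: 'group' and 'width' are assumed column names
archivo <- data.frame(
  group = c(1, 1, 1, 2, 2, 2),
  width = c(13.1, 12.8, 13.1, 13.9, 14.2, 13.9)
)

# FUN returns a named vector; aggregate() stores it as a matrix column,
# one row per group
res <- aggregate(width ~ group, data = archivo,
                 FUN = function(x) c(mean = mean(x), median = median(x),
                                     sd = sd(x), cv = sd(x) / mean(x)))

# Flatten the matrix column into ordinary columns (width.mean, width.cv, ...)
res <- do.call(data.frame, res)
res
```

Each row of res then holds one group's statistics, which makes side-by-side comparison of groups 1 and 2 straightforward.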

Thank you very much in advance. Greetings!

Solution:

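Pass a column of the data frame to the function instead of the whole data frame. tapply() requires X and INDEX to have the same length, but length() of a data frame is its number of columns (2 here), while archivo$`Época histórica` has one value per row; that mismatch is what triggers the error. You haven't shared your data, so it is difficult to give a specific answer, but let's assume the numeric column is called col1. In that case you can do: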

tapply(archivo$col1, archivo$`Época histórica`, descriptive_statistics)
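A self-contained sketch of this fix on made-up data; group stands in for `Época histórica`, width for col1, and modes() is a hypothetical helper since the original is not shown:

```r
# Hypothetical data: 'group' and 'width' are assumed column names
archivo <- data.frame(
  group = c(1, 1, 1, 2, 2, 2),
  width = c(13.1, 12.8, 13.1, 13.9, 14.2, 13.9)
)

# Hypothetical mode helper (the question's modes() is not shown)
modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

descriptive_statistics <- function(x) {
  list(
    "Arithmetic mean" = mean(x),
    "Geometric mean" = exp(mean(log(x))),
    "Median" = median(x),
    "Mode" = modes(x),
    "Range" = diff(range(x)),
    "Variance" = var(x),
    "Standard deviation" = sd(x),
    "Pearson's coefficient of variation" = sd(x) / mean(x)
  )
}

# Pass the numeric column (one value per row), not the whole data frame
stats <- tapply(archivo$width, archivo$group, descriptive_statistics)

# stats is a list-array with one element per group, named "1" and "2"
stats[["1"]]
stats[["2"]]$`Pearson's coefficient of variation`
```

Because the function returns a list rather than a single number, tapply() keeps one list of statistics per group instead of simplifying to a vector.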