R Software: Creating a table of R-squared values from a data frame with multiple data series

I have a data.frame with multiple columns. The first column is the dependent variable and the other columns are various independent variables. I’d like to create a table of all the R² values in which the first column is y and each of the other columns in turn is x.

Here’s an example data.frame:

df <- data.frame(
  'A' = runif(20,min=0, max=100),
  'B' = runif(20,min=0, max=100),
  'C' = runif(20,min=0, max=100),
  'D' = runif(20,min=0, max=100),
  'E' = runif(20,min=0, max=100)
)

and I’m using this function to calculate R²:

# R-squared of a simple linear regression of y on x; rows with an NA are dropped
rsq <- function(x, y) summary(lm(y~x,na.action = na.omit))$r.squared
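For a regression with a single predictor and an intercept, the R² reported by summary(lm(...)) equals the squared Pearson correlation between x and y, so cor() gives the same number without fitting a model. The sketch below checks that equivalence on simulated data; the name rsq_cor is hypothetical and only used for illustration.

```r
# Helper from the question: R-squared from a simple linear fit of y on x
rsq <- function(x, y) summary(lm(y ~ x, na.action = na.omit))$r.squared

# Hypothetical alternative: squared Pearson correlation.
# use = "complete.obs" drops pairs containing an NA, matching na.omit in lm()
rsq_cor <- function(x, y) cor(x, y, use = "complete.obs")^2

set.seed(1)
x <- runif(20, min = 0, max = 100)
y <- 2 * x + rnorm(20, sd = 10)
x[3] <- NA  # introduce a missing value to exercise the NA handling

all.equal(rsq(x, y), rsq_cor(x, y))  # TRUE
```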

I would like the output to look like this:

          A.B         A.C         A.D         A.E 
1 0.009213715 0.009213715 0.009213715 0.009213715 

I know I could hard-code the table this way:

r2_df<- data.frame(
  'A~B'=rsq(x=df$B,y=df$A),
  'A~C'=rsq(x=df$C,y=df$A),
  'A~D'=rsq(x=df$D,y=df$A),
  'A~E'=rsq(x=df$E,y=df$A)
)

But here’s the kicker: my data frame will change from time to time, with different data series and a different number of columns. "A" will stay the same, but the next time I pull the data I may end up with columns "A", "B", "X", "Y", "Z", "P", "O", "S". So I don’t want to hard-code anything; I’d like to set A as y and have R loop through the rest of the columns to produce the table. I’m new to R, and I’m struggling to get an apply function to produce anything.
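The requirement above (keep A as the response and regress it on whatever other columns happen to be present) can be wrapped in a small function so nothing is hard-coded. This is a sketch: the name r2_table and its y argument are hypothetical, and it assumes every non-response column is numeric.

```r
# R-squared of a simple linear regression of y on x (helper from the question)
rsq <- function(x, y) summary(lm(y ~ x, na.action = na.omit))$r.squared

# Hypothetical helper: regress the column named `y` on each remaining column
# in turn and return a one-row data.frame of R-squared values.
r2_table <- function(data, y = "A") {
  predictors <- setdiff(names(data), y)           # every column except the response
  r2 <- vapply(predictors,
               function(col) rsq(data[[col]], data[[y]]),
               numeric(1))                         # one R-squared per predictor
  names(r2) <- paste0(y, ".", predictors)          # e.g. "A.X", "A.Y", ...
  as.data.frame(as.list(r2))                       # one row, one column per predictor
}

# Columns other than A can change freely between data pulls
set.seed(42)
df <- data.frame(A = runif(20, 0, 100), X = runif(20, 0, 100),
                 Y = runif(20, 0, 100), Z = runif(20, 0, 100))
r2_table(df)
```

Passing a different name, for example r2_table(df, y = "X"), would treat that column as the response instead.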

Thank you for any help!

Solution:

One approach is to loop over every column except the first, apply the rsq function to each of those columns paired with column ‘A’, rename the elements of the resulting list, and then coerce the list to a data.frame.

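# Regress A on each of the other columns and collect the R-squared values in a list
lst1 <- lapply(df[-1], function(x) rsq(x, df$A))
# Prefix the names so the columns read A.B, A.C, ...
names(lst1) <- paste0("A.", names(lst1))
# Turn the named list into a one-row data.frame
as.data.frame(lst1)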

Output:

     A.B       A.C         A.D        A.E
1 0.1514966 0.1207118 0.003884215 0.02558644

NOTE: the values differ from those in the question because the data were generated with runif and no set.seed call was made beforehand.
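If reproducible numbers are wanted, calling set.seed() before generating the example data makes the R² table identical on every run. The sketch below wraps the data generation and the answer’s lapply approach in a hypothetical function make_r2; the seed value 123 is arbitrary.

```r
rsq <- function(x, y) summary(lm(y ~ x, na.action = na.omit))$r.squared

# Hypothetical wrapper: build the example data from a seed, then the R-squared table
make_r2 <- function(seed) {
  set.seed(seed)
  df <- data.frame(A = runif(20, 0, 100), B = runif(20, 0, 100),
                   C = runif(20, 0, 100), D = runif(20, 0, 100),
                   E = runif(20, 0, 100))
  lst1 <- lapply(df[-1], function(x) rsq(x, df$A))
  names(lst1) <- paste0("A.", names(lst1))
  as.data.frame(lst1)
}

run1 <- make_r2(123)
run2 <- make_r2(123)
identical(run1, run2)  # TRUE: same seed, same data, same table
```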
