For loop of linear regression using filter to call specific rows

I have a data frame of different sites, years, and maximum temperatures. I’d like to run a linear regression of temp on year for each site. Instead of doing this separately for each site, it’d be nice to write a for loop that applies the same linear regression model to every site individually and gives me an output labeled with each site’s name. I’ve made some dummy data below; my actual df has 25 sites.

data<- data.frame(site= c('alder','alder','alder','alder','alder','alder','alder','alder', 'oak','oak','oak','oak','oak','oak','oak','oak' ),
                  year= c('2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015','2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015'),
                  temp= c(0.5,3, 12, 42, 67, 8, 12, 22, 11, 4, 3, 6, 76, 1, 11, .9))

Here’s how I’ve tried to do it so far:

output<- vector("list", length(unique(data$site)))

sites<- unique(data$site)

for (i in sites) {
  data %>% filter(site=i) =j
   lm(formula = temp~year, data = j)=k
  output[[i]]=k
  }

I’m not sure what the best way is to make the for loop select the subset of rows that corresponds to one site. When I run this code, the error I get is:

Error in data %>% filter(site = i) <- j : 
  could not find function "%>%<-"

I’ve already made sure tidyverse is in my library.

Thanks for your help!

Solution:

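There are a couple of typos: inside filter(), = should be == (a comparison, not an argument name), and the results should be assigned with -> instead of =. With =, R parses data %>% filter(site=i) = j as an assignment to the pipe expression itself, which it tries to carry out by calling a replacement function named %>%<-, hence the "could not find function" error. A third issue is the assignment to output[[i]]: here i is each site's value (for example "alder"), not a position, so output needs to be named with the site names for output[[i]] to fill the preallocated slots. Without the names, assigning to output[["alder"]] appends a new element and leaves the unnamed NULL slots in place.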

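library(dplyr)

sites <- unique(data$site)
output <- vector("list", length(sites))
names(output) <- sites                     # name the slots so output[[i]] fills them in place

for (i in sites) {
  data %>% filter(site == i) -> j          # rows for the current site only
  lm(formula = temp ~ year, data = j) -> k
  output[[i]] <- k
}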

Output:

> output
$alder

Call:
lm(formula = temp ~ year, data = j)

Coefficients:
(Intercept)     year2009     year2010     year2011     year2012     year2013     year2014     year2015  
        0.5          2.5         11.5         41.5         66.5          7.5         11.5         21.5  


$oak

Call:
lm(formula = temp ~ year, data = j)

Coefficients:
(Intercept)     year2009     year2010     year2011     year2012     year2013     year2014     year2015  
  1.100e+01   -7.000e+00   -8.000e+00   -5.000e+00    6.500e+01   -1.000e+01   -3.263e-15   -1.010e+01  
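Note that year is stored as character in the example data, so lm() treats it as a factor and fits a separate coefficient for every year (the year2009 to year2015 terms above). With eight parameters and eight observations per site, each model simply reproduces the data. If the goal is a linear trend in temperature over time, year needs to be numeric. A minimal base-R sketch, assuming that is the intent; it uses base subsetting in place of filter() so it runs without dplyr, and slopes is an illustrative name:

```r
# Example data from the question; year is character, as in the original
data <- data.frame(
  site = rep(c("alder", "oak"), each = 8),
  year = rep(as.character(2008:2015), times = 2),
  temp = c(0.5, 3, 12, 42, 67, 8, 12, 22, 11, 4, 3, 6, 76, 1, 11, 0.9)
)

# Assumption: year should be a continuous predictor, not a factor
data$year <- as.numeric(data$year)

sites <- unique(data$site)
output <- vector("list", length(sites))
names(output) <- sites

for (i in sites) {
  # Base-R subsetting instead of dplyr::filter(), to keep the sketch dependency-free
  output[[i]] <- lm(temp ~ year, data = data[data$site == i, ])
}

# One slope (change in temp per year) per site, named by site
slopes <- sapply(output, function(m) coef(m)[["year"]])
print(slopes)
```

Each element of output is still a full lm object, so summary(output[["oak"]]) gives the usual table for a single site.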

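With the tidyverse, this can also be done without an explicit loop, in a couple of ways. One is nest_by(), which creates one row per site with that site's data in a list column: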

library(dplyr)
library(tidyr)
data %>% 
    nest_by(site) %>%
    mutate(model = list(lm(temp ~ year, data = data))) %>% 
    ungroup
# A tibble: 2 × 3
  site                data model 
  <chr> <list<tibble[,2]>> <list>
1 alder            [8 × 2] <lm>  
2 oak              [8 × 2] <lm>  
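Because nest_by() returns a rowwise data frame, a further mutate() before ungroup() works on one model at a time, so per-site statistics can be pulled straight out of the model column. A sketch, assuming year is numeric (it is character in the example data, which would give one coefficient per year) and that the slope and R² are the values of interest; slope and r_squared are illustrative column names:

```r
library(dplyr)

data <- data.frame(
  site = rep(c("alder", "oak"), each = 8),
  year = rep(2008:2015, times = 2),   # numeric year, so lm() fits a single slope
  temp = c(0.5, 3, 12, 42, 67, 8, 12, 22, 11, 4, 3, 6, 76, 1, 11, 0.9)
)

models <- data %>%
  nest_by(site) %>%
  mutate(model = list(lm(temp ~ year, data = data))) %>%
  # Still rowwise here, so `model` is a single lm object in each row
  mutate(slope = coef(model)[["year"]],
         r_squared = summary(model)$r.squared) %>%
  ungroup()

models %>% select(site, slope, r_squared)
```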

Another is reframe() with .by (requires dplyr >= 1.1.0):

data %>%
   reframe(model = list(lm(temp  ~year)), .by = site) %>%
   as_tibble

Output:

# A tibble: 2 × 2
  site  model 
  <chr> <list>
1 alder <lm>  
2 oak   <lm>  
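For completeness, the same per-site fitting can be done in base R with split() and lapply(); this is a swapped-in technique rather than part of the answer above. split() returns a list of data frames named by site, so the resulting list of models is named automatically and no preallocation or manual naming is needed. A minimal sketch using the example data (year left as character, so the coefficients match the loop output above):

```r
# Example data from the question
data <- data.frame(
  site = rep(c("alder", "oak"), each = 8),
  year = rep(as.character(2008:2015), times = 2),
  temp = c(0.5, 3, 12, 42, 67, 8, 12, 22, 11, 4, 3, 6, 76, 1, 11, 0.9)
)

# split() returns a list of data frames, one per site, named by site
by_site <- split(data, data$site)

# Fit the same model to each site's rows; names are carried over from split()
output <- lapply(by_site, function(d) lm(temp ~ year, data = d))

names(output)
```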