Object not found when running cor()

I am trying to find the correlation between two columns (Sunshine_in_hours and AgeGroup_30_to_34) of a combined dataset in R. However, every time I run the cor() function, I get this error:

Error in pmatch(use, c("all.obs", "complete.obs", "pairwise.complete.obs",  : 
  object 'AgeGroup_30_to_34' not found

Here’s the dput(head) snippet:

structure(list(Date = structure(c(18659, 18660, 18661, 18663, 
18665, 18666, 18667, 18668, 18669, 18670, 18671, 18673, 18674, 
18675, 18676, 18677, 18678, 18679, 18680, 18681, 18682, 18683, 
18684, 18685, 18686, 18687, 18688, 18689, 18690, 18691), class = "Date"), 
    Year = c(2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 
    2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 
    2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 
    2021, 2021), Month = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3), AgeGroup_30_to_34 = c(0, 
    0, 0, 2, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 
    2, 0, 0, 1, 2, 0, 3, 0, 0, 0), Sunshine_in_hours = c(1.6, 
    3.4, 13.1, 8.9, 2, 1.7, 12.7, 11.6, 5.5, 5.6, 4.9, 9.2, 8.3, 
    11.9, 12.4, 12.4, 5.9, 0, 6.3, 8.5, 9.9, 8.7, 6.3, 1, 9.2, 
    6.3, 1.4, 2.1, 2.6, 3.6), City = c("Melbourne", "Melbourne", 
    "Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne", 
    "Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne", 
    "Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne", 
    "Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne", 
    "Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne", 
    "Melbourne", "Melbourne", "Melbourne")), row.names = c(NA, 
-30L), class = c("tbl_df", "tbl", "data.frame"))

I tried to run the code:

Combined <- inner_join(covidS, weatherS, by = 'Date')%>%
  mutate(Date = mdy(Date),
         Year = year(Date),
         Month = month(Date),
         Day = day(Date))%>%
  select(Date, Year, Month, AgeGroup_30_to_34, Sunshine_in_hours, City)%>%
  filter(City == 'Melbourne')%>%
  cor(Sunshine_in_hours, AgeGroup_30_to_34 )

I’ve tried looking up tutorials on how to do this, however I keep running into a wall. Any help will be appreciated.

>Solution :

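cor() takes x and y as its first two arguments, followed by use and method. Because the pipe passes the filtered data frame in as x, Sunshine_in_hours ends up as y and AgeGroup_30_to_34 ends up as use. cor() then looks for both as standalone objects rather than as columns, which is why the error comes from pmatch(use, ...). Compute the correlation outside the pipe instead: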

Combined <- inner_join(covidS, weatherS, by = 'Date')%>%
  mutate(Date = mdy(Date),
         Year = year(Date),
         Month = month(Date),
         Day = day(Date))%>%
  select(Date, Year, Month, AgeGroup_30_to_34, Sunshine_in_hours, City)%>%
  filter(City == 'Melbourne') 

corr = cor(Combined$Sunshine_in_hours, Combined$AgeGroup_30_to_34 )

Remember when you’re using pipes, you’re feeding your last object as the first argument of the function you’re calling. In this case, your code was equivalent to:

cor(inner_join(covidS, weatherS, by = 'Date')%>%
  mutate(Date = mdy(Date),
         Year = year(Date),
         Month = month(Date),
         Day = day(Date))%>%
  select(Date, Year, Month, AgeGroup_30_to_34, Sunshine_in_hours, City)%>%
  filter(City == 'Melbourne'),
Sunshine_in_hours, AgeGroup_30_to_34 )

So Sunshine_in_hours and AgeGroup_30_to_34 mean nothing to cor(), because it has no way of knowing they are columns of the data frame. cor() is a base R function (from the stats package) that evaluates its arguments as ordinary objects, whereas dplyr verbs such as mutate(), select() and filter() look column names up inside the data frame. The two follow different conventions, so check the docs when in doubt.
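If you would rather keep everything in one pipeline, here is a minimal sketch of two alternatives alongside the $ approach. It assumes dplyr is installed, and the melbourne data frame is a hypothetical stand-in built from the first ten rows of the dput output in the question, not your full Combined data.

```r
library(dplyr)

# Hypothetical stand-in for the filtered data: first 10 rows of the dput output
melbourne <- tibble(
  AgeGroup_30_to_34 = c(0, 0, 0, 2, 1, 0, 0, 1, 0, 1),
  Sunshine_in_hours = c(1.6, 3.4, 13.1, 8.9, 2, 1.7, 12.7, 11.6, 5.5, 5.6)
)

# Option 1: summarise() evaluates the column names inside the data frame,
# so cor() receives two numeric vectors. The result is a one-row tibble.
by_summarise <- melbourne %>%
  summarise(r = cor(Sunshine_in_hours, AgeGroup_30_to_34))

# Option 2: base R with() also looks the names up inside the data frame.
by_with <- with(melbourne, cor(Sunshine_in_hours, AgeGroup_30_to_34))

# Option 3: the $ approach from the answer, for comparison.
by_dollar <- cor(melbourne$Sunshine_in_hours, melbourne$AgeGroup_30_to_34)

print(by_summarise)
print(by_with)
```

All three should give the same coefficient. The summarise() version is probably the easiest to extend, for example by adding group_by(City) before it to get one correlation per city.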
