Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Comparison of coefficient test across subsamples

I am trying to conduct a comparison of coefficient tests across two subsamples. To achieve this, I do the following:

full_model <- lm(y ~ v1*subsample_dummy + fixed_effects, data=df)
reduced_model <- lm(y ~ v1 + subsample_dummy + fixed_effects, data=df)

test <- anova(full_model, reduced_model)

The above gives me the result.

However, I am not sure how to do the same in the situation where I have to cluster the models by, let’s say, the year variable.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I can cluster the lm models using the following code:

library(sandwich)
# cluster by year
clustered_se <- vcovCL(full_model, ~ year) 
clustered_se1 <- vcovCL(reduced_model, ~ year)

# generate summaries with clustered standard errors 
a <- coeftest(full_model, vcov. = clustered_se) 
b <- coeftest(reduced_model, vcov. = clustered_se1) 

However, the issue remains, as I still cannot do:

anova(a, b)

How to achieve the comparison of coefficient test across subsamples when the model requires standard error clustering?

>Solution :

We can use sandwich::vcovCL to get essentially the same standard errors like lfe::felm. Let’s estimate some models.

> est1 <- lfe::felm(y ~ x1 + x2 | id + firm | 0 | firm, data=d)  ## id + firm FE, clustered by firm
> est2 <- lfe::felm(y ~ x1 | id + firm | 0 | firm, data=d)  ## same, restricted model
> est3 <- lm(y ~ x1 + x2 + as.factor(id) + as.factor(firm), data=d)  ## as est1 w/o clustering
> est4 <- lm(y ~ x1 + as.factor(id) + as.factor(firm), data=d)  ## as restricted est3

Comparing the standard errors of est3 with est1,

> lmtest::coeftest(est3, vcov.=\(x) 
+                  sandwich::vcovCL(x, cluster=d$firm, type='HC0'))[1:3, ]
             Estimate Std. Error  t value      Pr(>|t|)
(Intercept) 3.7416642 0.10554929 35.44945 5.454680e-177
x1          1.0432612 0.02965723 35.17730 3.631800e-175
x2          0.4904104 0.03186679 15.38939  5.822822e-48
> coef(summary(est1))
    Estimate Cluster s.e.  t value     Pr(>|t|)
x1 1.0432612   0.02968696 35.14207 1.799173e-13
x2 0.4904104   0.03189874 15.37398 2.931632e-09

yields essentially the same.

Thus, and according to a post on Cross validated [1] we could compare models est1 and est2 using an lmtest::waldtest (there’s also a lfe::waldtest which works differently).

> lmtest::waldtest(est3, est4, vcov=\(x) 
+                  sandwich::vcovCL(x, cluster=d$firm, type='HC0'))
Wald test

Model 1: y ~ x1 + x2 + as.factor(id) + as.factor(firm)
Model 2: y ~ x1 + as.factor(id) + as.factor(firm)
  Res.Df Df      F    Pr(>F)    
1    966                        
2    967 -1 236.83 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Hope this brings you a step further.

BTW: You definitely could file a feature request on this as an issue on the author’s GitHub.


Data:

> set.seed(42)
> n <- 1e3
> d <- data.frame(x1=rnorm(n), x2=rnorm(n), id=factor(sample(20, n, replace=TRUE)),
+                 firm=factor(sample(13, n, replace=TRUE)), u=rnorm(n))
> id.eff <- rnorm(nlevels(d$id))
> firm.eff <- rnorm(nlevels(d$firm))
> d$y <- d$x1 + 0.5 * d$x2 + id.eff[d$id] + firm.eff[d$firm] + d$u
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading