I’m trying to perform 3 separate Chi-squared tests to determine whether there is a significant association between BMI and Type 2 Diabetes: one for males and females combined, one for just males, and one for just females. However, when I run my code, I get really large X-squared values and a df value of NA.
Code:
combined <- table(EMLabData$BMI, EMLabData$T2D)
men <- table(EMLabData$BMI[EMLabData$Sex=="Male"], EMLabData$T2D[EMLabData$Sex=="Male"])
women <- table(EMLabData$BMI[EMLabData$Sex=="Female"], EMLabData$T2D[EMLabData$Sex=="Female"])
chisq.test(combined, simulate.p.value = TRUE)
chisq.test(men, simulate.p.value = TRUE)
chisq.test(women, simulate.p.value = TRUE)
Outputs:
data: combined
X-squared = 1423.4, df = NA, p-value = 0.0004998
data: men
X-squared = 727.94, df = NA, p-value = 0.04798
data: women
X-squared = 1.2297, df = NA, p-value = 1
I’m unsure of what is going wrong and how to fix it. Any help would be greatly appreciated.
Solution:
The degrees of freedom determine which Chi-squared distribution is used to compute the p-value. Because you set simulate.p.value = TRUE, the p-value is computed by Monte Carlo simulation rather than from a reference distribution, so the degrees of freedom are not used. The help file (?chisq.test) says that the parameter, i.e. the degrees of freedom, is NA when the p-value is computed by Monte Carlo simulation (see the parameter entry in the Value section).
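As a minimal sketch of this behaviour, the example below runs chisq.test() on the same table with and without simulation. The table `tab` and its counts are made up for illustration and are not your data.

```r
# Hypothetical 3x2 table of counts (illustrative values only)
tab <- matrix(c(30, 20, 10, 10, 20, 30), nrow = 3,
              dimnames = list(BMI = c("Normal", "Overweight", "Obese"),
                              T2D = c("No", "Yes")))

# Asymptotic test: the p-value comes from a Chi-squared distribution,
# so the degrees of freedom are reported
asym <- chisq.test(tab)
asym$parameter   # df = (3 - 1) * (2 - 1) = 2

# Monte Carlo test: the p-value comes from simulated tables,
# so the degrees of freedom are NA
set.seed(1)
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
sim$parameter    # NA

# The test statistic itself is the same either way
all.equal(unname(asym$statistic), unname(sim$statistic))
```

With the default B = 2000 replicates, the smallest p-value the simulation can return is 1/2001, about 0.0004998, which is the value reported for your combined table.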
Large Chi-squared values are evidence against the null hypothesis. When the null is false, the statistic tends to grow with both the size of the table and the counts in it.
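To illustrate the effect of the counts, the sketch below multiplies every cell of a table by 10 while keeping the proportions fixed. The table `tab` is again made up for illustration and is not your data.

```r
# Hypothetical 3x2 table in which the rows clearly differ (illustrative values only)
tab <- matrix(c(30, 20, 10, 10, 20, 30), nrow = 3)

# Pearson statistic for the original counts
x1 <- chisq.test(tab)$statistic

# Same proportions, ten times the sample size
x10 <- chisq.test(tab * 10)$statistic

# The statistic scales with the counts when the proportions stay the same
unname(x10 / x1)   # 10
```

So a large X-squared on its own is not a sign that something went wrong. It partly reflects how much data went into the table.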
So R is giving you exactly what you asked for. There are no problems.