Why does stats.linregress return complex r-values for complex input arrays?

I’m attempting to perform linear regression on two complex arrays. That is, I’d like to find the line of best fit, w = mz + b, where m and b are both permitted to be complex and where the sum of squared residuals is minimized, i.e. the R^2-value, R^2 = 1 - RSS/TSS, is maximized. (Here RSS and TSS are the residual sum of squares and the total sum of squares.)
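For reference, the fit described here can be computed directly with a design matrix. This is a minimal sketch, assuming NumPy only; `complex_linfit` is a hypothetical helper name, and the random data (with a fixed seed) just mimic the example below.

```python
import numpy as np


def complex_linfit(z, w):
    """Least-squares fit of w ~ m*z + b with complex m and b.

    Returns (m, b, r_squared) with r_squared = 1 - RSS/TSS.
    """
    z = np.asarray(z, dtype=complex)
    w = np.asarray(w, dtype=complex)
    # Design matrix: a column for the slope and a column of ones for the intercept
    A = np.column_stack([z, np.ones_like(z)])
    # lstsq accepts complex input and minimises sum(abs(w - A @ coef) ** 2)
    coef, *_ = np.linalg.lstsq(A, w, rcond=None)
    m, b = coef
    rss = np.sum(np.abs(w - (m * z + b)) ** 2)  # residual sum of squares
    tss = np.sum(np.abs(w - w.mean()) ** 2)     # total sum of squares
    return m, b, 1.0 - rss / tss


rng = np.random.default_rng(0)
z = rng.random(10) + 1j * rng.random(10)
w = 1.6 * z + rng.random(10) + 1j * rng.random(10)
m, b, r2 = complex_linfit(z, w)
print(m, b, r2)
```

Because the residuals are measured with abs(...) ** 2, both RSS and TSS are real, so this R^2 is a real number between 0 and 1.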

I know this can be done by creating a design matrix, computing m and b, etc., but out of curiosity, I tried using linregress from scipy.stats, which did return values:

import numpy as np
from scipy import stats
rng = np.random.default_rng()  # unseeded: the printed values differ from run to run
x = rng.random(10)+1j*rng.random(10)
y = 1.6*x + rng.random(10)+1j*rng.random(10)
res = stats.linregress(x, y)
print(res)

LinregressResult(slope=(1.5814820568268182-0.004143389169974774j),
                 intercept=(0.37141513243354485+0.4522070413718836j),
                 rvalue=(0.8607413430092087-0.002255091256570885j),
                 pvalue=0.00138658952096427,
                 stderr=(0.3306870298601568+0.0024769249452937106j),
                 intercept_stderr=(0.16366363994151886+0.12045799398296754j))

What meaning does a non-real, complex-valued rvalue have? Is the modulus of this value the coefficient of determination?

Solution:

The function stats.linregress from SciPy returns a complex r-value for complex input arrays because the r-value is built from the covariance and the standard deviations of the two arrays. For complex data, these statistics are computed (as NumPy's np.cov and np.std do) as:

Covariance = sum((x - mean(x)) * conj(y - mean(y))) / (n - 1)
Standard deviation = sqrt(sum(abs(x - mean(x)) ** 2) / (n - 1))

The normalizing factor (n or n - 1) cancels in the r-value, so it does not matter which one is used.

If the input arrays contain complex numbers, the covariance is in general complex. The standard deviations are not: they are computed from the squared modulus abs(x - mean(x)) ** 2, so they are real and non-negative. The R-value is the covariance divided by the product of the standard deviations, i.e. a complex number divided by a positive real number, so it is complex as well and has the same phase as the covariance. By the Cauchy-Schwarz inequality its modulus is still at most 1.
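A quick way to see where the imaginary part comes from is to compute the pieces by hand. This sketch assumes NumPy only; the data are made up (random, with an illustrative complex slope of 1.6 - 0.5j), and the variable names are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(10) + 1j * rng.random(10)
y = (1.6 - 0.5j) * x + rng.random(10) + 1j * rng.random(10)

xc = x - x.mean()
yc = y - y.mean()

# Variances use the squared modulus, so they are real and non-negative
var_x = np.mean(np.abs(xc) ** 2)
var_y = np.mean(np.abs(yc) ** 2)
# The covariance pairs x with the conjugate of y (NumPy's np.cov convention),
# so it is complex in general
cov_xy = np.mean(xc * np.conj(yc))

# Correlation: complex numerator over a positive real denominator
r = cov_xy / np.sqrt(var_x * var_y)
print("r =", r, " |r| =", abs(r))

# Cross-check against NumPy (bias=True divides by n, matching the means above)
C = np.cov(x, y, bias=True)
print(np.isclose(C[0, 1], cov_xy), np.isclose(np.std(x), np.sqrt(var_x)))
```

Only cov_xy carries an imaginary part; var_x, var_y and np.std(x) come out as ordinary real floats.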

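Complex values in the fitted line itself (slope and intercept) should not be surprising, since the model w = m*z + b is being fitted over the complex numbers. The complex r-value needs more care. Its squared modulus, not its modulus, is the coefficient of determination of the complex least-squares fit: R^2 = 1 - RSS/TSS = abs(r) ** 2, just as R^2 = r ** 2 in the real case. For the output above, abs(r) ** 2 ≈ 0.741. The phase of r carries no extra goodness-of-fit information: in the output, rvalue and slope have the same phase, because they differ only by the positive real factor std(x)/std(y).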
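The relation R^2 = abs(r) ** 2 can be checked numerically. This minimal sketch assumes NumPy and made-up random data; it fits the line with the closed-form complex least-squares slope, m = sum(conj(z - mean(z)) * (w - mean(w))) / sum(abs(z - mean(z)) ** 2), and compares R^2 with the squared modulus of the complex correlation coefficient returned by np.corrcoef.

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.random(10) + 1j * rng.random(10)
w = (1.6 - 0.5j) * z + rng.random(10) + 1j * rng.random(10)

zc, wc = z - z.mean(), w - w.mean()
# Closed-form complex least-squares slope and intercept
m = np.sum(np.conj(zc) * wc) / np.sum(np.abs(zc) ** 2)
b = w.mean() - m * z.mean()

rss = np.sum(np.abs(w - (m * z + b)) ** 2)  # residual sum of squares
tss = np.sum(np.abs(wc) ** 2)               # total sum of squares
r_squared = 1.0 - rss / tss

# Complex correlation coefficient; np.corrcoef conjugates the second argument
r = np.corrcoef(z, w)[0, 1]

print("R^2 =", r_squared, " |r|^2 =", abs(r) ** 2)
# With this convention conj(r), not r, points in the same direction as m
print(np.conj(r) / abs(r), m / abs(m))
```

Because np.corrcoef (like np.cov) puts the conjugate on its second argument, it is conj(r) that shares the phase of m here. If linregress derives its slope from the same covariance, its reported slope would be conj(m) rather than m; comparing it with the closed-form m on data with a clearly non-real slope is an easy way to check.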
