Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas column-slices with mypy

Lately I’ve found myself in a strange situation I cannot solve for myself:

Consider this MWE:

import pandas
import numpy as np

data = pandas.DataFrame(np.random.rand(10, 5), columns=list("abcde"))

observations = data.loc[:, :"c"]
features = data.loc[:, "c":]

print(data)
print(observations)
print(features)

According to this Answer the slicing itself is done correct and it works in the sense that the correct results are printed.
But when I try to run mypy over it I get this error:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

mypy.exe .\t.py
t.py:1: error: Skipping analyzing "pandas": module is installed, but missing library stubs or py.typed marker
t.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
t.py:6: error: Slice index must be an integer or None
t.py:7: error: Slice index must be an integer or None
Found 3 errors in 1 file (checked 1 source file)

Which is also correct since the slicing is not done with an integer.
How can I either satisfy or disable the Slice index must be an integer or None error?

Of course you could use iloc(:,:3) to solve this problem, but this feels like a bad practice, since with iloc we are depending on the order of the columns (in this example loc is also dependent on the ordering, but this is only done to keep the MWE short).

>Solution :

That’s an open issue (#GH2410).

As a workaround, you can maybe try with get_loc :

col_idx = data.columns.get_loc("c")
​
observations = data.iloc[:, :col_idx+1]
features = data.iloc[:, col_idx:]

Output :

           a         b         c # <- observations
0   0.269605  0.497063  0.676928
1   0.526765  0.204216  0.748203
2   0.919330  0.059722  0.422413
..       ...       ...       ...
7   0.056050  0.521702  0.727323
8   0.635477  0.145401  0.258166
9   0.041886  0.812769  0.839979

[10 rows x 3 columns]

           c         d         e  # <- features
0   0.676928  0.672298  0.177933
1   0.748203  0.995165  0.136659
2   0.422413  0.222377  0.395179
..       ...       ...       ...
7   0.727323  0.291441  0.056998
8   0.258166  0.219025  0.405838
9   0.839979  0.923173  0.431298

[10 rows x 3 columns]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading