<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   CompPrice    400 non-null    int64
 1   Income       400 non-null    int64
 2   Advertising  400 non-null    int64
 3   Population   400 non-null    int64
 4   Price        400 non-null    int64
 5   ShelveLoc    400 non-null    object
 6   Age          400 non-null    int64
 7   Education    400 non-null    int64
 8   Urban        400 non-null    object
 9   US           400 non-null    object
 10  HighSales    400 non-null    object
dtypes: int64(7), object(4)
memory usage: 34.5+ KB
As the info() output above shows, my dataset, DF, has 11 columns indexed from 0 to 10. I would like to extract only the first 10 columns (that is, the columns with indices 0 to 9). However, when I use the code below:
DF.iloc[:, 0:9]
It returns only the first 9 columns (that is, from CompPrice to Urban).
In this case, I need to change my code to:
DF.iloc[:, 0:10]
to get what I actually want (that is, from CompPrice to US).
I'm really confused by iloc indices. Why does the slice need to end at 10 instead of 9 when it starts at index 0? The starting and ending indices do not seem consistent.
>Solution :
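What you are observing is standard pandas behavior, and it is intentional: .iloc follows the same slice semantics as Python lists and NumPy arrays. A slice start:stop includes start but excludes stop, so 0:9 selects positions 0 through 8 (nine columns) and 0:10 selects positions 0 through 9 (ten columns). A handy consequence is that stop minus start equals the number of columns returned. As per the docs: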
.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing. (this conforms with Python/NumPy slice semantics).
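To make the half-open behavior concrete, here is a minimal sketch. It assumes a small stand-in DataFrame (named df here, with two rows of dummy values) that only borrows the column names from the question; it is not the original DF.

```python
import pandas as pd

# Hypothetical stand-in for DF: same 11 column names, dummy values, 2 rows.
columns = ["CompPrice", "Income", "Advertising", "Population", "Price",
           "ShelveLoc", "Age", "Education", "Urban", "US", "HighSales"]
df = pd.DataFrame([[0] * 11, [1] * 11], columns=columns)

# The stop value is exclusive, exactly as with a plain Python list slice.
first_nine = df.iloc[:, 0:9]     # positions 0..8
first_ten = df.iloc[:, 0:10]     # positions 0..9
list_slice = columns[0:10]       # same rule on a list

print(first_nine.columns[-1])                  # Urban
print(first_ten.columns[-1])                   # US
print(list(first_ten.columns) == list_slice)   # True

# A slice that runs past the end is clipped rather than raising IndexError.
clipped = df.iloc[:, 0:50]
print(clipped.shape[1])                        # 11
```

Since the start defaults to 0, df.iloc[:, :10] is an equivalent and slightly shorter way to ask for the first ten columns.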